[lttng-dev] Lacking instrumentation for threaded IRQs

Mathieu Desnoyers mathieu.desnoyers at efficios.com
Tue Oct 2 13:51:02 EDT 2018


----- On Oct 2, 2018, at 1:17 PM, Geneviève Bastien gbastien at versatic.net wrote:

> [...]
> 
>>> We need to know that those threads are really interrupt thread for the
>>> dependency analysis. Look at the following sequence of events:
>>>
>>> [12:02:17.779984311] (+0.000002224) dorsallaptop *sched_waking*: {
>>> cpu_id = 3 }, { comm = "irq/130-iwlwifi", tid = *639*, prio = -51,
>>> target_cpu = 3 }
>>> [12:02:17.779985668] (+0.000000565) dorsallaptop *do_IRQ_return*: {
>>> cpu_id = 3 }, { ip = 0xFFFFFFFF89A01C30, parent_ip = 0xFFFFFFFF89A0098F }
>>> [12:02:17.779986575] (+0.000000907) dorsallaptop *sched_switch*: {
>>> cpu_id = 3 }, { prev_comm = "swapper/3", prev_tid = 0, prev_prio = 20,
>>> prev_state = 0, next_comm = "irq/130-iwlwifi", next_tid = *639*,
>>> next_prio = -51 }
>>> [12:02:17.779998608] (+0.000011174) dorsallaptop net_if_receive_skb: {
>>> cpu_id = 3 }, { ... }
>>> [12:02:17.780019468] (+0.000020860) dorsallaptop sched_waking: { cpu_id
>>> = 3 }, { comm = "wget", tid = 2430, prio = 20, target_cpu = 0 }
>>>
>>> If we do the critical path of the wget process, with the irq/130-iwlwifi
>>> thread considered a normal thread, we see it is woken up by an hardware
>>> interrupt, that we know if iwlwifi, but we don't know anything else
>>> about it, so the critical path ends up following the swapper process...

No, that's wrong. The dependency chain should take into account execution
contexts (irq, softirq, thread), not just the current thread. It should then
follow the wakeups.

The sched_waking here should belong to the interrupt handler context, which
would be made possible by adding the do_IRQ entry/exit instrumentation.

I don't see any valid reason to distinguish between interrupt-threads and
normal threads once this is solved.

>>> Now if we consider the irq/130-iwlwifi thread as an interrupt-running
>>> process, we see that wget is actually woken up by the net_if_receive_skb
>>> event and we can follow this to the packet sent by another machine.
>>> Of course this can all be done at analysis time: if (ret = 2 and a
>>> wakeup inside the do_IRQ but outside the irq_handler) { assume the woken
>>> thread will be handling the interrupt } But it's the "assume" part that
>>> I don't like. With instrumentation, we can clarify this.
>> In this example, assuming we have the do_IRQ event, the critical path
>> seems clear to me:
>> - irq wakes up task 639
>> - task 639 runs, receives a packet and wakes up another task
>>
>> So wget has been woken up by a thread that was itself woken up by an
>> interrupt, I don't see how swapper could end up in this chain if we have
>> the do_IRQ event.
> 
> That's just because it's the thread that was running when the interrupt
> occurred and that's how the critical path handles it now. But that can
> be fixed.

It seems to be an issue with the critical path analysis model being inaccurate.
AFAIK those cases are handled correctly by the model implemented within
Julien's latency tracker. Modeling the interrupt context surrounded by do_IRQ
entry/exit separately from the thread it nests upon should fix that.

Thanks,

Mathieu


> 
> But more importantly, task 639 was woken up so that it could receive the
> packet and wake up the other task, so that other task was really woken
> up by the packet reception, which leads the critical path to the process
> sending it. Compare this with what happens on eth interfaces, where the
> packet is received in softirqs:
> 
> [15:17:55.861738353] (+0.000000642) wilbrod irq_softirq_entry: { cpu_id
> = 1 }, { vec = 3 }
> [15:17:55.861739968] (+0.000001615) wilbrod net_if_receive_skb: { cpu_id
> = 1 }, { ...}
> [15:17:55.861746816] (+0.000006848) wilbrod sched_waking: { cpu_id = 1
> }, { comm = "sshd", tid = 6053, prio = 20, target_cpu = 5 }
> [15:17:55.861757051] (+0.000005023) wilbrod irq_softirq_exit: { cpu_id =
> 1 }, { vec = 3 }
> 
> There the packet being received inside an interrupt context, the packet
> is really the cause of the wakeup, not the hardware interrupt preceding
> it. In threaded IRQ example, it would be the same if we could assign
> some kind of irq context to task 639. And for that, we want to make sure
> task 639 is an interrupt thread.
> 
> Geneviève

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


More information about the lttng-dev mailing list