[ltt-dev] [patch] add tracepoints to trace activate/deactivate task
Mathieu Desnoyers
compudj at krystal.dyndns.org
Wed Dec 10 07:34:40 EST 2008
* Peter Zijlstra (peterz at infradead.org) wrote:
> On Tue, 2008-12-09 at 17:10 -0500, Mathieu Desnoyers wrote:
> > * Jason Baron (jbaron at redhat.com) wrote:
> > > On Mon, Dec 08, 2008 at 11:42:49PM +0100, Peter Zijlstra wrote:
> > > > On Mon, 2008-12-08 at 17:38 -0500, Jason Baron wrote:
> > > > > On Mon, Dec 08, 2008 at 08:54:10PM +0100, Peter Zijlstra wrote:
> > > > > > On Mon, 2008-12-08 at 14:49 -0500, Jason Baron wrote:
> > > > > > > hi,
> > > > > > >
> > > > > > > I thought it would be useful to track when a task is
> > > > > > > 'activated/deactivated'. This case is different from wakeup/wait, in that
> > > > > > > a task can be activated and deactivated when the scheduler re-balances
> > > > > > > tasks, the allowable cpuset changes, or cpu hotplug occurs. Using these
> > > > > > > patches I can more precisely figure out when a task becomes runnable and
> > > > > > > why.
> > > > > >
> > > > > > Then I still do not agree with it, because it does not expose the event
> > > > > > that caused the change.
> > > > > >
> > > > > > If you want the cpu allowed mask, put a tracepoint there. If you want
> > > > > > migrate information (didn't we have that?) then put one there, etc.
> > > > > >
> > > > >
> > > > > Well, with a stap backtrace I can figure out the event; otherwise I'm
> > > > > sprinkling 14 more trace events into the scheduler... I can go down that
> > > > > path if people think it's better?
> > > >
> > > > What events are you interested in? Some of them are just straight
> > > > syscall things, like nice.
> > > >
> > > > But yes, I'd rather you did the events - that's what tracepoints are
> > > > all about: marking individual events, not some fugly hook for stap.
> > >
> > > Well, I think that the activate/deactivate combination gives you a lot
> > > of interesting statistics. You could figure out how long tasks wait on
> > > the runqueue, when and how tasks are migrated between runqueues, queue
> > > lengths, average queue lengths, and large queue lengths. These statistics
> > > could help diagnose performance problems.
> > >
> > > For example, I just wrote the systemtap script below, which outputs the
> > > distribution of queue lengths per-cpu on my system. I'm sure Frank could
> > > improve the stap code, but below is the script and the output.
> > >
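[As an illustration of the statistics Jason describes (this is not his
systemtap script, which is not reproduced here), a minimal C sketch of
probes maintaining per-cpu runnable-task counters. It assumes the patch
declares tracepoints named sched_activate_task/sched_deactivate_task
with a (struct rq *rq, struct task_struct *p) prototype, mirroring the
DECLARE_TRACE style of include/trace/sched.h:

#include <linux/percpu.h>
#include <linux/sched.h>
#include <trace/sched.h>

static DEFINE_PER_CPU(atomic_t, runnable_count);

static void probe_activate_task(struct rq *rq, struct task_struct *p)
{
	/* One more runnable task on p's runqueue. */
	atomic_inc(&per_cpu(runnable_count, task_cpu(p)));
}

static void probe_deactivate_task(struct rq *rq, struct task_struct *p)
{
	/* One less runnable task on p's runqueue. */
	atomic_dec(&per_cpu(runnable_count, task_cpu(p)));
}

static int __init runnable_count_init(void)
{
	/* register_trace_<name>() is generated by DECLARE_TRACE. */
	register_trace_sched_activate_task(probe_activate_task);
	register_trace_sched_deactivate_task(probe_deactivate_task);
	return 0;
}
]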
> >
> > Quoting yourself in this thread:
> >
> > "I thought it would be useful to track when a task is
> > 'activated/deactivated'. This case is different from wakeup/wait, in
> > that task can be activated and deactivated, when the scheduler
> > re-balances tasks, the allowable cpuset changes, or cpu hotplug occurs.
> > Using these patches I can more precisely figure out when a task becomes
> > runnable and why."
> >
> > Peter Zijlstra objected that the key events we would like to see traced
> > are more detailed than just "activate/deactivate" state, e.g. an event
> > for wakeup, one for wait, one for re-balance, one for cpuset change, one
> > for hotplug. Doing this will allow other tracers to do other useful
> > stuff with the information.
> >
> > So trying to argue that "activate/deactivate" is "good" is missing the
> > point here. Yes, we need that information, but in fact we need _more
> > precise_ information, which is a superset of those "activate/deactivate"
> > events.
> >
> > Peter, am I understanding your point correctly?
>
> Quite so.
>
> The point is that for some operations the task never conceptually leaves
> the rq, like renice - it's never not runnable; we just need to pull it
> off and re-insert it in order to make the weight change, but it's for
> all intents and purposes an 'atomic' operation.
>
> So if you want that information, you want a trace_sched_setscheduler()
> hook that will tell you about the old scheduler/prio and the new
> scheduler/prio for a task, because that is the information that is
> native to that event. Deactivate/activate is not: one could conceivably
> implement renice without using them.
>
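[For concreteness, a sketch of the declaration such a hook could take,
following the DECLARE_TRACE style of include/trace/sched.h quoted in the
patch below. The exact name and parameter list are assumptions, not
Peter's proposal verbatim:

DECLARE_TRACE(sched_setscheduler,
	TPPROTO(struct task_struct *p, int old_policy, int old_prio,
		int new_policy, int new_prio),
	TPARGS(p, old_policy, old_prio, new_policy, new_prio));
]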
> Same thing with migration: it's conceptually an atomic op. We just need
> to lift a task from one cpu to another; it doesn't conceptually become
> unrunnable during that move.
>
> So looking at activate/deactivate is wrong.
>
> ---
> Subject: trace: fixup task migration trace point
>
> The trace point only caught one of the many places where a task changes
> cpu; put it in the right place so we get all of them.
>
> Change the signature while we're at it.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra at chello.nl>
> ---
> include/trace/sched.h | 4 ++--
> kernel/sched.c | 3 ++-
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/trace/sched.h b/include/trace/sched.h
> index 9b2854a..f4549d5 100644
> --- a/include/trace/sched.h
> +++ b/include/trace/sched.h
> @@ -30,8 +30,8 @@ DECLARE_TRACE(sched_switch,
> TPARGS(rq, prev, next));
>
> DECLARE_TRACE(sched_migrate_task,
> - TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu),
> - TPARGS(rq, p, dest_cpu));
> + TPPROTO(struct task_struct *p, int orig_cpu, int dest_cpu),
> + TPARGS(p, orig_cpu, dest_cpu));
>
> DECLARE_TRACE(sched_process_free,
> TPPROTO(struct task_struct *p),
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 0eff15b..3dc54cd 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1861,6 +1861,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>
> clock_offset = old_rq->clock - new_rq->clock;
>
> + trace_sched_migrate_task(p, task_cpu(p), new_cpu);
Hrm, looking at it, I think that:
- task_cpu(p) will add some generated code outside of the conditional
  branch, which I think we would like to avoid.
- We can easily get the "from" cpu within the tracepoint probe.
Therefore, I don't see why we would extract this information
explicitly?
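[A minimal sketch of what such a probe could look like, assuming a
hypothetical narrowed two-argument form
TPPROTO(struct task_struct *p, int dest_cpu). Because the tracepoint in
set_task_cpu() fires before the task's cpu field is updated, task_cpu(p)
inside the probe still returns the source cpu:

static void probe_sched_migrate_task(struct task_struct *p, int dest_cpu)
{
	/* Still the "from" cpu: the probe runs before set_task_cpu()
	 * rewrites the task's cpu field. */
	int orig_cpu = task_cpu(p);

	/* ... record the orig_cpu -> dest_cpu migration for p ... */
}
]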
Mathieu
> +
> #ifdef CONFIG_SCHEDSTATS
> if (p->se.wait_start)
> p->se.wait_start -= clock_offset;
> @@ -2841,7 +2843,6 @@ static void sched_migrate_task(struct task_struct *p, int dest_cpu)
> || unlikely(!cpu_active(dest_cpu)))
> goto out;
>
> - trace_sched_migrate_task(rq, p, dest_cpu);
> /* force the process onto the specified CPU */
> if (migrate_task(p, dest_cpu, &req)) {
> /* Need to wait for migration thread (might exit: take ref). */
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68