[ltt-dev] [patch] add tracepoints to trace activate/deactivate task
Mathieu Desnoyers
compudj at krystal.dyndns.org
Wed Dec 10 07:34:40 EST 2008
* Peter Zijlstra (peterz at infradead.org) wrote:
> On Tue, 2008-12-09 at 17:10 -0500, Mathieu Desnoyers wrote:
> > * Jason Baron (jbaron at redhat.com) wrote:
> > > On Mon, Dec 08, 2008 at 11:42:49PM +0100, Peter Zijlstra wrote:
> > > > On Mon, 2008-12-08 at 17:38 -0500, Jason Baron wrote:
> > > > > On Mon, Dec 08, 2008 at 08:54:10PM +0100, Peter Zijlstra wrote:
> > > > > > On Mon, 2008-12-08 at 14:49 -0500, Jason Baron wrote:
> > > > > > > hi,
> > > > > > >
> > > > > > > I thought it would be useful to track when a task is
> > > > > > > 'activated/deactivated'. This case is different from wakeup/wait, in that
> > > > > > > a task can be activated and deactivated when the scheduler re-balances
> > > > > > > tasks, the allowable cpuset changes, or cpu hotplug occurs. Using these
> > > > > > > patches I can more precisely figure out when a task becomes runnable and
> > > > > > > why.
> > > > > >
> > > > > > Then I still do not agree with it, because it does not expose the event
> > > > > > that caused the change.
> > > > > >
> > > > > > If you want the cpu allowed mask, put a tracepoint there. If you want
> > > > > > migrate information (didn't we have that?) then put one there, etc.
> > > > > >
> > > > >
> > > > > Well, with a stap backtrace I can figure out the event; otherwise I'm
> > > > > sprinkling 14 more trace events into the scheduler... I can go down that
> > > > > path if people think it's better?
> > > >
> > > > What events are you interested in? Some of them are just straight
> > > > syscall things, like nice.
> > > >
> > > > But yes, I'd rather you did the events - that's what tracepoints are
> > > > all about: marking individual events, not some fugly hook for stap.
> > >
> > > Well, I think that the activate/deactivate combination gives you a lot
> > > of interesting statistics. You could figure out how long tasks wait on
> > > the runqueue, when and how tasks are migrated between runqueues, queue
> > > lengths, average queue lengths, and large queue lengths. These statistics
> > > could help diagnose performance problems.
> > >
> > > For example, I just wrote the systemtap script below, which outputs the
> > > distribution of queue lengths per-cpu on my system. I'm sure Frank could
> > > improve the stap code, but below is the script and the output.
> > >
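[As an illustration of the statistics Jason describes (this is not his
systemtap script, which is not reproduced here), a minimal C sketch of
probes maintaining per-cpu runnable-task counters. It assumes the patch
declares tracepoints named sched_activate_task/sched_deactivate_task
with a (struct rq *rq, struct task_struct *p) prototype, mirroring the
DECLARE_TRACE style of include/trace/sched.h:

#include <linux/percpu.h>
#include <linux/sched.h>
#include <trace/sched.h>

static DEFINE_PER_CPU(atomic_t, runnable_count);

static void probe_activate_task(struct rq *rq, struct task_struct *p)
{
	/* One more runnable task on p's runqueue. */
	atomic_inc(&per_cpu(runnable_count, task_cpu(p)));
}

static void probe_deactivate_task(struct rq *rq, struct task_struct *p)
{
	/* One less runnable task on p's runqueue. */
	atomic_dec(&per_cpu(runnable_count, task_cpu(p)));
}

static int __init runnable_count_init(void)
{
	/* register_trace_<name>() is generated by DECLARE_TRACE. */
	register_trace_sched_activate_task(probe_activate_task);
	register_trace_sched_deactivate_task(probe_deactivate_task);
	return 0;
}
]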
> >
> > Quoting yourself in this thread:
> >
> > "I thought it would be useful to track when a task is
> > 'activated/deactivated'. This case is different from wakeup/wait, in
> > that task can be activated and deactivated, when the scheduler
> > re-balances tasks, the allowable cpuset changes, or cpu hotplug occurs.
> > Using these patches I can more precisely figure out when a task becomes
> > runnable and why."
> >
> > Peter Zijlstra objected that the key events we would like to see traced
> > are more detailed than just "activate/deactivate" state, e.g. an event
> > for wakeup, one for wait, one for re-balance, one for cpuset change, one
> > for hotplug. Doing this will allow other tracers to do other useful
> > stuff with the information.
> >
> > So trying to argue that "activate/deactivate" is "good" is missing the
> > point here. Yes, we need that information, but in fact we need _more
> > precise_ information, which is a superset of those "activate/deactivate"
> > events.
> >
> > Peter, am I understanding your point correctly?
>
> Quite so.
>
> The point is that for some operations the task never conceptually leaves
> the rq, like renice - it's never not runnable; we just need to pull it
> off and re-insert it in order to make the weight change, but it's for
> all intents and purposes an 'atomic' operation.
>
> So if you want that information, you want a trace_sched_setscheduler()
> hook that will tell you about the old scheduler/prio and the new
> scheduler/prio for a task, because that is the information that is
> native to that event. Deactivate/activate is not: one could conceivably
> implement renice without using them.
>
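[For concreteness, a sketch of the declaration such a hook could take,
following the DECLARE_TRACE style of include/trace/sched.h quoted in the
patch below. The exact name and parameter list are assumptions, not
Peter's proposal verbatim:

DECLARE_TRACE(sched_setscheduler,
	TPPROTO(struct task_struct *p, int old_policy, int old_prio,
		int new_policy, int new_prio),
	TPARGS(p, old_policy, old_prio, new_policy, new_prio));
]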
> Same thing with migration: it's conceptually an atomic op. We just need
> to lift a task from one cpu to another; it doesn't conceptually become
> unrunnable during that move.
>
> So looking at activate/deactivate is wrong.
>
> ---
> Subject: trace: fixup task migration trace point
>
> The trace point only caught one of the many places where a task changes
> cpu; put it in the right place so we get all of them.
>
> Change the signature while we're at it.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra at chello.nl>
> ---
> include/trace/sched.h | 4 ++--
> kernel/sched.c | 3 ++-
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/trace/sched.h b/include/trace/sched.h
> index 9b2854a..f4549d5 100644
> --- a/include/trace/sched.h
> +++ b/include/trace/sched.h
> @@ -30,8 +30,8 @@ DECLARE_TRACE(sched_switch,
> TPARGS(rq, prev, next));
>
> DECLARE_TRACE(sched_migrate_task,
> - TPPROTO(struct rq *rq, struct task_struct *p, int dest_cpu),
> - TPARGS(rq, p, dest_cpu));
> + TPPROTO(struct task_struct *p, int orig_cpu, int dest_cpu),
> + TPARGS(p, orig_cpu, dest_cpu));
>
> DECLARE_TRACE(sched_process_free,
> TPPROTO(struct task_struct *p),
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 0eff15b..3dc54cd 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1861,6 +1861,8 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>
> clock_offset = old_rq->clock - new_rq->clock;
>
> + trace_sched_migrate_task(p, task_cpu(p), new_cpu);
Hrm, looking at it, I think that:
- task_cpu(p) will add some generated code outside of the conditional
  branch, which I think we would like to avoid.
- We can easily get the "from" cpu within the tracepoint probe.
Therefore, I don't see why we would extract this information
explicitly?
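[A minimal sketch of what such a probe could look like, assuming a
hypothetical narrowed two-argument form
TPPROTO(struct task_struct *p, int dest_cpu). Because the tracepoint in
set_task_cpu() fires before the task's cpu field is updated, task_cpu(p)
inside the probe still returns the source cpu:

static void probe_sched_migrate_task(struct task_struct *p, int dest_cpu)
{
	/* Still the "from" cpu: the probe runs before set_task_cpu()
	 * rewrites the task's cpu field. */
	int orig_cpu = task_cpu(p);

	/* ... record the orig_cpu -> dest_cpu migration for p ... */
}
]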
Mathieu
> +
> #ifdef CONFIG_SCHEDSTATS
> if (p->se.wait_start)
> p->se.wait_start -= clock_offset;
> @@ -2841,7 +2843,6 @@ static void sched_migrate_task(struct task_struct *p, int dest_cpu)
> || unlikely(!cpu_active(dest_cpu)))
> goto out;
>
> - trace_sched_migrate_task(rq, p, dest_cpu);
> /* force the process onto the specified CPU */
> if (migrate_task(p, dest_cpu, &req)) {
> /* Need to wait for migration thread (might exit: take ref). */
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68