From francis.giraldeau at gmail.com Mon Oct 1 00:06:01 2012
From: francis.giraldeau at gmail.com (Francis Giraldeau)
Date: Mon, 01 Oct 2012 00:06:01 -0400
Subject: [lttng-dev] notrace missing in lttng-ust
In-Reply-To: <20120717141301.GA21204@Krystal>
References: <4FED6718.1000205@mentor.com> <20120630181600.GB30747@Krystal> <4FF2A31D.9020100@mentor.com> <20120717141301.GA21204@Krystal>
Message-ID: <506916A9.2050308@gmail.com>

On 2012-07-17 10:13, Mathieu Desnoyers wrote:
> * Woegerer, Paul (Paul_Woegerer at mentor.com) wrote:
>> Hi Mathieu,
>>
>> here is the revised patch that makes tracepoint.h and
>> ust-tracepoint-event.h robust against -finstrument-functions.
>
> I think we want to make the notrace always active. I don't see the point
> in letting UST lib be compiled with those tracing stubs in place.

I just hit a case where it would have been useful to profile lttng-ust
itself.

When generating events in a tight loop, the average time to execute a
simple tracepoint is about 280ns on my machine. But the problem is that
once in a while (once out of a million executions) the latency bumps to a
blazing 100us. I agree, it's still fast, but nonetheless the maximum is
~350 times more than the average, which is strange.

I tried many profiling tools (perf, valgrind, etc.) and none was
providing good results. There is no point in using lttng-ust itself (are
tracepoints reentrant?). So, I decided to try -finstrument-functions
stuff with a small pre-allocated ringbuffer to get the lowest latency
possible. Well, it turns out that I was getting no results for the
tracepoint, because function instrumentation is disabled.

I tried to re-enable it, but then the instrumented app fails with this
error message:

/usr/local/include/lttng/ust-tracepoint-event.h:646:
__lttng_events_init__npt: Assertion `!ret' failed.

I confess that I didn't look deeply into this problem, but it's quite
strange that the init fails in the case function instrumentation is
enabled.
Could it be related to the dlopen() done at registration? Maybe
somebody has an idea, because I saw screenshots of similar profiling
previously on the ML ;)

Here are the configuration options for lttng-ust (the profiling library
is miniprof[1]):

./configure CFLAGS="-finstrument-functions" LIBS="-lminiprof" LDFLAGS="-L/usr/local/lib -Wl,-E"

Cheers,

Francis

[1] https://github.com/giraldeau/miniprof

From paul.chavent at fnac.net Mon Oct 1 04:39:53 2012
From: paul.chavent at fnac.net (paul.chavent at fnac.net)
Date: Mon, 1 Oct 2012 10:39:53 +0200 (CEST)
Subject: [lttng-dev] Wich interface to use for lttng ust
Message-ID: <14773128.185971349080793918.JavaMail.www@wsfrf1114>

Hi.

I would like to try lttng user space traces. I've found two documentations:
 - manual: http://lttng.org/files/ust/manual/ust.html
 - man page: http://lttng.org/files/doc/man-pages/man3/lttng-ust.3.html

Is there any "preferred" choice for reference? Should/Can I use
ust_marker, tracepoint?

Thanks for your replies.

Paul.

From mathieu.desnoyers at efficios.com Mon Oct 1 10:56:19 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 10:56:19 -0400
Subject: [lttng-dev] UST app and lttng-tools compatibility
In-Reply-To: <5069111F.1080604@gmail.com>
References: <50670C5A.4060800@gmail.com> <20120929182800.GA19994@Krystal> <5069111F.1080604@gmail.com>
Message-ID: <20121001145619.GB13423@Krystal>

* Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
> On 2012-09-29 14:28, Mathieu Desnoyers wrote:
> > * Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
> >> Hi,
> >>
> >> I wanted to share my lttng-ust 2.1 update experience, maybe it will save
> >> time for others.
> >>
> >> I updated lttng-ust recently. After this change, the app would not
> >> produce a trace anymore.
> >> No error message is displayed by the traced app
> >> to indicate that something is wrong. Even when setting LTTNG_UST_DEBUG
> >> in the app's environment, there is no error message. The debug
> >> output suggests that probes are registered and everything is fine, while
> >> it's not.
> >>
> >> By running lttng-sessiond with -vvv --verbose-consumer, I finally got
> >> this message:
> >>
> >> DEBUG2: UST app PID 8112 is not compatible with major version 3
> >> (supporting <= 2) [in ust_app_validate_version() at ust-app.c:2633]
> >>
> >> Updating lttng-tools to 2.1 solved the issue. Seems that it's mandatory
> >> to update lttng-tools to support latest lttng-ust. It may be obvious for
> >> developers, but it should be clear for users that they must upgrade both.
> >>
> >> IMHO, It would be nice if the app side log could tell if the
> >> session/consumer refused the registration.
> >
> > I agree we should do better.
> >
> > Regarding lttng-tools, I think changing this DBG2 message to a WARN
> > message would help, so sessiond would show the warning, except in the
> > case where it is started with "-d".
>
> Excellent idea.

David, can you make this change ?

> > On the application side, this is a bit tricky. It has no way to find out
> > that it has been rejected by the sessiond. The application registers at
> > startup, and then the sessiond keeps the connection active, but flags it
> > as incompatible internally. The reason we do that is because we don't
> > want the application to retry endlessly.
>
> Could the registration process block until the sessiond returns some
> status?

The session is not "returning" anything to the application. The
application is registering to the sessiond, and all the sessiond can do
is to keep the socket alive or close it.

Even if we added a "command" that the sessiond could use to tell the
application it has a wrong version, that would not be 2.0 material, and
the application wouldn't know about it.
Moreover, given we have changed the communication protocol, not sure we
would like to have this extra command as entirely fixed for now, as it
would be part of the communication protocol (which is not the case for
the initial app registration, which is part of a lower level protocol).

> I understand that in commercial setup, the application should
> not be prevented from starting and running normally if tracing is not
> available or misconfigured.

By default, we only block the application for up to 3 seconds at
registration. I think that if we start the application with
LTTNG_UST_REGISTER_TIMEOUT=-1, the application will, in this case, block
forever (see man lttng-ust).

Thanks!

Mathieu

> But for developers, some env var like
> "LTTNG_ABORT_ON_ERROR" could help to diagnose this kind of problem.
>
> > Moreover, I cannot change the code for the existing 2.0 UST libs, so
> > adding a new message is not possible.
>
> Of course! ;)
>
> > One thing we have in mind for 2.2 or 2.3 is to add syslog support within
> > the sessiond. This would provide a nice centralized place to look at
> > those logs.
>
> I think it would be great. It's a bit less hidden, but certainly addresses
> concerns for production, and still can be used by developers. An error
> message is better than none! ;)
>
> Cheers,
>
> Francis
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com Mon Oct 1 10:59:25 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 10:59:25 -0400
Subject: [lttng-dev] Wich interface to use for lttng ust
In-Reply-To: <14773128.185971349080793918.JavaMail.www@wsfrf1114>
References: <14773128.185971349080793918.JavaMail.www@wsfrf1114>
Message-ID: <20121001145925.GC13423@Krystal>

* paul.chavent at fnac.net (paul.chavent at fnac.net) wrote:
> Hi.
>
> I would like to try lttng user space traces.
> I've found two documentations:
>  - manual: http://lttng.org/files/ust/manual/ust.html
>  - man page: http://lttng.org/files/doc/man-pages/man3/lttng-ust.3.html
>
> Is there any "preferred" choice for reference ?

lttng-ust(3).

> Should/Can I use
> ust_marker, tracepoint ?

ust_marker does not exist anymore.

Alexandre, can you remove the old
http://lttng.org/files/ust/manual/ust.html 0.x manpage ? Or make sure
you move it somewhere that clearly states it is outdated ?

Thanks,

Mathieu

>
> Thanks for your replies.
>
> Paul.

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From dgoulet at efficios.com Mon Oct 1 11:06:44 2012
From: dgoulet at efficios.com (David Goulet)
Date: Mon, 01 Oct 2012 11:06:44 -0400
Subject: [lttng-dev] UST app and lttng-tools compatibility
In-Reply-To: <20121001145619.GB13423@Krystal>
References: <50670C5A.4060800@gmail.com> <20120929182800.GA19994@Krystal> <5069111F.1080604@gmail.com> <20121001145619.GB13423@Krystal>
Message-ID: <5069B184.4060903@efficios.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Mathieu Desnoyers:
> * Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
>> On 2012-09-29 14:28, Mathieu Desnoyers wrote:
>>> * Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
>>>> Hi,
>>>>
>>>> I wanted to share my lttng-ust 2.1 update experience, maybe
>>>> it will save time for others.
>>>>
>>>> I updated lttng-ust recently. After this change, the app
>>>> would not produce a trace anymore. No error message is
>>>> displayed by the traced app to indicate that something is
>>>> wrong. Even when setting LTTNG_UST_DEBUG in the app's
>>>> environment, there is no error message. The debug
>>>> output suggests that probes are registered and everything is
>>>> fine, while it's not.
>>>>
>>>> By running lttng-sessiond with -vvv --verbose-consumer, I
>>>> finally got this message:
>>>>
>>>> DEBUG2: UST app PID 8112 is not compatible with major version
>>>> 3 (supporting <= 2) [in ust_app_validate_version() at
>>>> ust-app.c:2633]
>>>>
>>>> Updating lttng-tools to 2.1 solved the issue. Seems that it's
>>>> mandatory to update lttng-tools to support latest lttng-ust.
>>>> It may be obvious for developers, but it should be clear for
>>>> users that they must upgrade both.
>>>>
>>>> IMHO, It would be nice if the app side log could tell if the
>>>> session/consumer refused the registration.
>>>
>>> I agree we should do better.
>>>
>>> Regarding lttng-tools, I think changing this DBG2 message to a
>>> WARN message would help, so sessiond would show the warning,
>>> except in the case where it is started with "-d".
>>
>> Excellent idea.
>
> David, can you make this change ?
>

Yes!

David

>>
>>> On the application side, this is a bit tricky. It has no way to
>>> find out that it has been rejected by the sessiond. The
>>> application registers at startup, and then the sessiond keeps
>>> the connection active, but flags it as incompatible internally.
>>> The reason we do that is because we don't want the application
>>> to retry endlessly.
>>
>> Could the registration process block until the sessiond returns
>> some status?
>
> The session is not "returning" anything to the application. The
> application is registering to the sessiond, and all the sessiond
> can do is to keep the socket alive or close it.
>
> Even if we added a "command" that the sessiond could use to tell
> the application it has a wrong version, that would not be 2.0
> material, and the application wouldn't know about it.
> Moreover, given we have changed the communication protocol, not
> sure we would like to have this extra command as entirely fixed
> for now, as it would be part of the communication protocol (which
> is not the case for the initial app registration, which is part
> of a lower level protocol).
>
>> I understand that in commercial setup, the application should not
>> be prevented from starting and running normally if tracing is not
>> available or misconfigured.
>
> By default, we only block the application for up to 3 seconds at
> registration. I think that if we start the application with
> LTTNG_UST_REGISTER_TIMEOUT=-1, the application will, in this case,
> block forever (see man lttng-ust).
>
> Thanks!
>
> Mathieu
>
>> But for developers, some env var like "LTTNG_ABORT_ON_ERROR"
>> could help to diagnose this kind of problem.
>>
>>> Moreover, I cannot change the code for the existing 2.0 UST
>>> libs, so adding a new message is not possible.
>>
>> Of course! ;)
>>
>>> One thing we have in mind for 2.2 or 2.3 is to add syslog
>>> support within the sessiond. This would provide a nice
>>> centralized place to look at those logs.
>>
>> I think it would be great. It's a bit less hidden, but certainly
>> addresses concerns for production, and still can be used by
>> developers. An error message is better than none!
>> ;)
>>
>> Cheers,
>>
>> Francis
>>
>

-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJQabGBAAoJEELoaioR9I024c0IAJ8i7hIUthhQdbeESSV6T65L
VgxPYtjUr2aKbOaeInS1FXiW39Gpg5w1Xut9gTaAmlxP+Dsgs1TXLVx0YJFCsqi0
QYrkPLbpdafmCjbXKpWa6G5hNCFa684MorL6hCqhSOQEBXD0N/aKGQmUVz+pxGXp
xR/l/hwHadm4v6VH75N5dr5JvGNHQbcZDYz0gzokemyo1edw0NGa3jQGe0WYcVcp
/K6hybbgRvkZObMB2hhTESkLwAFk/OO4WEP6zTZCSPLCTAdiGLaWNQQ9PdxxkcT+
crT83mDSW8K1yQkZDXOk5U9XSmdBLjBuQW6094EaCF9Nb5K6d+y2U01dii4eIU4=
=6aHd
-----END PGP SIGNATURE-----

From mathieu.desnoyers at efficios.com Mon Oct 1 11:24:35 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 11:24:35 -0400
Subject: [lttng-dev] UST app and lttng-tools compatibility
In-Reply-To: <20121001145619.GB13423@Krystal>
References: <50670C5A.4060800@gmail.com> <20120929182800.GA19994@Krystal> <5069111F.1080604@gmail.com> <20121001145619.GB13423@Krystal>
Message-ID: <20121001152435.GA13628@Krystal>

* Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> * Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
[...]
> > Could the registration process block until the sessiond returns some
> > status?
>
> The session is not "returning" anything to the application. The
> application is registering to the sessiond, and all the sessiond can do
> is to keep the socket alive or close it.
>
> Even if we added a "command" that the sessiond could use to tell the
> application it has a wrong version, that would not be 2.0 material, and
> the application wouldn't know about it. Moreover, given we have changed
> the communication protocol, not sure we would like to have this extra
> command as entirely fixed for now, as it would be part of the
> communication protocol (which is not the case for the initial app
> registration, which is part of a lower level protocol).
>
> > I understand that in commercial setup, the application should
> > not be prevented from starting and running normally if tracing is not
> > available or misconfigured.
>
> By default, we only block the application for up to 3 seconds at
> registration. I think that if we start the application with
> LTTNG_UST_REGISTER_TIMEOUT=-1, the application will, in this case, block
> forever (see man lttng-ust).

Let me take this last part back. That was a pre-morning-coffee
statement. ;) The application will receive a "registration done" message
when the version is found to be incompatible. Therefore, it will never
wait for the incompatible sessiond.

One thing we could do to make transition smoother between future
versions would be to add a new command to let the sessiond send its own
version info to the application. In this command's handler, the
application could show a warning on stderr. This could not apply to 2.0,
but we could do it starting from 2.1.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From dgoulet at efficios.com Mon Oct 1 11:27:20 2012
From: dgoulet at efficios.com (David Goulet)
Date: Mon, 01 Oct 2012 11:27:20 -0400
Subject: [lttng-dev] UST app and lttng-tools compatibility
In-Reply-To: <20121001152435.GA13628@Krystal>
References: <50670C5A.4060800@gmail.com> <20120929182800.GA19994@Krystal> <5069111F.1080604@gmail.com> <20121001145619.GB13423@Krystal> <20121001152435.GA13628@Krystal>
Message-ID: <5069B658.8010106@efficios.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Mathieu Desnoyers:
> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
>> * Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
> [...]
>>> Could the registration process block until the sessiond returns
>>> some status?
>>
>> The session is not "returning" anything to the application. The
>> application is registering to the sessiond, and all the sessiond
>> can do is to keep the socket alive or close it.
>>
>> Even if we added a "command" that the sessiond could use to tell
>> the application it has a wrong version, that would not be 2.0
>> material, and the application wouldn't know about it. Moreover,
>> given we have changed the communication protocol, not sure we
>> would like to have this extra command as entirely fixed for now,
>> as it would be part of the communication protocol (which is not
>> the case for the initial app registration, which is part of a
>> lower level protocol).
>>
>>> I understand that in commercial setup, the application should
>>> not be prevented from starting and running normally if tracing
>>> is not available or misconfigured.
>>
>> By default, we only block the application for up to 3 seconds at
>> registration. I think that if we start the application with
>> LTTNG_UST_REGISTER_TIMEOUT=-1, the application will, in this
>> case, block forever (see man lttng-ust).
>
> Let me take this last part back. That was a pre-morning-coffee
> statement. ;) The application will receive a "registration done"
> message when the version is found to be incompatible. Therefore, it
> will never wait for the incompatible sessiond.
>
> One thing we could do to make transition smoother between future
> versions would be to add a new command to let the sessiond send its
> own version info to the application. In this command's handler,
> the application could show a warning on stderr. This could not
> apply to 2.0, but we could do it starting from 2.1.
>
> Thoughts ?

Why send version info instead of an "incompatible" command ?
David

>
> Thanks,
>
> Mathieu
>

-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJQabZVAAoJEELoaioR9I02RYcIALJ7ySGiU6nKDD4F4ffAEIFr
ZSQC/1mhr8x66TfTdB9eGaAxA6qmESEBpg6bjyWSmWCn1dI1i5lOeXhxszM1Z8Oz
q6DKSSlRVRZFrCaZBf3Xy7QNbxTdUkwrcCjd8Y9Ffu1BswXIzzUhsYSHbJRQAHVF
QAuNFy5AeHRlE55H94Gs/ydfiQb3jqVIalVnG5LmeDPyO58sdA+RAWkwaJypgL8e
5m6r3K0IK5ufSIjWWb6G8vmf3HFV3zzdEwGpkEyaVRZjUfDBekRnJ27Bb2XqJUxC
/xe2S519VLnsCWE+j1pfN5BcchI0TKUOeTcaLzzaf/VdkHdlOmZlifSZ6ubRusU=
=OVEY
-----END PGP SIGNATURE-----

From alexmonthy at voxpopuli.im Mon Oct 1 11:28:36 2012
From: alexmonthy at voxpopuli.im (Alexandre Montplaisir)
Date: Mon, 01 Oct 2012 11:28:36 -0400
Subject: [lttng-dev] Wich interface to use for lttng ust
In-Reply-To: <20121001145925.GC13423@Krystal>
References: <14773128.185971349080793918.JavaMail.www@wsfrf1114> <20121001145925.GC13423@Krystal>
Message-ID: <5069B6A4.9070001@voxpopuli.im>

On 12-10-01 10:59 AM, Mathieu Desnoyers wrote:
> * paul.chavent at fnac.net (paul.chavent at fnac.net) wrote:
>> Hi.
>>
>> I would like to try lttng user space traces. I've found two documentations:
>>  - manual: http://lttng.org/files/ust/manual/ust.html
>>  - man page: http://lttng.org/files/doc/man-pages/man3/lttng-ust.3.html
>>
>> Is there any "preferred" choice for reference ?
> lttng-ust(3).
>
>> Should/Can I use
>> ust_marker, tracepoint ?
> ust_marker does not exist anymore.
>
> Alexandre, can you remove the old
> http://lttng.org/files/ust/manual/ust.html 0.x manpage ? Or make sure
> you move it somewhere that clearly states it is outdated ?

Done, deleted it since it's deprecated.

Alex

>
> Thanks,
>
> Mathieu
>
>> Thanks for your replies.
>>
>> Paul.
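
Note on the "UST app and lttng-tools compatibility" thread: the DEBUG2
line quoted there ("not compatible with major version 3 (supporting <= 2)")
suggests that the sessiond compares the application's major version with
the highest one it supports, and the proposal is to surface a mismatch as
a warning instead of a debug message. A minimal C sketch of that idea
follows. It is an illustration under stated assumptions, not lttng-tools
code: SUPPORTED_MAJOR, struct app_version, app_version_is_compatible() and
format_incompat_warning() are hypothetical names, and only the message
wording is modelled on the quoted log line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: highest application major version the daemon accepts. */
#define SUPPORTED_MAJOR 2u

/* Hypothetical version record sent by an application at registration. */
struct app_version {
	uint32_t major;
	uint32_t minor;
};

/* Accept any application whose major version is not newer than ours. */
bool app_version_is_compatible(const struct app_version *v)
{
	return v->major <= SUPPORTED_MAJOR;
}

/*
 * Format a warning modelled on the log line quoted in the thread.
 * Returns the snprintf() result, i.e. the length the message needs.
 */
int format_incompat_warning(char *buf, size_t len, int pid,
		const struct app_version *v)
{
	assert(buf != NULL && v != NULL);
	return snprintf(buf, len,
		"Warning: UST app PID %d is not compatible with major "
		"version %u (supporting <= %u)",
		pid, (unsigned int) v->major,
		(unsigned int) SUPPORTED_MAJOR);
}
```

A caller would check app_version_is_compatible() when an application
registers and, on failure, print the formatted warning to stderr rather
than hiding it behind a debug level. Whether the daemon sends its own
version to the application, as discussed in the thread, is a separate
protocol question that this sketch does not cover.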
From mathieu.desnoyers at efficios.com Mon Oct 1 11:29:01 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 11:29:01 -0400
Subject: [lttng-dev] notrace missing in lttng-ust
In-Reply-To: <506916A9.2050308@gmail.com>
References: <4FED6718.1000205@mentor.com> <20120630181600.GB30747@Krystal> <4FF2A31D.9020100@mentor.com> <20120717141301.GA21204@Krystal> <506916A9.2050308@gmail.com>
Message-ID: <20121001152901.GA13944@Krystal>

* Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
> On 2012-07-17 10:13, Mathieu Desnoyers wrote:
> > * Woegerer, Paul (Paul_Woegerer at mentor.com) wrote:
> >> Hi Mathieu,
> >>
> >> here is the revised patch that makes tracepoint.h and
> >> ust-tracepoint-event.h robust against -finstrument-functions.
> >
> > I think we want to make the notrace always active. I don't see the point
> > in letting UST lib be compiled with those tracing stubs in place.
>
> I just hit a case where it would have been useful to profile lttng-ust
> itself.
>
> When generating events in a tight loop, the average time to execute a
> simple tracepoint is about 280ns on my machine. But the problem is that
> once in a while (once out of a million executions) the latency bumps to a
> blazing 100us. I agree, it's still fast, but nonetheless the maximum is
> ~350 times more than the average, which is strange.

Interesting indeed. I would expect this time might be spent in the
"write" to the fifo to wake up the consumerd, but even then it is
surprising, as this is supposed to be a non-blocking fifo.

> I tried many profiling tools (perf, valgrind, etc.) and none was
> providing good results. There is no point in using lttng-ust itself (are
> tracepoints reentrant?).

That might not work indeed. Something about infinite recursion. You
might want to try kernel tracing, and see if this time is spent anywhere
in the kernel when doing UST tracing.
> So, I decided to try -finstrument-functions
> stuff with a small pre-allocated ringbuffer to get the lowest latency
> possible. Well, it turns out that I was getting no results for the
> tracepoint, because function instrumentation is disabled.

No, don't. You'll run into infinite recursion too.

>
> I tried to re-enable it, but then the instrumented app fails with this
> error message:
>
> /usr/local/include/lttng/ust-tracepoint-event.h:646:
> __lttng_events_init__npt: Assertion `!ret' failed.

Doing something like that might require that you LD_PRELOAD
liblttng-ust-tracepoint.so, so that its constructor runs before the
constructor registering the .so that contains the function callback.

Thanks,

Mathieu

>
> I confess that I didn't look deeply into this problem, but it's quite
> strange that the init fails in the case function instrumentation is
> enabled. Could it be related to the dlopen() done at registration? Maybe
> somebody has an idea, because I saw screenshots of similar profiling
> previously on the ML ;) Here are the configuration options for lttng-ust
> (the profiling library is miniprof[1]):
>
> ./configure CFLAGS="-finstrument-functions" LIBS="-lminiprof"
> LDFLAGS="-L/usr/local/lib -Wl,-E"
>
> Cheers,
>
> Francis
>
> [1] https://github.com/giraldeau/miniprof
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com Mon Oct 1 11:31:59 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 11:31:59 -0400
Subject: [lttng-dev] UST app and lttng-tools compatibility
In-Reply-To: <5069B658.8010106@efficios.com>
References: <50670C5A.4060800@gmail.com> <20120929182800.GA19994@Krystal> <5069111F.1080604@gmail.com> <20121001145619.GB13423@Krystal> <20121001152435.GA13628@Krystal> <5069B658.8010106@efficios.com>
Message-ID: <20121001153159.GB13944@Krystal>

* David Goulet (dgoulet at efficios.com) wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Mathieu Desnoyers:
> > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote:
> >> * Francis Giraldeau (francis.giraldeau at gmail.com) wrote:
> > [...]
> >>> Could the registration process block until the sessiond returns
> >>> some status?
> >>
> >> The session is not "returning" anything to the application. The
> >> application is registering to the sessiond, and all the sessiond
> >> can do is to keep the socket alive or close it.
> >>
> >> Even if we added a "command" that the sessiond could use to tell
> >> the application it has a wrong version, that would not be 2.0
> >> material, and the application wouldn't know about it. Moreover,
> >> given we have changed the communication protocol, not sure we
> >> would like to have this extra command as entirely fixed for now,
> >> as it would be part of the communication protocol (which is not
> >> the case for the initial app registration, which is part of a
> >> lower level protocol).
> >>
> >>> I understand that in commercial setup, the application should
> >>> not be prevented from starting and running normally if tracing
> >>> is not available or misconfigured.
> >>
> >> By default, we only block the application for up to 3 seconds at
> >> registration.
> >> I think that if we start the application with
> >> LTTNG_UST_REGISTER_TIMEOUT=-1, the application will, in this
> >> case, block forever (see man lttng-ust).
> >
> > Let me take this last part back. That was a pre-morning-coffee
> > statement. ;) The application will receive a "registration done"
> > message when the version is found to be incompatible. Therefore, it
> > will never wait for the incompatible sessiond.
> >
> > One thing we could do to make transition smoother between future
> > versions would be to add a new command to let the sessiond send its
> > own version info to the application. In this command's handler,
> > the application could show a warning on stderr. This could not
> > apply to 2.0, but we could do it starting from 2.1.
> >
> > Thoughts ?
>
> Why send version info instead of an "incompatible" command ?

It could very well be an "incompatible" command, but it would need to
contain the version info, so the application can print the versions.

The main difference between always sending version info and the
"incompatible" command is that the version info would be sent at every
registration. It could provide better debugging logs in the app when
using LTTNG_UST_DEBUG=1, but would require one extra message at app
registration.

Not sure which is best. Thoughts ?

Thanks,

Mathieu

>
> David
>
> >
> > Thanks,
> >
> > Mathieu
> >
>
> -----BEGIN PGP SIGNATURE-----
>
> iQEcBAEBCgAGBQJQabZVAAoJEELoaioR9I02RYcIALJ7ySGiU6nKDD4F4ffAEIFr
> ZSQC/1mhr8x66TfTdB9eGaAxA6qmESEBpg6bjyWSmWCn1dI1i5lOeXhxszM1Z8Oz
> q6DKSSlRVRZFrCaZBf3Xy7QNbxTdUkwrcCjd8Y9Ffu1BswXIzzUhsYSHbJRQAHVF
> QAuNFy5AeHRlE55H94Gs/ydfiQb3jqVIalVnG5LmeDPyO58sdA+RAWkwaJypgL8e
> 5m6r3K0IK5ufSIjWWb6G8vmf3HFV3zzdEwGpkEyaVRZjUfDBekRnJ27Bb2XqJUxC
> /xe2S519VLnsCWE+j1pfN5BcchI0TKUOeTcaLzzaf/VdkHdlOmZlifSZ6ubRusU=
> =OVEY
> -----END PGP SIGNATURE-----

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com Mon Oct 1 12:01:13 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 12:01:13 -0400
Subject: [lttng-dev] [LTTNG-MODULES PATCH v4] ABI with support for compat 32/64 bits
In-Reply-To: <1348951127-18016-1-git-send-email-jdesfossez@efficios.com>
References: <1348951127-18016-1-git-send-email-jdesfossez@efficios.com>
Message-ID: <20121001160113.GA14260@Krystal>

* Julien Desfossez (jdesfossez at efficios.com) wrote:
> The current ABI does not work for compat 32/64 bits.
> This patch moves the current ABI as old-abi and provides a new ABI in
> which all the structures exchanged between user and kernel-space are
> packed. Also this new ABI moves the "int overwrite" member of the
> struct lttng_kernel_channel to remove the alignment added by the
> compiler.
>
> A patch for lttng-tools has been developed in parallel to this one to
> support the new ABI. These 2 patches have been tested in all
> possible configurations (applied or not) on 64-bit and 32-bit kernels
> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit.
>
> Here are the results of the tests :
> k 64 compat     | u 32 compat     | OK
> k 64 compat     | u 64 compat     | OK
> k 64 compat     | u 32 non-compat | KO
> k 64 compat     | u 64 non-compat | OK
>
> k 64 non-compat | u 64 compat     | OK
> k 64 non-compat | u 32 compat     | KO
> k 64 non-compat | u 64 non-compat | OK
> k 64 non-compat | u 32 non-compat | KO
>
> k 32 compat     | u compat        | OK
> k 32 compat     | u non-compat    | OK
>
> k 32 non-compat | u compat        | OK
> k 32 non-compat | u non-compat    | OK
>
> The results are as expected :
> - on 32-bit user-space and kernel, every configuration works.
> - on 64-bit user-space and kernel, every configuration works.
> - with 32-bit user-space on a 64-bit kernel the only configuration
>   where it works is when the compat patch is applied everywhere.
>
> Signed-off-by: Julien Desfossez

Merged, thanks!
Mathieu

> ---
>  lttng-abi-old.h | 141 ++++++++++++++++++++
>  lttng-abi.c     | 392 +++++++++++++++++++++++++++++++++++++++++++++++--------
>  lttng-abi.h     |  48 +++----
>  lttng-events.c  |   1 +
>  lttng-events.h  |   7 +
>  5 files changed, 510 insertions(+), 79 deletions(-)
>  create mode 100644 lttng-abi-old.h
>
> diff --git a/lttng-abi-old.h b/lttng-abi-old.h
> new file mode 100644
> index 0000000..3e6b328
> --- /dev/null
> +++ b/lttng-abi-old.h
> @@ -0,0 +1,141 @@
> +#ifndef _LTTNG_ABI_OLD_H
> +#define _LTTNG_ABI_OLD_H
> +
> +/*
> + * lttng-abi-old.h
> + *
> + * LTTng old ABI header (without support for compat 32/64 bits)
> + *
> + * Copyright (C) 2010-2012 Mathieu Desnoyers
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; only
> + * version 2.1 of the License.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include
> +#include "lttng-abi.h"
> +
> +/*
> + * LTTng DebugFS ABI structures.
> + */
> +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING	LTTNG_KERNEL_SYM_NAME_LEN + 32
> +struct lttng_kernel_old_channel {
> +	int overwrite;				/* 1: overwrite, 0: discard */
> +	uint64_t subbuf_size;			/* in bytes */
> +	uint64_t num_subbuf;
> +	unsigned int switch_timer_interval;	/* usecs */
> +	unsigned int read_timer_interval;	/* usecs */
> +	enum lttng_kernel_output output;	/* splice, mmap */
> +	char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING];
> +};
> +
> +struct lttng_kernel_old_kretprobe {
> +	uint64_t addr;
> +
> +	uint64_t offset;
> +	char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN];
> +};
> +
> +/*
> + * Either addr is used, or symbol_name and offset.
> + */
> +struct lttng_kernel_old_kprobe {
> +	uint64_t addr;
> +
> +	uint64_t offset;
> +	char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN];
> +};
> +
> +struct lttng_kernel_old_function_tracer {
> +	char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN];
> +};
> +
> +/*
> + * For syscall tracing, name = '\0' means "enable all".
> + */
> +#define LTTNG_KERNEL_OLD_EVENT_PADDING1	16
> +#define LTTNG_KERNEL_OLD_EVENT_PADDING2	LTTNG_KERNEL_SYM_NAME_LEN + 32
> +struct lttng_kernel_old_event {
> +	char name[LTTNG_KERNEL_SYM_NAME_LEN];	/* event name */
> +	enum lttng_kernel_instrumentation instrumentation;
> +	char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1];
> +
> +	/* Per instrumentation type configuration */
> +	union {
> +		struct lttng_kernel_old_kretprobe kretprobe;
> +		struct lttng_kernel_old_kprobe kprobe;
> +		struct lttng_kernel_old_function_tracer ftrace;
> +		char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2];
> +	} u;
> +};
> +
> +struct lttng_kernel_old_tracer_version {
> +	uint32_t major;
> +	uint32_t minor;
> +	uint32_t patchlevel;
> +};
> +
> +struct lttng_kernel_old_calibrate {
> +	enum lttng_kernel_calibrate_type type;	/* type (input) */
> +};
> +
> +struct lttng_kernel_old_perf_counter_ctx {
> +	uint32_t type;
> +	uint64_t config;
> +	char name[LTTNG_KERNEL_SYM_NAME_LEN];
> +};
> +
> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1	16
> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2	LTTNG_KERNEL_SYM_NAME_LEN + 32
> +struct lttng_kernel_old_context {
> +	enum lttng_kernel_context_type ctx;
> +	char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1];
> +
> +	union {
> +		struct lttng_kernel_old_perf_counter_ctx perf_counter;
> +		char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2];
> +	} u;
> +};
> +
> +/* LTTng file descriptor ioctl */
> +#define LTTNG_KERNEL_OLD_SESSION		_IO(0xF6, 0x40)
> +#define LTTNG_KERNEL_OLD_TRACER_VERSION		\
> +	_IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version)
> +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST	_IO(0xF6, 0x42)
> +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT		_IO(0xF6, 0x43)
> +#define LTTNG_KERNEL_OLD_CALIBRATE		\
> +	_IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate)
> +
> +/* Session FD ioctl */
> +#define LTTNG_KERNEL_OLD_METADATA		\
> +	_IOW(0xF6, 0x50, struct lttng_kernel_old_channel)
> +#define LTTNG_KERNEL_OLD_CHANNEL		\
> +	_IOW(0xF6, 0x51, struct lttng_kernel_old_channel)
> +#define LTTNG_KERNEL_OLD_SESSION_START		_IO(0xF6, 0x52)
> +#define LTTNG_KERNEL_OLD_SESSION_STOP		_IO(0xF6, 0x53)
> +
> +/* Channel FD ioctl */
> +#define LTTNG_KERNEL_OLD_STREAM			_IO(0xF6, 0x60)
> +#define LTTNG_KERNEL_OLD_EVENT			\
> +	_IOW(0xF6, 0x61, struct lttng_kernel_old_event)
> +
> +/* Event and Channel FD ioctl */
> +#define LTTNG_KERNEL_OLD_CONTEXT		\
> +	_IOW(0xF6, 0x70, struct lttng_kernel_old_context)
> +
> +/* Event, Channel and Session ioctl */
> +#define LTTNG_KERNEL_OLD_ENABLE			_IO(0xF6, 0x80)
> +#define LTTNG_KERNEL_OLD_DISABLE		_IO(0xF6, 0x81)
> +
> +#endif /* _LTTNG_ABI_OLD_H */
> diff --git a/lttng-abi.c b/lttng-abi.c
> index eadf0a8..25a350a 100644
> --- a/lttng-abi.c
> +++ b/lttng-abi.c
> @@ -47,6 +47,7 @@
>  #include "wrapper/ringbuffer/vfs.h"
>  #include "wrapper/poll.h"
>  #include "lttng-abi.h"
> +#include "lttng-abi-old.h"
>  #include "lttng-events.h"
>  #include "lttng-tracer.h"
>
> @@ -143,34 +144,23 @@ fd_error:
>  }
>
>  static
> -long lttng_abi_tracer_version(struct file *file,
> -	struct lttng_kernel_tracer_version __user *uversion_param)
> +void lttng_abi_tracer_version(struct lttng_kernel_tracer_version *v)
>  {
> -	struct lttng_kernel_tracer_version v;
> -
> -	v.major = LTTNG_MODULES_MAJOR_VERSION;
> -	v.minor = LTTNG_MODULES_MINOR_VERSION;
> -	v.patchlevel = LTTNG_MODULES_PATCHLEVEL_VERSION;
> -
> -	if (copy_to_user(uversion_param, &v, sizeof(v)))
> -		return -EFAULT;
> -	return 0;
> +	v->major = LTTNG_MODULES_MAJOR_VERSION;
> +	v->minor = LTTNG_MODULES_MINOR_VERSION;
> +	v->patchlevel = LTTNG_MODULES_PATCHLEVEL_VERSION;
>  }
>
>  static
>  long lttng_abi_add_context(struct file *file,
> -	struct lttng_kernel_context __user *ucontext_param,
> +	struct lttng_kernel_context *context_param,
>  	struct lttng_ctx **ctx, struct lttng_session *session)
>  {
> -	struct lttng_kernel_context context_param;
>
>  	if (session->been_active)
>  		return -EPERM;
>
> -	if (copy_from_user(&context_param, ucontext_param, sizeof(context_param)))
> -		return -EFAULT;
> -
> -	switch (context_param.ctx) {
> +	switch (context_param->ctx) {
>  	case LTTNG_KERNEL_CONTEXT_PID:
>  		return lttng_add_pid_to_ctx(ctx);
>  	case LTTNG_KERNEL_CONTEXT_PRIO:
> @@ -188,10 +178,10 @@ long lttng_abi_add_context(struct file *file,
>  	case LTTNG_KERNEL_CONTEXT_VPPID:
>  		return lttng_add_vppid_to_ctx(ctx);
>  	case LTTNG_KERNEL_CONTEXT_PERF_COUNTER:
> -		context_param.u.perf_counter.name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0';
> -		return lttng_add_perf_counter_to_ctx(context_param.u.perf_counter.type,
> -				context_param.u.perf_counter.config,
> -				context_param.u.perf_counter.name,
> +		context_param->u.perf_counter.name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0';
> +		return lttng_add_perf_counter_to_ctx(context_param->u.perf_counter.type,
> +				context_param->u.perf_counter.config,
> +				context_param->u.perf_counter.name,
>  				ctx);
>  	case LTTNG_KERNEL_CONTEXT_PROCNAME:
>  		return lttng_add_procname_to_ctx(ctx);
> @@ -225,16 +215,60 @@ static
>  long lttng_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>
{ > switch (cmd) { > + case LTTNG_KERNEL_OLD_SESSION: > case LTTNG_KERNEL_SESSION: > return lttng_abi_create_session(); > + case LTTNG_KERNEL_OLD_TRACER_VERSION: > + { > + struct lttng_kernel_tracer_version v; > + struct lttng_kernel_old_tracer_version oldv; > + struct lttng_kernel_old_tracer_version *uversion = > + (struct lttng_kernel_old_tracer_version __user *) arg; > + > + lttng_abi_tracer_version(&v); > + oldv.major = v.major; > + oldv.minor = v.minor; > + oldv.patchlevel = v.patchlevel; > + > + if (copy_to_user(uversion, &oldv, sizeof(oldv))) > + return -EFAULT; > + return 0; > + } > case LTTNG_KERNEL_TRACER_VERSION: > - return lttng_abi_tracer_version(file, > - (struct lttng_kernel_tracer_version __user *) arg); > + { > + struct lttng_kernel_tracer_version version; > + struct lttng_kernel_tracer_version *uversion = > + (struct lttng_kernel_tracer_version __user *) arg; > + > + lttng_abi_tracer_version(&version); > + > + if (copy_to_user(uversion, &version, sizeof(version))) > + return -EFAULT; > + return 0; > + } > + case LTTNG_KERNEL_OLD_TRACEPOINT_LIST: > case LTTNG_KERNEL_TRACEPOINT_LIST: > return lttng_abi_tracepoint_list(); > + case LTTNG_KERNEL_OLD_WAIT_QUIESCENT: > case LTTNG_KERNEL_WAIT_QUIESCENT: > synchronize_trace(); > return 0; > + case LTTNG_KERNEL_OLD_CALIBRATE: > + { > + struct lttng_kernel_old_calibrate __user *ucalibrate = > + (struct lttng_kernel_old_calibrate __user *) arg; > + struct lttng_kernel_old_calibrate old_calibrate; > + struct lttng_kernel_calibrate calibrate; > + int ret; > + > + if (copy_from_user(&old_calibrate, ucalibrate, sizeof(old_calibrate))) > + return -EFAULT; > + calibrate.type = old_calibrate.type; > + ret = lttng_calibrate(&calibrate); > + if (copy_to_user(ucalibrate, &old_calibrate, sizeof(old_calibrate))) > + return -EFAULT; > + return ret; > + } > case LTTNG_KERNEL_CALIBRATE: > { > struct lttng_kernel_calibrate __user *ucalibrate = > @@ -294,7 +328,7 @@ create_error: > > static > int 
lttng_abi_create_channel(struct file *session_file, > - struct lttng_kernel_channel __user *uchan_param, > + struct lttng_kernel_channel *chan_param, > enum channel_type channel_type) > { > struct lttng_session *session = session_file->private_data; > @@ -302,12 +336,9 @@ int lttng_abi_create_channel(struct file *session_file, > const char *transport_name; > struct lttng_channel *chan; > struct file *chan_file; > - struct lttng_kernel_channel chan_param; > int chan_fd; > int ret = 0; > > - if (copy_from_user(&chan_param, uchan_param, sizeof(chan_param))) > - return -EFAULT; > chan_fd = get_unused_fd(); > if (chan_fd < 0) { > ret = chan_fd; > @@ -331,20 +362,20 @@ int lttng_abi_create_channel(struct file *session_file, > } > switch (channel_type) { > case PER_CPU_CHANNEL: > - if (chan_param.output == LTTNG_KERNEL_SPLICE) { > - transport_name = chan_param.overwrite ? > + if (chan_param->output == LTTNG_KERNEL_SPLICE) { > + transport_name = chan_param->overwrite ? > "relay-overwrite" : "relay-discard"; > - } else if (chan_param.output == LTTNG_KERNEL_MMAP) { > - transport_name = chan_param.overwrite ? > + } else if (chan_param->output == LTTNG_KERNEL_MMAP) { > + transport_name = chan_param->overwrite ? > "relay-overwrite-mmap" : "relay-discard-mmap"; > } else { > return -EINVAL; > } > break; > case METADATA_CHANNEL: > - if (chan_param.output == LTTNG_KERNEL_SPLICE) > + if (chan_param->output == LTTNG_KERNEL_SPLICE) > transport_name = "relay-metadata"; > - else if (chan_param.output == LTTNG_KERNEL_MMAP) > + else if (chan_param->output == LTTNG_KERNEL_MMAP) > transport_name = "relay-metadata-mmap"; > else > return -EINVAL; > @@ -358,10 +389,10 @@ int lttng_abi_create_channel(struct file *session_file, > * invariant for the rest of the session. 
> */ > chan = lttng_channel_create(session, transport_name, NULL, > - chan_param.subbuf_size, > - chan_param.num_subbuf, > - chan_param.switch_timer_interval, > - chan_param.read_timer_interval); > + chan_param->subbuf_size, > + chan_param->num_subbuf, > + chan_param->switch_timer_interval, > + chan_param->read_timer_interval); > if (!chan) { > ret = -EINVAL; > goto chan_error; > @@ -412,20 +443,76 @@ long lttng_session_ioctl(struct file *file, unsigned int cmd, unsigned long arg) > struct lttng_session *session = file->private_data; > > switch (cmd) { > + case LTTNG_KERNEL_OLD_CHANNEL: > + { > + struct lttng_kernel_channel chan_param; > + struct lttng_kernel_old_channel old_chan_param; > + > + if (copy_from_user(&old_chan_param, > + (struct lttng_kernel_old_channel __user *) arg, > + sizeof(struct lttng_kernel_old_channel))) > + return -EFAULT; > + chan_param.overwrite = old_chan_param.overwrite; > + chan_param.subbuf_size = old_chan_param.subbuf_size; > + chan_param.num_subbuf = old_chan_param.num_subbuf; > + chan_param.switch_timer_interval = old_chan_param.switch_timer_interval; > + chan_param.read_timer_interval = old_chan_param.read_timer_interval; > + chan_param.output = old_chan_param.output; > + > + return lttng_abi_create_channel(file, &chan_param, > + PER_CPU_CHANNEL); > + } > case LTTNG_KERNEL_CHANNEL: > - return lttng_abi_create_channel(file, > + { > + struct lttng_kernel_channel chan_param; > + > + if (copy_from_user(&chan_param, > (struct lttng_kernel_channel __user *) arg, > + sizeof(struct lttng_kernel_channel))) > + return -EFAULT; > + return lttng_abi_create_channel(file, &chan_param, > PER_CPU_CHANNEL); > + } > + case LTTNG_KERNEL_OLD_SESSION_START: > + case LTTNG_KERNEL_OLD_ENABLE: > case LTTNG_KERNEL_SESSION_START: > case LTTNG_KERNEL_ENABLE: > return lttng_session_enable(session); > + case LTTNG_KERNEL_OLD_SESSION_STOP: > + case LTTNG_KERNEL_OLD_DISABLE: > case LTTNG_KERNEL_SESSION_STOP: > case LTTNG_KERNEL_DISABLE: > return 
lttng_session_disable(session); > + case LTTNG_KERNEL_OLD_METADATA: > + { > + struct lttng_kernel_channel chan_param; > + struct lttng_kernel_old_channel old_chan_param; > + > + if (copy_from_user(&old_chan_param, > + (struct lttng_kernel_old_channel __user *) arg, > + sizeof(struct lttng_kernel_old_channel))) > + return -EFAULT; > + chan_param.overwrite = old_chan_param.overwrite; > + chan_param.subbuf_size = old_chan_param.subbuf_size; > + chan_param.num_subbuf = old_chan_param.num_subbuf; > + chan_param.switch_timer_interval = old_chan_param.switch_timer_interval; > + chan_param.read_timer_interval = old_chan_param.read_timer_interval; > + chan_param.output = old_chan_param.output; > + > + return lttng_abi_create_channel(file, &chan_param, > + METADATA_CHANNEL); > + } > case LTTNG_KERNEL_METADATA: > - return lttng_abi_create_channel(file, > - (struct lttng_kernel_channel __user *) arg, > + { > + struct lttng_kernel_channel chan_param; > + > + if (copy_from_user(&chan_param, > + (struct lttng_kernel_channel __user *) arg, > + sizeof(struct lttng_kernel_channel))) > + return -EFAULT; > + return lttng_abi_create_channel(file, &chan_param, > METADATA_CHANNEL); > + } > default: > return -ENOIOCTLCMD; > } > @@ -505,31 +592,28 @@ fd_error: > > static > int lttng_abi_create_event(struct file *channel_file, > - struct lttng_kernel_event __user *uevent_param) > + struct lttng_kernel_event *event_param) > { > struct lttng_channel *channel = channel_file->private_data; > struct lttng_event *event; > - struct lttng_kernel_event event_param; > int event_fd, ret; > struct file *event_file; > > - if (copy_from_user(&event_param, uevent_param, sizeof(event_param))) > - return -EFAULT; > - event_param.name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > - switch (event_param.instrumentation) { > + event_param->name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > + switch (event_param->instrumentation) { > case LTTNG_KERNEL_KRETPROBE: > - 
event_param.u.kretprobe.symbol_name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > + event_param->u.kretprobe.symbol_name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > break; > case LTTNG_KERNEL_KPROBE: > - event_param.u.kprobe.symbol_name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > + event_param->u.kprobe.symbol_name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > break; > case LTTNG_KERNEL_FUNCTION: > - event_param.u.ftrace.symbol_name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > + event_param->u.ftrace.symbol_name[LTTNG_KERNEL_SYM_NAME_LEN - 1] = '\0'; > break; > default: > break; > } > - switch (event_param.instrumentation) { > + switch (event_param->instrumentation) { > default: > event_fd = get_unused_fd(); > if (event_fd < 0) { > @@ -547,7 +631,7 @@ int lttng_abi_create_event(struct file *channel_file, > * We tolerate no failure path after event creation. It > * will stay invariant for the rest of the session. > */ > - event = lttng_event_create(channel, &event_param, NULL, NULL); > + event = lttng_event_create(channel, event_param, NULL, NULL); > if (!event) { > ret = -EINVAL; > goto event_error; > @@ -561,7 +645,7 @@ int lttng_abi_create_event(struct file *channel_file, > /* > * Only all-syscall tracing supported for now. 
> */ > - if (event_param.name[0] != '\0') > + if (event_param->name[0] != '\0') > return -EINVAL; > ret = lttng_syscalls_register(channel, NULL); > if (ret) > @@ -607,21 +691,158 @@ long lttng_channel_ioctl(struct file *file, unsigned int cmd, unsigned long arg) > struct lttng_channel *channel = file->private_data; > > switch (cmd) { > + case LTTNG_KERNEL_OLD_STREAM: > case LTTNG_KERNEL_STREAM: > return lttng_abi_open_stream(file); > + case LTTNG_KERNEL_OLD_EVENT: > + { > + struct lttng_kernel_event *uevent_param; > + struct lttng_kernel_old_event *old_uevent_param; > + int ret; > + > + uevent_param = kmalloc(sizeof(struct lttng_kernel_event), > + GFP_KERNEL); > + if (!uevent_param) { > + ret = -ENOMEM; > + goto old_event_end; > + } > + old_uevent_param = kmalloc( > + sizeof(struct lttng_kernel_old_event), > + GFP_KERNEL); > + if (!old_uevent_param) { > + ret = -ENOMEM; > + goto old_event_error_free_param; > + } > + if (copy_from_user(old_uevent_param, > + (struct lttng_kernel_old_event __user *) arg, > + sizeof(struct lttng_kernel_old_event))) { > + ret = -EFAULT; > + goto old_event_error_free_old_param; > + } > + > + memcpy(uevent_param->name, old_uevent_param->name, > + sizeof(uevent_param->name)); > + uevent_param->instrumentation = > + old_uevent_param->instrumentation; > + > + switch (old_uevent_param->instrumentation) { > + case LTTNG_KERNEL_KPROBE: > + uevent_param->u.kprobe.addr = > + old_uevent_param->u.kprobe.addr; > + uevent_param->u.kprobe.offset = > + old_uevent_param->u.kprobe.offset; > + memcpy(uevent_param->u.kprobe.symbol_name, > + old_uevent_param->u.kprobe.symbol_name, > + sizeof(uevent_param->u.kprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_KRETPROBE: > + uevent_param->u.kretprobe.addr = > + old_uevent_param->u.kretprobe.addr; > + uevent_param->u.kretprobe.offset = > + old_uevent_param->u.kretprobe.offset; > + memcpy(uevent_param->u.kretprobe.symbol_name, > + old_uevent_param->u.kretprobe.symbol_name, > + 
sizeof(uevent_param->u.kretprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_FUNCTION: > + memcpy(uevent_param->u.ftrace.symbol_name, > + old_uevent_param->u.ftrace.symbol_name, > + sizeof(uevent_param->u.ftrace.symbol_name)); > + break; > + default: > + break; > + } > + ret = lttng_abi_create_event(file, uevent_param); > + > +old_event_error_free_old_param: > + kfree(old_uevent_param); > +old_event_error_free_param: > + kfree(uevent_param); > +old_event_end: > + return ret; > + } > case LTTNG_KERNEL_EVENT: > - return lttng_abi_create_event(file, (struct lttng_kernel_event __user *) arg); > + { > + struct lttng_kernel_event uevent_param; > + > + if (copy_from_user(&uevent_param, > + (struct lttng_kernel_event __user *) arg, > + sizeof(uevent_param))) > + return -EFAULT; > + return lttng_abi_create_event(file, &uevent_param); > + } > + case LTTNG_KERNEL_OLD_CONTEXT: > + { > + struct lttng_kernel_context *ucontext_param; > + struct lttng_kernel_old_context *old_ucontext_param; > + int ret; > + > + ucontext_param = kmalloc(sizeof(struct lttng_kernel_context), > + GFP_KERNEL); > + if (!ucontext_param) { > + ret = -ENOMEM; > + goto old_ctx_end; > + } > + old_ucontext_param = kmalloc(sizeof(struct lttng_kernel_old_context), > + GFP_KERNEL); > + if (!old_ucontext_param) { > + ret = -ENOMEM; > + goto old_ctx_error_free_param; > + } > + > + if (copy_from_user(old_ucontext_param, > + (struct lttng_kernel_old_context __user *) arg, > + sizeof(struct lttng_kernel_old_context))) { > + ret = -EFAULT; > + goto old_ctx_error_free_old_param; > + } > + ucontext_param->ctx = old_ucontext_param->ctx; > + memcpy(ucontext_param->padding, old_ucontext_param->padding, > + sizeof(ucontext_param->padding)); > + /* only type that uses the union */ > + if (old_ucontext_param->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { > + ucontext_param->u.perf_counter.type = > + old_ucontext_param->u.perf_counter.type; > + ucontext_param->u.perf_counter.config = > + 
old_ucontext_param->u.perf_counter.config; > + memcpy(ucontext_param->u.perf_counter.name, > + old_ucontext_param->u.perf_counter.name, > + sizeof(ucontext_param->u.perf_counter.name)); > + } > + > + ret = lttng_abi_add_context(file, > + ucontext_param, > + &channel->ctx, channel->session); > + > +old_ctx_error_free_old_param: > + kfree(old_ucontext_param); > +old_ctx_error_free_param: > + kfree(ucontext_param); > +old_ctx_end: > + return ret; > + } > case LTTNG_KERNEL_CONTEXT: > - return lttng_abi_add_context(file, > + { > + struct lttng_kernel_context ucontext_param; > + > + if (copy_from_user(&ucontext_param, > (struct lttng_kernel_context __user *) arg, > + sizeof(ucontext_param))) > + return -EFAULT; > + return lttng_abi_add_context(file, > + &ucontext_param, > &channel->ctx, channel->session); > + } > + case LTTNG_KERNEL_OLD_ENABLE: > case LTTNG_KERNEL_ENABLE: > return lttng_channel_enable(channel); > + case LTTNG_KERNEL_OLD_DISABLE: > case LTTNG_KERNEL_DISABLE: > return lttng_channel_disable(channel); > default: > return -ENOIOCTLCMD; > } > + > } > > /** > @@ -641,6 +862,7 @@ static > long lttng_metadata_ioctl(struct file *file, unsigned int cmd, unsigned long arg) > { > switch (cmd) { > + case LTTNG_KERNEL_OLD_STREAM: > case LTTNG_KERNEL_STREAM: > return lttng_abi_open_stream(file); > default: > @@ -726,12 +948,72 @@ long lttng_event_ioctl(struct file *file, unsigned int cmd, unsigned long arg) > struct lttng_event *event = file->private_data; > > switch (cmd) { > + case LTTNG_KERNEL_OLD_CONTEXT: > + { > + struct lttng_kernel_context *ucontext_param; > + struct lttng_kernel_old_context *old_ucontext_param; > + int ret; > + > + ucontext_param = kmalloc(sizeof(struct lttng_kernel_context), > + GFP_KERNEL); > + if (!ucontext_param) { > + ret = -ENOMEM; > + goto old_ctx_end; > + } > + old_ucontext_param = kmalloc(sizeof(struct lttng_kernel_old_context), > + GFP_KERNEL); > + if (!old_ucontext_param) { > + ret = -ENOMEM; > + goto old_ctx_error_free_param; > + } > 
+ > + if (copy_from_user(old_ucontext_param, > + (struct lttng_kernel_old_context __user *) arg, > + sizeof(struct lttng_kernel_old_context))) { > + ret = -EFAULT; > + goto old_ctx_error_free_old_param; > + } > + ucontext_param->ctx = old_ucontext_param->ctx; > + memcpy(ucontext_param->padding, old_ucontext_param->padding, > + sizeof(ucontext_param->padding)); > + /* only type that uses the union */ > + if (old_ucontext_param->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { > + ucontext_param->u.perf_counter.type = > + old_ucontext_param->u.perf_counter.type; > + ucontext_param->u.perf_counter.config = > + old_ucontext_param->u.perf_counter.config; > + memcpy(ucontext_param->u.perf_counter.name, > + old_ucontext_param->u.perf_counter.name, > + sizeof(ucontext_param->u.perf_counter.name)); > + } > + > + ret = lttng_abi_add_context(file, > + ucontext_param, > + &event->ctx, event->chan->session); > + > +old_ctx_error_free_old_param: > + kfree(old_ucontext_param); > +old_ctx_error_free_param: > + kfree(ucontext_param); > +old_ctx_end: > + return ret; > + } > case LTTNG_KERNEL_CONTEXT: > + { > + struct lttng_kernel_context ucontext_param; > + > + if (copy_from_user(&ucontext_param, > + (struct lttng_kernel_context __user *) arg, > + sizeof(ucontext_param))) > + return -EFAULT; > return lttng_abi_add_context(file, > - (struct lttng_kernel_context __user *) arg, > + &ucontext_param, > &event->ctx, event->chan->session); > + } > + case LTTNG_KERNEL_OLD_ENABLE: > case LTTNG_KERNEL_ENABLE: > return lttng_event_enable(event); > + case LTTNG_KERNEL_OLD_DISABLE: > case LTTNG_KERNEL_DISABLE: > return lttng_event_disable(event); > default: > diff --git a/lttng-abi.h b/lttng-abi.h > index cf72b12..8d3ecdd 100644 > --- a/lttng-abi.h > +++ b/lttng-abi.h > @@ -49,21 +49,21 @@ enum lttng_kernel_output { > */ > #define LTTNG_KERNEL_CHANNEL_PADDING LTTNG_KERNEL_SYM_NAME_LEN + 32 > struct lttng_kernel_channel { > - int overwrite; /* 1: overwrite, 0: discard */ > uint64_t subbuf_size; /* in 
bytes */ > uint64_t num_subbuf; > unsigned int switch_timer_interval; /* usecs */ > unsigned int read_timer_interval; /* usecs */ > enum lttng_kernel_output output; /* splice, mmap */ > + int overwrite; /* 1: overwrite, 0: discard */ > char padding[LTTNG_KERNEL_CHANNEL_PADDING]; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_kretprobe { > uint64_t addr; > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* > * Either addr is used, or symbol_name and offset. > @@ -73,11 +73,11 @@ struct lttng_kernel_kprobe { > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_function_tracer { > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* > * For syscall tracing, name = '\0' means "enable all". > @@ -96,13 +96,13 @@ struct lttng_kernel_event { > struct lttng_kernel_function_tracer ftrace; > char padding[LTTNG_KERNEL_EVENT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_tracer_version { > uint32_t major; > uint32_t minor; > uint32_t patchlevel; > -}; > +}__attribute__((packed)); > > enum lttng_kernel_calibrate_type { > LTTNG_KERNEL_CALIBRATE_KRETPROBE, > @@ -110,7 +110,7 @@ enum lttng_kernel_calibrate_type { > > struct lttng_kernel_calibrate { > enum lttng_kernel_calibrate_type type; /* type (input) */ > -}; > +}__attribute__((packed)); > > enum lttng_kernel_context_type { > LTTNG_KERNEL_CONTEXT_PID = 0, > @@ -130,7 +130,7 @@ struct lttng_kernel_perf_counter_ctx { > uint32_t type; > uint64_t config; > char name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > #define LTTNG_KERNEL_CONTEXT_PADDING1 16 > #define LTTNG_KERNEL_CONTEXT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > @@ -142,36 +142,36 @@ struct lttng_kernel_context { > struct lttng_kernel_perf_counter_ctx perf_counter; > char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; > } u; > -}; > 
+}__attribute__((packed)); > > /* LTTng file descriptor ioctl */ > -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) > +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) > #define LTTNG_KERNEL_TRACER_VERSION \ > - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) > -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) > -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) > + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) > +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) > +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) > #define LTTNG_KERNEL_CALIBRATE \ > - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) > + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) > > /* Session FD ioctl */ > #define LTTNG_KERNEL_METADATA \ > - _IOW(0xF6, 0x50, struct lttng_kernel_channel) > + _IOW(0xF6, 0x54, struct lttng_kernel_channel) > #define LTTNG_KERNEL_CHANNEL \ > - _IOW(0xF6, 0x51, struct lttng_kernel_channel) > -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) > -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) > + _IOW(0xF6, 0x55, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) > +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > > /* Channel FD ioctl */ > -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) > +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) > #define LTTNG_KERNEL_EVENT \ > - _IOW(0xF6, 0x61, struct lttng_kernel_event) > + _IOW(0xF6, 0x63, struct lttng_kernel_event) > > /* Event and Channel FD ioctl */ > #define LTTNG_KERNEL_CONTEXT \ > - _IOW(0xF6, 0x70, struct lttng_kernel_context) > + _IOW(0xF6, 0x71, struct lttng_kernel_context) > > /* Event, Channel and Session ioctl */ > -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) > -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) > +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) > +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > > #endif /* _LTTNG_ABI_H */ > diff --git a/lttng-events.c b/lttng-events.c > index 97efe42..4f30904 100644 > --- a/lttng-events.c > +++ b/lttng-events.c > @@ 
-33,6 +33,7 @@ > #include "wrapper/tracepoint.h" > #include "lttng-events.h" > #include "lttng-tracer.h" > +#include "lttng-abi-old.h" > > static LIST_HEAD(sessions); > static LIST_HEAD(lttng_transport_list); > diff --git a/lttng-events.h b/lttng-events.h > index af5aa65..09d5618 100644 > --- a/lttng-events.h > +++ b/lttng-events.h > @@ -28,6 +28,7 @@ > #include > #include "wrapper/uuid.h" > #include "lttng-abi.h" > +#include "lttng-abi-old.h" > > #undef is_signed_type > #define is_signed_type(type) (((type)(-1)) < 0) > @@ -301,6 +302,10 @@ struct lttng_event *lttng_event_create(struct lttng_channel *chan, > struct lttng_kernel_event *event_param, > void *filter, > const struct lttng_event_desc *internal_desc); > +struct lttng_event *lttng_event_compat_old_create(struct lttng_channel *chan, > + struct lttng_kernel_old_event *old_event_param, > + void *filter, > + const struct lttng_event_desc *internal_desc); > > int lttng_channel_enable(struct lttng_channel *channel); > int lttng_channel_disable(struct lttng_channel *channel); > @@ -312,7 +317,9 @@ void lttng_transport_unregister(struct lttng_transport *transport); > > void synchronize_trace(void); > int lttng_abi_init(void); > +int lttng_abi_compat_old_init(void); > void lttng_abi_exit(void); > +void lttng_abi_compat_old_exit(void); > > int lttng_probe_register(struct lttng_probe_desc *desc); > void lttng_probe_unregister(struct lttng_probe_desc *desc); > -- > 1.7.9.5 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com

From mathieu.desnoyers at efficios.com Mon Oct 1 12:15:41 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 12:15:41 -0400
Subject: [lttng-dev] [LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits
In-Reply-To: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com>
References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com>
Message-ID: <20121001161541.GB14260@Krystal>

* Julien Desfossez (jdesfossez at efficios.com) wrote:
> The current ABI does not work for compat 32/64 bits.
> This patch moves the current ABI as old-abi and provides a new ABI in
> which all the structures exchanged between user and kernel-space are
> packed. Also this new ABI moves the "int overwrite" member of the
> struct lttng_kernel_channel to remove the alignment added by the
> compiler.
>
> A patch for lttng-modules has been developed in parallel to this one
> to support the new ABI. These 2 patches have been tested in all
> possible configurations (applied or not) on 64-bit and 32-bit kernels
> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit.
>
> Here are the results of the tests :
> k 64 compat | u 32 compat | OK
> k 64 compat | u 64 compat | OK
> k 64 compat | u 32 non-compat | KO
> k 64 compat | u 64 non-compat | OK
>
> k 64 non-compat | u 64 compat | OK
> k 64 non-compat | u 32 compat | KO
> k 64 non-compat | u 64 non-compat | OK
> k 64 non-compat | u 32 non-compat | KO
>
> k 32 compat | u compat | OK
> k 32 compat | u non-compat | OK
>
> k 32 non-compat | u compat | OK
> k 32 non-compat | u non-compat | OK
>
> The results are as expected :
> - on 32-bit user-space and kernel, every configuration works.
> - on 64-bit user-space and kernel, every configuration works.
> - with 32-bit user-space on a 64-bit kernel the only configuration
> where it works is when the compat patch is applied everywhere.
>
> Signed-off-by: Julien Desfossez
> ---
> src/bin/lttng-sessiond/trace-kernel.h | 1 +
> src/common/kernel-ctl/kernel-ctl.c | 96 ++++++++++++++++++++++++---
> src/common/kernel-ctl/kernel-ctl.h | 1 +
> src/common/kernel-ctl/kernel-ioctl.h | 74 +++++++++++++++------
> src/common/lttng-kernel-old.h | 117 +++++++++++++++++++++++++++++++++
> src/common/lttng-kernel.h | 31 ++++++---
> 6 files changed, 281 insertions(+), 39 deletions(-)
> create mode 100644 src/common/lttng-kernel-old.h
>
> diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h
> index f04d9e7..c86cc27 100644
> --- a/src/bin/lttng-sessiond/trace-kernel.h
> +++ b/src/bin/lttng-sessiond/trace-kernel.h
> @@ -22,6 +22,7 @@
>
> #include
> #include
> +#include
>
> #include "consumer.h"
>
> diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c
> index 1396cd9..2ac2d53 100644
> --- a/src/common/kernel-ctl/kernel-ctl.c
> +++ b/src/common/kernel-ctl/kernel-ctl.c
> @@ -18,38 +18,100 @@
>
> #define __USE_LINUX_IOCTL_DEFS
> #include
> +#include
>
> #include "kernel-ctl.h"
> #include "kernel-ioctl.h"
>
> +/*
> + * This flag indicates if lttng-tools must use the new or the old kernel ABI
> + * (without compat support for 32/64 bits). It is set by
> + * kernctl_tracer_version()

Hrm. I don't like that this is only set by kernctl_tracer_version(). What
if, for an unforeseen reason (e.g. code change), sessiond starts using
"create session" before checking the version ? The change you propose
assumes a behavior on the sessiond side that is not cast in stone (no
ABI requires it), so it might change. This brings coupling between this
otherwise self-contained wrapper and the entire sessiond code base,
which I don't like.

We should put the check in a wrapper macro around every new ioctl
call.

e.g.

/*
 * Cache whether we need to use the old or new ABI.
 */
static int lttng_kernel_use_old_abi = -1;

if (lttng_kernel_use_old_abi == -1) {
	ret = ioctl(fd, newname, args);
	if (!ret) {
		lttng_kernel_use_old_abi = 0;
	}
} else {
	if (lttng_kernel_use_old_abi) {
		ret = ioctl(fd, oldname, args);
	} else {
		ret = ioctl(fd, newname, args);
	}
}
return ret;

Thoughts ?

Thanks,

Mathieu

> + */
> +static int lttng_kernel_use_old_abi;
> +
> int kernctl_create_session(int fd)
> {
> + if (lttng_kernel_use_old_abi)
> + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION);
> return ioctl(fd, LTTNG_KERNEL_SESSION);
> }
>
> /* open the metadata global channel */
> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops)
> {
> - return ioctl(fd, LTTNG_KERNEL_METADATA, chops);
> + struct lttng_kernel_old_channel old_channel;
> + struct lttng_kernel_channel channel;
> +
> + if (lttng_kernel_use_old_abi) {
> + old_channel.overwrite = chops->overwrite;
> + old_channel.subbuf_size = chops->subbuf_size;
> + old_channel.num_subbuf = chops->num_subbuf;
> + old_channel.switch_timer_interval = chops->switch_timer_interval;
> + old_channel.read_timer_interval = chops->read_timer_interval;
> + old_channel.output = chops->output;
> + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding));
> +
> + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel);
> + }
> +
> + channel.overwrite = chops->overwrite;
> + channel.subbuf_size = chops->subbuf_size;
> + channel.num_subbuf = chops->num_subbuf;
> + channel.switch_timer_interval = chops->switch_timer_interval;
> + channel.read_timer_interval = chops->read_timer_interval;
> + channel.output = chops->output;
> + memcpy(channel.padding, chops->padding, sizeof(chops->padding));
> +
> + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel);
> }
>
> int kernctl_create_channel(int fd, struct lttng_channel_attr *chops)
> {
> - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops);
> + struct lttng_kernel_old_channel old_channel;
> + struct lttng_kernel_channel channel;
> +
> + if (lttng_kernel_use_old_abi) {
> + 
old_channel.overwrite = chops->overwrite; > + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; > + old_channel.switch_timer_interval = chops->switch_timer_interval; > + old_channel.read_timer_interval = chops->read_timer_interval; > + old_channel.output = chops->output; > + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); > + > + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); > + } > + > + channel.overwrite = chops->overwrite; > + channel.subbuf_size = chops->subbuf_size; > + channel.num_subbuf = chops->num_subbuf; > + channel.switch_timer_interval = chops->switch_timer_interval; > + channel.read_timer_interval = chops->read_timer_interval; > + channel.output = chops->output; > + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); > + > + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); > } > > int kernctl_create_stream(int fd) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_STREAM); > return ioctl(fd, LTTNG_KERNEL_STREAM); > } > > int kernctl_create_event(int fd, struct lttng_kernel_event *ev) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, ev); > return ioctl(fd, LTTNG_KERNEL_EVENT, ev); > } > > int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, ctx); > return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); > } > > @@ -57,43 +119,64 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > /* Enable event, channel and session ioctl */ > int kernctl_enable(int fd) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_ENABLE); > return ioctl(fd, LTTNG_KERNEL_ENABLE); > } > > /* Disable event, channel and session ioctl */ > int kernctl_disable(int fd) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_DISABLE); > return ioctl(fd, LTTNG_KERNEL_DISABLE); > } > > int 
kernctl_start_session(int fd) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION_START); > return ioctl(fd, LTTNG_KERNEL_SESSION_START); > } > > int kernctl_stop_session(int fd) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION_STOP); > return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); > } > > > int kernctl_tracepoint_list(int fd) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST); > return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); > } > > int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) > { > - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + int ret; > + > + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + if (!ret) > + return ret; > + > + lttng_kernel_use_old_abi = 1; > + return ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, v); > } > > int kernctl_wait_quiescent(int fd) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT); > return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); > } > > int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) > { > + if (lttng_kernel_use_old_abi) > + return ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, calibrate); > return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > } > > @@ -193,10 +276,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) > { > return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); > } > - > -/* Get the offset of the stream_id in the packet header */ > -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) > -{ > - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); > - > -} > diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h > index 18712d9..85a3a18 100644 > --- a/src/common/kernel-ctl/kernel-ctl.h > +++ b/src/common/kernel-ctl/kernel-ctl.h > @@ -21,6 +21,7 @@ > > #include > #include > +#include > > int kernctl_create_session(int fd); > int kernctl_open_metadata(int fd, 
struct lttng_channel_attr *chops); > diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h > index 35942be..1d34222 100644 > --- a/src/common/kernel-ctl/kernel-ioctl.h > +++ b/src/common/kernel-ctl/kernel-ioctl.h > @@ -49,37 +49,69 @@ > /* map stream to stream id for network streaming */ > #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) > > +/* Old ABI (without support for 32/64 bits compat) */ > +/* LTTng file descriptor ioctl */ > +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) > +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ > + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) > +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_OLD_CALIBRATE \ > + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) > + > +/* Session FD ioctl */ > +#define LTTNG_KERNEL_OLD_METADATA \ > + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_CHANNEL \ > + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) > +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) > + > +/* Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) > +#define LTTNG_KERNEL_OLD_EVENT \ > + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) > +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ > + _IOR(0xF6, 0x62, unsigned long) > > +/* Event and Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_CONTEXT \ > + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) > + > +/* Event, Channel and Session ioctl */ > +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) > +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) > + > + > +/* New ABI (with support for 32/64 bits compat) */ > /* LTTng file descriptor ioctl */ > -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) > -#define LTTNG_KERNEL_TRACER_VERSION \ > - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) > -#define
LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) > -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) > +#define LTTNG_KERNEL_TRACER_VERSION \ > + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) > +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) > +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) > #define LTTNG_KERNEL_CALIBRATE \ > - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) > + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) > > /* Session FD ioctl */ > -#define LTTNG_KERNEL_METADATA \ > - _IOW(0xF6, 0x50, struct lttng_channel_attr) > -#define LTTNG_KERNEL_CHANNEL \ > - _IOW(0xF6, 0x51, struct lttng_channel_attr) > -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) > -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) > +#define LTTNG_KERNEL_METADATA \ > + _IOW(0xF6, 0x54, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_CHANNEL \ > + _IOW(0xF6, 0x55, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) > +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > > /* Channel FD ioctl */ > -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) > -#define LTTNG_KERNEL_EVENT \ > - _IOW(0xF6, 0x61, struct lttng_kernel_event) > -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ > - _IOR(0xF6, 0x62, unsigned long) > +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) > +#define LTTNG_KERNEL_EVENT \ > + _IOW(0xF6, 0x63, struct lttng_kernel_event) > > /* Event and Channel FD ioctl */ > -#define LTTNG_KERNEL_CONTEXT \ > - _IOW(0xF6, 0x70, struct lttng_kernel_context) > +#define LTTNG_KERNEL_CONTEXT \ > + _IOW(0xF6, 0x71, struct lttng_kernel_context) > > /* Event, Channel and Session ioctl */ > -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) > -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) > +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) > +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > > #endif /* _LTT_KERNEL_IOCTL_H */ > diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h > new file 
mode 100644 > index 0000000..0579751 > --- /dev/null > +++ b/src/common/lttng-kernel-old.h > @@ -0,0 +1,117 @@ > +/* > + * Copyright (C) 2011 - Julien Desfossez > + * Mathieu Desnoyers > + * David Goulet > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2 only, > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along > + * with this program; if not, write to the Free Software Foundation, Inc., > + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > + */ > + > +#ifndef _LTTNG_KERNEL_OLD_H > +#define _LTTNG_KERNEL_OLD_H > + > +#include > +#include > + > +#define LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 > + > +/* > + * LTTng DebugFS ABI structures. > + * > + * This is the kernel ABI copied from lttng-modules tree. > + */ > + > +/* Perf counter attributes */ > +struct lttng_kernel_old_perf_counter_ctx { > + uint32_t type; > + uint64_t config; > + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +/* Event/Channel context */ > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_context { > + enum lttng_kernel_context_type ctx; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; > + > + union { > + struct lttng_kernel_old_perf_counter_ctx perf_counter; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_kretprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +/* > + * Either addr is used, or symbol_name and offset. 
> + */ > +struct lttng_kernel_old_kprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +/* Function tracer */ > +struct lttng_kernel_old_function { > + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_event { > + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > + enum lttng_kernel_instrumentation instrumentation; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; > + > + /* Per instrumentation type configuration */ > + union { > + struct lttng_kernel_old_kretprobe kretprobe; > + struct lttng_kernel_old_kprobe kprobe; > + struct lttng_kernel_old_function ftrace; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_tracer_version { > + uint32_t major; > + uint32_t minor; > + uint32_t patchlevel; > +}; > + > +struct lttng_kernel_old_calibrate { > + enum lttng_kernel_calibrate_type type; /* type (input) */ > +}; > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_old_channel { > + int overwrite; /* 1: overwrite, 0: discard */ > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + enum lttng_event_output output; /* splice, mmap */ > + > + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; > +}; > + > +#endif /* _LTTNG_KERNEL_OLD_H */ > diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h > index dbeb6aa..ac881bf 100644 > --- a/src/common/lttng-kernel.h > +++ b/src/common/lttng-kernel.h > @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { > uint32_t type; > uint64_t config; > char name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Event/Channel context */ > 
#define LTTNG_KERNEL_CONTEXT_PADDING1 16 > @@ -72,14 +72,14 @@ struct lttng_kernel_context { > struct lttng_kernel_perf_counter_ctx perf_counter; > char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_kretprobe { > uint64_t addr; > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* > * Either addr is used, or symbol_name and offset. > @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Function tracer */ > struct lttng_kernel_function { > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > #define LTTNG_KERNEL_EVENT_PADDING1 16 > #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > @@ -110,13 +110,13 @@ struct lttng_kernel_event { > struct lttng_kernel_function ftrace; > char padding[LTTNG_KERNEL_EVENT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_tracer_version { > uint32_t major; > uint32_t minor; > uint32_t patchlevel; > -}; > +}__attribute__((packed)); > > enum lttng_kernel_calibrate_type { > LTTNG_KERNEL_CALIBRATE_KRETPROBE, > @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { > > struct lttng_kernel_calibrate { > enum lttng_kernel_calibrate_type type; /* type (input) */ > -}; > +}__attribute__((packed)); > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_channel { > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + int overwrite; /* 1: overwrite, 0: discard */ > + enum lttng_event_output output; /* splice, mmap */ > + > + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > +}__attribute__((packed)); > > #endif /* _LTTNG_KERNEL_H */ > -- > 1.7.9.5 > -- 
Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From jdesfossez at efficios.com Mon Oct 1 13:55:24 2012 From: jdesfossez at efficios.com (Julien Desfossez) Date: Mon, 01 Oct 2012 13:55:24 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits In-Reply-To: <20121001161541.GB14260@Krystal> References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com> <20121001161541.GB14260@Krystal> Message-ID: <5069D90C.1030804@efficios.com> On 01/10/12 12:15 PM, Mathieu Desnoyers wrote: > * Julien Desfossez (jdesfossez at efficios.com) wrote: >> The current ABI does not work for compat 32/64 bits. >> This patch moves the current ABI as old-abi and provides a new ABI in >> which all the structures exchanged between user and kernel-space are >> packed. Also this new ABI moves the "int overwrite" member of the >> struct lttng_kernel_channel to remove the alignment added by the >> compiler. >> >> A patch for lttng-modules has been developed in parallel to this one >> to support the new ABI. These 2 patches have been tested in all >> possible configurations (applied or not) on 64-bit and 32-bit kernels >> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit. >> >> Here are the results of the tests : >> k 64 compat | u 32 compat | OK >> k 64 compat | u 64 compat | OK >> k 64 compat | u 32 non-compat | KO >> k 64 compat | u 64 non-compat | OK >> >> k 64 non-compat | u 64 compat | OK >> k 64 non-compat | u 32 compat | KO >> k 64 non-compat | u 64 non-compat | OK >> k 64 non-compat | u 32 non-compat | KO >> >> k 32 compat | u compat | OK >> k 32 compat | u non-compat | OK >> >> k 32 non-compat | u compat | OK >> k 32 non-compat | u non-compat | OK >> >> The results are as expected : >> - on 32-bit user-space and kernel, every configuration works. >> - on 64-bit user-space and kernel, every configuration works. 
>> - with 32-bit user-space on a 64-bit kernel the only configuration >> where it works is when the compat patch is applied everywhere. >> >> Signed-off-by: Julien Desfossez >> --- >> src/bin/lttng-sessiond/trace-kernel.h | 1 + >> src/common/kernel-ctl/kernel-ctl.c | 96 ++++++++++++++++++++++++--- >> src/common/kernel-ctl/kernel-ctl.h | 1 + >> src/common/kernel-ctl/kernel-ioctl.h | 74 +++++++++++++++------ >> src/common/lttng-kernel-old.h | 117 +++++++++++++++++++++++++++++++++ >> src/common/lttng-kernel.h | 31 ++++++--- >> 6 files changed, 281 insertions(+), 39 deletions(-) >> create mode 100644 src/common/lttng-kernel-old.h >> >> diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h >> index f04d9e7..c86cc27 100644 >> --- a/src/bin/lttng-sessiond/trace-kernel.h >> +++ b/src/bin/lttng-sessiond/trace-kernel.h >> @@ -22,6 +22,7 @@ >> >> #include >> #include >> +#include >> >> #include "consumer.h" >> >> diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c >> index 1396cd9..2ac2d53 100644 >> --- a/src/common/kernel-ctl/kernel-ctl.c >> +++ b/src/common/kernel-ctl/kernel-ctl.c >> @@ -18,38 +18,100 @@ >> >> #define __USE_LINUX_IOCTL_DEFS >> #include >> +#include >> >> #include "kernel-ctl.h" >> #include "kernel-ioctl.h" >> >> +/* >> + * This flag indicates if lttng-tools must use the new or the old kernel ABI >> + * (without compat support for 32/64 bits). It is set by >> + * kernctl_tracer_version() > > Hrm. I don't like that this is only set by kernctl_tracer_version(). > What if, for an unforeseen reason (e.g. code change), sessiond starts > using "create session" before checking the version ? The change you > propose assumes a behavior on the sessiond side that is not cast in > stone (no ABI requires it), so it might change. This brings coupling > between this otherwise self-contained wrapper and the entire sessiond > code base, which I don't like. 
> > We should put the check in a wrapper macro around every new ioctl > call. > > e.g. > > /* > * Cache whether we need to use the old or new ABI. > */ > static int lttng_kernel_use_old_abi = -1; > > > if (lttng_kernel_use_old_abi == -1) { > ret = ioctl(fd, newname, args); > if (!ret) { > lttng_kernel_use_old_abi = 0; > } else { > lttng_kernel_use_old_abi = 1; > ret = ioctl(fd, oldname, args); > } > } else { > if (lttng_kernel_use_old_abi) { > ret = ioctl(fd, oldname, args); > } else { > ret = ioctl(fd, newname, args); > } > } > return ret; > > Thoughts ? OK, I will add this wrapper. I am wondering whether we could limit it to these ioctls, since they are the only ones called on the "top-level" fd (/proc/lttng), instead of adding it everywhere: LTTNG_KERNEL_SESSION LTTNG_KERNEL_TRACER_VERSION LTTNG_KERNEL_TRACEPOINT_LIST LTTNG_KERNEL_WAIT_QUIESCENT LTTNG_KERNEL_CALIBRATE All other ioctls require at least a session, so is it OK if I limit the test to only these? Thanks, Julien > > Thanks, > > Mathieu > > >> + */ >> +static int lttng_kernel_use_old_abi; >> + >> int kernctl_create_session(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION); >> return ioctl(fd, LTTNG_KERNEL_SESSION); >> } >> >> /* open the metadata global channel */ >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) >> { >> - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); >> + struct lttng_kernel_old_channel old_channel; >> + struct lttng_kernel_channel channel; >> + >> + if (lttng_kernel_use_old_abi) { >> + old_channel.overwrite = chops->overwrite; >> + old_channel.subbuf_size = chops->subbuf_size; >> + old_channel.num_subbuf = chops->num_subbuf; >> + old_channel.switch_timer_interval = chops->switch_timer_interval; >> + old_channel.read_timer_interval = chops->read_timer_interval; >> + old_channel.output = chops->output; >> + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); >> + } >> + >> + channel.overwrite = chops->overwrite; >>
+ channel.subbuf_size = chops->subbuf_size; >> + channel.num_subbuf = chops->num_subbuf; >> + channel.switch_timer_interval = chops->switch_timer_interval; >> + channel.read_timer_interval = chops->read_timer_interval; >> + channel.output = chops->output; >> + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); >> } >> >> int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) >> { >> - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); >> + struct lttng_kernel_old_channel old_channel; >> + struct lttng_kernel_channel channel; >> + >> + if (lttng_kernel_use_old_abi) { >> + old_channel.overwrite = chops->overwrite; >> + old_channel.subbuf_size = chops->subbuf_size; >> + old_channel.num_subbuf = chops->num_subbuf; >> + old_channel.switch_timer_interval = chops->switch_timer_interval; >> + old_channel.read_timer_interval = chops->read_timer_interval; >> + old_channel.output = chops->output; >> + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); >> + } >> + >> + channel.overwrite = chops->overwrite; >> + channel.subbuf_size = chops->subbuf_size; >> + channel.num_subbuf = chops->num_subbuf; >> + channel.switch_timer_interval = chops->switch_timer_interval; >> + channel.read_timer_interval = chops->read_timer_interval; >> + channel.output = chops->output; >> + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); >> } >> >> int kernctl_create_stream(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_STREAM); >> return ioctl(fd, LTTNG_KERNEL_STREAM); >> } >> >> int kernctl_create_event(int fd, struct lttng_kernel_event *ev) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, ev); >> return ioctl(fd, LTTNG_KERNEL_EVENT, ev); >> } >> >> int kernctl_add_context(int fd, 
struct lttng_kernel_context *ctx) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, ctx); >> return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); >> } >> >> @@ -57,43 +119,64 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) >> /* Enable event, channel and session ioctl */ >> int kernctl_enable(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_ENABLE); >> return ioctl(fd, LTTNG_KERNEL_ENABLE); >> } >> >> /* Disable event, channel and session ioctl */ >> int kernctl_disable(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_DISABLE); >> return ioctl(fd, LTTNG_KERNEL_DISABLE); >> } >> >> int kernctl_start_session(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION_START); >> return ioctl(fd, LTTNG_KERNEL_SESSION_START); >> } >> >> int kernctl_stop_session(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION_STOP); >> return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); >> } >> >> >> int kernctl_tracepoint_list(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST); >> return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); >> } >> >> int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) >> { >> - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); >> + int ret; >> + >> + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); >> + if (!ret) >> + return ret; >> + >> + lttng_kernel_use_old_abi = 1; >> + return ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, v); >> } >> >> int kernctl_wait_quiescent(int fd) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT); >> return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); >> } >> >> int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) >> { >> + if (lttng_kernel_use_old_abi) >> + return ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, 
calibrate); >> return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); >> } >> >> @@ -193,10 +276,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) >> { >> return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); >> } >> - >> -/* Get the offset of the stream_id in the packet header */ >> -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) >> -{ >> - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); >> - >> -} >> diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h >> index 18712d9..85a3a18 100644 >> --- a/src/common/kernel-ctl/kernel-ctl.h >> +++ b/src/common/kernel-ctl/kernel-ctl.h >> @@ -21,6 +21,7 @@ >> >> #include >> #include >> +#include >> >> int kernctl_create_session(int fd); >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); >> diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h >> index 35942be..1d34222 100644 >> --- a/src/common/kernel-ctl/kernel-ioctl.h >> +++ b/src/common/kernel-ctl/kernel-ioctl.h >> @@ -49,37 +49,69 @@ >> /* map stream to stream id for network streaming */ >> #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) >> >> +/* Old ABI (without support for 32/64 bits compat) */ >> +/* LTTng file descriptor ioctl */ >> +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) >> +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ >> + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) >> +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) >> +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) >> +#define LTTNG_KERNEL_OLD_CALIBRATE \ >> + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) >> + >> +/* Session FD ioctl */ >> +#define LTTNG_KERNEL_OLD_METADATA \ >> + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) >> +#define LTTNG_KERNEL_OLD_CHANNEL \ >> + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) >> +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) >> +#define LTTNG_KERNEL_OLD_SESSION_STOP 
_IO(0xF6, 0x53) >> + >> +/* Channel FD ioctl */ >> +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) >> +#define LTTNG_KERNEL_OLD_EVENT \ >> + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) >> +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ >> + _IOR(0xF6, 0x62, unsigned long) >> >> +/* Event and Channel FD ioctl */ >> +#define LTTNG_KERNEL_OLD_CONTEXT \ >> + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) >> + >> +/* Event, Channel and Session ioctl */ >> +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) >> +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) >> + >> + >> +/* New ABI (with support for 32/64 bits compat) */ >> /* LTTng file descriptor ioctl */ >> -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) >> -#define LTTNG_KERNEL_TRACER_VERSION \ >> - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) >> -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) >> -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) >> +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) >> +#define LTTNG_KERNEL_TRACER_VERSION \ >> + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) >> +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) >> +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) >> #define LTTNG_KERNEL_CALIBRATE \ >> - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) >> + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) >> >> /* Session FD ioctl */ >> -#define LTTNG_KERNEL_METADATA \ >> - _IOW(0xF6, 0x50, struct lttng_channel_attr) >> -#define LTTNG_KERNEL_CHANNEL \ >> - _IOW(0xF6, 0x51, struct lttng_channel_attr) >> -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) >> -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) >> +#define LTTNG_KERNEL_METADATA \ >> + _IOW(0xF6, 0x54, struct lttng_kernel_channel) >> +#define LTTNG_KERNEL_CHANNEL \ >> + _IOW(0xF6, 0x55, struct lttng_kernel_channel) >> +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) >> +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) >> >> /* Channel FD ioctl */ >> -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) >>
-#define LTTNG_KERNEL_EVENT \ >> - _IOW(0xF6, 0x61, struct lttng_kernel_event) >> -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ >> - _IOR(0xF6, 0x62, unsigned long) >> +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) >> +#define LTTNG_KERNEL_EVENT \ >> + _IOW(0xF6, 0x63, struct lttng_kernel_event) >> >> /* Event and Channel FD ioctl */ >> -#define LTTNG_KERNEL_CONTEXT \ >> - _IOW(0xF6, 0x70, struct lttng_kernel_context) >> +#define LTTNG_KERNEL_CONTEXT \ >> + _IOW(0xF6, 0x71, struct lttng_kernel_context) >> >> /* Event, Channel and Session ioctl */ >> -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) >> -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) >> +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) >> +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) >> >> #endif /* _LTT_KERNEL_IOCTL_H */ >> diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h >> new file mode 100644 >> index 0000000..0579751 >> --- /dev/null >> +++ b/src/common/lttng-kernel-old.h >> @@ -0,0 +1,117 @@ >> +/* >> + * Copyright (C) 2011 - Julien Desfossez >> + * Mathieu Desnoyers >> + * David Goulet >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License, version 2 only, >> + * as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope that it will be useful, but WITHOUT >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for >> + * more details. >> + * >> + * You should have received a copy of the GNU General Public License along >> + * with this program; if not, write to the Free Software Foundation, Inc., >> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
>> + */ >> + >> +#ifndef _LTTNG_KERNEL_OLD_H >> +#define _LTTNG_KERNEL_OLD_H >> + >> +#include >> +#include >> + >> +#define LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 >> + >> +/* >> + * LTTng DebugFS ABI structures. >> + * >> + * This is the kernel ABI copied from lttng-modules tree. >> + */ >> + >> +/* Perf counter attributes */ >> +struct lttng_kernel_old_perf_counter_ctx { >> + uint32_t type; >> + uint64_t config; >> + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; >> +}; >> + >> +/* Event/Channel context */ >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 >> +struct lttng_kernel_old_context { >> + enum lttng_kernel_context_type ctx; >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; >> + >> + union { >> + struct lttng_kernel_old_perf_counter_ctx perf_counter; >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; >> + } u; >> +}; >> + >> +struct lttng_kernel_old_kretprobe { >> + uint64_t addr; >> + >> + uint64_t offset; >> + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; >> +}; >> + >> +/* >> + * Either addr is used, or symbol_name and offset. 
>> + */ >> +struct lttng_kernel_old_kprobe { >> + uint64_t addr; >> + >> + uint64_t offset; >> + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; >> +}; >> + >> +/* Function tracer */ >> +struct lttng_kernel_old_function { >> + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; >> +}; >> + >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 >> +struct lttng_kernel_old_event { >> + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; >> + enum lttng_kernel_instrumentation instrumentation; >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; >> + >> + /* Per instrumentation type configuration */ >> + union { >> + struct lttng_kernel_old_kretprobe kretprobe; >> + struct lttng_kernel_old_kprobe kprobe; >> + struct lttng_kernel_old_function ftrace; >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; >> + } u; >> +}; >> + >> +struct lttng_kernel_old_tracer_version { >> + uint32_t major; >> + uint32_t minor; >> + uint32_t patchlevel; >> +}; >> + >> +struct lttng_kernel_old_calibrate { >> + enum lttng_kernel_calibrate_type type; /* type (input) */ >> +}; >> + >> +/* >> + * kernel channel >> + */ >> +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 >> +struct lttng_kernel_old_channel { >> + int overwrite; /* 1: overwrite, 0: discard */ >> + uint64_t subbuf_size; /* bytes */ >> + uint64_t num_subbuf; /* power of 2 */ >> + unsigned int switch_timer_interval; /* usec */ >> + unsigned int read_timer_interval; /* usec */ >> + enum lttng_event_output output; /* splice, mmap */ >> + >> + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; >> +}; >> + >> +#endif /* _LTTNG_KERNEL_OLD_H */ >> diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h >> index dbeb6aa..ac881bf 100644 >> --- a/src/common/lttng-kernel.h >> +++ b/src/common/lttng-kernel.h >> @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { >> uint32_t type; >> uint64_t config; >> char name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> 
+}__attribute__((packed)); >> >> /* Event/Channel context */ >> #define LTTNG_KERNEL_CONTEXT_PADDING1 16 >> @@ -72,14 +72,14 @@ struct lttng_kernel_context { >> struct lttng_kernel_perf_counter_ctx perf_counter; >> char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; >> } u; >> -}; >> +}__attribute__((packed)); >> >> struct lttng_kernel_kretprobe { >> uint64_t addr; >> >> uint64_t offset; >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> +}__attribute__((packed)); >> >> /* >> * Either addr is used, or symbol_name and offset. >> @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { >> >> uint64_t offset; >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> +}__attribute__((packed)); >> >> /* Function tracer */ >> struct lttng_kernel_function { >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> +}__attribute__((packed)); >> >> #define LTTNG_KERNEL_EVENT_PADDING1 16 >> #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 >> @@ -110,13 +110,13 @@ struct lttng_kernel_event { >> struct lttng_kernel_function ftrace; >> char padding[LTTNG_KERNEL_EVENT_PADDING2]; >> } u; >> -}; >> +}__attribute__((packed)); >> >> struct lttng_kernel_tracer_version { >> uint32_t major; >> uint32_t minor; >> uint32_t patchlevel; >> -}; >> +}__attribute__((packed)); >> >> enum lttng_kernel_calibrate_type { >> LTTNG_KERNEL_CALIBRATE_KRETPROBE, >> @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { >> >> struct lttng_kernel_calibrate { >> enum lttng_kernel_calibrate_type type; /* type (input) */ >> -}; >> +}__attribute__((packed)); >> + >> +/* >> + * kernel channel >> + */ >> +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 >> +struct lttng_kernel_channel { >> + uint64_t subbuf_size; /* bytes */ >> + uint64_t num_subbuf; /* power of 2 */ >> + unsigned int switch_timer_interval; /* usec */ >> + unsigned int read_timer_interval; /* usec */ >> + int overwrite; /* 1: overwrite, 0: discard */ >> + enum lttng_event_output output; /* splice, mmap */ >> + >> + 
char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; >> +}__attribute__((packed)); >> >> #endif /* _LTTNG_KERNEL_H */ >> -- >> 1.7.9.5 >> > From mathieu.desnoyers at efficios.com Mon Oct 1 14:14:52 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 1 Oct 2012 14:14:52 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits In-Reply-To: <5069D90C.1030804@efficios.com> References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com> <20121001161541.GB14260@Krystal> <5069D90C.1030804@efficios.com> Message-ID: <20121001181452.GA15644@Krystal> * Julien Desfossez (jdesfossez at efficios.com) wrote: > > > On 01/10/12 12:15 PM, Mathieu Desnoyers wrote: > > * Julien Desfossez (jdesfossez at efficios.com) wrote: > >> The current ABI does not work for compat 32/64 bits. > >> This patch moves the current ABI as old-abi and provides a new ABI in > >> which all the structures exchanged between user and kernel-space are > >> packed. Also this new ABI moves the "int overwrite" member of the > >> struct lttng_kernel_channel to remove the alignment added by the > >> compiler. > >> > >> A patch for lttng-modules has been developed in parallel to this one > >> to support the new ABI. These 2 patches have been tested in all > >> possible configurations (applied or not) on 64-bit and 32-bit kernels > >> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit. 
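
To make the alignment point in the description above concrete, here is a minimal sketch comparing the two channel layouts. The `demo_` names and `DEMO_PADDING` are hypothetical stand-ins (the patch itself uses `LTTNG_SYMBOL_NAME_LEN + 32` and `enum lttng_event_output`), and `__attribute__((packed))` assumes GCC or Clang. With natural alignment, the offset of `subbuf_size` in the old layout is typically 8 on x86-64 and 4 on 32-bit x86, which is the kind of mismatch a 32-bit user-space hits on a 64-bit kernel. In the new layout it is 0 everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PADDING 32	/* hypothetical; the patch uses LTTNG_SYMBOL_NAME_LEN + 32 */

/* Old layout: natural alignment, with "int overwrite" first. */
struct demo_old_channel {
	int overwrite;			/* 1: overwrite, 0: discard */
	uint64_t subbuf_size;		/* offset depends on the ABI's u64 alignment */
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int output;			/* stands in for enum lttng_event_output */
	char padding[DEMO_PADDING];
};

/* New layout: 64-bit members first, and the whole struct packed. */
struct demo_new_channel {
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int overwrite;
	int output;
	char padding[DEMO_PADDING];
} __attribute__((packed));

/* Print where the first 64-bit member lands and the total size of each layout. */
void demo_print_layout(void)
{
	/* The packed layout has no compiler-inserted holes. */
	assert(offsetof(struct demo_new_channel, subbuf_size) == 0);

	printf("old: subbuf_size at %zu, sizeof %zu\n",
		offsetof(struct demo_old_channel, subbuf_size),
		sizeof(struct demo_old_channel));
	printf("new: subbuf_size at %zu, sizeof %zu\n",
		offsetof(struct demo_new_channel, subbuf_size),
		sizeof(struct demo_new_channel));
}
```

Building this once with `-m32` and once with `-m64` and comparing the "old" line is a quick way to see the difference; the "new" line should be identical in both builds.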
> >> > >> Here are the results of the tests : > >> k 64 compat | u 32 compat | OK > >> k 64 compat | u 64 compat | OK > >> k 64 compat | u 32 non-compat | KO > >> k 64 compat | u 64 non-compat | OK > >> > >> k 64 non-compat | u 64 compat | OK > >> k 64 non-compat | u 32 compat | KO > >> k 64 non-compat | u 64 non-compat | OK > >> k 64 non-compat | u 32 non-compat | KO > >> > >> k 32 compat | u compat | OK > >> k 32 compat | u non-compat | OK > >> > >> k 32 non-compat | u compat | OK > >> k 32 non-compat | u non-compat | OK > >> > >> The results are as expected : > >> - on 32-bit user-space and kernel, every configuration works. > >> - on 64-bit user-space and kernel, every configuration works. > >> - with 32-bit user-space on a 64-bit kernel the only configuration > >> where it works is when the compat patch is applied everywhere. > >> > >> Signed-off-by: Julien Desfossez > >> --- > >> src/bin/lttng-sessiond/trace-kernel.h | 1 + > >> src/common/kernel-ctl/kernel-ctl.c | 96 ++++++++++++++++++++++++--- > >> src/common/kernel-ctl/kernel-ctl.h | 1 + > >> src/common/kernel-ctl/kernel-ioctl.h | 74 +++++++++++++++------ > >> src/common/lttng-kernel-old.h | 117 +++++++++++++++++++++++++++++++++ > >> src/common/lttng-kernel.h | 31 ++++++--- > >> 6 files changed, 281 insertions(+), 39 deletions(-) > >> create mode 100644 src/common/lttng-kernel-old.h > >> > >> diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h > >> index f04d9e7..c86cc27 100644 > >> --- a/src/bin/lttng-sessiond/trace-kernel.h > >> +++ b/src/bin/lttng-sessiond/trace-kernel.h > >> @@ -22,6 +22,7 @@ > >> > >> #include > >> #include > >> +#include > >> > >> #include "consumer.h" > >> > >> diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c > >> index 1396cd9..2ac2d53 100644 > >> --- a/src/common/kernel-ctl/kernel-ctl.c > >> +++ b/src/common/kernel-ctl/kernel-ctl.c > >> @@ -18,38 +18,100 @@ > >> > >> #define __USE_LINUX_IOCTL_DEFS > >> 
#include
> >> +#include
> >>
> >> #include "kernel-ctl.h"
> >> #include "kernel-ioctl.h"
> >>
> >> +/*
> >> + * This flag indicates if lttng-tools must use the new or the old kernel ABI
> >> + * (without compat support for 32/64 bits). It is set by
> >> + * kernctl_tracer_version()
> >
> > Hrm. I don't like that this is only set by kernctl_tracer_version().
> > What if, for an unforeseen reason (e.g. code change), sessiond starts
> > using "create session" before checking the version? The change you
> > propose assumes a behavior on the sessiond side that is not cast in
> > stone (no ABI requires it), so it might change. This brings coupling
> > between this otherwise self-contained wrapper and the entire sessiond
> > code base, which I don't like.
> >
> > We should put the check in a wrapper macro around every new ioctl
> > call.
> >
> > e.g.
> >
> > /*
> >  * Cache whether we need to use the old or new ABI.
> >  */
> > static int lttng_kernel_use_old_abi = -1;
> >
> >
> > if (lttng_kernel_use_old_abi == -1) {
> > 	ret = ioctl(fd, newname, args);
> > 	if (!ret) {
> > 		lttng_kernel_use_old_abi = 0;
> > 	}
> > } else {
> > 	if (lttng_kernel_use_old_abi) {
> > 		ret = ioctl(fd, oldname, args);
> > 	} else {
> > 		ret = ioctl(fd, newname, args);
> > 	}
> > }
> > return ret;
> >
> > Thoughts?
>
> OK, I will add this wrapper. I am wondering if we could just limit it to
> these ioctls, since they are the only ones called on the "top-level" fd
> (/proc/lttng), instead of adding it everywhere:
> LTTNG_KERNEL_SESSION
> LTTNG_KERNEL_TRACER_VERSION
> LTTNG_KERNEL_TRACEPOINT_LIST
> LTTNG_KERNEL_WAIT_QUIESCENT
> LTTNG_KERNEL_CALIBRATE
>
> All other ioctls require at least a session, so is it OK if I limit the
> test to only these?

Yes, it makes sense. And given that this code is very much localized, I
don't think it would be worth the effort to create a macro wrapper.
Duplicating the checks at each of the 5 sites, all within the same file,
seems good enough. But it's David's call.

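
For reference, the cached-probe idea discussed above could be sketched roughly as follows. This is not the code that was merged, only an illustration under stated assumptions: the `demo_` names and command numbers are hypothetical, the real ioctl(2) is replaced by a function pointer so the sketch runs without lttng-modules, any negative return is treated as failure (as I understand it, some of these ioctls return a file descriptor rather than 0), and the fallback to the old command when the first probe fails is my addition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers, standing in for the _IO()/_IOR() values. */
#define DEMO_OLD_SESSION 0x40UL
#define DEMO_NEW_SESSION 0x45UL

/* Stand-in for ioctl(2), so the sketch runs without lttng-modules loaded. */
typedef int (*demo_ioctl_fn)(int fd, unsigned long cmd, void *arg);

/* Cached ABI choice. -1: not probed yet, 0: new ABI, 1: old ABI. */
int demo_use_old_abi = -1;

/* Last command seen by the fake kernels below. */
unsigned long demo_last_cmd;

/* Issue either the old or the new command, probing once and caching the result. */
int demo_compat_ioctl(demo_ioctl_fn do_ioctl, int fd,
		unsigned long oldname, unsigned long newname, void *arg)
{
	int ret;

	assert(do_ioctl != NULL);

	if (demo_use_old_abi == -1) {
		/* First call: try the new ABI, then fall back to the old one. */
		ret = do_ioctl(fd, newname, arg);
		if (ret >= 0) {
			demo_use_old_abi = 0;
			return ret;
		}
		ret = do_ioctl(fd, oldname, arg);
		if (ret >= 0)
			demo_use_old_abi = 1;
		return ret;
	}
	return do_ioctl(fd, demo_use_old_abi ? oldname : newname, arg);
}

/* Fake kernel that only knows the old command number. */
int demo_old_kernel(int fd, unsigned long cmd, void *arg)
{
	(void) fd;
	(void) arg;
	demo_last_cmd = cmd;
	if (cmd != DEMO_OLD_SESSION) {
		errno = ENOTTY;
		return -1;
	}
	return 0;
}

/* Fake kernel that only knows the new command number. */
int demo_new_kernel(int fd, unsigned long cmd, void *arg)
{
	(void) fd;
	(void) arg;
	demo_last_cmd = cmd;
	if (cmd != DEMO_NEW_SESSION) {
		errno = ENOTTY;
		return -1;
	}
	return 0;
}
```

The point of keeping the probe inside the wrapper is that no caller has to invoke the version ioctl first; whichever top-level ioctl runs first settles the ABI for the rest of the process.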
Thanks, Mathieu > > Thanks, > > Julien > > > > > > Thanks, > > > > Mathieu > > > > > >> + */ > >> +static int lttng_kernel_use_old_abi; > >> + > >> int kernctl_create_session(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION); > >> return ioctl(fd, LTTNG_KERNEL_SESSION); > >> } > >> > >> /* open the metadata global channel */ > >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); > >> + struct lttng_kernel_old_channel old_channel; > >> + struct lttng_kernel_channel channel; > >> + > >> + if (lttng_kernel_use_old_abi) { > >> + old_channel.overwrite = chops->overwrite; > >> + old_channel.subbuf_size = chops->subbuf_size; > >> + old_channel.num_subbuf = chops->num_subbuf; > >> + old_channel.switch_timer_interval = chops->switch_timer_interval; > >> + old_channel.read_timer_interval = chops->read_timer_interval; > >> + old_channel.output = chops->output; > >> + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); > >> + } > >> + > >> + channel.overwrite = chops->overwrite; > >> + channel.subbuf_size = chops->subbuf_size; > >> + channel.num_subbuf = chops->num_subbuf; > >> + channel.switch_timer_interval = chops->switch_timer_interval; > >> + channel.read_timer_interval = chops->read_timer_interval; > >> + channel.output = chops->output; > >> + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); > >> } > >> > >> int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); > >> + struct lttng_kernel_old_channel old_channel; > >> + struct lttng_kernel_channel channel; > >> + > >> + if (lttng_kernel_use_old_abi) { > >> + old_channel.overwrite = chops->overwrite; > >> + old_channel.subbuf_size = 
chops->subbuf_size; > >> + old_channel.num_subbuf = chops->num_subbuf; > >> + old_channel.switch_timer_interval = chops->switch_timer_interval; > >> + old_channel.read_timer_interval = chops->read_timer_interval; > >> + old_channel.output = chops->output; > >> + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); > >> + } > >> + > >> + channel.overwrite = chops->overwrite; > >> + channel.subbuf_size = chops->subbuf_size; > >> + channel.num_subbuf = chops->num_subbuf; > >> + channel.switch_timer_interval = chops->switch_timer_interval; > >> + channel.read_timer_interval = chops->read_timer_interval; > >> + channel.output = chops->output; > >> + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); > >> } > >> > >> int kernctl_create_stream(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_STREAM); > >> return ioctl(fd, LTTNG_KERNEL_STREAM); > >> } > >> > >> int kernctl_create_event(int fd, struct lttng_kernel_event *ev) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, ev); > >> return ioctl(fd, LTTNG_KERNEL_EVENT, ev); > >> } > >> > >> int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, ctx); > >> return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); > >> } > >> > >> @@ -57,43 +119,64 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > >> /* Enable event, channel and session ioctl */ > >> int kernctl_enable(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_ENABLE); > >> return ioctl(fd, LTTNG_KERNEL_ENABLE); > >> } > >> > >> /* Disable event, channel and session ioctl */ > >> int kernctl_disable(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return 
ioctl(fd, LTTNG_KERNEL_OLD_DISABLE); > >> return ioctl(fd, LTTNG_KERNEL_DISABLE); > >> } > >> > >> int kernctl_start_session(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION_START); > >> return ioctl(fd, LTTNG_KERNEL_SESSION_START); > >> } > >> > >> int kernctl_stop_session(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_SESSION_STOP); > >> return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); > >> } > >> > >> > >> int kernctl_tracepoint_list(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST); > >> return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); > >> } > >> > >> int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > >> + int ret; > >> + > >> + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > >> + if (!ret) > >> + return ret; > >> + > >> + lttng_kernel_use_old_abi = 1; > >> + return ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, v); > >> } > >> > >> int kernctl_wait_quiescent(int fd) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT); > >> return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); > >> } > >> > >> int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) > >> { > >> + if (lttng_kernel_use_old_abi) > >> + return ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, calibrate); > >> return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > >> } > >> > >> @@ -193,10 +276,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) > >> { > >> return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); > >> } > >> - > >> -/* Get the offset of the stream_id in the packet header */ > >> -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) > >> -{ > >> - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); > >> - > >> -} > >> diff --git a/src/common/kernel-ctl/kernel-ctl.h 
b/src/common/kernel-ctl/kernel-ctl.h > >> index 18712d9..85a3a18 100644 > >> --- a/src/common/kernel-ctl/kernel-ctl.h > >> +++ b/src/common/kernel-ctl/kernel-ctl.h > >> @@ -21,6 +21,7 @@ > >> > >> #include > >> #include > >> +#include > >> > >> int kernctl_create_session(int fd); > >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); > >> diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h > >> index 35942be..1d34222 100644 > >> --- a/src/common/kernel-ctl/kernel-ioctl.h > >> +++ b/src/common/kernel-ctl/kernel-ioctl.h > >> @@ -49,37 +49,69 @@ > >> /* map stream to stream id for network streaming */ > >> #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) > >> > >> +/* Old ABI (without support for 32/64 bits compat) */ > >> +/* LTTng file descriptor ioctl */ > >> +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) > >> +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ > >> + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > >> +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) > >> +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) > >> +#define LTTNG_KERNEL_OLD_CALIBRATE \ > >> + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) > >> + > >> +/* Session FD ioctl */ > >> +#define LTTNG_KERNEL_OLD_METADATA \ > >> + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) > >> +#define LTTNG_KERNEL_OLD_CHANNEL \ > >> + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) > >> +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) > >> +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) > >> + > >> +/* Channel FD ioctl */ > >> +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) > >> +#define LTTNG_KERNEL_OLD_EVENT \ > >> + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) > >> +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ > >> + _IOR(0xF6, 0x62, unsigned long) > >> > >> +/* Event and Channel FD ioctl */ > >> +#define LTTNG_KERNEL_OLD_CONTEXT \ > >> + _IOW(0xF6, 0x70, struct 
lttng_kernel_old_context) > >> + > >> +/* Event, Channel and Session ioctl */ > >> +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) > >> +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) > >> + > >> + > >> +/* New ABI (with suport for 32/64 bits compat) */ > >> /* LTTng file descriptor ioctl */ > >> -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) > >> -#define LTTNG_KERNEL_TRACER_VERSION \ > >> - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) > >> -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) > >> -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) > >> +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) > >> +#define LTTNG_KERNEL_TRACER_VERSION \ > >> + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) > >> +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) > >> +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) > >> #define LTTNG_KERNEL_CALIBRATE \ > >> - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) > >> + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) > >> > >> /* Session FD ioctl */ > >> -#define LTTNG_KERNEL_METADATA \ > >> - _IOW(0xF6, 0x50, struct lttng_channel_attr) > >> -#define LTTNG_KERNEL_CHANNEL \ > >> - _IOW(0xF6, 0x51, struct lttng_channel_attr) > >> -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) > >> -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) > >> +#define LTTNG_KERNEL_METADATA \ > >> + _IOW(0xF6, 0x54, struct lttng_kernel_channel) > >> +#define LTTNG_KERNEL_CHANNEL \ > >> + _IOW(0xF6, 0x55, struct lttng_kernel_channel) > >> +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) > >> +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > >> > >> /* Channel FD ioctl */ > >> -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) > >> -#define LTTNG_KERNEL_EVENT \ > >> - _IOW(0xF6, 0x61, struct lttng_kernel_event) > >> -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ > >> - _IOR(0xF6, 0x62, unsigned long) > >> +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) > >> +#define LTTNG_KERNEL_EVENT \ > >> + _IOW(0xF6, 0x63, struct lttng_kernel_event) > 
>> > >> /* Event and Channel FD ioctl */ > >> -#define LTTNG_KERNEL_CONTEXT \ > >> - _IOW(0xF6, 0x70, struct lttng_kernel_context) > >> +#define LTTNG_KERNEL_CONTEXT \ > >> + _IOW(0xF6, 0x71, struct lttng_kernel_context) > >> > >> /* Event, Channel and Session ioctl */ > >> -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) > >> -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) > >> +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) > >> +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > >> > >> #endif /* _LTT_KERNEL_IOCTL_H */ > >> diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h > >> new file mode 100644 > >> index 0000000..0579751 > >> --- /dev/null > >> +++ b/src/common/lttng-kernel-old.h > >> @@ -0,0 +1,117 @@ > >> +/* > >> + * Copyright (C) 2011 - Julien Desfossez > >> + * Mathieu Desnoyers > >> + * David Goulet > >> + * > >> + * This program is free software; you can redistribute it and/or modify > >> + * it under the terms of the GNU General Public License, version 2 only, > >> + * as published by the Free Software Foundation. > >> + * > >> + * This program is distributed in the hope that it will be useful, but WITHOUT > >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > >> + * more details. > >> + * > >> + * You should have received a copy of the GNU General Public License along > >> + * with this program; if not, write to the Free Software Foundation, Inc., > >> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > >> + */ > >> + > >> +#ifndef _LTTNG_KERNEL_OLD_H > >> +#define _LTTNG_KERNEL_OLD_H > >> + > >> +#include > >> +#include > >> + > >> +#define LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 > >> + > >> +/* > >> + * LTTng DebugFS ABI structures. > >> + * > >> + * This is the kernel ABI copied from lttng-modules tree. 
> >> + */ > >> + > >> +/* Perf counter attributes */ > >> +struct lttng_kernel_old_perf_counter_ctx { > >> + uint32_t type; > >> + uint64_t config; > >> + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > >> +}; > >> + > >> +/* Event/Channel context */ > >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 > >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 > >> +struct lttng_kernel_old_context { > >> + enum lttng_kernel_context_type ctx; > >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; > >> + > >> + union { > >> + struct lttng_kernel_old_perf_counter_ctx perf_counter; > >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; > >> + } u; > >> +}; > >> + > >> +struct lttng_kernel_old_kretprobe { > >> + uint64_t addr; > >> + > >> + uint64_t offset; > >> + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > >> +}; > >> + > >> +/* > >> + * Either addr is used, or symbol_name and offset. > >> + */ > >> +struct lttng_kernel_old_kprobe { > >> + uint64_t addr; > >> + > >> + uint64_t offset; > >> + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > >> +}; > >> + > >> +/* Function tracer */ > >> +struct lttng_kernel_old_function { > >> + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > >> +}; > >> + > >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 > >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 > >> +struct lttng_kernel_old_event { > >> + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > >> + enum lttng_kernel_instrumentation instrumentation; > >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; > >> + > >> + /* Per instrumentation type configuration */ > >> + union { > >> + struct lttng_kernel_old_kretprobe kretprobe; > >> + struct lttng_kernel_old_kprobe kprobe; > >> + struct lttng_kernel_old_function ftrace; > >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; > >> + } u; > >> +}; > >> + > >> +struct lttng_kernel_old_tracer_version { > >> + uint32_t major; > >> + uint32_t minor; > >> + uint32_t patchlevel; > >> +}; > 
>> + > >> +struct lttng_kernel_old_calibrate { > >> + enum lttng_kernel_calibrate_type type; /* type (input) */ > >> +}; > >> + > >> +/* > >> + * kernel channel > >> + */ > >> +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > >> +struct lttng_kernel_old_channel { > >> + int overwrite; /* 1: overwrite, 0: discard */ > >> + uint64_t subbuf_size; /* bytes */ > >> + uint64_t num_subbuf; /* power of 2 */ > >> + unsigned int switch_timer_interval; /* usec */ > >> + unsigned int read_timer_interval; /* usec */ > >> + enum lttng_event_output output; /* splice, mmap */ > >> + > >> + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; > >> +}; > >> + > >> +#endif /* _LTTNG_KERNEL_OLD_H */ > >> diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h > >> index dbeb6aa..ac881bf 100644 > >> --- a/src/common/lttng-kernel.h > >> +++ b/src/common/lttng-kernel.h > >> @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { > >> uint32_t type; > >> uint64_t config; > >> char name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> /* Event/Channel context */ > >> #define LTTNG_KERNEL_CONTEXT_PADDING1 16 > >> @@ -72,14 +72,14 @@ struct lttng_kernel_context { > >> struct lttng_kernel_perf_counter_ctx perf_counter; > >> char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; > >> } u; > >> -}; > >> +}__attribute__((packed)); > >> > >> struct lttng_kernel_kretprobe { > >> uint64_t addr; > >> > >> uint64_t offset; > >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> /* > >> * Either addr is used, or symbol_name and offset. 
> >> @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { > >> > >> uint64_t offset; > >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> /* Function tracer */ > >> struct lttng_kernel_function { > >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> #define LTTNG_KERNEL_EVENT_PADDING1 16 > >> #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > >> @@ -110,13 +110,13 @@ struct lttng_kernel_event { > >> struct lttng_kernel_function ftrace; > >> char padding[LTTNG_KERNEL_EVENT_PADDING2]; > >> } u; > >> -}; > >> +}__attribute__((packed)); > >> > >> struct lttng_kernel_tracer_version { > >> uint32_t major; > >> uint32_t minor; > >> uint32_t patchlevel; > >> -}; > >> +}__attribute__((packed)); > >> > >> enum lttng_kernel_calibrate_type { > >> LTTNG_KERNEL_CALIBRATE_KRETPROBE, > >> @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { > >> > >> struct lttng_kernel_calibrate { > >> enum lttng_kernel_calibrate_type type; /* type (input) */ > >> -}; > >> +}__attribute__((packed)); > >> + > >> +/* > >> + * kernel channel > >> + */ > >> +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > >> +struct lttng_kernel_channel { > >> + uint64_t subbuf_size; /* bytes */ > >> + uint64_t num_subbuf; /* power of 2 */ > >> + unsigned int switch_timer_interval; /* usec */ > >> + unsigned int read_timer_interval; /* usec */ > >> + int overwrite; /* 1: overwrite, 0: discard */ > >> + enum lttng_event_output output; /* splice, mmap */ > >> + > >> + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > >> +}__attribute__((packed)); > >> > >> #endif /* _LTTNG_KERNEL_H */ > >> -- > >> 1.7.9.5 > >> > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From dgoulet at efficios.com Mon Oct 1 14:23:11 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 01 Oct 2012 14:23:11 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits In-Reply-To: <20121001181452.GA15644@Krystal> References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com> <20121001161541.GB14260@Krystal> <5069D90C.1030804@efficios.com> <20121001181452.GA15644@Krystal> Message-ID: <5069DF8F.9050706@efficios.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Mathieu Desnoyers: > * Julien Desfossez (jdesfossez at efficios.com) wrote: >> >> >> On 01/10/12 12:15 PM, Mathieu Desnoyers wrote: >>> * Julien Desfossez (jdesfossez at efficios.com) wrote: >>>> The current ABI does not work for compat 32/64 bits. This >>>> patch moves the current ABI as old-abi and provides a new ABI >>>> in which all the structures exchanged between user and >>>> kernel-space are packed. Also this new ABI moves the "int >>>> overwrite" member of the struct lttng_kernel_channel to >>>> remove the alignment added by the compiler. >>>> >>>> A patch for lttng-modules has been developed in parallel to >>>> this one to support the new ABI. These 2 patches have been >>>> tested in all possible configurations (applied or not) on >>>> 64-bit and 32-bit kernels (with CONFIG_COMPAT) and a >>>> user-space in 32 and 64-bit. 
>>>> >>>> Here are the results of the tests : k 64 compat | u 32 >>>> compat | OK k 64 compat | u 64 compat | OK k 64 >>>> compat | u 32 non-compat | KO k 64 compat | u 64 >>>> non-compat | OK >>>> >>>> k 64 non-compat | u 64 compat | OK k 64 non-compat | u 32 >>>> compat | KO k 64 non-compat | u 64 non-compat | OK k 64 >>>> non-compat | u 32 non-compat | KO >>>> >>>> k 32 compat | u compat | OK k 32 compat | u >>>> non-compat | OK >>>> >>>> k 32 non-compat | u compat | OK k 32 non-compat | u >>>> non-compat | OK >>>> >>>> The results are as expected : - on 32-bit user-space and >>>> kernel, every configuration works. - on 64-bit user-space and >>>> kernel, every configuration works. - with 32-bit user-space >>>> on a 64-bit kernel the only configuration where it works is >>>> when the compat patch is applied everywhere. >>>> >>>> Signed-off-by: Julien Desfossez >>>> --- src/bin/lttng-sessiond/trace-kernel.h | 1 + >>>> src/common/kernel-ctl/kernel-ctl.c | 96 >>>> ++++++++++++++++++++++++--- >>>> src/common/kernel-ctl/kernel-ctl.h | 1 + >>>> src/common/kernel-ctl/kernel-ioctl.h | 74 >>>> +++++++++++++++------ src/common/lttng-kernel-old.h | >>>> 117 +++++++++++++++++++++++++++++++++ >>>> src/common/lttng-kernel.h | 31 ++++++--- 6 >>>> files changed, 281 insertions(+), 39 deletions(-) create mode >>>> 100644 src/common/lttng-kernel-old.h >>>> >>>> diff --git a/src/bin/lttng-sessiond/trace-kernel.h >>>> b/src/bin/lttng-sessiond/trace-kernel.h index >>>> f04d9e7..c86cc27 100644 --- >>>> a/src/bin/lttng-sessiond/trace-kernel.h +++ >>>> b/src/bin/lttng-sessiond/trace-kernel.h @@ -22,6 +22,7 @@ >>>> >>>> #include #include >>>> +#include >>>> >>>> #include "consumer.h" >>>> >>>> diff --git a/src/common/kernel-ctl/kernel-ctl.c >>>> b/src/common/kernel-ctl/kernel-ctl.c index 1396cd9..2ac2d53 >>>> 100644 --- a/src/common/kernel-ctl/kernel-ctl.c +++ >>>> b/src/common/kernel-ctl/kernel-ctl.c @@ -18,38 +18,100 @@ >>>> >>>> #define __USE_LINUX_IOCTL_DEFS #include >>>> +#include 
>>>>
>>>> #include "kernel-ctl.h" #include "kernel-ioctl.h"
>>>>
>>>> +/* + * This flag indicates if lttng-tools must use the new
>>>> or the old kernel ABI + * (without compat support for 32/64
>>>> bits). It is set by + * kernctl_tracer_version()
>>>
>>> Hrm. I don't like that this is only set by
>>> kernctl_tracer_version(). What if, for an unforeseen reason
>>> (e.g. code change), sessiond starts using "create session"
>>> before checking the version? The change you propose assumes a
>>> behavior on the sessiond side that is not cast in stone (no ABI
>>> requires it), so it might change. This brings coupling between
>>> this otherwise self-contained wrapper and the entire sessiond
>>> code base, which I don't like.
>>>
>>> We should put the check in a wrapper macro around every new
>>> ioctl call.
>>>
>>> e.g.
>>>
>>> /*
>>>  * Cache whether we need to use the old or new ABI.
>>>  */
>>> static int lttng_kernel_use_old_abi = -1;
>>>
>>>
>>> if (lttng_kernel_use_old_abi == -1) {
>>> 	ret = ioctl(fd, newname, args);
>>> 	if (!ret) {
>>> 		lttng_kernel_use_old_abi = 0;
>>> 	}
>>> } else {
>>> 	if (lttng_kernel_use_old_abi) {
>>> 		ret = ioctl(fd, oldname, args);
>>> 	} else {
>>> 		ret = ioctl(fd, newname, args);
>>> 	}
>>> }
>>> return ret;
>>>
>>> Thoughts?
>>
>> OK, I will add this wrapper. I am wondering if we could just limit
>> it to these ioctls, since they are the only ones called on the
>> "top-level" fd (/proc/lttng), instead of adding it everywhere:
>> LTTNG_KERNEL_SESSION LTTNG_KERNEL_TRACER_VERSION
>> LTTNG_KERNEL_TRACEPOINT_LIST LTTNG_KERNEL_WAIT_QUIESCENT
>> LTTNG_KERNEL_CALIBRATE
>>
>> All other ioctls require at least a session, so is it OK if I limit
>> the test to only these?
>
> Yes, it makes sense. And given that this code is very much
> localized, I don't think it would be worth the effort to create a
> macro wrapper. Duplicating the checks at each of the 5 sites, all
> within the same file, seems good enough. But it's David's call.

I don't know...
I can easily see some other use cases in the future that requires us to wrap the ioctl according to the kernel. I remember having this discussion way back when we started lttng-tools code and here we are almost three years later fixing the "ioctl wrapper issue" :P. I don't think adding a macro is a lot of work and it will be really easier for us to scale and adapt over time. Please, if I'm wrong, speak now or shut up to the end of eternity! :P Cheers! David > > Thanks, > > Mathieu > >> >> Thanks, >> >> Julien >> >> >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>>> + */ +static int lttng_kernel_use_old_abi; + int >>>> kernctl_create_session(int fd) { + if >>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_SESSION); return ioctl(fd, >>>> LTTNG_KERNEL_SESSION); } >>>> >>>> /* open the metadata global channel */ int >>>> kernctl_open_metadata(int fd, struct lttng_channel_attr >>>> *chops) { - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); + >>>> struct lttng_kernel_old_channel old_channel; + struct >>>> lttng_kernel_channel channel; + + if >>>> (lttng_kernel_use_old_abi) { + old_channel.overwrite = >>>> chops->overwrite; + old_channel.subbuf_size = >>>> chops->subbuf_size; + old_channel.num_subbuf = >>>> chops->num_subbuf; + old_channel.switch_timer_interval = >>>> chops->switch_timer_interval; + >>>> old_channel.read_timer_interval = >>>> chops->read_timer_interval; + old_channel.output = >>>> chops->output; + memcpy(old_channel.padding, chops->padding, >>>> sizeof(chops->padding)); + + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_METADATA, &old_channel); + } + + >>>> channel.overwrite = chops->overwrite; + channel.subbuf_size = >>>> chops->subbuf_size; + channel.num_subbuf = >>>> chops->num_subbuf; + channel.switch_timer_interval = >>>> chops->switch_timer_interval; + channel.read_timer_interval = >>>> chops->read_timer_interval; + channel.output = >>>> chops->output; + memcpy(channel.padding, chops->padding, >>>> sizeof(chops->padding)); + + return 
ioctl(fd, >>>> LTTNG_KERNEL_METADATA, &channel); } >>>> >>>> int kernctl_create_channel(int fd, struct lttng_channel_attr >>>> *chops) { - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); + >>>> struct lttng_kernel_old_channel old_channel; + struct >>>> lttng_kernel_channel channel; + + if >>>> (lttng_kernel_use_old_abi) { + old_channel.overwrite = >>>> chops->overwrite; + old_channel.subbuf_size = >>>> chops->subbuf_size; + old_channel.num_subbuf = >>>> chops->num_subbuf; + old_channel.switch_timer_interval = >>>> chops->switch_timer_interval; + >>>> old_channel.read_timer_interval = >>>> chops->read_timer_interval; + old_channel.output = >>>> chops->output; + memcpy(old_channel.padding, chops->padding, >>>> sizeof(chops->padding)); + + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_CHANNEL, &old_channel); + } + + >>>> channel.overwrite = chops->overwrite; + channel.subbuf_size = >>>> chops->subbuf_size; + channel.num_subbuf = >>>> chops->num_subbuf; + channel.switch_timer_interval = >>>> chops->switch_timer_interval; + channel.read_timer_interval = >>>> chops->read_timer_interval; + channel.output = >>>> chops->output; + memcpy(channel.padding, chops->padding, >>>> sizeof(chops->padding)); + + return ioctl(fd, >>>> LTTNG_KERNEL_CHANNEL, &channel); } >>>> >>>> int kernctl_create_stream(int fd) { + if >>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_STREAM); return ioctl(fd, >>>> LTTNG_KERNEL_STREAM); } >>>> >>>> int kernctl_create_event(int fd, struct lttng_kernel_event >>>> *ev) { + if (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_EVENT, ev); return ioctl(fd, >>>> LTTNG_KERNEL_EVENT, ev); } >>>> >>>> int kernctl_add_context(int fd, struct lttng_kernel_context >>>> *ctx) { + if (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_CONTEXT, ctx); return ioctl(fd, >>>> LTTNG_KERNEL_CONTEXT, ctx); } >>>> >>>> @@ -57,43 +119,64 @@ int kernctl_add_context(int fd, struct >>>> lttng_kernel_context *ctx) /* Enable event, 
channel and >>>> session ioctl */ int kernctl_enable(int fd) { + if >>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_ENABLE); return ioctl(fd, >>>> LTTNG_KERNEL_ENABLE); } >>>> >>>> /* Disable event, channel and session ioctl */ int >>>> kernctl_disable(int fd) { + if (lttng_kernel_use_old_abi) + >>>> return ioctl(fd, LTTNG_KERNEL_OLD_DISABLE); return ioctl(fd, >>>> LTTNG_KERNEL_DISABLE); } >>>> >>>> int kernctl_start_session(int fd) { + if >>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_SESSION_START); return ioctl(fd, >>>> LTTNG_KERNEL_SESSION_START); } >>>> >>>> int kernctl_stop_session(int fd) { + if >>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_SESSION_STOP); return ioctl(fd, >>>> LTTNG_KERNEL_SESSION_STOP); } >>>> >>>> >>>> int kernctl_tracepoint_list(int fd) { + if >>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_TRACEPOINT_LIST); return ioctl(fd, >>>> LTTNG_KERNEL_TRACEPOINT_LIST); } >>>> >>>> int kernctl_tracer_version(int fd, struct >>>> lttng_kernel_tracer_version *v) { - return ioctl(fd, >>>> LTTNG_KERNEL_TRACER_VERSION, v); + int ret; + + ret = >>>> ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + if (!ret) + >>>> return ret; + + lttng_kernel_use_old_abi = 1; + return >>>> ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, v); } >>>> >>>> int kernctl_wait_quiescent(int fd) { + if >>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>> LTTNG_KERNEL_OLD_WAIT_QUIESCENT); return ioctl(fd, >>>> LTTNG_KERNEL_WAIT_QUIESCENT); } >>>> >>>> int kernctl_calibrate(int fd, struct lttng_kernel_calibrate >>>> *calibrate) { + if (lttng_kernel_use_old_abi) + return >>>> ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, calibrate); return >>>> ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); } >>>> >>>> @@ -193,10 +276,3 @@ int kernctl_set_stream_id(int fd, >>>> unsigned long *stream_id) { return ioctl(fd, >>>> RING_BUFFER_SET_STREAM_ID, stream_id); } - -/* Get the offset >>>> of the stream_id in 
the packet header */ -int >>>> kernctl_get_net_stream_id_offset(int fd, unsigned long >>>> *offset) -{ - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, >>>> offset); - -} diff --git a/src/common/kernel-ctl/kernel-ctl.h >>>> b/src/common/kernel-ctl/kernel-ctl.h index 18712d9..85a3a18 >>>> 100644 --- a/src/common/kernel-ctl/kernel-ctl.h +++ >>>> b/src/common/kernel-ctl/kernel-ctl.h @@ -21,6 +21,7 @@ >>>> >>>> #include #include >>>> +#include >>>> >>>> int kernctl_create_session(int fd); int >>>> kernctl_open_metadata(int fd, struct lttng_channel_attr >>>> *chops); diff --git a/src/common/kernel-ctl/kernel-ioctl.h >>>> b/src/common/kernel-ctl/kernel-ioctl.h index 35942be..1d34222 >>>> 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ >>>> b/src/common/kernel-ctl/kernel-ioctl.h @@ -49,37 +49,69 @@ /* >>>> map stream to stream id for network streaming */ #define >>>> RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned >>>> long) >>>> >>>> +/* Old ABI (without support for 32/64 bits compat) */ +/* >>>> LTTng file descriptor ioctl */ +#define >>>> LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) >>>> +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ + >>>> _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) >>>> +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, >>>> 0x42) +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT >>>> _IO(0xF6, 0x43) +#define LTTNG_KERNEL_OLD_CALIBRATE \ + >>>> _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) + +/* >>>> Session FD ioctl */ +#define LTTNG_KERNEL_OLD_METADATA >>>> \ + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) >>>> +#define LTTNG_KERNEL_OLD_CHANNEL \ + >>>> _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) +#define >>>> LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) >>>> +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, >>>> 0x53) + +/* Channel FD ioctl */ +#define >>>> LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) >>>> +#define LTTNG_KERNEL_OLD_EVENT \ + >>>> _IOW(0xF6, 0x61, struct lttng_kernel_old_event) +#define >>>> 
LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ + _IOR(0xF6, 0x62, >>>> unsigned long) >>>> >>>> +/* Event and Channel FD ioctl */ +#define >>>> LTTNG_KERNEL_OLD_CONTEXT \ + _IOW(0xF6, 0x70, >>>> struct lttng_kernel_old_context) + +/* Event, Channel and >>>> Session ioctl */ +#define LTTNG_KERNEL_OLD_ENABLE >>>> _IO(0xF6, 0x80) +#define LTTNG_KERNEL_OLD_DISABLE >>>> _IO(0xF6, 0x81) + + +/* New ABI (with suport for 32/64 bits >>>> compat) */ /* LTTng file descriptor ioctl */ -#define >>>> LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) -#define >>>> LTTNG_KERNEL_TRACER_VERSION \ - _IOR(0xF6, 0x41, >>>> struct lttng_kernel_tracer_version) -#define >>>> LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) -#define >>>> LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define >>>> LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) +#define >>>> LTTNG_KERNEL_TRACER_VERSION \ + _IOR(0xF6, 0x46, struct >>>> lttng_kernel_tracer_version) +#define >>>> LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) +#define >>>> LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) #define >>>> LTTNG_KERNEL_CALIBRATE \ - _IOWR(0xF6, 0x44, struct >>>> lttng_kernel_calibrate) + _IOWR(0xF6, 0x49, struct >>>> lttng_kernel_calibrate) >>>> >>>> /* Session FD ioctl */ -#define LTTNG_KERNEL_METADATA >>>> \ - _IOW(0xF6, 0x50, struct lttng_channel_attr) -#define >>>> LTTNG_KERNEL_CHANNEL \ - _IOW(0xF6, 0x51, >>>> struct lttng_channel_attr) -#define >>>> LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) -#define >>>> LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) +#define >>>> LTTNG_KERNEL_METADATA \ + _IOW(0xF6, 0x54, struct >>>> lttng_kernel_channel) +#define LTTNG_KERNEL_CHANNEL \ + >>>> _IOW(0xF6, 0x55, struct lttng_kernel_channel) +#define >>>> LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) +#define >>>> LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) >>>> >>>> /* Channel FD ioctl */ -#define LTTNG_KERNEL_STREAM >>>> _IO(0xF6, 0x60) -#define LTTNG_KERNEL_EVENT >>>> \ - _IOW(0xF6, 0x61, struct lttng_kernel_event) -#define >>>> LTTNG_KERNEL_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, 
>>>> unsigned long) +#define LTTNG_KERNEL_STREAM _IO(0xF6, >>>> 0x62) +#define LTTNG_KERNEL_EVENT \ + _IOW(0xF6, 0x63, >>>> struct lttng_kernel_event) >>>> >>>> /* Event and Channel FD ioctl */ -#define >>>> LTTNG_KERNEL_CONTEXT \ - _IOW(0xF6, 0x70, >>>> struct lttng_kernel_context) +#define LTTNG_KERNEL_CONTEXT >>>> \ + _IOW(0xF6, 0x71, struct lttng_kernel_context) >>>> >>>> /* Event, Channel and Session ioctl */ -#define >>>> LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) -#define >>>> LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) +#define >>>> LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) +#define >>>> LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) >>>> >>>> #endif /* _LTT_KERNEL_IOCTL_H */ diff --git >>>> a/src/common/lttng-kernel-old.h >>>> b/src/common/lttng-kernel-old.h new file mode 100644 index >>>> 0000000..0579751 --- /dev/null +++ >>>> b/src/common/lttng-kernel-old.h @@ -0,0 +1,117 @@ +/* + * >>>> Copyright (C) 2011 - Julien Desfossez >>>> + * >>>> Mathieu Desnoyers + * >>>> David Goulet + * + * This program >>>> is free software; you can redistribute it and/or modify + * >>>> it under the terms of the GNU General Public License, version >>>> 2 only, + * as published by the Free Software Foundation. + >>>> * + * This program is distributed in the hope that it will be >>>> useful, but WITHOUT + * ANY WARRANTY; without even the >>>> implied warranty of MERCHANTABILITY or + * FITNESS FOR A >>>> PARTICULAR PURPOSE. See the GNU General Public License for + >>>> * more details. + * + * You should have received a copy of >>>> the GNU General Public License along + * with this program; >>>> if not, write to the Free Software Foundation, Inc., + * 51 >>>> Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + >>>> */ + +#ifndef _LTTNG_KERNEL_OLD_H +#define >>>> _LTTNG_KERNEL_OLD_H + +#include +#include >>>> + +#define >>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 + +/* + * LTTng DebugFS >>>> ABI structures. + * + * This is the kernel ABI copied from >>>> lttng-modules tree. 
+ */ + +/* Perf counter attributes */ >>>> +struct lttng_kernel_old_perf_counter_ctx { + uint32_t type; >>>> + uint64_t config; + char >>>> name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* Event/Channel >>>> context */ +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 >>>> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 >>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct >>>> lttng_kernel_old_context { + enum lttng_kernel_context_type >>>> ctx; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; + + >>>> union { + struct lttng_kernel_old_perf_counter_ctx >>>> perf_counter; + char >>>> padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; + } u; +}; + >>>> +struct lttng_kernel_old_kretprobe { + uint64_t addr; + + >>>> uint64_t offset; + char >>>> symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* + * >>>> Either addr is used, or symbol_name and offset. + */ +struct >>>> lttng_kernel_old_kprobe { + uint64_t addr; + + uint64_t >>>> offset; + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; >>>> +}; + +/* Function tracer */ +struct >>>> lttng_kernel_old_function { + char >>>> symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +#define >>>> LTTNG_KERNEL_OLD_EVENT_PADDING1 16 +#define >>>> LTTNG_KERNEL_OLD_EVENT_PADDING2 >>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct >>>> lttng_kernel_old_event { + char >>>> name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; + enum >>>> lttng_kernel_instrumentation instrumentation; + char >>>> padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; + + /* Per >>>> instrumentation type configuration */ + union { + struct >>>> lttng_kernel_old_kretprobe kretprobe; + struct >>>> lttng_kernel_old_kprobe kprobe; + struct >>>> lttng_kernel_old_function ftrace; + char >>>> padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; + } u; +}; + >>>> +struct lttng_kernel_old_tracer_version { + uint32_t major; + >>>> uint32_t minor; + uint32_t patchlevel; +}; + +struct >>>> lttng_kernel_old_calibrate { + enum >>>> lttng_kernel_calibrate_type type; /* type (input) */ +}; + >>>> +/* + * kernel channel + */ +#define 
>>>> LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 >>>> +struct lttng_kernel_old_channel { + int overwrite; >>>> /* 1: overwrite, 0: discard */ + uint64_t subbuf_size; >>>> /* bytes */ + uint64_t num_subbuf; /* power of >>>> 2 */ + unsigned int switch_timer_interval; /* usec */ + >>>> unsigned int read_timer_interval; /* usec */ + enum >>>> lttng_event_output output; /* splice, mmap */ + + char >>>> padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; +}; + +#endif /* >>>> _LTTNG_KERNEL_OLD_H */ diff --git a/src/common/lttng-kernel.h >>>> b/src/common/lttng-kernel.h index dbeb6aa..ac881bf 100644 --- >>>> a/src/common/lttng-kernel.h +++ b/src/common/lttng-kernel.h >>>> @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { >>>> uint32_t type; uint64_t config; char >>>> name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>> +}__attribute__((packed)); >>>> >>>> /* Event/Channel context */ #define >>>> LTTNG_KERNEL_CONTEXT_PADDING1 16 @@ -72,14 +72,14 @@ struct >>>> lttng_kernel_context { struct lttng_kernel_perf_counter_ctx >>>> perf_counter; char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; } >>>> u; -}; +}__attribute__((packed)); >>>> >>>> struct lttng_kernel_kretprobe { uint64_t addr; >>>> >>>> uint64_t offset; char >>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>> +}__attribute__((packed)); >>>> >>>> /* * Either addr is used, or symbol_name and offset. 
@@ >>>> -89,12 +89,12 @@ struct lttng_kernel_kprobe { >>>> >>>> uint64_t offset; char >>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>> +}__attribute__((packed)); >>>> >>>> /* Function tracer */ struct lttng_kernel_function { char >>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>> +}__attribute__((packed)); >>>> >>>> #define LTTNG_KERNEL_EVENT_PADDING1 16 #define >>>> LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + >>>> 32 @@ -110,13 +110,13 @@ struct lttng_kernel_event { struct >>>> lttng_kernel_function ftrace; char >>>> padding[LTTNG_KERNEL_EVENT_PADDING2]; } u; -}; >>>> +}__attribute__((packed)); >>>> >>>> struct lttng_kernel_tracer_version { uint32_t major; uint32_t >>>> minor; uint32_t patchlevel; -}; +}__attribute__((packed)); >>>> >>>> enum lttng_kernel_calibrate_type { >>>> LTTNG_KERNEL_CALIBRATE_KRETPROBE, @@ -124,6 +124,21 @@ enum >>>> lttng_kernel_calibrate_type { >>>> >>>> struct lttng_kernel_calibrate { enum >>>> lttng_kernel_calibrate_type type; /* type (input) */ -}; >>>> +}__attribute__((packed)); + +/* + * kernel channel + */ >>>> +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN >>>> + 32 +struct lttng_kernel_channel { + uint64_t subbuf_size; >>>> /* bytes */ + uint64_t num_subbuf; /* power of >>>> 2 */ + unsigned int switch_timer_interval; /* usec */ + >>>> unsigned int read_timer_interval; /* usec */ + int >>>> overwrite; /* 1: overwrite, 0: discard >>>> */ + enum lttng_event_output output; /* splice, mmap */ >>>> + + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; >>>> +}__attribute__((packed)); >>>> >>>> #endif /* _LTTNG_KERNEL_H */ -- 1.7.9.5 >>>> >>> > -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJQad+MAAoJEELoaioR9I02MGUIAIc8TdQlR0fJ4pHPk1tL+0Q/ X3NOPcwgLbPU0/8yr4JjR8F5g5FashnekzyVkErbv0G88RMaOY0bc22Taf2NgkNq an+dT+WjkUCfNrYBdetAxZUo3DkKoUhwyJd5+A7a30DDLgbu86avA3cAHIg69N8j GD1K5WQ4mFgfJRIEZEz0mXWMW2k3mIP7Cma5e/iY7/MtazE+2IO5XVaGkp/UI8my c6rRX2B4P1OZQEB1vymU5PLPOhs3Ku5XKwKzO4m31IR6Kx2BzJ4LVUyVld0+r7QY 
hIByw6Yxz1lsh6N2oTnODMyc6kLsCqe1V8LOZY3hBl8M1iys4OFyiGUirNkSkTA=
=KY6y
-----END PGP SIGNATURE-----

From jdesfossez at efficios.com Mon Oct 1 14:39:14 2012
From: jdesfossez at efficios.com (Julien Desfossez)
Date: Mon, 01 Oct 2012 14:39:14 -0400
Subject: [lttng-dev] [LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits
In-Reply-To: <5069DF8F.9050706@efficios.com>
References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com> <20121001161541.GB14260@Krystal> <5069D90C.1030804@efficios.com> <20121001181452.GA15644@Krystal> <5069DF8F.9050706@efficios.com>
Message-ID: <5069E352.7040909@efficios.com>

On 01/10/12 02:23 PM, David Goulet wrote:
>
>
> Mathieu Desnoyers:
>> * Julien Desfossez (jdesfossez at efficios.com) wrote:
>>>
>>>
>>> On 01/10/12 12:15 PM, Mathieu Desnoyers wrote:
>>>> * Julien Desfossez (jdesfossez at efficios.com) wrote:
>>>>> The current ABI does not work for compat 32/64 bits. This
>>>>> patch moves the current ABI as old-abi and provides a new ABI
>>>>> in which all the structures exchanged between user and
>>>>> kernel-space are packed. Also this new ABI moves the "int
>>>>> overwrite" member of the struct lttng_kernel_channel to
>>>>> remove the alignment added by the compiler.
>>>>>
>>>>> A patch for lttng-modules has been developed in parallel to
>>>>> this one to support the new ABI. These 2 patches have been
>>>>> tested in all possible configurations (applied or not) on
>>>>> 64-bit and 32-bit kernels (with CONFIG_COMPAT) and a
>>>>> user-space in 32 and 64-bit.
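As a rough illustration of the layout issue described in the commit message (this is not part of the patch), the sketch below compares a naturally aligned structure that starts with a 32-bit member against a packed one with the 64-bit members first. The names old_channel_sketch and new_channel_sketch are hypothetical, simplified stand-ins for the channel structures: fixed-width integers replace int and the enum, and the padding array is left out. It assumes a GCC-compatible compiler for __attribute__((packed)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-ins for the channel structures in the
 * patch (fixed-width integers replace int/enum, padding array omitted).
 */

/* Old-style layout: natural alignment, 32-bit member first. */
struct old_channel_sketch {
	int32_t overwrite;		/* 1: overwrite, 0: discard */
	uint64_t subbuf_size;		/* bytes */
	uint64_t num_subbuf;		/* power of 2 */
	uint32_t switch_timer_interval;	/* usec */
	uint32_t read_timer_interval;	/* usec */
	int32_t output;			/* splice, mmap */
};

/* New-style layout: 64-bit members first, structure packed. */
struct new_channel_sketch {
	uint64_t subbuf_size;		/* bytes */
	uint64_t num_subbuf;		/* power of 2 */
	uint32_t switch_timer_interval;	/* usec */
	uint32_t read_timer_interval;	/* usec */
	int32_t overwrite;		/* 1: overwrite, 0: discard */
	int32_t output;			/* splice, mmap */
} __attribute__((packed));

/* The packed layout has no hidden padding: 8 + 8 + 4 + 4 + 4 + 4 bytes. */
static_assert(sizeof(struct new_channel_sketch) == 32,
	      "packed sketch must not contain padding");

/*
 * Offset of subbuf_size in the old layout. It depends on how the ABI
 * aligns 64-bit integers inside a struct (typically 4 on i386 and 8 on
 * x86-64), which is why 32-bit and 64-bit builds can disagree.
 */
static inline size_t old_subbuf_size_offset(void)
{
	return offsetof(struct old_channel_sketch, subbuf_size);
}

/* Offset of overwrite in the new layout: 8 + 8 + 4 + 4 = 24. */
static inline size_t new_overwrite_offset(void)
{
	return offsetof(struct new_channel_sketch, overwrite);
}
```

With the 64-bit members placed first and the structure packed, both a 32-bit and a 64-bit build compute the same offsets, so the kernel and user-space agree on the layout regardless of word size.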
>>>>>
>>>>> Here are the results of the tests:
>>>>> k 64 compat     | u 32 compat     | OK
>>>>> k 64 compat     | u 64 compat     | OK
>>>>> k 64 compat     | u 32 non-compat | KO
>>>>> k 64 compat     | u 64 non-compat | OK
>>>>>
>>>>> k 64 non-compat | u 64 compat     | OK
>>>>> k 64 non-compat | u 32 compat     | KO
>>>>> k 64 non-compat | u 64 non-compat | OK
>>>>> k 64 non-compat | u 32 non-compat | KO
>>>>>
>>>>> k 32 compat     | u compat        | OK
>>>>> k 32 compat     | u non-compat    | OK
>>>>>
>>>>> k 32 non-compat | u compat        | OK
>>>>> k 32 non-compat | u non-compat    | OK
>>>>>
>>>>> The results are as expected:
>>>>> - on 32-bit user-space and kernel, every configuration works.
>>>>> - on 64-bit user-space and kernel, every configuration works.
>>>>> - with 32-bit user-space on a 64-bit kernel the only configuration
>>>>>   where it works is when the compat patch is applied everywhere.
>>>>>
>>>>> Signed-off-by: Julien Desfossez
>>>>> ---
>>>>>  src/bin/lttng-sessiond/trace-kernel.h |   1 +
>>>>>  src/common/kernel-ctl/kernel-ctl.c    |  96 ++++++++++++++++++++++++---
>>>>>  src/common/kernel-ctl/kernel-ctl.h    |   1 +
>>>>>  src/common/kernel-ctl/kernel-ioctl.h  |  74 +++++++++++++++------
>>>>>  src/common/lttng-kernel-old.h         | 117 +++++++++++++++++++++++++++++++++
>>>>>  src/common/lttng-kernel.h             |  31 ++++++---
>>>>>  6 files changed, 281 insertions(+), 39 deletions(-)
>>>>>  create mode 100644 src/common/lttng-kernel-old.h
>>>>>
>>>>> diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h
>>>>> index f04d9e7..c86cc27 100644
>>>>> --- a/src/bin/lttng-sessiond/trace-kernel.h
>>>>> +++ b/src/bin/lttng-sessiond/trace-kernel.h
>>>>> @@ -22,6 +22,7 @@
>>>>>
>>>>>  #include
>>>>>  #include
>>>>> +#include
>>>>>
>>>>>  #include "consumer.h"
>>>>>
>>>>> diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c
>>>>> index 1396cd9..2ac2d53 100644
>>>>> --- a/src/common/kernel-ctl/kernel-ctl.c
>>>>> +++ b/src/common/kernel-ctl/kernel-ctl.c
>>>>> @@ -18,38 +18,100 @@
>>>>>
>>>>>
#define __USE_LINUX_IOCTL_DEFS
>>>>>  #include
>>>>> +#include
>>>>>
>>>>>  #include "kernel-ctl.h"
>>>>>  #include "kernel-ioctl.h"
>>>>>
>>>>> +/*
>>>>> + * This flag indicates if lttng-tools must use the new or the old kernel ABI
>>>>> + * (without compat support for 32/64 bits). It is set by
>>>>> + * kernctl_tracer_version()
>>>>
>>>> Hrm. I don't like that this is only set by
>>>> kernctl_tracer_version(). What if, for an unforeseen reason
>>>> (e.g. code change), sessiond starts using "create session"
>>>> before checking the version ? The change you propose assumes a
>>>> behavior on the sessiond side that is not cast in stone (no ABI
>>>> requires it), so it might change. This brings coupling between
>>>> this otherwise self-contained wrapper and the entire sessiond
>>>> code base, which I don't like.
>>>>
>>>> We should put the check in a wrapper macro around every new
>>>> ioctl call.
>>>>
>>>> e.g.
>>>>
>>>> /*
>>>>  * Cache whether we need to use the old or new ABI.
>>>>  */
>>>> static int lttng_kernel_use_old_abi = -1;
>>>>
>>>>
>>>> if (lttng_kernel_use_old_abi == -1) {
>>>>         ret = ioctl(fd, newname, args);
>>>>         if (!ret) {
>>>>                 lttng_kernel_use_old_abi = 0;
>>>>         }
>>>> } else {
>>>>         if (lttng_kernel_use_old_abi) {
>>>>                 ret = ioctl(fd, oldname, args);
>>>>         } else {
>>>>                 ret = ioctl(fd, newname, args);
>>>>         }
>>>> }
>>>> return ret;
>>>>
>>>> Thoughts ?
>>>
>>> Ok I will add this wrapper, I am wondering if we could just limit
>>> to these ioctl since they are the only ones called on the
>>> "top-level" fd (/proc/lttng) instead of adding it everywhere :
>>> LTTNG_KERNEL_SESSION
>>> LTTNG_KERNEL_TRACER_VERSION
>>> LTTNG_KERNEL_TRACEPOINT_LIST
>>> LTTNG_KERNEL_WAIT_QUIESCENT
>>> LTTNG_KERNEL_CALIBRATE
>>>
>>> All other ioctl require at least a session so is it OK if I limit
>>> the test to only these ?
>
>> yes, it makes sense. And given that this code is very much
>> localized, I don't think it would be worth the effort to create a
>> macro wrapper. Duplicating the checks at each of the 5 sites, all within
>> the same file, seems good enough.
But it's David's call.
>
> I don't know... I can easily see some other use cases in the future
> that requires us to wrap the ioctl according to the kernel.
>
> I remember having this discussion way back when we started lttng-tools
> code and here we are almost three years later fixing the "ioctl
> wrapper issue" :P.
>
> I don't think adding a macro is a lot of work and it will be really
> easier for us to scale and adapt over time. Please, if I'm wrong,
> speak now or shut up to the end of eternity! :P

For this particular case, we need special treatment when the ioctl
takes an argument that depends on the ABI version (allocate the old
struct and assign its values from the new one), so we won't be able to
call a generic wrapper (unless we do this treatment all the time, which
I doubt we want).

For ioctls that don't take an argument, the wrapper is easy though.

So I think I should write a wrapper like
compat_ioctl_no_arg(int fd, unsigned long oldname, unsigned long newname)
and, for the ioctls that take an argument, make the checks locally.

Thoughts?

Thanks,

Julien

>
> Cheers!
> David > > >> Thanks, > >> Mathieu > >>> >>> Thanks, >>> >>> Julien >>> >>> >>>> >>>> Thanks, >>>> >>>> Mathieu >>>> >>>> >>>>> + */ +static int lttng_kernel_use_old_abi; + int >>>>> kernctl_create_session(int fd) { + if >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_SESSION); return ioctl(fd, >>>>> LTTNG_KERNEL_SESSION); } >>>>> >>>>> /* open the metadata global channel */ int >>>>> kernctl_open_metadata(int fd, struct lttng_channel_attr >>>>> *chops) { - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); + >>>>> struct lttng_kernel_old_channel old_channel; + struct >>>>> lttng_kernel_channel channel; + + if >>>>> (lttng_kernel_use_old_abi) { + old_channel.overwrite = >>>>> chops->overwrite; + old_channel.subbuf_size = >>>>> chops->subbuf_size; + old_channel.num_subbuf = >>>>> chops->num_subbuf; + old_channel.switch_timer_interval = >>>>> chops->switch_timer_interval; + >>>>> old_channel.read_timer_interval = >>>>> chops->read_timer_interval; + old_channel.output = >>>>> chops->output; + memcpy(old_channel.padding, chops->padding, >>>>> sizeof(chops->padding)); + + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_METADATA, &old_channel); + } + + >>>>> channel.overwrite = chops->overwrite; + channel.subbuf_size = >>>>> chops->subbuf_size; + channel.num_subbuf = >>>>> chops->num_subbuf; + channel.switch_timer_interval = >>>>> chops->switch_timer_interval; + channel.read_timer_interval = >>>>> chops->read_timer_interval; + channel.output = >>>>> chops->output; + memcpy(channel.padding, chops->padding, >>>>> sizeof(chops->padding)); + + return ioctl(fd, >>>>> LTTNG_KERNEL_METADATA, &channel); } >>>>> >>>>> int kernctl_create_channel(int fd, struct lttng_channel_attr >>>>> *chops) { - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); + >>>>> struct lttng_kernel_old_channel old_channel; + struct >>>>> lttng_kernel_channel channel; + + if >>>>> (lttng_kernel_use_old_abi) { + old_channel.overwrite = >>>>> chops->overwrite; + old_channel.subbuf_size = >>>>> 
chops->subbuf_size; + old_channel.num_subbuf = >>>>> chops->num_subbuf; + old_channel.switch_timer_interval = >>>>> chops->switch_timer_interval; + >>>>> old_channel.read_timer_interval = >>>>> chops->read_timer_interval; + old_channel.output = >>>>> chops->output; + memcpy(old_channel.padding, chops->padding, >>>>> sizeof(chops->padding)); + + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_CHANNEL, &old_channel); + } + + >>>>> channel.overwrite = chops->overwrite; + channel.subbuf_size = >>>>> chops->subbuf_size; + channel.num_subbuf = >>>>> chops->num_subbuf; + channel.switch_timer_interval = >>>>> chops->switch_timer_interval; + channel.read_timer_interval = >>>>> chops->read_timer_interval; + channel.output = >>>>> chops->output; + memcpy(channel.padding, chops->padding, >>>>> sizeof(chops->padding)); + + return ioctl(fd, >>>>> LTTNG_KERNEL_CHANNEL, &channel); } >>>>> >>>>> int kernctl_create_stream(int fd) { + if >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_STREAM); return ioctl(fd, >>>>> LTTNG_KERNEL_STREAM); } >>>>> >>>>> int kernctl_create_event(int fd, struct lttng_kernel_event >>>>> *ev) { + if (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_EVENT, ev); return ioctl(fd, >>>>> LTTNG_KERNEL_EVENT, ev); } >>>>> >>>>> int kernctl_add_context(int fd, struct lttng_kernel_context >>>>> *ctx) { + if (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_CONTEXT, ctx); return ioctl(fd, >>>>> LTTNG_KERNEL_CONTEXT, ctx); } >>>>> >>>>> @@ -57,43 +119,64 @@ int kernctl_add_context(int fd, struct >>>>> lttng_kernel_context *ctx) /* Enable event, channel and >>>>> session ioctl */ int kernctl_enable(int fd) { + if >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_ENABLE); return ioctl(fd, >>>>> LTTNG_KERNEL_ENABLE); } >>>>> >>>>> /* Disable event, channel and session ioctl */ int >>>>> kernctl_disable(int fd) { + if (lttng_kernel_use_old_abi) + >>>>> return ioctl(fd, 
LTTNG_KERNEL_OLD_DISABLE); return ioctl(fd, >>>>> LTTNG_KERNEL_DISABLE); } >>>>> >>>>> int kernctl_start_session(int fd) { + if >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_SESSION_START); return ioctl(fd, >>>>> LTTNG_KERNEL_SESSION_START); } >>>>> >>>>> int kernctl_stop_session(int fd) { + if >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_SESSION_STOP); return ioctl(fd, >>>>> LTTNG_KERNEL_SESSION_STOP); } >>>>> >>>>> >>>>> int kernctl_tracepoint_list(int fd) { + if >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_TRACEPOINT_LIST); return ioctl(fd, >>>>> LTTNG_KERNEL_TRACEPOINT_LIST); } >>>>> >>>>> int kernctl_tracer_version(int fd, struct >>>>> lttng_kernel_tracer_version *v) { - return ioctl(fd, >>>>> LTTNG_KERNEL_TRACER_VERSION, v); + int ret; + + ret = >>>>> ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + if (!ret) + >>>>> return ret; + + lttng_kernel_use_old_abi = 1; + return >>>>> ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, v); } >>>>> >>>>> int kernctl_wait_quiescent(int fd) { + if >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, >>>>> LTTNG_KERNEL_OLD_WAIT_QUIESCENT); return ioctl(fd, >>>>> LTTNG_KERNEL_WAIT_QUIESCENT); } >>>>> >>>>> int kernctl_calibrate(int fd, struct lttng_kernel_calibrate >>>>> *calibrate) { + if (lttng_kernel_use_old_abi) + return >>>>> ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, calibrate); return >>>>> ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); } >>>>> >>>>> @@ -193,10 +276,3 @@ int kernctl_set_stream_id(int fd, >>>>> unsigned long *stream_id) { return ioctl(fd, >>>>> RING_BUFFER_SET_STREAM_ID, stream_id); } - -/* Get the offset >>>>> of the stream_id in the packet header */ -int >>>>> kernctl_get_net_stream_id_offset(int fd, unsigned long >>>>> *offset) -{ - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, >>>>> offset); - -} diff --git a/src/common/kernel-ctl/kernel-ctl.h >>>>> b/src/common/kernel-ctl/kernel-ctl.h index 18712d9..85a3a18 >>>>> 100644 --- 
a/src/common/kernel-ctl/kernel-ctl.h +++ >>>>> b/src/common/kernel-ctl/kernel-ctl.h @@ -21,6 +21,7 @@ >>>>> >>>>> #include #include >>>>> +#include >>>>> >>>>> int kernctl_create_session(int fd); int >>>>> kernctl_open_metadata(int fd, struct lttng_channel_attr >>>>> *chops); diff --git a/src/common/kernel-ctl/kernel-ioctl.h >>>>> b/src/common/kernel-ctl/kernel-ioctl.h index 35942be..1d34222 >>>>> 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ >>>>> b/src/common/kernel-ctl/kernel-ioctl.h @@ -49,37 +49,69 @@ /* >>>>> map stream to stream id for network streaming */ #define >>>>> RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned >>>>> long) >>>>> >>>>> +/* Old ABI (without support for 32/64 bits compat) */ +/* >>>>> LTTng file descriptor ioctl */ +#define >>>>> LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) >>>>> +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ + >>>>> _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) >>>>> +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, >>>>> 0x42) +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT >>>>> _IO(0xF6, 0x43) +#define LTTNG_KERNEL_OLD_CALIBRATE \ + >>>>> _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) + +/* >>>>> Session FD ioctl */ +#define LTTNG_KERNEL_OLD_METADATA >>>>> \ + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) >>>>> +#define LTTNG_KERNEL_OLD_CHANNEL \ + >>>>> _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) +#define >>>>> LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) >>>>> +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, >>>>> 0x53) + +/* Channel FD ioctl */ +#define >>>>> LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) >>>>> +#define LTTNG_KERNEL_OLD_EVENT \ + >>>>> _IOW(0xF6, 0x61, struct lttng_kernel_old_event) +#define >>>>> LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ + _IOR(0xF6, 0x62, >>>>> unsigned long) >>>>> >>>>> +/* Event and Channel FD ioctl */ +#define >>>>> LTTNG_KERNEL_OLD_CONTEXT \ + _IOW(0xF6, 0x70, >>>>> struct lttng_kernel_old_context) + +/* Event, Channel and >>>>> Session ioctl */ +#define 
LTTNG_KERNEL_OLD_ENABLE >>>>> _IO(0xF6, 0x80) +#define LTTNG_KERNEL_OLD_DISABLE >>>>> _IO(0xF6, 0x81) + + +/* New ABI (with suport for 32/64 bits >>>>> compat) */ /* LTTng file descriptor ioctl */ -#define >>>>> LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) -#define >>>>> LTTNG_KERNEL_TRACER_VERSION \ - _IOR(0xF6, 0x41, >>>>> struct lttng_kernel_tracer_version) -#define >>>>> LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) -#define >>>>> LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define >>>>> LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) +#define >>>>> LTTNG_KERNEL_TRACER_VERSION \ + _IOR(0xF6, 0x46, struct >>>>> lttng_kernel_tracer_version) +#define >>>>> LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) +#define >>>>> LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) #define >>>>> LTTNG_KERNEL_CALIBRATE \ - _IOWR(0xF6, 0x44, struct >>>>> lttng_kernel_calibrate) + _IOWR(0xF6, 0x49, struct >>>>> lttng_kernel_calibrate) >>>>> >>>>> /* Session FD ioctl */ -#define LTTNG_KERNEL_METADATA >>>>> \ - _IOW(0xF6, 0x50, struct lttng_channel_attr) -#define >>>>> LTTNG_KERNEL_CHANNEL \ - _IOW(0xF6, 0x51, >>>>> struct lttng_channel_attr) -#define >>>>> LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) -#define >>>>> LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) +#define >>>>> LTTNG_KERNEL_METADATA \ + _IOW(0xF6, 0x54, struct >>>>> lttng_kernel_channel) +#define LTTNG_KERNEL_CHANNEL \ + >>>>> _IOW(0xF6, 0x55, struct lttng_kernel_channel) +#define >>>>> LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) +#define >>>>> LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) >>>>> >>>>> /* Channel FD ioctl */ -#define LTTNG_KERNEL_STREAM >>>>> _IO(0xF6, 0x60) -#define LTTNG_KERNEL_EVENT >>>>> \ - _IOW(0xF6, 0x61, struct lttng_kernel_event) -#define >>>>> LTTNG_KERNEL_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, >>>>> unsigned long) +#define LTTNG_KERNEL_STREAM _IO(0xF6, >>>>> 0x62) +#define LTTNG_KERNEL_EVENT \ + _IOW(0xF6, 0x63, >>>>> struct lttng_kernel_event) >>>>> >>>>> /* Event and Channel FD ioctl */ -#define >>>>> LTTNG_KERNEL_CONTEXT \ - 
_IOW(0xF6, 0x70, >>>>> struct lttng_kernel_context) +#define LTTNG_KERNEL_CONTEXT >>>>> \ + _IOW(0xF6, 0x71, struct lttng_kernel_context) >>>>> >>>>> /* Event, Channel and Session ioctl */ -#define >>>>> LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) -#define >>>>> LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) +#define >>>>> LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) +#define >>>>> LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) >>>>> >>>>> #endif /* _LTT_KERNEL_IOCTL_H */ diff --git >>>>> a/src/common/lttng-kernel-old.h >>>>> b/src/common/lttng-kernel-old.h new file mode 100644 index >>>>> 0000000..0579751 --- /dev/null +++ >>>>> b/src/common/lttng-kernel-old.h @@ -0,0 +1,117 @@ +/* + * >>>>> Copyright (C) 2011 - Julien Desfossez >>>>> + * >>>>> Mathieu Desnoyers + * >>>>> David Goulet + * + * This program >>>>> is free software; you can redistribute it and/or modify + * >>>>> it under the terms of the GNU General Public License, version >>>>> 2 only, + * as published by the Free Software Foundation. + >>>>> * + * This program is distributed in the hope that it will be >>>>> useful, but WITHOUT + * ANY WARRANTY; without even the >>>>> implied warranty of MERCHANTABILITY or + * FITNESS FOR A >>>>> PARTICULAR PURPOSE. See the GNU General Public License for + >>>>> * more details. + * + * You should have received a copy of >>>>> the GNU General Public License along + * with this program; >>>>> if not, write to the Free Software Foundation, Inc., + * 51 >>>>> Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + >>>>> */ + +#ifndef _LTTNG_KERNEL_OLD_H +#define >>>>> _LTTNG_KERNEL_OLD_H + +#include +#include >>>>> + +#define >>>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 + +/* + * LTTng DebugFS >>>>> ABI structures. + * + * This is the kernel ABI copied from >>>>> lttng-modules tree. 
+ */ + +/* Perf counter attributes */ >>>>> +struct lttng_kernel_old_perf_counter_ctx { + uint32_t type; >>>>> + uint64_t config; + char >>>>> name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* Event/Channel >>>>> context */ +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 >>>>> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 >>>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct >>>>> lttng_kernel_old_context { + enum lttng_kernel_context_type >>>>> ctx; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; + + >>>>> union { + struct lttng_kernel_old_perf_counter_ctx >>>>> perf_counter; + char >>>>> padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; + } u; +}; + >>>>> +struct lttng_kernel_old_kretprobe { + uint64_t addr; + + >>>>> uint64_t offset; + char >>>>> symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* + * >>>>> Either addr is used, or symbol_name and offset. + */ +struct >>>>> lttng_kernel_old_kprobe { + uint64_t addr; + + uint64_t >>>>> offset; + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; >>>>> +}; + +/* Function tracer */ +struct >>>>> lttng_kernel_old_function { + char >>>>> symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +#define >>>>> LTTNG_KERNEL_OLD_EVENT_PADDING1 16 +#define >>>>> LTTNG_KERNEL_OLD_EVENT_PADDING2 >>>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct >>>>> lttng_kernel_old_event { + char >>>>> name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; + enum >>>>> lttng_kernel_instrumentation instrumentation; + char >>>>> padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; + + /* Per >>>>> instrumentation type configuration */ + union { + struct >>>>> lttng_kernel_old_kretprobe kretprobe; + struct >>>>> lttng_kernel_old_kprobe kprobe; + struct >>>>> lttng_kernel_old_function ftrace; + char >>>>> padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; + } u; +}; + >>>>> +struct lttng_kernel_old_tracer_version { + uint32_t major; + >>>>> uint32_t minor; + uint32_t patchlevel; +}; + +struct >>>>> lttng_kernel_old_calibrate { + enum >>>>> lttng_kernel_calibrate_type type; /* type (input) */ +}; + >>>>> 
+/* + * kernel channel + */ +#define >>>>> LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 >>>>> +struct lttng_kernel_old_channel { + int overwrite; >>>>> /* 1: overwrite, 0: discard */ + uint64_t subbuf_size; >>>>> /* bytes */ + uint64_t num_subbuf; /* power of >>>>> 2 */ + unsigned int switch_timer_interval; /* usec */ + >>>>> unsigned int read_timer_interval; /* usec */ + enum >>>>> lttng_event_output output; /* splice, mmap */ + + char >>>>> padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; +}; + +#endif /* >>>>> _LTTNG_KERNEL_OLD_H */ diff --git a/src/common/lttng-kernel.h >>>>> b/src/common/lttng-kernel.h index dbeb6aa..ac881bf 100644 --- >>>>> a/src/common/lttng-kernel.h +++ b/src/common/lttng-kernel.h >>>>> @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { >>>>> uint32_t type; uint64_t config; char >>>>> name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>>> +}__attribute__((packed)); >>>>> >>>>> /* Event/Channel context */ #define >>>>> LTTNG_KERNEL_CONTEXT_PADDING1 16 @@ -72,14 +72,14 @@ struct >>>>> lttng_kernel_context { struct lttng_kernel_perf_counter_ctx >>>>> perf_counter; char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; } >>>>> u; -}; +}__attribute__((packed)); >>>>> >>>>> struct lttng_kernel_kretprobe { uint64_t addr; >>>>> >>>>> uint64_t offset; char >>>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>>> +}__attribute__((packed)); >>>>> >>>>> /* * Either addr is used, or symbol_name and offset. 
@@ >>>>> -89,12 +89,12 @@ struct lttng_kernel_kprobe { >>>>> >>>>> uint64_t offset; char >>>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>>> +}__attribute__((packed)); >>>>> >>>>> /* Function tracer */ struct lttng_kernel_function { char >>>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; >>>>> +}__attribute__((packed)); >>>>> >>>>> #define LTTNG_KERNEL_EVENT_PADDING1 16 #define >>>>> LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + >>>>> 32 @@ -110,13 +110,13 @@ struct lttng_kernel_event { struct >>>>> lttng_kernel_function ftrace; char >>>>> padding[LTTNG_KERNEL_EVENT_PADDING2]; } u; -}; >>>>> +}__attribute__((packed)); >>>>> >>>>> struct lttng_kernel_tracer_version { uint32_t major; uint32_t >>>>> minor; uint32_t patchlevel; -}; +}__attribute__((packed)); >>>>> >>>>> enum lttng_kernel_calibrate_type { >>>>> LTTNG_KERNEL_CALIBRATE_KRETPROBE, @@ -124,6 +124,21 @@ enum >>>>> lttng_kernel_calibrate_type { >>>>> >>>>> struct lttng_kernel_calibrate { enum >>>>> lttng_kernel_calibrate_type type; /* type (input) */ -}; >>>>> +}__attribute__((packed)); + +/* + * kernel channel + */ >>>>> +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN >>>>> + 32 +struct lttng_kernel_channel { + uint64_t subbuf_size; >>>>> /* bytes */ + uint64_t num_subbuf; /* power of >>>>> 2 */ + unsigned int switch_timer_interval; /* usec */ + >>>>> unsigned int read_timer_interval; /* usec */ + int >>>>> overwrite; /* 1: overwrite, 0: discard >>>>> */ + enum lttng_event_output output; /* splice, mmap */ >>>>> + + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; >>>>> +}__attribute__((packed)); >>>>> >>>>> #endif /* _LTTNG_KERNEL_H */ -- 1.7.9.5 >>>>> >>>> > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From dgoulet at efficios.com Mon Oct 1 14:43:41 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 01 Oct 2012 14:43:41 -0400 Subject: [lttng-dev] 
[LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits
In-Reply-To: <5069E352.7040909@efficios.com>
References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com>
 <20121001161541.GB14260@Krystal> <5069D90C.1030804@efficios.com>
 <20121001181452.GA15644@Krystal> <5069DF8F.9050706@efficios.com>
 <5069E352.7040909@efficios.com>
Message-ID: <5069E45D.4040303@efficios.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Julien Desfossez:
>>>>>>
>>>>>> +/*
>>>>>> + * This flag indicates if lttng-tools must use the new or the old kernel ABI
>>>>>> + * (without compat support for 32/64 bits). It is set by
>>>>>> + * kernctl_tracer_version()
>>>>>
>>>>> Hrm. I don't like that this is only set by
>>>>> kernctl_tracer_version(). What if, for an unforeseen
>>>>> reason (e.g. code change), sessiond starts using "create
>>>>> session" before checking the version ? The change you
>>>>> propose assumes a behavior on the sessiond side that is not
>>>>> cast in stone (no ABI requires it), so it might change.
>>>>> This brings coupling between this otherwise self-contained
>>>>> wrapper and the entire sessiond code base, which I don't
>>>>> like.
>>>>>
>>>>> We should put the check in a wrapper macro around every
>>>>> new ioctl call.
>>>>>
>>>>> e.g.
>>>>>
>>>>> /*
>>>>>  * Cache whether we need to use the old or new ABI.
>>>>>  * -1: not probed yet, 0: new ABI, 1: old ABI.
>>>>>  */
>>>>> static int lttng_kernel_use_old_abi = -1;
>>>>>
>>>>> if (lttng_kernel_use_old_abi == -1) {
>>>>>         ret = ioctl(fd, newname, args);
>>>>>         if (!ret) {
>>>>>                 lttng_kernel_use_old_abi = 0;
>>>>>         } else {
>>>>>                 /* New request failed: fall back to the old ABI. */
>>>>>                 lttng_kernel_use_old_abi = 1;
>>>>>                 ret = ioctl(fd, oldname, args);
>>>>>         }
>>>>> } else {
>>>>>         if (lttng_kernel_use_old_abi) {
>>>>>                 ret = ioctl(fd, oldname, args);
>>>>>         } else {
>>>>>                 ret = ioctl(fd, newname, args);
>>>>>         }
>>>>> }
>>>>> return ret;
>>>>>
>>>>> Thoughts ?
>>>>
>>>> Ok I will add this wrapper. I am wondering if we could just
>>>> limit it to these ioctls, since they are the only ones called on
>>>> the "top-level" fd (/proc/lttng), instead of adding it
>>>> everywhere:
>>>>   LTTNG_KERNEL_SESSION
>>>>   LTTNG_KERNEL_TRACER_VERSION
>>>>   LTTNG_KERNEL_TRACEPOINT_LIST
>>>>   LTTNG_KERNEL_WAIT_QUIESCENT
>>>>   LTTNG_KERNEL_CALIBRATE
>>>>
>>>> All other ioctls require at least a session, so is it OK if I
>>>> limit the test to only these ?
>>
>>> yes, it makes sense. And given that this code is very much
>>> localized, I don't think it would be worth the effort to create
>>> a macro wrapper. Duplicating the checks at each of the 5 sites,
>>> all within the same file, seems good enough. But it's David's
>>> call.
>>
>> I don't know... I can easily see some other use cases in the
>> future that require us to wrap the ioctl according to the
>> kernel.
>>
>> I remember having this discussion way back when we started
>> lttng-tools code and here we are almost three years later fixing
>> the "ioctl wrapper issue" :P.
>>
>> I don't think adding a macro is a lot of work and it will be
>> really easier for us to scale and adapt over time. Please, if I'm
>> wrong, speak now or shut up to the end of eternity! :P
>
> For this particular case, we need special treatment when
> the ioctl takes an argument depending on the ABI version (alloc and
> assign the old struct values from the new one), so we won't be able
> to call a generic wrapper (except if we do this treatment all the
> time, which I doubt we want). For ioctls that don't take an argument
> the wrapper is easy though.
>
> So I think I should do a wrapper like
>   compat_ioctl_no_arg(int fd, unsigned long oldname, unsigned long newname)
> and for the ioctls that take an argument make the checks locally.

Hmmm it's that or a compat call for each ioctl listed above... or
using va_list but I really don't like that :P

Either way works for me. Mathieu?

David

>
> Thoughts ?
>
> Thanks,
>
> Julien
>
>
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJQaeRaAAoJEELoaioR9I026m4IALHDYyU28segZJpauX6JbDTE
ij+T9aW7+bAxJRQfymg9kx6noo8t+at4rEsOO+4FvK30/8Tfvqy5H84dw72AtPo2
mbllr4mGAkqb9S3mfoCWSt38bK46cqKZf8ldJJs4B8sH1eWEMtVxUv7vgNnI4yZf
X3/2NoOUdFs1QUun8oGm2Evh82+4PAfJRemKybSIwi5a1GOoZQ7WGYViWsQch0mx
vEjsdQ1ays4zCU/EWGhSu4+WBx18C+sPaRNg1MB98fqyjWMph7XbbCysQptv7LIs
Es8wZkINul+HzKf8F47iapZO30QEGwawSxko+B0YyEbk6iXKzIiaPPbik3tgzdI=
=UfYI
-----END PGP SIGNATURE-----

From mathieu.desnoyers at efficios.com Mon Oct 1 14:43:41 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 1 Oct 2012 14:43:41 -0400
Subject: [lttng-dev] [LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits
In-Reply-To: <5069E352.7040909@efficios.com>
References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com>
 <20121001161541.GB14260@Krystal> <5069D90C.1030804@efficios.com>
 <20121001181452.GA15644@Krystal> <5069DF8F.9050706@efficios.com>
 <5069E352.7040909@efficios.com>
Message-ID: <20121001184341.GA16425@Krystal>

* Julien Desfossez (jdesfossez at efficios.com) wrote:
>
>
> On 01/10/12 02:23 PM, David Goulet wrote:
> >
> >
> > Mathieu Desnoyers:
> >> * Julien Desfossez (jdesfossez at efficios.com) wrote:
> >>>
> >>>
> >>> On 01/10/12 12:15 PM, Mathieu Desnoyers wrote:
> >>>> * Julien Desfossez (jdesfossez at efficios.com) wrote:
> >>>>> The current ABI does not work for compat 32/64 bits. This
> >>>>> patch moves the current ABI as old-abi and provides a new ABI
> >>>>> in which all the structures exchanged between user and
> >>>>> kernel-space are packed. Also this new ABI moves the "int
> >>>>> overwrite" member of the struct lttng_kernel_channel to
> >>>>> remove the alignment added by the compiler.
> >>>>>
> >>>>> A patch for lttng-modules has been developed in parallel to
> >>>>> this one to support the new ABI.
> >>>>> These 2 patches have been
> >>>>> tested in all possible configurations (applied or not) on
> >>>>> 64-bit and 32-bit kernels (with CONFIG_COMPAT) and a
> >>>>> user-space in 32 and 64-bit.
> >>>>>
> >>>>> Here are the results of the tests :
> >>>>>   k 64 compat     | u 32 compat     | OK
> >>>>>   k 64 compat     | u 64 compat     | OK
> >>>>>   k 64 compat     | u 32 non-compat | KO
> >>>>>   k 64 compat     | u 64 non-compat | OK
> >>>>>
> >>>>>   k 64 non-compat | u 64 compat     | OK
> >>>>>   k 64 non-compat | u 32 compat     | KO
> >>>>>   k 64 non-compat | u 64 non-compat | OK
> >>>>>   k 64 non-compat | u 32 non-compat | KO
> >>>>>
> >>>>>   k 32 compat     | u compat        | OK
> >>>>>   k 32 compat     | u non-compat    | OK
> >>>>>
> >>>>>   k 32 non-compat | u compat        | OK
> >>>>>   k 32 non-compat | u non-compat    | OK
> >>>>>
> >>>>> The results are as expected :
> >>>>> - on 32-bit user-space and kernel, every configuration works.
> >>>>> - on 64-bit user-space and kernel, every configuration works.
> >>>>> - with 32-bit user-space on a 64-bit kernel the only
> >>>>>   configuration where it works is when the compat patch is
> >>>>>   applied everywhere.
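The KO rows above come down to struct layout: with "int overwrite" placed before a uint64_t, the compiler may insert padding whose size depends on the ABI (typically 4 bytes of padding on x86-64, none on i386, where 64-bit integers inside structs are only 4-byte aligned), so a 32-bit user-space and a 64-bit kernel disagree on field offsets. The following is only a minimal sketch of that effect, assuming GCC or Clang (for __attribute__((packed))), C11 (for static_assert) and a 4-byte int. The demo_* names are hypothetical, and the structs are simplified from the ones in the patch (padding array dropped, enum replaced by int).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified copy of the old channel layout. The int comes first, so
 * the compiler may insert padding before the first uint64_t, and how
 * much it inserts depends on the target ABI.
 */
struct demo_old_channel {
	int overwrite;
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int output;
};

/* Same members in the order used by the new ABI, packed. */
struct demo_new_channel {
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int overwrite;
	int output;
} __attribute__((packed));

/* With packing there is no hole between or after the members. */
static_assert(offsetof(struct demo_new_channel, num_subbuf) == sizeof(uint64_t),
		"no hole expected between the 64-bit members");
static_assert(sizeof(struct demo_new_channel) ==
		2 * sizeof(uint64_t) + 4 * sizeof(int),
		"no padding expected in the packed layout");

/* Offset of subbuf_size in the old layout; this value is ABI dependent. */
static inline size_t demo_old_subbuf_size_offset(void)
{
	return offsetof(struct demo_old_channel, subbuf_size);
}
```

Printing demo_old_subbuf_size_offset() from a 32-bit and from a 64-bit build of the same file is a quick way to see the mismatch, while the packed struct compiles to the same layout in both.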
> >>>>> > >>>>> Signed-off-by: Julien Desfossez > >>>>> --- src/bin/lttng-sessiond/trace-kernel.h | 1 + > >>>>> src/common/kernel-ctl/kernel-ctl.c | 96 > >>>>> ++++++++++++++++++++++++--- > >>>>> src/common/kernel-ctl/kernel-ctl.h | 1 + > >>>>> src/common/kernel-ctl/kernel-ioctl.h | 74 > >>>>> +++++++++++++++------ src/common/lttng-kernel-old.h | > >>>>> 117 +++++++++++++++++++++++++++++++++ > >>>>> src/common/lttng-kernel.h | 31 ++++++--- 6 > >>>>> files changed, 281 insertions(+), 39 deletions(-) create mode > >>>>> 100644 src/common/lttng-kernel-old.h > >>>>> > >>>>> diff --git a/src/bin/lttng-sessiond/trace-kernel.h > >>>>> b/src/bin/lttng-sessiond/trace-kernel.h index > >>>>> f04d9e7..c86cc27 100644 --- > >>>>> a/src/bin/lttng-sessiond/trace-kernel.h +++ > >>>>> b/src/bin/lttng-sessiond/trace-kernel.h @@ -22,6 +22,7 @@ > >>>>> > >>>>> #include #include > >>>>> +#include > >>>>> > >>>>> #include "consumer.h" > >>>>> > >>>>> diff --git a/src/common/kernel-ctl/kernel-ctl.c > >>>>> b/src/common/kernel-ctl/kernel-ctl.c index 1396cd9..2ac2d53 > >>>>> 100644 --- a/src/common/kernel-ctl/kernel-ctl.c +++ > >>>>> b/src/common/kernel-ctl/kernel-ctl.c @@ -18,38 +18,100 @@ > >>>>> > >>>>> #define __USE_LINUX_IOCTL_DEFS #include > >>>>> +#include > >>>>> > >>>>> #include "kernel-ctl.h" #include "kernel-ioctl.h" > >>>>> > >>>>> +/* + * This flag indicates if lttng-tools must use the new > >>>>> or the old kernel ABI + * (without compat support for 32/64 > >>>>> bits). It is set by + * kernctl_tracer_version() > >>>> > >>>> Hrm. I don't like that this is only set by > >>>> kernctl_tracer_version(). What if, for an unforeseen reason > >>>> (e.g. code change), sessiond starts using "create session" > >>>> before checking the version ? The change you propose assumes a > >>>> behavior on the sessiond side that is not cast in stone (no ABI > >>>> requires it), so it might change. 
> >>>> This brings coupling between
> >>>> this otherwise self-contained wrapper and the entire sessiond
> >>>> code base, which I don't like.
> >>>>
> >>>> We should put the check in a wrapper macro around every new
> >>>> ioctl call.
> >>>>
> >>>> e.g.
> >>>>
> >>>> /*
> >>>>  * Cache whether we need to use the old or new ABI.
> >>>>  * -1: not probed yet, 0: new ABI, 1: old ABI.
> >>>>  */
> >>>> static int lttng_kernel_use_old_abi = -1;
> >>>>
> >>>> if (lttng_kernel_use_old_abi == -1) {
> >>>>         ret = ioctl(fd, newname, args);
> >>>>         if (!ret) {
> >>>>                 lttng_kernel_use_old_abi = 0;
> >>>>         } else {
> >>>>                 /* New request failed: fall back to the old ABI. */
> >>>>                 lttng_kernel_use_old_abi = 1;
> >>>>                 ret = ioctl(fd, oldname, args);
> >>>>         }
> >>>> } else {
> >>>>         if (lttng_kernel_use_old_abi) {
> >>>>                 ret = ioctl(fd, oldname, args);
> >>>>         } else {
> >>>>                 ret = ioctl(fd, newname, args);
> >>>>         }
> >>>> }
> >>>> return ret;
> >>>>
> >>>> Thoughts ?
> >>>
> >>> Ok I will add this wrapper. I am wondering if we could just limit
> >>> it to these ioctls, since they are the only ones called on the
> >>> "top-level" fd (/proc/lttng), instead of adding it everywhere:
> >>>   LTTNG_KERNEL_SESSION
> >>>   LTTNG_KERNEL_TRACER_VERSION
> >>>   LTTNG_KERNEL_TRACEPOINT_LIST
> >>>   LTTNG_KERNEL_WAIT_QUIESCENT
> >>>   LTTNG_KERNEL_CALIBRATE
> >>>
> >>> All other ioctls require at least a session, so is it OK if I limit
> >>> the test to only these ?
> >
> >> yes, it makes sense. And given that this code is very much
> >> localized, I don't think it would be worth the effort to create a
> >> macro wrapper. Duplicating the checks at each of the 5 sites, all
> >> within the same file, seems good enough. But it's David's call.
> >
> > I don't know... I can easily see some other use cases in the future
> > that require us to wrap the ioctl according to the kernel.
> >
> > I remember having this discussion way back when we started lttng-tools
> > code and here we are almost three years later fixing the "ioctl
> > wrapper issue" :P.
> >
> > I don't think adding a macro is a lot of work and it will be really
> > easier for us to scale and adapt over time. Please, if I'm wrong,
> > speak now or shut up to the end of eternity!
:P > > For this particular case, we need to make special treatment when the > ioctl takes an argument depending on the ABI version (alloc and assign > the old struct values from the new one), so we won't be able to call a > generic wrapper (except if we do this treatment all the time which I > doubt we want). > For ioctls that don't take an argument the wrapper is easy though. > > So I think I should do a wrapper like > compat_ioctl_no_arg(int fd, unsigned long oldname, unsigned long newname) > and for the ioctls that take an argument make the checks locally. make sense for me! If you can do the wrapper in a static inline rather than a #define, it is even better! Thanks, Mathieu > > Thoughts ? > > Thanks, > > Julien > > > > > > Cheers! > > David > > > > > >> Thanks, > > > >> Mathieu > > > >>> > >>> Thanks, > >>> > >>> Julien > >>> > >>> > >>>> > >>>> Thanks, > >>>> > >>>> Mathieu > >>>> > >>>> > >>>>> + */ +static int lttng_kernel_use_old_abi; + int > >>>>> kernctl_create_session(int fd) { + if > >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_SESSION); return ioctl(fd, > >>>>> LTTNG_KERNEL_SESSION); } > >>>>> > >>>>> /* open the metadata global channel */ int > >>>>> kernctl_open_metadata(int fd, struct lttng_channel_attr > >>>>> *chops) { - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); + > >>>>> struct lttng_kernel_old_channel old_channel; + struct > >>>>> lttng_kernel_channel channel; + + if > >>>>> (lttng_kernel_use_old_abi) { + old_channel.overwrite = > >>>>> chops->overwrite; + old_channel.subbuf_size = > >>>>> chops->subbuf_size; + old_channel.num_subbuf = > >>>>> chops->num_subbuf; + old_channel.switch_timer_interval = > >>>>> chops->switch_timer_interval; + > >>>>> old_channel.read_timer_interval = > >>>>> chops->read_timer_interval; + old_channel.output = > >>>>> chops->output; + memcpy(old_channel.padding, chops->padding, > >>>>> sizeof(chops->padding)); + + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_METADATA, &old_channel); 
+ } + + > >>>>> channel.overwrite = chops->overwrite; + channel.subbuf_size = > >>>>> chops->subbuf_size; + channel.num_subbuf = > >>>>> chops->num_subbuf; + channel.switch_timer_interval = > >>>>> chops->switch_timer_interval; + channel.read_timer_interval = > >>>>> chops->read_timer_interval; + channel.output = > >>>>> chops->output; + memcpy(channel.padding, chops->padding, > >>>>> sizeof(chops->padding)); + + return ioctl(fd, > >>>>> LTTNG_KERNEL_METADATA, &channel); } > >>>>> > >>>>> int kernctl_create_channel(int fd, struct lttng_channel_attr > >>>>> *chops) { - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); + > >>>>> struct lttng_kernel_old_channel old_channel; + struct > >>>>> lttng_kernel_channel channel; + + if > >>>>> (lttng_kernel_use_old_abi) { + old_channel.overwrite = > >>>>> chops->overwrite; + old_channel.subbuf_size = > >>>>> chops->subbuf_size; + old_channel.num_subbuf = > >>>>> chops->num_subbuf; + old_channel.switch_timer_interval = > >>>>> chops->switch_timer_interval; + > >>>>> old_channel.read_timer_interval = > >>>>> chops->read_timer_interval; + old_channel.output = > >>>>> chops->output; + memcpy(old_channel.padding, chops->padding, > >>>>> sizeof(chops->padding)); + + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_CHANNEL, &old_channel); + } + + > >>>>> channel.overwrite = chops->overwrite; + channel.subbuf_size = > >>>>> chops->subbuf_size; + channel.num_subbuf = > >>>>> chops->num_subbuf; + channel.switch_timer_interval = > >>>>> chops->switch_timer_interval; + channel.read_timer_interval = > >>>>> chops->read_timer_interval; + channel.output = > >>>>> chops->output; + memcpy(channel.padding, chops->padding, > >>>>> sizeof(chops->padding)); + + return ioctl(fd, > >>>>> LTTNG_KERNEL_CHANNEL, &channel); } > >>>>> > >>>>> int kernctl_create_stream(int fd) { + if > >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_STREAM); return ioctl(fd, > >>>>> LTTNG_KERNEL_STREAM); } > >>>>> > >>>>> int kernctl_create_event(int 
fd, struct lttng_kernel_event > >>>>> *ev) { + if (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_EVENT, ev); return ioctl(fd, > >>>>> LTTNG_KERNEL_EVENT, ev); } > >>>>> > >>>>> int kernctl_add_context(int fd, struct lttng_kernel_context > >>>>> *ctx) { + if (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_CONTEXT, ctx); return ioctl(fd, > >>>>> LTTNG_KERNEL_CONTEXT, ctx); } > >>>>> > >>>>> @@ -57,43 +119,64 @@ int kernctl_add_context(int fd, struct > >>>>> lttng_kernel_context *ctx) /* Enable event, channel and > >>>>> session ioctl */ int kernctl_enable(int fd) { + if > >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_ENABLE); return ioctl(fd, > >>>>> LTTNG_KERNEL_ENABLE); } > >>>>> > >>>>> /* Disable event, channel and session ioctl */ int > >>>>> kernctl_disable(int fd) { + if (lttng_kernel_use_old_abi) + > >>>>> return ioctl(fd, LTTNG_KERNEL_OLD_DISABLE); return ioctl(fd, > >>>>> LTTNG_KERNEL_DISABLE); } > >>>>> > >>>>> int kernctl_start_session(int fd) { + if > >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_SESSION_START); return ioctl(fd, > >>>>> LTTNG_KERNEL_SESSION_START); } > >>>>> > >>>>> int kernctl_stop_session(int fd) { + if > >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_SESSION_STOP); return ioctl(fd, > >>>>> LTTNG_KERNEL_SESSION_STOP); } > >>>>> > >>>>> > >>>>> int kernctl_tracepoint_list(int fd) { + if > >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_TRACEPOINT_LIST); return ioctl(fd, > >>>>> LTTNG_KERNEL_TRACEPOINT_LIST); } > >>>>> > >>>>> int kernctl_tracer_version(int fd, struct > >>>>> lttng_kernel_tracer_version *v) { - return ioctl(fd, > >>>>> LTTNG_KERNEL_TRACER_VERSION, v); + int ret; + + ret = > >>>>> ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + if (!ret) + > >>>>> return ret; + + lttng_kernel_use_old_abi = 1; + return > >>>>> ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, v); } > 
>>>>> > >>>>> int kernctl_wait_quiescent(int fd) { + if > >>>>> (lttng_kernel_use_old_abi) + return ioctl(fd, > >>>>> LTTNG_KERNEL_OLD_WAIT_QUIESCENT); return ioctl(fd, > >>>>> LTTNG_KERNEL_WAIT_QUIESCENT); } > >>>>> > >>>>> int kernctl_calibrate(int fd, struct lttng_kernel_calibrate > >>>>> *calibrate) { + if (lttng_kernel_use_old_abi) + return > >>>>> ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, calibrate); return > >>>>> ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); } > >>>>> > >>>>> @@ -193,10 +276,3 @@ int kernctl_set_stream_id(int fd, > >>>>> unsigned long *stream_id) { return ioctl(fd, > >>>>> RING_BUFFER_SET_STREAM_ID, stream_id); } - -/* Get the offset > >>>>> of the stream_id in the packet header */ -int > >>>>> kernctl_get_net_stream_id_offset(int fd, unsigned long > >>>>> *offset) -{ - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, > >>>>> offset); - -} diff --git a/src/common/kernel-ctl/kernel-ctl.h > >>>>> b/src/common/kernel-ctl/kernel-ctl.h index 18712d9..85a3a18 > >>>>> 100644 --- a/src/common/kernel-ctl/kernel-ctl.h +++ > >>>>> b/src/common/kernel-ctl/kernel-ctl.h @@ -21,6 +21,7 @@ > >>>>> > >>>>> #include #include > >>>>> +#include > >>>>> > >>>>> int kernctl_create_session(int fd); int > >>>>> kernctl_open_metadata(int fd, struct lttng_channel_attr > >>>>> *chops); diff --git a/src/common/kernel-ctl/kernel-ioctl.h > >>>>> b/src/common/kernel-ctl/kernel-ioctl.h index 35942be..1d34222 > >>>>> 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ > >>>>> b/src/common/kernel-ctl/kernel-ioctl.h @@ -49,37 +49,69 @@ /* > >>>>> map stream to stream id for network streaming */ #define > >>>>> RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned > >>>>> long) > >>>>> > >>>>> +/* Old ABI (without support for 32/64 bits compat) */ +/* > >>>>> LTTng file descriptor ioctl */ +#define > >>>>> LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) > >>>>> +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ + > >>>>> _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > >>>>> +#define 
LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, > >>>>> 0x42) +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT > >>>>> _IO(0xF6, 0x43) +#define LTTNG_KERNEL_OLD_CALIBRATE \ + > >>>>> _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) + +/* > >>>>> Session FD ioctl */ +#define LTTNG_KERNEL_OLD_METADATA > >>>>> \ + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) > >>>>> +#define LTTNG_KERNEL_OLD_CHANNEL \ + > >>>>> _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) +#define > >>>>> LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) > >>>>> +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, > >>>>> 0x53) + +/* Channel FD ioctl */ +#define > >>>>> LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) > >>>>> +#define LTTNG_KERNEL_OLD_EVENT \ + > >>>>> _IOW(0xF6, 0x61, struct lttng_kernel_old_event) +#define > >>>>> LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ + _IOR(0xF6, 0x62, > >>>>> unsigned long) > >>>>> > >>>>> +/* Event and Channel FD ioctl */ +#define > >>>>> LTTNG_KERNEL_OLD_CONTEXT \ + _IOW(0xF6, 0x70, > >>>>> struct lttng_kernel_old_context) + +/* Event, Channel and > >>>>> Session ioctl */ +#define LTTNG_KERNEL_OLD_ENABLE > >>>>> _IO(0xF6, 0x80) +#define LTTNG_KERNEL_OLD_DISABLE > >>>>> _IO(0xF6, 0x81) + + +/* New ABI (with suport for 32/64 bits > >>>>> compat) */ /* LTTng file descriptor ioctl */ -#define > >>>>> LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) -#define > >>>>> LTTNG_KERNEL_TRACER_VERSION \ - _IOR(0xF6, 0x41, > >>>>> struct lttng_kernel_tracer_version) -#define > >>>>> LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) -#define > >>>>> LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define > >>>>> LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) +#define > >>>>> LTTNG_KERNEL_TRACER_VERSION \ + _IOR(0xF6, 0x46, struct > >>>>> lttng_kernel_tracer_version) +#define > >>>>> LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) +#define > >>>>> LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) #define > >>>>> LTTNG_KERNEL_CALIBRATE \ - _IOWR(0xF6, 0x44, struct > >>>>> lttng_kernel_calibrate) + _IOWR(0xF6, 0x49, struct > >>>>> 
lttng_kernel_calibrate) > >>>>> > >>>>> /* Session FD ioctl */ -#define LTTNG_KERNEL_METADATA > >>>>> \ - _IOW(0xF6, 0x50, struct lttng_channel_attr) -#define > >>>>> LTTNG_KERNEL_CHANNEL \ - _IOW(0xF6, 0x51, > >>>>> struct lttng_channel_attr) -#define > >>>>> LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) -#define > >>>>> LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) +#define > >>>>> LTTNG_KERNEL_METADATA \ + _IOW(0xF6, 0x54, struct > >>>>> lttng_kernel_channel) +#define LTTNG_KERNEL_CHANNEL \ + > >>>>> _IOW(0xF6, 0x55, struct lttng_kernel_channel) +#define > >>>>> LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) +#define > >>>>> LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > >>>>> > >>>>> /* Channel FD ioctl */ -#define LTTNG_KERNEL_STREAM > >>>>> _IO(0xF6, 0x60) -#define LTTNG_KERNEL_EVENT > >>>>> \ - _IOW(0xF6, 0x61, struct lttng_kernel_event) -#define > >>>>> LTTNG_KERNEL_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, > >>>>> unsigned long) +#define LTTNG_KERNEL_STREAM _IO(0xF6, > >>>>> 0x62) +#define LTTNG_KERNEL_EVENT \ + _IOW(0xF6, 0x63, > >>>>> struct lttng_kernel_event) > >>>>> > >>>>> /* Event and Channel FD ioctl */ -#define > >>>>> LTTNG_KERNEL_CONTEXT \ - _IOW(0xF6, 0x70, > >>>>> struct lttng_kernel_context) +#define LTTNG_KERNEL_CONTEXT > >>>>> \ + _IOW(0xF6, 0x71, struct lttng_kernel_context) > >>>>> > >>>>> /* Event, Channel and Session ioctl */ -#define > >>>>> LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) -#define > >>>>> LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) +#define > >>>>> LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) +#define > >>>>> LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > >>>>> > >>>>> #endif /* _LTT_KERNEL_IOCTL_H */ diff --git > >>>>> a/src/common/lttng-kernel-old.h > >>>>> b/src/common/lttng-kernel-old.h new file mode 100644 index > >>>>> 0000000..0579751 --- /dev/null +++ > >>>>> b/src/common/lttng-kernel-old.h @@ -0,0 +1,117 @@ +/* + * > >>>>> Copyright (C) 2011 - Julien Desfossez > >>>>> + * > >>>>> Mathieu Desnoyers + * > >>>>> David Goulet + * + * This program > >>>>> is free 
software; you can redistribute it and/or modify + * > >>>>> it under the terms of the GNU General Public License, version > >>>>> 2 only, + * as published by the Free Software Foundation. + > >>>>> * + * This program is distributed in the hope that it will be > >>>>> useful, but WITHOUT + * ANY WARRANTY; without even the > >>>>> implied warranty of MERCHANTABILITY or + * FITNESS FOR A > >>>>> PARTICULAR PURPOSE. See the GNU General Public License for + > >>>>> * more details. + * + * You should have received a copy of > >>>>> the GNU General Public License along + * with this program; > >>>>> if not, write to the Free Software Foundation, Inc., + * 51 > >>>>> Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + > >>>>> */ + +#ifndef _LTTNG_KERNEL_OLD_H +#define > >>>>> _LTTNG_KERNEL_OLD_H + +#include +#include > >>>>> + +#define > >>>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 + +/* + * LTTng DebugFS > >>>>> ABI structures. + * + * This is the kernel ABI copied from > >>>>> lttng-modules tree. + */ + +/* Perf counter attributes */ > >>>>> +struct lttng_kernel_old_perf_counter_ctx { + uint32_t type; > >>>>> + uint64_t config; + char > >>>>> name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* Event/Channel > >>>>> context */ +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 > >>>>> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 > >>>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct > >>>>> lttng_kernel_old_context { + enum lttng_kernel_context_type > >>>>> ctx; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; + + > >>>>> union { + struct lttng_kernel_old_perf_counter_ctx > >>>>> perf_counter; + char > >>>>> padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; + } u; +}; + > >>>>> +struct lttng_kernel_old_kretprobe { + uint64_t addr; + + > >>>>> uint64_t offset; + char > >>>>> symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* + * > >>>>> Either addr is used, or symbol_name and offset. 
+ */ +struct > >>>>> lttng_kernel_old_kprobe { + uint64_t addr; + + uint64_t > >>>>> offset; + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > >>>>> +}; + +/* Function tracer */ +struct > >>>>> lttng_kernel_old_function { + char > >>>>> symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +#define > >>>>> LTTNG_KERNEL_OLD_EVENT_PADDING1 16 +#define > >>>>> LTTNG_KERNEL_OLD_EVENT_PADDING2 > >>>>> LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct > >>>>> lttng_kernel_old_event { + char > >>>>> name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; + enum > >>>>> lttng_kernel_instrumentation instrumentation; + char > >>>>> padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; + + /* Per > >>>>> instrumentation type configuration */ + union { + struct > >>>>> lttng_kernel_old_kretprobe kretprobe; + struct > >>>>> lttng_kernel_old_kprobe kprobe; + struct > >>>>> lttng_kernel_old_function ftrace; + char > >>>>> padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; + } u; +}; + > >>>>> +struct lttng_kernel_old_tracer_version { + uint32_t major; + > >>>>> uint32_t minor; + uint32_t patchlevel; +}; + +struct > >>>>> lttng_kernel_old_calibrate { + enum > >>>>> lttng_kernel_calibrate_type type; /* type (input) */ +}; + > >>>>> +/* + * kernel channel + */ +#define > >>>>> LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > >>>>> +struct lttng_kernel_old_channel { + int overwrite; > >>>>> /* 1: overwrite, 0: discard */ + uint64_t subbuf_size; > >>>>> /* bytes */ + uint64_t num_subbuf; /* power of > >>>>> 2 */ + unsigned int switch_timer_interval; /* usec */ + > >>>>> unsigned int read_timer_interval; /* usec */ + enum > >>>>> lttng_event_output output; /* splice, mmap */ + + char > >>>>> padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; +}; + +#endif /* > >>>>> _LTTNG_KERNEL_OLD_H */ diff --git a/src/common/lttng-kernel.h > >>>>> b/src/common/lttng-kernel.h index dbeb6aa..ac881bf 100644 --- > >>>>> a/src/common/lttng-kernel.h +++ b/src/common/lttng-kernel.h > >>>>> @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { 
> >>>>> uint32_t type; uint64_t config; char > >>>>> name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > >>>>> +}__attribute__((packed)); > >>>>> > >>>>> /* Event/Channel context */ #define > >>>>> LTTNG_KERNEL_CONTEXT_PADDING1 16 @@ -72,14 +72,14 @@ struct > >>>>> lttng_kernel_context { struct lttng_kernel_perf_counter_ctx > >>>>> perf_counter; char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; } > >>>>> u; -}; +}__attribute__((packed)); > >>>>> > >>>>> struct lttng_kernel_kretprobe { uint64_t addr; > >>>>> > >>>>> uint64_t offset; char > >>>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > >>>>> +}__attribute__((packed)); > >>>>> > >>>>> /* * Either addr is used, or symbol_name and offset. @@ > >>>>> -89,12 +89,12 @@ struct lttng_kernel_kprobe { > >>>>> > >>>>> uint64_t offset; char > >>>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > >>>>> +}__attribute__((packed)); > >>>>> > >>>>> /* Function tracer */ struct lttng_kernel_function { char > >>>>> symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > >>>>> +}__attribute__((packed)); > >>>>> > >>>>> #define LTTNG_KERNEL_EVENT_PADDING1 16 #define > >>>>> LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + > >>>>> 32 @@ -110,13 +110,13 @@ struct lttng_kernel_event { struct > >>>>> lttng_kernel_function ftrace; char > >>>>> padding[LTTNG_KERNEL_EVENT_PADDING2]; } u; -}; > >>>>> +}__attribute__((packed)); > >>>>> > >>>>> struct lttng_kernel_tracer_version { uint32_t major; uint32_t > >>>>> minor; uint32_t patchlevel; -}; +}__attribute__((packed)); > >>>>> > >>>>> enum lttng_kernel_calibrate_type { > >>>>> LTTNG_KERNEL_CALIBRATE_KRETPROBE, @@ -124,6 +124,21 @@ enum > >>>>> lttng_kernel_calibrate_type { > >>>>> > >>>>> struct lttng_kernel_calibrate { enum > >>>>> lttng_kernel_calibrate_type type; /* type (input) */ -}; > >>>>> +}__attribute__((packed)); + +/* + * kernel channel + */ > >>>>> +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN > >>>>> + 32 +struct lttng_kernel_channel { + uint64_t subbuf_size; > >>>>> /* bytes */ + 
uint64_t num_subbuf; /* power of > >>>>> 2 */ + unsigned int switch_timer_interval; /* usec */ + > >>>>> unsigned int read_timer_interval; /* usec */ + int > >>>>> overwrite; /* 1: overwrite, 0: discard > >>>>> */ + enum lttng_event_output output; /* splice, mmap */ > >>>>> + + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > >>>>> +}__attribute__((packed)); > >>>>> > >>>>> #endif /* _LTTNG_KERNEL_H */ -- 1.7.9.5 > >>>>> > >>>> > > > > > > _______________________________________________ > > lttng-dev mailing list > > lttng-dev at lists.lttng.org > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 1 14:52:17 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 1 Oct 2012 14:52:17 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v2] ABI with support for compat 32/64 bits In-Reply-To: <5069E45D.4040303@efficios.com> References: <1348952028-31926-1-git-send-email-jdesfossez@efficios.com> <20121001161541.GB14260@Krystal> <5069D90C.1030804@efficios.com> <20121001181452.GA15644@Krystal> <5069DF8F.9050706@efficios.com> <5069E352.7040909@efficios.com> <5069E45D.4040303@efficios.com> Message-ID: <20121001185217.GA18293@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > > > Julien Desfossez: > >>>>>> > >>>>>> +/* + * This flag indicates if lttng-tools must use the > >>>>>> new or the old kernel ABI + * (without compat support for > >>>>>> 32/64 bits). It is set by + * kernctl_tracer_version() > >>>>> > >>>>> Hrm. I don't like that this is only set by > >>>>> kernctl_tracer_version(). What if, for an unforeseen > >>>>> reason (e.g. code change), sessiond starts using "create > >>>>> session" before checking the version ? 
The change you
> >>>>> propose assumes a behavior on the sessiond side that is not
> >>>>> cast in stone (no ABI requires it), so it might change.
> >>>>> This brings coupling between this otherwise self-contained
> >>>>> wrapper and the entire sessiond code base, which I don't
> >>>>> like.
> >>>>>
> >>>>> We should put the check in a wrapper macro around every
> >>>>> new ioctl call.
> >>>>>
> >>>>> e.g.
> >>>>>
> >>>>> /*
> >>>>>  * Cache whether we need to use the old or new ABI.
> >>>>>  */
> >>>>> static int lttng_kernel_use_old_abi = -1;
> >>>>>
> >>>>> if (lttng_kernel_use_old_abi == -1) {
> >>>>>         ret = ioctl(fd, newname, args);
> >>>>>         if (!ret) {
> >>>>>                 lttng_kernel_use_old_abi = 0;
> >>>>>         }
> >>>>> } else {
> >>>>>         if (lttng_kernel_use_old_abi) {
> >>>>>                 ret = ioctl(fd, oldname, args);
> >>>>>         } else {
> >>>>>                 ret = ioctl(fd, newname, args);
> >>>>>         }
> >>>>> }
> >>>>> return ret;
> >>>>>
> >>>>> Thoughts ?
> >>>>
> >>>> Ok I will add this wrapper, I am wondering if we could just
> >>>> limit it to these ioctls since they are the only ones called on
> >>>> the "top-level" fd (/proc/lttng) instead of adding it
> >>>> everywhere :
> >>>> LTTNG_KERNEL_SESSION
> >>>> LTTNG_KERNEL_TRACER_VERSION
> >>>> LTTNG_KERNEL_TRACEPOINT_LIST
> >>>> LTTNG_KERNEL_WAIT_QUIESCENT
> >>>> LTTNG_KERNEL_CALIBRATE
> >>>>
> >>>> All other ioctls require at least a session so is it OK if I
> >>>> limit the test to only these ?
> >>
> >>> yes, it makes sense. And given that this code is very much
> >>> localized, I don't think it would be worth the effort to create
> >>> a macro wrapper. Duplicating the checks at each of the 5 sites, all
> >>> within the same file, seems good enough. But it's David's
> >>> call.
> >>
> >> I don't know... I can easily see some other use cases in the
> >> future that require us to wrap the ioctl according to the
> >> kernel.
> >>
> >> I remember having this discussion way back when we started
> >> lttng-tools code and here we are almost three years later fixing
> >> the "ioctl wrapper issue" :P.
> >> > >> I don't think adding a macro is a lot of work and it will be > >> really easier for us to scale and adapt over time. Please, if I'm > >> wrong, speak now or shut up to the end of eternity! :P > > > > For this particular case, we need to make special treatment when > > the ioctl takes an argument depending on the ABI version (alloc and > > assign the old struct values from the new one), so we won't be able > > to call a generic wrapper (except if we do this treatment all the > > time which I doubt we want). For ioctls that don't take an argument > > the wrapper is easy though. > > > > So I think I should do a wrapper like compat_ioctl_no_arg(int fd, > > unsigned long oldname, unsigned long newname) and for the ioctls > > that take an argument make the checks locally. > > Hmmm it's that or a compat call for each ioctl listed above... or > using va_list but I really don't like that :P > > Either way works for me. Mathieu? Let's make it simple for those few sites, and not overengineer this. I like Julien's solution. Thanks, Mathieu > > David > > > > > Thoughts ? > > > > Thanks, > > > > Julien > > > > > -----BEGIN PGP SIGNATURE----- > > iQEcBAEBCgAGBQJQaeRaAAoJEELoaioR9I026m4IALHDYyU28segZJpauX6JbDTE > ij+T9aW7+bAxJRQfymg9kx6noo8t+at4rEsOO+4FvK30/8Tfvqy5H84dw72AtPo2 > mbllr4mGAkqb9S3mfoCWSt38bK46cqKZf8ldJJs4B8sH1eWEMtVxUv7vgNnI4yZf > X3/2NoOUdFs1QUun8oGm2Evh82+4PAfJRemKybSIwi5a1GOoZQ7WGYViWsQch0mx > vEjsdQ1ays4zCU/EWGhSu4+WBx18C+sPaRNg1MB98fqyjWMph7XbbCysQptv7LIs > Es8wZkINul+HzKf8F47iapZO30QEGwawSxko+B0YyEbk6iXKzIiaPPbik3tgzdI= > =UfYI > -----END PGP SIGNATURE----- > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com

From jdesfossez at efficios.com Mon Oct 1 16:21:48 2012
From: jdesfossez at efficios.com (Julien Desfossez)
Date: Mon, 1 Oct 2012 16:21:48 -0400
Subject: [lttng-dev] [LTTNG-TOOLS PATCH v3] ABI with support for compat 32/64 bits
Message-ID: <1349122908-22553-1-git-send-email-jdesfossez@efficios.com>

The current ABI does not work for compat 32/64 bits.
This patch moves the current ABI as old-abi and provides a new ABI in
which all the structures exchanged between user and kernel-space are
packed. Also this new ABI moves the "int overwrite" member of the
struct lttng_kernel_channel to remove the alignment added by the
compiler.

A patch for lttng-modules has been developed in parallel to this one
to support the new ABI. These 2 patches have been tested in all
possible configurations (applied or not) on 64-bit and 32-bit kernels
(with CONFIG_COMPAT) and a user-space in 32 and 64-bit.

Here are the results of the tests :
k 64 compat     | u 32 compat     | OK
k 64 compat     | u 64 compat     | OK
k 64 compat     | u 32 non-compat | KO
k 64 compat     | u 64 non-compat | OK

k 64 non-compat | u 64 compat     | OK
k 64 non-compat | u 32 compat     | KO
k 64 non-compat | u 64 non-compat | OK
k 64 non-compat | u 32 non-compat | KO

k 32 compat     | u compat        | OK
k 32 compat     | u non-compat    | OK

k 32 non-compat | u compat        | OK
k 32 non-compat | u non-compat    | OK

The results are as expected :
- on 32-bit user-space and kernel, every configuration works.
- on 64-bit user-space and kernel, every configuration works.
- with 32-bit user-space on a 64-bit kernel the only configuration
  where it works is when the compat patch is applied everywhere.
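The layout argument in the description above can be illustrated with a minimal sketch. It is not part of the patch: the demo_* names are hypothetical stand-ins for the channel structures, the enum member is replaced by an int and the padding array is shortened so that the file stands alone. It assumes GCC or Clang and a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the old, unpacked channel layout. The leading
 * int is followed by a uint64_t, so the compiler may insert padding whose
 * size depends on the ABI (typically 4 bytes on x86-64, none on i386).
 */
struct demo_old_channel {
	int overwrite;
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int output;		/* an enum in the patch */
	char padding[8];	/* shortened for the sketch */
};

/*
 * Hypothetical stand-in for the new layout: 64-bit members first, and the
 * whole structure packed, so no compiler-inserted padding remains.
 */
struct demo_new_channel {
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int overwrite;
	int output;
	char padding[8];
} __attribute__((packed));

/* With a 4-byte int these hold for both 32-bit and 64-bit builds. */
_Static_assert(offsetof(struct demo_new_channel, overwrite) == 24,
	"overwrite follows the two 64-bit and two 32-bit members");
_Static_assert(sizeof(struct demo_new_channel) == 40,
	"packed size is the plain sum of the member sizes");

/* Offset of the first 64-bit member in the old layout; ABI dependent. */
static inline size_t demo_old_subbuf_offset(void)
{
	return offsetof(struct demo_old_channel, subbuf_size);
}
```

Building the same file with -m32 and -m64 and printing demo_old_subbuf_offset() would typically give different values for the old layout, while the static assertions on the packed layout hold in both builds.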
Signed-off-by: Julien Desfossez --- src/bin/lttng-sessiond/trace-kernel.h | 1 + src/common/kernel-ctl/kernel-ctl.c | 210 +++++++++++++++++++++++++++++---- src/common/kernel-ctl/kernel-ctl.h | 1 + src/common/kernel-ctl/kernel-ioctl.h | 74 ++++++++---- src/common/lttng-kernel-old.h | 117 ++++++++++++++++++ src/common/lttng-kernel.h | 31 +++-- 6 files changed, 385 insertions(+), 49 deletions(-) create mode 100644 src/common/lttng-kernel-old.h diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h index f04d9e7..c86cc27 100644 --- a/src/bin/lttng-sessiond/trace-kernel.h +++ b/src/bin/lttng-sessiond/trace-kernel.h @@ -22,6 +22,7 @@ #include #include +#include #include "consumer.h" diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c index 1396cd9..ab29dd0 100644 --- a/src/common/kernel-ctl/kernel-ctl.c +++ b/src/common/kernel-ctl/kernel-ctl.c @@ -18,38 +18,163 @@ #define __USE_LINUX_IOCTL_DEFS #include +#include #include "kernel-ctl.h" #include "kernel-ioctl.h" +/* + * This flag indicates whether lttng-tools must use the new or the old kernel + * ABI (without compat support for 32/64 bits). 
+ */ +static int lttng_kernel_use_old_abi = -1; + +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, + unsigned long newname) +{ + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, newname); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + ret = ioctl(fd, oldname); + } else { + ret = ioctl(fd, newname); + } + +end: + return ret; +} + int kernctl_create_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, + LTTNG_KERNEL_SESSION); } /* open the metadata global channel */ int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) { - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); + struct lttng_kernel_old_channel old_channel; + struct lttng_kernel_channel channel; + + if (lttng_kernel_use_old_abi) { + old_channel.overwrite = chops->overwrite; + old_channel.subbuf_size = chops->subbuf_size; + old_channel.num_subbuf = chops->num_subbuf; + old_channel.switch_timer_interval = chops->switch_timer_interval; + old_channel.read_timer_interval = chops->read_timer_interval; + old_channel.output = chops->output; + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); + } + + channel.overwrite = chops->overwrite; + channel.subbuf_size = chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + channel.switch_timer_interval = chops->switch_timer_interval; + channel.read_timer_interval = chops->read_timer_interval; + channel.output = chops->output; + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); } int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) { - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); + struct lttng_kernel_old_channel old_channel; + struct lttng_kernel_channel channel; + + if 
(lttng_kernel_use_old_abi) { + old_channel.overwrite = chops->overwrite; + old_channel.subbuf_size = chops->subbuf_size; + old_channel.num_subbuf = chops->num_subbuf; + old_channel.switch_timer_interval = chops->switch_timer_interval; + old_channel.read_timer_interval = chops->read_timer_interval; + old_channel.output = chops->output; + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); + } + + channel.overwrite = chops->overwrite; + channel.subbuf_size = chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + channel.switch_timer_interval = chops->switch_timer_interval; + channel.read_timer_interval = chops->read_timer_interval; + channel.output = chops->output; + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); } int kernctl_create_stream(int fd) { - return ioctl(fd, LTTNG_KERNEL_STREAM); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, + LTTNG_KERNEL_STREAM); } int kernctl_create_event(int fd, struct lttng_kernel_event *ev) { + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_event old_event; + + memcpy(old_event.name, ev->name, sizeof(old_event.name)); + old_event.instrumentation = ev->instrumentation; + switch (ev->instrumentation) { + case LTTNG_KERNEL_KPROBE: + old_event.u.kprobe.addr = ev->u.kprobe.addr; + old_event.u.kprobe.offset = ev->u.kprobe.offset; + memcpy(old_event.u.kprobe.symbol_name, + ev->u.kprobe.symbol_name, + sizeof(old_event.u.kprobe.symbol_name)); + break; + case LTTNG_KERNEL_KRETPROBE: + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; + memcpy(old_event.u.kretprobe.symbol_name, + ev->u.kretprobe.symbol_name, + sizeof(old_event.u.kretprobe.symbol_name)); + break; + case LTTNG_KERNEL_FUNCTION: + memcpy(old_event.u.ftrace.symbol_name, + ev->u.ftrace.symbol_name, + sizeof(old_event.u.ftrace.symbol_name)); + break; +
default: + break; + } + + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, &old_event); + } return ioctl(fd, LTTNG_KERNEL_EVENT, ev); } int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) { + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_context old_ctx; + + old_ctx.ctx = ctx->ctx; + /* only type that uses the union */ + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { + old_ctx.u.perf_counter.type = + ctx->u.perf_counter.type; + old_ctx.u.perf_counter.config = + ctx->u.perf_counter.config; + memcpy(old_ctx.u.perf_counter.name, + ctx->u.perf_counter.name, + sizeof(old_ctx.u.perf_counter.name)); + } + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); + } return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); } @@ -57,44 +182,96 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) /* Enable event, channel and session ioctl */ int kernctl_enable(int fd) { - return ioctl(fd, LTTNG_KERNEL_ENABLE); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, + LTTNG_KERNEL_ENABLE); } /* Disable event, channel and session ioctl */ int kernctl_disable(int fd) { - return ioctl(fd, LTTNG_KERNEL_DISABLE); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, + LTTNG_KERNEL_DISABLE); } int kernctl_start_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION_START); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, + LTTNG_KERNEL_SESSION_START); } int kernctl_stop_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, + LTTNG_KERNEL_SESSION_STOP); } - int kernctl_tracepoint_list(int fd) { - return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, + LTTNG_KERNEL_TRACEPOINT_LIST); } int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) { - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, 
LTTNG_KERNEL_TRACER_VERSION, v); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_tracer_version old_v; + + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); + if (ret) + goto end; + v->major = old_v.major; + v->minor = old_v.minor; + v->patchlevel = old_v.patchlevel; + } else { + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + } + +end: + return ret; } int kernctl_wait_quiescent(int fd) { - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, + LTTNG_KERNEL_WAIT_QUIESCENT); } int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) { - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_calibrate old_calibrate; + + old_calibrate.type = calibrate->type; + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); + if (ret) + goto end; + calibrate->type = old_calibrate.type; + } else { + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + } + +end: + return ret; } @@ -193,10 +370,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) { return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); } - -/* Get the offset of the stream_id in the packet header */ -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) -{ - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); - -} diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h index 18712d9..85a3a18 100644 --- a/src/common/kernel-ctl/kernel-ctl.h +++ b/src/common/kernel-ctl/kernel-ctl.h @@ -21,6 +21,7 @@ #include #include +#include int kernctl_create_session(int fd); int kernctl_open_metadata(int fd, 
struct lttng_channel_attr *chops); diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h index 35942be..1d34222 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ b/src/common/kernel-ctl/kernel-ioctl.h @@ -49,37 +49,69 @@ /* map stream to stream id for network streaming */ #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) +/* Old ABI (without support for 32/64 bits compat) */ +/* LTTng file descriptor ioctl */ +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define LTTNG_KERNEL_OLD_CALIBRATE \ + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) + +/* Session FD ioctl */ +#define LTTNG_KERNEL_OLD_METADATA \ + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) +#define LTTNG_KERNEL_OLD_CHANNEL \ + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) + +/* Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) +#define LTTNG_KERNEL_OLD_EVENT \ + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ + _IOR(0xF6, 0x62, unsigned long) +/* Event and Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_CONTEXT \ + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) + +/* Event, Channel and Session ioctl */ +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) + + +/* New ABI (with support for 32/64 bits compat) */ /* LTTng file descriptor ioctl */ -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) -#define LTTNG_KERNEL_TRACER_VERSION \ - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define
LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) +#define LTTNG_KERNEL_TRACER_VERSION \ + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) #define LTTNG_KERNEL_CALIBRATE \ - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) /* Session FD ioctl */ -#define LTTNG_KERNEL_METADATA \ - _IOW(0xF6, 0x50, struct lttng_channel_attr) -#define LTTNG_KERNEL_CHANNEL \ - _IOW(0xF6, 0x51, struct lttng_channel_attr) -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) +#define LTTNG_KERNEL_METADATA \ + _IOW(0xF6, 0x54, struct lttng_kernel_channel) +#define LTTNG_KERNEL_CHANNEL \ + _IOW(0xF6, 0x55, struct lttng_kernel_channel) +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) /* Channel FD ioctl */ -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) -#define LTTNG_KERNEL_EVENT \ - _IOW(0xF6, 0x61, struct lttng_kernel_event) -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, unsigned long) +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) +#define LTTNG_KERNEL_EVENT \ + _IOW(0xF6, 0x63, struct lttng_kernel_event) /* Event and Channel FD ioctl */ -#define LTTNG_KERNEL_CONTEXT \ - _IOW(0xF6, 0x70, struct lttng_kernel_context) +#define LTTNG_KERNEL_CONTEXT \ + _IOW(0xF6, 0x71, struct lttng_kernel_context) /* Event, Channel and Session ioctl */ -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) #endif /* _LTT_KERNEL_IOCTL_H */ diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h new file mode 100644 index 0000000..0579751 --- /dev/null +++ b/src/common/lttng-kernel-old.h @@ -0,0 +1,117 @@ +/* + * Copyright (C) 2011 - Julien Desfossez + * Mathieu Desnoyers + * David Goulet + * + * This program 
is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef _LTTNG_KERNEL_OLD_H +#define _LTTNG_KERNEL_OLD_H + +#include +#include + +#define LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 + +/* + * LTTng DebugFS ABI structures. + * + * This is the kernel ABI copied from lttng-modules tree. + */ + +/* Perf counter attributes */ +struct lttng_kernel_old_perf_counter_ctx { + uint32_t type; + uint64_t config; + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* Event/Channel context */ +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct lttng_kernel_old_context { + enum lttng_kernel_context_type ctx; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; + + union { + struct lttng_kernel_old_perf_counter_ctx perf_counter; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; + } u; +}; + +struct lttng_kernel_old_kretprobe { + uint64_t addr; + + uint64_t offset; + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* + * Either addr is used, or symbol_name and offset. 
+ */ +struct lttng_kernel_old_kprobe { + uint64_t addr; + + uint64_t offset; + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +/* Function tracer */ +struct lttng_kernel_old_function { + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; +}; + +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 +struct lttng_kernel_old_event { + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; + enum lttng_kernel_instrumentation instrumentation; + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; + + /* Per instrumentation type configuration */ + union { + struct lttng_kernel_old_kretprobe kretprobe; + struct lttng_kernel_old_kprobe kprobe; + struct lttng_kernel_old_function ftrace; + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; + } u; +}; + +struct lttng_kernel_old_tracer_version { + uint32_t major; + uint32_t minor; + uint32_t patchlevel; +}; + +struct lttng_kernel_old_calibrate { + enum lttng_kernel_calibrate_type type; /* type (input) */ +}; + +/* + * kernel channel + */ +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 +struct lttng_kernel_old_channel { + int overwrite; /* 1: overwrite, 0: discard */ + uint64_t subbuf_size; /* bytes */ + uint64_t num_subbuf; /* power of 2 */ + unsigned int switch_timer_interval; /* usec */ + unsigned int read_timer_interval; /* usec */ + enum lttng_event_output output; /* splice, mmap */ + + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; +}; + +#endif /* _LTTNG_KERNEL_OLD_H */ diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h index dbeb6aa..ac881bf 100644 --- a/src/common/lttng-kernel.h +++ b/src/common/lttng-kernel.h @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { uint32_t type; uint64_t config; char name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* Event/Channel context */ #define LTTNG_KERNEL_CONTEXT_PADDING1 16 @@ -72,14 +72,14 @@ struct lttng_kernel_context { struct lttng_kernel_perf_counter_ctx perf_counter; 
char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; } u; -}; +}__attribute__((packed)); struct lttng_kernel_kretprobe { uint64_t addr; uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* * Either addr is used, or symbol_name and offset. @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* Function tracer */ struct lttng_kernel_function { char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); #define LTTNG_KERNEL_EVENT_PADDING1 16 #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 @@ -110,13 +110,13 @@ struct lttng_kernel_event { struct lttng_kernel_function ftrace; char padding[LTTNG_KERNEL_EVENT_PADDING2]; } u; -}; +}__attribute__((packed)); struct lttng_kernel_tracer_version { uint32_t major; uint32_t minor; uint32_t patchlevel; -}; +}__attribute__((packed)); enum lttng_kernel_calibrate_type { LTTNG_KERNEL_CALIBRATE_KRETPROBE, @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { struct lttng_kernel_calibrate { enum lttng_kernel_calibrate_type type; /* type (input) */ -}; +}__attribute__((packed)); + +/* + * kernel channel + */ +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 +struct lttng_kernel_channel { + uint64_t subbuf_size; /* bytes */ + uint64_t num_subbuf; /* power of 2 */ + unsigned int switch_timer_interval; /* usec */ + unsigned int read_timer_interval; /* usec */ + int overwrite; /* 1: overwrite, 0: discard */ + enum lttng_event_output output; /* splice, mmap */ + + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; +}__attribute__((packed)); #endif /* _LTTNG_KERNEL_H */ -- 1.7.9.5 From mathieu.desnoyers at efficios.com Mon Oct 1 16:36:59 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 1 Oct 2012 16:36:59 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v3] ABI with support for compat 32/64 bits In-Reply-To: 
<1349122908-22553-1-git-send-email-jdesfossez@efficios.com>
References: <1349122908-22553-1-git-send-email-jdesfossez@efficios.com>
Message-ID: <20121001203659.GA22077@Krystal>

* Julien Desfossez (jdesfossez at efficios.com) wrote:
> The current ABI does not work for compat 32/64 bits.
> This patch moves the current ABI as old-abi and provides a new ABI in
> which all the structures exchanged between user and kernel-space are
> packed. Also this new ABI moves the "int overwrite" member of the
> struct lttng_kernel_channel to remove the alignment added by the
> compiler.
>
> A patch for lttng-modules has been developed in parallel to this one
> to support the new ABI. These 2 patches have been tested in all
> possible configurations (applied or not) on 64-bit and 32-bit kernels
> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit.
>
> Here are the results of the tests :
> k 64 compat     | u 32 compat     | OK
> k 64 compat     | u 64 compat     | OK
> k 64 compat     | u 32 non-compat | KO
> k 64 compat     | u 64 non-compat | OK
>
> k 64 non-compat | u 64 compat     | OK
> k 64 non-compat | u 32 compat     | KO
> k 64 non-compat | u 64 non-compat | OK
> k 64 non-compat | u 32 non-compat | KO
>
> k 32 compat     | u compat        | OK
> k 32 compat     | u non-compat    | OK
>
> k 32 non-compat | u compat        | OK
> k 32 non-compat | u non-compat    | OK
>
> The results are as expected :
> - on 32-bit user-space and kernel, every configuration works.
> - on 64-bit user-space and kernel, every configuration works.
> - with 32-bit user-space on a 64-bit kernel the only configuration
>   where it works is when the compat patch is applied everywhere.
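The rows of the quoted matrix that mix patched tools with an unpatched kernel presumably pass because the patched tools probe the new ioctl once and fall back to the old one. Here is a minimal sketch of that probe-and-cache logic. It mirrors compat_ioctl_no_arg() from the patch, but a function pointer stands in for ioctl(2) so it can run without the tracer module. All demo_* names and command numbers are hypothetical, and the two simulated kernels are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command numbers; not the real LTTng ioctl values. */
enum { DEMO_OLD_CMD = 0x40, DEMO_NEW_CMD = 0x45 };

/* Stand-in for ioctl(2) so the sketch runs without a tracer module. */
typedef int (*demo_ioctl_fn)(int fd, unsigned long cmd);

/* -1: not probed yet, 0: new ABI accepted, 1: fall back to old ABI. */
static int demo_use_old_abi = -1;

static int demo_compat_ioctl_no_arg(demo_ioctl_fn do_ioctl, int fd,
		unsigned long oldcmd, unsigned long newcmd)
{
	int ret;

	if (demo_use_old_abi == -1) {
		/* First call: probe with the new command and cache the result. */
		ret = do_ioctl(fd, newcmd);
		if (!ret) {
			demo_use_old_abi = 0;
			return ret;
		}
		demo_use_old_abi = 1;
	}
	return do_ioctl(fd, demo_use_old_abi ? oldcmd : newcmd);
}

/* Simulated kernel without the compat patch: only the old command exists. */
static int demo_kernel_non_compat(int fd, unsigned long cmd)
{
	(void) fd;
	if (cmd == DEMO_OLD_CMD)
		return 0;
	errno = ENOTTY;
	return -1;
}

/* Simulated kernel with the compat patch: both commands are accepted. */
static int demo_kernel_compat(int fd, unsigned long cmd)
{
	(void) fd;
	if (cmd == DEMO_NEW_CMD || cmd == DEMO_OLD_CMD)
		return 0;
	errno = ENOTTY;
	return -1;
}
```

Like the patch, the sketch treats only a zero return from the probe as success, so the first probing call should be one that returns 0 rather than a file descriptor.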
> > Signed-off-by: Julien Desfossez > --- > src/bin/lttng-sessiond/trace-kernel.h | 1 + > src/common/kernel-ctl/kernel-ctl.c | 210 +++++++++++++++++++++++++++++---- > src/common/kernel-ctl/kernel-ctl.h | 1 + > src/common/kernel-ctl/kernel-ioctl.h | 74 ++++++++---- > src/common/lttng-kernel-old.h | 117 ++++++++++++++++++ > src/common/lttng-kernel.h | 31 +++-- > 6 files changed, 385 insertions(+), 49 deletions(-) > create mode 100644 src/common/lttng-kernel-old.h > > diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h > index f04d9e7..c86cc27 100644 > --- a/src/bin/lttng-sessiond/trace-kernel.h > +++ b/src/bin/lttng-sessiond/trace-kernel.h > @@ -22,6 +22,7 @@ > > #include > #include > +#include > > #include "consumer.h" > > diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c > index 1396cd9..ab29dd0 100644 > --- a/src/common/kernel-ctl/kernel-ctl.c > +++ b/src/common/kernel-ctl/kernel-ctl.c > @@ -18,38 +18,163 @@ > > #define __USE_LINUX_IOCTL_DEFS > #include > +#include > > #include "kernel-ctl.h" > #include "kernel-ioctl.h" > > +/* > + * This flag indicates whether lttng-tools must use the new or the old kernel > + * ABI (without compat support for 32/64 bits). 
> + */ > +static int lttng_kernel_use_old_abi = -1; > + > +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, > + unsigned long newname) > +{ > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, newname); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + ret = ioctl(fd, oldname); > + } else { > + ret = ioctl(fd, newname); > + } > + > +end: > + return ret; > +} > + > int kernctl_create_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, > + LTTNG_KERNEL_SESSION); > } > > /* open the metadata global channel */ > int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) > { > - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); > + struct lttng_kernel_old_channel old_channel; > + struct lttng_kernel_channel channel; > + > + if (lttng_kernel_use_old_abi) { > + old_channel.overwrite = chops->overwrite; > + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; > + old_channel.switch_timer_interval = chops->switch_timer_interval; > + old_channel.read_timer_interval = chops->read_timer_interval; > + old_channel.output = chops->output; > + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); > + > + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); > + } > + > + channel.overwrite = chops->overwrite; > + channel.subbuf_size = chops->subbuf_size; > + channel.num_subbuf = chops->num_subbuf; > + channel.switch_timer_interval = chops->switch_timer_interval; > + channel.read_timer_interval = chops->read_timer_interval; > + channel.output = chops->output; > + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); > + > + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); > } > > int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) > { > - return ioctl(fd, 
LTTNG_KERNEL_CHANNEL, chops); > + struct lttng_kernel_old_channel old_channel; > + struct lttng_kernel_channel channel; > + > + if (lttng_kernel_use_old_abi) { > + old_channel.overwrite = chops->overwrite; > + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; > + old_channel.switch_timer_interval = chops->switch_timer_interval; > + old_channel.read_timer_interval = chops->read_timer_interval; > + old_channel.output = chops->output; > + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); > + > + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); > + } > + > + channel.overwrite = chops->overwrite; > + channel.subbuf_size = chops->subbuf_size; > + channel.num_subbuf = chops->num_subbuf; > + channel.switch_timer_interval = chops->switch_timer_interval; > + channel.read_timer_interval = chops->read_timer_interval; > + channel.output = chops->output; > + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); > + > + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); > } > > int kernctl_create_stream(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_STREAM); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, > + LTTNG_KERNEL_STREAM); > } > > int kernctl_create_event(int fd, struct lttng_kernel_event *ev) > { > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_event old_event; > + > + memcpy(old_event.name, ev->name, sizeof(old_event.name)); > + old_event.instrumentation = ev->instrumentation; > + switch (ev->instrumentation) { > + case LTTNG_KERNEL_KPROBE: > + old_event.u.kprobe.addr = ev->u.kprobe.addr; > + old_event.u.kprobe.offset = ev->u.kprobe.offset; > + memcpy(old_event.u.kprobe.symbol_name, > + ev->u.kprobe.symbol_name, > + sizeof(old_event.u.kprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_KRETPROBE: > + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; > + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; > + memcpy(old_event.u.kretprobe.symbol_name, > +
ev->u.kretprobe.symbol_name, > + sizeof(old_event.u.kretprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_FUNCTION: > + memcpy(old_event.u.ftrace.symbol_name, > + ev->u.ftrace.symbol_name, > + sizeof(old_event.u.ftrace.symbol_name)); > + break; > + default: > + break; > + } > + > + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, &old_event); > + } > return ioctl(fd, LTTNG_KERNEL_EVENT, ev); > } > > int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > { > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_context old_ctx; > + > + old_ctx.ctx = ctx->ctx; > + /* only type that uses the union */ > + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { > + old_ctx.u.perf_counter.type = > + ctx->u.perf_counter.type; > + old_ctx.u.perf_counter.config = > + ctx->u.perf_counter.config; > + memcpy(old_ctx.u.perf_counter.name, > + ctx->u.perf_counter.name, > + sizeof(old_ctx.u.perf_counter.name)); > + } > + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); > + } > return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); > } > > @@ -57,44 +182,96 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > /* Enable event, channel and session ioctl */ > int kernctl_enable(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_ENABLE); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, > + LTTNG_KERNEL_ENABLE); > } > > /* Disable event, channel and session ioctl */ > int kernctl_disable(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_DISABLE); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, > + LTTNG_KERNEL_DISABLE); > } > > int kernctl_start_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION_START); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, > + LTTNG_KERNEL_SESSION_START); > } > > int kernctl_stop_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, > + LTTNG_KERNEL_SESSION_STOP); > } > > - > int kernctl_tracepoint_list(int 
fd) > { > - return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, > + LTTNG_KERNEL_TRACEPOINT_LIST); > } > > int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) > { > - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_tracer_version old_v; > + > + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); > + if (ret) > + goto end; > + v->major = old_v.major; > + v->minor = old_v.minor; > + v->patchlevel = old_v.patchlevel; > + } else { > + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + } > + > +end: > + return ret; > } > > int kernctl_wait_quiescent(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, > + LTTNG_KERNEL_WAIT_QUIESCENT); > } > > int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) > { > - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_calibrate old_calibrate; > + > + old_calibrate.type = calibrate->type; > + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); > + if (ret) > + goto end; > + calibrate->type = old_calibrate.type; > + } else { > + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + } > + > +end: > + return ret; > } > > > @@ -193,10 +370,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) > { > return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); > } > - > 
-/* Get the offset of the stream_id in the packet header */ > -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) > -{ > - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); > - > -} > diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h > index 18712d9..85a3a18 100644 > --- a/src/common/kernel-ctl/kernel-ctl.h > +++ b/src/common/kernel-ctl/kernel-ctl.h > @@ -21,6 +21,7 @@ > > #include > #include > +#include > > int kernctl_create_session(int fd); > int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); > diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h > index 35942be..1d34222 100644 > --- a/src/common/kernel-ctl/kernel-ioctl.h > +++ b/src/common/kernel-ctl/kernel-ioctl.h > @@ -49,37 +49,69 @@ > /* map stream to stream id for network streaming */ > #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) > > +/* Old ABI (without support for 32/64 bits compat) */ > +/* LTTng file descriptor ioctl */ > +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) > +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ > + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) > +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_OLD_CALIBRATE \ > + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) > + > +/* Session FD ioctl */ > +#define LTTNG_KERNEL_OLD_METADATA \ > + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_CHANNEL \ > + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) > +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) > + > +/* Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) > +#define LTTNG_KERNEL_OLD_EVENT \ > + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) > +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ > + _IOR(0xF6, 0x62, unsigned long) > > +/* 
Event and Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_CONTEXT \ > + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) > + > +/* Event, Channel and Session ioctl */ > +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) > +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) > + > + > +/* New ABI (with suport for 32/64 bits compat) */ > /* LTTng file descriptor ioctl */ > -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) > -#define LTTNG_KERNEL_TRACER_VERSION \ > - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) > -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) > -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) > +#define LTTNG_KERNEL_TRACER_VERSION \ > + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) > +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) > +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) > #define LTTNG_KERNEL_CALIBRATE \ > - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) > + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) > > /* Session FD ioctl */ > -#define LTTNG_KERNEL_METADATA \ > - _IOW(0xF6, 0x50, struct lttng_channel_attr) > -#define LTTNG_KERNEL_CHANNEL \ > - _IOW(0xF6, 0x51, struct lttng_channel_attr) > -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) > -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) > +#define LTTNG_KERNEL_METADATA \ > + _IOW(0xF6, 0x54, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_CHANNEL \ > + _IOW(0xF6, 0x55, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) > +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > > /* Channel FD ioctl */ > -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) > -#define LTTNG_KERNEL_EVENT \ > - _IOW(0xF6, 0x61, struct lttng_kernel_event) > -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ > - _IOR(0xF6, 0x62, unsigned long) > +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) > +#define LTTNG_KERNEL_EVENT \ > + _IOW(0xF6, 0x63, struct lttng_kernel_event) > > /* Event and Channel FD ioctl */ > 
-#define LTTNG_KERNEL_CONTEXT \ > - _IOW(0xF6, 0x70, struct lttng_kernel_context) > +#define LTTNG_KERNEL_CONTEXT \ > + _IOW(0xF6, 0x71, struct lttng_kernel_context) > > /* Event, Channel and Session ioctl */ > -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) > -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) > +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) > +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > > #endif /* _LTT_KERNEL_IOCTL_H */ > diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h > new file mode 100644 > index 0000000..0579751 > --- /dev/null > +++ b/src/common/lttng-kernel-old.h > @@ -0,0 +1,117 @@ > +/* > + * Copyright (C) 2011 - Julien Desfossez > + * Mathieu Desnoyers > + * David Goulet > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2 only, > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along > + * with this program; if not, write to the Free Software Foundation, Inc., > + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > + */ > + > +#ifndef _LTTNG_KERNEL_OLD_H > +#define _LTTNG_KERNEL_OLD_H > + > +#include > +#include > + > +#define LTTNG_KERNEL_OLD_SYM_NAME_LEN 256 no need for LTTNG_KERNEL_OLD_SYM_NAME_LEN. please use LTTNG_KERNEL_SYM_NAME_LEN like the lttng-modules header implementation. Thanks, Mathieu > + > +/* > + * LTTng DebugFS ABI structures. > + * > + * This is the kernel ABI copied from lttng-modules tree. 
> + */ > + > +/* Perf counter attributes */ > +struct lttng_kernel_old_perf_counter_ctx { > + uint32_t type; > + uint64_t config; > + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +/* Event/Channel context */ > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_context { > + enum lttng_kernel_context_type ctx; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; > + > + union { > + struct lttng_kernel_old_perf_counter_ctx perf_counter; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_kretprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +/* > + * Either addr is used, or symbol_name and offset. > + */ > +struct lttng_kernel_old_kprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +/* Function tracer */ > +struct lttng_kernel_old_function { > + char symbol_name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > +}; > + > +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_OLD_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_event { > + char name[LTTNG_KERNEL_OLD_SYM_NAME_LEN]; > + enum lttng_kernel_instrumentation instrumentation; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; > + > + /* Per instrumentation type configuration */ > + union { > + struct lttng_kernel_old_kretprobe kretprobe; > + struct lttng_kernel_old_kprobe kprobe; > + struct lttng_kernel_old_function ftrace; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_tracer_version { > + uint32_t major; > + uint32_t minor; > + uint32_t patchlevel; > +}; > + > +struct lttng_kernel_old_calibrate { > + enum lttng_kernel_calibrate_type type; /* type (input) */ > +}; > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 
LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_old_channel { > + int overwrite; /* 1: overwrite, 0: discard */ > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + enum lttng_event_output output; /* splice, mmap */ > + > + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; > +}; > + > +#endif /* _LTTNG_KERNEL_OLD_H */ > diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h > index dbeb6aa..ac881bf 100644 > --- a/src/common/lttng-kernel.h > +++ b/src/common/lttng-kernel.h > @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { > uint32_t type; > uint64_t config; > char name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Event/Channel context */ > #define LTTNG_KERNEL_CONTEXT_PADDING1 16 > @@ -72,14 +72,14 @@ struct lttng_kernel_context { > struct lttng_kernel_perf_counter_ctx perf_counter; > char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_kretprobe { > uint64_t addr; > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* > * Either addr is used, or symbol_name and offset. 
> @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Function tracer */ > struct lttng_kernel_function { > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > #define LTTNG_KERNEL_EVENT_PADDING1 16 > #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > @@ -110,13 +110,13 @@ struct lttng_kernel_event { > struct lttng_kernel_function ftrace; > char padding[LTTNG_KERNEL_EVENT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_tracer_version { > uint32_t major; > uint32_t minor; > uint32_t patchlevel; > -}; > +}__attribute__((packed)); > > enum lttng_kernel_calibrate_type { > LTTNG_KERNEL_CALIBRATE_KRETPROBE, > @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { > > struct lttng_kernel_calibrate { > enum lttng_kernel_calibrate_type type; /* type (input) */ > -}; > +}__attribute__((packed)); > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_channel { > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + int overwrite; /* 1: overwrite, 0: discard */ > + enum lttng_event_output output; /* splice, mmap */ > + > + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > +}__attribute__((packed)); > > #endif /* _LTTNG_KERNEL_H */ > -- > 1.7.9.5 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From jdesfossez at efficios.com Mon Oct 1 16:42:20 2012 From: jdesfossez at efficios.com (Julien Desfossez) Date: Mon, 1 Oct 2012 16:42:20 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v4] ABI with support for compat 32/64 bits Message-ID: <1349124140-3224-1-git-send-email-jdesfossez@efficios.com> The current ABI does not work for compat 32/64 bits. 
This patch moves the current ABI as old-abi and provides a new ABI in which all the structures exchanged between user and kernel-space are packed. Also this new ABI moves the "int overwrite" member of the struct lttng_kernel_channel to remove the alignment added by the compiler.

A patch for lttng-modules has been developed in parallel to this one to support the new ABI. These 2 patches have been tested in all possible configurations (applied or not) on 64-bit and 32-bit kernels (with CONFIG_COMPAT) and a user-space in 32 and 64-bit.

Here are the results of the tests :
k 64 compat     | u 32 compat     | OK
k 64 compat     | u 64 compat     | OK
k 64 compat     | u 32 non-compat | KO
k 64 compat     | u 64 non-compat | OK

k 64 non-compat | u 64 compat     | OK
k 64 non-compat | u 32 compat     | KO
k 64 non-compat | u 64 non-compat | OK
k 64 non-compat | u 32 non-compat | KO

k 32 compat     | u compat        | OK
k 32 compat     | u non-compat    | OK

k 32 non-compat | u compat        | OK
k 32 non-compat | u non-compat    | OK

The results are as expected :
- on 32-bit user-space and kernel, every configuration works.
- on 64-bit user-space and kernel, every configuration works.
- with 32-bit user-space on a 64-bit kernel the only configuration where it works is when the compat patch is applied everywhere.
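[Editorial illustration, not part of the patch.] The alignment problem the commit message describes can be sketched with two stand-in structures. Everything named demo_* or DEMO_* below is made up for this example, the padding length is an assumed value, and the output member is a plain int standing in for the enum; the real definitions live in the lttng-tools and lttng-modules headers. The sketch assumes GCC or Clang (for __attribute__((packed))) and a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical padding length, chosen only for this demo. */
#define DEMO_PADDING_LEN (256 + 32)

/*
 * Old-style layout: natural alignment, 32-bit member first.
 * The compiler may insert padding after "overwrite" so that the
 * following uint64_t meets the alignment of the target ABI.
 */
struct demo_old_channel {
	int overwrite;
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int output;			/* stand-in for the enum */
	char padding[DEMO_PADDING_LEN];
};

/*
 * New-style layout: 64-bit members first, structure packed.
 * No compiler-inserted padding, so the layout no longer depends
 * on how the target ABI aligns uint64_t.
 */
struct demo_new_channel {
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int overwrite;
	int output;			/* stand-in for the enum */
	char padding[DEMO_PADDING_LEN];
} __attribute__((packed));

/* Checks that hold wherever int and unsigned int are 4 bytes wide. */
void demo_check_new_layout(void)
{
	assert(offsetof(struct demo_new_channel, subbuf_size) == 0);
	assert(offsetof(struct demo_new_channel, overwrite) == 24);
	assert(sizeof(struct demo_new_channel) == 32 + DEMO_PADDING_LEN);
}

/* Print both layouts so builds for different ABIs can be compared. */
void demo_print_layouts(void)
{
	printf("old: subbuf_size at %zu, sizeof %zu\n",
	       offsetof(struct demo_old_channel, subbuf_size),
	       sizeof(struct demo_old_channel));
	printf("new: subbuf_size at %zu, sizeof %zu\n",
	       offsetof(struct demo_new_channel, subbuf_size),
	       sizeof(struct demo_new_channel));
}
```

Calling demo_print_layouts() from a small main() and building the program once as a 64-bit and once as a 32-bit x86 binary should show the "old" line changing between the two builds while the "new" line stays the same. That difference is presumably what makes a 32-bit user-space and a 64-bit kernel disagree on the old structures.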
Signed-off-by: Julien Desfossez --- src/bin/lttng-sessiond/trace-kernel.h | 1 + src/common/kernel-ctl/kernel-ctl.c | 210 +++++++++++++++++++++++++++++---- src/common/kernel-ctl/kernel-ctl.h | 1 + src/common/kernel-ctl/kernel-ioctl.h | 74 ++++++++---- src/common/lttng-kernel-old.h | 115 ++++++++++++++++++ src/common/lttng-kernel.h | 31 +++-- 6 files changed, 383 insertions(+), 49 deletions(-) create mode 100644 src/common/lttng-kernel-old.h diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h index f04d9e7..c86cc27 100644 --- a/src/bin/lttng-sessiond/trace-kernel.h +++ b/src/bin/lttng-sessiond/trace-kernel.h @@ -22,6 +22,7 @@ #include #include +#include #include "consumer.h" diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c index 1396cd9..ab29dd0 100644 --- a/src/common/kernel-ctl/kernel-ctl.c +++ b/src/common/kernel-ctl/kernel-ctl.c @@ -18,38 +18,163 @@ #define __USE_LINUX_IOCTL_DEFS #include +#include #include "kernel-ctl.h" #include "kernel-ioctl.h" +/* + * This flag indicates whether lttng-tools must use the new or the old kernel + * ABI (without compat support for 32/64 bits). 
+ */ +static int lttng_kernel_use_old_abi = -1; + +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, + unsigned long newname) +{ + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, newname); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + ret = ioctl(fd, oldname); + } else { + ret = ioctl(fd, newname); + } + +end: + return ret; +} + int kernctl_create_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, + LTTNG_KERNEL_SESSION); } /* open the metadata global channel */ int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) { - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); + struct lttng_kernel_old_channel old_channel; + struct lttng_kernel_channel channel; + + if (lttng_kernel_use_old_abi) { + old_channel.overwrite = chops->overwrite; + old_channel.subbuf_size = chops->subbuf_size; + old_channel.num_subbuf = chops->num_subbuf; + old_channel.switch_timer_interval = chops->switch_timer_interval; + old_channel.read_timer_interval = chops->read_timer_interval; + old_channel.output = chops->output; + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); + } + + channel.overwrite = chops->overwrite; + channel.subbuf_size = chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + channel.switch_timer_interval = chops->switch_timer_interval; + channel.read_timer_interval = chops->read_timer_interval; + channel.output = chops->output; + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); } int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) { - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); + struct lttng_kernel_old_channel old_channel; + struct lttng_kernel_channel channel; + + if 
(lttng_kernel_use_old_abi) { + old_channel.overwrite = chops->overwrite; + old_channel.subbuf_size = chops->subbuf_size; + old_channel.num_subbuf = chops->num_subbuf; + old_channel.switch_timer_interval = chops->switch_timer_interval; + old_channel.read_timer_interval = chops->read_timer_interval; + old_channel.output = chops->output; + memcpy(old_channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); + } + + channel.overwrite = chops->overwrite; + channel.subbuf_size = chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + channel.switch_timer_interval = chops->switch_timer_interval; + channel.read_timer_interval = chops->read_timer_interval; + channel.output = chops->output; + memcpy(channel.padding, chops->padding, sizeof(chops->padding)); + + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); } int kernctl_create_stream(int fd) { - return ioctl(fd, LTTNG_KERNEL_STREAM); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, + LTTNG_KERNEL_STREAM); } int kernctl_create_event(int fd, struct lttng_kernel_event *ev) { + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_event old_event; + + memcpy(old_event.name, ev->name, sizeof(old_event.name)); + old_event.instrumentation = ev->instrumentation; + switch (ev->instrumentation) { + case LTTNG_KERNEL_KPROBE: + old_event.u.kprobe.addr = ev->u.kprobe.addr; + old_event.u.kprobe.offset = ev->u.kprobe.offset; + memcpy(old_event.u.kprobe.symbol_name, + ev->u.kprobe.symbol_name, + sizeof(old_event.u.kprobe.symbol_name)); + break; + case LTTNG_KERNEL_KRETPROBE: + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; + memcpy(old_event.u.kretprobe.symbol_name, + ev->u.kretprobe.symbol_name, + sizeof(old_event.u.kretprobe.symbol_name)); + break; + case LTTNG_KERNEL_FUNCTION: + memcpy(old_event.u.ftrace.symbol_name, + ev->u.ftrace.symbol_name, + sizeof(old_event.u.ftrace.symbol_name)); + break; + 
default: + break; + } + + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, &old_event); + } return ioctl(fd, LTTNG_KERNEL_EVENT, ev); } int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) { + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_context old_ctx; + + old_ctx.ctx = ctx->ctx; + /* only type that uses the union */ + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { + old_ctx.u.perf_counter.type = + ctx->u.perf_counter.type; + old_ctx.u.perf_counter.config = + ctx->u.perf_counter.config; + memcpy(old_ctx.u.perf_counter.name, + ctx->u.perf_counter.name, + sizeof(old_ctx.u.perf_counter.name)); + } + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); + } return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); } @@ -57,44 +182,96 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) /* Enable event, channel and session ioctl */ int kernctl_enable(int fd) { - return ioctl(fd, LTTNG_KERNEL_ENABLE); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, + LTTNG_KERNEL_ENABLE); } /* Disable event, channel and session ioctl */ int kernctl_disable(int fd) { - return ioctl(fd, LTTNG_KERNEL_DISABLE); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, + LTTNG_KERNEL_DISABLE); } int kernctl_start_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION_START); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, + LTTNG_KERNEL_SESSION_START); } int kernctl_stop_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, + LTTNG_KERNEL_SESSION_STOP); } - int kernctl_tracepoint_list(int fd) { - return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, + LTTNG_KERNEL_TRACEPOINT_LIST); } int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) { - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, 
LTTNG_KERNEL_TRACER_VERSION, v); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_tracer_version old_v; + + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); + if (ret) + goto end; + v->major = old_v.major; + v->minor = old_v.minor; + v->patchlevel = old_v.patchlevel; + } else { + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + } + +end: + return ret; } int kernctl_wait_quiescent(int fd) { - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, + LTTNG_KERNEL_WAIT_QUIESCENT); } int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) { - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_calibrate old_calibrate; + + old_calibrate.type = calibrate->type; + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); + if (ret) + goto end; + calibrate->type = old_calibrate.type; + } else { + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + } + +end: + return ret; } @@ -193,10 +370,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) { return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); } - -/* Get the offset of the stream_id in the packet header */ -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) -{ - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); - -} diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h index 18712d9..85a3a18 100644 --- a/src/common/kernel-ctl/kernel-ctl.h +++ b/src/common/kernel-ctl/kernel-ctl.h @@ -21,6 +21,7 @@ #include #include +#include int kernctl_create_session(int fd); int kernctl_open_metadata(int fd, 
struct lttng_channel_attr *chops); diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h index 35942be..1d34222 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ b/src/common/kernel-ctl/kernel-ioctl.h @@ -49,37 +49,69 @@ /* map stream to stream id for network streaming */ #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) +/* Old ABI (without support for 32/64 bits compat) */ +/* LTTng file descriptor ioctl */ +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define LTTNG_KERNEL_OLD_CALIBRATE \ + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) + +/* Session FD ioctl */ +#define LTTNG_KERNEL_OLD_METADATA \ + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) +#define LTTNG_KERNEL_OLD_CHANNEL \ + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) + +/* Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) +#define LTTNG_KERNEL_OLD_EVENT \ + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ + _IOR(0xF6, 0x62, unsigned long) +/* Event and Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_CONTEXT \ + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) + +/* Event, Channel and Session ioctl */ +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) + + +/* New ABI (with suport for 32/64 bits compat) */ /* LTTng file descriptor ioctl */ -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) -#define LTTNG_KERNEL_TRACER_VERSION \ - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define 
LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) +#define LTTNG_KERNEL_TRACER_VERSION \ + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) #define LTTNG_KERNEL_CALIBRATE \ - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) /* Session FD ioctl */ -#define LTTNG_KERNEL_METADATA \ - _IOW(0xF6, 0x50, struct lttng_channel_attr) -#define LTTNG_KERNEL_CHANNEL \ - _IOW(0xF6, 0x51, struct lttng_channel_attr) -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) +#define LTTNG_KERNEL_METADATA \ + _IOW(0xF6, 0x54, struct lttng_kernel_channel) +#define LTTNG_KERNEL_CHANNEL \ + _IOW(0xF6, 0x55, struct lttng_kernel_channel) +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) /* Channel FD ioctl */ -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) -#define LTTNG_KERNEL_EVENT \ - _IOW(0xF6, 0x61, struct lttng_kernel_event) -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, unsigned long) +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) +#define LTTNG_KERNEL_EVENT \ + _IOW(0xF6, 0x63, struct lttng_kernel_event) /* Event and Channel FD ioctl */ -#define LTTNG_KERNEL_CONTEXT \ - _IOW(0xF6, 0x70, struct lttng_kernel_context) +#define LTTNG_KERNEL_CONTEXT \ + _IOW(0xF6, 0x71, struct lttng_kernel_context) /* Event, Channel and Session ioctl */ -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) #endif /* _LTT_KERNEL_IOCTL_H */ diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h new file mode 100644 index 0000000..1b8999a --- /dev/null +++ b/src/common/lttng-kernel-old.h @@ -0,0 +1,115 @@ +/* + * Copyright (C) 2011 - Julien Desfossez + * Mathieu Desnoyers + * David Goulet + * + * This program 
is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef _LTTNG_KERNEL_OLD_H +#define _LTTNG_KERNEL_OLD_H + +#include +#include + +/* + * LTTng DebugFS ABI structures. + * + * This is the kernel ABI copied from lttng-modules tree. + */ + +/* Perf counter attributes */ +struct lttng_kernel_old_perf_counter_ctx { + uint32_t type; + uint64_t config; + char name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* Event/Channel context */ +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 +struct lttng_kernel_old_context { + enum lttng_kernel_context_type ctx; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; + + union { + struct lttng_kernel_old_perf_counter_ctx perf_counter; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; + } u; +}; + +struct lttng_kernel_old_kretprobe { + uint64_t addr; + + uint64_t offset; + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* + * Either addr is used, or symbol_name and offset. 
+ */ +struct lttng_kernel_old_kprobe { + uint64_t addr; + + uint64_t offset; + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* Function tracer */ +struct lttng_kernel_old_function { + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 +struct lttng_kernel_old_event { + char name[LTTNG_KERNEL_SYM_NAME_LEN]; + enum lttng_kernel_instrumentation instrumentation; + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; + + /* Per instrumentation type configuration */ + union { + struct lttng_kernel_old_kretprobe kretprobe; + struct lttng_kernel_old_kprobe kprobe; + struct lttng_kernel_old_function ftrace; + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; + } u; +}; + +struct lttng_kernel_old_tracer_version { + uint32_t major; + uint32_t minor; + uint32_t patchlevel; +}; + +struct lttng_kernel_old_calibrate { + enum lttng_kernel_calibrate_type type; /* type (input) */ +}; + +/* + * kernel channel + */ +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 +struct lttng_kernel_old_channel { + int overwrite; /* 1: overwrite, 0: discard */ + uint64_t subbuf_size; /* bytes */ + uint64_t num_subbuf; /* power of 2 */ + unsigned int switch_timer_interval; /* usec */ + unsigned int read_timer_interval; /* usec */ + enum lttng_event_output output; /* splice, mmap */ + + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; +}; + +#endif /* _LTTNG_KERNEL_OLD_H */ diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h index dbeb6aa..ac881bf 100644 --- a/src/common/lttng-kernel.h +++ b/src/common/lttng-kernel.h @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { uint32_t type; uint64_t config; char name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* Event/Channel context */ #define LTTNG_KERNEL_CONTEXT_PADDING1 16 @@ -72,14 +72,14 @@ struct lttng_kernel_context { struct lttng_kernel_perf_counter_ctx perf_counter; char 
padding[LTTNG_KERNEL_CONTEXT_PADDING2]; } u; -}; +}__attribute__((packed)); struct lttng_kernel_kretprobe { uint64_t addr; uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* * Either addr is used, or symbol_name and offset. @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* Function tracer */ struct lttng_kernel_function { char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); #define LTTNG_KERNEL_EVENT_PADDING1 16 #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 @@ -110,13 +110,13 @@ struct lttng_kernel_event { struct lttng_kernel_function ftrace; char padding[LTTNG_KERNEL_EVENT_PADDING2]; } u; -}; +}__attribute__((packed)); struct lttng_kernel_tracer_version { uint32_t major; uint32_t minor; uint32_t patchlevel; -}; +}__attribute__((packed)); enum lttng_kernel_calibrate_type { LTTNG_KERNEL_CALIBRATE_KRETPROBE, @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { struct lttng_kernel_calibrate { enum lttng_kernel_calibrate_type type; /* type (input) */ -}; +}__attribute__((packed)); + +/* + * kernel channel + */ +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 +struct lttng_kernel_channel { + uint64_t subbuf_size; /* bytes */ + uint64_t num_subbuf; /* power of 2 */ + unsigned int switch_timer_interval; /* usec */ + unsigned int read_timer_interval; /* usec */ + int overwrite; /* 1: overwrite, 0: discard */ + enum lttng_event_output output; /* splice, mmap */ + + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; +}__attribute__((packed)); #endif /* _LTTNG_KERNEL_H */ -- 1.7.9.5 From paul.chavent at fnac.net Mon Oct 1 16:27:09 2012 From: paul.chavent at fnac.net (Paul Chavent) Date: Mon, 01 Oct 2012 22:27:09 +0200 Subject: [lttng-dev] Wich interface to use for lttng ust In-Reply-To: <20121001145925.GC13423@Krystal> References: 
<14773128.185971349080793918.JavaMail.www@wsfrf1114> <20121001145925.GC13423@Krystal> Message-ID: <5069FC9D.1070303@fnac.net> Thanks. On 10/01/2012 04:59 PM, Mathieu Desnoyers wrote: > * paul.chavent at fnac.net (paul.chavent at fnac.net) wrote: >> Hi. >> >> I would like to try lttng user space traces. I've found two documentations : >> - manual : http://lttng.org/files/ust/manual/ust.html >> - man page : http://lttng.org/files/doc/man-pages/man3/lttng-ust.3.html >> >> Is there any "prefered" choice for reference ? > > lttng-ust(3). > >> Should/Can i use >> ust_marker, tracepoint ? > > ust_marker do not exist anymore. > > Alexandre, can you remove the old > http://lttng.org/files/ust/manual/ust.html 0.x manpage ? Or make sure > you move it somewhere that clearly states it is outdated ? > > Thanks, > > Mathieu > >> >> Thank for your replies. >> >> Paul. >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From mathieu.desnoyers at efficios.com Mon Oct 1 17:27:16 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 1 Oct 2012 17:27:16 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v4] ABI with support for compat 32/64 bits In-Reply-To: <1349124140-3224-1-git-send-email-jdesfossez@efficios.com> References: <1349124140-3224-1-git-send-email-jdesfossez@efficios.com> Message-ID: <20121001212716.GB22077@Krystal> * Julien Desfossez (jdesfossez at efficios.com) wrote: [...] > +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_channel { > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ -> > + int overwrite; /* 1: overwrite, 0: discard */ > + enum lttng_event_output output; /* splice, mmap */ field order mismatch with lttng-modules. 
Please re-review all structures between modules and tools headers. Thanks, Mathieu > + > + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > +}__attribute__((packed)); > > #endif /* _LTTNG_KERNEL_H */ > -- > 1.7.9.5 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Mon Oct 1 18:04:57 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 01 Oct 2012 18:04:57 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v4] ABI with support for compat 32/64 bits In-Reply-To: <1349124140-3224-1-git-send-email-jdesfossez@efficios.com> References: <1349124140-3224-1-git-send-email-jdesfossez@efficios.com> Message-ID: <506A1389.1000702@efficios.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Julien Desfossez: > The current ABI does not work for compat 32/64 bits. This patch > moves the current ABI as old-abi and provides a new ABI in which > all the structures exchanged between user and kernel-space are > packed. Also this new ABI moves the "int overwrite" member of the > struct lttng_kernel_channel to remove the alignment added by the > compiler. > > A patch for lttng-modules has been developed in parallel to this > one to support the new ABI. These 2 patches have been tested in > all possible configurations (applied or not) on 64-bit and 32-bit > kernels (with CONFIG_COMPAT) and a user-space in 32 and 64-bit. > > Here are the results of the tests : k 64 compat | u 32 compat > | OK k 64 compat | u 64 compat | OK k 64 compat | u 32 > non-compat | KO k 64 compat | u 64 non-compat | OK > > k 64 non-compat | u 64 compat | OK k 64 non-compat | u 32 > compat | KO k 64 non-compat | u 64 non-compat | OK k 64 > non-compat | u 32 non-compat | KO > > k 32 compat | u compat | OK k 32 compat | u > non-compat | OK > > k 32 non-compat | u compat | OK k 32 non-compat | u > non-compat | OK > > The results are as expected : - on 32-bit user-space and kernel, > every configuration works. 
- on 64-bit user-space and kernel, every > configuration works. - with 32-bit user-space on a 64-bit kernel > the only configuration where it works is when the compat patch is > applied everywhere. > > Signed-off-by: Julien Desfossez --- > src/bin/lttng-sessiond/trace-kernel.h | 1 + > src/common/kernel-ctl/kernel-ctl.c | 210 > +++++++++++++++++++++++++++++---- > src/common/kernel-ctl/kernel-ctl.h | 1 + > src/common/kernel-ctl/kernel-ioctl.h | 74 ++++++++---- > src/common/lttng-kernel-old.h | 115 ++++++++++++++++++ > src/common/lttng-kernel.h | 31 +++-- 6 files changed, > 383 insertions(+), 49 deletions(-) create mode 100644 > src/common/lttng-kernel-old.h > > diff --git a/src/bin/lttng-sessiond/trace-kernel.h > b/src/bin/lttng-sessiond/trace-kernel.h index f04d9e7..c86cc27 > 100644 --- a/src/bin/lttng-sessiond/trace-kernel.h +++ > b/src/bin/lttng-sessiond/trace-kernel.h @@ -22,6 +22,7 @@ > > #include #include +#include > > > #include "consumer.h" > > diff --git a/src/common/kernel-ctl/kernel-ctl.c > b/src/common/kernel-ctl/kernel-ctl.c index 1396cd9..ab29dd0 100644 > --- a/src/common/kernel-ctl/kernel-ctl.c +++ > b/src/common/kernel-ctl/kernel-ctl.c @@ -18,38 +18,163 @@ > > #define __USE_LINUX_IOCTL_DEFS #include +#include > > > #include "kernel-ctl.h" #include "kernel-ioctl.h" > > +/* + * This flag indicates whether lttng-tools must use the new or > the old kernel + * ABI (without compat support for 32/64 bits). + > */ As discussed, please add a better comment explaining what the "old kernel" is. Since this compat layer is due to be removed someday, please just explain why. > +static int lttng_kernel_use_old_abi = -1; + I'm a bit picky about this, but I really want each function in lttng-tools to be commented, with its arguments and return value briefly explained. Also, we normally have a location for compat layer "stuff", which is src/common/compat. In this case I'm OK with leaving it here since this is _supposed_ to be temporary. However, temporary stuff always stays forever...
so please add a clear comment stating that we plan to remove this compat layer at some point after 2.1 (maybe an XXX comment). I'm asking all this because, since this is named with _old*_, in a couple of months I'll need to understand what this is about and why it's not called "compat_...2_0_..." > +static inline int compat_ioctl_no_arg(int fd, unsigned long > oldname, + unsigned long newname) +{ + int ret; + + if > (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, newname); + > if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + > lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) > { + ret = ioctl(fd, oldname); + } else { + ret = ioctl(fd, > newname); + } + +end: + return ret; +} + int > kernctl_create_session(int fd) { - return ioctl(fd, > LTTNG_KERNEL_SESSION); + return compat_ioctl_no_arg(fd, > LTTNG_KERNEL_OLD_SESSION, + LTTNG_KERNEL_SESSION); } > > /* open the metadata global channel */ int > kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) { - > return ioctl(fd, LTTNG_KERNEL_METADATA, chops); + struct > lttng_kernel_old_channel old_channel; + struct lttng_kernel_channel > channel; + + if (lttng_kernel_use_old_abi) { + > old_channel.overwrite = chops->overwrite; + > old_channel.subbuf_size = chops->subbuf_size; + > old_channel.num_subbuf = chops->num_subbuf; + > old_channel.switch_timer_interval = chops->switch_timer_interval; + > old_channel.read_timer_interval = chops->read_timer_interval; + > old_channel.output = chops->output; + memcpy(old_channel.padding, > chops->padding, sizeof(chops->padding)); The size of this memcpy should be that of old_channel.padding, since it is the destination buffer. I know this is a "controlled buffer and known size", but it can become error-prone in the future with any change to the current ABI. (More to fix below.)
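To illustrate the remark about sizing the copy by the destination, here is a minimal sketch. The structure names (`src_attr`, `dst_channel`) and the padding sizes are hypothetical stand-ins chosen for the example, not the real lttng-tools definitions; the point is only that the length passed to memcpy() is bounded by the destination array, so a later change to either structure cannot turn the copy into an overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration, not the lttng-tools structures. */
struct src_attr {
	int overwrite;
	char padding[288];
};

struct dst_channel {
	int overwrite;
	char padding[256];	/* deliberately smaller than the source */
};

/*
 * Copy the attributes into the destination structure.
 * The copy length is bounded by the destination buffer first, then by
 * the source buffer, so neither side is read or written out of bounds.
 * Returns the number of padding bytes copied.
 */
static size_t copy_attr(struct dst_channel *dst, const struct src_attr *src)
{
	size_t len = sizeof(dst->padding);

	if (len > sizeof(src->padding))
		len = sizeof(src->padding);
	/* Clear the whole destination in case the source is shorter. */
	memset(dst->padding, 0, sizeof(dst->padding));
	memcpy(dst->padding, src->padding, len);
	dst->overwrite = src->overwrite;
	return len;
}
```

With these assumed sizes the function copies 256 bytes, the size of the destination array, even though the source array holds 288.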
> + + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); + > } + + channel.overwrite = chops->overwrite; + channel.subbuf_size = > chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + > channel.switch_timer_interval = chops->switch_timer_interval; + > channel.read_timer_interval = chops->read_timer_interval; + > channel.output = chops->output; + memcpy(channel.padding, > chops->padding, sizeof(chops->padding)); Here. > + + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); } > > int kernctl_create_channel(int fd, struct lttng_channel_attr > *chops) { - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); + struct > lttng_kernel_old_channel old_channel; Move this inside the if(OLD). > + struct lttng_kernel_channel channel; + + if > (lttng_kernel_use_old_abi) { + old_channel.overwrite = > chops->overwrite; + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; + > old_channel.switch_timer_interval = chops->switch_timer_interval; + > old_channel.read_timer_interval = chops->read_timer_interval; + > old_channel.output = chops->output; + memcpy(old_channel.padding, > chops->padding, sizeof(chops->padding)); Here. > + + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); + } > + + channel.overwrite = chops->overwrite; + channel.subbuf_size = > chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + > channel.switch_timer_interval = chops->switch_timer_interval; + > channel.read_timer_interval = chops->read_timer_interval; + > channel.output = chops->output; + memcpy(channel.padding, > chops->padding, sizeof(chops->padding)); Here. 
> + + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); } > > int kernctl_create_stream(int fd) { - return ioctl(fd, > LTTNG_KERNEL_STREAM); + return compat_ioctl_no_arg(fd, > LTTNG_KERNEL_OLD_STREAM, + LTTNG_KERNEL_STREAM); } > > int kernctl_create_event(int fd, struct lttng_kernel_event *ev) { + > if (lttng_kernel_use_old_abi) { + struct lttng_kernel_event > old_event; Shouldn't that be lttng_kernel_old_event here ? > + + memcpy(old_event.name, ev->name, sizeof(old_event.name)); + > old_event.instrumentation = ev->instrumentation; + switch > (ev->instrumentation) { + case LTTNG_KERNEL_KPROBE: + > old_event.u.kprobe.addr = ev->u.kprobe.addr; + > old_event.u.kprobe.offset = ev->u.kprobe.offset; + > memcpy(old_event.u.kprobe.symbol_name, + > ev->u.kprobe.symbol_name, + > sizeof(old_event.u.kprobe.symbol_name)); + break; + case > LTTNG_KERNEL_KRETPROBE: + old_event.u.kretprobe.addr = > ev->u.kretprobe.addr; + old_event.u.kretprobe.offset = > ev->u.kretprobe.offset; + > memcpy(old_event.u.kretprobe.symbol_name, + > ev->u.kretprobe.symbol_name, + > sizeof(old_event.u.kretprobe.symbol_name)); + break; + case > LTTNG_KERNEL_FUNCTION: + memcpy(old_event.u.ftrace.symbol_name, + > ev->u.ftrace.symbol_name, + > sizeof(old_event.u.ftrace.symbol_name)); + break; + default: + > break; + } + + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, > &old_event); + } return ioctl(fd, LTTNG_KERNEL_EVENT, ev); } > > int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > { + if (lttng_kernel_use_old_abi) { + struct > lttng_kernel_old_context old_ctx; + + old_ctx.ctx = ctx->ctx; + > /* only type that uses the union */ + if (ctx->ctx == > LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { + > old_ctx.u.perf_counter.type = + ctx->u.perf_counter.type; + > old_ctx.u.perf_counter.config = + ctx->u.perf_counter.config; + > memcpy(old_ctx.u.perf_counter.name, + ctx->u.perf_counter.name, > + sizeof(old_ctx.u.perf_counter.name)); + } + return ioctl(fd, > LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); + } return 
ioctl(fd, > LTTNG_KERNEL_CONTEXT, ctx); } > > @@ -57,44 +182,96 @@ int kernctl_add_context(int fd, struct > lttng_kernel_context *ctx) /* Enable event, channel and session > ioctl */ int kernctl_enable(int fd) { - return ioctl(fd, > LTTNG_KERNEL_ENABLE); + return compat_ioctl_no_arg(fd, > LTTNG_KERNEL_OLD_ENABLE, + LTTNG_KERNEL_ENABLE); } > > /* Disable event, channel and session ioctl */ int > kernctl_disable(int fd) { - return ioctl(fd, > LTTNG_KERNEL_DISABLE); + return compat_ioctl_no_arg(fd, > LTTNG_KERNEL_OLD_DISABLE, + LTTNG_KERNEL_DISABLE); } > > int kernctl_start_session(int fd) { - return ioctl(fd, > LTTNG_KERNEL_SESSION_START); + return compat_ioctl_no_arg(fd, > LTTNG_KERNEL_OLD_SESSION_START, + LTTNG_KERNEL_SESSION_START); } > > int kernctl_stop_session(int fd) { - return ioctl(fd, > LTTNG_KERNEL_SESSION_STOP); + return compat_ioctl_no_arg(fd, > LTTNG_KERNEL_OLD_SESSION_STOP, + LTTNG_KERNEL_SESSION_STOP); } > > - int kernctl_tracepoint_list(int fd) { - return ioctl(fd, > LTTNG_KERNEL_TRACEPOINT_LIST); + return compat_ioctl_no_arg(fd, > LTTNG_KERNEL_OLD_TRACEPOINT_LIST, + > LTTNG_KERNEL_TRACEPOINT_LIST); } > > int kernctl_tracer_version(int fd, struct > lttng_kernel_tracer_version *v) { - return ioctl(fd, > LTTNG_KERNEL_TRACER_VERSION, v); + int ret; + + if > (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, > LTTNG_KERNEL_TRACER_VERSION, v); + if (!ret) { + > lttng_kernel_use_old_abi = 0; + goto end; + } + > lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) > { + struct lttng_kernel_old_tracer_version old_v; + + ret = > ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); + if (ret) + > goto end; {} missing > + v->major = old_v.major; + v->minor = old_v.minor; + > v->patchlevel = old_v.patchlevel; + } else { + ret = ioctl(fd, > LTTNG_KERNEL_TRACER_VERSION, v); + } + +end: + return ret; } > > int kernctl_wait_quiescent(int fd) { - return ioctl(fd, > LTTNG_KERNEL_WAIT_QUIESCENT); + return compat_ioctl_no_arg(fd, > 
LTTNG_KERNEL_OLD_WAIT_QUIESCENT, + LTTNG_KERNEL_WAIT_QUIESCENT); > } > > int kernctl_calibrate(int fd, struct lttng_kernel_calibrate > *calibrate) { - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, > calibrate); + int ret; + + if (lttng_kernel_use_old_abi == -1) { + > ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + if (!ret) { > + lttng_kernel_use_old_abi = 0; + goto end; + } + > lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) > { + struct lttng_kernel_old_calibrate old_calibrate; + + > old_calibrate.type = calibrate->type; + ret = ioctl(fd, > LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); + if (ret) + goto > end; {} missing > + calibrate->type = old_calibrate.type; + } else { + ret = > ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + } + +end: + return > ret; } > > > @@ -193,10 +370,3 @@ int kernctl_set_stream_id(int fd, unsigned > long *stream_id) { return ioctl(fd, RING_BUFFER_SET_STREAM_ID, > stream_id); } - -/* Get the offset of the stream_id in the packet > header */ -int kernctl_get_net_stream_id_offset(int fd, unsigned > long *offset) -{ - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, > offset); - -} diff --git a/src/common/kernel-ctl/kernel-ctl.h > b/src/common/kernel-ctl/kernel-ctl.h index 18712d9..85a3a18 100644 > --- a/src/common/kernel-ctl/kernel-ctl.h +++ > b/src/common/kernel-ctl/kernel-ctl.h @@ -21,6 +21,7 @@ > > #include #include +#include > > > int kernctl_create_session(int fd); int kernctl_open_metadata(int > fd, struct lttng_channel_attr *chops); diff --git > a/src/common/kernel-ctl/kernel-ioctl.h > b/src/common/kernel-ctl/kernel-ioctl.h index 35942be..1d34222 > 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ > b/src/common/kernel-ctl/kernel-ioctl.h @@ -49,37 +49,69 @@ /* map > stream to stream id for network streaming */ #define > RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned > long) > > +/* Old ABI (without support for 32/64 bits compat) */ +/* LTTng > file descriptor ioctl */ +#define LTTNG_KERNEL_OLD_SESSION > 
_IO(0xF6, 0x40) +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ > + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) > +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_OLD_CALIBRATE \ + _IOWR(0xF6, 0x44, struct > lttng_kernel_old_calibrate) + +/* Session FD ioctl */ +#define > LTTNG_KERNEL_OLD_METADATA \ + _IOW(0xF6, 0x50, > struct lttng_kernel_old_channel) +#define LTTNG_KERNEL_OLD_CHANNEL > \ + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) +#define > LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) +#define > LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) + +/* > Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_STREAM > _IO(0xF6, 0x60) +#define LTTNG_KERNEL_OLD_EVENT \ > + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) +#define > LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ + _IOR(0xF6, 0x62, > unsigned long) > > +/* Event and Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_CONTEXT > \ + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) + +/* Event, > Channel and Session ioctl */ +#define LTTNG_KERNEL_OLD_ENABLE > _IO(0xF6, 0x80) +#define LTTNG_KERNEL_OLD_DISABLE > _IO(0xF6, 0x81) + + +/* New ABI (with suport for 32/64 bits compat) > */ Maybe just use "Current ABI" with version number also. (2.1 in this case I guess). 
Thanks David > /* LTTng file descriptor ioctl */ -#define LTTNG_KERNEL_SESSION > _IO(0xF6, 0x40) -#define LTTNG_KERNEL_TRACER_VERSION \ - > _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) -#define > LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) -#define > LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define > LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) +#define > LTTNG_KERNEL_TRACER_VERSION \ + _IOR(0xF6, 0x46, struct > lttng_kernel_tracer_version) +#define LTTNG_KERNEL_TRACEPOINT_LIST > _IO(0xF6, 0x47) +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, > 0x48) #define LTTNG_KERNEL_CALIBRATE \ - _IOWR(0xF6, 0x44, struct > lttng_kernel_calibrate) + _IOWR(0xF6, 0x49, struct > lttng_kernel_calibrate) > > /* Session FD ioctl */ -#define LTTNG_KERNEL_METADATA > \ - _IOW(0xF6, 0x50, struct lttng_channel_attr) -#define > LTTNG_KERNEL_CHANNEL \ - _IOW(0xF6, 0x51, struct > lttng_channel_attr) -#define LTTNG_KERNEL_SESSION_START > _IO(0xF6, 0x52) -#define LTTNG_KERNEL_SESSION_STOP > _IO(0xF6, 0x53) +#define LTTNG_KERNEL_METADATA \ + _IOW(0xF6, > 0x54, struct lttng_kernel_channel) +#define LTTNG_KERNEL_CHANNEL > \ + _IOW(0xF6, 0x55, struct lttng_kernel_channel) +#define > LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) +#define > LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > > /* Channel FD ioctl */ -#define LTTNG_KERNEL_STREAM > _IO(0xF6, 0x60) -#define LTTNG_KERNEL_EVENT \ - > _IOW(0xF6, 0x61, struct lttng_kernel_event) -#define > LTTNG_KERNEL_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, unsigned > long) +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) +#define > LTTNG_KERNEL_EVENT \ + _IOW(0xF6, 0x63, struct > lttng_kernel_event) > > /* Event and Channel FD ioctl */ -#define LTTNG_KERNEL_CONTEXT > \ - _IOW(0xF6, 0x70, struct lttng_kernel_context) +#define > LTTNG_KERNEL_CONTEXT \ + _IOW(0xF6, 0x71, struct > lttng_kernel_context) > > /* Event, Channel and Session ioctl */ -#define LTTNG_KERNEL_ENABLE > _IO(0xF6, 0x80) -#define LTTNG_KERNEL_DISABLE > _IO(0xF6, 0x81) +#define LTTNG_KERNEL_ENABLE 
_IO(0xF6, 0x82) > +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > > #endif /* _LTT_KERNEL_IOCTL_H */ diff --git > a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h new > file mode 100644 index 0000000..1b8999a --- /dev/null +++ > b/src/common/lttng-kernel-old.h @@ -0,0 +1,115 @@ +/* + * Copyright > (C) 2011 - Julien Desfossez + * > Mathieu Desnoyers + * > David Goulet + * + * This program is free > software; you can redistribute it and/or modify + * it under the > terms of the GNU General Public License, version 2 only, + * as > published by the Free Software Foundation. + * + * This program is > distributed in the hope that it will be useful, but WITHOUT + * ANY > WARRANTY; without even the implied warranty of MERCHANTABILITY or + > * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public > License for + * more details. + * + * You should have received a > copy of the GNU General Public License along + * with this program; > if not, write to the Free Software Foundation, Inc., + * 51 > Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + > +#ifndef _LTTNG_KERNEL_OLD_H +#define _LTTNG_KERNEL_OLD_H + > +#include +#include + +/* + * > LTTng DebugFS ABI structures. + * + * This is the kernel ABI copied > from lttng-modules tree. 
+ */ + +/* Perf counter attributes */ > +struct lttng_kernel_old_perf_counter_ctx { + uint32_t type; + > uint64_t config; + char name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* > Event/Channel context */ +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 > 16 +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 > LTTNG_KERNEL_SYM_NAME_LEN + 32 +struct lttng_kernel_old_context { + > enum lttng_kernel_context_type ctx; + char > padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; + + union { + struct > lttng_kernel_old_perf_counter_ctx perf_counter; + char > padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; + } u; +}; + +struct > lttng_kernel_old_kretprobe { + uint64_t addr; + + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* + * Either > addr is used, or symbol_name and offset. + */ +struct > lttng_kernel_old_kprobe { + uint64_t addr; + + uint64_t offset; + > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* Function > tracer */ +struct lttng_kernel_old_function { + char > symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +#define > LTTNG_KERNEL_OLD_EVENT_PADDING1 16 +#define > LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_event { + char > name[LTTNG_KERNEL_SYM_NAME_LEN]; + enum > lttng_kernel_instrumentation instrumentation; + char > padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; + + /* Per > instrumentation type configuration */ + union { + struct > lttng_kernel_old_kretprobe kretprobe; + struct > lttng_kernel_old_kprobe kprobe; + struct lttng_kernel_old_function > ftrace; + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; + } u; > +}; + +struct lttng_kernel_old_tracer_version { + uint32_t major; + > uint32_t minor; + uint32_t patchlevel; +}; + +struct > lttng_kernel_old_calibrate { + enum lttng_kernel_calibrate_type > type; /* type (input) */ +}; + +/* + * kernel channel + */ +#define > LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_old_channel { + int overwrite; > /* 1: overwrite, 0: discard */ + 
uint64_t subbuf_size; > /* bytes */ + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ + unsigned int > read_timer_interval; /* usec */ + enum lttng_event_output output; > /* splice, mmap */ + + char > padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; +}; + +#endif /* > _LTTNG_KERNEL_OLD_H */ diff --git a/src/common/lttng-kernel.h > b/src/common/lttng-kernel.h index dbeb6aa..ac881bf 100644 --- > a/src/common/lttng-kernel.h +++ b/src/common/lttng-kernel.h @@ > -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { uint32_t > type; uint64_t config; char name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > +}__attribute__((packed)); > > /* Event/Channel context */ #define LTTNG_KERNEL_CONTEXT_PADDING1 > 16 @@ -72,14 +72,14 @@ struct lttng_kernel_context { struct > lttng_kernel_perf_counter_ctx perf_counter; char > padding[LTTNG_KERNEL_CONTEXT_PADDING2]; } u; -}; > +}__attribute__((packed)); > > struct lttng_kernel_kretprobe { uint64_t addr; > > uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > +}__attribute__((packed)); > > /* * Either addr is used, or symbol_name and offset. 
@@ -89,12 > +89,12 @@ struct lttng_kernel_kprobe { > > uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > +}__attribute__((packed)); > > /* Function tracer */ struct lttng_kernel_function { char > symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; > +}__attribute__((packed)); > > #define LTTNG_KERNEL_EVENT_PADDING1 16 #define > LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 @@ > -110,13 +110,13 @@ struct lttng_kernel_event { struct > lttng_kernel_function ftrace; char > padding[LTTNG_KERNEL_EVENT_PADDING2]; } u; -}; > +}__attribute__((packed)); > > struct lttng_kernel_tracer_version { uint32_t major; uint32_t > minor; uint32_t patchlevel; -}; +}__attribute__((packed)); > > enum lttng_kernel_calibrate_type { > LTTNG_KERNEL_CALIBRATE_KRETPROBE, @@ -124,6 +124,21 @@ enum > lttng_kernel_calibrate_type { > > struct lttng_kernel_calibrate { enum lttng_kernel_calibrate_type > type; /* type (input) */ -}; +}__attribute__((packed)); + +/* + * > kernel channel + */ +#define LTTNG_KERNEL_CHANNEL_PADDING1 > LTTNG_SYMBOL_NAME_LEN + 32 +struct lttng_kernel_channel { + > uint64_t subbuf_size; /* bytes */ + uint64_t > num_subbuf; /* power of 2 */ + unsigned int > switch_timer_interval; /* usec */ + unsigned int > read_timer_interval; /* usec */ + int overwrite; > /* 1: overwrite, 0: discard */ + enum lttng_event_output output; > /* splice, mmap */ + + char > padding[LTTNG_KERNEL_CHANNEL_PADDING1]; +}__attribute__((packed)); > > #endif /* _LTTNG_KERNEL_H */ -----BEGIN PGP SIGNATURE----- iQEcBAEBCgAGBQJQahOGAAoJEELoaioR9I02x3UIAKpJrPlrhDp+IS/bgfnN6DTs xpSN2Cfmmwsa8+rifSglsMf4FjPX4YrzB3ONhvLtyxTN81l1YhynYk87cE3aXEiw 2fZR5jXkU7jogk1BAd546pI5TpNU3i53JMjB+aN33tyJfIxo8pu2TnN94Rau0IDn a81ETaA5y9ABdMAc8nhH1UeUjVPFuoQkqmSePTpUWrVbWFbDPI+JJAtSEZV3MlZ7 sDnoaBNRqfqJ2oYld5GtHtDrU0BoGVhT/de9ED4TC/Pn4q1Tj6e4LgOXOw2ZK4Lb 7uH7XcoL87UUISsgzuEKfreeZQ+nDef8+TTk8Fe3DHoOfiW8Pp4BP8kN3VExcMA= =xWuT -----END PGP SIGNATURE----- From mathieu.desnoyers at efficios.com Mon Oct 1 19:22:27 
2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 1 Oct 2012 19:22:27 -0400 Subject: [lttng-dev] [lttng-tools GIT PULL] please pull from compudj-pull Message-ID: <20121001232227.GA23978@Krystal> at commit db8870edf473e2a2f69e488375d32405ea324017 Thanks! Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 2 10:13:07 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 2 Oct 2012 10:13:07 -0400 Subject: [lttng-dev] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue) Message-ID: <20121002141307.GA4057@Krystal> Implement wait-free concurrent queues, with a new API different from wfqueue.h, which is already provided by Userspace RCU. The advantage of splitting the head and tail objects of the queue into different arguments is to allow these to sit on different cache-lines, thus eliminating false-sharing, leading to a 2.3x speed increase. This API also introduces a "splice" operation, which moves all nodes from one queue into another, and postpones the synchronization to either dequeue or iteration on the list. The splice operation does not need to touch every single node of the queue it moves them from. Moreover, the splice operation only needs to ensure mutual exclusion with other dequeuers, iterations, and splice operations from the list it splices from, but acts as a simple enqueuer on the list it splices into (no mutual exclusion needed for that list). Feedback is welcome, Thanks! Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 2 10:14:44 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 2 Oct 2012 10:14:44 -0400 Subject: [lttng-dev] [URCU PATCH 1/3] wfcqueue: implement concurrency-efficient queue In-Reply-To: <20121002141307.GA4057@Krystal> References: <20121002141307.GA4057@Krystal> Message-ID: <20121002141444.GB4057@Krystal> This new API simplifies the wfqueue implementation, and brings a 2.3x to 2.6x performance boost due to the ability to eliminate false-sharing between enqueue and dequeue. This work is derived from the patch from Lai Jiangshan submitted as "urcu: new wfqueue implementation" (http://lists.lttng.org/pipermail/lttng-dev/2012-August/018379.html) Its changelog: > Some guys would be surprised by this fact: > There are already TWO implementations of wfqueue in urcu. > > The first one is in urcu/static/wfqueue.h: > 1) enqueue: exchange the tail and then update previous->next > 2) dequeue: wait for first node's next pointer and them shift, a dummy node > is introduced to avoid the queue->tail become NULL when shift. > > The second one shares some code with the first one, and the left code > are spreading in urcu-call-rcu-impl.h: > 1) enqueue: share with the first one > 2) no dequeue operation: and no shift, so it don't need dummy node, > Although the dummy node is queued when initialization, but it is removed > after the first dequeue_all operation in call_rcu_thread(). > call_rcu_data_free() forgets to handle the dummy node if it is not removed. > 3)dequeue_all: record the old head and tail, and queue->head become the special > tail node.(atomic record the tail and change the tail). > > The second implementation's code are spreading, bad for review, and it is not > tested by tests/test_urcu_wfq. > > So we need a better implementation avoid the dummy node dancing and can service > both generic wfqueue APIs and dequeue_all API for call rcu.
> > The new implementation: > 1) enqueue: share with the first one/original implementation. > 2) dequeue: shift when node count >= 2, cmpxchg when node count = 1. > no dummy node, save memory. > 3) dequeue_all: simply set queue->head.next to NULL, xchg the tail > and return the old head.next. > > More implementation details are in the code. > tests/test_urcu_wfq will be update in future for testing new APIs. The patch proposed by Lai brings a very interesting simplification to the single-node handling (which is kept here), and moves all queue handling code away from the call_rcu implementation, back into the wfqueue code. This has the benefit of allowing testing enhancements. I modified it so the API does not expose implementation details to the user (e.g. ___cds_wfq_node_sync_next). I added a "splice" operation and a for loop iterator which should allow wfqueue users to use the list very efficiently both from LGPL/GPL code and from non-LGPL-compatible code. I also changed the API so the queue head and tail are now two separate structures: it allows the queue user to place these as they like, either on different cache lines (to eliminate false-sharing), or close to one another (on the same cache line) in case a queue is spliced onto the stack and not concurrently accessed.
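The two placements described above can be sketched as follows. This is only an illustration under stated assumptions: the type names (`demo_node`, `demo_head`, `demo_tail`), the helper `demo_queue_init()` and the 64-byte cache line size are hypothetical and are not the wfcqueue API from this patch; the alignment uses the GCC/Clang `aligned` attribute.

```c
#include <stddef.h>

#define DEMO_CACHE_LINE_SIZE 64	/* assumed cache line size */

/* Hypothetical types standing in for a split head/tail queue. */
struct demo_node {
	struct demo_node *next;
};

struct demo_head {
	struct demo_node node;	/* touched by dequeuers */
};

struct demo_tail {
	struct demo_node *p;	/* touched by enqueuers */
};

/*
 * Layout 1: head and tail each start on their own cache line, so
 * enqueuers and dequeuers do not invalidate each other's line.
 */
struct demo_split_queue {
	struct demo_head head __attribute__((aligned(DEMO_CACHE_LINE_SIZE)));
	struct demo_tail tail __attribute__((aligned(DEMO_CACHE_LINE_SIZE)));
};

/*
 * Layout 2: head and tail packed next to each other, suitable for a
 * short-lived queue on the stack that is not accessed concurrently.
 */
struct demo_compact_queue {
	struct demo_head head;
	struct demo_tail tail;
};

/* An empty queue has its tail pointing at the head's embedded node. */
static void demo_queue_init(struct demo_head *head, struct demo_tail *tail)
{
	head->node.next = NULL;
	tail->p = &head->node;
}
```

Because the initializer takes the head and the tail as separate arguments, the same function serves both layouts; the caller alone decides where each object lives.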
Benchmarks performed on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz (dual-core, with hyperthreading) Benchmark invoked: for a in $(seq 1 10); do ./test_urcu_wfq 1 1 10 -a 0 -a 2; done (using cpu number 0 and 2, which should correspond to two cores of my Intel 2-core/hyperthread processor) Before patch: testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 97274297 nr_dequeues 80745742 successful enqueues 97274297 successful dequeues 80745321 end_dequeues 16528976 nr_ops 178020039 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 92300568 nr_dequeues 75019529 successful enqueues 92300568 successful dequeues 74973237 end_dequeues 17327331 nr_ops 167320097 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 93516443 nr_dequeues 75846726 successful enqueues 93516443 successful dequeues 75826578 end_dequeues 17689865 nr_ops 169363169 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94160362 nr_dequeues 77967638 successful enqueues 94160362 successful dequeues 77967638 end_dequeues 16192724 nr_ops 172128000 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 97491956 nr_dequeues 81001191 successful enqueues 97491956 successful dequeues 81000247 end_dequeues 16491709 nr_ops 178493147 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94101298 nr_dequeues 75650510 successful enqueues 94101298 successful dequeues 75649318 end_dequeues 18451980 nr_ops 169751808 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94742803 nr_dequeues 75402105 successful enqueues 94742803 successful dequeues 75341859 end_dequeues 19400944 nr_ops 170144908 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 92198835 nr_dequeues 75037877 successful enqueues 92198835 successful dequeues 75027605 end_dequeues 17171230 nr_ops 167236712 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94159560 nr_dequeues 77895972 successful enqueues 94159560 
successful dequeues 77858442 end_dequeues 16301118 nr_ops 172055532 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 96059399 nr_dequeues 80115442 successful enqueues 96059399 successful dequeues 80066843 end_dequeues 15992556 nr_ops 176174841 After patch: testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 221229322 nr_dequeues 210645491 successful enqueues 221229322 successful dequeues 210645088 end_dequeues 10584234 nr_ops 431874813 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 219803943 nr_dequeues 210377337 successful enqueues 219803943 successful dequeues 210368680 end_dequeues 9435263 nr_ops 430181280 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 237006358 nr_dequeues 237035340 successful enqueues 237006358 successful dequeues 236997050 end_dequeues 9308 nr_ops 474041698 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 235822443 nr_dequeues 235815942 successful enqueues 235822443 successful dequeues 235814020 end_dequeues 8423 nr_ops 471638385 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 235825567 nr_dequeues 235811803 successful enqueues 235825567 successful dequeues 235810526 end_dequeues 15041 nr_ops 471637370 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 221974953 nr_dequeues 210938190 successful enqueues 221974953 successful dequeues 210938190 end_dequeues 11036763 nr_ops 432913143 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 237994492 nr_dequeues 237938119 successful enqueues 237994492 successful dequeues 237930648 end_dequeues 63844 nr_ops 475932611 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 220634365 nr_dequeues 210491382 successful enqueues 220634365 successful dequeues 210490995 end_dequeues 10143370 nr_ops 431125747 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 237388065 nr_dequeues 237401251 successful enqueues 237388065 
successful dequeues 237380295 end_dequeues 7770 nr_ops 474789316 testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 221201436 nr_dequeues 210831162 successful enqueues 221201436 successful dequeues 210831162 end_dequeues 10370274 nr_ops 432032598 Summary: Both enqueue and dequeue speed increase: around 2.3x speedup for enqueue, and around 2.6x for dequeue. We can verify that: successful enqueues - successful dequeues = end_dequeues For all runs (ensures correctness: no lost node). CC: Lai Jiangshan CC: Paul McKenney Signed-off-by: Mathieu Desnoyers --- diff --git a/Makefile.am b/Makefile.am index 2396fcf..ffdca9a 100644 --- a/Makefile.am +++ b/Makefile.am @@ -16,7 +16,7 @@ nobase_dist_include_HEADERS = urcu/compiler.h urcu/hlist.h urcu/list.h \ urcu/uatomic/generic.h urcu/arch/generic.h urcu/wfstack.h \ urcu/wfqueue.h urcu/rculfstack.h urcu/rculfqueue.h \ urcu/ref.h urcu/cds.h urcu/urcu_ref.h urcu/urcu-futex.h \ - urcu/uatomic_arch.h urcu/rculfhash.h \ + urcu/uatomic_arch.h urcu/rculfhash.h urcu/wfcqueue.h \ $(top_srcdir)/urcu/map/*.h \ $(top_srcdir)/urcu/static/*.h \ urcu/tls-compat.h @@ -53,7 +53,7 @@ lib_LTLIBRARIES = liburcu-common.la \ # liburcu-common contains wait-free queues (needed by call_rcu) as well # as futex fallbacks. # -liburcu_common_la_SOURCES = wfqueue.c wfstack.c $(COMPAT) +liburcu_common_la_SOURCES = wfqueue.c wfcqueue.c wfstack.c $(COMPAT) liburcu_la_SOURCES = urcu.c urcu-pointer.c $(COMPAT) liburcu_la_LIBADD = liburcu-common.la diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h new file mode 100644 index 0000000..a989984 --- /dev/null +++ b/urcu/static/wfcqueue.h @@ -0,0 +1,380 @@ +#ifndef _URCU_WFCQUEUE_STATIC_H +#define _URCU_WFCQUEUE_STATIC_H + +/* + * wfcqueue-static.h + * + * Userspace RCU library - Concurrent Queue with Wait-Free Enqueue/Blocking Dequeue + * + * TO BE INCLUDED ONLY IN LGPL-COMPATIBLE CODE. See wfcqueue.h for linking + * dynamically with the userspace rcu library. 
+ * + * Copyright 2010-2012 - Mathieu Desnoyers + * Copyright 2011-2012 - Lai Jiangshan + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * Concurrent queue with wait-free enqueue/blocking dequeue. + * + * Inspired from half-wait-free/half-blocking queue implementation done by + * Paul E. McKenney. + * + * Mutual exclusion of __cds_wfcq_* API + * + * Unless otherwise stated, the caller must ensure mutual exclusion of + * queue update operations "dequeue" and "splice" (for source queue). + * Queue read operations "first" and "next" need to be protected against + * concurrent "dequeue" and "splice" (for source queue) by the caller. + * "enqueue", "splice" (for destination queue), and "empty" are the only + * operations that can be used without any mutual exclusion. + * Mutual exclusion can be ensured by holding cds_wfcq_dequeue_lock(). + * + * For convenience, cds_wfcq_dequeue_blocking() and + * cds_wfcq_splice_blocking() hold the dequeue lock. + */ + +#define WFCQ_ADAPT_ATTEMPTS 10 /* Retry if being set */ +#define WFCQ_WAIT 10 /* Wait 10 ms if being set */ + +/* + * cds_wfcq_node_init: initialize wait-free queue node. 
+ */ +static inline void _cds_wfcq_node_init(struct cds_wfcq_node *node) +{ + node->next = NULL; +} + +/* + * cds_wfcq_init: initialize wait-free queue. + */ +static inline void _cds_wfcq_init(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + int ret; + + /* Set queue head and tail */ + _cds_wfcq_node_init(&head->node); + tail->p = &head->node; + ret = pthread_mutex_init(&head->lock, NULL); + assert(!ret); +} + +/* + * cds_wfcq_empty: return whether wait-free queue is empty. + * + * No memory barrier is issued. No mutual exclusion is required. + */ +static inline bool _cds_wfcq_empty(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + /* + * Queue is empty if no node is pointed to by head->node.next nor + * tail->p. Even though the tail->p check is sufficient to find + * out if the queue is empty, we first check head->node.next as a + * common case to ensure that dequeuers do not frequently access + * enqueuer's tail->p cache line. + */ + return CMM_LOAD_SHARED(head->node.next) == NULL + && CMM_LOAD_SHARED(tail->p) == &head->node; +} + +static inline void _cds_wfcq_dequeue_lock(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + int ret; + + ret = pthread_mutex_lock(&head->lock); + assert(!ret); +} + +static inline void _cds_wfcq_dequeue_unlock(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + int ret; + + ret = pthread_mutex_unlock(&head->lock); + assert(!ret); +} + +static inline void ___cds_wfcq_append(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *new_head, + struct cds_wfcq_node *new_tail) +{ + struct cds_wfcq_node *old_tail; + + /* + * Implicit memory barrier before uatomic_xchg() orders earlier + * stores to data structure containing node and setting + * node->next to NULL before publication. + */ + old_tail = uatomic_xchg(&tail->p, new_tail); + + /* + * Implicit memory barrier after uatomic_xchg() orders store to + * q->tail before store to old_tail->next. 
+ * + * At this point, dequeuers see a NULL tail->p->next, which + * indicates that the queue is being appended to. The following + * store will append "node" to the queue from a dequeuer + * perspective. + */ + CMM_STORE_SHARED(old_tail->next, new_head); +} + +/* + * cds_wfcq_enqueue: enqueue a node into a wait-free queue. + * + * Issues a full memory barrier before enqueue. No mutual exclusion is + * required. + */ +static inline void _cds_wfcq_enqueue(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *new_tail) +{ + ___cds_wfcq_append(head, tail, new_tail, new_tail); +} + +/* + * Wait for the enqueuer to complete its enqueue, then return the next node. + */ +static inline struct cds_wfcq_node * +___cds_wfcq_node_sync_next(struct cds_wfcq_node *node) +{ + struct cds_wfcq_node *next; + int attempt = 0; + + /* + * Adaptive busy-looping waiting for enqueuer to complete enqueue. + */ + while ((next = CMM_LOAD_SHARED(node->next)) == NULL) { + if (++attempt >= WFCQ_ADAPT_ATTEMPTS) { + poll(NULL, 0, WFCQ_WAIT); /* Wait for 10ms */ + attempt = 0; + } else { + caa_cpu_relax(); + } + } + + return next; +} + +/* + * __cds_wfcq_first_blocking: get first node of a queue, without dequeuing. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Should be called with cds_wfcq_dequeue_lock() held. + */ +static inline struct cds_wfcq_node * +___cds_wfcq_first_blocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + struct cds_wfcq_node *node; + + if (_cds_wfcq_empty(head, tail)) + return NULL; + node = ___cds_wfcq_node_sync_next(&head->node); + /* Load head->node.next before loading node's content */ + cmm_smp_read_barrier_depends(); + return node; +} + +/* + * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. 
+ * Should be called with cds_wfcq_dequeue_lock() held. + */ +static inline struct cds_wfcq_node * +___cds_wfcq_next_blocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node) +{ + struct cds_wfcq_node *next; + + /* + * Even though the following tail->p check is sufficient to find + * out if we reached the end of the queue, we first check + * node->next as a common case to ensure that iteration on nodes + * do not frequently access enqueuer's tail->p cache line. + */ + if ((next = CMM_LOAD_SHARED(node->next)) == NULL) { + /* Load node->next before tail->p */ + cmm_smp_rmb(); + if (CMM_LOAD_SHARED(tail->p) == node) + return NULL; + next = ___cds_wfcq_node_sync_next(node); + } + /* Load node->next before loading next's content */ + cmm_smp_read_barrier_depends(); + return next; +} + +/* + * __cds_wfcq_dequeue_blocking: dequeue a node from the queue. + * + * No need to go on a waitqueue here, as there is no possible state in which the + * list could cause dequeue to busy-loop needlessly while waiting for another + * thread to be scheduled. The queue appears empty until tail->next is set by + * enqueue. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * It is valid to reuse and free a dequeued node immediately. + * Should be called with cds_wfcq_dequeue_lock() held. + */ +static inline struct cds_wfcq_node * +___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + struct cds_wfcq_node *node, *next; + + if (_cds_wfcq_empty(head, tail)) + return NULL; + + node = ___cds_wfcq_node_sync_next(&head->node); + + if ((next = CMM_LOAD_SHARED(node->next)) == NULL) { + /* + * @node is probably the only node in the queue. + * Try to move the tail to &q->head. + * q->head.next is set to NULL here, and stays + * NULL if the cmpxchg succeeds. 
Should the + * cmpxchg fail due to a concurrent enqueue, the + * q->head.next will be set to the next node. + * The implicit memory barrier before + * uatomic_cmpxchg() orders load node->next + * before loading q->tail. + * The implicit memory barrier before uatomic_cmpxchg + * orders load q->head.next before loading node's + * content. + */ + _cds_wfcq_node_init(&head->node); + if (uatomic_cmpxchg(&tail->p, node, &head->node) == node) + return node; + next = ___cds_wfcq_node_sync_next(node); + } + + /* + * Move queue head forward. + */ + head->node.next = next; + + /* Load q->head.next before loading node's content */ + cmm_smp_read_barrier_depends(); + return node; +} + +/* + * __cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. + * + * Dequeue all nodes from src_q. + * dest_q must be already initialized. + * Should be called with cds_wfcq_dequeue_lock() held on src_q. + */ +static inline void +___cds_wfcq_splice_blocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail) +{ + struct cds_wfcq_node *head, *tail; + + if (_cds_wfcq_empty(src_q_head, src_q_tail)) + return; + + head = ___cds_wfcq_node_sync_next(&src_q_head->node); + _cds_wfcq_node_init(&src_q_head->node); + + /* + * Memory barrier implied before uatomic_xchg() orders store to + * src_q->head before store to src_q->tail. This is required by + * concurrent enqueue on src_q, which exchanges the tail before + * updating the previous tail's next pointer. + */ + tail = uatomic_xchg(&src_q_tail->p, &src_q_head->node); + + /* + * Append the spliced content of src_q into dest_q. Does not + * require mutual exclusion on dest_q (wait-free). + */ + ___cds_wfcq_append(dest_q_head, dest_q_tail, head, tail); +} + +/* + * cds_wfcq_dequeue_blocking: dequeue a node from a wait-free queue. 
+ * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Mutual exclusion with (and only with) cds_wfcq_splice_blocking is + * ensured. + * It is valid to reuse and free a dequeued node immediately. + */ +static inline struct cds_wfcq_node * +_cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + struct cds_wfcq_node *retval; + + _cds_wfcq_dequeue_lock(head, tail); + retval = ___cds_wfcq_dequeue_blocking(head, tail); + _cds_wfcq_dequeue_unlock(head, tail); + return retval; +} + +/* + * cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. + * + * Dequeue all nodes from src_q. + * dest_q must be already initialized. + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Mutual exclusion with (and only with) cds_wfcq_dequeue_blocking is + * ensured. + */ +static inline void +_cds_wfcq_splice_blocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail) +{ + _cds_wfcq_dequeue_lock(src_q_head, src_q_tail); + ___cds_wfcq_splice_blocking(dest_q_head, dest_q_tail, + src_q_head, src_q_tail); + _cds_wfcq_dequeue_unlock(src_q_head, src_q_tail); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _URCU_WFCQUEUE_STATIC_H */ diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h new file mode 100644 index 0000000..5576cbf --- /dev/null +++ b/urcu/wfcqueue.h @@ -0,0 +1,263 @@ +#ifndef _URCU_WFCQUEUE_H +#define _URCU_WFCQUEUE_H + +/* + * wfcqueue.h + * + * Userspace RCU library - Concurrent Queue with Wait-Free Enqueue/Blocking Dequeue + * + * Copyright 2010-2012 - Mathieu Desnoyers + * Copyright 2011-2012 - Lai Jiangshan + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software 
Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * Concurrent queue with wait-free enqueue/blocking dequeue. + * + * Inspired from half-wait-free/half-blocking queue implementation done by + * Paul E. McKenney. + */ + +struct cds_wfcq_node { + struct cds_wfcq_node *next; +}; + +/* + * Do not put head and tail on the same cache-line if concurrent + * enqueue/dequeue are expected from many CPUs. This eliminates + * false-sharing between enqueue and dequeue. + */ +struct cds_wfcq_head { + struct cds_wfcq_node node; + pthread_mutex_t lock; +}; + +struct cds_wfcq_tail { + struct cds_wfcq_node *p; +}; + +#ifdef _LGPL_SOURCE + +#include + +#define cds_wfcq_node_init _cds_wfcq_node_init +#define cds_wfcq_init _cds_wfcq_init +#define cds_wfcq_empty _cds_wfcq_empty +#define cds_wfcq_enqueue _cds_wfcq_enqueue + +/* Dequeue locking */ +#define cds_wfcq_dequeue_lock _cds_wfcq_dequeue_lock +#define cds_wfcq_dequeue_unlock _cds_wfcq_dequeue_unlock + +/* Locking performed within cds_wfcq calls. 
*/ +#define cds_wfcq_dequeue_blocking _cds_wfcq_dequeue_blocking +#define cds_wfcq_splice_blocking _cds_wfcq_splice_blocking +#define cds_wfcq_first_blocking _cds_wfcq_first_blocking +#define cds_wfcq_next_blocking _cds_wfcq_next_blocking + +/* Locking ensured by caller by holding cds_wfcq_dequeue_lock() */ +#define __cds_wfcq_dequeue_blocking ___cds_wfcq_dequeue_blocking +#define __cds_wfcq_splice_blocking ___cds_wfcq_splice_blocking +#define __cds_wfcq_first_blocking ___cds_wfcq_first_blocking +#define __cds_wfcq_next_blocking ___cds_wfcq_next_blocking + +#else /* !_LGPL_SOURCE */ + +/* + * Mutual exclusion of cds_wfcq_* / __cds_wfcq_* API + * + * Unless otherwise stated, the caller must ensure mutual exclusion of + * queue update operations "dequeue" and "splice" (for source queue). + * Queue read operations "first" and "next" need to be protected against + * concurrent "dequeue" and "splice" (for source queue) by the caller. + * "enqueue", "splice" (for destination queue), and "empty" are the only + * operations that can be used without any mutual exclusion. + * Mutual exclusion can be ensured by holding cds_wfcq_dequeue_lock(). + * + * For convenience, cds_wfcq_dequeue_blocking() and + * cds_wfcq_splice_blocking() hold the dequeue lock. + */ + +/* + * cds_wfcq_node_init: initialize wait-free queue node. + */ +extern void cds_wfcq_node_init(struct cds_wfcq_node *node); + +/* + * cds_wfcq_init: initialize wait-free queue. + */ +extern void cds_wfcq_init(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* + * cds_wfcq_empty: return whether wait-free queue is empty. + * + * No memory barrier is issued. No mutual exclusion is required. + */ +extern bool cds_wfcq_empty(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* + * cds_wfcq_dequeue_lock: take the dequeue mutual exclusion lock. 
+ */ +extern void cds_wfcq_dequeue_lock(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* + * cds_wfcq_dequeue_unlock: release the dequeue mutual exclusion lock. + */ +extern void cds_wfcq_dequeue_unlock(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* + * cds_wfcq_enqueue: enqueue a node into a wait-free queue. + * + * Issues a full memory barrier before enqueue. No mutual exclusion is + * required. + */ +extern void cds_wfcq_enqueue(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node); + +/* + * cds_wfcq_dequeue_blocking: dequeue a node from a wait-free queue. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * It is valid to reuse and free a dequeued node immediately. + * Mutual exclusion with dequeuers is ensured internally. + */ +extern struct cds_wfcq_node *cds_wfcq_dequeue_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* + * cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. + * + * Dequeue all nodes from src_q. + * dest_q must be already initialized. + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Mutual exclusion with dequeuers is ensured internally. + */ +extern void cds_wfcq_splice_blocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail); + +/* + * __cds_wfcq_dequeue_blocking: dequeue a node from the queue. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * It is valid to reuse and free a dequeued node immediately. + * Should be called with cds_wfcq_dequeue_lock() held. 
+ */ +extern struct cds_wfcq_node *__cds_wfcq_dequeue_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* + * __cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. + * + * Dequeue all nodes from src_q. + * dest_q must be already initialized. + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Should be called with cds_wfcq_dequeue_lock() held. + */ +extern void __cds_wfcq_splice_blocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail); + +/* + * __cds_wfcq_first_blocking: get first node of a queue, without dequeuing. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Should be called with cds_wfcq_dequeue_lock() held. + */ +extern struct cds_wfcq_node *__cds_wfcq_first_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* + * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Should be called with cds_wfcq_dequeue_lock() held. + */ +extern struct cds_wfcq_node *__cds_wfcq_next_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node); + +#endif /* !_LGPL_SOURCE */ + +/* + * __cds_wfcq_for_each_blocking: Iterate over all nodes in a queue, + * without dequeuing them. + * @head: head of the queue (struct cds_wfcq_head pointer). + * @tail: tail of the queue (struct cds_wfcq_tail pointer). + * @node: iterator on the queue (struct cds_wfcq_node pointer). + * + * Content written into each node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Should be called with cds_wfcq_dequeue_lock() held. 
+ */ +#define __cds_wfcq_for_each_blocking(head, tail, node) \ + for (node = __cds_wfcq_first_blocking(head, tail); \ + node != NULL; \ + node = __cds_wfcq_next_blocking(head, tail, node)) + +/* + * __cds_wfcq_for_each_blocking_safe: Iterate over all nodes in a queue, + * without dequeuing them. Safe against deletion. + * @head: head of the queue (struct cds_wfcq_head pointer). + * @tail: tail of the queue (struct cds_wfcq_tail pointer). + * @node: iterator on the queue (struct cds_wfcq_node pointer). + * @n: struct cds_wfcq_node pointer holding the next pointer (used + * internally). + * + * Content written into each node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * Should be called with cds_wfcq_dequeue_lock() held. + */ +#define __cds_wfcq_for_each_blocking_safe(head, tail, node, n) \ + for (node = __cds_wfcq_first_blocking(head, tail), \ + n = (node ? __cds_wfcq_next_blocking(head, tail, node) : NULL); \ + node != NULL; \ + node = n, n = (node ? __cds_wfcq_next_blocking(head, tail, node) : NULL)) + +#ifdef __cplusplus +} +#endif + +#endif /* _URCU_WFCQUEUE_H */ diff --git a/wfcqueue.c b/wfcqueue.c new file mode 100644 index 0000000..1fa27ac --- /dev/null +++ b/wfcqueue.c @@ -0,0 +1,116 @@ +/* + * wfcqueue.c + * + * Userspace RCU library - Concurrent queue with Wait-Free Enqueue/Blocking Dequeue + * + * Copyright 2010-2012 - Mathieu Desnoyers + * Copyright 2011-2012 - Lai Jiangshan + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* Do not #define _LGPL_SOURCE to ensure we can emit the wrapper symbols */ +#include "urcu/wfcqueue.h" +#include "urcu/static/wfcqueue.h" + +/* + * library wrappers to be used by non-LGPL compatible source code. + */ + +void cds_wfcq_node_init(struct cds_wfcq_node *node) +{ + _cds_wfcq_node_init(node); +} + +void cds_wfcq_init(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + _cds_wfcq_init(head, tail); +} + +bool cds_wfcq_empty(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) + +{ + return _cds_wfcq_empty(head, tail); +} + +void cds_wfcq_enqueue(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node) +{ + _cds_wfcq_enqueue(head, tail, node); +} + +void cds_wfcq_dequeue_lock(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + _cds_wfcq_dequeue_lock(head, tail); +} + +void cds_wfcq_dequeue_unlock(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + _cds_wfcq_dequeue_unlock(head, tail); +} + +struct cds_wfcq_node *cds_wfcq_dequeue_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return _cds_wfcq_dequeue_blocking(head, tail); +} + +void cds_wfcq_splice_blocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail) +{ + _cds_wfcq_splice_blocking(dest_q_head, dest_q_tail, + src_q_head, src_q_tail); +} + +struct cds_wfcq_node *__cds_wfcq_dequeue_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_dequeue_blocking(head, tail); +} + +void __cds_wfcq_splice_blocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail) +{ 
+ ___cds_wfcq_splice_blocking(dest_q_head, dest_q_tail, + src_q_head, src_q_tail); +} + +struct cds_wfcq_node *__cds_wfcq_first_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_first_blocking(head, tail); +} + +struct cds_wfcq_node *__cds_wfcq_next_blocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node) +{ + return ___cds_wfcq_next_blocking(head, tail, node); +} -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 2 10:15:42 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 2 Oct 2012 10:15:42 -0400 Subject: [lttng-dev] [URCU PATCH 2/3] wfcqueue test In-Reply-To: <20121002141307.GA4057@Krystal> References: <20121002141307.GA4057@Krystal> Message-ID: <20121002141542.GC4057@Krystal> Signed-off-by: Mathieu Desnoyers --- diff --git a/tests/Makefile.am b/tests/Makefile.am index 7d5ea82..81718bb 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -14,7 +14,9 @@ noinst_PROGRAMS = test_urcu test_urcu_dynamic_link test_urcu_timing \ test_uatomic test_urcu_assign test_urcu_assign_dynamic_link \ test_urcu_bp test_urcu_bp_dynamic_link test_cycles_per_loop \ test_urcu_lfq test_urcu_wfq test_urcu_lfs test_urcu_wfs \ + test_urcu_wfcq \ test_urcu_wfq_dynlink test_urcu_wfs_dynlink \ + test_urcu_wfcq_dynlink \ test_urcu_lfq_dynlink test_urcu_lfs_dynlink test_urcu_hash noinst_HEADERS = rcutorture.h @@ -169,6 +171,13 @@ test_urcu_wfq_dynlink_SOURCES = test_urcu_wfq.c test_urcu_wfq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS) test_urcu_wfq_dynlink_LDADD = $(URCU_COMMON_LIB) +test_urcu_wfcq_SOURCES = test_urcu_wfcq.c $(COMPAT) +test_urcu_wfcq_LDADD = $(URCU_COMMON_LIB) + +test_urcu_wfcq_dynlink_SOURCES = test_urcu_wfcq.c +test_urcu_wfcq_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS) +test_urcu_wfcq_dynlink_LDADD = $(URCU_COMMON_LIB) + test_urcu_lfs_SOURCES = 
test_urcu_lfs.c $(URCU) test_urcu_lfs_LDADD = $(URCU_CDS_LIB) diff --git a/tests/test_urcu_wfcq.c b/tests/test_urcu_wfcq.c new file mode 100644 index 0000000..3141268 --- /dev/null +++ b/tests/test_urcu_wfcq.c @@ -0,0 +1,418 @@ +/* + * test_urcu_wfcq.c + * + * Userspace RCU library - example RCU-based lock-free concurrent queue + * + * Copyright February 2010 - Mathieu Desnoyers + * Copyright February 2010 - Paolo Bonzini + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#define _GNU_SOURCE +#include "../config.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#ifdef __linux__ +#include +#endif + +/* hardcoded number of CPUs */ +#define NR_CPUS 16384 + +#if defined(_syscall0) +_syscall0(pid_t, gettid) +#elif defined(__NR_gettid) +static inline pid_t gettid(void) +{ + return syscall(__NR_gettid); +} +#else +#warning "use pid as tid" +static inline pid_t gettid(void) +{ + return getpid(); +} +#endif + +#ifndef DYNAMIC_LINK_TEST +#define _LGPL_SOURCE +#endif +#include +#include + +static volatile int test_go, test_stop; + +static unsigned long rduration; + +static unsigned long duration; + +/* read-side C.S. 
duration, in loops */ +static unsigned long wdelay; + +static inline void loop_sleep(unsigned long l) +{ + while(l-- != 0) + caa_cpu_relax(); +} + +static int verbose_mode; + +#define printf_verbose(fmt, args...) \ + do { \ + if (verbose_mode) \ + printf(fmt, args); \ + } while (0) + +static unsigned int cpu_affinities[NR_CPUS]; +static unsigned int next_aff = 0; +static int use_affinity = 0; + +pthread_mutex_t affinity_mutex = PTHREAD_MUTEX_INITIALIZER; + +#ifndef HAVE_CPU_SET_T +typedef unsigned long cpu_set_t; +# define CPU_ZERO(cpuset) do { *(cpuset) = 0; } while(0) +# define CPU_SET(cpu, cpuset) do { *(cpuset) |= (1UL << (cpu)); } while(0) +#endif + +static void set_affinity(void) +{ + cpu_set_t mask; + int cpu; + int ret; + + if (!use_affinity) + return; + +#if HAVE_SCHED_SETAFFINITY + ret = pthread_mutex_lock(&affinity_mutex); + if (ret) { + perror("Error in pthread mutex lock"); + exit(-1); + } + cpu = cpu_affinities[next_aff++]; + ret = pthread_mutex_unlock(&affinity_mutex); + if (ret) { + perror("Error in pthread mutex unlock"); + exit(-1); + } + + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); +#if SCHED_SETAFFINITY_ARGS == 2 + sched_setaffinity(0, &mask); +#else + sched_setaffinity(0, sizeof(mask), &mask); +#endif +#endif /* HAVE_SCHED_SETAFFINITY */ +} + +/* + * returns 0 if test should end. 
+ */ +static int test_duration_dequeue(void) +{ + return !test_stop; +} + +static int test_duration_enqueue(void) +{ + return !test_stop; +} + +static DEFINE_URCU_TLS(unsigned long long, nr_dequeues); +static DEFINE_URCU_TLS(unsigned long long, nr_enqueues); + +static DEFINE_URCU_TLS(unsigned long long, nr_successful_dequeues); +static DEFINE_URCU_TLS(unsigned long long, nr_successful_enqueues); + +static unsigned int nr_enqueuers; +static unsigned int nr_dequeuers; + +static struct cds_wfcq_head __attribute__((aligned(CAA_CACHE_LINE_SIZE))) head; +static struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) tail; + +void *thr_enqueuer(void *_count) +{ + unsigned long long *count = _count; + + printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", + "enqueuer", pthread_self(), (unsigned long)gettid()); + + set_affinity(); + + while (!test_go) + { + } + cmm_smp_mb(); + + for (;;) { + struct cds_wfcq_node *node = malloc(sizeof(*node)); + if (!node) + goto fail; + cds_wfcq_node_init(node); + cds_wfcq_enqueue(&head, &tail, node); + URCU_TLS(nr_successful_enqueues)++; + + if (caa_unlikely(wdelay)) + loop_sleep(wdelay); +fail: + URCU_TLS(nr_enqueues)++; + if (caa_unlikely(!test_duration_enqueue())) + break; + } + + count[0] = URCU_TLS(nr_enqueues); + count[1] = URCU_TLS(nr_successful_enqueues); + printf_verbose("enqueuer thread_end, thread id : %lx, tid %lu, " + "enqueues %llu successful_enqueues %llu\n", + pthread_self(), (unsigned long)gettid(), + URCU_TLS(nr_enqueues), URCU_TLS(nr_successful_enqueues)); + return ((void*)1); + +} + +void *thr_dequeuer(void *_count) +{ + unsigned long long *count = _count; + + printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", + "dequeuer", pthread_self(), (unsigned long)gettid()); + + set_affinity(); + + while (!test_go) + { + } + cmm_smp_mb(); + + for (;;) { + struct cds_wfcq_node *node = + cds_wfcq_dequeue_blocking(&head, &tail); + + if (node) { + free(node); + URCU_TLS(nr_successful_dequeues)++; + } + 
+ URCU_TLS(nr_dequeues)++; + if (caa_unlikely(!test_duration_dequeue())) + break; + if (caa_unlikely(rduration)) + loop_sleep(rduration); + } + + printf_verbose("dequeuer thread_end, thread id : %lx, tid %lu, " + "dequeues %llu, successful_dequeues %llu\n", + pthread_self(), (unsigned long)gettid(), + URCU_TLS(nr_dequeues), URCU_TLS(nr_successful_dequeues)); + count[0] = URCU_TLS(nr_dequeues); + count[1] = URCU_TLS(nr_successful_dequeues); + return ((void*)2); +} + +void test_end(unsigned long long *nr_dequeues) +{ + struct cds_wfcq_node *node; + + do { + node = cds_wfcq_dequeue_blocking(&head, &tail); + if (node) { + free(node); + (*nr_dequeues)++; + } + } while (node); +} + +void show_usage(int argc, char **argv) +{ + printf("Usage : %s nr_dequeuers nr_enqueuers duration (s)", argv[0]); + printf(" [-d delay] (enqueuer period (in loops))"); + printf(" [-c duration] (dequeuer period (in loops))"); + printf(" [-v] (verbose output)"); + printf(" [-a cpu#] [-a cpu#]... (affinity)"); + printf("\n"); +} + +int main(int argc, char **argv) +{ + int err; + pthread_t *tid_enqueuer, *tid_dequeuer; + void *tret; + unsigned long long *count_enqueuer, *count_dequeuer; + unsigned long long tot_enqueues = 0, tot_dequeues = 0; + unsigned long long tot_successful_enqueues = 0, + tot_successful_dequeues = 0; + unsigned long long end_dequeues = 0; + int i, a; + + if (argc < 4) { + show_usage(argc, argv); + return -1; + } + + err = sscanf(argv[1], "%u", &nr_dequeuers); + if (err != 1) { + show_usage(argc, argv); + return -1; + } + + err = sscanf(argv[2], "%u", &nr_enqueuers); + if (err != 1) { + show_usage(argc, argv); + return -1; + } + + err = sscanf(argv[3], "%lu", &duration); + if (err != 1) { + show_usage(argc, argv); + return -1; + } + + for (i = 4; i < argc; i++) { + if (argv[i][0] != '-') + continue; + switch (argv[i][1]) { + case 'a': + if (argc < i + 2) { + show_usage(argc, argv); + return -1; + } + a = atoi(argv[++i]); + cpu_affinities[next_aff++] = a; + use_affinity = 1; + 
printf_verbose("Adding CPU %d affinity\n", a); + break; + case 'c': + if (argc < i + 2) { + show_usage(argc, argv); + return -1; + } + rduration = atol(argv[++i]); + break; + case 'd': + if (argc < i + 2) { + show_usage(argc, argv); + return -1; + } + wdelay = atol(argv[++i]); + break; + case 'v': + verbose_mode = 1; + break; + } + } + + printf_verbose("running test for %lu seconds, %u enqueuers, " + "%u dequeuers.\n", + duration, nr_enqueuers, nr_dequeuers); + printf_verbose("Writer delay : %lu loops.\n", wdelay); + printf_verbose("Reader duration : %lu loops.\n", rduration); + printf_verbose("thread %-6s, thread id : %lx, tid %lu\n", + "main", pthread_self(), (unsigned long)gettid()); + + tid_enqueuer = malloc(sizeof(*tid_enqueuer) * nr_enqueuers); + tid_dequeuer = malloc(sizeof(*tid_dequeuer) * nr_dequeuers); + count_enqueuer = malloc(2 * sizeof(*count_enqueuer) * nr_enqueuers); + count_dequeuer = malloc(2 * sizeof(*count_dequeuer) * nr_dequeuers); + cds_wfcq_init(&head, &tail); + + next_aff = 0; + + for (i = 0; i < nr_enqueuers; i++) { + err = pthread_create(&tid_enqueuer[i], NULL, thr_enqueuer, + &count_enqueuer[2 * i]); + if (err != 0) + exit(1); + } + for (i = 0; i < nr_dequeuers; i++) { + err = pthread_create(&tid_dequeuer[i], NULL, thr_dequeuer, + &count_dequeuer[2 * i]); + if (err != 0) + exit(1); + } + + cmm_smp_mb(); + + test_go = 1; + + for (i = 0; i < duration; i++) { + sleep(1); + if (verbose_mode) + write (1, ".", 1); + } + + test_stop = 1; + + for (i = 0; i < nr_enqueuers; i++) { + err = pthread_join(tid_enqueuer[i], &tret); + if (err != 0) + exit(1); + tot_enqueues += count_enqueuer[2 * i]; + tot_successful_enqueues += count_enqueuer[2 * i + 1]; + } + for (i = 0; i < nr_dequeuers; i++) { + err = pthread_join(tid_dequeuer[i], &tret); + if (err != 0) + exit(1); + tot_dequeues += count_dequeuer[2 * i]; + tot_successful_dequeues += count_dequeuer[2 * i + 1]; + } + + test_end(&end_dequeues); + + printf_verbose("total number of enqueues : %llu, dequeues 
%llu\n", + tot_enqueues, tot_dequeues); + printf_verbose("total number of successful enqueues : %llu, " + "successful dequeues %llu\n", + tot_successful_enqueues, tot_successful_dequeues); + printf("SUMMARY %-25s testdur %4lu nr_enqueuers %3u wdelay %6lu " + "nr_dequeuers %3u " + "rdur %6lu nr_enqueues %12llu nr_dequeues %12llu " + "successful enqueues %12llu successful dequeues %12llu " + "end_dequeues %llu nr_ops %12llu\n", + argv[0], duration, nr_enqueuers, wdelay, + nr_dequeuers, rduration, tot_enqueues, tot_dequeues, + tot_successful_enqueues, + tot_successful_dequeues, end_dequeues, + tot_enqueues + tot_dequeues); + if (tot_successful_enqueues != tot_successful_dequeues + end_dequeues) + printf("WARNING! Discrepancy between nr succ. enqueues %llu vs " + "succ. dequeues + end dequeues %llu.\n", + tot_successful_enqueues, + tot_successful_dequeues + end_dequeues); + + free(count_enqueuer); + free(count_dequeuer); + free(tid_enqueuer); + free(tid_dequeuer); + return 0; +} -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 2 10:16:38 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 2 Oct 2012 10:16:38 -0400 Subject: [lttng-dev] [URCU PATCH 3/3] call_rcu: use wfcqueue, eliminate false-sharing In-Reply-To: <20121002141307.GA4057@Krystal> References: <20121002141307.GA4057@Krystal> Message-ID: <20121002141638.GD4057@Krystal> Eliminate false-sharing between call_rcu (enqueuer) and worker threads on the queue head and tail. 
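As a rough illustration of the false-sharing point in the description above, here is a minimal, standalone sketch. It does not use liburcu: struct demo_tail, struct demo_head, the two queue structs and DEMO_CACHE_LINE_SIZE are hypothetical stand-ins (the patch itself uses cds_wfcq_tail, cds_wfcq_head and CAA_CACHE_LINE_SIZE), and the 64-byte line size is an assumption. It only shows how the GCC/Clang aligned attribute moves two members onto different cache lines, measured relative to the start of the struct.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines. liburcu uses CAA_CACHE_LINE_SIZE. */
#define DEMO_CACHE_LINE_SIZE 64

/* Hypothetical stand-ins for the queue tail and head structures. */
struct demo_tail { void *p; };
struct demo_head { void *next; };

/* No alignment: head directly follows tail within the same 64 bytes. */
struct demo_packed_queue {
	struct demo_tail tail;
	struct demo_head head;
};

/*
 * Both members aligned on the cache line size: the alignment of head
 * is what forces the compiler to insert padding after tail.
 */
struct demo_split_queue {
	struct demo_tail __attribute__((aligned(DEMO_CACHE_LINE_SIZE))) tail;
	struct demo_head __attribute__((aligned(DEMO_CACHE_LINE_SIZE))) head;
};

/* Cache line index of a member offset, relative to the struct start. */
size_t demo_line_of(size_t offset)
{
	return offset / DEMO_CACHE_LINE_SIZE;
}

/* Returns 1 if tail and head start on different lines in the split layout. */
int demo_split_is_separated(void)
{
	return demo_line_of(offsetof(struct demo_split_queue, tail)) !=
		demo_line_of(offsetof(struct demo_split_queue, head));
}

/* Same check for the unaligned layout; expected to return 0. */
int demo_packed_is_separated(void)
{
	return demo_line_of(offsetof(struct demo_packed_queue, tail)) !=
		demo_line_of(offsetof(struct demo_packed_queue, head));
}
```

Note that an aligned attribute on a member only controls where that member starts. Padding after it appears when the following member, or the end of the struct, also requires alignment, so checking the resulting layout with offsetof() is a cheap sanity test.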
Signed-off-by: Mathieu Desnoyers --- diff --git a/tests/Makefile.am b/tests/Makefile.am index 81718bb..c92bbe6 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -30,14 +30,14 @@ if COMPAT_FUTEX COMPAT+=$(top_srcdir)/compat_futex.c endif -URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) -URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) +URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) +URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) # URCU_MB uses urcu.c but -DRCU_MB must be defined -URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) +URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) # URCU_SIGNAL uses urcu.c but -DRCU_SIGNAL must be defined -URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) -URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) -URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) +URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) +URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) +URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) URCU_COMMON_LIB=$(top_builddir)/liburcu-common.la URCU_LIB=$(top_builddir)/liburcu.la diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h index 13b24ff..cf65992 100644 --- a/urcu-call-rcu-impl.h +++ b/urcu-call-rcu-impl.h @@ -21,6 +21,7 @@ */ #define _GNU_SOURCE +#define _LGPL_SOURCE #include #include #include @@ -35,7 +36,7 @@ #include #include "config.h" -#include "urcu/wfqueue.h" +#include "urcu/wfcqueue.h" #include "urcu-call-rcu.h" #include "urcu-pointer.h" #include "urcu/list.h" 
@@ -46,7 +47,14 @@ /* Data structure that identifies a call_rcu thread. */ struct call_rcu_data { - struct cds_wfq_queue cbs; + /* + * Align the tail on cache line size to eliminate false-sharing + * with head. + */ + struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_tail; + /* Alignment on cache line size will add padding here */ + + struct cds_wfcq_head cbs_head; unsigned long flags; int32_t futex; unsigned long qlen; /* maintained for debugging. */ @@ -220,10 +228,7 @@ static void call_rcu_wake_up(struct call_rcu_data *crdp) static void *call_rcu_thread(void *arg) { unsigned long cbcount; - struct cds_wfq_node *cbs; - struct cds_wfq_node **cbs_tail; - struct call_rcu_data *crdp = (struct call_rcu_data *)arg; - struct rcu_head *rhp; + struct call_rcu_data *crdp = (struct call_rcu_data *) arg; int rt = !!(uatomic_read(&crdp->flags) & URCU_CALL_RCU_RT); int ret; @@ -243,35 +248,33 @@ static void *call_rcu_thread(void *arg) cmm_smp_mb(); } for (;;) { - if (&crdp->cbs.head != _CMM_LOAD_SHARED(crdp->cbs.tail)) { - while ((cbs = _CMM_LOAD_SHARED(crdp->cbs.head)) == NULL) - poll(NULL, 0, 1); - _CMM_STORE_SHARED(crdp->cbs.head, NULL); - cbs_tail = (struct cds_wfq_node **) - uatomic_xchg(&crdp->cbs.tail, &crdp->cbs.head); + struct cds_wfcq_head cbs_tmp_head; + struct cds_wfcq_tail cbs_tmp_tail; + struct cds_wfcq_node *cbs, *cbs_tmp_n; + + cds_wfcq_init(&cbs_tmp_head, &cbs_tmp_tail); + __cds_wfcq_splice_blocking(&cbs_tmp_head, &cbs_tmp_tail, + &crdp->cbs_head, &crdp->cbs_tail); + if (!cds_wfcq_empty(&cbs_tmp_head, &cbs_tmp_tail)) { synchronize_rcu(); cbcount = 0; - do { - while (cbs->next == NULL && - &cbs->next != cbs_tail) - poll(NULL, 0, 1); - if (cbs == &crdp->cbs.dummy) { - cbs = cbs->next; - continue; - } - rhp = (struct rcu_head *)cbs; - cbs = cbs->next; + __cds_wfcq_for_each_blocking_safe(&cbs_tmp_head, + &cbs_tmp_tail, cbs, cbs_tmp_n) { + struct rcu_head *rhp; + + rhp = caa_container_of(cbs, + struct rcu_head, next); rhp->func(rhp); cbcount++; 
- } while (cbs != NULL); + } uatomic_sub(&crdp->qlen, cbcount); } if (uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOP) break; rcu_thread_offline(); if (!rt) { - if (&crdp->cbs.head - == _CMM_LOAD_SHARED(crdp->cbs.tail)) { + if (cds_wfcq_empty(&crdp->cbs_head, + &crdp->cbs_tail)) { call_rcu_wait(crdp); poll(NULL, 0, 10); uatomic_dec(&crdp->futex); @@ -317,7 +320,7 @@ static void call_rcu_data_init(struct call_rcu_data **crdpp, if (crdp == NULL) urcu_die(errno); memset(crdp, '\0', sizeof(*crdp)); - cds_wfq_init(&crdp->cbs); + cds_wfcq_init(&crdp->cbs_head, &crdp->cbs_tail); crdp->qlen = 0; crdp->futex = 0; crdp->flags = flags; @@ -590,12 +593,12 @@ void call_rcu(struct rcu_head *head, { struct call_rcu_data *crdp; - cds_wfq_node_init(&head->next); + cds_wfcq_node_init(&head->next); head->func = func; /* Holding rcu read-side lock across use of per-cpu crdp */ rcu_read_lock(); crdp = get_call_rcu_data(); - cds_wfq_enqueue(&crdp->cbs, &head->next); + cds_wfcq_enqueue(&crdp->cbs_head, &crdp->cbs_tail, &head->next); uatomic_inc(&crdp->qlen); wake_call_rcu_thread(crdp); rcu_read_unlock(); @@ -625,10 +628,6 @@ void call_rcu(struct rcu_head *head, */ void call_rcu_data_free(struct call_rcu_data *crdp) { - struct cds_wfq_node *cbs; - struct cds_wfq_node **cbs_tail; - struct cds_wfq_node **cbs_endprev; - if (crdp == NULL || crdp == default_call_rcu_data) { return; } @@ -638,17 +637,12 @@ void call_rcu_data_free(struct call_rcu_data *crdp) while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0) poll(NULL, 0, 1); } - if (&crdp->cbs.head != _CMM_LOAD_SHARED(crdp->cbs.tail)) { - while ((cbs = _CMM_LOAD_SHARED(crdp->cbs.head)) == NULL) - poll(NULL, 0, 1); - _CMM_STORE_SHARED(crdp->cbs.head, NULL); - cbs_tail = (struct cds_wfq_node **) - uatomic_xchg(&crdp->cbs.tail, &crdp->cbs.head); + if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) { /* Create default call rcu data if need be */ (void) get_default_call_rcu_data(); - cbs_endprev = (struct cds_wfq_node **) - 
uatomic_xchg(&default_call_rcu_data, cbs_tail); - *cbs_endprev = cbs; + __cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head, + &default_call_rcu_data->cbs_tail, + &crdp->cbs_head, &crdp->cbs_tail); uatomic_add(&default_call_rcu_data->qlen, uatomic_read(&crdp->qlen)); wake_call_rcu_thread(default_call_rcu_data); diff --git a/urcu-call-rcu.h b/urcu-call-rcu.h index f7eac8d..1dad0e2 100644 --- a/urcu-call-rcu.h +++ b/urcu-call-rcu.h @@ -32,7 +32,7 @@ #include #include -#include +#include #ifdef __cplusplus extern "C" { @@ -55,7 +55,7 @@ struct call_rcu_data; */ struct rcu_head { - struct cds_wfq_node next; + struct cds_wfcq_node next; void (*func)(struct rcu_head *head); }; -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Tue Oct 2 11:26:54 2012 From: dgoulet at efficios.com (David Goulet) Date: Tue, 02 Oct 2012 11:26:54 -0400 Subject: [lttng-dev] [lttng-tools GIT PULL] please pull from compudj-pull In-Reply-To: <20121001232227.GA23978@Krystal> References: <20121001232227.GA23978@Krystal> Message-ID: <506B07BE.4090209@efficios.com> Merged! Mathieu Desnoyers: > at commit db8870edf473e2a2f69e488375d32405ea324017 > > Thanks! > > Mathieu > From paul.chavent at fnac.net Tue Oct 2 03:37:07 2012 From: paul.chavent at fnac.net (paul.chavent at fnac.net) Date: Tue, 2 Oct 2012 09:37:07 +0200 (CEST) Subject: [lttng-dev] Build out of src tree. Message-ID: <1551710.192371349163427925.JavaMail.www@wsfrf1114> I used to build my packages out of src tree. As it's not straightforward for lttng-tools and lttng-ust, i would like to submit those two patches. Regards. Paul. -------------- next part -------------- A non-text attachment was scrubbed... Name: lttng-ust-git-0001-Build-out-of-src-tree.patch Type: text/x-patch Size: 5627 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: lttng-tools-git-0001-Build-out-of-src-tree.patch Type: text/x-patch Size: 9835 bytes Desc: not available URL: From mathieu.desnoyers at efficios.com Tue Oct 2 12:47:33 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 2 Oct 2012 12:47:33 -0400 Subject: [lttng-dev] Build out of src tree. In-Reply-To: <1551710.192371349163427925.JavaMail.www@wsfrf1114> References: <1551710.192371349163427925.JavaMail.www@wsfrf1114> Message-ID: <20121002164733.GA5877@Krystal> * paul.chavent at fnac.net (paul.chavent at fnac.net) wrote: > I used to build my packages out of src tree. As it's not > straightforward for lttng-tools and lttng-ust, i would like to submit > those two patches. I just merged your UST patch into master and stable-2.0. David, I recommend you do the same for the lttng-tools. (please consider this as my acked-by). Thanks! Mathieu > > Regards. > > Paul. > > From dbcdad58a675fb19125411f713700ce633919173 Mon Sep 17 00:00:00 2001 > From: Paul Chavent > Date: Tue, 2 Oct 2012 09:30:31 +0200 > Subject: [PATCH] Build out of src tree. 
> > --- > liblttng-ust-ctl/Makefile.am | 2 +- > liblttng-ust-libc-wrapper/Makefile.am | 2 +- > liblttng-ust/Makefile.am | 2 +- > libringbuffer/Makefile.am | 4 ++-- > tests/demo/Makefile.am | 2 +- > tests/fork/Makefile.am | 2 +- > tests/hello-static-lib/Makefile.am | 2 +- > tests/hello.cxx/Makefile.am | 2 +- > tests/hello/Makefile.am | 2 +- > tests/ust-basic-tracing/Makefile.am | 2 +- > tests/ust-multi-test/Makefile.am | 2 +- > 11 files changed, 12 insertions(+), 12 deletions(-) > > diff --git a/liblttng-ust-ctl/Makefile.am b/liblttng-ust-ctl/Makefile.am > index 4e341be..1a57fef 100644 > --- a/liblttng-ust-ctl/Makefile.am > +++ b/liblttng-ust-ctl/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/liblttng-ust-comm > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/liblttng-ust-comm -I$(top_builddir)/include > AM_CFLAGS = -fno-strict-aliasing > > lib_LTLIBRARIES = liblttng-ust-ctl.la > diff --git a/liblttng-ust-libc-wrapper/Makefile.am b/liblttng-ust-libc-wrapper/Makefile.am > index 5b3f7f0..4fdcedb 100644 > --- a/liblttng-ust-libc-wrapper/Makefile.am > +++ b/liblttng-ust-libc-wrapper/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include > AM_CFLAGS = -fno-strict-aliasing > > lib_LTLIBRARIES = liblttng-ust-libc-wrapper.la > diff --git a/liblttng-ust/Makefile.am b/liblttng-ust/Makefile.am > index aeb5092..dff8b6e 100644 > --- a/liblttng-ust/Makefile.am > +++ b/liblttng-ust/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include > AM_CFLAGS = -fno-strict-aliasing > > noinst_LTLIBRARIES = liblttng-ust-runtime.la liblttng-ust-support.la > diff --git a/libringbuffer/Makefile.am b/libringbuffer/Makefile.am > index b5b0ebd..271c8be 100644 > --- a/libringbuffer/Makefile.am > +++ b/libringbuffer/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include > 
+AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include > AM_CFLAGS = -fno-strict-aliasing > > noinst_LTLIBRARIES = libringbuffer.la > @@ -12,7 +12,7 @@ libringbuffer_la_SOURCES = \ > backend.h backend_internal.h backend_types.h \ > frontend_api.h frontend.h frontend_internal.h frontend_types.h \ > nohz.h vatomic.h tlsfixup.h > - > + > libringbuffer_la_LIBADD = \ > -lpthread \ > -lrt > diff --git a/tests/demo/Makefile.am b/tests/demo/Makefile.am > index e4570ff..0e43255 100644 > --- a/tests/demo/Makefile.am > +++ b/tests/demo/Makefile.am > @@ -1,6 +1,6 @@ > # -Wsystem-headers is needed to print warnings in the tracepoint > # description file. > -AM_CPPFLAGS = -I$(top_srcdir)/include -Wsystem-headers > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include -Wsystem-headers > > # Set LIBS to nothing so the application does not link on useless > # libraries. > diff --git a/tests/fork/Makefile.am b/tests/fork/Makefile.am > index 0a649c7..a893366 100644 > --- a/tests/fork/Makefile.am > +++ b/tests/fork/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include -Wsystem-headers > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include -Wsystem-headers > > noinst_PROGRAMS = fork fork2 > fork_SOURCES = fork.c ust_tests_fork.h > diff --git a/tests/hello-static-lib/Makefile.am b/tests/hello-static-lib/Makefile.am > index 6633489..6ae9463 100644 > --- a/tests/hello-static-lib/Makefile.am > +++ b/tests/hello-static-lib/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include -Wsystem-headers > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include -Wsystem-headers > > noinst_LTLIBRARIES = liblttng-ust-provider-ust-test-hello.la > liblttng_ust_provider_ust_test_hello_la_SOURCES = \ > diff --git a/tests/hello.cxx/Makefile.am b/tests/hello.cxx/Makefile.am > index f56f431..4eace0e 100644 > --- a/tests/hello.cxx/Makefile.am > +++ b/tests/hello.cxx/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include 
-Wsystem-headers > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include -Wsystem-headers > > noinst_PROGRAMS = hello > hello_SOURCES = hello.cpp tp.c ust_tests_hello.h > diff --git a/tests/hello/Makefile.am b/tests/hello/Makefile.am > index 0c4c311..1ee7f8d 100644 > --- a/tests/hello/Makefile.am > +++ b/tests/hello/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include -Wsystem-headers > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include -Wsystem-headers > > noinst_PROGRAMS = hello > hello_SOURCES = hello.c tp.c ust_tests_hello.h > diff --git a/tests/ust-basic-tracing/Makefile.am b/tests/ust-basic-tracing/Makefile.am > index 0a0a2f0..175adc8 100644 > --- a/tests/ust-basic-tracing/Makefile.am > +++ b/tests/ust-basic-tracing/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/libust > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include -I$(top_srcdir)/libust > > noinst_PROGRAMS = ust-basic-tracing > ust_basic_tracing_SOURCES = ust-basic-tracing.c > diff --git a/tests/ust-multi-test/Makefile.am b/tests/ust-multi-test/Makefile.am > index c1d39d9..69c7cc9 100644 > --- a/tests/ust-multi-test/Makefile.am > +++ b/tests/ust-multi-test/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/libust > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_builddir)/include -I$(top_srcdir)/libust > > noinst_PROGRAMS = ust-multi-test > ust_multi_test_SOURCES = ust-multi-test.c > -- > 1.7.9.5 > > From 3425bde2827d245cb427c989786fdc10bb235b9f Mon Sep 17 00:00:00 2001 > From: Paul Chavent > Date: Tue, 2 Oct 2012 09:30:02 +0200 > Subject: [PATCH] Build out of src tree. 
> > --- > src/bin/lttng-consumerd/Makefile.am | 2 +- > src/bin/lttng-relayd/Makefile.am | 3 ++- > src/bin/lttng-sessiond/Makefile.am | 5 +++-- > src/bin/lttng/Makefile.am | 3 ++- > src/common/Makefile.am | 2 +- > src/common/compat/Makefile.am | 2 +- > src/common/hashtable/Makefile.am | 2 +- > src/common/kernel-consumer/Makefile.am | 2 ++ > src/common/kernel-ctl/Makefile.am | 2 +- > src/common/relayd/Makefile.am | 2 ++ > src/common/sessiond-comm/Makefile.am | 2 ++ > src/common/ust-consumer/Makefile.am | 3 +++ > src/lib/lttng-ctl/Makefile.am | 2 ++ > src/lib/lttng-ctl/filter/Makefile.am | 2 ++ > tests/kernel/Makefile.am | 2 +- > tests/tools/Makefile.am | 2 +- > tests/tools/streaming/Makefile.am | 2 +- > tests/ust/Makefile.am | 2 +- > tests/ust/before-after/Makefile.am | 2 +- > tests/ust/high-throughput/Makefile.am | 2 +- > tests/ust/low-throughput/Makefile.am | 2 +- > tests/ust/multi-session/Makefile.am | 2 +- > tests/ust/nprocesses/Makefile.am | 2 +- > 23 files changed, 34 insertions(+), 18 deletions(-) > > diff --git a/src/bin/lttng-consumerd/Makefile.am b/src/bin/lttng-consumerd/Makefile.am > index 89ae059..a395c0b 100644 > --- a/src/bin/lttng-consumerd/Makefile.am > +++ b/src/bin/lttng-consumerd/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > > lttnglibexec_PROGRAMS = lttng-consumerd > > diff --git a/src/bin/lttng-relayd/Makefile.am b/src/bin/lttng-relayd/Makefile.am > index cb625e5..1f3cde4 100644 > --- a/src/bin/lttng-relayd/Makefile.am > +++ b/src/bin/lttng-relayd/Makefile.am > @@ -1,4 +1,5 @@ > -AM_CPPFLAGS = -DINSTALL_BIN_PATH=\""$(lttnglibexecdir)"\" \ > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src \ > + -DINSTALL_BIN_PATH=\""$(lttnglibexecdir)"\" \ > -DINSTALL_LIB_PATH=\""$(libdir)"\" > > AM_CFLAGS = -fno-strict-aliasing > diff --git a/src/bin/lttng-sessiond/Makefile.am b/src/bin/lttng-sessiond/Makefile.am > index 73be023..b1859d6 100644 > --- a/src/bin/lttng-sessiond/Makefile.am > 
+++ b/src/bin/lttng-sessiond/Makefile.am > @@ -1,5 +1,6 @@ > -AM_CPPFLAGS = -DINSTALL_BIN_PATH=\""$(lttnglibexecdir)"\" \ > - -DINSTALL_LIB_PATH=\""$(libdir)"\" > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src \ > + -DINSTALL_BIN_PATH=\""$(lttnglibexecdir)"\" \ > + -DINSTALL_LIB_PATH=\""$(libdir)"\" > > AM_CFLAGS = -fno-strict-aliasing > > diff --git a/src/bin/lttng/Makefile.am b/src/bin/lttng/Makefile.am > index 0381aa6..fe7991b 100644 > --- a/src/bin/lttng/Makefile.am > +++ b/src/bin/lttng/Makefile.am > @@ -1,4 +1,5 @@ > -AM_CPPFLAGS = -DINSTALL_BIN_PATH=\""$(bindir)"\" > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src \ > + -DINSTALL_BIN_PATH=\""$(bindir)"\" > > bin_PROGRAMS = lttng > > diff --git a/src/common/Makefile.am b/src/common/Makefile.am > index ca48153..5850041 100644 > --- a/src/common/Makefile.am > +++ b/src/common/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > > SUBDIRS = compat hashtable kernel-ctl sessiond-comm relayd kernel-consumer ust-consumer > > diff --git a/src/common/compat/Makefile.am b/src/common/compat/Makefile.am > index 91cd3bd..2d83282 100644 > --- a/src/common/compat/Makefile.am > +++ b/src/common/compat/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > > noinst_LTLIBRARIES = libcompat.la > > diff --git a/src/common/hashtable/Makefile.am b/src/common/hashtable/Makefile.am > index 7a2b835..62e22c1 100644 > --- a/src/common/hashtable/Makefile.am > +++ b/src/common/hashtable/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > > noinst_LTLIBRARIES = libhashtable.la > > diff --git a/src/common/kernel-consumer/Makefile.am b/src/common/kernel-consumer/Makefile.am > index 008041e..ed5462a 100644 > --- a/src/common/kernel-consumer/Makefile.am > +++ b/src/common/kernel-consumer/Makefile.am > @@ 
-1,3 +1,5 @@ > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > + > # Kernel consumer library > noinst_LTLIBRARIES = libkernel-consumer.la > > diff --git a/src/common/kernel-ctl/Makefile.am b/src/common/kernel-ctl/Makefile.am > index a56a021..ab057dd 100644 > --- a/src/common/kernel-ctl/Makefile.am > +++ b/src/common/kernel-ctl/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CPPFLAGS = -I$(top_srcdir)/include > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > > noinst_LTLIBRARIES = libkernel-ctl.la > > diff --git a/src/common/relayd/Makefile.am b/src/common/relayd/Makefile.am > index 84eee1b..274da87 100644 > --- a/src/common/relayd/Makefile.am > +++ b/src/common/relayd/Makefile.am > @@ -1,3 +1,5 @@ > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > + > # Relayd library > noinst_LTLIBRARIES = librelayd.la > > diff --git a/src/common/sessiond-comm/Makefile.am b/src/common/sessiond-comm/Makefile.am > index 2a70d54..24063f8 100644 > --- a/src/common/sessiond-comm/Makefile.am > +++ b/src/common/sessiond-comm/Makefile.am > @@ -1,3 +1,5 @@ > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > + > # Session daemon communication lib > noinst_LTLIBRARIES = libsessiond-comm.la > > diff --git a/src/common/ust-consumer/Makefile.am b/src/common/ust-consumer/Makefile.am > index c79c4a5..ab8d38a 100644 > --- a/src/common/ust-consumer/Makefile.am > +++ b/src/common/ust-consumer/Makefile.am > @@ -1,4 +1,7 @@ > if HAVE_LIBLTTNG_UST_CTL > + > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src > + > noinst_LTLIBRARIES = libust-consumer.la > > libust_consumer_la_SOURCES = ust-consumer.c ust-consumer.h > diff --git a/src/lib/lttng-ctl/Makefile.am b/src/lib/lttng-ctl/Makefile.am > index f1b4b50..2681bdd 100644 > --- a/src/lib/lttng-ctl/Makefile.am > +++ b/src/lib/lttng-ctl/Makefile.am > @@ -1,3 +1,5 @@ > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src -I$(builddir) > + > SUBDIRS = filter > > lib_LTLIBRARIES = liblttng-ctl.la > diff 
--git a/src/lib/lttng-ctl/filter/Makefile.am b/src/lib/lttng-ctl/filter/Makefile.am > index 45ed418..28ff90a 100644 > --- a/src/lib/lttng-ctl/filter/Makefile.am > +++ b/src/lib/lttng-ctl/filter/Makefile.am > @@ -1,3 +1,5 @@ > +AM_CPPFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src -I$(srcdir) -I$(builddir) > + > noinst_PROGRAMS = filter-grammar-test > noinst_LTLIBRARIES = libfilter.la > noinst_HEADERS = filter-ast.h > diff --git a/tests/kernel/Makefile.am b/tests/kernel/Makefile.am > index 2992afb..8872af4 100644 > --- a/tests/kernel/Makefile.am > +++ b/tests/kernel/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CFLAGS = -g -Wall -I../ > +AM_CFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src -I$(top_srcdir)/tests -g -Wall > AM_LDFLAGS = -lurcu -lurcu-cds > > EXTRA_DIST = runall.sh run-kernel-tests.sh > diff --git a/tests/tools/Makefile.am b/tests/tools/Makefile.am > index 3d25900..c0477e5 100644 > --- a/tests/tools/Makefile.am > +++ b/tests/tools/Makefile.am > @@ -1,6 +1,6 @@ > SUBDIRS = streaming > > -AM_CFLAGS = -g -Wall -I../ > +AM_CFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src -I$(top_srcdir)/tests -g -Wall > AM_LDFLAGS = -lurcu -lurcu-cds > > EXTRA_DIST = runall.sh > diff --git a/tests/tools/streaming/Makefile.am b/tests/tools/streaming/Makefile.am > index f7a3c9d..3ff8ef0 100644 > --- a/tests/tools/streaming/Makefile.am > +++ b/tests/tools/streaming/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CFLAGS = -I. 
-O2 -g -I../../ > +AM_CFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src -I$(top_srcdir)/tests -I$(srcdir) -O2 -g > AM_LDFLAGS = > > if LTTNG_TOOLS_BUILD_WITH_LIBDL > diff --git a/tests/ust/Makefile.am b/tests/ust/Makefile.am > index 146bff2..387842d 100644 > --- a/tests/ust/Makefile.am > +++ b/tests/ust/Makefile.am > @@ -1,7 +1,7 @@ > if HAVE_LIBLTTNG_UST_CTL > SUBDIRS = nprocesses high-throughput low-throughput before-after multi-session > > -AM_CFLAGS = -g -Wall -I../ > +AM_CFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/tests -I$(top_srcdir)/src -g -Wall > AM_LDFLAGS = -lurcu -lurcu-cds > > EXTRA_DIST = runall.sh run-ust-global-tests.sh > diff --git a/tests/ust/before-after/Makefile.am b/tests/ust/before-after/Makefile.am > index 29652dc..d197d72 100644 > --- a/tests/ust/before-after/Makefile.am > +++ b/tests/ust/before-after/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CFLAGS = -I. -O2 > +AM_CFLAGS = -I$(srcdir) -O2 > AM_LDFLAGS = -llttng-ust > > if LTTNG_TOOLS_BUILD_WITH_LIBDL > diff --git a/tests/ust/high-throughput/Makefile.am b/tests/ust/high-throughput/Makefile.am > index 01d5ad2..cff8fe4 100644 > --- a/tests/ust/high-throughput/Makefile.am > +++ b/tests/ust/high-throughput/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CFLAGS = -I. -O2 > +AM_CFLAGS = -I$(srcdir) -O2 > AM_LDFLAGS = -llttng-ust > > if LTTNG_TOOLS_BUILD_WITH_LIBDL > diff --git a/tests/ust/low-throughput/Makefile.am b/tests/ust/low-throughput/Makefile.am > index aefdf53..a1df9f5 100644 > --- a/tests/ust/low-throughput/Makefile.am > +++ b/tests/ust/low-throughput/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CFLAGS = -I. -O2 > +AM_CFLAGS = -I$(srcdir) -O2 > AM_LDFLAGS = -llttng-ust > > if LTTNG_TOOLS_BUILD_WITH_LIBDL > diff --git a/tests/ust/multi-session/Makefile.am b/tests/ust/multi-session/Makefile.am > index 29652dc..d197d72 100644 > --- a/tests/ust/multi-session/Makefile.am > +++ b/tests/ust/multi-session/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CFLAGS = -I. 
-O2 > +AM_CFLAGS = -I$(srcdir) -O2 > AM_LDFLAGS = -llttng-ust > > if LTTNG_TOOLS_BUILD_WITH_LIBDL > diff --git a/tests/ust/nprocesses/Makefile.am b/tests/ust/nprocesses/Makefile.am > index d3fdcbd..20beea0 100644 > --- a/tests/ust/nprocesses/Makefile.am > +++ b/tests/ust/nprocesses/Makefile.am > @@ -1,4 +1,4 @@ > -AM_CFLAGS = -I. -O2 > +AM_CFLAGS = -I$(srcdir) -O2 > AM_LDFLAGS = -llttng-ust > > if LTTNG_TOOLS_BUILD_WITH_LIBDL > -- > 1.7.9.5 > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From christian.babeux at efficios.com Tue Oct 2 14:00:30 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 2 Oct 2012 14:00:30 -0400 Subject: [lttng-dev] [PATCH lttng-tools 4/5] Tests: Add health check thread stall test In-Reply-To: <1348770199-1618-4-git-send-email-christian.babeux@efficios.com> References: <1348770199-1618-1-git-send-email-christian.babeux@efficios.com> <1348770199-1618-4-git-send-email-christian.babeux@efficios.com> Message-ID: Hi, Please disregard this patch because it's missing the env_val declaration. A v2 will soon follow. Thank you, Christian On Thu, Sep 27, 2012 at 2:23 PM, Christian Babeux wrote: > This test triggers a "code stall" in a specified thread using the > testpoint mechanism. The testpoint behavior is implemented in > health_stall.c. The testpoint code stalls a specific thread's processing > by calling sleep(3). > > The test selects the thread to be stalled by enabling a specific > environment variable. > > The test ensures the threads can be successfully stalled and that the > health check feature is able to properly detect stalling in non-polling > cases. 
> > Signed-off-by: Christian Babeux > --- > tests/tools/health/Makefile.am | 6 +- > tests/tools/health/health_stall.c | 66 +++++++++++++++++ > tests/tools/health/health_thread_stall | 128 +++++++++++++++++++++++++++++++++ > 3 files changed, 199 insertions(+), 1 deletion(-) > create mode 100644 tests/tools/health/health_stall.c > create mode 100755 tests/tools/health/health_thread_stall > > diff --git a/tests/tools/health/Makefile.am b/tests/tools/health/Makefile.am > index 0a3f6c5..9fab582 100644 > --- a/tests/tools/health/Makefile.am > +++ b/tests/tools/health/Makefile.am > @@ -10,12 +10,16 @@ endif > > UTILS= > > -lib_LTLIBRARIES=libhealthexit.la > +lib_LTLIBRARIES=libhealthexit.la libhealthstall.la > > # Health thread exit ld_preloaded test lib > libhealthexit_la_SOURCES=health_exit.c > libhealthexit_la_LDFLAGS= -module > > +# Health thread stall ld_preloaded test lib > +libhealthstall_la_SOURCES=health_stall.c > +libhealthstall_la_LDFLAGS= -module > + > noinst_PROGRAMS = health_check > > health_check_SOURCES = health_check.c $(UTILS) > diff --git a/tests/tools/health/health_stall.c b/tests/tools/health/health_stall.c > new file mode 100644 > index 0000000..86b6986 > --- /dev/null > +++ b/tests/tools/health/health_stall.c > @@ -0,0 +1,66 @@ > +/* > + * Copyright (C) 2012 - Christian Babeux > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License, version 2 only, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. 
> + * > + * You should have received a copy of the GNU General Public License along with > + * this program; if not, write to the Free Software Foundation, Inc., 51 > + * Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > + */ > + > +#include > +#include > +#include > +#include > + > +#define STALL_TIME 60 > + > +/* > + * Check if the specified environment variable is set. > + * Return 1 if set, otherwise 0. > + */ > +int check_env_var(const char *env) > +{ > + if (env) { > + if (getenv(env) != NULL && (strncmp(env_val, "1", 1) == 0)) { > + return 1; > + } > + } > + > + return 0; > +} > + > +void __testpoint_thread_manage_clients_before_loop(void) > +{ > + const char *var = "LTTNG_THREAD_MANAGE_CLIENTS_STALL"; > + > + if (check_env_var(var)) { > + sleep(STALL_TIME); > + } > +} > + > +void __testpoint_thread_manage_kernel_before_loop(void) > +{ > + const char *var = "LTTNG_THREAD_MANAGE_KERNEL_STALL"; > + > + if (check_env_var(var)) { > + sleep(STALL_TIME); > + } > +} > + > +void __testpoint_thread_manage_apps_before_loop(void) > +{ > + const char *var = "LTTNG_THREAD_MANAGE_APPS_STALL"; > + > + if (check_env_var(var)) { > + sleep(STALL_TIME); > + } > +} > + > diff --git a/tests/tools/health/health_thread_stall b/tests/tools/health/health_thread_stall > new file mode 100755 > index 0000000..d870895 > --- /dev/null > +++ b/tests/tools/health/health_thread_stall > @@ -0,0 +1,128 @@ > +#!/bin/bash > +# > +# Copyright (C) - 2012 Christian Babeux > +# > +# This program is free software; you can redistribute it and/or modify it > +# under the terms of the GNU General Public License, version 2 only, as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it will be useful, but WITHOUT > +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > +# more details. 
> +# > +# You should have received a copy of the GNU General Public License along with > +# this program; if not, write to the Free Software Foundation, Inc., 51 > +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > + > +TEST_DESC="Health check - Thread stall" > + > +CURDIR=$(dirname $0)/ > +TESTDIR=$CURDIR/../.. > +LTTNG_BIN="lttng" > +SESSION_NAME="health_thread_stall" > +EVENT_NAME="bogus" > +HEALTH_CHECK_BIN="health_check" > +SESSIOND_PRELOAD=".libs/libhealthstall.so" > + > +source $TESTDIR/utils.sh > + > +print_test_banner "$TEST_DESC" > + > +if [ ! -f "$SESSIOND_PRELOAD" ]; then > + echo -e "libhealthstall.so not available for this test. Skipping." > + exit 0 > +fi > + > +function test_thread_stall > +{ > + test_thread_stall_name="$1" > + test_thread_exit_code="$2" > + > + echo "" > + echo -e "=== Testing health failure with ${test_thread_stall_name}" > + > + # Activate testpoints > + export LTTNG_TESTPOINT_ENABLE=1 > + > + # Activate specific thread exit > + export ${test_thread_stall_name}_STALL=1 > + > + # Spawn sessiond with preload healthexit lib > + export LD_PRELOAD="$CURDIR/$SESSIOND_PRELOAD" > + start_lttng_sessiond > + > + # Cleanup some env. var. > + unset LD_PRELOAD > + unset ${test_thread_stall_name}_STALL > + > + # Check initial health status > + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null > + > + echo -n "Validating that ${test_thread_stall_name} is stalled... " > + > + # Wait > + sleep 25 > + > + # Check health status, exit code should indicate failure > + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null > + > + health_check_exit_code=$? > + > + if [ $health_check_exit_code -eq $test_thread_exit_code ]; then > + print_ok > + else > + print_fail > + echo -e "Health returned: $health_check_exit_code\n" > + > + stop_lttng_sessiond > + return 1 > + fi > + > + echo -n "Validating that ${test_thread_stall_name} is no longer stalled... 
" > + > + # Wait > + sleep 40 > + > + # Check health status, exit code should now pass > + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null > + > + health_check_exit_code=$? > + > + if [ $health_check_exit_code -eq 0 ]; then > + print_ok > + stop_lttng_sessiond > + else > + print_fail > + echo -e "Health returned: $health_check_exit_code\n" > + stop_lttng_sessiond > + return 1 > + fi > + > + > +} > + > +THREAD=("LTTNG_THREAD_MANAGE_CLIENTS" > + "LTTNG_THREAD_MANAGE_APPS" > +# This thread is a little bit tricky to stall, > +# need to send some commands and setup an app. > +# "LTTNG_THREAD_REG_APPS" > + "LTTNG_THREAD_MANAGE_KERNEL") > + > +# Exit code value to indicate specific thread failure > +EXIT_CODE=(1 > + 2 > +# 4 > + 8) > + > +THREAD_COUNT=${#THREAD[@]} > +i=0 > +while [ "$i" -lt "$THREAD_COUNT" ]; do > + test_thread_stall "${THREAD[$i]}" "${EXIT_CODE[$i]}" > + > + if [ $? -eq 1 ]; then > + exit 1 > + fi > + > + let "i++" > +done > -- > 1.7.12 > From christian.babeux at efficios.com Tue Oct 2 14:01:16 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 2 Oct 2012 14:01:16 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/5] Tests: Add health check thread exit test In-Reply-To: <1348770199-1618-3-git-send-email-christian.babeux@efficios.com> References: <1348770199-1618-1-git-send-email-christian.babeux@efficios.com> <1348770199-1618-3-git-send-email-christian.babeux@efficios.com> Message-ID: Hi, Please disregard this patch because it's missing the env_val declaration. A v2 will soon follow. Thank you, Christian On Thu, Sep 27, 2012 at 2:23 PM, Christian Babeux wrote: > This test trigger a failure in a specified thread using the > recently introduced testpoint mechanism. The testpoints behavior > is implemented in health_exit.c. The testpoint code simply calls > pthread_exit(3) and effectively "kill" the thread without affecting > the other threads behavior. > > The test select the thread to be "killed" by enabling a specific > environment variable. 
> > With this test we ensure that each thread can be succesfully terminated > and that the health check feature properly detect a failure. > > Signed-off-by: Christian Babeux > --- > tests/tools/health/Makefile.am | 6 ++ > tests/tools/health/health_exit.c | 80 ++++++++++++++++++++++++++ > tests/tools/health/health_thread_exit | 105 ++++++++++++++++++++++++++++++++++ > 3 files changed, 191 insertions(+) > create mode 100644 tests/tools/health/health_exit.c > create mode 100755 tests/tools/health/health_thread_exit > > diff --git a/tests/tools/health/Makefile.am b/tests/tools/health/Makefile.am > index 09573db..0a3f6c5 100644 > --- a/tests/tools/health/Makefile.am > +++ b/tests/tools/health/Makefile.am > @@ -10,6 +10,12 @@ endif > > UTILS= > > +lib_LTLIBRARIES=libhealthexit.la > + > +# Health thread exit ld_preloaded test lib > +libhealthexit_la_SOURCES=health_exit.c > +libhealthexit_la_LDFLAGS= -module > + > noinst_PROGRAMS = health_check > > health_check_SOURCES = health_check.c $(UTILS) > diff --git a/tests/tools/health/health_exit.c b/tests/tools/health/health_exit.c > new file mode 100644 > index 0000000..c2382f2 > --- /dev/null > +++ b/tests/tools/health/health_exit.c > @@ -0,0 +1,80 @@ > +/* > + * Copyright (C) 2012 - Christian Babeux > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License, version 2 only, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; if not, write to the Free Software Foundation, Inc., 51 > + * Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
> + */ > + > +#include > +#include > +#include > + > +/* > + * Check if the specified environment variable is set. > + * Return 1 if set, otherwise 0. > + */ > +int check_env_var(const char *env) > +{ > + if (env) { > + if (getenv(env) != NULL && (strncmp(env_val, "1", 1) == 0)) { > + return 1; > + } > + } > + > + return 0; > +} > + > +void __testpoint_thread_manage_clients(void) > +{ > + const char *var = "LTTNG_THREAD_MANAGE_CLIENTS_EXIT"; > + > + if (check_env_var(var)) { > + pthread_exit(NULL); > + } > +} > + > +void __testpoint_thread_registration_apps(void) > +{ > + const char *var = "LTTNG_THREAD_REG_APPS_EXIT"; > + > + if (check_env_var(var)) { > + pthread_exit(NULL); > + } > +} > + > +void __testpoint_thread_manage_apps(void) > +{ > + const char *var = "LTTNG_THREAD_MANAGE_APPS_EXIT"; > + > + if (check_env_var(var)) { > + pthread_exit(NULL); > + } > +} > + > +void __testpoint_thread_manage_kernel(void) > +{ > + const char *var = "LTTNG_THREAD_MANAGE_KERNEL_EXIT"; > + > + if (check_env_var(var)) { > + pthread_exit(NULL); > + } > +} > + > +void __testpoint_thread_manage_consumer(void) > +{ > + const char *var = "LTTNG_THREAD_MANAGE_CONSUMER_EXIT"; > + > + if (check_env_var(var)) { > + pthread_exit(NULL); > + } > +} > diff --git a/tests/tools/health/health_thread_exit b/tests/tools/health/health_thread_exit > new file mode 100755 > index 0000000..dab6b64 > --- /dev/null > +++ b/tests/tools/health/health_thread_exit > @@ -0,0 +1,105 @@ > +#!/bin/bash > +# > +# Copyright (C) - 2012 Christian Babeux > +# > +# This program is free software; you can redistribute it and/or modify it > +# under the terms of the GNU General Public License, version 2 only, as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it will be useful, but WITHOUT > +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > +# more details. 
> +# > +# You should have received a copy of the GNU General Public License along with > +# this program; if not, write to the Free Software Foundation, Inc., 51 > +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > + > +TEST_DESC="Health check - Thread exit" > + > +CURDIR=$(dirname $0)/ > +TESTDIR=$CURDIR/../.. > +LTTNG_BIN="lttng" > +SESSION_NAME="health_thread_exit" > +EVENT_NAME="bogus" > +HEALTH_CHECK_BIN="health_check" > +SESSIOND_PRELOAD=".libs/libhealthexit.so" > + > +source $TESTDIR/utils.sh > + > +print_test_banner "$TEST_DESC" > + > +if [ ! -f "$SESSIOND_PRELOAD" ]; then > + echo -e "libhealthexit.so not available for this test. Skipping." > + exit 0 > +fi > + > +function test_thread_exit > +{ > + test_thread_exit_name="$1" > + test_thread_exit_code="$2" > + > + echo "" > + echo -e "=== Testing health failure with ${test_thread_exit_name}" > + > + # Activate testpoints > + export LTTNG_TESTPOINT_ENABLE=1 > + > + # Activate specific thread exit > + export ${test_thread_exit_name}_EXIT=1 > + > + # Spawn sessiond with preload healthexit lib > + export LD_PRELOAD="$CURDIR/$SESSIOND_PRELOAD" > + start_lttng_sessiond > + > + # Cleanup some env. var. > + unset LD_PRELOAD > + unset ${test_thread_exit_name}_EXIT > + > + # Check initial health status > + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null > + > + echo -n "Validating thread ${test_thread_exit_name} failure... " > + > + # Wait > + sleep 25 > + > + # Check health status, exit code should indicate failure > + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null > + > + health_check_exit_code=$? 
> + > + if [ $health_check_exit_code -eq $test_thread_exit_code ]; then > + print_ok > + stop_lttng_sessiond > + else > + print_fail > + echo -e "Health returned: $health_check_exit_code\n" > + > + stop_lttng_sessiond > + return 1 > + fi > +} > + > +THREAD=("LTTNG_THREAD_MANAGE_CLIENTS" > + "LTTNG_THREAD_MANAGE_APPS" > + "LTTNG_THREAD_REG_APPS" > + "LTTNG_THREAD_MANAGE_KERNEL") > + > +# Exit code value to indicate specific thread failure > +EXIT_CODE=(1 2 4 8) > + > +THREAD_COUNT=${#THREAD[@]} > +i=0 > +while [ "$i" -lt "$THREAD_COUNT" ]; do > + test_thread_exit "${THREAD[$i]}" "${EXIT_CODE[$i]}" > + > + if [ $? -eq 1 ]; then > + exit 1 > + fi > + > + let "i++" > +done > + > +# Special case manage consumer, need to spawn consumer via commands. > +#"LTTNG_THREAD_MANAGE_CONSUMER" > -- > 1.7.12 > From christian.babeux at efficios.com Tue Oct 2 14:05:32 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 2 Oct 2012 14:05:32 -0400 Subject: [lttng-dev] [PATCH v2 lttng-tools 3/5] Tests: Add health check thread exit test In-Reply-To: <1348770199-1618-3-git-send-email-christian.babeux@efficios.com> References: <1348770199-1618-3-git-send-email-christian.babeux@efficios.com> Message-ID: <1349201132-8407-1-git-send-email-christian.babeux@efficios.com> This test triggers a failure in a specified thread using the recently introduced testpoint mechanism. The testpoint behavior is implemented in health_exit.c. The testpoint code simply calls pthread_exit(3), effectively "killing" the thread without affecting the behavior of the other threads. The test selects the thread to be "killed" by enabling a specific environment variable. With this test we ensure that each thread can be successfully terminated and that the health check feature properly detects a failure.
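As a rough illustration of the environment-variable gate described above (a standalone sketch, not the lttng-tools code itself), the check can be written as a small helper. The name `env_flag_enabled` and the variable `DEMO_THREAD_EXIT` used in the note below are hypothetical; the comparison mirrors the `strncmp(env_val, "1", 1)` test in the patch, so any value whose first character is `1` counts as enabled.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (name is an assumption, not part of lttng-tools):
 * return 1 if the environment variable `name` is set to a value whose
 * first character is '1', otherwise 0.
 */
int env_flag_enabled(const char *name)
{
	const char *val;

	/* A NULL name can never be enabled. */
	if (!name) {
		return 0;
	}

	/* getenv() returns NULL when the variable is not set. */
	val = getenv(name);
	if (val && strncmp(val, "1", 1) == 0) {
		return 1;
	}

	return 0;
}
```

A preloaded testpoint would presumably call `pthread_exit(NULL)` (or `sleep()` for a stall test) only when such a helper returns 1, for example after the test script exports `DEMO_THREAD_EXIT=1`. Because only the first character is compared, a value such as `10` is also treated as enabled.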
Signed-off-by: Christian Babeux --- tests/tools/health/Makefile.am | 6 ++ tests/tools/health/health_exit.c | 81 ++++++++++++++++++++++++++ tests/tools/health/health_thread_exit | 105 ++++++++++++++++++++++++++++++++++ 3 files changed, 192 insertions(+) create mode 100644 tests/tools/health/health_exit.c create mode 100755 tests/tools/health/health_thread_exit diff --git a/tests/tools/health/Makefile.am b/tests/tools/health/Makefile.am index 09573db..0a3f6c5 100644 --- a/tests/tools/health/Makefile.am +++ b/tests/tools/health/Makefile.am @@ -10,6 +10,12 @@ endif UTILS= +lib_LTLIBRARIES=libhealthexit.la + +# Health thread exit ld_preloaded test lib +libhealthexit_la_SOURCES=health_exit.c +libhealthexit_la_LDFLAGS= -module + noinst_PROGRAMS = health_check health_check_SOURCES = health_check.c $(UTILS) diff --git a/tests/tools/health/health_exit.c b/tests/tools/health/health_exit.c new file mode 100644 index 0000000..258b08d --- /dev/null +++ b/tests/tools/health/health_exit.c @@ -0,0 +1,81 @@ +/* + * Copyright (C) 2012 - Christian Babeux + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License, version 2 only, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 51 + * Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include +#include +#include + +/* + * Check if the specified environment variable is set. + * Return 1 if set, otherwise 0. 
+ */ +int check_env_var(const char *env) +{ + if (env) { + char *env_val = getenv(env); + if (env_val && (strncmp(env_val, "1", 1) == 0)) { + return 1; + } + } + + return 0; +} + +void __testpoint_thread_manage_clients(void) +{ + const char *var = "LTTNG_THREAD_MANAGE_CLIENTS_EXIT"; + + if (check_env_var(var)) { + pthread_exit(NULL); + } +} + +void __testpoint_thread_registration_apps(void) +{ + const char *var = "LTTNG_THREAD_REG_APPS_EXIT"; + + if (check_env_var(var)) { + pthread_exit(NULL); + } +} + +void __testpoint_thread_manage_apps(void) +{ + const char *var = "LTTNG_THREAD_MANAGE_APPS_EXIT"; + + if (check_env_var(var)) { + pthread_exit(NULL); + } +} + +void __testpoint_thread_manage_kernel(void) +{ + const char *var = "LTTNG_THREAD_MANAGE_KERNEL_EXIT"; + + if (check_env_var(var)) { + pthread_exit(NULL); + } +} + +void __testpoint_thread_manage_consumer(void) +{ + const char *var = "LTTNG_THREAD_MANAGE_CONSUMER_EXIT"; + + if (check_env_var(var)) { + pthread_exit(NULL); + } +} diff --git a/tests/tools/health/health_thread_exit b/tests/tools/health/health_thread_exit new file mode 100755 index 0000000..dab6b64 --- /dev/null +++ b/tests/tools/health/health_thread_exit @@ -0,0 +1,105 @@ +#!/bin/bash +# +# Copyright (C) - 2012 Christian Babeux +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License, version 2 only, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but WITHOUT +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +# more details. +# +# You should have received a copy of the GNU General Public License along with +# this program; if not, write to the Free Software Foundation, Inc., 51 +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ +TEST_DESC="Health check - Thread exit" + +CURDIR=$(dirname $0)/ +TESTDIR=$CURDIR/../.. +LTTNG_BIN="lttng" +SESSION_NAME="health_thread_exit" +EVENT_NAME="bogus" +HEALTH_CHECK_BIN="health_check" +SESSIOND_PRELOAD=".libs/libhealthexit.so" + +source $TESTDIR/utils.sh + +print_test_banner "$TEST_DESC" + +if [ ! -f "$SESSIOND_PRELOAD" ]; then + echo -e "libhealthexit.so not available for this test. Skipping." + exit 0 +fi + +function test_thread_exit +{ + test_thread_exit_name="$1" + test_thread_exit_code="$2" + + echo "" + echo -e "=== Testing health failure with ${test_thread_exit_name}" + + # Activate testpoints + export LTTNG_TESTPOINT_ENABLE=1 + + # Activate specific thread exit + export ${test_thread_exit_name}_EXIT=1 + + # Spawn sessiond with preload healthexit lib + export LD_PRELOAD="$CURDIR/$SESSIOND_PRELOAD" + start_lttng_sessiond + + # Cleanup some env. var. + unset LD_PRELOAD + unset ${test_thread_exit_name}_EXIT + + # Check initial health status + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null + + echo -n "Validating thread ${test_thread_exit_name} failure... " + + # Wait + sleep 25 + + # Check health status, exit code should indicate failure + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null + + health_check_exit_code=$? + + if [ $health_check_exit_code -eq $test_thread_exit_code ]; then + print_ok + stop_lttng_sessiond + else + print_fail + echo -e "Health returned: $health_check_exit_code\n" + + stop_lttng_sessiond + return 1 + fi +} + +THREAD=("LTTNG_THREAD_MANAGE_CLIENTS" + "LTTNG_THREAD_MANAGE_APPS" + "LTTNG_THREAD_REG_APPS" + "LTTNG_THREAD_MANAGE_KERNEL") + +# Exit code value to indicate specific thread failure +EXIT_CODE=(1 2 4 8) + +THREAD_COUNT=${#THREAD[@]} +i=0 +while [ "$i" -lt "$THREAD_COUNT" ]; do + test_thread_exit "${THREAD[$i]}" "${EXIT_CODE[$i]}" + + if [ $? -eq 1 ]; then + exit 1 + fi + + let "i++" +done + +# Special case manage consumer, need to spawn consumer via commands. 
+#"LTTNG_THREAD_MANAGE_CONSUMER" -- 1.7.12.1 From christian.babeux at efficios.com Tue Oct 2 14:06:30 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 2 Oct 2012 14:06:30 -0400 Subject: [lttng-dev] [PATCH v2 lttng-tools 4/5] Tests: Add health check thread stall test In-Reply-To: <1348770199-1618-4-git-send-email-christian.babeux@efficios.com> References: <1348770199-1618-4-git-send-email-christian.babeux@efficios.com> Message-ID: <1349201190-8549-1-git-send-email-christian.babeux@efficios.com> This test triggers a "code stall" in a specified thread using the testpoint mechanism. The testpoint behavior is implemented in health_stall.c. The testpoint code stalls a specific thread's processing by calling sleep(3). The test selects the thread to be stalled by enabling a specific environment variable. The test ensures the threads can be successfully stalled and that the health check feature is able to properly detect stalling in non-polling cases. Signed-off-by: Christian Babeux --- tests/tools/health/Makefile.am | 6 +- tests/tools/health/health_stall.c | 67 +++++++++++++++++ tests/tools/health/health_thread_stall | 128 +++++++++++++++++++++++++++++++++ 3 files changed, 200 insertions(+), 1 deletion(-) create mode 100644 tests/tools/health/health_stall.c create mode 100755 tests/tools/health/health_thread_stall diff --git a/tests/tools/health/Makefile.am b/tests/tools/health/Makefile.am index 0a3f6c5..9fab582 100644 --- a/tests/tools/health/Makefile.am +++ b/tests/tools/health/Makefile.am @@ -10,12 +10,16 @@ endif UTILS= -lib_LTLIBRARIES=libhealthexit.la +lib_LTLIBRARIES=libhealthexit.la libhealthstall.la # Health thread exit ld_preloaded test lib libhealthexit_la_SOURCES=health_exit.c libhealthexit_la_LDFLAGS= -module +# Health thread stall ld_preloaded test lib +libhealthstall_la_SOURCES=health_stall.c +libhealthstall_la_LDFLAGS= -module + noinst_PROGRAMS = health_check health_check_SOURCES = health_check.c $(UTILS) diff --git
a/tests/tools/health/health_stall.c b/tests/tools/health/health_stall.c new file mode 100644 index 0000000..9ce3e65 --- /dev/null +++ b/tests/tools/health/health_stall.c @@ -0,0 +1,67 @@ +/* + * Copyright (C) 2012 - Christian Babeux + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License, version 2 only, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 51 + * Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include +#include +#include +#include + +#define STALL_TIME 60 + +/* + * Check if the specified environment variable is set. + * Return 1 if set, otherwise 0. 
+ */ +int check_env_var(const char *env) +{ + if (env) { + char *env_val = getenv(env); + if (env_val && (strncmp(env_val, "1", 1) == 0)) { + return 1; + } + } + + return 0; +} + +void __testpoint_thread_manage_clients_before_loop(void) +{ + const char *var = "LTTNG_THREAD_MANAGE_CLIENTS_STALL"; + + if (check_env_var(var)) { + sleep(STALL_TIME); + } +} + +void __testpoint_thread_manage_kernel_before_loop(void) +{ + const char *var = "LTTNG_THREAD_MANAGE_KERNEL_STALL"; + + if (check_env_var(var)) { + sleep(STALL_TIME); + } +} + +void __testpoint_thread_manage_apps_before_loop(void) +{ + const char *var = "LTTNG_THREAD_MANAGE_APPS_STALL"; + + if (check_env_var(var)) { + sleep(STALL_TIME); + } +} + diff --git a/tests/tools/health/health_thread_stall b/tests/tools/health/health_thread_stall new file mode 100755 index 0000000..d870895 --- /dev/null +++ b/tests/tools/health/health_thread_stall @@ -0,0 +1,128 @@ +#!/bin/bash +# +# Copyright (C) - 2012 Christian Babeux +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License, version 2 only, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but WITHOUT +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +# more details. +# +# You should have received a copy of the GNU General Public License along with +# this program; if not, write to the Free Software Foundation, Inc., 51 +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +TEST_DESC="Health check - Thread stall" + +CURDIR=$(dirname $0)/ +TESTDIR=$CURDIR/../.. +LTTNG_BIN="lttng" +SESSION_NAME="health_thread_stall" +EVENT_NAME="bogus" +HEALTH_CHECK_BIN="health_check" +SESSIOND_PRELOAD=".libs/libhealthstall.so" + +source $TESTDIR/utils.sh + +print_test_banner "$TEST_DESC" + +if [ ! 
-f "$SESSIOND_PRELOAD" ]; then + echo -e "libhealthstall.so not available for this test. Skipping." + exit 0 +fi + +function test_thread_stall +{ + test_thread_stall_name="$1" + test_thread_exit_code="$2" + + echo "" + echo -e "=== Testing health failure with ${test_thread_stall_name}" + + # Activate testpoints + export LTTNG_TESTPOINT_ENABLE=1 + + # Activate specific thread exit + export ${test_thread_stall_name}_STALL=1 + + # Spawn sessiond with preload healthexit lib + export LD_PRELOAD="$CURDIR/$SESSIOND_PRELOAD" + start_lttng_sessiond + + # Cleanup some env. var. + unset LD_PRELOAD + unset ${test_thread_stall_name}_STALL + + # Check initial health status + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null + + echo -n "Validating that ${test_thread_stall_name} is stalled... " + + # Wait + sleep 25 + + # Check health status, exit code should indicate failure + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null + + health_check_exit_code=$? + + if [ $health_check_exit_code -eq $test_thread_exit_code ]; then + print_ok + else + print_fail + echo -e "Health returned: $health_check_exit_code\n" + + stop_lttng_sessiond + return 1 + fi + + echo -n "Validating that ${test_thread_stall_name} is no longer stalled... " + + # Wait + sleep 40 + + # Check health status, exit code should now pass + $CURDIR/$HEALTH_CHECK_BIN &> /dev/null + + health_check_exit_code=$? + + if [ $health_check_exit_code -eq 0 ]; then + print_ok + stop_lttng_sessiond + else + print_fail + echo -e "Health returned: $health_check_exit_code\n" + stop_lttng_sessiond + return 1 + fi + + +} + +THREAD=("LTTNG_THREAD_MANAGE_CLIENTS" + "LTTNG_THREAD_MANAGE_APPS" +# This thread is a little bit tricky to stall, +# need to send some commands and setup an app. 
+# "LTTNG_THREAD_REG_APPS" + "LTTNG_THREAD_MANAGE_KERNEL") + +# Exit code value to indicate specific thread failure +EXIT_CODE=(1 + 2 +# 4 + 8) + +THREAD_COUNT=${#THREAD[@]} +i=0 +while [ "$i" -lt "$THREAD_COUNT" ]; do + test_thread_stall "${THREAD[$i]}" "${EXIT_CODE[$i]}" + + if [ $? -eq 1 ]; then + exit 1 + fi + + let "i++" +done -- 1.7.12.1 From dgoulet at efficios.com Tue Oct 2 14:15:55 2012 From: dgoulet at efficios.com (David Goulet) Date: Tue, 2 Oct 2012 14:15:55 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Stream allocation and insertion consistency Message-ID: <1349201755-7875-1-git-send-email-dgoulet@efficios.com> The stream allocation in the consumer was doing ustctl actions on the stream and updating refcounts. However, before inserting the stream into the hash table and polling on the fd for data, an error could occur which could abort the stream insertion, creating multiple fd leaks, memory leaks and a bad refcount state. Furthermore, consumer_del_stream() can now destroy a stream even if that stream has not been added to the global hash table. There are also a couple of fixes adding a missing RCU read-side lock and a call_rcu() on stream deletion. Signed-off-by: David Goulet --- src/common/consumer.c | 255 +++++++++++++++++++++----------- src/common/ust-consumer/ust-consumer.c | 32 +++- src/common/ust-consumer/ust-consumer.h | 4 +- 3 files changed, 197 insertions(+), 94 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 53806b0..4f60860 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -62,20 +62,23 @@ volatile int consumer_quit = 0; * Find a stream. The consumer_data.lock must be locked during this * call.
*/ -static struct lttng_consumer_stream *consumer_find_stream(int key) +static struct lttng_consumer_stream *consumer_find_stream(int key, + struct lttng_ht *ht) { struct lttng_ht_iter iter; struct lttng_ht_node_ulong *node; struct lttng_consumer_stream *stream = NULL; + assert(ht); + /* Negative keys are lookup failures */ - if (key < 0) + if (key < 0) { return NULL; + } rcu_read_lock(); - lttng_ht_lookup(consumer_data.stream_ht, (void *)((unsigned long) key), - &iter); + lttng_ht_lookup(ht, (void *)((unsigned long) key), &iter); node = lttng_ht_iter_get_node_ulong(&iter); if (node != NULL) { stream = caa_container_of(node, struct lttng_consumer_stream, node); @@ -86,12 +89,12 @@ static struct lttng_consumer_stream *consumer_find_stream(int key) return stream; } -static void consumer_steal_stream_key(int key) +static void consumer_steal_stream_key(int key, struct lttng_ht *ht) { struct lttng_consumer_stream *stream; rcu_read_lock(); - stream = consumer_find_stream(key); + stream = consumer_find_stream(key, ht); if (stream) { stream->key = -1; /* @@ -223,8 +226,8 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd) } /* - * Remove a stream from the global list protected by a mutex. This - * function is also responsible for freeing its data structures. + * Remove a stream from the global list protected by a mutex. This function is + * also responsible for freeing its data structures. */ void consumer_del_stream(struct lttng_consumer_stream *stream) { @@ -232,10 +235,46 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) struct lttng_ht_iter iter; struct lttng_consumer_channel *free_chan = NULL; struct consumer_relayd_sock_pair *relayd; + struct lttng_ht_node_ulong *node; assert(stream); + DBG3("Consumer deleting stream %d", stream->key); + pthread_mutex_lock(&consumer_data.lock); + rcu_read_lock(); + + /* + * A stream with a key value of -1 means that the stream is in the hash + * table but can not be looked up. 
This happens when consumer_add_stream is + * done and we have a duplicate key before insertion. + * consumer_steal_stream_key() is called to make sure we can insert a + * stream even though the index is already present. Since the key is the fd + * value on the session daemon side, duplicates are possible. + */ + if (stream->key != -1) { + lttng_ht_lookup(consumer_data.stream_ht, + (void *)((unsigned long) stream->key), &iter); + node = lttng_ht_iter_get_node_ulong(&iter); + if (node == NULL) { + rcu_read_unlock(); + + /* + * Stream doest not exist in hash table. This can happen if we hit + * an error after allocation but before adding it to the table. We + * consider that if the node is not in the hash table and has a + * valid key, no ustctl/ioctl nor mmap action was done hence + * jumping to the RCU free. + */ + DBG2("Consumer stream key %d not found during deletion", stream->key); + goto free_stream; + } else { + /* Remove stream from hash table and continue */ + ret = lttng_ht_del(consumer_data.stream_ht, &iter); + assert(!ret); + } + } + rcu_read_unlock(); switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -256,20 +295,10 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) goto end; } - rcu_read_lock(); - iter.iter.node = &stream->node.node; - ret = lttng_ht_del(consumer_data.stream_ht, &iter); - assert(!ret); - - rcu_read_unlock(); - - if (consumer_data.stream_count <= 0) { - goto end; - } + /* This should NEVER reach a negative value. 
*/ + assert(consumer_data.stream_count >= 0); consumer_data.stream_count--; - if (!stream) { - goto end; - } + if (stream->out_fd >= 0) { ret = close(stream->out_fd); if (ret) { @@ -317,7 +346,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) destroy_relayd(relayd); } } - rcu_read_unlock(); uatomic_dec(&stream->chan->refcount); if (!uatomic_read(&stream->chan->refcount) @@ -325,7 +353,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) free_chan = stream->chan; } - call_rcu(&stream->node.head, consumer_free_stream); end: consumer_data.need_update = 1; pthread_mutex_unlock(&consumer_data.lock); @@ -333,6 +360,10 @@ end: if (free_chan) { consumer_del_channel(free_chan); } + +free_stream: + call_rcu(&stream->node.head, consumer_free_stream); + rcu_read_unlock(); } struct lttng_consumer_stream *consumer_allocate_stream( @@ -349,20 +380,25 @@ struct lttng_consumer_stream *consumer_allocate_stream( int *alloc_ret) { struct lttng_consumer_stream *stream; - int ret; stream = zmalloc(sizeof(*stream)); if (stream == NULL) { - perror("malloc struct lttng_consumer_stream"); + PERROR("malloc struct lttng_consumer_stream"); *alloc_ret = -ENOMEM; - return NULL; + goto end; } + + /* + * Get stream's channel reference. Needed when adding the stream to the + * global hash table. 
+ */ stream->chan = consumer_find_channel(channel_key); if (!stream->chan) { *alloc_ret = -ENOENT; + ERR("Unable to find channel for stream %d", stream_key); goto error; } - stream->chan->refcount++; + stream->key = stream_key; stream->shm_fd = shm_fd; stream->wait_fd = wait_fd; @@ -381,35 +417,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( lttng_ht_node_init_ulong(&stream->node, stream->key); lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); - switch (consumer_data.type) { - case LTTNG_CONSUMER_KERNEL: - break; - case LTTNG_CONSUMER32_UST: - case LTTNG_CONSUMER64_UST: - stream->cpu = stream->chan->cpucount++; - ret = lttng_ustconsumer_allocate_stream(stream); - if (ret) { - *alloc_ret = -EINVAL; - goto error; - } - break; - default: - ERR("Unknown consumer_data type"); - *alloc_ret = -EINVAL; - goto error; - } - - /* - * When nb_init_streams reaches 0, we don't need to trigger any action in - * terms of destroying the associated channel, because the action that - * causes the count to become 0 also causes a stream to be added. The - * channel deletion will thus be triggered by the following removal of this - * stream. 
- */ - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { - uatomic_dec(&stream->chan->nb_init_streams); - } - DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, stream->shm_fd, stream->wait_fd, @@ -419,6 +426,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( error: free(stream); +end: return NULL; } @@ -428,38 +436,66 @@ error: int consumer_add_stream(struct lttng_consumer_stream *stream) { int ret = 0; - struct lttng_ht_node_ulong *node; - struct lttng_ht_iter iter; struct consumer_relayd_sock_pair *relayd; - pthread_mutex_lock(&consumer_data.lock); - /* Steal stream identifier, for UST */ - consumer_steal_stream_key(stream->key); + assert(stream); + + DBG3("Adding consumer stream %d", stream->key); + pthread_mutex_lock(&consumer_data.lock); rcu_read_lock(); - lttng_ht_lookup(consumer_data.stream_ht, - (void *)((unsigned long) stream->key), &iter); - node = lttng_ht_iter_get_node_ulong(&iter); - if (node != NULL) { - rcu_read_unlock(); - /* Stream already exist. Ignore the insertion */ - goto end; - } - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + stream->cpu = stream->chan->cpucount++; + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + ret = -ENOSYS; + goto error; + } /* Check and cleanup relayd */ relayd = consumer_find_relayd(stream->net_seq_idx); if (relayd != NULL) { uatomic_inc(&relayd->refcount); } - rcu_read_unlock(); - /* Update consumer data */ + /* Final operation is to add the stream to the global hash table. 
*/ + lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); + + /* Update channel refcount once added without error(s). */ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. + */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + /* Update consumer data once the node is inserted. */ consumer_data.stream_count++; consumer_data.need_update = 1; -end: +error: + rcu_read_unlock(); pthread_mutex_unlock(&consumer_data.lock); return ret; @@ -611,7 +647,7 @@ void consumer_change_stream_state(int stream_key, struct lttng_consumer_stream *stream; pthread_mutex_lock(&consumer_data.lock); - stream = consumer_find_stream(stream_key); + stream = consumer_find_stream(stream_key, consumer_data.stream_ht); if (stream) { stream->state = state; } @@ -679,7 +715,9 @@ void consumer_del_channel(struct lttng_consumer_channel *channel) } } + rcu_read_lock(); call_rcu(&channel->node.head, consumer_free_channel); + rcu_read_unlock(); end: pthread_mutex_unlock(&consumer_data.lock); } @@ -1526,7 +1564,7 @@ static void destroy_stream_ht(struct lttng_ht *ht) ret = lttng_ht_del(ht, &iter); assert(!ret); - free(stream); + call_rcu(&stream->node.head, consumer_free_stream); } rcu_read_unlock(); @@ -1626,17 +1664,46 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) consumer_del_channel(stream->chan); } - free(stream); + rcu_read_lock(); + call_rcu(&stream->node.head, consumer_free_stream); + rcu_read_unlock(); } /* * Action done with the metadata stream when adding it to the consumer internal * data structures to handle it. 
*/ -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { + int ret = 0; struct consumer_relayd_sock_pair *relayd; + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->key, ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + return -ENOSYS; + } + + /* + * From here, refcounts are updated so be _careful_ when returning an error + * after this point. + */ + /* Find relayd and, if one is found, increment refcount. */ rcu_read_lock(); relayd = consumer_find_relayd(stream->net_seq_idx); @@ -1644,6 +1711,27 @@ static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) uatomic_inc(&relayd->refcount); } rcu_read_unlock(); + + /* Update channel refcount once added without error(s). */ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. 
+ */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + rcu_read_lock(); + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); + rcu_read_unlock(); + +error: + return ret; } /* @@ -1740,17 +1828,16 @@ restart: DBG("Adding metadata stream %d to poll set", stream->wait_fd); - rcu_read_lock(); - /* The node should be init at this point */ - lttng_ht_add_unique_ulong(metadata_ht, - &stream->waitfd_node); - rcu_read_unlock(); + ret = consumer_add_metadata_stream(stream, metadata_ht); + if (ret) { + /* Stream was not setup properly. Continuing. */ + free(stream); + continue; + } /* Add metadata stream to the global poll events list */ lttng_poll_add(&events, stream->wait_fd, LPOLLIN | LPOLLPRI); - - consumer_add_metadata_stream(stream); } /* Metadata pipe handled. Continue handling the others */ diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index f57e2e6..e10d540 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -190,9 +190,8 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, return ret; } - DBG("consumer_add_stream chan %d stream %d", - msg.u.stream.channel_key, - msg.u.stream.stream_key); + DBG("Consumer command ADD_STREAM chan %d stream %d", + msg.u.stream.channel_key, msg.u.stream.stream_key); assert(msg.u.stream.output == LTTNG_EVENT_MMAP); new_stream = consumer_allocate_stream(msg.u.stream.channel_key, @@ -235,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, &new_stream->relayd_stream_id); pthread_mutex_unlock(&relayd->ctrl_sock_mutex); if (ret < 0) { + consumer_del_stream(new_stream); goto end_nosignal; } } else if (msg.u.stream.net_index != -1) { ERR("Network sequence index %d unknown. 
Not adding stream.", msg.u.stream.net_index); - free(new_stream); + consumer_del_stream(new_stream); goto end_nosignal; } @@ -248,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, if (ctx->on_recv_stream) { ret = ctx->on_recv_stream(new_stream); if (ret < 0) { + consumer_del_stream(new_stream); goto end_nosignal; } } @@ -260,9 +261,17 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, } while (ret < 0 && errno == EINTR); if (ret < 0) { PERROR("write metadata pipe"); + consumer_del_stream(new_stream); + goto end_nosignal; } } else { - consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. Continuing", + new_stream->key); + consumer_del_stream(new_stream); + goto end_nosignal; + } } DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, @@ -374,7 +383,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) ustctl_unmap_channel(chan->handle); } -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) { struct lttng_ust_object_data obj; int ret; @@ -384,17 +393,24 @@ int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) obj.wait_fd = stream->wait_fd; obj.memory_map_size = stream->mmap_len; ret = ustctl_add_stream(stream->chan->handle, &obj); - if (ret) + if (ret) { + ERR("UST ctl add_stream failed with ret %d", ret); return ret; + } + stream->buf = ustctl_open_stream_read(stream->chan->handle, stream->cpu); - if (!stream->buf) + if (!stream->buf) { + ERR("UST ctl open_stream_read failed"); return -EBUSY; + } + /* ustctl_open_stream_read has closed the shm fd. 
*/ stream->wait_fd_is_copy = 1; stream->shm_fd = -1; stream->mmap_base = ustctl_get_mmap_base(stream->chan->handle, stream->buf); if (!stream->mmap_base) { + ERR("UST ctl get_mmap_base failed"); return -EINVAL; } diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h index 3f76f23..6b507ed 100644 --- a/src/common/ust-consumer/ust-consumer.h +++ b/src/common/ust-consumer/ust-consumer.h @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); +extern int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); extern void lttng_ustconsumer_del_stream(struct lttng_consumer_stream *stream); int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, @@ -117,7 +117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) } static inline -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) { return -ENOSYS; } -- 1.7.10.4 From christian.babeux at efficios.com Tue Oct 2 14:18:29 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 2 Oct 2012 14:18:29 -0400 Subject: [lttng-dev] [PATCH v2 lttng-tools 1/5] Add testpoints in lttng-sessiond to instrument every threads In-Reply-To: <1348770199-1618-1-git-send-email-christian.babeux@efficios.com> References: <1348770199-1618-1-git-send-email-christian.babeux@efficios.com> Message-ID: <1349201909-9906-1-git-send-email-christian.babeux@efficios.com> This commit adds 8 new testpoints in the lttng-sessiond binary. These testpoints rely on the testpoints infrastructure introduced recently. 
Testpoints: thread_manage_clients thread_manage_clients_before_loop thread_registration_apps thread_manage_apps thread_manage_apps_before_loop thread_manage_kernel thread_manage_kernel_before_loop thread_manage_consumer The thread_ testpoints are placed directly at the thread start and they can be used to trigger failure in . The thread__before_loop testpoints are placed directly before the main processing loop of the thread and thus can be used to stall the processing of the thread. Signed-off-by: Christian Babeux --- src/bin/lttng-sessiond/Makefile.am | 3 ++- src/bin/lttng-sessiond/main.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/src/bin/lttng-sessiond/Makefile.am b/src/bin/lttng-sessiond/Makefile.am index 73be023..733818e 100644 --- a/src/bin/lttng-sessiond/Makefile.am +++ b/src/bin/lttng-sessiond/Makefile.am @@ -38,7 +38,8 @@ lttng_sessiond_LDADD = -lrt -lurcu-common -lurcu \ $(top_builddir)/src/common/hashtable/libhashtable.la \ $(top_builddir)/src/common/libcommon.la \ $(top_builddir)/src/common/compat/libcompat.la \ - $(top_builddir)/src/common/relayd/librelayd.la + $(top_builddir)/src/common/relayd/librelayd.la \ + $(top_builddir)/src/common/testpoint/libtestpoint.la if HAVE_LIBLTTNG_UST_CTL lttng_sessiond_LDADD += -llttng-ust-ctl diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c index 730ac65..6e93cf0 100644 --- a/src/bin/lttng-sessiond/main.c +++ b/src/bin/lttng-sessiond/main.c @@ -45,6 +45,7 @@ #include #include #include +#include #include "lttng-sessiond.h" #include "channel.h" @@ -65,6 +66,16 @@ #define CONSUMERD_FILE "lttng-consumerd" +/* Testpoints, internal use only */ +TESTPOINT_DECL(thread_manage_clients); +TESTPOINT_DECL(thread_manage_clients_before_loop); +TESTPOINT_DECL(thread_registration_apps); +TESTPOINT_DECL(thread_manage_apps); +TESTPOINT_DECL(thread_manage_apps_before_loop); +TESTPOINT_DECL(thread_manage_kernel); 
+TESTPOINT_DECL(thread_manage_kernel_before_loop); +TESTPOINT_DECL(thread_manage_consumer); + /* Const values */ const char default_home_dir[] = DEFAULT_HOME_DIR; const char default_tracing_group[] = DEFAULT_TRACING_GROUP; @@ -680,8 +691,12 @@ static void *thread_manage_kernel(void *data) DBG("Thread manage kernel started"); + testpoint(thread_manage_kernel); + health_code_update(&health_thread_kernel); + testpoint(thread_manage_kernel_before_loop); + ret = create_thread_poll_set(&events, 2); if (ret < 0) { goto error_poll_create; @@ -829,6 +844,9 @@ static void *thread_manage_consumer(void *data) /* Inifinite blocking call, waiting for transmission */ restart: health_poll_update(&consumer_data->health); + + testpoint(thread_manage_consumer); + ret = lttng_poll_wait(&events, -1); health_poll_update(&consumer_data->health); if (ret < 0) { @@ -1026,6 +1044,8 @@ static void *thread_manage_apps(void *data) DBG("[thread] Manage application started"); + testpoint(thread_manage_apps); + rcu_register_thread(); rcu_thread_online(); @@ -1041,6 +1061,8 @@ static void *thread_manage_apps(void *data) goto error; } + testpoint(thread_manage_apps_before_loop); + health_code_update(&health_thread_app_manage); while (1) { @@ -1264,6 +1286,8 @@ static void *thread_registration_apps(void *data) DBG("[thread] Manage application registration started"); + testpoint(thread_registration_apps); + ret = lttcomm_listen_unix_sock(apps_sock); if (ret < 0) { goto error_listen; @@ -2912,6 +2936,8 @@ static void *thread_manage_clients(void *data) DBG("[thread] Manage client started"); + testpoint(thread_manage_clients); + rcu_register_thread(); health_code_update(&health_thread_cmd); @@ -2943,6 +2969,8 @@ static void *thread_manage_clients(void *data) kill(ppid, SIGUSR1); } + testpoint(thread_manage_clients_before_loop); + health_code_update(&health_thread_cmd); while (1) { -- 1.7.12.1 From christian.babeux at efficios.com Tue Oct 2 14:19:19 2012 From: christian.babeux at efficios.com (Christian 
Babeux) Date: Tue, 2 Oct 2012 14:19:19 -0400 Subject: [lttng-dev] [PATCH v2 lttng-tools 2/5] Tests: Add a health check utility program In-Reply-To: <1348770199-1618-2-git-send-email-christian.babeux@efficios.com> References: <1348770199-1618-2-git-send-email-christian.babeux@efficios.com> Message-ID: <1349201959-9989-1-git-send-email-christian.babeux@efficios.com> The health_check program is a simple utility to query the health status of the different threads of the sessiond. Sample output: > ./health_check Health check cmd: 0 Health check app. manage: 0 Health check app. registration: 0 Health check kernel: 0 Health check consumer: 0 The return code is encoded to indicate which thread has failed. Signed-off-by: Christian Babeux --- tests/tools/health/Makefile.am | 20 +++++++++++ tests/tools/health/health_check.c | 73 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 93 insertions(+) create mode 100644 tests/tools/health/Makefile.am create mode 100644 tests/tools/health/health_check.c diff --git a/tests/tools/health/Makefile.am b/tests/tools/health/Makefile.am new file mode 100644 index 0000000..09573db --- /dev/null +++ b/tests/tools/health/Makefile.am @@ -0,0 +1,20 @@ +AM_CFLAGS = -I. 
-O2 -g -I../../../include +AM_LDFLAGS = + +if LTTNG_TOOLS_BUILD_WITH_LIBDL +AM_LDFLAGS += -ldl +endif +if LTTNG_TOOLS_BUILD_WITH_LIBC_DL +AM_LDFLAGS += -lc +endif + +UTILS= + +noinst_PROGRAMS = health_check + +health_check_SOURCES = health_check.c $(UTILS) +health_check_LDADD = $(top_builddir)/src/lib/lttng-ctl/liblttng-ctl.la \ + $(top_builddir)/src/common/libcommon.la + +noinst_SCRIPTS = +EXTRA_DIST = diff --git a/tests/tools/health/health_check.c b/tests/tools/health/health_check.c new file mode 100644 index 0000000..3eef110 --- /dev/null +++ b/tests/tools/health/health_check.c @@ -0,0 +1,73 @@ +/* + * Copyright (C) 2012 - Christian Babeux + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License, version 2 only, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 51 + * Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include + +#include "lttng/lttng.h" + +#define HEALTH_CMD_FAIL (1 << 0) +#define HEALTH_APP_MNG_FAIL (1 << 1) +#define HEALTH_APP_REG_FAIL (1 << 2) +#define HEALTH_KERNEL_FAIL (1 << 3) +#define HEALTH_CSMR_FAIL (1 << 4) + +int main(int argc, char *argv[]) +{ + int health = -1; + int status = 0; + + /* Command thread */ + health = lttng_health_check(LTTNG_HEALTH_CMD); + printf("Health check cmd: %d\n", health); + + if (health) { + status |= HEALTH_CMD_FAIL; + } + + /* App manage thread */ + health = lttng_health_check(LTTNG_HEALTH_APP_MANAGE); + printf("Health check app. 
manage: %d\n", health); + + if (health) { + status |= HEALTH_APP_MNG_FAIL; + } + /* App registration thread */ + health = lttng_health_check(LTTNG_HEALTH_APP_REG); + printf("Health check app. registration: %d\n", health); + + if (health) { + status |= HEALTH_APP_REG_FAIL; + } + + /* Kernel thread */ + health = lttng_health_check(LTTNG_HEALTH_KERNEL); + printf("Health check kernel: %d\n", health); + + if (health) { + status |= HEALTH_KERNEL_FAIL; + } + + /* Consumer thread */ + health = lttng_health_check(LTTNG_HEALTH_CONSUMER); + printf("Health check consumer: %d\n", health); + + if (health) { + status |= HEALTH_CSMR_FAIL; + } + + return status; +} -- 1.7.12.1 From christian.babeux at efficios.com Tue Oct 2 14:20:04 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 2 Oct 2012 14:20:04 -0400 Subject: [lttng-dev] [PATCH v2 lttng-tools 5/5] Tests: Add health check tests to configure In-Reply-To: <1348770321-1670-1-git-send-email-christian.babeux@efficios.com> References: <1348770321-1670-1-git-send-email-christian.babeux@efficios.com> Message-ID: <1349202004-10115-1-git-send-email-christian.babeux@efficios.com> Add health folder to top-level tests Makefile.am. Also add a runall script to run all health check tests. 
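
As a hedged illustration of how a test runner could report which sessiond thread failed: the health_check utility from patch 2/5 of this series ORs one bit per failing thread into its exit status. The sketch below decodes such a status. The bit values are copied from health_check.c; demo_report_health() and its lookup table are hypothetical names introduced only for this example and are not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bit assignments copied from tests/tools/health/health_check.c (patch 2/5). */
#define HEALTH_CMD_FAIL		(1 << 0)
#define HEALTH_APP_MNG_FAIL	(1 << 1)
#define HEALTH_APP_REG_FAIL	(1 << 2)
#define HEALTH_KERNEL_FAIL	(1 << 3)
#define HEALTH_CSMR_FAIL	(1 << 4)

/*
 * Hypothetical helper, not part of the patch: print the name of every
 * thread whose failure bit is set in `status` and return the number of
 * failing threads.
 */
static int demo_report_health(int status)
{
	static const struct {
		int bit;
		const char *name;
	} threads[] = {
		{ HEALTH_CMD_FAIL,     "cmd" },
		{ HEALTH_APP_MNG_FAIL, "app. manage" },
		{ HEALTH_APP_REG_FAIL, "app. registration" },
		{ HEALTH_KERNEL_FAIL,  "kernel" },
		{ HEALTH_CSMR_FAIL,    "consumer" },
	};
	size_t i;
	int failed = 0;

	/* A process exit status is never negative. */
	assert(status >= 0);

	for (i = 0; i < sizeof(threads) / sizeof(threads[0]); i++) {
		if (status & threads[i].bit) {
			printf("thread failed: %s\n", threads[i].name);
			failed++;
		}
	}

	return failed;
}
```

From a shell the status would come from $? right after running ./health_check; from C it would come from WEXITSTATUS() applied to the value filled in by waitpid().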
Signed-off-by: Christian Babeux
---
 configure.ac              |  1 +
 tests/tools/Makefile.am   |  2 +-
 tests/tools/health/runall | 28 ++++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100755 tests/tools/health/runall

diff --git a/configure.ac b/configure.ac
index 36c137b..713daa9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -288,6 +288,7 @@ AC_CONFIG_FILES([
 	tests/kernel/Makefile
 	tests/tools/Makefile
 	tests/tools/streaming/Makefile
+	tests/tools/health/Makefile
 	tests/ust/Makefile
 	tests/ust/nprocesses/Makefile
 	tests/ust/high-throughput/Makefile
diff --git a/tests/tools/Makefile.am b/tests/tools/Makefile.am
index 3d25900..faa836c 100644
--- a/tests/tools/Makefile.am
+++ b/tests/tools/Makefile.am
@@ -1,4 +1,4 @@
-SUBDIRS = streaming
+SUBDIRS = streaming health
 
 AM_CFLAGS = -g -Wall -I../
 AM_LDFLAGS = -lurcu -lurcu-cds
diff --git a/tests/tools/health/runall b/tests/tools/health/runall
new file mode 100755
index 0000000..c22e353
--- /dev/null
+++ b/tests/tools/health/runall
@@ -0,0 +1,28 @@
+#!/bin/bash
+
+DIR=$(dirname $0)
+
+tests=( $DIR/health_thread_exit $DIR/health_thread_stall )
+exit_code=0
+
+function start_tests ()
+{
+	for bin in ${tests[@]};
+	do
+		if [ ! -e $bin ]; then
+			echo -e "$bin not found, passing"
+			continue
+		fi
+
+		./$bin
+		# Test must return 0 to pass.
+		if [ $? -ne 0 ]; then
+			exit_code=1
+			break
+		fi
+	done
+}
+
+start_tests
+
+exit $exit_code
-- 
1.7.12.1

From Bernd.Hufmann at ericsson.com Tue Oct 2 14:47:23 2012
From: Bernd.Hufmann at ericsson.com (Bernd Hufmann)
Date: Tue, 2 Oct 2012 14:47:23 -0400
Subject: [lttng-dev] lttng destroy fails
Message-ID: <506B36BB.4040808@ericsson.com>

Hello

I noticed a minor issue when executing the following command sequence:

> lttng create auto
Session auto created.
Traces will be written in /home/bernd/lttng-traces/auto-20121002-144207
> lttng destroy
Warning: Session name auto not found

The session is still around after that.
I think the problem is that in the file .lttngrc the wrong session name is stored. In the file .lttngrc the session name "auto" is stored but the actual name is "auto-20121002-144207".

BTW, I'm using lttng-tools 2.1 RC4.

BR,
Bernd

-- 
This Communication is Confidential. We only send and receive email on the basis of the terms set out at www.ericsson.com/email_disclaimer

From jdesfossez at efficios.com Tue Oct 2 15:03:26 2012
From: jdesfossez at efficios.com (Julien Desfossez)
Date: Tue, 2 Oct 2012 15:03:26 -0400
Subject: [lttng-dev] [LTTNG-TOOLS PATCH v5] ABI with support for compat 32/64 bits
Message-ID: <1349204606-6473-1-git-send-email-jdesfossez@efficios.com>

The current ABI does not work for compat 32/64 bits.
This patch moves the current ABI as old-abi and provides a new ABI in which all the structures exchanged between user and kernel-space are packed. Also this new ABI moves the "int overwrite" member of the struct lttng_kernel_channel to remove the alignment added by the compiler.

A patch for lttng-modules has been developed in parallel to this one to support the new ABI. These 2 patches have been tested in all possible configurations (applied or not) on 64-bit and 32-bit kernels (with CONFIG_COMPAT) and a user-space in 32 and 64-bit.

Here are the results of the tests:
k 64 compat     | u 32 compat     | OK
k 64 compat     | u 64 compat     | OK
k 64 compat     | u 32 non-compat | KO
k 64 compat     | u 64 non-compat | OK
k 64 non-compat | u 64 compat     | OK
k 64 non-compat | u 32 compat     | KO
k 64 non-compat | u 64 non-compat | OK
k 64 non-compat | u 32 non-compat | KO
k 32 compat     | u compat        | OK
k 32 compat     | u non-compat    | OK
k 32 non-compat | u compat        | OK
k 32 non-compat | u non-compat    | OK

The results are as expected:
- on 32-bit user-space and kernel, every configuration works.
- on 64-bit user-space and kernel, every configuration works.
- with 32-bit user-space on a 64-bit kernel the only configuration where it works is when the compat patch is applied everywhere.
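
To make the alignment problem described above concrete, here is a minimal sketch, assuming GCC or Clang. Both structures are hypothetical stand-ins (the demo_ names are not from the patch) and only mimic the idea of an int followed by 64-bit members versus a packed, reordered layout; they are not the actual lttng_kernel_channel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical "old style" descriptor: a 4-byte int followed by a
 * 64-bit member. On x86-64 the compiler typically inserts 4 bytes of
 * padding after `overwrite` so that `subbuf_size` is 8-byte aligned.
 * On i386 a uint64_t struct member is typically only 4-byte aligned,
 * so no padding is added and every later offset differs between a
 * 32-bit user-space and a 64-bit kernel.
 */
struct demo_old_channel {
	int overwrite;
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
};

/*
 * Hypothetical "new style" descriptor: same members, `overwrite` moved
 * after the 64-bit fields and the structure packed (GCC/Clang
 * attribute), so the layout does not depend on the word size.
 */
struct demo_new_channel {
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	unsigned int switch_timer_interval;
	unsigned int read_timer_interval;
	int overwrite;
} __attribute__((packed));

/* Sum of the member sizes, i.e. the structure size with no padding. */
static size_t demo_payload_size(void)
{
	return 2 * sizeof(uint64_t) + 2 * sizeof(unsigned int) + sizeof(int);
}

/* Returns 1 when the packed structure contains no padding at all. */
static int demo_new_layout_has_no_padding(void)
{
	return sizeof(struct demo_new_channel) == demo_payload_size();
}

/* Print both layouts; build with -m32 and -m64 to compare them. */
static void demo_print_layout(void)
{
	assert(demo_new_layout_has_no_padding());

	printf("old: size %zu, subbuf_size at offset %zu\n",
		sizeof(struct demo_old_channel),
		offsetof(struct demo_old_channel, subbuf_size));
	printf("new: size %zu, overwrite at offset %zu\n",
		sizeof(struct demo_new_channel),
		offsetof(struct demo_new_channel, overwrite));
}
```

Calling demo_print_layout() from a program built once with -m32 and once with -m64 should show the "old" numbers changing between the two builds while the "new" numbers stay the same, which is the property an ioctl argument shared by a 32-bit user-space and a 64-bit kernel needs.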
Signed-off-by: Julien Desfossez --- src/bin/lttng-sessiond/trace-kernel.h | 1 + src/common/kernel-ctl/kernel-ctl.c | 224 ++++++++++++++++++++++++++++++--- src/common/kernel-ctl/kernel-ctl.h | 1 + src/common/kernel-ctl/kernel-ioctl.h | 74 +++++++---- src/common/lttng-kernel-old.h | 115 +++++++++++++++++ src/common/lttng-kernel.h | 31 +++-- 6 files changed, 397 insertions(+), 49 deletions(-) create mode 100644 src/common/lttng-kernel-old.h diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h index f04d9e7..c86cc27 100644 --- a/src/bin/lttng-sessiond/trace-kernel.h +++ b/src/bin/lttng-sessiond/trace-kernel.h @@ -22,6 +22,7 @@ #include #include +#include #include "consumer.h" diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c index 1396cd9..a93d251 100644 --- a/src/common/kernel-ctl/kernel-ctl.c +++ b/src/common/kernel-ctl/kernel-ctl.c @@ -18,38 +18,175 @@ #define __USE_LINUX_IOCTL_DEFS #include +#include #include "kernel-ctl.h" #include "kernel-ioctl.h" +/* + * This flag indicates which version of the kernel ABI to use. The old + * ABI (namespace _old) does not support a 32-bit user-space when the + * kernel is 64-bit. The old ABI is kept here for compatibility but is + * deprecated and will be removed eventually. + */ +static int lttng_kernel_use_old_abi = -1; + +/* + * Execute the new or old ioctl depending on the ABI version. + * If the ABI version is not determined yet (lttng_kernel_use_old_abi = -1), + * this function tests if the new ABI is available and otherwise fallbacks + * on the old one. + * This function takes the fd on which the ioctl must be executed and the old + * and new request codes. + * It returns the return value of the ioctl executed. 
+ */ +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, + unsigned long newname) +{ + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, newname); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + ret = ioctl(fd, oldname); + } else { + ret = ioctl(fd, newname); + } + +end: + return ret; +} + int kernctl_create_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, + LTTNG_KERNEL_SESSION); } /* open the metadata global channel */ int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) { - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); + struct lttng_kernel_old_channel old_channel; + struct lttng_kernel_channel channel; + + if (lttng_kernel_use_old_abi) { + old_channel.overwrite = chops->overwrite; + old_channel.subbuf_size = chops->subbuf_size; + old_channel.num_subbuf = chops->num_subbuf; + old_channel.switch_timer_interval = chops->switch_timer_interval; + old_channel.read_timer_interval = chops->read_timer_interval; + old_channel.output = chops->output; + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); + + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); + } + + channel.overwrite = chops->overwrite; + channel.subbuf_size = chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + channel.switch_timer_interval = chops->switch_timer_interval; + channel.read_timer_interval = chops->read_timer_interval; + channel.output = chops->output; + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); + + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); } int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) { - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); + struct lttng_kernel_channel channel; + + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_channel old_channel; + + old_channel.overwrite = 
chops->overwrite; + old_channel.subbuf_size = chops->subbuf_size; + old_channel.num_subbuf = chops->num_subbuf; + old_channel.switch_timer_interval = chops->switch_timer_interval; + old_channel.read_timer_interval = chops->read_timer_interval; + old_channel.output = chops->output; + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); + + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); + } + + channel.overwrite = chops->overwrite; + channel.subbuf_size = chops->subbuf_size; + channel.num_subbuf = chops->num_subbuf; + channel.switch_timer_interval = chops->switch_timer_interval; + channel.read_timer_interval = chops->read_timer_interval; + channel.output = chops->output; + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); + + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); } int kernctl_create_stream(int fd) { - return ioctl(fd, LTTNG_KERNEL_STREAM); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, + LTTNG_KERNEL_STREAM); } int kernctl_create_event(int fd, struct lttng_kernel_event *ev) { + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_event old_event; + + memcpy(old_event.name, ev->name, sizeof(old_event.name)); + old_event.instrumentation = ev->instrumentation; + switch (ev->instrumentation) { + case LTTNG_KERNEL_KPROBE: + old_event.u.kprobe.addr = ev->u.kprobe.addr; + old_event.u.kprobe.offset = ev->u.kprobe.offset; + memcpy(old_event.u.kprobe.symbol_name, + ev->u.kprobe.symbol_name, + sizeof(old_event.u.kprobe.symbol_name)); + break; + case LTTNG_KERNEL_KRETPROBE: + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; + memcpy(old_event.u.kretprobe.symbol_name, + ev->u.kretprobe.symbol_name, + sizeof(old_event.u.kretprobe.symbol_name)); + break; + case LTTNG_KERNEL_FUNCTION: + memcpy(old_event.u.ftrace.symbol_name, + ev->u.ftrace.symbol_name, + sizeof(old_event.u.ftrace.symbol_name)); + break; + default: + break; + } + + return ioctl(fd, 
LTTNG_KERNEL_OLD_EVENT, &old_event); + } return ioctl(fd, LTTNG_KERNEL_EVENT, ev); } int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) { + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_context old_ctx; + + old_ctx.ctx = ctx->ctx; + /* only type that uses the union */ + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { + old_ctx.u.perf_counter.type = + ctx->u.perf_counter.type; + old_ctx.u.perf_counter.config = + ctx->u.perf_counter.config; + memcpy(old_ctx.u.perf_counter.name, + ctx->u.perf_counter.name, + sizeof(old_ctx.u.perf_counter.name)); + } + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); + } return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); } @@ -57,44 +194,98 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) /* Enable event, channel and session ioctl */ int kernctl_enable(int fd) { - return ioctl(fd, LTTNG_KERNEL_ENABLE); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, + LTTNG_KERNEL_ENABLE); } /* Disable event, channel and session ioctl */ int kernctl_disable(int fd) { - return ioctl(fd, LTTNG_KERNEL_DISABLE); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, + LTTNG_KERNEL_DISABLE); } int kernctl_start_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION_START); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, + LTTNG_KERNEL_SESSION_START); } int kernctl_stop_session(int fd) { - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, + LTTNG_KERNEL_SESSION_STOP); } - int kernctl_tracepoint_list(int fd) { - return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, + LTTNG_KERNEL_TRACEPOINT_LIST); } int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) { - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + if (!ret) { + 
lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_tracer_version old_v; + + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); + if (ret) { + goto end; + } + v->major = old_v.major; + v->minor = old_v.minor; + v->patchlevel = old_v.patchlevel; + } else { + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); + } + +end: + return ret; } int kernctl_wait_quiescent(int fd) { - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, + LTTNG_KERNEL_WAIT_QUIESCENT); } int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) { - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + int ret; + + if (lttng_kernel_use_old_abi == -1) { + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + if (!ret) { + lttng_kernel_use_old_abi = 0; + goto end; + } + lttng_kernel_use_old_abi = 1; + } + if (lttng_kernel_use_old_abi) { + struct lttng_kernel_old_calibrate old_calibrate; + + old_calibrate.type = calibrate->type; + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); + if (ret) { + goto end; + } + calibrate->type = old_calibrate.type; + } else { + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); + } + +end: + return ret; } @@ -193,10 +384,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) { return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); } - -/* Get the offset of the stream_id in the packet header */ -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) -{ - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); - -} diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h index 18712d9..85a3a18 100644 --- a/src/common/kernel-ctl/kernel-ctl.h +++ b/src/common/kernel-ctl/kernel-ctl.h @@ -21,6 +21,7 @@ #include #include +#include int kernctl_create_session(int fd); int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); diff 
--git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h index 35942be..8e22632 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ b/src/common/kernel-ctl/kernel-ioctl.h @@ -49,37 +49,69 @@ /* map stream to stream id for network streaming */ #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) +/* Old ABI (without support for 32/64 bits compat) */ +/* LTTng file descriptor ioctl */ +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define LTTNG_KERNEL_OLD_CALIBRATE \ + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) + +/* Session FD ioctl */ +#define LTTNG_KERNEL_OLD_METADATA \ + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) +#define LTTNG_KERNEL_OLD_CHANNEL \ + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) + +/* Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) +#define LTTNG_KERNEL_OLD_EVENT \ + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ + _IOR(0xF6, 0x62, unsigned long) +/* Event and Channel FD ioctl */ +#define LTTNG_KERNEL_OLD_CONTEXT \ + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) + +/* Event, Channel and Session ioctl */ +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) + + +/* Current ABI (with suport for 32/64 bits compat) */ /* LTTng file descriptor ioctl */ -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) -#define LTTNG_KERNEL_TRACER_VERSION \ - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) +#define 
LTTNG_KERNEL_TRACER_VERSION \ + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) #define LTTNG_KERNEL_CALIBRATE \ - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) /* Session FD ioctl */ -#define LTTNG_KERNEL_METADATA \ - _IOW(0xF6, 0x50, struct lttng_channel_attr) -#define LTTNG_KERNEL_CHANNEL \ - _IOW(0xF6, 0x51, struct lttng_channel_attr) -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) +#define LTTNG_KERNEL_METADATA \ + _IOW(0xF6, 0x54, struct lttng_kernel_channel) +#define LTTNG_KERNEL_CHANNEL \ + _IOW(0xF6, 0x55, struct lttng_kernel_channel) +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) /* Channel FD ioctl */ -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) -#define LTTNG_KERNEL_EVENT \ - _IOW(0xF6, 0x61, struct lttng_kernel_event) -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, unsigned long) +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) +#define LTTNG_KERNEL_EVENT \ + _IOW(0xF6, 0x63, struct lttng_kernel_event) /* Event and Channel FD ioctl */ -#define LTTNG_KERNEL_CONTEXT \ - _IOW(0xF6, 0x70, struct lttng_kernel_context) +#define LTTNG_KERNEL_CONTEXT \ + _IOW(0xF6, 0x71, struct lttng_kernel_context) /* Event, Channel and Session ioctl */ -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) #endif /* _LTT_KERNEL_IOCTL_H */ diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h new file mode 100644 index 0000000..1b8999a --- /dev/null +++ b/src/common/lttng-kernel-old.h @@ -0,0 +1,115 @@ +/* + * Copyright (C) 2011 - Julien Desfossez + * Mathieu Desnoyers + * David Goulet + * + * This program is free software; you can redistribute it 
and/or modify + * it under the terms of the GNU General Public License, version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#ifndef _LTTNG_KERNEL_OLD_H +#define _LTTNG_KERNEL_OLD_H + +#include +#include + +/* + * LTTng DebugFS ABI structures. + * + * This is the kernel ABI copied from lttng-modules tree. + */ + +/* Perf counter attributes */ +struct lttng_kernel_old_perf_counter_ctx { + uint32_t type; + uint64_t config; + char name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* Event/Channel context */ +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 +struct lttng_kernel_old_context { + enum lttng_kernel_context_type ctx; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; + + union { + struct lttng_kernel_old_perf_counter_ctx perf_counter; + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; + } u; +}; + +struct lttng_kernel_old_kretprobe { + uint64_t addr; + + uint64_t offset; + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* + * Either addr is used, or symbol_name and offset. 
+ */ +struct lttng_kernel_old_kprobe { + uint64_t addr; + + uint64_t offset; + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +/* Function tracer */ +struct lttng_kernel_old_function { + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; +}; + +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 +struct lttng_kernel_old_event { + char name[LTTNG_KERNEL_SYM_NAME_LEN]; + enum lttng_kernel_instrumentation instrumentation; + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; + + /* Per instrumentation type configuration */ + union { + struct lttng_kernel_old_kretprobe kretprobe; + struct lttng_kernel_old_kprobe kprobe; + struct lttng_kernel_old_function ftrace; + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; + } u; +}; + +struct lttng_kernel_old_tracer_version { + uint32_t major; + uint32_t minor; + uint32_t patchlevel; +}; + +struct lttng_kernel_old_calibrate { + enum lttng_kernel_calibrate_type type; /* type (input) */ +}; + +/* + * kernel channel + */ +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 +struct lttng_kernel_old_channel { + int overwrite; /* 1: overwrite, 0: discard */ + uint64_t subbuf_size; /* bytes */ + uint64_t num_subbuf; /* power of 2 */ + unsigned int switch_timer_interval; /* usec */ + unsigned int read_timer_interval; /* usec */ + enum lttng_event_output output; /* splice, mmap */ + + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; +}; + +#endif /* _LTTNG_KERNEL_OLD_H */ diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h index dbeb6aa..fa8ba61 100644 --- a/src/common/lttng-kernel.h +++ b/src/common/lttng-kernel.h @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { uint32_t type; uint64_t config; char name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* Event/Channel context */ #define LTTNG_KERNEL_CONTEXT_PADDING1 16 @@ -72,14 +72,14 @@ struct lttng_kernel_context { struct lttng_kernel_perf_counter_ctx perf_counter; char 
padding[LTTNG_KERNEL_CONTEXT_PADDING2]; } u; -}; +}__attribute__((packed)); struct lttng_kernel_kretprobe { uint64_t addr; uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* * Either addr is used, or symbol_name and offset. @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { uint64_t offset; char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); /* Function tracer */ struct lttng_kernel_function { char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; -}; +}__attribute__((packed)); #define LTTNG_KERNEL_EVENT_PADDING1 16 #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 @@ -110,13 +110,13 @@ struct lttng_kernel_event { struct lttng_kernel_function ftrace; char padding[LTTNG_KERNEL_EVENT_PADDING2]; } u; -}; +}__attribute__((packed)); struct lttng_kernel_tracer_version { uint32_t major; uint32_t minor; uint32_t patchlevel; -}; +}__attribute__((packed)); enum lttng_kernel_calibrate_type { LTTNG_KERNEL_CALIBRATE_KRETPROBE, @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { struct lttng_kernel_calibrate { enum lttng_kernel_calibrate_type type; /* type (input) */ -}; +}__attribute__((packed)); + +/* + * kernel channel + */ +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 +struct lttng_kernel_channel { + uint64_t subbuf_size; /* bytes */ + uint64_t num_subbuf; /* power of 2 */ + unsigned int switch_timer_interval; /* usec */ + unsigned int read_timer_interval; /* usec */ + enum lttng_event_output output; /* splice, mmap */ + + int overwrite; /* 1: overwrite, 0: discard */ + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; +}__attribute__((packed)); #endif /* _LTTNG_KERNEL_H */ -- 1.7.9.5 From Bernd.Hufmann at ericsson.com Tue Oct 2 15:04:49 2012 From: Bernd.Hufmann at ericsson.com (Bernd Hufmann) Date: Tue, 2 Oct 2012 15:04:49 -0400 Subject: [lttng-dev] consumerd crashes (streaming) Message-ID: <506B3AD1.6080200@ericsson.com> Hello I'm using lttng-tools 2.1 RC4 to stream traces over the 
network. I noticed a crash of the consumerd for the following scenario:

host1> lttng-relayd

host2> lttng create mySession -U
host2> lttng enable-event -u -a
host2> lttng start
host2> lttng stop

After that, I stopped the relayd on host1. Then, when destroying the
session on host2, the consumerd crashes and remains as a zombie process
(). I was able to reproduce this multiple times. However, the screen
print-out varied. See the end of this mail for an example of the output.
I hope this output helps to find the problem. Could someone please have
a look?

Thanks
Bernd

PERROR: recvmsg inet: Connection reset by peer [in lttcomm_recvmsg_inet_sock() at inet.c:227]
PERROR: sendmsg inet: Connection reset by peer [in lttcomm_sendmsg_inet_sock() at inet.c:271]
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1214]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 56)
PERROR: Error in file write: Bad file descriptor [in lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1214]
Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
*** glibc detected *** lttng-consumerd: free(): invalid pointer: 0xb5c02738 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x73e42)[0x282e42]
lttng-consumerd(lttng_consumer_thread_poll_metadata+0x294)[0x804b6c4]
/lib/i386-linux-gnu/libpthread.so.0(+0x6d4c)[0xd84d4c]
/lib/i386-linux-gnu/libc.so.6(clone+0x5e)[0x2f9ace]
======= Memory map: ========
00110000-0012c000 r-xp 00000000 08:01 524197 /lib/i386-linux-gnu/libgcc_s.so.1
0012c000-0012d000 r--p 0001b000 08:01 524197 /lib/i386-linux-gnu/libgcc_s.so.1
0012d000-0012e000 rw-p 0001c000 08:01 524197 /lib/i386-linux-gnu/libgcc_s.so.1
0020e000-0020f000 r-xp 00000000 00:00 0 [vdso]
0020f000-003ae000 r-xp 00000000 08:01 542235 /lib/i386-linux-gnu/libc-2.15.so
003ae000-003b0000 r--p 0019f000 08:01 542235 /lib/i386-linux-gnu/libc-2.15.so
003b0000-003b1000 rw-p 001a1000 08:01 542235 /lib/i386-linux-gnu/libc-2.15.so
003b1000-003b4000 rw-p 00000000 00:00 0
00694000-0069b000 r-xp 00000000 08:01 542251 /lib/i386-linux-gnu/librt-2.15.so
0069b000-0069c000 r--p 00006000 08:01 542251 /lib/i386-linux-gnu/librt-2.15.so
0069c000-0069d000 rw-p 00007000 08:01 542251 /lib/i386-linux-gnu/librt-2.15.so
0087f000-0089d000 r-xp 00000000 08:01 654290 /usr/local/lib/liblttng-ust-ctl.so.0.0.0
0089d000-0089e000 r--p 0001d000 08:01 654290 /usr/local/lib/liblttng-ust-ctl.so.0.0.0
0089e000-0089f000 rw-p 0001e000 08:01 654290 /usr/local/lib/liblttng-ust-ctl.so.0.0.0
00921000-00925000 r-xp 00000000 08:01 654175 /usr/local/lib/liburcu.so.1.0.0
00925000-00926000 r--p 00003000 08:01 654175 /usr/local/lib/liburcu.so.1.0.0
00926000-00927000 rw-p 00004000 08:01 654175 /usr/local/lib/liburcu.so.1.0.0
00a1c000-00a3c000 r-xp 00000000 08:01 523325 /lib/i386-linux-gnu/ld-2.15.so
00a3c000-00a3d000 r--p 0001f000 08:01 523325 /lib/i386-linux-gnu/ld-2.15.so
00a3d000-00a3e000 rw-p 00020000 08:01 523325 /lib/i386-linux-gnu/ld-2.15.so
00a5e000-00a63000 r-xp 00000000 08:01 654191 /usr/local/lib/liburcu-bp.so.1.0.0
00a63000-00a64000 r--p 00004000 08:01 654191 /usr/local/lib/liburcu-bp.so.1.0.0
00a64000-00a65000 rw-p 00005000 08:01 654191 /usr/local/lib/liburcu-bp.so.1.0.0
00d7e000-00d95000 r-xp 00000000 08:01 542249 /lib/i386-linux-gnu/libpthread-2.15.so
00d95000-00d96000 r--p 00016000 08:01 542249 /lib/i386-linux-gnu/libpthread-2.15.so
00d96000-00d97000 rw-p 00017000 08:01 542249 /lib/i386-linux-gnu/libpthread-2.15.so
00d97000-00d99000 rw-p 00000000 00:00 0
08048000-08062000 r-xp 00000000 08:01 654329 /usr/local/lib/lttng/libexec/lttng-consumerd
08062000-08063000 r--p 0001a000 08:01 654329 /usr/local/lib/lttng/libexec/lttng-consumerd
08063000-08064000 rw-p 0001b000 08:01 654329 /usr/local/lib/lttng/libexec/lttng-consumerd
08064000-08066000 rw-p 00000000 00:00 0
09b9e000-09bbf000 rw-p 00000000 00:00 0 [heap]
b53ff000-b5400000 ---p 00000000 00:00 0
b5400000-b5c00000 rw-p 00000000 00:00 0
b5c00000-b5c21000 rw-p 00000000 00:00 0
b5c21000-b5d00000 ---p 00000000 00:00 0
b5e00000-b5e21000 rw-p 00000000 00:00 0
b5e21000-b5f00000 ---p 00000000 00:00 0
b5fd7000-b5fdd000 rw-s 00000000 00:12 27685 /run/shm/ust-shm-tmp-3bEkF7 (deleted)
b5fdd000-b5fe1000 rw-s 00000000 00:12 27723 /run/shm/ust-shm-tmp-YwqSPR (deleted)
b5fe1000-b5fe2000 ---p 00000000 00:00 0
b5fe2000-b67e2000 rw-p 00000000 00:00 0
b67e2000-b67e3000 ---p 00000000 00:00 0
b67e3000-b6fe3000 rw-p 00000000 00:00 0
b6fe3000-b6fe4000 ---p 00000000 00:00 0
b6fe4000-b77e7000 rw-p 00000000 00:00 0
b77e7000-b77ed000 rw-s 00000000 00:12 27681 /run/shm/ust-shm-tmp-7Lo767 (deleted)
b77ed000-b77f1000 rw-s 00000000 00:12 27719 /run/shm/ust-shm-tmp-uEFKcS (deleted)
b77fa000-b77fd000 rw-p 00000000 00:00 0
bfa07000-bfa28000 rw-p 00000000 00:00 0 [stack]
Error: consumer err socket second poll error
Error: Health error occurred in thread_manage_consumer

--
This Communication is Confidential. We only send and receive email on the
basis of the terms set out at www.ericsson.com/email_disclaimer

From dgoulet at efficios.com Tue Oct 2 15:09:10 2012
From: dgoulet at efficios.com (David Goulet)
Date: Tue, 02 Oct 2012 15:09:10 -0400
Subject: [lttng-dev] consumerd crashes (streaming)
In-Reply-To: <506B3AD1.6080200@ericsson.com>
References: <506B3AD1.6080200@ericsson.com>
Message-ID: <506B3BD6.8030203@efficios.com>

Hi Bernd,

This really looks like a problem that is supposed to be fixed by the patch
currently under review on lttng-dev. I'll let you know once it's merged.

Thanks!
David

Bernd Hufmann:
> Hello
>
> I'm using lttng-tools 2.1 RC4 to stream traces over the network. I
> noticed a crash of the consumerd for the following scenario:
>
> host1> lttng-relayd
>
> host2> lttng create mySession -U
> host2> lttng enable-event -u -a
> host2> lttng start
> host2> lttng stop
>
> After that, I stopped the relayd on host1. Then, when destroying the
> session on host2, the consumerd crashes and remains as a zombie process
> (). I was able to reproduce this multiple times. However, the
> screen print-out varied. See at end of mail for an example of the
> output. I hope this output helps to find the problem. Could someone
> please have a look?
>
> Thanks
> Bernd
>
> PERROR: recvmsg inet: Connection reset by peer [in
> lttcomm_recvmsg_inet_sock() at inet.c:227]
> PERROR: sendmsg inet: Connection reset by peer [in
> lttcomm_sendmsg_inet_sock() at inet.c:271]
> PERROR: Error in file write: Bad file descriptor [in
> lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1214]
> Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 56)
> PERROR: Error in file write: Bad file descriptor [in
> lttng_consumer_on_read_subbuffer_mmap() at consumer.c:1214]
> Error: Error writing to tracefile (ret: -1 != len: 4096 != subbuf_size: 37)
> *** glibc detected *** lttng-consumerd: free(): invalid pointer:
> 0xb5c02738 ***
> ======= Backtrace: =========
> /lib/i386-linux-gnu/libc.so.6(+0x73e42)[0x282e42]
> lttng-consumerd(lttng_consumer_thread_poll_metadata+0x294)[0x804b6c4]
> /lib/i386-linux-gnu/libpthread.so.0(+0x6d4c)[0xd84d4c]
> /lib/i386-linux-gnu/libc.so.6(clone+0x5e)[0x2f9ace]
> ======= Memory map: ========
> [...]
> Error: consumer err socket second poll error
> Error: Health error occurred in thread_manage_consumer
>

From dgoulet at efficios.com Tue Oct 2 15:11:40 2012
From: dgoulet at efficios.com (David Goulet)
Date: Tue, 02 Oct 2012 15:11:40 -0400
Subject: [lttng-dev] lttng destroy fails
In-Reply-To: <506B36BB.4040808@ericsson.com>
References: <506B36BB.4040808@ericsson.com>
Message-ID: <506B3C6C.5030605@efficios.com>

Hi Bernd,

I've just created a new issue about that.

https://bugs.lttng.org/issues/359

Thanks for reporting!
David

Bernd Hufmann:
> Hello
>
> I noticed a minor issue when executing the following command sequence:
>
>> lttng create auto
> Session auto created.
> Traces will be written in /home/bernd/lttng-traces/auto-20121002-144207
>> lttng destroy
> Warning: Session name auto not found
>
> The session is still around after that. I think the problem is that in
> the file .lttngrc the wrong session name is stored. In the file .lttngrc
> the session name "auto" is stored but the actual name is
> "auto-20121002-144207".
>
> BTW, I'm using lttng-tools 2.1 RC4.
>
> BR,
> Bernd
>

From christian.babeux at efficios.com Tue Oct 2 16:00:27 2012
From: christian.babeux at efficios.com (Christian Babeux)
Date: Tue, 2 Oct 2012 16:00:27 -0400
Subject: [lttng-dev] [PATCH v3 lttng-tools 1/5] Add testpoints in lttng-sessiond to instrument every threads
In-Reply-To: <1348770199-1618-1-git-send-email-christian.babeux@efficios.com>
References: <1348770199-1618-1-git-send-email-christian.babeux@efficios.com>
Message-ID: <1349208027-14191-1-git-send-email-christian.babeux@efficios.com>

This commit adds 8 new testpoints in the lttng-sessiond binary. These
testpoints rely on the testpoints infrastructure introduced recently.

Testpoints:

thread_manage_clients
thread_manage_clients_before_loop
thread_registration_apps
thread_manage_apps
thread_manage_apps_before_loop
thread_manage_kernel
thread_manage_kernel_before_loop
thread_manage_consumer

The thread_ testpoints are placed directly at the thread start and they
can be used to trigger failure in .

The thread__before_loop testpoints are placed directly before the main
processing loop of the thread and thus can be used to stall the
processing of the thread.
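
For readers unfamiliar with the idea, a testpoint can be pictured as a
named hook that does nothing in normal operation and calls into test code
only when a test has installed a handler. The sketch below only
illustrates that idea; it is not the lttng-tools testpoint
infrastructure. The names demo_hook_fn, demo_testpoint, demo_worker,
demo_fail_hook and the two hook variables are hypothetical and exist only
for this example.

```c
#include <assert.h>

/* Hypothetical hook type: a nonzero return asks the caller to fail. */
typedef int (*demo_hook_fn)(void);

/* One slot per hypothetical testpoint; a null slot means "no test attached". */
static demo_hook_fn demo_hook_thread_start;
static demo_hook_fn demo_hook_before_loop;

/* Run the hook if a test installed one, otherwise do nothing. */
static int demo_testpoint(demo_hook_fn hook)
{
	return hook ? hook() : 0;
}

/*
 * Stand-in for a thread body. Returns -1 when the start hook reports a
 * failure, otherwise the number of loop iterations performed.
 */
int demo_worker(int iterations)
{
	int i, done = 0;

	assert(iterations >= 0);

	/* Hook at thread start: a test can use it to simulate a failure. */
	if (demo_testpoint(demo_hook_thread_start)) {
		return -1;
	}

	/* Hook before the main loop: a test could block here to stall. */
	(void) demo_testpoint(demo_hook_before_loop);

	for (i = 0; i < iterations; i++) {
		done++;
	}
	return done;
}

/* Example test hook that always forces a failure. */
int demo_fail_hook(void)
{
	return 1;
}
```

With no hook installed, demo_worker(3) runs its loop and returns 3. After
assigning demo_hook_thread_start = demo_fail_hook, the same call returns
-1 before reaching the loop.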
Signed-off-by: Christian Babeux --- src/bin/lttng-sessiond/Makefile.am | 6 ++++-- src/bin/lttng-sessiond/main.c | 18 ++++++++++++++++++ src/bin/lttng-sessiond/testpoint.h | 29 +++++++++++++++++++++++++++++ 3 files changed, 51 insertions(+), 2 deletions(-) create mode 100644 src/bin/lttng-sessiond/testpoint.h diff --git a/src/bin/lttng-sessiond/Makefile.am b/src/bin/lttng-sessiond/Makefile.am index 73be023..b13059d 100644 --- a/src/bin/lttng-sessiond/Makefile.am +++ b/src/bin/lttng-sessiond/Makefile.am @@ -21,7 +21,8 @@ lttng_sessiond_SOURCES = utils.c utils.h \ kernel-consumer.c kernel-consumer.h \ consumer.h filter.c filter.h \ health.c health.h \ - cmd.c cmd.h + cmd.c cmd.h \ + testpoint.h if HAVE_LIBLTTNG_UST_CTL lttng_sessiond_SOURCES += trace-ust.c ust-app.c ust-consumer.c ust-consumer.h @@ -38,7 +39,8 @@ lttng_sessiond_LDADD = -lrt -lurcu-common -lurcu \ $(top_builddir)/src/common/hashtable/libhashtable.la \ $(top_builddir)/src/common/libcommon.la \ $(top_builddir)/src/common/compat/libcompat.la \ - $(top_builddir)/src/common/relayd/librelayd.la + $(top_builddir)/src/common/relayd/librelayd.la \ + $(top_builddir)/src/common/testpoint/libtestpoint.la if HAVE_LIBLTTNG_UST_CTL lttng_sessiond_LDADD += -llttng-ust-ctl diff --git a/src/bin/lttng-sessiond/main.c b/src/bin/lttng-sessiond/main.c index 730ac65..2b78141 100644 --- a/src/bin/lttng-sessiond/main.c +++ b/src/bin/lttng-sessiond/main.c @@ -62,6 +62,7 @@ #include "fd-limit.h" #include "filter.h" #include "health.h" +#include "testpoint.h" #define CONSUMERD_FILE "lttng-consumerd" @@ -680,8 +681,12 @@ static void *thread_manage_kernel(void *data) DBG("Thread manage kernel started"); + testpoint(thread_manage_kernel); + health_code_update(&health_thread_kernel); + testpoint(thread_manage_kernel_before_loop); + ret = create_thread_poll_set(&events, 2); if (ret < 0) { goto error_poll_create; @@ -829,6 +834,9 @@ static void *thread_manage_consumer(void *data) /* Inifinite blocking call, waiting for transmission */ 
restart: health_poll_update(&consumer_data->health); + + testpoint(thread_manage_consumer); + ret = lttng_poll_wait(&events, -1); health_poll_update(&consumer_data->health); if (ret < 0) { @@ -1026,6 +1034,8 @@ static void *thread_manage_apps(void *data) DBG("[thread] Manage application started"); + testpoint(thread_manage_apps); + rcu_register_thread(); rcu_thread_online(); @@ -1041,6 +1051,8 @@ static void *thread_manage_apps(void *data) goto error; } + testpoint(thread_manage_apps_before_loop); + health_code_update(&health_thread_app_manage); while (1) { @@ -1264,6 +1276,8 @@ static void *thread_registration_apps(void *data) DBG("[thread] Manage application registration started"); + testpoint(thread_registration_apps); + ret = lttcomm_listen_unix_sock(apps_sock); if (ret < 0) { goto error_listen; @@ -2912,6 +2926,8 @@ static void *thread_manage_clients(void *data) DBG("[thread] Manage client started"); + testpoint(thread_manage_clients); + rcu_register_thread(); health_code_update(&health_thread_cmd); @@ -2943,6 +2959,8 @@ static void *thread_manage_clients(void *data) kill(ppid, SIGUSR1); } + testpoint(thread_manage_clients_before_loop); + health_code_update(&health_thread_cmd); while (1) { diff --git a/src/bin/lttng-sessiond/testpoint.h b/src/bin/lttng-sessiond/testpoint.h new file mode 100644 index 0000000..5116548 --- /dev/null +++ b/src/bin/lttng-sessiond/testpoint.h @@ -0,0 +1,29 @@ +/* + * Copyright (C) 2012 - Christian Babeux + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include
+
+/* Testpoints, internal use only */
+TESTPOINT_DECL(thread_manage_clients);
+TESTPOINT_DECL(thread_manage_clients_before_loop);
+TESTPOINT_DECL(thread_registration_apps);
+TESTPOINT_DECL(thread_manage_apps);
+TESTPOINT_DECL(thread_manage_apps_before_loop);
+TESTPOINT_DECL(thread_manage_kernel);
+TESTPOINT_DECL(thread_manage_kernel_before_loop);
+TESTPOINT_DECL(thread_manage_consumer);
+
-- 
1.7.12.1

From paul.chavent at fnac.net Tue Oct 2 16:10:09 2012
From: paul.chavent at fnac.net (Paul Chavent)
Date: Tue, 02 Oct 2012 22:10:09 +0200
Subject: [lttng-dev] Viewing userspace apps traces
Message-ID: <506B4A21.6080309@fnac.net>

Hi.

Today, I've tested the tracing of a user space app.

I wonder what the best solution for viewing it is.

I have tried babeltrace, which produces good, comprehensive text output.
I have tried Eclipse, which isn't able to use the vtid and vpid context.
It's unfortunate, as I would like to see a timeline representation...

Is it a good solution to use the custom text parser of the Eclipse tool?

Do you suggest another way?

Thanks for your help.

Paul.

From alexmonthy at voxpopuli.im Tue Oct 2 16:29:24 2012
From: alexmonthy at voxpopuli.im (Alexandre Montplaisir)
Date: Tue, 02 Oct 2012 16:29:24 -0400
Subject: [lttng-dev] Viewing userspace apps traces
In-Reply-To: <506B4A21.6080309@fnac.net>
References: <506B4A21.6080309@fnac.net>
Message-ID: <506B4EA4.8050407@voxpopuli.im>

Hi Paul,

On 12-10-02 04:10 PM, Paul Chavent wrote:
> Hi.
>
> Today, I've tested the tracing of a user space app.
>
> I wonder what the best solution for viewing it is.
>
> I have tried babeltrace, which produces good, comprehensive text output.
> I have tried Eclipse, which isn't able to use the vtid and vpid context.
> It's unfortunate, as I would like to see a timeline representation...

Do you mean the Events view does not show the event contexts? It should,
afaik. It could be a bug.

As for graphical views, like timegraphs, it's not easy to have a general
view that can work with any UST trace. Each application defines its own
event types, so we have no guarantee that any given event type will be
there.

What would you like to see in your "timeline representation" exactly?
Maybe we could give you some pointers as to how to implement such a view.
(We're currently working on making it easy to extend the framework to
implement new views, so this could be a good exercise!)

> Is it a good solution to use the custom text parser of the Eclipse tool?

Not really. The custom parsers are meant for cases where, for example,
you have a text log and want to import it into the TMF framework.
LTTng/UST traces are in CTF format, and that format is already supported
in TMF.


Cheers,

-- 
Alexandre Montplaisir
DORSAL lab,
École Polytechnique de Montréal

From dgoulet at efficios.com Tue Oct 2 16:46:19 2012
From: dgoulet at efficios.com (David Goulet)
Date: Tue, 02 Oct 2012 16:46:19 -0400
Subject: [lttng-dev] Build out of src tree.
In-Reply-To: <1551710.192371349163427925.JavaMail.www@wsfrf1114>
References: <1551710.192371349163427925.JavaMail.www@wsfrf1114>
Message-ID: <506B529B.9020809@efficios.com>

Merged!

Thanks!
David

paul.chavent at fnac.net:
> I usually build my packages out of the source tree. As it's not
> straightforward for lttng-tools and lttng-ust, I would like to submit
> those two patches.
>
> Regards.
>
> Paul.
>
From paul.chavent at fnac.net Wed Oct 3 01:50:29 2012
From: paul.chavent at fnac.net (Paul Chavent)
Date: Wed, 03 Oct 2012 07:50:29 +0200
Subject: [lttng-dev] Viewing userspace apps traces
In-Reply-To: <506B4EA4.8050407@voxpopuli.im>
References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im>
Message-ID: <506BD225.3030906@fnac.net>

Hi

Thank you for your reply.

On 10/02/2012 10:29 PM, Alexandre Montplaisir wrote:
> Hi Paul,
>
> On 12-10-02 04:10 PM, Paul Chavent wrote:
>> Hi.
>>
>> Today, I've tested the tracing of a user space app.
>>
>> I wonder what the best solution for viewing it is.
>>
>> I have tried babeltrace, which produces good, comprehensive text output.
>> I have tried Eclipse, which isn't able to use the vtid and vpid context.
>> It's unfortunate, as I would like to see a timeline representation...
>
> Do you mean the Events view does not show the event contexts? It should,
> afaik. It could be a bug.
>
> As for graphical views, like timegraphs, it's not easy to have a general
> view that can work with any UST trace. Each application defines its own
> event types, so we have no guarantee that any given event type will be
> there.
>
> What would you like to see in your "timeline representation" exactly?
> Maybe we could give you some pointers as to how to implement such a view.
> (We're currently working on making it easy to extend the framework to
> implement new views, so this could be a good exercise!)

I would like to see, e.g., one line per tid and, on each line, the value
of one context or argument.

I'm ready to follow an exercise in extending the framework!

>
>> Is it a good solution to use the custom text parser of the Eclipse tool?
>
> Not really. The custom parsers are meant for cases where, for example,
> you have a text log and want to import it into the TMF framework.
> LTTng/UST traces are in CTF format, and that format is already supported
> in TMF.
>
>
>
> Cheers,
>

From dgoulet at efficios.com Wed Oct 3 09:40:20 2012
From: dgoulet at efficios.com (David Goulet)
Date: Wed, 03 Oct 2012 09:40:20 -0400
Subject: [lttng-dev] [LTTNG-TOOLS PATCH v5] ABI with support for compat 32/64 bits
In-Reply-To: <1349204606-6473-1-git-send-email-jdesfossez@efficios.com>
References: <1349204606-6473-1-git-send-email-jdesfossez@efficios.com>
Message-ID: <506C4044.7000603@efficios.com>

This looks good to me.

There is one small change I would make, though, in the open_metadata call,
where old_channel is not declared inside the if/else statement, but I'll do
it before merging. Don't bother sending a new version.

Acked.

David

Julien Desfossez:
> The current ABI does not work for compat 32/64 bits.
> This patch moves the current ABI as old-abi and provides a new ABI in
> which all the structures exchanged between user and kernel-space are
> packed. Also this new ABI moves the "int overwrite" member of the
> struct lttng_kernel_channel to remove the alignment added by the
> compiler.
>
> A patch for lttng-modules has been developed in parallel to this one
> to support the new ABI. These 2 patches have been tested in all
> possible configurations (applied or not) on 64-bit and 32-bit kernels
> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit.
>
> Here are the results of the tests :
> k 64 compat     | u 32 compat     | OK
> k 64 compat     | u 64 compat     | OK
> k 64 compat     | u 32 non-compat | KO
> k 64 compat     | u 64 non-compat | OK
>
> k 64 non-compat | u 64 compat     | OK
> k 64 non-compat | u 32 compat     | KO
> k 64 non-compat | u 64 non-compat | OK
> k 64 non-compat | u 32 non-compat | KO
>
> k 32 compat     | u compat        | OK
> k 32 compat     | u non-compat    | OK
>
> k 32 non-compat | u compat        | OK
> k 32 non-compat | u non-compat    | OK
>
> The results are as expected :
> - on 32-bit user-space and kernel, every configuration works.
> - on 64-bit user-space and kernel, every configuration works.
> - with 32-bit user-space on a 64-bit kernel the only configuration > where it works is when the compat patch is applied everywhere. > > Signed-off-by: Julien Desfossez > --- > src/bin/lttng-sessiond/trace-kernel.h | 1 + > src/common/kernel-ctl/kernel-ctl.c | 224 ++++++++++++++++++++++++++++++--- > src/common/kernel-ctl/kernel-ctl.h | 1 + > src/common/kernel-ctl/kernel-ioctl.h | 74 +++++++---- > src/common/lttng-kernel-old.h | 115 +++++++++++++++++ > src/common/lttng-kernel.h | 31 +++-- > 6 files changed, 397 insertions(+), 49 deletions(-) > create mode 100644 src/common/lttng-kernel-old.h > > diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h > index f04d9e7..c86cc27 100644 > --- a/src/bin/lttng-sessiond/trace-kernel.h > +++ b/src/bin/lttng-sessiond/trace-kernel.h > @@ -22,6 +22,7 @@ > > #include > #include > +#include > > #include "consumer.h" > > diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c > index 1396cd9..a93d251 100644 > --- a/src/common/kernel-ctl/kernel-ctl.c > +++ b/src/common/kernel-ctl/kernel-ctl.c > @@ -18,38 +18,175 @@ > > #define __USE_LINUX_IOCTL_DEFS > #include > +#include > > #include "kernel-ctl.h" > #include "kernel-ioctl.h" > > +/* > + * This flag indicates which version of the kernel ABI to use. The old > + * ABI (namespace _old) does not support a 32-bit user-space when the > + * kernel is 64-bit. The old ABI is kept here for compatibility but is > + * deprecated and will be removed eventually. > + */ > +static int lttng_kernel_use_old_abi = -1; > + > +/* > + * Execute the new or old ioctl depending on the ABI version. > + * If the ABI version is not determined yet (lttng_kernel_use_old_abi = -1), > + * this function tests if the new ABI is available and otherwise fallbacks > + * on the old one. > + * This function takes the fd on which the ioctl must be executed and the old > + * and new request codes. > + * It returns the return value of the ioctl executed. 
> + */ > +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, > + unsigned long newname) > +{ > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, newname); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + ret = ioctl(fd, oldname); > + } else { > + ret = ioctl(fd, newname); > + } > + > +end: > + return ret; > +} > + > int kernctl_create_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, > + LTTNG_KERNEL_SESSION); > } > > /* open the metadata global channel */ > int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) > { > - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); > + struct lttng_kernel_old_channel old_channel; > + struct lttng_kernel_channel channel; > + > + if (lttng_kernel_use_old_abi) { > + old_channel.overwrite = chops->overwrite; > + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; > + old_channel.switch_timer_interval = chops->switch_timer_interval; > + old_channel.read_timer_interval = chops->read_timer_interval; > + old_channel.output = chops->output; > + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); > + } > + > + channel.overwrite = chops->overwrite; > + channel.subbuf_size = chops->subbuf_size; > + channel.num_subbuf = chops->num_subbuf; > + channel.switch_timer_interval = chops->switch_timer_interval; > + channel.read_timer_interval = chops->read_timer_interval; > + channel.output = chops->output; > + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); > } > > int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) > { > - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); > + struct 
lttng_kernel_channel channel; > + > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_channel old_channel; > + > + old_channel.overwrite = chops->overwrite; > + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; > + old_channel.switch_timer_interval = chops->switch_timer_interval; > + old_channel.read_timer_interval = chops->read_timer_interval; > + old_channel.output = chops->output; > + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); > + } > + > + channel.overwrite = chops->overwrite; > + channel.subbuf_size = chops->subbuf_size; > + channel.num_subbuf = chops->num_subbuf; > + channel.switch_timer_interval = chops->switch_timer_interval; > + channel.read_timer_interval = chops->read_timer_interval; > + channel.output = chops->output; > + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); > } > > int kernctl_create_stream(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_STREAM); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, > + LTTNG_KERNEL_STREAM); > } > > int kernctl_create_event(int fd, struct lttng_kernel_event *ev) > { > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_event old_event; > + > + memcpy(old_event.name, ev->name, sizeof(old_event.name)); > + old_event.instrumentation = ev->instrumentation; > + switch (ev->instrumentation) { > + case LTTNG_KERNEL_KPROBE: > + old_event.u.kprobe.addr = ev->u.kprobe.addr; > + old_event.u.kprobe.offset = ev->u.kprobe.offset; > + memcpy(old_event.u.kprobe.symbol_name, > + ev->u.kprobe.symbol_name, > + sizeof(old_event.u.kprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_KRETPROBE: > + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; > + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; > + memcpy(old_event.u.kretprobe.symbol_name, > + ev->u.kretprobe.symbol_name, > + 
sizeof(old_event.u.kretprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_FUNCTION: > + memcpy(old_event.u.ftrace.symbol_name, > + ev->u.ftrace.symbol_name, > + sizeof(old_event.u.ftrace.symbol_name)); > + break; > + default: > + break; > + } > + > + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, &old_event); > + } > return ioctl(fd, LTTNG_KERNEL_EVENT, ev); > } > > int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > { > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_context old_ctx; > + > + old_ctx.ctx = ctx->ctx; > + /* only type that uses the union */ > + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { > + old_ctx.u.perf_counter.type = > + ctx->u.perf_counter.type; > + old_ctx.u.perf_counter.config = > + ctx->u.perf_counter.config; > + memcpy(old_ctx.u.perf_counter.name, > + ctx->u.perf_counter.name, > + sizeof(old_ctx.u.perf_counter.name)); > + } > + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); > + } > return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); > } > > @@ -57,44 +194,98 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > /* Enable event, channel and session ioctl */ > int kernctl_enable(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_ENABLE); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, > + LTTNG_KERNEL_ENABLE); > } > > /* Disable event, channel and session ioctl */ > int kernctl_disable(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_DISABLE); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, > + LTTNG_KERNEL_DISABLE); > } > > int kernctl_start_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION_START); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, > + LTTNG_KERNEL_SESSION_START); > } > > int kernctl_stop_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, > + LTTNG_KERNEL_SESSION_STOP); > } > > - > int kernctl_tracepoint_list(int fd) > { > - return ioctl(fd, 
LTTNG_KERNEL_TRACEPOINT_LIST); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, > + LTTNG_KERNEL_TRACEPOINT_LIST); > } > > int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) > { > - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_tracer_version old_v; > + > + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); > + if (ret) { > + goto end; > + } > + v->major = old_v.major; > + v->minor = old_v.minor; > + v->patchlevel = old_v.patchlevel; > + } else { > + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + } > + > +end: > + return ret; > } > > int kernctl_wait_quiescent(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, > + LTTNG_KERNEL_WAIT_QUIESCENT); > } > > int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) > { > - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_calibrate old_calibrate; > + > + old_calibrate.type = calibrate->type; > + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); > + if (ret) { > + goto end; > + } > + calibrate->type = old_calibrate.type; > + } else { > + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + } > + > +end: > + return ret; > } > > > @@ -193,10 +384,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) > { > return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); > } > - > -/* Get the 
offset of the stream_id in the packet header */ > -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) > -{ > - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); > - > -} > diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h > index 18712d9..85a3a18 100644 > --- a/src/common/kernel-ctl/kernel-ctl.h > +++ b/src/common/kernel-ctl/kernel-ctl.h > @@ -21,6 +21,7 @@ > > #include > #include > +#include > > int kernctl_create_session(int fd); > int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); > diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h > index 35942be..8e22632 100644 > --- a/src/common/kernel-ctl/kernel-ioctl.h > +++ b/src/common/kernel-ctl/kernel-ioctl.h > @@ -49,37 +49,69 @@ > /* map stream to stream id for network streaming */ > #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) > > +/* Old ABI (without support for 32/64 bits compat) */ > +/* LTTng file descriptor ioctl */ > +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) > +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ > + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) > +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_OLD_CALIBRATE \ > + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) > + > +/* Session FD ioctl */ > +#define LTTNG_KERNEL_OLD_METADATA \ > + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_CHANNEL \ > + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) > +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) > + > +/* Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) > +#define LTTNG_KERNEL_OLD_EVENT \ > + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) > +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ > + _IOR(0xF6, 0x62, unsigned long) > > +/* Event and 
Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_CONTEXT \ > + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) > + > +/* Event, Channel and Session ioctl */ > +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) > +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) > + > + > +/* Current ABI (with suport for 32/64 bits compat) */ > /* LTTng file descriptor ioctl */ > -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) > -#define LTTNG_KERNEL_TRACER_VERSION \ > - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) > -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) > -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) > +#define LTTNG_KERNEL_TRACER_VERSION \ > + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) > +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) > +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) > #define LTTNG_KERNEL_CALIBRATE \ > - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) > + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) > > /* Session FD ioctl */ > -#define LTTNG_KERNEL_METADATA \ > - _IOW(0xF6, 0x50, struct lttng_channel_attr) > -#define LTTNG_KERNEL_CHANNEL \ > - _IOW(0xF6, 0x51, struct lttng_channel_attr) > -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) > -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) > +#define LTTNG_KERNEL_METADATA \ > + _IOW(0xF6, 0x54, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_CHANNEL \ > + _IOW(0xF6, 0x55, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) > +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > > /* Channel FD ioctl */ > -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) > -#define LTTNG_KERNEL_EVENT \ > - _IOW(0xF6, 0x61, struct lttng_kernel_event) > -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ > - _IOR(0xF6, 0x62, unsigned long) > +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) > +#define LTTNG_KERNEL_EVENT \ > + _IOW(0xF6, 0x63, struct lttng_kernel_event) > > /* Event and Channel FD ioctl */ > -#define 
LTTNG_KERNEL_CONTEXT \ > - _IOW(0xF6, 0x70, struct lttng_kernel_context) > +#define LTTNG_KERNEL_CONTEXT \ > + _IOW(0xF6, 0x71, struct lttng_kernel_context) > > /* Event, Channel and Session ioctl */ > -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) > -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) > +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) > +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > > #endif /* _LTT_KERNEL_IOCTL_H */ > diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h > new file mode 100644 > index 0000000..1b8999a > --- /dev/null > +++ b/src/common/lttng-kernel-old.h > @@ -0,0 +1,115 @@ > +/* > + * Copyright (C) 2011 - Julien Desfossez > + * Mathieu Desnoyers > + * David Goulet > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2 only, > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along > + * with this program; if not, write to the Free Software Foundation, Inc., > + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > + */ > + > +#ifndef _LTTNG_KERNEL_OLD_H > +#define _LTTNG_KERNEL_OLD_H > + > +#include > +#include > + > +/* > + * LTTng DebugFS ABI structures. > + * > + * This is the kernel ABI copied from lttng-modules tree. 
> + */ > + > +/* Perf counter attributes */ > +struct lttng_kernel_old_perf_counter_ctx { > + uint32_t type; > + uint64_t config; > + char name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +/* Event/Channel context */ > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_context { > + enum lttng_kernel_context_type ctx; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; > + > + union { > + struct lttng_kernel_old_perf_counter_ctx perf_counter; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_kretprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +/* > + * Either addr is used, or symbol_name and offset. > + */ > +struct lttng_kernel_old_kprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +/* Function tracer */ > +struct lttng_kernel_old_function { > + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_event { > + char name[LTTNG_KERNEL_SYM_NAME_LEN]; > + enum lttng_kernel_instrumentation instrumentation; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; > + > + /* Per instrumentation type configuration */ > + union { > + struct lttng_kernel_old_kretprobe kretprobe; > + struct lttng_kernel_old_kprobe kprobe; > + struct lttng_kernel_old_function ftrace; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_tracer_version { > + uint32_t major; > + uint32_t minor; > + uint32_t patchlevel; > +}; > + > +struct lttng_kernel_old_calibrate { > + enum lttng_kernel_calibrate_type type; /* type (input) */ > +}; > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > 
+struct lttng_kernel_old_channel { > + int overwrite; /* 1: overwrite, 0: discard */ > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + enum lttng_event_output output; /* splice, mmap */ > + > + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; > +}; > + > +#endif /* _LTTNG_KERNEL_OLD_H */ > diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h > index dbeb6aa..fa8ba61 100644 > --- a/src/common/lttng-kernel.h > +++ b/src/common/lttng-kernel.h > @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { > uint32_t type; > uint64_t config; > char name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Event/Channel context */ > #define LTTNG_KERNEL_CONTEXT_PADDING1 16 > @@ -72,14 +72,14 @@ struct lttng_kernel_context { > struct lttng_kernel_perf_counter_ctx perf_counter; > char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_kretprobe { > uint64_t addr; > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* > * Either addr is used, or symbol_name and offset. 
> @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Function tracer */ > struct lttng_kernel_function { > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > #define LTTNG_KERNEL_EVENT_PADDING1 16 > #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > @@ -110,13 +110,13 @@ struct lttng_kernel_event { > struct lttng_kernel_function ftrace; > char padding[LTTNG_KERNEL_EVENT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_tracer_version { > uint32_t major; > uint32_t minor; > uint32_t patchlevel; > -}; > +}__attribute__((packed)); > > enum lttng_kernel_calibrate_type { > LTTNG_KERNEL_CALIBRATE_KRETPROBE, > @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { > > struct lttng_kernel_calibrate { > enum lttng_kernel_calibrate_type type; /* type (input) */ > -}; > +}__attribute__((packed)); > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_channel { > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + enum lttng_event_output output; /* splice, mmap */ > + > + int overwrite; /* 1: overwrite, 0: discard */ > + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > +}__attribute__((packed)); > > #endif /* _LTTNG_KERNEL_H */ From dgoulet at efficios.com Wed Oct 3 11:48:35 2012 From: dgoulet at efficios.com (David Goulet) Date: Wed, 3 Oct 2012 11:48:35 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/3] Fix: Add missing call rcu and read side lock Message-ID: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> Signed-off-by: David Goulet --- src/common/consumer.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 
f01eb5d..7c3762e 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -688,7 +688,9 @@ void consumer_del_channel(struct lttng_consumer_channel *channel) } } + rcu_read_lock(); call_rcu(&channel->node.head, consumer_free_channel); + rcu_read_unlock(); end: pthread_mutex_unlock(&consumer_data.lock); } @@ -1535,7 +1537,7 @@ static void destroy_stream_ht(struct lttng_ht *ht) ret = lttng_ht_del(ht, &iter); assert(!ret); - free(stream); + call_rcu(&stream->node.head, consumer_free_stream); } rcu_read_unlock(); @@ -1635,7 +1637,9 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) consumer_del_channel(stream->chan); } - free(stream); + rcu_read_lock(); + call_rcu(&stream->node.head, consumer_free_stream); + rcu_read_unlock(); } /* -- 1.7.10.4 From dgoulet at efficios.com Wed Oct 3 11:48:36 2012 From: dgoulet at efficios.com (David Goulet) Date: Wed, 3 Oct 2012 11:48:36 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/3] Add hash table argument to helper functions In-Reply-To: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> Message-ID: <1349279317-28056-2-git-send-email-dgoulet@efficios.com> This allows these helper functions to be used more broadly across the code base and not for a specific hash table. Signed-off-by: David Goulet --- src/common/consumer.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 7c3762e..6ee366f 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -62,12 +62,15 @@ volatile int consumer_quit = 0; * Find a stream. The consumer_data.lock must be locked during this * call. 
*/ -static struct lttng_consumer_stream *consumer_find_stream(int key) +static struct lttng_consumer_stream *consumer_find_stream(int key, + struct lttng_ht *ht) { struct lttng_ht_iter iter; struct lttng_ht_node_ulong *node; struct lttng_consumer_stream *stream = NULL; + assert(ht); + /* Negative keys are lookup failures */ if (key < 0) { return NULL; @@ -75,8 +78,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key) rcu_read_lock(); - lttng_ht_lookup(consumer_data.stream_ht, (void *)((unsigned long) key), - &iter); + lttng_ht_lookup(ht, (void *)((unsigned long) key), &iter); node = lttng_ht_iter_get_node_ulong(&iter); if (node != NULL) { stream = caa_container_of(node, struct lttng_consumer_stream, node); @@ -87,12 +89,12 @@ static struct lttng_consumer_stream *consumer_find_stream(int key) return stream; } -static void consumer_steal_stream_key(int key) +static void consumer_steal_stream_key(int key, struct lttng_ht *ht) { struct lttng_consumer_stream *stream; rcu_read_lock(); - stream = consumer_find_stream(key); + stream = consumer_find_stream(key, ht); if (stream) { stream->key = -1; /* @@ -443,7 +445,7 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) pthread_mutex_lock(&consumer_data.lock); /* Steal stream identifier, for UST */ - consumer_steal_stream_key(stream->key); + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); rcu_read_lock(); lttng_ht_lookup(consumer_data.stream_ht, @@ -620,7 +622,7 @@ void consumer_change_stream_state(int stream_key, struct lttng_consumer_stream *stream; pthread_mutex_lock(&consumer_data.lock); - stream = consumer_find_stream(stream_key); + stream = consumer_find_stream(stream_key, consumer_data.stream_ht); if (stream) { stream->state = state; } -- 1.7.10.4 From dgoulet at efficios.com Wed Oct 3 11:48:37 2012 From: dgoulet at efficios.com (David Goulet) Date: Wed, 3 Oct 2012 11:48:37 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/3] Fix: Stream allocation and insertion consistency 
In-Reply-To: <1349279317-28056-1-git-send-email-dgoulet@efficios.com>
References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com>
Message-ID: <1349279317-28056-3-git-send-email-dgoulet@efficios.com>

The stream allocation in the consumer was doing ustctl actions on the
stream and updating refcounts. However, before inserting the stream into
the hash table and polling on the fd for data, an error could occur which
could stop the stream insertion, hence creating multiple fd leaks, memory
leaks and a bad refcount state.

Furthermore, consumer_del_stream can now destroy a stream even if that
stream has not been added to the global hash table. The kernel and UST
consumers use it on error between allocation and hash table insertion.

Signed-off-by: David Goulet
---
 src/common/consumer.c                        | 219 +++++++++++++++++---------
 src/common/kernel-consumer/kernel-consumer.c |  13 +-
 src/common/ust-consumer/ust-consumer.c       |  16 +-
 src/common/ust-consumer/ust-consumer.h       |   4 +-
 4 files changed, 172 insertions(+), 80 deletions(-)

diff --git a/src/common/consumer.c b/src/common/consumer.c
index 6ee366f..6011622 100644
--- a/src/common/consumer.c
+++ b/src/common/consumer.c
@@ -227,8 +227,8 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd)
 }

 /*
- * Remove a stream from the global list protected by a mutex. This
- * function is also responsible for freeing its data structures.
+ * Remove a stream from the global list protected by a mutex. This function is
+ * also responsible for freeing its data structures.
*/ void consumer_del_stream(struct lttng_consumer_stream *stream) { @@ -236,10 +236,46 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) struct lttng_ht_iter iter; struct lttng_consumer_channel *free_chan = NULL; struct consumer_relayd_sock_pair *relayd; + struct lttng_ht_node_ulong *node; assert(stream); + DBG3("Consumer deleting stream %d", stream->key); + pthread_mutex_lock(&consumer_data.lock); + rcu_read_lock(); + + /* + * A stream with a key value of -1 means that the stream is in the hash + * table but can not be looked up. This happens when consumer_add_stream is + * done and we have a duplicate key before insertion. + * consumer_steal_stream_key() is called to make sure we can insert a + * stream even though the index is already present. Since the key is the fd + * value on the session daemon side, duplicates are possible. + */ + if (stream->key != -1) { + lttng_ht_lookup(consumer_data.stream_ht, + (void *)((unsigned long) stream->key), &iter); + node = lttng_ht_iter_get_node_ulong(&iter); + if (node == NULL) { + rcu_read_unlock(); + + /* + * Stream doest not exist in hash table. This can happen if we hit + * an error after allocation but before adding it to the table. We + * consider that if the node is not in the hash table and has a + * valid key, no ustctl/ioctl nor mmap action was done hence + * jumping to the RCU free. 
+ */ + DBG2("Consumer stream key %d not found during deletion", stream->key); + goto free_stream; + } else { + /* Remove stream from hash table and continue */ + ret = lttng_ht_del(consumer_data.stream_ht, &iter); + assert(!ret); + } + } + rcu_read_unlock(); switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -260,20 +296,10 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) goto end; } - rcu_read_lock(); - iter.iter.node = &stream->node.node; - ret = lttng_ht_del(consumer_data.stream_ht, &iter); - assert(!ret); - - rcu_read_unlock(); - - if (consumer_data.stream_count <= 0) { - goto end; - } + /* This should NEVER reach a negative value. */ + assert(consumer_data.stream_count >= 0); consumer_data.stream_count--; - if (!stream) { - goto end; - } + if (stream->out_fd >= 0) { ret = close(stream->out_fd); if (ret) { @@ -321,7 +347,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) destroy_relayd(relayd); } } - rcu_read_unlock(); uatomic_dec(&stream->chan->refcount); if (!uatomic_read(&stream->chan->refcount) @@ -329,7 +354,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) free_chan = stream->chan; } - call_rcu(&stream->node.head, consumer_free_stream); end: consumer_data.need_update = 1; pthread_mutex_unlock(&consumer_data.lock); @@ -337,6 +361,10 @@ end: if (free_chan) { consumer_del_channel(free_chan); } + +free_stream: + call_rcu(&stream->node.head, consumer_free_stream); + rcu_read_unlock(); } struct lttng_consumer_stream *consumer_allocate_stream( @@ -353,7 +381,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( int *alloc_ret) { struct lttng_consumer_stream *stream; - int ret; stream = zmalloc(sizeof(*stream)); if (stream == NULL) { @@ -372,7 +399,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( ERR("Unable to find channel for stream %d", stream_key); goto error; } - stream->chan->refcount++; + stream->key = stream_key; stream->shm_fd = shm_fd; stream->wait_fd = wait_fd; @@ -391,35 
+418,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( lttng_ht_node_init_ulong(&stream->node, stream->key); lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); - switch (consumer_data.type) { - case LTTNG_CONSUMER_KERNEL: - break; - case LTTNG_CONSUMER32_UST: - case LTTNG_CONSUMER64_UST: - stream->cpu = stream->chan->cpucount++; - ret = lttng_ustconsumer_allocate_stream(stream); - if (ret) { - *alloc_ret = -EINVAL; - goto error; - } - break; - default: - ERR("Unknown consumer_data type"); - *alloc_ret = -EINVAL; - goto error; - } - - /* - * When nb_init_streams reaches 0, we don't need to trigger any action in - * terms of destroying the associated channel, because the action that - * causes the count to become 0 also causes a stream to be added. The - * channel deletion will thus be triggered by the following removal of this - * stream. - */ - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { - uatomic_dec(&stream->chan->nb_init_streams); - } - DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, stream->shm_fd, stream->wait_fd, @@ -439,38 +437,66 @@ end: int consumer_add_stream(struct lttng_consumer_stream *stream) { int ret = 0; - struct lttng_ht_node_ulong *node; - struct lttng_ht_iter iter; struct consumer_relayd_sock_pair *relayd; - pthread_mutex_lock(&consumer_data.lock); - /* Steal stream identifier, for UST */ - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); + assert(stream); + DBG3("Adding consumer stream %d", stream->key); + + pthread_mutex_lock(&consumer_data.lock); rcu_read_lock(); - lttng_ht_lookup(consumer_data.stream_ht, - (void *)((unsigned long) stream->key), &iter); - node = lttng_ht_iter_get_node_ulong(&iter); - if (node != NULL) { - rcu_read_unlock(); - /* Stream already exist. 
Ignore the insertion */ - goto end; - } - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + stream->cpu = stream->chan->cpucount++; + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + ret = -ENOSYS; + goto error; + } /* Check and cleanup relayd */ relayd = consumer_find_relayd(stream->net_seq_idx); if (relayd != NULL) { uatomic_inc(&relayd->refcount); } - rcu_read_unlock(); - /* Update consumer data */ + /* Final operation is to add the stream to the global hash table. */ + lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); + + /* Update channel refcount once added without error(s). */ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. + */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + /* Update consumer data once the node is inserted. */ consumer_data.stream_count++; consumer_data.need_update = 1; -end: +error: + rcu_read_unlock(); pthread_mutex_unlock(&consumer_data.lock); return ret; @@ -1648,10 +1674,37 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) * Action done with the metadata stream when adding it to the consumer internal * data structures to handle it. 
*/ -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { + int ret = 0; struct consumer_relayd_sock_pair *relayd; + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->key, ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + return -ENOSYS; + } + + /* + * From here, refcounts are updated so be _careful_ when returning an error + * after this point. + */ + /* Find relayd and, if one is found, increment refcount. */ rcu_read_lock(); relayd = consumer_find_relayd(stream->net_seq_idx); @@ -1659,6 +1712,27 @@ static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) uatomic_inc(&relayd->refcount); } rcu_read_unlock(); + + /* Update channel refcount once added without error(s). */ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. 
+ */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + rcu_read_lock(); + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); + rcu_read_unlock(); + +error: + return ret; } /* @@ -1755,17 +1829,16 @@ restart: DBG("Adding metadata stream %d to poll set", stream->wait_fd); - rcu_read_lock(); - /* The node should be init at this point */ - lttng_ht_add_unique_ulong(metadata_ht, - &stream->waitfd_node); - rcu_read_unlock(); + ret = consumer_add_metadata_stream(stream, metadata_ht); + if (ret) { + /* Stream was not setup properly. Continuing. */ + free(stream); + continue; + } /* Add metadata stream to the global poll events list */ lttng_poll_add(&events, stream->wait_fd, LPOLLIN | LPOLLPRI); - - consumer_add_metadata_stream(stream); } /* Metadata pipe handled. Continue handling the others */ diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c index 4d61cc5..878c4ab 100644 --- a/src/common/kernel-consumer/kernel-consumer.c +++ b/src/common/kernel-consumer/kernel-consumer.c @@ -206,18 +206,20 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, &new_stream->relayd_stream_id); pthread_mutex_unlock(&relayd->ctrl_sock_mutex); if (ret < 0) { + consumer_del_stream(new_stream); goto end_nosignal; } } else if (msg.u.stream.net_index != -1) { ERR("Network sequence index %d unknown. 
Not adding stream.", msg.u.stream.net_index); - free(new_stream); + consumer_del_stream(new_stream); goto end_nosignal; } if (ctx->on_recv_stream) { ret = ctx->on_recv_stream(new_stream); if (ret < 0) { + consumer_del_stream(new_stream); goto end_nosignal; } } @@ -230,9 +232,16 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, } while (ret < 0 && errno == EINTR); if (ret < 0) { PERROR("write metadata pipe"); + consumer_del_stream(new_stream); } } else { - consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. Continuing", + new_stream->key); + consumer_del_stream(new_stream); + goto end_nosignal; + } } DBG("Kernel consumer_add_stream (%d)", fd); diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index 76238a0..e10d540 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -234,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, &new_stream->relayd_stream_id); pthread_mutex_unlock(&relayd->ctrl_sock_mutex); if (ret < 0) { + consumer_del_stream(new_stream); goto end_nosignal; } } else if (msg.u.stream.net_index != -1) { ERR("Network sequence index %d unknown. 
Not adding stream.", msg.u.stream.net_index); - free(new_stream); + consumer_del_stream(new_stream); goto end_nosignal; } @@ -247,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, if (ctx->on_recv_stream) { ret = ctx->on_recv_stream(new_stream); if (ret < 0) { + consumer_del_stream(new_stream); goto end_nosignal; } } @@ -259,9 +261,17 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, } while (ret < 0 && errno == EINTR); if (ret < 0) { PERROR("write metadata pipe"); + consumer_del_stream(new_stream); + goto end_nosignal; } } else { - consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. Continuing", + new_stream->key); + consumer_del_stream(new_stream); + goto end_nosignal; + } } DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, @@ -373,7 +383,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) ustctl_unmap_channel(chan->handle); } -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) { struct lttng_ust_object_data obj; int ret; diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h index 3f76f23..6b507ed 100644 --- a/src/common/ust-consumer/ust-consumer.h +++ b/src/common/ust-consumer/ust-consumer.h @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); +extern int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); extern void lttng_ustconsumer_del_stream(struct lttng_consumer_stream *stream); int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, @@ -117,7 
+117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan)
 }
 
 static inline
-int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream)
+int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream)
 {
 	return -ENOSYS;
 }
--
1.7.10.4

From mathieu.desnoyers at efficios.com Wed Oct 3 12:28:41 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Wed, 3 Oct 2012 12:28:41 -0400
Subject: [lttng-dev] [PATCH lttng-tools 1/3] Fix: Add missing call rcu and read side lock
In-Reply-To: <1349279317-28056-1-git-send-email-dgoulet@efficios.com>
References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com>
Message-ID: <20121003162841.GA21776@Krystal>

* David Goulet (dgoulet at efficios.com) wrote:
> Signed-off-by: David Goulet
> ---
>  src/common/consumer.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/common/consumer.c b/src/common/consumer.c
> index f01eb5d..7c3762e 100644
> --- a/src/common/consumer.c
> +++ b/src/common/consumer.c
> @@ -688,7 +688,9 @@ void consumer_del_channel(struct lttng_consumer_channel *channel)
>  		}
>  	}
>
> +	rcu_read_lock();
>  	call_rcu(&channel->node.head, consumer_free_channel);
> +	rcu_read_unlock();

this rcu read lock is useless.

>  end:
>  	pthread_mutex_unlock(&consumer_data.lock);
>  }
> @@ -1535,7 +1537,7 @@ static void destroy_stream_ht(struct lttng_ht *ht)
>  		ret = lttng_ht_del(ht, &iter);
>  		assert(!ret);
>
> -		free(stream);
> +		call_rcu(&stream->node.head, consumer_free_stream);

Good point. While you are there, please remove the bogus comment at the
top of destroy_stream_ht(). It does not take into account that
lttng_ht_del can trigger a concurrent resize (performed by call rcu
worker threads).

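To make the two review points above concrete (defer the free instead of
calling free() right after the hash table removal, and do not wrap the
queuing call in a read-side lock), here is a minimal single-threaded
sketch. It is a toy model and not liburcu: toy_call_rcu(),
toy_grace_period(), struct toy_rcu_head and struct toy_stream are made-up
names for illustration, and the "grace period" is just an explicit
function call rather than anything detected by a real RCU implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an RCU head: one queued reclaim callback. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

/* Hypothetical stream object; only the embedded head matters here. */
struct toy_stream {
	int key;
	struct toy_rcu_head head;
};

static struct toy_rcu_head *toy_pending; /* callbacks awaiting a grace period */
static int toy_freed_count;              /* streams reclaimed so far */

/* Queue a reclaim callback. No read-side lock is taken or needed here. */
static void toy_call_rcu(struct toy_rcu_head *head,
		void (*func)(struct toy_rcu_head *head))
{
	assert(head && func);
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Pretend every reader has finished: run all queued callbacks. */
static void toy_grace_period(void)
{
	struct toy_rcu_head *head = toy_pending;

	toy_pending = NULL;
	while (head != NULL) {
		/* Read next first: the callback frees the enclosing object. */
		struct toy_rcu_head *next = head->next;

		head->func(head);
		head = next;
	}
}

/* Reclaim callback: recover the enclosing stream and free it. */
static void toy_free_stream(struct toy_rcu_head *head)
{
	struct toy_stream *stream = (struct toy_stream *)
		((char *) head - offsetof(struct toy_stream, head));

	free(stream);
	toy_freed_count++;
}

/*
 * Unlink, then defer the free. Returns the reclaimed count observed
 * right after queuing, which shows that nothing was freed yet.
 */
static int toy_del_stream(struct toy_stream *stream)
{
	/* (the hash table removal would happen here, before queuing) */
	toy_call_rcu(&stream->head, toy_free_stream);
	return toy_freed_count;
}
```

In this model, calling toy_del_stream() on a heap-allocated stream leaves
the memory valid until toy_grace_period() runs, which is the property an
immediate free() after the removal cannot give to a reader that still
holds a pointer.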
>  	}
>  	rcu_read_unlock();
>
> @@ -1635,7 +1637,9 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream)
>  		consumer_del_channel(stream->chan);
>  	}
>
> -	free(stream);
> +	rcu_read_lock();
> +	call_rcu(&stream->node.head, consumer_free_stream);
> +	rcu_read_unlock();

call_rcu makes sense here, but not rcu read lock.

Thanks,

Mathieu

>  }
>
>  /*
> --
> 1.7.10.4
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com Wed Oct 3 12:29:47 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Wed, 3 Oct 2012 12:29:47 -0400
Subject: [lttng-dev] [PATCH lttng-tools 2/3] Add hash table argument to helper functions
In-Reply-To: <1349279317-28056-2-git-send-email-dgoulet@efficios.com>
References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> <1349279317-28056-2-git-send-email-dgoulet@efficios.com>
Message-ID: <20121003162947.GB21776@Krystal>

* David Goulet (dgoulet at efficios.com) wrote:
> This allows these helper functions to be used more broadly across the
> code base and not for a specific hash table.
>
> Signed-off-by: David Goulet

Acked-by: Mathieu Desnoyers

Thanks!

> ---
>  src/common/consumer.c |   16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/src/common/consumer.c b/src/common/consumer.c
> index 7c3762e..6ee366f 100644
> --- a/src/common/consumer.c
> +++ b/src/common/consumer.c
> @@ -62,12 +62,15 @@ volatile int consumer_quit = 0;
>   * Find a stream. The consumer_data.lock must be locked during this
>   * call.
>   */
> -static struct lttng_consumer_stream *consumer_find_stream(int key)
> +static struct lttng_consumer_stream *consumer_find_stream(int key,
> +		struct lttng_ht *ht)
>  {
>  	struct lttng_ht_iter iter;
>  	struct lttng_ht_node_ulong *node;
>  	struct lttng_consumer_stream *stream = NULL;
>
> +	assert(ht);
> +
>  	/* Negative keys are lookup failures */
>  	if (key < 0) {
>  		return NULL;
> @@ -75,8 +78,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key)
>
>  	rcu_read_lock();
>
> -	lttng_ht_lookup(consumer_data.stream_ht, (void *)((unsigned long) key),
> -			&iter);
> +	lttng_ht_lookup(ht, (void *)((unsigned long) key), &iter);
>  	node = lttng_ht_iter_get_node_ulong(&iter);
>  	if (node != NULL) {
>  		stream = caa_container_of(node, struct lttng_consumer_stream, node);
> @@ -87,12 +89,12 @@ static struct lttng_consumer_stream *consumer_find_stream(int key)
>  	return stream;
>  }
>
> -static void consumer_steal_stream_key(int key)
> +static void consumer_steal_stream_key(int key, struct lttng_ht *ht)
>  {
>  	struct lttng_consumer_stream *stream;
>
>  	rcu_read_lock();
> -	stream = consumer_find_stream(key);
> +	stream = consumer_find_stream(key, ht);
>  	if (stream) {
>  		stream->key = -1;
>  		/*
> @@ -443,7 +445,7 @@ int consumer_add_stream(struct lttng_consumer_stream *stream)
>
>  	pthread_mutex_lock(&consumer_data.lock);
>  	/* Steal stream identifier, for UST */
> -	consumer_steal_stream_key(stream->key);
> +	consumer_steal_stream_key(stream->key, consumer_data.stream_ht);
>
>  	rcu_read_lock();
>  	lttng_ht_lookup(consumer_data.stream_ht,
> @@ -620,7 +622,7 @@ void consumer_change_stream_state(int stream_key,
>  	struct lttng_consumer_stream *stream;
>
>  	pthread_mutex_lock(&consumer_data.lock);
> -	stream = consumer_find_stream(stream_key);
> +	stream = consumer_find_stream(stream_key, consumer_data.stream_ht);
>  	if (stream) {
>  		stream->state = state;
>  	}
> --
> 1.7.10.4
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Wed Oct 3 12:38:29 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 3 Oct 2012 12:38:29 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/3] Fix: Stream allocation and insertion consistency In-Reply-To: <1349279317-28056-3-git-send-email-dgoulet@efficios.com> References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> <1349279317-28056-3-git-send-email-dgoulet@efficios.com> Message-ID: <20121003163829.GC21776@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > The stream allocation in the consumer was doing ustctl actions on the > stream and updating refounts. However, before inserting the stream into refounts -> refcounts. > the hash table and polling on the fd for data, an error could occur > which could stop the stream insertion hence creating multiple fd leaks, > mem leaks and bad refount state. refount -> refcount > > Furthermore, the consumer_del_stream now can destroy a stream even if > that stream is not added to the global hash table. The kernel and UST > consumer uses it on error between allocation and hash table insertion. consumer -> consumers uses -> use > > Signed-off-by: David Goulet > --- > src/common/consumer.c | 219 +++++++++++++++++--------- > src/common/kernel-consumer/kernel-consumer.c | 13 +- > src/common/ust-consumer/ust-consumer.c | 16 +- > src/common/ust-consumer/ust-consumer.h | 4 +- > 4 files changed, 172 insertions(+), 80 deletions(-) > > diff --git a/src/common/consumer.c b/src/common/consumer.c > index 6ee366f..6011622 100644 > --- a/src/common/consumer.c > +++ b/src/common/consumer.c > @@ -227,8 +227,8 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd) > } > > /* > - * Remove a stream from the global list protected by a mutex. 
This > - * function is also responsible for freeing its data structures. > + * Remove a stream from the global list protected by a mutex. This function is > + * also responsible for freeing its data structures. > */ > void consumer_del_stream(struct lttng_consumer_stream *stream) > { > @@ -236,10 +236,46 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) > struct lttng_ht_iter iter; > struct lttng_consumer_channel *free_chan = NULL; > struct consumer_relayd_sock_pair *relayd; > + struct lttng_ht_node_ulong *node; > > assert(stream); > > + DBG3("Consumer deleting stream %d", stream->key); > + > pthread_mutex_lock(&consumer_data.lock); > + rcu_read_lock(); > + > + /* > + * A stream with a key value of -1 means that the stream is in the hash > + * table but can not be looked up. This happens when consumer_add_stream is > + * done and we have a duplicate key before insertion. > + * consumer_steal_stream_key() is called to make sure we can insert a > + * stream even though the index is already present. Since the key is the fd > + * value on the session daemon side, duplicates are possible. > + */ > + if (stream->key != -1) { > + lttng_ht_lookup(consumer_data.stream_ht, > + (void *)((unsigned long) stream->key), &iter); > + node = lttng_ht_iter_get_node_ulong(&iter); > + if (node == NULL) { > + rcu_read_unlock(); > + > + /* > + * Stream doest not exist in hash table. This can happen if we hit > + * an error after allocation but before adding it to the table. We > + * consider that if the node is not in the hash table and has a > + * valid key, no ustctl/ioctl nor mmap action was done hence > + * jumping to the RCU free. > + */ > + DBG2("Consumer stream key %d not found during deletion", stream->key); > + goto free_stream; > + } else { > + /* Remove stream from hash table and continue */ > + ret = lttng_ht_del(consumer_data.stream_ht, &iter); > + assert(!ret); > + } > + } > + rcu_read_unlock(); Why are you changing this code ? 
You add a lookup to get the node you already receive as parameter. It looks pretty much useless to me. What you probably want there is to pass a flag to consumer_del_stream() telling it whether or not it needs to remove the stream from the hash table, so it can skip the ht_del step accordingly. Let's discuss this one and, once we understand the intent, we'll continue on the rest of the patch. Thanks, Mathieu > > switch (consumer_data.type) { > case LTTNG_CONSUMER_KERNEL: > @@ -260,20 +296,10 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) > goto end; > } > > - rcu_read_lock(); > - iter.iter.node = &stream->node.node; > - ret = lttng_ht_del(consumer_data.stream_ht, &iter); > - assert(!ret); > - > - rcu_read_unlock(); > - > - if (consumer_data.stream_count <= 0) { > - goto end; > - } > + /* This should NEVER reach a negative value. */ > + assert(consumer_data.stream_count >= 0); > consumer_data.stream_count--; > - if (!stream) { > - goto end; > - } > + > if (stream->out_fd >= 0) { > ret = close(stream->out_fd); > if (ret) { > @@ -321,7 +347,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) > destroy_relayd(relayd); > } > } > - rcu_read_unlock(); > > uatomic_dec(&stream->chan->refcount); > if (!uatomic_read(&stream->chan->refcount) > @@ -329,7 +354,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) > free_chan = stream->chan; > } > > - call_rcu(&stream->node.head, consumer_free_stream); > end: > consumer_data.need_update = 1; > pthread_mutex_unlock(&consumer_data.lock); > @@ -337,6 +361,10 @@ end: > if (free_chan) { > consumer_del_channel(free_chan); > } > + > +free_stream: > + call_rcu(&stream->node.head, consumer_free_stream); > + rcu_read_unlock(); > } > > struct lttng_consumer_stream *consumer_allocate_stream( > @@ -353,7 +381,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( > int *alloc_ret) > { > struct lttng_consumer_stream *stream; > - int ret; > > stream = zmalloc(sizeof(*stream)); > if (stream 
== NULL) { > @@ -372,7 +399,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( > ERR("Unable to find channel for stream %d", stream_key); > goto error; > } > - stream->chan->refcount++; > + > stream->key = stream_key; > stream->shm_fd = shm_fd; > stream->wait_fd = wait_fd; > @@ -391,35 +418,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( > lttng_ht_node_init_ulong(&stream->node, stream->key); > lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); > > - switch (consumer_data.type) { > - case LTTNG_CONSUMER_KERNEL: > - break; > - case LTTNG_CONSUMER32_UST: > - case LTTNG_CONSUMER64_UST: > - stream->cpu = stream->chan->cpucount++; > - ret = lttng_ustconsumer_allocate_stream(stream); > - if (ret) { > - *alloc_ret = -EINVAL; > - goto error; > - } > - break; > - default: > - ERR("Unknown consumer_data type"); > - *alloc_ret = -EINVAL; > - goto error; > - } > - > - /* > - * When nb_init_streams reaches 0, we don't need to trigger any action in > - * terms of destroying the associated channel, because the action that > - * causes the count to become 0 also causes a stream to be added. The > - * channel deletion will thus be triggered by the following removal of this > - * stream. 
> - */ > - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { > - uatomic_dec(&stream->chan->nb_init_streams); > - } > - > DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," > " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, > stream->shm_fd, stream->wait_fd, > @@ -439,38 +437,66 @@ end: > int consumer_add_stream(struct lttng_consumer_stream *stream) > { > int ret = 0; > - struct lttng_ht_node_ulong *node; > - struct lttng_ht_iter iter; > struct consumer_relayd_sock_pair *relayd; > > - pthread_mutex_lock(&consumer_data.lock); > - /* Steal stream identifier, for UST */ > - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > + assert(stream); > > + DBG3("Adding consumer stream %d", stream->key); > + > + pthread_mutex_lock(&consumer_data.lock); > rcu_read_lock(); > - lttng_ht_lookup(consumer_data.stream_ht, > - (void *)((unsigned long) stream->key), &iter); > - node = lttng_ht_iter_get_node_ulong(&iter); > - if (node != NULL) { > - rcu_read_unlock(); > - /* Stream already exist. Ignore the insertion */ > - goto end; > - } > > - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > + switch (consumer_data.type) { > + case LTTNG_CONSUMER_KERNEL: > + break; > + case LTTNG_CONSUMER32_UST: > + case LTTNG_CONSUMER64_UST: > + stream->cpu = stream->chan->cpucount++; > + ret = lttng_ustconsumer_add_stream(stream); > + if (ret) { > + ret = -EINVAL; > + goto error; > + } > + > + /* Steal stream identifier only for UST */ > + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > + break; > + default: > + ERR("Unknown consumer_data type"); > + assert(0); > + ret = -ENOSYS; > + goto error; > + } > > /* Check and cleanup relayd */ > relayd = consumer_find_relayd(stream->net_seq_idx); > if (relayd != NULL) { > uatomic_inc(&relayd->refcount); > } > - rcu_read_unlock(); > > - /* Update consumer data */ > + /* Final operation is to add the stream to the global hash table. 
*/ > + lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > + > + /* Update channel refcount once added without error(s). */ > + uatomic_inc(&stream->chan->refcount); > + > + /* > + * When nb_init_streams reaches 0, we don't need to trigger any action in > + * terms of destroying the associated channel, because the action that > + * causes the count to become 0 also causes a stream to be added. The > + * channel deletion will thus be triggered by the following removal of this > + * stream. > + */ > + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { > + uatomic_dec(&stream->chan->nb_init_streams); > + } > + > + /* Update consumer data once the node is inserted. */ > consumer_data.stream_count++; > consumer_data.need_update = 1; > > -end: > +error: > + rcu_read_unlock(); > pthread_mutex_unlock(&consumer_data.lock); > > return ret; > @@ -1648,10 +1674,37 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) > * Action done with the metadata stream when adding it to the consumer internal > * data structures to handle it. > */ > -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) > +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > + struct lttng_ht *ht) > { > + int ret = 0; > struct consumer_relayd_sock_pair *relayd; > > + switch (consumer_data.type) { > + case LTTNG_CONSUMER_KERNEL: > + break; > + case LTTNG_CONSUMER32_UST: > + case LTTNG_CONSUMER64_UST: > + ret = lttng_ustconsumer_add_stream(stream); > + if (ret) { > + ret = -EINVAL; > + goto error; > + } > + > + /* Steal stream identifier only for UST */ > + consumer_steal_stream_key(stream->key, ht); > + break; > + default: > + ERR("Unknown consumer_data type"); > + assert(0); > + return -ENOSYS; > + } > + > + /* > + * From here, refcounts are updated so be _careful_ when returning an error > + * after this point. > + */ > + > /* Find relayd and, if one is found, increment refcount. 
*/ > rcu_read_lock(); > relayd = consumer_find_relayd(stream->net_seq_idx); > @@ -1659,6 +1712,27 @@ static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) > uatomic_inc(&relayd->refcount); > } > rcu_read_unlock(); > + > + /* Update channel refcount once added without error(s). */ > + uatomic_inc(&stream->chan->refcount); > + > + /* > + * When nb_init_streams reaches 0, we don't need to trigger any action in > + * terms of destroying the associated channel, because the action that > + * causes the count to become 0 also causes a stream to be added. The > + * channel deletion will thus be triggered by the following removal of this > + * stream. > + */ > + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { > + uatomic_dec(&stream->chan->nb_init_streams); > + } > + > + rcu_read_lock(); > + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > + rcu_read_unlock(); > + > +error: > + return ret; > } > > /* > @@ -1755,17 +1829,16 @@ restart: > DBG("Adding metadata stream %d to poll set", > stream->wait_fd); > > - rcu_read_lock(); > - /* The node should be init at this point */ > - lttng_ht_add_unique_ulong(metadata_ht, > - &stream->waitfd_node); > - rcu_read_unlock(); > + ret = consumer_add_metadata_stream(stream, metadata_ht); > + if (ret) { > + /* Stream was not setup properly. Continuing. */ > + free(stream); > + continue; > + } > > /* Add metadata stream to the global poll events list */ > lttng_poll_add(&events, stream->wait_fd, > LPOLLIN | LPOLLPRI); > - > - consumer_add_metadata_stream(stream); > } > > /* Metadata pipe handled. 
Continue handling the others */ > diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c > index 4d61cc5..878c4ab 100644 > --- a/src/common/kernel-consumer/kernel-consumer.c > +++ b/src/common/kernel-consumer/kernel-consumer.c > @@ -206,18 +206,20 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > &new_stream->relayd_stream_id); > pthread_mutex_unlock(&relayd->ctrl_sock_mutex); > if (ret < 0) { > + consumer_del_stream(new_stream); > goto end_nosignal; > } > } else if (msg.u.stream.net_index != -1) { > ERR("Network sequence index %d unknown. Not adding stream.", > msg.u.stream.net_index); > - free(new_stream); > + consumer_del_stream(new_stream); > goto end_nosignal; > } > > if (ctx->on_recv_stream) { > ret = ctx->on_recv_stream(new_stream); > if (ret < 0) { > + consumer_del_stream(new_stream); > goto end_nosignal; > } > } > @@ -230,9 +232,16 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > } while (ret < 0 && errno == EINTR); > if (ret < 0) { > PERROR("write metadata pipe"); > + consumer_del_stream(new_stream); > } > } else { > - consumer_add_stream(new_stream); > + ret = consumer_add_stream(new_stream); > + if (ret) { > + ERR("Consumer add stream %d failed. Continuing", > + new_stream->key); > + consumer_del_stream(new_stream); > + goto end_nosignal; > + } > } > > DBG("Kernel consumer_add_stream (%d)", fd); > diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > index 76238a0..e10d540 100644 > --- a/src/common/ust-consumer/ust-consumer.c > +++ b/src/common/ust-consumer/ust-consumer.c > @@ -234,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > &new_stream->relayd_stream_id); > pthread_mutex_unlock(&relayd->ctrl_sock_mutex); > if (ret < 0) { > + consumer_del_stream(new_stream); > goto end_nosignal; > } > } else if (msg.u.stream.net_index != -1) { > ERR("Network sequence index %d unknown. 
Not adding stream.", > msg.u.stream.net_index); > - free(new_stream); > + consumer_del_stream(new_stream); > goto end_nosignal; > } > > @@ -247,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > if (ctx->on_recv_stream) { > ret = ctx->on_recv_stream(new_stream); > if (ret < 0) { > + consumer_del_stream(new_stream); > goto end_nosignal; > } > } > @@ -259,9 +261,17 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > } while (ret < 0 && errno == EINTR); > if (ret < 0) { > PERROR("write metadata pipe"); > + consumer_del_stream(new_stream); > + goto end_nosignal; > } > } else { > - consumer_add_stream(new_stream); > + ret = consumer_add_stream(new_stream); > + if (ret) { > + ERR("Consumer add stream %d failed. Continuing", > + new_stream->key); > + consumer_del_stream(new_stream); > + goto end_nosignal; > + } > } > > DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, > @@ -373,7 +383,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) > ustctl_unmap_channel(chan->handle); > } > > -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) > +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) > { > struct lttng_ust_object_data obj; > int ret; > diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h > index 3f76f23..6b507ed 100644 > --- a/src/common/ust-consumer/ust-consumer.h > +++ b/src/common/ust-consumer/ust-consumer.h > @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > > extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); > extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); > -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); > +extern int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); > extern void lttng_ustconsumer_del_stream(struct 
lttng_consumer_stream *stream); > > int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, > @@ -117,7 +117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) > } > > static inline > -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) > +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) > { > return -ENOSYS; > } > -- > 1.7.10.4 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Wed Oct 3 12:42:10 2012 From: dgoulet at efficios.com (David Goulet) Date: Wed, 03 Oct 2012 12:42:10 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/3] Fix: Stream allocation and insertion consistency In-Reply-To: <20121003163829.GC21776@Krystal> References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> <1349279317-28056-3-git-send-email-dgoulet@efficios.com> <20121003163829.GC21776@Krystal> Message-ID: <506C6AE2.5060304@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> The stream allocation in the consumer was doing ustctl actions on the >> stream and updating refounts. However, before inserting the stream into > > refounts -> refcounts. > >> the hash table and polling on the fd for data, an error could occur >> which could stop the stream insertion hence creating multiple fd leaks, >> mem leaks and bad refount state. > > refount -> refcount > >> >> Furthermore, the consumer_del_stream now can destroy a stream even if >> that stream is not added to the global hash table. The kernel and UST >> consumer uses it on error between allocation and hash table insertion. 
> > consumer -> consumers > uses -> use > >> >> Signed-off-by: David Goulet >> --- >> src/common/consumer.c | 219 +++++++++++++++++--------- >> src/common/kernel-consumer/kernel-consumer.c | 13 +- >> src/common/ust-consumer/ust-consumer.c | 16 +- >> src/common/ust-consumer/ust-consumer.h | 4 +- >> 4 files changed, 172 insertions(+), 80 deletions(-) >> >> diff --git a/src/common/consumer.c b/src/common/consumer.c >> index 6ee366f..6011622 100644 >> --- a/src/common/consumer.c >> +++ b/src/common/consumer.c >> @@ -227,8 +227,8 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd) >> } >> >> /* >> - * Remove a stream from the global list protected by a mutex. This >> - * function is also responsible for freeing its data structures. >> + * Remove a stream from the global list protected by a mutex. This function is >> + * also responsible for freeing its data structures. >> */ >> void consumer_del_stream(struct lttng_consumer_stream *stream) >> { >> @@ -236,10 +236,46 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >> struct lttng_ht_iter iter; >> struct lttng_consumer_channel *free_chan = NULL; >> struct consumer_relayd_sock_pair *relayd; >> + struct lttng_ht_node_ulong *node; >> >> assert(stream); >> >> + DBG3("Consumer deleting stream %d", stream->key); >> + >> pthread_mutex_lock(&consumer_data.lock); >> + rcu_read_lock(); >> + >> + /* >> + * A stream with a key value of -1 means that the stream is in the hash >> + * table but can not be looked up. This happens when consumer_add_stream is >> + * done and we have a duplicate key before insertion. >> + * consumer_steal_stream_key() is called to make sure we can insert a >> + * stream even though the index is already present. Since the key is the fd >> + * value on the session daemon side, duplicates are possible. 
>> + */ >> + if (stream->key != -1) { >> + lttng_ht_lookup(consumer_data.stream_ht, >> + (void *)((unsigned long) stream->key), &iter); >> + node = lttng_ht_iter_get_node_ulong(&iter); >> + if (node == NULL) { >> + rcu_read_unlock(); >> + >> + /* >> + * Stream doest not exist in hash table. This can happen if we hit >> + * an error after allocation but before adding it to the table. We >> + * consider that if the node is not in the hash table and has a >> + * valid key, no ustctl/ioctl nor mmap action was done hence >> + * jumping to the RCU free. >> + */ >> + DBG2("Consumer stream key %d not found during deletion", stream->key); >> + goto free_stream; >> + } else { >> + /* Remove stream from hash table and continue */ >> + ret = lttng_ht_del(consumer_data.stream_ht, &iter); >> + assert(!ret); >> + } >> + } >> + rcu_read_unlock(); > > Why are you changing this code ? You add a lookup to get the node you > already receive as parameter. It looks pretty much useless to me. > > What you probably want there is to pass a flag to consumer_del_stream() > telling it whether or not it needs to remove the stream from the hash > table, so it can skip the ht_del step accordingly. Yes we can do that instead. > > Let's discuss this one and, once we understand the intent, we'll > continue on the rest of the patch. > > Thanks, > > Mathieu > >> >> switch (consumer_data.type) { >> case LTTNG_CONSUMER_KERNEL: >> @@ -260,20 +296,10 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >> goto end; >> } >> >> - rcu_read_lock(); >> - iter.iter.node = &stream->node.node; >> - ret = lttng_ht_del(consumer_data.stream_ht, &iter); >> - assert(!ret); >> - >> - rcu_read_unlock(); >> - >> - if (consumer_data.stream_count <= 0) { >> - goto end; >> - } >> + /* This should NEVER reach a negative value. 
*/ >> + assert(consumer_data.stream_count >= 0); >> consumer_data.stream_count--; >> - if (!stream) { >> - goto end; >> - } >> + >> if (stream->out_fd >= 0) { >> ret = close(stream->out_fd); >> if (ret) { >> @@ -321,7 +347,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >> destroy_relayd(relayd); >> } >> } >> - rcu_read_unlock(); >> >> uatomic_dec(&stream->chan->refcount); >> if (!uatomic_read(&stream->chan->refcount) >> @@ -329,7 +354,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >> free_chan = stream->chan; >> } >> >> - call_rcu(&stream->node.head, consumer_free_stream); >> end: >> consumer_data.need_update = 1; >> pthread_mutex_unlock(&consumer_data.lock); >> @@ -337,6 +361,10 @@ end: >> if (free_chan) { >> consumer_del_channel(free_chan); >> } >> + >> +free_stream: >> + call_rcu(&stream->node.head, consumer_free_stream); >> + rcu_read_unlock(); >> } >> >> struct lttng_consumer_stream *consumer_allocate_stream( >> @@ -353,7 +381,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( >> int *alloc_ret) >> { >> struct lttng_consumer_stream *stream; >> - int ret; >> >> stream = zmalloc(sizeof(*stream)); >> if (stream == NULL) { >> @@ -372,7 +399,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( >> ERR("Unable to find channel for stream %d", stream_key); >> goto error; >> } >> - stream->chan->refcount++; >> + >> stream->key = stream_key; >> stream->shm_fd = shm_fd; >> stream->wait_fd = wait_fd; >> @@ -391,35 +418,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( >> lttng_ht_node_init_ulong(&stream->node, stream->key); >> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); >> >> - switch (consumer_data.type) { >> - case LTTNG_CONSUMER_KERNEL: >> - break; >> - case LTTNG_CONSUMER32_UST: >> - case LTTNG_CONSUMER64_UST: >> - stream->cpu = stream->chan->cpucount++; >> - ret = lttng_ustconsumer_allocate_stream(stream); >> - if (ret) { >> - *alloc_ret = -EINVAL; >> - goto error; >> - } >> - 
break; >> - default: >> - ERR("Unknown consumer_data type"); >> - *alloc_ret = -EINVAL; >> - goto error; >> - } >> - >> - /* >> - * When nb_init_streams reaches 0, we don't need to trigger any action in >> - * terms of destroying the associated channel, because the action that >> - * causes the count to become 0 also causes a stream to be added. The >> - * channel deletion will thus be triggered by the following removal of this >> - * stream. >> - */ >> - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { >> - uatomic_dec(&stream->chan->nb_init_streams); >> - } >> - >> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," >> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, >> stream->shm_fd, stream->wait_fd, >> @@ -439,38 +437,66 @@ end: >> int consumer_add_stream(struct lttng_consumer_stream *stream) >> { >> int ret = 0; >> - struct lttng_ht_node_ulong *node; >> - struct lttng_ht_iter iter; >> struct consumer_relayd_sock_pair *relayd; >> >> - pthread_mutex_lock(&consumer_data.lock); >> - /* Steal stream identifier, for UST */ >> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); >> + assert(stream); >> >> + DBG3("Adding consumer stream %d", stream->key); >> + >> + pthread_mutex_lock(&consumer_data.lock); >> rcu_read_lock(); >> - lttng_ht_lookup(consumer_data.stream_ht, >> - (void *)((unsigned long) stream->key), &iter); >> - node = lttng_ht_iter_get_node_ulong(&iter); >> - if (node != NULL) { >> - rcu_read_unlock(); >> - /* Stream already exist. 
Ignore the insertion */ >> - goto end; >> - } >> >> - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >> + switch (consumer_data.type) { >> + case LTTNG_CONSUMER_KERNEL: >> + break; >> + case LTTNG_CONSUMER32_UST: >> + case LTTNG_CONSUMER64_UST: >> + stream->cpu = stream->chan->cpucount++; >> + ret = lttng_ustconsumer_add_stream(stream); >> + if (ret) { >> + ret = -EINVAL; >> + goto error; >> + } >> + >> + /* Steal stream identifier only for UST */ >> + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); >> + break; >> + default: >> + ERR("Unknown consumer_data type"); >> + assert(0); >> + ret = -ENOSYS; >> + goto error; >> + } >> >> /* Check and cleanup relayd */ >> relayd = consumer_find_relayd(stream->net_seq_idx); >> if (relayd != NULL) { >> uatomic_inc(&relayd->refcount); >> } >> - rcu_read_unlock(); >> >> - /* Update consumer data */ >> + /* Final operation is to add the stream to the global hash table. */ >> + lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >> + >> + /* Update channel refcount once added without error(s). */ >> + uatomic_inc(&stream->chan->refcount); >> + >> + /* >> + * When nb_init_streams reaches 0, we don't need to trigger any action in >> + * terms of destroying the associated channel, because the action that >> + * causes the count to become 0 also causes a stream to be added. The >> + * channel deletion will thus be triggered by the following removal of this >> + * stream. >> + */ >> + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { >> + uatomic_dec(&stream->chan->nb_init_streams); >> + } >> + >> + /* Update consumer data once the node is inserted. 
*/ >> consumer_data.stream_count++; >> consumer_data.need_update = 1; >> >> -end: >> +error: >> + rcu_read_unlock(); >> pthread_mutex_unlock(&consumer_data.lock); >> >> return ret; >> @@ -1648,10 +1674,37 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) >> * Action done with the metadata stream when adding it to the consumer internal >> * data structures to handle it. >> */ >> -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) >> +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >> + struct lttng_ht *ht) >> { >> + int ret = 0; >> struct consumer_relayd_sock_pair *relayd; >> >> + switch (consumer_data.type) { >> + case LTTNG_CONSUMER_KERNEL: >> + break; >> + case LTTNG_CONSUMER32_UST: >> + case LTTNG_CONSUMER64_UST: >> + ret = lttng_ustconsumer_add_stream(stream); >> + if (ret) { >> + ret = -EINVAL; >> + goto error; >> + } >> + >> + /* Steal stream identifier only for UST */ >> + consumer_steal_stream_key(stream->key, ht); >> + break; >> + default: >> + ERR("Unknown consumer_data type"); >> + assert(0); >> + return -ENOSYS; >> + } >> + >> + /* >> + * From here, refcounts are updated so be _careful_ when returning an error >> + * after this point. >> + */ >> + >> /* Find relayd and, if one is found, increment refcount. */ >> rcu_read_lock(); >> relayd = consumer_find_relayd(stream->net_seq_idx); >> @@ -1659,6 +1712,27 @@ static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) >> uatomic_inc(&relayd->refcount); >> } >> rcu_read_unlock(); >> + >> + /* Update channel refcount once added without error(s). */ >> + uatomic_inc(&stream->chan->refcount); >> + >> + /* >> + * When nb_init_streams reaches 0, we don't need to trigger any action in >> + * terms of destroying the associated channel, because the action that >> + * causes the count to become 0 also causes a stream to be added. 
The >> + * channel deletion will thus be triggered by the following removal of this >> + * stream. >> + */ >> + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { >> + uatomic_dec(&stream->chan->nb_init_streams); >> + } >> + >> + rcu_read_lock(); >> + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); >> + rcu_read_unlock(); >> + >> +error: >> + return ret; >> } >> >> /* >> @@ -1755,17 +1829,16 @@ restart: >> DBG("Adding metadata stream %d to poll set", >> stream->wait_fd); >> >> - rcu_read_lock(); >> - /* The node should be init at this point */ >> - lttng_ht_add_unique_ulong(metadata_ht, >> - &stream->waitfd_node); >> - rcu_read_unlock(); >> + ret = consumer_add_metadata_stream(stream, metadata_ht); >> + if (ret) { >> + /* Stream was not setup properly. Continuing. */ >> + free(stream); >> + continue; >> + } >> >> /* Add metadata stream to the global poll events list */ >> lttng_poll_add(&events, stream->wait_fd, >> LPOLLIN | LPOLLPRI); >> - >> - consumer_add_metadata_stream(stream); >> } >> >> /* Metadata pipe handled. Continue handling the others */ >> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c >> index 4d61cc5..878c4ab 100644 >> --- a/src/common/kernel-consumer/kernel-consumer.c >> +++ b/src/common/kernel-consumer/kernel-consumer.c >> @@ -206,18 +206,20 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> &new_stream->relayd_stream_id); >> pthread_mutex_unlock(&relayd->ctrl_sock_mutex); >> if (ret < 0) { >> + consumer_del_stream(new_stream); >> goto end_nosignal; >> } >> } else if (msg.u.stream.net_index != -1) { >> ERR("Network sequence index %d unknown. 
Not adding stream.", >> msg.u.stream.net_index); >> - free(new_stream); >> + consumer_del_stream(new_stream); >> goto end_nosignal; >> } >> >> if (ctx->on_recv_stream) { >> ret = ctx->on_recv_stream(new_stream); >> if (ret < 0) { >> + consumer_del_stream(new_stream); >> goto end_nosignal; >> } >> } >> @@ -230,9 +232,16 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> } while (ret < 0 && errno == EINTR); >> if (ret < 0) { >> PERROR("write metadata pipe"); >> + consumer_del_stream(new_stream); >> } >> } else { >> - consumer_add_stream(new_stream); >> + ret = consumer_add_stream(new_stream); >> + if (ret) { >> + ERR("Consumer add stream %d failed. Continuing", >> + new_stream->key); >> + consumer_del_stream(new_stream); >> + goto end_nosignal; >> + } >> } >> >> DBG("Kernel consumer_add_stream (%d)", fd); >> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c >> index 76238a0..e10d540 100644 >> --- a/src/common/ust-consumer/ust-consumer.c >> +++ b/src/common/ust-consumer/ust-consumer.c >> @@ -234,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> &new_stream->relayd_stream_id); >> pthread_mutex_unlock(&relayd->ctrl_sock_mutex); >> if (ret < 0) { >> + consumer_del_stream(new_stream); >> goto end_nosignal; >> } >> } else if (msg.u.stream.net_index != -1) { >> ERR("Network sequence index %d unknown. 
Not adding stream.", >> msg.u.stream.net_index); >> - free(new_stream); >> + consumer_del_stream(new_stream); >> goto end_nosignal; >> } >> >> @@ -247,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> if (ctx->on_recv_stream) { >> ret = ctx->on_recv_stream(new_stream); >> if (ret < 0) { >> + consumer_del_stream(new_stream); >> goto end_nosignal; >> } >> } >> @@ -259,9 +261,17 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> } while (ret < 0 && errno == EINTR); >> if (ret < 0) { >> PERROR("write metadata pipe"); >> + consumer_del_stream(new_stream); >> + goto end_nosignal; >> } >> } else { >> - consumer_add_stream(new_stream); >> + ret = consumer_add_stream(new_stream); >> + if (ret) { >> + ERR("Consumer add stream %d failed. Continuing", >> + new_stream->key); >> + consumer_del_stream(new_stream); >> + goto end_nosignal; >> + } >> } >> >> DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, >> @@ -373,7 +383,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) >> ustctl_unmap_channel(chan->handle); >> } >> >> -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) >> +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) >> { >> struct lttng_ust_object_data obj; >> int ret; >> diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h >> index 3f76f23..6b507ed 100644 >> --- a/src/common/ust-consumer/ust-consumer.h >> +++ b/src/common/ust-consumer/ust-consumer.h >> @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> >> extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); >> extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); >> -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); >> +extern int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); >> extern 
void lttng_ustconsumer_del_stream(struct lttng_consumer_stream *stream); >> >> int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, >> @@ -117,7 +117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) >> } >> >> static inline >> -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) >> +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) >> { >> return -ENOSYS; >> } >> -- >> 1.7.10.4 >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From dgoulet at efficios.com Wed Oct 3 12:48:45 2012 From: dgoulet at efficios.com (David Goulet) Date: Wed, 03 Oct 2012 12:48:45 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/3] Fix: Stream allocation and insertion consistency In-Reply-To: <506C6AE2.5060304@efficios.com> References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> <1349279317-28056-3-git-send-email-dgoulet@efficios.com> <20121003163829.GC21776@Krystal> <506C6AE2.5060304@efficios.com> Message-ID: <506C6C6D.8030301@efficios.com> David Goulet: > Mathieu Desnoyers: >> * David Goulet (dgoulet at efficios.com) wrote: >>> The stream allocation in the consumer was doing ustctl actions on the >>> stream and updating refounts. However, before inserting the stream into >> >> refounts -> refcounts. >> >>> the hash table and polling on the fd for data, an error could occur >>> which could stop the stream insertion hence creating multiple fd leaks, >>> mem leaks and bad refount state. >> >> refount -> refcount >> >>> >>> Furthermore, the consumer_del_stream now can destroy a stream even if >>> that stream is not added to the global hash table. The kernel and UST >>> consumer uses it on error between allocation and hash table insertion. 
>> >> consumer -> consumers >> uses -> use >> >>> >>> Signed-off-by: David Goulet >>> --- >>> src/common/consumer.c | 219 +++++++++++++++++--------- >>> src/common/kernel-consumer/kernel-consumer.c | 13 +- >>> src/common/ust-consumer/ust-consumer.c | 16 +- >>> src/common/ust-consumer/ust-consumer.h | 4 +- >>> 4 files changed, 172 insertions(+), 80 deletions(-) >>> >>> diff --git a/src/common/consumer.c b/src/common/consumer.c >>> index 6ee366f..6011622 100644 >>> --- a/src/common/consumer.c >>> +++ b/src/common/consumer.c >>> @@ -227,8 +227,8 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd) >>> } >>> >>> /* >>> - * Remove a stream from the global list protected by a mutex. This >>> - * function is also responsible for freeing its data structures. >>> + * Remove a stream from the global list protected by a mutex. This function is >>> + * also responsible for freeing its data structures. >>> */ >>> void consumer_del_stream(struct lttng_consumer_stream *stream) >>> { >>> @@ -236,10 +236,46 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >>> struct lttng_ht_iter iter; >>> struct lttng_consumer_channel *free_chan = NULL; >>> struct consumer_relayd_sock_pair *relayd; >>> + struct lttng_ht_node_ulong *node; >>> >>> assert(stream); >>> >>> + DBG3("Consumer deleting stream %d", stream->key); >>> + >>> pthread_mutex_lock(&consumer_data.lock); >>> + rcu_read_lock(); >>> + >>> + /* >>> + * A stream with a key value of -1 means that the stream is in the hash >>> + * table but can not be looked up. This happens when consumer_add_stream is >>> + * done and we have a duplicate key before insertion. >>> + * consumer_steal_stream_key() is called to make sure we can insert a >>> + * stream even though the index is already present. Since the key is the fd >>> + * value on the session daemon side, duplicates are possible. 
>>> + */ >>> + if (stream->key != -1) { >>> + lttng_ht_lookup(consumer_data.stream_ht, >>> + (void *)((unsigned long) stream->key), &iter); >>> + node = lttng_ht_iter_get_node_ulong(&iter); >>> + if (node == NULL) { >>> + rcu_read_unlock(); >>> + >>> + /* >>> + * Stream doest not exist in hash table. This can happen if we hit >>> + * an error after allocation but before adding it to the table. We >>> + * consider that if the node is not in the hash table and has a >>> + * valid key, no ustctl/ioctl nor mmap action was done hence >>> + * jumping to the RCU free. >>> + */ >>> + DBG2("Consumer stream key %d not found during deletion", stream->key); >>> + goto free_stream; >>> + } else { >>> + /* Remove stream from hash table and continue */ >>> + ret = lttng_ht_del(consumer_data.stream_ht, &iter); >>> + assert(!ret); >>> + } >>> + } >>> + rcu_read_unlock(); >> >> Why are you changing this code ? You add a lookup to get the node you >> already receive as parameter. It looks pretty much useless to me. >> >> What you probably want there is to pass a flag to consumer_del_stream() >> telling it whether or not it needs to remove the stream from the hash >> table, so it can skip the ht_del step accordingly. > > Yes we can do that instead. Actually, we could pass the hash table to the call, so that it becomes consumer_del_stream(key, ht), where a NULL ht means that the key cannot be removed from the hash table. Otherwise, with a flag, I would go with a wrapper macro to make the code clearer than a bare "consumer_del_stream(key, 1)". Thoughts? David > >> >> Let's discuss this one and, once we understand the intent, we'll >> continue on the rest of the patch. 
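For readers following the thread, the two options discussed above (an explicit flag versus passing the hash table, with NULL meaning "skip the removal step") can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy_* types, functions and macro are hypothetical stand-ins, not the lttng-tools API, and the sketch leaves out the mutex, RCU and refcount handling that the real consumer_del_stream() performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_HT_SIZE 8

/* Hypothetical stand-in for struct lttng_consumer_stream. */
struct toy_stream {
	int key;
};

/* Hypothetical stand-in for the consumer stream hash table. */
struct toy_ht {
	struct toy_stream *slots[TOY_HT_SIZE];
	int count;
};

/* Allocate a zeroed stream with the given key; returns NULL on failure. */
static struct toy_stream *toy_stream_alloc(int key)
{
	struct toy_stream *s = calloc(1, sizeof(*s));

	if (s) {
		s->key = key;
	}
	return s;
}

/* Insert a stream; returns 0 on success, -1 if the slot is already used. */
static int toy_ht_add(struct toy_ht *ht, struct toy_stream *s)
{
	unsigned int idx = (unsigned int) s->key % TOY_HT_SIZE;

	if (ht->slots[idx]) {
		return -1;
	}
	ht->slots[idx] = s;
	ht->count++;
	return 0;
}

/*
 * Delete a stream. A NULL ht means the stream was never inserted, so the
 * removal step is skipped and the stream is only freed. No lookup is done:
 * the caller already holds the stream it wants to delete.
 * Returns 1 if the stream was unlinked from the table, 0 otherwise.
 */
static int toy_del_stream(struct toy_stream *s, struct toy_ht *ht)
{
	int removed = 0;

	assert(s);
	if (ht) {
		unsigned int idx = (unsigned int) s->key % TOY_HT_SIZE;

		/* The caller guarantees the stream is in this table. */
		assert(ht->slots[idx] == s);
		ht->slots[idx] = NULL;
		ht->count--;
		removed = 1;
	}
	free(s);
	return removed;
}

/*
 * Wrapper for the error path between allocation and insertion, so call
 * sites do not carry a bare NULL or a bare 0/1 flag.
 */
#define toy_del_stream_unlinked(s)	toy_del_stream((s), NULL)
```

An error path that fails before insertion would call toy_del_stream_unlinked(stream), while the normal teardown path would call toy_del_stream(stream, &ht). Either way the caller states explicitly whether the stream was ever added, which is the intent of both suggestions in the thread.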
>> >> Thanks, >> >> Mathieu >> >>> >>> switch (consumer_data.type) { >>> case LTTNG_CONSUMER_KERNEL: >>> @@ -260,20 +296,10 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >>> goto end; >>> } >>> >>> - rcu_read_lock(); >>> - iter.iter.node = &stream->node.node; >>> - ret = lttng_ht_del(consumer_data.stream_ht, &iter); >>> - assert(!ret); >>> - >>> - rcu_read_unlock(); >>> - >>> - if (consumer_data.stream_count <= 0) { >>> - goto end; >>> - } >>> + /* This should NEVER reach a negative value. */ >>> + assert(consumer_data.stream_count >= 0); >>> consumer_data.stream_count--; >>> - if (!stream) { >>> - goto end; >>> - } >>> + >>> if (stream->out_fd >= 0) { >>> ret = close(stream->out_fd); >>> if (ret) { >>> @@ -321,7 +347,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >>> destroy_relayd(relayd); >>> } >>> } >>> - rcu_read_unlock(); >>> >>> uatomic_dec(&stream->chan->refcount); >>> if (!uatomic_read(&stream->chan->refcount) >>> @@ -329,7 +354,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) >>> free_chan = stream->chan; >>> } >>> >>> - call_rcu(&stream->node.head, consumer_free_stream); >>> end: >>> consumer_data.need_update = 1; >>> pthread_mutex_unlock(&consumer_data.lock); >>> @@ -337,6 +361,10 @@ end: >>> if (free_chan) { >>> consumer_del_channel(free_chan); >>> } >>> + >>> +free_stream: >>> + call_rcu(&stream->node.head, consumer_free_stream); >>> + rcu_read_unlock(); >>> } >>> >>> struct lttng_consumer_stream *consumer_allocate_stream( >>> @@ -353,7 +381,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( >>> int *alloc_ret) >>> { >>> struct lttng_consumer_stream *stream; >>> - int ret; >>> >>> stream = zmalloc(sizeof(*stream)); >>> if (stream == NULL) { >>> @@ -372,7 +399,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( >>> ERR("Unable to find channel for stream %d", stream_key); >>> goto error; >>> } >>> - stream->chan->refcount++; >>> + >>> stream->key = stream_key; >>> 
stream->shm_fd = shm_fd; >>> stream->wait_fd = wait_fd; >>> @@ -391,35 +418,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( >>> lttng_ht_node_init_ulong(&stream->node, stream->key); >>> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); >>> >>> - switch (consumer_data.type) { >>> - case LTTNG_CONSUMER_KERNEL: >>> - break; >>> - case LTTNG_CONSUMER32_UST: >>> - case LTTNG_CONSUMER64_UST: >>> - stream->cpu = stream->chan->cpucount++; >>> - ret = lttng_ustconsumer_allocate_stream(stream); >>> - if (ret) { >>> - *alloc_ret = -EINVAL; >>> - goto error; >>> - } >>> - break; >>> - default: >>> - ERR("Unknown consumer_data type"); >>> - *alloc_ret = -EINVAL; >>> - goto error; >>> - } >>> - >>> - /* >>> - * When nb_init_streams reaches 0, we don't need to trigger any action in >>> - * terms of destroying the associated channel, because the action that >>> - * causes the count to become 0 also causes a stream to be added. The >>> - * channel deletion will thus be triggered by the following removal of this >>> - * stream. 
>>> - */ >>> - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { >>> - uatomic_dec(&stream->chan->nb_init_streams); >>> - } >>> - >>> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," >>> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, >>> stream->shm_fd, stream->wait_fd, >>> @@ -439,38 +437,66 @@ end: >>> int consumer_add_stream(struct lttng_consumer_stream *stream) >>> { >>> int ret = 0; >>> - struct lttng_ht_node_ulong *node; >>> - struct lttng_ht_iter iter; >>> struct consumer_relayd_sock_pair *relayd; >>> >>> - pthread_mutex_lock(&consumer_data.lock); >>> - /* Steal stream identifier, for UST */ >>> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); >>> + assert(stream); >>> >>> + DBG3("Adding consumer stream %d", stream->key); >>> + >>> + pthread_mutex_lock(&consumer_data.lock); >>> rcu_read_lock(); >>> - lttng_ht_lookup(consumer_data.stream_ht, >>> - (void *)((unsigned long) stream->key), &iter); >>> - node = lttng_ht_iter_get_node_ulong(&iter); >>> - if (node != NULL) { >>> - rcu_read_unlock(); >>> - /* Stream already exist. 
Ignore the insertion */ >>> - goto end; >>> - } >>> >>> - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >>> + switch (consumer_data.type) { >>> + case LTTNG_CONSUMER_KERNEL: >>> + break; >>> + case LTTNG_CONSUMER32_UST: >>> + case LTTNG_CONSUMER64_UST: >>> + stream->cpu = stream->chan->cpucount++; >>> + ret = lttng_ustconsumer_add_stream(stream); >>> + if (ret) { >>> + ret = -EINVAL; >>> + goto error; >>> + } >>> + >>> + /* Steal stream identifier only for UST */ >>> + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); >>> + break; >>> + default: >>> + ERR("Unknown consumer_data type"); >>> + assert(0); >>> + ret = -ENOSYS; >>> + goto error; >>> + } >>> >>> /* Check and cleanup relayd */ >>> relayd = consumer_find_relayd(stream->net_seq_idx); >>> if (relayd != NULL) { >>> uatomic_inc(&relayd->refcount); >>> } >>> - rcu_read_unlock(); >>> >>> - /* Update consumer data */ >>> + /* Final operation is to add the stream to the global hash table. */ >>> + lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >>> + >>> + /* Update channel refcount once added without error(s). */ >>> + uatomic_inc(&stream->chan->refcount); >>> + >>> + /* >>> + * When nb_init_streams reaches 0, we don't need to trigger any action in >>> + * terms of destroying the associated channel, because the action that >>> + * causes the count to become 0 also causes a stream to be added. The >>> + * channel deletion will thus be triggered by the following removal of this >>> + * stream. >>> + */ >>> + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { >>> + uatomic_dec(&stream->chan->nb_init_streams); >>> + } >>> + >>> + /* Update consumer data once the node is inserted. 
*/ >>> consumer_data.stream_count++; >>> consumer_data.need_update = 1; >>> >>> -end: >>> +error: >>> + rcu_read_unlock(); >>> pthread_mutex_unlock(&consumer_data.lock); >>> >>> return ret; >>> @@ -1648,10 +1674,37 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) >>> * Action done with the metadata stream when adding it to the consumer internal >>> * data structures to handle it. >>> */ >>> -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) >>> +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >>> + struct lttng_ht *ht) >>> { >>> + int ret = 0; >>> struct consumer_relayd_sock_pair *relayd; >>> >>> + switch (consumer_data.type) { >>> + case LTTNG_CONSUMER_KERNEL: >>> + break; >>> + case LTTNG_CONSUMER32_UST: >>> + case LTTNG_CONSUMER64_UST: >>> + ret = lttng_ustconsumer_add_stream(stream); >>> + if (ret) { >>> + ret = -EINVAL; >>> + goto error; >>> + } >>> + >>> + /* Steal stream identifier only for UST */ >>> + consumer_steal_stream_key(stream->key, ht); >>> + break; >>> + default: >>> + ERR("Unknown consumer_data type"); >>> + assert(0); >>> + return -ENOSYS; >>> + } >>> + >>> + /* >>> + * From here, refcounts are updated so be _careful_ when returning an error >>> + * after this point. >>> + */ >>> + >>> /* Find relayd and, if one is found, increment refcount. */ >>> rcu_read_lock(); >>> relayd = consumer_find_relayd(stream->net_seq_idx); >>> @@ -1659,6 +1712,27 @@ static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) >>> uatomic_inc(&relayd->refcount); >>> } >>> rcu_read_unlock(); >>> + >>> + /* Update channel refcount once added without error(s). */ >>> + uatomic_inc(&stream->chan->refcount); >>> + >>> + /* >>> + * When nb_init_streams reaches 0, we don't need to trigger any action in >>> + * terms of destroying the associated channel, because the action that >>> + * causes the count to become 0 also causes a stream to be added. 
The >>> + * channel deletion will thus be triggered by the following removal of this >>> + * stream. >>> + */ >>> + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { >>> + uatomic_dec(&stream->chan->nb_init_streams); >>> + } >>> + >>> + rcu_read_lock(); >>> + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); >>> + rcu_read_unlock(); >>> + >>> +error: >>> + return ret; >>> } >>> >>> /* >>> @@ -1755,17 +1829,16 @@ restart: >>> DBG("Adding metadata stream %d to poll set", >>> stream->wait_fd); >>> >>> - rcu_read_lock(); >>> - /* The node should be init at this point */ >>> - lttng_ht_add_unique_ulong(metadata_ht, >>> - &stream->waitfd_node); >>> - rcu_read_unlock(); >>> + ret = consumer_add_metadata_stream(stream, metadata_ht); >>> + if (ret) { >>> + /* Stream was not setup properly. Continuing. */ >>> + free(stream); >>> + continue; >>> + } >>> >>> /* Add metadata stream to the global poll events list */ >>> lttng_poll_add(&events, stream->wait_fd, >>> LPOLLIN | LPOLLPRI); >>> - >>> - consumer_add_metadata_stream(stream); >>> } >>> >>> /* Metadata pipe handled. Continue handling the others */ >>> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c >>> index 4d61cc5..878c4ab 100644 >>> --- a/src/common/kernel-consumer/kernel-consumer.c >>> +++ b/src/common/kernel-consumer/kernel-consumer.c >>> @@ -206,18 +206,20 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>> &new_stream->relayd_stream_id); >>> pthread_mutex_unlock(&relayd->ctrl_sock_mutex); >>> if (ret < 0) { >>> + consumer_del_stream(new_stream); >>> goto end_nosignal; >>> } >>> } else if (msg.u.stream.net_index != -1) { >>> ERR("Network sequence index %d unknown. 
Not adding stream.", >>> msg.u.stream.net_index); >>> - free(new_stream); >>> + consumer_del_stream(new_stream); >>> goto end_nosignal; >>> } >>> >>> if (ctx->on_recv_stream) { >>> ret = ctx->on_recv_stream(new_stream); >>> if (ret < 0) { >>> + consumer_del_stream(new_stream); >>> goto end_nosignal; >>> } >>> } >>> @@ -230,9 +232,16 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>> } while (ret < 0 && errno == EINTR); >>> if (ret < 0) { >>> PERROR("write metadata pipe"); >>> + consumer_del_stream(new_stream); >>> } >>> } else { >>> - consumer_add_stream(new_stream); >>> + ret = consumer_add_stream(new_stream); >>> + if (ret) { >>> + ERR("Consumer add stream %d failed. Continuing", >>> + new_stream->key); >>> + consumer_del_stream(new_stream); >>> + goto end_nosignal; >>> + } >>> } >>> >>> DBG("Kernel consumer_add_stream (%d)", fd); >>> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c >>> index 76238a0..e10d540 100644 >>> --- a/src/common/ust-consumer/ust-consumer.c >>> +++ b/src/common/ust-consumer/ust-consumer.c >>> @@ -234,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>> &new_stream->relayd_stream_id); >>> pthread_mutex_unlock(&relayd->ctrl_sock_mutex); >>> if (ret < 0) { >>> + consumer_del_stream(new_stream); >>> goto end_nosignal; >>> } >>> } else if (msg.u.stream.net_index != -1) { >>> ERR("Network sequence index %d unknown. 
Not adding stream.", >>> msg.u.stream.net_index); >>> - free(new_stream); >>> + consumer_del_stream(new_stream); >>> goto end_nosignal; >>> } >>> >>> @@ -247,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>> if (ctx->on_recv_stream) { >>> ret = ctx->on_recv_stream(new_stream); >>> if (ret < 0) { >>> + consumer_del_stream(new_stream); >>> goto end_nosignal; >>> } >>> } >>> @@ -259,9 +261,17 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>> } while (ret < 0 && errno == EINTR); >>> if (ret < 0) { >>> PERROR("write metadata pipe"); >>> + consumer_del_stream(new_stream); >>> + goto end_nosignal; >>> } >>> } else { >>> - consumer_add_stream(new_stream); >>> + ret = consumer_add_stream(new_stream); >>> + if (ret) { >>> + ERR("Consumer add stream %d failed. Continuing", >>> + new_stream->key); >>> + consumer_del_stream(new_stream); >>> + goto end_nosignal; >>> + } >>> } >>> >>> DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, >>> @@ -373,7 +383,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) >>> ustctl_unmap_channel(chan->handle); >>> } >>> >>> -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) >>> +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) >>> { >>> struct lttng_ust_object_data obj; >>> int ret; >>> diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h >>> index 3f76f23..6b507ed 100644 >>> --- a/src/common/ust-consumer/ust-consumer.h >>> +++ b/src/common/ust-consumer/ust-consumer.h >>> @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>> >>> extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); >>> extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); >>> -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); >>> +extern int 
lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); >>> extern void lttng_ustconsumer_del_stream(struct lttng_consumer_stream *stream); >>> >>> int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, >>> @@ -117,7 +117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) >>> } >>> >>> static inline >>> -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) >>> +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) >>> { >>> return -ENOSYS; >>> } >>> -- >>> 1.7.10.4 >>> >>> >>> _______________________________________________ >>> lttng-dev mailing list >>> lttng-dev at lists.lttng.org >>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >> > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From mathieu.desnoyers at efficios.com Wed Oct 3 13:06:41 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 3 Oct 2012 13:06:41 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/3] Fix: Stream allocation and insertion consistency In-Reply-To: <506C6C6D.8030301@efficios.com> References: <1349279317-28056-1-git-send-email-dgoulet@efficios.com> <1349279317-28056-3-git-send-email-dgoulet@efficios.com> <20121003163829.GC21776@Krystal> <506C6AE2.5060304@efficios.com> <506C6C6D.8030301@efficios.com> Message-ID: <20121003170641.GA22280@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > > > David Goulet: > > Mathieu Desnoyers: > >> * David Goulet (dgoulet at efficios.com) wrote: > >>> The stream allocation in the consumer was doing ustctl actions on the > >>> stream and updating refounts. However, before inserting the stream into > >> > >> refounts -> refcounts. 
> >> > >>> the hash table and polling on the fd for data, an error could occur > >>> which could stop the stream insertion hence creating multiple fd leaks, > >>> mem leaks and bad refount state. > >> > >> refount -> refcount > >> > >>> > >>> Furthermore, the consumer_del_stream now can destroy a stream even if > >>> that stream is not added to the global hash table. The kernel and UST > >>> consumer uses it on error between allocation and hash table insertion. > >> > >> consumer -> consumers > >> uses -> use > >> > >>> > >>> Signed-off-by: David Goulet > >>> --- > >>> src/common/consumer.c | 219 +++++++++++++++++--------- > >>> src/common/kernel-consumer/kernel-consumer.c | 13 +- > >>> src/common/ust-consumer/ust-consumer.c | 16 +- > >>> src/common/ust-consumer/ust-consumer.h | 4 +- > >>> 4 files changed, 172 insertions(+), 80 deletions(-) > >>> > >>> diff --git a/src/common/consumer.c b/src/common/consumer.c > >>> index 6ee366f..6011622 100644 > >>> --- a/src/common/consumer.c > >>> +++ b/src/common/consumer.c > >>> @@ -227,8 +227,8 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd) > >>> } > >>> > >>> /* > >>> - * Remove a stream from the global list protected by a mutex. This > >>> - * function is also responsible for freeing its data structures. > >>> + * Remove a stream from the global list protected by a mutex. This function is > >>> + * also responsible for freeing its data structures. 
> >>> */ > >>> void consumer_del_stream(struct lttng_consumer_stream *stream) > >>> { > >>> @@ -236,10 +236,46 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) > >>> struct lttng_ht_iter iter; > >>> struct lttng_consumer_channel *free_chan = NULL; > >>> struct consumer_relayd_sock_pair *relayd; > >>> + struct lttng_ht_node_ulong *node; > >>> > >>> assert(stream); > >>> > >>> + DBG3("Consumer deleting stream %d", stream->key); > >>> + > >>> pthread_mutex_lock(&consumer_data.lock); > >>> + rcu_read_lock(); > >>> + > >>> + /* > >>> + * A stream with a key value of -1 means that the stream is in the hash > >>> + * table but can not be looked up. This happens when consumer_add_stream is > >>> + * done and we have a duplicate key before insertion. > >>> + * consumer_steal_stream_key() is called to make sure we can insert a > >>> + * stream even though the index is already present. Since the key is the fd > >>> + * value on the session daemon side, duplicates are possible. > >>> + */ > >>> + if (stream->key != -1) { > >>> + lttng_ht_lookup(consumer_data.stream_ht, > >>> + (void *)((unsigned long) stream->key), &iter); > >>> + node = lttng_ht_iter_get_node_ulong(&iter); > >>> + if (node == NULL) { > >>> + rcu_read_unlock(); > >>> + > >>> + /* > >>> + * Stream doest not exist in hash table. This can happen if we hit > >>> + * an error after allocation but before adding it to the table. We > >>> + * consider that if the node is not in the hash table and has a > >>> + * valid key, no ustctl/ioctl nor mmap action was done hence > >>> + * jumping to the RCU free. > >>> + */ > >>> + DBG2("Consumer stream key %d not found during deletion", stream->key); > >>> + goto free_stream; > >>> + } else { > >>> + /* Remove stream from hash table and continue */ > >>> + ret = lttng_ht_del(consumer_data.stream_ht, &iter); > >>> + assert(!ret); > >>> + } > >>> + } > >>> + rcu_read_unlock(); > >> > >> Why are you changing this code ? 
You add a lookup to get the node you
> >> already receive as parameter. It looks pretty much useless to me.
> >>
> >> What you probably want there is to pass a flag to consumer_del_stream()
> >> telling it whether or not it needs to remove the stream from the hash
> >> table, so it can skip the ht_del step accordingly.
> >
> > Yes we can do that instead.
>
> Actually, we could provide the hash table to the call, making
> consumer_del_stream go:
>
> consumer_del_stream(key, ht)
>
> and a NULL ht means that the key cannot be removed from the hash table.
>
> Else, with a flag, I'll go with a wrapper macro to make the code clearer
> and not just "consumer_del_stream(key, 1)"
>
> Thoughts?

Yep, the NULL ht parameter works for me. Please resubmit, and we'll
continue the review.

Thanks,

Mathieu

>
> David
>
> >
> >>
> >> Let's discuss this one and, once we understand the intent, we'll
> >> continue on the rest of the patch.
> >>
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>>
> >>> switch (consumer_data.type) {
> >>> case LTTNG_CONSUMER_KERNEL:
> >>> @@ -260,20 +296,10 @@ void consumer_del_stream(struct lttng_consumer_stream *stream)
> >>> goto end;
> >>> }
> >>>
> >>> - rcu_read_lock();
> >>> - iter.iter.node = &stream->node.node;
> >>> - ret = lttng_ht_del(consumer_data.stream_ht, &iter);
> >>> - assert(!ret);
> >>> -
> >>> - rcu_read_unlock();
> >>> -
> >>> - if (consumer_data.stream_count <= 0) {
> >>> - goto end;
> >>> - }
> >>> + /* This should NEVER reach a negative value.
*/ > >>> + assert(consumer_data.stream_count >= 0); > >>> consumer_data.stream_count--; > >>> - if (!stream) { > >>> - goto end; > >>> - } > >>> + > >>> if (stream->out_fd >= 0) { > >>> ret = close(stream->out_fd); > >>> if (ret) { > >>> @@ -321,7 +347,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) > >>> destroy_relayd(relayd); > >>> } > >>> } > >>> - rcu_read_unlock(); > >>> > >>> uatomic_dec(&stream->chan->refcount); > >>> if (!uatomic_read(&stream->chan->refcount) > >>> @@ -329,7 +354,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) > >>> free_chan = stream->chan; > >>> } > >>> > >>> - call_rcu(&stream->node.head, consumer_free_stream); > >>> end: > >>> consumer_data.need_update = 1; > >>> pthread_mutex_unlock(&consumer_data.lock); > >>> @@ -337,6 +361,10 @@ end: > >>> if (free_chan) { > >>> consumer_del_channel(free_chan); > >>> } > >>> + > >>> +free_stream: > >>> + call_rcu(&stream->node.head, consumer_free_stream); > >>> + rcu_read_unlock(); > >>> } > >>> > >>> struct lttng_consumer_stream *consumer_allocate_stream( > >>> @@ -353,7 +381,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( > >>> int *alloc_ret) > >>> { > >>> struct lttng_consumer_stream *stream; > >>> - int ret; > >>> > >>> stream = zmalloc(sizeof(*stream)); > >>> if (stream == NULL) { > >>> @@ -372,7 +399,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( > >>> ERR("Unable to find channel for stream %d", stream_key); > >>> goto error; > >>> } > >>> - stream->chan->refcount++; > >>> + > >>> stream->key = stream_key; > >>> stream->shm_fd = shm_fd; > >>> stream->wait_fd = wait_fd; > >>> @@ -391,35 +418,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( > >>> lttng_ht_node_init_ulong(&stream->node, stream->key); > >>> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); > >>> > >>> - switch (consumer_data.type) { > >>> - case LTTNG_CONSUMER_KERNEL: > >>> - break; > >>> - case LTTNG_CONSUMER32_UST: > >>> - case 
LTTNG_CONSUMER64_UST: > >>> - stream->cpu = stream->chan->cpucount++; > >>> - ret = lttng_ustconsumer_allocate_stream(stream); > >>> - if (ret) { > >>> - *alloc_ret = -EINVAL; > >>> - goto error; > >>> - } > >>> - break; > >>> - default: > >>> - ERR("Unknown consumer_data type"); > >>> - *alloc_ret = -EINVAL; > >>> - goto error; > >>> - } > >>> - > >>> - /* > >>> - * When nb_init_streams reaches 0, we don't need to trigger any action in > >>> - * terms of destroying the associated channel, because the action that > >>> - * causes the count to become 0 also causes a stream to be added. The > >>> - * channel deletion will thus be triggered by the following removal of this > >>> - * stream. > >>> - */ > >>> - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { > >>> - uatomic_dec(&stream->chan->nb_init_streams); > >>> - } > >>> - > >>> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," > >>> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, > >>> stream->shm_fd, stream->wait_fd, > >>> @@ -439,38 +437,66 @@ end: > >>> int consumer_add_stream(struct lttng_consumer_stream *stream) > >>> { > >>> int ret = 0; > >>> - struct lttng_ht_node_ulong *node; > >>> - struct lttng_ht_iter iter; > >>> struct consumer_relayd_sock_pair *relayd; > >>> > >>> - pthread_mutex_lock(&consumer_data.lock); > >>> - /* Steal stream identifier, for UST */ > >>> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > >>> + assert(stream); > >>> > >>> + DBG3("Adding consumer stream %d", stream->key); > >>> + > >>> + pthread_mutex_lock(&consumer_data.lock); > >>> rcu_read_lock(); > >>> - lttng_ht_lookup(consumer_data.stream_ht, > >>> - (void *)((unsigned long) stream->key), &iter); > >>> - node = lttng_ht_iter_get_node_ulong(&iter); > >>> - if (node != NULL) { > >>> - rcu_read_unlock(); > >>> - /* Stream already exist. 
Ignore the insertion */ > >>> - goto end; > >>> - } > >>> > >>> - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > >>> + switch (consumer_data.type) { > >>> + case LTTNG_CONSUMER_KERNEL: > >>> + break; > >>> + case LTTNG_CONSUMER32_UST: > >>> + case LTTNG_CONSUMER64_UST: > >>> + stream->cpu = stream->chan->cpucount++; > >>> + ret = lttng_ustconsumer_add_stream(stream); > >>> + if (ret) { > >>> + ret = -EINVAL; > >>> + goto error; > >>> + } > >>> + > >>> + /* Steal stream identifier only for UST */ > >>> + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > >>> + break; > >>> + default: > >>> + ERR("Unknown consumer_data type"); > >>> + assert(0); > >>> + ret = -ENOSYS; > >>> + goto error; > >>> + } > >>> > >>> /* Check and cleanup relayd */ > >>> relayd = consumer_find_relayd(stream->net_seq_idx); > >>> if (relayd != NULL) { > >>> uatomic_inc(&relayd->refcount); > >>> } > >>> - rcu_read_unlock(); > >>> > >>> - /* Update consumer data */ > >>> + /* Final operation is to add the stream to the global hash table. */ > >>> + lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > >>> + > >>> + /* Update channel refcount once added without error(s). */ > >>> + uatomic_inc(&stream->chan->refcount); > >>> + > >>> + /* > >>> + * When nb_init_streams reaches 0, we don't need to trigger any action in > >>> + * terms of destroying the associated channel, because the action that > >>> + * causes the count to become 0 also causes a stream to be added. The > >>> + * channel deletion will thus be triggered by the following removal of this > >>> + * stream. > >>> + */ > >>> + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { > >>> + uatomic_dec(&stream->chan->nb_init_streams); > >>> + } > >>> + > >>> + /* Update consumer data once the node is inserted. 
*/ > >>> consumer_data.stream_count++; > >>> consumer_data.need_update = 1; > >>> > >>> -end: > >>> +error: > >>> + rcu_read_unlock(); > >>> pthread_mutex_unlock(&consumer_data.lock); > >>> > >>> return ret; > >>> @@ -1648,10 +1674,37 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) > >>> * Action done with the metadata stream when adding it to the consumer internal > >>> * data structures to handle it. > >>> */ > >>> -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) > >>> +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >>> + struct lttng_ht *ht) > >>> { > >>> + int ret = 0; > >>> struct consumer_relayd_sock_pair *relayd; > >>> > >>> + switch (consumer_data.type) { > >>> + case LTTNG_CONSUMER_KERNEL: > >>> + break; > >>> + case LTTNG_CONSUMER32_UST: > >>> + case LTTNG_CONSUMER64_UST: > >>> + ret = lttng_ustconsumer_add_stream(stream); > >>> + if (ret) { > >>> + ret = -EINVAL; > >>> + goto error; > >>> + } > >>> + > >>> + /* Steal stream identifier only for UST */ > >>> + consumer_steal_stream_key(stream->key, ht); > >>> + break; > >>> + default: > >>> + ERR("Unknown consumer_data type"); > >>> + assert(0); > >>> + return -ENOSYS; > >>> + } > >>> + > >>> + /* > >>> + * From here, refcounts are updated so be _careful_ when returning an error > >>> + * after this point. > >>> + */ > >>> + > >>> /* Find relayd and, if one is found, increment refcount. */ > >>> rcu_read_lock(); > >>> relayd = consumer_find_relayd(stream->net_seq_idx); > >>> @@ -1659,6 +1712,27 @@ static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) > >>> uatomic_inc(&relayd->refcount); > >>> } > >>> rcu_read_unlock(); > >>> + > >>> + /* Update channel refcount once added without error(s). 
*/ > >>> + uatomic_inc(&stream->chan->refcount); > >>> + > >>> + /* > >>> + * When nb_init_streams reaches 0, we don't need to trigger any action in > >>> + * terms of destroying the associated channel, because the action that > >>> + * causes the count to become 0 also causes a stream to be added. The > >>> + * channel deletion will thus be triggered by the following removal of this > >>> + * stream. > >>> + */ > >>> + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { > >>> + uatomic_dec(&stream->chan->nb_init_streams); > >>> + } > >>> + > >>> + rcu_read_lock(); > >>> + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > >>> + rcu_read_unlock(); > >>> + > >>> +error: > >>> + return ret; > >>> } > >>> > >>> /* > >>> @@ -1755,17 +1829,16 @@ restart: > >>> DBG("Adding metadata stream %d to poll set", > >>> stream->wait_fd); > >>> > >>> - rcu_read_lock(); > >>> - /* The node should be init at this point */ > >>> - lttng_ht_add_unique_ulong(metadata_ht, > >>> - &stream->waitfd_node); > >>> - rcu_read_unlock(); > >>> + ret = consumer_add_metadata_stream(stream, metadata_ht); > >>> + if (ret) { > >>> + /* Stream was not setup properly. Continuing. */ > >>> + free(stream); > >>> + continue; > >>> + } > >>> > >>> /* Add metadata stream to the global poll events list */ > >>> lttng_poll_add(&events, stream->wait_fd, > >>> LPOLLIN | LPOLLPRI); > >>> - > >>> - consumer_add_metadata_stream(stream); > >>> } > >>> > >>> /* Metadata pipe handled. 
Continue handling the others */ > >>> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c > >>> index 4d61cc5..878c4ab 100644 > >>> --- a/src/common/kernel-consumer/kernel-consumer.c > >>> +++ b/src/common/kernel-consumer/kernel-consumer.c > >>> @@ -206,18 +206,20 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>> &new_stream->relayd_stream_id); > >>> pthread_mutex_unlock(&relayd->ctrl_sock_mutex); > >>> if (ret < 0) { > >>> + consumer_del_stream(new_stream); > >>> goto end_nosignal; > >>> } > >>> } else if (msg.u.stream.net_index != -1) { > >>> ERR("Network sequence index %d unknown. Not adding stream.", > >>> msg.u.stream.net_index); > >>> - free(new_stream); > >>> + consumer_del_stream(new_stream); > >>> goto end_nosignal; > >>> } > >>> > >>> if (ctx->on_recv_stream) { > >>> ret = ctx->on_recv_stream(new_stream); > >>> if (ret < 0) { > >>> + consumer_del_stream(new_stream); > >>> goto end_nosignal; > >>> } > >>> } > >>> @@ -230,9 +232,16 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>> } while (ret < 0 && errno == EINTR); > >>> if (ret < 0) { > >>> PERROR("write metadata pipe"); > >>> + consumer_del_stream(new_stream); > >>> } > >>> } else { > >>> - consumer_add_stream(new_stream); > >>> + ret = consumer_add_stream(new_stream); > >>> + if (ret) { > >>> + ERR("Consumer add stream %d failed. 
Continuing", > >>> + new_stream->key); > >>> + consumer_del_stream(new_stream); > >>> + goto end_nosignal; > >>> + } > >>> } > >>> > >>> DBG("Kernel consumer_add_stream (%d)", fd); > >>> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > >>> index 76238a0..e10d540 100644 > >>> --- a/src/common/ust-consumer/ust-consumer.c > >>> +++ b/src/common/ust-consumer/ust-consumer.c > >>> @@ -234,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>> &new_stream->relayd_stream_id); > >>> pthread_mutex_unlock(&relayd->ctrl_sock_mutex); > >>> if (ret < 0) { > >>> + consumer_del_stream(new_stream); > >>> goto end_nosignal; > >>> } > >>> } else if (msg.u.stream.net_index != -1) { > >>> ERR("Network sequence index %d unknown. Not adding stream.", > >>> msg.u.stream.net_index); > >>> - free(new_stream); > >>> + consumer_del_stream(new_stream); > >>> goto end_nosignal; > >>> } > >>> > >>> @@ -247,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>> if (ctx->on_recv_stream) { > >>> ret = ctx->on_recv_stream(new_stream); > >>> if (ret < 0) { > >>> + consumer_del_stream(new_stream); > >>> goto end_nosignal; > >>> } > >>> } > >>> @@ -259,9 +261,17 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>> } while (ret < 0 && errno == EINTR); > >>> if (ret < 0) { > >>> PERROR("write metadata pipe"); > >>> + consumer_del_stream(new_stream); > >>> + goto end_nosignal; > >>> } > >>> } else { > >>> - consumer_add_stream(new_stream); > >>> + ret = consumer_add_stream(new_stream); > >>> + if (ret) { > >>> + ERR("Consumer add stream %d failed. 
Continuing", > >>> + new_stream->key); > >>> + consumer_del_stream(new_stream); > >>> + goto end_nosignal; > >>> + } > >>> } > >>> > >>> DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, > >>> @@ -373,7 +383,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) > >>> ustctl_unmap_channel(chan->handle); > >>> } > >>> > >>> -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) > >>> +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) > >>> { > >>> struct lttng_ust_object_data obj; > >>> int ret; > >>> diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h > >>> index 3f76f23..6b507ed 100644 > >>> --- a/src/common/ust-consumer/ust-consumer.h > >>> +++ b/src/common/ust-consumer/ust-consumer.h > >>> @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>> > >>> extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); > >>> extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); > >>> -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); > >>> +extern int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); > >>> extern void lttng_ustconsumer_del_stream(struct lttng_consumer_stream *stream); > >>> > >>> int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, > >>> @@ -117,7 +117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) > >>> } > >>> > >>> static inline > >>> -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) > >>> +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) > >>> { > >>> return -ENOSYS; > >>> } > >>> -- > >>> 1.7.10.4 > >>> > >>> > >>> _______________________________________________ > >>> lttng-dev mailing list > >>> lttng-dev at lists.lttng.org > >>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > >> > 
> >
> > _______________________________________________
> > lttng-dev mailing list
> > lttng-dev at lists.lttng.org
> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From bernd.hufmann at ericsson.com Wed Oct 3 14:17:23 2012
From: bernd.hufmann at ericsson.com (Bernd Hufmann)
Date: Wed, 3 Oct 2012 14:17:23 -0400
Subject: [lttng-dev] LTTng Tools 2.1 streaming commands
Message-ID: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se>

Hello

For the support of LTTng Tools 2.1 in Eclipse, I'm currently trying to
understand how to use the configuration for network streaming with the
updated "lttng create"-command and new "enable-consumer"-command.

a) lttng enable-consumer
I find this command confusing because this command does not always
enable the consumer, even if the command name implies so. The enabling
actually depends on how the command is executed.
Examples:

  * "lttng enable-consumer -k -U net://" or
    "lttng enable-consumer -k -C tcp:// -D tcp://"
    don't enable the consumer. You need to either add option --enable or
    execute subsequently "lttng enable-consumer --enable"
  * lttng enable-consumer -k net:// does enable the consumer. It took me
    a while to figure out the difference to the example above: The option
    -U is omitted.

What the command actually provides is two features: a way to configure
streaming (e.g. remote_addr) and a way to enable the consumer. Would it
be better to name it "lttng configure-consumer"? Also, remove the
possibility to not specify -U, -C or -D. The following variants of this
command should be enough:
lttng configure-consumer -k -U [--enable]
lttng configure-consumer -k -C -D [--enable]
lttng configure-consumer -k --enable
lttng configure-consumer -u -U [--enable]
lttng configure-consumer -u -C -D [--enable]
lttng configure-consumer -u --enable

Please let me know what you think.

b) lttng create [-U ] | [-C -D ] [--no-consumer] [--disable-consumer]

  * Are options --no-consumer or --disable-consumer only applicable for
    streaming?
  * I'm not sure what the purpose of the options --no-consumer or
    --disable-consumer is. Could you please explain the use cases for
    using --no-consumer or --disable-consumer?

Thanks
Bernd

This Communication is Confidential. We only send and receive email on
the basis of the terms set out at www.ericsson.com/email_disclaimer

From dgoulet at efficios.com Wed Oct 3 14:27:07 2012
From: dgoulet at efficios.com (David Goulet)
Date: Wed, 03 Oct 2012 14:27:07 -0400
Subject: [lttng-dev] LTTng Tools 2.1 streaming commands
In-Reply-To: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se>
References: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se>
Message-ID: <506C837B.5070402@efficios.com>

Hi Bernd,

The enable-consumer command, by default, always enables a consumer. We
added the "extended options" such as -U/-C/-D/-e to control each part of
the API.

So let's say,

$ lttng enable-consumer -k net://localhost

This command does two API calls underneath, which are
lttng_set_consumer_uri and lttng_enable_consumer. However, the
set_consumer_uri call can take arbitrarily long because it has to connect
to the relayd (if remote) and set the session. It adds "unknown" latency
to the command.

So, for someone willing to control the full time window of the streaming
setup using the API, it is divided in two calls. This is why we added
the extended options so the command line UI could also be controlled on
a per API call basis.

Is it clear enough? Reply continues below:

Bernd Hufmann:
> Hello
>
> For the support of LTTng Tools 2.1 in Eclipse, I'm currently trying to
> understand how to use the configuration for network streaming with the
> updated "lttng create"-command and new "enable-consumer"-command.
>
> a) lttng enable-consumer
> I find this command confusing because this command does not always
> enable the consumer, even if the command name implies so. The enabling
> actually depends on how the command is executed.
> Examples:
>
>   * "lttng enable-consumer -k -U net://" or "lttng
>     enable-consumer -k -C tcp:// -D tcp://"
>     don't enable the consumer. You need to either add option --enable or
>     execute subsequently "lttng enable-consumer --enable"
>   * lttng enable-consumer -k net:// does enable the
>     consumer. It took me a while to figure out the difference to the
>     example above: The option -U is omitted.
>
>
> What the command actually provides is two features: a way to configure
> streaming (e.g. remote_addr) and a way to enable the consumer. Would it
> be better to name it "lttng configure-consumer"? Also, remove the
> possibility to not specify -U, -C or -D. The following
> variants of this command should be enough:
> lttng configure-consumer -k -U [--enable]
> lttng configure-consumer -k -C -D [--enable]
> lttng configure-consumer -k --enable
> lttng configure-consumer -u -U [--enable]
> lttng configure-consumer -u -C -D [--enable]
> lttng configure-consumer -u --enable
>
> Please let me know what you think.
>
> b) lttng create [-U ] | [-C -D ]
> [--no-consumer] [--disable-consumer]
>
>   * Are options --no-consumer or --disable-consumer only applicable for
>     streaming?

No, also for local consumer.

>   * I'm not sure what is the purpose of the options --no-consumer or
>     --disable-consumer. Could you please explain the use cases for using
>     --no-consumer or --disable-consumer?

This basically disables the consumer for a tracing session. It's not
very useful for now, but for upcoming snapshots and live tracing it will
make way more sense! :)

Again, same idea: the API can control the consumer "state"
(enable/disable), so we added these options for the UI.

Cheers!
David

>
>
> Thanks
> Bernd
>
> This Communication is Confidential.
We only send and receive email on > the basis of the terms set out at _www.ericsson.com/email_disclaimer_ > > > > > This body part will be downloaded on demand. From dgoulet at efficios.com Wed Oct 3 14:30:53 2012 From: dgoulet at efficios.com (David Goulet) Date: Wed, 3 Oct 2012 14:30:53 -0400 Subject: [lttng-dev] [PATCH v2 lttng-tools] Fix: Stream allocation and insertion consistency Message-ID: <1349289053-26553-1-git-send-email-dgoulet@efficios.com> The stream allocation in the consumer was doing ustctl actions on the stream and updating refcounts. However, before inserting the stream into the hash table and polling on the fd for data, an error could occur which could stop the stream insertion hence creating multiple fd leaks, mem leaks and bad refcount state. Furthermore, the consumer_del_stream now can destroy a stream even if that stream is not added to the global hash table. The kernel and UST consumers use it on error between allocation and hash table insertion. Signed-off-by: David Goulet --- src/common/consumer.c | 226 ++++++++++++++++---------- src/common/consumer.h | 3 +- src/common/kernel-consumer/kernel-consumer.c | 13 +- src/common/ust-consumer/ust-consumer.c | 20 ++- src/common/ust-consumer/ust-consumer.h | 4 +- 5 files changed, 173 insertions(+), 93 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 161bf7e..dd8806c 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -227,18 +227,39 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd) } /* - * Remove a stream from the global list protected by a mutex. This - * function is also responsible for freeing its data structures. + * Remove a stream from the global list protected by a mutex. This function is + * also responsible for freeing its data structures. 
*/ -void consumer_del_stream(struct lttng_consumer_stream *stream) +void consumer_del_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { int ret; - struct lttng_ht_iter iter; struct lttng_consumer_channel *free_chan = NULL; struct consumer_relayd_sock_pair *relayd; assert(stream); + DBG3("Consumer deleting stream %d", stream->key); + + if (ht) { + struct lttng_ht_node_ulong *node; + struct lttng_ht_iter iter; + + rcu_read_lock(); + lttng_ht_lookup(ht, (void *)((unsigned long) stream->key), &iter); + node = lttng_ht_iter_get_node_ulong(&iter); + if (node != NULL) { + ret = lttng_ht_del(ht, &iter); + assert(!ret); + } + rcu_read_unlock(); + + /* + * If the stream is not found in the HT, simply continue to at least + * free the stream in the process. + */ + } + pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { @@ -257,23 +278,15 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) default: ERR("Unknown consumer_data type"); assert(0); - goto end; } - rcu_read_lock(); - iter.iter.node = &stream->node.node; - ret = lttng_ht_del(consumer_data.stream_ht, &iter); - assert(!ret); + /* This should NEVER reach a negative value. */ + assert(consumer_data.stream_count >= 0); + consumer_data.stream_count--; + consumer_data.need_update = 1; - rcu_read_unlock(); + pthread_mutex_unlock(&consumer_data.lock); - if (consumer_data.stream_count <= 0) { - goto end; - } - consumer_data.stream_count--; - if (!stream) { - goto end; - } if (stream->out_fd >= 0) { ret = close(stream->out_fd); if (ret) { @@ -321,22 +334,17 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) destroy_relayd(relayd); } } + /* relayd pointer must not be used beyond this point. 
*/ rcu_read_unlock(); uatomic_dec(&stream->chan->refcount); if (!uatomic_read(&stream->chan->refcount) && !uatomic_read(&stream->chan->nb_init_streams)) { - free_chan = stream->chan; + /* Free channel once the consumer data lock is released */ + consumer_del_channel(free_chan); } call_rcu(&stream->node.head, consumer_free_stream); -end: - consumer_data.need_update = 1; - pthread_mutex_unlock(&consumer_data.lock); - - if (free_chan) { - consumer_del_channel(free_chan); - } } struct lttng_consumer_stream *consumer_allocate_stream( @@ -353,7 +361,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( int *alloc_ret) { struct lttng_consumer_stream *stream; - int ret; stream = zmalloc(sizeof(*stream)); if (stream == NULL) { @@ -372,7 +379,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( ERR("Unable to find channel for stream %d", stream_key); goto error; } - stream->chan->refcount++; + stream->key = stream_key; stream->shm_fd = shm_fd; stream->wait_fd = wait_fd; @@ -391,35 +398,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( lttng_ht_node_init_ulong(&stream->node, stream->key); lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); - switch (consumer_data.type) { - case LTTNG_CONSUMER_KERNEL: - break; - case LTTNG_CONSUMER32_UST: - case LTTNG_CONSUMER64_UST: - stream->cpu = stream->chan->cpucount++; - ret = lttng_ustconsumer_allocate_stream(stream); - if (ret) { - *alloc_ret = -EINVAL; - goto error; - } - break; - default: - ERR("Unknown consumer_data type"); - *alloc_ret = -EINVAL; - goto error; - } - - /* - * When nb_init_streams reaches 0, we don't need to trigger any action in - * terms of destroying the associated channel, because the action that - * causes the count to become 0 also causes a stream to be added. The - * channel deletion will thus be triggered by the following removal of this - * stream. 
- */ - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { - uatomic_dec(&stream->chan->nb_init_streams); - } - DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, stream->shm_fd, stream->wait_fd, @@ -439,38 +417,66 @@ end: int consumer_add_stream(struct lttng_consumer_stream *stream) { int ret = 0; - struct lttng_ht_node_ulong *node; - struct lttng_ht_iter iter; struct consumer_relayd_sock_pair *relayd; - pthread_mutex_lock(&consumer_data.lock); - /* Steal stream identifier, for UST */ - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); + assert(stream); + DBG3("Adding consumer stream %d", stream->key); + + pthread_mutex_lock(&consumer_data.lock); rcu_read_lock(); - lttng_ht_lookup(consumer_data.stream_ht, - (void *)((unsigned long) stream->key), &iter); - node = lttng_ht_iter_get_node_ulong(&iter); - if (node != NULL) { - rcu_read_unlock(); - /* Stream already exist. Ignore the insertion */ - goto end; - } - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + stream->cpu = stream->chan->cpucount++; + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + ret = -ENOSYS; + goto error; + } /* Check and cleanup relayd */ relayd = consumer_find_relayd(stream->net_seq_idx); if (relayd != NULL) { uatomic_inc(&relayd->refcount); } - rcu_read_unlock(); - /* Update consumer data */ + /* Final operation is to add the stream to the global hash table. */ + lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); + + /* Update channel refcount once added without error(s). 
*/ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. + */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + /* Update consumer data once the node is inserted. */ consumer_data.stream_count++; consumer_data.need_update = 1; -end: +error: + rcu_read_unlock(); pthread_mutex_unlock(&consumer_data.lock); return ret; @@ -896,7 +902,7 @@ void lttng_consumer_cleanup(void) node) { struct lttng_consumer_stream *stream = caa_container_of(node, struct lttng_consumer_stream, node); - consumer_del_stream(stream); + consumer_del_stream(stream, consumer_data.stream_ht); } cds_lfht_for_each_entry(consumer_data.channel_ht->ht, &iter.iter, node, @@ -1642,10 +1648,37 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) * Action done with the metadata stream when adding it to the consumer internal * data structures to handle it. */ -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { + int ret = 0; struct consumer_relayd_sock_pair *relayd; + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->key, ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + return -ENOSYS; + } + + /* + * From here, refcounts are updated so be _careful_ when returning an error + * after this point. 
+ */ + /* Find relayd and, if one is found, increment refcount. */ rcu_read_lock(); relayd = consumer_find_relayd(stream->net_seq_idx); @@ -1653,6 +1686,27 @@ static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) uatomic_inc(&relayd->refcount); } rcu_read_unlock(); + + /* Update channel refcount once added without error(s). */ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. + */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + rcu_read_lock(); + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); + rcu_read_unlock(); + +error: + return ret; } /* @@ -1749,17 +1803,16 @@ restart: DBG("Adding metadata stream %d to poll set", stream->wait_fd); - rcu_read_lock(); - /* The node should be init at this point */ - lttng_ht_add_unique_ulong(metadata_ht, - &stream->waitfd_node); - rcu_read_unlock(); + ret = consumer_add_metadata_stream(stream, metadata_ht); + if (ret) { + /* Stream was not setup properly. Continuing. */ + free(stream); + continue; + } /* Add metadata stream to the global poll events list */ lttng_poll_add(&events, stream->wait_fd, LPOLLIN | LPOLLPRI); - - consumer_add_metadata_stream(stream); } /* Metadata pipe handled. 
Continue handling the others */ @@ -2015,19 +2068,22 @@ void *lttng_consumer_thread_poll_fds(void *data) if ((pollfd[i].revents & POLLHUP)) { DBG("Polling fd %d tells it has hung up.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i]); + consumer_del_stream(local_stream[i], + consumer_data.stream_ht); num_hup++; } } else if (pollfd[i].revents & POLLERR) { ERR("Error returned in polling fd %d.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i]); + consumer_del_stream(local_stream[i], + consumer_data.stream_ht); num_hup++; } } else if (pollfd[i].revents & POLLNVAL) { ERR("Polling fd %d tells fd is not open.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i]); + consumer_del_stream(local_stream[i], + consumer_data.stream_ht); num_hup++; } } diff --git a/src/common/consumer.h b/src/common/consumer.h index 9a93c42..5af38e1 100644 --- a/src/common/consumer.h +++ b/src/common/consumer.h @@ -341,7 +341,8 @@ extern struct lttng_consumer_stream *consumer_allocate_stream( int metadata_flag, int *alloc_ret); extern int consumer_add_stream(struct lttng_consumer_stream *stream); -extern void consumer_del_stream(struct lttng_consumer_stream *stream); +extern void consumer_del_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht); extern void consumer_change_stream_state(int stream_key, enum lttng_consumer_stream_state state); extern void consumer_del_channel(struct lttng_consumer_channel *channel); diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c index 4d61cc5..13cbe21 100644 --- a/src/common/kernel-consumer/kernel-consumer.c +++ b/src/common/kernel-consumer/kernel-consumer.c @@ -206,18 +206,20 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, &new_stream->relayd_stream_id); pthread_mutex_unlock(&relayd->ctrl_sock_mutex); if (ret < 0) { + consumer_del_stream(new_stream, 
NULL); goto end_nosignal; } } else if (msg.u.stream.net_index != -1) { ERR("Network sequence index %d unknown. Not adding stream.", msg.u.stream.net_index); - free(new_stream); + consumer_del_stream(new_stream, NULL); goto end_nosignal; } if (ctx->on_recv_stream) { ret = ctx->on_recv_stream(new_stream); if (ret < 0) { + consumer_del_stream(new_stream, NULL); goto end_nosignal; } } @@ -230,9 +232,16 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, } while (ret < 0 && errno == EINTR); if (ret < 0) { PERROR("write metadata pipe"); + consumer_del_stream(new_stream, NULL); } } else { - consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. Continuing", + new_stream->key); + consumer_del_stream(new_stream, NULL); + goto end_nosignal; + } } DBG("Kernel consumer_add_stream (%d)", fd); diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index 76238a0..69da765 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -234,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, &new_stream->relayd_stream_id); pthread_mutex_unlock(&relayd->ctrl_sock_mutex); if (ret < 0) { + consumer_del_stream(new_stream, NULL); goto end_nosignal; } } else if (msg.u.stream.net_index != -1) { ERR("Network sequence index %d unknown. 
Not adding stream.", msg.u.stream.net_index); - free(new_stream); + consumer_del_stream(new_stream, NULL); goto end_nosignal; } @@ -247,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, if (ctx->on_recv_stream) { ret = ctx->on_recv_stream(new_stream); if (ret < 0) { + consumer_del_stream(new_stream, NULL); goto end_nosignal; } } @@ -259,9 +261,21 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, } while (ret < 0 && errno == EINTR); if (ret < 0) { PERROR("write metadata pipe"); + consumer_del_stream(new_stream, NULL); + goto end_nosignal; } } else { - consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. Continuing", + new_stream->key); + /* + * At this point, if the add_stream fails, it is not in the + * hash table thus passing the NULL value here. + */ + consumer_del_stream(new_stream, NULL); + goto end_nosignal; + } } DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, @@ -373,7 +387,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) ustctl_unmap_channel(chan->handle); } -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) { struct lttng_ust_object_data obj; int ret; diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h index 3f76f23..6b507ed 100644 --- a/src/common/ust-consumer/ust-consumer.h +++ b/src/common/ust-consumer/ust-consumer.h @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); +extern int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); extern void 
lttng_ustconsumer_del_stream(struct lttng_consumer_stream *stream); int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, @@ -117,7 +117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) } static inline -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) { return -ENOSYS; } -- 1.7.10.4 From paulmck at linux.vnet.ibm.com Wed Oct 3 14:28:46 2012 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Wed, 3 Oct 2012 11:28:46 -0700 Subject: [lttng-dev] [rp] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue) In-Reply-To: <20121002141307.GA4057@Krystal> References: <20121002141307.GA4057@Krystal> Message-ID: <20121003182846.GN2527@linux.vnet.ibm.com> On Tue, Oct 02, 2012 at 10:13:07AM -0400, Mathieu Desnoyers wrote: > Implement wait-free concurrent queues, with a new API different from > wfqueue.h, which is already provided by Userspace RCU. The advantage of > splitting the head and tail objects of the queue into different > arguments is to allow these to sit on different cache-lines, thus > eliminating false-sharing, leading to a 2.3x speed increase. > > This API also introduces a "splice" operation, which moves all nodes > from one queue into another, and postpones the synchronization to either > dequeue or iteration on the list. The splice operation does not need to > touch every single node of the queue it moves them from. Moreover, the > splice operation only needs to ensure mutual exclusion with other > dequeuers, iterations, and splice operations from the list it splices > from, but acts as a simple enqueuer on the list it splices into (no > mutual exclusion needed for that list). > > Feedback is welcome, These look sane to me, though I must confess that the tail pointer referencing the node rather than the node's next pointer did throw me for a bit. 
;-) Thanx, Paul From mathieu.desnoyers at efficios.com Wed Oct 3 17:04:36 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 3 Oct 2012 17:04:36 -0400 Subject: [lttng-dev] [rp] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue) In-Reply-To: <20121003182846.GN2527@linux.vnet.ibm.com> References: <20121002141307.GA4057@Krystal> <20121003182846.GN2527@linux.vnet.ibm.com> Message-ID: <20121003210436.GB25090@Krystal> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > On Tue, Oct 02, 2012 at 10:13:07AM -0400, Mathieu Desnoyers wrote: > > Implement wait-free concurrent queues, with a new API different from > > wfqueue.h, which is already provided by Userspace RCU. The advantage of > > splitting the head and tail objects of the queue into different > > arguments is to allow these to sit on different cache-lines, thus > > eliminating false-sharing, leading to a 2.3x speed increase. > > > > This API also introduces a "splice" operation, which moves all nodes > > from one queue into another, and postpones the synchronization to either > > dequeue or iteration on the list. The splice operation does not need to > > touch every single node of the queue it moves them from. Moreover, the > > splice operation only needs to ensure mutual exclusion with other > > dequeuers, iterations, and splice operations from the list it splices > > from, but acts as a simple enqueuer on the list it splices into (no > > mutual exclusion needed for that list). > > > > Feedback is welcome, > > These look sane to me, though I must confess that the tail pointer > referencing the node rather than the node's next pointer did throw > me for a bit. ;-) Yes, this was originally introduced with Lai's original patch to wfqueue, which I think is a nice simplification: it's pretty much the same thing to use the last node address as tail rather than the address of its first member (its next pointer address (_not_ value)). 
It ends up being the same address in this case, but more interestingly, we don't have to use a struct cds_wfcq_node ** type: a simple struct cds_wfcq_node * suffices. Thanks Paul, I will therefore merge these 3 patches with your Acked-by. Lai, you are welcome to provide improvements to this code against the master branch. I will gladly consider any change you propose. Thanks! Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From alexandre.montplaisir at polymtl.ca Wed Oct 3 23:41:53 2012 From: alexandre.montplaisir at polymtl.ca (Alexandre Montplaisir) Date: Wed, 03 Oct 2012 23:41:53 -0400 Subject: [lttng-dev] Viewing userspace apps traces In-Reply-To: <506BD225.3030906@fnac.net> References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im> <506BD225.3030906@fnac.net> Message-ID: <506D0581.3030400@polymtl.ca> Sorry for the late reply, I had to update/rebase some of the stuff first... (Cross-posting linuxtools-dev, as it might interest some people who follow that list too. This is about extending TMF to implement a graphical view for a specific UST trace type.) On 12-10-03 01:50 AM, Paul Chavent wrote: > [...] > > On 10/02/2012 10:29 PM, Alexandre Montplaisir wrote: >> [...] >> >> What would you like see in your "timeline representation" exactly? Maybe >> we could give you some pointers as to how to implement such a view. >> (We're currently working on making it easy to extend the framework to >> implement new views, so this could be a good exercise!) > > I would like to see, eg, one line per tid, and on each line, the value > of one context or argument value. > > I'm ready to follow an exercise for extending the framework ! Ok good! We don't have a nice tutorial ready yet, as most parts are still working their way upstream.
But if you want to dig into it and try it out now, you can: 1 - Set up the development environment for TMF: http://wiki.eclipse.org/Linux_Tools_Project/LTTng_Eclipse_Plug-in_Development_Environement_Setup 2 - Checkout the "lttng-kepler" branch in the git. This is where the latest development happens. 3 - Apply those two patches, in that order: https://git.eclipse.org/r/#/c/7747/ https://git.eclipse.org/r/#/c/7748/ (you can copy-paste the "cherry-pick" command shown on the page) 4 - Download the example program and view from: git://git.dorsal.polymtl.ca/~alexmont/ust-example.git Now at this point you should be able to import and build the example plugins (ust.example.core and ust.example.ui) and the TMF/LTTng ones in the same workspace. You can try it to make sure it works correctly : take a UST trace of the "myprog" program, and then load it into TMF, and show the "Example -> Connections" view. It should display the yellow and green rectangles corresponding to the states that were defined. After that, it shouldn't be too hard (famous last words...) to rework the ust.example.* code to fit your application. The points of interest will be (before renames): MyUstTraceInput, line 85+: This is where you assign your trace events to states ConnectionsPresentationProvider, line 31-34: This is where you assign the colors to each state in the view and same file, lines 64-68 and 81-85 : This is where you assign the trace's states to the ones in the view. (one place is for the actual colored rectangle, the other is for the tooltips, iirc). If you have any question or problem, please let me know! 
Good luck ;) -- Alexandre Montplaisir DORSAL lab, École Polytechnique de Montréal From Fredrik_Oestman at mentor.com Thu Oct 4 06:56:41 2012 From: Fredrik_Oestman at mentor.com (Oestman, Fredrik) Date: Thu, 4 Oct 2012 10:56:41 +0000 Subject: [lttng-dev] Viewing userspace apps traces In-Reply-To: <506B4EA4.8050407@voxpopuli.im> References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im> Message-ID: <524C960C5DFC794E82BE548D825F05CF5BC6F694@EU-MBX-01.mgc.mentorg.com> Alexandre Montplaisir wrote: > As for graphical views, like timegraphs, it's not easy to have a general > view that can work with any UST trace. Each application defines its own > event types, so we have no guarantee for any given event type to be there. Maybe this doesn't really help you, but our commercial tool can do just that. Cheers, Fredrik Östman http://go.mentor.com/sourceryanalyzer/ From Bernd.Hufmann at ericsson.com Thu Oct 4 09:40:00 2012 From: Bernd.Hufmann at ericsson.com (eamcs/eedbhu) Date: Thu, 4 Oct 2012 09:40:00 -0400 Subject: [lttng-dev] Viewing userspace apps traces In-Reply-To: <506D0581.3030400@polymtl.ca> References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im> <506BD225.3030906@fnac.net> <506D0581.3030400@polymtl.ca> Message-ID: <506D91B0.2070708@ericsson.com> Hi Paul The Eclipse Linux Tools release v1.1.1 (which is also included in the CDT EPP Juno SR1 release) contains an update that displays CTF context information in the events table. So you should be able to see the context information. Best Regards Bernd On 10/03/2012 11:41 PM, Alexandre Montplaisir wrote: > Sorry for the late reply, I had to update/rebase some of the stuff first... > > (Cross-posting linuxtools-dev, as it might interest some people who > follow that list too. This is about extending TMF to implement a > graphical view for a specific UST trace type.) > > > On 12-10-03 01:50 AM, Paul Chavent wrote: >> [...] >> >> On 10/02/2012 10:29 PM, Alexandre Montplaisir wrote: >>> [...]
>>> >>> What would you like see in your "timeline representation" exactly? Maybe >>> we could give you some pointers as to how to implement such a view. >>> (We're currently working on making it easy to extend the framework to >>> implement new views, so this could be a good exercise!) >> I would like to see, eg, one line per tid, and on each line, the value >> of one context or argument value. >> >> I'm ready to follow an exercise for extending the framework ! > Ok good! We don't have a nice tutorial ready yet, as most parts are > still working their way upstream. But if you want to dig into it and try > it out now, you can: > > 1 - Set up the development environment for TMF: > http://wiki.eclipse.org/Linux_Tools_Project/LTTng_Eclipse_Plug-in_Development_Environement_Setup > > 2 - Checkout the "lttng-kepler" branch in the git. This is where the > latest development happens. > > 3 - Apply those two patches, in that order: > https://git.eclipse.org/r/#/c/7747/ > https://git.eclipse.org/r/#/c/7748/ > (you can copy-paste the "cherry-pick" command shown on the page) > > 4 - Download the example program and view from: > git://git.dorsal.polymtl.ca/~alexmont/ust-example.git > > Now at this point you should be able to import and build the example > plugins (ust.example.core and ust.example.ui) and the TMF/LTTng ones in > the same workspace. > > You can try it to make sure it works correctly : take a UST trace of the > "myprog" program, and then load it into TMF, and show the "Example -> > Connections" view. It should display the yellow and green rectangles > corresponding to the states that were defined. > > > After that, it shouldn't be too hard (famous last words...) to rework > the ust.example.* code to fit your application. 
The points of interest > will be (before renames): > > MyUstTraceInput, line 85+: This is where you assign your trace events to > states > ConnectionsPresentationProvider, line 31-34: This is where you assign > the colors to each state in the view > and same file, lines 64-68 and 81-85 : This is where you assign the > trace's states to the ones in the view. > (one place is for the actual colored rectangle, the other is for the > tooltips, iirc). > > > If you have any question or problem, please let me know! > > Good luck ;) > From mathieu.desnoyers at efficios.com Thu Oct 4 10:29:28 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 4 Oct 2012 10:29:28 -0400 Subject: [lttng-dev] Viewing userspace apps traces In-Reply-To: <524C960C5DFC794E82BE548D825F05CF5BC6F694@EU-MBX-01.mgc.mentorg.com> References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im> <524C960C5DFC794E82BE548D825F05CF5BC6F694@EU-MBX-01.mgc.mentorg.com> Message-ID: <20121004142928.GA3766@Krystal> * Oestman, Fredrik (Fredrik_Oestman at mentor.com) wrote: > Alexandre Montplaisir wrote: > > As for graphical views, like timegraphs, it's not easy to have a general > > view that can work with any UST trace. Each application defines its own > > event types, so we have no guarantee for any given event type to be there. > > Maybe this doesn't really help you, but our commercial tool can do just that. > > Cheers, > > Fredrik Östman > > http://go.mentor.com/sourceryanalyzer/ Hi Fredrik, Hinting at this tool in this context (discussion of an open source viewer implementation) on lttng-dev (an open source project mailing list) can only mean that you seem to be willing to contribute views to the open source project. That would be very welcome indeed. What are Mentor's plans in contributing to the open source viewer efforts of the LTTng project ? Thank you, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc.
http://www.efficios.com From dgoulet at efficios.com Thu Oct 4 10:55:04 2012 From: dgoulet at efficios.com (David Goulet) Date: Thu, 4 Oct 2012 10:55:04 -0400 Subject: [lttng-dev] [PATCH v3 lttng-tools] Fix: Stream allocation and insertion consistency Message-ID: <1349362504-18296-1-git-send-email-dgoulet@efficios.com> The stream allocation in the consumer was doing ustctl actions on the stream and updating refcounts. However, before inserting the stream into the hash table and polling on the fd for data, an error could occur which could stop the stream insertion hence creating multiple fd leaks, mem leaks and bad refcount state. Furthermore, the consumer_del_stream now can destroy a stream even if that stream is not added to the global hash table. The kernel and UST consumers use it on error between allocation and hash table insertion. Signed-off-by: David Goulet --- src/common/consumer.c | 217 ++++++++++++++++++-------- src/common/consumer.h | 3 +- src/common/kernel-consumer/kernel-consumer.c | 13 +- src/common/ust-consumer/ust-consumer.c | 20 ++- src/common/ust-consumer/ust-consumer.h | 4 +- 5 files changed, 181 insertions(+), 76 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 161bf7e..a3fbef6 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -230,7 +230,8 @@ void consumer_flag_relayd_for_destroy(struct consumer_relayd_sock_pair *relayd) * Remove a stream from the global list protected by a mutex. This * function is also responsible for freeing its data structures. 
*/ -void consumer_del_stream(struct lttng_consumer_stream *stream) +void consumer_del_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { int ret; struct lttng_ht_iter iter; @@ -239,6 +240,11 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) assert(stream); + if (ht == NULL) { + /* Means that the stream was never added to a hash table */ + goto free_stream; + } + pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { @@ -329,7 +335,6 @@ void consumer_del_stream(struct lttng_consumer_stream *stream) free_chan = stream->chan; } - call_rcu(&stream->node.head, consumer_free_stream); end: consumer_data.need_update = 1; pthread_mutex_unlock(&consumer_data.lock); @@ -337,6 +342,9 @@ end: if (free_chan) { consumer_del_channel(free_chan); } + +free_stream: + call_rcu(&stream->node.head, consumer_free_stream); } struct lttng_consumer_stream *consumer_allocate_stream( @@ -353,7 +361,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( int *alloc_ret) { struct lttng_consumer_stream *stream; - int ret; stream = zmalloc(sizeof(*stream)); if (stream == NULL) { @@ -372,7 +379,7 @@ struct lttng_consumer_stream *consumer_allocate_stream( ERR("Unable to find channel for stream %d", stream_key); goto error; } - stream->chan->refcount++; + stream->key = stream_key; stream->shm_fd = shm_fd; stream->wait_fd = wait_fd; @@ -391,35 +398,6 @@ struct lttng_consumer_stream *consumer_allocate_stream( lttng_ht_node_init_ulong(&stream->node, stream->key); lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); - switch (consumer_data.type) { - case LTTNG_CONSUMER_KERNEL: - break; - case LTTNG_CONSUMER32_UST: - case LTTNG_CONSUMER64_UST: - stream->cpu = stream->chan->cpucount++; - ret = lttng_ustconsumer_allocate_stream(stream); - if (ret) { - *alloc_ret = -EINVAL; - goto error; - } - break; - default: - ERR("Unknown consumer_data type"); - *alloc_ret = -EINVAL; - goto error; - } - - /* - * When nb_init_streams reaches 0, we don't 
need to trigger any action in - * terms of destroying the associated channel, because the action that - * causes the count to become 0 also causes a stream to be added. The - * channel deletion will thus be triggered by the following removal of this - * stream. - */ - if (uatomic_read(&stream->chan->nb_init_streams) > 0) { - uatomic_dec(&stream->chan->nb_init_streams); - } - DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, stream->shm_fd, stream->wait_fd, @@ -439,22 +417,35 @@ end: int consumer_add_stream(struct lttng_consumer_stream *stream) { int ret = 0; - struct lttng_ht_node_ulong *node; - struct lttng_ht_iter iter; struct consumer_relayd_sock_pair *relayd; - pthread_mutex_lock(&consumer_data.lock); - /* Steal stream identifier, for UST */ - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); + assert(stream); + + DBG3("Adding consumer stream %d", stream->key); + pthread_mutex_lock(&consumer_data.lock); rcu_read_lock(); - lttng_ht_lookup(consumer_data.stream_ht, - (void *)((unsigned long) stream->key), &iter); - node = lttng_ht_iter_get_node_ulong(&iter); - if (node != NULL) { - rcu_read_unlock(); - /* Stream already exist. 
Ignore the insertion */ - goto end; + + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + stream->cpu = stream->chan->cpucount++; + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->key, consumer_data.stream_ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + ret = -ENOSYS; + goto error; } lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); @@ -464,13 +455,27 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) if (relayd != NULL) { uatomic_inc(&relayd->refcount); } - rcu_read_unlock(); - /* Update consumer data */ + /* Update channel refcount once added without error(s). */ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. + */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + /* Update consumer data once the node is inserted. */ consumer_data.stream_count++; consumer_data.need_update = 1; -end: +error: + rcu_read_unlock(); pthread_mutex_unlock(&consumer_data.lock); return ret; @@ -896,7 +901,7 @@ void lttng_consumer_cleanup(void) node) { struct lttng_consumer_stream *stream = caa_container_of(node, struct lttng_consumer_stream, node); - consumer_del_stream(stream); + consumer_del_stream(stream, consumer_data.stream_ht); } cds_lfht_for_each_entry(consumer_data.channel_ht->ht, &iter.iter, node, @@ -1545,9 +1550,12 @@ static void destroy_stream_ht(struct lttng_ht *ht) /* * Clean up a metadata stream and free its memory. 
*/ -static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) +static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { int ret; + struct lttng_ht_iter iter; + struct lttng_consumer_channel *free_chan = NULL; struct consumer_relayd_sock_pair *relayd; assert(stream); @@ -1557,6 +1565,16 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) */ assert(stream->metadata_flag); + if (ht == NULL) { + goto free_stream; + } + + rcu_read_lock(); + iter.iter.node = &stream->node.node; + ret = lttng_ht_del(ht, &iter); + assert(!ret); + rcu_read_unlock(); + pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -1574,8 +1592,8 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) default: ERR("Unknown consumer_data type"); assert(0); + goto end; } - pthread_mutex_unlock(&consumer_data.lock); if (stream->out_fd >= 0) { ret = close(stream->out_fd); @@ -1632,9 +1650,17 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) if (!uatomic_read(&stream->chan->refcount) && !uatomic_read(&stream->chan->nb_init_streams)) { /* Go for channel deletion! */ - consumer_del_channel(stream->chan); + free_chan = stream->chan; } +end: + pthread_mutex_unlock(&consumer_data.lock); + + if (free_chan) { + consumer_del_channel(free_chan); + } + +free_stream: call_rcu(&stream->node.head, consumer_free_stream); } @@ -1642,17 +1668,72 @@ static void consumer_del_metadata_stream(struct lttng_consumer_stream *stream) * Action done with the metadata stream when adding it to the consumer internal * data structures to handle it. 
*/ -static void consumer_add_metadata_stream(struct lttng_consumer_stream *stream) +static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { + int ret = 0; struct consumer_relayd_sock_pair *relayd; - /* Find relayd and, if one is found, increment refcount. */ + assert(stream); + assert(ht); + + DBG3("Adding metadata stream %d to hash table", stream->wait_fd); + + pthread_mutex_lock(&consumer_data.lock); + + switch (consumer_data.type) { + case LTTNG_CONSUMER_KERNEL: + break; + case LTTNG_CONSUMER32_UST: + case LTTNG_CONSUMER64_UST: + ret = lttng_ustconsumer_add_stream(stream); + if (ret) { + ret = -EINVAL; + goto error; + } + + /* Steal stream identifier only for UST */ + consumer_steal_stream_key(stream->wait_fd, ht); + break; + default: + ERR("Unknown consumer_data type"); + assert(0); + ret = -ENOSYS; + goto error; + } + + /* + * From here, refcounts are updated so be _careful_ when returning an error + * after this point. + */ + rcu_read_lock(); + /* Find relayd and, if one is found, increment refcount. */ relayd = consumer_find_relayd(stream->net_seq_idx); if (relayd != NULL) { uatomic_inc(&relayd->refcount); } + + /* Update channel refcount once added without error(s). */ + uatomic_inc(&stream->chan->refcount); + + /* + * When nb_init_streams reaches 0, we don't need to trigger any action in + * terms of destroying the associated channel, because the action that + * causes the count to become 0 also causes a stream to be added. The + * channel deletion will thus be triggered by the following removal of this + * stream. 
+ */ + if (uatomic_read(&stream->chan->nb_init_streams) > 0) { + uatomic_dec(&stream->chan->nb_init_streams); + } + + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); rcu_read_unlock(); + +error: + pthread_mutex_unlock(&consumer_data.lock); + return ret; } /* @@ -1749,17 +1830,17 @@ restart: DBG("Adding metadata stream %d to poll set", stream->wait_fd); - rcu_read_lock(); - /* The node should be init at this point */ - lttng_ht_add_unique_ulong(metadata_ht, - &stream->waitfd_node); - rcu_read_unlock(); + ret = consumer_add_metadata_stream(stream, metadata_ht); + if (ret) { + ERR("Unable to add metadata stream"); + /* Stream was not setup properly. Continuing. */ + consumer_del_metadata_stream(stream, NULL); + continue; + } /* Add metadata stream to the global poll events list */ lttng_poll_add(&events, stream->wait_fd, LPOLLIN | LPOLLPRI); - - consumer_add_metadata_stream(stream); } /* Metadata pipe handled. Continue handling the others */ @@ -1817,11 +1898,8 @@ restart: } } - /* Removing it from hash table, poll set and free memory */ - lttng_ht_del(metadata_ht, &iter); - lttng_poll_del(&events, stream->wait_fd); - consumer_del_metadata_stream(stream); + consumer_del_metadata_stream(stream, metadata_ht); } rcu_read_unlock(); } @@ -2015,19 +2093,22 @@ void *lttng_consumer_thread_poll_fds(void *data) if ((pollfd[i].revents & POLLHUP)) { DBG("Polling fd %d tells it has hung up.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i]); + consumer_del_stream(local_stream[i], + consumer_data.stream_ht); num_hup++; } } else if (pollfd[i].revents & POLLERR) { ERR("Error returned in polling fd %d.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i]); + consumer_del_stream(local_stream[i], + consumer_data.stream_ht); num_hup++; } } else if (pollfd[i].revents & POLLNVAL) { ERR("Polling fd %d tells fd is not open.", pollfd[i].fd); if (!local_stream[i]->data_read) { - 
consumer_del_stream(local_stream[i]); + consumer_del_stream(local_stream[i], + consumer_data.stream_ht); num_hup++; } } diff --git a/src/common/consumer.h b/src/common/consumer.h index 9a93c42..5af38e1 100644 --- a/src/common/consumer.h +++ b/src/common/consumer.h @@ -341,7 +341,8 @@ extern struct lttng_consumer_stream *consumer_allocate_stream( int metadata_flag, int *alloc_ret); extern int consumer_add_stream(struct lttng_consumer_stream *stream); -extern void consumer_del_stream(struct lttng_consumer_stream *stream); +extern void consumer_del_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht); extern void consumer_change_stream_state(int stream_key, enum lttng_consumer_stream_state state); extern void consumer_del_channel(struct lttng_consumer_channel *channel); diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c index 4d61cc5..13cbe21 100644 --- a/src/common/kernel-consumer/kernel-consumer.c +++ b/src/common/kernel-consumer/kernel-consumer.c @@ -206,18 +206,20 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, &new_stream->relayd_stream_id); pthread_mutex_unlock(&relayd->ctrl_sock_mutex); if (ret < 0) { + consumer_del_stream(new_stream, NULL); goto end_nosignal; } } else if (msg.u.stream.net_index != -1) { ERR("Network sequence index %d unknown. Not adding stream.", msg.u.stream.net_index); - free(new_stream); + consumer_del_stream(new_stream, NULL); goto end_nosignal; } if (ctx->on_recv_stream) { ret = ctx->on_recv_stream(new_stream); if (ret < 0) { + consumer_del_stream(new_stream, NULL); goto end_nosignal; } } @@ -230,9 +232,16 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, } while (ret < 0 && errno == EINTR); if (ret < 0) { PERROR("write metadata pipe"); + consumer_del_stream(new_stream, NULL); } } else { - consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. 
Continuing", + new_stream->key); + consumer_del_stream(new_stream, NULL); + goto end_nosignal; + } } DBG("Kernel consumer_add_stream (%d)", fd); diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index 76238a0..69da765 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -234,12 +234,13 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, &new_stream->relayd_stream_id); pthread_mutex_unlock(&relayd->ctrl_sock_mutex); if (ret < 0) { + consumer_del_stream(new_stream, NULL); goto end_nosignal; } } else if (msg.u.stream.net_index != -1) { ERR("Network sequence index %d unknown. Not adding stream.", msg.u.stream.net_index); - free(new_stream); + consumer_del_stream(new_stream, NULL); goto end_nosignal; } @@ -247,6 +248,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, if (ctx->on_recv_stream) { ret = ctx->on_recv_stream(new_stream); if (ret < 0) { + consumer_del_stream(new_stream, NULL); goto end_nosignal; } } @@ -259,9 +261,21 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, } while (ret < 0 && errno == EINTR); if (ret < 0) { PERROR("write metadata pipe"); + consumer_del_stream(new_stream, NULL); + goto end_nosignal; } } else { - consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. Continuing", + new_stream->key); + /* + * At this point, if the add_stream fails, it is not in the + * hash table thus passing the NULL value here. 
+ */ + consumer_del_stream(new_stream, NULL); + goto end_nosignal; + } } DBG("UST consumer_add_stream %s (%d,%d) with relayd id %" PRIu64, @@ -373,7 +387,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) ustctl_unmap_channel(chan->handle); } -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) { struct lttng_ust_object_data obj; int ret; diff --git a/src/common/ust-consumer/ust-consumer.h b/src/common/ust-consumer/ust-consumer.h index 3f76f23..6b507ed 100644 --- a/src/common/ust-consumer/ust-consumer.h +++ b/src/common/ust-consumer/ust-consumer.h @@ -49,7 +49,7 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, extern int lttng_ustconsumer_allocate_channel(struct lttng_consumer_channel *chan); extern void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan); -extern int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream); +extern int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream); extern void lttng_ustconsumer_del_stream(struct lttng_consumer_stream *stream); int lttng_ustconsumer_read_subbuffer(struct lttng_consumer_stream *stream, @@ -117,7 +117,7 @@ void lttng_ustconsumer_del_channel(struct lttng_consumer_channel *chan) } static inline -int lttng_ustconsumer_allocate_stream(struct lttng_consumer_stream *stream) +int lttng_ustconsumer_add_stream(struct lttng_consumer_stream *stream) { return -ENOSYS; } -- 1.7.10.4 From paulmck at linux.vnet.ibm.com Thu Oct 4 14:51:20 2012 From: paulmck at linux.vnet.ibm.com (Paul E. 
McKenney) Date: Thu, 4 Oct 2012 11:51:20 -0700 Subject: [lttng-dev] [rp] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue) In-Reply-To: <20121003210436.GB25090@Krystal> References: <20121002141307.GA4057@Krystal> <20121003182846.GN2527@linux.vnet.ibm.com> <20121003210436.GB25090@Krystal> Message-ID: <20121004185120.GA23484@linux.vnet.ibm.com> On Wed, Oct 03, 2012 at 05:04:36PM -0400, Mathieu Desnoyers wrote: > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > On Tue, Oct 02, 2012 at 10:13:07AM -0400, Mathieu Desnoyers wrote: > > > Implement wait-free concurrent queues, with a new API different from > > > wfqueue.h, which is already provided by Userspace RCU. The advantage of > > > splitting the head and tail objects of the queue into different > > > arguments is to allow these to sit on different cache-lines, thus > > > eliminating false-sharing, leading to a 2.3x speed increase. > > > > > > This API also introduces a "splice" operation, which moves all nodes > > > from one queue into another, and postpones the synchronization to either > > > dequeue or iteration on the list. The splice operation does not need to > > > touch every single node of the queue it moves them from. Moreover, the > > > splice operation only needs to ensure mutual exclusion with other > > > dequeuers, iterations, and splice operations from the list it splices > > > from, but acts as a simple enqueuer on the list it splices into (no > > > mutual exclusion needed for that list). > > > > > > Feedback is welcome, > > > > These look sane to me, though I must confess that the tail pointer > > referencing the node rather than the node's next pointer did throw > > me for a bit. ;-) > > Yes, this was originally introduced with Lai's original patch to > wfqueue, which I think is a nice simplification: it's pretty much the > same thing to use the last node address as tail rather than the address > of its first member (its next pointer address (_not_ value)). 
It ends up
> being the same address in this case, but more interestingly, we don't
> have to use a struct cds_wfcq_node ** type: a simple struct
> cds_wfcq_node * suffices.
>
> Thanks Paul, I will therefore merge these 3 patches with your Acked-by.

Good point -- just confirming:

Acked-by: Paul E. McKenney

> Lai, you are welcome to provide improvements to this code against the
> master branch. I will gladly consider any change you propose.
>
> Thanks!
>
> Mathieu
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
>
> _______________________________________________
> rp mailing list
> rp at svcs.cs.pdx.edu
> http://svcs.cs.pdx.edu/mailman/listinfo/rp
>

From iurnah at gmail.com Thu Oct 4 13:58:32 2012
From: iurnah at gmail.com (Rui Han)
Date: Thu, 4 Oct 2012 13:58:32 -0400
Subject: [lttng-dev] about sys_read, count argument,
Message-ID:

Hi all,

I traced a PDF reader process in my virtual machine and found that the count argument of some sys_read events is extremely large; one of them has the value "7161056817779312906". This cannot be true. I need to know exactly how many bytes are read and written by the system calls. Does anyone know what happens here?

Thank you very much.

Rui

From paul.chavent at fnac.net Thu Oct 4 16:39:11 2012
From: paul.chavent at fnac.net (Paul Chavent)
Date: Thu, 04 Oct 2012 22:39:11 +0200
Subject: [lttng-dev] Viewing userspace apps traces
In-Reply-To: <506D91B0.2070708@ericsson.com>
References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im> <506BD225.3030906@fnac.net> <506D0581.3030400@polymtl.ca> <506D91B0.2070708@ericsson.com>
Message-ID: <506DF3EF.9070201@fnac.net>

Hi,

I followed this link
(http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz)
to download the Eclipse environment.
Here is the version notes i can give you : ************** Eclipse Version: Juno Release Build id: 20120614-1722 Eclipse Linux Tools LTTng ... 1.0.0.201206130106 ************** Then i've got my traces with : lttng create lttng enable-event -u -a lttng add-context -u -t vpid -t vtid lttng start ... lttng stop lttng destroy Here is a sample of the traces : http://paul.chavent.free.fr/tmp/sample_traces.tar.bz2 I can open the traces in eclipse (i see the "Events" panel), but i can't see the events ordered by pid/tid in the "Time Chart" panel, neither "Control Flow", nor "Ressources" ...) Is it possible to have the traces displayed sorted by tid for instance ? Thanks. Paul. On 10/04/2012 03:40 PM, eamcs/eedbhu wrote: > Hi Paul > > The Eclipse Linux Tools release v1.1.1 (which is also included in the CDT EPP Juno SR1 release) contains an update that displays CTF context information in the events table. > So you should be able to see the context information. > > Best Regards > Bernd > > > On 10/03/2012 11:41 PM, Alexandre Montplaisir wrote: >> Sorry for the late reply, I had to update/rebase some of the stuff first... >> >> (Cross-posting linuxtools-dev, as it might interest some people who >> follow that list too. This is about extending TMF to implement a >> graphical view for a specific UST trace type.) >> >> >> On 12-10-03 01:50 AM, Paul Chavent wrote: >>> [...] >>> >>> On 10/02/2012 10:29 PM, Alexandre Montplaisir wrote: >>>> [...] >>>> >>>> What would you like see in your "timeline representation" exactly? Maybe >>>> we could give you some pointers as to how to implement such a view. >>>> (We're currently working on making it easy to extend the framework to >>>> implement new views, so this could be a good exercise!) >>> I would like to see, eg, one line per tid, and on each line, the value >>> of one context or argument value. >>> >>> I'm ready to follow an exercise for extending the framework ! >> Ok good! 
We don't have a nice tutorial ready yet, as most parts are >> still working their way upstream. But if you want to dig into it and try >> it out now, you can: >> >> 1 - Set up the development environment for TMF: >> http://wiki.eclipse.org/Linux_Tools_Project/LTTng_Eclipse_Plug-in_Development_Environement_Setup >> >> 2 - Checkout the "lttng-kepler" branch in the git. This is where the >> latest development happens. >> >> 3 - Apply those two patches, in that order: >> https://git.eclipse.org/r/#/c/7747/ >> https://git.eclipse.org/r/#/c/7748/ >> (you can copy-paste the "cherry-pick" command shown on the page) >> >> 4 - Download the example program and view from: >> git://git.dorsal.polymtl.ca/~alexmont/ust-example.git >> >> Now at this point you should be able to import and build the example >> plugins (ust.example.core and ust.example.ui) and the TMF/LTTng ones in >> the same workspace. >> >> You can try it to make sure it works correctly : take a UST trace of the >> "myprog" program, and then load it into TMF, and show the "Example -> >> Connections" view. It should display the yellow and green rectangles >> corresponding to the states that were defined. >> >> >> After that, it shouldn't be too hard (famous last words...) to rework >> the ust.example.* code to fit your application. The points of interest >> will be (before renames): >> >> MyUstTraceInput, line 85+: This is where you assign your trace events to >> states >> ConnectionsPresentationProvider, line 31-34: This is where you assign >> the colors to each state in the view >> and same file, lines 64-68 and 81-85 : This is where you assign the >> trace's states to the ones in the view. >> (one place is for the actual colored rectangle, the other is for the >> tooltips, iirc). >> >> >> If you have any question or problem, please let me know! 
>> >> Good luck ;) >> >

From francis.giraldeau at gmail.com Thu Oct 4 16:38:52 2012
From: francis.giraldeau at gmail.com (Francis Giraldeau)
Date: Thu, 04 Oct 2012 16:38:52 -0400
Subject: [lttng-dev] about sys_read, count argument,
In-Reply-To:
References:
Message-ID: <506DF3DC.50508@gmail.com>

On 2012-10-04 13:58, Rui Han wrote:
>
> I traced a PDF reader process in my virtual machine and found that the
> count argument of some sys_read events is extremely large; one of them has
> the value "7161056817779312906". This cannot be true. I need to know
> exactly how many bytes are read and written by the system calls.
> Does anyone know what happens here?

The return value of the exit_syscall event that follows for this thread indicates the read size.

Cheers,

Francis

From Bernd.Hufmann at ericsson.com Fri Oct 5 07:44:24 2012
From: Bernd.Hufmann at ericsson.com (eamcs/eedbhu)
Date: Fri, 5 Oct 2012 07:44:24 -0400
Subject: [lttng-dev] LTTng Tools 2.1 streaming commands
In-Reply-To: <506C837B.5070402@efficios.com>
References: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se> <506C837B.5070402@efficios.com>
Message-ID: <506EC818.7050708@ericsson.com>

Hi David

Thanks for the detailed explanation.

I'm not debating the purpose of the command or that a two-step approach is supported for enabling the consumer. As you described, it makes sense to have this possibility.

However, I'm mainly debating the name of the command "enable-consumer". The command suggests that the consumer will be enabled when executing the command. I foresee that a lot of users expect that the consumer is enabled after the command execution. I think a different name would improve the user experience.
Best Regards
Bernd

On 10/03/2012 02:27 PM, David Goulet wrote:
> Hi Bernd,
>
> The enable-consumer command, by default, always enables a consumer. We added the
> "extended options" such as -U/-C/-D/-e to control each part of the
> API. So, let's say,
>
> $ lttng enable-consumer -k net://localhost
>
> This command does two API calls underneath, which are
> lttng_set_consumer_uri and lttng_enable_consumer.
>
> However, the set_consumer_uri call can take arbitrarily long because it has to
> connect to the relayd (if remote) and set the session. It adds "unknown"
> latency to the command. So, for someone willing to control the full time
> window of the streaming setup using the API, it is divided in two calls.
>
> This is why we added the extended options so the command line UI could
> also be controlled on a per API call basis.
>
> Is it clear enough?
>
> Reply continues below:
>
> Bernd Hufmann:
>> Hello
>>
>> For the support of LTTng Tools 2.1 in Eclipse, I'm currently trying to
>> understand how to use the configuration for network streaming with the
>> updated "lttng create"-command and new "enable-consumer"-command.
>>
>> a) lttng enable-consumer
>> I find this command confusing because this command does not always
>> enable the consumer, even if the command name implies so. The enabling
>> actually depends on how the command is executed.
>> Examples:
>>
>> * "lttng enable-consumer -k -U net://" or "lttng
>> enable-consumer -k -C tcp:// -D tcp://"
>> don't enable the consumer. You need to either add option --enable or
>> execute subsequently "lttng enable-consumer --enable"
>> * lttng enable-consumer -k net:// does enable the
>> consumer. It took me a while to figure out the difference to the
>> example above: The option -U is omitted.
>>
>>
>> What the command actually provides is 2 features: a way to configure
>> streaming (e.g. remote_addr) and a way to enable the consumer. Would it
>> be better to rename it to "lttng configure-consumer"?
Also, remove the >> support of the possibility to not specify -U, -C or -D. The following >> variants of this command should be enough: >> lttng configure-consumer -k -U [--enable] >> lttng configure-consumer -k -C -D [--enable] >> lttng configure-consumer -k --enable >> lttng configure-consumer -u -U [--enable] >> lttng configure-consumer -u -C -D [--enable] >> lttng configure-consumer -u --enable >> >> Please let me know what you think. >> >> b) lttng create [-U] | [-C -D] >> [--no-consumer] [--disable-consumer] >> >> * Are options --no-consumer or --disable-consumer only applicable for >> streaming? > No, also for local consumer. > >> * I'm not sure what is the purpose of the options --no-consumer or >> --disable-consumer. Could you please explain the use cases for using >> --no-consumer or --disable-consumer? > This basically disable the consumer for a tracing session. It's not very > useful for now but for upcoming snapshots and live tracing, it will make > way more sense! :). > > Again, same idea, the API can control the consumer "state" > (enable/disable), so we added these options for the UI. > > Cheers! > David > >> >> Thanks >> Bernd >> >> This Communication is Confidential. We only send and receive email on >> the basis of the terms set out at _www.ericsson.com/email_disclaimer_ >> >> >> >> >> This body part will be downloaded on demand. From bhufmann at gmail.com Fri Oct 5 08:15:10 2012 From: bhufmann at gmail.com (Bernd Hufmann) Date: Fri, 5 Oct 2012 08:15:10 -0400 Subject: [lttng-dev] Viewing userspace apps traces In-Reply-To: <506DF3EF.9070201@fnac.net> References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im> <506BD225.3030906@fnac.net> <506D0581.3030400@polymtl.ca> <506D91B0.2070708@ericsson.com> <506DF3EF.9070201@fnac.net> Message-ID: Hi Paul I was trying your trace with the Eclipse Juno SR1 CDT EPP and you're right the context information is not displayed in the Events Table (I'm not talking about the Time Graph and its table). 
I was verifying the commits and the commit that added the support for displaying of context information in the Events Table is in the release. As well as the example trace with context information I generated and tested with works. However, it was a Kernel trace with context information and not a UST trace. So, that needs to be investigated on our side (Eclipse developer) why the context information is not shown for the UST trace. We'll let you know. Best Regard Bernd On Thu, Oct 4, 2012 at 4:39 PM, Paul Chavent wrote: > Hi, > > I've follow this link > (http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz) > to download the Eclipse env. > > Here is the version notes i can give you : > ************** > Eclipse > Version: Juno Release > Build id: 20120614-1722 > > Eclipse Linux Tools LTTng ... 1.0.0.201206130106 > ************** > > Then i've got my traces with : > > lttng create > lttng enable-event -u -a > lttng add-context -u -t vpid -t vtid > lttng start > ... > lttng stop > lttng destroy > > Here is a sample of the traces : > http://paul.chavent.free.fr/tmp/sample_traces.tar.bz2 > > > I can open the traces in eclipse (i see the "Events" panel), but i can't see > the events ordered by pid/tid in the "Time Chart" panel, neither "Control > Flow", nor "Ressources" ...) > > Is it possible to have the traces displayed sorted by tid for instance ? > > Thanks. > > Paul. > > > > On 10/04/2012 03:40 PM, eamcs/eedbhu wrote: >> >> Hi Paul >> >> The Eclipse Linux Tools release v1.1.1 (which is also included in the CDT >> EPP Juno SR1 release) contains an update that displays CTF context >> information in the events table. >> So you should be able to see the context information. >> >> Best Regards >> Bernd >> >> >> On 10/03/2012 11:41 PM, Alexandre Montplaisir wrote: >>> >>> Sorry for the late reply, I had to update/rebase some of the stuff >>> first... 
>>> >>> (Cross-posting linuxtools-dev, as it might interest some people who >>> follow that list too. This is about extending TMF to implement a >>> graphical view for a specific UST trace type.) >>> >>> >>> On 12-10-03 01:50 AM, Paul Chavent wrote: >>>> >>>> [...] >>>> >>>> On 10/02/2012 10:29 PM, Alexandre Montplaisir wrote: >>>>> >>>>> [...] >>>>> >>>>> What would you like see in your "timeline representation" exactly? >>>>> Maybe >>>>> we could give you some pointers as to how to implement such a view. >>>>> (We're currently working on making it easy to extend the framework to >>>>> implement new views, so this could be a good exercise!) >>>> >>>> I would like to see, eg, one line per tid, and on each line, the value >>>> of one context or argument value. >>>> >>>> I'm ready to follow an exercise for extending the framework ! >>> >>> Ok good! We don't have a nice tutorial ready yet, as most parts are >>> still working their way upstream. But if you want to dig into it and try >>> it out now, you can: >>> >>> 1 - Set up the development environment for TMF: >>> >>> http://wiki.eclipse.org/Linux_Tools_Project/LTTng_Eclipse_Plug-in_Development_Environement_Setup >>> >>> 2 - Checkout the "lttng-kepler" branch in the git. This is where the >>> latest development happens. >>> >>> 3 - Apply those two patches, in that order: >>> https://git.eclipse.org/r/#/c/7747/ >>> https://git.eclipse.org/r/#/c/7748/ >>> (you can copy-paste the "cherry-pick" command shown on the page) >>> >>> 4 - Download the example program and view from: >>> git://git.dorsal.polymtl.ca/~alexmont/ust-example.git >>> >>> Now at this point you should be able to import and build the example >>> plugins (ust.example.core and ust.example.ui) and the TMF/LTTng ones in >>> the same workspace. >>> >>> You can try it to make sure it works correctly : take a UST trace of the >>> "myprog" program, and then load it into TMF, and show the "Example -> >>> Connections" view. 
It should display the yellow and green rectangles >>> corresponding to the states that were defined. >>> >>> >>> After that, it shouldn't be too hard (famous last words...) to rework >>> the ust.example.* code to fit your application. The points of interest >>> will be (before renames): >>> >>> MyUstTraceInput, line 85+: This is where you assign your trace events to >>> states >>> ConnectionsPresentationProvider, line 31-34: This is where you assign >>> the colors to each state in the view >>> and same file, lines 64-68 and 81-85 : This is where you assign the >>> trace's states to the ones in the view. >>> (one place is for the actual colored rectangle, the other is for the >>> tooltips, iirc). >>> >>> >>> If you have any question or problem, please let me know! >>> >>> Good luck ;) >>> >> > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From dgoulet at efficios.com Fri Oct 5 09:13:37 2012 From: dgoulet at efficios.com (David Goulet) Date: Fri, 05 Oct 2012 09:13:37 -0400 Subject: [lttng-dev] LTTng Tools 2.1 streaming commands In-Reply-To: <506EC818.7050708@ericsson.com> References: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se> <506C837B.5070402@efficios.com> <506EC818.7050708@ericsson.com> Message-ID: <506EDD01.60704@efficios.com> Right. I guess this discussion can continue and we can come up with a community consensus. This is a matter of UI naming so it should be decided now before we release stable since for the complete 2.x cycle we'll have to support this. Anyone wants to ship in here? Thanks! David eamcs/eedbhu: > Hi David > > thanks for the detailed explanation. > > I'm not debating the purpose of the command or that a 2 step approach is > supported for enabling the consumer. As you described it makes sense to > have this possibility. > > However I'm debating mainly the name of the command "enable-consumer". 
> The command suggests that the consumer will be enabled when executing
> the command. I foresee that a lot of users expect that the consumer is
> enabled after the command execution. I think a different name would
> improve the user experience.
>
> Best Regards
> Bernd
>
> On 10/03/2012 02:27 PM, David Goulet wrote:
>> Hi Bernd,
>>
>> The enable-consumer command, by default, always enables a consumer. We added the
>> "extended options" such as -U/-C/-D/-e to control each part of the
>> API. So, let's say,
>>
>> $ lttng enable-consumer -k net://localhost
>>
>> This command does two API calls underneath, which are
>> lttng_set_consumer_uri and lttng_enable_consumer.
>>
>> However, the set_consumer_uri call can take arbitrarily long because it has to
>> connect to the relayd (if remote) and set the session. It adds "unknown"
>> latency to the command. So, for someone willing to control the full time
>> window of the streaming setup using the API, it is divided in two calls.
>>
>> This is why we added the extended options so the command line UI could
>> also be controlled on a per API call basis.
>>
>> Is it clear enough?
>>
>> Reply continues below:
>>
>> Bernd Hufmann:
>>> Hello
>>>
>>> For the support of LTTng Tools 2.1 in Eclipse, I'm currently trying to
>>> understand how to use the configuration for network streaming with the
>>> updated "lttng create"-command and new "enable-consumer"-command.
>>>
>>> a) lttng enable-consumer
>>> I find this command confusing because this command does not always
>>> enable the consumer, even if the command name implies so. The enabling
>>> actually depends on how the command is executed.
>>> Examples:
>>>
>>> * "lttng enable-consumer -k -U net://" or "lttng
>>> enable-consumer -k -C tcp:// -D tcp://"
>>> don't enable the consumer. You need to either add option --enable or
>>> execute subsequently "lttng enable-consumer --enable"
>>> * lttng enable-consumer -k net:// does enable the
>>> consumer.
It took me a while to figure out the difference to the
>>> example above: The option -U is omitted.
>>>
>>>
>>> What the command actually provides is 2 features: a way to configure
>>> streaming (e.g. remote_addr) and a way to enable the consumer. Would it
>>> be better to rename it to "lttng configure-consumer"? Also, remove the
>>> possibility of not specifying -U, -C or -D. The following
>>> variants of this command should be enough:
>>> lttng configure-consumer -k -U [--enable]
>>> lttng configure-consumer -k -C -D [--enable]
>>> lttng configure-consumer -k --enable
>>> lttng configure-consumer -u -U [--enable]
>>> lttng configure-consumer -u -C -D [--enable]
>>> lttng configure-consumer -u --enable
>>>
>>> Please let me know what you think.
>>>
>>> b) lttng create [-U] | [-C -D]
>>> [--no-consumer] [--disable-consumer]
>>>
>>> * Are options --no-consumer or --disable-consumer only applicable for
>>> streaming?
>> No, also for local consumer.
>>
>>> * I'm not sure what the purpose of the options --no-consumer or
>>> --disable-consumer is. Could you please explain the use cases for
>>> using --no-consumer or --disable-consumer?
>> This basically disables the consumer for a tracing session. It's not very
>> useful for now but for upcoming snapshots and live tracing, it will make
>> way more sense! :).
>>
>> Again, same idea, the API can control the consumer "state"
>> (enable/disable), so we added these options for the UI.
>>
>> Cheers!
>> David
>>
>>>
>>> Thanks
>>> Bernd
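
The behaviour debated in this thread can be summarised with a small model. The C sketch below is only an illustration of the rules as the posters describe them for the pre-release enable-consumer command: a bare URL configures the destination and enables the consumer, the extended options (-U, or -C with -D) only configure it, and --enable enables it. Every name in it (consumer_request, consumer_state, model_set_uri, model_enable, run_enable_consumer) is hypothetical; it does not call lttng_set_consumer_uri or lttng_enable_consumer and is not taken from lttng-tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one "enable-consumer" invocation (not lttng-tools code). */
struct consumer_request {
	const char *bare_url;	/* URL given without -U/-C/-D, or NULL */
	const char *opt_url;	/* URL given through -U or -C/-D, or NULL */
	bool enable_flag;	/* true when --enable was passed */
};

/* What the invocation ended up doing. */
struct consumer_state {
	bool uri_set;	/* a destination was configured */
	bool enabled;	/* the consumer was enabled */
};

/* Stand-ins for the two API steps named in the thread. */
static void model_set_uri(struct consumer_state *s)
{
	s->uri_set = true;
}

static void model_enable(struct consumer_state *s)
{
	s->enabled = true;
}

/* Apply the rules described by the posters to a single invocation. */
static struct consumer_state run_enable_consumer(const struct consumer_request *req)
{
	struct consumer_state s = { false, false };

	assert(req != NULL);

	if (req->bare_url != NULL) {
		/* Default form: both steps happen in one command. */
		model_set_uri(&s);
		model_enable(&s);
		return s;
	}
	if (req->opt_url != NULL) {
		/* Extended form: only the destination is configured. */
		model_set_uri(&s);
	}
	if (req->enable_flag) {
		/* --enable, alone or combined with an extended option. */
		model_enable(&s);
	}
	return s;
}
```

Under this model, a request carrying only opt_url leaves the consumer disabled, which is the case Bernd found surprising given the command name; a rename such as the proposed configure-consumer would not change these rules, only make the name match them.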
> > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From paul.chavent at fnac.net Fri Oct 5 15:29:10 2012 From: paul.chavent at fnac.net (Paul Chavent) Date: Fri, 05 Oct 2012 21:29:10 +0200 Subject: [lttng-dev] Viewing userspace apps traces In-Reply-To: References: <506B4A21.6080309@fnac.net> <506B4EA4.8050407@voxpopuli.im> <506BD225.3030906@fnac.net> <506D0581.3030400@polymtl.ca> <506D91B0.2070708@ericsson.com> <506DF3EF.9070201@fnac.net> Message-ID: <506F3506.1010708@fnac.net> Thank you very much for handling this point. On 10/05/2012 02:15 PM, Bernd Hufmann wrote: > Hi Paul > > I was trying your trace with the Eclipse Juno SR1 CDT EPP and you're > right the context information is not displayed in the Events Table > (I'm not talking about the Time Graph and its table). > I was verifying the commits and the commit that added the support for > displaying of context information in the Events Table is in the > release. As well as the example trace with context > information I generated and tested with works. However, it was a > Kernel trace with context information and not a UST trace. > > So, that needs to be investigated on our side (Eclipse developer) why > the context information is not shown for the UST trace. We'll let you > know. > > Best Regard > Bernd > > On Thu, Oct 4, 2012 at 4:39 PM, Paul Chavent wrote: >> Hi, >> >> I've follow this link >> (http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-cpp-juno-SR1-linux-gtk-x86_64.tar.gz) >> to download the Eclipse env. >> >> Here is the version notes i can give you : >> ************** >> Eclipse >> Version: Juno Release >> Build id: 20120614-1722 >> >> Eclipse Linux Tools LTTng ... 
1.0.0.201206130106 >> ************** >> >> Then i've got my traces with : >> >> lttng create >> lttng enable-event -u -a >> lttng add-context -u -t vpid -t vtid >> lttng start >> ... >> lttng stop >> lttng destroy >> >> Here is a sample of the traces : >> http://paul.chavent.free.fr/tmp/sample_traces.tar.bz2 >> >> >> I can open the traces in eclipse (i see the "Events" panel), but i can't see >> the events ordered by pid/tid in the "Time Chart" panel, neither "Control >> Flow", nor "Ressources" ...) >> >> Is it possible to have the traces displayed sorted by tid for instance ? >> >> Thanks. >> >> Paul. >> >> >> >> On 10/04/2012 03:40 PM, eamcs/eedbhu wrote: >>> >>> Hi Paul >>> >>> The Eclipse Linux Tools release v1.1.1 (which is also included in the CDT >>> EPP Juno SR1 release) contains an update that displays CTF context >>> information in the events table. >>> So you should be able to see the context information. >>> >>> Best Regards >>> Bernd >>> >>> >>> On 10/03/2012 11:41 PM, Alexandre Montplaisir wrote: >>>> >>>> Sorry for the late reply, I had to update/rebase some of the stuff >>>> first... >>>> >>>> (Cross-posting linuxtools-dev, as it might interest some people who >>>> follow that list too. This is about extending TMF to implement a >>>> graphical view for a specific UST trace type.) >>>> >>>> >>>> On 12-10-03 01:50 AM, Paul Chavent wrote: >>>>> >>>>> [...] >>>>> >>>>> On 10/02/2012 10:29 PM, Alexandre Montplaisir wrote: >>>>>> >>>>>> [...] >>>>>> >>>>>> What would you like see in your "timeline representation" exactly? >>>>>> Maybe >>>>>> we could give you some pointers as to how to implement such a view. >>>>>> (We're currently working on making it easy to extend the framework to >>>>>> implement new views, so this could be a good exercise!) >>>>> >>>>> I would like to see, eg, one line per tid, and on each line, the value >>>>> of one context or argument value. >>>>> >>>>> I'm ready to follow an exercise for extending the framework ! 
>>>> >>>> Ok good! We don't have a nice tutorial ready yet, as most parts are >>>> still working their way upstream. But if you want to dig into it and try >>>> it out now, you can: >>>> >>>> 1 - Set up the development environment for TMF: >>>> >>>> http://wiki.eclipse.org/Linux_Tools_Project/LTTng_Eclipse_Plug-in_Development_Environement_Setup >>>> >>>> 2 - Checkout the "lttng-kepler" branch in the git. This is where the >>>> latest development happens. >>>> >>>> 3 - Apply those two patches, in that order: >>>> https://git.eclipse.org/r/#/c/7747/ >>>> https://git.eclipse.org/r/#/c/7748/ >>>> (you can copy-paste the "cherry-pick" command shown on the page) >>>> >>>> 4 - Download the example program and view from: >>>> git://git.dorsal.polymtl.ca/~alexmont/ust-example.git >>>> >>>> Now at this point you should be able to import and build the example >>>> plugins (ust.example.core and ust.example.ui) and the TMF/LTTng ones in >>>> the same workspace. >>>> >>>> You can try it to make sure it works correctly : take a UST trace of the >>>> "myprog" program, and then load it into TMF, and show the "Example -> >>>> Connections" view. It should display the yellow and green rectangles >>>> corresponding to the states that were defined. >>>> >>>> >>>> After that, it shouldn't be too hard (famous last words...) to rework >>>> the ust.example.* code to fit your application. The points of interest >>>> will be (before renames): >>>> >>>> MyUstTraceInput, line 85+: This is where you assign your trace events to >>>> states >>>> ConnectionsPresentationProvider, line 31-34: This is where you assign >>>> the colors to each state in the view >>>> and same file, lines 64-68 and 81-85 : This is where you assign the >>>> trace's states to the ones in the view. >>>> (one place is for the actual colored rectangle, the other is for the >>>> tooltips, iirc). >>>> >>>> >>>> If you have any question or problem, please let me know! 
>>>> >>>> Good luck ;) >>>> >>> >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From laijs at cn.fujitsu.com Sun Oct 7 23:09:29 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Mon, 08 Oct 2012 11:09:29 +0800 Subject: [lttng-dev] [URCU PATCH 3/3] call_rcu: use wfcqueue, eliminate false-sharing In-Reply-To: <20121002141638.GD4057@Krystal> References: <20121002141307.GA4057@Krystal> <20121002141638.GD4057@Krystal> Message-ID: <507243E9.8010604@cn.fujitsu.com> On 10/02/2012 10:16 PM, Mathieu Desnoyers wrote: > Eliminate false-sharing between call_rcu (enqueuer) and worker threads > on the queue head and tail. > > Signed-off-by: Mathieu Desnoyers > --- > diff --git a/tests/Makefile.am b/tests/Makefile.am > index 81718bb..c92bbe6 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -30,14 +30,14 @@ if COMPAT_FUTEX > COMPAT+=$(top_srcdir)/compat_futex.c > endif > > -URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > -URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > +URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > +URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > # URCU_MB uses urcu.c but -DRCU_MB must be defined > -URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > +URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > # URCU_SIGNAL uses urcu.c but -DRCU_SIGNAL must be defined > -URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > -URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > -URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c 
$(COMPAT) > +URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > +URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > +URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > URCU_COMMON_LIB=$(top_builddir)/liburcu-common.la > URCU_LIB=$(top_builddir)/liburcu.la > diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h > index 13b24ff..cf65992 100644 > --- a/urcu-call-rcu-impl.h > +++ b/urcu-call-rcu-impl.h > @@ -21,6 +21,7 @@ > */ > > #define _GNU_SOURCE > +#define _LGPL_SOURCE > #include > #include > #include > @@ -35,7 +36,7 @@ > #include > > #include "config.h" > -#include "urcu/wfqueue.h" > +#include "urcu/wfcqueue.h" > #include "urcu-call-rcu.h" > #include "urcu-pointer.h" > #include "urcu/list.h" > @@ -46,7 +47,14 @@ > /* Data structure that identifies a call_rcu thread. */ > > struct call_rcu_data { > - struct cds_wfq_queue cbs; > + /* > + * Align the tail on cache line size to eliminate false-sharing > + * with head. > + */ > + struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_tail; > + /* Alignment on cache line size will add padding here */ > + > + struct cds_wfcq_head cbs_head; wrong here. In this code, cbs_tail and cbs_head are in the same cache line. --- struct call_rcu_data { struct cds_wfcq_tail cbs_tail; struct cds_wfcq_head __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_head; /* other fields, can move some fields up to use the room between tail and head */ }; # cat test.c struct a { int __attribute__((aligned(64))) i; int j; }; struct b { int i; int __attribute__((aligned(64))) j; }; void main(void) { printf("%d,%d\n", sizeof(struct a), sizeof(struct b)); } # ./a.out 64,128 > unsigned long flags; > int32_t futex; > unsigned long qlen; /* maintained for debugging. 
*/ > @@ -220,10 +228,7 @@ static void call_rcu_wake_up(struct call_rcu_data *crdp) > static void *call_rcu_thread(void *arg) > { > unsigned long cbcount; > - struct cds_wfq_node *cbs; > - struct cds_wfq_node **cbs_tail; > - struct call_rcu_data *crdp = (struct call_rcu_data *)arg; > - struct rcu_head *rhp; > + struct call_rcu_data *crdp = (struct call_rcu_data *) arg; > int rt = !!(uatomic_read(&crdp->flags) & URCU_CALL_RCU_RT); > int ret; > > @@ -243,35 +248,33 @@ static void *call_rcu_thread(void *arg) > cmm_smp_mb(); > } > for (;;) { > - if (&crdp->cbs.head != _CMM_LOAD_SHARED(crdp->cbs.tail)) { > - while ((cbs = _CMM_LOAD_SHARED(crdp->cbs.head)) == NULL) > - poll(NULL, 0, 1); > - _CMM_STORE_SHARED(crdp->cbs.head, NULL); > - cbs_tail = (struct cds_wfq_node **) > - uatomic_xchg(&crdp->cbs.tail, &crdp->cbs.head); > + struct cds_wfcq_head cbs_tmp_head; > + struct cds_wfcq_tail cbs_tmp_tail; > + struct cds_wfcq_node *cbs, *cbs_tmp_n; > + > + cds_wfcq_init(&cbs_tmp_head, &cbs_tmp_tail); > + __cds_wfcq_splice_blocking(&cbs_tmp_head, &cbs_tmp_tail, > + &crdp->cbs_head, &crdp->cbs_tail); > + if (!cds_wfcq_empty(&cbs_tmp_head, &cbs_tmp_tail)) { > synchronize_rcu(); > cbcount = 0; > - do { > - while (cbs->next == NULL && > - &cbs->next != cbs_tail) > - poll(NULL, 0, 1); > - if (cbs == &crdp->cbs.dummy) { > - cbs = cbs->next; > - continue; > - } > - rhp = (struct rcu_head *)cbs; > - cbs = cbs->next; > + __cds_wfcq_for_each_blocking_safe(&cbs_tmp_head, > + &cbs_tmp_tail, cbs, cbs_tmp_n) { > + struct rcu_head *rhp; > + > + rhp = caa_container_of(cbs, > + struct rcu_head, next); > rhp->func(rhp); > cbcount++; > - } while (cbs != NULL); > + } > uatomic_sub(&crdp->qlen, cbcount); > } > if (uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOP) > break; > rcu_thread_offline(); > if (!rt) { > - if (&crdp->cbs.head > - == _CMM_LOAD_SHARED(crdp->cbs.tail)) { > + if (cds_wfcq_empty(&crdp->cbs_head, > + &crdp->cbs_tail)) { > call_rcu_wait(crdp); > poll(NULL, 0, 10); > 
uatomic_dec(&crdp->futex); > @@ -317,7 +320,7 @@ static void call_rcu_data_init(struct call_rcu_data **crdpp, > if (crdp == NULL) > urcu_die(errno); > memset(crdp, '\0', sizeof(*crdp)); > - cds_wfq_init(&crdp->cbs); > + cds_wfcq_init(&crdp->cbs_head, &crdp->cbs_tail); > crdp->qlen = 0; > crdp->futex = 0; > crdp->flags = flags; > @@ -590,12 +593,12 @@ void call_rcu(struct rcu_head *head, > { > struct call_rcu_data *crdp; > > - cds_wfq_node_init(&head->next); > + cds_wfcq_node_init(&head->next); > head->func = func; > /* Holding rcu read-side lock across use of per-cpu crdp */ > rcu_read_lock(); > crdp = get_call_rcu_data(); > - cds_wfq_enqueue(&crdp->cbs, &head->next); > + cds_wfcq_enqueue(&crdp->cbs_head, &crdp->cbs_tail, &head->next); > uatomic_inc(&crdp->qlen); > wake_call_rcu_thread(crdp); > rcu_read_unlock(); > @@ -625,10 +628,6 @@ void call_rcu(struct rcu_head *head, > */ > void call_rcu_data_free(struct call_rcu_data *crdp) > { > - struct cds_wfq_node *cbs; > - struct cds_wfq_node **cbs_tail; > - struct cds_wfq_node **cbs_endprev; > - > if (crdp == NULL || crdp == default_call_rcu_data) { > return; > } > @@ -638,17 +637,12 @@ void call_rcu_data_free(struct call_rcu_data *crdp) > while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0) > poll(NULL, 0, 1); > } > - if (&crdp->cbs.head != _CMM_LOAD_SHARED(crdp->cbs.tail)) { > - while ((cbs = _CMM_LOAD_SHARED(crdp->cbs.head)) == NULL) > - poll(NULL, 0, 1); > - _CMM_STORE_SHARED(crdp->cbs.head, NULL); > - cbs_tail = (struct cds_wfq_node **) > - uatomic_xchg(&crdp->cbs.tail, &crdp->cbs.head); > + if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) { > /* Create default call rcu data if need be */ > (void) get_default_call_rcu_data(); > - cbs_endprev = (struct cds_wfq_node **) > - uatomic_xchg(&default_call_rcu_data, cbs_tail); > - *cbs_endprev = cbs; > + __cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head, > + &default_call_rcu_data->cbs_tail, > + &crdp->cbs_head, &crdp->cbs_tail); > 
uatomic_add(&default_call_rcu_data->qlen, > uatomic_read(&crdp->qlen)); > wake_call_rcu_thread(default_call_rcu_data); > diff --git a/urcu-call-rcu.h b/urcu-call-rcu.h > index f7eac8d..1dad0e2 100644 > --- a/urcu-call-rcu.h > +++ b/urcu-call-rcu.h > @@ -32,7 +32,7 @@ > #include > #include > > -#include > +#include > > #ifdef __cplusplus > extern "C" { > @@ -55,7 +55,7 @@ struct call_rcu_data; > */ > > struct rcu_head { > - struct cds_wfq_node next; > + struct cds_wfcq_node next; > void (*func)(struct rcu_head *head); > }; > From laijs at cn.fujitsu.com Sun Oct 7 23:33:47 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Mon, 08 Oct 2012 11:33:47 +0800 Subject: [lttng-dev] [rp] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue) In-Reply-To: <20121003210436.GB25090@Krystal> References: <20121002141307.GA4057@Krystal> <20121003182846.GN2527@linux.vnet.ibm.com> <20121003210436.GB25090@Krystal> Message-ID: <5072499B.1050301@cn.fujitsu.com> On 10/04/2012 05:04 AM, Mathieu Desnoyers wrote: > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: >> On Tue, Oct 02, 2012 at 10:13:07AM -0400, Mathieu Desnoyers wrote: >>> Implement wait-free concurrent queues, with a new API different from >>> wfqueue.h, which is already provided by Userspace RCU. The advantage of >>> splitting the head and tail objects of the queue into different >>> arguments is to allow these to sit on different cache-lines, thus >>> eliminating false-sharing, leading to a 2.3x speed increase. >>> >>> This API also introduces a "splice" operation, which moves all nodes >>> from one queue into another, and postpones the synchronization to either >>> dequeue or iteration on the list. The splice operation does not need to >>> touch every single node of the queue it moves them from. 
Moreover, the >>> splice operation only needs to ensure mutual exclusion with other >>> dequeuers, iterations, and splice operations from the list it splices >>> from, but acts as a simple enqueuer on the list it splices into (no >>> mutual exclusion needed for that list). >>> >>> Feedback is welcome, >> >> These look sane to me, though I must confess that the tail pointer >> referencing the node rather than the node's next pointer did throw >> me for a bit. ;-) > > Yes, this was originally introduced with Lai's original patch to > wfqueue, which I think is a nice simplification: it's pretty much the > same thing to use the last node address as tail rather than the address > of its first member (its next pointer address (_not_ value)). It ends up > being the same address in this case, but more interestingly, we don't > have to use a struct cds_wfcq_node ** type: a simple struct > cds_wfcq_node * suffice. > > Thanks Paul, I will therefore merge these 3 patches with your Acked-by. > > Lai, you are welcome to provide improvements to this code against the > master branch. I will gladly consider any change you propose. > I did not remember that there is any improvement idea not included. The patchset is OK for me. I think you can reimplement wfqueue via wfcqueue without cacheline opt. Thanks, Lai From mathieu.desnoyers at efficios.com Mon Oct 8 10:49:16 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 8 Oct 2012 10:49:16 -0400 Subject: [lttng-dev] [URCU PATCH 3/3] call_rcu: use wfcqueue, eliminate false-sharing In-Reply-To: <507243E9.8010604@cn.fujitsu.com> References: <20121002141307.GA4057@Krystal> <20121002141638.GD4057@Krystal> <507243E9.8010604@cn.fujitsu.com> Message-ID: <20121008144916.GA29352@Krystal> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > On 10/02/2012 10:16 PM, Mathieu Desnoyers wrote: > > Eliminate false-sharing between call_rcu (enqueuer) and worker threads > > on the queue head and tail. 
> > > > Signed-off-by: Mathieu Desnoyers > > --- > > diff --git a/tests/Makefile.am b/tests/Makefile.am > > index 81718bb..c92bbe6 100644 > > --- a/tests/Makefile.am > > +++ b/tests/Makefile.am > > @@ -30,14 +30,14 @@ if COMPAT_FUTEX > > COMPAT+=$(top_srcdir)/compat_futex.c > > endif > > > > -URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > -URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > +URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > +URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > # URCU_MB uses urcu.c but -DRCU_MB must be defined > > -URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > +URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > # URCU_SIGNAL uses urcu.c but -DRCU_SIGNAL must be defined > > -URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > -URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > -URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > +URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > +URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > +URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > > > URCU_COMMON_LIB=$(top_builddir)/liburcu-common.la > > URCU_LIB=$(top_builddir)/liburcu.la > > diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h > > index 13b24ff..cf65992 100644 > > --- a/urcu-call-rcu-impl.h > > +++ b/urcu-call-rcu-impl.h > > @@ -21,6 +21,7 @@ > > */ > > > > #define _GNU_SOURCE > > +#define _LGPL_SOURCE > > #include > > #include > > #include > > @@ -35,7 
+36,7 @@ > > #include > > > > #include "config.h" > > -#include "urcu/wfqueue.h" > > +#include "urcu/wfcqueue.h" > > #include "urcu-call-rcu.h" > > #include "urcu-pointer.h" > > #include "urcu/list.h" > > @@ -46,7 +47,14 @@ > > /* Data structure that identifies a call_rcu thread. */ > > > > struct call_rcu_data { > > - struct cds_wfq_queue cbs; > > + /* > > + * Align the tail on cache line size to eliminate false-sharing > > + * with head. > > + */ > > + struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_tail; > > + /* Alignment on cache line size will add padding here */ > > + > > + struct cds_wfcq_head cbs_head; > > > wrong here. In this code, cbs_tail and cbs_head are in the same cache line. > > --- > > struct call_rcu_data { > struct cds_wfcq_tail cbs_tail; > struct cds_wfcq_head __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_head; > /* other fields, can move some fields up to use the room between tail and head */ > }; > > # cat test.c > > struct a { > int __attribute__((aligned(64))) i; > int j; > }; > struct b { > int i; > int __attribute__((aligned(64))) j; > }; > > void main(void) > { > printf("%d,%d\n", sizeof(struct a), sizeof(struct b)); > } > > # ./a.out > 64,128 > Good point! While we are there, I notice that the "qlen" count, kept for debugging, is causing false-sharing too. I wonder if we should split this counter in two counters: nr_enqueue and nr_dequeue, which would sit in two different cache lines ? It's mainly Paul who cares about this counter. Thoughts ? Here is the fix to the problem you noticed above: commit b9f893b69fbc31baea418794938f4eb74cc4923a Author: Mathieu Desnoyers Date: Mon Oct 8 10:44:38 2012 -0400 Fix urcu-call-rcu-impl.h: false-sharing > > struct call_rcu_data { > > - struct cds_wfq_queue cbs; > > + /* > > + * Align the tail on cache line size to eliminate false-sharing > > + * with head. 
> > + */ > > + struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_tail; > > + /* Alignment on cache line size will add padding here */ > > + > > + struct cds_wfcq_head cbs_head; > > > wrong here. In this code, cbs_tail and cbs_head are in the same cache line. Reported-by: Lai Jiangshan Signed-off-by: Mathieu Desnoyers Thanks! Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 8 11:07:30 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 8 Oct 2012 11:07:30 -0400 Subject: [lttng-dev] [rp] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue) In-Reply-To: <5072499B.1050301@cn.fujitsu.com> References: <20121002141307.GA4057@Krystal> <20121003182846.GN2527@linux.vnet.ibm.com> <20121003210436.GB25090@Krystal> <5072499B.1050301@cn.fujitsu.com> Message-ID: <20121008150729.GB29352@Krystal> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > On 10/04/2012 05:04 AM, Mathieu Desnoyers wrote: > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > >> On Tue, Oct 02, 2012 at 10:13:07AM -0400, Mathieu Desnoyers wrote: > >>> Implement wait-free concurrent queues, with a new API different from > >>> wfqueue.h, which is already provided by Userspace RCU. The advantage of > >>> splitting the head and tail objects of the queue into different > >>> arguments is to allow these to sit on different cache-lines, thus > >>> eliminating false-sharing, leading to a 2.3x speed increase. > >>> > >>> This API also introduces a "splice" operation, which moves all nodes > >>> from one queue into another, and postpones the synchronization to either > >>> dequeue or iteration on the list. The splice operation does not need to > >>> touch every single node of the queue it moves them from. 
Moreover, the > >>> splice operation only needs to ensure mutual exclusion with other > >>> dequeuers, iterations, and splice operations from the list it splices > >>> from, but acts as a simple enqueuer on the list it splices into (no > >>> mutual exclusion needed for that list). > >>> > >>> Feedback is welcome, > >> > >> These look sane to me, though I must confess that the tail pointer > >> referencing the node rather than the node's next pointer did throw > >> me for a bit. ;-) > > > > Yes, this was originally introduced with Lai's original patch to > > wfqueue, which I think is a nice simplification: it's pretty much the > > same thing to use the last node address as tail rather than the address > > of its first member (its next pointer address (_not_ value)). It ends up > > being the same address in this case, but more interestingly, we don't > > have to use a struct cds_wfcq_node ** type: a simple struct > > cds_wfcq_node * suffice. > > > > Thanks Paul, I will therefore merge these 3 patches with your Acked-by. > > > > Lai, you are welcome to provide improvements to this code against the > > master branch. I will gladly consider any change you propose. > > > > I did not remember that there is any improvement idea not included. > The patchset is OK for me. Great! Would you be OK if I commit the following patch ? Let me know if you want me to put your signed-off-by on this (I can even put your email as From if you like): wfcqueue: update credits in patch documentation Give credits to those responsible for the design and implementation of commit 8ad4ce587f001ae026d5560ac509c2e48986130b, "wfcqueue: implement concurrency-efficient queue", which happened through rounds of email and patch exchanges. 
Signed-off-by: Mathieu Desnoyers --- diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h index a989984..153143d 100644 --- a/urcu/static/wfcqueue.h +++ b/urcu/static/wfcqueue.h @@ -41,8 +41,10 @@ extern "C" { /* * Concurrent queue with wait-free enqueue/blocking dequeue. * - * Inspired from half-wait-free/half-blocking queue implementation done by - * Paul E. McKenney. + * This queue has been designed and implemented collaboratively by + * Mathieu Desnoyers and Lai Jiangshan. Inspired from + * half-wait-free/half-blocking queue implementation done by Paul E. + * McKenney. * * Mutual exclusion of __cds_wfcq_* API * diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h index 5576cbf..940dc7d 100644 --- a/urcu/wfcqueue.h +++ b/urcu/wfcqueue.h @@ -37,8 +37,10 @@ extern "C" { /* * Concurrent queue with wait-free enqueue/blocking dequeue. * - * Inspired from half-wait-free/half-blocking queue implementation done by - * Paul E. McKenney. + * This queue has been designed and implemented collaboratively by + * Mathieu Desnoyers and Lai Jiangshan. Inspired from + * half-wait-free/half-blocking queue implementation done by Paul E. + * McKenney. */ struct cds_wfcq_node { > I think you can reimplement wfqueue via wfcqueue without cacheline opt. Hrm, semantically this can indeed be done, but I fear that we might not be strictly ABI-compatible with the old wfqueue. So I would be tempted to leave the old wfqueue implementation as-is, and maybe deprecate it at some point. Thoughts ? Thanks! Mathieu > > Thanks, > Lai -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From paulmck at linux.vnet.ibm.com Mon Oct 8 11:10:19 2012 From: paulmck at linux.vnet.ibm.com (Paul E. 
McKenney) Date: Mon, 8 Oct 2012 08:10:19 -0700 Subject: [lttng-dev] [URCU PATCH 3/3] call_rcu: use wfcqueue, eliminate false-sharing In-Reply-To: <20121008144916.GA29352@Krystal> References: <20121002141307.GA4057@Krystal> <20121002141638.GD4057@Krystal> <507243E9.8010604@cn.fujitsu.com> <20121008144916.GA29352@Krystal> Message-ID: <20121008151019.GA2453@linux.vnet.ibm.com> On Mon, Oct 08, 2012 at 10:49:16AM -0400, Mathieu Desnoyers wrote: > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > > On 10/02/2012 10:16 PM, Mathieu Desnoyers wrote: > > > Eliminate false-sharing between call_rcu (enqueuer) and worker threads > > > on the queue head and tail. > > > > > > Signed-off-by: Mathieu Desnoyers > > > --- > > > diff --git a/tests/Makefile.am b/tests/Makefile.am > > > index 81718bb..c92bbe6 100644 > > > --- a/tests/Makefile.am > > > +++ b/tests/Makefile.am > > > @@ -30,14 +30,14 @@ if COMPAT_FUTEX > > > COMPAT+=$(top_srcdir)/compat_futex.c > > > endif > > > > > > -URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > > -URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > > +URCU=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > > +URCU_QSBR=$(top_srcdir)/urcu-qsbr.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > > # URCU_MB uses urcu.c but -DRCU_MB must be defined > > > -URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > > +URCU_MB=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > > # URCU_SIGNAL uses urcu.c but -DRCU_SIGNAL must be defined > > > -URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > > -URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c $(COMPAT) > > > -URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfqueue.c 
$(COMPAT) > > > +URCU_SIGNAL=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > > +URCU_BP=$(top_srcdir)/urcu-bp.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > > +URCU_DEFER=$(top_srcdir)/urcu.c $(top_srcdir)/urcu-pointer.c $(top_srcdir)/wfcqueue.c $(COMPAT) > > > > > > URCU_COMMON_LIB=$(top_builddir)/liburcu-common.la > > > URCU_LIB=$(top_builddir)/liburcu.la > > > diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h > > > index 13b24ff..cf65992 100644 > > > --- a/urcu-call-rcu-impl.h > > > +++ b/urcu-call-rcu-impl.h > > > @@ -21,6 +21,7 @@ > > > */ > > > > > > #define _GNU_SOURCE > > > +#define _LGPL_SOURCE > > > #include > > > #include > > > #include > > > @@ -35,7 +36,7 @@ > > > #include > > > > > > #include "config.h" > > > -#include "urcu/wfqueue.h" > > > +#include "urcu/wfcqueue.h" > > > #include "urcu-call-rcu.h" > > > #include "urcu-pointer.h" > > > #include "urcu/list.h" > > > @@ -46,7 +47,14 @@ > > > /* Data structure that identifies a call_rcu thread. */ > > > > > > struct call_rcu_data { > > > - struct cds_wfq_queue cbs; > > > + /* > > > + * Align the tail on cache line size to eliminate false-sharing > > > + * with head. > > > + */ > > > + struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_tail; > > > + /* Alignment on cache line size will add padding here */ > > > + > > > + struct cds_wfcq_head cbs_head; > > > > > > wrong here. In this code, cbs_tail and cbs_head are in the same cache line. 
> > > > --- > > > > struct call_rcu_data { > > struct cds_wfcq_tail cbs_tail; > > struct cds_wfcq_head __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_head; > > /* other fields, can move some fields up to use the room between tail and head */ > > }; > > > > # cat test.c > > > > struct a { > > int __attribute__((aligned(64))) i; > > int j; > > }; > > struct b { > > int i; > > int __attribute__((aligned(64))) j; > > }; > > > > void main(void) > > { > > printf("%d,%d\n", sizeof(struct a), sizeof(struct b)); > > } > > > > # ./a.out > > 64,128 > > > > Good point! While we are there, I notice that the "qlen" count, kept for > debugging, is causing false-sharing too. I wonder if we should split > this counter in two counters: nr_enqueue and nr_dequeue, which would sit > in two different cache lines ? It's mainly Paul who cares about this > counter. Thoughts ? Works for me, as long as nr_enqueue and nr_dequeue are both unsigned long to avoid issues with overflow. Thanx, Paul > Here is the fix to the problem you noticed above: > > commit b9f893b69fbc31baea418794938f4eb74cc4923a > Author: Mathieu Desnoyers > Date: Mon Oct 8 10:44:38 2012 -0400 > > Fix urcu-call-rcu-impl.h: false-sharing > > > > struct call_rcu_data { > > > - struct cds_wfq_queue cbs; > > > + /* > > > + * Align the tail on cache line size to eliminate false-sharing > > > + * with head. > > > + */ > > > + struct cds_wfcq_tail __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_tail; > > > + /* Alignment on cache line size will add padding here */ > > > + > > > + struct cds_wfcq_head cbs_head; > > > > > > wrong here. In this code, cbs_tail and cbs_head are in the same cache line. > > Reported-by: Lai Jiangshan > Signed-off-by: Mathieu Desnoyers > > Thanks! > > Mathieu > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. 
> http://www.efficios.com > From pbonzini at redhat.com Mon Oct 8 11:47:10 2012 From: pbonzini at redhat.com (Paolo Bonzini) Date: Mon, 08 Oct 2012 17:47:10 +0200 Subject: [lttng-dev] [URCU PATCH 1/3] wfcqueue: implement concurrency-efficient queue In-Reply-To: <20121002141444.GB4057@Krystal> References: <20121002141307.GA4057@Krystal> <20121002141444.GB4057@Krystal> Message-ID: <5072F57E.4020308@redhat.com> On 02/10/2012 16:14, Mathieu Desnoyers wrote: > +/* > + * Concurrent queue with wait-free enqueue/blocking dequeue. > + * > + * Inspired from half-wait-free/half-blocking queue implementation done by > + * Paul E. McKenney. > + * > + * Mutual exclusion of __cds_wfcq_* API > + * > + * Unless otherwise stated, the caller must ensure mutual exclusion of > + * queue update operations "dequeue" and "splice" (for source queue). > + * Queue read operations "first" and "next" need to be protected against > + * concurrent "dequeue" and "splice" (for source queue) by the caller. > + * "enqueue", "splice" (for destination queue), and "empty" are the only > + * operations that can be used without any mutual exclusion. > + * Mutual exclusion can be ensured by holding cds_wfcq_dequeue_lock(). > + * > + * For convenience, cds_wfcq_dequeue_blocking() and > + * cds_wfcq_splice_blocking() hold the dequeue lock. > + */ Hi, can you add a for-like macro for iteration? Iteration does not require holding the lock and assumes that you are the sole user of the data structure, which is useful together with splice.
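To make the request above concrete, here is a minimal sketch of what a for-like iteration macro could look like. It assumes a hypothetical singly linked node type and hypothetical names (demo_node, demo_for_each_safe, demo_sum); none of these are part of the liburcu API. It performs no locking and uses no memory barriers, which only makes sense under the stated assumption that the iterating thread is the sole user of the list, for instance after splicing it into a private queue.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only (not struct cds_wfcq_node). */
struct demo_node {
	struct demo_node *next;
	int value;
};

/*
 * For-like iteration macro (hypothetical name). "n" caches the next
 * pointer before the loop body runs, so the body may free or reuse
 * "pos". Assumes the caller is the only user of the list: no locking,
 * no memory barriers.
 */
#define demo_for_each_safe(head, pos, n)			\
	for ((pos) = (head), (n) = (pos) ? (pos)->next : NULL;	\
	     (pos) != NULL;					\
	     (pos) = (n), (n) = (pos) ? (pos)->next : NULL)

/* Sum all values while unlinking each node as it is visited. */
static int demo_sum(struct demo_node *head)
{
	struct demo_node *pos, *n;
	int sum = 0;

	demo_for_each_safe(head, pos, n) {
		sum += pos->value;
		pos->next = NULL;	/* safe: next was already cached in n */
	}
	return sum;
}
```

The "safe" variant is sketched here because a consumer such as a callback runner typically invalidates each node while walking the list; a read-only walk would not need the extra cursor.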
Paolo From mathieu.desnoyers at efficios.com Mon Oct 8 12:03:55 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 8 Oct 2012 12:03:55 -0400 Subject: [lttng-dev] [URCU PATCH 3/3] call_rcu: use wfcqueue, eliminate false-sharing In-Reply-To: <20121008151019.GA2453@linux.vnet.ibm.com> References: <20121002141307.GA4057@Krystal> <20121002141638.GD4057@Krystal> <507243E9.8010604@cn.fujitsu.com> <20121008144916.GA29352@Krystal> <20121008151019.GA2453@linux.vnet.ibm.com> Message-ID: <20121008160355.GA30083@Krystal> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > On Mon, Oct 08, 2012 at 10:49:16AM -0400, Mathieu Desnoyers wrote: [...] > > > > Good point! While we are there, I notice that the "qlen" count, kept for > > debugging, is causing false-sharing too. I wonder if we should split > > this counter in two counters: nr_enqueue and nr_dequeue, which would sit > > in two different cache lines ? It's mainly Paul who cares about this > > counter. Thoughts ? > > Works for me, as long as nr_enqueue and nr_dequeue are both unsigned long > to avoid issues with overflow. > How about the following ? The 0.7% performance increase (which is probably this small because of call_rcu batching) makes me wonder if it's really worth all the trouble though. If not, then we might want to consider removing the head alignment altogether. Carefully placing head/tail into different cache lines is really worth it if we have frequent dequeue, but in this case we batch through "splice" operations on large queues. Thoughts ? Fix wfcqueue: false-sharing of qlen, flags, futex fields The "qlen" count, kept for debugging, is causing false-sharing. Split this counter into two: nr_enqueue and nr_dequeue, which sit in two different cache lines. The flags field is read-only by enqueue, and read-mostly by dequeue. We can eliminate all false-sharing between enqueue and dequeue by having one copy of the flag for enqueue and one for dequeue.
The STOP and STOPPED flags are only used for dequeue (at "free" time), and the RT flag is set at init time, and used by both enqueue and dequeue. The "futex" field is very often read by enqueue, and less often read and modified by dequeue (only when dequeue has emptied the queue). Move it to the enqueuer cache-line too. With these modifications, the test "test_urcu_hash 0 1 10" (one updater for hash table, which uses call_rcu heavily for reclaim) gains 0.7%. The small performance improvement in this case can be associated to the fact that call_rcu dequeuer batches dequeuing of rcu head elements. Signed-off-by: Mathieu Desnoyers --- diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h index dca98e4..3f46be6 100644 --- a/urcu-call-rcu-impl.h +++ b/urcu-call-rcu-impl.h @@ -49,17 +49,18 @@ struct call_rcu_data { /* * Align the tail on cache line size to eliminate false-sharing - * with head. Small note, however: the "qlen" field, kept for - * debugging, will cause false-sharing between enqueue and - * dequeue. + * with head. */ struct cds_wfcq_tail cbs_tail; + unsigned long nr_enqueue; /* maintained for debugging. */ + unsigned long flags_enqueue; + int32_t futex; + /* Alignment on cache line size will add padding here */ struct cds_wfcq_head __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_head; - unsigned long flags; - int32_t futex; - unsigned long qlen; /* maintained for debugging. */ + unsigned long nr_dequeue; /* maintained for debugging. 
*/ + unsigned long flags_dequeue; pthread_t tid; int cpu_affinity; struct cds_list_head list; @@ -231,7 +232,7 @@ static void *call_rcu_thread(void *arg) { unsigned long cbcount; struct call_rcu_data *crdp = (struct call_rcu_data *) arg; - int rt = !!(uatomic_read(&crdp->flags) & URCU_CALL_RCU_RT); + int rt = !!(uatomic_read(&crdp->flags_dequeue) & URCU_CALL_RCU_RT); int ret; ret = set_thread_cpu_affinity(crdp); @@ -269,9 +270,9 @@ static void *call_rcu_thread(void *arg) rhp->func(rhp); cbcount++; } - uatomic_sub(&crdp->qlen, cbcount); + uatomic_add(&crdp->nr_dequeue, cbcount); } - if (uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOP) + if (uatomic_read(&crdp->flags_dequeue) & URCU_CALL_RCU_STOP) break; rcu_thread_offline(); if (!rt) { @@ -300,7 +301,7 @@ static void *call_rcu_thread(void *arg) cmm_smp_mb(); uatomic_set(&crdp->futex, 0); } - uatomic_or(&crdp->flags, URCU_CALL_RCU_STOPPED); + uatomic_or(&crdp->flags_dequeue, URCU_CALL_RCU_STOPPED); rcu_unregister_thread(); return NULL; } @@ -323,9 +324,12 @@ static void call_rcu_data_init(struct call_rcu_data **crdpp, urcu_die(errno); memset(crdp, '\0', sizeof(*crdp)); cds_wfcq_init(&crdp->cbs_head, &crdp->cbs_tail); - crdp->qlen = 0; + crdp->nr_enqueue = 0; + crdp->flags_enqueue = flags; crdp->futex = 0; - crdp->flags = flags; + + crdp->nr_dequeue = 0; + crdp->flags_dequeue = flags; cds_list_add(&crdp->list, &call_rcu_data_list); crdp->cpu_affinity = cpu_affinity; cmm_smp_mb(); /* Structure initialized before pointer is planted. 
*/ @@ -571,7 +575,7 @@ int create_all_cpu_call_rcu_data(unsigned long flags) */ static void wake_call_rcu_thread(struct call_rcu_data *crdp) { - if (!(_CMM_LOAD_SHARED(crdp->flags) & URCU_CALL_RCU_RT)) + if (!(_CMM_LOAD_SHARED(crdp->flags_enqueue) & URCU_CALL_RCU_RT)) call_rcu_wake_up(crdp); } @@ -601,7 +605,7 @@ void call_rcu(struct rcu_head *head, rcu_read_lock(); crdp = get_call_rcu_data(); cds_wfcq_enqueue(&crdp->cbs_head, &crdp->cbs_tail, &head->next); - uatomic_inc(&crdp->qlen); + uatomic_inc(&crdp->nr_enqueue); wake_call_rcu_thread(crdp); rcu_read_unlock(); } @@ -633,10 +637,10 @@ void call_rcu_data_free(struct call_rcu_data *crdp) if (crdp == NULL || crdp == default_call_rcu_data) { return; } - if ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0) { - uatomic_or(&crdp->flags, URCU_CALL_RCU_STOP); + if ((uatomic_read(&crdp->flags_dequeue) & URCU_CALL_RCU_STOPPED) == 0) { + uatomic_or(&crdp->flags_dequeue, URCU_CALL_RCU_STOP); wake_call_rcu_thread(crdp); - while ((uatomic_read(&crdp->flags) & URCU_CALL_RCU_STOPPED) == 0) + while ((uatomic_read(&crdp->flags_dequeue) & URCU_CALL_RCU_STOPPED) == 0) poll(NULL, 0, 1); } if (!cds_wfcq_empty(&crdp->cbs_head, &crdp->cbs_tail)) { @@ -645,8 +649,10 @@ void call_rcu_data_free(struct call_rcu_data *crdp) __cds_wfcq_splice_blocking(&default_call_rcu_data->cbs_head, &default_call_rcu_data->cbs_tail, &crdp->cbs_head, &crdp->cbs_tail); - uatomic_add(&default_call_rcu_data->qlen, - uatomic_read(&crdp->qlen)); + uatomic_add(&default_call_rcu_data->nr_enqueue, + uatomic_read(&crdp->nr_enqueue)); + uatomic_add(&default_call_rcu_data->nr_dequeue, + uatomic_read(&crdp->nr_dequeue)); wake_call_rcu_thread(default_call_rcu_data); } @@ -750,7 +756,7 @@ void call_rcu_after_fork_child(void) cds_list_for_each_entry_safe(crdp, next, &call_rcu_data_list, list) { if (crdp == default_call_rcu_data) continue; - uatomic_set(&crdp->flags, URCU_CALL_RCU_STOPPED); + uatomic_set(&crdp->flags_dequeue, URCU_CALL_RCU_STOPPED); 
call_rcu_data_free(crdp); } } -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 8 12:15:44 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 8 Oct 2012 12:15:44 -0400 Subject: [lttng-dev] [URCU PATCH 1/3] wfcqueue: implement concurrency-efficient queue In-Reply-To: <5072F57E.4020308@redhat.com> References: <20121002141307.GA4057@Krystal> <20121002141444.GB4057@Krystal> <5072F57E.4020308@redhat.com> Message-ID: <20121008161544.GB30083@Krystal> * Paolo Bonzini (pbonzini at redhat.com) wrote: > Il 02/10/2012 16:14, Mathieu Desnoyers ha scritto: > > +/* > > + * Concurrent queue with wait-free enqueue/blocking dequeue. > > + * > > + * Inspired from half-wait-free/half-blocking queue implementation done by > > + * Paul E. McKenney. > > + * > > + * Mutual exclusion of __cds_wfcq_* API > > + * > > + * Unless otherwise stated, the caller must ensure mutual exclusion of > > + * queue update operations "dequeue" and "splice" (for source queue). > > + * Queue read operations "first" and "next" need to be protected against > > + * concurrent "dequeue" and "splice" (for source queue) by the caller. > > + * "enqueue", "splice" (for destination queue), and "empty" are the only > > + * operations that can be used without any mutual exclusion. > > + * Mutual exclusion can be ensured by holding cds_wfcq_dequeue_lock(). > > + * > > + * For convenience, cds_wfcq_dequeue_blocking() and > > + * cds_wfcq_splice_blocking() hold the dequeue lock. > > + */ > > Hi, > > can you add a for-like macro for iteration? Iteration does not require > holding the lock and assumes that you are the sole user of the data > structure, which is useful together with splice. > > Paolo Hi Paolo, We actually already have those, they are just not described in this comment. I will fix this right away. 
By the way, you will notice the wording: + * Queue read operations "first" and "next", which are used by + * "for_each" iterations, need to be protected against concurrent + * "dequeue" and "splice" (for source queue) by the caller. Being the only one iterating on a queue with local head/tail after a splice operation is one way to provide mutual exclusion. Holding a lock is not the only way to achieve mutual exclusion. commit f94061a3df4c9eab9ac869a19e4228de54771fcb Author: Mathieu Desnoyers Date: Mon Oct 8 12:11:30 2012 -0400 wfcqueue documentation: hint at for_each iterators Reported-by: Paolo Bonzini Signed-off-by: Mathieu Desnoyers diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h index a989984..944ee88 100644 --- a/urcu/static/wfcqueue.h +++ b/urcu/static/wfcqueue.h @@ -48,8 +48,9 @@ extern "C" { * * Unless otherwise stated, the caller must ensure mutual exclusion of * queue update operations "dequeue" and "splice" (for source queue). - * Queue read operations "first" and "next" need to be protected against - * concurrent "dequeue" and "splice" (for source queue) by the caller. + * Queue read operations "first" and "next", which are used by + * "for_each" iterations, need to be protected against concurrent + * "dequeue" and "splice" (for source queue) by the caller. * "enqueue", "splice" (for destination queue), and "empty" are the only * operations that can be used without any mutual exclusion. * Mutual exclusion can be ensured by holding cds_wfcq_dequeue_lock(). @@ -190,6 +191,10 @@ ___cds_wfcq_node_sync_next(struct cds_wfcq_node *node) * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. * Should be called with cds_wfcq_dequeue_lock() held. 
+ * + * Used by for-like iteration macros in urcu/wfqueue.h: + * __cds_wfcq_for_each_blocking() + * __cds_wfcq_for_each_blocking_safe() */ static inline struct cds_wfcq_node * ___cds_wfcq_first_blocking(struct cds_wfcq_head *head, @@ -211,6 +216,10 @@ ___cds_wfcq_first_blocking(struct cds_wfcq_head *head, * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. * Should be called with cds_wfcq_dequeue_lock() held. + * + * Used by for-like iteration macros in urcu/wfqueue.h: + * __cds_wfcq_for_each_blocking() + * __cds_wfcq_for_each_blocking_safe() */ static inline struct cds_wfcq_node * ___cds_wfcq_next_blocking(struct cds_wfcq_head *head, diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h index 5576cbf..501120b 100644 --- a/urcu/wfcqueue.h +++ b/urcu/wfcqueue.h @@ -91,8 +91,9 @@ struct cds_wfcq_tail { * * Unless otherwise stated, the caller must ensure mutual exclusion of * queue update operations "dequeue" and "splice" (for source queue). - * Queue read operations "first" and "next" need to be protected against - * concurrent "dequeue" and "splice" (for source queue) by the caller. + * Queue read operations "first" and "next", which are used by + * "for_each" iterations, need to be protected against concurrent + * "dequeue" and "splice" (for source queue) by the caller. * "enqueue", "splice" (for destination queue), and "empty" are the only * operations that can be used without any mutual exclusion. * Mutual exclusion can be ensured by holding cds_wfcq_dequeue_lock(). @@ -202,6 +203,10 @@ extern void __cds_wfcq_splice_blocking( * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. * Should be called with cds_wfcq_dequeue_lock() held. 
+ * + * Used by for-like iteration macros: + * __cds_wfcq_for_each_blocking() + * __cds_wfcq_for_each_blocking_safe() */ extern struct cds_wfcq_node *__cds_wfcq_first_blocking( struct cds_wfcq_head *head, @@ -213,6 +218,10 @@ extern struct cds_wfcq_node *__cds_wfcq_first_blocking( * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. * Should be called with cds_wfcq_dequeue_lock() held. + * + * Used by for-like iteration macros: + * __cds_wfcq_for_each_blocking() + * __cds_wfcq_for_each_blocking_safe() */ extern struct cds_wfcq_node *__cds_wfcq_next_blocking( struct cds_wfcq_head *head, -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From pbonzini at redhat.com Mon Oct 8 12:33:15 2012 From: pbonzini at redhat.com (Paolo Bonzini) Date: Mon, 08 Oct 2012 18:33:15 +0200 Subject: [lttng-dev] [URCU PATCH 1/3] wfcqueue: implement concurrency-efficient queue In-Reply-To: <20121008161544.GB30083@Krystal> References: <20121002141307.GA4057@Krystal> <20121002141444.GB4057@Krystal> <5072F57E.4020308@redhat.com> <20121008161544.GB30083@Krystal> Message-ID: <5073004B.2050400@redhat.com> Il 08/10/2012 18:15, Mathieu Desnoyers ha scritto: > Hi Paolo, > > We actually already have those, they are just not described in this > comment. I will fix this right away. By the way, you will notice the > wording: > > + * Queue read operations "first" and "next", which are used by > + * "for_each" iterations, need to be protected against concurrent > + * "dequeue" and "splice" (for source queue) by the caller. > > Being the only one iterating on a queue with local head/tail after a > splice operation is one way to provide mutual exclusion. Holding a lock > is not the only way to achieve mutual exclusion. Uh, I was confused by the _blocking suffix. But when used together with splice you know it is not blocking---only the splice will block. 
Paolo From mathieu.desnoyers at efficios.com Mon Oct 8 14:10:39 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 8 Oct 2012 14:10:39 -0400 Subject: [lttng-dev] [URCU PATCH 1/3] wfcqueue: implement concurrency-efficient queue In-Reply-To: <5073004B.2050400@redhat.com> References: <20121002141307.GA4057@Krystal> <20121002141444.GB4057@Krystal> <5072F57E.4020308@redhat.com> <20121008161544.GB30083@Krystal> <5073004B.2050400@redhat.com> Message-ID: <20121008181039.GA4558@Krystal> * Paolo Bonzini (pbonzini at redhat.com) wrote: > Il 08/10/2012 18:15, Mathieu Desnoyers ha scritto: > > Hi Paolo, > > > > We actually already have those, they are just not described in this > > comment. I will fix this right away. By the way, you will notice the > > wording: > > > > + * Queue read operations "first" and "next", which are used by > > + * "for_each" iterations, need to be protected against concurrent > > + * "dequeue" and "splice" (for source queue) by the caller. > > > > Being the only one iterating on a queue with local head/tail after a > > splice operation is one way to provide mutual exclusion. Holding a lock > > is not the only way to achieve mutual exclusion. > > Uh, I was confused by the _blocking suffix. But when used together with > splice you know it is not blocking---only the splice will block. Well, in this case, the "blocking" can be understood as busy-waiting (and actual blocking that invokes the OS happens if busy-waiting for long periods only). for_each iteration can indeed busy-wait: ___cds_wfcq_first_blocking() and ___cds_wfcq_next_blocking() can both call ___cds_wfcq_node_sync_next(), which busy-waits if it encounters a NULL next pointer that is not located in the tail node. Seen through the use-case of splice to local queue + for_each, here is what happens: - splice moves the content of the queue into a "local" queue. However, it does _not_ issue a ___cds_wfcq_node_sync_next() on each node: no traversal is performed. 
- then, within the for_each iteration, we perform the ___cds_wfcq_node_sync_next() synchronization as we iterate on the local queue. This ensures that each node is synchronized lazily, only when it is actually traversed. This approach improves locality of reference and, by postponing the synchronization to the point where it is actually needed, reduces the odds that it is executed needlessly. Does it make more sense ? Thanks, Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From Andrew.McDermott at windriver.com Mon Oct 8 17:57:23 2012 From: Andrew.McDermott at windriver.com (McDermott, Andrew) Date: Mon, 8 Oct 2012 21:57:23 +0000 Subject: [lttng-dev] status of lttng top Message-ID: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> I was wondering what the status of lttng top was. I'm happy to add the daily Ubuntu PPA and try it that way, or equally to build from source. But before I tread that route, are there any gotchas to be aware of? Is it still considered work-in-progress, etc.? Thanks. From hollis_blanchard at mentor.com Mon Oct 8 19:14:27 2012 From: hollis_blanchard at mentor.com (Hollis Blanchard) Date: Mon, 8 Oct 2012 16:14:27 -0700 Subject: [lttng-dev] UST segfault: memcpy too big Message-ID: <50735E53.1010104@mentor.com> I seem to have hit a little problem with a "hello world" test app and lttng-ust 2.0.3. lttng-ust.git seems to be affected as well. Basically, I created a single UST tracepoint, but as soon as I run "lttng enable-event -u -a", my app segfaults. The problem seems to be that when creating the event to pass to ltt_event_create(), we try to memcpy the full 256 bytes of name. However, the name might be shorter, and if we get unlucky it lies within 256 bytes of the segment boundary, so the copy reads past the end of the segment...
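A bounded string copy avoids this kind of over-read, because it stops reading the source at its terminating NUL instead of always reading the full size of the destination. The following is only a minimal sketch under stated assumptions: `SYM_NAME_LEN` and `copy_event_name()` are hypothetical names chosen for illustration, not lttng-ust identifiers, and 256 is taken from the size mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sizeof(event_param.name), assumed to be 256. */
#define SYM_NAME_LEN 256

/*
 * Copy a NUL-terminated name into a fixed-size buffer.
 *
 * strncpy() stops reading src at its terminating NUL and zero-fills the
 * remainder of dst, so src is never read past its own end. If src does
 * not fit, strncpy() leaves dst unterminated, hence the explicit
 * terminator on the last byte.
 */
static void copy_event_name(char *dst, size_t dst_len, const char *src)
{
	assert(dst_len > 0);
	strncpy(dst, src, dst_len);
	dst[dst_len - 1] = '\0';
}
```

A caller would pass the destination array and its size, for example `copy_event_name(buf, sizeof(buf), name)`. A name longer than the buffer is silently truncated, which may or may not be acceptable for event names.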
Completely untested patch, using strncpy instead of memcpy:

--- liblttng-ust/ltt-events.c.orig 2012-10-08 16:02:16.421494319 -0700
+++ liblttng-ust/ltt-events.c 2012-10-08 16:03:24.770293756 -0700
@@ -248,9 +248,10 @@ int pending_probe_fix_events(const struc
 
 		memcpy(&event_param, &sw->event_param,
 				sizeof(event_param));
-		memcpy(event_param.name,
+		strncpy(event_param.name,
 			desc->name,
 			sizeof(event_param.name));
+		event_param.name[sizeof(event_param.name)-1] = '\0';
 		/* create event */
 		ret = ltt_event_create(sw->chan,
 			&event_param, NULL,

In addition to not testing this patch, I haven't audited other callers to see if it's a pattern that also needs to be fixed elsewhere.

Debug info:

Program received signal SIGSEGV, Segmentation fault.
0x4da8d8d4 in memcpy () from /home/hollisb/work/panda-mel-sysroot/lib/libc.so.6
(gdb) bt
#0 0x4da8d8d4 in memcpy () from /home/hollisb/work/panda-mel-sysroot/lib/libc.so.6
#1 0x4007eacc in pending_probe_fix_events (desc=0x8bec) at ltt-events.c:251
#2 0x4007b980 in ltt_probe_register (desc=0x11164) at ltt-probes.c:119
#3 0x0000886c in __lttng_events_init__lighttpd_connection () at /home/hollisb/work/panda-mel-sysroot/usr/include/lttng/ust-tracepoint-event.h:554
#4 0x00008b84 in __libc_csu_init (argc=0x1, argv=0xbe8cfe24, envp=0xbe8cfe2c) at elf-init.c:124
#5 0x4da1f848 in __libc_start_main (main=0xbe8cfe24, argc=0x4db4a000, ubp_av=0x4da1f848, init=0x8ad0 <__libc_csu_init>, fini=0x8b98 <__libc_csu_fini>, rtld_fini=0x4d9e8018 <_dl_fini>, stack_end=0xbe8cfe24) at libc-start.c:185
#6 0x00008968 in _start ()
(gdb) up
#1 0x4007eacc in pending_probe_fix_events (desc=0x8bec) at ltt-events.c:251
(gdb) list
247 int ret;
248
249 memcpy(&event_param, &sw->event_param,
250 sizeof(event_param));
251 memcpy(event_param.name,
252 desc->name,
253 sizeof(event_param.name));
254 /* create event */
255 ret = ltt_event_create(sw->chan,
256 &event_param, NULL,
(gdb) p desc->name
$10 = 0x8f10 "lighttpd_connection:create"
(gdb) p sizeof(event_param.name)
$11 =
0x100
(gdb) x/64 desc->name
0x8f10: 0x6867696c 0x64707474 0x6e6f635f 0x7463656e
0x8f20: 0x3a6e6f69 0x61657263 0x6574 0x2c746e69
0x8f30: 0x6e6f6320 0x0 0x6867696c 0x64707474
0x8f40: 0x6e6f635f 0x7463656e 0x6e6f69 0x727470
0x8f50 <__tp_strtab_lighttpd_connection___create>: 0x6867696c 0x64707474 0x6e6f635f 0x7463656e
0x8f60 <__tp_strtab_lighttpd_connection___create+16>: 0x3a6e6f69 0x61657263 0x6574 0x8101b108
0x8f70: 0x8400b0b0 0x0 0x8101b108 0x8400b0b0
0x8f80: 0x0 0x8101b108 0x8400b0b0 0x0
0x8f90: 0x7ffff740 0x7fffffe4 0x7ffff76c 0x80a8b0b0
0x8fa0: 0x7ffff7ec 0x80b0b0b0 0x7ffff7f0 0x80a8b0b0
0x8fb0: 0x7ffff8b0 0x7fffffd0 0x7ffff8e4 0x80b108af
0x8fc0: 0x7ffff97c 0x1 0x7ffff9d4 0x80b0b0b0
0x8fd0: 0x7ffff9e8 0x7fffff98 0x7ffffa10 0x8015aab0
0x8fe0: 0x7ffffaf0 0x80aeb0b0 0x7ffffbb0 0x80b0b0b0
0x8ff0: 0x7ffffbac 0x1 0x0 0x0
0x9000: Cannot access memory at address 0x9000

Uh oh...

% readelf -l tracetest-arm

Elf file type is EXEC (Executable file)
Entry point 0x893c
There are 8 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  EXIDX          0x000f90 0x00008f90 0x00008f90 0x00068 0x00068 R   0x4
  PHDR           0x000034 0x00008034 0x00008034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x00008134 0x00008134 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.3]
  LOAD           0x000000 0x00008000 0x00008000 0x00ffc 0x00ffc R E 0x8000
  LOAD           0x001000 0x00011000 0x00011000 0x001cc 0x001f8 RW  0x8000
  DYNAMIC        0x00101c 0x0001101c 0x0001101c 0x00100 0x00100 RW  0x4
  NOTE           0x000148 0x00008148 0x00008148 0x00020 0x00020 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

Note the hole between those two LOAD segments. desc->name landed very close to the end of the first.
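The numbers in the debug output can be checked with a little address arithmetic. This is an illustrative sketch only: `bytes_past_end()` is a hypothetical helper, and the inputs are the values reported above (desc->name at 0x8f10, a copy length of 0x100, and 0x9000 as the first address gdb could not read).

```c
#include <stdint.h>

/*
 * Number of bytes by which a read of `len` bytes starting at `start`
 * extends past `mapped_end` (the first unmapped address).
 * Returns 0 if the whole read stays inside the mapping.
 */
static uint32_t bytes_past_end(uint32_t start, uint32_t len, uint32_t mapped_end)
{
	uint32_t read_end = start + len; /* one past the last byte read */

	return read_end > mapped_end ? read_end - mapped_end : 0;
}
```

With those values, `bytes_past_end(0x8f10, 0x100, 0x9000)` gives 0x10: the 256-byte memcpy ends at 0x9010, 16 bytes into the hole. Reading only the 27 bytes of "lighttpd_connection:create" plus its terminating NUL stays well inside the first segment.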
I got lucky on x86, where (at least with my toolchain) the next segment happens to land closer: LOAD 0x000000 0x08048000 0x08048000 0x00f98 0x00f98 R E 0x1000 LOAD 0x001000 0x08049000 0x08049000 0x00204 0x00248 RW 0x1000 -- Hollis Blanchard Product Owner, Sourcery Analyzer Mentor Graphics, Embedded Systems Division From jdesfossez at efficios.com Mon Oct 8 22:20:14 2012 From: jdesfossez at efficios.com (Julien Desfossez) Date: Mon, 08 Oct 2012 22:20:14 -0400 Subject: [lttng-dev] status of lttng top In-Reply-To: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> References: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> Message-ID: <507389DE.4000204@efficios.com> Hi, LTTngTop is still a work in progress and will remain that way for a long time, but the version in the PPA (or in the master branch in git) is perfectly usable for offline traces (traces recorded and replayed through LTTngTop). The "live" branch is more experimental and requires patches in both Babeltrace and Lttng-tools (all documented in the README-LIVE file). It worked at the time of Plumbers, but I haven't had much time since then to rebase the branches. I am waiting for the release of Lttng-tools 2.1 (currently in RC) before merging those patches. After these patches are integrated, LTTngTop will be able to work live without any modifications, that is, by directly reading traces in memory shared with the tracer. In the meantime we are working on replacing the "home made" state system in LTTngTop with a more generic one (which will also be used in LTTV). This will clean up this part of the code and allow the state to be stored on disk. So in the near future we will be able to read only the state instead of the trace (once it has been generated), which will significantly reduce the amount of data we need to keep in order to access the kind of statistics provided by LTTngTop.
If you want to try LTTngTop, you can just install the package and follow the man page to record a trace with the right contexts, it should work as is. If you have any questions and/or feedback, please don't hesitate to ask. Thanks, Julien On 08/10/12 05:57 PM, McDermott, Andrew wrote: > I was wondering what the status of lttng top was. I'm happy to add the > daily Ubuntu PPA and try it that way, or equally building from source. > But, before I tread that route, are there any gotchas to be aware of. > Is it still considered work-in-progress, etc. > > Thanks. > > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From dgoulet at efficios.com Tue Oct 9 10:41:59 2012 From: dgoulet at efficios.com (David Goulet) Date: Tue, 09 Oct 2012 10:41:59 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v5] ABI with support for compat 32/64 bits In-Reply-To: <506C4044.7000603@efficios.com> References: <1349204606-6473-1-git-send-email-jdesfossez@efficios.com> <506C4044.7000603@efficios.com> Message-ID: <507437B7.8020309@efficios.com> Status on that patch? Mathieu? David Goulet: > This look good to me. There is a small change I would make though to the > open_metadata call where the old_channel is not in the if - else{} > statement but I'll do it before merging it. Don't bother sending back a > version. > > Acked. > > David > > Julien Desfossez: >> The current ABI does not work for compat 32/64 bits. >> This patch moves the current ABI as old-abi and provides a new ABI in >> which all the structures exchanged between user and kernel-space are >> packed. Also this new ABI moves the "int overwrite" member of the >> struct lttng_kernel_channel to remove the alignment added by the >> compiler. >> >> A patch for lttng-modules has been developed in parallel to this one >> to support the new ABI. 
These 2 patches have been tested in all >> possible configurations (applied or not) on 64-bit and 32-bit kernels >> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit. >> >> Here are the results of the tests : >> k 64 compat | u 32 compat | OK >> k 64 compat | u 64 compat | OK >> k 64 compat | u 32 non-compat | KO >> k 64 compat | u 64 non-compat | OK >> >> k 64 non-compat | u 64 compat | OK >> k 64 non-compat | u 32 compat | KO >> k 64 non-compat | u 64 non-compat | OK >> k 64 non-compat | u 32 non-compat | KO >> >> k 32 compat | u compat | OK >> k 32 compat | u non-compat | OK >> >> k 32 non-compat | u compat | OK >> k 32 non-compat | u non-compat | OK >> >> The results are as expected : >> - on 32-bit user-space and kernel, every configuration works. >> - on 64-bit user-space and kernel, every configuration works. >> - with 32-bit user-space on a 64-bit kernel the only configuration >> where it works is when the compat patch is applied everywhere. >> >> Signed-off-by: Julien Desfossez >> --- >> src/bin/lttng-sessiond/trace-kernel.h | 1 + >> src/common/kernel-ctl/kernel-ctl.c | 224 ++++++++++++++++++++++++++++++--- >> src/common/kernel-ctl/kernel-ctl.h | 1 + >> src/common/kernel-ctl/kernel-ioctl.h | 74 +++++++---- >> src/common/lttng-kernel-old.h | 115 +++++++++++++++++ >> src/common/lttng-kernel.h | 31 +++-- >> 6 files changed, 397 insertions(+), 49 deletions(-) >> create mode 100644 src/common/lttng-kernel-old.h >> >> diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h >> index f04d9e7..c86cc27 100644 >> --- a/src/bin/lttng-sessiond/trace-kernel.h >> +++ b/src/bin/lttng-sessiond/trace-kernel.h >> @@ -22,6 +22,7 @@ >> >> #include >> #include >> +#include >> >> #include "consumer.h" >> >> diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c >> index 1396cd9..a93d251 100644 >> --- a/src/common/kernel-ctl/kernel-ctl.c >> +++ b/src/common/kernel-ctl/kernel-ctl.c >> @@ -18,38 +18,175 @@ >> >> 
#define __USE_LINUX_IOCTL_DEFS >> #include >> +#include >> >> #include "kernel-ctl.h" >> #include "kernel-ioctl.h" >> >> +/* >> + * This flag indicates which version of the kernel ABI to use. The old >> + * ABI (namespace _old) does not support a 32-bit user-space when the >> + * kernel is 64-bit. The old ABI is kept here for compatibility but is >> + * deprecated and will be removed eventually. >> + */ >> +static int lttng_kernel_use_old_abi = -1; >> + >> +/* >> + * Execute the new or old ioctl depending on the ABI version. >> + * If the ABI version is not determined yet (lttng_kernel_use_old_abi = -1), >> + * this function tests if the new ABI is available and otherwise fallbacks >> + * on the old one. >> + * This function takes the fd on which the ioctl must be executed and the old >> + * and new request codes. >> + * It returns the return value of the ioctl executed. >> + */ >> +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, >> + unsigned long newname) >> +{ >> + int ret; >> + >> + if (lttng_kernel_use_old_abi == -1) { >> + ret = ioctl(fd, newname); >> + if (!ret) { >> + lttng_kernel_use_old_abi = 0; >> + goto end; >> + } >> + lttng_kernel_use_old_abi = 1; >> + } >> + if (lttng_kernel_use_old_abi) { >> + ret = ioctl(fd, oldname); >> + } else { >> + ret = ioctl(fd, newname); >> + } >> + >> +end: >> + return ret; >> +} >> + >> int kernctl_create_session(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_SESSION); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, >> + LTTNG_KERNEL_SESSION); >> } >> >> /* open the metadata global channel */ >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) >> { >> - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); >> + struct lttng_kernel_old_channel old_channel; >> + struct lttng_kernel_channel channel; >> + >> + if (lttng_kernel_use_old_abi) { >> + old_channel.overwrite = chops->overwrite; >> + old_channel.subbuf_size = chops->subbuf_size; >> + old_channel.num_subbuf = 
chops->num_subbuf; >> + old_channel.switch_timer_interval = chops->switch_timer_interval; >> + old_channel.read_timer_interval = chops->read_timer_interval; >> + old_channel.output = chops->output; >> + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); >> + } >> + >> + channel.overwrite = chops->overwrite; >> + channel.subbuf_size = chops->subbuf_size; >> + channel.num_subbuf = chops->num_subbuf; >> + channel.switch_timer_interval = chops->switch_timer_interval; >> + channel.read_timer_interval = chops->read_timer_interval; >> + channel.output = chops->output; >> + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); >> } >> >> int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) >> { >> - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); >> + struct lttng_kernel_channel channel; >> + >> + if (lttng_kernel_use_old_abi) { >> + struct lttng_kernel_old_channel old_channel; >> + >> + old_channel.overwrite = chops->overwrite; >> + old_channel.subbuf_size = chops->subbuf_size; >> + old_channel.num_subbuf = chops->num_subbuf; >> + old_channel.switch_timer_interval = chops->switch_timer_interval; >> + old_channel.read_timer_interval = chops->read_timer_interval; >> + old_channel.output = chops->output; >> + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); >> + } >> + >> + channel.overwrite = chops->overwrite; >> + channel.subbuf_size = chops->subbuf_size; >> + channel.num_subbuf = chops->num_subbuf; >> + channel.switch_timer_interval = chops->switch_timer_interval; >> + channel.read_timer_interval = chops->read_timer_interval; >> + channel.output = chops->output; >> + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); >> + >> + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); >> } >> >> 
int kernctl_create_stream(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_STREAM); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, >> + LTTNG_KERNEL_STREAM); >> } >> >> int kernctl_create_event(int fd, struct lttng_kernel_event *ev) >> { >> + if (lttng_kernel_use_old_abi) { >> + struct lttng_kernel_old_event old_event; >> + >> + memcpy(old_event.name, ev->name, sizeof(old_event.name)); >> + old_event.instrumentation = ev->instrumentation; >> + switch (ev->instrumentation) { >> + case LTTNG_KERNEL_KPROBE: >> + old_event.u.kprobe.addr = ev->u.kprobe.addr; >> + old_event.u.kprobe.offset = ev->u.kprobe.offset; >> + memcpy(old_event.u.kprobe.symbol_name, >> + ev->u.kprobe.symbol_name, >> + sizeof(old_event.u.kprobe.symbol_name)); >> + break; >> + case LTTNG_KERNEL_KRETPROBE: >> + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; >> + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; >> + memcpy(old_event.u.kretprobe.symbol_name, >> + ev->u.kretprobe.symbol_name, >> + sizeof(old_event.u.kretprobe.symbol_name)); >> + break; >> + case LTTNG_KERNEL_FUNCTION: >> + memcpy(old_event.u.ftrace.symbol_name, >> + ev->u.ftrace.symbol_name, >> + sizeof(old_event.u.ftrace.symbol_name)); >> + break; >> + default: >> + break; >> + } >> + >> + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, &old_event); >> + } >> return ioctl(fd, LTTNG_KERNEL_EVENT, ev); >> } >> >> int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) >> { >> + if (lttng_kernel_use_old_abi) { >> + struct lttng_kernel_old_context old_ctx; >> + >> + old_ctx.ctx = ctx->ctx; >> + /* only type that uses the union */ >> + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { >> + old_ctx.u.perf_counter.type = >> + ctx->u.perf_counter.type; >> + old_ctx.u.perf_counter.config = >> + ctx->u.perf_counter.config; >> + memcpy(old_ctx.u.perf_counter.name, >> + ctx->u.perf_counter.name, >> + sizeof(old_ctx.u.perf_counter.name)); >> + } >> + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); >> + } >> return 
ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); >> } >> >> @@ -57,44 +194,98 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) >> /* Enable event, channel and session ioctl */ >> int kernctl_enable(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_ENABLE); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, >> + LTTNG_KERNEL_ENABLE); >> } >> >> /* Disable event, channel and session ioctl */ >> int kernctl_disable(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_DISABLE); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, >> + LTTNG_KERNEL_DISABLE); >> } >> >> int kernctl_start_session(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_SESSION_START); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, >> + LTTNG_KERNEL_SESSION_START); >> } >> >> int kernctl_stop_session(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, >> + LTTNG_KERNEL_SESSION_STOP); >> } >> >> - >> int kernctl_tracepoint_list(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, >> + LTTNG_KERNEL_TRACEPOINT_LIST); >> } >> >> int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) >> { >> - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); >> + int ret; >> + >> + if (lttng_kernel_use_old_abi == -1) { >> + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); >> + if (!ret) { >> + lttng_kernel_use_old_abi = 0; >> + goto end; >> + } >> + lttng_kernel_use_old_abi = 1; >> + } >> + if (lttng_kernel_use_old_abi) { >> + struct lttng_kernel_old_tracer_version old_v; >> + >> + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); >> + if (ret) { >> + goto end; >> + } >> + v->major = old_v.major; >> + v->minor = old_v.minor; >> + v->patchlevel = old_v.patchlevel; >> + } else { >> + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); >> + } >> + >> +end: >> + return ret; >> } >> >> int 
kernctl_wait_quiescent(int fd) >> { >> - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, >> + LTTNG_KERNEL_WAIT_QUIESCENT); >> } >> >> int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) >> { >> - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); >> + int ret; >> + >> + if (lttng_kernel_use_old_abi == -1) { >> + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); >> + if (!ret) { >> + lttng_kernel_use_old_abi = 0; >> + goto end; >> + } >> + lttng_kernel_use_old_abi = 1; >> + } >> + if (lttng_kernel_use_old_abi) { >> + struct lttng_kernel_old_calibrate old_calibrate; >> + >> + old_calibrate.type = calibrate->type; >> + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); >> + if (ret) { >> + goto end; >> + } >> + calibrate->type = old_calibrate.type; >> + } else { >> + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); >> + } >> + >> +end: >> + return ret; >> } >> >> >> @@ -193,10 +384,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) >> { >> return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); >> } >> - >> -/* Get the offset of the stream_id in the packet header */ >> -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) >> -{ >> - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); >> - >> -} >> diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h >> index 18712d9..85a3a18 100644 >> --- a/src/common/kernel-ctl/kernel-ctl.h >> +++ b/src/common/kernel-ctl/kernel-ctl.h >> @@ -21,6 +21,7 @@ >> >> #include >> #include >> +#include >> >> int kernctl_create_session(int fd); >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); >> diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h >> index 35942be..8e22632 100644 >> --- a/src/common/kernel-ctl/kernel-ioctl.h >> +++ b/src/common/kernel-ctl/kernel-ioctl.h >> @@ -49,37 +49,69 @@ >> /* map stream to 
stream id for network streaming */ >> #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) >> >> +/* Old ABI (without support for 32/64 bits compat) */ >> +/* LTTng file descriptor ioctl */ >> +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) >> +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ >> + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) >> +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) >> +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) >> +#define LTTNG_KERNEL_OLD_CALIBRATE \ >> + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) >> + >> +/* Session FD ioctl */ >> +#define LTTNG_KERNEL_OLD_METADATA \ >> + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) >> +#define LTTNG_KERNEL_OLD_CHANNEL \ >> + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) >> +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) >> +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) >> + >> +/* Channel FD ioctl */ >> +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) >> +#define LTTNG_KERNEL_OLD_EVENT \ >> + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) >> +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ >> + _IOR(0xF6, 0x62, unsigned long) >> >> +/* Event and Channel FD ioctl */ >> +#define LTTNG_KERNEL_OLD_CONTEXT \ >> + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) >> + >> +/* Event, Channel and Session ioctl */ >> +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) >> +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) >> + >> + >> +/* Current ABI (with suport for 32/64 bits compat) */ >> /* LTTng file descriptor ioctl */ >> -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) >> -#define LTTNG_KERNEL_TRACER_VERSION \ >> - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) >> -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) >> -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) >> +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) >> +#define LTTNG_KERNEL_TRACER_VERSION \ >> + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) >> +#define 
LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) >> +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) >> #define LTTNG_KERNEL_CALIBRATE \ >> - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) >> + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) >> >> /* Session FD ioctl */ >> -#define LTTNG_KERNEL_METADATA \ >> - _IOW(0xF6, 0x50, struct lttng_channel_attr) >> -#define LTTNG_KERNEL_CHANNEL \ >> - _IOW(0xF6, 0x51, struct lttng_channel_attr) >> -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) >> -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) >> +#define LTTNG_KERNEL_METADATA \ >> + _IOW(0xF6, 0x54, struct lttng_kernel_channel) >> +#define LTTNG_KERNEL_CHANNEL \ >> + _IOW(0xF6, 0x55, struct lttng_kernel_channel) >> +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) >> +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) >> >> /* Channel FD ioctl */ >> -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) >> -#define LTTNG_KERNEL_EVENT \ >> - _IOW(0xF6, 0x61, struct lttng_kernel_event) >> -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ >> - _IOR(0xF6, 0x62, unsigned long) >> +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) >> +#define LTTNG_KERNEL_EVENT \ >> + _IOW(0xF6, 0x63, struct lttng_kernel_event) >> >> /* Event and Channel FD ioctl */ >> -#define LTTNG_KERNEL_CONTEXT \ >> - _IOW(0xF6, 0x70, struct lttng_kernel_context) >> +#define LTTNG_KERNEL_CONTEXT \ >> + _IOW(0xF6, 0x71, struct lttng_kernel_context) >> >> /* Event, Channel and Session ioctl */ >> -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) >> -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) >> +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) >> +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) >> >> #endif /* _LTT_KERNEL_IOCTL_H */ >> diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h >> new file mode 100644 >> index 0000000..1b8999a >> --- /dev/null >> +++ b/src/common/lttng-kernel-old.h >> @@ -0,0 +1,115 @@ >> +/* >> + * Copyright (C) 2011 - Julien Desfossez >> + * Mathieu Desnoyers >> + * David Goulet >> + 
* >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License, version 2 only, >> + * as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope that it will be useful, but WITHOUT >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for >> + * more details. >> + * >> + * You should have received a copy of the GNU General Public License along >> + * with this program; if not, write to the Free Software Foundation, Inc., >> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. >> + */ >> + >> +#ifndef _LTTNG_KERNEL_OLD_H >> +#define _LTTNG_KERNEL_OLD_H >> + >> +#include >> +#include >> + >> +/* >> + * LTTng DebugFS ABI structures. >> + * >> + * This is the kernel ABI copied from lttng-modules tree. >> + */ >> + >> +/* Perf counter attributes */ >> +struct lttng_kernel_old_perf_counter_ctx { >> + uint32_t type; >> + uint64_t config; >> + char name[LTTNG_KERNEL_SYM_NAME_LEN]; >> +}; >> + >> +/* Event/Channel context */ >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 >> +struct lttng_kernel_old_context { >> + enum lttng_kernel_context_type ctx; >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; >> + >> + union { >> + struct lttng_kernel_old_perf_counter_ctx perf_counter; >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; >> + } u; >> +}; >> + >> +struct lttng_kernel_old_kretprobe { >> + uint64_t addr; >> + >> + uint64_t offset; >> + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> +}; >> + >> +/* >> + * Either addr is used, or symbol_name and offset. 
>> + */ >> +struct lttng_kernel_old_kprobe { >> + uint64_t addr; >> + >> + uint64_t offset; >> + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> +}; >> + >> +/* Function tracer */ >> +struct lttng_kernel_old_function { >> + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> +}; >> + >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 >> +struct lttng_kernel_old_event { >> + char name[LTTNG_KERNEL_SYM_NAME_LEN]; >> + enum lttng_kernel_instrumentation instrumentation; >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; >> + >> + /* Per instrumentation type configuration */ >> + union { >> + struct lttng_kernel_old_kretprobe kretprobe; >> + struct lttng_kernel_old_kprobe kprobe; >> + struct lttng_kernel_old_function ftrace; >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; >> + } u; >> +}; >> + >> +struct lttng_kernel_old_tracer_version { >> + uint32_t major; >> + uint32_t minor; >> + uint32_t patchlevel; >> +}; >> + >> +struct lttng_kernel_old_calibrate { >> + enum lttng_kernel_calibrate_type type; /* type (input) */ >> +}; >> + >> +/* >> + * kernel channel >> + */ >> +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 >> +struct lttng_kernel_old_channel { >> + int overwrite; /* 1: overwrite, 0: discard */ >> + uint64_t subbuf_size; /* bytes */ >> + uint64_t num_subbuf; /* power of 2 */ >> + unsigned int switch_timer_interval; /* usec */ >> + unsigned int read_timer_interval; /* usec */ >> + enum lttng_event_output output; /* splice, mmap */ >> + >> + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; >> +}; >> + >> +#endif /* _LTTNG_KERNEL_OLD_H */ >> diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h >> index dbeb6aa..fa8ba61 100644 >> --- a/src/common/lttng-kernel.h >> +++ b/src/common/lttng-kernel.h >> @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { >> uint32_t type; >> uint64_t config; >> char name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> 
+}__attribute__((packed)); >> >> /* Event/Channel context */ >> #define LTTNG_KERNEL_CONTEXT_PADDING1 16 >> @@ -72,14 +72,14 @@ struct lttng_kernel_context { >> struct lttng_kernel_perf_counter_ctx perf_counter; >> char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; >> } u; >> -}; >> +}__attribute__((packed)); >> >> struct lttng_kernel_kretprobe { >> uint64_t addr; >> >> uint64_t offset; >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> +}__attribute__((packed)); >> >> /* >> * Either addr is used, or symbol_name and offset. >> @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { >> >> uint64_t offset; >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> +}__attribute__((packed)); >> >> /* Function tracer */ >> struct lttng_kernel_function { >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; >> -}; >> +}__attribute__((packed)); >> >> #define LTTNG_KERNEL_EVENT_PADDING1 16 >> #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 >> @@ -110,13 +110,13 @@ struct lttng_kernel_event { >> struct lttng_kernel_function ftrace; >> char padding[LTTNG_KERNEL_EVENT_PADDING2]; >> } u; >> -}; >> +}__attribute__((packed)); >> >> struct lttng_kernel_tracer_version { >> uint32_t major; >> uint32_t minor; >> uint32_t patchlevel; >> -}; >> +}__attribute__((packed)); >> >> enum lttng_kernel_calibrate_type { >> LTTNG_KERNEL_CALIBRATE_KRETPROBE, >> @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { >> >> struct lttng_kernel_calibrate { >> enum lttng_kernel_calibrate_type type; /* type (input) */ >> -}; >> +}__attribute__((packed)); >> + >> +/* >> + * kernel channel >> + */ >> +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 >> +struct lttng_kernel_channel { >> + uint64_t subbuf_size; /* bytes */ >> + uint64_t num_subbuf; /* power of 2 */ >> + unsigned int switch_timer_interval; /* usec */ >> + unsigned int read_timer_interval; /* usec */ >> + enum lttng_event_output output; /* splice, mmap */ >> + >> + int overwrite; /* 1: overwrite, 0: discard */ >> + 
char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; >> +}__attribute__((packed)); >> >> #endif /* _LTTNG_KERNEL_H */ > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From mathieu.desnoyers at efficios.com Tue Oct 9 12:23:14 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 9 Oct 2012 12:23:14 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH v5] ABI with support for compat 32/64 bits In-Reply-To: <507437B7.8020309@efficios.com> References: <1349204606-6473-1-git-send-email-jdesfossez@efficios.com> <506C4044.7000603@efficios.com> <507437B7.8020309@efficios.com> Message-ID: <20121009162314.GA2829@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > Status on that patch? Looks good to me! Thanks! Acked-by: Mathieu Desnoyers > > Mathieu? > > David Goulet: > > This look good to me. There is a small change I would make though to the > > open_metadata call where the old_channel is not in the if - else{} > > statement but I'll do it before merging it. Don't bother sending back a > > version. > > > > Acked. > > > > David > > > > Julien Desfossez: > >> The current ABI does not work for compat 32/64 bits. > >> This patch moves the current ABI as old-abi and provides a new ABI in > >> which all the structures exchanged between user and kernel-space are > >> packed. Also this new ABI moves the "int overwrite" member of the > >> struct lttng_kernel_channel to remove the alignment added by the > >> compiler. > >> > >> A patch for lttng-modules has been developed in parallel to this one > >> to support the new ABI. These 2 patches have been tested in all > >> possible configurations (applied or not) on 64-bit and 32-bit kernels > >> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit. 
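The effect of packing the structures and of moving "int overwrite", as described in the commit message above, can be checked with a small C sketch. The names demo_old_channel, demo_new_channel and demo_print_layouts() are hypothetical stand-ins and not the real lttng definitions: the enum member is replaced by a plain int and the trailing padding array is left out. The sketch assumes GCC or Clang (for __attribute__((packed))) and a 4-byte int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the old layout: natural alignment, with the
 * 4-byte "overwrite" placed before two 64-bit members. An x86-64 compiler
 * typically inserts 4 bytes of padding after "overwrite", while an i386
 * compiler typically does not, so 32-bit and 64-bit builds can disagree
 * on member offsets and on the total size.
 */
struct demo_old_channel {
	int overwrite;				/* 1: overwrite, 0: discard */
	uint64_t subbuf_size;			/* bytes */
	uint64_t num_subbuf;			/* power of 2 */
	unsigned int switch_timer_interval;	/* usec */
	unsigned int read_timer_interval;	/* usec */
	int output;				/* stands in for the enum */
};

/*
 * Hypothetical stand-in for the new layout: 64-bit members first,
 * "overwrite" moved to the end, and the whole struct packed so the
 * compiler adds no padding of its own.
 */
struct demo_new_channel {
	uint64_t subbuf_size;			/* bytes */
	uint64_t num_subbuf;			/* power of 2 */
	unsigned int switch_timer_interval;	/* usec */
	unsigned int read_timer_interval;	/* usec */
	int output;				/* stands in for the enum */
	int overwrite;				/* 1: overwrite, 0: discard */
} __attribute__((packed));

/* Print the size and one key offset of each layout for the current build. */
void demo_print_layouts(void)
{
	printf("old: sizeof=%zu offsetof(subbuf_size)=%zu\n",
		sizeof(struct demo_old_channel),
		offsetof(struct demo_old_channel, subbuf_size));
	printf("new: sizeof=%zu offsetof(overwrite)=%zu\n",
		sizeof(struct demo_new_channel),
		offsetof(struct demo_new_channel, overwrite));
}
```

Calling demo_print_layouts() from a program built once as 64-bit and once as 32-bit (for example with -m32 on x86) should typically show different values on the "old" line and identical values on the "new" line, which is the property a 32-bit user-space needs when talking to a 64-bit kernel.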
> >> > >> Here are the results of the tests : > >> k 64 compat | u 32 compat | OK > >> k 64 compat | u 64 compat | OK > >> k 64 compat | u 32 non-compat | KO > >> k 64 compat | u 64 non-compat | OK > >> > >> k 64 non-compat | u 64 compat | OK > >> k 64 non-compat | u 32 compat | KO > >> k 64 non-compat | u 64 non-compat | OK > >> k 64 non-compat | u 32 non-compat | KO > >> > >> k 32 compat | u compat | OK > >> k 32 compat | u non-compat | OK > >> > >> k 32 non-compat | u compat | OK > >> k 32 non-compat | u non-compat | OK > >> > >> The results are as expected : > >> - on 32-bit user-space and kernel, every configuration works. > >> - on 64-bit user-space and kernel, every configuration works. > >> - with 32-bit user-space on a 64-bit kernel the only configuration > >> where it works is when the compat patch is applied everywhere. > >> > >> Signed-off-by: Julien Desfossez > >> --- > >> src/bin/lttng-sessiond/trace-kernel.h | 1 + > >> src/common/kernel-ctl/kernel-ctl.c | 224 ++++++++++++++++++++++++++++++--- > >> src/common/kernel-ctl/kernel-ctl.h | 1 + > >> src/common/kernel-ctl/kernel-ioctl.h | 74 +++++++---- > >> src/common/lttng-kernel-old.h | 115 +++++++++++++++++ > >> src/common/lttng-kernel.h | 31 +++-- > >> 6 files changed, 397 insertions(+), 49 deletions(-) > >> create mode 100644 src/common/lttng-kernel-old.h > >> > >> diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h > >> index f04d9e7..c86cc27 100644 > >> --- a/src/bin/lttng-sessiond/trace-kernel.h > >> +++ b/src/bin/lttng-sessiond/trace-kernel.h > >> @@ -22,6 +22,7 @@ > >> > >> #include > >> #include > >> +#include > >> > >> #include "consumer.h" > >> > >> diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c > >> index 1396cd9..a93d251 100644 > >> --- a/src/common/kernel-ctl/kernel-ctl.c > >> +++ b/src/common/kernel-ctl/kernel-ctl.c > >> @@ -18,38 +18,175 @@ > >> > >> #define __USE_LINUX_IOCTL_DEFS > >> #include > >> +#include > >> 
> >> #include "kernel-ctl.h" > >> #include "kernel-ioctl.h" > >> > >> +/* > >> + * This flag indicates which version of the kernel ABI to use. The old > >> + * ABI (namespace _old) does not support a 32-bit user-space when the > >> + * kernel is 64-bit. The old ABI is kept here for compatibility but is > >> + * deprecated and will be removed eventually. > >> + */ > >> +static int lttng_kernel_use_old_abi = -1; > >> + > >> +/* > >> + * Execute the new or old ioctl depending on the ABI version. > >> + * If the ABI version is not determined yet (lttng_kernel_use_old_abi = -1), > >> + * this function tests if the new ABI is available and otherwise fallbacks > >> + * on the old one. > >> + * This function takes the fd on which the ioctl must be executed and the old > >> + * and new request codes. > >> + * It returns the return value of the ioctl executed. > >> + */ > >> +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, > >> + unsigned long newname) > >> +{ > >> + int ret; > >> + > >> + if (lttng_kernel_use_old_abi == -1) { > >> + ret = ioctl(fd, newname); > >> + if (!ret) { > >> + lttng_kernel_use_old_abi = 0; > >> + goto end; > >> + } > >> + lttng_kernel_use_old_abi = 1; > >> + } > >> + if (lttng_kernel_use_old_abi) { > >> + ret = ioctl(fd, oldname); > >> + } else { > >> + ret = ioctl(fd, newname); > >> + } > >> + > >> +end: > >> + return ret; > >> +} > >> + > >> int kernctl_create_session(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_SESSION); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, > >> + LTTNG_KERNEL_SESSION); > >> } > >> > >> /* open the metadata global channel */ > >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); > >> + struct lttng_kernel_old_channel old_channel; > >> + struct lttng_kernel_channel channel; > >> + > >> + if (lttng_kernel_use_old_abi) { > >> + old_channel.overwrite = chops->overwrite; > >> + old_channel.subbuf_size = 
chops->subbuf_size; > >> + old_channel.num_subbuf = chops->num_subbuf; > >> + old_channel.switch_timer_interval = chops->switch_timer_interval; > >> + old_channel.read_timer_interval = chops->read_timer_interval; > >> + old_channel.output = chops->output; > >> + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); > >> + } > >> + > >> + channel.overwrite = chops->overwrite; > >> + channel.subbuf_size = chops->subbuf_size; > >> + channel.num_subbuf = chops->num_subbuf; > >> + channel.switch_timer_interval = chops->switch_timer_interval; > >> + channel.read_timer_interval = chops->read_timer_interval; > >> + channel.output = chops->output; > >> + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); > >> } > >> > >> int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); > >> + struct lttng_kernel_channel channel; > >> + > >> + if (lttng_kernel_use_old_abi) { > >> + struct lttng_kernel_old_channel old_channel; > >> + > >> + old_channel.overwrite = chops->overwrite; > >> + old_channel.subbuf_size = chops->subbuf_size; > >> + old_channel.num_subbuf = chops->num_subbuf; > >> + old_channel.switch_timer_interval = chops->switch_timer_interval; > >> + old_channel.read_timer_interval = chops->read_timer_interval; > >> + old_channel.output = chops->output; > >> + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); > >> + } > >> + > >> + channel.overwrite = chops->overwrite; > >> + channel.subbuf_size = chops->subbuf_size; > >> + channel.num_subbuf = chops->num_subbuf; > >> + channel.switch_timer_interval = chops->switch_timer_interval; > >> + channel.read_timer_interval = chops->read_timer_interval; > >> + channel.output = chops->output; > >> 
+ memcpy(channel.padding, chops->padding, sizeof(channel.padding)); > >> + > >> + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); > >> } > >> > >> int kernctl_create_stream(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_STREAM); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, > >> + LTTNG_KERNEL_STREAM); > >> } > >> > >> int kernctl_create_event(int fd, struct lttng_kernel_event *ev) > >> { > >> + if (lttng_kernel_use_old_abi) { > >> + struct lttng_kernel_old_event old_event; > >> + > >> + memcpy(old_event.name, ev->name, sizeof(old_event.name)); > >> + old_event.instrumentation = ev->instrumentation; > >> + switch (ev->instrumentation) { > >> + case LTTNG_KERNEL_KPROBE: > >> + old_event.u.kprobe.addr = ev->u.kprobe.addr; > >> + old_event.u.kprobe.offset = ev->u.kprobe.offset; > >> + memcpy(old_event.u.kprobe.symbol_name, > >> + ev->u.kprobe.symbol_name, > >> + sizeof(old_event.u.kprobe.symbol_name)); > >> + break; > >> + case LTTNG_KERNEL_KRETPROBE: > >> + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; > >> + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; > >> + memcpy(old_event.u.kretprobe.symbol_name, > >> + ev->u.kretprobe.symbol_name, > >> + sizeof(old_event.u.kretprobe.symbol_name)); > >> + break; > >> + case LTTNG_KERNEL_FUNCTION: > >> + memcpy(old_event.u.ftrace.symbol_name, > >> + ev->u.ftrace.symbol_name, > >> + sizeof(old_event.u.ftrace.symbol_name)); > >> + break; > >> + default: > >> + break; > >> + } > >> + > >> + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, &old_event); > >> + } > >> return ioctl(fd, LTTNG_KERNEL_EVENT, ev); > >> } > >> > >> int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > >> { > >> + if (lttng_kernel_use_old_abi) { > >> + struct lttng_kernel_old_context old_ctx; > >> + > >> + old_ctx.ctx = ctx->ctx; > >> + /* only type that uses the union */ > >> + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { > >> + old_ctx.u.perf_counter.type = > >> + ctx->u.perf_counter.type; > >> + 
old_ctx.u.perf_counter.config = > >> + ctx->u.perf_counter.config; > >> + memcpy(old_ctx.u.perf_counter.name, > >> + ctx->u.perf_counter.name, > >> + sizeof(old_ctx.u.perf_counter.name)); > >> + } > >> + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); > >> + } > >> return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); > >> } > >> > >> @@ -57,44 +194,98 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > >> /* Enable event, channel and session ioctl */ > >> int kernctl_enable(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_ENABLE); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, > >> + LTTNG_KERNEL_ENABLE); > >> } > >> > >> /* Disable event, channel and session ioctl */ > >> int kernctl_disable(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_DISABLE); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, > >> + LTTNG_KERNEL_DISABLE); > >> } > >> > >> int kernctl_start_session(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_SESSION_START); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, > >> + LTTNG_KERNEL_SESSION_START); > >> } > >> > >> int kernctl_stop_session(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, > >> + LTTNG_KERNEL_SESSION_STOP); > >> } > >> > >> - > >> int kernctl_tracepoint_list(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, > >> + LTTNG_KERNEL_TRACEPOINT_LIST); > >> } > >> > >> int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > >> + int ret; > >> + > >> + if (lttng_kernel_use_old_abi == -1) { > >> + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > >> + if (!ret) { > >> + lttng_kernel_use_old_abi = 0; > >> + goto end; > >> + } > >> + lttng_kernel_use_old_abi = 1; > >> + } > >> + if (lttng_kernel_use_old_abi) 
{ > >> + struct lttng_kernel_old_tracer_version old_v; > >> + > >> + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); > >> + if (ret) { > >> + goto end; > >> + } > >> + v->major = old_v.major; > >> + v->minor = old_v.minor; > >> + v->patchlevel = old_v.patchlevel; > >> + } else { > >> + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > >> + } > >> + > >> +end: > >> + return ret; > >> } > >> > >> int kernctl_wait_quiescent(int fd) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); > >> + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, > >> + LTTNG_KERNEL_WAIT_QUIESCENT); > >> } > >> > >> int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) > >> { > >> - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > >> + int ret; > >> + > >> + if (lttng_kernel_use_old_abi == -1) { > >> + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > >> + if (!ret) { > >> + lttng_kernel_use_old_abi = 0; > >> + goto end; > >> + } > >> + lttng_kernel_use_old_abi = 1; > >> + } > >> + if (lttng_kernel_use_old_abi) { > >> + struct lttng_kernel_old_calibrate old_calibrate; > >> + > >> + old_calibrate.type = calibrate->type; > >> + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); > >> + if (ret) { > >> + goto end; > >> + } > >> + calibrate->type = old_calibrate.type; > >> + } else { > >> + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > >> + } > >> + > >> +end: > >> + return ret; > >> } > >> > >> > >> @@ -193,10 +384,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) > >> { > >> return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); > >> } > >> - > >> -/* Get the offset of the stream_id in the packet header */ > >> -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) > >> -{ > >> - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); > >> - > >> -} > >> diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h > >> index 18712d9..85a3a18 100644 > >> --- 
a/src/common/kernel-ctl/kernel-ctl.h > >> +++ b/src/common/kernel-ctl/kernel-ctl.h > >> @@ -21,6 +21,7 @@ > >> > >> #include > >> #include > >> +#include > >> > >> int kernctl_create_session(int fd); > >> int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); > >> diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h > >> index 35942be..8e22632 100644 > >> --- a/src/common/kernel-ctl/kernel-ioctl.h > >> +++ b/src/common/kernel-ctl/kernel-ioctl.h > >> @@ -49,37 +49,69 @@ > >> /* map stream to stream id for network streaming */ > >> #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) > >> > >> +/* Old ABI (without support for 32/64 bits compat) */ > >> +/* LTTng file descriptor ioctl */ > >> +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) > >> +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ > >> + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > >> +#define LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) > >> +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) > >> +#define LTTNG_KERNEL_OLD_CALIBRATE \ > >> + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) > >> + > >> +/* Session FD ioctl */ > >> +#define LTTNG_KERNEL_OLD_METADATA \ > >> + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) > >> +#define LTTNG_KERNEL_OLD_CHANNEL \ > >> + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) > >> +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) > >> +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) > >> + > >> +/* Channel FD ioctl */ > >> +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) > >> +#define LTTNG_KERNEL_OLD_EVENT \ > >> + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) > >> +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ > >> + _IOR(0xF6, 0x62, unsigned long) > >> > >> +/* Event and Channel FD ioctl */ > >> +#define LTTNG_KERNEL_OLD_CONTEXT \ > >> + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) > >> + > >> +/* Event, Channel and Session ioctl */ > >> +#define 
LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) > >> +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) > >> + > >> + > >> +/* Current ABI (with suport for 32/64 bits compat) */ > >> /* LTTng file descriptor ioctl */ > >> -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) > >> -#define LTTNG_KERNEL_TRACER_VERSION \ > >> - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) > >> -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) > >> -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) > >> +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) > >> +#define LTTNG_KERNEL_TRACER_VERSION \ > >> + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) > >> +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) > >> +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) > >> #define LTTNG_KERNEL_CALIBRATE \ > >> - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) > >> + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) > >> > >> /* Session FD ioctl */ > >> -#define LTTNG_KERNEL_METADATA \ > >> - _IOW(0xF6, 0x50, struct lttng_channel_attr) > >> -#define LTTNG_KERNEL_CHANNEL \ > >> - _IOW(0xF6, 0x51, struct lttng_channel_attr) > >> -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) > >> -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) > >> +#define LTTNG_KERNEL_METADATA \ > >> + _IOW(0xF6, 0x54, struct lttng_kernel_channel) > >> +#define LTTNG_KERNEL_CHANNEL \ > >> + _IOW(0xF6, 0x55, struct lttng_kernel_channel) > >> +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) > >> +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > >> > >> /* Channel FD ioctl */ > >> -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) > >> -#define LTTNG_KERNEL_EVENT \ > >> - _IOW(0xF6, 0x61, struct lttng_kernel_event) > >> -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ > >> - _IOR(0xF6, 0x62, unsigned long) > >> +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) > >> +#define LTTNG_KERNEL_EVENT \ > >> + _IOW(0xF6, 0x63, struct lttng_kernel_event) > >> > >> /* Event and Channel FD ioctl */ > >> -#define LTTNG_KERNEL_CONTEXT \ > >> - 
_IOW(0xF6, 0x70, struct lttng_kernel_context) > >> +#define LTTNG_KERNEL_CONTEXT \ > >> + _IOW(0xF6, 0x71, struct lttng_kernel_context) > >> > >> /* Event, Channel and Session ioctl */ > >> -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) > >> -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) > >> +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) > >> +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > >> > >> #endif /* _LTT_KERNEL_IOCTL_H */ > >> diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h > >> new file mode 100644 > >> index 0000000..1b8999a > >> --- /dev/null > >> +++ b/src/common/lttng-kernel-old.h > >> @@ -0,0 +1,115 @@ > >> +/* > >> + * Copyright (C) 2011 - Julien Desfossez > >> + * Mathieu Desnoyers > >> + * David Goulet > >> + * > >> + * This program is free software; you can redistribute it and/or modify > >> + * it under the terms of the GNU General Public License, version 2 only, > >> + * as published by the Free Software Foundation. > >> + * > >> + * This program is distributed in the hope that it will be useful, but WITHOUT > >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > >> + * more details. > >> + * > >> + * You should have received a copy of the GNU General Public License along > >> + * with this program; if not, write to the Free Software Foundation, Inc., > >> + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > >> + */ > >> + > >> +#ifndef _LTTNG_KERNEL_OLD_H > >> +#define _LTTNG_KERNEL_OLD_H > >> + > >> +#include > >> +#include > >> + > >> +/* > >> + * LTTng DebugFS ABI structures. > >> + * > >> + * This is the kernel ABI copied from lttng-modules tree. 
> >> + */ > >> + > >> +/* Perf counter attributes */ > >> +struct lttng_kernel_old_perf_counter_ctx { > >> + uint32_t type; > >> + uint64_t config; > >> + char name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> +}; > >> + > >> +/* Event/Channel context */ > >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 > >> +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > >> +struct lttng_kernel_old_context { > >> + enum lttng_kernel_context_type ctx; > >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; > >> + > >> + union { > >> + struct lttng_kernel_old_perf_counter_ctx perf_counter; > >> + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; > >> + } u; > >> +}; > >> + > >> +struct lttng_kernel_old_kretprobe { > >> + uint64_t addr; > >> + > >> + uint64_t offset; > >> + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> +}; > >> + > >> +/* > >> + * Either addr is used, or symbol_name and offset. > >> + */ > >> +struct lttng_kernel_old_kprobe { > >> + uint64_t addr; > >> + > >> + uint64_t offset; > >> + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> +}; > >> + > >> +/* Function tracer */ > >> +struct lttng_kernel_old_function { > >> + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> +}; > >> + > >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 > >> +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > >> +struct lttng_kernel_old_event { > >> + char name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> + enum lttng_kernel_instrumentation instrumentation; > >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; > >> + > >> + /* Per instrumentation type configuration */ > >> + union { > >> + struct lttng_kernel_old_kretprobe kretprobe; > >> + struct lttng_kernel_old_kprobe kprobe; > >> + struct lttng_kernel_old_function ftrace; > >> + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; > >> + } u; > >> +}; > >> + > >> +struct lttng_kernel_old_tracer_version { > >> + uint32_t major; > >> + uint32_t minor; > >> + uint32_t patchlevel; > >> +}; > >> + > >> +struct 
lttng_kernel_old_calibrate { > >> + enum lttng_kernel_calibrate_type type; /* type (input) */ > >> +}; > >> + > >> +/* > >> + * kernel channel > >> + */ > >> +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > >> +struct lttng_kernel_old_channel { > >> + int overwrite; /* 1: overwrite, 0: discard */ > >> + uint64_t subbuf_size; /* bytes */ > >> + uint64_t num_subbuf; /* power of 2 */ > >> + unsigned int switch_timer_interval; /* usec */ > >> + unsigned int read_timer_interval; /* usec */ > >> + enum lttng_event_output output; /* splice, mmap */ > >> + > >> + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; > >> +}; > >> + > >> +#endif /* _LTTNG_KERNEL_OLD_H */ > >> diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h > >> index dbeb6aa..fa8ba61 100644 > >> --- a/src/common/lttng-kernel.h > >> +++ b/src/common/lttng-kernel.h > >> @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { > >> uint32_t type; > >> uint64_t config; > >> char name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> /* Event/Channel context */ > >> #define LTTNG_KERNEL_CONTEXT_PADDING1 16 > >> @@ -72,14 +72,14 @@ struct lttng_kernel_context { > >> struct lttng_kernel_perf_counter_ctx perf_counter; > >> char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; > >> } u; > >> -}; > >> +}__attribute__((packed)); > >> > >> struct lttng_kernel_kretprobe { > >> uint64_t addr; > >> > >> uint64_t offset; > >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> /* > >> * Either addr is used, or symbol_name and offset. 
> >> @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { > >> > >> uint64_t offset; > >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> /* Function tracer */ > >> struct lttng_kernel_function { > >> char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > >> -}; > >> +}__attribute__((packed)); > >> > >> #define LTTNG_KERNEL_EVENT_PADDING1 16 > >> #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > >> @@ -110,13 +110,13 @@ struct lttng_kernel_event { > >> struct lttng_kernel_function ftrace; > >> char padding[LTTNG_KERNEL_EVENT_PADDING2]; > >> } u; > >> -}; > >> +}__attribute__((packed)); > >> > >> struct lttng_kernel_tracer_version { > >> uint32_t major; > >> uint32_t minor; > >> uint32_t patchlevel; > >> -}; > >> +}__attribute__((packed)); > >> > >> enum lttng_kernel_calibrate_type { > >> LTTNG_KERNEL_CALIBRATE_KRETPROBE, > >> @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { > >> > >> struct lttng_kernel_calibrate { > >> enum lttng_kernel_calibrate_type type; /* type (input) */ > >> -}; > >> +}__attribute__((packed)); > >> + > >> +/* > >> + * kernel channel > >> + */ > >> +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > >> +struct lttng_kernel_channel { > >> + uint64_t subbuf_size; /* bytes */ > >> + uint64_t num_subbuf; /* power of 2 */ > >> + unsigned int switch_timer_interval; /* usec */ > >> + unsigned int read_timer_interval; /* usec */ > >> + enum lttng_event_output output; /* splice, mmap */ > >> + > >> + int overwrite; /* 1: overwrite, 0: discard */ > >> + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > >> +}__attribute__((packed)); > >> > >> #endif /* _LTTNG_KERNEL_H */ > > > > _______________________________________________ > > lttng-dev mailing list > > lttng-dev at lists.lttng.org > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > 
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com Tue Oct 9 12:52:38 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Tue, 9 Oct 2012 12:52:38 -0400
Subject: [lttng-dev] UST segfault: memcpy too big
In-Reply-To: <50735E53.1010104@mentor.com>
References: <50735E53.1010104@mentor.com>
Message-ID: <20121009165238.GA5195@Krystal>

* Hollis Blanchard (hollis_blanchard at mentor.com) wrote:
> I seem to have hit a little problem with a "hello world" test app and
> lttng-ust 2.0.3. lttng-ust.git seems to be affected as well. Basically,
> I created a single UST tracepoint, but as soon as I run "lttng
> enable-event -u -a", my app segfaults. The problem seems to be that when
> creating the event to pass to ltt_event_create(), we try to memcpy the
> full 256 bytes of name. However, the name might be shorter, and if we
> get unlucky it falls within 256 bytes of the segment boundary...

Good catch !! Fixed by commit:

master:

commit 1c7b4a9b7cc83f750a7d58d5e2f4894a2559f583
Author: Mathieu Desnoyers
Date: Tue Oct 9 12:47:31 2012 -0400

    Fix: memcpy of string is larger than source

    Hollis Blanchard wrote:
    > I seem to have hit a little problem with a "hello world" test app and
    > lttng-ust 2.0.3. lttng-ust.git seems to be affected as well. Basically,
    > I created a single UST tracepoint, but as soon as I run "lttng
    > enable-event -u -a", my app segfaults. The problem seems to be that when
    > creating the event to pass to ltt_event_create(), we try to memcpy the
    > full 256 bytes of name. However, the name might be shorter, and if we
    > get unlucky it falls within 256 bytes of the segment boundary...

    Fixing the 3 sites where this issue arise. Manually inspecting all
    memcpy in the UST code returned by grep did the job.
    Reported-by: Hollis Blanchard
    Signed-off-by: Mathieu Desnoyers

stable-2.0:

commit 7a673d9947d11a37d08be89a5c157afdfd377f9f
Author: Mathieu Desnoyers
Date: Tue Oct 9 12:47:31 2012 -0400

    Fix: memcpy of string is larger than source

    Hollis Blanchard wrote:
    > I seem to have hit a little problem with a "hello world" test app and
    > lttng-ust 2.0.3. lttng-ust.git seems to be affected as well. Basically,
    > I created a single UST tracepoint, but as soon as I run "lttng
    > enable-event -u -a", my app segfaults. The problem seems to be that when
    > creating the event to pass to ltt_event_create(), we try to memcpy the
    > full 256 bytes of name. However, the name might be shorter, and if we
    > get unlucky it falls within 256 bytes of the segment boundary...

    Fixing the 3 sites where this issue arise. Manually inspecting all
    memcpy in the UST code returned by grep did the job.

    Reported-by: Hollis Blanchard
    Signed-off-by: Mathieu Desnoyers

Thanks!

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From dgoulet at efficios.com Tue Oct 9 14:39:20 2012
From: dgoulet at efficios.com (David Goulet)
Date: Tue, 09 Oct 2012 14:39:20 -0400
Subject: [lttng-dev] [LTTNG-TOOLS PATCH v5] ABI with support for compat 32/64 bits
In-Reply-To: <1349204606-6473-1-git-send-email-jdesfossez@efficios.com>
References: <1349204606-6473-1-git-send-email-jdesfossez@efficios.com>
Message-ID: <50746F58.6070400@efficios.com>

Merged!

Thanks
David

Julien Desfossez:
> The current ABI does not work for compat 32/64 bits.
> This patch moves the current ABI as old-abi and provides a new ABI in
> which all the structures exchanged between user and kernel-space are
> packed. Also this new ABI moves the "int overwrite" member of the
> struct lttng_kernel_channel to remove the alignment added by the
> compiler.
>
> A patch for lttng-modules has been developed in parallel to this one
> to support the new ABI.
> These 2 patches have been tested in all
> possible configurations (applied or not) on 64-bit and 32-bit kernels
> (with CONFIG_COMPAT) and a user-space in 32 and 64-bit.
>
> Here are the results of the tests:
> k 64 compat     | u 32 compat     | OK
> k 64 compat     | u 64 compat     | OK
> k 64 compat     | u 32 non-compat | KO
> k 64 compat     | u 64 non-compat | OK
>
> k 64 non-compat | u 64 compat     | OK
> k 64 non-compat | u 32 compat     | KO
> k 64 non-compat | u 64 non-compat | OK
> k 64 non-compat | u 32 non-compat | KO
>
> k 32 compat     | u compat        | OK
> k 32 compat     | u non-compat    | OK
>
> k 32 non-compat | u compat        | OK
> k 32 non-compat | u non-compat    | OK
>
> The results are as expected:
> - on 32-bit user-space and kernel, every configuration works.
> - on 64-bit user-space and kernel, every configuration works.
> - with 32-bit user-space on a 64-bit kernel the only configuration
>   where it works is when the compat patch is applied everywhere.
>
> Signed-off-by: Julien Desfossez
> ---
>  src/bin/lttng-sessiond/trace-kernel.h |    1 +
>  src/common/kernel-ctl/kernel-ctl.c    |  224 ++++++++++++++++++++++++++++++---
>  src/common/kernel-ctl/kernel-ctl.h    |    1 +
>  src/common/kernel-ctl/kernel-ioctl.h  |   74 +++++++----
>  src/common/lttng-kernel-old.h         |  115 +++++++++++++++++
>  src/common/lttng-kernel.h             |   31 +++--
>  6 files changed, 397 insertions(+), 49 deletions(-)
>  create mode 100644 src/common/lttng-kernel-old.h
>
> diff --git a/src/bin/lttng-sessiond/trace-kernel.h b/src/bin/lttng-sessiond/trace-kernel.h
> index f04d9e7..c86cc27 100644
> --- a/src/bin/lttng-sessiond/trace-kernel.h
> +++ b/src/bin/lttng-sessiond/trace-kernel.h
> @@ -22,6 +22,7 @@
>
>  #include
>  #include
> +#include
>
>  #include "consumer.h"
>
> diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c
> index 1396cd9..a93d251 100644
> --- a/src/common/kernel-ctl/kernel-ctl.c
> +++ b/src/common/kernel-ctl/kernel-ctl.c
> @@ -18,38 +18,175 @@
>
>  #define __USE_LINUX_IOCTL_DEFS
>  #include
> +#include
>
> #include "kernel-ctl.h" > #include "kernel-ioctl.h" > > +/* > + * This flag indicates which version of the kernel ABI to use. The old > + * ABI (namespace _old) does not support a 32-bit user-space when the > + * kernel is 64-bit. The old ABI is kept here for compatibility but is > + * deprecated and will be removed eventually. > + */ > +static int lttng_kernel_use_old_abi = -1; > + > +/* > + * Execute the new or old ioctl depending on the ABI version. > + * If the ABI version is not determined yet (lttng_kernel_use_old_abi = -1), > + * this function tests if the new ABI is available and otherwise fallbacks > + * on the old one. > + * This function takes the fd on which the ioctl must be executed and the old > + * and new request codes. > + * It returns the return value of the ioctl executed. > + */ > +static inline int compat_ioctl_no_arg(int fd, unsigned long oldname, > + unsigned long newname) > +{ > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, newname); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + ret = ioctl(fd, oldname); > + } else { > + ret = ioctl(fd, newname); > + } > + > +end: > + return ret; > +} > + > int kernctl_create_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION, > + LTTNG_KERNEL_SESSION); > } > > /* open the metadata global channel */ > int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops) > { > - return ioctl(fd, LTTNG_KERNEL_METADATA, chops); > + struct lttng_kernel_old_channel old_channel; > + struct lttng_kernel_channel channel; > + > + if (lttng_kernel_use_old_abi) { > + old_channel.overwrite = chops->overwrite; > + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; > + old_channel.switch_timer_interval = chops->switch_timer_interval; > + old_channel.read_timer_interval = 
chops->read_timer_interval; > + old_channel.output = chops->output; > + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_OLD_METADATA, &old_channel); > + } > + > + channel.overwrite = chops->overwrite; > + channel.subbuf_size = chops->subbuf_size; > + channel.num_subbuf = chops->num_subbuf; > + channel.switch_timer_interval = chops->switch_timer_interval; > + channel.read_timer_interval = chops->read_timer_interval; > + channel.output = chops->output; > + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_METADATA, &channel); > } > > int kernctl_create_channel(int fd, struct lttng_channel_attr *chops) > { > - return ioctl(fd, LTTNG_KERNEL_CHANNEL, chops); > + struct lttng_kernel_channel channel; > + > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_channel old_channel; > + > + old_channel.overwrite = chops->overwrite; > + old_channel.subbuf_size = chops->subbuf_size; > + old_channel.num_subbuf = chops->num_subbuf; > + old_channel.switch_timer_interval = chops->switch_timer_interval; > + old_channel.read_timer_interval = chops->read_timer_interval; > + old_channel.output = chops->output; > + memcpy(old_channel.padding, chops->padding, sizeof(old_channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_OLD_CHANNEL, &old_channel); > + } > + > + channel.overwrite = chops->overwrite; > + channel.subbuf_size = chops->subbuf_size; > + channel.num_subbuf = chops->num_subbuf; > + channel.switch_timer_interval = chops->switch_timer_interval; > + channel.read_timer_interval = chops->read_timer_interval; > + channel.output = chops->output; > + memcpy(channel.padding, chops->padding, sizeof(channel.padding)); > + > + return ioctl(fd, LTTNG_KERNEL_CHANNEL, &channel); > } > > int kernctl_create_stream(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_STREAM); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_STREAM, > + LTTNG_KERNEL_STREAM); > } > > int 
kernctl_create_event(int fd, struct lttng_kernel_event *ev) > { > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_event old_event; > + > + memcpy(old_event.name, ev->name, sizeof(old_event.name)); > + old_event.instrumentation = ev->instrumentation; > + switch (ev->instrumentation) { > + case LTTNG_KERNEL_KPROBE: > + old_event.u.kprobe.addr = ev->u.kprobe.addr; > + old_event.u.kprobe.offset = ev->u.kprobe.offset; > + memcpy(old_event.u.kprobe.symbol_name, > + ev->u.kprobe.symbol_name, > + sizeof(old_event.u.kprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_KRETPROBE: > + old_event.u.kretprobe.addr = ev->u.kretprobe.addr; > + old_event.u.kretprobe.offset = ev->u.kretprobe.offset; > + memcpy(old_event.u.kretprobe.symbol_name, > + ev->u.kretprobe.symbol_name, > + sizeof(old_event.u.kretprobe.symbol_name)); > + break; > + case LTTNG_KERNEL_FUNCTION: > + memcpy(old_event.u.ftrace.symbol_name, > + ev->u.ftrace.symbol_name, > + sizeof(old_event.u.ftrace.symbol_name)); > + break; > + default: > + break; > + } > + > + return ioctl(fd, LTTNG_KERNEL_OLD_EVENT, &old_event); > + } > return ioctl(fd, LTTNG_KERNEL_EVENT, ev); > } > > int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > { > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_context old_ctx; > + > + old_ctx.ctx = ctx->ctx; > + /* only type that uses the union */ > + if (ctx->ctx == LTTNG_KERNEL_CONTEXT_PERF_COUNTER) { > + old_ctx.u.perf_counter.type = > + ctx->u.perf_counter.type; > + old_ctx.u.perf_counter.config = > + ctx->u.perf_counter.config; > + memcpy(old_ctx.u.perf_counter.name, > + ctx->u.perf_counter.name, > + sizeof(old_ctx.u.perf_counter.name)); > + } > + return ioctl(fd, LTTNG_KERNEL_OLD_CONTEXT, &old_ctx); > + } > return ioctl(fd, LTTNG_KERNEL_CONTEXT, ctx); > } > > @@ -57,44 +194,98 @@ int kernctl_add_context(int fd, struct lttng_kernel_context *ctx) > /* Enable event, channel and session ioctl */ > int kernctl_enable(int fd) > { > - return ioctl(fd, 
LTTNG_KERNEL_ENABLE); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_ENABLE, > + LTTNG_KERNEL_ENABLE); > } > > /* Disable event, channel and session ioctl */ > int kernctl_disable(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_DISABLE); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_DISABLE, > + LTTNG_KERNEL_DISABLE); > } > > int kernctl_start_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION_START); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_START, > + LTTNG_KERNEL_SESSION_START); > } > > int kernctl_stop_session(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_SESSION_STOP); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_SESSION_STOP, > + LTTNG_KERNEL_SESSION_STOP); > } > > - > int kernctl_tracepoint_list(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_TRACEPOINT_LIST); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_TRACEPOINT_LIST, > + LTTNG_KERNEL_TRACEPOINT_LIST); > } > > int kernctl_tracer_version(int fd, struct lttng_kernel_tracer_version *v) > { > - return ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_tracer_version old_v; > + > + ret = ioctl(fd, LTTNG_KERNEL_OLD_TRACER_VERSION, &old_v); > + if (ret) { > + goto end; > + } > + v->major = old_v.major; > + v->minor = old_v.minor; > + v->patchlevel = old_v.patchlevel; > + } else { > + ret = ioctl(fd, LTTNG_KERNEL_TRACER_VERSION, v); > + } > + > +end: > + return ret; > } > > int kernctl_wait_quiescent(int fd) > { > - return ioctl(fd, LTTNG_KERNEL_WAIT_QUIESCENT); > + return compat_ioctl_no_arg(fd, LTTNG_KERNEL_OLD_WAIT_QUIESCENT, > + LTTNG_KERNEL_WAIT_QUIESCENT); > } > > int kernctl_calibrate(int fd, struct lttng_kernel_calibrate *calibrate) > { > - return ioctl(fd, LTTNG_KERNEL_CALIBRATE, 
calibrate); > + int ret; > + > + if (lttng_kernel_use_old_abi == -1) { > + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + if (!ret) { > + lttng_kernel_use_old_abi = 0; > + goto end; > + } > + lttng_kernel_use_old_abi = 1; > + } > + if (lttng_kernel_use_old_abi) { > + struct lttng_kernel_old_calibrate old_calibrate; > + > + old_calibrate.type = calibrate->type; > + ret = ioctl(fd, LTTNG_KERNEL_OLD_CALIBRATE, &old_calibrate); > + if (ret) { > + goto end; > + } > + calibrate->type = old_calibrate.type; > + } else { > + ret = ioctl(fd, LTTNG_KERNEL_CALIBRATE, calibrate); > + } > + > +end: > + return ret; > } > > > @@ -193,10 +384,3 @@ int kernctl_set_stream_id(int fd, unsigned long *stream_id) > { > return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); > } > - > -/* Get the offset of the stream_id in the packet header */ > -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset) > -{ > - return ioctl(fd, LTTNG_KERNEL_STREAM_ID_OFFSET, offset); > - > -} > diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h > index 18712d9..85a3a18 100644 > --- a/src/common/kernel-ctl/kernel-ctl.h > +++ b/src/common/kernel-ctl/kernel-ctl.h > @@ -21,6 +21,7 @@ > > #include > #include > +#include > > int kernctl_create_session(int fd); > int kernctl_open_metadata(int fd, struct lttng_channel_attr *chops); > diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h > index 35942be..8e22632 100644 > --- a/src/common/kernel-ctl/kernel-ioctl.h > +++ b/src/common/kernel-ctl/kernel-ioctl.h > @@ -49,37 +49,69 @@ > /* map stream to stream id for network streaming */ > #define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) > > +/* Old ABI (without support for 32/64 bits compat) */ > +/* LTTng file descriptor ioctl */ > +#define LTTNG_KERNEL_OLD_SESSION _IO(0xF6, 0x40) > +#define LTTNG_KERNEL_OLD_TRACER_VERSION \ > + _IOR(0xF6, 0x41, struct lttng_kernel_old_tracer_version) > +#define 
LTTNG_KERNEL_OLD_TRACEPOINT_LIST _IO(0xF6, 0x42) > +#define LTTNG_KERNEL_OLD_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_OLD_CALIBRATE \ > + _IOWR(0xF6, 0x44, struct lttng_kernel_old_calibrate) > + > +/* Session FD ioctl */ > +#define LTTNG_KERNEL_OLD_METADATA \ > + _IOW(0xF6, 0x50, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_CHANNEL \ > + _IOW(0xF6, 0x51, struct lttng_kernel_old_channel) > +#define LTTNG_KERNEL_OLD_SESSION_START _IO(0xF6, 0x52) > +#define LTTNG_KERNEL_OLD_SESSION_STOP _IO(0xF6, 0x53) > + > +/* Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) > +#define LTTNG_KERNEL_OLD_EVENT \ > + _IOW(0xF6, 0x61, struct lttng_kernel_old_event) > +#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ > + _IOR(0xF6, 0x62, unsigned long) > > +/* Event and Channel FD ioctl */ > +#define LTTNG_KERNEL_OLD_CONTEXT \ > + _IOW(0xF6, 0x70, struct lttng_kernel_old_context) > + > +/* Event, Channel and Session ioctl */ > +#define LTTNG_KERNEL_OLD_ENABLE _IO(0xF6, 0x80) > +#define LTTNG_KERNEL_OLD_DISABLE _IO(0xF6, 0x81) > + > + > +/* Current ABI (with suport for 32/64 bits compat) */ > /* LTTng file descriptor ioctl */ > -#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x40) > -#define LTTNG_KERNEL_TRACER_VERSION \ > - _IOR(0xF6, 0x41, struct lttng_kernel_tracer_version) > -#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x42) > -#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x43) > +#define LTTNG_KERNEL_SESSION _IO(0xF6, 0x45) > +#define LTTNG_KERNEL_TRACER_VERSION \ > + _IOR(0xF6, 0x46, struct lttng_kernel_tracer_version) > +#define LTTNG_KERNEL_TRACEPOINT_LIST _IO(0xF6, 0x47) > +#define LTTNG_KERNEL_WAIT_QUIESCENT _IO(0xF6, 0x48) > #define LTTNG_KERNEL_CALIBRATE \ > - _IOWR(0xF6, 0x44, struct lttng_kernel_calibrate) > + _IOWR(0xF6, 0x49, struct lttng_kernel_calibrate) > > /* Session FD ioctl */ > -#define LTTNG_KERNEL_METADATA \ > - _IOW(0xF6, 0x50, struct lttng_channel_attr) > -#define LTTNG_KERNEL_CHANNEL \ > - _IOW(0xF6, 0x51, struct 
lttng_channel_attr) > -#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x52) > -#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x53) > +#define LTTNG_KERNEL_METADATA \ > + _IOW(0xF6, 0x54, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_CHANNEL \ > + _IOW(0xF6, 0x55, struct lttng_kernel_channel) > +#define LTTNG_KERNEL_SESSION_START _IO(0xF6, 0x56) > +#define LTTNG_KERNEL_SESSION_STOP _IO(0xF6, 0x57) > > /* Channel FD ioctl */ > -#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x60) > -#define LTTNG_KERNEL_EVENT \ > - _IOW(0xF6, 0x61, struct lttng_kernel_event) > -#define LTTNG_KERNEL_STREAM_ID_OFFSET \ > - _IOR(0xF6, 0x62, unsigned long) > +#define LTTNG_KERNEL_STREAM _IO(0xF6, 0x62) > +#define LTTNG_KERNEL_EVENT \ > + _IOW(0xF6, 0x63, struct lttng_kernel_event) > > /* Event and Channel FD ioctl */ > -#define LTTNG_KERNEL_CONTEXT \ > - _IOW(0xF6, 0x70, struct lttng_kernel_context) > +#define LTTNG_KERNEL_CONTEXT \ > + _IOW(0xF6, 0x71, struct lttng_kernel_context) > > /* Event, Channel and Session ioctl */ > -#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x80) > -#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x81) > +#define LTTNG_KERNEL_ENABLE _IO(0xF6, 0x82) > +#define LTTNG_KERNEL_DISABLE _IO(0xF6, 0x83) > > #endif /* _LTT_KERNEL_IOCTL_H */ > diff --git a/src/common/lttng-kernel-old.h b/src/common/lttng-kernel-old.h > new file mode 100644 > index 0000000..1b8999a > --- /dev/null > +++ b/src/common/lttng-kernel-old.h > @@ -0,0 +1,115 @@ > +/* > + * Copyright (C) 2011 - Julien Desfossez > + * Mathieu Desnoyers > + * David Goulet > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2 only, > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along > + * with this program; if not, write to the Free Software Foundation, Inc., > + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. > + */ > + > +#ifndef _LTTNG_KERNEL_OLD_H > +#define _LTTNG_KERNEL_OLD_H > + > +#include > +#include > + > +/* > + * LTTng DebugFS ABI structures. > + * > + * This is the kernel ABI copied from lttng-modules tree. > + */ > + > +/* Perf counter attributes */ > +struct lttng_kernel_old_perf_counter_ctx { > + uint32_t type; > + uint64_t config; > + char name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +/* Event/Channel context */ > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_CONTEXT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_context { > + enum lttng_kernel_context_type ctx; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING1]; > + > + union { > + struct lttng_kernel_old_perf_counter_ctx perf_counter; > + char padding[LTTNG_KERNEL_OLD_CONTEXT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_kretprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +/* > + * Either addr is used, or symbol_name and offset. 
> + */ > +struct lttng_kernel_old_kprobe { > + uint64_t addr; > + > + uint64_t offset; > + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +/* Function tracer */ > +struct lttng_kernel_old_function { > + char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > +}; > + > +#define LTTNG_KERNEL_OLD_EVENT_PADDING1 16 > +#define LTTNG_KERNEL_OLD_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > +struct lttng_kernel_old_event { > + char name[LTTNG_KERNEL_SYM_NAME_LEN]; > + enum lttng_kernel_instrumentation instrumentation; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING1]; > + > + /* Per instrumentation type configuration */ > + union { > + struct lttng_kernel_old_kretprobe kretprobe; > + struct lttng_kernel_old_kprobe kprobe; > + struct lttng_kernel_old_function ftrace; > + char padding[LTTNG_KERNEL_OLD_EVENT_PADDING2]; > + } u; > +}; > + > +struct lttng_kernel_old_tracer_version { > + uint32_t major; > + uint32_t minor; > + uint32_t patchlevel; > +}; > + > +struct lttng_kernel_old_calibrate { > + enum lttng_kernel_calibrate_type type; /* type (input) */ > +}; > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_OLD_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_old_channel { > + int overwrite; /* 1: overwrite, 0: discard */ > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + enum lttng_event_output output; /* splice, mmap */ > + > + char padding[LTTNG_KERNEL_OLD_CHANNEL_PADDING1]; > +}; > + > +#endif /* _LTTNG_KERNEL_OLD_H */ > diff --git a/src/common/lttng-kernel.h b/src/common/lttng-kernel.h > index dbeb6aa..fa8ba61 100644 > --- a/src/common/lttng-kernel.h > +++ b/src/common/lttng-kernel.h > @@ -59,7 +59,7 @@ struct lttng_kernel_perf_counter_ctx { > uint32_t type; > uint64_t config; > char name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Event/Channel context */ > #define 
LTTNG_KERNEL_CONTEXT_PADDING1 16 > @@ -72,14 +72,14 @@ struct lttng_kernel_context { > struct lttng_kernel_perf_counter_ctx perf_counter; > char padding[LTTNG_KERNEL_CONTEXT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_kretprobe { > uint64_t addr; > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* > * Either addr is used, or symbol_name and offset. > @@ -89,12 +89,12 @@ struct lttng_kernel_kprobe { > > uint64_t offset; > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > /* Function tracer */ > struct lttng_kernel_function { > char symbol_name[LTTNG_KERNEL_SYM_NAME_LEN]; > -}; > +}__attribute__((packed)); > > #define LTTNG_KERNEL_EVENT_PADDING1 16 > #define LTTNG_KERNEL_EVENT_PADDING2 LTTNG_KERNEL_SYM_NAME_LEN + 32 > @@ -110,13 +110,13 @@ struct lttng_kernel_event { > struct lttng_kernel_function ftrace; > char padding[LTTNG_KERNEL_EVENT_PADDING2]; > } u; > -}; > +}__attribute__((packed)); > > struct lttng_kernel_tracer_version { > uint32_t major; > uint32_t minor; > uint32_t patchlevel; > -}; > +}__attribute__((packed)); > > enum lttng_kernel_calibrate_type { > LTTNG_KERNEL_CALIBRATE_KRETPROBE, > @@ -124,6 +124,21 @@ enum lttng_kernel_calibrate_type { > > struct lttng_kernel_calibrate { > enum lttng_kernel_calibrate_type type; /* type (input) */ > -}; > +}__attribute__((packed)); > + > +/* > + * kernel channel > + */ > +#define LTTNG_KERNEL_CHANNEL_PADDING1 LTTNG_SYMBOL_NAME_LEN + 32 > +struct lttng_kernel_channel { > + uint64_t subbuf_size; /* bytes */ > + uint64_t num_subbuf; /* power of 2 */ > + unsigned int switch_timer_interval; /* usec */ > + unsigned int read_timer_interval; /* usec */ > + enum lttng_event_output output; /* splice, mmap */ > + > + int overwrite; /* 1: overwrite, 0: discard */ > + char padding[LTTNG_KERNEL_CHANNEL_PADDING1]; > +}__attribute__((packed)); > > #endif /* _LTTNG_KERNEL_H */ From dgoulet at efficios.com Tue 
Oct 9 15:08:21 2012
From: dgoulet at efficios.com (David Goulet)
Date: Tue, 09 Oct 2012 15:08:21 -0400
Subject: [lttng-dev] LTTng Tools 2.1 streaming commands
In-Reply-To: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se>
References: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se>
Message-ID: <50747625.2010109@efficios.com>

After talking a bit about this issue with other LTTng devs, it turns out
that it makes more sense to have a "set-consumer" command and remove
enable/disable-consumer from the cmd UI.

I'll send a proposal on lttng-dev in the next few days and please, everyone,
feel free to give feedback on this.

Thanks!
David

Bernd Hufmann:
> Hello
>
> For the support of LTTng Tools 2.1 in Eclipse, I'm currently trying to
> understand how to use the configuration for network streaming with the
> updated "lttng create" command and the new "enable-consumer" command.
>
> a) lttng enable-consumer
> I find this command confusing because it does not always
> enable the consumer, even though the command name implies so. The enabling
> actually depends on how the command is executed.
> Examples:
>
>  * "lttng enable-consumer -k -U net://" or "lttng
>    enable-consumer -k -C tcp:// -D tcp://"
>    don't enable the consumer. You need to either add the option --enable or
>    subsequently execute "lttng enable-consumer --enable".
>  * "lttng enable-consumer -k net://" does enable the
>    consumer. It took me a while to figure out the difference from the
>    example above: the option -U is omitted.
>
> What the command actually provides is two features: a way to configure
> streaming (e.g. remote_addr) and a way to enable the consumer. Would it
> be better to name it "lttng configure-consumer"? Also, remove the
> possibility of not specifying -U, -C or -D.
The following > variants of this command should be enough: > lttng configure-consumer -k -U [--enable] > lttng configure-consumer -k -C -D [--enable] > lttng configure-consumer -k --enable > lttng configure-consumer -u -U [--enable] > lttng configure-consumer -u -C -D [--enable] > lttng configure-consumer -u --enable > > Please let me know what you think. > > b) lttng create [-U ] | [-C -D ] > [--no-consumer] [--disable-consumer] > > * Are options --no-consumer or --disable-consumer only applicable for > streaming? > * I'm not sure what is the purpose of the options --no-consumer or > --disable-consumer. Could you please explain the use cases for using > --no-consumer or --disable-consumer? > > > Thanks > Bernd > > This Communication is Confidential. We only send and receive email on > the basis of the terms set out at _www.ericsson.com/email_disclaimer_ > > > > > This body part will be downloaded on demand. From matthew.khouzam at ericsson.com Tue Oct 9 17:18:30 2012 From: matthew.khouzam at ericsson.com (Matthew Khouzam) Date: Tue, 9 Oct 2012 17:18:30 -0400 Subject: [lttng-dev] LTTng Tools 2.1 streaming commands In-Reply-To: <50747625.2010109@efficios.com> References: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se> <50747625.2010109@efficios.com> Message-ID: <507494A6.5090207@ericsson.com> I actually like this suggestion, it is less ambiguous... the last thing we want is for an enable-xyz to disable-xyz... Unless we want to frustrate the users. Cheers, Matthew On 12-10-09 03:08 PM, David Goulet wrote: > After talking a bit about this issue with other LTTng devs, it turns out > that it makes more sense to have a "set-consumer" command and remove > enable/disable-consumer from the cmd UI. > > I'll send a proposal on lttng-dev in the next days and please, everyone, > feel free to give feedbacks on this. > > Thanks! 
> David > > Bernd Hufmann: >> Hello >> >> For the support of LTTng Tools 2.1 in Eclipse, I'm currently trying to >> understand how to use the configuration for network streaming with the >> updated "lttng create"-command and new "enable-consumer"-command. >> >> a) lttng enable-consumer >> I find this command confusing because this command does not always >> enables the consumer, even if the command name implies so. The enabling >> actually depends on how the command is executed. >> Examples: >> >> * "lttng enable-consumer -k -U net://" or "lttng >> enable-consumer -k -C tcp:// -D tcp://" >> don't enable the consumer. You need to either add option --enable or >> execute subsequently "lttng enable-consumer --enable" >> * lttng enable-consumer -k net:// does enable the >> consumer. I took me a while to figure out the difference to the >> example above: The option -U is omitted. >> >> >> What the command actually provides, is 2 features: A way to configure >> streaming (e.g. remote_addr) and a way to enable the consumer. Would it >> be better to name it to "lttng configure-consumer"? Also, remove the >> support of the possibility to not specify -U, -C or -D. The following >> variants of this command should be enough: >> lttng configure-consumer -k -U [--enable] >> lttng configure-consumer -k -C -D [--enable] >> lttng configure-consumer -k --enable >> lttng configure-consumer -u -U [--enable] >> lttng configure-consumer -u -C -D [--enable] >> lttng configure-consumer -u --enable >> >> Please let me know what you think. >> >> b) lttng create [-U ] | [-C -D ] >> [--no-consumer] [--disable-consumer] >> >> * Are options --no-consumer or --disable-consumer only applicable for >> streaming? >> * I'm not sure what is the purpose of the options --no-consumer or >> --disable-consumer. Could you please explain the use cases for using >> --no-consumer or --disable-consumer? >> >> >> Thanks >> Bernd >> >> This Communication is Confidential. 
We only send and receive email on >> the basis of the terms set out at _www.ericsson.com/email_disclaimer_ >> >> >> >> >> This body part will be downloaded on demand. > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From laijs at cn.fujitsu.com Tue Oct 9 22:53:30 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Wed, 10 Oct 2012 10:53:30 +0800 Subject: [lttng-dev] [rp] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue) In-Reply-To: <20121008150729.GB29352@Krystal> References: <20121002141307.GA4057@Krystal> <20121003182846.GN2527@linux.vnet.ibm.com> <20121003210436.GB25090@Krystal> <5072499B.1050301@cn.fujitsu.com> <20121008150729.GB29352@Krystal> Message-ID: <5074E32A.70305@cn.fujitsu.com> On 10/08/2012 11:07 PM, Mathieu Desnoyers wrote: > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: >> On 10/04/2012 05:04 AM, Mathieu Desnoyers wrote: >>> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: >>>> On Tue, Oct 02, 2012 at 10:13:07AM -0400, Mathieu Desnoyers wrote: >>>>> Implement wait-free concurrent queues, with a new API different from >>>>> wfqueue.h, which is already provided by Userspace RCU. The advantage of >>>>> splitting the head and tail objects of the queue into different >>>>> arguments is to allow these to sit on different cache-lines, thus >>>>> eliminating false-sharing, leading to a 2.3x speed increase. >>>>> >>>>> This API also introduces a "splice" operation, which moves all nodes >>>>> from one queue into another, and postpones the synchronization to either >>>>> dequeue or iteration on the list. The splice operation does not need to >>>>> touch every single node of the queue it moves them from. 
Moreover, the >>>>> splice operation only needs to ensure mutual exclusion with other >>>>> dequeuers, iterations, and splice operations from the list it splices >>>>> from, but acts as a simple enqueuer on the list it splices into (no >>>>> mutual exclusion needed for that list). >>>>> >>>>> Feedback is welcome, >>>> >>>> These look sane to me, though I must confess that the tail pointer >>>> referencing the node rather than the node's next pointer did throw >>>> me for a bit. ;-) >>> >>> Yes, this was originally introduced with Lai's original patch to >>> wfqueue, which I think is a nice simplification: it's pretty much the >>> same thing to use the last node address as tail rather than the address >>> of its first member (its next pointer address (_not_ value)). It ends up >>> being the same address in this case, but more interestingly, we don't >>> have to use a struct cds_wfcq_node ** type: a simple struct >>> cds_wfcq_node * suffice. >>> >>> Thanks Paul, I will therefore merge these 3 patches with your Acked-by. >>> >>> Lai, you are welcome to provide improvements to this code against the >>> master branch. I will gladly consider any change you propose. >>> >> >> I did not remember that there is any improvement idea not included. >> The patchset is OK for me. > > Great! Would you be OK if I commit the following patch ? Let me know if > you want me to put your signed-off-by on this (I can even put your email > as From if you like): > > > wfcqueue: update credits in patch documentation > > Give credits to those responsible for the design and implementation of > commit 8ad4ce587f001ae026d5560ac509c2e48986130b, "wfcqueue: implement > concurrency-efficient queue", which happened through rounds of email and > patch exchanges. 
>
> Signed-off-by: Mathieu Desnoyers
> ---
> diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h
> index a989984..153143d 100644
> --- a/urcu/static/wfcqueue.h
> +++ b/urcu/static/wfcqueue.h
> @@ -41,8 +41,10 @@ extern "C" {
>  /*
>   * Concurrent queue with wait-free enqueue/blocking dequeue.
>   *
> - * Inspired from half-wait-free/half-blocking queue implementation done by
> - * Paul E. McKenney.
> + * This queue has been designed and implemented collaboratively by
> + * Mathieu Desnoyers and Lai Jiangshan. Inspired from
> + * half-wait-free/half-blocking queue implementation done by Paul E.
> + * McKenney.
>   *
>   * Mutual exclusion of __cds_wfcq_* API
>   *
> diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h
> index 5576cbf..940dc7d 100644
> --- a/urcu/wfcqueue.h
> +++ b/urcu/wfcqueue.h
> @@ -37,8 +37,10 @@ extern "C" {
>  /*
>   * Concurrent queue with wait-free enqueue/blocking dequeue.
>   *
> - * Inspired from half-wait-free/half-blocking queue implementation done by
> - * Paul E. McKenney.
> + * This queue has been designed and implemented collaboratively by
> + * Mathieu Desnoyers and Lai Jiangshan. Inspired from
> + * half-wait-free/half-blocking queue implementation done by Paul E.
> + * McKenney.
>   */
>
>  struct cds_wfcq_node {
>
>
>> I think you can reimplement wfqueue via wfcqueue without cacheline opt.
>
> Hrm, semantically this can indeed be done, but I fear that we might not
> be strictly ABI-compatible with the old wfqueue. So I would be tempted
> to leave the old wfqueue implementation as-is, and maybe deprecate it at
> some point. Thoughts ?
>

None of the APIs change: in the implementation they are simply forwarded to the wfcqueue APIs, so the ABI of the functions stays compatible.
The only thing affected is struct cds_wfq_queue:

struct cds_wfq_queue {
	struct cds_wfq_node *head, **tail;
	struct cds_wfq_node dummy;	/* Dummy node */
	pthread_mutex_t lock;
};

We can redefine it as:

#define cds_wfq_node cds_wfcq_node

struct cds_wfq_queue {
	union {
		struct cds_wfq_node *head;	/* keeps buggy users who wrongly access ->head directly happy */
		struct cds_wfcq_node *__pad;	/* not used in the new implementation */
	};
	union {
		struct cds_wfq_node **tail;	/* keeps buggy users who wrongly access ->tail directly happy */
		struct cds_wfcq_tail real_tail;
	};
	union {
		struct {
			struct cds_wfq_node dummy;	/* Dummy node */
			pthread_mutex_t lock;
		};
		struct cds_wfcq_head real_head;
	};
};

static inline void _cds_wfq_init(struct cds_wfq_queue *q)
{
	q->head = &q->dummy;	/* keeps buggy users who wrongly access ->head directly happy */
	_cds_wfcq_init(&q->real_head, &q->real_tail);
}

After this change, struct cds_wfq_queue keeps the same layout. Even if a buggy user wrongly accesses struct cds_wfq_queue directly through the old view, the queue stays compatible:

head -> dummy node -> real node -> real node ...
tail -> real tail node (or dummy node)

The only difference is that, in the old view, the dummy node is always the first node.

And if we deprecate struct cds_wfq_queue, I think we should provide a new default wfqueue to users.

Thanks,
Lai

From laijs at cn.fujitsu.com Tue Oct 9 22:56:53 2012
From: laijs at cn.fujitsu.com (Lai Jiangshan)
Date: Wed, 10 Oct 2012 10:56:53 +0800
Subject: [lttng-dev] [URCU PATCH 1/3] wfcqueue: implement concurrency-efficient queue
In-Reply-To: <20121002141444.GB4057@Krystal>
References: <20121002141307.GA4057@Krystal> <20121002141444.GB4057@Krystal>
Message-ID: <5074E3F5.3080005@cn.fujitsu.com>

On 10/02/2012 10:14 PM, Mathieu Desnoyers wrote:
> This new API simplify the wfqueue implementation, and brings a 2.3x to
> 2.6x performance boost due to the ability to eliminate false-sharing
> between enqueue and dequeue.
> > This work is derived from the patch from Lai Jiangshan submitted as > "urcu: new wfqueue implementation" > (http://lists.lttng.org/pipermail/lttng-dev/2012-August/018379.html) > > Its changelog: > >> Some guys would be surprised by this fact: >> There are already TWO implementations of wfqueue in urcu. >> >> The first one is in urcu/static/wfqueue.h: >> 1) enqueue: exchange the tail and then update previous->next >> 2) dequeue: wait for first node's next pointer and them shift, a dummy node >> is introduced to avoid the queue->tail become NULL when shift. >> >> The second one shares some code with the first one, and the left code >> are spreading in urcu-call-rcu-impl.h: >> 1) enqueue: share with the first one >> 2) no dequeue operation: and no shift, so it don't need dummy node, >> Although the dummy node is queued when initialization, but it is removed >> after the first dequeue_all operation in call_rcu_thread(). >> call_rcu_data_free() forgets to handle the dummy node if it is not removed. >> 3)dequeue_all: record the old head and tail, and queue->head become the special >> tail node.(atomic record the tail and change the tail). >> >> The second implementation's code are spreading, bad for review, and it is not >> tested by tests/test_urcu_wfq. >> >> So we need a better implementation avoid the dummy node dancing and can service >> both generic wfqueue APIs and dequeue_all API for call rcu. >> >> The new implementation: >> 1) enqueue: share with the first one/original implementation. >> 2) dequeue: shift when node count >= 2, cmpxchg when node count = 1. >> no dummy node, save memory. >> 3) dequeue_all: simply set queue->head.next to NULL, xchg the tail >> and return the old head.next. >> >> More implementation details are in the code. >> tests/test_urcu_wfq will be update in future for testing new APIs. 
> > The patch proposed by Lai brings a very interesting simplification to > the single-node handling (which is kept here), and moves all queue > handling code away from call_rcu implementation, back into the wfqueue > code. This has the benefit to allow testing enhancements. > > I modified it so the API does not expose implementation details to the > user (e.g. ___cds_wfq_node_sync_next). I added a "splice" operation and > a for loop iterator which should allow wfqueue users to use the list > very efficiently both from LGPL/GPL code and from non-LGPL-compatible > code. > > I also changed the API so the queue head and tail are now two separate > structures: it allows the queue user to place these as they like, either > on different cache lines (to eliminate false-sharing), or close one to > another (on same cache-line) in case a queue is spliced onto the stack > and not concurrently accessed. > > Benchmarks performed on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz > (dual-core, with hyperthreading) > > Benchmark invoked: > for a in $(seq 1 10); do ./test_urcu_wfq 1 1 10 -a 0 -a 2; done > > (using cpu number 0 and 2, which should correspond to two cores of my > Intel 2-core/hyperthread processor) > > Before patch: > > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 97274297 nr_dequeues 80745742 successful enqueues 97274297 successful dequeues 80745321 end_dequeues 16528976 nr_ops 178020039 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 92300568 nr_dequeues 75019529 successful enqueues 92300568 successful dequeues 74973237 end_dequeues 17327331 nr_ops 167320097 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 93516443 nr_dequeues 75846726 successful enqueues 93516443 successful dequeues 75826578 end_dequeues 17689865 nr_ops 169363169 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94160362 nr_dequeues 77967638 successful enqueues 94160362 successful dequeues 77967638 end_dequeues 
16192724 nr_ops 172128000 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 97491956 nr_dequeues 81001191 successful enqueues 97491956 successful dequeues 81000247 end_dequeues 16491709 nr_ops 178493147 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94101298 nr_dequeues 75650510 successful enqueues 94101298 successful dequeues 75649318 end_dequeues 18451980 nr_ops 169751808 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94742803 nr_dequeues 75402105 successful enqueues 94742803 successful dequeues 75341859 end_dequeues 19400944 nr_ops 170144908 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 92198835 nr_dequeues 75037877 successful enqueues 92198835 successful dequeues 75027605 end_dequeues 17171230 nr_ops 167236712 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 94159560 nr_dequeues 77895972 successful enqueues 94159560 successful dequeues 77858442 end_dequeues 16301118 nr_ops 172055532 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 96059399 nr_dequeues 80115442 successful enqueues 96059399 successful dequeues 80066843 end_dequeues 15992556 nr_ops 176174841 > > After patch: > > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 221229322 nr_dequeues 210645491 successful enqueues 221229322 successful dequeues 210645088 end_dequeues 10584234 nr_ops 431874813 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 219803943 nr_dequeues 210377337 successful enqueues 219803943 successful dequeues 210368680 end_dequeues 9435263 nr_ops 430181280 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 237006358 nr_dequeues 237035340 successful enqueues 237006358 successful dequeues 236997050 end_dequeues 9308 nr_ops 474041698 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 235822443 nr_dequeues 235815942 successful enqueues 235822443 successful dequeues 235814020 
end_dequeues 8423 nr_ops 471638385 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 235825567 nr_dequeues 235811803 successful enqueues 235825567 successful dequeues 235810526 end_dequeues 15041 nr_ops 471637370 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 221974953 nr_dequeues 210938190 successful enqueues 221974953 successful dequeues 210938190 end_dequeues 11036763 nr_ops 432913143 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 237994492 nr_dequeues 237938119 successful enqueues 237994492 successful dequeues 237930648 end_dequeues 63844 nr_ops 475932611 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 220634365 nr_dequeues 210491382 successful enqueues 220634365 successful dequeues 210490995 end_dequeues 10143370 nr_ops 431125747 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 237388065 nr_dequeues 237401251 successful enqueues 237388065 successful dequeues 237380295 end_dequeues 7770 nr_ops 474789316 > testdur 10 nr_enqueuers 1 wdelay 0 nr_dequeuers 1 rdur 0 nr_enqueues 221201436 nr_dequeues 210831162 successful enqueues 221201436 successful dequeues 210831162 end_dequeues 10370274 nr_ops 432032598 > > Summary: Both enqueue and dequeue speed increase: around 2.3x speedup > for enqueue, and around 2.6x for dequeue. > > We can verify that: > successful enqueues - successful dequeues = end_dequeues > > For all runs (ensures correctness: no lost node). 
> > CC: Lai Jiangshan > CC: Paul McKenney > Signed-off-by: Mathieu Desnoyers > --- > diff --git a/Makefile.am b/Makefile.am > index 2396fcf..ffdca9a 100644 > --- a/Makefile.am > +++ b/Makefile.am > @@ -16,7 +16,7 @@ nobase_dist_include_HEADERS = urcu/compiler.h urcu/hlist.h urcu/list.h \ > urcu/uatomic/generic.h urcu/arch/generic.h urcu/wfstack.h \ > urcu/wfqueue.h urcu/rculfstack.h urcu/rculfqueue.h \ > urcu/ref.h urcu/cds.h urcu/urcu_ref.h urcu/urcu-futex.h \ > - urcu/uatomic_arch.h urcu/rculfhash.h \ > + urcu/uatomic_arch.h urcu/rculfhash.h urcu/wfcqueue.h \ > $(top_srcdir)/urcu/map/*.h \ > $(top_srcdir)/urcu/static/*.h \ > urcu/tls-compat.h > @@ -53,7 +53,7 @@ lib_LTLIBRARIES = liburcu-common.la \ > # liburcu-common contains wait-free queues (needed by call_rcu) as well > # as futex fallbacks. > # > -liburcu_common_la_SOURCES = wfqueue.c wfstack.c $(COMPAT) > +liburcu_common_la_SOURCES = wfqueue.c wfcqueue.c wfstack.c $(COMPAT) > > liburcu_la_SOURCES = urcu.c urcu-pointer.c $(COMPAT) > liburcu_la_LIBADD = liburcu-common.la > diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h > new file mode 100644 > index 0000000..a989984 > --- /dev/null > +++ b/urcu/static/wfcqueue.h > @@ -0,0 +1,380 @@ > +#ifndef _URCU_WFCQUEUE_STATIC_H > +#define _URCU_WFCQUEUE_STATIC_H > + > +/* > + * wfcqueue-static.h > + * > + * Userspace RCU library - Concurrent Queue with Wait-Free Enqueue/Blocking Dequeue > + * > + * TO BE INCLUDED ONLY IN LGPL-COMPATIBLE CODE. See wfcqueue.h for linking > + * dynamically with the userspace rcu library. > + * > + * Copyright 2010-2012 - Mathieu Desnoyers > + * Copyright 2011-2012 - Lai Jiangshan > + * > + * This library is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. 
> + * > + * This library is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with this library; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/* > + * Concurrent queue with wait-free enqueue/blocking dequeue. > + * > + * Inspired from half-wait-free/half-blocking queue implementation done by > + * Paul E. McKenney. > + * > + * Mutual exclusion of __cds_wfcq_* API > + * > + * Unless otherwise stated, the caller must ensure mutual exclusion of > + * queue update operations "dequeue" and "splice" (for source queue). > + * Queue read operations "first" and "next" need to be protected against > + * concurrent "dequeue" and "splice" (for source queue) by the caller. > + * "enqueue", "splice" (for destination queue), and "empty" are the only > + * operations that can be used without any mutual exclusion. > + * Mutual exclusion can be ensured by holding cds_wfcq_dequeue_lock(). > + * > + * For convenience, cds_wfcq_dequeue_blocking() and > + * cds_wfcq_splice_blocking() hold the dequeue lock. > + */ > + > +#define WFCQ_ADAPT_ATTEMPTS 10 /* Retry if being set */ > +#define WFCQ_WAIT 10 /* Wait 10 ms if being set */ > + > +/* > + * cds_wfcq_node_init: initialize wait-free queue node. > + */ > +static inline void _cds_wfcq_node_init(struct cds_wfcq_node *node) > +{ > + node->next = NULL; > +} > + > +/* > + * cds_wfcq_init: initialize wait-free queue. 
> + */ > +static inline void _cds_wfcq_init(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail) > +{ > + int ret; > + > + /* Set queue head and tail */ > + _cds_wfcq_node_init(&head->node); > + tail->p = &head->node; > + ret = pthread_mutex_init(&head->lock, NULL); > + assert(!ret); > +} > + > +/* > + * cds_wfcq_empty: return whether wait-free queue is empty. > + * > + * No memory barrier is issued. No mutual exclusion is required. > + */ > +static inline bool _cds_wfcq_empty(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail) > +{ > + /* > + * Queue is empty if no node is pointed by head->node.next nor > + * tail->p. Even though the tail->p check is sufficient to find > + * out of the queue is empty, we first check head->node.next as a > + * common case to ensure that dequeuers do not frequently access > + * enqueuer's tail->p cache line. > + */ > + return CMM_LOAD_SHARED(head->node.next) == NULL > + && CMM_LOAD_SHARED(tail->p) == &head->node; > +} > + > +static inline void _cds_wfcq_dequeue_lock(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail) > +{ > + int ret; > + > + ret = pthread_mutex_lock(&head->lock); > + assert(!ret); > +} > + > +static inline void _cds_wfcq_dequeue_unlock(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail) > +{ > + int ret; > + > + ret = pthread_mutex_unlock(&head->lock); > + assert(!ret); > +} > + > +static inline void ___cds_wfcq_append(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail, > + struct cds_wfcq_node *new_head, > + struct cds_wfcq_node *new_tail) > +{ > + struct cds_wfcq_node *old_tail; > + > + /* > + * Implicit memory barrier before uatomic_xchg() orders earlier > + * stores to data structure containing node and setting > + * node->next to NULL before publication. > + */ > + old_tail = uatomic_xchg(&tail->p, new_tail); > + > + /* > + * Implicit memory barrier after uatomic_xchg() orders store to > + * q->tail before store to old_tail->next. 
> + * > + * At this point, dequeuers see a NULL tail->p->next, which > + * indicates that the queue is being appended to. The following > + * store will append "node" to the queue from a dequeuer > + * perspective. > + */ > + CMM_STORE_SHARED(old_tail->next, new_head); > +} > + > +/* > + * cds_wfcq_enqueue: enqueue a node into a wait-free queue. > + * > + * Issues a full memory barrier before enqueue. No mutual exclusion is > + * required. > + */ > +static inline void _cds_wfcq_enqueue(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail, > + struct cds_wfcq_node *new_tail) > +{ > + ___cds_wfcq_append(head, tail, new_tail, new_tail); > +} > + > +/* > + * Waiting for enqueuer to complete enqueue and return the next node. > + */ > +static inline struct cds_wfcq_node * > +___cds_wfcq_node_sync_next(struct cds_wfcq_node *node) > +{ > + struct cds_wfcq_node *next; > + int attempt = 0; > + > + /* > + * Adaptative busy-looping waiting for enqueuer to complete enqueue. > + */ > + while ((next = CMM_LOAD_SHARED(node->next)) == NULL) { > + if (++attempt >= WFCQ_ADAPT_ATTEMPTS) { > + poll(NULL, 0, WFCQ_WAIT); /* Wait for 10ms */ > + attempt = 0; > + } else { > + caa_cpu_relax(); > + } > + } > + > + return next; > +} > + > +/* > + * __cds_wfcq_first_blocking: get first node of a queue, without dequeuing. > + * > + * Content written into the node before enqueue is guaranteed to be > + * consistent, but no other memory ordering is ensured. > + * Should be called with cds_wfcq_dequeue_lock() held. 
> + */ > +static inline struct cds_wfcq_node * > +___cds_wfcq_first_blocking(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail) > +{ > + struct cds_wfcq_node *node; > + > + if (_cds_wfcq_empty(head, tail)) > + return NULL; > + node = ___cds_wfcq_node_sync_next(&head->node); > + /* Load head->node.next before loading node's content */ > + cmm_smp_read_barrier_depends(); > + return node; > +} > + > +/* > + * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. > + * > + * Content written into the node before enqueue is guaranteed to be > + * consistent, but no other memory ordering is ensured. > + * Should be called with cds_wfcq_dequeue_lock() held. > + */ > +static inline struct cds_wfcq_node * > +___cds_wfcq_next_blocking(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail, > + struct cds_wfcq_node *node) > +{ > + struct cds_wfcq_node *next; > + > + /* > + * Even though the following tail->p check is sufficient to find > + * out if we reached the end of the queue, we first check > + * node->next as a common case to ensure that iteration on nodes > + * do not frequently access enqueuer's tail->p cache line. > + */ > + if ((next = CMM_LOAD_SHARED(node->next)) == NULL) { > + /* Load node->next before tail->p */ > + cmm_smp_rmb(); > + if (CMM_LOAD_SHARED(tail->p) == node) > + return NULL; > + next = ___cds_wfcq_node_sync_next(node); > + } > + /* Load node->next before loading next's content */ > + cmm_smp_read_barrier_depends(); > + return next; > +} > + > +/* > + * __cds_wfcq_dequeue_blocking: dequeue a node from the queue. > + * > + * No need to go on a waitqueue here, as there is no possible state in which the > + * list could cause dequeue to busy-loop needlessly while waiting for another > + * thread to be scheduled. The queue appears empty until tail->next is set by > + * enqueue. > + * > + * Content written into the node before enqueue is guaranteed to be > + * consistent, but no other memory ordering is ensured. 
> + * It is valid to reuse and free a dequeued node immediately. > + * Should be called with cds_wfcq_dequeue_lock() held. > + */ > +static inline struct cds_wfcq_node * > +___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail) > +{ > + struct cds_wfcq_node *node, *next; > + > + if (_cds_wfcq_empty(head, tail)) > + return NULL; > + > + node = ___cds_wfcq_node_sync_next(&head->node); > + > + if ((next = CMM_LOAD_SHARED(node->next)) == NULL) { > + /* > + * @node is probably the only node in the queue. > + * Try to move the tail to &q->head. > + * q->head.next is set to NULL here, and stays > + * NULL if the cmpxchg succeeds. Should the > + * cmpxchg fail due to a concurrent enqueue, the > + * q->head.next will be set to the next node. > + * The implicit memory barrier before > + * uatomic_cmpxchg() orders load node->next > + * before loading q->tail. > + * The implicit memory barrier before uatomic_cmpxchg > + * orders load q->head.next before loading node's > + * content. > + */ > + _cds_wfcq_node_init(&head->node); > + if (uatomic_cmpxchg(&tail->p, node, &head->node) == node) > + return node; > + next = ___cds_wfcq_node_sync_next(node); > + } > + > + /* > + * Move queue head forward. > + */ > + head->node.next = next; > + > + /* Load q->head.next before loading node's content */ > + cmm_smp_read_barrier_depends(); > + return node; > +} > + > +/* > + * __cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. > + * > + * Dequeue all nodes from src_q. > + * dest_q must be already initialized. > + * Should be called with cds_wfcq_dequeue_lock() held on src_q. 
> + */ > +static inline void > +___cds_wfcq_splice_blocking( > + struct cds_wfcq_head *dest_q_head, > + struct cds_wfcq_tail *dest_q_tail, > + struct cds_wfcq_head *src_q_head, > + struct cds_wfcq_tail *src_q_tail) > +{ > + struct cds_wfcq_node *head, *tail; > + > + if (_cds_wfcq_empty(src_q_head, src_q_tail)) > + return; > + > + head = ___cds_wfcq_node_sync_next(&src_q_head->node); > + _cds_wfcq_node_init(&src_q_head->node); > + > + /* > + * Memory barrier implied before uatomic_xchg() orders store to > + * src_q->head before store to src_q->tail. This is required by > + * concurrent enqueue on src_q, which exchanges the tail before > + * updating the previous tail's next pointer. > + */ > + tail = uatomic_xchg(&src_q_tail->p, &src_q_head->node); > + > + /* > + * Append the spliced content of src_q into dest_q. Does not > + * require mutual exclusion on dest_q (wait-free). > + */ > + ___cds_wfcq_append(dest_q_head, dest_q_tail, head, tail); > +} > + > +/* > + * cds_wfcq_dequeue_blocking: dequeue a node from a wait-free queue. > + * > + * Content written into the node before enqueue is guaranteed to be > + * consistent, but no other memory ordering is ensured. > + * Mutual exlusion with (and only with) cds_wfcq_splice_blocking is > + * ensured. > + * It is valid to reuse and free a dequeued node immediately. > + */ > +static inline struct cds_wfcq_node * > +_cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, > + struct cds_wfcq_tail *tail) > +{ > + struct cds_wfcq_node *retval; > + > + _cds_wfcq_dequeue_lock(head, tail); > + retval = ___cds_wfcq_dequeue_blocking(head, tail); > + _cds_wfcq_dequeue_unlock(head, tail); > + return retval; > +} > + > +/* > + * cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. > + * > + * Dequeue all nodes from src_q. > + * dest_q must be already initialized. > + * Content written into the node before enqueue is guaranteed to be > + * consistent, but no other memory ordering is ensured. 
> + * Mutual exlusion with (and only with) cds_wfcq_dequeue_blocking is > + * ensured. > + */ > +static inline void > +_cds_wfcq_splice_blocking( > + struct cds_wfcq_head *dest_q_head, > + struct cds_wfcq_tail *dest_q_tail, > + struct cds_wfcq_head *src_q_head, > + struct cds_wfcq_tail *src_q_tail) > +{ > + _cds_wfcq_dequeue_lock(src_q_head, src_q_tail); > + ___cds_wfcq_splice_blocking(dest_q_head, dest_q_tail, > + src_q_head, src_q_tail); > + _cds_wfcq_dequeue_unlock(src_q_head, src_q_tail); > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _URCU_WFCQUEUE_STATIC_H */ > diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h > new file mode 100644 > index 0000000..5576cbf > --- /dev/null > +++ b/urcu/wfcqueue.h > @@ -0,0 +1,263 @@ > +#ifndef _URCU_WFCQUEUE_H > +#define _URCU_WFCQUEUE_H > + > +/* > + * wfcqueue.h > + * > + * Userspace RCU library - Concurrent Queue with Wait-Free Enqueue/Blocking Dequeue > + * > + * Copyright 2010-2012 - Mathieu Desnoyers > + * Copyright 2011-2012 - Lai Jiangshan > + * > + * This library is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * This library is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with this library; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/* > + * Concurrent queue with wait-free enqueue/blocking dequeue. 
> + *
> + * Inspired from half-wait-free/half-blocking queue implementation done by
> + * Paul E. McKenney.
> + */
> +
> +struct cds_wfcq_node {
> +	struct cds_wfcq_node *next;
> +};
> +
> +/*
> + * Do not put head and tail on the same cache-line if concurrent
> + * enqueue/dequeue are expected from many CPUs. This eliminates
> + * false-sharing between enqueue and dequeue.
> + */
> +struct cds_wfcq_head {
> +	struct cds_wfcq_node node;
> +	pthread_mutex_t lock;
> +};
> +
> +struct cds_wfcq_tail {
> +	struct cds_wfcq_node *p;
> +};

Why use "p" here?

I don't see the commits in the public git tree. Or did you forget to update
the tree and make the "http" address happy?

Thanks,
Lai

From mathieu.desnoyers at efficios.com Wed Oct 10 00:50:12 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Wed, 10 Oct 2012 00:50:12 -0400
Subject: [lttng-dev] [URCU PATCH 1/3] wfcqueue: implement concurrency-efficient queue
In-Reply-To: <5074E3F5.3080005@cn.fujitsu.com>
References: <20121002141307.GA4057@Krystal> <20121002141444.GB4057@Krystal> <5074E3F5.3080005@cn.fujitsu.com>
Message-ID: <20121010045012.GA32082@Krystal>

* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> On 10/02/2012 10:14 PM, Mathieu Desnoyers wrote:
[...]
> > --- /dev/null
> > +++ b/urcu/wfcqueue.h
> > @@ -0,0 +1,263 @@
> > +#ifndef _URCU_WFCQUEUE_H
> > +#define _URCU_WFCQUEUE_H
> > +
> > +/*
> > + * wfcqueue.h
> > + *
> > + * Userspace RCU library - Concurrent Queue with Wait-Free Enqueue/Blocking Dequeue
> > + *
> > + * Copyright 2010-2012 - Mathieu Desnoyers
> > + * Copyright 2011-2012 - Lai Jiangshan
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/*
> > + * Concurrent queue with wait-free enqueue/blocking dequeue.
> > + *
> > + * Inspired from half-wait-free/half-blocking queue implementation done by
> > + * Paul E. McKenney.
> > + */
> > +
> > +struct cds_wfcq_node {
> > +	struct cds_wfcq_node *next;
> > +};
> > +
> > +/*
> > + * Do not put head and tail on the same cache-line if concurrent
> > + * enqueue/dequeue are expected from many CPUs. This eliminates
> > + * false-sharing between enqueue and dequeue.
> > + */
> > +struct cds_wfcq_head {
> > +	struct cds_wfcq_node node;
> > +	pthread_mutex_t lock;
> > +};
> > +
> > +struct cds_wfcq_tail {
> > +	struct cds_wfcq_node *p;
> > +};
>
> Why use "p" here?

For lack of imagination. ;) Do you have something better to propose?

>
> I don't see the commits in the public git-tree. Or you forgot to update the
> tree and make "http" address happy.

I have pushed those into the userspace RCU master branch a couple of
days after posting them to the mailing lists. I just synchronized my
urcu/wfcqueue volatile dev branch, but I expect to drop it soon, since
it's now merged back into master.

Thanks!

Mathieu

>
> Thanks,
> Lai

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com Wed Oct 10 00:59:11 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Wed, 10 Oct 2012 00:59:11 -0400
Subject: [lttng-dev] [rp] [URCU PATCH 0/3] wait-free concurrent queues (wfcqueue)
In-Reply-To: <5074E32A.70305@cn.fujitsu.com>
References: <20121002141307.GA4057@Krystal> <20121003182846.GN2527@linux.vnet.ibm.com> <20121003210436.GB25090@Krystal> <5072499B.1050301@cn.fujitsu.com> <20121008150729.GB29352@Krystal> <5074E32A.70305@cn.fujitsu.com>
Message-ID: <20121010045911.GB32082@Krystal>

* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> On 10/08/2012 11:07 PM, Mathieu Desnoyers wrote:
> > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> >> On 10/04/2012 05:04 AM, Mathieu Desnoyers wrote:
> >>> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> >>>> On Tue, Oct 02, 2012 at 10:13:07AM -0400, Mathieu Desnoyers wrote:
> >>>>> Implement wait-free concurrent queues, with a new API different from
> >>>>> wfqueue.h, which is already provided by Userspace RCU. The advantage of
> >>>>> splitting the head and tail objects of the queue into different
> >>>>> arguments is to allow these to sit on different cache-lines, thus
> >>>>> eliminating false-sharing, leading to a 2.3x speed increase.
> >>>>>
> >>>>> This API also introduces a "splice" operation, which moves all nodes
> >>>>> from one queue into another, and postpones the synchronization to either
> >>>>> dequeue or iteration on the list. The splice operation does not need to
> >>>>> touch every single node of the queue it moves them from. Moreover, the
> >>>>> splice operation only needs to ensure mutual exclusion with other
> >>>>> dequeuers, iterations, and splice operations from the list it splices
> >>>>> from, but acts as a simple enqueuer on the list it splices into (no
> >>>>> mutual exclusion needed for that list).
> >>>>>
> >>>>> Feedback is welcome,
> >>>>
> >>>> These look sane to me, though I must confess that the tail pointer
> >>>> referencing the node rather than the node's next pointer did throw
> >>>> me for a bit. ;-)
> >>>
> >>> Yes, this was originally introduced with Lai's original patch to
> >>> wfqueue, which I think is a nice simplification: it's pretty much the
> >>> same thing to use the last node address as tail rather than the address
> >>> of its first member (its next pointer address (_not_ value)). It ends up
> >>> being the same address in this case, but more interestingly, we don't
> >>> have to use a struct cds_wfcq_node ** type: a simple struct
> >>> cds_wfcq_node * suffice.
> >>>
> >>> Thanks Paul, I will therefore merge these 3 patches with your Acked-by.
> >>>
> >>> Lai, you are welcome to provide improvements to this code against the
> >>> master branch. I will gladly consider any change you propose.
> >>>
> >>
> >> I did not remember that there is any improvement idea not included.
> >> The patchset is OK for me.
> >
> > Great! Would you be OK if I commit the following patch ? Let me know if
> > you want me to put your signed-off-by on this (I can even put your email
> > as From if you like):
> >
> >
> > wfcqueue: update credits in patch documentation
> >
> > Give credits to those responsible for the design and implementation of
> > commit 8ad4ce587f001ae026d5560ac509c2e48986130b, "wfcqueue: implement
> > concurrency-efficient queue", which happened through rounds of email and
> > patch exchanges.
> >
> > Signed-off-by: Mathieu Desnoyers

Hi Lai,

Are you OK with this credits patch?

> > ---
> > diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h
> > index a989984..153143d 100644
> > --- a/urcu/static/wfcqueue.h
> > +++ b/urcu/static/wfcqueue.h
> > @@ -41,8 +41,10 @@ extern "C" {
> >  /*
> >   * Concurrent queue with wait-free enqueue/blocking dequeue.
> >   *
> > - * Inspired from half-wait-free/half-blocking queue implementation done by
> > - * Paul E. McKenney.
> > + * This queue has been designed and implemented collaboratively by
> > + * Mathieu Desnoyers and Lai Jiangshan. Inspired from
> > + * half-wait-free/half-blocking queue implementation done by Paul E.
> > + * McKenney.
> >   *
> >   * Mutual exclusion of __cds_wfcq_* API
> >   *
> > diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h
> > index 5576cbf..940dc7d 100644
> > --- a/urcu/wfcqueue.h
> > +++ b/urcu/wfcqueue.h
> > @@ -37,8 +37,10 @@ extern "C" {
> >  /*
> >   * Concurrent queue with wait-free enqueue/blocking dequeue.
> >   *
> > - * Inspired from half-wait-free/half-blocking queue implementation done by
> > - * Paul E. McKenney.
> > + * This queue has been designed and implemented collaboratively by
> > + * Mathieu Desnoyers and Lai Jiangshan. Inspired from
> > + * half-wait-free/half-blocking queue implementation done by Paul E.
> > + * McKenney.
> >   */
> >
> >  struct cds_wfcq_node {
> >
> >
> >> I think you can reimplement wfqueue via wfcqueue without cacheline opt.
> >
> > Hrm, semantically this can indeed be done, but I fear that we might not
> > be strictly ABI-compatible with the old wfqueue. So I would be tempted
> > to leave the old wfqueue implementation as-is, and maybe deprecate it at
> > some point. Thoughts ?
> >
>
> All APIs is not changed, they are forwarded to wfcqueue APIs in implementation,
> so ABI of APIs is compatible.
The only thing is struct cds_wfq_queue:
>
> struct cds_wfq_queue {
> 	struct cds_wfq_node *head, **tail;
> 	struct cds_wfq_node dummy;	/* Dummy node */
> 	pthread_mutex_t lock;
> };
>
>
> We can redefine it as:
>
> #define cds_wfq_node cds_wfcq_node
>
> struct cds_wfq_queue {
> 	union {
> 		struct cds_wfq_node *head; /* make bug-user who wrongly directly access to ->head happy */
> 		struct cds_wfcq_node *__pad; /* not used in new implement */
> 	};
> 	union {
> 		struct cds_wfq_node **tail; /* make bug-user who wrongly directly access to ->tail happy */
> 		struct cds_wfcq_tail real_tail;
> 	};
> 	union {
> 		struct {
> 			struct cds_wfq_node dummy;	/* Dummy node */
> 			pthread_mutex_t lock;
> 		}
> 		struct cds_wfcq_head real_head;
> 	};
> }
>
> static inline void _cds_wfq_init(struct cds_wfq_queue *q)
> {
> 	q->head = &q->dummy; /* make bug-user who wrongly directly access to ->head happy */
> 	_cds_wfcq_init(&q->real_head, &q->real_tail);
> }
>
> after this change, struct cds_wfq_queue is not changed.
> Even bug-user wrongly directly access to struct cds_wfq_queue by old-view,
> the queue is compatible:
> head->dummy node->real node->real node....
> tail->real tail node(or dummy node)
>
> the only different is that: dummy node is always the first node by old-view.
>
>
> And if we deprecate struct cds_wfq_queue, I think we should provide a
> new default wfqueue to users.

I guess the objective is nice, but I'm wondering if your mapping of
wfqueue onto wfcqueue covers scenarios where we have objects that use
both the old and new view statically linked into the same program? My
ABI concern is not just about queue users directly accessing the fields
of the queue, but also about binary-level compatibility of mixed old-new
queues.

Thoughts?

Thanks,

Mathieu

>
> Thanks,
> Lai

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From laijs at cn.fujitsu.com Wed Oct 10 03:52:08 2012
From: laijs at cn.fujitsu.com (Lai Jiangshan)
Date: Wed, 10 Oct 2012 15:52:08 +0800
Subject: [lttng-dev] [RFC] re-document rculfstack and even rename it
Message-ID: <50752928.3050006@cn.fujitsu.com>

rculfstack does not really require RCU.

1) cds_lfs_push_rcu() needs no synchronization at all: neither RCU nor
   any lock.

2) cds_lfs_pop_rcu() needs only one of the following forms of
   synchronization (RCU is just one option):
   A) Protect cds_lfs_pop_rcu() with rcu_read_lock(), and use
      synchronize_rcu() or call_rcu() before freeing the popped node.
      (The current comments say this synchronization is required, which
      is why the structure carries the rcu prefix. But the options below
      also work, and are more common and friendlier.)
   B) Protect cds_lfs_pop_rcu() with a mutex or other lock. The popped
      node can then be freed or modified at any time, with no further
      synchronization.
   C) Allow only ONE thread to call cds_lfs_pop_rcu() (multiple
      producers, single consumer).
   D) Others, such as read-write locks.

I think B) and C) are the more common cases. In the Linux kernel,
kernel/task_work.c uses a hybrid of B) and C).

I suggest renaming it, or at least documenting B) and C).

Thanks,
Lai

From laijs at cn.fujitsu.com Wed Oct 10 06:13:43 2012
From: laijs at cn.fujitsu.com (Lai Jiangshan)
Date: Wed, 10 Oct 2012 18:13:43 +0800
Subject: [lttng-dev] rculfstack bug
Message-ID: <50754A57.1080104@cn.fujitsu.com>

Test command:
./tests/test_urcu_lfs 100 10 10

Bug reproduction rate: > 60%

{{{
I did not see the bug with "./tests/test_urcu_lfs 10 10 10" or "./tests/test_urcu_lfs 100 100 10",
but I only tested that about 5 times.
}}}

4 cores * 1 thread: Intel(R) Core(TM) i5 CPU 760
RCU_MB (no time to test the other RCU flavors)
Test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a

I did not see the bug with "./tests/test_urcu_mb 10 100 10".

Sorry, I tried, but I have not yet found the root cause.
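
As an aside on option B) from the rculfstack RFC earlier in this thread, here is a minimal sketch of a lock-free push combined with a mutex-protected pop. It is not the liburcu implementation: every name (demo_stack, demo_push, demo_pop, demo_run) is hypothetical, and C11 atomics plus a pthread mutex are assumed in place of whatever primitives the library actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the liburcu API. */
struct demo_node {
	struct demo_node *next;
	int value;
};

struct demo_stack {
	_Atomic(struct demo_node *) head;
	pthread_mutex_t pop_lock;	/* serializes poppers only (option B) */
};

static void demo_stack_init(struct demo_stack *s)
{
	atomic_init(&s->head, NULL);
	pthread_mutex_init(&s->pop_lock, NULL);
}

/* Push takes no lock: retry the compare-and-swap until head is updated. */
static void demo_push(struct demo_stack *s, struct demo_node *node)
{
	struct demo_node *old = atomic_load(&s->head);

	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&s->head, &old, node));
}

/*
 * Pop holds pop_lock, so no other popper can remove (and free) the node
 * whose ->next is read here. Concurrent pushes only make the CAS retry.
 */
static struct demo_node *demo_pop(struct demo_stack *s)
{
	struct demo_node *old;

	pthread_mutex_lock(&s->pop_lock);
	old = atomic_load(&s->head);
	while (old && !atomic_compare_exchange_weak(&s->head, &old, old->next))
		;	/* the failed CAS reloaded old; retry with the new head */
	pthread_mutex_unlock(&s->pop_lock);
	return old;
}

/* Single-threaded smoke test: encodes the pop order as decimal digits. */
static int demo_run(void)
{
	struct demo_stack s;
	struct demo_node n[3] = { { NULL, 1 }, { NULL, 2 }, { NULL, 3 } };
	int order = 0;

	demo_stack_init(&s);
	for (int i = 0; i < 3; i++)
		demo_push(&s, &n[i]);
	for (struct demo_node *p; (p = demo_pop(&s)) != NULL; )
		order = order * 10 + p->value;
	assert(demo_pop(&s) == NULL);	/* stack is empty again */
	return order;
}
```

Because pops are serialized, a node at the head cannot be freed by another thread while its next field is being read, so in this sketch the caller may free or reuse a popped node immediately, without waiting for a grace period. Option C) would be the same code with the mutex dropped and a single popper thread.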
*** glibc detected *** /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs: double free or corruption (out): 0x00007f20955dfbb0 *** ======= Backtrace: ========= /lib64/libc.so.6[0x37ee676d63] /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs[0x4024f5] /lib64/libpthread.so.0[0x37eda06ccb] /lib64/libc.so.6(clone+0x6d)[0x37ee6e0c2d] ======= Memory map: ======== 00400000-00405000 r-xp 00000000 08:08 6031723 /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs 00605000-00606000 rw-p 00005000 08:08 6031723 /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs 00606000-00616000 rw-p 00000000 00:00 0 00e9c000-03482000 rw-p 00000000 00:00 0 [heap] 37ed600000-37ed61f000 r-xp 00000000 08:01 1507421 /lib64/ld-2.13.so 37ed81e000-37ed81f000 r--p 0001e000 08:01 1507421 /lib64/ld-2.13.so 37ed81f000-37ed820000 rw-p 0001f000 08:01 1507421 /lib64/ld-2.13.so 37ed820000-37ed821000 rw-p 00000000 00:00 0 37eda00000-37eda17000 r-xp 00000000 08:01 1507427 /lib64/libpthread-2.13.so 37eda17000-37edc16000 ---p 00017000 08:01 1507427 /lib64/libpthread-2.13.so 37edc16000-37edc17000 r--p 00016000 08:01 1507427 /lib64/libpthread-2.13.so 37edc17000-37edc18000 rw-p 00017000 08:01 1507427 /lib64/libpthread-2.13.so 37edc18000-37edc1c000 rw-p 00000000 00:00 0 37ee600000-37ee791000 r-xp 00000000 08:01 1507423 /lib64/libc-2.13.so 37ee791000-37ee991000 ---p 00191000 08:01 1507423 /lib64/libc-2.13.so 37ee991000-37ee995000 r--p 00191000 08:01 1507423 /lib64/libc-2.13.so 37ee995000-37ee996000 rw-p 00195000 08:01 1507423 /lib64/libc-2.13.so 37ee996000-37ee99c000 rw-p 00000000 00:00 0 37f0e00000-37f0e15000 r-xp 00000000 08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 37f0e15000-37f1014000 ---p 00015000 08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 37f1014000-37f1015000 rw-p 00014000 08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 7f1ee4000000-7f1ee4029000 rw-p 00000000 00:00 0 7f1ee4029000-7f1ee8000000 ---p 00000000 00:00 0 7f1eec000000-7f1eee039000 rw-p 00000000 00:00 0 
7f1eee039000-7f1ef0000000 ---p 00000000 00:00 0 7f1ef4000000-7f1ef4029000 rw-p 00000000 00:00 0 7f1ef4029000-7f1ef8000000 ---p 00000000 00:00 0 7f1efc000000-7f1efc029000 rw-p 00000000 00:00 0 7f1efc029000-7f1f00000000 ---p 00000000 00:00 0 7f1f04000000-7f1f060b8000 rw-p 00000000 00:00 0 7f1f060b8000-7f1f08000000 ---p 00000000 00:00 0 7f1f0c000000-7f1f0c029000 rw-p 00000000 00:00 0 7f1f0c029000-7f1f10000000 ---p 00000000 00:00 0 7f1f14000000-7f1f14029000 rw-p 00000000 00:00 0 7f1f14029000-7f1f18000000 ---p 00000000 00:00 0 7f1f1c000000-7f1f1c029000 rw-p 00000000 00:00 0 7f1f1c029000-7f1f20000000 ---p 00000000 00:00 0 7f1f24000000-7f1f24029000 rw-p 00000000 00:00 0 7f1f24029000-7f1f28000000 ---p 00000000 00:00 0 7f1f2c000000-7f1f2c029000 rw-p 00000000 00:00 0 7f1f2c029000-7f1f30000000 ---p 00000000 00:00 0 7f1f34000000-7f1f34029000 rw-p 00000000 00:00 0 7f1f34029000-7f1f38000000 ---p 00000000 00:00 0 7f1f3c000000-7f1f3c029000 rw-p 00000000 00:00 0 7f1f3c029000-7f1f40000000 ---p 00000000 00:00 0 7f1f44000000-7f1f44029000 rw-p 00000000 00:00 0 7f1f44029000-7f1f48000000 ---p 00000000 00:00 0 7f1f4c000000-7f1f4c029000 rw-p 00000000 00:00 0 7f1f4c029000-7f1f50000000 ---p 00000000 00:00 0 7f1f54000000-7f1f54029000 rw-p 00000000 00:00 0 7f1f54029000-7f1f58000000 ---p 00000000 00:00 0 7f1f5c000000-7f1f5c029000 rw-p 00000000 00:00 0 7f1f5c029000-7f1f60000000 ---p 00000000 00:00 0 7f1f64000000-7f1f64029000 rw-p 00000000 00:00 0 7f1f64029000-7f1f68000000 ---p 00000000 00:00 0 7f1f6c000000-7f1f6c029000 rw-p 00000000 00:00 0 7f1f6c029000-7f1f70000000 ---p 00000000 00:00 0 7f1f74000000-7f1f74029000 rw-p 00000000 00:00 0 7f1f74029000-7f1f78000000 ---p 00000000 00:00 0 7f1f7c000000-7f1f7c029000 rw-p 00000000 00:00 0 7f1f7c029000-7f1f80000000 ---p 00000000 00:00 0 7f1f84000000-7f1f84029000 rw-p 00000000 00:00 0 7f1f84029000-7f1f88000000 ---p 00000000 00:00 0 7f1f8c000000-7f1f8c029000 rw-p 00000000 00:00 0 7f1f8c029000-7f1f90000000 ---p 00000000 00:00 0 7f1f94000000-7f1f94029000 rw-p 
00000000 00:00 0 7f1f94029000-7f1f98000000 ---p 00000000 00:00 0 7f1f9c000000-7f1f9c029000 rw-p 00000000 00:00 0 7f1f9c029000-7f1fa0000000 ---p 00000000 00:00 0 7f1fa4000000-7f1fa60ac000 rw-p 00000000 00:00 0 7f1fa60ac000-7f1fa8000000 ---p 00000000 00:00 0 7f1fac000000-7f1fac029000 rw-p 00000000 00:00 0 7f1fac029000-7f1fb0000000 ---p 00000000 00:00 0 7f1fb4000000-7f1fb4029000 rw-p 00000000 00:00 0 7f1fb4029000-7f1fb8000000 ---p 00000000 00:00 0 7f1fbc000000-7f1fbc029000 rw-p 00000000 00:00 0 7f1fbc029000-7f1fc0000000 ---p 00000000 00:00 0 7f1fc4000000-7f1fc4029000 rw-p 00000000 00:00 0 7f1fc4029000-7f1fc8000000 ---p 00000000 00:00 0 7f1fcc000000-7f1fce0a1000 rw-p 00000000 00:00 0 7f1fce0a1000-7f1fd0000000 ---p 00000000 00:00 0 7f1fd4000000-7f1fd4029000 rw-p 00000000 00:00 0 7f1fd4029000-7f1fd8000000 ---p 00000000 00:00 0 7f1fdc000000-7f1fde06b000 rw-p 00000000 00:00 0 7f1fde06b000-7f1fe0000000 ---p 00000000 00:00 0 7f1fe4000000-7f1fe4029000 rw-p 00000000 00:00 0 7f1fe4029000-7f1fe8000000 ---p 00000000 00:00 0 7f1fec000000-7f1fede38000 rw-p 00000000 00:00 0 7f1fede38000-7f1ff0000000 ---p 00000000 00:00 0 7f1ff4000000-7f1ff4029000 rw-p 00000000 00:00 0 7f1ff4029000-7f1ff8000000 ---p 00000000 00:00 0 7f1ffc000000-7f1ffc029000 rw-p 00000000 00:00 0 7f1ffc029000-7f2000000000 ---p 00000000 00:00 0 7f2004000000-7f20060c6000 rw-p 00000000 00:00 0 7f20060c6000-7f2008000000 ---p 00000000 00:00 0 7f200c000000-7f200c029000 rw-p 00000000 00:00 0 7f200c029000-7f2010000000 ---p 00000000 00:00 0 7f2014000000-7f2014029000 rw-p 00000000 00:00 0 7f2014029000-7f2018000000 ---p 00000000 00:00 0 7f201c000000-7f201c029000 rw-p 00000000 00:00 0 7f201c029000-7f2020000000 ---p 00000000 00:00 0 7f2024000000-7f2024029000 rw-p 00000000 00:00 0 7f2024029000-7f2028000000 ---p 00000000 00:00 0 7f202c000000-7f202c029000 rw-p 00000000 00:00 0 7f202c029000-7f2030000000 ---p 00000000 00:00 0 7f2034000000-7f2034029000 rw-p 00000000 00:00 0 7f2034029000-7f2038000000 ---p 00000000 00:00 0 
7f203c000000-7f203c029000 rw-p 00000000 00:00 0 7f203c029000-7f2040000000 ---p 00000000 00:00 0 7f2044000000-7f2044029000 rw-p 00000000 00:00 0 7f2044029000-7f2048000000 ---p 00000000 00:00 0 7f204c000000-7f204c029000 rw-p 00000000 00:00 0 7f204c029000-7f2050000000 ---p 00000000 00:00 0 7f2054000000-7f2054029000 rw-p 00000000 00:00 0 7f2054029000-7f2058000000 ---p 00000000 00:00 0 7f205c000000-7f205c029000 rw-p 00000000 00:00 0 7f205c029000-7f2060000000 ---p 00000000 00:00 0 7f2064000000-7f2064029000 rw-p 00000000 00:00 0 7f2064029000-7f2068000000 ---p 00000000 00:00 0 7f206c000000-7f206c029000 rw-p 00000000 00:00 0 7f206c029000-7f2070000000 ---p 00000000 00:00 0 7f2074000000-7f2074029000 rw-p 00000000 00:00 0 7f2074029000-7f2078000000 ---p 00000000 00:00 0 7f207c000000-7f207e0bc000 rw-p 00000000 00:00 0 7f207e0bc000-7f2080000000 ---p 00000000 00:00 0 7f2084000000-7f2084029000 rw-p 00000000 00:00 0 7f2084029000-7f2088000000 ---p 00000000 00:00 0 7f208c000000-7f208c029000 rw-p 00000000 00:00 0 7f208c029000-7f2090000000 ---p 00000000 00:00 0 7f2094000000-7f20960c6000 rw-p 00000000 00:00 0 7f20960c6000-7f2098000000 ---p 00000000 00:00 0 7f209c000000-7f209c029000 rw-p 00000000 00:00 0 7f209c029000-7f20a0000000 ---p 00000000 00:00 0 7f20a4000000-7f20a4029000 rw-p 00000000 00:00 0 7f20a4029000-7f20a8000000 ---p 00000000 00:00 0 7f20ac000000-7f20ac029000 rw-p 00000000 00:00 0 7f20ac029000-7f20b0000000 ---p 00000000 00:00 0 7f20b4000000-7f20b4029000 rw-p 00000000 00:00 0 7f20b4029000-7f20b8000000 ---p 00000000 00:00 0 7f20bc000000-7f20bc029000 rw-p 00000000 00:00 0 7f20bc029000-7f20c0000000 ---p 00000000 00:00 0 7f20c4000000-7f20c4029000 rw-p 00000000 00:00 0 7f20c4029000-7f20c8000000 ---p 00000000 00:00 0 7f20c8ffa000-7f20c8ffb000 ---p 00000000 00:00 0 7f20c8ffb000-7f20c97fb000 rw-p 00000000 00:00 0 [stack:10274] 7f20c97fb000-7f20c97fc000 ---p 00000000 00:00 0 7f20c97fc000-7f20c9ffc000 rw-p 00000000 00:00 0 7f20c9ffc000-7f20c9ffd000 ---p 00000000 00:00 0 
7f20c9ffd000-7f20ca7fd000 rw-p 00000000 00:00 0 7f20ca7fd000-7f20ca7fe000 ---p 00000000 00:00 0 7f20ca7fe000-7f20caffe000 rw-p 00000000 00:00 0 7f20cc000000-7f20cc029000 rw-p 00000000 00:00 0 7f20cc029000-7f20d0000000 ---p 00000000 00:00 0 7f20d4000000-7f20d4029000 rw-p 00000000 00:00 0 7f20d4029000-7f20d8000000 ---p 00000000 00:00 0 7f20dc000000-7f20dc029000 rw-p 00000000 00:00 0 7f20dc029000-7f20e0000000 ---p 00000000 00:00 0 7f210d9dd000-7f210d9de000 ---p 00000000 00:00 0 7f210d9de000-7f210e1de000 rw-p 00000000 00:00 0 [stack:10160] 7f210e1de000-7f210e1df000 ---p 00000000 00:00 0 7f210e1df000-7f210e9df000 rw-p 00000000 00:00 0 [stack:10159] 7f210e9df000-7f210e9e0000 ---p 00000000 00:00 0 7f210e9e0000-7f210f1e0000 rw-p 00000000 00:00 0 7f210f1e0000-7f210f1e1000 ---p 00000000 00:00 0 7f210f1e1000-7f210f9e4000 rw-p 00000000 00:00 0 7f210fa00000-7f210fa01000 rw-p 00000000 00:00 0 7f210fa01000-7f210fa02000 r-xp 00000000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 7f210fa02000-7f210fc02000 ---p 00001000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 7f210fc02000-7f210fc03000 rw-p 00001000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 7f210fc03000-7f210fc04000 rw-p 00000000 00:00 0 7f210fc04000-7f210fc0a000 r-xp 00000000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 7f210fc0a000-7f210fe09000 ---p 00006000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 7f210fe09000-7f210fe0a000 rw-p 00005000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 7f210fe0a000-7f210fe0b000 rw-p 00000000 00:00 0 7fff7c648000-7fff7c669000 rw-p 00000000 00:00 0 [stack] 7fff7c715000-7fff7c716000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] From mathieu.desnoyers at efficios.com Wed Oct 10 07:42:15 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 10 Oct 2012 
07:42:15 -0400
Subject: [lttng-dev] rculfstack bug
In-Reply-To: <50754A57.1080104@cn.fujitsu.com>
References: <50754A57.1080104@cn.fujitsu.com>
Message-ID: <20121010114215.GA11307@Krystal>

* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> test code:
> ./tests/test_urcu_lfs 100 10 10
>
> bug produce rate > 60%
>
> {{{
> I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or "./tests/test_urcu_lfs 100 100 10"
> But I just test it about 5 times
> }}}
>
> 4cores*1threads: Intel(R) Core(TM) i5 CPU 760
> RCU_MB (no time to test for other rcu type)
> test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a
>
> I didn't see any bug when "./tests/test_urcu_mb 10 100 10"
>
> Sorry, I tried, but I failed to find out the root cause currently.

I think I managed to narrow down the issue:

1) the master branch does not reproduce it, but commit
   768fba83676f49eb73fd1d8ad452016a84c5ec2a reproduces it about 50% of
   the time.

2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and
   current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu
   moving to wfcqueue.

3) the bug always arises, for me, at the end of the 10 seconds. However,
   it might simply be due to the fact that most of the memory gets freed
   at the end of program execution.

4) I've been able to get a backtrace, and it looks like we have some
   call_rcu callback-invocation threads still working while
   call_rcu_data_free() is invoked. In the backtrace,
   call_rcu_data_free() is nicely waiting for the next thread to stop,
   and during that time, two callback-invocation threads are invoking
   callbacks (and one of them triggers the segfault).

So I expect that commit

commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
Author: Mathieu Desnoyers
Date: Tue Sep 25 10:50:49 2012 -0500

    call_rcu: use wfcqueue, eliminate false-sharing

    Eliminate false-sharing between call_rcu (enqueuer) and worker threads
    on the queue head and tail.

    Acked-by: Paul E.
McKenney
    Signed-off-by: Mathieu Desnoyers

could have managed to fix the issue, or changed the timing enough that it
no longer reproduces. I'll continue investigating.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From bernd.hufmann at ericsson.com Wed Oct 10 09:20:41 2012
From: bernd.hufmann at ericsson.com (Bernd Hufmann)
Date: Wed, 10 Oct 2012 09:20:41 -0400
Subject: [lttng-dev] LTTng Tools 2.1 streaming commands
In-Reply-To: <50747625.2010109@efficios.com>
References: <4CB817F8B8860343821834C09BFBA7A6E183805D4D@EUSAACMS0702.eamcs.ericsson.se> <50747625.2010109@efficios.com>
Message-ID: <4CB817F8B8860343821834C09BFBA7A6E272920175@EUSAACMS0702.eamcs.ericsson.se>

Hi David

Thanks for looking into this.

Bernd

-----Original Message-----
From: David Goulet [mailto:dgoulet at efficios.com]
Sent: October-09-12 3:08 PM
To: Bernd Hufmann
Cc: lttng-dev at lists.lttng.org
Subject: Re: [lttng-dev] LTTng Tools 2.1 streaming commands

After talking a bit about this issue with other LTTng devs, it turns out
that it makes more sense to have a "set-consumer" command and remove
enable/disable-consumer from the cmd UI.

I'll send a proposal on lttng-dev in the next few days and please,
everyone, feel free to give feedback on this.

Thanks!
David

Bernd Hufmann:
> Hello
>
> For the support of LTTng Tools 2.1 in Eclipse, I'm currently trying to
> understand how to use the configuration for network streaming with the
> updated "lttng create" command and the new "enable-consumer" command.
>
> a) lttng enable-consumer
> I find this command confusing because it does not always enable the
> consumer, even though the command name implies so. The enabling
> actually depends on how the command is executed.
> Examples:
>
> * "lttng enable-consumer -k -U net://" or
>   "lttng enable-consumer -k -C tcp:// -D tcp://"
>   don't enable the consumer. You need to either add option --enable or
>   subsequently execute "lttng enable-consumer --enable"
> * lttng enable-consumer -k net:// does enable the
>   consumer. It took me a while to figure out the difference from the
>   example above: The option -U is omitted.
>
>
> What the command actually provides is two features: a way to configure
> streaming (e.g. remote_addr) and a way to enable the consumer. Would
> it be better to name it "lttng configure-consumer"? Also, remove
> the possibility of not specifying -U, -C or -D. The
> following variants of this command should be enough:
>   lttng configure-consumer -k -U [--enable]
>   lttng configure-consumer -k -C -D [--enable]
>   lttng configure-consumer -k --enable
>   lttng configure-consumer -u -U [--enable]
>   lttng configure-consumer -u -C -D [--enable]
>   lttng configure-consumer -u --enable
>
> Please let me know what you think.
>
> b) lttng create [-U ] | [-C -D ] [--no-consumer] [--disable-consumer]
>
> * Are options --no-consumer or --disable-consumer only applicable for
>   streaming?
> * I'm not sure what the purpose of the options --no-consumer or
>   --disable-consumer is. Could you please explain the use cases for using
>   --no-consumer or --disable-consumer?
>
>
> Thanks
> Bernd
>
> This Communication is Confidential.
We only send and receive email on
> the basis of the terms set out at _www.ericsson.com/email_disclaimer_

From paulmck at linux.vnet.ibm.com Wed Oct 10 11:02:07 2012
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Wed, 10 Oct 2012 08:02:07 -0700
Subject: [lttng-dev] rculfstack bug
In-Reply-To: <20121010114215.GA11307@Krystal>
References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal>
Message-ID: <20121010150207.GB2495@linux.vnet.ibm.com>

On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote:
> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> > test code:
> > ./tests/test_urcu_lfs 100 10 10
> >
> > bug reproduction rate > 60%
> >
> > {{{
> > I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" or "./tests/test_urcu_lfs 100 100 10"
> > But I only tested it about 5 times
> > }}}
> >
> > 4cores*1threads: Intel(R) Core(TM) i5 CPU 760
> > RCU_MB (no time to test for other rcu type)
> > test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a
> >
> > I didn't see any bug when "./tests/test_urcu_mb 10 100 10"
> >
> > Sorry, I tried, but I failed to find out the root cause currently.
>
> I think I managed to narrow down the issue:
>
> 1) the master branch does not reproduce it, but commit
> 768fba83676f49eb73fd1d8ad452016a84c5ec2a reproduces it about 50% of the
> time.
>
> 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and
> current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu
> moving to wfcqueue.
>
> 3) the bug always arises, for me, at the end of the 10 seconds.
> However, it might be simply due to the fact that most of the memory
> gets freed at the end of program execution.
>
> 4) I've been able to get a backtrace, and it looks like we have some
> call_rcu callback-invocation threads still working while
> call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free()
> is nicely waiting for the next thread to stop, and during that time,
> two callback-invocation threads are invoking callbacks (and one of
> them triggers the segfault).

Do any of the callbacks reference __thread variables from some other
thread? If so, those threads must refrain from exiting until after
such callbacks complete.

Thanx, Paul

> So I expect that commit
>
> commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
> Author: Mathieu Desnoyers
> Date: Tue Sep 25 10:50:49 2012 -0500
>
>     call_rcu: use wfcqueue, eliminate false-sharing
>
>     Eliminate false-sharing between call_rcu (enqueuer) and worker threads
>     on the queue head and tail.
>
>     Acked-by: Paul E. McKenney
>     Signed-off-by: Mathieu Desnoyers
>
> could have managed to fix the issue, or change the timing enough that it
> does not reproduce. I'll continue investigating.
>
> Thanks,
>
> Mathieu
>
> >
> > *** glibc detected *** /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs: double free or corruption (out): 0x00007f20955dfbb0 ***
> > ======= Backtrace: =========
> > /lib64/libc.so.6[0x37ee676d63]
> > /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs[0x4024f5]
> > /lib64/libpthread.so.0[0x37eda06ccb]
> > /lib64/libc.so.6(clone+0x6d)[0x37ee6e0c2d]
> > ======= Memory map: ========
> > [...]
>
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
>

From mathieu.desnoyers at efficios.com Wed Oct 10 11:07:51 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Wed, 10 Oct 2012 11:07:51 -0400
Subject: [lttng-dev] rculfstack bug
In-Reply-To: <20121010150207.GB2495@linux.vnet.ibm.com>
References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal> <20121010150207.GB2495@linux.vnet.ibm.com>
Message-ID: <20121010150751.GB20761@Krystal>

* Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote:
> On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote:
> > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> > > test code:
> > > ./tests/test_urcu_lfs 100 10 10
> > >
> > > bug reproduction rate > 60%
> > >
> > > {{{
> > > I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" or "./tests/test_urcu_lfs 100 100 10"
> > > But I only tested it about 5 times
> > > }}}
> > >
> > > 4cores*1threads: Intel(R) Core(TM) i5 CPU 760
> > > RCU_MB (no time to test for other rcu type)
> > > test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a
> > >
> > > I didn't see any bug when "./tests/test_urcu_mb 10 100 10"
> > >
> > > Sorry, I tried, but I failed to find out the root cause currently.
> >
> > I think I managed to narrow down the issue:
> >
> > 1) the master branch does not reproduce it, but commit
> > 768fba83676f49eb73fd1d8ad452016a84c5ec2a reproduces it about 50% of the
> > time.
> >
> > 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and
> > current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu
> > moving to wfcqueue.
> >
> > 3) the bug always arises, for me, at the end of the 10 seconds.
> > However, it might be simply due to the fact that most of the memory
> > gets freed at the end of program execution.
> >
> > 4) I've been able to get a backtrace, and it looks like we have some
> > call_rcu callback-invocation threads still working while
> > call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free()
> > is nicely waiting for the next thread to stop, and during that time,
> > two callback-invocation threads are invoking callbacks (and one of
> > them triggers the segfault).
>
> Do any of the callbacks reference __thread variables from some other
> thread? If so, those threads must refrain from exiting until after
> such callbacks complete.

The callback is a simple caa_container_of + free, usual stuff, nothing
fancy.

Thanks,

Mathieu

>
> Thanx, Paul
>
> > So I expect that commit
> >
> > commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
> > Author: Mathieu Desnoyers
> > Date: Tue Sep 25 10:50:49 2012 -0500
> >
> >     call_rcu: use wfcqueue, eliminate false-sharing
> >
> >     Eliminate false-sharing between call_rcu (enqueuer) and worker threads
> >     on the queue head and tail.
> >
> >     Acked-by: Paul E. McKenney
> >     Signed-off-by: Mathieu Desnoyers
> >
> > could have managed to fix the issue, or change the timing enough that it
> > does not reproduce. I'll continue investigating.
> >
> > Thanks,
> >
> > Mathieu
> >
> > >
> > > *** glibc detected *** /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs: double free or corruption (out): 0x00007f20955dfbb0 ***
> > > ======= Backtrace: =========
> > > /lib64/libc.so.6[0x37ee676d63]
> > > /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs[0x4024f5]
> > > /lib64/libpthread.so.0[0x37eda06ccb]
> > > /lib64/libc.so.6(clone+0x6d)[0x37ee6e0c2d]
> > > ======= Memory map: ========
> > > [...]
> >
> > -- 
> > Mathieu Desnoyers
> > Operating System Efficiency R&D Consultant
> > EfficiOS Inc.
> > http://www.efficios.com
> >
>

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From paulmck at linux.vnet.ibm.com Wed Oct 10 10:59:59 2012
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Wed, 10 Oct 2012 07:59:59 -0700
Subject: [lttng-dev] [RFC] re-document rculfstack and even rename it
In-Reply-To: <50752928.3050006@cn.fujitsu.com>
References: <50752928.3050006@cn.fujitsu.com>
Message-ID: <20121010145959.GA2495@linux.vnet.ibm.com>

On Wed, Oct 10, 2012 at 03:52:08PM +0800, Lai Jiangshan wrote:
> rculfstack does not really require RCU.
>
> 1) cds_lfs_push_rcu() does not need any synchronization: neither RCU nor
>    any other lock.
>
> 2) cds_lfs_pop_rcu() needs only one of the following synchronization
>    schemes (not only RCU):
>    A) Use rcu_read_lock() to protect cds_lfs_pop_rcu(), and use
>       synchronize_rcu() or call_rcu() to free the popped node. (The current
>       comments say we need this synchronization, which is why the struct is
>       named with an rcu prefix. But actually, the following schemes are
>       also OK, and are more popular/friendly.)
>    B) Use mutexes/locks to protect cds_lfs_pop_rcu(). We are then free to
>       free/modify the popped node at any time; no synchronization is
>       needed when freeing it.
>    C) Only ONE thread calls cds_lfs_pop_rcu() (multiple producers, single
>       consumer).
>    D) Others, like read-write locks.
>
> I consider B) and C) to be more popular. In the Linux kernel,
> kernel/task_work.c uses a hybrid of B) and C).
>
> I suggest renaming it, or at least documenting B) and C).

Good timing -- stacks and queues are next on my list for documentation. ;-)

Thanx, Paul

From dgoulet at efficios.com Wed Oct 10 13:11:53 2012
From: dgoulet at efficios.com (David Goulet)
Date: Wed, 10 Oct 2012 13:11:53 -0400
Subject: [lttng-dev] [RFC] Changes to the stop command
Message-ID: <5075AC59.3050704@efficios.com>

Hi everyone,

A week ago we discovered a "broken guarantee": when a session is stopped, using either the lttng command or the lttng_stop_session API call, the traced data MUST be ready to be read.

However, we currently don't offer that at all, for either local storage or network streaming. The stop command/call simply does _not_ wait for that state.
Here is the proposal to fix this issue before the 2.1 stable release. Let's add a new API call (extending it) that probes the session daemon for the trace files' state (still writing, no more data, closed, ...). Ex: lttng_data_state(handle) This will bring a change to the default behavior of the stop command. From now on, it will wait by default until the data is available to read (for both network and local). This will however be done on the client side in order to avoid blocking the session daemon client command subsystem for an unknown amount of time. The way I propose we proceed is to use the new API call (mentioned above) on the liblttng-ctl side when a stop is done that requires it to wait. Unfortunately, there is no clean way to do that other than an active loop polling the session daemon... The "no wait" use case of the stop command will also be added with a lttng_stop_session_no_wait or something like that. In a nutshell: (new) lttng_data_state(handle) --> name is NOT final, please chip in for ideas! :) (new) lttng_stop_session_no_wait(session_name) --> naming NOT final. (changes) lttng stop (and lttng_stop_session) will now wait for the data to be available so babeltrace can be used right afterwards, for instance. A --no-wait will be added as well to the UI command. I would like everyone's opinion on that because this is an important issue that _MUST_ be fixed in 2.1 stable or at least in the 2.1.x series. Thanks a lot! David From mathieu.desnoyers at efficios.com Wed Oct 10 13:53:04 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 10 Oct 2012 13:53:04 -0400 Subject: [lttng-dev] rculfstack bug In-Reply-To: <20121010150751.GB20761@Krystal> References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal> <20121010150207.GB2495@linux.vnet.ibm.com> <20121010150751.GB20761@Krystal> Message-ID: <20121010175304.GA25511@Krystal> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > * Paul E.
McKenney (paulmck at linux.vnet.ibm.com) wrote: > > On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote: > > > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > > > > test code: > > > > ./tests/test_urcu_lfs 100 10 10 > > > > > > > > bug produce rate > 60% > > > > > > > > {{{ > > > > I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or "./tests/test_urcu_lfs 100 100 10" > > > > But I just test it about 5 times > > > > }}} > > > > > > > > 4cores*1threads: Intel(R) Core(TM) i5 CPU 760 > > > > RCU_MB (no time to test for other rcu type) > > > > test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a > > > > > > > > I didn't see any bug when "./tests/test_urcu_mb 10 100 10" > > > > > > > > Sorry, I tried, but I failed to find out the root cause currently. > > > > > > I think I managed to narrow down the issue: > > > > > > 1) the master branch does not reproduce it, but commit > > > 768fba83676f49eb73fd1d8ad452016a84c5ec2a repdroduces it about 50% of the > > > time. > > > > > > 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and > > > current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu > > > moving to wfcqueue. > > > > > > 3) the bug always arise, for me, at the end of the 10 seconds. > > > However, it might be simply due to the fact that most of the memory > > > get freed at the end of program execution. > > > > > > 4) I've been able to get a backtrace, and it looks like we have some > > > call_rcu callback-invokation threads still working while > > > call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free() > > > is nicely waiting for the next thread to stop, and during that time, > > > two callback-invokation threads are invoking callbacks (and one of > > > them triggers the segfault). > > > > Do any of the callbacks reference __thread variables from some other > > thread? If so, those threads must refrain from exiting until after > > such callbacks complete. 
> > The callback is a simple caa_container_of + free, usual stuff, nothing > fancy. Here is the fix: the bug was in call rcu. It is not required for master, because we fixed it while moving to wfcqueue. We were erroneously writing to the head field of the default call_rcu_data rather than tail. I wonder if we should simply do a new release with call_rcu using wfcqueue and tell people to upgrade, or if we should somehow create a stable branch with this fix. Thoughts ? Thanks, Mathieu
---
diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h
index 13b24ff..b205229 100644
--- a/urcu-call-rcu-impl.h
+++ b/urcu-call-rcu-impl.h
@@ -647,8 +647,9 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
 		/* Create default call rcu data if need be */
 		(void) get_default_call_rcu_data();
 		cbs_endprev = (struct cds_wfq_node **)
-			uatomic_xchg(&default_call_rcu_data, cbs_tail);
-		*cbs_endprev = cbs;
+			uatomic_xchg(&default_call_rcu_data->cbs.tail,
+				     cbs_tail);
+		_CMM_STORE_SHARED(*cbs_endprev, cbs);
 		uatomic_add(&default_call_rcu_data->qlen,
 			    uatomic_read(&crdp->qlen));
 		wake_call_rcu_thread(default_call_rcu_data);
> > Thanks, > > Mathieu > > > > Thanx, Paul > > > > > So I expect that commit > > > > > > commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe > > > Author: Mathieu Desnoyers > > > Date: Tue Sep 25 10:50:49 2012 -0500 > > > > > > call_rcu: use wfcqueue, eliminate false-sharing > > > > > > Eliminate false-sharing between call_rcu (enqueuer) and worker threads > > > on the queue head and tail. > > > > > > Acked-by: Paul E. McKenney > > > Signed-off-by: Mathieu Desnoyers > > > > > > Could have managed to fix the issue, or change the timing enough that it > > > does not reproduces. I'll continue investigating.
> > > > > > Thanks, > > > > > > Mathieu > > > > > > > > > > > > > > *** glibc detected *** /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs: double free or corruption (out): 0x00007f20955dfbb0 *** > > > > ======= Backtrace: ========= > > > > /lib64/libc.so.6[0x37ee676d63] > > > > /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs[0x4024f5] > > > > /lib64/libpthread.so.0[0x37eda06ccb] > > > > /lib64/libc.so.6(clone+0x6d)[0x37ee6e0c2d] > > > > ======= Memory map: ======== > > > > 00400000-00405000 r-xp 00000000 08:08 6031723 /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs > > > > 00605000-00606000 rw-p 00005000 08:08 6031723 /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs > > > > 00606000-00616000 rw-p 00000000 00:00 0 > > > > 00e9c000-03482000 rw-p 00000000 00:00 0 [heap] > > > > 37ed600000-37ed61f000 r-xp 00000000 08:01 1507421 /lib64/ld-2.13.so > > > > 37ed81e000-37ed81f000 r--p 0001e000 08:01 1507421 /lib64/ld-2.13.so > > > > 37ed81f000-37ed820000 rw-p 0001f000 08:01 1507421 /lib64/ld-2.13.so > > > > 37ed820000-37ed821000 rw-p 00000000 00:00 0 > > > > 37eda00000-37eda17000 r-xp 00000000 08:01 1507427 /lib64/libpthread-2.13.so > > > > 37eda17000-37edc16000 ---p 00017000 08:01 1507427 /lib64/libpthread-2.13.so > > > > 37edc16000-37edc17000 r--p 00016000 08:01 1507427 /lib64/libpthread-2.13.so > > > > 37edc17000-37edc18000 rw-p 00017000 08:01 1507427 /lib64/libpthread-2.13.so > > > > 37edc18000-37edc1c000 rw-p 00000000 00:00 0 > > > > 37ee600000-37ee791000 r-xp 00000000 08:01 1507423 /lib64/libc-2.13.so > > > > 37ee791000-37ee991000 ---p 00191000 08:01 1507423 /lib64/libc-2.13.so > > > > 37ee991000-37ee995000 r--p 00191000 08:01 1507423 /lib64/libc-2.13.so > > > > 37ee995000-37ee996000 rw-p 00195000 08:01 1507423 /lib64/libc-2.13.so > > > > 37ee996000-37ee99c000 rw-p 00000000 00:00 0 > > > > 37f0e00000-37f0e15000 r-xp 00000000 08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 > > > > 37f0e15000-37f1014000 ---p 00015000 
08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 > > > > 37f1014000-37f1015000 rw-p 00014000 08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 > > > > 7f1ee4000000-7f1ee4029000 rw-p 00000000 00:00 0 > > > > 7f1ee4029000-7f1ee8000000 ---p 00000000 00:00 0 > > > > 7f1eec000000-7f1eee039000 rw-p 00000000 00:00 0 > > > > 7f1eee039000-7f1ef0000000 ---p 00000000 00:00 0 > > > > 7f1ef4000000-7f1ef4029000 rw-p 00000000 00:00 0 > > > > 7f1ef4029000-7f1ef8000000 ---p 00000000 00:00 0 > > > > 7f1efc000000-7f1efc029000 rw-p 00000000 00:00 0 > > > > 7f1efc029000-7f1f00000000 ---p 00000000 00:00 0 > > > > 7f1f04000000-7f1f060b8000 rw-p 00000000 00:00 0 > > > > 7f1f060b8000-7f1f08000000 ---p 00000000 00:00 0 > > > > 7f1f0c000000-7f1f0c029000 rw-p 00000000 00:00 0 > > > > 7f1f0c029000-7f1f10000000 ---p 00000000 00:00 0 > > > > 7f1f14000000-7f1f14029000 rw-p 00000000 00:00 0 > > > > 7f1f14029000-7f1f18000000 ---p 00000000 00:00 0 > > > > 7f1f1c000000-7f1f1c029000 rw-p 00000000 00:00 0 > > > > 7f1f1c029000-7f1f20000000 ---p 00000000 00:00 0 > > > > 7f1f24000000-7f1f24029000 rw-p 00000000 00:00 0 > > > > 7f1f24029000-7f1f28000000 ---p 00000000 00:00 0 > > > > 7f1f2c000000-7f1f2c029000 rw-p 00000000 00:00 0 > > > > 7f1f2c029000-7f1f30000000 ---p 00000000 00:00 0 > > > > 7f1f34000000-7f1f34029000 rw-p 00000000 00:00 0 > > > > 7f1f34029000-7f1f38000000 ---p 00000000 00:00 0 > > > > 7f1f3c000000-7f1f3c029000 rw-p 00000000 00:00 0 > > > > 7f1f3c029000-7f1f40000000 ---p 00000000 00:00 0 > > > > 7f1f44000000-7f1f44029000 rw-p 00000000 00:00 0 > > > > 7f1f44029000-7f1f48000000 ---p 00000000 00:00 0 > > > > 7f1f4c000000-7f1f4c029000 rw-p 00000000 00:00 0 > > > > 7f1f4c029000-7f1f50000000 ---p 00000000 00:00 0 > > > > 7f1f54000000-7f1f54029000 rw-p 00000000 00:00 0 > > > > 7f1f54029000-7f1f58000000 ---p 00000000 00:00 0 > > > > 7f1f5c000000-7f1f5c029000 rw-p 00000000 00:00 0 > > > > 7f1f5c029000-7f1f60000000 ---p 00000000 00:00 0 > > > > 7f1f64000000-7f1f64029000 rw-p 00000000 00:00 0 > > > > 
7f1f64029000-7f1f68000000 ---p 00000000 00:00 0 > > > > 7f1f6c000000-7f1f6c029000 rw-p 00000000 00:00 0 > > > > 7f1f6c029000-7f1f70000000 ---p 00000000 00:00 0 > > > > 7f1f74000000-7f1f74029000 rw-p 00000000 00:00 0 > > > > 7f1f74029000-7f1f78000000 ---p 00000000 00:00 0 > > > > 7f1f7c000000-7f1f7c029000 rw-p 00000000 00:00 0 > > > > 7f1f7c029000-7f1f80000000 ---p 00000000 00:00 0 > > > > 7f1f84000000-7f1f84029000 rw-p 00000000 00:00 0 > > > > 7f1f84029000-7f1f88000000 ---p 00000000 00:00 0 > > > > 7f1f8c000000-7f1f8c029000 rw-p 00000000 00:00 0 > > > > 7f1f8c029000-7f1f90000000 ---p 00000000 00:00 0 > > > > 7f1f94000000-7f1f94029000 rw-p 00000000 00:00 0 > > > > 7f1f94029000-7f1f98000000 ---p 00000000 00:00 0 > > > > 7f1f9c000000-7f1f9c029000 rw-p 00000000 00:00 0 > > > > 7f1f9c029000-7f1fa0000000 ---p 00000000 00:00 0 > > > > 7f1fa4000000-7f1fa60ac000 rw-p 00000000 00:00 0 > > > > 7f1fa60ac000-7f1fa8000000 ---p 00000000 00:00 0 > > > > 7f1fac000000-7f1fac029000 rw-p 00000000 00:00 0 > > > > 7f1fac029000-7f1fb0000000 ---p 00000000 00:00 0 > > > > 7f1fb4000000-7f1fb4029000 rw-p 00000000 00:00 0 > > > > 7f1fb4029000-7f1fb8000000 ---p 00000000 00:00 0 > > > > 7f1fbc000000-7f1fbc029000 rw-p 00000000 00:00 0 > > > > 7f1fbc029000-7f1fc0000000 ---p 00000000 00:00 0 > > > > 7f1fc4000000-7f1fc4029000 rw-p 00000000 00:00 0 > > > > 7f1fc4029000-7f1fc8000000 ---p 00000000 00:00 0 > > > > 7f1fcc000000-7f1fce0a1000 rw-p 00000000 00:00 0 > > > > 7f1fce0a1000-7f1fd0000000 ---p 00000000 00:00 0 > > > > 7f1fd4000000-7f1fd4029000 rw-p 00000000 00:00 0 > > > > 7f1fd4029000-7f1fd8000000 ---p 00000000 00:00 0 > > > > 7f1fdc000000-7f1fde06b000 rw-p 00000000 00:00 0 > > > > 7f1fde06b000-7f1fe0000000 ---p 00000000 00:00 0 > > > > 7f1fe4000000-7f1fe4029000 rw-p 00000000 00:00 0 > > > > 7f1fe4029000-7f1fe8000000 ---p 00000000 00:00 0 > > > > 7f1fec000000-7f1fede38000 rw-p 00000000 00:00 0 > > > > 7f1fede38000-7f1ff0000000 ---p 00000000 00:00 0 > > > > 7f1ff4000000-7f1ff4029000 rw-p 00000000 
00:00 0 > > > > 7f1ff4029000-7f1ff8000000 ---p 00000000 00:00 0 > > > > 7f1ffc000000-7f1ffc029000 rw-p 00000000 00:00 0 > > > > 7f1ffc029000-7f2000000000 ---p 00000000 00:00 0 > > > > 7f2004000000-7f20060c6000 rw-p 00000000 00:00 0 > > > > 7f20060c6000-7f2008000000 ---p 00000000 00:00 0 > > > > 7f200c000000-7f200c029000 rw-p 00000000 00:00 0 > > > > 7f200c029000-7f2010000000 ---p 00000000 00:00 0 > > > > 7f2014000000-7f2014029000 rw-p 00000000 00:00 0 > > > > 7f2014029000-7f2018000000 ---p 00000000 00:00 0 > > > > 7f201c000000-7f201c029000 rw-p 00000000 00:00 0 > > > > 7f201c029000-7f2020000000 ---p 00000000 00:00 0 > > > > 7f2024000000-7f2024029000 rw-p 00000000 00:00 0 > > > > 7f2024029000-7f2028000000 ---p 00000000 00:00 0 > > > > 7f202c000000-7f202c029000 rw-p 00000000 00:00 0 > > > > 7f202c029000-7f2030000000 ---p 00000000 00:00 0 > > > > 7f2034000000-7f2034029000 rw-p 00000000 00:00 0 > > > > 7f2034029000-7f2038000000 ---p 00000000 00:00 0 > > > > 7f203c000000-7f203c029000 rw-p 00000000 00:00 0 > > > > 7f203c029000-7f2040000000 ---p 00000000 00:00 0 > > > > 7f2044000000-7f2044029000 rw-p 00000000 00:00 0 > > > > 7f2044029000-7f2048000000 ---p 00000000 00:00 0 > > > > 7f204c000000-7f204c029000 rw-p 00000000 00:00 0 > > > > 7f204c029000-7f2050000000 ---p 00000000 00:00 0 > > > > 7f2054000000-7f2054029000 rw-p 00000000 00:00 0 > > > > 7f2054029000-7f2058000000 ---p 00000000 00:00 0 > > > > 7f205c000000-7f205c029000 rw-p 00000000 00:00 0 > > > > 7f205c029000-7f2060000000 ---p 00000000 00:00 0 > > > > 7f2064000000-7f2064029000 rw-p 00000000 00:00 0 > > > > 7f2064029000-7f2068000000 ---p 00000000 00:00 0 > > > > 7f206c000000-7f206c029000 rw-p 00000000 00:00 0 > > > > 7f206c029000-7f2070000000 ---p 00000000 00:00 0 > > > > 7f2074000000-7f2074029000 rw-p 00000000 00:00 0 > > > > 7f2074029000-7f2078000000 ---p 00000000 00:00 0 > > > > 7f207c000000-7f207e0bc000 rw-p 00000000 00:00 0 > > > > 7f207e0bc000-7f2080000000 ---p 00000000 00:00 0 > > > > 
7f2084000000-7f2084029000 rw-p 00000000 00:00 0 > > > > 7f2084029000-7f2088000000 ---p 00000000 00:00 0 > > > > 7f208c000000-7f208c029000 rw-p 00000000 00:00 0 > > > > 7f208c029000-7f2090000000 ---p 00000000 00:00 0 > > > > 7f2094000000-7f20960c6000 rw-p 00000000 00:00 0 > > > > 7f20960c6000-7f2098000000 ---p 00000000 00:00 0 > > > > 7f209c000000-7f209c029000 rw-p 00000000 00:00 0 > > > > 7f209c029000-7f20a0000000 ---p 00000000 00:00 0 > > > > 7f20a4000000-7f20a4029000 rw-p 00000000 00:00 0 > > > > 7f20a4029000-7f20a8000000 ---p 00000000 00:00 0 > > > > 7f20ac000000-7f20ac029000 rw-p 00000000 00:00 0 > > > > 7f20ac029000-7f20b0000000 ---p 00000000 00:00 0 > > > > 7f20b4000000-7f20b4029000 rw-p 00000000 00:00 0 > > > > 7f20b4029000-7f20b8000000 ---p 00000000 00:00 0 > > > > 7f20bc000000-7f20bc029000 rw-p 00000000 00:00 0 > > > > 7f20bc029000-7f20c0000000 ---p 00000000 00:00 0 > > > > 7f20c4000000-7f20c4029000 rw-p 00000000 00:00 0 > > > > 7f20c4029000-7f20c8000000 ---p 00000000 00:00 0 > > > > 7f20c8ffa000-7f20c8ffb000 ---p 00000000 00:00 0 > > > > 7f20c8ffb000-7f20c97fb000 rw-p 00000000 00:00 0 [stack:10274] > > > > 7f20c97fb000-7f20c97fc000 ---p 00000000 00:00 0 > > > > 7f20c97fc000-7f20c9ffc000 rw-p 00000000 00:00 0 > > > > 7f20c9ffc000-7f20c9ffd000 ---p 00000000 00:00 0 > > > > 7f20c9ffd000-7f20ca7fd000 rw-p 00000000 00:00 0 > > > > 7f20ca7fd000-7f20ca7fe000 ---p 00000000 00:00 0 > > > > 7f20ca7fe000-7f20caffe000 rw-p 00000000 00:00 0 > > > > 7f20cc000000-7f20cc029000 rw-p 00000000 00:00 0 > > > > 7f20cc029000-7f20d0000000 ---p 00000000 00:00 0 > > > > 7f20d4000000-7f20d4029000 rw-p 00000000 00:00 0 > > > > 7f20d4029000-7f20d8000000 ---p 00000000 00:00 0 > > > > 7f20dc000000-7f20dc029000 rw-p 00000000 00:00 0 > > > > 7f20dc029000-7f20e0000000 ---p 00000000 00:00 0 > > > > 7f210d9dd000-7f210d9de000 ---p 00000000 00:00 0 > > > > 7f210d9de000-7f210e1de000 rw-p 00000000 00:00 0 [stack:10160] > > > > 7f210e1de000-7f210e1df000 ---p 00000000 00:00 0 > > > > 
7f210e1df000-7f210e9df000 rw-p 00000000 00:00 0 [stack:10159] > > > > 7f210e9df000-7f210e9e0000 ---p 00000000 00:00 0 > > > > 7f210e9e0000-7f210f1e0000 rw-p 00000000 00:00 0 > > > > 7f210f1e0000-7f210f1e1000 ---p 00000000 00:00 0 > > > > 7f210f1e1000-7f210f9e4000 rw-p 00000000 00:00 0 > > > > 7f210fa00000-7f210fa01000 rw-p 00000000 00:00 0 > > > > 7f210fa01000-7f210fa02000 r-xp 00000000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 > > > > 7f210fa02000-7f210fc02000 ---p 00001000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 > > > > 7f210fc02000-7f210fc03000 rw-p 00001000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 > > > > 7f210fc03000-7f210fc04000 rw-p 00000000 00:00 0 > > > > 7f210fc04000-7f210fc0a000 r-xp 00000000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 > > > > 7f210fc0a000-7f210fe09000 ---p 00006000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 > > > > 7f210fe09000-7f210fe0a000 rw-p 00005000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 > > > > 7f210fe0a000-7f210fe0b000 rw-p 00000000 00:00 0 > > > > 7fff7c648000-7fff7c669000 rw-p 00000000 00:00 0 [stack] > > > > 7fff7c715000-7fff7c716000 r-xp 00000000 00:00 0 [vdso] > > > > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] > > > > > > > > _______________________________________________ > > > > lttng-dev mailing list > > > > lttng-dev at lists.lttng.org > > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > > > > -- > > > Mathieu Desnoyers > > > Operating System Efficiency R&D Consultant > > > EfficiOS Inc. > > > http://www.efficios.com > > > > > > > > > _______________________________________________ > > lttng-dev mailing list > > lttng-dev at lists.lttng.org > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. 
> http://www.efficios.com -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Wed Oct 10 15:34:04 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 10 Oct 2012 15:34:04 -0400 Subject: [lttng-dev] [RFC] Changes to the stop command In-Reply-To: <5075AC59.3050704@efficios.com> References: <5075AC59.3050704@efficios.com> Message-ID: <20121010193404.GA29797@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > Hi everyone, > > We discovered a week ago a "broken guarantee" which is that when a > session is stopped by either using the lttng command or the API call > lttng_stop_session the traced data MUST be ready to be read. > > However, we don't offer that at all for now for both local storage and > network streaming. The stop command/call simply does _not_ wait for that > state. > > Here is the proposal to fix this issue before the 2.1 stable release. > Let's add a new API call (extending it) that probes the session daemon > for the trace files state (still writing, no more data, closed, ...). > > Ex: lttng_data_state(handle) > > This will bring a change to the default behavior of the stop command. > From now on, it will wait by default until the data is available to read > (for both network and local). This will however be done on the client > side in order to avoid blocking the session daemon client command sub > system for an unknown amount of time. > > The way I propose we proceed is to use the new API call (mention above) > on the liblttng-ctl side when a stop is done that requires it to wait. > Unfortunately, there is no clean way to do that other than an active > loop polling the session daemon... > > The "no wait" use case of the stop command will also be added with a > lttng_stop_session_no_wait or something like that. > > In a nutshell: > > (new) lttng_data_state(handle) --> name is NOT final, please chip in for > ideas! 
:) ideas: - lttng_data_pending() - lttng_data_available() > (new) lttng_stop_session_no_wait(session_name) --> naming NOT final. lttng_stop_session_no_wait sounds ok to me. The rest looks good. Let's see if others find better names ;) Thanks, Mathieu > > (changes) lttng stop (and lttng_stop_session) will now wait for the data > to be available so babeltrace could be use right after for instance. A > --no-wait will be added as well to the UI command. > > I would like everyone opinion on that because this is an important issue > that _MUST_ be fixed in 2.1 stable or at least in the 2.1.x series. > > Thanks a lot! > David > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From e_lopezs at encs.concordia.ca Wed Oct 10 16:06:29 2012 From: e_lopezs at encs.concordia.ca (Efraim Josue Lopez Sanchez) Date: Wed, 10 Oct 2012 16:06:29 -0400 Subject: [lttng-dev] TMF current and future developments Message-ID: <20121010160629.dcsyijhgooowckoo@mail.encs.concordia.ca> Hi, We are planning to build on top of TMF. Recently we tried implementing a few things to familiarize ourselves with the TMF source code. However, I've been wondering these last few days about the future plans and developments for TMF. Is there a web page or document where I can find information about what you are working on right now, and about future developments and versions? Regards, Efraim From paulmck at linux.vnet.ibm.com Wed Oct 10 15:50:07 2012 From: paulmck at linux.vnet.ibm.com (Paul E.
McKenney) Date: Wed, 10 Oct 2012 12:50:07 -0700 Subject: [lttng-dev] rculfstack bug In-Reply-To: <20121010175304.GA25511@Krystal> References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal> <20121010150207.GB2495@linux.vnet.ibm.com> <20121010150751.GB20761@Krystal> <20121010175304.GA25511@Krystal> Message-ID: <20121010195007.GG2495@linux.vnet.ibm.com> On Wed, Oct 10, 2012 at 01:53:04PM -0400, Mathieu Desnoyers wrote: > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > > On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote: > > > > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > > > > > test code: > > > > > ./tests/test_urcu_lfs 100 10 10 > > > > > > > > > > bug produce rate > 60% > > > > > > > > > > {{{ > > > > > I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or "./tests/test_urcu_lfs 100 100 10" > > > > > But I just test it about 5 times > > > > > }}} > > > > > > > > > > 4cores*1threads: Intel(R) Core(TM) i5 CPU 760 > > > > > RCU_MB (no time to test for other rcu type) > > > > > test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a > > > > > > > > > > I didn't see any bug when "./tests/test_urcu_mb 10 100 10" > > > > > > > > > > Sorry, I tried, but I failed to find out the root cause currently. > > > > > > > > I think I managed to narrow down the issue: > > > > > > > > 1) the master branch does not reproduce it, but commit > > > > 768fba83676f49eb73fd1d8ad452016a84c5ec2a repdroduces it about 50% of the > > > > time. > > > > > > > > 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and > > > > current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu > > > > moving to wfcqueue. > > > > > > > > 3) the bug always arise, for me, at the end of the 10 seconds. > > > > However, it might be simply due to the fact that most of the memory > > > > get freed at the end of program execution. 
> > > > > > > > 4) I've been able to get a backtrace, and it looks like we have some > > > > call_rcu callback-invokation threads still working while > > > > call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free() > > > > is nicely waiting for the next thread to stop, and during that time, > > > > two callback-invokation threads are invoking callbacks (and one of > > > > them triggers the segfault). > > > > > > Do any of the callbacks reference __thread variables from some other > > > thread? If so, those threads must refrain from exiting until after > > > such callbacks complete. > > > > The callback is a simple caa_container_of + free, usual stuff, nothing > > fancy. > > Here is the fix: the bug was in call rcu. It is not required for master, > because we fixed it while moving to wfcqueue. > > We were erroneously writing to the head field of the default > call_rcu_data rather than tail. Ouch!!! I have no idea why that would have passed my testing. :-( > I wonder if we should simply do a new release with call_rcu using > wfcqueue and tell people to upgrade, or if we should somehow create a > stable branch with this fix. > > Thoughts ? Under what conditions does this bug appear? It is necessary to not just use call_rcu(), but also to explicitly call call_rcu_data_free(), right? My guess is that a stable branch would be good -- there will be other bugs, after all. 
:-/ Thanx, Paul > Thanks, > > Mathieu >
> ---
> diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h
> index 13b24ff..b205229 100644
> --- a/urcu-call-rcu-impl.h
> +++ b/urcu-call-rcu-impl.h
> @@ -647,8 +647,9 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
>  		/* Create default call rcu data if need be */
>  		(void) get_default_call_rcu_data();
>  		cbs_endprev = (struct cds_wfq_node **)
> -			uatomic_xchg(&default_call_rcu_data, cbs_tail);
> -		*cbs_endprev = cbs;
> +			uatomic_xchg(&default_call_rcu_data->cbs.tail,
> +				     cbs_tail);
> +		_CMM_STORE_SHARED(*cbs_endprev, cbs);
>  		uatomic_add(&default_call_rcu_data->qlen,
>  			    uatomic_read(&crdp->qlen));
>  		wake_call_rcu_thread(default_call_rcu_data);
> > > > > > Thanks, > > > > Mathieu > > > > > > > > Thanx, Paul > > > > > > > So I expect that commit > > > > > > > > commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe > > > > Author: Mathieu Desnoyers > > > > Date: Tue Sep 25 10:50:49 2012 -0500 > > > > > > > > call_rcu: use wfcqueue, eliminate false-sharing > > > > > > > > Eliminate false-sharing between call_rcu (enqueuer) and worker threads > > > > on the queue head and tail. > > > > > > > > Acked-by: Paul E. McKenney > > > > Signed-off-by: Mathieu Desnoyers > > > > > > > > Could have managed to fix the issue, or change the timing enough that it > > > > does not reproduces. I'll continue investigating.
> > > > > > > > Thanks, > > > > > > > > Mathieu > > > > > > > > > > > > > > > > > > *** glibc detected *** /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs: double free or corruption (out): 0x00007f20955dfbb0 *** > > > > > ======= Backtrace: ========= > > > > > /lib64/libc.so.6[0x37ee676d63] > > > > > /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs[0x4024f5] > > > > > /lib64/libpthread.so.0[0x37eda06ccb] > > > > > /lib64/libc.so.6(clone+0x6d)[0x37ee6e0c2d] > > > > > ======= Memory map: ======== > > > > > 00400000-00405000 r-xp 00000000 08:08 6031723 /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs > > > > > 00605000-00606000 rw-p 00005000 08:08 6031723 /home/laijs/work/userspace-rcu/tests/.libs/lt-test_urcu_lfs > > > > > 00606000-00616000 rw-p 00000000 00:00 0 > > > > > 00e9c000-03482000 rw-p 00000000 00:00 0 [heap] > > > > > 37ed600000-37ed61f000 r-xp 00000000 08:01 1507421 /lib64/ld-2.13.so > > > > > 37ed81e000-37ed81f000 r--p 0001e000 08:01 1507421 /lib64/ld-2.13.so > > > > > 37ed81f000-37ed820000 rw-p 0001f000 08:01 1507421 /lib64/ld-2.13.so > > > > > 37ed820000-37ed821000 rw-p 00000000 00:00 0 > > > > > 37eda00000-37eda17000 r-xp 00000000 08:01 1507427 /lib64/libpthread-2.13.so > > > > > 37eda17000-37edc16000 ---p 00017000 08:01 1507427 /lib64/libpthread-2.13.so > > > > > 37edc16000-37edc17000 r--p 00016000 08:01 1507427 /lib64/libpthread-2.13.so > > > > > 37edc17000-37edc18000 rw-p 00017000 08:01 1507427 /lib64/libpthread-2.13.so > > > > > 37edc18000-37edc1c000 rw-p 00000000 00:00 0 > > > > > 37ee600000-37ee791000 r-xp 00000000 08:01 1507423 /lib64/libc-2.13.so > > > > > 37ee791000-37ee991000 ---p 00191000 08:01 1507423 /lib64/libc-2.13.so > > > > > 37ee991000-37ee995000 r--p 00191000 08:01 1507423 /lib64/libc-2.13.so > > > > > 37ee995000-37ee996000 rw-p 00195000 08:01 1507423 /lib64/libc-2.13.so > > > > > 37ee996000-37ee99c000 rw-p 00000000 00:00 0 > > > > > 37f0e00000-37f0e15000 r-xp 00000000 08:01 1507437 
/lib64/libgcc_s-4.5.1-20100924.so.1 > > > > > 37f0e15000-37f1014000 ---p 00015000 08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 > > > > > 37f1014000-37f1015000 rw-p 00014000 08:01 1507437 /lib64/libgcc_s-4.5.1-20100924.so.1 > > > > > 7f1ee4000000-7f1ee4029000 rw-p 00000000 00:00 0 > > > > > 7f1ee4029000-7f1ee8000000 ---p 00000000 00:00 0 > > > > > 7f1eec000000-7f1eee039000 rw-p 00000000 00:00 0 > > > > > 7f1eee039000-7f1ef0000000 ---p 00000000 00:00 0 > > > > > 7f1ef4000000-7f1ef4029000 rw-p 00000000 00:00 0 > > > > > 7f1ef4029000-7f1ef8000000 ---p 00000000 00:00 0 > > > > > 7f1efc000000-7f1efc029000 rw-p 00000000 00:00 0 > > > > > 7f1efc029000-7f1f00000000 ---p 00000000 00:00 0 > > > > > 7f1f04000000-7f1f060b8000 rw-p 00000000 00:00 0 > > > > > 7f1f060b8000-7f1f08000000 ---p 00000000 00:00 0 > > > > > 7f1f0c000000-7f1f0c029000 rw-p 00000000 00:00 0 > > > > > 7f1f0c029000-7f1f10000000 ---p 00000000 00:00 0 > > > > > 7f1f14000000-7f1f14029000 rw-p 00000000 00:00 0 > > > > > 7f1f14029000-7f1f18000000 ---p 00000000 00:00 0 > > > > > 7f1f1c000000-7f1f1c029000 rw-p 00000000 00:00 0 > > > > > 7f1f1c029000-7f1f20000000 ---p 00000000 00:00 0 > > > > > 7f1f24000000-7f1f24029000 rw-p 00000000 00:00 0 > > > > > 7f1f24029000-7f1f28000000 ---p 00000000 00:00 0 > > > > > 7f1f2c000000-7f1f2c029000 rw-p 00000000 00:00 0 > > > > > 7f1f2c029000-7f1f30000000 ---p 00000000 00:00 0 > > > > > 7f1f34000000-7f1f34029000 rw-p 00000000 00:00 0 > > > > > 7f1f34029000-7f1f38000000 ---p 00000000 00:00 0 > > > > > 7f1f3c000000-7f1f3c029000 rw-p 00000000 00:00 0 > > > > > 7f1f3c029000-7f1f40000000 ---p 00000000 00:00 0 > > > > > 7f1f44000000-7f1f44029000 rw-p 00000000 00:00 0 > > > > > 7f1f44029000-7f1f48000000 ---p 00000000 00:00 0 > > > > > 7f1f4c000000-7f1f4c029000 rw-p 00000000 00:00 0 > > > > > 7f1f4c029000-7f1f50000000 ---p 00000000 00:00 0 > > > > > 7f1f54000000-7f1f54029000 rw-p 00000000 00:00 0 > > > > > 7f1f54029000-7f1f58000000 ---p 00000000 00:00 0 > > > > > 
7f1f5c000000-7f1f5c029000 rw-p 00000000 00:00 0 > > > > > 7f1f5c029000-7f1f60000000 ---p 00000000 00:00 0 > > > > > 7f1f64000000-7f1f64029000 rw-p 00000000 00:00 0 > > > > > 7f1f64029000-7f1f68000000 ---p 00000000 00:00 0 > > > > > 7f1f6c000000-7f1f6c029000 rw-p 00000000 00:00 0 > > > > > 7f1f6c029000-7f1f70000000 ---p 00000000 00:00 0 > > > > > 7f1f74000000-7f1f74029000 rw-p 00000000 00:00 0 > > > > > 7f1f74029000-7f1f78000000 ---p 00000000 00:00 0 > > > > > 7f1f7c000000-7f1f7c029000 rw-p 00000000 00:00 0 > > > > > 7f1f7c029000-7f1f80000000 ---p 00000000 00:00 0 > > > > > 7f1f84000000-7f1f84029000 rw-p 00000000 00:00 0 > > > > > 7f1f84029000-7f1f88000000 ---p 00000000 00:00 0 > > > > > 7f1f8c000000-7f1f8c029000 rw-p 00000000 00:00 0 > > > > > 7f1f8c029000-7f1f90000000 ---p 00000000 00:00 0 > > > > > 7f1f94000000-7f1f94029000 rw-p 00000000 00:00 0 > > > > > 7f1f94029000-7f1f98000000 ---p 00000000 00:00 0 > > > > > 7f1f9c000000-7f1f9c029000 rw-p 00000000 00:00 0 > > > > > 7f1f9c029000-7f1fa0000000 ---p 00000000 00:00 0 > > > > > 7f1fa4000000-7f1fa60ac000 rw-p 00000000 00:00 0 > > > > > 7f1fa60ac000-7f1fa8000000 ---p 00000000 00:00 0 > > > > > 7f1fac000000-7f1fac029000 rw-p 00000000 00:00 0 > > > > > 7f1fac029000-7f1fb0000000 ---p 00000000 00:00 0 > > > > > 7f1fb4000000-7f1fb4029000 rw-p 00000000 00:00 0 > > > > > 7f1fb4029000-7f1fb8000000 ---p 00000000 00:00 0 > > > > > 7f1fbc000000-7f1fbc029000 rw-p 00000000 00:00 0 > > > > > 7f1fbc029000-7f1fc0000000 ---p 00000000 00:00 0 > > > > > 7f1fc4000000-7f1fc4029000 rw-p 00000000 00:00 0 > > > > > 7f1fc4029000-7f1fc8000000 ---p 00000000 00:00 0 > > > > > 7f1fcc000000-7f1fce0a1000 rw-p 00000000 00:00 0 > > > > > 7f1fce0a1000-7f1fd0000000 ---p 00000000 00:00 0 > > > > > 7f1fd4000000-7f1fd4029000 rw-p 00000000 00:00 0 > > > > > 7f1fd4029000-7f1fd8000000 ---p 00000000 00:00 0 > > > > > 7f1fdc000000-7f1fde06b000 rw-p 00000000 00:00 0 > > > > > 7f1fde06b000-7f1fe0000000 ---p 00000000 00:00 0 > > > > > 7f1fe4000000-7f1fe4029000 
rw-p 00000000 00:00 0 > > > > > 7f1fe4029000-7f1fe8000000 ---p 00000000 00:00 0 > > > > > 7f1fec000000-7f1fede38000 rw-p 00000000 00:00 0 > > > > > 7f1fede38000-7f1ff0000000 ---p 00000000 00:00 0 > > > > > 7f1ff4000000-7f1ff4029000 rw-p 00000000 00:00 0 > > > > > 7f1ff4029000-7f1ff8000000 ---p 00000000 00:00 0 > > > > > 7f1ffc000000-7f1ffc029000 rw-p 00000000 00:00 0 > > > > > 7f1ffc029000-7f2000000000 ---p 00000000 00:00 0 > > > > > 7f2004000000-7f20060c6000 rw-p 00000000 00:00 0 > > > > > 7f20060c6000-7f2008000000 ---p 00000000 00:00 0 > > > > > 7f200c000000-7f200c029000 rw-p 00000000 00:00 0 > > > > > 7f200c029000-7f2010000000 ---p 00000000 00:00 0 > > > > > 7f2014000000-7f2014029000 rw-p 00000000 00:00 0 > > > > > 7f2014029000-7f2018000000 ---p 00000000 00:00 0 > > > > > 7f201c000000-7f201c029000 rw-p 00000000 00:00 0 > > > > > 7f201c029000-7f2020000000 ---p 00000000 00:00 0 > > > > > 7f2024000000-7f2024029000 rw-p 00000000 00:00 0 > > > > > 7f2024029000-7f2028000000 ---p 00000000 00:00 0 > > > > > 7f202c000000-7f202c029000 rw-p 00000000 00:00 0 > > > > > 7f202c029000-7f2030000000 ---p 00000000 00:00 0 > > > > > 7f2034000000-7f2034029000 rw-p 00000000 00:00 0 > > > > > 7f2034029000-7f2038000000 ---p 00000000 00:00 0 > > > > > 7f203c000000-7f203c029000 rw-p 00000000 00:00 0 > > > > > 7f203c029000-7f2040000000 ---p 00000000 00:00 0 > > > > > 7f2044000000-7f2044029000 rw-p 00000000 00:00 0 > > > > > 7f2044029000-7f2048000000 ---p 00000000 00:00 0 > > > > > 7f204c000000-7f204c029000 rw-p 00000000 00:00 0 > > > > > 7f204c029000-7f2050000000 ---p 00000000 00:00 0 > > > > > 7f2054000000-7f2054029000 rw-p 00000000 00:00 0 > > > > > 7f2054029000-7f2058000000 ---p 00000000 00:00 0 > > > > > 7f205c000000-7f205c029000 rw-p 00000000 00:00 0 > > > > > 7f205c029000-7f2060000000 ---p 00000000 00:00 0 > > > > > 7f2064000000-7f2064029000 rw-p 00000000 00:00 0 > > > > > 7f2064029000-7f2068000000 ---p 00000000 00:00 0 > > > > > 7f206c000000-7f206c029000 rw-p 00000000 00:00 0 > > > 
> > 7f206c029000-7f2070000000 ---p 00000000 00:00 0 > > > > > 7f2074000000-7f2074029000 rw-p 00000000 00:00 0 > > > > > 7f2074029000-7f2078000000 ---p 00000000 00:00 0 > > > > > 7f207c000000-7f207e0bc000 rw-p 00000000 00:00 0 > > > > > 7f207e0bc000-7f2080000000 ---p 00000000 00:00 0 > > > > > 7f2084000000-7f2084029000 rw-p 00000000 00:00 0 > > > > > 7f2084029000-7f2088000000 ---p 00000000 00:00 0 > > > > > 7f208c000000-7f208c029000 rw-p 00000000 00:00 0 > > > > > 7f208c029000-7f2090000000 ---p 00000000 00:00 0 > > > > > 7f2094000000-7f20960c6000 rw-p 00000000 00:00 0 > > > > > 7f20960c6000-7f2098000000 ---p 00000000 00:00 0 > > > > > 7f209c000000-7f209c029000 rw-p 00000000 00:00 0 > > > > > 7f209c029000-7f20a0000000 ---p 00000000 00:00 0 > > > > > 7f20a4000000-7f20a4029000 rw-p 00000000 00:00 0 > > > > > 7f20a4029000-7f20a8000000 ---p 00000000 00:00 0 > > > > > 7f20ac000000-7f20ac029000 rw-p 00000000 00:00 0 > > > > > 7f20ac029000-7f20b0000000 ---p 00000000 00:00 0 > > > > > 7f20b4000000-7f20b4029000 rw-p 00000000 00:00 0 > > > > > 7f20b4029000-7f20b8000000 ---p 00000000 00:00 0 > > > > > 7f20bc000000-7f20bc029000 rw-p 00000000 00:00 0 > > > > > 7f20bc029000-7f20c0000000 ---p 00000000 00:00 0 > > > > > 7f20c4000000-7f20c4029000 rw-p 00000000 00:00 0 > > > > > 7f20c4029000-7f20c8000000 ---p 00000000 00:00 0 > > > > > 7f20c8ffa000-7f20c8ffb000 ---p 00000000 00:00 0 > > > > > 7f20c8ffb000-7f20c97fb000 rw-p 00000000 00:00 0 [stack:10274] > > > > > 7f20c97fb000-7f20c97fc000 ---p 00000000 00:00 0 > > > > > 7f20c97fc000-7f20c9ffc000 rw-p 00000000 00:00 0 > > > > > 7f20c9ffc000-7f20c9ffd000 ---p 00000000 00:00 0 > > > > > 7f20c9ffd000-7f20ca7fd000 rw-p 00000000 00:00 0 > > > > > 7f20ca7fd000-7f20ca7fe000 ---p 00000000 00:00 0 > > > > > 7f20ca7fe000-7f20caffe000 rw-p 00000000 00:00 0 > > > > > 7f20cc000000-7f20cc029000 rw-p 00000000 00:00 0 > > > > > 7f20cc029000-7f20d0000000 ---p 00000000 00:00 0 > > > > > 7f20d4000000-7f20d4029000 rw-p 00000000 00:00 0 > > > > > 
7f20d4029000-7f20d8000000 ---p 00000000 00:00 0 > > > > > 7f20dc000000-7f20dc029000 rw-p 00000000 00:00 0 > > > > > 7f20dc029000-7f20e0000000 ---p 00000000 00:00 0 > > > > > 7f210d9dd000-7f210d9de000 ---p 00000000 00:00 0 > > > > > 7f210d9de000-7f210e1de000 rw-p 00000000 00:00 0 [stack:10160] > > > > > 7f210e1de000-7f210e1df000 ---p 00000000 00:00 0 > > > > > 7f210e1df000-7f210e9df000 rw-p 00000000 00:00 0 [stack:10159] > > > > > 7f210e9df000-7f210e9e0000 ---p 00000000 00:00 0 > > > > > 7f210e9e0000-7f210f1e0000 rw-p 00000000 00:00 0 > > > > > 7f210f1e0000-7f210f1e1000 ---p 00000000 00:00 0 > > > > > 7f210f1e1000-7f210f9e4000 rw-p 00000000 00:00 0 > > > > > 7f210fa00000-7f210fa01000 rw-p 00000000 00:00 0 > > > > > 7f210fa01000-7f210fa02000 r-xp 00000000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 > > > > > 7f210fa02000-7f210fc02000 ---p 00001000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 > > > > > 7f210fc02000-7f210fc03000 rw-p 00001000 08:08 6029369 /home/laijs/work/userspace-rcu/.libs/liburcu-common.so.1.0.0 > > > > > 7f210fc03000-7f210fc04000 rw-p 00000000 00:00 0 > > > > > 7f210fc04000-7f210fc0a000 r-xp 00000000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 > > > > > 7f210fc0a000-7f210fe09000 ---p 00006000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 > > > > > 7f210fe09000-7f210fe0a000 rw-p 00005000 08:08 6029586 /home/laijs/work/userspace-rcu/.libs/liburcu-cds.so.1.0.0 > > > > > 7f210fe0a000-7f210fe0b000 rw-p 00000000 00:00 0 > > > > > 7fff7c648000-7fff7c669000 rw-p 00000000 00:00 0 [stack] > > > > > 7fff7c715000-7fff7c716000 r-xp 00000000 00:00 0 [vdso] > > > > > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] > > > > > > > > > > _______________________________________________ > > > > > lttng-dev mailing list > > > > > lttng-dev at lists.lttng.org > > > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > > > > 
> > -- > > > > Mathieu Desnoyers > > > > Operating System Efficiency R&D Consultant > > > > EfficiOS Inc. > > > > http://www.efficios.com > > > > > > > > > > > > > _______________________________________________ > > > lttng-dev mailing list > > > lttng-dev at lists.lttng.org > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > > -- > > Mathieu Desnoyers > > Operating System Efficiency R&D Consultant > > EfficiOS Inc. > > http://www.efficios.com > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. > http://www.efficios.com > From dgoulet at efficios.com Wed Oct 10 16:14:10 2012 From: dgoulet at efficios.com (David Goulet) Date: Wed, 10 Oct 2012 16:14:10 -0400 Subject: [lttng-dev] [RFC] Changes to the stop command In-Reply-To: <20121010193404.GA29797@Krystal> References: <5075AC59.3050704@efficios.com> <20121010193404.GA29797@Krystal> Message-ID: <5075D712.7070700@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> Hi everyone, >> >> We discovered a week ago a "broken guarantee" which is that when a >> session is stopped by either using the lttng command or the API call >> lttng_stop_session the traced data MUST be ready to be read. >> >> However, we don't offer that at all for now for both local storage and >> network streaming. The stop command/call simply does _not_ wait for that >> state. >> >> Here is the proposal to fix this issue before the 2.1 stable release. >> Let's add a new API call (extending it) that probes the session daemon >> for the trace files state (still writing, no more data, closed, ...). >> >> Ex: lttng_data_state(handle) >> >> This will bring a change to the default behavior of the stop command. >> From now on, it will wait by default until the data is available to read >> (for both network and local). This will however be done on the client >> side in order to avoid blocking the session daemon client command sub >> system for an unknown amount of time. 
>>
>> The way I propose we proceed is to use the new API call (mentioned above)
>> on the liblttng-ctl side when a stop is done that requires it to wait.
>> Unfortunately, there is no clean way to do that other than an active
>> loop polling the session daemon...
>>
>> The "no wait" use case of the stop command will also be added with a
>> lttng_stop_session_no_wait or something like that.
>>
>> In a nutshell:
>>
>> (new) lttng_data_state(handle) --> name is NOT final, please chip in for
>> ideas! :)
>
> ideas:
>
> - lttng_data_pending()
> - lttng_data_available()

lttng_data_ready?

David

>
>
>> (new) lttng_stop_session_no_wait(session_name) --> naming NOT final.
>
> lttng_stop_session_no_wait sounds ok to me.
>
> The rest looks good. Let's see if others find better names ;)
>
> Thanks,
>
> Mathieu
>
>>
>> (changes) lttng stop (and lttng_stop_session) will now wait for the data
>> to be available so babeltrace could be used right after, for instance. A
>> --no-wait will be added as well to the UI command.
>>
>> I would like everyone's opinion on that because this is an important issue
>> that _MUST_ be fixed in 2.1 stable or at least in the 2.1.x series.
>>
>> Thanks a lot!
>> David
>>
>> _______________________________________________
>> lttng-dev mailing list
>> lttng-dev at lists.lttng.org
>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>

From dgoulet at efficios.com Wed Oct 10 16:27:05 2012
From: dgoulet at efficios.com (David Goulet)
Date: Wed, 10 Oct 2012 16:27:05 -0400
Subject: [lttng-dev] [RFC] Changes to the stop command
In-Reply-To: <5075AC59.3050704@efficios.com>
References: <5075AC59.3050704@efficios.com>
Message-ID: <5075DA19.3020809@efficios.com>

Addendum! Please replace lttng_stop_session with "lttng_stop_tracing".

Thanks
David

David Goulet:
> Hi everyone,
>
> We discovered a week ago a "broken guarantee" which is that when a
> session is stopped by either using the lttng command or the API call
> lttng_stop_session the traced data MUST be ready to be read.
>
> However, we don't offer that at all for now for both local storage and
> network streaming. The stop command/call simply does _not_ wait for that
> state.
>
> Here is the proposal to fix this issue before the 2.1 stable release.
> Let's add a new API call (extending it) that probes the session daemon
> for the trace files state (still writing, no more data, closed, ...).
>
> Ex: lttng_data_state(handle)
>
> This will bring a change to the default behavior of the stop command.
> From now on, it will wait by default until the data is available to read
> (for both network and local). This will however be done on the client
> side in order to avoid blocking the session daemon client command
> subsystem for an unknown amount of time.
>
> The way I propose we proceed is to use the new API call (mentioned above)
> on the liblttng-ctl side when a stop is done that requires it to wait.
> Unfortunately, there is no clean way to do that other than an active
> loop polling the session daemon...
>
> The "no wait" use case of the stop command will also be added with a
> lttng_stop_session_no_wait or something like that.
>
> In a nutshell:
>
> (new) lttng_data_state(handle) --> name is NOT final, please chip in for
> ideas! :)
> (new) lttng_stop_session_no_wait(session_name) --> naming NOT final.
>
> (changes) lttng stop (and lttng_stop_session) will now wait for the data
> to be available so babeltrace could be used right after, for instance. A
> --no-wait will be added as well to the UI command.
>
> I would like everyone's opinion on that because this is an important issue
> that _MUST_ be fixed in 2.1 stable or at least in the 2.1.x series.
>
> Thanks a lot!
> David > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From mathieu.desnoyers at efficios.com Wed Oct 10 16:49:14 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 10 Oct 2012 16:49:14 -0400 Subject: [lttng-dev] rculfstack bug In-Reply-To: <20121010195007.GG2495@linux.vnet.ibm.com> References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal> <20121010150207.GB2495@linux.vnet.ibm.com> <20121010150751.GB20761@Krystal> <20121010175304.GA25511@Krystal> <20121010195007.GG2495@linux.vnet.ibm.com> Message-ID: <20121010204914.GA1175@Krystal> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > On Wed, Oct 10, 2012 at 01:53:04PM -0400, Mathieu Desnoyers wrote: > > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > > > On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote: > > > > > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > > > > > > test code: > > > > > > ./tests/test_urcu_lfs 100 10 10 > > > > > > > > > > > > bug produce rate > 60% > > > > > > > > > > > > {{{ > > > > > > I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or "./tests/test_urcu_lfs 100 100 10" > > > > > > But I just test it about 5 times > > > > > > }}} > > > > > > > > > > > > 4cores*1threads: Intel(R) Core(TM) i5 CPU 760 > > > > > > RCU_MB (no time to test for other rcu type) > > > > > > test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a > > > > > > > > > > > > I didn't see any bug when "./tests/test_urcu_mb 10 100 10" > > > > > > > > > > > > Sorry, I tried, but I failed to find out the root cause currently. 
> > > > >
> > > > > I think I managed to narrow down the issue:
> > > > >
> > > > > 1) the master branch does not reproduce it, but commit
> > > > > 768fba83676f49eb73fd1d8ad452016a84c5ec2a reproduces it about 50% of the
> > > > > time.
> > > > >
> > > > > 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and
> > > > > current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu
> > > > > moving to wfcqueue.
> > > > >
> > > > > 3) the bug always arises, for me, at the end of the 10 seconds.
> > > > > However, it might simply be due to the fact that most of the memory
> > > > > gets freed at the end of program execution.
> > > > >
> > > > > 4) I've been able to get a backtrace, and it looks like we have some
> > > > > call_rcu callback-invocation threads still working while
> > > > > call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free()
> > > > > is nicely waiting for the next thread to stop, and during that time,
> > > > > two callback-invocation threads are invoking callbacks (and one of
> > > > > them triggers the segfault).
> > > >
> > > > Do any of the callbacks reference __thread variables from some other
> > > > thread? If so, those threads must refrain from exiting until after
> > > > such callbacks complete.
> > >
> > > The callback is a simple caa_container_of + free, usual stuff, nothing
> > > fancy.
> >
> > Here is the fix: the bug was in call_rcu. It is not required for master,
> > because we fixed it while moving to wfcqueue.
> >
> > We were erroneously writing to the head field of the default
> > call_rcu_data rather than tail.
>
> Ouch!!! I have no idea why that would have passed my testing. :-(

Well, teardown is fairly infrequent compared to call_rcu/cb execution.

>
> > I wonder if we should simply do a new release with call_rcu using
> > wfcqueue and tell people to upgrade, or if we should somehow create a
> > stable branch with this fix.
> >
> > Thoughts ?
>
> Under what conditions does this bug appear? It is necessary to not just
> use call_rcu(), but also to explicitly call call_rcu_data_free(), right?

The conditions for it to appear:

1) set up per-CPU callback-invocation threads,
2) use call_rcu(),
3) call call_rcu_data_free() while there are still some pending callbacks
   that have not yet been executed by the callback-invocation threads,
4) we then get corruption due to the "default" callback invocation that
   walks through a corrupted queue.

My guess is that in all your test cases, there were no callbacks left to
move to the default cb-invocation thread. Interestingly, this
test_urcu_lfs 100 10 10 test really stresses reclaim.

>
> My guess is that a stable branch would be good -- there will be other
> bugs, after all. :-/

Yep, although we have a pretty good track record so far! And I must say
that this bug was fixed in the master branch (with the move to wfcqueue)
before it was discovered. ;-)

I will soon create stable-0.6 and stable-0.7 branches.

Thanks!

Mathieu

>
> Thanx, Paul
>
> > Thanks,
> >
> > Mathieu
> >
> > ---
> > diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h
> > index 13b24ff..b205229 100644
> > --- a/urcu-call-rcu-impl.h
> > +++ b/urcu-call-rcu-impl.h
> > @@ -647,8 +647,9 @@ void call_rcu_data_free(struct call_rcu_data *crdp)
> >  		/* Create default call rcu data if need be */
> >  		(void) get_default_call_rcu_data();
> >  		cbs_endprev = (struct cds_wfq_node **)
> > -			uatomic_xchg(&default_call_rcu_data, cbs_tail);
> > -		*cbs_endprev = cbs;
> > +			uatomic_xchg(&default_call_rcu_data->cbs.tail,
> > +				cbs_tail);
> > +		_CMM_STORE_SHARED(*cbs_endprev, cbs);
> >  		uatomic_add(&default_call_rcu_data->qlen,
> >  			uatomic_read(&crdp->qlen));
> >  		wake_call_rcu_thread(default_call_rcu_data);
> >
> >
> > >
> > > Thanks,
> > >
> > > Mathieu
> > >
> > > >
> > > > Thanx, Paul
> > > >
> > > > > So I expect that commit
> > > > >
> > > > > commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
> > > > > Author: Mathieu Desnoyers
> > > > > Date: Tue Sep 25 10:50:49
2012 -0500 > > > > > > > > > > call_rcu: use wfcqueue, eliminate false-sharing > > > > > > > > > > Eliminate false-sharing between call_rcu (enqueuer) and worker threads > > > > > on the queue head and tail. > > > > > > > > > > Acked-by: Paul E. McKenney > > > > > Signed-off-by: Mathieu Desnoyers > > > > > > > > > > Could have managed to fix the issue, or change the timing enough that it > > > > > does not reproduces. I'll continue investigating. > > > > > > > > > > Thanks, > > > > > > > > > > Mathieu > > > > > > > > > > > > > > > >
> > > > > > > > > > > > _______________________________________________ > > > > > > lttng-dev mailing list > > > > > > lttng-dev at lists.lttng.org > > > > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > > > > > > > > -- > > > > > Mathieu Desnoyers > > > > > Operating System Efficiency R&D Consultant > > > > > EfficiOS Inc. > > > > > http://www.efficios.com > > > > > > > > > > > > > > > > > _______________________________________________ > > > > lttng-dev mailing list > > > > lttng-dev at lists.lttng.org > > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > > > > -- > > > Mathieu Desnoyers > > > Operating System Efficiency R&D Consultant > > > EfficiOS Inc. > > > http://www.efficios.com > > > > -- > > Mathieu Desnoyers > > Operating System Efficiency R&D Consultant > > EfficiOS Inc. > > http://www.efficios.com > > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc.
http://www.efficios.com From laijs at cn.fujitsu.com Wed Oct 10 21:31:01 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Thu, 11 Oct 2012 09:31:01 +0800 Subject: [lttng-dev] rculfstack bug In-Reply-To: <20121010195007.GG2495@linux.vnet.ibm.com> References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal> <20121010150207.GB2495@linux.vnet.ibm.com> <20121010150751.GB20761@Krystal> <20121010175304.GA25511@Krystal> <20121010195007.GG2495@linux.vnet.ibm.com> Message-ID: <50762155.3080804@cn.fujitsu.com> On 10/11/2012 03:50 AM, Paul E. McKenney wrote: > On Wed, Oct 10, 2012 at 01:53:04PM -0400, Mathieu Desnoyers wrote: >> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: >>> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: >>>> On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote: >>>>> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: >>>>>> test code: >>>>>> ./tests/test_urcu_lfs 100 10 10 >>>>>> >>>>>> bug produce rate > 60% >>>>>> >>>>>> {{{ >>>>>> I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or "./tests/test_urcu_lfs 100 100 10" >>>>>> But I just test it about 5 times >>>>>> }}} >>>>>> >>>>>> 4cores*1threads: Intel(R) Core(TM) i5 CPU 760 >>>>>> RCU_MB (no time to test for other rcu type) >>>>>> test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a >>>>>> >>>>>> I didn't see any bug when "./tests/test_urcu_mb 10 100 10" >>>>>> >>>>>> Sorry, I tried, but I failed to find out the root cause currently. >>>>> >>>>> I think I managed to narrow down the issue: >>>>> >>>>> 1) the master branch does not reproduce it, but commit >>>>> 768fba83676f49eb73fd1d8ad452016a84c5ec2a repdroduces it about 50% of the >>>>> time. >>>>> >>>>> 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and >>>>> current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu >>>>> moving to wfcqueue. >>>>> >>>>> 3) the bug always arise, for me, at the end of the 10 seconds. 
>>>>> However, it might be simply due to the fact that most of the memory >>>>> get freed at the end of program execution. >>>>> >>>>> 4) I've been able to get a backtrace, and it looks like we have some >>>>> call_rcu callback-invokation threads still working while >>>>> call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free() >>>>> is nicely waiting for the next thread to stop, and during that time, >>>>> two callback-invokation threads are invoking callbacks (and one of >>>>> them triggers the segfault). >>>> >>>> Do any of the callbacks reference __thread variables from some other >>>> thread? If so, those threads must refrain from exiting until after >>>> such callbacks complete. >>> >>> The callback is a simple caa_container_of + free, usual stuff, nothing >>> fancy. >> >> Here is the fix: the bug was in call rcu. It is not required for master, >> because we fixed it while moving to wfcqueue. >> >> We were erroneously writing to the head field of the default >> call_rcu_data rather than tail. > > Ouch!!! I have no idea why that would have passed my testing. :-( It's one of the reasons that I rewrite wfqueue and introduce delete_all() (Mathieu uses splice instead) to replace open code of wfqueue in urcu-call-rcu-impl.h. > >> I wonder if we should simply do a new release with call_rcu using >> wfcqueue and tell people to upgrade, or if we should somehow create a >> stable branch with this fix. >> >> Thoughts ? > > Under what conditions does this bug appear? It is necessary to not just > use call_rcu(), but also to explicitly call call_rcu_data_free(), right? > > My guess is that a stable branch would be good -- there will be other > bugs, after all. 
:-/ > > Thanx, Paul > >> Thanks, >> >> Mathieu >> >> --- >> diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h >> index 13b24ff..b205229 100644 >> --- a/urcu-call-rcu-impl.h >> +++ b/urcu-call-rcu-impl.h >> @@ -647,8 +647,9 @@ void call_rcu_data_free(struct call_rcu_data *crdp) >> /* Create default call rcu data if need be */ >> (void) get_default_call_rcu_data(); >> cbs_endprev = (struct cds_wfq_node **) >> - uatomic_xchg(&default_call_rcu_data, cbs_tail); >> - *cbs_endprev = cbs; >> + uatomic_xchg(&default_call_rcu_data->cbs.tail, >> + cbs_tail); >> + _CMM_STORE_SHARED(*cbs_endprev, cbs); >> uatomic_add(&default_call_rcu_data->qlen, >> uatomic_read(&crdp->qlen)); >> wake_call_rcu_thread(default_call_rcu_data); >> >> >>> >>> Thanks, >>> >>> Mathieu >>> >>>> >>>> Thanx, Paul >>>> >>>>> So I expect that commit >>>>> >>>>> commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe >>>>> Author: Mathieu Desnoyers >>>>> Date: Tue Sep 25 10:50:49 2012 -0500 >>>>> >>>>> call_rcu: use wfcqueue, eliminate false-sharing >>>>> >>>>> Eliminate false-sharing between call_rcu (enqueuer) and worker threads >>>>> on the queue head and tail. >>>>> I think the changelog of this commit is too short. Thanks, Lai From paulmck at linux.vnet.ibm.com Wed Oct 10 23:02:51 2012 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Wed, 10 Oct 2012 20:02:51 -0700 Subject: [lttng-dev] rculfstack bug In-Reply-To: <50762155.3080804@cn.fujitsu.com> References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal> <20121010150207.GB2495@linux.vnet.ibm.com> <20121010150751.GB20761@Krystal> <20121010175304.GA25511@Krystal> <20121010195007.GG2495@linux.vnet.ibm.com> <50762155.3080804@cn.fujitsu.com> Message-ID: <20121011030251.GD2869@linux.vnet.ibm.com> On Thu, Oct 11, 2012 at 09:31:01AM +0800, Lai Jiangshan wrote: > On 10/11/2012 03:50 AM, Paul E. 
McKenney wrote: > > On Wed, Oct 10, 2012 at 01:53:04PM -0400, Mathieu Desnoyers wrote: > >> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > >>> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > >>>> On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote: > >>>>> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > >>>>>> test code: > >>>>>> ./tests/test_urcu_lfs 100 10 10 > >>>>>> > >>>>>> bug produce rate > 60% > >>>>>> > >>>>>> {{{ > >>>>>> I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or "./tests/test_urcu_lfs 100 100 10" > >>>>>> But I just test it about 5 times > >>>>>> }}} > >>>>>> > >>>>>> 4cores*1threads: Intel(R) Core(TM) i5 CPU 760 > >>>>>> RCU_MB (no time to test for other rcu type) > >>>>>> test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a > >>>>>> > >>>>>> I didn't see any bug when "./tests/test_urcu_mb 10 100 10" > >>>>>> > >>>>>> Sorry, I tried, but I failed to find out the root cause currently. > >>>>> > >>>>> I think I managed to narrow down the issue: > >>>>> > >>>>> 1) the master branch does not reproduce it, but commit > >>>>> 768fba83676f49eb73fd1d8ad452016a84c5ec2a repdroduces it about 50% of the > >>>>> time. > >>>>> > >>>>> 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and > >>>>> current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu > >>>>> moving to wfcqueue. > >>>>> > >>>>> 3) the bug always arise, for me, at the end of the 10 seconds. > >>>>> However, it might be simply due to the fact that most of the memory > >>>>> get freed at the end of program execution. > >>>>> > >>>>> 4) I've been able to get a backtrace, and it looks like we have some > >>>>> call_rcu callback-invokation threads still working while > >>>>> call_rcu_data_free() is invoked. 
In the backtrace, call_rcu_data_free() > >>>>> is nicely waiting for the next thread to stop, and during that time, > >>>>> two callback-invokation threads are invoking callbacks (and one of > >>>>> them triggers the segfault). > >>>> > >>>> Do any of the callbacks reference __thread variables from some other > >>>> thread? If so, those threads must refrain from exiting until after > >>>> such callbacks complete. > >>> > >>> The callback is a simple caa_container_of + free, usual stuff, nothing > >>> fancy. > >> > >> Here is the fix: the bug was in call rcu. It is not required for master, > >> because we fixed it while moving to wfcqueue. > >> > >> We were erroneously writing to the head field of the default > >> call_rcu_data rather than tail. > > > > Ouch!!! I have no idea why that would have passed my testing. :-( > > It's one of the reasons that I rewrite wfqueue and introduce delete_all() > (Mathieu uses splice instead) to replace open code of wfqueue in urcu-call-rcu-impl.h. Good catch!!! Thanx, Paul > >> I wonder if we should simply do a new release with call_rcu using > >> wfcqueue and tell people to upgrade, or if we should somehow create a > >> stable branch with this fix. > >> > >> Thoughts ? > > > > Under what conditions does this bug appear? It is necessary to not just > > use call_rcu(), but also to explicitly call call_rcu_data_free(), right? > > > > My guess is that a stable branch would be good -- there will be other > > bugs, after all. 
:-/ > > > > Thanx, Paul > > > >> Thanks, > >> > >> Mathieu > >> > >> --- > >> diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h > >> index 13b24ff..b205229 100644 > >> --- a/urcu-call-rcu-impl.h > >> +++ b/urcu-call-rcu-impl.h > >> @@ -647,8 +647,9 @@ void call_rcu_data_free(struct call_rcu_data *crdp) > >> /* Create default call rcu data if need be */ > >> (void) get_default_call_rcu_data(); > >> cbs_endprev = (struct cds_wfq_node **) > >> - uatomic_xchg(&default_call_rcu_data, cbs_tail); > >> - *cbs_endprev = cbs; > >> + uatomic_xchg(&default_call_rcu_data->cbs.tail, > >> + cbs_tail); > >> + _CMM_STORE_SHARED(*cbs_endprev, cbs); > >> uatomic_add(&default_call_rcu_data->qlen, > >> uatomic_read(&crdp->qlen)); > >> wake_call_rcu_thread(default_call_rcu_data); > >> > >> > >>> > >>> Thanks, > >>> > >>> Mathieu > >>> > >>>> > >>>> Thanx, Paul > >>>> > >>>>> So I expect that commit > >>>>> > >>>>> commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe > >>>>> Author: Mathieu Desnoyers > >>>>> Date: Tue Sep 25 10:50:49 2012 -0500 > >>>>> > >>>>> call_rcu: use wfcqueue, eliminate false-sharing > >>>>> > >>>>> Eliminate false-sharing between call_rcu (enqueuer) and worker threads > >>>>> on the queue head and tail. > >>>>> > > I think the changelog of this commit is too short. > > Thanks, > Lai > From Paul_Woegerer at mentor.com Thu Oct 11 05:24:48 2012 From: Paul_Woegerer at mentor.com (Woegerer, Paul) Date: Thu, 11 Oct 2012 11:24:48 +0200 Subject: [lttng-dev] sched_process_exec Message-ID: <50769060.7070309@mentor.com> Hi, Is there a reason why instrumentation/events/lttng-module/sched.h does not include TRACE_EVENT(sched_process_exec) ? The patch below adds the missing tracepoint definition (tested with kernel 3.4.6-2.10). 
diff --git a/instrumentation/events/lttng-module/sched.h b/instrumentation/events/lttng-module/sched.h index b68616e..ef791ac 100644 --- a/instrumentation/events/lttng-module/sched.h +++ b/instrumentation/events/lttng-module/sched.h @@ -314,6 +314,32 @@ TRACE_EVENT(sched_process_fork, ) /* + * Tracepoint for exec: + */ +TRACE_EVENT(sched_process_exec, + + TP_PROTO(struct task_struct *p, pid_t old_pid, + struct linux_binprm *bprm), + + TP_ARGS(p, old_pid, bprm), + + TP_STRUCT__entry( + __string( filename, bprm->filename ) + __field( pid_t, pid ) + __field( pid_t, old_pid ) + ), + + TP_fast_assign( + tp_strcpy(filename, bprm->filename); + tp_assign(pid, p->pid) + tp_assign(old_pid, old_pid) + ), + + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), + __entry->pid, __entry->old_pid) +) + +/* * XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE * adding sched_stat support to SCHED_FIFO/RR would be welcome. */ -- Paul Woegerer | SW Development Engineer http://go.mentor.com/sourceryanalyzer Mentor Embedded(tm) | Prinz Eugen Straße 72/2/4, Vienna, 1040 Austria Nucleus® | Linux® | Android(tm) | Services | UI | Multi-OS Android is a trademark of Google Inc. Use of this trademark is subject to Google Permissions. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. From dut.ac.lgy at gmail.com Thu Oct 11 09:46:03 2012 From: dut.ac.lgy at gmail.com (=?GB2312?B?wfW54tHH?=) Date: Thu, 11 Oct 2012 21:46:03 +0800 Subject: [lttng-dev] How can I know what events we can enable by enable-events Message-ID: How can I know how many events we can enable and what they are? Thanks! -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dgoulet at efficios.com Thu Oct 11 09:48:16 2012 From: dgoulet at efficios.com (David Goulet) Date: Thu, 11 Oct 2012 09:48:16 -0400 Subject: [lttng-dev] How can I know what events we can enable by enable-events In-Reply-To: References: Message-ID: <5076CE20.7060506@efficios.com> Hi, Use this command with your session name: $ lttng list mysession_name Refer to man lttng(1) for more info. Enjoy! David 刘光亚: > How can I know how many events we can enable and what they are? Thanks! From mathieu.desnoyers at efficios.com Thu Oct 11 10:58:55 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 11 Oct 2012 10:58:55 -0400 Subject: [lttng-dev] sched_process_exec In-Reply-To: <50769060.7070309@mentor.com> References: <50769060.7070309@mentor.com> Message-ID: <20121011145855.GA11726@Krystal> * Woegerer, Paul (Paul_Woegerer at mentor.com) wrote: > Hi, > > Is there a reason why instrumentation/events/lttng-module/sched.h does > not include TRACE_EVENT(sched_process_exec) ? > > The patch below adds the missing tracepoint definition (tested with > kernel 3.4.6-2.10). Hi Paul, The only reason is that it appeared in a later kernel than the one used when generating the lttng-modules trace event instrumentation. A couple of details to fix before I can merge this patch though: Please also update instrumentation/events/mainline/sched.h to add the original mainline TRACE_EVENT, so we can keep the files in sync.
> > diff --git a/instrumentation/events/lttng-module/sched.h > b/instrumentation/events/lttng-module/sched.h > index b68616e..ef791ac 100644 > --- a/instrumentation/events/lttng-module/sched.h > +++ b/instrumentation/events/lttng-module/sched.h > @@ -314,6 +314,32 @@ TRACE_EVENT(sched_process_fork, > ) > > /* > + * Tracepoint for exec: > + */ > +TRACE_EVENT(sched_process_exec, > + > + TP_PROTO(struct task_struct *p, pid_t old_pid, > + struct linux_binprm *bprm), > + > + TP_ARGS(p, old_pid, bprm), > + > + TP_STRUCT__entry( > + __string( filename, bprm->filename ) > + __field( pid_t, pid ) > + __field( pid_t, old_pid ) > + ), > + > + TP_fast_assign( > + tp_strcpy(filename, bprm->filename); Please remove the ";" above. > + tp_assign(pid, p->pid) > + tp_assign(old_pid, old_pid) > + ), > + > + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), > + __entry->pid, __entry->old_pid) > +) > + > +/* > * XXX the below sched_stat tracepoints only apply to > SCHED_OTHER/BATCH/IDLE A newline has been added by your mail client, which causes the patch to not apply correctly. Hints on how to configure mail clients for sending patches can be looked up in the Linux kernel tree, under Documentation/email-clients.txt Thanks! Mathieu > * adding sched_stat support to SCHED_FIFO/RR would be welcome. > */ > > -- > Paul Woegerer | SW Development Engineer > http://go.mentor.com/sourceryanalyzer > > Mentor Embedded(tm) | Prinz Eugen Stra?e 72/2/4, Vienna, 1040 Austria > Nucleus? | Linux? | Android(tm) | Services | UI | Multi-OS > > Android is a trademark of Google Inc. Use of this trademark is subject to Google Permissions. > Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Thu Oct 11 11:27:08 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 11 Oct 2012 11:27:08 -0400 Subject: [lttng-dev] rculfstack bug In-Reply-To: <50762155.3080804@cn.fujitsu.com> References: <50754A57.1080104@cn.fujitsu.com> <20121010114215.GA11307@Krystal> <20121010150207.GB2495@linux.vnet.ibm.com> <20121010150751.GB20761@Krystal> <20121010175304.GA25511@Krystal> <20121010195007.GG2495@linux.vnet.ibm.com> <50762155.3080804@cn.fujitsu.com> Message-ID: <20121011152708.GB11726@Krystal> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > On 10/11/2012 03:50 AM, Paul E. McKenney wrote: > > On Wed, Oct 10, 2012 at 01:53:04PM -0400, Mathieu Desnoyers wrote: > >> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > >>> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > >>>> On Wed, Oct 10, 2012 at 07:42:15AM -0400, Mathieu Desnoyers wrote: > >>>>> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > >>>>>> test code: > >>>>>> ./tests/test_urcu_lfs 100 10 10 > >>>>>> > >>>>>> bug produce rate > 60% > >>>>>> > >>>>>> {{{ > >>>>>> I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or "./tests/test_urcu_lfs 100 100 10" > >>>>>> But I just test it about 5 times > >>>>>> }}} > >>>>>> > >>>>>> 4cores*1threads: Intel(R) Core(TM) i5 CPU 760 > >>>>>> RCU_MB (no time to test for other rcu type) > >>>>>> test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a > >>>>>> > >>>>>> I didn't see any bug when "./tests/test_urcu_mb 10 100 10" > >>>>>> > >>>>>> Sorry, I tried, but I failed to find out the root cause currently. > >>>>> > >>>>> I think I managed to narrow down the issue: > >>>>> > >>>>> 1) the master branch does not reproduce it, but commit > >>>>> 768fba83676f49eb73fd1d8ad452016a84c5ec2a repdroduces it about 50% of the > >>>>> time. 
> >>>>> > >>>>> 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and > >>>>> current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu > >>>>> moving to wfcqueue. > >>>>> > >>>>> 3) the bug always arise, for me, at the end of the 10 seconds. > >>>>> However, it might be simply due to the fact that most of the memory > >>>>> get freed at the end of program execution. > >>>>> > >>>>> 4) I've been able to get a backtrace, and it looks like we have some > >>>>> call_rcu callback-invokation threads still working while > >>>>> call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free() > >>>>> is nicely waiting for the next thread to stop, and during that time, > >>>>> two callback-invokation threads are invoking callbacks (and one of > >>>>> them triggers the segfault). > >>>> > >>>> Do any of the callbacks reference __thread variables from some other > >>>> thread? If so, those threads must refrain from exiting until after > >>>> such callbacks complete. > >>> > >>> The callback is a simple caa_container_of + free, usual stuff, nothing > >>> fancy. > >> > >> Here is the fix: the bug was in call rcu. It is not required for master, > >> because we fixed it while moving to wfcqueue. > >> > >> We were erroneously writing to the head field of the default > >> call_rcu_data rather than tail. > > > > Ouch!!! I have no idea why that would have passed my testing. :-( > > It's one of the reasons that I rewrite wfqueue and introduce delete_all() > (Mathieu uses splice instead) to replace open code of wfqueue in > urcu-call-rcu-impl.h. Yes, you did an excellent call on this one. We had the bug fixed in the master branch before it was discovered, which is always nice :) > > > > >> I wonder if we should simply do a new release with call_rcu using > >> wfcqueue and tell people to upgrade, or if we should somehow create a > >> stable branch with this fix. > >> > >> Thoughts ? > > > > Under what conditions does this bug appear? 
It is necessary to not just > > use call_rcu(), but also to explicitly call call_rcu_data_free(), right? > > > > My guess is that a stable branch would be good -- there will be other > > bugs, after all. :-/ > > > > Thanx, Paul > > > >> Thanks, > >> > >> Mathieu > >> > >> --- > >> diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h > >> index 13b24ff..b205229 100644 > >> --- a/urcu-call-rcu-impl.h > >> +++ b/urcu-call-rcu-impl.h > >> @@ -647,8 +647,9 @@ void call_rcu_data_free(struct call_rcu_data *crdp) > >> /* Create default call rcu data if need be */ > >> (void) get_default_call_rcu_data(); > >> cbs_endprev = (struct cds_wfq_node **) > >> - uatomic_xchg(&default_call_rcu_data, cbs_tail); > >> - *cbs_endprev = cbs; > >> + uatomic_xchg(&default_call_rcu_data->cbs.tail, > >> + cbs_tail); > >> + _CMM_STORE_SHARED(*cbs_endprev, cbs); > >> uatomic_add(&default_call_rcu_data->qlen, > >> uatomic_read(&crdp->qlen)); > >> wake_call_rcu_thread(default_call_rcu_data); > >> > >> > >>> > >>> Thanks, > >>> > >>> Mathieu > >>> > >>>> > >>>> Thanx, Paul > >>>> > >>>>> So I expect that commit > >>>>> > >>>>> commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe > >>>>> Author: Mathieu Desnoyers > >>>>> Date: Tue Sep 25 10:50:49 2012 -0500 > >>>>> > >>>>> call_rcu: use wfcqueue, eliminate false-sharing > >>>>> > >>>>> Eliminate false-sharing between call_rcu (enqueuer) and worker threads > >>>>> on the queue head and tail. > >>>>> > > I think the changelog of this commit is too short. Yes, unfortunately, it's already in the master branch, and rewriting history is something mere mortals are not allowed to do. ;-) There are a couple of patches that I have pending that did not receive any negative feedback at this point. I will move things forwards cautiously to ensure we have a consistent master branch, and create stable branches for 0.6 and 0.7. Review of these commits will be welcome before we do a 0.8 release (with the new wfcqueue). 
We'll be able to proceed to further changes as separate commits. Thanks! Mathieu > > Thanks, > Lai -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From Paul_Woegerer at mentor.com Thu Oct 11 11:28:29 2012 From: Paul_Woegerer at mentor.com (Woegerer, Paul) Date: Thu, 11 Oct 2012 17:28:29 +0200 Subject: [lttng-dev] sched_process_exec In-Reply-To: <20121011145855.GA11726@Krystal> References: <50769060.7070309@mentor.com> <20121011145855.GA11726@Krystal> Message-ID: <5076E59D.3010003@mentor.com> On 10/11/2012 04:58 PM, Mathieu Desnoyers wrote: > A couple a details to fix before I can merge this patch though: > > Please also update instrumentation/events/mainline/sched.h to add the > original mainline TRACE_EVENT, so we can keep the files in sync. Ok, reconfigured Thunderbird, removed semicolon, updated mainline: diff --git a/instrumentation/events/lttng-module/sched.h b/instrumentation/events/lttng-module/sched.h index b68616e..23e4955 100644 --- a/instrumentation/events/lttng-module/sched.h +++ b/instrumentation/events/lttng-module/sched.h @@ -314,6 +314,32 @@ TRACE_EVENT(sched_process_fork, ) /* + * Tracepoint for exec: + */ +TRACE_EVENT(sched_process_exec, + + TP_PROTO(struct task_struct *p, pid_t old_pid, + struct linux_binprm *bprm), + + TP_ARGS(p, old_pid, bprm), + + TP_STRUCT__entry( + __string( filename, bprm->filename ) + __field( pid_t, pid ) + __field( pid_t, old_pid ) + ), + + TP_fast_assign( + tp_strcpy(filename, bprm->filename) + tp_assign(pid, p->pid) + tp_assign(old_pid, old_pid) + ), + + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), + __entry->pid, __entry->old_pid) +) + +/* * XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE * adding sched_stat support to SCHED_FIFO/RR would be welcome. 
*/ diff --git a/instrumentation/events/mainline/sched.h b/instrumentation/events/mainline/sched.h index f633478..6700ecc 100644 --- a/instrumentation/events/mainline/sched.h +++ b/instrumentation/events/mainline/sched.h @@ -275,6 +275,32 @@ TRACE_EVENT(sched_process_fork, ); /* + * Tracepoint for exec: + */ +TRACE_EVENT(sched_process_exec, + + TP_PROTO(struct task_struct *p, pid_t old_pid, + struct linux_binprm *bprm), + + TP_ARGS(p, old_pid, bprm), + + TP_STRUCT__entry( + __string( filename, bprm->filename ) + __field( pid_t, pid ) + __field( pid_t, old_pid ) + ), + + TP_fast_assign( + __assign_str(filename, bprm->filename); + __entry->pid = p->pid; + __entry->old_pid = old_pid; + ), + + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), + __entry->pid, __entry->old_pid) +); + +/* * XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE * adding sched_stat support to SCHED_FIFO/RR would be welcome. */ From mathieu.desnoyers at efficios.com Thu Oct 11 12:07:29 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 11 Oct 2012 12:07:29 -0400 Subject: [lttng-dev] sched_process_exec In-Reply-To: <5076E59D.3010003@mentor.com> References: <50769060.7070309@mentor.com> <20121011145855.GA11726@Krystal> <5076E59D.3010003@mentor.com> Message-ID: <20121011160729.GA29181@Krystal> * Woegerer, Paul (Paul_Woegerer at mentor.com) wrote: > On 10/11/2012 04:58 PM, Mathieu Desnoyers wrote: > > A couple a details to fix before I can merge this patch though: > > > > Please also update instrumentation/events/mainline/sched.h to add the > > original mainline TRACE_EVENT, so we can keep the files in sync. > > Ok, reconfigured Thunderbird, removed semicolon, updated mainline: Still an issue: compudj at thinkos:~/work/lttng-modules$ patch -p1 < ~/.swp patching file instrumentation/events/lttng-module/sched.h patch: **** malformed patch at line 45: SCHED_OTHER/BATCH/IDLE it seems to be an issue with line-wrapping of your mail client. 
Thanks, Mathieu > > diff --git a/instrumentation/events/lttng-module/sched.h > b/instrumentation/events/lttng-module/sched.h > index b68616e..23e4955 100644 > --- a/instrumentation/events/lttng-module/sched.h > +++ b/instrumentation/events/lttng-module/sched.h > @@ -314,6 +314,32 @@ TRACE_EVENT(sched_process_fork, > ) > > /* > + * Tracepoint for exec: > + */ > +TRACE_EVENT(sched_process_exec, > + > + TP_PROTO(struct task_struct *p, pid_t old_pid, > + struct linux_binprm *bprm), > + > + TP_ARGS(p, old_pid, bprm), > + > + TP_STRUCT__entry( > + __string( filename, bprm->filename ) > + __field( pid_t, pid ) > + __field( pid_t, old_pid ) > + ), > + > + TP_fast_assign( > + tp_strcpy(filename, bprm->filename) > + tp_assign(pid, p->pid) > + tp_assign(old_pid, old_pid) > + ), > + > + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), > + __entry->pid, __entry->old_pid) > +) > + > +/* > * XXX the below sched_stat tracepoints only apply to > SCHED_OTHER/BATCH/IDLE > * adding sched_stat support to SCHED_FIFO/RR would be welcome. 
> */ > diff --git a/instrumentation/events/mainline/sched.h > b/instrumentation/events/mainline/sched.h > index f633478..6700ecc 100644 > --- a/instrumentation/events/mainline/sched.h > +++ b/instrumentation/events/mainline/sched.h > @@ -275,6 +275,32 @@ TRACE_EVENT(sched_process_fork, > ); > > /* > + * Tracepoint for exec: > + */ > +TRACE_EVENT(sched_process_exec, > + > + TP_PROTO(struct task_struct *p, pid_t old_pid, > + struct linux_binprm *bprm), > + > + TP_ARGS(p, old_pid, bprm), > + > + TP_STRUCT__entry( > + __string( filename, bprm->filename ) > + __field( pid_t, pid ) > + __field( pid_t, old_pid ) > + ), > + > + TP_fast_assign( > + __assign_str(filename, bprm->filename); > + __entry->pid = p->pid; > + __entry->old_pid = old_pid; > + ), > + > + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), > + __entry->pid, __entry->old_pid) > +); > + > +/* > * XXX the below sched_stat tracepoints only apply to > SCHED_OTHER/BATCH/IDLE > * adding sched_stat support to SCHED_FIFO/RR would be welcome. > */ > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Thu Oct 11 12:23:09 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 11 Oct 2012 12:23:09 -0400 Subject: [lttng-dev] [RELEASE] Userspace RCU 0.6.8 Message-ID: <20121011162309.GA30801@Krystal> liburcu is a LGPLv2.1 userspace RCU (read-copy-update) library. This data synchronization library provides read-side access which scales linearly with the number of cores. It does so by allowing multiples copies of a given data structure to live at the same time, and by monitoring the data structure accesses to detect grace periods after which memory reclamation is possible. liburcu-cds provides efficient data structures based on RCU and lock-free algorithms. Those structures include hash tables, queues, stacks, and doubly-linked lists. 
We are going forward with a stable-0.6 branch to push important fixes to the old 0.6.x series. Please note that the current 0.7.x branch has a different fix for this issue, which will be released shortly into 0.7.5. Please note that the bug below occurs very rarely, only when invoking call_rcu_data_free() for per-cpu or per-thread callback-invokation threads (typically before application exit). Changelog: 2012-09-11 Userspace RCU 0.6.8 * Fix: call_rcu list corruption on teardown Project website: http://lttng.org/urcu Download link: http://lttng.org/files/urcu/ -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From alexmonthy at voxpopuli.im Thu Oct 11 13:14:37 2012 From: alexmonthy at voxpopuli.im (Alexandre Montplaisir) Date: Thu, 11 Oct 2012 13:14:37 -0400 Subject: [lttng-dev] TMF current and future developments In-Reply-To: <20121010160629.dcsyijhgooowckoo@mail.encs.concordia.ca> References: <20121010160629.dcsyijhgooowckoo@mail.encs.concordia.ca> Message-ID: <5076FE7D.4050207@voxpopuli.im> Hi Efraim, On 12-10-10 04:06 PM, Efraim Josue Lopez Sanchez wrote: > Hi, > > We are planning to build on top of TMF. Most recently we have tried > implementing some stuffs so that we can familiarize with TMF source > code. However, I've been wondering these last days about the future > plans/developments related to TMF. Is there any webpage/place/document > where I can find information regarding what are you guys > working/developing right now? And also about the future > developments/versions? If you want to track the very latest development, you can look at the patches that are going in for review on Gerrit, at https://git.eclipse.org/r/#/q/status:open+project:linuxtools/org.eclipse.linuxtools,n,z (not everything is related to TMF/LTTng, but it should be obvious what is by looking at the modified files). 
Most of it is targeted at the "lttng-kepler" branch on git, which should ultimately be part of the 2.0 release next June. If you want to try/test the latest code, you should use that branch. There is also a small list of planned features on http://eclipse.org/linuxtools/projectPages/lttng/ under "Future Plans". If you have any question about specific features, let us know! Cheers, -- Alexandre Montplaisir DORSAL lab, École Polytechnique de Montréal From matthew.khouzam at ericsson.com Thu Oct 11 13:19:08 2012 From: matthew.khouzam at ericsson.com (Matthew Khouzam) Date: Thu, 11 Oct 2012 13:19:08 -0400 Subject: [lttng-dev] TMF current and future developments In-Reply-To: <5076FE7D.4050207@voxpopuli.im> References: <20121010160629.dcsyijhgooowckoo@mail.encs.concordia.ca> <5076FE7D.4050207@voxpopuli.im> Message-ID: <5076FF8C.3040205@ericsson.com> I would personally suggest checking out the state system. It is, in my opinion, THE place to put abstracted events. On 12-10-11 01:14 PM, Alexandre Montplaisir wrote: > Hi Efraim, > > On 12-10-10 04:06 PM, Efraim Josue Lopez Sanchez wrote: >> Hi, >> >> We are planning to build on top of TMF. Most recently we have tried >> implementing some stuffs so that we can familiarize with TMF source >> code. However, I've been wondering these last days about the future >> plans/developments related to TMF. Is there any webpage/place/document >> where I can find information regarding what are you guys >> working/developing right now? And also about the future >> developments/versions? > If you want to track the very latest development, you can look at the > patches that are going in for review on Gerrit, at > https://git.eclipse.org/r/#/q/status:open+project:linuxtools/org.eclipse.linuxtools,n,z > (not everything is related to TMF/LTTng, but it should be obvious what > is by looking at the modified files). > > Most of it is targeted at the "lttng-kepler" branch on git, which should > ultimately be part of the 2.0 release next June.
> If you want to try/test
> the latest code, you should use that branch.
>
> There is also a small list of planned features on
> http://eclipse.org/linuxtools/projectPages/lttng/ under "Future Plans".
>
> If you have any questions about specific features, let us know!
>
>
> Cheers,
>

From mathieu.desnoyers at efficios.com Thu Oct 11 14:22:55 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Thu, 11 Oct 2012 14:22:55 -0400
Subject: [lttng-dev] [RFC] re-document rculfstack and even rename it
In-Reply-To: <50752928.3050006@cn.fujitsu.com>
References: <50752928.3050006@cn.fujitsu.com>
Message-ID: <20121011182255.GA19783@Krystal>

* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> rculfstack does not really require RCU only.
>
> 1) cds_lfs_push_rcu() doesn't need any lock: neither RCU nor other locks.

Good point! I even documented this peculiarity in the
comment at the top of _cds_lfs_push_rcu().

> 2) cds_lfs_pop_rcu() needs only one of the following synchronizations (not only RCU):
>    A) use rcu_read_lock() to protect cds_lfs_pop_rcu() and use synchronize_rcu()
>       or call_rcu() to free the popped node. (The current comments say we need this
>       synchronization, and thus we named this struct with the rcu prefix. But actually,
>       the following are OK, and are more popular/friendly.)
>    B) use mutexes/locks to protect cds_lfs_pop_rcu(); we are free to free/modify the
>       popped node at any time, and we don't need any synchronization when freeing it.
>    C) only ONE thread can call cds_lfs_pop_rcu(). (multi-provider/single-consumer)
>    D) others, like read-write locks.
>
> I consider B) and C) to be more popular. In the Linux kernel,
> kernel/task_work.c uses a hybrid of B) and C).
>
> I suggest renaming it, or at least documenting B) and C).

Yes, agreed! Do you suggest we introduce a "lfstack", and slowly
deprecate rculfstack?

We could then document the various ways to protect "pop", and also
implement a "splice" operation while we are there.

Thoughts?
Thanks, Mathieu > > Thanks, > Lai > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Thu Oct 11 16:48:23 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 11 Oct 2012 16:48:23 -0400 Subject: [lttng-dev] [RFC PATCH] lfstack: implement lock-free stack Message-ID: <20121011204823.GA12805@Krystal> This stack does not require to hold RCU read-side lock across push, and allows multiple strategies to be used for pop. Signed-off-by: Mathieu Desnoyers --- diff --git a/Makefile.am b/Makefile.am index ffdca9a..195b89a 100644 --- a/Makefile.am +++ b/Makefile.am @@ -17,6 +17,7 @@ nobase_dist_include_HEADERS = urcu/compiler.h urcu/hlist.h urcu/list.h \ urcu/wfqueue.h urcu/rculfstack.h urcu/rculfqueue.h \ urcu/ref.h urcu/cds.h urcu/urcu_ref.h urcu/urcu-futex.h \ urcu/uatomic_arch.h urcu/rculfhash.h urcu/wfcqueue.h \ + urcu/lfstack.h \ $(top_srcdir)/urcu/map/*.h \ $(top_srcdir)/urcu/static/*.h \ urcu/tls-compat.h @@ -72,7 +73,8 @@ liburcu_signal_la_LIBADD = liburcu-common.la liburcu_bp_la_SOURCES = urcu-bp.c urcu-pointer.c $(COMPAT) liburcu_bp_la_LIBADD = liburcu-common.la -liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c $(RCULFHASH) $(COMPAT) +liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c lfstack.c \ + $(RCULFHASH) $(COMPAT) liburcu_cds_la_LIBADD = liburcu-common.la pkgconfigdir = $(libdir)/pkgconfig diff --git a/lfstack.c b/lfstack.c new file mode 100644 index 0000000..a6a1a8b --- /dev/null +++ b/lfstack.c @@ -0,0 +1,51 @@ +/* + * lfstack.c + * + * Userspace RCU library - Lock-Free Stack + * + * Copyright 2010-2012 - Mathieu Desnoyers + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by 
the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* Do not #define _LGPL_SOURCE to ensure we can emit the wrapper symbols */ +#undef _LGPL_SOURCE +#include "urcu/lfstack.h" +#define _LGPL_SOURCE +#include "urcu/static/lfstack.h" + +/* + * library wrappers to be used by non-LGPL compatible source code. + */ + +void cds_lfs_node_init(struct cds_lfs_node *node) +{ + _cds_lfs_node_init(node); +} + +void cds_lfs_init(struct cds_lfs_stack *s) +{ + _cds_lfs_init(s); +} + +int cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node) +{ + return _cds_lfs_push(s, node); +} + +struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s) +{ + return _cds_lfs_pop(s); +} diff --git a/urcu/cds.h b/urcu/cds.h index d9e7984..78534bb 100644 --- a/urcu/cds.h +++ b/urcu/cds.h @@ -33,5 +33,6 @@ #include #include #include +#include #endif /* _URCU_CDS_H */ diff --git a/urcu/lfstack.h b/urcu/lfstack.h new file mode 100644 index 0000000..d068739 --- /dev/null +++ b/urcu/lfstack.h @@ -0,0 +1,87 @@ +#ifndef _URCU_LFSTACK_H +#define _URCU_LFSTACK_H + +/* + * lfstack.h + * + * Userspace RCU library - Lock-Free Stack + * + * Copyright 2010-2012 - Mathieu Desnoyers + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. 
+ * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifdef __cplusplus +extern "C" { +#endif + +struct cds_lfs_node { + struct cds_lfs_node *next; +}; + +struct cds_lfs_stack { + struct cds_lfs_node *head; +}; + +#ifdef _LGPL_SOURCE + +#include + +#define cds_lfs_node_init _cds_lfs_node_init +#define cds_lfs_init _cds_lfs_init +#define cds_lfs_push _cds_lfs_push +#define cds_lfs_pop _cds_lfs_pop + +#else /* !_LGPL_SOURCE */ + +extern void cds_lfs_node_init(struct cds_lfs_node *node); +extern void cds_lfs_init(struct cds_lfs_stack *s); + +/* + * cds_lfs_push: push a node into the stack. + * + * Does not require any synchronization with other push nor pop. + * + * Returns 0 if the stack was empty prior to adding the node. + * Returns non-zero otherwise. + */ +extern int cds_lfs_push(struct cds_lfs_stack *s, + struct cds_lfs_node *node); + +/* + * cds_lfs_pop: pop a node from the stack. + * + * Returns NULL if stack is empty. + * + * cds_lfs_pop needs to be synchronized using one of the following + * techniques: + * + * 1) Calling cds_lfs_pop under rcu read lock critical section. The + * caller must wait for a grace period to pass before freeing the + * returned node or modifying the cds_lfs_node structure. + * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop + * callers. + * 3) Ensuring that only ONE thread can call cds_lfs_pop(). + * (multi-provider/single-consumer scheme). 
+ */ +extern struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s); + +#endif /* !_LGPL_SOURCE */ + +#ifdef __cplusplus +} +#endif + +#endif /* _URCU_LFSTACK_H */ diff --git a/urcu/static/lfstack.h b/urcu/static/lfstack.h new file mode 100644 index 0000000..7acbf54 --- /dev/null +++ b/urcu/static/lfstack.h @@ -0,0 +1,151 @@ +#ifndef _URCU_STATIC_LFSTACK_H +#define _URCU_STATIC_LFSTACK_H + +/* + * urcu/static/lfstack.h + * + * Userspace RCU library - Lock-Free Stack + * + * Copyright 2010-2012 - Mathieu Desnoyers + * + * TO BE INCLUDED ONLY IN LGPL-COMPATIBLE CODE. See rculfstack.h for linking + * dynamically with the userspace rcu library. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +static inline +void _cds_lfs_node_init(struct cds_lfs_node *node) +{ +} + +static inline +void _cds_lfs_init(struct cds_lfs_stack *s) +{ + s->head = NULL; +} + +/* + * cds_lfs_push: push a node into the stack. + * + * Does not require any synchronization with other push nor pop. + * + * Lock-free stack push is not subject to ABA problem, so no need to + * take the RCU read-side lock. 
Even if "head" changes between two
+ * uatomic_cmpxchg() invocations here (being popped, and then pushed
+ * again by one or more concurrent threads), the second
+ * uatomic_cmpxchg() invocation only cares about pushing a new entry at
+ * the head of the stack, ensuring consistency by making sure the new
+ * node->next is the same pointer value as the value replaced as head.
+ * It does not care about the content of the actual next node, so it can
+ * very well be reallocated between the two uatomic_cmpxchg().
+ *
+ * We take the approach of expecting the stack to be usually empty, so
+ * we first try an initial uatomic_cmpxchg() on a NULL old_head, and
+ * retry if the old head was non-NULL (the value read by the first
+ * uatomic_cmpxchg() is used as old head for the following loop). The
+ * upside of this scheme is to minimize the amount of cacheline traffic,
+ * always performing an exclusive cacheline access, rather than doing
+ * non-exclusive followed by exclusive cacheline access (which would be
+ * required if we first read the old head value). This design decision
+ * might be revisited after more thorough benchmarking on various
+ * platforms.
+ *
+ * Returns 0 if the stack was empty prior to adding the node.
+ * Returns non-zero otherwise.
+ */
+static inline
+int _cds_lfs_push(struct cds_lfs_stack *s,
+		struct cds_lfs_node *node)
+{
+	struct cds_lfs_node *head = NULL;
+
+	for (;;) {
+		struct cds_lfs_node *old_head = head;
+
+		/*
+		 * node->next is still private at this point, no need to
+		 * perform a _CMM_STORE_SHARED().
+		 */
+		node->next = head;
+		/*
+		 * uatomic_cmpxchg() implicit memory barrier orders earlier
+		 * stores to node before publication.
+		 */
+		head = uatomic_cmpxchg(&s->head, old_head, node);
+		if (old_head == head)
+			break;
+	}
+	return (int) !!((unsigned long) head);
+}
+
+/*
+ * cds_lfs_pop: pop a node from the stack.
+ *
+ * Returns NULL if stack is empty.
+ * + * cds_lfs_pop needs to be synchronized using one of the following + * techniques: + * + * 1) Calling cds_lfs_pop under rcu read lock critical section. The + * caller must wait for a grace period to pass before freeing the + * returned node or modifying the cds_lfs_node structure. + * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop + * callers. + * 3) Ensuring that only ONE thread can call cds_lfs_pop(). + * (multi-provider/single-consumer scheme). + */ +static inline +struct cds_lfs_node *_cds_lfs_pop(struct cds_lfs_stack *s) +{ + for (;;) { + struct cds_lfs_node *head; + + head = _CMM_LOAD_SHARED(s->head); + if (head) { + struct cds_lfs_node *next; + + /* + * Read head before head->next. Matches the + * implicit memory barrier before + * uatomic_cmpxchg() in cds_lfs_push. + */ + cmm_smp_read_barrier_depends(); + next = _CMM_LOAD_SHARED(head->next); + if (uatomic_cmpxchg(&s->head, head, next) == head) { + return head; + } else { + /* Concurrent modification. Retry. */ + continue; + } + } else { + /* Empty stack */ + return NULL; + } + } +} + +#ifdef __cplusplus +} +#endif + +#endif /* _URCU_STATIC_LFSTACK_H */ -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From laijs at cn.fujitsu.com Thu Oct 11 22:25:40 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Fri, 12 Oct 2012 10:25:40 +0800 Subject: [lttng-dev] [RFC] re-document rculfstack and even rename it In-Reply-To: <20121011182255.GA19783@Krystal> References: <50752928.3050006@cn.fujitsu.com> <20121011182255.GA19783@Krystal> Message-ID: <50777FA4.9030902@cn.fujitsu.com> On 10/12/2012 02:22 AM, Mathieu Desnoyers wrote: > * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: >> rculfstack is not really require RCU-only. >> >> 1) cds_lfs_push_rcu() don't need any lock, don't need RCU nor other locks. > > Good point ! I even documented this peculiarness in the > comment at the top of _cds_lfs_push_rcu(). 
>
>> 2) cds_lfs_pop_rcu() needs only one of the following synchronizations (not only RCU):
>>    A) use rcu_read_lock() to protect cds_lfs_pop_rcu() and use synchronize_rcu()
>>       or call_rcu() to free the popped node. (The current comments say we need this
>>       synchronization, and thus we named this struct with the rcu prefix. But actually,
>>       the following are OK, and are more popular/friendly.)
>>    B) use mutexes/locks to protect cds_lfs_pop_rcu(); we are free to free/modify the
>>       popped node at any time, and we don't need any synchronization when freeing it.
>>    C) only ONE thread can call cds_lfs_pop_rcu(). (multi-provider/single-consumer)
>>    D) others, like read-write locks.
>>
>> I consider B) and C) to be more popular. In the Linux kernel,
>> kernel/task_work.c uses a hybrid of B) and C).
>>
>> I suggest renaming it, or at least documenting B) and C).
>
> Yes, agreed! Do you suggest we introduce a "lfstack", and slowly
> deprecate rculfstack?
>
> We could then document the various ways to protect "pop", and also
> implement a "splice" operation while we are there.
>

Acked.

Thanks,
Lai

From Zheng.Chang at emc.com Fri Oct 12 04:57:14 2012
From: Zheng.Chang at emc.com (Chang, Zheng)
Date: Fri, 12 Oct 2012 04:57:14 -0400
Subject: [lttng-dev] Amount of trace folders created for the same tool
Message-ID: <6539770C71C3814BB0BFC2DBEBD105080137BE38@CORPUSMX30B.corp.emc.com>

Hi,

I have a command line tool that calls tracepoints to generate a trace.
But it quits once the process is done, so LTTng creates a different
folder for each run because its pid changes every time.

This tool gets called hundreds of times per day and I'm unsure how to
manage all this trace data.

Is there a way or feature to avoid this case?

Thanks
-Zheng
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mathieu.desnoyers at efficios.com Fri Oct 12 07:01:12 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Fri, 12 Oct 2012 07:01:12 -0400
Subject: [lttng-dev] Amount of trace folders created for the same tool
In-Reply-To: <6539770C71C3814BB0BFC2DBEBD105080137BE38@CORPUSMX30B.corp.emc.com>
References: <6539770C71C3814BB0BFC2DBEBD105080137BE38@CORPUSMX30B.corp.emc.com>
Message-ID: <20121012110112.GB14821@Krystal>

* Chang, Zheng (Zheng.Chang at emc.com) wrote:
> Hi,
>
> I have a command line tool that calls tracepoints to generate a trace.
>
> But it quits once the process is done, so LTTng creates a different
> folder for each run because its pid changes every time.
>
> This tool gets called hundreds of times per day and I'm unsure how to
> manage all this trace data.
>
> Is there a way or feature to avoid this case?

Not yet, but we're working on "global, per user" UST buffers for LTTng
2.2. This will solve your issue.

Thanks,

Mathieu

>
> Thanks
>
> -Zheng

> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com Fri Oct 12 07:18:35 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Fri, 12 Oct 2012 07:18:35 -0400
Subject: [lttng-dev] [RELEASE] Userspace RCU 0.7.5
Message-ID: <20121012111835.GA15280@Krystal>

liburcu is a LGPLv2.1 userspace RCU (read-copy-update) library. This
data synchronization library provides read-side access which scales
linearly with the number of cores. It does so by allowing multiple
copies of a given data structure to live at the same time, and by
monitoring the data structure accesses to detect grace periods after
which memory reclamation is possible.

liburcu-cds provides efficient data structures based on RCU and
lock-free algorithms.
Those structures include hash tables, queues, stacks, and doubly-linked lists. This is the start of a stable-0.7 branch to push important fixes. The master branch development will continue and eventually result in a 0.8.0 release. Please note that the bug "call_rcu list corruption on teardown" occurs very rarely, only when invoking call_rcu_data_free() for per-cpu or per-thread callback-invocation threads (typically before application exit). The fix for this corruption bug is the same as the fix used in the stable-0.6 branch. This fix is not needed in the master branch, because the issue had been already fixed by moving call_rcu to the new wfcqueue API. Changelog: 2012-10-12 Userspace RCU 0.7.5 * Fix: call_rcu list corruption on teardown * Ensure that read-side functions meet 10-line LGPL criterion * tls-compat.h: document sigaltstack(2) limitation * urcu: add notice to URCU_TLS() for it is not strictly async-signal-safe * Document sigaltstack(2) limitation * Documentation: update LICENSE file Project website: http://lttng.org/urcu Download link: http://lttng.org/files/urcu/ -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From Paul_Woegerer at mentor.com Fri Oct 12 08:12:42 2012 From: Paul_Woegerer at mentor.com (Woegerer, Paul) Date: Fri, 12 Oct 2012 14:12:42 +0200 Subject: [lttng-dev] sched_process_exec In-Reply-To: <20121011160729.GA29181@Krystal> References: <50769060.7070309@mentor.com> <20121011145855.GA11726@Krystal> <5076E59D.3010003@mentor.com> <20121011160729.GA29181@Krystal> Message-ID: <5078093A.6080500@mentor.com> On 10/11/2012 06:07 PM, Mathieu Desnoyers wrote: > * Woegerer, Paul (Paul_Woegerer at mentor.com) wrote: >> On 10/11/2012 04:58 PM, Mathieu Desnoyers wrote: >>> A couple a details to fix before I can merge this patch though: >>> >>> Please also update instrumentation/events/mainline/sched.h to add the >>> original mainline TRACE_EVENT, so we can keep the files in sync. 
>> >> Ok, reconfigured Thunderbird, removed semicolon, updated mainline: > > Still an issue: > > compudj at thinkos:~/work/lttng-modules$ patch -p1 < ~/.swp > patching file instrumentation/events/lttng-module/sched.h > patch: **** malformed patch at line 45: SCHED_OTHER/BATCH/IDLE Now it should (hopefully) work: >From b792323bca694f1e88144319befcdbfb30efa878 Mon Sep 17 00:00:00 2001 From: Paul Woegerer Date: Fri, 12 Oct 2012 12:52:19 +0200 Subject: [PATCH] Add TRACE_EVENT(sched_process_exec) to sched.h. --- instrumentation/events/lttng-module/sched.h | 26 ++++++++++++++++++++++++++ instrumentation/events/mainline/sched.h | 26 ++++++++++++++++++++++++++ 2 files changed, 52 insertions(+) diff --git a/instrumentation/events/lttng-module/sched.h b/instrumentation/events/lttng-module/sched.h index b68616e..23e4955 100644 --- a/instrumentation/events/lttng-module/sched.h +++ b/instrumentation/events/lttng-module/sched.h @@ -314,6 +314,32 @@ TRACE_EVENT(sched_process_fork, ) /* + * Tracepoint for exec: + */ +TRACE_EVENT(sched_process_exec, + + TP_PROTO(struct task_struct *p, pid_t old_pid, + struct linux_binprm *bprm), + + TP_ARGS(p, old_pid, bprm), + + TP_STRUCT__entry( + __string( filename, bprm->filename ) + __field( pid_t, pid ) + __field( pid_t, old_pid ) + ), + + TP_fast_assign( + tp_strcpy(filename, bprm->filename) + tp_assign(pid, p->pid) + tp_assign(old_pid, old_pid) + ), + + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), + __entry->pid, __entry->old_pid) +) + +/* * XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE * adding sched_stat support to SCHED_FIFO/RR would be welcome. 
*/ diff --git a/instrumentation/events/mainline/sched.h b/instrumentation/events/mainline/sched.h index f633478..6700ecc 100644 --- a/instrumentation/events/mainline/sched.h +++ b/instrumentation/events/mainline/sched.h @@ -275,6 +275,32 @@ TRACE_EVENT(sched_process_fork, ); /* + * Tracepoint for exec: + */ +TRACE_EVENT(sched_process_exec, + + TP_PROTO(struct task_struct *p, pid_t old_pid, + struct linux_binprm *bprm), + + TP_ARGS(p, old_pid, bprm), + + TP_STRUCT__entry( + __string( filename, bprm->filename ) + __field( pid_t, pid ) + __field( pid_t, old_pid ) + ), + + TP_fast_assign( + __assign_str(filename, bprm->filename); + __entry->pid = p->pid; + __entry->old_pid = old_pid; + ), + + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), + __entry->pid, __entry->old_pid) +); + +/* * XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE * adding sched_stat support to SCHED_FIFO/RR would be welcome. */ -- 1.7.10.4 From mathieu.desnoyers at efficios.com Fri Oct 12 10:17:33 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Fri, 12 Oct 2012 10:17:33 -0400 Subject: [lttng-dev] [RFC PATCH urcu] lfstack: implement pop_all and iterators Message-ID: <20121012141732.GA16820@Krystal> Please note that this code is currently developed in a volatile branch: git://git.dorsal.polymtl.ca/~compudj/userspace-rcu branch: urcu/lfstack Signed-off-by: Mathieu Desnoyers --- diff --git a/lfstack.c b/lfstack.c index 74ffd4f..7e2011d 100644 --- a/lfstack.c +++ b/lfstack.c @@ -45,7 +45,12 @@ int cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node) return _cds_lfs_push(s, node); } -struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s) +struct cds_lfs_node *__cds_lfs_pop(struct cds_lfs_stack *s) { - return _cds_lfs_pop(s); + return ___cds_lfs_pop(s); +} + +struct cds_lfs_head *__cds_lfs_pop_all(struct cds_lfs_stack *s) +{ + return ___cds_lfs_pop_all(s); } diff --git a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c index 
a29dc94..8350e65 100644 --- a/tests/test_urcu_lfs.c +++ b/tests/test_urcu_lfs.c @@ -244,7 +244,7 @@ void *thr_dequeuer(void *_count) struct test *node; rcu_read_lock(); - snode = cds_lfs_pop(&s); + snode = __cds_lfs_pop(&s); node = caa_container_of(snode, struct test, list); rcu_read_unlock(); if (node) { @@ -275,7 +275,7 @@ void test_end(struct cds_lfs_stack *s, unsigned long long *nr_dequeues) struct cds_lfs_node *snode; do { - snode = cds_lfs_pop(s); + snode = __cds_lfs_pop(s); if (snode) { struct test *node; diff --git a/urcu/lfstack.h b/urcu/lfstack.h index d068739..463b0d9 100644 --- a/urcu/lfstack.h +++ b/urcu/lfstack.h @@ -27,14 +27,40 @@ extern "C" { #endif +/* + * struct cds_lfs_node is returned by cds_lfs_pop, and also used as + * iterator on stack. It is not safe to dereference the node next + * pointer when returned by cds_lfs_pop. + */ struct cds_lfs_node { struct cds_lfs_node *next; }; +/* + * struct cds_lfs_head is returned by __cds_lfs_pop_all, and can be used + * to begin iteration on the stack. + */ +struct cds_lfs_head { + struct cds_lfs_node node; +}; + struct cds_lfs_stack { - struct cds_lfs_node *head; + struct cds_lfs_head *head; }; +/* + * Synchronization table: + * + * External synchronization techniques described in the API below is + * required between pairs marked with "X". No external synchronization + * required between pairs marked with "-". 
+ * + * cds_lfs_push __cds_lfs_pop __cds_lfs_pop_all + * cds_lfs_push - - - + * __cds_lfs_pop - X X + * __cds_lfs_pop_all - X - + */ + #ifdef _LGPL_SOURCE #include @@ -42,7 +68,8 @@ struct cds_lfs_stack { #define cds_lfs_node_init _cds_lfs_node_init #define cds_lfs_init _cds_lfs_init #define cds_lfs_push _cds_lfs_push -#define cds_lfs_pop _cds_lfs_pop +#define __cds_lfs_pop ___cds_lfs_pop +#define __cds_lfs_pop_all ___cds_lfs_pop_all #else /* !_LGPL_SOURCE */ @@ -61,25 +88,74 @@ extern int cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node); /* - * cds_lfs_pop: pop a node from the stack. + * __cds_lfs_pop: pop a node from the stack. * * Returns NULL if stack is empty. * - * cds_lfs_pop needs to be synchronized using one of the following + * __cds_lfs_pop needs to be synchronized using one of the following * techniques: * - * 1) Calling cds_lfs_pop under rcu read lock critical section. The + * 1) Calling __cds_lfs_pop under rcu read lock critical section. The * caller must wait for a grace period to pass before freeing the * returned node or modifying the cds_lfs_node structure. - * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop - * callers. - * 3) Ensuring that only ONE thread can call cds_lfs_pop(). - * (multi-provider/single-consumer scheme). + * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop + * and __cds_lfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and + * __cds_lfs_pop_all(). (multi-provider/single-consumer scheme). + */ +extern struct cds_lfs_node *__cds_lfs_pop(struct cds_lfs_stack *s); + +/* + * __cds_lfs_pop_all: pop all nodes from a stack. 
+ *
+ * __cds_lfs_pop_all does not require any synchronization with other
+ * push, nor with other __cds_lfs_pop_all, but requires synchronization
+ * matching the technique used to synchronize __cds_lfs_pop:
+ *
+ * 1) If __cds_lfs_pop is called under rcu read lock critical section,
+ *    both __cds_lfs_pop and __cds_lfs_pop_all callers must wait for a
+ *    grace period to pass before freeing the returned node or modifying
+ *    the cds_lfs_node structure. However, no RCU read-side critical
+ *    section is needed around __cds_lfs_pop_all.
+ * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop and
+ *    __cds_lfs_pop_all callers.
+ * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and
+ *    __cds_lfs_pop_all(). (multi-provider/single-consumer scheme).
  */
-extern struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s);
+extern struct cds_lfs_head *__cds_lfs_pop_all(struct cds_lfs_stack *s);
 
 #endif /* !_LGPL_SOURCE */
 
+/*
+ * cds_lfs_for_each: Iterate over all nodes returned by
+ *                   __cds_lfs_pop_all.
+ * @__head: node returned by __cds_lfs_pop_all (struct cds_lfs_head pointer).
+ * @__node: node to use as iterator (struct cds_lfs_node pointer).
+ *
+ * Content written into each node before push is guaranteed to be
+ * consistent, but no other memory ordering is ensured.
+ */
+#define cds_lfs_for_each(__head, __node)	\
+	for (__node = &__head->node;		\
+		__node != NULL;			\
+		__node = __node->next)
+
+/*
+ * cds_lfs_for_each_safe: Iterate over all nodes returned by
+ *                        __cds_lfs_pop_all, safe against node deletion.
+ * @__head: node returned by __cds_lfs_pop_all (struct cds_lfs_head pointer).
+ * @__node: node to use as iterator (struct cds_lfs_node pointer).
+ * @__n: struct cds_lfs_node pointer holding the next pointer (used
+ *       internally).
+ *
+ * Content written into each node before push is guaranteed to be
+ * consistent, but no other memory ordering is ensured.
+ */ +#define cds_lfs_for_each_safe(__head, __node, __n) \ + for (__node = &__head->node, __n = (__node ? __node->next : NULL); \ + __node != NULL; \ + __node = __n, __n = (__node ? __node->next : NULL)) + #ifdef __cplusplus } #endif diff --git a/urcu/static/lfstack.h b/urcu/static/lfstack.h index 7acbf54..479c8c3 100644 --- a/urcu/static/lfstack.h +++ b/urcu/static/lfstack.h @@ -33,6 +33,19 @@ extern "C" { #endif +/* + * Synchronization table: + * + * External synchronization techniques described in the API below is + * required between pairs marked with "X". No external synchronization + * required between pairs marked with "-". + * + * cds_lfs_push __cds_lfs_pop __cds_lfs_pop_all + * cds_lfs_push - - - + * __cds_lfs_pop - X X + * __cds_lfs_pop_all - X - + */ + static inline void _cds_lfs_node_init(struct cds_lfs_node *node) { @@ -77,21 +90,23 @@ static inline int _cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node) { - struct cds_lfs_node *head = NULL; + struct cds_lfs_head *head = NULL; + struct cds_lfs_head *new_head = + caa_container_of(node, struct cds_lfs_head, node); for (;;) { - struct cds_lfs_node *old_head = head; + struct cds_lfs_head *old_head = head; /* * node->next is still private at this point, no need to * perform a _CMM_STORE_SHARED(). */ - node->next = head; + node->next = &head->node; /* * uatomic_cmpxchg() implicit memory barrier orders earlier * stores to node before publication. */ - head = uatomic_cmpxchg(&s->head, old_head, node); + head = uatomic_cmpxchg(&s->head, old_head, new_head); if (old_head == head) break; } @@ -99,30 +114,31 @@ int _cds_lfs_push(struct cds_lfs_stack *s, } /* - * cds_lfs_pop: pop a node from the stack. + * __cds_lfs_pop: pop a node from the stack. * * Returns NULL if stack is empty. * - * cds_lfs_pop needs to be synchronized using one of the following + * __cds_lfs_pop needs to be synchronized using one of the following * techniques: * - * 1) Calling cds_lfs_pop under rcu read lock critical section. 
The + * 1) Calling __cds_lfs_pop under rcu read lock critical section. The * caller must wait for a grace period to pass before freeing the * returned node or modifying the cds_lfs_node structure. - * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop - * callers. - * 3) Ensuring that only ONE thread can call cds_lfs_pop(). - * (multi-provider/single-consumer scheme). + * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop + * and __cds_lfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and + * __cds_lfs_pop_all(). (multi-provider/single-consumer scheme). */ static inline -struct cds_lfs_node *_cds_lfs_pop(struct cds_lfs_stack *s) +struct cds_lfs_node *___cds_lfs_pop(struct cds_lfs_stack *s) { for (;;) { - struct cds_lfs_node *head; + struct cds_lfs_head *head; head = _CMM_LOAD_SHARED(s->head); if (head) { struct cds_lfs_node *next; + struct cds_lfs_head *next_head; /* * Read head before head->next. Matches the @@ -130,9 +146,12 @@ struct cds_lfs_node *_cds_lfs_pop(struct cds_lfs_stack *s) * uatomic_cmpxchg() in cds_lfs_push. */ cmm_smp_read_barrier_depends(); - next = _CMM_LOAD_SHARED(head->next); - if (uatomic_cmpxchg(&s->head, head, next) == head) { - return head; + next = _CMM_LOAD_SHARED(head->node.next); + next_head = caa_container_of(next, + struct cds_lfs_head, node); + if (uatomic_cmpxchg(&s->head, head, next_head) + == head) { + return &head->node; } else { /* Concurrent modification. Retry. */ continue; @@ -144,6 +163,39 @@ struct cds_lfs_node *_cds_lfs_pop(struct cds_lfs_stack *s) } } +/* + * __cds_lfs_pop_all: pop all nodes from a stack. 
+ *
+ * __cds_lfs_pop_all does not require any synchronization with other
+ * push, nor with other __cds_lfs_pop_all, but requires synchronization
+ * matching the technique used to synchronize __cds_lfs_pop:
+ *
+ * 1) If __cds_lfs_pop is called under rcu read lock critical section,
+ *    both __cds_lfs_pop and __cds_lfs_pop_all callers must wait for a
+ *    grace period to pass before freeing the returned node or modifying
+ *    the cds_lfs_node structure. However, no RCU read-side critical
+ *    section is needed around __cds_lfs_pop_all.
+ * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop and
+ *    __cds_lfs_pop_all callers.
+ * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and
+ *    __cds_lfs_pop_all(). (multi-provider/single-consumer scheme).
+ */
+static inline
+struct cds_lfs_head *___cds_lfs_pop_all(struct cds_lfs_stack *s)
+{
+	/*
+	 * Implicit memory barrier after uatomic_xchg() matches implicit
+	 * memory barrier before uatomic_cmpxchg() in cds_lfs_push. It
+	 * ensures that all nodes of the returned list are consistent.
+	 * There is no need to issue memory barriers when iterating on
+	 * the returned list, because the full memory barriers issued
+	 * prior to each uatomic_cmpxchg, each of which writes to head,
+	 * take care of ordering writes to each node prior to the full
+	 * memory barrier after this uatomic_xchg().
+	 */
+	return uatomic_xchg(&s->head, NULL);
+}
+
 #ifdef __cplusplus
 }
 #endif
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com From mathieu.desnoyers at efficios.com Fri Oct 12 10:20:29 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Fri, 12 Oct 2012 10:20:29 -0400 Subject: [lttng-dev] sched_process_exec In-Reply-To: <5078093A.6080500@mentor.com> References: <50769060.7070309@mentor.com> <20121011145855.GA11726@Krystal> <5076E59D.3010003@mentor.com> <20121011160729.GA29181@Krystal> <5078093A.6080500@mentor.com> Message-ID: <20121012142029.GA16951@Krystal> * Woegerer, Paul (Paul_Woegerer at mentor.com) wrote: > On 10/11/2012 06:07 PM, Mathieu Desnoyers wrote: > > * Woegerer, Paul (Paul_Woegerer at mentor.com) wrote: > >> On 10/11/2012 04:58 PM, Mathieu Desnoyers wrote: > >>> A couple of details to fix before I can merge this patch though: > >>> > >>> Please also update instrumentation/events/mainline/sched.h to add the > >>> original mainline TRACE_EVENT, so we can keep the files in sync. > >> > >> Ok, reconfigured Thunderbird, removed semicolon, updated mainline: > > > > Still an issue: > > > > compudj at thinkos:~/work/lttng-modules$ patch -p1 < ~/.swp > > patching file instrumentation/events/lttng-module/sched.h > > patch: **** malformed patch at line 45: SCHED_OTHER/BATCH/IDLE > > Now it should (hopefully) work: > > From b792323bca694f1e88144319befcdbfb30efa878 Mon Sep 17 00:00:00 2001 > From: Paul Woegerer > Date: Fri, 12 Oct 2012 12:52:19 +0200 > Subject: [PATCH] Add TRACE_EVENT(sched_process_exec) to sched.h. Merged, thanks!
Mathieu > > --- > instrumentation/events/lttng-module/sched.h | 26 ++++++++++++++++++++++++++ > instrumentation/events/mainline/sched.h | 26 ++++++++++++++++++++++++++ > 2 files changed, 52 insertions(+) > > diff --git a/instrumentation/events/lttng-module/sched.h b/instrumentation/events/lttng-module/sched.h > index b68616e..23e4955 100644 > --- a/instrumentation/events/lttng-module/sched.h > +++ b/instrumentation/events/lttng-module/sched.h > @@ -314,6 +314,32 @@ TRACE_EVENT(sched_process_fork, > ) > > /* > + * Tracepoint for exec: > + */ > +TRACE_EVENT(sched_process_exec, > + > + TP_PROTO(struct task_struct *p, pid_t old_pid, > + struct linux_binprm *bprm), > + > + TP_ARGS(p, old_pid, bprm), > + > + TP_STRUCT__entry( > + __string( filename, bprm->filename ) > + __field( pid_t, pid ) > + __field( pid_t, old_pid ) > + ), > + > + TP_fast_assign( > + tp_strcpy(filename, bprm->filename) > + tp_assign(pid, p->pid) > + tp_assign(old_pid, old_pid) > + ), > + > + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), > + __entry->pid, __entry->old_pid) > +) > + > +/* > * XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE > * adding sched_stat support to SCHED_FIFO/RR would be welcome. 
> */ > diff --git a/instrumentation/events/mainline/sched.h b/instrumentation/events/mainline/sched.h > index f633478..6700ecc 100644 > --- a/instrumentation/events/mainline/sched.h > +++ b/instrumentation/events/mainline/sched.h > @@ -275,6 +275,32 @@ TRACE_EVENT(sched_process_fork, > ); > > /* > + * Tracepoint for exec: > + */ > +TRACE_EVENT(sched_process_exec, > + > + TP_PROTO(struct task_struct *p, pid_t old_pid, > + struct linux_binprm *bprm), > + > + TP_ARGS(p, old_pid, bprm), > + > + TP_STRUCT__entry( > + __string( filename, bprm->filename ) > + __field( pid_t, pid ) > + __field( pid_t, old_pid ) > + ), > + > + TP_fast_assign( > + __assign_str(filename, bprm->filename); > + __entry->pid = p->pid; > + __entry->old_pid = old_pid; > + ), > + > + TP_printk("filename=%s pid=%d old_pid=%d", __get_str(filename), > + __entry->pid, __entry->old_pid) > +); > + > +/* > * XXX the below sched_stat tracepoints only apply to SCHED_OTHER/BATCH/IDLE > * adding sched_stat support to SCHED_FIFO/RR would be welcome. > */ > -- > 1.7.10.4 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Fri Oct 12 10:30:32 2012 From: dgoulet at efficios.com (David Goulet) Date: Fri, 12 Oct 2012 10:30:32 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/4] Rename consumer threads and spawn them in daemon Message-ID: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> The metadata thread is now created in the lttng-consumerd daemon so that all threads can be controlled from inside the daemon. This is the first step of a consumer thread refactoring which aims at moving data and metadata stream operations into dedicated threads, so that the session daemon thread does not block and is more efficient at adding streams. The most important concept is that a stream file descriptor MUST be opened as quickly as we can, then passed to the right thread (this concerns UST; for the kernel, they are already opened by the session daemon).
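A minimal sketch of the thread layout this patch moves to, where the daemon's main() creates every consumer thread itself and joins them all, instead of one thread spawning another. The names worker_arg, worker and spawn_and_join are hypothetical and only stand in for the real consumer context and poll threads; the bodies are placeholders, not the lttng-tools implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NR_THREADS 3

/* Hypothetical stand-in for the consumer context passed to each thread. */
struct worker_arg {
	const char *name;
	int ran;	/* written by the worker, read by the caller after join */
};

/* Placeholder body; the real threads poll file descriptors until told to quit. */
static void *worker(void *data)
{
	struct worker_arg *arg = data;

	arg->ran = 1;
	return NULL;
}

/*
 * Create every worker from one place, then join them all.
 * Returns the number of threads joined, or -1 if a thread could not be created.
 */
int spawn_and_join(void)
{
	pthread_t threads[NR_THREADS];
	struct worker_arg args[NR_THREADS] = {
		{ "metadata_poll", 0 },
		{ "data_poll", 0 },
		{ "sessiond_poll", 0 },
	};
	int i, ret, created = 0, joined = 0;

	for (i = 0; i < NR_THREADS; i++) {
		ret = pthread_create(&threads[i], NULL, worker, &args[i]);
		if (ret != 0) {
			/* pthread_create() returns the error number; errno is not set. */
			fprintf(stderr, "pthread_create(%s): %s\n",
				args[i].name, strerror(ret));
			break;
		}
		created++;
	}

	/* Join only the threads that were actually created. */
	for (i = 0; i < created; i++) {
		ret = pthread_join(threads[i], NULL);
		if (ret != 0) {
			fprintf(stderr, "pthread_join(%s): %s\n",
				args[i].name, strerror(ret));
			continue;
		}
		/* pthread_join() orders the worker's write before this read. */
		assert(args[i].ran);
		joined++;
	}

	return created == NR_THREADS ? joined : -1;
}
```

Calling spawn_and_join() from main() gives one place that owns the lifetime of all three threads. Note that pthread_create() and pthread_join() report failure through their return value, so the sketch prints strerror(ret) rather than relying on errno.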
Signed-off-by: David Goulet --- src/bin/lttng-consumerd/lttng-consumerd.c | 18 ++++++++++----- src/common/consumer.c | 34 +++++++++-------------------- src/common/consumer.h | 5 +++-- 3 files changed, 26 insertions(+), 31 deletions(-) diff --git a/src/bin/lttng-consumerd/lttng-consumerd.c b/src/bin/lttng-consumerd/lttng-consumerd.c index 5952334..946fb02 100644 --- a/src/bin/lttng-consumerd/lttng-consumerd.c +++ b/src/bin/lttng-consumerd/lttng-consumerd.c @@ -356,23 +356,31 @@ int main(int argc, char **argv) } lttng_consumer_set_error_sock(ctx, ret); - /* Create the thread to manage the receive of fd */ - ret = pthread_create(&threads[0], NULL, lttng_consumer_thread_receive_fds, + /* Create thread to manage the polling/writing of trace metadata */ + ret = pthread_create(&threads[0], NULL, consumer_thread_metadata_poll, + (void *) ctx); + if (ret != 0) { + perror("pthread_create"); + goto error; + } + + /* Create thread to manage the polling/writing of trace data */ + ret = pthread_create(&threads[1], NULL, consumer_thread_data_poll, (void *) ctx); if (ret != 0) { perror("pthread_create"); goto error; } - /* Create thread to manage the polling/writing of traces */ - ret = pthread_create(&threads[1], NULL, lttng_consumer_thread_poll_fds, + /* Create the thread to manage the receive of fd */ + ret = pthread_create(&threads[2], NULL, consumer_thread_sessiond_poll, (void *) ctx); if (ret != 0) { perror("pthread_create"); goto error; } - for (i = 0; i < 2; i++) { + for (i = 0; i < 3; i++) { ret = pthread_join(threads[i], &status); if (ret != 0) { perror("pthread_join"); diff --git a/src/common/consumer.c b/src/common/consumer.c index 242b05b..055de1b 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -1131,6 +1131,8 @@ void lttng_consumer_destroy(struct lttng_consumer_local_data *ctx) PERROR("close"); } utils_close_pipe(ctx->consumer_splice_metadata_pipe); + /* This should trigger the metadata thread to exit */ + close(ctx->consumer_metadata_pipe[1]); 
unlink(ctx->consumer_command_sock_path); free(ctx); @@ -1756,7 +1758,7 @@ error: * Thread polls on metadata file descriptor and write them on disk or on the * network. */ -void *lttng_consumer_thread_poll_metadata(void *data) +void *consumer_thread_metadata_poll(void *data) { int ret, i, pollfd; uint32_t revents, nb_fd; @@ -1939,7 +1941,7 @@ end: * This thread polls the fds in the set to consume the data and write * it to tracefile if necessary. */ -void *lttng_consumer_thread_poll_fds(void *data) +void *consumer_thread_data_poll(void *data) { int num_rdy, num_hup, high_prio, ret, i; struct pollfd *pollfd = NULL; @@ -1949,19 +1951,9 @@ void *lttng_consumer_thread_poll_fds(void *data) int nb_fd = 0; struct lttng_consumer_local_data *ctx = data; ssize_t len; - pthread_t metadata_thread; - void *status; rcu_register_thread(); - /* Start metadata polling thread */ - ret = pthread_create(&metadata_thread, NULL, - lttng_consumer_thread_poll_metadata, (void *) ctx); - if (ret < 0) { - PERROR("pthread_create metadata thread"); - goto end; - } - local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); while (1) { @@ -2145,19 +2137,13 @@ end: /* * Close the write side of the pipe so epoll_wait() in - * lttng_consumer_thread_poll_metadata can catch it. The thread is - * monitoring the read side of the pipe. If we close them both, epoll_wait - * strangely does not return and could create a endless wait period if the - * pipe is the only tracked fd in the poll set. The thread will take care - * of closing the read side. + * consumer_thread_metadata_poll can catch it. The thread is monitoring the + * read side of the pipe. If we close them both, epoll_wait strangely does + * not return and could create a endless wait period if the pipe is the + * only tracked fd in the poll set. The thread will take care of closing + * the read side. 
*/ close(ctx->consumer_metadata_pipe[1]); - if (ret) { - ret = pthread_join(metadata_thread, &status); - if (ret < 0) { - PERROR("pthread_join metadata thread"); - } - } rcu_unregister_thread(); return NULL; @@ -2167,7 +2153,7 @@ end: * This thread listens on the consumerd socket and receives the file * descriptors from the session daemon. */ -void *lttng_consumer_thread_receive_fds(void *data) +void *consumer_thread_sessiond_poll(void *data) { int sock, client_socket, ret; /* diff --git a/src/common/consumer.h b/src/common/consumer.h index d0cd8fd..4b225e4 100644 --- a/src/common/consumer.h +++ b/src/common/consumer.h @@ -385,8 +385,9 @@ extern int lttng_consumer_get_produced_snapshot( struct lttng_consumer_local_data *ctx, struct lttng_consumer_stream *stream, unsigned long *pos); -extern void *lttng_consumer_thread_poll_fds(void *data); -extern void *lttng_consumer_thread_receive_fds(void *data); +extern void *consumer_thread_metadata_poll(void *data); +extern void *consumer_thread_data_poll(void *data); +extern void *consumer_thread_sessiond_poll(void *data); extern int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, int sock, struct pollfd *consumer_sockpoll); -- 1.7.10.4 From dgoulet at efficios.com Fri Oct 12 10:30:33 2012 From: dgoulet at efficios.com (David Goulet) Date: Fri, 12 Oct 2012 10:30:33 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> Message-ID: <1350052235-12198-2-git-send-email-dgoulet@efficios.com> As a second step of refactoring, upon receiving a data stream, we send it to the data thread, which is now in charge of handling it. Furthermore, in order for this to behave correctly, we have to perform the ustctl actions on the stream before passing it to the right thread (the kernel does not need special actions).
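The hand-off described above (the sessiond thread sending a received data stream to the data thread) is done in this patch by writing the stream pointer itself into the poll pipe, with a NULL pointer serving as a plain wake-up. A minimal single-threaded sketch of that idea follows; struct stream, send_stream, recv_stream and pipe_roundtrip are hypothetical names used for illustration, not the lttng-tools API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for struct lttng_consumer_stream. */
struct stream {
	int key;
};

/* Write the pointer value into the pipe, retrying if interrupted by a signal. */
static int send_stream(int wfd, struct stream *s)
{
	ssize_t ret;

	do {
		ret = write(wfd, &s, sizeof(s));
	} while (ret < 0 && errno == EINTR);

	return ret == (ssize_t) sizeof(s) ? 0 : -1;
}

/* Read one pointer value back; *s may legitimately come back NULL. */
static int recv_stream(int rfd, struct stream **s)
{
	ssize_t ret;

	do {
		ret = read(rfd, s, sizeof(*s));
	} while (ret < 0 && errno == EINTR);

	return ret == (ssize_t) sizeof(*s) ? 0 : -1;
}

/*
 * Send one real stream followed by a NULL wake-up, then receive both.
 * Returns the key of the received stream, or -1 on any failure.
 */
int pipe_roundtrip(void)
{
	int fds[2];
	struct stream st = { .key = 42 };
	struct stream *got = NULL;
	int key = -1;

	if (pipe(fds) < 0) {
		return -1;
	}

	if (send_stream(fds[1], &st) == 0 && send_stream(fds[1], NULL) == 0) {
		/* First message carries the stream to add. */
		if (recv_stream(fds[0], &got) == 0 && got != NULL) {
			key = got->key;
		}
		/* Second message is the NULL wake-up and must be ignored. */
		if (recv_stream(fds[0], &got) != 0 || got != NULL) {
			key = -1;
		}
	}

	close(fds[0]);
	close(fds[1]);
	return key;
}
```

Two properties make this work: a pointer is only meaningful inside the process that wrote it, which is the case here since both ends are threads of the same daemon, and a write no larger than PIPE_BUF is atomic, so the reader never sees half a pointer.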
This way, once the sessiond thread replies back to the session daemon, the stream is sure to be open and ready for data to be recorded on the application side, so we avoid a race between the application thinking the stream is ready and the stream thread still being scheduled out. This commit should speed up the add stream process for the session daemon. There are still some actions to move out of the session daemon poll thread to gain speed significantly, especially for network streaming. Signed-off-by: David Goulet --- src/common/consumer.c | 123 +++++++++++--------------- src/common/consumer.h | 1 + src/common/kernel-consumer/kernel-consumer.c | 24 ++--- src/common/ust-consumer/ust-consumer.c | 40 ++++----- 4 files changed, 78 insertions(+), 110 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 055de1b..1d2b1f7 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, return stream; } -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) +void consumer_steal_stream_key(int key, struct lttng_ht *ht) { struct lttng_consumer_stream *stream; @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); lttng_ht_node_init_ulong(&stream->node, stream->key); + /* + * The cpu number is needed before using any ustctl_* actions. Ignored for + * the kernel so the value does not matter.
+ */ + pthread_mutex_lock(&consumer_data.lock); + stream->cpu = stream->chan->cpucount++; + pthread_mutex_unlock(&consumer_data.lock); + DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, stream->shm_fd, stream->wait_fd, @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) pthread_mutex_lock(&consumer_data.lock); rcu_read_lock(); - switch (consumer_data.type) { - case LTTNG_CONSUMER_KERNEL: - break; - case LTTNG_CONSUMER32_UST: - case LTTNG_CONSUMER64_UST: - stream->cpu = stream->chan->cpucount++; - ret = lttng_ustconsumer_add_stream(stream); - if (ret) { - ret = -EINVAL; - goto error; - } - - /* Steal stream identifier only for UST */ - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); - break; - default: - ERR("Unknown consumer_data type"); - assert(0); - ret = -ENOSYS; - goto error; - } - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); /* Check and cleanup relayd */ @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) consumer_data.stream_count++; consumer_data.need_update = 1; -error: rcu_read_unlock(); pthread_mutex_unlock(&consumer_data.lock); @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, DBG3("Consumer delete metadata stream %d", stream->wait_fd); - if (ht == NULL) { - /* Means the stream was allocated but not successfully added */ - goto free_stream; - } - - rcu_read_lock(); - iter.iter.node = &stream->waitfd_node.node; - ret = lttng_ht_del(ht, &iter); - assert(!ret); - rcu_read_unlock(); - pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -1613,6 +1587,18 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, goto end; } + if (ht == NULL) { + pthread_mutex_unlock(&consumer_data.lock); + /* Means the stream was allocated but not successfully added */ + goto 
free_stream; + } + + rcu_read_lock(); + iter.iter.node = &stream->waitfd_node.node; + ret = lttng_ht_del(ht, &iter); + assert(!ret); + rcu_read_unlock(); + if (stream->out_fd >= 0) { ret = close(stream->out_fd); if (ret) { @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, pthread_mutex_lock(&consumer_data.lock); - switch (consumer_data.type) { - case LTTNG_CONSUMER_KERNEL: - break; - case LTTNG_CONSUMER32_UST: - case LTTNG_CONSUMER64_UST: - ret = lttng_ustconsumer_add_stream(stream); - if (ret) { - ret = -EINVAL; - goto error; - } - - /* Steal stream identifier only for UST */ - consumer_steal_stream_key(stream->wait_fd, ht); - break; - default: - ERR("Unknown consumer_data type"); - assert(0); - ret = -ENOSYS; - goto error; - } - /* * From here, refcounts are updated so be _careful_ when returning an error * after this point. @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); rcu_read_unlock(); -error: pthread_mutex_unlock(&consumer_data.lock); return ret; } @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) int num_rdy, num_hup, high_prio, ret, i; struct pollfd *pollfd = NULL; /* local view of the streams */ - struct lttng_consumer_stream **local_stream = NULL; + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; /* local view of consumer_data.fds_count */ int nb_fd = 0; struct lttng_consumer_local_data *ctx = data; @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) */ if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { size_t pipe_readlen; - char tmp; DBG("consumer_poll_pipe wake up"); /* Consume 1 byte of pipe data */ do { - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, + sizeof(new_stream)); } while (pipe_readlen == -1 && errno == EINTR); + + /* + * If the stream is NULL, just ignore 
it. It's also possible that + * the sessiond poll thread changed the consumer_quit state and is + * waking us up to test it. + */ + if (new_stream == NULL) { + continue; + } + + ret = consumer_add_stream(new_stream); + if (ret) { + ERR("Consumer add stream %d failed. Continuing", + new_stream->key); + /* + * At this point, if the add_stream fails, it is not in the + * hash table thus passing the NULL value here. + */ + consumer_del_stream(new_stream, NULL); + } + + /* Continue to update the local streams and handle prio ones */ continue; } @@ -2260,19 +2246,16 @@ end: consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; /* - * Wake-up the other end by writing a null byte in the pipe - * (non-blocking). Important note: Because writing into the - * pipe is non-blocking (and therefore we allow dropping wakeup - * data, as long as there is wakeup data present in the pipe - * buffer to wake up the other end), the other end should - * perform the following sequence for waiting: - * 1) empty the pipe (reads). - * 2) perform update operation. - * 3) wait on the pipe (poll). + * Notify the data poll thread to poll back again and test the + * consumer_quit state to quit gracefully. 
*/ do { - ret = write(ctx->consumer_poll_pipe[1], "", 1); + struct lttng_consumer_stream *null_stream = NULL; + + ret = write(ctx->consumer_poll_pipe[1], &null_stream, + sizeof(null_stream)); } while (ret < 0 && errno == EINTR); + rcu_unregister_thread(); return NULL; } diff --git a/src/common/consumer.h b/src/common/consumer.h index 4b225e4..8e5891a 100644 --- a/src/common/consumer.h +++ b/src/common/consumer.h @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( struct consumer_relayd_sock_pair *consumer_find_relayd(int key); int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, size_t data_size); +void consumer_steal_stream_key(int key, struct lttng_ht *ht); extern struct lttng_consumer_local_data *lttng_consumer_create( enum lttng_consumer_type type, diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c index 13cbe21..444f5e0 100644 --- a/src/common/kernel-consumer/kernel-consumer.c +++ b/src/common/kernel-consumer/kernel-consumer.c @@ -235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, consumer_del_stream(new_stream, NULL); } } else { - ret = consumer_add_stream(new_stream); - if (ret) { - ERR("Consumer add stream %d failed. Continuing", - new_stream->key); + do { + ret = write(ctx->consumer_poll_pipe[1], &new_stream, + sizeof(new_stream)); + } while (ret < 0 && errno == EINTR); + if (ret < 0) { + PERROR("write data pipe"); consumer_del_stream(new_stream, NULL); goto end_nosignal; } @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, goto end_nosignal; } - /* - * Wake-up the other end by writing a null byte in the pipe (non-blocking). 
- * Important note: Because writing into the pipe is non-blocking (and - * therefore we allow dropping wakeup data, as long as there is wakeup data - * present in the pipe buffer to wake up the other end), the other end - * should perform the following sequence for waiting: - * - * 1) empty the pipe (reads). - * 2) perform update operation. - * 3) wait on the pipe (poll). - */ - do { - ret = write(ctx->consumer_poll_pipe[1], "", 1); - } while (ret < 0 && errno == EINTR); end_nosignal: rcu_read_unlock(); diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index 1170687..4ca4b84 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, goto end_nosignal; } + /* + * This needs to be done as soon as we can so we don't block the + * application too long. + */ + ret = lttng_ustconsumer_add_stream(new_stream); + if (ret) { + consumer_del_stream(new_stream, NULL); + goto end_nosignal; + } + /* Steal stream identifier to avoid having streams with the same key */ + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); + /* The stream is not metadata. Get relayd reference if exists. */ relayd = consumer_find_relayd(msg.u.stream.net_index); if (relayd != NULL) { @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, goto end_nosignal; } } else { - ret = consumer_add_stream(new_stream); - if (ret) { - ERR("Consumer add stream %d failed. Continuing", - new_stream->key); - /* - * At this point, if the add_stream fails, it is not in the - * hash table thus passing the NULL value here. 
- */ + do { + ret = write(ctx->consumer_poll_pipe[1], &new_stream, + sizeof(new_stream)); + } while (ret < 0 && errno == EINTR); + if (ret < 0) { + PERROR("write data pipe"); consumer_del_stream(new_stream, NULL); goto end_nosignal; } @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, break; } - /* - * Wake-up the other end by writing a null byte in the pipe (non-blocking). - * Important note: Because writing into the pipe is non-blocking (and - * therefore we allow dropping wakeup data, as long as there is wakeup data - * present in the pipe buffer to wake up the other end), the other end - * should perform the following sequence for waiting: - * - * 1) empty the pipe (reads). - * 2) perform update operation. - * 3) wait on the pipe (poll). - */ - do { - ret = write(ctx->consumer_poll_pipe[1], "", 1); - } while (ret < 0 && errno == EINTR); end_nosignal: rcu_read_unlock(); -- 1.7.10.4 From dgoulet at efficios.com Fri Oct 12 10:30:34 2012 From: dgoulet at efficios.com (David Goulet) Date: Fri, 12 Oct 2012 10:30:34 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/4] Make stream hash tables global to the consumer In-Reply-To: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> Message-ID: <1350052235-12198-3-git-send-email-dgoulet@efficios.com> The data stream hash table is now global to the consumer and used in the data thread. The consumer_data stream_ht is no longer used to track the data streams but instead will be used (and possibly renamed) by the session daemon poll thread to keep track of streams on a per session id basis for the upcoming feature that checks traced data availability. For now, in order to avoid mind-boggling problems when accessing the streams, both hash tables (metadata and data) are now defined globally. However, stream updates are still done in a single thread. Don't count on this being guaranteed in the next commits.
Signed-off-by: David Goulet --- src/common/consumer.c | 91 +++++++++++++++++++++++++------- src/common/consumer.h | 9 ++-- src/common/ust-consumer/ust-consumer.c | 2 - 3 files changed, 75 insertions(+), 27 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 1d2b1f7..1fb9960 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -59,6 +59,17 @@ int consumer_poll_timeout = -1; volatile int consumer_quit = 0; /* + * The following two hash tables are visible by all threads which are separated + * in different source files. + * + * Global hash table containing respectively metadata and data streams. The + * stream element in this ht should only be updated by the metadata poll thread + * for the metadata and the data poll thread for the data. + */ +struct lttng_ht *metadata_ht = NULL; +struct lttng_ht *data_ht = NULL; + +/* * Find a stream. The consumer_data.lock must be locked during this * call. */ @@ -433,19 +444,24 @@ end: /* * Add a stream to the global list protected by a mutex. */ -int consumer_add_stream(struct lttng_consumer_stream *stream) +static int consumer_add_stream(struct lttng_consumer_stream *stream, + struct lttng_ht *ht) { int ret = 0; struct consumer_relayd_sock_pair *relayd; assert(stream); + assert(ht); DBG3("Adding consumer stream %d", stream->key); pthread_mutex_lock(&consumer_data.lock); rcu_read_lock(); - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); + /* Steal stream identifier to avoid having streams with the same key */ + consumer_steal_stream_key(stream->key, ht); + + lttng_ht_add_unique_ulong(ht, &stream->node); /* Check and cleanup relayd */ relayd = consumer_find_relayd(stream->net_seq_idx); @@ -783,9 +799,9 @@ end: * * Returns the number of fds in the structures. 
*/ -int consumer_update_poll_array( +static int consumer_update_poll_array( struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, - struct lttng_consumer_stream **local_stream) + struct lttng_consumer_stream **local_stream, struct lttng_ht *ht) { int i = 0; struct lttng_ht_iter iter; @@ -793,8 +809,7 @@ int consumer_update_poll_array( DBG("Updating poll fd array"); rcu_read_lock(); - cds_lfht_for_each_entry(consumer_data.stream_ht->ht, &iter.iter, stream, - node.node) { + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM) { continue; } @@ -1523,6 +1538,33 @@ int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, /* * Iterate over all streams of the hashtable and free them properly. * + * WARNING: *MUST* be used with data stream only. + */ +static void destroy_data_stream_ht(struct lttng_ht *ht) +{ + int ret; + struct lttng_ht_iter iter; + struct lttng_consumer_stream *stream; + + if (ht == NULL) { + return; + } + + rcu_read_lock(); + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { + ret = lttng_ht_del(ht, &iter); + assert(!ret); + + call_rcu(&stream->node.head, consumer_free_stream); + } + rcu_read_unlock(); + + lttng_ht_destroy(ht); +} + +/* + * Iterate over all streams of the hashtable and free them properly. + * * XXX: Should not be only for metadata stream or else use an other name. 
*/ static void destroy_stream_ht(struct lttng_ht *ht) @@ -1711,6 +1753,9 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, uatomic_dec(&stream->chan->nb_init_streams); } + /* Steal stream identifier to avoid having streams with the same key */ + consumer_steal_stream_key(stream->key, ht); + lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); rcu_read_unlock(); @@ -1729,7 +1774,6 @@ void *consumer_thread_metadata_poll(void *data) struct lttng_consumer_stream *stream = NULL; struct lttng_ht_iter iter; struct lttng_ht_node_ulong *node; - struct lttng_ht *metadata_ht = NULL; struct lttng_poll_event events; struct lttng_consumer_local_data *ctx = data; ssize_t len; @@ -1738,11 +1782,6 @@ void *consumer_thread_metadata_poll(void *data) DBG("Thread metadata poll started"); - metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); - if (metadata_ht == NULL) { - goto end; - } - /* Size is set to 1 for the consumer_metadata pipe */ ret = lttng_poll_create(&events, 2, LTTNG_CLOEXEC); if (ret < 0) { @@ -1918,6 +1957,11 @@ void *consumer_thread_data_poll(void *data) rcu_register_thread(); + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); + if (data_ht == NULL) { + goto end; + } + local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); while (1) { @@ -1955,7 +1999,8 @@ void *consumer_thread_data_poll(void *data) pthread_mutex_unlock(&consumer_data.lock); goto end; } - ret = consumer_update_poll_array(ctx, &pollfd, local_stream); + ret = consumer_update_poll_array(ctx, &pollfd, local_stream, + data_ht); if (ret < 0) { ERR("Error in allocating pollfd or local_outfds"); lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_POLL_ERROR); @@ -2015,7 +2060,7 @@ void *consumer_thread_data_poll(void *data) continue; } - ret = consumer_add_stream(new_stream); + ret = consumer_add_stream(new_stream, data_ht); if (ret) { ERR("Consumer add stream %d failed. 
Continuing", new_stream->key); @@ -2088,22 +2133,19 @@ void *consumer_thread_data_poll(void *data) if ((pollfd[i].revents & POLLHUP)) { DBG("Polling fd %d tells it has hung up.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i], - consumer_data.stream_ht); + consumer_del_stream(local_stream[i], data_ht); num_hup++; } } else if (pollfd[i].revents & POLLERR) { ERR("Error returned in polling fd %d.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i], - consumer_data.stream_ht); + consumer_del_stream(local_stream[i], data_ht); num_hup++; } } else if (pollfd[i].revents & POLLNVAL) { ERR("Polling fd %d tells fd is not open.", pollfd[i].fd); if (!local_stream[i]->data_read) { - consumer_del_stream(local_stream[i], - consumer_data.stream_ht); + consumer_del_stream(local_stream[i], data_ht); num_hup++; } } @@ -2131,6 +2173,10 @@ end: */ close(ctx->consumer_metadata_pipe[1]); + if (data_ht) { + destroy_data_stream_ht(data_ht); + } + rcu_unregister_thread(); return NULL; } @@ -2299,6 +2345,11 @@ void lttng_consumer_init(void) consumer_data.stream_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); consumer_data.channel_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); consumer_data.relayd_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); + + metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); + assert(metadata_ht); + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); + assert(data_ht); } /* diff --git a/src/common/consumer.h b/src/common/consumer.h index 8e5891a..6bce96d 100644 --- a/src/common/consumer.h +++ b/src/common/consumer.h @@ -275,6 +275,10 @@ struct lttng_consumer_global_data { struct lttng_ht *relayd_ht; }; +/* Defined in consumer.c and coupled with explanations */ +extern struct lttng_ht *metadata_ht; +extern struct lttng_ht *data_ht; + /* * Init consumer data structures. 
*/ @@ -324,10 +328,6 @@ extern void lttng_consumer_sync_trace_file( */ extern int lttng_consumer_poll_socket(struct pollfd *kconsumer_sockpoll); -extern int consumer_update_poll_array( - struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, - struct lttng_consumer_stream **local_consumer_streams); - extern struct lttng_consumer_stream *consumer_allocate_stream( int channel_key, int stream_key, int shm_fd, int wait_fd, @@ -340,7 +340,6 @@ extern struct lttng_consumer_stream *consumer_allocate_stream( int net_index, int metadata_flag, int *alloc_ret); -extern int consumer_add_stream(struct lttng_consumer_stream *stream); extern void consumer_del_stream(struct lttng_consumer_stream *stream, struct lttng_ht *ht); extern void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index 4ca4b84..3b41e55 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -233,8 +233,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, consumer_del_stream(new_stream, NULL); goto end_nosignal; } - /* Steal stream identifier to avoid having streams with the same key */ - consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); /* The stream is not metadata. Get relayd reference if exists. 
*/ relayd = consumer_find_relayd(msg.u.stream.net_index); -- 1.7.10.4 From dgoulet at efficios.com Fri Oct 12 10:30:35 2012 From: dgoulet at efficios.com (David Goulet) Date: Fri, 12 Oct 2012 10:30:35 -0400 Subject: [lttng-dev] [PATCH lttng-tools 4/4] Change the metadata hash table node In-Reply-To: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> Message-ID: <1350052235-12198-4-git-send-email-dgoulet@efficios.com> Remove the use of the "waitfd_node" for metadata and, for metadata streams only, index the "node" by wait fd during stream allocation. This was done so the waitfd node can be used later on for the hash table that indexes streams by session id for the traced data check command (soon to be implemented). Signed-off-by: David Goulet --- src/common/consumer.c | 36 +++++++++++++++++------------------- 1 file changed, 17 insertions(+), 19 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 1fb9960..0c1a812 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -172,17 +172,6 @@ void consumer_free_stream(struct rcu_head *head) free(stream); } -static -void consumer_free_metadata_stream(struct rcu_head *head) -{ - struct lttng_ht_node_ulong *node = - caa_container_of(head, struct lttng_ht_node_ulong, head); - struct lttng_consumer_stream *stream = - caa_container_of(node, struct lttng_consumer_stream, waitfd_node); - - free(stream); -} - /* * RCU protected relayd socket pair free.
*/ @@ -417,8 +406,17 @@ struct lttng_consumer_stream *consumer_allocate_stream( stream->metadata_flag = metadata_flag; strncpy(stream->path_name, path_name, sizeof(stream->path_name)); stream->path_name[sizeof(stream->path_name) - 1] = '\0'; - lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); - lttng_ht_node_init_ulong(&stream->node, stream->key); + + /* + * Index differently the metadata node because the thread is using an + * internal hash table to match streams in the metadata_ht to the epoll set + * file descriptor. + */ + if (metadata_flag) { + lttng_ht_node_init_ulong(&stream->node, stream->wait_fd); + } else { + lttng_ht_node_init_ulong(&stream->node, stream->key); + } /* * The cpu number is needed before using any ustctl_* actions. Ignored for @@ -1578,11 +1576,11 @@ static void destroy_stream_ht(struct lttng_ht *ht) } rcu_read_lock(); - cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, waitfd_node.node) { + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { ret = lttng_ht_del(ht, &iter); assert(!ret); - call_rcu(&stream->waitfd_node.head, consumer_free_metadata_stream); + call_rcu(&stream->node.head, consumer_free_stream); } rcu_read_unlock(); @@ -1636,7 +1634,7 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, } rcu_read_lock(); - iter.iter.node = &stream->waitfd_node.node; + iter.iter.node = &stream->node.node; ret = lttng_ht_del(ht, &iter); assert(!ret); rcu_read_unlock(); @@ -1707,7 +1705,7 @@ end: } free_stream: - call_rcu(&stream->waitfd_node.head, consumer_free_metadata_stream); + call_rcu(&stream->node.head, consumer_free_stream); } /* @@ -1756,7 +1754,7 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, /* Steal stream identifier to avoid having streams with the same key */ consumer_steal_stream_key(stream->key, ht); - lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); + lttng_ht_add_unique_ulong(ht, &stream->node); rcu_read_unlock(); 
pthread_mutex_unlock(&consumer_data.lock); @@ -1881,7 +1879,7 @@ restart: assert(node); stream = caa_container_of(node, struct lttng_consumer_stream, - waitfd_node); + node); /* Check for error event */ if (revents & (LPOLLERR | LPOLLHUP)) { -- 1.7.10.4 From mathieu.desnoyers at efficios.com Fri Oct 12 10:35:39 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Fri, 12 Oct 2012 10:35:39 -0400 Subject: [lttng-dev] [urcu commit] wfcqueue: clarify locking usage Message-ID: <20121012143539.GA17214@Krystal> commit 1fe734e1914993dfa395e2b81e5c9ee0115cc56c Author: Mathieu Desnoyers Date: Fri Oct 12 10:33:20 2012 -0400 wfcqueue: clarify locking usage Signed-off-by: Mathieu Desnoyers diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h index 2c8c447..2ca9eeb 100644 --- a/urcu/static/wfcqueue.h +++ b/urcu/static/wfcqueue.h @@ -59,6 +59,10 @@ extern "C" { * * For convenience, cds_wfcq_dequeue_blocking() and * cds_wfcq_splice_blocking() hold the dequeue lock. + * + * Besides locking, mutual exclusion of dequeue, splice and iteration + * can be ensured by performing all of those operations from a single + * thread, without requiring any lock. */ #define WFCQ_ADAPT_ATTEMPTS 10 /* Retry if being set */ @@ -192,7 +196,8 @@ ___cds_wfcq_node_sync_next(struct cds_wfcq_node *node) * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. * * Used by for-like iteration macros in urcu/wfqueue.h: * __cds_wfcq_for_each_blocking() @@ -217,7 +222,8 @@ ___cds_wfcq_first_blocking(struct cds_wfcq_head *head, * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Should be called with cds_wfcq_dequeue_lock() held. 
+ * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. * * Used by for-like iteration macros in urcu/wfqueue.h: * __cds_wfcq_for_each_blocking() @@ -259,7 +265,8 @@ ___cds_wfcq_next_blocking(struct cds_wfcq_head *head, * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. * It is valid to reuse and free a dequeued node immediately. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. */ static inline struct cds_wfcq_node * ___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, @@ -308,7 +315,8 @@ ___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, * * Dequeue all nodes from src_q. * dest_q must be already initialized. - * Should be called with cds_wfcq_dequeue_lock() held on src_q. + * Dequeue/splice/iteration mutual exclusion for src_q should be ensured + * by the caller. */ static inline void ___cds_wfcq_splice_blocking( @@ -345,7 +353,7 @@ ___cds_wfcq_splice_blocking( * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Mutual exlusion with (and only with) cds_wfcq_splice_blocking is + * Mutual exlusion with cds_wfcq_splice_blocking and dequeue lock is * ensured. * It is valid to reuse and free a dequeued node immediately. */ @@ -368,7 +376,7 @@ _cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, * dest_q must be already initialized. * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Mutual exlusion with (and only with) cds_wfcq_dequeue_blocking is + * Mutual exlusion with cds_wfcq_dequeue_blocking and dequeue lock is * ensured. 
*/ static inline void diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h index b13aa7b..ba2f2ed 100644 --- a/urcu/wfcqueue.h +++ b/urcu/wfcqueue.h @@ -102,6 +102,10 @@ struct cds_wfcq_tail { * * For convenience, cds_wfcq_dequeue_blocking() and * cds_wfcq_splice_blocking() hold the dequeue lock. + * + * Besides locking, mutual exclusion of dequeue, splice and iteration + * can be ensured by performing all of those operations from a single + * thread, without requiring any lock. */ /* @@ -151,7 +155,8 @@ extern void cds_wfcq_enqueue(struct cds_wfcq_head *head, * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. * It is valid to reuse and free a dequeued node immediately. - * Mutual exlusion with dequeuers is ensured internally. + * Mutual exlusion with cds_wfcq_dequeue_blocking and dequeue lock is + * ensured. */ extern struct cds_wfcq_node *cds_wfcq_dequeue_blocking( struct cds_wfcq_head *head, @@ -164,7 +169,8 @@ extern struct cds_wfcq_node *cds_wfcq_dequeue_blocking( * dest_q must be already initialized. * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Mutual exlusion with dequeuers is ensured internally. + * Mutual exlusion with cds_wfcq_dequeue_blocking and dequeue lock is + * ensured. */ extern void cds_wfcq_splice_blocking( struct cds_wfcq_head *dest_q_head, @@ -178,7 +184,8 @@ extern void cds_wfcq_splice_blocking( * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. * It is valid to reuse and free a dequeued node immediately. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. 
*/ extern struct cds_wfcq_node *__cds_wfcq_dequeue_blocking( struct cds_wfcq_head *head, @@ -191,7 +198,8 @@ extern struct cds_wfcq_node *__cds_wfcq_dequeue_blocking( * dest_q must be already initialized. * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion for src_q should be ensured + * by the caller. */ extern void __cds_wfcq_splice_blocking( struct cds_wfcq_head *dest_q_head, @@ -204,7 +212,8 @@ extern void __cds_wfcq_splice_blocking( * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. * * Used by for-like iteration macros: * __cds_wfcq_for_each_blocking() @@ -219,7 +228,8 @@ extern struct cds_wfcq_node *__cds_wfcq_first_blocking( * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. * * Used by for-like iteration macros: * __cds_wfcq_for_each_blocking() @@ -241,7 +251,8 @@ extern struct cds_wfcq_node *__cds_wfcq_next_blocking( * * Content written into each node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. 
*/ #define __cds_wfcq_for_each_blocking(head, tail, node) \ for (node = __cds_wfcq_first_blocking(head, tail); \ @@ -259,7 +270,8 @@ extern struct cds_wfcq_node *__cds_wfcq_next_blocking( * * Content written into each node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * Should be called with cds_wfcq_dequeue_lock() held. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. */ #define __cds_wfcq_for_each_blocking_safe(head, tail, node, n) \ for (node = __cds_wfcq_first_blocking(head, tail), \ -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 11:41:13 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 11:41:13 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/4] Rename consumer threads and spawn them in daemon In-Reply-To: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> Message-ID: <20121013154113.GA29985@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > The metadata thread is now created in the lttng-consumerd daemon so all > thread could be controlled inside the daemon. > > This is the first step of a consumer thread refactoring which aims at > moving data and metadata stream operations inside a dedicated thread so > the session daemon thread does not block and is more efficient at adding > streams. > > The most important concept is that a stream file descriptor MUST be > opened as quickly as we can than passed to the right thread (for UST than -> then > since they are already opened by the session daemon for the kernel). 
> > Signed-off-by: David Goulet > --- > src/bin/lttng-consumerd/lttng-consumerd.c | 18 ++++++++++----- > src/common/consumer.c | 34 +++++++++-------------------- > src/common/consumer.h | 5 +++-- > 3 files changed, 26 insertions(+), 31 deletions(-) > > diff --git a/src/bin/lttng-consumerd/lttng-consumerd.c b/src/bin/lttng-consumerd/lttng-consumerd.c > index 5952334..946fb02 100644 > --- a/src/bin/lttng-consumerd/lttng-consumerd.c > +++ b/src/bin/lttng-consumerd/lttng-consumerd.c > @@ -356,23 +356,31 @@ int main(int argc, char **argv) > } > lttng_consumer_set_error_sock(ctx, ret); > > - /* Create the thread to manage the receive of fd */ > - ret = pthread_create(&threads[0], NULL, lttng_consumer_thread_receive_fds, > + /* Create thread to manage the polling/writing of trace metadata */ > + ret = pthread_create(&threads[0], NULL, consumer_thread_metadata_poll, > + (void *) ctx); > + if (ret != 0) { > + perror("pthread_create"); > + goto error; > + } > + > + /* Create thread to manage the polling/writing of trace data */ > + ret = pthread_create(&threads[1], NULL, consumer_thread_data_poll, > (void *) ctx); > if (ret != 0) { > perror("pthread_create"); > goto error; > } > > - /* Create thread to manage the polling/writing of traces */ > - ret = pthread_create(&threads[1], NULL, lttng_consumer_thread_poll_fds, > + /* Create the thread to manage the receive of fd */ > + ret = pthread_create(&threads[2], NULL, consumer_thread_sessiond_poll, > (void *) ctx); > if (ret != 0) { > perror("pthread_create"); > goto error; > } > > - for (i = 0; i < 2; i++) { > + for (i = 0; i < 3; i++) { > ret = pthread_join(threads[i], &status); > if (ret != 0) { > perror("pthread_join"); > diff --git a/src/common/consumer.c b/src/common/consumer.c > index 242b05b..055de1b 100644 > --- a/src/common/consumer.c > +++ b/src/common/consumer.c > @@ -1131,6 +1131,8 @@ void lttng_consumer_destroy(struct lttng_consumer_local_data *ctx) > PERROR("close"); > } > 
utils_close_pipe(ctx->consumer_splice_metadata_pipe); > + /* This should trigger the metadata thread to exit */ > + close(ctx->consumer_metadata_pipe[1]); This adds a close(), but does not remove any other close() that might already be in place elsewhere. Moreover, the close() return value is not tested. > > unlink(ctx->consumer_command_sock_path); > free(ctx); > @@ -1756,7 +1758,7 @@ error: > * Thread polls on metadata file descriptor and write them on disk or on the > * network. > */ > -void *lttng_consumer_thread_poll_metadata(void *data) > +void *consumer_thread_metadata_poll(void *data) > { > int ret, i, pollfd; > uint32_t revents, nb_fd; > @@ -1939,7 +1941,7 @@ end: > * This thread polls the fds in the set to consume the data and write > * it to tracefile if necessary. > */ > -void *lttng_consumer_thread_poll_fds(void *data) > +void *consumer_thread_data_poll(void *data) > { > int num_rdy, num_hup, high_prio, ret, i; > struct pollfd *pollfd = NULL; > @@ -1949,19 +1951,9 @@ void *lttng_consumer_thread_poll_fds(void *data) > int nb_fd = 0; > struct lttng_consumer_local_data *ctx = data; > ssize_t len; > - pthread_t metadata_thread; > - void *status; > > rcu_register_thread(); > > - /* Start metadata polling thread */ > - ret = pthread_create(&metadata_thread, NULL, > - lttng_consumer_thread_poll_metadata, (void *) ctx); > - if (ret < 0) { > - PERROR("pthread_create metadata thread"); > - goto end; > - } > - > local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); > > while (1) { > @@ -2145,19 +2137,13 @@ end: > > /* > * Close the write side of the pipe so epoll_wait() in > - * lttng_consumer_thread_poll_metadata can catch it. The thread is > - * monitoring the read side of the pipe. If we close them both, epoll_wait > - * strangely does not return and could create a endless wait period if the > - * pipe is the only tracked fd in the poll set. The thread will take care > - * of closing the read side. > + * consumer_thread_metadata_poll can catch it.
The thread is monitoring the > + * read side of the pipe. If we close them both, epoll_wait strangely does > + * not return and could create a endless wait period if the pipe is the > + * only tracked fd in the poll set. The thread will take care of closing > + * the read side. > */ > close(ctx->consumer_metadata_pipe[1]); this is the second close on the same FD I'm talking about. thanks, Mathieu > - if (ret) { > - ret = pthread_join(metadata_thread, &status); > - if (ret < 0) { > - PERROR("pthread_join metadata thread"); > - } > - } > > rcu_unregister_thread(); > return NULL; > @@ -2167,7 +2153,7 @@ end: > * This thread listens on the consumerd socket and receives the file > * descriptors from the session daemon. > */ > -void *lttng_consumer_thread_receive_fds(void *data) > +void *consumer_thread_sessiond_poll(void *data) > { > int sock, client_socket, ret; > /* > diff --git a/src/common/consumer.h b/src/common/consumer.h > index d0cd8fd..4b225e4 100644 > --- a/src/common/consumer.h > +++ b/src/common/consumer.h > @@ -385,8 +385,9 @@ extern int lttng_consumer_get_produced_snapshot( > struct lttng_consumer_local_data *ctx, > struct lttng_consumer_stream *stream, > unsigned long *pos); > -extern void *lttng_consumer_thread_poll_fds(void *data); > -extern void *lttng_consumer_thread_receive_fds(void *data); > +extern void *consumer_thread_metadata_poll(void *data); > +extern void *consumer_thread_data_poll(void *data); > +extern void *consumer_thread_sessiond_poll(void *data); > extern int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, > int sock, struct pollfd *consumer_sockpoll); > > -- > 1.7.10.4 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 11:53:13 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 11:53:13 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <1350052235-12198-2-git-send-email-dgoulet@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-2-git-send-email-dgoulet@efficios.com> Message-ID: <20121013155313.GB29985@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > As a second step of refactoring, upon receiving a data stream, we send > it to the data thread that is now in charge of handling it. > > Furthermore, in order for this to behave correctly, we have to perform the > ustctl actions on the stream before passing it to the right thread > (the kernel does not need special actions). This way, once the sessiond > thread replies to the session daemon, the stream is sure to be open > and ready for data to be recorded on the application side so we avoid a > race between the application thinking the stream is ready and the stream > thread still scheduled out. Normally, as long as we have a reference on the SHM file descriptor, and we have the wakeup FD, we should be good to fetch the data of buffers belonging to an application that has already exited, even if it did so before the ustctl calls are done. So I'm wondering why you do the ustctl calls in the sessiond thread? It seems to complicate the implementation needlessly: we could still do the ustctl calls and open the output file at the same location, in the data/metadata threads. Thanks, Mathieu > > This commit should speed up the add stream process for the session > daemon. There are still some actions to move out of the session daemon > poll thread to gain speed significantly, especially for network > streaming.
> > Signed-off-by: David Goulet > --- > src/common/consumer.c | 123 +++++++++++--------------- > src/common/consumer.h | 1 + > src/common/kernel-consumer/kernel-consumer.c | 24 ++--- > src/common/ust-consumer/ust-consumer.c | 40 ++++----- > 4 files changed, 78 insertions(+), 110 deletions(-) > > diff --git a/src/common/consumer.c b/src/common/consumer.c > index 055de1b..1d2b1f7 100644 > --- a/src/common/consumer.c > +++ b/src/common/consumer.c > @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, > return stream; > } > > -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) > +void consumer_steal_stream_key(int key, struct lttng_ht *ht) > { > struct lttng_consumer_stream *stream; > > @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( > lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); > lttng_ht_node_init_ulong(&stream->node, stream->key); > > + /* > + * The cpu number is needed before using any ustctl_* actions. Ignored for > + * the kernel so the value does not matter. 
> + */ > + pthread_mutex_lock(&consumer_data.lock); > + stream->cpu = stream->chan->cpucount++; > + pthread_mutex_unlock(&consumer_data.lock); > + > DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," > " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, > stream->shm_fd, stream->wait_fd, > @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > pthread_mutex_lock(&consumer_data.lock); > rcu_read_lock(); > > - switch (consumer_data.type) { > - case LTTNG_CONSUMER_KERNEL: > - break; > - case LTTNG_CONSUMER32_UST: > - case LTTNG_CONSUMER64_UST: > - stream->cpu = stream->chan->cpucount++; > - ret = lttng_ustconsumer_add_stream(stream); > - if (ret) { > - ret = -EINVAL; > - goto error; > - } > - > - /* Steal stream identifier only for UST */ > - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > - break; > - default: > - ERR("Unknown consumer_data type"); > - assert(0); > - ret = -ENOSYS; > - goto error; > - } > - > lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > > /* Check and cleanup relayd */ > @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > consumer_data.stream_count++; > consumer_data.need_update = 1; > > -error: > rcu_read_unlock(); > pthread_mutex_unlock(&consumer_data.lock); > > @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > > DBG3("Consumer delete metadata stream %d", stream->wait_fd); > > - if (ht == NULL) { > - /* Means the stream was allocated but not successfully added */ > - goto free_stream; > - } > - > - rcu_read_lock(); > - iter.iter.node = &stream->waitfd_node.node; > - ret = lttng_ht_del(ht, &iter); > - assert(!ret); > - rcu_read_unlock(); > - > pthread_mutex_lock(&consumer_data.lock); > switch (consumer_data.type) { > case LTTNG_CONSUMER_KERNEL: > @@ -1613,6 +1587,18 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > goto end; > } > > + if 
(ht == NULL) { > + pthread_mutex_unlock(&consumer_data.lock); > + /* Means the stream was allocated but not successfully added */ > + goto free_stream; > + } > + > + rcu_read_lock(); > + iter.iter.node = &stream->waitfd_node.node; > + ret = lttng_ht_del(ht, &iter); > + assert(!ret); > + rcu_read_unlock(); > + > if (stream->out_fd >= 0) { > ret = close(stream->out_fd); > if (ret) { > @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > > pthread_mutex_lock(&consumer_data.lock); > > - switch (consumer_data.type) { > - case LTTNG_CONSUMER_KERNEL: > - break; > - case LTTNG_CONSUMER32_UST: > - case LTTNG_CONSUMER64_UST: > - ret = lttng_ustconsumer_add_stream(stream); > - if (ret) { > - ret = -EINVAL; > - goto error; > - } > - > - /* Steal stream identifier only for UST */ > - consumer_steal_stream_key(stream->wait_fd, ht); > - break; > - default: > - ERR("Unknown consumer_data type"); > - assert(0); > - ret = -ENOSYS; > - goto error; > - } > - > /* > * From here, refcounts are updated so be _careful_ when returning an error > * after this point. 
> @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > rcu_read_unlock(); > > -error: > pthread_mutex_unlock(&consumer_data.lock); > return ret; > } > @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) > int num_rdy, num_hup, high_prio, ret, i; > struct pollfd *pollfd = NULL; > /* local view of the streams */ > - struct lttng_consumer_stream **local_stream = NULL; > + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; > /* local view of consumer_data.fds_count */ > int nb_fd = 0; > struct lttng_consumer_local_data *ctx = data; > @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) > */ > if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { > size_t pipe_readlen; > - char tmp; > > DBG("consumer_poll_pipe wake up"); > /* Consume 1 byte of pipe data */ > do { > - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); > + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, > + sizeof(new_stream)); > } while (pipe_readlen == -1 && errno == EINTR); > + > + /* > + * If the stream is NULL, just ignore it. It's also possible that > + * the sessiond poll thread changed the consumer_quit state and is > + * waking us up to test it. > + */ > + if (new_stream == NULL) { > + continue; > + } > + > + ret = consumer_add_stream(new_stream); > + if (ret) { > + ERR("Consumer add stream %d failed. Continuing", > + new_stream->key); > + /* > + * At this point, if the add_stream fails, it is not in the > + * hash table thus passing the NULL value here. > + */ > + consumer_del_stream(new_stream, NULL); > + } > + > + /* Continue to update the local streams and handle prio ones */ > continue; > } > > @@ -2260,19 +2246,16 @@ end: > consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; > > /* > - * Wake-up the other end by writing a null byte in the pipe > - * (non-blocking). 
Important note: Because writing into the > - * pipe is non-blocking (and therefore we allow dropping wakeup > - * data, as long as there is wakeup data present in the pipe > - * buffer to wake up the other end), the other end should > - * perform the following sequence for waiting: > - * 1) empty the pipe (reads). > - * 2) perform update operation. > - * 3) wait on the pipe (poll). > + * Notify the data poll thread to poll back again and test the > + * consumer_quit state to quit gracefully. > */ > do { > - ret = write(ctx->consumer_poll_pipe[1], "", 1); > + struct lttng_consumer_stream *null_stream = NULL; > + > + ret = write(ctx->consumer_poll_pipe[1], &null_stream, > + sizeof(null_stream)); > } while (ret < 0 && errno == EINTR); > + > rcu_unregister_thread(); > return NULL; > } > diff --git a/src/common/consumer.h b/src/common/consumer.h > index 4b225e4..8e5891a 100644 > --- a/src/common/consumer.h > +++ b/src/common/consumer.h > @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( > struct consumer_relayd_sock_pair *consumer_find_relayd(int key); > int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, > size_t data_size); > +void consumer_steal_stream_key(int key, struct lttng_ht *ht); > > extern struct lttng_consumer_local_data *lttng_consumer_create( > enum lttng_consumer_type type, > diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c > index 13cbe21..444f5e0 100644 > --- a/src/common/kernel-consumer/kernel-consumer.c > +++ b/src/common/kernel-consumer/kernel-consumer.c > @@ -235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > consumer_del_stream(new_stream, NULL); > } > } else { > - ret = consumer_add_stream(new_stream); > - if (ret) { > - ERR("Consumer add stream %d failed. 
Continuing", > - new_stream->key); > + do { > + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > + sizeof(new_stream)); > + } while (ret < 0 && errno == EINTR); > + if (ret < 0) { > + PERROR("write data pipe"); > consumer_del_stream(new_stream, NULL); > goto end_nosignal; > } > @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > goto end_nosignal; > } > > - /* > - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > - * Important note: Because writing into the pipe is non-blocking (and > - * therefore we allow dropping wakeup data, as long as there is wakeup data > - * present in the pipe buffer to wake up the other end), the other end > - * should perform the following sequence for waiting: > - * > - * 1) empty the pipe (reads). > - * 2) perform update operation. > - * 3) wait on the pipe (poll). > - */ > - do { > - ret = write(ctx->consumer_poll_pipe[1], "", 1); > - } while (ret < 0 && errno == EINTR); > end_nosignal: > rcu_read_unlock(); > > diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > index 1170687..4ca4b84 100644 > --- a/src/common/ust-consumer/ust-consumer.c > +++ b/src/common/ust-consumer/ust-consumer.c > @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > goto end_nosignal; > } > > + /* > + * This needs to be done as soon as we can so we don't block the > + * application too long. > + */ > + ret = lttng_ustconsumer_add_stream(new_stream); > + if (ret) { > + consumer_del_stream(new_stream, NULL); > + goto end_nosignal; > + } > + /* Steal stream identifier to avoid having streams with the same key */ > + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); > + > /* The stream is not metadata. Get relayd reference if exists. 
*/ > relayd = consumer_find_relayd(msg.u.stream.net_index); > if (relayd != NULL) { > @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > goto end_nosignal; > } > } else { > - ret = consumer_add_stream(new_stream); > - if (ret) { > - ERR("Consumer add stream %d failed. Continuing", > - new_stream->key); > - /* > - * At this point, if the add_stream fails, it is not in the > - * hash table thus passing the NULL value here. > - */ > + do { > + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > + sizeof(new_stream)); > + } while (ret < 0 && errno == EINTR); > + if (ret < 0) { > + PERROR("write data pipe"); > consumer_del_stream(new_stream, NULL); > goto end_nosignal; > } > @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > break; > } > > - /* > - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > - * Important note: Because writing into the pipe is non-blocking (and > - * therefore we allow dropping wakeup data, as long as there is wakeup data > - * present in the pipe buffer to wake up the other end), the other end > - * should perform the following sequence for waiting: > - * > - * 1) empty the pipe (reads). > - * 2) perform update operation. > - * 3) wait on the pipe (poll). > - */ > - do { > - ret = write(ctx->consumer_poll_pipe[1], "", 1); > - } while (ret < 0 && errno == EINTR); > end_nosignal: > rcu_read_unlock(); > > -- > 1.7.10.4 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 11:56:15 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 11:56:15 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/4] Make stream hash tables global to the consumer In-Reply-To: <1350052235-12198-3-git-send-email-dgoulet@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-3-git-send-email-dgoulet@efficios.com> Message-ID: <20121013155615.GC29985@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > The data stream hash table is now global to the consumer and used in the > data thread. The consumer_data stream_ht is no longer used to track the > data streams but instead will be used (and possibly renamed) by the > session daemon poll thread to keep track of streams on a per session id > basis for the upcoming feature that check traced data availability. > > For now, in order to avoid mind bugging problems to access the streams, > both hash table are now defined globally (metadata and data). However, > stream update are still done in a single thread. Don't count on this to > be guaranteed in the next commits. > > Signed-off-by: David Goulet > --- > src/common/consumer.c | 91 +++++++++++++++++++++++++------- > src/common/consumer.h | 9 ++-- > src/common/ust-consumer/ust-consumer.c | 2 - > 3 files changed, 75 insertions(+), 27 deletions(-) > > diff --git a/src/common/consumer.c b/src/common/consumer.c > index 1d2b1f7..1fb9960 100644 > --- a/src/common/consumer.c > +++ b/src/common/consumer.c > @@ -59,6 +59,17 @@ int consumer_poll_timeout = -1; > volatile int consumer_quit = 0; > > /* > + * The following two hash tables are visible by all threads which are separated > + * in different source files. > + * > + * Global hash table containing respectively metadata and data streams. 
The > + * stream element in this ht should only be updated by the metadata poll thread > + * for the metadata and the data poll thread for the data. > + */ > +struct lttng_ht *metadata_ht = NULL; > +struct lttng_ht *data_ht = NULL; > + > +/* > * Find a stream. The consumer_data.lock must be locked during this > * call. > */ > @@ -433,19 +444,24 @@ end: > /* > * Add a stream to the global list protected by a mutex. > */ > -int consumer_add_stream(struct lttng_consumer_stream *stream) > +static int consumer_add_stream(struct lttng_consumer_stream *stream, > + struct lttng_ht *ht) > { > int ret = 0; > struct consumer_relayd_sock_pair *relayd; > > assert(stream); > + assert(ht); > > DBG3("Adding consumer stream %d", stream->key); > > pthread_mutex_lock(&consumer_data.lock); > rcu_read_lock(); > > - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > + /* Steal stream identifier to avoid having streams with the same key */ > + consumer_steal_stream_key(stream->key, ht); I don't understand why suddenly this change is needed. Considering what this patch should be doing (just moving a ht from per-thread to global), it should not have any behavior impact. Thanks, Mathieu > + > + lttng_ht_add_unique_ulong(ht, &stream->node); > > /* Check and cleanup relayd */ > relayd = consumer_find_relayd(stream->net_seq_idx); > @@ -783,9 +799,9 @@ end: > * > * Returns the number of fds in the structures. 
> */ > -int consumer_update_poll_array( > +static int consumer_update_poll_array( > struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, > - struct lttng_consumer_stream **local_stream) > + struct lttng_consumer_stream **local_stream, struct lttng_ht *ht) > { > int i = 0; > struct lttng_ht_iter iter; > @@ -793,8 +809,7 @@ int consumer_update_poll_array( > > DBG("Updating poll fd array"); > rcu_read_lock(); > - cds_lfht_for_each_entry(consumer_data.stream_ht->ht, &iter.iter, stream, > - node.node) { > + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { > if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM) { > continue; > } > @@ -1523,6 +1538,33 @@ int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, > /* > * Iterate over all streams of the hashtable and free them properly. > * > + * WARNING: *MUST* be used with data stream only. > + */ > +static void destroy_data_stream_ht(struct lttng_ht *ht) > +{ > + int ret; > + struct lttng_ht_iter iter; > + struct lttng_consumer_stream *stream; > + > + if (ht == NULL) { > + return; > + } > + > + rcu_read_lock(); > + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { > + ret = lttng_ht_del(ht, &iter); > + assert(!ret); > + > + call_rcu(&stream->node.head, consumer_free_stream); > + } > + rcu_read_unlock(); > + > + lttng_ht_destroy(ht); > +} > + > +/* > + * Iterate over all streams of the hashtable and free them properly. > + * > * XXX: Should not be only for metadata stream or else use an other name. 
> */ > static void destroy_stream_ht(struct lttng_ht *ht) > @@ -1711,6 +1753,9 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > uatomic_dec(&stream->chan->nb_init_streams); > } > > + /* Steal stream identifier to avoid having streams with the same key */ > + consumer_steal_stream_key(stream->key, ht); > + > lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > rcu_read_unlock(); > > @@ -1729,7 +1774,6 @@ void *consumer_thread_metadata_poll(void *data) > struct lttng_consumer_stream *stream = NULL; > struct lttng_ht_iter iter; > struct lttng_ht_node_ulong *node; > - struct lttng_ht *metadata_ht = NULL; > struct lttng_poll_event events; > struct lttng_consumer_local_data *ctx = data; > ssize_t len; > @@ -1738,11 +1782,6 @@ void *consumer_thread_metadata_poll(void *data) > > DBG("Thread metadata poll started"); > > - metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > - if (metadata_ht == NULL) { > - goto end; > - } > - > /* Size is set to 1 for the consumer_metadata pipe */ > ret = lttng_poll_create(&events, 2, LTTNG_CLOEXEC); > if (ret < 0) { > @@ -1918,6 +1957,11 @@ void *consumer_thread_data_poll(void *data) > > rcu_register_thread(); > > + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > + if (data_ht == NULL) { > + goto end; > + } > + > local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); > > while (1) { > @@ -1955,7 +1999,8 @@ void *consumer_thread_data_poll(void *data) > pthread_mutex_unlock(&consumer_data.lock); > goto end; > } > - ret = consumer_update_poll_array(ctx, &pollfd, local_stream); > + ret = consumer_update_poll_array(ctx, &pollfd, local_stream, > + data_ht); > if (ret < 0) { > ERR("Error in allocating pollfd or local_outfds"); > lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_POLL_ERROR); > @@ -2015,7 +2060,7 @@ void *consumer_thread_data_poll(void *data) > continue; > } > > - ret = consumer_add_stream(new_stream); > + ret = consumer_add_stream(new_stream, data_ht); > if (ret) { > ERR("Consumer add 
stream %d failed. Continuing", > new_stream->key); > @@ -2088,22 +2133,19 @@ void *consumer_thread_data_poll(void *data) > if ((pollfd[i].revents & POLLHUP)) { > DBG("Polling fd %d tells it has hung up.", pollfd[i].fd); > if (!local_stream[i]->data_read) { > - consumer_del_stream(local_stream[i], > - consumer_data.stream_ht); > + consumer_del_stream(local_stream[i], data_ht); > num_hup++; > } > } else if (pollfd[i].revents & POLLERR) { > ERR("Error returned in polling fd %d.", pollfd[i].fd); > if (!local_stream[i]->data_read) { > - consumer_del_stream(local_stream[i], > - consumer_data.stream_ht); > + consumer_del_stream(local_stream[i], data_ht); > num_hup++; > } > } else if (pollfd[i].revents & POLLNVAL) { > ERR("Polling fd %d tells fd is not open.", pollfd[i].fd); > if (!local_stream[i]->data_read) { > - consumer_del_stream(local_stream[i], > - consumer_data.stream_ht); > + consumer_del_stream(local_stream[i], data_ht); > num_hup++; > } > } > @@ -2131,6 +2173,10 @@ end: > */ > close(ctx->consumer_metadata_pipe[1]); > > + if (data_ht) { > + destroy_data_stream_ht(data_ht); > + } > + > rcu_unregister_thread(); > return NULL; > } > @@ -2299,6 +2345,11 @@ void lttng_consumer_init(void) > consumer_data.stream_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > consumer_data.channel_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > consumer_data.relayd_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > + > + metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > + assert(metadata_ht); > + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > + assert(data_ht); > } > > /* > diff --git a/src/common/consumer.h b/src/common/consumer.h > index 8e5891a..6bce96d 100644 > --- a/src/common/consumer.h > +++ b/src/common/consumer.h > @@ -275,6 +275,10 @@ struct lttng_consumer_global_data { > struct lttng_ht *relayd_ht; > }; > > +/* Defined in consumer.c and coupled with explanations */ > +extern struct lttng_ht *metadata_ht; > +extern struct lttng_ht *data_ht; > + > /* > * Init consumer data 
structures. > */ > @@ -324,10 +328,6 @@ extern void lttng_consumer_sync_trace_file( > */ > extern int lttng_consumer_poll_socket(struct pollfd *kconsumer_sockpoll); > > -extern int consumer_update_poll_array( > - struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, > - struct lttng_consumer_stream **local_consumer_streams); > - > extern struct lttng_consumer_stream *consumer_allocate_stream( > int channel_key, int stream_key, > int shm_fd, int wait_fd, > @@ -340,7 +340,6 @@ extern struct lttng_consumer_stream *consumer_allocate_stream( > int net_index, > int metadata_flag, > int *alloc_ret); > -extern int consumer_add_stream(struct lttng_consumer_stream *stream); > extern void consumer_del_stream(struct lttng_consumer_stream *stream, > struct lttng_ht *ht); > extern void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > index 4ca4b84..3b41e55 100644 > --- a/src/common/ust-consumer/ust-consumer.c > +++ b/src/common/ust-consumer/ust-consumer.c > @@ -233,8 +233,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > consumer_del_stream(new_stream, NULL); > goto end_nosignal; > } > - /* Steal stream identifier to avoid having streams with the same key */ > - consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); > > /* The stream is not metadata. Get relayd reference if exists. */ > relayd = consumer_find_relayd(msg.u.stream.net_index); > -- > 1.7.10.4 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 12:00:35 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 12:00:35 -0400 Subject: [lttng-dev] [PATCH lttng-tools 4/4] Change the metadata hash table node In-Reply-To: <1350052235-12198-4-git-send-email-dgoulet@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-4-git-send-email-dgoulet@efficios.com> Message-ID: <20121013160035.GD29985@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > Remove the use of the "waitfd_node" for metadata and index the "node" by > wait fd during stream allocation only for metadata stream. > > This was done so the waitfd node could be used later on for the hash > table indexing stream by session id the traced data check command (soon > to be implemented). this last changelog paragraph, being using "was" (past) is confusing me. Is it what you are now doing (present) or what was there before that you are changing ? Thanks, Mathieu > > Signed-off-by: David Goulet > --- > src/common/consumer.c | 36 +++++++++++++++++------------------- > 1 file changed, 17 insertions(+), 19 deletions(-) > > diff --git a/src/common/consumer.c b/src/common/consumer.c > index 1fb9960..0c1a812 100644 > --- a/src/common/consumer.c > +++ b/src/common/consumer.c > @@ -172,17 +172,6 @@ void consumer_free_stream(struct rcu_head *head) > free(stream); > } > > -static > -void consumer_free_metadata_stream(struct rcu_head *head) > -{ > - struct lttng_ht_node_ulong *node = > - caa_container_of(head, struct lttng_ht_node_ulong, head); > - struct lttng_consumer_stream *stream = > - caa_container_of(node, struct lttng_consumer_stream, waitfd_node); > - > - free(stream); > -} > - > /* > * RCU protected relayd socket pair free. 
> */ > @@ -417,8 +406,17 @@ struct lttng_consumer_stream *consumer_allocate_stream( > stream->metadata_flag = metadata_flag; > strncpy(stream->path_name, path_name, sizeof(stream->path_name)); > stream->path_name[sizeof(stream->path_name) - 1] = '\0'; > - lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); > - lttng_ht_node_init_ulong(&stream->node, stream->key); > + > + /* > + * Index differently the metadata node because the thread is using an > + * internal hash table to match streams in the metadata_ht to the epoll set > + * file descriptor. > + */ > + if (metadata_flag) { > + lttng_ht_node_init_ulong(&stream->node, stream->wait_fd); > + } else { > + lttng_ht_node_init_ulong(&stream->node, stream->key); > + } > > /* > * The cpu number is needed before using any ustctl_* actions. Ignored for > @@ -1578,11 +1576,11 @@ static void destroy_stream_ht(struct lttng_ht *ht) > } > > rcu_read_lock(); > - cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, waitfd_node.node) { > + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { > ret = lttng_ht_del(ht, &iter); > assert(!ret); > > - call_rcu(&stream->waitfd_node.head, consumer_free_metadata_stream); > + call_rcu(&stream->node.head, consumer_free_stream); > } > rcu_read_unlock(); > > @@ -1636,7 +1634,7 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > } > > rcu_read_lock(); > - iter.iter.node = &stream->waitfd_node.node; > + iter.iter.node = &stream->node.node; > ret = lttng_ht_del(ht, &iter); > assert(!ret); > rcu_read_unlock(); > @@ -1707,7 +1705,7 @@ end: > } > > free_stream: > - call_rcu(&stream->waitfd_node.head, consumer_free_metadata_stream); > + call_rcu(&stream->node.head, consumer_free_stream); > } > > /* > @@ -1756,7 +1754,7 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > /* Steal stream identifier to avoid having streams with the same key */ > consumer_steal_stream_key(stream->key, ht); > > - lttng_ht_add_unique_ulong(ht, 
&stream->waitfd_node); > + lttng_ht_add_unique_ulong(ht, &stream->node); > rcu_read_unlock(); > > pthread_mutex_unlock(&consumer_data.lock); > @@ -1881,7 +1879,7 @@ restart: > assert(node); > > stream = caa_container_of(node, struct lttng_consumer_stream, > - waitfd_node); > + node); > > /* Check for error event */ > if (revents & (LPOLLERR | LPOLLHUP)) { > -- > 1.7.10.4 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From laijs at cn.fujitsu.com Sat Oct 13 12:08:31 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Sun, 14 Oct 2012 00:08:31 +0800 Subject: [lttng-dev] [PATCH 1/4] test: test for the proper pointer Message-ID: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> We should use "if (snode)" instead of "if (node)" in case of the struct cds_lfs_node_rcu is not the first field of struct node. 
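To illustrate the point of the commit message: a container_of-style macro subtracts the member offset from the pointer it is given, so the old "if (node)" check is only reliable when that offset happens to be zero. Below is a minimal sketch under stated assumptions. The names demo_container_of, struct demo_link, struct demo_test, demo_entry and demo_entry_or_null are hypothetical stand-ins for caa_container_of() and the test's own types; this is not the code from tests/test_urcu_lfs.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Local stand-in for caa_container_of(), defined here so the sketch
 * builds without liburcu. It subtracts the member offset from ptr.
 */
#define demo_container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

/* Hypothetical embedded link, playing the role of the stack node. */
struct demo_link {
	struct demo_link *next;
};

/* Hypothetical container: "list" is deliberately NOT the first field. */
struct demo_test {
	long payload;
	struct demo_link list;
};

/* Offset that the macro would subtract; non-zero for this layout. */
static size_t demo_list_offset(void)
{
	return offsetof(struct demo_test, list);
}

/* Convert a link that is known to be non-NULL. */
static struct demo_test *demo_entry(struct demo_link *link)
{
	assert(link != NULL);
	return demo_container_of(link, struct demo_test, list);
}

/*
 * Test the raw link pointer first, then convert. Converting a NULL link
 * and testing the result afterwards would compare "NULL minus offset"
 * against NULL, which is not a valid emptiness check for this layout.
 */
static struct demo_test *demo_entry_or_null(struct demo_link *link)
{
	return link ? demo_entry(link) : NULL;
}
```

Checking the raw node pointer before converting it, as the patch does with snode, avoids depending on where the node sits inside the containing structure.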
Signed-off-by: Lai Jiangshan --- tests/test_urcu_lfs.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c index 88bf65d..681b654 100644 --- a/tests/test_urcu_lfs.c +++ b/tests/test_urcu_lfs.c @@ -242,13 +242,14 @@ void *thr_dequeuer(void *_count) for (;;) { struct cds_lfs_node_rcu *snode; - struct test *node; rcu_read_lock(); snode = cds_lfs_pop_rcu(&s); - node = caa_container_of(snode, struct test, list); rcu_read_unlock(); - if (node) { + if (snode) { + struct test *node; + + node = caa_container_of(snode, struct test, list); call_rcu(&node->rcu, free_node_cb); URCU_TLS(nr_successful_dequeues)++; } -- 1.7.7.6 From laijs at cn.fujitsu.com Sat Oct 13 12:08:33 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Sun, 14 Oct 2012 00:08:33 +0800 Subject: [lttng-dev] [PATCH 3/4] test: test for the proper pointer In-Reply-To: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> References: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> Message-ID: <1350144514-4132-3-git-send-email-laijs@cn.fujitsu.com> We should use "if (qnode)" instead of "if (node)" in case of the struct cds_lfq_node_rcu is not the first field of struct node. 
Signed-off-by: Lai Jiangshan --- tests/test_urcu_lfq.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/tests/test_urcu_lfq.c b/tests/test_urcu_lfq.c index 66ddd41..0fcbf55 100644 --- a/tests/test_urcu_lfq.c +++ b/tests/test_urcu_lfq.c @@ -243,14 +243,15 @@ void *thr_dequeuer(void *_count) for (;;) { struct cds_lfq_node_rcu *qnode; - struct test *node; rcu_read_lock(); qnode = cds_lfq_dequeue_rcu(&q); - node = caa_container_of(qnode, struct test, list); rcu_read_unlock(); - if (node) { + if (qnode) { + struct test *node; + + node = caa_container_of(qnode, struct test, list); call_rcu(&node->rcu, free_node_cb); URCU_TLS(nr_successful_dequeues)++; } -- 1.7.7.6 From laijs at cn.fujitsu.com Sat Oct 13 12:08:32 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Sun, 14 Oct 2012 00:08:32 +0800 Subject: [lttng-dev] [PATCH 2/4] test: remove rcu_defer_register_thread() from test_urcu_lfs In-Reply-To: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> References: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> Message-ID: <1350144514-4132-2-git-send-email-laijs@cn.fujitsu.com> test_urcu_lfs has already switch to call_rcu(), rcu_defer_register_thread() is unneeded. 
Signed-off-by: Lai Jiangshan --- tests/test_urcu_lfs.c | 8 -------- 1 files changed, 0 insertions(+), 8 deletions(-) diff --git a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c index 681b654..f5d7d4b 100644 --- a/tests/test_urcu_lfs.c +++ b/tests/test_urcu_lfs.c @@ -67,7 +67,6 @@ static inline pid_t gettid(void) #endif #include #include -#include static volatile int test_go, test_stop; @@ -221,18 +220,12 @@ void free_node_cb(struct rcu_head *head) void *thr_dequeuer(void *_count) { unsigned long long *count = _count; - int ret; printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", "dequeuer", pthread_self(), (unsigned long)gettid()); set_affinity(); - ret = rcu_defer_register_thread(); - if (ret) { - printf("Error in rcu_defer_register_thread\n"); - exit(-1); - } rcu_register_thread(); while (!test_go) @@ -261,7 +254,6 @@ void *thr_dequeuer(void *_count) } rcu_unregister_thread(); - rcu_defer_unregister_thread(); printf_verbose("dequeuer thread_end, thread id : %lx, tid %lu, " "dequeues %llu, successful_dequeues %llu\n", -- 1.7.7.6 From laijs at cn.fujitsu.com Sat Oct 13 12:08:34 2012 From: laijs at cn.fujitsu.com (Lai Jiangshan) Date: Sun, 14 Oct 2012 00:08:34 +0800 Subject: [lttng-dev] [PATCH 4/4] test: remove rcu_defer_register_thread() from test_urcu_lfq In-Reply-To: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> References: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> Message-ID: <1350144514-4132-4-git-send-email-laijs@cn.fujitsu.com> test_urcu_lfq has already switch to call_rcu(), rcu_defer_register_thread() is unneeded. 
Signed-off-by: Lai Jiangshan --- tests/test_urcu_lfq.c | 8 -------- 1 files changed, 0 insertions(+), 8 deletions(-) diff --git a/tests/test_urcu_lfq.c b/tests/test_urcu_lfq.c index 0fcbf55..1204d92 100644 --- a/tests/test_urcu_lfq.c +++ b/tests/test_urcu_lfq.c @@ -67,7 +67,6 @@ static inline pid_t gettid(void) #endif #include #include -#include static volatile int test_go, test_stop; @@ -222,18 +221,12 @@ void free_node_cb(struct rcu_head *head) void *thr_dequeuer(void *_count) { unsigned long long *count = _count; - int ret; printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", "dequeuer", pthread_self(), (unsigned long)gettid()); set_affinity(); - ret = rcu_defer_register_thread(); - if (ret) { - printf("Error in rcu_defer_register_thread\n"); - exit(-1); - } rcu_register_thread(); while (!test_go) @@ -264,7 +257,6 @@ void *thr_dequeuer(void *_count) } rcu_unregister_thread(); - rcu_defer_unregister_thread(); printf_verbose("dequeuer thread_end, thread id : %lx, tid %lu, " "dequeues %llu, successful_dequeues %llu\n", pthread_self(), (unsigned long)gettid(), -- 1.7.7.6 From mathieu.desnoyers at efficios.com Sat Oct 13 12:44:02 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 12:44:02 -0400 Subject: [lttng-dev] [PATCH 1/4] test: test for the proper pointer In-Reply-To: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> References: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> Message-ID: <20121013164402.GE29985@Krystal> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > We should use "if (snode)" instead of "if (node)" in case of > the struct cds_lfs_node_rcu is not the first field of struct node. > > Signed-off-by: Lai Jiangshan merged, thanks! 
Mathieu > --- > tests/test_urcu_lfs.c | 7 ++++--- > 1 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c > index 88bf65d..681b654 100644 > --- a/tests/test_urcu_lfs.c > +++ b/tests/test_urcu_lfs.c > @@ -242,13 +242,14 @@ void *thr_dequeuer(void *_count) > > for (;;) { > struct cds_lfs_node_rcu *snode; > - struct test *node; > > rcu_read_lock(); > snode = cds_lfs_pop_rcu(&s); > - node = caa_container_of(snode, struct test, list); > rcu_read_unlock(); > - if (node) { > + if (snode) { > + struct test *node; > + > + node = caa_container_of(snode, struct test, list); > call_rcu(&node->rcu, free_node_cb); > URCU_TLS(nr_successful_dequeues)++; > } > -- > 1.7.7.6 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 12:46:43 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 12:46:43 -0400 Subject: [lttng-dev] [PATCH 2/4] test: remove rcu_defer_register_thread() from test_urcu_lfs In-Reply-To: <1350144514-4132-2-git-send-email-laijs@cn.fujitsu.com> References: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> <1350144514-4132-2-git-send-email-laijs@cn.fujitsu.com> Message-ID: <20121013164643.GF29985@Krystal> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > test_urcu_lfs has already switch to call_rcu(), rcu_defer_register_thread() > is unneeded. Merged, thanks! 
Mathieu > > Signed-off-by: Lai Jiangshan > --- > tests/test_urcu_lfs.c | 8 -------- > 1 files changed, 0 insertions(+), 8 deletions(-) > > diff --git a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c > index 681b654..f5d7d4b 100644 > --- a/tests/test_urcu_lfs.c > +++ b/tests/test_urcu_lfs.c > @@ -67,7 +67,6 @@ static inline pid_t gettid(void) > #endif > #include > #include > -#include > > static volatile int test_go, test_stop; > > @@ -221,18 +220,12 @@ void free_node_cb(struct rcu_head *head) > void *thr_dequeuer(void *_count) > { > unsigned long long *count = _count; > - int ret; > > printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", > "dequeuer", pthread_self(), (unsigned long)gettid()); > > set_affinity(); > > - ret = rcu_defer_register_thread(); > - if (ret) { > - printf("Error in rcu_defer_register_thread\n"); > - exit(-1); > - } > rcu_register_thread(); > > while (!test_go) > @@ -261,7 +254,6 @@ void *thr_dequeuer(void *_count) > } > > rcu_unregister_thread(); > - rcu_defer_unregister_thread(); > > printf_verbose("dequeuer thread_end, thread id : %lx, tid %lu, " > "dequeues %llu, successful_dequeues %llu\n", > -- > 1.7.7.6 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 12:48:58 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 12:48:58 -0400 Subject: [lttng-dev] [PATCH 3/4] test: test for the proper pointer In-Reply-To: <1350144514-4132-3-git-send-email-laijs@cn.fujitsu.com> References: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> <1350144514-4132-3-git-send-email-laijs@cn.fujitsu.com> Message-ID: <20121013164858.GG29985@Krystal> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > We should use "if (qnode)" instead of "if (node)" in case of > the struct cds_lfq_node_rcu is not the first field of struct node. merged, thanks! 
Mathieu > > Signed-off-by: Lai Jiangshan > --- > tests/test_urcu_lfq.c | 7 ++++--- > 1 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/tests/test_urcu_lfq.c b/tests/test_urcu_lfq.c > index 66ddd41..0fcbf55 100644 > --- a/tests/test_urcu_lfq.c > +++ b/tests/test_urcu_lfq.c > @@ -243,14 +243,15 @@ void *thr_dequeuer(void *_count) > > for (;;) { > struct cds_lfq_node_rcu *qnode; > - struct test *node; > > rcu_read_lock(); > qnode = cds_lfq_dequeue_rcu(&q); > - node = caa_container_of(qnode, struct test, list); > rcu_read_unlock(); > > - if (node) { > + if (qnode) { > + struct test *node; > + > + node = caa_container_of(qnode, struct test, list); > call_rcu(&node->rcu, free_node_cb); > URCU_TLS(nr_successful_dequeues)++; > } > -- > 1.7.7.6 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 12:49:40 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 12:49:40 -0400 Subject: [lttng-dev] [PATCH 4/4] test: remove rcu_defer_register_thread() from test_urcu_lfq In-Reply-To: <1350144514-4132-4-git-send-email-laijs@cn.fujitsu.com> References: <1350144514-4132-1-git-send-email-laijs@cn.fujitsu.com> <1350144514-4132-4-git-send-email-laijs@cn.fujitsu.com> Message-ID: <20121013164940.GH29985@Krystal> * Lai Jiangshan (laijs at cn.fujitsu.com) wrote: > test_urcu_lfq has already switch to call_rcu(), rcu_defer_register_thread() > is unneeded. merged, thanks! 
Mathieu > > Signed-off-by: Lai Jiangshan > --- > tests/test_urcu_lfq.c | 8 -------- > 1 files changed, 0 insertions(+), 8 deletions(-) > > diff --git a/tests/test_urcu_lfq.c b/tests/test_urcu_lfq.c > index 0fcbf55..1204d92 100644 > --- a/tests/test_urcu_lfq.c > +++ b/tests/test_urcu_lfq.c > @@ -67,7 +67,6 @@ static inline pid_t gettid(void) > #endif > #include > #include > -#include > > static volatile int test_go, test_stop; > > @@ -222,18 +221,12 @@ void free_node_cb(struct rcu_head *head) > void *thr_dequeuer(void *_count) > { > unsigned long long *count = _count; > - int ret; > > printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", > "dequeuer", pthread_self(), (unsigned long)gettid()); > > set_affinity(); > > - ret = rcu_defer_register_thread(); > - if (ret) { > - printf("Error in rcu_defer_register_thread\n"); > - exit(-1); > - } > rcu_register_thread(); > > while (!test_go) > @@ -264,7 +257,6 @@ void *thr_dequeuer(void *_count) > } > > rcu_unregister_thread(); > - rcu_defer_unregister_thread(); > printf_verbose("dequeuer thread_end, thread id : %lx, tid %lu, " > "dequeues %llu, successful_dequeues %llu\n", > pthread_self(), (unsigned long)gettid(), > -- > 1.7.7.6 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 13 14:12:31 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 13 Oct 2012 14:12:31 -0400 Subject: [lttng-dev] [RFC] Feedback on 2 development branches: wfstack and lfstack Message-ID: <20121013181231.GA32276@Krystal> Hi! I started modifying the stack APIs to reflect the changes we did for wfcqueue, and also add pop_all() and iterators. 
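For readers who have not met the operation: pop_all() detaches the entire stack in one step, so the caller can then walk the detached nodes privately. A rough sketch with C11 atomics follows. The sk_* names are hypothetical, and this is a generic Treiber-style stack written for illustration, not the urcu wfstack or lfstack code from the branches mentioned in this message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the cds_lfs_* or cds_wfs_* API. */
struct sk_node {
	struct sk_node *next;
};

struct sk_stack {
	_Atomic(struct sk_node *) head;
};

/* Push one node: retry the compare-and-swap until the head is updated. */
static void sk_push(struct sk_stack *s, struct sk_node *n)
{
	struct sk_node *old;

	assert(n != NULL);
	old = atomic_load(&s->head);
	do {
		/* On CAS failure, "old" is refreshed with the current head. */
		n->next = old;
	} while (!atomic_compare_exchange_weak(&s->head, &old, n));
}

/* Detach every node at once by swapping the head with NULL. */
static struct sk_node *sk_pop_all(struct sk_stack *s)
{
	return atomic_exchange(&s->head, (struct sk_node *) NULL);
}

/*
 * Walk a detached list. No synchronization is needed here because the
 * list is no longer reachable through the stack head.
 */
static size_t sk_count(const struct sk_node *first)
{
	size_t n = 0;

	for (; first != NULL; first = first->next)
		n++;
	return n;
}
```

A stack that only offers push and pop_all sidesteps the ABA hazard of a single-node pop, since pop_all never dereferences the old head before swapping it out.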
The volatile dev branches are available at:

http://git.dorsal.polymtl.ca/~compudj?p=userspace-rcu;a=shortlog;h=refs/heads/urcu/lfstack
or
git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
branch: urcu/lfstack

http://git.dorsal.polymtl.ca/~compudj?p=userspace-rcu;a=shortlog;h=refs/heads/urcu/wfstack
or
git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
branch: urcu/wfstack

Sorry for not sending patches, I'm a bit time-constrained this weekend.
I am sending these links right away for review, because I think it might
be good to pull these commits into the master branch before we proceed
to other changes. And I want to minimize the amount of duplicated effort
between Lai and myself.

One of my next steps will be to document the wfstack and lfstack APIs
more thoroughly (similarly to what we did for wfcqueue).

Then, my following step will be to see if I can implement a lfcqueue
API, derived from wfcqueue, but with lock-free enqueue semantics (a mix
of wfcqueue and rculfqueue).

Feedback is welcome!

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From mathieu.desnoyers at efficios.com  Sun Oct 14 13:53:32 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Sun, 14 Oct 2012 13:53:32 -0400
Subject: [lttng-dev] urcu stack and queues updates and documentation
Message-ID: <20121014175332.GA2947@Krystal>

Hi Paul!

I know you are currently looking at documentation of urcu data
structures. I did quite a bit of work in that area these past days. Here
is my plan:

1) I would like to deprecate, at some point, rculfqueue, wfqueue, and
   rculfstack.

2) For wfqueue, we replace it with wfcqueue, currently in the urcu master
   branch.

3) For rculfstack, we replace it with lfstack, available here (volatile
   branch):

   git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
   branch: urcu/lfstack

4) I did documentation improvements (and implemented pop_all as well as
   empty, and iterators) for wfstack here (volatile branch too):

   git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
   branch: urcu/wfstack

5) The last one to look into would be rculfqueue. I'd really like to
   create a lfcqueue derived from wfcqueue if possible. It's the next
   item on my todo list this weekend.

Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From laijs at cn.fujitsu.com  Sun Oct 14 23:00:17 2012
From: laijs at cn.fujitsu.com (Lai Jiangshan)
Date: Mon, 15 Oct 2012 11:00:17 +0800
Subject: [lttng-dev] [RFC] Feedback on 2 development branches: wfstack and lfstack
In-Reply-To: <20121013181231.GA32276@Krystal>
References: <20121013181231.GA32276@Krystal>
Message-ID: <507B7C41.60208@cn.fujitsu.com>

On 10/14/2012 02:12 AM, Mathieu Desnoyers wrote:
> Hi!
>
> I started modifying the stack APIs to reflect the changes we did for
> wfcqueue, and also add pop_all() and iterators.
>
> The volatile dev branches are available at:
>
> http://git.dorsal.polymtl.ca/~compudj?p=userspace-rcu;a=shortlog;h=refs/heads/urcu/lfstack

I don't see the value of introducing struct cds_lfs_head.
It can force the users to use for_each() for the return value of pop_all().
Is it its purpose?

thr_dequeuer() of a8edcc02a25328647f91b4bbe8207e8cdfd317d3 is too complicated.
(I don't see the value of @counter)

do {
	...
	if (test_pop) {
		test_lfs_pop;
	}
	if (test_pop_all) {
		test_lfs_pop_all;
	}
	...
} while();

> or
> git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
> branch: urcu/lfstack
>
> http://git.dorsal.polymtl.ca/~compudj?p=userspace-rcu;a=shortlog;h=refs/heads/urcu/wfstack
> or
> git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
> branch: urcu/wfstack
>
> Sorry for not sending patches, I'm a bit time-constrained this weekend.
> I am sending these links right away for review, because I think it might
> be good to pull these commits into the master branch before we proceed
> to other changes. And I want to minimize the amount of duplicated effort
> between Lai and myself.
>
> One of my next steps will be to document the wfstack and lfstack APIs
> more thoroughly (similarly to what we did for wfcqueue).
>
> Then, my following step will be to see if I can implement a lfcqueue
> API, derived from wfcqueue, but with lock-free enqueue semantics (a mix
> of wfcqueue and rculfqueue).
>
> Feedback is welcome!
>
> Thanks,
>
> Mathieu
>
>

From mathieu.desnoyers at efficios.com  Mon Oct 15 09:28:07 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Mon, 15 Oct 2012 09:28:07 -0400
Subject: [lttng-dev] [RFC] Feedback on 2 development branches: wfstack and lfstack
In-Reply-To: <507B7C41.60208@cn.fujitsu.com>
References: <20121013181231.GA32276@Krystal> <507B7C41.60208@cn.fujitsu.com>
Message-ID: <20121015132807.GA3691@Krystal>

* Lai Jiangshan (laijs at cn.fujitsu.com) wrote:
> On 10/14/2012 02:12 AM, Mathieu Desnoyers wrote:
> > Hi!
> >
> > I started modifying the stack APIs to reflect the changes we did for
> > wfcqueue, and also add pop_all() and iterators.
> >
> > The volatile dev branches are available at:
> >
> > http://git.dorsal.polymtl.ca/~compudj?p=userspace-rcu;a=shortlog;h=refs/heads/urcu/lfstack
>
> I don't see the value of introducing struct cds_lfs_head.
> It can force the users to use for_each() for the return value of pop_all().
> Is it its purpose?

Not exactly. The purpose of struct cds_lfs_head is to ensure that nobody
will attempt to use the for_each() iterators on struct cds_lfs_node
returned by __cds_lfs_pop(). I'm using typing to enforce usage
restrictions.

>
> thr_dequeuer() of a8edcc02a25328647f91b4bbe8207e8cdfd317d3 is too complicated.
> (I don't see the value of @counter)
>
> do {
> 	...
> 	if (test_pop) {
> 		test_lfs_pop;
> 	}
> 	if (test_pop_all) {
> 		test_lfs_pop_all;
> 	}

I will refactor the duplicated code into functions. The reason why I use
a mask on a counter to select pop or pop_all is to ensure that we do the
"loop_sleep(rduration)" delay equally for both pop and pop_all.
Moreover, it will be easier to change the rate of pop vs pop_all in the
future, using a randomization function rather than 50-50 chances.

The update for both lfs and wfs will be pushed shortly.

Thanks!

Mathieu

> 	...
> } while();
>
>
>
> > or
> > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
> > branch: urcu/lfstack
> >
> > http://git.dorsal.polymtl.ca/~compudj?p=userspace-rcu;a=shortlog;h=refs/heads/urcu/wfstack
> > or
> > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu
> > branch: urcu/wfstack
> >
> > Sorry for not sending patches, I'm a bit time-constrained this weekend.
> > I am sending these links right away for review, because I think it might
> > be good to pull these commits into the master branch before we proceed
> > to other changes. And I want to minimize the amount of duplicated effort
> > between Lai and myself.
> >
> > One of my next steps will be to document the wfstack and lfstack APIs
> > more thoroughly (similarly to what we did for wfcqueue).
> >
> > Then, my following step will be to see if I can implement a lfcqueue
> > API, derived from wfcqueue, but with lock-free enqueue semantics (a mix
> > of wfcqueue and rculfqueue).
> >
> > Feedback is welcome!
> >
> > Thanks,
> >
> > Mathieu
> >
> >
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com From dgoulet at efficios.com Mon Oct 15 11:39:18 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 11:39:18 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/4] Rename consumer threads and spawn them in daemon In-Reply-To: <20121013154113.GA29985@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <20121013154113.GA29985@Krystal> Message-ID: <507C2E26.1030401@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> The metadata thread is now created in the lttng-consumerd daemon so all >> thread could be controlled inside the daemon. >> >> This is the first step of a consumer thread refactoring which aims at >> moving data and metadata stream operations inside a dedicated thread so >> the session daemon thread does not block and is more efficient at adding >> streams. >> >> The most important concept is that a stream file descriptor MUST be >> opened as quickly as we can than passed to the right thread (for UST > > than -> then > >> since they are already opened by the session daemon for the kernel). 
>> >> Signed-off-by: David Goulet >> --- >> src/bin/lttng-consumerd/lttng-consumerd.c | 18 ++++++++++----- >> src/common/consumer.c | 34 +++++++++-------------------- >> src/common/consumer.h | 5 +++-- >> 3 files changed, 26 insertions(+), 31 deletions(-) >> >> diff --git a/src/bin/lttng-consumerd/lttng-consumerd.c b/src/bin/lttng-consumerd/lttng-consumerd.c >> index 5952334..946fb02 100644 >> --- a/src/bin/lttng-consumerd/lttng-consumerd.c >> +++ b/src/bin/lttng-consumerd/lttng-consumerd.c >> @@ -356,23 +356,31 @@ int main(int argc, char **argv) >> } >> lttng_consumer_set_error_sock(ctx, ret); >> >> - /* Create the thread to manage the receive of fd */ >> - ret = pthread_create(&threads[0], NULL, lttng_consumer_thread_receive_fds, >> + /* Create thread to manage the polling/writing of trace metadata */ >> + ret = pthread_create(&threads[0], NULL, consumer_thread_metadata_poll, >> + (void *) ctx); >> + if (ret != 0) { >> + perror("pthread_create"); >> + goto error; >> + } >> + >> + /* Create thread to manage the polling/writing of trace data */ >> + ret = pthread_create(&threads[1], NULL, consumer_thread_data_poll, >> (void *) ctx); >> if (ret != 0) { >> perror("pthread_create"); >> goto error; >> } >> >> - /* Create thread to manage the polling/writing of traces */ >> - ret = pthread_create(&threads[1], NULL, lttng_consumer_thread_poll_fds, >> + /* Create the thread to manage the receive of fd */ >> + ret = pthread_create(&threads[2], NULL, consumer_thread_sessiond_poll, >> (void *) ctx); >> if (ret != 0) { >> perror("pthread_create"); >> goto error; >> } >> >> - for (i = 0; i < 2; i++) { >> + for (i = 0; i < 3; i++) { >> ret = pthread_join(threads[i], &status); >> if (ret != 0) { >> perror("pthread_join"); >> diff --git a/src/common/consumer.c b/src/common/consumer.c >> index 242b05b..055de1b 100644 >> --- a/src/common/consumer.c >> +++ b/src/common/consumer.c >> @@ -1131,6 +1131,8 @@ void lttng_consumer_destroy(struct lttng_consumer_local_data *ctx) >> 
PERROR("close"); >> } >> utils_close_pipe(ctx->consumer_splice_metadata_pipe); >> + /* This should trigger the metadata thread to exit */ >> + close(ctx->consumer_metadata_pipe[1]); > > this is adding a close, but did not remove any other remove that might > previously be in place elsewhere. So we have two possible error paths: either the poll thread fails, or the consumer is destroyed by hand even though the threads are working well. Strictly speaking, this close should first check that the fd is valid. To be honest, this is just a shortcut: close(-1) only fails harmlessly with EBADF, and I am ignoring the close error here because we are in the cleanup path anyway, so we don't necessarily care about the perror message. Anyhow, we have to handle both error paths. An if, plus setting the fd to -1 after the close, can be added to avoid confusion. David > > moreover, the close() return value is not tested. >> >> unlink(ctx->consumer_command_sock_path); >> free(ctx); >> @@ -1756,7 +1758,7 @@ error: >> * Thread polls on metadata file descriptor and write them on disk or on the >> * network. >> */ >> -void *lttng_consumer_thread_poll_metadata(void *data) >> +void *consumer_thread_metadata_poll(void *data) >> { >> int ret, i, pollfd; >> uint32_t revents, nb_fd; >> @@ -1939,7 +1941,7 @@ end: >> * This thread polls the fds in the set to consume the data and write >> * it to tracefile if necessary.
>> */ >> -void *lttng_consumer_thread_poll_fds(void *data) >> +void *consumer_thread_data_poll(void *data) >> { >> int num_rdy, num_hup, high_prio, ret, i; >> struct pollfd *pollfd = NULL; >> @@ -1949,19 +1951,9 @@ void *lttng_consumer_thread_poll_fds(void *data) >> int nb_fd = 0; >> struct lttng_consumer_local_data *ctx = data; >> ssize_t len; >> - pthread_t metadata_thread; >> - void *status; >> >> rcu_register_thread(); >> >> - /* Start metadata polling thread */ >> - ret = pthread_create(&metadata_thread, NULL, >> - lttng_consumer_thread_poll_metadata, (void *) ctx); >> - if (ret < 0) { >> - PERROR("pthread_create metadata thread"); >> - goto end; >> - } >> - >> local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); >> >> while (1) { >> @@ -2145,19 +2137,13 @@ end: >> >> /* >> * Close the write side of the pipe so epoll_wait() in >> - * lttng_consumer_thread_poll_metadata can catch it. The thread is >> - * monitoring the read side of the pipe. If we close them both, epoll_wait >> - * strangely does not return and could create a endless wait period if the >> - * pipe is the only tracked fd in the poll set. The thread will take care >> - * of closing the read side. >> + * consumer_thread_metadata_poll can catch it. The thread is monitoring the >> + * read side of the pipe. If we close them both, epoll_wait strangely does >> + * not return and could create a endless wait period if the pipe is the >> + * only tracked fd in the poll set. The thread will take care of closing >> + * the read side. >> */ >> close(ctx->consumer_metadata_pipe[1]); > > this is the second close on the same FD I'm talking about. > > thanks, > > Mathieu > >> - if (ret) { >> - ret = pthread_join(metadata_thread, &status); >> - if (ret < 0) { >> - PERROR("pthread_join metadata thread"); >> - } >> - } >> >> rcu_unregister_thread(); >> return NULL; >> @@ -2167,7 +2153,7 @@ end: >> * This thread listens on the consumerd socket and receives the file >> * descriptors from the session daemon. 
>> */ >> -void *lttng_consumer_thread_receive_fds(void *data) >> +void *consumer_thread_sessiond_poll(void *data) >> { >> int sock, client_socket, ret; >> /* >> diff --git a/src/common/consumer.h b/src/common/consumer.h >> index d0cd8fd..4b225e4 100644 >> --- a/src/common/consumer.h >> +++ b/src/common/consumer.h >> @@ -385,8 +385,9 @@ extern int lttng_consumer_get_produced_snapshot( >> struct lttng_consumer_local_data *ctx, >> struct lttng_consumer_stream *stream, >> unsigned long *pos); >> -extern void *lttng_consumer_thread_poll_fds(void *data); >> -extern void *lttng_consumer_thread_receive_fds(void *data); >> +extern void *consumer_thread_metadata_poll(void *data); >> +extern void *consumer_thread_data_poll(void *data); >> +extern void *consumer_thread_sessiond_poll(void *data); >> extern int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> int sock, struct pollfd *consumer_sockpoll); >> >> -- >> 1.7.10.4 >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From dgoulet at efficios.com Mon Oct 15 11:40:31 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 11:40:31 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <20121013155313.GB29985@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-2-git-send-email-dgoulet@efficios.com> <20121013155313.GB29985@Krystal> Message-ID: <507C2E6F.9050107@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> As a second step of refactoring, upon receiving a data stream, we send >> it to the data thread that is now in charge of handling it. >> >> Furthermore, in order for this to behave correctly, we have to make the >> ustctl actions on the stream upon before passing it to the right thread >> (the kernel does not need special actions.). 
This way, once the sessiond >> thread reply back to the session daemon, the stream is sure to be open >> and ready for data to be recorded on the application side so we avoid a >> race between the application thinking the stream is ready and the stream >> thread still scheduled out. > > Normally, as long as we have a reference on the SHM file descriptor, and > we have the wakeup FD, we should be good to fetch the data of buffers > belonging to an application that has already exited, even if it did so > before the ustctl calls are done. > > So I'm wondering why you do the ustctl calls in the sessiond thread ? It > seems to complexify the implementation needlessly: we could still do the > ustctl calls and output file open at the same location, the > data/metadata threads. Hmmm, it was my understanding that those ustctl_* calls were needed before the trace could start recording, hence doing them as early as possible. Wrong? David > > Thanks, > > Mathieu > >> >> This commit should speed up the add stream process for the session >> daemon. There is still some actions to move out of the session daemon >> poll thread to gain speed significantly, especially for network >> streaming.
>> >> Signed-off-by: David Goulet >> --- >> src/common/consumer.c | 123 +++++++++++--------------- >> src/common/consumer.h | 1 + >> src/common/kernel-consumer/kernel-consumer.c | 24 ++--- >> src/common/ust-consumer/ust-consumer.c | 40 ++++----- >> 4 files changed, 78 insertions(+), 110 deletions(-) >> >> diff --git a/src/common/consumer.c b/src/common/consumer.c >> index 055de1b..1d2b1f7 100644 >> --- a/src/common/consumer.c >> +++ b/src/common/consumer.c >> @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, >> return stream; >> } >> >> -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) >> +void consumer_steal_stream_key(int key, struct lttng_ht *ht) >> { >> struct lttng_consumer_stream *stream; >> >> @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( >> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); >> lttng_ht_node_init_ulong(&stream->node, stream->key); >> >> + /* >> + * The cpu number is needed before using any ustctl_* actions. Ignored for >> + * the kernel so the value does not matter. 
>> + */ >> + pthread_mutex_lock(&consumer_data.lock); >> + stream->cpu = stream->chan->cpucount++; >> + pthread_mutex_unlock(&consumer_data.lock); >> + >> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," >> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, >> stream->shm_fd, stream->wait_fd, >> @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) >> pthread_mutex_lock(&consumer_data.lock); >> rcu_read_lock(); >> >> - switch (consumer_data.type) { >> - case LTTNG_CONSUMER_KERNEL: >> - break; >> - case LTTNG_CONSUMER32_UST: >> - case LTTNG_CONSUMER64_UST: >> - stream->cpu = stream->chan->cpucount++; >> - ret = lttng_ustconsumer_add_stream(stream); >> - if (ret) { >> - ret = -EINVAL; >> - goto error; >> - } >> - >> - /* Steal stream identifier only for UST */ >> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); >> - break; >> - default: >> - ERR("Unknown consumer_data type"); >> - assert(0); >> - ret = -ENOSYS; >> - goto error; >> - } >> - >> lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >> >> /* Check and cleanup relayd */ >> @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) >> consumer_data.stream_count++; >> consumer_data.need_update = 1; >> >> -error: >> rcu_read_unlock(); >> pthread_mutex_unlock(&consumer_data.lock); >> >> @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >> >> DBG3("Consumer delete metadata stream %d", stream->wait_fd); >> >> - if (ht == NULL) { >> - /* Means the stream was allocated but not successfully added */ >> - goto free_stream; >> - } >> - >> - rcu_read_lock(); >> - iter.iter.node = &stream->waitfd_node.node; >> - ret = lttng_ht_del(ht, &iter); >> - assert(!ret); >> - rcu_read_unlock(); >> - >> pthread_mutex_lock(&consumer_data.lock); >> switch (consumer_data.type) { >> case LTTNG_CONSUMER_KERNEL: >> @@ -1613,6 +1587,18 @@ void 
consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >> goto end; >> } >> >> + if (ht == NULL) { >> + pthread_mutex_unlock(&consumer_data.lock); >> + /* Means the stream was allocated but not successfully added */ >> + goto free_stream; >> + } >> + >> + rcu_read_lock(); >> + iter.iter.node = &stream->waitfd_node.node; >> + ret = lttng_ht_del(ht, &iter); >> + assert(!ret); >> + rcu_read_unlock(); >> + >> if (stream->out_fd >= 0) { >> ret = close(stream->out_fd); >> if (ret) { >> @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >> >> pthread_mutex_lock(&consumer_data.lock); >> >> - switch (consumer_data.type) { >> - case LTTNG_CONSUMER_KERNEL: >> - break; >> - case LTTNG_CONSUMER32_UST: >> - case LTTNG_CONSUMER64_UST: >> - ret = lttng_ustconsumer_add_stream(stream); >> - if (ret) { >> - ret = -EINVAL; >> - goto error; >> - } >> - >> - /* Steal stream identifier only for UST */ >> - consumer_steal_stream_key(stream->wait_fd, ht); >> - break; >> - default: >> - ERR("Unknown consumer_data type"); >> - assert(0); >> - ret = -ENOSYS; >> - goto error; >> - } >> - >> /* >> * From here, refcounts are updated so be _careful_ when returning an error >> * after this point. 
>> @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); >> rcu_read_unlock(); >> >> -error: >> pthread_mutex_unlock(&consumer_data.lock); >> return ret; >> } >> @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) >> int num_rdy, num_hup, high_prio, ret, i; >> struct pollfd *pollfd = NULL; >> /* local view of the streams */ >> - struct lttng_consumer_stream **local_stream = NULL; >> + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; >> /* local view of consumer_data.fds_count */ >> int nb_fd = 0; >> struct lttng_consumer_local_data *ctx = data; >> @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) >> */ >> if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { >> size_t pipe_readlen; >> - char tmp; >> >> DBG("consumer_poll_pipe wake up"); >> /* Consume 1 byte of pipe data */ >> do { >> - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); >> + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, >> + sizeof(new_stream)); >> } while (pipe_readlen == -1 && errno == EINTR); >> + >> + /* >> + * If the stream is NULL, just ignore it. It's also possible that >> + * the sessiond poll thread changed the consumer_quit state and is >> + * waking us up to test it. >> + */ >> + if (new_stream == NULL) { >> + continue; >> + } >> + >> + ret = consumer_add_stream(new_stream); >> + if (ret) { >> + ERR("Consumer add stream %d failed. Continuing", >> + new_stream->key); >> + /* >> + * At this point, if the add_stream fails, it is not in the >> + * hash table thus passing the NULL value here. 
>> + */ >> + consumer_del_stream(new_stream, NULL); >> + } >> + >> + /* Continue to update the local streams and handle prio ones */ >> continue; >> } >> >> @@ -2260,19 +2246,16 @@ end: >> consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; >> >> /* >> - * Wake-up the other end by writing a null byte in the pipe >> - * (non-blocking). Important note: Because writing into the >> - * pipe is non-blocking (and therefore we allow dropping wakeup >> - * data, as long as there is wakeup data present in the pipe >> - * buffer to wake up the other end), the other end should >> - * perform the following sequence for waiting: >> - * 1) empty the pipe (reads). >> - * 2) perform update operation. >> - * 3) wait on the pipe (poll). >> + * Notify the data poll thread to poll back again and test the >> + * consumer_quit state to quit gracefully. >> */ >> do { >> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >> + struct lttng_consumer_stream *null_stream = NULL; >> + >> + ret = write(ctx->consumer_poll_pipe[1], &null_stream, >> + sizeof(null_stream)); >> } while (ret < 0 && errno == EINTR); >> + >> rcu_unregister_thread(); >> return NULL; >> } >> diff --git a/src/common/consumer.h b/src/common/consumer.h >> index 4b225e4..8e5891a 100644 >> --- a/src/common/consumer.h >> +++ b/src/common/consumer.h >> @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( >> struct consumer_relayd_sock_pair *consumer_find_relayd(int key); >> int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, >> size_t data_size); >> +void consumer_steal_stream_key(int key, struct lttng_ht *ht); >> >> extern struct lttng_consumer_local_data *lttng_consumer_create( >> enum lttng_consumer_type type, >> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c >> index 13cbe21..444f5e0 100644 >> --- a/src/common/kernel-consumer/kernel-consumer.c >> +++ b/src/common/kernel-consumer/kernel-consumer.c >> @@ 
-235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> consumer_del_stream(new_stream, NULL); >> } >> } else { >> - ret = consumer_add_stream(new_stream); >> - if (ret) { >> - ERR("Consumer add stream %d failed. Continuing", >> - new_stream->key); >> + do { >> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, >> + sizeof(new_stream)); >> + } while (ret < 0 && errno == EINTR); >> + if (ret < 0) { >> + PERROR("write data pipe"); >> consumer_del_stream(new_stream, NULL); >> goto end_nosignal; >> } >> @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> goto end_nosignal; >> } >> >> - /* >> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). >> - * Important note: Because writing into the pipe is non-blocking (and >> - * therefore we allow dropping wakeup data, as long as there is wakeup data >> - * present in the pipe buffer to wake up the other end), the other end >> - * should perform the following sequence for waiting: >> - * >> - * 1) empty the pipe (reads). >> - * 2) perform update operation. >> - * 3) wait on the pipe (poll). >> - */ >> - do { >> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >> - } while (ret < 0 && errno == EINTR); >> end_nosignal: >> rcu_read_unlock(); >> >> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c >> index 1170687..4ca4b84 100644 >> --- a/src/common/ust-consumer/ust-consumer.c >> +++ b/src/common/ust-consumer/ust-consumer.c >> @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> goto end_nosignal; >> } >> >> + /* >> + * This needs to be done as soon as we can so we don't block the >> + * application too long. 
>> + */ >> + ret = lttng_ustconsumer_add_stream(new_stream); >> + if (ret) { >> + consumer_del_stream(new_stream, NULL); >> + goto end_nosignal; >> + } >> + /* Steal stream identifier to avoid having streams with the same key */ >> + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); >> + >> /* The stream is not metadata. Get relayd reference if exists. */ >> relayd = consumer_find_relayd(msg.u.stream.net_index); >> if (relayd != NULL) { >> @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> goto end_nosignal; >> } >> } else { >> - ret = consumer_add_stream(new_stream); >> - if (ret) { >> - ERR("Consumer add stream %d failed. Continuing", >> - new_stream->key); >> - /* >> - * At this point, if the add_stream fails, it is not in the >> - * hash table thus passing the NULL value here. >> - */ >> + do { >> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, >> + sizeof(new_stream)); >> + } while (ret < 0 && errno == EINTR); >> + if (ret < 0) { >> + PERROR("write data pipe"); >> consumer_del_stream(new_stream, NULL); >> goto end_nosignal; >> } >> @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> break; >> } >> >> - /* >> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). >> - * Important note: Because writing into the pipe is non-blocking (and >> - * therefore we allow dropping wakeup data, as long as there is wakeup data >> - * present in the pipe buffer to wake up the other end), the other end >> - * should perform the following sequence for waiting: >> - * >> - * 1) empty the pipe (reads). >> - * 2) perform update operation. >> - * 3) wait on the pipe (poll). 
>> - */ >> - do { >> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >> - } while (ret < 0 && errno == EINTR); >> end_nosignal: >> rcu_read_unlock(); >> >> -- >> 1.7.10.4 >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From dgoulet at efficios.com Mon Oct 15 11:47:31 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 11:47:31 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/4] Make stream hash tables global to the consumer In-Reply-To: <20121013155615.GC29985@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-3-git-send-email-dgoulet@efficios.com> <20121013155615.GC29985@Krystal> Message-ID: <507C3013.2050405@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> The data stream hash table is now global to the consumer and used in the >> data thread. The consumer_data stream_ht is no longer used to track the >> data streams but instead will be used (and possibly renamed) by the >> session daemon poll thread to keep track of streams on a per session id >> basis for the upcoming feature that check traced data availability. >> >> For now, in order to avoid mind bugging problems to access the streams, >> both hash table are now defined globally (metadata and data). However, >> stream update are still done in a single thread. Don't count on this to >> be guaranteed in the next commits. 
>> >> Signed-off-by: David Goulet >> --- >> src/common/consumer.c | 91 +++++++++++++++++++++++++------- >> src/common/consumer.h | 9 ++-- >> src/common/ust-consumer/ust-consumer.c | 2 - >> 3 files changed, 75 insertions(+), 27 deletions(-) >> >> diff --git a/src/common/consumer.c b/src/common/consumer.c >> index 1d2b1f7..1fb9960 100644 >> --- a/src/common/consumer.c >> +++ b/src/common/consumer.c >> @@ -59,6 +59,17 @@ int consumer_poll_timeout = -1; >> volatile int consumer_quit = 0; >> >> /* >> + * The following two hash tables are visible by all threads which are separated >> + * in different source files. >> + * >> + * Global hash table containing respectively metadata and data streams. The >> + * stream element in this ht should only be updated by the metadata poll thread >> + * for the metadata and the data poll thread for the data. >> + */ >> +struct lttng_ht *metadata_ht = NULL; >> +struct lttng_ht *data_ht = NULL; >> + >> +/* >> * Find a stream. The consumer_data.lock must be locked during this >> * call. >> */ >> @@ -433,19 +444,24 @@ end: >> /* >> * Add a stream to the global list protected by a mutex. >> */ >> -int consumer_add_stream(struct lttng_consumer_stream *stream) >> +static int consumer_add_stream(struct lttng_consumer_stream *stream, >> + struct lttng_ht *ht) >> { >> int ret = 0; >> struct consumer_relayd_sock_pair *relayd; >> >> assert(stream); >> + assert(ht); >> >> DBG3("Adding consumer stream %d", stream->key); >> >> pthread_mutex_lock(&consumer_data.lock); >> rcu_read_lock(); >> >> - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >> + /* Steal stream identifier to avoid having streams with the same key */ >> + consumer_steal_stream_key(stream->key, ht); > > I don't understand why suddenly this change is needed. Considering what > this patch should be doing (just moving a ht from per-thread to global), > it should not have any behavior impact. 
We move the steal stream key call from the sessiond thread into the add_stream function, since we no longer use the consumer_data hash table (stream_ht) and instead use a per-thread hash table (global for now, though). If you look below, you'll see that the steal stream key call using the consumer_data stream_ht is removed. This commit makes sure that both consumer_add_stream and consumer_add_metadata_stream steal the stream key if needed. Thanks David > > Thanks, > > Mathieu > >> + >> + lttng_ht_add_unique_ulong(ht, &stream->node); >> >> /* Check and cleanup relayd */ >> relayd = consumer_find_relayd(stream->net_seq_idx); >> @@ -783,9 +799,9 @@ end: >> * >> * Returns the number of fds in the structures. >> */ >> -int consumer_update_poll_array( >> +static int consumer_update_poll_array( >> struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, >> - struct lttng_consumer_stream **local_stream) >> + struct lttng_consumer_stream **local_stream, struct lttng_ht *ht) >> { >> int i = 0; >> struct lttng_ht_iter iter; >> @@ -793,8 +809,7 @@ int consumer_update_poll_array( >> >> DBG("Updating poll fd array"); >> rcu_read_lock(); >> - cds_lfht_for_each_entry(consumer_data.stream_ht->ht, &iter.iter, stream, >> - node.node) { >> + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { >> if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM) { >> continue; >> } >> @@ -1523,6 +1538,33 @@ int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> /* >> * Iterate over all streams of the hashtable and free them properly. >> * >> + * WARNING: *MUST* be used with data stream only.
>> + */ >> +static void destroy_data_stream_ht(struct lttng_ht *ht) >> +{ >> + int ret; >> + struct lttng_ht_iter iter; >> + struct lttng_consumer_stream *stream; >> + >> + if (ht == NULL) { >> + return; >> + } >> + >> + rcu_read_lock(); >> + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { >> + ret = lttng_ht_del(ht, &iter); >> + assert(!ret); >> + >> + call_rcu(&stream->node.head, consumer_free_stream); >> + } >> + rcu_read_unlock(); >> + >> + lttng_ht_destroy(ht); >> +} >> + >> +/* >> + * Iterate over all streams of the hashtable and free them properly. >> + * >> * XXX: Should not be only for metadata stream or else use an other name. >> */ >> static void destroy_stream_ht(struct lttng_ht *ht) >> @@ -1711,6 +1753,9 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >> uatomic_dec(&stream->chan->nb_init_streams); >> } >> >> + /* Steal stream identifier to avoid having streams with the same key */ >> + consumer_steal_stream_key(stream->key, ht); >> + >> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); >> rcu_read_unlock(); >> >> @@ -1729,7 +1774,6 @@ void *consumer_thread_metadata_poll(void *data) >> struct lttng_consumer_stream *stream = NULL; >> struct lttng_ht_iter iter; >> struct lttng_ht_node_ulong *node; >> - struct lttng_ht *metadata_ht = NULL; >> struct lttng_poll_event events; >> struct lttng_consumer_local_data *ctx = data; >> ssize_t len; >> @@ -1738,11 +1782,6 @@ void *consumer_thread_metadata_poll(void *data) >> >> DBG("Thread metadata poll started"); >> >> - metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); >> - if (metadata_ht == NULL) { >> - goto end; >> - } >> - >> /* Size is set to 1 for the consumer_metadata pipe */ >> ret = lttng_poll_create(&events, 2, LTTNG_CLOEXEC); >> if (ret < 0) { >> @@ -1918,6 +1957,11 @@ void *consumer_thread_data_poll(void *data) >> >> rcu_register_thread(); >> >> + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); >> + if (data_ht == NULL) { >> + goto end; >> + } >> + 
>> local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); >> >> while (1) { >> @@ -1955,7 +1999,8 @@ void *consumer_thread_data_poll(void *data) >> pthread_mutex_unlock(&consumer_data.lock); >> goto end; >> } >> - ret = consumer_update_poll_array(ctx, &pollfd, local_stream); >> + ret = consumer_update_poll_array(ctx, &pollfd, local_stream, >> + data_ht); >> if (ret < 0) { >> ERR("Error in allocating pollfd or local_outfds"); >> lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_POLL_ERROR); >> @@ -2015,7 +2060,7 @@ void *consumer_thread_data_poll(void *data) >> continue; >> } >> >> - ret = consumer_add_stream(new_stream); >> + ret = consumer_add_stream(new_stream, data_ht); >> if (ret) { >> ERR("Consumer add stream %d failed. Continuing", >> new_stream->key); >> @@ -2088,22 +2133,19 @@ void *consumer_thread_data_poll(void *data) >> if ((pollfd[i].revents & POLLHUP)) { >> DBG("Polling fd %d tells it has hung up.", pollfd[i].fd); >> if (!local_stream[i]->data_read) { >> - consumer_del_stream(local_stream[i], >> - consumer_data.stream_ht); >> + consumer_del_stream(local_stream[i], data_ht); >> num_hup++; >> } >> } else if (pollfd[i].revents & POLLERR) { >> ERR("Error returned in polling fd %d.", pollfd[i].fd); >> if (!local_stream[i]->data_read) { >> - consumer_del_stream(local_stream[i], >> - consumer_data.stream_ht); >> + consumer_del_stream(local_stream[i], data_ht); >> num_hup++; >> } >> } else if (pollfd[i].revents & POLLNVAL) { >> ERR("Polling fd %d tells fd is not open.", pollfd[i].fd); >> if (!local_stream[i]->data_read) { >> - consumer_del_stream(local_stream[i], >> - consumer_data.stream_ht); >> + consumer_del_stream(local_stream[i], data_ht); >> num_hup++; >> } >> } >> @@ -2131,6 +2173,10 @@ end: >> */ >> close(ctx->consumer_metadata_pipe[1]); >> >> + if (data_ht) { >> + destroy_data_stream_ht(data_ht); >> + } >> + >> rcu_unregister_thread(); >> return NULL; >> } >> @@ -2299,6 +2345,11 @@ void lttng_consumer_init(void) >> consumer_data.stream_ht = 
lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); >> consumer_data.channel_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); >> consumer_data.relayd_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); >> + >> + metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); >> + assert(metadata_ht); >> + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); >> + assert(data_ht); >> } >> >> /* >> diff --git a/src/common/consumer.h b/src/common/consumer.h >> index 8e5891a..6bce96d 100644 >> --- a/src/common/consumer.h >> +++ b/src/common/consumer.h >> @@ -275,6 +275,10 @@ struct lttng_consumer_global_data { >> struct lttng_ht *relayd_ht; >> }; >> >> +/* Defined in consumer.c and coupled with explanations */ >> +extern struct lttng_ht *metadata_ht; >> +extern struct lttng_ht *data_ht; >> + >> /* >> * Init consumer data structures. >> */ >> @@ -324,10 +328,6 @@ extern void lttng_consumer_sync_trace_file( >> */ >> extern int lttng_consumer_poll_socket(struct pollfd *kconsumer_sockpoll); >> >> -extern int consumer_update_poll_array( >> - struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, >> - struct lttng_consumer_stream **local_consumer_streams); >> - >> extern struct lttng_consumer_stream *consumer_allocate_stream( >> int channel_key, int stream_key, >> int shm_fd, int wait_fd, >> @@ -340,7 +340,6 @@ extern struct lttng_consumer_stream *consumer_allocate_stream( >> int net_index, >> int metadata_flag, >> int *alloc_ret); >> -extern int consumer_add_stream(struct lttng_consumer_stream *stream); >> extern void consumer_del_stream(struct lttng_consumer_stream *stream, >> struct lttng_ht *ht); >> extern void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c >> index 4ca4b84..3b41e55 100644 >> --- a/src/common/ust-consumer/ust-consumer.c >> +++ b/src/common/ust-consumer/ust-consumer.c >> @@ -233,8 +233,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >> 
consumer_del_stream(new_stream, NULL); >> goto end_nosignal; >> } >> - /* Steal stream identifier to avoid having streams with the same key */ >> - consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); >> >> /* The stream is not metadata. Get relayd reference if exists. */ >> relayd = consumer_find_relayd(msg.u.stream.net_index); >> -- >> 1.7.10.4 >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From mathieu.desnoyers at efficios.com Mon Oct 15 13:28:08 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 15 Oct 2012 13:28:08 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/4] Rename consumer threads and spawn them in daemon In-Reply-To: <507C2E26.1030401@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <20121013154113.GA29985@Krystal> <507C2E26.1030401@efficios.com> Message-ID: <20121015172807.GA9034@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > > > Mathieu Desnoyers: > > * David Goulet (dgoulet at efficios.com) wrote: > >> The metadata thread is now created in the lttng-consumerd daemon so all > >> thread could be controlled inside the daemon. > >> > >> This is the first step of a consumer thread refactoring which aims at > >> moving data and metadata stream operations inside a dedicated thread so > >> the session daemon thread does not block and is more efficient at adding > >> streams. > >> > >> The most important concept is that a stream file descriptor MUST be > >> opened as quickly as we can than passed to the right thread (for UST > > > > than -> then > > > >> since they are already opened by the session daemon for the kernel). 
> >> > >> Signed-off-by: David Goulet > >> --- > >> src/bin/lttng-consumerd/lttng-consumerd.c | 18 ++++++++++----- > >> src/common/consumer.c | 34 +++++++++-------------------- > >> src/common/consumer.h | 5 +++-- > >> 3 files changed, 26 insertions(+), 31 deletions(-) > >> > >> diff --git a/src/bin/lttng-consumerd/lttng-consumerd.c b/src/bin/lttng-consumerd/lttng-consumerd.c > >> index 5952334..946fb02 100644 > >> --- a/src/bin/lttng-consumerd/lttng-consumerd.c > >> +++ b/src/bin/lttng-consumerd/lttng-consumerd.c > >> @@ -356,23 +356,31 @@ int main(int argc, char **argv) > >> } > >> lttng_consumer_set_error_sock(ctx, ret); > >> > >> - /* Create the thread to manage the receive of fd */ > >> - ret = pthread_create(&threads[0], NULL, lttng_consumer_thread_receive_fds, > >> + /* Create thread to manage the polling/writing of trace metadata */ > >> + ret = pthread_create(&threads[0], NULL, consumer_thread_metadata_poll, > >> + (void *) ctx); > >> + if (ret != 0) { > >> + perror("pthread_create"); > >> + goto error; > >> + } > >> + > >> + /* Create thread to manage the polling/writing of trace data */ > >> + ret = pthread_create(&threads[1], NULL, consumer_thread_data_poll, > >> (void *) ctx); > >> if (ret != 0) { > >> perror("pthread_create"); > >> goto error; > >> } > >> > >> - /* Create thread to manage the polling/writing of traces */ > >> - ret = pthread_create(&threads[1], NULL, lttng_consumer_thread_poll_fds, > >> + /* Create the thread to manage the receive of fd */ > >> + ret = pthread_create(&threads[2], NULL, consumer_thread_sessiond_poll, > >> (void *) ctx); > >> if (ret != 0) { > >> perror("pthread_create"); > >> goto error; > >> } > >> > >> - for (i = 0; i < 2; i++) { > >> + for (i = 0; i < 3; i++) { > >> ret = pthread_join(threads[i], &status); > >> if (ret != 0) { > >> perror("pthread_join"); > >> diff --git a/src/common/consumer.c b/src/common/consumer.c > >> index 242b05b..055de1b 100644 > >> --- a/src/common/consumer.c > >> +++ 
b/src/common/consumer.c > >> @@ -1131,6 +1131,8 @@ void lttng_consumer_destroy(struct lttng_consumer_local_data *ctx) > >> PERROR("close"); > >> } > >> utils_close_pipe(ctx->consumer_splice_metadata_pipe); > >> + /* This should trigger the metadata thread to exit */ > >> + close(ctx->consumer_metadata_pipe[1]); > > > > this is adding a close, but did not remove any other remove that might > > previously be in place elsewhere. > > So we got two possible error path which is either the poll thread fails > or the consumer could be destroy by hand even though the threads are > working well. > > Actually, this close should check if the value is valid and close it. To > be honest, this is just a shortcut since close(-1) does not fail and > ignoring the close error here since we are in the cleanup path anyway so > we don't necessarily care about the perror message. > > Anyhow, we have to handle both error path. An if plus set -1 after close > can be done so not to confuse. if two threads can concurrently perform close on the same fd value, how can you prove there are no possible races ? Mathieu > > David > > > > > moreover, the close() return value is not tested. > >> > >> unlink(ctx->consumer_command_sock_path); > >> free(ctx); > >> @@ -1756,7 +1758,7 @@ error: > >> * Thread polls on metadata file descriptor and write them on disk or on the > >> * network. > >> */ > >> -void *lttng_consumer_thread_poll_metadata(void *data) > >> +void *consumer_thread_metadata_poll(void *data) > >> { > >> int ret, i, pollfd; > >> uint32_t revents, nb_fd; > >> @@ -1939,7 +1941,7 @@ end: > >> * This thread polls the fds in the set to consume the data and write > >> * it to tracefile if necessary. 
> >> */ > >> -void *lttng_consumer_thread_poll_fds(void *data) > >> +void *consumer_thread_data_poll(void *data) > >> { > >> int num_rdy, num_hup, high_prio, ret, i; > >> struct pollfd *pollfd = NULL; > >> @@ -1949,19 +1951,9 @@ void *lttng_consumer_thread_poll_fds(void *data) > >> int nb_fd = 0; > >> struct lttng_consumer_local_data *ctx = data; > >> ssize_t len; > >> - pthread_t metadata_thread; > >> - void *status; > >> > >> rcu_register_thread(); > >> > >> - /* Start metadata polling thread */ > >> - ret = pthread_create(&metadata_thread, NULL, > >> - lttng_consumer_thread_poll_metadata, (void *) ctx); > >> - if (ret < 0) { > >> - PERROR("pthread_create metadata thread"); > >> - goto end; > >> - } > >> - > >> local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); > >> > >> while (1) { > >> @@ -2145,19 +2137,13 @@ end: > >> > >> /* > >> * Close the write side of the pipe so epoll_wait() in > >> - * lttng_consumer_thread_poll_metadata can catch it. The thread is > >> - * monitoring the read side of the pipe. If we close them both, epoll_wait > >> - * strangely does not return and could create a endless wait period if the > >> - * pipe is the only tracked fd in the poll set. The thread will take care > >> - * of closing the read side. > >> + * consumer_thread_metadata_poll can catch it. The thread is monitoring the > >> + * read side of the pipe. If we close them both, epoll_wait strangely does > >> + * not return and could create a endless wait period if the pipe is the > >> + * only tracked fd in the poll set. The thread will take care of closing > >> + * the read side. > >> */ > >> close(ctx->consumer_metadata_pipe[1]); > > > > this is the second close on the same FD I'm talking about. 
> > > > thanks, > > > > Mathieu > > > >> - if (ret) { > >> - ret = pthread_join(metadata_thread, &status); > >> - if (ret < 0) { > >> - PERROR("pthread_join metadata thread"); > >> - } > >> - } > >> > >> rcu_unregister_thread(); > >> return NULL; > >> @@ -2167,7 +2153,7 @@ end: > >> * This thread listens on the consumerd socket and receives the file > >> * descriptors from the session daemon. > >> */ > >> -void *lttng_consumer_thread_receive_fds(void *data) > >> +void *consumer_thread_sessiond_poll(void *data) > >> { > >> int sock, client_socket, ret; > >> /* > >> diff --git a/src/common/consumer.h b/src/common/consumer.h > >> index d0cd8fd..4b225e4 100644 > >> --- a/src/common/consumer.h > >> +++ b/src/common/consumer.h > >> @@ -385,8 +385,9 @@ extern int lttng_consumer_get_produced_snapshot( > >> struct lttng_consumer_local_data *ctx, > >> struct lttng_consumer_stream *stream, > >> unsigned long *pos); > >> -extern void *lttng_consumer_thread_poll_fds(void *data); > >> -extern void *lttng_consumer_thread_receive_fds(void *data); > >> +extern void *consumer_thread_metadata_poll(void *data); > >> +extern void *consumer_thread_data_poll(void *data); > >> +extern void *consumer_thread_sessiond_poll(void *data); > >> extern int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> int sock, struct pollfd *consumer_sockpoll); > >> > >> -- > >> 1.7.10.4 > >> > >> > >> _______________________________________________ > >> lttng-dev mailing list > >> lttng-dev at lists.lttng.org > >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 15 13:31:01 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 15 Oct 2012 13:31:01 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <507C2E6F.9050107@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-2-git-send-email-dgoulet@efficios.com> <20121013155313.GB29985@Krystal> <507C2E6F.9050107@efficios.com> Message-ID: <20121015173101.GA9142@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > > > Mathieu Desnoyers: > > * David Goulet (dgoulet at efficios.com) wrote: > >> As a second step of refactoring, upon receiving a data stream, we send > >> it to the data thread that is now in charge of handling it. > >> > >> Furthermore, in order for this to behave correctly, we have to make the > >> ustctl actions on the stream upon before passing it to the right thread > >> (the kernel does not need special actions.). This way, once the sessiond > >> thread reply back to the session daemon, the stream is sure to be open > >> and ready for data to be recorded on the application side so we avoid a > >> race between the application thinking the stream is ready and the stream > >> thread still scheduled out. > > > > Normally, as long as we have a reference on the SHM file descriptor, and > > we have the wakeup FD, we should be good to fetch the data of buffers > > belonging to an application that has already exited, even if it did so > > before the ustctl calls are done. > > > > So I'm wondering why you do the ustctl calls in the sessiond thread ? It > > seems to complexify the implementation needlessly: we could still do the > > ustctl calls and output file open at the same location, the > > data/metadata threads. > > Hmmm, it was my understanding that does does -> those > ustctl_* calls were needed > before the trace could be recording thus making them quickly. Wrong? 
Can you rephrase your question ? I don't understand. Thanks, Mathieu > > David > > > > > Thanks, > > > > Mathieu > > > >> > >> This commit should speed up the add stream process for the session > >> daemon. There is still some actions to move out of the session daemon > >> poll thread to gain speed significantly, especially for network > >> streaming. > >> > >> Signed-off-by: David Goulet > >> --- > >> src/common/consumer.c | 123 +++++++++++--------------- > >> src/common/consumer.h | 1 + > >> src/common/kernel-consumer/kernel-consumer.c | 24 ++--- > >> src/common/ust-consumer/ust-consumer.c | 40 ++++----- > >> 4 files changed, 78 insertions(+), 110 deletions(-) > >> > >> diff --git a/src/common/consumer.c b/src/common/consumer.c > >> index 055de1b..1d2b1f7 100644 > >> --- a/src/common/consumer.c > >> +++ b/src/common/consumer.c > >> @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, > >> return stream; > >> } > >> > >> -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) > >> +void consumer_steal_stream_key(int key, struct lttng_ht *ht) > >> { > >> struct lttng_consumer_stream *stream; > >> > >> @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( > >> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); > >> lttng_ht_node_init_ulong(&stream->node, stream->key); > >> > >> + /* > >> + * The cpu number is needed before using any ustctl_* actions. Ignored for > >> + * the kernel so the value does not matter. 
> >> + */ > >> + pthread_mutex_lock(&consumer_data.lock); > >> + stream->cpu = stream->chan->cpucount++; > >> + pthread_mutex_unlock(&consumer_data.lock); > >> + > >> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," > >> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, > >> stream->shm_fd, stream->wait_fd, > >> @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > >> pthread_mutex_lock(&consumer_data.lock); > >> rcu_read_lock(); > >> > >> - switch (consumer_data.type) { > >> - case LTTNG_CONSUMER_KERNEL: > >> - break; > >> - case LTTNG_CONSUMER32_UST: > >> - case LTTNG_CONSUMER64_UST: > >> - stream->cpu = stream->chan->cpucount++; > >> - ret = lttng_ustconsumer_add_stream(stream); > >> - if (ret) { > >> - ret = -EINVAL; > >> - goto error; > >> - } > >> - > >> - /* Steal stream identifier only for UST */ > >> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > >> - break; > >> - default: > >> - ERR("Unknown consumer_data type"); > >> - assert(0); > >> - ret = -ENOSYS; > >> - goto error; > >> - } > >> - > >> lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > >> > >> /* Check and cleanup relayd */ > >> @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > >> consumer_data.stream_count++; > >> consumer_data.need_update = 1; > >> > >> -error: > >> rcu_read_unlock(); > >> pthread_mutex_unlock(&consumer_data.lock); > >> > >> @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > >> > >> DBG3("Consumer delete metadata stream %d", stream->wait_fd); > >> > >> - if (ht == NULL) { > >> - /* Means the stream was allocated but not successfully added */ > >> - goto free_stream; > >> - } > >> - > >> - rcu_read_lock(); > >> - iter.iter.node = &stream->waitfd_node.node; > >> - ret = lttng_ht_del(ht, &iter); > >> - assert(!ret); > >> - rcu_read_unlock(); > >> - > >> pthread_mutex_lock(&consumer_data.lock); > 
>> switch (consumer_data.type) { > >> case LTTNG_CONSUMER_KERNEL: > >> @@ -1613,6 +1587,18 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > >> goto end; > >> } > >> > >> + if (ht == NULL) { > >> + pthread_mutex_unlock(&consumer_data.lock); > >> + /* Means the stream was allocated but not successfully added */ > >> + goto free_stream; > >> + } > >> + > >> + rcu_read_lock(); > >> + iter.iter.node = &stream->waitfd_node.node; > >> + ret = lttng_ht_del(ht, &iter); > >> + assert(!ret); > >> + rcu_read_unlock(); > >> + > >> if (stream->out_fd >= 0) { > >> ret = close(stream->out_fd); > >> if (ret) { > >> @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >> > >> pthread_mutex_lock(&consumer_data.lock); > >> > >> - switch (consumer_data.type) { > >> - case LTTNG_CONSUMER_KERNEL: > >> - break; > >> - case LTTNG_CONSUMER32_UST: > >> - case LTTNG_CONSUMER64_UST: > >> - ret = lttng_ustconsumer_add_stream(stream); > >> - if (ret) { > >> - ret = -EINVAL; > >> - goto error; > >> - } > >> - > >> - /* Steal stream identifier only for UST */ > >> - consumer_steal_stream_key(stream->wait_fd, ht); > >> - break; > >> - default: > >> - ERR("Unknown consumer_data type"); > >> - assert(0); > >> - ret = -ENOSYS; > >> - goto error; > >> - } > >> - > >> /* > >> * From here, refcounts are updated so be _careful_ when returning an error > >> * after this point. 
> >> @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > >> rcu_read_unlock(); > >> > >> -error: > >> pthread_mutex_unlock(&consumer_data.lock); > >> return ret; > >> } > >> @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) > >> int num_rdy, num_hup, high_prio, ret, i; > >> struct pollfd *pollfd = NULL; > >> /* local view of the streams */ > >> - struct lttng_consumer_stream **local_stream = NULL; > >> + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; > >> /* local view of consumer_data.fds_count */ > >> int nb_fd = 0; > >> struct lttng_consumer_local_data *ctx = data; > >> @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) > >> */ > >> if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { > >> size_t pipe_readlen; > >> - char tmp; > >> > >> DBG("consumer_poll_pipe wake up"); > >> /* Consume 1 byte of pipe data */ > >> do { > >> - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); > >> + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, > >> + sizeof(new_stream)); > >> } while (pipe_readlen == -1 && errno == EINTR); > >> + > >> + /* > >> + * If the stream is NULL, just ignore it. It's also possible that > >> + * the sessiond poll thread changed the consumer_quit state and is > >> + * waking us up to test it. > >> + */ > >> + if (new_stream == NULL) { > >> + continue; > >> + } > >> + > >> + ret = consumer_add_stream(new_stream); > >> + if (ret) { > >> + ERR("Consumer add stream %d failed. Continuing", > >> + new_stream->key); > >> + /* > >> + * At this point, if the add_stream fails, it is not in the > >> + * hash table thus passing the NULL value here. 
> >> + */ > >> + consumer_del_stream(new_stream, NULL); > >> + } > >> + > >> + /* Continue to update the local streams and handle prio ones */ > >> continue; > >> } > >> > >> @@ -2260,19 +2246,16 @@ end: > >> consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; > >> > >> /* > >> - * Wake-up the other end by writing a null byte in the pipe > >> - * (non-blocking). Important note: Because writing into the > >> - * pipe is non-blocking (and therefore we allow dropping wakeup > >> - * data, as long as there is wakeup data present in the pipe > >> - * buffer to wake up the other end), the other end should > >> - * perform the following sequence for waiting: > >> - * 1) empty the pipe (reads). > >> - * 2) perform update operation. > >> - * 3) wait on the pipe (poll). > >> + * Notify the data poll thread to poll back again and test the > >> + * consumer_quit state to quit gracefully. > >> */ > >> do { > >> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >> + struct lttng_consumer_stream *null_stream = NULL; > >> + > >> + ret = write(ctx->consumer_poll_pipe[1], &null_stream, > >> + sizeof(null_stream)); > >> } while (ret < 0 && errno == EINTR); > >> + > >> rcu_unregister_thread(); > >> return NULL; > >> } > >> diff --git a/src/common/consumer.h b/src/common/consumer.h > >> index 4b225e4..8e5891a 100644 > >> --- a/src/common/consumer.h > >> +++ b/src/common/consumer.h > >> @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( > >> struct consumer_relayd_sock_pair *consumer_find_relayd(int key); > >> int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, > >> size_t data_size); > >> +void consumer_steal_stream_key(int key, struct lttng_ht *ht); > >> > >> extern struct lttng_consumer_local_data *lttng_consumer_create( > >> enum lttng_consumer_type type, > >> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c > >> index 13cbe21..444f5e0 100644 > >> --- 
a/src/common/kernel-consumer/kernel-consumer.c > >> +++ b/src/common/kernel-consumer/kernel-consumer.c > >> @@ -235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> consumer_del_stream(new_stream, NULL); > >> } > >> } else { > >> - ret = consumer_add_stream(new_stream); > >> - if (ret) { > >> - ERR("Consumer add stream %d failed. Continuing", > >> - new_stream->key); > >> + do { > >> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > >> + sizeof(new_stream)); > >> + } while (ret < 0 && errno == EINTR); > >> + if (ret < 0) { > >> + PERROR("write data pipe"); > >> consumer_del_stream(new_stream, NULL); > >> goto end_nosignal; > >> } > >> @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> goto end_nosignal; > >> } > >> > >> - /* > >> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > >> - * Important note: Because writing into the pipe is non-blocking (and > >> - * therefore we allow dropping wakeup data, as long as there is wakeup data > >> - * present in the pipe buffer to wake up the other end), the other end > >> - * should perform the following sequence for waiting: > >> - * > >> - * 1) empty the pipe (reads). > >> - * 2) perform update operation. > >> - * 3) wait on the pipe (poll). > >> - */ > >> - do { > >> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >> - } while (ret < 0 && errno == EINTR); > >> end_nosignal: > >> rcu_read_unlock(); > >> > >> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > >> index 1170687..4ca4b84 100644 > >> --- a/src/common/ust-consumer/ust-consumer.c > >> +++ b/src/common/ust-consumer/ust-consumer.c > >> @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> goto end_nosignal; > >> } > >> > >> + /* > >> + * This needs to be done as soon as we can so we don't block the > >> + * application too long. 
> >> + */ > >> + ret = lttng_ustconsumer_add_stream(new_stream); > >> + if (ret) { > >> + consumer_del_stream(new_stream, NULL); > >> + goto end_nosignal; > >> + } > >> + /* Steal stream identifier to avoid having streams with the same key */ > >> + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); > >> + > >> /* The stream is not metadata. Get relayd reference if exists. */ > >> relayd = consumer_find_relayd(msg.u.stream.net_index); > >> if (relayd != NULL) { > >> @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> goto end_nosignal; > >> } > >> } else { > >> - ret = consumer_add_stream(new_stream); > >> - if (ret) { > >> - ERR("Consumer add stream %d failed. Continuing", > >> - new_stream->key); > >> - /* > >> - * At this point, if the add_stream fails, it is not in the > >> - * hash table thus passing the NULL value here. > >> - */ > >> + do { > >> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > >> + sizeof(new_stream)); > >> + } while (ret < 0 && errno == EINTR); > >> + if (ret < 0) { > >> + PERROR("write data pipe"); > >> consumer_del_stream(new_stream, NULL); > >> goto end_nosignal; > >> } > >> @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> break; > >> } > >> > >> - /* > >> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > >> - * Important note: Because writing into the pipe is non-blocking (and > >> - * therefore we allow dropping wakeup data, as long as there is wakeup data > >> - * present in the pipe buffer to wake up the other end), the other end > >> - * should perform the following sequence for waiting: > >> - * > >> - * 1) empty the pipe (reads). > >> - * 2) perform update operation. > >> - * 3) wait on the pipe (poll). 
> >> - */ > >> - do { > >> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >> - } while (ret < 0 && errno == EINTR); > >> end_nosignal: > >> rcu_read_unlock(); > >> > >> -- > >> 1.7.10.4 > >> > >> > >> _______________________________________________ > >> lttng-dev mailing list > >> lttng-dev at lists.lttng.org > >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Mon Oct 15 13:37:05 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 13:37:05 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/4] Rename consumer threads and spawn them in daemon In-Reply-To: <20121015172807.GA9034@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <20121013154113.GA29985@Krystal> <507C2E26.1030401@efficios.com> <20121015172807.GA9034@Krystal> Message-ID: <507C49C1.3000801@efficios.com> Mathieu Desnoyers: >>>> diff --git a/src/common/consumer.c b/src/common/consumer.c >>>> index 242b05b..055de1b 100644 >>>> --- a/src/common/consumer.c >>>> +++ b/src/common/consumer.c >>>> @@ -1131,6 +1131,8 @@ void lttng_consumer_destroy(struct lttng_consumer_local_data *ctx) >>>> PERROR("close"); >>>> } >>>> utils_close_pipe(ctx->consumer_splice_metadata_pipe); >>>> + /* This should trigger the metadata thread to exit */ >>>> + close(ctx->consumer_metadata_pipe[1]); >>> >>> this is adding a close, but did not remove any other remove that might >>> previously be in place elsewhere. >> >> So we got two possible error path which is either the poll thread fails >> or the consumer could be destroy by hand even though the threads are >> working well. >> >> Actually, this close should check if the value is valid and close it. 
To >> be honest, this is just a shortcut since close(-1) does not fail and >> ignoring the close error here since we are in the cleanup path anyway so >> we don't necessarily care about the perror message. >> >> Anyhow, we have to handle both error path. An if plus set -1 after close >> can be done so not to confuse. > > if two threads can concurrently perform close on the same fd value, how > can you prove there are no possible races ? Nothing to prove, the race is possible. The point I was trying to explain is that it does not matter actually since we are in a cleanup code path. Anyway, let's remove it since the data thread, when dying, will close the metadata pipe anyway. This will avoid more discussion for this small detail :). David > > Mathieu > From mathieu.desnoyers at efficios.com Mon Oct 15 13:39:33 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 15 Oct 2012 13:39:33 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/4] Rename consumer threads and spawn them in daemon In-Reply-To: <507C49C1.3000801@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <20121013154113.GA29985@Krystal> <507C2E26.1030401@efficios.com> <20121015172807.GA9034@Krystal> <507C49C1.3000801@efficios.com> Message-ID: <20121015173933.GB9142@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > > > Mathieu Desnoyers: > >>>> diff --git a/src/common/consumer.c b/src/common/consumer.c > >>>> index 242b05b..055de1b 100644 > >>>> --- a/src/common/consumer.c > >>>> +++ b/src/common/consumer.c > >>>> @@ -1131,6 +1131,8 @@ void lttng_consumer_destroy(struct lttng_consumer_local_data *ctx) > >>>> PERROR("close"); > >>>> } > >>>> utils_close_pipe(ctx->consumer_splice_metadata_pipe); > >>>> + /* This should trigger the metadata thread to exit */ > >>>> + close(ctx->consumer_metadata_pipe[1]); > >>> > >>> this is adding a close, but did not remove any other remove that might > >>> previously be in place elsewhere. 
> >> > >> So we got two possible error path which is either the poll thread fails > >> or the consumer could be destroy by hand even though the threads are > >> working well. > >> > >> Actually, this close should check if the value is valid and close it. To > >> be honest, this is just a shortcut since close(-1) does not fail and > >> ignoring the close error here since we are in the cleanup path anyway so > >> we don't necessarily care about the perror message. > >> > >> Anyhow, we have to handle both error path. An if plus set -1 after close > >> can be done so not to confuse. > > > > if two threads can concurrently perform close on the same fd value, how > > can you prove there are no possible races ? > > Nothing to prove, the race is possible. The point I was trying to > explain is that it does not matter actually since we are in a cleanup > code path. What happens if we close FD 0, 1, 2, or another FD because of this race? What happens if our code evolves to restart threads after errors and we leave this race in place, so that we occasionally close random file descriptors in a way that is hard to reproduce? > Anyway, let's remove it since the data thread, when dying, will close > the metadata pipe anyway. My point exactly :) > > This will avoid more discussion for this small detail :). The devil is in the details, as someone famous said before me. Thanks, Mathieu > > David > > > > > Mathieu > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc.
http://www.efficios.com From dgoulet at efficios.com Mon Oct 15 13:40:52 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 13:40:52 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <20121015173101.GA9142@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-2-git-send-email-dgoulet@efficios.com> <20121013155313.GB29985@Krystal> <507C2E6F.9050107@efficios.com> <20121015173101.GA9142@Krystal> Message-ID: <507C4AA4.8080901@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> >> >> Mathieu Desnoyers: >>> * David Goulet (dgoulet at efficios.com) wrote: >>>> As a second step of refactoring, upon receiving a data stream, we send >>>> it to the data thread that is now in charge of handling it. >>>> >>>> Furthermore, in order for this to behave correctly, we have to make the >>>> ustctl actions on the stream upon before passing it to the right thread >>>> (the kernel does not need special actions.). This way, once the sessiond >>>> thread reply back to the session daemon, the stream is sure to be open >>>> and ready for data to be recorded on the application side so we avoid a >>>> race between the application thinking the stream is ready and the stream >>>> thread still scheduled out. >>> >>> Normally, as long as we have a reference on the SHM file descriptor, and >>> we have the wakeup FD, we should be good to fetch the data of buffers >>> belonging to an application that has already exited, even if it did so >>> before the ustctl calls are done. >>> >>> So I'm wondering why you do the ustctl calls in the sessiond thread ? It >>> seems to complexify the implementation needlessly: we could still do the >>> ustctl calls and output file open at the same location, the >>> data/metadata threads. 
>> >> Hmmm, it was my understanding that does > > does -> those > >> ustctl_* calls were needed >> before the trace could be recording thus making them quickly. Wrong? > > Can you rephrase your question ? I don't understand. > My understanding was that _those_ ustctl calls need to be done before the tracer could start recording data. This is why they were moved to the session daemon thread. Am I wrong here? When receiving a UST stream on the consumer side, is the SHM reference already acquired? David > Thanks, > > Mathieu > >> >> David >> >>> >>> Thanks, >>> >>> Mathieu >>> >>>> >>>> This commit should speed up the add stream process for the session >>>> daemon. There is still some actions to move out of the session daemon >>>> poll thread to gain speed significantly, especially for network >>>> streaming. >>>> >>>> Signed-off-by: David Goulet >>>> --- >>>> src/common/consumer.c | 123 +++++++++++--------------- >>>> src/common/consumer.h | 1 + >>>> src/common/kernel-consumer/kernel-consumer.c | 24 ++--- >>>> src/common/ust-consumer/ust-consumer.c | 40 ++++----- >>>> 4 files changed, 78 insertions(+), 110 deletions(-) >>>> >>>> diff --git a/src/common/consumer.c b/src/common/consumer.c >>>> index 055de1b..1d2b1f7 100644 >>>> --- a/src/common/consumer.c >>>> +++ b/src/common/consumer.c >>>> @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, >>>> return stream; >>>> } >>>> >>>> -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) >>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht) >>>> { >>>> struct lttng_consumer_stream *stream; >>>> >>>> @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( >>>> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); >>>> lttng_ht_node_init_ulong(&stream->node, stream->key); >>>> >>>> + /* >>>> + * The cpu number is needed before using any ustctl_* actions. Ignored for >>>> + * the kernel so the value does not matter.
>>>> + */ >>>> + pthread_mutex_lock(&consumer_data.lock); >>>> + stream->cpu = stream->chan->cpucount++; >>>> + pthread_mutex_unlock(&consumer_data.lock); >>>> + >>>> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," >>>> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, >>>> stream->shm_fd, stream->wait_fd, >>>> @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) >>>> pthread_mutex_lock(&consumer_data.lock); >>>> rcu_read_lock(); >>>> >>>> - switch (consumer_data.type) { >>>> - case LTTNG_CONSUMER_KERNEL: >>>> - break; >>>> - case LTTNG_CONSUMER32_UST: >>>> - case LTTNG_CONSUMER64_UST: >>>> - stream->cpu = stream->chan->cpucount++; >>>> - ret = lttng_ustconsumer_add_stream(stream); >>>> - if (ret) { >>>> - ret = -EINVAL; >>>> - goto error; >>>> - } >>>> - >>>> - /* Steal stream identifier only for UST */ >>>> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); >>>> - break; >>>> - default: >>>> - ERR("Unknown consumer_data type"); >>>> - assert(0); >>>> - ret = -ENOSYS; >>>> - goto error; >>>> - } >>>> - >>>> lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >>>> >>>> /* Check and cleanup relayd */ >>>> @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) >>>> consumer_data.stream_count++; >>>> consumer_data.need_update = 1; >>>> >>>> -error: >>>> rcu_read_unlock(); >>>> pthread_mutex_unlock(&consumer_data.lock); >>>> >>>> @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >>>> >>>> DBG3("Consumer delete metadata stream %d", stream->wait_fd); >>>> >>>> - if (ht == NULL) { >>>> - /* Means the stream was allocated but not successfully added */ >>>> - goto free_stream; >>>> - } >>>> - >>>> - rcu_read_lock(); >>>> - iter.iter.node = &stream->waitfd_node.node; >>>> - ret = lttng_ht_del(ht, &iter); >>>> - assert(!ret); >>>> - rcu_read_unlock(); >>>> - >>>> pthread_mutex_lock(&consumer_data.lock); 
>>>> switch (consumer_data.type) { >>>> case LTTNG_CONSUMER_KERNEL: >>>> @@ -1613,6 +1587,18 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >>>> goto end; >>>> } >>>> >>>> + if (ht == NULL) { >>>> + pthread_mutex_unlock(&consumer_data.lock); >>>> + /* Means the stream was allocated but not successfully added */ >>>> + goto free_stream; >>>> + } >>>> + >>>> + rcu_read_lock(); >>>> + iter.iter.node = &stream->waitfd_node.node; >>>> + ret = lttng_ht_del(ht, &iter); >>>> + assert(!ret); >>>> + rcu_read_unlock(); >>>> + >>>> if (stream->out_fd >= 0) { >>>> ret = close(stream->out_fd); >>>> if (ret) { >>>> @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >>>> >>>> pthread_mutex_lock(&consumer_data.lock); >>>> >>>> - switch (consumer_data.type) { >>>> - case LTTNG_CONSUMER_KERNEL: >>>> - break; >>>> - case LTTNG_CONSUMER32_UST: >>>> - case LTTNG_CONSUMER64_UST: >>>> - ret = lttng_ustconsumer_add_stream(stream); >>>> - if (ret) { >>>> - ret = -EINVAL; >>>> - goto error; >>>> - } >>>> - >>>> - /* Steal stream identifier only for UST */ >>>> - consumer_steal_stream_key(stream->wait_fd, ht); >>>> - break; >>>> - default: >>>> - ERR("Unknown consumer_data type"); >>>> - assert(0); >>>> - ret = -ENOSYS; >>>> - goto error; >>>> - } >>>> - >>>> /* >>>> * From here, refcounts are updated so be _careful_ when returning an error >>>> * after this point. 
>>>> @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >>>> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); >>>> rcu_read_unlock(); >>>> >>>> -error: >>>> pthread_mutex_unlock(&consumer_data.lock); >>>> return ret; >>>> } >>>> @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) >>>> int num_rdy, num_hup, high_prio, ret, i; >>>> struct pollfd *pollfd = NULL; >>>> /* local view of the streams */ >>>> - struct lttng_consumer_stream **local_stream = NULL; >>>> + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; >>>> /* local view of consumer_data.fds_count */ >>>> int nb_fd = 0; >>>> struct lttng_consumer_local_data *ctx = data; >>>> @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) >>>> */ >>>> if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { >>>> size_t pipe_readlen; >>>> - char tmp; >>>> >>>> DBG("consumer_poll_pipe wake up"); >>>> /* Consume 1 byte of pipe data */ >>>> do { >>>> - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); >>>> + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, >>>> + sizeof(new_stream)); >>>> } while (pipe_readlen == -1 && errno == EINTR); >>>> + >>>> + /* >>>> + * If the stream is NULL, just ignore it. It's also possible that >>>> + * the sessiond poll thread changed the consumer_quit state and is >>>> + * waking us up to test it. >>>> + */ >>>> + if (new_stream == NULL) { >>>> + continue; >>>> + } >>>> + >>>> + ret = consumer_add_stream(new_stream); >>>> + if (ret) { >>>> + ERR("Consumer add stream %d failed. Continuing", >>>> + new_stream->key); >>>> + /* >>>> + * At this point, if the add_stream fails, it is not in the >>>> + * hash table thus passing the NULL value here. 
>>>> + */ >>>> + consumer_del_stream(new_stream, NULL); >>>> + } >>>> + >>>> + /* Continue to update the local streams and handle prio ones */ >>>> continue; >>>> } >>>> >>>> @@ -2260,19 +2246,16 @@ end: >>>> consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; >>>> >>>> /* >>>> - * Wake-up the other end by writing a null byte in the pipe >>>> - * (non-blocking). Important note: Because writing into the >>>> - * pipe is non-blocking (and therefore we allow dropping wakeup >>>> - * data, as long as there is wakeup data present in the pipe >>>> - * buffer to wake up the other end), the other end should >>>> - * perform the following sequence for waiting: >>>> - * 1) empty the pipe (reads). >>>> - * 2) perform update operation. >>>> - * 3) wait on the pipe (poll). >>>> + * Notify the data poll thread to poll back again and test the >>>> + * consumer_quit state to quit gracefully. >>>> */ >>>> do { >>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >>>> + struct lttng_consumer_stream *null_stream = NULL; >>>> + >>>> + ret = write(ctx->consumer_poll_pipe[1], &null_stream, >>>> + sizeof(null_stream)); >>>> } while (ret < 0 && errno == EINTR); >>>> + >>>> rcu_unregister_thread(); >>>> return NULL; >>>> } >>>> diff --git a/src/common/consumer.h b/src/common/consumer.h >>>> index 4b225e4..8e5891a 100644 >>>> --- a/src/common/consumer.h >>>> +++ b/src/common/consumer.h >>>> @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( >>>> struct consumer_relayd_sock_pair *consumer_find_relayd(int key); >>>> int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, >>>> size_t data_size); >>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht); >>>> >>>> extern struct lttng_consumer_local_data *lttng_consumer_create( >>>> enum lttng_consumer_type type, >>>> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c >>>> index 13cbe21..444f5e0 100644 >>>> --- 
a/src/common/kernel-consumer/kernel-consumer.c >>>> +++ b/src/common/kernel-consumer/kernel-consumer.c >>>> @@ -235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>> consumer_del_stream(new_stream, NULL); >>>> } >>>> } else { >>>> - ret = consumer_add_stream(new_stream); >>>> - if (ret) { >>>> - ERR("Consumer add stream %d failed. Continuing", >>>> - new_stream->key); >>>> + do { >>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, >>>> + sizeof(new_stream)); >>>> + } while (ret < 0 && errno == EINTR); >>>> + if (ret < 0) { >>>> + PERROR("write data pipe"); >>>> consumer_del_stream(new_stream, NULL); >>>> goto end_nosignal; >>>> } >>>> @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>> goto end_nosignal; >>>> } >>>> >>>> - /* >>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). >>>> - * Important note: Because writing into the pipe is non-blocking (and >>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data >>>> - * present in the pipe buffer to wake up the other end), the other end >>>> - * should perform the following sequence for waiting: >>>> - * >>>> - * 1) empty the pipe (reads). >>>> - * 2) perform update operation. >>>> - * 3) wait on the pipe (poll). >>>> - */ >>>> - do { >>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >>>> - } while (ret < 0 && errno == EINTR); >>>> end_nosignal: >>>> rcu_read_unlock(); >>>> >>>> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c >>>> index 1170687..4ca4b84 100644 >>>> --- a/src/common/ust-consumer/ust-consumer.c >>>> +++ b/src/common/ust-consumer/ust-consumer.c >>>> @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>> goto end_nosignal; >>>> } >>>> >>>> + /* >>>> + * This needs to be done as soon as we can so we don't block the >>>> + * application too long. 
>>>> + */ >>>> + ret = lttng_ustconsumer_add_stream(new_stream); >>>> + if (ret) { >>>> + consumer_del_stream(new_stream, NULL); >>>> + goto end_nosignal; >>>> + } >>>> + /* Steal stream identifier to avoid having streams with the same key */ >>>> + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); >>>> + >>>> /* The stream is not metadata. Get relayd reference if exists. */ >>>> relayd = consumer_find_relayd(msg.u.stream.net_index); >>>> if (relayd != NULL) { >>>> @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>> goto end_nosignal; >>>> } >>>> } else { >>>> - ret = consumer_add_stream(new_stream); >>>> - if (ret) { >>>> - ERR("Consumer add stream %d failed. Continuing", >>>> - new_stream->key); >>>> - /* >>>> - * At this point, if the add_stream fails, it is not in the >>>> - * hash table thus passing the NULL value here. >>>> - */ >>>> + do { >>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, >>>> + sizeof(new_stream)); >>>> + } while (ret < 0 && errno == EINTR); >>>> + if (ret < 0) { >>>> + PERROR("write data pipe"); >>>> consumer_del_stream(new_stream, NULL); >>>> goto end_nosignal; >>>> } >>>> @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>> break; >>>> } >>>> >>>> - /* >>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). >>>> - * Important note: Because writing into the pipe is non-blocking (and >>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data >>>> - * present in the pipe buffer to wake up the other end), the other end >>>> - * should perform the following sequence for waiting: >>>> - * >>>> - * 1) empty the pipe (reads). >>>> - * 2) perform update operation. >>>> - * 3) wait on the pipe (poll). 
>>>> - */ >>>> - do { >>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >>>> - } while (ret < 0 && errno == EINTR); >>>> end_nosignal: >>>> rcu_read_unlock(); >>>> >>>> -- >>>> 1.7.10.4 >>>> >>>> >>>> _______________________________________________ >>>> lttng-dev mailing list >>>> lttng-dev at lists.lttng.org >>>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >>> > From mathieu.desnoyers at efficios.com Mon Oct 15 13:42:32 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 15 Oct 2012 13:42:32 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <507C4AA4.8080901@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-2-git-send-email-dgoulet@efficios.com> <20121013155313.GB29985@Krystal> <507C2E6F.9050107@efficios.com> <20121015173101.GA9142@Krystal> <507C4AA4.8080901@efficios.com> Message-ID: <20121015174232.GA9258@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > > > Mathieu Desnoyers: > > * David Goulet (dgoulet at efficios.com) wrote: > >> > >> > >> Mathieu Desnoyers: > >>> * David Goulet (dgoulet at efficios.com) wrote: > >>>> As a second step of refactoring, upon receiving a data stream, we send > >>>> it to the data thread that is now in charge of handling it. > >>>> > >>>> Furthermore, in order for this to behave correctly, we have to make the > >>>> ustctl actions on the stream upon before passing it to the right thread > >>>> (the kernel does not need special actions.). This way, once the sessiond > >>>> thread reply back to the session daemon, the stream is sure to be open > >>>> and ready for data to be recorded on the application side so we avoid a > >>>> race between the application thinking the stream is ready and the stream > >>>> thread still scheduled out. 
> >>> > >>> Normally, as long as we have a reference on the SHM file descriptor, and > >>> we have the wakeup FD, we should be good to fetch the data of buffers > >>> belonging to an application that has already exited, even if it did so > >>> before the ustctl calls are done. > >>> > >>> So I'm wondering why you do the ustctl calls in the sessiond thread ? It > >>> seems to complexify the implementation needlessly: we could still do the > >>> ustctl calls and output file open at the same location, the > >>> data/metadata threads. > >> > >> Hmmm, it was my understanding that does > > > > does -> those > > > >> ustctl_* calls were needed > >> before the trace could be recording thus making them quickly. Wrong? > > > > Can you rephrase your question ? I don't understand. > > > > My understanding was that _those_ ustctl calls need to be done before > the tracer could start recording data. This is why they were moved to > the session daemon thread. > > Am I wrong here? When receiving an UST stream< on the consumer side, is > the SHM reference already acquired? yes, the reference to shm is already acquired: it's the FD that _has_ the reference. Thanks, Mathieu > > David > > > Thanks, > > > > Mathieu > > > >> > >> David > >> > >>> > >>> Thanks, > >>> > >>> Mathieu > >>> > >>>> > >>>> This commit should speed up the add stream process for the session > >>>> daemon. There is still some actions to move out of the session daemon > >>>> poll thread to gain speed significantly, especially for network > >>>> streaming. 
> >>>> > >>>> Signed-off-by: David Goulet > >>>> --- > >>>> src/common/consumer.c | 123 +++++++++++--------------- > >>>> src/common/consumer.h | 1 + > >>>> src/common/kernel-consumer/kernel-consumer.c | 24 ++--- > >>>> src/common/ust-consumer/ust-consumer.c | 40 ++++----- > >>>> 4 files changed, 78 insertions(+), 110 deletions(-) > >>>> > >>>> diff --git a/src/common/consumer.c b/src/common/consumer.c > >>>> index 055de1b..1d2b1f7 100644 > >>>> --- a/src/common/consumer.c > >>>> +++ b/src/common/consumer.c > >>>> @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, > >>>> return stream; > >>>> } > >>>> > >>>> -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) > >>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht) > >>>> { > >>>> struct lttng_consumer_stream *stream; > >>>> > >>>> @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( > >>>> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); > >>>> lttng_ht_node_init_ulong(&stream->node, stream->key); > >>>> > >>>> + /* > >>>> + * The cpu number is needed before using any ustctl_* actions. Ignored for > >>>> + * the kernel so the value does not matter. 
> >>>> + */ > >>>> + pthread_mutex_lock(&consumer_data.lock); > >>>> + stream->cpu = stream->chan->cpucount++; > >>>> + pthread_mutex_unlock(&consumer_data.lock); > >>>> + > >>>> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," > >>>> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, > >>>> stream->shm_fd, stream->wait_fd, > >>>> @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > >>>> pthread_mutex_lock(&consumer_data.lock); > >>>> rcu_read_lock(); > >>>> > >>>> - switch (consumer_data.type) { > >>>> - case LTTNG_CONSUMER_KERNEL: > >>>> - break; > >>>> - case LTTNG_CONSUMER32_UST: > >>>> - case LTTNG_CONSUMER64_UST: > >>>> - stream->cpu = stream->chan->cpucount++; > >>>> - ret = lttng_ustconsumer_add_stream(stream); > >>>> - if (ret) { > >>>> - ret = -EINVAL; > >>>> - goto error; > >>>> - } > >>>> - > >>>> - /* Steal stream identifier only for UST */ > >>>> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > >>>> - break; > >>>> - default: > >>>> - ERR("Unknown consumer_data type"); > >>>> - assert(0); > >>>> - ret = -ENOSYS; > >>>> - goto error; > >>>> - } > >>>> - > >>>> lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > >>>> > >>>> /* Check and cleanup relayd */ > >>>> @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > >>>> consumer_data.stream_count++; > >>>> consumer_data.need_update = 1; > >>>> > >>>> -error: > >>>> rcu_read_unlock(); > >>>> pthread_mutex_unlock(&consumer_data.lock); > >>>> > >>>> @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > >>>> > >>>> DBG3("Consumer delete metadata stream %d", stream->wait_fd); > >>>> > >>>> - if (ht == NULL) { > >>>> - /* Means the stream was allocated but not successfully added */ > >>>> - goto free_stream; > >>>> - } > >>>> - > >>>> - rcu_read_lock(); > >>>> - iter.iter.node = &stream->waitfd_node.node; > >>>> - ret = 
lttng_ht_del(ht, &iter); > >>>> - assert(!ret); > >>>> - rcu_read_unlock(); > >>>> - > >>>> pthread_mutex_lock(&consumer_data.lock); > >>>> switch (consumer_data.type) { > >>>> case LTTNG_CONSUMER_KERNEL: > >>>> @@ -1613,6 +1587,18 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > >>>> goto end; > >>>> } > >>>> > >>>> + if (ht == NULL) { > >>>> + pthread_mutex_unlock(&consumer_data.lock); > >>>> + /* Means the stream was allocated but not successfully added */ > >>>> + goto free_stream; > >>>> + } > >>>> + > >>>> + rcu_read_lock(); > >>>> + iter.iter.node = &stream->waitfd_node.node; > >>>> + ret = lttng_ht_del(ht, &iter); > >>>> + assert(!ret); > >>>> + rcu_read_unlock(); > >>>> + > >>>> if (stream->out_fd >= 0) { > >>>> ret = close(stream->out_fd); > >>>> if (ret) { > >>>> @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >>>> > >>>> pthread_mutex_lock(&consumer_data.lock); > >>>> > >>>> - switch (consumer_data.type) { > >>>> - case LTTNG_CONSUMER_KERNEL: > >>>> - break; > >>>> - case LTTNG_CONSUMER32_UST: > >>>> - case LTTNG_CONSUMER64_UST: > >>>> - ret = lttng_ustconsumer_add_stream(stream); > >>>> - if (ret) { > >>>> - ret = -EINVAL; > >>>> - goto error; > >>>> - } > >>>> - > >>>> - /* Steal stream identifier only for UST */ > >>>> - consumer_steal_stream_key(stream->wait_fd, ht); > >>>> - break; > >>>> - default: > >>>> - ERR("Unknown consumer_data type"); > >>>> - assert(0); > >>>> - ret = -ENOSYS; > >>>> - goto error; > >>>> - } > >>>> - > >>>> /* > >>>> * From here, refcounts are updated so be _careful_ when returning an error > >>>> * after this point. 
> >>>> @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >>>> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > >>>> rcu_read_unlock(); > >>>> > >>>> -error: > >>>> pthread_mutex_unlock(&consumer_data.lock); > >>>> return ret; > >>>> } > >>>> @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) > >>>> int num_rdy, num_hup, high_prio, ret, i; > >>>> struct pollfd *pollfd = NULL; > >>>> /* local view of the streams */ > >>>> - struct lttng_consumer_stream **local_stream = NULL; > >>>> + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; > >>>> /* local view of consumer_data.fds_count */ > >>>> int nb_fd = 0; > >>>> struct lttng_consumer_local_data *ctx = data; > >>>> @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) > >>>> */ > >>>> if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { > >>>> size_t pipe_readlen; > >>>> - char tmp; > >>>> > >>>> DBG("consumer_poll_pipe wake up"); > >>>> /* Consume 1 byte of pipe data */ > >>>> do { > >>>> - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); > >>>> + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, > >>>> + sizeof(new_stream)); > >>>> } while (pipe_readlen == -1 && errno == EINTR); > >>>> + > >>>> + /* > >>>> + * If the stream is NULL, just ignore it. It's also possible that > >>>> + * the sessiond poll thread changed the consumer_quit state and is > >>>> + * waking us up to test it. > >>>> + */ > >>>> + if (new_stream == NULL) { > >>>> + continue; > >>>> + } > >>>> + > >>>> + ret = consumer_add_stream(new_stream); > >>>> + if (ret) { > >>>> + ERR("Consumer add stream %d failed. Continuing", > >>>> + new_stream->key); > >>>> + /* > >>>> + * At this point, if the add_stream fails, it is not in the > >>>> + * hash table thus passing the NULL value here. 
> >>>> + */ > >>>> + consumer_del_stream(new_stream, NULL); > >>>> + } > >>>> + > >>>> + /* Continue to update the local streams and handle prio ones */ > >>>> continue; > >>>> } > >>>> > >>>> @@ -2260,19 +2246,16 @@ end: > >>>> consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; > >>>> > >>>> /* > >>>> - * Wake-up the other end by writing a null byte in the pipe > >>>> - * (non-blocking). Important note: Because writing into the > >>>> - * pipe is non-blocking (and therefore we allow dropping wakeup > >>>> - * data, as long as there is wakeup data present in the pipe > >>>> - * buffer to wake up the other end), the other end should > >>>> - * perform the following sequence for waiting: > >>>> - * 1) empty the pipe (reads). > >>>> - * 2) perform update operation. > >>>> - * 3) wait on the pipe (poll). > >>>> + * Notify the data poll thread to poll back again and test the > >>>> + * consumer_quit state to quit gracefully. > >>>> */ > >>>> do { > >>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >>>> + struct lttng_consumer_stream *null_stream = NULL; > >>>> + > >>>> + ret = write(ctx->consumer_poll_pipe[1], &null_stream, > >>>> + sizeof(null_stream)); > >>>> } while (ret < 0 && errno == EINTR); > >>>> + > >>>> rcu_unregister_thread(); > >>>> return NULL; > >>>> } > >>>> diff --git a/src/common/consumer.h b/src/common/consumer.h > >>>> index 4b225e4..8e5891a 100644 > >>>> --- a/src/common/consumer.h > >>>> +++ b/src/common/consumer.h > >>>> @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( > >>>> struct consumer_relayd_sock_pair *consumer_find_relayd(int key); > >>>> int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, > >>>> size_t data_size); > >>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht); > >>>> > >>>> extern struct lttng_consumer_local_data *lttng_consumer_create( > >>>> enum lttng_consumer_type type, > >>>> diff --git a/src/common/kernel-consumer/kernel-consumer.c 
b/src/common/kernel-consumer/kernel-consumer.c > >>>> index 13cbe21..444f5e0 100644 > >>>> --- a/src/common/kernel-consumer/kernel-consumer.c > >>>> +++ b/src/common/kernel-consumer/kernel-consumer.c > >>>> @@ -235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>> consumer_del_stream(new_stream, NULL); > >>>> } > >>>> } else { > >>>> - ret = consumer_add_stream(new_stream); > >>>> - if (ret) { > >>>> - ERR("Consumer add stream %d failed. Continuing", > >>>> - new_stream->key); > >>>> + do { > >>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > >>>> + sizeof(new_stream)); > >>>> + } while (ret < 0 && errno == EINTR); > >>>> + if (ret < 0) { > >>>> + PERROR("write data pipe"); > >>>> consumer_del_stream(new_stream, NULL); > >>>> goto end_nosignal; > >>>> } > >>>> @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>> goto end_nosignal; > >>>> } > >>>> > >>>> - /* > >>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > >>>> - * Important note: Because writing into the pipe is non-blocking (and > >>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data > >>>> - * present in the pipe buffer to wake up the other end), the other end > >>>> - * should perform the following sequence for waiting: > >>>> - * > >>>> - * 1) empty the pipe (reads). > >>>> - * 2) perform update operation. > >>>> - * 3) wait on the pipe (poll). 
> >>>> - */ > >>>> - do { > >>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >>>> - } while (ret < 0 && errno == EINTR); > >>>> end_nosignal: > >>>> rcu_read_unlock(); > >>>> > >>>> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > >>>> index 1170687..4ca4b84 100644 > >>>> --- a/src/common/ust-consumer/ust-consumer.c > >>>> +++ b/src/common/ust-consumer/ust-consumer.c > >>>> @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>> goto end_nosignal; > >>>> } > >>>> > >>>> + /* > >>>> + * This needs to be done as soon as we can so we don't block the > >>>> + * application too long. > >>>> + */ > >>>> + ret = lttng_ustconsumer_add_stream(new_stream); > >>>> + if (ret) { > >>>> + consumer_del_stream(new_stream, NULL); > >>>> + goto end_nosignal; > >>>> + } > >>>> + /* Steal stream identifier to avoid having streams with the same key */ > >>>> + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); > >>>> + > >>>> /* The stream is not metadata. Get relayd reference if exists. */ > >>>> relayd = consumer_find_relayd(msg.u.stream.net_index); > >>>> if (relayd != NULL) { > >>>> @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>> goto end_nosignal; > >>>> } > >>>> } else { > >>>> - ret = consumer_add_stream(new_stream); > >>>> - if (ret) { > >>>> - ERR("Consumer add stream %d failed. Continuing", > >>>> - new_stream->key); > >>>> - /* > >>>> - * At this point, if the add_stream fails, it is not in the > >>>> - * hash table thus passing the NULL value here. 
> >>>> - */ > >>>> + do { > >>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > >>>> + sizeof(new_stream)); > >>>> + } while (ret < 0 && errno == EINTR); > >>>> + if (ret < 0) { > >>>> + PERROR("write data pipe"); > >>>> consumer_del_stream(new_stream, NULL); > >>>> goto end_nosignal; > >>>> } > >>>> @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>> break; > >>>> } > >>>> > >>>> - /* > >>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > >>>> - * Important note: Because writing into the pipe is non-blocking (and > >>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data > >>>> - * present in the pipe buffer to wake up the other end), the other end > >>>> - * should perform the following sequence for waiting: > >>>> - * > >>>> - * 1) empty the pipe (reads). > >>>> - * 2) perform update operation. > >>>> - * 3) wait on the pipe (poll). > >>>> - */ > >>>> - do { > >>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >>>> - } while (ret < 0 && errno == EINTR); > >>>> end_nosignal: > >>>> rcu_read_unlock(); > >>>> > >>>> -- > >>>> 1.7.10.4 > >>>> > >>>> > >>>> _______________________________________________ > >>>> lttng-dev mailing list > >>>> lttng-dev at lists.lttng.org > >>>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > >>> > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From dgoulet at efficios.com Mon Oct 15 13:42:53 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 13:42:53 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/4] Rename consumer threads and spawn them in daemon In-Reply-To: <20121015173933.GB9142@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <20121013154113.GA29985@Krystal> <507C2E26.1030401@efficios.com> <20121015172807.GA9034@Krystal> <507C49C1.3000801@efficios.com> <20121015173933.GB9142@Krystal> Message-ID: <507C4B1D.3050602@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> >> >> Mathieu Desnoyers: >>>>>> diff --git a/src/common/consumer.c b/src/common/consumer.c >>>>>> index 242b05b..055de1b 100644 >>>>>> --- a/src/common/consumer.c >>>>>> +++ b/src/common/consumer.c >>>>>> @@ -1131,6 +1131,8 @@ void lttng_consumer_destroy(struct lttng_consumer_local_data *ctx) >>>>>> PERROR("close"); >>>>>> } >>>>>> utils_close_pipe(ctx->consumer_splice_metadata_pipe); >>>>>> + /* This should trigger the metadata thread to exit */ >>>>>> + close(ctx->consumer_metadata_pipe[1]); >>>>> >>>>> this is adding a close, but did not remove any other remove that might >>>>> previously be in place elsewhere. >>>> >>>> So we got two possible error path which is either the poll thread fails >>>> or the consumer could be destroy by hand even though the threads are >>>> working well. >>>> >>>> Actually, this close should check if the value is valid and close it. To >>>> be honest, this is just a shortcut since close(-1) does not fail and >>>> ignoring the close error here since we are in the cleanup path anyway so >>>> we don't necessarily care about the perror message. >>>> >>>> Anyhow, we have to handle both error path. An if plus set -1 after close >>>> can be done so not to confuse. >>> >>> if two threads can concurrently perform close on the same fd value, how >>> can you prove there are no possible races ? 
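The "if plus set -1 after close" idea quoted above can be sketched as follows. This is only an illustration, not code from lttng-tools: `struct guarded_fd` and `guarded_fd_close()` are hypothetical names, and the mutex is an assumption added here because the check-then-close sequence is only safe against a second closer if both go through the same lock. Note also that `close(-1)` does return -1 with `EBADF`; it is harmless rather than successful.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical wrapper: a file descriptor plus the lock serializing its teardown. */
struct guarded_fd {
	pthread_mutex_t lock;
	int fd;		/* -1 once closed */
};

/*
 * Close the descriptor at most once, whichever thread gets here first.
 * Returns 1 if this call closed it, 0 if it was already closed,
 * -1 if close() itself reported an error.
 */
static int guarded_fd_close(struct guarded_fd *g)
{
	int ret = 0;

	assert(g != NULL);
	pthread_mutex_lock(&g->lock);
	if (g->fd >= 0) {
		ret = close(g->fd) ? -1 : 1;
		/*
		 * Mark the fd as closed even on error: retrying close() on the
		 * same number could hit a descriptor reused by another thread.
		 */
		g->fd = -1;
	}
	pthread_mutex_unlock(&g->lock);
	return ret;
}
```

With this shape, a second caller sees `fd == -1` and does nothing, so an unrelated descriptor that happens to reuse the same number is never closed by mistake.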
>> >> Nothing to prove, the race is possible. The point I was trying to >> explain is that it does not matter actually since we are in a cleanup >> code path. > > what happens if we close FD 0, 1, 2 or another FD, due to this race ? This does not matter because we are in a "cleanup code path" ... closing everything is the goal. In other circumstances, I agree that this is unacceptable. > > what happens if our code evolve to restart threads after errors, and we > leave this race in place, so it becomes hard to reproduce that in some > occasions we are closing random file descriptors ? Well... agreed, but I doubt "lttng_consumer_destroy" will change its behavior :P David > >> Anyway, let's remove it since the data thread, when dying, will close >> the metadata pipe anyway. > > my point exactly :) > >> >> This will avoid more discussion for this small detail :). > > The devil is in the details, as someone famous said before me. > > Thanks, > > Mathieu > >> >> David >> >>> >>> Mathieu >>> > From dgoulet at efficios.com Mon Oct 15 13:45:03 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 13:45:03 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <20121015174232.GA9258@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-2-git-send-email-dgoulet@efficios.com> <20121013155313.GB29985@Krystal> <507C2E6F.9050107@efficios.com> <20121015173101.GA9142@Krystal> <507C4AA4.8080901@efficios.com> <20121015174232.GA9258@Krystal> Message-ID: <507C4B9F.3020802@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> >> >> Mathieu Desnoyers: >>> * David Goulet (dgoulet at efficios.com) wrote: >>>> >>>> >>>> Mathieu Desnoyers: >>>>> * David Goulet (dgoulet at efficios.com) wrote: >>>>>> As a second step of refactoring, upon receiving a data stream, we send >>>>>> it to the data thread that is now in charge of handling it. 
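A minimal sketch of the hand-off described in the commit message, assuming the stream pointer itself is what travels through the poll pipe, as the quoted patch does with `&new_stream`. The names `struct stream`, `send_stream()` and `recv_stream()` are hypothetical stand-ins and not lttng-tools API. Two assumptions are worth stating: both ends live in the same process (a pointer is meaningless otherwise), and a pointer-sized write is far below `PIPE_BUF`, so POSIX guarantees it is not interleaved with other writers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for struct lttng_consumer_stream. */
struct stream {
	int key;
};

/* Send one stream pointer (NULL is a plain wake-up). Returns 0 on success, -1 on error. */
static int send_stream(int wfd, struct stream *s)
{
	ssize_t ret;

	assert(wfd >= 0);
	do {
		ret = write(wfd, &s, sizeof(s));
	} while (ret < 0 && errno == EINTR);
	return ret == (ssize_t) sizeof(s) ? 0 : -1;
}

/* Receive one stream pointer. Returns 0 on success, -1 on error, EOF or short read. */
static int recv_stream(int rfd, struct stream **s)
{
	ssize_t ret;

	assert(rfd >= 0 && s != NULL);
	do {
		/* ssize_t, not size_t, so the comparison with a negative return is well defined. */
		ret = read(rfd, s, sizeof(*s));
	} while (ret < 0 && errno == EINTR);
	return ret == (ssize_t) sizeof(*s) ? 0 : -1;
}
```

The receiving thread treats a NULL pointer as "nothing to add, just re-check the quit flag", which matches the way the patch reuses the same pipe for shutdown notification.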
>>>>>> >>>>>> Furthermore, in order for this to behave correctly, we have to make the >>>>>> ustctl actions on the stream upon before passing it to the right thread >>>>>> (the kernel does not need special actions.). This way, once the sessiond >>>>>> thread reply back to the session daemon, the stream is sure to be open >>>>>> and ready for data to be recorded on the application side so we avoid a >>>>>> race between the application thinking the stream is ready and the stream >>>>>> thread still scheduled out. >>>>> >>>>> Normally, as long as we have a reference on the SHM file descriptor, and >>>>> we have the wakeup FD, we should be good to fetch the data of buffers >>>>> belonging to an application that has already exited, even if it did so >>>>> before the ustctl calls are done. >>>>> >>>>> So I'm wondering why you do the ustctl calls in the sessiond thread ? It >>>>> seems to complexify the implementation needlessly: we could still do the >>>>> ustctl calls and output file open at the same location, the >>>>> data/metadata threads. >>>> >>>> Hmmm, it was my understanding that does >>> >>> does -> those >>> >>>> ustctl_* calls were needed >>>> before the trace could be recording thus making them quickly. Wrong? >>> >>> Can you rephrase your question ? I don't understand. >>> >> >> My understanding was that _those_ ustctl calls need to be done before >> the tracer could start recording data. This is why they were moved to >> the session daemon thread. >> >> Am I wrong here? When receiving an UST stream< on the consumer side, is >> the SHM reference already acquired? > > yes, the reference to shm is already acquired: it's the FD that _has_ > the reference. Ok good. So just to be crystal clear here, the ustctl* calls can be delayed and done in the right thread? (data/metadata). 
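The point that "it's the FD that _has_ the reference" can be illustrated with a small standalone sketch. It is an analogy, not LTTng code: it uses an unlinked temporary file instead of the UST shared memory object, on the assumption that the same kernel rule applies to both, namely that the object stays alive as long as some open descriptor refers to it. The function name `fd_keeps_object_alive()` and the `/tmp` path are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write data, remove the file name, then map the object through the fd.
 * Returns 0 if the data is still readable after unlink(), -1 otherwise.
 */
static int fd_keeps_object_alive(void)
{
	char path[] = "/tmp/fdref-XXXXXX";
	const char msg[] = "still here";
	char *map;
	int fd, ok;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (write(fd, msg, sizeof(msg)) != (ssize_t) sizeof(msg)) {
		unlink(path);
		close(fd);
		return -1;
	}
	/* Drop the name: from here on, only the fd references the object. */
	if (unlink(path) < 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, sizeof(msg), PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	ok = memcmp(map, msg, sizeof(msg)) == 0;
	assert(munmap(map, sizeof(msg)) == 0);
	close(fd);
	return ok ? 0 : -1;
}
```

If that rule holds for the SHM descriptor received by the consumer, the mapping work can happen later in the data or metadata thread, even if the traced application has already exited.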
David > > Thanks, > > Mathieu > >> >> David >> >>> Thanks, >>> >>> Mathieu >>> >>>> >>>> David >>>> >>>>> >>>>> Thanks, >>>>> >>>>> Mathieu >>>>> >>>>>> >>>>>> This commit should speed up the add stream process for the session >>>>>> daemon. There is still some actions to move out of the session daemon >>>>>> poll thread to gain speed significantly, especially for network >>>>>> streaming. >>>>>> >>>>>> Signed-off-by: David Goulet >>>>>> --- >>>>>> src/common/consumer.c | 123 +++++++++++--------------- >>>>>> src/common/consumer.h | 1 + >>>>>> src/common/kernel-consumer/kernel-consumer.c | 24 ++--- >>>>>> src/common/ust-consumer/ust-consumer.c | 40 ++++----- >>>>>> 4 files changed, 78 insertions(+), 110 deletions(-) >>>>>> >>>>>> diff --git a/src/common/consumer.c b/src/common/consumer.c >>>>>> index 055de1b..1d2b1f7 100644 >>>>>> --- a/src/common/consumer.c >>>>>> +++ b/src/common/consumer.c >>>>>> @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, >>>>>> return stream; >>>>>> } >>>>>> >>>>>> -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) >>>>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht) >>>>>> { >>>>>> struct lttng_consumer_stream *stream; >>>>>> >>>>>> @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( >>>>>> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); >>>>>> lttng_ht_node_init_ulong(&stream->node, stream->key); >>>>>> >>>>>> + /* >>>>>> + * The cpu number is needed before using any ustctl_* actions. Ignored for >>>>>> + * the kernel so the value does not matter. 
>>>>>> + */ >>>>>> + pthread_mutex_lock(&consumer_data.lock); >>>>>> + stream->cpu = stream->chan->cpucount++; >>>>>> + pthread_mutex_unlock(&consumer_data.lock); >>>>>> + >>>>>> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," >>>>>> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, >>>>>> stream->shm_fd, stream->wait_fd, >>>>>> @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) >>>>>> pthread_mutex_lock(&consumer_data.lock); >>>>>> rcu_read_lock(); >>>>>> >>>>>> - switch (consumer_data.type) { >>>>>> - case LTTNG_CONSUMER_KERNEL: >>>>>> - break; >>>>>> - case LTTNG_CONSUMER32_UST: >>>>>> - case LTTNG_CONSUMER64_UST: >>>>>> - stream->cpu = stream->chan->cpucount++; >>>>>> - ret = lttng_ustconsumer_add_stream(stream); >>>>>> - if (ret) { >>>>>> - ret = -EINVAL; >>>>>> - goto error; >>>>>> - } >>>>>> - >>>>>> - /* Steal stream identifier only for UST */ >>>>>> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); >>>>>> - break; >>>>>> - default: >>>>>> - ERR("Unknown consumer_data type"); >>>>>> - assert(0); >>>>>> - ret = -ENOSYS; >>>>>> - goto error; >>>>>> - } >>>>>> - >>>>>> lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); >>>>>> >>>>>> /* Check and cleanup relayd */ >>>>>> @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) >>>>>> consumer_data.stream_count++; >>>>>> consumer_data.need_update = 1; >>>>>> >>>>>> -error: >>>>>> rcu_read_unlock(); >>>>>> pthread_mutex_unlock(&consumer_data.lock); >>>>>> >>>>>> @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >>>>>> >>>>>> DBG3("Consumer delete metadata stream %d", stream->wait_fd); >>>>>> >>>>>> - if (ht == NULL) { >>>>>> - /* Means the stream was allocated but not successfully added */ >>>>>> - goto free_stream; >>>>>> - } >>>>>> - >>>>>> - rcu_read_lock(); >>>>>> - iter.iter.node = &stream->waitfd_node.node; >>>>>> - ret = 
lttng_ht_del(ht, &iter); >>>>>> - assert(!ret); >>>>>> - rcu_read_unlock(); >>>>>> - >>>>>> pthread_mutex_lock(&consumer_data.lock); >>>>>> switch (consumer_data.type) { >>>>>> case LTTNG_CONSUMER_KERNEL: >>>>>> @@ -1613,6 +1587,18 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >>>>>> goto end; >>>>>> } >>>>>> >>>>>> + if (ht == NULL) { >>>>>> + pthread_mutex_unlock(&consumer_data.lock); >>>>>> + /* Means the stream was allocated but not successfully added */ >>>>>> + goto free_stream; >>>>>> + } >>>>>> + >>>>>> + rcu_read_lock(); >>>>>> + iter.iter.node = &stream->waitfd_node.node; >>>>>> + ret = lttng_ht_del(ht, &iter); >>>>>> + assert(!ret); >>>>>> + rcu_read_unlock(); >>>>>> + >>>>>> if (stream->out_fd >= 0) { >>>>>> ret = close(stream->out_fd); >>>>>> if (ret) { >>>>>> @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >>>>>> >>>>>> pthread_mutex_lock(&consumer_data.lock); >>>>>> >>>>>> - switch (consumer_data.type) { >>>>>> - case LTTNG_CONSUMER_KERNEL: >>>>>> - break; >>>>>> - case LTTNG_CONSUMER32_UST: >>>>>> - case LTTNG_CONSUMER64_UST: >>>>>> - ret = lttng_ustconsumer_add_stream(stream); >>>>>> - if (ret) { >>>>>> - ret = -EINVAL; >>>>>> - goto error; >>>>>> - } >>>>>> - >>>>>> - /* Steal stream identifier only for UST */ >>>>>> - consumer_steal_stream_key(stream->wait_fd, ht); >>>>>> - break; >>>>>> - default: >>>>>> - ERR("Unknown consumer_data type"); >>>>>> - assert(0); >>>>>> - ret = -ENOSYS; >>>>>> - goto error; >>>>>> - } >>>>>> - >>>>>> /* >>>>>> * From here, refcounts are updated so be _careful_ when returning an error >>>>>> * after this point. 
>>>>>> @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >>>>>> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); >>>>>> rcu_read_unlock(); >>>>>> >>>>>> -error: >>>>>> pthread_mutex_unlock(&consumer_data.lock); >>>>>> return ret; >>>>>> } >>>>>> @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) >>>>>> int num_rdy, num_hup, high_prio, ret, i; >>>>>> struct pollfd *pollfd = NULL; >>>>>> /* local view of the streams */ >>>>>> - struct lttng_consumer_stream **local_stream = NULL; >>>>>> + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; >>>>>> /* local view of consumer_data.fds_count */ >>>>>> int nb_fd = 0; >>>>>> struct lttng_consumer_local_data *ctx = data; >>>>>> @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) >>>>>> */ >>>>>> if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { >>>>>> size_t pipe_readlen; >>>>>> - char tmp; >>>>>> >>>>>> DBG("consumer_poll_pipe wake up"); >>>>>> /* Consume 1 byte of pipe data */ >>>>>> do { >>>>>> - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); >>>>>> + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, >>>>>> + sizeof(new_stream)); >>>>>> } while (pipe_readlen == -1 && errno == EINTR); >>>>>> + >>>>>> + /* >>>>>> + * If the stream is NULL, just ignore it. It's also possible that >>>>>> + * the sessiond poll thread changed the consumer_quit state and is >>>>>> + * waking us up to test it. >>>>>> + */ >>>>>> + if (new_stream == NULL) { >>>>>> + continue; >>>>>> + } >>>>>> + >>>>>> + ret = consumer_add_stream(new_stream); >>>>>> + if (ret) { >>>>>> + ERR("Consumer add stream %d failed. Continuing", >>>>>> + new_stream->key); >>>>>> + /* >>>>>> + * At this point, if the add_stream fails, it is not in the >>>>>> + * hash table thus passing the NULL value here. 
>>>>>> + */ >>>>>> + consumer_del_stream(new_stream, NULL); >>>>>> + } >>>>>> + >>>>>> + /* Continue to update the local streams and handle prio ones */ >>>>>> continue; >>>>>> } >>>>>> >>>>>> @@ -2260,19 +2246,16 @@ end: >>>>>> consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; >>>>>> >>>>>> /* >>>>>> - * Wake-up the other end by writing a null byte in the pipe >>>>>> - * (non-blocking). Important note: Because writing into the >>>>>> - * pipe is non-blocking (and therefore we allow dropping wakeup >>>>>> - * data, as long as there is wakeup data present in the pipe >>>>>> - * buffer to wake up the other end), the other end should >>>>>> - * perform the following sequence for waiting: >>>>>> - * 1) empty the pipe (reads). >>>>>> - * 2) perform update operation. >>>>>> - * 3) wait on the pipe (poll). >>>>>> + * Notify the data poll thread to poll back again and test the >>>>>> + * consumer_quit state to quit gracefully. >>>>>> */ >>>>>> do { >>>>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >>>>>> + struct lttng_consumer_stream *null_stream = NULL; >>>>>> + >>>>>> + ret = write(ctx->consumer_poll_pipe[1], &null_stream, >>>>>> + sizeof(null_stream)); >>>>>> } while (ret < 0 && errno == EINTR); >>>>>> + >>>>>> rcu_unregister_thread(); >>>>>> return NULL; >>>>>> } >>>>>> diff --git a/src/common/consumer.h b/src/common/consumer.h >>>>>> index 4b225e4..8e5891a 100644 >>>>>> --- a/src/common/consumer.h >>>>>> +++ b/src/common/consumer.h >>>>>> @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( >>>>>> struct consumer_relayd_sock_pair *consumer_find_relayd(int key); >>>>>> int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, >>>>>> size_t data_size); >>>>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht); >>>>>> >>>>>> extern struct lttng_consumer_local_data *lttng_consumer_create( >>>>>> enum lttng_consumer_type type, >>>>>> diff --git a/src/common/kernel-consumer/kernel-consumer.c 
b/src/common/kernel-consumer/kernel-consumer.c >>>>>> index 13cbe21..444f5e0 100644 >>>>>> --- a/src/common/kernel-consumer/kernel-consumer.c >>>>>> +++ b/src/common/kernel-consumer/kernel-consumer.c >>>>>> @@ -235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>>>> consumer_del_stream(new_stream, NULL); >>>>>> } >>>>>> } else { >>>>>> - ret = consumer_add_stream(new_stream); >>>>>> - if (ret) { >>>>>> - ERR("Consumer add stream %d failed. Continuing", >>>>>> - new_stream->key); >>>>>> + do { >>>>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, >>>>>> + sizeof(new_stream)); >>>>>> + } while (ret < 0 && errno == EINTR); >>>>>> + if (ret < 0) { >>>>>> + PERROR("write data pipe"); >>>>>> consumer_del_stream(new_stream, NULL); >>>>>> goto end_nosignal; >>>>>> } >>>>>> @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>>>> goto end_nosignal; >>>>>> } >>>>>> >>>>>> - /* >>>>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). >>>>>> - * Important note: Because writing into the pipe is non-blocking (and >>>>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data >>>>>> - * present in the pipe buffer to wake up the other end), the other end >>>>>> - * should perform the following sequence for waiting: >>>>>> - * >>>>>> - * 1) empty the pipe (reads). >>>>>> - * 2) perform update operation. >>>>>> - * 3) wait on the pipe (poll). 
>>>>>> - */ >>>>>> - do { >>>>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >>>>>> - } while (ret < 0 && errno == EINTR); >>>>>> end_nosignal: >>>>>> rcu_read_unlock(); >>>>>> >>>>>> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c >>>>>> index 1170687..4ca4b84 100644 >>>>>> --- a/src/common/ust-consumer/ust-consumer.c >>>>>> +++ b/src/common/ust-consumer/ust-consumer.c >>>>>> @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>>>> goto end_nosignal; >>>>>> } >>>>>> >>>>>> + /* >>>>>> + * This needs to be done as soon as we can so we don't block the >>>>>> + * application too long. >>>>>> + */ >>>>>> + ret = lttng_ustconsumer_add_stream(new_stream); >>>>>> + if (ret) { >>>>>> + consumer_del_stream(new_stream, NULL); >>>>>> + goto end_nosignal; >>>>>> + } >>>>>> + /* Steal stream identifier to avoid having streams with the same key */ >>>>>> + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); >>>>>> + >>>>>> /* The stream is not metadata. Get relayd reference if exists. */ >>>>>> relayd = consumer_find_relayd(msg.u.stream.net_index); >>>>>> if (relayd != NULL) { >>>>>> @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>>>> goto end_nosignal; >>>>>> } >>>>>> } else { >>>>>> - ret = consumer_add_stream(new_stream); >>>>>> - if (ret) { >>>>>> - ERR("Consumer add stream %d failed. Continuing", >>>>>> - new_stream->key); >>>>>> - /* >>>>>> - * At this point, if the add_stream fails, it is not in the >>>>>> - * hash table thus passing the NULL value here. 
>>>>>> - */ >>>>>> + do { >>>>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, >>>>>> + sizeof(new_stream)); >>>>>> + } while (ret < 0 && errno == EINTR); >>>>>> + if (ret < 0) { >>>>>> + PERROR("write data pipe"); >>>>>> consumer_del_stream(new_stream, NULL); >>>>>> goto end_nosignal; >>>>>> } >>>>>> @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, >>>>>> break; >>>>>> } >>>>>> >>>>>> - /* >>>>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). >>>>>> - * Important note: Because writing into the pipe is non-blocking (and >>>>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data >>>>>> - * present in the pipe buffer to wake up the other end), the other end >>>>>> - * should perform the following sequence for waiting: >>>>>> - * >>>>>> - * 1) empty the pipe (reads). >>>>>> - * 2) perform update operation. >>>>>> - * 3) wait on the pipe (poll). >>>>>> - */ >>>>>> - do { >>>>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); >>>>>> - } while (ret < 0 && errno == EINTR); >>>>>> end_nosignal: >>>>>> rcu_read_unlock(); >>>>>> >>>>>> -- >>>>>> 1.7.10.4 >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> lttng-dev mailing list >>>>>> lttng-dev at lists.lttng.org >>>>>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >>>>> >>> > From mathieu.desnoyers at efficios.com Mon Oct 15 13:56:45 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 15 Oct 2012 13:56:45 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/4] Move add data stream to the data thread In-Reply-To: <507C4B9F.3020802@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-2-git-send-email-dgoulet@efficios.com> <20121013155313.GB29985@Krystal> <507C2E6F.9050107@efficios.com> <20121015173101.GA9142@Krystal> <507C4AA4.8080901@efficios.com> <20121015174232.GA9258@Krystal> <507C4B9F.3020802@efficios.com> 
Message-ID: <20121015175645.GA9423@Krystal>

* David Goulet (dgoulet at efficios.com) wrote:
>
>
> Mathieu Desnoyers:
> > * David Goulet (dgoulet at efficios.com) wrote:
> >>
> >>
> >> Mathieu Desnoyers:
> >>> * David Goulet (dgoulet at efficios.com) wrote:
> >>>>
> >>>>
> >>>> Mathieu Desnoyers:
> >>>>> * David Goulet (dgoulet at efficios.com) wrote:
> >>>>>> As a second step of refactoring, upon receiving a data stream, we send
> >>>>>> it to the data thread that is now in charge of handling it.
> >>>>>>
> >>>>>> Furthermore, in order for this to behave correctly, we have to make the
> >>>>>> ustctl actions on the stream upon before passing it to the right thread
> >>>>>> (the kernel does not need special actions.). This way, once the sessiond
> >>>>>> thread reply back to the session daemon, the stream is sure to be open
> >>>>>> and ready for data to be recorded on the application side so we avoid a
> >>>>>> race between the application thinking the stream is ready and the stream
> >>>>>> thread still scheduled out.
> >>>>>
> >>>>> Normally, as long as we have a reference on the SHM file descriptor, and
> >>>>> we have the wakeup FD, we should be good to fetch the data of buffers
> >>>>> belonging to an application that has already exited, even if it did so
> >>>>> before the ustctl calls are done.
> >>>>>
> >>>>> So I'm wondering why you do the ustctl calls in the sessiond thread ? It
> >>>>> seems to complexify the implementation needlessly: we could still do the
> >>>>> ustctl calls and output file open at the same location, the
> >>>>> data/metadata threads.
> >>>>
> >>>> Hmmm, it was my understanding that does
> >>>
> >>> does -> those
> >>>
> >>>> ustctl_* calls were needed
> >>>> before the trace could be recording thus making them quickly. Wrong?
> >>>
> >>> Can you rephrase your question ? I don't understand.
> >>>
> >>
> >> My understanding was that _those_ ustctl calls need to be done before
> >> the tracer could start recording data.
This is why they were moved to > >> the session daemon thread. > >> > >> Am I wrong here? When receiving an UST stream< on the consumer side, is > >> the SHM reference already acquired? > > > > yes, the reference to shm is already acquired: it's the FD that _has_ > > the reference. > > Ok good. So just to be crystal clear here, the ustctl* calls can be > delayed and done in the right thread? (data/metadata). yes, I expect it should work. Mathieu > > David > > > > > Thanks, > > > > Mathieu > > > >> > >> David > >> > >>> Thanks, > >>> > >>> Mathieu > >>> > >>>> > >>>> David > >>>> > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Mathieu > >>>>> > >>>>>> > >>>>>> This commit should speed up the add stream process for the session > >>>>>> daemon. There is still some actions to move out of the session daemon > >>>>>> poll thread to gain speed significantly, especially for network > >>>>>> streaming. > >>>>>> > >>>>>> Signed-off-by: David Goulet > >>>>>> --- > >>>>>> src/common/consumer.c | 123 +++++++++++--------------- > >>>>>> src/common/consumer.h | 1 + > >>>>>> src/common/kernel-consumer/kernel-consumer.c | 24 ++--- > >>>>>> src/common/ust-consumer/ust-consumer.c | 40 ++++----- > >>>>>> 4 files changed, 78 insertions(+), 110 deletions(-) > >>>>>> > >>>>>> diff --git a/src/common/consumer.c b/src/common/consumer.c > >>>>>> index 055de1b..1d2b1f7 100644 > >>>>>> --- a/src/common/consumer.c > >>>>>> +++ b/src/common/consumer.c > >>>>>> @@ -89,7 +89,7 @@ static struct lttng_consumer_stream *consumer_find_stream(int key, > >>>>>> return stream; > >>>>>> } > >>>>>> > >>>>>> -static void consumer_steal_stream_key(int key, struct lttng_ht *ht) > >>>>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht) > >>>>>> { > >>>>>> struct lttng_consumer_stream *stream; > >>>>>> > >>>>>> @@ -409,6 +409,14 @@ struct lttng_consumer_stream *consumer_allocate_stream( > >>>>>> lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); > >>>>>> 
lttng_ht_node_init_ulong(&stream->node, stream->key); > >>>>>> > >>>>>> + /* > >>>>>> + * The cpu number is needed before using any ustctl_* actions. Ignored for > >>>>>> + * the kernel so the value does not matter. > >>>>>> + */ > >>>>>> + pthread_mutex_lock(&consumer_data.lock); > >>>>>> + stream->cpu = stream->chan->cpucount++; > >>>>>> + pthread_mutex_unlock(&consumer_data.lock); > >>>>>> + > >>>>>> DBG3("Allocated stream %s (key %d, shm_fd %d, wait_fd %d, mmap_len %llu," > >>>>>> " out_fd %d, net_seq_idx %d)", stream->path_name, stream->key, > >>>>>> stream->shm_fd, stream->wait_fd, > >>>>>> @@ -437,28 +445,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > >>>>>> pthread_mutex_lock(&consumer_data.lock); > >>>>>> rcu_read_lock(); > >>>>>> > >>>>>> - switch (consumer_data.type) { > >>>>>> - case LTTNG_CONSUMER_KERNEL: > >>>>>> - break; > >>>>>> - case LTTNG_CONSUMER32_UST: > >>>>>> - case LTTNG_CONSUMER64_UST: > >>>>>> - stream->cpu = stream->chan->cpucount++; > >>>>>> - ret = lttng_ustconsumer_add_stream(stream); > >>>>>> - if (ret) { > >>>>>> - ret = -EINVAL; > >>>>>> - goto error; > >>>>>> - } > >>>>>> - > >>>>>> - /* Steal stream identifier only for UST */ > >>>>>> - consumer_steal_stream_key(stream->key, consumer_data.stream_ht); > >>>>>> - break; > >>>>>> - default: > >>>>>> - ERR("Unknown consumer_data type"); > >>>>>> - assert(0); > >>>>>> - ret = -ENOSYS; > >>>>>> - goto error; > >>>>>> - } > >>>>>> - > >>>>>> lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > >>>>>> > >>>>>> /* Check and cleanup relayd */ > >>>>>> @@ -485,7 +471,6 @@ int consumer_add_stream(struct lttng_consumer_stream *stream) > >>>>>> consumer_data.stream_count++; > >>>>>> consumer_data.need_update = 1; > >>>>>> > >>>>>> -error: > >>>>>> rcu_read_unlock(); > >>>>>> pthread_mutex_unlock(&consumer_data.lock); > >>>>>> > >>>>>> @@ -1582,17 +1567,6 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > >>>>>> > >>>>>> 
DBG3("Consumer delete metadata stream %d", stream->wait_fd); > >>>>>> > >>>>>> - if (ht == NULL) { > >>>>>> - /* Means the stream was allocated but not successfully added */ > >>>>>> - goto free_stream; > >>>>>> - } > >>>>>> - > >>>>>> - rcu_read_lock(); > >>>>>> - iter.iter.node = &stream->waitfd_node.node; > >>>>>> - ret = lttng_ht_del(ht, &iter); > >>>>>> - assert(!ret); > >>>>>> - rcu_read_unlock(); > >>>>>> - > >>>>>> pthread_mutex_lock(&consumer_data.lock); > >>>>>> switch (consumer_data.type) { > >>>>>> case LTTNG_CONSUMER_KERNEL: > >>>>>> @@ -1613,6 +1587,18 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > >>>>>> goto end; > >>>>>> } > >>>>>> > >>>>>> + if (ht == NULL) { > >>>>>> + pthread_mutex_unlock(&consumer_data.lock); > >>>>>> + /* Means the stream was allocated but not successfully added */ > >>>>>> + goto free_stream; > >>>>>> + } > >>>>>> + > >>>>>> + rcu_read_lock(); > >>>>>> + iter.iter.node = &stream->waitfd_node.node; > >>>>>> + ret = lttng_ht_del(ht, &iter); > >>>>>> + assert(!ret); > >>>>>> + rcu_read_unlock(); > >>>>>> + > >>>>>> if (stream->out_fd >= 0) { > >>>>>> ret = close(stream->out_fd); > >>>>>> if (ret) { > >>>>>> @@ -1699,27 +1685,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >>>>>> > >>>>>> pthread_mutex_lock(&consumer_data.lock); > >>>>>> > >>>>>> - switch (consumer_data.type) { > >>>>>> - case LTTNG_CONSUMER_KERNEL: > >>>>>> - break; > >>>>>> - case LTTNG_CONSUMER32_UST: > >>>>>> - case LTTNG_CONSUMER64_UST: > >>>>>> - ret = lttng_ustconsumer_add_stream(stream); > >>>>>> - if (ret) { > >>>>>> - ret = -EINVAL; > >>>>>> - goto error; > >>>>>> - } > >>>>>> - > >>>>>> - /* Steal stream identifier only for UST */ > >>>>>> - consumer_steal_stream_key(stream->wait_fd, ht); > >>>>>> - break; > >>>>>> - default: > >>>>>> - ERR("Unknown consumer_data type"); > >>>>>> - assert(0); > >>>>>> - ret = -ENOSYS; > >>>>>> - goto error; > >>>>>> - } > >>>>>> - > >>>>>> /* > >>>>>> * 
From here, refcounts are updated so be _careful_ when returning an error > >>>>>> * after this point. > >>>>>> @@ -1749,7 +1714,6 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >>>>>> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > >>>>>> rcu_read_unlock(); > >>>>>> > >>>>>> -error: > >>>>>> pthread_mutex_unlock(&consumer_data.lock); > >>>>>> return ret; > >>>>>> } > >>>>>> @@ -1946,7 +1910,7 @@ void *consumer_thread_data_poll(void *data) > >>>>>> int num_rdy, num_hup, high_prio, ret, i; > >>>>>> struct pollfd *pollfd = NULL; > >>>>>> /* local view of the streams */ > >>>>>> - struct lttng_consumer_stream **local_stream = NULL; > >>>>>> + struct lttng_consumer_stream **local_stream = NULL, *new_stream = NULL; > >>>>>> /* local view of consumer_data.fds_count */ > >>>>>> int nb_fd = 0; > >>>>>> struct lttng_consumer_local_data *ctx = data; > >>>>>> @@ -2034,13 +1998,35 @@ void *consumer_thread_data_poll(void *data) > >>>>>> */ > >>>>>> if (pollfd[nb_fd].revents & (POLLIN | POLLPRI)) { > >>>>>> size_t pipe_readlen; > >>>>>> - char tmp; > >>>>>> > >>>>>> DBG("consumer_poll_pipe wake up"); > >>>>>> /* Consume 1 byte of pipe data */ > >>>>>> do { > >>>>>> - pipe_readlen = read(ctx->consumer_poll_pipe[0], &tmp, 1); > >>>>>> + pipe_readlen = read(ctx->consumer_poll_pipe[0], &new_stream, > >>>>>> + sizeof(new_stream)); > >>>>>> } while (pipe_readlen == -1 && errno == EINTR); > >>>>>> + > >>>>>> + /* > >>>>>> + * If the stream is NULL, just ignore it. It's also possible that > >>>>>> + * the sessiond poll thread changed the consumer_quit state and is > >>>>>> + * waking us up to test it. > >>>>>> + */ > >>>>>> + if (new_stream == NULL) { > >>>>>> + continue; > >>>>>> + } > >>>>>> + > >>>>>> + ret = consumer_add_stream(new_stream); > >>>>>> + if (ret) { > >>>>>> + ERR("Consumer add stream %d failed. 
Continuing", > >>>>>> + new_stream->key); > >>>>>> + /* > >>>>>> + * At this point, if the add_stream fails, it is not in the > >>>>>> + * hash table thus passing the NULL value here. > >>>>>> + */ > >>>>>> + consumer_del_stream(new_stream, NULL); > >>>>>> + } > >>>>>> + > >>>>>> + /* Continue to update the local streams and handle prio ones */ > >>>>>> continue; > >>>>>> } > >>>>>> > >>>>>> @@ -2260,19 +2246,16 @@ end: > >>>>>> consumer_poll_timeout = LTTNG_CONSUMER_POLL_TIMEOUT; > >>>>>> > >>>>>> /* > >>>>>> - * Wake-up the other end by writing a null byte in the pipe > >>>>>> - * (non-blocking). Important note: Because writing into the > >>>>>> - * pipe is non-blocking (and therefore we allow dropping wakeup > >>>>>> - * data, as long as there is wakeup data present in the pipe > >>>>>> - * buffer to wake up the other end), the other end should > >>>>>> - * perform the following sequence for waiting: > >>>>>> - * 1) empty the pipe (reads). > >>>>>> - * 2) perform update operation. > >>>>>> - * 3) wait on the pipe (poll). > >>>>>> + * Notify the data poll thread to poll back again and test the > >>>>>> + * consumer_quit state to quit gracefully. 
> >>>>>> */ > >>>>>> do { > >>>>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >>>>>> + struct lttng_consumer_stream *null_stream = NULL; > >>>>>> + > >>>>>> + ret = write(ctx->consumer_poll_pipe[1], &null_stream, > >>>>>> + sizeof(null_stream)); > >>>>>> } while (ret < 0 && errno == EINTR); > >>>>>> + > >>>>>> rcu_unregister_thread(); > >>>>>> return NULL; > >>>>>> } > >>>>>> diff --git a/src/common/consumer.h b/src/common/consumer.h > >>>>>> index 4b225e4..8e5891a 100644 > >>>>>> --- a/src/common/consumer.h > >>>>>> +++ b/src/common/consumer.h > >>>>>> @@ -362,6 +362,7 @@ struct consumer_relayd_sock_pair *consumer_allocate_relayd_sock_pair( > >>>>>> struct consumer_relayd_sock_pair *consumer_find_relayd(int key); > >>>>>> int consumer_handle_stream_before_relayd(struct lttng_consumer_stream *stream, > >>>>>> size_t data_size); > >>>>>> +void consumer_steal_stream_key(int key, struct lttng_ht *ht); > >>>>>> > >>>>>> extern struct lttng_consumer_local_data *lttng_consumer_create( > >>>>>> enum lttng_consumer_type type, > >>>>>> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c > >>>>>> index 13cbe21..444f5e0 100644 > >>>>>> --- a/src/common/kernel-consumer/kernel-consumer.c > >>>>>> +++ b/src/common/kernel-consumer/kernel-consumer.c > >>>>>> @@ -235,10 +235,12 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>>>> consumer_del_stream(new_stream, NULL); > >>>>>> } > >>>>>> } else { > >>>>>> - ret = consumer_add_stream(new_stream); > >>>>>> - if (ret) { > >>>>>> - ERR("Consumer add stream %d failed. 
Continuing", > >>>>>> - new_stream->key); > >>>>>> + do { > >>>>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > >>>>>> + sizeof(new_stream)); > >>>>>> + } while (ret < 0 && errno == EINTR); > >>>>>> + if (ret < 0) { > >>>>>> + PERROR("write data pipe"); > >>>>>> consumer_del_stream(new_stream, NULL); > >>>>>> goto end_nosignal; > >>>>>> } > >>>>>> @@ -284,20 +286,6 @@ int lttng_kconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>>>> goto end_nosignal; > >>>>>> } > >>>>>> > >>>>>> - /* > >>>>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > >>>>>> - * Important note: Because writing into the pipe is non-blocking (and > >>>>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data > >>>>>> - * present in the pipe buffer to wake up the other end), the other end > >>>>>> - * should perform the following sequence for waiting: > >>>>>> - * > >>>>>> - * 1) empty the pipe (reads). > >>>>>> - * 2) perform update operation. > >>>>>> - * 3) wait on the pipe (poll). > >>>>>> - */ > >>>>>> - do { > >>>>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >>>>>> - } while (ret < 0 && errno == EINTR); > >>>>>> end_nosignal: > >>>>>> rcu_read_unlock(); > >>>>>> > >>>>>> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > >>>>>> index 1170687..4ca4b84 100644 > >>>>>> --- a/src/common/ust-consumer/ust-consumer.c > >>>>>> +++ b/src/common/ust-consumer/ust-consumer.c > >>>>>> @@ -224,6 +224,18 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>>>> goto end_nosignal; > >>>>>> } > >>>>>> > >>>>>> + /* > >>>>>> + * This needs to be done as soon as we can so we don't block the > >>>>>> + * application too long. 
> >>>>>> + */ > >>>>>> + ret = lttng_ustconsumer_add_stream(new_stream); > >>>>>> + if (ret) { > >>>>>> + consumer_del_stream(new_stream, NULL); > >>>>>> + goto end_nosignal; > >>>>>> + } > >>>>>> + /* Steal stream identifier to avoid having streams with the same key */ > >>>>>> + consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); > >>>>>> + > >>>>>> /* The stream is not metadata. Get relayd reference if exists. */ > >>>>>> relayd = consumer_find_relayd(msg.u.stream.net_index); > >>>>>> if (relayd != NULL) { > >>>>>> @@ -265,14 +277,12 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>>>> goto end_nosignal; > >>>>>> } > >>>>>> } else { > >>>>>> - ret = consumer_add_stream(new_stream); > >>>>>> - if (ret) { > >>>>>> - ERR("Consumer add stream %d failed. Continuing", > >>>>>> - new_stream->key); > >>>>>> - /* > >>>>>> - * At this point, if the add_stream fails, it is not in the > >>>>>> - * hash table thus passing the NULL value here. > >>>>>> - */ > >>>>>> + do { > >>>>>> + ret = write(ctx->consumer_poll_pipe[1], &new_stream, > >>>>>> + sizeof(new_stream)); > >>>>>> + } while (ret < 0 && errno == EINTR); > >>>>>> + if (ret < 0) { > >>>>>> + PERROR("write data pipe"); > >>>>>> consumer_del_stream(new_stream, NULL); > >>>>>> goto end_nosignal; > >>>>>> } > >>>>>> @@ -334,20 +344,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >>>>>> break; > >>>>>> } > >>>>>> > >>>>>> - /* > >>>>>> - * Wake-up the other end by writing a null byte in the pipe (non-blocking). > >>>>>> - * Important note: Because writing into the pipe is non-blocking (and > >>>>>> - * therefore we allow dropping wakeup data, as long as there is wakeup data > >>>>>> - * present in the pipe buffer to wake up the other end), the other end > >>>>>> - * should perform the following sequence for waiting: > >>>>>> - * > >>>>>> - * 1) empty the pipe (reads). > >>>>>> - * 2) perform update operation. 
> >>>>>> - * 3) wait on the pipe (poll). > >>>>>> - */ > >>>>>> - do { > >>>>>> - ret = write(ctx->consumer_poll_pipe[1], "", 1); > >>>>>> - } while (ret < 0 && errno == EINTR); > >>>>>> end_nosignal: > >>>>>> rcu_read_unlock(); > >>>>>> > >>>>>> -- > >>>>>> 1.7.10.4 > >>>>>> > >>>>>> > >>>>>> _______________________________________________ > >>>>>> lttng-dev mailing list > >>>>>> lttng-dev at lists.lttng.org > >>>>>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > >>>>> > >>> > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 15 13:57:54 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 15 Oct 2012 13:57:54 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/4] Make stream hash tables global to the consumer In-Reply-To: <507C3013.2050405@efficios.com> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-3-git-send-email-dgoulet@efficios.com> <20121013155615.GC29985@Krystal> <507C3013.2050405@efficios.com> Message-ID: <20121015175754.GB9423@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > > > Mathieu Desnoyers: > > * David Goulet (dgoulet at efficios.com) wrote: > >> The data stream hash table is now global to the consumer and used in the > >> data thread. The consumer_data stream_ht is no longer used to track the > >> data streams but instead will be used (and possibly renamed) by the > >> session daemon poll thread to keep track of streams on a per session id > >> basis for the upcoming feature that check traced data availability. > >> > >> For now, in order to avoid mind bugging problems to access the streams, > >> both hash table are now defined globally (metadata and data). However, > >> stream update are still done in a single thread. Don't count on this to > >> be guaranteed in the next commits. 
> >> > >> Signed-off-by: David Goulet > >> --- > >> src/common/consumer.c | 91 +++++++++++++++++++++++++------- > >> src/common/consumer.h | 9 ++-- > >> src/common/ust-consumer/ust-consumer.c | 2 - > >> 3 files changed, 75 insertions(+), 27 deletions(-) > >> > >> diff --git a/src/common/consumer.c b/src/common/consumer.c > >> index 1d2b1f7..1fb9960 100644 > >> --- a/src/common/consumer.c > >> +++ b/src/common/consumer.c > >> @@ -59,6 +59,17 @@ int consumer_poll_timeout = -1; > >> volatile int consumer_quit = 0; > >> > >> /* > >> + * The following two hash tables are visible by all threads which are separated > >> + * in different source files. > >> + * > >> + * Global hash table containing respectively metadata and data streams. The > >> + * stream element in this ht should only be updated by the metadata poll thread > >> + * for the metadata and the data poll thread for the data. > >> + */ > >> +struct lttng_ht *metadata_ht = NULL; > >> +struct lttng_ht *data_ht = NULL; > >> + > >> +/* > >> * Find a stream. The consumer_data.lock must be locked during this > >> * call. > >> */ > >> @@ -433,19 +444,24 @@ end: > >> /* > >> * Add a stream to the global list protected by a mutex. > >> */ > >> -int consumer_add_stream(struct lttng_consumer_stream *stream) > >> +static int consumer_add_stream(struct lttng_consumer_stream *stream, > >> + struct lttng_ht *ht) > >> { > >> int ret = 0; > >> struct consumer_relayd_sock_pair *relayd; > >> > >> assert(stream); > >> + assert(ht); > >> > >> DBG3("Adding consumer stream %d", stream->key); > >> > >> pthread_mutex_lock(&consumer_data.lock); > >> rcu_read_lock(); > >> > >> - lttng_ht_add_unique_ulong(consumer_data.stream_ht, &stream->node); > >> + /* Steal stream identifier to avoid having streams with the same key */ > >> + consumer_steal_stream_key(stream->key, ht); > > > > I don't understand why suddenly this change is needed. 
Considering what > > this patch should be doing (just moving a ht from per-thread to global), > > it should not have any behavior impact. > > We move the steal stream key from the sessiond thread to the add_stream > function call since we do not use the consumer_data hash table anymore > (stream_ht) and uses per thread hashtable (global for now though). > > If you look below, you'll see that the steal stream key call is removed > (using the consumer data stream_ht). > > This commit makes sure that both consumer_add_stream and > add_metadata_stream steal the stream key if needed. ok, makes sense. Thanks! Mathieu > > Thanks > David > > > > > Thanks, > > > > Mathieu > > > >> + > >> + lttng_ht_add_unique_ulong(ht, &stream->node); > >> > >> /* Check and cleanup relayd */ > >> relayd = consumer_find_relayd(stream->net_seq_idx); > >> @@ -783,9 +799,9 @@ end: > >> * > >> * Returns the number of fds in the structures. > >> */ > >> -int consumer_update_poll_array( > >> +static int consumer_update_poll_array( > >> struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, > >> - struct lttng_consumer_stream **local_stream) > >> + struct lttng_consumer_stream **local_stream, struct lttng_ht *ht) > >> { > >> int i = 0; > >> struct lttng_ht_iter iter; > >> @@ -793,8 +809,7 @@ int consumer_update_poll_array( > >> > >> DBG("Updating poll fd array"); > >> rcu_read_lock(); > >> - cds_lfht_for_each_entry(consumer_data.stream_ht->ht, &iter.iter, stream, > >> - node.node) { > >> + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { > >> if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM) { > >> continue; > >> } > >> @@ -1523,6 +1538,33 @@ int lttng_consumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> /* > >> * Iterate over all streams of the hashtable and free them properly. > >> * > >> + * WARNING: *MUST* be used with data stream only. 
> >> + */ > >> +static void destroy_data_stream_ht(struct lttng_ht *ht) > >> +{ > >> + int ret; > >> + struct lttng_ht_iter iter; > >> + struct lttng_consumer_stream *stream; > >> + > >> + if (ht == NULL) { > >> + return; > >> + } > >> + > >> + rcu_read_lock(); > >> + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { > >> + ret = lttng_ht_del(ht, &iter); > >> + assert(!ret); > >> + > >> + call_rcu(&stream->node.head, consumer_free_stream); > >> + } > >> + rcu_read_unlock(); > >> + > >> + lttng_ht_destroy(ht); > >> +} > >> + > >> +/* > >> + * Iterate over all streams of the hashtable and free them properly. > >> + * > >> * XXX: Should not be only for metadata stream or else use an other name. > >> */ > >> static void destroy_stream_ht(struct lttng_ht *ht) > >> @@ -1711,6 +1753,9 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > >> uatomic_dec(&stream->chan->nb_init_streams); > >> } > >> > >> + /* Steal stream identifier to avoid having streams with the same key */ > >> + consumer_steal_stream_key(stream->key, ht); > >> + > >> lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); > >> rcu_read_unlock(); > >> > >> @@ -1729,7 +1774,6 @@ void *consumer_thread_metadata_poll(void *data) > >> struct lttng_consumer_stream *stream = NULL; > >> struct lttng_ht_iter iter; > >> struct lttng_ht_node_ulong *node; > >> - struct lttng_ht *metadata_ht = NULL; > >> struct lttng_poll_event events; > >> struct lttng_consumer_local_data *ctx = data; > >> ssize_t len; > >> @@ -1738,11 +1782,6 @@ void *consumer_thread_metadata_poll(void *data) > >> > >> DBG("Thread metadata poll started"); > >> > >> - metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > >> - if (metadata_ht == NULL) { > >> - goto end; > >> - } > >> - > >> /* Size is set to 1 for the consumer_metadata pipe */ > >> ret = lttng_poll_create(&events, 2, LTTNG_CLOEXEC); > >> if (ret < 0) { > >> @@ -1918,6 +1957,11 @@ void *consumer_thread_data_poll(void *data) > >> > >> 
rcu_register_thread(); > >> > >> + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > >> + if (data_ht == NULL) { > >> + goto end; > >> + } > >> + > >> local_stream = zmalloc(sizeof(struct lttng_consumer_stream)); > >> > >> while (1) { > >> @@ -1955,7 +1999,8 @@ void *consumer_thread_data_poll(void *data) > >> pthread_mutex_unlock(&consumer_data.lock); > >> goto end; > >> } > >> - ret = consumer_update_poll_array(ctx, &pollfd, local_stream); > >> + ret = consumer_update_poll_array(ctx, &pollfd, local_stream, > >> + data_ht); > >> if (ret < 0) { > >> ERR("Error in allocating pollfd or local_outfds"); > >> lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_POLL_ERROR); > >> @@ -2015,7 +2060,7 @@ void *consumer_thread_data_poll(void *data) > >> continue; > >> } > >> > >> - ret = consumer_add_stream(new_stream); > >> + ret = consumer_add_stream(new_stream, data_ht); > >> if (ret) { > >> ERR("Consumer add stream %d failed. Continuing", > >> new_stream->key); > >> @@ -2088,22 +2133,19 @@ void *consumer_thread_data_poll(void *data) > >> if ((pollfd[i].revents & POLLHUP)) { > >> DBG("Polling fd %d tells it has hung up.", pollfd[i].fd); > >> if (!local_stream[i]->data_read) { > >> - consumer_del_stream(local_stream[i], > >> - consumer_data.stream_ht); > >> + consumer_del_stream(local_stream[i], data_ht); > >> num_hup++; > >> } > >> } else if (pollfd[i].revents & POLLERR) { > >> ERR("Error returned in polling fd %d.", pollfd[i].fd); > >> if (!local_stream[i]->data_read) { > >> - consumer_del_stream(local_stream[i], > >> - consumer_data.stream_ht); > >> + consumer_del_stream(local_stream[i], data_ht); > >> num_hup++; > >> } > >> } else if (pollfd[i].revents & POLLNVAL) { > >> ERR("Polling fd %d tells fd is not open.", pollfd[i].fd); > >> if (!local_stream[i]->data_read) { > >> - consumer_del_stream(local_stream[i], > >> - consumer_data.stream_ht); > >> + consumer_del_stream(local_stream[i], data_ht); > >> num_hup++; > >> } > >> } > >> @@ -2131,6 +2173,10 @@ end: > >> */ > >> 
close(ctx->consumer_metadata_pipe[1]); > >> > >> + if (data_ht) { > >> + destroy_data_stream_ht(data_ht); > >> + } > >> + > >> rcu_unregister_thread(); > >> return NULL; > >> } > >> @@ -2299,6 +2345,11 @@ void lttng_consumer_init(void) > >> consumer_data.stream_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > >> consumer_data.channel_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > >> consumer_data.relayd_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > >> + > >> + metadata_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > >> + assert(metadata_ht); > >> + data_ht = lttng_ht_new(0, LTTNG_HT_TYPE_ULONG); > >> + assert(data_ht); > >> } > >> > >> /* > >> diff --git a/src/common/consumer.h b/src/common/consumer.h > >> index 8e5891a..6bce96d 100644 > >> --- a/src/common/consumer.h > >> +++ b/src/common/consumer.h > >> @@ -275,6 +275,10 @@ struct lttng_consumer_global_data { > >> struct lttng_ht *relayd_ht; > >> }; > >> > >> +/* Defined in consumer.c and coupled with explanations */ > >> +extern struct lttng_ht *metadata_ht; > >> +extern struct lttng_ht *data_ht; > >> + > >> /* > >> * Init consumer data structures. 
> >> */ > >> @@ -324,10 +328,6 @@ extern void lttng_consumer_sync_trace_file( > >> */ > >> extern int lttng_consumer_poll_socket(struct pollfd *kconsumer_sockpoll); > >> > >> -extern int consumer_update_poll_array( > >> - struct lttng_consumer_local_data *ctx, struct pollfd **pollfd, > >> - struct lttng_consumer_stream **local_consumer_streams); > >> - > >> extern struct lttng_consumer_stream *consumer_allocate_stream( > >> int channel_key, int stream_key, > >> int shm_fd, int wait_fd, > >> @@ -340,7 +340,6 @@ extern struct lttng_consumer_stream *consumer_allocate_stream( > >> int net_index, > >> int metadata_flag, > >> int *alloc_ret); > >> -extern int consumer_add_stream(struct lttng_consumer_stream *stream); > >> extern void consumer_del_stream(struct lttng_consumer_stream *stream, > >> struct lttng_ht *ht); > >> extern void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > >> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > >> index 4ca4b84..3b41e55 100644 > >> --- a/src/common/ust-consumer/ust-consumer.c > >> +++ b/src/common/ust-consumer/ust-consumer.c > >> @@ -233,8 +233,6 @@ int lttng_ustconsumer_recv_cmd(struct lttng_consumer_local_data *ctx, > >> consumer_del_stream(new_stream, NULL); > >> goto end_nosignal; > >> } > >> - /* Steal stream identifier to avoid having streams with the same key */ > >> - consumer_steal_stream_key(new_stream->key, consumer_data.stream_ht); > >> > >> /* The stream is not metadata. Get relayd reference if exists. */ > >> relayd = consumer_find_relayd(msg.u.stream.net_index); > >> -- > >> 1.7.10.4 > >> > >> > >> _______________________________________________ > >> lttng-dev mailing list > >> lttng-dev at lists.lttng.org > >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From dgoulet at efficios.com Mon Oct 15 15:22:42 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 15 Oct 2012 15:22:42 -0400 Subject: [lttng-dev] [PATCH lttng-tools 4/4] Change the metadata hash table node In-Reply-To: <20121013160035.GD29985@Krystal> References: <1350052235-12198-1-git-send-email-dgoulet@efficios.com> <1350052235-12198-4-git-send-email-dgoulet@efficios.com> <20121013160035.GD29985@Krystal> Message-ID: <507C6282.4000405@efficios.com> Oh forgot to reply :P Yes... my bad. Present... the patch does what's described :) David Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> Remove the use of the "waitfd_node" for metadata and index the "node" by >> wait fd during stream allocation only for metadata stream. >> >> This was done so the waitfd node could be used later on for the hash >> table indexing stream by session id the traced data check command (soon >> to be implemented). > > this last changelog paragraph, being using "was" (past) is confusing me. > Is it what you are now doing (present) or what was there before that you > are changing ? > > Thanks, > > Mathieu > >> >> Signed-off-by: David Goulet >> --- >> src/common/consumer.c | 36 +++++++++++++++++------------------- >> 1 file changed, 17 insertions(+), 19 deletions(-) >> >> diff --git a/src/common/consumer.c b/src/common/consumer.c >> index 1fb9960..0c1a812 100644 >> --- a/src/common/consumer.c >> +++ b/src/common/consumer.c >> @@ -172,17 +172,6 @@ void consumer_free_stream(struct rcu_head *head) >> free(stream); >> } >> >> -static >> -void consumer_free_metadata_stream(struct rcu_head *head) >> -{ >> - struct lttng_ht_node_ulong *node = >> - caa_container_of(head, struct lttng_ht_node_ulong, head); >> - struct lttng_consumer_stream *stream = >> - caa_container_of(node, struct lttng_consumer_stream, waitfd_node); >> - >> - free(stream); >> -} >> - >> /* >> * RCU protected relayd socket pair free. 
>> */ >> @@ -417,8 +406,17 @@ struct lttng_consumer_stream *consumer_allocate_stream( >> stream->metadata_flag = metadata_flag; >> strncpy(stream->path_name, path_name, sizeof(stream->path_name)); >> stream->path_name[sizeof(stream->path_name) - 1] = '\0'; >> - lttng_ht_node_init_ulong(&stream->waitfd_node, stream->wait_fd); >> - lttng_ht_node_init_ulong(&stream->node, stream->key); >> + >> + /* >> + * Index differently the metadata node because the thread is using an >> + * internal hash table to match streams in the metadata_ht to the epoll set >> + * file descriptor. >> + */ >> + if (metadata_flag) { >> + lttng_ht_node_init_ulong(&stream->node, stream->wait_fd); >> + } else { >> + lttng_ht_node_init_ulong(&stream->node, stream->key); >> + } >> >> /* >> * The cpu number is needed before using any ustctl_* actions. Ignored for >> @@ -1578,11 +1576,11 @@ static void destroy_stream_ht(struct lttng_ht *ht) >> } >> >> rcu_read_lock(); >> - cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, waitfd_node.node) { >> + cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { >> ret = lttng_ht_del(ht, &iter); >> assert(!ret); >> >> - call_rcu(&stream->waitfd_node.head, consumer_free_metadata_stream); >> + call_rcu(&stream->node.head, consumer_free_stream); >> } >> rcu_read_unlock(); >> >> @@ -1636,7 +1634,7 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, >> } >> >> rcu_read_lock(); >> - iter.iter.node = &stream->waitfd_node.node; >> + iter.iter.node = &stream->node.node; >> ret = lttng_ht_del(ht, &iter); >> assert(!ret); >> rcu_read_unlock(); >> @@ -1707,7 +1705,7 @@ end: >> } >> >> free_stream: >> - call_rcu(&stream->waitfd_node.head, consumer_free_metadata_stream); >> + call_rcu(&stream->node.head, consumer_free_stream); >> } >> >> /* >> @@ -1756,7 +1754,7 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, >> /* Steal stream identifier to avoid having streams with the same key */ >> 
consumer_steal_stream_key(stream->key, ht); >> >> - lttng_ht_add_unique_ulong(ht, &stream->waitfd_node); >> + lttng_ht_add_unique_ulong(ht, &stream->node); >> rcu_read_unlock(); >> >> pthread_mutex_unlock(&consumer_data.lock); >> @@ -1881,7 +1879,7 @@ restart: >> assert(node); >> >> stream = caa_container_of(node, struct lttng_consumer_stream, >> - waitfd_node); >> + node); >> >> /* Check for error event */ >> if (revents & (LPOLLERR | LPOLLHUP)) { >> -- >> 1.7.10.4 >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From christian.babeux at efficios.com Tue Oct 16 14:33:19 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 16 Oct 2012 14:33:19 -0400 Subject: [lttng-dev] [PATCH lttng-tools 1/5] Tests: Add an unsupported operators filtering test Message-ID: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> This test validates that filters with unsupported operators are correctly flagged and that enabling an event with such a filter fails as expected.
Signed-off-by: Christian Babeux --- tests/tools/filtering/Makefile.am | 2 + tests/tools/filtering/unsupported-ops | 114 ++++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+) create mode 100644 tests/tools/filtering/Makefile.am create mode 100755 tests/tools/filtering/unsupported-ops diff --git a/tests/tools/filtering/Makefile.am b/tests/tools/filtering/Makefile.am new file mode 100644 index 0000000..62f7099 --- /dev/null +++ b/tests/tools/filtering/Makefile.am @@ -0,0 +1,2 @@ +noinst_SCRIPTS = unsupported-ops +EXTRA_DIST = unsupported-ops diff --git a/tests/tools/filtering/unsupported-ops b/tests/tools/filtering/unsupported-ops new file mode 100755 index 0000000..8b743f4 --- /dev/null +++ b/tests/tools/filtering/unsupported-ops @@ -0,0 +1,114 @@ +#!/bin/bash +# +# Copyright (C) - 2012 Christian Babeux +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License, version 2 only, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but WITHOUT +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +# more details. +# +# You should have received a copy of the GNU General Public License along with +# this program; if not, write to the Free Software Foundation, Inc., 51 +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +TEST_DESC="Filtering - Unsupported operators" + +CURDIR=$(dirname $0)/ +TESTDIR=$CURDIR/../.. 
+LTTNG_BIN="lttng" +SESSION_NAME="filter-unsupported-ops" +EVENT_NAME="bogus" +ENABLE_EVENT_STDERR="/tmp/unsupported-ops-enable" +TRACE_PATH=$(mktemp -d) + +source $TESTDIR/utils.sh + +print_test_banner "$TEST_DESC" + +function enable_ust_lttng_event_filter_unsupported +{ + sess_name=$1 + event_name=$2 + filter=$3 + + echo -n "Enabling lttng event with filtering and unsupported operator " + enable_cmd="$TESTDIR/../src/bin/lttng/$LTTNG_BIN enable-event" + $enable_cmd $event_name -s $sess_name -u --filter "$filter" 2> $ENABLE_EVENT_STDERR 1> /dev/null + + # Enable must fail + if [ $? -eq 0 ]; then + print_fail + return 1 + else + print_ok + return 0 + fi +} + +function test_unsupported_op +{ + test_op_str=$1 + test_op_tkn=$2 + + echo "" + echo -e "=== Testing filter expression with unsupported operator $test_op_str ($test_op_tkn)" + + # Create session + create_lttng_session $SESSION_NAME $TRACE_PATH + + # Create filter + if [ "$test_op_str" == "UNARY_BIN_NOT" ]; then + TEST_FILTER="${test_op_tkn}1" + else + TEST_FILTER="intfield $test_op_tkn 1" + fi + + # Apply filter + enable_ust_lttng_event_filter_unsupported $SESSION_NAME $EVENT_NAME "$TEST_FILTER" + + # Test stderr for unsupported operator + echo -n "Unsupported operator test $test_op_str ($test_op_tkn) " + grep -i -q "not[[:space:]]\+supported" $ENABLE_EVENT_STDERR + + if [ $? -eq 1 ]; then + print_fail + return 1 + else + print_ok + fi + + # Destroy session + destroy_lttng_session $SESSION_NAME + return 0 +} + +# Unsupported operators +OP_STR=("MUL" "DIV" "MOD" "PLUS" "MINUS" "LSHIFT" "RSHIFT" + "BIN_AND" "BIN_OR" "BIN_XOR" "UNARY_BIN_NOT") + +OP_TKN=("*" "/" "%" "+" "-" "<<" ">>" "&" "|" "^" "~") + +OP_COUNT=${#OP_STR[@]} +i=0 + +start_lttng_sessiond + +while [ "$i" -lt "$OP_COUNT" ]; do + test_unsupported_op "${OP_STR[$i]}" "${OP_TKN[$i]}" + + if [ $? 
-eq 1 ]; then + exit 1 + fi + + let "i++" +done + +stop_lttng_sessiond + +# Cleanup +rm -f $ENABLE_EVENT_STDERR +rm -rf $TRACE_PATH -- 1.7.12.2 From christian.babeux at efficios.com Tue Oct 16 14:33:20 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 16 Oct 2012 14:33:20 -0400 Subject: [lttng-dev] [PATCH lttng-tools 2/5] Tests: Add a test for invalid filters In-Reply-To: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> References: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> Message-ID: <1350412403-9974-2-git-send-email-christian.babeux@efficios.com> This test validate that an invalid filter (unmatched parenthesis, field dereferences, unsupported ops, etc.) are correctly flagged as such. Signed-off-by: Christian Babeux --- tests/tools/filtering/Makefile.am | 4 +- tests/tools/filtering/invalid-filters | 135 ++++++++++++++++++++++++++++++++++ 2 files changed, 137 insertions(+), 2 deletions(-) create mode 100755 tests/tools/filtering/invalid-filters diff --git a/tests/tools/filtering/Makefile.am b/tests/tools/filtering/Makefile.am index 62f7099..5f3423a 100644 --- a/tests/tools/filtering/Makefile.am +++ b/tests/tools/filtering/Makefile.am @@ -1,2 +1,2 @@ -noinst_SCRIPTS = unsupported-ops -EXTRA_DIST = unsupported-ops +noinst_SCRIPTS = unsupported-ops invalid-filters +EXTRA_DIST = unsupported-ops invalid-filters diff --git a/tests/tools/filtering/invalid-filters b/tests/tools/filtering/invalid-filters new file mode 100755 index 0000000..d0777e5 --- /dev/null +++ b/tests/tools/filtering/invalid-filters @@ -0,0 +1,135 @@ +#!/bin/bash +# +# Copyright (C) - 2012 Christian Babeux +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License, version 2 only, as +# published by the Free Software Foundation. 
+# +# This program is distributed in the hope that it will be useful, but WITHOUT +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +# more details. +# +# You should have received a copy of the GNU General Public License along with +# this program; if not, write to the Free Software Foundation, Inc., 51 +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +TEST_DESC="Filtering - Invalid filters" + +CURDIR=$(dirname $0)/ +TESTDIR=$CURDIR/../.. +LTTNG_BIN="lttng" +SESSION_NAME="filter-invalid" +EVENT_NAME="bogus" +ENABLE_EVENT_STDERR="/tmp/invalid-filters-stderr" +TRACE_PATH=$(mktemp -d) + +source $TESTDIR/utils.sh + +print_test_banner "$TEST_DESC" + +function enable_ust_lttng_event_filter +{ + sess_name="$1" + event_name="$2" + filter="$3" + echo -n "Enabling lttng event with filtering and invalid filter " + + $TESTDIR/../src/bin/lttng/$LTTNG_BIN enable-event $event_name -s $sess_name -u --filter "$filter" 2> $ENABLE_EVENT_STDERR 1> /dev/null + + # Enable must fail + if [ $? -eq 0 ]; then + print_fail + return 1 + else + print_ok + return 0 + fi +} + +function test_invalid_filter +{ + test_invalid_filter="$1" + + echo "" + echo -e "=== Testing filter expression with invalid filter" + echo -e "Filter: $test_invalid_filter" + + # Create session + create_lttng_session $SESSION_NAME $TRACE_PATH + + # Apply filter + enable_ust_lttng_event_filter $SESSION_NAME $EVENT_NAME "$test_invalid_filter" + + # Destroy session + destroy_lttng_session $SESSION_NAME +} + +function test_bytecode_limit +{ + # Current bytecode limitation is 65536 bytes long. + # Generate a huge bytecode with some perl-fu + BYTECODE_LIMIT=`perl -e 'print "intfield" . 
" && 1" x5460'` + + echo "" + echo -e "=== Testing filter bytecode limits (64KiB)" + + # Create session + create_lttng_session $SESSION_NAME $TRACE_PATH + + # Apply filter + enable_ust_lttng_event_filter $SESSION_NAME $EVENT_NAME "$BYTECODE_LIMIT" + + # Destroy session + destroy_lttng_session $SESSION_NAME +} + +IFS=$'\n' +INVALID_FILTERS=( + # Unsupported ops + "intfield*1" + "intfield/1" + "intfield+1" + "intfield-1" + "intfield>>1" + "intfield<<1" + "intfield&1" + "intfield|1" + "intfield^1" + "~intfield" + "1+11111-3333+1" + "(1+2)*(55*666)" + "1+2*55*666" + "asdf + 1 > 1" + "asdfas < 2332 || asdf + 1 > 1" + "!+-+++-------+++++++++++-----!!--!44+1" + "aaa||(gg)+(333----1)" + "1+1" + # Unmatched parenthesis + "((((((((((((((intfield)))))))))))))" + '0 || ("abc" != "def")) && (3 < 4)' + # Field dereference + "a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a" + "a->" + "a-->a" + "a->a" + "a.b.c->d.e.f+1" + "!a.f.d" + "asdf.asdfsd.sadf < 4" + "asdfasdf->asdfasdf < 2" + ) + +start_lttng_sessiond +for FILTER in ${INVALID_FILTERS[@]}; +do + test_invalid_filter "$FILTER" +done + +test_bytecode_limit + +unset IFS +stop_lttng_sessiond + +rm -f $ENABLE_EVENT_STDERR +rm -rf $TRACE_PATH -- 1.7.12.2 From christian.babeux at efficios.com Tue Oct 16 14:33:21 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 16 Oct 2012 14:33:21 -0400 Subject: [lttng-dev] [PATCH lttng-tools 3/5] Tests: Add a trace statistics utility In-Reply-To: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> References: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> Message-ID: <1350412403-9974-3-git-send-email-christian.babeux@efficios.com> The babelstats script output statistics on fields values for a particular tracepoint. At the moment, the script only show minimum and maximum value for each fields of a particular tracepoint. The trace must be in the babeltrace text format. It can be passed via stdin. 
The script output this format: field_name min max Sample usage: > babeltrace sometracedir | babelstats.pl --tracepoint tp:tptest _seqfield1_length 4 4 seqfield2 "test" "test" stringfield2 "\*" "\*" floatfield 2222 2222 netintfieldhex 0x0 0x63 _seqfield2_length 4 4 longfield 0 99 netintfield 0 99 intfield2 0x0 0x63 intfield 0 99 stringfield "test" "test" doublefield 2 2 arrfield2 "test" "test" Use case: This script could be useful to validate that fields values are within some predefined expected ranges. Signed-off-by: Christian Babeux --- tests/tools/filtering/Makefile.am | 4 +- tests/tools/filtering/babelstats.pl | 174 ++++++++++++++++++++++++++++++++++++ 2 files changed, 176 insertions(+), 2 deletions(-) create mode 100755 tests/tools/filtering/babelstats.pl diff --git a/tests/tools/filtering/Makefile.am b/tests/tools/filtering/Makefile.am index 5f3423a..df99e27 100644 --- a/tests/tools/filtering/Makefile.am +++ b/tests/tools/filtering/Makefile.am @@ -1,2 +1,2 @@ -noinst_SCRIPTS = unsupported-ops invalid-filters -EXTRA_DIST = unsupported-ops invalid-filters +noinst_SCRIPTS = unsupported-ops invalid-filters babelstats.pl +EXTRA_DIST = unsupported-ops invalid-filters babelstats.pl diff --git a/tests/tools/filtering/babelstats.pl b/tests/tools/filtering/babelstats.pl new file mode 100755 index 0000000..d8d4dd0 --- /dev/null +++ b/tests/tools/filtering/babelstats.pl @@ -0,0 +1,174 @@ +#!/usr/bin/perl + +# Copyright (C) - 2012 Christian Babeux +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License, version 2 only, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but WITHOUT +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +# more details. 
+# +# You should have received a copy of the GNU General Public License along with +# this program; if not, write to the Free Software Foundation, Inc., 51 +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +use strict; +use warnings; + +use Getopt::Long; + +my $opt_tracepoint; + +GetOptions('tracepoint=s' => \$opt_tracepoint) + or die("Invalid command-line option\n"); + +defined($opt_tracepoint) + or die("Missing tracepoint, use --tracepoint "); + +# Parse an array string. +# The format is as follow: [ [index] = value, ... ] +sub parse_array +{ + my ($arr_str) = @_; + my @array = (); + + # Strip leading and ending brackets, remove whitespace + $arr_str =~ s/^\[//; + $arr_str =~ s/\]$//; + $arr_str =~ s/\s//g; + + my @entries = split(',', $arr_str); + + foreach my $entry (@entries) { + if ($entry =~ /^\[(\d+)\]=(\d+)$/) { + my $index = $1; + my $value = $2; + splice @array, $index, 0, $value; + } + } + + return \@array; +} + +# Parse fields values. +# Format can either be a name = array or a name = value pair. +sub parse_fields +{ + my ($fields_str) = @_; + my %fields_hash; + + my $field_name = '[\w\d_]+'; + my $field_value = '[\w\d_\\\*"]+'; + my $array = '\[(?:\s\[\d+\]\s=\s\d+,)*\s\[\d+\]\s=\s\d+\s\]'; + + # Split the various fields + my @fields = ($fields_str =~ /$field_name\s=\s(?:$array|$field_value)/g); + + foreach my $field (@fields) { + if ($field =~ /($field_name)\s=\s($array)/) { + my $name = $1; + my $value = parse_array($2); + $fields_hash{$name} = $value; + } + + if ($field =~ /($field_name)\s=\s($field_value)/) { + my $name = $1; + my $value = $2; + $fields_hash{$name} = $value; + } + } + + return \%fields_hash; +} + +# Using an event array, merge all the fields +# of a particular tracepoint. 
+sub merge_fields +{ + my ($events_ref) = @_; + my %merged; + + foreach my $event (@{$events_ref}) { + my $tp_provider = $event->{'tp_provider'}; + my $tp_name = $event->{'tp_name'}; + my $tracepoint = "$tp_provider:$tp_name"; + + foreach my $key (keys %{$event->{'fields'}}) { + my $val = $event->{'fields'}->{$key}; + + # TODO: Merge of array is not implemented. + next if (ref($val) eq 'ARRAY'); + $merged{$tracepoint}{$key}{$val} = undef; + } + } + + return \%merged; +} + +# Print the minimum and maximum of each field +# for a particular tracepoint. +sub print_fields_stats +{ + my ($merged_ref, $tracepoint) = @_; + + return unless ($tracepoint && exists $merged_ref->{$tracepoint}); + + foreach my $field (keys %{$merged_ref->{$tracepoint}}) { + my @sorted; + # Dereference explicitly: calling keys on a hash reference is not portable across Perl versions + my @val = keys %{$merged_ref->{$tracepoint}->{$field}}; + + if ($val[0] =~ /^\d+$/) { + # Sort numerically + @sorted = sort { $a <=> $b } @val; + } elsif ($val[0] =~ /^0x[\da-f]+$/i) { + # Convert the hex values and sort numerically + @sorted = sort { hex($a) <=> hex($b) } @val; + } else { + # Fallback, alphabetical sort + @sorted = sort { lc($a) cmp lc($b) } @val; + } + + my $min = $sorted[0]; + my $max = $sorted[-1]; + + print "$field $min $max\n"; + } +} + +my @events; + +while (<>) +{ + my $timestamp = '\[(.*)\]'; + my $elapsed = '\((.*)\)'; + my $hostname = '.*'; + my $pname = '.*'; + my $pid = '\d+'; + my $tp_provider = '.*'; + my $tp_name = '.*'; + my $cpu_info = '{\scpu_id\s=\s(\d+)\s\}'; + my $fields = '{(.*)}'; + + # Parse babeltrace text output format + if (/$timestamp\s$elapsed\s($hostname):($pname):($pid)\s($tp_provider):($tp_name):\s$cpu_info,\s$fields/) { + my %event_hash; + + $event_hash{'timestamp'} = $1; + $event_hash{'elapsed'} = $2; + $event_hash{'hostname'} = $3; + $event_hash{'pname'} = $4; + $event_hash{'pid'} = $5; + $event_hash{'tp_provider'} = $6; + $event_hash{'tp_name'} = $7; + $event_hash{'cpu_id'} = $8; + $event_hash{'fields'} = parse_fields($9); + + push @events, \%event_hash; + }
+} + +my %merged_fields = %{merge_fields(\@{events})}; +print_fields_stats(\%merged_fields, $opt_tracepoint); -- 1.7.12.2 From christian.babeux at efficios.com Tue Oct 16 14:33:22 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 16 Oct 2012 14:33:22 -0400 Subject: [lttng-dev] [PATCH lttng-tools 4/5] Tests: Add a test for valid filters In-Reply-To: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> References: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> Message-ID: <1350412403-9974-4-git-send-email-christian.babeux@efficios.com> This test validates that, for a given filter, the trace output conforms to the expected filter behavior. This test relies on the babelstats utility. With the help of this script, we can verify that the minimum and maximum values of the fields of interest are within the expected ranges. For example, given 100 iterations on a tracepoint with the field 'intfield', with values starting from 0 and incrementing on each iteration, and the filter expression 'intfield < 4', we would expect the min-max range to lie within [0,3]. Thus, if the range computed by babelstats does not match the expected range, this could indicate a failure in the filtering mechanism.
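The range check described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the valid-filters script from this patch: the helper names (min_max, simulate_filtered_intfield, validate_range) are hypothetical, and the traced application plus the 'intfield < 4' filter are simulated with seq and awk instead of a real lttng session.

```shell
# Hypothetical helper: print "min max" for the integers read on stdin.
min_max()
{
	awk 'NR == 1 { min = $1 + 0; max = $1 + 0 }
	     { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
	     END { if (NR > 0) print min, max }'
}

# Stand-in for the traced application and the filter 'intfield < 4':
# emit 0..99 and keep only the values the filter would let through.
simulate_filtered_intfield()
{
	seq 0 99 | awk '$1 < 4'
}

# Hypothetical helper: compare the observed "min max" pair with the expected one.
validate_range()
{
	observed="$1"
	expected="$2"

	if [ "$observed" = "$expected" ]; then
		echo "ok"
	else
		echo "fail: got '$observed', expected '$expected'"
		return 1
	fi
}

validate_range "$(simulate_filtered_intfield | min_max)" "0 3"
```

In a real run, the simulated stream would presumably be replaced by the min and max columns that babelstats prints for 'intfield'; the comparison step stays the same.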
Signed-off-by: Christian Babeux --- tests/tools/filtering/Makefile.am | 20 +- tests/tools/filtering/gen-ust-events.c | 59 +++++ tests/tools/filtering/tp.c | 15 ++ tests/tools/filtering/tp.h | 57 +++++ tests/tools/filtering/valid-filters | 411 +++++++++++++++++++++++++++++++++ 5 files changed, 560 insertions(+), 2 deletions(-) create mode 100644 tests/tools/filtering/gen-ust-events.c create mode 100644 tests/tools/filtering/tp.c create mode 100644 tests/tools/filtering/tp.h create mode 100755 tests/tools/filtering/valid-filters diff --git a/tests/tools/filtering/Makefile.am b/tests/tools/filtering/Makefile.am index df99e27..a3bf866 100644 --- a/tests/tools/filtering/Makefile.am +++ b/tests/tools/filtering/Makefile.am @@ -1,2 +1,18 @@ -noinst_SCRIPTS = unsupported-ops invalid-filters babelstats.pl -EXTRA_DIST = unsupported-ops invalid-filters babelstats.pl +AM_CFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src -I$(top_srcdir)/tests -I$(srcdir) -O2 -g +AM_LDFLAGS = + +if LTTNG_TOOLS_BUILD_WITH_LIBDL +AM_LDFLAGS += -ldl +endif +if LTTNG_TOOLS_BUILD_WITH_LIBC_DL +AM_LDFLAGS += -lc +endif + +if HAVE_LIBLTTNG_UST_CTL +noinst_PROGRAMS = gen-ust-events +gen_ust_events_SOURCES = gen-ust-events.c tp.c tp.h +gen_ust_events_LDADD = -llttng-ust +endif + +noinst_SCRIPTS = unsupported-ops invalid-filters valid-filters babelstats.pl +EXTRA_DIST = unsupported-ops invalid-filters valid-filters babelstats.pl diff --git a/tests/tools/filtering/gen-ust-events.c b/tests/tools/filtering/gen-ust-events.c new file mode 100644 index 0000000..c789c89 --- /dev/null +++ b/tests/tools/filtering/gen-ust-events.c @@ -0,0 +1,59 @@ +/* + * Copyright (C) - 2012 David Goulet + * + * This library is free software; you can redistribute it and/or modify it + * under the terms of the GNU Lesser General Public License as published by the + * Free Software Foundation; version 2.1 of the License. 
+ * + * This library is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License + * for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this library; if not, write to the Free Software Foundation, + * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define TRACEPOINT_DEFINE +#include "tp.h" + +int main(int argc, char **argv) +{ + int i, netint; + long values[] = { 1, 2, 3 }; + char text[10] = "test"; + char escape[10] = "\\*"; + double dbl = 2.0; + float flt = 2222.0; + /* Generate 100 events by default. */ + unsigned int nr_iter = 100; + useconds_t nr_usec = 0; + + if (argc >= 2) { + nr_iter = atoi(argv[1]); + } + + if (argc == 3) { + /* By default, don't wait unless user specifies. */ + nr_usec = atoi(argv[2]); + } + + for (i = 0; i < nr_iter; i++) { + netint = htonl(i); + tracepoint(tp, tptest, i, netint, values, text, strlen(text), escape, dbl, flt); + usleep(nr_usec); + } + + return 0; +} diff --git a/tests/tools/filtering/tp.c b/tests/tools/filtering/tp.c new file mode 100644 index 0000000..a09561d --- /dev/null +++ b/tests/tools/filtering/tp.c @@ -0,0 +1,15 @@ +/* + * Copyright (c) - 2012 David Goulet + * + * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED OR + * IMPLIED. ANY USE IS AT YOUR OWN RISK. + * + * Permission is hereby granted to use or copy this program for any purpose, + * provided the above notices are retained on all copies. Permission to modify + * the code and to distribute modified code is granted, provided the above + * notices are retained, and a notice that the code was modified is included + * with the above copyright notice.
+ */ + +#define TRACEPOINT_CREATE_PROBES +#include "tp.h" diff --git a/tests/tools/filtering/tp.h b/tests/tools/filtering/tp.h new file mode 100644 index 0000000..15f81e5 --- /dev/null +++ b/tests/tools/filtering/tp.h @@ -0,0 +1,57 @@ +#undef TRACEPOINT_PROVIDER +#define TRACEPOINT_PROVIDER tp + +#if !defined(_TRACEPOINT_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ) +#define _TRACEPOINT_TP_H + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * Copyright (C) 2011 Mathieu Desnoyers + * + * THIS MATERIAL IS PROVIDED AS IS, WITH ABSOLUTELY NO WARRANTY EXPRESSED + * OR IMPLIED. ANY USE IS AT YOUR OWN RISK. + * + * Permission is hereby granted to use or copy this program + * for any purpose, provided the above notices are retained on all copies. + * Permission to modify the code and to distribute modified code is granted, + * provided the above notices are retained, and a notice that the code was + * modified is included with the above copyright notice. + */ + +#include + +TRACEPOINT_EVENT(tp, tptest, + TP_ARGS(int, anint, int, netint, long *, values, + char *, text, size_t, textlen, + char *, etext, double, doublearg, float, floatarg), + TP_FIELDS( + ctf_integer(int, intfield, anint) + ctf_integer_hex(int, intfield2, anint) + ctf_integer(long, longfield, anint) + ctf_integer_network(int, netintfield, netint) + ctf_integer_network_hex(int, netintfieldhex, netint) + ctf_array(long, arrfield1, values, 3) + ctf_array_text(char, arrfield2, text, 10) + ctf_sequence(char, seqfield1, text, size_t, textlen) + ctf_sequence_text(char, seqfield2, text, size_t, textlen) + ctf_string(stringfield, text) + ctf_string(stringfield2, etext) + ctf_float(float, floatfield, floatarg) + ctf_float(double, doublefield, doublearg) + ) +) + +#endif /* _TRACEPOINT_TP_H */ + +#undef TRACEPOINT_INCLUDE_FILE +#define TRACEPOINT_INCLUDE_FILE ./tp.h + +/* This part must be outside ifdef protection */ +#include + +#ifdef __cplusplus +} +#endif diff --git a/tests/tools/filtering/valid-filters 
b/tests/tools/filtering/valid-filters new file mode 100755 index 0000000..b48b6ed --- /dev/null +++ b/tests/tools/filtering/valid-filters @@ -0,0 +1,411 @@ +#!/bin/bash +# +# Copyright (C) - 2012 Christian Babeux +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License, version 2 only, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but WITHOUT +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for +# more details. +# +# You should have received a copy of the GNU General Public License along with +# this program; if not, write to the Free Software Foundation, Inc., 51 +# Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +TEST_DESC="Filtering - Valid filters" + +CURDIR=$(dirname $0)/ +TESTDIR=$CURDIR/../.. +LTTNG_BIN="lttng" +BIN_NAME="gen-ust-events" +STATS_BIN="babelstats.pl" +SESSION_NAME="valid_filter" +EVENT_NAME="tp:tptest" +NR_ITER=100 + +source $TESTDIR/utils.sh + +print_test_banner "$TEST_DESC" + +if [ ! -x "$CURDIR/$BIN_NAME" ]; then + echo -e "No UST events binary detected. Passing." + exit 0 +fi + +function enable_ust_lttng_event_filter() +{ + sess_name="$1" + event_name="$2" + filter="$3" + echo -n "Enabling lttng event with filtering " + + $TESTDIR/../src/bin/lttng/$LTTNG_BIN enable-event $event_name -s $sess_name -u --filter "$filter" 2>&1 >/dev/null + + if [ $? -eq 0 ]; then + print_ok + return 0 + else + print_fail + return 1 + fi +} + +function run_apps +{ + ./$CURDIR/$BIN_NAME $NR_ITER >/dev/null 2>&1 & +} + +function wait_apps +{ + echo "Waiting for applications to end" + while [ -n "$(pidof $BIN_NAME)" ]; do + echo -n "."
+ sleep 1 + done + echo "" +} + +function test_valid_filter +{ + filter="$1" + validator="$2" + + echo "" + echo -e "=== Testing valid filter: $1" + + trace_path=$(mktemp -d) + + # Create session + create_lttng_session $SESSION_NAME $trace_path + + # Enable filter + enable_ust_lttng_event_filter $SESSION_NAME $EVENT_NAME $filter + + # Trace apps + start_lttng_tracing $SESSION_NAME + run_apps + wait_apps + stop_lttng_tracing $SESSION_NAME + + # Destroy session + destroy_lttng_session $SESSION_NAME + + echo -n "Validating filter output " + stats=`babeltrace $trace_path | $CURDIR/$STATS_BIN --tracepoint $EVENT_NAME` + + $validator "$stats" + + if [ $? -eq 0 ]; then + print_ok +# rm -rf $trace_path + return 0 + else + print_fail + return 1 + fi +} + +function validate_min_max +{ + stats="$1" + field=$2 + expected_min=$3 + expected_max=$4 + + echo $stats | grep -q "$field $expected_min $expected_max" + + return $? +} + +function validator_intfield +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "1" "99" + status=$(($status|$?)) + + validate_min_max "$stats" "intfield2" "0x1" "0x63" + status=$(($status|$?)) + + validate_min_max "$stats" "longfield" "1" "99" + status=$(($status|$?)) + + validate_min_max "$stats" "netintfield" "1" "99" + status=$(($status|$?)) + + validate_min_max "$stats" "netintfieldhex" "0x1" "0x63" + status=$(($status|$?)) + + validate_min_max "$stats" "floatfield" "2222" "2222" + status=$(($status|$?)) + + validate_min_max "$stats" "doublefield" "2" "2" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_gt +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "2" "99" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_ge +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "1" "99" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_lt +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "0" "1" + 
status=$(($status|$?)) + + return $status +} + +function validator_intfield_le +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "0" "2" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_eq +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "1" "1" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_ne +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "0" "98" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_not +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "0" "0" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_gt_and_longfield_gt +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "43" "99" + status=$(($status|$?)) + validate_min_max "$stats" "longfield" "43" "99" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_ge_and_longfield_le +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "42" "42" + status=$(($status|$?)) + validate_min_max "$stats" "longfield" "42" "42" + status=$(($status|$?)) + + return $status +} + +function validator_intfield_lt_or_longfield_gt +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "0" "99" + status=$(($status|$?)) + validate_min_max "$stats" "longfield" "0" "99" + status=$(($status|$?)) + + return $status +} + +function validator_mixed_str_or_int_and_int +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "34" "99" + status=$(($status|$?)) + + validate_min_max "$stats" "stringfield" "\"test\"" "\"test\"" + status=$(($status|$?)) + + return $status +} + +function validator_mixed_int_double +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "0" "42" + status=$(($status|$?)) + + return $status +} + +function validator_true_statement +{ + stats="$1" + status=0 + + validate_min_max "$stats" "intfield" "0" "99" + status=$(($status|$?)) + + 
validate_min_max "$stats" "intfield2" "0x0" "0x63" + status=$(($status|$?)) + + validate_min_max "$stats" "longfield" "0" "99" + status=$(($status|$?)) + + validate_min_max "$stats" "netintfield" "0" "99" + status=$(($status|$?)) + + validate_min_max "$stats" "netintfieldhex" "0x0" "0x63" + status=$(($status|$?)) + + validate_min_max "$stats" "floatfield" "2222" "2222" + status=$(($status|$?)) + + validate_min_max "$stats" "doublefield" "2" "2" + status=$(($status|$?)) + + validate_min_max "$stats" "stringfield" "\"test\"" "\"test\"" + status=$(($status|$?)) + + validate_min_max "$stats" "stringfield2" ""\*"" ""\*"" + status=$(($status|$?)) + + return $status +} + +IFS=$'\n' + +issue_356_filter="intfield > 0 && intfield > 1 && " +issue_356_filter+="intfield > 2 && intfield > 3 && " +issue_356_filter+="intfield > 4 && intfield > 5 && " +issue_356_filter+="intfield > 6 && intfield > 7 && " +issue_356_filter+="intfield > 8 || intfield > 0" + +# One to one mapping between filters and validators + +FILTERS=("intfield" #1 + "intfield > 1" #2 + "intfield >= 1" #3 + "intfield < 2" #4 + "intfield <= 2" #5 + "intfield == 1" #6 + "intfield != 99" #7 + "!intfield" #8 + "-intfield" #9 + "--intfield" #10 + "+intfield" #11 + "++intfield" #12 + "intfield > 1 && longfield > 42" #13 + "intfield >= 42 && longfield <= 42" #14 + "intfield < 1 || longfield > 98" #15 + "(stringfield == \"test\" || intfield != 10) && intfield > 33" #16 + "intfield < 42.4242424242" #17 + "\"test\" == \"test\"" #18 #Issue #342 + "stringfield == \"test\"" #19 + "stringfield == \"t*\"" #20 + "stringfield == \"*\"" #21 + $issue_356_filter #22 #Issue #356 + "intfield < 0xDEADBEEF" #23 + "intfield < 0x2" #24 + "intfield < 02" #25 + "stringfield2 == \"\\\*\"" #26 +) + +VALIDATOR=("validator_intfield" #1 + "validator_intfield_gt" #2 + "validator_intfield_ge" #3 + "validator_intfield_lt" #4 + "validator_intfield_le" #5 + "validator_intfield_eq" #6 + "validator_intfield_ne" #7 + "validator_intfield_not" #8 + 
"validator_intfield" #9 + "validator_intfield" #10 + "validator_intfield" #11 + "validator_intfield" #12 + "validator_intfield_gt_and_longfield_gt" #13 + "validator_intfield_ge_and_longfield_le" #14 + "validator_intfield_lt_or_longfield_gt" #15 + "validator_mixed_str_or_int_and_int" #16 + "validator_mixed_int_double" #17 + "validator_true_statement" #18 + "validator_true_statement" #19 + "validator_true_statement" #20 + "validator_true_statement" #21 + "validator_intfield" #22 + "validator_true_statement" #23 + "validator_intfield_lt" #24 + "validator_intfield_lt" #25 + "validator_true_statement" #26 +) + +FILTER_COUNT=${#FILTERS[@]} +i=0 + +start_lttng_sessiond + +while [ "$i" -lt "$FILTER_COUNT" ]; do + + test_valid_filter "${FILTERS[$i]}" "${VALIDATOR[$i]}" + + if [ $? -eq 1 ]; then + stop_lttng_sessiond + exit 1 + fi + + let "i++" +done + +stop_lttng_sessiond -- 1.7.12.2 From christian.babeux at efficios.com Tue Oct 16 14:33:23 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 16 Oct 2012 14:33:23 -0400 Subject: [lttng-dev] [PATCH lttng-tools 5/5] Tests: Add filtering tests to configure In-Reply-To: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> References: <1350412403-9974-1-git-send-email-christian.babeux@efficios.com> Message-ID: <1350412403-9974-5-git-send-email-christian.babeux@efficios.com> Add filtering folder to top-level Makefile.am. Also add a runall script to run all filtering tests. 
Signed-off-by: Christian Babeux --- configure.ac | 1 + tests/tools/Makefile.am | 2 +- tests/tools/filtering/Makefile.am | 4 ++-- tests/tools/filtering/runall | 28 ++++++++++++++++++++++++++++ tests/tools/runall.sh | 2 +- 5 files changed, 33 insertions(+), 4 deletions(-) create mode 100755 tests/tools/filtering/runall diff --git a/configure.ac b/configure.ac index cb5dc38..fbaae6f 100644 --- a/configure.ac +++ b/configure.ac @@ -288,6 +288,7 @@ AC_CONFIG_FILES([ tests/kernel/Makefile tests/tools/Makefile tests/tools/streaming/Makefile + tests/tools/filtering/Makefile tests/tools/health/Makefile tests/ust/Makefile tests/ust/nprocesses/Makefile diff --git a/tests/tools/Makefile.am b/tests/tools/Makefile.am index 173dce2..56eda3a 100644 --- a/tests/tools/Makefile.am +++ b/tests/tools/Makefile.am @@ -1,4 +1,4 @@ -SUBDIRS = streaming health +SUBDIRS = streaming filtering health AM_CFLAGS = -I$(top_srcdir)/include -I$(top_srcdir)/src -I$(top_srcdir)/tests -g -Wall AM_LDFLAGS = -lurcu -lurcu-cds diff --git a/tests/tools/filtering/Makefile.am b/tests/tools/filtering/Makefile.am index a3bf866..e1e715d 100644 --- a/tests/tools/filtering/Makefile.am +++ b/tests/tools/filtering/Makefile.am @@ -14,5 +14,5 @@ gen_ust_events_SOURCES = gen-ust-events.c tp.c tp.h gen_ust_events_LDADD = -llttng-ust endif -noinst_SCRIPTS = unsupported-ops invalid-filters valid-filters babelstats.pl -EXTRA_DIST = unsupported-ops invalid-filters valid-filters babelstats.pl +noinst_SCRIPTS = runall unsupported-ops invalid-filters valid-filters babelstats.pl +EXTRA_DIST = runall unsupported-ops invalid-filters valid-filters babelstats.pl diff --git a/tests/tools/filtering/runall b/tests/tools/filtering/runall new file mode 100755 index 0000000..c92e399 --- /dev/null +++ b/tests/tools/filtering/runall @@ -0,0 +1,28 @@ +#!/bin/bash + +DIR=$(dirname $0) + +tests=( $DIR/unsupported-ops $DIR/invalid-filters $DIR/valid-filters ) +exit_code=0 + +function start_tests () +{ + for bin in ${tests[@]}; + do + if [ ! 
-e $bin ]; then + echo -e "$bin not found, passing" + continue + fi + + ./$bin + # Test must return 0 to pass. + if [ $? -ne 0 ]; then + exit_code=1 + break + fi + done +} + +start_tests + +exit $exit_code diff --git a/tests/tools/runall.sh b/tests/tools/runall.sh index 0ad7cf1..b2be91c 100755 --- a/tests/tools/runall.sh +++ b/tests/tools/runall.sh @@ -3,7 +3,7 @@ DIR=$(dirname $0) tests=( $DIR/test_kernel_data_trace $DIR/test_sessions $DIR/test_ust_data_trace \ - $DIR/streaming/runall $DIR/health/runall ) + $DIR/streaming/runall $DIR/health/runall $DIR/filtering/runall) exit_code=0 -- 1.7.12.2 From paulmck at linux.vnet.ibm.com Tue Oct 16 17:37:27 2012 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Tue, 16 Oct 2012 14:37:27 -0700 Subject: [lttng-dev] urcu stack and queues updates and documentation In-Reply-To: <20121014175332.GA2947@Krystal> References: <20121014175332.GA2947@Krystal> Message-ID: <20121016213727.GL2385@linux.vnet.ibm.com> On Sun, Oct 14, 2012 at 01:53:32PM -0400, Mathieu Desnoyers wrote: > Hi Paul! > > I know you are currently looking at documentation of urcu data > structures. I did quite a bit of work in that area these past days. Here > is my plan: Actually, I diverted to the atomic operations, given that the stack/queue API seems to be in flux. ;-) > 1) I would like to deprecate, at some point, rculfqueue, wfqueue, and > rculfstack. > > 2) For wfqueue, we replace it by wfcqueue, currently in the urcu master > branch. > > 3) For rculfstack, we replace it by lfstack available here (volatile > branch): > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > branch: urcu/lfstack I probably have to document them to have any chance of having an opinion, other than my usual advice to avoid disrupting users of the old interfaces. 
> 4) I did documentation improvements (and implemented pop_all as well as > empty, and iterators) for wfstack here (volatile branch too): > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > branch: urcu/wfstack I will be very happy to take advantage of this. ;-) > 5) The last one to look into would be rculfqueue. I'd really like to > create a lfcqueue derived from wfcqueue if possible. It's the next > item on my todo list this weekend. The piece I am missing is ABA avoidance. Or is this the approach that assumes a single dequeuer? Thanx, Paul > Thoughts ? > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. > http://www.efficios.com > From jdesfossez at efficios.com Tue Oct 16 20:44:13 2012 From: jdesfossez at efficios.com (Julien Desfossez) Date: Tue, 16 Oct 2012 20:44:13 -0400 Subject: [lttng-dev] [LTTNG-TOOLS PATCH] Fix: Remove network stream ID ABI calls Message-ID: <1350434653-8724-1-git-send-email-jdesfossez@efficios.com> This patch removes some experimental kernel ABI calls that were not supposed to get merged. The kernel support for these calls has never been introduced in lttng-modules. 
I'm not sure how it happened, but it seems that this code got introduced along with the commit 00e2e675d54dc726a7c8f8887c889cc8ef022003 Signed-off-by: Julien Desfossez --- src/common/kernel-ctl/kernel-ctl.c | 6 ------ src/common/kernel-ctl/kernel-ctl.h | 2 -- src/common/kernel-ctl/kernel-ioctl.h | 4 ---- 3 files changed, 12 deletions(-) diff --git a/src/common/kernel-ctl/kernel-ctl.c b/src/common/kernel-ctl/kernel-ctl.c index a93d251..e4a268e 100644 --- a/src/common/kernel-ctl/kernel-ctl.c +++ b/src/common/kernel-ctl/kernel-ctl.c @@ -378,9 +378,3 @@ int kernctl_put_subbuf(int fd) { return ioctl(fd, RING_BUFFER_PUT_SUBBUF); } - -/* Set the stream_id */ -int kernctl_set_stream_id(int fd, unsigned long *stream_id) -{ - return ioctl(fd, RING_BUFFER_SET_STREAM_ID, stream_id); -} diff --git a/src/common/kernel-ctl/kernel-ctl.h b/src/common/kernel-ctl/kernel-ctl.h index 85a3a18..ea2aa58 100644 --- a/src/common/kernel-ctl/kernel-ctl.h +++ b/src/common/kernel-ctl/kernel-ctl.h @@ -66,7 +66,5 @@ int kernctl_get_subbuf(int fd, unsigned long *pos); int kernctl_put_subbuf(int fd); int kernctl_buffer_flush(int fd); -int kernctl_set_stream_id(int fd, unsigned long *stream_id); -int kernctl_get_net_stream_id_offset(int fd, unsigned long *offset); #endif /* _LTTNG_KERNEL_CTL_H */ diff --git a/src/common/kernel-ctl/kernel-ioctl.h b/src/common/kernel-ctl/kernel-ioctl.h index 8e22632..75d6da0 100644 --- a/src/common/kernel-ctl/kernel-ioctl.h +++ b/src/common/kernel-ctl/kernel-ioctl.h @@ -46,8 +46,6 @@ #define RING_BUFFER_GET_MMAP_READ_OFFSET _IOR(0xF6, 0x0B, unsigned long) /* flush the current sub-buffer */ #define RING_BUFFER_FLUSH _IO(0xF6, 0x0C) -/* map stream to stream id for network streaming */ -#define RING_BUFFER_SET_STREAM_ID _IOW(0xF6, 0x0D, unsigned long) /* Old ABI (without support for 32/64 bits compat) */ /* LTTng file descriptor ioctl */ @@ -71,8 +69,6 @@ #define LTTNG_KERNEL_OLD_STREAM _IO(0xF6, 0x60) #define LTTNG_KERNEL_OLD_EVENT \ _IOW(0xF6, 0x61, struct 
lttng_kernel_old_event) -#define LTTNG_KERNEL_OLD_STREAM_ID_OFFSET \ - _IOR(0xF6, 0x62, unsigned long) /* Event and Channel FD ioctl */ #define LTTNG_KERNEL_OLD_CONTEXT \ -- 1.7.10.4 From Andrew.McDermott at windriver.com Wed Oct 17 07:32:14 2012 From: Andrew.McDermott at windriver.com (McDermott, Andrew) Date: Wed, 17 Oct 2012 11:32:14 +0000 Subject: [lttng-dev] are LTTng-2.0 packages available on Fedora 16 via yum? Message-ID: <7F632A9222059A42AF70FCB7965774AA2063D880@ALA-MBB.corp.ad.wrs.com> Hi, I'm happily using LTTng-2.0 on Ubuntu-12.04 but I was wondering if Fedora 16 is supported via a yum install. I see quite a few references to 17 but not necessarily 16. Is this correct or am I simply not searching hard enough. Thanks, Andy. From mathieu.desnoyers at efficios.com Wed Oct 17 11:19:46 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 17 Oct 2012 11:19:46 -0400 Subject: [lttng-dev] urcu stack and queues updates and documentation In-Reply-To: <20121016213727.GL2385@linux.vnet.ibm.com> References: <20121014175332.GA2947@Krystal> <20121016213727.GL2385@linux.vnet.ibm.com> Message-ID: <20121017151946.GA14514@Krystal> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > On Sun, Oct 14, 2012 at 01:53:32PM -0400, Mathieu Desnoyers wrote: > > Hi Paul! > > > > I know you are currently looking at documentation of urcu data > > structures. I did quite a bit of work in that area these past days. Here > > is my plan: > > Actually, I diverted to the atomic operations, given that the stack/queue > API seems to be in flux. ;-) That sounds like a wise decision ;-) > > > 1) I would like to deprecate, at some point, rculfqueue, wfqueue, and > > rculfstack. > > > > 2) For wfqueue, we replace it by wfcqueue, currently in the urcu master > > branch. 
> > > > 3) For rculfstack, we replace it by lfstack available here (volatile > > branch): > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > branch: urcu/lfstack > > I probably have to document them to have any chance of having an opinion, > other than my usual advice to avoid disrupting users of the old interfaces. My general plan is to leave the old interfaces in place, marking them as "deprecated" by adding a __attribute__((deprecated("This interface is deprecated. Please refer to urcu/xxxqueue.h for its replacement."))). Then we'll be able to drop the deprecated interfaces in a couple of versions. > > > 4) I did documentation improvements (and implemented pop_all as well as > > empty, and iterators) for wfstack here (volatile branch too): > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > branch: urcu/wfstack > > I will be very happy to take advantage of this. ;-) I wonder how we should move forward with these ? I could pull the urcu/wfstack, urcu/lfstack commits into master with your approval, and mark rculfstack and wfqueue as deprecated. wfstack is simply extended. I would wait a bit before deciding anything wrt rculfqueue. Thoughts ? > > > 5) The last one to look into would be rculfqueue. I'd really like to > > create a lfcqueue derived from wfcqueue if possible. It's the next > > item on my todo list this weekend. > > The piece I am missing is ABA avoidance. Or is this the approach > that assumes a single dequeuer? If we look at the big picture, the main difference between the "wf" and "lf" approaches, both for stack and queue, is that "wf" requires traversal to busy-wait when it sees the intermediate NULL pointer state. This allows wait-free push/enqueue with xchg. The "lf" approach ensures that a simple traversal can be done on the structures, at the expense of requiring a cmpxchg on the enqueue/push. Luckily, for stacks, the nature of stacks makes "push" ABA-proof (see the documentation in the code), even if we use cmpxchg. 
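The claim that a cmpxchg-based stack push is ABA-proof can be sketched with C11 atomics. This is only an illustration, not the urcu lfstack code: `struct lfs_node`, `struct lfs_stack`, `lfs_init` and `lfs_push` are hypothetical names, and the default sequentially consistent ordering is used for simplicity.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node and stack types for the sketch. */
struct lfs_node {
	struct lfs_node *next;
};

struct lfs_stack {
	_Atomic(struct lfs_node *) head;
};

static void lfs_init(struct lfs_stack *s)
{
	atomic_init(&s->head, NULL);
}

/*
 * Lock-free push: link the new node in front of the observed head,
 * then publish it with a compare-and-swap. On failure the CAS stores
 * the current head into 'old', so the loop simply relinks and retries.
 */
static void lfs_push(struct lfs_stack *s, struct lfs_node *node)
{
	struct lfs_node *old = atomic_load(&s->head);

	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&s->head, &old, node));
}
```

Push only needs the head pointer it compares against to be the value it stored in `node->next`. If the head was popped and the same address pushed again in between, the resulting link is still correct, which is why ABA is harmless on this side. Pop is a different matter, since it reads `head->next` before the swap.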
Unluckily, for queues, using cmpxchg on enqueue is ABA-prone. dequeue is ABA-prone too. Moreover, we need to have existence guarantees, so an enqueue does not attempt to do a cmpxchg on the next pointer of a node that has already been dequeued and reallocated. So, one approach is to always rely on RCU, and require the RCU read-side lock to be held around enqueue, and around dequeue. Now, the question is: can we rely on other, non-rcu techniques, to protect lfqueue against ABA and offer existence guarantees ? A single-dequeuer approach would unfortunately not be sufficient, because enqueue is ABA-prone, and due to lack of existence guarantees for the node we are about to append after: if we have multiple enqueuers and a single dequeuer, one enqueue could suffer from ABA, and try to touch reallocated memory, due to dequeue+reallocation of a node. Even forcing single-enqueuer/single-dequeuer would not suffice: if, between the moment we get the tail node we plan to append after, and the moment we perform the cmpxchg to that node's next pointer, the node is dequeued and freed, we would be touching freed memory (corruption). Therefore, that would require a single mutex on _both_ enqueue and dequeue operations, which really defeats the purpose of a lock-free queue. So my current understanding is that we might have to stay with a RCU lfcqueue, requiring RCU read-side lock to be held for enqueue and dequeue, and requiring to wait for a grace period to elapse before freeing the memory returned by dequeue. The benefit of using rculfcqueue over wfcqueue is that traversal of the nodes, and dequeue, don't need to busy-loop on NULL next pointers. Thoughts ? Thanks! Mathieu > > Thanx, Paul > > > Thoughts ? > > > > Thanks, > > > > Mathieu > > > > -- > > Mathieu Desnoyers > > Operating System Efficiency R&D Consultant > > EfficiOS Inc. > > http://www.efficios.com > > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc.
http://www.efficios.com From jistone at redhat.com Wed Oct 17 12:35:39 2012 From: jistone at redhat.com (Josh Stone) Date: Wed, 17 Oct 2012 09:35:39 -0700 Subject: [lttng-dev] are LTTng-2.0 packages available on Fedora 16 via yum? In-Reply-To: <7F632A9222059A42AF70FCB7965774AA2063D880@ALA-MBB.corp.ad.wrs.com> References: <7F632A9222059A42AF70FCB7965774AA2063D880@ALA-MBB.corp.ad.wrs.com> Message-ID: <507EDE5B.8090102@redhat.com> (resending, as I neglected to CC the list...) On 10/17/2012 04:32 AM, McDermott, Andrew wrote: > I'm happily using LTTng-2.0 on Ubuntu-12.04 but I was wondering if > Fedora 16 is supported via a yum install. I see quite a few references > to 17 but not necessarily 16. Is this correct or am I simply not > searching hard enough. You can see which packages are available anywhere in Fedora here: https://apps.fedoraproject.org/packages/lttng-tools https://apps.fedoraproject.org/packages/lttng-ust They were just added to Fedora in July, and at that time Yannick only requested to add f17, so that's why f16 doesn't have them. https://bugzilla.redhat.com/show_bug.cgi?id=717748#c20 https://bugzilla.redhat.com/show_bug.cgi?id=834481#c4 HTH, Josh From Andrew.McDermott at windriver.com Wed Oct 17 13:28:12 2012 From: Andrew.McDermott at windriver.com (McDermott, Andrew) Date: Wed, 17 Oct 2012 17:28:12 +0000 Subject: [lttng-dev] are LTTng-2.0 packages available on Fedora 16 via yum? In-Reply-To: <507EDE5B.8090102@redhat.com> References: <7F632A9222059A42AF70FCB7965774AA2063D880@ALA-MBB.corp.ad.wrs.com> <507EDE5B.8090102@redhat.com> Message-ID: <7F632A9222059A42AF70FCB7965774AA2063E902@ALA-MBB.corp.ad.wrs.com> Hi, (resending, as I neglected to CC the list...) On 10/17/2012 04:32 AM, McDermott, Andrew wrote: I'm happily using LTTng-2.0 on Ubuntu-12.04 but I was wondering if Fedora 16 is supported via a yum install. I see quite a few references to 17 but not necessarily 16. Is this correct or am I simply not searching hard enough. 
You can see which packages are available anywhere in Fedora here: https://apps.fedoraproject.org/packages/lttng-tools https://apps.fedoraproject.org/packages/lttng-ust They were just added to Fedora in July, and at that time Yannick only requested to add f17, so that's why f16 doesn't have them. https://bugzilla.redhat.com/show_bug.cgi?id=717748#c20 https://bugzilla.redhat.com/show_bug.cgi?id=834481#c4 Thanks; that helps clear things up. -- andy From yannick.brosseau at gmail.com Wed Oct 17 13:35:55 2012 From: yannick.brosseau at gmail.com (Brosseau, Yannick) Date: Wed, 17 Oct 2012 13:35:55 -0400 Subject: [lttng-dev] are LTTng-2.0 packages available on Fedora 16 via yum? In-Reply-To: <7F632A9222059A42AF70FCB7965774AA2063E902@ALA-MBB.corp.ad.wrs.com> References: <7F632A9222059A42AF70FCB7965774AA2063D880@ALA-MBB.corp.ad.wrs.com> <507EDE5B.8090102@redhat.com> <7F632A9222059A42AF70FCB7965774AA2063E902@ALA-MBB.corp.ad.wrs.com> Message-ID: Josh is right. With the release of f18 soon, I'm not sure I could add f16, but I can check if necessary... On Oct 17, 2012 1:28 PM, "McDermott, Andrew" wrote: > Hi, > > (resending, as I neglected to CC the list...) > > On 10/17/2012 04:32 AM, McDermott, Andrew wrote: > > I'm happily using LTTng-2.0 on Ubuntu-12.04 but I was wondering if > Fedora 16 is supported via a yum install. I see quite a few references > to 17 but not necessarily 16. Is this correct or am I simply not > searching hard enough. > > > You can see which packages are available anywhere in Fedora here: > https://apps.fedoraproject.org/packages/lttng-tools > https://apps.fedoraproject.org/packages/lttng-ust > > They were just added to Fedora in July, and at that time Yannick only > requested to add f17, so that's why f16 doesn't have them. > https://bugzilla.redhat.com/show_bug.cgi?id=717748#c20 > https://bugzilla.redhat.com/show_bug.cgi?id=834481#c4 > > > Thanks; that helps clear things up.
> > -- > andy > > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > From mathieu.desnoyers at efficios.com Thu Oct 18 16:50:10 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 18 Oct 2012 16:50:10 -0400 Subject: [lttng-dev] [RELEASE] Babeltrace 1.0.0-rc6 Message-ID: <20121018205010.GA21434@Krystal> The Babeltrace project provides trace read and write libraries, as well as a trace converter. Plugins can be created for any trace format to allow its conversion to/from another trace format. The main format expected to be converted to/from is the Common Trace Format (CTF). The default input format of the "babeltrace" command is CTF, and its default output format is a human-readable text log. The "babeltrace-log" command converts from a text log to a CTF trace. We had enough fixes for another round of rc release before 1.0 final. Mainly fixed all kinds of memory leaks, which became important when libbabeltrace was introduced, and the scope of babeltrace became more than just trace conversion.
Changelog:

2012-10-18 Babeltrace 1.0.0-rc6
	* Add valgrind suppression file for libpopt
	* Fix: unplug memory leak that causes popt-0.13 to segfault
	* Fix: test all close/fclose ret val, fix double close
	* Cleanup: add missing newline
	* Fix: fd leak on trace close
	* Fix memory leaks induced by lack of libpopt documentation
	* babeltrace: fix poptGetOptArg memleak
	* plugins: implement plugin unregister
	* Doc: valgrind with babeltrace (glib workaround)
	* callsites: fix memory leak
	* Fix: free all the metadata-related memory
	* Fix : Free the iterator callback arrays
	* Fix : cleanup teardown of context
	* Fix : protect static float and double declarations
	* callsite: support instruction pointer field
	* Document that list.h is LGPLv2.1, but entirely trivial
	* Fix: callsite support: list multiple callsites
	* Add callsite support
	* Fix: Allow 64-bit packet offset
	* Fix: emf uri: surround by " "
	* Handle model.emf.uri event info
	* Fix: Documentation cleanup
	* Fix: misplaced C++ ifdef
	* Fix babeltrace-log get big line when the input file last line don't have enter
	* API Fix: bt_ctf_iter_read_event_flags
	* Fix: get encoding for char arrays and sequences
	* Fix: access to declaration from declaration_field
	* Fix: get_declaration_* should not cast to field
	* Fix babeltrace-log uninitialized memory (v2)
	* Revert "Fix babeltrace-log uninitialized memory"
	* Fix babeltrace-log uninitialized memory
	* Fix: access field properties by declaration
	* Fix: check return value of get_char_array
	* Fix: C++ support to API header files

Project website: http://www.efficios.com/babeltrace
Download link: http://www.efficios.com/files/babeltrace/
CTF specification: http://www.efficios.com/ctf

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From Fredrik_Oestman at mentor.com Fri Oct 19 04:07:41 2012
From: Fredrik_Oestman at mentor.com (Oestman, Fredrik)
Date: Fri, 19 Oct 2012 08:07:41 +0000
Subject: [lttng-dev] Flight-recorder mode
Message-ID: <524C960C5DFC794E82BE548D825F05CF5BC75AB2@EU-MBX-01.mgc.mentorg.com>

Hi,

We use LTTng 2.0 with a limited file system on the target to receive the
trace logs:

Extract of fstab file:
    tmpfs /dev/shm tmpfs size=64M,mode=777,noauto 0 0

When tmpfs is full, trace logs are corrupted. Since it is very difficult
to control the tracing to avoid this, we would need a real
flight-recorder mode where the files on the disk have a pre-defined
(maximum) size and where either discard or overwrite mode can be applied.

Is there such a thing or some other solution to the problem?

Cheers,

Fredrik Östman
http://go.mentor.com/sourceryanalyzer/

From fedotov.d.a at mail.ru Fri Oct 19 17:10:25 2012
From: fedotov.d.a at mail.ru (Денис Федотов)
Date: Sat, 20 Oct 2012 01:10:25 +0400
Subject: [lttng-dev] LTTng tracing linux modules
Message-ID: <1350681025.129060780@f219.mail.ru>

Hi! Please help me.

I searched all over Google and could not find information on how to add
my own kernel trace_events to the LTTng trace. I am using the openSUSE
12.1 distribution, a vanilla Linux 3.4.11 kernel with the rt patch, and
LTTng 2.0. Could you give me some information on how I can add LTTng
events from my module?

P.S. Sorry for my English.

From jdesfossez at efficios.com Sat Oct 20 16:37:55 2012
From: jdesfossez at efficios.com (Julien Desfossez)
Date: Sat, 20 Oct 2012 16:37:55 -0400
Subject: [lttng-dev] RFC : design notes for remote traces live reading
Message-ID: <50830BA3.6000508@efficios.com>

In order to achieve live reading of streamed traces, we need:
- the index generation while tracing
- index streaming
- synchronization of streams
- cooperating viewers

This RFC addresses each of these points with the anticipated design.
Implementation is under way, so quick feedback is greatly appreciated!

* Index generation
The index associates a trace packet with an offset inside the tracefile.
While tracing, when a packet is ready to be written, we can ask the ring
buffer to provide us the information required to produce the index
(data_offset, packet_size, content_size, timestamp_begin, timestamp_end,
events_discarded, events_discarded_len, stream_id).

* Index streaming
The index is mandatory for live reading since we use it for the stream
synchronization. We absolutely need to receive the index, so we send it
on the control port (TCP-only), but most of the information related to
the index is only relevant if we receive the associated data packet.
So the proposed protocol is the following:
- with each data packet, send the data_offset, packet_size, content_size
(all uint64_t) along with the already in place information (stream id
and sequence number)
- after sending a data packet, the consumer sends on the control port a
new message (RELAYD_SEND_INDEX) with timestamp_begin, timestamp_end,
events_discarded, events_discarded_len, stream_id, the sequence number
(all uint64_t), and the relayd stream id of the tracefile
- when the relay receives a data packet, it checks whether it has already
received an index corresponding to this stream and sequence number. If
yes, it completes the index structure and writes the index on disk.
Otherwise it creates an index structure in memory with the information
it can fill and stores it in a hash table, waiting for the corresponding
index packet to arrive
- the same concept applies when the relay receives an index packet.

This two-part remote index generation allows us to determine if we lost
packets because of the network, limit the number of bytes sent on the
control port and make sure we still have an index for each packet with
its timestamps and the number of events lost, so the viewer knows if we
lost events because of the tracer or the network.

Design question: since the lookup is always based on two factors
(relayd stream_id and sequence number), do we want to create a hash
table for each stream on the relay?
We have to consider that at some point, we might have to reorder trace
packets (when we support UDP) before writing them to disk, so we will
need a similar structure to temporarily store out-of-order packets.
Also the hash table storing the indexes needs an expiration mechanism
(based on timing or number of packets).
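
The two-part index generation described above can be sketched as
follows. This is only an illustration under stated assumptions: the
names relay_index, relay_index_init, relay_index_add_data and
relay_index_add_ctrl are hypothetical, the field layout is not the
on-disk format (which this RFC does not define), events_discarded_len is
left out for brevity, and the hash table that would hold the pending
entries is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical in-memory index entry. Names and layout are assumptions
 * made for this sketch, not a format defined by the RFC.
 */
struct relay_index {
	uint64_t stream_id;		/* lookup key, part 1 */
	uint64_t seq_num;		/* lookup key, part 2 */
	/* Half received along with the data packet. */
	uint64_t data_offset;
	uint64_t packet_size;
	uint64_t content_size;
	/* Half received on the control port (RELAYD_SEND_INDEX). */
	uint64_t timestamp_begin;
	uint64_t timestamp_end;
	uint64_t events_discarded;
	bool has_data;
	bool has_ctrl;
};

static void relay_index_init(struct relay_index *idx,
		uint64_t stream_id, uint64_t seq_num)
{
	/* Zero every field, then set the key. */
	*idx = (struct relay_index) {
		.stream_id = stream_id,
		.seq_num = seq_num,
	};
}

/* Record the data-packet half; returns true once both halves are present. */
static bool relay_index_add_data(struct relay_index *idx,
		uint64_t data_offset, uint64_t packet_size,
		uint64_t content_size)
{
	idx->data_offset = data_offset;
	idx->packet_size = packet_size;
	idx->content_size = content_size;
	idx->has_data = true;
	return idx->has_ctrl;
}

/* Record the control-port half; returns true once both halves are present. */
static bool relay_index_add_ctrl(struct relay_index *idx,
		uint64_t timestamp_begin, uint64_t timestamp_end,
		uint64_t events_discarded)
{
	idx->timestamp_begin = timestamp_begin;
	idx->timestamp_end = timestamp_end;
	idx->events_discarded = events_discarded;
	idx->has_ctrl = true;
	return idx->has_data;
}
```

Whichever half arrives second gets a true return value, which is the
point where the relay would write the completed index to disk and drop
the entry from its table.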

* Synchronization of streams
Already discussed in an earlier RFC, summary:
- at a predefined rate, the consumer sends a synchronization packet that
contains the last sequence number that can be safely read by the viewer
for each stream of the session. This happens as soon as possible when all
streams are generating data, and is also time-based to cover the case of
streams not generating any data.
- the relay receives this packet, ensures all data packets and indexes
are committed to disk (and sync'ed) and updates the synchronization with
the viewers (discussed just below)

* Cooperating viewers
The viewers need to be aware that they are reading streamed data and
play nicely with the synchronization algorithms in place. The proposed
approach is to use fcntl(2) "Advisory locking" to lock specific portions
of the tracefiles. The viewers will have to test for the locks and
respect them when they are switching packets.
So in summary:
- when the relay is ready to let the viewers access the data, it adds a
new write lock on the region that cannot be safely read and removes the
previous one
- when a viewer needs to switch packet, it tests for the presence of a
lock on the region of the file it needs to access. If there is no lock
it can safely read the data, otherwise it blocks until the lock is
removed.
- when a data packet is lost on the network, an index is written, but
the offset in the tracefile is set to an invalid value (-1) so the
reader knows the data was lost in transit.
- the viewers also need to be adapted to read on-disk indexes, support
metadata updates, and respect the locking.

Not addressed here but mandatory: the metadata must be completely
streamed before streaming trace data that corresponds to this new
metadata.

Feedback, questions and improvement ideas welcome!
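
For the viewer side of the fcntl(2) advisory locking described under
"Cooperating viewers", a minimal sketch could look like the following.
The helper names region_is_locked, wait_region_readable and
release_region are hypothetical, and the sketch assumes the relay holds
a write lock (F_WRLCK) on the byte range that is not yet safe to read.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return 1 if another process holds a conflicting lock on
 * [start, start + len), 0 if the region can be read, -1 on error.
 * Hypothetical helper, not part of any LTTng or Babeltrace API.
 */
static int region_is_locked(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_RDLCK,	/* ask: would a read lock be granted? */
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	if (fcntl(fd, F_GETLK, &fl) == -1)
		return -1;
	/* F_GETLK sets l_type to F_UNLCK when nothing conflicts. */
	return fl.l_type != F_UNLCK;
}

/*
 * Block until a read lock on the region is granted, that is, until the
 * writer has released its write lock. The read lock stays held on
 * success and should be dropped with release_region() after reading.
 */
static int wait_region_readable(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_RDLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};
	int ret;

	do {
		ret = fcntl(fd, F_SETLKW, &fl);
	} while (ret == -1 && errno == EINTR);	/* retry if interrupted */
	return ret;
}

/* Drop this process's lock on the region. */
static int release_region(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_UNLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	return fcntl(fd, F_SETLK, &fl);
}
```

Note that these locks are advisory and per-process: they only help if
every viewer performs the check, and a process never conflicts with its
own locks.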

Thanks,

Julien

From mathieu.desnoyers at efficios.com Sat Oct 20 18:54:42 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Sat, 20 Oct 2012 18:54:42 -0400
Subject: [lttng-dev] RFC : design notes for remote traces live reading
In-Reply-To: <50830BA3.6000508@efficios.com>
References: <50830BA3.6000508@efficios.com>
Message-ID: <20121020225442.GA5020@Krystal>

* Julien Desfossez (jdesfossez at efficios.com) wrote:
> In order to achieve live reading of streamed traces, we need :
> - the index generation while tracing
> - index streaming
> - synchronization of streams
> - cooperating viewers
>
> This RFC addresses each of these points with the anticipated design,
> implementation is on its way, so quick feedbacks greatly appreciated !
>
> * Index generation
> The index associates a trace packet with an offset inside the tracefile.
> While tracing, when a packet is ready to be written, we can ask the ring
> buffer to provide us the information required to produce the index
> (data_offset,

Is data_offset just the header size? Do we really want that in the
on-disk index? It can be easily computed from the metadata, so I'm not
sure we want to duplicate this information.

> packet_size, content_size, timestamp_begin, timestamp_end,
> events_discarded, events_discarded_len,

events_discarded_len is also known from metadata.

> stream_id).

Maybe you could detail the exact layout of an element in the index as a
packed C structure and provide it in the next round of this RFC, so we
know exactly which types and what content you plan.

>
> * Index streaming
> The index is mandatory for live reading since we use it for the streams
> synchronization. We absolutely need to receive the index, so we send it
> on the control port (TCP-only), but most of the information related to
> the index is only relevant if we receive the associated data packet. So
> the proposed protocol is the following :
> - with each data packet, send the data_offset, packet_size, content_size

What is data_offset?

> (all uint64_t) along with the already in place information (stream id
> and sequence number)
> - after sending a data packet, the consumer sends on the control port a
> new message (RELAYD_SEND_INDEX) with timestamp_begin, timestamp_end,
> events_discarded, events_discarded_len, stream_id, the sequence number,

Do we need events_discarded_len?

> (all uint64_t), and the relayd stream id of the tracefile
> - when the relay receives a data packet it looks if it already received
> an index corresponding to this stream and sequence number, if yes it
> completes the index structure and writes the index on disk, otherwise it
> creates an index structure in memory with the information it can fill
> and stores it in a hash table waiting for the corresponding index packet
> to arrive
> - the same concept applies when the relay receives an index packet.

Yep. We could possibly describe this as a 2-way merge point between data
and index, performed through lookups (by what key?) in a hash table.

>
> This two-part remote index generation allows us to determine if we lost
> packets because of the network, limit the number of bytes sent on the
> control port and make sure we still have an index for each packet with
> its timestamps and the number of events lost so the viewer knows if we
> lost events because of the tracer or the network.
>
> Design question : since the lookup is always based on two factors
> (relayd stream_id and sequence number), do we want to create a hash
> table for each stream on the relay ?

Nope. A single hash table can be used. The hash function takes both
stream ID and seq num (e.g. with a xor), and the compare function
compares with both.
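
A minimal sketch of such a key, assuming hypothetical names (index_key,
index_key_hash, index_key_match) and leaving the hash table itself out:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical composite key: a single table serves all streams. */
struct index_key {
	uint64_t stream_id;
	uint64_t seq_num;
};

/* Hash both fields together with a XOR, as suggested above. */
static uint64_t index_key_hash(const struct index_key *key)
{
	return key->stream_id ^ key->seq_num;
}

/*
 * XOR alone collides for distinct keys (for instance {1, 2} and
 * {2, 1}), so the compare function has to check both fields.
 */
static bool index_key_match(const struct index_key *a,
		const struct index_key *b)
{
	return a->stream_id == b->stream_id && a->seq_num == b->seq_num;
}
```

A plain XOR is cheap but clusters when stream IDs and sequence numbers
are both small integers; running the result through a stronger mixing
function before picking a bucket would be a reasonable refinement.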

> We have to consider that at some point, we might have to reorder trace
> packets (when we support UDP) before writing them to disk, so we will
> need a similar structure to temporarily store out-of-order packets.

I don't think it will be necessary for UDP: UDP datagrams, AFAIK, arrive
ordered at the receiver application, even if they are made of many
actual IP packets. Basically, we can simply send each entire trace
packet as one single UDP datagram.

> Also the hash table storing the indexes needs an expiration mechanism
> (based on timing or number of packets).

Upon addition into the hash table, we could use a separate data
structure to keep track of expiration timers. When an entry is removed
from the hash table, we remove its associated timer entry. It does not
need to sit in the same data structure. Maybe a linked list, or maybe a
red-black tree, would be more appropriate to keep track of these
expiration times. A periodic timer could discard packets when they
reach their timeout.

>
> * Synchronization of streams
> Already discussed in an earlier RFC, summary :
> - at a predefined rate, the consumer sends a synchronization packet that
> contains the last sequence number that can be safely read by the viewer
> for each stream of the session, it happens as soon as possible when all
> streams are generating data, and also time-based to cover the case with
> streams not generating any data.

Note: if the consumer has not sent any data whatsoever (on any stream)
since the last synchronization beacon, it can skip sending the next
beacon. This is a nice power consumption optimisation.

> - the relay receives this packet, ensures all data packets and indexes
> are commited on disk (and sync'ed) and updates the synchronization with
> the viewers (discussed just below)
>
> * Cooperating viewers
> The viewers need to be aware that they are reading streamed data and
> play nicely with the synchronization algorithms in place. The proposed
> approach is using fcntl(2) "Advisory locking" to lock specific portions
> of the tracefiles. The viewers will have to test and make sure they are
> respecting the locks when they are switching packets.
> So in summary :
> - when the relay is ready to let the viewers access the data, it adds a
> new write lock on the region that cannot be safely read and removes the
> previous one
> - when a viewer needs to switch packet, it tests for the presence of a
> lock on the region of the file it needs to access, if there is no lock
> it can safely read the data, otherwise it blocks until the lock is removed.
> - when a data packet is lost on the network, an index is written, but
> the offset in the tracefile is set to an invalid value (-1) so the
> reader knows the data was lost in transit.
> - the viewers need also to be adapted to read on-disk indexes, support
> metadata updates, respect the locking.

How do you expect to deal with streams that appear during tracing? How
is the viewer expected to be told that a new stream needs to be read,
and how is the file creation / advisory locking vs file open (read) /
advisory locking expected to be handled?

>
> Not addressed here but mandatory : the metadata must be completely
> streamed before streaming trace data that correspond to this new metadata.

Yes. We might want to think a little more about what happens when we
stream partially complete metadata that is cut at a point where it
cannot be parsed.

Thanks!

Mathieu

>
> Feedbacks, questions and improvement ideas welcome !
>
> Thanks,
>
> Julien

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com From mathieu.desnoyers at efficios.com Sun Oct 21 22:30:49 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sun, 21 Oct 2012 22:30:49 -0400 Subject: [lttng-dev] [PATCH] urcu wfcqueue: introduce nonblocking API Message-ID: <20121022023049.GA9653@Krystal> Introduce nonblocking API in wfcqueue, allowing RT threads to try to dequeue, splice, or iterate on spliced queues without blocking: the caller needs to handle CDS_WFCQ_WOULDBLOCK return value (or nonzero return value for splice). Signed-off-by: Mathieu Desnoyers CC: Paul McKenney CC: Lai Jiangshan --- diff --git a/urcu/static/wfcqueue.h b/urcu/static/wfcqueue.h index 4a9003e..120a598 100644 --- a/urcu/static/wfcqueue.h +++ b/urcu/static/wfcqueue.h @@ -2,7 +2,7 @@ #define _URCU_WFCQUEUE_STATIC_H /* - * wfcqueue-static.h + * urcu/static/wfcqueue.h * * Userspace RCU library - Concurrent Queue with Wait-Free Enqueue/Blocking Dequeue * @@ -171,7 +171,7 @@ static inline void _cds_wfcq_enqueue(struct cds_wfcq_head *head, * Waiting for enqueuer to complete enqueue and return the next node. */ static inline struct cds_wfcq_node * -___cds_wfcq_node_sync_next(struct cds_wfcq_node *node) +___cds_wfcq_node_sync_next(struct cds_wfcq_node *node, int blocking) { struct cds_wfcq_node *next; int attempt = 0; @@ -180,6 +180,8 @@ ___cds_wfcq_node_sync_next(struct cds_wfcq_node *node) * Adaptative busy-looping waiting for enqueuer to complete enqueue. */ while ((next = CMM_LOAD_SHARED(node->next)) == NULL) { + if (!blocking) + return CDS_WFCQ_WOULDBLOCK; if (++attempt >= WFCQ_ADAPT_ATTEMPTS) { poll(NULL, 0, WFCQ_WAIT); /* Wait for 10ms */ attempt = 0; @@ -191,34 +193,23 @@ ___cds_wfcq_node_sync_next(struct cds_wfcq_node *node) return next; } -/* - * __cds_wfcq_first_blocking: get first node of a queue, without dequeuing. - * - * Content written into the node before enqueue is guaranteed to be - * consistent, but no other memory ordering is ensured. 
- * Dequeue/splice/iteration mutual exclusion should be ensured by the - * caller. - * - * Used by for-like iteration macros in urcu/wfqueue.h: - * __cds_wfcq_for_each_blocking() - * __cds_wfcq_for_each_blocking_safe() - */ static inline struct cds_wfcq_node * -___cds_wfcq_first_blocking(struct cds_wfcq_head *head, - struct cds_wfcq_tail *tail) +___cds_wfcq_first(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + int blocking) { struct cds_wfcq_node *node; if (_cds_wfcq_empty(head, tail)) return NULL; - node = ___cds_wfcq_node_sync_next(&head->node); + node = ___cds_wfcq_node_sync_next(&head->node, blocking); /* Load head->node.next before loading node's content */ cmm_smp_read_barrier_depends(); return node; } /* - * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. + * __cds_wfcq_first_blocking: get first node of a queue, without dequeuing. * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. @@ -230,9 +221,31 @@ ___cds_wfcq_first_blocking(struct cds_wfcq_head *head, * __cds_wfcq_for_each_blocking_safe() */ static inline struct cds_wfcq_node * -___cds_wfcq_next_blocking(struct cds_wfcq_head *head, +___cds_wfcq_first_blocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_first(head, tail, 1); +} + + +/* + * __cds_wfcq_first_nonblocking: get first node of a queue, without dequeuing. + * + * Same as __cds_wfcq_first_blocking, but returns CDS_WFCQ_WOULDBLOCK if + * it needs to block. 
+ */ +static inline struct cds_wfcq_node * +___cds_wfcq_first_nonblocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_first(head, tail, 0); +} + +static inline struct cds_wfcq_node * +___cds_wfcq_next(struct cds_wfcq_head *head, struct cds_wfcq_tail *tail, - struct cds_wfcq_node *node) + struct cds_wfcq_node *node, + int blocking) { struct cds_wfcq_node *next; @@ -247,7 +260,7 @@ ___cds_wfcq_next_blocking(struct cds_wfcq_head *head, cmm_smp_rmb(); if (CMM_LOAD_SHARED(tail->p) == node) return NULL; - next = ___cds_wfcq_node_sync_next(node); + next = ___cds_wfcq_node_sync_next(node, blocking); } /* Load node->next before loading next's content */ cmm_smp_read_barrier_depends(); @@ -255,24 +268,50 @@ ___cds_wfcq_next_blocking(struct cds_wfcq_head *head, } /* - * __cds_wfcq_dequeue_blocking: dequeue a node from the queue. + * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. - * It is valid to reuse and free a dequeued node immediately. * Dequeue/splice/iteration mutual exclusion should be ensured by the * caller. + * + * Used by for-like iteration macros in urcu/wfqueue.h: + * __cds_wfcq_for_each_blocking() + * __cds_wfcq_for_each_blocking_safe() */ static inline struct cds_wfcq_node * -___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, - struct cds_wfcq_tail *tail) +___cds_wfcq_next_blocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node) +{ + return ___cds_wfcq_next(head, tail, node, 1); +} + +/* + * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. + * + * Same as __cds_wfcq_next_blocking, but returns CDS_WFCQ_WOULDBLOCK if + * it needs to block. 
+ */ +static inline struct cds_wfcq_node * +___cds_wfcq_next_nonblocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node) +{ + return ___cds_wfcq_next(head, tail, node, 0); +} + +static inline struct cds_wfcq_node * +___cds_wfcq_dequeue(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + int blocking) { struct cds_wfcq_node *node, *next; if (_cds_wfcq_empty(head, tail)) return NULL; - node = ___cds_wfcq_node_sync_next(&head->node); + node = ___cds_wfcq_node_sync_next(&head->node, blocking); if ((next = CMM_LOAD_SHARED(node->next)) == NULL) { /* @@ -292,7 +331,7 @@ ___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, _cds_wfcq_node_init(&head->node); if (uatomic_cmpxchg(&tail->p, node, &head->node) == node) return node; - next = ___cds_wfcq_node_sync_next(node); + next = ___cds_wfcq_node_sync_next(node, blocking); } /* @@ -306,26 +345,50 @@ ___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, } /* - * __cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. + * __cds_wfcq_dequeue_blocking: dequeue a node from the queue. * - * Dequeue all nodes from src_q. - * dest_q must be already initialized. - * Dequeue/splice/iteration mutual exclusion for src_q should be ensured - * by the caller. + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * It is valid to reuse and free a dequeued node immediately. + * Dequeue/splice/iteration mutual exclusion should be ensured by the + * caller. */ -static inline void -___cds_wfcq_splice_blocking( +static inline struct cds_wfcq_node * +___cds_wfcq_dequeue_blocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_dequeue(head, tail, 1); +} + +/* + * __cds_wfcq_dequeue_nonblocking: dequeue a node from a wait-free queue. + * + * Same as __cds_wfcq_dequeue_blocking, but returns CDS_WFCQ_WOULDBLOCK + * if it needs to block. 
+ */ +static inline struct cds_wfcq_node * +___cds_wfcq_dequeue_nonblocking(struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_dequeue(head, tail, 0); +} + +static inline int +___cds_wfcq_splice( struct cds_wfcq_head *dest_q_head, struct cds_wfcq_tail *dest_q_tail, struct cds_wfcq_head *src_q_head, - struct cds_wfcq_tail *src_q_tail) + struct cds_wfcq_tail *src_q_tail, + int blocking) { struct cds_wfcq_node *head, *tail; if (_cds_wfcq_empty(src_q_head, src_q_tail)) - return; + return 0; - head = ___cds_wfcq_node_sync_next(&src_q_head->node); + head = ___cds_wfcq_node_sync_next(&src_q_head->node, blocking); + if (head == CDS_WFCQ_WOULDBLOCK) + return -1; _cds_wfcq_node_init(&src_q_head->node); /* @@ -341,6 +404,44 @@ ___cds_wfcq_splice_blocking( * require mutual exclusion on dest_q (wait-free). */ ___cds_wfcq_append(dest_q_head, dest_q_tail, head, tail); + return 0; +} + + +/* + * __cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. + * + * Dequeue all nodes from src_q. + * dest_q must be already initialized. + * Dequeue/splice/iteration mutual exclusion for src_q should be ensured + * by the caller. + */ +static inline void +___cds_wfcq_splice_blocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail) +{ + (void) ___cds_wfcq_splice(dest_q_head, dest_q_tail, + src_q_head, src_q_tail, 1); +} + +/* + * __cds_wfcq_splice_nonblocking: enqueue all src_q nodes at the end of dest_q. + * + * Same as __cds_wfcq_splice_blocking, but returns nonzero if it needs to + * block. 
+ */ +static inline int +___cds_wfcq_splice_nonblocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail) +{ + return ___cds_wfcq_splice(dest_q_head, dest_q_tail, + src_q_head, src_q_tail, 0); } /* diff --git a/urcu/wfcqueue.h b/urcu/wfcqueue.h index ba2f2ed..fe2862e 100644 --- a/urcu/wfcqueue.h +++ b/urcu/wfcqueue.h @@ -2,7 +2,7 @@ #define _URCU_WFCQUEUE_H /* - * wfcqueue.h + * urcu/wfcqueue.h * * Userspace RCU library - Concurrent Queue with Wait-Free Enqueue/Blocking Dequeue * @@ -43,6 +43,8 @@ extern "C" { * McKenney. */ +#define CDS_WFCQ_WOULDBLOCK ((void *) -1UL) + struct cds_wfcq_node { struct cds_wfcq_node *next; }; @@ -86,6 +88,16 @@ struct cds_wfcq_tail { #define __cds_wfcq_first_blocking ___cds_wfcq_first_blocking #define __cds_wfcq_next_blocking ___cds_wfcq_next_blocking +/* + * Locking ensured by caller by holding cds_wfcq_dequeue_lock(). + * Non-blocking: deque, first, next return CDS_WFCQ_WOULDBLOCK if they + * need to block. splice returns nonzero if it needs to block. + */ +#define __cds_wfcq_dequeue_nonblocking ___cds_wfcq_dequeue_nonblocking +#define __cds_wfcq_splice_nonblocking ___cds_wfcq_splice_nonblocking +#define __cds_wfcq_first_nonblocking ___cds_wfcq_first_nonblocking +#define __cds_wfcq_next_nonblocking ___cds_wfcq_next_nonblocking + #else /* !_LGPL_SOURCE */ /* @@ -179,7 +191,7 @@ extern void cds_wfcq_splice_blocking( struct cds_wfcq_tail *src_q_tail); /* - * __cds_wfcq_dequeue_blocking: + * __cds_wfcq_dequeue_blocking: dequeue a node from a wait-free queue. * * Content written into the node before enqueue is guaranteed to be * consistent, but no other memory ordering is ensured. @@ -192,6 +204,16 @@ extern struct cds_wfcq_node *__cds_wfcq_dequeue_blocking( struct cds_wfcq_tail *tail); /* + * __cds_wfcq_dequeue_nonblocking: dequeue a node from a wait-free queue. 
+ * + * Same as __cds_wfcq_dequeue_blocking, but returns CDS_WFCQ_WOULDBLOCK + * if it needs to block. + */ +extern struct cds_wfcq_node *__cds_wfcq_dequeue_nonblocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* * __cds_wfcq_splice_blocking: enqueue all src_q nodes at the end of dest_q. * * Dequeue all nodes from src_q. @@ -208,6 +230,18 @@ extern void __cds_wfcq_splice_blocking( struct cds_wfcq_tail *src_q_tail); /* + * __cds_wfcq_splice_nonblocking: enqueue all src_q nodes at the end of dest_q. + * + * Same as __cds_wfcq_splice_blocking, but returns nonzero if it needs to + * block. + */ +extern int __cds_wfcq_splice_nonblocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail); + +/* * __cds_wfcq_first_blocking: get first node of a queue, without dequeuing. * * Content written into the node before enqueue is guaranteed to be @@ -224,6 +258,16 @@ extern struct cds_wfcq_node *__cds_wfcq_first_blocking( struct cds_wfcq_tail *tail); /* + * __cds_wfcq_first_nonblocking: get first node of a queue, without dequeuing. + * + * Same as __cds_wfcq_first_blocking, but returns CDS_WFCQ_WOULDBLOCK if + * it needs to block. + */ +extern struct cds_wfcq_node *__cds_wfcq_first_nonblocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail); + +/* * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. * * Content written into the node before enqueue is guaranteed to be @@ -240,6 +284,17 @@ extern struct cds_wfcq_node *__cds_wfcq_next_blocking( struct cds_wfcq_tail *tail, struct cds_wfcq_node *node); +/* + * __cds_wfcq_next_blocking: get next node of a queue, without dequeuing. + * + * Same as __cds_wfcq_next_blocking, but returns CDS_WFCQ_WOULDBLOCK if + * it needs to block. 
+ */ +extern struct cds_wfcq_node *__cds_wfcq_next_nonblocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node); + #endif /* !_LGPL_SOURCE */ /* diff --git a/wfcqueue.c b/wfcqueue.c index 1fa27ac..3474ee0 100644 --- a/wfcqueue.c +++ b/wfcqueue.c @@ -90,6 +90,13 @@ struct cds_wfcq_node *__cds_wfcq_dequeue_blocking( return ___cds_wfcq_dequeue_blocking(head, tail); } +struct cds_wfcq_node *__cds_wfcq_dequeue_nonblocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_dequeue_nonblocking(head, tail); +} + void __cds_wfcq_splice_blocking( struct cds_wfcq_head *dest_q_head, struct cds_wfcq_tail *dest_q_tail, @@ -100,6 +107,16 @@ void __cds_wfcq_splice_blocking( src_q_head, src_q_tail); } +int __cds_wfcq_splice_nonblocking( + struct cds_wfcq_head *dest_q_head, + struct cds_wfcq_tail *dest_q_tail, + struct cds_wfcq_head *src_q_head, + struct cds_wfcq_tail *src_q_tail) +{ + return ___cds_wfcq_splice_nonblocking(dest_q_head, dest_q_tail, + src_q_head, src_q_tail); +} + struct cds_wfcq_node *__cds_wfcq_first_blocking( struct cds_wfcq_head *head, struct cds_wfcq_tail *tail) @@ -107,6 +124,13 @@ struct cds_wfcq_node *__cds_wfcq_first_blocking( return ___cds_wfcq_first_blocking(head, tail); } +struct cds_wfcq_node *__cds_wfcq_first_nonblocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail) +{ + return ___cds_wfcq_first_nonblocking(head, tail); +} + struct cds_wfcq_node *__cds_wfcq_next_blocking( struct cds_wfcq_head *head, struct cds_wfcq_tail *tail, @@ -114,3 +138,11 @@ struct cds_wfcq_node *__cds_wfcq_next_blocking( { return ___cds_wfcq_next_blocking(head, tail, node); } + +struct cds_wfcq_node *__cds_wfcq_next_nonblocking( + struct cds_wfcq_head *head, + struct cds_wfcq_tail *tail, + struct cds_wfcq_node *node) +{ + return ___cds_wfcq_next_nonblocking(head, tail, node); +} -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com

From Andrew.McDermott at windriver.com Mon Oct 22 07:00:03 2012
From: Andrew.McDermott at windriver.com (McDermott, Andrew)
Date: Mon, 22 Oct 2012 11:00:03 +0000
Subject: [lttng-dev] status of lttng top
In-Reply-To: <507389DE.4000204@efficios.com> (Julien Desfossez's message of "Mon, 8 Oct 2012 22:20:14 -0400")
References: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> <507389DE.4000204@efficios.com>
Message-ID: <7F632A9222059A42AF70FCB7965774AA20854D6F@ALA-MBB.corp.ad.wrs.com>

Hi,

> LTTngTop is still work in progress and will remain that way for a long
> time, but the version in the PPA (or in the master branch in git) is
> perfectly usable for offline traces (traces recorded and replayed
> through LTTngTop).
>
> The "live" branch is more experimental and requires patches in both
> Babeltrace and Lttng-tools (all documented in the README-LIVE file), but
> it worked at the time of Plumbers, I didn't have much time since then to
> rebase the branches.
>
> I am waiting for the release of Lttng-tools 2.1 (currently in RC) before
> merging those patches. After these patches are integrated, LTTngTop will
> be able to work live without any modifications, so directly reading
> traces in memory shared with the tracer.

Thanks for this info. Right now my interest is with the live streaming;
we have a use case where the live streaming is really the only practical
solution. Very roughly, would you expect the RC series to conclude this
year, or (early) next year?

> In the meantime we are working on replacing the "home made" state system
> in LTTngTop with a more generic one (which will be used also in LTTV),
> this will cleanup this part of the code and allow to store the state on
> disk. So in a near future we will be able to only read the state instead
> of the trace (once it has been generated), which will compress
> significantly the amount of data we need to keep in order to access the
> kind of statistics provided by LTTngTop.
> > If you want to try LTTngTop, you can just install the package and follow > the man page to record a trace with the right contexts, it should work > as is. > > If you have any questions and/or feedback, please don't hesitate to ask. > > Thanks, > > Julien > > > On 08/10/12 05:57 PM, McDermott, Andrew wrote: >> I was wondering what the status of lttng top was. I'm happy to add the >> daily Ubuntu PPA and try it that way, or equally building from source. >> But, before I tread that route, are there any gotchas to be aware of. >> Is it still considered work-in-progress, etc. >> >> Thanks. >> >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From mathieu.desnoyers at efficios.com Mon Oct 22 08:58:55 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 08:58:55 -0400 Subject: [lttng-dev] [PATCH urcu] wfstack: implement cds_wfs_pop_all and iterators, document API Message-ID: <20121022125855.GA30944@Krystal> Signed-off-by: Mathieu Desnoyers CC: Paul McKenney CC: Lai Jiangshan --- diff --git a/urcu/static/wfstack.h b/urcu/static/wfstack.h index cb68a59..668ff7d 100644 --- a/urcu/static/wfstack.h +++ b/urcu/static/wfstack.h @@ -1,10 +1,10 @@ -#ifndef _URCU_WFSTACK_STATIC_H -#define _URCU_WFSTACK_STATIC_H +#ifndef _URCU_STATIC_WFSTACK_H +#define _URCU_STATIC_WFSTACK_H /* - * wfstack-static.h + * urcu/static/wfstack.h * - * Userspace RCU library - Stack with Wait-Free push, Blocking pop. + * Userspace RCU library - Stack with with wait-free push, blocking traversal. * * TO BE INCLUDED ONLY IN LGPL-COMPATIBLE CODE. See wfstack.h for linking * dynamically with the userspace rcu library. 
@@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -36,95 +37,292 @@ extern "C" { #endif -#define CDS_WF_STACK_END ((void *)0x1UL) +#define CDS_WFS_END ((void *) 0x1UL) #define CDS_WFS_ADAPT_ATTEMPTS 10 /* Retry if being set */ #define CDS_WFS_WAIT 10 /* Wait 10 ms if being set */ +/* + * Stack with wait-free push, blocking traversal. + * + * Stack implementing push, pop, pop_all operations, as well as iterator + * on the stack head returned by pop_all. + * + * Wait-free operations: cds_wfs_push, __cds_wfs_pop_all. + * Blocking operations: cds_wfs_pop, cds_wfs_pop_all, iteration on stack + * head returned by pop_all. + * + * Synchronization table: + * + * External synchronization techniques described in the API below is + * required between pairs marked with "X". No external synchronization + * required between pairs marked with "-". + * + * cds_wfs_push __cds_wfs_pop __cds_wfs_pop_all + * cds_wfs_push - - - + * __cds_wfs_pop - X X + * __cds_wfs_pop_all - X - + * + * cds_wfs_pop and cds_wfs_pop_all use an internal mutex to provide + * synchronization. + */ + +/* + * cds_wfs_node_init: initialize wait-free stack node. + */ static inline void _cds_wfs_node_init(struct cds_wfs_node *node) { node->next = NULL; } +/* + * cds_wfs_init: initialize wait-free stack. + */ static inline void _cds_wfs_init(struct cds_wfs_stack *s) { int ret; - s->head = CDS_WF_STACK_END; + s->head = CDS_WFS_END; ret = pthread_mutex_init(&s->lock, NULL); assert(!ret); } +static inline bool ___cds_wfs_end(void *node) +{ + return node == CDS_WFS_END; +} + /* - * Returns 0 if stack was empty, 1 otherwise. + * cds_wfs_empty: return whether wait-free stack is empty. + * + * No memory barrier is issued. No mutual exclusion is required. + */ +static inline bool _cds_wfs_empty(struct cds_wfs_stack *s) +{ + return ___cds_wfs_end(CMM_LOAD_SHARED(s->head)); +} + +/* + * cds_wfs_push: push a node into the stack. + * + * Issues a full memory barrier before push. 
No mutual exclusion is + * required. + * + * Returns 0 if the stack was empty prior to adding the node. + * Returns non-zero otherwise. */ static inline int _cds_wfs_push(struct cds_wfs_stack *s, struct cds_wfs_node *node) { - struct cds_wfs_node *old_head; + struct cds_wfs_head *old_head, *new_head; assert(node->next == NULL); + new_head = caa_container_of(node, struct cds_wfs_head, node); /* - * uatomic_xchg() implicit memory barrier orders earlier stores to node - * (setting it to NULL) before publication. + * uatomic_xchg() implicit memory barrier orders earlier stores + * to node (setting it to NULL) before publication. */ - old_head = uatomic_xchg(&s->head, node); + old_head = uatomic_xchg(&s->head, new_head); /* - * At this point, dequeuers see a NULL node->next, they should busy-wait - * until node->next is set to old_head. + * At this point, dequeuers see a NULL node->next, they should + * busy-wait until node->next is set to old_head. */ - CMM_STORE_SHARED(node->next, old_head); - return (old_head != CDS_WF_STACK_END); + CMM_STORE_SHARED(node->next, &old_head->node); + return !___cds_wfs_end(old_head); } /* - * Returns NULL if stack is empty. + * Waiting for push to complete enqueue and return the next node. */ -static inline -struct cds_wfs_node * -___cds_wfs_pop_blocking(struct cds_wfs_stack *s) +static inline struct cds_wfs_node * +___cds_wfs_node_sync_next(struct cds_wfs_node *node) { - struct cds_wfs_node *head, *next; + struct cds_wfs_node *next; int attempt = 0; -retry: - head = CMM_LOAD_SHARED(s->head); - if (head == CDS_WF_STACK_END) - return NULL; /* * Adaptative busy-looping waiting for push to complete. 
*/ - while ((next = CMM_LOAD_SHARED(head->next)) == NULL) { + while ((next = CMM_LOAD_SHARED(node->next)) == NULL) { if (++attempt >= CDS_WFS_ADAPT_ATTEMPTS) { poll(NULL, 0, CDS_WFS_WAIT); /* Wait for 10ms */ attempt = 0; - } else + } else { caa_cpu_relax(); + } } - if (uatomic_cmpxchg(&s->head, head, next) == head) - return head; - else - goto retry; /* Concurrent modification. Retry. */ + + return next; } +/* + * __cds_wfs_pop_blocking: pop a node from the stack. + * + * Returns NULL if stack is empty. + * + * __cds_wfs_pop_blocking needs to be synchronized using one of the + * following techniques: + * + * 1) Calling __cds_wfs_pop_blocking under rcu read lock critical + * section. The caller must wait for a grace period to pass before + * freeing the returned node or modifying the cds_wfs_node structure. + * 2) Using mutual exclusion (e.g. mutexes) to protect + * __cds_wfs_pop_blocking and __cds_wfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_wfs_pop_blocking() + * and __cds_wfs_pop_all(). (multi-provider/single-consumer scheme). + */ static inline struct cds_wfs_node * -_cds_wfs_pop_blocking(struct cds_wfs_stack *s) +___cds_wfs_pop_blocking(struct cds_wfs_stack *s) +{ + struct cds_wfs_head *head, *new_head; + struct cds_wfs_node *next; + + for (;;) { + head = CMM_LOAD_SHARED(s->head); + if (___cds_wfs_end(head)) + return NULL; + next = ___cds_wfs_node_sync_next(&head->node); + new_head = caa_container_of(next, struct cds_wfs_head, node); + if (uatomic_cmpxchg(&s->head, head, new_head) == head) + return &head->node; + /* busy-loop if head changed under us */ + } +} + +/* + * __cds_wfs_pop_all: pop all nodes from a stack. 
+ * + * __cds_wfs_pop_all does not require any synchronization with other + * push, nor with other __cds_wfs_pop_all, but requires synchronization + * matching the technique used to synchronize __cds_wfs_pop_blocking: + * + * 1) If __cds_wfs_pop_blocking is called under rcu read lock critical + * section, both __cds_wfs_pop_blocking and cds_wfs_pop_all callers + * must wait for a grace period to pass before freeing the returned + * node or modifying the cds_wfs_node structure. However, no RCU + * read-side critical section is needed around __cds_wfs_pop_all. + * 2) Using mutual exclusion (e.g. mutexes) to protect + * __cds_wfs_pop_blocking and __cds_wfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_wfs_pop_blocking() + * and __cds_wfs_pop_all(). (multi-provider/single-consumer scheme). + */ +static inline +struct cds_wfs_head * +___cds_wfs_pop_all(struct cds_wfs_stack *s) +{ + struct cds_wfs_head *head; + + /* + * Implicit memory barrier after uatomic_xchg() matches implicit + * memory barrier before uatomic_xchg() in cds_wfs_push. It + * ensures that all nodes of the returned list are consistent. + * There is no need to issue memory barriers when iterating on + * the returned list, because the full memory barrier issued + * prior to each uatomic_cmpxchg, which each write to head, are + * taking care to order writes to each node prior to the full + * memory barrier after this uatomic_xchg(). + */ + head = uatomic_xchg(&s->head, CDS_WFS_END); + if (___cds_wfs_end(head)) + return NULL; + return head; +} + +/* + * cds_wfs_pop_lock: lock stack pop-protection mutex. + */ +static inline void _cds_wfs_pop_lock(struct cds_wfs_stack *s) { - struct cds_wfs_node *retnode; int ret; ret = pthread_mutex_lock(&s->lock); assert(!ret); - retnode = ___cds_wfs_pop_blocking(s); +} + +/* + * cds_wfs_pop_unlock: unlock stack pop-protection mutex. 
+ */ +static inline void _cds_wfs_pop_unlock(struct cds_wfs_stack *s) +{ + int ret; + ret = pthread_mutex_unlock(&s->lock); assert(!ret); +} + +/* + * Call __cds_wfs_pop_blocking with an internal pop mutex held. + */ +static inline +struct cds_wfs_node * +_cds_wfs_pop_blocking(struct cds_wfs_stack *s) +{ + struct cds_wfs_node *retnode; + + _cds_wfs_pop_lock(s); + retnode = ___cds_wfs_pop_blocking(s); + _cds_wfs_pop_unlock(s); return retnode; } +/* + * Call __cds_wfs_pop_all with an internal pop mutex held. + */ +static inline +struct cds_wfs_head * +_cds_wfs_pop_all_blocking(struct cds_wfs_stack *s) +{ + struct cds_wfs_head *rethead; + + _cds_wfs_pop_lock(s); + rethead = ___cds_wfs_pop_all(s); + _cds_wfs_pop_unlock(s); + return rethead; +} + +/* + * cds_wfs_first_blocking: get first node of a popped stack. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * + * Used by for-like iteration macros in urcu/wfstack.h: + * cds_wfs_for_each_blocking() + * cds_wfs_for_each_blocking_safe() + */ +static inline struct cds_wfs_node * +_cds_wfs_first_blocking(struct cds_wfs_head *head) +{ + if (___cds_wfs_end(head)) + return NULL; + return &head->node; +} + +/* + * cds_wfs_next_blocking: get next node of a popped stack. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. 
+ * + * Used by for-like iteration macros in urcu/wfstack.h: + * cds_wfs_for_each_blocking() + * cds_wfs_for_each_blocking_safe() + */ +static inline struct cds_wfs_node * +_cds_wfs_next_blocking(struct cds_wfs_node *node) +{ + struct cds_wfs_node *next; + + next = ___cds_wfs_node_sync_next(node); + if (___cds_wfs_end(next)) + return NULL; + return next; +} + #ifdef __cplusplus } #endif -#endif /* _URCU_WFSTACK_STATIC_H */ +#endif /* _URCU_STATIC_WFSTACK_H */ diff --git a/urcu/wfstack.h b/urcu/wfstack.h index db2ee0c..b6992e8 100644 --- a/urcu/wfstack.h +++ b/urcu/wfstack.h @@ -2,9 +2,9 @@ #define _URCU_WFSTACK_H /* - * wfstack.h + * urcu/wfstack.h * - * Userspace RCU library - Stack with Wait-Free push, Blocking pop. + * Userspace RCU library - Stack with wait-free push, blocking traversal. * * Copyright 2010 - Mathieu Desnoyers * @@ -25,18 +25,59 @@ #include #include +#include #include #ifdef __cplusplus extern "C" { #endif +/* + * Stack with wait-free push, blocking traversal. + * + * Stack implementing push, pop, pop_all operations, as well as iterator + * on the stack head returned by pop_all. + * + * Wait-free operations: cds_wfs_push, __cds_wfs_pop_all. + * Blocking operations: cds_wfs_pop, cds_wfs_pop_all, iteration on stack + * head returned by pop_all. + * + * Synchronization table: + * + * External synchronization techniques described in the API below is + * required between pairs marked with "X". No external synchronization + * required between pairs marked with "-". + * + * cds_wfs_push __cds_wfs_pop __cds_wfs_pop_all + * cds_wfs_push - - - + * __cds_wfs_pop - X X + * __cds_wfs_pop_all - X - + * + * cds_wfs_pop and cds_wfs_pop_all use an internal mutex to provide + * synchronization. + */ + +/* + * struct cds_wfs_node is returned by __cds_wfs_pop, and also used as + * iterator on stack. It is not safe to dereference the node next + * pointer when returned by __cds_wfs_pop_blocking. 
+ */ struct cds_wfs_node { struct cds_wfs_node *next; }; +/* + * struct cds_wfs_head is returned by __cds_wfs_pop_all, and can be used + * to begin iteration on the stack. "node" needs to be the first field of + * cds_wfs_head, so the end-of-stack pointer value can be used for both + * types. + */ +struct cds_wfs_head { + struct cds_wfs_node node; +}; + struct cds_wfs_stack { - struct cds_wfs_node *head; + struct cds_wfs_head *head; pthread_mutex_t lock; }; @@ -45,24 +86,179 @@ struct cds_wfs_stack { #include #define cds_wfs_node_init _cds_wfs_node_init -#define cds_wfs_init _cds_wfs_init -#define cds_wfs_push _cds_wfs_push -#define __cds_wfs_pop_blocking ___cds_wfs_pop_blocking -#define cds_wfs_pop_blocking _cds_wfs_pop_blocking +#define cds_wfs_init _cds_wfs_init +#define cds_wfs_empty _cds_wfs_empty +#define cds_wfs_push _cds_wfs_push + +/* Locking performed internally */ +#define cds_wfs_pop_blocking _cds_wfs_pop_blocking +#define cds_wfs_pop_all_blocking _cds_wfs_pop_all_blocking + +/* + * For iteration on cds_wfs_head returned by __cds_wfs_pop_all or + * cds_wfs_pop_all_blocking. + */ +#define cds_wfs_first_blocking _cds_wfs_first_blocking +#define cds_wfs_next_blocking _cds_wfs_next_blocking + +/* Pop locking with internal mutex */ +#define cds_wfs_pop_lock _cds_wfs_pop_lock +#define cds_wfs_pop_unlock _cds_wfs_pop_unlock + +/* Synchronization ensured by the caller. See synchronization table. */ +#define __cds_wfs_pop_blocking ___cds_wfs_pop_blocking +#define __cds_wfs_pop_all ___cds_wfs_pop_all #else /* !_LGPL_SOURCE */ +/* + * cds_wfs_node_init: initialize wait-free stack node. + */ extern void cds_wfs_node_init(struct cds_wfs_node *node); + +/* + * cds_wfs_init: initialize wait-free stack. + */ extern void cds_wfs_init(struct cds_wfs_stack *s); + +/* + * cds_wfs_empty: return whether wait-free stack is empty. + * + * No memory barrier is issued. No mutual exclusion is required. 
+ */ +extern bool cds_wfs_empty(struct cds_wfs_stack *s); + +/* + * cds_wfs_push: push a node into the stack. + * + * Issues a full memory barrier before push. No mutual exclusion is + * required. + * + * Returns 0 if the stack was empty prior to adding the node. + * Returns non-zero otherwise. + */ extern int cds_wfs_push(struct cds_wfs_stack *s, struct cds_wfs_node *node); -/* __cds_wfs_pop_blocking: caller ensures mutual exclusion between pops */ -extern struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s); + +/* + * cds_wfs_pop_blocking: pop a node from the stack. + * + * Calls __cds_wfs_pop_blocking with an internal pop mutex held. + */ extern struct cds_wfs_node *cds_wfs_pop_blocking(struct cds_wfs_stack *s); +/* + * cds_wfs_pop_all_blocking: pop all nodes from a stack. + * + * Calls __cds_wfs_pop_all with an internal pop mutex held. + */ +extern struct cds_wfs_head *cds_wfs_pop_all_blocking(struct cds_wfs_stack *s); + +/* + * cds_wfs_first_blocking: get first node of a popped stack. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * + * Used by for-like iteration macros in urcu/wfstack.h: + * cds_wfs_for_each_blocking() + * cds_wfs_for_each_blocking_safe() + */ +extern struct cds_wfs_node *cds_wfs_first_blocking(struct cds_wfs_head *head); + +/* + * cds_wfs_next_blocking: get next node of a popped stack. + * + * Content written into the node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + * + * Used by for-like iteration macros in urcu/wfstack.h: + * cds_wfs_for_each_blocking() + * cds_wfs_for_each_blocking_safe() + */ +extern struct cds_wfs_node *cds_wfs_next_blocking(struct cds_wfs_node *node); + +/* + * cds_wfs_pop_lock: lock stack pop-protection mutex. + */ +extern void cds_wfs_pop_lock(struct cds_wfs_stack *s); + +/* + * cds_wfs_pop_unlock: unlock stack pop-protection mutex. 
+ */ +extern void cds_wfs_pop_unlock(struct cds_wfs_stack *s); + +/* + * __cds_wfs_pop_blocking: pop a node from the stack. + * + * Returns NULL if stack is empty. + * + * __cds_wfs_pop_blocking needs to be synchronized using one of the + * following techniques: + * + * 1) Calling __cds_wfs_pop_blocking under rcu read lock critical + * section. The caller must wait for a grace period to pass before + * freeing the returned node or modifying the cds_wfs_node structure. + * 2) Using mutual exclusion (e.g. mutexes) to protect + * __cds_wfs_pop_blocking and __cds_wfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_wfs_pop_blocking() + * and __cds_wfs_pop_all(). (multi-provider/single-consumer scheme). + */ +extern struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s); + +/* + * __cds_wfs_pop_all: pop all nodes from a stack. + * + * __cds_wfs_pop_all does not require any synchronization with other + * push, nor with other __cds_wfs_pop_all, but requires synchronization + * matching the technique used to synchronize __cds_wfs_pop_blocking: + * + * 1) If __cds_wfs_pop_blocking is called under rcu read lock critical + * section, both __cds_wfs_pop_blocking and cds_wfs_pop_all callers + * must wait for a grace period to pass before freeing the returned + * node or modifying the cds_wfs_node structure. However, no RCU + * read-side critical section is needed around __cds_wfs_pop_all. + * 2) Using mutual exclusion (e.g. mutexes) to protect + * __cds_wfs_pop_blocking and __cds_wfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_wfs_pop_blocking() + * and __cds_wfs_pop_all(). (multi-provider/single-consumer scheme). + */ +extern struct cds_wfs_head *__cds_wfs_pop_all(struct cds_wfs_stack *s); + #endif /* !_LGPL_SOURCE */ #ifdef __cplusplus } #endif +/* + * cds_wfs_for_each_blocking: Iterate over all nodes returned by + * __cds_wfs_pop_all(). + * @head: head of the queue (struct cds_wfs_head pointer). 
+ * @node: iterator (struct cds_wfs_node pointer). + * + * Content written into each node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + */ +#define cds_wfs_for_each_blocking(head, node) \ + for (node = cds_wfs_first_blocking(head); \ + node != NULL; \ + node = cds_wfs_next_blocking(node)) + +/* + * cds_wfs_for_each_blocking_safe: Iterate over all nodes returned by + * __cds_wfs_pop_all(). Safe against deletion. + * @head: head of the queue (struct cds_wfs_head pointer). + * @node: iterator (struct cds_wfs_node pointer). + * @n: struct cds_wfs_node pointer holding the next pointer (used + * internally). + * + * Content written into each node before enqueue is guaranteed to be + * consistent, but no other memory ordering is ensured. + */ +#define cds_wfs_for_each_blocking_safe(head, node, n) \ + for (node = cds_wfs_first_blocking(head), \ + n = (node ? cds_wfs_next_blocking(node) : NULL); \ + node != NULL; \ + node = n, n = (node ? cds_wfs_next_blocking(node) : NULL)) + #endif /* _URCU_WFSTACK_H */ diff --git a/wfstack.c b/wfstack.c index e9799e6..48f290c 100644 --- a/wfstack.c +++ b/wfstack.c @@ -1,7 +1,7 @@ /* * wfstack.c * - * Userspace RCU library - Stack with Wait-Free push, Blocking pop. + * Userspace RCU library - Stack with wait-free push, blocking traversal. 
* * Copyright 2010 - Mathieu Desnoyers * @@ -38,17 +38,52 @@ void cds_wfs_init(struct cds_wfs_stack *s) _cds_wfs_init(s); } +bool cds_wfs_empty(struct cds_wfs_stack *s) +{ + return _cds_wfs_empty(s); +} + int cds_wfs_push(struct cds_wfs_stack *s, struct cds_wfs_node *node) { return _cds_wfs_push(s, node); } +struct cds_wfs_node *cds_wfs_pop_blocking(struct cds_wfs_stack *s) +{ + return _cds_wfs_pop_blocking(s); +} + +struct cds_wfs_head *cds_wfs_pop_all_blocking(struct cds_wfs_stack *s) +{ + return _cds_wfs_pop_all_blocking(s); +} + +struct cds_wfs_node *cds_wfs_first_blocking(struct cds_wfs_head *head) +{ + return _cds_wfs_first_blocking(head); +} + +struct cds_wfs_node *cds_wfs_next_blocking(struct cds_wfs_node *node) +{ + return _cds_wfs_next_blocking(node); +} + +void cds_wfs_pop_lock(struct cds_wfs_stack *s) +{ + _cds_wfs_pop_lock(s); +} + +void cds_wfs_pop_unlock(struct cds_wfs_stack *s) +{ + _cds_wfs_pop_unlock(s); +} + struct cds_wfs_node *__cds_wfs_pop_blocking(struct cds_wfs_stack *s) { return ___cds_wfs_pop_blocking(s); } -struct cds_wfs_node *cds_wfs_pop_blocking(struct cds_wfs_stack *s) +struct cds_wfs_head *__cds_wfs_pop_all(struct cds_wfs_stack *s) { - return _cds_wfs_pop_blocking(s); + return ___cds_wfs_pop_all(s); } -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 22 08:59:51 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 08:59:51 -0400 Subject: [lttng-dev] [PATCH urcu] wfstack: implement pop_all and iteration tests Message-ID: <20121022125951.GB30944@Krystal> Signed-off-by: Mathieu Desnoyers CC: Paul McKenney CC: Lai Jiangshan --- diff --git a/tests/test_urcu_wfs.c b/tests/test_urcu_wfs.c index 66710e2..7d5fe0f 100644 --- a/tests/test_urcu_wfs.c +++ b/tests/test_urcu_wfs.c @@ -68,6 +68,16 @@ static inline pid_t gettid(void) #include #include +/* + * External synchronization used. 
+ */
+enum test_sync {
+	TEST_SYNC_NONE = 0,
+	TEST_SYNC_MUTEX,
+};
+
+static enum test_sync test_sync;
+
 static volatile int test_go, test_stop;
 
 static unsigned long rduration;
@@ -85,10 +95,12 @@ static inline void loop_sleep(unsigned long loops)
 
 static int verbose_mode;
 
+static int test_pop, test_pop_all;
+
 #define printf_verbose(fmt, args...)		\
 	do {					\
 		if (verbose_mode)		\
-			printf(fmt, args);	\
+			printf(fmt, ## args);	\
 	} while (0)
 
 static unsigned int cpu_affinities[NR_CPUS];
@@ -199,9 +211,45 @@ fail:
 
 }
 
+static void do_test_pop(enum test_sync sync)
+{
+	struct cds_wfs_node *node;
+
+	if (sync == TEST_SYNC_MUTEX)
+		cds_wfs_pop_lock(&s);
+	node = __cds_wfs_pop_blocking(&s);
+	if (sync == TEST_SYNC_MUTEX)
+		cds_wfs_pop_unlock(&s);
+
+	if (node) {
+		free(node);
+		URCU_TLS(nr_successful_dequeues)++;
+	}
+	URCU_TLS(nr_dequeues)++;
+}
+
+static void do_test_pop_all(enum test_sync sync)
+{
+	struct cds_wfs_head *head;
+	struct cds_wfs_node *node, *n;
+
+	if (sync == TEST_SYNC_MUTEX)
+		cds_wfs_pop_lock(&s);
+	head = __cds_wfs_pop_all(&s);
+	if (sync == TEST_SYNC_MUTEX)
+		cds_wfs_pop_unlock(&s);
+
+	cds_wfs_for_each_blocking_safe(head, node, n) {
+		free(node);
+		URCU_TLS(nr_successful_dequeues)++;
+		URCU_TLS(nr_dequeues)++;
+	}
+}
+
 void *thr_dequeuer(void *_count)
 {
 	unsigned long long *count = _count;
+	unsigned int counter = 0;
 
 	printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n",
 			"dequeuer", pthread_self(), (unsigned long)gettid());
@@ -213,15 +261,22 @@ void *thr_dequeuer(void *_count)
 	}
 	cmm_smp_mb();
 
-	for (;;) {
-		struct cds_wfs_node *node = cds_wfs_pop_blocking(&s);
+	assert(test_pop || test_pop_all);
 
-		if (node) {
-			free(node);
-			URCU_TLS(nr_successful_dequeues)++;
+	for (;;) {
+		if (test_pop && test_pop_all) {
+			if (counter & 1)
+				do_test_pop(test_sync);
+			else
+				do_test_pop_all(test_sync);
+			counter++;
+		} else {
+			if (test_pop)
+				do_test_pop(test_sync);
+			else
+				do_test_pop_all(test_sync);
 		}
 
-		URCU_TLS(nr_dequeues)++;
 		if
(caa_unlikely(!test_duration_dequeue())) break; if (caa_unlikely(rduration)) @@ -257,6 +312,10 @@ void show_usage(int argc, char **argv) printf(" [-c duration] (dequeuer period (in loops))"); printf(" [-v] (verbose output)"); printf(" [-a cpu#] [-a cpu#]... (affinity)"); + printf(" [-p] (test pop)"); + printf(" [-P] (test pop_all, enabled by default)"); + printf(" [-M] (use mutex external synchronization)"); + printf(" Note: default: no external synchronization used."); printf("\n"); } @@ -326,12 +385,29 @@ int main(int argc, char **argv) case 'v': verbose_mode = 1; break; + case 'p': + test_pop = 1; + break; + case 'P': + test_pop_all = 1; + break; + case 'M': + test_sync = TEST_SYNC_MUTEX; + break; } } + /* activate pop_all test by default */ + if (!test_pop && !test_pop_all) + test_pop_all = 1; + printf_verbose("running test for %lu seconds, %u enqueuers, " "%u dequeuers.\n", duration, nr_enqueuers, nr_dequeuers); + if (test_pop) + printf_verbose("pop test activated.\n"); + if (test_pop_all) + printf_verbose("pop_all test activated.\n"); printf_verbose("Writer delay : %lu loops.\n", rduration); printf_verbose("Reader duration : %lu loops.\n", wdelay); printf_verbose("thread %-6s, thread id : %lx, tid %lu\n", -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 22 09:01:54 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 09:01:54 -0400 Subject: [lttng-dev] [PATCH urcu] lfstack: implement lock-free stack Message-ID: <20121022130154.GC30944@Krystal> This stack does not require to hold RCU read-side lock across push, and allows multiple strategies to be used for pop. 
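
As a rough illustration of the contract described above (push needs no
external synchronization, pop does), here is a minimal self-contained
sketch. It is not the code from this patch: the sk_* names are
hypothetical, and C11 atomics stand in for uatomic_cmpxchg(). Unlike the
patch, which first attempts the exchange against a NULL head, this sketch
simply loads the head before its first compare-and-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names (sk_*): a C11 sketch, not the liburcu API. */
struct sk_node {
	struct sk_node *next;
};

struct sk_stack {
	_Atomic(struct sk_node *) head;
};

static void sk_init(struct sk_stack *s)
{
	atomic_init(&s->head, NULL);
}

/*
 * Push never dereferences the old head, only stores its pointer value
 * into node->next, so it needs no RCU read-side lock and no mutex.
 *
 * Returns 0 if the stack was empty before the push, non-zero otherwise.
 */
static int sk_push(struct sk_stack *s, struct sk_node *node)
{
	struct sk_node *old_head;

	assert(node != NULL);
	old_head = atomic_load_explicit(&s->head, memory_order_relaxed);
	do {
		/* node is still private here, a plain store is enough. */
		node->next = old_head;
	} while (!atomic_compare_exchange_weak_explicit(&s->head, &old_head,
			node, memory_order_release, memory_order_relaxed));
	return old_head != NULL;
}

/*
 * Pop dereferences head->next, so the caller must pick one strategy:
 * an RCU read-side critical section plus a grace period before reuse,
 * a mutex around every pop caller, or a single popping thread.
 *
 * Returns NULL if the stack is empty.
 */
static struct sk_node *sk_pop(struct sk_stack *s)
{
	struct sk_node *head;

	head = atomic_load_explicit(&s->head, memory_order_acquire);
	while (head != NULL &&
	       !atomic_compare_exchange_weak_explicit(&s->head, &head,
			head->next, memory_order_acquire, memory_order_acquire))
		;	/* the failed exchange reloaded head; retry */
	return head;
}
```

Used from a single consumer thread, sk_pop() can be called as is. With
several poppers, the call would have to be wrapped in a mutex or placed
under an RCU read-side lock, which is the choice the commit message
leaves to the caller.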
Signed-off-by: Mathieu Desnoyers CC: Paul McKenney CC: Lai Jiangshan --- diff --git a/Makefile.am b/Makefile.am index ffdca9a..195b89a 100644 --- a/Makefile.am +++ b/Makefile.am @@ -17,6 +17,7 @@ nobase_dist_include_HEADERS = urcu/compiler.h urcu/hlist.h urcu/list.h \ urcu/wfqueue.h urcu/rculfstack.h urcu/rculfqueue.h \ urcu/ref.h urcu/cds.h urcu/urcu_ref.h urcu/urcu-futex.h \ urcu/uatomic_arch.h urcu/rculfhash.h urcu/wfcqueue.h \ + urcu/lfstack.h \ $(top_srcdir)/urcu/map/*.h \ $(top_srcdir)/urcu/static/*.h \ urcu/tls-compat.h @@ -72,7 +73,8 @@ liburcu_signal_la_LIBADD = liburcu-common.la liburcu_bp_la_SOURCES = urcu-bp.c urcu-pointer.c $(COMPAT) liburcu_bp_la_LIBADD = liburcu-common.la -liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c $(RCULFHASH) $(COMPAT) +liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c lfstack.c \ + $(RCULFHASH) $(COMPAT) liburcu_cds_la_LIBADD = liburcu-common.la pkgconfigdir = $(libdir)/pkgconfig diff --git a/lfstack.c b/lfstack.c new file mode 100644 index 0000000..74ffd4f --- /dev/null +++ b/lfstack.c @@ -0,0 +1,51 @@ +/* + * lfstack.c + * + * Userspace RCU library - Lock-Free Stack + * + * Copyright 2010-2012 - Mathieu Desnoyers + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +/* Do not #define _LGPL_SOURCE to ensure we can emit the wrapper symbols */ +#undef _LGPL_SOURCE +#include "urcu/lfstack.h" +#define _LGPL_SOURCE +#include "urcu/static/lfstack.h" + +/* + * library wrappers to be used by non-LGPL compatible source code. + */ + +void cds_lfs_node_init(struct cds_lfs_node *node) +{ + _cds_lfs_node_init(node); +} + +void cds_lfs_init(struct cds_lfs_stack *s) +{ + _cds_lfs_init(s); +} + +int cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node) +{ + return _cds_lfs_push(s, node); +} + +struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s) +{ + return _cds_lfs_pop(s); +} diff --git a/urcu/cds.h b/urcu/cds.h index d9e7984..78534bb 100644 --- a/urcu/cds.h +++ b/urcu/cds.h @@ -33,5 +33,6 @@ #include #include #include +#include #endif /* _URCU_CDS_H */ diff --git a/urcu/lfstack.h b/urcu/lfstack.h new file mode 100644 index 0000000..d068739 --- /dev/null +++ b/urcu/lfstack.h @@ -0,0 +1,87 @@ +#ifndef _URCU_LFSTACK_H +#define _URCU_LFSTACK_H + +/* + * lfstack.h + * + * Userspace RCU library - Lock-Free Stack + * + * Copyright 2010-2012 - Mathieu Desnoyers + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#ifdef __cplusplus +extern "C" { +#endif + +struct cds_lfs_node { + struct cds_lfs_node *next; +}; + +struct cds_lfs_stack { + struct cds_lfs_node *head; +}; + +#ifdef _LGPL_SOURCE + +#include + +#define cds_lfs_node_init _cds_lfs_node_init +#define cds_lfs_init _cds_lfs_init +#define cds_lfs_push _cds_lfs_push +#define cds_lfs_pop _cds_lfs_pop + +#else /* !_LGPL_SOURCE */ + +extern void cds_lfs_node_init(struct cds_lfs_node *node); +extern void cds_lfs_init(struct cds_lfs_stack *s); + +/* + * cds_lfs_push: push a node into the stack. + * + * Does not require any synchronization with other push nor pop. + * + * Returns 0 if the stack was empty prior to adding the node. + * Returns non-zero otherwise. + */ +extern int cds_lfs_push(struct cds_lfs_stack *s, + struct cds_lfs_node *node); + +/* + * cds_lfs_pop: pop a node from the stack. + * + * Returns NULL if stack is empty. + * + * cds_lfs_pop needs to be synchronized using one of the following + * techniques: + * + * 1) Calling cds_lfs_pop under rcu read lock critical section. The + * caller must wait for a grace period to pass before freeing the + * returned node or modifying the cds_lfs_node structure. + * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop + * callers. + * 3) Ensuring that only ONE thread can call cds_lfs_pop(). + * (multi-provider/single-consumer scheme). 
+ */ +extern struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s); + +#endif /* !_LGPL_SOURCE */ + +#ifdef __cplusplus +} +#endif + +#endif /* _URCU_LFSTACK_H */ diff --git a/urcu/static/lfstack.h b/urcu/static/lfstack.h new file mode 100644 index 0000000..7acbf54 --- /dev/null +++ b/urcu/static/lfstack.h @@ -0,0 +1,151 @@ +#ifndef _URCU_STATIC_LFSTACK_H +#define _URCU_STATIC_LFSTACK_H + +/* + * urcu/static/lfstack.h + * + * Userspace RCU library - Lock-Free Stack + * + * Copyright 2010-2012 - Mathieu Desnoyers + * + * TO BE INCLUDED ONLY IN LGPL-COMPATIBLE CODE. See rculfstack.h for linking + * dynamically with the userspace rcu library. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +static inline +void _cds_lfs_node_init(struct cds_lfs_node *node) +{ +} + +static inline +void _cds_lfs_init(struct cds_lfs_stack *s) +{ + s->head = NULL; +} + +/* + * cds_lfs_push: push a node into the stack. + * + * Does not require any synchronization with other push nor pop. + * + * Lock-free stack push is not subject to ABA problem, so no need to + * take the RCU read-side lock. 
Even if "head" changes between two
+ * uatomic_cmpxchg() invocations here (being popped, and then pushed
+ * again by one or more concurrent threads), the second
+ * uatomic_cmpxchg() invocation only cares about pushing a new entry at
+ * the head of the stack, ensuring consistency by making sure the new
+ * node->next is the same pointer value as the value replaced as head.
+ * It does not care about the content of the actual next node, so it can
+ * very well be reallocated between the two uatomic_cmpxchg().
+ *
+ * We take the approach of expecting the stack to be usually empty, so
+ * we first try an initial uatomic_cmpxchg() on a NULL old_head, and
+ * retry if the old head was non-NULL (the value read by the first
+ * uatomic_cmpxchg() is used as old head for the following loop). The
+ * upside of this scheme is to minimize the amount of cacheline traffic,
+ * always performing an exclusive cacheline access, rather than doing
+ * non-exclusive followed by exclusive cacheline access (which would be
+ * required if we first read the old head value). This design decision
+ * might be revisited after more thorough benchmarking on various
+ * platforms.
+ *
+ * Returns 0 if the stack was empty prior to adding the node.
+ * Returns non-zero otherwise.
+ */
+static inline
+int _cds_lfs_push(struct cds_lfs_stack *s,
+		  struct cds_lfs_node *node)
+{
+	struct cds_lfs_node *head = NULL;
+
+	for (;;) {
+		struct cds_lfs_node *old_head = head;
+
+		/*
+		 * node->next is still private at this point, no need to
+		 * perform a _CMM_STORE_SHARED().
+		 */
+		node->next = head;
+		/*
+		 * uatomic_cmpxchg() implicit memory barrier orders earlier
+		 * stores to node before publication.
+		 */
+		head = uatomic_cmpxchg(&s->head, old_head, node);
+		if (old_head == head)
+			break;
+	}
+	return (int) !!((unsigned long) head);
+}
+
+/*
+ * cds_lfs_pop: pop a node from the stack.
+ *
+ * Returns NULL if stack is empty.
+ * + * cds_lfs_pop needs to be synchronized using one of the following + * techniques: + * + * 1) Calling cds_lfs_pop under rcu read lock critical section. The + * caller must wait for a grace period to pass before freeing the + * returned node or modifying the cds_lfs_node structure. + * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop + * callers. + * 3) Ensuring that only ONE thread can call cds_lfs_pop(). + * (multi-provider/single-consumer scheme). + */ +static inline +struct cds_lfs_node *_cds_lfs_pop(struct cds_lfs_stack *s) +{ + for (;;) { + struct cds_lfs_node *head; + + head = _CMM_LOAD_SHARED(s->head); + if (head) { + struct cds_lfs_node *next; + + /* + * Read head before head->next. Matches the + * implicit memory barrier before + * uatomic_cmpxchg() in cds_lfs_push. + */ + cmm_smp_read_barrier_depends(); + next = _CMM_LOAD_SHARED(head->next); + if (uatomic_cmpxchg(&s->head, head, next) == head) { + return head; + } else { + /* Concurrent modification. Retry. */ + continue; + } + } else { + /* Empty stack */ + return NULL; + } + } +} + +#ifdef __cplusplus +} +#endif + +#endif /* _URCU_STATIC_LFSTACK_H */ -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
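The push and pop logic in the patch above can be sketched with plain C11 atomics. This is only an illustration under stated assumptions, not the liburcu implementation: the sketch_* names are hypothetical, atomic_compare_exchange_strong() stands in for uatomic_cmpxchg(), and the pop side assumes a single consumer (technique 3 in the comment above), so no RCU read-side lock or mutex appears.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names for illustration; this is not the liburcu API. */
struct sketch_node {
	struct sketch_node *next;
};

struct sketch_stack {
	_Atomic(struct sketch_node *) head;
};

static void sketch_init(struct sketch_stack *s)
{
	atomic_init(&s->head, NULL);
}

/* Returns 0 if the stack was empty before the push, non-zero otherwise. */
static int sketch_push(struct sketch_stack *s, struct sketch_node *node)
{
	/* Optimistic first guess: the stack is empty. */
	struct sketch_node *expected = NULL;

	for (;;) {
		/* node is still private here, so a plain store is enough. */
		node->next = expected;
		/*
		 * On failure, the compare-exchange writes the head it
		 * actually observed into "expected"; that value is the
		 * old head for the next attempt.
		 */
		if (atomic_compare_exchange_strong(&s->head, &expected, node))
			break;
	}
	return expected != NULL;
}

/*
 * Single-consumer pop: the caller guarantees that no other thread
 * pops concurrently. Returns NULL if the stack is empty.
 */
static struct sketch_node *sketch_pop(struct sketch_stack *s)
{
	for (;;) {
		struct sketch_node *head = atomic_load(&s->head);
		struct sketch_node *next;

		if (!head)
			return NULL;	/* empty stack */
		next = head->next;
		if (atomic_compare_exchange_strong(&s->head, &head, next))
			return head;
		/* A concurrent push changed head. Retry. */
	}
}
```

As in the patch, push never reads the old head with a separate load: the first compare-exchange against NULL both tests for the empty case and, on failure, supplies the current head for the retry.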
http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 22 09:02:44 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 09:02:44 -0400 Subject: [lttng-dev] [PATCH urcu] lfstack: implement test Message-ID: <20121022130244.GD30944@Krystal> Signed-off-by: Mathieu Desnoyers CC: Paul McKenney CC: Lai Jiangshan --- diff --git a/tests/Makefile.am b/tests/Makefile.am index c92bbe6..df22d06 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -14,10 +14,12 @@ noinst_PROGRAMS = test_urcu test_urcu_dynamic_link test_urcu_timing \ test_uatomic test_urcu_assign test_urcu_assign_dynamic_link \ test_urcu_bp test_urcu_bp_dynamic_link test_cycles_per_loop \ test_urcu_lfq test_urcu_wfq test_urcu_lfs test_urcu_wfs \ + test_urcu_lfs_rcu \ test_urcu_wfcq \ test_urcu_wfq_dynlink test_urcu_wfs_dynlink \ test_urcu_wfcq_dynlink \ - test_urcu_lfq_dynlink test_urcu_lfs_dynlink test_urcu_hash + test_urcu_lfq_dynlink test_urcu_lfs_dynlink test_urcu_hash \ + test_urcu_lfs_rcu_dynlink noinst_HEADERS = rcutorture.h if COMPAT_ARCH @@ -181,10 +183,17 @@ test_urcu_wfcq_dynlink_LDADD = $(URCU_COMMON_LIB) test_urcu_lfs_SOURCES = test_urcu_lfs.c $(URCU) test_urcu_lfs_LDADD = $(URCU_CDS_LIB) +test_urcu_lfs_rcu_SOURCES = test_urcu_lfs_rcu.c $(URCU) +test_urcu_lfs_rcu_LDADD = $(URCU_CDS_LIB) + test_urcu_lfs_dynlink_SOURCES = test_urcu_lfs.c $(URCU) test_urcu_lfs_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS) test_urcu_lfs_dynlink_LDADD = $(URCU_CDS_LIB) +test_urcu_lfs_rcu_dynlink_SOURCES = test_urcu_lfs_rcu.c $(URCU) +test_urcu_lfs_rcu_dynlink_CFLAGS = -DDYNAMIC_LINK_TEST $(AM_CFLAGS) +test_urcu_lfs_rcu_dynlink_LDADD = $(URCU_CDS_LIB) + test_urcu_wfs_SOURCES = test_urcu_wfs.c $(COMPAT) test_urcu_wfs_LDADD = $(URCU_COMMON_LIB) diff --git a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c index 61abaad..e15c677 100644 --- a/tests/test_urcu_lfs.c +++ b/tests/test_urcu_lfs.c @@ -1,9 +1,9 @@ /* * test_urcu_lfs.c * - * Userspace RCU library - example 
RCU-based lock-free stack + * Userspace RCU library - example lock-free stack * - * Copyright February 2010 - Mathieu Desnoyers + * Copyright 2010-2012 - Mathieu Desnoyers * Copyright February 2010 - Paolo Bonzini * * This program is free software; you can redistribute it and/or modify @@ -158,11 +158,11 @@ static unsigned int nr_enqueuers; static unsigned int nr_dequeuers; struct test { - struct cds_lfs_node_rcu list; + struct cds_lfs_node list; struct rcu_head rcu; }; -static struct cds_lfs_stack_rcu s; +static struct cds_lfs_stack s; void *thr_enqueuer(void *_count) { @@ -184,9 +184,8 @@ void *thr_enqueuer(void *_count) struct test *node = malloc(sizeof(*node)); if (!node) goto fail; - cds_lfs_node_init_rcu(&node->list); - /* No rcu read-side is needed for push */ - cds_lfs_push_rcu(&s, &node->list); + cds_lfs_node_init(&node->list); + cds_lfs_push(&s, &node->list); URCU_TLS(nr_successful_enqueues)++; if (caa_unlikely(wdelay)) @@ -234,10 +233,10 @@ void *thr_dequeuer(void *_count) cmm_smp_mb(); for (;;) { - struct cds_lfs_node_rcu *snode; + struct cds_lfs_node *snode; rcu_read_lock(); - snode = cds_lfs_pop_rcu(&s); + snode = cds_lfs_pop(&s); rcu_read_unlock(); if (snode) { struct test *node; @@ -264,12 +263,12 @@ void *thr_dequeuer(void *_count) return ((void*)2); } -void test_end(struct cds_lfs_stack_rcu *s, unsigned long long *nr_dequeues) +void test_end(struct cds_lfs_stack *s, unsigned long long *nr_dequeues) { - struct cds_lfs_node_rcu *snode; + struct cds_lfs_node *snode; do { - snode = cds_lfs_pop_rcu(s); + snode = cds_lfs_pop(s); if (snode) { struct test *node; @@ -371,7 +370,7 @@ int main(int argc, char **argv) tid_dequeuer = malloc(sizeof(*tid_dequeuer) * nr_dequeuers); count_enqueuer = malloc(2 * sizeof(*count_enqueuer) * nr_enqueuers); count_dequeuer = malloc(2 * sizeof(*count_dequeuer) * nr_dequeuers); - cds_lfs_init_rcu(&s); + cds_lfs_init(&s); err = create_all_cpu_call_rcu_data(0); if (err) { printf("Per-CPU call_rcu() worker threads unavailable. 
Using default global worker thread.\n"); diff --git a/tests/test_urcu_lfs_rcu.c b/tests/test_urcu_lfs_rcu.c new file mode 100644 index 0000000..d1e0ee9 --- /dev/null +++ b/tests/test_urcu_lfs_rcu.c @@ -0,0 +1,451 @@ +/* + * test_urcu_lfs_rcu.c + * + * Userspace RCU library - example RCU-based lock-free stack + * + * Copyright February 2010 - Mathieu Desnoyers + * Copyright February 2010 - Paolo Bonzini + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#define _GNU_SOURCE +#include "../config.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#ifdef __linux__ +#include +#endif + +/* hardcoded number of CPUs */ +#define NR_CPUS 16384 + +#if defined(_syscall0) +_syscall0(pid_t, gettid) +#elif defined(__NR_gettid) +static inline pid_t gettid(void) +{ + return syscall(__NR_gettid); +} +#else +#warning "use pid as tid" +static inline pid_t gettid(void) +{ + return getpid(); +} +#endif + +#ifndef DYNAMIC_LINK_TEST +#define _LGPL_SOURCE +#endif +#include +#include + +static volatile int test_go, test_stop; + +static unsigned long rduration; + +static unsigned long duration; + +/* read-side C.S. 
duration, in loops */ +static unsigned long wdelay; + +static inline void loop_sleep(unsigned long loops) +{ + while (loops-- != 0) + caa_cpu_relax(); +} + +static int verbose_mode; + +#define printf_verbose(fmt, args...) \ + do { \ + if (verbose_mode) \ + printf(fmt, args); \ + } while (0) + +static unsigned int cpu_affinities[NR_CPUS]; +static unsigned int next_aff = 0; +static int use_affinity = 0; + +pthread_mutex_t affinity_mutex = PTHREAD_MUTEX_INITIALIZER; + +#ifndef HAVE_CPU_SET_T +typedef unsigned long cpu_set_t; +# define CPU_ZERO(cpuset) do { *(cpuset) = 0; } while(0) +# define CPU_SET(cpu, cpuset) do { *(cpuset) |= (1UL << (cpu)); } while(0) +#endif + +static void set_affinity(void) +{ + cpu_set_t mask; + int cpu; + int ret; + + if (!use_affinity) + return; + +#if HAVE_SCHED_SETAFFINITY + ret = pthread_mutex_lock(&affinity_mutex); + if (ret) { + perror("Error in pthread mutex lock"); + exit(-1); + } + cpu = cpu_affinities[next_aff++]; + ret = pthread_mutex_unlock(&affinity_mutex); + if (ret) { + perror("Error in pthread mutex unlock"); + exit(-1); + } + + CPU_ZERO(&mask); + CPU_SET(cpu, &mask); +#if SCHED_SETAFFINITY_ARGS == 2 + sched_setaffinity(0, &mask); +#else + sched_setaffinity(0, sizeof(mask), &mask); +#endif +#endif /* HAVE_SCHED_SETAFFINITY */ +} + +/* + * returns 0 if test should end. 
+ */ +static int test_duration_dequeue(void) +{ + return !test_stop; +} + +static int test_duration_enqueue(void) +{ + return !test_stop; +} + +static DEFINE_URCU_TLS(unsigned long long, nr_dequeues); +static DEFINE_URCU_TLS(unsigned long long, nr_enqueues); + +static DEFINE_URCU_TLS(unsigned long long, nr_successful_dequeues); +static DEFINE_URCU_TLS(unsigned long long, nr_successful_enqueues); + +static unsigned int nr_enqueuers; +static unsigned int nr_dequeuers; + +struct test { + struct cds_lfs_node_rcu list; + struct rcu_head rcu; +}; + +static struct cds_lfs_stack_rcu s; + +void *thr_enqueuer(void *_count) +{ + unsigned long long *count = _count; + + printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", + "enqueuer", pthread_self(), (unsigned long)gettid()); + + set_affinity(); + + rcu_register_thread(); + + while (!test_go) + { + } + cmm_smp_mb(); + + for (;;) { + struct test *node = malloc(sizeof(*node)); + if (!node) + goto fail; + cds_lfs_node_init_rcu(&node->list); + /* No rcu read-side is needed for push */ + cds_lfs_push_rcu(&s, &node->list); + URCU_TLS(nr_successful_enqueues)++; + + if (caa_unlikely(wdelay)) + loop_sleep(wdelay); +fail: + URCU_TLS(nr_enqueues)++; + if (caa_unlikely(!test_duration_enqueue())) + break; + } + + rcu_unregister_thread(); + + count[0] = URCU_TLS(nr_enqueues); + count[1] = URCU_TLS(nr_successful_enqueues); + printf_verbose("enqueuer thread_end, thread id : %lx, tid %lu, " + "enqueues %llu successful_enqueues %llu\n", + pthread_self(), (unsigned long)gettid(), + URCU_TLS(nr_enqueues), URCU_TLS(nr_successful_enqueues)); + return ((void*)1); + +} + +static +void free_node_cb(struct rcu_head *head) +{ + struct test *node = + caa_container_of(head, struct test, rcu); + free(node); +} + +void *thr_dequeuer(void *_count) +{ + unsigned long long *count = _count; + + printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n", + "dequeuer", pthread_self(), (unsigned long)gettid()); + + set_affinity(); + + 
rcu_register_thread(); + + while (!test_go) + { + } + cmm_smp_mb(); + + for (;;) { + struct cds_lfs_node_rcu *snode; + + rcu_read_lock(); + snode = cds_lfs_pop_rcu(&s); + rcu_read_unlock(); + if (snode) { + struct test *node; + + node = caa_container_of(snode, struct test, list); + call_rcu(&node->rcu, free_node_cb); + URCU_TLS(nr_successful_dequeues)++; + } + URCU_TLS(nr_dequeues)++; + if (caa_unlikely(!test_duration_dequeue())) + break; + if (caa_unlikely(rduration)) + loop_sleep(rduration); + } + + rcu_unregister_thread(); + + printf_verbose("dequeuer thread_end, thread id : %lx, tid %lu, " + "dequeues %llu, successful_dequeues %llu\n", + pthread_self(), (unsigned long)gettid(), + URCU_TLS(nr_dequeues), URCU_TLS(nr_successful_dequeues)); + count[0] = URCU_TLS(nr_dequeues); + count[1] = URCU_TLS(nr_successful_dequeues); + return ((void*)2); +} + +void test_end(struct cds_lfs_stack_rcu *s, unsigned long long *nr_dequeues) +{ + struct cds_lfs_node_rcu *snode; + + do { + snode = cds_lfs_pop_rcu(s); + if (snode) { + struct test *node; + + node = caa_container_of(snode, struct test, list); + free(node); + (*nr_dequeues)++; + } + } while (snode); +} + +void show_usage(int argc, char **argv) +{ + printf("Usage : %s nr_dequeuers nr_enqueuers duration (s)", argv[0]); + printf(" [-d delay] (enqueuer period (in loops))"); + printf(" [-c duration] (dequeuer period (in loops))"); + printf(" [-v] (verbose output)"); + printf(" [-a cpu#] [-a cpu#]... 
(affinity)"); + printf("\n"); +} + +int main(int argc, char **argv) +{ + int err; + pthread_t *tid_enqueuer, *tid_dequeuer; + void *tret; + unsigned long long *count_enqueuer, *count_dequeuer; + unsigned long long tot_enqueues = 0, tot_dequeues = 0; + unsigned long long tot_successful_enqueues = 0, + tot_successful_dequeues = 0; + unsigned long long end_dequeues = 0; + int i, a; + + if (argc < 4) { + show_usage(argc, argv); + return -1; + } + + err = sscanf(argv[1], "%u", &nr_dequeuers); + if (err != 1) { + show_usage(argc, argv); + return -1; + } + + err = sscanf(argv[2], "%u", &nr_enqueuers); + if (err != 1) { + show_usage(argc, argv); + return -1; + } + + err = sscanf(argv[3], "%lu", &duration); + if (err != 1) { + show_usage(argc, argv); + return -1; + } + + for (i = 4; i < argc; i++) { + if (argv[i][0] != '-') + continue; + switch (argv[i][1]) { + case 'a': + if (argc < i + 2) { + show_usage(argc, argv); + return -1; + } + a = atoi(argv[++i]); + cpu_affinities[next_aff++] = a; + use_affinity = 1; + printf_verbose("Adding CPU %d affinity\n", a); + break; + case 'c': + if (argc < i + 2) { + show_usage(argc, argv); + return -1; + } + rduration = atol(argv[++i]); + break; + case 'd': + if (argc < i + 2) { + show_usage(argc, argv); + return -1; + } + wdelay = atol(argv[++i]); + break; + case 'v': + verbose_mode = 1; + break; + } + } + + printf_verbose("running test for %lu seconds, %u enqueuers, " + "%u dequeuers.\n", + duration, nr_enqueuers, nr_dequeuers); + printf_verbose("Writer delay : %lu loops.\n", rduration); + printf_verbose("Reader duration : %lu loops.\n", wdelay); + printf_verbose("thread %-6s, thread id : %lx, tid %lu\n", + "main", pthread_self(), (unsigned long)gettid()); + + tid_enqueuer = malloc(sizeof(*tid_enqueuer) * nr_enqueuers); + tid_dequeuer = malloc(sizeof(*tid_dequeuer) * nr_dequeuers); + count_enqueuer = malloc(2 * sizeof(*count_enqueuer) * nr_enqueuers); + count_dequeuer = malloc(2 * sizeof(*count_dequeuer) * nr_dequeuers); + 
cds_lfs_init_rcu(&s); + err = create_all_cpu_call_rcu_data(0); + if (err) { + printf("Per-CPU call_rcu() worker threads unavailable. Using default global worker thread.\n"); + } + + next_aff = 0; + + for (i = 0; i < nr_enqueuers; i++) { + err = pthread_create(&tid_enqueuer[i], NULL, thr_enqueuer, + &count_enqueuer[2 * i]); + if (err != 0) + exit(1); + } + for (i = 0; i < nr_dequeuers; i++) { + err = pthread_create(&tid_dequeuer[i], NULL, thr_dequeuer, + &count_dequeuer[2 * i]); + if (err != 0) + exit(1); + } + + cmm_smp_mb(); + + test_go = 1; + + for (i = 0; i < duration; i++) { + sleep(1); + if (verbose_mode) + write (1, ".", 1); + } + + test_stop = 1; + + for (i = 0; i < nr_enqueuers; i++) { + err = pthread_join(tid_enqueuer[i], &tret); + if (err != 0) + exit(1); + tot_enqueues += count_enqueuer[2 * i]; + tot_successful_enqueues += count_enqueuer[2 * i + 1]; + } + for (i = 0; i < nr_dequeuers; i++) { + err = pthread_join(tid_dequeuer[i], &tret); + if (err != 0) + exit(1); + tot_dequeues += count_dequeuer[2 * i]; + tot_successful_dequeues += count_dequeuer[2 * i + 1]; + } + + test_end(&s, &end_dequeues); + + printf_verbose("total number of enqueues : %llu, dequeues %llu\n", + tot_enqueues, tot_dequeues); + printf_verbose("total number of successful enqueues : %llu, " + "successful dequeues %llu\n", + tot_successful_enqueues, tot_successful_dequeues); + printf("SUMMARY %-25s testdur %4lu nr_enqueuers %3u wdelay %6lu " + "nr_dequeuers %3u " + "rdur %6lu nr_enqueues %12llu nr_dequeues %12llu " + "successful enqueues %12llu successful dequeues %12llu " + "end_dequeues %llu nr_ops %12llu\n", + argv[0], duration, nr_enqueuers, wdelay, + nr_dequeuers, rduration, tot_enqueues, tot_dequeues, + tot_successful_enqueues, + tot_successful_dequeues, end_dequeues, + tot_enqueues + tot_dequeues); + if (tot_successful_enqueues != tot_successful_dequeues + end_dequeues) + printf("WARNING! Discrepancy between nr succ. enqueues %llu vs " + "succ. 
dequeues + end dequeues %llu.\n", + tot_successful_enqueues, + tot_successful_dequeues + end_dequeues); + + free_all_cpu_call_rcu_data(); + free(count_enqueuer); + free(count_dequeuer); + free(tid_enqueuer); + free(tid_dequeuer); + return 0; +} -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 22 09:03:40 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 09:03:40 -0400 Subject: [lttng-dev] [PATCH urcu] lfstack: implement empty, pop_all and iterators, document API Message-ID: <20121022130340.GE30944@Krystal> We are changing the ABI by adding a mutex into struct cds_lfs_stack. This ABI has never been exposed in a release so far, so we can change it. Signed-off-by: Mathieu Desnoyers CC: Paul McKenney CC: Lai Jiangshan --- diff --git a/lfstack.c b/lfstack.c index 74ffd4f..db2c2cf 100644 --- a/lfstack.c +++ b/lfstack.c @@ -40,12 +40,42 @@ void cds_lfs_init(struct cds_lfs_stack *s) _cds_lfs_init(s); } -int cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node) +bool cds_lfs_empty(struct cds_lfs_stack *s) +{ + return _cds_lfs_empty(s); +} + +bool cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node) { return _cds_lfs_push(s, node); } -struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s) +struct cds_lfs_node *cds_lfs_pop_blocking(struct cds_lfs_stack *s) +{ + return _cds_lfs_pop_blocking(s); +} + +struct cds_lfs_head *cds_lfs_pop_all_blocking(struct cds_lfs_stack *s) +{ + return _cds_lfs_pop_all_blocking(s); +} + +void cds_lfs_pop_lock(struct cds_lfs_stack *s) +{ + _cds_lfs_pop_lock(s); +} + +void cds_lfs_pop_unlock(struct cds_lfs_stack *s) +{ + _cds_lfs_pop_unlock(s); +} + +struct cds_lfs_node *__cds_lfs_pop(struct cds_lfs_stack *s) +{ + return ___cds_lfs_pop(s); +} + +struct cds_lfs_head *__cds_lfs_pop_all(struct cds_lfs_stack *s) { - return _cds_lfs_pop(s); + return ___cds_lfs_pop_all(s); } diff --git 
a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c index e15c677..6c30884 100644 --- a/tests/test_urcu_lfs.c +++ b/tests/test_urcu_lfs.c @@ -236,7 +236,7 @@ void *thr_dequeuer(void *_count) struct cds_lfs_node *snode; rcu_read_lock(); - snode = cds_lfs_pop(&s); + snode = __cds_lfs_pop(&s); rcu_read_unlock(); if (snode) { struct test *node; @@ -268,7 +268,7 @@ void test_end(struct cds_lfs_stack *s, unsigned long long *nr_dequeues) struct cds_lfs_node *snode; do { - snode = cds_lfs_pop(s); + snode = __cds_lfs_pop(s); if (snode) { struct test *node; diff --git a/urcu/lfstack.h b/urcu/lfstack.h index d068739..eddff0e 100644 --- a/urcu/lfstack.h +++ b/urcu/lfstack.h @@ -27,12 +27,52 @@ extern "C" { #endif +#include +#include + +/* + * Lock-free stack. + * + * Stack implementing push, pop, pop_all operations, as well as iterator + * on the stack head returned by pop_all. + * + * Synchronization table: + * + * External synchronization techniques described in the API below is + * required between pairs marked with "X". No external synchronization + * required between pairs marked with "-". + * + * cds_lfs_push __cds_lfs_pop __cds_lfs_pop_all + * cds_lfs_push - - - + * __cds_lfs_pop - X X + * __cds_lfs_pop_all - X - + * + * cds_lfs_pop_blocking and cds_lfs_pop_all_blocking use an internal + * mutex to provide synchronization. + */ + +/* + * struct cds_lfs_node is returned by cds_lfs_pop, and also used as + * iterator on stack. It is not safe to dereference the node next + * pointer when returned by cds_lfs_pop. + */ struct cds_lfs_node { struct cds_lfs_node *next; }; +/* + * struct cds_lfs_head is returned by __cds_lfs_pop_all, and can be used + * to begin iteration on the stack. "node" needs to be the first field + * of cds_lfs_head, so the end-of-stack pointer value can be used for + * both types. 
+ */ +struct cds_lfs_head { + struct cds_lfs_node node; +}; + struct cds_lfs_stack { - struct cds_lfs_node *head; + struct cds_lfs_head *head; + pthread_mutex_t lock; }; #ifdef _LGPL_SOURCE @@ -41,15 +81,41 @@ struct cds_lfs_stack { #define cds_lfs_node_init _cds_lfs_node_init #define cds_lfs_init _cds_lfs_init +#define cds_lfs_empty _cds_lfs_empty #define cds_lfs_push _cds_lfs_push -#define cds_lfs_pop _cds_lfs_pop + +/* Locking performed internally */ +#define cds_lfs_pop_blocking _cds_lfs_pop_blocking +#define cds_lfs_pop_all_blocking _cds_lfs_pop_all_blocking + +/* Synchronize pop with internal mutex */ +#define cds_lfs_pop_lock _cds_lfs_pop_lock +#define cds_lfs_pop_unlock _cds_lfs_pop_unlock + +/* Synchronization ensured by the caller. See synchronization table. */ +#define __cds_lfs_pop ___cds_lfs_pop +#define __cds_lfs_pop_all ___cds_lfs_pop_all #else /* !_LGPL_SOURCE */ +/* + * cds_lfs_node_init: initialize lock-free stack node. + */ extern void cds_lfs_node_init(struct cds_lfs_node *node); + +/* + * cds_lfs_init: initialize lock-free stack. + */ extern void cds_lfs_init(struct cds_lfs_stack *s); /* + * cds_lfs_empty: return whether lock-free stack is empty. + * + * No memory barrier is issued. No mutual exclusion is required. + */ +extern bool cds_lfs_empty(struct cds_lfs_stack *s); + +/* * cds_lfs_push: push a node into the stack. * * Does not require any synchronization with other push nor pop. @@ -57,29 +123,102 @@ extern void cds_lfs_init(struct cds_lfs_stack *s); * Returns 0 if the stack was empty prior to adding the node. * Returns non-zero otherwise. */ -extern int cds_lfs_push(struct cds_lfs_stack *s, +extern bool cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node); /* - * cds_lfs_pop: pop a node from the stack. + * cds_lfs_pop_blocking: pop a node from the stack. + * + * Calls __cds_lfs_pop with an internal pop mutex held. 
+ */ +extern struct cds_lfs_node *cds_lfs_pop_blocking(struct cds_lfs_stack *s); + +/* + * cds_lfs_pop_all_blocking: pop all nodes from a stack. + * + * Calls __cds_lfs_pop_all with an internal pop mutex held. + */ +extern struct cds_lfs_head *cds_lfs_pop_all_blocking(struct cds_lfs_stack *s); + +/* + * cds_lfs_pop_lock: lock stack pop-protection mutex. + */ +extern void cds_lfs_pop_lock(struct cds_lfs_stack *s); + +/* + * cds_lfs_pop_unlock: unlock stack pop-protection mutex. + */ +extern void cds_lfs_pop_unlock(struct cds_lfs_stack *s); + +/* + * __cds_lfs_pop: pop a node from the stack. * * Returns NULL if stack is empty. * - * cds_lfs_pop needs to be synchronized using one of the following + * __cds_lfs_pop needs to be synchronized using one of the following * techniques: * - * 1) Calling cds_lfs_pop under rcu read lock critical section. The + * 1) Calling __cds_lfs_pop under rcu read lock critical section. The * caller must wait for a grace period to pass before freeing the * returned node or modifying the cds_lfs_node structure. - * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop - * callers. - * 3) Ensuring that only ONE thread can call cds_lfs_pop(). - * (multi-provider/single-consumer scheme). + * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop + * and __cds_lfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and + * __cds_lfs_pop_all(). (multi-provider/single-consumer scheme). */ -extern struct cds_lfs_node *cds_lfs_pop(struct cds_lfs_stack *s); +extern struct cds_lfs_node *__cds_lfs_pop(struct cds_lfs_stack *s); + +/* + * __cds_lfs_pop_all: pop all nodes from a stack. 
+ * + * __cds_lfs_pop_all does not require any synchronization with other + * push, nor with other __cds_lfs_pop_all, but requires synchronization + * matching the technique used to synchronize __cds_lfs_pop: + * + * 1) If __cds_lfs_pop is called under rcu read lock critical section, + * both __cds_lfs_pop and cds_lfs_pop_all callers must wait for a + * grace period to pass before freeing the returned node or modifying + * the cds_lfs_node structure. However, no RCU read-side critical + * section is needed around __cds_lfs_pop_all. + * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop and + * __cds_lfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and + * __cds_lfs_pop_all(). (multi-provider/single-consumer scheme). + */ +extern struct cds_lfs_head *__cds_lfs_pop_all(struct cds_lfs_stack *s); #endif /* !_LGPL_SOURCE */ +/* + * cds_lfs_for_each: Iterate over all nodes returned by + * __cds_lfs_pop_all. + * @__head: node returned by __cds_lfs_pop_all (struct cds_lfs_head pointer). + * @__node: node to use as iterator (struct cds_lfs_node pointer). + * + * Content written into each node before push is guaranteed to be + * consistent, but no other memory ordering is ensured. + */ +#define cds_lfs_for_each(__head, __node) \ + for (__node = &__head->node; \ + __node != NULL; \ + __node = __node->next) + +/* + * cds_lfs_for_each_safe: Iterate over all nodes returned by + * __cds_lfs_pop_all, safe against node deletion. + * @__head: node returned by __cds_lfs_pop_all (struct cds_lfs_head pointer). + * @__node: node to use as iterator (struct cds_lfs_node pointer). + * @__n: struct cds_lfs_node pointer holding the next pointer (used + * internally). + * + * Content written into each node before push is guaranteed to be + * consistent, but no other memory ordering is ensured. + */ +#define cds_lfs_for_each_safe(__head, __node, __n) \ + for (__node = &__head->node, __n = (__node ? 
__node->next : NULL); \ + __node != NULL; \ + __node = __n, __n = (__node ? __node->next : NULL)) + #ifdef __cplusplus } #endif diff --git a/urcu/static/lfstack.h b/urcu/static/lfstack.h index 7acbf54..77a26dc 100644 --- a/urcu/static/lfstack.h +++ b/urcu/static/lfstack.h @@ -26,6 +26,9 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#include +#include +#include #include #include @@ -33,15 +36,63 @@ extern "C" { #endif +/* + * Lock-free stack. + * + * Stack implementing push, pop, pop_all operations, as well as iterator + * on the stack head returned by pop_all. + * + * Synchronization table: + * + * External synchronization techniques described in the API below is + * required between pairs marked with "X". No external synchronization + * required between pairs marked with "-". + * + * cds_lfs_push __cds_lfs_pop __cds_lfs_pop_all + * cds_lfs_push - - - + * __cds_lfs_pop - X X + * __cds_lfs_pop_all - X - + * + * cds_lfs_pop_blocking and cds_lfs_pop_all_blocking use an internal + * mutex to provide synchronization. + */ + +/* + * cds_lfs_node_init: initialize lock-free stack node. + */ static inline void _cds_lfs_node_init(struct cds_lfs_node *node) { } +/* + * cds_lfs_init: initialize lock-free stack. + */ static inline void _cds_lfs_init(struct cds_lfs_stack *s) { + int ret; + s->head = NULL; + ret = pthread_mutex_init(&s->lock, NULL); + assert(!ret); +} + +static inline +bool ___cds_lfs_empty_head(struct cds_lfs_head *head) +{ + return head == NULL; +} + +/* + * cds_lfs_empty: return whether lock-free stack is empty. + * + * No memory barrier is issued. No mutual exclusion is required. + */ +static inline +bool _cds_lfs_empty(struct cds_lfs_stack *s) +{ + return ___cds_lfs_empty_head(CMM_LOAD_SHARED(s->head)); } /* @@ -74,76 +125,159 @@ void _cds_lfs_init(struct cds_lfs_stack *s) * Returns non-zero otherwise. 
*/ static inline -int _cds_lfs_push(struct cds_lfs_stack *s, +bool _cds_lfs_push(struct cds_lfs_stack *s, struct cds_lfs_node *node) { - struct cds_lfs_node *head = NULL; + struct cds_lfs_head *head = NULL; + struct cds_lfs_head *new_head = + caa_container_of(node, struct cds_lfs_head, node); for (;;) { - struct cds_lfs_node *old_head = head; + struct cds_lfs_head *old_head = head; /* * node->next is still private at this point, no need to * perform a _CMM_STORE_SHARED(). */ - node->next = head; + node->next = &head->node; /* * uatomic_cmpxchg() implicit memory barrier orders earlier * stores to node before publication. */ - head = uatomic_cmpxchg(&s->head, old_head, node); + head = uatomic_cmpxchg(&s->head, old_head, new_head); if (old_head == head) break; } - return (int) !!((unsigned long) head); + return !___cds_lfs_empty_head(head); } /* - * cds_lfs_pop: pop a node from the stack. + * __cds_lfs_pop: pop a node from the stack. * * Returns NULL if stack is empty. * - * cds_lfs_pop needs to be synchronized using one of the following + * __cds_lfs_pop needs to be synchronized using one of the following * techniques: * - * 1) Calling cds_lfs_pop under rcu read lock critical section. The + * 1) Calling __cds_lfs_pop under rcu read lock critical section. The * caller must wait for a grace period to pass before freeing the * returned node or modifying the cds_lfs_node structure. - * 2) Using mutual exclusion (e.g. mutexes) to protect cds_lfs_pop - * callers. - * 3) Ensuring that only ONE thread can call cds_lfs_pop(). - * (multi-provider/single-consumer scheme). + * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop + * and __cds_lfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and + * __cds_lfs_pop_all(). (multi-provider/single-consumer scheme).
*/ static inline -struct cds_lfs_node *_cds_lfs_pop(struct cds_lfs_stack *s) +struct cds_lfs_node *___cds_lfs_pop(struct cds_lfs_stack *s) { for (;;) { - struct cds_lfs_node *head; + struct cds_lfs_head *head, *next_head; + struct cds_lfs_node *next; head = _CMM_LOAD_SHARED(s->head); - if (head) { - struct cds_lfs_node *next; - - /* - * Read head before head->next. Matches the - * implicit memory barrier before - * uatomic_cmpxchg() in cds_lfs_push. - */ - cmm_smp_read_barrier_depends(); - next = _CMM_LOAD_SHARED(head->next); - if (uatomic_cmpxchg(&s->head, head, next) == head) { - return head; - } else { - /* Concurrent modification. Retry. */ - continue; - } - } else { - /* Empty stack */ - return NULL; - } + if (___cds_lfs_empty_head(head)) + return NULL; /* Empty stack */ + + /* + * Read head before head->next. Matches the implicit + * memory barrier before uatomic_cmpxchg() in + * cds_lfs_push. + */ + cmm_smp_read_barrier_depends(); + next = _CMM_LOAD_SHARED(head->node.next); + next_head = caa_container_of(next, + struct cds_lfs_head, node); + if (uatomic_cmpxchg(&s->head, head, next_head) == head) + return &head->node; + /* busy-loop if head changed under us */ } } +/* + * __cds_lfs_pop_all: pop all nodes from a stack. + * + * __cds_lfs_pop_all does not require any synchronization with other + * push, nor with other __cds_lfs_pop_all, but requires synchronization + * matching the technique used to synchronize __cds_lfs_pop: + * + * 1) If __cds_lfs_pop is called under rcu read lock critical section, + * both __cds_lfs_pop and cds_lfs_pop_all callers must wait for a + * grace period to pass before freeing the returned node or modifying + * the cds_lfs_node structure. However, no RCU read-side critical + * section is needed around __cds_lfs_pop_all. + * 2) Using mutual exclusion (e.g. mutexes) to protect __cds_lfs_pop and + * __cds_lfs_pop_all callers. + * 3) Ensuring that only ONE thread can call __cds_lfs_pop() and + * __cds_lfs_pop_all(). 
(multi-provider/single-consumer scheme). + */ +static inline +struct cds_lfs_head *___cds_lfs_pop_all(struct cds_lfs_stack *s) +{ + /* + * Implicit memory barrier after uatomic_xchg() matches implicit + * memory barrier before uatomic_cmpxchg() in cds_lfs_push. It + * ensures that all nodes of the returned list are consistent. + * There is no need to issue memory barriers when iterating on + * the returned list, because the full memory barrier issued + * prior to each uatomic_cmpxchg, which each write to head, are + * taking care to order writes to each node prior to the full + * memory barrier after this uatomic_xchg(). + */ + return uatomic_xchg(&s->head, NULL); +} + +/* + * cds_lfs_pop_lock: lock stack pop-protection mutex. + */ +static inline void _cds_lfs_pop_lock(struct cds_lfs_stack *s) +{ + int ret; + + ret = pthread_mutex_lock(&s->lock); + assert(!ret); +} + +/* + * cds_lfs_pop_unlock: unlock stack pop-protection mutex. + */ +static inline void _cds_lfs_pop_unlock(struct cds_lfs_stack *s) +{ + int ret; + + ret = pthread_mutex_unlock(&s->lock); + assert(!ret); +} + +/* + * Call __cds_lfs_pop with an internal pop mutex held. + */ +static inline +struct cds_lfs_node * +_cds_lfs_pop_blocking(struct cds_lfs_stack *s) +{ + struct cds_lfs_node *retnode; + + _cds_lfs_pop_lock(s); + retnode = ___cds_lfs_pop(s); + _cds_lfs_pop_unlock(s); + return retnode; +} + +/* + * Call __cds_lfs_pop_all with an internal pop mutex held. + */ +static inline +struct cds_lfs_head * +_cds_lfs_pop_all_blocking(struct cds_lfs_stack *s) +{ + struct cds_lfs_head *rethead; + + _cds_lfs_pop_lock(s); + rethead = ___cds_lfs_pop_all(s); + _cds_lfs_pop_unlock(s); + return rethead; +} + #ifdef __cplusplus } #endif -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
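The pop_all operation and the deletion-safe iterator from the patch above can be sketched the same way. Again this is a rough illustration with hypothetical demo_* names, using C11 atomic_exchange() in place of uatomic_xchg(); it is not the liburcu code, and it frees nodes immediately, which is only valid when no concurrent pop under RCU can still hold a reference to them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names for illustration; this is not the liburcu API. */
struct demo_node {
	struct demo_node *next;
};

struct demo_stack {
	_Atomic(struct demo_node *) head;
};

struct demo_item {
	struct demo_node list;	/* first member, so a cast recovers the item */
	int value;
};

/* Allocate an item and push it. Returns 0 on success, -1 on allocation failure. */
static int demo_push_value(struct demo_stack *s, int value)
{
	struct demo_item *item = malloc(sizeof(*item));
	struct demo_node *expected = NULL;

	if (!item)
		return -1;
	item->value = value;
	do {
		item->list.next = expected;
	} while (!atomic_compare_exchange_strong(&s->head, &expected,
			&item->list));
	return 0;
}

/* Detach the whole list in one atomic exchange; the stack becomes empty. */
static struct demo_node *demo_pop_all(struct demo_stack *s)
{
	return atomic_exchange(&s->head, NULL);
}

/* Load the next pointer before the loop body runs, so the body may free node. */
#define demo_for_each_safe(head, node, n)				\
	for ((node) = (head), (n) = ((node) ? (node)->next : NULL);	\
	     (node) != NULL;						\
	     (node) = (n), (n) = ((node) ? (node)->next : NULL))

/* Detach everything, sum the stored values and free each item. */
static int demo_drain_sum(struct demo_stack *s)
{
	struct demo_node *head = demo_pop_all(s);
	struct demo_node *node, *n;
	int sum = 0;

	demo_for_each_safe(head, node, n) {
		struct demo_item *item = (struct demo_item *) node;

		sum += item->value;
		free(item);
	}
	return sum;
}
```

Because the exchange hands the entire list to one caller, the detached nodes are private during iteration, which is why the patch needs no additional barriers or locking inside the for_each loops.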
http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 22 09:04:32 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 09:04:32 -0400 Subject: [lttng-dev] [PATCH urcu] lfstack: test pop_all and pop Message-ID: <20121022130432.GF30944@Krystal> Signed-off-by: Mathieu Desnoyers CC: Paul McKenney CC: Lai Jiangshan --- diff --git a/tests/test_urcu_lfs.c b/tests/test_urcu_lfs.c index 6c30884..c6403b1 100644 --- a/tests/test_urcu_lfs.c +++ b/tests/test_urcu_lfs.c @@ -68,6 +68,16 @@ static inline pid_t gettid(void) #include #include +/* + * External synchronization used. + */ +enum test_sync { + TEST_SYNC_NONE = 0, + TEST_SYNC_RCU, +}; + +static enum test_sync test_sync; + static volatile int test_go, test_stop; static unsigned long rduration; @@ -85,10 +95,12 @@ static inline void loop_sleep(unsigned long loops) static int verbose_mode; +static int test_pop, test_pop_all; + #define printf_verbose(fmt, args...) \ do { \ if (verbose_mode) \ - printf(fmt, args); \ + printf(fmt, ## args); \ } while (0) static unsigned int cpu_affinities[NR_CPUS]; @@ -216,6 +228,52 @@ void free_node_cb(struct rcu_head *head) free(node); } +static +void do_test_pop(enum test_sync sync) +{ + struct cds_lfs_node *snode; + + if (sync == TEST_SYNC_RCU) + rcu_read_lock(); + snode = __cds_lfs_pop(&s); + if (sync == TEST_SYNC_RCU) + rcu_read_unlock(); + if (snode) { + struct test *node; + + node = caa_container_of(snode, + struct test, list); + if (sync == TEST_SYNC_RCU) + call_rcu(&node->rcu, free_node_cb); + else + free(node); + URCU_TLS(nr_successful_dequeues)++; + } + URCU_TLS(nr_dequeues)++; +} + +static +void do_test_pop_all(enum test_sync sync) +{ + struct cds_lfs_node *snode; + struct cds_lfs_head *head; + struct cds_lfs_node *n; + + head = __cds_lfs_pop_all(&s); + cds_lfs_for_each_safe(head, snode, n) { + struct test *node; + + node = caa_container_of(snode, struct test, list); + if (sync == TEST_SYNC_RCU) + call_rcu(&node->rcu, 
free_node_cb); + else + free(node); + URCU_TLS(nr_successful_dequeues)++; + URCU_TLS(nr_dequeues)++; + } + +} + void *thr_dequeuer(void *_count) { unsigned long long *count = _count; @@ -232,20 +290,25 @@ void *thr_dequeuer(void *_count) } cmm_smp_mb(); - for (;;) { - struct cds_lfs_node *snode; - - rcu_read_lock(); - snode = __cds_lfs_pop(&s); - rcu_read_unlock(); - if (snode) { - struct test *node; + assert(test_pop || test_pop_all); - node = caa_container_of(snode, struct test, list); - call_rcu(&node->rcu, free_node_cb); - URCU_TLS(nr_successful_dequeues)++; + unsigned int counter = 0; + + for (;;) { + if (test_pop && test_pop_all) { + /* both pop and pop all */ + if (counter & 1) + do_test_pop(test_sync); + else + do_test_pop_all(test_sync); + counter++; + } else { + if (test_pop) + do_test_pop(test_sync); + else + do_test_pop_all(test_sync); } - URCU_TLS(nr_dequeues)++; + if (caa_unlikely(!test_duration_dequeue())) break; if (caa_unlikely(rduration)) @@ -286,6 +349,10 @@ void show_usage(int argc, char **argv) printf(" [-c duration] (dequeuer period (in loops))"); printf(" [-v] (verbose output)"); printf(" [-a cpu#] [-a cpu#]...
(affinity)"); + printf(" [-p] (test pop)"); + printf(" [-P] (test pop_all, enabled by default)"); + printf(" [-R] (use RCU external synchronization)"); + printf(" Note: default: no external synchronization used."); printf("\n"); } @@ -355,12 +422,29 @@ int main(int argc, char **argv) case 'v': verbose_mode = 1; break; + case 'p': + test_pop = 1; + break; + case 'P': + test_pop_all = 1; + break; + case 'R': + test_sync = TEST_SYNC_RCU; + break; } } + /* activate pop_all test by default */ + if (!test_pop && !test_pop_all) + test_pop_all = 1; + printf_verbose("running test for %lu seconds, %u enqueuers, " "%u dequeuers.\n", duration, nr_enqueuers, nr_dequeuers); + if (test_pop) + printf_verbose("pop test activated.\n"); + if (test_pop_all) + printf_verbose("pop_all test activated.\n"); printf_verbose("Writer delay : %lu loops.\n", rduration); printf_verbose("Reader duration : %lu loops.\n", wdelay); printf_verbose("thread %-6s, thread id : %lx, tid %lu\n", -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Mon Oct 22 11:17:49 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 22 Oct 2012 11:17:49 -0400 Subject: [lttng-dev] RFC : design notes for remote traces live reading In-Reply-To: <20121020225442.GA5020@Krystal> References: <50830BA3.6000508@efficios.com> <20121020225442.GA5020@Krystal> Message-ID: <5085639D.4030004@efficios.com> Hoy! Comments below. Mathieu Desnoyers: > * Julien Desfossez (jdesfossez at efficios.com) wrote: >> In order to achieve live reading of streamed traces, we need : >> - the index generation while tracing >> - index streaming >> - synchronization of streams >> - cooperating viewers >> >> This RFC addresses each of these points with the anticipated design, >> implementation is on its way, so quick feedbacks greatly appreciated ! >> >> * Index generation >> The index associates a trace packet with an offset inside the tracefile. 
>> While tracing, when a packet is ready to be written, we can ask the ring >> buffer to provide us the information required to produce the index >> (data_offset, > > Is data_offset just the header size ? Do we really want that in the > on-disk index ? It can be easily computed from the metadata, I'm not > sure we want to duplicate this information. > >> packet_size, content_size, timestamp_begin, timestamp_end, >> events_discarded, events_discarded_len, > > events_discarded_len is also known from metadata. > >> stream_id). > > Maybe you could detail the exact layout of an element in the index as a > packed C structure and provide it in the next round of this RFC so we > know exactly which types and what contend you plan. I agree with that. Furthermore, part of this information might probably end up in the data header which poses problem for backward compatibility (relayd 2.2 --> sessiond 2.1 or relayd 2.1 --> sessiond 2.2). This is another issue entirely but having the clear memory layout will help a lot for future RFCs. > >> >> * Index streaming >> The index is mandatory for live reading since we use it for the streams >> synchronization. We absolutely need to receive the index, so we send it >> on the control port (TCP-only), but most of the information related to >> the index is only relevant if we receive the associated data packet. So >> the proposed protocol is the following : >> - with each data packet, send the data_offset, packet_size, content_size > > what is data_offset ? > >> (all uint64_t) along with the already in place information (stream id >> and sequence number) >> - after sending a data packet, the consumer sends on the control port a >> new message (RELAYD_SEND_INDEX) with timestamp_begin, timestamp_end, >> events_discarded, events_discarded_len, stream_id, the sequence number, > > do we need events_discarded_len ? 
> >> (all uint64_t), and the relayd stream id of the tracefile >> - when the relay receives a data packet it looks if it already received >> an index corresponding to this stream and sequence number, if yes it >> completes the index structure and writes the index on disk, otherwise it >> creates an index structure in memory with the information it can fill >> and stores it in a hash table waiting for the corresponding index packet >> to arrive >> - the same concept applies when the relay receives an index packet. > > Yep. We could possibly describe this as a 2-way merge point between data > and index, performed through lookups (by what key ?) in a hash table. > >> >> This two-part remote index generation allows us to determine if we lost >> packets because of the network, limit the number of bytes sent on the >> control port and make sure we still have an index for each packet with >> its timestamps and the number of events lost so the viewer knows if we >> lost events because of the tracer or the network. >> >> Design question : since the lookup is always based on two factors >> (relayd stream_id and sequence number), do we want to create a hash >> table for each stream on the relay ? > > Nope. A single hash table can be used. The hash function takes both > stream ID and seq num (e.g. with a xor), and the compare function > compares with both. Hmm... A bit worried about collision here ... since stream ID can be equal to a seq num so we have this problem: (stream:seq_num) 4:5 and 5:4. Anyhow, the operation using the stream ID and seq num should produce a different output for the above case. > >> We have to consider that at some point, we might have to reorder trace >> packets (when we support UDP) before writing them to disk, so we will >> need a similar structure to temporarily store out-of-order packets. 
> > I don't think it will be necessary for UDP: UDP datagrams, AFAIK, arrive > ordered at the receiver application, even if they are made of many > actual IP packets. Basically, we can simply send each entire trace > packet as one single UDP datagram. Agreed, but that can add a bit of complexity on the session daemon side, which has to extract subbuffers and cut them into UDP datagrams, especially if the buffer size changes (it is set by the user). For instance, if the subbuffer size is 256k and a UDP datagram is 65k, we will have to split the subbuffers, queue part of them and probably add padding as well. We might want to consider both possibilities: doing that based on UDP datagrams, or what Julien is proposing. > >> Also the hash table storing the indexes needs an expiration mechanism >> (based on timing or number of packets). > > Upon addition into the hash table, we could use a separate data > structure to keep track of expiration timers. When an entry is removed > from the hash table, we remove its associated timer entry. It does not > need to sit in the same data structure. Maybe a linked list, or maybe a > red black tree, would be more appropriate to keep track of these > expiration times. A periodical timer could perform the discard of > packets when they reach their timeout. Does this mean that, for each packet received, we'll have to drop the expired nodes from whatever data structure holds them? > >> >> * Synchronization of streams >> Already discussed in an earlier RFC, summary : >> - at a predefined rate, the consumer sends a synchronization packet that >> contains the last sequence number that can be safely read by the viewer >> for each stream of the session, it happens as soon as possible when all >> streams are generating data, and also time-based to cover the case with >> streams not generating any data. > > Note: if the consumer has not sent any data whatsoever (on any stream) > since the last synchronization beacon, it can skip sending the next > beacon.
This is a nice power consumption optimisation. > >> - the relay receives this packet, ensures all data packets and indexes >> are commited on disk (and sync'ed) and updates the synchronization with >> the viewers (discussed just below) >> >> * Cooperating viewers >> The viewers need to be aware that they are reading streamed data and >> play nicely with the synchronization algorithms in place. The proposed >> approach is using fcntl(2) "Advisory locking" to lock specific portions >> of the tracefiles. The viewers will have to test and make sure they are >> respecting the locks when they are switching packets. >> So in summary : >> - when the relay is ready to let the viewers access the data, it adds a >> new write lock on the region that cannot be safely read and removes the >> previous one >> - when a viewer needs to switch packet, it tests for the presence of a >> lock on the region of the file it needs to access, if there is no lock >> it can safely read the data, otherwise it blocks until the lock is removed. >> - when a data packet is lost on the network, an index is written, but >> the offset in the tracefile is set to an invalid value (-1) so the >> reader knows the data was lost in transit. >> - the viewers need also to be adapted to read on-disk indexes, support >> metadata updates, respect the locking. > > How do you expect to deal with streams coming during tracing ? How is > the viewer expected to be told a new stream needs to be read, and how > is the file creation / advisory locking vs file open (read) / advisory > locking expected to be handled ? > >> >> Not addressed here but mandatory : the metadata must be completely >> streamed before streaming trace data that correspond to this new metadata. > > Yes. We might want to think a little more about what happens when we > stream partially complete metadata that cuts it somewhere where it > cannot be parsed.. ? 
From what I've seen so far, there are times when the metadata is sent *only* once the stop command is issued: stop performs a buffer flush, and the trace throughput was so low that the buffers never filled up on their own. Considering this *strong* requirement that the metadata needs to be streamed completely, can we think of a ustctl/kernctl that forces the metadata extraction? This would be especially useful for new metadata added during tracing! (Not sure how we can deal with that in the session daemon, since it has no way of knowing; the tracer does know, though, so maybe it could wake up the stream fd whenever metadata is available?) Cheers! David > > Thanks! > > Mathieu > >> >> Feedbacks, questions and improvement ideas welcome ! >> >> Thanks, >> >> Julien > From mathieu.desnoyers at efficios.com Mon Oct 22 11:38:02 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 11:38:02 -0400 Subject: [lttng-dev] RFC : design notes for remote traces live reading In-Reply-To: <5085639D.4030004@efficios.com> References: <50830BA3.6000508@efficios.com> <20121020225442.GA5020@Krystal> <5085639D.4030004@efficios.com> Message-ID: <20121022153802.GA3432@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > Hoy! > > Comments below. > > Mathieu Desnoyers: > > * Julien Desfossez (jdesfossez at efficios.com) wrote: > >> In order to achieve live reading of streamed traces, we need : > >> - the index generation while tracing > >> - index streaming > >> - synchronization of streams > >> - cooperating viewers > >> > >> This RFC addresses each of these points with the anticipated design, > >> implementation is on its way, so quick feedbacks greatly appreciated ! > >> > >> * Index generation > >> The index associates a trace packet with an offset inside the tracefile.
> >> While tracing, when a packet is ready to be written, we can ask the ring > >> buffer to provide us the information required to produce the index > >> (data_offset, > > > > Is data_offset just the header size ? Do we really want that in the > > on-disk index ? It can be easily computed from the metadata, I'm not > > sure we want to duplicate this information. > > > >> packet_size, content_size, timestamp_begin, timestamp_end, > >> events_discarded, events_discarded_len, > > > > events_discarded_len is also known from metadata. > > > >> stream_id). > > > > Maybe you could detail the exact layout of an element in the index as a > > packed C structure and provide it in the next round of this RFC so we > > know exactly which types and what contend you plan. > > I agree with that. Furthermore, part of this information might probably > end up in the data header which poses problem for backward compatibility > (relayd 2.2 --> sessiond 2.1 or relayd 2.1 --> sessiond 2.2). This is > another issue entirely but having the clear memory layout will help a > lot for future RFCs. > > > > >> > >> * Index streaming > >> The index is mandatory for live reading since we use it for the streams > >> synchronization. We absolutely need to receive the index, so we send it > >> on the control port (TCP-only), but most of the information related to > >> the index is only relevant if we receive the associated data packet. So > >> the proposed protocol is the following : > >> - with each data packet, send the data_offset, packet_size, content_size > > > > what is data_offset ? > > > >> (all uint64_t) along with the already in place information (stream id > >> and sequence number) > >> - after sending a data packet, the consumer sends on the control port a > >> new message (RELAYD_SEND_INDEX) with timestamp_begin, timestamp_end, > >> events_discarded, events_discarded_len, stream_id, the sequence number, > > > > do we need events_discarded_len ? 
> > > >> (all uint64_t), and the relayd stream id of the tracefile > >> - when the relay receives a data packet it looks if it already received > >> an index corresponding to this stream and sequence number, if yes it > >> completes the index structure and writes the index on disk, otherwise it > >> creates an index structure in memory with the information it can fill > >> and stores it in a hash table waiting for the corresponding index packet > >> to arrive > >> - the same concept applies when the relay receives an index packet. > > > > Yep. We could possibly describe this as a 2-way merge point between data > > and index, performed through lookups (by what key ?) in a hash table. > > > >> > >> This two-part remote index generation allows us to determine if we lost > >> packets because of the network, limit the number of bytes sent on the > >> control port and make sure we still have an index for each packet with > >> its timestamps and the number of events lost so the viewer knows if we > >> lost events because of the tracer or the network. > >> > >> Design question : since the lookup is always based on two factors > >> (relayd stream_id and sequence number), do we want to create a hash > >> table for each stream on the relay ? > > > > Nope. A single hash table can be used. The hash function takes both > > stream ID and seq num (e.g. with a xor), and the compare function > > compares with both. > > Hmm... A bit worried about collision here ... since stream ID can be > equal to a seq num so we have this problem: (stream:seq_num) 4:5 and 5:4. > > Anyhow, the operation using the stream ID and seq num should produce a > different output for the above case. A hash function is allowed to collide once in a while, and that's fine: we then disambiguate the collision using the compare function.
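To illustrate the point about collisions, here is a minimal sketch of a single table keyed on both values. It is not relayd code: `index_key`, `index_hash`, `index_match`, the bucket count and the chained-bucket layout are all assumptions made for the example. With the XOR hash, 4:5 and 5:4 do land in the same bucket, and the compare function is what tells them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 64	/* hypothetical table size */

struct index_key {
	uint64_t stream_id;
	uint64_t seq_num;
};

struct index_node {
	struct index_key key;
	struct index_node *next;	/* bucket chain */
};

struct index_table {
	struct index_node *buckets[NR_BUCKETS];
};

/* XOR-based hash as suggested in the thread: cheap, but symmetric. */
static unsigned int index_hash(const struct index_key *k)
{
	return (unsigned int) ((k->stream_id ^ k->seq_num) % NR_BUCKETS);
}

/* Compare function: both fields must match, so 4:5 never equals 5:4. */
static int index_match(const struct index_key *a, const struct index_key *b)
{
	return a->stream_id == b->stream_id && a->seq_num == b->seq_num;
}

/* Insert at the head of the bucket chain selected by the hash. */
static void index_add(struct index_table *t, struct index_node *n)
{
	unsigned int b;

	assert(n != NULL);
	b = index_hash(&n->key);
	n->next = t->buckets[b];
	t->buckets[b] = n;
}

/* Walk one bucket and let the compare function pick the right node. */
static struct index_node *index_lookup(struct index_table *t,
		uint64_t stream_id, uint64_t seq_num)
{
	struct index_key k = { stream_id, seq_num };
	struct index_node *n;

	for (n = t->buckets[index_hash(&k)]; n; n = n->next) {
		if (index_match(&n->key, &k))
			return n;
	}
	return NULL;
}
```

After adding nodes for 4:5 and 5:4, `index_lookup(&t, 4, 5)` and `index_lookup(&t, 5, 4)` return two distinct nodes even though they share a bucket. A hash that mixes the two values asymmetrically would spread such pairs over different buckets, but correctness does not depend on it.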
> > > > >> We have to consider that at some point, we might have to reorder trace > >> packets (when we support UDP) before writing them to disk, so we will > >> need a similar structure to temporarily store out-of-order packets. > > > > I don't think it will be necessary for UDP: UDP datagrams, AFAIK, arrive > > ordered at the receiver application, even if they are made of many > > actual IP packets. Basically, we can simply send each entire trace > > packet as one single UDP datagram. > > Agreed but can add a bit of complexity on the session daemon side to > extract subbuffers and cut them in a UDP datagrams especially if the > buffer size changes (set by the user). > > For instance, the subbuffers size is 256k and UDP datagram is 65k, well > we will have to truncate the subbufers, queue part of it and probably > add padding as well. > > We might want to consider both possibilities were we do that based on > UDP datagrams or what Julien is proposing. Random question: can we do a sendmsg/recvmsg of 1MB over UDP ? Would the kernel deliver the 1MB udp packet ? > > > > >> Also the hash table storing the indexes needs an expiration mechanism > >> (based on timing or number of packets). > > > > Upon addition into the hash table, we could use a separate data > > structure to keep track of expiration timers. When an entry is removed > > from the hash table, we remove its associated timer entry. It does not > > need to sit in the same data structure. Maybe a linked list, or maybe a > > red black tree, would be more appropriate to keep track of these > > expiration times. A periodical timer could perform the discard of > > packets when they reach their timeout. > > This means that at each packet received, we'll have to just drop nodes > from whatever data structure that have expired? No. Expiration checking can be done with a timer. When we receive packets, this is yet another trigger that can let us remove stuff from the expiration queue. 
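The expiration scheme described above could look roughly like the following. This is only a sketch under stated assumptions: a single fixed timeout (so arrival order is also deadline order and a plain linked list is enough), hypothetical names (`expiry_queue`, `expiry_enqueue`, `expiry_remove`, `expiry_sweep`), and the current time passed in as a plain integer rather than read from a real timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One half-complete index waiting for its other half (hypothetical). */
struct pending_index {
	uint64_t stream_id, seq_num;
	uint64_t deadline;		/* absolute time, arbitrary units */
	struct pending_index *next;
};

struct expiry_queue {
	struct pending_index *head, *tail;
};

/* Called when an entry is added to the hash table. */
static void expiry_enqueue(struct expiry_queue *q, struct pending_index *e,
		uint64_t now, uint64_t timeout)
{
	assert(e != NULL);
	e->deadline = now + timeout;
	e->next = NULL;
	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
}

/*
 * Called when an entry leaves the hash table because its index is
 * complete. O(n) here; a doubly-linked list or a tree would avoid the scan.
 */
static int expiry_remove(struct expiry_queue *q, struct pending_index *e)
{
	struct pending_index **pp = &q->head, *prev = NULL;

	while (*pp) {
		if (*pp == e) {
			*pp = e->next;
			if (q->tail == e)
				q->tail = prev;
			e->next = NULL;
			return 1;
		}
		prev = *pp;
		pp = &prev->next;
	}
	return 0;
}

/*
 * Called from the periodic timer, and optionally from the packet receive
 * path. Returns the number of entries whose deadline has passed.
 */
static unsigned int expiry_sweep(struct expiry_queue *q, uint64_t now)
{
	unsigned int nr = 0;

	while (q->head && q->head->deadline <= now) {
		struct pending_index *e = q->head;

		q->head = e->next;
		if (!q->head)
			q->tail = NULL;
		e->next = NULL;
		/* A real relay would also drop e from the hash table here. */
		nr++;
	}
	return nr;
}
```

Because the queue is ordered by deadline, the sweep stops at the first entry that has not expired yet, so the cost of each timer tick is proportional to the number of entries actually discarded.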
> > > > >> > >> * Synchronization of streams > >> Already discussed in an earlier RFC, summary : > >> - at a predefined rate, the consumer sends a synchronization packet that > >> contains the last sequence number that can be safely read by the viewer > >> for each stream of the session, it happens as soon as possible when all > >> streams are generating data, and also time-based to cover the case with > >> streams not generating any data. > > > > Note: if the consumer has not sent any data whatsoever (on any stream) > > since the last synchronization beacon, it can skip sending the next > > beacon. This is a nice power consumption optimisation. > > > >> - the relay receives this packet, ensures all data packets and indexes > >> are commited on disk (and sync'ed) and updates the synchronization with > >> the viewers (discussed just below) > >> > >> * Cooperating viewers > >> The viewers need to be aware that they are reading streamed data and > >> play nicely with the synchronization algorithms in place. The proposed > >> approach is using fcntl(2) "Advisory locking" to lock specific portions > >> of the tracefiles. The viewers will have to test and make sure they are > >> respecting the locks when they are switching packets. > >> So in summary : > >> - when the relay is ready to let the viewers access the data, it adds a > >> new write lock on the region that cannot be safely read and removes the > >> previous one > >> - when a viewer needs to switch packet, it tests for the presence of a > >> lock on the region of the file it needs to access, if there is no lock > >> it can safely read the data, otherwise it blocks until the lock is removed. > >> - when a data packet is lost on the network, an index is written, but > >> the offset in the tracefile is set to an invalid value (-1) so the > >> reader knows the data was lost in transit. > >> - the viewers need also to be adapted to read on-disk indexes, support > >> metadata updates, respect the locking. 
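For reference, the locking scheme summarized above might be sketched with POSIX record locks as follows. The function names are hypothetical, and the relay/viewer split is simulated with fork() for the sake of a self-contained example; only fcntl(2) with F_SETLK and F_GETLK is assumed. As a simplification, the relay locks from a given offset to end of file once, then moves the boundary forward by unlocking the prefix that became safe to read.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill a struct flock describing [start, start + len); len 0 means "to EOF". */
static struct flock make_lock(short type, off_t start, off_t len)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;
	return fl;
}

/* Relay: write-lock everything from 'start' on (not yet safe to read). */
static int relay_lock_from(int fd, off_t start)
{
	struct flock fl = make_lock(F_WRLCK, start, 0);

	return fcntl(fd, F_SETLK, &fl);
}

/* Relay: [old_start, new_start) is now on disk, release that prefix. */
static int relay_publish(int fd, off_t old_start, off_t new_start)
{
	struct flock fl;

	assert(new_start > old_start);	/* len 0 would unlock up to EOF */
	fl = make_lock(F_UNLCK, old_start, new_start - old_start);
	return fcntl(fd, F_SETLK, &fl);
}

/* Viewer: 1 if another process locks the region, 0 if free, -1 on error. */
static int viewer_is_locked(int fd, off_t start, off_t len)
{
	struct flock fl = make_lock(F_RDLCK, start, len);

	if (fcntl(fd, F_GETLK, &fl) < 0)
		return -1;
	return fl.l_type != F_UNLCK;
}

/*
 * Demo helper: record locks belong to a process, so the probe must run in
 * another process to see the relay's lock. Returns 0 (free), 1 (locked),
 * 2 (probe error in the child) or -1 (fork/wait failure).
 */
static int viewer_probe_in_child(int fd, off_t start, off_t len)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		int r = viewer_is_locked(fd, start, len);

		_exit(r < 0 ? 2 : r);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A viewer that wants to block rather than poll could request a read lock on the region with F_SETLKW and release it right away. Note that the locks are advisory: a viewer that never calls fcntl() is not stopped from reading the region.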
> > > > How do you expect to deal with streams coming during tracing ? How is > > the viewer expected to be told a new stream needs to be read, and how > > is the file creation / advisory locking vs file open (read) / advisory > > locking expected to be handled ? > > > >> > >> Not addressed here but mandatory : the metadata must be completely > >> streamed before streaming trace data that correspond to this new metadata. > > > > Yes. We might want to think a little more about what happens when we > > stream partially complete metadata that cuts it somewhere where it > > cannot be parsed.. ? > > Of what I've experienced so far, there are times where the metadata is > simply sent *only* when the stop command is done which uses a flush > buffer operation and, since the trace throughput was so low so buffers > don't get filled up. > > Considering this *strong* requirement that the metadata needs to be > streamed completely, can we think of a ustctl/kernctl that forces the > metadata extraction? Not sure what you mean. We can simply flush the metadata buffer. Or we could decide to change the way we grab metadata altogether so it becomes more synchronous with the application. However, this might be an issue with application crash dump. Thanks, Mathieu > > And this will be especially useful for new metadata added during > tracing! (Not sure how we can deal with that on the session daemon since > we have no idea but the tracer knows so maybe it could wake up the > stream fd whenever there is metadata available?). > > Cheers! > David > > > > > Thanks! > > > > Mathieu > > > >> > >> Feedbacks, questions and improvement ideas welcome ! > >> > >> Thanks, > >> > >> Julien > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From dgoulet at efficios.com Mon Oct 22 11:49:14 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 22 Oct 2012 11:49:14 -0400 Subject: [lttng-dev] RFC : design notes for remote traces live reading In-Reply-To: <20121022153802.GA3432@Krystal> References: <50830BA3.6000508@efficios.com> <20121020225442.GA5020@Krystal> <5085639D.4030004@efficios.com> <20121022153802.GA3432@Krystal> Message-ID: <50856AFA.4040706@efficios.com> Mathieu Desnoyers: > * David Goulet (dgoulet at efficios.com) wrote: >> Hoy! >> >> Comments below. >> >> Mathieu Desnoyers: >>> * Julien Desfossez (jdesfossez at efficios.com) wrote: >>>> In order to achieve live reading of streamed traces, we need : >>>> - the index generation while tracing >>>> - index streaming >>>> - synchronization of streams >>>> - cooperating viewers >>>> >>>> This RFC addresses each of these points with the anticipated design, >>>> implementation is on its way, so quick feedbacks greatly appreciated ! >>>> >>>> * Index generation >>>> The index associates a trace packet with an offset inside the tracefile. >>>> While tracing, when a packet is ready to be written, we can ask the ring >>>> buffer to provide us the information required to produce the index >>>> (data_offset, >>> >>> Is data_offset just the header size ? Do we really want that in the >>> on-disk index ? It can be easily computed from the metadata, I'm not >>> sure we want to duplicate this information. >>> >>>> packet_size, content_size, timestamp_begin, timestamp_end, >>>> events_discarded, events_discarded_len, >>> >>> events_discarded_len is also known from metadata. >>> >>>> stream_id). >>> >>> Maybe you could detail the exact layout of an element in the index as a >>> packed C structure and provide it in the next round of this RFC so we >>> know exactly which types and what contend you plan. >> >> I agree with that. 
Furthermore, part of this information might probably >> end up in the data header which poses problem for backward compatibility >> (relayd 2.2 --> sessiond 2.1 or relayd 2.1 --> sessiond 2.2). This is >> another issue entirely but having the clear memory layout will help a >> lot for future RFCs. >> >>> >>>> >>>> * Index streaming >>>> The index is mandatory for live reading since we use it for the streams >>>> synchronization. We absolutely need to receive the index, so we send it >>>> on the control port (TCP-only), but most of the information related to >>>> the index is only relevant if we receive the associated data packet. So >>>> the proposed protocol is the following : >>>> - with each data packet, send the data_offset, packet_size, content_size >>> >>> what is data_offset ? >>> >>>> (all uint64_t) along with the already in place information (stream id >>>> and sequence number) >>>> - after sending a data packet, the consumer sends on the control port a >>>> new message (RELAYD_SEND_INDEX) with timestamp_begin, timestamp_end, >>>> events_discarded, events_discarded_len, stream_id, the sequence number, >>> >>> do we need events_discarded_len ? >>> >>>> (all uint64_t), and the relayd stream id of the tracefile >>>> - when the relay receives a data packet it looks if it already received >>>> an index corresponding to this stream and sequence number, if yes it >>>> completes the index structure and writes the index on disk, otherwise it >>>> creates an index structure in memory with the information it can fill >>>> and stores it in a hash table waiting for the corresponding index packet >>>> to arrive >>>> - the same concept applies when the relay receives an index packet. >>> >>> Yep. We could possibly describe this as a 2-way merge point between data >>> and index, performed through lookups (by what key ?) in a hash table. 
>>> >>>> >>>> This two-part remote index generation allows us to determine if we lost >>>> packets because of the network, limit the number of bytes sent on the >>>> control port and make sure we still have an index for each packet with >>>> its timestamps and the number of events lost so the viewer knows if we >>>> lost events because of the tracer or the network. >>>> >>>> Design question : since the lookup is always based on two factors >>>> (relayd stream_id and sequence number), do we want to create a hash >>>> table for each stream on the relay ? >>> >>> Nope. A single hash table can be used. The hash function takes both >>> stream ID and seq num (e.g. with a xor), and the compare function >>> compares with both. >> >> Hmm... A bit worried about collision here ... since stream ID can be >> equal to a seq num so we have this problem: (stream:seq_num) 4:5 and 5:4. >> >> Anyhow, the operation using the stream ID and seq num should produce a >> different output for the above case. > > A hash function can have seldom collisions, that's fine. We then > disambiguate the collision using the compare function. Indeed. Let's keep in mind, though, that collisions can occur in the data structure holding this information. > >> >>> >>>> We have to consider that at some point, we might have to reorder trace >>>> packets (when we support UDP) before writing them to disk, so we will >>>> need a similar structure to temporarily store out-of-order packets. >>> >>> I don't think it will be necessary for UDP: UDP datagrams, AFAIK, arrive >>> ordered at the receiver application, even if they are made of many >>> actual IP packets. Basically, we can simply send each entire trace >>> packet as one single UDP datagram. >> >> Agreed but can add a bit of complexity on the session daemon side to >> extract subbuffers and cut them in a UDP datagrams especially if the >> buffer size changes (set by the user).
>> >> For instance, the subbuffers size is 256k and UDP datagram is 65k, well >> we will have to truncate the subbufers, queue part of it and probably >> add padding as well. >> >> We might want to consider both possibilities were we do that based on >> UDP datagrams or what Julien is proposing. > > Random question: can we do a sendmsg/recvmsg of 1MB over UDP ? Would the > kernel deliver the 1MB udp packet ? > >> >>> >>>> Also the hash table storing the indexes needs an expiration mechanism >>>> (based on timing or number of packets). >>> >>> Upon addition into the hash table, we could use a separate data >>> structure to keep track of expiration timers. When an entry is removed >>> from the hash table, we remove its associated timer entry. It does not >>> need to sit in the same data structure. Maybe a linked list, or maybe a >>> red black tree, would be more appropriate to keep track of these >>> expiration times. A periodical timer could perform the discard of >>> packets when they reach their timeout. >> >> This means that at each packet received, we'll have to just drop nodes >> from whatever data structure that have expired? > > No. Expiration checking can be done with a timer. When we receive > packets, this is yet another trigger that can let us remove stuff from > the expiration queue. Right. My bad, my brain apparently skipped: "A periodical timer could perform the discard of packets when they reach their timeout." :) > >> >>> >>>> >>>> * Synchronization of streams >>>> Already discussed in an earlier RFC, summary : >>>> - at a predefined rate, the consumer sends a synchronization packet that >>>> contains the last sequence number that can be safely read by the viewer >>>> for each stream of the session, it happens as soon as possible when all >>>> streams are generating data, and also time-based to cover the case with >>>> streams not generating any data. 
>>> >>> Note: if the consumer has not sent any data whatsoever (on any stream) >>> since the last synchronization beacon, it can skip sending the next >>> beacon. This is a nice power consumption optimisation. >>> >>>> - the relay receives this packet, ensures all data packets and indexes >>>> are commited on disk (and sync'ed) and updates the synchronization with >>>> the viewers (discussed just below) >>>> >>>> * Cooperating viewers >>>> The viewers need to be aware that they are reading streamed data and >>>> play nicely with the synchronization algorithms in place. The proposed >>>> approach is using fcntl(2) "Advisory locking" to lock specific portions >>>> of the tracefiles. The viewers will have to test and make sure they are >>>> respecting the locks when they are switching packets. >>>> So in summary : >>>> - when the relay is ready to let the viewers access the data, it adds a >>>> new write lock on the region that cannot be safely read and removes the >>>> previous one >>>> - when a viewer needs to switch packet, it tests for the presence of a >>>> lock on the region of the file it needs to access, if there is no lock >>>> it can safely read the data, otherwise it blocks until the lock is removed. >>>> - when a data packet is lost on the network, an index is written, but >>>> the offset in the tracefile is set to an invalid value (-1) so the >>>> reader knows the data was lost in transit. >>>> - the viewers need also to be adapted to read on-disk indexes, support >>>> metadata updates, respect the locking. >>> >>> How do you expect to deal with streams coming during tracing ? How is >>> the viewer expected to be told a new stream needs to be read, and how >>> is the file creation / advisory locking vs file open (read) / advisory >>> locking expected to be handled ? >>> >>>> >>>> Not addressed here but mandatory : the metadata must be completely >>>> streamed before streaming trace data that correspond to this new metadata. >>> >>> Yes. 
We might want to think a little more about what happens when we >>> stream partially complete metadata that cuts it somewhere where it >>> cannot be parsed.. ? >> >> Of what I've experienced so far, there are times where the metadata is >> simply sent *only* when the stop command is done which uses a flush >> buffer operation and, since the trace throughput was so low so buffers >> don't get filled up. >> >> Considering this *strong* requirement that the metadata needs to be >> streamed completely, can we think of a ustctl/kernctl that forces the >> metadata extraction? > > Not sure what you mean. We can simply flush the metadata buffer. Or we > could decide to change the way we grab metadata altogether so it becomes > more synchronous with the application. However, this might be an issue > with application crash dump. Flush buffer can do the trick indeed for most of the use cases. But what happens here if new metadata comes in (after start tracing) and the app is very low throughput ? Don't we need the tracer to immediately notify the consumer that there is new metadata available? Thanks! David > > Thanks, > > Mathieu > >> >> And this will be especially useful for new metadata added during >> tracing! (Not sure how we can deal with that on the session daemon since >> we have no idea but the tracer knows so maybe it could wake up the >> stream fd whenever there is metadata available?). >> >> Cheers! >> David >> >>> >>> Thanks! >>> >>> Mathieu >>> >>>> >>>> Feedbacks, questions and improvement ideas welcome ! 
>>>> >>>> Thanks, >>>> >>>> Julien >>> > From mathieu.desnoyers at efficios.com Mon Oct 22 11:57:39 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 11:57:39 -0400 Subject: [lttng-dev] RFC : design notes for remote traces live reading In-Reply-To: <50856AFA.4040706@efficios.com> References: <50830BA3.6000508@efficios.com> <20121020225442.GA5020@Krystal> <5085639D.4030004@efficios.com> <20121022153802.GA3432@Krystal> <50856AFA.4040706@efficios.com> Message-ID: <20121022155739.GA3205@Krystal> * David Goulet (dgoulet at efficios.com) wrote: [...] > > > >> > >>> > >>>> > >>>> * Synchronization of streams > >>>> Already discussed in an earlier RFC, summary : > >>>> - at a predefined rate, the consumer sends a synchronization packet that > >>>> contains the last sequence number that can be safely read by the viewer > >>>> for each stream of the session, it happens as soon as possible when all > >>>> streams are generating data, and also time-based to cover the case with > >>>> streams not generating any data. > >>> > >>> Note: if the consumer has not sent any data whatsoever (on any stream) > >>> since the last synchronization beacon, it can skip sending the next > >>> beacon. This is a nice power consumption optimisation. > >>> > >>>> - the relay receives this packet, ensures all data packets and indexes > >>>> are commited on disk (and sync'ed) and updates the synchronization with > >>>> the viewers (discussed just below) > >>>> > >>>> * Cooperating viewers > >>>> The viewers need to be aware that they are reading streamed data and > >>>> play nicely with the synchronization algorithms in place. The proposed > >>>> approach is using fcntl(2) "Advisory locking" to lock specific portions > >>>> of the tracefiles. The viewers will have to test and make sure they are > >>>> respecting the locks when they are switching packets. 
> >>>> So in summary : > >>>> - when the relay is ready to let the viewers access the data, it adds a > >>>> new write lock on the region that cannot be safely read and removes the > >>>> previous one > >>>> - when a viewer needs to switch packet, it tests for the presence of a > >>>> lock on the region of the file it needs to access, if there is no lock > >>>> it can safely read the data, otherwise it blocks until the lock is removed. > >>>> - when a data packet is lost on the network, an index is written, but > >>>> the offset in the tracefile is set to an invalid value (-1) so the > >>>> reader knows the data was lost in transit. > >>>> - the viewers need also to be adapted to read on-disk indexes, support > >>>> metadata updates, respect the locking. > >>> > >>> How do you expect to deal with streams coming during tracing ? How is > >>> the viewer expected to be told a new stream needs to be read, and how > >>> is the file creation / advisory locking vs file open (read) / advisory > >>> locking expected to be handled ? > >>> > >>>> > >>>> Not addressed here but mandatory : the metadata must be completely > >>>> streamed before streaming trace data that correspond to this new metadata. > >>> > >>> Yes. We might want to think a little more about what happens when we > >>> stream partially complete metadata that cuts it somewhere where it > >>> cannot be parsed.. ? > >> > >> Of what I've experienced so far, there are times where the metadata is > >> simply sent *only* when the stop command is done which uses a flush > >> buffer operation and, since the trace throughput was so low so buffers > >> don't get filled up. > >> > >> Considering this *strong* requirement that the metadata needs to be > >> streamed completely, can we think of a ustctl/kernctl that forces the > >> metadata extraction? > > > > Not sure what you mean. We can simply flush the metadata buffer. 
Or we > > could decide to change the way we grab metadata altogether so it becomes > > more synchronous with the application. However, this might be an issue > > with application crash dump. > > Flush buffer can do the trick indeed for most of the use cases. But what > happens here if new metadata comes in (after start tracing) and the app > is very low throughput ? Don't we need the tracer to immediately notify > the consumer that there is new metadata available? Well the idea is that the periodical flush will apply to metadata too, not just data channels. And we'll make sure to send metadata before data when both are ready. Mathieu > > Thanks! > David > > > > > Thanks, > > > > Mathieu > > > >> > >> And this will be especially useful for new metadata added during > >> tracing! (Not sure how we can deal with that on the session daemon since > >> we have no idea but the tracer knows so maybe it could wake up the > >> stream fd whenever there is metadata available?). > >> > >> Cheers! > >> David > >> > >>> > >>> Thanks! > >>> > >>> Mathieu > >>> > >>>> > >>>> Feedbacks, questions and improvement ideas welcome ! > >>>> > >>>> Thanks, > >>>> > >>>> Julien > >>> > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Mon Oct 22 13:06:12 2012 From: dgoulet at efficios.com (David Goulet) Date: Mon, 22 Oct 2012 13:06:12 -0400 Subject: [lttng-dev] [RELEASE] LTTng-tools 2.1.0-rc5 In-Reply-To: <502EA58E.2060003@efficios.com> References: <502EA58E.2060003@efficios.com> Message-ID: <50857D04.10705@efficios.com> Greetings everyone (including LTTng elves), The lttng-tools project provides a session daemon (lttng-sessiond) that acts as a tracing registry, the "lttng" command line for tracing control, a lttng-ctl library for tracing control and a lttng-relayd for network streaming. 
This release candidate contains a lot of fixes, and especially one that
fixes the lttng_stop_tracing() behavior by adding a new call to the API
(lttng.h). Here goes.

Once the lttng_stop_tracing() API call returns, there was supposed to be
a guarantee that the tracing data is available for processing, i.e. that
any viewer/reader could parse the trace files without any problem or
missing data. That guarantee did not hold.

We've fixed that with a new API call, lttng_data_available(). It returns
1 if the data is ready to be read for a tracing session (for every
domain), or else 0. We've added this call to lttng_stop_tracing() so that
it returns *only* once the condition is true (1). This changes the old
behavior, since the stop tracing call can now wait an arbitrary amount of
time. Note that, for now, lttng_data_available() is called every 2usec on
the client side (liblttng-ctl), so lttng_data_available() itself neither
blocks nor makes the session daemon wait for data availability.

We've also added lttng_stop_tracing_no_wait(), which behaves *exactly*
like lttng_stop_tracing() but will NOT wait for data availability.

This is of course integrated in the lttng command line and works for both
a local consumer (on the file system) and network streaming. For the
lttng UI to _not_ wait when stopping a session, the --no-wait option was
added to "lttng stop".

Here is an example of the expected output of the command line. Each dot
"." is a 2usec period of time.

$ lttng stop mysession
Waiting for data availability....
Tracing stopped for session mysession

We know that this is a big addition to the RC process of lttng-tools, but
data availability on stop tracing is a *very* important aspect of the
lttng tool chain, hence fixing it before the stable version, especially
with the network streaming support.

Apart from that, here is the ChangeLog for 2.1.0-rc5, which contains a
lot of fixes and tests.

2012-10-22 lttng-tools 2.1.0-rc5
	* Fix: Remove network stream ID ABI calls
	* Tests: Add filtering tests
	* Wait for data availability when stopping a session
	* Relayd data available command support
	* Lib lttng-ctl data available command support
	* Consumer daemon data available command support
	* Add data structure for the data available command
	* Change the metadata hash table node
	* Make stream hash tables global to the consumer
	* Move add data stream to the data thread
	* Rename consumer threads and spawn them in daemon
	* Fix: relayd close stream command was not working
	* Fix: Relayd and consumerd socket leaks
	* Fix: Missing -ENODATA handling in the consumer
	* Fix: Empty metadata buffer(s) on HUP|ERR
	* ABI with support for compat 32/64 bits
	* Fix: Stream allocation and insertion consistency
	* Fix: output number of bytes written by relayd
	* Add hash table argument to helper functions
	* Fix: Add missing call rcu and read side lock
	* Tests: Fix LD_PRELOAD library lookup path for health tests
	* Fix: Add arbitrary wait period for kernel streaming test
	* Fix coding style and add/change debug statements
	* Fix: Build out of src tree
	* Tests: Add health check tests to configure
	* Tests: Add health check thread stall test
	* Tests: Add health check thread exit test
	* Tests: Add a health check utility program
	* Add testpoints in lttng-sessiond for each threads
	* New testpoint mechanism to instrument binaries for testing
	* Fix: off-by-one in comm proto between lttng-ctl and sessiond
	* Fix: Metadata stream leak when received in consumer
	* Fix: consumer_allocate_stream error handling
	* Fix: consumer should await for initial streams
	* Fix: Missing rcu read side lock in consumer

Please feel free to email the list about any questions/comments
concerning this release.

Project website: http://lttng.org/lttng2.0

Download link:
http://lttng.org/files/lttng-tools/lttng-tools-2.1.0-rc5.tar.bz2
(for the PGP signature, same file with .asc appended)

Cheers!
David _______________________________________________ lttng-dev mailing list lttng-dev at lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev From jdesfossez at efficios.com Mon Oct 22 14:10:01 2012 From: jdesfossez at efficios.com (Julien Desfossez) Date: Mon, 22 Oct 2012 14:10:01 -0400 Subject: [lttng-dev] status of lttng top In-Reply-To: <7F632A9222059A42AF70FCB7965774AA20854D6F@ALA-MBB.corp.ad.wrs.com> References: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> <507389DE.4000204@efficios.com> <7F632A9222059A42AF70FCB7965774AA20854D6F@ALA-MBB.corp.ad.wrs.com> Message-ID: <50858BF9.705@efficios.com> Hi Andrew, On 22/10/12 07:00 AM, McDermott, Andrew wrote: > > Hi, > >> LTTngTop is still work in progress and will remain that way for a long >> time, but the version in the PPA (or in the master branch in git) is >> perfectly usable for offline traces (traces recorded and replayed >> through LTTngTop). >> >> The "live" branch is more experimental and requires patches in both >> Babeltrace and Lttng-tools (all documented in the README-LIVE file), but >> it worked at the time of Plumbers, I didn't have much time since then to >> rebase the branches. >> >> I am waiting for the release of Lttng-tools 2.1 (currently in RC) before >> merging those patches. After these patches are integrated, LTTngTop will >> be able to work live without any modifications, so directly reading >> traces in memory shared with the tracer. > > Thanks for this info. > > Right now my interest is with the live streaming; we have a use case > where the live streaming is really the only practical solution. > > Very roughly, would you expect the RC series to conclude this year, or > (early) next year? Just to clarify, are you interested in live network trace reading or live in-memory reading ? The patches I was talking about are for in-memory trace reading. 
Thanks, Julien From paulmck at linux.vnet.ibm.com Mon Oct 22 13:44:10 2012 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Mon, 22 Oct 2012 10:44:10 -0700 Subject: [lttng-dev] urcu stack and queues updates and documentation In-Reply-To: <20121017151946.GA14514@Krystal> References: <20121014175332.GA2947@Krystal> <20121016213727.GL2385@linux.vnet.ibm.com> <20121017151946.GA14514@Krystal> Message-ID: <20121022174410.GT2518@linux.vnet.ibm.com> On Wed, Oct 17, 2012 at 11:19:46AM -0400, Mathieu Desnoyers wrote: > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > On Sun, Oct 14, 2012 at 01:53:32PM -0400, Mathieu Desnoyers wrote: > > > Hi Paul! > > > > > > I know you are currently looking at documentation of urcu data > > > structures. I did quite a bit of work in that area these past days. Here > > > is my plan: > > > > Actually, I diverted to the atomic operations, given that the stack/queue > > API seems to be in flux. ;-) > > That sounds like a wise decision ;-) > > > > 1) I would like to deprecate, at some point, rculfqueue, wfqueue, and > > > rculfstack. > > > > > > 2) For wfqueue, we replace it by wfcqueue, currently in the urcu master > > > branch. > > > > > > 3) For rculfstack, we replace it by lfstack available here (volatile > > > branch): > > > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > > branch: urcu/lfstack > > > > I probably have to document them to have any chance of having an opinion, > > other than my usual advice to avoid disrupting users of the old interfaces. > > My general plan is to leave the old interfaces in place, marking them as > "deprecated" by adding a __attribute__((deprecated("This interface is deprecated. Please refer to urcu/xxxqueue.h for its replacement."))). > Then we'll be able to drop the deprecated interfaces in a couple of > versions. Fair enough. Should enough users protest, we can of course leave them in place. 
> > > 4) I did documentation improvements (and implemented pop_all as well as > > > empty, and iterators) for wfstack here (volatile branch too): > > > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > > branch: urcu/wfstack > > > > I will be very happy to take advantage of this. ;-) > > I wonder how we should move forward with these ? I could pull the > urcu/wfstack, urcu/lfstack commits into master with your approval, and > mark rculfstack and wfqueue as deprecated. wfstack is simply extended. I > would wait a bit before deciding anything wrt rculfqueue. Thoughts ? I would be in favor of pulling them in -- we can fix if need be. That said, I am not so sure that getting rid of wfqueue is a good idea, given your analysis below. > > > 5) The last one to look into would be rculfqueue. I'd really like to > > > create a lfcqueue derived from wfcqueue if possible. It's the next > > > item on my todo list this weekend. > > > > The piece I am missing is ABA avoidance. Or is this the approach > > that assumes a single dequeuer? > > If we look at the big picture, the main difference between the "wf" and > "lf" approaches, both for stack and queue, is that "wf" requires > traversal to busy-wait when it sees the intermediate NULL pointer state. > This allows wait-free push/enqueue with xchg. The "lf" approach ensures > that a simple traversal can be done on the structures, at the expense of > requiring a cmpxchg on the enqueue/push. > > Luckily, for stacks, the nature of stacks makes "push" ABA-proof (see > the documentation in the code), even if we use cmpxchg. > > Unluckily, for queues, using cmpxchg on enqueue is ABA-prone. dequeue > is ABA-prone too. Moreover, we need to have existance guarantees, so an > enqueue does not attempt to do a cmpxchg on the next pointer of a node > that has already been dequeued and reallocated. So, one approach is to > always rely on RCU, and require the RCU read-side lock to be held around > enqueue, and around dequeue. 
Now, the question is: can we rely on other, > non-rcu techniques, to protect lfqueue against ABA and offer existance > guarantees ? > > A single-dequeuer approach would unfortunately not be sufficient, > because enqueue is ABA-prone, and due to lack of existance guarantees > for the node we are about to append after: if we have multiple enqueuers > and a single dequeuer, one enqueue could suffer from ABA, and try to > touch reallocated memory, due to dequeue+reallocation of a node. > > Even forcing single-enqueuer/single-dequeuer would not suffice: if, > between the moment we get the tail node we plan to append after, and the > moment we perform the cmpxchg to that node next pointer, the node is > dequeued and freed, we would be touching freed memory (corruption). > > Therefore, that would require a single mutex on _both_ enqueue and > dequeue operations, which really defeats the purpose of a lock-free > queue. > > So my current understanding is that we might have to stay with a RCU > lfcqueue, requiring RCU read-side lock to be held for enqueue and > dequeue, and requiring to wait for a grace period to elapse before > freeing the memory returned by dequeue. The benefit of using rculfcqueue > over wfcqueue is that traversal of the nodes, and dequeue, don't need to > busy-loop on NULL next pointers. > > Thoughts ? Heh! It would indeed seem that we didn't think through the conversion from wfqueue as thoroughly as we might have. ;-) Thanx, Paul > Thanks! > > Mathieu > > > > > Thanx, Paul > > > > > Thoughts ? > > > > > > Thanks, > > > > > > Mathieu > > > > > > -- > > > Mathieu Desnoyers > > > Operating System Efficiency R&D Consultant > > > EfficiOS Inc. > > > http://www.efficios.com > > > > > > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. 
> http://www.efficios.com > From mathieu.desnoyers at efficios.com Mon Oct 22 22:12:02 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 22 Oct 2012 22:12:02 -0400 Subject: [lttng-dev] urcu stack and queues updates and documentation In-Reply-To: <20121022174410.GT2518@linux.vnet.ibm.com> References: <20121014175332.GA2947@Krystal> <20121016213727.GL2385@linux.vnet.ibm.com> <20121017151946.GA14514@Krystal> <20121022174410.GT2518@linux.vnet.ibm.com> Message-ID: <20121023021202.GB13737@Krystal> * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > On Wed, Oct 17, 2012 at 11:19:46AM -0400, Mathieu Desnoyers wrote: > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > > On Sun, Oct 14, 2012 at 01:53:32PM -0400, Mathieu Desnoyers wrote: > > > > Hi Paul! > > > > > > > > I know you are currently looking at documentation of urcu data > > > > structures. I did quite a bit of work in that area these past days. Here > > > > is my plan: > > > > > > Actually, I diverted to the atomic operations, given that the stack/queue > > > API seems to be in flux. ;-) > > > > That sounds like a wise decision ;-) > > > > > > 1) I would like to deprecate, at some point, rculfqueue, wfqueue, and > > > > rculfstack. > > > > > > > > 2) For wfqueue, we replace it by wfcqueue, currently in the urcu master > > > > branch. > > > > > > > > 3) For rculfstack, we replace it by lfstack available here (volatile > > > > branch): > > > > > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > > > branch: urcu/lfstack > > > > > > I probably have to document them to have any chance of having an opinion, > > > other than my usual advice to avoid disrupting users of the old interfaces. > > > > My general plan is to leave the old interfaces in place, marking them as > > "deprecated" by adding a __attribute__((deprecated("This interface is deprecated. Please refer to urcu/xxxqueue.h for its replacement."))). 
> > Then we'll be able to drop the deprecated interfaces in a couple of > > versions. > > Fair enough. Should enough users protest, we can of course leave them > in place. OK. > > > > > 4) I did documentation improvements (and implemented pop_all as well as > > > > empty, and iterators) for wfstack here (volatile branch too): > > > > > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > > > branch: urcu/wfstack > > > > > > I will be very happy to take advantage of this. ;-) > > > > I wonder how we should move forward with these ? I could pull the > > urcu/wfstack, urcu/lfstack commits into master with your approval, and > > mark rculfstack and wfqueue as deprecated. wfstack is simply extended. I > > would wait a bit before deciding anything wrt rculfqueue. Thoughts ? > > I would be in favor of pulling them in -- we can fix if need be. > That said, I am not so sure that getting rid of wfqueue is a good idea, > given your analysis below. My analysis below is about rculfqueue, not wfqueue. I think you got both of them mixed up. > > > > > 5) The last one to look into would be rculfqueue. I'd really like to > > > > create a lfcqueue derived from wfcqueue if possible. It's the next > > > > item on my todo list this weekend. > > > > > > The piece I am missing is ABA avoidance. Or is this the approach > > > that assumes a single dequeuer? > > > > If we look at the big picture, the main difference between the "wf" and > > "lf" approaches, both for stack and queue, is that "wf" requires > > traversal to busy-wait when it sees the intermediate NULL pointer state. > > This allows wait-free push/enqueue with xchg. The "lf" approach ensures > > that a simple traversal can be done on the structures, at the expense of > > requiring a cmpxchg on the enqueue/push. > > > > Luckily, for stacks, the nature of stacks makes "push" ABA-proof (see > > the documentation in the code), even if we use cmpxchg. > > > > Unluckily, for queues, using cmpxchg on enqueue is ABA-prone. 
dequeue > > is ABA-prone too. Moreover, we need to have existance guarantees, so an > > enqueue does not attempt to do a cmpxchg on the next pointer of a node > > that has already been dequeued and reallocated. So, one approach is to > > always rely on RCU, and require the RCU read-side lock to be held around > > enqueue, and around dequeue. Now, the question is: can we rely on other, > > non-rcu techniques, to protect lfqueue against ABA and offer existance > > guarantees ? > > > > A single-dequeuer approach would unfortunately not be sufficient, > > because enqueue is ABA-prone, and due to lack of existance guarantees > > for the node we are about to append after: if we have multiple enqueuers > > and a single dequeuer, one enqueue could suffer from ABA, and try to > > touch reallocated memory, due to dequeue+reallocation of a node. > > > > Even forcing single-enqueuer/single-dequeuer would not suffice: if, > > between the moment we get the tail node we plan to append after, and the > > moment we perform the cmpxchg to that node next pointer, the node is > > dequeued and freed, we would be touching freed memory (corruption). > > > > Therefore, that would require a single mutex on _both_ enqueue and > > dequeue operations, which really defeats the purpose of a lock-free > > queue. > > > > So my current understanding is that we might have to stay with a RCU > > lfcqueue, requiring RCU read-side lock to be held for enqueue and > > dequeue, and requiring to wait for a grace period to elapse before > > freeing the memory returned by dequeue. The benefit of using rculfcqueue > > over wfcqueue is that traversal of the nodes, and dequeue, don't need to > > busy-loop on NULL next pointers. > > > > Thoughts ? > > Heh! It would indeed seem that we didn't think through the conversion > from wfqueue as thoroughly as we might have. ;-) The transition from wfqueue to wfcqueue does not pose any problem. It's transition from rculfqueue that is the concern here. 
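
As an aside on the quoted point that stack "push" stays ABA-proof even
when it uses cmpxchg, here is a minimal sketch. It assumes C11 atomics
rather than the urcu uatomic primitives, and the demo_lfs_* names are
hypothetical: this is not the urcu/lfstack API, only an illustration of
why push never needs to dereference the old head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type: the caller embeds it, nothing is allocated here. */
struct demo_lfs_node {
	struct demo_lfs_node *next;
};

/* Hypothetical stack type: a single atomic pointer to the top node. */
struct demo_lfs_stack {
	_Atomic(struct demo_lfs_node *) head;
};

static inline void demo_lfs_init(struct demo_lfs_stack *s)
{
	atomic_init(&s->head, NULL);
}

/*
 * Push with a compare-and-swap loop.
 *
 * The old head is only copied into node->next, it is never dereferenced.
 * If the head went A -> B -> A between the load and the cmpxchg, linking
 * the new node in front of A is still correct, because A is the head at
 * the instant the cmpxchg succeeds.
 *
 * Returns 1 if the stack was empty before the push, 0 otherwise.
 */
static inline int demo_lfs_push(struct demo_lfs_stack *s,
		struct demo_lfs_node *node)
{
	struct demo_lfs_node *old;

	assert(node != NULL);
	old = atomic_load_explicit(&s->head, memory_order_relaxed);
	do {
		/* On cmpxchg failure, "old" is refreshed with the current head. */
		node->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&s->head, &old, node,
			memory_order_release, memory_order_relaxed));
	return old == NULL;
}
```

Pop is a different story: it has to read old->next before its cmpxchg,
which is exactly where ABA and the need for existence guarantees come in.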
Thanks, Mathieu > > Thanx, Paul > > > Thanks! > > > > Mathieu > > > > > > > > Thanx, Paul > > > > > > > Thoughts ? > > > > > > > > Thanks, > > > > > > > > Mathieu > > > > > > > > -- > > > > Mathieu Desnoyers > > > > Operating System Efficiency R&D Consultant > > > > EfficiOS Inc. > > > > http://www.efficios.com > > > > > > > > > > > -- > > Mathieu Desnoyers > > Operating System Efficiency R&D Consultant > > EfficiOS Inc. > > http://www.efficios.com > > > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 23 08:29:26 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 23 Oct 2012 08:29:26 -0400 Subject: [lttng-dev] urcu stack and queues updates and documentation In-Reply-To: <20121023021202.GB13737@Krystal> References: <20121014175332.GA2947@Krystal> <20121016213727.GL2385@linux.vnet.ibm.com> <20121017151946.GA14514@Krystal> <20121022174410.GT2518@linux.vnet.ibm.com> <20121023021202.GB13737@Krystal> Message-ID: <20121023122926.GA20944@Krystal> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > On Wed, Oct 17, 2012 at 11:19:46AM -0400, Mathieu Desnoyers wrote: > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > > > On Sun, Oct 14, 2012 at 01:53:32PM -0400, Mathieu Desnoyers wrote: > > > > > Hi Paul! > > > > > > > > > > I know you are currently looking at documentation of urcu data > > > > > structures. I did quite a bit of work in that area these past days. Here > > > > > is my plan: > > > > > > > > Actually, I diverted to the atomic operations, given that the stack/queue > > > > API seems to be in flux. 
;-) > > > > > > That sounds like a wise decision ;-) > > > > > > > > 1) I would like to deprecate, at some point, rculfqueue, wfqueue, and > > > > > rculfstack. > > > > > > > > > > 2) For wfqueue, we replace it by wfcqueue, currently in the urcu master > > > > > branch. > > > > > > > > > > 3) For rculfstack, we replace it by lfstack available here (volatile > > > > > branch): > > > > > > > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > > > > branch: urcu/lfstack > > > > > > > > I probably have to document them to have any chance of having an opinion, > > > > other than my usual advice to avoid disrupting users of the old interfaces. > > > > > > My general plan is to leave the old interfaces in place, marking them as > > > "deprecated" by adding a __attribute__((deprecated("This interface is deprecated. Please refer to urcu/xxxqueue.h for its replacement."))). > > > Then we'll be able to drop the deprecated interfaces in a couple of > > > versions. > > > > Fair enough. Should enough users protest, we can of course leave them > > in place. > > OK. > > > > > > > > 4) I did documentation improvements (and implemented pop_all as well as > > > > > empty, and iterators) for wfstack here (volatile branch too): > > > > > > > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > > > > branch: urcu/wfstack > > > > > > > > I will be very happy to take advantage of this. ;-) > > > > > > I wonder how we should move forward with these ? I could pull the > > > urcu/wfstack, urcu/lfstack commits into master with your approval, and > > > mark rculfstack and wfqueue as deprecated. wfstack is simply extended. I > > > would wait a bit before deciding anything wrt rculfqueue. Thoughts ? > > > > I would be in favor of pulling them in -- we can fix if need be. > > That said, I am not so sure that getting rid of wfqueue is a good idea, > > given your analysis below. > > My analysis below is about rculfqueue, not wfqueue. I think you got both > of them mixed up. 
I just merged the content of the urcu/wfstack, urcu/lfstack, urcu/wfcqueue branches into master. As explained below, urcu/rculfqueue needs more work. Thanks! Mathieu > > > > > > > > 5) The last one to look into would be rculfqueue. I'd really like to > > > > > create a lfcqueue derived from wfcqueue if possible. It's the next > > > > > item on my todo list this weekend. > > > > > > > > The piece I am missing is ABA avoidance. Or is this the approach > > > > that assumes a single dequeuer? > > > > > > If we look at the big picture, the main difference between the "wf" and > > > "lf" approaches, both for stack and queue, is that "wf" requires > > > traversal to busy-wait when it sees the intermediate NULL pointer state. > > > This allows wait-free push/enqueue with xchg. The "lf" approach ensures > > > that a simple traversal can be done on the structures, at the expense of > > > requiring a cmpxchg on the enqueue/push. > > > > > > Luckily, for stacks, the nature of stacks makes "push" ABA-proof (see > > > the documentation in the code), even if we use cmpxchg. > > > > > > Unluckily, for queues, using cmpxchg on enqueue is ABA-prone. dequeue > > > is ABA-prone too. Moreover, we need to have existance guarantees, so an > > > enqueue does not attempt to do a cmpxchg on the next pointer of a node > > > that has already been dequeued and reallocated. So, one approach is to > > > always rely on RCU, and require the RCU read-side lock to be held around > > > enqueue, and around dequeue. Now, the question is: can we rely on other, > > > non-rcu techniques, to protect lfqueue against ABA and offer existance > > > guarantees ? 
> > > > > > A single-dequeuer approach would unfortunately not be sufficient, > > > because enqueue is ABA-prone, and due to lack of existance guarantees > > > for the node we are about to append after: if we have multiple enqueuers > > > and a single dequeuer, one enqueue could suffer from ABA, and try to > > > touch reallocated memory, due to dequeue+reallocation of a node. > > > > > > Even forcing single-enqueuer/single-dequeuer would not suffice: if, > > > between the moment we get the tail node we plan to append after, and the > > > moment we perform the cmpxchg to that node next pointer, the node is > > > dequeued and freed, we would be touching freed memory (corruption). > > > > > > Therefore, that would require a single mutex on _both_ enqueue and > > > dequeue operations, which really defeats the purpose of a lock-free > > > queue. > > > > > > So my current understanding is that we might have to stay with a RCU > > > lfcqueue, requiring RCU read-side lock to be held for enqueue and > > > dequeue, and requiring to wait for a grace period to elapse before > > > freeing the memory returned by dequeue. The benefit of using rculfcqueue > > > over wfcqueue is that traversal of the nodes, and dequeue, don't need to > > > busy-loop on NULL next pointers. > > > > > > Thoughts ? > > > > Heh! It would indeed seem that we didn't think through the conversion > > from wfqueue as thoroughly as we might have. ;-) > > The transition from wfqueue to wfcqueue does not pose any problem. It's > transition from rculfqueue that is the concern here. > > Thanks, > > Mathieu > > > > > Thanx, Paul > > > > > Thanks! > > > > > > Mathieu > > > > > > > > > > > Thanx, Paul > > > > > > > > > Thoughts ? > > > > > > > > > > Thanks, > > > > > > > > > > Mathieu > > > > > > > > > > -- > > > > > Mathieu Desnoyers > > > > > Operating System Efficiency R&D Consultant > > > > > EfficiOS Inc. 
> > > > > http://www.efficios.com > > > > > > > > > > > > > > > -- > > > Mathieu Desnoyers > > > Operating System Efficiency R&D Consultant > > > EfficiOS Inc. > > > http://www.efficios.com > > > > > > > > > _______________________________________________ > > lttng-dev mailing list > > lttng-dev at lists.lttng.org > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. > http://www.efficios.com > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 23 08:50:05 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 23 Oct 2012 08:50:05 -0400 Subject: [lttng-dev] urcu stack and queues updates and documentation In-Reply-To: <20121023122926.GA20944@Krystal> References: <20121014175332.GA2947@Krystal> <20121016213727.GL2385@linux.vnet.ibm.com> <20121017151946.GA14514@Krystal> <20121022174410.GT2518@linux.vnet.ibm.com> <20121023021202.GB13737@Krystal> <20121023122926.GA20944@Krystal> Message-ID: <20121023125005.GA21311@Krystal> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > > On Wed, Oct 17, 2012 at 11:19:46AM -0400, Mathieu Desnoyers wrote: > > > > * Paul E. McKenney (paulmck at linux.vnet.ibm.com) wrote: > > > > > On Sun, Oct 14, 2012 at 01:53:32PM -0400, Mathieu Desnoyers wrote: > > > > > > Hi Paul! > > > > > > > > > > > > I know you are currently looking at documentation of urcu data > > > > > > structures. I did quite a bit of work in that area these past days. 
Here > > > > > > is my plan: > > > > > > > > > > Actually, I diverted to the atomic operations, given that the stack/queue > > > > > API seems to be in flux. ;-) > > > > > > > > That sounds like a wise decision ;-) > > > > > > > > > > 1) I would like to deprecate, at some point, rculfqueue, wfqueue, and > > > > > > rculfstack. > > > > > > > > > > > > 2) For wfqueue, we replace it by wfcqueue, currently in the urcu master > > > > > > branch. > > > > > > > > > > > > 3) For rculfstack, we replace it by lfstack available here (volatile > > > > > > branch): > > > > > > > > > > > > git://git.dorsal.polymtl.ca/~compudj/userspace-rcu > > > > > > branch: urcu/lfstack > > > > > > > > > > I probably have to document them to have any chance of having an opinion, > > > > > other than my usual advice to avoid disrupting users of the old interfaces. > > > > > > > > My general plan is to leave the old interfaces in place, marking them as > > > > "deprecated" by adding a __attribute__((deprecated("This interface is deprecated. Please refer to urcu/xxxqueue.h for its replacement."))). > > > > Then we'll be able to drop the deprecated interfaces in a couple of > > > > versions. > > > > > > Fair enough. Should enough users protest, we can of course leave them > > > in place. > > > > OK. FYI, wfqueue and rculfstack are now deprecated: commit 147485105cf7b5c8ea96d7f68df973b9c5a94e8e Author: Mathieu Desnoyers Date: Tue Oct 23 08:43:33 2012 -0400 Deprecate wfqueue Replaced by "wfcqueue", which has a semantic that allows placing head and tail on different cache lines, and does not allocate memory internally. wfqueue users can easily migrate to wfcqueue. We choose to deprecate wfqueue rather than reimplementing it on top of wfcqueue to ensure we keep strong ABI compatibility for existing wfqueue users. 
Signed-off-by: Mathieu Desnoyers commit d89ec7629b8cafdc12e619cf5f07ceb5b0279275 Author: Mathieu Desnoyers Date: Tue Oct 23 08:36:42 2012 -0400 Deprecate rculfstack Replaced by "lfstack", which has a less restrictive semantic, and covers rculfstack completely. Signed-off-by: Mathieu Desnoyers wfstack is kept as-is (and has been recently extended with pop_all(), iterators, and empty() APIs). rculfqueue needs more work. I'd be tempted not to detail rculfqueue in the upcoming LWN article, but detailing wfqueue, wfstack and lfstack should be fine. Thanks! Mathieu -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From Andrew.McDermott at windriver.com Tue Oct 23 09:11:16 2012 From: Andrew.McDermott at windriver.com (McDermott, Andrew) Date: Tue, 23 Oct 2012 13:11:16 +0000 Subject: [lttng-dev] status of lttng top In-Reply-To: <50858BF9.705@efficios.com> (Julien Desfossez's message of "Mon, 22 Oct 2012 14:10:01 -0400") References: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> <507389DE.4000204@efficios.com> <7F632A9222059A42AF70FCB7965774AA20854D6F@ALA-MBB.corp.ad.wrs.com> <50858BF9.705@efficios.com> Message-ID: <7F632A9222059A42AF70FCB7965774AA20860D77@ALA-MBB.corp.ad.wrs.com> Hi, > On 22/10/12 07:00 AM, McDermott, Andrew wrote: >> >> Hi, >> >>> LTTngTop is still work in progress and will remain that way for a long >>> time, but the version in the PPA (or in the master branch in git) is >>> perfectly usable for offline traces (traces recorded and replayed >>> through LTTngTop). >>> >>> The "live" branch is more experimental and requires patches in both >>> Babeltrace and Lttng-tools (all documented in the README-LIVE file), but >>> it worked at the time of Plumbers, I didn't have much time since then to >>> rebase the branches. >>> >>> I am waiting for the release of Lttng-tools 2.1 (currently in RC) before >>> merging those patches. 
After these patches are integrated, LTTngTop will >>> be able to work live without any modifications, so directly reading >>> traces in memory shared with the tracer. >> >> Thanks for this info. >> >> Right now my interest is with the live streaming; we have a use case >> where the live streaming is really the only practical solution. >> >> Very roughly, would you expect the RC series to conclude this year, or >> (early) next year? > > Just to clarify, are you interested in live network trace reading or > live in-memory reading ? > The patches I was talking about are for in-memory trace reading. So I guess I don't understand enough of the low-level detail here. What I was interested in was being able to consume events, maybe periodically (1 /s), from a trace written by another process on the same machine. I guess that would fall under in-memory trace reading. -- andy From christian.babeux at efficios.com Tue Oct 23 15:36:17 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 23 Oct 2012 15:36:17 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Possible memory leaks when creating filter IR root node Message-ID: <1351020977-18412-1-git-send-email-christian.babeux@efficios.com> Signed-off-by: Christian Babeux --- src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c b/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c index eec78fc..84122c9 100644 --- a/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c +++ b/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c @@ -46,9 +46,11 @@ struct ir_op *make_op_root(struct ir_op *child, enum ir_side side) case IR_DATA_UNKNOWN: default: fprintf(stderr, "[error] Unknown root child data type\n"); + free(op); return NULL; case IR_DATA_STRING: fprintf(stderr, "[error] String cannot be root data type\n"); + free(op); return NULL; case IR_DATA_NUMERIC: case IR_DATA_FIELD_REF: -- 1.7.12.2 From 
christian.babeux at efficios.com Tue Oct 23 15:36:41 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 23 Oct 2012 15:36:41 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Handle the unary bitwise negation operator (~) in the XML printer Message-ID: <1351021001-18480-1-git-send-email-christian.babeux@efficios.com> Signed-off-by: Christian Babeux --- src/lib/lttng-ctl/filter/filter-visitor-xml.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/lib/lttng-ctl/filter/filter-visitor-xml.c b/src/lib/lttng-ctl/filter/filter-visitor-xml.c index 90a336d..1d14f2e 100644 --- a/src/lib/lttng-ctl/filter/filter-visitor-xml.c +++ b/src/lib/lttng-ctl/filter/filter-visitor-xml.c @@ -235,6 +235,9 @@ int recursive_visit_print(struct filter_node *node, FILE *stream, int indent) case AST_UNARY_NOT: fprintf(stream, "\"!\""); break; + case AST_UNARY_BIN_NOT: + fprintf(stream, "\"~\""); + break; } fprintf(stream, ">\n"); ret = recursive_visit_print(node->u.unary_op.child, -- 1.7.12.2 From christian.babeux at efficios.com Tue Oct 23 15:37:20 2012 From: christian.babeux at efficios.com (Christian Babeux) Date: Tue, 23 Oct 2012 15:37:20 -0400 Subject: [lttng-dev] [PATCH lttng-ust] Fix: Fix self-assign warning on struct ustfork_clone_info init Message-ID: <1351021040-18587-1-git-send-email-christian.babeux@efficios.com> Use the proper field designator syntax (C99) to initialize the ustfork_clone_info struct. Signed-off-by: Christian Babeux --- liblttng-ust-fork/ustfork.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) mode change 100644 => 100755 liblttng-ust-fork/ustfork.c diff --git a/liblttng-ust-fork/ustfork.c b/liblttng-ust-fork/ustfork.c old mode 100644 new mode 100755 index 13f77cf..34f674c --- a/liblttng-ust-fork/ustfork.c +++ b/liblttng-ust-fork/ustfork.c @@ -139,7 +139,7 @@ int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ...) tls, ctid); } else { /* Creating a real process, we need to intervene. 
*/ - struct ustfork_clone_info info = { fn = fn, arg = arg }; + struct ustfork_clone_info info = { .fn = fn, .arg = arg }; ust_before_fork(&info.sigset); retval = plibc_func(clone_fn, child_stack, flags, &info, -- 1.7.12.2 From mathieu.desnoyers at efficios.com Tue Oct 23 15:57:27 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 23 Oct 2012 15:57:27 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Possible memory leaks when creating filter IR root node In-Reply-To: <1351020977-18412-1-git-send-email-christian.babeux@efficios.com> References: <1351020977-18412-1-git-send-email-christian.babeux@efficios.com> Message-ID: <20121023195727.GA30217@Krystal> * Christian Babeux (christian.babeux at efficios.com) wrote: > > Signed-off-by: Christian Babeux Acked-by: Mathieu Desnoyers > --- > src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c b/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c > index eec78fc..84122c9 100644 > --- a/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c > +++ b/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c > @@ -46,9 +46,11 @@ struct ir_op *make_op_root(struct ir_op *child, enum ir_side side) > case IR_DATA_UNKNOWN: > default: > fprintf(stderr, "[error] Unknown root child data type\n"); > + free(op); > return NULL; > case IR_DATA_STRING: > fprintf(stderr, "[error] String cannot be root data type\n"); > + free(op); > return NULL; > case IR_DATA_NUMERIC: > case IR_DATA_FIELD_REF: > -- > 1.7.12.2 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 23 15:57:49 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 23 Oct 2012 15:57:49 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Handle the unary bitwise negation operator (~) in the XML printer In-Reply-To: <1351021001-18480-1-git-send-email-christian.babeux@efficios.com> References: <1351021001-18480-1-git-send-email-christian.babeux@efficios.com> Message-ID: <20121023195749.GB30217@Krystal> * Christian Babeux (christian.babeux at efficios.com) wrote: > > Signed-off-by: Christian Babeux Acked-by: Mathieu Desnoyers > --- > src/lib/lttng-ctl/filter/filter-visitor-xml.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/lib/lttng-ctl/filter/filter-visitor-xml.c b/src/lib/lttng-ctl/filter/filter-visitor-xml.c > index 90a336d..1d14f2e 100644 > --- a/src/lib/lttng-ctl/filter/filter-visitor-xml.c > +++ b/src/lib/lttng-ctl/filter/filter-visitor-xml.c > @@ -235,6 +235,9 @@ int recursive_visit_print(struct filter_node *node, FILE *stream, int indent) > case AST_UNARY_NOT: > fprintf(stream, "\"!\""); > break; > + case AST_UNARY_BIN_NOT: > + fprintf(stream, "\"~\""); > + break; > } > fprintf(stream, ">\n"); > ret = recursive_visit_print(node->u.unary_op.child, > -- > 1.7.12.2 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Tue Oct 23 15:59:11 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 23 Oct 2012 15:59:11 -0400 Subject: [lttng-dev] [PATCH lttng-ust] Fix: Fix self-assign warning on struct ustfork_clone_info init In-Reply-To: <1351021040-18587-1-git-send-email-christian.babeux@efficios.com> References: <1351021040-18587-1-git-send-email-christian.babeux@efficios.com> Message-ID: <20121023195911.GC30217@Krystal> * Christian Babeux (christian.babeux at efficios.com) wrote: > Use the proper field designator syntax (C99) to initialize the > ustfork_clone_info struct. merged, thanks! Mathieu > > Signed-off-by: Christian Babeux > --- > liblttng-ust-fork/ustfork.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > mode change 100644 => 100755 liblttng-ust-fork/ustfork.c > > diff --git a/liblttng-ust-fork/ustfork.c b/liblttng-ust-fork/ustfork.c > old mode 100644 > new mode 100755 > index 13f77cf..34f674c > --- a/liblttng-ust-fork/ustfork.c > +++ b/liblttng-ust-fork/ustfork.c > @@ -139,7 +139,7 @@ int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, ...) > tls, ctid); > } else { > /* Creating a real process, we need to intervene. */ > - struct ustfork_clone_info info = { fn = fn, arg = arg }; > + struct ustfork_clone_info info = { .fn = fn, .arg = arg }; > > ust_before_fork(&info.sigset); > retval = plibc_func(clone_fn, child_stack, flags, &info, > -- > 1.7.12.2 > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From David.OShea at quantum.com Tue Oct 23 23:58:59 2012 From: David.OShea at quantum.com (David OShea) Date: Wed, 24 Oct 2012 03:58:59 +0000 Subject: [lttng-dev] Wrong procname for userspace trace of app with different thread names Message-ID: <20998D40D9A2B7499CA5A3A2666CB1EB19F7CBC4@ZURMSG1.QUANTUM.com> Hi all, I performed a userspace trace of a multi-threaded application that uses prctl(PR_SET_NAME, ...) 
to give its threads names. From the output of 'lttng view' below (I removed timestamps to make it shorter), it appears that UST is populating the "procname" by getting the process/thread name from the first thread it sees for a given PID, then assuming that name applies to all threads in the process: mydaemon:22593 mydaemon:start: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } mydaemon:22593 mydaemon:end: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } myclient:28277 myclient:end: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } myclient:28277 myclient:start: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } mydaemon:22593 mydaemon:start: { }, { pthread_id = 1485908288, procname = "mythread31", vtid = 22760, vpid = 22593 }, { } On the first line, the procname contains the thread name for vtid = 22761. On the last line, the procname is still shown as "mythread31", but the vtid is different, and I can see from /proc/22593/task/22760/stat that the thread's name is "mythread30". However, the : at the start of each line - the pair before the : - does show the actual process name rather than any thread name. I'm using LTTng 2.0.4 and babeltrace 1.0.0.rc5. I set up my trace using these steps: lttng create lttng enable-channel channel0 --userspace lttng add-context --userspace -t vpid -t vtid -t procname -t pthread_id lttng enable-event --userspace --all It would be great if "procname" was replaced by "threadname" and showed each thread's name correctly, since the process's name is already shown, but the thread names would be very useful. Otherwise, at the very least, "procname" should probably show the same thing that is shown at the start of the line rather than incorrectly using the name of one of the threads. Incidentally, why do I have to do "enable-channel" to get the context to actually appear? 
If I don't do this, the "add-context" still says
that each of the four contexts was "added to all channels", but they
don't appear in 'lttng view'.

Thanks in advance,
David

----------------------------------------------------------------------
The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.

From Andrew.McDermott at windriver.com Wed Oct 24 07:21:55 2012
From: Andrew.McDermott at windriver.com (McDermott, Andrew)
Date: Wed, 24 Oct 2012 11:21:55 +0000
Subject: [lttng-dev] registering user-defined properties on tracepoints
Message-ID: <7F632A9222059A42AF70FCB7965774AA2086527B@ALA-MBB.corp.ad.wrs.com>

Is it possible to add /other/ user-defined attributes to an event
definition?

What I'm trying to do is associate other related properties/attributes
to the event itself and on each field in the event. In this example I'm
choosing to encode my attributes using a JSON-like syntax, but the point
is that this is really user-defined.

    TRACEPOINT_EVENT(foo, some_event,
        TP_ARGS(int, value),
        TP_ATTRIBUTES("{ icon:some_event.png, helpIndex:docs/help/0001.html ... }")
        TP_FIELDS(ctf_integer(int, foo, foo, "{java_formatter:com.windriver.SomeEventFormatter, leftAdjust:1, mask:64, ... }"))
    )

I could create an associated event:

    TRACEPOINT_EVENT(foo, some_event_metadata,
        TP_ARGS(char *, str),
        TP_FIELDS(ctf_string(char *, str))
    )

and have some rules based on the event names to bind the two together
but it seems "nicer" to keep them together.

My motivation is keeping this kind of auxiliary information available
with the trace itself as opposed to some side-files which have the
tendency to get out of sync. Or perhaps there is another way of
achieving this.

Any hints gratefully received...

Thanks,
Andy.

From mathieu.desnoyers at efficios.com Wed Oct 24 08:32:07 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Wed, 24 Oct 2012 08:32:07 -0400
Subject: [lttng-dev] Wrong procname for userspace trace of app with different thread names
In-Reply-To: <20998D40D9A2B7499CA5A3A2666CB1EB19F7CBC4@ZURMSG1.QUANTUM.com>
References: <20998D40D9A2B7499CA5A3A2666CB1EB19F7CBC4@ZURMSG1.QUANTUM.com>
Message-ID: <20121024123207.GA9568@Krystal>

* David OShea (David.OShea at quantum.com) wrote:
> Hi all,
>
> I performed a userspace trace of a multi-threaded application that
> uses prctl(PR_SET_NAME, ...) to give its threads names. From the
> output of 'lttng view' below (I removed timestamps to make it
> shorter), it appears that UST is populating the "procname" by getting
> the process/thread name from the first thread it sees for a given PID,
> then assuming that name applies to all threads in the process:

Yes, this is right.
> > mydaemon:22593 mydaemon:start: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } > mydaemon:22593 mydaemon:end: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } > myclient:28277 myclient:end: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } > myclient:28277 myclient:start: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } > mydaemon:22593 mydaemon:start: { }, { pthread_id = 1485908288, procname = "mythread31", vtid = 22760, vpid = 22593 }, { } > > On the first line, the procname contains the thread name for vtid = > 22761. On the last line, the procname is still shown as "mythread31", > but the vtid is different, and I can see from > /proc/22593/task/22760/stat that the thread's name is "mythread30". > However, the : at the start of each line - the pair > before the : - does show the actual > process name rather than any thread name. > > I'm using LTTng 2.0.4 and babeltrace 1.0.0.rc5. I set up my trace > using these steps: > > lttng create > lttng enable-channel channel0 --userspace > lttng add-context --userspace -t vpid -t vtid -t procname -t pthread_id > lttng enable-event --userspace --all > > It would be great if "procname" was replaced by "threadname" and > showed each thread's name correctly, since the process's name is > already shown, but the thread names would be very useful. Otherwise, > at the very least, "procname" should probably show the same thing that > is shown at the start of the line rather than incorrectly using the > name of one of the threads. We currently choose to get the process name only once and cache it for performance reasons: we don't want to call prctl (and thus pay a round-trip to the kernel) each and every time an event is traced. I agree with you, though, that maybe this shortcut is not strictly right semantically speaking. 
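
[Editorial sketch] The behaviour described above, a name fetched once and then reused for every event, can be reproduced with a few lines of C. This is a minimal, Linux-only sketch and not the lttng-ust code: the helper names (procname_cached_once, procname_uncached) and the cache variables are hypothetical, and the cache is not protected against concurrent first use. It only shows why a cache filled once keeps reporting the first name it saw, while an uncached lookup pays one prctl() call per event but follows later PR_SET_NAME changes.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

#define NAME_LEN 16	/* PR_GET_NAME writes at most 16 bytes, NUL included */

/* Hypothetical process-wide cache, filled by whichever caller traces first. */
static char cached_name[NAME_LEN];
static int cached_name_valid;

/* Cached lookup: one kernel round-trip, then the same answer forever. */
static const char *procname_cached_once(void)
{
	if (!cached_name_valid) {
		if (prctl(PR_GET_NAME, (unsigned long) cached_name, 0, 0, 0) != 0)
			strcpy(cached_name, "?");
		cached_name_valid = 1;
	}
	return cached_name;
}

/* Uncached lookup: always asks the kernel for the calling thread's name. */
static const char *procname_uncached(char buf[NAME_LEN])
{
	assert(buf != NULL);
	if (prctl(PR_GET_NAME, (unsigned long) buf, 0, 0, 0) != 0)
		strcpy(buf, "?");
	return buf;
}
```

Since PR_SET_NAME applies to the calling thread only, a second thread calling procname_cached_once() would get the name of whichever thread filled the cache, which matches the "mythread31" output reported for vtid 22760.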
One approach we could take would be to do like liblttng-ust-fork, and provide a way to override prctl(), and keep a procname per-thread value cached in user-space. This could fit in liblttng-ust-libc, which can be optionally loaded. When available, the procname context in UST would use this cached information instead of the info cached per process. However, this change is quite intrusive at this point of the -rc cycle. Given that getting the process name into a context can be expected to be an operation that might degrade performances (lttng offers no guarantees about how fast it is to grab each context), we could start by simply disabling the procname cache in user-space to get the semantic right. Somewhere in the future, we can then proceed to do the libc override to improve performance again. Thoughts ? > > Incidentally, why do I have to do "enable-channel" to get the context > to actually appear? If I don't do this, the "add-context" still says > that each of the four contexts was "added to all channels", but they > don't appear in 'lttng view'. Please provide the full list of commands you do to reproduce a "good" and "bad" behavior. Thanks, Mathieu > > Thanks in advance, > David > > ---------------------------------------------------------------------- > The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt. 
> > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Wed Oct 24 08:35:51 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 24 Oct 2012 08:35:51 -0400 Subject: [lttng-dev] Wrong procname for userspace trace of app with different thread names In-Reply-To: <20121024123207.GA9568@Krystal> References: <20998D40D9A2B7499CA5A3A2666CB1EB19F7CBC4@ZURMSG1.QUANTUM.com> <20121024123207.GA9568@Krystal> Message-ID: <20121024123551.GB9568@Krystal> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > * David OShea (David.OShea at quantum.com) wrote: > > Hi all, > > > > I performed a userspace trace of a multi-threaded application that > > uses prctl(PR_SET_NAME, ...) to give its threads names. From the > > output of 'lttng view' below (I removed timestamps to make it > > shorter), it appears that UST is populating the "procname" by getting > > the process/thread name from the first thread it sees for a given PID, > > then assuming that name applies to all threads in the process: > > Yes, this is right. 
> > > > > mydaemon:22593 mydaemon:start: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } > > mydaemon:22593 mydaemon:end: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } > > myclient:28277 myclient:end: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } > > myclient:28277 myclient:start: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } > > mydaemon:22593 mydaemon:start: { }, { pthread_id = 1485908288, procname = "mythread31", vtid = 22760, vpid = 22593 }, { } > > > > On the first line, the procname contains the thread name for vtid = > > 22761. On the last line, the procname is still shown as "mythread31", > > but the vtid is different, and I can see from > > /proc/22593/task/22760/stat that the thread's name is "mythread30". > > However, the : at the start of each line - the pair > > before the : - does show the actual > > process name rather than any thread name. > > > > I'm using LTTng 2.0.4 and babeltrace 1.0.0.rc5. I set up my trace > > using these steps: > > > > lttng create > > lttng enable-channel channel0 --userspace > > lttng add-context --userspace -t vpid -t vtid -t procname -t pthread_id > > lttng enable-event --userspace --all > > > > It would be great if "procname" was replaced by "threadname" and > > showed each thread's name correctly, since the process's name is > > already shown, but the thread names would be very useful. Otherwise, > > at the very least, "procname" should probably show the same thing that > > is shown at the start of the line rather than incorrectly using the > > name of one of the threads. > > We currently choose to get the process name only once and cache it > for performance reasons: we don't want to call prctl (and thus pay a > round-trip to the kernel) each and every time an event is traced. 
> > I agree with you, though, that maybe this shortcut is not strictly right > semantically speaking. > > One approach we could take would be to do like liblttng-ust-fork, and > provide a way to override prctl(), and keep a procname per-thread value > cached in user-space. This could fit in liblttng-ust-libc, which can be > optionally loaded. When available, the procname context in UST would use > this cached information instead of the info cached per process. However, > this change is quite intrusive at this point of the -rc cycle. > > Given that getting the process name into a context can be expected to be > an operation that might degrade performances (lttng offers no guarantees > about how fast it is to grab each context), we could start by simply > disabling the procname cache in user-space to get the semantic right. > Somewhere in the future, we can then proceed to do the libc override to > improve performance again. Thoughts ? Now that I come to think of it, maybe there would be a middle-ground here: we could save the procname into a thread-local storage rather than a global variable. Therefore, as long as you issue prctl() on your thread before the first event is logged for that thread, you'll cache the appropriate thread name. I expect that typical use-cases involve calling prctl right after the thread starts, so it might just work (this would need to be documented though). Thoughts ? Thanks, Mathieu > > > > > Incidentally, why do I have to do "enable-channel" to get the context > > to actually appear? If I don't do this, the "add-context" still says > > that each of the four contexts was "added to all channels", but they > > don't appear in 'lttng view'. > > Please provide the full list of commands you do to reproduce a "good" > and "bad" behavior. 
> > Thanks, > > Mathieu > > > > > > Thanks in advance, > > David > > > > ---------------------------------------------------------------------- > > The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt. > > > > _______________________________________________ > > lttng-dev mailing list > > lttng-dev at lists.lttng.org > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. > http://www.efficios.com > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From mathieu.desnoyers at efficios.com Wed Oct 24 09:06:01 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Wed, 24 Oct 2012 09:06:01 -0400 Subject: [lttng-dev] Wrong procname for userspace trace of app with different thread names In-Reply-To: <20121024123551.GB9568@Krystal> References: <20998D40D9A2B7499CA5A3A2666CB1EB19F7CBC4@ZURMSG1.QUANTUM.com> <20121024123207.GA9568@Krystal> <20121024123551.GB9568@Krystal> Message-ID: <20121024130601.GA10310@Krystal> * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > * Mathieu Desnoyers (mathieu.desnoyers at efficios.com) wrote: > > * David OShea (David.OShea at quantum.com) wrote: > > > Hi all, > > > > > > I performed a userspace trace of a multi-threaded application that > > > uses prctl(PR_SET_NAME, ...) to give its threads names. From the > > > output of 'lttng view' below (I removed timestamps to make it > > > shorter), it appears that UST is populating the "procname" by getting > > > the process/thread name from the first thread it sees for a given PID, > > > then assuming that name applies to all threads in the process: > > > > Yes, this is right. > > > > > > > > mydaemon:22593 mydaemon:start: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } > > > mydaemon:22593 mydaemon:end: { }, { pthread_id = 1496398144, procname = "mythread31", vtid = 22761, vpid = 22593 }, { } > > > myclient:28277 myclient:end: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } > > > myclient:28277 myclient:start: { }, { pthread_id = 47216641758592, vtid = 28277, vpid = 28277, procname = "myclient" }, { } > > > mydaemon:22593 mydaemon:start: { }, { pthread_id = 1485908288, procname = "mythread31", vtid = 22760, vpid = 22593 }, { } > > > > > > On the first line, the procname contains the thread name for vtid = > > > 22761. 
On the last line, the procname is still shown as "mythread31", > > > but the vtid is different, and I can see from > > > /proc/22593/task/22760/stat that the thread's name is "mythread30". > > > However, the : at the start of each line - the pair > > > before the : - does show the actual > > > process name rather than any thread name. > > > > > > I'm using LTTng 2.0.4 and babeltrace 1.0.0.rc5. I set up my trace > > > using these steps: > > > > > > lttng create > > > lttng enable-channel channel0 --userspace > > > lttng add-context --userspace -t vpid -t vtid -t procname -t pthread_id > > > lttng enable-event --userspace --all > > > > > > It would be great if "procname" was replaced by "threadname" and > > > showed each thread's name correctly, since the process's name is > > > already shown, but the thread names would be very useful. Otherwise, > > > at the very least, "procname" should probably show the same thing that > > > is shown at the start of the line rather than incorrectly using the > > > name of one of the threads. > > > > We currently choose to get the process name only once and cache it > > for performance reasons: we don't want to call prctl (and thus pay a > > round-trip to the kernel) each and every time an event is traced. > > > > I agree with you, though, that maybe this shortcut is not strictly right > > semantically speaking. > > > > One approach we could take would be to do like liblttng-ust-fork, and > > provide a way to override prctl(), and keep a procname per-thread value > > cached in user-space. This could fit in liblttng-ust-libc, which can be > > optionally loaded. When available, the procname context in UST would use > > this cached information instead of the info cached per process. However, > > this change is quite intrusive at this point of the -rc cycle. 
> > > > Given that getting the process name into a context can be expected to be > > an operation that might degrade performances (lttng offers no guarantees > > about how fast it is to grab each context), we could start by simply > > disabling the procname cache in user-space to get the semantic right. > > Somewhere in the future, we can then proceed to do the libc override to > > improve performance again. Thoughts ? > > Now that I come to think of it, maybe there would be a middle-ground > here: we could save the procname into a thread-local storage rather than > a global variable. Therefore, as long as you issue prctl() on your > thread before the first event is logged for that thread, you'll cache > the appropriate thread name. > > I expect that typical use-cases involve calling prctl right after the > thread starts, so it might just work (this would need to be documented > though). Please try with lttng-ust HEAD, which includes: commit 009745db8ca05f7a3abbb37558b08eae0107f7e1 Author: Mathieu Desnoyers Date: Wed Oct 24 09:04:14 2012 -0400 Fix: procname context semantic Cache the procname per-thread rather than per-process to take into account that prctl() can be used to set thread names. prctl() should be issued before tracing each thread's first event if we care about the thread name. Signed-off-by: Mathieu Desnoyers > > Thoughts ? > > Thanks, > > Mathieu > > > > > > > > > Incidentally, why do I have to do "enable-channel" to get the context > > > to actually appear? If I don't do this, the "add-context" still says > > > that each of the four contexts was "added to all channels", but they > > > don't appear in 'lttng view'. > > > > Please provide the full list of commands you do to reproduce a "good" > > and "bad" behavior. > > > > Thanks, > > > > Mathieu > > > > > > > > > > Thanks in advance, > > > David > > > > > > ---------------------------------------------------------------------- > > > The information contained in this transmission may be confidential. 
-- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc.
http://www.efficios.com From David.OShea at quantum.com Thu Oct 25 00:07:54 2012 From: David.OShea at quantum.com (David OShea) Date: Thu, 25 Oct 2012 04:07:54 +0000 Subject: [lttng-dev] Wrong procname for userspace trace of app with different thread names In-Reply-To: <20121024130601.GA10310@Krystal> References: <20998D40D9A2B7499CA5A3A2666CB1EB19F7CBC4@ZURMSG1.QUANTUM.com> <20121024123207.GA9568@Krystal> <20121024123551.GB9568@Krystal> <20121024130601.GA10310@Krystal> Message-ID: <20998D40D9A2B7499CA5A3A2666CB1EB19F81C78@ZURMSG1.QUANTUM.com> Hi Mathieu, > -----Original Message----- > From: Mathieu Desnoyers [mailto:mathieu.desnoyers at efficios.com] > Sent: Wednesday, 24 October 2012 11:36 PM > To: David OShea > Cc: lttng-dev at lists.lttng.org > Subject: Re: [lttng-dev] Wrong procname for userspace trace of app with > different thread names > > I expect that typical use-cases involve calling prctl right after the > > thread starts, so it might just work (this would need to be > documented > > though). > > Please try with lttng-ust HEAD, which includes: [...] > > commit 009745db8ca05f7a3abbb37558b08eae0107f7e1 Thanks, I just applied that commit as a patch to the older version of lttng-ust I'm using and it works as desired! I think the assumption that prctl() is called right after the thread starts is a reasonable one. > > > > Incidentally, why do I have to do "enable-channel" to get the context > > > > to actually appear? If I don't do this, the "add-context" still says > > > > that each of the four contexts was "added to all channels", but they > > > > don't appear in 'lttng view'. > > > > > > Please provide the full list of commands you do to reproduce a "good" > > > and "bad" behavior. Thanks, I'll do a few tests and make a separate post about this. Regards, David
From Andrew.McDermott at windriver.com Thu Oct 25 07:49:41 2012 From: Andrew.McDermott at windriver.com (McDermott, Andrew) Date: Thu, 25 Oct 2012 11:49:41 +0000 Subject: [lttng-dev] getting a notification from 'lttng start|stop' Message-ID: <7F632A9222059A42AF70FCB7965774AA208660DB@ALA-MBB.corp.ad.wrs.com> Is it possible to see when tracing has actually started (i.e., when `lttng start' is invoked)? I can observe that the tracepoint is enabled/disabled by looking at the 'state' field in a 'struct tracepoint'. Is there an equivalent variable to monitor for `lttng start|stop'? I was trying to dump some additional process state (via calls to tracepoint) whenever an external start/stop/start/stop/start sequence occurs. Also, when I run 'lttng enable-event -u -a' I see that the 'state' value changes from '0' to '1', but if I run 'lttng disable-event -u -a' it stays at '1'. Is this expected? If I subsequently run 'lttng destroy' then it does go to '0'. Shouldn't the disable set it to '0' too?
-- andy From jdesfossez at efficios.com Thu Oct 25 14:40:31 2012 From: jdesfossez at efficios.com (Julien Desfossez) Date: Thu, 25 Oct 2012 14:40:31 -0400 Subject: [lttng-dev] status of lttng top In-Reply-To: <7F632A9222059A42AF70FCB7965774AA20860D77@ALA-MBB.corp.ad.wrs.com> References: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> <507389DE.4000204@efficios.com> <7F632A9222059A42AF70FCB7965774AA20854D6F@ALA-MBB.corp.ad.wrs.com> <50858BF9.705@efficios.com> <7F632A9222059A42AF70FCB7965774AA20860D77@ALA-MBB.corp.ad.wrs.com> Message-ID: <5089879F.50701@efficios.com> Hi, >>>> LTTngTop is still work in progress and will remain that way for a long >>>> time, but the version in the PPA (or in the master branch in git) is >>>> perfectly usable for offline traces (traces recorded and replayed >>>> through LTTngTop). >>>> >>>> The "live" branch is more experimental and requires patches in both >>>> Babeltrace and Lttng-tools (all documented in the README-LIVE file), but >>>> it worked at the time of Plumbers, I didn't have much time since then to >>>> rebase the branches. >>>> >>>> I am waiting for the release of Lttng-tools 2.1 (currently in RC) before >>>> merging those patches. After these patches are integrated, LTTngTop will >>>> be able to work live without any modifications, so directly reading >>>> traces in memory shared with the tracer. >>> >>> Thanks for this info. >>> >>> Right now my interest is with the live streaming; we have a use case >>> where the live streaming is really the only practical solution. >>> >>> Very roughly, would you expect the RC series to conclude this year, or >>> (early) next year? >> >> Just to clarify, are you interested in live network trace reading or >> live in-memory reading ? >> The patches I was talking about are for in-memory trace reading. > > So I guess I don't understand enough of the low-level detail here. 
What > I was interested in was being able to consume events, maybe periodically > (1 /s), from a trace written by another process on the same machine. I > guess that would fall under in-memory trace reading. > Ok I will just describe this a little more, when we talk about live reading the trace, we have two aspects : - reading a trace while it is being written on disk (whether it is received from the network or from a local consumer) - reading a trace directly from memory mapped buffers between the tracer and the consumer without writing the trace files. So if you want to read the trace on the machine that is being traced without ever writing the trace on disk, yes you want the in-memory trace reading. For 2.2, the focus is to support live trace reading from disk (local and network). In my development branches (referenced in previous email), I have code that provides live trace reading from memory, I will try to merge it in 2.2 but I cannot guarantee it will be accepted since it is not the current priority (but definitely a use-case we want to support). I hope it clarifies the situation, Thanks, Julien From dgoulet at efficios.com Thu Oct 25 15:52:04 2012 From: dgoulet at efficios.com (David Goulet) Date: Thu, 25 Oct 2012 15:52:04 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: consumer relayd cleanup on disconnect Message-ID: <1351194724-7860-1-git-send-email-dgoulet@efficios.com> Improve the resilience of the consumer by cleaning up a relayd object and all associated streams when a write error is detected on a relayd socket. 
Fixes #385 Signed-off-by: David Goulet --- src/common/consumer.c | 244 ++++++++++++++++++++++++++++++++++++++++---- src/common/consumer.h | 2 + src/common/relayd/relayd.c | 3 + 3 files changed, 229 insertions(+), 20 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 53c6180..e61a227 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -70,6 +70,21 @@ struct lttng_ht *metadata_ht; struct lttng_ht *data_ht; /* + * Notify a thread pipe to poll back again. This usually means that some global + * state has changed so we just send back the thread in a poll wait call. + */ +static void notify_thread_pipe(int wpipe) +{ + int ret; + + do { + struct lttng_consumer_stream *null_stream = NULL; + + ret = write(wpipe, &null_stream, sizeof(null_stream)); + } while (ret < 0 && errno == EINTR); +} + +/* * Find a stream. The consumer_data.lock must be locked during this * call. */ @@ -182,6 +197,14 @@ static void consumer_rcu_free_relayd(struct rcu_head *head) struct consumer_relayd_sock_pair *relayd = caa_container_of(node, struct consumer_relayd_sock_pair, node); + /* + * Close all sockets. This is done in the call RCU since we don't want the + * socket fds to be reassigned thus potentially creating bad state of the + * relayd object. 
+ */ + (void) relayd_close(&relayd->control_sock); + (void) relayd_close(&relayd->data_sock); + free(relayd); } @@ -204,21 +227,86 @@ static void destroy_relayd(struct consumer_relayd_sock_pair *relayd) iter.iter.node = &relayd->node.node; ret = lttng_ht_del(consumer_data.relayd_ht, &iter); if (ret != 0) { - /* We assume the relayd was already destroyed */ + /* We assume the relayd is being or is destroyed */ return; } - /* Close all sockets */ - pthread_mutex_lock(&relayd->ctrl_sock_mutex); - (void) relayd_close(&relayd->control_sock); - pthread_mutex_unlock(&relayd->ctrl_sock_mutex); - (void) relayd_close(&relayd->data_sock); - /* RCU free() call */ call_rcu(&relayd->node.head, consumer_rcu_free_relayd); } /* + * Set the delete flag to all streams having the given network sequence index + * (relayd index). + */ +static void set_delete_flag_stream_by_netidx(int net_seq_idx) +{ + struct lttng_ht_iter iter; + struct lttng_consumer_stream *stream; + + DBG("Consumer set delete flag on stream by idx %d", net_seq_idx); + + rcu_read_lock(); + + /* Let's begin with metadata */ + cds_lfht_for_each_entry(metadata_ht->ht, &iter.iter, stream, node.node) { + if (stream->net_seq_idx == net_seq_idx) { + uatomic_set(&stream->delete_flag, 1); + DBG("Delete flag set to metadata stream %d", stream->wait_fd); + } + } + + /* Follow up by the data streams */ + cds_lfht_for_each_entry(data_ht->ht, &iter.iter, stream, node.node) { + if (stream->net_seq_idx == net_seq_idx) { + uatomic_set(&stream->delete_flag, 1); + DBG("Delete flag set to data stream %d", stream->wait_fd); + } + } + rcu_read_unlock(); +} + +/* + * Cleanup a relayd object by flagging every associated streams for deletion, + * destroying the object meaning removing it from the relayd hash table, + * closing the sockets and freeing the memory in a RCU call. + * + * If a local data context is available, notify the threads that the streams' + * state have changed. 
+ */ +static void cleanup_relayd(struct consumer_relayd_sock_pair *relayd, + struct lttng_consumer_local_data *ctx) +{ + int netidx; + + if (!relayd) { + /* Well... no fun */ + return; + } + + /* Save the net sequence index before destroying the object */ + netidx = relayd->net_seq_idx; + + /* + * Delete the relayd from the relayd hash table, close the sockets and free + * the object in a RCU call. + */ + destroy_relayd(relayd); + + /* For all streams associated with the relayd, flag them for deletion. */ + set_delete_flag_stream_by_netidx(netidx); + + /* + * With a local data context, notify the threads that the streams' state + * have changed. + */ + if (ctx) { + notify_thread_pipe(ctx->consumer_data_pipe[1]); + notify_thread_pipe(ctx->consumer_metadata_pipe[1]); + } +} + +/* * Flag a relayd socket pair for destruction. Destroy it if the refcount * reaches zero. * @@ -251,11 +339,15 @@ void consumer_del_stream(struct lttng_consumer_stream *stream, assert(stream); + DBG("Consumer del stream %d", stream->wait_fd); + if (ht == NULL) { /* Means the stream was allocated but not successfully added */ goto free_stream; } + pthread_mutex_lock(&stream->lock); + pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { @@ -349,6 +441,7 @@ void consumer_del_stream(struct lttng_consumer_stream *stream, end: consumer_data.need_update = 1; pthread_mutex_unlock(&consumer_data.lock); + pthread_mutex_unlock(&stream->lock); if (free_chan) { consumer_del_channel(free_chan); @@ -804,7 +897,8 @@ static int consumer_update_poll_array( DBG("Updating poll fd array"); rcu_read_lock(); cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { - if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM) { + if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM || + stream->delete_flag) { continue; } DBG("Active FD %d", stream->wait_fd); @@ -1169,6 +1263,7 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( /* Default is on the disk */ int outfd = stream->out_fd; struct 
consumer_relayd_sock_pair *relayd = NULL; + unsigned int relayd_hang_up = 0; /* RCU lock for the relayd pointer */ rcu_read_lock(); @@ -1228,11 +1323,22 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( ret = write_relayd_metadata_id(outfd, stream, relayd, padding); if (ret < 0) { written = ret; + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EPIPE || ret == -EINVAL) { + relayd_hang_up = 1; + goto write_error; + } goto end; } } + } else { + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EPIPE || ret == -EINVAL) { + relayd_hang_up = 1; + goto write_error; + } + /* Else, use the default set before which is the filesystem. */ } - /* Else, use the default set before which is the filesystem. */ } else { /* No streaming, we have to set the len with the full padding */ len += padding; @@ -1248,6 +1354,11 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( if (written == 0) { written = ret; } + /* Socket operation failed. We consider the relayd dead */ + if (errno == EPIPE || errno == EINVAL) { + relayd_hang_up = 1; + goto write_error; + } goto end; } else if (ret > len) { PERROR("Error in file write (ret %zd > len %lu)", ret, len); @@ -1269,6 +1380,15 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( } lttng_consumer_sync_trace_file(stream, orig_offset); +write_error: + /* + * This is a special case that the relayd has closed its socket. Let's + * cleanup the relayd object and all associated streams. 
+ */ + if (relayd && relayd_hang_up) { + cleanup_relayd(relayd, ctx); + } + end: /* Unlock only if ctrl socket used */ if (relayd && stream->metadata_flag) { @@ -1298,6 +1418,7 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( int outfd = stream->out_fd; struct consumer_relayd_sock_pair *relayd = NULL; int *splice_pipe; + unsigned int relayd_hang_up = 0; switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -1350,6 +1471,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( padding); if (ret < 0) { written = ret; + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EBADF) { + WARN("Remote relayd disconnected. Stopping"); + relayd_hang_up = 1; + goto write_error; + } goto end; } @@ -1361,7 +1488,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( /* Use the returned socket. */ outfd = ret; } else { - ERR("Remote relayd disconnected. Stopping"); + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EBADF) { + WARN("Remote relayd disconnected. Stopping"); + relayd_hang_up = 1; + goto write_error; + } goto end; } } else { @@ -1410,6 +1542,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( if (written == 0) { written = ret_splice; } + /* Socket operation failed. We consider the relayd dead */ + if (errno == EBADF) { + WARN("Remote relayd disconnected. Stopping"); + relayd_hang_up = 1; + goto write_error; + } ret = errno; goto splice_error; } else if (ret_splice > len) { @@ -1437,12 +1575,20 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( goto end; +write_error: + /* + * This is a special case that the relayd has closed its socket. Let's + * cleanup the relayd object and all associated streams. + */ + if (relayd && relayd_hang_up) { + cleanup_relayd(relayd, ctx); + /* Let's not make fail the consumer for a disconnected relayd. 
*/ + goto end; + } + splice_error: /* send the appropriate error description to sessiond */ switch (ret) { - case EBADF: - lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_SPLICE_EBADF); - break; case EINVAL: lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_SPLICE_EINVAL); break; @@ -1604,6 +1750,8 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, goto free_stream; } + pthread_mutex_lock(&stream->lock); + pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -1695,6 +1843,7 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, end: pthread_mutex_unlock(&consumer_data.lock); + pthread_mutex_unlock(&stream->lock); if (free_chan) { consumer_del_channel(free_chan); @@ -1766,6 +1915,58 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, } /* + * Delete data stream that are flagged for deletion (delete_flag). + */ +static void delete_flagged_data_stream(void) +{ + struct lttng_ht_iter iter; + struct lttng_consumer_stream *stream; + + DBG("Consumer delete flagged data stream"); + + rcu_read_lock(); + cds_lfht_for_each_entry(data_ht->ht, &iter.iter, stream, node.node) { + /* Validate delete flag of the stream */ + if (!stream->delete_flag) { + continue; + } + /* Delete it right now */ + consumer_del_stream(stream, data_ht); + } + rcu_read_unlock(); +} + +/* + * Delete metadata stream that are flagged for deletion (delete_flag). + */ +static void delete_flagged_metadata_stream(struct lttng_poll_event *pollset) +{ + struct lttng_ht_iter iter; + struct lttng_consumer_stream *stream; + + DBG("Consumer delete flagged metadata stream"); + + assert(pollset); + + rcu_read_lock(); + cds_lfht_for_each_entry(metadata_ht->ht, &iter.iter, stream, node.node) { + /* Validate delete flag of the stream */ + if (!stream->delete_flag) { + continue; + } + /* + * Remove from pollset so the metadata thread can continue without + * blocking on a deleted stream. 
+ */ + lttng_poll_del(pollset, stream->wait_fd); + + /* Delete it right now */ + consumer_del_metadata_stream(stream, metadata_ht); + } + rcu_read_unlock(); +} + +/* * Thread polls on metadata file descriptor and write them on disk or on the * network. */ @@ -1856,6 +2057,13 @@ restart: continue; } + /* A NULL stream means that the state has changed. */ + if (stream == NULL) { + /* Check for deleted streams. */ + delete_flagged_metadata_stream(&events); + continue; + } + DBG("Adding metadata stream %d to poll set", stream->wait_fd); @@ -2063,6 +2271,7 @@ void *consumer_thread_data_poll(void *data) * waking us up to test it. */ if (new_stream == NULL) { + delete_flagged_data_stream(); continue; } @@ -2301,14 +2510,9 @@ end: /* * Notify the data poll thread to poll back again and test the - * consumer_quit state to quit gracefully. + * consumer_quit state that we just set so to quit gracefully. */ - do { - struct lttng_consumer_stream *null_stream = NULL; - - ret = write(ctx->consumer_data_pipe[1], &null_stream, - sizeof(null_stream)); - } while (ret < 0 && errno == EINTR); + notify_thread_pipe(ctx->consumer_data_pipe[1]); rcu_unregister_thread(); return NULL; diff --git a/src/common/consumer.h b/src/common/consumer.h index 53b6151..99ce325 100644 --- a/src/common/consumer.h +++ b/src/common/consumer.h @@ -150,6 +150,8 @@ struct lttng_consumer_stream { pthread_mutex_t lock; /* Tracing session id */ uint64_t session_id; + /* Delete flag. 
This indicates that the stream must be deleted */ + unsigned int delete_flag; }; /* diff --git a/src/common/relayd/relayd.c b/src/common/relayd/relayd.c index 785d3dc..db47608 100644 --- a/src/common/relayd/relayd.c +++ b/src/common/relayd/relayd.c @@ -67,6 +67,7 @@ static int send_command(struct lttcomm_sock *sock, ret = sock->ops->sendmsg(sock, buf, buf_size, flags); if (ret < 0) { + ret = -errno; goto error; } @@ -90,6 +91,7 @@ static int recv_reply(struct lttcomm_sock *sock, void *data, size_t size) ret = sock->ops->recvmsg(sock, data, size, 0); if (ret < 0) { + ret = -errno; goto error; } @@ -283,6 +285,7 @@ int relayd_send_data_hdr(struct lttcomm_sock *sock, /* Only send data header. */ ret = sock->ops->sendmsg(sock, hdr, size, 0); if (ret < 0) { + ret = -errno; goto error; } -- 1.7.10.4 From dgoulet at efficios.com Thu Oct 25 15:53:47 2012 From: dgoulet at efficios.com (David Goulet) Date: Thu, 25 Oct 2012 15:53:47 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Synchronization issue for data available command Message-ID: <1351194827-8604-1-git-send-email-dgoulet@efficios.com> Signed-off-by: David Goulet --- src/common/consumer.c | 66 +++++++++++++++++++++++--- src/common/kernel-consumer/kernel-consumer.c | 19 ++------ src/common/ust-consumer/ust-consumer.c | 19 ++------ 3 files changed, 66 insertions(+), 38 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index e61a227..cf0e715 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -2657,6 +2657,33 @@ error: } /* + * Try to lock the stream mutex. + * + * On success, 1 is returned else 0 indicating that the mutex is NOT lock. + */ +static int stream_try_lock(struct lttng_consumer_stream *stream) +{ + int ret; + + assert(stream); + + /* + * Try to lock the stream mutex. On failure, we know that the stream is + * being used else where hence there is data still being extracted. 
+ */ + ret = pthread_mutex_trylock(&stream->lock); + if (ret == EBUSY) { + ret = 0; + goto end; + } + + ret = 1; + +end: + return ret; +} + +/* * Check if for a given session id there is still data needed to be extract * from the buffers. * @@ -2696,17 +2723,43 @@ int consumer_data_available(uint64_t id) ht->hash_fct((void *)((unsigned long) id), 0x42UL), ht->match_fct, (void *)((unsigned long) id), &iter.iter, stream, node_session_id.node) { - /* Check the stream for data. */ - ret = data_available(stream); - if (ret == 0) { + /* If this call fails, the stream is being used hence data pending. */ + ret = stream_try_lock(stream); + if (!ret) { goto data_not_available; } + /* + * A removed node from the hash table indicates that the stream has + * been deleted thus having a guarantee that the buffers are closed + * on the consumer side. However, data can still be transmitted + * over the network so don't skip the relayd check. + */ + ret = cds_lfht_is_node_deleted(&stream->node.node); + if (!ret) { + /* Check the stream if there is data in the buffers. */ + ret = data_available(stream); + if (ret == 0) { + pthread_mutex_unlock(&stream->lock); + goto data_not_available; + } + } + + /* Relayd check */ if (stream->net_seq_idx != -1) { relayd = consumer_find_relayd(stream->net_seq_idx); - assert(relayd); + if (!relayd) { + /* + * At this point, if the relayd object is not available for the + * given stream, it is because the relayd is being cleanup so + * every stream associated with it (for a session id value) are + * or wil be marked for deletion hence not having data pending + * anymore. 
+ */ + pthread_mutex_unlock(&stream->lock); + goto data_not_available; + } - pthread_mutex_lock(&stream->lock); pthread_mutex_lock(&relayd->ctrl_sock_mutex); if (stream->metadata_flag) { ret = relayd_quiescent_control(&relayd->control_sock); @@ -2715,11 +2768,12 @@ int consumer_data_available(uint64_t id) stream->relayd_stream_id, stream->next_net_seq_num); } pthread_mutex_unlock(&relayd->ctrl_sock_mutex); - pthread_mutex_unlock(&stream->lock); if (ret == 0) { + pthread_mutex_unlock(&stream->lock); goto data_not_available; } } + pthread_mutex_unlock(&stream->lock); } /* diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c index 249df8a..196deee 100644 --- a/src/common/kernel-consumer/kernel-consumer.c +++ b/src/common/kernel-consumer/kernel-consumer.c @@ -485,7 +485,8 @@ error: /* * Check if data is still being extracted from the buffers for a specific - * stream. Consumer data lock MUST be acquired before calling this function. + * stream. Consumer data lock MUST be acquired before calling this function + * and the stream lock. * * Return 0 if the traced data are still getting read else 1 meaning that the * data is available for trace viewer reading. @@ -496,31 +497,17 @@ int lttng_kconsumer_data_available(struct lttng_consumer_stream *stream) assert(stream); - /* - * Try to lock the stream mutex. On failure, we know that the stream is - * being used else where hence there is data still being extracted. - */ - ret = pthread_mutex_trylock(&stream->lock); - if (ret == EBUSY) { - /* Data not available */ - ret = 0; - goto end; - } - /* The stream is now locked so we can do our ustctl calls */ - ret = kernctl_get_next_subbuf(stream->wait_fd); if (ret == 0) { /* There is still data so let's put back this subbuffer. */ ret = kernctl_put_subbuf(stream->wait_fd); assert(ret == 0); - goto end_unlock; + goto end; } /* Data is available to be read for this stream. 
*/ ret = 1; -end_unlock: - pthread_mutex_unlock(&stream->lock); end: return ret; } diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index e8e3f93..4d3671a 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -526,7 +526,8 @@ error: /* * Check if data is still being extracted from the buffers for a specific - * stream. Consumer data lock MUST be acquired before calling this function. + * stream. Consumer data lock MUST be acquired before calling this function + * and the stream lock. * * Return 0 if the traced data are still getting read else 1 meaning that the * data is available for trace viewer reading. @@ -539,31 +540,17 @@ int lttng_ustconsumer_data_available(struct lttng_consumer_stream *stream) DBG("UST consumer checking data availability"); - /* - * Try to lock the stream mutex. On failure, we know that the stream is - * being used else where hence there is data still being extracted. - */ - ret = pthread_mutex_trylock(&stream->lock); - if (ret == EBUSY) { - /* Data not available */ - ret = 0; - goto end; - } - /* The stream is now locked so we can do our ustctl calls */ - ret = ustctl_get_next_subbuf(stream->chan->handle, stream->buf); if (ret == 0) { /* There is still data so let's put back this subbuffer. */ ret = ustctl_put_subbuf(stream->chan->handle, stream->buf); assert(ret == 0); - goto end_unlock; + goto end; } /* Data is available to be read for this stream. 
*/ ret = 1; -end_unlock: - pthread_mutex_unlock(&stream->lock); end: return ret; } -- 1.7.10.4 From dgoulet at efficios.com Thu Oct 25 15:56:29 2012 From: dgoulet at efficios.com (David Goulet) Date: Thu, 25 Oct 2012 15:56:29 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Handle the unary bitwise negation operator (~) in the XML printer In-Reply-To: <1351021001-18480-1-git-send-email-christian.babeux@efficios.com> References: <1351021001-18480-1-git-send-email-christian.babeux@efficios.com> Message-ID: <5089996D.9080103@efficios.com> Merged! Christian Babeux: > Signed-off-by: Christian Babeux > --- > src/lib/lttng-ctl/filter/filter-visitor-xml.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/lib/lttng-ctl/filter/filter-visitor-xml.c b/src/lib/lttng-ctl/filter/filter-visitor-xml.c > index 90a336d..1d14f2e 100644 > --- a/src/lib/lttng-ctl/filter/filter-visitor-xml.c > +++ b/src/lib/lttng-ctl/filter/filter-visitor-xml.c > @@ -235,6 +235,9 @@ int recursive_visit_print(struct filter_node *node, FILE *stream, int indent) > case AST_UNARY_NOT: > fprintf(stream, "\"!\""); > break; > + case AST_UNARY_BIN_NOT: > + fprintf(stream, "\"~\""); > + break; > } > fprintf(stream, ">\n"); > ret = recursive_visit_print(node->u.unary_op.child, From dgoulet at efficios.com Thu Oct 25 15:56:30 2012 From: dgoulet at efficios.com (David Goulet) Date: Thu, 25 Oct 2012 15:56:30 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Possible memory leaks when creating filter IR root node In-Reply-To: <1351020977-18412-1-git-send-email-christian.babeux@efficios.com> References: <1351020977-18412-1-git-send-email-christian.babeux@efficios.com> Message-ID: <5089996E.80201@efficios.com> Merged! 
Christian Babeux:
> Signed-off-by: Christian Babeux
> ---
>  src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c b/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c
> index eec78fc..84122c9 100644
> --- a/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c
> +++ b/src/lib/lttng-ctl/filter/filter-visitor-generate-ir.c
> @@ -46,9 +46,11 @@ struct ir_op *make_op_root(struct ir_op *child, enum ir_side side)
>  	case IR_DATA_UNKNOWN:
>  	default:
>  		fprintf(stderr, "[error] Unknown root child data type\n");
> +		free(op);
>  		return NULL;
>  	case IR_DATA_STRING:
>  		fprintf(stderr, "[error] String cannot be root data type\n");
> +		free(op);
>  		return NULL;
>  	case IR_DATA_NUMERIC:
>  	case IR_DATA_FIELD_REF:

From mathieu.desnoyers at efficios.com  Thu Oct 25 17:04:18 2012
From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers)
Date: Thu, 25 Oct 2012 17:04:18 -0400
Subject: [lttng-dev] [PATCH lttng-tools] Fix: Synchronization issue for data available command
In-Reply-To: <1351194827-8604-1-git-send-email-dgoulet@efficios.com>
References: <1351194827-8604-1-git-send-email-dgoulet@efficios.com>
Message-ID: <20121025210418.GA1014@Krystal>

* David Goulet (dgoulet at efficios.com) wrote:
> Signed-off-by: David Goulet
> ---
>  src/common/consumer.c                        | 66 +++++++++++++++++++++++---
>  src/common/kernel-consumer/kernel-consumer.c | 19 ++------
>  src/common/ust-consumer/ust-consumer.c       | 19 ++------
>  3 files changed, 66 insertions(+), 38 deletions(-)
>
> diff --git a/src/common/consumer.c b/src/common/consumer.c
> index e61a227..cf0e715 100644
> --- a/src/common/consumer.c
> +++ b/src/common/consumer.c
> @@ -2657,6 +2657,33 @@ error:
>  }
>
>  /*
> + * Try to lock the stream mutex.
> + *
> + * On success, 1 is returned else 0 indicating that the mutex is NOT lock.
> + */
> +static int stream_try_lock(struct lttng_consumer_stream *stream)
> +{
> +	int ret;
> +
> +	assert(stream);
> +
> +	/*
> +	 * Try to lock the stream mutex. On failure, we know that the stream is
> +	 * being used else where hence there is data still being extracted.
> +	 */
> +	ret = pthread_mutex_trylock(&stream->lock);
> +	if (ret == EBUSY) {
> +		ret = 0;
> +		goto end;
> +	}

what are we doing with the other errors ?

> +
> +	ret = 1;
> +
> +end:
> +	return ret;
> +}
> +
> +/*
>   * Check if for a given session id there is still data needed to be extract
>   * from the buffers.
>   *
> @@ -2696,17 +2723,43 @@ int consumer_data_available(uint64_t id)
>  			ht->hash_fct((void *)((unsigned long) id), 0x42UL),
>  			ht->match_fct, (void *)((unsigned long) id),
>  			&iter.iter, stream, node_session_id.node) {
> -		/* Check the stream for data. */
> -		ret = data_available(stream);
> -		if (ret == 0) {
> +		/* If this call fails, the stream is being used hence data pending. */
> +		ret = stream_try_lock(stream);
> +		if (!ret) {
>  			goto data_not_available;
>  		}
>
> +		/*
> +		 * A removed node from the hash table indicates that the stream has
> +		 * been deleted thus having a guarantee that the buffers are closed
> +		 * on the consumer side. However, data can still be transmitted
> +		 * over the network so don't skip the relayd check.
> +		 */
> +		ret = cds_lfht_is_node_deleted(&stream->node.node);
> +		if (!ret) {
> +			/* Check the stream if there is data in the buffers. */
> +			ret = data_available(stream);
> +			if (ret == 0) {
> +				pthread_mutex_unlock(&stream->lock);
> +				goto data_not_available;
> +			}
> +		}
> +
> +		/* Relayd check */
>  		if (stream->net_seq_idx != -1) {
>  			relayd = consumer_find_relayd(stream->net_seq_idx);
> -			assert(relayd);
> +			if (!relayd) {
> +				/*
> +				 * At this point, if the relayd object is not available for the
> +				 * given stream, it is because the relayd is being cleanup so

cleanup -> cleaned up

> +				 * every stream associated with it (for a session id value) are
> +				 * or wil be marked for deletion hence not having data pending

hence not having -> hence do not have

wil -> will

> +				 * anymore.
> +				 */
> +				pthread_mutex_unlock(&stream->lock);
> +				goto data_not_available;
> +			}
>
> -			pthread_mutex_lock(&stream->lock);
>  			pthread_mutex_lock(&relayd->ctrl_sock_mutex);
>  			if (stream->metadata_flag) {
>  				ret = relayd_quiescent_control(&relayd->control_sock);
> @@ -2715,11 +2768,12 @@ int consumer_data_available(uint64_t id)
>  						stream->relayd_stream_id, stream->next_net_seq_num);
>  			}
>  			pthread_mutex_unlock(&relayd->ctrl_sock_mutex);
> -			pthread_mutex_unlock(&stream->lock);
>  			if (ret == 0) {
> +				pthread_mutex_unlock(&stream->lock);

dependency (lock nesting order) between

pthread_mutex_unlock(&relayd->ctrl_sock_mutex);
pthread_mutex_unlock(&stream->lock);

should be documented.

Thanks,

Mathieu

>  				goto data_not_available;
>  			}
>  		}
> +		pthread_mutex_unlock(&stream->lock);
>  	}
>
>  	/*
> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c
> index 249df8a..196deee 100644
> --- a/src/common/kernel-consumer/kernel-consumer.c
> +++ b/src/common/kernel-consumer/kernel-consumer.c
> @@ -485,7 +485,8 @@ error:
>
>  /*
>   * Check if data is still being extracted from the buffers for a specific
> - * stream. Consumer data lock MUST be acquired before calling this function.
> + * stream. Consumer data lock MUST be acquired before calling this function
> + * and the stream lock.
> * > * Return 0 if the traced data are still getting read else 1 meaning that the > * data is available for trace viewer reading. > @@ -496,31 +497,17 @@ int lttng_kconsumer_data_available(struct lttng_consumer_stream *stream) > > assert(stream); > > - /* > - * Try to lock the stream mutex. On failure, we know that the stream is > - * being used else where hence there is data still being extracted. > - */ > - ret = pthread_mutex_trylock(&stream->lock); > - if (ret == EBUSY) { > - /* Data not available */ > - ret = 0; > - goto end; > - } > - /* The stream is now locked so we can do our ustctl calls */ > - > ret = kernctl_get_next_subbuf(stream->wait_fd); > if (ret == 0) { > /* There is still data so let's put back this subbuffer. */ > ret = kernctl_put_subbuf(stream->wait_fd); > assert(ret == 0); > - goto end_unlock; > + goto end; > } > > /* Data is available to be read for this stream. */ > ret = 1; > > -end_unlock: > - pthread_mutex_unlock(&stream->lock); > end: > return ret; > } > diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > index e8e3f93..4d3671a 100644 > --- a/src/common/ust-consumer/ust-consumer.c > +++ b/src/common/ust-consumer/ust-consumer.c > @@ -526,7 +526,8 @@ error: > > /* > * Check if data is still being extracted from the buffers for a specific > - * stream. Consumer data lock MUST be acquired before calling this function. > + * stream. Consumer data lock MUST be acquired before calling this function > + * and the stream lock. > * > * Return 0 if the traced data are still getting read else 1 meaning that the > * data is available for trace viewer reading. > @@ -539,31 +540,17 @@ int lttng_ustconsumer_data_available(struct lttng_consumer_stream *stream) > > DBG("UST consumer checking data availability"); > > - /* > - * Try to lock the stream mutex. On failure, we know that the stream is > - * being used else where hence there is data still being extracted. 
> - */ > - ret = pthread_mutex_trylock(&stream->lock); > - if (ret == EBUSY) { > - /* Data not available */ > - ret = 0; > - goto end; > - } > - /* The stream is now locked so we can do our ustctl calls */ > - > ret = ustctl_get_next_subbuf(stream->chan->handle, stream->buf); > if (ret == 0) { > /* There is still data so let's put back this subbuffer. */ > ret = ustctl_put_subbuf(stream->chan->handle, stream->buf); > assert(ret == 0); > - goto end_unlock; > + goto end; > } > > /* Data is available to be read for this stream. */ > ret = 1; > > -end_unlock: > - pthread_mutex_unlock(&stream->lock); > end: > return ret; > } > -- > 1.7.10.4 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From dgoulet at efficios.com Thu Oct 25 18:01:36 2012 From: dgoulet at efficios.com (David Goulet) Date: Thu, 25 Oct 2012 18:01:36 -0400 Subject: [lttng-dev] [PATCH lttng-tools v2] Fix: consumer relayd cleanup on disconnect Message-ID: <1351202496-18892-1-git-send-email-dgoulet@efficios.com> Improve the resilience of the consumer by cleaning up a relayd object and all associated streams when a write error is detected on a relayd socket. Fixes #385 Signed-off-by: David Goulet --- src/common/consumer.c | 259 ++++++++++++++++++++++++++++++++++++++++---- src/common/consumer.h | 12 ++ src/common/relayd/relayd.c | 3 + 3 files changed, 254 insertions(+), 20 deletions(-) diff --git a/src/common/consumer.c b/src/common/consumer.c index 53c6180..eeb2f59 100644 --- a/src/common/consumer.c +++ b/src/common/consumer.c @@ -70,6 +70,21 @@ struct lttng_ht *metadata_ht; struct lttng_ht *data_ht; /* + * Notify a thread pipe to poll back again. This usually means that some global + * state has changed so we just send back the thread in a poll wait call. 
+ */ +static void notify_thread_pipe(int wpipe) +{ + int ret; + + do { + struct lttng_consumer_stream *null_stream = NULL; + + ret = write(wpipe, &null_stream, sizeof(null_stream)); + } while (ret < 0 && errno == EINTR); +} + +/* * Find a stream. The consumer_data.lock must be locked during this * call. */ @@ -182,6 +197,17 @@ static void consumer_rcu_free_relayd(struct rcu_head *head) struct consumer_relayd_sock_pair *relayd = caa_container_of(node, struct consumer_relayd_sock_pair, node); + /* + * Close all sockets. This is done in the call RCU since we don't want the + * socket fds to be reassigned thus potentially creating bad state of the + * relayd object. + * + * We do not have to lock the control socket mutex here since at this stage + * there is no one referencing to this relayd object. + */ + (void) relayd_close(&relayd->control_sock); + (void) relayd_close(&relayd->data_sock); + free(relayd); } @@ -204,21 +230,89 @@ static void destroy_relayd(struct consumer_relayd_sock_pair *relayd) iter.iter.node = &relayd->node.node; ret = lttng_ht_del(consumer_data.relayd_ht, &iter); if (ret != 0) { - /* We assume the relayd was already destroyed */ + /* We assume the relayd is being or is destroyed */ return; } - /* Close all sockets */ - pthread_mutex_lock(&relayd->ctrl_sock_mutex); - (void) relayd_close(&relayd->control_sock); - pthread_mutex_unlock(&relayd->ctrl_sock_mutex); - (void) relayd_close(&relayd->data_sock); - /* RCU free() call */ call_rcu(&relayd->node.head, consumer_rcu_free_relayd); } /* + * Update the end point status of all streams having the given network sequence + * index (relayd index). + * + * It's atomically set without having the stream mutex locked so be aware of + * potential race when using it. 
+ */ +static void update_endpoint_status_by_netidx(int net_seq_idx, + enum consumer_endpoint_status status) +{ + struct lttng_ht_iter iter; + struct lttng_consumer_stream *stream; + + DBG("Consumer set delete flag on stream by idx %d", net_seq_idx); + + rcu_read_lock(); + + /* Let's begin with metadata */ + cds_lfht_for_each_entry(metadata_ht->ht, &iter.iter, stream, node.node) { + if (stream->net_seq_idx == net_seq_idx) { + uatomic_set(&stream->endpoint_status, status); + DBG("Delete flag set to metadata stream %d", stream->wait_fd); + } + } + + /* Follow up by the data streams */ + cds_lfht_for_each_entry(data_ht->ht, &iter.iter, stream, node.node) { + if (stream->net_seq_idx == net_seq_idx) { + uatomic_set(&stream->endpoint_status, status); + DBG("Delete flag set to data stream %d", stream->wait_fd); + } + } + rcu_read_unlock(); +} + +/* + * Cleanup a relayd object by flagging every associated streams for deletion, + * destroying the object meaning removing it from the relayd hash table, + * closing the sockets and freeing the memory in a RCU call. + * + * If a local data context is available, notify the threads that the streams' + * state have changed. + */ +static void cleanup_relayd(struct consumer_relayd_sock_pair *relayd, + struct lttng_consumer_local_data *ctx) +{ + int netidx; + + assert(relayd); + + /* Save the net sequence index before destroying the object */ + netidx = relayd->net_seq_idx; + + /* + * Delete the relayd from the relayd hash table, close the sockets and free + * the object in a RCU call. + */ + destroy_relayd(relayd); + + /* Set inactive endpoint to all streams */ + update_endpoint_status_by_netidx(netidx, CONSUMER_ENDPOINT_INACTIVE); + + /* + * With a local data context, notify the threads that the streams' state + * have changed. The write() action on the pipe acts as an "implicit" + * memory barrier ordering the updates of the end point status from the + * read of this status which happens AFTER receiving this notify. 
+ */ + if (ctx) { + notify_thread_pipe(ctx->consumer_data_pipe[1]); + notify_thread_pipe(ctx->consumer_metadata_pipe[1]); + } +} + +/* * Flag a relayd socket pair for destruction. Destroy it if the refcount * reaches zero. * @@ -251,11 +345,14 @@ void consumer_del_stream(struct lttng_consumer_stream *stream, assert(stream); + DBG("Consumer del stream %d", stream->wait_fd); + if (ht == NULL) { /* Means the stream was allocated but not successfully added */ goto free_stream; } + pthread_mutex_lock(&stream->lock); pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { @@ -349,6 +446,7 @@ void consumer_del_stream(struct lttng_consumer_stream *stream, end: consumer_data.need_update = 1; pthread_mutex_unlock(&consumer_data.lock); + pthread_mutex_unlock(&stream->lock); if (free_chan) { consumer_del_channel(free_chan); @@ -804,7 +902,17 @@ static int consumer_update_poll_array( DBG("Updating poll fd array"); rcu_read_lock(); cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { - if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM) { + /* + * Only active streams with an active end point can be added to the + * poll set and local stream storage of the thread. + * + * There is a potential race here for endpoint_status to be updated + * just after the check. However, this is OK since the stream(s) will + * be deleted once the thread is notified that the end point state has + * changed where this function will be called back again. 
+ */ + if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM || + stream->endpoint_status) { continue; } DBG("Active FD %d", stream->wait_fd); @@ -1169,6 +1277,7 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( /* Default is on the disk */ int outfd = stream->out_fd; struct consumer_relayd_sock_pair *relayd = NULL; + unsigned int relayd_hang_up = 0; /* RCU lock for the relayd pointer */ rcu_read_lock(); @@ -1228,11 +1337,22 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( ret = write_relayd_metadata_id(outfd, stream, relayd, padding); if (ret < 0) { written = ret; + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EPIPE || ret == -EINVAL) { + relayd_hang_up = 1; + goto write_error; + } goto end; } } + } else { + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EPIPE || ret == -EINVAL) { + relayd_hang_up = 1; + goto write_error; + } + /* Else, use the default set before which is the filesystem. */ } - /* Else, use the default set before which is the filesystem. */ } else { /* No streaming, we have to set the len with the full padding */ len += padding; @@ -1248,6 +1368,11 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( if (written == 0) { written = ret; } + /* Socket operation failed. We consider the relayd dead */ + if (errno == EPIPE || errno == EINVAL) { + relayd_hang_up = 1; + goto write_error; + } goto end; } else if (ret > len) { PERROR("Error in file write (ret %zd > len %lu)", ret, len); @@ -1269,6 +1394,15 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( } lttng_consumer_sync_trace_file(stream, orig_offset); +write_error: + /* + * This is a special case that the relayd has closed its socket. Let's + * cleanup the relayd object and all associated streams. 
+ */ + if (relayd && relayd_hang_up) { + cleanup_relayd(relayd, ctx); + } + end: /* Unlock only if ctrl socket used */ if (relayd && stream->metadata_flag) { @@ -1298,6 +1432,7 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( int outfd = stream->out_fd; struct consumer_relayd_sock_pair *relayd = NULL; int *splice_pipe; + unsigned int relayd_hang_up = 0; switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -1350,6 +1485,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( padding); if (ret < 0) { written = ret; + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EBADF) { + WARN("Remote relayd disconnected. Stopping"); + relayd_hang_up = 1; + goto write_error; + } goto end; } @@ -1361,7 +1502,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( /* Use the returned socket. */ outfd = ret; } else { - ERR("Remote relayd disconnected. Stopping"); + /* Socket operation failed. We consider the relayd dead */ + if (ret == -EBADF) { + WARN("Remote relayd disconnected. Stopping"); + relayd_hang_up = 1; + goto write_error; + } goto end; } } else { @@ -1410,6 +1556,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( if (written == 0) { written = ret_splice; } + /* Socket operation failed. We consider the relayd dead */ + if (errno == EBADF) { + WARN("Remote relayd disconnected. Stopping"); + relayd_hang_up = 1; + goto write_error; + } ret = errno; goto splice_error; } else if (ret_splice > len) { @@ -1437,12 +1589,20 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( goto end; +write_error: + /* + * This is a special case that the relayd has closed its socket. Let's + * cleanup the relayd object and all associated streams. 
+ */ + if (relayd && relayd_hang_up) { + cleanup_relayd(relayd, ctx); + /* Skip splice error so the consumer does not fail */ + goto end; + } + splice_error: /* send the appropriate error description to sessiond */ switch (ret) { - case EBADF: - lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_SPLICE_EBADF); - break; case EINVAL: lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_SPLICE_EINVAL); break; @@ -1604,6 +1764,8 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, goto free_stream; } + pthread_mutex_lock(&stream->lock); + pthread_mutex_lock(&consumer_data.lock); switch (consumer_data.type) { case LTTNG_CONSUMER_KERNEL: @@ -1695,6 +1857,7 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, end: pthread_mutex_unlock(&consumer_data.lock); + pthread_mutex_unlock(&stream->lock); if (free_chan) { consumer_del_channel(free_chan); @@ -1766,6 +1929,59 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, } /* + * Delete data stream that are flagged for deletion (endpoint_status). + */ +static void validate_endpoint_status_data_stream(void) +{ + struct lttng_ht_iter iter; + struct lttng_consumer_stream *stream; + + DBG("Consumer delete flagged data stream"); + + rcu_read_lock(); + cds_lfht_for_each_entry(data_ht->ht, &iter.iter, stream, node.node) { + /* Validate delete flag of the stream */ + if (!stream->endpoint_status) { + continue; + } + /* Delete it right now */ + consumer_del_stream(stream, data_ht); + } + rcu_read_unlock(); +} + +/* + * Delete metadata stream that are flagged for deletion (endpoint_status). 
+ */ +static void validate_endpoint_status_metadata_stream( + struct lttng_poll_event *pollset) +{ + struct lttng_ht_iter iter; + struct lttng_consumer_stream *stream; + + DBG("Consumer delete flagged metadata stream"); + + assert(pollset); + + rcu_read_lock(); + cds_lfht_for_each_entry(metadata_ht->ht, &iter.iter, stream, node.node) { + /* Validate delete flag of the stream */ + if (!stream->endpoint_status) { + continue; + } + /* + * Remove from pollset so the metadata thread can continue without + * blocking on a deleted stream. + */ + lttng_poll_del(pollset, stream->wait_fd); + + /* Delete it right now */ + consumer_del_metadata_stream(stream, metadata_ht); + } + rcu_read_unlock(); +} + +/* * Thread polls on metadata file descriptor and write them on disk or on the * network. */ @@ -1856,6 +2072,13 @@ restart: continue; } + /* A NULL stream means that the state has changed. */ + if (stream == NULL) { + /* Check for deleted streams. */ + validate_endpoint_status_metadata_stream(&events); + continue; + } + DBG("Adding metadata stream %d to poll set", stream->wait_fd); @@ -2063,6 +2286,7 @@ void *consumer_thread_data_poll(void *data) * waking us up to test it. */ if (new_stream == NULL) { + validate_endpoint_status_data_stream(); continue; } @@ -2301,14 +2525,9 @@ end: /* * Notify the data poll thread to poll back again and test the - * consumer_quit state to quit gracefully. + * consumer_quit state that we just set so to quit gracefully. 
*/ - do { - struct lttng_consumer_stream *null_stream = NULL; - - ret = write(ctx->consumer_data_pipe[1], &null_stream, - sizeof(null_stream)); - } while (ret < 0 && errno == EINTR); + notify_thread_pipe(ctx->consumer_data_pipe[1]); rcu_unregister_thread(); return NULL; diff --git a/src/common/consumer.h b/src/common/consumer.h index 53b6151..0334c49 100644 --- a/src/common/consumer.h +++ b/src/common/consumer.h @@ -74,6 +74,11 @@ enum lttng_consumer_type { LTTNG_CONSUMER32_UST, }; +enum consumer_endpoint_status { + CONSUMER_ENDPOINT_ACTIVE, + CONSUMER_ENDPOINT_INACTIVE, +}; + struct lttng_consumer_channel { struct lttng_ht_node_ulong node; int key; @@ -150,6 +155,13 @@ struct lttng_consumer_stream { pthread_mutex_t lock; /* Tracing session id */ uint64_t session_id; + /* + * Indicates if the stream end point is still active or not (network + * streaming or local file system). The thread "owning" the stream is + * handling this status and can be notified of a state change through the + * consumer data appropriate pipe. + */ + enum consumer_endpoint_status endpoint_status; }; /* diff --git a/src/common/relayd/relayd.c b/src/common/relayd/relayd.c index 785d3dc..db47608 100644 --- a/src/common/relayd/relayd.c +++ b/src/common/relayd/relayd.c @@ -67,6 +67,7 @@ static int send_command(struct lttcomm_sock *sock, ret = sock->ops->sendmsg(sock, buf, buf_size, flags); if (ret < 0) { + ret = -errno; goto error; } @@ -90,6 +91,7 @@ static int recv_reply(struct lttcomm_sock *sock, void *data, size_t size) ret = sock->ops->recvmsg(sock, data, size, 0); if (ret < 0) { + ret = -errno; goto error; } @@ -283,6 +285,7 @@ int relayd_send_data_hdr(struct lttcomm_sock *sock, /* Only send data header. 
*/
 	ret = sock->ops->sendmsg(sock, hdr, size, 0);
 	if (ret < 0) {
+		ret = -errno;
 		goto error;
 	}

--
1.7.10.4

From dgoulet at efficios.com  Thu Oct 25 18:04:31 2012
From: dgoulet at efficios.com (David Goulet)
Date: Thu, 25 Oct 2012 18:04:31 -0400
Subject: [lttng-dev] [PATCH lttng-tools] Fix: Synchronization issue for data available command
In-Reply-To: <20121025210418.GA1014@Krystal>
References: <1351194827-8604-1-git-send-email-dgoulet@efficios.com>
	<20121025210418.GA1014@Krystal>
Message-ID: <5089B76F.9080207@efficios.com>

Mathieu Desnoyers:
> * David Goulet (dgoulet at efficios.com) wrote:
>> Signed-off-by: David Goulet
>> ---
>>  src/common/consumer.c                        | 66 +++++++++++++++++++++++---
>>  src/common/kernel-consumer/kernel-consumer.c | 19 ++------
>>  src/common/ust-consumer/ust-consumer.c       | 19 ++------
>>  3 files changed, 66 insertions(+), 38 deletions(-)
>>
>> diff --git a/src/common/consumer.c b/src/common/consumer.c
>> index e61a227..cf0e715 100644
>> --- a/src/common/consumer.c
>> +++ b/src/common/consumer.c
>> @@ -2657,6 +2657,33 @@ error:
>>  }
>>
>>  /*
>> + * Try to lock the stream mutex.
>> + *
>> + * On success, 1 is returned else 0 indicating that the mutex is NOT lock.
>> + */
>> +static int stream_try_lock(struct lttng_consumer_stream *stream)
>> +{
>> +	int ret;
>> +
>> +	assert(stream);
>> +
>> +	/*
>> +	 * Try to lock the stream mutex. On failure, we know that the stream is
>> +	 * being used else where hence there is data still being extracted.
>> +	 */
>> +	ret = pthread_mutex_trylock(&stream->lock);
>> +	if (ret == EBUSY) {
>> +		ret = 0;
>> +		goto end;
>> +	}
>
> what are we doing with the other errors ?
>

The only other error is EINVAL and is basically impossible to get at
this stage since the stream is certain to be valid.

>> +
>> +	ret = 1;
>> +
>> +end:
>> +	return ret;
>> +}
>> +
>> +/*
>>   * Check if for a given session id there is still data needed to be extract
>>   * from the buffers.
>>   *
>> @@ -2696,17 +2723,43 @@ int consumer_data_available(uint64_t id)
>>  			ht->hash_fct((void *)((unsigned long) id), 0x42UL),
>>  			ht->match_fct, (void *)((unsigned long) id),
>>  			&iter.iter, stream, node_session_id.node) {
>> -		/* Check the stream for data. */
>> -		ret = data_available(stream);
>> -		if (ret == 0) {
>> +		/* If this call fails, the stream is being used hence data pending. */
>> +		ret = stream_try_lock(stream);
>> +		if (!ret) {
>>  			goto data_not_available;
>>  		}
>>
>> +		/*
>> +		 * A removed node from the hash table indicates that the stream has
>> +		 * been deleted thus having a guarantee that the buffers are closed
>> +		 * on the consumer side. However, data can still be transmitted
>> +		 * over the network so don't skip the relayd check.
>> +		 */
>> +		ret = cds_lfht_is_node_deleted(&stream->node.node);
>> +		if (!ret) {
>> +			/* Check the stream if there is data in the buffers. */
>> +			ret = data_available(stream);
>> +			if (ret == 0) {
>> +				pthread_mutex_unlock(&stream->lock);
>> +				goto data_not_available;
>> +			}
>> +		}
>> +
>> +		/* Relayd check */
>>  		if (stream->net_seq_idx != -1) {
>>  			relayd = consumer_find_relayd(stream->net_seq_idx);
>> -			assert(relayd);
>> +			if (!relayd) {
>> +				/*
>> +				 * At this point, if the relayd object is not available for the
>> +				 * given stream, it is because the relayd is being cleanup so
>
> cleanup -> cleaned up
>
>> +				 * every stream associated with it (for a session id value) are
>> +				 * or wil be marked for deletion hence not having data pending
>
> hence not having -> hence do not have
>
>
> wil -> will
>
>> +				 * anymore.
>> +				 */
>> +				pthread_mutex_unlock(&stream->lock);
>> +				goto data_not_available;
>> +			}
>>
>> -			pthread_mutex_lock(&stream->lock);
>>  			pthread_mutex_lock(&relayd->ctrl_sock_mutex);
>>  			if (stream->metadata_flag) {
>>  				ret = relayd_quiescent_control(&relayd->control_sock);
>> @@ -2715,11 +2768,12 @@ int consumer_data_available(uint64_t id)
>>  						stream->relayd_stream_id, stream->next_net_seq_num);
>>  			}
>>  			pthread_mutex_unlock(&relayd->ctrl_sock_mutex);
>> -			pthread_mutex_unlock(&stream->lock);
>>  			if (ret == 0) {
>> +				pthread_mutex_unlock(&stream->lock);
>
> dependency (lock nesting order) between
>
> pthread_mutex_unlock(&relayd->ctrl_sock_mutex);
> pthread_mutex_unlock(&stream->lock);
>
> should be documented.

I'll add another patch on top of this one, like we discussed, that
documents the whole locking scheme of this file.

Thanks!
David

>
> Thanks,
>
> Mathieu
>
>
>>  				goto data_not_available;
>>  			}
>>  		}
>> +		pthread_mutex_unlock(&stream->lock);
>>  	}
>>
>>  	/*
>> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c
>> index 249df8a..196deee 100644
>> --- a/src/common/kernel-consumer/kernel-consumer.c
>> +++ b/src/common/kernel-consumer/kernel-consumer.c
>> @@ -485,7 +485,8 @@ error:
>>
>>  /*
>>   * Check if data is still being extracted from the buffers for a specific
>> - * stream. Consumer data lock MUST be acquired before calling this function.
>> + * stream. Consumer data lock MUST be acquired before calling this function
>> + * and the stream lock.
>>   *
>>   * Return 0 if the traced data are still getting read else 1 meaning that the
>>   * data is available for trace viewer reading.
>> @@ -496,31 +497,17 @@ int lttng_kconsumer_data_available(struct lttng_consumer_stream *stream)
>>
>>  	assert(stream);
>>
>> -	/*
>> -	 * Try to lock the stream mutex. On failure, we know that the stream is
>> -	 * being used else where hence there is data still being extracted.
>> - */ >> - ret = pthread_mutex_trylock(&stream->lock); >> - if (ret == EBUSY) { >> - /* Data not available */ >> - ret = 0; >> - goto end; >> - } >> - /* The stream is now locked so we can do our ustctl calls */ >> - >> ret = kernctl_get_next_subbuf(stream->wait_fd); >> if (ret == 0) { >> /* There is still data so let's put back this subbuffer. */ >> ret = kernctl_put_subbuf(stream->wait_fd); >> assert(ret == 0); >> - goto end_unlock; >> + goto end; >> } >> >> /* Data is available to be read for this stream. */ >> ret = 1; >> >> -end_unlock: >> - pthread_mutex_unlock(&stream->lock); >> end: >> return ret; >> } >> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c >> index e8e3f93..4d3671a 100644 >> --- a/src/common/ust-consumer/ust-consumer.c >> +++ b/src/common/ust-consumer/ust-consumer.c >> @@ -526,7 +526,8 @@ error: >> >> /* >> * Check if data is still being extracted from the buffers for a specific >> - * stream. Consumer data lock MUST be acquired before calling this function. >> + * stream. Consumer data lock MUST be acquired before calling this function >> + * and the stream lock. >> * >> * Return 0 if the traced data are still getting read else 1 meaning that the >> * data is available for trace viewer reading. >> @@ -539,31 +540,17 @@ int lttng_ustconsumer_data_available(struct lttng_consumer_stream *stream) >> >> DBG("UST consumer checking data availability"); >> >> - /* >> - * Try to lock the stream mutex. On failure, we know that the stream is >> - * being used else where hence there is data still being extracted. >> - */ >> - ret = pthread_mutex_trylock(&stream->lock); >> - if (ret == EBUSY) { >> - /* Data not available */ >> - ret = 0; >> - goto end; >> - } >> - /* The stream is now locked so we can do our ustctl calls */ >> - >> ret = ustctl_get_next_subbuf(stream->chan->handle, stream->buf); >> if (ret == 0) { >> /* There is still data so let's put back this subbuffer. 
*/ >> ret = ustctl_put_subbuf(stream->chan->handle, stream->buf); >> assert(ret == 0); >> - goto end_unlock; >> + goto end; >> } >> >> /* Data is available to be read for this stream. */ >> ret = 1; >> >> -end_unlock: >> - pthread_mutex_unlock(&stream->lock); >> end: >> return ret; >> } >> -- >> 1.7.10.4 >> >> >> _______________________________________________ >> lttng-dev mailing list >> lttng-dev at lists.lttng.org >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > From mathieu.desnoyers at efficios.com Thu Oct 25 18:05:32 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 25 Oct 2012 18:05:32 -0400 Subject: [lttng-dev] [PATCH lttng-tools] Fix: Synchronization issue for data available command In-Reply-To: <5089B76F.9080207@efficios.com> References: <1351194827-8604-1-git-send-email-dgoulet@efficios.com> <20121025210418.GA1014@Krystal> <5089B76F.9080207@efficios.com> Message-ID: <20121025220532.GA2115@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > > > Mathieu Desnoyers: > > * David Goulet (dgoulet at efficios.com) wrote: > >> Signed-off-by: David Goulet > >> --- > >> src/common/consumer.c | 66 +++++++++++++++++++++++--- > >> src/common/kernel-consumer/kernel-consumer.c | 19 ++------ > >> src/common/ust-consumer/ust-consumer.c | 19 ++------ > >> 3 files changed, 66 insertions(+), 38 deletions(-) > >> > >> diff --git a/src/common/consumer.c b/src/common/consumer.c > >> index e61a227..cf0e715 100644 > >> --- a/src/common/consumer.c > >> +++ b/src/common/consumer.c > >> @@ -2657,6 +2657,33 @@ error: > >> } > >> > >> /* > >> + * Try to lock the stream mutex. > >> + * > >> + * On success, 1 is returned else 0 indicating that the mutex is NOT lock. > >> + */ > >> +static int stream_try_lock(struct lttng_consumer_stream *stream) > >> +{ > >> + int ret; > >> + > >> + assert(stream); > >> + > >> + /* > >> + * Try to lock the stream mutex. 
On failure, we know that the stream is
> >> +	 * being used else where hence there is data still being extracted.
> >> +	 */
> >> +	ret = pthread_mutex_trylock(&stream->lock);
> >> +	if (ret == EBUSY) {
> >> +		ret = 0;
> >> +		goto end;
> >> +	}
> >
> > what are we doing with the other errors ?
> >
>
> The only other error is EINVAL and is basically impossible to get at
> this stage since the stream is certain to be valid.

The whole point of testing error values is to test all of them, not to
"assume that at this point this error value cannot happen". Please fix.

Thanks,

Mathieu

>
> >> +
> >> +	ret = 1;
> >> +
> >> +end:
> >> +	return ret;
> >> +}
> >> +
> >> +/*
> >>   * Check if for a given session id there is still data needed to be extract
> >>   * from the buffers.
> >>   *
> >> @@ -2696,17 +2723,43 @@ int consumer_data_available(uint64_t id)
> >>  			ht->hash_fct((void *)((unsigned long) id), 0x42UL),
> >>  			ht->match_fct, (void *)((unsigned long) id),
> >>  			&iter.iter, stream, node_session_id.node) {
> >> -		/* Check the stream for data. */
> >> -		ret = data_available(stream);
> >> -		if (ret == 0) {
> >> +		/* If this call fails, the stream is being used hence data pending. */
> >> +		ret = stream_try_lock(stream);
> >> +		if (!ret) {
> >>  			goto data_not_available;
> >>  		}
> >>
> >> +		/*
> >> +		 * A removed node from the hash table indicates that the stream has
> >> +		 * been deleted thus having a guarantee that the buffers are closed
> >> +		 * on the consumer side. However, data can still be transmitted
> >> +		 * over the network so don't skip the relayd check.
> >> +		 */
> >> +		ret = cds_lfht_is_node_deleted(&stream->node.node);
> >> +		if (!ret) {
> >> +			/* Check the stream if there is data in the buffers.
*/ > >> + ret = data_available(stream); > >> + if (ret == 0) { > >> + pthread_mutex_unlock(&stream->lock); > >> + goto data_not_available; > >> + } > >> + } > >> + > >> + /* Relayd check */ > >> if (stream->net_seq_idx != -1) { > >> relayd = consumer_find_relayd(stream->net_seq_idx); > >> - assert(relayd); > >> + if (!relayd) { > >> + /* > >> + * At this point, if the relayd object is not available for the > >> + * given stream, it is because the relayd is being cleanup so > > > > cleanup -> cleaned up > > > >> + * every stream associated with it (for a session id value) are > >> + * or wil be marked for deletion hence not having data pending > > > > hence not having -> hence do not have > > > > > > wil -> will > > > >> + * anymore. > >> + */ > >> + pthread_mutex_unlock(&stream->lock); > >> + goto data_not_available; > >> + } > >> > >> - pthread_mutex_lock(&stream->lock); > >> pthread_mutex_lock(&relayd->ctrl_sock_mutex); > >> if (stream->metadata_flag) { > >> ret = relayd_quiescent_control(&relayd->control_sock); > >> @@ -2715,11 +2768,12 @@ int consumer_data_available(uint64_t id) > >> stream->relayd_stream_id, stream->next_net_seq_num); > >> } > >> pthread_mutex_unlock(&relayd->ctrl_sock_mutex); > >> - pthread_mutex_unlock(&stream->lock); > >> if (ret == 0) { > >> + pthread_mutex_unlock(&stream->lock); > > > > dependency (lock nesting order) between > > > > pthread_mutex_unlock(&relayd->ctrl_sock_mutex); > > pthread_mutex_unlock(&stream->lock); > > > > should be documented. > > I'll add another patch on top of this one, like we discussed, that > documents the whole locking scheme of this file. > > Thanks! 
> David > > > > > Thanks, > > > > Mathieu > > > > > >> goto data_not_available; > >> } > >> } > >> + pthread_mutex_unlock(&stream->lock); > >> } > >> > >> /* > >> diff --git a/src/common/kernel-consumer/kernel-consumer.c b/src/common/kernel-consumer/kernel-consumer.c > >> index 249df8a..196deee 100644 > >> --- a/src/common/kernel-consumer/kernel-consumer.c > >> +++ b/src/common/kernel-consumer/kernel-consumer.c > >> @@ -485,7 +485,8 @@ error: > >> > >> /* > >> * Check if data is still being extracted from the buffers for a specific > >> - * stream. Consumer data lock MUST be acquired before calling this function. > >> + * stream. Consumer data lock MUST be acquired before calling this function > >> + * and the stream lock. > >> * > >> * Return 0 if the traced data are still getting read else 1 meaning that the > >> * data is available for trace viewer reading. > >> @@ -496,31 +497,17 @@ int lttng_kconsumer_data_available(struct lttng_consumer_stream *stream) > >> > >> assert(stream); > >> > >> - /* > >> - * Try to lock the stream mutex. On failure, we know that the stream is > >> - * being used else where hence there is data still being extracted. > >> - */ > >> - ret = pthread_mutex_trylock(&stream->lock); > >> - if (ret == EBUSY) { > >> - /* Data not available */ > >> - ret = 0; > >> - goto end; > >> - } > >> - /* The stream is now locked so we can do our ustctl calls */ > >> - > >> ret = kernctl_get_next_subbuf(stream->wait_fd); > >> if (ret == 0) { > >> /* There is still data so let's put back this subbuffer. */ > >> ret = kernctl_put_subbuf(stream->wait_fd); > >> assert(ret == 0); > >> - goto end_unlock; > >> + goto end; > >> } > >> > >> /* Data is available to be read for this stream. 
*/ > >> ret = 1; > >> > >> -end_unlock: > >> - pthread_mutex_unlock(&stream->lock); > >> end: > >> return ret; > >> } > >> diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c > >> index e8e3f93..4d3671a 100644 > >> --- a/src/common/ust-consumer/ust-consumer.c > >> +++ b/src/common/ust-consumer/ust-consumer.c > >> @@ -526,7 +526,8 @@ error: > >> > >> /* > >> * Check if data is still being extracted from the buffers for a specific > >> - * stream. Consumer data lock MUST be acquired before calling this function. > >> + * stream. Consumer data lock MUST be acquired before calling this function > >> + * and the stream lock. > >> * > >> * Return 0 if the traced data are still getting read else 1 meaning that the > >> * data is available for trace viewer reading. > >> @@ -539,31 +540,17 @@ int lttng_ustconsumer_data_available(struct lttng_consumer_stream *stream) > >> > >> DBG("UST consumer checking data availability"); > >> > >> - /* > >> - * Try to lock the stream mutex. On failure, we know that the stream is > >> - * being used else where hence there is data still being extracted. > >> - */ > >> - ret = pthread_mutex_trylock(&stream->lock); > >> - if (ret == EBUSY) { > >> - /* Data not available */ > >> - ret = 0; > >> - goto end; > >> - } > >> - /* The stream is now locked so we can do our ustctl calls */ > >> - > >> ret = ustctl_get_next_subbuf(stream->chan->handle, stream->buf); > >> if (ret == 0) { > >> /* There is still data so let's put back this subbuffer. */ > >> ret = ustctl_put_subbuf(stream->chan->handle, stream->buf); > >> assert(ret == 0); > >> - goto end_unlock; > >> + goto end; > >> } > >> > >> /* Data is available to be read for this stream. 
*/ > >> ret = 1; > >> > >> -end_unlock: > >> - pthread_mutex_unlock(&stream->lock); > >> end: > >> return ret; > >> } > >> -- > >> 1.7.10.4 > >> > >> > >> _______________________________________________ > >> lttng-dev mailing list > >> lttng-dev at lists.lttng.org > >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From jdesfossez at efficios.com Thu Oct 25 22:35:12 2012 From: jdesfossez at efficios.com (Julien Desfossez) Date: Thu, 25 Oct 2012 22:35:12 -0400 Subject: [lttng-dev] RFCv2 : design notes for remote traces live reading Message-ID: <5089F6E0.7000506@efficios.com> In order to achieve live reading of streamed traces, we need : - cooperating tracers - the index generation while tracing - index streaming - synchronization of streams - cooperating viewers This RFC addresses each of these points with the anticipated design, implementation is on its way, so quick feedbacks greatly appreciated ! * Cooperating tracers The metadata is mandatory to process any CTF trace. In order to achieve live trace reading, the metadata must be available to the viewer when it starts reading the trace. For now, the considered approach is to flush periodically the metadata stream to make sure it is sent. This topic needs more discussions. We need to find a way to make sure that the viewer cannot start reading data for which it does not have the metadata. * Index generation The index associates a trace packet with an offset inside the tracefile. While tracing, when a packet is ready to be written, we can ask the ring buffer to provide the information required to produce the index. 
For the viewers, the structure describing an index entry is the following:

    struct packet_index {
        off_t offset;                  /* offset of the packet in the file, in bytes */
        int64_t data_offset;           /* offset of data within the packet, in bits */
        uint64_t packet_size;          /* packet size, in bits */
        uint64_t content_size;         /* content size, in bits */
        uint64_t timestamp_begin;
        uint64_t timestamp_end;
        uint64_t events_discarded;
        uint64_t events_discarded_len; /* length of the field, in bits */
        uint64_t stream_id;
    };

The offset field is known when writing the trace file on disk. The fields data_offset and events_discarded_len can be computed from the metadata, so we don't need to extract these 3 fields from the ring buffer. The structure we need to extract from the tracer and write is therefore the following:

    struct packet_index {
        uint64_t packet_size;          /* packet size, in bits */
        uint64_t content_size;         /* content size, in bits */
        uint64_t timestamp_begin;
        uint64_t timestamp_end;
        uint64_t events_discarded;
        uint64_t stream_id;
    };

* Index streaming

The index is mandatory for live reading since we use it for stream synchronization. We absolutely need to receive the index, so we send it on the control port (TCP-only), but most of the information related to the index is only relevant if we receive the associated data packet.
So the proposed protocol is the following:
- with each data packet, send the packet_size and content_size along with the information already in place (stream id and sequence number)
- after sending a data packet, the consumer sends on the control port a new message (RELAYD_SEND_INDEX) with timestamp_begin, timestamp_end, events_discarded, stream_id, the sequence number, and the relayd stream id of the tracefile
- when the relay receives a data packet, it checks whether it has already received an index corresponding to this stream and sequence number. If yes, it completes the index structure and writes the index on disk; otherwise it creates an index structure in memory with the information it can fill in and stores it in a hash table, waiting for the corresponding index packet to arrive
- the same concept applies when the relay receives an index packet.

This two-part remote index generation allows us to determine if we lost packets because of the network, limit the number of bytes sent on the control port, and make sure we still have an index for each packet with its timestamps and the number of lost events, so the viewer knows if we lost events because of the tracer or the network.

In the relay we will introduce a hash table to help the lookups. The hash function will perform a XOR on the stream_id and sequence_number, and the compare function will compare both fields so that entries whose hashes collide are still told apart. The hash table storing the indexes also needs an expiration mechanism (based on timing or number of packets). Since some data may never arrive (lost UDP packets), we will add a separate data structure to store the timeout associated with each index entry. A timer will make sure to remove the expired entries.
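
The hash and compare functions described above could look roughly like the following. This is only a sketch under assumptions: the struct index_key type and the index_key_hash() / index_key_match() names are hypothetical and do not come from lttng-tools, and the expiration timer is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lookup key: one pending index entry per (stream, sequence). */
struct index_key {
	uint64_t stream_id;
	uint64_t seq_num;
};

/* XOR hash as described above. Distinct keys may share a hash value. */
static uint64_t index_key_hash(const struct index_key *key)
{
	assert(key);
	return key->stream_id ^ key->seq_num;
}

/* Compare both fields so that keys with colliding hashes are told apart. */
static int index_key_match(const struct index_key *a,
		const struct index_key *b)
{
	assert(a && b);
	return a->stream_id == b->stream_id && a->seq_num == b->seq_num;
}
```

For instance, the keys (stream 1, sequence 2) and (stream 2, sequence 1) hash to the same value, which is why the match function has to look at both fields rather than at the hash alone.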

* Synchronization of streams

Already discussed in an earlier RFC; in summary:
- at a predefined rate, the consumer sends a synchronization packet that contains the last sequence number that can be safely read by the viewer for each stream of the session. This happens as soon as possible when all streams are generating data, and is also time-based to cover the case of streams not generating any data.
- the relay receives this packet, ensures all data packets and indexes are committed on disk (and sync'ed) and updates the synchronization with the viewers (discussed just below)
- if a consumer does not send any data on any stream, the synchronization message is not necessary (since there is no data to display), so it won't be sent

* Cooperating viewers

The viewers need to be aware that they are reading streamed data and play nicely with the synchronization algorithms in place. The proposed approach is to use fcntl(2) "Advisory locking" to lock specific portions of the tracefiles. The viewers will have to test for the locks and make sure they respect them when they are switching packets.

So in summary:
- when the relay is ready to let the viewers access the data, it adds a new write lock on the region that cannot be safely read and removes the previous one
- when a viewer needs to switch packets, it tests for the presence of a lock on the region of the file it needs to access. If there is no lock it can safely read the data, otherwise it blocks until the lock is removed.
- when a data packet is lost on the network, an index is written, but the offset in the tracefile is set to an invalid value (-1) so the reader knows the data was lost in transit.
- when a new stream is created (cpu-hotplug or new application started), a new trace file is created on disk. The relay creates and immediately locks the file. The relay has the responsibility to not write data older than the oldest event in the other streams already available to the viewer (unlocked).
- The viewer has the responsibility to detect new tracefiles (by using a notifications mechanism for example) - the viewers need also to be adapted to read on-disk indexes, support metadata updates, respect the locking. Feedbacks, questions and improvement ideas welcome ! Thanks, Julien From Andrew.McDermott at windriver.com Fri Oct 26 08:40:15 2012 From: Andrew.McDermott at windriver.com (McDermott, Andrew) Date: Fri, 26 Oct 2012 12:40:15 +0000 Subject: [lttng-dev] status of lttng top In-Reply-To: <5089879F.50701@efficios.com> (Julien Desfossez's message of "Thu, 25 Oct 2012 14:40:31 -0400") References: <7F632A9222059A42AF70FCB7965774AA20627EAB@ALA-MBB.corp.ad.wrs.com> <507389DE.4000204@efficios.com> <7F632A9222059A42AF70FCB7965774AA20854D6F@ALA-MBB.corp.ad.wrs.com> <50858BF9.705@efficios.com> <7F632A9222059A42AF70FCB7965774AA20860D77@ALA-MBB.corp.ad.wrs.com> <5089879F.50701@efficios.com> Message-ID: <7F632A9222059A42AF70FCB7965774AA208796E3@ALA-MBA.corp.ad.wrs.com> Julien Desfossez writes: > Hi, > >>>>> LTTngTop is still work in progress and will remain that way for a long >>>>> time, but the version in the PPA (or in the master branch in git) is >>>>> perfectly usable for offline traces (traces recorded and replayed >>>>> through LTTngTop). >>>>> >>>>> The "live" branch is more experimental and requires patches in both >>>>> Babeltrace and Lttng-tools (all documented in the README-LIVE file), but >>>>> it worked at the time of Plumbers, I didn't have much time since then to >>>>> rebase the branches. >>>>> >>>>> I am waiting for the release of Lttng-tools 2.1 (currently in RC) before >>>>> merging those patches. After these patches are integrated, LTTngTop will >>>>> be able to work live without any modifications, so directly reading >>>>> traces in memory shared with the tracer. >>>> >>>> Thanks for this info. >>>> >>>> Right now my interest is with the live streaming; we have a use case >>>> where the live streaming is really the only practical solution. 
>>>> >>>> Very roughly, would you expect the RC series to conclude this year, or >>>> (early) next year? >>> >>> Just to clarify, are you interested in live network trace reading or >>> live in-memory reading ? >>> The patches I was talking about are for in-memory trace reading. >> >> So I guess I don't understand enough of the low-level detail here. What >> I was interested in was being able to consume events, maybe periodically >> (1 /s), from a trace written by another process on the same machine. I >> guess that would fall under in-memory trace reading. >> > > Ok I will just describe this a little more, when we talk about live > reading the trace, we have two aspects : > - reading a trace while it is being written on disk (whether it is > received from the network or from a local consumer) > - reading a trace directly from memory mapped buffers between the tracer > and the consumer without writing the trace files. > > So if you want to read the trace on the machine that is being traced > without ever writing the trace on disk, yes you want the in-memory trace > reading. > > For 2.2, the focus is to support live trace reading from disk (local and > network). > In my development branches (referenced in previous email), I have code > that provides live trace reading from memory, I will try to merge it in > 2.2 but I cannot guarantee it will be accepted since it is not the > current priority (but definitely a use-case we want to support). > > I hope it clarifies the situation, Yes it does. Many thanks. 
-- andy From mathieu.desnoyers at efficios.com Fri Oct 26 09:57:13 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Fri, 26 Oct 2012 09:57:13 -0400 Subject: [lttng-dev] [PATCH lttng-tools v2] Fix: consumer relayd cleanup on disconnect In-Reply-To: <1351202496-18892-1-git-send-email-dgoulet@efficios.com> References: <1351202496-18892-1-git-send-email-dgoulet@efficios.com> Message-ID: <20121026135713.GA12860@Krystal> * David Goulet (dgoulet at efficios.com) wrote: > Improve the resilience of the consumer by cleaning up a relayd object > and all associated streams when a write error is detected on a relayd > socket. > > Fixes #385 > > Signed-off-by: David Goulet > --- > src/common/consumer.c | 259 ++++++++++++++++++++++++++++++++++++++++---- > src/common/consumer.h | 12 ++ > src/common/relayd/relayd.c | 3 + > 3 files changed, 254 insertions(+), 20 deletions(-) > > diff --git a/src/common/consumer.c b/src/common/consumer.c > index 53c6180..eeb2f59 100644 > --- a/src/common/consumer.c > +++ b/src/common/consumer.c > @@ -70,6 +70,21 @@ struct lttng_ht *metadata_ht; > struct lttng_ht *data_ht; > > /* > + * Notify a thread pipe to poll back again. This usually means that some global > + * state has changed so we just send back the thread in a poll wait call. > + */ > +static void notify_thread_pipe(int wpipe) > +{ > + int ret; > + > + do { > + struct lttng_consumer_stream *null_stream = NULL; > + > + ret = write(wpipe, &null_stream, sizeof(null_stream)); > + } while (ret < 0 && errno == EINTR); > +} > + > +/* > * Find a stream. The consumer_data.lock must be locked during this > * call. > */ > @@ -182,6 +197,17 @@ static void consumer_rcu_free_relayd(struct rcu_head *head) > struct consumer_relayd_sock_pair *relayd = > caa_container_of(node, struct consumer_relayd_sock_pair, node); > > + /* > + * Close all sockets. 
This is done in the call RCU since we don't want the > + * socket fds to be reassigned thus potentially creating bad state of the > + * relayd object. > + * > + * We do not have to lock the control socket mutex here since at this stage > + * there is no one referencing to this relayd object. > + */ > + (void) relayd_close(&relayd->control_sock); > + (void) relayd_close(&relayd->data_sock); > + > free(relayd); > } > > @@ -204,21 +230,89 @@ static void destroy_relayd(struct consumer_relayd_sock_pair *relayd) > iter.iter.node = &relayd->node.node; > ret = lttng_ht_del(consumer_data.relayd_ht, &iter); > if (ret != 0) { > - /* We assume the relayd was already destroyed */ > + /* We assume the relayd is being or is destroyed */ > return; > } > > - /* Close all sockets */ > - pthread_mutex_lock(&relayd->ctrl_sock_mutex); > - (void) relayd_close(&relayd->control_sock); > - pthread_mutex_unlock(&relayd->ctrl_sock_mutex); > - (void) relayd_close(&relayd->data_sock); > - > /* RCU free() call */ > call_rcu(&relayd->node.head, consumer_rcu_free_relayd); > } > > /* > + * Update the end point status of all streams having the given network sequence > + * index (relayd index). > + * > + * It's atomically set without having the stream mutex locked so be aware of > + * potential race when using it. Please describe that we handle this race with a retry that will happen, triggered by the pipe wakeup. 
Other than that, Acked-by: Mathieu Desnoyers > + */ > +static void update_endpoint_status_by_netidx(int net_seq_idx, > + enum consumer_endpoint_status status) > +{ > + struct lttng_ht_iter iter; > + struct lttng_consumer_stream *stream; > + > + DBG("Consumer set delete flag on stream by idx %d", net_seq_idx); > + > + rcu_read_lock(); > + > + /* Let's begin with metadata */ > + cds_lfht_for_each_entry(metadata_ht->ht, &iter.iter, stream, node.node) { > + if (stream->net_seq_idx == net_seq_idx) { > + uatomic_set(&stream->endpoint_status, status); > + DBG("Delete flag set to metadata stream %d", stream->wait_fd); > + } > + } > + > + /* Follow up by the data streams */ > + cds_lfht_for_each_entry(data_ht->ht, &iter.iter, stream, node.node) { > + if (stream->net_seq_idx == net_seq_idx) { > + uatomic_set(&stream->endpoint_status, status); > + DBG("Delete flag set to data stream %d", stream->wait_fd); > + } > + } > + rcu_read_unlock(); > +} > + > +/* > + * Cleanup a relayd object by flagging every associated streams for deletion, > + * destroying the object meaning removing it from the relayd hash table, > + * closing the sockets and freeing the memory in a RCU call. > + * > + * If a local data context is available, notify the threads that the streams' > + * state have changed. > + */ > +static void cleanup_relayd(struct consumer_relayd_sock_pair *relayd, > + struct lttng_consumer_local_data *ctx) > +{ > + int netidx; > + > + assert(relayd); > + > + /* Save the net sequence index before destroying the object */ > + netidx = relayd->net_seq_idx; > + > + /* > + * Delete the relayd from the relayd hash table, close the sockets and free > + * the object in a RCU call. > + */ > + destroy_relayd(relayd); > + > + /* Set inactive endpoint to all streams */ > + update_endpoint_status_by_netidx(netidx, CONSUMER_ENDPOINT_INACTIVE); > + > + /* > + * With a local data context, notify the threads that the streams' state > + * have changed. 
The write() action on the pipe acts as an "implicit" > + * memory barrier ordering the updates of the end point status from the > + * read of this status which happens AFTER receiving this notify. > + */ > + if (ctx) { > + notify_thread_pipe(ctx->consumer_data_pipe[1]); > + notify_thread_pipe(ctx->consumer_metadata_pipe[1]); > + } > +} > + > +/* > * Flag a relayd socket pair for destruction. Destroy it if the refcount > * reaches zero. > * > @@ -251,11 +345,14 @@ void consumer_del_stream(struct lttng_consumer_stream *stream, > > assert(stream); > > + DBG("Consumer del stream %d", stream->wait_fd); > + > if (ht == NULL) { > /* Means the stream was allocated but not successfully added */ > goto free_stream; > } > > + pthread_mutex_lock(&stream->lock); > pthread_mutex_lock(&consumer_data.lock); > > switch (consumer_data.type) { > @@ -349,6 +446,7 @@ void consumer_del_stream(struct lttng_consumer_stream *stream, > end: > consumer_data.need_update = 1; > pthread_mutex_unlock(&consumer_data.lock); > + pthread_mutex_unlock(&stream->lock); > > if (free_chan) { > consumer_del_channel(free_chan); > @@ -804,7 +902,17 @@ static int consumer_update_poll_array( > DBG("Updating poll fd array"); > rcu_read_lock(); > cds_lfht_for_each_entry(ht->ht, &iter.iter, stream, node.node) { > - if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM) { > + /* > + * Only active streams with an active end point can be added to the > + * poll set and local stream storage of the thread. > + * > + * There is a potential race here for endpoint_status to be updated > + * just after the check. However, this is OK since the stream(s) will > + * be deleted once the thread is notified that the end point state has > + * changed where this function will be called back again. 
> + */ > + if (stream->state != LTTNG_CONSUMER_ACTIVE_STREAM || > + stream->endpoint_status) { > continue; > } > DBG("Active FD %d", stream->wait_fd); > @@ -1169,6 +1277,7 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( > /* Default is on the disk */ > int outfd = stream->out_fd; > struct consumer_relayd_sock_pair *relayd = NULL; > + unsigned int relayd_hang_up = 0; > > /* RCU lock for the relayd pointer */ > rcu_read_lock(); > @@ -1228,11 +1337,22 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( > ret = write_relayd_metadata_id(outfd, stream, relayd, padding); > if (ret < 0) { > written = ret; > + /* Socket operation failed. We consider the relayd dead */ > + if (ret == -EPIPE || ret == -EINVAL) { > + relayd_hang_up = 1; > + goto write_error; > + } > goto end; > } > } > + } else { > + /* Socket operation failed. We consider the relayd dead */ > + if (ret == -EPIPE || ret == -EINVAL) { > + relayd_hang_up = 1; > + goto write_error; > + } > + /* Else, use the default set before which is the filesystem. */ > } > - /* Else, use the default set before which is the filesystem. */ > } else { > /* No streaming, we have to set the len with the full padding */ > len += padding; > @@ -1248,6 +1368,11 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( > if (written == 0) { > written = ret; > } > + /* Socket operation failed. We consider the relayd dead */ > + if (errno == EPIPE || errno == EINVAL) { > + relayd_hang_up = 1; > + goto write_error; > + } > goto end; > } else if (ret > len) { > PERROR("Error in file write (ret %zd > len %lu)", ret, len); > @@ -1269,6 +1394,15 @@ ssize_t lttng_consumer_on_read_subbuffer_mmap( > } > lttng_consumer_sync_trace_file(stream, orig_offset); > > +write_error: > + /* > + * This is a special case that the relayd has closed its socket. Let's > + * cleanup the relayd object and all associated streams. 
> + */ > + if (relayd && relayd_hang_up) { > + cleanup_relayd(relayd, ctx); > + } > + > end: > /* Unlock only if ctrl socket used */ > if (relayd && stream->metadata_flag) { > @@ -1298,6 +1432,7 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( > int outfd = stream->out_fd; > struct consumer_relayd_sock_pair *relayd = NULL; > int *splice_pipe; > + unsigned int relayd_hang_up = 0; > > switch (consumer_data.type) { > case LTTNG_CONSUMER_KERNEL: > @@ -1350,6 +1485,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( > padding); > if (ret < 0) { > written = ret; > + /* Socket operation failed. We consider the relayd dead */ > + if (ret == -EBADF) { > + WARN("Remote relayd disconnected. Stopping"); > + relayd_hang_up = 1; > + goto write_error; > + } > goto end; > } > > @@ -1361,7 +1502,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( > /* Use the returned socket. */ > outfd = ret; > } else { > - ERR("Remote relayd disconnected. Stopping"); > + /* Socket operation failed. We consider the relayd dead */ > + if (ret == -EBADF) { > + WARN("Remote relayd disconnected. Stopping"); > + relayd_hang_up = 1; > + goto write_error; > + } > goto end; > } > } else { > @@ -1410,6 +1556,12 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( > if (written == 0) { > written = ret_splice; > } > + /* Socket operation failed. We consider the relayd dead */ > + if (errno == EBADF) { > + WARN("Remote relayd disconnected. Stopping"); > + relayd_hang_up = 1; > + goto write_error; > + } > ret = errno; > goto splice_error; > } else if (ret_splice > len) { > @@ -1437,12 +1589,20 @@ ssize_t lttng_consumer_on_read_subbuffer_splice( > > goto end; > > +write_error: > + /* > + * This is a special case that the relayd has closed its socket. Let's > + * cleanup the relayd object and all associated streams. 
> + */ > + if (relayd && relayd_hang_up) { > + cleanup_relayd(relayd, ctx); > + /* Skip splice error so the consumer does not fail */ > + goto end; > + } > + > splice_error: > /* send the appropriate error description to sessiond */ > switch (ret) { > - case EBADF: > - lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_SPLICE_EBADF); > - break; > case EINVAL: > lttng_consumer_send_error(ctx, LTTCOMM_CONSUMERD_SPLICE_EINVAL); > break; > @@ -1604,6 +1764,8 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > goto free_stream; > } > > + pthread_mutex_lock(&stream->lock); > + > pthread_mutex_lock(&consumer_data.lock); > switch (consumer_data.type) { > case LTTNG_CONSUMER_KERNEL: > @@ -1695,6 +1857,7 @@ void consumer_del_metadata_stream(struct lttng_consumer_stream *stream, > > end: > pthread_mutex_unlock(&consumer_data.lock); > + pthread_mutex_unlock(&stream->lock); > > if (free_chan) { > consumer_del_channel(free_chan); > @@ -1766,6 +1929,59 @@ static int consumer_add_metadata_stream(struct lttng_consumer_stream *stream, > } > > /* > + * Delete data stream that are flagged for deletion (endpoint_status). > + */ > +static void validate_endpoint_status_data_stream(void) > +{ > + struct lttng_ht_iter iter; > + struct lttng_consumer_stream *stream; > + > + DBG("Consumer delete flagged data stream"); > + > + rcu_read_lock(); > + cds_lfht_for_each_entry(data_ht->ht, &iter.iter, stream, node.node) { > + /* Validate delete flag of the stream */ > + if (!stream->endpoint_status) { > + continue; > + } > + /* Delete it right now */ > + consumer_del_stream(stream, data_ht); > + } > + rcu_read_unlock(); > +} > + > +/* > + * Delete metadata stream that are flagged for deletion (endpoint_status). 
> + */ > +static void validate_endpoint_status_metadata_stream( > + struct lttng_poll_event *pollset) > +{ > + struct lttng_ht_iter iter; > + struct lttng_consumer_stream *stream; > + > + DBG("Consumer delete flagged metadata stream"); > + > + assert(pollset); > + > + rcu_read_lock(); > + cds_lfht_for_each_entry(metadata_ht->ht, &iter.iter, stream, node.node) { > + /* Validate delete flag of the stream */ > + if (!stream->endpoint_status) { > + continue; > + } > + /* > + * Remove from pollset so the metadata thread can continue without > + * blocking on a deleted stream. > + */ > + lttng_poll_del(pollset, stream->wait_fd); > + > + /* Delete it right now */ > + consumer_del_metadata_stream(stream, metadata_ht); > + } > + rcu_read_unlock(); > +} > + > +/* > * Thread polls on metadata file descriptor and write them on disk or on the > * network. > */ > @@ -1856,6 +2072,13 @@ restart: > continue; > } > > + /* A NULL stream means that the state has changed. */ > + if (stream == NULL) { > + /* Check for deleted streams. */ > + validate_endpoint_status_metadata_stream(&events); > + continue; > + } > + > DBG("Adding metadata stream %d to poll set", > stream->wait_fd); > > @@ -2063,6 +2286,7 @@ void *consumer_thread_data_poll(void *data) > * waking us up to test it. > */ > if (new_stream == NULL) { > + validate_endpoint_status_data_stream(); > continue; > } > > @@ -2301,14 +2525,9 @@ end: > > /* > * Notify the data poll thread to poll back again and test the > - * consumer_quit state to quit gracefully. > + * consumer_quit state that we just set so to quit gracefully. 
> */ > - do { > - struct lttng_consumer_stream *null_stream = NULL; > - > - ret = write(ctx->consumer_data_pipe[1], &null_stream, > - sizeof(null_stream)); > - } while (ret < 0 && errno == EINTR); > + notify_thread_pipe(ctx->consumer_data_pipe[1]); > > rcu_unregister_thread(); > return NULL; > diff --git a/src/common/consumer.h b/src/common/consumer.h > index 53b6151..0334c49 100644 > --- a/src/common/consumer.h > +++ b/src/common/consumer.h > @@ -74,6 +74,11 @@ enum lttng_consumer_type { > LTTNG_CONSUMER32_UST, > }; > > +enum consumer_endpoint_status { > + CONSUMER_ENDPOINT_ACTIVE, > + CONSUMER_ENDPOINT_INACTIVE, > +}; > + > struct lttng_consumer_channel { > struct lttng_ht_node_ulong node; > int key; > @@ -150,6 +155,13 @@ struct lttng_consumer_stream { > pthread_mutex_t lock; > /* Tracing session id */ > uint64_t session_id; > + /* > + * Indicates if the stream end point is still active or not (network > + * streaming or local file system). The thread "owning" the stream is > + * handling this status and can be notified of a state change through the > + * consumer data appropriate pipe. > + */ > + enum consumer_endpoint_status endpoint_status; > }; > > /* > diff --git a/src/common/relayd/relayd.c b/src/common/relayd/relayd.c > index 785d3dc..db47608 100644 > --- a/src/common/relayd/relayd.c > +++ b/src/common/relayd/relayd.c > @@ -67,6 +67,7 @@ static int send_command(struct lttcomm_sock *sock, > > ret = sock->ops->sendmsg(sock, buf, buf_size, flags); > if (ret < 0) { > + ret = -errno; > goto error; > } > > @@ -90,6 +91,7 @@ static int recv_reply(struct lttcomm_sock *sock, void *data, size_t size) > > ret = sock->ops->recvmsg(sock, data, size, 0); > if (ret < 0) { > + ret = -errno; > goto error; > } > > @@ -283,6 +285,7 @@ int relayd_send_data_hdr(struct lttcomm_sock *sock, > /* Only send data header. 
*/ > ret = sock->ops->sendmsg(sock, hdr, size, 0); > if (ret < 0) { > + ret = -errno; > goto error; > } > > -- > 1.7.10.4 > > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From mathieu.desnoyers at efficios.com Sat Oct 27 11:28:07 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Sat, 27 Oct 2012 11:28:07 -0400 Subject: [lttng-dev] [RELEASE] Babeltrace 1.0.0 Message-ID: <20121027152807.GA26915@Krystal> The Babeltrace project provides trace read and write libraries, as well as a trace converter. Plugins can be created for any trace format to allow its conversion to/from another trace format. The main format expected to be converted to/from is the Common Trace Format (CTF). The default input format of the "babeltrace" command is CTF, and its default output format is a human-readable text log. The "babeltrace-log" command converts from a text log to a CTF trace. Changelog: 2012-10-27 Babeltrace 1.0.0 * tests: add test traces to distribution tarball * Document bash requirement for make check in README * Add tests to make check * Fix: add missing header size validation * callbacks.c: handle extract_ctf_stream_event return value * Cleanup: fix cppcheck warning * Cleanup: fix cppcheck warnings * fix double-free on error path Project website: http://www.efficios.com/babeltrace Download link: http://www.efficios.com/files/babeltrace/ CTF specification: http://www.efficios.com/ctf -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. 
http://www.efficios.com From David.OShea at quantum.com Sun Oct 28 18:22:05 2012 From: David.OShea at quantum.com (David OShea) Date: Sun, 28 Oct 2012 22:22:05 +0000 Subject: [lttng-dev] libust "Error: Error opening shm" every ~4 seconds if HOME variable is not set Message-ID: <20998D40D9A2B7499CA5A3A2666CB1EB19F843E3@ZURMSG1.QUANTUM.com> Hi all, I have an application that works as expected when I run it from a shell, but when I start it as a daemon from an init script, it outputs a message like this every 4 or 5 seconds: libust[15229/15231]: Error: Error opening shm (in get_wait_shm() at lttng-ust-comm.c:480) I did a little debugging to try and work out what was going on. I didn't actually end up seeing get_wait_shm() being called, but I noticed that ust_listener_thread() was entering the block of code containing: DBG("Info: sessiond not accepting connections to %s apps socket", sock_info->name); It appeared that this was because the sock_path was empty. From looking at sock_info, the wait_shm_path is also empty as shown below, as confirmed in the error message above where "Error opening shm" was meant to be followed by the path that was being opened but instead is just followed by two spaces: (gdb) p * (struct sock_info *) arg $7 = {name = 0x2b2721bd8b48 "local", ust_listener = 1110300992, root_handle = -1, constructor_sem_posted = 1, allowed = 1, global = 0, sock_path = '\000' , socket = -1, wait_shm_path = '\000' , wait_shm_mmap = 0x0} setup_local_apps() appears to not fill in the sock_path and wait_shm_path if the HOME environment variable is not set, which is the case for my daemon. Looking back, I can see that this message appeared during my application's startup: libust[15229/15229]: Error: Error setting up to local apps (in lttng_ust_init() at lttng-ust-comm.c:895) which is the indicator that setup_local_apps() failed. I assume that lttng-ust is attempting to connect to a per-user session daemon, but can't since HOME is not set. 
In my case, the application is running as root and connecting to the global session daemon, so I don't mind that it can't connect to a per-user session daemon. It would be nice if I didn't have to get all of those error messages, although I can certainly ignore them for now and could consider working around them by just making sure HOME is set to something :) I'm using lttng-ust-2.0.5 which I have patched with commit 009745db "Cache the procname per-thread rather than per-process to take into account that prctl() can be used to set thread names." Thanks in advance, David ---------------------------------------------------------------------- The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt. From David.OShea at quantum.com Sun Oct 28 19:05:53 2012 From: David.OShea at quantum.com (David OShea) Date: Sun, 28 Oct 2012 23:05:53 +0000 Subject: [lttng-dev] 'add-context --userspace' must be preceded by 'enable-channel' or 'enable-event' Message-ID: <20998D40D9A2B7499CA5A3A2666CB1EB19F843FE@ZURMSG1.QUANTUM.com> Hi all, As I mentioned briefly in my thread "Wrong procname for userspace trace of app with different thread names", I found that I had to do an 'enable-channel' step to get UST context such as the thread name to actually appear in the trace. From further investigation, it appears that if I perform 'enable-event' before 'add-context', that also makes the context appear.
Here are three sequences of commands where I'm enabling the same UST context (vpid, vtid and procname) and same UST events (all) each time, but only the first sequence includes 'enable-channel', and I switched around the order of 'add-context' and 'enable-event' in the second and third. Note that in the second command sequence, the second set of curly braces - those containing the context - is missing from the 'lttng view' output. Good: 'enable-channel' before 'add-context': """ # lttng create Session auto-20121029-085910 created. Traces will be written in /root/lttng-traces/auto-20121029-085910 # lttng enable-channel channel0 --userspace UST channel channel0 enabled for session auto-20121029-085910 # lttng add-context --userspace -t vpid -t vtid -t procname UST context procname added to all channels UST context vtid added to all channels UST context vpid added to all channels # lttng enable-event --userspace --all All UST events are enabled in channel channel0 # lttng start Tracing started for session auto-20121029-085910 # lttng stop Tracing stopped for session auto-20121029-085910 # lttng view Trace directory: /root/lttng-traces/auto-20121029-085910 [09:00:34.022479500] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 1 }, { procname = "mythread.9", vtid = 13993, vpid = 13811 }, { myvar = 999 } [...] # lttng destroy Session auto-20121029-085910 destroyed at /root """ Bad: 'add-context' before 'enable-event': """ # lttng create Session auto-20121029-090142 created. 
Traces will be written in /root/lttng-traces/auto-20121029-090142 # lttng add-context --userspace -t vpid -t vtid -t procname UST context procname added to all channels UST context vtid added to all channels UST context vpid added to all channels # lttng enable-event --userspace --all All UST events are enabled in channel channel0 # lttng start Tracing started for session auto-20121029-090142 # lttng stop Tracing stopped for session auto-20121029-090142 # lttng view Trace directory: /root/lttng-traces/auto-20121029-090142 [09:02:11.520300500] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 0 }, { myvar = 999 } [...] # lttng destroy Session auto-20121029-090142 destroyed at /root """ Good: 'add-context' AFTER 'enable-event': """ # lttng create Session auto-20121029-090738 created. Traces will be written in /root/lttng-traces/auto-20121029-090738 # lttng enable-event --userspace --all All UST events are enabled in channel channel0 # lttng add-context --userspace -t vpid -t vtid -t procname UST context procname added to all channels UST context vtid added to all channels UST context vpid added to all channels # lttng start Tracing started for session auto-20121029-090738 # lttng stop Tracing stopped for session auto-20121029-090738 # lttng view Trace directory: /root/lttng-traces/auto-20121029-090738 [09:08:04.670574000] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 1 }, { procname = "mythread.9", vtid = 13993, vpid = 13811 }, { myvar = 999 } [...] # lttng destroy Session auto-20121029-090738 destroyed at /root """ I am using lttng-ust-2.0.5 which I have patched with commit 009745db "Cache the procname per-thread rather than per-process to take into account that prctl() can be used to set thread names.", lttng-tools-2.0.4 and babeltrace-1.0.0.rc5. Thanks in advance, David
From mathieu.desnoyers at efficios.com Mon Oct 29 21:42:18 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 29 Oct 2012 21:42:18 -0400 Subject: [lttng-dev] libust "Error: Error opening shm" every ~4 seconds if HOME variable is not set In-Reply-To: <20998D40D9A2B7499CA5A3A2666CB1EB19F843E3@ZURMSG1.QUANTUM.com> References: <20998D40D9A2B7499CA5A3A2666CB1EB19F843E3@ZURMSG1.QUANTUM.com> Message-ID: <20121030014218.GA26711@Krystal> * David OShea (David.OShea at quantum.com) wrote: > Hi all, > > I have an application that works as expected when I run it from a shell, but when I start it as a daemon from an init script, it outputs a message like this every 4 or 5 seconds: > > libust[15229/15231]: Error: Error opening shm (in get_wait_shm() at lttng-ust-comm.c:480) > > I did a little debugging to try and work out what was going on. I didn't actually end up seeing get_wait_shm() being called, but I noticed that ust_listener_thread() was entering the block of code containing: > > DBG("Info: sessiond not accepting connections to %s apps socket", sock_info->name); > > It appeared that this was because the sock_path was empty.
From looking at sock_info, the wait_shm_path is also empty as shown below, as confirmed in the error message above where "Error opening shm" was meant to be followed by the path that was being opened but instead is just followed by two spaces: > > (gdb) p * (struct sock_info *) arg > $7 = {name = 0x2b2721bd8b48 "local", ust_listener = 1110300992, root_handle = -1, constructor_sem_posted = 1, allowed = 1, global = 0, sock_path = '\000' , socket = -1, wait_shm_path = '\000' , wait_shm_mmap = 0x0} > > setup_local_apps() appears to not fill in the sock_path and wait_shm_path if the HOME environment variable is not set, which is the case for my daemon. Looking back, I can see that this message appeared during my application's startup: > > libust[15229/15229]: Error: Error setting up to local apps (in lttng_ust_init() at lttng-ust-comm.c:895) > > which is the indicator that setup_local_apps() failed. > > I assume that lttng-ust is attempting to connect to a per-user session daemon, but can't since HOME is not set. In my case, the application is running as root and connecting to the global session daemon, so I don't mind that it can't connect to a per-user session daemon. It would be nice if I didn't have to get all of those error messages, although I can certainly ignore them for now and could consider working around them by just making sure HOME is set to something :) > > I'm using lttng-ust-2.0.5 which I have patched with commit 009745db "Cache the procname per-thread rather than per-process to take into account that prctl() can be used to set thread names." Thanks for reporting! Can you try with stable-2.0 branch or master branch head ? 
Here are the commits: stable-2.0: commit e699eda9762d3cf3b0c40329eb3b6ce0947789dc Author: Mathieu Desnoyers Date: Mon Oct 29 21:39:42 2012 -0400 Cleanup: don't spawn per-user thread if HOME is not set Reported-by: David OShea Signed-off-by: Mathieu Desnoyers master: commit 9ec6895c5633ed93c5acdf1e5b06f075fbd709d3 Author: Mathieu Desnoyers Date: Mon Oct 29 21:39:42 2012 -0400 Cleanup: don't spawn per-user thread if HOME is not set Reported-by: David OShea Signed-off-by: Mathieu Desnoyers Thanks, Mathieu > > Thanks in advance, > David > > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc.
http://www.efficios.com From mathieu.desnoyers at efficios.com Mon Oct 29 21:48:45 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Mon, 29 Oct 2012 21:48:45 -0400 Subject: [lttng-dev] 'add-context --userspace' must be preceded by 'enable-channel' or 'enable-event' In-Reply-To: <20998D40D9A2B7499CA5A3A2666CB1EB19F843FE@ZURMSG1.QUANTUM.com> References: <20998D40D9A2B7499CA5A3A2666CB1EB19F843FE@ZURMSG1.QUANTUM.com> Message-ID: <20121030014845.GB26711@Krystal> * David OShea (David.OShea at quantum.com) wrote: > Hi all, > > As I mentioned briefly in my thread "Wrong procname for userspace > trace of app with different thread names", I found that I had to do an > 'enable-channel' step to get UST context such as the thread name to > actually appear in the trace. > > From further investigation, it appears that if I perform > 'enable-event' before 'add-context', that also makes the context > appear. Here are three sequences of commands where I'm enabling the > same UST context (vpid, vtid and procname) and same UST events (all) > each time, but only the first sequence includes 'enable-channel', and > I switched around the order of 'add-context' and 'enable-event' in the > second and third. Note that in the second command sequence, the > second set of curly braces - those containing the context - is missing > from the 'lttng view' output. If I understand correctly, the two use-cases where you issue "add-context" prior to enable-event and prior to enable-channel are behaving as if they are failing (those contexts don't appear in the trace), but you don't get any error message. David (David Goulet, in CC), any clue on why lttng-tools behaves that way, and how can we fix this ? Thanks, Mathieu > > Good: 'enable-channel' before 'add-context': > > """ > # lttng create > Session auto-20121029-085910 created. 
> Traces will be written in /root/lttng-traces/auto-20121029-085910 > # lttng enable-channel channel0 --userspace > UST channel channel0 enabled for session auto-20121029-085910 > # lttng add-context --userspace -t vpid -t vtid -t procname > UST context procname added to all channels > UST context vtid added to all channels > UST context vpid added to all channels > # lttng enable-event --userspace --all > All UST events are enabled in channel channel0 > # lttng start > Tracing started for session auto-20121029-085910 > # lttng stop > Tracing stopped for session auto-20121029-085910 > # lttng view > Trace directory: /root/lttng-traces/auto-20121029-085910 > > [09:00:34.022479500] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 1 }, { procname = "mythread.9", vtid = 13993, vpid = 13811 }, { myvar = 999 } > [...] > # lttng destroy > Session auto-20121029-085910 destroyed at /root > """ > > Bad: 'add-context' before 'enable-event': > > """ > # lttng create > Session auto-20121029-090142 created. > Traces will be written in /root/lttng-traces/auto-20121029-090142 > # lttng add-context --userspace -t vpid -t vtid -t procname > UST context procname added to all channels > UST context vtid added to all channels > UST context vpid added to all channels > # lttng enable-event --userspace --all > All UST events are enabled in channel channel0 > # lttng start > Tracing started for session auto-20121029-090142 > # lttng stop > Tracing stopped for session auto-20121029-090142 > # lttng view > Trace directory: /root/lttng-traces/auto-20121029-090142 > > [09:02:11.520300500] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 0 }, { myvar = 999 } > [...] > # lttng destroy > Session auto-20121029-090142 destroyed at /root > """ > > Good: 'add-context' AFTER 'enable-event': > > """ > # lttng create > Session auto-20121029-090738 created. 
> Traces will be written in /root/lttng-traces/auto-20121029-090738 > # lttng enable-event --userspace --all > All UST events are enabled in channel channel0 > # lttng add-context --userspace -t vpid -t vtid -t procname > UST context procname added to all channels > UST context vtid added to all channels > UST context vpid added to all channels > # lttng start > Tracing started for session auto-20121029-090738 > # lttng stop > Tracing stopped for session auto-20121029-090738 > # lttng view > Trace directory: /root/lttng-traces/auto-20121029-090738 > > [09:08:04.670574000] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 1 }, { procname = "mythread.9", vtid = 13993, vpid = 13811 }, { myvar = 999 } > [...] > # lttng destroy > Session auto-20121029-090738 destroyed at /root > """ > > I am using lttng-ust-2.0.5 which I have patched with commit 009745db "Cache the procname per-thread rather than per-process to take into account that prctl() can be used to set thread names.", lttng-tools-2.0.4 and babeltrace-1.0.0.rc5. > > Thanks in advance, > David
> > _______________________________________________ > lttng-dev mailing list > lttng-dev at lists.lttng.org > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers Operating System Efficiency R&D Consultant EfficiOS Inc. http://www.efficios.com From paulmck at linux.vnet.ibm.com Mon Oct 29 22:26:19 2012 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Mon, 29 Oct 2012 19:26:19 -0700 Subject: [lttng-dev] Fw: Re: [PATCH v2] epoll: Support for disabling items, and a self-test app. Message-ID: <20121030022619.GK3027@linux.vnet.ibm.com> FYI, userspace RCU proposed to solve an issue with epoll. Thanx, Paul ----- Forwarded message from Matt Helsley ----- Date: Fri, 26 Oct 2012 14:52:42 -0700 From: Matt Helsley To: "Michael Kerrisk (man-pages)" Cc: "Paton J. Lewis" , Alexander Viro , Andrew Morton , Jason Baron , "linux-fsdevel at vger.kernel.org" , "linux-kernel at vger.kernel.org" , Paul Holland , Davide Libenzi , "libc-alpha at sourceware.org" , Linux API , Paul McKenney Subject: Re: [PATCH v2] epoll: Support for disabling items, and a self-test app. On Thu, Oct 25, 2012 at 12:23:24PM +0200, Michael Kerrisk (man-pages) wrote: > Hi Pat, > > > >> I suppose that I have a concern that goes in the other direction. Is > >> there not some other solution possible that doesn't require the use of > >> EPOLLONESHOT? It seems overly restrictive to require that the caller > >> must employ this flag, and imposes the burden that the caller must > >> re-enable monitoring after each event. > >> > >> Does a solution like the following (with no requirement for EPOLLONESHOT) > >> work? > >> > >> 0. Implement an epoll_ctl() operation EPOLL_CTL_XXX > >> where the name XXX might be chosen based on the decision > >> in 4(a). > >> 1. EPOLL_CTL_XXX employs a private flag, EPOLLUSED, in the > >> per-fd events mask in the ready list. By default, > >> that flag is off. > >> 2. 
epoll_wait() always clears the EPOLLUSED flag if a > >> file descriptor is found to be ready. > >> 3. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED > >> flag is NOT set, then > >> a) it sets the EPOLLUSED flag > >> b) It disables I/O events (as per EPOLL_CTL_DISABLE) > >> (I'm not 100% sure if this is necessary). > >> c) it returns EBUSY to the caller > >> 4. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED > >> flag IS set, then it > >> a) either deletes the fd or disables events for the fd > >> (the choice here is a matter of design taste, I think; > >> deletion has the virtue of simplicity; disabling provides > >> the option to re-enable the fd later, if desired) > >> b) returns 0 to the caller. > >> > >> All of the above with suitable locking around the user-space cache. > >> > >> Cheers, > >> > >> Michael > > > > > > I don't believe that proposal will solve the problem. Consider the case > > where a worker thread has just executed epoll_wait and is about to execute > > the next line of code (which will access the data associated with the fd > > receiving the event). If the deletion thread manages to call > > epoll_ctl(EPOLL_CTL_XXX) for that fd twice in a row before the worker thread > > is able to execute the next statement, then the deletion thread will > > mistakenly conclude that it is safe to destroy the data that the worker > > thread is about to access. > > Okay -- I had the idea there might be a hole in my proposal ;-). > > By the way, have you been reading the comments in the two LWN articles > on EPOLL_CTL_DISABLE? > https://lwn.net/Articles/520012/ > http://lwn.net/SubscriberLink/520198/fd81ba0ecb1858a2/ > > There's some interesting proposals there--some suggesting that an > entirely user-space solution might be possible. I haven't looked > deeply into the ideas though. Yeah, I became quite interested so I wrote a crude epoll + urcu test.
Since it's RCU, a review to ensure I've not made any serious mistakes could be quite helpful: #define _LGPL_SOURCE 1 #define _GNU_SOURCE 1 #include #include #include #include #include #include #include #include #include /* * Locking Voodoo: * * The globals prefixed by _ require special care because they will be * accessed from multiple threads. * * The precise locking scheme we use varies depending on whether READERS_USE_MUTEX is defined. * When we're using userspace RCU the mutex only gets acquired for writes * to _-prefixed globals. Reads are done inside RCU read side critical * sections. * Otherwise the epmutex covers reads and writes to them all and the test * is not very scalable. */ static pthread_mutex_t epmutex = PTHREAD_MUTEX_INITIALIZER; static int _p[2]; /* Send dummy data from one thread to another */ static int _epfd; /* Threads wait to read/write on epfd */ static int _nepitems = 0; #ifdef READERS_USE_MUTEX #define init_lock() do {} while(0) #define init_thread() do {} while(0) #define read_lock pthread_mutex_lock #define read_unlock pthread_mutex_unlock #define fini_thread() do {} while(0) /* Because readers use the mutex synchronize_rcu() is a no-op */ #define synchronize_rcu() do {} while(0) #else #include #define init_lock rcu_init #define init_thread rcu_register_thread #define read_lock(m) rcu_read_lock() #define read_unlock(m) rcu_read_unlock() #define fini_thread() do { rcu_unregister_thread(); } while(0) #endif #define write_lock pthread_mutex_lock #define write_unlock pthread_mutex_unlock /* We send this data through the pipe.
*/ static const char *data = "test"; const size_t dlen = 5; static inline int harmless_errno(void) { return ((errno == EWOULDBLOCK) || (errno == EAGAIN) || (errno == EINTR)); } static void* thread_main(void *thread_nr) { struct epoll_event ev; int rc = 0; char buffer[dlen]; unsigned long long _niterations = 0; init_thread(); while (!rc) { read_lock(&epmutex); if (_nepitems < 1) { read_unlock(&epmutex); break; } rc = epoll_wait(_epfd, &ev, 1, 1); if (rc < 1) { read_unlock(&epmutex); if (rc == 0) continue; if (harmless_errno()) { rc = 0; continue; } break; } if (ev.events & EPOLLOUT) { rc = write(_p[1], data, dlen); read_unlock(&epmutex); if (rc < 0) { if (harmless_errno()) { rc = 0; continue; } break; } rc = 0; } else if (ev.events & EPOLLIN) { rc = read(_p[0], buffer, dlen); read_unlock(&epmutex); if (rc < 0) { if (harmless_errno()) { rc = 0; continue; } break; } rc = 0; } else read_unlock(&epmutex); _niterations++; } fini_thread(); return (void *)_niterations; } /* Some sample numbers from varying MAX_THREADS on my laptop: * With a global mutex: * 1 core for the main thread * 1 core for epoll_wait()'ing threads * The mutex doesn't scale -- increasing the number of threads despite * having more real cores just causes performance to go down. * 7 threads, 213432.128160 iterations per second * 3 threads, 606560.183997 iterations per second * 2 threads, 1346006.413404 iterations per second * 1 thread , 2148936.348793 iterations per second * * With URCU: * 1 core for the main thread which spins reading niterations. * N-1 cores for the epoll_wait()'ing threads. 
* "Hyperthreading" doesn't help here -- I've got 4 cores: * 7 threads, 1537304.965009 iterations per second * 4 threads, 1912846.753203 iterations per second * 3 threads, 2278639.336464 iterations per second * 2 threads, 1928805.899146 iterations per second * 1 thread , 2007198.066327 iterations per second */ #define MAX_THREADS 3 int main (int argc, char **argv) { struct timespec before, req, after; unsigned long long niterations = 0; pthread_t threads[MAX_THREADS]; struct epoll_event ev; int nthreads = 0, rc; init_lock(); /* Since we haven't made the threads yet we can safely use _ globals */ rc = pipe2(_p, O_NONBLOCK); if (rc < 0) goto error; _epfd = epoll_create1(EPOLL_CLOEXEC); if (_epfd < 0) goto error; /* Monitor the pipe via epoll */ ev.events = EPOLLIN; ev.data.u32 = 0; /* index in _p[] */ rc = epoll_ctl(_epfd, EPOLL_CTL_ADD, _p[0], &ev); if (rc < 0) goto error; _nepitems++; printf("Added fd %d to epoll set %d\n", _p[0], _epfd); ev.events = EPOLLOUT; ev.data.u32 = 1; rc = epoll_ctl(_epfd, EPOLL_CTL_ADD, _p[1], &ev); if (rc < 0) goto error; _nepitems++; printf("Added fd %d to epoll set %d\n", _p[1], _epfd); fflush(stdout); /* * After the first pthread_create() we can't safely use _ globals * without adhering to the locking scheme. pthread_create() should * also imply some thorough memory barriers so all our previous * modifications to the _ globals should be visible after this point. */ for (rc = 0; nthreads < MAX_THREADS; nthreads++) { rc = pthread_create(&threads[nthreads], NULL, &thread_main, (void *)(long)nthreads); if (rc < 0) goto error; } /* Wait for our child threads to do some "work" */ req.tv_sec = 30; rc = clock_gettime(CLOCK_MONOTONIC_RAW, &before); rc = nanosleep(&req, NULL); rc = clock_gettime(CLOCK_MONOTONIC_RAW, &after); /* * Modify the epoll interest set. This can leave stale * data in other threads because they may have done an * epoll_wait() with RCU read lock held instead of the * epmutex. 
*/ write_lock(&epmutex); rc = epoll_ctl(_epfd, EPOLL_CTL_DEL, _p[0], &ev); if (rc == 0) { _nepitems--; printf("Removed fd %d from epoll set %d\n", _p[0], _epfd); rc = epoll_ctl(_epfd, EPOLL_CTL_DEL, _p[1], &ev); if (rc == 0) { printf("Removed fd %d from epoll set %d\n", _p[1], _epfd); _nepitems--; } } write_unlock(&epmutex); if (rc < 0) goto error; /* * Wait until the stale data are no longer in use. * We could use call_rcu() here too, but let's keep the test simple. */ printf("synchronize_rcu()\n"); fflush(stdout); synchronize_rcu(); printf("closing fds\n"); fflush(stdout); /* Clean up the stale data */ close(_p[0]); close(_p[1]); close(_epfd); printf("closed fds (%d, %d, %d)\n", _p[0], _p[1], _epfd); fflush(stdout); /* * Test is done. Join all the threads so that we give time for * races to show up. */ niterations = 0; for (; nthreads > 0; nthreads--) { unsigned long long thread_iterations; rc = pthread_join(threads[nthreads - 1], (void *)&thread_iterations); niterations += thread_iterations; } after.tv_sec -= before.tv_sec; after.tv_nsec -= before.tv_nsec; if (after.tv_nsec < 0) { --after.tv_sec; after.tv_nsec += 1000000000; } printf("%f iterations per second\n", (double)niterations/((double)after.tv_sec + (double)after.tv_nsec/1000000000.0)); exit(EXIT_SUCCESS); error: /* This is trashy testcase code -- it doesn't do full cleanup! */ for (; nthreads > 0; nthreads--) rc = pthread_cancel(threads[nthreads - 1]); exit(EXIT_FAILURE); } ----- End forwarded message ----- From mathieu.desnoyers at efficios.com Tue Oct 30 09:13:00 2012 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Tue, 30 Oct 2012 09:13:00 -0400 Subject: [lttng-dev] Fw: Re: [PATCH v2] epoll: Support for disabling items, and a self-test app. In-Reply-To: <20121030022619.GK3027@linux.vnet.ibm.com> References: <20121030022619.GK3027@linux.vnet.ibm.com> Message-ID: <20121030131300.GA3807@Krystal> * Paul E. 
McKenney (paulmck at linux.vnet.ibm.com) wrote: > FYI, userspace RCU proposed to solve an issue with epoll. Hi Paul! That's quite interesting indeed! I'm wondering about a couple of things related to the patch below. I see that the RCU read-side critical section here is used to link together the epoll_wait() operation and the subsequent read/write on the FD. On the update side, it ensures a grace period is observed between EPOLL_CTL_DEL and FD close. This should guarantee existence of the opened FD for the entire read-side C.S. if it's been observed as being in the poll set by epoll_wait(). This all sounds fine. The only question that comes up in my mind is: is it possible that one epoll_wait() call blocks for a very long period of time in the system ? Could this lead to memory exhaustion if we also happen to use RCU to delay memory reclaim within the same application ? We might want to revisit our guidelines about blocking OS calls within RCU read-side critical sections, or at least document the possible impact. Thoughts ? Thanks, Mathieu > > Thanx, Paul > > ----- Forwarded message from Matt Helsley ----- > > Date: Fri, 26 Oct 2012 14:52:42 -0700 > From: Matt Helsley > To: "Michael Kerrisk (man-pages)" > Cc: "Paton J. Lewis" , Alexander Viro > , Andrew Morton , > Jason Baron , "linux-fsdevel at vger.kernel.org" > , "linux-kernel at vger.kernel.org" > , Paul Holland , > Davide Libenzi , "libc-alpha at sourceware.org" > , Linux API , > Paul McKenney > Subject: Re: [PATCH v2] epoll: Support for disabling items, and a self-test > app. > > On Thu, Oct 25, 2012 at 12:23:24PM +0200, Michael Kerrisk (man-pages) wrote: > > Hi Pat, > > > > > > >> I suppose that I have a concern that goes in the other direction. Is > > >> there not some other solution possible that doesn't require the use of > > >> EPOLLONESHOT?
It seems overly restrictive to require that the caller > > >> must employ this flag, and imposes the burden that the caller must > > >> re-enable monitoring after each event. > > >> > > >> Does a solution like the following (with no requirement for EPOLLONESHOT) > > >> work? > > >> > > >> 0. Implement an epoll_ctl() operation EPOLL_CTL_XXX > > >> where the name XXX might be chosen based on the decision > > >> in 4(a). > > >> 1. EPOLL_CTL_XXX employs a private flag, EPOLLUSED, in the > > >> per-fd events mask in the ready list. By default, > > >> that flag is off. > > >> 2. epoll_wait() always clears the EPOLLUSED flag if a > > >> file descriptor is found to be ready. > > >> 3. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED > > >> flag is NOT set, then > > >> a) it sets the EPOLLUSED flag > > >> b) It disables I/O events (as per EPOLL_CTL_DISABLE) > > >> (I'm not 100% sure if this is necessary). > > >> c) it returns EBUSY to the caller > > >> 4. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED > > >> flag IS set, then it > > >> a) either deletes the fd or disables events for the fd > > >> (the choice here is a matter of design taste, I think; > > >> deletion has the virtue of simplicity; disabling provides > > >> the option to re-enable the fd later, if desired) > > >> b) returns 0 to the caller. > > >> > > >> All of the above with suitable locking around the user-space cache. > > >> > > >> Cheers, > > >> > > >> Michael > > > > > > > > > I don't believe that proposal will solve the problem. Consider the case > > > where a worker thread has just executed epoll_wait and is about to execute > > > the next line of code (which will access the data associated with the fd > > > receiving the event).
> > > If the deletion thread manages to call
> > > epoll_ctl(EPOLL_CTL_XXX) for that fd twice in a row before the worker thread
> > > is able to execute the next statement, then the deletion thread will
> > > mistakenly conclude that it is safe to destroy the data that the worker
> > > thread is about to access.
> >
> > Okay -- I had the idea there might be a hole in my proposal ;-).
> >
> > By the way, have you been reading the comments in the two LWN articles
> > on EPOLL_CTL_DISABLE?
> > https://lwn.net/Articles/520012/
> > http://lwn.net/SubscriberLink/520198/fd81ba0ecb1858a2/
> >
> > There's some interesting proposals there--some suggesting that an
> > entirely user-space solution might be possible. I haven't looked
> > deeply into the ideas though.
>
> Yeah, I became quite interested so I wrote a crude epoll + urcu test.
> Since it's RCU, review to ensure I've not made any serious mistakes could
> be quite helpful:
>
> #define _LGPL_SOURCE 1
> #define _GNU_SOURCE 1
>
> #include
> #include
> #include
> #include
> #include
> #include
> #include
> #include
>
> #include
>
> /*
>  * Locking Voodoo:
>  *
>  * The globals prefixed by _ require special care because they will be
>  * accessed from multiple threads.
>  *
>  * The precise locking scheme we use varies whether READERS_USE_MUTEX is defined
>  * When we're using userspace RCU the mutex only gets acquired for writes
>  * to _-prefixed globals. Reads are done inside RCU read side critical
>  * sections.
>  * Otherwise the epmutex covers reads and writes to them all and the test
>  * is not very scalable.
>  */
> static pthread_mutex_t epmutex = PTHREAD_MUTEX_INITIALIZER;
> static int _p[2]; /* Send dummy data from one thread to another */
> static int _epfd; /* Threads wait to read/write on epfd */
> static int _nepitems = 0;
>
> #ifdef READERS_USE_MUTEX
> #define init_lock() do {} while(0)
> #define init_thread() do {} while(0)
> #define read_lock pthread_mutex_lock
> #define read_unlock pthread_mutex_unlock
> #define fini_thread() do {} while(0)
> /* Because readers use the mutex synchronize_rcu() is a no-op */
> #define synchronize_rcu() do {} while(0)
> #else
> #include
> #define init_lock rcu_init
> #define init_thread rcu_register_thread
> #define read_lock(m) rcu_read_lock()
> #define read_unlock(m) rcu_read_unlock()
> #define fini_thread() do { rcu_unregister_thread(); } while(0)
> #endif
> #define write_lock pthread_mutex_lock
> #define write_unlock pthread_mutex_unlock
>
> /* We send this data through the pipe. */
> static const char *data = "test";
> const size_t dlen = 5;
>
> static inline int harmless_errno(void)
> {
>     return ((errno == EWOULDBLOCK) || (errno == EAGAIN) || (errno == EINTR));
> }
>
> static void* thread_main(void *thread_nr)
> {
>     struct epoll_event ev;
>     int rc = 0;
>     char buffer[dlen];
>     unsigned long long _niterations = 0;
>
>     init_thread();
>     while (!rc) {
>         read_lock(&epmutex);
>         if (_nepitems < 1) {
>             read_unlock(&epmutex);
>             break;
>         }
>         rc = epoll_wait(_epfd, &ev, 1, 1);
>         if (rc < 1) {
>             read_unlock(&epmutex);
>             if (rc == 0)
>                 continue;
>             if (harmless_errno()) {
>                 rc = 0;
>                 continue;
>             }
>             break;
>         }
>
>         if (ev.events & EPOLLOUT) {
>             rc = write(_p[1], data, dlen);
>             read_unlock(&epmutex);
>             if (rc < 0) {
>                 if (harmless_errno()) {
>                     rc = 0;
>                     continue;
>                 }
>                 break;
>             }
>             rc = 0;
>         } else if (ev.events & EPOLLIN) {
>             rc = read(_p[0], buffer, dlen);
>             read_unlock(&epmutex);
>             if (rc < 0) {
>                 if (harmless_errno()) {
>                     rc = 0;
>                     continue;
>                 }
>                 break;
>             }
>             rc = 0;
>         } else
>             read_unlock(&epmutex);
>         _niterations++;
>     }
>     fini_thread();
>     return (void *)_niterations;
> }
>
> /* Some sample numbers from varying MAX_THREADS on my laptop:
>  * With a global mutex:
>  *   1 core for the main thread
>  *   1 core for epoll_wait()'ing threads
>  *   The mutex doesn't scale -- increasing the number of threads despite
>  *   having more real cores just causes performance to go down.
>  *   7 threads, 213432.128160 iterations per second
>  *   3 threads, 606560.183997 iterations per second
>  *   2 threads, 1346006.413404 iterations per second
>  *   1 thread , 2148936.348793 iterations per second
>  *
>  * With URCU:
>  *   1 core for the main thread which spins reading niterations.
>  *   N-1 cores for the epoll_wait()'ing threads.
>  *   "Hyperthreading" doesn't help here -- I've got 4 cores:
>  *   7 threads, 1537304.965009 iterations per second
>  *   4 threads, 1912846.753203 iterations per second
>  *   3 threads, 2278639.336464 iterations per second
>  *   2 threads, 1928805.899146 iterations per second
>  *   1 thread , 2007198.066327 iterations per second
>  */
> #define MAX_THREADS 3
>
> int main (int argc, char **argv)
> {
>     struct timespec before, req, after;
>     unsigned long long niterations = 0;
>     pthread_t threads[MAX_THREADS];
>     struct epoll_event ev;
>     int nthreads = 0, rc;
>
>     init_lock();
>
>     /* Since we haven't made the threads yet we can safely use _ globals */
>     rc = pipe2(_p, O_NONBLOCK);
>     if (rc < 0)
>         goto error;
>
>     _epfd = epoll_create1(EPOLL_CLOEXEC);
>     if (_epfd < 0)
>         goto error;
>
>     /* Monitor the pipe via epoll */
>     ev.events = EPOLLIN;
>     ev.data.u32 = 0; /* index in _p[] */
>     rc = epoll_ctl(_epfd, EPOLL_CTL_ADD, _p[0], &ev);
>     if (rc < 0)
>         goto error;
>     _nepitems++;
>     printf("Added fd %d to epoll set %d\n", _p[0], _epfd);
>     ev.events = EPOLLOUT;
>     ev.data.u32 = 1;
>     rc = epoll_ctl(_epfd, EPOLL_CTL_ADD, _p[1], &ev);
>     if (rc < 0)
>         goto error;
>     _nepitems++;
>     printf("Added fd %d to epoll set %d\n", _p[1], _epfd);
>     fflush(stdout);
>
>     /*
>      * After the first pthread_create() we can't safely use _ globals
>      * without adhering to the locking scheme. pthread_create() should
>      * also imply some thorough memory barriers so all our previous
>      * modifications to the _ globals should be visible after this point.
>      */
>     for (rc = 0; nthreads < MAX_THREADS; nthreads++) {
>         rc = pthread_create(&threads[nthreads], NULL, &thread_main,
>                             (void *)(long)nthreads);
>         if (rc < 0)
>             goto error;
>     }
>
>     /* Wait for our child threads to do some "work" */
>     req.tv_sec = 30;
>     rc = clock_gettime(CLOCK_MONOTONIC_RAW, &before);
>     rc = nanosleep(&req, NULL);
>     rc = clock_gettime(CLOCK_MONOTONIC_RAW, &after);
>
>     /*
>      * Modify the epoll interest set. This can leave stale
>      * data in other threads because they may have done an
>      * epoll_wait() with RCU read lock held instead of the
>      * epmutex.
>      */
>     write_lock(&epmutex);
>     rc = epoll_ctl(_epfd, EPOLL_CTL_DEL, _p[0], &ev);
>     if (rc == 0) {
>         _nepitems--;
>         printf("Removed fd %d from epoll set %d\n", _p[0], _epfd);
>         rc = epoll_ctl(_epfd, EPOLL_CTL_DEL, _p[1], &ev);
>         if (rc == 0) {
>             printf("Removed fd %d from epoll set %d\n", _p[1], _epfd);
>             _nepitems--;
>         }
>     }
>     write_unlock(&epmutex);
>     if (rc < 0)
>         goto error;
>
>     /*
>      * Wait until the stale data are no longer in use.
>      * We could use call_rcu() here too, but let's keep the test simple.
>      */
>     printf("synchronize_rcu()\n");
>     fflush(stdout);
>     synchronize_rcu();
>
>     printf("closing fds\n");
>     fflush(stdout);
>
>     /* Clean up the stale data */
>     close(_p[0]);
>     close(_p[1]);
>     close(_epfd);
>
>     printf("closed fds (%d, %d, %d)\n", _p[0], _p[1], _epfd);
>     fflush(stdout);
>
>     /*
>      * Test is done. Join all the threads so that we give time for
>      * races to show up.
>      */
>     niterations = 0;
>     for (; nthreads > 0; nthreads--) {
>         unsigned long long thread_iterations;
>
>         rc = pthread_join(threads[nthreads - 1],
>                           (void *)&thread_iterations);
>         niterations += thread_iterations;
>     }
>
>     after.tv_sec -= before.tv_sec;
>     after.tv_nsec -= before.tv_nsec;
>     if (after.tv_nsec < 0) {
>         --after.tv_sec;
>         after.tv_nsec += 1000000000;
>     }
>     printf("%f iterations per second\n", (double)niterations/((double)after.tv_sec + (double)after.tv_nsec/1000000000.0));
>     exit(EXIT_SUCCESS);
> error:
>     /* This is trashy testcase code -- it doesn't do full cleanup! */
>     for (; nthreads > 0; nthreads--)
>         rc = pthread_cancel(threads[nthreads - 1]);
>     exit(EXIT_FAILURE);
> }
>
>
> ----- End forwarded message -----
>
>
> _______________________________________________
> lttng-dev mailing list
> lttng-dev at lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

From montarcyber at gmail.com Tue Oct 30 13:33:50 2012
From: montarcyber at gmail.com (tchak adim)
Date: Tue, 30 Oct 2012 13:33:50 -0400
Subject: [lttng-dev] Question
Message-ID:

Hi all,

I want to know the meaning of every kernel event in LTTng, but I didn't
find any documentation that covers that part.

Also, I want to know how I can retrieve system metrics from kernel
traces. I already found a document named "Recovering System Metrics from
Kernel Trace", but it seems that this document isn't up to date because
I'm using LTTng 2.0.

Can you help me please?

THX.

From montarcyber at gmail.com Tue Oct 30 13:46:54 2012
From: montarcyber at gmail.com (tchak adim)
Date: Tue, 30 Oct 2012 13:46:54 -0400
Subject: [lttng-dev] Question
Message-ID:

Hi all,

I'm using LTTng 2.0 and I didn't find any documentation that talks about
the meaning of the kernel events listed by "lttng list -k".
I want to know how I can retrieve system metrics from kernel traces. I
already found a document named "Recovering System Metrics from Kernel
Trace", but it seems that this document isn't up to date because I'm
using LTTng 2.0. Is there any updated document that covers that part?

Can you help me please?

THX.

From dgoulet at efficios.com Tue Oct 30 16:34:53 2012
From: dgoulet at efficios.com (David Goulet)
Date: Tue, 30 Oct 2012 16:34:53 -0400
Subject: [lttng-dev] 'add-context --userspace' must be preceded by 'enable-channel' or 'enable-event'
In-Reply-To: <20121030014845.GB26711@Krystal>
References: <20998D40D9A2B7499CA5A3A2666CB1EB19F843FE@ZURMSG1.QUANTUM.com> <20121030014845.GB26711@Krystal>
Message-ID: <509039ED.6020203@efficios.com>

Comments below.

Mathieu Desnoyers:
> * David OShea (David.OShea at quantum.com) wrote:
>> Hi all,
>>
>> As I mentioned briefly in my thread "Wrong procname for userspace
>> trace of app with different thread names", I found that I had to do an
>> 'enable-channel' step to get UST context such as the thread name to
>> actually appear in the trace.
>>
>> From further investigation, it appears that if I perform
>> 'enable-event' before 'add-context', that also makes the context
>> appear. Here are three sequences of commands where I'm enabling the
>> same UST context (vpid, vtid and procname) and same UST events (all)
>> each time, but only the first sequence includes 'enable-channel', and
>> I switched around the order of 'add-context' and 'enable-event' in the
>> second and third. Note that in the second command sequence, the
>> second set of curly braces - those containing the context - is missing
>> from the 'lttng view' output.
>
> If I understand correctly, the two use-cases where you issue
> "add-context" prior to enable-event and prior to enable-channel are
> behaving as if they are failing (those contexts don't appear in the
> trace), but you don't get any error message. David (David Goulet, in
> CC), any clue on why lttng-tools behaves that way, and how can we fix
> this ?
>
> Thanks,
>
> Mathieu
>
>>
>> Good: 'enable-channel' before 'add-context':
>>
>> """
>> # lttng create
>> Session auto-20121029-085910 created.
>> Traces will be written in /root/lttng-traces/auto-20121029-085910
>> # lttng enable-channel channel0 --userspace
>> UST channel channel0 enabled for session auto-20121029-085910
>> # lttng add-context --userspace -t vpid -t vtid -t procname
>> UST context procname added to all channels
>> UST context vtid added to all channels
>> UST context vpid added to all channels
>> # lttng enable-event --userspace --all
>> All UST events are enabled in channel channel0
>> # lttng start
>> Tracing started for session auto-20121029-085910
>> # lttng stop
>> Tracing stopped for session auto-20121029-085910
>> # lttng view
>> Trace directory: /root/lttng-traces/auto-20121029-085910
>>
>> [09:00:34.022479500] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 1 }, { procname = "mythread.9", vtid = 13993, vpid = 13811 }, { myvar = 999 }
>> [...]
>> # lttng destroy
>> Session auto-20121029-085910 destroyed at /root
>> """
>>
>> Bad: 'add-context' before 'enable-event':
>>
>> """
>> # lttng create
>> Session auto-20121029-090142 created.
>> Traces will be written in /root/lttng-traces/auto-20121029-090142
>> # lttng add-context --userspace -t vpid -t vtid -t procname
>> UST context procname added to all channels
>> UST context vtid added to all channels
>> UST context vpid added to all channels
>> # lttng enable-event --userspace --all

The context is added to "channel0" here which is the default one created
automatically.
The lttng-tools session daemon does add the contexts to the channel on
the tracer side (ustctl_add_context), so Mathieu, we might want to check
whether the UST tracer behaves correctly by adding the context to all
events of a channel. (Note here that -a -u was used, hence the "*" event.)

I also confirm that lttng-tools is doing the right ustctl call on
channel0 here.

Thanks
David

>> All UST events are enabled in channel channel0
>> # lttng start
>> Tracing started for session auto-20121029-090142
>> # lttng stop
>> Tracing stopped for session auto-20121029-090142
>> # lttng view
>> Trace directory: /root/lttng-traces/auto-20121029-090142
>>
>> [09:02:11.520300500] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 0 }, { myvar = 999 }
>> [...]
>> # lttng destroy
>> Session auto-20121029-090142 destroyed at /root
>> """
>>
>> Good: 'add-context' AFTER 'enable-event':
>>
>> """
>> # lttng create
>> Session auto-20121029-090738 created.
>> Traces will be written in /root/lttng-traces/auto-20121029-090738
>> # lttng enable-event --userspace --all
>> All UST events are enabled in channel channel0
>> # lttng add-context --userspace -t vpid -t vtid -t procname
>> UST context procname added to all channels
>> UST context vtid added to all channels
>> UST context vpid added to all channels
>> # lttng start
>> Tracing started for session auto-20121029-090738
>> # lttng stop
>> Tracing stopped for session auto-20121029-090738
>> # lttng view
>> Trace directory: /root/lttng-traces/auto-20121029-090738
>>
>> [09:08:04.670574000] (+?.?????????) mydaemon:13811 daemon:start: { cpu_id = 1 }, { procname = "mythread.9", vtid = 13993, vpid = 13811 }, { myvar = 999 }
>> [...]
>> # lttng destroy
>> Session auto-20121029-090738 destroyed at /root
>> """
>>
>> I am using lttng-ust-2.0.5 which I have patched with commit 009745db "Cache the procname per-thread rather than per-process to take into account that prctl() can be used to set thread names.", lttng-tools-2.0.4 and babeltrace-1.0.0.rc5.
>>
>> Thanks in advance,
>> David
>>
>> ----------------------------------------------------------------------
>> The information contained in this transmission may be confidential. Any disclosure, copying, or further distribution of confidential information is not permitted unless such privilege is explicitly granted in writing by Quantum. Quantum reserves the right to have electronic communications, including email and attachments, sent across its networks filtered through anti virus and spam software programs and retain such messages in order to comply with applicable data security and retention requirements. Quantum is not responsible for the proper and complete transmission of the substance of this communication or for any delay in its receipt.
>>
>> _______________________________________________
>> lttng-dev mailing list
>> lttng-dev at lists.lttng.org
>> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>

From christian.babeux at efficios.com Wed Oct 31 09:00:45 2012
From: christian.babeux at efficios.com (Christian Babeux)
Date: Wed, 31 Oct 2012 09:00:45 -0400
Subject: [lttng-dev] [PATCH lttng-tools] Tests: Add filtering tests for uncovered cases
Message-ID: <1351688445-18551-1-git-send-email-christian.babeux@efficios.com>

While investigating the code coverage of the filtering feature, a couple
of possible tests cases were uncovered:

Error tests:
* Strings can't be IR root node
* Unary ! not allowed on string type
* Comparison with string type not allowed
* Logical operator not allowed with string types
* Nesting of binary operator not allowed

Valid tests:
* Cover all left/right operands permutations with fields ref.
  and numeric values.

Signed-off-by: Christian Babeux
---
 tests/tools/filtering/invalid-filters | 15 +++++++++++++++
 tests/tools/filtering/valid-filters   |  4 ++++
 2 files changed, 19 insertions(+)

diff --git a/tests/tools/filtering/invalid-filters b/tests/tools/filtering/invalid-filters
index d0777e5..b653705 100755
--- a/tests/tools/filtering/invalid-filters
+++ b/tests/tools/filtering/invalid-filters
@@ -118,6 +118,21 @@ INVALID_FILTERS=(
     "!a.f.d"
     "asdf.asdfsd.sadf < 4"
     "asdfasdf->asdfasdf < 2"
+    # String can't be root node
+    "\"somestring\""
+    # Unary op on string not allowed
+    "!\"somestring\""
+    # Comparison with string type not allowed
+    "\"somestring\" > 42"
+    "\"somestring\" > 42.0"
+    "42 > \"somestring\""
+    "42.0 > \"somestring\""
+    # Logical operator with string type not allowed
+    "\"somestring\" || 1"
+    "1 || \"somestring\""
+    # Nesting of binary operator not allowed
+    "1 | (1 | (1 | 1))"
+    "1 > (1 > (1 > 1))"
 )
 
 start_lttng_sessiond
diff --git a/tests/tools/filtering/valid-filters b/tests/tools/filtering/valid-filters
index b48b6ed..d32a60d 100755
--- a/tests/tools/filtering/valid-filters
+++ b/tests/tools/filtering/valid-filters
@@ -361,6 +361,8 @@ FILTERS=("intfield" #1
     "intfield < 0x2" #24
     "intfield < 02" #25
     "stringfield2 == \"\\\*\"" #26
+    "1.0 || intfield || 1.0" #27
+    "1 < intfield" #28
 )
 
 VALIDATOR=("validator_intfield" #1
@@ -389,6 +391,8 @@ VALIDATOR=("validator_intfield" #1
     "validator_intfield_lt" #24
     "validator_intfield_lt" #25
     "validator_true_statement" #26
+    "validator_true_statement" #27
+    "validator_intfield_gt" #28
 )
 
 FILTER_COUNT=${#FILTERS[@]}
--
1.8.0

From jdesfossez at efficios.com Wed Oct 31 15:09:18 2012
From: jdesfossez at efficios.com (Julien Desfossez)
Date: Wed, 31 Oct 2012 15:09:18 -0400
Subject: [lttng-dev] RFC live trace reading use-cases
Message-ID: <5091775E.5040609@efficios.com>

Hi,

We are currently working on defining a design for live trace reading and
before committing to one, we would like to have a list of all the relevant
use-cases we have to take into account.

Just to clarify, live trace reading means displaying/analyzing a trace
while it is being recorded by a tracer.

Here are the use-cases I have collected so far:
- reading a trace from the disk of the machine that is currently being
  traced;
- reading a trace from a remote machine on the disk of the machine
  running lttng-relayd (receiving the trace on the network);
- reading the trace directly from the memory of the local tracer (mmap
  shared buffers);
- reading the trace from the memory of a lttng-relayd daemon receiving
  the trace from the network but never writing it to disk.

In all these cases, the tracing session might be a new one or a session
already started. If the session is already started, we may want to
"attach" the viewer to it and read the data either from the beginning or
from "now".

Of course we want to keep the privilege separation and make sure the
viewer is allowed to read the trace.

We at least want to support the C/C++ and Java viewers, and for the
remote use-cases we might have to support viewers running on Windows as
well. So we need an interface between the consumerd/relayd that will be
common to all these possible viewers.

I will not discuss here the early solutions we have right now, to avoid
influencing the decision; I am just interested in collecting the
use-cases first.

If you have other use-cases that you think might be interesting to
support, please send them as soon as possible. We are in brainstorm
mode; after that phase we will decide the use-cases we want to
support/prioritize.

Thanks,

Julien