[ltt-dev] [PATCH 3/3] In-kernel filter module

Zhaolei zhaolei at cn.fujitsu.com
Tue Apr 7 23:19:12 EDT 2009


* From: "Mathieu Desnoyers" <compudj at krystal.dyndns.org>
Hello, Mathieu

Thanks for your answer.

>* Zhaolei (zhaolei at cn.fujitsu.com) wrote:
>> * From: "Mathieu Desnoyers" <compudj at krystal.dyndns.org>
> 
> [...]
> 
>> >> +static LIST_HEAD(marker_filter_head);
>> >> +
>> >> +/*
>> >> + * Use spinlock to lock filter struct because we can't use mutex in filter
>> >> + * callback
>> >> + */
>> >> +static DEFINE_SPINLOCK(filter_lock);
>> >> +
>> >> +/*
>> >> + * We need call some kernel function in ltt-filter function, but many kernel
>> >> + * function have tp/markers which cause cpu jump to ltt, then jump to
>> >> + * ltt-filter from ltt code.
>> >> + * Even if we don't call any kernel function in filter code, interrupt will
>> >> + * do similar thing.
>> >> + * To make it easy, we avoid this type of nesting by lock/unlock_filter.
>> >> + */
>> >> +DEFINE_PER_CPU(atomic_t, filter_nesting) = ATOMIC_INIT(0);
>> >> +static inline int lock_filter(void)
>> >> +{
>> >> + preempt_disable();
>> >> + if (atomic_xchg(&__get_cpu_var(filter_nesting), 1) > 0) {
>> >> + /* Already in lock */
>> >> + preempt_enable();
>> >> + return 1;
>> >> + }
>> >> + return 0;
>> >> +}
>> >> +static inline void unlock_filter(void)
>> >> +{
>> >> + atomic_set(&__get_cpu_var(filter_nesting), 0);
>> >> + preempt_enable();
>> >> +}
>> > 
>> > Hrm, this lock_filter seems broken. If you have an interrupt nesting
>> > over lock_filter, and takes lock_filter again, the interrupt will set
>> > the filter_nesting to 0, which creates a race.
>> If an interrupt nesting over lock_filter, and takes lock_filter again,
>> filter_nesting is still 1, because we use atomic_xchg(filter_nesting, 1).
>> 
> 
> The problem is not when the interrupt handler takes the lock, but when
> it releases it. It will set if back to 0, but the interrupted thread
> still holds this lock, which creates a race window. Basically your
> primitive is lacking a sane nesting counter.
Interrupt handler will not take and release this lock, it will only return
from filter when it found that already in lock, please see ltt_filter().

> 
>> > 
>> > Basically, what you want is to disallow filter modification used by an
>> > active tracing session _or_, if you want to allow filter modification
>> > while a tracing session is active. But to do it properly, you want to
>> > use RCU data structures synchronized with call_rcu_sched and
>> > synchronize_sched and use the RCU read lock (which is _already_ taken by
>> > LTTng around probe execution anyway) to synchronize.
>> I understand your opinion.
>> I agree your opinion that RCU can allow filter modification while a tracing
>> session is active.
>> But if we call a kernel function in filter function, and this kernel function
>> run into filter again, we can't prevent this kind of nesting by rcu.
>> Ltt have a ltt_nesting to prevent this kind of nesting, but it can't help
>> filter function.
> 
> You are right in that the filter function is called before reserving
> space, and therefore the ltt_nesting value is not tested. So you should
> simply test the already existing ltt_nesting value at the beginning of
> the filter function, you don't need another nesting variable. You can
> look at the ltt_reserve_slot function to see how to test this.
Good idea, i think.

> 
>> 
>> Maybe we can call no kernel function in filter() to prevent this kind of
>> nesting, and use rcu to allow filter modification when filter() function is
>> running, and I aggre to using rcu, but sorry that I am some busy these days,
>> it is not easy to get sometime to finish this change.
> 
> I doubt typical use-cases would include filter modification while the
> trace session is running. I would aim for the simpler goal : filter
> modifications should only be allowed when a filter is not attached to an
> active trace session. This way we don't have to use more complex
> RCU-aware data structures, which could be a real pain for filters.
Yes, simple method is always recommend.

> 
>> 
>> Current code is ok to work, and it passed my test with a my testset.
>> (I can send it to you if you need)
> 
> I would really prefer to have a solution that does not add supplementary
> custom locking into the tracer. Supporting filter modification when
> sessions are inactive seems like a perfect middle-ground between
> implementation complexity (it's simple) and not modifying the tracer
> locking semantics.
> 
>> 
>> > 
>> > The filter_lock spinlock should probably become a simple mutex too.
>> I attempted to use mutex for filter_lock, but always cause panic in my testset.
> 
> If I may suggest :
> 
> Let's remove the custom locking, use the trace_lock mutex to protect
> filter modifications, disallow filter modification if it belongs to an
> active trace, and also wait for a quiescent state with
> synchronize_sched() after we see that a filter belongs to an inactive
> trace (while we hold the trace_lock), so that we can make sure that no
> trace probes are seeing this filter while we modify it.
This is a good idea to avoid filter modifications when filter is in operation,
and it is more simple than i did.
But IMHO, it can't resolve problem that markers changed when filter is in
operation.
We must modufy data struct when add/remove marker, and we can't ignore this
operation as we do for fileop.
However, I still like your method, because it is simple and unify.
I think we can find a way to solve marker's problem based on your idea,
welcome to fix this patch to avoid custom locking and complex pre_cpu nesting lock.

Thanks
Zhaolei

> 
> Thanks,
> 
> Mathieu
> 
>> 
>> Thanks!
>> Zhaolei
>> > 
>> > Mathieu
>> > 
>> >> +
>> >> +struct marker_filter *_find_marker_filter(const char *name)
>> >> +{
>> >> + struct marker_filter *filter;
>> >> + list_for_each_entry(filter, &marker_filter_head, list) {
>> >> + if (!strcmp(name, filter->format->name))
>> >> + return filter;
>> >> + }
>> >> + return NULL;
>> >> +}
>> >> +
>> >> +static ssize_t format_read(struct file *filp, char __user *ubuf, size_t cnt,
>> >> + loff_t *ppos)
>> >> +{
>> >> + size_t ret;
>> >> + struct marker_filter *filter;
>> >> +
>> >> + lock_filter();
>> >> + spin_lock(&filter_lock);
>> >> + filter = filp->f_dentry->d_inode->i_private;
>> >> + if (filter) {
>> >> + ret = simple_read_from_buffer(ubuf, cnt, ppos,
>> >> + filter->format->format, strlen(filter->format->format));
>> >> + } else {
>> >> + ret = -EIO;
>> >> + }
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> +
>> >> + return ret;
>> >> +}
>> >> +
>> >> +static const struct file_operations format_fops = {
>> >> + .read = format_read,
>> >> +};
>> >> +
>> >> +static int signed_type_read(char *buf, size_t bufsize,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int ret;
>> >> + int pos = 0;
>> >> +
>> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
>> >> + compare_op_string[filter_item->setting.signed_type.op]);
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + ret = scnprintf(buf + pos, bufsize - pos, " %lld",
>> >> + filter_item->setting.signed_type.value);
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + return pos;
>> >> +}
>> >> +
>> >> +static int unsigned_type_read(char *buf, size_t bufsize,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int ret;
>> >> + int pos = 0;
>> >> +
>> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
>> >> + compare_op_string[filter_item->setting.unsigned_type.op]);
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + ret = scnprintf(buf + pos, bufsize - pos, " %llu",
>> >> + filter_item->setting.unsigned_type.value);
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + return pos;
>> >> +}
>> >> +
>> >> +static int string_type_read(char *buf, size_t bufsize,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int ret;
>> >> + int pos = 0;
>> >> +
>> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
>> >> + compare_op_string[filter_item->setting.string_type.op]);
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + ret = scnprintf(buf + pos, bufsize - pos, " %s",
>> >> + filter_item->setting.string_type.value);
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + return pos;
>> >> +}
>> >> +
>> >> +static int char_type_read(char *buf, size_t bufsize,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int ret;
>> >> + int pos = 0;
>> >> +
>> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
>> >> + compare_op_string[filter_item->setting.char_type.op]);
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + switch (filter_item->setting.char_type.value) {
>> >> + case '\t':
>> >> + ret = scnprintf(buf + pos, bufsize - pos, " \\t");
>> >> + break;
>> >> + case ' ':
>> >> + ret = scnprintf(buf + pos, bufsize - pos, " blank");
>> >> + break;
>> >> + case '\n':
>> >> + ret = scnprintf(buf + pos, bufsize - pos, " \\n");
>> >> + break;
>> >> + default:
>> >> + if (isgraph(filter_item->setting.char_type.value)) {
>> >> + ret = scnprintf(buf + pos, bufsize - pos, " %c",
>> >> + filter_item->setting.char_type.value);
>> >> + } else {
>> >> + ret = scnprintf(buf + pos, bufsize - pos,
>> >> + " \\%u", (unsigned int)
>> >> + filter_item->setting.char_type.value);
>> >> + }
>> >> + break;
>> >> + }
>> >> + if (!ret)
>> >> + return 0;
>> >> + pos += ret;
>> >> +
>> >> + return pos;
>> >> +}
>> >> +
>> >> +static ssize_t type_read(struct file *filp, char __user *ubuf, size_t cnt,
>> >> + loff_t *ppos)
>> >> +{
>> >> + size_t ret;
>> >> + struct filter_item *filter_item;
>> >> +
>> >> + lock_filter();
>> >> + spin_lock(&filter_lock);
>> >> + filter_item = filp->f_dentry->d_inode->i_private;
>> >> + if (filter_item) {
>> >> + struct format_item *format_item;
>> >> + char buf[128];
>> >> + int pos = 0;
>> >> +
>> >> + format_item = filter_item_format(filter_item);
>> >> +
>> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos,
>> >> + "type: %u byte(s) %s\n",
>> >> + format_item->size,
>> >> + format_item_type_string[format_item->type]);
>> >> + if (!ret) {
>> >> + ret = -EINVAL;
>> >> + goto err_format_string;
>> >> + }
>> >> + pos += ret;
>> >> +
>> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos, "setting: ");
>> >> + if (!ret) {
>> >> + ret = -EINVAL;
>> >> + goto err_format_string;
>> >> + }
>> >> + pos += ret;
>> >> +
>> >> + if (!filter_item->enable) {
>> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos, "*");
>> >> + if (!ret) {
>> >> + ret = -EINVAL;
>> >> + goto err_format_string;
>> >> + }
>> >> + pos += ret;
>> >> + goto format_finish;
>> >> + }
>> >> +
>> >> + switch (format_item->type) {
>> >> + case SIGNED:
>> >> + ret = signed_type_read(buf + pos, sizeof(buf) - pos,
>> >> + filter_item);
>> >> + break;
>> >> + case UNSIGNED:
>> >> + ret = unsigned_type_read(buf + pos, sizeof(buf) - pos,
>> >> + filter_item);
>> >> + break;
>> >> + case STRING:
>> >> + ret = string_type_read(buf + pos, sizeof(buf) - pos,
>> >> + filter_item);
>> >> + break;
>> >> + case CHAR:
>> >> + ret = char_type_read(buf + pos, sizeof(buf) - pos,
>> >> + filter_item);
>> >> + break;
>> >> + default:
>> >> + /* should not run to here */
>> >> + printk(KERN_ERR "%s: unknown type %d\n",
>> >> + __func__, format_item->type);
>> >> + ret = -EIO;
>> >> + goto err_unknown_type;
>> >> + }
>> >> + if (!ret) {
>> >> + ret = -EINVAL;
>> >> + goto err_format_string;
>> >> + }
>> >> + pos += ret;
>> >> +
>> >> +format_finish:
>> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos, "\n");
>> >> + if (!ret) {
>> >> + ret = -EINVAL;
>> >> + goto err_format_string;
>> >> + }
>> >> + pos += ret;
>> >> +
>> >> + ret = simple_read_from_buffer(ubuf, cnt, ppos, buf,
>> >> + strlen(buf));
>> >> + } else {
>> >> + ret = -EIO;
>> >> + }
>> >> +
>> >> +err_format_string:
>> >> +err_unknown_type:
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> +
>> >> + return ret;
>> >> +}
>> >> +
>> >> +/*
>> >> + * Check op info in a input string
>> >> + * return length of op part on success, *op is set.
>> >> + * return 0 on fail
>> >> + */
>> >> +static size_t scan_op(const char *str, enum compare_op *op)
>> >> +{
>> >> + /* Must reverse sort with strlen */
>> >> + static struct {
>> >> + const char *str;
>> >> + enum compare_op op;
>> >> + } str_op[] = {
>> >> + {"gt", gt},
>> >> + {"ge", ge},
>> >> + {"eq", eq},
>> >> + {"le", le},
>> >> + {"lt", lt},
>> >> + {">=", ge},
>> >> + {"<=", le},
>> >> + {">", gt},
>> >> + {"=", eq},
>> >> + {"<", lt},
>> >> + };
>> >> + int i;
>> >> +
>> >> + for (i = 0; i < ARRAY_SIZE(str_op); i++) {
>> >> + if (!strncmp(str, str_op[i].str, strlen(str_op[i].str))) {
>> >> + *op = str_op[i].op;
>> >> + return strlen(str_op[i].str);
>> >> + }
>> >> + }
>> >> +
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static int signed_type_write(const char *buf, struct format_item *format_item,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int oplen;
>> >> + enum compare_op op;
>> >> + long long value = 0;
>> >> +
>> >> + oplen = scan_op(buf, &op);
>> >> + if (!oplen)
>> >> + return -EINVAL;
>> >> +
>> >> + switch (format_item->size) {
>> >> + case 1:
>> >> + {
>> >> + char tmp;
>> >> + if (sscanf(buf + oplen, "%hhd", &tmp) != 1)
>> >> + return -EINVAL;
>> >> + value = tmp;
>> >> + }
>> >> + break;
>> >> + case 2:
>> >> + {
>> >> + short int tmp;
>> >> + if (sscanf(buf + oplen, "%hd", &tmp) != 1)
>> >> + return -EINVAL;
>> >> + value = tmp;
>> >> + }
>> >> + break;
>> >> + case 4:
>> >> + {
>> >> + int tmp;
>> >> + if (sscanf(buf + oplen, "%d", &tmp) != 1)
>> >> + return -EINVAL;
>> >> + value = tmp;
>> >> + }
>> >> + break;
>> >> + case 8:
>> >> + if (sscanf(buf + oplen, "%lld", &value) != 1)
>> >> + return -EINVAL;
>> >> + break;
>> >> + default:
>> >> + /* should not run to here */
>> >> + printk(KERN_ERR "%s: unknown size %zu\n", __func__,
>> >> + format_item->size);
>> >> + return -EIO;
>> >> + }
>> >> +
>> >> + filter_item->setting.signed_type.op = op;
>> >> + filter_item->setting.signed_type.value = value;
>> >> +
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static int unsigned_type_write(const char *buf, struct format_item *format_item,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int oplen;
>> >> + enum compare_op op;
>> >> + long long value = 0;
>> >> +
>> >> + oplen = scan_op(buf, &op);
>> >> + if (!oplen)
>> >> + return -EINVAL;
>> >> +
>> >> + switch (format_item->size) {
>> >> + case 1:
>> >> + {
>> >> + unsigned char tmp;
>> >> + if (sscanf(buf + oplen, "%hhu", &tmp) != 1)
>> >> + return -EINVAL;
>> >> + value = tmp;
>> >> + }
>> >> + break;
>> >> + case 2:
>> >> + {
>> >> + unsigned short int tmp;
>> >> + if (sscanf(buf + oplen, "%hu", &tmp) != 1)
>> >> + return -EINVAL;
>> >> + value = tmp;
>> >> + }
>> >> + break;
>> >> + case 4:
>> >> + {
>> >> + unsigned int tmp;
>> >> + if (sscanf(buf + oplen, "%u", &tmp) != 1)
>> >> + return -EINVAL;
>> >> + value = tmp;
>> >> + }
>> >> + break;
>> >> + case 8:
>> >> + if (sscanf(buf + oplen, "%llu", &value) != 1)
>> >> + return -EINVAL;
>> >> + break;
>> >> + default:
>> >> + /* should not run to here */
>> >> + printk(KERN_ERR "%s: unknown size %zu\n", __func__,
>> >> + format_item->size);
>> >> + return -EIO;
>> >> + }
>> >> +
>> >> + filter_item->setting.unsigned_type.op = op;
>> >> + filter_item->setting.unsigned_type.value = value;
>> >> +
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static int string_type_write(const char *buf, struct format_item *format_item,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int oplen;
>> >> + enum compare_op op;
>> >> + int val_len;
>> >> +
>> >> + oplen = scan_op(buf, &op);
>> >> + if (!oplen)
>> >> + return -EINVAL;
>> >> +
>> >> + /* End string on \n */
>> >> + for (val_len = 0; *(buf + oplen + val_len); val_len++) {
>> >> + if (*(buf + oplen + val_len) == '\n')
>> >> + break;
>> >> + }
>> >> +
>> >> + if (val_len > ARRAY_SIZE(filter_item->setting.string_type.value) - 1)
>> >> + return -EINVAL;
>> >> +
>> >> + filter_item->setting.string_type.op = op;
>> >> + memcpy(filter_item->setting.string_type.value, buf + oplen, val_len);
>> >> + filter_item->setting.string_type.value[val_len] = 0;
>> >> +
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static int char_type_write(const char *buf, struct format_item *format_item,
>> >> + struct filter_item *filter_item)
>> >> +{
>> >> + int oplen;
>> >> + enum compare_op op;
>> >> + unsigned char value;
>> >> + int val_start;
>> >> +
>> >> + oplen = scan_op(buf, &op);
>> >> + if (!oplen)
>> >> + return -EINVAL;
>> >> +
>> >> + /* bypass blanks */
>> >> + for (val_start = 0; *(buf + oplen + val_start) == ' ' ||
>> >> + *(buf + oplen + val_start) == '\t'; val_start++)
>> >> + ;
>> >> +
>> >> + /*
>> >> + * parse escape string, following escape string are supported:
>> >> + *   "\n"
>> >> + *   "\[num]"
>> >> + *   "\[char expect \n and \[num]]", as "\\", "\ ", ...
>> >> + */
>> >> + switch (*(buf + oplen + val_start)) {
>> >> + case '\0':
>> >> + case '\n':
>> >> + return -EINVAL;
>> >> + case '\\':
>> >> + switch (*(buf + oplen + val_start + 1)) {
>> >> + case '\0':
>> >> + case '\n':
>> >> + return -EINVAL;
>> >> + case 'n':
>> >> + value = '\n';
>> >> + break;
>> >> + default:
>> >> + if (isdigit(*(buf + oplen + val_start + 1))) {
>> >> + if (sscanf(buf + oplen + val_start + 1, "%hhu",
>> >> + &value) != 1)
>> >> + return -EINVAL;
>> >> + } else {
>> >> + value =
>> >> + (*(buf + oplen + val_start + 1));
>> >> + }
>> >> + break;
>> >> + }
>> >> + break;
>> >> + default:
>> >> + value = *(buf + oplen + val_start);
>> >> + break;
>> >> + }
>> >> +
>> >> + filter_item->setting.char_type.op = op;
>> >> + filter_item->setting.char_type.value = value;
>> >> +
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static ssize_t type_write(struct file *filp, const char __user *ubuf,
>> >> + size_t cnt, loff_t *ppos)
>> >> +{
>> >> + size_t ret;
>> >> + struct filter_item *filter_item;
>> >> +
>> >> + lock_filter();
>> >> + spin_lock(&filter_lock);
>> >> + filter_item = filp->f_dentry->d_inode->i_private;
>> >> + if (filter_item) {
>> >> + struct format_item *format_item;
>> >> + char buf[NAME_MAX];
>> >> + size_t buf_size;
>> >> + int val_start;
>> >> +
>> >> + format_item = filter_item_format(filter_item);
>> >> +
>> >> + buf_size = min(cnt, sizeof(buf) - 1);
>> >> + if (copy_from_user(buf, ubuf, buf_size)) {
>> >> + ret = -EFAULT;
>> >> + goto err_copy_from_user;
>> >> + }
>> >> + buf[buf_size] = 0;
>> >> +
>> >> + /* Check is to disable filter */
>> >> + for (val_start = 0; *(buf + val_start) == ' ' ||
>> >> + *(buf + val_start) == '\t'; val_start++)
>> >> + ;
>> >> + if (!strncmp(buf + val_start, "*", strlen("*"))
>> >> + || !strncmp(buf + val_start, "off", strlen("off"))) {
>> >> + struct marker_filter *filter;
>> >> + int i;
>> >> +
>> >> + filter_item->enable = 0;
>> >> +
>> >> + /*
>> >> + * rescan all items in this marker to
>> >> + * update marker_filter->enable
>> >> + */
>> >> + filter = filter_item->parent;
>> >> + filter->enable = 0;
>> >> + for (i = 0; i < filter->format->format_item_cnt; i++) {
>> >> + if (filter->items[i].enable) {
>> >> + filter->enable = 1;
>> >> + break;
>> >> + }
>> >> + }
>> >> +
>> >> + goto scan_finish;
>> >> + }
>> >> +
>> >> + switch (format_item->type) {
>> >> + case SIGNED:
>> >> + ret = signed_type_write(buf, format_item, filter_item);
>> >> + break;
>> >> + case UNSIGNED:
>> >> + ret = unsigned_type_write(buf, format_item,
>> >> + filter_item);
>> >> + break;
>> >> + case STRING:
>> >> + ret = string_type_write(buf, format_item, filter_item);
>> >> + break;
>> >> + case CHAR:
>> >> + ret = char_type_write(buf, format_item, filter_item);
>> >> + break;
>> >> + default:
>> >> + /* should not run to here */
>> >> + printk(KERN_ERR "%s: unknown type %d\n",
>> >> + __func__, format_item->type);
>> >> + ret = -EIO;
>> >> + goto err_unknown_type;
>> >> + }
>> >> + if (IS_ERR_VALUE(ret))
>> >> + goto err_scan_string;
>> >> +
>> >> + filter_item->enable = 1;
>> >> + filter_item->parent->enable = 1;
>> >> +
>> >> +scan_finish:
>> >> + ret = cnt;
>> >> + } else {
>> >> + ret = -EIO;
>> >> + }
>> >> +
>> >> +err_scan_string:
>> >> +err_unknown_type:
>> >> +err_copy_from_user:
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> +
>> >> + return ret;
>> >> +}
>> >> +
>> >> +static const struct file_operations type_fops = {
>> >> + .read = type_read,
>> >> + .write = type_write,
>> >> +};
>> >> +
>> >> +static int init_filter_items(struct marker_filter *filter,
>> >> + struct marker_format *format)
>> >> +{
>> >> + int err = 0;
>> >> + int i;
>> >> +
>> >> + for (i = 0; i < format->format_item_cnt; i++) {
>> >> + char fname[16];
>> >> +
>> >> + filter->items[i].parent = filter;
>> >> + filter->items[i].enable = 0;
>> >> +
>> >> + /* Create type file as __arg1, __arg2 */
>> >> + sprintf(fname, "__arg%d", i + 1);
>> >> + filter->items[i].den[0] = debugfs_create_file(fname, S_IRUGO,
>> >> + filter->rootden, &filter->items[i], &type_fops);
>> >> + if (IS_ERR(filter->items[i].den[0])
>> >> + || !filter->items[i].den[0]) {
>> >> + printk(KERN_ERR
>> >> + "%s: create type file of %s for %s failed\n",
>> >> + __func__, fname, filter->format->name);
>> >> + err = -EIO;
>> >> + goto err_build_typefile;
>> >> + }
>> >> +
>> >> + /* Create symbol link to type file */
>> >> + if (filter->format->items[i]->name[0]) {
>> >> +
>> >> + filter->items[i].den[1] = debugfs_create_symlink(
>> >> + filter->format->items[i]->name,
>> >> + filter->rootden, fname);
>> >> + if (IS_ERR(filter->items[i].den[1])
>> >> + || !filter->items[i].den[1]) {
>> >> + /*
>> >> + * same-name type is allowed, but it will cause
>> >> + * create symlink fail.
>> >> + * we ignore this type of error.
>> >> + */
>> >> + filter->items[i].den[1] = NULL;
>> >> + }
>> >> + }
>> >> + }
>> >> +
>> >> + return 0;
>> >> +
>> >> +err_build_typefile:
>> >> + for (i--; i >= 0; i--) {
>> >> + /* see comments in _destroy_marker_filter() for why set NULL */
>> >> + filter->items[i].den[0]->d_inode->i_private = NULL;
>> >> + debugfs_remove(filter->items[i].den[0]);
>> >> + debugfs_remove(filter->items[i].den[1]);
>> >> + }
>> >> + return err;
>> >> +}
>> >> +
>> >> +static int add_marker_filter(struct marker_format *format)
>> >> +{
>> >> + int err = 0;
>> >> + struct marker_filter *filter;
>> >> +
>> >> + lock_filter();
>> >> + spin_lock(&filter_lock);
>> >> +
>> >> +#ifdef _DEBUG
>> >> + if (_find_marker_filter(format->name)) {
>> >> + printk(KERN_ERR "%s: filter %s already exist\n", __func__,
>> >> + format->name);
>> >> + err = -EIO;
>> >> + goto err_already_exist;
>> >> + }
>> >> +#endif /* _DEBUG */
>> >> +
>> >> + filter = kmalloc(sizeof(struct marker_filter), GFP_KERNEL);
>> >> + if (!filter) {
>> >> + printk(KERN_ERR "%s: alloc struct marker_filter failed\n",
>> >> + __func__);
>> >> + err = -ENOMEM;
>> >> + goto err_alloc_marker_filter;
>> >> + }
>> >> +
>> >> + INIT_LIST_HEAD(&filter->list);
>> >> +
>> >> + filter->format = format;
>> >> +
>> >> + filter->rootden = debugfs_create_dir(format->name, ltt_filter_dir);
>> >> + if (IS_ERR(filter->rootden) || !filter->rootden) {
>> >> + printk(KERN_ERR "%s: create dir of %s failed\n", __func__,
>> >> + format->name);
>> >> + err = -EIO;
>> >> + goto err_create_dir;
>> >> + }
>> >> +
>> >> + filter->formatden = debugfs_create_file("__arg0", S_IRUGO,
>> >> + filter->rootden, filter, &format_fops);
>> >> + if (IS_ERR(filter->formatden) || !filter->formatden) {
>> >> + printk(KERN_ERR
>> >> + "%s: create typefile failed\n", __func__);
>> >> + err = -EIO;
>> >> + goto err_build_typefile;
>> >> + }
>> >> +
>> >> + if (format->format_item_cnt) {
>> >> + filter->items = kmalloc(sizeof(struct filter_item)
>> >> + * format->format_item_cnt, GFP_KERNEL);
>> >> + if (!filter->items) {
>> >> + printk(KERN_ERR
>> >> + "%s: alloc filter's items failed\n", __func__);
>> >> + err = -ENOMEM;
>> >> + goto err_alloc_types;
>> >> + }
>> >> + err = init_filter_items(filter, format);
>> >> + if (IS_ERR_VALUE(err)) {
>> >> + printk(KERN_ERR
>> >> + "%s: init filter's items failed\n", __func__);
>> >> + goto err_init_types;
>> >> + }
>> >> + } else {
>> >> + filter->items = NULL;
>> >> + }
>> >> +
>> >> + filter->enable = 0;
>> >> +
>> >> + list_add(&filter->list, &marker_filter_head);
>> >> +
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> +
>> >> + return 0;
>> >> +
>> >> +err_init_types:
>> >> + kfree(filter->items);
>> >> +err_alloc_types:
>> >> + /* see comments in _destroy_marker_filter() for why set NULL */
>> >> + filter->formatden->d_inode->i_private = NULL;
>> >> + debugfs_remove(filter->formatden);
>> >> +err_build_typefile:
>> >> + debugfs_remove(filter->rootden);
>> >> +err_create_dir:
>> >> + kfree(filter);
>> >> +err_alloc_marker_filter:
>> >> +#ifdef _DEBUG
>> >> +err_already_exist:
>> >> +#endif /* _DEBUG */
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> + return err;
>> >> +}
>> >> +
>> >> +static void _destroy_marker_filter(struct marker_filter *filter)
>> >> +{
>> >> + int i;
>> >> +
>> >> + list_del(&filter->list);
>> >> +
>> >> + for (i = 0; i < filter->format->format_item_cnt; i++) {
>> >> + filter->items[i].den[0]->d_inode->i_private = NULL;
>> >> + debugfs_remove(filter->items[i].den[0]);
>> >> + debugfs_remove(filter->items[i].den[1]);
>> >> + }
>> >> + kfree(filter->items);
>> >> +
>> >> + /*
>> >> + * Set i_private to NULL, so that if some fileop in process,
>> >> + * it can avoid illegal access using i_private in fileop.
>> >> + * (we must get filter_lock in fileop to ensure no other change
>> >> + *  in filter struct)
>> >> + */
>> >> + filter->formatden->d_inode->i_private = NULL;
>> >> + debugfs_remove(filter->formatden);
>> >> +
>> >> + debugfs_remove(filter->rootden);
>> >> +
>> >> + kfree(filter);
>> >> +
>> >> + return;
>> >> +}
>> >> +
>> >> +static void destroy_all_filter(void)
>> >> +{
>> >> + struct marker_filter *filter, *next;
>> >> +
>> >> + lock_filter();
>> >> + spin_lock(&filter_lock);
>> >> + list_for_each_entry_safe(filter, next, &marker_filter_head, list) {
>> >> + _destroy_marker_filter(filter);
>> >> + }
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> +}
>> >> +
>> >> +static int bulid_exist_filter(void)
>> >> +{
>> >> + int err;
>> >> + struct marker_format *format;
>> >> +
>> >> + if (!list_empty(&marker_filter_head)) {
>> >> + err = -EEXIST;
>> >> + goto err_not_empty;
>> >> + }
>> >> +
>> >> + mutex_lock(&format_lock);
>> >> + list_for_each_entry(format, &marker_format_head, list) {
>> >> + err = add_marker_filter(format);
>> >> + if (IS_ERR_VALUE(err)) {
>> >> + printk("%s: add_marker_filter failed\n", __func__);
>> >> + goto err_add_filter;
>> >> + }
>> >> + }
>> >> + mutex_unlock(&format_lock);
>> >> +
>> >> + return 0;
>> >> +
>> >> +err_add_filter:
>> >> + destroy_all_filter();
>> >> + mutex_unlock(&format_lock);
>> >> +err_not_empty:
>> >> + return err;
>> >> +}
>> >> +
>> >> +static int format_notify(struct notifier_block *self, unsigned long val,
>> >> + void *data)
>> >> +{
>> >> + struct marker_format *format = data;
>> >> + struct marker_filter *filter, *next;
>> >> +
>> >> + switch (val) {
>> >> + case ADD_FORMAT:
>> >> + add_marker_filter(format);
>> >> + break;
>> >> + case DEL_FORMAT:
>> >> + lock_filter();
>> >> + spin_lock(&filter_lock);
>> >> + list_for_each_entry_safe(filter, next, &marker_filter_head,
>> >> + list) {
>> >> + if (filter->format == format)
>> >> + _destroy_marker_filter(filter);
>> >> + }
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> + break;
>> >> + }
>> >> + return NOTIFY_OK;
>> >> +}
>> >> +
>> >> +static struct notifier_block format_nb = {
>> >> + .notifier_call = format_notify,
>> >> +};
>> >> +
>> >> +static int marker_filter_init(void)
>> >> +{
>> >> + int err = 0;
>> >> +
>> >> + err = bulid_exist_filter();
>> >> + if (IS_ERR_VALUE(err)) {
>> >> + printk(KERN_ERR "%s: bulid_exist_filter failed\n", __func__);
>> >> + goto err_add_exist_filter;
>> >> + }
>> >> +
>> >> + err = register_format_notifier(&format_nb);
>> >> + if (IS_ERR_VALUE(err)) {
>> >> + printk(KERN_ERR "%s register format notifier failed\n",
>> >> + __func__);
>> >> + goto err_register_format_notifier;
>> >> + }
>> >> +
>> >> + return 0;
>> >> +
>> >> +err_register_format_notifier:
>> >> + destroy_all_filter();
>> >> +err_add_exist_filter:
>> >> + return err;
>> >> +}
>> >> +
>> >> +static void marker_filter_destroy(void)
>> >> +{
>> >> + unregister_format_notifier(&format_nb);
>> >> + destroy_all_filter();
>> >> +}
>> >> +
>> >> +
>> >> +/*
>> >> + * Filter operations
>> >> + *
>> >> + * define filter callback for lttng to determine if to output a event
>> >> + * by user's setting.
>> >> + *
>> >> + */
>> >> +
>> >> +/*
>> >> + * Ret:
>> >> + *   0:  filted
>> >> + *   1:  passed
>> >> + *   <0: err
>> >> + */
>> >> +static int signed_filter(struct format_item *format_item,
>> >> + struct filter_item *filter_item, va_list *args)
>> >> +{
>> >> + long long int val;
>> >> +
>> >> + switch (format_item->size) {
>> >> + case 1:
>> >> + val = (char)va_arg(*args, int);
>> >> + break;
>> >> + case 2:
>> >> + val = (short int)va_arg(*args, int);
>> >> + break;
>> >> + case 4:
>> >> + val = va_arg(*args, int);
>> >> + break;
>> >> + case 8:
>> >> + val = va_arg(*args, long long int);
>> >> + break;
>> >> + default:
>> >> + /* should not run to here */
>> >> + printk(KERN_ERR "%s: unknown size %d\n", __func__,
>> >> + format_item->size);
>> >> + return -1;
>> >> + }
>> >> +
>> >> + /*
>> >> + * We judge filter_item->enable here, because even if filter is not
>> >> + * enabled, we still need to bypass this data in va_list
>> >> + */
>> >> + if (!filter_item->enable)
>> >> + return 1;
>> >> +
>> >> + switch (filter_item->setting.signed_type.op) {
>> >> + case gt:
>> >> + return val > filter_item->setting.signed_type.value;
>> >> + case ge:
>> >> + return val >= filter_item->setting.signed_type.value;
>> >> + case eq:
>> >> + return val == filter_item->setting.signed_type.value;
>> >> + case le:
>> >> + return val <= filter_item->setting.signed_type.value;
>> >> + case lt:
>> >> + return val < filter_item->setting.signed_type.value;
>> >> + }
>> >> +
>> >> + /* should not run to here */
>> >> + return -1;
>> >> +}
>> >> +
>> >> +static int unsigned_filter(struct format_item *format_item,
>> >> + struct filter_item *filter_item, va_list *args)
>> >> +{
>> >> + unsigned long long int val;
>> >> +
>> >> + switch (format_item->size) {
>> >> + case 1:
>> >> + val = (unsigned char)va_arg(*args, unsigned int);
>> >> + break;
>> >> + case 2:
>> >> + val = (short int)va_arg(*args, unsigned int);
>> >> + break;
>> >> + case 4:
>> >> + val = va_arg(*args, unsigned int);
>> >> + break;
>> >> + case 8:
>> >> + val = va_arg(*args, unsigned long long int);
>> >> + break;
>> >> + default:
>> >> + /* should not run to here */
>> >> + printk(KERN_ERR "%s: unknown size %d\n", __func__,
>> >> + format_item->size);
>> >> + return -1;
>> >> + }
>> >> +
>> >> + /*
>> >> + * We judge filter_item->enable here, because even if filter is not
>> >> + * enabled, we still need to bypass this data in va_list
>> >> + */
>> >> + if (!filter_item->enable)
>> >> + return 1;
>> >> +
>> >> + switch (filter_item->setting.unsigned_type.op) {
>> >> + case gt:
>> >> + return val > filter_item->setting.unsigned_type.value;
>> >> + case ge:
>> >> + return val >= filter_item->setting.unsigned_type.value;
>> >> + case eq:
>> >> + return val == filter_item->setting.unsigned_type.value;
>> >> + case le:
>> >> + return val <= filter_item->setting.unsigned_type.value;
>> >> + case lt:
>> >> + return val < filter_item->setting.unsigned_type.value;
>> >> + }
>> >> +
>> >> + /* should not run to here */
>> >> + return -1;
>> >> +}
>> >> +
>> >> +static int string_filter(struct format_item *format_item,
>> >> + struct filter_item *filter_item, va_list *args)
>> >> +{
>> >> + const char *val;
>> >> +
>> >> + val = va_arg(*args, const char *);
>> >> +
>> >> + /*
>> >> + * We judge filter_item->enable here, because even if filter is not
>> >> + * enabled, we still need to bypass this data in va_list
>> >> + */
>> >> + if (!filter_item->enable)
>> >> + return 1;
>> >> +
>> >> + switch (filter_item->setting.string_type.op) {
>> >> + case gt:
>> >> + return strcmp(val, filter_item->setting.string_type.value) > 0;
>> >> + case ge:
>> >> + return strcmp(val, filter_item->setting.string_type.value) >= 0;
>> >> + case eq:
>> >> + return strcmp(val, filter_item->setting.string_type.value) == 0;
>> >> + case le:
>> >> + return strcmp(val, filter_item->setting.string_type.value) <= 0;
>> >> + case lt:
>> >> + return strcmp(val, filter_item->setting.string_type.value) < 0;
>> >> + }
>> >> +
>> >> + /* should not run to here */
>> >> + return -1;
>> >> +}
>> >> +
>> >> +static int char_filter(struct format_item *format_item,
>> >> + struct filter_item *filter_item, va_list *args)
>> >> +{
>> >> + unsigned char val;
>> >> +
>> >> + val = (unsigned char)va_arg(*args, unsigned int);
>> >> +
>> >> + /*
>> >> + * We judge filter_item->enable here, because even if filter is not
>> >> + * enabled, we still need to bypass this data in va_list
>> >> + */
>> >> + if (!filter_item->enable)
>> >> + return 1;
>> >> +
>> >> + switch (filter_item->setting.char_type.op) {
>> >> + case gt:
>> >> + return val > filter_item->setting.char_type.value;
>> >> + case ge:
>> >> + return val >= filter_item->setting.char_type.value;
>> >> + case eq:
>> >> + return val == filter_item->setting.char_type.value;
>> >> + case le:
>> >> + return val <= filter_item->setting.char_type.value;
>> >> + case lt:
>> >> + return val < filter_item->setting.char_type.value;
>> >> + }
>> >> +
>> >> + /* should not run to here */
>> >> + return -1;
>> >> +}
>> >> +
>> >> +static int ltt_filter(struct ltt_trace_struct *trace,
>> >> + const struct marker *mdata, va_list *args)
>> >> +{
>> >> + int ret;
>> >> + struct marker_filter *filter;
>> >> + struct marker_format *format;
>> >> + int i;
>> >> +
>> >> + if (lock_filter()) {
>> > 
>> > Hrm do you allow session filter modification when tracing is active for
>> > this session ?
>> > 
>> > I think the lock_filter approach you are taking here is wrong.
>> > 
>> > 
>> >> + /* Don't output events caused by ltt-filter */
>> >> + return 0;
>> >> + }
>> >> +
>> >> + spin_lock(&filter_lock);
>> >> +
>> >> + filter = _find_marker_filter(mdata->name);
>> >> + if (!filter) {
>> >> + printk(KERN_ERR "%s: unknown marker %s\n", __func__,
>> >> + mdata->name);
>> >> + ret = 1;
>> >> + goto done;
>> >> + }
>> >> +
>> >> + /* Bypass if no filter setting on this marker */
>> >> + if (!filter->enable) {
>> >> + ret = 1;
>> >> + goto done;
>> >> + }
>> >> +
>> >> + format = filter->format;
>> >> +
>> >> + for (i = 0; i < format->format_item_cnt; i++) {
>> >> + int result;
>> >> +
>> >> + switch (format->items[i]->type) {
>> >> + case SIGNED:
>> >> + result = signed_filter(format->items[i],
>> >> + filter->items + i, args);
>> >> + break;
>> >> + case UNSIGNED:
>> >> + result = unsigned_filter(format->items[i],
>> >> + filter->items + i, args);
>> >> + break;
>> >> + case STRING:
>> >> + result = string_filter(format->items[i],
>> >> + filter->items + i, args);
>> >> + break;
>> >> + case CHAR:
>> >> + result = char_filter(format->items[i],
>> >> + filter->items + i, args);
>> >> + break;
>> >> + default:
>> >> + /* should not run to here */
>> >> + printk(KERN_ERR "%s: unknown type %d\n", __func__,
>> >> + format->items[i]->type);
>> >> + ret = 1;
>> >> + goto done;
>> >> + }
>> >> + if (result < 0) {
>> >> + ret = 1;
>> >> + goto done;
>> >> + }
>> >> + if (!result) {
>> >> + ret = 0;
>> >> + goto done;
>> >> + }
>> >> + }
>> >> +
>> >> + ret = 1;
>> >> +
>> >> +done:
>> >> + spin_unlock(&filter_lock);
>> >> + unlock_filter();
>> >> + return ret;
>> >> +}
>> >> +
>> >> +
>> >> +/*
>> >> + * Init/exit option of module
>> >> + *
>> >> + * Init marker_format and marker_filter operations
>> >> + * register filter callback in ltt
>> >> + *
>> >> + */
>> >> +
>> >> +static int __init ltt_filter_init(void)
>> >> +{
>> >> + int err = 0;
>> >> + struct dentry *ltt_root_dentry;
>> >> +
>> >> + ltt_root_dentry = get_ltt_root();
>> >> + if (!ltt_root_dentry) {
>> >> + err = -ENOENT;
>> >> + goto err_no_root;
>> >> + }
>> >> +
>> >> + ltt_filter_dir = debugfs_create_dir(LTT_FILTER_DIR, ltt_root_dentry);
>> >> + if (IS_ERR(ltt_filter_dir) || !ltt_filter_dir) {
>> >> + printk(KERN_ERR "%s: create dir of %s failed\n", __func__,
>> >> + LTT_FILTER_DIR);
>> >> + err = -ENOMEM;
>> >> + goto err_create_filter_dir;
>> >> + }
>> >> +
>> >> + err = marker_format_init();
>> >> + if (IS_ERR_VALUE(err)) {
>> >> + printk(KERN_ERR "%s: init marker format failed\n", __func__);
>> >> + goto err_marker_format_init;
>> >> + }
>> >> +
>> >> + err = marker_filter_init();
>> >> + if (IS_ERR_VALUE(err)) {
>> >> + printk(KERN_ERR "%s: init marker filter failed\n", __func__);
>> >> + goto err_marker_filter_init;
>> >> + }
>> >> +
>> >> + err = ltt_module_register(LTT_FUNCTION_RUN_FILTER, ltt_filter,
>> >> + THIS_MODULE);
>> >> + if (IS_ERR_VALUE(err)) {
>> >> + printk(KERN_ERR "%s: register ltt filter failed\n", __func__);
>> >> + goto err_filter_register;
>> >> + }
>> >> +
>> >> + return 0;
>> >> +
>> >> +err_filter_register:
>> >> + marker_filter_destroy();
>> >> +err_marker_filter_init:
>> >> + marker_format_destroy();
>> >> +err_marker_format_init:
>> >> + debugfs_remove(ltt_filter_dir);
>> >> +err_create_filter_dir:
>> >> +err_no_root:
>> >> + return err;
>> >> +}
>> >> +
>> >> +static void __exit ltt_filter_exit(void)
>> >> +{
>> >> + ltt_module_unregister(LTT_FUNCTION_RUN_FILTER);
>> >> + marker_filter_destroy();
>> >> + marker_format_destroy();
>> >> +
>> >> + debugfs_remove(ltt_filter_dir);
>> >> +}
>> >> +
>> >> +module_init(ltt_filter_init);
>> >> +module_exit(ltt_filter_exit);
>> >> +
>> >> +MODULE_LICENSE("GPL");
>> >> +MODULE_AUTHOR("Zhao Lei <zhaolei at cn.fujitsu.com>");
>> >> +MODULE_DESCRIPTION("Linux Trace Toolkit Filter");
>> >> -- 
>> >> 1.5.5.3
>> >> 
>> >> 
>> >> 
>> >> _______________________________________________
>> >> ltt-dev mailing list
>> >> ltt-dev at lists.casi.polymtl.ca
>> >> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
>> >> 
>> > 
>> > -- 
>> > Mathieu Desnoyers
>> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
>> >
>> _______________________________________________
>> ltt-dev mailing list
>> ltt-dev at lists.casi.polymtl.ca
>> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
>> 
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> 
>


More information about the lttng-dev mailing list