[ltt-dev] [PATCH 3/3] In-kernel filter module

Mathieu Desnoyers compudj at krystal.dyndns.org
Tue Apr 7 12:48:48 EDT 2009


* Zhaolei (zhaolei at cn.fujitsu.com) wrote:
> * From: "Mathieu Desnoyers" <compudj at krystal.dyndns.org>

[...]

> >> +static LIST_HEAD(marker_filter_head);
> >> +
> >> +/*
> >> + * Use spinlock to lock filter struct because we can't use mutex in filter
> >> + * callback
> >> + */
> >> +static DEFINE_SPINLOCK(filter_lock);
> >> +
> >> +/*
> >> + * We need call some kernel function in ltt-filter function, but many kernel
> >> + * function have tp/markers which cause cpu jump to ltt, then jump to
> >> + * ltt-filter from ltt code.
> >> + * Even if we don't call any kernel function in filter code, interrupt will
> >> + * do similar thing.
> >> + * To make it easy, we avoid this type of nesting by lock/unlock_filter.
> >> + */
> >> +DEFINE_PER_CPU(atomic_t, filter_nesting) = ATOMIC_INIT(0);
> >> +static inline int lock_filter(void)
> >> +{
> >> + preempt_disable();
> >> + if (atomic_xchg(&__get_cpu_var(filter_nesting), 1) > 0) {
> >> + /* Already in lock */
> >> + preempt_enable();
> >> + return 1;
> >> + }
> >> + return 0;
> >> +}
> >> +static inline void unlock_filter(void)
> >> +{
> >> + atomic_set(&__get_cpu_var(filter_nesting), 0);
> >> + preempt_enable();
> >> +}
> > 
> > Hrm, this lock_filter seems broken. If you have an interrupt nesting
> > over lock_filter, and takes lock_filter again, the interrupt will set
> > the filter_nesting to 0, which creates a race.
> If an interrupt nesting over lock_filter, and takes lock_filter again,
> filter_nesting is still 1, because we use atomic_xchg(filter_nesting, 1).
> 

The problem is not when the interrupt handler takes the lock, but when
it releases it. It will set if back to 0, but the interrupted thread
still holds this lock, which creates a race window. Basically your
primitive is lacking a sane nesting counter.

> > 
> > Basically, what you want is to disallow filter modification used by an
> > active tracing session _or_, if you want to allow filter modification
> > while a tracing session is active. But to do it properly, you want to
> > use RCU data structures synchronized with call_rcu_sched and
> > synchronize_sched and use the RCU read lock (which is _already_ taken by
> > LTTng around probe execution anyway) to synchronize.
> I understand your opinion.
> I agree your opinion that RCU can allow filter modification while a tracing
> session is active.
> But if we call a kernel function in filter function, and this kernel function
> run into filter again, we can't prevent this kind of nesting by rcu.
> Ltt have a ltt_nesting to prevent this kind of nesting, but it can't help
> filter function.

You are right in that the filter function is called before reserving
space, and therefore the ltt_nesting value is not tested. So you should
simply test the already existing ltt_nesting value at the beginning of
the filter function, you don't need another nesting variable. You can
look at the ltt_reserve_slot function to see how to test this.

> 
> Maybe we can call no kernel function in filter() to prevent this kind of
> nesting, and use rcu to allow filter modification when filter() function is
> running, and I aggre to using rcu, but sorry that I am some busy these days,
> it is not easy to get sometime to finish this change.

I doubt typical use-cases would include filter modification while the
trace session is running. I would aim for the simpler goal : filter
modifications should only be allowed when a filter is not attached to an
active trace session. This way we don't have to use more complex
RCU-aware data structures, which could be a real pain for filters.

> 
> Current code is ok to work, and it passed my test with a my testset.
> (I can send it to you if you need)

I would really prefer to have a solution that does not add supplementary
custom locking into the tracer. Supporting filter modification when
sessions are inactive seems like a perfect middle-ground between
implementation complexity (it's simple) and not modifying the tracer
locking semantics.

> 
> > 
> > The filter_lock spinlock should probably become a simple mutex too.
> I attempted to use mutex for filter_lock, but always cause panic in my testset.

If I may suggest :

Let's remove the custom locking, use the trace_lock mutex to protect
filter modifications, disallow filter modification if it belongs to an
active trace, and also wait for a quiescent state with
synchronize_sched() after we see that a filter belongs to an inactive
trace (while we hold the trace_lock), so that we can make sure that no
trace probes are seeing this filter while we modify it.

Thanks,

Mathieu

> 
> Thanks!
> Zhaolei
> > 
> > Mathieu
> > 
> >> +
> >> +struct marker_filter *_find_marker_filter(const char *name)
> >> +{
> >> + struct marker_filter *filter;
> >> + list_for_each_entry(filter, &marker_filter_head, list) {
> >> + if (!strcmp(name, filter->format->name))
> >> + return filter;
> >> + }
> >> + return NULL;
> >> +}
> >> +
> >> +static ssize_t format_read(struct file *filp, char __user *ubuf, size_t cnt,
> >> + loff_t *ppos)
> >> +{
> >> + size_t ret;
> >> + struct marker_filter *filter;
> >> +
> >> + lock_filter();
> >> + spin_lock(&filter_lock);
> >> + filter = filp->f_dentry->d_inode->i_private;
> >> + if (filter) {
> >> + ret = simple_read_from_buffer(ubuf, cnt, ppos,
> >> + filter->format->format, strlen(filter->format->format));
> >> + } else {
> >> + ret = -EIO;
> >> + }
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static const struct file_operations format_fops = {
> >> + .read = format_read,
> >> +};
> >> +
> >> +static int signed_type_read(char *buf, size_t bufsize,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int ret;
> >> + int pos = 0;
> >> +
> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
> >> + compare_op_string[filter_item->setting.signed_type.op]);
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + ret = scnprintf(buf + pos, bufsize - pos, " %lld",
> >> + filter_item->setting.signed_type.value);
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + return pos;
> >> +}
> >> +
> >> +static int unsigned_type_read(char *buf, size_t bufsize,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int ret;
> >> + int pos = 0;
> >> +
> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
> >> + compare_op_string[filter_item->setting.unsigned_type.op]);
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + ret = scnprintf(buf + pos, bufsize - pos, " %llu",
> >> + filter_item->setting.unsigned_type.value);
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + return pos;
> >> +}
> >> +
> >> +static int string_type_read(char *buf, size_t bufsize,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int ret;
> >> + int pos = 0;
> >> +
> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
> >> + compare_op_string[filter_item->setting.string_type.op]);
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + ret = scnprintf(buf + pos, bufsize - pos, " %s",
> >> + filter_item->setting.string_type.value);
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + return pos;
> >> +}
> >> +
> >> +static int char_type_read(char *buf, size_t bufsize,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int ret;
> >> + int pos = 0;
> >> +
> >> + ret = scnprintf(buf + pos, bufsize - pos, "%s",
> >> + compare_op_string[filter_item->setting.char_type.op]);
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + switch (filter_item->setting.char_type.value) {
> >> + case '\t':
> >> + ret = scnprintf(buf + pos, bufsize - pos, " \\t");
> >> + break;
> >> + case ' ':
> >> + ret = scnprintf(buf + pos, bufsize - pos, " blank");
> >> + break;
> >> + case '\n':
> >> + ret = scnprintf(buf + pos, bufsize - pos, " \\n");
> >> + break;
> >> + default:
> >> + if (isgraph(filter_item->setting.char_type.value)) {
> >> + ret = scnprintf(buf + pos, bufsize - pos, " %c",
> >> + filter_item->setting.char_type.value);
> >> + } else {
> >> + ret = scnprintf(buf + pos, bufsize - pos,
> >> + " \\%u", (unsigned int)
> >> + filter_item->setting.char_type.value);
> >> + }
> >> + break;
> >> + }
> >> + if (!ret)
> >> + return 0;
> >> + pos += ret;
> >> +
> >> + return pos;
> >> +}
> >> +
> >> +static ssize_t type_read(struct file *filp, char __user *ubuf, size_t cnt,
> >> + loff_t *ppos)
> >> +{
> >> + size_t ret;
> >> + struct filter_item *filter_item;
> >> +
> >> + lock_filter();
> >> + spin_lock(&filter_lock);
> >> + filter_item = filp->f_dentry->d_inode->i_private;
> >> + if (filter_item) {
> >> + struct format_item *format_item;
> >> + char buf[128];
> >> + int pos = 0;
> >> +
> >> + format_item = filter_item_format(filter_item);
> >> +
> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos,
> >> + "type: %u byte(s) %s\n",
> >> + format_item->size,
> >> + format_item_type_string[format_item->type]);
> >> + if (!ret) {
> >> + ret = -EINVAL;
> >> + goto err_format_string;
> >> + }
> >> + pos += ret;
> >> +
> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos, "setting: ");
> >> + if (!ret) {
> >> + ret = -EINVAL;
> >> + goto err_format_string;
> >> + }
> >> + pos += ret;
> >> +
> >> + if (!filter_item->enable) {
> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos, "*");
> >> + if (!ret) {
> >> + ret = -EINVAL;
> >> + goto err_format_string;
> >> + }
> >> + pos += ret;
> >> + goto format_finish;
> >> + }
> >> +
> >> + switch (format_item->type) {
> >> + case SIGNED:
> >> + ret = signed_type_read(buf + pos, sizeof(buf) - pos,
> >> + filter_item);
> >> + break;
> >> + case UNSIGNED:
> >> + ret = unsigned_type_read(buf + pos, sizeof(buf) - pos,
> >> + filter_item);
> >> + break;
> >> + case STRING:
> >> + ret = string_type_read(buf + pos, sizeof(buf) - pos,
> >> + filter_item);
> >> + break;
> >> + case CHAR:
> >> + ret = char_type_read(buf + pos, sizeof(buf) - pos,
> >> + filter_item);
> >> + break;
> >> + default:
> >> + /* should not run to here */
> >> + printk(KERN_ERR "%s: unknown type %d\n",
> >> + __func__, format_item->type);
> >> + ret = -EIO;
> >> + goto err_unknown_type;
> >> + }
> >> + if (!ret) {
> >> + ret = -EINVAL;
> >> + goto err_format_string;
> >> + }
> >> + pos += ret;
> >> +
> >> +format_finish:
> >> + ret = scnprintf(buf + pos, sizeof(buf) - pos, "\n");
> >> + if (!ret) {
> >> + ret = -EINVAL;
> >> + goto err_format_string;
> >> + }
> >> + pos += ret;
> >> +
> >> + ret = simple_read_from_buffer(ubuf, cnt, ppos, buf,
> >> + strlen(buf));
> >> + } else {
> >> + ret = -EIO;
> >> + }
> >> +
> >> +err_format_string:
> >> +err_unknown_type:
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +/*
> >> + * Check op info in a input string
> >> + * return length of op part on success, *op is set.
> >> + * return 0 on fail
> >> + */
> >> +static size_t scan_op(const char *str, enum compare_op *op)
> >> +{
> >> + /* Must reverse sort with strlen */
> >> + static struct {
> >> + const char *str;
> >> + enum compare_op op;
> >> + } str_op[] = {
> >> + {"gt", gt},
> >> + {"ge", ge},
> >> + {"eq", eq},
> >> + {"le", le},
> >> + {"lt", lt},
> >> + {">=", ge},
> >> + {"<=", le},
> >> + {">", gt},
> >> + {"=", eq},
> >> + {"<", lt},
> >> + };
> >> + int i;
> >> +
> >> + for (i = 0; i < ARRAY_SIZE(str_op); i++) {
> >> + if (!strncmp(str, str_op[i].str, strlen(str_op[i].str))) {
> >> + *op = str_op[i].op;
> >> + return strlen(str_op[i].str);
> >> + }
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int signed_type_write(const char *buf, struct format_item *format_item,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int oplen;
> >> + enum compare_op op;
> >> + long long value = 0;
> >> +
> >> + oplen = scan_op(buf, &op);
> >> + if (!oplen)
> >> + return -EINVAL;
> >> +
> >> + switch (format_item->size) {
> >> + case 1:
> >> + {
> >> + char tmp;
> >> + if (sscanf(buf + oplen, "%hhd", &tmp) != 1)
> >> + return -EINVAL;
> >> + value = tmp;
> >> + }
> >> + break;
> >> + case 2:
> >> + {
> >> + short int tmp;
> >> + if (sscanf(buf + oplen, "%hd", &tmp) != 1)
> >> + return -EINVAL;
> >> + value = tmp;
> >> + }
> >> + break;
> >> + case 4:
> >> + {
> >> + int tmp;
> >> + if (sscanf(buf + oplen, "%d", &tmp) != 1)
> >> + return -EINVAL;
> >> + value = tmp;
> >> + }
> >> + break;
> >> + case 8:
> >> + if (sscanf(buf + oplen, "%lld", &value) != 1)
> >> + return -EINVAL;
> >> + break;
> >> + default:
> >> + /* should not run to here */
> >> + printk(KERN_ERR "%s: unknown size %zu\n", __func__,
> >> + format_item->size);
> >> + return -EIO;
> >> + }
> >> +
> >> + filter_item->setting.signed_type.op = op;
> >> + filter_item->setting.signed_type.value = value;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int unsigned_type_write(const char *buf, struct format_item *format_item,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int oplen;
> >> + enum compare_op op;
> >> + long long value = 0;
> >> +
> >> + oplen = scan_op(buf, &op);
> >> + if (!oplen)
> >> + return -EINVAL;
> >> +
> >> + switch (format_item->size) {
> >> + case 1:
> >> + {
> >> + unsigned char tmp;
> >> + if (sscanf(buf + oplen, "%hhu", &tmp) != 1)
> >> + return -EINVAL;
> >> + value = tmp;
> >> + }
> >> + break;
> >> + case 2:
> >> + {
> >> + unsigned short int tmp;
> >> + if (sscanf(buf + oplen, "%hu", &tmp) != 1)
> >> + return -EINVAL;
> >> + value = tmp;
> >> + }
> >> + break;
> >> + case 4:
> >> + {
> >> + unsigned int tmp;
> >> + if (sscanf(buf + oplen, "%u", &tmp) != 1)
> >> + return -EINVAL;
> >> + value = tmp;
> >> + }
> >> + break;
> >> + case 8:
> >> + if (sscanf(buf + oplen, "%llu", &value) != 1)
> >> + return -EINVAL;
> >> + break;
> >> + default:
> >> + /* should not run to here */
> >> + printk(KERN_ERR "%s: unknown size %zu\n", __func__,
> >> + format_item->size);
> >> + return -EIO;
> >> + }
> >> +
> >> + filter_item->setting.unsigned_type.op = op;
> >> + filter_item->setting.unsigned_type.value = value;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int string_type_write(const char *buf, struct format_item *format_item,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int oplen;
> >> + enum compare_op op;
> >> + int val_len;
> >> +
> >> + oplen = scan_op(buf, &op);
> >> + if (!oplen)
> >> + return -EINVAL;
> >> +
> >> + /* End string on \n */
> >> + for (val_len = 0; *(buf + oplen + val_len); val_len++) {
> >> + if (*(buf + oplen + val_len) == '\n')
> >> + break;
> >> + }
> >> +
> >> + if (val_len > ARRAY_SIZE(filter_item->setting.string_type.value) - 1)
> >> + return -EINVAL;
> >> +
> >> + filter_item->setting.string_type.op = op;
> >> + memcpy(filter_item->setting.string_type.value, buf + oplen, val_len);
> >> + filter_item->setting.string_type.value[val_len] = 0;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int char_type_write(const char *buf, struct format_item *format_item,
> >> + struct filter_item *filter_item)
> >> +{
> >> + int oplen;
> >> + enum compare_op op;
> >> + unsigned char value;
> >> + int val_start;
> >> +
> >> + oplen = scan_op(buf, &op);
> >> + if (!oplen)
> >> + return -EINVAL;
> >> +
> >> + /* bypass blanks */
> >> + for (val_start = 0; *(buf + oplen + val_start) == ' ' ||
> >> + *(buf + oplen + val_start) == '\t'; val_start++)
> >> + ;
> >> +
> >> + /*
> >> + * parse escape string, following escape string are supported:
> >> + *   "\n"
> >> + *   "\[num]"
> >> + *   "\[char expect \n and \[num]]", as "\\", "\ ", ...
> >> + */
> >> + switch (*(buf + oplen + val_start)) {
> >> + case '\0':
> >> + case '\n':
> >> + return -EINVAL;
> >> + case '\\':
> >> + switch (*(buf + oplen + val_start + 1)) {
> >> + case '\0':
> >> + case '\n':
> >> + return -EINVAL;
> >> + case 'n':
> >> + value = '\n';
> >> + break;
> >> + default:
> >> + if (isdigit(*(buf + oplen + val_start + 1))) {
> >> + if (sscanf(buf + oplen + val_start + 1, "%hhu",
> >> + &value) != 1)
> >> + return -EINVAL;
> >> + } else {
> >> + value =
> >> + (*(buf + oplen + val_start + 1));
> >> + }
> >> + break;
> >> + }
> >> + break;
> >> + default:
> >> + value = *(buf + oplen + val_start);
> >> + break;
> >> + }
> >> +
> >> + filter_item->setting.char_type.op = op;
> >> + filter_item->setting.char_type.value = value;
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static ssize_t type_write(struct file *filp, const char __user *ubuf,
> >> + size_t cnt, loff_t *ppos)
> >> +{
> >> + size_t ret;
> >> + struct filter_item *filter_item;
> >> +
> >> + lock_filter();
> >> + spin_lock(&filter_lock);
> >> + filter_item = filp->f_dentry->d_inode->i_private;
> >> + if (filter_item) {
> >> + struct format_item *format_item;
> >> + char buf[NAME_MAX];
> >> + size_t buf_size;
> >> + int val_start;
> >> +
> >> + format_item = filter_item_format(filter_item);
> >> +
> >> + buf_size = min(cnt, sizeof(buf) - 1);
> >> + if (copy_from_user(buf, ubuf, buf_size)) {
> >> + ret = -EFAULT;
> >> + goto err_copy_from_user;
> >> + }
> >> + buf[buf_size] = 0;
> >> +
> >> + /* Check is to disable filter */
> >> + for (val_start = 0; *(buf + val_start) == ' ' ||
> >> + *(buf + val_start) == '\t'; val_start++)
> >> + ;
> >> + if (!strncmp(buf + val_start, "*", strlen("*"))
> >> + || !strncmp(buf + val_start, "off", strlen("off"))) {
> >> + struct marker_filter *filter;
> >> + int i;
> >> +
> >> + filter_item->enable = 0;
> >> +
> >> + /*
> >> + * rescan all items in this marker to
> >> + * update marker_filter->enable
> >> + */
> >> + filter = filter_item->parent;
> >> + filter->enable = 0;
> >> + for (i = 0; i < filter->format->format_item_cnt; i++) {
> >> + if (filter->items[i].enable) {
> >> + filter->enable = 1;
> >> + break;
> >> + }
> >> + }
> >> +
> >> + goto scan_finish;
> >> + }
> >> +
> >> + switch (format_item->type) {
> >> + case SIGNED:
> >> + ret = signed_type_write(buf, format_item, filter_item);
> >> + break;
> >> + case UNSIGNED:
> >> + ret = unsigned_type_write(buf, format_item,
> >> + filter_item);
> >> + break;
> >> + case STRING:
> >> + ret = string_type_write(buf, format_item, filter_item);
> >> + break;
> >> + case CHAR:
> >> + ret = char_type_write(buf, format_item, filter_item);
> >> + break;
> >> + default:
> >> + /* should not run to here */
> >> + printk(KERN_ERR "%s: unknown type %d\n",
> >> + __func__, format_item->type);
> >> + ret = -EIO;
> >> + goto err_unknown_type;
> >> + }
> >> + if (IS_ERR_VALUE(ret))
> >> + goto err_scan_string;
> >> +
> >> + filter_item->enable = 1;
> >> + filter_item->parent->enable = 1;
> >> +
> >> +scan_finish:
> >> + ret = cnt;
> >> + } else {
> >> + ret = -EIO;
> >> + }
> >> +
> >> +err_scan_string:
> >> +err_unknown_type:
> >> +err_copy_from_user:
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static const struct file_operations type_fops = {
> >> + .read = type_read,
> >> + .write = type_write,
> >> +};
> >> +
> >> +static int init_filter_items(struct marker_filter *filter,
> >> + struct marker_format *format)
> >> +{
> >> + int err = 0;
> >> + int i;
> >> +
> >> + for (i = 0; i < format->format_item_cnt; i++) {
> >> + char fname[16];
> >> +
> >> + filter->items[i].parent = filter;
> >> + filter->items[i].enable = 0;
> >> +
> >> + /* Create type file as __arg1, __arg2 */
> >> + sprintf(fname, "__arg%d", i + 1);
> >> + filter->items[i].den[0] = debugfs_create_file(fname, S_IRUGO,
> >> + filter->rootden, &filter->items[i], &type_fops);
> >> + if (IS_ERR(filter->items[i].den[0])
> >> + || !filter->items[i].den[0]) {
> >> + printk(KERN_ERR
> >> + "%s: create type file of %s for %s failed\n",
> >> + __func__, fname, filter->format->name);
> >> + err = -EIO;
> >> + goto err_build_typefile;
> >> + }
> >> +
> >> + /* Create symbol link to type file */
> >> + if (filter->format->items[i]->name[0]) {
> >> +
> >> + filter->items[i].den[1] = debugfs_create_symlink(
> >> + filter->format->items[i]->name,
> >> + filter->rootden, fname);
> >> + if (IS_ERR(filter->items[i].den[1])
> >> + || !filter->items[i].den[1]) {
> >> + /*
> >> + * same-name type is allowed, but it will cause
> >> + * create symlink fail.
> >> + * we ignore this type of error.
> >> + */
> >> + filter->items[i].den[1] = NULL;
> >> + }
> >> + }
> >> + }
> >> +
> >> + return 0;
> >> +
> >> +err_build_typefile:
> >> + for (i--; i >= 0; i--) {
> >> + /* see comments in _destroy_marker_filter() for why set NULL */
> >> + filter->items[i].den[0]->d_inode->i_private = NULL;
> >> + debugfs_remove(filter->items[i].den[0]);
> >> + debugfs_remove(filter->items[i].den[1]);
> >> + }
> >> + return err;
> >> +}
> >> +
> >> +static int add_marker_filter(struct marker_format *format)
> >> +{
> >> + int err = 0;
> >> + struct marker_filter *filter;
> >> +
> >> + lock_filter();
> >> + spin_lock(&filter_lock);
> >> +
> >> +#ifdef _DEBUG
> >> + if (_find_marker_filter(format->name)) {
> >> + printk(KERN_ERR "%s: filter %s already exist\n", __func__,
> >> + format->name);
> >> + err = -EIO;
> >> + goto err_already_exist;
> >> + }
> >> +#endif /* _DEBUG */
> >> +
> >> + filter = kmalloc(sizeof(struct marker_filter), GFP_KERNEL);
> >> + if (!filter) {
> >> + printk(KERN_ERR "%s: alloc struct marker_filter failed\n",
> >> + __func__);
> >> + err = -ENOMEM;
> >> + goto err_alloc_marker_filter;
> >> + }
> >> +
> >> + INIT_LIST_HEAD(&filter->list);
> >> +
> >> + filter->format = format;
> >> +
> >> + filter->rootden = debugfs_create_dir(format->name, ltt_filter_dir);
> >> + if (IS_ERR(filter->rootden) || !filter->rootden) {
> >> + printk(KERN_ERR "%s: create dir of %s failed\n", __func__,
> >> + format->name);
> >> + err = -EIO;
> >> + goto err_create_dir;
> >> + }
> >> +
> >> + filter->formatden = debugfs_create_file("__arg0", S_IRUGO,
> >> + filter->rootden, filter, &format_fops);
> >> + if (IS_ERR(filter->formatden) || !filter->formatden) {
> >> + printk(KERN_ERR
> >> + "%s: create typefile failed\n", __func__);
> >> + err = -EIO;
> >> + goto err_build_typefile;
> >> + }
> >> +
> >> + if (format->format_item_cnt) {
> >> + filter->items = kmalloc(sizeof(struct filter_item)
> >> + * format->format_item_cnt, GFP_KERNEL);
> >> + if (!filter->items) {
> >> + printk(KERN_ERR
> >> + "%s: alloc filter's items failed\n", __func__);
> >> + err = -ENOMEM;
> >> + goto err_alloc_types;
> >> + }
> >> + err = init_filter_items(filter, format);
> >> + if (IS_ERR_VALUE(err)) {
> >> + printk(KERN_ERR
> >> + "%s: init filter's items failed\n", __func__);
> >> + goto err_init_types;
> >> + }
> >> + } else {
> >> + filter->items = NULL;
> >> + }
> >> +
> >> + filter->enable = 0;
> >> +
> >> + list_add(&filter->list, &marker_filter_head);
> >> +
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> +
> >> + return 0;
> >> +
> >> +err_init_types:
> >> + kfree(filter->items);
> >> +err_alloc_types:
> >> + /* see comments in _destroy_marker_filter() for why set NULL */
> >> + filter->formatden->d_inode->i_private = NULL;
> >> + debugfs_remove(filter->formatden);
> >> +err_build_typefile:
> >> + debugfs_remove(filter->rootden);
> >> +err_create_dir:
> >> + kfree(filter);
> >> +err_alloc_marker_filter:
> >> +#ifdef _DEBUG
> >> +err_already_exist:
> >> +#endif /* _DEBUG */
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> + return err;
> >> +}
> >> +
> >> +static void _destroy_marker_filter(struct marker_filter *filter)
> >> +{
> >> + int i;
> >> +
> >> + list_del(&filter->list);
> >> +
> >> + for (i = 0; i < filter->format->format_item_cnt; i++) {
> >> + filter->items[i].den[0]->d_inode->i_private = NULL;
> >> + debugfs_remove(filter->items[i].den[0]);
> >> + debugfs_remove(filter->items[i].den[1]);
> >> + }
> >> + kfree(filter->items);
> >> +
> >> + /*
> >> + * Set i_private to NULL, so that if some fileop in process,
> >> + * it can avoid illegal access using i_private in fileop.
> >> + * (we must get filter_lock in fileop to ensure no other change
> >> + *  in filter struct)
> >> + */
> >> + filter->formatden->d_inode->i_private = NULL;
> >> + debugfs_remove(filter->formatden);
> >> +
> >> + debugfs_remove(filter->rootden);
> >> +
> >> + kfree(filter);
> >> +
> >> + return;
> >> +}
> >> +
> >> +static void destroy_all_filter(void)
> >> +{
> >> + struct marker_filter *filter, *next;
> >> +
> >> + lock_filter();
> >> + spin_lock(&filter_lock);
> >> + list_for_each_entry_safe(filter, next, &marker_filter_head, list) {
> >> + _destroy_marker_filter(filter);
> >> + }
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> +}
> >> +
> >> +static int bulid_exist_filter(void)
> >> +{
> >> + int err;
> >> + struct marker_format *format;
> >> +
> >> + if (!list_empty(&marker_filter_head)) {
> >> + err = -EEXIST;
> >> + goto err_not_empty;
> >> + }
> >> +
> >> + mutex_lock(&format_lock);
> >> + list_for_each_entry(format, &marker_format_head, list) {
> >> + err = add_marker_filter(format);
> >> + if (IS_ERR_VALUE(err)) {
> >> + printk("%s: add_marker_filter failed\n", __func__);
> >> + goto err_add_filter;
> >> + }
> >> + }
> >> + mutex_unlock(&format_lock);
> >> +
> >> + return 0;
> >> +
> >> +err_add_filter:
> >> + destroy_all_filter();
> >> + mutex_unlock(&format_lock);
> >> +err_not_empty:
> >> + return err;
> >> +}
> >> +
> >> +static int format_notify(struct notifier_block *self, unsigned long val,
> >> + void *data)
> >> +{
> >> + struct marker_format *format = data;
> >> + struct marker_filter *filter, *next;
> >> +
> >> + switch (val) {
> >> + case ADD_FORMAT:
> >> + add_marker_filter(format);
> >> + break;
> >> + case DEL_FORMAT:
> >> + lock_filter();
> >> + spin_lock(&filter_lock);
> >> + list_for_each_entry_safe(filter, next, &marker_filter_head,
> >> + list) {
> >> + if (filter->format == format)
> >> + _destroy_marker_filter(filter);
> >> + }
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> + break;
> >> + }
> >> + return NOTIFY_OK;
> >> +}
> >> +
> >> +static struct notifier_block format_nb = {
> >> + .notifier_call = format_notify,
> >> +};
> >> +
> >> +static int marker_filter_init(void)
> >> +{
> >> + int err = 0;
> >> +
> >> + err = bulid_exist_filter();
> >> + if (IS_ERR_VALUE(err)) {
> >> + printk(KERN_ERR "%s: bulid_exist_filter failed\n", __func__);
> >> + goto err_add_exist_filter;
> >> + }
> >> +
> >> + err = register_format_notifier(&format_nb);
> >> + if (IS_ERR_VALUE(err)) {
> >> + printk(KERN_ERR "%s register format notifier failed\n",
> >> + __func__);
> >> + goto err_register_format_notifier;
> >> + }
> >> +
> >> + return 0;
> >> +
> >> +err_register_format_notifier:
> >> + destroy_all_filter();
> >> +err_add_exist_filter:
> >> + return err;
> >> +}
> >> +
> >> +static void marker_filter_destroy(void)
> >> +{
> >> + unregister_format_notifier(&format_nb);
> >> + destroy_all_filter();
> >> +}
> >> +
> >> +
> >> +/*
> >> + * Filter operations
> >> + *
> >> + * define filter callback for lttng to determine if to output a event
> >> + * by user's setting.
> >> + *
> >> + */
> >> +
> >> +/*
> >> + * Ret:
> >> + *   0:  filted
> >> + *   1:  passed
> >> + *   <0: err
> >> + */
> >> +static int signed_filter(struct format_item *format_item,
> >> + struct filter_item *filter_item, va_list *args)
> >> +{
> >> + long long int val;
> >> +
> >> + switch (format_item->size) {
> >> + case 1:
> >> + val = (char)va_arg(*args, int);
> >> + break;
> >> + case 2:
> >> + val = (short int)va_arg(*args, int);
> >> + break;
> >> + case 4:
> >> + val = va_arg(*args, int);
> >> + break;
> >> + case 8:
> >> + val = va_arg(*args, long long int);
> >> + break;
> >> + default:
> >> + /* should not run to here */
> >> + printk(KERN_ERR "%s: unknown size %d\n", __func__,
> >> + format_item->size);
> >> + return -1;
> >> + }
> >> +
> >> + /*
> >> + * We judge filter_item->enable here, because even if filter is not
> >> + * enabled, we still need to bypass this data in va_list
> >> + */
> >> + if (!filter_item->enable)
> >> + return 1;
> >> +
> >> + switch (filter_item->setting.signed_type.op) {
> >> + case gt:
> >> + return val > filter_item->setting.signed_type.value;
> >> + case ge:
> >> + return val >= filter_item->setting.signed_type.value;
> >> + case eq:
> >> + return val == filter_item->setting.signed_type.value;
> >> + case le:
> >> + return val <= filter_item->setting.signed_type.value;
> >> + case lt:
> >> + return val < filter_item->setting.signed_type.value;
> >> + }
> >> +
> >> + /* should not run to here */
> >> + return -1;
> >> +}
> >> +
> >> +static int unsigned_filter(struct format_item *format_item,
> >> + struct filter_item *filter_item, va_list *args)
> >> +{
> >> + unsigned long long int val;
> >> +
> >> + switch (format_item->size) {
> >> + case 1:
> >> + val = (unsigned char)va_arg(*args, unsigned int);
> >> + break;
> >> + case 2:
> >> + val = (short int)va_arg(*args, unsigned int);
> >> + break;
> >> + case 4:
> >> + val = va_arg(*args, unsigned int);
> >> + break;
> >> + case 8:
> >> + val = va_arg(*args, unsigned long long int);
> >> + break;
> >> + default:
> >> + /* should not run to here */
> >> + printk(KERN_ERR "%s: unknown size %d\n", __func__,
> >> + format_item->size);
> >> + return -1;
> >> + }
> >> +
> >> + /*
> >> + * We judge filter_item->enable here, because even if filter is not
> >> + * enabled, we still need to bypass this data in va_list
> >> + */
> >> + if (!filter_item->enable)
> >> + return 1;
> >> +
> >> + switch (filter_item->setting.unsigned_type.op) {
> >> + case gt:
> >> + return val > filter_item->setting.unsigned_type.value;
> >> + case ge:
> >> + return val >= filter_item->setting.unsigned_type.value;
> >> + case eq:
> >> + return val == filter_item->setting.unsigned_type.value;
> >> + case le:
> >> + return val <= filter_item->setting.unsigned_type.value;
> >> + case lt:
> >> + return val < filter_item->setting.unsigned_type.value;
> >> + }
> >> +
> >> + /* should not run to here */
> >> + return -1;
> >> +}
> >> +
> >> +static int string_filter(struct format_item *format_item,
> >> + struct filter_item *filter_item, va_list *args)
> >> +{
> >> + const char *val;
> >> +
> >> + val = va_arg(*args, const char *);
> >> +
> >> + /*
> >> + * We judge filter_item->enable here, because even if filter is not
> >> + * enabled, we still need to bypass this data in va_list
> >> + */
> >> + if (!filter_item->enable)
> >> + return 1;
> >> +
> >> + switch (filter_item->setting.string_type.op) {
> >> + case gt:
> >> + return strcmp(val, filter_item->setting.string_type.value) > 0;
> >> + case ge:
> >> + return strcmp(val, filter_item->setting.string_type.value) >= 0;
> >> + case eq:
> >> + return strcmp(val, filter_item->setting.string_type.value) == 0;
> >> + case le:
> >> + return strcmp(val, filter_item->setting.string_type.value) <= 0;
> >> + case lt:
> >> + return strcmp(val, filter_item->setting.string_type.value) < 0;
> >> + }
> >> +
> >> + /* should not run to here */
> >> + return -1;
> >> +}
> >> +
> >> +static int char_filter(struct format_item *format_item,
> >> + struct filter_item *filter_item, va_list *args)
> >> +{
> >> + unsigned char val;
> >> +
> >> + val = (unsigned char)va_arg(*args, unsigned int);
> >> +
> >> + /*
> >> + * We judge filter_item->enable here, because even if filter is not
> >> + * enabled, we still need to bypass this data in va_list
> >> + */
> >> + if (!filter_item->enable)
> >> + return 1;
> >> +
> >> + switch (filter_item->setting.char_type.op) {
> >> + case gt:
> >> + return val > filter_item->setting.char_type.value;
> >> + case ge:
> >> + return val >= filter_item->setting.char_type.value;
> >> + case eq:
> >> + return val == filter_item->setting.char_type.value;
> >> + case le:
> >> + return val <= filter_item->setting.char_type.value;
> >> + case lt:
> >> + return val < filter_item->setting.char_type.value;
> >> + }
> >> +
> >> + /* should not run to here */
> >> + return -1;
> >> +}
> >> +
> >> +static int ltt_filter(struct ltt_trace_struct *trace,
> >> + const struct marker *mdata, va_list *args)
> >> +{
> >> + int ret;
> >> + struct marker_filter *filter;
> >> + struct marker_format *format;
> >> + int i;
> >> +
> >> + if (lock_filter()) {
> > 
> > Hrm do you allow session filter modification when tracing is active for
> > this session ?
> > 
> > I think the lock_filter approach you are taking here is wrong.
> > 
> > 
> >> + /* Don't output events caused by ltt-filter */
> >> + return 0;
> >> + }
> >> +
> >> + spin_lock(&filter_lock);
> >> +
> >> + filter = _find_marker_filter(mdata->name);
> >> + if (!filter) {
> >> + printk(KERN_ERR "%s: unknown marker %s\n", __func__,
> >> + mdata->name);
> >> + ret = 1;
> >> + goto done;
> >> + }
> >> +
> >> + /* Bypass if no filter setting on this marker */
> >> + if (!filter->enable) {
> >> + ret = 1;
> >> + goto done;
> >> + }
> >> +
> >> + format = filter->format;
> >> +
> >> + for (i = 0; i < format->format_item_cnt; i++) {
> >> + int result;
> >> +
> >> + switch (format->items[i]->type) {
> >> + case SIGNED:
> >> + result = signed_filter(format->items[i],
> >> + filter->items + i, args);
> >> + break;
> >> + case UNSIGNED:
> >> + result = unsigned_filter(format->items[i],
> >> + filter->items + i, args);
> >> + break;
> >> + case STRING:
> >> + result = string_filter(format->items[i],
> >> + filter->items + i, args);
> >> + break;
> >> + case CHAR:
> >> + result = char_filter(format->items[i],
> >> + filter->items + i, args);
> >> + break;
> >> + default:
> >> + /* should not run to here */
> >> + printk(KERN_ERR "%s: unknown type %d\n", __func__,
> >> + format->items[i]->type);
> >> + ret = 1;
> >> + goto done;
> >> + }
> >> + if (result < 0) {
> >> + ret = 1;
> >> + goto done;
> >> + }
> >> + if (!result) {
> >> + ret = 0;
> >> + goto done;
> >> + }
> >> + }
> >> +
> >> + ret = 1;
> >> +
> >> +done:
> >> + spin_unlock(&filter_lock);
> >> + unlock_filter();
> >> + return ret;
> >> +}
> >> +
> >> +
> >> +/*
> >> + * Init/exit option of module
> >> + *
> >> + * Init marker_format and marker_filter operations
> >> + * register filter callback in ltt
> >> + *
> >> + */
> >> +
> >> +static int __init ltt_filter_init(void)
> >> +{
> >> + int err = 0;
> >> + struct dentry *ltt_root_dentry;
> >> +
> >> + ltt_root_dentry = get_ltt_root();
> >> + if (!ltt_root_dentry) {
> >> + err = -ENOENT;
> >> + goto err_no_root;
> >> + }
> >> +
> >> + ltt_filter_dir = debugfs_create_dir(LTT_FILTER_DIR, ltt_root_dentry);
> >> + if (IS_ERR(ltt_filter_dir) || !ltt_filter_dir) {
> >> + printk(KERN_ERR "%s: create dir of %s failed\n", __func__,
> >> + LTT_FILTER_DIR);
> >> + err = -ENOMEM;
> >> + goto err_create_filter_dir;
> >> + }
> >> +
> >> + err = marker_format_init();
> >> + if (IS_ERR_VALUE(err)) {
> >> + printk(KERN_ERR "%s: init marker format failed\n", __func__);
> >> + goto err_marker_format_init;
> >> + }
> >> +
> >> + err = marker_filter_init();
> >> + if (IS_ERR_VALUE(err)) {
> >> + printk(KERN_ERR "%s: init marker filter failed\n", __func__);
> >> + goto err_marker_filter_init;
> >> + }
> >> +
> >> + err = ltt_module_register(LTT_FUNCTION_RUN_FILTER, ltt_filter,
> >> + THIS_MODULE);
> >> + if (IS_ERR_VALUE(err)) {
> >> + printk(KERN_ERR "%s: register ltt filter failed\n", __func__);
> >> + goto err_filter_register;
> >> + }
> >> +
> >> + return 0;
> >> +
> >> +err_filter_register:
> >> + marker_filter_destroy();
> >> +err_marker_filter_init:
> >> + marker_format_destroy();
> >> +err_marker_format_init:
> >> + debugfs_remove(ltt_filter_dir);
> >> +err_create_filter_dir:
> >> +err_no_root:
> >> + return err;
> >> +}
> >> +
> >> +static void __exit ltt_filter_exit(void)
> >> +{
> >> + ltt_module_unregister(LTT_FUNCTION_RUN_FILTER);
> >> + marker_filter_destroy();
> >> + marker_format_destroy();
> >> +
> >> + debugfs_remove(ltt_filter_dir);
> >> +}
> >> +
> >> +module_init(ltt_filter_init);
> >> +module_exit(ltt_filter_exit);
> >> +
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_AUTHOR("Zhao Lei <zhaolei at cn.fujitsu.com>");
> >> +MODULE_DESCRIPTION("Linux Trace Toolkit Filter");
> >> -- 
> >> 1.5.5.3
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> ltt-dev mailing list
> >> ltt-dev at lists.casi.polymtl.ca
> >> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> >> 
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> >
> _______________________________________________
> ltt-dev mailing list
> ltt-dev at lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68




More information about the lttng-dev mailing list