diff mbox series

tracing: Add a way to filter function addresses to function names

Message ID 20221214125209.09d736dd@gandalf.local.home (mailing list archive)
State Superseded
Headers show
Series tracing: Add a way to filter function addresses to function names | expand

Commit Message

Steven Rostedt Dec. 14, 2022, 5:52 p.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

There's been several times where an event records a function address in
its field and I needed to filter on that address for a specific function
name. It required looking up the function in kallsyms, finding its size,
and doing a compare of "field >= function_start && field < function_end".

But this would change from boot to boot and is unreliable in scripts.
Also, it is useful to have this at boot up, where the addresses will not
be known. For example, on the boot command line:

  trace_trigger="initcall_finish.traceoff if initcall_finish.function == acpi_init"

To implement this, add a ".function" prefix, that will check that the
field is of size long, and the only operations allowed (so far) are "=="
and "!=".

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
[ Resending due to claws-mail messing up the format of the
  original patch ]

 Documentation/trace/events.rst     | 12 +++++
 kernel/trace/trace_events.c        |  2 +-
 kernel/trace/trace_events_filter.c | 79 +++++++++++++++++++++++++++++-
 3 files changed, 91 insertions(+), 2 deletions(-)

Comments

Ross Zwisler Dec. 16, 2022, 9:38 p.m. UTC | #1
On Wed, Dec 14, 2022 at 12:52:09PM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> There's been several times where an event records a function address in
> its field and I needed to filter on that address for a specific function
> name. It required looking up the function in kallsyms, finding its size,
> and doing a compare of "field >= function_start && field < function_end".

This is amazingly useful!

> But this would change from boot to boot and is unreliable in scripts.
> Also, it is useful to have this at boot up, where the addresses will not
> be known. For example, on the boot command line:
> 
>   trace_trigger="initcall_finish.traceoff if initcall_finish.function == acpi_init"

I think this should actually be:

  trace_trigger="initcall_finish.traceoff if func.function == acpi_init"
                                             ^^^^

right?  'func' is the function pointer in the format:

  [ /sys/kernel/debug/tracing/events/initcall/initcall_finish ]
  # cat format
  name: initcall_finish
  ID: 20
  format:
          field:unsigned short common_type;	offset:0;	size:2;	signed:0;
          field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
          field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
          field:int common_pid;	offset:4;	size:4;	signed:1;

          field:initcall_t func;	offset:8;	size:8;	signed:0;
          field:int ret;	offset:16;	size:4;	signed:1;

  print fmt: "func=%pS ret=%d", REC->func, REC->ret

> To implement this, add a ".function" prefix, that will check that the
> field is of size long, and the only operations allowed (so far) are "=="
> and "!=".
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---

<>

> @@ -1393,6 +1414,12 @@ static int parse_pred(const char *str, void *data,
>  		i += len;
>  	}
>  
> +	/* See if the field is a user space string */

Is this comment correct, or was it just copied from the .ustring handling
above?  We aren't doing any string sanitization (strncpy_from_kernel_nofault()
and friends) AFAICT, just doing a kernel symbol lookup.

Maybe:

  /* See if this is a kernel function name */

?

> +	if ((len = str_has_prefix(str + i, ".function"))) {
> +		function = true;
> +		i += len;
> +	}
> +
>  	while (isspace(str[i]))
>  		i++;
>  
> @@ -1423,7 +1450,57 @@ static int parse_pred(const char *str, void *data,
>  	pred->offset = field->offset;
>  	pred->op = op;
>  
> -	if (ftrace_event_is_function(call)) {
> +	if (function) {
> +		/* The field must be the same size as long */
> +		if (field->size != sizeof(long)) {
> +			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
> +			goto err_free;
> +		}
> +
> +		/* Function only works with '==' or '!=' and an unquoted string */
> +		switch (op) {
> +		case OP_NE:
> +		case OP_EQ:
> +			break;
> +		default:
> +			parse_error(pe, FILT_ERR_INVALID_OP, pos + i);
> +			goto err_free;
> +		}
> +
> +		if (isdigit(str[i])) {
> +			ret = kstrtol(num_buf, 0, &ip);
> +			if (ret) {
> +				parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i);
> +				goto err_free;
> +			}

Maybe I'm holding it by the wrong end, but can we actually use this to filter
based on an address?  Hex doesn't work (as you'd expect from looking at
kstrol()), but decimal doesn't work for me either:

  [ /sys/kernel/debug/tracing/events/kmem/kmalloc ]
  # echo "traceoff:1 if call_site.function == 0xffffffff96ca4240" > trigger

  [ /sys/kernel/debug/tracing/events/kmem/kmalloc ]
  # echo "traceoff:1 if call_site.function == 18446744071944421952" > trigger
  bash: echo: write error: Invalid argument

Should this interface insist that users use function names that we can look
up?
Steven Rostedt Dec. 16, 2022, 9:49 p.m. UTC | #2
On Fri, 16 Dec 2022 14:38:49 -0700
Ross Zwisler <zwisler@google.com> wrote:

> On Wed, Dec 14, 2022 at 12:52:09PM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > 
> > There's been several times where an event records a function address in
> > its field and I needed to filter on that address for a specific function
> > name. It required looking up the function in kallsyms, finding its size,
> > and doing a compare of "field >= function_start && field < function_end".  
> 
> This is amazingly useful!

Thanks!

> 
> > But this would change from boot to boot and is unreliable in scripts.
> > Also, it is useful to have this at boot up, where the addresses will not
> > be known. For example, on the boot command line:
> > 
> >   trace_trigger="initcall_finish.traceoff if initcall_finish.function == acpi_init"  
> 
> I think this should actually be:
> 
>   trace_trigger="initcall_finish.traceoff if func.function == acpi_init"
>                                              ^^^^
> 
> right?  'func' is the function pointer in the format:

Oops. Yes.

> 
>   [ /sys/kernel/debug/tracing/events/initcall/initcall_finish ]
>   # cat format
>   name: initcall_finish
>   ID: 20
>   format:
>           field:unsigned short common_type;	offset:0;	size:2;	signed:0;
>           field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
>           field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
>           field:int common_pid;	offset:4;	size:4;	signed:1;
> 
>           field:initcall_t func;	offset:8;	size:8;	signed:0;
>           field:int ret;	offset:16;	size:4;	signed:1;
> 
>   print fmt: "func=%pS ret=%d", REC->func, REC->ret
> 
> > To implement this, add a ".function" prefix, that will check that the
> > field is of size long, and the only operations allowed (so far) are "=="
> > and "!=".
> > 
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > ---  
> 
> <>
> 
> > @@ -1393,6 +1414,12 @@ static int parse_pred(const char *str, void *data,
> >  		i += len;
> >  	}
> >  
> > +	/* See if the field is a user space string */  
> 
> Is this comment correct, or was it just copied from the .ustring handling
> above?  We aren't doing any string sanitization (strncpy_from_kernel_nofault()
> and friends) AFAICT, just doing a kernel symbol lookup.

Oops again ;)

> 
> Maybe:
> 
>   /* See if this is a kernel function name */

Sure.

> 
> ?
> 
> > +	if ((len = str_has_prefix(str + i, ".function"))) {
> > +		function = true;
> > +		i += len;
> > +	}
> > +
> >  	while (isspace(str[i]))
> >  		i++;
> >  
> > @@ -1423,7 +1450,57 @@ static int parse_pred(const char *str, void *data,
> >  	pred->offset = field->offset;
> >  	pred->op = op;
> >  
> > -	if (ftrace_event_is_function(call)) {
> > +	if (function) {
> > +		/* The field must be the same size as long */
> > +		if (field->size != sizeof(long)) {
> > +			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
> > +			goto err_free;
> > +		}
> > +
> > +		/* Function only works with '==' or '!=' and an unquoted string */
> > +		switch (op) {
> > +		case OP_NE:
> > +		case OP_EQ:
> > +			break;
> > +		default:
> > +			parse_error(pe, FILT_ERR_INVALID_OP, pos + i);
> > +			goto err_free;
> > +		}
> > +
> > +		if (isdigit(str[i])) {
> > +			ret = kstrtol(num_buf, 0, &ip);
> > +			if (ret) {
> > +				parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i);
> > +				goto err_free;
> > +			}  
> 
> Maybe I'm holding it by the wrong end, but can we actually use this to filter
> based on an address?  Hex doesn't work (as you'd expect from looking at
> kstrol()), but decimal doesn't work for me either:
> 
>   [ /sys/kernel/debug/tracing/events/kmem/kmalloc ]
>   # echo "traceoff:1 if call_site.function == 0xffffffff96ca4240" > trigger
> 
>   [ /sys/kernel/debug/tracing/events/kmem/kmalloc ]
>   # echo "traceoff:1 if call_site.function == 18446744071944421952" > trigger
>   bash: echo: write error: Invalid argument

Thanks for letting me know. I didn't test this and it should work. I'll
look into it.

> 
> Should this interface insist that users use function names that we can look
> up?

It may pick the wrong function if there are more than one function with the
same name. Passing by address will guarantee it will be the right function.

Thanks for the review!

-- Steve
Masami Hiramatsu (Google) Dec. 18, 2022, 1:37 a.m. UTC | #3
Hi Steve,

On Wed, 14 Dec 2022 12:52:09 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> There's been several times where an event records a function address in
> its field and I needed to filter on that address for a specific function
> name. It required looking up the function in kallsyms, finding its size,
> and doing a compare of "field >= function_start && field < function_end".
> 
> But this would change from boot to boot and is unreliable in scripts.
> Also, it is useful to have this at boot up, where the addresses will not
> be known. For example, on the boot command line:
> 
>   trace_trigger="initcall_finish.traceoff if initcall_finish.function == acpi_init"
> 
> To implement this, add a ".function" prefix, that will check that the
> field is of size long, and the only operations allowed (so far) are "=="
> and "!=".

This looks nice! BTW, can you also add a test case for this feature?
Thus we can ensure this works both with symbols or function addresses.

Thank you,

> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> [ Resending due to claws-mail messing up the format of the
>   original patch ]
> 
>  Documentation/trace/events.rst     | 12 +++++
>  kernel/trace/trace_events.c        |  2 +-
>  kernel/trace/trace_events_filter.c | 79 +++++++++++++++++++++++++++++-
>  3 files changed, 91 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
> index c47f381d0c00..d0fd5c7220b7 100644
> --- a/Documentation/trace/events.rst
> +++ b/Documentation/trace/events.rst
> @@ -207,6 +207,18 @@ field name::
>  As the kernel will have to know how to retrieve the memory that the pointer
>  is at from user space.
>  
> +You can convert any long type to a function address and search by function name::
> +
> +  call_site.function == security_prepare_creds
> +
> +The above will filter when the field "call_site" falls on the address within
> +"security_prepare_creds". That is, it will compare the value of "call_site" and
> +the filter will return true if it is greater than or equal to the start of
> +the function "security_prepare_creds" and less than the end of that function.
> +
> +The ".function" postfix can only be attached to values of size long, and can only
> +be compared with "==" or "!=".
> +
>  5.2 Setting filters
>  -------------------
>  
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 33e0b4f8ebe6..db6e2f399440 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2822,7 +2822,7 @@ static __init int setup_trace_triggers(char *str)
>  		if (!trigger)
>  			break;
>  		bootup_triggers[i].event = strsep(&trigger, ".");
> -		bootup_triggers[i].trigger = strsep(&trigger, ".");
> +		bootup_triggers[i].trigger = strsep(&trigger, "");
>  		if (!bootup_triggers[i].trigger)
>  			break;
>  	}
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 96acc2b71ac7..eef6426051bb 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -64,6 +64,7 @@ enum filter_pred_fn {
>  	FILTER_PRED_FN_PCHAR_USER,
>  	FILTER_PRED_FN_PCHAR,
>  	FILTER_PRED_FN_CPU,
> +	FILTER_PRED_FN_FUNCTION,
>  	FILTER_PRED_FN_,
>  	FILTER_PRED_TEST_VISITED,
>  };
> @@ -71,6 +72,7 @@ enum filter_pred_fn {
>  struct filter_pred {
>  	enum filter_pred_fn 	fn_num;
>  	u64 			val;
> +	u64 			val2;
>  	struct regex		regex;
>  	unsigned short		*ops;
>  	struct ftrace_event_field *field;
> @@ -103,6 +105,7 @@ struct filter_pred {
>  	C(INVALID_FILTER,	"Meaningless filter expression"),	\
>  	C(IP_FIELD_ONLY,	"Only 'ip' field is supported for function trace"), \
>  	C(INVALID_VALUE,	"Invalid value (did you forget quotes)?"), \
> +	C(NO_FUNCTION,		"Function not found"),			\
>  	C(ERRNO,		"Error"),				\
>  	C(NO_FILTER,		"No filter found")
>  
> @@ -876,6 +879,17 @@ static int filter_pred_comm(struct filter_pred *pred, void *event)
>  	return cmp ^ pred->not;
>  }
>  
> +/* Filter predicate for functions. */
> +static int filter_pred_function(struct filter_pred *pred, void *event)
> +{
> +	unsigned long *addr = (unsigned long *)(event + pred->offset);
> +	unsigned long start = (unsigned long)pred->val;
> +	unsigned long end = (unsigned long)pred->val2;
> +	int ret = *addr >= start && *addr < end;
> +
> +	return pred->op == OP_EQ ? ret : !ret;
> +}
> +
>  /*
>   * regex_match_foo - Basic regex callbacks
>   *
> @@ -1335,6 +1349,8 @@ static int filter_pred_fn_call(struct filter_pred *pred, void *event)
>  		return filter_pred_pchar(pred, event);
>  	case FILTER_PRED_FN_CPU:
>  		return filter_pred_cpu(pred, event);
> +	case FILTER_PRED_FN_FUNCTION:
> +		return filter_pred_function(pred, event);
>  	case FILTER_PRED_TEST_VISITED:
>  		return test_pred_visited_fn(pred, event);
>  	default:
> @@ -1350,8 +1366,13 @@ static int parse_pred(const char *str, void *data,
>  	struct trace_event_call *call = data;
>  	struct ftrace_event_field *field;
>  	struct filter_pred *pred = NULL;
> +	unsigned long offset;
> +	unsigned long size;
> +	unsigned long ip;
>  	char num_buf[24];	/* Big enough to hold an address */
>  	char *field_name;
> +	char *name;
> +	bool function = false;
>  	bool ustring = false;
>  	char q;
>  	u64 val;
> @@ -1393,6 +1414,12 @@ static int parse_pred(const char *str, void *data,
>  		i += len;
>  	}
>  
> +	/* See if the field is a user space string */
> +	if ((len = str_has_prefix(str + i, ".function"))) {
> +		function = true;
> +		i += len;
> +	}
> +
>  	while (isspace(str[i]))
>  		i++;
>  
> @@ -1423,7 +1450,57 @@ static int parse_pred(const char *str, void *data,
>  	pred->offset = field->offset;
>  	pred->op = op;
>  
> -	if (ftrace_event_is_function(call)) {
> +	if (function) {
> +		/* The field must be the same size as long */
> +		if (field->size != sizeof(long)) {
> +			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
> +			goto err_free;
> +		}
> +
> +		/* Function only works with '==' or '!=' and an unquoted string */
> +		switch (op) {
> +		case OP_NE:
> +		case OP_EQ:
> +			break;
> +		default:
> +			parse_error(pe, FILT_ERR_INVALID_OP, pos + i);
> +			goto err_free;
> +		}
> +
> +		if (isdigit(str[i])) {
> +			ret = kstrtol(num_buf, 0, &ip);
> +			if (ret) {
> +				parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i);
> +				goto err_free;
> +			}
> +		} else {
> +			s = i;
> +			for (; str[i] && !isspace(str[i]); i++)
> +				;
> +
> +			len = i - s;
> +			name = kmemdup_nul(str + s, len, GFP_KERNEL);
> +			if (!name)
> +				goto err_mem;
> +			ip = kallsyms_lookup_name(name);
> +			kfree(name);
> +			if (!ip) {
> +				parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
> +				goto err_free;
> +			}
> +		}
> +
> +		/* Now find the function start and end address */
> +		if (!kallsyms_lookup_size_offset(ip, &size, &offset)) {
> +			parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
> +			goto err_free;
> +		}
> +
> +		pred->fn_num = FILTER_PRED_FN_FUNCTION;
> +		pred->val = ip - offset;
> +		pred->val2 = pred->val + size;
> +
> +	} else if (ftrace_event_is_function(call)) {
>  		/*
>  		 * Perf does things different with function events.
>  		 * It only allows an "ip" field, and expects a string.
> -- 
> 2.35.1
>
Zheng Yejian Dec. 19, 2022, 2:38 a.m. UTC | #4
在 2022/12/15 01:52, Steven Rostedt 写道:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> There's been several times where an event records a function address in
> its field and I needed to filter on that address for a specific function
> name. It required looking up the function in kallsyms, finding its size,
> and doing a compare of "field >= function_start && field < function_end".
> 
> But this would change from boot to boot and is unreliable in scripts.
> Also, it is useful to have this at boot up, where the addresses will not
> be known. For example, on the boot command line:
> 
>    trace_trigger="initcall_finish.traceoff if initcall_finish.function == acpi_init"
> 
> To implement this, add a ".function" prefix, that will check that the
> field is of size long, and the only operations allowed (so far) are "=="
> and "!=".
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> [ Resending due to claws-mail messing up the format of the
>    original patch ]
> 
>   Documentation/trace/events.rst     | 12 +++++
>   kernel/trace/trace_events.c        |  2 +-
>   kernel/trace/trace_events_filter.c | 79 +++++++++++++++++++++++++++++-
>   3 files changed, 91 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
> index c47f381d0c00..d0fd5c7220b7 100644
> --- a/Documentation/trace/events.rst
> +++ b/Documentation/trace/events.rst
> @@ -207,6 +207,18 @@ field name::
>   As the kernel will have to know how to retrieve the memory that the pointer
>   is at from user space.
>   
> +You can convert any long type to a function address and search by function name::
> +
> +  call_site.function == security_prepare_creds
> +
> +The above will filter when the field "call_site" falls on the address within
> +"security_prepare_creds". That is, it will compare the value of "call_site" and
> +the filter will return true if it is greater than or equal to the start of
> +the function "security_prepare_creds" and less than the end of that function.
> +
> +The ".function" postfix can only be attached to values of size long, and can only
> +be compared with "==" or "!=".
> +
>   5.2 Setting filters
>   -------------------
>   
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 33e0b4f8ebe6..db6e2f399440 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2822,7 +2822,7 @@ static __init int setup_trace_triggers(char *str)
>   		if (!trigger)
>   			break;
>   		bootup_triggers[i].event = strsep(&trigger, ".");
> -		bootup_triggers[i].trigger = strsep(&trigger, ".");
> +		bootup_triggers[i].trigger = strsep(&trigger, "");

Would it be better to change to:

      bootup_triggers[i].trigger = trigger;

Because there is unnecessary loop if call strsep(s, "") :-)
   strsep(s, ct) {
     strpbrk(cs, ct) {
       // when 'ct' is empty string, here will always return NULL
       // after traversing string 'cs'

--
Best regards,
Zheng Yejian

>   		if (!bootup_triggers[i].trigger)
>   			break;
>   	}
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 96acc2b71ac7..eef6426051bb 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -64,6 +64,7 @@ enum filter_pred_fn {
>   	FILTER_PRED_FN_PCHAR_USER,
>   	FILTER_PRED_FN_PCHAR,
>   	FILTER_PRED_FN_CPU,
> +	FILTER_PRED_FN_FUNCTION,
>   	FILTER_PRED_FN_,
>   	FILTER_PRED_TEST_VISITED,
>   };
> @@ -71,6 +72,7 @@ enum filter_pred_fn {
>   struct filter_pred {
>   	enum filter_pred_fn 	fn_num;
>   	u64 			val;
> +	u64 			val2;
>   	struct regex		regex;
>   	unsigned short		*ops;
>   	struct ftrace_event_field *field;
> @@ -103,6 +105,7 @@ struct filter_pred {
>   	C(INVALID_FILTER,	"Meaningless filter expression"),	\
>   	C(IP_FIELD_ONLY,	"Only 'ip' field is supported for function trace"), \
>   	C(INVALID_VALUE,	"Invalid value (did you forget quotes)?"), \
> +	C(NO_FUNCTION,		"Function not found"),			\
>   	C(ERRNO,		"Error"),				\
>   	C(NO_FILTER,		"No filter found")
>   
> @@ -876,6 +879,17 @@ static int filter_pred_comm(struct filter_pred *pred, void *event)
>   	return cmp ^ pred->not;
>   }
>   
> +/* Filter predicate for functions. */
> +static int filter_pred_function(struct filter_pred *pred, void *event)
> +{
> +	unsigned long *addr = (unsigned long *)(event + pred->offset);
> +	unsigned long start = (unsigned long)pred->val;
> +	unsigned long end = (unsigned long)pred->val2;
> +	int ret = *addr >= start && *addr < end;
> +
> +	return pred->op == OP_EQ ? ret : !ret;
> +}
> +
>   /*
>    * regex_match_foo - Basic regex callbacks
>    *
> @@ -1335,6 +1349,8 @@ static int filter_pred_fn_call(struct filter_pred *pred, void *event)
>   		return filter_pred_pchar(pred, event);
>   	case FILTER_PRED_FN_CPU:
>   		return filter_pred_cpu(pred, event);
> +	case FILTER_PRED_FN_FUNCTION:
> +		return filter_pred_function(pred, event);
>   	case FILTER_PRED_TEST_VISITED:
>   		return test_pred_visited_fn(pred, event);
>   	default:
> @@ -1350,8 +1366,13 @@ static int parse_pred(const char *str, void *data,
>   	struct trace_event_call *call = data;
>   	struct ftrace_event_field *field;
>   	struct filter_pred *pred = NULL;
> +	unsigned long offset;
> +	unsigned long size;
> +	unsigned long ip;
>   	char num_buf[24];	/* Big enough to hold an address */
>   	char *field_name;
> +	char *name;
> +	bool function = false;
>   	bool ustring = false;
>   	char q;
>   	u64 val;
> @@ -1393,6 +1414,12 @@ static int parse_pred(const char *str, void *data,
>   		i += len;
>   	}
>   
> +	/* See if the field is a user space string */
> +	if ((len = str_has_prefix(str + i, ".function"))) {
> +		function = true;
> +		i += len;
> +	}
> +
>   	while (isspace(str[i]))
>   		i++;
>   
> @@ -1423,7 +1450,57 @@ static int parse_pred(const char *str, void *data,
>   	pred->offset = field->offset;
>   	pred->op = op;
>   
> -	if (ftrace_event_is_function(call)) {
> +	if (function) {
> +		/* The field must be the same size as long */
> +		if (field->size != sizeof(long)) {
> +			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
> +			goto err_free;
> +		}
> +
> +		/* Function only works with '==' or '!=' and an unquoted string */
> +		switch (op) {
> +		case OP_NE:
> +		case OP_EQ:
> +			break;
> +		default:
> +			parse_error(pe, FILT_ERR_INVALID_OP, pos + i);
> +			goto err_free;
> +		}
> +
> +		if (isdigit(str[i])) {
> +			ret = kstrtol(num_buf, 0, &ip);
> +			if (ret) {
> +				parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i);
> +				goto err_free;
> +			}
> +		} else {
> +			s = i;
> +			for (; str[i] && !isspace(str[i]); i++)
> +				;
> +
> +			len = i - s;
> +			name = kmemdup_nul(str + s, len, GFP_KERNEL);
> +			if (!name)
> +				goto err_mem;
> +			ip = kallsyms_lookup_name(name);
> +			kfree(name);
> +			if (!ip) {
> +				parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
> +				goto err_free;
> +			}
> +		}
> +
> +		/* Now find the function start and end address */
> +		if (!kallsyms_lookup_size_offset(ip, &size, &offset)) {
> +			parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
> +			goto err_free;
> +		}
> +
> +		pred->fn_num = FILTER_PRED_FN_FUNCTION;
> +		pred->val = ip - offset;
> +		pred->val2 = pred->val + size;
> +
> +	} else if (ftrace_event_is_function(call)) {
>   		/*
>   		 * Perf does things different with function events.
>   		 * It only allows an "ip" field, and expects a string.
Steven Rostedt Dec. 19, 2022, 6:21 p.m. UTC | #5
On Mon, 19 Dec 2022 10:38:50 +0800
Zheng Yejian <zhengyejian1@huawei.com> wrote:

> > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > index 33e0b4f8ebe6..db6e2f399440 100644
> > --- a/kernel/trace/trace_events.c
> > +++ b/kernel/trace/trace_events.c
> > @@ -2822,7 +2822,7 @@ static __init int setup_trace_triggers(char *str)
> >   		if (!trigger)
> >   			break;
> >   		bootup_triggers[i].event = strsep(&trigger, ".");
> > -		bootup_triggers[i].trigger = strsep(&trigger, ".");
> > +		bootup_triggers[i].trigger = strsep(&trigger, "");  
> 
> Would it be better to change to:
> 
>       bootup_triggers[i].trigger = trigger;
> 

Sure, I'll make the update.

> Because there is unnecessary loop if call strsep(s, "") :-)
>    strsep(s, ct) {
>      strpbrk(cs, ct) {
>        // when 'ct' is empty string, here will always return NULL
>        // after traversing string 'cs'

I'm not sure what you mean about an extra loop.

Thanks,

-- Steve
Zheng Yejian Dec. 20, 2022, 1:40 a.m. UTC | #6
On 2022/12/20 02:21, Steven Rostedt wrote:
> On Mon, 19 Dec 2022 10:38:50 +0800
> Zheng Yejian <zhengyejian1@huawei.com> wrote:
> 
>>> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
>>> index 33e0b4f8ebe6..db6e2f399440 100644
>>> --- a/kernel/trace/trace_events.c
>>> +++ b/kernel/trace/trace_events.c
>>> @@ -2822,7 +2822,7 @@ static __init int setup_trace_triggers(char *str)
>>>    		if (!trigger)
>>>    			break;
>>>    		bootup_triggers[i].event = strsep(&trigger, ".");
>>> -		bootup_triggers[i].trigger = strsep(&trigger, ".");
>>> +		bootup_triggers[i].trigger = strsep(&trigger, "");
>>
>> Would it be better to change to:
>>
>>        bootup_triggers[i].trigger = trigger;
>>
> 
> Sure, I'll make the update.
> 
>> Because there is unnecessary loop if call strsep(s, "") :-)
>>     strsep(s, ct) {
>>       strpbrk(cs, ct) {
>>         // when 'ct' is empty string, here will always return NULL
>>         // after traversing string 'cs'
> 
> I'm not sure what you mean about an extra loop.

I just mean above "traversing string 'cs'" in strpbrk(s, "") is 
unnecessary. Sorry for making the ambiguity :-(

> 
> Thanks,
> 
> -- Steve
>
diff mbox series

Patch

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index c47f381d0c00..d0fd5c7220b7 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -207,6 +207,18 @@  field name::
 As the kernel will have to know how to retrieve the memory that the pointer
 is at from user space.
 
+You can convert any long type to a function address and search by function name::
+
+  call_site.function == security_prepare_creds
+
+The above will filter when the field "call_site" falls on the address within
+"security_prepare_creds". That is, it will compare the value of "call_site" and
+the filter will return true if it is greater than or equal to the start of
+the function "security_prepare_creds" and less than the end of that function.
+
+The ".function" postfix can only be attached to values of size long, and can only
+be compared with "==" or "!=".
+
 5.2 Setting filters
 -------------------
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 33e0b4f8ebe6..db6e2f399440 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2822,7 +2822,7 @@  static __init int setup_trace_triggers(char *str)
 		if (!trigger)
 			break;
 		bootup_triggers[i].event = strsep(&trigger, ".");
-		bootup_triggers[i].trigger = strsep(&trigger, ".");
+		bootup_triggers[i].trigger = strsep(&trigger, "");
 		if (!bootup_triggers[i].trigger)
 			break;
 	}
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 96acc2b71ac7..eef6426051bb 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -64,6 +64,7 @@  enum filter_pred_fn {
 	FILTER_PRED_FN_PCHAR_USER,
 	FILTER_PRED_FN_PCHAR,
 	FILTER_PRED_FN_CPU,
+	FILTER_PRED_FN_FUNCTION,
 	FILTER_PRED_FN_,
 	FILTER_PRED_TEST_VISITED,
 };
@@ -71,6 +72,7 @@  enum filter_pred_fn {
 struct filter_pred {
 	enum filter_pred_fn 	fn_num;
 	u64 			val;
+	u64 			val2;
 	struct regex		regex;
 	unsigned short		*ops;
 	struct ftrace_event_field *field;
@@ -103,6 +105,7 @@  struct filter_pred {
 	C(INVALID_FILTER,	"Meaningless filter expression"),	\
 	C(IP_FIELD_ONLY,	"Only 'ip' field is supported for function trace"), \
 	C(INVALID_VALUE,	"Invalid value (did you forget quotes)?"), \
+	C(NO_FUNCTION,		"Function not found"),			\
 	C(ERRNO,		"Error"),				\
 	C(NO_FILTER,		"No filter found")
 
@@ -876,6 +879,17 @@  static int filter_pred_comm(struct filter_pred *pred, void *event)
 	return cmp ^ pred->not;
 }
 
+/* Filter predicate for functions. */
+static int filter_pred_function(struct filter_pred *pred, void *event)
+{
+	unsigned long *addr = (unsigned long *)(event + pred->offset);
+	unsigned long start = (unsigned long)pred->val;
+	unsigned long end = (unsigned long)pred->val2;
+	int ret = *addr >= start && *addr < end;
+
+	return pred->op == OP_EQ ? ret : !ret;
+}
+
 /*
  * regex_match_foo - Basic regex callbacks
  *
@@ -1335,6 +1349,8 @@  static int filter_pred_fn_call(struct filter_pred *pred, void *event)
 		return filter_pred_pchar(pred, event);
 	case FILTER_PRED_FN_CPU:
 		return filter_pred_cpu(pred, event);
+	case FILTER_PRED_FN_FUNCTION:
+		return filter_pred_function(pred, event);
 	case FILTER_PRED_TEST_VISITED:
 		return test_pred_visited_fn(pred, event);
 	default:
@@ -1350,8 +1366,13 @@  static int parse_pred(const char *str, void *data,
 	struct trace_event_call *call = data;
 	struct ftrace_event_field *field;
 	struct filter_pred *pred = NULL;
+	unsigned long offset;
+	unsigned long size;
+	unsigned long ip;
 	char num_buf[24];	/* Big enough to hold an address */
 	char *field_name;
+	char *name;
+	bool function = false;
 	bool ustring = false;
 	char q;
 	u64 val;
@@ -1393,6 +1414,12 @@  static int parse_pred(const char *str, void *data,
 		i += len;
 	}
 
+	/* See if the field is a user space string */
+	if ((len = str_has_prefix(str + i, ".function"))) {
+		function = true;
+		i += len;
+	}
+
 	while (isspace(str[i]))
 		i++;
 
@@ -1423,7 +1450,57 @@  static int parse_pred(const char *str, void *data,
 	pred->offset = field->offset;
 	pred->op = op;
 
-	if (ftrace_event_is_function(call)) {
+	if (function) {
+		/* The field must be the same size as long */
+		if (field->size != sizeof(long)) {
+			parse_error(pe, FILT_ERR_ILLEGAL_FIELD_OP, pos + i);
+			goto err_free;
+		}
+
+		/* Function only works with '==' or '!=' and an unquoted string */
+		switch (op) {
+		case OP_NE:
+		case OP_EQ:
+			break;
+		default:
+			parse_error(pe, FILT_ERR_INVALID_OP, pos + i);
+			goto err_free;
+		}
+
+		if (isdigit(str[i])) {
+			ret = kstrtol(num_buf, 0, &ip);
+			if (ret) {
+				parse_error(pe, FILT_ERR_INVALID_VALUE, pos + i);
+				goto err_free;
+			}
+		} else {
+			s = i;
+			for (; str[i] && !isspace(str[i]); i++)
+				;
+
+			len = i - s;
+			name = kmemdup_nul(str + s, len, GFP_KERNEL);
+			if (!name)
+				goto err_mem;
+			ip = kallsyms_lookup_name(name);
+			kfree(name);
+			if (!ip) {
+				parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
+				goto err_free;
+			}
+		}
+
+		/* Now find the function start and end address */
+		if (!kallsyms_lookup_size_offset(ip, &size, &offset)) {
+			parse_error(pe, FILT_ERR_NO_FUNCTION, pos + i);
+			goto err_free;
+		}
+
+		pred->fn_num = FILTER_PRED_FN_FUNCTION;
+		pred->val = ip - offset;
+		pred->val2 = pred->val + size;
+
+	} else if (ftrace_event_is_function(call)) {
 		/*
 		 * Perf does things different with function events.
 		 * It only allows an "ip" field, and expects a string.