diff mbox series

[RFC,bpf-next,01/10] uprobe: Add session callbacks to uprobe_consumer

Message ID 20240604200221.377848-2-jolsa@kernel.org (mailing list archive)
State Superseded
Headers show
Series uprobe, bpf: Add session support | expand

Commit Message

Jiri Olsa June 4, 2024, 8:02 p.m. UTC
Adding new set of callbacks that are triggered on entry and return
uprobe execution for the attached function.

The session means that those callbacks are 'connected' in a way
that allows to:
  - control execution of return callback from entry callback
  - share data between entry and return callbacks

The session concept fits to our common use case where we do filtering
on entry uprobe and based on the result we decide to run the return
uprobe (or not).

It's also convenient to share the data between session callbacks.

The control of return uprobe execution is done via return value of the
entry session callback, where 0 means to install and execute return
uprobe, 1 means to not install.

Current implementation has a restriction that allows to register only
single consumer with session callbacks for a uprobe and also restricting
standard callbacks consumers.

Which means that there can be only single user of a uprobe (inode +
offset) when session consumer is registered to it.

This is because all registered consumers are executed when uprobe or
return uprobe is hit and wihout additional layer (like fgraph's shadow
stack) that would keep the state of the return callback, we have no
way to find out which consumer should be executed.

I'm not sure how big limitation this is for people, our current use
case seems to be ok with that. Fixing this would be more complex/bigger
change to uprobes, thoughts?

Hence sending this as RFC to gather more opinions and feedback.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/uprobes.h | 18 +++++++++++
 kernel/events/uprobes.c | 69 +++++++++++++++++++++++++++++++++++------
 2 files changed, 78 insertions(+), 9 deletions(-)

Comments

Oleg Nesterov June 5, 2024, 3:24 p.m. UTC | #1
I'll try to read this code tomorrow, right now I don't really understand
what does it do and why.

However,

On 06/04, Jiri Olsa wrote:
>
>  struct uprobe_consumer {
> +	/*
> +	 * The handler callback return value controls removal of the uprobe.
> +	 *  0 on success, uprobe stays
> +	 *  1 on failure, remove the uprobe
> +	 *    console warning for anything else
> +	 */
>  	int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);

This is misleading. It is not about success/failure, it is about filtering.

consumer->handler() returns UPROBE_HANDLER_REMOVE if this consumer is not
interested in this task, so this uprobe can be removed (unless another
consumer returns 0).

> +/*
> + * Make sure all the uprobe consumers have only one type of entry
> + * callback registered (either handler or handler_session) due to
> + * different return value actions.
> + */
> +static int consumer_check(struct uprobe_consumer *curr, struct uprobe_consumer *uc)
> +{
> +	if (!curr)
> +		return 0;
> +	if (curr->handler_session || uc->handler_session)
> +		return -EBUSY;
> +	return 0;
> +}

Hmm, I don't understand this code, it doesn't match the comment...

The comment says "all the uprobe consumers have only one type" but
consumer_check() will always fail if the the 1st or 2nd consumer has
->handler_session != NULL ?

Perhaps you meant

	if (!!curr->handler != !!uc->handler)
		return -EBUSY;

?

Oleg.
Oleg Nesterov June 5, 2024, 4:01 p.m. UTC | #2
On 06/05, Oleg Nesterov wrote:
>
> > +/*
> > + * Make sure all the uprobe consumers have only one type of entry
> > + * callback registered (either handler or handler_session) due to
> > + * different return value actions.
> > + */
> > +static int consumer_check(struct uprobe_consumer *curr, struct uprobe_consumer *uc)
> > +{
> > +	if (!curr)
> > +		return 0;
> > +	if (curr->handler_session || uc->handler_session)
> > +		return -EBUSY;
> > +	return 0;
> > +}
>
> Hmm, I don't understand this code, it doesn't match the comment...
>
> The comment says "all the uprobe consumers have only one type" but
> consumer_check() will always fail if the the 1st or 2nd consumer has
> ->handler_session != NULL ?
>
> Perhaps you meant
>
> 	if (!!curr->handler != !!uc->handler)
> 		return -EBUSY;
>
> ?

OK, the changelog says

	Which means that there can be only single user of a uprobe (inode +
	offset) when session consumer is registered to it.

so the code is correct. But I still think the comment is misleading.

Oleg.
Oleg Nesterov June 5, 2024, 4:36 p.m. UTC | #3
On 06/05, Oleg Nesterov wrote:
>
> On 06/05, Oleg Nesterov wrote:
> >
> > > +/*
> > > + * Make sure all the uprobe consumers have only one type of entry
> > > + * callback registered (either handler or handler_session) due to
> > > + * different return value actions.
> > > + */
> > > +static int consumer_check(struct uprobe_consumer *curr, struct uprobe_consumer *uc)
> > > +{
> > > +	if (!curr)
> > > +		return 0;
> > > +	if (curr->handler_session || uc->handler_session)
> > > +		return -EBUSY;
> > > +	return 0;
> > > +}
> >
> > Hmm, I don't understand this code, it doesn't match the comment...
> >
> > The comment says "all the uprobe consumers have only one type" but
> > consumer_check() will always fail if the the 1st or 2nd consumer has
> > ->handler_session != NULL ?
> >
> > Perhaps you meant
> >
> > 	if (!!curr->handler != !!uc->handler)
> > 		return -EBUSY;
> >
> > ?
>
> OK, the changelog says
>
> 	Which means that there can be only single user of a uprobe (inode +
> 	offset) when session consumer is registered to it.
>
> so the code is correct. But I still think the comment is misleading.

Cough... perhaps it is correct but I am still confused even we forget about
the comment ;)

OK, uprobe can have a single consumer with ->handler_session != NULL. I guess
this is because return_instance->data is "global".

So uprobe can have multiple handler_session == NULL consumers before
handler_session != NULL, but not after ?

Oleg.
Andrii Nakryiko June 5, 2024, 5:25 p.m. UTC | #4
On Tue, Jun 4, 2024 at 1:02 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding new set of callbacks that are triggered on entry and return
> uprobe execution for the attached function.
>
> The session means that those callbacks are 'connected' in a way
> that allows to:
>   - control execution of return callback from entry callback
>   - share data between entry and return callbacks
>
> The session concept fits to our common use case where we do filtering
> on entry uprobe and based on the result we decide to run the return
> uprobe (or not).
>
> It's also convenient to share the data between session callbacks.
>
> The control of return uprobe execution is done via return value of the
> entry session callback, where 0 means to install and execute return
> uprobe, 1 means to not install.
>
> Current implementation has a restriction that allows to register only
> single consumer with session callbacks for a uprobe and also restricting
> standard callbacks consumers.
>
> Which means that there can be only single user of a uprobe (inode +
> offset) when session consumer is registered to it.
>
> This is because all registered consumers are executed when uprobe or
> return uprobe is hit and wihout additional layer (like fgraph's shadow
> stack) that would keep the state of the return callback, we have no
> way to find out which consumer should be executed.
>
> I'm not sure how big limitation this is for people, our current use
> case seems to be ok with that. Fixing this would be more complex/bigger
> change to uprobes, thoughts?

I think it's a pretty big limitation, because in production you don't
always know ahead of time all possible users of uprobe, so any such
limitations will cause problems, issue reports, investigation, etc.

As one possible solution, what if we do

struct return_instance {
    ...
    u64 session_cookies[];
};

and allocate sizeof(struct return_instance) + 8 *
<num-of-session-consumers> and then at runtime pass
&session_cookies[i] as data pointer to session-aware callbacks?

>
> Hence sending this as RFC to gather more opinions and feedback.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  include/linux/uprobes.h | 18 +++++++++++
>  kernel/events/uprobes.c | 69 +++++++++++++++++++++++++++++++++++------
>  2 files changed, 78 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index f46e0ca0169c..a2f2d5ac3cee 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -34,6 +34,12 @@ enum uprobe_filter_ctx {
>  };
>
>  struct uprobe_consumer {
> +       /*
> +        * The handler callback return value controls removal of the uprobe.
> +        *  0 on success, uprobe stays
> +        *  1 on failure, remove the uprobe
> +        *    console warning for anything else
> +        */
>         int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
>         int (*ret_handler)(struct uprobe_consumer *self,
>                                 unsigned long func,
> @@ -42,6 +48,17 @@ struct uprobe_consumer {
>                                 enum uprobe_filter_ctx ctx,
>                                 struct mm_struct *mm);
>
> +       /* The handler_session callback return value controls execution of
> +        * the return uprobe and ret_handler_session callback.
> +        *  0 on success
> +        *  1 on failure, DO NOT install/execute the return uprobe
> +        *    console warning for anything else
> +        */
> +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> +                              unsigned long *data);
> +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> +                                  struct pt_regs *regs, unsigned long *data);
> +

We should try to avoid an alternative set of callbacks, IMO. Let's
extend existing ones with `unsigned long *data`, but specify that
unless consumer sets some flag on registration that it needs a session
cookie, we'll pass NULL here? Or just allocate cookie data for each
registered consumer for simplicity, don't know; given we don't expect
many consumers on exactly the same uprobe, it might be ok to keep it
simple.


>         struct uprobe_consumer *next;
>  };
>
> @@ -85,6 +102,7 @@ struct return_instance {
>         unsigned long           func;
>         unsigned long           stack;          /* stack pointer */
>         unsigned long           orig_ret_vaddr; /* original return address */
> +       unsigned long           data;
>         bool                    chained;        /* true, if instance is nested */
>
>         struct return_instance  *next;          /* keep as stack */
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 2c83ba776fc7..17b0771272a6 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -750,12 +750,32 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>         return uprobe;
>  }
>
> -static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
> +/*
> + * Make sure all the uprobe consumers have only one type of entry
> + * callback registered (either handler or handler_session) due to
> + * different return value actions.
> + */
> +static int consumer_check(struct uprobe_consumer *curr, struct uprobe_consumer *uc)
> +{
> +       if (!curr)
> +               return 0;
> +       if (curr->handler_session || uc->handler_session)
> +               return -EBUSY;
> +       return 0;
> +}
> +
> +static int consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
> +       int err;
> +
>         down_write(&uprobe->consumer_rwsem);
> -       uc->next = uprobe->consumers;
> -       uprobe->consumers = uc;
> +       err = consumer_check(uprobe->consumers, uc);
> +       if (!err) {
> +               uc->next = uprobe->consumers;
> +               uprobe->consumers = uc;
> +       }
>         up_write(&uprobe->consumer_rwsem);
> +       return err;
>  }
>
>  /*
> @@ -1114,6 +1134,21 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
>  }
>  EXPORT_SYMBOL_GPL(uprobe_unregister);
>
> +static int check_handler(struct uprobe_consumer *uc)
> +{
> +       /* Uprobe must have at least one set consumer. */
> +       if (!uc->handler && !uc->ret_handler &&
> +           !uc->handler_session && !uc->ret_handler_session)
> +               return -1;
> +       /* Session consumer is exclusive. */
> +       if (uc->handler && uc->handler_session)
> +               return -1;
> +       /* Session consumer must have both entry and return handler. */
> +       if (!!uc->handler_session != !!uc->ret_handler_session)
> +               return -1;
> +       return 0;
> +}
> +
>  /*
>   * __uprobe_register - register a probe
>   * @inode: the file in which the probe has to be placed.
> @@ -1138,8 +1173,7 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>         struct uprobe *uprobe;
>         int ret;
>
> -       /* Uprobe must have at least one set consumer */
> -       if (!uc->handler && !uc->ret_handler)
> +       if (check_handler(uc))
>                 return -EINVAL;
>
>         /* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */
> @@ -1173,11 +1207,14 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
>         down_write(&uprobe->register_rwsem);
>         ret = -EAGAIN;
>         if (likely(uprobe_is_active(uprobe))) {
> -               consumer_add(uprobe, uc);
> +               ret = consumer_add(uprobe, uc);
> +               if (ret)
> +                       goto fail;
>                 ret = register_for_each_vma(uprobe, uc);
>                 if (ret)
>                         __uprobe_unregister(uprobe, uc);
>         }
> + fail:
>         up_write(&uprobe->register_rwsem);
>         put_uprobe(uprobe);
>
> @@ -1853,7 +1890,7 @@ static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
>         utask->return_instances = ri;
>  }
>
> -static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
> +static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs, unsigned long data)
>  {
>         struct return_instance *ri;
>         struct uprobe_task *utask;
> @@ -1909,6 +1946,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
>         ri->stack = user_stack_pointer(regs);
>         ri->orig_ret_vaddr = orig_ret_vaddr;
>         ri->chained = chained;
> +       ri->data = data;
>
>         utask->depth++;
>         ri->next = utask->return_instances;
> @@ -2070,6 +2108,7 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>         struct uprobe_consumer *uc;
>         int remove = UPROBE_HANDLER_REMOVE;
>         bool need_prep = false; /* prepare return uprobe, when needed */
> +       unsigned long data = 0;
>
>         down_read(&uprobe->register_rwsem);
>         for (uc = uprobe->consumers; uc; uc = uc->next) {
> @@ -2081,14 +2120,24 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
>                                 "bad rc=0x%x from %ps()\n", rc, uc->handler);
>                 }
>
> -               if (uc->ret_handler)
> +               if (uc->handler_session) {
> +                       rc = uc->handler_session(uc, regs, &data);
> +                       WARN(rc & ~UPROBE_HANDLER_MASK,
> +                               "bad rc=0x%x from %ps()\n", rc, uc->handler_session);
> +               }
> +
> +               if (uc->ret_handler || uc->ret_handler_session)
>                         need_prep = true;
>
>                 remove &= rc;
>         }
>
>         if (need_prep && !remove)
> -               prepare_uretprobe(uprobe, regs); /* put bp at return */
> +               prepare_uretprobe(uprobe, regs, data); /* put bp at return */
> +
> +       /* remove uprobe only for non-session consumers */
> +       if (uprobe->consumers && remove)
> +               remove &= !!uprobe->consumers->handler;
>
>         if (remove && uprobe->consumers) {
>                 WARN_ON(!uprobe_is_active(uprobe));
> @@ -2107,6 +2156,8 @@ handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
>         for (uc = uprobe->consumers; uc; uc = uc->next) {
>                 if (uc->ret_handler)
>                         uc->ret_handler(uc, ri->func, regs);
> +               if (uc->ret_handler_session)
> +                       uc->ret_handler_session(uc, ri->func, regs, &ri->data);
>         }
>         up_read(&uprobe->register_rwsem);
>  }
> --
> 2.45.1
>
Oleg Nesterov June 5, 2024, 5:56 p.m. UTC | #5
On 06/05, Andrii Nakryiko wrote:
>
> so any such
> limitations will cause problems, issue reports, investigation, etc.

Agreed...

> As one possible solution, what if we do
>
> struct return_instance {
>     ...
>     u64 session_cookies[];
> };
>
> and allocate sizeof(struct return_instance) + 8 *
> <num-of-session-consumers> and then at runtime pass
> &session_cookies[i] as data pointer to session-aware callbacks?

I too thought about this, but I guess it is not that simple.

Just for example. Suppose we have 2 session-consumers C1 and C2.
What if uprobe_unregister(C1) comes before the probed function
returns?

We need something like map_cookie_to_consumer().

> > +       /* The handler_session callback return value controls execution of
> > +        * the return uprobe and ret_handler_session callback.
> > +        *  0 on success
> > +        *  1 on failure, DO NOT install/execute the return uprobe
> > +        *    console warning for anything else
> > +        */
> > +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> > +                              unsigned long *data);
> > +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> > +                                  struct pt_regs *regs, unsigned long *data);
> > +
>
> We should try to avoid an alternative set of callbacks, IMO. Let's
> extend existing ones with `unsigned long *data`,

Oh yes, agreed.

And the comment about the return value looks confusing too. I mean, the
logic doesn't differ from the ret-code from ->handler().

"DO NOT install/execute the return uprobe" is not true if another
non-session-consumer returns 0.

Oleg.
Jiri Olsa June 5, 2024, 8:18 p.m. UTC | #6
On Wed, Jun 05, 2024 at 06:36:25PM +0200, Oleg Nesterov wrote:
> On 06/05, Oleg Nesterov wrote:
> >
> > On 06/05, Oleg Nesterov wrote:
> > >
> > > > +/*
> > > > + * Make sure all the uprobe consumers have only one type of entry
> > > > + * callback registered (either handler or handler_session) due to
> > > > + * different return value actions.
> > > > + */
> > > > +static int consumer_check(struct uprobe_consumer *curr, struct uprobe_consumer *uc)
> > > > +{
> > > > +	if (!curr)
> > > > +		return 0;
> > > > +	if (curr->handler_session || uc->handler_session)
> > > > +		return -EBUSY;
> > > > +	return 0;
> > > > +}
> > >
> > > Hmm, I don't understand this code, it doesn't match the comment...
> > >
> > > The comment says "all the uprobe consumers have only one type" but
> > > consumer_check() will always fail if the the 1st or 2nd consumer has
> > > ->handler_session != NULL ?
> > >
> > > Perhaps you meant
> > >
> > > 	if (!!curr->handler != !!uc->handler)
> > > 		return -EBUSY;
> > >
> > > ?
> >
> > OK, the changelog says
> >
> > 	Which means that there can be only single user of a uprobe (inode +
> > 	offset) when session consumer is registered to it.
> >
> > so the code is correct. But I still think the comment is misleading.
> 
> Cough... perhaps it is correct but I am still confused even we forget about
> the comment ;)
> 
> OK, uprobe can have a single consumer with ->handler_session != NULL. I guess
> this is because return_instance->data is "global".
> 
> So uprobe can have multiple handler_session == NULL consumers before
> handler_session != NULL, but not after ?

ah yea it should have done what's in the comment, so it's missing
the check for handler.. session handlers are meant to be exclusive

thanks,
jirka
Andrii Nakryiko June 5, 2024, 8:47 p.m. UTC | #7
On Wed, Jun 5, 2024 at 10:57 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 06/05, Andrii Nakryiko wrote:
> >
> > so any such
> > limitations will cause problems, issue reports, investigation, etc.
>
> Agreed...
>
> > As one possible solution, what if we do
> >
> > struct return_instance {
> >     ...
> >     u64 session_cookies[];
> > };
> >
> > and allocate sizeof(struct return_instance) + 8 *
> > <num-of-session-consumers> and then at runtime pass
> > &session_cookies[i] as data pointer to session-aware callbacks?
>
> I too thought about this, but I guess it is not that simple.
>
> Just for example. Suppose we have 2 session-consumers C1 and C2.
> What if uprobe_unregister(C1) comes before the probed function
> returns?
>
> We need something like map_cookie_to_consumer().

Fair enough. The easy way to solve this is to have


struct uprobe_session_cookie {
    int consumer_id;
    u64 cookie;
};

And add id to each new consumer when it is added to struct uprobe.
Unfortunately, it's impossible to tell when a new consumer was added
to the list (as a front item, but maybe we just change it to be
appended instead of prepending) vs when the old consumer was removed,
so in some cases we'd need to do a linear search.

But the good news is that in the common case we wouldn't need to
search and the next item in session_cookies[] array would be the one
we need.

WDYT? It's still fast, and it's simpler than the shadow stack idea, IMO.

P.S. Regardless, maybe we should change the order in which we insert
consumers to uprobe? Right now uprobe consumer added later will be
executed first, which, while not wrong, is counter-intuitive. And also
it breaks a nice natural order when we need to match it up with stuff
like session_cookies[] as described above.

>
> > > +       /* The handler_session callback return value controls execution of
> > > +        * the return uprobe and ret_handler_session callback.
> > > +        *  0 on success
> > > +        *  1 on failure, DO NOT install/execute the return uprobe
> > > +        *    console warning for anything else
> > > +        */
> > > +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> > > +                              unsigned long *data);
> > > +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> > > +                                  struct pt_regs *regs, unsigned long *data);
> > > +
> >
> > We should try to avoid an alternative set of callbacks, IMO. Let's
> > extend existing ones with `unsigned long *data`,
>
> Oh yes, agreed.
>
> And the comment about the return value looks confusing too. I mean, the
> logic doesn't differ from the ret-code from ->handler().
>
> "DO NOT install/execute the return uprobe" is not true if another
> non-session-consumer returns 0.
>
> Oleg.
>
Jiri Olsa June 5, 2024, 8:50 p.m. UTC | #8
On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote:
> On 06/05, Andrii Nakryiko wrote:
> >
> > so any such
> > limitations will cause problems, issue reports, investigation, etc.
> 
> Agreed...
> 
> > As one possible solution, what if we do
> >
> > struct return_instance {
> >     ...
> >     u64 session_cookies[];
> > };
> >
> > and allocate sizeof(struct return_instance) + 8 *
> > <num-of-session-consumers> and then at runtime pass
> > &session_cookies[i] as data pointer to session-aware callbacks?
> 
> I too thought about this, but I guess it is not that simple.
> 
> Just for example. Suppose we have 2 session-consumers C1 and C2.
> What if uprobe_unregister(C1) comes before the probed function
> returns?
> 
> We need something like map_cookie_to_consumer().

I guess we could have hash table in return_instance that gets 'consumer -> cookie' ?

return instance is freed after the consumers' return handlers are executed,
so there's no leak if some consumer gets unregistered before that

> 
> > > +       /* The handler_session callback return value controls execution of
> > > +        * the return uprobe and ret_handler_session callback.
> > > +        *  0 on success
> > > +        *  1 on failure, DO NOT install/execute the return uprobe
> > > +        *    console warning for anything else
> > > +        */
> > > +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> > > +                              unsigned long *data);
> > > +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> > > +                                  struct pt_regs *regs, unsigned long *data);
> > > +
> >
> > We should try to avoid an alternative set of callbacks, IMO. Let's
> > extend existing ones with `unsigned long *data`,
> 
> Oh yes, agreed.
> 
> And the comment about the return value looks confusing too. I mean, the
> logic doesn't differ from the ret-code from ->handler().
> 
> "DO NOT install/execute the return uprobe" is not true if another
> non-session-consumer returns 0.

well they are meant to be exclusive, so there'd be no other non-session-consumer

jirka
Oleg Nesterov June 5, 2024, 9 p.m. UTC | #9
On 06/05, Jiri Olsa wrote:
>
> > And the comment about the return value looks confusing too. I mean, the
> > logic doesn't differ from the ret-code from ->handler().
> >
> > "DO NOT install/execute the return uprobe" is not true if another
> > non-session-consumer returns 0.
>
> well they are meant to be exclusive, so there'd be no other non-session-consumer

OK. (but may be the changelog can explain more clearly why they can't
co-exist with the non-session-consumers).

But again, this doesn't differ from the the ret-code from the
non-session-consumer->handler().

If it returns 1 == UPROBE_HANDLER_REMOVE, then without other consumers
prepare_uretprobe() won't be called.

Oleg.
Jiri Olsa June 5, 2024, 9:01 p.m. UTC | #10
On Wed, Jun 05, 2024 at 10:25:56AM -0700, Andrii Nakryiko wrote:

SNIP

> > ---
> >  include/linux/uprobes.h | 18 +++++++++++
> >  kernel/events/uprobes.c | 69 +++++++++++++++++++++++++++++++++++------
> >  2 files changed, 78 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > index f46e0ca0169c..a2f2d5ac3cee 100644
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
> > @@ -34,6 +34,12 @@ enum uprobe_filter_ctx {
> >  };
> >
> >  struct uprobe_consumer {
> > +       /*
> > +        * The handler callback return value controls removal of the uprobe.
> > +        *  0 on success, uprobe stays
> > +        *  1 on failure, remove the uprobe
> > +        *    console warning for anything else
> > +        */
> >         int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
> >         int (*ret_handler)(struct uprobe_consumer *self,
> >                                 unsigned long func,
> > @@ -42,6 +48,17 @@ struct uprobe_consumer {
> >                                 enum uprobe_filter_ctx ctx,
> >                                 struct mm_struct *mm);
> >
> > +       /* The handler_session callback return value controls execution of
> > +        * the return uprobe and ret_handler_session callback.
> > +        *  0 on success
> > +        *  1 on failure, DO NOT install/execute the return uprobe
> > +        *    console warning for anything else
> > +        */
> > +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> > +                              unsigned long *data);
> > +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> > +                                  struct pt_regs *regs, unsigned long *data);
> > +
> 
> We should try to avoid an alternative set of callbacks, IMO. Let's
> extend existing ones with `unsigned long *data`, but specify that
> unless consumer sets some flag on registration that it needs a session
> cookie, we'll pass NULL here? Or just allocate cookie data for each
> registered consumer for simplicity, don't know; given we don't expect
> many consumers on exactly the same uprobe, it might be ok to keep it
> simple.
>

ah, I did not want to break existing users.. but it's not uapi,
so we're good, ok makes sense

jirka
 
> 
> >         struct uprobe_consumer *next;
> >  };
> >
> > @@ -85,6 +102,7 @@ struct return_instance {
> >         unsigned long           func;
> >         unsigned long           stack;          /* stack pointer */
> >         unsigned long           orig_ret_vaddr; /* original return address */
> > +       unsigned long           data;
> >         bool                    chained;        /* true, if instance is nested */
> >
> >         struct return_instance  *next;          /* keep as stack */

SNIP
Jiri Olsa June 5, 2024, 9:17 p.m. UTC | #11
On Wed, Jun 05, 2024 at 01:47:00PM -0700, Andrii Nakryiko wrote:
> On Wed, Jun 5, 2024 at 10:57 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 06/05, Andrii Nakryiko wrote:
> > >
> > > so any such
> > > limitations will cause problems, issue reports, investigation, etc.
> >
> > Agreed...
> >
> > > As one possible solution, what if we do
> > >
> > > struct return_instance {
> > >     ...
> > >     u64 session_cookies[];
> > > };
> > >
> > > and allocate sizeof(struct return_instance) + 8 *
> > > <num-of-session-consumers> and then at runtime pass
> > > &session_cookies[i] as data pointer to session-aware callbacks?
> >
> > I too thought about this, but I guess it is not that simple.
> >
> > Just for example. Suppose we have 2 session-consumers C1 and C2.
> > What if uprobe_unregister(C1) comes before the probed function
> > returns?
> >
> > We need something like map_cookie_to_consumer().
> 
> Fair enough. The easy way to solve this is to have
> 
> 
> struct uprobe_session_cookie {
>     int consumer_id;
>     u64 cookie;
> };
> 
> And add id to each new consumer when it is added to struct uprobe.
> Unfortunately, it's impossible to tell when a new consumer was added
> to the list (as a front item, but maybe we just change it to be
> appended instead of prepending) vs when the old consumer was removed,
> so in some cases we'd need to do a linear search.

also we probably need to add the flag if we want to execute the return
handler..  we can have multiple session handlers and if just one of them
returns 0 we need to install the return probe

and then when return probe hits, we need to execute only that consumer's
return handler

jirka

> 
> But the good news is that in the common case we wouldn't need to
> search and the next item in session_cookies[] array would be the one
> we need.
> 
> WDYT? It's still fast, and it's simpler than the shadow stack idea, IMO.
> 
> P.S. Regardless, maybe we should change the order in which we insert
> consumers to uprobe? Right now uprobe consumer added later will be
> executed first, which, while not wrong, is counter-intuitive. And also
> it breaks a nice natural order when we need to match it up with stuff
> like session_cookies[] as described above.
> 
> >
> > > > +       /* The handler_session callback return value controls execution of
> > > > +        * the return uprobe and ret_handler_session callback.
> > > > +        *  0 on success
> > > > +        *  1 on failure, DO NOT install/execute the return uprobe
> > > > +        *    console warning for anything else
> > > > +        */
> > > > +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> > > > +                              unsigned long *data);
> > > > +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> > > > +                                  struct pt_regs *regs, unsigned long *data);
> > > > +
> > >
> > > We should try to avoid an alternative set of callbacks, IMO. Let's
> > > extend existing ones with `unsigned long *data`,
> >
> > Oh yes, agreed.
> >
> > And the comment about the return value looks confusing too. I mean, the
> > logic doesn't differ from the ret-code from ->handler().
> >
> > "DO NOT install/execute the return uprobe" is not true if another
> > non-session-consumer returns 0.
> >
> > Oleg.
> >
Oleg Nesterov June 5, 2024, 9:23 p.m. UTC | #12
On 06/05, Andrii Nakryiko wrote:
>
> WDYT? It's still fast, and it's simpler than the shadow stack idea, IMO.

Andrii. I am alredy sleeping, I'll try to read your email tomorrow.
Right now I can only say that everything is simpler than the shadow stack ;)

> P.S. Regardless, maybe we should change the order in which we insert
> consumers to uprobe? Right now uprobe consumer added later will be
> executed first, which, while not wrong, is counter-intuitive.

Agreed...

Even if currently this doesn't really matter, I guess it is supposed
that uc->handler() is "non-intrusive".

Oleg.
Jiri Olsa June 6, 2024, 4:46 p.m. UTC | #13
On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote:
> On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote:
> > On 06/05, Andrii Nakryiko wrote:
> > >
> > > so any such
> > > limitations will cause problems, issue reports, investigation, etc.
> > 
> > Agreed...
> > 
> > > As one possible solution, what if we do
> > >
> > > struct return_instance {
> > >     ...
> > >     u64 session_cookies[];
> > > };
> > >
> > > and allocate sizeof(struct return_instance) + 8 *
> > > <num-of-session-consumers> and then at runtime pass
> > > &session_cookies[i] as data pointer to session-aware callbacks?
> > 
> > I too thought about this, but I guess it is not that simple.
> > 
> > Just for example. Suppose we have 2 session-consumers C1 and C2.
> > What if uprobe_unregister(C1) comes before the probed function
> > returns?
> > 
> > We need something like map_cookie_to_consumer().
> 
> I guess we could have hash table in return_instance that gets 'consumer -> cookie' ?

ok, hash table is probably too big for this.. I guess some solution that
would iterate consumers and cookies made sure it matches would be fine

jirka

> 
> return instance is freed after the consumers' return handlers are executed,
> so there's no leak if some consumer gets unregistered before that
> 
> > 
> > > > +       /* The handler_session callback return value controls execution of
> > > > +        * the return uprobe and ret_handler_session callback.
> > > > +        *  0 on success
> > > > +        *  1 on failure, DO NOT install/execute the return uprobe
> > > > +        *    console warning for anything else
> > > > +        */
> > > > +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> > > > +                              unsigned long *data);
> > > > +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> > > > +                                  struct pt_regs *regs, unsigned long *data);
> > > > +
> > >
> > > We should try to avoid an alternative set of callbacks, IMO. Let's
> > > extend existing ones with `unsigned long *data`,
> > 
> > Oh yes, agreed.
> > 
> > And the comment about the return value looks confusing too. I mean, the
> > logic doesn't differ from the ret-code from ->handler().
> > 
> > "DO NOT install/execute the return uprobe" is not true if another
> > non-session-consumer returns 0.
> 
> well they are meant to be exclusive, so there'd be no other non-session-consumer
> 
> jirka
Andrii Nakryiko June 6, 2024, 4:52 p.m. UTC | #14
On Thu, Jun 6, 2024 at 9:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote:
> > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote:
> > > On 06/05, Andrii Nakryiko wrote:
> > > >
> > > > so any such
> > > > limitations will cause problems, issue reports, investigation, etc.
> > >
> > > Agreed...
> > >
> > > > As one possible solution, what if we do
> > > >
> > > > struct return_instance {
> > > >     ...
> > > >     u64 session_cookies[];
> > > > };
> > > >
> > > > and allocate sizeof(struct return_instance) + 8 *
> > > > <num-of-session-consumers> and then at runtime pass
> > > > &session_cookies[i] as data pointer to session-aware callbacks?
> > >
> > > I too thought about this, but I guess it is not that simple.
> > >
> > > Just for example. Suppose we have 2 session-consumers C1 and C2.
> > > What if uprobe_unregister(C1) comes before the probed function
> > > returns?
> > >
> > > We need something like map_cookie_to_consumer().
> >
> > I guess we could have hash table in return_instance that gets 'consumer -> cookie' ?
>
> ok, hash table is probably too big for this.. I guess some solution that
> would iterate consumers and cookies made sure it matches would be fine
>

Yes, I was hoping to avoid hash tables for this, and in the common
case have no added overhead.

> jirka
>
> >
> > return instance is freed after the consumers' return handlers are executed,
> > so there's no leak if some consumer gets unregistered before that
> >
> > >
> > > > > +       /* The handler_session callback return value controls execution of
> > > > > +        * the return uprobe and ret_handler_session callback.
> > > > > +        *  0 on success
> > > > > +        *  1 on failure, DO NOT install/execute the return uprobe
> > > > > +        *    console warning for anything else
> > > > > +        */
> > > > > +       int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
> > > > > +                              unsigned long *data);
> > > > > +       int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
> > > > > +                                  struct pt_regs *regs, unsigned long *data);
> > > > > +
> > > >
> > > > We should try to avoid an alternative set of callbacks, IMO. Let's
> > > > extend existing ones with `unsigned long *data`,
> > >
> > > Oh yes, agreed.
> > >
> > > And the comment about the return value looks confusing too. I mean, the
> > > logic doesn't differ from the ret-code from ->handler().
> > >
> > > "DO NOT install/execute the return uprobe" is not true if another
> > > non-session-consumer returns 0.
> >
> > well they are meant to be exclusive, so there'd be no other non-session-consumer
> >
> > jirka
Jiri Olsa June 10, 2024, 11:06 a.m. UTC | #15
On Thu, Jun 06, 2024 at 09:52:39AM -0700, Andrii Nakryiko wrote:
> On Thu, Jun 6, 2024 at 9:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote:
> > > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote:
> > > > On 06/05, Andrii Nakryiko wrote:
> > > > >
> > > > > so any such
> > > > > limitations will cause problems, issue reports, investigation, etc.
> > > >
> > > > Agreed...
> > > >
> > > > > As one possible solution, what if we do
> > > > >
> > > > > struct return_instance {
> > > > >     ...
> > > > >     u64 session_cookies[];
> > > > > };
> > > > >
> > > > > and allocate sizeof(struct return_instance) + 8 *
> > > > > <num-of-session-consumers> and then at runtime pass
> > > > > &session_cookies[i] as data pointer to session-aware callbacks?
> > > >
> > > > I too thought about this, but I guess it is not that simple.
> > > >
> > > > Just for example. Suppose we have 2 session-consumers C1 and C2.
> > > > What if uprobe_unregister(C1) comes before the probed function
> > > > returns?
> > > >
> > > > We need something like map_cookie_to_consumer().
> > >
> > > I guess we could have hash table in return_instance that gets 'consumer -> cookie' ?
> >
> > ok, hash table is probably too big for this.. I guess some solution that
> > would iterate consumers and cookies made sure it matches would be fine
> >
> 
> Yes, I was hoping to avoid hash tables for this, and in the common
> case have no added overhead.

hi,
here's first stab on that.. the change below:
  - extends current handlers with extra argument rather than adding new
    set of handlers
  - store session consumers objects within return_instance object and
  - iterate these objects ^^^ in handle_uretprobe_chain

I guess it could be still polished, but I wonder if this could
be the right direction to do this.. thoughts? ;-)

thanks,
jirka


---
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..4e40e8352eac 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -34,15 +34,19 @@ enum uprobe_filter_ctx {
 };
 
 struct uprobe_consumer {
-	int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
+	int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs,
+			unsigned long *data);
 	int (*ret_handler)(struct uprobe_consumer *self,
 				unsigned long func,
-				struct pt_regs *regs);
+				struct pt_regs *regs,
+				unsigned long *data);
 	bool (*filter)(struct uprobe_consumer *self,
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
 
 	struct uprobe_consumer *next;
+	bool is_session;
+	unsigned int id;
 };
 
 #ifdef CONFIG_UPROBES
@@ -80,6 +84,12 @@ struct uprobe_task {
 	unsigned int			depth;
 };
 
+struct session_consumer {
+	long cookie;
+	unsigned int id;
+	int rc;
+};
+
 struct return_instance {
 	struct uprobe		*uprobe;
 	unsigned long		func;
@@ -88,6 +98,8 @@ struct return_instance {
 	bool			chained;	/* true, if instance is nested */
 
 	struct return_instance	*next;		/* keep as stack */
+	int			session_cnt;
+	struct session_consumer	sc[1];		/* 1 for zero item marking the end */
 };
 
 enum rp_check {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2c83ba776fc7..cbd71dc06ef0 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -63,6 +63,8 @@ struct uprobe {
 	loff_t			ref_ctr_offset;
 	unsigned long		flags;
 
+	unsigned int		session_cnt;
+
 	/*
 	 * The generic code assumes that it has two members of unknown type
 	 * owned by the arch-specific code:
@@ -750,11 +752,30 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	return uprobe;
 }
 
+static void
+uprobe_consumer_account(struct uprobe *uprobe, struct uprobe_consumer *uc)
+{
+	static unsigned int session_id;
+
+	if (uc->is_session) {
+		uprobe->session_cnt++;
+		uc->id = ++session_id ?: ++session_id;
+	}
+}
+
+static void
+uprobe_consumer_unaccount(struct uprobe *uprobe, struct uprobe_consumer *uc)
+{
+	if (uc->is_session)
+		uprobe->session_cnt--;
+}
+
 static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	down_write(&uprobe->consumer_rwsem);
 	uc->next = uprobe->consumers;
 	uprobe->consumers = uc;
+	uprobe_consumer_account(uprobe, uc);
 	up_write(&uprobe->consumer_rwsem);
 }
 
@@ -773,6 +794,7 @@ static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
 		if (*con == uc) {
 			*con = uc->next;
 			ret = true;
+			uprobe_consumer_unaccount(uprobe, uc);
 			break;
 		}
 	}
@@ -1744,6 +1766,23 @@ static struct uprobe_task *get_utask(void)
 	return current->utask;
 }
 
+static size_t ri_size(int session_cnt)
+{
+	struct return_instance *ri __maybe_unused;
+
+	return sizeof(*ri) + session_cnt * sizeof(ri->sc[0]);
+}
+
+static struct return_instance *alloc_return_instance(int session_cnt)
+{
+	struct return_instance *ri;
+
+	ri = kzalloc(ri_size(session_cnt), GFP_KERNEL);
+	if (ri)
+		ri->session_cnt = session_cnt;
+	return ri;
+}
+
 static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
 {
 	struct uprobe_task *n_utask;
@@ -1756,11 +1795,11 @@ static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
 
 	p = &n_utask->return_instances;
 	for (o = o_utask->return_instances; o; o = o->next) {
-		n = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
+		n = alloc_return_instance(o->session_cnt);
 		if (!n)
 			return -ENOMEM;
 
-		*n = *o;
+		memcpy(n, o, ri_size(o->session_cnt));
 		get_uprobe(n->uprobe);
 		n->next = NULL;
 
@@ -1853,35 +1892,38 @@ static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
 	utask->return_instances = ri;
 }
 
-static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
+static struct return_instance *
+prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs,
+		  struct return_instance *ri, int session_cnt)
 {
-	struct return_instance *ri;
 	struct uprobe_task *utask;
 	unsigned long orig_ret_vaddr, trampoline_vaddr;
 	bool chained;
 
 	if (!get_xol_area())
-		return;
+		return ri;
 
 	utask = get_utask();
 	if (!utask)
-		return;
+		return ri;
 
 	if (utask->depth >= MAX_URETPROBE_DEPTH) {
 		printk_ratelimited(KERN_INFO "uprobe: omit uretprobe due to"
 				" nestedness limit pid/tgid=%d/%d\n",
 				current->pid, current->tgid);
-		return;
+		return ri;
 	}
 
-	ri = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
-	if (!ri)
-		return;
+	if (!ri) {
+		ri = alloc_return_instance(session_cnt);
+		if (!ri)
+			return NULL;
+	}
 
 	trampoline_vaddr = get_trampoline_vaddr();
 	orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
 	if (orig_ret_vaddr == -1)
-		goto fail;
+		return ri;
 
 	/* drop the entries invalidated by longjmp() */
 	chained = (orig_ret_vaddr == trampoline_vaddr);
@@ -1899,7 +1941,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 			 * attack from user-space.
 			 */
 			uprobe_warn(current, "handle tail call");
-			goto fail;
+			return ri;
 		}
 		orig_ret_vaddr = utask->return_instances->orig_ret_vaddr;
 	}
@@ -1914,9 +1956,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 	ri->next = utask->return_instances;
 	utask->return_instances = ri;
 
-	return;
- fail:
-	kfree(ri);
+	return NULL;
 }
 
 /* Prepare to single-step probed instruction out of line. */
@@ -2069,44 +2109,90 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 {
 	struct uprobe_consumer *uc;
 	int remove = UPROBE_HANDLER_REMOVE;
+	struct session_consumer *sc = NULL;
+	struct return_instance *ri = NULL;
 	bool need_prep = false; /* prepare return uprobe, when needed */
 
 	down_read(&uprobe->register_rwsem);
-	for (uc = uprobe->consumers; uc; uc = uc->next) {
+	if (uprobe->session_cnt) {
+		ri = alloc_return_instance(uprobe->session_cnt);
+		if (!ri)
+			goto out;
+	}
+	for (uc = uprobe->consumers, sc = &ri->sc[0]; uc; uc = uc->next) {
 		int rc = 0;
 
 		if (uc->handler) {
-			rc = uc->handler(uc, regs);
+			rc = uc->handler(uc, regs, uc->is_session ? &sc->cookie : NULL);
 			WARN(rc & ~UPROBE_HANDLER_MASK,
 				"bad rc=0x%x from %ps()\n", rc, uc->handler);
 		}
 
-		if (uc->ret_handler)
+		if (uc->is_session) {
+			need_prep |= !rc;
+			remove = 0;
+			sc->id = uc->id;
+			sc->rc = rc;
+			sc++;
+		} else if (uc->ret_handler) {
 			need_prep = true;
+		}
 
 		remove &= rc;
 	}
 
 	if (need_prep && !remove)
-		prepare_uretprobe(uprobe, regs); /* put bp at return */
+		ri = prepare_uretprobe(uprobe, regs, ri, uprobe->session_cnt); /* put bp at return */
+	kfree(ri);
 
 	if (remove && uprobe->consumers) {
 		WARN_ON(!uprobe_is_active(uprobe));
 		unapply_uprobe(uprobe, current->mm);
 	}
+ out:
 	up_read(&uprobe->register_rwsem);
 }
 
+static struct session_consumer *
+consumer_find(struct session_consumer *sc, struct uprobe_consumer *uc)
+{
+	for (; sc && sc->id; sc++) {
+		if (sc->id == uc->id)
+			return sc;
+	}
+	return NULL;
+}
+
 static void
 handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
 {
 	struct uprobe *uprobe = ri->uprobe;
+	struct session_consumer *sc, *tmp;
 	struct uprobe_consumer *uc;
 
 	down_read(&uprobe->register_rwsem);
-	for (uc = uprobe->consumers; uc; uc = uc->next) {
-		if (uc->ret_handler)
-			uc->ret_handler(uc, ri->func, regs);
+	for (uc = uprobe->consumers, sc = &ri->sc[0]; uc; uc = uc->next) {
+		long *cookie = NULL;
+		int rc = 0;
+
+		if (uc->is_session) {
+			/*
+			 * session_consumers are in order with uprobe_consumers,
+			 * we just need to reflect that any uprobe_consumer could
+			 * be removed or added
+			 */
+			tmp = consumer_find(sc, uc);
+			if (tmp) {
+				rc = tmp->rc;
+				cookie = &tmp->cookie;
+				sc = tmp + 1;
+			} else {
+				rc = 1;
+			}
+		}
+
+		if (!rc && uc->ret_handler)
+			uc->ret_handler(uc, ri->func, regs, cookie);
 	}
 	up_read(&uprobe->register_rwsem);
 }
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f5154c051d2c..ae7c35379e4a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3329,7 +3329,8 @@ uprobe_multi_link_filter(struct uprobe_consumer *con, enum uprobe_filter_ctx ctx
 }
 
 static int
-uprobe_multi_link_handler(struct uprobe_consumer *con, struct pt_regs *regs)
+uprobe_multi_link_handler(struct uprobe_consumer *con, struct pt_regs *regs,
+			  unsigned long *data)
 {
 	struct bpf_uprobe *uprobe;
 
@@ -3338,7 +3339,8 @@ uprobe_multi_link_handler(struct uprobe_consumer *con, struct pt_regs *regs)
 }
 
 static int
-uprobe_multi_link_ret_handler(struct uprobe_consumer *con, unsigned long func, struct pt_regs *regs)
+uprobe_multi_link_ret_handler(struct uprobe_consumer *con, unsigned long func, struct pt_regs *regs,
+			      unsigned long *data)
 {
 	struct bpf_uprobe *uprobe;
 
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8541fa1494ae..f7b17f08344c 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -88,9 +88,11 @@ static struct trace_uprobe *to_trace_uprobe(struct dyn_event *ev)
 static int register_uprobe_event(struct trace_uprobe *tu);
 static int unregister_uprobe_event(struct trace_uprobe *tu);
 
-static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs);
+static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs,
+			     unsigned long *data);
 static int uretprobe_dispatcher(struct uprobe_consumer *con,
-				unsigned long func, struct pt_regs *regs);
+				unsigned long func, struct pt_regs *regs,
+				unsigned long *data);
 
 #ifdef CONFIG_STACK_GROWSUP
 static unsigned long adjust_stack_addr(unsigned long addr, unsigned int n)
@@ -1500,7 +1502,8 @@ trace_uprobe_register(struct trace_event_call *event, enum trace_reg type,
 	}
 }
 
-static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
+static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs,
+			     unsigned long *data)
 {
 	struct trace_uprobe *tu;
 	struct uprobe_dispatch_data udd;
@@ -1530,7 +1533,8 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 }
 
 static int uretprobe_dispatcher(struct uprobe_consumer *con,
-				unsigned long func, struct pt_regs *regs)
+				unsigned long func, struct pt_regs *regs,
+				unsigned long *data)
 {
 	struct trace_uprobe *tu;
 	struct uprobe_dispatch_data udd;
Andrii Nakryiko June 17, 2024, 10:53 p.m. UTC | #16
On Mon, Jun 10, 2024 at 4:06 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Jun 06, 2024 at 09:52:39AM -0700, Andrii Nakryiko wrote:
> > On Thu, Jun 6, 2024 at 9:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote:
> > > > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote:
> > > > > On 06/05, Andrii Nakryiko wrote:
> > > > > >
> > > > > > so any such
> > > > > > limitations will cause problems, issue reports, investigation, etc.
> > > > >
> > > > > Agreed...
> > > > >
> > > > > > As one possible solution, what if we do
> > > > > >
> > > > > > struct return_instance {
> > > > > >     ...
> > > > > >     u64 session_cookies[];
> > > > > > };
> > > > > >
> > > > > > and allocate sizeof(struct return_instance) + 8 *
> > > > > > <num-of-session-consumers> and then at runtime pass
> > > > > > &session_cookies[i] as data pointer to session-aware callbacks?
> > > > >
> > > > > I too thought about this, but I guess it is not that simple.
> > > > >
> > > > > Just for example. Suppose we have 2 session-consumers C1 and C2.
> > > > > What if uprobe_unregister(C1) comes before the probed function
> > > > > returns?
> > > > >
> > > > > We need something like map_cookie_to_consumer().
> > > >
> > > > I guess we could have hash table in return_instance that gets 'consumer -> cookie' ?
> > >
> > > ok, hash table is probably too big for this.. I guess some solution that
> > > would iterate consumers and cookies made sure it matches would be fine
> > >
> >
> > Yes, I was hoping to avoid hash tables for this, and in the common
> > case have no added overhead.
>
> hi,
> here's first stab on that.. the change below:
>   - extends current handlers with extra argument rather than adding new
>     set of handlers
>   - store session consumers objects within return_instance object and
>   - iterate these objects ^^^ in handle_uretprobe_chain
>
> I guess it could be still polished, but I wonder if this could
> be the right direction to do this.. thoughts? ;-)

Yeah, I think this is the right direction. It's a bit sad that this
makes getting rid of rw_sem on hot path even harder, but that's a
separate problem.

>
> thanks,
> jirka
>
>
> ---
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index f46e0ca0169c..4e40e8352eac 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -34,15 +34,19 @@ enum uprobe_filter_ctx {
>  };
>
>  struct uprobe_consumer {
> -       int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
> +       int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs,
> +                       unsigned long *data);

can we use __u64 here? This long vs __u64 might cause problems for BPF
when the host is 32-bit architecture (BPF is always 64-bit).

>         int (*ret_handler)(struct uprobe_consumer *self,
>                                 unsigned long func,
> -                               struct pt_regs *regs);
> +                               struct pt_regs *regs,
> +                               unsigned long *data);
>         bool (*filter)(struct uprobe_consumer *self,
>                                 enum uprobe_filter_ctx ctx,
>                                 struct mm_struct *mm);
>

[...]

>  static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
>  {
>         struct uprobe_task *n_utask;
> @@ -1756,11 +1795,11 @@ static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
>
>         p = &n_utask->return_instances;
>         for (o = o_utask->return_instances; o; o = o->next) {
> -               n = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
> +               n = alloc_return_instance(o->session_cnt);
>                 if (!n)
>                         return -ENOMEM;
>
> -               *n = *o;
> +               memcpy(n, o, ri_size(o->session_cnt));
>                 get_uprobe(n->uprobe);
>                 n->next = NULL;
>
> @@ -1853,35 +1892,38 @@ static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
>         utask->return_instances = ri;
>  }
>
> -static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
> +static struct return_instance *
> +prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs,
> +                 struct return_instance *ri, int session_cnt)

you have struct uprobe, why do you need to pass session_cnt? Also,
given return_instance is cached, it seems more natural to have

struct return_instance **ri as in/out parameter, and keep the function
itself as void

>  {
> -       struct return_instance *ri;
>         struct uprobe_task *utask;
>         unsigned long orig_ret_vaddr, trampoline_vaddr;
>         bool chained;
>

[...]

>         if (need_prep && !remove)
> -               prepare_uretprobe(uprobe, regs); /* put bp at return */
> +               ri = prepare_uretprobe(uprobe, regs, ri, uprobe->session_cnt); /* put bp at return */
> +       kfree(ri);
>
>         if (remove && uprobe->consumers) {
>                 WARN_ON(!uprobe_is_active(uprobe));
>                 unapply_uprobe(uprobe, current->mm);
>         }
> + out:
>         up_read(&uprobe->register_rwsem);
>  }
>
> +static struct session_consumer *
> +consumer_find(struct session_consumer *sc, struct uprobe_consumer *uc)

why can't we keep track of remaining number of session_consumer items
instead of using entire extra entry as a terminating element? Seems
wasteful and unnecessary.

> +{
> +       for (; sc && sc->id; sc++) {
> +               if (sc->id == uc->id)
> +                       return sc;
> +       }
> +       return NULL;
> +}
> +

[...]
Jiri Olsa June 19, 2024, 6:48 p.m. UTC | #17
On Mon, Jun 17, 2024 at 03:53:50PM -0700, Andrii Nakryiko wrote:
> On Mon, Jun 10, 2024 at 4:06 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Jun 06, 2024 at 09:52:39AM -0700, Andrii Nakryiko wrote:
> > > On Thu, Jun 6, 2024 at 9:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Wed, Jun 05, 2024 at 10:50:11PM +0200, Jiri Olsa wrote:
> > > > > On Wed, Jun 05, 2024 at 07:56:19PM +0200, Oleg Nesterov wrote:
> > > > > > On 06/05, Andrii Nakryiko wrote:
> > > > > > >
> > > > > > > so any such
> > > > > > > limitations will cause problems, issue reports, investigation, etc.
> > > > > >
> > > > > > Agreed...
> > > > > >
> > > > > > > As one possible solution, what if we do
> > > > > > >
> > > > > > > struct return_instance {
> > > > > > >     ...
> > > > > > >     u64 session_cookies[];
> > > > > > > };
> > > > > > >
> > > > > > > and allocate sizeof(struct return_instance) + 8 *
> > > > > > > <num-of-session-consumers> and then at runtime pass
> > > > > > > &session_cookies[i] as data pointer to session-aware callbacks?
> > > > > >
> > > > > > I too thought about this, but I guess it is not that simple.
> > > > > >
> > > > > > Just for example. Suppose we have 2 session-consumers C1 and C2.
> > > > > > What if uprobe_unregister(C1) comes before the probed function
> > > > > > returns?
> > > > > >
> > > > > > We need something like map_cookie_to_consumer().
> > > > >
> > > > > I guess we could have hash table in return_instance that gets 'consumer -> cookie' ?
> > > >
> > > > ok, hash table is probably too big for this.. I guess some solution that
> > > > would iterate consumers and cookies made sure it matches would be fine
> > > >
> > >
> > > Yes, I was hoping to avoid hash tables for this, and in the common
> > > case have no added overhead.
> >
> > hi,
> > here's first stab on that.. the change below:
> >   - extends current handlers with extra argument rather than adding new
> >     set of handlers
> >   - store session consumers objects within return_instance object and
> >   - iterate these objects ^^^ in handle_uretprobe_chain
> >
> > I guess it could be still polished, but I wonder if this could
> > be the right direction to do this.. thoughts? ;-)
> 
> Yeah, I think this is the right direction. It's a bit sad that this
> makes getting rid of rw_sem on hot path even harder, but that's a
> separate problem.
> 
> >
> > thanks,
> > jirka
> >
> >
> > ---
> > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > index f46e0ca0169c..4e40e8352eac 100644
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
> > @@ -34,15 +34,19 @@ enum uprobe_filter_ctx {
> >  };
> >
> >  struct uprobe_consumer {
> > -       int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
> > +       int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs,
> > +                       unsigned long *data);
> 
> can we use __u64 here? This long vs __u64 might cause problems for BPF
> when the host is 32-bit architecture (BPF is always 64-bit).

ok

> 
> >         int (*ret_handler)(struct uprobe_consumer *self,
> >                                 unsigned long func,
> > -                               struct pt_regs *regs);
> > +                               struct pt_regs *regs,
> > +                               unsigned long *data);
> >         bool (*filter)(struct uprobe_consumer *self,
> >                                 enum uprobe_filter_ctx ctx,
> >                                 struct mm_struct *mm);
> >
> 
> [...]
> 
> >  static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
> >  {
> >         struct uprobe_task *n_utask;
> > @@ -1756,11 +1795,11 @@ static int dup_utask(struct task_struct *t, struct uprobe_task *o_utask)
> >
> >         p = &n_utask->return_instances;
> >         for (o = o_utask->return_instances; o; o = o->next) {
> > -               n = kmalloc(sizeof(struct return_instance), GFP_KERNEL);
> > +               n = alloc_return_instance(o->session_cnt);
> >                 if (!n)
> >                         return -ENOMEM;
> >
> > -               *n = *o;
> > +               memcpy(n, o, ri_size(o->session_cnt));
> >                 get_uprobe(n->uprobe);
> >                 n->next = NULL;
> >
> > @@ -1853,35 +1892,38 @@ static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
> >         utask->return_instances = ri;
> >  }
> >
> > -static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
> > +static struct return_instance *
> > +prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs,
> > +                 struct return_instance *ri, int session_cnt)
> 
> you have struct uprobe, why do you need to pass session_cnt? Also,
> given return_instance is cached, it seems more natural to have
> 
> struct return_instance **ri as in/out parameter, and keep the function
> itself as void

I tried that, but now I think it'd be better if we just let prepare_uretprobe
to allocate (if needed) and free ri in case it fails and do something like:

       if (need_prep && !remove)
               prepare_uretprobe(uprobe, regs, ri); /* put bp at return */
       else
               kfree(ri);

> 
> >  {
> > -       struct return_instance *ri;
> >         struct uprobe_task *utask;
> >         unsigned long orig_ret_vaddr, trampoline_vaddr;
> >         bool chained;
> >
> 
> [...]
> 
> >         if (need_prep && !remove)
> > -               prepare_uretprobe(uprobe, regs); /* put bp at return */
> > +               ri = prepare_uretprobe(uprobe, regs, ri, uprobe->session_cnt); /* put bp at return */
> > +       kfree(ri);
> >
> >         if (remove && uprobe->consumers) {
> >                 WARN_ON(!uprobe_is_active(uprobe));
> >                 unapply_uprobe(uprobe, current->mm);
> >         }
> > + out:
> >         up_read(&uprobe->register_rwsem);
> >  }
> >
> > +static struct session_consumer *
> > +consumer_find(struct session_consumer *sc, struct uprobe_consumer *uc)
> 
> why can't we keep track of remaining number of session_consumer items
> instead of using entire extra entry as a terminating element? Seems
> wasteful and unnecessary.

ok I think it's possible, will try that

thanks,
jirka
diff mbox series

Patch

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..a2f2d5ac3cee 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -34,6 +34,12 @@  enum uprobe_filter_ctx {
 };
 
 struct uprobe_consumer {
+	/*
+	 * The handler callback return value controls removal of the uprobe.
+	 *  0 on success, uprobe stays
+	 *  1 on failure, remove the uprobe
+	 *    console warning for anything else
+	 */
 	int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
 	int (*ret_handler)(struct uprobe_consumer *self,
 				unsigned long func,
@@ -42,6 +48,17 @@  struct uprobe_consumer {
 				enum uprobe_filter_ctx ctx,
 				struct mm_struct *mm);
 
+	/* The handler_session callback return value controls execution of
+	 * the return uprobe and ret_handler_session callback.
+	 *  0 on success
+	 *  1 on failure, DO NOT install/execute the return uprobe
+	 *    console warning for anything else
+	 */
+	int (*handler_session)(struct uprobe_consumer *self, struct pt_regs *regs,
+			       unsigned long *data);
+	int (*ret_handler_session)(struct uprobe_consumer *self, unsigned long func,
+				   struct pt_regs *regs, unsigned long *data);
+
 	struct uprobe_consumer *next;
 };
 
@@ -85,6 +102,7 @@  struct return_instance {
 	unsigned long		func;
 	unsigned long		stack;		/* stack pointer */
 	unsigned long		orig_ret_vaddr; /* original return address */
+	unsigned long		data;
 	bool			chained;	/* true, if instance is nested */
 
 	struct return_instance	*next;		/* keep as stack */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2c83ba776fc7..17b0771272a6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -750,12 +750,32 @@  static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 	return uprobe;
 }
 
-static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
+/*
+ * Make sure all the uprobe consumers have only one type of entry
+ * callback registered (either handler or handler_session) due to
+ * different return value actions.
+ */
+static int consumer_check(struct uprobe_consumer *curr, struct uprobe_consumer *uc)
+{
+	if (!curr)
+		return 0;
+	if (curr->handler_session || uc->handler_session)
+		return -EBUSY;
+	return 0;
+}
+
+static int consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
+	int err;
+
 	down_write(&uprobe->consumer_rwsem);
-	uc->next = uprobe->consumers;
-	uprobe->consumers = uc;
+	err = consumer_check(uprobe->consumers, uc);
+	if (!err) {
+		uc->next = uprobe->consumers;
+		uprobe->consumers = uc;
+	}
 	up_write(&uprobe->consumer_rwsem);
+	return err;
 }
 
 /*
@@ -1114,6 +1134,21 @@  void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
+static int check_handler(struct uprobe_consumer *uc)
+{
+	/* Uprobe must have at least one set consumer. */
+	if (!uc->handler && !uc->ret_handler &&
+	    !uc->handler_session && !uc->ret_handler_session)
+		return -1;
+	/* Session consumer is exclusive. */
+	if (uc->handler && uc->handler_session)
+		return -1;
+	/* Session consumer must have both entry and return handler. */
+	if (!!uc->handler_session != !!uc->ret_handler_session)
+		return -1;
+	return 0;
+}
+
 /*
  * __uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
@@ -1138,8 +1173,7 @@  static int __uprobe_register(struct inode *inode, loff_t offset,
 	struct uprobe *uprobe;
 	int ret;
 
-	/* Uprobe must have at least one set consumer */
-	if (!uc->handler && !uc->ret_handler)
+	if (check_handler(uc))
 		return -EINVAL;
 
 	/* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */
@@ -1173,11 +1207,14 @@  static int __uprobe_register(struct inode *inode, loff_t offset,
 	down_write(&uprobe->register_rwsem);
 	ret = -EAGAIN;
 	if (likely(uprobe_is_active(uprobe))) {
-		consumer_add(uprobe, uc);
+		ret = consumer_add(uprobe, uc);
+		if (ret)
+			goto fail;
 		ret = register_for_each_vma(uprobe, uc);
 		if (ret)
 			__uprobe_unregister(uprobe, uc);
 	}
+ fail:
 	up_write(&uprobe->register_rwsem);
 	put_uprobe(uprobe);
 
@@ -1853,7 +1890,7 @@  static void cleanup_return_instances(struct uprobe_task *utask, bool chained,
 	utask->return_instances = ri;
 }
 
-static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
+static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs, unsigned long data)
 {
 	struct return_instance *ri;
 	struct uprobe_task *utask;
@@ -1909,6 +1946,7 @@  static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
 	ri->stack = user_stack_pointer(regs);
 	ri->orig_ret_vaddr = orig_ret_vaddr;
 	ri->chained = chained;
+	ri->data = data;
 
 	utask->depth++;
 	ri->next = utask->return_instances;
@@ -2070,6 +2108,7 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 	struct uprobe_consumer *uc;
 	int remove = UPROBE_HANDLER_REMOVE;
 	bool need_prep = false; /* prepare return uprobe, when needed */
+	unsigned long data = 0;
 
 	down_read(&uprobe->register_rwsem);
 	for (uc = uprobe->consumers; uc; uc = uc->next) {
@@ -2081,14 +2120,24 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 				"bad rc=0x%x from %ps()\n", rc, uc->handler);
 		}
 
-		if (uc->ret_handler)
+		if (uc->handler_session) {
+			rc = uc->handler_session(uc, regs, &data);
+			WARN(rc & ~UPROBE_HANDLER_MASK,
+				"bad rc=0x%x from %ps()\n", rc, uc->handler_session);
+		}
+
+		if (uc->ret_handler || uc->ret_handler_session)
 			need_prep = true;
 
 		remove &= rc;
 	}
 
 	if (need_prep && !remove)
-		prepare_uretprobe(uprobe, regs); /* put bp at return */
+		prepare_uretprobe(uprobe, regs, data); /* put bp at return */
+
+	/* remove uprobe only for non-session consumers */
+	if (uprobe->consumers && remove)
+		remove &= !!uprobe->consumers->handler;
 
 	if (remove && uprobe->consumers) {
 		WARN_ON(!uprobe_is_active(uprobe));
@@ -2107,6 +2156,8 @@  handle_uretprobe_chain(struct return_instance *ri, struct pt_regs *regs)
 	for (uc = uprobe->consumers; uc; uc = uc->next) {
 		if (uc->ret_handler)
 			uc->ret_handler(uc, ri->func, regs);
+		if (uc->ret_handler_session)
+			uc->ret_handler_session(uc, ri->func, regs, &ri->data);
 	}
 	up_read(&uprobe->register_rwsem);
 }