diff mbox series

[v3,07/11] perf: Add breakpoint information to siginfo on SIGTRAP

Message ID 20210324112503.623833-8-elver@google.com (mailing list archive)
State New
Series Add support for synchronous signals on perf events

Commit Message

Marco Elver March 24, 2021, 11:24 a.m. UTC
Encode information from breakpoint attributes into siginfo_t, which
helps disambiguate which breakpoint fired.

Note, providing the event fd may be unreliable, since the event may have
been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
triggering and the signal being delivered to user space.

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Add comment about si_perf==0.
---
 kernel/events/core.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Peter Zijlstra March 24, 2021, 12:53 p.m. UTC | #1
On Wed, Mar 24, 2021 at 12:24:59PM +0100, Marco Elver wrote:
> Encode information from breakpoint attributes into siginfo_t, which
> helps disambiguate which breakpoint fired.
> 
> Note, providing the event fd may be unreliable, since the event may have
> been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> triggering and the signal being delivered to user space.
> 
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * Add comment about si_perf==0.
> ---
>  kernel/events/core.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1e4c949bf75f..0316d39e8c8f 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6399,6 +6399,22 @@ static void perf_sigtrap(struct perf_event *event)
>  	info.si_signo = SIGTRAP;
>  	info.si_code = TRAP_PERF;
>  	info.si_errno = event->attr.type;
> +
> +	switch (event->attr.type) {
> +	case PERF_TYPE_BREAKPOINT:
> +		info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> +		info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;

Ahh, here's the si_perf user. It wasn't really clear to me what was
supposed to be in that field at patch #5 where it was introduced.

Would it perhaps make sense to put the user address of struct
perf_event_attr in there instead? (Obviously we'd have to carry it from
the syscall to here, but it might be more useful than a random encoding
of some bits therefrom).

Then we can also clearly document what's in that field, and it might be
more useful for possible other uses.
Peter Zijlstra March 24, 2021, 1:01 p.m. UTC | #2
On Wed, Mar 24, 2021 at 01:53:48PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 24, 2021 at 12:24:59PM +0100, Marco Elver wrote:
> > Encode information from breakpoint attributes into siginfo_t, which
> > helps disambiguate which breakpoint fired.
> > 
> > Note, providing the event fd may be unreliable, since the event may have
> > been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> > triggering and the signal being delivered to user space.
> > 
> > Signed-off-by: Marco Elver <elver@google.com>
> > ---
> > v2:
> > * Add comment about si_perf==0.
> > ---
> >  kernel/events/core.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 1e4c949bf75f..0316d39e8c8f 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6399,6 +6399,22 @@ static void perf_sigtrap(struct perf_event *event)
> >  	info.si_signo = SIGTRAP;
> >  	info.si_code = TRAP_PERF;
> >  	info.si_errno = event->attr.type;
> > +
> > +	switch (event->attr.type) {
> > +	case PERF_TYPE_BREAKPOINT:
> > +		info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> > +		info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
> 
> Ahh, here's the si_perf user. It wasn't really clear to me what was
> supposed to be in that field at patch #5 where it was introduced.
> 
> Would it perhaps make sense to put the user address of struct
> perf_event_attr in there instead? (Obviously we'd have to carry it from
> the syscall to here, but it might be more useful than a random encoding
> of some bits therefrom).
> 
> Then we can also clearly document what's in that field, and it might be
> more useful for possible other uses.

Something like so...

---

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -778,6 +778,8 @@ struct perf_event {
 	void *security;
 #endif
 	struct list_head		sb_list;
+
+	struct perf_event_attr		__user *uattr;
 #endif /* CONFIG_PERF_EVENTS */
 };
 
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5652,13 +5652,17 @@ static long _perf_ioctl(struct perf_even
 		return perf_event_query_prog_array(event, (void __user *)arg);
 
 	case PERF_EVENT_IOC_MODIFY_ATTRIBUTES: {
+		struct perf_event_attr __user *uattr;
 		struct perf_event_attr new_attr;
-		int err = perf_copy_attr((struct perf_event_attr __user *)arg,
-					 &new_attr);
+		int err;
 
+		uattr = (struct perf_event_attr __user *)arg;
+		err = perf_copy_attr(uattr, &new_attr);
 		if (err)
 			return err;
 
+		event->uattr = uattr;
+
 		return perf_event_modify_attr(event,  &new_attr);
 	}
 	default:
@@ -6400,6 +6404,8 @@ static void perf_sigtrap(struct perf_eve
 	info.si_signo = SIGTRAP;
 	info.si_code = TRAP_PERF;
 	info.si_errno = event->attr.type;
+	info.si_perf = (unsigned long)event->uattr;
+
 	force_sig_info(&info);
 }
 
@@ -12011,6 +12017,8 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_task;
 	}
 
+	event->uattr = attr_uptr;
+
 	if (is_sampling_event(event)) {
 		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
 			err = -EOPNOTSUPP;
Peter Zijlstra March 24, 2021, 1:21 p.m. UTC | #3
On Wed, Mar 24, 2021 at 02:01:56PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 24, 2021 at 01:53:48PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 24, 2021 at 12:24:59PM +0100, Marco Elver wrote:
> > > Encode information from breakpoint attributes into siginfo_t, which
> > > helps disambiguate which breakpoint fired.
> > > 
> > > Note, providing the event fd may be unreliable, since the event may have
> > > been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> > > triggering and the signal being delivered to user space.
> > > 
> > > Signed-off-by: Marco Elver <elver@google.com>
> > > ---
> > > v2:
> > > * Add comment about si_perf==0.
> > > ---
> > >  kernel/events/core.c | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > > 
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 1e4c949bf75f..0316d39e8c8f 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -6399,6 +6399,22 @@ static void perf_sigtrap(struct perf_event *event)
> > >  	info.si_signo = SIGTRAP;
> > >  	info.si_code = TRAP_PERF;
> > >  	info.si_errno = event->attr.type;
> > > +
> > > +	switch (event->attr.type) {
> > > +	case PERF_TYPE_BREAKPOINT:
> > > +		info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> > > +		info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
> > 
> > Ahh, here's the si_perf user. It wasn't really clear to me what was
> > supposed to be in that field at patch #5 where it was introduced.
> > 
> > Would it perhaps make sense to put the user address of struct
> > perf_event_attr in there instead? (Obviously we'd have to carry it from
> > the syscall to here, but it might be more useful than a random encoding
> > of some bits therefrom).
> > 
> > Then we can also clearly document what's in that field, and it might be
> > more useful for possible other uses.
> 
> Something like so...

Ok possibly something like so, which also gets the data address right
for more cases.

---
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -778,6 +778,8 @@ struct perf_event {
 	void *security;
 #endif
 	struct list_head		sb_list;
+
+	struct kernel_siginfo 		siginfo;
 #endif /* CONFIG_PERF_EVENTS */
 };
 
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5652,13 +5652,17 @@ static long _perf_ioctl(struct perf_even
 		return perf_event_query_prog_array(event, (void __user *)arg);
 
 	case PERF_EVENT_IOC_MODIFY_ATTRIBUTES: {
+		struct perf_event_attr __user *uattr;
 		struct perf_event_attr new_attr;
-		int err = perf_copy_attr((struct perf_event_attr __user *)arg,
-					 &new_attr);
+		int err;
 
+		uattr = (struct perf_event_attr __user *)arg;
+		err = perf_copy_attr(uattr, &new_attr);
 		if (err)
 			return err;
 
+		event->siginfo.si_perf = (unsigned long)uattr;
+
 		return perf_event_modify_attr(event,  &new_attr);
 	}
 	default:
@@ -6394,13 +6398,7 @@ void perf_event_wakeup(struct perf_event
 
 static void perf_sigtrap(struct perf_event *event)
 {
-	struct kernel_siginfo info;
-
-	clear_siginfo(&info);
-	info.si_signo = SIGTRAP;
-	info.si_code = TRAP_PERF;
-	info.si_errno = event->attr.type;
-	force_sig_info(&info);
+	force_sig_info(&event->siginfo);
 }
 
 static void perf_pending_event_disable(struct perf_event *event)
@@ -6414,8 +6412,8 @@ static void perf_pending_event_disable(s
 		WRITE_ONCE(event->pending_disable, -1);
 
 		if (event->attr.sigtrap) {
-			atomic_set(&event->event_limit, 1); /* rearm event */
 			perf_sigtrap(event);
+			atomic_set_release(&event->event_limit, 1); /* rearm event */
 			return;
 		}
 
@@ -9121,6 +9119,7 @@ static int __perf_event_overflow(struct
 	if (events && atomic_dec_and_test(&event->event_limit)) {
 		ret = 1;
 		event->pending_kill = POLL_HUP;
+		event->siginfo.si_addr = (void *)data->addr;
 
 		perf_event_disable_inatomic(event);
 	}
@@ -12011,6 +12010,11 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_task;
 	}
 
+	clear_siginfo(&event->siginfo);
+	event->siginfo.si_signo = SIGTRAP;
+	event->siginfo.si_code = TRAP_PERF;
+	event->siginfo.si_perf = (unsigned long)attr_uptr;
+
 	if (is_sampling_event(event)) {
 		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
 			err = -EOPNOTSUPP;
Peter Zijlstra March 24, 2021, 1:43 p.m. UTC | #4
On Wed, Mar 24, 2021 at 02:21:37PM +0100, Peter Zijlstra wrote:
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5652,13 +5652,17 @@ static long _perf_ioctl(struct perf_even
>  		return perf_event_query_prog_array(event, (void __user *)arg);
>  
>  	case PERF_EVENT_IOC_MODIFY_ATTRIBUTES: {
> +		struct perf_event_attr __user *uattr;
>  		struct perf_event_attr new_attr;
> -		int err = perf_copy_attr((struct perf_event_attr __user *)arg,
> -					 &new_attr);
> +		int err;
>  
> +		uattr = (struct perf_event_attr __user *)arg;
> +		err = perf_copy_attr(uattr, &new_attr);
>  		if (err)
>  			return err;
>  
> +		event->siginfo.si_perf = (unsigned long)uattr;

Oh bugger; that wants updating for all children too..

> +
>  		return perf_event_modify_attr(event,  &new_attr);
>  	}
>  	default:
> @@ -12011,6 +12010,11 @@ SYSCALL_DEFINE5(perf_event_open,
>  		goto err_task;
>  	}
>  
> +	clear_siginfo(&event->siginfo);
> +	event->siginfo.si_signo = SIGTRAP;
> +	event->siginfo.si_code = TRAP_PERF;
> +	event->siginfo.si_perf = (unsigned long)attr_uptr;

And inherit_event() / perf_event_alloc() want to copy/propagate that.

>  	if (is_sampling_event(event)) {
>  		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
>  			err = -EOPNOTSUPP;
Marco Elver March 24, 2021, 1:47 p.m. UTC | #5
On Wed, 24 Mar 2021 at 14:21, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 24, 2021 at 02:01:56PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 24, 2021 at 01:53:48PM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 24, 2021 at 12:24:59PM +0100, Marco Elver wrote:
> > > > Encode information from breakpoint attributes into siginfo_t, which
> > > > helps disambiguate which breakpoint fired.
> > > >
> > > > Note, providing the event fd may be unreliable, since the event may have
> > > > been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> > > > triggering and the signal being delivered to user space.
> > > >
> > > > Signed-off-by: Marco Elver <elver@google.com>
> > > > ---
> > > > v2:
> > > > * Add comment about si_perf==0.
> > > > ---
> > > >  kernel/events/core.c | 16 ++++++++++++++++
> > > >  1 file changed, 16 insertions(+)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index 1e4c949bf75f..0316d39e8c8f 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -6399,6 +6399,22 @@ static void perf_sigtrap(struct perf_event *event)
> > > >   info.si_signo = SIGTRAP;
> > > >   info.si_code = TRAP_PERF;
> > > >   info.si_errno = event->attr.type;
> > > > +
> > > > + switch (event->attr.type) {
> > > > + case PERF_TYPE_BREAKPOINT:
> > > > +         info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> > > > +         info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
> > >
> > > Ahh, here's the si_perf user. It wasn't really clear to me what was
> > > supposed to be in that field at patch #5 where it was introduced.
> > >
> > > Would it perhaps make sense to put the user address of struct
> > > perf_event_attr in there instead? (Obviously we'd have to carry it from
> > > the syscall to here, but it might be more useful than a random encoding
> > > of some bits therefrom).
> > >
> > > Then we can also clearly document what's in that field, and it might be
> > > more useful for possible other uses.
> >
> > Something like so...
>
> Ok possibly something like so, which also gets the data address right
> for more cases.

It'd be nice if this could work. Though I think there's an inherent
problem (same as with fd) with trying to pass a reference back to the
user, while the user can concurrently modify that reference.

Let's assume that user space creates new copies of perf_event_attr for
every version they want, there's still a race where the user modifies
an event, and concurrently in another thread a signal arrives. I
currently don't see a way to determine when it's safe to free a
perf_event_attr or reuse, without there still being a chance that a
signal arrives due to some old perf_event_attr. And for our usecase,
we really need to know a precise subset out of attr that triggered the
event.

So the safest thing I can see is to stash a copy of the relevant
information in siginfo, which is how we ended up with encoding bits
from perf_event_attr into si_perf.

One way around this I could see is that we know that there's a limited
number of combinations of attrs, and the user just creates an instance
for every version they want (and hope it doesn't exceed some large
number). Of course, for breakpoints, we have bp_addr, but let's assume
that si_addr has the right version, so we won't need to access
perf_event_attr::bp_addr.

But given the additional complexities, I'm not sure it's worth it. Is
there a way to solve the modify-signal-race problem in a nicer way?

Thanks,
-- Marco
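
To make the modify-signal race concrete, here is a minimal user-space sketch. It is not kernel code: struct fake_attr, struct fake_siginfo, event_fires() and modify_attr() are hypothetical stand-ins invented for illustration. It contrasts a signal that carries a reference to user-owned attr storage with one that carries a copy of the relevant bits taken when the event fires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few perf_event_attr fields that matter here. */
struct fake_attr {
	uint32_t bp_type;
	uint64_t bp_len;
};

/* What a signal could carry under each approach (illustration only). */
struct fake_siginfo {
	const struct fake_attr *by_ref;	/* user pointer, as in the uattr idea */
	struct fake_attr by_value;	/* bits copied when the event fires */
};

/* Event fires: record both a reference to and a copy of the current attr. */
static struct fake_siginfo event_fires(const struct fake_attr *attr)
{
	assert(attr != NULL);
	struct fake_siginfo info = { .by_ref = attr, .by_value = *attr };
	return info;
}

/* Stand-in for PERF_EVENT_IOC_MODIFY_ATTRIBUTES reusing the same storage. */
static void modify_attr(struct fake_attr *attr, uint32_t type, uint64_t len)
{
	attr->bp_type = type;
	attr->bp_len = len;
}
```

If modify_attr() runs between event_fires() and the point where the handler inspects the result, by_ref shows the new configuration while by_value still describes the configuration that actually triggered the event.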
Peter Zijlstra March 24, 2021, 2 p.m. UTC | #6
One last try, I'll leave it alone now, I promise :-)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -778,6 +778,9 @@ struct perf_event {
 	void *security;
 #endif
 	struct list_head		sb_list;
+
+	unsigned long			si_uattr;
+	unsigned long			si_data;
 #endif /* CONFIG_PERF_EVENTS */
 };
 
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5652,13 +5652,17 @@ static long _perf_ioctl(struct perf_even
 		return perf_event_query_prog_array(event, (void __user *)arg);
 
 	case PERF_EVENT_IOC_MODIFY_ATTRIBUTES: {
+		struct perf_event_attr __user *uattr;
 		struct perf_event_attr new_attr;
-		int err = perf_copy_attr((struct perf_event_attr __user *)arg,
-					 &new_attr);
+		int err;
 
+		uattr = (struct perf_event_attr __user *)arg;
+		err = perf_copy_attr(uattr, &new_attr);
 		if (err)
 			return err;
 
+		event->si_uattr = (unsigned long)uattr;
+
 		return perf_event_modify_attr(event,  &new_attr);
 	}
 	default:
@@ -6399,7 +6403,12 @@ static void perf_sigtrap(struct perf_eve
 	clear_siginfo(&info);
 	info.si_signo = SIGTRAP;
 	info.si_code = TRAP_PERF;
-	info.si_errno = event->attr.type;
+	info.si_addr = (void *)event->si_data;
+
+	info.si_perf = event->si_uattr;
+	if (event->parent)
+		info.si_perf = event->parent->si_uattr;
+
 	force_sig_info(&info);
 }
 
@@ -6414,8 +6423,8 @@ static void perf_pending_event_disable(s
 		WRITE_ONCE(event->pending_disable, -1);
 
 		if (event->attr.sigtrap) {
-			atomic_set(&event->event_limit, 1); /* rearm event */
 			perf_sigtrap(event);
+			atomic_set_release(&event->event_limit, 1); /* rearm event */
 			return;
 		}
 
@@ -9121,6 +9130,7 @@ static int __perf_event_overflow(struct
 	if (events && atomic_dec_and_test(&event->event_limit)) {
 		ret = 1;
 		event->pending_kill = POLL_HUP;
+		event->si_data = data->addr;
 
 		perf_event_disable_inatomic(event);
 	}
@@ -12011,6 +12021,8 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_task;
 	}
 
+	event->si_uattr = (unsigned long)attr_uptr;
+
 	if (is_sampling_event(event)) {
 		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
 			err = -EOPNOTSUPP;
Marco Elver March 24, 2021, 2:05 p.m. UTC | #7
On Wed, 24 Mar 2021 at 15:01, Peter Zijlstra <peterz@infradead.org> wrote:
>
> One last try, I'll leave it alone now, I promise :-)

This looks like it does what you suggested, thanks! :-)

I'll still need to think about it, because of the potential problem
with modify-signal-races and what the user's synchronization story
would look like then.

> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -778,6 +778,9 @@ struct perf_event {
>         void *security;
>  #endif
>         struct list_head                sb_list;
> +
> +       unsigned long                   si_uattr;
> +       unsigned long                   si_data;
>  #endif /* CONFIG_PERF_EVENTS */
>  };
>
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5652,13 +5652,17 @@ static long _perf_ioctl(struct perf_even
>                 return perf_event_query_prog_array(event, (void __user *)arg);
>
>         case PERF_EVENT_IOC_MODIFY_ATTRIBUTES: {
> +               struct perf_event_attr __user *uattr;
>                 struct perf_event_attr new_attr;
> -               int err = perf_copy_attr((struct perf_event_attr __user *)arg,
> -                                        &new_attr);
> +               int err;
>
> +               uattr = (struct perf_event_attr __user *)arg;
> +               err = perf_copy_attr(uattr, &new_attr);
>                 if (err)
>                         return err;
>
> +               event->si_uattr = (unsigned long)uattr;
> +
>                 return perf_event_modify_attr(event,  &new_attr);
>         }
>         default:
> @@ -6399,7 +6403,12 @@ static void perf_sigtrap(struct perf_eve
>         clear_siginfo(&info);
>         info.si_signo = SIGTRAP;
>         info.si_code = TRAP_PERF;
> -       info.si_errno = event->attr.type;
> +       info.si_addr = (void *)event->si_data;
> +
> +       info.si_perf = event->si_uattr;
> +       if (event->parent)
> +               info.si_perf = event->parent->si_uattr;
> +
>         force_sig_info(&info);
>  }
>
> @@ -6414,8 +6423,8 @@ static void perf_pending_event_disable(s
>                 WRITE_ONCE(event->pending_disable, -1);
>
>                 if (event->attr.sigtrap) {
> -                       atomic_set(&event->event_limit, 1); /* rearm event */
>                         perf_sigtrap(event);
> +                       atomic_set_release(&event->event_limit, 1); /* rearm event */
>                         return;
>                 }
>
> @@ -9121,6 +9130,7 @@ static int __perf_event_overflow(struct
>         if (events && atomic_dec_and_test(&event->event_limit)) {
>                 ret = 1;
>                 event->pending_kill = POLL_HUP;
> +               event->si_data = data->addr;
>
>                 perf_event_disable_inatomic(event);
>         }
> @@ -12011,6 +12021,8 @@ SYSCALL_DEFINE5(perf_event_open,
>                 goto err_task;
>         }
>
> +       event->si_uattr = (unsigned long)attr_uptr;
> +
>         if (is_sampling_event(event)) {
>                 if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
>                         err = -EOPNOTSUPP;
Dmitry Vyukov March 24, 2021, 2:12 p.m. UTC | #8
On Wed, Mar 24, 2021 at 3:05 PM Marco Elver <elver@google.com> wrote:
>
> On Wed, 24 Mar 2021 at 15:01, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > One last try, I'll leave it alone now, I promise :-)
>
> This looks like it does what you suggested, thanks! :-)
>
> I'll still need to think about it, because of the potential problem
> with modify-signal-races and what the user's synchronization story
> would look like then.

I agree that this looks inherently racy. The attr can't be allocated
on stack, user synchronization may be tricky and expensive. The API
may provoke bugs and some users may not even realize the race problem.

One potential alternative is use of an opaque u64 context (if we could
shove it into the attr). A user can pass a pointer to the attr in
there (makes it equivalent to this proposal), or bit-pack size/type
(as we want), pass some sequence number or whatever.



> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -778,6 +778,9 @@ struct perf_event {
> >         void *security;
> >  #endif
> >         struct list_head                sb_list;
> > +
> > +       unsigned long                   si_uattr;
> > +       unsigned long                   si_data;
> >  #endif /* CONFIG_PERF_EVENTS */
> >  };
> >
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -5652,13 +5652,17 @@ static long _perf_ioctl(struct perf_even
> >                 return perf_event_query_prog_array(event, (void __user *)arg);
> >
> >         case PERF_EVENT_IOC_MODIFY_ATTRIBUTES: {
> > +               struct perf_event_attr __user *uattr;
> >                 struct perf_event_attr new_attr;
> > -               int err = perf_copy_attr((struct perf_event_attr __user *)arg,
> > -                                        &new_attr);
> > +               int err;
> >
> > +               uattr = (struct perf_event_attr __user *)arg;
> > +               err = perf_copy_attr(uattr, &new_attr);
> >                 if (err)
> >                         return err;
> >
> > +               event->si_uattr = (unsigned long)uattr;
> > +
> >                 return perf_event_modify_attr(event,  &new_attr);
> >         }
> >         default:
> > @@ -6399,7 +6403,12 @@ static void perf_sigtrap(struct perf_eve
> >         clear_siginfo(&info);
> >         info.si_signo = SIGTRAP;
> >         info.si_code = TRAP_PERF;
> > -       info.si_errno = event->attr.type;
> > +       info.si_addr = (void *)event->si_data;
> > +
> > +       info.si_perf = event->si_uattr;
> > +       if (event->parent)
> > +               info.si_perf = event->parent->si_uattr;
> > +
> >         force_sig_info(&info);
> >  }
> >
> > @@ -6414,8 +6423,8 @@ static void perf_pending_event_disable(s
> >                 WRITE_ONCE(event->pending_disable, -1);
> >
> >                 if (event->attr.sigtrap) {
> > -                       atomic_set(&event->event_limit, 1); /* rearm event */
> >                         perf_sigtrap(event);
> > +                       atomic_set_release(&event->event_limit, 1); /* rearm event */
> >                         return;
> >                 }
> >
> > @@ -9121,6 +9130,7 @@ static int __perf_event_overflow(struct
> >         if (events && atomic_dec_and_test(&event->event_limit)) {
> >                 ret = 1;
> >                 event->pending_kill = POLL_HUP;
> > +               event->si_data = data->addr;
> >
> >                 perf_event_disable_inatomic(event);
> >         }
> > @@ -12011,6 +12021,8 @@ SYSCALL_DEFINE5(perf_event_open,
> >                 goto err_task;
> >         }
> >
> > +       event->si_uattr = (unsigned long)attr_uptr;
> > +
> >         if (is_sampling_event(event)) {
> >                 if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
> >                         err = -EOPNOTSUPP;
Dmitry Vyukov March 24, 2021, 2:15 p.m. UTC | #9
On Wed, Mar 24, 2021 at 3:12 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > On Wed, 24 Mar 2021 at 15:01, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > One last try, I'll leave it alone now, I promise :-)
> >
> > This looks like it does what you suggested, thanks! :-)
> >
> > I'll still need to think about it, because of the potential problem
> > with modify-signal-races and what the user's synchronization story
> > would look like then.
>
> I agree that this looks inherently racy. The attr can't be allocated
> on stack, user synchronization may be tricky and expensive. The API
> may provoke bugs and some users may not even realize the race problem.
>
> One potential alternative is use of an opaque u64 context (if we could
> shove it into the attr). A user can pass a pointer to the attr in
> there (makes it equivalent to this proposal), or bit-pack size/type
> (as we want), pass some sequence number or whatever.

Just to clarify what I was thinking about, but did not really state:
perf_event_attr includes a u64 ctx, and we return it back to the user
in siginfo_t. Kernel does not treat it in any way. This is a pretty
common API pattern in general.
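
A rough sketch of that pattern in plain user-space C, under stated assumptions: struct ctx_attr, its ctx field and every helper below are hypothetical names for illustration, not an existing kernel interface. The "kernel" side (deliver_ctx) copies the word verbatim and never interprets it; the user decides whether it holds a pointer or packed bits such as a sequence number and type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attr with an opaque context word the kernel never interprets. */
struct ctx_attr {
	uint32_t bp_type;
	uint64_t bp_len;
	uint64_t ctx;
};

/* Stand-in for signal generation: the context is copied back verbatim. */
static uint64_t deliver_ctx(const struct ctx_attr *attr)
{
	assert(attr != NULL);
	return attr->ctx;
}

/* Option 1: stash a pointer to user-owned state. */
static uint64_t ctx_from_ptr(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

static const void *ctx_to_ptr(uint64_t ctx)
{
	return (const void *)(uintptr_t)ctx;
}

/* Option 2: pack a sequence number (high half) and a type (low half). */
static uint64_t ctx_pack_seq(uint32_t seq, uint32_t type)
{
	return ((uint64_t)seq << 32) | type;
}

static uint32_t ctx_seq(uint64_t ctx)  { return (uint32_t)(ctx >> 32); }
static uint32_t ctx_type(uint64_t ctx) { return (uint32_t)ctx; }
```

Either encoding survives the round trip unchanged, which is the whole point: the meaning of the word is a private contract between the code that opens the event and the code that handles the signal.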


> > > --- a/include/linux/perf_event.h
> > > +++ b/include/linux/perf_event.h
> > > @@ -778,6 +778,9 @@ struct perf_event {
> > >         void *security;
> > >  #endif
> > >         struct list_head                sb_list;
> > > +
> > > +       unsigned long                   si_uattr;
> > > +       unsigned long                   si_data;
> > >  #endif /* CONFIG_PERF_EVENTS */
> > >  };
> > >
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -5652,13 +5652,17 @@ static long _perf_ioctl(struct perf_even
> > >                 return perf_event_query_prog_array(event, (void __user *)arg);
> > >
> > >         case PERF_EVENT_IOC_MODIFY_ATTRIBUTES: {
> > > +               struct perf_event_attr __user *uattr;
> > >                 struct perf_event_attr new_attr;
> > > -               int err = perf_copy_attr((struct perf_event_attr __user *)arg,
> > > -                                        &new_attr);
> > > +               int err;
> > >
> > > +               uattr = (struct perf_event_attr __user *)arg;
> > > +               err = perf_copy_attr(uattr, &new_attr);
> > >                 if (err)
> > >                         return err;
> > >
> > > +               event->si_uattr = (unsigned long)uattr;
> > > +
> > >                 return perf_event_modify_attr(event,  &new_attr);
> > >         }
> > >         default:
> > > @@ -6399,7 +6403,12 @@ static void perf_sigtrap(struct perf_eve
> > >         clear_siginfo(&info);
> > >         info.si_signo = SIGTRAP;
> > >         info.si_code = TRAP_PERF;
> > > -       info.si_errno = event->attr.type;
> > > +       info.si_addr = (void *)event->si_data;
> > > +
> > > +       info.si_perf = event->si_uattr;
> > > +       if (event->parent)
> > > +               info.si_perf = event->parent->si_uattr;
> > > +
> > >         force_sig_info(&info);
> > >  }
> > >
> > > @@ -6414,8 +6423,8 @@ static void perf_pending_event_disable(s
> > >                 WRITE_ONCE(event->pending_disable, -1);
> > >
> > >                 if (event->attr.sigtrap) {
> > > -                       atomic_set(&event->event_limit, 1); /* rearm event */
> > >                         perf_sigtrap(event);
> > > +                       atomic_set_release(&event->event_limit, 1); /* rearm event */
> > >                         return;
> > >                 }
> > >
> > > @@ -9121,6 +9130,7 @@ static int __perf_event_overflow(struct
> > >         if (events && atomic_dec_and_test(&event->event_limit)) {
> > >                 ret = 1;
> > >                 event->pending_kill = POLL_HUP;
> > > +               event->si_data = data->addr;
> > >
> > >                 perf_event_disable_inatomic(event);
> > >         }
> > > @@ -12011,6 +12021,8 @@ SYSCALL_DEFINE5(perf_event_open,
> > >                 goto err_task;
> > >         }
> > >
> > > +       event->si_uattr = (unsigned long)attr_uptr;
> > > +
> > >         if (is_sampling_event(event)) {
> > >                 if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
> > >                         err = -EOPNOTSUPP;
Marco Elver March 25, 2021, 7 a.m. UTC | #10
On Wed, 24 Mar 2021 at 15:15, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, Mar 24, 2021 at 3:12 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > On Wed, 24 Mar 2021 at 15:01, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > One last try, I'll leave it alone now, I promise :-)
> > >
> > > This looks like it does what you suggested, thanks! :-)
> > >
> > > I'll still need to think about it, because of the potential problem
> > > with modify-signal-races and what the user's synchronization story
> > > would look like then.
> >
> > I agree that this looks inherently racy. The attr can't be allocated
> > on stack, user synchronization may be tricky and expensive. The API
> > may provoke bugs and some users may not even realize the race problem.
> >
> > One potential alternative is use of an opaque u64 context (if we could
> > shove it into the attr). A user can pass a pointer to the attr in
> > there (makes it equivalent to this proposal), or bit-pack size/type
> > (as we want), pass some sequence number or whatever.
>
> Just to clarify what I was thinking about, but did not really state:
> perf_event_attr includes a u64 ctx, and we return it back to the user
> in siginfo_t. Kernel does not treat it in any way. This is a pretty
> common API pattern in general.

Ok, let's go for a new field in perf_event_attr which is copied to
si_perf. This gives user space full flexibility to decide what to
stick in it, and the kernel does not prescribe some weird encoding or
synchronization that user space would have to live with. I'll probably
call it perf_event_attr::sig_data, because all si_* things are macros.

Thanks,
-- Marco
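
One way user space might use such a field, sketched under assumptions: only the name sig_data and the fact that it is copied to si_perf come from this thread; the table, the slot/generation layout and all function names below are hypothetical. The value placed in sig_data is a slot index plus a generation counter, so a handler can recognise a signal that refers to a configuration that has since been replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WATCH 16

/* Hypothetical user-space record describing one watchpoint configuration. */
struct watch {
	uint32_t generation;	/* bumped each time the slot is reconfigured */
	uint32_t bp_type;
	uint64_t bp_len;
};

static struct watch table[MAX_WATCH];

/* Value the user would place in sig_data: generation (high) and slot (low). */
static uint64_t make_sig_data(uint32_t slot)
{
	assert(slot < MAX_WATCH);
	return ((uint64_t)table[slot].generation << 32) | slot;
}

/* Reconfigure a slot; sig_data values issued earlier for it become stale. */
static uint64_t reconfigure(uint32_t slot, uint32_t type, uint64_t len)
{
	assert(slot < MAX_WATCH);
	table[slot].generation++;
	table[slot].bp_type = type;
	table[slot].bp_len = len;
	return make_sig_data(slot);
}

/* Handler side: NULL means the signal refers to an outdated configuration. */
static const struct watch *lookup(uint64_t si_perf)
{
	uint32_t slot = (uint32_t)si_perf;
	uint32_t gen = (uint32_t)(si_perf >> 32);

	if (slot >= MAX_WATCH || table[slot].generation != gen)
		return NULL;
	return &table[slot];
}
```

The sketch ignores concurrency on purpose; a real handler running against concurrent reconfiguration would need atomic accesses to the table, and that synchronization is left to user space rather than prescribed by the kernel.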
Ingo Molnar March 25, 2021, 2:18 p.m. UTC | #11
* Dmitry Vyukov <dvyukov@google.com> wrote:

> On Wed, Mar 24, 2021 at 3:05 PM Marco Elver <elver@google.com> wrote:
> >
> > On Wed, 24 Mar 2021 at 15:01, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > One last try, I'll leave it alone now, I promise :-)
> >
> > This looks like it does what you suggested, thanks! :-)
> >
> > I'll still need to think about it, because of the potential problem
> > with modify-signal-races and what the user's synchronization story
> > would look like then.
> 
> I agree that this looks inherently racy. The attr can't be allocated
> on stack, user synchronization may be tricky and expensive. The API
> may provoke bugs and some users may not even realize the race problem.

Yeah, so why cannot we allocate enough space from the signal handler 
user-space stack and put the attr there, and point to it from 
sig_info?

The idea would be to create a stable, per-signal snapshot of whatever 
the perf_attr state is at the moment the event happens and the signal 
is generated - which is roughly what user-space wants, right?

Thanks,

	Ingo
Marco Elver March 25, 2021, 3:17 p.m. UTC | #12
On Thu, 25 Mar 2021 at 15:18, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Dmitry Vyukov <dvyukov@google.com> wrote:
>
> > On Wed, Mar 24, 2021 at 3:05 PM Marco Elver <elver@google.com> wrote:
> > >
> > > On Wed, 24 Mar 2021 at 15:01, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > One last try, I'll leave it alone now, I promise :-)
> > >
> > > This looks like it does what you suggested, thanks! :-)
> > >
> > > I'll still need to think about it, because of the potential problem
> > > with modify-signal-races and what the user's synchronization story
> > > would look like then.
> >
> > I agree that this looks inherently racy. The attr can't be allocated
> > on stack, user synchronization may be tricky and expensive. The API
> > may provoke bugs and some users may not even realize the race problem.
>
> Yeah, so why cannot we allocate enough space from the signal handler
> user-space stack and put the attr there, and point to it from
> sig_info?
>
> The idea would be to create a stable, per-signal snapshot of whatever
> the perf_attr state is at the moment the event happens and the signal
> is generated - which is roughly what user-space wants, right?

I certainly couldn't say how feasible this is. Is there infrastructure
in place to do this? Or do we have to introduce support for stashing
things on the signal stack?

From what we can tell, the most flexible option though appears to be
just some user settable opaque data in perf_event_attr, that is copied
to siginfo. It'd allow user space to store a pointer or a hash/key, or
just encode the relevant information it wants; but could also go
further, and add information beyond perf_event_attr, such as things
like a signal receiver filter (e.g. task ID or set of threads which
should process the signal etc.).

So if there's no strong objection to the additional field in
perf_event_attr, I think it'll give us the simplest and most flexible
option.

Thanks,
-- Marco

> Thanks,
>
>         Ingo
Ingo Molnar March 25, 2021, 3:35 p.m. UTC | #13
* Marco Elver <elver@google.com> wrote:

> > Yeah, so why cannot we allocate enough space from the signal 
> > handler user-space stack and put the attr there, and point to it 
> > from sig_info?
> >
> > The idea would be to create a stable, per-signal snapshot of 
> > whatever the perf_attr state is at the moment the event happens 
> > and the signal is generated - which is roughly what user-space 
> > wants, right?
> 
> I certainly couldn't say how feasible this is. Is there 
> infrastructure in place to do this? Or do we have to introduce 
> support for stashing things on the signal stack?
> 
> From what we can tell, the most flexible option though appears to be 
> just some user settable opaque data in perf_event_attr, that is 
> copied to siginfo. It'd allow user space to store a pointer or a 
> hash/key, or just encode the relevant information it wants; but 
> could also go further, and add information beyond perf_event_attr, 
> such as things like a signal receiver filter (e.g. task ID or set of 
> threads which should process the signal etc.).
> 
> So if there's no strong objection to the additional field in 
> perf_event_attr, I think it'll give us the simplest and most 
> flexible option.

Sounds good to me - it's also probably measurably faster than copying 
the not-so-small-anymore perf_attr structure.

Thanks,

	Ingo

Patch

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1e4c949bf75f..0316d39e8c8f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6399,6 +6399,22 @@  static void perf_sigtrap(struct perf_event *event)
 	info.si_signo = SIGTRAP;
 	info.si_code = TRAP_PERF;
 	info.si_errno = event->attr.type;
+
+	switch (event->attr.type) {
+	case PERF_TYPE_BREAKPOINT:
+		info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
+		info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
+		break;
+	default:
+		/*
+		 * No additional info set (si_perf == 0).
+		 *
+		 * Adding new cases for event types to set si_perf to a
+		 * non-constant value must ensure that si_perf != 0.
+		 */
+		break;
+	}
+
 	force_sig_info(&info);
 }
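
For reference, a small user-space sketch of how the si_perf value produced by this version of the patch could be taken apart again. The helper names are hypothetical; the only things taken from the patch are the packing itself (bp_len shifted left by 16, OR-ed with bp_type), and the enum values mirror HW_BREAKPOINT_R/W/RW/X from <linux/hw_breakpoint.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror HW_BREAKPOINT_{R,W,RW,X} in <linux/hw_breakpoint.h>. */
enum { BP_R = 1, BP_W = 2, BP_RW = 3, BP_X = 4 };

/* Hypothetical helper: same packing as the hunk above. */
static uint64_t si_perf_pack(uint64_t bp_len, uint32_t bp_type)
{
	/* The type must stay below bit 16 or it would collide with the length. */
	assert(bp_type < (1u << 16));
	return (bp_len << 16) | (uint64_t)bp_type;
}

/* Hypothetical decoders a SIGTRAP handler could apply to si_perf. */
static uint64_t si_perf_len(uint64_t si_perf)
{
	return si_perf >> 16;
}

static uint32_t si_perf_type(uint64_t si_perf)
{
	return (uint32_t)(si_perf & 0xffff);
}
```

A 4-byte read/write breakpoint packs to 0x40003, which is non-zero, consistent with the comment in the default case that si_perf == 0 means no additional information.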