Message ID | 20230202062549.632425-1-arilou@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next,v2] libbpf: Add wakeup_events to creation options | expand |
On Wed, Feb 1, 2023 at 10:26 PM Jon Doron <arilou@gmail.com> wrote: > > From: Jon Doron <jond@wiz.io> > > Add option to set when the perf buffer should wake up, by default the > perf buffer becomes signaled for every event that is being pushed to it. > > In case of a high throughput of events it will be more efficient to wake > up only once you have X events ready to be read. > > So your application can wakeup once and drain the entire perf buffer. > > Signed-off-by: Jon Doron <jond@wiz.io> > --- > tools/lib/bpf/libbpf.c | 4 ++-- > tools/lib/bpf/libbpf.h | 3 ++- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index eed5cec6f510..6b30ff13922b 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, > attr.config = PERF_COUNT_SW_BPF_OUTPUT; > attr.type = PERF_TYPE_SOFTWARE; > attr.sample_type = PERF_SAMPLE_RAW; > - attr.sample_period = 1; > - attr.wakeup_events = 1; > + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); > + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); I suspect the case of LIBBPF_OPTS(perf_buffer_opts, opts); perf_buffer__new(...., &opts); is not handled correctly and you end up with sample_period == wakeup_events == 0 Can you please add BPF selftests that's setting wakeup_events to zero and separately to >1? > > p.attr = &attr; > p.sample_cb = sample_cb; > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h > index 8777ff21ea1d..e83c0a915dc7 100644 > --- a/tools/lib/bpf/libbpf.h > +++ b/tools/lib/bpf/libbpf.h > @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt); > /* common use perf buffer options */ > struct perf_buffer_opts { > size_t sz; > + __u32 wakeup_events; > }; > -#define perf_buffer_opts__last_field sz > +#define perf_buffer_opts__last_field wakeup_events > > /** > * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified > -- > 2.39.1 >
On 02/02/2023, Andrii Nakryiko wrote: >On Wed, Feb 1, 2023 at 10:26 PM Jon Doron <arilou@gmail.com> wrote: >> >> From: Jon Doron <jond@wiz.io> >> >> Add option to set when the perf buffer should wake up, by default the >> perf buffer becomes signaled for every event that is being pushed to it. >> >> In case of a high throughput of events it will be more efficient to wake >> up only once you have X events ready to be read. >> >> So your application can wakeup once and drain the entire perf buffer. >> >> Signed-off-by: Jon Doron <jond@wiz.io> >> --- >> tools/lib/bpf/libbpf.c | 4 ++-- >> tools/lib/bpf/libbpf.h | 3 ++- >> 2 files changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >> index eed5cec6f510..6b30ff13922b 100644 >> --- a/tools/lib/bpf/libbpf.c >> +++ b/tools/lib/bpf/libbpf.c >> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, >> attr.config = PERF_COUNT_SW_BPF_OUTPUT; >> attr.type = PERF_TYPE_SOFTWARE; >> attr.sample_type = PERF_SAMPLE_RAW; >> - attr.sample_period = 1; >> - attr.wakeup_events = 1; >> + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); >> + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); > >I suspect the case of > >LIBBPF_OPTS(perf_buffer_opts, opts); > >perf_buffer__new(...., &opts); > >is not handled correctly and you end up with sample_period == wakeup_events == 0 > >Can you please add BPF selftests that's setting wakeup_events to zero >and separately to >1? > Hi Andrii, I'm not sure what we are testing, when you have sample_period == wakeup_events == 0, it basically means to never wakeup, so let's say you would wait on the poll_fd infinitely it will never wake you up. When you have let's say wakeup_event != 0, you will wakeup after the ring buffer in the perf buffer has more events than wakeup_events. I do see your point that if someone is using the macro to build the opts they will end with something unexpected, would you like me to treat 0 as 1 in that case? -- Jon. >> >> p.attr = &attr; >> p.sample_cb = sample_cb; >> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h >> index 8777ff21ea1d..e83c0a915dc7 100644 >> --- a/tools/lib/bpf/libbpf.h >> +++ b/tools/lib/bpf/libbpf.h >> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt); >> /* common use perf buffer options */ >> struct perf_buffer_opts { >> size_t sz; >> + __u32 wakeup_events; >> }; >> -#define perf_buffer_opts__last_field sz >> +#define perf_buffer_opts__last_field wakeup_events >> >> /** >> * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified >> -- >> 2.39.1 >>
On 2/1/23 10:25 PM, Jon Doron wrote: > From: Jon Doron <jond@wiz.io> > > Add option to set when the perf buffer should wake up, by default the > perf buffer becomes signaled for every event that is being pushed to it. > > In case of a high throughput of events it will be more efficient to wake > up only once you have X events ready to be read. > > So your application can wakeup once and drain the entire perf buffer. > > Signed-off-by: Jon Doron <jond@wiz.io> > --- > tools/lib/bpf/libbpf.c | 4 ++-- > tools/lib/bpf/libbpf.h | 3 ++- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index eed5cec6f510..6b30ff13922b 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, > attr.config = PERF_COUNT_SW_BPF_OUTPUT; > attr.type = PERF_TYPE_SOFTWARE; > attr.sample_type = PERF_SAMPLE_RAW; > - attr.sample_period = 1; > - attr.wakeup_events = 1; > + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); > + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); > > p.attr = &attr; > p.sample_cb = sample_cb; > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h > index 8777ff21ea1d..e83c0a915dc7 100644 > --- a/tools/lib/bpf/libbpf.h > +++ b/tools/lib/bpf/libbpf.h > @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt); > /* common use perf buffer options */ > struct perf_buffer_opts { > size_t sz; > + __u32 wakeup_events; Since you are adding wakeup_events here, do you think it make sense to add sample_period to struct perf_buffer_opts as well? In some cases, users might want to have different values for sample_period and wakeup_events, e.g., smaller sample_period to accumulate data and larger wakeup_events to wakeup user space poll? > }; > -#define perf_buffer_opts__last_field sz > +#define perf_buffer_opts__last_field wakeup_events > > /** > * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
On Thu, Feb 2, 2023 at 11:14 PM Jon Doron <arilou@gmail.com> wrote: > > On 02/02/2023, Andrii Nakryiko wrote: > >On Wed, Feb 1, 2023 at 10:26 PM Jon Doron <arilou@gmail.com> wrote: > >> > >> From: Jon Doron <jond@wiz.io> > >> > >> Add option to set when the perf buffer should wake up, by default the > >> perf buffer becomes signaled for every event that is being pushed to it. > >> > >> In case of a high throughput of events it will be more efficient to wake > >> up only once you have X events ready to be read. > >> > >> So your application can wakeup once and drain the entire perf buffer. > >> > >> Signed-off-by: Jon Doron <jond@wiz.io> > >> --- > >> tools/lib/bpf/libbpf.c | 4 ++-- > >> tools/lib/bpf/libbpf.h | 3 ++- > >> 2 files changed, 4 insertions(+), 3 deletions(-) > >> > >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > >> index eed5cec6f510..6b30ff13922b 100644 > >> --- a/tools/lib/bpf/libbpf.c > >> +++ b/tools/lib/bpf/libbpf.c > >> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, > >> attr.config = PERF_COUNT_SW_BPF_OUTPUT; > >> attr.type = PERF_TYPE_SOFTWARE; > >> attr.sample_type = PERF_SAMPLE_RAW; > >> - attr.sample_period = 1; > >> - attr.wakeup_events = 1; > >> + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); > >> + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); > > > >I suspect the case of > > > >LIBBPF_OPTS(perf_buffer_opts, opts); > > > >perf_buffer__new(...., &opts); > > > >is not handled correctly and you end up with sample_period == wakeup_events == 0 > > > >Can you please add BPF selftests that's setting wakeup_events to zero > >and separately to >1? > > > > Hi Andrii, > > I'm not sure what we are testing, when you have sample_period == > wakeup_events == 0, it basically means to never wakeup, so let's say you > would wait on the poll_fd infinitely it will never wake you up. > > When you have let's say wakeup_event != 0, you will wakeup after the > ring buffer in the perf buffer has more events than wakeup_events. > > I do see your point that if someone is using the macro to build the opts > they will end with something unexpected, would you like me to treat 0 as > 1 in that case? Yes, exactly, I think we should treat zero as 1 and write a test that this happens. Otherwise it will be very confusing when someone use perf_buffer_opts for some other future option, and then suddenly starts getting no notification. If someone really needs wakeup == 0, they have a fallback plan to use perf_buffer__new_raw_opts(), which is probably justified for some very specific and advanced uses. So yes, please add a test with few subtests where we test default opts (wakeup_after == 1), and wakeup_after > 1. > > -- Jon. > > >> > >> p.attr = &attr; > >> p.sample_cb = sample_cb; > >> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h > >> index 8777ff21ea1d..e83c0a915dc7 100644 > >> --- a/tools/lib/bpf/libbpf.h > >> +++ b/tools/lib/bpf/libbpf.h > >> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt); > >> /* common use perf buffer options */ > >> struct perf_buffer_opts { > >> size_t sz; > >> + __u32 wakeup_events; > >> }; > >> -#define perf_buffer_opts__last_field sz > >> +#define perf_buffer_opts__last_field wakeup_events > >> > >> /** > >> * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified > >> -- > >> 2.39.1 > >>
On Fri, Feb 3, 2023 at 10:31 AM Yonghong Song <yhs@meta.com> wrote: > > > > On 2/1/23 10:25 PM, Jon Doron wrote: > > From: Jon Doron <jond@wiz.io> > > > > Add option to set when the perf buffer should wake up, by default the > > perf buffer becomes signaled for every event that is being pushed to it. > > > > In case of a high throughput of events it will be more efficient to wake > > up only once you have X events ready to be read. > > > > So your application can wakeup once and drain the entire perf buffer. > > > > Signed-off-by: Jon Doron <jond@wiz.io> > > --- > > tools/lib/bpf/libbpf.c | 4 ++-- > > tools/lib/bpf/libbpf.h | 3 ++- > > 2 files changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > > index eed5cec6f510..6b30ff13922b 100644 > > --- a/tools/lib/bpf/libbpf.c > > +++ b/tools/lib/bpf/libbpf.c > > @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, > > attr.config = PERF_COUNT_SW_BPF_OUTPUT; > > attr.type = PERF_TYPE_SOFTWARE; > > attr.sample_type = PERF_SAMPLE_RAW; > > - attr.sample_period = 1; > > - attr.wakeup_events = 1; > > + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); > > + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); > > > > p.attr = &attr; > > p.sample_cb = sample_cb; > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h > > index 8777ff21ea1d..e83c0a915dc7 100644 > > --- a/tools/lib/bpf/libbpf.h > > +++ b/tools/lib/bpf/libbpf.h > > @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt); > > /* common use perf buffer options */ > > struct perf_buffer_opts { > > size_t sz; > > + __u32 wakeup_events; > > Since you are adding wakeup_events here, do you think it make sense > to add sample_period to struct perf_buffer_opts as well? In some cases, > users might want to have different values for sample_period and > wakeup_events, e.g., smaller sample_period to accumulate data and > larger wakeup_events to wakeup user space poll? It's not clear to me from reading perf_event_open manpage what "sample_period" means for perf buffer. What will happen when we have sample_period != wakeup_events? > > > }; > > -#define perf_buffer_opts__last_field sz > > +#define perf_buffer_opts__last_field wakeup_events > > > > /** > > * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
On 2/3/23 1:42 PM, Andrii Nakryiko wrote: > On Fri, Feb 3, 2023 at 10:31 AM Yonghong Song <yhs@meta.com> wrote: >> >> >> >> On 2/1/23 10:25 PM, Jon Doron wrote: >>> From: Jon Doron <jond@wiz.io> >>> >>> Add option to set when the perf buffer should wake up, by default the >>> perf buffer becomes signaled for every event that is being pushed to it. >>> >>> In case of a high throughput of events it will be more efficient to wake >>> up only once you have X events ready to be read. >>> >>> So your application can wakeup once and drain the entire perf buffer. >>> >>> Signed-off-by: Jon Doron <jond@wiz.io> >>> --- >>> tools/lib/bpf/libbpf.c | 4 ++-- >>> tools/lib/bpf/libbpf.h | 3 ++- >>> 2 files changed, 4 insertions(+), 3 deletions(-) >>> >>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >>> index eed5cec6f510..6b30ff13922b 100644 >>> --- a/tools/lib/bpf/libbpf.c >>> +++ b/tools/lib/bpf/libbpf.c >>> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, >>> attr.config = PERF_COUNT_SW_BPF_OUTPUT; >>> attr.type = PERF_TYPE_SOFTWARE; >>> attr.sample_type = PERF_SAMPLE_RAW; >>> - attr.sample_period = 1; >>> - attr.wakeup_events = 1; >>> + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); >>> + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); >>> >>> p.attr = &attr; >>> p.sample_cb = sample_cb; >>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h >>> index 8777ff21ea1d..e83c0a915dc7 100644 >>> --- a/tools/lib/bpf/libbpf.h >>> +++ b/tools/lib/bpf/libbpf.h >>> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt); >>> /* common use perf buffer options */ >>> struct perf_buffer_opts { >>> size_t sz; >>> + __u32 wakeup_events; >> >> Since you are adding wakeup_events here, do you think it make sense >> to add sample_period to struct perf_buffer_opts as well? In some cases, >> users might want to have different values for sample_period and >> wakeup_events, e.g., smaller sample_period to accumulate data and >> larger wakeup_events to wakeup user space poll? > > It's not clear to me from reading perf_event_open manpage what > "sample_period" means for perf buffer. What will happen when we have > sample_period != wakeup_events? The following is my understanding. Let us sample_period = 10, wakeup_events = 100. Every 10 samples, kernel overflow handler is called and the the sample data are written into ring buffer. Every 100 samples, the poll() syscall is waken up to give userspace a chance to read the data. In this particular case, it is possible that user space may miss some samples at the end if no special handling (I forgot what it is). But if the user space doesn't really care, it might be okay with such a configuration. It would be great if some perf expert can clarify whether my understanding is correct or not. > >> >>> }; >>> -#define perf_buffer_opts__last_field sz >>> +#define perf_buffer_opts__last_field wakeup_events >>> >>> /** >>> * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
On 2/3/23 3:11 PM, Yonghong Song wrote: > > > On 2/3/23 1:42 PM, Andrii Nakryiko wrote: >> On Fri, Feb 3, 2023 at 10:31 AM Yonghong Song <yhs@meta.com> wrote: >>> >>> >>> >>> On 2/1/23 10:25 PM, Jon Doron wrote: >>>> From: Jon Doron <jond@wiz.io> >>>> >>>> Add option to set when the perf buffer should wake up, by default the >>>> perf buffer becomes signaled for every event that is being pushed to >>>> it. >>>> >>>> In case of a high throughput of events it will be more efficient to >>>> wake >>>> up only once you have X events ready to be read. >>>> >>>> So your application can wakeup once and drain the entire perf buffer. >>>> >>>> Signed-off-by: Jon Doron <jond@wiz.io> >>>> --- >>>> tools/lib/bpf/libbpf.c | 4 ++-- >>>> tools/lib/bpf/libbpf.h | 3 ++- >>>> 2 files changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >>>> index eed5cec6f510..6b30ff13922b 100644 >>>> --- a/tools/lib/bpf/libbpf.c >>>> +++ b/tools/lib/bpf/libbpf.c >>>> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int >>>> map_fd, size_t page_cnt, >>>> attr.config = PERF_COUNT_SW_BPF_OUTPUT; >>>> attr.type = PERF_TYPE_SOFTWARE; >>>> attr.sample_type = PERF_SAMPLE_RAW; >>>> - attr.sample_period = 1; >>>> - attr.wakeup_events = 1; >>>> + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); >>>> + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); >>>> >>>> p.attr = &attr; >>>> p.sample_cb = sample_cb; >>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h >>>> index 8777ff21ea1d..e83c0a915dc7 100644 >>>> --- a/tools/lib/bpf/libbpf.h >>>> +++ b/tools/lib/bpf/libbpf.h >>>> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, >>>> int cpu, __u64 cnt); >>>> /* common use perf buffer options */ >>>> struct perf_buffer_opts { >>>> size_t sz; >>>> + __u32 wakeup_events; >>> >>> Since you are adding wakeup_events here, do you think it make sense >>> to add sample_period to struct perf_buffer_opts as well? In some cases, >>> users might want to have different values for sample_period and >>> wakeup_events, e.g., smaller sample_period to accumulate data and >>> larger wakeup_events to wakeup user space poll? >> >> It's not clear to me from reading perf_event_open manpage what >> "sample_period" means for perf buffer. What will happen when we have >> sample_period != wakeup_events? > > The following is my understanding. Let us sample_period = 10, > wakeup_events = 100. > > Every 10 samples, kernel overflow handler is called and the the sample > data are written into ring buffer. > Every 100 samples, the poll() syscall is waken up to give userspace > a chance to read the data. > > In this particular case, it is possible that user space may miss > some samples at the end if no special handling (I forgot what it is). > But if the user space doesn't really care, it might be okay with > such a configuration. > > It would be great if some perf expert can clarify whether my > understanding is correct or not. Think twice. I am okay with just one wakeup_events/sample_period in the perf_buffer_opts since wakeup_events == sample_period is the most common use case. The same as before wakeup_events = sample_period = 1. But it is apparent that sample_period is much easier to understand than wakeup_events. Can we put sample_period in struct perf_buffer_opts instead of wakeup_events? > >> >>> >>>> }; >>>> -#define perf_buffer_opts__last_field sz >>>> +#define perf_buffer_opts__last_field wakeup_events >>>> >>>> /** >>>> * @brief **perf_buffer__new()** creates BPF perfbuf manager for >>>> a specified
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index eed5cec6f510..6b30ff13922b 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt, attr.config = PERF_COUNT_SW_BPF_OUTPUT; attr.type = PERF_TYPE_SOFTWARE; attr.sample_type = PERF_SAMPLE_RAW; - attr.sample_period = 1; - attr.wakeup_events = 1; + attr.sample_period = OPTS_GET(opts, wakeup_events, 1); + attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1); p.attr = &attr; p.sample_cb = sample_cb; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 8777ff21ea1d..e83c0a915dc7 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt); /* common use perf buffer options */ struct perf_buffer_opts { size_t sz; + __u32 wakeup_events; }; -#define perf_buffer_opts__last_field sz +#define perf_buffer_opts__last_field wakeup_events /** * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified