[bpf-next,v2] libbpf: Add wakeup_events to creation options

Message ID	20230202062549.632425-1-arilou@gmail.com (mailing list archive)
State	Changes Requested
Delegated to:	BPF
Headers	show Return-Path: <bpf-owner@vger.kernel.org> From: Jon Doron <arilou@gmail.com> To: bpf@vger.kernel.org, ast@kernel.org, andrii@kernel.org Cc: Jon Doron <jond@wiz.io> Subject: [PATCH bpf-next v2] libbpf: Add wakeup_events to creation options Date: Thu, 2 Feb 2023 08:25:49 +0200 Message-Id: <20230202062549.632425-1-arilou@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[bpf-next,v2] libbpf: Add wakeup_events to creation options \| expand [bpf-next,v2] libbpf: Add wakeup_events to creation options

Context	Check	Description
bpf/vmtest-bpf-next-VM_Test-39	success	Logs for test_verifier on x86_64 with llvm-17
netdev/tree_selection	success	Clearly marked for bpf-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	success	Link
netdev/cover_letter	success	Single patches do not need cover letters
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers	warning	9 maintainers not CCed: john.fastabend@gmail.com daniel@iogearbox.net sdf@google.com jolsa@kernel.org song@kernel.org martin.lau@linux.dev haoluo@google.com yhs@fb.com kpsingh@kernel.org
netdev/build_clang	success	Errors and warnings before: 0 this patch: 0
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 0 this patch: 0
netdev/checkpatch	warning	CHECK: Please use a blank line after function/struct/union/enum declarations
netdev/kdoc	success	Errors and warnings before: 110 this patch: 110
netdev/source_inline	success	Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1	success	Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7	success	Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8	success	Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2	success	Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3	success	Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-5	success	Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6	success	Logs for build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-4	success	Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12	success	Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13	success	Logs for test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-22	success	Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23	success	Logs for test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27	success	Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28	success	Logs for test_progs_no_alu32_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-32	success	Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33	success	Logs for test_progs_parallel on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-37	success	Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38	success	Logs for test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-9	success	Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10	success	Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-14	fail	Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-17	success	Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18	success	Logs for test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-19	success	Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20	success	Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24	success	Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25	success	Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29	success	Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30	success	Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34	success	Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35	success	Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-15	success	Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-26	success	Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-36	success	Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16	success	Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31	success	Logs for test_progs_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21	success	Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-PR	fail	PR summary
bpf/vmtest-bpf-next-VM_Test-11	success	Logs for test_maps on s390x with gcc

Jon Doron Feb. 2, 2023, 6:25 a.m. UTC

From: Jon Doron <jond@wiz.io>

Add option to set when the perf buffer should wake up, by default the
perf buffer becomes signaled for every event that is being pushed to it.

In case of a high throughput of events it will be more efficient to wake
up only once you have X events ready to be read.

So your application can wakeup once and drain the entire perf buffer.

Signed-off-by: Jon Doron <jond@wiz.io>
---
 tools/lib/bpf/libbpf.c | 4 ++--
 tools/lib/bpf/libbpf.h | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

Andrii Nakryiko Feb. 2, 2023, 10:50 p.m. UTC | #1

On Wed, Feb 1, 2023 at 10:26 PM Jon Doron <arilou@gmail.com> wrote:
>
> From: Jon Doron <jond@wiz.io>
>
> Add option to set when the perf buffer should wake up, by default the
> perf buffer becomes signaled for every event that is being pushed to it.
>
> In case of a high throughput of events it will be more efficient to wake
> up only once you have X events ready to be read.
>
> So your application can wakeup once and drain the entire perf buffer.
>
> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
>  tools/lib/bpf/libbpf.c | 4 ++--
>  tools/lib/bpf/libbpf.h | 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index eed5cec6f510..6b30ff13922b 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
>         attr.config = PERF_COUNT_SW_BPF_OUTPUT;
>         attr.type = PERF_TYPE_SOFTWARE;
>         attr.sample_type = PERF_SAMPLE_RAW;
> -       attr.sample_period = 1;
> -       attr.wakeup_events = 1;
> +       attr.sample_period = OPTS_GET(opts, wakeup_events, 1);
> +       attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1);

I suspect the case of

LIBBPF_OPTS(perf_buffer_opts, opts);

perf_buffer__new(...., &opts);

is not handled correctly and you end up with sample_period == wakeup_events == 0

Can you please add BPF selftests that's setting wakeup_events to zero
and separately to >1?

>
>         p.attr = &attr;
>         p.sample_cb = sample_cb;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 8777ff21ea1d..e83c0a915dc7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
>  /* common use perf buffer options */
>  struct perf_buffer_opts {
>         size_t sz;
> +       __u32 wakeup_events;
>  };
> -#define perf_buffer_opts__last_field sz
> +#define perf_buffer_opts__last_field wakeup_events
>
>  /**
>   * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
> --
> 2.39.1
>

Jon Doron Feb. 3, 2023, 7:14 a.m. UTC | #2

On 02/02/2023, Andrii Nakryiko wrote:
>On Wed, Feb 1, 2023 at 10:26 PM Jon Doron <arilou@gmail.com> wrote:
>>
>> From: Jon Doron <jond@wiz.io>
>>
>> Add option to set when the perf buffer should wake up, by default the
>> perf buffer becomes signaled for every event that is being pushed to it.
>>
>> In case of a high throughput of events it will be more efficient to wake
>> up only once you have X events ready to be read.
>>
>> So your application can wakeup once and drain the entire perf buffer.
>>
>> Signed-off-by: Jon Doron <jond@wiz.io>
>> ---
>>  tools/lib/bpf/libbpf.c | 4 ++--
>>  tools/lib/bpf/libbpf.h | 3 ++-
>>  2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index eed5cec6f510..6b30ff13922b 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
>>         attr.config = PERF_COUNT_SW_BPF_OUTPUT;
>>         attr.type = PERF_TYPE_SOFTWARE;
>>         attr.sample_type = PERF_SAMPLE_RAW;
>> -       attr.sample_period = 1;
>> -       attr.wakeup_events = 1;
>> +       attr.sample_period = OPTS_GET(opts, wakeup_events, 1);
>> +       attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1);
>
>I suspect the case of
>
>LIBBPF_OPTS(perf_buffer_opts, opts);
>
>perf_buffer__new(...., &opts);
>
>is not handled correctly and you end up with sample_period == wakeup_events == 0
>
>Can you please add BPF selftests that's setting wakeup_events to zero
>and separately to >1?
>

Hi Andrii,

I'm not sure what we are testing, when you have sample_period == 
wakeup_events == 0, it basically means to never wakeup, so let's say you 
would wait on the poll_fd infinitely it will never wake you up.

When you have let's say wakeup_event != 0, you will wakeup after the 
ring buffer in the perf buffer has more events than wakeup_events.

I do see your point that if someone is using the macro to build the opts 
they will end with something unexpected, would you like me to treat 0 as 
1 in that case?

-- Jon.

>>
>>         p.attr = &attr;
>>         p.sample_cb = sample_cb;
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index 8777ff21ea1d..e83c0a915dc7 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
>>  /* common use perf buffer options */
>>  struct perf_buffer_opts {
>>         size_t sz;
>> +       __u32 wakeup_events;
>>  };
>> -#define perf_buffer_opts__last_field sz
>> +#define perf_buffer_opts__last_field wakeup_events
>>
>>  /**
>>   * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
>> --
>> 2.39.1
>>

Yonghong Song Feb. 3, 2023, 6:31 p.m. UTC | #3

On 2/1/23 10:25 PM, Jon Doron wrote:
> From: Jon Doron <jond@wiz.io>
> 
> Add option to set when the perf buffer should wake up, by default the
> perf buffer becomes signaled for every event that is being pushed to it.
> 
> In case of a high throughput of events it will be more efficient to wake
> up only once you have X events ready to be read.
> 
> So your application can wakeup once and drain the entire perf buffer.
> 
> Signed-off-by: Jon Doron <jond@wiz.io>
> ---
>   tools/lib/bpf/libbpf.c | 4 ++--
>   tools/lib/bpf/libbpf.h | 3 ++-
>   2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index eed5cec6f510..6b30ff13922b 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
>   	attr.config = PERF_COUNT_SW_BPF_OUTPUT;
>   	attr.type = PERF_TYPE_SOFTWARE;
>   	attr.sample_type = PERF_SAMPLE_RAW;
> -	attr.sample_period = 1;
> -	attr.wakeup_events = 1;
> +	attr.sample_period = OPTS_GET(opts, wakeup_events, 1);
> +	attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1);
>   
>   	p.attr = &attr;
>   	p.sample_cb = sample_cb;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 8777ff21ea1d..e83c0a915dc7 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
>   /* common use perf buffer options */
>   struct perf_buffer_opts {
>   	size_t sz;
> +	__u32 wakeup_events;

Since you are adding wakeup_events here, do you think it make sense
to add sample_period to struct perf_buffer_opts as well? In some cases,
users might want to have different values for sample_period and 
wakeup_events, e.g., smaller sample_period to accumulate data and
larger wakeup_events to wakeup user space poll?

>   };
> -#define perf_buffer_opts__last_field sz
> +#define perf_buffer_opts__last_field wakeup_events
>   
>   /**
>    * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified

Andrii Nakryiko Feb. 3, 2023, 9:41 p.m. UTC | #4

On Thu, Feb 2, 2023 at 11:14 PM Jon Doron <arilou@gmail.com> wrote:
>
> On 02/02/2023, Andrii Nakryiko wrote:
> >On Wed, Feb 1, 2023 at 10:26 PM Jon Doron <arilou@gmail.com> wrote:
> >>
> >> From: Jon Doron <jond@wiz.io>
> >>
> >> Add option to set when the perf buffer should wake up, by default the
> >> perf buffer becomes signaled for every event that is being pushed to it.
> >>
> >> In case of a high throughput of events it will be more efficient to wake
> >> up only once you have X events ready to be read.
> >>
> >> So your application can wakeup once and drain the entire perf buffer.
> >>
> >> Signed-off-by: Jon Doron <jond@wiz.io>
> >> ---
> >>  tools/lib/bpf/libbpf.c | 4 ++--
> >>  tools/lib/bpf/libbpf.h | 3 ++-
> >>  2 files changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index eed5cec6f510..6b30ff13922b 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
> >>         attr.config = PERF_COUNT_SW_BPF_OUTPUT;
> >>         attr.type = PERF_TYPE_SOFTWARE;
> >>         attr.sample_type = PERF_SAMPLE_RAW;
> >> -       attr.sample_period = 1;
> >> -       attr.wakeup_events = 1;
> >> +       attr.sample_period = OPTS_GET(opts, wakeup_events, 1);
> >> +       attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1);
> >
> >I suspect the case of
> >
> >LIBBPF_OPTS(perf_buffer_opts, opts);
> >
> >perf_buffer__new(...., &opts);
> >
> >is not handled correctly and you end up with sample_period == wakeup_events == 0
> >
> >Can you please add BPF selftests that's setting wakeup_events to zero
> >and separately to >1?
> >
>
> Hi Andrii,
>
> I'm not sure what we are testing, when you have sample_period ==
> wakeup_events == 0, it basically means to never wakeup, so let's say you
> would wait on the poll_fd infinitely it will never wake you up.
>
> When you have let's say wakeup_event != 0, you will wakeup after the
> ring buffer in the perf buffer has more events than wakeup_events.
>
> I do see your point that if someone is using the macro to build the opts
> they will end with something unexpected, would you like me to treat 0 as
> 1 in that case?

Yes, exactly, I think we should treat zero as 1 and write a test that
this happens. Otherwise it will be very confusing when someone use
perf_buffer_opts for some other future option, and then suddenly
starts getting no notification. If someone really needs wakeup == 0,
they have a fallback plan to use perf_buffer__new_raw_opts(), which is
probably justified for some very specific and advanced uses.

So yes, please add a test with few subtests where we test default opts
(wakeup_after == 1), and wakeup_after > 1.

>
> -- Jon.
>
> >>
> >>         p.attr = &attr;
> >>         p.sample_cb = sample_cb;
> >> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> >> index 8777ff21ea1d..e83c0a915dc7 100644
> >> --- a/tools/lib/bpf/libbpf.h
> >> +++ b/tools/lib/bpf/libbpf.h
> >> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
> >>  /* common use perf buffer options */
> >>  struct perf_buffer_opts {
> >>         size_t sz;
> >> +       __u32 wakeup_events;
> >>  };
> >> -#define perf_buffer_opts__last_field sz
> >> +#define perf_buffer_opts__last_field wakeup_events
> >>
> >>  /**
> >>   * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified
> >> --
> >> 2.39.1
> >>

Andrii Nakryiko Feb. 3, 2023, 9:42 p.m. UTC | #5

On Fri, Feb 3, 2023 at 10:31 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 2/1/23 10:25 PM, Jon Doron wrote:
> > From: Jon Doron <jond@wiz.io>
> >
> > Add option to set when the perf buffer should wake up, by default the
> > perf buffer becomes signaled for every event that is being pushed to it.
> >
> > In case of a high throughput of events it will be more efficient to wake
> > up only once you have X events ready to be read.
> >
> > So your application can wakeup once and drain the entire perf buffer.
> >
> > Signed-off-by: Jon Doron <jond@wiz.io>
> > ---
> >   tools/lib/bpf/libbpf.c | 4 ++--
> >   tools/lib/bpf/libbpf.h | 3 ++-
> >   2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index eed5cec6f510..6b30ff13922b 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
> >       attr.config = PERF_COUNT_SW_BPF_OUTPUT;
> >       attr.type = PERF_TYPE_SOFTWARE;
> >       attr.sample_type = PERF_SAMPLE_RAW;
> > -     attr.sample_period = 1;
> > -     attr.wakeup_events = 1;
> > +     attr.sample_period = OPTS_GET(opts, wakeup_events, 1);
> > +     attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1);
> >
> >       p.attr = &attr;
> >       p.sample_cb = sample_cb;
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index 8777ff21ea1d..e83c0a915dc7 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
> > @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
> >   /* common use perf buffer options */
> >   struct perf_buffer_opts {
> >       size_t sz;
> > +     __u32 wakeup_events;
>
> Since you are adding wakeup_events here, do you think it make sense
> to add sample_period to struct perf_buffer_opts as well? In some cases,
> users might want to have different values for sample_period and
> wakeup_events, e.g., smaller sample_period to accumulate data and
> larger wakeup_events to wakeup user space poll?

It's not clear to me from reading perf_event_open manpage what
"sample_period" means for perf buffer. What will happen when we have
sample_period != wakeup_events?

>
> >   };
> > -#define perf_buffer_opts__last_field sz
> > +#define perf_buffer_opts__last_field wakeup_events
> >
> >   /**
> >    * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified

Yonghong Song Feb. 3, 2023, 11:11 p.m. UTC | #6

On 2/3/23 1:42 PM, Andrii Nakryiko wrote:
> On Fri, Feb 3, 2023 at 10:31 AM Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 2/1/23 10:25 PM, Jon Doron wrote:
>>> From: Jon Doron <jond@wiz.io>
>>>
>>> Add option to set when the perf buffer should wake up, by default the
>>> perf buffer becomes signaled for every event that is being pushed to it.
>>>
>>> In case of a high throughput of events it will be more efficient to wake
>>> up only once you have X events ready to be read.
>>>
>>> So your application can wakeup once and drain the entire perf buffer.
>>>
>>> Signed-off-by: Jon Doron <jond@wiz.io>
>>> ---
>>>    tools/lib/bpf/libbpf.c | 4 ++--
>>>    tools/lib/bpf/libbpf.h | 3 ++-
>>>    2 files changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>> index eed5cec6f510..6b30ff13922b 100644
>>> --- a/tools/lib/bpf/libbpf.c
>>> +++ b/tools/lib/bpf/libbpf.c
>>> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
>>>        attr.config = PERF_COUNT_SW_BPF_OUTPUT;
>>>        attr.type = PERF_TYPE_SOFTWARE;
>>>        attr.sample_type = PERF_SAMPLE_RAW;
>>> -     attr.sample_period = 1;
>>> -     attr.wakeup_events = 1;
>>> +     attr.sample_period = OPTS_GET(opts, wakeup_events, 1);
>>> +     attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1);
>>>
>>>        p.attr = &attr;
>>>        p.sample_cb = sample_cb;
>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>> index 8777ff21ea1d..e83c0a915dc7 100644
>>> --- a/tools/lib/bpf/libbpf.h
>>> +++ b/tools/lib/bpf/libbpf.h
>>> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
>>>    /* common use perf buffer options */
>>>    struct perf_buffer_opts {
>>>        size_t sz;
>>> +     __u32 wakeup_events;
>>
>> Since you are adding wakeup_events here, do you think it make sense
>> to add sample_period to struct perf_buffer_opts as well? In some cases,
>> users might want to have different values for sample_period and
>> wakeup_events, e.g., smaller sample_period to accumulate data and
>> larger wakeup_events to wakeup user space poll?
> 
> It's not clear to me from reading perf_event_open manpage what
> "sample_period" means for perf buffer. What will happen when we have
> sample_period != wakeup_events?

The following is my understanding. Let us sample_period = 10,
wakeup_events = 100.

Every 10 samples, kernel overflow handler is called and the the sample 
data are written into ring buffer.
Every 100 samples, the poll() syscall is waken up to give userspace
a chance to read the data.

In this particular case, it is possible that user space may miss
some samples at the end if no special handling (I forgot what it is).
But if the user space doesn't really care, it might be okay with
such a configuration.

It would be great if some perf expert can clarify whether my
understanding is correct or not.

> 
>>
>>>    };
>>> -#define perf_buffer_opts__last_field sz
>>> +#define perf_buffer_opts__last_field wakeup_events
>>>
>>>    /**
>>>     * @brief **perf_buffer__new()** creates BPF perfbuf manager for a specified

Yonghong Song Feb. 6, 2023, 5:39 a.m. UTC | #7

On 2/3/23 3:11 PM, Yonghong Song wrote:
> 
> 
> On 2/3/23 1:42 PM, Andrii Nakryiko wrote:
>> On Fri, Feb 3, 2023 at 10:31 AM Yonghong Song <yhs@meta.com> wrote:
>>>
>>>
>>>
>>> On 2/1/23 10:25 PM, Jon Doron wrote:
>>>> From: Jon Doron <jond@wiz.io>
>>>>
>>>> Add option to set when the perf buffer should wake up, by default the
>>>> perf buffer becomes signaled for every event that is being pushed to 
>>>> it.
>>>>
>>>> In case of a high throughput of events it will be more efficient to 
>>>> wake
>>>> up only once you have X events ready to be read.
>>>>
>>>> So your application can wakeup once and drain the entire perf buffer.
>>>>
>>>> Signed-off-by: Jon Doron <jond@wiz.io>
>>>> ---
>>>>    tools/lib/bpf/libbpf.c | 4 ++--
>>>>    tools/lib/bpf/libbpf.h | 3 ++-
>>>>    2 files changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>>> index eed5cec6f510..6b30ff13922b 100644
>>>> --- a/tools/lib/bpf/libbpf.c
>>>> +++ b/tools/lib/bpf/libbpf.c
>>>> @@ -11719,8 +11719,8 @@ struct perf_buffer *perf_buffer__new(int 
>>>> map_fd, size_t page_cnt,
>>>>        attr.config = PERF_COUNT_SW_BPF_OUTPUT;
>>>>        attr.type = PERF_TYPE_SOFTWARE;
>>>>        attr.sample_type = PERF_SAMPLE_RAW;
>>>> -     attr.sample_period = 1;
>>>> -     attr.wakeup_events = 1;
>>>> +     attr.sample_period = OPTS_GET(opts, wakeup_events, 1);
>>>> +     attr.wakeup_events = OPTS_GET(opts, wakeup_events, 1);
>>>>
>>>>        p.attr = &attr;
>>>>        p.sample_cb = sample_cb;
>>>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>>>> index 8777ff21ea1d..e83c0a915dc7 100644
>>>> --- a/tools/lib/bpf/libbpf.h
>>>> +++ b/tools/lib/bpf/libbpf.h
>>>> @@ -1246,8 +1246,9 @@ typedef void (*perf_buffer_lost_fn)(void *ctx, 
>>>> int cpu, __u64 cnt);
>>>>    /* common use perf buffer options */
>>>>    struct perf_buffer_opts {
>>>>        size_t sz;
>>>> +     __u32 wakeup_events;
>>>
>>> Since you are adding wakeup_events here, do you think it make sense
>>> to add sample_period to struct perf_buffer_opts as well? In some cases,
>>> users might want to have different values for sample_period and
>>> wakeup_events, e.g., smaller sample_period to accumulate data and
>>> larger wakeup_events to wakeup user space poll?
>>
>> It's not clear to me from reading perf_event_open manpage what
>> "sample_period" means for perf buffer. What will happen when we have
>> sample_period != wakeup_events?
> 
> The following is my understanding. Let us sample_period = 10,
> wakeup_events = 100.
> 
> Every 10 samples, kernel overflow handler is called and the the sample 
> data are written into ring buffer.
> Every 100 samples, the poll() syscall is waken up to give userspace
> a chance to read the data.
> 
> In this particular case, it is possible that user space may miss
> some samples at the end if no special handling (I forgot what it is).
> But if the user space doesn't really care, it might be okay with
> such a configuration.
> 
> It would be great if some perf expert can clarify whether my
> understanding is correct or not.

Think twice. I am okay with just one wakeup_events/sample_period
in the perf_buffer_opts since wakeup_events == sample_period
is the most common use case. The same as before wakeup_events =
sample_period = 1.

But it is apparent that sample_period is much easier to understand
than wakeup_events. Can we put sample_period in struct
perf_buffer_opts instead of wakeup_events?

> 
>>
>>>
>>>>    };
>>>> -#define perf_buffer_opts__last_field sz
>>>> +#define perf_buffer_opts__last_field wakeup_events
>>>>
>>>>    /**
>>>>     * @brief **perf_buffer__new()** creates BPF perfbuf manager for 
>>>> a specified

[bpf-next,v2] libbpf: Add wakeup_events to creation options

Checks

Commit Message

Comments

Patch