Message ID | 20220808155341.2479054-2-void@manifault.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | bpf: Add user-space-publisher ringbuffer map type | expand |
On Mon, Aug 8, 2022 at 8:54 AM David Vernet <void@manifault.com> wrote: > > We want to support a ringbuf map type where samples are published from > user-space to BPF programs. BPF currently supports a kernel -> user-space > circular ringbuffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to > define a new map type for user-space -> kernel, as none of the helpers > exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer > ringbuffer, and we'll want to add one or more helper functions that would > not apply for a kernel-producer ringbuffer. > > This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type > definition. The map type is useless in its current form, as there is no way > to access or use it for anything until we add more BPF helpers. A follow-on > patch will therefore add a new helper function that allows BPF programs to > run callbacks on samples that are published to the ringbuffer. > > Signed-off-by: David Vernet <void@manifault.com> > --- > include/linux/bpf_types.h | 1 + > include/uapi/linux/bpf.h | 1 + > kernel/bpf/ringbuf.c | 70 +++++++++++++++++++++++++++++----- > kernel/bpf/verifier.c | 3 ++ > tools/include/uapi/linux/bpf.h | 1 + > tools/lib/bpf/libbpf.c | 1 + > 6 files changed, 68 insertions(+), 9 deletions(-) > [...] > > -static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma) > +static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma, > + bool kernel_producer) > { > struct bpf_ringbuf_map *rb_map; > > rb_map = container_of(map, struct bpf_ringbuf_map, map); > > if (vma->vm_flags & VM_WRITE) { > - /* allow writable mapping for the consumer_pos only */ > - if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE) > + if (kernel_producer) { > + /* allow writable mapping for the consumer_pos only */ > + if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE) > + return -EPERM; > + /* For user ringbufs, disallow writable mappings to the > + * consumer pointer, and allow writable mappings to both the > + * producer position, and the ring buffer data itself. > + */ > + } else if (vma->vm_pgoff == 0) > return -EPERM; the asymmetrical use of {} in one if branch and not using them in another is extremely confusing, please don't do that the way you put big comment inside the wrong if branch also throws me off, maybe move it before return -EPERM instead with proper indentation? sorry for nitpicks, but I've been stuck for a few minutes trying to figure out what exactly is happening here :) > } else { > vma->vm_flags &= ~VM_MAYWRITE; > @@ -242,6 +271,16 @@ static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma) > vma->vm_pgoff + RINGBUF_PGOFF); > } > > +static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma) > +{ > + return ringbuf_map_mmap(map, vma, true); > +} > + > +static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma) > +{ > + return ringbuf_map_mmap(map, vma, false); > +} I wouldn't mind if you just have two separate implementations of ringbuf_map_mmap for _kern and _user cases, tbh, probably would be clearer as well > + > static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb) > { > unsigned long cons_pos, prod_pos; > @@ -269,7 +308,7 @@ const struct bpf_map_ops ringbuf_map_ops = { > .map_meta_equal = bpf_map_meta_equal, > .map_alloc = ringbuf_map_alloc, > .map_free = ringbuf_map_free, > - .map_mmap = ringbuf_map_mmap, > + .map_mmap = ringbuf_map_mmap_kern, > .map_poll = ringbuf_map_poll, > .map_lookup_elem = ringbuf_map_lookup_elem, > .map_update_elem = ringbuf_map_update_elem, > @@ -278,6 +317,19 @@ const struct bpf_map_ops ringbuf_map_ops = { > .map_btf_id = &ringbuf_map_btf_ids[0], > }; > [...]
On Mon, Aug 8, 2022 at 8:54 AM David Vernet <void@manifault.com> wrote: > > We want to support a ringbuf map type where samples are published from > user-space to BPF programs. BPF currently supports a kernel -> user-space > circular ringbuffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to > define a new map type for user-space -> kernel, as none of the helpers > exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer > ringbuffer, and we'll want to add one or more helper functions that would > not apply for a kernel-producer ringbuffer. > > This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type > definition. The map type is useless in its current form, as there is no way > to access or use it for anything until we add more BPF helpers. A follow-on > patch will therefore add a new helper function that allows BPF programs to > run callbacks on samples that are published to the ringbuffer. > > Signed-off-by: David Vernet <void@manifault.com> > --- > include/linux/bpf_types.h | 1 + > include/uapi/linux/bpf.h | 1 + > kernel/bpf/ringbuf.c | 70 +++++++++++++++++++++++++++++----- > kernel/bpf/verifier.c | 3 ++ > tools/include/uapi/linux/bpf.h | 1 + > tools/lib/bpf/libbpf.c | 1 + > 6 files changed, 68 insertions(+), 9 deletions(-) > > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h > index 2b9112b80171..2c6a4f2562a7 100644 > --- a/include/linux/bpf_types.h > +++ b/include/linux/bpf_types.h > @@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) > #endif > BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) > BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) > +BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops) > > BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) > BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing) > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 7bf9ba1329be..a341f877b230 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -909,6 +909,7 @@ enum bpf_map_type { > BPF_MAP_TYPE_INODE_STORAGE, > BPF_MAP_TYPE_TASK_STORAGE, > BPF_MAP_TYPE_BLOOM_FILTER, > + BPF_MAP_TYPE_USER_RINGBUF, > }; > > /* Note that tracing related programs such as > diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c > index ded4faeca192..29e2de42df15 100644 > --- a/kernel/bpf/ringbuf.c > +++ b/kernel/bpf/ringbuf.c > @@ -38,12 +38,32 @@ struct bpf_ringbuf { > struct page **pages; > int nr_pages; > spinlock_t spinlock ____cacheline_aligned_in_smp; > - /* Consumer and producer counters are put into separate pages to allow > - * mapping consumer page as r/w, but restrict producer page to r/o. > - * This protects producer position from being modified by user-space > - * application and ruining in-kernel position tracking. > + /* Consumer and producer counters are put into separate pages to > + * allow each position to be mapped with different permissions. > + * This prevents a user-space application from modifying the > + * position and ruining in-kernel tracking. The permissions of the > + * pages depend on who is producing samples: user-space or the > + * kernel. > + * > + * Kernel-producer > + * --------------- > + * The producer position and data pages are mapped as r/o in > + * userspace. For this approach, bits in the header of samples are > + * used to signal to user-space, and to other producers, whether a > + * sample is currently being written. > + * > + * User-space producer > + * ------------------- > + * Only the page containing the consumer position, and whether the > + * ringbuffer is currently being consumed via a 'busy' bit, are > + * mapped r/o in user-space. Sample headers may not be used to > + * communicate any information between kernel consumers, as a > + * user-space application could modify its contents at any time. > */ > - unsigned long consumer_pos __aligned(PAGE_SIZE); > + struct { > + unsigned long consumer_pos; > + atomic_t busy; one more thing, why does busy have to be exposed into user-space mapped memory at all? Can't it be just a private variable in bpf_ringbuf? > + } __aligned(PAGE_SIZE); > unsigned long producer_pos __aligned(PAGE_SIZE); > char data[] __aligned(PAGE_SIZE); > }; [...]
On Thu, Aug 11, 2022 at 04:22:50PM -0700, Andrii Nakryiko wrote: > On Mon, Aug 8, 2022 at 8:54 AM David Vernet <void@manifault.com> wrote: > > > > We want to support a ringbuf map type where samples are published from > > user-space to BPF programs. BPF currently supports a kernel -> user-space > > circular ringbuffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to > > define a new map type for user-space -> kernel, as none of the helpers > > exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer > > ringbuffer, and we'll want to add one or more helper functions that would > > not apply for a kernel-producer ringbuffer. > > > > This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type > > definition. The map type is useless in its current form, as there is no way > > to access or use it for anything until we add more BPF helpers. A follow-on > > patch will therefore add a new helper function that allows BPF programs to > > run callbacks on samples that are published to the ringbuffer. > > > > Signed-off-by: David Vernet <void@manifault.com> > > --- > > include/linux/bpf_types.h | 1 + > > include/uapi/linux/bpf.h | 1 + > > kernel/bpf/ringbuf.c | 70 +++++++++++++++++++++++++++++----- > > kernel/bpf/verifier.c | 3 ++ > > tools/include/uapi/linux/bpf.h | 1 + > > tools/lib/bpf/libbpf.c | 1 + > > 6 files changed, 68 insertions(+), 9 deletions(-) > > > > [...] > > > > > -static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma) > > +static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma, > > + bool kernel_producer) > > { > > struct bpf_ringbuf_map *rb_map; > > > > rb_map = container_of(map, struct bpf_ringbuf_map, map); > > > > if (vma->vm_flags & VM_WRITE) { > > - /* allow writable mapping for the consumer_pos only */ > > - if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE) > > + if (kernel_producer) { > > + /* allow writable mapping for the consumer_pos only */ > > + if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE) > > + return -EPERM; > > + /* For user ringbufs, disallow writable mappings to the > > + * consumer pointer, and allow writable mappings to both the > > + * producer position, and the ring buffer data itself. > > + */ > > + } else if (vma->vm_pgoff == 0) > > return -EPERM; > > the asymmetrical use of {} in one if branch and not using them in > another is extremely confusing, please don't do that Ah, sorry, I misread the style guide and thought this was stipulated there, but I now see that it's explicitly stated that you should include a brace if only one branch is in a single statement. I'll fix this (by splitting these into their own implementations, as mentioned below). > the way you put big comment inside the wrong if branch also throws me > off, maybe move it before return -EPERM instead with proper > indentation? Yeah, fair enough. > sorry for nitpicks, but I've been stuck for a few minutes trying to > figure out what exactly is happening here :) Not a problem at all, sorry for the less-than-readable code. > > } else { > > vma->vm_flags &= ~VM_MAYWRITE; > > @@ -242,6 +271,16 @@ static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma) > > vma->vm_pgoff + RINGBUF_PGOFF); > > } > > > > +static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma) > > +{ > > + return ringbuf_map_mmap(map, vma, true); > > +} > > + > > +static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma) > > +{ > > + return ringbuf_map_mmap(map, vma, false); > > +} > > I wouldn't mind if you just have two separate implementations of > ringbuf_map_mmap for _kern and _user cases, tbh, probably would be > clearer as well Yeah, I can do this. I was trying to avoid any copy-pasta at all cost, but I think it's doing more harm than good. I'll just split them into totally separate implementations. > > + > > static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb) > > { > > unsigned long cons_pos, prod_pos; > > @@ -269,7 +308,7 @@ const struct bpf_map_ops ringbuf_map_ops = { > > .map_meta_equal = bpf_map_meta_equal, > > .map_alloc = ringbuf_map_alloc, > > .map_free = ringbuf_map_free, > > - .map_mmap = ringbuf_map_mmap, > > + .map_mmap = ringbuf_map_mmap_kern, > > .map_poll = ringbuf_map_poll, > > .map_lookup_elem = ringbuf_map_lookup_elem, > > .map_update_elem = ringbuf_map_update_elem, > > @@ -278,6 +317,19 @@ const struct bpf_map_ops ringbuf_map_ops = { > > .map_btf_id = &ringbuf_map_btf_ids[0], > > }; > > > > [...]
On Thu, Aug 11, 2022 at 04:29:02PM -0700, Andrii Nakryiko wrote: [...] > > - /* Consumer and producer counters are put into separate pages to allow > > - * mapping consumer page as r/w, but restrict producer page to r/o. > > - * This protects producer position from being modified by user-space > > - * application and ruining in-kernel position tracking. > > + /* Consumer and producer counters are put into separate pages to > > + * allow each position to be mapped with different permissions. > > + * This prevents a user-space application from modifying the > > + * position and ruining in-kernel tracking. The permissions of the > > + * pages depend on who is producing samples: user-space or the > > + * kernel. > > + * > > + * Kernel-producer > > + * --------------- > > + * The producer position and data pages are mapped as r/o in > > + * userspace. For this approach, bits in the header of samples are > > + * used to signal to user-space, and to other producers, whether a > > + * sample is currently being written. > > + * > > + * User-space producer > > + * ------------------- > > + * Only the page containing the consumer position, and whether the > > + * ringbuffer is currently being consumed via a 'busy' bit, are > > + * mapped r/o in user-space. Sample headers may not be used to > > + * communicate any information between kernel consumers, as a > > + * user-space application could modify its contents at any time. > > */ > > - unsigned long consumer_pos __aligned(PAGE_SIZE); > > + struct { > > + unsigned long consumer_pos; > > + atomic_t busy; > > one more thing, why does busy have to be exposed into user-space > mapped memory at all? Can't it be just a private variable in > bpf_ringbuf? It could be moved elsewhere in the struct. I put it here to avoid increasing the size of struct bpf_ringbuf unnecessarily, as we had all of this extra space on the consumer_pos page. Specifically, I was trying to avoid taxing kernel-producer ringbuffers. If you'd prefer, I can just put it elsewhere in the struct.
On Fri, Aug 12, 2022 at 9:23 AM David Vernet <void@manifault.com> wrote: > > On Thu, Aug 11, 2022 at 04:29:02PM -0700, Andrii Nakryiko wrote: > > [...] > > > > - /* Consumer and producer counters are put into separate pages to allow > > > - * mapping consumer page as r/w, but restrict producer page to r/o. > > > - * This protects producer position from being modified by user-space > > > - * application and ruining in-kernel position tracking. > > > + /* Consumer and producer counters are put into separate pages to > > > + * allow each position to be mapped with different permissions. > > > + * This prevents a user-space application from modifying the > > > + * position and ruining in-kernel tracking. The permissions of the > > > + * pages depend on who is producing samples: user-space or the > > > + * kernel. > > > + * > > > + * Kernel-producer > > > + * --------------- > > > + * The producer position and data pages are mapped as r/o in > > > + * userspace. For this approach, bits in the header of samples are > > > + * used to signal to user-space, and to other producers, whether a > > > + * sample is currently being written. > > > + * > > > + * User-space producer > > > + * ------------------- > > > + * Only the page containing the consumer position, and whether the > > > + * ringbuffer is currently being consumed via a 'busy' bit, are > > > + * mapped r/o in user-space. Sample headers may not be used to > > > + * communicate any information between kernel consumers, as a > > > + * user-space application could modify its contents at any time. > > > */ > > > - unsigned long consumer_pos __aligned(PAGE_SIZE); > > > + struct { > > > + unsigned long consumer_pos; > > > + atomic_t busy; > > > > one more thing, why does busy have to be exposed into user-space > > mapped memory at all? Can't it be just a private variable in > > bpf_ringbuf? > > It could be moved elsewhere in the struct. I put it here to avoid > increasing the size of struct bpf_ringbuf unnecessarily, as we had all of > this extra space on the consumer_pos page. Specifically, I was trying to > avoid taxing kernel-producer ringbuffers. If you'd prefer, I can just put > it elsewhere in the struct. Yes, let's move. 8 byte increase is not a problem, while exposing internals into user-visible memory page is at the very least is unclean.
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 2b9112b80171..2c6a4f2562a7 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops) BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 7bf9ba1329be..a341f877b230 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -909,6 +909,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_USER_RINGBUF, }; /* Note that tracing related programs such as diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c index ded4faeca192..29e2de42df15 100644 --- a/kernel/bpf/ringbuf.c +++ b/kernel/bpf/ringbuf.c @@ -38,12 +38,32 @@ struct bpf_ringbuf { struct page **pages; int nr_pages; spinlock_t spinlock ____cacheline_aligned_in_smp; - /* Consumer and producer counters are put into separate pages to allow - * mapping consumer page as r/w, but restrict producer page to r/o. - * This protects producer position from being modified by user-space - * application and ruining in-kernel position tracking. + /* Consumer and producer counters are put into separate pages to + * allow each position to be mapped with different permissions. + * This prevents a user-space application from modifying the + * position and ruining in-kernel tracking. The permissions of the + * pages depend on who is producing samples: user-space or the + * kernel. + * + * Kernel-producer + * --------------- + * The producer position and data pages are mapped as r/o in + * userspace. For this approach, bits in the header of samples are + * used to signal to user-space, and to other producers, whether a + * sample is currently being written. + * + * User-space producer + * ------------------- + * Only the page containing the consumer position, and whether the + * ringbuffer is currently being consumed via a 'busy' bit, are + * mapped r/o in user-space. Sample headers may not be used to + * communicate any information between kernel consumers, as a + * user-space application could modify its contents at any time. */ - unsigned long consumer_pos __aligned(PAGE_SIZE); + struct { + unsigned long consumer_pos; + atomic_t busy; + } __aligned(PAGE_SIZE); unsigned long producer_pos __aligned(PAGE_SIZE); char data[] __aligned(PAGE_SIZE); }; @@ -141,6 +161,7 @@ static struct bpf_ringbuf *bpf_ringbuf_alloc(size_t data_sz, int numa_node) rb->mask = data_sz - 1; rb->consumer_pos = 0; + atomic_set(&rb->busy, 0); rb->producer_pos = 0; return rb; @@ -224,15 +245,23 @@ static int ringbuf_map_get_next_key(struct bpf_map *map, void *key, return -ENOTSUPP; } -static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma) +static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma, + bool kernel_producer) { struct bpf_ringbuf_map *rb_map; rb_map = container_of(map, struct bpf_ringbuf_map, map); if (vma->vm_flags & VM_WRITE) { - /* allow writable mapping for the consumer_pos only */ - if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE) + if (kernel_producer) { + /* allow writable mapping for the consumer_pos only */ + if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE) + return -EPERM; + /* For user ringbufs, disallow writable mappings to the + * consumer pointer, and allow writable mappings to both the + * producer position, and the ring buffer data itself. + */ + } else if (vma->vm_pgoff == 0) return -EPERM; } else { vma->vm_flags &= ~VM_MAYWRITE; @@ -242,6 +271,16 @@ static int ringbuf_map_mmap(struct bpf_map *map, struct vm_area_struct *vma) vma->vm_pgoff + RINGBUF_PGOFF); } +static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma) +{ + return ringbuf_map_mmap(map, vma, true); +} + +static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma) +{ + return ringbuf_map_mmap(map, vma, false); +} + static unsigned long ringbuf_avail_data_sz(struct bpf_ringbuf *rb) { unsigned long cons_pos, prod_pos; @@ -269,7 +308,7 @@ const struct bpf_map_ops ringbuf_map_ops = { .map_meta_equal = bpf_map_meta_equal, .map_alloc = ringbuf_map_alloc, .map_free = ringbuf_map_free, - .map_mmap = ringbuf_map_mmap, + .map_mmap = ringbuf_map_mmap_kern, .map_poll = ringbuf_map_poll, .map_lookup_elem = ringbuf_map_lookup_elem, .map_update_elem = ringbuf_map_update_elem, @@ -278,6 +317,19 @@ const struct bpf_map_ops ringbuf_map_ops = { .map_btf_id = &ringbuf_map_btf_ids[0], }; +BTF_ID_LIST_SINGLE(user_ringbuf_map_btf_ids, struct, bpf_ringbuf_map) +const struct bpf_map_ops user_ringbuf_map_ops = { + .map_meta_equal = bpf_map_meta_equal, + .map_alloc = ringbuf_map_alloc, + .map_free = ringbuf_map_free, + .map_mmap = ringbuf_map_mmap_user, + .map_lookup_elem = ringbuf_map_lookup_elem, + .map_update_elem = ringbuf_map_update_elem, + .map_delete_elem = ringbuf_map_delete_elem, + .map_get_next_key = ringbuf_map_get_next_key, + .map_btf_id = &user_ringbuf_map_btf_ids[0], +}; + /* Given pointer to ring buffer record metadata and struct bpf_ringbuf itself, * calculate offset from record metadata to ring buffer in pages, rounded * down. This page offset is stored as part of record metadata and allows to diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 938ba1536249..4a9562c62af0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -6196,6 +6196,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, func_id != BPF_FUNC_ringbuf_discard_dynptr) goto error; break; + case BPF_MAP_TYPE_USER_RINGBUF: + goto error; case BPF_MAP_TYPE_STACK_TRACE: if (func_id != BPF_FUNC_get_stackid) goto error; @@ -12681,6 +12683,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, } break; case BPF_MAP_TYPE_RINGBUF: + case BPF_MAP_TYPE_USER_RINGBUF: case BPF_MAP_TYPE_INODE_STORAGE: case BPF_MAP_TYPE_SK_STORAGE: case BPF_MAP_TYPE_TASK_STORAGE: diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 59a217ca2dfd..ce0ce713faf9 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -909,6 +909,7 @@ enum bpf_map_type { BPF_MAP_TYPE_INODE_STORAGE, BPF_MAP_TYPE_TASK_STORAGE, BPF_MAP_TYPE_BLOOM_FILTER, + BPF_MAP_TYPE_USER_RINGBUF, }; /* Note that tracing related programs such as diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 77e3797cf75a..9c1f2d09f44e 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -163,6 +163,7 @@ static const char * const map_type_name[] = { [BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage", [BPF_MAP_TYPE_TASK_STORAGE] = "task_storage", [BPF_MAP_TYPE_BLOOM_FILTER] = "bloom_filter", + [BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf", }; static const char * const prog_type_name[] = {
We want to support a ringbuf map type where samples are published from user-space to BPF programs. BPF currently supports a kernel -> user-space circular ringbuffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to define a new map type for user-space -> kernel, as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer ringbuffer, and we'll want to add one or more helper functions that would not apply for a kernel-producer ringbuffer. This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type definition. The map type is useless in its current form, as there is no way to access or use it for anything until we add more BPF helpers. A follow-on patch will therefore add a new helper function that allows BPF programs to run callbacks on samples that are published to the ringbuffer. Signed-off-by: David Vernet <void@manifault.com> --- include/linux/bpf_types.h | 1 + include/uapi/linux/bpf.h | 1 + kernel/bpf/ringbuf.c | 70 +++++++++++++++++++++++++++++----- kernel/bpf/verifier.c | 3 ++ tools/include/uapi/linux/bpf.h | 1 + tools/lib/bpf/libbpf.c | 1 + 6 files changed, 68 insertions(+), 9 deletions(-)