| Field | Value |
|---|---|
| Message ID | 20231214043010.3458072-2-houtao@huaweicloud.com (mailing list archive) |
| State | Accepted |
| Commit | 8f82583f9527b3be9d70d9a5d1f33435e29d0480 |
| Delegated to | BPF |
| Series | bpf: Use GFP_KERNEL in bpf_event_entry_gen() |
Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
> ops->map_fd_put_ptr(), so don't use rcu-read-lock for these two
> callbacks.
>
> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
> rcu-read-lock because array->ptrs must still be allocated. For
> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
> rcu_read_lock() during the invocation of htab_map_update_elem().
>
> Acked-by: Yonghong Song <yonghong.song@linux.dev>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/hashtab.c | 6 ++++++
>  kernel/bpf/syscall.c | 4 ----
>  2 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 5b9146fa825f..ec3bdcc6a3cf 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>  	if (IS_ERR(ptr))
>  		return PTR_ERR(ptr);
>
> +	/* The htab bucket lock is always held during update operations in fd
> +	 * htab map, and the following rcu_read_lock() is only used to avoid
> +	 * the WARN_ON_ONCE in htab_map_update_elem().
> +	 */
> +	rcu_read_lock();
>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
> +	rcu_read_unlock();

Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
looks like there are two ways to get to htab_map_update_elem(): either
through a syscall and the path here (bpf_fd_htab_map_update_elem), or
through a BPF program calling bpf_map_update_elem(). In the BPF_CALL
case bpf_map_update_elem() already has

  WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())

htab_map_update_elem() has an additional check for
rcu_read_lock_trace_held(), but I'm not sure where that one is coming
from at the moment. Can it be added to the BPF caller side if needed?

Did I miss some caller path?

>  	if (ret)
>  		map->ops->map_fd_put_ptr(map, ptr, false);
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d63c1ed42412..3fcf7741146a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>  		err = bpf_percpu_cgroup_storage_update(map, key, value,
>  						       flags);
>  	} else if (IS_FD_ARRAY(map)) {
> -		rcu_read_lock();
>  		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>  						   flags);
> -		rcu_read_unlock();
>  	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
> -		rcu_read_lock();
>  		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>  						  flags);
> -		rcu_read_unlock();
>  	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
>  		/* rcu_read_lock() is not needed */
>  		err = bpf_fd_reuseport_array_update_elem(map, key, value,

Any reason to leave the last rcu_read_lock() on the 'else {}' case? If
the rule is that we have a reference to the map through the file
fdget(), and any concurrent runners need some locking (xchg) to handle
the update, a rcu_read_lock() won't help there.

I didn't audit all the update flows tonight though.

> --
> 2.29.2
Hi,

On 12/14/2023 2:22 PM, John Fastabend wrote:
> Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
[...]
>> +	/* The htab bucket lock is always held during update operations in fd
>> +	 * htab map, and the following rcu_read_lock() is only used to avoid
>> +	 * the WARN_ON_ONCE in htab_map_update_elem().
>> +	 */
>> +	rcu_read_lock();
>>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
>> +	rcu_read_unlock();
> Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> looks like there are two ways to get to htab_map_update_elem(): either
> through a syscall and the path here (bpf_fd_htab_map_update_elem), or
> through a BPF program calling bpf_map_update_elem(). In the BPF_CALL
> case bpf_map_update_elem() already has
>
>   WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
>
> htab_map_update_elem() has an additional check for
> rcu_read_lock_trace_held(), but I'm not sure where that one is coming
> from at the moment. Can it be added to the BPF caller side if needed?
>
> Did I miss some caller path?

No. But I think the main reason for the extra WARN in
bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
the verifier in do_misc_fixups(), so the WARN_ON_ONCE in
bpf_map_update_elem() will never be invoked. For
rcu_read_lock_trace_held(), I added the assertion recently in
bpf_map_delete_elem() in commit 169410eba271 ("bpf: Check
rcu_read_lock_trace_held() before calling bpf map helpers").

[...]
> Any reason to leave the last rcu_read_lock() on the 'else {}' case? If
> the rule is that we have a reference to the map through the file
> fdget(), and any concurrent runners need some locking (xchg) to handle
> the update, a rcu_read_lock() won't help there.
>
> I didn't audit all the update flows tonight though.

It seems it is still necessary for htab and local storage. For a normal
htab, it is possible the update is done without taking the bucket lock
(in-place replace), so an RCU read-side critical section is needed to
guarantee the iteration is still safe. And for local storage (e.g. cgrp
local storage), it may also do an in-place update through lookup and
then update. We could fold the calling of rcu_read_lock() into
.map_update_elem() if necessary.
On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 12/14/2023 2:22 PM, John Fastabend wrote:
> > Hou Tao wrote:
> > [...]
> > Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()?
> > [...]
> > Did I miss some caller path?
>
> No. But I think the main reason for the extra WARN in
> bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
> the verifier in do_misc_fixups(), so the WARN_ON_ONCE in
> bpf_map_update_elem() will never be invoked. For
> rcu_read_lock_trace_held(), I added the assertion recently in
> bpf_map_delete_elem() in commit 169410eba271 ("bpf: Check
> rcu_read_lock_trace_held() before calling bpf map helpers").

Yep.
We should probably remove the WARN_ONs from bpf_map_update_elem() and
the others in kernel/bpf/helpers.c, since they are inlined by the
verifier with 99% probability, so the WARNs are never called even on
DEBUG kernels and only confuse developers, as this thread shows.

We can replace them with a comment that explains this inlining logic
and where the real WARNs are.
Alexei Starovoitov wrote:
> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@huaweicloud.com> wrote:
> >
> > [...]
> > >> +	/* The htab bucket lock is always held during update operations in fd
> > >> +	 * htab map, and the following rcu_read_lock() is only used to avoid
> > >> +	 * the WARN_ON_ONCE in htab_map_update_elem().
> > >> +	 */

Ah, OK, but isn't this comment wrong? You do need the RCU read lock to
do the walk with lookup_nulls_elem_raw(), where there is no lock being
held, and then the subsequent copy in place is fine because you do have
a lock.

So it's not just to appease the WARN_ON_ONCE here; it has an actual
real need?

> > [...]
> > No. But I think the main reason for the extra WARN in
> > bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
> > the verifier in do_misc_fixups(), so the WARN_ON_ONCE in
> > bpf_map_update_elem() will never be invoked. For
> > rcu_read_lock_trace_held(), I added the assertion recently in
> > bpf_map_delete_elem() in commit 169410eba271 ("bpf: Check
> > rcu_read_lock_trace_held() before calling bpf map helpers").
>
> Yep.
> We should probably remove the WARN_ONs from bpf_map_update_elem() and
> the others in kernel/bpf/helpers.c, since they are inlined by the
> verifier with 99% probability, so the WARNs are never called even on
> DEBUG kernels and only confuse developers, as this thread shows.

Agree. The rcu_read needs to be as close as possible to where it's
actually needed, and the WARN_ON_ONCE should be dropped if it's going
to be inlined.

> We can replace them with a comment that explains this inlining logic
> and where the real WARNs are.
On Thu, Dec 14, 2023 at 11:15 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> [...]
>
> Ah, OK, but isn't this comment wrong? You do need the RCU read lock to
> do the walk with lookup_nulls_elem_raw(), where there is no lock being
> held, and then the subsequent copy in place is fine because you do
> have a lock.

Ohh. You're correct.
Not sure what I was thinking.

Hou,
could you please send a follow-up to undo my braino.
Hi,

On 12/15/2023 11:23 AM, Alexei Starovoitov wrote:
> On Thu, Dec 14, 2023 at 11:15 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
>> [...]
>> Ah, OK, but isn't this comment wrong? You do need the RCU read lock
>> to do the walk with lookup_nulls_elem_raw(), where there is no lock
>> being held, and then the subsequent copy in place is fine because you
>> do have a lock.
> Ohh. You're correct.
> Not sure what I was thinking.
>
> Hou,
> could you please send a follow-up to undo my braino.

Er, I don't follow. There is no spin-lock support in the fd htab map,
so htab_map_update_elem() won't call lookup_nulls_elem_raw(); instead
it will lock the bucket and invoke lookup_elem_raw(). So I don't think
rcu_read_lock() is really needed for the invocation of
htab_map_update_elem(), except to make the WARN_ON_ONCE() happy.
Hi,

On 12/15/2023 3:15 AM, John Fastabend wrote:
> Alexei Starovoitov wrote:
>> [...]
>> Yep.
>> We should probably remove the WARN_ONs from bpf_map_update_elem() and
>> the others in kernel/bpf/helpers.c, since they are inlined by the
>> verifier with 99% probability, so the WARNs are never called even on
>> DEBUG kernels and only confuse developers, as this thread shows.
> Agree. The rcu_read needs to be as close as possible to where it's
> actually needed, and the WARN_ON_ONCE should be dropped if it's going
> to be inlined.

I did some investigation on these bpf map helpers and the
implementations of these helpers in the various kinds of bpf map. It
seems most implementations (besides dev_map_hash_ops) have already
added proper RCU lock assertions, so I think it is indeed OK to remove
the WARN_ON_ONCE() in these bpf map helpers after fixing the assertion
in dev_map_hash_ops. The details follow:

1. bpf_map_lookup_elem helper
   (a) hash/lru_hash/percpu_hash/lru_percpu_hash: checks
       !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
       !rcu_read_lock_bh_held() in __htab_map_lookup_elem()
   (b) array/percpu_array: no deletion, so no RCU
   (c) lpm_trie: checks rcu_read_lock_bh_held() in trie_lookup_elem()
   (d) htab_of_maps: checks !rcu_read_lock_held() &&
       !rcu_read_lock_trace_held() && !rcu_read_lock_bh_held() in
       __htab_map_lookup_elem()
   (e) array_of_maps: no deletion, so no RCU
   (f) sockmap: rcu_read_lock_held() in __sock_map_lookup_elem()
   (g) sockhash: rcu_read_lock_held() in __sock_hash_lookup_elem()
   (h) devmap: rcu_read_lock_bh_held() in __dev_map_lookup_elem()
   (i) devmap_hash (incorrect assertion): no rcu_read_lock_bh_held()
       in __dev_map_hash_lookup_elem()
   (j) xskmap: rcu_read_lock_bh_held() in __xsk_map_lookup_elem()

2. bpf_map_update_elem helper
   (a) hash/lru_hash/percpu_hash/lru_percpu_hash: checks
       !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
       !rcu_read_lock_bh_held() in htab_map_update_elem() /
       htab_lru_map_update_elem() / __htab_percpu_map_update_elem() /
       __htab_lru_percpu_map_update_elem()
   (b) array/percpu_array: no RCU
   (c) lpm_trie: uses spin-lock, and no RCU
   (d) sockmap: uses spin-lock, and rcu_read_lock_held() in
       sock_map_update_common()
   (e) sockhash: uses spin-lock, and rcu_read_lock_held() in
       sock_hash_update_common()

3. bpf_map_delete_elem helper
   (a) hash/lru_hash/percpu_hash/lru_percpu_hash: checks
       !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
       !rcu_read_lock_bh_held() in htab_map_delete_elem() /
       htab_lru_map_delete_elem()
   (b) array/percpu_array: not supported
   (c) lpm_trie: uses spin-lock, no RCU
   (d) sockmap: uses spin-lock
   (e) sockhash: uses spin-lock

4. bpf_map_lookup_percpu_elem helper
   (a) percpu_hash/lru_percpu_hash: checks !rcu_read_lock_held() &&
       !rcu_read_lock_trace_held() && !rcu_read_lock_bh_held() in
       __htab_map_lookup_elem()
   (b) percpu_array: no deletion, no RCU

>> We can replace them with a comment that explains this inlining logic
>> and where the real WARNs are.
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 5b9146fa825f..ec3bdcc6a3cf 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 
+	/* The htab bucket lock is always held during update operations in fd
+	 * htab map, and the following rcu_read_lock() is only used to avoid
+	 * the WARN_ON_ONCE in htab_map_update_elem().
+	 */
+	rcu_read_lock();
 	ret = htab_map_update_elem(map, key, &ptr, map_flags);
+	rcu_read_unlock();
 
 	if (ret)
 		map->ops->map_fd_put_ptr(map, ptr, false);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d63c1ed42412..3fcf7741146a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
 		err = bpf_percpu_cgroup_storage_update(map, key, value,
 						       flags);
 	} else if (IS_FD_ARRAY(map)) {
-		rcu_read_lock();
 		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
 						   flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
-		rcu_read_lock();
 		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
 						  flags);
-		rcu_read_unlock();
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		/* rcu_read_lock() is not needed */
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,