Message ID | 20250204082848.13471-3-hotforest@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | bpf: Overwrite the htab element atomically | expand |
+cc Cody Haas Sorry for the resend. I sent the reply in the HTML format. On 2/4/2025 4:28 PM, Hou Tao wrote: > Currently, the update of existing element in hash map involves two > steps: > 1) insert the new element at the head of the hash list > 2) remove the old element > > It is possible that the concurrent lookup operation may fail to find > either the old element or the new element if the lookup operation starts > before the addition and continues after the removal. > > Therefore, replacing the two-step update with an atomic update. After > the change, the update will be atomic in the perspective of the lookup > operation: it will either find the old element or the new element. > > Signed-off-by: Hou Tao <hotforest@gmail.com> > --- > kernel/bpf/hashtab.c | 14 ++++++++------ > 1 file changed, 8 insertions(+), 6 deletions(-) > > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c > index 4a9eeb7aef85..a28b11ce74c6 100644 > --- a/kernel/bpf/hashtab.c > +++ b/kernel/bpf/hashtab.c > @@ -1179,12 +1179,14 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, > goto err; > } > > - /* add new element to the head of the list, so that > - * concurrent search will find it before old elem > - */ > - hlist_nulls_add_head_rcu(&l_new->hash_node, head); > - if (l_old) { > - hlist_nulls_del_rcu(&l_old->hash_node); > + if (!l_old) { > + hlist_nulls_add_head_rcu(&l_new->hash_node, head); > + } else { > + /* Replace the old element atomically, so that > + * concurrent search will find either the new element or > + * the old element. > + */ > + hlist_nulls_replace_rcu(&l_new->hash_node, &l_old->hash_node); > > /* l_old has already been stashed in htab->extra_elems, free > * its special fields before it is available for reuse. Also > After thinking about it the second time, the atomic list replacement on the update side is enough to make lookup operation always find the existing element. However, due to the immediate reuse, the lookup may find an unexpected value. Maybe we should disable the immediate reuse for specific map (e.g., htab of maps).
Hou Tao <houtao@huaweicloud.com> writes: > +cc Cody Haas > > Sorry for the resend. I sent the reply in the HTML format. > > On 2/4/2025 4:28 PM, Hou Tao wrote: >> Currently, the update of existing element in hash map involves two >> steps: >> 1) insert the new element at the head of the hash list >> 2) remove the old element >> >> It is possible that the concurrent lookup operation may fail to find >> either the old element or the new element if the lookup operation starts >> before the addition and continues after the removal. >> >> Therefore, replacing the two-step update with an atomic update. After >> the change, the update will be atomic in the perspective of the lookup >> operation: it will either find the old element or the new element. >> >> Signed-off-by: Hou Tao <hotforest@gmail.com> >> --- >> kernel/bpf/hashtab.c | 14 ++++++++------ >> 1 file changed, 8 insertions(+), 6 deletions(-) >> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c >> index 4a9eeb7aef85..a28b11ce74c6 100644 >> --- a/kernel/bpf/hashtab.c >> +++ b/kernel/bpf/hashtab.c >> @@ -1179,12 +1179,14 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, >> goto err; >> } >> >> - /* add new element to the head of the list, so that >> - * concurrent search will find it before old elem >> - */ >> - hlist_nulls_add_head_rcu(&l_new->hash_node, head); >> - if (l_old) { >> - hlist_nulls_del_rcu(&l_old->hash_node); >> + if (!l_old) { >> + hlist_nulls_add_head_rcu(&l_new->hash_node, head); >> + } else { >> + /* Replace the old element atomically, so that >> + * concurrent search will find either the new element or >> + * the old element. >> + */ >> + hlist_nulls_replace_rcu(&l_new->hash_node, &l_old->hash_node); >> >> /* l_old has already been stashed in htab->extra_elems, free >> * its special fields before it is available for reuse. Also >> > > After thinking about it the second time, the atomic list replacement on > the update side is enough to make lookup operation always find the > existing element. However, due to the immediate reuse, the lookup may > find an unexpected value. Maybe we should disable the immediate reuse > for specific map (e.g., htab of maps). Hmm, in an RCU-protected data structure, reusing the memory before an RCU grace period has elapsed is just as wrong as freeing it, isn't it? I.e., the reuse logic should have some kind of call_rcu redirection to be completely correct? -Toke
Hi Toke, On 2/6/2025 11:05 PM, Toke Høiland-Jørgensen wrote: > Hou Tao <houtao@huaweicloud.com> writes: > >> +cc Cody Haas >> >> Sorry for the resend. I sent the reply in the HTML format. >> >> On 2/4/2025 4:28 PM, Hou Tao wrote: >>> Currently, the update of existing element in hash map involves two >>> steps: >>> 1) insert the new element at the head of the hash list >>> 2) remove the old element >>> >>> It is possible that the concurrent lookup operation may fail to find >>> either the old element or the new element if the lookup operation starts >>> before the addition and continues after the removal. >>> >>> Therefore, replacing the two-step update with an atomic update. After >>> the change, the update will be atomic in the perspective of the lookup >>> operation: it will either find the old element or the new element. >>> >>> Signed-off-by: Hou Tao <hotforest@gmail.com> >>> --- >>> kernel/bpf/hashtab.c | 14 ++++++++------ >>> 1 file changed, 8 insertions(+), 6 deletions(-) >>> >>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c >>> index 4a9eeb7aef85..a28b11ce74c6 100644 >>> --- a/kernel/bpf/hashtab.c >>> +++ b/kernel/bpf/hashtab.c >>> @@ -1179,12 +1179,14 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, >>> goto err; >>> } >>> >>> - /* add new element to the head of the list, so that >>> - * concurrent search will find it before old elem >>> - */ >>> - hlist_nulls_add_head_rcu(&l_new->hash_node, head); >>> - if (l_old) { >>> - hlist_nulls_del_rcu(&l_old->hash_node); >>> + if (!l_old) { >>> + hlist_nulls_add_head_rcu(&l_new->hash_node, head); >>> + } else { >>> + /* Replace the old element atomically, so that >>> + * concurrent search will find either the new element or >>> + * the old element. >>> + */ >>> + hlist_nulls_replace_rcu(&l_new->hash_node, &l_old->hash_node); >>> >>> /* l_old has already been stashed in htab->extra_elems, free >>> * its special fields before it is available for reuse. Also >>> >> After thinking about it the second time, the atomic list replacement on >> the update side is enough to make lookup operation always find the >> existing element. However, due to the immediate reuse, the lookup may >> find an unexpected value. Maybe we should disable the immediate reuse >> for specific map (e.g., htab of maps). > Hmm, in an RCU-protected data structure, reusing the memory before an > RCU grace period has elapsed is just as wrong as freeing it, isn't it? > I.e., the reuse logic should have some kind of call_rcu redirection to > be completely correct? Not for all cases. There is SLAB_TYPESAFE_BY_RCU-typed slab. For hash map, the reuse is also tricky (e.g., the goto again case in lookup_nulls_elem_raw), however it can not prevent the lookup procedure from returning unexpected value. I had post a patch set [1] to "fix" that, but Alexei said it is "a known quirk". Here I am not sure about whether it is reasonable to disable the reuse for htab of maps only. I will post a v2 for the patch set. [1]: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/ > -Toke
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 4a9eeb7aef85..a28b11ce74c6 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1179,12 +1179,14 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, goto err; } - /* add new element to the head of the list, so that - * concurrent search will find it before old elem - */ - hlist_nulls_add_head_rcu(&l_new->hash_node, head); - if (l_old) { - hlist_nulls_del_rcu(&l_old->hash_node); + if (!l_old) { + hlist_nulls_add_head_rcu(&l_new->hash_node, head); + } else { + /* Replace the old element atomically, so that + * concurrent search will find either the new element or + * the old element. + */ + hlist_nulls_replace_rcu(&l_new->hash_node, &l_old->hash_node); /* l_old has already been stashed in htab->extra_elems, free * its special fields before it is available for reuse. Also
Currently, the update of existing element in hash map involves two steps: 1) insert the new element at the head of the hash list 2) remove the old element It is possible that the concurrent lookup operation may fail to find either the old element or the new element if the lookup operation starts before the addition and continues after the removal. Therefore, replacing the two-step update with an atomic update. After the change, the update will be atomic in the perspective of the lookup operation: it will either find the old element or the new element. Signed-off-by: Hou Tao <hotforest@gmail.com> --- kernel/bpf/hashtab.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)