diff mbox series

[bpf,v3,2/3] bpf: Don't reinit map value in prealloc_lru_pop

Message ID 20220809213033.24147-3-memxor@gmail.com (mailing list archive)
State Accepted
Commit 275c30bcee66a27d1aa97a215d607ad6d49804cb
Delegated to: BPF
Headers show
Series Don't reinit map value in prealloc_lru_pop | expand

Checks

Context Check Description
bpf/vmtest-bpf-PR fail PR summary
bpf/vmtest-bpf-VM_Test-1 fail Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-VM_Test-2 fail Logs for Kernel LATEST on ubuntu-latest with llvm-16
bpf/vmtest-bpf-VM_Test-3 fail Logs for Kernel LATEST on z15 with gcc
netdev/tree_selection success Clearly marked for bpf
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 13 this patch: 13
netdev/cc_maintainers fail 1 blamed authors not CCed: martin.lau@linux.dev; 7 maintainers not CCed: john.fastabend@gmail.com song@kernel.org sdf@google.com martin.lau@linux.dev kpsingh@kernel.org jolsa@kernel.org haoluo@google.com
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 13 this patch: 13
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 13 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kumar Kartikeya Dwivedi Aug. 9, 2022, 9:30 p.m. UTC
The LRU map that is preallocated may have its elements reused while
another program holds a pointer to it from bpf_map_lookup_elem. Hence,
only check_and_free_fields is appropriate when the element is being
deleted, as it ensures proper synchronization against concurrent access
of the map value. After that, we cannot call check_and_init_map_value
again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while
they can be concurrently accessed from a BPF program.

This is safe to do as when the map entry is deleted, concurrent access
is protected against by check_and_free_fields, i.e. an existing timer
would be freed, and any existing kptr will be released by it. The
program can create further timers and kptrs after check_and_free_fields,
but they will eventually be released once the preallocated items are
freed on map destruction, even if the item is never reused again. Hence,
the deleted item sitting in the free list can still have resources
attached to it, and they would never leak.

With spin_lock, we never touch the field at all on delete or update, as
we may end up modifying the state of the lock. Since the verifier
ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock
call, the program will eventually release the lock so that on reuse the
new user of the value can take the lock.

Essentially, for the preallocated case, we must assume that the map
value may always be in use by the program, even when it is sitting in
the freelist, and handle things accordingly, i.e. use proper
synchronization inside check_and_free_fields, and never reinitialize the
special fields when it is reused on update.

Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.")
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/hashtab.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Comments

Martin KaFai Lau Aug. 9, 2022, 10:29 p.m. UTC | #1
On Tue, Aug 09, 2022 at 11:30:32PM +0200, Kumar Kartikeya Dwivedi wrote:
> The LRU map that is preallocated may have its elements reused while
> another program holds a pointer to it from bpf_map_lookup_elem. Hence,
> only check_and_free_fields is appropriate when the element is being
> deleted, as it ensures proper synchronization against concurrent access
> of the map value. After that, we cannot call check_and_init_map_value
> again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while
> they can be concurrently accessed from a BPF program.
> 
> This is safe to do as when the map entry is deleted, concurrent access
> is protected against by check_and_free_fields, i.e. an existing timer
> would be freed, and any existing kptr will be released by it. The
> program can create further timers and kptrs after check_and_free_fields,
> but they will eventually be released once the preallocated items are
> freed on map destruction, even if the item is never reused again. Hence,
> the deleted item sitting in the free list can still have resources
> attached to it, and they would never leak.
> 
> With spin_lock, we never touch the field at all on delete or update, as
> we may end up modifying the state of the lock. Since the verifier
> ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock
> call, the program will eventually release the lock so that on reuse the
> new user of the value can take the lock.
The bpf_spin_lock's verifier description makes sense.  Note that
the lru map does not support spin lock for now.

> 
> Essentially, for the preallocated case, we must assume that the map
> value may always be in use by the program, even when it is sitting in
> the freelist, and handle things accordingly, i.e. use proper
> synchronization inside check_and_free_fields, and never reinitialize the
> special fields when it is reused on update.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Alexei Starovoitov Aug. 10, 2022, 12:49 a.m. UTC | #2
On Tue, Aug 9, 2022 at 3:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Aug 09, 2022 at 11:30:32PM +0200, Kumar Kartikeya Dwivedi wrote:
> > The LRU map that is preallocated may have its elements reused while
> > another program holds a pointer to it from bpf_map_lookup_elem. Hence,
> > only check_and_free_fields is appropriate when the element is being
> > deleted, as it ensures proper synchronization against concurrent access
> > of the map value. After that, we cannot call check_and_init_map_value
> > again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while
> > they can be concurrently accessed from a BPF program.
> >
> > This is safe to do as when the map entry is deleted, concurrent access
> > is protected against by check_and_free_fields, i.e. an existing timer
> > would be freed, and any existing kptr will be released by it. The
> > program can create further timers and kptrs after check_and_free_fields,
> > but they will eventually be released once the preallocated items are
> > freed on map destruction, even if the item is never reused again. Hence,
> > the deleted item sitting in the free list can still have resources
> > attached to it, and they would never leak.
> >
> > With spin_lock, we never touch the field at all on delete or update, as
> > we may end up modifying the state of the lock. Since the verifier
> > ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock
> > call, the program will eventually release the lock so that on reuse the
> > new user of the value can take the lock.
> The bpf_spin_lock's verifier description makes sense.  Note that
> the lru map does not support spin lock for now.

ahh. then it's not a bpf tree material.
It's a minor cleanup for bpf-next?

> >
> > Essentially, for the preallocated case, we must assume that the map
> > value may always be in use by the program, even when it is sitting in
> > the freelist, and handle things accordingly, i.e. use proper
> > synchronization inside check_and_free_fields, and never reinitialize the
> > special fields when it is reused on update.
> Acked-by: Martin KaFai Lau <kafai@fb.com>
Kumar Kartikeya Dwivedi Aug. 10, 2022, 12:51 a.m. UTC | #3
On Wed, 10 Aug 2022 at 02:50, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Aug 9, 2022 at 3:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > On Tue, Aug 09, 2022 at 11:30:32PM +0200, Kumar Kartikeya Dwivedi wrote:
> > > The LRU map that is preallocated may have its elements reused while
> > > another program holds a pointer to it from bpf_map_lookup_elem. Hence,
> > > only check_and_free_fields is appropriate when the element is being
> > > deleted, as it ensures proper synchronization against concurrent access
> > > of the map value. After that, we cannot call check_and_init_map_value
> > > again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while
> > > they can be concurrently accessed from a BPF program.
> > >
> > > This is safe to do as when the map entry is deleted, concurrent access
> > > is protected against by check_and_free_fields, i.e. an existing timer
> > > would be freed, and any existing kptr will be released by it. The
> > > program can create further timers and kptrs after check_and_free_fields,
> > > but they will eventually be released once the preallocated items are
> > > freed on map destruction, even if the item is never reused again. Hence,
> > > the deleted item sitting in the free list can still have resources
> > > attached to it, and they would never leak.
> > >
> > > With spin_lock, we never touch the field at all on delete or update, as
> > > we may end up modifying the state of the lock. Since the verifier
> > > ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock
> > > call, the program will eventually release the lock so that on reuse the
> > > new user of the value can take the lock.
> > The bpf_spin_lock's verifier description makes sense.  Note that
> > the lru map does not support spin lock for now.
>
> ahh. then it's not a bpf tree material.
> It's a minor cleanup for bpf-next?
>

I was just describing what we do for each of the three types in the
commit log. It still affects timers and kptrs, which lru map supports.

> > >
> > > Essentially, for the preallocated case, we must assume that the map
> > > value may always be in use by the program, even when it is sitting in
> > > the freelist, and handle things accordingly, i.e. use proper
> > > synchronization inside check_and_free_fields, and never reinitialize the
> > > special fields when it is reused on update.
> > Acked-by: Martin KaFai Lau <kafai@fb.com>
Alexei Starovoitov Aug. 10, 2022, 12:58 a.m. UTC | #4
On Tue, Aug 9, 2022 at 5:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, 10 Aug 2022 at 02:50, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Aug 9, 2022 at 3:29 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >
> > > On Tue, Aug 09, 2022 at 11:30:32PM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > The LRU map that is preallocated may have its elements reused while
> > > > another program holds a pointer to it from bpf_map_lookup_elem. Hence,
> > > > only check_and_free_fields is appropriate when the element is being
> > > > deleted, as it ensures proper synchronization against concurrent access
> > > > of the map value. After that, we cannot call check_and_init_map_value
> > > > again as it may rewrite bpf_spin_lock, bpf_timer, and kptr fields while
> > > > they can be concurrently accessed from a BPF program.
> > > >
> > > > This is safe to do as when the map entry is deleted, concurrent access
> > > > is protected against by check_and_free_fields, i.e. an existing timer
> > > > would be freed, and any existing kptr will be released by it. The
> > > > program can create further timers and kptrs after check_and_free_fields,
> > > > but they will eventually be released once the preallocated items are
> > > > freed on map destruction, even if the item is never reused again. Hence,
> > > > the deleted item sitting in the free list can still have resources
> > > > attached to it, and they would never leak.
> > > >
> > > > With spin_lock, we never touch the field at all on delete or update, as
> > > > we may end up modifying the state of the lock. Since the verifier
> > > > ensures that a bpf_spin_lock call is always paired with bpf_spin_unlock
> > > > call, the program will eventually release the lock so that on reuse the
> > > > new user of the value can take the lock.
> > > The bpf_spin_lock's verifier description makes sense.  Note that
> > > the lru map does not support spin lock for now.
> >
> > ahh. then it's not a bpf tree material.
> > It's a minor cleanup for bpf-next?
> >
>
> I was just describing what we do for each of the three types in the
> commit log. It still affects timers and kptrs, which lru map supports.

got it.
The uaf-ing cpu can use kptr field of deleted elem
and things go bad it lru_pop-ing cpu will clear it.
diff mbox series

Patch

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index da7578426a46..4d793a92301b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -311,12 +311,8 @@  static struct htab_elem *prealloc_lru_pop(struct bpf_htab *htab, void *key,
 	struct htab_elem *l;
 
 	if (node) {
-		u32 key_size = htab->map.key_size;
-
 		l = container_of(node, struct htab_elem, lru_node);
-		memcpy(l->key, key, key_size);
-		check_and_init_map_value(&htab->map,
-					 l->key + round_up(key_size, 8));
+		memcpy(l->key, key, htab->map.key_size);
 		return l;
 	}