
bpf, sockmap: fix deadlocks in the sockhash and sockmap

Message ID 20230918093620.3479627-1-make_ruc2021@163.com (mailing list archive)
State Changes Requested
Delegated to: BPF

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1342 this patch: 1342
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 1364 this patch: 1364
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1365 this patch: 1365
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 21 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-0 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_verifier on x86_64 with llvm-16

Commit Message

Ma Ke Sept. 18, 2023, 9:36 a.m. UTC
Elements in sockhash are rarely deleted explicitly by
users or by eBPF programs, so their deletion path has
received little attention. Unlike the regular hash maps,
sockhash protects its buckets with only spin_lock_bh().
As CVE-2023-0160 points out, re-entering the delete path
from interrupt context can self-deadlock on the bucket
lock.

Signed-off-by: Ma Ke <make_ruc2021@163.com>
---
 net/core/sock_map.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Kui-Feng Lee Sept. 18, 2023, 6:49 p.m. UTC | #1
On 9/18/23 02:36, Ma Ke wrote:
> It seems that elements in sockhash are rarely actively
> deleted by users or ebpf program. Therefore, we do not
> pay much attention to their deletion. Compared with hash
> maps, sockhash only provides spin_lock_bh protection.
> This causes it to appear to have self-locking behavior
> in the interrupt context, as CVE-2023-0160 points out.
> 
> Signed-off-by: Ma Ke <make_ruc2021@163.com>
> ---
>   net/core/sock_map.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index cb11750b1df5..1302d484e769 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -928,11 +928,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>   	struct bpf_shtab_bucket *bucket;
>   	struct bpf_shtab_elem *elem;
>   	int ret = -ENOENT;
> +	unsigned long flags;

Keep reverse xmas tree ordering?

>   
>   	hash = sock_hash_bucket_hash(key, key_size);
>   	bucket = sock_hash_select_bucket(htab, hash);
>   
> -	spin_lock_bh(&bucket->lock);
> +	spin_lock_irqsave(&bucket->lock, flags);
>   	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
>   	if (elem) {
>   		hlist_del_rcu(&elem->node);
> @@ -940,7 +941,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>   		sock_hash_free_elem(htab, elem);
>   		ret = 0;
>   	}
> -	spin_unlock_bh(&bucket->lock);
> +	spin_unlock_irqrestore(&bucket->lock, flags);
>   	return ret;
>   }
>
John Fastabend Sept. 20, 2023, 6:07 p.m. UTC | #2
Kui-Feng Lee wrote:
> 
> 
> On 9/18/23 02:36, Ma Ke wrote:
> > It seems that elements in sockhash are rarely actively
> > deleted by users or ebpf program. Therefore, we do not

We never delete them in our usage. I think soon we will have
support to run BPF programs without a map at all, removing these
concerns for many use cases.

> > pay much attention to their deletion. Compared with hash
> > maps, sockhash only provides spin_lock_bh protection.
> > This causes it to appear to have self-locking behavior
> > in the interrupt context, as CVE-2023-0160 points out.

The CVE is a bit exaggerated in my opinion. I'm not sure why
anyone would delete an element from interrupt context. But OK,
if someone wrote such a thing we shouldn't lock up.

> > 
> > Signed-off-by: Ma Ke <make_ruc2021@163.com>
> > ---
> >   net/core/sock_map.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index cb11750b1df5..1302d484e769 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -928,11 +928,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
> >   	struct bpf_shtab_bucket *bucket;
> >   	struct bpf_shtab_elem *elem;
> >   	int ret = -ENOENT;
> > +	unsigned long flags;
> 
> Keep reverse xmas tree ordering?
> 
> >   
> >   	hash = sock_hash_bucket_hash(key, key_size);
> >   	bucket = sock_hash_select_bucket(htab, hash);
> >   
> > -	spin_lock_bh(&bucket->lock);
> > +	spin_lock_irqsave(&bucket->lock, flags);

The hashtab code's htab_lock_bucket also does a preempt_disable()
followed by raw_spin_lock_irqsave(). Do we need that as well
to handle the CONFIG_PREEMPT cases?

I'll also take a look, but figured I would post the question given
I won't likely get time to check until tonight/tomorrow.

Also, an earlier conversion to irqsave ran into a syzbot crash;
won't this do the same?

> >   	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
> >   	if (elem) {
> >   		hlist_del_rcu(&elem->node);
> > @@ -940,7 +941,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
> >   		sock_hash_free_elem(htab, elem);
> >   		ret = 0;
> >   	}
> > -	spin_unlock_bh(&bucket->lock);
> > +	spin_unlock_irqrestore(&bucket->lock, flags);
> >   	return ret;
> >   }
> >
Martin KaFai Lau Sept. 21, 2023, 1:31 a.m. UTC | #3
On 9/20/23 11:07 AM, John Fastabend wrote:
>>> pay much attention to their deletion. Compared with hash
>>> maps, sockhash only provides spin_lock_bh protection.
>>> This causes it to appear to have self-locking behavior
>>> in the interrupt context, as CVE-2023-0160 points out.
> 
> CVE is a bit exaggerated in my opinion. I'm not sure why
> anyone would delete an element from interrupt context. But,
> OK if someone wrote such a thing we shouldn't lock up.

This should only happen in a tracing program?
Not sure if it would be too drastic to disallow tracing programs
from using bpf_map_delete_elem at load time now.

A followup question: if sockmap can be accessed from a tracing
program, does it need an in_nmi() check?

>>>    	hash = sock_hash_bucket_hash(key, key_size);
>>>    	bucket = sock_hash_select_bucket(htab, hash);
>>>    
>>> -	spin_lock_bh(&bucket->lock);
>>> +	spin_lock_irqsave(&bucket->lock, flags);
> 
> The hashtab code htab_lock_bucket also does a preempt_disable()
> followed by raw_spin_lock_irqsave(). Do we need this as well
> to handle the PREEMPT_CONFIG cases.

iirc, the preempt_disable in htab is for CONFIG_PREEMPT, but it
protects the __this_cpu_inc_return to avoid unnecessary lock failures
due to preemption, so it is probably not needed here. See commit
2775da216287 ("bpf: Disable preemption when increasing per-cpu
map_locked").

If map_delete can be called from any tracing context, the
raw_spin_lock_xxx version is probably needed though. Otherwise, a
splat (e.g. from PROVE_RAW_LOCK_NESTING) could be triggered.
John Fastabend Sept. 21, 2023, 4:52 a.m. UTC | #4
Martin KaFai Lau wrote:
> On 9/20/23 11:07 AM, John Fastabend wrote:
> >>> pay much attention to their deletion. Compared with hash
> >>> maps, sockhash only provides spin_lock_bh protection.
> >>> This causes it to appear to have self-locking behavior
> >>> in the interrupt context, as CVE-2023-0160 points out.
> > 
> > CVE is a bit exaggerated in my opinion. I'm not sure why
> > anyone would delete an element from interrupt context. But,
> > OK if someone wrote such a thing we shouldn't lock up.
> 
> This should only happen in tracing program?
> not sure if it will be too drastic to disallow tracing program to use 
> bpf_map_delete_elem during load time now.

I don't think we have any users from tracing programs, but
there might be something out there?

> 
> A followup question, if sockmap can be accessed from tracing program, does it 
> need an in_nmi() check?

I think we could just do 'if (in_nmi()) return -EOPNOTSUPP;'

> 
> >>>    	hash = sock_hash_bucket_hash(key, key_size);
> >>>    	bucket = sock_hash_select_bucket(htab, hash);
> >>>    
> >>> -	spin_lock_bh(&bucket->lock);
> >>> +	spin_lock_irqsave(&bucket->lock, flags);
> > 
> > The hashtab code htab_lock_bucket also does a preempt_disable()
> > followed by raw_spin_lock_irqsave(). Do we need this as well
> > to handle the PREEMPT_CONFIG cases.
> 
> iirc, preempt_disable in htab is for the CONFIG_PREEMPT but it is for the 
> __this_cpu_inc_return to avoid unnecessary lock failure due to preemption, so 
> probably it is not needed here. The commit 2775da216287 ("bpf: Disable 
> preemption when increasing per-cpu map_locked")
> 
> If map_delete can be called from any tracing context, the raw_spin_lock_xxx 
> version is probably needed though. Otherwise, splat (e.g. 
> PROVE_RAW_LOCK_NESTING) could be triggered.

Yep, I'll look at it, I guess. We should probably either block
access from tracing programs or add some tests.

Patch

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index cb11750b1df5..1302d484e769 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -928,11 +928,12 @@  static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -940,7 +941,7 @@  static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }