| Message ID | 20230918093620.3479627-1-make_ruc2021@163.com (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to: | BPF |
| Series | bpf, sockmap: fix deadlocks in the sockhash and sockmap |
On 9/18/23 02:36, Ma Ke wrote:
> It seems that elements in sockhash are rarely actively
> deleted by users or ebpf program. Therefore, we do not
> pay much attention to their deletion. Compared with hash
> maps, sockhash only provides spin_lock_bh protection.
> This causes it to appear to have self-locking behavior
> in the interrupt context, as CVE-2023-0160 points out.
>
> Signed-off-by: Ma Ke <make_ruc2021@163.com>
> ---
>  net/core/sock_map.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index cb11750b1df5..1302d484e769 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -928,11 +928,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>  	struct bpf_shtab_bucket *bucket;
>  	struct bpf_shtab_elem *elem;
>  	int ret = -ENOENT;
> +	unsigned long flags;

Keep reverse xmas tree ordering?

>
>  	hash = sock_hash_bucket_hash(key, key_size);
>  	bucket = sock_hash_select_bucket(htab, hash);
>
> -	spin_lock_bh(&bucket->lock);
> +	spin_lock_irqsave(&bucket->lock, flags);
>  	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
>  	if (elem) {
>  		hlist_del_rcu(&elem->node);
> @@ -940,7 +941,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>  		sock_hash_free_elem(htab, elem);
>  		ret = 0;
>  	}
> -	spin_unlock_bh(&bucket->lock);
> +	spin_unlock_irqrestore(&bucket->lock, flags);
>  	return ret;
>  }
>
Kui-Feng Lee wrote:
>
> On 9/18/23 02:36, Ma Ke wrote:
> > It seems that elements in sockhash are rarely actively
> > deleted by users or ebpf program. Therefore, we do not

We never delete them in our usage. I think soon we will have support
to run BPF programs without a map at all, removing these concerns for
many use cases.

> > pay much attention to their deletion. Compared with hash
> > maps, sockhash only provides spin_lock_bh protection.
> > This causes it to appear to have self-locking behavior
> > in the interrupt context, as CVE-2023-0160 points out.

The CVE is a bit exaggerated in my opinion. I'm not sure why anyone
would delete an element from interrupt context. But, OK, if someone
wrote such a thing we shouldn't lock up.

> >
> > Signed-off-by: Ma Ke <make_ruc2021@163.com>
> > ---
> >  net/core/sock_map.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index cb11750b1df5..1302d484e769 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -928,11 +928,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
> >  	struct bpf_shtab_bucket *bucket;
> >  	struct bpf_shtab_elem *elem;
> >  	int ret = -ENOENT;
> > +	unsigned long flags;
>
> Keep reverse xmas tree ordering?
>
> >
> >  	hash = sock_hash_bucket_hash(key, key_size);
> >  	bucket = sock_hash_select_bucket(htab, hash);
> >
> > -	spin_lock_bh(&bucket->lock);
> > +	spin_lock_irqsave(&bucket->lock, flags);

The hashtab code htab_lock_bucket() also does a preempt_disable()
followed by raw_spin_lock_irqsave(). Do we need this as well to handle
the CONFIG_PREEMPT cases? I'll also take a look, but figured I would
post the question given I won't likely get time to check until
tonight/tomorrow.

Also, a previous conversion to irqsave ran into a syzbot crash; won't
this do the same?

> >  	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
> >  	if (elem) {
> >  		hlist_del_rcu(&elem->node);
> > @@ -940,7 +941,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
> >  		sock_hash_free_elem(htab, elem);
> >  		ret = 0;
> >  	}
> > -	spin_unlock_bh(&bucket->lock);
> > +	spin_unlock_irqrestore(&bucket->lock, flags);
> >  	return ret;
> >  }
> >
On 9/20/23 11:07 AM, John Fastabend wrote:
>>> pay much attention to their deletion. Compared with hash
>>> maps, sockhash only provides spin_lock_bh protection.
>>> This causes it to appear to have self-locking behavior
>>> in the interrupt context, as CVE-2023-0160 points out.
>
> The CVE is a bit exaggerated in my opinion. I'm not sure why
> anyone would delete an element from interrupt context. But,
> OK, if someone wrote such a thing we shouldn't lock up.

This should only happen in a tracing program? Not sure if it would be
too drastic to disallow tracing programs from using
bpf_map_delete_elem at load time now.

A followup question: if sockmap can be accessed from a tracing
program, does it need an in_nmi() check?

>>> 	hash = sock_hash_bucket_hash(key, key_size);
>>> 	bucket = sock_hash_select_bucket(htab, hash);
>>>
>>> -	spin_lock_bh(&bucket->lock);
>>> +	spin_lock_irqsave(&bucket->lock, flags);
>
> The hashtab code htab_lock_bucket() also does a preempt_disable()
> followed by raw_spin_lock_irqsave(). Do we need this as well
> to handle the CONFIG_PREEMPT cases?

iirc, the preempt_disable in htab is for CONFIG_PREEMPT, but it is
there for the __this_cpu_inc_return, to avoid unnecessary lock failure
due to preemption, so it is probably not needed here. See commit
2775da216287 ("bpf: Disable preemption when increasing per-cpu
map_locked").

If map_delete can be called from any tracing context, the
raw_spin_lock_xxx version is probably needed though. Otherwise, a
splat (e.g. PROVE_RAW_LOCK_NESTING) could be triggered.
Martin KaFai Lau wrote:
> On 9/20/23 11:07 AM, John Fastabend wrote:
> >>> pay much attention to their deletion. Compared with hash
> >>> maps, sockhash only provides spin_lock_bh protection.
> >>> This causes it to appear to have self-locking behavior
> >>> in the interrupt context, as CVE-2023-0160 points out.
> >
> > The CVE is a bit exaggerated in my opinion. I'm not sure why
> > anyone would delete an element from interrupt context. But,
> > OK, if someone wrote such a thing we shouldn't lock up.
>
> This should only happen in a tracing program?
> Not sure if it would be too drastic to disallow tracing programs
> from using bpf_map_delete_elem at load time now.

I don't think we have any users from tracing programs, but there might
be something out there?

> A followup question: if sockmap can be accessed from a tracing
> program, does it need an in_nmi() check?

I think we could just do 'if (in_nmi()) return -EOPNOTSUPP;'.

> >>> 	hash = sock_hash_bucket_hash(key, key_size);
> >>> 	bucket = sock_hash_select_bucket(htab, hash);
> >>>
> >>> -	spin_lock_bh(&bucket->lock);
> >>> +	spin_lock_irqsave(&bucket->lock, flags);
> >
> > The hashtab code htab_lock_bucket() also does a preempt_disable()
> > followed by raw_spin_lock_irqsave(). Do we need this as well
> > to handle the CONFIG_PREEMPT cases?
>
> iirc, the preempt_disable in htab is for CONFIG_PREEMPT, but it is
> there for the __this_cpu_inc_return, to avoid unnecessary lock
> failure due to preemption, so it is probably not needed here. See
> commit 2775da216287 ("bpf: Disable preemption when increasing
> per-cpu map_locked").
>
> If map_delete can be called from any tracing context, the
> raw_spin_lock_xxx version is probably needed though. Otherwise, a
> splat (e.g. PROVE_RAW_LOCK_NESTING) could be triggered.

Yep. I'll look at it I guess. We should probably either block access
from tracing programs or add some tests.
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index cb11750b1df5..1302d484e769 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -928,11 +928,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -940,7 +941,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }
It seems that elements in sockhash are rarely actively
deleted by users or ebpf program. Therefore, we do not
pay much attention to their deletion. Compared with hash
maps, sockhash only provides spin_lock_bh protection.
This causes it to appear to have self-locking behavior
in the interrupt context, as CVE-2023-0160 points out.

Signed-off-by: Ma Ke <make_ruc2021@163.com>
---
 net/core/sock_map.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)