[RFC] Use after free in BPF/ XDP during XDP_REDIRECT

Message ID	20250313183911.SPAmGLyw@linutronix.de (mailing list archive)
State	RFC
Delegated to:	BPF
Headers	show Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A083018952C; Thu, 13 Mar 2025 18:39:15 +0000 (UTC) Date: Thu, 13 Mar 2025 19:39:11 +0100 From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> To: netdev@vger.kernel.org, bpf@vger.kernel.org Cc: Ricardo =?utf-8?q?Ca=C3=B1uelo?= Navarro <rcn@igalia.com>, Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Jesper Dangaard Brouer <hawk@kernel.org>, John Fastabend <john.fastabend@gmail.com>, Thomas Gleixner <tglx@linutronix.de> Subject: [RFC] Use after free in BPF/ XDP during XDP_REDIRECT Message-ID: <20250313183911.SPAmGLyw@linutronix.de> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline
Series	[RFC] Use after free in BPF/ XDP during XDP_REDIRECT \| expand [RFC] Use after free in BPF/ XDP during XDP_REDIRECT

Context	Check	Description
netdev/tree_selection	success	Guessing tree name failed - patch did not apply
bpf/vmtest-bpf-PR	fail	merge-conflict

Sebastian Andrzej Siewior March 13, 2025, 6:39 p.m. UTC

Hi,

Ricardo reported a KASAN related use after free
	https://lore.kernel.org/all/20250226-20250204-kasan-slab-use-after-free-read-in-dev_map_enqueue__submit-v3-0-360efec441ba@igalia.com/

in v6.6 stable and suggest a backport of commits
	401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
	fecef4cd42c68 ("tun: Assign missing bpf_net_context.")
	9da49aa80d686 ("tun: Add missing bpf_net_ctx_clear() in do_xdp_generic()")

as a fix. In the meantime I have the syz reproducer+config and was able
to investigate.
It looks as if the syzbot starts a BPF program via xdp_test_run_batch()
which assigns ri->tgt_value via dev_hash_map_redirect() and the return code
isn't XDP_REDIRECT it looks like nonsense. So the print in
bpf_warn_invalid_xdp_action() appears once. Everything goes as planned.
Then the TUN driver runs another BPF program which returns XDP_REDIRECT
without setting ri->tgt_value. This appears to be a trick because it
invoked bpf_trace_printk() which printed four characters. Anyway, this
is enough to get xdp_do_redirect() going.

The commits in questions do fix it because the bpf_redirect_info becomes
not only per-task but gets invalidated after the XDP context is left.

Now that I understand it I would suggest something smaller instead as a
stable fix, (instead the proposed patches). Any objections to the
following:




Sebastian

Toke Høiland-Jørgensen March 13, 2025, 7:28 p.m. UTC | #1

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> Hi,
>
> Ricardo reported a KASAN related use after free
> 	https://lore.kernel.org/all/20250226-20250204-kasan-slab-use-after-free-read-in-dev_map_enqueue__submit-v3-0-360efec441ba@igalia.com/
>
> in v6.6 stable and suggest a backport of commits
> 	401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
> 	fecef4cd42c68 ("tun: Assign missing bpf_net_context.")
> 	9da49aa80d686 ("tun: Add missing bpf_net_ctx_clear() in do_xdp_generic()")
>
> as a fix. In the meantime I have the syz reproducer+config and was able
> to investigate.
> It looks as if the syzbot starts a BPF program via xdp_test_run_batch()
> which assigns ri->tgt_value via dev_hash_map_redirect() and the return code
> isn't XDP_REDIRECT it looks like nonsense. So the print in
> bpf_warn_invalid_xdp_action() appears once. Everything goes as planned.
> Then the TUN driver runs another BPF program which returns XDP_REDIRECT
> without setting ri->tgt_value. This appears to be a trick because it
> invoked bpf_trace_printk() which printed four characters. Anyway, this
> is enough to get xdp_do_redirect() going.
>
> The commits in questions do fix it because the bpf_redirect_info becomes
> not only per-task but gets invalidated after the XDP context is left.
>
> Now that I understand it I would suggest something smaller instead as a
> stable fix, (instead the proposed patches). Any objections to the
> following:
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index be313928d272..1d906b7a541d 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -9000,8 +9000,12 @@ static bool xdp_is_valid_access(int off, int size,
>  
>  void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act)
>  {
> +	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>  	const u32 act_max = XDP_REDIRECT;
>  
> +	ri->map_id = INT_MAX;
> +	ri->map_type = BPF_MAP_TYPE_UNSPEC;
> +
>  	pr_warn_once("%s XDP return value %u on prog %s (id %d) dev %s, expect packet loss!\n",
>  		     act > act_max ? "Illegal" : "Driver unsupported",
>  		     act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A");

From your description above, this will fix the particular error
encountered, but what happens if the initial return code is not in fact
nonsense (so the warn_invalid_action) is not triggered?

I.e.,

bpf_redirect_map(...);
return XDP_DROP;

would still leave ri->map_id and ri->map_type set for the later tun
driver invocation, no?

-Toke

Sebastian Andrzej Siewior March 13, 2025, 8:32 p.m. UTC | #2

On 2025-03-13 20:28:06 [+0100], Toke Høiland-Jørgensen wrote:
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > Hi,
> >
> > Ricardo reported a KASAN related use after free
> > 	https://lore.kernel.org/all/20250226-20250204-kasan-slab-use-after-free-read-in-dev_map_enqueue__submit-v3-0-360efec441ba@igalia.com/
> >
> > in v6.6 stable and suggest a backport of commits
> > 	401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
> > 	fecef4cd42c68 ("tun: Assign missing bpf_net_context.")
> > 	9da49aa80d686 ("tun: Add missing bpf_net_ctx_clear() in do_xdp_generic()")
> >
> > as a fix. In the meantime I have the syz reproducer+config and was able
> > to investigate.
> > It looks as if the syzbot starts a BPF program via xdp_test_run_batch()
> > which assigns ri->tgt_value via dev_hash_map_redirect() and the return code
> > isn't XDP_REDIRECT it looks like nonsense. So the print in
> > bpf_warn_invalid_xdp_action() appears once. Everything goes as planned.
> > Then the TUN driver runs another BPF program which returns XDP_REDIRECT
> > without setting ri->tgt_value. This appears to be a trick because it
> > invoked bpf_trace_printk() which printed four characters. Anyway, this
> > is enough to get xdp_do_redirect() going.
> >
> > The commits in questions do fix it because the bpf_redirect_info becomes
> > not only per-task but gets invalidated after the XDP context is left.
> >
> > Now that I understand it I would suggest something smaller instead as a
> > stable fix, (instead the proposed patches). Any objections to the
> > following:
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index be313928d272..1d906b7a541d 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -9000,8 +9000,12 @@ static bool xdp_is_valid_access(int off, int size,
> >  
> >  void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act)
> >  {
> > +	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> >  	const u32 act_max = XDP_REDIRECT;
> >  
> > +	ri->map_id = INT_MAX;
> > +	ri->map_type = BPF_MAP_TYPE_UNSPEC;
> > +
> >  	pr_warn_once("%s XDP return value %u on prog %s (id %d) dev %s, expect packet loss!\n",
> >  		     act > act_max ? "Illegal" : "Driver unsupported",
> >  		     act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A");
> 
> From your description above, this will fix the particular error
> encountered, but what happens if the initial return code is not in fact
> nonsense (so the warn_invalid_action) is not triggered?
> 
> I.e.,
> 
> bpf_redirect_map(...);
> return XDP_DROP;
> 
> would still leave ri->map_id and ri->map_type set for the later tun
> driver invocation, no?

Right. So if it returns XDP_PASS or XDP_DROP instead of nonsense then
the buffer remains set. And another driver could use it.
But this would mean we would have to tackle each bpf_prog_run_xdp()
invocation and reset it afterwards… So maybe the backport instead? We
have
| $ git grep bpf_prog_run_xdp | wc -l
| 55

call sites.

> -Toke

Sebastian

Toke Høiland-Jørgensen March 14, 2025, 9:21 a.m. UTC | #3

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2025-03-13 20:28:06 [+0100], Toke Høiland-Jørgensen wrote:
>> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
>> 
>> > Hi,
>> >
>> > Ricardo reported a KASAN related use after free
>> > 	https://lore.kernel.org/all/20250226-20250204-kasan-slab-use-after-free-read-in-dev_map_enqueue__submit-v3-0-360efec441ba@igalia.com/
>> >
>> > in v6.6 stable and suggest a backport of commits
>> > 	401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
>> > 	fecef4cd42c68 ("tun: Assign missing bpf_net_context.")
>> > 	9da49aa80d686 ("tun: Add missing bpf_net_ctx_clear() in do_xdp_generic()")
>> >
>> > as a fix. In the meantime I have the syz reproducer+config and was able
>> > to investigate.
>> > It looks as if the syzbot starts a BPF program via xdp_test_run_batch()
>> > which assigns ri->tgt_value via dev_hash_map_redirect() and the return code
>> > isn't XDP_REDIRECT it looks like nonsense. So the print in
>> > bpf_warn_invalid_xdp_action() appears once. Everything goes as planned.
>> > Then the TUN driver runs another BPF program which returns XDP_REDIRECT
>> > without setting ri->tgt_value. This appears to be a trick because it
>> > invoked bpf_trace_printk() which printed four characters. Anyway, this
>> > is enough to get xdp_do_redirect() going.
>> >
>> > The commits in questions do fix it because the bpf_redirect_info becomes
>> > not only per-task but gets invalidated after the XDP context is left.
>> >
>> > Now that I understand it I would suggest something smaller instead as a
>> > stable fix, (instead the proposed patches). Any objections to the
>> > following:
>> >
>> > diff --git a/net/core/filter.c b/net/core/filter.c
>> > index be313928d272..1d906b7a541d 100644
>> > --- a/net/core/filter.c
>> > +++ b/net/core/filter.c
>> > @@ -9000,8 +9000,12 @@ static bool xdp_is_valid_access(int off, int size,
>> >  
>> >  void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act)
>> >  {
>> > +	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>> >  	const u32 act_max = XDP_REDIRECT;
>> >  
>> > +	ri->map_id = INT_MAX;
>> > +	ri->map_type = BPF_MAP_TYPE_UNSPEC;
>> > +
>> >  	pr_warn_once("%s XDP return value %u on prog %s (id %d) dev %s, expect packet loss!\n",
>> >  		     act > act_max ? "Illegal" : "Driver unsupported",
>> >  		     act, prog->aux->name, prog->aux->id, dev ? dev->name : "N/A");
>> 
>> From your description above, this will fix the particular error
>> encountered, but what happens if the initial return code is not in fact
>> nonsense (so the warn_invalid_action) is not triggered?
>> 
>> I.e.,
>> 
>> bpf_redirect_map(...);
>> return XDP_DROP;
>> 
>> would still leave ri->map_id and ri->map_type set for the later tun
>> driver invocation, no?
>
> Right. So if it returns XDP_PASS or XDP_DROP instead of nonsense then
> the buffer remains set. And another driver could use it.
> But this would mean we would have to tackle each bpf_prog_run_xdp()
> invocation and reset it afterwards… So maybe the backport instead? We
> have
> | $ git grep bpf_prog_run_xdp | wc -l
> | 55
>
> call sites.

Hmm, how about putting the reset (essentially the changes you have
above) into bpf_prog_run_xdp() itself, before executing the BPF program?

-Toke

Sebastian Andrzej Siewior March 14, 2025, 3:30 p.m. UTC | #4

On 2025-03-14 10:21:15 [+0100], Toke Høiland-Jørgensen wrote:
> Hmm, how about putting the reset (essentially the changes you have
> above) into bpf_prog_run_xdp() itself, before executing the BPF program?

That would be the snippet below. It does work as far as the testcase
goes. It is just and unconditional write which might look like a waste
but given the circumstances…

While at it, is there anything that ensures that only bpf_prog_run_xdp()
can invoke the map_redirect callback? Mainline only assigns the task
pointer in NAPI callback so any usage outside of bpf_prog_run_xdp() will
lead to a segfault and I haven't seen a report yet so…

--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -486,7 +486,12 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	u32 act;
+
+	ri->map_id = INT_MAX;
+	ri->map_type = BPF_MAP_TYPE_UNSPEC;
+	act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));

 	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
 		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))

Toke Høiland-Jørgensen March 14, 2025, 4:03 p.m. UTC | #5

Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2025-03-14 10:21:15 [+0100], Toke Høiland-Jørgensen wrote:
>> Hmm, how about putting the reset (essentially the changes you have
>> above) into bpf_prog_run_xdp() itself, before executing the BPF program?
>
> That would be the snippet below. It does work as far as the testcase
> goes. It is just and unconditional write which might look like a waste
> but given the circumstances…

Hmm, yeah, it would slow down applications that never redirect, I
suppose. Hmm, we could avoid the write by checking the values first? See
below.

> While at it, is there anything that ensures that only bpf_prog_run_xdp()
> can invoke the map_redirect callback? Mainline only assigns the task
> pointer in NAPI callback so any usage outside of bpf_prog_run_xdp() will
> lead to a segfault and I haven't seen a report yet so…

Yes, the verifier restricts which program types can call the
map_redirect helper.


> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -486,7 +486,12 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
>  	 * under local_bh_disable(), which provides the needed RCU protection
>  	 * for accessing map entries.
>  	 */
> -	u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> +	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> +	u32 act;
> +

Add an if here like

if (ri->map_id | ri->map_type) { /* single | to make it a single branch */

> +	ri->map_id = INT_MAX;
> +	ri->map_type = BPF_MAP_TYPE_UNSPEC;

}

Also, ri->map_id should be set to 0, not INT_MAX.

-Toke

Sebastian Andrzej Siewior March 14, 2025, 5:27 p.m. UTC | #6

On 2025-03-14 17:03:35 [+0100], Toke Høiland-Jørgensen wrote:
> > While at it, is there anything that ensures that only bpf_prog_run_xdp()
> > can invoke the map_redirect callback? Mainline only assigns the task
> > pointer in NAPI callback so any usage outside of bpf_prog_run_xdp() will
> > lead to a segfault and I haven't seen a report yet so…
> 
> Yes, the verifier restricts which program types can call the
> map_redirect helper.

Okay. So checks for the BPF_PROG_TYPE_XDP type for the map_redirect and
that is the only one setting it. Okay. Now I remember Alexei mentioning
something…

> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -486,7 +486,12 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
> >  	 * under local_bh_disable(), which provides the needed RCU protection
> >  	 * for accessing map entries.
> >  	 */
> > -	u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
> > +	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
> > +	u32 act;
> > +
> 
> Add an if here like
> 
> if (ri->map_id | ri->map_type) { /* single | to make it a single branch */
> 
> > +	ri->map_id = INT_MAX;
> > +	ri->map_type = BPF_MAP_TYPE_UNSPEC;
> 
> }
> 
> Also, ri->map_id should be set to 0, not INT_MAX.

The or variant does

|         add %gs:this_cpu_off(%rip), %rax        # this_cpu_off, tcp_ptr__
|         movl    32(%rax), %edx  # _51->map_id, _51->map_id
|         orl     36(%rax), %edx  # _51->map_type, tmp311
|         je      .L1546  #,
|         movq    $0, 32(%rax)    #, MEM <vector(2) unsigned int> [(unsigned int *)_51 + 32B]
| .L1546:

while the || does

|         add %gs:this_cpu_off(%rip), %rax        # this_cpu_off, tcp_ptr__
|         cmpq    $0, 32(%rax)    #, *_51
|         je      .L1546  #,
|         movq    $0, 32(%rax)    #, MEM <vector(2) unsigned int> [(unsigned int *)_51 + 32B]
| .L1546:

gcc isn't bad at optimizing here ;)

This is the or version as asked for. I don't mind doing any of the both.
I everyone agrees then I would send it to Greg.

--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -486,7 +486,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	u32 act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	u32 act;
+
+	if (ri->map_id | ri->map_type) {
+		ri->map_id = 0;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+	}
+	act = __bpf_prog_run(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
 
 	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
 		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))

> -Toke

Sebastian

[RFC] Use after free in BPF/ XDP during XDP_REDIRECT

Checks

Commit Message

Comments

Patch