diff mbox series

[net] udp: fix segmentation crash for GRO packet without fraglist

Message ID 20240415150103.23316-1-shiming.cheng@mediatek.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] udp: fix segmentation crash for GRO packet without fraglist | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 928 this patch: 928
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 2 blamed authors not CCed: willemb@google.com steffen.klassert@secunet.com; 6 maintainers not CCed: steffen.klassert@secunet.com bpf@vger.kernel.org linux-mediatek@lists.infradead.org angelogioacchino.delregno@collabora.com willemb@google.com linux-arm-kernel@lists.infradead.org
netdev/build_clang success Errors and warnings before: 938 this patch: 938
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 939 this patch: 939
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 54 this patch: 54
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-15--18-00 (tests: 960)

Commit Message

Shiming Cheng April 15, 2024, 3:01 p.m. UTC
From: Shiming Cheng <shiming.cheng@mediatek.com>

A GRO packet without fraglist is crashed and backtrace is as below:
 [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
 [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
 [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
 [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
 [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
 [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
 [ 1100.812358][    C3]  udp_rcv+0x20/0x30

The reason that the packet loses its fraglist is that in ingress bpf
it makes a test pull with to make sure it can read packet headers
via direct packet access: In bpf_progs/offload.c
try_make_writable -> bpf_skb_pull_data -> pskb_may_pull ->
__pskb_pull_tail  This operation pull the data in fraglist into linear
and set the fraglist to null.

BPF needs to modify a proper length to do pull data. However kernel
should also improve the flow to avoid crash from a bpf function call.
As there is no split flow and app may not decode the merged UDP packet,
we should drop the packet without fraglist in skb_segment_list here.

Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
---
 net/core/skbuff.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Willem de Bruijn April 15, 2024, 8:53 p.m. UTC | #1
shiming.cheng@ wrote:
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> 
> A GRO packet without fraglist is crashed and backtrace is as below:
>  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
>  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
>  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
>  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
>  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
>  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
>  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
> 
> The reason that the packet loses its fraglist is that in ingress bpf
> it makes a test pull with to make sure it can read packet headers
> via direct packet access: In bpf_progs/offload.c
> try_make_writable -> bpf_skb_pull_data -> pskb_may_pull ->
> __pskb_pull_tail  This operation pull the data in fraglist into linear
> and set the fraglist to null.

What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST
skbs?

Some, like SCTP, cannot be linearized ever, as the do not have a
single gso_size.

Should this BPF operation just fail?

> 
> BPF needs to modify a proper length to do pull data. However kernel
> should also improve the flow to avoid crash from a bpf function call.
> As there is no split flow and app may not decode the merged UDP packet,
> we should drop the packet without fraglist in skb_segment_list here.
> 
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  net/core/skbuff.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b99127712e67..f68f2679b086 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>  	if (err)
>  		goto err_linearize;
>  
> +	if (!list_skb)
> +		goto err_linearize;
> +
>  	skb_shinfo(skb)->frag_list = NULL;

In absense of plugging the issue in BPF, dropping here is the best
we can do indeed, I think.
Lena Wang (王娜) April 16, 2024, 2:14 a.m. UTC | #2
On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  shiming.cheng@ wrote:
> > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > 
> > A GRO packet without fraglist is crashed and backtrace is as below:
> >  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
> >  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
> >  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
> >  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
> >  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
> >  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
> >  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
> > 
> > The reason that the packet loses its fraglist is that in ingress
> bpf
> > it makes a test pull with to make sure it can read packet headers
> > via direct packet access: In bpf_progs/offload.c
> > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull ->
> > __pskb_pull_tail  This operation pull the data in fraglist into
> linear
> > and set the fraglist to null.
> 
> What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST
> skbs?
> 
> Some, like SCTP, cannot be linearized ever, as the do not have a
> single gso_size.
> 
> Should this BPF operation just fail?
> 
In most situation for big gso size packet, it indeed fails but BPF
doesn't check the result. It seems the udp GRO packet can't be pulled/
trimed/condensed or else it can't be segmented correctly.

As the BPF function comments it doesn't matter if the data pull failed 
or pull less. It just does a blind best effort pull.

A patch to modify bpf pull length is upstreamed to Google before and
below are part of Google BPF expert maze's reply:
maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM
I *think* if that patch fixes anything, then it's really proving that
there's a bug in the kernel that needs to be fixed instead.
It should be legal to call try_make_writable(skb, X) with *any* value
of X.

I add maze in loop and we could start more discussion here.

> > 
> > BPF needs to modify a proper length to do pull data. However kernel
> > should also improve the flow to avoid crash from a bpf function
> call.
> > As there is no split flow and app may not decode the merged UDP
> packet,
> > we should drop the packet without fraglist in skb_segment_list
> here.
> > 
> > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > ---
> >  net/core/skbuff.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index b99127712e67..f68f2679b086 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct
> sk_buff *skb,
> >  if (err)
> >  goto err_linearize;
> >  
> > +if (!list_skb)
> > +goto err_linearize;
> > +
> >  skb_shinfo(skb)->frag_list = NULL;
> 
> In absense of plugging the issue in BPF, dropping here is the best
> we can do indeed, I think.
>
Maciej Żenczykowski April 16, 2024, 2:53 a.m. UTC | #3
On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote:
>
> On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote:
> >
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  shiming.cheng@ wrote:
> > > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > >
> > > A GRO packet without fraglist is crashed and backtrace is as below:
> > >  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > > G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
> > >  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
> > >  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
> > >  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
> > >  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
> > >  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
> > >  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
> > >
> > > The reason that the packet loses its fraglist is that in ingress
> > bpf
> > > it makes a test pull with to make sure it can read packet headers
> > > via direct packet access: In bpf_progs/offload.c
> > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull ->
> > > __pskb_pull_tail  This operation pull the data in fraglist into
> > linear
> > > and set the fraglist to null.
> >
> > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST
> > skbs?
> >
> > Some, like SCTP, cannot be linearized ever, as the do not have a
> > single gso_size.
> >
> > Should this BPF operation just fail?
> >
> In most situation for big gso size packet, it indeed fails but BPF
> doesn't check the result. It seems the udp GRO packet can't be pulled/
> trimed/condensed or else it can't be segmented correctly.
>
> As the BPF function comments it doesn't matter if the data pull failed
> or pull less. It just does a blind best effort pull.
>
> A patch to modify bpf pull length is upstreamed to Google before and
> below are part of Google BPF expert maze's reply:
> maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM
> I *think* if that patch fixes anything, then it's really proving that
> there's a bug in the kernel that needs to be fixed instead.
> It should be legal to call try_make_writable(skb, X) with *any* value
> of X.
>
> I add maze in loop and we could start more discussion here.

Personally, I think bpf_skb_pull_data() should have automatically
(ie. in kernel code) reduced how much it pulls so that it would pull
headers only,
and not packet content.
(This is assuming the rest of the code isn't ready to deal with a longer pull,
which I think is the case atm.  Pulling too much, and then crashing or forcing
the stack to drop packets because of them being malformed seems wrong...)

In general it would be nice if there was a way to just say pull all headers...
(or possibly all L2/L3/L4 headers)
You in general need to pull stuff *before* you've even looked at the packet,
so that you can look at the packet,
so it's relatively hard/annoying to pull the correct length from bpf
code itself.

> > > BPF needs to modify a proper length to do pull data. However kernel
> > > should also improve the flow to avoid crash from a bpf function
> > call.
> > > As there is no split flow and app may not decode the merged UDP
> > packet,
> > > we should drop the packet without fraglist in skb_segment_list
> > here.
> > >
> > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > ---
> > >  net/core/skbuff.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index b99127712e67..f68f2679b086 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct
> > sk_buff *skb,
> > >  if (err)
> > >  goto err_linearize;
> > >
> > > +if (!list_skb)
> > > +goto err_linearize;
> > > +
> > >  skb_shinfo(skb)->frag_list = NULL;
> >
> > In absense of plugging the issue in BPF, dropping here is the best
> > we can do indeed, I think.
> >

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Willem de Bruijn April 16, 2024, 5:16 p.m. UTC | #4
Maciej Żenczykowski wrote:
> On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote:
> >
> > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote:
> > >
> > > External email : Please do not click links or open attachments until
> > > you have verified the sender or the content.
> > >  shiming.cheng@ wrote:
> > > > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > > >
> > > > A GRO packet without fraglist is crashed and backtrace is as below:
> > > >  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > > > G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
> > > >  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
> > > >  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
> > > >  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
> > > >  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
> > > >  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
> > > >  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
> > > >
> > > > The reason that the packet loses its fraglist is that in ingress
> > > bpf
> > > > it makes a test pull with to make sure it can read packet headers
> > > > via direct packet access: In bpf_progs/offload.c
> > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull ->
> > > > __pskb_pull_tail  This operation pull the data in fraglist into
> > > linear
> > > > and set the fraglist to null.
> > >
> > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST
> > > skbs?
> > >
> > > Some, like SCTP, cannot be linearized ever, as the do not have a
> > > single gso_size.
> > >
> > > Should this BPF operation just fail?
> > >
> > In most situation for big gso size packet, it indeed fails but BPF
> > doesn't check the result. It seems the udp GRO packet can't be pulled/
> > trimed/condensed or else it can't be segmented correctly.
> >
> > As the BPF function comments it doesn't matter if the data pull failed
> > or pull less. It just does a blind best effort pull.
> >
> > A patch to modify bpf pull length is upstreamed to Google before and
> > below are part of Google BPF expert maze's reply:
> > maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM
> > I *think* if that patch fixes anything, then it's really proving that
> > there's a bug in the kernel that needs to be fixed instead.
> > It should be legal to call try_make_writable(skb, X) with *any* value
> > of X.
> >
> > I add maze in loop and we could start more discussion here.
> 
> Personally, I think bpf_skb_pull_data() should have automatically
> (ie. in kernel code) reduced how much it pulls so that it would pull
> headers only,

That would be a helper that parses headers to discover header length.
Parsing is better left to the BPF program.

> and not packet content.
> (This is assuming the rest of the code isn't ready to deal with a longer pull,
> which I think is the case atm.  Pulling too much, and then crashing or forcing
> the stack to drop packets because of them being malformed seems wrong...)
> 
> In general it would be nice if there was a way to just say pull all headers...
> (or possibly all L2/L3/L4 headers)
> You in general need to pull stuff *before* you've even looked at the packet,
> so that you can look at the packet,
> so it's relatively hard/annoying to pull the correct length from bpf
> code itself.
> 
> > > > BPF needs to modify a proper length to do pull data. However kernel
> > > > should also improve the flow to avoid crash from a bpf function
> > > call.
> > > > As there is no split flow and app may not decode the merged UDP
> > > packet,
> > > > we should drop the packet without fraglist in skb_segment_list
> > > here.
> > > >
> > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > ---
> > > >  net/core/skbuff.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > index b99127712e67..f68f2679b086 100644
> > > > --- a/net/core/skbuff.c
> > > > +++ b/net/core/skbuff.c
> > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct
> > > sk_buff *skb,
> > > >  if (err)
> > > >  goto err_linearize;
> > > >
> > > > +if (!list_skb)
> > > > +goto err_linearize;
> > > > +

This would catch the case where the entire data frag_list is
linearized, but not a pskb_may_pull that only pulls in part of the
list.

Even with BPF being privileged, the kernel should not crash if BPF
pulls a FRAGLIST GSO skb.

But the check needs to be refined a bit. For a UDP GSO packet, I
think gso_size is still valid, so if the head_skb length does not
match gso_size, it has been messed with and should be dropped.

For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull
may be entirely undetectable as long as frag_list != NULL?


> > > >  skb_shinfo(skb)->frag_list = NULL;
> > >
> > > In absense of plugging the issue in BPF, dropping here is the best
> > > we can do indeed, I think.
> > >
> 
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski April 16, 2024, 5:51 p.m. UTC | #5
On Tue, Apr 16, 2024 at 10:16 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Maciej Żenczykowski wrote:
> > On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote:
> > >
> > > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote:
> > > >
> > > > External email : Please do not click links or open attachments until
> > > > you have verified the sender or the content.
> > > >  shiming.cheng@ wrote:
> > > > > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > > > >
> > > > > A GRO packet without fraglist is crashed and backtrace is as below:
> > > > >  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > > > > G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
> > > > >  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
> > > > >  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
> > > > >  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
> > > > >  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
> > > > >  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
> > > > >  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
> > > > >
> > > > > The reason that the packet loses its fraglist is that in ingress
> > > > bpf
> > > > > it makes a test pull with to make sure it can read packet headers
> > > > > via direct packet access: In bpf_progs/offload.c
> > > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull ->
> > > > > __pskb_pull_tail  This operation pull the data in fraglist into
> > > > linear
> > > > > and set the fraglist to null.
> > > >
> > > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST
> > > > skbs?
> > > >
> > > > Some, like SCTP, cannot be linearized ever, as the do not have a
> > > > single gso_size.
> > > >
> > > > Should this BPF operation just fail?
> > > >
> > > In most situation for big gso size packet, it indeed fails but BPF
> > > doesn't check the result. It seems the udp GRO packet can't be pulled/
> > > trimed/condensed or else it can't be segmented correctly.
> > >
> > > As the BPF function comments it doesn't matter if the data pull failed
> > > or pull less. It just does a blind best effort pull.
> > >
> > > A patch to modify bpf pull length is upstreamed to Google before and
> > > below are part of Google BPF expert maze's reply:
> > > maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM
> > > I *think* if that patch fixes anything, then it's really proving that
> > > there's a bug in the kernel that needs to be fixed instead.
> > > It should be legal to call try_make_writable(skb, X) with *any* value
> > > of X.
> > >
> > > I add maze in loop and we could start more discussion here.
> >
> > Personally, I think bpf_skb_pull_data() should have automatically
> > (ie. in kernel code) reduced how much it pulls so that it would pull
> > headers only,
>
> That would be a helper that parses headers to discover header length.

Does it actually need to?  Presumably the bpf pull function could
notice that it is
a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce the pull
accordingly so that it doesn't pull anything from the non-linear
fraglist portion???

I know only the generic overview of what udp gso is, not any details, so I am
assuming here that there's some sort of guarantee to how these packets
are structured...  But I imagine there must be or we wouldn't be hitting these
issues deeper in the stack?

> Parsing is better left to the BPF program.
>
> > and not packet content.
> > (This is assuming the rest of the code isn't ready to deal with a longer pull,
> > which I think is the case atm.  Pulling too much, and then crashing or forcing
> > the stack to drop packets because of them being malformed seems wrong...)
> >
> > In general it would be nice if there was a way to just say pull all headers...
> > (or possibly all L2/L3/L4 headers)
> > You in general need to pull stuff *before* you've even looked at the packet,
> > so that you can look at the packet,
> > so it's relatively hard/annoying to pull the correct length from bpf
> > code itself.
> >
> > > > > BPF needs to modify a proper length to do pull data. However kernel
> > > > > should also improve the flow to avoid crash from a bpf function
> > > > call.
> > > > > As there is no split flow and app may not decode the merged UDP
> > > > packet,
> > > > > we should drop the packet without fraglist in skb_segment_list
> > > > here.
> > > > >
> > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > > ---
> > > > >  net/core/skbuff.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > index b99127712e67..f68f2679b086 100644
> > > > > --- a/net/core/skbuff.c
> > > > > +++ b/net/core/skbuff.c
> > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct
> > > > sk_buff *skb,
> > > > >  if (err)
> > > > >  goto err_linearize;
> > > > >
> > > > > +if (!list_skb)
> > > > > +goto err_linearize;
> > > > > +
>
> This would catch the case where the entire data frag_list is
> linearized, but not a pskb_may_pull that only pulls in part of the
> list.
>
> Even with BPF being privileged, the kernel should not crash if BPF
> pulls a FRAGLIST GSO skb.
>
> But the check needs to be refined a bit. For a UDP GSO packet, I
> think gso_size is still valid, so if the head_skb length does not
> match gso_size, it has been messed with and should be dropped.
>
> For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull
> may be entirely undetectable as long as frag_list != NULL?
>
>
> > > > >  skb_shinfo(skb)->frag_list = NULL;
> > > >
> > > > In absense of plugging the issue in BPF, dropping here is the best
> > > > we can do indeed, I think.
Maciej Żenczykowski April 16, 2024, 5:57 p.m. UTC | #6
On Tue, Apr 16, 2024 at 10:51 AM Maciej Żenczykowski <maze@google.com> wrote:
>
> On Tue, Apr 16, 2024 at 10:16 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Maciej Żenczykowski wrote:
> > > On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote:
> > > >
> > > > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote:
> > > > >
> > > > > External email : Please do not click links or open attachments until
> > > > > you have verified the sender or the content.
> > > > >  shiming.cheng@ wrote:
> > > > > > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > > > > >
> > > > > > A GRO packet without fraglist is crashed and backtrace is as below:
> > > > > >  [ 1100.812205][    C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted:
> > > > > > G        W  OE      6.6.17-android15-0-g380371ea9bf1 #1
> > > > > >  [ 1100.812317][    C3]  __udp_gso_segment+0x298/0x4d4
> > > > > >  [ 1100.812335][    C3]  __skb_gso_segment+0xc4/0x120
> > > > > >  [ 1100.812339][    C3]  udp_rcv_segment+0x50/0x134
> > > > > >  [ 1100.812344][    C3]  udp_queue_rcv_skb+0x74/0x114
> > > > > >  [ 1100.812348][    C3]  udp_unicast_rcv_skb+0x94/0xac
> > > > > >  [ 1100.812358][    C3]  udp_rcv+0x20/0x30
> > > > > >
> > > > > > The reason that the packet loses its fraglist is that in ingress
> > > > > bpf
> > > > > > it makes a test pull with to make sure it can read packet headers
> > > > > > via direct packet access: In bpf_progs/offload.c
> > > > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull ->
> > > > > > __pskb_pull_tail  This operation pull the data in fraglist into
> > > > > linear
> > > > > > and set the fraglist to null.
> > > > >
> > > > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST
> > > > > skbs?
> > > > >
> > > > > Some, like SCTP, cannot be linearized ever, as the do not have a
> > > > > single gso_size.
> > > > >
> > > > > Should this BPF operation just fail?
> > > > >
> > > > In most situation for big gso size packet, it indeed fails but BPF
> > > > doesn't check the result. It seems the udp GRO packet can't be pulled/
> > > > trimed/condensed or else it can't be segmented correctly.
> > > >
> > > > As the BPF function comments it doesn't matter if the data pull failed
> > > > or pull less. It just does a blind best effort pull.
> > > >
> > > > A patch to modify bpf pull length is upstreamed to Google before and
> > > > below are part of Google BPF expert maze's reply:
> > > > maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM
> > > > I *think* if that patch fixes anything, then it's really proving that
> > > > there's a bug in the kernel that needs to be fixed instead.
> > > > It should be legal to call try_make_writable(skb, X) with *any* value
> > > > of X.
> > > >
> > > > I add maze in loop and we could start more discussion here.
> > >
> > > Personally, I think bpf_skb_pull_data() should have automatically
> > > (ie. in kernel code) reduced how much it pulls so that it would pull
> > > headers only,
> >
> > That would be a helper that parses headers to discover header length.
>
> Does it actually need to?  Presumably the bpf pull function could
> notice that it is
> a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce the pull
> accordingly so that it doesn't pull anything from the non-linear
> fraglist portion???
>
> I know only the generic overview of what udp gso is, not any details, so I am
> assuming here that there's some sort of guarantee to how these packets
> are structured...  But I imagine there must be or we wouldn't be hitting these
> issues deeper in the stack?

Perhaps for a packet of this type we're already guaranteed the headers
are in the linear portion,
and the pull should simply be ignored?

>
> > Parsing is better left to the BPF program.
> >
> > > and not packet content.
> > > (This is assuming the rest of the code isn't ready to deal with a longer pull,
> > > which I think is the case atm.  Pulling too much, and then crashing or forcing
> > > the stack to drop packets because of them being malformed seems wrong...)
> > >
> > > In general it would be nice if there was a way to just say pull all headers...
> > > (or possibly all L2/L3/L4 headers)
> > > You in general need to pull stuff *before* you've even looked at the packet,
> > > so that you can look at the packet,
> > > so it's relatively hard/annoying to pull the correct length from bpf
> > > code itself.
> > >
> > > > > > BPF needs to modify a proper length to do pull data. However kernel
> > > > > > should also improve the flow to avoid crash from a bpf function
> > > > > call.
> > > > > > As there is no split flow and app may not decode the merged UDP
> > > > > packet,
> > > > > > we should drop the packet without fraglist in skb_segment_list
> > > > > here.
> > > > > >
> > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > > > ---
> > > > > >  net/core/skbuff.c | 3 +++
> > > > > >  1 file changed, 3 insertions(+)
> > > > > >
> > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > > index b99127712e67..f68f2679b086 100644
> > > > > > --- a/net/core/skbuff.c
> > > > > > +++ b/net/core/skbuff.c
> > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct
> > > > > sk_buff *skb,
> > > > > >  if (err)
> > > > > >  goto err_linearize;
> > > > > >
> > > > > > +if (!list_skb)
> > > > > > +goto err_linearize;
> > > > > > +
> >
> > This would catch the case where the entire data frag_list is
> > linearized, but not a pskb_may_pull that only pulls in part of the
> > list.
> >
> > Even with BPF being privileged, the kernel should not crash if BPF
> > pulls a FRAGLIST GSO skb.
> >
> > But the check needs to be refined a bit. For a UDP GSO packet, I
> > think gso_size is still valid, so if the head_skb length does not
> > match gso_size, it has been messed with and should be dropped.
> >
> > For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull
> > may be entirely undetectable as long as frag_list != NULL?
> >
> >
> > > > > >  skb_shinfo(skb)->frag_list = NULL;
> > > > >
> > > > > In absense of plugging the issue in BPF, dropping here is the best
> > > > > we can do indeed, I think.

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Willem de Bruijn April 16, 2024, 11:14 p.m. UTC | #7
> > > > Personally, I think bpf_skb_pull_data() should have automatically
> > > > (ie. in kernel code) reduced how much it pulls so that it would pull
> > > > headers only,
> > >
> > > That would be a helper that parses headers to discover header length.
> >
> > Does it actually need to?  Presumably the bpf pull function could
> > notice that it is
> > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce the pull
> > accordingly so that it doesn't pull anything from the non-linear
> > fraglist portion???
> >
> > I know only the generic overview of what udp gso is, not any details, so I am
> > assuming here that there's some sort of guarantee to how these packets
> > are structured...  But I imagine there must be or we wouldn't be hitting these
> > issues deeper in the stack?
> 
> Perhaps for a packet of this type we're already guaranteed the headers
> are in the linear portion,
> and the pull should simply be ignored?
> 
> >
> > > Parsing is better left to the BPF program.

I do prefer adding sanity checks to the BPF helpers, over having to
add then in the net hot path only to protect against dangerous BPF
programs.

In this case, it would be detecting this GSO type and failing the
operation if exceeding skb_headlen().
> > >
> > > > and not packet content.
> > > > (This is assuming the rest of the code isn't ready to deal with a longer pull,
> > > > which I think is the case atm.  Pulling too much, and then crashing or forcing
> > > > the stack to drop packets because of them being malformed seems wrong...)
> > > >
> > > > In general it would be nice if there was a way to just say pull all headers...
> > > > (or possibly all L2/L3/L4 headers)
> > > > You in general need to pull stuff *before* you've even looked at the packet,
> > > > so that you can look at the packet,
> > > > so it's relatively hard/annoying to pull the correct length from bpf
> > > > code itself.
> > > >
> > > > > > > BPF needs to modify a proper length to do pull data. However kernel
> > > > > > > should also improve the flow to avoid crash from a bpf function
> > > > > > call.
> > > > > > > As there is no split flow and app may not decode the merged UDP
> > > > > > packet,
> > > > > > > we should drop the packet without fraglist in skb_segment_list
> > > > > > here.
> > > > > > >
> > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > > > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > > > > ---
> > > > > > >  net/core/skbuff.c | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > >
> > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > > > index b99127712e67..f68f2679b086 100644
> > > > > > > --- a/net/core/skbuff.c
> > > > > > > +++ b/net/core/skbuff.c
> > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct
> > > > > > sk_buff *skb,
> > > > > > >  if (err)
> > > > > > >  goto err_linearize;
> > > > > > >
> > > > > > > +if (!list_skb)
> > > > > > > +goto err_linearize;
> > > > > > > +
> > >
> > > This would catch the case where the entire data frag_list is
> > > linearized, but not a pskb_may_pull that only pulls in part of the
> > > list.
> > >
> > > Even with BPF being privileged, the kernel should not crash if BPF
> > > pulls a FRAGLIST GSO skb.
> > >
> > > But the check needs to be refined a bit. For a UDP GSO packet, I
> > > think gso_size is still valid, so if the head_skb length does not
> > > match gso_size, it has been messed with and should be dropped.
> > >
> > > For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull
> > > may be entirely undetectable as long as frag_list != NULL?
> > >
> > >
> > > > > > >  skb_shinfo(skb)->frag_list = NULL;
> > > > > >
> > > > > > In absense of plugging the issue in BPF, dropping here is the best
> > > > > > we can do indeed, I think.
> 
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google
Lena Wang (王娜) April 17, 2024, 7:19 a.m. UTC | #8
On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  > > > > Personally, I think bpf_skb_pull_data() should have
> automatically
> > > > > (ie. in kernel code) reduced how much it pulls so that it
> would pull
> > > > > headers only,
> > > >
> > > > That would be a helper that parses headers to discover header
> length.
> > >
> > > Does it actually need to?  Presumably the bpf pull function could
> > > notice that it is
> > > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce
> the pull
> > > accordingly so that it doesn't pull anything from the non-linear
> > > fraglist portion???
> > >
> > > I know only the generic overview of what udp gso is, not any
> details, so I am
> > > assuming here that there's some sort of guarantee to how these
> packets
> > > are structured...  But I imagine there must be or we wouldn't be
> hitting these
> > > issues deeper in the stack?
> > 
> > Perhaps for a packet of this type we're already guaranteed the
> headers
> > are in the linear portion,
> > and the pull should simply be ignored?
> > 
> > >
> > > > Parsing is better left to the BPF program.
> 
> I do prefer adding sanity checks to the BPF helpers, over having to
> add then in the net hot path only to protect against dangerous BPF
> programs.
> 
Is it OK to ignore or decrease pull length for udp gro fraglist packet?
It could save the normal packet and sent to user correctly.

In common/net/core/filter.c
static inline int __bpf_try_make_writable(struct sk_buff *skb,
              unsigned int write_len)
{ 
+	if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
+		(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
+		return 0;

+	     or if (write_len > skb_headlen(skb))
+			write_len = skb_headlen(skb);
+	}
	return skb_ensure_writable(skb, write_len);
}
 

> In this case, it would be detecting this GSO type and failing the
> operation if exceeding skb_headlen().
> > > >
> > > > > and not packet content.
> > > > > (This is assuming the rest of the code isn't ready to deal
> with a longer pull,
> > > > > which I think is the case atm.  Pulling too much, and then
> crashing or forcing
> > > > > the stack to drop packets because of them being malformed
> seems wrong...)
> > > > >
> > > > > In general it would be nice if there was a way to just say
> pull all headers...
> > > > > (or possibly all L2/L3/L4 headers)
> > > > > You in general need to pull stuff *before* you've even looked
> at the packet,
> > > > > so that you can look at the packet,
> > > > > so it's relatively hard/annoying to pull the correct length
> from bpf
> > > > > code itself.
> > > > >
> > > > > > > > BPF needs to modify a proper length to do pull data.
> However kernel
> > > > > > > > should also improve the flow to avoid crash from a bpf
> function
> > > > > > > call.
> > > > > > > > As there is no split flow and app may not decode the
> merged UDP
> > > > > > > packet,
> > > > > > > > we should drop the packet without fraglist in
> skb_segment_list
> > > > > > > here.
> > > > > > > >
> > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist
> chaining.")
> > > > > > > > Signed-off-by: Shiming Cheng <
> shiming.cheng@mediatek.com>
> > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > > > > > ---
> > > > > > > >  net/core/skbuff.c | 3 +++
> > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > > > > index b99127712e67..f68f2679b086 100644
> > > > > > > > --- a/net/core/skbuff.c
> > > > > > > > +++ b/net/core/skbuff.c
> > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff
> *skb_segment_list(struct
> > > > > > > sk_buff *skb,
> > > > > > > >  if (err)
> > > > > > > >  goto err_linearize;
> > > > > > > >
> > > > > > > > +if (!list_skb)
> > > > > > > > +goto err_linearize;
> > > > > > > > +
> > > >
> > > > This would catch the case where the entire data frag_list is
> > > > linearized, but not a pskb_may_pull that only pulls in part of
> the
> > > > list.
> > > >
> > > > Even with BPF being privileged, the kernel should not crash if
> BPF
> > > > pulls a FRAGLIST GSO skb.
> > > >
> > > > But the check needs to be refined a bit. For a UDP GSO packet,
> I
> > > > think gso_size is still valid, so if the head_skb length does
> not
> > > > match gso_size, it has been messed with and should be dropped.
> > > >
Is it OK as below? Is it OK to add log to record the error for easy
checking issue.

In net/core/skbuff.c skb_segment_list
+unsigned int mss = skb_shinfo(head_skb)->gso_size;
+bool err_len = false;

+if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
+	pr_err("skb is dropped due to messed data. gso size:%d,
+		hdrlen:%d", mss, skb_headlen(head_skb)
+	if (!list_skb)
+		goto err_linearize;
+	else
+		err_len = true;
+}

...
+if (err_len) {
+	goto err_linearize;
+}

skb_get(skb);
...

> > > > For a GSO_BY_FRAGS skb, there is no single gso_size, and this
> pull
> > > > may be entirely undetectable as long as frag_list != NULL?
> > > >
> > > >
In function skb_segment_list(), it just handle udp fraglist gro packet.
nr_frags will be 0 here. 

It records a SKB_GSO_DODGY in gso_type when doing partially eaten for
fraglist in __pskb_pull_tail and in skb_segment() it will check and
disable NETIF_F_SG. 
skb_segment could segment data as gso_size even if it is pulled into
hearder skb. I am not sure if it can decode when frag_list is NULL or
partially eaten as no BPF pulls illegal length for tcp packet. Our
platfrom doesn't meet issues in skb_segment for tcp packet till now.

> > > > > > > >  skb_shinfo(skb)->frag_list = NULL;
> > > > > > >
> > > > > > > In absense of plugging the issue in BPF, dropping here is
> the best
> > > > > > > we can do indeed, I think.
> > 
> > --
> > Maciej Żenczykowski, Kernel Networking Developer @ Google
> 
>
Willem de Bruijn April 17, 2024, 7:48 p.m. UTC | #9
Lena Wang (王娜) wrote:
> On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  > > > > Personally, I think bpf_skb_pull_data() should have
> > automatically
> > > > > > (ie. in kernel code) reduced how much it pulls so that it
> > would pull
> > > > > > headers only,
> > > > >
> > > > > That would be a helper that parses headers to discover header
> > length.
> > > >
> > > > Does it actually need to?  Presumably the bpf pull function could
> > > > notice that it is
> > > > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce
> > the pull
> > > > accordingly so that it doesn't pull anything from the non-linear
> > > > fraglist portion???
> > > >
> > > > I know only the generic overview of what udp gso is, not any
> > details, so I am
> > > > assuming here that there's some sort of guarantee to how these
> > packets
> > > > are structured...  But I imagine there must be or we wouldn't be
> > hitting these
> > > > issues deeper in the stack?
> > > 
> > > Perhaps for a packet of this type we're already guaranteed the
> > headers
> > > are in the linear portion,
> > > and the pull should simply be ignored?
> > > 
> > > >
> > > > > Parsing is better left to the BPF program.
> > 
> > I do prefer adding sanity checks to the BPF helpers, over having to
> > add then in the net hot path only to protect against dangerous BPF
> > programs.
> > 
> Is it OK to ignore or decrease pull length for udp gro fraglist packet?
> It could save the normal packet and sent to user correctly.
> 
> In common/net/core/filter.c
> static inline int __bpf_try_make_writable(struct sk_buff *skb,
>               unsigned int write_len)
> { 
> +	if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> +		(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {

The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.

> +		return 0;

Failing for any pull is a bit excessive. And would kill a sane
workaround of pulling only as many bytes as needed.
 
> +	     or if (write_len > skb_headlen(skb))
> +			write_len = skb_headlen(skb);

Truncating requests would be a surprising change of behavior
for this function.

Failing for a pull > skb_headlen is arguably reasonable, as
the alternative is that we let it go through but have to drop
the now malformed packets on segmentation.


> +	}
> 	return skb_ensure_writable(skb, write_len);
> }
>  
> 
> > In this case, it would be detecting this GSO type and failing the
> > operation if exceeding skb_headlen().
> > > > >
> > > > > > and not packet content.
> > > > > > (This is assuming the rest of the code isn't ready to deal
> > with a longer pull,
> > > > > > which I think is the case atm.  Pulling too much, and then
> > crashing or forcing
> > > > > > the stack to drop packets because of them being malformed
> > seems wrong...)
> > > > > >
> > > > > > In general it would be nice if there was a way to just say
> > pull all headers...
> > > > > > (or possibly all L2/L3/L4 headers)
> > > > > > You in general need to pull stuff *before* you've even looked
> > at the packet,
> > > > > > so that you can look at the packet,
> > > > > > so it's relatively hard/annoying to pull the correct length
> > from bpf
> > > > > > code itself.
> > > > > >
> > > > > > > > > BPF needs to modify a proper length to do pull data.
> > However kernel
> > > > > > > > > should also improve the flow to avoid crash from a bpf
> > function
> > > > > > > > call.
> > > > > > > > > As there is no split flow and app may not decode the
> > merged UDP
> > > > > > > > packet,
> > > > > > > > > we should drop the packet without fraglist in
> > skb_segment_list
> > > > > > > > here.
> > > > > > > > >
> > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist
> > chaining.")
> > > > > > > > > Signed-off-by: Shiming Cheng <
> > shiming.cheng@mediatek.com>
> > > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > > > > > > ---
> > > > > > > > >  net/core/skbuff.c | 3 +++
> > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > > > > > index b99127712e67..f68f2679b086 100644
> > > > > > > > > --- a/net/core/skbuff.c
> > > > > > > > > +++ b/net/core/skbuff.c
> > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff
> > *skb_segment_list(struct
> > > > > > > > sk_buff *skb,
> > > > > > > > >  if (err)
> > > > > > > > >  goto err_linearize;
> > > > > > > > >
> > > > > > > > > +if (!list_skb)
> > > > > > > > > +goto err_linearize;
> > > > > > > > > +
> > > > >
> > > > > This would catch the case where the entire data frag_list is
> > > > > linearized, but not a pskb_may_pull that only pulls in part of
> > the
> > > > > list.
> > > > >
> > > > > Even with BPF being privileged, the kernel should not crash if
> > BPF
> > > > > pulls a FRAGLIST GSO skb.
> > > > >
> > > > > But the check needs to be refined a bit. For a UDP GSO packet,
> > I
> > > > > think gso_size is still valid, so if the head_skb length does
> > not
> > > > > match gso_size, it has been messed with and should be dropped.
> > > > >
> Is it OK as below? Is it OK to add log to record the error for easy
> checking issue.
> 
> In net/core/skbuff.c skb_segment_list
> +unsigned int mss = skb_shinfo(head_skb)->gso_size;
> +bool err_len = false;
> 
> +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
> +	pr_err("skb is dropped due to messed data. gso size:%d,
> +		hdrlen:%d", mss, skb_headlen(head_skb)

Such logs should always be rate limited. But no need to log cases
where we well understood how we get there.

I would stick with one approach: either in the BPF func or in
segmentation, not both. And then I find BPF preferable, as explained
before.

> +	if (!list_skb)
> +		goto err_linearize;
> +	else
> +		err_len = true;
> +}
> 
> ...
> +if (err_len) {
> +	goto err_linearize;
> +}
> 
> skb_get(skb);
> ...
Lena Wang (王娜) April 18, 2024, 2:52 a.m. UTC | #10
On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  Lena Wang (王娜) wrote:
> > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> > >   
> > > External email : Please do not click links or open attachments
> until
> > > you have verified the sender or the content.
> > >  > > > > Personally, I think bpf_skb_pull_data() should have
> > > automatically
> > > > > > > (ie. in kernel code) reduced how much it pulls so that it
> > > would pull
> > > > > > > headers only,
> > > > > >
> > > > > > That would be a helper that parses headers to discover
> header
> > > length.
> > > > >
> > > > > Does it actually need to?  Presumably the bpf pull function
> could
> > > > > notice that it is
> > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) and
> reduce
> > > the pull
> > > > > accordingly so that it doesn't pull anything from the non-
> linear
> > > > > fraglist portion???
> > > > >
> > > > > I know only the generic overview of what udp gso is, not any
> > > details, so I am
> > > > > assuming here that there's some sort of guarantee to how
> these
> > > packets
> > > > > are structured...  But I imagine there must be or we wouldn't
> be
> > > hitting these
> > > > > issues deeper in the stack?
> > > > 
> > > > Perhaps for a packet of this type we're already guaranteed the
> > > headers
> > > > are in the linear portion,
> > > > and the pull should simply be ignored?
> > > > 
> > > > >
> > > > > > Parsing is better left to the BPF program.
> > > 
> > > I do prefer adding sanity checks to the BPF helpers, over having
> to
> > > add then in the net hot path only to protect against dangerous
> BPF
> > > programs.
> > > 
> > Is it OK to ignore or decrease pull length for udp gro fraglist
> packet?
> > It could save the normal packet and sent to user correctly.
> > 
> > In common/net/core/filter.c
> > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> >               unsigned int write_len)
> > { 
> > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > +(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
> 
> The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.
> 
Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In
udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST|
SKB_GSO_UDP_L4". Here checking these two flags is to limit the packet
as "UDP + need GSO + fraglist".

We could remove SKB_GSO_UDP_L4 check for more packet that may addrive
skb_segment_list.

> > +return 0;
> 
> Failing for any pull is a bit excessive. And would kill a sane
> workaround of pulling only as many bytes as needed.
>  
> > +     or if (write_len > skb_headlen(skb))
> > +write_len = skb_headlen(skb);
> 
> Truncating requests would be a surprising change of behavior
> for this function.
> 
> Failing for a pull > skb_headlen is arguably reasonable, as
> the alternative is that we let it go through but have to drop
> the now malformed packets on segmentation.
> 
> 
Is it OK as below?

In common/net/core/filter.c
static inline int __bpf_try_make_writable(struct sk_buff *skb,
              unsigned int write_len)
{ 
+       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
+               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
+               return 0;
+       }
        return skb_ensure_writable(skb, write_len);
}

> > +}
> > return skb_ensure_writable(skb, write_len);
> > }
> >  
> > 
> > > In this case, it would be detecting this GSO type and failing the
> > > operation if exceeding skb_headlen().
> > > > > >
> > > > > > > and not packet content.
> > > > > > > (This is assuming the rest of the code isn't ready to
> deal
> > > with a longer pull,
> > > > > > > which I think is the case atm.  Pulling too much, and
> then
> > > crashing or forcing
> > > > > > > the stack to drop packets because of them being malformed
> > > seems wrong...)
> > > > > > >
> > > > > > > In general it would be nice if there was a way to just
> say
> > > pull all headers...
> > > > > > > (or possibly all L2/L3/L4 headers)
> > > > > > > You in general need to pull stuff *before* you've even
> looked
> > > at the packet,
> > > > > > > so that you can look at the packet,
> > > > > > > so it's relatively hard/annoying to pull the correct
> length
> > > from bpf
> > > > > > > code itself.
> > > > > > >
> > > > > > > > > > BPF needs to modify a proper length to do pull
> data.
> > > However kernel
> > > > > > > > > > should also improve the flow to avoid crash from a
> bpf
> > > function
> > > > > > > > > call.
> > > > > > > > > > As there is no split flow and app may not decode
> the
> > > merged UDP
> > > > > > > > > packet,
> > > > > > > > > > we should drop the packet without fraglist in
> > > skb_segment_list
> > > > > > > > > here.
> > > > > > > > > >
> > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist
> > > chaining.")
> > > > > > > > > > Signed-off-by: Shiming Cheng <
> > > shiming.cheng@mediatek.com>
> > > > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > > > > > > > ---
> > > > > > > > > >  net/core/skbuff.c | 3 +++
> > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > > > > > > index b99127712e67..f68f2679b086 100644
> > > > > > > > > > --- a/net/core/skbuff.c
> > > > > > > > > > +++ b/net/core/skbuff.c
> > > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff
> > > *skb_segment_list(struct
> > > > > > > > > sk_buff *skb,
> > > > > > > > > >  if (err)
> > > > > > > > > >  goto err_linearize;
> > > > > > > > > >
> > > > > > > > > > +if (!list_skb)
> > > > > > > > > > +goto err_linearize;
> > > > > > > > > > +
> > > > > >
> > > > > > This would catch the case where the entire data frag_list
> is
> > > > > > linearized, but not a pskb_may_pull that only pulls in part
> of
> > > the
> > > > > > list.
> > > > > >
> > > > > > Even with BPF being privileged, the kernel should not crash
> if
> > > BPF
> > > > > > pulls a FRAGLIST GSO skb.
> > > > > >
> > > > > > But the check needs to be refined a bit. For a UDP GSO
> packet,
> > > I
> > > > > > think gso_size is still valid, so if the head_skb length
> does
> > > not
> > > > > > match gso_size, it has been messed with and should be
> dropped.
> > > > > >
> > Is it OK as below? Is it OK to add log to record the error for easy
> > checking issue.
> > 
> > In net/core/skbuff.c skb_segment_list
> > +unsigned int mss = skb_shinfo(head_skb)->gso_size;
> > +bool err_len = false;
> > 
> > +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
> > +pr_err("skb is dropped due to messed data. gso size:%d,
> > +hdrlen:%d", mss, skb_headlen(head_skb)
> 
> Such logs should always be rate limited. But no need to log cases
> where we well understood how we get there.
> 
> I would stick with one approach: either in the BPF func or in
> segmentation, not both. And then I find BPF preferable, as explained
> before.
> 
OK, we try make a patch in BPF func.

> > +if (!list_skb)
> > +goto err_linearize;
> > +else
> > +err_len = true;
> > +}
> > 
> > ...
> > +if (err_len) {
> > +goto err_linearize;
> > +}
> > 
> > skb_get(skb);
> > ...
Maciej Żenczykowski April 18, 2024, 4:15 a.m. UTC | #11
On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote:
>
> On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
> >
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  Lena Wang (王娜) wrote:
> > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> > > >
> > > > External email : Please do not click links or open attachments
> > until
> > > > you have verified the sender or the content.
> > > >  > > > > Personally, I think bpf_skb_pull_data() should have
> > > > automatically
> > > > > > > > (ie. in kernel code) reduced how much it pulls so that it
> > > > would pull
> > > > > > > > headers only,
> > > > > > >
> > > > > > > That would be a helper that parses headers to discover
> > header
> > > > length.
> > > > > >
> > > > > > Does it actually need to?  Presumably the bpf pull function
> > could
> > > > > > notice that it is
> > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) and
> > reduce
> > > > the pull
> > > > > > accordingly so that it doesn't pull anything from the non-
> > linear
> > > > > > fraglist portion???
> > > > > >
> > > > > > I know only the generic overview of what udp gso is, not any
> > > > details, so I am
> > > > > > assuming here that there's some sort of guarantee to how
> > these
> > > > packets
> > > > > > are structured...  But I imagine there must be or we wouldn't
> > be
> > > > hitting these
> > > > > > issues deeper in the stack?
> > > > >
> > > > > Perhaps for a packet of this type we're already guaranteed the
> > > > headers
> > > > > are in the linear portion,
> > > > > and the pull should simply be ignored?
> > > > >
> > > > > >
> > > > > > > Parsing is better left to the BPF program.
> > > >
> > > > I do prefer adding sanity checks to the BPF helpers, over having
> > to
> > > > add then in the net hot path only to protect against dangerous
> > BPF
> > > > programs.
> > > >
> > > Is it OK to ignore or decrease pull length for udp gro fraglist
> > packet?
> > > It could save the normal packet and sent to user correctly.
> > >
> > > In common/net/core/filter.c
> > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > >               unsigned int write_len)
> > > {
> > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > +(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
> >
> > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.
> >
> Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In
> udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST|
> SKB_GSO_UDP_L4". Here checking these two flags is to limit the packet
> as "UDP + need GSO + fraglist".
>
> We could remove SKB_GSO_UDP_L4 check for more packet that may addrive
> skb_segment_list.
>
> > > +return 0;
> >
> > Failing for any pull is a bit excessive. And would kill a sane
> > workaround of pulling only as many bytes as needed.
> >
> > > +     or if (write_len > skb_headlen(skb))
> > > +write_len = skb_headlen(skb);
> >
> > Truncating requests would be a surprising change of behavior
> > for this function.
> >
> > Failing for a pull > skb_headlen is arguably reasonable, as
> > the alternative is that we let it go through but have to drop
> > the now malformed packets on segmentation.
> >
> >
> Is it OK as below?
>
> In common/net/core/filter.c
> static inline int __bpf_try_make_writable(struct sk_buff *skb,
>               unsigned int write_len)
> {
> +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> +               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> +               return 0;

please limit write_len to skb_headlen() instead of just returning 0

> +       }
>         return skb_ensure_writable(skb, write_len);
> }
>
> > > +}
> > > return skb_ensure_writable(skb, write_len);
> > > }
> > >
> > >
> > > > In this case, it would be detecting this GSO type and failing the
> > > > operation if exceeding skb_headlen().
> > > > > > >
> > > > > > > > and not packet content.
> > > > > > > > (This is assuming the rest of the code isn't ready to
> > deal
> > > > with a longer pull,
> > > > > > > > which I think is the case atm.  Pulling too much, and
> > then
> > > > crashing or forcing
> > > > > > > > the stack to drop packets because of them being malformed
> > > > seems wrong...)
> > > > > > > >
> > > > > > > > In general it would be nice if there was a way to just
> > say
> > > > pull all headers...
> > > > > > > > (or possibly all L2/L3/L4 headers)
> > > > > > > > You in general need to pull stuff *before* you've even
> > looked
> > > > at the packet,
> > > > > > > > so that you can look at the packet,
> > > > > > > > so it's relatively hard/annoying to pull the correct
> > length
> > > > from bpf
> > > > > > > > code itself.
> > > > > > > >
> > > > > > > > > > > BPF needs to modify a proper length to do pull
> > data.
> > > > However kernel
> > > > > > > > > > > should also improve the flow to avoid crash from a
> > bpf
> > > > function
> > > > > > > > > > call.
> > > > > > > > > > > As there is no split flow and app may not decode
> > the
> > > > merged UDP
> > > > > > > > > > packet,
> > > > > > > > > > > we should drop the packet without fraglist in
> > > > skb_segment_list
> > > > > > > > > > here.
> > > > > > > > > > >
> > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist
> > > > chaining.")
> > > > > > > > > > > Signed-off-by: Shiming Cheng <
> > > > shiming.cheng@mediatek.com>
> > > > > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  net/core/skbuff.c | 3 +++
> > > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > > > > > > > index b99127712e67..f68f2679b086 100644
> > > > > > > > > > > --- a/net/core/skbuff.c
> > > > > > > > > > > +++ b/net/core/skbuff.c
> > > > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff
> > > > *skb_segment_list(struct
> > > > > > > > > > sk_buff *skb,
> > > > > > > > > > >  if (err)
> > > > > > > > > > >  goto err_linearize;
> > > > > > > > > > >
> > > > > > > > > > > +if (!list_skb)
> > > > > > > > > > > +goto err_linearize;
> > > > > > > > > > > +
> > > > > > >
> > > > > > > This would catch the case where the entire data frag_list
> > is
> > > > > > > linearized, but not a pskb_may_pull that only pulls in part
> > of
> > > > the
> > > > > > > list.
> > > > > > >
> > > > > > > Even with BPF being privileged, the kernel should not crash
> > if
> > > > BPF
> > > > > > > pulls a FRAGLIST GSO skb.
> > > > > > >
> > > > > > > But the check needs to be refined a bit. For a UDP GSO
> > packet,
> > > > I
> > > > > > > think gso_size is still valid, so if the head_skb length
> > does
> > > > not
> > > > > > > match gso_size, it has been messed with and should be
> > dropped.
> > > > > > >
> > > Is it OK as below? Is it OK to add log to record the error for easy
> > > checking issue.
> > >
> > > In net/core/skbuff.c skb_segment_list
> > > +unsigned int mss = skb_shinfo(head_skb)->gso_size;
> > > +bool err_len = false;
> > >
> > > +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
> > > +pr_err("skb is dropped due to messed data. gso size:%d,
> > > +hdrlen:%d", mss, skb_headlen(head_skb)
> >
> > Such logs should always be rate limited. But no need to log cases
> > where we well understood how we get there.
> >
> > I would stick with one approach: either in the BPF func or in
> > segmentation, not both. And then I find BPF preferable, as explained
> > before.
> >
> OK, we try make a patch in BPF func.
>
> > > +if (!list_skb)
> > > +goto err_linearize;
> > > +else
> > > +err_len = true;
> > > +}
> > >
> > > ...
> > > +if (err_len) {
> > > +goto err_linearize;
> > > +}
> > >
> > > skb_get(skb);
> > > ...

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Lena Wang (王娜) April 19, 2024, 8:36 a.m. UTC | #12
On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) <
> Lena.Wang@mediatek.com> wrote:
> >
> > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
> > >
> > > External email : Please do not click links or open attachments
> until
> > > you have verified the sender or the content.
> > >  Lena Wang (王娜) wrote:
> > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> > > > >
> > > > > External email : Please do not click links or open
> attachments
> > > until
> > > > > you have verified the sender or the content.
> > > > >  > > > > Personally, I think bpf_skb_pull_data() should have
> > > > > automatically
> > > > > > > > > (ie. in kernel code) reduced how much it pulls so
> that it
> > > > > would pull
> > > > > > > > > headers only,
> > > > > > > >
> > > > > > > > That would be a helper that parses headers to discover
> > > header
> > > > > length.
> > > > > > >
> > > > > > > Does it actually need to?  Presumably the bpf pull
> function
> > > could
> > > > > > > notice that it is
> > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST)
> and
> > > reduce
> > > > > the pull
> > > > > > > accordingly so that it doesn't pull anything from the
> non-
> > > linear
> > > > > > > fraglist portion???
> > > > > > >
> > > > > > > I know only the generic overview of what udp gso is, not
> any
> > > > > details, so I am
> > > > > > > assuming here that there's some sort of guarantee to how
> > > these
> > > > > packets
> > > > > > > are structured...  But I imagine there must be or we
> wouldn't
> > > be
> > > > > hitting these
> > > > > > > issues deeper in the stack?
> > > > > >
> > > > > > Perhaps for a packet of this type we're already guaranteed
> the
> > > > > headers
> > > > > > are in the linear portion,
> > > > > > and the pull should simply be ignored?
> > > > > >
> > > > > > >
> > > > > > > > Parsing is better left to the BPF program.
> > > > >
> > > > > I do prefer adding sanity checks to the BPF helpers, over
> having
> > > to
> > > > > add then in the net hot path only to protect against
> dangerous
> > > BPF
> > > > > programs.
> > > > >
> > > > Is it OK to ignore or decrease pull length for udp gro fraglist
> > > packet?
> > > > It could save the normal packet and sent to user correctly.
> > > >
> > > > In common/net/core/filter.c
> > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > >               unsigned int write_len)
> > > > {
> > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > +(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
> > >
> > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.
> > >
> > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In
> > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST|
> > SKB_GSO_UDP_L4". Here checking these two flags is to limit the
> packet
> > as "UDP + need GSO + fraglist".
> >
> > We could remove SKB_GSO_UDP_L4 check for more packet that may
> addrive
> > skb_segment_list.
> >
> > > > +return 0;
> > >
> > > Failing for any pull is a bit excessive. And would kill a sane
> > > workaround of pulling only as many bytes as needed.
> > >
> > > > +     or if (write_len > skb_headlen(skb))
> > > > +write_len = skb_headlen(skb);
> > >
> > > Truncating requests would be a surprising change of behavior
> > > for this function.
> > >
> > > Failing for a pull > skb_headlen is arguably reasonable, as
> > > the alternative is that we let it go through but have to drop
> > > the now malformed packets on segmentation.
> > >
> > >
> > Is it OK as below?
> >
> > In common/net/core/filter.c
> > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> >               unsigned int write_len)
> > {
> > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > +               SKB_GSO_FRAGLIST) && (write_len >
> skb_headlen(skb))) {
> > +               return 0;
> 
> please limit write_len to skb_headlen() instead of just returning 0
> 

Hi Maze & Willem,
Maze's advice is:
In common/net/core/filter.c
static inline int __bpf_try_make_writable(struct sk_buff *skb,
              unsigned int write_len)
{ 
+       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
+               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
+               write_len = skb_headlen(skb);
+       }
        return skb_ensure_writable(skb, write_len);
}

Willem's advice is to "Failing for a pull > skb_headlen is arguably 
reasonable...". It prefers to return 0 :
+       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
+               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
+               return 0;
+       }

It seems a bit conflict. However I am not sure if my understanding is
right and hope to get your further guide.

Thanks
Lena

> > +       }
> >         return skb_ensure_writable(skb, write_len);
> > }
> >
> > > > +}
> > > > return skb_ensure_writable(skb, write_len);
> > > > }
> > > >
> > > >
> > > > > In this case, it would be detecting this GSO type and failing
> the
> > > > > operation if exceeding skb_headlen().
> > > > > > > >
> > > > > > > > > and not packet content.
> > > > > > > > > (This is assuming the rest of the code isn't ready to
> > > deal
> > > > > with a longer pull,
> > > > > > > > > which I think is the case atm.  Pulling too much, and
> > > then
> > > > > crashing or forcing
> > > > > > > > > the stack to drop packets because of them being
> malformed
> > > > > seems wrong...)
> > > > > > > > >
> > > > > > > > > In general it would be nice if there was a way to
> just
> > > say
> > > > > pull all headers...
> > > > > > > > > (or possibly all L2/L3/L4 headers)
> > > > > > > > > You in general need to pull stuff *before* you've
> even
> > > looked
> > > > > at the packet,
> > > > > > > > > so that you can look at the packet,
> > > > > > > > > so it's relatively hard/annoying to pull the correct
> > > length
> > > > > from bpf
> > > > > > > > > code itself.
> > > > > > > > >
> > > > > > > > > > > > BPF needs to modify a proper length to do pull
> > > data.
> > > > > However kernel
> > > > > > > > > > > > should also improve the flow to avoid crash
> from a
> > > bpf
> > > > > function
> > > > > > > > > > > call.
> > > > > > > > > > > > As there is no split flow and app may not
> decode
> > > the
> > > > > merged UDP
> > > > > > > > > > > packet,
> > > > > > > > > > > > we should drop the packet without fraglist in
> > > > > skb_segment_list
> > > > > > > > > > > here.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO
> fraglist
> > > > > chaining.")
> > > > > > > > > > > > Signed-off-by: Shiming Cheng <
> > > > > shiming.cheng@mediatek.com>
> > > > > > > > > > > > Signed-off-by: Lena Wang <
> lena.wang@mediatek.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  net/core/skbuff.c | 3 +++
> > > > > > > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/net/core/skbuff.c
> b/net/core/skbuff.c
> > > > > > > > > > > > index b99127712e67..f68f2679b086 100644
> > > > > > > > > > > > --- a/net/core/skbuff.c
> > > > > > > > > > > > +++ b/net/core/skbuff.c
> > > > > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff
> > > > > *skb_segment_list(struct
> > > > > > > > > > > sk_buff *skb,
> > > > > > > > > > > >  if (err)
> > > > > > > > > > > >  goto err_linearize;
> > > > > > > > > > > >
> > > > > > > > > > > > +if (!list_skb)
> > > > > > > > > > > > +goto err_linearize;
> > > > > > > > > > > > +
> > > > > > > >
> > > > > > > > This would catch the case where the entire data
> frag_list
> > > is
> > > > > > > > linearized, but not a pskb_may_pull that only pulls in
> part
> > > of
> > > > > the
> > > > > > > > list.
> > > > > > > >
> > > > > > > > Even with BPF being privileged, the kernel should not
> crash
> > > if
> > > > > BPF
> > > > > > > > pulls a FRAGLIST GSO skb.
> > > > > > > >
> > > > > > > > But the check needs to be refined a bit. For a UDP GSO
> > > packet,
> > > > > I
> > > > > > > > think gso_size is still valid, so if the head_skb
> length
> > > does
> > > > > not
> > > > > > > > match gso_size, it has been messed with and should be
> > > dropped.
> > > > > > > >
> > > > Is it OK as below? Is it OK to add log to record the error for
> easy
> > > > checking issue.
> > > >
> > > > In net/core/skbuff.c skb_segment_list
> > > > +unsigned int mss = skb_shinfo(head_skb)->gso_size;
> > > > +bool err_len = false;
> > > >
> > > > +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) {
> > > > +pr_err("skb is dropped due to messed data. gso size:%d,
> > > > +hdrlen:%d", mss, skb_headlen(head_skb)
> > >
> > > Such logs should always be rate limited. But no need to log cases
> > > where we well understood how we get there.
> > >
> > > I would stick with one approach: either in the BPF func or in
> > > segmentation, not both. And then I find BPF preferable, as
> explained
> > > before.
> > >
> > OK, we try make a patch in BPF func.
> >
> > > > +if (!list_skb)
> > > > +goto err_linearize;
> > > > +else
> > > > +err_len = true;
> > > > +}
> > > >
> > > > ...
> > > > +if (err_len) {
> > > > +goto err_linearize;
> > > > +}
> > > >
> > > > skb_get(skb);
> > > > ...
> 
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google
Willem de Bruijn April 19, 2024, 2:17 p.m. UTC | #13
Lena Wang (王娜) wrote:
> On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) <
> > Lena.Wang@mediatek.com> wrote:
> > >
> > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
> > > >
> > > > External email : Please do not click links or open attachments
> > until
> > > > you have verified the sender or the content.
> > > >  Lena Wang (王娜) wrote:
> > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> > > > > >
> > > > > > External email : Please do not click links or open
> > attachments
> > > > until
> > > > > > you have verified the sender or the content.
> > > > > >  > > > > Personally, I think bpf_skb_pull_data() should have
> > > > > > automatically
> > > > > > > > > > (ie. in kernel code) reduced how much it pulls so
> > that it
> > > > > > would pull
> > > > > > > > > > headers only,
> > > > > > > > >
> > > > > > > > > That would be a helper that parses headers to discover
> > > > header
> > > > > > length.
> > > > > > > >
> > > > > > > > Does it actually need to?  Presumably the bpf pull
> > function
> > > > could
> > > > > > > > notice that it is
> > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST)
> > and
> > > > reduce
> > > > > > the pull
> > > > > > > > accordingly so that it doesn't pull anything from the
> > non-
> > > > linear
> > > > > > > > fraglist portion???
> > > > > > > >
> > > > > > > > I know only the generic overview of what udp gso is, not
> > any
> > > > > > details, so I am
> > > > > > > > assuming here that there's some sort of guarantee to how
> > > > these
> > > > > > packets
> > > > > > > > are structured...  But I imagine there must be or we
> > wouldn't
> > > > be
> > > > > > hitting these
> > > > > > > > issues deeper in the stack?
> > > > > > >
> > > > > > > Perhaps for a packet of this type we're already guaranteed
> > the
> > > > > > headers
> > > > > > > are in the linear portion,
> > > > > > > and the pull should simply be ignored?
> > > > > > >
> > > > > > > >
> > > > > > > > > Parsing is better left to the BPF program.
> > > > > >
> > > > > > I do prefer adding sanity checks to the BPF helpers, over
> > having
> > > > to
> > > > > > add then in the net hot path only to protect against
> > dangerous
> > > > BPF
> > > > > > programs.
> > > > > >
> > > > > Is it OK to ignore or decrease pull length for udp gro fraglist
> > > > packet?
> > > > > It could save the normal packet and sent to user correctly.
> > > > >
> > > > > In common/net/core/filter.c
> > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > >               unsigned int write_len)
> > > > > {
> > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > +(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
> > > >
> > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.
> > > >
> > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In
> > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST|
> > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the
> > packet
> > > as "UDP + need GSO + fraglist".
> > >
> > > We could remove SKB_GSO_UDP_L4 check for more packet that may
> > addrive
> > > skb_segment_list.
> > >
> > > > > +return 0;
> > > >
> > > > Failing for any pull is a bit excessive. And would kill a sane
> > > > workaround of pulling only as many bytes as needed.
> > > >
> > > > > +     or if (write_len > skb_headlen(skb))
> > > > > +write_len = skb_headlen(skb);
> > > >
> > > > Truncating requests would be a surprising change of behavior
> > > > for this function.
> > > >
> > > > Failing for a pull > skb_headlen is arguably reasonable, as
> > > > the alternative is that we let it go through but have to drop
> > > > the now malformed packets on segmentation.
> > > >
> > > >
> > > Is it OK as below?
> > >
> > > In common/net/core/filter.c
> > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > >               unsigned int write_len)
> > > {
> > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > +               SKB_GSO_FRAGLIST) && (write_len >
> > skb_headlen(skb))) {
> > > +               return 0;
> > 
> > please limit write_len to skb_headlen() instead of just returning 0
> > 
> 
> Hi Maze & Willem,
> Maze's advice is:
> In common/net/core/filter.c
> static inline int __bpf_try_make_writable(struct sk_buff *skb,
>               unsigned int write_len)
> { 
> +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> +               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> +               write_len = skb_headlen(skb);
> +       }
>         return skb_ensure_writable(skb, write_len);
> }
> 
> Willem's advice is to "Failing for a pull > skb_headlen is arguably 
> reasonable...". It prefers to return 0 :
> +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> +               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> +               return 0;
> +       }
> 
> It seems a bit conflict. However I am not sure if my understanding is
> right and hope to get your further guide.

I did not mean to return 0. But to fail a request that would pull an
unsafe amount. The caller must get a clear error signal.

Back to the original report: the issue should already have been fixed
by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list").
But that commit is in the kernel for which you report the error.

Turns out that the crash is not in skb_segment_list, but later in
__udpv4_gso_segment_list_csum. Which unconditionally dereferences
udp_hdr(seg).

The above fix also mentions skb pull as the culprit, but does not
include a BPF program. If this can be reached in other ways, then we
do need a stronger test in skb_segment_list, as you propose.

I don't want to narrowly check whether udp_hdr is safe. Essentially,
an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one
byte would get pulled.
Maciej Żenczykowski April 19, 2024, 5:29 p.m. UTC | #14
On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Lena Wang (王娜) wrote:
> > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote:
> > >
> > > External email : Please do not click links or open attachments until
> > > you have verified the sender or the content.
> > >  On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) <
> > > Lena.Wang@mediatek.com> wrote:
> > > >
> > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
> > > > >
> > > > > External email : Please do not click links or open attachments
> > > until
> > > > > you have verified the sender or the content.
> > > > >  Lena Wang (王娜) wrote:
> > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> > > > > > >
> > > > > > > External email : Please do not click links or open
> > > attachments
> > > > > until
> > > > > > > you have verified the sender or the content.
> > > > > > >  > > > > Personally, I think bpf_skb_pull_data() should have
> > > > > > > automatically
> > > > > > > > > > > (ie. in kernel code) reduced how much it pulls so
> > > that it
> > > > > > > would pull
> > > > > > > > > > > headers only,
> > > > > > > > > >
> > > > > > > > > > That would be a helper that parses headers to discover
> > > > > header
> > > > > > > length.
> > > > > > > > >
> > > > > > > > > Does it actually need to?  Presumably the bpf pull
> > > function
> > > > > could
> > > > > > > > > notice that it is
> > > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST)
> > > and
> > > > > reduce
> > > > > > > the pull
> > > > > > > > > accordingly so that it doesn't pull anything from the
> > > non-
> > > > > linear
> > > > > > > > > fraglist portion???
> > > > > > > > >
> > > > > > > > > I know only the generic overview of what udp gso is, not
> > > any
> > > > > > > details, so I am
> > > > > > > > > assuming here that there's some sort of guarantee to how
> > > > > these
> > > > > > > packets
> > > > > > > > > are structured...  But I imagine there must be or we
> > > wouldn't
> > > > > be
> > > > > > > hitting these
> > > > > > > > > issues deeper in the stack?
> > > > > > > >
> > > > > > > > Perhaps for a packet of this type we're already guaranteed
> > > the
> > > > > > > headers
> > > > > > > > are in the linear portion,
> > > > > > > > and the pull should simply be ignored?
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > Parsing is better left to the BPF program.
> > > > > > >
> > > > > > > I do prefer adding sanity checks to the BPF helpers, over
> > > having
> > > > > to
> > > > > > > add then in the net hot path only to protect against
> > > dangerous
> > > > > BPF
> > > > > > > programs.
> > > > > > >
> > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist
> > > > > packet?
> > > > > > It could save the normal packet and sent to user correctly.
> > > > > >
> > > > > > In common/net/core/filter.c
> > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > > >               unsigned int write_len)
> > > > > > {
> > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > > +(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
> > > > >
> > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.
> > > > >
> > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In
> > > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST|
> > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the
> > > packet
> > > > as "UDP + need GSO + fraglist".
> > > >
> > > > We could remove SKB_GSO_UDP_L4 check for more packet that may
> > > addrive
> > > > skb_segment_list.
> > > >
> > > > > > +return 0;
> > > > >
> > > > > Failing for any pull is a bit excessive. And would kill a sane
> > > > > workaround of pulling only as many bytes as needed.
> > > > >
> > > > > > +     or if (write_len > skb_headlen(skb))
> > > > > > +write_len = skb_headlen(skb);
> > > > >
> > > > > Truncating requests would be a surprising change of behavior
> > > > > for this function.
> > > > >
> > > > > Failing for a pull > skb_headlen is arguably reasonable, as
> > > > > the alternative is that we let it go through but have to drop
> > > > > the now malformed packets on segmentation.
> > > > >
> > > > >
> > > > Is it OK as below?
> > > >
> > > > In common/net/core/filter.c
> > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > >               unsigned int write_len)
> > > > {
> > > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > +               SKB_GSO_FRAGLIST) && (write_len >
> > > skb_headlen(skb))) {
> > > > +               return 0;
> > >
> > > please limit write_len to skb_headlen() instead of just returning 0
> > >
> >
> > Hi Maze & Willem,
> > Maze's advice is:
> > In common/net/core/filter.c
> > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> >               unsigned int write_len)
> > {
> > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > +               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> > +               write_len = skb_headlen(skb);
> > +       }
> >         return skb_ensure_writable(skb, write_len);
> > }
> >
> > Willem's advice is to "Failing for a pull > skb_headlen is arguably
> > reasonable...". It prefers to return 0 :
> > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > +               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> > +               return 0;
> > +       }
> >
> > It seems a bit conflict. However I am not sure if my understanding is
> > right and hope to get your further guide.
>
> I did not mean to return 0. But to fail a request that would pull an
> unsafe amount. The caller must get a clear error signal.

That's hostile on userspace.
Currently the caller doesn't even check the error return...
Why would we?  We already have to reload all pointers, and have to do
and will thus redo checking on those.

What do you expect the caller to do? Subtract -1 and try again?
That's hard to do from BPF as it involves looping... and is slow.

We already try to not pull too much:

void try_make_writable(struct __sk_buff* skb, int len) {
  if (len > skb->len) len = skb->len;
  if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len);
}

Is there at least something like skb->len that has the actually
pullable length in it?

Or are these skb's structured in such a way that there is never a need
to pull anything,
because the headers are already always in the linear portion?

> Back to the original report: the issue should already have been fixed
> by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list").
> But that commit is in the kernel for which you report the error.
>
> Turns out that the crash is not in skb_segment_list, but later in
> __udpv4_gso_segment_list_csum. Which unconditionally dereferences
> udp_hdr(seg).
>
> The above fix also mentions skb pull as the culprit, but does not
> include a BPF program. If this can be reached in other ways, then we
> do need a stronger test in skb_segment_list, as you propose.
>
> I don't want to narrowly check whether udp_hdr is safe. Essentially,
> an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one
> byte would get pulled.

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Willem de Bruijn April 19, 2024, 5:41 p.m. UTC | #15
Maciej Żenczykowski wrote:
> On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Lena Wang (王娜) wrote:
> > > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote:
> > > >
> > > > External email : Please do not click links or open attachments until
> > > > you have verified the sender or the content.
> > > >  On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) <
> > > > Lena.Wang@mediatek.com> wrote:
> > > > >
> > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
> > > > > >
> > > > > > External email : Please do not click links or open attachments
> > > > until
> > > > > > you have verified the sender or the content.
> > > > > >  Lena Wang (王娜) wrote:
> > > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote:
> > > > > > > >
> > > > > > > > External email : Please do not click links or open
> > > > attachments
> > > > > > until
> > > > > > > > you have verified the sender or the content.
> > > > > > > >  > > > > Personally, I think bpf_skb_pull_data() should have
> > > > > > > > automatically
> > > > > > > > > > > > (ie. in kernel code) reduced how much it pulls so
> > > > that it
> > > > > > > > would pull
> > > > > > > > > > > > headers only,
> > > > > > > > > > >
> > > > > > > > > > > That would be a helper that parses headers to discover
> > > > > > header
> > > > > > > > length.
> > > > > > > > > >
> > > > > > > > > > Does it actually need to?  Presumably the bpf pull
> > > > function
> > > > > > could
> > > > > > > > > > notice that it is
> > > > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST)
> > > > and
> > > > > > reduce
> > > > > > > > the pull
> > > > > > > > > > accordingly so that it doesn't pull anything from the
> > > > non-
> > > > > > linear
> > > > > > > > > > fraglist portion???
> > > > > > > > > >
> > > > > > > > > > I know only the generic overview of what udp gso is, not
> > > > any
> > > > > > > > details, so I am
> > > > > > > > > > assuming here that there's some sort of guarantee to how
> > > > > > these
> > > > > > > > packets
> > > > > > > > > > are structured...  But I imagine there must be or we
> > > > wouldn't
> > > > > > be
> > > > > > > > hitting these
> > > > > > > > > > issues deeper in the stack?
> > > > > > > > >
> > > > > > > > > Perhaps for a packet of this type we're already guaranteed
> > > > the
> > > > > > > > headers
> > > > > > > > > are in the linear portion,
> > > > > > > > > and the pull should simply be ignored?
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Parsing is better left to the BPF program.
> > > > > > > >
> > > > > > > > I do prefer adding sanity checks to the BPF helpers, over
> > > > having
> > > > > > to
> > > > > > > > add then in the net hot path only to protect against
> > > > dangerous
> > > > > > BPF
> > > > > > > > programs.
> > > > > > > >
> > > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist
> > > > > > packet?
> > > > > > > It could save the normal packet and sent to user correctly.
> > > > > > >
> > > > > > > In common/net/core/filter.c
> > > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > > > >               unsigned int write_len)
> > > > > > > {
> > > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > > > +(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
> > > > > >
> > > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST.
> > > > > >
> > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In
> > > > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST|
> > > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the
> > > > packet
> > > > > as "UDP + need GSO + fraglist".
> > > > >
> > > > > We could remove SKB_GSO_UDP_L4 check for more packet that may
> > > > addrive
> > > > > skb_segment_list.
> > > > >
> > > > > > > +return 0;
> > > > > >
> > > > > > Failing for any pull is a bit excessive. And would kill a sane
> > > > > > workaround of pulling only as many bytes as needed.
> > > > > >
> > > > > > > +     or if (write_len > skb_headlen(skb))
> > > > > > > +write_len = skb_headlen(skb);
> > > > > >
> > > > > > Truncating requests would be a surprising change of behavior
> > > > > > for this function.
> > > > > >
> > > > > > Failing for a pull > skb_headlen is arguably reasonable, as
> > > > > > the alternative is that we let it go through but have to drop
> > > > > > the now malformed packets on segmentation.
> > > > > >
> > > > > >
> > > > > Is it OK as below?
> > > > >
> > > > > In common/net/core/filter.c
> > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > >               unsigned int write_len)
> > > > > {
> > > > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > +               SKB_GSO_FRAGLIST) && (write_len >
> > > > skb_headlen(skb))) {
> > > > > +               return 0;
> > > >
> > > > please limit write_len to skb_headlen() instead of just returning 0
> > > >
> > >
> > > Hi Maze & Willem,
> > > Maze's advice is:
> > > In common/net/core/filter.c
> > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > >               unsigned int write_len)
> > > {
> > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > +               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> > > +               write_len = skb_headlen(skb);
> > > +       }
> > >         return skb_ensure_writable(skb, write_len);
> > > }
> > >
> > > Willem's advice is to "Failing for a pull > skb_headlen is arguably
> > > reasonable...". It prefers to return 0 :
> > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > +               SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) {
> > > +               return 0;
> > > +       }
> > >
> > > It seems a bit conflict. However I am not sure if my understanding is
> > > right and hope to get your further guide.
> >
> > I did not mean to return 0. But to fail a request that would pull an
> > unsafe amount. The caller must get a clear error signal.
> 
> That's hostile on userspace.
> Currently the caller doesn't even check the error return...

It can, and probably should.

bpf_skb_pull data returns the error code from bpf_try_make_writable:

   return bpf_try_make_writable(skb, len ? : skb_headlen(skb));

> Why would we?  We already have to reload all pointers, and have to do
> and will thus redo checking on those.
> 
> What do you expect the caller to do? Subtract -1 and try again?
> That's hard to do from BPF as it involves looping... and is slow.
> 
> We already try to not pull too much:
> 
> void try_make_writable(struct __sk_buff* skb, int len) {
>   if (len > skb->len) len = skb->len;
>   if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len);
> }
> 
> Is there at least something like skb->len that has the actually
> pullable length in it?

The above snippet shows that it passes skb_headlen if the caller
passes 0.

But your BPF program does not even need the data writable, so then
it is of little help of course.
 
> Or are these skb's structured in such a way that there is never a need
> to pull anything,
> because the headers are already always in the linear portion?

That is indeed the case.

So as far as I can see:

A BPF program that just wants to pull the network and transport
headers can diligently pull exactly what is needed. And will not
even observe any data pulled into linear in practice. This is still
advisable rather than trusting that the headers are linear. It may
also be required by the validator? Don't know. But check the return
value.

> 
> > Back to the original report: the issue should already have been fixed
> > by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list").
> > But that commit is in the kernel for which you report the error.
> >
> > Turns out that the crash is not in skb_segment_list, but later in
> > __udpv4_gso_segment_list_csum. Which unconditionally dereferences
> > udp_hdr(seg).
> >
> > The above fix also mentions skb pull as the culprit, but does not
> > include a BPF program. If this can be reached in other ways, then we
> > do need a stronger test in skb_segment_list, as you propose.
> >
> > I don't want to narrowly check whether udp_hdr is safe. Essentially,
> > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one
> > byte would get pulled.
> 
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google
Lena Wang (王娜) April 23, 2024, 2:47 p.m. UTC | #16
On Fri, 2024-04-19 at 13:41 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  Maciej Żenczykowski wrote:
> > On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > Lena Wang (王娜) wrote:
> > > > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote:
> > > > >
> > > > > External email : Please do not click links or open
> attachments until
> > > > > you have verified the sender or the content.
> > > > >  On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) <
> > > > > Lena.Wang@mediatek.com> wrote:
> > > > > >
> > > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote:
> > > > > > >
> > > > > > > External email : Please do not click links or open
> attachments
> > > > > until
> > > > > > > you have verified the sender or the content.
> > > > > > >  Lena Wang (王娜) wrote:
> > > > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn
> wrote:
> > > > > > > > >
> > > > > > > > > External email : Please do not click links or open
> > > > > attachments
> > > > > > > until
> > > > > > > > > you have verified the sender or the content.
> > > > > > > > >  > > > > Personally, I think bpf_skb_pull_data()
> should have
> > > > > > > > > automatically
> > > > > > > > > > > > > (ie. in kernel code) reduced how much it
> pulls so
> > > > > that it
> > > > > > > > > would pull
> > > > > > > > > > > > > headers only,
> > > > > > > > > > > >
> > > > > > > > > > > > That would be a helper that parses headers to
> discover
> > > > > > > header
> > > > > > > > > length.
> > > > > > > > > > >
> > > > > > > > > > > Does it actually need to?  Presumably the bpf
> pull
> > > > > function
> > > > > > > could
> > > > > > > > > > > notice that it is
> > > > > > > > > > > a packet flagged as being of type X (UDP GSO
> FRAGLIST)
> > > > > and
> > > > > > > reduce
> > > > > > > > > the pull
> > > > > > > > > > > accordingly so that it doesn't pull anything from
> the
> > > > > non-
> > > > > > > linear
> > > > > > > > > > > fraglist portion???
> > > > > > > > > > >
> > > > > > > > > > > I know only the generic overview of what udp gso
> is, not
> > > > > any
> > > > > > > > > details, so I am
> > > > > > > > > > > assuming here that there's some sort of guarantee
> to how
> > > > > > > these
> > > > > > > > > packets
> > > > > > > > > > > are structured...  But I imagine there must be or
> we
> > > > > wouldn't
> > > > > > > be
> > > > > > > > > hitting these
> > > > > > > > > > > issues deeper in the stack?
> > > > > > > > > >
> > > > > > > > > > Perhaps for a packet of this type we're already
> guaranteed
> > > > > the
> > > > > > > > > headers
> > > > > > > > > > are in the linear portion,
> > > > > > > > > > and the pull should simply be ignored?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Parsing is better left to the BPF program.
> > > > > > > > >
> > > > > > > > > I do prefer adding sanity checks to the BPF helpers,
> over
> > > > > having
> > > > > > > to
> > > > > > > > > add then in the net hot path only to protect against
> > > > > dangerous
> > > > > > > BPF
> > > > > > > > > programs.
> > > > > > > > >
> > > > > > > > Is it OK to ignore or decrease pull length for udp gro
> fraglist
> > > > > > > packet?
> > > > > > > > It could save the normal packet and sent to user
> correctly.
> > > > > > > >
> > > > > > > > In common/net/core/filter.c
> > > > > > > > static inline int __bpf_try_make_writable(struct
> sk_buff *skb,
> > > > > > > >               unsigned int write_len)
> > > > > > > > {
> > > > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > > > > +(SKB_GSO_UDP  |SKB_GSO_UDP_L4)) {
> > > > > > >
> > > > > > > The issue is not with SKB_GSO_UDP_L4, but with
> SKB_GSO_FRAGLIST.
> > > > > > >
> > > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO.
> In
> > > > > > udp_offload.c udp4_gro_complete gso_type adds
> "SKB_GSO_FRAGLIST|
> > > > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit
> the
> > > > > packet
> > > > > > as "UDP + need GSO + fraglist".
> > > > > >
> > > > > > We could remove SKB_GSO_UDP_L4 check for more packet that
> may
> > > > > addrive
> > > > > > skb_segment_list.
> > > > > >
> > > > > > > > +return 0;
> > > > > > >
> > > > > > > Failing for any pull is a bit excessive. And would kill a
> sane
> > > > > > > workaround of pulling only as many bytes as needed.
> > > > > > >
> > > > > > > > +     or if (write_len > skb_headlen(skb))
> > > > > > > > +write_len = skb_headlen(skb);
> > > > > > >
> > > > > > > Truncating requests would be a surprising change of
> behavior
> > > > > > > for this function.
> > > > > > >
> > > > > > > Failing for a pull > skb_headlen is arguably reasonable,
> as
> > > > > > > the alternative is that we let it go through but have to
> drop
> > > > > > > the now malformed packets on segmentation.
> > > > > > >
> > > > > > >
> > > > > > Is it OK as below?
> > > > > >
> > > > > > In common/net/core/filter.c
> > > > > > static inline int __bpf_try_make_writable(struct sk_buff
> *skb,
> > > > > >               unsigned int write_len)
> > > > > > {
> > > > > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > > +               SKB_GSO_FRAGLIST) && (write_len >
> > > > > skb_headlen(skb))) {
> > > > > > +               return 0;
> > > > >
> > > > > please limit write_len to skb_headlen() instead of just
> returning 0
> > > > >
> > > >
> > > > Hi Maze & Willem,
> > > > Maze's advice is:
> > > > In common/net/core/filter.c
> > > > static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > >               unsigned int write_len)
> > > > {
> > > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > +               SKB_GSO_FRAGLIST) && (write_len >
> skb_headlen(skb))) {
> > > > +               write_len = skb_headlen(skb);
> > > > +       }
> > > >         return skb_ensure_writable(skb, write_len);
> > > > }
> > > >
> > > > Willem's advice is to "Failing for a pull > skb_headlen is
> arguably
> > > > reasonable...". It prefers to return 0 :
> > > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > +               SKB_GSO_FRAGLIST) && (write_len >
> skb_headlen(skb))) {
> > > > +               return 0;
> > > > +       }
> > > >
> > > > It seems a bit conflict. However I am not sure if my
> understanding is
> > > > right and hope to get your further guide.
> > >
> > > I did not mean to return 0. But to fail a request that would pull
> an
> > > unsafe amount. The caller must get a clear error signal.
> > 
> > That's hostile on userspace.
> > Currently the caller doesn't even check the error return...
> 
> It can, and probably should.
> 
> bpf_skb_pull data returns the error code from bpf_try_make_writable:
> 
>    return bpf_try_make_writable(skb, len ? : skb_headlen(skb));
> 
> > Why would we?  We already have to reload all pointers, and have to
> do
> > and will thus redo checking on those.
> > 
> > What do you expect the caller to do? Subtract -1 and try again?
> > That's hard to do from BPF as it involves looping... and is slow.
> > 
> > We already try to not pull too much:
> > 
> > void try_make_writable(struct __sk_buff* skb, int len) {
> >   if (len > skb->len) len = skb->len;
> >   if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len);
> > }
> > 
> > Is there at least something like skb->len that has the actually
> > pullable length in it?
> 
> The above snippet shows that it passes skb_headlen if the caller
> passes 0.
> 
> But your BPF program does not even need the data writable, so then
> it is of little help of course.
>  
> > Or are these skb's structured in such a way that there is never a
> need
> > to pull anything,
> > because the headers are already always in the linear portion?
> 
> That is indeed the case.
> 
> So as far as I can see:
> 
> A BPF program that just wants to pull the network and transport
> headers can diligently pull exactly what is needed. And will not
> even observe any data pulled into linear in practice. This is still
> advisable rather than trusting that the headers are linear. It may
> also be required by the validator? Don't know. But check the return
> value.
> 
Hi Willem,
As the discussion, is it OK for the patch below?

diff --git a/net/core/filter.c b/net/core/filter.c
index 3a6110ea4009..abc6029c8eef 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
bpf_sp);
 static inline int __bpf_try_make_writable(struct sk_buff *skb,
                                          unsigned int write_len)
 {
+       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
+                       SKB_GSO_FRAGLIST) && (write_len >
skb_headlen(skb))) {
+               return -ENOMEM;
+       }
+
        return skb_ensure_writable(skb, write_len);
 }

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 73b1e0e53534..2e90534c1a1e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct sk_buff
*skb,
        unsigned int tnl_hlen = skb_tnl_header_len(skb);
        unsigned int delta_truesize = 0;
        unsigned int delta_len = 0;
+       unsigned int mss = skb_shinfo(skb)->gso_size;
        struct sk_buff *tail = NULL;
        struct sk_buff *nskb, *tmp;
        int len_diff, err;
+       bool err_len = false;

        skb_push(skb, -skb_network_offset(skb) + offset);

@@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct sk_buff
*skb,
        if (err)
                goto err_linearize;

+       if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) {
+               if (!list_skb) {
+                       goto err_linearize;
+               } else {
+                       err_len = true;
+               }
+       }
+
        skb_shinfo(skb)->frag_list = NULL;

        while (list_skb) {
@@ -4109,6 +4119,9 @@ struct sk_buff *skb_segment_list(struct sk_buff
*skb,
            __skb_linearize(skb))
                goto err_linearize;

+       if (err_len)
+               goto err_linearize;
+
        skb_get(skb);

        return skb;

> > 
> > > Back to the original report: the issue should already have been
> fixed
> > > by commit 876e8ca83667 ("net: fix NULL pointer in
> skb_segment_list").
> > > But that commit is in the kernel for which you report the error.
> > >
> > > Turns out that the crash is not in skb_segment_list, but later in
> > > __udpv4_gso_segment_list_csum. Which unconditionally dereferences
> > > udp_hdr(seg).
> > >
> > > The above fix also mentions skb pull as the culprit, but does not
> > > include a BPF program. If this can be reached in other ways, then
> we
> > > do need a stronger test in skb_segment_list, as you propose.
> > >
> > > I don't want to narrowly check whether udp_hdr is safe.
> Essentially,
> > > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even
> one
> > > byte would get pulled.
> > 
> > --
> > Maciej Żenczykowski, Kernel Networking Developer @ Google
> 
>
Willem de Bruijn April 23, 2024, 6:35 p.m. UTC | #17
> Hi Willem,
> As the discussion, is it OK for the patch below?

Thanks for iterating on this.

I would like the opinion also of the fraglist and UDP GRO experts.
 
Yes, I think both

- protecting skb_segment_list against clearly illegal fraglist packets, and
- blocking BPF from constructing such packets

are worthwhile stable fixes. I believe they should be two separate
patches. Both probably with the same Fixes tag: 3a1296a38d0c
("net: Support GRO/GSO fraglist chaining").

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 3a6110ea4009..abc6029c8eef 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> bpf_sp);
>  static inline int __bpf_try_make_writable(struct sk_buff *skb,
>                                           unsigned int write_len)
>  {
> +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> +                       SKB_GSO_FRAGLIST) && (write_len >
> skb_headlen(skb))) {
> +               return -ENOMEM;
> +       }
> +

Indentation looks off, but I agree with the logic.

    if (skb_is_gso(skb) &&
        (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
         (write_len > skb_headlen(skb)))

>         return skb_ensure_writable(skb, write_len);
>  }
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 73b1e0e53534..2e90534c1a1e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>         unsigned int tnl_hlen = skb_tnl_header_len(skb);
>         unsigned int delta_truesize = 0;
>         unsigned int delta_len = 0;
> +       unsigned int mss = skb_shinfo(skb)->gso_size;
>         struct sk_buff *tail = NULL;
>         struct sk_buff *nskb, *tmp;
>         int len_diff, err;
> +       bool err_len = false;
> 
>         skb_push(skb, -skb_network_offset(skb) + offset);
> 
> @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>         if (err)
>                 goto err_linearize;
> 
> +       if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) {
> +               if (!list_skb) {
> +                       goto err_linearize;

The label no longer truly covers the meaning.

But that is already true since the above (second) jump was added in
commit c329b261afe7 ("net: prevent skb corruption on frag list
segmentation").

Neither needs the kfree_skb_list, as skb->next is not assigned to
until the loop. Can just return ERR_PTR(-EFAULT)?

> +               } else {
> +                       err_len = true;
> +               }
> +       }
> +

Why the branch? Might as well always fail immediately?

>         skb_shinfo(skb)->frag_list = NULL;
> 
>         while (list_skb) {
> @@ -4109,6 +4119,9 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>             __skb_linearize(skb))
>                 goto err_linearize;
> 
> +       if (err_len)
> +               goto err_linearize;
> +
>         skb_get(skb);
> 
>         return skb;
> 
> > > 
> > > > Back to the original report: the issue should already have been
> > fixed
> > > > by commit 876e8ca83667 ("net: fix NULL pointer in
> > skb_segment_list").
> > > > But that commit is in the kernel for which you report the error.
> > > >
> > > > Turns out that the crash is not in skb_segment_list, but later in
> > > > __udpv4_gso_segment_list_csum. Which unconditionally dereferences
> > > > udp_hdr(seg).
> > > >
> > > > The above fix also mentions skb pull as the culprit, but does not
> > > > include a BPF program. If this can be reached in other ways, then
> > we
> > > > do need a stronger test in skb_segment_list, as you propose.
> > > >
> > > > I don't want to narrowly check whether udp_hdr is safe.
> > Essentially,
> > > > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even
> > one
> > > > byte would get pulled.
Lena Wang (王娜) April 24, 2024, 12:22 p.m. UTC | #18
On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  > Hi Willem,
> > As the discussion, is it OK for the patch below?
> 
> Thanks for iterating on this.
> 
> I would like the opinion also of the fraglist and UDP GRO experts.
>  
> Yes, I think both
> 
> - protecting skb_segment_list against clearly illegal fraglist
> packets, and
> - blocking BPF from constructing such packets
> 
> are worthwhile stable fixes. I believe they should be two separate
> patches. Both probably with the same Fixes tag: 3a1296a38d0c
> ("net: Support GRO/GSO fraglist chaining").
> 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 3a6110ea4009..abc6029c8eef 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> > bpf_sp);
> >  static inline int __bpf_try_make_writable(struct sk_buff *skb,
> >                                           unsigned int write_len)
> >  {
> > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > +                       SKB_GSO_FRAGLIST) && (write_len >
> > skb_headlen(skb))) {
> > +               return -ENOMEM;
> > +       }
> > +
> 
> Indentation looks off, but I agree with the logic.
> 
>     if (skb_is_gso(skb) &&
>         (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
>          (write_len > skb_headlen(skb)))
> 
> >         return skb_ensure_writable(skb, write_len);
> >  }
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 73b1e0e53534..2e90534c1a1e 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct
> sk_buff
> > *skb,
> >         unsigned int tnl_hlen = skb_tnl_header_len(skb);
> >         unsigned int delta_truesize = 0;
> >         unsigned int delta_len = 0;
> > +       unsigned int mss = skb_shinfo(skb)->gso_size;
> >         struct sk_buff *tail = NULL;
> >         struct sk_buff *nskb, *tmp;
> >         int len_diff, err;
> > +       bool err_len = false;
> > 
> >         skb_push(skb, -skb_network_offset(skb) + offset);
> > 
> > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct
> sk_buff
> > *skb,
> >         if (err)
> >                 goto err_linearize;
> > 
> > +       if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) {
> > +               if (!list_skb) {
> > +                       goto err_linearize;
> 
> The label no longer truly covers the meaning.
> 
> But that is already true since the above (second) jump was added in
> commit c329b261afe7 ("net: prevent skb corruption on frag list
> segmentation").
> 
> Neither needs the kfree_skb_list, as skb->next is not assigned to
> until the loop. Can just return ERR_PTR(-EFAULT)?
> 
> > +               } else {
> > +                       err_len = true;
> > +               }
> > +       }
> > +
> 
> Why the branch? Might as well always fail immediately?
> 
Hi Willem,
Thanks for your guidance.
You are right. There is no need for another branch as fraglist
could be freeed in kfree_skb.
Could I git send mail wo patches as below?

From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00 2001
From: Shiming Cheng <shiming.cheng@mediatek.com>
Date: Wed, 24 Apr 2024 13:42:35 +0800
Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist

A GROed skb with fraglist can't be pulled data
from its fraglist as it may result a invalid
segmentation or kernel exception.

For such structured skb we limit the BPF pull
data length smaller than skb_headlen() and return
error if exceeding.

Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
---
 net/core/filter.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8adf95765cdd..8ed4d5d87167 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
bpf_sp);
 static inline int __bpf_try_make_writable(struct sk_buff *skb,
 					  unsigned int write_len)
 {
+	if (skb_is_gso(skb) &&
+	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
+	     write_len > skb_headlen(skb)) {
+		return -ENOMEM;
+	}
 	return skb_ensure_writable(skb, write_len);
 }
Willem de Bruijn April 24, 2024, 2:28 p.m. UTC | #19
Lena Wang (王娜) wrote:
> On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  > Hi Willem,
> > > As the discussion, is it OK for the patch below?
> > 
> > Thanks for iterating on this.
> > 
> > I would like the opinion also of the fraglist and UDP GRO experts.
> >  
> > Yes, I think both
> > 
> > - protecting skb_segment_list against clearly illegal fraglist
> > packets, and
> > - blocking BPF from constructing such packets
> > 
> > are worthwhile stable fixes. I believe they should be two separate
> > patches. Both probably with the same Fixes tag: 3a1296a38d0c
> > ("net: Support GRO/GSO fraglist chaining").
> > 
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 3a6110ea4009..abc6029c8eef 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> > > bpf_sp);
> > >  static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > >                                           unsigned int write_len)
> > >  {
> > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > +                       SKB_GSO_FRAGLIST) && (write_len >
> > > skb_headlen(skb))) {
> > > +               return -ENOMEM;
> > > +       }
> > > +
> > 
> > Indentation looks off, but I agree with the logic.
> > 
> >     if (skb_is_gso(skb) &&
> >         (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> >          (write_len > skb_headlen(skb)))
> > 
> > >         return skb_ensure_writable(skb, write_len);
> > >  }
> > > 
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 73b1e0e53534..2e90534c1a1e 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct
> > sk_buff
> > > *skb,
> > >         unsigned int tnl_hlen = skb_tnl_header_len(skb);
> > >         unsigned int delta_truesize = 0;
> > >         unsigned int delta_len = 0;
> > > +       unsigned int mss = skb_shinfo(skb)->gso_size;
> > >         struct sk_buff *tail = NULL;
> > >         struct sk_buff *nskb, *tmp;
> > >         int len_diff, err;
> > > +       bool err_len = false;
> > > 
> > >         skb_push(skb, -skb_network_offset(skb) + offset);
> > > 
> > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct
> > sk_buff
> > > *skb,
> > >         if (err)
> > >                 goto err_linearize;
> > > 
> > > +       if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) {
> > > +               if (!list_skb) {
> > > +                       goto err_linearize;
> > 
> > The label no longer truly covers the meaning.
> > 
> > But that is already true since the above (second) jump was added in
> > commit c329b261afe7 ("net: prevent skb corruption on frag list
> > segmentation").
> > 
> > Neither needs the kfree_skb_list, as skb->next is not assigned to
> > until the loop. Can just return ERR_PTR(-EFAULT)?
> > 
> > > +               } else {
> > > +                       err_len = true;
> > > +               }
> > > +       }
> > > +
> > 
> > Why the branch? Might as well always fail immediately?
> > 
> Hi Willem,
> Thanks for your guidance.
> You are right. There is no need for another branch as fraglist
> could be freeed in kfree_skb.
> Could I git send mail wo patches as below?
> 
> From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Wed, 24 Apr 2024 13:42:35 +0800
> Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist
> 
> A GROed skb with fraglist can't be pulled data

Please use the specific label: SKB_GSO_FRAGLIST skb

> from its fraglist as it may result a invalid
> segmentation or kernel exception.
> 
> For such structured skb we limit the BPF pull
> data length smaller than skb_headlen() and return
> error if exceeding.
> 
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  net/core/filter.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8adf95765cdd..8ed4d5d87167 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> bpf_sp);
>  static inline int __bpf_try_make_writable(struct sk_buff *skb,
>  					  unsigned int write_len)
>  {
> +	if (skb_is_gso(skb) &&
> +	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> +	     write_len > skb_headlen(skb)) {
> +		return -ENOMEM;
> +	}
>  	return skb_ensure_writable(skb, write_len);
>  }
>  
> -- 
> 2.18.0
> 
> 
> From 2d0729b20cf810ba1b31e046952c1cd78f295ca3 Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Wed, 24 Apr 2024 14:43:45 +0800
> Subject: [PATCH net] net: drop GROed skb pulled from fraglist
> 
> A GROed skb with fraglist maybe pulled by BPF
> or other ways. It can't be trusted at all even
> if one byte is pulled and should be dropped
> on segmentation.

This paraphrases my comment. It is better to spell it out:

An SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is expected to have all
segments except the last to be gso_size long. If this does not hold,
the skb has been modified and the fraglist gso integrity is lost. Drop
the packet, as it cannot be segmented correctly by skb_segment_list.

The skb could be salvaged, though, right? By linearizing, dropping
the SKB_GSO_FRAGLIST bit and entering the normal skb_segment path
rather than the skb_segment_list path.

That choice is currently made in the protocol caller,
__udp_gso_segment. It's not trivial to add such a backup path here.
So let's add this backstop against kernel crashes.

> 
> If the gso_size does not match skb_headlen(),
> it means to be pulled part of or the entire
> fraglsit. It has been messed with and we return

fraglist

> error to free this skb.
> 
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  net/core/skbuff.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b99127712e67..750fbb51b99f 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4493,6 +4493,7 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>  	unsigned int tnl_hlen = skb_tnl_header_len(skb);
>  	unsigned int delta_truesize = 0;
>  	unsigned int delta_len = 0;
> +	unsigned int mss = skb_shinfo(skb)->gso_size;

Reverse christmas tree

>  	struct sk_buff *tail = NULL;
>  	struct sk_buff *nskb, *tmp;
>  	int len_diff, err;
> @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>  	if (err)
>  		goto err_linearize;
>  
> +	if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
> +		return ERR_PTR(-EFAULT);
> +

Do this precondition integrity check before the skb_unclone path?

>  	skb_shinfo(skb)->frag_list = NULL;
>  
>  	while (list_skb) {
> -- 
> 2.18.0
Lena Wang (王娜) April 25, 2024, 4:32 a.m. UTC | #20
On Wed, 2024-04-24 at 10:28 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  
> Lena Wang (王娜) wrote:
> > On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote:
> > >   
> > > External email : Please do not click links or open attachments
> until
> > > you have verified the sender or the content.
> > >  > Hi Willem,
> > > > As the discussion, is it OK for the patch below?
> > > 
> > > Thanks for iterating on this.
> > > 
> > > I would like the opinion also of the fraglist and UDP GRO
> experts.
> > >  
> > > Yes, I think both
> > > 
> > > - protecting skb_segment_list against clearly illegal fraglist
> > > packets, and
> > > - blocking BPF from constructing such packets
> > > 
> > > are worthwhile stable fixes. I believe they should be two
> separate
> > > patches. Both probably with the same Fixes tag: 3a1296a38d0c
> > > ("net: Support GRO/GSO fraglist chaining").
> > > 
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index 3a6110ea4009..abc6029c8eef 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct
> bpf_scratchpad,
> > > > bpf_sp);
> > > >  static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > >                                           unsigned int
> write_len)
> > > >  {
> > > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > +                       SKB_GSO_FRAGLIST) && (write_len >
> > > > skb_headlen(skb))) {
> > > > +               return -ENOMEM;
> > > > +       }
> > > > +
> > > 
> > > Indentation looks off, but I agree with the logic.
> > > 
> > >     if (skb_is_gso(skb) &&
> > >         (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > >          (write_len > skb_headlen(skb)))
> > > 
> > > >         return skb_ensure_writable(skb, write_len);
> > > >  }
> > > > 
> > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > index 73b1e0e53534..2e90534c1a1e 100644
> > > > --- a/net/core/skbuff.c
> > > > +++ b/net/core/skbuff.c
> > > > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct
> > > sk_buff
> > > > *skb,
> > > >         unsigned int tnl_hlen = skb_tnl_header_len(skb);
> > > >         unsigned int delta_truesize = 0;
> > > >         unsigned int delta_len = 0;
> > > > +       unsigned int mss = skb_shinfo(skb)->gso_size;
> > > >         struct sk_buff *tail = NULL;
> > > >         struct sk_buff *nskb, *tmp;
> > > >         int len_diff, err;
> > > > +       bool err_len = false;
> > > > 
> > > >         skb_push(skb, -skb_network_offset(skb) + offset);
> > > > 
> > > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct
> > > sk_buff
> > > > *skb,
> > > >         if (err)
> > > >                 goto err_linearize;
> > > > 
> > > > +       if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) {
> > > > +               if (!list_skb) {
> > > > +                       goto err_linearize;
> > > 
> > > The label no longer truly covers the meaning.
> > > 
> > > But that is already true since the above (second) jump was added
> in
> > > commit c329b261afe7 ("net: prevent skb corruption on frag list
> > > segmentation").
> > > 
> > > Neither needs the kfree_skb_list, as skb->next is not assigned to
> > > until the loop. Can just return ERR_PTR(-EFAULT)?
> > > 
> > > > +               } else {
> > > > +                       err_len = true;
> > > > +               }
> > > > +       }
> > > > +
> > > 
> > > Why the branch? Might as well always fail immediately?
> > > 
> > Hi Willem,
> > Thanks for your guidance.
> > You are right. There is no need for another branch as fraglist
> > could be freeed in kfree_skb.
> > Could I git send mail wo patches as below?
> > 
> > From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00
> 2001
> > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > Date: Wed, 24 Apr 2024 13:42:35 +0800
> > Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist
> > 
> > A GROed skb with fraglist can't be pulled data
> 
> Please use the specific label: SKB_GSO_FRAGLIST skb
> 
> > from its fraglist as it may result a invalid
> > segmentation or kernel exception.
> > 
> > For such structured skb we limit the BPF pull
> > data length smaller than skb_headlen() and return
> > error if exceeding.
> > 
> > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > ---
> >  net/core/filter.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 8adf95765cdd..8ed4d5d87167 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> > bpf_sp);
> >  static inline int __bpf_try_make_writable(struct sk_buff *skb,
> >    unsigned int write_len)
> >  {
> > +if (skb_is_gso(skb) &&
> > +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > +     write_len > skb_headlen(skb)) {
> > +return -ENOMEM;
> > +}
> >  return skb_ensure_writable(skb, write_len);
> >  }
> >  
> > -- 
> > 2.18.0
> > 
> > 
> > From 2d0729b20cf810ba1b31e046952c1cd78f295ca3 Mon Sep 17 00:00:00
> 2001
> > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > Date: Wed, 24 Apr 2024 14:43:45 +0800
> > Subject: [PATCH net] net: drop GROed skb pulled from fraglist
> > 
> > A GROed skb with fraglist maybe pulled by BPF
> > or other ways. It can't be trusted at all even
> > if one byte is pulled and should be dropped
> > on segmentation.
> 
> This paraphrases my comment. It is better to spell it out:
> 
> An SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is expected to have all
> segments except the last to be gso_size long. If this does not hold,
> the skb has been modified and the fraglist gso integrity is lost.
> Drop
> the packet, as it cannot be segmented correctly by skb_segment_list.
> 
> The skb could be salvaged, though, right? By linearizing, dropping
> the SKB_GSO_FRAGLIST bit and entering the normal skb_segment path
> rather than the skb_segment_list path.
> 
> That choice is currently made in the protocol caller,
> __udp_gso_segment. It's not trivial to add such a backup path here.
> So let's add this backstop against kernel crashes.
> 
> > 
> > If the gso_size does not match skb_headlen(),
> > it means to be pulled part of or the entire
> > fraglsit. It has been messed with and we return
> 
> fraglist
> 
> > error to free this skb.
> > 
> > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > ---
> >  net/core/skbuff.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index b99127712e67..750fbb51b99f 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -4493,6 +4493,7 @@ struct sk_buff *skb_segment_list(struct
> sk_buff
> > *skb,
> >  unsigned int tnl_hlen = skb_tnl_header_len(skb);
> >  unsigned int delta_truesize = 0;
> >  unsigned int delta_len = 0;
> > +unsigned int mss = skb_shinfo(skb)->gso_size;
> 
> Reverse christmas tree
> 
> >  struct sk_buff *tail = NULL;
> >  struct sk_buff *nskb, *tmp;
> >  int len_diff, err;
> > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct
> sk_buff
> > *skb,
> >  if (err)
> >  goto err_linearize;
> >  
> > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
> > +return ERR_PTR(-EFAULT);
> > +
> 
> Do this precondition integrity check before the skb_unclone path?

After return error, the skb will enter into kfree_skb, not consume_skb.
It may meet same crash problem which has been resolved by skb_unclone.

Or kfree_skb could well handle the cloned skb's release?

Other changes are updated as below:

From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 2001
From: Shiming Cheng <shiming.cheng@mediatek.com>
Date: Wed, 24 Apr 2024 13:42:35 +0800
Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb

A SKB_GSO_FRAGLIST skb can't be pulled data
from its fraglist as it may result an invalid
segmentation or kernel exception.

For such structured skb we limit the BPF pulling
data length smaller than skb_headlen() and return
error if exceeding.

Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
---
 net/core/filter.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8adf95765cdd..8ed4d5d87167 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
bpf_sp);
 static inline int __bpf_try_make_writable(struct sk_buff *skb,
 					  unsigned int write_len)
 {
+	if (skb_is_gso(skb) &&
+	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
+	     write_len > skb_headlen(skb)) {
+		return -ENOMEM;
+	}
 	return skb_ensure_writable(skb, write_len);
 }
Willem de Bruijn April 25, 2024, 2:07 p.m. UTC | #21
> > >  struct sk_buff *tail = NULL;
> > >  struct sk_buff *nskb, *tmp;
> > >  int len_diff, err;
> > > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct
> > sk_buff
> > > *skb,
> > >  if (err)
> > >  goto err_linearize;
> > >  
> > > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
> > > +return ERR_PTR(-EFAULT);
> > > +
> > 
> > Do this precondition integrity check before the skb_unclone path?
> 
> After return error, the skb will enter into kfree_skb, not consume_skb.
> It may meet same crash problem which has been resolved by skb_unclone.
> 
> Or kfree_skb could well handle the cloned skb's release?

Since this is an error path it should reach kfree_skb rather than
consume_skb.

> 
> Other changes are updated as below:
> 
> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Wed, 24 Apr 2024 13:42:35 +0800
> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb
> 
> A SKB_GSO_FRAGLIST skb can't be pulled data
> from its fraglist as it may result an invalid
> segmentation or kernel exception.
> 
> For such structured skb we limit the BPF pulling
> data length smaller than skb_headlen() and return
> error if exceeding.
> 
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  net/core/filter.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8adf95765cdd..8ed4d5d87167 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> bpf_sp);
>  static inline int __bpf_try_make_writable(struct sk_buff *skb,
>  					  unsigned int write_len)
>  {
> +	if (skb_is_gso(skb) &&
> +	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> +	     write_len > skb_headlen(skb)) {
> +		return -ENOMEM;
> +	}
>  	return skb_ensure_writable(skb, write_len);
>  }
>  
> -- 
> 2.18.0
> 
> 
> From 64d55392debbc90ef2e9c33441024d612075bdd7 Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Wed, 24 Apr 2024 14:43:45 +0800
> Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb
> 
> A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is
> expected to have all segments except the last
> to be gso_size long. If this does not hold, the
> skb has been modified and the fraglist gso integrity
> is lost. Drop the packet, as it cannot be segmented
> correctly by skb_segment_list.
> 
> The skb could be salvaged, though, right?
> By linearizing, dropping the SKB_GSO_FRAGLIST bit
> and entering the normal skb_segment path rather than
> the skb_segment_list path.

Drop the "though, right?"
> 
> That choice is currently made in the protocol caller,
> __udp_gso_segment. It's not trivial to add such a
> backup path here. So let's add this backstop against
> kernel crashes.
> 
> If the gso_size does not match skb_headlen(),
> it means part of or the entire fraglist has been pulled.
> It has been messed with and we should return error to
> free this skb.

This paragraph is now duplicative. Drop.
> 
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  net/core/skbuff.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b99127712e67..4777f5fea6c3 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4491,6 +4491,7 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>  {
>  	struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
>  	unsigned int tnl_hlen = skb_tnl_header_len(skb);
> +	unsigned int mss = skb_shinfo(skb)->gso_size;
>  	unsigned int delta_truesize = 0;
>  	unsigned int delta_len = 0;
>  	struct sk_buff *tail = NULL;
> @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>  	if (err)
>  		goto err_linearize;
>  
> +	if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
> +		return ERR_PTR(-EFAULT);
> +
>  	skb_shinfo(skb)->frag_list = NULL;
>  
>  	while (list_skb) {
> -- 
> 2.18.0
Maciej Żenczykowski April 26, 2024, 12:16 a.m. UTC | #22
On Wed, Apr 24, 2024 at 9:33 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote:
>
> On Wed, 2024-04-24 at 10:28 -0400, Willem de Bruijn wrote:
> >
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >
> > Lena Wang (王娜) wrote:
> > > On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote:
> > > >
> > > > External email : Please do not click links or open attachments
> > until
> > > > you have verified the sender or the content.
> > > >  > Hi Willem,
> > > > > As the discussion, is it OK for the patch below?
> > > >
> > > > Thanks for iterating on this.
> > > >
> > > > I would like the opinion also of the fraglist and UDP GRO
> > experts.
> > > >
> > > > Yes, I think both
> > > >
> > > > - protecting skb_segment_list against clearly illegal fraglist
> > > > packets, and
> > > > - blocking BPF from constructing such packets
> > > >
> > > > are worthwhile stable fixes. I believe they should be two
> > separate
> > > > patches. Both probably with the same Fixes tag: 3a1296a38d0c
> > > > ("net: Support GRO/GSO fraglist chaining").
> > > >
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index 3a6110ea4009..abc6029c8eef 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct
> > bpf_scratchpad,
> > > > > bpf_sp);
> > > > >  static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > > > >                                           unsigned int
> > write_len)
> > > > >  {
> > > > > +       if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type &
> > > > > +                       SKB_GSO_FRAGLIST) && (write_len >
> > > > > skb_headlen(skb))) {
> > > > > +               return -ENOMEM;
> > > > > +       }
> > > > > +
> > > >
> > > > Indentation looks off, but I agree with the logic.
> > > >
> > > >     if (skb_is_gso(skb) &&
> > > >         (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > > >          (write_len > skb_headlen(skb)))
> > > >
> > > > >         return skb_ensure_writable(skb, write_len);
> > > > >  }
> > > > >
> > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > > > index 73b1e0e53534..2e90534c1a1e 100644
> > > > > --- a/net/core/skbuff.c
> > > > > +++ b/net/core/skbuff.c
> > > > > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct
> > > > sk_buff
> > > > > *skb,
> > > > >         unsigned int tnl_hlen = skb_tnl_header_len(skb);
> > > > >         unsigned int delta_truesize = 0;
> > > > >         unsigned int delta_len = 0;
> > > > > +       unsigned int mss = skb_shinfo(skb)->gso_size;
> > > > >         struct sk_buff *tail = NULL;
> > > > >         struct sk_buff *nskb, *tmp;
> > > > >         int len_diff, err;
> > > > > +       bool err_len = false;
> > > > >
> > > > >         skb_push(skb, -skb_network_offset(skb) + offset);
> > > > >
> > > > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct
> > > > sk_buff
> > > > > *skb,
> > > > >         if (err)
> > > > >                 goto err_linearize;
> > > > >
> > > > > +       if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) {
> > > > > +               if (!list_skb) {
> > > > > +                       goto err_linearize;
> > > >
> > > > The label no longer truly covers the meaning.
> > > >
> > > > But that is already true since the above (second) jump was added
> > in
> > > > commit c329b261afe7 ("net: prevent skb corruption on frag list
> > > > segmentation").
> > > >
> > > > Neither needs the kfree_skb_list, as skb->next is not assigned to
> > > > until the loop. Can just return ERR_PTR(-EFAULT)?
> > > >
> > > > > +               } else {
> > > > > +                       err_len = true;
> > > > > +               }
> > > > > +       }
> > > > > +
> > > >
> > > > Why the branch? Might as well always fail immediately?
> > > >
> > > Hi Willem,
> > > Thanks for your guidance.
> > > You are right. There is no need for another branch as fraglist
> > > could be freeed in kfree_skb.
> > > Could I git send mail wo patches as below?
> > >
> > > From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00
> > 2001
> > > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > > Date: Wed, 24 Apr 2024 13:42:35 +0800
> > > Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist
> > >
> > > A GROed skb with fraglist can't be pulled data
> >
> > Please use the specific label: SKB_GSO_FRAGLIST skb
> >
> > > from its fraglist as it may result a invalid
> > > segmentation or kernel exception.
> > >
> > > For such structured skb we limit the BPF pull
> > > data length smaller than skb_headlen() and return
> > > error if exceeding.
> > >
> > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > ---
> > >  net/core/filter.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 8adf95765cdd..8ed4d5d87167 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> > > bpf_sp);
> > >  static inline int __bpf_try_make_writable(struct sk_buff *skb,
> > >    unsigned int write_len)
> > >  {
> > > +if (skb_is_gso(skb) &&
> > > +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > > +     write_len > skb_headlen(skb)) {
> > > +return -ENOMEM;
> > > +}
> > >  return skb_ensure_writable(skb, write_len);
> > >  }
> > >
> > > --
> > > 2.18.0
> > >
> > >
> > > From 2d0729b20cf810ba1b31e046952c1cd78f295ca3 Mon Sep 17 00:00:00
> > 2001
> > > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > > Date: Wed, 24 Apr 2024 14:43:45 +0800
> > > Subject: [PATCH net] net: drop GROed skb pulled from fraglist
> > >
> > > A GROed skb with fraglist maybe pulled by BPF
> > > or other ways. It can't be trusted at all even
> > > if one byte is pulled and should be dropped
> > > on segmentation.
> >
> > This paraphrases my comment. It is better to spell it out:
> >
> > An SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is expected to have all
> > segments except the last to be gso_size long. If this does not hold,
> > the skb has been modified and the fraglist gso integrity is lost.
> > Drop
> > the packet, as it cannot be segmented correctly by skb_segment_list.
> >
> > The skb could be salvaged, though, right? By linearizing, dropping
> > the SKB_GSO_FRAGLIST bit and entering the normal skb_segment path
> > rather than the skb_segment_list path.
> >
> > That choice is currently made in the protocol caller,
> > __udp_gso_segment. It's not trivial to add such a backup path here.
> > So let's add this backstop against kernel crashes.
> >
> > >
> > > If the gso_size does not match skb_headlen(),
> > > it means to be pulled part of or the entire
> > > fraglsit. It has been messed with and we return
> >
> > fraglist
> >
> > > error to free this skb.
> > >
> > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > ---
> > >  net/core/skbuff.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index b99127712e67..750fbb51b99f 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -4493,6 +4493,7 @@ struct sk_buff *skb_segment_list(struct
> > sk_buff
> > > *skb,
> > >  unsigned int tnl_hlen = skb_tnl_header_len(skb);
> > >  unsigned int delta_truesize = 0;
> > >  unsigned int delta_len = 0;
> > > +unsigned int mss = skb_shinfo(skb)->gso_size;
> >
> > Reverse christmas tree
> >
> > >  struct sk_buff *tail = NULL;
> > >  struct sk_buff *nskb, *tmp;
> > >  int len_diff, err;
> > > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct
> > sk_buff
> > > *skb,
> > >  if (err)
> > >  goto err_linearize;
> > >
> > > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
> > > +return ERR_PTR(-EFAULT);
> > > +
> >
> > Do this precondition integrity check before the skb_unclone path?
>
> After return error, the skb will enter into kfree_skb, not consume_skb.
> It may meet same crash problem which has been resolved by skb_unclone.
>
> Or kfree_skb could well handle the cloned skb's release?
>
> Other changes are updated as below:
>
> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Wed, 24 Apr 2024 13:42:35 +0800
> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb
>
> A SKB_GSO_FRAGLIST skb can't be pulled data
> from its fraglist as it may result an invalid
> segmentation or kernel exception.
>
> For such structured skb we limit the BPF pulling
> data length smaller than skb_headlen() and return
> error if exceeding.
>
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  net/core/filter.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 8adf95765cdd..8ed4d5d87167 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> bpf_sp);
>  static inline int __bpf_try_make_writable(struct sk_buff *skb,
>                                           unsigned int write_len)
>  {
> +       if (skb_is_gso(skb) &&
> +           (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> +            write_len > skb_headlen(skb)) {

at least call:
  skb_ensure_writable(skb, skb_headlen(skb));
here before you return the error to make sure the already linear
header portion is writable.
(the skb could be a clone with a r/o header afaict)

> +               return -ENOMEM;
> +       }
>         return skb_ensure_writable(skb, write_len);
>  }
>
> --
> 2.18.0
>
>
> From 64d55392debbc90ef2e9c33441024d612075bdd7 Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Wed, 24 Apr 2024 14:43:45 +0800
> Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb
>
> A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is
> expected to have all segments except the last
> to be gso_size long. If this does not hold, the
> skb has been modified and the fraglist gso integrity
> is lost. Drop the packet, as it cannot be segmented
> correctly by skb_segment_list.
>
> The skb could be salvaged, though, right?
> By linearizing, dropping the SKB_GSO_FRAGLIST bit
> and entering the normal skb_segment path rather than
> the skb_segment_list path.
>
> That choice is currently made in the protocol caller,
> __udp_gso_segment. It's not trivial to add such a
> backup path here. So let's add this backstop against
> kernel crashes.
>
> If the gso_size does not match skb_headlen(),
> it means part of or the entire fraglist has been pulled.
> It has been messed with and we should return error to
> free this skb.
>
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  net/core/skbuff.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b99127712e67..4777f5fea6c3 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4491,6 +4491,7 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>  {
>         struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
>         unsigned int tnl_hlen = skb_tnl_header_len(skb);
> +       unsigned int mss = skb_shinfo(skb)->gso_size;
>         unsigned int delta_truesize = 0;
>         unsigned int delta_len = 0;
>         struct sk_buff *tail = NULL;
> @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff
> *skb,
>         if (err)
>                 goto err_linearize;
>
> +       if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
> +               return ERR_PTR(-EFAULT);
> +
>         skb_shinfo(skb)->frag_list = NULL;
>
>         while (list_skb) {
> --
> 2.18.0

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Lena Wang (王娜) April 26, 2024, 9:52 a.m. UTC | #23
On Thu, 2024-04-25 at 10:07 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  
> > > >  struct sk_buff *tail = NULL;
> > > >  struct sk_buff *nskb, *tmp;
> > > >  int len_diff, err;
> > > > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct
> > > sk_buff
> > > > *skb,
> > > >  if (err)
> > > >  goto err_linearize;
> > > >  
> > > > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
> > > > +return ERR_PTR(-EFAULT);
> > > > +
> > > 
> > > Do this precondition integrity check before the skb_unclone path?
> > 
> > After return error, the skb will enter into kfree_skb, not
> consume_skb.
> > It may meet same crash problem which has been resolved by
> skb_unclone.
> > 
> > Or kfree_skb could well handle the cloned skb's release?
> 
> Since this is an error path it should reach kfree_skb rather than
> consume_skb.
> 
So we could keep the check after the skb_unclone path, right?

> > 
> > Other changes are updated as below:
> > 
> > From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00
> 2001
> > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > Date: Wed, 24 Apr 2024 13:42:35 +0800
> > Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb
> > 
> > A SKB_GSO_FRAGLIST skb can't be pulled data
> > from its fraglist as it may result an invalid
> > segmentation or kernel exception.
> > 
> > For such structured skb we limit the BPF pulling
> > data length smaller than skb_headlen() and return
> > error if exceeding.
> > 
> > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > ---
> >  net/core/filter.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 8adf95765cdd..8ed4d5d87167 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> > bpf_sp);
> >  static inline int __bpf_try_make_writable(struct sk_buff *skb,
> >    unsigned int write_len)
> >  {
> > +if (skb_is_gso(skb) &&
> > +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > +     write_len > skb_headlen(skb)) {
> > +return -ENOMEM;
> > +}
> >  return skb_ensure_writable(skb, write_len);
> >  }
> >  
> > -- 
> > 2.18.0
> > 
> > 
> > From 64d55392debbc90ef2e9c33441024d612075bdd7 Mon Sep 17 00:00:00
> 2001
> > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > Date: Wed, 24 Apr 2024 14:43:45 +0800
> > Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb
> > 
> > A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is
> > expected to have all segments except the last
> > to be gso_size long. If this does not hold, the
> > skb has been modified and the fraglist gso integrity
> > is lost. Drop the packet, as it cannot be segmented
> > correctly by skb_segment_list.
> > 
> > The skb could be salvaged, though, right?
> > By linearizing, dropping the SKB_GSO_FRAGLIST bit
> > and entering the normal skb_segment path rather than
> > the skb_segment_list path.
> 
> Drop the "though, right?"
> > 
> > That choice is currently made in the protocol caller,
> > __udp_gso_segment. It's not trivial to add such a
> > backup path here. So let's add this backstop against
> > kernel crashes.
> > 
> > If the gso_size does not match skb_headlen(),
> > it means part of or the entire fraglist has been pulled.
> > It has been messed with and we should return error to
> > free this skb.
> 
> This paragraph is now duplicative. Drop.

OK, updated as below:

From 59d561adc13d52e3c225c6b8276f6a53324f7d56 Mon Sep 17 00:00:00 2001
From: Shiming Cheng <shiming.cheng@mediatek.com>
Date: Wed, 24 Apr 2024 14:43:45 +0800
Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb

A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is
expected to have all segments except the last
to be gso_size long. If this does not hold, the
skb has been modified and the fraglist gso integrity
is lost. Drop the packet, as it cannot be segmented
correctly by skb_segment_list.

The skb could be salvaged. By linearizing, dropping
the SKB_GSO_FRAGLIST bit and entering the normal
skb_segment path rather than the skb_segment_list path.

That choice is currently made in the protocol caller,
__udp_gso_segment. It's not trivial to add such a
backup path here. So let's add this backstop against
kernel crashes.

Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b99127712e67..4777f5fea6c3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4491,6 +4491,7 @@ struct sk_buff *skb_segment_list(struct sk_buff
*skb,
 {
 	struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
 	unsigned int tnl_hlen = skb_tnl_header_len(skb);
+	unsigned int mss = skb_shinfo(skb)->gso_size;
 	unsigned int delta_truesize = 0;
 	unsigned int delta_len = 0;
 	struct sk_buff *tail = NULL;
@@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff
*skb,
 	if (err)
 		goto err_linearize;
 
+	if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb))
+		return ERR_PTR(-EFAULT);
+
 	skb_shinfo(skb)->frag_list = NULL;
 
 	while (list_skb) {
Daniel Borkmann April 26, 2024, 9:08 p.m. UTC | #24
On 4/26/24 11:52 AM, Lena Wang (王娜) wrote:
[...]
>>>  From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00
>> 2001
>>> From: Shiming Cheng <shiming.cheng@mediatek.com>
>>> Date: Wed, 24 Apr 2024 13:42:35 +0800
>>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb
>>>
>>> A SKB_GSO_FRAGLIST skb can't be pulled data
>>> from its fraglist as it may result an invalid
>>> segmentation or kernel exception.
>>>
>>> For such structured skb we limit the BPF pulling
>>> data length smaller than skb_headlen() and return
>>> error if exceeding.
>>>
>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
>>> ---
>>>   net/core/filter.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 8adf95765cdd..8ed4d5d87167 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
>>> bpf_sp);
>>>   static inline int __bpf_try_make_writable(struct sk_buff *skb,
>>>     unsigned int write_len)
>>>   {
>>> +if (skb_is_gso(skb) &&
>>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
>>> +     write_len > skb_headlen(skb)) {
>>> +return -ENOMEM;
>>> +}
>>>   return skb_ensure_writable(skb, write_len);

Dumb question, but should this guard be more generically part of skb_ensure_writable()
internals, presumably that would be inside pskb_may_pull_reason(), or only if we ever
see more code instances similar to this?

Thanks,
Daniel
Willem de Bruijn April 27, 2024, 1:28 p.m. UTC | #25
Daniel Borkmann wrote:
> On 4/26/24 11:52 AM, Lena Wang (王娜) wrote:
> [...]
> >>>  From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00
> >> 2001
> >>> From: Shiming Cheng <shiming.cheng@mediatek.com>
> >>> Date: Wed, 24 Apr 2024 13:42:35 +0800
> >>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb
> >>>
> >>> A SKB_GSO_FRAGLIST skb can't be pulled data
> >>> from its fraglist as it may result an invalid
> >>> segmentation or kernel exception.
> >>>
> >>> For such structured skb we limit the BPF pulling
> >>> data length smaller than skb_headlen() and return
> >>> error if exceeding.
> >>>
> >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> >>> ---
> >>>   net/core/filter.c | 5 +++++
> >>>   1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>> index 8adf95765cdd..8ed4d5d87167 100644
> >>> --- a/net/core/filter.c
> >>> +++ b/net/core/filter.c
> >>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad,
> >>> bpf_sp);
> >>>   static inline int __bpf_try_make_writable(struct sk_buff *skb,
> >>>     unsigned int write_len)
> >>>   {
> >>> +if (skb_is_gso(skb) &&
> >>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> >>> +     write_len > skb_headlen(skb)) {
> >>> +return -ENOMEM;
> >>> +}
> >>>   return skb_ensure_writable(skb, write_len);
> 
> Dumb question, but should this guard be more generically part of skb_ensure_writable()
> internals, presumably that would be inside pskb_may_pull_reason(), or only if we ever
> see more code instances similar to this?

Good point. Most callers of skb_ensure_writable correctly pull only
headers, so wouldn't cause this problem. But it also adds coverage to
things like tc pedit.
Lena Wang (王娜) April 28, 2024, 7:48 a.m. UTC | #26
On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  
> Daniel Borkmann wrote:
> > On 4/26/24 11:52 AM, Lena Wang (王娜) wrote:
> > [...]
> > >>>  From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17
> 00:00:00
> > >> 2001
> > >>> From: Shiming Cheng <shiming.cheng@mediatek.com>
> > >>> Date: Wed, 24 Apr 2024 13:42:35 +0800
> > >>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST
> skb
> > >>>
> > >>> A SKB_GSO_FRAGLIST skb can't be pulled data
> > >>> from its fraglist as it may result an invalid
> > >>> segmentation or kernel exception.
> > >>>
> > >>> For such structured skb we limit the BPF pulling
> > >>> data length smaller than skb_headlen() and return
> > >>> error if exceeding.
> > >>>
> > >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > >>> ---
> > >>>   net/core/filter.c | 5 +++++
> > >>>   1 file changed, 5 insertions(+)
> > >>>
> > >>> diff --git a/net/core/filter.c b/net/core/filter.c
> > >>> index 8adf95765cdd..8ed4d5d87167 100644
> > >>> --- a/net/core/filter.c
> > >>> +++ b/net/core/filter.c
> > >>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct
> bpf_scratchpad,
> > >>> bpf_sp);
> > >>>   static inline int __bpf_try_make_writable(struct sk_buff
> *skb,
> > >>>     unsigned int write_len)
> > >>>   {
> > >>> +if (skb_is_gso(skb) &&
> > >>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > >>> +     write_len > skb_headlen(skb)) {
> > >>> +return -ENOMEM;
> > >>> +}
> > >>>   return skb_ensure_writable(skb, write_len);
> > 
> > Dumb question, but should this guard be more generically part of
> skb_ensure_writable()
> > internals, presumably that would be inside pskb_may_pull_reason(),
> or only if we ever
> > see more code instances similar to this?
> 
> Good point. Most callers of skb_ensure_writable correctly pull only
> headers, so wouldn't cause this problem. But it also adds coverage to
> things like tc pedit.

Updated:

From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 2001
From: Shiming Cheng <shiming.cheng@mediatek.com>
Date: Sun, 28 Apr 2024 15:03:12 +0800
Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb

BPF or TC callers may pull in a length longer than skb_headlen()
for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled
into the linear space. However it destroys the skb's structure
and may result in an invalid segmentation or kernel exception.

So we should add protection to stop the operation and return
error to remind callers.

Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
---
 include/linux/skbuff.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9d24aec064e8..3eef65b3db24 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb,
unsigned int len)
 	if (unlikely(len > skb->len))
 		return SKB_DROP_REASON_PKT_TOO_SMALL;
 
+	if (skb_is_gso(skb) &&
+	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
+	     write_len > skb_headlen(skb)) {
+		return SKB_DROP_REASON_NOMEM;
+	}
+
 	if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb))))
 		return SKB_DROP_REASON_NOMEM;
Willem de Bruijn April 28, 2024, 1:19 p.m. UTC | #27
Lena Wang (王娜) wrote:
> On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote:
> >  	 
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  
> > Daniel Borkmann wrote:
> > > On 4/26/24 11:52 AM, Lena Wang (王娜) wrote:
> > > [...]
> > > >>>  From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17
> > 00:00:00
> > > >> 2001
> > > >>> From: Shiming Cheng <shiming.cheng@mediatek.com>
> > > >>> Date: Wed, 24 Apr 2024 13:42:35 +0800
> > > >>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST
> > skb
> > > >>>
> > > >>> A SKB_GSO_FRAGLIST skb can't be pulled data
> > > >>> from its fraglist as it may result an invalid
> > > >>> segmentation or kernel exception.
> > > >>>
> > > >>> For such structured skb we limit the BPF pulling
> > > >>> data length smaller than skb_headlen() and return
> > > >>> error if exceeding.
> > > >>>
> > > >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > > >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > > >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > > >>> ---
> > > >>>   net/core/filter.c | 5 +++++
> > > >>>   1 file changed, 5 insertions(+)
> > > >>>
> > > >>> diff --git a/net/core/filter.c b/net/core/filter.c
> > > >>> index 8adf95765cdd..8ed4d5d87167 100644
> > > >>> --- a/net/core/filter.c
> > > >>> +++ b/net/core/filter.c
> > > >>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct
> > bpf_scratchpad,
> > > >>> bpf_sp);
> > > >>>   static inline int __bpf_try_make_writable(struct sk_buff
> > *skb,
> > > >>>     unsigned int write_len)
> > > >>>   {
> > > >>> +if (skb_is_gso(skb) &&
> > > >>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > > >>> +     write_len > skb_headlen(skb)) {
> > > >>> +return -ENOMEM;
> > > >>> +}
> > > >>>   return skb_ensure_writable(skb, write_len);
> > > 
> > > Dumb question, but should this guard be more generically part of
> > skb_ensure_writable()
> > > internals, presumably that would be inside pskb_may_pull_reason(),
> > or only if we ever
> > > see more code instances similar to this?
> > 
> > Good point. Most callers of skb_ensure_writable correctly pull only
> > headers, so wouldn't cause this problem. But it also adds coverage to
> > things like tc pedit.
> 
> Updated:
> 
> From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Sun, 28 Apr 2024 15:03:12 +0800
> Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb
> 
> BPF or TC callers may pull in a length longer than skb_headlen()
> for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled
> into the linear space. However it destroys the skb's structure
> and may result in an invalid segmentation or kernel exception.
> 
> So we should add protection to stop the operation and return
> error to remind callers.
> 
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>  include/linux/skbuff.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9d24aec064e8..3eef65b3db24 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb,
> unsigned int len)


pskb_may_pull is called in far too many locations. Daniel's
suggestion was to amend skb_ensure_writable

Please start sending the patches as regular patches. They are close
enough to review normally.

>  	if (unlikely(len > skb->len))
>  		return SKB_DROP_REASON_PKT_TOO_SMALL;
>  
> +	if (skb_is_gso(skb) &&
> +	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> +	     write_len > skb_headlen(skb)) {
> +		return SKB_DROP_REASON_NOMEM;
> +	}
> +
>  	if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb))))
>  		return SKB_DROP_REASON_NOMEM;
>  
> -- 
> 2.18.0
Daniel Borkmann April 29, 2024, 10:15 a.m. UTC | #28
On 4/28/24 9:48 AM, Lena Wang (王娜) wrote:
> On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote:
>>   	
>> External email : Please do not click links or open attachments until
>> you have verified the sender or the content.
>>   
>> Daniel Borkmann wrote:
>>> On 4/26/24 11:52 AM, Lena Wang (王娜) wrote:
>>> [...]
>>>>>>   From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17
>> 00:00:00
>>>>> 2001
>>>>>> From: Shiming Cheng <shiming.cheng@mediatek.com>
>>>>>> Date: Wed, 24 Apr 2024 13:42:35 +0800
>>>>>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST
>> skb
>>>>>>
>>>>>> A SKB_GSO_FRAGLIST skb can't be pulled data
>>>>>> from its fraglist as it may result an invalid
>>>>>> segmentation or kernel exception.
>>>>>>
>>>>>> For such structured skb we limit the BPF pulling
>>>>>> data length smaller than skb_headlen() and return
>>>>>> error if exceeding.
>>>>>>
>>>>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
>>>>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
>>>>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
>>>>>> ---
>>>>>>    net/core/filter.c | 5 +++++
>>>>>>    1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>> index 8adf95765cdd..8ed4d5d87167 100644
>>>>>> --- a/net/core/filter.c
>>>>>> +++ b/net/core/filter.c
>>>>>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct
>> bpf_scratchpad,
>>>>>> bpf_sp);
>>>>>>    static inline int __bpf_try_make_writable(struct sk_buff
>> *skb,
>>>>>>      unsigned int write_len)
>>>>>>    {
>>>>>> +if (skb_is_gso(skb) &&
>>>>>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
>>>>>> +     write_len > skb_headlen(skb)) {
>>>>>> +return -ENOMEM;
>>>>>> +}
>>>>>>    return skb_ensure_writable(skb, write_len);
>>>
>>> Dumb question, but should this guard be more generically part of
>> skb_ensure_writable()
>>> internals, presumably that would be inside pskb_may_pull_reason(),
>> or only if we ever
>>> see more code instances similar to this?
>>
>> Good point. Most callers of skb_ensure_writable correctly pull only
>> headers, so wouldn't cause this problem. But it also adds coverage to
>> things like tc pedit.
> 
> Updated:
> 
>  From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 2001
> From: Shiming Cheng <shiming.cheng@mediatek.com>
> Date: Sun, 28 Apr 2024 15:03:12 +0800
> Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb
> 
> BPF or TC callers may pull in a length longer than skb_headlen()
> for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled
> into the linear space. However it destroys the skb's structure
> and may result in an invalid segmentation or kernel exception.
> 
> So we should add protection to stop the operation and return
> error to remind callers.
> 
> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> ---
>   include/linux/skbuff.h | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9d24aec064e8..3eef65b3db24 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb,
> unsigned int len)
>   	if (unlikely(len > skb->len))
>   		return SKB_DROP_REASON_PKT_TOO_SMALL;
>   
> +	if (skb_is_gso(skb) &&
> +	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> +	     write_len > skb_headlen(skb)) {
> +		return SKB_DROP_REASON_NOMEM;
> +	}

The 'write_len > skb_headlen(skb)' test is redundant, no ?

It is covered by the earlier test :

         if (likely(len <= skb_headlen(skb)))
                 return SKB_NOT_DROPPED_YET;

Also, was this patch even compile tested since there is no write_len var ?

>   	if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb))))
>   		return SKB_DROP_REASON_NOMEM;
>   
>
Lena Wang (王娜) April 29, 2024, 11:45 a.m. UTC | #29
On Mon, 2024-04-29 at 12:15 +0200, Daniel Borkmann wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 4/28/24 9:48 AM, Lena Wang (王娜) wrote:
> > On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote:
> >>   
> >> External email : Please do not click links or open attachments
> until
> >> you have verified the sender or the content.
> >>   
> >> Daniel Borkmann wrote:
> >>> On 4/26/24 11:52 AM, Lena Wang (王娜) wrote:
> >>> [...]
> >>>>>>   From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17
> >> 00:00:00
> >>>>> 2001
> >>>>>> From: Shiming Cheng <shiming.cheng@mediatek.com>
> >>>>>> Date: Wed, 24 Apr 2024 13:42:35 +0800
> >>>>>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST
> >> skb
> >>>>>>
> >>>>>> A SKB_GSO_FRAGLIST skb can't be pulled data
> >>>>>> from its fraglist as it may result an invalid
> >>>>>> segmentation or kernel exception.
> >>>>>>
> >>>>>> For such structured skb we limit the BPF pulling
> >>>>>> data length smaller than skb_headlen() and return
> >>>>>> error if exceeding.
> >>>>>>
> >>>>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist
> chaining.")
> >>>>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> >>>>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> >>>>>> ---
> >>>>>>    net/core/filter.c | 5 +++++
> >>>>>>    1 file changed, 5 insertions(+)
> >>>>>>
> >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>>>> index 8adf95765cdd..8ed4d5d87167 100644
> >>>>>> --- a/net/core/filter.c
> >>>>>> +++ b/net/core/filter.c
> >>>>>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct
> >> bpf_scratchpad,
> >>>>>> bpf_sp);
> >>>>>>    static inline int __bpf_try_make_writable(struct sk_buff
> >> *skb,
> >>>>>>      unsigned int write_len)
> >>>>>>    {
> >>>>>> +if (skb_is_gso(skb) &&
> >>>>>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> >>>>>> +     write_len > skb_headlen(skb)) {
> >>>>>> +return -ENOMEM;
> >>>>>> +}
> >>>>>>    return skb_ensure_writable(skb, write_len);
> >>>
> >>> Dumb question, but should this guard be more generically part of
> >> skb_ensure_writable()
> >>> internals, presumably that would be inside
> pskb_may_pull_reason(),
> >> or only if we ever
> >>> see more code instances similar to this?
> >>
> >> Good point. Most callers of skb_ensure_writable correctly pull
> only
> >> headers, so wouldn't cause this problem. But it also adds coverage
> to
> >> things like tc pedit.
> > 
> > Updated:
> > 
> >  From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00
> 2001
> > From: Shiming Cheng <shiming.cheng@mediatek.com>
> > Date: Sun, 28 Apr 2024 15:03:12 +0800
> > Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb
> > 
> > BPF or TC callers may pull in a length longer than skb_headlen()
> > for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled
> > into the linear space. However it destroys the skb's structure
> > and may result in an invalid segmentation or kernel exception.
> > 
> > So we should add protection to stop the operation and return
> > error to remind callers.
> > 
> > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
> > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
> > Signed-off-by: Lena Wang <lena.wang@mediatek.com>
> > ---
> >   include/linux/skbuff.h | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 9d24aec064e8..3eef65b3db24 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb,
> > unsigned int len)
> >   if (unlikely(len > skb->len))
> >   return SKB_DROP_REASON_PKT_TOO_SMALL;
> >   
> > +if (skb_is_gso(skb) &&
> > +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
> > +     write_len > skb_headlen(skb)) {
> > +return SKB_DROP_REASON_NOMEM;
> > +}
> 
> The 'write_len > skb_headlen(skb)' test is redundant, no ?
> 
> It is covered by the earlier test :
> 
>          if (likely(len <= skb_headlen(skb)))
>                  return SKB_NOT_DROPPED_YET;
> 
Daniel, it is not redundant. The bpf pulls a len between
skb_headlen(skb) and skb->len that results in error. Here it will stop
this operation. For other skbs(not SKB_GSO_FRAGLIST) it could be a
normal behaviour and will continue to do next pulling.

> Also, was this patch even compile tested since there is no write_len
> var ?
> 
The check is moved into skb_ensure_writable as advice and a new mail
loop is created to start upstream:

From 36307cb8706653a63a389faadbb1324cda5c0445 Mon Sep 17 00:00:00 2001
F
rom: Shiming Cheng <shiming.cheng@mediatek.com>
Date: Sun, 28 Apr 2024
21:50:11 +0800
Subject: [PATCH net] net: prevent pulling
SKB_GSO_FRAGLIST skb

BPF or TC callers may pull in a length longer than skb_headlen()
for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled
into the linear space. However it destroys the skb's structure
and may result in an invalid segmentation or kernel exception.

So we should add protection to stop the operation and return
error to remind callers.

Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
---
 net/core/skbuff.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f68f2679b086..2d35e009e814 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6100,6 +6100,12 @@ EXPORT_SYMBOL(skb_vlan_untag);
 
 int skb_ensure_writable(struct sk_buff *skb, unsigned int write_len)
 {
+	if (skb_is_gso(skb) &&
+	    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
+	     write_len > skb_headlen(skb)) {
+		return -ENOMEM;
+	}
+
 	if (!pskb_may_pull(skb, write_len))
 		return -ENOMEM;
Daniel Borkmann April 29, 2024, 3:11 p.m. UTC | #30
On 4/29/24 1:45 PM, Lena Wang (王娜) wrote:
> On Mon, 2024-04-29 at 12:15 +0200, Daniel Borkmann wrote:
>>   	
>> External email : Please do not click links or open attachments until
>> you have verified the sender or the content.
>>   On 4/28/24 9:48 AM, Lena Wang (王娜) wrote:
>>> On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote:
>>>>    
>>>> External email : Please do not click links or open attachments
>> until
>>>> you have verified the sender or the content.
>>>>    
>>>> Daniel Borkmann wrote:
>>>>> On 4/26/24 11:52 AM, Lena Wang (王娜) wrote:
>>>>> [...]
>>>>>>>>    From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17
>>>> 00:00:00
>>>>>>> 2001
>>>>>>>> From: Shiming Cheng <shiming.cheng@mediatek.com>
>>>>>>>> Date: Wed, 24 Apr 2024 13:42:35 +0800
>>>>>>>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST
>>>> skb
>>>>>>>>
>>>>>>>> A SKB_GSO_FRAGLIST skb can't be pulled data
>>>>>>>> from its fraglist as it may result an invalid
>>>>>>>> segmentation or kernel exception.
>>>>>>>>
>>>>>>>> For such structured skb we limit the BPF pulling
>>>>>>>> data length smaller than skb_headlen() and return
>>>>>>>> error if exceeding.
>>>>>>>>
>>>>>>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist
>> chaining.")
>>>>>>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
>>>>>>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
>>>>>>>> ---
>>>>>>>>     net/core/filter.c | 5 +++++
>>>>>>>>     1 file changed, 5 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>>>> index 8adf95765cdd..8ed4d5d87167 100644
>>>>>>>> --- a/net/core/filter.c
>>>>>>>> +++ b/net/core/filter.c
>>>>>>>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct
>>>> bpf_scratchpad,
>>>>>>>> bpf_sp);
>>>>>>>>     static inline int __bpf_try_make_writable(struct sk_buff
>>>> *skb,
>>>>>>>>       unsigned int write_len)
>>>>>>>>     {
>>>>>>>> +if (skb_is_gso(skb) &&
>>>>>>>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
>>>>>>>> +     write_len > skb_headlen(skb)) {
>>>>>>>> +return -ENOMEM;
>>>>>>>> +}
>>>>>>>>     return skb_ensure_writable(skb, write_len);
>>>>>
>>>>> Dumb question, but should this guard be more generically part of
>>>> skb_ensure_writable()
>>>>> internals, presumably that would be inside
>> pskb_may_pull_reason(),
>>>> or only if we ever
>>>>> see more code instances similar to this?
>>>>
>>>> Good point. Most callers of skb_ensure_writable correctly pull
>> only
>>>> headers, so wouldn't cause this problem. But it also adds coverage
>> to
>>>> things like tc pedit.
>>>
>>> Updated:
>>>
>>>   From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00
>> 2001
>>> From: Shiming Cheng <shiming.cheng@mediatek.com>
>>> Date: Sun, 28 Apr 2024 15:03:12 +0800
>>> Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb
>>>
>>> BPF or TC callers may pull in a length longer than skb_headlen()
>>> for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled
>>> into the linear space. However it destroys the skb's structure
>>> and may result in an invalid segmentation or kernel exception.
>>>
>>> So we should add protection to stop the operation and return
>>> error to remind callers.
>>>
>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com>
>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com>
>>> ---
>>>    include/linux/skbuff.h | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>> index 9d24aec064e8..3eef65b3db24 100644
>>> --- a/include/linux/skbuff.h
>>> +++ b/include/linux/skbuff.h
>>> @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb,
>>> unsigned int len)
>>>    if (unlikely(len > skb->len))
>>>    return SKB_DROP_REASON_PKT_TOO_SMALL;
>>>    
>>> +if (skb_is_gso(skb) &&
>>> +    (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) &&
>>> +     write_len > skb_headlen(skb)) {
>>> +return SKB_DROP_REASON_NOMEM;
>>> +}
>>
>> The 'write_len > skb_headlen(skb)' test is redundant, no ?
>>
>> It is covered by the earlier test :
>>
>>           if (likely(len <= skb_headlen(skb)))
>>                   return SKB_NOT_DROPPED_YET;
>>
> Daniel, it is not redundant. The bpf pulls a len between
> skb_headlen(skb) and skb->len that results in error. Here it will stop
> this operation. For other skbs(not SKB_GSO_FRAGLIST) it could be a
> normal behaviour and will continue to do next pulling.

I meant something like the below. The len <= skb_headlen(skb) case you
already return earlier with SKB_NOT_DROPPED_YET. Willem, do you see a
case where this should not live in pskb_may_pull_reason() but rather
specifically in skb_ensure_writable()?

Thanks,
Daniel

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 517e546a120a..ef2a0328ff2b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1687,6 +1687,11 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
  /* Internal */
  #define skb_shinfo(SKB)	((struct skb_shared_info *)(skb_end_pointer(SKB)))

+static inline bool skb_is_gso(const struct sk_buff *skb)
+{
+	return skb_shinfo(skb)->gso_size;
+}
+
  static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
  {
  	return &skb_shinfo(skb)->hwtstamps;
@@ -2740,6 +2745,10 @@ pskb_may_pull_reason(struct sk_buff *skb, unsigned int len)
  	if (unlikely(len > skb->len))
  		return SKB_DROP_REASON_PKT_TOO_SMALL;

+	if (unlikely(skb_is_gso(skb) &&
+		     (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST)))
+		return SKB_DROP_REASON_FRAGLIST_PULL;
+
  	if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb))))
  		return SKB_DROP_REASON_NOMEM;

@@ -4953,11 +4962,6 @@ static inline struct sec_path *skb_sec_path(const struct sk_buff *skb)
  #endif
  }

-static inline bool skb_is_gso(const struct sk_buff *skb)
-{
-	return skb_shinfo(skb)->gso_size;
-}
-
  /* Note: Should be called only if skb_is_gso(skb) is true */
  static inline bool skb_is_gso_v6(const struct sk_buff *skb)
  {
diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index 9707ab54fdd5..9d6c97a6b2b6 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -418,6 +418,12 @@ enum skb_drop_reason {
  	 * iterations.
  	 */
  	SKB_DROP_REASON_TC_RECLASSIFY_LOOP,
+	/**
+	 * @SKB_DROP_REASON_FRAGLIST_PULL: attempting to pull GSO fraglist
+	 * which destroys the skb's structure and may then result in an
+	 * invalid segmentation or kernel exception.
+	 */
+	SKB_DROP_REASON_FRAGLIST_PULL,
  	/**
  	 * @SKB_DROP_REASON_MAX: the maximum of core drop reasons, which
  	 * shouldn't be used as a real 'reason' - only for tracing code gen
Willem de Bruijn April 29, 2024, 9:14 p.m. UTC | #31
> >> The 'write_len > skb_headlen(skb)' test is redundant, no ?
> >>
> >> It is covered by the earlier test :
> >>
> >>           if (likely(len <= skb_headlen(skb)))
> >>                   return SKB_NOT_DROPPED_YET;
> >>
> > Daniel, it is not redundant. The bpf pulls a len between
> > skb_headlen(skb) and skb->len that results in error. Here it will stop
> > this operation. For other skbs(not SKB_GSO_FRAGLIST) it could be a
> > normal behaviour and will continue to do next pulling.
> 
> I meant something like the below. The len <= skb_headlen(skb) case you
> already return earlier with SKB_NOT_DROPPED_YET. Willem, do you see a
> case where this should not live in pskb_may_pull_reason() but rather
> specifically in skb_ensure_writable()?

Yes. pskb_may_pull is called all over the hot path. All in locations
that are known safe, because they only pull header bytes. I prefer to
limit the branch to the few (user configurable) locations that are in
scope.
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b99127712e67..f68f2679b086 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4504,6 +4504,9 @@  struct sk_buff *skb_segment_list(struct sk_buff *skb,
 	if (err)
 		goto err_linearize;
 
+	if (!list_skb)
+		goto err_linearize;
+
 	skb_shinfo(skb)->frag_list = NULL;
 
 	while (list_skb) {