Message ID | 20240415150103.23316-1-shiming.cheng@mediatek.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] udp: fix segmentation crash for GRO packet without fraglist | expand |
shiming.cheng@ wrote: > From: Shiming Cheng <shiming.cheng@mediatek.com> > > A GRO packet without fraglist is crashed and backtrace is as below: > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > The reason that the packet loses its fraglist is that in ingress bpf > it makes a test pull with to make sure it can read packet headers > via direct packet access: In bpf_progs/offload.c > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull -> > __pskb_pull_tail This operation pull the data in fraglist into linear > and set the fraglist to null. What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST skbs? Some, like SCTP, cannot be linearized ever, as the do not have a single gso_size. Should this BPF operation just fail? > > BPF needs to modify a proper length to do pull data. However kernel > should also improve the flow to avoid crash from a bpf function call. > As there is no split flow and app may not decode the merged UDP packet, > we should drop the packet without fraglist in skb_segment_list here. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > net/core/skbuff.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index b99127712e67..f68f2679b086 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, > if (err) > goto err_linearize; > > + if (!list_skb) > + goto err_linearize; > + > skb_shinfo(skb)->frag_list = NULL; In absense of plugging the issue in BPF, dropping here is the best we can do indeed, I think.
On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > shiming.cheng@ wrote: > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > > > A GRO packet without fraglist is crashed and backtrace is as below: > > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > > > The reason that the packet loses its fraglist is that in ingress > bpf > > it makes a test pull with to make sure it can read packet headers > > via direct packet access: In bpf_progs/offload.c > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull -> > > __pskb_pull_tail This operation pull the data in fraglist into > linear > > and set the fraglist to null. > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST > skbs? > > Some, like SCTP, cannot be linearized ever, as the do not have a > single gso_size. > > Should this BPF operation just fail? > In most situation for big gso size packet, it indeed fails but BPF doesn't check the result. It seems the udp GRO packet can't be pulled/ trimed/condensed or else it can't be segmented correctly. As the BPF function comments it doesn't matter if the data pull failed or pull less. It just does a blind best effort pull. A patch to modify bpf pull length is upstreamed to Google before and below are part of Google BPF expert maze's reply: maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM I *think* if that patch fixes anything, then it's really proving that there's a bug in the kernel that needs to be fixed instead. It should be legal to call try_make_writable(skb, X) with *any* value of X. I add maze in loop and we could start more discussion here. > > > > BPF needs to modify a proper length to do pull data. However kernel > > should also improve the flow to avoid crash from a bpf function > call. > > As there is no split flow and app may not decode the merged UDP > packet, > > we should drop the packet without fraglist in skb_segment_list > here. > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > --- > > net/core/skbuff.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index b99127712e67..f68f2679b086 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct > sk_buff *skb, > > if (err) > > goto err_linearize; > > > > +if (!list_skb) > > +goto err_linearize; > > + > > skb_shinfo(skb)->frag_list = NULL; > > In absense of plugging the issue in BPF, dropping here is the best > we can do indeed, I think. >
On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote: > > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > shiming.cheng@ wrote: > > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > > > > > A GRO packet without fraglist is crashed and backtrace is as below: > > > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > > > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > > > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > > > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > > > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > > > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > > > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > > > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > > > > > The reason that the packet loses its fraglist is that in ingress > > bpf > > > it makes a test pull with to make sure it can read packet headers > > > via direct packet access: In bpf_progs/offload.c > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull -> > > > __pskb_pull_tail This operation pull the data in fraglist into > > linear > > > and set the fraglist to null. > > > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST > > skbs? > > > > Some, like SCTP, cannot be linearized ever, as the do not have a > > single gso_size. > > > > Should this BPF operation just fail? > > > In most situation for big gso size packet, it indeed fails but BPF > doesn't check the result. It seems the udp GRO packet can't be pulled/ > trimed/condensed or else it can't be segmented correctly. > > As the BPF function comments it doesn't matter if the data pull failed > or pull less. It just does a blind best effort pull. > > A patch to modify bpf pull length is upstreamed to Google before and > below are part of Google BPF expert maze's reply: > maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM > I *think* if that patch fixes anything, then it's really proving that > there's a bug in the kernel that needs to be fixed instead. > It should be legal to call try_make_writable(skb, X) with *any* value > of X. > > I add maze in loop and we could start more discussion here. Personally, I think bpf_skb_pull_data() should have automatically (ie. in kernel code) reduced how much it pulls so that it would pull headers only, and not packet content. (This is assuming the rest of the code isn't ready to deal with a longer pull, which I think is the case atm. Pulling too much, and then crashing or forcing the stack to drop packets because of them being malformed seems wrong...) In general it would be nice if there was a way to just say pull all headers... (or possibly all L2/L3/L4 headers) You in general need to pull stuff *before* you've even looked at the packet, so that you can look at the packet, so it's relatively hard/annoying to pull the correct length from bpf code itself. > > > BPF needs to modify a proper length to do pull data. However kernel > > > should also improve the flow to avoid crash from a bpf function > > call. > > > As there is no split flow and app may not decode the merged UDP > > packet, > > > we should drop the packet without fraglist in skb_segment_list > > here. > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > --- > > > net/core/skbuff.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index b99127712e67..f68f2679b086 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct > > sk_buff *skb, > > > if (err) > > > goto err_linearize; > > > > > > +if (!list_skb) > > > +goto err_linearize; > > > + > > > skb_shinfo(skb)->frag_list = NULL; > > > > In absense of plugging the issue in BPF, dropping here is the best > > we can do indeed, I think. > > -- Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski wrote: > On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote: > > > > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote: > > > > > > External email : Please do not click links or open attachments until > > > you have verified the sender or the content. > > > shiming.cheng@ wrote: > > > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > > > > > > > A GRO packet without fraglist is crashed and backtrace is as below: > > > > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > > > > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > > > > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > > > > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > > > > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > > > > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > > > > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > > > > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > > > > > > > The reason that the packet loses its fraglist is that in ingress > > > bpf > > > > it makes a test pull with to make sure it can read packet headers > > > > via direct packet access: In bpf_progs/offload.c > > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull -> > > > > __pskb_pull_tail This operation pull the data in fraglist into > > > linear > > > > and set the fraglist to null. > > > > > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST > > > skbs? > > > > > > Some, like SCTP, cannot be linearized ever, as the do not have a > > > single gso_size. > > > > > > Should this BPF operation just fail? > > > > > In most situation for big gso size packet, it indeed fails but BPF > > doesn't check the result. It seems the udp GRO packet can't be pulled/ > > trimed/condensed or else it can't be segmented correctly. > > > > As the BPF function comments it doesn't matter if the data pull failed > > or pull less. It just does a blind best effort pull. > > > > A patch to modify bpf pull length is upstreamed to Google before and > > below are part of Google BPF expert maze's reply: > > maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM > > I *think* if that patch fixes anything, then it's really proving that > > there's a bug in the kernel that needs to be fixed instead. > > It should be legal to call try_make_writable(skb, X) with *any* value > > of X. > > > > I add maze in loop and we could start more discussion here. > > Personally, I think bpf_skb_pull_data() should have automatically > (ie. in kernel code) reduced how much it pulls so that it would pull > headers only, That would be a helper that parses headers to discover header length. Parsing is better left to the BPF program. > and not packet content. > (This is assuming the rest of the code isn't ready to deal with a longer pull, > which I think is the case atm. Pulling too much, and then crashing or forcing > the stack to drop packets because of them being malformed seems wrong...) > > In general it would be nice if there was a way to just say pull all headers... > (or possibly all L2/L3/L4 headers) > You in general need to pull stuff *before* you've even looked at the packet, > so that you can look at the packet, > so it's relatively hard/annoying to pull the correct length from bpf > code itself. > > > > > BPF needs to modify a proper length to do pull data. However kernel > > > > should also improve the flow to avoid crash from a bpf function > > > call. > > > > As there is no split flow and app may not decode the merged UDP > > > packet, > > > > we should drop the packet without fraglist in skb_segment_list > > > here. > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > --- > > > > net/core/skbuff.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > index b99127712e67..f68f2679b086 100644 > > > > --- a/net/core/skbuff.c > > > > +++ b/net/core/skbuff.c > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct > > > sk_buff *skb, > > > > if (err) > > > > goto err_linearize; > > > > > > > > +if (!list_skb) > > > > +goto err_linearize; > > > > + This would catch the case where the entire data frag_list is linearized, but not a pskb_may_pull that only pulls in part of the list. Even with BPF being privileged, the kernel should not crash if BPF pulls a FRAGLIST GSO skb. But the check needs to be refined a bit. For a UDP GSO packet, I think gso_size is still valid, so if the head_skb length does not match gso_size, it has been messed with and should be dropped. For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull may be entirely undetectable as long as frag_list != NULL? > > > > skb_shinfo(skb)->frag_list = NULL; > > > > > > In absense of plugging the issue in BPF, dropping here is the best > > > we can do indeed, I think. > > > > > -- > Maciej Żenczykowski, Kernel Networking Developer @ Google
On Tue, Apr 16, 2024 at 10:16 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Maciej Żenczykowski wrote: > > On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote: > > > > > > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote: > > > > > > > > External email : Please do not click links or open attachments until > > > > you have verified the sender or the content. > > > > shiming.cheng@ wrote: > > > > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > > > > > > > > > A GRO packet without fraglist is crashed and backtrace is as below: > > > > > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > > > > > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > > > > > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > > > > > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > > > > > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > > > > > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > > > > > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > > > > > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > > > > > > > > > The reason that the packet loses its fraglist is that in ingress > > > > bpf > > > > > it makes a test pull with to make sure it can read packet headers > > > > > via direct packet access: In bpf_progs/offload.c > > > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull -> > > > > > __pskb_pull_tail This operation pull the data in fraglist into > > > > linear > > > > > and set the fraglist to null. > > > > > > > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST > > > > skbs? > > > > > > > > Some, like SCTP, cannot be linearized ever, as the do not have a > > > > single gso_size. > > > > > > > > Should this BPF operation just fail? > > > > > > > In most situation for big gso size packet, it indeed fails but BPF > > > doesn't check the result. It seems the udp GRO packet can't be pulled/ > > > trimed/condensed or else it can't be segmented correctly. > > > > > > As the BPF function comments it doesn't matter if the data pull failed > > > or pull less. It just does a blind best effort pull. > > > > > > A patch to modify bpf pull length is upstreamed to Google before and > > > below are part of Google BPF expert maze's reply: > > > maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM > > > I *think* if that patch fixes anything, then it's really proving that > > > there's a bug in the kernel that needs to be fixed instead. > > > It should be legal to call try_make_writable(skb, X) with *any* value > > > of X. > > > > > > I add maze in loop and we could start more discussion here. > > > > Personally, I think bpf_skb_pull_data() should have automatically > > (ie. in kernel code) reduced how much it pulls so that it would pull > > headers only, > > That would be a helper that parses headers to discover header length. Does it actually need to? Presumably the bpf pull function could notice that it is a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce the pull accordingly so that it doesn't pull anything from the non-linear fraglist portion??? I know only the generic overview of what udp gso is, not any details, so I am assuming here that there's some sort of guarantee to how these packets are structured... But I imagine there must be or we wouldn't be hitting these issues deeper in the stack? > Parsing is better left to the BPF program. > > > and not packet content. > > (This is assuming the rest of the code isn't ready to deal with a longer pull, > > which I think is the case atm. Pulling too much, and then crashing or forcing > > the stack to drop packets because of them being malformed seems wrong...) > > > > In general it would be nice if there was a way to just say pull all headers... > > (or possibly all L2/L3/L4 headers) > > You in general need to pull stuff *before* you've even looked at the packet, > > so that you can look at the packet, > > so it's relatively hard/annoying to pull the correct length from bpf > > code itself. > > > > > > > BPF needs to modify a proper length to do pull data. However kernel > > > > > should also improve the flow to avoid crash from a bpf function > > > > call. > > > > > As there is no split flow and app may not decode the merged UDP > > > > packet, > > > > > we should drop the packet without fraglist in skb_segment_list > > > > here. > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > > --- > > > > > net/core/skbuff.c | 3 +++ > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > index b99127712e67..f68f2679b086 100644 > > > > > --- a/net/core/skbuff.c > > > > > +++ b/net/core/skbuff.c > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct > > > > sk_buff *skb, > > > > > if (err) > > > > > goto err_linearize; > > > > > > > > > > +if (!list_skb) > > > > > +goto err_linearize; > > > > > + > > This would catch the case where the entire data frag_list is > linearized, but not a pskb_may_pull that only pulls in part of the > list. > > Even with BPF being privileged, the kernel should not crash if BPF > pulls a FRAGLIST GSO skb. > > But the check needs to be refined a bit. For a UDP GSO packet, I > think gso_size is still valid, so if the head_skb length does not > match gso_size, it has been messed with and should be dropped. > > For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull > may be entirely undetectable as long as frag_list != NULL? > > > > > > > skb_shinfo(skb)->frag_list = NULL; > > > > > > > > In absense of plugging the issue in BPF, dropping here is the best > > > > we can do indeed, I think.
On Tue, Apr 16, 2024 at 10:51 AM Maciej Żenczykowski <maze@google.com> wrote: > > On Tue, Apr 16, 2024 at 10:16 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Maciej Żenczykowski wrote: > > > On Mon, Apr 15, 2024 at 7:14 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote: > > > > > > > > On Mon, 2024-04-15 at 16:53 -0400, Willem de Bruijn wrote: > > > > > > > > > > External email : Please do not click links or open attachments until > > > > > you have verified the sender or the content. > > > > > shiming.cheng@ wrote: > > > > > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > > > > > > > > > > > A GRO packet without fraglist is crashed and backtrace is as below: > > > > > > [ 1100.812205][ C3] CPU: 3 PID: 0 Comm: swapper/3 Tainted: > > > > > > G W OE 6.6.17-android15-0-g380371ea9bf1 #1 > > > > > > [ 1100.812317][ C3] __udp_gso_segment+0x298/0x4d4 > > > > > > [ 1100.812335][ C3] __skb_gso_segment+0xc4/0x120 > > > > > > [ 1100.812339][ C3] udp_rcv_segment+0x50/0x134 > > > > > > [ 1100.812344][ C3] udp_queue_rcv_skb+0x74/0x114 > > > > > > [ 1100.812348][ C3] udp_unicast_rcv_skb+0x94/0xac > > > > > > [ 1100.812358][ C3] udp_rcv+0x20/0x30 > > > > > > > > > > > > The reason that the packet loses its fraglist is that in ingress > > > > > bpf > > > > > > it makes a test pull with to make sure it can read packet headers > > > > > > via direct packet access: In bpf_progs/offload.c > > > > > > try_make_writable -> bpf_skb_pull_data -> pskb_may_pull -> > > > > > > __pskb_pull_tail This operation pull the data in fraglist into > > > > > linear > > > > > > and set the fraglist to null. > > > > > > > > > > What is the right behavior from BPF with regard to SKB_GSO_FRAGLIST > > > > > skbs? > > > > > > > > > > Some, like SCTP, cannot be linearized ever, as the do not have a > > > > > single gso_size. > > > > > > > > > > Should this BPF operation just fail? > > > > > > > > > In most situation for big gso size packet, it indeed fails but BPF > > > > doesn't check the result. It seems the udp GRO packet can't be pulled/ > > > > trimed/condensed or else it can't be segmented correctly. > > > > > > > > As the BPF function comments it doesn't matter if the data pull failed > > > > or pull less. It just does a blind best effort pull. > > > > > > > > A patch to modify bpf pull length is upstreamed to Google before and > > > > below are part of Google BPF expert maze's reply: > > > > maze@google.com<maze@google.com> #5Apr 13, 2024 02:30AM > > > > I *think* if that patch fixes anything, then it's really proving that > > > > there's a bug in the kernel that needs to be fixed instead. > > > > It should be legal to call try_make_writable(skb, X) with *any* value > > > > of X. > > > > > > > > I add maze in loop and we could start more discussion here. > > > > > > Personally, I think bpf_skb_pull_data() should have automatically > > > (ie. in kernel code) reduced how much it pulls so that it would pull > > > headers only, > > > > That would be a helper that parses headers to discover header length. > > Does it actually need to? Presumably the bpf pull function could > notice that it is > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce the pull > accordingly so that it doesn't pull anything from the non-linear > fraglist portion??? > > I know only the generic overview of what udp gso is, not any details, so I am > assuming here that there's some sort of guarantee to how these packets > are structured... But I imagine there must be or we wouldn't be hitting these > issues deeper in the stack? Perhaps for a packet of this type we're already guaranteed the headers are in the linear portion, and the pull should simply be ignored? > > > Parsing is better left to the BPF program. > > > > > and not packet content. > > > (This is assuming the rest of the code isn't ready to deal with a longer pull, > > > which I think is the case atm. Pulling too much, and then crashing or forcing > > > the stack to drop packets because of them being malformed seems wrong...) > > > > > > In general it would be nice if there was a way to just say pull all headers... > > > (or possibly all L2/L3/L4 headers) > > > You in general need to pull stuff *before* you've even looked at the packet, > > > so that you can look at the packet, > > > so it's relatively hard/annoying to pull the correct length from bpf > > > code itself. > > > > > > > > > BPF needs to modify a proper length to do pull data. However kernel > > > > > > should also improve the flow to avoid crash from a bpf function > > > > > call. > > > > > > As there is no split flow and app may not decode the merged UDP > > > > > packet, > > > > > > we should drop the packet without fraglist in skb_segment_list > > > > > here. > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > > > --- > > > > > > net/core/skbuff.c | 3 +++ > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > --- a/net/core/skbuff.c > > > > > > +++ b/net/core/skbuff.c > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct > > > > > sk_buff *skb, > > > > > > if (err) > > > > > > goto err_linearize; > > > > > > > > > > > > +if (!list_skb) > > > > > > +goto err_linearize; > > > > > > + > > > > This would catch the case where the entire data frag_list is > > linearized, but not a pskb_may_pull that only pulls in part of the > > list. > > > > Even with BPF being privileged, the kernel should not crash if BPF > > pulls a FRAGLIST GSO skb. > > > > But the check needs to be refined a bit. For a UDP GSO packet, I > > think gso_size is still valid, so if the head_skb length does not > > match gso_size, it has been messed with and should be dropped. > > > > For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull > > may be entirely undetectable as long as frag_list != NULL? > > > > > > > > > > skb_shinfo(skb)->frag_list = NULL; > > > > > > > > > > In absense of plugging the issue in BPF, dropping here is the best > > > > > we can do indeed, I think. -- Maciej Żenczykowski, Kernel Networking Developer @ Google
> > > > Personally, I think bpf_skb_pull_data() should have automatically > > > > (ie. in kernel code) reduced how much it pulls so that it would pull > > > > headers only, > > > > > > That would be a helper that parses headers to discover header length. > > > > Does it actually need to? Presumably the bpf pull function could > > notice that it is > > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce the pull > > accordingly so that it doesn't pull anything from the non-linear > > fraglist portion??? > > > > I know only the generic overview of what udp gso is, not any details, so I am > > assuming here that there's some sort of guarantee to how these packets > > are structured... But I imagine there must be or we wouldn't be hitting these > > issues deeper in the stack? > > Perhaps for a packet of this type we're already guaranteed the headers > are in the linear portion, > and the pull should simply be ignored? > > > > > > Parsing is better left to the BPF program. I do prefer adding sanity checks to the BPF helpers, over having to add then in the net hot path only to protect against dangerous BPF programs. In this case, it would be detecting this GSO type and failing the operation if exceeding skb_headlen(). > > > > > > > and not packet content. > > > > (This is assuming the rest of the code isn't ready to deal with a longer pull, > > > > which I think is the case atm. Pulling too much, and then crashing or forcing > > > > the stack to drop packets because of them being malformed seems wrong...) > > > > > > > > In general it would be nice if there was a way to just say pull all headers... > > > > (or possibly all L2/L3/L4 headers) > > > > You in general need to pull stuff *before* you've even looked at the packet, > > > > so that you can look at the packet, > > > > so it's relatively hard/annoying to pull the correct length from bpf > > > > code itself. > > > > > > > > > > > BPF needs to modify a proper length to do pull data. However kernel > > > > > > > should also improve the flow to avoid crash from a bpf function > > > > > > call. > > > > > > > As there is no split flow and app may not decode the merged UDP > > > > > > packet, > > > > > > > we should drop the packet without fraglist in skb_segment_list > > > > > > here. > > > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > > > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > > > > --- > > > > > > > net/core/skbuff.c | 3 +++ > > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > > --- a/net/core/skbuff.c > > > > > > > +++ b/net/core/skbuff.c > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct > > > > > > sk_buff *skb, > > > > > > > if (err) > > > > > > > goto err_linearize; > > > > > > > > > > > > > > +if (!list_skb) > > > > > > > +goto err_linearize; > > > > > > > + > > > > > > This would catch the case where the entire data frag_list is > > > linearized, but not a pskb_may_pull that only pulls in part of the > > > list. > > > > > > Even with BPF being privileged, the kernel should not crash if BPF > > > pulls a FRAGLIST GSO skb. > > > > > > But the check needs to be refined a bit. For a UDP GSO packet, I > > > think gso_size is still valid, so if the head_skb length does not > > > match gso_size, it has been messed with and should be dropped. > > > > > > For a GSO_BY_FRAGS skb, there is no single gso_size, and this pull > > > may be entirely undetectable as long as frag_list != NULL? > > > > > > > > > > > > > skb_shinfo(skb)->frag_list = NULL; > > > > > > > > > > > > In absense of plugging the issue in BPF, dropping here is the best > > > > > > we can do indeed, I think. > > -- > Maciej Żenczykowski, Kernel Networking Developer @ Google
On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > > > > Personally, I think bpf_skb_pull_data() should have > automatically > > > > > (ie. in kernel code) reduced how much it pulls so that it > would pull > > > > > headers only, > > > > > > > > That would be a helper that parses headers to discover header > length. > > > > > > Does it actually need to? Presumably the bpf pull function could > > > notice that it is > > > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce > the pull > > > accordingly so that it doesn't pull anything from the non-linear > > > fraglist portion??? > > > > > > I know only the generic overview of what udp gso is, not any > details, so I am > > > assuming here that there's some sort of guarantee to how these > packets > > > are structured... But I imagine there must be or we wouldn't be > hitting these > > > issues deeper in the stack? > > > > Perhaps for a packet of this type we're already guaranteed the > headers > > are in the linear portion, > > and the pull should simply be ignored? > > > > > > > > > Parsing is better left to the BPF program. > > I do prefer adding sanity checks to the BPF helpers, over having to > add then in the net hot path only to protect against dangerous BPF > programs. > Is it OK to ignore or decrease pull length for udp gro fraglist packet? It could save the normal packet and sent to user correctly. In common/net/core/filter.c static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) { + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & + (SKB_GSO_UDP |SKB_GSO_UDP_L4)) { + return 0; + or if (write_len > skb_headlen(skb)) + write_len = skb_headlen(skb); + } return skb_ensure_writable(skb, write_len); } > In this case, it would be detecting this GSO type and failing the > operation if exceeding skb_headlen(). > > > > > > > > > and not packet content. > > > > > (This is assuming the rest of the code isn't ready to deal > with a longer pull, > > > > > which I think is the case atm. Pulling too much, and then > crashing or forcing > > > > > the stack to drop packets because of them being malformed > seems wrong...) > > > > > > > > > > In general it would be nice if there was a way to just say > pull all headers... > > > > > (or possibly all L2/L3/L4 headers) > > > > > You in general need to pull stuff *before* you've even looked > at the packet, > > > > > so that you can look at the packet, > > > > > so it's relatively hard/annoying to pull the correct length > from bpf > > > > > code itself. > > > > > > > > > > > > > BPF needs to modify a proper length to do pull data. > However kernel > > > > > > > > should also improve the flow to avoid crash from a bpf > function > > > > > > > call. > > > > > > > > As there is no split flow and app may not decode the > merged UDP > > > > > > > packet, > > > > > > > > we should drop the packet without fraglist in > skb_segment_list > > > > > > > here. > > > > > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist > chaining.") > > > > > > > > Signed-off-by: Shiming Cheng < > shiming.cheng@mediatek.com> > > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > > > > > --- > > > > > > > > net/core/skbuff.c | 3 +++ > > > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > > > --- a/net/core/skbuff.c > > > > > > > > +++ b/net/core/skbuff.c > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff > *skb_segment_list(struct > > > > > > > sk_buff *skb, > > > > > > > > if (err) > > > > > > > > goto err_linearize; > > > > > > > > > > > > > > > > +if (!list_skb) > > > > > > > > +goto err_linearize; > > > > > > > > + > > > > > > > > This would catch the case where the entire data frag_list is > > > > linearized, but not a pskb_may_pull that only pulls in part of > the > > > > list. > > > > > > > > Even with BPF being privileged, the kernel should not crash if > BPF > > > > pulls a FRAGLIST GSO skb. > > > > > > > > But the check needs to be refined a bit. For a UDP GSO packet, > I > > > > think gso_size is still valid, so if the head_skb length does > not > > > > match gso_size, it has been messed with and should be dropped. > > > > Is it OK as below? Is it OK to add log to record the error for easy checking issue. In net/core/skbuff.c skb_segment_list +unsigned int mss = skb_shinfo(head_skb)->gso_size; +bool err_len = false; +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) { + pr_err("skb is dropped due to messed data. gso size:%d, + hdrlen:%d", mss, skb_headlen(head_skb) + if (!list_skb) + goto err_linearize; + else + err_len = true; +} ... +if (err_len) { + goto err_linearize; +} skb_get(skb); ... > > > > For a GSO_BY_FRAGS skb, there is no single gso_size, and this > pull > > > > may be entirely undetectable as long as frag_list != NULL? > > > > > > > > In function skb_segment_list(), it just handle udp fraglist gro packet. nr_frags will be 0 here. It records a SKB_GSO_DODGY in gso_type when doing partially eaten for fraglist in __pskb_pull_tail and in skb_segment() it will check and disable NETIF_F_SG. skb_segment could segment data as gso_size even if it is pulled into hearder skb. I am not sure if it can decode when frag_list is NULL or partially eaten as no BPF pulls illegal length for tcp packet. Our platfrom doesn't meet issues in skb_segment for tcp packet till now. > > > > > > > > skb_shinfo(skb)->frag_list = NULL; > > > > > > > > > > > > > > In absense of plugging the issue in BPF, dropping here is > the best > > > > > > > we can do indeed, I think. > > > > -- > > Maciej Żenczykowski, Kernel Networking Developer @ Google > >
Lena Wang (王娜) wrote: > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > > > > > Personally, I think bpf_skb_pull_data() should have > > automatically > > > > > > (ie. in kernel code) reduced how much it pulls so that it > > would pull > > > > > > headers only, > > > > > > > > > > That would be a helper that parses headers to discover header > > length. > > > > > > > > Does it actually need to? Presumably the bpf pull function could > > > > notice that it is > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) and reduce > > the pull > > > > accordingly so that it doesn't pull anything from the non-linear > > > > fraglist portion??? > > > > > > > > I know only the generic overview of what udp gso is, not any > > details, so I am > > > > assuming here that there's some sort of guarantee to how these > > packets > > > > are structured... But I imagine there must be or we wouldn't be > > hitting these > > > > issues deeper in the stack? > > > > > > Perhaps for a packet of this type we're already guaranteed the > > headers > > > are in the linear portion, > > > and the pull should simply be ignored? > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > I do prefer adding sanity checks to the BPF helpers, over having to > > add then in the net hot path only to protect against dangerous BPF > > programs. > > > Is it OK to ignore or decrease pull length for udp gro fraglist packet? > It could save the normal packet and sent to user correctly. > > In common/net/core/filter.c > static inline int __bpf_try_make_writable(struct sk_buff *skb, > unsigned int write_len) > { > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > + (SKB_GSO_UDP |SKB_GSO_UDP_L4)) { The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > + return 0; Failing for any pull is a bit excessive. And would kill a sane workaround of pulling only as many bytes as needed. > + or if (write_len > skb_headlen(skb)) > + write_len = skb_headlen(skb); Truncating requests would be a surprising change of behavior for this function. Failing for a pull > skb_headlen is arguably reasonable, as the alternative is that we let it go through but have to drop the now malformed packets on segmentation. > + } > return skb_ensure_writable(skb, write_len); > } > > > > In this case, it would be detecting this GSO type and failing the > > operation if exceeding skb_headlen(). > > > > > > > > > > > and not packet content. > > > > > > (This is assuming the rest of the code isn't ready to deal > > with a longer pull, > > > > > > which I think is the case atm. Pulling too much, and then > > crashing or forcing > > > > > > the stack to drop packets because of them being malformed > > seems wrong...) > > > > > > > > > > > > In general it would be nice if there was a way to just say > > pull all headers... > > > > > > (or possibly all L2/L3/L4 headers) > > > > > > You in general need to pull stuff *before* you've even looked > > at the packet, > > > > > > so that you can look at the packet, > > > > > > so it's relatively hard/annoying to pull the correct length > > from bpf > > > > > > code itself. > > > > > > > > > > > > > > > BPF needs to modify a proper length to do pull data. > > However kernel > > > > > > > > > should also improve the flow to avoid crash from a bpf > > function > > > > > > > > call. > > > > > > > > > As there is no split flow and app may not decode the > > merged UDP > > > > > > > > packet, > > > > > > > > > we should drop the packet without fraglist in > > skb_segment_list > > > > > > > > here. > > > > > > > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist > > chaining.") > > > > > > > > > Signed-off-by: Shiming Cheng < > > shiming.cheng@mediatek.com> > > > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > > > > > > --- > > > > > > > > > net/core/skbuff.c | 3 +++ > > > > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > > > > --- a/net/core/skbuff.c > > > > > > > > > +++ b/net/core/skbuff.c > > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff > > *skb_segment_list(struct > > > > > > > > sk_buff *skb, > > > > > > > > > if (err) > > > > > > > > > goto err_linearize; > > > > > > > > > > > > > > > > > > +if (!list_skb) > > > > > > > > > +goto err_linearize; > > > > > > > > > + > > > > > > > > > > This would catch the case where the entire data frag_list is > > > > > linearized, but not a pskb_may_pull that only pulls in part of > > the > > > > > list. > > > > > > > > > > Even with BPF being privileged, the kernel should not crash if > > BPF > > > > > pulls a FRAGLIST GSO skb. > > > > > > > > > > But the check needs to be refined a bit. For a UDP GSO packet, > > I > > > > > think gso_size is still valid, so if the head_skb length does > > not > > > > > match gso_size, it has been messed with and should be dropped. > > > > > > Is it OK as below? Is it OK to add log to record the error for easy > checking issue. > > In net/core/skbuff.c skb_segment_list > +unsigned int mss = skb_shinfo(head_skb)->gso_size; > +bool err_len = false; > > +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) { > + pr_err("skb is dropped due to messed data. gso size:%d, > + hdrlen:%d", mss, skb_headlen(head_skb) Such logs should always be rate limited. But no need to log cases where we well understood how we get there. I would stick with one approach: either in the BPF func or in segmentation, not both. And then I find BPF preferable, as explained before. > + if (!list_skb) > + goto err_linearize; > + else > + err_len = true; > +} > > ... > +if (err_len) { > + goto err_linearize; > +} > > skb_get(skb); > ...
On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Lena Wang (王娜) wrote: > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > > > External email : Please do not click links or open attachments > until > > > you have verified the sender or the content. > > > > > > > Personally, I think bpf_skb_pull_data() should have > > > automatically > > > > > > > (ie. in kernel code) reduced how much it pulls so that it > > > would pull > > > > > > > headers only, > > > > > > > > > > > > That would be a helper that parses headers to discover > header > > > length. > > > > > > > > > > Does it actually need to? Presumably the bpf pull function > could > > > > > notice that it is > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) and > reduce > > > the pull > > > > > accordingly so that it doesn't pull anything from the non- > linear > > > > > fraglist portion??? > > > > > > > > > > I know only the generic overview of what udp gso is, not any > > > details, so I am > > > > > assuming here that there's some sort of guarantee to how > these > > > packets > > > > > are structured... But I imagine there must be or we wouldn't > be > > > hitting these > > > > > issues deeper in the stack? > > > > > > > > Perhaps for a packet of this type we're already guaranteed the > > > headers > > > > are in the linear portion, > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > I do prefer adding sanity checks to the BPF helpers, over having > to > > > add then in the net hot path only to protect against dangerous > BPF > > > programs. > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist > packet? > > It could save the normal packet and sent to user correctly. > > > > In common/net/core/filter.c > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > unsigned int write_len) > > { > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST| SKB_GSO_UDP_L4". Here checking these two flags is to limit the packet as "UDP + need GSO + fraglist". We could remove SKB_GSO_UDP_L4 check for more packet that may addrive skb_segment_list. > > +return 0; > > Failing for any pull is a bit excessive. And would kill a sane > workaround of pulling only as many bytes as needed. > > > + or if (write_len > skb_headlen(skb)) > > +write_len = skb_headlen(skb); > > Truncating requests would be a surprising change of behavior > for this function. > > Failing for a pull > skb_headlen is arguably reasonable, as > the alternative is that we let it go through but have to drop > the now malformed packets on segmentation. > > Is it OK as below? In common/net/core/filter.c static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) { + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { + return 0; + } return skb_ensure_writable(skb, write_len); } > > +} > > return skb_ensure_writable(skb, write_len); > > } > > > > > > > In this case, it would be detecting this GSO type and failing the > > > operation if exceeding skb_headlen(). > > > > > > > > > > > > > and not packet content. > > > > > > > (This is assuming the rest of the code isn't ready to > deal > > > with a longer pull, > > > > > > > which I think is the case atm. Pulling too much, and > then > > > crashing or forcing > > > > > > > the stack to drop packets because of them being malformed > > > seems wrong...) > > > > > > > > > > > > > > In general it would be nice if there was a way to just > say > > > pull all headers... > > > > > > > (or possibly all L2/L3/L4 headers) > > > > > > > You in general need to pull stuff *before* you've even > looked > > > at the packet, > > > > > > > so that you can look at the packet, > > > > > > > so it's relatively hard/annoying to pull the correct > length > > > from bpf > > > > > > > code itself. > > > > > > > > > > > > > > > > > BPF needs to modify a proper length to do pull > data. > > > However kernel > > > > > > > > > > should also improve the flow to avoid crash from a > bpf > > > function > > > > > > > > > call. > > > > > > > > > > As there is no split flow and app may not decode > the > > > merged UDP > > > > > > > > > packet, > > > > > > > > > > we should drop the packet without fraglist in > > > skb_segment_list > > > > > > > > > here. > > > > > > > > > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist > > > chaining.") > > > > > > > > > > Signed-off-by: Shiming Cheng < > > > shiming.cheng@mediatek.com> > > > > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > > > > > > > --- > > > > > > > > > > net/core/skbuff.c | 3 +++ > > > > > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > > > > > --- a/net/core/skbuff.c > > > > > > > > > > +++ b/net/core/skbuff.c > > > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff > > > *skb_segment_list(struct > > > > > > > > > sk_buff *skb, > > > > > > > > > > if (err) > > > > > > > > > > goto err_linearize; > > > > > > > > > > > > > > > > > > > > +if (!list_skb) > > > > > > > > > > +goto err_linearize; > > > > > > > > > > + > > > > > > > > > > > > This would catch the case where the entire data frag_list > is > > > > > > linearized, but not a pskb_may_pull that only pulls in part > of > > > the > > > > > > list. > > > > > > > > > > > > Even with BPF being privileged, the kernel should not crash > if > > > BPF > > > > > > pulls a FRAGLIST GSO skb. > > > > > > > > > > > > But the check needs to be refined a bit. For a UDP GSO > packet, > > > I > > > > > > think gso_size is still valid, so if the head_skb length > does > > > not > > > > > > match gso_size, it has been messed with and should be > dropped. > > > > > > > > Is it OK as below? Is it OK to add log to record the error for easy > > checking issue. > > > > In net/core/skbuff.c skb_segment_list > > +unsigned int mss = skb_shinfo(head_skb)->gso_size; > > +bool err_len = false; > > > > +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) { > > +pr_err("skb is dropped due to messed data. gso size:%d, > > +hdrlen:%d", mss, skb_headlen(head_skb) > > Such logs should always be rate limited. But no need to log cases > where we well understood how we get there. > > I would stick with one approach: either in the BPF func or in > segmentation, not both. And then I find BPF preferable, as explained > before. > OK, we try make a patch in BPF func. > > +if (!list_skb) > > +goto err_linearize; > > +else > > +err_len = true; > > +} > > > > ... > > +if (err_len) { > > +goto err_linearize; > > +} > > > > skb_get(skb); > > ...
On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote: > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > Lena Wang (王娜) wrote: > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > > > > > External email : Please do not click links or open attachments > > until > > > > you have verified the sender or the content. > > > > > > > > Personally, I think bpf_skb_pull_data() should have > > > > automatically > > > > > > > > (ie. in kernel code) reduced how much it pulls so that it > > > > would pull > > > > > > > > headers only, > > > > > > > > > > > > > > That would be a helper that parses headers to discover > > header > > > > length. > > > > > > > > > > > > Does it actually need to? Presumably the bpf pull function > > could > > > > > > notice that it is > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) and > > reduce > > > > the pull > > > > > > accordingly so that it doesn't pull anything from the non- > > linear > > > > > > fraglist portion??? > > > > > > > > > > > > I know only the generic overview of what udp gso is, not any > > > > details, so I am > > > > > > assuming here that there's some sort of guarantee to how > > these > > > > packets > > > > > > are structured... But I imagine there must be or we wouldn't > > be > > > > hitting these > > > > > > issues deeper in the stack? > > > > > > > > > > Perhaps for a packet of this type we're already guaranteed the > > > > headers > > > > > are in the linear portion, > > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > > > I do prefer adding sanity checks to the BPF helpers, over having > > to > > > > add then in the net hot path only to protect against dangerous > > BPF > > > > programs. > > > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist > > packet? > > > It could save the normal packet and sent to user correctly. > > > > > > In common/net/core/filter.c > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > unsigned int write_len) > > > { > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST| > SKB_GSO_UDP_L4". Here checking these two flags is to limit the packet > as "UDP + need GSO + fraglist". > > We could remove SKB_GSO_UDP_L4 check for more packet that may addrive > skb_segment_list. > > > > +return 0; > > > > Failing for any pull is a bit excessive. And would kill a sane > > workaround of pulling only as many bytes as needed. > > > > > + or if (write_len > skb_headlen(skb)) > > > +write_len = skb_headlen(skb); > > > > Truncating requests would be a surprising change of behavior > > for this function. > > > > Failing for a pull > skb_headlen is arguably reasonable, as > > the alternative is that we let it go through but have to drop > > the now malformed packets on segmentation. > > > > > Is it OK as below? > > In common/net/core/filter.c > static inline int __bpf_try_make_writable(struct sk_buff *skb, > unsigned int write_len) > { > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > + return 0; please limit write_len to skb_headlen() instead of just returning 0 > + } > return skb_ensure_writable(skb, write_len); > } > > > > +} > > > return skb_ensure_writable(skb, write_len); > > > } > > > > > > > > > > In this case, it would be detecting this GSO type and failing the > > > > operation if exceeding skb_headlen(). > > > > > > > > > > > > > > > and not packet content. > > > > > > > > (This is assuming the rest of the code isn't ready to > > deal > > > > with a longer pull, > > > > > > > > which I think is the case atm. Pulling too much, and > > then > > > > crashing or forcing > > > > > > > > the stack to drop packets because of them being malformed > > > > seems wrong...) > > > > > > > > > > > > > > > > In general it would be nice if there was a way to just > > say > > > > pull all headers... > > > > > > > > (or possibly all L2/L3/L4 headers) > > > > > > > > You in general need to pull stuff *before* you've even > > looked > > > > at the packet, > > > > > > > > so that you can look at the packet, > > > > > > > > so it's relatively hard/annoying to pull the correct > > length > > > > from bpf > > > > > > > > code itself. > > > > > > > > > > > > > > > > > > > BPF needs to modify a proper length to do pull > > data. > > > > However kernel > > > > > > > > > > > should also improve the flow to avoid crash from a > > bpf > > > > function > > > > > > > > > > call. > > > > > > > > > > > As there is no split flow and app may not decode > > the > > > > merged UDP > > > > > > > > > > packet, > > > > > > > > > > > we should drop the packet without fraglist in > > > > skb_segment_list > > > > > > > > > > here. > > > > > > > > > > > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist > > > > chaining.") > > > > > > > > > > > Signed-off-by: Shiming Cheng < > > > > shiming.cheng@mediatek.com> > > > > > > > > > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > > > > > > > > > --- > > > > > > > > > > > net/core/skbuff.c | 3 +++ > > > > > > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > > > > > > --- a/net/core/skbuff.c > > > > > > > > > > > +++ b/net/core/skbuff.c > > > > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff > > > > *skb_segment_list(struct > > > > > > > > > > sk_buff *skb, > > > > > > > > > > > if (err) > > > > > > > > > > > goto err_linearize; > > > > > > > > > > > > > > > > > > > > > > +if (!list_skb) > > > > > > > > > > > +goto err_linearize; > > > > > > > > > > > + > > > > > > > > > > > > > > This would catch the case where the entire data frag_list > > is > > > > > > > linearized, but not a pskb_may_pull that only pulls in part > > of > > > > the > > > > > > > list. > > > > > > > > > > > > > > Even with BPF being privileged, the kernel should not crash > > if > > > > BPF > > > > > > > pulls a FRAGLIST GSO skb. > > > > > > > > > > > > > > But the check needs to be refined a bit. For a UDP GSO > > packet, > > > > I > > > > > > > think gso_size is still valid, so if the head_skb length > > does > > > > not > > > > > > > match gso_size, it has been messed with and should be > > dropped. > > > > > > > > > > Is it OK as below? Is it OK to add log to record the error for easy > > > checking issue. > > > > > > In net/core/skbuff.c skb_segment_list > > > +unsigned int mss = skb_shinfo(head_skb)->gso_size; > > > +bool err_len = false; > > > > > > +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) { > > > +pr_err("skb is dropped due to messed data. gso size:%d, > > > +hdrlen:%d", mss, skb_headlen(head_skb) > > > > Such logs should always be rate limited. But no need to log cases > > where we well understood how we get there. > > > > I would stick with one approach: either in the BPF func or in > > segmentation, not both. And then I find BPF preferable, as explained > > before. > > > OK, we try make a patch in BPF func. > > > > +if (!list_skb) > > > +goto err_linearize; > > > +else > > > +err_len = true; > > > +} > > > > > > ... > > > +if (err_len) { > > > +goto err_linearize; > > > +} > > > > > > skb_get(skb); > > > ... -- Maciej Żenczykowski, Kernel Networking Developer @ Google
On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) < > Lena.Wang@mediatek.com> wrote: > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > > > > > External email : Please do not click links or open attachments > until > > > you have verified the sender or the content. > > > Lena Wang (王娜) wrote: > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > > > > > > > External email : Please do not click links or open > attachments > > > until > > > > > you have verified the sender or the content. > > > > > > > > > Personally, I think bpf_skb_pull_data() should have > > > > > automatically > > > > > > > > > (ie. in kernel code) reduced how much it pulls so > that it > > > > > would pull > > > > > > > > > headers only, > > > > > > > > > > > > > > > > That would be a helper that parses headers to discover > > > header > > > > > length. > > > > > > > > > > > > > > Does it actually need to? Presumably the bpf pull > function > > > could > > > > > > > notice that it is > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) > and > > > reduce > > > > > the pull > > > > > > > accordingly so that it doesn't pull anything from the > non- > > > linear > > > > > > > fraglist portion??? > > > > > > > > > > > > > > I know only the generic overview of what udp gso is, not > any > > > > > details, so I am > > > > > > > assuming here that there's some sort of guarantee to how > > > these > > > > > packets > > > > > > > are structured... But I imagine there must be or we > wouldn't > > > be > > > > > hitting these > > > > > > > issues deeper in the stack? > > > > > > > > > > > > Perhaps for a packet of this type we're already guaranteed > the > > > > > headers > > > > > > are in the linear portion, > > > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > > > > > I do prefer adding sanity checks to the BPF helpers, over > having > > > to > > > > > add then in the net hot path only to protect against > dangerous > > > BPF > > > > > programs. > > > > > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist > > > packet? > > > > It could save the normal packet and sent to user correctly. > > > > > > > > In common/net/core/filter.c > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > unsigned int write_len) > > > > { > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST| > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the > packet > > as "UDP + need GSO + fraglist". > > > > We could remove SKB_GSO_UDP_L4 check for more packet that may > addrive > > skb_segment_list. > > > > > > +return 0; > > > > > > Failing for any pull is a bit excessive. And would kill a sane > > > workaround of pulling only as many bytes as needed. > > > > > > > + or if (write_len > skb_headlen(skb)) > > > > +write_len = skb_headlen(skb); > > > > > > Truncating requests would be a surprising change of behavior > > > for this function. > > > > > > Failing for a pull > skb_headlen is arguably reasonable, as > > > the alternative is that we let it go through but have to drop > > > the now malformed packets on segmentation. > > > > > > > > Is it OK as below? > > > > In common/net/core/filter.c > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > unsigned int write_len) > > { > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > + SKB_GSO_FRAGLIST) && (write_len > > skb_headlen(skb))) { > > + return 0; > > please limit write_len to skb_headlen() instead of just returning 0 > Hi Maze & Willem, Maze's advice is: In common/net/core/filter.c static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) { + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { + write_len = skb_headlen(skb); + } return skb_ensure_writable(skb, write_len); } Willem's advice is to "Failing for a pull > skb_headlen is arguably reasonable...". It prefers to return 0 : + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { + return 0; + } It seems a bit conflict. However I am not sure if my understanding is right and hope to get your further guide. Thanks Lena > > + } > > return skb_ensure_writable(skb, write_len); > > } > > > > > > +} > > > > return skb_ensure_writable(skb, write_len); > > > > } > > > > > > > > > > > > > In this case, it would be detecting this GSO type and failing > the > > > > > operation if exceeding skb_headlen(). > > > > > > > > > > > > > > > > > and not packet content. > > > > > > > > > (This is assuming the rest of the code isn't ready to > > > deal > > > > > with a longer pull, > > > > > > > > > which I think is the case atm. Pulling too much, and > > > then > > > > > crashing or forcing > > > > > > > > > the stack to drop packets because of them being > malformed > > > > > seems wrong...) > > > > > > > > > > > > > > > > > > In general it would be nice if there was a way to > just > > > say > > > > > pull all headers... > > > > > > > > > (or possibly all L2/L3/L4 headers) > > > > > > > > > You in general need to pull stuff *before* you've > even > > > looked > > > > > at the packet, > > > > > > > > > so that you can look at the packet, > > > > > > > > > so it's relatively hard/annoying to pull the correct > > > length > > > > > from bpf > > > > > > > > > code itself. > > > > > > > > > > > > > > > > > > > > > BPF needs to modify a proper length to do pull > > > data. > > > > > However kernel > > > > > > > > > > > > should also improve the flow to avoid crash > from a > > > bpf > > > > > function > > > > > > > > > > > call. > > > > > > > > > > > > As there is no split flow and app may not > decode > > > the > > > > > merged UDP > > > > > > > > > > > packet, > > > > > > > > > > > > we should drop the packet without fraglist in > > > > > skb_segment_list > > > > > > > > > > > here. > > > > > > > > > > > > > > > > > > > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO > fraglist > > > > > chaining.") > > > > > > > > > > > > Signed-off-by: Shiming Cheng < > > > > > shiming.cheng@mediatek.com> > > > > > > > > > > > > Signed-off-by: Lena Wang < > lena.wang@mediatek.com> > > > > > > > > > > > > --- > > > > > > > > > > > > net/core/skbuff.c | 3 +++ > > > > > > > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > > > > > > > > > > > > > diff --git a/net/core/skbuff.c > b/net/core/skbuff.c > > > > > > > > > > > > index b99127712e67..f68f2679b086 100644 > > > > > > > > > > > > --- a/net/core/skbuff.c > > > > > > > > > > > > +++ b/net/core/skbuff.c > > > > > > > > > > > > @@ -4504,6 +4504,9 @@ struct sk_buff > > > > > *skb_segment_list(struct > > > > > > > > > > > sk_buff *skb, > > > > > > > > > > > > if (err) > > > > > > > > > > > > goto err_linearize; > > > > > > > > > > > > > > > > > > > > > > > > +if (!list_skb) > > > > > > > > > > > > +goto err_linearize; > > > > > > > > > > > > + > > > > > > > > > > > > > > > > This would catch the case where the entire data > frag_list > > > is > > > > > > > > linearized, but not a pskb_may_pull that only pulls in > part > > > of > > > > > the > > > > > > > > list. > > > > > > > > > > > > > > > > Even with BPF being privileged, the kernel should not > crash > > > if > > > > > BPF > > > > > > > > pulls a FRAGLIST GSO skb. > > > > > > > > > > > > > > > > But the check needs to be refined a bit. For a UDP GSO > > > packet, > > > > > I > > > > > > > > think gso_size is still valid, so if the head_skb > length > > > does > > > > > not > > > > > > > > match gso_size, it has been messed with and should be > > > dropped. > > > > > > > > > > > > Is it OK as below? Is it OK to add log to record the error for > easy > > > > checking issue. > > > > > > > > In net/core/skbuff.c skb_segment_list > > > > +unsigned int mss = skb_shinfo(head_skb)->gso_size; > > > > +bool err_len = false; > > > > > > > > +if ( mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) { > > > > +pr_err("skb is dropped due to messed data. gso size:%d, > > > > +hdrlen:%d", mss, skb_headlen(head_skb) > > > > > > Such logs should always be rate limited. But no need to log cases > > > where we well understood how we get there. > > > > > > I would stick with one approach: either in the BPF func or in > > > segmentation, not both. And then I find BPF preferable, as > explained > > > before. > > > > > OK, we try make a patch in BPF func. > > > > > > +if (!list_skb) > > > > +goto err_linearize; > > > > +else > > > > +err_len = true; > > > > +} > > > > > > > > ... > > > > +if (err_len) { > > > > +goto err_linearize; > > > > +} > > > > > > > > skb_get(skb); > > > > ... > > -- > Maciej Żenczykowski, Kernel Networking Developer @ Google
Lena Wang (王娜) wrote: > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) < > > Lena.Wang@mediatek.com> wrote: > > > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > > > > > > > External email : Please do not click links or open attachments > > until > > > > you have verified the sender or the content. > > > > Lena Wang (王娜) wrote: > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > > > > > > > > > External email : Please do not click links or open > > attachments > > > > until > > > > > > you have verified the sender or the content. > > > > > > > > > > Personally, I think bpf_skb_pull_data() should have > > > > > > automatically > > > > > > > > > > (ie. in kernel code) reduced how much it pulls so > > that it > > > > > > would pull > > > > > > > > > > headers only, > > > > > > > > > > > > > > > > > > That would be a helper that parses headers to discover > > > > header > > > > > > length. > > > > > > > > > > > > > > > > Does it actually need to? Presumably the bpf pull > > function > > > > could > > > > > > > > notice that it is > > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) > > and > > > > reduce > > > > > > the pull > > > > > > > > accordingly so that it doesn't pull anything from the > > non- > > > > linear > > > > > > > > fraglist portion??? > > > > > > > > > > > > > > > > I know only the generic overview of what udp gso is, not > > any > > > > > > details, so I am > > > > > > > > assuming here that there's some sort of guarantee to how > > > > these > > > > > > packets > > > > > > > > are structured... But I imagine there must be or we > > wouldn't > > > > be > > > > > > hitting these > > > > > > > > issues deeper in the stack? > > > > > > > > > > > > > > Perhaps for a packet of this type we're already guaranteed > > the > > > > > > headers > > > > > > > are in the linear portion, > > > > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > > > > > > > I do prefer adding sanity checks to the BPF helpers, over > > having > > > > to > > > > > > add then in the net hot path only to protect against > > dangerous > > > > BPF > > > > > > programs. > > > > > > > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist > > > > packet? > > > > > It could save the normal packet and sent to user correctly. > > > > > > > > > > In common/net/core/filter.c > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > > unsigned int write_len) > > > > > { > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > > > > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > > > > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In > > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST| > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the > > packet > > > as "UDP + need GSO + fraglist". > > > > > > We could remove SKB_GSO_UDP_L4 check for more packet that may > > addrive > > > skb_segment_list. > > > > > > > > +return 0; > > > > > > > > Failing for any pull is a bit excessive. And would kill a sane > > > > workaround of pulling only as many bytes as needed. > > > > > > > > > + or if (write_len > skb_headlen(skb)) > > > > > +write_len = skb_headlen(skb); > > > > > > > > Truncating requests would be a surprising change of behavior > > > > for this function. > > > > > > > > Failing for a pull > skb_headlen is arguably reasonable, as > > > > the alternative is that we let it go through but have to drop > > > > the now malformed packets on segmentation. > > > > > > > > > > > Is it OK as below? > > > > > > In common/net/core/filter.c > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > unsigned int write_len) > > > { > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > + SKB_GSO_FRAGLIST) && (write_len > > > skb_headlen(skb))) { > > > + return 0; > > > > please limit write_len to skb_headlen() instead of just returning 0 > > > > Hi Maze & Willem, > Maze's advice is: > In common/net/core/filter.c > static inline int __bpf_try_make_writable(struct sk_buff *skb, > unsigned int write_len) > { > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > + write_len = skb_headlen(skb); > + } > return skb_ensure_writable(skb, write_len); > } > > Willem's advice is to "Failing for a pull > skb_headlen is arguably > reasonable...". It prefers to return 0 : > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > + return 0; > + } > > It seems a bit conflict. However I am not sure if my understanding is > right and hope to get your further guide. I did not mean to return 0. But to fail a request that would pull an unsafe amount. The caller must get a clear error signal. Back to the original report: the issue should already have been fixed by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list"). But that commit is in the kernel for which you report the error. Turns out that the crash is not in skb_segment_list, but later in __udpv4_gso_segment_list_csum. Which unconditionally dereferences udp_hdr(seg). The above fix also mentions skb pull as the culprit, but does not include a BPF program. If this can be reached in other ways, then we do need a stronger test in skb_segment_list, as you propose. I don't want to narrowly check whether udp_hdr is safe. Essentially, an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one byte would get pulled.
On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Lena Wang (王娜) wrote: > > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote: > > > > > > External email : Please do not click links or open attachments until > > > you have verified the sender or the content. > > > On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) < > > > Lena.Wang@mediatek.com> wrote: > > > > > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > > > > > > > > > External email : Please do not click links or open attachments > > > until > > > > > you have verified the sender or the content. > > > > > Lena Wang (王娜) wrote: > > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > > > > > > > > > > > External email : Please do not click links or open > > > attachments > > > > > until > > > > > > > you have verified the sender or the content. > > > > > > > > > > > Personally, I think bpf_skb_pull_data() should have > > > > > > > automatically > > > > > > > > > > > (ie. in kernel code) reduced how much it pulls so > > > that it > > > > > > > would pull > > > > > > > > > > > headers only, > > > > > > > > > > > > > > > > > > > > That would be a helper that parses headers to discover > > > > > header > > > > > > > length. > > > > > > > > > > > > > > > > > > Does it actually need to? Presumably the bpf pull > > > function > > > > > could > > > > > > > > > notice that it is > > > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) > > > and > > > > > reduce > > > > > > > the pull > > > > > > > > > accordingly so that it doesn't pull anything from the > > > non- > > > > > linear > > > > > > > > > fraglist portion??? > > > > > > > > > > > > > > > > > > I know only the generic overview of what udp gso is, not > > > any > > > > > > > details, so I am > > > > > > > > > assuming here that there's some sort of guarantee to how > > > > > these > > > > > > > packets > > > > > > > > > are structured... But I imagine there must be or we > > > wouldn't > > > > > be > > > > > > > hitting these > > > > > > > > > issues deeper in the stack? > > > > > > > > > > > > > > > > Perhaps for a packet of this type we're already guaranteed > > > the > > > > > > > headers > > > > > > > > are in the linear portion, > > > > > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > > > > > > > > > I do prefer adding sanity checks to the BPF helpers, over > > > having > > > > > to > > > > > > > add then in the net hot path only to protect against > > > dangerous > > > > > BPF > > > > > > > programs. > > > > > > > > > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist > > > > > packet? > > > > > > It could save the normal packet and sent to user correctly. > > > > > > > > > > > > In common/net/core/filter.c > > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > > > unsigned int write_len) > > > > > > { > > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > > > > > > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > > > > > > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In > > > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST| > > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the > > > packet > > > > as "UDP + need GSO + fraglist". > > > > > > > > We could remove SKB_GSO_UDP_L4 check for more packet that may > > > addrive > > > > skb_segment_list. > > > > > > > > > > +return 0; > > > > > > > > > > Failing for any pull is a bit excessive. And would kill a sane > > > > > workaround of pulling only as many bytes as needed. > > > > > > > > > > > + or if (write_len > skb_headlen(skb)) > > > > > > +write_len = skb_headlen(skb); > > > > > > > > > > Truncating requests would be a surprising change of behavior > > > > > for this function. > > > > > > > > > > Failing for a pull > skb_headlen is arguably reasonable, as > > > > > the alternative is that we let it go through but have to drop > > > > > the now malformed packets on segmentation. > > > > > > > > > > > > > > Is it OK as below? > > > > > > > > In common/net/core/filter.c > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > unsigned int write_len) > > > > { > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > + SKB_GSO_FRAGLIST) && (write_len > > > > skb_headlen(skb))) { > > > > + return 0; > > > > > > please limit write_len to skb_headlen() instead of just returning 0 > > > > > > > Hi Maze & Willem, > > Maze's advice is: > > In common/net/core/filter.c > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > unsigned int write_len) > > { > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > > + write_len = skb_headlen(skb); > > + } > > return skb_ensure_writable(skb, write_len); > > } > > > > Willem's advice is to "Failing for a pull > skb_headlen is arguably > > reasonable...". It prefers to return 0 : > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > > + return 0; > > + } > > > > It seems a bit conflict. However I am not sure if my understanding is > > right and hope to get your further guide. > > I did not mean to return 0. But to fail a request that would pull an > unsafe amount. The caller must get a clear error signal. That's hostile on userspace. Currently the caller doesn't even check the error return... Why would we? We already have to reload all pointers, and have to do and will thus redo checking on those. What do you expect the caller to do? Subtract -1 and try again? That's hard to do from BPF as it involves looping... and is slow. We already try to not pull too much: void try_make_writable(struct __sk_buff* skb, int len) { if (len > skb->len) len = skb->len; if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len); } Is there at least something like skb->len that has the actually pullable length in it? Or are these skb's structured in such a way that there is never a need to pull anything, because the headers are already always in the linear portion? > Back to the original report: the issue should already have been fixed > by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list"). > But that commit is in the kernel for which you report the error. > > Turns out that the crash is not in skb_segment_list, but later in > __udpv4_gso_segment_list_csum. Which unconditionally dereferences > udp_hdr(seg). > > The above fix also mentions skb pull as the culprit, but does not > include a BPF program. If this can be reached in other ways, then we > do need a stronger test in skb_segment_list, as you propose. > > I don't want to narrowly check whether udp_hdr is safe. Essentially, > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one > byte would get pulled. -- Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski wrote: > On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Lena Wang (王娜) wrote: > > > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote: > > > > > > > > External email : Please do not click links or open attachments until > > > > you have verified the sender or the content. > > > > On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) < > > > > Lena.Wang@mediatek.com> wrote: > > > > > > > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > > > > > > > > > > > External email : Please do not click links or open attachments > > > > until > > > > > > you have verified the sender or the content. > > > > > > Lena Wang (王娜) wrote: > > > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn wrote: > > > > > > > > > > > > > > > > External email : Please do not click links or open > > > > attachments > > > > > > until > > > > > > > > you have verified the sender or the content. > > > > > > > > > > > > Personally, I think bpf_skb_pull_data() should have > > > > > > > > automatically > > > > > > > > > > > > (ie. in kernel code) reduced how much it pulls so > > > > that it > > > > > > > > would pull > > > > > > > > > > > > headers only, > > > > > > > > > > > > > > > > > > > > > > That would be a helper that parses headers to discover > > > > > > header > > > > > > > > length. > > > > > > > > > > > > > > > > > > > > Does it actually need to? Presumably the bpf pull > > > > function > > > > > > could > > > > > > > > > > notice that it is > > > > > > > > > > a packet flagged as being of type X (UDP GSO FRAGLIST) > > > > and > > > > > > reduce > > > > > > > > the pull > > > > > > > > > > accordingly so that it doesn't pull anything from the > > > > non- > > > > > > linear > > > > > > > > > > fraglist portion??? > > > > > > > > > > > > > > > > > > > > I know only the generic overview of what udp gso is, not > > > > any > > > > > > > > details, so I am > > > > > > > > > > assuming here that there's some sort of guarantee to how > > > > > > these > > > > > > > > packets > > > > > > > > > > are structured... But I imagine there must be or we > > > > wouldn't > > > > > > be > > > > > > > > hitting these > > > > > > > > > > issues deeper in the stack? > > > > > > > > > > > > > > > > > > Perhaps for a packet of this type we're already guaranteed > > > > the > > > > > > > > headers > > > > > > > > > are in the linear portion, > > > > > > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > > > > > > > > > > > I do prefer adding sanity checks to the BPF helpers, over > > > > having > > > > > > to > > > > > > > > add then in the net hot path only to protect against > > > > dangerous > > > > > > BPF > > > > > > > > programs. > > > > > > > > > > > > > > > Is it OK to ignore or decrease pull length for udp gro fraglist > > > > > > packet? > > > > > > > It could save the normal packet and sent to user correctly. > > > > > > > > > > > > > > In common/net/core/filter.c > > > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > > > > unsigned int write_len) > > > > > > > { > > > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > > > > > > > > > > > The issue is not with SKB_GSO_UDP_L4, but with SKB_GSO_FRAGLIST. > > > > > > > > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. In > > > > > udp_offload.c udp4_gro_complete gso_type adds "SKB_GSO_FRAGLIST| > > > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit the > > > > packet > > > > > as "UDP + need GSO + fraglist". > > > > > > > > > > We could remove SKB_GSO_UDP_L4 check for more packet that may > > > > addrive > > > > > skb_segment_list. > > > > > > > > > > > > +return 0; > > > > > > > > > > > > Failing for any pull is a bit excessive. And would kill a sane > > > > > > workaround of pulling only as many bytes as needed. > > > > > > > > > > > > > + or if (write_len > skb_headlen(skb)) > > > > > > > +write_len = skb_headlen(skb); > > > > > > > > > > > > Truncating requests would be a surprising change of behavior > > > > > > for this function. > > > > > > > > > > > > Failing for a pull > skb_headlen is arguably reasonable, as > > > > > > the alternative is that we let it go through but have to drop > > > > > > the now malformed packets on segmentation. > > > > > > > > > > > > > > > > > Is it OK as below? > > > > > > > > > > In common/net/core/filter.c > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > > unsigned int write_len) > > > > > { > > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > + SKB_GSO_FRAGLIST) && (write_len > > > > > skb_headlen(skb))) { > > > > > + return 0; > > > > > > > > please limit write_len to skb_headlen() instead of just returning 0 > > > > > > > > > > Hi Maze & Willem, > > > Maze's advice is: > > > In common/net/core/filter.c > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > unsigned int write_len) > > > { > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > > > + write_len = skb_headlen(skb); > > > + } > > > return skb_ensure_writable(skb, write_len); > > > } > > > > > > Willem's advice is to "Failing for a pull > skb_headlen is arguably > > > reasonable...". It prefers to return 0 : > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { > > > + return 0; > > > + } > > > > > > It seems a bit conflict. However I am not sure if my understanding is > > > right and hope to get your further guide. > > > > I did not mean to return 0. But to fail a request that would pull an > > unsafe amount. The caller must get a clear error signal. > > That's hostile on userspace. > Currently the caller doesn't even check the error return... It can, and probably should. bpf_skb_pull data returns the error code from bpf_try_make_writable: return bpf_try_make_writable(skb, len ? : skb_headlen(skb)); > Why would we? We already have to reload all pointers, and have to do > and will thus redo checking on those. > > What do you expect the caller to do? Subtract -1 and try again? > That's hard to do from BPF as it involves looping... and is slow. > > We already try to not pull too much: > > void try_make_writable(struct __sk_buff* skb, int len) { > if (len > skb->len) len = skb->len; > if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len); > } > > Is there at least something like skb->len that has the actually > pullable length in it? The above snippet shows that it passes skb_headlen if the caller passes 0. But your BPF program does not even need the data writable, so then it is of little help of course. > Or are these skb's structured in such a way that there is never a need > to pull anything, > because the headers are already always in the linear portion? That is indeed the case. So as far as I can see: A BPF program that just wants to pull the network and transport headers can diligently pull exactly what is needed. And will not even observe any data pulled into linear in practice. This is still advisable rather than trusting that the headers are linear. It may also be required by the validator? Don't know. But check the return value. > > > Back to the original report: the issue should already have been fixed > > by commit 876e8ca83667 ("net: fix NULL pointer in skb_segment_list"). > > But that commit is in the kernel for which you report the error. > > > > Turns out that the crash is not in skb_segment_list, but later in > > __udpv4_gso_segment_list_csum. Which unconditionally dereferences > > udp_hdr(seg). > > > > The above fix also mentions skb pull as the culprit, but does not > > include a BPF program. If this can be reached in other ways, then we > > do need a stronger test in skb_segment_list, as you propose. > > > > I don't want to narrowly check whether udp_hdr is safe. Essentially, > > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even one > > byte would get pulled. > > -- > Maciej Żenczykowski, Kernel Networking Developer @ Google
On Fri, 2024-04-19 at 13:41 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > Maciej Żenczykowski wrote: > > On Fri, Apr 19, 2024 at 7:17 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Lena Wang (王娜) wrote: > > > > On Wed, 2024-04-17 at 21:15 -0700, Maciej Żenczykowski wrote: > > > > > > > > > > External email : Please do not click links or open > attachments until > > > > > you have verified the sender or the content. > > > > > On Wed, Apr 17, 2024 at 7:53 PM Lena Wang (王娜) < > > > > > Lena.Wang@mediatek.com> wrote: > > > > > > > > > > > > On Wed, 2024-04-17 at 15:48 -0400, Willem de Bruijn wrote: > > > > > > > > > > > > > > External email : Please do not click links or open > attachments > > > > > until > > > > > > > you have verified the sender or the content. > > > > > > > Lena Wang (王娜) wrote: > > > > > > > > On Tue, 2024-04-16 at 19:14 -0400, Willem de Bruijn > wrote: > > > > > > > > > > > > > > > > > > External email : Please do not click links or open > > > > > attachments > > > > > > > until > > > > > > > > > you have verified the sender or the content. > > > > > > > > > > > > > Personally, I think bpf_skb_pull_data() > should have > > > > > > > > > automatically > > > > > > > > > > > > > (ie. in kernel code) reduced how much it > pulls so > > > > > that it > > > > > > > > > would pull > > > > > > > > > > > > > headers only, > > > > > > > > > > > > > > > > > > > > > > > > That would be a helper that parses headers to > discover > > > > > > > header > > > > > > > > > length. > > > > > > > > > > > > > > > > > > > > > > Does it actually need to? Presumably the bpf > pull > > > > > function > > > > > > > could > > > > > > > > > > > notice that it is > > > > > > > > > > > a packet flagged as being of type X (UDP GSO > FRAGLIST) > > > > > and > > > > > > > reduce > > > > > > > > > the pull > > > > > > > > > > > accordingly so that it doesn't pull anything from > the > > > > > non- > > > > > > > linear > > > > > > > > > > > fraglist portion??? > > > > > > > > > > > > > > > > > > > > > > I know only the generic overview of what udp gso > is, not > > > > > any > > > > > > > > > details, so I am > > > > > > > > > > > assuming here that there's some sort of guarantee > to how > > > > > > > these > > > > > > > > > packets > > > > > > > > > > > are structured... But I imagine there must be or > we > > > > > wouldn't > > > > > > > be > > > > > > > > > hitting these > > > > > > > > > > > issues deeper in the stack? > > > > > > > > > > > > > > > > > > > > Perhaps for a packet of this type we're already > guaranteed > > > > > the > > > > > > > > > headers > > > > > > > > > > are in the linear portion, > > > > > > > > > > and the pull should simply be ignored? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Parsing is better left to the BPF program. > > > > > > > > > > > > > > > > > > I do prefer adding sanity checks to the BPF helpers, > over > > > > > having > > > > > > > to > > > > > > > > > add then in the net hot path only to protect against > > > > > dangerous > > > > > > > BPF > > > > > > > > > programs. > > > > > > > > > > > > > > > > > Is it OK to ignore or decrease pull length for udp gro > fraglist > > > > > > > packet? > > > > > > > > It could save the normal packet and sent to user > correctly. > > > > > > > > > > > > > > > > In common/net/core/filter.c > > > > > > > > static inline int __bpf_try_make_writable(struct > sk_buff *skb, > > > > > > > > unsigned int write_len) > > > > > > > > { > > > > > > > > +if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > > > > +(SKB_GSO_UDP |SKB_GSO_UDP_L4)) { > > > > > > > > > > > > > > The issue is not with SKB_GSO_UDP_L4, but with > SKB_GSO_FRAGLIST. > > > > > > > > > > > > > Current in kernel just UDP uses SKB_GSO_FRAGLIST to do GRO. > In > > > > > > udp_offload.c udp4_gro_complete gso_type adds > "SKB_GSO_FRAGLIST| > > > > > > SKB_GSO_UDP_L4". Here checking these two flags is to limit > the > > > > > packet > > > > > > as "UDP + need GSO + fraglist". > > > > > > > > > > > > We could remove SKB_GSO_UDP_L4 check for more packet that > may > > > > > addrive > > > > > > skb_segment_list. > > > > > > > > > > > > > > +return 0; > > > > > > > > > > > > > > Failing for any pull is a bit excessive. And would kill a > sane > > > > > > > workaround of pulling only as many bytes as needed. > > > > > > > > > > > > > > > + or if (write_len > skb_headlen(skb)) > > > > > > > > +write_len = skb_headlen(skb); > > > > > > > > > > > > > > Truncating requests would be a surprising change of > behavior > > > > > > > for this function. > > > > > > > > > > > > > > Failing for a pull > skb_headlen is arguably reasonable, > as > > > > > > > the alternative is that we let it go through but have to > drop > > > > > > > the now malformed packets on segmentation. > > > > > > > > > > > > > > > > > > > > Is it OK as below? > > > > > > > > > > > > In common/net/core/filter.c > > > > > > static inline int __bpf_try_make_writable(struct sk_buff > *skb, > > > > > > unsigned int write_len) > > > > > > { > > > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > > + SKB_GSO_FRAGLIST) && (write_len > > > > > > skb_headlen(skb))) { > > > > > > + return 0; > > > > > > > > > > please limit write_len to skb_headlen() instead of just > returning 0 > > > > > > > > > > > > > Hi Maze & Willem, > > > > Maze's advice is: > > > > In common/net/core/filter.c > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > unsigned int write_len) > > > > { > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > + SKB_GSO_FRAGLIST) && (write_len > > skb_headlen(skb))) { > > > > + write_len = skb_headlen(skb); > > > > + } > > > > return skb_ensure_writable(skb, write_len); > > > > } > > > > > > > > Willem's advice is to "Failing for a pull > skb_headlen is > arguably > > > > reasonable...". It prefers to return 0 : > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > + SKB_GSO_FRAGLIST) && (write_len > > skb_headlen(skb))) { > > > > + return 0; > > > > + } > > > > > > > > It seems a bit conflict. However I am not sure if my > understanding is > > > > right and hope to get your further guide. > > > > > > I did not mean to return 0. But to fail a request that would pull > an > > > unsafe amount. The caller must get a clear error signal. > > > > That's hostile on userspace. > > Currently the caller doesn't even check the error return... > > It can, and probably should. > > bpf_skb_pull data returns the error code from bpf_try_make_writable: > > return bpf_try_make_writable(skb, len ? : skb_headlen(skb)); > > > Why would we? We already have to reload all pointers, and have to > do > > and will thus redo checking on those. > > > > What do you expect the caller to do? Subtract -1 and try again? > > That's hard to do from BPF as it involves looping... and is slow. > > > > We already try to not pull too much: > > > > void try_make_writable(struct __sk_buff* skb, int len) { > > if (len > skb->len) len = skb->len; > > if (skb->data_end - skb->data < len) bpf_skb_pull_data(skb, len); > > } > > > > Is there at least something like skb->len that has the actually > > pullable length in it? > > The above snippet shows that it passes skb_headlen if the caller > passes 0. > > But your BPF program does not even need the data writable, so then > it is of little help of course. > > > Or are these skb's structured in such a way that there is never a > need > > to pull anything, > > because the headers are already always in the linear portion? > > That is indeed the case. > > So as far as I can see: > > A BPF program that just wants to pull the network and transport > headers can diligently pull exactly what is needed. And will not > even observe any data pulled into linear in practice. This is still > advisable rather than trusting that the headers are linear. It may > also be required by the validator? Don't know. But check the return > value. > Hi Willem, As the discussion, is it OK for the patch below? diff --git a/net/core/filter.c b/net/core/filter.c index 3a6110ea4009..abc6029c8eef 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp); static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) { + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & + SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) { + return -ENOMEM; + } + return skb_ensure_writable(skb, write_len); } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 73b1e0e53534..2e90534c1a1e 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, unsigned int tnl_hlen = skb_tnl_header_len(skb); unsigned int delta_truesize = 0; unsigned int delta_len = 0; + unsigned int mss = skb_shinfo(skb)->gso_size; struct sk_buff *tail = NULL; struct sk_buff *nskb, *tmp; int len_diff, err; + bool err_len = false; skb_push(skb, -skb_network_offset(skb) + offset); @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, if (err) goto err_linearize; + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) { + if (!list_skb) { + goto err_linearize; + } else { + err_len = true; + } + } + skb_shinfo(skb)->frag_list = NULL; while (list_skb) { @@ -4109,6 +4119,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, __skb_linearize(skb)) goto err_linearize; + if (err_len) + goto err_linearize; + skb_get(skb); return skb; > > > > > Back to the original report: the issue should already have been > fixed > > > by commit 876e8ca83667 ("net: fix NULL pointer in > skb_segment_list"). > > > But that commit is in the kernel for which you report the error. > > > > > > Turns out that the crash is not in skb_segment_list, but later in > > > __udpv4_gso_segment_list_csum. Which unconditionally dereferences > > > udp_hdr(seg). > > > > > > The above fix also mentions skb pull as the culprit, but does not > > > include a BPF program. If this can be reached in other ways, then > we > > > do need a stronger test in skb_segment_list, as you propose. > > > > > > I don't want to narrowly check whether udp_hdr is safe. > Essentially, > > > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even > one > > > byte would get pulled. > > > > -- > > Maciej Żenczykowski, Kernel Networking Developer @ Google > >
> Hi Willem, > As the discussion, is it OK for the patch below? Thanks for iterating on this. I would like the opinion also of the fraglist and UDP GRO experts. Yes, I think both - protecting skb_segment_list against clearly illegal fraglist packets, and - blocking BPF from constructing such packets are worthwhile stable fixes. I believe they should be two separate patches. Both probably with the same Fixes tag: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining"). > diff --git a/net/core/filter.c b/net/core/filter.c > index 3a6110ea4009..abc6029c8eef 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > bpf_sp); > static inline int __bpf_try_make_writable(struct sk_buff *skb, > unsigned int write_len) > { > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > + SKB_GSO_FRAGLIST) && (write_len > > skb_headlen(skb))) { > + return -ENOMEM; > + } > + Indentation looks off, but I agree with the logic. if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && (write_len > skb_headlen(skb))) > return skb_ensure_writable(skb, write_len); > } > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 73b1e0e53534..2e90534c1a1e 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > unsigned int tnl_hlen = skb_tnl_header_len(skb); > unsigned int delta_truesize = 0; > unsigned int delta_len = 0; > + unsigned int mss = skb_shinfo(skb)->gso_size; > struct sk_buff *tail = NULL; > struct sk_buff *nskb, *tmp; > int len_diff, err; > + bool err_len = false; > > skb_push(skb, -skb_network_offset(skb) + offset); > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > if (err) > goto err_linearize; > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) { > + if (!list_skb) { > + goto err_linearize; The label no longer truly covers the meaning. But that is already true since the above (second) jump was added in commit c329b261afe7 ("net: prevent skb corruption on frag list segmentation"). Neither needs the kfree_skb_list, as skb->next is not assigned to until the loop. Can just return ERR_PTR(-EFAULT)? > + } else { > + err_len = true; > + } > + } > + Why the branch? Might as well always fail immediately? > skb_shinfo(skb)->frag_list = NULL; > > while (list_skb) { > @@ -4109,6 +4119,9 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > __skb_linearize(skb)) > goto err_linearize; > > + if (err_len) > + goto err_linearize; > + > skb_get(skb); > > return skb; > > > > > > > > Back to the original report: the issue should already have been > > fixed > > > > by commit 876e8ca83667 ("net: fix NULL pointer in > > skb_segment_list"). > > > > But that commit is in the kernel for which you report the error. > > > > > > > > Turns out that the crash is not in skb_segment_list, but later in > > > > __udpv4_gso_segment_list_csum. Which unconditionally dereferences > > > > udp_hdr(seg). > > > > > > > > The above fix also mentions skb pull as the culprit, but does not > > > > include a BPF program. If this can be reached in other ways, then > > we > > > > do need a stronger test in skb_segment_list, as you propose. > > > > > > > > I don't want to narrowly check whether udp_hdr is safe. > > Essentially, > > > > an SKB_GSO_FRAGLIST skb layout cannot be trusted at all if even > > one > > > > byte would get pulled.
On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > Hi Willem, > > As the discussion, is it OK for the patch below? > > Thanks for iterating on this. > > I would like the opinion also of the fraglist and UDP GRO experts. > > Yes, I think both > > - protecting skb_segment_list against clearly illegal fraglist > packets, and > - blocking BPF from constructing such packets > > are worthwhile stable fixes. I believe they should be two separate > patches. Both probably with the same Fixes tag: 3a1296a38d0c > ("net: Support GRO/GSO fraglist chaining"). > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 3a6110ea4009..abc6029c8eef 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > > bpf_sp); > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > unsigned int write_len) > > { > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > + SKB_GSO_FRAGLIST) && (write_len > > > skb_headlen(skb))) { > > + return -ENOMEM; > > + } > > + > > Indentation looks off, but I agree with the logic. > > if (skb_is_gso(skb) && > (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > (write_len > skb_headlen(skb))) > > > return skb_ensure_writable(skb, write_len); > > } > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index 73b1e0e53534..2e90534c1a1e 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct > sk_buff > > *skb, > > unsigned int tnl_hlen = skb_tnl_header_len(skb); > > unsigned int delta_truesize = 0; > > unsigned int delta_len = 0; > > + unsigned int mss = skb_shinfo(skb)->gso_size; > > struct sk_buff *tail = NULL; > > struct sk_buff *nskb, *tmp; > > int len_diff, err; > > + bool err_len = false; > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct > sk_buff > > *skb, > > if (err) > > goto err_linearize; > > > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) { > > + if (!list_skb) { > > + goto err_linearize; > > The label no longer truly covers the meaning. > > But that is already true since the above (second) jump was added in > commit c329b261afe7 ("net: prevent skb corruption on frag list > segmentation"). > > Neither needs the kfree_skb_list, as skb->next is not assigned to > until the loop. Can just return ERR_PTR(-EFAULT)? > > > + } else { > > + err_len = true; > > + } > > + } > > + > > Why the branch? Might as well always fail immediately? > Hi Willem, Thanks for your guidance. You are right. There is no need for another branch as fraglist could be freeed in kfree_skb. Could I git send mail wo patches as below? From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00 2001 From: Shiming Cheng <shiming.cheng@mediatek.com> Date: Wed, 24 Apr 2024 13:42:35 +0800 Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist A GROed skb with fraglist can't be pulled data from its fraglist as it may result a invalid segmentation or kernel exception. For such structured skb we limit the BPF pull data length smaller than skb_headlen() and return error if exceeding. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> Signed-off-by: Lena Wang <lena.wang@mediatek.com> --- net/core/filter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index 8adf95765cdd..8ed4d5d87167 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp); static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) { + if (skb_is_gso(skb) && + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && + write_len > skb_headlen(skb)) { + return -ENOMEM; + } return skb_ensure_writable(skb, write_len); }
Lena Wang (王娜) wrote: > On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > > Hi Willem, > > > As the discussion, is it OK for the patch below? > > > > Thanks for iterating on this. > > > > I would like the opinion also of the fraglist and UDP GRO experts. > > > > Yes, I think both > > > > - protecting skb_segment_list against clearly illegal fraglist > > packets, and > > - blocking BPF from constructing such packets > > > > are worthwhile stable fixes. I believe they should be two separate > > patches. Both probably with the same Fixes tag: 3a1296a38d0c > > ("net: Support GRO/GSO fraglist chaining"). > > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > index 3a6110ea4009..abc6029c8eef 100644 > > > --- a/net/core/filter.c > > > +++ b/net/core/filter.c > > > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > > > bpf_sp); > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > unsigned int write_len) > > > { > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > + SKB_GSO_FRAGLIST) && (write_len > > > > skb_headlen(skb))) { > > > + return -ENOMEM; > > > + } > > > + > > > > Indentation looks off, but I agree with the logic. > > > > if (skb_is_gso(skb) && > > (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > (write_len > skb_headlen(skb))) > > > > > return skb_ensure_writable(skb, write_len); > > > } > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index 73b1e0e53534..2e90534c1a1e 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct > > sk_buff > > > *skb, > > > unsigned int tnl_hlen = skb_tnl_header_len(skb); > > > unsigned int delta_truesize = 0; > > > unsigned int delta_len = 0; > > > + unsigned int mss = skb_shinfo(skb)->gso_size; > > > struct sk_buff *tail = NULL; > > > struct sk_buff *nskb, *tmp; > > > int len_diff, err; > > > + bool err_len = false; > > > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct > > sk_buff > > > *skb, > > > if (err) > > > goto err_linearize; > > > > > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) { > > > + if (!list_skb) { > > > + goto err_linearize; > > > > The label no longer truly covers the meaning. > > > > But that is already true since the above (second) jump was added in > > commit c329b261afe7 ("net: prevent skb corruption on frag list > > segmentation"). > > > > Neither needs the kfree_skb_list, as skb->next is not assigned to > > until the loop. Can just return ERR_PTR(-EFAULT)? > > > > > + } else { > > > + err_len = true; > > > + } > > > + } > > > + > > > > Why the branch? Might as well always fail immediately? > > > Hi Willem, > Thanks for your guidance. > You are right. There is no need for another branch as fraglist > could be freeed in kfree_skb. > Could I git send mail wo patches as below? > > From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Wed, 24 Apr 2024 13:42:35 +0800 > Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist > > A GROed skb with fraglist can't be pulled data Please use the specific label: SKB_GSO_FRAGLIST skb > from its fraglist as it may result a invalid > segmentation or kernel exception. > > For such structured skb we limit the BPF pull > data length smaller than skb_headlen() and return > error if exceeding. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > net/core/filter.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 8adf95765cdd..8ed4d5d87167 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > bpf_sp); > static inline int __bpf_try_make_writable(struct sk_buff *skb, > unsigned int write_len) > { > + if (skb_is_gso(skb) && > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > + write_len > skb_headlen(skb)) { > + return -ENOMEM; > + } > return skb_ensure_writable(skb, write_len); > } > > -- > 2.18.0 > > > From 2d0729b20cf810ba1b31e046952c1cd78f295ca3 Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Wed, 24 Apr 2024 14:43:45 +0800 > Subject: [PATCH net] net: drop GROed skb pulled from fraglist > > A GROed skb with fraglist maybe pulled by BPF > or other ways. It can't be trusted at all even > if one byte is pulled and should be dropped > on segmentation. This paraphrases my comment. It is better to spell it out: An SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is expected to have all segments except the last to be gso_size long. If this does not hold, the skb has been modified and the fraglist gso integrity is lost. Drop the packet, as it cannot be segmented correctly by skb_segment_list. The skb could be salvaged, though, right? By linearizing, dropping the SKB_GSO_FRAGLIST bit and entering the normal skb_segment path rather than the skb_segment_list path. That choice is currently made in the protocol caller, __udp_gso_segment. It's not trivial to add such a backup path here. So let's add this backstop against kernel crashes. > > If the gso_size does not match skb_headlen(), > it means to be pulled part of or the entire > fraglsit. It has been messed with and we return fraglist > error to free this skb. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > net/core/skbuff.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index b99127712e67..750fbb51b99f 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -4493,6 +4493,7 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > unsigned int tnl_hlen = skb_tnl_header_len(skb); > unsigned int delta_truesize = 0; > unsigned int delta_len = 0; > + unsigned int mss = skb_shinfo(skb)->gso_size; Reverse christmas tree > struct sk_buff *tail = NULL; > struct sk_buff *nskb, *tmp; > int len_diff, err; > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > if (err) > goto err_linearize; > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) > + return ERR_PTR(-EFAULT); > + Do this precondition integrity check before the skb_unclone path? > skb_shinfo(skb)->frag_list = NULL; > > while (list_skb) { > -- > 2.18.0
On Wed, 2024-04-24 at 10:28 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > Lena Wang (王娜) wrote: > > On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote: > > > > > > External email : Please do not click links or open attachments > until > > > you have verified the sender or the content. > > > > Hi Willem, > > > > As the discussion, is it OK for the patch below? > > > > > > Thanks for iterating on this. > > > > > > I would like the opinion also of the fraglist and UDP GRO > experts. > > > > > > Yes, I think both > > > > > > - protecting skb_segment_list against clearly illegal fraglist > > > packets, and > > > - blocking BPF from constructing such packets > > > > > > are worthwhile stable fixes. I believe they should be two > separate > > > patches. Both probably with the same Fixes tag: 3a1296a38d0c > > > ("net: Support GRO/GSO fraglist chaining"). > > > > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > > index 3a6110ea4009..abc6029c8eef 100644 > > > > --- a/net/core/filter.c > > > > +++ b/net/core/filter.c > > > > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct > bpf_scratchpad, > > > > bpf_sp); > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > unsigned int > write_len) > > > > { > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > + SKB_GSO_FRAGLIST) && (write_len > > > > > skb_headlen(skb))) { > > > > + return -ENOMEM; > > > > + } > > > > + > > > > > > Indentation looks off, but I agree with the logic. > > > > > > if (skb_is_gso(skb) && > > > (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > > (write_len > skb_headlen(skb))) > > > > > > > return skb_ensure_writable(skb, write_len); > > > > } > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > index 73b1e0e53534..2e90534c1a1e 100644 > > > > --- a/net/core/skbuff.c > > > > +++ b/net/core/skbuff.c > > > > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct > > > sk_buff > > > > *skb, > > > > unsigned int tnl_hlen = skb_tnl_header_len(skb); > > > > unsigned int delta_truesize = 0; > > > > unsigned int delta_len = 0; > > > > + unsigned int mss = skb_shinfo(skb)->gso_size; > > > > struct sk_buff *tail = NULL; > > > > struct sk_buff *nskb, *tmp; > > > > int len_diff, err; > > > > + bool err_len = false; > > > > > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > > > > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct > > > sk_buff > > > > *skb, > > > > if (err) > > > > goto err_linearize; > > > > > > > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) { > > > > + if (!list_skb) { > > > > + goto err_linearize; > > > > > > The label no longer truly covers the meaning. > > > > > > But that is already true since the above (second) jump was added > in > > > commit c329b261afe7 ("net: prevent skb corruption on frag list > > > segmentation"). > > > > > > Neither needs the kfree_skb_list, as skb->next is not assigned to > > > until the loop. Can just return ERR_PTR(-EFAULT)? > > > > > > > + } else { > > > > + err_len = true; > > > > + } > > > > + } > > > > + > > > > > > Why the branch? Might as well always fail immediately? > > > > > Hi Willem, > > Thanks for your guidance. > > You are right. There is no need for another branch as fraglist > > could be freeed in kfree_skb. > > Could I git send mail wo patches as below? > > > > From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00 > 2001 > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > Date: Wed, 24 Apr 2024 13:42:35 +0800 > > Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist > > > > A GROed skb with fraglist can't be pulled data > > Please use the specific label: SKB_GSO_FRAGLIST skb > > > from its fraglist as it may result a invalid > > segmentation or kernel exception. > > > > For such structured skb we limit the BPF pull > > data length smaller than skb_headlen() and return > > error if exceeding. > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > --- > > net/core/filter.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 8adf95765cdd..8ed4d5d87167 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > > bpf_sp); > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > unsigned int write_len) > > { > > +if (skb_is_gso(skb) && > > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > + write_len > skb_headlen(skb)) { > > +return -ENOMEM; > > +} > > return skb_ensure_writable(skb, write_len); > > } > > > > -- > > 2.18.0 > > > > > > From 2d0729b20cf810ba1b31e046952c1cd78f295ca3 Mon Sep 17 00:00:00 > 2001 > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > Date: Wed, 24 Apr 2024 14:43:45 +0800 > > Subject: [PATCH net] net: drop GROed skb pulled from fraglist > > > > A GROed skb with fraglist maybe pulled by BPF > > or other ways. It can't be trusted at all even > > if one byte is pulled and should be dropped > > on segmentation. > > This paraphrases my comment. It is better to spell it out: > > An SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is expected to have all > segments except the last to be gso_size long. If this does not hold, > the skb has been modified and the fraglist gso integrity is lost. > Drop > the packet, as it cannot be segmented correctly by skb_segment_list. > > The skb could be salvaged, though, right? By linearizing, dropping > the SKB_GSO_FRAGLIST bit and entering the normal skb_segment path > rather than the skb_segment_list path. > > That choice is currently made in the protocol caller, > __udp_gso_segment. It's not trivial to add such a backup path here. > So let's add this backstop against kernel crashes. > > > > > If the gso_size does not match skb_headlen(), > > it means to be pulled part of or the entire > > fraglsit. It has been messed with and we return > > fraglist > > > error to free this skb. > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > --- > > net/core/skbuff.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index b99127712e67..750fbb51b99f 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -4493,6 +4493,7 @@ struct sk_buff *skb_segment_list(struct > sk_buff > > *skb, > > unsigned int tnl_hlen = skb_tnl_header_len(skb); > > unsigned int delta_truesize = 0; > > unsigned int delta_len = 0; > > +unsigned int mss = skb_shinfo(skb)->gso_size; > > Reverse christmas tree > > > struct sk_buff *tail = NULL; > > struct sk_buff *nskb, *tmp; > > int len_diff, err; > > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct > sk_buff > > *skb, > > if (err) > > goto err_linearize; > > > > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) > > +return ERR_PTR(-EFAULT); > > + > > Do this precondition integrity check before the skb_unclone path? After return error, the skb will enter into kfree_skb, not consume_skb. It may meet same crash problem which has been resolved by skb_unclone. Or kfree_skb could well handle the cloned skb's release? Other changes are updated as below: From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 2001 From: Shiming Cheng <shiming.cheng@mediatek.com> Date: Wed, 24 Apr 2024 13:42:35 +0800 Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb A SKB_GSO_FRAGLIST skb can't be pulled data from its fraglist as it may result an invalid segmentation or kernel exception. For such structured skb we limit the BPF pulling data length smaller than skb_headlen() and return error if exceeding. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> Signed-off-by: Lena Wang <lena.wang@mediatek.com> --- net/core/filter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index 8adf95765cdd..8ed4d5d87167 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp); static inline int __bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len) { + if (skb_is_gso(skb) && + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && + write_len > skb_headlen(skb)) { + return -ENOMEM; + } return skb_ensure_writable(skb, write_len); }
> > > struct sk_buff *tail = NULL; > > > struct sk_buff *nskb, *tmp; > > > int len_diff, err; > > > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct > > sk_buff > > > *skb, > > > if (err) > > > goto err_linearize; > > > > > > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) > > > +return ERR_PTR(-EFAULT); > > > + > > > > Do this precondition integrity check before the skb_unclone path? > > After return error, the skb will enter into kfree_skb, not consume_skb. > It may meet same crash problem which has been resolved by skb_unclone. > > Or kfree_skb could well handle the cloned skb's release? Since this is an error path it should reach kfree_skb rather than consume_skb. > > Other changes are updated as below: > > From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Wed, 24 Apr 2024 13:42:35 +0800 > Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb > > A SKB_GSO_FRAGLIST skb can't be pulled data > from its fraglist as it may result an invalid > segmentation or kernel exception. > > For such structured skb we limit the BPF pulling > data length smaller than skb_headlen() and return > error if exceeding. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > net/core/filter.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 8adf95765cdd..8ed4d5d87167 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > bpf_sp); > static inline int __bpf_try_make_writable(struct sk_buff *skb, > unsigned int write_len) > { > + if (skb_is_gso(skb) && > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > + write_len > skb_headlen(skb)) { > + return -ENOMEM; > + } > return skb_ensure_writable(skb, write_len); > } > > -- > 2.18.0 > > > From 64d55392debbc90ef2e9c33441024d612075bdd7 Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Wed, 24 Apr 2024 14:43:45 +0800 > Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb > > A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is > expected to have all segments except the last > to be gso_size long. If this does not hold, the > skb has been modified and the fraglist gso integrity > is lost. Drop the packet, as it cannot be segmented > correctly by skb_segment_list. > > The skb could be salvaged, though, right? > By linearizing, dropping the SKB_GSO_FRAGLIST bit > and entering the normal skb_segment path rather than > the skb_segment_list path. Drop the "though, right?" > > That choice is currently made in the protocol caller, > __udp_gso_segment. It's not trivial to add such a > backup path here. So let's add this backstop against > kernel crashes. > > If the gso_size does not match skb_headlen(), > it means part of or the entire fraglist has been pulled. > It has been messed with and we should return error to > free this skb. This paragraph is now duplicative. Drop. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > net/core/skbuff.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index b99127712e67..4777f5fea6c3 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -4491,6 +4491,7 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > { > struct sk_buff *list_skb = skb_shinfo(skb)->frag_list; > unsigned int tnl_hlen = skb_tnl_header_len(skb); > + unsigned int mss = skb_shinfo(skb)->gso_size; > unsigned int delta_truesize = 0; > unsigned int delta_len = 0; > struct sk_buff *tail = NULL; > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > if (err) > goto err_linearize; > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) > + return ERR_PTR(-EFAULT); > + > skb_shinfo(skb)->frag_list = NULL; > > while (list_skb) { > -- > 2.18.0
On Wed, Apr 24, 2024 at 9:33 PM Lena Wang (王娜) <Lena.Wang@mediatek.com> wrote: > > On Wed, 2024-04-24 at 10:28 -0400, Willem de Bruijn wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > > > Lena Wang (王娜) wrote: > > > On Tue, 2024-04-23 at 14:35 -0400, Willem de Bruijn wrote: > > > > > > > > External email : Please do not click links or open attachments > > until > > > > you have verified the sender or the content. > > > > > Hi Willem, > > > > > As the discussion, is it OK for the patch below? > > > > > > > > Thanks for iterating on this. > > > > > > > > I would like the opinion also of the fraglist and UDP GRO > > experts. > > > > > > > > Yes, I think both > > > > > > > > - protecting skb_segment_list against clearly illegal fraglist > > > > packets, and > > > > - blocking BPF from constructing such packets > > > > > > > > are worthwhile stable fixes. I believe they should be two > > separate > > > > patches. Both probably with the same Fixes tag: 3a1296a38d0c > > > > ("net: Support GRO/GSO fraglist chaining"). > > > > > > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > > > index 3a6110ea4009..abc6029c8eef 100644 > > > > > --- a/net/core/filter.c > > > > > +++ b/net/core/filter.c > > > > > @@ -1655,6 +1655,11 @@ static DEFINE_PER_CPU(struct > > bpf_scratchpad, > > > > > bpf_sp); > > > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > > > unsigned int > > write_len) > > > > > { > > > > > + if (skb_is_gso(skb) && (skb_shinfo(skb)->gso_type & > > > > > + SKB_GSO_FRAGLIST) && (write_len > > > > > > skb_headlen(skb))) { > > > > > + return -ENOMEM; > > > > > + } > > > > > + > > > > > > > > Indentation looks off, but I agree with the logic. > > > > > > > > if (skb_is_gso(skb) && > > > > (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > > > (write_len > skb_headlen(skb))) > > > > > > > > > return skb_ensure_writable(skb, write_len); > > > > > } > > > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > > index 73b1e0e53534..2e90534c1a1e 100644 > > > > > --- a/net/core/skbuff.c > > > > > +++ b/net/core/skbuff.c > > > > > @@ -4036,9 +4036,11 @@ struct sk_buff *skb_segment_list(struct > > > > sk_buff > > > > > *skb, > > > > > unsigned int tnl_hlen = skb_tnl_header_len(skb); > > > > > unsigned int delta_truesize = 0; > > > > > unsigned int delta_len = 0; > > > > > + unsigned int mss = skb_shinfo(skb)->gso_size; > > > > > struct sk_buff *tail = NULL; > > > > > struct sk_buff *nskb, *tmp; > > > > > int len_diff, err; > > > > > + bool err_len = false; > > > > > > > > > > skb_push(skb, -skb_network_offset(skb) + offset); > > > > > > > > > > @@ -4047,6 +4049,14 @@ struct sk_buff *skb_segment_list(struct > > > > sk_buff > > > > > *skb, > > > > > if (err) > > > > > goto err_linearize; > > > > > > > > > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) { > > > > > + if (!list_skb) { > > > > > + goto err_linearize; > > > > > > > > The label no longer truly covers the meaning. > > > > > > > > But that is already true since the above (second) jump was added > > in > > > > commit c329b261afe7 ("net: prevent skb corruption on frag list > > > > segmentation"). > > > > > > > > Neither needs the kfree_skb_list, as skb->next is not assigned to > > > > until the loop. Can just return ERR_PTR(-EFAULT)? > > > > > > > > > + } else { > > > > > + err_len = true; > > > > > + } > > > > > + } > > > > > + > > > > > > > > Why the branch? Might as well always fail immediately? > > > > > > > Hi Willem, > > > Thanks for your guidance. > > > You are right. There is no need for another branch as fraglist > > > could be freeed in kfree_skb. > > > Could I git send mail wo patches as below? > > > > > > From 933237400c0e2fa997470b70ff0e407996fa239c Mon Sep 17 00:00:00 > > 2001 > > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > > Date: Wed, 24 Apr 2024 13:42:35 +0800 > > > Subject: [PATCH net] net: prevent BPF pull GROed skb's fraglist > > > > > > A GROed skb with fraglist can't be pulled data > > > > Please use the specific label: SKB_GSO_FRAGLIST skb > > > > > from its fraglist as it may result a invalid > > > segmentation or kernel exception. > > > > > > For such structured skb we limit the BPF pull > > > data length smaller than skb_headlen() and return > > > error if exceeding. > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > --- > > > net/core/filter.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > index 8adf95765cdd..8ed4d5d87167 100644 > > > --- a/net/core/filter.c > > > +++ b/net/core/filter.c > > > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > > > bpf_sp); > > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > > unsigned int write_len) > > > { > > > +if (skb_is_gso(skb) && > > > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > > + write_len > skb_headlen(skb)) { > > > +return -ENOMEM; > > > +} > > > return skb_ensure_writable(skb, write_len); > > > } > > > > > > -- > > > 2.18.0 > > > > > > > > > From 2d0729b20cf810ba1b31e046952c1cd78f295ca3 Mon Sep 17 00:00:00 > > 2001 > > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > > Date: Wed, 24 Apr 2024 14:43:45 +0800 > > > Subject: [PATCH net] net: drop GROed skb pulled from fraglist > > > > > > A GROed skb with fraglist maybe pulled by BPF > > > or other ways. It can't be trusted at all even > > > if one byte is pulled and should be dropped > > > on segmentation. > > > > This paraphrases my comment. It is better to spell it out: > > > > An SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is expected to have all > > segments except the last to be gso_size long. If this does not hold, > > the skb has been modified and the fraglist gso integrity is lost. > > Drop > > the packet, as it cannot be segmented correctly by skb_segment_list. > > > > The skb could be salvaged, though, right? By linearizing, dropping > > the SKB_GSO_FRAGLIST bit and entering the normal skb_segment path > > rather than the skb_segment_list path. > > > > That choice is currently made in the protocol caller, > > __udp_gso_segment. It's not trivial to add such a backup path here. > > So let's add this backstop against kernel crashes. > > > > > > > > If the gso_size does not match skb_headlen(), > > > it means to be pulled part of or the entire > > > fraglsit. It has been messed with and we return > > > > fraglist > > > > > error to free this skb. > > > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > --- > > > net/core/skbuff.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index b99127712e67..750fbb51b99f 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -4493,6 +4493,7 @@ struct sk_buff *skb_segment_list(struct > > sk_buff > > > *skb, > > > unsigned int tnl_hlen = skb_tnl_header_len(skb); > > > unsigned int delta_truesize = 0; > > > unsigned int delta_len = 0; > > > +unsigned int mss = skb_shinfo(skb)->gso_size; > > > > Reverse christmas tree > > > > > struct sk_buff *tail = NULL; > > > struct sk_buff *nskb, *tmp; > > > int len_diff, err; > > > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct > > sk_buff > > > *skb, > > > if (err) > > > goto err_linearize; > > > > > > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) > > > +return ERR_PTR(-EFAULT); > > > + > > > > Do this precondition integrity check before the skb_unclone path? > > After return error, the skb will enter into kfree_skb, not consume_skb. > It may meet same crash problem which has been resolved by skb_unclone. > > Or kfree_skb could well handle the cloned skb's release? > > Other changes are updated as below: > > From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Wed, 24 Apr 2024 13:42:35 +0800 > Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb > > A SKB_GSO_FRAGLIST skb can't be pulled data > from its fraglist as it may result an invalid > segmentation or kernel exception. > > For such structured skb we limit the BPF pulling > data length smaller than skb_headlen() and return > error if exceeding. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > net/core/filter.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/core/filter.c b/net/core/filter.c > index 8adf95765cdd..8ed4d5d87167 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > bpf_sp); > static inline int __bpf_try_make_writable(struct sk_buff *skb, > unsigned int write_len) > { > + if (skb_is_gso(skb) && > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > + write_len > skb_headlen(skb)) { at least call: skb_ensure_writable(skb, skb_headlen(skb)); here before you return the error to make sure the already linear header portion is writable. (the skb could be a clone with a r/o header afaict) > + return -ENOMEM; > + } > return skb_ensure_writable(skb, write_len); > } > > -- > 2.18.0 > > > From 64d55392debbc90ef2e9c33441024d612075bdd7 Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Wed, 24 Apr 2024 14:43:45 +0800 > Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb > > A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is > expected to have all segments except the last > to be gso_size long. If this does not hold, the > skb has been modified and the fraglist gso integrity > is lost. Drop the packet, as it cannot be segmented > correctly by skb_segment_list. > > The skb could be salvaged, though, right? > By linearizing, dropping the SKB_GSO_FRAGLIST bit > and entering the normal skb_segment path rather than > the skb_segment_list path. > > That choice is currently made in the protocol caller, > __udp_gso_segment. It's not trivial to add such a > backup path here. So let's add this backstop against > kernel crashes. > > If the gso_size does not match skb_headlen(), > it means part of or the entire fraglist has been pulled. > It has been messed with and we should return error to > free this skb. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > net/core/skbuff.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index b99127712e67..4777f5fea6c3 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -4491,6 +4491,7 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > { > struct sk_buff *list_skb = skb_shinfo(skb)->frag_list; > unsigned int tnl_hlen = skb_tnl_header_len(skb); > + unsigned int mss = skb_shinfo(skb)->gso_size; > unsigned int delta_truesize = 0; > unsigned int delta_len = 0; > struct sk_buff *tail = NULL; > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff > *skb, > if (err) > goto err_linearize; > > + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) > + return ERR_PTR(-EFAULT); > + > skb_shinfo(skb)->frag_list = NULL; > > while (list_skb) { > -- > 2.18.0 -- Maciej Żenczykowski, Kernel Networking Developer @ Google
On Thu, 2024-04-25 at 10:07 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > > > > struct sk_buff *tail = NULL; > > > > struct sk_buff *nskb, *tmp; > > > > int len_diff, err; > > > > @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct > > > sk_buff > > > > *skb, > > > > if (err) > > > > goto err_linearize; > > > > > > > > +if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) > > > > +return ERR_PTR(-EFAULT); > > > > + > > > > > > Do this precondition integrity check before the skb_unclone path? > > > > After return error, the skb will enter into kfree_skb, not > consume_skb. > > It may meet same crash problem which has been resolved by > skb_unclone. > > > > Or kfree_skb could well handle the cloned skb's release? > > Since this is an error path it should reach kfree_skb rather than > consume_skb. > So we could keep the check after the skb_unclone path, right? > > > > Other changes are updated as below: > > > > From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 > 2001 > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > Date: Wed, 24 Apr 2024 13:42:35 +0800 > > Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb > > > > A SKB_GSO_FRAGLIST skb can't be pulled data > > from its fraglist as it may result an invalid > > segmentation or kernel exception. > > > > For such structured skb we limit the BPF pulling > > data length smaller than skb_headlen() and return > > error if exceeding. > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > --- > > net/core/filter.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 8adf95765cdd..8ed4d5d87167 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > > bpf_sp); > > static inline int __bpf_try_make_writable(struct sk_buff *skb, > > unsigned int write_len) > > { > > +if (skb_is_gso(skb) && > > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > + write_len > skb_headlen(skb)) { > > +return -ENOMEM; > > +} > > return skb_ensure_writable(skb, write_len); > > } > > > > -- > > 2.18.0 > > > > > > From 64d55392debbc90ef2e9c33441024d612075bdd7 Mon Sep 17 00:00:00 > 2001 > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > Date: Wed, 24 Apr 2024 14:43:45 +0800 > > Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb > > > > A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is > > expected to have all segments except the last > > to be gso_size long. If this does not hold, the > > skb has been modified and the fraglist gso integrity > > is lost. Drop the packet, as it cannot be segmented > > correctly by skb_segment_list. > > > > The skb could be salvaged, though, right? > > By linearizing, dropping the SKB_GSO_FRAGLIST bit > > and entering the normal skb_segment path rather than > > the skb_segment_list path. > > Drop the "though, right?" > > > > That choice is currently made in the protocol caller, > > __udp_gso_segment. It's not trivial to add such a > > backup path here. So let's add this backstop against > > kernel crashes. > > > > If the gso_size does not match skb_headlen(), > > it means part of or the entire fraglist has been pulled. > > It has been messed with and we should return error to > > free this skb. > > This paragraph is now duplicative. Drop. OK, updated as below: From 59d561adc13d52e3c225c6b8276f6a53324f7d56 Mon Sep 17 00:00:00 2001 From: Shiming Cheng <shiming.cheng@mediatek.com> Date: Wed, 24 Apr 2024 14:43:45 +0800 Subject: [PATCH net] net: drop pulled SKB_GSO_FRAGLIST skb A SKB_GSO_FRAGLIST skb without GSO_BY_FRAGS is expected to have all segments except the last to be gso_size long. If this does not hold, the skb has been modified and the fraglist gso integrity is lost. Drop the packet, as it cannot be segmented correctly by skb_segment_list. The skb could be salvaged. By linearizing, dropping the SKB_GSO_FRAGLIST bit and entering the normal skb_segment path rather than the skb_segment_list path. That choice is currently made in the protocol caller, __udp_gso_segment. It's not trivial to add such a backup path here. So let's add this backstop against kernel crashes. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> Signed-off-by: Lena Wang <lena.wang@mediatek.com> --- net/core/skbuff.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index b99127712e67..4777f5fea6c3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4491,6 +4491,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, { struct sk_buff *list_skb = skb_shinfo(skb)->frag_list; unsigned int tnl_hlen = skb_tnl_header_len(skb); + unsigned int mss = skb_shinfo(skb)->gso_size; unsigned int delta_truesize = 0; unsigned int delta_len = 0; struct sk_buff *tail = NULL; @@ -4504,6 +4505,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, if (err) goto err_linearize; + if (mss != GSO_BY_FRAGS && mss != skb_headlen(skb)) + return ERR_PTR(-EFAULT); + skb_shinfo(skb)->frag_list = NULL; while (list_skb) {
On 4/26/24 11:52 AM, Lena Wang (王娜) wrote: [...] >>> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 >> 2001 >>> From: Shiming Cheng <shiming.cheng@mediatek.com> >>> Date: Wed, 24 Apr 2024 13:42:35 +0800 >>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb >>> >>> A SKB_GSO_FRAGLIST skb can't be pulled data >>> from its fraglist as it may result an invalid >>> segmentation or kernel exception. >>> >>> For such structured skb we limit the BPF pulling >>> data length smaller than skb_headlen() and return >>> error if exceeding. >>> >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> >>> --- >>> net/core/filter.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/net/core/filter.c b/net/core/filter.c >>> index 8adf95765cdd..8ed4d5d87167 100644 >>> --- a/net/core/filter.c >>> +++ b/net/core/filter.c >>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, >>> bpf_sp); >>> static inline int __bpf_try_make_writable(struct sk_buff *skb, >>> unsigned int write_len) >>> { >>> +if (skb_is_gso(skb) && >>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && >>> + write_len > skb_headlen(skb)) { >>> +return -ENOMEM; >>> +} >>> return skb_ensure_writable(skb, write_len); Dumb question, but should this guard be more generically part of skb_ensure_writable() internals, presumably that would be inside pskb_may_pull_reason(), or only if we ever see more code instances similar to this? Thanks, Daniel
Daniel Borkmann wrote: > On 4/26/24 11:52 AM, Lena Wang (王娜) wrote: > [...] > >>> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 00:00:00 > >> 2001 > >>> From: Shiming Cheng <shiming.cheng@mediatek.com> > >>> Date: Wed, 24 Apr 2024 13:42:35 +0800 > >>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST skb > >>> > >>> A SKB_GSO_FRAGLIST skb can't be pulled data > >>> from its fraglist as it may result an invalid > >>> segmentation or kernel exception. > >>> > >>> For such structured skb we limit the BPF pulling > >>> data length smaller than skb_headlen() and return > >>> error if exceeding. > >>> > >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> > >>> --- > >>> net/core/filter.c | 5 +++++ > >>> 1 file changed, 5 insertions(+) > >>> > >>> diff --git a/net/core/filter.c b/net/core/filter.c > >>> index 8adf95765cdd..8ed4d5d87167 100644 > >>> --- a/net/core/filter.c > >>> +++ b/net/core/filter.c > >>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct bpf_scratchpad, > >>> bpf_sp); > >>> static inline int __bpf_try_make_writable(struct sk_buff *skb, > >>> unsigned int write_len) > >>> { > >>> +if (skb_is_gso(skb) && > >>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > >>> + write_len > skb_headlen(skb)) { > >>> +return -ENOMEM; > >>> +} > >>> return skb_ensure_writable(skb, write_len); > > Dumb question, but should this guard be more generically part of skb_ensure_writable() > internals, presumably that would be inside pskb_may_pull_reason(), or only if we ever > see more code instances similar to this? Good point. Most callers of skb_ensure_writable correctly pull only headers, so wouldn't cause this problem. But it also adds coverage to things like tc pedit.
On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > Daniel Borkmann wrote: > > On 4/26/24 11:52 AM, Lena Wang (王娜) wrote: > > [...] > > >>> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 > 00:00:00 > > >> 2001 > > >>> From: Shiming Cheng <shiming.cheng@mediatek.com> > > >>> Date: Wed, 24 Apr 2024 13:42:35 +0800 > > >>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST > skb > > >>> > > >>> A SKB_GSO_FRAGLIST skb can't be pulled data > > >>> from its fraglist as it may result an invalid > > >>> segmentation or kernel exception. > > >>> > > >>> For such structured skb we limit the BPF pulling > > >>> data length smaller than skb_headlen() and return > > >>> error if exceeding. > > >>> > > >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > >>> --- > > >>> net/core/filter.c | 5 +++++ > > >>> 1 file changed, 5 insertions(+) > > >>> > > >>> diff --git a/net/core/filter.c b/net/core/filter.c > > >>> index 8adf95765cdd..8ed4d5d87167 100644 > > >>> --- a/net/core/filter.c > > >>> +++ b/net/core/filter.c > > >>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct > bpf_scratchpad, > > >>> bpf_sp); > > >>> static inline int __bpf_try_make_writable(struct sk_buff > *skb, > > >>> unsigned int write_len) > > >>> { > > >>> +if (skb_is_gso(skb) && > > >>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > >>> + write_len > skb_headlen(skb)) { > > >>> +return -ENOMEM; > > >>> +} > > >>> return skb_ensure_writable(skb, write_len); > > > > Dumb question, but should this guard be more generically part of > skb_ensure_writable() > > internals, presumably that would be inside pskb_may_pull_reason(), > or only if we ever > > see more code instances similar to this? > > Good point. Most callers of skb_ensure_writable correctly pull only > headers, so wouldn't cause this problem. But it also adds coverage to > things like tc pedit. Updated: From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 2001 From: Shiming Cheng <shiming.cheng@mediatek.com> Date: Sun, 28 Apr 2024 15:03:12 +0800 Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb BPF or TC callers may pull in a length longer than skb_headlen() for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled into the linear space. However it destroys the skb's structure and may result in an invalid segmentation or kernel exception. So we should add protection to stop the operation and return error to remind callers. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> Signed-off-by: Lena Wang <lena.wang@mediatek.com> --- include/linux/skbuff.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9d24aec064e8..3eef65b3db24 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb, unsigned int len) if (unlikely(len > skb->len)) return SKB_DROP_REASON_PKT_TOO_SMALL; + if (skb_is_gso(skb) && + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && + write_len > skb_headlen(skb)) { + return SKB_DROP_REASON_NOMEM; + } + if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb)))) return SKB_DROP_REASON_NOMEM;
Lena Wang (王娜) wrote: > On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote: > > > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > > > Daniel Borkmann wrote: > > > On 4/26/24 11:52 AM, Lena Wang (王娜) wrote: > > > [...] > > > >>> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 > > 00:00:00 > > > >> 2001 > > > >>> From: Shiming Cheng <shiming.cheng@mediatek.com> > > > >>> Date: Wed, 24 Apr 2024 13:42:35 +0800 > > > >>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST > > skb > > > >>> > > > >>> A SKB_GSO_FRAGLIST skb can't be pulled data > > > >>> from its fraglist as it may result an invalid > > > >>> segmentation or kernel exception. > > > >>> > > > >>> For such structured skb we limit the BPF pulling > > > >>> data length smaller than skb_headlen() and return > > > >>> error if exceeding. > > > >>> > > > >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > > >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > > >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > > >>> --- > > > >>> net/core/filter.c | 5 +++++ > > > >>> 1 file changed, 5 insertions(+) > > > >>> > > > >>> diff --git a/net/core/filter.c b/net/core/filter.c > > > >>> index 8adf95765cdd..8ed4d5d87167 100644 > > > >>> --- a/net/core/filter.c > > > >>> +++ b/net/core/filter.c > > > >>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct > > bpf_scratchpad, > > > >>> bpf_sp); > > > >>> static inline int __bpf_try_make_writable(struct sk_buff > > *skb, > > > >>> unsigned int write_len) > > > >>> { > > > >>> +if (skb_is_gso(skb) && > > > >>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > > >>> + write_len > skb_headlen(skb)) { > > > >>> +return -ENOMEM; > > > >>> +} > > > >>> return skb_ensure_writable(skb, write_len); > > > > > > Dumb question, but should this guard be more generically part of > > skb_ensure_writable() > > > internals, presumably that would be inside pskb_may_pull_reason(), > > or only if we ever > > > see more code instances similar to this? > > > > Good point. Most callers of skb_ensure_writable correctly pull only > > headers, so wouldn't cause this problem. But it also adds coverage to > > things like tc pedit. > > Updated: > > From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Sun, 28 Apr 2024 15:03:12 +0800 > Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb > > BPF or TC callers may pull in a length longer than skb_headlen() > for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled > into the linear space. However it destroys the skb's structure > and may result in an invalid segmentation or kernel exception. > > So we should add protection to stop the operation and return > error to remind callers. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > include/linux/skbuff.h | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 9d24aec064e8..3eef65b3db24 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb, > unsigned int len) pskb_may_pull is called in far too many locations. Daniel's suggestion was to amend skb_ensure_writable Please start sending the patches as regular patches. They are close enough to review normally. > if (unlikely(len > skb->len)) > return SKB_DROP_REASON_PKT_TOO_SMALL; > > + if (skb_is_gso(skb) && > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > + write_len > skb_headlen(skb)) { > + return SKB_DROP_REASON_NOMEM; > + } > + > if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb)))) > return SKB_DROP_REASON_NOMEM; > > -- > 2.18.0
On 4/28/24 9:48 AM, Lena Wang (王娜) wrote: > On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote: >> >> External email : Please do not click links or open attachments until >> you have verified the sender or the content. >> >> Daniel Borkmann wrote: >>> On 4/26/24 11:52 AM, Lena Wang (王娜) wrote: >>> [...] >>>>>> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 >> 00:00:00 >>>>> 2001 >>>>>> From: Shiming Cheng <shiming.cheng@mediatek.com> >>>>>> Date: Wed, 24 Apr 2024 13:42:35 +0800 >>>>>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST >> skb >>>>>> >>>>>> A SKB_GSO_FRAGLIST skb can't be pulled data >>>>>> from its fraglist as it may result an invalid >>>>>> segmentation or kernel exception. >>>>>> >>>>>> For such structured skb we limit the BPF pulling >>>>>> data length smaller than skb_headlen() and return >>>>>> error if exceeding. >>>>>> >>>>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") >>>>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> >>>>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> >>>>>> --- >>>>>> net/core/filter.c | 5 +++++ >>>>>> 1 file changed, 5 insertions(+) >>>>>> >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c >>>>>> index 8adf95765cdd..8ed4d5d87167 100644 >>>>>> --- a/net/core/filter.c >>>>>> +++ b/net/core/filter.c >>>>>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct >> bpf_scratchpad, >>>>>> bpf_sp); >>>>>> static inline int __bpf_try_make_writable(struct sk_buff >> *skb, >>>>>> unsigned int write_len) >>>>>> { >>>>>> +if (skb_is_gso(skb) && >>>>>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && >>>>>> + write_len > skb_headlen(skb)) { >>>>>> +return -ENOMEM; >>>>>> +} >>>>>> return skb_ensure_writable(skb, write_len); >>> >>> Dumb question, but should this guard be more generically part of >> skb_ensure_writable() >>> internals, presumably that would be inside pskb_may_pull_reason(), >> or only if we ever >>> see more code instances similar to this? >> >> Good point. Most callers of skb_ensure_writable correctly pull only >> headers, so wouldn't cause this problem. But it also adds coverage to >> things like tc pedit. > > Updated: > > From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 2001 > From: Shiming Cheng <shiming.cheng@mediatek.com> > Date: Sun, 28 Apr 2024 15:03:12 +0800 > Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb > > BPF or TC callers may pull in a length longer than skb_headlen() > for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled > into the linear space. However it destroys the skb's structure > and may result in an invalid segmentation or kernel exception. > > So we should add protection to stop the operation and return > error to remind callers. > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > --- > include/linux/skbuff.h | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 9d24aec064e8..3eef65b3db24 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb, > unsigned int len) > if (unlikely(len > skb->len)) > return SKB_DROP_REASON_PKT_TOO_SMALL; > > + if (skb_is_gso(skb) && > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > + write_len > skb_headlen(skb)) { > + return SKB_DROP_REASON_NOMEM; > + } The 'write_len > skb_headlen(skb)' test is redundant, no ? It is covered by the earlier test : if (likely(len <= skb_headlen(skb))) return SKB_NOT_DROPPED_YET; Also, was this patch even compile tested since there is no write_len var ? > if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb)))) > return SKB_DROP_REASON_NOMEM; > >
On Mon, 2024-04-29 at 12:15 +0200, Daniel Borkmann wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 4/28/24 9:48 AM, Lena Wang (王娜) wrote: > > On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote: > >> > >> External email : Please do not click links or open attachments > until > >> you have verified the sender or the content. > >> > >> Daniel Borkmann wrote: > >>> On 4/26/24 11:52 AM, Lena Wang (王娜) wrote: > >>> [...] > >>>>>> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 > >> 00:00:00 > >>>>> 2001 > >>>>>> From: Shiming Cheng <shiming.cheng@mediatek.com> > >>>>>> Date: Wed, 24 Apr 2024 13:42:35 +0800 > >>>>>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST > >> skb > >>>>>> > >>>>>> A SKB_GSO_FRAGLIST skb can't be pulled data > >>>>>> from its fraglist as it may result an invalid > >>>>>> segmentation or kernel exception. > >>>>>> > >>>>>> For such structured skb we limit the BPF pulling > >>>>>> data length smaller than skb_headlen() and return > >>>>>> error if exceeding. > >>>>>> > >>>>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist > chaining.") > >>>>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > >>>>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> > >>>>>> --- > >>>>>> net/core/filter.c | 5 +++++ > >>>>>> 1 file changed, 5 insertions(+) > >>>>>> > >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c > >>>>>> index 8adf95765cdd..8ed4d5d87167 100644 > >>>>>> --- a/net/core/filter.c > >>>>>> +++ b/net/core/filter.c > >>>>>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct > >> bpf_scratchpad, > >>>>>> bpf_sp); > >>>>>> static inline int __bpf_try_make_writable(struct sk_buff > >> *skb, > >>>>>> unsigned int write_len) > >>>>>> { > >>>>>> +if (skb_is_gso(skb) && > >>>>>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > >>>>>> + write_len > skb_headlen(skb)) { > >>>>>> +return -ENOMEM; > >>>>>> +} > >>>>>> return skb_ensure_writable(skb, write_len); > >>> > >>> Dumb question, but should this guard be more generically part of > >> skb_ensure_writable() > >>> internals, presumably that would be inside > pskb_may_pull_reason(), > >> or only if we ever > >>> see more code instances similar to this? > >> > >> Good point. Most callers of skb_ensure_writable correctly pull > only > >> headers, so wouldn't cause this problem. But it also adds coverage > to > >> things like tc pedit. > > > > Updated: > > > > From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 > 2001 > > From: Shiming Cheng <shiming.cheng@mediatek.com> > > Date: Sun, 28 Apr 2024 15:03:12 +0800 > > Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb > > > > BPF or TC callers may pull in a length longer than skb_headlen() > > for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled > > into the linear space. However it destroys the skb's structure > > and may result in an invalid segmentation or kernel exception. > > > > So we should add protection to stop the operation and return > > error to remind callers. > > > > Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") > > Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> > > Signed-off-by: Lena Wang <lena.wang@mediatek.com> > > --- > > include/linux/skbuff.h | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > > index 9d24aec064e8..3eef65b3db24 100644 > > --- a/include/linux/skbuff.h > > +++ b/include/linux/skbuff.h > > @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb, > > unsigned int len) > > if (unlikely(len > skb->len)) > > return SKB_DROP_REASON_PKT_TOO_SMALL; > > > > +if (skb_is_gso(skb) && > > + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && > > + write_len > skb_headlen(skb)) { > > +return SKB_DROP_REASON_NOMEM; > > +} > > The 'write_len > skb_headlen(skb)' test is redundant, no ? > > It is covered by the earlier test : > > if (likely(len <= skb_headlen(skb))) > return SKB_NOT_DROPPED_YET; > Daniel, it is not redundant. The bpf pulls a len between skb_headlen(skb) and skb->len that results in error. Here it will stop this operation. For other skbs(not SKB_GSO_FRAGLIST) it could be a normal behaviour and will continue to do next pulling. > Also, was this patch even compile tested since there is no write_len > var ? > The check is moved into skb_ensure_writable as advice and a new mail loop is created to start upstream: From 36307cb8706653a63a389faadbb1324cda5c0445 Mon Sep 17 00:00:00 2001 F rom: Shiming Cheng <shiming.cheng@mediatek.com> Date: Sun, 28 Apr 2024 21:50:11 +0800 Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb BPF or TC callers may pull in a length longer than skb_headlen() for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled into the linear space. However it destroys the skb's structure and may result in an invalid segmentation or kernel exception. So we should add protection to stop the operation and return error to remind callers. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> Signed-off-by: Lena Wang <lena.wang@mediatek.com> --- net/core/skbuff.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f68f2679b086..2d35e009e814 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -6100,6 +6100,12 @@ EXPORT_SYMBOL(skb_vlan_untag); int skb_ensure_writable(struct sk_buff *skb, unsigned int write_len) { + if (skb_is_gso(skb) && + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && + write_len > skb_headlen(skb)) { + return -ENOMEM; + } + if (!pskb_may_pull(skb, write_len)) return -ENOMEM;
On 4/29/24 1:45 PM, Lena Wang (王娜) wrote: > On Mon, 2024-04-29 at 12:15 +0200, Daniel Borkmann wrote: >> >> External email : Please do not click links or open attachments until >> you have verified the sender or the content. >> On 4/28/24 9:48 AM, Lena Wang (王娜) wrote: >>> On Sat, 2024-04-27 at 09:28 -0400, Willem de Bruijn wrote: >>>> >>>> External email : Please do not click links or open attachments >> until >>>> you have verified the sender or the content. >>>> >>>> Daniel Borkmann wrote: >>>>> On 4/26/24 11:52 AM, Lena Wang (王娜) wrote: >>>>> [...] >>>>>>>> From 301da5c9d65652bac6091d4cd64b751b3338f8bb Mon Sep 17 >>>> 00:00:00 >>>>>>> 2001 >>>>>>>> From: Shiming Cheng <shiming.cheng@mediatek.com> >>>>>>>> Date: Wed, 24 Apr 2024 13:42:35 +0800 >>>>>>>> Subject: [PATCH net] net: prevent BPF pulling SKB_GSO_FRAGLIST >>>> skb >>>>>>>> >>>>>>>> A SKB_GSO_FRAGLIST skb can't be pulled data >>>>>>>> from its fraglist as it may result an invalid >>>>>>>> segmentation or kernel exception. >>>>>>>> >>>>>>>> For such structured skb we limit the BPF pulling >>>>>>>> data length smaller than skb_headlen() and return >>>>>>>> error if exceeding. >>>>>>>> >>>>>>>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist >> chaining.") >>>>>>>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> >>>>>>>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> >>>>>>>> --- >>>>>>>> net/core/filter.c | 5 +++++ >>>>>>>> 1 file changed, 5 insertions(+) >>>>>>>> >>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c >>>>>>>> index 8adf95765cdd..8ed4d5d87167 100644 >>>>>>>> --- a/net/core/filter.c >>>>>>>> +++ b/net/core/filter.c >>>>>>>> @@ -1662,6 +1662,11 @@ static DEFINE_PER_CPU(struct >>>> bpf_scratchpad, >>>>>>>> bpf_sp); >>>>>>>> static inline int __bpf_try_make_writable(struct sk_buff >>>> *skb, >>>>>>>> unsigned int write_len) >>>>>>>> { >>>>>>>> +if (skb_is_gso(skb) && >>>>>>>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && >>>>>>>> + write_len > skb_headlen(skb)) { >>>>>>>> +return -ENOMEM; >>>>>>>> +} >>>>>>>> return skb_ensure_writable(skb, write_len); >>>>> >>>>> Dumb question, but should this guard be more generically part of >>>> skb_ensure_writable() >>>>> internals, presumably that would be inside >> pskb_may_pull_reason(), >>>> or only if we ever >>>>> see more code instances similar to this? >>>> >>>> Good point. Most callers of skb_ensure_writable correctly pull >> only >>>> headers, so wouldn't cause this problem. But it also adds coverage >> to >>>> things like tc pedit. >>> >>> Updated: >>> >>> From 3be30b8cf6e629f2615ef4eafe3b2a1c0d68c530 Mon Sep 17 00:00:00 >> 2001 >>> From: Shiming Cheng <shiming.cheng@mediatek.com> >>> Date: Sun, 28 Apr 2024 15:03:12 +0800 >>> Subject: [PATCH net] net: prevent pulling SKB_GSO_FRAGLIST skb >>> >>> BPF or TC callers may pull in a length longer than skb_headlen() >>> for a SKB_GSO_FRAGLIST skb. The data in fraglist will be pulled >>> into the linear space. However it destroys the skb's structure >>> and may result in an invalid segmentation or kernel exception. >>> >>> So we should add protection to stop the operation and return >>> error to remind callers. >>> >>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") >>> Signed-off-by: Shiming Cheng <shiming.cheng@mediatek.com> >>> Signed-off-by: Lena Wang <lena.wang@mediatek.com> >>> --- >>> include/linux/skbuff.h | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >>> index 9d24aec064e8..3eef65b3db24 100644 >>> --- a/include/linux/skbuff.h >>> +++ b/include/linux/skbuff.h >>> @@ -2740,6 +2740,12 @@ pskb_may_pull_reason(struct sk_buff *skb, >>> unsigned int len) >>> if (unlikely(len > skb->len)) >>> return SKB_DROP_REASON_PKT_TOO_SMALL; >>> >>> +if (skb_is_gso(skb) && >>> + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST) && >>> + write_len > skb_headlen(skb)) { >>> +return SKB_DROP_REASON_NOMEM; >>> +} >> >> The 'write_len > skb_headlen(skb)' test is redundant, no ? >> >> It is covered by the earlier test : >> >> if (likely(len <= skb_headlen(skb))) >> return SKB_NOT_DROPPED_YET; >> > Daniel, it is not redundant. The bpf pulls a len between > skb_headlen(skb) and skb->len that results in error. Here it will stop > this operation. For other skbs(not SKB_GSO_FRAGLIST) it could be a > normal behaviour and will continue to do next pulling. I meant something like the below. The len <= skb_headlen(skb) case you already return earlier with SKB_NOT_DROPPED_YET. Willem, do you see a case where this should not live in pskb_may_pull_reason() but rather specifically in skb_ensure_writable()? Thanks, Daniel diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 517e546a120a..ef2a0328ff2b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1687,6 +1687,11 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, /* Internal */ #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) +static inline bool skb_is_gso(const struct sk_buff *skb) +{ + return skb_shinfo(skb)->gso_size; +} + static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb) { return &skb_shinfo(skb)->hwtstamps; @@ -2740,6 +2745,10 @@ pskb_may_pull_reason(struct sk_buff *skb, unsigned int len) if (unlikely(len > skb->len)) return SKB_DROP_REASON_PKT_TOO_SMALL; + if (unlikely(skb_is_gso(skb) && + (skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST))) + return SKB_DROP_REASON_FRAGLIST_PULL; + if (unlikely(!__pskb_pull_tail(skb, len - skb_headlen(skb)))) return SKB_DROP_REASON_NOMEM; @@ -4953,11 +4962,6 @@ static inline struct sec_path *skb_sec_path(const struct sk_buff *skb) #endif } -static inline bool skb_is_gso(const struct sk_buff *skb) -{ - return skb_shinfo(skb)->gso_size; -} - /* Note: Should be called only if skb_is_gso(skb) is true */ static inline bool skb_is_gso_v6(const struct sk_buff *skb) { diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h index 9707ab54fdd5..9d6c97a6b2b6 100644 --- a/include/net/dropreason-core.h +++ b/include/net/dropreason-core.h @@ -418,6 +418,12 @@ enum skb_drop_reason { * iterations. */ SKB_DROP_REASON_TC_RECLASSIFY_LOOP, + /** + * @SKB_DROP_REASON_FRAGLIST_PULL: attempting to pull GSO fraglist + * which destroys the skb's structure and may then result in an + * invalid segmentation or kernel exception. + */ + SKB_DROP_REASON_FRAGLIST_PULL, /** * @SKB_DROP_REASON_MAX: the maximum of core drop reasons, which * shouldn't be used as a real 'reason' - only for tracing code gen
> >> The 'write_len > skb_headlen(skb)' test is redundant, no ? > >> > >> It is covered by the earlier test : > >> > >> if (likely(len <= skb_headlen(skb))) > >> return SKB_NOT_DROPPED_YET; > >> > > Daniel, it is not redundant. The bpf pulls a len between > > skb_headlen(skb) and skb->len that results in error. Here it will stop > > this operation. For other skbs(not SKB_GSO_FRAGLIST) it could be a > > normal behaviour and will continue to do next pulling. > > I meant something like the below. The len <= skb_headlen(skb) case you > already return earlier with SKB_NOT_DROPPED_YET. Willem, do you see a > case where this should not live in pskb_may_pull_reason() but rather > specifically in skb_ensure_writable()? Yes. pskb_may_pull is called all over the hot path. All in locations that are known safe, because they only pull header bytes. I prefer to limit the branch to the few (user configurable) locations that are in scope.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index b99127712e67..f68f2679b086 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4504,6 +4504,9 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb, if (err) goto err_linearize; + if (!list_skb) + goto err_linearize; + skb_shinfo(skb)->frag_list = NULL; while (list_skb) {