diff mbox series

[net] net: gro: set the last skb->next to NULL when it get merged

Message ID 20211026131859.59114-1-kerneljasonxing@gmail.com (mailing list archive)
State Rejected
Delegated to: Netdev Maintainers
Headers show
Series [net] net: gro: set the last skb->next to NULL when it get merged | expand

Checks

Context Check Description
netdev/cover_letter success Single patches do not need cover letters
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success No Fixes tag
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 7 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Jason Xing Oct. 26, 2021, 1:18 p.m. UTC
From: Jason Xing <xingwanli@kuaishou.com>

Setting the @next of the last skb to NULL to prevent the panic in future
when someone does something to the last of the gro list but its @next is
invalid.

For example, without the fix (commit: ece23711dd95), a panic could happen
with the clsact loaded when skb is redirected and then validated in
validate_xmit_skb_list() which could access the error addr of the @next
of the last skb. Thus, "general protection fault" would appear after that.

Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
---
 net/core/skbuff.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jason Xing Oct. 27, 2021, 7:23 a.m. UTC | #1
On Tue, Oct 26, 2021 at 9:19 PM <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <xingwanli@kuaishou.com>
>
> Setting the @next of the last skb to NULL to prevent the panic in future
> when someone does something to the last of the gro list but its @next is
> invalid.
>
> For example, without the fix (commit: ece23711dd95), a panic could happen
> with the clsact loaded when skb is redirected and then validated in
> validate_xmit_skb_list() which could access the error addr of the @next
> of the last skb. Thus, "general protection fault" would appear after that.
>
> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> ---
>  net/core/skbuff.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2170bea..7b248f1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>                 skb_shinfo(p)->frag_list = skb;
>         else
>                 NAPI_GRO_CB(p)->last->next = skb;
> +       skb->next = NULL;
>         NAPI_GRO_CB(p)->last = skb;

Besides, I'm a little bit confused that this operation inserts the
newest skb into the tail of the flow, so the tail of flow is the
newest, head oldest. The patch (commit: 600adc18) introduces the flush
of the oldest when the flow is full to lower the latency, but actually
it fetches the tail of the flow. Do I get something wrong here? I feel
it is really odd.

Thanks,
Jason

>         __skb_header_release(skb);
>         lp = p;
> --
> 1.8.3.1
>
Jason Xing Oct. 27, 2021, 8:07 a.m. UTC | #2
On Wed, Oct 27, 2021 at 3:23 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Tue, Oct 26, 2021 at 9:19 PM <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <xingwanli@kuaishou.com>
> >
> > Setting the @next of the last skb to NULL to prevent the panic in future
> > when someone does something to the last of the gro list but its @next is
> > invalid.
> >
> > For example, without the fix (commit: ece23711dd95), a panic could happen
> > with the clsact loaded when skb is redirected and then validated in
> > validate_xmit_skb_list() which could access the error addr of the @next
> > of the last skb. Thus, "general protection fault" would appear after that.
> >
> > Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> > ---
> >  net/core/skbuff.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 2170bea..7b248f1 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
> >                 skb_shinfo(p)->frag_list = skb;
> >         else
> >                 NAPI_GRO_CB(p)->last->next = skb;
> > +       skb->next = NULL;
> >         NAPI_GRO_CB(p)->last = skb;
>
> Besides, I'm a little bit confused that this operation inserts the
> newest skb into the tail of the flow, so the tail of flow is the
> newest, head oldest. The patch (commit: 600adc18) introduces the flush
> of the oldest when the flow is full to lower the latency, but actually
> it fetches the tail of the flow. Do I get something wrong here? I feel

I have to update this part. The commit 600adc18 evicts and flushes the
oldest flow. But for the current kernel, when
"napi->gro_hash[hash].count >= MAX_GRO_SKBS" happens, the
gro_flush_oldest() flushes the oldest skb of one certain flow,
actually it is the newest skb because it is at the end of the list.

> it is really odd.
>
> Thanks,
> Jason
>
> >         __skb_header_release(skb);
> >         lp = p;
> > --
> > 1.8.3.1
> >
Jason Xing Oct. 27, 2021, 8:56 a.m. UTC | #3
On Wed, Oct 27, 2021 at 4:07 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Oct 27, 2021 at 3:23 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Tue, Oct 26, 2021 at 9:19 PM <kerneljasonxing@gmail.com> wrote:
> > >
> > > From: Jason Xing <xingwanli@kuaishou.com>
> > >
> > > Setting the @next of the last skb to NULL to prevent the panic in future
> > > when someone does something to the last of the gro list but its @next is
> > > invalid.
> > >
> > > For example, without the fix (commit: ece23711dd95), a panic could happen
> > > with the clsact loaded when skb is redirected and then validated in
> > > validate_xmit_skb_list() which could access the error addr of the @next
> > > of the last skb. Thus, "general protection fault" would appear after that.
> > >
> > > Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> > > ---
> > >  net/core/skbuff.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > > index 2170bea..7b248f1 100644
> > > --- a/net/core/skbuff.c
> > > +++ b/net/core/skbuff.c
> > > @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
> > >                 skb_shinfo(p)->frag_list = skb;
> > >         else
> > >                 NAPI_GRO_CB(p)->last->next = skb;
> > > +       skb->next = NULL;
> > >         NAPI_GRO_CB(p)->last = skb;
> >
> > Besides, I'm a little bit confused that this operation inserts the
> > newest skb into the tail of the flow, so the tail of flow is the
> > newest, head oldest. The patch (commit: 600adc18) introduces the flush
> > of the oldest when the flow is full to lower the latency, but actually
> > it fetches the tail of the flow. Do I get something wrong here? I feel
>
> I have to update this part. The commit 600adc18 evicts and flushes the
> oldest flow. But for the current kernel, when
> "napi->gro_hash[hash].count >= MAX_GRO_SKBS" happens, the
> gro_flush_oldest() flushes the oldest skb of one certain flow,
> actually it is the newest skb because it is at the end of the list.

I just submitted another patch to explain how it happens, please help
me review both patches.

Link: https://lore.kernel.org/lkml/20211027084944.4508-1-kerneljasonxing@gmail.com/

Thanks again,
Jason

>
> > it is really odd.
> >
> > Thanks,
> > Jason
> >
> > >         __skb_header_release(skb);
> > >         lp = p;
> > > --
> > > 1.8.3.1
> > >
Yunsheng Lin Oct. 27, 2021, 12:40 p.m. UTC | #4
On 2021/10/27 16:56, Jason Xing wrote:
> On Wed, Oct 27, 2021 at 4:07 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Wed, Oct 27, 2021 at 3:23 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>>
>>> On Tue, Oct 26, 2021 at 9:19 PM <kerneljasonxing@gmail.com> wrote:
>>>>
>>>> From: Jason Xing <xingwanli@kuaishou.com>
>>>>
>>>> Setting the @next of the last skb to NULL to prevent the panic in future
>>>> when someone does something to the last of the gro list but its @next is
>>>> invalid.
>>>>
>>>> For example, without the fix (commit: ece23711dd95), a panic could happen
>>>> with the clsact loaded when skb is redirected and then validated in
>>>> validate_xmit_skb_list() which could access the error addr of the @next
>>>> of the last skb. Thus, "general protection fault" would appear after that.
>>>>
>>>> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
>>>> ---
>>>>  net/core/skbuff.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index 2170bea..7b248f1 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>>>>                 skb_shinfo(p)->frag_list = skb;
>>>>         else
>>>>                 NAPI_GRO_CB(p)->last->next = skb;
>>>> +       skb->next = NULL;
>>>>         NAPI_GRO_CB(p)->last = skb;
>>>
>>> Besides, I'm a little bit confused that this operation inserts the
>>> newest skb into the tail of the flow, so the tail of flow is the
>>> newest, head oldest. The patch (commit: 600adc18) introduces the flush
>>> of the oldest when the flow is full to lower the latency, but actually
>>> it fetches the tail of the flow. Do I get something wrong here? I feel
>>
>> I have to update this part. The commit 600adc18 evicts and flushes the
>> oldest flow. But for the current kernel, when
>> "napi->gro_hash[hash].count >= MAX_GRO_SKBS" happens, the
>> gro_flush_oldest() flushes the oldest skb of one certain flow,
>> actually it is the newest skb because it is at the end of the list.

it seems the below is more matched with the gro_flush_oldest() instead
of the above code block:
https://elixir.bootlin.com/linux/v5.15-rc3/source/net/core/dev.c#L6118

> 
> I just submitted another patch to explain how it happens, please help
> me review both patches.
> 
> Link: https://lore.kernel.org/lkml/20211027084944.4508-1-kerneljasonxing@gmail.com/
> 
> Thanks again,
> Jason
> 
>>
>>> it is really odd.
>>>
>>> Thanks,
>>> Jason
>>>
>>>>         __skb_header_release(skb);
>>>>         lp = p;
>>>> --
>>>> 1.8.3.1
>>>>
> .
>
Jason Xing Oct. 27, 2021, 12:54 p.m. UTC | #5
On Wed, Oct 27, 2021 at 8:40 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2021/10/27 16:56, Jason Xing wrote:
> > On Wed, Oct 27, 2021 at 4:07 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>
> >> On Wed, Oct 27, 2021 at 3:23 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >>>
> >>> On Tue, Oct 26, 2021 at 9:19 PM <kerneljasonxing@gmail.com> wrote:
> >>>>
> >>>> From: Jason Xing <xingwanli@kuaishou.com>
> >>>>
> >>>> Setting the @next of the last skb to NULL to prevent the panic in future
> >>>> when someone does something to the last of the gro list but its @next is
> >>>> invalid.
> >>>>
> >>>> For example, without the fix (commit: ece23711dd95), a panic could happen
> >>>> with the clsact loaded when skb is redirected and then validated in
> >>>> validate_xmit_skb_list() which could access the error addr of the @next
> >>>> of the last skb. Thus, "general protection fault" would appear after that.
> >>>>
> >>>> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> >>>> ---
> >>>>  net/core/skbuff.c | 1 +
> >>>>  1 file changed, 1 insertion(+)
> >>>>
> >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>>> index 2170bea..7b248f1 100644
> >>>> --- a/net/core/skbuff.c
> >>>> +++ b/net/core/skbuff.c
> >>>> @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
> >>>>                 skb_shinfo(p)->frag_list = skb;
> >>>>         else
> >>>>                 NAPI_GRO_CB(p)->last->next = skb;
> >>>> +       skb->next = NULL;
> >>>>         NAPI_GRO_CB(p)->last = skb;
> >>>
> >>> Besides, I'm a little bit confused that this operation inserts the
> >>> newest skb into the tail of the flow, so the tail of flow is the
> >>> newest, head oldest. The patch (commit: 600adc18) introduces the flush
> >>> of the oldest when the flow is full to lower the latency, but actually
> >>> it fetches the tail of the flow. Do I get something wrong here? I feel
> >>
> >> I have to update this part. The commit 600adc18 evicts and flushes the
> >> oldest flow. But for the current kernel, when
> >> "napi->gro_hash[hash].count >= MAX_GRO_SKBS" happens, the
> >> gro_flush_oldest() flushes the oldest skb of one certain flow,
> >> actually it is the newest skb because it is at the end of the list.
>
> it seems the below is more matched with the gro_flush_oldest() instead
> of the above code block:
> https://elixir.bootlin.com/linux/v5.15-rc3/source/net/core/dev.c#L6118
>

What you said is the @skb->list but not the list between skbs which is
connected by skb->next when the new incoming skb needs to get merged.
The @skb->list->next/prev is not the same as @skb->next.

> >
> > I just submitted another patch to explain how it happens, please help
> > me review both patches.
> >
> > Link: https://lore.kernel.org/lkml/20211027084944.4508-1-kerneljasonxing@gmail.com/
> >
> > Thanks again,
> > Jason
> >
> >>
> >>> it is really odd.
> >>>
> >>> Thanks,
> >>> Jason
> >>>
> >>>>         __skb_header_release(skb);
> >>>>         lp = p;
> >>>> --
> >>>> 1.8.3.1
> >>>>
> > .
> >
Jason Xing Oct. 27, 2021, 1:57 p.m. UTC | #6
On Wed, Oct 27, 2021 at 8:54 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Oct 27, 2021 at 8:40 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >
> > On 2021/10/27 16:56, Jason Xing wrote:
> > > On Wed, Oct 27, 2021 at 4:07 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >>
> > >> On Wed, Oct 27, 2021 at 3:23 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >>>
> > >>> On Tue, Oct 26, 2021 at 9:19 PM <kerneljasonxing@gmail.com> wrote:
> > >>>>
> > >>>> From: Jason Xing <xingwanli@kuaishou.com>
> > >>>>
> > >>>> Setting the @next of the last skb to NULL to prevent the panic in future
> > >>>> when someone does something to the last of the gro list but its @next is
> > >>>> invalid.
> > >>>>
> > >>>> For example, without the fix (commit: ece23711dd95), a panic could happen
> > >>>> with the clsact loaded when skb is redirected and then validated in
> > >>>> validate_xmit_skb_list() which could access the error addr of the @next
> > >>>> of the last skb. Thus, "general protection fault" would appear after that.
> > >>>>
> > >>>> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> > >>>> ---
> > >>>>  net/core/skbuff.c | 1 +
> > >>>>  1 file changed, 1 insertion(+)
> > >>>>
> > >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > >>>> index 2170bea..7b248f1 100644
> > >>>> --- a/net/core/skbuff.c
> > >>>> +++ b/net/core/skbuff.c
> > >>>> @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
> > >>>>                 skb_shinfo(p)->frag_list = skb;
> > >>>>         else
> > >>>>                 NAPI_GRO_CB(p)->last->next = skb;
> > >>>> +       skb->next = NULL;
> > >>>>         NAPI_GRO_CB(p)->last = skb;
> > >>>
> > >>> Besides, I'm a little bit confused that this operation inserts the
> > >>> newest skb into the tail of the flow, so the tail of flow is the
> > >>> newest, head oldest. The patch (commit: 600adc18) introduces the flush
> > >>> of the oldest when the flow is full to lower the latency, but actually
> > >>> it fetches the tail of the flow. Do I get something wrong here? I feel
> > >>
> > >> I have to update this part. The commit 600adc18 evicts and flushes the
> > >> oldest flow. But for the current kernel, when
> > >> "napi->gro_hash[hash].count >= MAX_GRO_SKBS" happens, the
> > >> gro_flush_oldest() flushes the oldest skb of one certain flow,
> > >> actually it is the newest skb because it is at the end of the list.
> >
> > it seems the below is more matched with the gro_flush_oldest() instead
> > of the above code block:
> > https://elixir.bootlin.com/linux/v5.15-rc3/source/net/core/dev.c#L6118
> >
>
> What you said is the @skb->list but not the list between skbs which is
> connected by skb->next when the new incoming skb needs to get merged.
> The @skb->list->next/prev is not the same as @skb->next.
>
> > >
> > > I just submitted another patch to explain how it happens, please help
> > > me review both patches.
> > >
> > > Link: https://lore.kernel.org/lkml/20211027084944.4508-1-kerneljasonxing@gmail.com/
> > >

Emm, I think you're right, Yunsheng. The gro_flush_oldest() fetches
the list of @skb->list.
Do you think the tail of skb's next pointer should be set to NULL?

Thanks,
Jason

> > > Thanks again,
> > > Jason
> > >
> > >>
> > >>> it is really odd.
> > >>>
> > >>> Thanks,
> > >>> Jason
> > >>>
> > >>>>         __skb_header_release(skb);
> > >>>>         lp = p;
> > >>>> --
> > >>>> 1.8.3.1
> > >>>>
> > > .
> > >
Eric Dumazet Oct. 27, 2021, 7:20 p.m. UTC | #7
On 10/26/21 6:18 AM, kerneljasonxing@gmail.com wrote:
> From: Jason Xing <xingwanli@kuaishou.com>
> 
> Setting the @next of the last skb to NULL to prevent the panic in future
> when someone does something to the last of the gro list but its @next is
> invalid.
> 
> For example, without the fix (commit: ece23711dd95), a panic could happen
> with the clsact loaded when skb is redirected and then validated in
> validate_xmit_skb_list() which could access the error addr of the @next
> of the last skb. Thus, "general protection fault" would appear after that.
> 

If this a bug, please provide a Fixes: tag

> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
> ---
>  net/core/skbuff.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 2170bea..7b248f1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>  		skb_shinfo(p)->frag_list = skb;
>  	else
>  		NAPI_GRO_CB(p)->last->next = skb;
> +	skb->next = NULL;

Really at this point skb->next must be already NULL.

Please provide a stack trace so that we fix the caller instead
of adding more code in GRO layer.


>  	NAPI_GRO_CB(p)->last = skb;
>  	__skb_header_release(skb);
>  	lp = p;
>
Eric Dumazet Oct. 27, 2021, 7:25 p.m. UTC | #8
On 10/27/21 1:07 AM, Jason Xing wrote:
> On Wed, Oct 27, 2021 at 3:23 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>>
>> On Tue, Oct 26, 2021 at 9:19 PM <kerneljasonxing@gmail.com> wrote:
>>>
>>> From: Jason Xing <xingwanli@kuaishou.com>
>>>
>>> Setting the @next of the last skb to NULL to prevent the panic in future
>>> when someone does something to the last of the gro list but its @next is
>>> invalid.
>>>
>>> For example, without the fix (commit: ece23711dd95), a panic could happen
>>> with the clsact loaded when skb is redirected and then validated in
>>> validate_xmit_skb_list() which could access the error addr of the @next
>>> of the last skb. Thus, "general protection fault" would appear after that.
>>>
>>> Signed-off-by: Jason Xing <xingwanli@kuaishou.com>
>>> ---
>>>  net/core/skbuff.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 2170bea..7b248f1 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -4396,6 +4396,7 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
>>>                 skb_shinfo(p)->frag_list = skb;
>>>         else
>>>                 NAPI_GRO_CB(p)->last->next = skb;
>>> +       skb->next = NULL;
>>>         NAPI_GRO_CB(p)->last = skb;
>>
>> Besides, I'm a little bit confused that this operation inserts the
>> newest skb into the tail of the flow, so the tail of flow is the
>> newest, head oldest. The patch (commit: 600adc18) introduces the flush
>> of the oldest when the flow is full to lower the latency, but actually
>> it fetches the tail of the flow. Do I get something wrong here? I feel
> 
> I have to update this part. The commit 600adc18 evicts and flushes the
> oldest flow. But for the current kernel, when
> "napi->gro_hash[hash].count >= MAX_GRO_SKBS" happens, the
> gro_flush_oldest() flushes the oldest skb of one certain flow,
> actually it is the newest skb because it is at the end of the list.

GRO only keeps one skb per flow in the main hash/lru.

I think you are not understanding GRO correctly.
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2170bea..7b248f1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4396,6 +4396,7 @@  int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 		skb_shinfo(p)->frag_list = skb;
 	else
 		NAPI_GRO_CB(p)->last->next = skb;
+	skb->next = NULL;
 	NAPI_GRO_CB(p)->last = skb;
 	__skb_header_release(skb);
 	lp = p;