diff mbox series

[net] qmi_wwan: Clone the skb when in pass-through mode

Message ID 20210614141849.3587683-1-kristian.evensen@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] qmi_wwan: Clone the skb when in pass-through mode | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers fail 1 blamed authors not CCed: kuba@kernel.org; 3 maintainers not CCed: kuba@kernel.org davem@davemloft.net linux-usb@vger.kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes fail Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Kristian Evensen June 14, 2021, 2:18 p.m. UTC
The skb that we pass to the rmnet driver is owned by usbnet and is freed
soon after the rx_fixup() callback is called (in usbnet_bh()).  There is
no guarantee that rmnet is done handling the skb before it is freed. We
should clone the skb before we call netif_rx() to prevent use-after-free
and misc. kernel oops.

Fixes: 59e139cf0b32 ("net: qmi_wwan: Add pass through mode")

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
---
 drivers/net/usb/qmi_wwan.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Bjørn Mork June 14, 2021, 2:45 p.m. UTC | #1
Kristian Evensen <kristian.evensen@gmail.com> writes:

> The skb that we pass to the rmnet driver is owned by usbnet and is freed
> soon after the rx_fixup() callback is called (in usbnet_bh()).  There is
> no guarantee that rmnet is done handling the skb before it is freed. We
> should clone the skb before we call netif_rx() to prevent use-after-free
> and misc. kernel oops.
>
> Fixes: 59e139cf0b32 ("net: qmi_wwan: Add pass through mode")
>
> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
> ---
>  drivers/net/usb/qmi_wwan.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index db8d3a4f2678..5ac307eb0bfd 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -620,6 +620,10 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>  		return qmimux_rx_fixup(dev, skb);
>  
>  	if (info->flags & QMI_WWAN_FLAG_PASS_THROUGH) {
> +		skb = skb_clone(skb, GFP_ATOMIC);
> +		if (!skb)
> +			return 0;
> +
>  		skb->protocol = htons(ETH_P_MAP);
>  		return (netif_rx(skb) == NET_RX_SUCCESS);
>  	}


Thanks for pointing this out.  But it still looks strange to me.  Why do
we call netif_rx(skb) here instead of just returning 1 and leave that
for usbnet_skb_return()?  With cloning we end up doing eth_type_trans()
on the duplicate - is that wise?


Bjørn
Kristian Evensen June 14, 2021, 3:49 p.m. UTC | #2
Hi Bjørn,

On Mon, Jun 14, 2021 at 4:46 PM Bjørn Mork <bjorn@mork.no> wrote:
> Thanks for pointing this out.  But it still looks strange to me.  Why do
> we call netif_rx(skb) here instead of just returning 1 and leave that
> for usbnet_skb_return()?  With cloning we end up doing eth_type_trans()
> on the duplicate - is that wise?

Thanks for pointing this out. You are right and looking at the code
again, we should end up in usbnet_skb_return() if the call to netif_rx
is succesful. I do agree that calling netif_rx() is a bit strange,
considering that usbnet_skb_return() already calls this function.
Perhaps the issue is that we end up calling netif_rx() on the same skb
twice? rmnet also frees the skb when done with the de-aggregation.

I will do some more testing tomorrow and see if I can figure out
something. So far, the only thing I know is that if I try to perform a
more demanding transfer (like downloading a file) using qmi_wwan +
rmnet then I get a kernel oops immediatly.

Kristian
Kristian Evensen June 14, 2021, 5:02 p.m. UTC | #3
Hi Bjørn,

On Mon, Jun 14, 2021 at 5:49 PM Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> I will do some more testing tomorrow and see if I can figure out
> something. So far, the only thing I know is that if I try to perform a
> more demanding transfer (like downloading a file) using qmi_wwan +
> rmnet then I get a kernel oops immediatly.

I got curios and decided to test your proposed changed today.
Replacing the call to netif_rx() with return 1, and then letting
usbnet_skb_return() do the netif_rx() call, seems to have resolved the
issue. At least I am not longer able to trigger the oops using my
earlier reproducer (running a couple of Speedtest).

Unless someone beats me to it, I will prepare another patch tomorrow.
Thanks for the help.

Kristian
Jakub Kicinski June 14, 2021, 8:05 p.m. UTC | #4
On Mon, 14 Jun 2021 16:45:55 +0200 Bjørn Mork wrote:
> Kristian Evensen <kristian.evensen@gmail.com> writes:
> 
> > The skb that we pass to the rmnet driver is owned by usbnet and is freed
> > soon after the rx_fixup() callback is called (in usbnet_bh()).  There is
> > no guarantee that rmnet is done handling the skb before it is freed. We
> > should clone the skb before we call netif_rx() to prevent use-after-free
> > and misc. kernel oops.
> >
> > Fixes: 59e139cf0b32 ("net: qmi_wwan: Add pass through mode")
> >
> > Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
> > ---
> >  drivers/net/usb/qmi_wwan.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> > index db8d3a4f2678..5ac307eb0bfd 100644
> > --- a/drivers/net/usb/qmi_wwan.c
> > +++ b/drivers/net/usb/qmi_wwan.c
> > @@ -620,6 +620,10 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
> >  		return qmimux_rx_fixup(dev, skb);
> >  
> >  	if (info->flags & QMI_WWAN_FLAG_PASS_THROUGH) {
> > +		skb = skb_clone(skb, GFP_ATOMIC);
> > +		if (!skb)
> > +			return 0;
> > +
> >  		skb->protocol = htons(ETH_P_MAP);
> >  		return (netif_rx(skb) == NET_RX_SUCCESS);
> >  	}  
> 
> Thanks for pointing this out.  But it still looks strange to me.  Why do
> we call netif_rx(skb) here instead of just returning 1 and leave that
> for usbnet_skb_return()?  With cloning we end up doing eth_type_trans()
> on the duplicate - is that wise?

Agreed on the cloning being a strange solution. Kristian, were you able
to reproduce the problem on upstream kernels?

It does look pretty strange that qmimux_rx_fixup() copies out all
packets and receives them, and then let's usbnet to process the
multi-frame skb without even fulling off the qmimux_hdr. I'm probably
missing something.. otherwise sth like FLAG_MULTI_PACKET may be in
order?
Bjørn Mork June 15, 2021, 6:24 a.m. UTC | #5
Jakub Kicinski <kuba@kernel.org> writes:

> It does look pretty strange that qmimux_rx_fixup() copies out all
> packets and receives them, and then let's usbnet to process the
> multi-frame skb without even fulling off the qmimux_hdr. I'm probably
> missing something.. otherwise sth like FLAG_MULTI_PACKET may be in
> order?

Yes, maybe it is?  We'd have to call usbnet_skb_return() for the plain
IP frames then, but that might come out cleaner?


Bjørn
Kristian Evensen June 15, 2021, 9:03 a.m. UTC | #6
Hi Jakub,

On Mon, Jun 14, 2021 at 10:05 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Agreed on the cloning being a strange solution. Kristian, were you able
> to reproduce the problem on upstream kernels?

Yes, after Bjørn's comment I realized that cloning was not a good
solution. I should have taken a closer look at the usbnet code, so
sorry about that. The most recent kernel I have managed to work well
with my boards is 5.4.123, but I see that 5.10 is available as well
(OpenWrt). However, I have backported all changes made to rmnet and
qmi_wwan between 5.4 and net-next as of yesterday. I was hoping that a
backport of the changes to those two drivers would be enough, but
perhaps there is something I have missed. I will try to get 5.10 to
run and see if that helps.

However, I have spent some more time looking into the code today.
Bjørn is right that calling netif_rx() from qmi_wwan is strange, at
least when in passthrough mode. The rx_fixup function will return 1
(assuming netif_rx() is successful), which in turn will lead to
usbnet_skb_return() being called and netif_rx() being called a second
time for the same skb. I have to admit that I don't know what will
happen when netif_rx() is called twice, but either call seems
redundant. I will submit a patch modifying the qmi_wwan rx_fixup
function to return 1 when the QMI_WWAN_FLAG_PASS_THROUGH. I believe it
is a nice clean-up and that is better to use as much of the existing
infrastructure as possible.

> It does look pretty strange that qmimux_rx_fixup() copies out all
> packets and receives them, and then let's usbnet to process the
> multi-frame skb without even fulling off the qmimux_hdr. I'm probably
> missing something.. otherwise sth like FLAG_MULTI_PACKET may be in
> order?

qmimux_rx_fixup() is different from what we are discussing here.
qmimux_rx_fixup() is used when the de-aggregation is performed by the
qmi_wwan driver, while the passthrough flag is set when the
de-aggregation is done by the rmnet driver. The logic in
qmimux_rx_fixup() is very similar to how the other usbnet mini-drivers
handles de-aggregation and also how de-aggregation is handled by for
example rmnet. I have no opinion on if the logic makes sens or not,
but at least the origin can be traced :)

Kristian
Bjørn Mork June 15, 2021, 10:04 a.m. UTC | #7
Kristian Evensen <kristian.evensen@gmail.com> writes:

>> It does look pretty strange that qmimux_rx_fixup() copies out all
>> packets and receives them, and then let's usbnet to process the
>> multi-frame skb without even fulling off the qmimux_hdr. I'm probably
>> missing something.. otherwise sth like FLAG_MULTI_PACKET may be in
>> order?
>
> qmimux_rx_fixup() is different from what we are discussing here.
> qmimux_rx_fixup() is used when the de-aggregation is performed by the
> qmi_wwan driver, while the passthrough flag is set when the
> de-aggregation is done by the rmnet driver. The logic in
> qmimux_rx_fixup() is very similar to how the other usbnet mini-drivers
> handles de-aggregation and also how de-aggregation is handled by for
> example rmnet. I have no opinion on if the logic makes sens or not,
> but at least the origin can be traced :)

Yes, FLAG_MULTI_PACKET is only applicable to the qmimux case. But I
think Jakub is right that we should set it anyway. There is no way to
return from rx_fixup without an error or further processing of the skb,
unless we set FLAG_MULTI_PACKET.  Or invent something else.  But setting
that flag and then add the necessary usnet_sb_return call doesn't look
too bad?



Bjørn
Kristian Evensen June 15, 2021, 10:51 a.m. UTC | #8
Hi Bjørn,

On Tue, Jun 15, 2021 at 12:04 PM Bjørn Mork <bjorn@mork.no> wrote:
> Yes, FLAG_MULTI_PACKET is only applicable to the qmimux case. But I
> think Jakub is right that we should set it anyway. There is no way to
> return from rx_fixup without an error or further processing of the skb,
> unless we set FLAG_MULTI_PACKET.  Or invent something else.  But setting
> that flag and then add the necessary usnet_sb_return call doesn't look
> too bad?

Just so that I am sure that we are on the same page. What you are
suggesting is something like:

* Update FLAG_MULTI_PACKET when qmimux is set/unset.
* Replace the call to netif_rx() inside qmimux_rx_fixup() with a call
to usbnet_skb_return.
* I guess we need to keep the code that updates the qmimux interface
counters. I guess we can just call this code unconditionally?

I think this would be a really nice solution. The same (at least
FLAG_MULTI_PACKET + usbnet_skb_return) could be applied to pass
through as well, giving us consistent handling of aggregated packets.
While we might not save a huge number of lines, I believe the
resulting code will be easier to understand.

If we agree that this is a good way forward, I can prepare a patch. I
have everything set up and ready to go.

Kristian
Kristian Evensen June 15, 2021, 11:04 a.m. UTC | #9
Hi,

On Tue, Jun 15, 2021 at 12:51 PM Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> I think this would be a really nice solution. The same (at least
> FLAG_MULTI_PACKET + usbnet_skb_return) could be applied to pass
> through as well, giving us consistent handling of aggregated packets.
> While we might not save a huge number of lines, I believe the
> resulting code will be easier to understand.

Apologies for the noise. When I check the code again, I see that as
long as FLAG_MULTI_PACKET is set, then we end up with usbnet freeing
the skb (we will always jump to done in rx_process()). So for the
pass-through case, I believe your initial suggestion of having
rx_fixup return 1 is the way to go.

Kristian
Bjørn Mork June 15, 2021, 1:39 p.m. UTC | #10
Kristian Evensen <kristian.evensen@gmail.com> writes:
> On Tue, Jun 15, 2021 at 12:51 PM Kristian Evensen
> <kristian.evensen@gmail.com> wrote:
>> I think this would be a really nice solution. The same (at least
>> FLAG_MULTI_PACKET + usbnet_skb_return) could be applied to pass
>> through as well, giving us consistent handling of aggregated packets.
>> While we might not save a huge number of lines, I believe the
>> resulting code will be easier to understand.
>
> Apologies for the noise. When I check the code again, I see that as
> long as FLAG_MULTI_PACKET is set, then we end up with usbnet freeing
> the skb (we will always jump to done in rx_process()). So for the
> pass-through case, I believe your initial suggestion of having
> rx_fixup return 1 is the way to go.

Yes, if we are to use FLAG_MULTI_PACKET then we must call
usbnet_skb_return() for all the non-muxed cases.  There is no clean way
to enable FLAG_MULTI_PACKET on-demand.

I am fine with either solution.  Whatever Jakub wants :-)


Bjørn
Jakub Kicinski June 15, 2021, 7:26 p.m. UTC | #11
On Tue, 15 Jun 2021 15:39:14 +0200 Bjørn Mork wrote:
> >> I think this would be a really nice solution. The same (at least
> >> FLAG_MULTI_PACKET + usbnet_skb_return) could be applied to pass
> >> through as well, giving us consistent handling of aggregated packets.
> >> While we might not save a huge number of lines, I believe the
> >> resulting code will be easier to understand.  
> >
> > Apologies for the noise. When I check the code again, I see that as
> > long as FLAG_MULTI_PACKET is set, then we end up with usbnet freeing
> > the skb (we will always jump to done in rx_process()). So for the
> > pass-through case, I believe your initial suggestion of having
> > rx_fixup return 1 is the way to go.  
> 
> Yes, if we are to use FLAG_MULTI_PACKET then we must call
> usbnet_skb_return() for all the non-muxed cases.  There is no clean way
> to enable FLAG_MULTI_PACKET on-demand.

Tricky piece of code. Perhaps we could add another return code 
to the rx_fixup call? Seems that we expect 0 or 1 today, maybe we 
can make 2 mean "data was copied out", and use that for the qmimux 
case?

> I am fine with either solution.  Whatever Jakub wants :-)

Well, turns out I was looking at the wrong netif_rx() so take
what I say with a grain of salt ;)
Jakub Kicinski June 15, 2021, 7:27 p.m. UTC | #12
On Tue, 15 Jun 2021 12:26:04 -0700 Jakub Kicinski wrote:
> > > Apologies for the noise. When I check the code again, I see that as
> > > long as FLAG_MULTI_PACKET is set, then we end up with usbnet freeing
> > > the skb (we will always jump to done in rx_process()). So for the
> > > pass-through case, I believe your initial suggestion of having
> > > rx_fixup return 1 is the way to go.    
> > 
> > Yes, if we are to use FLAG_MULTI_PACKET then we must call
> > usbnet_skb_return() for all the non-muxed cases.  There is no clean way
> > to enable FLAG_MULTI_PACKET on-demand.  
> 
> Tricky piece of code. Perhaps we could add another return code 
> to the rx_fixup call? Seems that we expect 0 or 1 today, maybe we 
> can make 2 mean "data was copied out", and use that for the qmimux 
> case?

And to be clear with that still do what Bjorn suggested and return 1 
instead of netif_rx() in the non-qmimux case.
Kristian Evensen June 16, 2021, 10:08 a.m. UTC | #13
Hi,

On Tue, Jun 15, 2021 at 9:26 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Tricky piece of code. Perhaps we could add another return code
> to the rx_fixup call? Seems that we expect 0 or 1 today, maybe we
> can make 2 mean "data was copied out", and use that for the qmimux
> case?

I took another look at the qmi_wwan and usbnet code, and I think we
can solve the problem at hand with the current infrastructure. I
believe all we have to do, is to change qmimux_rx_fixup() to return 0
and not 1. When the rx_fixup() callback returns 0, rx_progress() will
jump to "done" and queue the skb for clean-up. This is the same
behavior as if FLAG_MULTI_PACKET was set. I will test and prepare a
patch changing the return value, assuming it works well. Looking at
the loop in qmimux_rx_fixup I also see room for some clean-up, I will
try to merge everything into one.

Kristian
diff mbox series

Patch

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index db8d3a4f2678..5ac307eb0bfd 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -620,6 +620,10 @@  static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 		return qmimux_rx_fixup(dev, skb);
 
 	if (info->flags & QMI_WWAN_FLAG_PASS_THROUGH) {
+		skb = skb_clone(skb, GFP_ATOMIC);
+		if (!skb)
+			return 0;
+
 		skb->protocol = htons(ETH_P_MAP);
 		return (netif_rx(skb) == NET_RX_SUCCESS);
 	}