
[RFC] usb: gadget: ncm: Fix handling of zero block length packets

Message ID 20240226112816.2616719-1-quic_kriskura@quicinc.com (mailing list archive)
State Superseded

Commit Message

Krishna Kurapati Feb. 26, 2024, 11:28 a.m. UTC
While connected to a Linux host with CDC_NCM_NTB_DEF_SIZE_TX
set to 65536, we observed short packets arriving occasionally,
at intervals of 5-10 seconds, whose block length is zero but
which still contain 1-2 valid datagrams.

According to the NCM spec:

"If wBlockLength = 0x0000, the block is terminated by a
short packet. In this case, the USB transfer must still
be shorter than dwNtbInMaxSize or dwNtbOutMaxSize. If
exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent,
and the size is a multiple of wMaxPacketSize for the
given pipe, then no ZLP shall be sent.

wBlockLength= 0x0000 must be used with extreme care, because
of the possibility that the host and device may get out of
sync, and because of test issues.

wBlockLength = 0x0000 allows the sender to reduce latency by
starting to send a very large NTB, and then shortening it when
the sender discovers that there’s not sufficient data to justify
sending a large NTB"

However, the current implementation detects multiple NTBs in a
single giveback by checking whether the number of leftover bytes
to be processed is zero. If the block length reads zero, the
leftover byte count never decreases, so the same NTB is parsed
infinitely, which leads to a crash. Fix this by bailing out if
the block length reads zero.

Fixes: 427694cfaafa ("usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call")
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
---

PS: Although this issue was seen only after CDC_NCM_NTB_DEF_SIZE_TX
was changed to 64K on the host side, I believe it can occur at
any time, since the spec permits it. I have also assumed that a
giveback whose block length is zero contains only one NTB, not
multiple ones.
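
To illustrate the assumption above, here is a minimal standalone sketch
(not the f_ncm.c code) of how a receiver could derive the effective
length of an NTB from its 16-bit header. The names rd_le16() and
ntb16_effective_len() are hypothetical. The field offsets follow the
NTH16 layout in the NCM spec, and the sketch does not check dwSignature.
A zero wBlockLength is treated as "this NTB runs to the end of the
transfer", which is exactly the one-NTB-per-giveback assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NTH16_LEN 12 /* dwSignature(4) wHeaderLength(2) wSequence(2)
                        wBlockLength(2) wNdpIndex(2) */

/* Hypothetical helper: read a little-endian 16-bit field. */
static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Effective length of the NTB starting at ntb, given that `remaining`
 * bytes of the USB transfer are left. Returns 0 if the header does not
 * fit or wBlockLength is inconsistent with the transfer.
 */
static size_t ntb16_effective_len(const uint8_t *ntb, size_t remaining)
{
	uint16_t block_len;

	assert(ntb != NULL);
	if (remaining < NTH16_LEN)
		return 0;

	block_len = rd_le16(ntb + 8); /* wBlockLength */
	if (block_len == 0)
		return remaining; /* terminated by the short packet */
	if (block_len < NTH16_LEN || block_len > remaining)
		return 0;
	return block_len;
}
```

With a zero wBlockLength there is no way to locate a second NTB in the
same transfer, so the sketch hands back everything that is left.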

 drivers/usb/gadget/function/f_ncm.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Greg KH Feb. 26, 2024, 1:13 p.m. UTC | #1
On Mon, Feb 26, 2024 at 04:58:16PM +0530, Krishna Kurapati wrote:
> While connecting to a Linux host with CDC_NCM_NTB_DEF_SIZE_TX
> set to 65536, it has been observed that we receive short packets,
> which come at interval of 5-10 seconds sometimes and have block
> length zero but still contain 1-2 valid datagrams present.
> 
> According to the NCM spec:
> 
> "If wBlockLength = 0x0000, the block is terminated by a
> short packet. In this case, the USB transfer must still
> be shorter than dwNtbInMaxSize or dwNtbOutMaxSize. If
> exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent,
> and the size is a multiple of wMaxPacketSize for the
> given pipe, then no ZLP shall be sent.
> 
> wBlockLength= 0x0000 must be used with extreme care, because
> of the possibility that the host and device may get out of
> sync, and because of test issues.
> 
> wBlockLength = 0x0000 allows the sender to reduce latency by
> starting to send a very large NTB, and then shortening it when
> the sender discovers that there’s not sufficient data to justify
> sending a large NTB"
> 
> However, there is a potential issue with the current implementation,
> as it checks for the occurrence of multiple NTBs in a single
> giveback by verifying if the leftover bytes to be processed is zero
> or not. If the block length reads zero, we would process the same
> NTB infintely because the leftover bytes is never zero and it leads
> to a crash. Fix this by bailing out if block length reads zero.
> 
> Fixes: 427694cfaafa ("usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call")
> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
> ---
> 
> PS: Although this issue was seen after CDC_NCM_NTB_DEF_SIZE_TX
> was modified to 64K on host side, I still believe this
> can come up at any time as per the spec. Also I assumed
> that the giveback where block length is zero, has only
> one NTB and not multiple ones.

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases.  To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
Maciej Żenczykowski Feb. 26, 2024, 9:56 p.m. UTC | #2
On Mon, Feb 26, 2024 at 3:28 AM Krishna Kurapati
<quic_kriskura@quicinc.com> wrote:
>
> While connecting to a Linux host with CDC_NCM_NTB_DEF_SIZE_TX
> set to 65536, it has been observed that we receive short packets,
> which come at interval of 5-10 seconds sometimes and have block
> length zero but still contain 1-2 valid datagrams present.
>
> According to the NCM spec:
>
> "If wBlockLength = 0x0000, the block is terminated by a
> short packet. In this case, the USB transfer must still
> be shorter than dwNtbInMaxSize or dwNtbOutMaxSize. If
> exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent,
> and the size is a multiple of wMaxPacketSize for the
> given pipe, then no ZLP shall be sent.
>
> wBlockLength= 0x0000 must be used with extreme care, because
> of the possibility that the host and device may get out of
> sync, and because of test issues.
>
> wBlockLength = 0x0000 allows the sender to reduce latency by
> starting to send a very large NTB, and then shortening it when
> the sender discovers that there’s not sufficient data to justify
> sending a large NTB"
>
> However, there is a potential issue with the current implementation,
> as it checks for the occurrence of multiple NTBs in a single
> giveback by verifying if the leftover bytes to be processed is zero
> or not. If the block length reads zero, we would process the same
> NTB infintely because the leftover bytes is never zero and it leads
> to a crash. Fix this by bailing out if block length reads zero.
>
> Fixes: 427694cfaafa ("usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call")
> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
> ---
>
> PS: Although this issue was seen after CDC_NCM_NTB_DEF_SIZE_TX
> was modified to 64K on host side, I still believe this
> can come up at any time as per the spec. Also I assumed
> that the giveback where block length is zero, has only
> one NTB and not multiple ones.
>
>  drivers/usb/gadget/function/f_ncm.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
> index e2a059cfda2c..355e370e5140 100644
> --- a/drivers/usb/gadget/function/f_ncm.c
> +++ b/drivers/usb/gadget/function/f_ncm.c
> @@ -1337,6 +1337,9 @@ static int ncm_unwrap_ntb(struct gether *port,
>         VDBG(port->func.config->cdev,
>              "Parsed NTB with %d frames\n", dgram_counter);
>
> +       if (block_len == 0)
> +               goto done;
> +
>         to_process -= block_len;
>
>         /*
> @@ -1351,6 +1354,7 @@ static int ncm_unwrap_ntb(struct gether *port,
>                 goto parse_ntb;
>         }
>
> +done:
>         dev_consume_skb_any(skb);
>
>         return 0;
> --
> 2.34.1
>

In general this is of course fine (though see Greg's auto-complaint).

I haven't thought too much about this, but I just wonder whether the
check for block_len == 0
shouldn't be just after block_len is read, ie. somewhere just after:

block_len = get_ncm(&tmp, opts->block_length);

as it is kind of weird to be handling block_len == 0 at the point where
you are already theoretically done processing the block...

I guess, as is, this assumes the block isn't actually of length 0,
since there's a bunch of following get_ncm() calls...
Are those guaranteed to be valid?

I guess I don't actually see the infinite loop with block_len == 0,
since get_ncm() always moves us forward...

Maybe your patch *is* correct as is, and you just need a comment
explaining *why* block_len == 0 is terminal at the spot you're adding the check.

Also couldn't you fix this without goto, by changing

  } else if (to_process > 0) {
to
  } else if (to_process && block_len) {
    // See NCM spec.  zero block_len means short packet.

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Krishna Kurapati Feb. 27, 2024, 2:40 a.m. UTC | #3
On 2/27/2024 3:26 AM, Maciej Żenczykowski wrote:
> On Mon, Feb 26, 2024 at 3:28 AM Krishna Kurapati
> <quic_kriskura@quicinc.com> wrote:
>>
>> While connecting to a Linux host with CDC_NCM_NTB_DEF_SIZE_TX
>> set to 65536, it has been observed that we receive short packets,
>> which come at interval of 5-10 seconds sometimes and have block
>> length zero but still contain 1-2 valid datagrams present.
>>
>> According to the NCM spec:
>>
>> "If wBlockLength = 0x0000, the block is terminated by a
>> short packet. In this case, the USB transfer must still
>> be shorter than dwNtbInMaxSize or dwNtbOutMaxSize. If
>> exactly dwNtbInMaxSize or dwNtbOutMaxSize bytes are sent,
>> and the size is a multiple of wMaxPacketSize for the
>> given pipe, then no ZLP shall be sent.
>>
>> wBlockLength= 0x0000 must be used with extreme care, because
>> of the possibility that the host and device may get out of
>> sync, and because of test issues.
>>
>> wBlockLength = 0x0000 allows the sender to reduce latency by
>> starting to send a very large NTB, and then shortening it when
>> the sender discovers that there’s not sufficient data to justify
>> sending a large NTB"
>>
>> However, there is a potential issue with the current implementation,
>> as it checks for the occurrence of multiple NTBs in a single
>> giveback by verifying if the leftover bytes to be processed is zero
>> or not. If the block length reads zero, we would process the same
>> NTB infintely because the leftover bytes is never zero and it leads
>> to a crash. Fix this by bailing out if block length reads zero.
>>
>> Fixes: 427694cfaafa ("usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call")
>> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
>> ---
>>
>> PS: Although this issue was seen after CDC_NCM_NTB_DEF_SIZE_TX
>> was modified to 64K on host side, I still believe this
>> can come up at any time as per the spec. Also I assumed
>> that the giveback where block length is zero, has only
>> one NTB and not multiple ones.
>>
>>   drivers/usb/gadget/function/f_ncm.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
>> index e2a059cfda2c..355e370e5140 100644
>> --- a/drivers/usb/gadget/function/f_ncm.c
>> +++ b/drivers/usb/gadget/function/f_ncm.c
>> @@ -1337,6 +1337,9 @@ static int ncm_unwrap_ntb(struct gether *port,
>>          VDBG(port->func.config->cdev,
>>               "Parsed NTB with %d frames\n", dgram_counter);
>>
>> +       if (block_len == 0)
>> +               goto done;
>> +
>>          to_process -= block_len;
>>
>>          /*
>> @@ -1351,6 +1354,7 @@ static int ncm_unwrap_ntb(struct gether *port,
>>                  goto parse_ntb;
>>          }
>>
>> +done:
>>          dev_consume_skb_any(skb);
>>
>>          return 0;
>> --
>> 2.34.1
>>
> 
> In general this is of course fine (though see Greg's auto-complaint).
> 
> I haven't thought too much about this, but I just wonder whether the
> check for block_len == 0
> shouldn't be just after block_len is read, ie. somewhere just after:
> 
> block_len = get_ncm(&tmp, opts->block_length);
> 
> as it is kind of weird to be handling block_len == 0 at the point where
> you are already theoretically done processing the block...
> 
> I guess, as is, this assumes the block isn't actually of length 0,
> since there's a bunch of following get_ncm() calls...
> Are those guaranteed to be valid?
> 

I had the same doubt and tried it. When I bailed out as soon as
block_len read zero, without processing the datagrams present, even
ping stopped working. Everything works only when the datagrams in
this zero-block-length NTB are parsed properly.

> I guess I don't actually see the infinite loop with block_len == 0,
> since get_ncm() always moves us forward...
> 

The loop normally terminates because each pass moves the buffer
pointer forward by block_len and reduces the to_process variable by
the same amount, until to_process reaches zero or one. When the block
length is zero, the buffer pointer never moves and to_process never
decreases, so we keep processing the same NTB over and over again.
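
The behaviour described above can be modelled outside the driver. The
sketch below is a simplified model, not the f_ncm.c code: count_passes()
and PASS_CAP are hypothetical, len_at[off] stands for the wBlockLength
that would be read at byte offset off of the transfer, and the 1-byte
zero-pad case is left out. With guard == 0 it mirrors the current
"to_process > 0" test; with guard == 1 it adds the "block_len != 0"
condition that was suggested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PASS_CAP 1000 /* arbitrary cap so the unguarded model terminates */

/*
 * Returns how many times an NTB is parsed for one giveback of
 * xfer_len bytes, or -1 if PASS_CAP passes were made without
 * finishing (the stand-in for "loops forever").
 */
static int count_passes(const uint16_t *len_at, int xfer_len, int guard)
{
	int off = 0;
	int to_process = xfer_len;
	int passes = 0;

	assert(len_at != NULL && xfer_len > 0);

	while (passes < PASS_CAP) {
		uint16_t block_len = len_at[off];

		passes++; /* the datagrams of this NTB are parsed here */
		to_process -= block_len;

		if (to_process > 0 && (!guard || block_len != 0))
			off += block_len; /* next NTB in the same giveback */
		else
			return passes;
	}
	return -1;
}
```

For a 64-byte transfer holding two 32-byte NTBs, both variants make two
passes. For a transfer whose first NTB reads block length zero, the
unguarded variant never advances and hits the cap, while the guarded one
parses the NTB once and stops.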

> Maybe your patch *is* correct as is, and you just need a comment
> explaining *why* block_len == 0 is terminal at the spot you're adding the check.
> 
> Also couldn't you fix this without goto, by changing
> 
>    } else if (to_process > 0) {
> to
>    } else if (to_process && block_len) {
>      // See NCM spec.  zero block_len means short packet.
> 

I will test this out (although, looking at it, I am sure it will
work) and send v2 with this diff.

Thanks for the review.

Regards,
Krishna,
Krishna Kurapati Feb. 29, 2024, 5:39 a.m. UTC | #4
On 2/27/2024 8:10 AM, Krishna Kurapati PSSNV wrote:
> 

>>
>> In general this is of course fine (though see Greg's auto-complaint).
>>
>> I haven't thought too much about this, but I just wonder whether the
>> check for block_len == 0
>> shouldn't be just after block_len is read, ie. somewhere just after:
>>
>> block_len = get_ncm(&tmp, opts->block_length);
>>
>> as it is kind of weird to be handling block_len == 0 at the point where
>> you are already theoretically done processing the block...
>>
>> I guess, as is, this assumes the block isn't actually of length 0,
>> since there's a bunch of following get_ncm() calls...
>> Are those guaranteed to be valid?
>>
> 
> I did get this doubt and tried it. I bailed out as soon as I found out 
> block len is zero without actually processing the datagrams present and 
> when I did that even ping doesn't work. Everything works only when the 
> datagrams in this zero block len NTB are parsed properly.
> 
>> I guess I don't actually see the infinite loop with block_len == 0,
>> since get_ncm() always moves us forward...
>>
> 
> The infinite loop occurs because we keep moving the buffer pointer 
> forward and keep processing the giveback until to_process variable 
> becomes zero or one. In case block length is zero, we never move the 
> buffer pointer forward and never reduce to_process variable and hence 
> keep infinitely processing the same NTB over and over again.
> 
>> Maybe your patch *is* correct as is, and you just need a comment
>> explaining *why* block_len == 0 is terminal at the spot you're adding 
>> the check.
>>
>> Also couldn't you fix this without goto, by changing
>>
>>    } else if (to_process > 0) {
>> to
>>    } else if (to_process && block_len) {
>>      // See NCM spec.  zero block_len means short packet.
>>
> 
> I will test this out once (although I know that looking at it, it would 
> definitely work) and send v2 with this diff.
> 
> Thanks for the review.
> 

Hi Maciej, Greg,

  Thanks for approving v2.

  I am not sure if this is the right forum for this question, but I have
one query. In the NCM driver, register_netdev is called during bind, but
the corresponding cleanup is called during free_inst. This means that if
the usb0 interface is created for NCM on bind, or on a composition switch
into NCM (the first composition switch after bootup), it is removed only
when the entire g1/functions/ncm.0 folder is removed.

  Shouldn't we clean up and remove the usb0 interface in unbind, as the
counterpart of bind? By extension, this question also applies to
f_eem, f_ecm and f_rndis, where it is done in a similar manner. I was
wondering if anyone could explain why it was designed that way.

Regards,
Krishna,

Patch

diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index e2a059cfda2c..355e370e5140 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -1337,6 +1337,9 @@  static int ncm_unwrap_ntb(struct gether *port,
 	VDBG(port->func.config->cdev,
 	     "Parsed NTB with %d frames\n", dgram_counter);
 
+	if (block_len == 0)
+		goto done;
+
 	to_process -= block_len;
 
 	/*
@@ -1351,6 +1354,7 @@  static int ncm_unwrap_ntb(struct gether *port,
 		goto parse_ntb;
 	}
 
+done:
 	dev_consume_skb_any(skb);
 
 	return 0;