Message ID | 20200805075810.604063-1-lorenzo@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | usb: gadget: f_ncm: allow using NCM in SuperSpeed Plus gadgets. | expand |
Lorenzo Colitti <lorenzo@google.com> writes: > Currently, using f_ncm in a SuperSpeed Plus gadget results in > an oops in config_ep_by_speed because ncm_set_alt passes in NULL > ssp_descriptors. Fix this by defining new descriptors for > SuperSpeed Plus. (We cannot re-use the existing definitions for > the SuperSpeed descriptors, even though they are mostly the same, > because they are not fixed initializers). > > Also fix reported bandwidth to match bandwidth reported for > SuperSpeed. This calculation is already incorrect, because it > returns 851 Mbps and NCM can already reach speeds in excess of > 1.7 Gbps on a 5 Gbps port. But it's better to return 851 Mbps > than 9 Mbps for SuperSpeed Plus. > > Tested: enabled f_ncm on a dwc3 gadget and 10Gbps link, ran iperf > Signed-off-by: Lorenzo Colitti <lorenzo@google.com> > --- > drivers/usb/gadget/function/f_ncm.c | 79 ++++++++++++++++++++++++++++- > 1 file changed, 77 insertions(+), 2 deletions(-) > > diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c > index 1d900081b1..91f87165e7 100644 > --- a/drivers/usb/gadget/function/f_ncm.c > +++ b/drivers/usb/gadget/function/f_ncm.c > @@ -85,7 +85,8 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f) > /* peak (theoretical) bulk transfer rate in bits-per-second */ > static inline unsigned ncm_bitrate(struct usb_gadget *g) > { > - if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) > + if ((gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) || > + (gadget_is_superspeed_plus(g) && g->speed == USB_SPEED_SUPER_PLUS)) > return 13 * 1024 * 8 * 1000 * 8; shouldn't this be 16 * 1024 * 8 * 1000 * 8? I mean, assuming we have bMaxBurst set to 15. If I remember how this calculation was done it was: 13 packets / uFrame 1024 bytes per packet 8000 uFrames / second 8 bits / byte But for SS and SSP what controls the number of packets in a uFrame is bMaxBurst, which can go up to 16 (15 + 1). Seems like we should also update ss_ncm_bulk_comp_desc to set a bMaxBurst of 15: @@ -381,8 +381,8 @@ static struct usb_ss_ep_comp_descriptor ss_ncm_bulk_comp_desc = { .bLength = sizeof(ss_ncm_bulk_comp_desc), .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, - /* the following 2 values can be tweaked if necessary */ - /* .bMaxBurst = 0, */ + .bMaxBurst = 15, + /* the following value can be tweaked if necessary */ /* .bmAttributes = 0, */ }; But anyway, if we change 13 to 16, then we 1Gbps. How did you get 1.7Gbps? I think we should really update ncm_bitrate() to contain the correct equations for bitrate on different speeds. > @@ -400,6 +401,75 @@ static struct usb_descriptor_header *ncm_ss_function[] = { > NULL, > }; > > +/* super speed plus support: */ > + > +static struct usb_endpoint_descriptor ssp_ncm_notify_desc = { > + .bLength = USB_DT_ENDPOINT_SIZE, > + .bDescriptorType = USB_DT_ENDPOINT, > + > + .bEndpointAddress = USB_DIR_IN, > + .bmAttributes = USB_ENDPOINT_XFER_INT, > + .wMaxPacketSize = cpu_to_le16(NCM_STATUS_BYTECOUNT), > + .bInterval = USB_MS_TO_HS_INTERVAL(NCM_STATUS_INTERVAL_MS) > +}; > + > +static struct usb_ss_ep_comp_descriptor ssp_ncm_notify_comp_desc = { > + .bLength = sizeof(ssp_ncm_notify_comp_desc), > + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, > + > + /* the following 3 values can be tweaked if necessary */ > + /* .bMaxBurst = 0, */ > + /* .bmAttributes = 0, */ > + .wBytesPerInterval = cpu_to_le16(NCM_STATUS_BYTECOUNT), > +}; > + > +static struct usb_endpoint_descriptor ssp_ncm_in_desc = { > + .bLength = USB_DT_ENDPOINT_SIZE, > + .bDescriptorType = USB_DT_ENDPOINT, > + > + .bEndpointAddress = USB_DIR_IN, > + .bmAttributes = USB_ENDPOINT_XFER_BULK, > + .wMaxPacketSize = cpu_to_le16(1024), > +}; > + > +static struct usb_endpoint_descriptor ssp_ncm_out_desc = { > + .bLength = USB_DT_ENDPOINT_SIZE, > + .bDescriptorType = USB_DT_ENDPOINT, > + > + .bEndpointAddress = USB_DIR_OUT, > + .bmAttributes = USB_ENDPOINT_XFER_BULK, > + .wMaxPacketSize = cpu_to_le16(1024), > +}; > + > +static struct usb_ss_ep_comp_descriptor ssp_ncm_bulk_comp_desc = { > + .bLength = sizeof(ssp_ncm_bulk_comp_desc), > + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, > + > + /* the following 2 values can be tweaked if necessary */ > + /* .bMaxBurst = 0, */ should you add bMaxBurst = 15 here?
On Wed, Aug 5, 2020 at 7:21 PM Felipe Balbi <balbi@kernel.org> wrote: > But anyway, if we change 13 to 16, then we 1Gbps. How did you get > 1.7Gbps? I think we should really update ncm_bitrate() to contain the > correct equations for bitrate on different speeds. I got 1.7 Gbps because that's what I measured in iperf. :-) But actually after reading the spec I think that for SuperSpeed and above that calculation is meaningless, because bulk transfers are no longer limited by a set number of packets per microframe. The USB 3.2 spec has mostly replaced the words "microframe" with "bus interval", but I don't see any place in the spec that says the number of packets per bus interval is limited. Section 8.12.1.2 (Bulk IN Transactions) just says that "when the host is ready to receive bulk data, it sends an ACK TP" saying how many packets it's willing to accept, where the maximum is the burst size supported by the endpoint. After that, the host has to respond with an ACK to every data packet received, and every ACK specifies the number of data packets it expects from then on. It seems more correct that for SS and above the driver should simply just report the link speed minus theoretical bus overhead? Section 4.4.11 suggests that's about 10%, so it would announce 4.5 Gbps. > > +static struct usb_ss_ep_comp_descriptor ssp_ncm_bulk_comp_desc = { > > + .bLength = sizeof(ssp_ncm_bulk_comp_desc), > > + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, > > + > > + /* the following 2 values can be tweaked if necessary */ > > + /* .bMaxBurst = 0, */ > > should you add bMaxBurst = 15 here? I'm not sure. On my setup, it provides a fair performance boost (goes from ~1.7Gbps to ~2.3Gbps in, and ~620Mbps to ~720Mbps out). But I don't know whether there might be any compatibility constraints or hardware dependencies. I do see that the f_mass_storage driver sets it to 15: /* Calculate bMaxBurst, we know packet size is 1024 */ max_burst = min_t(unsigned, FSG_BUFLEN / 1024, 15); so perhaps this is fine to do in NCM too? If we want to set bMaxBurst to 15, should that be in this patch, or in a separate patch? Cheers, Lorenzo
USB3.0 / USB3.1 gen1 / USB3.2 gen 1x1 / 5gbps overhead is upwards of 20% (8b10b coding is 80% efficient). USB3.1 gen2 / USB3.2 gen 2x1 / 10gbps overhead is upwards of 3% (128b132b coding is nearly 97% efficient). however: USB3.2 gen 1x2 / 10gbps overhead is again 20% (since this is 8b10b on two 5gbps links on one cable) USB3.2 gen 2x2 / 20gbps overhead is again 3% (since this is 128b132b on two 10gbps links on one cable) On top of that you need to layer usb protocol overhead (the above is just link layer overhead). AFAICT for optimal xfer you need to transfer in 16KiB chunks, which get split into 16 1KiB pieces, each piece has overhead, plus there's a begin packet and final ack packet (ie. 18 packets total). I'm not entirely sure what the overhead is here, but my estimate: 16384 / (32 + 16*(32 + 1024) + 32) puts it at another 3.5% loss on top of the previous L1 overhead (ie. multiplicative). [Note: I'm not entirely sure if the first and final 32 are correct, but I'm pretty sure it's at least this much, if anything stuff is worse due to some unavoidable delays between data reception and ack, the upstream direction to host is even worse, since host asks for data, device provides it, host acks it, thus there's 2 data direction flip delays] This means: 5 gbps -> 5*8/10*0.965 = 3.86 gbps (USB 3.0 / USB3.1 gen1 / USB3.2 gen 1x1) 10 gbps -> 10*128/132*0.965 = 9.35 gbps (this is USB3.1 gen2 / USB3.2 gen 2x1) 10 gbps -> 10*8/10*0.965 = 7.72 gbps (this is dual link USB3.2 gen 1x2) 20 gbps -> 20*128/132*0.965 = 18.72 gbps (this is dual link USB3.2 gen 2x2) At least I'm pretty sure you physically can't go faster, though there might still be extra overhead I missed (which would make it even slower). (in particular the dual link cases seem to duplicate some control stuff across both cables, so overhead is probably a tad higher) > > > + /* the following 2 values can be tweaked if necessary */ > > > + /* .bMaxBurst = 0, */ > > > > should you add bMaxBurst = 15 here? > > I'm not sure. On my setup, it provides a fair performance boost (goes > from ~1.7Gbps to ~2.3Gbps in, and ~620Mbps to ~720Mbps out). But I > don't know whether there might be any compatibility constraints or > hardware dependencies. I do see that the f_mass_storage driver sets it > to 15: > > /* Calculate bMaxBurst, we know packet size is 1024 */ > max_burst = min_t(unsigned, FSG_BUFLEN / 1024, 15); > > so perhaps this is fine to do in NCM too? If we want to set bMaxBurst > to 15, should that be in this patch, or in a separate patch? I think we should. I would imagine this is the 16*1024 I reference up above. Though it should probably be bumped in a different commit.
Lorenzo Colitti <lorenzo@google.com> writes: > On Wed, Aug 5, 2020 at 7:21 PM Felipe Balbi <balbi@kernel.org> wrote: >> But anyway, if we change 13 to 16, then we 1Gbps. How did you get >> 1.7Gbps? I think we should really update ncm_bitrate() to contain the >> correct equations for bitrate on different speeds. > > I got 1.7 Gbps because that's what I measured in iperf. :-) > > But actually after reading the spec I think that for SuperSpeed and > above that calculation is meaningless, because bulk transfers are no > longer limited by a set number of packets per microframe. The USB 3.2 > spec has mostly replaced the words "microframe" with "bus interval", > but I don't see any place in the spec that says the number of packets > per bus interval is limited. Section 8.12.1.2 (Bulk IN Transactions) > just says that "when the host is ready to receive bulk data, it sends > an ACK TP" saying how many packets it's willing to accept, where the > maximum is the burst size supported by the endpoint. After that, the > host has to respond with an ACK to every data packet received, and > every ACK specifies the number of data packets it expects from then > on. > > It seems more correct that for SS and above the driver should simply > just report the link speed minus theoretical bus overhead? Section > 4.4.11 suggests that's about 10%, so it would announce 4.5 Gbps. makes sense to me :-) >> > +static struct usb_ss_ep_comp_descriptor ssp_ncm_bulk_comp_desc = { >> > + .bLength = sizeof(ssp_ncm_bulk_comp_desc), >> > + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, >> > + >> > + /* the following 2 values can be tweaked if necessary */ >> > + /* .bMaxBurst = 0, */ >> >> should you add bMaxBurst = 15 here? > > I'm not sure. On my setup, it provides a fair performance boost (goes > from ~1.7Gbps to ~2.3Gbps in, and ~620Mbps to ~720Mbps out). But I > don't know whether there might be any compatibility constraints or > hardware dependencies. I do see that the f_mass_storage driver sets it > to 15: there shouldn't be any compatibility issues, no. > /* Calculate bMaxBurst, we know packet size is 1024 */ > max_burst = min_t(unsigned, FSG_BUFLEN / 1024, 15); > > so perhaps this is fine to do in NCM too? If we want to set bMaxBurst > to 15, should that be in this patch, or in a separate patch? yup, should be fine. A separate patch is okay too :-)
Hi, Maciej Żenczykowski <zenczykowski@gmail.com> writes: > USB3.0 / USB3.1 gen1 / USB3.2 gen 1x1 / 5gbps overhead is upwards of > 20% (8b10b coding is 80% efficient). > > USB3.1 gen2 / USB3.2 gen 2x1 / 10gbps overhead is upwards of 3% > (128b132b coding is nearly 97% efficient). > > however: > USB3.2 gen 1x2 / 10gbps overhead is again 20% (since this is 8b10b on > two 5gbps links on one cable) > > USB3.2 gen 2x2 / 20gbps overhead is again 3% (since this is 128b132b > on two 10gbps links on one cable) > > On top of that you need to layer usb protocol overhead (the above is > just link layer overhead). > > AFAICT for optimal xfer you need to transfer in 16KiB chunks, which > get split into 16 1KiB pieces, > each piece has overhead, plus there's a begin packet and final ack > packet (ie. 18 packets total). > I'm not entirely sure what the overhead is here, but my estimate: > 16384 / (32 + 16*(32 + 1024) + 32) > puts it at another 3.5% loss on top of the previous L1 overhead (ie. > multiplicative). > > [Note: I'm not entirely sure if the first and final 32 are correct, > but I'm pretty sure it's at least this much, > if anything stuff is worse due to some unavoidable delays between data > reception and ack, the upstream direction to host is even worse, since > host asks for data, device provides it, host acks it, thus there's 2 > data direction flip delays] > > This means: > 5 gbps -> 5*8/10*0.965 = 3.86 gbps (USB 3.0 / USB3.1 gen1 / USB3.2 gen 1x1) > 10 gbps -> 10*128/132*0.965 = 9.35 gbps (this is USB3.1 gen2 / USB3.2 gen 2x1) > 10 gbps -> 10*8/10*0.965 = 7.72 gbps (this is dual link USB3.2 gen 1x2) > 20 gbps -> 20*128/132*0.965 = 18.72 gbps (this is dual link USB3.2 gen 2x2) thanks for going through the trouble of digging all this information, much appreciated. Unless anyone has any concerns with these numbers, I think this is much closer to reality. Any further limitation is SW/HW overhead. > At least I'm pretty sure you physically can't go faster, though there > might still be extra overhead I missed (which would make it even > slower). > (in particular the dual link cases seem to duplicate some control > stuff across both cables, so overhead is probably a tad higher) possible, yeah. >> > > + /* the following 2 values can be tweaked if necessary */ >> > > + /* .bMaxBurst = 0, */ >> > >> > should you add bMaxBurst = 15 here? >> >> I'm not sure. On my setup, it provides a fair performance boost (goes >> from ~1.7Gbps to ~2.3Gbps in, and ~620Mbps to ~720Mbps out). But I >> don't know whether there might be any compatibility constraints or >> hardware dependencies. I do see that the f_mass_storage driver sets it >> to 15: >> >> /* Calculate bMaxBurst, we know packet size is 1024 */ >> max_burst = min_t(unsigned, FSG_BUFLEN / 1024, 15); >> >> so perhaps this is fine to do in NCM too? If we want to set bMaxBurst >> to 15, should that be in this patch, or in a separate patch? > > I think we should. I would imagine this is the 16*1024 I reference up above. > Though it should probably be bumped in a different commit. fair enough :-)
On Thu, Aug 6, 2020 at 3:04 AM Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > This means: > 5 gbps -> 5*8/10*0.965 = 3.86 gbps (USB 3.0 / USB3.1 gen1 / USB3.2 gen 1x1) > 10 gbps -> 10*128/132*0.965 = 9.35 gbps (this is USB3.1 gen2 / USB3.2 gen 2x1) > 10 gbps -> 10*8/10*0.965 = 7.72 gbps (this is dual link USB3.2 gen 1x2) > 20 gbps -> 20*128/132*0.965 = 18.72 gbps (this is dual link USB3.2 gen 2x2) Thanks for the detailed calculations, but unfortunately it turns out that this is just a 32-bit number. :-( So v2 sets it to a round 4 Gbps for anything 5 Gbps or above.
On Mon, Aug 17, 2020 at 10:05 PM Felipe Balbi <balbi@kernel.org> wrote: > > /* Calculate bMaxBurst, we know packet size is 1024 */ > > max_burst = min_t(unsigned, FSG_BUFLEN / 1024, 15); > > > > so perhaps this is fine to do in NCM too? If we want to set bMaxBurst > > to 15, should that be in this patch, or in a separate patch? > > yup, should be fine. A separate patch is okay too :-) Thanks. Bumped up existing SS descriptors to 15 in a separate commit, and amended this commit (with the SSP descriptors) to use 15 as well.
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index 1d900081b1..91f87165e7 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -85,7 +85,8 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f) /* peak (theoretical) bulk transfer rate in bits-per-second */ static inline unsigned ncm_bitrate(struct usb_gadget *g) { - if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) + if ((gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) || + (gadget_is_superspeed_plus(g) && g->speed == USB_SPEED_SUPER_PLUS)) return 13 * 1024 * 8 * 1000 * 8; else if (gadget_is_dualspeed(g) && g->speed == USB_SPEED_HIGH) return 13 * 512 * 8 * 1000 * 8; @@ -400,6 +401,75 @@ static struct usb_descriptor_header *ncm_ss_function[] = { NULL, }; +/* super speed plus support: */ + +static struct usb_endpoint_descriptor ssp_ncm_notify_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_INT, + .wMaxPacketSize = cpu_to_le16(NCM_STATUS_BYTECOUNT), + .bInterval = USB_MS_TO_HS_INTERVAL(NCM_STATUS_INTERVAL_MS) +}; + +static struct usb_ss_ep_comp_descriptor ssp_ncm_notify_comp_desc = { + .bLength = sizeof(ssp_ncm_notify_comp_desc), + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, + + /* the following 3 values can be tweaked if necessary */ + /* .bMaxBurst = 0, */ + /* .bmAttributes = 0, */ + .wBytesPerInterval = cpu_to_le16(NCM_STATUS_BYTECOUNT), +}; + +static struct usb_endpoint_descriptor ssp_ncm_in_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + + .bEndpointAddress = USB_DIR_IN, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = cpu_to_le16(1024), +}; + +static struct usb_endpoint_descriptor ssp_ncm_out_desc = { + .bLength = USB_DT_ENDPOINT_SIZE, + .bDescriptorType = USB_DT_ENDPOINT, + + .bEndpointAddress = USB_DIR_OUT, + .bmAttributes = USB_ENDPOINT_XFER_BULK, + .wMaxPacketSize = cpu_to_le16(1024), +}; + +static struct usb_ss_ep_comp_descriptor ssp_ncm_bulk_comp_desc = { + .bLength = sizeof(ssp_ncm_bulk_comp_desc), + .bDescriptorType = USB_DT_SS_ENDPOINT_COMP, + + /* the following 2 values can be tweaked if necessary */ + /* .bMaxBurst = 0, */ + /* .bmAttributes = 0, */ +}; + +static struct usb_descriptor_header *ncm_ssp_function[] = { + (struct usb_descriptor_header *) &ncm_iad_desc, + /* CDC NCM control descriptors */ + (struct usb_descriptor_header *) &ncm_control_intf, + (struct usb_descriptor_header *) &ncm_header_desc, + (struct usb_descriptor_header *) &ncm_union_desc, + (struct usb_descriptor_header *) &ecm_desc, + (struct usb_descriptor_header *) &ncm_desc, + (struct usb_descriptor_header *) &ssp_ncm_notify_desc, + (struct usb_descriptor_header *) &ssp_ncm_notify_comp_desc, + /* data interface, altsettings 0 and 1 */ + (struct usb_descriptor_header *) &ncm_data_nop_intf, + (struct usb_descriptor_header *) &ncm_data_intf, + (struct usb_descriptor_header *) &ssp_ncm_in_desc, + (struct usb_descriptor_header *) &ssp_ncm_bulk_comp_desc, + (struct usb_descriptor_header *) &ssp_ncm_out_desc, + (struct usb_descriptor_header *) &ssp_ncm_bulk_comp_desc, + NULL, +}; + /* string descriptors: */ #define STRING_CTRL_IDX 0 @@ -1502,8 +1572,13 @@ static int ncm_bind(struct usb_configuration *c, struct usb_function *f) ss_ncm_notify_desc.bEndpointAddress = fs_ncm_notify_desc.bEndpointAddress; + ssp_ncm_in_desc.bEndpointAddress = fs_ncm_in_desc.bEndpointAddress; + ssp_ncm_out_desc.bEndpointAddress = fs_ncm_out_desc.bEndpointAddress; + ssp_ncm_notify_desc.bEndpointAddress = + fs_ncm_notify_desc.bEndpointAddress; + status = usb_assign_descriptors(f, ncm_fs_function, ncm_hs_function, - ncm_ss_function, NULL); + ncm_ss_function, ncm_ssp_function); if (status) goto fail;
Currently, using f_ncm in a SuperSpeed Plus gadget results in an oops in config_ep_by_speed because ncm_set_alt passes in NULL ssp_descriptors. Fix this by defining new descriptors for SuperSpeed Plus. (We cannot re-use the existing definitions for the SuperSpeed descriptors, even though they are mostly the same, because they are not fixed initializers). Also fix reported bandwidth to match bandwidth reported for SuperSpeed. This calculation is already incorrect, because it returns 851 Mbps and NCM can already reach speeds in excess of 1.7 Gbps on a 5 Gbps port. But it's better to return 851 Mbps than 9 Mbps for SuperSpeed Plus. Tested: enabled f_ncm on a dwc3 gadget and 10Gbps link, ran iperf Signed-off-by: Lorenzo Colitti <lorenzo@google.com> --- drivers/usb/gadget/function/f_ncm.c | 79 ++++++++++++++++++++++++++++- 1 file changed, 77 insertions(+), 2 deletions(-)