Message ID | 5f7605b9f4cd2d6de4f0ef7d25be9a99d92c5aee.1585297723.git.joglekar@synopsys.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Add logic to consolidate TRBs for Synopsys xHC | expand |
On Fri, Mar 27, 2020 at 02:23:46PM +0530, Tejas Joglekar wrote: > The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for > each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8 > for HS. The controller loads and updates the TRB cache from the transfer > ring in system memory whenever the driver issues a start transfer or > update transfer command. > > For chained TRBs, the Synopsys xHC requires that the total amount of > bytes for all TRBs loaded in the TRB cache be greater than or equal to 1 > MPS. Or the chain ends within the TRB cache (with a last TRB). > > If this requirement is not met, the controller will not be able to send > or receive a packet and it will hang causing a driver timeout and error. Sounds like broken hardware, or is this requirement in the xhci spec? > > This can be a problem if a class driver queues SG requests with many > small-buffer entries. The XHCI driver will create a chained TRB for each > entry which may trigger this issue. > > This patch adds logic to the XHCI driver to detect and prevent this from > happening. > > For every (TRB_CACHE_SIZE - 2), we check the total buffer size of > the SG list and if the last window of (TRB_CACHE_SIZE - 2) SG list length > and we don't make up at least 1 MPS, we create a temporary buffer to > consolidate full SG list into the buffer. > > We check at (TRB_CACHE_SIZE - 2) window because it is possible that there > would be a link and/or event data TRB that take up to 2 of the cache > entries. > > We discovered this issue with devices on other platforms but have not > yet come across any device that triggers this on Linux. But it could be > a real problem now or in the future. All it takes is N number of small > chained TRBs. And other instances of the Synopsys IP may have smaller > values for the TRB_CACHE_SIZE which would exacerbate the problem. > > Signed-off-by: Tejas Joglekar <joglekar@synopsys.com> > --- > > Resending as 'umlaut' in email are not accepted by some servers. > > drivers/usb/core/hcd.c | 8 +++ > drivers/usb/host/xhci-ring.c | 2 +- > drivers/usb/host/xhci.c | 128 +++++++++++++++++++++++++++++++++++++++++++ > drivers/usb/host/xhci.h | 4 ++ > 4 files changed, 141 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index aa45840d8273..fdd257a2b8a6 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -1459,6 +1459,14 @@ int usb_hcd_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, > return -EINVAL; > } > > + /* > + * If SG is consolidate into single buffer > + * return early I do not understand this comment. > + */ > + if ((urb->transfer_flags & > + URB_DMA_MAP_SINGLE)) > + return ret; Why? Isn't this now going to affect other host controllers (like all of them?) > + > n = dma_map_sg( > hcd->self.sysdev, > urb->sg, > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index a78787bb5133..2fad9474912a 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -3291,7 +3291,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags, > > full_len = urb->transfer_buffer_length; > /* If we have scatter/gather list, we use it. */ > - if (urb->num_sgs) { > + if (urb->num_sgs && !(urb->transfer_flags & URB_DMA_MAP_SINGLE)) { > num_sgs = urb->num_mapped_sgs; > sg = urb->sg; > addr = (u64) sg_dma_address(sg); > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c > index fe38275363e0..94fddbd06179 100644 > --- a/drivers/usb/host/xhci.c > +++ b/drivers/usb/host/xhci.c > @@ -1256,6 +1256,109 @@ EXPORT_SYMBOL_GPL(xhci_resume); > > /*-------------------------------------------------------------------------*/ > > +static int xhci_map_temp_buffer(struct usb_hcd *hcd, struct urb *urb) > +{ > + void *temp; > + int ret = 0; > + unsigned int len; > + unsigned int buf_len; > + enum dma_data_direction dir; > + struct xhci_hcd *xhci; > + > + xhci = hcd_to_xhci(hcd); > + dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; > + buf_len = urb->transfer_buffer_length; > + > + temp = kzalloc_node(buf_len, GFP_ATOMIC, > + dev_to_node(hcd->self.sysdev)); > + if (!temp) { > + xhci_warn(xhci, "Failed to create temp buffer, HC may fail\n"); Didn't kzalloc just warn before this? And isn't this whole thing going to cause a lot more memory allocations per submission than before? > + return -ENOMEM; > + } > + > + if (usb_urb_dir_out(urb)) { > + len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, > + temp, buf_len, 0); > + if (len != buf_len) > + xhci_warn(xhci, "Wrong temp buffer write length\n"); How could this happen? And if it does, why spam the kernel log about it and yet not return an error? > + } > + > + urb->transfer_buffer = temp; > + urb->transfer_dma = dma_map_single(hcd->self.sysdev, > + urb->transfer_buffer, > + urb->transfer_buffer_length, > + dir); > + if (dma_mapping_error(hcd->self.sysdev, > + urb->transfer_dma)) { > + xhci_err(xhci, "dma mapping error\n"); Again, didn't dma_mapping_error() spit out a message? > + ret = -EAGAIN; > + kfree(temp); > + } else { > + urb->transfer_flags |= URB_DMA_MAP_SINGLE; > + } > + > + return ret; > +} thanks, greg k-h
Hi, On 3/27/2020 2:57 PM, Greg KH wrote: > On Fri, Mar 27, 2020 at 02:23:46PM +0530, Tejas Joglekar wrote: >> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for >> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8 >> for HS. The controller loads and updates the TRB cache from the transfer >> ring in system memory whenever the driver issues a start transfer or >> update transfer command. >> >> For chained TRBs, the Synopsys xHC requires that the total amount of >> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1 >> MPS. Or the chain ends within the TRB cache (with a last TRB). >> >> If this requirement is not met, the controller will not be able to send >> or receive a packet and it will hang causing a driver timeout and error. > > Sounds like broken hardware, or is this requirement in the xhci spec? > Not a xhci spec requirement, but behavior of Synopsys xHC. We have not seen actual failure on Linux yet but it is possible in future if SG list with very small buffer size is given for transfer. >> >> This can be a problem if a class driver queues SG requests with many >> small-buffer entries. The XHCI driver will create a chained TRB for each >> entry which may trigger this issue. >> >> This patch adds logic to the XHCI driver to detect and prevent this from >> happening. >> >> For every (TRB_CACHE_SIZE - 2), we check the total buffer size of >> the SG list and if the last window of (TRB_CACHE_SIZE - 2) SG list length >> and we don't make up at least 1 MPS, we create a temporary buffer to >> consolidate full SG list into the buffer. >> >> We check at (TRB_CACHE_SIZE - 2) window because it is possible that there >> would be a link and/or event data TRB that take up to 2 of the cache >> entries. >> >> We discovered this issue with devices on other platforms but have not >> yet come across any device that triggers this on Linux. But it could be >> a real problem now or in the future. All it takes is N number of small >> chained TRBs. And other instances of the Synopsys IP may have smaller >> values for the TRB_CACHE_SIZE which would exacerbate the problem. >> >> Signed-off-by: Tejas Joglekar <joglekar@synopsys.com> >> --- >> >> Resending as 'umlaut' in email are not accepted by some servers. >> >> drivers/usb/core/hcd.c | 8 +++ >> drivers/usb/host/xhci-ring.c | 2 +- >> drivers/usb/host/xhci.c | 128 +++++++++++++++++++++++++++++++++++++++++++ >> drivers/usb/host/xhci.h | 4 ++ >> 4 files changed, 141 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c >> index aa45840d8273..fdd257a2b8a6 100644 >> --- a/drivers/usb/core/hcd.c >> +++ b/drivers/usb/core/hcd.c >> @@ -1459,6 +1459,14 @@ int usb_hcd_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, >> return -EINVAL; >> } >> >> + /* >> + * If SG is consolidate into single buffer >> + * return early > > I do not understand this comment. The SG list is copied to a temporary buffer, and buffer is DMA mapped so we should not map the SG list again, and return without any mapping here. > >> + */ >> + if ((urb->transfer_flags & >> + URB_DMA_MAP_SINGLE)) >> + return ret; > > Why? Isn't this now going to affect other host controllers (like all of > them?) > When urb->num_sgs is greater than 0, other than my quirk dma_map function no one will set the DMA transfer flag as URB_DMA_MAP_SINGLE. So it would not be called by all HC's. Even when the SG list does not have very small buffer sizes this quirk will not set the URB_DMA_MAP_SINGLE transfer flag. >> + >> n = dma_map_sg( >> hcd->self.sysdev, >> urb->sg, >> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >> index a78787bb5133..2fad9474912a 100644 >> --- a/drivers/usb/host/xhci-ring.c >> +++ b/drivers/usb/host/xhci-ring.c >> @@ -3291,7 +3291,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags, >> >> full_len = urb->transfer_buffer_length; >> /* If we have scatter/gather list, we use it. */ >> - if (urb->num_sgs) { >> + if (urb->num_sgs && !(urb->transfer_flags & URB_DMA_MAP_SINGLE)) { >> num_sgs = urb->num_mapped_sgs; >> sg = urb->sg; >> addr = (u64) sg_dma_address(sg); >> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c >> index fe38275363e0..94fddbd06179 100644 >> --- a/drivers/usb/host/xhci.c >> +++ b/drivers/usb/host/xhci.c >> @@ -1256,6 +1256,109 @@ EXPORT_SYMBOL_GPL(xhci_resume); >> >> /*-------------------------------------------------------------------------*/ >> >> +static int xhci_map_temp_buffer(struct usb_hcd *hcd, struct urb *urb) >> +{ >> + void *temp; >> + int ret = 0; >> + unsigned int len; >> + unsigned int buf_len; >> + enum dma_data_direction dir; >> + struct xhci_hcd *xhci; >> + >> + xhci = hcd_to_xhci(hcd); >> + dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; >> + buf_len = urb->transfer_buffer_length; >> + >> + temp = kzalloc_node(buf_len, GFP_ATOMIC, >> + dev_to_node(hcd->self.sysdev)); >> + if (!temp) { >> + xhci_warn(xhci, "Failed to create temp buffer, HC may fail\n"); > > Didn't kzalloc just warn before this? > Yes, It should. > And isn't this whole thing going to cause a lot more memory allocations > per submission than before? > If buffer sizes for SG list are very small (less than MPS size per TRB_CACHE_SIZE) yes, it will have more memory allocations. >> + return -ENOMEM; >> + } >> + >> + if (usb_urb_dir_out(urb)) { >> + len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, >> + temp, buf_len, 0); >> + if (len != buf_len) >> + xhci_warn(xhci, "Wrong temp buffer write length\n"); > > How could this happen? And if it does, why spam the kernel log about it > and yet not return an error? > Logic similar to bounce buffer allocation, but I agree should not log this as we don't want to return with error here. >> + } >> + >> + urb->transfer_buffer = temp; >> + urb->transfer_dma = dma_map_single(hcd->self.sysdev, >> + urb->transfer_buffer, >> + urb->transfer_buffer_length, >> + dir); >> + if (dma_mapping_error(hcd->self.sysdev, >> + urb->transfer_dma)) { >> + xhci_err(xhci, "dma mapping error\n"); > > Again, didn't dma_mapping_error() spit out a message? > Yes, should remove this too. >> + ret = -EAGAIN; >> + kfree(temp); >> + } else { >> + urb->transfer_flags |= URB_DMA_MAP_SINGLE; >> + } >> + >> + return ret; >> +} > > thanks, > > greg k-h > Thanks & Regards, Tejas Joglekar
On Fri, Mar 27, 2020 at 10:05:21AM +0000, Tejas Joglekar wrote: > Hi, > On 3/27/2020 2:57 PM, Greg KH wrote: > > On Fri, Mar 27, 2020 at 02:23:46PM +0530, Tejas Joglekar wrote: > >> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for > >> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8 > >> for HS. The controller loads and updates the TRB cache from the transfer > >> ring in system memory whenever the driver issues a start transfer or > >> update transfer command. > >> > >> For chained TRBs, the Synopsys xHC requires that the total amount of > >> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1 > >> MPS. Or the chain ends within the TRB cache (with a last TRB). > >> > >> If this requirement is not met, the controller will not be able to send > >> or receive a packet and it will hang causing a driver timeout and error. > > > > Sounds like broken hardware, or is this requirement in the xhci spec? > > > Not a xhci spec requirement, but behavior of Synopsys xHC. We have not seen > actual failure on Linux yet but it is possible in future if SG list with > very small buffer size is given for transfer. So this is something required that is outside of the spec, meaning that the hardware is imposing additional requirements, which implies it's a hardware bug, or "quirk", right? > >> This can be a problem if a class driver queues SG requests with many > >> small-buffer entries. The XHCI driver will create a chained TRB for each > >> entry which may trigger this issue. > >> > >> This patch adds logic to the XHCI driver to detect and prevent this from > >> happening. > >> > >> For every (TRB_CACHE_SIZE - 2), we check the total buffer size of > >> the SG list and if the last window of (TRB_CACHE_SIZE - 2) SG list length > >> and we don't make up at least 1 MPS, we create a temporary buffer to > >> consolidate full SG list into the buffer. > >> > >> We check at (TRB_CACHE_SIZE - 2) window because it is possible that there > >> would be a link and/or event data TRB that take up to 2 of the cache > >> entries. > >> > >> We discovered this issue with devices on other platforms but have not > >> yet come across any device that triggers this on Linux. But it could be > >> a real problem now or in the future. All it takes is N number of small > >> chained TRBs. And other instances of the Synopsys IP may have smaller > >> values for the TRB_CACHE_SIZE which would exacerbate the problem. > >> > >> Signed-off-by: Tejas Joglekar <joglekar@synopsys.com> > >> --- > >> > >> Resending as 'umlaut' in email are not accepted by some servers. > >> > >> drivers/usb/core/hcd.c | 8 +++ > >> drivers/usb/host/xhci-ring.c | 2 +- > >> drivers/usb/host/xhci.c | 128 +++++++++++++++++++++++++++++++++++++++++++ > >> drivers/usb/host/xhci.h | 4 ++ > >> 4 files changed, 141 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > >> index aa45840d8273..fdd257a2b8a6 100644 > >> --- a/drivers/usb/core/hcd.c > >> +++ b/drivers/usb/core/hcd.c > >> @@ -1459,6 +1459,14 @@ int usb_hcd_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, > >> return -EINVAL; > >> } > >> > >> + /* > >> + * If SG is consolidate into single buffer > >> + * return early > > > > I do not understand this comment. > > The SG list is copied to a temporary buffer, and buffer is DMA mapped so we should > not map the SG list again, and return without any mapping here. Please write this all out a lot more to make it more obvious. > > > >> + */ > >> + if ((urb->transfer_flags & > >> + URB_DMA_MAP_SINGLE)) > >> + return ret; > > > > Why? Isn't this now going to affect other host controllers (like all of > > them?) > > > When urb->num_sgs is greater than 0, other than my quirk dma_map function no one > will set the DMA transfer flag as URB_DMA_MAP_SINGLE. So it would not be called by > all HC's. Even when the SG list does not have very small buffer sizes this quirk will > not set the URB_DMA_MAP_SINGLE transfer flag. Are you sure? :) > >> + > >> n = dma_map_sg( > >> hcd->self.sysdev, > >> urb->sg, > >> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > >> index a78787bb5133..2fad9474912a 100644 > >> --- a/drivers/usb/host/xhci-ring.c > >> +++ b/drivers/usb/host/xhci-ring.c > >> @@ -3291,7 +3291,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags, > >> > >> full_len = urb->transfer_buffer_length; > >> /* If we have scatter/gather list, we use it. */ > >> - if (urb->num_sgs) { > >> + if (urb->num_sgs && !(urb->transfer_flags & URB_DMA_MAP_SINGLE)) { > >> num_sgs = urb->num_mapped_sgs; > >> sg = urb->sg; > >> addr = (u64) sg_dma_address(sg); > >> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c > >> index fe38275363e0..94fddbd06179 100644 > >> --- a/drivers/usb/host/xhci.c > >> +++ b/drivers/usb/host/xhci.c > >> @@ -1256,6 +1256,109 @@ EXPORT_SYMBOL_GPL(xhci_resume); > >> > >> /*-------------------------------------------------------------------------*/ > >> > >> +static int xhci_map_temp_buffer(struct usb_hcd *hcd, struct urb *urb) > >> +{ > >> + void *temp; > >> + int ret = 0; > >> + unsigned int len; > >> + unsigned int buf_len; > >> + enum dma_data_direction dir; > >> + struct xhci_hcd *xhci; > >> + > >> + xhci = hcd_to_xhci(hcd); > >> + dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; > >> + buf_len = urb->transfer_buffer_length; > >> + > >> + temp = kzalloc_node(buf_len, GFP_ATOMIC, > >> + dev_to_node(hcd->self.sysdev)); > >> + if (!temp) { > >> + xhci_warn(xhci, "Failed to create temp buffer, HC may fail\n"); > > > > Didn't kzalloc just warn before this? > > > Yes, It should. Then do not spit out another message please. > > And isn't this whole thing going to cause a lot more memory allocations > > per submission than before? > > > If buffer sizes for SG list are very small (less than MPS size per TRB_CACHE_SIZE) > yes, it will have more memory allocations. That's not good :( greg k-h
>>>> +static int xhci_map_temp_buffer(struct usb_hcd *hcd, struct urb *urb) >>>> +{ >>>> + void *temp; >>>> + int ret = 0; >>>> + unsigned int len; >>>> + unsigned int buf_len; >>>> + enum dma_data_direction dir; >>>> + struct xhci_hcd *xhci; >>>> + >>>> + xhci = hcd_to_xhci(hcd); >>>> + dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; >>>> + buf_len = urb->transfer_buffer_length; >>>> + >>>> + temp = kzalloc_node(buf_len, GFP_ATOMIC, >>>> + dev_to_node(hcd->self.sysdev)); >>>> + if (!temp) { >>>> + xhci_warn(xhci, "Failed to create temp buffer, HC may fail\n"); >>> >>> Didn't kzalloc just warn before this? >>> >> Yes, It should. > > Then do not spit out another message please. > >>> And isn't this whole thing going to cause a lot more memory allocations >>> per submission than before? >>> >> If buffer sizes for SG list are very small (less than MPS size per TRB_CACHE_SIZE) >> yes, it will have more memory allocations. > > That's not good :( > I actually recommended this after looking at the real numbers. It was explained to me that the Synopsis xHC has a flaw that it will hang if its TRB cache contains 16 chained TRBs (no ending TRB), and the combined TRB size is _less_ than 1024 bytes. So this only happens if a URB has a sg list with more than 16 entries whose total size is less than 1024 bytes. I was told this has not been seen ever in real life usage. So in the unlikey case this is ever triggered we will end up allocating a small bounce buffer, probably around 1024 bytes, and copy the data over from the sg-list. (for HS speeds limits are 8 chained TRBs and 512 bytes) - Mathias
Not sure if I'm hijacking the thread, or if this patch is related, but I've been seeing xHC hangs and timeouts with the dwc_3_1 controller. Protocol Trace: https://0paste.com/59611 or https://photos.app.goo.gl/54DGJJuH4kQa9Psy6 kernel trace: https://0paste.com/59613 This hang happens on transactions larger than 16K regardless of the number of TRBs. I only see this on SS and not 2.0.
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index aa45840d8273..fdd257a2b8a6 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -1459,6 +1459,14 @@ int usb_hcd_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, return -EINVAL; } + /* + * If SG is consolidate into single buffer + * return early + */ + if ((urb->transfer_flags & + URB_DMA_MAP_SINGLE)) + return ret; + n = dma_map_sg( hcd->self.sysdev, urb->sg, diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index a78787bb5133..2fad9474912a 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -3291,7 +3291,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags, full_len = urb->transfer_buffer_length; /* If we have scatter/gather list, we use it. */ - if (urb->num_sgs) { + if (urb->num_sgs && !(urb->transfer_flags & URB_DMA_MAP_SINGLE)) { num_sgs = urb->num_mapped_sgs; sg = urb->sg; addr = (u64) sg_dma_address(sg); diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index fe38275363e0..94fddbd06179 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -1256,6 +1256,109 @@ EXPORT_SYMBOL_GPL(xhci_resume); /*-------------------------------------------------------------------------*/ +static int xhci_map_temp_buffer(struct usb_hcd *hcd, struct urb *urb) +{ + void *temp; + int ret = 0; + unsigned int len; + unsigned int buf_len; + enum dma_data_direction dir; + struct xhci_hcd *xhci; + + xhci = hcd_to_xhci(hcd); + dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + buf_len = urb->transfer_buffer_length; + + temp = kzalloc_node(buf_len, GFP_ATOMIC, + dev_to_node(hcd->self.sysdev)); + if (!temp) { + xhci_warn(xhci, "Failed to create temp buffer, HC may fail\n"); + return -ENOMEM; + } + + if (usb_urb_dir_out(urb)) { + len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, + temp, buf_len, 0); + if (len != buf_len) + xhci_warn(xhci, "Wrong temp buffer write length\n"); + } + + urb->transfer_buffer = temp; + urb->transfer_dma = dma_map_single(hcd->self.sysdev, + urb->transfer_buffer, + urb->transfer_buffer_length, + dir); + if (dma_mapping_error(hcd->self.sysdev, + urb->transfer_dma)) { + xhci_err(xhci, "dma mapping error\n"); + ret = -EAGAIN; + kfree(temp); + } else { + urb->transfer_flags |= URB_DMA_MAP_SINGLE; + } + + return ret; +} + +static bool xhci_urb_temp_buffer_required(struct usb_hcd *hcd, + struct urb *urb) +{ + bool ret = false; + unsigned int i; + unsigned int len = 0; + unsigned int buf_len; + unsigned int trb_size; + unsigned int max_pkt; + struct scatterlist *sg; + struct scatterlist *tail_sg; + + sg = urb->sg; + tail_sg = urb->sg; + buf_len = urb->transfer_buffer_length; + max_pkt = usb_endpoint_maxp(&urb->ep->desc); + + if (urb->dev->speed >= USB_SPEED_SUPER) + trb_size = TRB_CACHE_SIZE_SS; + else + trb_size = TRB_CACHE_SIZE_HS; + + for_each_sg(urb->sg, sg, urb->num_sgs, i) { + len = len + sg->length; + if (i > trb_size - 2) { + len = len - tail_sg->length; + if (len < max_pkt) { + ret = true; + break; + } + + tail_sg = sg_next(tail_sg); + } + } + return ret; +} + +static void xhci_unmap_temp_buf(struct urb *urb) +{ + struct scatterlist *sg; + unsigned int len; + unsigned int buf_len; + + sg = urb->sg; + buf_len = urb->transfer_buffer_length; + + if (usb_urb_dir_in(urb)) { + len = sg_pcopy_from_buffer(urb->sg, urb->num_sgs, + urb->transfer_buffer, + buf_len, + 0); + if (len != buf_len) + dev_err(&urb->dev->dev, "Wrong length for unmap\n"); + } + + kfree(urb->transfer_buffer); + urb->transfer_buffer = NULL; +} + /* * Bypass the DMA mapping if URB is suitable for Immediate Transfer (IDT), * we'll copy the actual data into the TRB address register. This is limited to @@ -1265,12 +1368,36 @@ EXPORT_SYMBOL_GPL(xhci_resume); static int xhci_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flags) { + struct xhci_hcd *xhci; + + xhci = hcd_to_xhci(hcd); + if (xhci_urb_suitable_for_idt(urb)) return 0; + if (xhci->quirks & XHCI_CONSOLIDATE_SG_LIST) { + if (xhci_urb_temp_buffer_required(hcd, urb)) + xhci_map_temp_buffer(hcd, urb); + } return usb_hcd_map_urb_for_dma(hcd, urb, mem_flags); } +static void xhci_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb) +{ + struct xhci_hcd *xhci; + bool unmap_temp_buf = false; + + xhci = hcd_to_xhci(hcd); + + if (urb->num_sgs && (urb->transfer_flags & URB_DMA_MAP_SINGLE)) + unmap_temp_buf = true; + + usb_hcd_unmap_urb_for_dma(hcd, urb); + + if ((xhci->quirks & XHCI_CONSOLIDATE_SG_LIST) && unmap_temp_buf) + xhci_unmap_temp_buf(urb); +} + /** * xhci_get_endpoint_index - Used for passing endpoint bitmasks between the core and * HCDs. Find the index for an endpoint given its descriptor. Use the return @@ -5315,6 +5442,7 @@ static const struct hc_driver xhci_hc_driver = { * managing i/o requests and associated device resources */ .map_urb_for_dma = xhci_map_urb_for_dma, + .unmap_urb_for_dma = xhci_unmap_urb_for_dma, .urb_enqueue = xhci_urb_enqueue, .urb_dequeue = xhci_urb_dequeue, .alloc_dev = xhci_alloc_dev, diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index a093eeaec70e..341d1dfbe689 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1330,6 +1330,10 @@ enum xhci_setup_dev { #define TRB_SIA (1<<31) #define TRB_FRAME_ID(p) (((p) & 0x7ff) << 20) +/* TRB cache size for xHC with TRB cache */ +#define TRB_CACHE_SIZE_HS 8 +#define TRB_CACHE_SIZE_SS 16 + struct xhci_generic_trb { __le32 field[4]; };
The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8 for HS. The controller loads and updates the TRB cache from the transfer ring in system memory whenever the driver issues a start transfer or update transfer command. For chained TRBs, the Synopsys xHC requires that the total amount of bytes for all TRBs loaded in the TRB cache be greater than or equal to 1 MPS. Or the chain ends within the TRB cache (with a last TRB). If this requirement is not met, the controller will not be able to send or receive a packet and it will hang causing a driver timeout and error. This can be a problem if a class driver queues SG requests with many small-buffer entries. The XHCI driver will create a chained TRB for each entry which may trigger this issue. This patch adds logic to the XHCI driver to detect and prevent this from happening. For every (TRB_CACHE_SIZE - 2), we check the total buffer size of the SG list and if the last window of (TRB_CACHE_SIZE - 2) SG list length and we don't make up at least 1 MPS, we create a temporary buffer to consolidate full SG list into the buffer. We check at (TRB_CACHE_SIZE - 2) window because it is possible that there would be a link and/or event data TRB that take up to 2 of the cache entries. We discovered this issue with devices on other platforms but have not yet come across any device that triggers this on Linux. But it could be a real problem now or in the future. All it takes is N number of small chained TRBs. And other instances of the Synopsys IP may have smaller values for the TRB_CACHE_SIZE which would exacerbate the problem. Signed-off-by: Tejas Joglekar <joglekar@synopsys.com> --- Resending as 'umlaut' in email are not accepted by some servers. drivers/usb/core/hcd.c | 8 +++ drivers/usb/host/xhci-ring.c | 2 +- drivers/usb/host/xhci.c | 128 +++++++++++++++++++++++++++++++++++++++++++ drivers/usb/host/xhci.h | 4 ++ 4 files changed, 141 insertions(+), 1 deletion(-)