Message ID | 20190621174553.28862-2-suwan.kim027@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | usbip: Implement SG support to vhci | expand |
Hi Suwan, On 6/21/19 11:45 AM, Suwan Kim wrote: > vhci doesn’t do dma for remote device. Actually, the real dma > operation is done by network card driver. So, vhci doesn’t use and > need dma address of transfer buffer of urb. > > When vhci supports SG, it is useful to use native SG list instead > of mapped SG list because dma mapping fnuction can adjust the > number of SG list that is urb->num_mapped_sgs. > > But hcd provides dma mapping and unmapping function by defualt. Typo "defualt" > Moreover, it causes unnecessary dma mapping and unmapping which > will be done again at the NIC driver and it wastes CPU cycles. > So, implement map_urb_for_dma and unmap_urb_for_dma function for > vhci in order to skip the dma mapping and unmapping procedure. > How did you verify that unnecessary dma map/unmap are happening? How many CPU cycles did you manage to reduce with this change? thanks, -- Shuah
On Fri, Jun 28, 2019 at 06:11:54PM -0600, shuah wrote: > Hi Suwan, > > On 6/21/19 11:45 AM, Suwan Kim wrote: > > vhci doesn’t do dma for remote device. Actually, the real dma > > operation is done by network card driver. So, vhci doesn’t use and > > need dma address of transfer buffer of urb. > > > > When vhci supports SG, it is useful to use native SG list instead > > of mapped SG list because dma mapping fnuction can adjust the > > number of SG list that is urb->num_mapped_sgs. > > > > But hcd provides dma mapping and unmapping function by defualt. > > Typo "defualt" > > > Moreover, it causes unnecessary dma mapping and unmapping which > > will be done again at the NIC driver and it wastes CPU cycles. > > So, implement map_urb_for_dma and unmap_urb_for_dma function for > > vhci in order to skip the dma mapping and unmapping procedure. > > > > How did you verify that unnecessary dma map/unmap are happening? > How many CPU cycles did you manage to reduce with this change? Dma mapping/unmapping is not required for vhci because vhci passes the virtual address of the buffer to the network stack without passing the dma address of the buffer. Network stack receive the virtual address of the buffer from vhci and later, network card driver performs dma mapping for the buffer. So, as far as I know, dma address of the buffer is not used for vhci and virtual address is only used by vhci. I used ftrace to measure a duration of usb_hcd_map_urb_for_dma(). As a result, usb_hcd_map_urb_for_dma() took a duration of about 0.14us out of about 10us which is the duration of usb_hcd_submit_urb(). However, this figure is the dma mapping measurement value for physically contiguous buffers when vhci does not support SG, and if vhci supports SG, more CPU cycles will be consumed for SG dma mapping. I think that the important point is dma mapping/unmapping is unnecessary at vhci. So we can skip dma mapping/unmapping and save the CPU cycles (even if it is small). This is an opportunity to reduce the end-to-end latency of usbip and improve the performance. Regards Suwan Kim
diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c index 667d9c0ec905..be87c8a63e24 100644 --- a/drivers/usb/usbip/vhci_hcd.c +++ b/drivers/usb/usbip/vhci_hcd.c @@ -1287,6 +1287,18 @@ static int vhci_free_streams(struct usb_hcd *hcd, struct usb_device *udev, return 0; } +static int vhci_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, + gfp_t mem_flags) +{ + dev_dbg(hcd->self.controller, "vhci does not map urb for dma\n"); + return 0; +} + +static void vhci_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb) +{ + dev_dbg(hcd->self.controller, "vhci does not unmap urb for dma\n"); +} + static const struct hc_driver vhci_hc_driver = { .description = driver_name, .product_desc = driver_desc, @@ -1303,6 +1315,9 @@ static const struct hc_driver vhci_hc_driver = { .get_frame_number = vhci_get_frame_number, + .map_urb_for_dma = vhci_map_urb_for_dma, + .unmap_urb_for_dma = vhci_unmap_urb_for_dma, + .hub_status_data = vhci_hub_status, .hub_control = vhci_hub_control, .bus_suspend = vhci_bus_suspend,
vhci doesn’t do dma for remote device. Actually, the real dma operation is done by network card driver. So, vhci doesn’t use and need dma address of transfer buffer of urb. When vhci supports SG, it is useful to use native SG list instead of mapped SG list because dma mapping fnuction can adjust the number of SG list that is urb->num_mapped_sgs. But hcd provides dma mapping and unmapping function by defualt. Moreover, it causes unnecessary dma mapping and unmapping which will be done again at the NIC driver and it wastes CPU cycles. So, implement map_urb_for_dma and unmap_urb_for_dma function for vhci in order to skip the dma mapping and unmapping procedure. Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> --- drivers/usb/usbip/vhci_hcd.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)