diff mbox series

[1/2] usbip: Skip DMA mapping and unmapping for urb at vhci

Message ID 20190621174553.28862-2-suwan.kim027@gmail.com (mailing list archive)
State New, archived
Headers show
Series usbip: Implement SG support to vhci | expand

Commit Message

Suwan Kim June 21, 2019, 5:45 p.m. UTC
vhci doesn’t do dma for remote device. Actually, the real dma
operation is done by network card driver. So, vhci doesn’t use and
need dma address of transfer buffer of urb.

When vhci supports SG, it is useful to use native SG list instead
of mapped SG list because dma mapping fnuction can adjust the
number of SG list that is urb->num_mapped_sgs.

But hcd provides dma mapping and unmapping function by defualt.
Moreover, it causes unnecessary dma mapping and unmapping which
will be done again at the NIC driver and it wastes CPU cycles.
So, implement map_urb_for_dma and unmap_urb_for_dma function for
vhci in order to skip the dma mapping and unmapping procedure.

Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
---
 drivers/usb/usbip/vhci_hcd.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Shuah June 29, 2019, 12:11 a.m. UTC | #1
Hi Suwan,

On 6/21/19 11:45 AM, Suwan Kim wrote:
> vhci doesn’t do dma for remote device. Actually, the real dma
> operation is done by network card driver. So, vhci doesn’t use and
> need dma address of transfer buffer of urb.
> 
> When vhci supports SG, it is useful to use native SG list instead
> of mapped SG list because dma mapping fnuction can adjust the
> number of SG list that is urb->num_mapped_sgs.
> 
> But hcd provides dma mapping and unmapping function by defualt.

Typo "defualt"

> Moreover, it causes unnecessary dma mapping and unmapping which
> will be done again at the NIC driver and it wastes CPU cycles.
> So, implement map_urb_for_dma and unmap_urb_for_dma function for
> vhci in order to skip the dma mapping and unmapping procedure.
> 

How did you verify that unnecessary dma map/unmap are happening?
How many CPU cycles did you manage to reduce with this change?

thanks,
-- Shuah
Suwan Kim July 1, 2019, 9:29 a.m. UTC | #2
On Fri, Jun 28, 2019 at 06:11:54PM -0600, shuah wrote:
> Hi Suwan,
> 
> On 6/21/19 11:45 AM, Suwan Kim wrote:
> > vhci doesn’t do dma for remote device. Actually, the real dma
> > operation is done by network card driver. So, vhci doesn’t use and
> > need dma address of transfer buffer of urb.
> > 
> > When vhci supports SG, it is useful to use native SG list instead
> > of mapped SG list because dma mapping fnuction can adjust the
> > number of SG list that is urb->num_mapped_sgs.
> > 
> > But hcd provides dma mapping and unmapping function by defualt.
> 
> Typo "defualt"
> 
> > Moreover, it causes unnecessary dma mapping and unmapping which
> > will be done again at the NIC driver and it wastes CPU cycles.
> > So, implement map_urb_for_dma and unmap_urb_for_dma function for
> > vhci in order to skip the dma mapping and unmapping procedure.
> > 
> 
> How did you verify that unnecessary dma map/unmap are happening?
> How many CPU cycles did you manage to reduce with this change?

Dma mapping/unmapping is not required for vhci because vhci passes
the virtual address of the buffer to the network stack without
passing the dma address of the buffer. Network stack receive the
virtual address of the buffer from vhci and later, network card
driver performs dma mapping for the buffer. So, as far as I know,
dma address of the buffer is not used for vhci and virtual address
is only used by vhci.

I used ftrace to measure a duration of usb_hcd_map_urb_for_dma().
As a result, usb_hcd_map_urb_for_dma() took a duration of about
0.14us out of about 10us which is the duration of usb_hcd_submit_urb().
However, this figure is the dma mapping measurement value for
physically contiguous buffers when vhci does not support SG, and
if vhci supports SG, more CPU cycles will be consumed for SG dma
mapping.

I think that the important point is dma mapping/unmapping is
unnecessary at vhci. So we can skip dma mapping/unmapping and save
the CPU cycles (even if it is small). This is an opportunity to
reduce the end-to-end latency of usbip and improve the performance.

Regards

Suwan Kim
diff mbox series

Patch

diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 667d9c0ec905..be87c8a63e24 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -1287,6 +1287,18 @@  static int vhci_free_streams(struct usb_hcd *hcd, struct usb_device *udev,
 	return 0;
 }
 
+static int vhci_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
+			    gfp_t mem_flags)
+{
+	dev_dbg(hcd->self.controller, "vhci does not map urb for dma\n");
+	return 0;
+}
+
+static void vhci_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
+{
+	dev_dbg(hcd->self.controller, "vhci does not unmap urb for dma\n");
+}
+
 static const struct hc_driver vhci_hc_driver = {
 	.description	= driver_name,
 	.product_desc	= driver_desc,
@@ -1303,6 +1315,9 @@  static const struct hc_driver vhci_hc_driver = {
 
 	.get_frame_number = vhci_get_frame_number,
 
+	.map_urb_for_dma = vhci_map_urb_for_dma,
+	.unmap_urb_for_dma = vhci_unmap_urb_for_dma,
+
 	.hub_status_data = vhci_hub_status,
 	.hub_control    = vhci_hub_control,
 	.bus_suspend	= vhci_bus_suspend,