diff mbox series

[2/2] usbip: Implement SG support to vhci

Message ID 20190621174553.28862-3-suwan.kim027@gmail.com (mailing list archive)
State New, archived
Headers show
Series usbip: Implement SG support to vhci | expand

Commit Message

Suwan Kim June 21, 2019, 5:45 p.m. UTC
There are bugs on vhci with usb 3.0 storage device. Originally, vhci
doesn't supported SG. So, USB storage driver on vhci divides SG list
into multiple URBs and it causes buffer overflow on the xhci of the
server. So we need to add SG support to vhci

In this patch, vhci basically support SG and it sends each SG list
entry to the stub driver. Then, the stub driver sees total length of
the buffer and allocates SG table and pages according to the total
buffer length calling sgl_alloc(). After the stub driver receives
completed URB, it again sends each SG list entry to the vhci.

If HCD of server doesn't support SG, the stub driver allocates
big buffer using kmalloc() instead of using sgl_alloc() which
allocates SG list and pages.

Alan fixed vhci bug with the USB 3.0 storage device by modifying
USB storage driver.
("usb-storage: Set virt_boundary_mask to avoid SG overflows")
But the fundamental solution of it is to add SG support to vhci.

This patch works well with the USB 3.0 storage devices without Alan's
patch, and we can revert Alan's patch if it causes some troubles.

Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
---
 drivers/usb/usbip/stub_rx.c      | 32 +++++++++++++----
 drivers/usb/usbip/stub_tx.c      | 53 +++++++++++++++++++++++-----
 drivers/usb/usbip/usbip_common.c | 60 +++++++++++++++++++++++++++-----
 drivers/usb/usbip/usbip_common.h |  2 ++
 drivers/usb/usbip/vhci_hcd.c     |  8 ++++-
 drivers/usb/usbip/vhci_tx.c      | 49 ++++++++++++++++++++------
 6 files changed, 169 insertions(+), 35 deletions(-)

Comments

Alan Stern June 21, 2019, 8:05 p.m. UTC | #1
On Sat, 22 Jun 2019, Suwan Kim wrote:

> There are bugs on vhci with usb 3.0 storage device. Originally, vhci
> doesn't supported SG. So, USB storage driver on vhci divides SG list
> into multiple URBs and it causes buffer overflow on the xhci of the
> server. So we need to add SG support to vhci

It doesn't cause buffer overflow.  The problem was that a transfer got
terminated too early because the transfer length for one of the URBs
was not divisible by the maxpacket size.

> In this patch, vhci basically support SG and it sends each SG list
> entry to the stub driver. Then, the stub driver sees total length of
> the buffer and allocates SG table and pages according to the total
> buffer length calling sgl_alloc(). After the stub driver receives
> completed URB, it again sends each SG list entry to the vhci.
> 
> If HCD of server doesn't support SG, the stub driver allocates
> big buffer using kmalloc() instead of using sgl_alloc() which
> allocates SG list and pages.

You might be better off not using kmalloc.  It's easier for the kernel 
to allocate a bunch of small buffers than a single big one.  Then you 
can create a separate URB for each buffer.

> Alan fixed vhci bug with the USB 3.0 storage device by modifying
> USB storage driver.
> ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
> But the fundamental solution of it is to add SG support to vhci.
> 
> This patch works well with the USB 3.0 storage devices without Alan's
> patch, and we can revert Alan's patch if it causes some troubles.

These last two paragraphs don't need to be in the patch description.

> Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
> ---

I'm not sufficiently familiar with the usbip drivers to review most of 
this.  However...

> diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
> index be87c8a63e24..cc93c1a87a3e 100644
> --- a/drivers/usb/usbip/vhci_hcd.c
> +++ b/drivers/usb/usbip/vhci_hcd.c
> @@ -696,7 +696,8 @@ static int vhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
>  	}
>  	vdev = &vhci_hcd->vdev[portnum-1];
>  
> -	if (!urb->transfer_buffer && urb->transfer_buffer_length) {
> +	if (!urb->transfer_buffer && !urb->num_sgs &&
> +	     urb->transfer_buffer_length) {
>  		dev_dbg(dev, "Null URB transfer buffer\n");
>  		return -EINVAL;
>  	}
> @@ -1142,6 +1143,11 @@ static int vhci_setup(struct usb_hcd *hcd)
>  		hcd->speed = HCD_USB3;
>  		hcd->self.root_hub->speed = USB_SPEED_SUPER;
>  	}
> +
> +	/* support sg */
> +	hcd->self.sg_tablesize = ~0;
> +	hcd->self.no_sg_constraint = 1;

You probably shouldn't do this, for two reasons.  First, sg_tablesize
of the server's HCD may be smaller than ~0.  If the client's value is
larger than the server's, a transfer could be accepted on the client
but then fail on the server because the SG list was too big.

Also, you may want to restrict the size of SG transfers even further,
so that you don't have to allocate a tremendous amount of memory all at
once on the server.  An SG transfer can be quite large.  I don't know 
what a reasonable limit would be -- 16 perhaps?

Alan Stern
Suwan Kim June 24, 2019, 2:58 p.m. UTC | #2
On Fri, Jun 21, 2019 at 04:05:24PM -0400, Alan Stern wrote:
> On Sat, 22 Jun 2019, Suwan Kim wrote:
> 
> > There are bugs on vhci with usb 3.0 storage device. Originally, vhci
> > doesn't supported SG. So, USB storage driver on vhci divides SG list
> > into multiple URBs and it causes buffer overflow on the xhci of the
> > server. So we need to add SG support to vhci
> 
> It doesn't cause buffer overflow.  The problem was that a transfer got
> terminated too early because the transfer length for one of the URBs
> was not divisible by the maxpacket size.

Oh.. I misunderstood the problem. I will rewrite the problem
situation.

> > In this patch, vhci basically support SG and it sends each SG list
> > entry to the stub driver. Then, the stub driver sees total length of
> > the buffer and allocates SG table and pages according to the total
> > buffer length calling sgl_alloc(). After the stub driver receives
> > completed URB, it again sends each SG list entry to the vhci.
> > 
> > If HCD of server doesn't support SG, the stub driver allocates
> > big buffer using kmalloc() instead of using sgl_alloc() which
> > allocates SG list and pages.
> 
> You might be better off not using kmalloc.  It's easier for the kernel 
> to allocate a bunch of small buffers than a single big one.  Then you 
> can create a separate URB for each buffer.

Ok. I will implement it as usb_sg_init() does and send v2 patch
including the logic of submitting separate URBs.

> > Alan fixed vhci bug with the USB 3.0 storage device by modifying
> > USB storage driver.
> > ("usb-storage: Set virt_boundary_mask to avoid SG overflows")
> > But the fundamental solution of it is to add SG support to vhci.
> > 
> > This patch works well with the USB 3.0 storage devices without Alan's
> > patch, and we can revert Alan's patch if it causes some troubles.
> 
> These last two paragraphs don't need to be in the patch description.

I will remove these paragraphs in v2 patch.

> > Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
> > ---
> 
> I'm not sufficiently familiar with the usbip drivers to review most of 
> this.  However...
> 
> > diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
> > index be87c8a63e24..cc93c1a87a3e 100644
> > --- a/drivers/usb/usbip/vhci_hcd.c
> > +++ b/drivers/usb/usbip/vhci_hcd.c
> > @@ -696,7 +696,8 @@ static int vhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
> >  	}
> >  	vdev = &vhci_hcd->vdev[portnum-1];
> >  
> > -	if (!urb->transfer_buffer && urb->transfer_buffer_length) {
> > +	if (!urb->transfer_buffer && !urb->num_sgs &&
> > +	     urb->transfer_buffer_length) {
> >  		dev_dbg(dev, "Null URB transfer buffer\n");
> >  		return -EINVAL;
> >  	}
> > @@ -1142,6 +1143,11 @@ static int vhci_setup(struct usb_hcd *hcd)
> >  		hcd->speed = HCD_USB3;
> >  		hcd->self.root_hub->speed = USB_SPEED_SUPER;
> >  	}
> > +
> > +	/* support sg */
> > +	hcd->self.sg_tablesize = ~0;
> > +	hcd->self.no_sg_constraint = 1;
> 
> You probably shouldn't do this, for two reasons.  First, sg_tablesize
> of the server's HCD may be smaller than ~0.  If the client's value is
> larger than the server's, a transfer could be accepted on the client
> but then fail on the server because the SG list was too big.
> 
> Also, you may want to restrict the size of SG transfers even further,
> so that you don't have to allocate a tremendous amount of memory all at
> once on the server.  An SG transfer can be quite large.  I don't know 
> what a reasonable limit would be -- 16 perhaps?

Is there any reason why you think that 16 is ok? Or Can I set this
value as the smallest value of all HC? I think that sg_tablesize
cannot be a variable value because vhci interacts with different
machines and all machines has different sg_tablesize value.

Regards

Suwan Kim
Alan Stern June 24, 2019, 5:24 p.m. UTC | #3
On Mon, 24 Jun 2019, Suwan Kim wrote:

> > > +	hcd->self.sg_tablesize = ~0;
> > > +	hcd->self.no_sg_constraint = 1;
> > 
> > You probably shouldn't do this, for two reasons.  First, sg_tablesize
> > of the server's HCD may be smaller than ~0.  If the client's value is
> > larger than the server's, a transfer could be accepted on the client
> > but then fail on the server because the SG list was too big.

On the other hand, I don't know of any examples where an HCD has 
sg_tablesize set to anything other than 0 or ~0.  vhci-hcd might end up 
being the only one.

> > Also, you may want to restrict the size of SG transfers even further,
> > so that you don't have to allocate a tremendous amount of memory all at
> > once on the server.  An SG transfer can be quite large.  I don't know 
> > what a reasonable limit would be -- 16 perhaps?
> 
> Is there any reason why you think that 16 is ok? Or Can I set this
> value as the smallest value of all HC? I think that sg_tablesize
> cannot be a variable value because vhci interacts with different
> machines and all machines has different sg_tablesize value.

I didn't have any good reason for picking 16.  Using the smallest value 
of all the HCDs seems like a good idea.

Alan Stern
Suwan Kim July 4, 2019, 5:24 p.m. UTC | #4
On Mon, Jun 24, 2019 at 01:24:15PM -0400, Alan Stern wrote:
> On Mon, 24 Jun 2019, Suwan Kim wrote:
> 
> > > > +	hcd->self.sg_tablesize = ~0;
> > > > +	hcd->self.no_sg_constraint = 1;
> > > 
> > > You probably shouldn't do this, for two reasons.  First, sg_tablesize
> > > of the server's HCD may be smaller than ~0.  If the client's value is
> > > larger than the server's, a transfer could be accepted on the client
> > > but then fail on the server because the SG list was too big.
> 
> On the other hand, I don't know of any examples where an HCD has 
> sg_tablesize set to anything other than 0 or ~0.  vhci-hcd might end up 
> being the only one.
> 
> > > Also, you may want to restrict the size of SG transfers even further,
> > > so that you don't have to allocate a tremendous amount of memory all at
> > > once on the server.  An SG transfer can be quite large.  I don't know 
> > > what a reasonable limit would be -- 16 perhaps?
> > 
> > Is there any reason why you think that 16 is ok? Or Can I set this
> > value as the smallest value of all HC? I think that sg_tablesize
> > cannot be a variable value because vhci interacts with different
> > machines and all machines has different sg_tablesize value.
> 
> I didn't have any good reason for picking 16.  Using the smallest value 
> of all the HCDs seems like a good idea.

I also have not seen an HCD with a value other than ~0 or 0 except for
whci which uses 2048, but is not 2048 the maximum value of sg_tablesize?
If so, ~0 is the minimum value of sg_tablesize that supports SG. Then
can vhci use ~0 if we don't consider memory pressure of the server?

If all of the HCDs supporting SG have ~0 as sg_tablesize value, I
think that whether we use an HCD locally or remotely, the degree of
memory pressure is same in both local and remote usage.

Regards

Suwan Kim
Alan Stern July 5, 2019, 1:41 a.m. UTC | #5
On Fri, 5 Jul 2019, Suwan Kim wrote:

> On Mon, Jun 24, 2019 at 01:24:15PM -0400, Alan Stern wrote:
> > On Mon, 24 Jun 2019, Suwan Kim wrote:
> > 
> > > > > +	hcd->self.sg_tablesize = ~0;
> > > > > +	hcd->self.no_sg_constraint = 1;
> > > > 
> > > > You probably shouldn't do this, for two reasons.  First, sg_tablesize
> > > > of the server's HCD may be smaller than ~0.  If the client's value is
> > > > larger than the server's, a transfer could be accepted on the client
> > > > but then fail on the server because the SG list was too big.
> > 
> > On the other hand, I don't know of any examples where an HCD has 
> > sg_tablesize set to anything other than 0 or ~0.  vhci-hcd might end up 
> > being the only one.
> > 
> > > > Also, you may want to restrict the size of SG transfers even further,
> > > > so that you don't have to allocate a tremendous amount of memory all at
> > > > once on the server.  An SG transfer can be quite large.  I don't know 
> > > > what a reasonable limit would be -- 16 perhaps?
> > > 
> > > Is there any reason why you think that 16 is ok? Or Can I set this
> > > value as the smallest value of all HC? I think that sg_tablesize
> > > cannot be a variable value because vhci interacts with different
> > > machines and all machines has different sg_tablesize value.
> > 
> > I didn't have any good reason for picking 16.  Using the smallest value 
> > of all the HCDs seems like a good idea.
> 
> I also have not seen an HCD with a value other than ~0 or 0 except for
> whci which uses 2048, but is not 2048 the maximum value of sg_tablesize?
> If so, ~0 is the minimum value of sg_tablesize that supports SG. Then
> can vhci use ~0 if we don't consider memory pressure of the server?
> 
> If all of the HCDs supporting SG have ~0 as sg_tablesize value, I
> think that whether we use an HCD locally or remotely, the degree of
> memory pressure is same in both local and remote usage.

You have a lot of leeway.  For example, there's no reason a single SG
transfer on the client has to correspond to a single SG transfer on the
host.  In theory the client's vhci-hcd can break a large SG transfer up
into a bunch of smaller pieces and send them to the host one by one,
then reassemble the results to complete the original transfer.  That
way the memory pressure on the host would be a lot smaller than on the
client.

Alan Stern
Suwan Kim July 5, 2019, 9:07 a.m. UTC | #6
On Thu, Jul 04, 2019 at 09:41:04PM -0400, Alan Stern wrote:
> On Fri, 5 Jul 2019, Suwan Kim wrote:
> 
> > On Mon, Jun 24, 2019 at 01:24:15PM -0400, Alan Stern wrote:
> > > On Mon, 24 Jun 2019, Suwan Kim wrote:
> > > 
> > > > > > +	hcd->self.sg_tablesize = ~0;
> > > > > > +	hcd->self.no_sg_constraint = 1;
> > > > > 
> > > > > You probably shouldn't do this, for two reasons.  First, sg_tablesize
> > > > > of the server's HCD may be smaller than ~0.  If the client's value is
> > > > > larger than the server's, a transfer could be accepted on the client
> > > > > but then fail on the server because the SG list was too big.
> > > 
> > > On the other hand, I don't know of any examples where an HCD has 
> > > sg_tablesize set to anything other than 0 or ~0.  vhci-hcd might end up 
> > > being the only one.
> > > 
> > > > > Also, you may want to restrict the size of SG transfers even further,
> > > > > so that you don't have to allocate a tremendous amount of memory all at
> > > > > once on the server.  An SG transfer can be quite large.  I don't know 
> > > > > what a reasonable limit would be -- 16 perhaps?
> > > > 
> > > > Is there any reason why you think that 16 is ok? Or Can I set this
> > > > value as the smallest value of all HC? I think that sg_tablesize
> > > > cannot be a variable value because vhci interacts with different
> > > > machines and all machines has different sg_tablesize value.
> > > 
> > > I didn't have any good reason for picking 16.  Using the smallest value 
> > > of all the HCDs seems like a good idea.
> > 
> > I also have not seen an HCD with a value other than ~0 or 0 except for
> > whci which uses 2048, but is not 2048 the maximum value of sg_tablesize?
> > If so, ~0 is the minimum value of sg_tablesize that supports SG. Then
> > can vhci use ~0 if we don't consider memory pressure of the server?
> > 
> > If all of the HCDs supporting SG have ~0 as sg_tablesize value, I
> > think that whether we use an HCD locally or remotely, the degree of
> > memory pressure is same in both local and remote usage.
> 
> You have a lot of leeway.  For example, there's no reason a single SG
> transfer on the client has to correspond to a single SG transfer on the
> host.  In theory the client's vhci-hcd can break a large SG transfer up
> into a bunch of smaller pieces and send them to the host one by one,
> then reassemble the results to complete the original transfer.  That
> way the memory pressure on the host would be a lot smaller than on the
> client.

Thank you for the feedback, Alan. I understood your comment. It
seems to be a good idea to use sg_tablesize to alleviate the memory
pressure of the host. But I think 16 is too small for USB 3.0 device
because USB 3.0 storage device in my machine usually uses more than
30 SG entries. So, I will set sg_tablesize to 32.

Regards

Suwan Kim
diff mbox series

Patch

diff --git a/drivers/usb/usbip/stub_rx.c b/drivers/usb/usbip/stub_rx.c
index 97b09a42a10c..798854b9bbc8 100644
--- a/drivers/usb/usbip/stub_rx.c
+++ b/drivers/usb/usbip/stub_rx.c
@@ -7,6 +7,7 @@ 
 #include <linux/kthread.h>
 #include <linux/usb.h>
 #include <linux/usb/hcd.h>
+#include <linux/scatterlist.h>
 
 #include "usbip_common.h"
 #include "stub.h"
@@ -446,7 +447,11 @@  static void stub_recv_cmd_submit(struct stub_device *sdev,
 	struct stub_priv *priv;
 	struct usbip_device *ud = &sdev->ud;
 	struct usb_device *udev = sdev->udev;
+	struct scatterlist *sgl;
+	int nents;
 	int pipe = get_pipe(sdev, pdu);
+	unsigned long long buf_len;
+	int use_sg;
 
 	if (pipe == -1)
 		return;
@@ -455,6 +460,10 @@  static void stub_recv_cmd_submit(struct stub_device *sdev,
 	if (!priv)
 		return;
 
+	buf_len = (unsigned long long)pdu->u.cmd_submit.transfer_buffer_length;
+	/* Check if the server's HCD supprot sg */
+	use_sg = pdu->u.cmd_submit.num_sgs && udev->bus->sg_tablesize;
+
 	/* setup a urb */
 	if (usb_pipeisoc(pipe))
 		priv->urb = usb_alloc_urb(pdu->u.cmd_submit.number_of_packets,
@@ -468,13 +477,22 @@  static void stub_recv_cmd_submit(struct stub_device *sdev,
 	}
 
 	/* allocate urb transfer buffer, if needed */
-	if (pdu->u.cmd_submit.transfer_buffer_length > 0) {
-		priv->urb->transfer_buffer =
-			kzalloc(pdu->u.cmd_submit.transfer_buffer_length,
-				GFP_KERNEL);
-		if (!priv->urb->transfer_buffer) {
-			usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
-			return;
+	if (buf_len > 0) {
+		if (use_sg) {
+			sgl = sgl_alloc(buf_len, GFP_KERNEL, &nents);
+			if (!sgl) {
+				usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
+				return;
+			}
+			priv->urb->sg = sgl;
+			priv->urb->num_sgs = nents;
+			priv->urb->transfer_buffer = NULL;
+		} else {
+			priv->urb->transfer_buffer = kzalloc(buf_len, GFP_KERNEL);
+			if (!priv->urb->transfer_buffer) {
+				usbip_event_add(ud, SDEV_EVENT_ERROR_MALLOC);
+				return;
+			}
 		}
 	}
 
diff --git a/drivers/usb/usbip/stub_tx.c b/drivers/usb/usbip/stub_tx.c
index f0ec41a50cbc..ece129cd0b28 100644
--- a/drivers/usb/usbip/stub_tx.c
+++ b/drivers/usb/usbip/stub_tx.c
@@ -5,6 +5,7 @@ 
 
 #include <linux/kthread.h>
 #include <linux/socket.h>
+#include <linux/scatterlist.h>
 
 #include "usbip_common.h"
 #include "stub.h"
@@ -13,11 +14,21 @@  static void stub_free_priv_and_urb(struct stub_priv *priv)
 {
 	struct urb *urb = priv->urb;
 
-	kfree(urb->setup_packet);
-	urb->setup_packet = NULL;
+	if (urb->setup_packet) {
+		kfree(urb->setup_packet);
+		urb->setup_packet = NULL;
+	}
+
+	if (urb->transfer_buffer) {
+		kfree(urb->transfer_buffer);
+		urb->transfer_buffer = NULL;
+	}
 
-	kfree(urb->transfer_buffer);
-	urb->transfer_buffer = NULL;
+	if (urb->num_sgs) {
+		sgl_free(urb->sg);
+		urb->sg = NULL;
+		urb->num_sgs = 0;
+	}
 
 	list_del(&priv->list);
 	kmem_cache_free(stub_priv_cache, priv);
@@ -161,13 +172,16 @@  static int stub_send_ret_submit(struct stub_device *sdev)
 		struct usbip_header pdu_header;
 		struct usbip_iso_packet_descriptor *iso_buffer = NULL;
 		struct kvec *iov = NULL;
+		struct scatterlist *sg;
 		int iovnum = 0;
+		int i;
 
 		txsize = 0;
 		memset(&pdu_header, 0, sizeof(pdu_header));
 		memset(&msg, 0, sizeof(msg));
 
-		if (urb->actual_length > 0 && !urb->transfer_buffer) {
+		if (urb->actual_length > 0 && !urb->transfer_buffer &&
+		   !urb->num_sgs) {
 			dev_err(&sdev->udev->dev,
 				"urb: actual_length %d transfer_buffer null\n",
 				urb->actual_length);
@@ -176,6 +190,8 @@  static int stub_send_ret_submit(struct stub_device *sdev)
 
 		if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS)
 			iovnum = 2 + urb->number_of_packets;
+		else if (usb_pipein(urb->pipe) && urb->actual_length > 0 && urb->num_sgs)
+			iovnum = 1 + urb->num_sgs;
 		else
 			iovnum = 2;
 
@@ -203,9 +219,30 @@  static int stub_send_ret_submit(struct stub_device *sdev)
 		if (usb_pipein(urb->pipe) &&
 		    usb_pipetype(urb->pipe) != PIPE_ISOCHRONOUS &&
 		    urb->actual_length > 0) {
-			iov[iovnum].iov_base = urb->transfer_buffer;
-			iov[iovnum].iov_len  = urb->actual_length;
-			iovnum++;
+			if (urb->num_sgs) {
+				unsigned int copy = urb->actual_length;
+				int size;
+
+				for_each_sg(urb->sg, sg, urb->num_sgs ,i) {
+					if (copy == 0)
+						break;
+
+					if (copy < sg->length)
+						size = copy;
+					else
+						size = sg->length;
+
+					iov[iovnum].iov_base = sg_virt(sg);
+					iov[iovnum].iov_len = size;
+
+					iovnum++;
+					copy -= size;
+				}
+			} else {
+				iov[iovnum].iov_base = urb->transfer_buffer;
+				iov[iovnum].iov_len  = urb->actual_length;
+				iovnum++;
+			}
 			txsize += urb->actual_length;
 		} else if (usb_pipein(urb->pipe) &&
 			   usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS) {
diff --git a/drivers/usb/usbip/usbip_common.c b/drivers/usb/usbip/usbip_common.c
index 45da3e01c7b0..56b2a1fbe0bf 100644
--- a/drivers/usb/usbip/usbip_common.c
+++ b/drivers/usb/usbip/usbip_common.c
@@ -365,6 +365,7 @@  static void usbip_pack_cmd_submit(struct usbip_header *pdu, struct urb *urb,
 		spdu->start_frame		= urb->start_frame;
 		spdu->number_of_packets		= urb->number_of_packets;
 		spdu->interval			= urb->interval;
+		spdu->num_sgs			= urb->num_sgs;
 	} else  {
 		urb->transfer_flags         = spdu->transfer_flags;
 		urb->transfer_buffer_length = spdu->transfer_buffer_length;
@@ -434,6 +435,7 @@  static void correct_endian_cmd_submit(struct usbip_header_cmd_submit *pdu,
 {
 	if (send) {
 		pdu->transfer_flags = cpu_to_be32(pdu->transfer_flags);
+		pdu->num_sgs = cpu_to_be32(pdu->num_sgs);
 
 		cpu_to_be32s(&pdu->transfer_buffer_length);
 		cpu_to_be32s(&pdu->start_frame);
@@ -441,6 +443,7 @@  static void correct_endian_cmd_submit(struct usbip_header_cmd_submit *pdu,
 		cpu_to_be32s(&pdu->interval);
 	} else {
 		pdu->transfer_flags = be32_to_cpu(pdu->transfer_flags);
+		pdu->num_sgs = be32_to_cpu(pdu->num_sgs);
 
 		be32_to_cpus(&pdu->transfer_buffer_length);
 		be32_to_cpus(&pdu->start_frame);
@@ -680,8 +683,12 @@  EXPORT_SYMBOL_GPL(usbip_pad_iso);
 /* some members of urb must be substituted before. */
 int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
 {
-	int ret;
+	struct scatterlist *sg;
+	int ret = 0;
+	int recv;
 	int size;
+	int copy;
+	int i;
 
 	if (ud->side == USBIP_STUB || ud->side == USBIP_VUDC) {
 		/* the direction of urb must be OUT. */
@@ -712,14 +719,49 @@  int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
 		}
 	}
 
-	ret = usbip_recv(ud->tcp_socket, urb->transfer_buffer, size);
-	if (ret != size) {
-		dev_err(&urb->dev->dev, "recv xbuf, %d\n", ret);
-		if (ud->side == USBIP_STUB || ud->side == USBIP_VUDC) {
-			usbip_event_add(ud, SDEV_EVENT_ERROR_TCP);
-		} else {
-			usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
-			return -EPIPE;
+	if (urb->num_sgs) {
+		copy = size;
+		for_each_sg(urb->sg, sg, urb->num_sgs, i) {
+			int recv_size;
+
+			if (copy < sg->length)
+				recv_size = copy;
+			else
+				recv_size = sg->length;
+
+			recv = usbip_recv(ud->tcp_socket, sg_virt(sg), recv_size);
+			if (recv != recv_size) {
+				dev_err(&urb->dev->dev, "recv xbuf, %d\n", ret);
+				if (ud->side == USBIP_STUB || ud->side == USBIP_VUDC) {
+					usbip_event_add(ud, SDEV_EVENT_ERROR_TCP);
+				} else {
+					usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
+					return -EPIPE;
+				}
+			}
+			copy -= recv;
+			ret += recv;
+		}
+
+		if (ret != size) {
+			dev_err(&urb->dev->dev, "recv xbuf, %d\n", ret);
+			if (ud->side == USBIP_STUB || ud->side == USBIP_VUDC) {
+				usbip_event_add(ud, SDEV_EVENT_ERROR_TCP);
+			} else {
+				usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
+				return -EPIPE;
+			}
+		}
+	} else {
+		ret = usbip_recv(ud->tcp_socket, urb->transfer_buffer, size);
+		if (ret != size) {
+			dev_err(&urb->dev->dev, "recv xbuf, %d\n", ret);
+			if (ud->side == USBIP_STUB || ud->side == USBIP_VUDC) {
+				usbip_event_add(ud, SDEV_EVENT_ERROR_TCP);
+			} else {
+				usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
+				return -EPIPE;
+			}
 		}
 	}
 
diff --git a/drivers/usb/usbip/usbip_common.h b/drivers/usb/usbip/usbip_common.h
index bf8afe9b5883..b395946c4d03 100644
--- a/drivers/usb/usbip/usbip_common.h
+++ b/drivers/usb/usbip/usbip_common.h
@@ -143,6 +143,7 @@  struct usbip_header_basic {
  * struct usbip_header_cmd_submit - USBIP_CMD_SUBMIT packet header
  * @transfer_flags: URB flags
  * @transfer_buffer_length: the data size for (in) or (out) transfer
+ * @num_sgs: the number of scatter gather list of URB
  * @start_frame: initial frame for isochronous or interrupt transfers
  * @number_of_packets: number of isochronous packets
  * @interval: maximum time for the request on the server-side host controller
@@ -151,6 +152,7 @@  struct usbip_header_basic {
 struct usbip_header_cmd_submit {
 	__u32 transfer_flags;
 	__s32 transfer_buffer_length;
+	__u32 num_sgs;
 
 	/* it is difficult for usbip to sync frames (reserved only?) */
 	__s32 start_frame;
diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index be87c8a63e24..cc93c1a87a3e 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -696,7 +696,8 @@  static int vhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 	}
 	vdev = &vhci_hcd->vdev[portnum-1];
 
-	if (!urb->transfer_buffer && urb->transfer_buffer_length) {
+	if (!urb->transfer_buffer && !urb->num_sgs &&
+	     urb->transfer_buffer_length) {
 		dev_dbg(dev, "Null URB transfer buffer\n");
 		return -EINVAL;
 	}
@@ -1142,6 +1143,11 @@  static int vhci_setup(struct usb_hcd *hcd)
 		hcd->speed = HCD_USB3;
 		hcd->self.root_hub->speed = USB_SPEED_SUPER;
 	}
+
+	/* support sg */
+	hcd->self.sg_tablesize = ~0;
+	hcd->self.no_sg_constraint = 1;
+
 	return 0;
 }
 
diff --git a/drivers/usb/usbip/vhci_tx.c b/drivers/usb/usbip/vhci_tx.c
index 2fa26d0578d7..3472180f5af8 100644
--- a/drivers/usb/usbip/vhci_tx.c
+++ b/drivers/usb/usbip/vhci_tx.c
@@ -5,6 +5,7 @@ 
 
 #include <linux/kthread.h>
 #include <linux/slab.h>
+#include <linux/scatterlist.h>
 
 #include "usbip_common.h"
 #include "vhci.h"
@@ -51,12 +52,13 @@  static struct vhci_priv *dequeue_from_priv_tx(struct vhci_device *vdev)
 static int vhci_send_cmd_submit(struct vhci_device *vdev)
 {
 	struct vhci_priv *priv = NULL;
-
+	struct scatterlist *sg;
 	struct msghdr msg;
-	struct kvec iov[3];
+	struct kvec *iov;
 	size_t txsize;
-
 	size_t total_size = 0;
+	int iovnum;
+	int i;
 
 	while ((priv = dequeue_from_priv_tx(vdev)) != NULL) {
 		int ret;
@@ -72,18 +74,41 @@  static int vhci_send_cmd_submit(struct vhci_device *vdev)
 		usbip_dbg_vhci_tx("setup txdata urb seqnum %lu\n",
 				  priv->seqnum);
 
+		if (urb->num_sgs && usb_pipeout(urb->pipe))
+			iovnum = 2 + urb->num_sgs;
+		else
+			iovnum = 3;
+
+		iov = kzalloc(iovnum * sizeof(*iov), GFP_KERNEL);
+		if (!iov) {
+			usbip_event_add(&vdev->ud,
+						SDEV_EVENT_ERROR_MALLOC);
+			return -ENOMEM;
+		}
+
 		/* 1. setup usbip_header */
 		setup_cmd_submit_pdu(&pdu_header, urb);
 		usbip_header_correct_endian(&pdu_header, 1);
+		iovnum = 0;
 
-		iov[0].iov_base = &pdu_header;
-		iov[0].iov_len  = sizeof(pdu_header);
+		iov[iovnum].iov_base = &pdu_header;
+		iov[iovnum].iov_len  = sizeof(pdu_header);
 		txsize += sizeof(pdu_header);
+		iovnum++;
 
 		/* 2. setup transfer buffer */
 		if (!usb_pipein(urb->pipe) && urb->transfer_buffer_length > 0) {
-			iov[1].iov_base = urb->transfer_buffer;
-			iov[1].iov_len  = urb->transfer_buffer_length;
+			if (urb->num_sgs && !usb_endpoint_xfer_isoc(&urb->ep->desc)) {
+				for_each_sg(urb->sg, sg, urb->num_sgs ,i) {
+					iov[iovnum].iov_base = sg_virt(sg);
+					iov[iovnum].iov_len = sg->length;
+					iovnum++;
+				}
+			} else {
+				iov[iovnum].iov_base = urb->transfer_buffer;
+				iov[iovnum].iov_len  = urb->transfer_buffer_length;
+				iovnum++;
+			}
 			txsize += urb->transfer_buffer_length;
 		}
 
@@ -93,25 +118,29 @@  static int vhci_send_cmd_submit(struct vhci_device *vdev)
 
 			iso_buffer = usbip_alloc_iso_desc_pdu(urb, &len);
 			if (!iso_buffer) {
+				kfree(iov);
 				usbip_event_add(&vdev->ud,
 						SDEV_EVENT_ERROR_MALLOC);
 				return -1;
 			}
 
-			iov[2].iov_base = iso_buffer;
-			iov[2].iov_len  = len;
+			iov[iovnum].iov_base = iso_buffer;
+			iov[iovnum].iov_len  = len;
+			iovnum++;
 			txsize += len;
 		}
 
-		ret = kernel_sendmsg(vdev->ud.tcp_socket, &msg, iov, 3, txsize);
+		ret = kernel_sendmsg(vdev->ud.tcp_socket, &msg, iov, iovnum, txsize);
 		if (ret != txsize) {
 			pr_err("sendmsg failed!, ret=%d for %zd\n", ret,
 			       txsize);
+			kfree(iov);
 			kfree(iso_buffer);
 			usbip_event_add(&vdev->ud, VDEV_EVENT_ERROR_TCP);
 			return -1;
 		}
 
+		kfree(iov);
 		kfree(iso_buffer);
 		usbip_dbg_vhci_tx("send txdata\n");