[v2,2/2] qdisk - hw/block/xen_disk: grant copy implementation

Message ID 1465811036-17026-3-git-send-email-paulinaszubarczyk@gmail.com (mailing list archive)
State New, archived

Commit Message

Paulina Szubarczyk June 13, 2016, 9:43 a.m. UTC
Copy the data a request operates on between local buffers and the
grant references.

Before the grant copy operation, local buffers must be allocated;
this is done by calling ioreq_init_copy_buffers. For a 'read'
operation the qemu device first performs the read into the local
buffers; on completion, grant copy is invoked and the buffers are
freed. For a 'write' operation, grant copy is performed before the
qemu device invokes the write.

A new field 'feature_grant_copy' is added to recognize whether the
grant copy operation is supported.
The body of the function 'ioreq_runio_qemu_aio' is moved to
'ioreq_runio_qemu_aio_blk'. Depending on grant copy support,
'ioreq_runio_qemu_aio' performs the corresponding checks,
initialization and grant operation, and then calls
'ioreq_runio_qemu_aio_blk'.

Signed-off-by: Paulina Szubarczyk <paulinaszubarczyk@gmail.com>
---
Changes since v1:
- removed the 'ioreq_write', 'ioreq_read_init' and 'ioreq_read' functions
- implemented 'ioreq_init_copy_buffers' and 'ioreq_copy'
- reverted the removal of grant map and introduced conditional invocation
  of grant copy or grant map
- dropped caching of the local buffers in favor of allocating the
  required number of pages at once. The cached structure would need
  a lock guard, and I suspect the performance improvement would be
  degraded by it.

 hw/block/xen_disk.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 163 insertions(+), 12 deletions(-)
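
For orientation, here is a minimal sketch of how a single grant-copy
segment is set up for a read response, mirroring the BLKIF_OP_READ path
in the patch below. The helper name and its parameters are illustrative
placeholders, not code from the patch; only the segment fields, the
GNTCOPY_dest_gref flag and the xc_gnttab_grant_copy() call follow the
patch itself.

    /* Sketch only: copy 'len' bytes from a backend-local buffer into a
     * guest-granted page. Returns 0 on success, as
     * xc_gnttab_grant_copy() does. */
    static int copy_one_segment_to_guest(XenGnttab gnt, void *local_buf,
                                         uint32_t gref, uint16_t domid,
                                         uint16_t offset, uint16_t len)
    {
        xc_gnttab_grant_copy_segment_t seg;

        seg.flags = GNTCOPY_dest_gref;     /* destination is a grant ref */
        seg.len = len;
        seg.source.virt = local_buf;       /* backend-local bounce buffer */
        seg.dest.foreign.ref = gref;       /* grant ref from the request */
        seg.dest.foreign.domid = domid;    /* frontend domain id */
        seg.dest.foreign.offset = offset;  /* first_sect * file_blk below */

        return xc_gnttab_grant_copy(gnt, 1, &seg);
    }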

Comments

David Vrabel June 13, 2016, 10:15 a.m. UTC | #1
On 13/06/16 10:43, Paulina Szubarczyk wrote:
> Copy the data a request operates on between local buffers and the
> grant references.
> 
> Before the grant copy operation, local buffers must be allocated;
> this is done by calling ioreq_init_copy_buffers. For a 'read'
> operation the qemu device first performs the read into the local
> buffers; on completion, grant copy is invoked and the buffers are
> freed. For a 'write' operation, grant copy is performed before the
> qemu device invokes the write.
> 
> A new field 'feature_grant_copy' is added to recognize whether the
> grant copy operation is supported.
> The body of the function 'ioreq_runio_qemu_aio' is moved to
> 'ioreq_runio_qemu_aio_blk'. Depending on grant copy support,
> 'ioreq_runio_qemu_aio' performs the corresponding checks,
> initialization and grant operation, and then calls
> 'ioreq_runio_qemu_aio_blk'.

I think you should add an option to force the use of grant mapping even
if copy support is detected.  If future changes to the grant map
infrastructure make it faster, or if grant map scales better on some
systems, then it would be useful to be able to use it.

> +    rc = xc_gnttab_grant_copy(gnt, count, segs);
> +
> +    if (rc) {
> +        xen_be_printf(&ioreq->blkdev->xendev, 0, 
> +                      "failed to copy data %d \n", rc);

I don't think you want to log anything here.  A guest could spam the
logs by repeatedly submitting requests with (for example) bad grant
references.

> +        ioreq->aio_errors++;
> +        r = -1; goto out;

return -1;

> @@ -1020,10 +1163,18 @@ static int blk_connect(struct XenDevice *xendev)
>  
>      xen_be_bind_evtchn(&blkdev->xendev);
>  
> +    xc_gnttab_grant_copy_segment_t seg;
> +    blkdev->feature_grant_copy = 
> +                (xc_gnttab_grant_copy(blkdev->xendev.gnttabdev, 0, &seg) == 0);

You can pass NULL for the segments here.
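
In other words, the probe could be reduced to a zero-count call with a
NULL segment list; a sketch under that assumption:

    /* Sketch of the suggested probe: a zero-length copy with no segment
     * array; success implies the interface supports grant copy. */
    blkdev->feature_grant_copy =
                (xc_gnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);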

> +
> +    xen_be_printf(&blkdev->xendev, 3, "GRANT COPY %s\n", 
> +                  blkdev->feature_grant_copy ? "ENABLED" : "DISABLED");
> +
>      xen_be_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
>                    "remote port %d, local port %d\n",
>                    blkdev->xendev.protocol, blkdev->ring_ref,
>                    blkdev->xendev.remote_port, blkdev->xendev.local_port);
> +
>      return 0;
>  }

David
Paulina Szubarczyk June 13, 2016, 10:44 a.m. UTC | #2
On Mon, 2016-06-13 at 11:15 +0100, David Vrabel wrote:
> On 13/06/16 10:43, Paulina Szubarczyk wrote:
> > Copy the data a request operates on between local buffers and the
> > grant references.
> > 
> > Before the grant copy operation, local buffers must be allocated;
> > this is done by calling ioreq_init_copy_buffers. For a 'read'
> > operation the qemu device first performs the read into the local
> > buffers; on completion, grant copy is invoked and the buffers are
> > freed. For a 'write' operation, grant copy is performed before the
> > qemu device invokes the write.
> > 
> > A new field 'feature_grant_copy' is added to recognize whether the
> > grant copy operation is supported.
> > The body of the function 'ioreq_runio_qemu_aio' is moved to
> > 'ioreq_runio_qemu_aio_blk'. Depending on grant copy support,
> > 'ioreq_runio_qemu_aio' performs the corresponding checks,
> > initialization and grant operation, and then calls
> > 'ioreq_runio_qemu_aio_blk'.
> 
> I think you should add an option to force the use of grant mapping even
> if copy support is detected.  If future changes to the grant map
> infrastructure make it faster, or if grant map scales better on some
> systems, then it would be useful to be able to use it.

'feature_grant_copy' is a boolean and could be set to false in such
a case. A node could be added in XenStore, for example
'feature-force-grant-map', which, when set by the frontend, would be
read during connection and would change the value to false, forcing
the grant map operation.
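
As an illustration only (the node name is hypothetical, and David
rejects frontend control of this below), such a check in blk_connect()
could use QEMU's existing xenstore_read_fe_int() helper:

    /* Hypothetical sketch: honour a frontend-set
     * 'feature-force-grant-map' node; treat an absent or unreadable
     * node as 0. */
    int force_grant_map = 0;

    if (xenstore_read_fe_int(&blkdev->xendev, "feature-force-grant-map",
                             &force_grant_map) == -1) {
        force_grant_map = 0;
    }
    if (force_grant_map) {
        blkdev->feature_grant_copy = false;
    }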

> > +    rc = xc_gnttab_grant_copy(gnt, count, segs);
> > +
> > +    if (rc) {
> > +        xen_be_printf(&ioreq->blkdev->xendev, 0, 
> > +                      "failed to copy data %d \n", rc);
> 
> I don't think you want to log anything here.  A guest could spam the
> logs by repeatedly submitting requests with (for example) bad grant
> references.

I might remove that log or change the level, though when grant map
fails the failure is logged in a similar manner.
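
Changing the level would be a one-line tweak; a sketch, assuming the
usual xen_be_printf() convention that higher levels only appear with
more verbose debugging:

    /* Sketch: demote the message from level 0 to a debug level so a
     * guest submitting bad grant references cannot spam the default
     * log. */
    xen_be_printf(&ioreq->blkdev->xendev, 3,
                  "failed to copy data %d\n", rc);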

> > +        ioreq->aio_errors++;
> > +        r = -1; goto out;
> 
> return -1;
> 
> > @@ -1020,10 +1163,18 @@ static int blk_connect(struct XenDevice *xendev)
> >  
> >      xen_be_bind_evtchn(&blkdev->xendev);
> >  
> > +    xc_gnttab_grant_copy_segment_t seg;
> > +    blkdev->feature_grant_copy = 
> > +                (xc_gnttab_grant_copy(blkdev->xendev.gnttabdev, 0, &seg) == 0);
> 
> You can pass NULL for the segments here.

Yes, thank you.
> 
> > +
> > +    xen_be_printf(&blkdev->xendev, 3, "GRANT COPY %s\n", 
> > +                  blkdev->feature_grant_copy ? "ENABLED" : "DISABLED");
> > +
> >      xen_be_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
> >                    "remote port %d, local port %d\n",
> >                    blkdev->xendev.protocol, blkdev->ring_ref,
> >                    blkdev->xendev.remote_port, blkdev->xendev.local_port);
> > +
> >      return 0;
> >  }
> 
> David
> 
Paulina
David Vrabel June 13, 2016, 10:58 a.m. UTC | #3
On 13/06/16 11:44, Paulina Szubarczyk wrote:
> On Mon, 2016-06-13 at 11:15 +0100, David Vrabel wrote:
>> On 13/06/16 10:43, Paulina Szubarczyk wrote:
>>> Copy the data a request operates on between local buffers and the
>>> grant references.
>>>
>>> Before the grant copy operation, local buffers must be allocated;
>>> this is done by calling ioreq_init_copy_buffers. For a 'read'
>>> operation the qemu device first performs the read into the local
>>> buffers; on completion, grant copy is invoked and the buffers are
>>> freed. For a 'write' operation, grant copy is performed before the
>>> qemu device invokes the write.
>>>
>>> A new field 'feature_grant_copy' is added to recognize whether the
>>> grant copy operation is supported.
>>> The body of the function 'ioreq_runio_qemu_aio' is moved to
>>> 'ioreq_runio_qemu_aio_blk'. Depending on grant copy support,
>>> 'ioreq_runio_qemu_aio' performs the corresponding checks,
>>> initialization and grant operation, and then calls
>>> 'ioreq_runio_qemu_aio_blk'.
>>
>> I think you should add an option to force the use of grant mapping even
>> if copy support is detected.  If future changes to the grant map
>> infrastructure make it faster, or if grant map scales better on some
>> systems, then it would be useful to be able to use it.
> 
> 'feature_grant_copy' is a boolean and could be set to false in such
> a case. A node could be added in XenStore, for example
> 'feature-force-grant-map', which, when set by the frontend, would be
> read during connection and would change the value to false, forcing
> the grant map operation.

This option should not be exposed/controlled by the frontend.  I was
thinking of a command line option to qemu or similar.

David
Paulina Szubarczyk June 15, 2016, 4:55 p.m. UTC | #4
On Mon, 2016-06-13 at 11:58 +0100, David Vrabel wrote:
> On 13/06/16 11:44, Paulina Szubarczyk wrote:
> > On Mon, 2016-06-13 at 11:15 +0100, David Vrabel wrote:
> >> On 13/06/16 10:43, Paulina Szubarczyk wrote:
> >>> Copy the data a request operates on between local buffers and the
> >>> grant references.
> >>>
> >>> Before the grant copy operation, local buffers must be allocated;
> >>> this is done by calling ioreq_init_copy_buffers. For a 'read'
> >>> operation the qemu device first performs the read into the local
> >>> buffers; on completion, grant copy is invoked and the buffers are
> >>> freed. For a 'write' operation, grant copy is performed before the
> >>> qemu device invokes the write.
> >>>
> >>> A new field 'feature_grant_copy' is added to recognize whether the
> >>> grant copy operation is supported.
> >>> The body of the function 'ioreq_runio_qemu_aio' is moved to
> >>> 'ioreq_runio_qemu_aio_blk'. Depending on grant copy support,
> >>> 'ioreq_runio_qemu_aio' performs the corresponding checks,
> >>> initialization and grant operation, and then calls
> >>> 'ioreq_runio_qemu_aio_blk'.
> >>
> >> I think you should add an option to force the use of grant mapping even
> >> if copy support is detected.  If future changes to the grant map
> >> infrastructure make it faster, or if grant map scales better on some
> >> systems, then it would be useful to be able to use it.
> > 
> > 'feature_grant_copy' is a boolean and could be set to false in such
> > a case. A node could be added in XenStore, for example
> > 'feature-force-grant-map', which, when set by the frontend, would be
> > read during connection and would change the value to false, forcing
> > the grant map operation.
> 
> This option should not be exposed/controlled by the frontend.  I was
> thinking of a command line option to qemu or similar.

I think there should then be a possibility to set this via
'xl block-attach <disk-spec-component(s)>' and in the corresponding
fields of the config file that defines the domain. But if there is
a need for such a feature I would rather do it in a separate patch.

Paulina
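
For completeness, a toolstack-controlled switch of this kind would
surface as a backend XenStore node rather than a frontend one; a hedged
sketch, with a hypothetical node name, using QEMU's
xenstore_read_be_int() helper:

    /* Hypothetical sketch: let the toolstack (domain config or
     * 'xl block-attach') disable grant copy via a backend node. */
    int suppress_grant_copy = 0;

    if (xenstore_read_be_int(&blkdev->xendev, "suppress-grant-copy",
                             &suppress_grant_copy) == -1) {
        suppress_grant_copy = 0;
    }
    if (suppress_grant_copy) {
        blkdev->feature_grant_copy = false;
    }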

Patch

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 37e14d1..af6b8c7 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -131,6 +131,9 @@  struct XenBlkDev {
     unsigned int        persistent_gnt_count;
     unsigned int        max_grants;
 
+    /* Grant copy */
+    gboolean            feature_grant_copy;
+
     /* qemu block driver */
     DriveInfo           *dinfo;
     BlockBackend        *blk;
@@ -500,6 +503,100 @@  static int ioreq_map(struct ioreq *ioreq)
     return 0;
 }
 
+static void* get_buffer(int count) 
+{
+    return xc_memalign(xen_xc, XC_PAGE_SIZE, count*XC_PAGE_SIZE);
+}
+
+static void free_buffers(struct ioreq *ioreq) 
+{
+    int i;
+
+    for (i = 0; i < ioreq->v.niov; i++) { 
+        ioreq->page[i] = NULL;
+    }
+
+    free(ioreq->pages);
+}
+
+static int ioreq_init_copy_buffers(struct ioreq *ioreq) {
+    int i;
+
+    if (ioreq->v.niov == 0) {
+        return 0;
+    }
+
+    ioreq->pages = get_buffer(ioreq->v.niov);
+    if (!ioreq->pages) { 
+        return -1; 
+    }
+
+    for (i = 0; i < ioreq->v.niov; i++) {
+        ioreq->page[i] = ioreq->pages + i*XC_PAGE_SIZE;
+        ioreq->v.iov[i].iov_base += (uintptr_t)ioreq->page[i];
+    }
+
+    return 0;
+}
+
+static int ioreq_copy(struct ioreq *ioreq) 
+{
+    XenGnttab gnt = ioreq->blkdev->xendev.gnttabdev;
+    xc_gnttab_grant_copy_segment_t segs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+    int i, count = 0, r, rc;
+    int64_t file_blk = ioreq->blkdev->file_blk;
+
+    if (ioreq->v.niov == 0) {
+        r = 0; goto out;
+    }
+
+    count = ioreq->v.niov;
+
+    for (i = 0; i < count; i++) {
+
+        xc_gnttab_grant_copy_ptr_t *from, *to;
+
+        if (ioreq->req.operation == BLKIF_OP_READ) {
+            segs[i].flags = GNTCOPY_dest_gref;
+            from = &(segs[i].dest);
+            to = &(segs[i].source);
+        } else {
+            segs[i].flags = GNTCOPY_source_gref;
+            from = &(segs[i].source);
+            to = &(segs[i].dest);
+        }
+        segs[i].len = (ioreq->req.seg[i].last_sect 
+                       - ioreq->req.seg[i].first_sect + 1) * file_blk;
+        from->foreign.ref = ioreq->refs[i];
+        from->foreign.domid = ioreq->domids[i];
+        from->foreign.offset = ioreq->req.seg[i].first_sect * file_blk;
+        to->virt = ioreq->v.iov[i].iov_base;
+    }
+
+    rc = xc_gnttab_grant_copy(gnt, count, segs);
+
+    if (rc) {
+        xen_be_printf(&ioreq->blkdev->xendev, 0, 
+                      "failed to copy data %d \n", rc);
+        ioreq->aio_errors++;
+        r = -1; goto out;
+    } else {
+        r = 0;
+    }
+
+    for (i = 0; i < count; i++) {
+        if (segs[i].status != GNTST_okay) {
+            xen_be_printf(&ioreq->blkdev->xendev, 0, 
+                          "failed to copy data %d for gref %d, domid %d\n", rc, 
+                          ioreq->refs[i], ioreq->domids[i]); 
+            ioreq->aio_errors++;
+            r = -1;
+        }
+    }
+out:
+    return r;
+}
+
 static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
 
 static void qemu_aio_complete(void *opaque, int ret)
@@ -521,6 +618,7 @@  static void qemu_aio_complete(void *opaque, int ret)
     if (ioreq->aio_inflight > 0) {
         return;
     }
+
     if (ioreq->postsync) {
         ioreq->postsync = 0;
         ioreq->aio_inflight++;
@@ -528,8 +626,32 @@  static void qemu_aio_complete(void *opaque, int ret)
         return;
     }
 
+    if (ioreq->blkdev->feature_grant_copy) {
+        switch (ioreq->req.operation) {
+        case BLKIF_OP_READ:
+            /* in case of failure ioreq->aio_errors is increased
+             * and it is logged */
+            ioreq_copy(ioreq);
+            free_buffers(ioreq);
+            break;
+        case BLKIF_OP_WRITE:
+        case BLKIF_OP_FLUSH_DISKCACHE:
+            if (!ioreq->req.nr_segments) {
+                break;
+            }
+            free_buffers(ioreq);
+            break;
+        default:
+            break;
+        }
+    }
+
     ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
-    ioreq_unmap(ioreq);
+    
+    if (!ioreq->blkdev->feature_grant_copy) {
+        ioreq_unmap(ioreq);
+    }
+
     ioreq_finish(ioreq);
     switch (ioreq->req.operation) {
     case BLKIF_OP_WRITE:
@@ -547,14 +669,42 @@  static void qemu_aio_complete(void *opaque, int ret)
     qemu_bh_schedule(ioreq->blkdev->bh);
 }
 
+static int ioreq_runio_qemu_aio_blk(struct ioreq *ioreq);
+
 static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
 {
-    struct XenBlkDev *blkdev = ioreq->blkdev;
+    if (ioreq->blkdev->feature_grant_copy) {
+
+        ioreq_init_copy_buffers(ioreq);
+        if (ioreq->req.nr_segments && (ioreq->req.operation == BLKIF_OP_WRITE ||
+            ioreq->req.operation == BLKIF_OP_FLUSH_DISKCACHE)) {
+            if (ioreq_copy(ioreq)) {
+                free_buffers(ioreq);
+                goto err;
+            }
+        }
+        if (ioreq_runio_qemu_aio_blk(ioreq)) goto err;
 
-    if (ioreq->req.nr_segments && ioreq_map(ioreq) == -1) {
-        goto err_no_map;
+    } else {
+        
+        if (ioreq->req.nr_segments && ioreq_map(ioreq)) goto err;
+        if (ioreq_runio_qemu_aio_blk(ioreq)) {
+            ioreq_unmap(ioreq);
+            goto err;
+        }
     }
 
+    return 0;
+err:
+    ioreq_finish(ioreq);
+    ioreq->status = BLKIF_RSP_ERROR;
+    return -1;
+}
+
+static int ioreq_runio_qemu_aio_blk(struct ioreq *ioreq) 
+{
+    struct XenBlkDev *blkdev = ioreq->blkdev;
+
     ioreq->aio_inflight++;
     if (ioreq->presync) {
         blk_aio_flush(ioreq->blkdev->blk, qemu_aio_complete, ioreq);
@@ -594,19 +744,12 @@  static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
     }
     default:
         /* unknown operation (shouldn't happen -- parse catches this) */
-        goto err;
+        return -1;
     }
 
     qemu_aio_complete(ioreq, 0);
 
     return 0;
-
-err:
-    ioreq_unmap(ioreq);
-err_no_map:
-    ioreq_finish(ioreq);
-    ioreq->status = BLKIF_RSP_ERROR;
-    return -1;
 }
 
 static int blk_send_response_one(struct ioreq *ioreq)
@@ -1020,10 +1163,18 @@  static int blk_connect(struct XenDevice *xendev)
 
     xen_be_bind_evtchn(&blkdev->xendev);
 
+    xc_gnttab_grant_copy_segment_t seg;
+    blkdev->feature_grant_copy = 
+                (xc_gnttab_grant_copy(blkdev->xendev.gnttabdev, 0, &seg) == 0);
+
+    xen_be_printf(&blkdev->xendev, 3, "GRANT COPY %s\n", 
+                  blkdev->feature_grant_copy ? "ENABLED" : "DISABLED");
+
     xen_be_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
                   "remote port %d, local port %d\n",
                   blkdev->xendev.protocol, blkdev->ring_ref,
                   blkdev->xendev.remote_port, blkdev->xendev.local_port);
+
     return 0;
 }