Message ID: 1342168731-11797-4-git-send-email-asias@redhat.com (mailing list archive)
State: New, archived
On Fri, 13 Jul 2012 16:38:51 +0800, Asias He <asias@redhat.com> wrote:
> Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
> use_bio=1' to enable ->make_request_fn() based I/O path.

This patch conflicts with Paolo Bonzini's 'virtio-blk: allow toggling
host cache between writeback and writethrough' which is also queued (see
linux-next).

I'm not sure what the correct behavior for bio & cacheflush is, if any.

But as to the patch itself: it's a hack.

1) Leaving the guest's admin to turn on the switch is a terrible choice.
2) The block layer should stop merging and sorting when a device is
   fast, not the driver.
3) I pointed out that slow disks have low IOPS, so why is this
   conditional?  Sure, more guest exits, but it's still a small number
   for a slow device.
4) The only case where we want merging is on a slow device when the host
   isn't doing it.

Now, despite this, I'm prepared to commit it.  But in my mind it's a
hack: we should aim for use_bio to be based on a feature bit fed from
the host, and use the module parameter only if we want to override it.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
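Rusty's suggested policy — a feature bit fed from the host, with the module parameter only as an override — can be sketched in plain userspace C. The names below (`USE_BIO_AUTO`, `effective_use_bio`) are invented for illustration and are not part of the driver or the virtio spec:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy sketch: the host advertises whether the backend is
 * fast enough to make the bio path worthwhile; the guest-side module
 * parameter defaults to "follow the host" and only overrides when set. */
enum { USE_BIO_AUTO = -1, USE_BIO_OFF = 0, USE_BIO_ON = 1 };

static bool effective_use_bio(bool host_feature_bit, int use_bio_param)
{
	if (use_bio_param != USE_BIO_AUTO)
		return use_bio_param == USE_BIO_ON;	/* admin override */
	return host_feature_bit;			/* follow host hint */
}
```

This keeps the decision with the host (which knows the backing device), while still letting a guest admin force the matter either way.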
On Fri, 13 Jul 2012 16:38:51 +0800, Asias He <asias@redhat.com> wrote:
> This patch introduces bio-based IO path for virtio-blk.

Acked-by: Rusty Russell <rusty@rustcorp.com.au>

I just hope we can do better than a module option in future.

Thanks,
Rusty.
On 07/27/2012 08:33 AM, Rusty Russell wrote:
> On Fri, 13 Jul 2012 16:38:51 +0800, Asias He <asias@redhat.com> wrote:
>> Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
>> use_bio=1' to enable ->make_request_fn() based I/O path.
>
> This patch conflicts with Paolo Bonzini's 'virtio-blk: allow toggling
> host cache between writeback and writethrough' which is also queued (see
> linux-next).

Rebased against Paolo's patch in V4.

> I'm not sure what the correct behavior for bio & cacheflush is, if any.

REQ_FLUSH is not supported in the bio path.

> But as to the patch itself: it's a hack.
>
> 1) Leaving the guest's admin to turn on the switch is a terrible choice.
> 2) The block layer should stop merging and sorting when a device is
>    fast, not the driver.
> 3) I pointed out that slow disks have low IOPS, so why is this
>    conditional?  Sure, more guest exits, but it's still a small number
>    for a slow device.
> 4) The only case where we want merging is on a slow device when the host
>    isn't doing it.
>
> Now, despite this, I'm prepared to commit it.  But in my mind it's a
> hack: we should aim for use_bio to be based on a feature bit fed from
> the host, and use the module parameter only if we want to override it.

OK. A feature bit from the host sounds like a choice, but a switch is
also needed on the host side. And for other OSes, e.g. Windows, the bio
thing does not apply at all. Anyway, I have to admit that adding a
module parameter here is not the best choice. Let's think more.
> > I'm not sure what the correct behavior for bio & cacheflush is, if
> > any.
>
> REQ_FLUSH is not supported in the bio path.

Ouch, that's correct:

@@ -414,7 +529,7 @@ static void virtblk_update_cache_mode(struct virtio_device *vdev)
 	u8 writeback = virtblk_get_cache_mode(vdev);
 	struct virtio_blk *vblk = vdev->priv;
 
-	if (writeback)
+	if (writeback && !use_bio)
 		blk_queue_flush(vblk->disk->queue, REQ_FLUSH);
 	else
 		blk_queue_flush(vblk->disk->queue, 0);

then it is not safe against power losses.

Paolo
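The consequence Paolo points out can be modeled in a few lines of userspace C — a sketch of the reasoning, not driver code (the helper names are invented). With the hunk above, the writeback + bio combination ends up with no way to flush the host cache:

```c
#include <assert.h>
#include <stdbool.h>

/* Does the queue advertise REQ_FLUSH, given the host cache mode and the
 * chosen I/O path?  This mirrors the "writeback && !use_bio" condition. */
static bool queue_has_flush(bool host_writeback, bool use_bio)
{
	return host_writeback && !use_bio;
}

/* Writethrough needs no flushes; writeback is only safe if the guest can
 * actually issue them. */
static bool safe_against_power_loss(bool host_writeback, bool use_bio)
{
	return !host_writeback || queue_has_flush(host_writeback, use_bio);
}
```

The unsafe cell of this truth table (writeback host cache, bio path, no flush support) is exactly the case Paolo flags.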
On Sat, Jul 28, 2012 at 10:38:41AM +0800, Asias He wrote:
> On 07/27/2012 08:33 AM, Rusty Russell wrote:
> > On Fri, 13 Jul 2012 16:38:51 +0800, Asias He <asias@redhat.com> wrote:
> > > Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
> > > use_bio=1' to enable ->make_request_fn() based I/O path.
> >
> > This patch conflicts with Paolo Bonzini's 'virtio-blk: allow toggling
> > host cache between writeback and writethrough' which is also queued (see
> > linux-next).
>
> Rebased against Paolo's patch in V4.
>
> > I'm not sure what the correct behavior for bio & cacheflush is, if any.
>
> REQ_FLUSH is not supported in the bio path.
>
> > But as to the patch itself: it's a hack.
> >
> > 1) Leaving the guest's admin to turn on the switch is a terrible choice.
> > 2) The block layer should stop merging and sorting when a device is
> >    fast, not the driver.
> > 3) I pointed out that slow disks have low IOPS, so why is this
> >    conditional?  Sure, more guest exits, but it's still a small number
> >    for a slow device.
> > 4) The only case where we want merging is on a slow device when the host
> >    isn't doing it.
> >
> > Now, despite this, I'm prepared to commit it.  But in my mind it's a
> > hack: we should aim for use_bio to be based on a feature bit fed from
> > the host, and use the module parameter only if we want to override it.
>
> OK. A feature bit from host sound like a choice but a switch is also
> needed on host side.

qemu automatically gives you the ability to control any feature bit.

> And for other OS, e.g. Windows, the bio thing
> does not apply at all.

Let's try to define when it's a good idea. Is it a hint to guest that
backend handles small accesses efficiently so ok to disable batching?

> Anyway, I have to admit that adding a module parameter here is not
> the best choice. Let's think more.
>
> --
> Asias
On 07/29/2012 08:59 PM, Michael S. Tsirkin wrote:
> On Sat, Jul 28, 2012 at 10:38:41AM +0800, Asias He wrote:
>> On 07/27/2012 08:33 AM, Rusty Russell wrote:
>>> On Fri, 13 Jul 2012 16:38:51 +0800, Asias He <asias@redhat.com> wrote:
>>>> Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
>>>> use_bio=1' to enable ->make_request_fn() based I/O path.
>>>
>>> This patch conflicts with Paolo Bonzini's 'virtio-blk: allow toggling
>>> host cache between writeback and writethrough' which is also queued (see
>>> linux-next).
>>
>> Rebased against Paolo's patch in V4.
>>
>>> I'm not sure what the correct behavior for bio & cacheflush is, if any.
>>
>> REQ_FLUSH is not supported in the bio path.
>>
>>> But as to the patch itself: it's a hack.
>>>
>>> 1) Leaving the guest's admin to turn on the switch is a terrible choice.
>>> 2) The block layer should stop merging and sorting when a device is
>>>    fast, not the driver.
>>> 3) I pointed out that slow disks have low IOPS, so why is this
>>>    conditional?  Sure, more guest exits, but it's still a small number
>>>    for a slow device.
>>> 4) The only case where we want merging is on a slow device when the host
>>>    isn't doing it.
>>>
>>> Now, despite this, I'm prepared to commit it.  But in my mind it's a
>>> hack: we should aim for use_bio to be based on a feature bit fed from
>>> the host, and use the module parameter only if we want to override it.
>>
>> OK. A feature bit from host sound like a choice but a switch is also
>> needed on host side.
>
> qemu automatically gives you the ability to control
> any feature bit.

Automatically?

>> And for other OS, e.g. Windows, the bio thing
>> does not apply at all.
>
> Let's try to define when it's a good idea. Is it a hint to guest that
> backend handles small accesses efficiently so ok to disable batching?

Yes. It's also a hint for latency reduction.

>> Anyway, I have to admit that adding a module parameter here is not
>> the best choice. Let's think more.
>>
>> --
>> Asias
On 07/28/2012 02:42 PM, Paolo Bonzini wrote:
>>> I'm not sure what the correct behavior for bio & cacheflush is, if
>>> any.
>>
>> REQ_FLUSH is not supported in the bio path.
>
> Ouch, that's correct:
>
> @@ -414,7 +529,7 @@ static void virtblk_update_cache_mode(struct virtio_device *vdev)
>  	u8 writeback = virtblk_get_cache_mode(vdev);
>  	struct virtio_blk *vblk = vdev->priv;
>
> -	if (writeback)
> +	if (writeback && !use_bio)
>  		blk_queue_flush(vblk->disk->queue, REQ_FLUSH);
>  	else
>  		blk_queue_flush(vblk->disk->queue, 0);
>
> then it is not safe against power losses.

Yes. Something like this:

    qemu -drive file=foo.img,cache=writeback/unsafe

is not safe against power losses either, right?

I think we can add REQ_FLUSH & REQ_FUA support to bio path and that
deserves another patch.
On 30/07/2012 06:43, Asias He wrote:
> Yes. Something like this:
>
>     qemu -drive file=foo.img,cache=writeback/unsafe
>
> is not safe against power losses either, right?

cache=writeback and cache=none are safe, cache=unsafe isn't.

> I think we can add REQ_FLUSH & REQ_FUA support to bio path and that
> deserves another patch.

You only need to add REQ_FLUSH support. The virtio-blk protocol does
not support REQ_FUA, because there's no easy way to do it in userspace.

Paolo
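Paolo's safety summary reduces to one rule: a configuration survives power loss iff either no write can sit in a volatile cache, or guest flushes actually reach stable storage. A tiny userspace C model of that reasoning (the struct and flags are an illustration, not qemu internals):

```c
#include <assert.h>
#include <stdbool.h>

struct cache_mode {
	const char *name;
	bool volatile_cache;	/* writes may sit in a volatile cache */
	bool honors_flush;	/* guest flushes reach stable storage */
};

/* Safe iff nothing volatile is in the path, or flushes are honored. */
static bool mode_is_safe(const struct cache_mode *m)
{
	return !m->volatile_cache || m->honors_flush;
}
```

Under this model, cache=writeback (volatile cache, flushes honored) and cache=none (O_DIRECT, flushes honored) come out safe, while cache=unsafe (flushes ignored) does not — matching Paolo's statement.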
On Mon, Jul 30, 2012 at 12:43:12PM +0800, Asias He wrote:
> I think we can add REQ_FLUSH & REQ_FUA support to bio path and that
> deserves another patch.

Adding it is a requirement for merging the code.
On Mon, Jul 30, 2012 at 09:31:06AM +0200, Paolo Bonzini wrote:
> You only need to add REQ_FLUSH support. The virtio-blk protocol does
> not support REQ_FUA, because there's no easy way to do it in userspace.

A bio-based driver needs to handle both REQ_FLUSH and REQ_FUA, as it
does not get the sequencing of REQ_FUA into REQ_FLUSH that request-based
drivers can ask the block layer for. What the REQ_FUA request gets
translated to is a different story.
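What Christoph describes — a bio-based driver sequencing the flags itself — could look roughly like this userspace sketch. The command names mirror the virtio-blk protocol's write/flush operations, but the helper and its exact shape are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vb_cmd { VB_WRITE, VB_FLUSH };

/* Translate one write bio's REQ_FLUSH/REQ_FUA flags into a virtio-blk
 * command sequence.  Since the protocol has no FUA bit, FUA is emulated
 * as "write, then flush".  Returns the number of commands emitted. */
static size_t sequence_write(bool flush, bool fua, enum vb_cmd seq[3])
{
	size_t n = 0;

	if (flush)
		seq[n++] = VB_FLUSH;	/* preflush: order earlier writes */
	seq[n++] = VB_WRITE;
	if (fua)
		seq[n++] = VB_FLUSH;	/* postflush: make this write stable */
	return n;
}
```

A request-based driver gets this decomposition from the block layer for free; a ->make_request_fn() driver sees the raw flags and must emit the sequence itself.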
On 07/30/2012 09:43 PM, Christoph Hellwig wrote:
> On Mon, Jul 30, 2012 at 12:43:12PM +0800, Asias He wrote:
>> I think we can add REQ_FLUSH & REQ_FUA support to bio path and that
>> deserves another patch.
>
> Adding it is a requirement for merging the code.

OK. Will add that.
On 07/30/2012 09:44 PM, Christoph Hellwig wrote:
> On Mon, Jul 30, 2012 at 09:31:06AM +0200, Paolo Bonzini wrote:
>> You only need to add REQ_FLUSH support. The virtio-blk protocol does
>> not support REQ_FUA, because there's no easy way to do it in userspace.
>
> A bio-based driver needs to handle both REQ_FLUSH and REQ_FUA as it does
> not get the sequencing of REQ_FUA into REQ_FLUSH that request based drivers
> can request. To what the REQ_FUA request gets translated is a different story.

I just sent out V5 to support both REQ_FLUSH and REQ_FUA.

Thanks, Christoph!
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 774c31d..e137190 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -14,6 +14,9 @@
 
 #define PART_BITS 4
 
+static bool use_bio;
+module_param(use_bio, bool, S_IRUGO);
+
 static int major;
 static DEFINE_IDA(vd_index_ida);
 
@@ -23,6 +26,7 @@ struct virtio_blk
 {
 	struct virtio_device *vdev;
 	struct virtqueue *vq;
+	wait_queue_head_t queue_wait;
 
 	/* The disk structure for the kernel. */
 	struct gendisk *disk;
@@ -51,53 +55,87 @@ struct virtio_blk
 struct virtblk_req
 {
 	struct request *req;
+	struct bio *bio;
 	struct virtio_blk_outhdr out_hdr;
 	struct virtio_scsi_inhdr in_hdr;
 	u8 status;
+	struct scatterlist sg[];
 };
 
-static void blk_done(struct virtqueue *vq)
+static inline int virtblk_result(struct virtblk_req *vbr)
+{
+	switch (vbr->status) {
+	case VIRTIO_BLK_S_OK:
+		return 0;
+	case VIRTIO_BLK_S_UNSUPP:
+		return -ENOTTY;
+	default:
+		return -EIO;
+	}
+}
+
+static inline void virtblk_request_done(struct virtio_blk *vblk,
+					struct virtblk_req *vbr)
+{
+	struct request *req = vbr->req;
+	int error = virtblk_result(vbr);
+
+	if (req->cmd_type == REQ_TYPE_BLOCK_PC) {
+		req->resid_len = vbr->in_hdr.residual;
+		req->sense_len = vbr->in_hdr.sense_len;
+		req->errors = vbr->in_hdr.errors;
+	} else if (req->cmd_type == REQ_TYPE_SPECIAL) {
+		req->errors = (error != 0);
+	}
+
+	__blk_end_request_all(req, error);
+	mempool_free(vbr, vblk->pool);
+}
+
+static inline void virtblk_bio_done(struct virtio_blk *vblk,
+				    struct virtblk_req *vbr)
+{
+	bio_endio(vbr->bio, virtblk_result(vbr));
+	mempool_free(vbr, vblk->pool);
+}
+
+static void virtblk_done(struct virtqueue *vq)
 {
 	struct virtio_blk *vblk = vq->vdev->priv;
+	unsigned long bio_done = 0, req_done = 0;
 	struct virtblk_req *vbr;
-	unsigned int len;
 	unsigned long flags;
+	unsigned int len;
 
 	spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
 	while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
-		int error;
-
-		switch (vbr->status) {
-		case VIRTIO_BLK_S_OK:
-			error = 0;
-			break;
-		case VIRTIO_BLK_S_UNSUPP:
-			error = -ENOTTY;
-			break;
-		default:
-			error = -EIO;
-			break;
-		}
-
-		switch (vbr->req->cmd_type) {
-		case REQ_TYPE_BLOCK_PC:
-			vbr->req->resid_len = vbr->in_hdr.residual;
-			vbr->req->sense_len = vbr->in_hdr.sense_len;
-			vbr->req->errors = vbr->in_hdr.errors;
-			break;
-		case REQ_TYPE_SPECIAL:
-			vbr->req->errors = (error != 0);
-			break;
-		default:
-			break;
+		if (vbr->bio) {
+			virtblk_bio_done(vblk, vbr);
+			bio_done++;
+		} else {
+			virtblk_request_done(vblk, vbr);
+			req_done++;
 		}
-
-		__blk_end_request_all(vbr->req, error);
-		mempool_free(vbr, vblk->pool);
 	}
 	/* In case queue is stopped waiting for more buffers. */
-	blk_start_queue(vblk->disk->queue);
+	if (req_done)
+		blk_start_queue(vblk->disk->queue);
 	spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
+
+	if (bio_done)
+		wake_up(&vblk->queue_wait);
+}
+
+static inline struct virtblk_req *virtblk_alloc_req(struct virtio_blk *vblk,
+						    gfp_t gfp_mask)
+{
+	struct virtblk_req *vbr;
+
+	vbr = mempool_alloc(vblk->pool, gfp_mask);
+	if (vbr && use_bio)
+		sg_init_table(vbr->sg, vblk->sg_elems);
+
+	return vbr;
 }
 
 static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
@@ -106,13 +144,13 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 	unsigned long num, out = 0, in = 0;
 	struct virtblk_req *vbr;
 
-	vbr = mempool_alloc(vblk->pool, GFP_ATOMIC);
+	vbr = virtblk_alloc_req(vblk, GFP_ATOMIC);
 	if (!vbr)
 		/* When another request finishes we'll try again. */
 		return false;
 
 	vbr->req = req;
-
+	vbr->bio = NULL;
 	if (req->cmd_flags & REQ_FLUSH) {
 		vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH;
 		vbr->out_hdr.sector = 0;
@@ -172,7 +210,8 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 		}
 	}
 
-	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr, GFP_ATOMIC)<0) {
+	if (virtqueue_add_buf(vblk->vq, vblk->sg, out, in, vbr,
+			      GFP_ATOMIC) < 0) {
 		mempool_free(vbr, vblk->pool);
 		return false;
 	}
@@ -180,7 +219,7 @@ static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
 	return true;
 }
 
-static void do_virtblk_request(struct request_queue *q)
+static void virtblk_request(struct request_queue *q)
 {
 	struct virtio_blk *vblk = q->queuedata;
 	struct request *req;
@@ -203,6 +242,82 @@ static void do_virtblk_request(struct request_queue *q)
 	virtqueue_kick(vblk->vq);
 }
 
+static void virtblk_add_buf_wait(struct virtio_blk *vblk,
+				 struct virtblk_req *vbr,
+				 unsigned long out,
+				 unsigned long in)
+{
+	DEFINE_WAIT(wait);
+
+	for (;;) {
+		prepare_to_wait_exclusive(&vblk->queue_wait, &wait,
+					  TASK_UNINTERRUPTIBLE);
+
+		spin_lock_irq(vblk->disk->queue->queue_lock);
+		if (virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+				      GFP_ATOMIC) < 0) {
+			spin_unlock_irq(vblk->disk->queue->queue_lock);
+			io_schedule();
+		} else {
+			virtqueue_kick(vblk->vq);
+			spin_unlock_irq(vblk->disk->queue->queue_lock);
+			break;
+		}
+
+	}
+
+	finish_wait(&vblk->queue_wait, &wait);
+}
+
+static void virtblk_make_request(struct request_queue *q, struct bio *bio)
+{
+	struct virtio_blk *vblk = q->queuedata;
+	unsigned int num, out = 0, in = 0;
+	struct virtblk_req *vbr;
+
+	BUG_ON(bio->bi_phys_segments + 2 > vblk->sg_elems);
+	BUG_ON(bio->bi_rw & (REQ_FLUSH | REQ_FUA));
+
+	vbr = virtblk_alloc_req(vblk, GFP_NOIO);
+	if (!vbr) {
+		bio_endio(bio, -ENOMEM);
+		return;
+	}
+
+	vbr->bio = bio;
+	vbr->req = NULL;
+	vbr->out_hdr.type = 0;
+	vbr->out_hdr.sector = bio->bi_sector;
+	vbr->out_hdr.ioprio = bio_prio(bio);
+
+	sg_set_buf(&vbr->sg[out++], &vbr->out_hdr, sizeof(vbr->out_hdr));
+
+	num = blk_bio_map_sg(q, bio, vbr->sg + out);
+
+	sg_set_buf(&vbr->sg[num + out + in++], &vbr->status,
+		   sizeof(vbr->status));
+
+	if (num) {
+		if (bio->bi_rw & REQ_WRITE) {
+			vbr->out_hdr.type |= VIRTIO_BLK_T_OUT;
+			out += num;
+		} else {
+			vbr->out_hdr.type |= VIRTIO_BLK_T_IN;
+			in += num;
+		}
+	}
+
+	spin_lock_irq(vblk->disk->queue->queue_lock);
+	if (unlikely(virtqueue_add_buf(vblk->vq, vbr->sg, out, in, vbr,
+				       GFP_ATOMIC) < 0)) {
+		spin_unlock_irq(vblk->disk->queue->queue_lock);
+		virtblk_add_buf_wait(vblk, vbr, out, in);
+		return;
+	}
+	virtqueue_kick(vblk->vq);
+	spin_unlock_irq(vblk->disk->queue->queue_lock);
+}
+
 /* return id (s/n) string for *disk to *id_str
  */
 static int virtblk_get_id(struct gendisk *disk, char *id_str)
@@ -360,7 +475,7 @@ static int init_vq(struct virtio_blk *vblk)
 	int err = 0;
 
 	/* We expect one virtqueue, for output. */
-	vblk->vq = virtio_find_single_vq(vblk->vdev, blk_done, "requests");
+	vblk->vq = virtio_find_single_vq(vblk->vdev, virtblk_done, "requests");
 	if (IS_ERR(vblk->vq))
 		err = PTR_ERR(vblk->vq);
 
@@ -400,6 +515,8 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	struct virtio_blk *vblk;
 	struct request_queue *q;
 	int err, index;
+	int pool_size;
+
 	u64 cap;
 	u32 v, blk_size, sg_elems, opt_io_size;
 	u16 min_io_size;
@@ -429,10 +546,12 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 		goto out_free_index;
 	}
 
+	init_waitqueue_head(&vblk->queue_wait);
 	vblk->vdev = vdev;
 	vblk->sg_elems = sg_elems;
 	sg_init_table(vblk->sg, vblk->sg_elems);
 	mutex_init(&vblk->config_lock);
+
 	INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
 	vblk->config_enable = true;
 
@@ -440,7 +559,10 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vblk;
 
-	vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
+	pool_size = sizeof(struct virtblk_req);
+	if (use_bio)
+		pool_size += sizeof(struct scatterlist) * sg_elems;
+	vblk->pool = mempool_create_kmalloc_pool(1, pool_size);
 	if (!vblk->pool) {
 		err = -ENOMEM;
 		goto out_free_vq;
@@ -453,12 +575,14 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 		goto out_mempool;
 	}
 
-	q = vblk->disk->queue = blk_init_queue(do_virtblk_request, NULL);
+	q = vblk->disk->queue = blk_init_queue(virtblk_request, NULL);
 	if (!q) {
 		err = -ENOMEM;
 		goto out_put_disk;
 	}
 
+	if (use_bio)
+		blk_queue_make_request(q, virtblk_make_request);
 	q->queuedata = vblk;
 
 	virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
@@ -471,7 +595,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	vblk->index = index;
 
 	/* configure queue flush support */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH) && !use_bio)
 		blk_queue_flush(q, REQ_FLUSH);
 
 	/* If disk is read-only in the host, the guest should obey */
@@ -544,7 +668,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
 	if (!err && opt_io_size)
 		blk_queue_io_opt(q, blk_size * opt_io_size);
 
-	add_disk(vblk->disk);
 	err = device_create_file(disk_to_dev(vblk->disk), &dev_attr_serial);
 	if (err)
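The descriptor accounting in virtblk_make_request() above — slot 0 for the out_hdr, then the data segments, then the status byte — can be modeled in userspace C. The struct and helper below are invented for illustration; they only reproduce the out/in counting, not the real scatterlist setup:

```c
#include <assert.h>
#include <stdbool.h>

struct sg_layout { unsigned int out, in; };

/* Model of the layout virtblk_make_request() builds: the out_hdr is always
 * driver-written (out), the status byte always device-written (in), and the
 * `num` data segments count as out for a write (guest -> host) or in for a
 * read (host -> guest). */
static struct sg_layout virtblk_sg_layout(unsigned int num, bool is_write)
{
	struct sg_layout l = { .out = 1, .in = 1 };	/* out_hdr + status */

	if (is_write)
		l.out += num;
	else
		l.in += num;
	return l;
}
```

This is why the BUG_ON checks `bi_phys_segments + 2`: whatever the data direction, two extra descriptors (header and status) are always consumed.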