
virtio_blk: set the default scheduler to none

Message ID 20231207043118.118158-1-fengli@smartx.com (mailing list archive)
State New, archived
Series virtio_blk: set the default scheduler to none

Commit Message

Li Feng Dec. 7, 2023, 4:31 a.m. UTC
virtio-blk is generally used in cloud computing scenarios, where the
performance of virtual disks is very important. The mq-deadline scheduler
has a big performance drop compared to none with a single queue. In my tests,
mq-deadline 4k randread iops were 270k, compared to 450k for none. So here
the default scheduler of virtio-blk is set to "none".

Signed-off-by: Li Feng <fengli@smartx.com>
---
 drivers/block/virtio_blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jason Wang Dec. 7, 2023, 6:02 a.m. UTC | #1
On Thu, Dec 7, 2023 at 12:33 PM Li Feng <fengli@smartx.com> wrote:
>
> virtio-blk is generally used in cloud computing scenarios, where the
> performance of virtual disks is very important. The mq-deadline scheduler
> has a big performance drop compared to none with single queue.

At least you can choose the scheduler based on if mq is supported or not?

Thanks
Li Feng Dec. 7, 2023, 6:32 a.m. UTC | #2
> On Dec 7, 2023, at 14:02, Jason Wang <jasowang@redhat.com> wrote:
> 
> On Thu, Dec 7, 2023 at 12:33 PM Li Feng <fengli@smartx.com> wrote:
>> 
>> virtio-blk is generally used in cloud computing scenarios, where the
>> performance of virtual disks is very important. The mq-deadline scheduler
>> has a big performance drop compared to none with single queue.
> 
> At least you can choose the scheduler based on if mq is supported or not?
> 
> Thanks
> 
The "none" scheduler has better performance regardless of whether it is
a single queue or multiple queues. Why not set the single queue scheduler
to none by default?

Thanks.
Michael S. Tsirkin Dec. 7, 2023, 6:38 a.m. UTC | #3
On Thu, Dec 07, 2023 at 02:02:36PM +0800, Jason Wang wrote:
> On Thu, Dec 7, 2023 at 12:33 PM Li Feng <fengli@smartx.com> wrote:
> >
> > virtio-blk is generally used in cloud computing scenarios, where the
> > performance of virtual disks is very important. The mq-deadline scheduler
> > has a big performance drop compared to none with single queue.
> 
> At least you can choose the scheduler based on if mq is supported or not?
> 
> Thanks

This is already the case:

static struct elevator_type *elevator_get_default(struct request_queue *q)
{
        if (q->tag_set && q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT)
                return NULL;

        if (q->nr_hw_queues != 1 &&
            !blk_mq_is_shared_tags(q->tag_set->flags))
                return NULL;

        return elevator_find_get(q, "mq-deadline");
}

I guess I agree virtio-blk is typically kind of similar to loopback,
so none by default makes sense here, the same as for loopback.

Stefan care to comment?
Chaitanya Kulkarni Dec. 7, 2023, 6:53 a.m. UTC | #4
On 12/6/23 20:31, Li Feng wrote:
> virtio-blk is generally used in cloud computing scenarios, where the
> performance of virtual disks is very important. The mq-deadline scheduler
> has a big performance drop compared to none with single queue. In my tests,
> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
> the default scheduler of virtio-blk is set to "none".
>
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
>

This patch looks good to me; however, I'd update the commit log and add
performance numbers for the non-mq case as well, just to show that we
are not breaking the non-mq setup.

That being said, if we want to be future-proof, we could also think about
adding a module param, so that if someone comes up with a scenario where
NO_SCHED does not provide the expected performance, they can just use the
module parameter instead of editing the code again. Irrespective of that:

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
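
(For illustration only: a minimal sketch of the module parameter idea suggested
above. The parameter name force_none_sched and its wiring are hypothetical,
not something this patch or thread actually proposes.)

static bool force_none_sched = true;
module_param(force_none_sched, bool, 0444);
MODULE_PARM_DESC(force_none_sched,
		 "Default the I/O scheduler to 'none' instead of the block layer default");

	/* in virtblk_probe(), set the flag only when requested */
	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
	if (force_none_sched)
		vblk->tag_set.flags |= BLK_MQ_F_NO_SCHED_BY_DEFAULT;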
Li Feng Dec. 7, 2023, 7:21 a.m. UTC | #5
> On Dec 7, 2023, at 14:53, Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:
> 
> On 12/6/23 20:31, Li Feng wrote:
>> virtio-blk is generally used in cloud computing scenarios, where the
>> performance of virtual disks is very important. The mq-deadline scheduler
>> has a big performance drop compared to none with single queue. In my tests,
>> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
>> the default scheduler of virtio-blk is set to "none".
>> 
>> Signed-off-by: Li Feng <fengli@smartx.com>
>> ---
>> 
> 
> This patch looks good to me, however I'd update the commit log and add
> performance numbers for the non-mq case also, just in-case to show that we
> are not breaking non-mq setup.
> 
> Being said that, in case we want to be future proof, we can also think of
> adding a module param so if someone comes with a scenario where NO_SCHED is
> not providing the performance then they can just use the module parameter
> instead of again editing the code, irrespective of that :-
> 
> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
> 
> -ck

Hi ck,

What I put above (450k vs 270k) is the data for single queue (non-mq). I think
we don’t need to add a module parameter because the scheduler can be modified
through sysfs.

Thanks.

> 
>
Chaitanya Kulkarni Dec. 7, 2023, 9:48 a.m. UTC | #6
On 12/6/2023 11:21 PM, Li Feng wrote:
> 
> 
>> On Dec 7, 2023, at 14:53, Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote:
>>
>> On 12/6/23 20:31, Li Feng wrote:
>>> virtio-blk is generally used in cloud computing scenarios, where the
>>> performance of virtual disks is very important. The mq-deadline scheduler
>>> has a big performance drop compared to none with single queue. In my tests,
>>> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
>>> the default scheduler of virtio-blk is set to "none".
>>>
>>> Signed-off-by: Li Feng <fengli@smartx.com>
>>> ---
>>>
>>
>> This patch looks good to me, however I'd update the commit log and add
>> performance numbers for the non-mq case also, just in-case to show that we
>> are not breaking non-mq setup.
>>
>> Being said that, in case we want to be future proof, we can also think of
>> adding a module param so if someone comes with a scenario where NO_SCHED is
>> not providing the performance then they can just use the module parameter
>> instead of again editing the code, irrespective of that :-
>>
>> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
>>
>> -ck
> 
> Hi ck,
> 
> What I put above(450k vs 270k) is the data of single queue(non-mq). I think
> we don’t need to add module parameters because the scheduler can be modified
> through sysfs.
> 
> Thanks.

okay.

-ck
Stefan Hajnoczi Dec. 7, 2023, 2:51 p.m. UTC | #7
On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
> virtio-blk is generally used in cloud computing scenarios, where the
> performance of virtual disks is very important. The mq-deadline scheduler
> has a big performance drop compared to none with single queue. In my tests,
> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
> the default scheduler of virtio-blk is set to "none".
> 
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This seems similar to commit f8b12e513b95 ("virtio_blk: revert
QUEUE_FLAG_VIRT addition") where changing the default sounded good in
theory but exposed existing users to performance regressions.

Christoph's suggestion back in 2009 was to introduce a flag in the
virtio-blk hardware interface so that the device can provide a hint from
the host side.
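
For illustration only, such a hint could be consumed in virtblk_probe() roughly
as below; VIRTIO_BLK_F_SCHED_HINT is a made-up name, not an existing or
proposed virtio feature bit:

	/* hypothetical host-side hint: let the device suggest skipping the
	 * guest I/O scheduler when the backend already schedules its I/O */
	if (virtio_has_feature(vdev, VIRTIO_BLK_F_SCHED_HINT))
		vblk->tag_set.flags |= BLK_MQ_F_NO_SCHED_BY_DEFAULT;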

Do you have more performance data aside from 4k randread? My suggestion
would be for everyone with an interest to collect and share data so
there's more evidence that this new default works well for a range of
configurations.

I don't want to be overly conservative. The virtio_blk driver has
undergone changes in this regard from the legacy block layer to blk-mq
(without an I/O scheduler) to blk-mq (mq-deadline). Performance changed
at each step and that wasn't a showstopper, so I think we could default
to 'none' without a lot of damage. Let's just get more data.

Stefan

> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index d53d6aa8ee69..5183ec8e00be 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1367,7 +1367,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	vblk->tag_set.ops = &virtio_mq_ops;
>  	vblk->tag_set.queue_depth = queue_depth;
>  	vblk->tag_set.numa_node = NUMA_NO_NODE;
> -	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> +	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
>  	vblk->tag_set.cmd_size =
>  		sizeof(struct virtblk_req) +
>  		sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
> -- 
> 2.42.0
>
Paolo Bonzini Dec. 7, 2023, 2:55 p.m. UTC | #8
On Thu, Dec 7, 2023 at 3:52 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
> > virtio-blk is generally used in cloud computing scenarios, where the
> > performance of virtual disks is very important. The mq-deadline scheduler
> > has a big performance drop compared to none with single queue. In my tests,
> > mq-deadline 4k readread iops were 270k compared to 450k for none. So here
> > the default scheduler of virtio-blk is set to "none".
> >
> > Signed-off-by: Li Feng <fengli@smartx.com>
> > ---
> >  drivers/block/virtio_blk.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> This seems similar to commit f8b12e513b95 ("virtio_blk: revert
> QUEUE_FLAG_VIRT addition") where changing the default sounded good in
> theory but exposed existing users to performance regressions. [...]
> I don't want to be overly conservative. The virtio_blk driver has
> undergone changes in this regard from the legacy block layer to blk-mq
> (without an I/O scheduler) to blk-mq (mq-deadline).

IIRC there were also regressions in both virtio-blk and virtio-scsi
when switching from the legacy block layer to blk-mq. So perhaps I
*am* a bit more conservative, but based on past experience, this patch
seems not to be a great idea for practical use cases.

Paolo

> Performance changed
> at each step and that wasn't a showstopper, so I think we could default
> to 'none' without a lot of damage. Let's just get more data.
>
> Stefan
>
> >
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index d53d6aa8ee69..5183ec8e00be 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -1367,7 +1367,7 @@ static int virtblk_probe(struct virtio_device *vdev)
> >       vblk->tag_set.ops = &virtio_mq_ops;
> >       vblk->tag_set.queue_depth = queue_depth;
> >       vblk->tag_set.numa_node = NUMA_NO_NODE;
> > -     vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> > +     vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
> >       vblk->tag_set.cmd_size =
> >               sizeof(struct virtblk_req) +
> >               sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
> > --
> > 2.42.0
> >
Christoph Hellwig Dec. 7, 2023, 3:19 p.m. UTC | #9
On Thu, Dec 07, 2023 at 03:55:17PM +0100, Paolo Bonzini wrote:
> > This seems similar to commit f8b12e513b95 ("virtio_blk: revert
> > QUEUE_FLAG_VIRT addition") where changing the default sounded good in
> > theory but exposed existing users to performance regressions. [...]
> > I don't want to be overly conservative. The virtio_blk driver has
> > undergone changes in this regard from the legacy block layer to blk-mq
> > (without an I/O scheduler) to blk-mq (mq-deadline).
> 
> IIRC there were also regressions in both virtio-blk and virtio-scsi
> when switching from the legacy block layer to blk-mq. So perhaps I
> *am* a bit more conservative, but based on past experience, this patch
> seems not to be a great idea for practical use cases.

Agreed.  I'm in fact not exactly happy about the rather odd
BLK_MQ_F_NO_SCHED_BY_DEFAULT flag to start with.
Ming Lei Dec. 8, 2023, 2 a.m. UTC | #10
On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
> virtio-blk is generally used in cloud computing scenarios, where the
> performance of virtual disks is very important. The mq-deadline scheduler
> has a big performance drop compared to none with single queue. In my tests,
> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
> the default scheduler of virtio-blk is set to "none".

The test result suggests you may not have tested an HDD backing for virtio-blk.

none can lose IO merge capability to some extent, so sequential IO performance
probably drops in the case of an HDD backing.

Thanks,
Ming
Keith Busch Dec. 8, 2023, 2:44 a.m. UTC | #11
On Fri, Dec 08, 2023 at 10:00:36AM +0800, Ming Lei wrote:
> On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
> > virtio-blk is generally used in cloud computing scenarios, where the
> > performance of virtual disks is very important. The mq-deadline scheduler
> > has a big performance drop compared to none with single queue. In my tests,
> > mq-deadline 4k readread iops were 270k compared to 450k for none. So here
> > the default scheduler of virtio-blk is set to "none".
> 
> The test result shows you may not test HDD. backing of virtio-blk.
> 
> none can lose IO merge capability more or less, so probably sequential IO perf
> drops in case of HDD backing.

More of a curiosity, as I don't immediately even have an HDD to test
with! Isn't it more useful for the host providing the backing HDD to use an
appropriate IO scheduler? virtio-blk has similarities with a stacking
block driver, and we usually don't need to stack IO schedulers.
Jens Axboe Dec. 8, 2023, 3:15 a.m. UTC | #12
On 12/7/23 7:44 PM, Keith Busch wrote:
> On Fri, Dec 08, 2023 at 10:00:36AM +0800, Ming Lei wrote:
>> On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
>>> virtio-blk is generally used in cloud computing scenarios, where the
>>> performance of virtual disks is very important. The mq-deadline scheduler
>>> has a big performance drop compared to none with single queue. In my tests,
>>> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
>>> the default scheduler of virtio-blk is set to "none".
>>
>> The test result shows you may not test HDD. backing of virtio-blk.
>>
>> none can lose IO merge capability more or less, so probably sequential IO perf
>> drops in case of HDD backing.
> 
> More of a curiosity, as I don't immediately even have an HDD to test
> with! Isn't it more useful for the host providing the backing HDD use an
> appropriate IO scheduler? virtio-blk has similiarities with a stacking
> block driver, and we usually don't need to stack IO schedulers.

Indeed, any kind of scheduling or merging should not need to happen
here, but rather at the lower end. There could be an argument to be
made for having fewer commands coming out of virtio-blk, however. Would
be interesting to test.
Ming Lei Dec. 8, 2023, 3:54 a.m. UTC | #13
On Thu, Dec 07, 2023 at 07:44:37PM -0700, Keith Busch wrote:
> On Fri, Dec 08, 2023 at 10:00:36AM +0800, Ming Lei wrote:
> > On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
> > > virtio-blk is generally used in cloud computing scenarios, where the
> > > performance of virtual disks is very important. The mq-deadline scheduler
> > > has a big performance drop compared to none with single queue. In my tests,
> > > mq-deadline 4k readread iops were 270k compared to 450k for none. So here
> > > the default scheduler of virtio-blk is set to "none".
> > 
> > The test result shows you may not test HDD. backing of virtio-blk.
> > 
> > none can lose IO merge capability more or less, so probably sequential IO perf
> > drops in case of HDD backing.
> 
> More of a curiosity, as I don't immediately even have an HDD to test
> with! Isn't it more useful for the host providing the backing HDD use an
> appropriate IO scheduler? virtio-blk has similiarities with a stacking
> block driver, and we usually don't need to stack IO schedulers.

dm-rq actually uses an IO scheduler at a higher layer, and early merging has
some benefits:

1) virtio-blk inflight requests are reduced, so there is less chance of
throttling inside the VM; meanwhile fewer, bigger IOs are handled by QEMU and
submitted to the host-side queue.

2) early merging in the VM is cheaper than on the host side, since more block
IOs originating from different virtio-blk/scsi devices can arrive at the same
time, and all images may be stored on a single disk; these IOs then become
interleaved in the host-side queue, so sequential IO may become random or hard
to merge.

As Jens mentioned, it needs actual testing.


Thanks,
Ming
Li Feng Dec. 8, 2023, 5:55 a.m. UTC | #14
Hi,

I have run all IO patterns on another host of mine.
Notes:
q1 means fio iodepth = 1
j1 means fio numjobs = 1

VCPU = 4, VMEM = 2 GiB, fio using direct IO.

Most job results are better than with mq-deadline, and some are lower.


pattern                 | iops(mq-deadline) | iops(none) | diff
:-                      | -:                | -:         | -:
4k-randread-q1-j1       | 12325             | 13356      | 8.37%
256k-randread-q1-j1     | 1865              | 1883       | 0.97%
4k-randread-q128-j1     | 204739            | 319066     | 55.84%
256k-randread-q128-j1   | 24257             | 22851      | -5.80%
4k-randwrite-q1-j1      | 9923              | 10163      | 2.42%
256k-randwrite-q1-j1    | 2762              | 2833       | 2.57%
4k-randwrite-q128-j1    | 137400            | 152081     | 10.68%
256k-randwrite-q128-j1  | 9353              | 9233       | -1.28%
4k-read-q1-j1           | 21499             | 22223      | 3.37%
256k-read-q1-j1         | 1919              | 1951       | 1.67%
4k-read-q128-j1         | 158806            | 345269     | 117.42%
256k-read-q128-j1       | 18918             | 23710      | 25.33%
4k-write-q1-j1          | 10120             | 10262      | 1.40%
256k-write-q1-j1        | 2779              | 2744       | -1.26%
4k-write-q128-j1        | 47576             | 209236     | 339.79%
256k-write-q128-j1      | 9199              | 9337       | 1.50%
4k-randread-q1-j2       | 24238             | 25478      | 5.12%
256k-randread-q1-j2     | 3656              | 3649       | -0.19%
4k-randread-q128-j2     | 390090            | 577300     | 47.99%
256k-randread-q128-j2   | 21992             | 23437      | 6.57%
4k-randwrite-q1-j2      | 17096             | 18112      | 5.94%
256k-randwrite-q1-j2    | 5188              | 4914       | -5.28%
4k-randwrite-q128-j2    | 143373            | 140560     | -1.96%
256k-randwrite-q128-j2  | 9423              | 9314       | -1.16%
4k-read-q1-j2           | 36890             | 31768      | -13.88%
256k-read-q1-j2         | 3708              | 4028       | 8.63%
4k-read-q128-j2         | 399500            | 409857     | 2.59%
256k-read-q128-j2       | 19360             | 21467      | 10.88%
4k-write-q1-j2          | 17786             | 18519      | 4.12%
256k-write-q1-j2        | 4756              | 5035       | 5.87%
4k-write-q128-j2        | 175756            | 159109     | -9.47%
256k-write-q128-j2      | 9292              | 9293       | 0.01%

> On Dec 8, 2023, at 11:54, Ming Lei <ming.lei@redhat.com> wrote:
> 
> On Thu, Dec 07, 2023 at 07:44:37PM -0700, Keith Busch wrote:
>> On Fri, Dec 08, 2023 at 10:00:36AM +0800, Ming Lei wrote:
>>> On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
>>>> virtio-blk is generally used in cloud computing scenarios, where the
>>>> performance of virtual disks is very important. The mq-deadline scheduler
>>>> has a big performance drop compared to none with single queue. In my tests,
>>>> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
>>>> the default scheduler of virtio-blk is set to "none".
>>> 
>>> The test result shows you may not test HDD. backing of virtio-blk.
>>> 
>>> none can lose IO merge capability more or less, so probably sequential IO perf
>>> drops in case of HDD backing.
>> 
>> More of a curiosity, as I don't immediately even have an HDD to test
>> with! Isn't it more useful for the host providing the backing HDD use an
>> appropriate IO scheduler? virtio-blk has similiarities with a stacking
>> block driver, and we usually don't need to stack IO schedulers.
> 
> dm-rq actually uses IO scheduler at high layer, and early merge has some
> benefits:
> 
> 1) virtio-blk inflight requests are reduced, so less chance to throttle
> inside VM, meantime less IOs(bigger size) are handled by QEMU, and submitted
> to host side queue.
> 
> 2) early merge in VM is cheap than host side, since there can be more block
> IOs originated from different virtio-blk/scsi devices at the same time and
> all images can be stored in single disk, then these IOs become interleaved in
> host side queue, so sequential IO may become random or hard to merge.
> 
> As Jens mentioned, it needs actual test.
> 
> 
> Thanks,
> Ming
>
Paolo Bonzini Dec. 8, 2023, 11:07 a.m. UTC | #15
On Fri, Dec 8, 2023 at 6:54 AM Li Feng <fengli@smartx.com> wrote:
>
>
> Hi,
>
> I have ran all io pattern on my another host.
> Notes:
> q1 means fio iodepth = 1
> j1 means fio num jobs = 1
>
> VCPU = 4, VMEM = 2GiB, fio using directio.
>
> The results of most jobs are better than the deadline, and some are lower than the deadline。

I think this analysis is a bit simplistic. In particular:

For low queue depths the improvements are relatively small, but we also have
the worst cases:

4k-randread-q1-j1      |    12325  |     13356  |   8.37%
256k-randread-q1-j1    |     1865  |      1883  |   0.97%
4k-randwrite-q1-j1     |     9923  |     10163  |   2.42%
256k-randwrite-q1-j1   |     2762  |      2833  |   2.57%
4k-read-q1-j1          |    21499  |     22223  |   3.37%
256k-read-q1-j1        |     1919  |      1951  |   1.67%
4k-write-q1-j1         |    10120  |     10262  |   1.40%
256k-write-q1-j1       |     2779  |      2744  |  -1.26%
4k-randread-q1-j2      |    24238  |     25478  |   5.12%
256k-randread-q1-j2    |     3656  |      3649  |  -0.19%
4k-randwrite-q1-j2     |    17096  |     18112  |   5.94%
256k-randwrite-q1-j2   |     5188  |      4914  |  -5.28%
4k-read-q1-j2          |    36890  |     31768  | -13.88%
256k-read-q1-j2        |     3708  |      4028  |   8.63%
4k-write-q1-j2         |    17786  |     18519  |   4.12%
256k-write-q1-j2       |     4756  |      5035  |   5.87%

(I ran a paired t-test and it confirms that the improvements overall
are not statistically significant).
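
(For reference, the paired t-test statistic here is the standard one over the
per-pattern differences d between the 'none' and mq-deadline iops:
t = mean(d) / (sd(d) / sqrt(n)), where sd(d) is the sample standard deviation
of the differences and n is the number of paired patterns.)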

Small, high queue depth I/O is where the improvements are definitely
significant, but even then the scheduler seems to help in the j2 case:

4k-randread-q128-j1    |   204739  |    319066  |  55.84%
4k-randwrite-q128-j1   |   137400  |    152081  |  10.68%
4k-read-q128-j1        |   158806  |    345269  | 117.42%
4k-write-q128-j1       |    47576  |    209236  | 339.79%
4k-randread-q128-j2    |   390090  |    577300  |  47.99%
4k-randwrite-q128-j2   |   143373  |    140560  |  -1.96%
4k-read-q128-j2        |   399500  |    409857  |   2.59%
4k-write-q128-j2       |   175756  |    159109  |  -9.47%

At higher sizes, even high queue depth results have high variability.
There are clear improvements for sequential reads, but not so much for
everything else:

256k-randread-q128-j1  |    24257  |     22851  |  -5.80%
256k-randwrite-q128-j1 |     9353  |      9233  |  -1.28%
256k-read-q128-j1      |    18918  |     23710  |  25.33%
256k-write-q128-j1     |     9199  |      9337  |   1.50%
256k-randread-q128-j2  |    21992  |     23437  |   6.57%
256k-randwrite-q128-j2 |     9423  |      9314  |  -1.16%
256k-read-q128-j2      |    19360  |     21467  |  10.88%
256k-write-q128-j2     |     9292  |      9293  |   0.01%

I would focus on small I/O with varying queue depths, to understand at
which point the performance starts to improve; a queue depth of 128 may
not be representative of common usage, especially high queue depth
*sequential* access, which is where the biggest effects are visible.
Maybe you can look at improvements in the scheduler instead?

Paolo
Michael S. Tsirkin Dec. 25, 2023, 2:20 p.m. UTC | #16
On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
> virtio-blk is generally used in cloud computing scenarios, where the
> performance of virtual disks is very important. The mq-deadline scheduler
> has a big performance drop compared to none with single queue. In my tests,
> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
> the default scheduler of virtio-blk is set to "none".
> 
> Signed-off-by: Li Feng <fengli@smartx.com>

I dropped this for now, pls try to address comments by Christoph/Paolo
if it's still needed

> ---
>  drivers/block/virtio_blk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index d53d6aa8ee69..5183ec8e00be 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1367,7 +1367,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	vblk->tag_set.ops = &virtio_mq_ops;
>  	vblk->tag_set.queue_depth = queue_depth;
>  	vblk->tag_set.numa_node = NUMA_NO_NODE;
> -	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> +	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
>  	vblk->tag_set.cmd_size =
>  		sizeof(struct virtblk_req) +
>  		sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
> -- 
> 2.42.0
Li Feng Dec. 26, 2023, 9:01 a.m. UTC | #17
Hi MST and paolo,

mq-deadline is good for slow media, and none is good for high-speed media.
It depends on how the community views this issue. When virtio-blk uses
multiple queues, the default automatically changes from mq-deadline to none,
so the default is not uniform here.

I don't have ideas right now to answer Christoph/Paolo's question.

Thanks,
Li

> On Dec 25, 2023, at 22:20, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Thu, Dec 07, 2023 at 12:31:05PM +0800, Li Feng wrote:
>> virtio-blk is generally used in cloud computing scenarios, where the
>> performance of virtual disks is very important. The mq-deadline scheduler
>> has a big performance drop compared to none with single queue. In my tests,
>> mq-deadline 4k readread iops were 270k compared to 450k for none. So here
>> the default scheduler of virtio-blk is set to "none".
>> 
>> Signed-off-by: Li Feng <fengli@smartx.com>
> 
> I dropped this for now, pls try to address comments by Christoph/Paolo
> if it's still needed
> 
>> ---
>> drivers/block/virtio_blk.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index d53d6aa8ee69..5183ec8e00be 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -1367,7 +1367,7 @@ static int virtblk_probe(struct virtio_device *vdev)
>> 	vblk->tag_set.ops = &virtio_mq_ops;
>> 	vblk->tag_set.queue_depth = queue_depth;
>> 	vblk->tag_set.numa_node = NUMA_NO_NODE;
>> -	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
>> +	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
>> 	vblk->tag_set.cmd_size =
>> 		sizeof(struct virtblk_req) +
>> 		sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
>> -- 
>> 2.42.0
>
Michael S. Tsirkin Dec. 26, 2023, 9:05 a.m. UTC | #18
On Tue, Dec 26, 2023 at 05:01:40PM +0800, Li Feng wrote:
> Hi MST and paolo,
> 
> mq-deadline is good for slow media, and none is good for high-speed media. 
> It depends on how the community views this issue. When virtio-blk adopts
> multi-queue,it automatically changes from deadline to none, which is not
> uniform here.

It's not virtio-blk changing though, it's linux doing that, right?
Is virtio-blk special somehow?

> I don't have ideas right now to answer Christoph/Paolo's question.
> 
> Thanks,
> Li
Li Feng Dec. 26, 2023, 12:14 p.m. UTC | #19
On Tue, Dec 26, 2023 at 5:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Dec 26, 2023 at 05:01:40PM +0800, Li Feng wrote:
> > Hi MST and paolo,
> >
> > mq-deadline is good for slow media, and none is good for high-speed media.
> > It depends on how the community views this issue. When virtio-blk adopts
> > multi-queue,it automatically changes from deadline to none, which is not
> > uniform here.
>
> It's not virtio-blk changing though, it's linux doing that, right?
> Is virtio-blk special somehow?
>
Yes, it’s the common code path.

Each block driver has a chance to set none as the default scheduler; the loop
driver, for example, already does so.

https://lkml.kernel.org/stable/20230809103647.895397540@linuxfoundation.org/
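
For reference, at the time of writing loop_add() in drivers/block/loop.c sets
its tag_set flags roughly like this:

	/* loop keeps the guest-visible default scheduler as 'none' */
	lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING |
		BLK_MQ_F_NO_SCHED_BY_DEFAULT;

which is why elevator_get_default(), quoted earlier in this thread, skips
mq-deadline for loop devices.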

> > I don't have ideas right now to answer Christoph/Paolo's question.
> >
> > Thanks,
> > Li
>
> --
> MST
>
Michael S. Tsirkin Dec. 26, 2023, 3:38 p.m. UTC | #20
On Tue, Dec 26, 2023 at 05:01:40PM +0800, Li Feng wrote:
> I don't have ideas right now to answer Christoph/Paolo's question.

Paolo did some testing on this thread and posted some concerns.
Do you disagree with his analysis?
Li Feng Dec. 27, 2023, 7:26 a.m. UTC | #21
> On Dec 26, 2023, at 23:38, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Tue, Dec 26, 2023 at 05:01:40PM +0800, Li Feng wrote:
>> I don't have ideas right now to answer Christoph/Paolo's question.
> 
> Paolo did some testing on this thread and posted some concerns.
> Do you disagree with his analysis?

Paolo gave a very detailed data analysis. Indeed, in some low queue depth
scenarios the numbers dropped somewhat. However, I suspect that this may be
caused by other factors (such as test fluctuations) rather than by any benefit
brought by mq-deadline.

BTW, I think a 128 queue depth test is a very important and common performance
test item for storage devices.

Thanks,
Li

> 
> -- 
> MST
>
Michael S. Tsirkin Dec. 27, 2023, 10:06 a.m. UTC | #22
On Wed, Dec 27, 2023 at 03:26:30PM +0800, Li Feng wrote:
> 
> 
> > On Dec 26, 2023, at 23:38, Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Tue, Dec 26, 2023 at 05:01:40PM +0800, Li Feng wrote:
> >> I don't have ideas right now to answer Christoph/Paolo's question.
> > 
> > Paolo did some testing on this thread and posted some concerns.
> > Do you disagree with his analysis?
> 
> Paolo gave a very detailed data analysis. Indeed, in some low queue depth
> scenarios, the data dropped somewhat. However, I suspect that it may be 
> caused by other problems (such as test fluctuations) rather than the benefits 
> brought by deadline. 

Maybe. I think it would be up to you to prove this then.


> BTW, I think 128 queue depth test is a very important and common performance
> test item for storage devices.
> 
> Thanks,
> Li
> 
> > 
> > -- 
> > MST
> >
Li Feng Dec. 28, 2023, 7:25 a.m. UTC | #23
> On Dec 27, 2023, at 18:06, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> On Wed, Dec 27, 2023 at 03:26:30PM +0800, Li Feng wrote:
>> 
>> 
>>> On Dec 26, 2023, at 23:38, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> 
>>> On Tue, Dec 26, 2023 at 05:01:40PM +0800, Li Feng wrote:
>>>> I don't have ideas right now to answer Christoph/Paolo's question.
>>> 
>>> Paolo did some testing on this thread and posted some concerns.
>>> Do you disagree with his analysis?
>> 
>> Paolo gave a very detailed data analysis. Indeed, in some low queue depth
>> scenarios, the data dropped somewhat. However, I suspect that it may be 
>> caused by other problems (such as test fluctuations) rather than the benefits 
>> brought by deadline.
> 
> Maybe. I think it would be up to you to prove this then.
Ok.

> 
> 
>> BTW, I think 128 queue depth test is a very important and common performance
>> test item for storage devices.
>> 
>> Thanks,
>> Li
>> 
>>> 
>>> -- 
>>> MST
>>> 
>

Patch

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index d53d6aa8ee69..5183ec8e00be 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1367,7 +1367,7 @@  static int virtblk_probe(struct virtio_device *vdev)
 	vblk->tag_set.ops = &virtio_mq_ops;
 	vblk->tag_set.queue_depth = queue_depth;
 	vblk->tag_set.numa_node = NUMA_NO_NODE;
-	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_NO_SCHED_BY_DEFAULT;
 	vblk->tag_set.cmd_size =
 		sizeof(struct virtblk_req) +
 		sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;