[1/1] block: add missed BDRV_O_NOCACHE when block device is opened without file
diff mbox

Message ID 1251c8d7-3862-2cb9-a3c8-aa6d3c8e3924@openvz.org
State New
Headers show

Commit Message

Denis V. Lunev Jan. 25, 2017, 7:44 p.m. UTC
On 01/25/2017 08:59 PM, Max Reitz wrote:
> [CC-ing John]
>
> On 25.01.2017 17:42, Denis V. Lunev wrote:
>> Technically there is a problem when the guest DVD is created by libvirt
>> with AIO mode 'native' on Linux. Current QEMU is unable to start the
>> domain configured as follows:
>>     <disk type='file' device='cdrom'>
>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>       <target dev='sdb' bus='scsi'/>
>>       <readonly/>
>>     </disk>
>> The problem comes from the combination of 'cache' and 'io' options.
>>
>> 'io' option is common option and it is removed from block driver
>> specific options. 'cache' originally is not. The patch makes 'cache'
>> option common. This works fine as long as cdrom media insertion
>> later on.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> ---
>>  blockdev.c | 18 +++++++++++++++---
>>  1 file changed, 15 insertions(+), 3 deletions(-)
> There was a Red Hat BZ for this:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1342999
>
> There is now a corresponding BZ for libvirt:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1377321
>
> The gist is that it was determined to be a problem with libvirt.
>
> RHEL has a downstream commit to work around this issue by ignoring the
> cache mode set for empty CD-ROMs -- but that is only because that was
> simpler than fixing libvirt.
>
> (Note that it's ignoring the cache mode instead of actually evaluating it.)
>
This is what I have exactly started from:
http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm

This package starts VM well for the above mentioned configuration:

    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <target dev='sdb' bus='scsi'/>
      <readonly/>
    </disk>

but the problem comes later at 'change' moment. It reports

'Details: internal error: unable to execute QEMU command 'change':
aio=native was specified, but it requires cache.direct=on, which
was not specified.)'


Thus partial solution implemented by the RedHat is really
partial and does not work at the second stage. I have started from

         blk_rs->read_only     = !(bdrv_flags & BDRV_O_RDWR);

The problem for us is that we have such previously valid configurations
in the field.

>> May be this has already discussed, but still. AIO=native for CDROM without
>> media seems important case.
> First, I personally don't find aio=native very important for CD-ROM
> drives. Yes, we shouldn't make IDE CD-ROM slower than necessary, but I
> don't think aio=native will make the difference between "Slow like
> CD-ROMs are in reality" and "As fast as virtio".
the problem for me is that each clone() call is costly and counts. That
is why we are trying to avoid it whenever possible. That is why 'native'
mode is MUCH better. Also it would be very nice not to use cached
IO, which is very good for memory overcommit situations.

Here I am fighting not with the performance, which does not make
any real sense, but with a memory footprint.

> Second, all this patch does is revert some changes done by commit
> 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate.
>
> Third, you may then be asking for the recommended way to put an
> aio=native medium into a CD-ROM drive. Good thing you ask, because we
> have a way that we want to recommend but can't because it's still
> considered experimental:
>
> The BDS is added using blockdev-add, with all of the appropriate caching
> and aio options. Then it's inserted into the drive using the
> x-blockdev-insert-medium command, and the drive is closed using
> blockdev-close-tray.
the problem, again, is that with x-blockdev-insert-medium I can not
deal with block driver options, or may be I am missing something.

> There are a couple of issues with this: First, blockdev-add and
> x-blockdev-insert-medium are still experimental. The good news are that
> I think the reason why the latter is experimental has actually
> disappeared and we can just drop the x- by now. The
> not-so-good-but-not-so-bad-either news are that we plan to get
> blockdev-add declared non-experimental for 2.9. Let's see how it goes.
Does this mean that x-blockdev-del would also lose x- prefix?

> The other issue is that of course it's more cumbersome to use than a
> simple 'change' via HMP. I argue that if someone communicates with a VM
> by hand, they either have to deal with this added complexity or they
> cannot use aio=native for CD-ROM drives -- which, as I said, I don't
> think is too bad.
this is how 'virsh change-media' works at the moment, at least with
latest downstreams.
That is why this is not that clear.

> However, there is a reason why you cannot set cache/aio for an empty
> drive and it's simply that it doesn't make sense.
>
> Max
>
Thank you very much for the prompt answer.

Den

Comments

Max Reitz Jan. 28, 2017, 4:23 p.m. UTC | #1
On 25.01.2017 20:44, Denis V. Lunev wrote:
> On 01/25/2017 08:59 PM, Max Reitz wrote:
>> [CC-ing John]
>>
>> On 25.01.2017 17:42, Denis V. Lunev wrote:
>>> Technically there is a problem when the guest DVD is created by libvirt
>>> with AIO mode 'native' on Linux. Current QEMU is unable to start the
>>> domain configured as follows:
>>>     <disk type='file' device='cdrom'>
>>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>>       <target dev='sdb' bus='scsi'/>
>>>       <readonly/>
>>>     </disk>
>>> The problem comes from the combination of 'cache' and 'io' options.
>>>
>>> 'io' option is common option and it is removed from block driver
>>> specific options. 'cache' originally is not. The patch makes 'cache'
>>> option common. This works fine as long as cdrom media insertion
>>> later on.
>>>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> ---
>>>  blockdev.c | 18 +++++++++++++++---
>>>  1 file changed, 15 insertions(+), 3 deletions(-)
>> There was a Red Hat BZ for this:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1342999
>>
>> There is now a corresponding BZ for libvirt:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1377321
>>
>> The gist is that it was determined to be a problem with libvirt.
>>
>> RHEL has a downstream commit to work around this issue by ignoring the
>> cache mode set for empty CD-ROMs -- but that is only because that was
>> simpler than fixing libvirt.
>>
>> (Note that it's ignoring the cache mode instead of actually evaluating it.)
>>
> This is what I have exactly started from:
> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm
> 
> This package starts VM well for the above mentioned configuration:
> 
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw' cache='none' io='native'/>
>       <target dev='sdb' bus='scsi'/>
>       <readonly/>
>     </disk>
> 
> but the problem comes later at 'change' moment. It reports
> 
> 'Details: internal error: unable to execute QEMU command 'change':
> aio=native was specified, but it requires cache.direct=on, which
> was not specified.)'
> 
> 
> Thus partial solution implemented by the RedHat is really
> partial and does not work at the second stage.

Yes, because it is not supposed to work for that. The only thing the
downstream patch does is ignore the cache mode so you can still start
the VM even if libvirt decides to specify a cache parameter for an empty
drive (which it should not do).

>                                                I have started from
> 
> diff --git a/blockdev.c b/blockdev.c
> index d280dc4..e2c9053 100644   
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -608,6 +608,13 @@ static BlockBackend *blockdev_init(const char
> *file, QDict *bs_opts,
>              goto early_err;
>          }
>  
> +        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_DIRECT)) {
> +            bdrv_flags |= BDRV_O_NOCACHE;
> +        }
> +        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_NO_FLUSH)) {
> +            bdrv_flags |= BDRV_O_NO_FLUSH;
> +        }
> +
>          blk_rs = blk_get_root_state(blk);
>          blk_rs->open_flags    = bdrv_flags;
>          blk_rs->read_only     = !(bdrv_flags & BDRV_O_RDWR);
> 
> The problem for us is that we have such previously valid configurations
> in the field.
> 
>>> May be this has already discussed, but still. AIO=native for CDROM without
>>> media seems important case.
>> First, I personally don't find aio=native very important for CD-ROM
>> drives. Yes, we shouldn't make IDE CD-ROM slower than necessary, but I
>> don't think aio=native will make the difference between "Slow like
>> CD-ROMs are in reality" and "As fast as virtio".
> the problem for me is that each clone() call is costly and counts. That
> is why we are trying to avoid it whenever possible. That is why 'native'
> mode is MUCH better. Also it would be very nice not to use cached
> IO, which is very good for memory overcommit situations.
> 
> Here I am fighting not with the performance, which does not make
> any real sense, but with a memory footprint.

Fair enough. Still, specifying a cache mode for an empty drive simply
doesn't make sense.

Have you considered inserting a null-co:// file at startup as a
workaround? If you use the old 'change' (or blockdev-change-medium)
command, it should retain AIO and cache mode.

>> Second, all this patch does is revert some changes done by commit
>> 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate.
>>
>> Third, you may then be asking for the recommended way to put an
>> aio=native medium into a CD-ROM drive. Good thing you ask, because we
>> have a way that we want to recommend but can't because it's still
>> considered experimental:
>>
>> The BDS is added using blockdev-add, with all of the appropriate caching
>> and aio options. Then it's inserted into the drive using the
>> x-blockdev-insert-medium command, and the drive is closed using
>> blockdev-close-tray.
> the problem, again, is that with x-blockdev-insert-medium I can not
> deal with block driver options, or may be I am missing something.

x-blockdev-insert-medium just inserts a BDS as a medium. The BDS is
added via blockdev-add which allows you to set all of these options.

>> There are a couple of issues with this: First, blockdev-add and
>> x-blockdev-insert-medium are still experimental. The good news are that
>> I think the reason why the latter is experimental has actually
>> disappeared and we can just drop the x- by now. The
>> not-so-good-but-not-so-bad-either news are that we plan to get
>> blockdev-add declared non-experimental for 2.9. Let's see how it goes.
> Does this mean that x-blockdev-del would also lose x- prefix?

As far as I'm aware, yes.

>> The other issue is that of course it's more cumbersome to use than a
>> simple 'change' via HMP. I argue that if someone communicates with a VM
>> by hand, they either have to deal with this added complexity or they
>> cannot use aio=native for CD-ROM drives -- which, as I said, I don't
>> think is too bad.
> this is how 'virsh change-media' works at the moment, at least with
> latest downstreams.
> That is why this is not that clear.

Well, that can be easily changed, the question is how fast that will be.
You would still need to make libvirt aware of what cache mode to use for
the new medium, and that may require deeper changes.

(I supposed libvirt only stores a cache mode per drive while it should
actually store a cache mode per image.)

Max
Denis V. Lunev Jan. 30, 2017, 8:31 a.m. UTC | #2
On 01/28/2017 07:23 PM, Max Reitz wrote:
> On 25.01.2017 20:44, Denis V. Lunev wrote:
>> On 01/25/2017 08:59 PM, Max Reitz wrote:
>>> [CC-ing John]
>>>
>>> On 25.01.2017 17:42, Denis V. Lunev wrote:
>>>> Technically there is a problem when the guest DVD is created by libvirt
>>>> with AIO mode 'native' on Linux. Current QEMU is unable to start the
>>>> domain configured as follows:
>>>>     <disk type='file' device='cdrom'>
>>>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>>>       <target dev='sdb' bus='scsi'/>
>>>>       <readonly/>
>>>>     </disk>
>>>> The problem comes from the combination of 'cache' and 'io' options.
>>>>
>>>> 'io' option is common option and it is removed from block driver
>>>> specific options. 'cache' originally is not. The patch makes 'cache'
>>>> option common. This works fine as long as cdrom media insertion
>>>> later on.
>>>>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Kevin Wolf <kwolf@redhat.com>
>>>> CC: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  blockdev.c | 18 +++++++++++++++---
>>>>  1 file changed, 15 insertions(+), 3 deletions(-)
>>> There was a Red Hat BZ for this:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1342999
>>>
>>> There is now a corresponding BZ for libvirt:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1377321
>>>
>>> The gist is that it was determined to be a problem with libvirt.
>>>
>>> RHEL has a downstream commit to work around this issue by ignoring the
>>> cache mode set for empty CD-ROMs -- but that is only because that was
>>> simpler than fixing libvirt.
>>>
>>> (Note that it's ignoring the cache mode instead of actually evaluating it.)
>>>
>> This is what I have exactly started from:
>> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm
>>
>> This package starts VM well for the above mentioned configuration:
>>
>>     <disk type='file' device='cdrom'>
>>       <driver name='qemu' type='raw' cache='none' io='native'/>
>>       <target dev='sdb' bus='scsi'/>
>>       <readonly/>
>>     </disk>
>>
>> but the problem comes later at 'change' moment. It reports
>>
>> 'Details: internal error: unable to execute QEMU command 'change':
>> aio=native was specified, but it requires cache.direct=on, which
>> was not specified.)'
>>
>>
>> Thus partial solution implemented by the RedHat is really
>> partial and does not work at the second stage.
> Yes, because it is not supposed to work for that. The only thing the
> downstream patch does is ignore the cache mode so you can still start
> the VM even if libvirt decides to specify a cache parameter for an empty
> drive (which it should not do).
>
>>                                                I have started from
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index d280dc4..e2c9053 100644   
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -608,6 +608,13 @@ static BlockBackend *blockdev_init(const char
>> *file, QDict *bs_opts,
>>              goto early_err;
>>          }
>>  
>> +        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_DIRECT)) {
>> +            bdrv_flags |= BDRV_O_NOCACHE;
>> +        }
>> +        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_NO_FLUSH)) {
>> +            bdrv_flags |= BDRV_O_NO_FLUSH;
>> +        }
>> +
>>          blk_rs = blk_get_root_state(blk);
>>          blk_rs->open_flags    = bdrv_flags;
>>          blk_rs->read_only     = !(bdrv_flags & BDRV_O_RDWR);
>>
>> The problem for us is that we have such previously valid configurations
>> in the field.
>>
>>>> May be this has already discussed, but still. AIO=native for CDROM without
>>>> media seems important case.
>>> First, I personally don't find aio=native very important for CD-ROM
>>> drives. Yes, we shouldn't make IDE CD-ROM slower than necessary, but I
>>> don't think aio=native will make the difference between "Slow like
>>> CD-ROMs are in reality" and "As fast as virtio".
>> the problem for me is that each clone() call is costly and counts. That
>> is why we are trying to avoid it whenever possible. That is why 'native'
>> mode is MUCH better. Also it would be very nice not to use cached
>> IO, which is very good for memory overcommit situations.
>>
>> Here I am fighting not with the performance, which does not make
>> any real sense, but with a memory footprint.
> Fair enough. Still, specifying a cache mode for an empty drive simply
> doesn't make sense.
>
> Have you considered inserting a null-co:// file at startup as a
> workaround? If you use the old 'change' (or blockdev-change-medium)
> command, it should retain AIO and cache mode.
ok. Thanks for the suggestion.


>
>>> Second, all this patch does is revert some changes done by commit
>>> 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate.
>>>
>>> Third, you may then be asking for the recommended way to put an
>>> aio=native medium into a CD-ROM drive. Good thing you ask, because we
>>> have a way that we want to recommend but can't because it's still
>>> considered experimental:
>>>
>>> The BDS is added using blockdev-add, with all of the appropriate caching
>>> and aio options. Then it's inserted into the drive using the
>>> x-blockdev-insert-medium command, and the drive is closed using
>>> blockdev-close-tray.
>> the problem, again, is that with x-blockdev-insert-medium I can not
>> deal with block driver options, or may be I am missing something.
> x-blockdev-insert-medium just inserts a BDS as a medium. The BDS is
> added via blockdev-add which allows you to set all of these options.
>
>>> There are a couple of issues with this: First, blockdev-add and
>>> x-blockdev-insert-medium are still experimental. The good news are that
>>> I think the reason why the latter is experimental has actually
>>> disappeared and we can just drop the x- by now. The
>>> not-so-good-but-not-so-bad-either news are that we plan to get
>>> blockdev-add declared non-experimental for 2.9. Let's see how it goes.
>> Does this mean that x-blockdev-del would also lose x- prefix?
> As far as I'm aware, yes.

that sounds good!


>>> The other issue is that of course it's more cumbersome to use than a
>>> simple 'change' via HMP. I argue that if someone communicates with a VM
>>> by hand, they either have to deal with this added complexity or they
>>> cannot use aio=native for CD-ROM drives -- which, as I said, I don't
>>> think is too bad.
>> this is how 'virsh change-media' works at the moment, at least with
>> latest downstreams.
>> That is why this is not that clear.
> Well, that can be easily changed, the question is how fast that will be.
> You would still need to make libvirt aware of what cache mode to use for
> the new medium, and that may require deeper changes.
>
> (I supposed libvirt only stores a cache mode per drive while it should
> actually store a cache mode per image.)

fair enough. libvirt could do that.
Thank you for these clarifications.

Den
Kevin Wolf Jan. 30, 2017, 9:53 a.m. UTC | #3
Am 25.01.2017 um 20:44 hat Denis V. Lunev geschrieben:
> This is what I have exactly started from:
> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm
> 
> This package starts VM well for the above mentioned configuration:
> 
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw' cache='none' io='native'/>
>       <target dev='sdb' bus='scsi'/>
>       <readonly/>
>     </disk>
> 
> but the problem comes later at 'change' moment. It reports
> 
> 'Details: internal error: unable to execute QEMU command 'change':
> aio=native was specified, but it requires cache.direct=on, which
> was not specified.)'
> 
> 
> Thus partial solution implemented by the RedHat is really
> partial and does not work at the second stage. I have started from

That's a downstream problem that is probably worth fixing, but of course
not with an upstream patch. Can you create a Bugzilla for this and
assign it to Max, John or me? (Putting the rest of us into the CC list)

> the problem for me is that each clone() call is costly and counts. That
> is why we are trying to avoid it whenever possible. That is why 'native'
> mode is MUCH better. Also it would be very nice not to use cached
> IO, which is very good for memory overcommit situations.
> 
> Here I am fighting not with the performance, which does not make
> any real sense, but with a memory footprint.

This makes sense generally, but for empty CD-ROMs, no I/O request is
ever made, so neither Linux AIO nor the thread pool is used.

You only get a specific cache/aio mode once you actually insert a block
driver node to the virtual device.

> > Second, all this patch does is revert some changes done by commit
> > 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate.
> >
> > Third, you may then be asking for the recommended way to put an
> > aio=native medium into a CD-ROM drive. Good thing you ask, because we
> > have a way that we want to recommend but can't because it's still
> > considered experimental:
> >
> > The BDS is added using blockdev-add, with all of the appropriate caching
> > and aio options. Then it's inserted into the drive using the
> > x-blockdev-insert-medium command, and the drive is closed using
> > blockdev-close-tray.
> the problem, again, is that with x-blockdev-insert-medium I can not
> deal with block driver options, or may be I am missing something.

The thing that you insert with x-blockdev-insert-medium is a block
driver node that you presumably created with blockdev-add, so you did
have a chance to specify whichever options you want.

Kevin

Patch
diff mbox

diff --git a/blockdev.c b/blockdev.c
index d280dc4..e2c9053 100644   
--- a/blockdev.c
+++ b/blockdev.c
@@ -608,6 +608,13 @@  static BlockBackend *blockdev_init(const char
*file, QDict *bs_opts,
             goto early_err;
         }
 
+        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_DIRECT)) {
+            bdrv_flags |= BDRV_O_NOCACHE;
+        }
+        if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_NO_FLUSH)) {
+            bdrv_flags |= BDRV_O_NO_FLUSH;
+        }
+
         blk_rs = blk_get_root_state(blk);
         blk_rs->open_flags    = bdrv_flags;