diff mbox series

scsi-generic: replace logical block count of response of READ CAPACITY

Message ID 20211120101512.7770-1-lma@suse.com (mailing list archive)
State New, archived
Headers show
Series scsi-generic: replace logical block count of response of READ CAPACITY | expand

Commit Message

Lin Ma Nov. 20, 2021, 10:15 a.m. UTC
While using SCSI passthrough, Following scenario makes qemu doesn't
realized the capacity change of remote scsi target:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'rescan-scsi-bus.sh -s ...' in vm.

In above scenario I used to experienced errors while accessing the
additional disk space in vm. I think the reasonable operations should
be:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'block_resize' via qmp to notify qemu.
4. issue 'rescan-scsi-bus.sh -s ...' in vm.

The errors disappear once I notify qemu by block_resize via qmp.

So this patch replaces the number of logical blocks of READ CAPACITY
response from scsi target by qemu's bs->total_sectors. If the user in
vm wants to access the additional disk space, The administrator of
host must notify qemu once resizeing the scsi target.

Bonus is that domblkinfo of libvirt can reflect the consistent capacity
information between host and vm in case of missing block_resize in qemu.
E.g:
...
    <disk type='block' device='lun'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/sdc' index='1'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
...

Before:
1. online resize the scsi target.
2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity:       4.000 GiB
Allocation:     0.000 B
Physical:       8.000 GiB

5. guest:~ # lsblk /dev/sda
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0   8G  0 disk
└─sda1   8:1    0   2G  0 part

After:
1. online resize the scsi target.
2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity:       4.000 GiB
Allocation:     0.000 B
Physical:       8.000 GiB

5. guest:~ # lsblk /dev/sda
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0   4G  0 disk
└─sda1   8:1    0   2G  0 part

Signed-off-by: Lin Ma <lma@suse.com>
---
 hw/scsi/scsi-generic.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Claudio Fontana Dec. 21, 2021, 9:48 a.m. UTC | #1
Hi Paolo, Hannes,

any thoughts on the following issue?

Introduction:

When using SAN storage for providing block devices to guests, configured as SCSI-passthrough devices, increasing the space available in the VM is a use case.

To do it, it is currently necessary to:

1) expand storage on the actual SAN,
2) run a "virsh blockresize" or equivalent command to make QEMU aware of the new size, and finally
3) do a "rescan-scsi-bus.sh" or equivalent operation in the guest to make the running guest aware of the increased disk size.

The problem:

As of now, the administrator needs to make sure that step 3 won't be done before step 2 has been executed, or the resulting state will be inconsistent.
In practice this creates organizational issues to try to sync up host/storage admins and guest OS admin, and is therefore error prone (due to these human factors).

The proposal:

The patch I replied to here from Ma Lin tries to avoid the inconsistent state, by having "rescan-scsi-bus.sh" still report the old size in the guest until QEMU itself is aware of the new disk size.

The patch:

Before the patch, the SCSI READ_CAPACITY command in the guest os directly receives the unmodified response from the storage backend.

After the patch, QEMU intercepts the READ_CAPACITY response and replaces the maximum LBA with the information which is saved in QEMU.

This means: after resizing the storage on the SAN backend, the host administrator must explicitly notify about CAPACITY HAS CHANGED by issuing a block resize command through QMP or libvirt,
even for SCSI passthrough disks.

Any ideas on this patch or on possible alternatives?

Thanks,

Claudio


On 11/20/21 11:15 AM, Lin Ma wrote:
> While using SCSI passthrough, Following scenario makes qemu doesn't
> realized the capacity change of remote scsi target:
> 1. online resize the scsi target.
> 2. issue 'rescan-scsi-bus.sh -s ...' in host.
> 3. issue 'rescan-scsi-bus.sh -s ...' in vm.
> 
> In above scenario I used to experienced errors while accessing the
> additional disk space in vm. I think the reasonable operations should
> be:
> 1. online resize the scsi target.
> 2. issue 'rescan-scsi-bus.sh -s ...' in host.
> 3. issue 'block_resize' via qmp to notify qemu.
> 4. issue 'rescan-scsi-bus.sh -s ...' in vm.
> 
> The errors disappear once I notify qemu by block_resize via qmp.
> 
> So this patch replaces the number of logical blocks of READ CAPACITY
> response from scsi target by qemu's bs->total_sectors. If the user in
> vm wants to access the additional disk space, The administrator of
> host must notify qemu once resizeing the scsi target.
> 
> Bonus is that domblkinfo of libvirt can reflect the consistent capacity
> information between host and vm in case of missing block_resize in qemu.
> E.g:
> ...
>     <disk type='block' device='lun'>
>       <driver name='qemu' type='raw'/>
>       <source dev='/dev/sdc' index='1'/>
>       <backingStore/>
>       <target dev='sda' bus='scsi'/>
>       <alias name='scsi0-0-0-0'/>
>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>     </disk>
> ...
> 
> Before:
> 1. online resize the scsi target.
> 2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
> 3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
> 4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
> Capacity:       4.000 GiB
> Allocation:     0.000 B
> Physical:       8.000 GiB
> 
> 5. guest:~ # lsblk /dev/sda
> NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda      8:0    0   8G  0 disk
> └─sda1   8:1    0   2G  0 part
> 
> After:
> 1. online resize the scsi target.
> 2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
> 3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
> 4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
> Capacity:       4.000 GiB
> Allocation:     0.000 B
> Physical:       8.000 GiB
> 
> 5. guest:~ # lsblk /dev/sda
> NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda      8:0    0   4G  0 disk
> └─sda1   8:1    0   2G  0 part
> 
> Signed-off-by: Lin Ma <lma@suse.com>
> ---
>  hw/scsi/scsi-generic.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 0306ccc7b1..343b51c2c0 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -315,11 +315,17 @@ static void scsi_read_complete(void * opaque, int ret)
>      if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
>          (ldl_be_p(&r->buf[0]) != 0xffffffffU || s->max_lba == 0)) {
>          s->blocksize = ldl_be_p(&r->buf[4]);
> -        s->max_lba = ldl_be_p(&r->buf[0]) & 0xffffffffULL;
> +        BlockBackend *blk = s->conf.blk;
> +        BlockDriverState *bs = blk_bs(blk);
> +        s->max_lba = bs->total_sectors - 1;
> +        stl_be_p(&r->buf[0], s->max_lba);
>      } else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
>                 (r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
>          s->blocksize = ldl_be_p(&r->buf[8]);
> -        s->max_lba = ldq_be_p(&r->buf[0]);
> +        BlockBackend *blk = s->conf.blk;
> +        BlockDriverState *bs = blk_bs(blk);
> +        s->max_lba = bs->total_sectors - 1;
> +        stq_be_p(&r->buf[0], s->max_lba);
>      }
>      blk_set_guest_block_size(s->conf.blk, s->blocksize);
>  
>
Philippe Mathieu-Daudé Dec. 21, 2021, 11:20 a.m. UTC | #2
Hi Claudio,

On 12/21/21 10:48, Claudio Fontana wrote:
> Hi Paolo, Hannes,
> 
> any thoughts on the following issue?
> 
> Introduction:
> 
> When using SAN storage for providing block devices to guests, configured as SCSI-passthrough devices, increasing the space available in the VM is a use case.
> 
> To do it, it is currently necessary to:
> 
> 1) expand storage on the actual SAN,
> 2) run a "virsh blockresize" or equivalent command to make QEMU aware of the new size, and finally
> 3) do a "rescan-scsi-bus.sh" or equivalent operation in the guest to make the running guest aware of the increased disk size.
> 
> The problem:
> 
> As of now, the administrator needs to make sure that step 3 won't be done before step 2 has been executed, or the resulting state will be inconsistent.
> In practice this creates organizational issues to try to sync up host/storage admins and guest OS admin, and is therefore error prone (due to these human factors).
> 
> The proposal:
> 
> The patch I replied to here from Ma Lin tries to avoid the inconsistent state, by having "rescan-scsi-bus.sh" still report the old size in the guest until QEMU itself is aware of the new disk size.
> 
> The patch:
> 
> Before the patch, the SCSI READ_CAPACITY command in the guest os directly receives the unmodified response from the storage backend.
> 
> After the patch, QEMU intercepts the READ_CAPACITY response and replaces the maximum LBA with the information which is saved in QEMU.
> 
> This means: after resizing the storage on the SAN backend, the host administrator must explicitly notify about CAPACITY HAS CHANGED by issuing a block resize command through QMP or libvirt,
> even for SCSI passthrough disks.
> 
> Any ideas on this patch or on possible alternatives?

I am not an SCSI expert, but Lin's description and yours sound
coherent to me.

One minor nitpick below.

> On 11/20/21 11:15 AM, Lin Ma wrote:
>> While using SCSI passthrough, Following scenario makes qemu doesn't
>> realized the capacity change of remote scsi target:
>> 1. online resize the scsi target.
>> 2. issue 'rescan-scsi-bus.sh -s ...' in host.
>> 3. issue 'rescan-scsi-bus.sh -s ...' in vm.
>>
>> In above scenario I used to experienced errors while accessing the
>> additional disk space in vm. I think the reasonable operations should
>> be:
>> 1. online resize the scsi target.
>> 2. issue 'rescan-scsi-bus.sh -s ...' in host.
>> 3. issue 'block_resize' via qmp to notify qemu.
>> 4. issue 'rescan-scsi-bus.sh -s ...' in vm.
>>
>> The errors disappear once I notify qemu by block_resize via qmp.
>>
>> So this patch replaces the number of logical blocks of READ CAPACITY
>> response from scsi target by qemu's bs->total_sectors. If the user in
>> vm wants to access the additional disk space, The administrator of
>> host must notify qemu once resizeing the scsi target.
>>
>> Bonus is that domblkinfo of libvirt can reflect the consistent capacity
>> information between host and vm in case of missing block_resize in qemu.
>> E.g:
>> ...
>>     <disk type='block' device='lun'>
>>       <driver name='qemu' type='raw'/>
>>       <source dev='/dev/sdc' index='1'/>
>>       <backingStore/>
>>       <target dev='sda' bus='scsi'/>
>>       <alias name='scsi0-0-0-0'/>
>>       <address type='drive' controller='0' bus='0' target='0' unit='0'/>
>>     </disk>
>> ...
>>
>> Before:
>> 1. online resize the scsi target.
>> 2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
>> 3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
>> 4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
>> Capacity:       4.000 GiB
>> Allocation:     0.000 B
>> Physical:       8.000 GiB
>>
>> 5. guest:~ # lsblk /dev/sda
>> NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> sda      8:0    0   8G  0 disk
>> └─sda1   8:1    0   2G  0 part
>>
>> After:
>> 1. online resize the scsi target.
>> 2. host:~  # rescan-scsi-bus.sh -s /dev/sdc
>> 3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
>> 4  host:~  # virsh domblkinfo --domain $DOMAIN --human --device sda
>> Capacity:       4.000 GiB
>> Allocation:     0.000 B
>> Physical:       8.000 GiB
>>
>> 5. guest:~ # lsblk /dev/sda
>> NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>> sda      8:0    0   4G  0 disk
>> └─sda1   8:1    0   2G  0 part
>>
>> Signed-off-by: Lin Ma <lma@suse.com>
>> ---
>>  hw/scsi/scsi-generic.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
>> index 0306ccc7b1..343b51c2c0 100644
>> --- a/hw/scsi/scsi-generic.c
>> +++ b/hw/scsi/scsi-generic.c
>> @@ -315,11 +315,17 @@ static void scsi_read_complete(void * opaque, int ret)
>>      if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
>>          (ldl_be_p(&r->buf[0]) != 0xffffffffU || s->max_lba == 0)) {
>>          s->blocksize = ldl_be_p(&r->buf[4]);
>> -        s->max_lba = ldl_be_p(&r->buf[0]) & 0xffffffffULL;
>> +        BlockBackend *blk = s->conf.blk;
>> +        BlockDriverState *bs = blk_bs(blk);
>> +        s->max_lba = bs->total_sectors - 1;

I'd add a refresh_max_lba() helper:

 static void refresh_max_lba(SCSIDevice *s)
 {
     BlockBackend *blk = s->conf.blk;
     BlockDriverState *bs = blk_bs(blk);
     uint64_t max_lba = bs->total_sectors - 1;

     if (max_lba != s->max_lba) {
         trace_scsi_generic_max_lba_refreshed(s->max_lba, max_lba);
         s->max_lba = max_lba;
     }
 }

Otherwise, to the best of my knowledge:
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

>> +        stl_be_p(&r->buf[0], s->max_lba);
>>      } else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
>>                 (r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
>>          s->blocksize = ldl_be_p(&r->buf[8]);
>> -        s->max_lba = ldq_be_p(&r->buf[0]);
>> +        BlockBackend *blk = s->conf.blk;
>> +        BlockDriverState *bs = blk_bs(blk);
>> +        s->max_lba = bs->total_sectors - 1;
>> +        stq_be_p(&r->buf[0], s->max_lba);
>>      }
>>      blk_set_guest_block_size(s->conf.blk, s->blocksize);
>>  
>>
> 
>
diff mbox series

Patch

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 0306ccc7b1..343b51c2c0 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -315,11 +315,17 @@  static void scsi_read_complete(void * opaque, int ret)
     if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
         (ldl_be_p(&r->buf[0]) != 0xffffffffU || s->max_lba == 0)) {
         s->blocksize = ldl_be_p(&r->buf[4]);
-        s->max_lba = ldl_be_p(&r->buf[0]) & 0xffffffffULL;
+        BlockBackend *blk = s->conf.blk;
+        BlockDriverState *bs = blk_bs(blk);
+        s->max_lba = bs->total_sectors - 1;
+        stl_be_p(&r->buf[0], s->max_lba);
     } else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
                (r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
         s->blocksize = ldl_be_p(&r->buf[8]);
-        s->max_lba = ldq_be_p(&r->buf[0]);
+        BlockBackend *blk = s->conf.blk;
+        BlockDriverState *bs = blk_bs(blk);
+        s->max_lba = bs->total_sectors - 1;
+        stq_be_p(&r->buf[0], s->max_lba);
     }
     blk_set_guest_block_size(s->conf.blk, s->blocksize);