diff mbox series

[RFC,v2,3/3] hw/block/pflash: use memory_region_init_rom_device_from_file()

Message ID 20210301115329.411762-4-philmd@redhat.com (mailing list archive)
State New, archived
Headers show
Series hw/block/pflash: Mmap read-only backend files with MAP_SHARED | expand

Commit Message

Philippe Mathieu-Daudé March 1, 2021, 11:53 a.m. UTC
If the block drive is read-only we will model a "protected" flash
device. We can thus use memory_region_init_rom_device_from_file()
which mmap the backing file when creating the MemoryRegion.
If the same backing file is used by multiple QEMU instances, this
reduces the memory footprint (this is often the case with the
CODE flash image from OVMF and AAVMF).

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 hw/block/pflash_cfi01.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

Comments

Stefan Hajnoczi March 1, 2021, 6:13 p.m. UTC | #1
On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
> If the block drive is read-only we will model a "protected" flash
> device. We can thus use memory_region_init_rom_device_from_file()
> which mmap the backing file when creating the MemoryRegion.
> If the same backing file is used by multiple QEMU instances, this
> reduces the memory footprint (this is often the case with the
> CODE flash image from OVMF and AAVMF).
> 
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
> ---
>  hw/block/pflash_cfi01.c | 39 +++++++++++++++++++++++++++++++--------
>  1 file changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
> index a5fa8d8b74a..ec290636298 100644
> --- a/hw/block/pflash_cfi01.c
> +++ b/hw/block/pflash_cfi01.c
> @@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>      int ret;
>      uint64_t blocks_per_device, sector_len_per_device, device_len;
>      int num_devices;
> +    bool romd_mr_shared_mapped;
>  
>      if (pfl->sector_len == 0) {
>          error_setg(errp, "attribute \"sector-length\" not specified or zero.");
> @@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>          pfl->ro = 0;
>      }
>  
> -    memory_region_init_rom_device(
> -        &pfl->mem, OBJECT(dev),
> -        &pflash_cfi01_ops,
> -        pfl,
> -        pfl->name, total_len, errp);
> -    if (*errp) {
> -        return;
> +    if (pfl->ro && pfl->blk) {
> +        BlockDriverState *bs = blk_bs(pfl->blk);
> +
> +        /* If "raw" driver used, try to mmap the backing file as RAM_SHARED */
> +        if (bs->drv == &bdrv_raw) { /* FIXME check offset=0 ? */

Bypassing the block layer is tricky because there are a lot of features
that conflict (you already pointed out the offset= option). Checking
bdrv_raw is not enough because the underlying protocol driver could be
GlusterFS, iSCSI, etc.

I think the goal here is to avoid changing the command-line/QMP so that
users don't need to modify their guests. Therefore changing the pflash
qdev properties is not desirable (we could have added a separate code
path that bypasses the block layer cleanly). This seems like a
worthwhile optimization that the block layer should support. I suggest
adding a new API like:

  /* Returns a filename string if @blk supports read-only mmap */
  char *blk_get_read_only_mmap_filename(BlockBackend *blk, Error **errp);

Then block/raw-format.c would forward the call to bs->file and
block/raw-posix.c would implement it by returning a new filename string
when bs->read_only is true.

FWIW this API isn't perfect because the file could be reopened with QMP
and the existing mmap would remain in place.

> +            Error *local_err = NULL;
> +
> +            memory_region_init_rom_device_from_file(&pfl->mem, OBJECT(dev),
> +                                                    &pflash_cfi01_ops, pfl,
> +                                                    pfl->name, total_len,
> +                                                    qemu_real_host_page_size,
> +                                                    RAM_SHARED,
> +                                                    bs->exact_filename,
> +                                                    true, &local_err);
> +            if (local_err) {
> +                error_report_err(local_err);
> +                /* fall back to memory_region_init_rom_device() */
> +            } else {
> +                romd_mr_shared_mapped = true;
> +            }
> +        }
> +    }
> +    if (!romd_mr_shared_mapped) {
> +        memory_region_init_rom_device(&pfl->mem, OBJECT(dev),
> +                                      &pflash_cfi01_ops, pfl,
> +                                      pfl->name, total_len, errp);
> +        if (*errp) {
> +            return;
> +        }
>      }
>  
>      pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
>      sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
>  
> -    if (pfl->blk) {
> +    if (pfl->blk && !romd_mr_shared_mapped) {
>          if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
>                                           errp)) {
>              vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
> -- 
> 2.26.2
>
Philippe Mathieu-Daudé March 2, 2021, 7:04 a.m. UTC | #2
On 3/1/21 7:13 PM, Stefan Hajnoczi wrote:
> On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
>> If the block drive is read-only we will model a "protected" flash
>> device. We can thus use memory_region_init_rom_device_from_file()
>> which mmap the backing file when creating the MemoryRegion.
>> If the same backing file is used by multiple QEMU instances, this
>> reduces the memory footprint (this is often the case with the
>> CODE flash image from OVMF and AAVMF).
>>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>> ---
>>  hw/block/pflash_cfi01.c | 39 +++++++++++++++++++++++++++++++--------
>>  1 file changed, 31 insertions(+), 8 deletions(-)
>>
>> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
>> index a5fa8d8b74a..ec290636298 100644
>> --- a/hw/block/pflash_cfi01.c
>> +++ b/hw/block/pflash_cfi01.c
>> @@ -702,6 +702,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>>      int ret;
>>      uint64_t blocks_per_device, sector_len_per_device, device_len;
>>      int num_devices;
>> +    bool romd_mr_shared_mapped;
>>  
>>      if (pfl->sector_len == 0) {
>>          error_setg(errp, "attribute \"sector-length\" not specified or zero.");
>> @@ -743,19 +744,41 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
>>          pfl->ro = 0;
>>      }
>>  
>> -    memory_region_init_rom_device(
>> -        &pfl->mem, OBJECT(dev),
>> -        &pflash_cfi01_ops,
>> -        pfl,
>> -        pfl->name, total_len, errp);
>> -    if (*errp) {
>> -        return;
>> +    if (pfl->ro && pfl->blk) {
>> +        BlockDriverState *bs = blk_bs(pfl->blk);
>> +
>> +        /* If "raw" driver used, try to mmap the backing file as RAM_SHARED */
>> +        if (bs->drv == &bdrv_raw) { /* FIXME check offset=0 ? */
> 
> Bypassing the block layer is tricky because there are a lot of features
> that conflict (you already pointed out the offset= option). Checking
> bdrv_raw is not enough because the underlying protocol driver could be
> GlusterFS, iSCSI, etc.

OK.

> I think the goal here is to avoid changing the command-line/QMP so that
> users don't need to modify their guests. Therefore changing the pflash
> qdev properties is not desirable (we could have added a separate code
> path that bypasses the block layer cleanly).

Yes, this is the limitation.

> This seems like a
> worthwhile optimization that the block layer should support. I suggest
> adding a new API like:
> 
>   /* Returns a filename string if @blk supports read-only mmap */
>   char *blk_get_read_only_mmap_filename(BlockBackend *blk, Error **errp);
> 
> Then block/raw-format.c would forward the call to bs->file and
> block/raw-posix.c would implement it by returning a new filename string
> when bs->read_only is true.

Thanks :) Kevin suggested something similar too.

> 
> FWIW this API isn't perfect because the file could be reopened with QMP
> and the existing mmap would remain in place.

Can you show me a QMP example or point me at the command?
This shouldn't happen with the pflash.

Thanks for reviewing,

Phil.
Stefan Hajnoczi March 2, 2021, 10:54 a.m. UTC | #3
On Tue, Mar 02, 2021 at 08:04:46AM +0100, Philippe Mathieu-Daudé wrote:
> On 3/1/21 7:13 PM, Stefan Hajnoczi wrote:
> > On Mon, Mar 01, 2021 at 12:53:29PM +0100, Philippe Mathieu-Daudé wrote:
> > FWIW this API isn't perfect because the file could be reopened with QMP
> > and the existing mmap would remain in place.
> 
> Can you show me a QMP example or point me at the command?

x-blockdev-change and other commands can reopen or reconfigure the
BlockDriverState graph - the mmap user would not be aware of this.

For example, block_set_io_throttle won't take effect if the guest has
the device mmapped.

> This shouldn't happen with the pflash.

It's not possible to say that because pflash has a
DEFINE_PROP_DRIVE("drive") property. The storage is backed by a
--drive/--blockdev and the user could send any QMP command that operates
on drives :(.

Users probably won't but there is nothing stopping them.

The block layer has a permission system (BLK_PERM_*). Maybe it's
possible to use it to lock a BDS while mmap is active?

Stefan
diff mbox series

Patch

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index a5fa8d8b74a..ec290636298 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -702,6 +702,7 @@  static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
     int ret;
     uint64_t blocks_per_device, sector_len_per_device, device_len;
     int num_devices;
+    bool romd_mr_shared_mapped;
 
     if (pfl->sector_len == 0) {
         error_setg(errp, "attribute \"sector-length\" not specified or zero.");
@@ -743,19 +744,41 @@  static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
         pfl->ro = 0;
     }
 
-    memory_region_init_rom_device(
-        &pfl->mem, OBJECT(dev),
-        &pflash_cfi01_ops,
-        pfl,
-        pfl->name, total_len, errp);
-    if (*errp) {
-        return;
+    if (pfl->ro && pfl->blk) {
+        BlockDriverState *bs = blk_bs(pfl->blk);
+
+        /* If "raw" driver used, try to mmap the backing file as RAM_SHARED */
+        if (bs->drv == &bdrv_raw) { /* FIXME check offset=0 ? */
+            Error *local_err = NULL;
+
+            memory_region_init_rom_device_from_file(&pfl->mem, OBJECT(dev),
+                                                    &pflash_cfi01_ops, pfl,
+                                                    pfl->name, total_len,
+                                                    qemu_real_host_page_size,
+                                                    RAM_SHARED,
+                                                    bs->exact_filename,
+                                                    true, &local_err);
+            if (local_err) {
+                error_report_err(local_err);
+                /* fall back to memory_region_init_rom_device() */
+            } else {
+                romd_mr_shared_mapped = true;
+            }
+        }
+    }
+    if (!romd_mr_shared_mapped) {
+        memory_region_init_rom_device(&pfl->mem, OBJECT(dev),
+                                      &pflash_cfi01_ops, pfl,
+                                      pfl->name, total_len, errp);
+        if (*errp) {
+            return;
+        }
     }
 
     pfl->storage = memory_region_get_ram_ptr(&pfl->mem);
     sysbus_init_mmio(SYS_BUS_DEVICE(dev), &pfl->mem);
 
-    if (pfl->blk) {
+    if (pfl->blk && !romd_mr_shared_mapped) {
         if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
                                          errp)) {
             vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));