mbox series

[0/4] Add ignore-external migration capability

Message ID 20190110120120.9943-1-yury-kotov@yandex-team.ru (mailing list archive)
Headers show
Series Add ignore-external migration capability | expand

Message

Yury Kotov Jan. 10, 2019, 12:01 p.m. UTC
Hi,

The series adds migration capability which allows to skip 'external' RAM blocks
during migration. External block is a RAMBlock which available from the outside
of current QEMU process (e.g. file in /dev/shm). It's useful for fast local
migration to update QEMU for the running guests.

Patches:
1. Add offset validation to make sure that external RAM block has the same
   physical offset on target side,
2. Add RAM_EXTERNAL flag to determine external RAM blocks,
3. Add ignore-external migration capability,
4. Add a test.

Usage example:
1. Start source VM:
   qemu-system-x86 \
     -m 4G \
     -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
     -numa node,memdev=mem0 \
     -qmp unix:/tmp/qemu-qmp-1.sock,server,nowait \

2. Start target VM:
   qemu-system-x86 \
     -m 4G \
     -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
     -numa node,memdev=mem0 \
     -qmp unix:/tmp/qemu-qmp-2.sock,server,nowait \
     -incoming defer

3. Enable ignore-external capability on both VMs:
   { "execute": "migrate-set-capabilities" , "arguments":
     { "capabilities": [ { "capability": "x-ignore-external", "state": true } ] } }

4. Start migration.

Regards,
Yury

Yury Kotov (4):
  migration: add RAMBlock's offset validation
  exec: add RAM_EXTERNAL flag to mark non-QEMU allocated blocks
  migration: introduce ignore-external capability
  tests/migration-test: Add a test for ignore-external capability

 backends/hostmem-file.c   |   3 +-
 exec.c                    |   7 ++-
 include/exec/cpu-common.h |   1 +
 include/exec/memory.h     |   3 ++
 migration/migration.c     |   9 ++++
 migration/migration.h     |   1 +
 migration/ram.c           |  52 ++++++++++++++++--
 numa.c                    |   4 +-
 qapi/migration.json       |   6 ++-
 tests/migration-test.c    | 109 +++++++++++++++++++++++++++++++-------
 10 files changed, 165 insertions(+), 30 deletions(-)

Comments

Dr. David Alan Gilbert Jan. 10, 2019, 8:11 p.m. UTC | #1
* Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> Hi,
> 
> The series adds migration capability which allows to skip 'external' RAM blocks
> during migration. External block is a RAMBlock which available from the outside
> of current QEMU process (e.g. file in /dev/shm). It's useful for fast local
> migration to update QEMU for the running guests.

Hi Yury,
  There have been a few similar patch series around from people wanting
to do similar things.
  In particular Lai Jiangshan's https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg07511.html
and Cédric Le Goater wanted to skip regions for a different reason.

  We merged some of Cédric's code last year so that we now
have the qemu_ram_is_migratable() function - and we should be reusing
that to skip things rather than adding a new check that we have to add
everywhere.

  Also, ypu're skipping 'external' things, I think the other suggestion
was to skip 'shared' things (i.e. anything with share=0); skipping
share=on cases sounds easier to me.

Dave

> Patches:
> 1. Add offset validation to make sure that external RAM block has the same
>    physical offset on target side,
> 2. Add RAM_EXTERNAL flag to determine external RAM blocks,
> 3. Add ignore-external migration capability,
> 4. Add a test.
> 
> Usage example:
> 1. Start source VM:
>    qemu-system-x86 \
>      -m 4G \
>      -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
>      -numa node,memdev=mem0 \
>      -qmp unix:/tmp/qemu-qmp-1.sock,server,nowait \
> 
> 2. Start target VM:
>    qemu-system-x86 \
>      -m 4G \
>      -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
>      -numa node,memdev=mem0 \
>      -qmp unix:/tmp/qemu-qmp-2.sock,server,nowait \
>      -incoming defer
> 
> 3. Enable ignore-external capability on both VMs:
>    { "execute": "migrate-set-capabilities" , "arguments":
>      { "capabilities": [ { "capability": "x-ignore-external", "state": true } ] } }
> 
> 4. Start migration.
> 
> Regards,
> Yury
> 
> Yury Kotov (4):
>   migration: add RAMBlock's offset validation
>   exec: add RAM_EXTERNAL flag to mark non-QEMU allocated blocks
>   migration: introduce ignore-external capability
>   tests/migration-test: Add a test for ignore-external capability
> 
>  backends/hostmem-file.c   |   3 +-
>  exec.c                    |   7 ++-
>  include/exec/cpu-common.h |   1 +
>  include/exec/memory.h     |   3 ++
>  migration/migration.c     |   9 ++++
>  migration/migration.h     |   1 +
>  migration/ram.c           |  52 ++++++++++++++++--
>  numa.c                    |   4 +-
>  qapi/migration.json       |   6 ++-
>  tests/migration-test.c    | 109 +++++++++++++++++++++++++++++++-------
>  10 files changed, 165 insertions(+), 30 deletions(-)
> 
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Yury Kotov Jan. 11, 2019, 3:49 p.m. UTC | #2
10.01.2019, 23:12, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
>>  Hi,
>>
>>  The series adds migration capability which allows to skip 'external' RAM blocks
>>  during migration. External block is a RAMBlock which available from the outside
>>  of current QEMU process (e.g. file in /dev/shm). It's useful for fast local
>>  migration to update QEMU for the running guests.
>
> Hi Yury,
>   There have been a few similar patch series around from people wanting
> to do similar things.
>   In particular Lai Jiangshan's https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg07511.html
> and Cédric Le Goater wanted to skip regions for a different reason.
>
>   We merged some of Cédric's code last year so that we now
> have the qemu_ram_is_migratable() function - and we should be reusing
> that to skip things rather than adding a new check that we have to add
> everywhere.
>

I didn't see the series, so I'll check it, thanks!
But I saw qemu_ram_is_migratable() function and corresponding patch.
It's very close to my needs, but it works a bit different IIUC:
1. Not migratable blocks isn't validated (existence and size) during migration,
2. "Migratable" state is determined during the block creation time.
   Such case isn't valid because of it:
   * Source has one migratable and one not migratable RAM blocks,
   * Target has the same (idstr) blocks, but both are not migratable.
   Thus, target will not expect pages for not migratable blocks.

>   Also, ypu're skipping 'external' things, I think the other suggestion
> was to skip 'shared' things (i.e. anything with share=0); skipping
> share=on cases sounds easier to me.

I agree that introducing new term is a complication, but 'share' and 'external'
terms have important differences (I'll describe it below).

Just to clarify:
* 'share' means that other processes has an access to such memory,
* 'external' means file backed memory.

There is another use case I wanted to support (I had to write about it in
the cover letter, sorry..):
1. Migrate source VM to file and kill source,
2. Start target VM and migrate it from file.
In such case source VM may have memory-backend-ram with share=off, it's ok.

Thus, in the new migration capability I want to migrate memory that meets
three conditions:
1. The source will not use the memory after migration ends,
2. The source may exit before target starts (migrate to file),
3. The target has an access to the memory.

I think 'external' fits them better than 'share'.

>
> Dave
>
>>  Patches:
>>  1. Add offset validation to make sure that external RAM block has the same
>>     physical offset on target side,
>>  2. Add RAM_EXTERNAL flag to determine external RAM blocks,
>>  3. Add ignore-external migration capability,
>>  4. Add a test.
>>
>>  Usage example:
>>  1. Start source VM:
>>     qemu-system-x86 \
>>       -m 4G \
>>       -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
>>       -numa node,memdev=mem0 \
>>       -qmp unix:/tmp/qemu-qmp-1.sock,server,nowait \
>>
>>  2. Start target VM:
>>     qemu-system-x86 \
>>       -m 4G \
>>       -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
>>       -numa node,memdev=mem0 \
>>       -qmp unix:/tmp/qemu-qmp-2.sock,server,nowait \
>>       -incoming defer
>>
>>  3. Enable ignore-external capability on both VMs:
>>     { "execute": "migrate-set-capabilities" , "arguments":
>>       { "capabilities": [ { "capability": "x-ignore-external", "state": true } ] } }
>>
>>  4. Start migration.
>>
>>  Regards,
>>  Yury
>>
>>  Yury Kotov (4):
>>    migration: add RAMBlock's offset validation
>>    exec: add RAM_EXTERNAL flag to mark non-QEMU allocated blocks
>>    migration: introduce ignore-external capability
>>    tests/migration-test: Add a test for ignore-external capability
>>
>>   backends/hostmem-file.c | 3 +-
>>   exec.c | 7 ++-
>>   include/exec/cpu-common.h | 1 +
>>   include/exec/memory.h | 3 ++
>>   migration/migration.c | 9 ++++
>>   migration/migration.h | 1 +
>>   migration/ram.c | 52 ++++++++++++++++--
>>   numa.c | 4 +-
>>   qapi/migration.json | 6 ++-
>>   tests/migration-test.c | 109 +++++++++++++++++++++++++++++++-------
>>   10 files changed, 165 insertions(+), 30 deletions(-)
>>
>>  --
>>  2.20.1
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Regards,
Yury
Dr. David Alan Gilbert Jan. 11, 2019, 8:09 p.m. UTC | #3
* Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> 10.01.2019, 23:12, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> > * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> >>  Hi,
> >>
> >>  The series adds migration capability which allows to skip 'external' RAM blocks
> >>  during migration. External block is a RAMBlock which available from the outside
> >>  of current QEMU process (e.g. file in /dev/shm). It's useful for fast local
> >>  migration to update QEMU for the running guests.
> >
> > Hi Yury,
> >   There have been a few similar patch series around from people wanting
> > to do similar things.
> >   In particular Lai Jiangshan's https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg07511.html
> > and Cédric Le Goater wanted to skip regions for a different reason.
> >
> >   We merged some of Cédric's code last year so that we now
> > have the qemu_ram_is_migratable() function - and we should be reusing
> > that to skip things rather than adding a new check that we have to add
> > everywhere.
> >
> 
> I didn't see the series, so I'll check it, thanks!
> But I saw qemu_ram_is_migratable() function and corresponding patch.
> It's very close to my needs, but it works a bit different IIUC:
> 1. Not migratable blocks isn't validated (existence and size) during migration,
> 2. "Migratable" state is determined during the block creation time.
>    Such case isn't valid because of it:
>    * Source has one migratable and one not migratable RAM blocks,
>    * Target has the same (idstr) blocks, but both are not migratable.
>    Thus, target will not expect pages for not migratable blocks.

I've added Cédric to the mail;
there were other cases people were interested in, including switching
it dynamically, just no one else used it yet.

I'd prefer it if you did modify the is_migratable - that will fix a lot
of other places in the migration code to avoid your blocks;  I think the
best thing is for you to add a spearate 'needs_migration_check' for
those blocks like yours which you want to check but you don't send the
data.  Please don't change the format of the migratino stream except
in the case where you are sending those to be checked.

> >   Also, ypu're skipping 'external' things, I think the other suggestion
> > was to skip 'shared' things (i.e. anything with share=0); skipping
> > share=on cases sounds easier to me.
> 
> I agree that introducing new term is a complication, but 'share' and 'external'
> terms have important differences (I'll describe it below).
> 
> Just to clarify:
> * 'share' means that other processes has an access to such memory,
> * 'external' means file backed memory.
> 
> There is another use case I wanted to support (I had to write about it in
> the cover letter, sorry..):
> 1. Migrate source VM to file and kill source,
> 2. Start target VM and migrate it from file.
> In such case source VM may have memory-backend-ram with share=off, it's ok.
> 
> Thus, in the new migration capability I want to migrate memory that meets
> three conditions:
> 1. The source will not use the memory after migration ends,
> 2. The source may exit before target starts (migrate to file),
> 3. The target has an access to the memory.
> 
> I think 'external' fits them better than 'share'.

Are you sure that with share=off (the default), in the case where the
source shuts down, that the changes have been written to the backing
file?

I'm also wondering if perhaps we'd be better having an explicit
migrate=off property on memory objects rather than trying to guess from
the share= or the fact it's an external path.

Igor: Does that make sense to you?

Dave

> >
> > Dave
> >
> >>  Patches:
> >>  1. Add offset validation to make sure that external RAM block has the same
> >>     physical offset on target side,
> >>  2. Add RAM_EXTERNAL flag to determine external RAM blocks,
> >>  3. Add ignore-external migration capability,
> >>  4. Add a test.
> >>
> >>  Usage example:
> >>  1. Start source VM:
> >>     qemu-system-x86 \
> >>       -m 4G \
> >>       -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
> >>       -numa node,memdev=mem0 \
> >>       -qmp unix:/tmp/qemu-qmp-1.sock,server,nowait \
> >>
> >>  2. Start target VM:
> >>     qemu-system-x86 \
> >>       -m 4G \
> >>       -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
> >>       -numa node,memdev=mem0 \
> >>       -qmp unix:/tmp/qemu-qmp-2.sock,server,nowait \
> >>       -incoming defer
> >>
> >>  3. Enable ignore-external capability on both VMs:
> >>     { "execute": "migrate-set-capabilities" , "arguments":
> >>       { "capabilities": [ { "capability": "x-ignore-external", "state": true } ] } }
> >>
> >>  4. Start migration.
> >>
> >>  Regards,
> >>  Yury
> >>
> >>  Yury Kotov (4):
> >>    migration: add RAMBlock's offset validation
> >>    exec: add RAM_EXTERNAL flag to mark non-QEMU allocated blocks
> >>    migration: introduce ignore-external capability
> >>    tests/migration-test: Add a test for ignore-external capability
> >>
> >>   backends/hostmem-file.c | 3 +-
> >>   exec.c | 7 ++-
> >>   include/exec/cpu-common.h | 1 +
> >>   include/exec/memory.h | 3 ++
> >>   migration/migration.c | 9 ++++
> >>   migration/migration.h | 1 +
> >>   migration/ram.c | 52 ++++++++++++++++--
> >>   numa.c | 4 +-
> >>   qapi/migration.json | 6 ++-
> >>   tests/migration-test.c | 109 +++++++++++++++++++++++++++++++-------
> >>   10 files changed, 165 insertions(+), 30 deletions(-)
> >>
> >>  --
> >>  2.20.1
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> Regards,
> Yury
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Eduardo Habkost Jan. 11, 2019, 8:55 p.m. UTC | #4
On Fri, Jan 11, 2019 at 06:49:53PM +0300, Yury Kotov wrote:
> 10.01.2019, 23:12, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> > * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> >>  Hi,
> >>
> >>  The series adds migration capability which allows to skip 'external' RAM blocks
> >>  during migration. External block is a RAMBlock which available from the outside
> >>  of current QEMU process (e.g. file in /dev/shm). It's useful for fast local
> >>  migration to update QEMU for the running guests.
> >
> > Hi Yury,
> >   There have been a few similar patch series around from people wanting
> > to do similar things.
> >   In particular Lai Jiangshan's https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg07511.html
> > and Cédric Le Goater wanted to skip regions for a different reason.
> >
> >   We merged some of Cédric's code last year so that we now
> > have the qemu_ram_is_migratable() function - and we should be reusing
> > that to skip things rather than adding a new check that we have to add
> > everywhere.
> >
> 
> I didn't see the series, so I'll check it, thanks!
> But I saw qemu_ram_is_migratable() function and corresponding patch.
> It's very close to my needs, but it works a bit different IIUC:
> 1. Not migratable blocks isn't validated (existence and size) during migration,
> 2. "Migratable" state is determined during the block creation time.
>    Such case isn't valid because of it:
>    * Source has one migratable and one not migratable RAM blocks,
>    * Target has the same (idstr) blocks, but both are not migratable.
>    Thus, target will not expect pages for not migratable blocks.
> 
> >   Also, ypu're skipping 'external' things, I think the other suggestion
> > was to skip 'shared' things (i.e. anything with share=0); skipping
> > share=on cases sounds easier to me.
> 
> I agree that introducing new term is a complication, but 'share' and 'external'
> terms have important differences (I'll describe it below).
> 
> Just to clarify:
> * 'share' means that other processes has an access to such memory,
> * 'external' means file backed memory.

If you use file backed memory with share=off, writes are not
propagated to the file (they are mapped with MAP_PRIVATE).  Would
you really want to skip file backed memory if it has share=off?

> 
> There is another use case I wanted to support (I had to write about it in
> the cover letter, sorry..):
> 1. Migrate source VM to file and kill source,
> 2. Start target VM and migrate it from file.
> In such case source VM may have memory-backend-ram with share=off, it's ok.
> 
> Thus, in the new migration capability I want to migrate memory that meets
> three conditions:
> 1. The source will not use the memory after migration ends,
> 2. The source may exit before target starts (migrate to file),
> 3. The target has an access to the memory.
> 
> I think 'external' fits them better than 'share'.
> 

In either case, defining "external" seems tricky.  A memory
region might be backed by a file on tmpfs or hugetlbfs that was
deleted, which makes the file "internal" for practical purposes.
QEMU has no way to tell if (3) is really true.
no-reply@patchew.org Jan. 13, 2019, 2:37 p.m. UTC | #5
Patchew URL: https://patchew.org/QEMU/20190110120120.9943-1-yury-kotov@yandex-team.ru/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-quick@centos7 SHOW_ENV=1 J=8
=== TEST SCRIPT END ===




The full log is available at
http://patchew.org/logs/20190110120120.9943-1-yury-kotov@yandex-team.ru/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
no-reply@patchew.org Jan. 13, 2019, 11:57 p.m. UTC | #6
Patchew URL: https://patchew.org/QEMU/20190110120120.9943-1-yury-kotov@yandex-team.ru/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=14
=== TEST SCRIPT END ===

  CC      aarch64-softmmu/balloon.o
  CC      x86_64-softmmu/hw/block/virtio-blk.o
/tmp/qemu-test/src/migration/ram.c: In function 'ram_load':
/tmp/qemu-test/src/migration/ram.c:4159:42: error: format '%lld' expects argument of type 'long long int', but argument 3 has type 'ram_addr_t {aka unsigned int}' [-Werror=format=]
                             error_report("Mismatched RAM block offset %s "
                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/i686-w64-mingw32/sys-root/mingw/include/inttypes.h:299:0,
---
  CC      aarch64-softmmu/hw/cpu/a15mpcore.o
  CC      aarch64-softmmu/hw/display/omap_dss.o
/tmp/qemu-test/src/migration/ram.c: In function 'ram_load':
/tmp/qemu-test/src/migration/ram.c:4159:42: error: format '%lld' expects argument of type 'long long int', but argument 3 has type 'ram_addr_t {aka unsigned int}' [-Werror=format=]
                             error_report("Mismatched RAM block offset %s "
                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/i686-w64-mingw32/sys-root/mingw/include/inttypes.h:299:0,


The full log is available at
http://patchew.org/logs/20190110120120.9943-1-yury-kotov@yandex-team.ru/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Yury Kotov Jan. 14, 2019, 3:16 p.m. UTC | #7
11.01.2019, 23:09, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
>>  10.01.2019, 23:12, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
>>  > * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
>>  >> Hi,
>>  >>
>>  >> The series adds migration capability which allows to skip 'external' RAM blocks
>>  >> during migration. External block is a RAMBlock which available from the outside
>>  >> of current QEMU process (e.g. file in /dev/shm). It's useful for fast local
>>  >> migration to update QEMU for the running guests.
>>  >
>>  > Hi Yury,
>>  > There have been a few similar patch series around from people wanting
>>  > to do similar things.
>>  > In particular Lai Jiangshan's https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg07511.html
>>  > and Cédric Le Goater wanted to skip regions for a different reason.
>>  >
>>  > We merged some of Cédric's code last year so that we now
>>  > have the qemu_ram_is_migratable() function - and we should be reusing
>>  > that to skip things rather than adding a new check that we have to add
>>  > everywhere.
>>  >
>>
>>  I didn't see the series, so I'll check it, thanks!
>>  But I saw qemu_ram_is_migratable() function and corresponding patch.
>>  It's very close to my needs, but it works a bit different IIUC:
>>  1. Not migratable blocks isn't validated (existence and size) during migration,
>>  2. "Migratable" state is determined during the block creation time.
>>     Such case isn't valid because of it:
>>     * Source has one migratable and one not migratable RAM blocks,
>>     * Target has the same (idstr) blocks, but both are not migratable.
>>     Thus, target will not expect pages for not migratable blocks.
>
> I've added Cédric to the mail;
> there were other cases people were interested in, including switching
> it dynamically, just no one else used it yet.
>
> I'd prefer it if you did modify the is_migratable - that will fix a lot
> of other places in the migration code to avoid your blocks; I think the
> best thing is for you to add a spearate 'needs_migration_check' for
> those blocks like yours which you want to check but you don't send the
> data. Please don't change the format of the migratino stream except
> in the case where you are sending those to be checked.
>

Ok, I've got your point and will try to reuse/modify is_migratable.

>>  > Also, ypu're skipping 'external' things, I think the other suggestion
>>  > was to skip 'shared' things (i.e. anything with share=0); skipping
>>  > share=on cases sounds easier to me.
>>
>>  I agree that introducing new term is a complication, but 'share' and 'external'
>>  terms have important differences (I'll describe it below).
>>
>>  Just to clarify:
>>  * 'share' means that other processes has an access to such memory,
>>  * 'external' means file backed memory.
>>
>>  There is another use case I wanted to support (I had to write about it in
>>  the cover letter, sorry..):
>>  1. Migrate source VM to file and kill source,
>>  2. Start target VM and migrate it from file.
>>  In such case source VM may have memory-backend-ram with share=off, it's ok.
>>
>>  Thus, in the new migration capability I want to migrate memory that meets
>>  three conditions:
>>  1. The source will not use the memory after migration ends,
>>  2. The source may exit before target starts (migrate to file),
>>  3. The target has an access to the memory.
>>
>>  I think 'external' fits them better than 'share'.
>
> Are you sure that with share=off (the default), in the case where the
> source shuts down, that the changes have been written to the backing
> file?
>

Oh, you're right. I was sure it would work...

> I'm also wondering if perhaps we'd be better having an explicit
> migrate=off property on memory objects rather than trying to guess from
> the share= or the fact it's an external path.

It's good idea. Perhaps this is the best option, given the above.

>
> Igor: Does that make sense to you?
>
> Dave
>
>>  >
>>  > Dave
>>  >
>>  >> Patches:
>>  >> 1. Add offset validation to make sure that external RAM block has the same
>>  >> physical offset on target side,
>>  >> 2. Add RAM_EXTERNAL flag to determine external RAM blocks,
>>  >> 3. Add ignore-external migration capability,
>>  >> 4. Add a test.
>>  >>
>>  >> Usage example:
>>  >> 1. Start source VM:
>>  >> qemu-system-x86 \
>>  >> -m 4G \
>>  >> -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
>>  >> -numa node,memdev=mem0 \
>>  >> -qmp unix:/tmp/qemu-qmp-1.sock,server,nowait \
>>  >>
>>  >> 2. Start target VM:
>>  >> qemu-system-x86 \
>>  >> -m 4G \
>>  >> -object memory-backend-file,id=mem0,size=4G,share=on,mem-path=/dev/shm/mem0 \
>>  >> -numa node,memdev=mem0 \
>>  >> -qmp unix:/tmp/qemu-qmp-2.sock,server,nowait \
>>  >> -incoming defer
>>  >>
>>  >> 3. Enable ignore-external capability on both VMs:
>>  >> { "execute": "migrate-set-capabilities" , "arguments":
>>  >> { "capabilities": [ { "capability": "x-ignore-external", "state": true } ] } }
>>  >>
>>  >> 4. Start migration.
>>  >>
>>  >> Regards,
>>  >> Yury
>>  >>
>>  >> Yury Kotov (4):
>>  >> migration: add RAMBlock's offset validation
>>  >> exec: add RAM_EXTERNAL flag to mark non-QEMU allocated blocks
>>  >> migration: introduce ignore-external capability
>>  >> tests/migration-test: Add a test for ignore-external capability
>>  >>
>>  >> backends/hostmem-file.c | 3 +-
>>  >> exec.c | 7 ++-
>>  >> include/exec/cpu-common.h | 1 +
>>  >> include/exec/memory.h | 3 ++
>>  >> migration/migration.c | 9 ++++
>>  >> migration/migration.h | 1 +
>>  >> migration/ram.c | 52 ++++++++++++++++--
>>  >> numa.c | 4 +-
>>  >> qapi/migration.json | 6 ++-
>>  >> tests/migration-test.c | 109 +++++++++++++++++++++++++++++++-------
>>  >> 10 files changed, 165 insertions(+), 30 deletions(-)
>>  >>
>>  >> --
>>  >> 2.20.1
>>  > --
>>  > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>>  Regards,
>>  Yury
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Regards,
Yury
Yury Kotov Jan. 14, 2019, 3:31 p.m. UTC | #8
11.01.2019, 23:55, "Eduardo Habkost" <ehabkost@redhat.com>:
> On Fri, Jan 11, 2019 at 06:49:53PM +0300, Yury Kotov wrote:
>>  10.01.2019, 23:12, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
>>  > * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
>>  >> Hi,
>>  >>
>>  >> The series adds migration capability which allows to skip 'external' RAM blocks
>>  >> during migration. External block is a RAMBlock which available from the outside
>>  >> of current QEMU process (e.g. file in /dev/shm). It's useful for fast local
>>  >> migration to update QEMU for the running guests.
>>  >
>>  > Hi Yury,
>>  > There have been a few similar patch series around from people wanting
>>  > to do similar things.
>>  > In particular Lai Jiangshan's https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg07511.html
>>  > and Cédric Le Goater wanted to skip regions for a different reason.
>>  >
>>  > We merged some of Cédric's code last year so that we now
>>  > have the qemu_ram_is_migratable() function - and we should be reusing
>>  > that to skip things rather than adding a new check that we have to add
>>  > everywhere.
>>  >
>>
>>  I didn't see the series, so I'll check it, thanks!
>>  But I saw qemu_ram_is_migratable() function and corresponding patch.
>>  It's very close to my needs, but it works a bit different IIUC:
>>  1. Not migratable blocks isn't validated (existence and size) during migration,
>>  2. "Migratable" state is determined during the block creation time.
>>     Such case isn't valid because of it:
>>     * Source has one migratable and one not migratable RAM blocks,
>>     * Target has the same (idstr) blocks, but both are not migratable.
>>     Thus, target will not expect pages for not migratable blocks.
>>
>>  > Also, ypu're skipping 'external' things, I think the other suggestion
>>  > was to skip 'shared' things (i.e. anything with share=0); skipping
>>  > share=on cases sounds easier to me.
>>
>>  I agree that introducing new term is a complication, but 'share' and 'external'
>>  terms have important differences (I'll describe it below).
>>
>>  Just to clarify:
>>  * 'share' means that other processes has an access to such memory,
>>  * 'external' means file backed memory.
>
> If you use file backed memory with share=off, writes are not
> propagated to the file (they are mapped with MAP_PRIVATE). Would
> you really want to skip file backed memory if it has share=off?
>

Yes, you're right. I was sure it would work, but share=on is also needed in
my case.

>>  There is another use case I wanted to support (I had to write about it in
>>  the cover letter, sorry..):
>>  1. Migrate source VM to file and kill source,
>>  2. Start target VM and migrate it from file.
>>  In such case source VM may have memory-backend-ram with share=off, it's ok.
>>
>>  Thus, in the new migration capability I want to migrate memory that meets
>>  three conditions:
>>  1. The source will not use the memory after migration ends,
>>  2. The source may exit before target starts (migrate to file),
>>  3. The target has an access to the memory.
>>
>>  I think 'external' fits them better than 'share'.
>
> In either case, defining "external" seems tricky. A memory
> region might be backed by a file on tmpfs or hugetlbfs that was
> deleted, which makes the file "internal" for practical purposes.
> QEMU has no way to tell if (3) is really true.
>
> --
> Eduardo

Agree. Perhaps the best is a separate flag, as suggested by Dave.

Regards,
Yury
Yury Kotov Jan. 21, 2019, 2:09 p.m. UTC | #9
Hi,

Just want to clarify your suggestions.

1) migrate=off/share=on

I'm not sure that adding new flag 'migrate=off' is a good idea. I think that
share=on as you suggested at first is enough.
* It's a new flag which has sense only with share=on
* It seems to that the meaning of this flag isn't clear. migrate=off isn't
  RAM_MIGRATABLE, it's rather RAM_IGNORABLE. I.e. it means that we don't migrate
  such block only if the capability is enabled.
If you don't mind, I'll prefer share=on and ignore-shared capability.

2) Keep RAM migration version

It's more complicated question. To validate GPAs for ignored blocks I have to
change the stream format. I can do this conditionally (if (cap_enabled) { ... })
but in this case, I want to make sure that the capability is enabled or disabled
on both source and target. Otherwise, there is an undefined behavior on the
target side (most likely some error during deserialization).
To fix that I can add a capability validation feature.

For example, add a new section to vmstate_configuration (not complete):
+static const VMStateDescription vmstate_capabilites = {
+    .name = "configuration/capabilities",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = vmstate_capabilites_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32_V(caps_count, SaveState, 1),
+        VMSTATE_VARRAY_UINT32_ALLOC(capabilities, SaveState, caps_count, 1,
+                                    vmstate_info_bool, bool),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_configuration = {
     .name = "configuration",
     .version_id = 1,
@@ -404,6 +456,7 @@ static const VMStateDescription vmstate_configuration = {
     },
     .subsections = (const VMStateDescription*[]) {
         &vmstate_target_page_bits,
+        &vmstate_capabilites,
         NULL
     }
 };

It seems too complicated. If I should change the migration stream anyway, maybe
it's better to increment the version?

What do you think?

Regards,
Yury
Dr. David Alan Gilbert Jan. 22, 2019, 6:08 p.m. UTC | #10
* Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> Hi,
> 
> Just want to clarify your suggestions.
> 
> 1) migrate=off/share=on
> 
> I'm not sure that adding new flag 'migrate=off' is a good idea. I think that
> share=on as you suggested at first is enough.
> * It's a new flag which has sense only with share=on
> * It seems to that the meaning of this flag isn't clear. migrate=off isn't
>   RAM_MIGRATABLE, it's rather RAM_IGNORABLE. I.e. it means that we don't migrate
>   such block only if the capability is enabled.
> If you don't mind, I'll prefer share=on and ignore-shared capability.

Yes, I'm OK with that, just check with Lai Jiangshan (cc'd) that it
meets their requirement as well.

> 2) Keep RAM migration version
> 
> It's more complicated question. To validate GPAs for ignored blocks I have to
> change the stream format. I can do this conditionally (if (cap_enabled) { ... })
> but in this case, I want to make sure that the capability is enabled or disabled
> on both source and target. Otherwise, there is an undefined behavior on the
> target side (most likely some error during deserialization).
> To fix that I can add a capability validation feature.
> 
> For example, add a new section to vmstate_configuration (not complete):
> +static const VMStateDescription vmstate_capabilites = {
> +    .name = "configuration/capabilities",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = vmstate_capabilites_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32_V(caps_count, SaveState, 1),
> +        VMSTATE_VARRAY_UINT32_ALLOC(capabilities, SaveState, caps_count, 1,
> +                                    vmstate_info_bool, bool),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_configuration = {
>      .name = "configuration",
>      .version_id = 1,
> @@ -404,6 +456,7 @@ static const VMStateDescription vmstate_configuration = {
>      },
>      .subsections = (const VMStateDescription*[]) {
>          &vmstate_target_page_bits,
> +        &vmstate_capabilites,
>          NULL
>      }
>  };
> 
> It seems too complicated. If I should change the migration stream anyway, maybe
> it's better to increment the version?
> 
> What do you think?

Never break compatibility!

There's three separate things here:
  a) Skipping the shared blocks
  b) Adding a check that skipped blocks actually match
  c) Adding a check for matching capabilities

Lets solve them separately.

I think (a) we've discussed.
(b) If you only enable the checking when your ignore-shared is enabled
then it doesn't break any compatibility.  But watch out, if they're
skipped for some other reason then you might not want to check them;
for example some of the things Peter Maydell was talking about for ARM
was that the block may or may not be present, so there's no requirement
it's on the destination.

(c) That's a nice to have; you'd have to create a list of capabilities
you care about including; if they're only new capabilities then you
can just enable the fields when any are set and it doesn't break old
things; if you want to include some other capabilities then tie
it to the machine type and it wont break old setups.

Dave

> Regards,
> Yury
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK