diff mbox series

[V1] migration: cpr breaks SNP guest

Message ID 1743082598-428927-1-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New
Headers show
Series [V1] migration: cpr breaks SNP guest | expand

Commit Message

Steve Sistare March 27, 2025, 1:36 p.m. UTC
With aux-ram-share=off, booting an SNP guest fails with:

  ../util/error.c:68: error_setv: Assertion `*errp == NULL' failed.

This is because a CPR blocker for the guest_memfd ramblock is added
twice, once in ram_block_add_cpr_blocker because aux-ram-share=off so
rb->fd < 0, and once in ram_block_add for a specific guest_memfd blocker.

To fix, add the guest_memfd blocker iff a generic one would not be
added by ram_block_add_cpr_blocker.

Fixes: 094a3dbc55df ("migration: ram block cpr blockers")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Reported-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 system/physmem.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

Comments

Tom Lendacky March 27, 2025, 2:10 p.m. UTC | #1
On 3/27/25 08:36, Steve Sistare wrote:
> With aux-ram-share=off, booting an SNP guest fails with:
> 
>   ../util/error.c:68: error_setv: Assertion `*errp == NULL' failed.
> 
> This is because a CPR blocker for the guest_memfd ramblock is added
> twice, once in ram_block_add_cpr_blocker because aux-ram-share=off so
> rb->fd < 0, and once in ram_block_add for a specific guest_memfd blocker.
> 
> To fix, add the guest_memfd blocker iff a generic one would not be
> added by ram_block_add_cpr_blocker.
> 
> Fixes: 094a3dbc55df ("migration: ram block cpr blockers")
> Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
> Reported-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

SNP guest launch works for me with this fix. Thanks, Steve!

Tested-by: Tom Lendacky <thomas.lendacky@amd.com>

> ---
>  system/physmem.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/system/physmem.c b/system/physmem.c
> index e97de3e..cfafb06 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -1908,13 +1908,18 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>              goto out_free;
>          }
>  
> -        error_setg(&new_block->cpr_blocker,
> -                   "Memory region %s uses guest_memfd, "
> -                   "which is not supported with CPR.",
> -                   memory_region_name(new_block->mr));
> -        migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
> -                                  MIG_MODE_CPR_TRANSFER,
> -                                  -1);
> +        /*
> +         * Add a specific guest_memfd blocker if a generic one would not be
> +         * added by ram_block_add_cpr_blocker.
> +         */
> +        if (new_block->fd >= 0 && qemu_ram_is_shared(new_block)) {
> +            error_setg(&new_block->cpr_blocker,
> +                       "Memory region %s uses guest_memfd, "
> +                       "which is not supported with CPR.",
> +                       memory_region_name(new_block->mr));
> +            migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
> +                                      MIG_MODE_CPR_TRANSFER, -1);
> +        }
>      }
>  
>      ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
Fabiano Rosas March 27, 2025, 2:21 p.m. UTC | #2
Steve Sistare <steven.sistare@oracle.com> writes:

> With aux-ram-share=off, booting an SNP guest fails with:
>
>   ../util/error.c:68: error_setv: Assertion `*errp == NULL' failed.
>
> This is because a CPR blocker for the guest_memfd ramblock is added
> twice, once in ram_block_add_cpr_blocker because aux-ram-share=off so
> rb->fd < 0, and once in ram_block_add for a specific guest_memfd blocker.
>
> To fix, add the guest_memfd blocker iff a generic one would not be
> added by ram_block_add_cpr_blocker.
>
> Fixes: 094a3dbc55df ("migration: ram block cpr blockers")
> Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
> Reported-by: Michael Roth <michael.roth@amd.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  system/physmem.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/system/physmem.c b/system/physmem.c
> index e97de3e..cfafb06 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -1908,13 +1908,18 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>              goto out_free;
>          }
>  
> -        error_setg(&new_block->cpr_blocker,
> -                   "Memory region %s uses guest_memfd, "
> -                   "which is not supported with CPR.",
> -                   memory_region_name(new_block->mr));
> -        migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
> -                                  MIG_MODE_CPR_TRANSFER,
> -                                  -1);
> +        /*
> +         * Add a specific guest_memfd blocker if a generic one would not be
> +         * added by ram_block_add_cpr_blocker.
> +         */
> +        if (new_block->fd >= 0 && qemu_ram_is_shared(new_block)) {

Can we avoid having two instances of the same condition that will need
to be kept in sync? What about:

/*
 * Preserving guest_memfd may be sufficient for CPR, but it has not
 * been tested yet.
 */
if (ram_is_cpr_compatible(new_block)) {
...

> +            error_setg(&new_block->cpr_blocker,
> +                       "Memory region %s uses guest_memfd, "
> +                       "which is not supported with CPR.",
> +                       memory_region_name(new_block->mr));
> +            migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
> +                                      MIG_MODE_CPR_TRANSFER, -1);
> +        }
>      }
>  
>      ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
Steve Sistare March 27, 2025, 2:46 p.m. UTC | #3
On 3/27/2025 10:21 AM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> With aux-ram-share=off, booting an SNP guest fails with:
>>
>>    ../util/error.c:68: error_setv: Assertion `*errp == NULL' failed.
>>
>> This is because a CPR blocker for the guest_memfd ramblock is added
>> twice, once in ram_block_add_cpr_blocker because aux-ram-share=off so
>> rb->fd < 0, and once in ram_block_add for a specific guest_memfd blocker.
>>
>> To fix, add the guest_memfd blocker iff a generic one would not be
>> added by ram_block_add_cpr_blocker.
>>
>> Fixes: 094a3dbc55df ("migration: ram block cpr blockers")
>> Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Reported-by: Michael Roth <michael.roth@amd.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   system/physmem.c | 19 ++++++++++++-------
>>   1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/system/physmem.c b/system/physmem.c
>> index e97de3e..cfafb06 100644
>> --- a/system/physmem.c
>> +++ b/system/physmem.c
>> @@ -1908,13 +1908,18 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>>               goto out_free;
>>           }
>>   
>> -        error_setg(&new_block->cpr_blocker,
>> -                   "Memory region %s uses guest_memfd, "
>> -                   "which is not supported with CPR.",
>> -                   memory_region_name(new_block->mr));
>> -        migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
>> -                                  MIG_MODE_CPR_TRANSFER,
>> -                                  -1);
>> +        /*
>> +         * Add a specific guest_memfd blocker if a generic one would not be
>> +         * added by ram_block_add_cpr_blocker.
>> +         */
>> +        if (new_block->fd >= 0 && qemu_ram_is_shared(new_block)) {
> 
> Can we avoid having two instances of the same condition that will need
> to be kept in sync? What about:
> 
> /*
>   * Preserving guest_memfd may be sufficient for CPR, but it has not
>   * been tested yet.
>   */
> if (ram_is_cpr_compatible(new_block)) {
> ...
> 

Sure, that is better. I will send V2 shortly.

- Steve

>> +            error_setg(&new_block->cpr_blocker,
>> +                       "Memory region %s uses guest_memfd, "
>> +                       "which is not supported with CPR.",
>> +                       memory_region_name(new_block->mr));
>> +            migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
>> +                                      MIG_MODE_CPR_TRANSFER, -1);
>> +        }
>>       }
>>   
>>       ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
diff mbox series

Patch

diff --git a/system/physmem.c b/system/physmem.c
index e97de3e..cfafb06 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1908,13 +1908,18 @@  static void ram_block_add(RAMBlock *new_block, Error **errp)
             goto out_free;
         }
 
-        error_setg(&new_block->cpr_blocker,
-                   "Memory region %s uses guest_memfd, "
-                   "which is not supported with CPR.",
-                   memory_region_name(new_block->mr));
-        migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
-                                  MIG_MODE_CPR_TRANSFER,
-                                  -1);
+        /*
+         * Add a specific guest_memfd blocker if a generic one would not be
+         * added by ram_block_add_cpr_blocker.
+         */
+        if (new_block->fd >= 0 && qemu_ram_is_shared(new_block)) {
+            error_setg(&new_block->cpr_blocker,
+                       "Memory region %s uses guest_memfd, "
+                       "which is not supported with CPR.",
+                       memory_region_name(new_block->mr));
+            migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
+                                      MIG_MODE_CPR_TRANSFER, -1);
+        }
     }
 
     ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;