
[1/2] migration: Fix rdma migration failed

Message ID 20230920090412.726725-1-lizhijian@fujitsu.com (mailing list archive)
State New, archived
Series [1/2] migration: Fix rdma migration failed

Commit Message

Zhijian Li (Fujitsu) Sept. 20, 2023, 9:04 a.m. UTC
From: Li Zhijian <lizhijian@cn.fujitsu.com>

The destination fails with:
qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.

Migration over RDMA is different from TCP: RDMA has its own control
messages, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.

find_dirty_block() can be called between RDMA_CONTROL_REGISTER_REQUEST
and RDMA_CONTROL_REGISTER_FINISHED; it then sends extra traffic to the
destination and causes the migration to fail.

Since there's no existing subroutine to indicate whether the migration
uses RDMA, and RDMA is not compatible with multifd, use
migrate_multifd() here.

Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
---
 migration/ram.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
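
For reference, this is roughly what the find_dirty_block() hunk looks like with the patch applied (reassembled from the diff at the bottom of this page; the wrap-around and error-handling tail are abbreviated, not copied from the full source):

        pss->page = 0;
        pss->block = QLIST_NEXT_RCU(pss->block, next);
        if (!pss->block) {
            /*
             * Only emit the multifd sync here when multifd is actually in
             * use.  RDMA migration (which excludes multifd) must not see
             * extra stream traffic between RDMA_CONTROL_REGISTER_REQUEST
             * and RDMA_CONTROL_REGISTER_FINISHED.
             */
            if (migrate_multifd() &&
                !migrate_multifd_flush_after_each_section()) {
                QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
                int ret = multifd_send_sync_main(f);
                if (ret < 0) {
                    return ret;   /* assumed: propagate the error */
                }
            }
            /* ... wrap around to the first RAMBlock (elided) ... */
        }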

Comments

Fabiano Rosas Sept. 20, 2023, 12:46 p.m. UTC | #1
Li Zhijian <lizhijian@fujitsu.com> writes:

> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>
> Destination will fail with:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>
> migrate with RDMA is different from tcp. RDMA has its own control
> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.

Yeah, this is really fragile. We need a long term solution to this. Any
other change to multifd protocol as well as any other change to the
migration ram handling might hit this issue again.

Perhaps commit 294e5a4034 ("multifd: Only flush once each full round of
memory") should simply not have touched the stream at that point, but we
don't have any explicit safeguards to avoid interleaving flags from
different layers like that (assuming multifd is at another logical layer
than the ram handling).

I don't have any good suggestions at this moment, so for now:

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Zhijian Li (Fujitsu) Sept. 21, 2023, 1:40 a.m. UTC | #2
Sorry to all, I forgot to update my email address to lizhijian@fujitsu.com.

Corrected it.


On 20/09/2023 17:04, Li Zhijian wrote:
> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> 
> Destination will fail with:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
> 
> migrate with RDMA is different from tcp. RDMA has its own control
> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
> destination and cause migration to fail.
> 
> Since there's no existing subroutine to indicate whether it's migrated
> by RDMA or not, and RDMA is not compatible with multifd, we use
> migrate_multifd() here.
> 
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>   migration/ram.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9040d66e61..89ae28e21a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>           pss->page = 0;
>           pss->block = QLIST_NEXT_RCU(pss->block, next);
>           if (!pss->block) {
> -            if (!migrate_multifd_flush_after_each_section()) {
> +            if (migrate_multifd() &&
> +                !migrate_multifd_flush_after_each_section()) {
>                   QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>                   int ret = multifd_send_sync_main(f);
>                   if (ret < 0) {
Zhijian Li (Fujitsu) Sept. 22, 2023, 7:42 a.m. UTC | #3
On 20/09/2023 20:46, Fabiano Rosas wrote:
> Li Zhijian <lizhijian@fujitsu.com> writes:
> 
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>
>> Destination will fail with:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>
>> migrate with RDMA is different from tcp. RDMA has its own control
>> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> Yeah, this is really fragile. We need a long term solution to this. Any
> other change to multifd protocol as well as any other change to the
> migration ram handling might hit this issue again.

Yeah, it's a pain point.

Another option is to let the RDMA control handler recognize the RAM_SAVE_FLAG_MULTIFD_FLUSH message
and do nothing with it.


> 
> Perhaps commit 294e5a4034 ("multifd: Only flush once each full round of
> memory") should simply not have touched the stream at that point, but we
> don't have any explicit safeguards to avoid interleaving flags from
> different layers like that (assuming multifd is at another logical layer
> than the ram handling).
> 
> I don't have any good suggestions at this moment, so for now:
> 
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
Peter Xu Sept. 22, 2023, 3:42 p.m. UTC | #4
On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> 
> Destination will fail with:
> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
> 
> migrate with RDMA is different from tcp. RDMA has its own control
> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
> destination and cause migration to fail.
> 
> Since there's no existing subroutine to indicate whether it's migrated
> by RDMA or not, and RDMA is not compatible with multifd, we use
> migrate_multifd() here.
> 
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> ---
>  migration/ram.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9040d66e61..89ae28e21a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>          pss->page = 0;
>          pss->block = QLIST_NEXT_RCU(pss->block, next);
>          if (!pss->block) {
> -            if (!migrate_multifd_flush_after_each_section()) {
> +            if (migrate_multifd() &&
> +                !migrate_multifd_flush_after_each_section()) {
>                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>                  int ret = multifd_send_sync_main(f);
>                  if (ret < 0) {
> -- 
> 2.31.1
> 

Maybe better to put that check at the entry of
migrate_multifd_flush_after_each_section()?

I also hope that some day there's no multifd function called in generic
migration code paths..
Fabiano Rosas Sept. 22, 2023, 3:59 p.m. UTC | #5
Peter Xu <peterx@redhat.com> writes:

> On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>> 
>> Destination will fail with:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>> 
>> migrate with RDMA is different from tcp. RDMA has its own control
>> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
>> 
>> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
>> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
>> destination and cause migration to fail.
>> 
>> Since there's no existing subroutine to indicate whether it's migrated
>> by RDMA or not, and RDMA is not compatible with multifd, we use
>> migrate_multifd() here.
>> 
>> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>  migration/ram.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9040d66e61..89ae28e21a 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>>          pss->page = 0;
>>          pss->block = QLIST_NEXT_RCU(pss->block, next);
>>          if (!pss->block) {
>> -            if (!migrate_multifd_flush_after_each_section()) {
>> +            if (migrate_multifd() &&
>> +                !migrate_multifd_flush_after_each_section()) {
>>                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>>                  int ret = multifd_send_sync_main(f);
>>                  if (ret < 0) {
>> -- 
>> 2.31.1
>> 
>
> Maybe better to put that check at the entry of
> migrate_multifd_flush_after_each_section()?
>
> I also hope that some day there's no multifd function called in generic
> migration code paths..

I wonder what happened with that MigrationOps idea. We added the
ram_save_target_page pointer and nothing else. It seems like it could be
something in the direction of allowing different parts of the migration
code to provide different behavior without having to put these explicit
checks all over the place.

And although we're removing the QEMUFile hooks, those also looked like
they could help mitigate these cross-layer interactions. (I'm NOT
advocating bringing them back).
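
A rough sketch of that direction (hypothetical, not current QEMU code: today's MigrationOps in migration/ram.c carries only ram_save_target_page; the ram_flush_after_block hook and both implementations below are invented names, only meant to illustrate how an ops table could replace the open-coded migrate_multifd() check in find_dirty_block()):

    struct MigrationOps {
        int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
        /* Hypothetical: called when find_dirty_block() wraps to block 0. */
        int (*ram_flush_after_block)(RAMState *rs);
    };

    /* multifd variant: sync the channels before starting a new round. */
    static int multifd_ram_flush_after_block(RAMState *rs)
    {
        QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
        return multifd_send_sync_main(f);
    }

    /* rdma/plain-precopy variant: never touch the stream at this point. */
    static int noop_ram_flush_after_block(RAMState *rs)
    {
        return 0;
    }

find_dirty_block() would then call the hook unconditionally through the ops pointer, and the per-transport behaviour would live with the transport instead of as explicit checks in the generic ram code.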
Peter Xu Sept. 22, 2023, 4:09 p.m. UTC | #6
On Fri, Sep 22, 2023 at 12:59:37PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
> >> From: Li Zhijian <lizhijian@cn.fujitsu.com>
> >> 
> >> Destination will fail with:
> >> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
> >> 
> >> migrate with RDMA is different from tcp. RDMA has its own control
> >> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> >> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> >> 
> >> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
> >> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
> >> destination and cause migration to fail.
> >> 
> >> Since there's no existing subroutine to indicate whether it's migrated
> >> by RDMA or not, and RDMA is not compatible with multifd, we use
> >> migrate_multifd() here.
> >> 
> >> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> >> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
> >> ---
> >>  migration/ram.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index 9040d66e61..89ae28e21a 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
> >>          pss->page = 0;
> >>          pss->block = QLIST_NEXT_RCU(pss->block, next);
> >>          if (!pss->block) {
> >> -            if (!migrate_multifd_flush_after_each_section()) {
> >> +            if (migrate_multifd() &&
> >> +                !migrate_multifd_flush_after_each_section()) {
> >>                  QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
> >>                  int ret = multifd_send_sync_main(f);
> >>                  if (ret < 0) {
> >> -- 
> >> 2.31.1
> >> 
> >
> > Maybe better to put that check at the entry of
> > migrate_multifd_flush_after_each_section()?
> >
> > I also hope that some day there's no multifd function called in generic
> > migration code paths..
> 
> I wonder what happened with that MigrationOps idea. We added the
> ram_save_target_page pointer and nothing else. It seems like it could be
> something in the direction of allowing different parts of the migration
> code to provide different behavior without having to put these explicit
> checks all over the place.

Yeah..

https://lore.kernel.org/qemu-devel/20230130080956.3047-12-quintela@redhat.com/

Juan should know better.

Personally I think it's better to only introduce a hook when there's already a
2nd user.  I assume Juan merged it planning that the second user would land
soon, but it didn't.
Zhijian Li (Fujitsu) Sept. 25, 2023, 8:59 a.m. UTC | #7
On 22/09/2023 23:42, Peter Xu wrote:
> On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote:
>> From: Li Zhijian <lizhijian@cn.fujitsu.com>
>>
>> Destination will fail with:
>> qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing.
>>
>> migrate with RDMA is different from tcp. RDMA has its own control
>> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
>> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
>>
>> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
>> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
>> destination and cause migration to fail.
>>
>> Since there's no existing subroutine to indicate whether it's migrated
>> by RDMA or not, and RDMA is not compatible with multifd, we use
>> migrate_multifd() here.
>>
>> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>> ---
>>   migration/ram.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9040d66e61..89ae28e21a 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
>>           pss->page = 0;
>>           pss->block = QLIST_NEXT_RCU(pss->block, next);
>>           if (!pss->block) {
>> -            if (!migrate_multifd_flush_after_each_section()) {
>> +            if (migrate_multifd() &&
>> +                !migrate_multifd_flush_after_each_section()) {
>>                   QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>>                   int ret = multifd_send_sync_main(f);
>>                   if (ret < 0) {
>> -- 
>> 2.31.1
>>
> 
> Maybe better to put that check at the entry of
> migrate_multifd_flush_after_each_section()?
> 

It sounds good to me:
diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0a..327bcf2fbe4 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -368,7 +368,7 @@ bool migrate_multifd_flush_after_each_section(void)
  {
      MigrationState *s = migrate_get_current();

-    return s->multifd_flush_after_each_section;
+    return !migrate_multifd() || s->multifd_flush_after_each_section;
  }

  bool migrate_postcopy(void)


That change makes migrate_multifd_flush_after_each_section() always return true when multifd is disabled.
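
With that variant the existing call site in find_dirty_block() needs no edit; sketching the effect (condition as in the current code):

    /*
     * With multifd disabled, migrate_multifd_flush_after_each_section()
     * now returns true, so this branch is skipped and no sync/flush
     * traffic is emitted; the same net effect as the explicit
     * migrate_multifd() check in the patch.
     */
    if (!migrate_multifd_flush_after_each_section()) {
        QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
        int ret = multifd_send_sync_main(f);
        /* error handling unchanged (elided) */
    }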

Thanks



> I also hope that some day there's no multifd function called in generic
> migration code paths..
>

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..89ae28e21a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1399,7 +1399,8 @@  static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
         pss->page = 0;
         pss->block = QLIST_NEXT_RCU(pss->block, next);
         if (!pss->block) {
-            if (!migrate_multifd_flush_after_each_section()) {
+            if (migrate_multifd() &&
+                !migrate_multifd_flush_after_each_section()) {
                 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
                 int ret = multifd_send_sync_main(f);
                 if (ret < 0) {