diff mbox series

[v6,10/11] migration: remove the unnecessary RDMA_CONTROL_ERROR message

Message ID 1533562177-16447-11-git-send-email-lidongchen@tencent.com (mailing list archive)
State New, archived
Headers show
Series Enable postcopy RDMA live migration | expand

Commit Message

858585 jemmy Aug. 6, 2018, 1:29 p.m. UTC
It's not necessary to send RDMA_CONTROL_ERROR when clean up rdma resource.
If rdma->error_state is ture, the message may not send successfully.
and the cm event can also notify the peer qemu.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 migration/rdma.c | 11 -----------
 1 file changed, 11 deletions(-)

Comments

Dr. David Alan Gilbert Aug. 17, 2018, 2:04 p.m. UTC | #1
* Lidong Chen (jemmy858585@gmail.com) wrote:
> It's not necessary to send RDMA_CONTROL_ERROR when clean up rdma resource.
> If rdma->error_state is ture, the message may not send successfully.
> and the cm event can also notify the peer qemu.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

How does this keep 'cancel' working; I added 32bce196344 last year to
make that code also send the RDMA_CONTROL_ERROR in 'cancelling'.

Dave

> ---
>  migration/rdma.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index ae07515..e1498f2 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>      int idx;
>  
>      if (rdma->cm_id && rdma->connected) {
> -        if ((rdma->error_state ||
> -             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
> -            !rdma->received_error) {
> -            RDMAControlHeader head = { .len = 0,
> -                                       .type = RDMA_CONTROL_ERROR,
> -                                       .repeat = 1,
> -                                     };
> -            error_report("Early error. Sending error.");
> -            qemu_rdma_post_send_control(rdma, NULL, &head);
> -        }
> -
>          rdma_disconnect(rdma->cm_id);
>          trace_qemu_rdma_cleanup_disconnect();
>          rdma->connected = false;
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
858585 jemmy Aug. 20, 2018, 9:04 a.m. UTC | #2
On Fri, Aug 17, 2018 at 10:04 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Lidong Chen (jemmy858585@gmail.com) wrote:
>> It's not necessary to send RDMA_CONTROL_ERROR when clean up rdma resource.
>> If rdma->error_state is ture, the message may not send successfully.
>> and the cm event can also notify the peer qemu.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> How does this keep 'cancel' working; I added 32bce196344 last year to
> make that code also send the RDMA_CONTROL_ERROR in 'cancelling'.

I guess send the RDMA_CONTROL_ERROR is to notify peer qemu close rdma
connection,
 and to make sure the receive rdma_disconnect event.

But the two sides qemu should cleanup rdma independently. maybe the
destination qemu is hang.

1.the current qemu version already not wait for
RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect,

2.for peer qemu, it's already poll rdma->channel->fd, compare to send
RDMA_CONTROL_ERROR, maybe use cm event
to notify peer qemu to quit is better. maybe the rdma is already in
error_state, and RDMA_CONTROL_ERROR
cannot send successfully. when cancel migraiton, the destination will
receive RDMA_CM_EVENT_DISCONNECTED.

3. I prefer use RDMA_CONTROL_ERROR to notify some code logic error,
not rdma connection error.

>
> Dave
>
>> ---
>>  migration/rdma.c | 11 -----------
>>  1 file changed, 11 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index ae07515..e1498f2 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2305,17 +2305,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>>      int idx;
>>
>>      if (rdma->cm_id && rdma->connected) {
>> -        if ((rdma->error_state ||
>> -             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
>> -            !rdma->received_error) {
>> -            RDMAControlHeader head = { .len = 0,
>> -                                       .type = RDMA_CONTROL_ERROR,
>> -                                       .repeat = 1,
>> -                                     };
>> -            error_report("Early error. Sending error.");
>> -            qemu_rdma_post_send_control(rdma, NULL, &head);
>> -        }
>> -
>>          rdma_disconnect(rdma->cm_id);
>>          trace_qemu_rdma_cleanup_disconnect();
>>          rdma->connected = false;
>> --
>> 1.8.3.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox series

Patch

diff --git a/migration/rdma.c b/migration/rdma.c
index ae07515..e1498f2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2305,17 +2305,6 @@  static void qemu_rdma_cleanup(RDMAContext *rdma)
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
-        if ((rdma->error_state ||
-             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
-            !rdma->received_error) {
-            RDMAControlHeader head = { .len = 0,
-                                       .type = RDMA_CONTROL_ERROR,
-                                       .repeat = 1,
-                                     };
-            error_report("Early error. Sending error.");
-            qemu_rdma_post_send_control(rdma, NULL, &head);
-        }
-
         rdma_disconnect(rdma->cm_id);
         trace_qemu_rdma_cleanup_disconnect();
         rdma->connected = false;