
[v1,07/17] migration/rdma: Use ram_block_discard_set_broken()

Message ID: 20200506094948.76388-8-david@redhat.com
State: New, archived
Series: virtio-mem: Paravirtualized memory hot(un)plug

Commit Message

David Hildenbrand May 6, 2020, 9:49 a.m. UTC
RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
to mark RAM block discards to be broken - however, to keep it simple
use ram_block_discard_is_required() instead of inhibiting.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 migration/rdma.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
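
The two entry points referenced above come from earlier in this series: ram_block_discard_set_broken() for users that permanently break discards, and ram_block_discard_is_required() as the cheap query this patch prefers. Below is a minimal sketch of one plausible coordination scheme; only the two function names come from the series as posted, and the counter-based implementation is an assumption made for illustration.

/*
 * Sketch only: just the two public names appear in this series; the
 * counter scheme below is an assumption made for illustration.
 */
#include <errno.h>
#include <stdbool.h>

static unsigned int discard_broken_count;   /* pinning users: RDMA, vfio, ... */
static unsigned int discard_required_count; /* discard providers: virtio-mem, ... */

/* Registered by users that pin RAM and therefore break discards. */
int ram_block_discard_set_broken(void)
{
    if (discard_required_count) {
        return -EBUSY;   /* somebody relies on discards actually working */
    }
    discard_broken_count++;
    return 0;
}

/* The cheap query this patch uses instead: is a discard provider active? */
bool ram_block_discard_is_required(void)
{
    return discard_required_count > 0;
}

Because the patch only queries, it depends on the property spelled out in its added comment: whether discards are required cannot change while a migration is running.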

Comments

Dr. David Alan Gilbert May 15, 2020, 12:45 p.m. UTC | #1
* David Hildenbrand (david@redhat.com) wrote:
> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
> to mark RAM block discards to be broken - however, to keep it simple
> use ram_block_discard_is_required() instead of inhibiting.

Should this be dependent on whether rdma->pin_all is set?
Even with !pin_all some will be pinned at any given time
(when it's registered with the rdma stack).

Dave

> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  migration/rdma.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index f61587891b..029adbb950 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -29,6 +29,7 @@
>  #include "qemu/sockets.h"
>  #include "qemu/bitmap.h"
>  #include "qemu/coroutine.h"
> +#include "exec/memory.h"
>  #include <sys/socket.h>
>  #include <netdb.h>
>  #include <arpa/inet.h>
> @@ -4017,8 +4018,14 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
>      Error *local_err = NULL;
>  
>      trace_rdma_start_incoming_migration();
> -    rdma = qemu_rdma_data_init(host_port, &local_err);
>  
> +    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
> +    if (ram_block_discard_is_required()) {
> +        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
> +        return;
> +    }
> +
> +    rdma = qemu_rdma_data_init(host_port, &local_err);
>      if (rdma == NULL) {
>          goto err;
>      }
> @@ -4064,10 +4071,17 @@ void rdma_start_outgoing_migration(void *opaque,
>                              const char *host_port, Error **errp)
>  {
>      MigrationState *s = opaque;
> -    RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
>      RDMAContext *rdma_return_path = NULL;
> +    RDMAContext *rdma;
>      int ret = 0;
>  
> +    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
> +    if (ram_block_discard_is_required()) {
> +        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
> +        return;
> +    }
> +
> +    rdma = qemu_rdma_data_init(host_port, errp);
>      if (rdma == NULL) {
>          goto err;
>      }
> -- 
> 2.25.3
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
David Hildenbrand May 15, 2020, 2:09 p.m. UTC | #2
On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
>> to mark RAM block discards to be broken - however, to keep it simple
>> use ram_block_discard_is_required() instead of inhibiting.
> 
> Should this be dependent on whether rdma->pin_all is set?
> Even with !pin_all some will be pinned at any given time
> (when it's registered with the rdma stack).

Do you know how much memory this is? Is such memory only temporarily pinned?

At least with special cases of vfio, it's acceptable if some memory is
temporarily pinned - we assume it's only the working set of the driver,
which guests will not inflate as long as they don't want to shoot
themselves in the foot.

This here sounds like the guest does not know the pinned memory is
special, right?
Dr. David Alan Gilbert May 15, 2020, 5:51 p.m. UTC | #3
* David Hildenbrand (david@redhat.com) wrote:
> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> >> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
> >> to mark RAM block discards to be broken - however, to keep it simple
> >> use ram_block_discard_is_required() instead of inhibiting.
> > 
> > Should this be dependent on whether rdma->pin_all is set?
> > Even with !pin_all some will be pinned at any given time
> > (when it's registered with the rdma stack).
> 
> Do you know how much memory this is? Is such memory only temporarily pinned?

With pin_all not set, only a subset of memory, I think multiple 1MB
chunks, are pinned at any one time.

> At least with special cases of vfio, it's acceptable if some memory is
> temporarily pinned - we assume it's only the working set of the driver,
> which guests will not inflate as long as they don't want to shoot
> themselves in the foot.
> 
> This here sounds like the guest does not know the pinned memory is
> special, right?

Right - for RDMA it's all of memory that's being transferred, and the
guest doesn't see when each part is transferred.

Dave

> -- 
> Thanks,
> 
> David / dhildenb
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
David Hildenbrand May 15, 2020, 5:59 p.m. UTC | #4
On 15.05.20 19:51, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (david@redhat.com) wrote:
>>>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
>>>> to mark RAM block discards to be broken - however, to keep it simple
>>>> use ram_block_discard_is_required() instead of inhibiting.
>>>
>>> Should this be dependent on whether rdma->pin_all is set?
>>> Even with !pin_all some will be pinned at any given time
>>> (when it's registered with the rdma stack).
>>
>> Do you know how much memory this is? Is such memory only temporarily pinned?
> 
> With pin_all not set, only a subset of memory, I think multiple 1MB
> chunks, are pinned at any one time.
> 
>> At least with special cases of vfio, it's acceptable if some memory is
>> temporarily pinned - we assume it's only the working set of the driver,
>> which guests will not inflate as long as they don't want to shoot
>> themselves in the foot.
>>
>> This here sounds like the guest does not know the pinned memory is
>> special, right?
> 
> Right - for RDMA it's all of memory that's being transferred, and the
> guest doesn't see when each part is transferred.


Okay, so all memory will eventually be pinned, just not at the same
time, correct?

I think this implies that any memory that was previously discarded will
be backed by new pages, meaning we will consume more memory than intended.

If so, always disabling discarding of RAM seems to be the right thing to do.
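
To make the "backed by new pages" point concrete, here is a small standalone demonstration; it assumes ordinary Linux semantics for anonymous private mappings and is not QEMU code. After madvise(MADV_DONTNEED) the range is empty, but any later access, including the page walk an RDMA registration performs when pinning, faults in fresh zero pages, so the host consumes the memory again:

/*
 * Standalone demonstration, not QEMU code. Assumes ordinary Linux
 * semantics for anonymous private mappings.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    const size_t len = 16 * 1024 * 1024;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    memset(buf, 0xaa, len);            /* populate: host RSS grows by ~16 MiB */
    madvise(buf, len, MADV_DONTNEED);  /* "discard": the pages are given back */

    /*
     * Reading the range, as pinning it for an RDMA registration also
     * would, faults in fresh zero pages: the discard is silently undone.
     */
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i += 4096) {
        sum += buf[i];
    }
    printf("sum after discard: %lu (all zero pages)\n", sum);

    munmap(buf, len);
    return 0;
}

Watching VmRSS in /proc/self/status around the madvise() and the read loop shows the memory being given back and then consumed again.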
Dr. David Alan Gilbert May 15, 2020, 6:36 p.m. UTC | #5
* David Hildenbrand (david@redhat.com) wrote:
> On 15.05.20 19:51, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> >> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
> >>> * David Hildenbrand (david@redhat.com) wrote:
> >>>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
> >>>> to mark RAM block discards to be broken - however, to keep it simple
> >>>> use ram_block_discard_is_required() instead of inhibiting.
> >>>
> >>> Should this be dependent on whether rdma->pin_all is set?
> >>> Even with !pin_all some will be pinned at any given time
> >>> (when it's registered with the rdma stack).
> >>
> >> Do you know how much memory this is? Is such memory only temporarily pinned?
> > 
> > With pin_all not set, only a subset of memory, I think multiple 1MB
> > chunks, are pinned at any one time.
> > 
> >> At least with special cases of vfio, it's acceptable if some memory is
> >> temporarily pinned - we assume it's only the working set of the driver,
> >> which guests will not inflate as long as they don't want to shoot
> >> themselves in the foot.
> >>
> >> This here sounds like the guest does not know the pinned memory is
> >> special, right?
> > 
> > Right - for RDMA it's all of memory that's being transferred, and the
> > guest doesn't see when each part is transferred.
> 
> 
> Okay, so all memory will eventually be pinned, just not at the same
> time, correct?
> 
> I think this implies that any memory that was previously discarded will
> be backed by new pages, meaning we will consume more memory than intended.
> 
> If so, always disabling discarding of RAM seems to be the right thing to do.

Yeh that's probably true, although there's a check for 'buffer_is_zero'
in the !rdma->pin_all case, if the entire area is zero (or probably if
unmapped) then it sends a notification rather than registering; see
qemu_rdma_write_one and search for 'This chunk has not yet been
registered, so first check to see'

Dave

> 
> -- 
> Thanks,
> 
> David / dhildenb
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
David Hildenbrand May 18, 2020, 1:52 p.m. UTC | #6
On 15.05.20 20:36, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> On 15.05.20 19:51, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (david@redhat.com) wrote:
>>>> On 15.05.20 14:45, Dr. David Alan Gilbert wrote:
>>>>> * David Hildenbrand (david@redhat.com) wrote:
>>>>>> RDMA will pin all guest memory (as documented in docs/rdma.txt). We want
>>>>>> to mark RAM block discards to be broken - however, to keep it simple
>>>>>> use ram_block_discard_is_required() instead of inhibiting.
>>>>>
>>>>> Should this be dependent on whether rdma->pin_all is set?
>>>>> Even with !pin_all some will be pinned at any given time
>>>>> (when it's registered with the rdma stack).
>>>>
>>>> Do you know how much memory this is? Is such memory only temporarily pinned?
>>>
>>> With pin_all not set, only a subset of memory, I think multiple 1MB
>>> chunks, are pinned at any one time.
>>>
>>>> At least with special cases of vfio, it's acceptable if some memory is
>>>> temporarily pinned - we assume it's only the working set of the driver,
>>>> which guests will not inflate as long as they don't want to shoot
>>>> themselves in the foot.
>>>>
>>>> This here sounds like the guest does not know the pinned memory is
>>>> special, right?
>>>
>>> Right - for RDMA it's all of memory that's being transferred, and the
>>> guest doesn't see when each part is transferred.
>>
>>
>> Okay, so all memory will eventually be pinned, just not at the same
>> time, correct?
>>
>> I think this implies that any memory that was previously discarded will
>> be backed by new pages, meaning we will consume more memory than intended.
>>
>> If so, always disabling discarding of RAM seems to be the right thing to do.
> 
> Yeh that's probably true, although there's a check for 'buffer_is_zero'
> in the !rdma->pin_all case, if the entire area is zero (or probably if
> unmapped) then it sends a notification rather than registering; see
> qemu_rdma_write_one and search for 'This chunk has not yet been
> registered, so first check to see'

Right, if the whole chunk is zero, it will send a "compressed" zero
chunk to the target. That will result in a memset() in case the
destination is not already zero. So, both the source and the destination
will at least be read.

But this only works if a complete chunk (1MB) is zero IIUC. If only one
page within a chunk is not zero (e.g., not inflated), the whole chunk
will be pinned. Also, "disabled chunk registration" seems to be another
case.

https://wiki.qemu.org/Features/RDMALiveMigration

"Finally, zero pages are only checked if a page has not yet been
registered using chunk registration (or not checked at all and
unconditionally written if chunk registration is disabled. This is
accomplished using the "Compress" command listed above. If the page
*has* been registered then we check the entire chunk for zero. Only if
the entire chunk is zero, then we send a compress command to zap the
page on the other side."
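
Condensing the chunk handling described in this thread into a sketch: buffer_is_zero() is real QEMU code (util/bufferiszero.c), while the surrounding decision logic below is illustrative only and not the actual qemu_rdma_write_one():

/*
 * Illustrative pseudologic only; buffer_is_zero() is real QEMU code
 * (util/bufferiszero.c), the rest is not the actual qemu_rdma_write_one().
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool buffer_is_zero(const void *buf, size_t len);

enum chunk_action {
    SEND_COMPRESS_ZERO,   /* tell the destination to zero-fill, pin nothing */
    REGISTER_AND_PIN,     /* register the chunk: the whole ~1 MiB gets pinned */
    WRITE_REGISTERED,     /* already registered (and pinned): just RDMA-write */
};

static enum chunk_action classify_chunk(const uint8_t *chunk, size_t chunk_len,
                                        bool already_registered)
{
    if (already_registered) {
        /* Registered chunks are never zero-checked again. */
        return WRITE_REGISTERED;
    }
    if (buffer_is_zero(chunk, chunk_len)) {
        /* Only a fully zero chunk escapes registration and pinning. */
        return SEND_COMPRESS_ZERO;
    }
    /* One non-zero page anywhere in the chunk pins the entire chunk. */
    return REGISTER_AND_PIN;
}

This matches the wiki text above: a registered chunk is never zero-checked again, and an unregistered chunk avoids pinning only when every page in it is zero.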

Patch

diff --git a/migration/rdma.c b/migration/rdma.c
index f61587891b..029adbb950 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -29,6 +29,7 @@ 
 #include "qemu/sockets.h"
 #include "qemu/bitmap.h"
 #include "qemu/coroutine.h"
+#include "exec/memory.h"
 #include <sys/socket.h>
 #include <netdb.h>
 #include <arpa/inet.h>
@@ -4017,8 +4018,14 @@  void rdma_start_incoming_migration(const char *host_port, Error **errp)
     Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
-    rdma = qemu_rdma_data_init(host_port, &local_err);
 
+    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
+    if (ram_block_discard_is_required()) {
+        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
+        return;
+    }
+
+    rdma = qemu_rdma_data_init(host_port, &local_err);
     if (rdma == NULL) {
         goto err;
     }
@@ -4064,10 +4071,17 @@  void rdma_start_outgoing_migration(void *opaque,
                             const char *host_port, Error **errp)
 {
     MigrationState *s = opaque;
-    RDMAContext *rdma = qemu_rdma_data_init(host_port, errp);
     RDMAContext *rdma_return_path = NULL;
+    RDMAContext *rdma;
     int ret = 0;
 
+    /* Avoid ram_block_discard_set_broken(), cannot change during migration. */
+    if (ram_block_discard_is_required()) {
+        error_setg(errp, "RDMA: cannot set discarding of RAM broken");
+        return;
+    }
+
+    rdma = qemu_rdma_data_init(host_port, errp);
     if (rdma == NULL) {
         goto err;
     }