diff mbox series

[REPOST,v3,78/80] hostmem: fix strict bind policy

Message ID 1579779525-20065-79-git-send-email-imammedo@redhat.com (mailing list archive)
State New, archived
Headers show
Series refactor main RAM allocation to use hostmem backend | expand

Commit Message

Igor Mammedov Jan. 23, 2020, 11:38 a.m. UTC
When option -mem-prealloc is used with one or more memory-backend
objects, created backends may not obey configured bind policy or
creation may fail after kernel attempts to move pages according
to bind policy.
Reason is in file_ram_alloc(), which will pre-allocate
any descriptor based RAM if global mem_prealloc != 0 and that
happens way before bind policy is applied to memory range.

One way to fix it would be to extend memory_region_foo() API
and add more invariants that could broken later due implicit
dependencies that's hard to track.

Another approach is to drop adhoc main RAM allocation and
consolidate it around memory-backend. That allows to have
single place that allocates guest RAM (main and memdev)
in the same way and then global mem_prealloc could be
replaced by backend's property[s] that will affect created
memory-backend objects but only in correct order this time.

With main RAM now converted to hostmem backends, there is no
point in keeping global mem_prealloc around, so alias
 -mem-prealloc to "memory-backend.prealloc=on"
machine compat[*] property and make mem_prealloc a local
variable to only stir registration of compat property.

*) currently user accessible -global works only with DEVICE
   based objects and extra work is needed to make it work
   with hostmem backends. But that is convenience option
   and out of scope of this already huge refactoring.
   Hence machine compat properties were used.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
  - use object_register_sugar_prop() instead of directly hacking
        compat_props (Paolo Bonzini <pbonzini@redhat.com>)

CC: pbonzini@redhat.com
CC: ehabkost@redhat.com
CC: rth@twiddle.net
---
 include/sysemu/hostmem.h |  2 +-
 include/sysemu/sysemu.h  |  1 -
 backends/hostmem-file.c  |  1 -
 backends/hostmem-memfd.c |  1 -
 backends/hostmem.c       | 12 +-----------
 exec.c                   | 11 -----------
 vl.c                     |  3 ++-
 7 files changed, 4 insertions(+), 27 deletions(-)

Comments

Halil Pasic Jan. 24, 2020, 7:17 p.m. UTC | #1
On Thu, 23 Jan 2020 12:38:43 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> With main RAM now converted to hostmem backends, there is no
> point in keeping global mem_prealloc around, so alias
>  -mem-prealloc to "memory-backend.prealloc=on"
> machine compat[*] property and make mem_prealloc a local
> variable to only stir registration of compat property.
> 
> *) currently user accessible -global works only with DEVICE
>    based objects and extra work is needed to make it work
>    with hostmem backends. But that is convenience option
>    and out of scope of this already huge refactoring.
>    Hence machine compat properties were used.

AFAIU because of this something like
-global memory-backend-file.share=on
(as proposed by
https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg00531.html)
can not be used to make the main RAM shared (e.g. for vhost on s390x).
Or am I wrong? If not, is -global still the way we want to make this work
for non-numa machines, or did I miss updates?

Regards,
Halil
Igor Mammedov Jan. 27, 2020, 7:39 a.m. UTC | #2
On Fri, 24 Jan 2020 20:17:48 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Thu, 23 Jan 2020 12:38:43 +0100
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > With main RAM now converted to hostmem backends, there is no
> > point in keeping global mem_prealloc around, so alias
> >  -mem-prealloc to "memory-backend.prealloc=on"
> > machine compat[*] property and make mem_prealloc a local
> > variable to only stir registration of compat property.
> > 
> > *) currently user accessible -global works only with DEVICE
> >    based objects and extra work is needed to make it work
> >    with hostmem backends. But that is convenience option
> >    and out of scope of this already huge refactoring.
> >    Hence machine compat properties were used.  
> 
> AFAIU because of this something like
> -global memory-backend-file.share=on
> (as proposed by
> https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg00531.html)
> can not be used to make the main RAM shared (e.g. for vhost on s390x).
> Or am I wrong? If not, is -global still the way we want to make this work
> for non-numa machines, or did I miss updates?

one should be able to use memory-backend property to make it work
instead of -m convenience option in s390 case.

As for -global for objects (or more limited variant for memory-backends),
it needs more work to support objects. (that's mostly policy decision)

> Regards,
> Halil
>
Halil Pasic Jan. 27, 2020, 2:41 p.m. UTC | #3
On Mon, 27 Jan 2020 08:39:25 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Fri, 24 Jan 2020 20:17:48 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Thu, 23 Jan 2020 12:38:43 +0100
> > Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> > > With main RAM now converted to hostmem backends, there is no
> > > point in keeping global mem_prealloc around, so alias
> > >  -mem-prealloc to "memory-backend.prealloc=on"
> > > machine compat[*] property and make mem_prealloc a local
> > > variable to only stir registration of compat property.
> > > 
> > > *) currently user accessible -global works only with DEVICE
> > >    based objects and extra work is needed to make it work
> > >    with hostmem backends. But that is convenience option
> > >    and out of scope of this already huge refactoring.
> > >    Hence machine compat properties were used.  
> > 
> > AFAIU because of this something like
> > -global memory-backend-file.share=on
> > (as proposed by
> > https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg00531.html)
> > can not be used to make the main RAM shared (e.g. for vhost on s390x).
> > Or am I wrong? If not, is -global still the way we want to make this work
> > for non-numa machines, or did I miss updates?
> 
> one should be able to use memory-backend property to make it work
> instead of -m convenience option in s390 case.

Thank you very much for the quick response!

Honestly, I overlooked the memory-backed machine property, but regardless
of that -machine,memory-backend=id *does not seem viable* at the
moment.

My understanding is that one has to do something like:        
-machine type=s390-ccw-virtio,memory-backend=mem \
-object memory-backend-file,id=mem,size=2G,mem-path=/dev/shm/virtiofs.shm,share=on \

I get 
qemu: : Device 'mem' not found
because the 'memory-backend-*' objects are delayed,
i.e. !object_create_initial(), and at the time when
machine_set_property() tries to look the memory-backend up the
memory-backend is not yet created.

For why delayed, object_create_initial()  has a comment:

    /* Memory allocation by backends needs to be done
     * after configure_accelerator() (due to the tcg_enabled()
     * checks at memory_region_init_*()).
     *
     * Also, allocation of large amounts of memory may delay
     * chardev initialization for too long, and trigger timeouts
     * on software that waits for a monitor socket to be created
     * (e.g. libvirt).
     */
    if (g_str_has_prefix(type, "memory-backend-")) {

Or, am I using it wrong?

> 
> As for -global for objects (or more limited variant for memory-backends),
> it needs more work to support objects. (that's mostly policy decision)
> 

I agree.

Regards,
Halil
Igor Mammedov Jan. 28, 2020, 12:07 p.m. UTC | #4
On Mon, 27 Jan 2020 15:41:45 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Mon, 27 Jan 2020 08:39:25 +0100
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Fri, 24 Jan 2020 20:17:48 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >   
> > > On Thu, 23 Jan 2020 12:38:43 +0100
> > > Igor Mammedov <imammedo@redhat.com> wrote:
> > >   
> > > > With main RAM now converted to hostmem backends, there is no
> > > > point in keeping global mem_prealloc around, so alias
> > > >  -mem-prealloc to "memory-backend.prealloc=on"
> > > > machine compat[*] property and make mem_prealloc a local
> > > > variable to only stir registration of compat property.
> > > > 
> > > > *) currently user accessible -global works only with DEVICE
> > > >    based objects and extra work is needed to make it work
> > > >    with hostmem backends. But that is convenience option
> > > >    and out of scope of this already huge refactoring.
> > > >    Hence machine compat properties were used.    
> > > 
> > > AFAIU because of this something like
> > > -global memory-backend-file.share=on
> > > (as proposed by
> > > https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg00531.html)
> > > can not be used to make the main RAM shared (e.g. for vhost on s390x).
> > > Or am I wrong? If not, is -global still the way we want to make this work
> > > for non-numa machines, or did I miss updates?  
> > 
> > one should be able to use memory-backend property to make it work
> > instead of -m convenience option in s390 case.  
> 
> Thank you very much for the quick response!
> 
> Honestly, I overlooked the memory-backed machine property, but regardless
> of that -machine,memory-backend=id *does not seem viable* at the
> moment.
> 
> My understanding is that one has to do something like:        
> -machine type=s390-ccw-virtio,memory-backend=mem \
> -object memory-backend-file,id=mem,size=2G,mem-path=/dev/shm/virtiofs.shm,share=on \
> 
> I get 
> qemu: : Device 'mem' not found
> because the 'memory-backend-*' objects are delayed,
> i.e. !object_create_initial(), and at the time when
> machine_set_property() tries to look the memory-backend up the
> memory-backend is not yet created.
> 
> For why delayed, object_create_initial()  has a comment:
> 
>     /* Memory allocation by backends needs to be done
>      * after configure_accelerator() (due to the tcg_enabled()
>      * checks at memory_region_init_*()).
>      *
>      * Also, allocation of large amounts of memory may delay
>      * chardev initialization for too long, and trigger timeouts
>      * on software that waits for a monitor socket to be created
>      * (e.g. libvirt).
>      */
>     if (g_str_has_prefix(type, "memory-backend-")) {
> 
> Or, am I using it wrong?

You right, 
I even had an alternative impl. earlier that used string property
instead of link, but later I forgot about this complication
and simplified it to link property which works fine but only
for -m case.

It's necessary to rewrite patches 2-4/80 to use string property
for memory-backend which will be used for delayed backend access
at machine_run_board_init() time.

I'll CC you on relevant patches for reviewing when I post v4.

> > 
> > As for -global for objects (or more limited variant for memory-backends),
> > it needs more work to support objects. (that's mostly policy decision)
> >   
> 
> I agree.
> 
> Regards,
> Halil
>
Halil Pasic Jan. 28, 2020, 12:37 p.m. UTC | #5
On Tue, 28 Jan 2020 13:07:40 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Mon, 27 Jan 2020 15:41:45 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Mon, 27 Jan 2020 08:39:25 +0100
> > Igor Mammedov <imammedo@redhat.com> wrote:
[..]
> > > 
> > > one should be able to use memory-backend property to make it work
> > > instead of -m convenience option in s390 case.  
> > 
> > Thank you very much for the quick response!
> > 
> > Honestly, I overlooked the memory-backed machine property, but regardless
> > of that -machine,memory-backend=id *does not seem viable* at the
> > moment.
> > 
[..]
> > 
> > Or, am I using it wrong?
> 
> You right, 
> I even had an alternative impl. earlier that used string property
> instead of link, but later I forgot about this complication
> and simplified it to link property which works fine but only
> for -m case.
> 
> It's necessary to rewrite patches 2-4/80 to use string property
> for memory-backend which will be used for delayed backend access
> at machine_run_board_init() time.
> 
> I'll CC you on relevant patches for reviewing when I post v4.
> 

Thanks! I will give it a spin. 

Halil
diff mbox series

Patch

diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index bdf8666..8276e53 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -70,7 +70,7 @@  struct HostMemoryBackend {
     /* protected */
     uint64_t size;
     bool merge, dump, use_canonical_path;
-    bool prealloc, force_prealloc, is_mapped, share;
+    bool prealloc, is_mapped, share;
     uint32_t prealloc_threads;
     DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
     HostMemPolicy policy;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 8de9065..b72c773 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -53,7 +53,6 @@  extern uint8_t *boot_splash_filedata;
 extern bool enable_mlock;
 extern bool enable_cpu_pm;
 extern QEMUClockType rtc_clock;
-extern int mem_prealloc;
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index cb319a9..c8c355f 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -51,7 +51,6 @@  file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         return;
     }
 
-    backend->force_prealloc = mem_prealloc;
     name = host_memory_backend_get_name(backend);
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                      name,
diff --git a/backends/hostmem-memfd.c b/backends/hostmem-memfd.c
index 26070b4..74ba987 100644
--- a/backends/hostmem-memfd.c
+++ b/backends/hostmem-memfd.c
@@ -45,7 +45,6 @@  memfd_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         return;
     }
 
-    backend->force_prealloc = mem_prealloc;
     fd = qemu_memfd_create(TYPE_MEMORY_BACKEND_MEMFD, backend->size,
                            m->hugetlb, m->hugetlbsize, m->seal ?
                            F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL : 0,
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 0988986..a70867b 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -215,7 +215,7 @@  static bool host_memory_backend_get_prealloc(Object *obj, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
-    return backend->prealloc || backend->force_prealloc;
+    return backend->prealloc;
 }
 
 static void host_memory_backend_set_prealloc(Object *obj, bool value,
@@ -224,14 +224,6 @@  static void host_memory_backend_set_prealloc(Object *obj, bool value,
     Error *local_err = NULL;
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
-    if (backend->force_prealloc) {
-        if (value) {
-            error_setg(errp,
-                       "remove -mem-prealloc to use the prealloc property");
-            return;
-        }
-    }
-
     if (!host_memory_backend_mr_inited(backend)) {
         backend->prealloc = value;
         return;
@@ -288,8 +280,6 @@  static void host_memory_backend_init(Object *obj)
     /* TODO: convert access to globals to compat properties */
     backend->merge = machine_mem_merge(machine);
     backend->dump = machine_dump_guest_core(machine);
-    backend->prealloc = mem_prealloc;
-    backend->prealloc_threads = 1;
 }
 
 static void host_memory_backend_post_init(Object *obj)
diff --git a/exec.c b/exec.c
index dc844fd..172327d 100644
--- a/exec.c
+++ b/exec.c
@@ -1802,8 +1802,6 @@  static void *file_ram_alloc(RAMBlock *block,
                             bool truncate,
                             Error **errp)
 {
-    Error *err = NULL;
-    MachineState *ms = MACHINE(qdev_get_machine());
     void *area;
 
     block->page_size = qemu_fd_getpagesize(fd);
@@ -1859,15 +1857,6 @@  static void *file_ram_alloc(RAMBlock *block,
         return NULL;
     }
 
-    if (mem_prealloc) {
-        os_mem_prealloc(fd, area, memory, ms->smp.cpus, &err);
-        if (err) {
-            error_propagate(errp, err);
-            qemu_ram_munmap(fd, area, memory);
-            return NULL;
-        }
-    }
-
     block->fd = fd;
     return area;
 }
diff --git a/vl.c b/vl.c
index 21e3262..f6b8abb 100644
--- a/vl.c
+++ b/vl.c
@@ -140,7 +140,6 @@  enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 int display_opengl;
 const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
-int mem_prealloc = 0; /* force preallocation of physical target memory */
 bool enable_mlock = false;
 bool enable_cpu_pm = false;
 int nb_nics;
@@ -2876,6 +2875,7 @@  int main(int argc, char **argv, char **envp)
     const char *mem_path = NULL;
     BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue);
     QemuPluginList plugin_list = QTAILQ_HEAD_INITIALIZER(plugin_list);
+    int mem_prealloc = 0; /* force preallocation of physical target memory */
 
     os_set_line_buffering();
 
@@ -3979,6 +3979,7 @@  int main(int argc, char **argv, char **envp)
         val = g_strdup_printf("%d", current_machine->smp.cpus);
         object_register_sugar_prop("memory-backend", "prealloc-threads", val);
         g_free(val);
+        object_register_sugar_prop("memory-backend", "prealloc", "on");
     }
 
     /*