[v14,0/2] support MAP_SYNC for memory-backend-file

Message ID 20190422004849.26463-1-richardw.yang@linux.intel.com (mailing list archive)

Message

Wei Yang April 22, 2019, 12:48 a.m. UTC
Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
files on ext4/xfs file system mounted with '-o dax').

A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
    https://patchwork.kernel.org/patch/10028151/

To keep file metadata in sync after a fault while writing to a shared,
DAX-capable backend file, this patch set enables QEMU to use the
MAP_SYNC flag for memory-backend-file.

As the DAX vs. DMA truncate issue has been solved, we refined the code and
sent out this feature for v5.

QEMU passes MAP_SYNC to mmap(2) only if MAP_SYNC is supported and both
'share=on' and 'pmem=on' are set. Otherwise, QEMU does not pass this
flag to mmap(2).
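
The flag selection described above can be sketched in C. This is only a
minimal illustration, not the code from the patch: select_mmap_flags()
and its map_sync_supported parameter are hypothetical names, and the
fallback #defines assume the generic Linux UAPI values (as on x86 and
arm64) for hosts whose libc headers do not provide them.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Assumption: generic Linux UAPI values, used only if libc lacks them. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: choose the mmap(2) flags for a file backend.
 * MAP_SYNC is requested only when the kernel supports it and both
 * share=on and pmem=on were given; it is paired with
 * MAP_SHARED_VALIDATE so the kernel rejects the flag instead of
 * ignoring it.
 */
int select_mmap_flags(bool share, bool pmem, bool map_sync_supported)
{
    int flags = share ? MAP_SHARED : MAP_PRIVATE;

    if (share && pmem && map_sync_supported) {
        flags = MAP_SHARED_VALIDATE | MAP_SYNC;
    }

    /* MAP_SYNC must never be set unless both options were given. */
    assert(!(flags & MAP_SYNC) || (share && pmem));
    return flags;
}
```

With share=on but pmem=off the helper returns plain MAP_SHARED, and
without share=on it returns MAP_PRIVATE regardless of pmem.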

Tested with the following cases:
1. pmem=on is set, share=on is set, MAP_SYNC supported:
   a: backend is a DAX-capable file.
   1) Start VM1 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_1},size=${DAX_FILE_SIZE_1},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M

   2) Start VM2 with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${DAX_FILE_2},size=${DAX_FILE_SIZE_2},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M

   3) Live migrate from VM1 to VM2.

   4) Crash the host abruptly or cut its power.

   5) Check DAX_FILE_1 and DAX_FILE_2: no corruption.

   b: backend is a regular file.
   1) Start with options:
   -object memory-backend-file,id=nv_be4,share,mem-path=${REG_FILE},size=${REG_FILE_SIZE},align=128M,pmem=on,share=on
   -device nvdimm,id=nv4,memdev=nv_be4,label-size=2M

   QEMU warns "failed to validate with mapping flags: Operation not supported".
   FILE_1 and FILE_2 may be randomly corrupted.

2. Other cases:
   FILE_1 and FILE_2 may be randomly corrupted.
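
The behaviour seen in case 1b (a warning, then continuing without
MAP_SYNC) can be sketched as below. This is a rough illustration under
stated assumptions, not QEMU's qemu_ram_mmap(): map_backend_file() is a
hypothetical helper, the warning text is borrowed from case b, and the
fallback #defines assume the generic Linux UAPI values.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: generic Linux UAPI values, used only if libc lacks them. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/*
 * Hypothetical helper: map `size` bytes of the file at `path` as shared
 * memory, preferring MAP_SYNC. If the first attempt fails (for example
 * on a non-DAX file or an older kernel), print a warning and retry with
 * plain MAP_SHARED. *used_sync reports which mapping was obtained.
 * Returns MAP_FAILED if the file cannot be opened or both attempts fail.
 */
void *map_backend_file(const char *path, size_t size, bool *used_sync)
{
    assert(path != NULL && used_sync != NULL);
    *used_sync = false;

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return MAP_FAILED;
    }

    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (ptr != MAP_FAILED) {
        *used_sync = true;
    } else {
        fprintf(stderr,
                "warning: failed to validate with mapping flags: %s\n",
                strerror(errno));
        /* Retry without MAP_SYNC; write persistence is not guaranteed. */
        ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    close(fd); /* the mapping stays valid after the descriptor is closed */
    return ptr;
}
```

On a DAX-capable file with a supporting kernel, used_sync would be
expected to come back true; on a regular file the helper should take
the warning path and still return a usable mapping.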

Changes in V14:
 * 1/2 rebase on top of current upstream and tested

Changes in V13:
 * 4/5 Michael: move the include to mmap-alloc.c.
 * 4/5 Michael: refine the warning message.
 * 5/5 Michael: refine the documentation.

Changes in V12:
 * 2/5: Michael: Update update-linux-headers.sh
 * 3/5: Michael: Use the script to add linux/mman.h
 * 4/5: Pankaj, Michael: 1) fall back to mmap without
        MAP_SYNC & MAP_SHARED_VALIDATE if sync is not supported or fails
        2) Replace the include with the linux/mman.h added in 3/5
 * 5/5: Michael: Refine the documentation.

Changes in V11:
 * 1/3: Michael: Change to just add a bool is_pmem in qemu_ram_mmap.
 * 2/3: Michael: Fix the compatibility for old kernels.
 * 2/3 & 3/3: Michael & Eduardo: Update the behavior as follows:
   Warn on a non-DAX backend and continue without MAP_SYNC.
   If mmap fails again, remove MAP_VALIDATE for compatibility and
   silently proceed.

Changes in V10:
 * 4/4: refine the document.
 * 3/4: Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
 * 2/4: refine the commit message, Added MAP_SHARED_VALIDATE.
 * 2/4: Fix the wrong include header

Changes in V9:
 * 1/6: Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
 * 2/6: Newly added: Michael: use the sparse feature to define RAM_FLAG.
 Since I don't have much knowledge about the sparse feature, @Michael, could you
 add some documentation/commit message to this patch? Thank you very much.
 * 3/6: from 2/5: Eduardo: updated the commit message.
 * 4/6: from 3/5: Michael: don't ignore MAP_SYNC failures silently.
 * 5/6: from 4/5: Eduardo: updated the commit message.
 * 6/6: from 5/5: Michael: Drop the sync option, document MAP_SYNC.

Changes in v8:
 * Michael: 3/5, remove the duplicated define in osdep.h
 * Michael: 2/5, make the type definition safe.
 * Michael: 2/5, fix the incorrect define of MAP_SHARE in qemu_anon_ram_alloc.
 * 4/6 removed: we removed the on/off/auto definition of sync, since for now
   MAP_SYNC only works with pmem=on.
 * @Michael, I still reuse the RAM_SYNC flag; it is much more straightforward to parse
   all the flags in one parameter.

Changes in v7:
 * Michael: [3,4,6]/6 limit the "sync" flag to an nvdimm backend (pmem=on).

Changes in v6:
 * Pankaj: 3/7 is squashed into 2/7
 * Pankaj: 7/7 update comments to "consistent filesystem metadata".
 * Pankaj, Igor: 1/7 Added Reviewed-by in patch-1/7
 * Stefan, 4/7 move the include header from "/linux/mman.h" to "osdep.h"
 * Stefan, 5/7 Add missing "munmap"
 * Stefan, 2/7 refine the shared/flag.

Changes in v5:
 * Add patch 1 to fix a memory leak issue.
 * Refine the patch 4-6
 * Remove the patch 3 as we already change the parameter from "shared" to
   "flags"

Changes in v4:
 * Add patch 1-3 to switch some functions to a single 'flags'
   parameters. (Michael S. Tsirkin)
 * v3 patch 1-3 become v4 patch 4-6.
 * Patch 4: move definitions of MAP_SYNC and MAP_SHARED_VALIDATE to a
   new header file under include/standard-headers/linux/. (Michael S. Tsirkin)
 * Patch 6: refine the description of the 'sync' option. (Michael S. Tsirkin)

Changes in v3:
 * Patch 1: add MAP_SHARED_VALIDATE in both sync=on and sync=auto
   cases, and add back the retry mechanism. MAP_SYNC will be ignored
   by Linux kernel 4.15 if MAP_SHARED_VALIDATE is missing.
 * Patch 1: define MAP_SYNC and MAP_SHARED_VALIDATE as 0 on non-Linux
   platforms in order to make qemu_ram_mmap() compile on those platforms.
 * Patch 2&3: include more information in error messages of
   memory-backend in hope to help user to identify the error.
   (Dr. David Alan Gilbert)
 * Patch 3: fix typo in the commit message. (Dr. David Alan Gilbert)

Changes in v2:
 * Add 'sync' option to control the use of MAP_SYNC. (Eduardo Habkost)
 * Remove the unnecessary set of MAP_SHARED_VALIDATE in some cases and
   the retry mechanism in qemu_ram_mmap(). (Michael S. Tsirkin)
 * Move OS dependent definitions of MAP_SYNC and MAP_SHARED_VALIDATE
   to osdep.h. (Michael S. Tsirkin)

Zhang Yi (2):
  util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap()
  docs: Added MAP_SYNC documentation

 docs/nvdimm.txt   | 22 +++++++++++++++++++---
 qemu-options.hx   |  5 +++++
 util/mmap-alloc.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 64 insertions(+), 4 deletions(-)

Comments

Michael S. Tsirkin April 22, 2019, 12:34 p.m. UTC | #1
On Mon, Apr 22, 2019 at 08:48:47AM +0800, Wei Yang wrote:
> Linux 4.15 introduces a new mmap flag MAP_SYNC, which can be used to
> guarantee the write persistence to mmap'ed files supporting DAX (e.g.,
> files on ext4/xfs file system mounted with '-o dax').
> 
> A description of MAP_SYNC and MAP_SHARED_VALIDATE can be found at
>     https://patchwork.kernel.org/patch/10028151/
> 
> To keep file metadata in sync after a fault while writing to a shared,
> DAX-capable backend file, this patch set enables QEMU to use the
> MAP_SYNC flag for memory-backend-file.
> 
> As the DAX vs. DMA truncate issue has been solved, we refined the code and
> sent out this feature for v5.
> 
> QEMU passes MAP_SYNC to mmap(2) only if MAP_SYNC is supported and both
> 'share=on' and 'pmem=on' are set. Otherwise, QEMU does not pass this
> flag to mmap(2).

OK this is in a good shape. As we are in freeze anyway,
there's still a bit more time to polish it. I have a couple of
suggestions:

- squash docs in same patch with code, no need for two patches
- mmap errors are not silently ignored as the doc says,
  a warning is produced

Also, it might make sense to send the warnings to an errp object and not stderr.
I would leave that to a follow-up patch.


Eduardo Habkost April 22, 2019, 6:22 p.m. UTC | #2
On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> OK this is in a good shape. As we are in freeze anyway,
> there's still a bit more time to polish it. I have a couple of
> suggestions:
> 
> - squash docs in same patch with code, no need for two patches
> - mmap errors are not silently ignored as the doc says,
>   a warning is produced

Note that a warning is produced only if both share=on and pmem=on
are specified.  If using pmem=on without share=on, no warning is
printed at all.

I agree we could squash the docs in the same patch, but I don't
want to prevent the code from being merged and require v15 to be
sent just because we are still polishing the documentation.

If there are no objections, I plan to apply this version of the
series including the following fixup (just removing the word
"silently"), and I suggest further improvements to be sent as
follow up patches.

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index bcd1456e72..b531cacd35 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
 are not set, if the backend file does not support DAX or if MAP_SYNC
 is not supported by the host kernel, write persistence is not
 guaranteed after a system crash. For compatibility reasons, these
-conditions are silently ignored if not satisfied. Currently, no way
-is provided to test for them.
+conditions are ignored if not satisfied. Currently, no way is
+provided to test for them.
 For more details, please reference mmap(2) man page:
 http://man7.org/linux/man-pages/man2/mmap.2.html.
Wei Yang April 23, 2019, 2:41 a.m. UTC | #3
On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
>On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
>> OK this is in a good shape. As we are in freeze anyway,
>> there's still a bit more time to polish it. I have a couple of
>> suggestions:
>> 
>> - squash docs in same patch with code, no need for two patches
>> - mmap errors are not silently ignored as the doc says,
>>   a warning is produced
>
>Note that a warning is produced only if both share=on and pmem=on
>is specified.  If using pmem=on without share=on, no warning is
>printed at all.
>
>I agree we could squash the docs in the same patch, but I don't
>want to prevent the code from being merged and require v15 to be
>sent just because we are still polishing the documentation.
>
>If there are no objections, I plan to apply this version of the
>series including the following fixup (just removing the word
>"silently"), and I suggest further improvements to be sent as
>follow up patches.
>

If my understanding is correct, the follow-up patch is:

"send the warnings to an errp object and not stderr"
Michael S. Tsirkin April 23, 2019, 12:43 p.m. UTC | #4
On Mon, Apr 22, 2019 at 03:22:55PM -0300, Eduardo Habkost wrote:
> On Mon, Apr 22, 2019 at 08:34:51AM -0400, Michael S. Tsirkin wrote:
> > OK this is in a good shape. As we are in freeze anyway,
> > there's still a bit more time to polish it. I have a couple of
> > suggestions:
> > 
> > - squash docs in same patch with code, no need for two patches
> > - mmap errors are not silently ignored as the doc says,
> >   a warning is produced
> 
> Note that a warning is produced only if both share=on and pmem=on
> is specified.  If using pmem=on without share=on, no warning is
> printed at all.
> 
> I agree we could squash the docs in the same patch, but I don't
> want to prevent the code from being merged and require v15 to be
> sent just because we are still polishing the documentation.
> 
> If there are no objections, I plan to apply this version of the
> series including the following fixup (just removing the word
> "silently"), and I suggest further improvements to be sent as
> follow up patches.
> 
> diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
> index bcd1456e72..b531cacd35 100644
> --- a/docs/nvdimm.txt
> +++ b/docs/nvdimm.txt
> @@ -159,8 +159,8 @@ If these conditions are not satisfied i.e. if either 'pmem' or 'share'
>  are not set, if the backend file does not support DAX or if MAP_SYNC
>  is not supported by the host kernel, write persistence is not
>  guaranteed after a system crash. For compatibility reasons, these
> -conditions are silently ignored if not satisfied. Currently, no way
> -is provided to test for them.
> +conditions are ignored if not satisfied. Currently, no way is
> +provided to test for them.
>  For more details, please reference mmap(2) man page:
>  http://man7.org/linux/man-pages/man2/mmap.2.html.

With the two patches squashed and the above fix:

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>



> -- 
> Eduardo