Message ID | 20230725105239.2022-1-logoerthiner1@163.com (mailing list archive) |
---|---|
State | New, archived |
Series | Open file as read only on private mapping in qemu_ram_alloc_from_file |
At 2023-07-25 19:42:30, "David Hildenbrand" <david@redhat.com> wrote:
>Hi,
>
>patch subject should start with "softmmu/physmem: Open ..."

Sorry, I am a newbie to the patch submission process. I will resubmit a version of the patch if the final acceptable patch after discussion is mostly the same. (For example, if this patch ends up involving another parameter and various hooks, I may find it hard to finish the patch myself, both due to lack of knowledge of the qemu source tree and due to lack of environments to test every case.)

Anyway, thanks for all your suggestions.

>
>On 25.07.23 12:52, Thiner Logoer wrote:
>> An read only file can be mapped with read write as long as the
>> mapping is private, which is very common case. Make
>
>At least in the environments I know, using private file mappings is a corner case ;)
>
>What is you use case? VM templating?

Mostly, if I understand the terminology correctly. I was experimenting with VM snapshotting that uses MAP_PRIVATE when recovering memory, similar to what firecracker describes in this documentation:

https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md

And in my experiments qemu supports recovering from a memory file + a guest state file out of the box. In fact, `-mem-path filename4pc.ram` works out of the box (since the default is map_private+readwrite); the only catch is that the vanilla setup requires the memory file to be writable even though the file never gets written. (Producing the actual memory file & guest state file requires separate hacking.) And the patch provided here has been my solution to this last problem for a while.

By the way, commit 134253a4 ("machine: do not crash if default RAM backend name has been stolen") prevents me from using a memory-backed file directly as pc.ram and makes an `-object memory-backend-file,*` based setup more complex (after that commit I cannot easily have the memory unbacked by any file before snapshotting and backed by a file after recovering from the snapshot). This is the reason why I prefer `-mem-path`, even though the docs say this usage is close to deprecated and `-mem-path` has fewer configurable parameters.

>
>> qemu_ram_alloc_from_file open file as read only when the
>> mapping is private, otherwise open will fail when file
>> does not allow write.
>>
>> If this file does not exist or is a directory, the flag is not used,
>> so it should be OK.
>>
>> from https://gitlab.com/qemu-project/qemu/-/issues/1689
>>
>> Signed-off-by: Thiner Logoer <logoerthiner1@163.com>
>> ---
>>  softmmu/physmem.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 3df73542e1..e8036ee335 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -1945,8 +1945,15 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
>>      int fd;
>>      bool created;
>>      RAMBlock *block;
>> +
>
>^
>
>.git/rebase-apply/patch:13: trailing whitespace.

I remember deleting this whitespace before. Obviously I have mixed up different versions of the patch files, sorry about that ...

>
>> +    /*
>> +     * If map is private, the fd does not need to be writable.
>> +     * This only get effective when the file is existent.
>
>"This will get ignored if the file does not yet exist."
>
>> +     */
>> +    bool open_as_readonly = readonly || !(ram_flags & RAM_SHARED);
>>
>> -    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
>> +    fd = file_ram_open(mem_path, memory_region_name(mr),
>> +                       open_as_readonly, &created,
>>                         errp);
>>      if (fd < 0) {
>>          return NULL;
>
>
>Opening a file R/O will also make operations like fallocate/ftruncate/ ... fail.

I saw fallocate somewhere in softmmu/physmem.c, though I was not sure how it is actually used. Your response fills in this part.

>
>For example, this will make fallocate(FALLOC_FL_PUNCH_HOLE) stop working and in
>turn make ram_block_discard_range() bail out.
>
>
>There was a recent discussion/patch on that:
>
>commit 1d44ff586f8a8e113379430750b5a0a2a3f64cf9
>Author: David Hildenbrand <david@redhat.com>
>Date:   Thu Jul 6 09:56:06 2023 +0200
>
>    softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
>
>    ram_block_discard_range() cannot possibly do the right thing in
>    MAP_PRIVATE file mappings in the general case.
>
>    To achieve the documented semantics, we also have to punch a hole into
>    the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>    of such a file.
>
>    For example, using VM templating -- see commit b17fbbe55cba ("migration:
>    allow private destination ram with x-ignore-shared") -- in combination with
>    any mechanism that relies on discarding of RAM is problematic. This
>    includes:
>    * Postcopy live migration
>    * virtio-balloon inflation/deflation or free-page-reporting
>    * virtio-mem
>
>    So at least warn that there is something possibly dangerous is going on
>    when using ram_block_discard_range() in these cases.
>

I did not expect that multiple qemu features would contradict each other - a private CoW mapping of a file & userfaultfd-based on-demand memory serving ... (do not blame me too much if I get the terminology wrong - I am no professional qemu dev :D)

>
>While it doesn't work "in the general case", it works in the "single file owner" case
>where someone simply forgot to specify "share=on" -- "share=off" is the default for
>memory-backend-file :( .
>
>
>For example, with hugetlb+virtio-mem the following works if the file does not exists:
>
>(note that virtio-mem will fallocate(FALLOC_FL_PUNCH_HOLE) the whole file upfront)
>
>...
>-object memory-backend-file,share=off,mem-path=/dev/hugepages/vmem0,id=mem2,size=2G \
>-device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root
>
>
>With you patch, once the file already exists, we would now get
>
>qemu-system-x86_64: -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: ram_block_discard_range: Failed to fallocate :0 +80000000 (-9)
>qemu-system-x86_64: -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: Unexpected error discarding RAM: Bad file descriptor
>
>
>So this has the potential to break existing setups.
>
>The easy fix for these would be to configure "share=on" in these now-failing setups.

Hmmmmm .... I am afraid the easiest fix could be to force `share=on` when the path starts with "/dev/huge", while firing a warning :D

I am sorry if existing setups get broken because of my patch ...

I have learnt that mem-path commonly refers to hugetlb/hugepages, but I actually have no idea what the outcome is if hugetlb or anything similar is mapped with MAP_PRIVATE and copy-on-write happens - will a whole huge page be copied on write then?

I would suppose that in reality system managers may consider simply removing the file first if the file is going to be truncated anyway. However, it would be a different story if the file should be truncated only PARTIALLY.

Alternatively, maybe another flag "create=on" could be added when private semantics are required, so that if the file exists it must be unlinked or truncated before use?

Since I am nowhere near familiar with this part of the qemu source code, it would be hard for me to get the additional command line flag part right, if that is believed to be the correct solution.

In summary, I am glad to learn more of the background here.

Back to the `-mem-path` part. Now I wonder whether filling the initial RAM contents is the expected behavior of `-mem-path` (or whether I am using a feature that will be deprecated soon); whether there is a convenient way to fill the initial contents in copy-on-write style if `-mem-path` is not good to use; and in general whether a privately mapped memory backing file SHOULD have to be writable.

Regards,

logoerthiner
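The failure mode David describes for read-only fds is easy to reproduce outside of QEMU. Below is a minimal standalone C sketch (not QEMU code; the file path is a placeholder) showing that fallocate(FALLOC_FL_PUNCH_HOLE) on an fd opened O_RDONLY fails with EBADF (errno 9), matching the "(-9)" in the virtio-mem error quoted above.

```c
/* Sketch: hole punching needs a writable fd; an O_RDONLY fd yields EBADF. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Path is a placeholder; pass any existing regular file. */
    const char *path = argc > 1 ? argv[1] : "guest-memory.img";
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* The operation that ram_block_discard_range() relies on internally. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 4096) < 0) {
        printf("fallocate failed: %s (errno %d)\n", strerror(errno), errno);
    }
    close(fd);
    return 0;
}
```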
On 25.07.23 18:01, ThinerLogoer wrote: > > At 2023-07-25 19:42:30, "David Hildenbrand" <david@redhat.com> wrote: >> Hi, >> >> patch subject should start with "softmmu/physmem: Open ..." > > Sorry I am newbie to the patch submission part. I will resubmit a version of patch if the > final acceptable patch after discussion is mostly the same. (For example, if this patch > finally involves adding another parameter and adding various hooks, then I may feel it > hard to finish the patch myself, both due to lack of knowledge of qemu source code tree, > and due to lack of various environment to test every case out) No worries. I'll be happy to guide you. But if you feel more comfortable that I take over, just let me know. > > Anyway thanks to all your suggestions. > >> >> On 25.07.23 12:52, Thiner Logoer wrote: >>> An read only file can be mapped with read write as long as the >>> mapping is private, which is very common case. Make >> >> At least in the environments I know, using private file mappings is a corner case ;) >> >> What is you use case? VM templating? > > Mostly, if I understand the terminology correct. I was experimenting on vm snapshoting > that uses MAP_PRIVATE when recovering memory, similar to what firecracker says in this > documentation. > > https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md > > And in my experiment qemu supports recovering from a memory file + a guest state file out > of the box. > In fact, `-mem-path filename4pc.ram` works out of the box (since the default parameter is > map_private+readwrite), only that vanilla setup requires memory file to be writeable Oh, you're saying "-mem-path" results in a MAP_PRIVATE mapping? That sounds very nasty :/ It probably was introduced only for hugetlb handling, and wouldn't actually share memory with another process. In fact, we added MAP_SHARED handling later commit dbcb8981183592be129b2e624b7bcd4245e75fbc Author: Paolo Bonzini <pbonzini@redhat.com> Date: Tue Jun 10 19:15:24 2014 +0800 hostmem: add property to map memory with MAP_SHARED A new "share" property can be used with the "memory-file" backend to map memory with MAP_SHARED instead of MAP_PRIVATE. Even one doc in docs/devel/multi-process.rst is wrong: "Note guest memory must be backed by file descriptors, such as when QEMU is given the *-mem-path* command line option." ... no, that won't work with a MAP_PRIVATE mapping. > though the file never gets written. (the actual memory file & guest state file require > separated hacking) > > And at least the patch provided here have been the solution to this last problem for me > for a while. > > By the way the commit: "Commit 134253a4, machine: do not crash if default RAM backend name > has been stolen" disallows me to use a memory backed file directly as pc.ram and make > `-object memory-backed-file,*` based setup more complex (I cannot easily make the memory Can't you simply do -object memory-backed-file,id=mem1 \ -machine q35,memory-backend=mem1,share=off \ Or what would be the problem with that? > unbacked by any file before snapshoting and backed by file after recovery from snapshot > after this patch). This is the reason why I prefer `-mem-path` despite the doc tells that > this usage is close to deprecated, and that `-mem-path` has less configurable parameters. > >> >>> qemu_ram_alloc_from_file open file as read only when the >>> mapping is private, otherwise open will fail when file >>> does not allow write. 
>>> >>> If this file does not exist or is a directory, the flag is not used, >>> so it should be OK. >>> >>> from https://gitlab.com/qemu-project/qemu/-/issues/1689 >>> >>> Signed-off-by: Thiner Logoer <logoerthiner1@163.com> >>> --- >>> softmmu/physmem.c | 9 ++++++++- >>> 1 file changed, 8 insertions(+), 1 deletion(-) >>> >>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c >>> index 3df73542e1..e8036ee335 100644 >>> --- a/softmmu/physmem.c >>> +++ b/softmmu/physmem.c >>> @@ -1945,8 +1945,15 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion*mr, >>> int fd; >>> bool created; >>> RAMBlock *block; >>> + >> >> ^ >> >> .git/rebase-apply/patch:13: trailing whitespace. > > I remembered I have deleted this whitespace before. Obviously I have messed up with > different version of patch files, sorry about that ... > No worries :) >> >>> + /* >>> + * If map is private, the fd does not need to be writable. >>> + * This only get effective when the file is existent. >> >> "This will get ignored if the file does not yet exist." >> >>> + */ >>> + bool open_as_readonly = readonly || !(ram_flags & RAM_SHARED); >>> >>> - fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created, >>> + fd = file_ram_open(mem_path, memory_region_name(mr), >>> + open_as_readonly, &created, >>> errp); >>> if (fd < 0) { >>> return NULL; >> >> >> Opening a file R/O will also make operations like fallocate/ftruncate/ ... fail. > > I saw fallocate in softmmu/physmem.c on somewhere, though I was not sure how it is > actually used. Your response fills in this part. > >> >> For example, this will make fallocate(FALLOC_FL_PUNCH_HOLE) stop working and in >> turn make ram_block_discard_range() bail out. >> >> >> There was a recent discussion/patch on that: >> >> commit 1d44ff586f8a8e113379430750b5a0a2a3f64cf9 >> Author: David Hildenbrand <david@redhat.com> >> Date: Thu Jul 6 09:56:06 2023 +0200 >> >> softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping >> >> ram_block_discard_range() cannot possibly do the right thing in >> MAP_PRIVATE file mappings in the general case. >> >> To achieve the documented semantics, we also have to punch a hole into >> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings >> of such a file. >> >> For example, using VM templating -- see commit b17fbbe55cba ("migration: >> allow private destination ram with x-ignore-shared") -- in combination with >> any mechanism that relies on discarding of RAM is problematic. This >> includes: >> * Postcopy live migration >> * virtio-balloon inflation/deflation or free-page-reporting >> * virtio-mem >> >> So at least warn that there is something possibly dangerous is going on >> when using ram_block_discard_range() in these cases. >> > > I did not expect that multiple qemu features will contradict each other - private cow map > of file & user fault fd based on demand memory serving ... (do not blame me too much if I > get the terminology wrong - I am no professional qemu dev :D) Let me rephrase: "I did not wish that multiple qemu features will contradict each other" :) > >> >> While it doesn't work "in the general case", it works in the "single file owner" case >> where someone simply forgot to specify "share=on" -- "share=off" is the default for >> memory-backend-file :( . >> >> >> For example, with hugetlb+virtio-mem the following works if the file does not exists: >> >> (note that virtio-mem will fallocate(FALLOC_FL_PUNCH_HOLE) the whole file upfront) >> >> ... 
>> -object memory-backend-file,share=off,mem-path=/dev/hugepages/vmem0,id=mem2,size=2G \ >> -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root >> >> >> With you patch, once the file already exists, we would now get >> >> qemu-system-x86_64: -device > virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: ram_block_discard_range: > Failed to fallocate :0 +80000000 (-9) >> qemu-system-x86_64: -device > virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: Unexpected error > discarding RAM: Bad file descriptor >> >> >> So this has the potential to break existing setups. >> >> The easy fix for these would be to configure "share=on" in these now-failing setups. Hmmmmm .... > > I am afraid that the easiest prefix could be to configure `share=on` when the path starts > with "/dev/huge" while firing a warning :D > > I am sorry about that if existing systems will be broken because of my patch ... > > I have learnt that mem-path commonly refer to hugetlb/hugepage, but actually I have no > idea what is the outcome if hugetlb or anything similar was mapped with map_private and > copy-on-write happens - will a whole huge page be copied on write then? > > I would suppose that in reality system managers may consider directly remove the file > first if the file will be truncated anyway. However t would be a different story if this > file should be truncated exactly PARTIALLY. > > Alternatively maybe another flag "create=on" can be added when private semantics are > required, so that if the file exists, the file should be unlinked or truncated first > before using? > > Since I am nowhere familiar to this part of qemu source code, it will be hard for me to > write the additional command line flag part correct, if this is believed to be the correct > solution though. > > In summary I am glad to learn more of the backgrounds here. The easiest way not break any existing setup would be to open the file R/O only if opening it R/W failed due to lack of permissions, and we have a private mapping. So, in case of !RAMP_SHARED, simply retry once more without write permissions. Would that keep your use-case working? > > Back to `-mem-path` part. Now I wonder whether filling the initial value for ram is what > `-mem-path` is expected behavior (whether I am using a feature that will be deprecated > soon); whether there is a convenient method to filling the initial value in copy-on-write > styles if `mem-path` is not good to use; and in general whether a privately used memory > backed file SHOULD be writeable. The case that "-mem-path" has always used MAP_PRIVATE and seemed to have worked with postcopy live migration (another user of fallocate(PUNCH_HOLE)) tells me that we should be careful about that.
At 2023-07-26 16:11:44, "David Hildenbrand" <david@redhat.com> wrote:
>
>> though the file never gets written. (the actual memory file & guest state file require
>> separated hacking)
>>
>> And at least the patch provided here have been the solution to this last problem for me
>> for a while.
>>
>> By the way the commit: "Commit 134253a4, machine: do not crash if default RAM backend name
>> has been stolen" disallows me to use a memory backed file directly as pc.ram and make
>> `-object memory-backed-file,*` based setup more complex (I cannot easily make the memory
>
>Can't you simply do
>
>-object memory-backed-file,id=mem1 \
>-machine q35,memory-backend=mem1,share=off \
>
>Or what would be the problem with that?

I could do this, though I have not tested it out. For me the biggest problem is that machines which use a custom memory backend are likely not compatible with those which use the default memory backend, and this is not convenient - they effectively become different types of machines and cannot migrate from one to another ...

My hack is mostly: use `-machine q35` when making a snapshot, `-machine q35 -mem-path filename4pc.ram` when I want to provide a separate memory file on recovery, and plain `-machine q35` when I do not want to provide a memory file. I prefer flexibility when recovering from a snapshot: I can either provide a guest state file containing everything, trading memory initialization cost for convenience and disk usage (the memory file is and can only be uncompressed), or provide a memory file + guest state file, which focuses more on memory performance.

>>
>>>
>>> While it doesn't work "in the general case", it works in the "single file owner" case
>>> where someone simply forgot to specify "share=on" -- "share=off" is the default for
>>> memory-backend-file :( .
>>>
>>>
>>> For example, with hugetlb+virtio-mem the following works if the file does not exists:
>>>
>>> (note that virtio-mem will fallocate(FALLOC_FL_PUNCH_HOLE) the whole file upfront)
>>>
>>> ...
>>> -object memory-backend-file,share=off,mem-path=/dev/hugepages/vmem0,id=mem2,size=2G \
>>> -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root
>>>
>>>
>>> With you patch, once the file already exists, we would now get
>>>
>>> qemu-system-x86_64: -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: ram_block_discard_range: Failed to fallocate :0 +80000000 (-9)
>>> qemu-system-x86_64: -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: Unexpected error discarding RAM: Bad file descriptor
>>>
>>>
>>> So this has the potential to break existing setups.
>>>
>>> The easy fix for these would be to configure "share=on" in these now-failing setups.
>>
>> Hmmmmm ....
>>
>> I am afraid that the easiest prefix could be to configure `share=on` when the path starts
>> with "/dev/huge" while firing a warning :D
>>
>> I am sorry about that if existing systems will be broken because of my patch ...
>>
>> I have learnt that mem-path commonly refer to hugetlb/hugepage, but actually I have no
>> idea what is the outcome if hugetlb or anything similar was mapped with map_private and
>> copy-on-write happens - will a whole huge page be copied on write then?
>>
>> I would suppose that in reality system managers may consider directly remove the file
>> first if the file will be truncated anyway. However t would be a different story if this
>> file should be truncated exactly PARTIALLY.
>>
>> Alternatively maybe another flag "create=on" can be added when private semantics are
>> required, so that if the file exists, the file should be unlinked or truncated first
>> before using?
>>
>> Since I am nowhere familiar to this part of qemu source code, it will be hard for me to
>> write the additional command line flag part correct, if this is believed to be the correct
>> solution though.
>>
>> In summary I am glad to learn more of the backgrounds here.
>
>The easiest way not break any existing setup would be to open the file
>R/O only if opening it R/W failed due to lack of permissions, and we
>have a private mapping. So, in case of !RAMP_SHARED, simply retry once
>more without write permissions.
>
>Would that keep your use-case working?
>

I believe yes. When I want to make sure that the memory file is not changed, I only need to make sure that the file is read-only at the filesystem level, either via a read-only mount or read-only file modes. This is not perfect, though - it means that when I accidentally make the memory file writable, qemu will open it in read-write mode. This is a bit scary and counter-intuitive.

Come to think of it, this solution seems the most likely to be accepted, since it works most of the time no matter how the other parts of the code change. It is not even tied to whether the memory pages are shared or anything similar. I will give this a shot later.

As I have learnt from these messages (though I feel I do not fully understand the postcopy live migration mechanism), the problems behind the choice of shared/private, readonly/readwrite, and whether to punch holes are convoluted and might take months to solve ...

>>
>> Back to `-mem-path` part. Now I wonder whether filling the initial value for ram is what
>> `-mem-path` is expected behavior (whether I am using a feature that will be deprecated
>> soon); whether there is a convenient method to filling the initial value in copy-on-write
>> styles if `mem-path` is not good to use; and in general whether a privately used memory
>> backed file SHOULD be writeable.
>
>The case that "-mem-path" has always used MAP_PRIVATE and seemed to have
>worked with postcopy live migration (another user of
>fallocate(PUNCH_HOLE)) tells me that we should be careful about that.
>

Ha, I did not expect that the current mem-path behavior is not fully by design. :D

--

Regards,

logoerthiner
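For reference, here is a minimal sketch of the fallback David proposes above and that the reporter agrees to try: attempt the open R/W first, and fall back to a read-only fd only when that fails with EACCES and the mapping is private. This is not the actual QEMU implementation; the helper name open_ram_file is invented for the example, and the RAM_SHARED value is only a stand-in for QEMU's flag.

```c
/* Sketch of the proposed fallback -- not the real file_ram_open(). */
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

#define RAM_SHARED (1 << 1)   /* stand-in for QEMU's RAM_SHARED flag */

int open_ram_file(const char *path, unsigned int ram_flags, bool *readonly)
{
    int fd = open(path, O_RDWR);

    *readonly = false;
    if (fd < 0 && errno == EACCES && !(ram_flags & RAM_SHARED)) {
        /*
         * Private (copy-on-write) mapping: the file is never written to,
         * so a read-only fd is sufficient.  Setups that can open R/W keep
         * the writable fd, so fallocate()/ftruncate() keep working there.
         */
        fd = open(path, O_RDONLY);
        *readonly = (fd >= 0);
    }
    return fd; /* negative on failure, errno set by open() */
}
```

The downside the reporter notes above still applies to this sketch: if the template file happens to be writable, the writable fd is used, so protecting the file has to happen at the filesystem level.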
On Wed, 26 Jul 2023 10:11:44 +0200
David Hildenbrand <david@redhat.com> wrote:

[...]

> > Back to `-mem-path` part. Now I wonder whether filling the initial value for ram is what
> > `-mem-path` is expected behavior (whether I am using a feature that will be deprecated
> > soon); whether there is a convenient method to filling the initial value in copy-on-write
> > styles if `mem-path` is not good to use; and in general whether a privately used memory
> > backed file SHOULD be writeable.
>
> The case that "-mem-path" has always used MAP_PRIVATE and seemed to have
> worked with postcopy live migration (another user of
> fallocate(PUNCH_HOLE)) tells me that we should be careful about that.
-mem-path is legacy, kept so that the pre-memory-backend behavior/CLI keeps working. It's better not to touch it, and probably even better to deprecate/drop it altogether. Users can use -machine foo,memory-backend= directly, which is what is used under the hood. The only difference is that one can use arbitrary backends/options with it.
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..e8036ee335 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1945,8 +1945,15 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
     int fd;
     bool created;
     RAMBlock *block;
+
+    /*
+     * If map is private, the fd does not need to be writable.
+     * This only get effective when the file is existent.
+     */
+    bool open_as_readonly = readonly || !(ram_flags & RAM_SHARED);
 
-    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
+    fd = file_ram_open(mem_path, memory_region_name(mr),
+                       open_as_readonly, &created,
                        errp);
     if (fd < 0) {
         return NULL;
A read-only file can be mapped read-write as long as the mapping is
private, which is a very common case. Make qemu_ram_alloc_from_file
open the file as read only when the mapping is private; otherwise the
open will fail when the file does not allow writing.

If the file does not exist or is a directory, the flag is not used,
so it should be OK.

From https://gitlab.com/qemu-project/qemu/-/issues/1689

Signed-off-by: Thiner Logoer <logoerthiner1@163.com>
---
 softmmu/physmem.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
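The first sentence of the commit message can be checked with a few lines of standalone C (a sketch, not part of the patch; the file path is a placeholder): with an fd opened O_RDONLY, a PROT_READ|PROT_WRITE mapping succeeds when MAP_PRIVATE is used, while the same request with MAP_SHARED fails with EACCES.

```c
/* Sketch: read-only fd, writable private mapping vs. shared mapping. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Path is a placeholder; pass any existing file without write permission. */
    const char *path = argc > 1 ? argv[1] : "guest-memory.img";
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    void *priv = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    printf("MAP_PRIVATE: %s\n", priv == MAP_FAILED ? strerror(errno) : "ok");

    void *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    printf("MAP_SHARED:  %s\n", shared == MAP_FAILED ? strerror(errno) : "ok");

    close(fd);
    return 0;
}
```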