Message ID: 20250123131944.391886-1-d-tatianin@yandex-team.ru (mailing list archive)
Series: overcommit: introduce mem-lock-onfault
On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
> Currently, passing mem-lock=on to QEMU causes memory usage to grow by
> huge amounts:
>
> no memlock:
> $ ./qemu-system-x86_64 -overcommit mem-lock=off
> $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> 45652
>
> $ ./qemu-system-x86_64 -overcommit mem-lock=off -enable-kvm
> $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> 39756
>
> memlock:
> $ ./qemu-system-x86_64 -overcommit mem-lock=on
> $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> 1309876
>
> $ ./qemu-system-x86_64 -overcommit mem-lock=on -enable-kvm
> $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> 259956
>
> This is caused by the fact that mlockall(2) automatically
> write-faults every existing and future anonymous mapping in the
> process right away.
>
> One of the reasons to enable mem-lock is to protect a QEMU process's
> pages from being compacted and migrated by kcompactd (which does so
> by messing with a live process's page tables, causing thousands of TLB
> flush IPIs per second), basically stealing all guest time while it's
> active.
>
> mem-lock=on helps against this (given compact_unevictable_allowed is 0),
> but the memory overhead it introduces is an undesirable side effect,
> which we can completely avoid by passing MCL_ONFAULT to mlockall. This
> series makes that possible with a new option for mem-lock called
> on-fault.
>
> memlock-onfault:
> $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault
> $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> 54004
>
> $ ./qemu-system-x86_64 -overcommit mem-lock=on-fault -enable-kvm
> $ ps -p $(pidof ./qemu-system-x86_64) -o rss=
> 47772
>
> You may notice the memory usage is still slightly higher, in this case
> by a few megabytes (about 8 MB) over the mem-lock=off case. I was able to
> trace this down to a bug in the Linux kernel with MCL_ONFAULT not being
> honored for the early process heap (with brk(2) etc.), so it is still
> write-faulted in this case, but it's still way less than it was with just
> mem-lock=on.
>
> Changes since v1:
>  - Don't make a separate mem-lock-onfault option; add an on-fault option to mem-lock instead
>
> Changes since v2:
>  - Move overcommit option parsing out of line
>  - Make enable_mlock an enum instead
>
> Changes since v3:
>  - Rebase to latest master due to the recent sysemu -> system renames
>
> Daniil Tatianin (4):
>   os: add an ability to lock memory on_fault
>   system/vl: extract overcommit option parsing into a helper
>   system: introduce a new MlockState enum
>   overcommit: introduce mem-lock=on-fault
>
>  hw/virtio/virtio-mem.c    |  2 +-
>  include/system/os-posix.h |  2 +-
>  include/system/os-win32.h |  3 ++-
>  include/system/system.h   | 12 ++++++++-
>  migration/postcopy-ram.c  |  4 +--
>  os-posix.c                | 10 ++++++--
>  qemu-options.hx           | 14 +++++++----
>  system/globals.c          | 12 ++++++++-
>  system/vl.c               | 52 +++++++++++++++++++++++++++++++--------
>  9 files changed, 87 insertions(+), 24 deletions(-)

Considering it's a very memory-relevant change and looks pretty benign, I can
pick this up if nobody disagrees (or beats me to it, which I'd appreciate).

I'll also allow at least one week for people to stop me.

Thanks,

On 1/23/25 7:31 PM, Peter Xu wrote:
> On Thu, Jan 23, 2025 at 04:19:40PM +0300, Daniil Tatianin wrote:
>> [...]
> Considering it's a very memory-relevant change and looks pretty benign, I can
> pick this up if nobody disagrees (or beats me to it, which I'd appreciate).
>
> I'll also allow at least one week for people to stop me.

I think it's been almost two weeks, so it should be good now :)

Thanks!

On Tue, Feb 04, 2025 at 11:23:41AM +0300, Daniil Tatianin wrote:
> On 1/23/25 7:31 PM, Peter Xu wrote:
> > [...]
> > I'll also allow at least one week for people to stop me.
>
> I think it's been almost two weeks, so it should be good now :)

Don't worry, this is on track. I'll probably send it in a few days.

Thanks,

On 2/4/25 5:47 PM, Peter Xu wrote:
> On Tue, Feb 04, 2025 at 11:23:41AM +0300, Daniil Tatianin wrote:
>> [...]
>> I think it's been almost two weeks, so it should be good now :)
> Don't worry, this is on track. I'll probably send it in a few days.
>
> Thanks,

Amazing, thank you!