Message ID | 20230404135850.3673404-7-willy@infradead.org
---|---
State | New
Series | Avoid the mmap lock for fault-around
On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) wrote:
> The map_pages fs method should be safe to run under the VMA lock instead
> of the mmap lock.  This should have a measurable reduction in contention
> on the mmap lock.

https://github.com/antonblanchard/will-it-scale/pull/37/files should be a
good microbenchmark to report numbers from.  Obviously real-world
benchmarks will be more compelling.
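For anyone who wants to reproduce the contention locally, a minimal sketch of
this kind of microbenchmark is below.  It is not the actual will-it-scale
page_fault4 case from the linked pull request; the thread count, file size and
reporting interval are arbitrary illustrative choices.  Several threads
repeatedly mmap() a scratch file, read one byte per page to trigger
file-backed read faults (the fault-around path), munmap(), and count
iterations:

/*
 * Minimal page-fault microbenchmark sketch -- NOT the real page_fault4.
 * Each thread maps a private file-backed region, touches every page to
 * trigger read faults, unmaps, and repeats.  Poor scaling with the
 * thread count points at mmap_lock contention.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS 8                      /* arbitrary, for illustration */
#define FILESIZE (128UL * 1024 * 1024)  /* 128 MiB scratch file */

static int fd;
static volatile unsigned long counts[NTHREADS];

static void *worker(void *arg)
{
	unsigned long id = (unsigned long)arg;
	long pagesize = sysconf(_SC_PAGESIZE);

	for (;;) {
		char *p = mmap(NULL, FILESIZE, PROT_READ, MAP_PRIVATE, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		/* Each first access to a page is a file-backed read fault. */
		for (unsigned long off = 0; off < FILESIZE; off += pagesize)
			(void)*(volatile char *)(p + off);
		munmap(p, FILESIZE);
		counts[id]++;
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NTHREADS];
	char path[] = "/tmp/pf_bench_XXXXXX";

	fd = mkstemp(path);
	if (fd < 0 || ftruncate(fd, FILESIZE) < 0) {
		perror("scratch file");
		return 1;
	}
	unlink(path);

	for (unsigned long i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, worker, (void *)i);

	for (;;) {
		unsigned long before = 0, after = 0;

		for (int i = 0; i < NTHREADS; i++)
			before += counts[i];
		sleep(5);
		for (int i = 0; i < NTHREADS; i++)
			after += counts[i];
		printf("%lu map/fault/unmap iterations in 5s\n", after - before);
	}
	return 0;
}

Built with something like "gcc -O2 -pthread", the iterations-per-interval
number gives a rough fault throughput to compare with and without the series.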
On Tue, Apr 4, 2023 at 8:23 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) wrote:
> > The map_pages fs method should be safe to run under the VMA lock instead
> > of the mmap lock.  This should have a measurable reduction in contention
> > on the mmap lock.
>
> https://github.com/antonblanchard/will-it-scale/pull/37/files should
> be a good microbenchmark to report numbers from.  Obviously real-world
> benchmarks will be more compelling.

The series looks sane to me.  I'll run some tests on a NUMA machine to see
if anything breaks.  Thanks!
On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote:
> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) wrote:
> > The map_pages fs method should be safe to run under the VMA lock
> > instead of the mmap lock.  This should have a measurable reduction
> > in contention on the mmap lock.
>
> https://github.com/antonblanchard/will-it-scale/pull/37/files should
> be a good microbenchmark to report numbers from.  Obviously real-world
> benchmarks will be more compelling.
>

The test result on my side with page_fault4 of will-it-scale in thread mode
is:
  15274196 (without the patch) -> 17291444 (with the patch)

That is a 13.2% improvement on an Ice Lake with 48C/96T + 192G RAM + an
ext4 filesystem.

The perf data shows the mmap_lock contention is reduced a lot
(the grandchild functions of do_user_addr_fault() are removed):

latest linux-next with the patch:
    51.78%--do_user_addr_fault
            |
            |--49.09%--handle_mm_fault
            |--1.19%--lock_vma_under_rcu
             --1.09%--down_read

latest linux-next without the patch:
    73.65%--do_user_addr_fault
            |
            |--28.65%--handle_mm_fault
            |--17.22%--down_read_trylock
            |--10.92%--down_read
            |--9.20%--up_read
             --7.30%--find_vma

My understanding is that down_read_trylock, down_read and up_read are all
related to the mmap_lock, so the reduction in mmap_lock contention is quite
obvious.

For functional testing, our 0day bot has already cherry-picked the patchset
and the testing is ongoing.  If any problem is hit during testing, 0day
will report it.  Thanks.


Regards
Yin, Fengwei
On Mon, Apr 10, 2023 at 04:53:19AM +0000, Yin, Fengwei wrote:
> On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote:
> > On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) wrote:
> > > The map_pages fs method should be safe to run under the VMA lock
> > > instead of the mmap lock.  This should have a measurable reduction
> > > in contention on the mmap lock.
> >
> > https://github.com/antonblanchard/will-it-scale/pull/37/files should
> > be a good microbenchmark to report numbers from.  Obviously real-world
> > benchmarks will be more compelling.
>
> The test result on my side with page_fault4 of will-it-scale in thread
> mode is:
>   15274196 (without the patch) -> 17291444 (with the patch)
>
> That is a 13.2% improvement on an Ice Lake with 48C/96T + 192G RAM + an
> ext4 filesystem.

Thanks!  That is really good news.

> The perf data shows the mmap_lock contention is reduced a lot
> (the grandchild functions of do_user_addr_fault() are removed):
>
> latest linux-next with the patch:
>     51.78%--do_user_addr_fault
>             |
>             |--49.09%--handle_mm_fault
>             |--1.19%--lock_vma_under_rcu
>              --1.09%--down_read
>
> latest linux-next without the patch:
>     73.65%--do_user_addr_fault
>             |
>             |--28.65%--handle_mm_fault
>             |--17.22%--down_read_trylock
>             |--10.92%--down_read
>             |--9.20%--up_read
>              --7.30%--find_vma
>
> My understanding is that down_read_trylock, down_read and up_read are all
> related to the mmap_lock, so the reduction in mmap_lock contention is
> quite obvious.

Absolutely.  I'm a little surprised that find_vma() basically disappeared
from the perf results.  I guess that it was cache cold after contending
on the mmap_lock rwsem.  But this is a very encouraging result.
On 4/10/2023 10:11 PM, Matthew Wilcox wrote:
> On Mon, Apr 10, 2023 at 04:53:19AM +0000, Yin, Fengwei wrote:
>> On Tue, 2023-04-04 at 16:23 +0100, Matthew Wilcox wrote:
>>> On Tue, Apr 04, 2023 at 02:58:50PM +0100, Matthew Wilcox (Oracle) wrote:
>>>> The map_pages fs method should be safe to run under the VMA lock
>>>> instead of the mmap lock.  This should have a measurable reduction
>>>> in contention on the mmap lock.
>>>
>>> https://github.com/antonblanchard/will-it-scale/pull/37/files should
>>> be a good microbenchmark to report numbers from.  Obviously real-world
>>> benchmarks will be more compelling.
>>
>> The test result on my side with page_fault4 of will-it-scale in thread
>> mode is:
>>   15274196 (without the patch) -> 17291444 (with the patch)
>>
>> That is a 13.2% improvement on an Ice Lake with 48C/96T + 192G RAM + an
>> ext4 filesystem.
>
> Thanks!  That is really good news.
>
>> The perf data shows the mmap_lock contention is reduced a lot
>> (the grandchild functions of do_user_addr_fault() are removed):
>>
>> latest linux-next with the patch:
>>     51.78%--do_user_addr_fault
>>             |
>>             |--49.09%--handle_mm_fault
>>             |--1.19%--lock_vma_under_rcu
>>              --1.09%--down_read
>>
>> latest linux-next without the patch:
>>     73.65%--do_user_addr_fault
>>             |
>>             |--28.65%--handle_mm_fault
>>             |--17.22%--down_read_trylock
>>             |--10.92%--down_read
>>             |--9.20%--up_read
>>              --7.30%--find_vma
>>
>> My understanding is that down_read_trylock, down_read and up_read are all
>> related to the mmap_lock, so the reduction in mmap_lock contention is
>> quite obvious.
>
> Absolutely.  I'm a little surprised that find_vma() basically disappeared
> from the perf results.  I guess that it was cache cold after contending
> on the mmap_lock rwsem.  But this is a very encouraging result.

Yes, find_vma() was a surprise.  So I did more checking on it:

1. Re-ran the test 3 more times in case I had made a stupid mistake.
   All runs show the same behavior for find_vma().

2. perf report for find_vma() shows:

       6.26%--find_vma
              |
               --0.66%--mt_find
                         |
                          --0.60%--mtree_range_walk

   Most of the time spent in find_vma() is not in mt_find(); it is in
   mmap_assert_locked(mm).

3. perf annotate of find_vma shows:

         │    ffffffffa91e9f20 <load0>:
    0.07 │      nop
    0.07 │      sub    $0x10,%rsp
    0.01 │      mov    %gs:0x28,%rax
    0.05 │      mov    %rax,0x8(%rsp)
    0.02 │      xor    %eax,%eax
         │      mov    %rsi,(%rsp)
    0.00 │      mov    0x70(%rdi),%rax   --> This is rwsem_is_locked(&mm->mmap_lock)
   99.60 │      test   %rax,%rax
         │    ↓ je     4e
    0.09 │      mov    $0xffffffffffffffff,%rdx
         │      mov    %rsp,%rsi

   I believe the 99.60% of find_vma()'s runtime attributed to
   "test %rax,%rax" really belongs to the preceding
   "mov 0x70(%rdi),%rax" (presumably sampling skid), and that
   instruction is the access to the mmap_lock.

With this series, mmap_lock contention is reduced a lot with page_fault4
of will-it-scale, and that also makes the remaining mmap_lock accesses
fast.


Regards
Yin, Fengwei
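For reference, the chain being described looks roughly like the following in
the kernel sources of that period (paraphrased from include/linux/rwsem.h,
include/linux/mmap_lock.h and mm/mmap.c, with debug details trimmed; treat it
as a sketch, not verbatim kernel code).  The annotated "mov 0x70(%rdi),%rax"
is the load of mm->mmap_lock.count performed by rwsem_is_locked():

/* Paraphrased kernel code, trimmed for illustration -- not verbatim. */

static inline int rwsem_is_locked(struct rw_semaphore *sem)
{
	/* A plain load of the rwsem count word: the contended cache line. */
	return atomic_long_read(&sem->count) != 0;
}

static inline void mmap_assert_locked(struct mm_struct *mm)
{
	lockdep_assert_held(&mm->mmap_lock);
	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
}

struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
	unsigned long index = addr;

	mmap_assert_locked(mm);		/* reads mm->mmap_lock.count */
	return mt_find(&mm->mm_mt, &index, ULONG_MAX);
}

When most faults take and drop mmap_lock, the count word bounces between CPUs
and even this read-only assertion is expensive; once the fault-around path
runs under the VMA lock, the same load hits a quiet cache line, which would
explain why find_vma() all but disappears from the profile above.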
diff --git a/mm/memory.c b/mm/memory.c
index 9952bebd25b4..590ccaf6f7fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4521,8 +4521,6 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 {
 	vm_fault_t ret = 0;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
-		return VM_FAULT_RETRY;
 	/*
 	 * Let's call ->map_pages() first and use ->fault() as fallback
 	 * if page by the offset is not ready to be mapped (cold cache or
@@ -4534,6 +4532,8 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 		return ret;
 	}
 
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		return VM_FAULT_RETRY;
 	ret = __do_fault(vmf);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
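The net effect of the two hunks is to move the FAULT_FLAG_VMA_LOCK bail-out
below the fault-around attempt, so a read fault satisfied by ->map_pages()
completes entirely under the VMA lock and only the ->fault() fallback still
retries under the mmap lock.  A condensed sketch of the resulting flow (the
fault-around eligibility test is abbreviated to a stand-in helper name, not
the real predicate):

static vm_fault_t do_read_fault(struct vm_fault *vmf)
{
	vm_fault_t ret = 0;

	/* Fault-around first: ->map_pages() is safe under the VMA lock. */
	if (fault_around_is_worthwhile(vmf)) {	/* stand-in for the real check */
		ret = do_fault_around(vmf);
		if (ret)
			return ret;
	}

	/* Only the ->fault() fallback still has to retry under mmap_lock. */
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		return VM_FAULT_RETRY;

	ret = __do_fault(vmf);
	/* ... finish the fault as before ... */
	return ret;
}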
The map_pages fs method should be safe to run under the VMA lock instead
of the mmap lock.  This should have a measurable reduction in contention
on the mmap lock.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)