[v4,00/33] Per-VMA locks

Message ID: 20230227173632.3292573-1-surenb@google.com

Suren Baghdasaryan Feb. 27, 2023, 5:35 p.m. UTC

Previous versions:
v3: https://lore.kernel.org/all/20230216051750.3125598-1-surenb@google.com/
v2: https://lore.kernel.org/lkml/20230127194110.533103-1-surenb@google.com/
v1: https://lore.kernel.org/all/20230109205336.3665937-1-surenb@google.com/
RFC: https://lore.kernel.org/all/20220901173516.702122-1-surenb@google.com/

LWN article describing the feature:
https://lwn.net/Articles/906852/

The per-VMA locks idea was discussed during the SPF [1] discussion at
LSF/MM last year [2], which concluded with the suggestion that “a
reader/writer semaphore could be put into the VMA itself; that would have
the effect of using the VMA as a sort of range lock. There would still be
contention at the VMA level, but it would be an improvement.” This
patchset implements this suggested approach.

When handling page faults we look up the VMA that contains the faulting
page under RCU protection and try to acquire its lock. If that fails we
fall back to using mmap_lock, similar to how SPF handled this situation.
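
The trylock-then-fall-back flow described above can be sketched in
user-space Python. This is only an illustrative model, not kernel code: the
class and method names are hypothetical, a plain list scan stands in for the
RCU-protected lookup, and threading.Lock stands in for both the per-VMA lock
and mmap_lock.

```python
import threading


class VMA:
    """Toy VMA covering the half-open address range [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.lock = threading.Lock()  # stand-in for the per-VMA lock


class AddressSpace:
    def __init__(self, vmas):
        self.vmas = vmas
        self.mmap_lock = threading.Lock()  # stand-in for mmap_lock

    def find_vma(self, addr):
        # The kernel does this lookup under RCU; a linear scan stands in here.
        for vma in self.vmas:
            if vma.start <= addr < vma.end:
                return vma
        return None

    def handle_fault(self, addr):
        """Return which lock served the fault: 'vma' or 'mmap_lock'."""
        vma = self.find_vma(addr)
        if vma is not None and vma.lock.acquire(blocking=False):
            try:
                return 'vma'  # fast path: only this VMA was locked
            finally:
                vma.lock.release()
        # Slow path: no VMA found or the trylock failed, so take the big lock.
        with self.mmap_lock:
            return 'mmap_lock'


mm = AddressSpace([VMA(0x1000, 0x2000), VMA(0x4000, 0x6000)])
fast = mm.handle_fault(0x1800)   # uncontended VMA
mm.vmas[1].lock.acquire()        # simulate an updater holding the second VMA
slow = mm.handle_fault(0x5000)   # trylock fails, falls back
mm.vmas[1].lock.release()
```

The point of the model is that a fault in an uncontended VMA never touches
the address-space-wide lock; only the contended or unresolved case does.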

One notable way the implementation deviates from the proposal is the way
VMAs are write-locked. During some mm updates, multiple VMAs need to be
locked until the end of the update (e.g. vma_merge, split_vma, etc).
Tracking all the locked VMAs, avoiding recursive locks, and figuring out
when it's safe to unlock previously locked VMAs would make the code more
complex. So, instead of the usual lock/unlock pattern, the proposed
solution marks a VMA as locked and provides an efficient way to:
1. Identify locked VMAs.
2. Unlock all locked VMAs in bulk.
We also postpone unlocking the locked VMAs until the end of the update,
when we do mmap_write_unlock. Potentially this keeps a VMA locked for
longer than is absolutely necessary but it results in a big reduction of
code complexity.
Write-locking a VMA is done using two sequence numbers - one in the
vm_area_struct and one in the mm_struct. A VMA is considered write-locked
when these sequence numbers are equal. To write-lock a VMA we set the
sequence number in vm_area_struct to be equal to the sequence number in
mm_struct. To unlock all VMAs we increment mm_struct's seq number. This
allows for an efficient way to track locked VMAs and to drop the locks on
all VMAs at the end of the update.
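
As an illustration only, the two-counter scheme above can be modelled in a
few lines of user-space Python. This is a sketch, not kernel code: the class
names, the field names mm_lock_seq and vm_lock_seq, and the initial value
given to a new VMA are assumptions made for the example, and counter
wraparound and memory ordering are ignored.

```python
class MM:
    """Toy stand-in for mm_struct: holds the address-space-wide counter."""

    def __init__(self):
        self.mm_lock_seq = 0

    def unlock_all(self):
        # Models the bulk unlock at the end of an update: one increment
        # makes every VMA's stored number stale at once.
        self.mm_lock_seq += 1


class VMA:
    """Toy stand-in for vm_area_struct."""

    def __init__(self, mm):
        self.mm = mm
        # Assumption: start one behind so a fresh VMA is not seen as locked.
        self.vm_lock_seq = mm.mm_lock_seq - 1

    def mark_locked(self):
        # Copying the counter is idempotent, so locking the same VMA twice
        # during one update needs no extra bookkeeping.
        self.vm_lock_seq = self.mm.mm_lock_seq

    def is_locked(self):
        return self.vm_lock_seq == self.mm.mm_lock_seq


mm = MM()
a, b = VMA(mm), VMA(mm)
a.mark_locked()
a.mark_locked()  # repeated locking is harmless
b.mark_locked()
locked_before = (a.is_locked(), b.is_locked())
mm.unlock_all()  # a single increment drops both locks
locked_after = (a.is_locked(), b.is_locked())
```

No list of locked VMAs is kept anywhere: identifying a locked VMA is one
comparison, and unlocking all of them is one increment.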

The patchset implements per-VMA locking only for anonymous pages which
are not in swap and avoids userfaultfs as their implementation is more
complex. Additional support for file-backed page faults, swapped and user
pages can be added incrementally.

Performance benchmarks show benefits similar to, although slightly smaller
than, those of the SPF patchset (~75% of SPF benefits). Still, with lower
complexity this approach might be more desirable.

Since RFC was posted in September 2022, two separate Google teams outside
of Android evaluated the patchset and confirmed positive results. Here are
the known usecases when per-VMA locks show benefits:

Android:
Apps with a high number of threads (~100) improve launch times by up to
20%. Each thread mmaps several areas upon startup (stack and thread-local
storage (TLS), thread signal stack, indirect ref table), which requires
taking mmap_lock in write mode. Page faults take mmap_lock in read mode.
During app launch, both thread creation and page faults establishing the
active working set are happening in parallel, and that causes lock
contention between mm writers and readers even if updates and page faults
are happening in different VMAs. Per-VMA locks prevent this contention by
providing a more granular lock.

Google Fibers:
We have several dynamically sized thread pools that spawn new threads
under increased load and reduce their number when idling. For example,
Google's in-process scheduling/threading framework, UMCG/Fibers, is backed
by such a thread pool. When idling, only a small number of idle worker
threads are available; when a spike of incoming requests arrives, each
request is handled in its own "fiber", which is a work item posted onto a
UMCG worker thread; quite often these spikes lead to a number of new
threads spawning. Each new thread needs to allocate and register an RSEQ
section on its TLS, then register itself with the kernel as a UMCG worker
thread, and only after that it can be considered by the in-process
UMCG/Fiber scheduler as available to do useful work. In short, during an
incoming workload spike new threads have to be spawned, and they perform
several syscalls (RSEQ registration, UMCG worker registration, memory
allocations) before they can actually start doing useful work. Removing
any bottlenecks on this thread startup path will greatly improve our
services' latencies when faced with request/workload spikes.
At high scale, mmap_lock contention during thread creation and stack page
faults leads to user-visible multi-second serving latencies in a similar
pattern to Android app startup. The per-VMA locking patchset has been run
successfully in limited experiments with user-facing production workloads.
In these experiments, we observed that the peak thread creation rate was
high enough that thread creation is no longer a bottleneck.

TCP zerocopy receive:
From the point of view of TCP zerocopy receive, the per-vma lock patch is
massively beneficial.
In today's implementation, a process with N threads where N - 1 are
performing zerocopy receive and 1 thread is performing madvise() with the
write lock taken (e.g. needs to change vm_flags) will result in all N - 1
receive threads blocking until the madvise is done. Conversely, on a busy
process receiving a lot of data, an madvise operation that does need to
take the mmap lock in write mode will need to wait for all of the receives
to be done - a lose:lose proposition. Per-VMA locking _removes_ by
definition this source of contention entirely.
There are other benefits for receive as well, chiefly a reduction in
cacheline bouncing across receiving threads for locking/unlocking the
single mmap lock. On an RPC style synthetic workload with 4KB RPCs:
1a) The find+lock+unlock VMA path in the base case, without the per-vma
lock patchset, is about 0.7% of cycles as measured by perf.
1b) mmap_read_lock + mmap_read_unlock in the base case is about 0.5%
cycles overall - most of this is within the TCP read hotpath (a small
fraction is 'other' usage in the system).
2a) The find+lock+unlock VMA path, with the per-vma patchset and a trivial
patch written to take advantage of it in TCP, is about 0.4% of cycles
(down from 0.7% above)
2b) mmap_read_lock + mmap_read_unlock in the per-vma patchset is < 0.1%
cycles and is out of the TCP read hotpath entirely (down from 0.5% before,
the remaining usage is the 'other' usage in the system).
So, in addition to entirely removing an onerous source of contention, it
also reduces the CPU cycles of TCP receive zerocopy by about 0.5%+
(compared to overall cycles in perf) for the 'small' RPC scenario.

The patchset structure is:
0001-0008: Enable maple-tree RCU mode
0009-0031: Main per-vma locks patchset
0032-0033: Performance optimizations

Changes since v3:
- Changed patch [3] to move vma_prepare before vma_adjust_trans_huge
- Dropped patch [4] from the set as unnecessary, per Hyeonggon Yoo
- Changed patch [5] to do VMA locking inside vma_prepare, per Liam Howlett
- Dropped patch [6] from the set as unnecessary, per Liam Howlett

[1] https://lore.kernel.org/all/20220128131006.67712-1-michel@lespinasse.org/
[2] https://lwn.net/Articles/893906/
[3] https://lore.kernel.org/all/20230216051750.3125598-15-surenb@google.com/
[4] https://lore.kernel.org/all/20230216051750.3125598-17-surenb@google.com/
[5] https://lore.kernel.org/all/20230216051750.3125598-18-surenb@google.com/
[6] https://lore.kernel.org/all/20230216051750.3125598-22-surenb@google.com/

The patchset applies cleanly over mm-unstable branch.

Laurent Dufour (1):
  powerc/mm: try VMA lock-based page fault handling first

Liam Howlett (4):
  maple_tree: Be more cautious about dead nodes
  maple_tree: Detect dead nodes in mas_start()
  maple_tree: Fix freeing of nodes in rcu mode
  maple_tree: remove extra smp_wmb() from mas_dead_leaves()

Liam R. Howlett (4):
  maple_tree: Fix write memory barrier of nodes once dead for RCU mode
  maple_tree: Add smp_rmb() to dead node detection
  maple_tree: Add RCU lock checking to rcu callback functions
  mm: Enable maple tree RCU mode by default.

Michel Lespinasse (1):
  mm: rcu safe VMA freeing

Suren Baghdasaryan (23):
  mm: introduce CONFIG_PER_VMA_LOCK
  mm: move mmap_lock assert function definitions
  mm: add per-VMA lock and helper functions to control it
  mm: mark VMA as being written when changing vm_flags
  mm/mmap: move vma_prepare before vma_adjust_trans_huge
  mm/khugepaged: write-lock VMA while collapsing a huge page
  mm/mmap: write-lock VMAs in vma_prepare before modifying them
  mm/mremap: write-lock VMA while remapping it to a new address range
  mm: write-lock VMAs before removing them from VMA tree
  mm: conditionally write-lock VMA in free_pgtables
  kernel/fork: assert no VMA readers during its destruction
  mm/mmap: prevent pagefault handler from racing with mmu_notifier
    registration
  mm: introduce vma detached flag
  mm: introduce lock_vma_under_rcu to be used from arch-specific code
  mm: fall back to mmap_lock if vma->anon_vma is not yet set
  mm: add FAULT_FLAG_VMA_LOCK flag
  mm: prevent do_swap_page from handling page faults under VMA lock
  mm: prevent userfaults to be handled under per-vma lock
  mm: introduce per-VMA lock statistics
  x86/mm: try VMA lock-based page fault handling first
  arm64/mm: try VMA lock-based page fault handling first
  mm/mmap: free vm_area_struct without call_rcu in exit_mmap
  mm: separate vma->lock from vm_area_struct

 arch/arm64/Kconfig                     |   1 +
 arch/arm64/mm/fault.c                  |  36 ++++
 arch/powerpc/mm/fault.c                |  41 ++++
 arch/powerpc/platforms/powernv/Kconfig |   1 +
 arch/powerpc/platforms/pseries/Kconfig |   1 +
 arch/x86/Kconfig                       |   1 +
 arch/x86/mm/fault.c                    |  36 ++++
 include/linux/mm.h                     | 108 +++++++++-
 include/linux/mm_types.h               |  32 ++-
 include/linux/mmap_lock.h              |  37 ++--
 include/linux/vm_event_item.h          |   6 +
 include/linux/vmstat.h                 |   6 +
 kernel/fork.c                          |  99 +++++++--
 lib/maple_tree.c                       | 269 +++++++++++++++++--------
 mm/Kconfig                             |  12 ++
 mm/Kconfig.debug                       |   6 +
 mm/init-mm.c                           |   3 +
 mm/internal.h                          |   2 +-
 mm/khugepaged.c                        |   5 +
 mm/memory.c                            |  72 ++++++-
 mm/mmap.c                              |  53 +++--
 mm/mremap.c                            |   1 +
 mm/nommu.c                             |   5 +
 mm/rmap.c                              |  31 +--
 mm/vmstat.c                            |   6 +
 tools/testing/radix-tree/maple.c       |  16 ++
 26 files changed, 737 insertions(+), 149 deletions(-)

Leon Romanovsky July 11, 2023, 10:35 a.m. UTC | #1

On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:

<...>

> Laurent Dufour (1):
>   powerc/mm: try VMA lock-based page fault handling first

Hi,

This series, and specifically the commit above, broke docker on PPC.
It causes the docker service to get stuck while trying to activate.
Reverting this commit allows us to use docker again.

[user@ppc-135-3-200-205 ~]# sudo systemctl status docker
● docker.service - Docker Application Container Engine
     Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
     Active: activating (start) since Mon 2023-06-26 14:47:07 IDT; 3h 50min ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 276555 (dockerd)
     Memory: 44.2M
     CGroup: /system.slice/docker.service
             └─ 276555 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129383166+03:00" level=info msg="Graph migration to content-addressability took 0.00 se>
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129666160+03:00" level=warning msg="Your kernel does not support cgroup cfs period"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129684117+03:00" level=warning msg="Your kernel does not support cgroup cfs quotas"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129697085+03:00" level=warning msg="Your kernel does not support cgroup rt period"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129711513+03:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129720656+03:00" level=warning msg="Unable to find blkio cgroup in mounts"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.129805617+03:00" level=warning msg="mountpoint for pids not found"
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.130199070+03:00" level=info msg="Loading containers: start."
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.132688568+03:00" level=warning msg="Running modprobe bridge br_netfilter failed with me>
Jun 26 14:47:07 ppc-135-3-200-205 dockerd[276555]: time="2023-06-26T14:47:07.271014050+03:00" level=info msg="Default bridge (docker0) is assigned with an IP addres>

Python script which we used for bisect:

import subprocess
import time
import sys


def run_command(cmd):
    """Run cmd in a shell; return True if it hangs for more than 30 seconds."""
    print('running:', cmd)

    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    try:
        stdout, stderr = p.communicate(timeout=30)
    except subprocess.TimeoutExpired:
        # The command is stuck: report the hang to the caller, which exits.
        return True

    print(stdout.decode())
    print(stderr.decode())
    print('rc:', p.returncode)

    return False


def main():
    commands = [
        'sudo systemctl stop docker',
        'sudo systemctl status docker',
        'sudo systemctl is-active docker',
        'sudo systemctl start docker',
        'sudo systemctl status docker',
    ]

    for i in range(1000):
        title = f'Try no. {i + 1}'
        print('*' * 50, title, '*' * 50)

        for cmd in commands:
            if run_command(cmd):
                print(f'Reproduced on try no. {i + 1}!')
                print(f'"{cmd}" is stuck!')

                return 1

            print('\n')
        time.sleep(30)
    return 0

if __name__ == '__main__':
    sys.exit(main())

Thanks

Vlastimil Babka July 11, 2023, 10:39 a.m. UTC | #2

On 7/11/23 12:35, Leon Romanovsky wrote:
> 
> On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> 
> <...>
> 
>> Laurent Dufour (1):
>>   powerc/mm: try VMA lock-based page fault handling first
> 
> Hi,
> 
> This series, and specifically the commit above, broke docker on PPC.
> It causes the docker service to get stuck while trying to activate.
> Reverting this commit allows us to use docker again.

Hi,

there have been follow-up fixes that are part of 6.4.3 stable (also
6.5-rc1). Does that version work for you?

Vlastimil


Leon Romanovsky July 11, 2023, 11:01 a.m. UTC | #3

On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> On 7/11/23 12:35, Leon Romanovsky wrote:
> > 
> > On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> > 
> > <...>
> > 
> >> Laurent Dufour (1):
> >>   powerc/mm: try VMA lock-based page fault handling first
> > 
> > Hi,
> > 
> > This series, and specifically the commit above, broke docker on PPC.
> > It causes the docker service to get stuck while trying to activate.
> > Reverting this commit allows us to use docker again.
> 
> Hi,
> 
> there have been follow-up fixes that are part of 6.4.3 stable (also
> 6.5-rc1). Does that version work for you?

I'll recheck it on a clean system, but for the record:
1. We are running 6.5-rc1 kernels.
2. PPC doesn't compile for us on -rc1 without this fix.
https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/
3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".

Do you have in mind anything specific to check?

Thanks

Leon Romanovsky July 11, 2023, 11:09 a.m. UTC | #4

On Tue, Jul 11, 2023 at 02:01:41PM +0300, Leon Romanovsky wrote:
> On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> > On 7/11/23 12:35, Leon Romanovsky wrote:
> > > 
> > > On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> > > 
> > > <...>
> > > 
> > >> Laurent Dufour (1):
> > >>   powerc/mm: try VMA lock-based page fault handling first
> > > 
> > > Hi,
> > > 
> > > This series, and specifically the commit above, broke docker on PPC.
> > > It causes the docker service to get stuck while trying to activate.
> > > Reverting this commit allows us to use docker again.
> > 
> > Hi,
> > 
> > there have been follow-up fixes that are part of 6.4.3 stable (also
> > 6.5-rc1). Does that version work for you?
> 
> I'll recheck it on a clean system, but for the record:
> 1. We are running 6.5-rc1 kernels.
> 2. PPC doesn't compile for us on -rc1 without this fix.
> https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/

Ohh, I see it in -rc1, let's recheck.

> 3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".
> 
> Do you have in mind anything specific to check?
> 
> Thanks
>

Suren Baghdasaryan July 11, 2023, 4:35 p.m. UTC | #5

On Tue, Jul 11, 2023 at 4:09 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Jul 11, 2023 at 02:01:41PM +0300, Leon Romanovsky wrote:
> > On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> > > On 7/11/23 12:35, Leon Romanovsky wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> > > >
> > > > <...>
> > > >
> > > >> Laurent Dufour (1):
> > > >>   powerc/mm: try VMA lock-based page fault handling first
> > > >
> > > > Hi,
> > > >
> > > > This series, and specifically the commit above, broke docker on PPC.
> > > > It causes the docker service to get stuck while trying to activate.
> > > > Reverting this commit allows us to use docker again.
> > >
> > > Hi,
> > >
> > > there have been follow-up fixes that are part of 6.4.3 stable (also
> > > 6.5-rc1). Does that version work for you?
> >
> > I'll recheck it on a clean system, but for the record:
> > 1. We are running 6.5-rc1 kernels.
> > 2. PPC doesn't compile for us on -rc1 without this fix.
> > https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/
>
> Ohh, I see it in -rc1, let's recheck.

Hi Leon,
Please let us know how it goes.

>
> > 3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".

The fixes Vlastimil was referring to are not in fault.c; they are
in the main mm and fork code. More specifically, check that these
patches exist in the branch you are testing:

mm: lock newly mapped VMA with corrected ordering
fork: lock VMAs of the parent process when forking
mm: lock newly mapped VMA which can be modified after it becomes visible
mm: lock a vma before stack expansion

Thanks,
Suren.

> >
> > Do you have in mind anything specific to check?
> >
> > Thanks
> >
>

Leon Romanovsky July 11, 2023, 5:14 p.m. UTC | #6

On Tue, Jul 11, 2023 at 09:35:13AM -0700, Suren Baghdasaryan wrote:
> On Tue, Jul 11, 2023 at 4:09 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Jul 11, 2023 at 02:01:41PM +0300, Leon Romanovsky wrote:
> > > On Tue, Jul 11, 2023 at 12:39:34PM +0200, Vlastimil Babka wrote:
> > > > On 7/11/23 12:35, Leon Romanovsky wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 09:35:59AM -0800, Suren Baghdasaryan wrote:
> > > > >
> > > > > <...>
> > > > >
> > > > >> Laurent Dufour (1):
> > > > >>   powerc/mm: try VMA lock-based page fault handling first
> > > > >
> > > > > Hi,
> > > > >
> > > > > This series, and specifically the commit above, broke docker on PPC.
> > > > > It causes the docker service to get stuck while trying to activate.
> > > > > Reverting this commit allows us to use docker again.
> > > >
> > > > Hi,
> > > >
> > > > there have been follow-up fixes that are part of 6.4.3 stable (also
> > > > 6.5-rc1). Does that version work for you?
> > >
> > > I'll recheck it on a clean system, but for the record:
> > > 1. We are running 6.5-rc1 kernels.
> > > 2. PPC doesn't compile for us on -rc1 without this fix.
> > > https://lore.kernel.org/all/20230629124500.1.I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4@changeid/
> >
> > Ohh, I see it in -rc1, let's recheck.
> 
> Hi Leon,
> Please let us know how it goes.

Once we rebuilt a clean -rc1, docker worked for us.
Sorry for the noise.

> 
> >
> > > 3. I didn't see anything relevant in -rc1 with "git log arch/powerpc/mm/fault.c".
> 
> The fixes Vlastimil was referring to are not in fault.c; they are
> in the main mm and fork code. More specifically, check that these
> patches exist in the branch you are testing:
> 
> mm: lock newly mapped VMA with corrected ordering
> fork: lock VMAs of the parent process when forking
> mm: lock newly mapped VMA which can be modified after it becomes visible
> mm: lock a vma before stack expansion

Thanks

> 
> Thanks,
> Suren.
> 
> > >
> > > Do you have in mind anything specific to check?
> > >
> > > Thanks
> > >
> >
