mbox series

[0/3] Avoid live-lock in fault-in+uaccess loops with sub-page faults

Message ID 20211124192024.2408218-1-catalin.marinas@arm.com (mailing list archive)
Headers show
Series Avoid live-lock in fault-in+uaccess loops with sub-page faults | expand

Message

Catalin Marinas Nov. 24, 2021, 7:20 p.m. UTC
Hi,

There are a few places in the filesystem layer where a uaccess is
performed in a loop with page faults disabled, together with a
fault_in_*() call to pre-fault the pages. On architectures like arm64
with MTE (memory tagging extensions) or SPARC ADI, even if the
fault_in_*() succeeded, the uaccess can still fault indefinitely.

In general this is not an issue since such code restarts the
fault_in_*() from where the uaccess failed, therefore guaranteeing
forward progress. The btrfs search_ioctl(), however, rewinds the
fault_in_*() position and it can live-lock. This was reported by Al
here:

https://lore.kernel.org/r/YSqOUb7yZ7kBoKRY@zeniv-ca.linux.org.uk

There's also an analysis by Al of other fault-in places:

https://lore.kernel.org/r/YSldx9uhMYhT/G8X@zeniv-ca.linux.org.uk

and another sub-thread on the same topic:

https://lore.kernel.org/r/YXBFqD9WVuU8awIv@arm.com

So far only btrfs search_ioctl() seems to be affected and that's what
this series addresses. The existing loops like generic_perform_write()
already guarantee forward progress.

Andreas raised a concern about O_DIRECT accesses since on fault the user
address is rewound to a block size boundary. I tried ext4, btrfs and
gfs2 and I could not get any of them to live-lock. Depending on the
alignment of the user buffer (page or not), I found two behaviours:

- the copy to or from the user buffer succeeds entirely if it goes
  through the kernel mapping (GUP, kmap'ed page; user MTE tags are not
  checked) or

- the copy partially succeeds after a few attempts at uaccess on the
  faulting same address (the highest number of attempts in my tests was
  11 with btrfs).

Given the high cost of such sub-page probing (which is done prior to the
uaccess) my proposal is to only change the btrfs search_ioctl() (as per
the last patch). We can extend the API and call places in the future if
needed but I hope filesystems already deal with this in other ways.

Thanks.

Catalin Marinas (3):
  mm: Introduce fault_in_exact_writeable() to probe for sub-page faults
  arm64: Add support for sub-page faults user probing
  btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page
    faults

 arch/Kconfig                     |  7 +++++++
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/uaccess.h | 33 ++++++++++++++++++++++++++++++++
 fs/btrfs/ioctl.c                 |  3 ++-
 include/linux/pagemap.h          |  1 +
 include/linux/uaccess.h          | 21 ++++++++++++++++++++
 mm/gup.c                         | 19 ++++++++++++++++++
 7 files changed, 84 insertions(+), 1 deletion(-)

Comments

Andrew Morton Nov. 24, 2021, 9:36 p.m. UTC | #1
On Wed, 24 Nov 2021 19:20:21 +0000 Catalin Marinas <catalin.marinas@arm.com> wrote:

> Hi,
> 
> There are a few places in the filesystem layer where a uaccess is
> performed in a loop with page faults disabled, together with a
> fault_in_*() call to pre-fault the pages. On architectures like arm64
> with MTE (memory tagging extensions) or SPARC ADI, even if the
> fault_in_*() succeeded, the uaccess can still fault indefinitely.
> 
> In general this is not an issue since such code restarts the
> fault_in_*() from where the uaccess failed, therefore guaranteeing
> forward progress. The btrfs search_ioctl(), however, rewinds the
> fault_in_*() position and it can live-lock. This was reported by Al
> here:

Btrfs livelock on some-of-arm sounds fairly serious.  Should we
backport this?  If so, a48b73eca4ce ("btrfs: fix potential deadlock in
the search ioctl") appears to be a suitable Fixes: target?
Catalin Marinas Nov. 24, 2021, 10:31 p.m. UTC | #2
On Wed, Nov 24, 2021 at 01:36:00PM -0800, Andrew Morton wrote:
> On Wed, 24 Nov 2021 19:20:21 +0000 Catalin Marinas <catalin.marinas@arm.com> wrote:
> > There are a few places in the filesystem layer where a uaccess is
> > performed in a loop with page faults disabled, together with a
> > fault_in_*() call to pre-fault the pages. On architectures like arm64
> > with MTE (memory tagging extensions) or SPARC ADI, even if the
> > fault_in_*() succeeded, the uaccess can still fault indefinitely.
> > 
> > In general this is not an issue since such code restarts the
> > fault_in_*() from where the uaccess failed, therefore guaranteeing
> > forward progress. The btrfs search_ioctl(), however, rewinds the
> > fault_in_*() position and it can live-lock. This was reported by Al
> > here:
> 
> Btrfs livelock on some-of-arm sounds fairly serious.

Luckily not much btrfs use on Arm mobile parts.

> Should we
> backport this?  If so, a48b73eca4ce ("btrfs: fix potential deadlock in
> the search ioctl") appears to be a suitable Fixes: target?

This should be a suitable target together with a Cc stable to v4.4
(that's what the above commit had). Of course, the other two patches
need backporting as well and they won't apply cleanly.

Once we agreed on the fix, I'm happy to do the backports and send them
all to stable.