mbox series

[RESEND,v3,0/3] x86/mm: LAM fixups and cleanups

Message ID 20240702132139.3332013-1-yosryahmed@google.com (mailing list archive)
Headers show
Series x86/mm: LAM fixups and cleanups | expand

Message

Yosry Ahmed July 2, 2024, 1:21 p.m. UTC
This series has fixups and cleanups for LAM. Most importantly, patch 1
fixes a sycnhronization issue that may cause crashes of userspace
applications. This is a resend of v3, rebased on top of v6.10-rc6.

v3: https://lore.kernel.org/lkml/20240418012835.3360429-1-yosryahmed@google.com/
v2: https://lore.kernel.org/lkml/20240312155641.4003683-1-yosryahmed@google.com/
v1: https://lore.kernel.org/lkml/20240312035951.3535980-1-yosryahmed@google.com/
RFC: https://lore.kernel.org/lkml/20240307133916.3782068-1-yosryahmed@google.com/

Yosry Ahmed (3):
  x86/mm: Use IPIs to synchronize LAM enablement
  x86/mm: Fix LAM inconsistency during context switch
  x86/mm: Cleanup prctl_enable_tagged_addr() nr_bits error checking

 arch/x86/include/asm/mmu_context.h |  8 +++++++-
 arch/x86/include/asm/tlbflush.h    |  9 ++++-----
 arch/x86/kernel/process_64.c       | 25 ++++++++++++++++---------
 arch/x86/mm/tlb.c                  | 15 ++++++++-------
 4 files changed, 35 insertions(+), 22 deletions(-)

Comments

Andrew Morton July 2, 2024, 5:36 p.m. UTC | #1
On Tue,  2 Jul 2024 13:21:36 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:

> This series has fixups and cleanups for LAM. Most importantly, patch 1
> fixes a sycnhronization issue that may cause crashes of userspace
> applications. This is a resend of v3, rebased on top of v6.10-rc6.

"Crashes of userspace applications" is bad.  Yet the patchset has been
floating about for four months.

It's unclear (to me) how serious this is.  Can you please explain how
common this is, what the userspace application needs to do to trigger
this, etc?
Yosry Ahmed July 2, 2024, 5:39 p.m. UTC | #2
On Tue, Jul 2, 2024 at 10:36 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue,  2 Jul 2024 13:21:36 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
>
> > This series has fixups and cleanups for LAM. Most importantly, patch 1
> > fixes a sycnhronization issue that may cause crashes of userspace
> > applications. This is a resend of v3, rebased on top of v6.10-rc6.
>
> "Crashes of userspace applications" is bad.  Yet the patchset has been
> floating about for four months.
>
> It's unclear (to me) how serious this is.  Can you please explain how
> common this is, what the userspace application needs to do to trigger
> this, etc?

I don't think it would be common. The bug only happens on new hardware
supporting LAM, and it happens in a specific scenario where a
userspace task enables LAM while a kthread is using (borrowing) its
mm_struct on another CPU.

So it is possible but I certainly wouldn't call it common or easily triggerable.
Andrew Morton July 2, 2024, 6:35 p.m. UTC | #3
On Tue, 2 Jul 2024 10:39:03 -0700 Yosry Ahmed <yosryahmed@google.com> wrote:

> On Tue, Jul 2, 2024 at 10:36 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue,  2 Jul 2024 13:21:36 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > > This series has fixups and cleanups for LAM. Most importantly, patch 1
> > > fixes a sycnhronization issue that may cause crashes of userspace
> > > applications. This is a resend of v3, rebased on top of v6.10-rc6.
> >
> > "Crashes of userspace applications" is bad.  Yet the patchset has been
> > floating about for four months.
> >
> > It's unclear (to me) how serious this is.  Can you please explain how
> > common this is, what the userspace application needs to do to trigger
> > this, etc?
> 
> I don't think it would be common. The bug only happens on new hardware
> supporting LAM, and it happens in a specific scenario where a
> userspace task enables LAM while a kthread is using (borrowing) its
> mm_struct on another CPU.
> 
> So it is possible but I certainly wouldn't call it common or easily triggerable.

But when people run older (or current) kernels on newer hardware, they
will hit this.  So a backport to cover 82721d8b25d7 ("x86/mm: Handle
LAM on context switch") is needed.

The series doesn't seem to be getting much traction so I can add it to
mm.git's mm-unstable branch for wider testing, but it's clearly an x86
tree thing.
Dave Hansen July 2, 2024, 6:38 p.m. UTC | #4
On 7/2/24 11:35, Andrew Morton wrote:
> But when people run older (or current) kernels on newer hardware, they
> will hit this.  So a backport to cover 82721d8b25d7 ("x86/mm: Handle
> LAM on context switch") is needed.
> 
> The series doesn't seem to be getting much traction so I can add it to
> mm.git's mm-unstable branch for wider testing, but it's clearly an x86
> tree thing.

I was really hoping Andy L would look at this since he suggested this
whole thing really.

I completely agree that this needs some wider testing.  How about I pull
it into x86/mm so it gets some linux-next testing instead of having it
in mm-unstable?  Maybe it'll also attract Andy's attention once it's in
there.
Yosry Ahmed July 2, 2024, 6:55 p.m. UTC | #5
On Tue, Jul 2, 2024 at 11:38 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 7/2/24 11:35, Andrew Morton wrote:
> > But when people run older (or current) kernels on newer hardware, they
> > will hit this.  So a backport to cover 82721d8b25d7 ("x86/mm: Handle
> > LAM on context switch") is needed.
> >
> > The series doesn't seem to be getting much traction so I can add it to
> > mm.git's mm-unstable branch for wider testing, but it's clearly an x86
> > tree thing.
>
> I was really hoping Andy L would look at this since he suggested this
> whole thing really.
>
> I completely agree that this needs some wider testing.  How about I pull
> it into x86/mm so it gets some linux-next testing instead of having it
> in mm-unstable?  Maybe it'll also attract Andy's attention once it's in
> there.

That would be great. Thanks Dave!