Message ID | 20240702132139.3332013-1-yosryahmed@google.com (mailing list archive) |
---|---|
Series | x86/mm: LAM fixups and cleanups |

On Tue, 2 Jul 2024 13:21:36 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:

> This series has fixups and cleanups for LAM. Most importantly, patch 1
> fixes a synchronization issue that may cause crashes of userspace
> applications. This is a resend of v3, rebased on top of v6.10-rc6.

"Crashes of userspace applications" is bad. Yet the patchset has been
floating about for four months.

It's unclear (to me) how serious this is. Can you please explain how
common this is, what the userspace application needs to do to trigger
this, etc?

On Tue, Jul 2, 2024 at 10:36 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 2 Jul 2024 13:21:36 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
>
> > This series has fixups and cleanups for LAM. Most importantly, patch 1
> > fixes a synchronization issue that may cause crashes of userspace
> > applications. This is a resend of v3, rebased on top of v6.10-rc6.
>
> "Crashes of userspace applications" is bad. Yet the patchset has been
> floating about for four months.
>
> It's unclear (to me) how serious this is. Can you please explain how
> common this is, what the userspace application needs to do to trigger
> this, etc?

I don't think it would be common. The bug only happens on new hardware
supporting LAM, and it happens in a specific scenario where a
userspace task enables LAM while a kthread is using (borrowing) its
mm_struct on another CPU.

So it is possible but I certainly wouldn't call it common or easily triggerable.

On Tue, 2 Jul 2024 10:39:03 -0700 Yosry Ahmed <yosryahmed@google.com> wrote:

> On Tue, Jul 2, 2024 at 10:36 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 2 Jul 2024 13:21:36 +0000 Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > > This series has fixups and cleanups for LAM. Most importantly, patch 1
> > > fixes a synchronization issue that may cause crashes of userspace
> > > applications. This is a resend of v3, rebased on top of v6.10-rc6.
> >
> > "Crashes of userspace applications" is bad. Yet the patchset has been
> > floating about for four months.
> >
> > It's unclear (to me) how serious this is. Can you please explain how
> > common this is, what the userspace application needs to do to trigger
> > this, etc?
>
> I don't think it would be common. The bug only happens on new hardware
> supporting LAM, and it happens in a specific scenario where a
> userspace task enables LAM while a kthread is using (borrowing) its
> mm_struct on another CPU.
>
> So it is possible but I certainly wouldn't call it common or easily triggerable.

But when people run older (or current) kernels on newer hardware, they
will hit this. So a backport to cover 82721d8b25d7 ("x86/mm: Handle
LAM on context switch") is needed.

The series doesn't seem to be getting much traction so I can add it to
mm.git's mm-unstable branch for wider testing, but it's clearly an x86
tree thing.

On 7/2/24 11:35, Andrew Morton wrote:
> But when people run older (or current) kernels on newer hardware, they
> will hit this. So a backport to cover 82721d8b25d7 ("x86/mm: Handle
> LAM on context switch") is needed.
>
> The series doesn't seem to be getting much traction so I can add it to
> mm.git's mm-unstable branch for wider testing, but it's clearly an x86
> tree thing.

I was really hoping Andy L would look at this since he suggested this
whole thing really.

I completely agree that this needs some wider testing. How about I pull
it into x86/mm so it gets some linux-next testing instead of having it
in mm-unstable? Maybe it'll also attract Andy's attention once it's in
there.

On Tue, Jul 2, 2024 at 11:38 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 7/2/24 11:35, Andrew Morton wrote:
> > But when people run older (or current) kernels on newer hardware, they
> > will hit this. So a backport to cover 82721d8b25d7 ("x86/mm: Handle
> > LAM on context switch") is needed.
> >
> > The series doesn't seem to be getting much traction so I can add it to
> > mm.git's mm-unstable branch for wider testing, but it's clearly an x86
> > tree thing.
>
> I was really hoping Andy L would look at this since he suggested this
> whole thing really.
>
> I completely agree that this needs some wider testing. How about I pull
> it into x86/mm so it gets some linux-next testing instead of having it
> in mm-unstable? Maybe it'll also attract Andy's attention once it's in
> there.

That would be great. Thanks Dave!