[1/4] mm: Check if mmu notifier callbacks are allowed to fail

Message ID 20190520213945.17046-1-daniel.vetter@ffwll.ch
State New

Commit Message

Daniel Vetter May 20, 2019, 9:39 p.m. UTC
Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.

An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.

Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.

v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).

v3: Rebase on top of Glisse's arg rework.

v4: More rebase on top of Glisse reworking everything.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jerome Glisse May 21, 2019, 3:44 p.m. UTC | #1
On Mon, May 20, 2019 at 11:39:42PM +0200, Daniel Vetter wrote:

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

Daniel Vetter June 18, 2019, 3:22 p.m. UTC | #2
On Tue, May 21, 2019 at 11:44:11AM -0400, Jerome Glisse wrote:
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

-mm folks, is this (entire series of 4 patches) planned to land in the 5.3
merge window? Or do you want more reviews/testing/polish?

I think with all the hmm rework going on, a bit more validation and checks
in this tricky area would help.

Thanks, Daniel

Jason Gunthorpe June 19, 2019, 4:50 p.m. UTC | #3
On Tue, Jun 18, 2019 at 05:22:15PM +0200, Daniel Vetter wrote:
> > > v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> > > the problematic case (Michal Hocko).

I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
like syzkaller to report a bug, while a random pr_warn probably will
not.

I do agree the backtrace is not useful here, but we don't have a
warn-no-backtrace version..

IMHO, kernel/driver bugs should always be reported by WARN &
friends. We never expect to see the print, so why do we care how big
it is?

Also note that WARN integrates an unlikely() into it so the codegen is
automatically a bit more optimal than the if & pr_warn combination.

Jason
Daniel Vetter June 19, 2019, 7:57 p.m. UTC | #4
On Wed, Jun 19, 2019 at 6:50 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> I disagree with this v2 note, the WARN_ON/WARN will trigger checkers
> like syzkaller to report a bug, while a random pr_warn probably will
> not.
>
> I do agree the backtrace is not useful here, but we don't have a
> warn-no-backtrace version..
>
> IMHO, kernel/driver bugs should always be reported by WARN &
> friends. We never expect to see the print, so why do we care how big
> it is?
>
> Also note that WARN integrates an unlikely() into it so the codegen is
> automatically a bit more optimal than the if & pr_warn combination.

Where do you make a difference between a WARN without backtrace and a
pr_warn? They're both dumped at the same log-level ...

I can easily throw an unlikely around this here if that's the only
thing that's blocking the merge.
-Daniel
Jason Gunthorpe June 19, 2019, 8:13 p.m. UTC | #5
On Wed, Jun 19, 2019 at 09:57:15PM +0200, Daniel Vetter wrote:
> Where do you make a difference between a WARN without backtrace and a
> pr_warn? They're both dumped at the same log-level ...

WARN panics the kernel when you set 

/proc/sys/kernel/panic_on_warn

So auto testing tools can set that and get a clean detection that the
kernel has failed the test in some way.

Otherwise you are left with frail/ugly grepping of dmesg.

Jason
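
The difference described above can be modelled in userspace. This is only a rough sketch: model_panic_on_warn, model_pr_warn(), model_warn_on() and model_demo() are made-up stand-ins, not the kernel's definitions, and the real WARN path does more (backtrace, taint). __builtin_expect is assumed to be available (GCC or Clang). The point is that with the knob set, a WARN-style check produces a hard failure, while a pr_warn-style message only leaves a log line behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins only; none of this is the kernel's real code. */
bool model_panic_on_warn;	/* models /proc/sys/kernel/panic_on_warn */
bool model_panicked;		/* set where the kernel would panic */

/* Models pr_warn(): emits a log line and has no other effect. */
void model_pr_warn(const char *msg)
{
	fprintf(stderr, "warn: %s\n", msg);
}

/*
 * Models WARN_ON(): the condition is marked unlikely, a warning is
 * logged, and the "kernel" panics if the knob is set. Returns the
 * condition so callers can branch on it.
 */
bool model_warn_on(bool cond, const char *what)
{
	if (__builtin_expect(!!cond, 0)) {
		fprintf(stderr, "WARNING: %s\n", what);
		if (model_panic_on_warn)
			model_panicked = true;
	}
	return cond;
}

/* What a test harness would observe with the knob set. */
int model_demo(void)
{
	model_panic_on_warn = true;
	model_panicked = false;

	model_pr_warn("callback failure not allowed");
	assert(!model_panicked);	/* only a log line to grep for */

	model_warn_on(true, "callback failure not allowed");
	assert(model_panicked);		/* hard failure, no grepping needed */
	return 0;
}
```

Calling model_demo() walks through both cases in order; in this model the pr_warn-style call never changes model_panicked, whatever the knob is set to.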
Daniel Vetter June 19, 2019, 8:18 p.m. UTC | #6
On Wed, Jun 19, 2019 at 10:13 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> WARN panics the kernel when you set
>
> /proc/sys/kernel/panic_on_warn
>
> So auto testing tools can set that and get a clean detection that the
> kernel has failed the test in some way.
>
> Otherwise you are left with frail/ugly grepping of dmesg.

Hm right.

Anyway, I'm happy to repaint the bikeshed in any color that's desired,
if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
longer (need to find a bit of time, plus it'll definitely attract more
comments).

Michal?
-Daniel
Jason Gunthorpe June 19, 2019, 8:42 p.m. UTC | #7
On Wed, Jun 19, 2019 at 10:18:43PM +0200, Daniel Vetter wrote:
> Hm right.
> 
> Anyway, I'm happy to repaint the bikeshed in any color that's desired,
> if that helps with landing it. WARN_WITHOUT_BACKTRACE might take a bit
> longer (need to find a bit of time, plus it'll definitely attract more
> comments).

I was actually just writing something very similar when looking at the
hmm things..

Also, is the test backwards?

mmu_notifier_range_blockable() == true means the callback must return
zero

mmu_notifier_range_blockable() == false means the callback can return
0 or -EAGAIN.

Suggest this:

                                pr_info("%pS callback failed with %d in %sblockable context.\n",
                                        mn->ops->invalidate_range_start, _ret,
                                        !mmu_notifier_range_blockable(range) ? "non-" : "");
+                               WARN_ON(mmu_notifier_range_blockable(range) ||
+                                       _ret != -EAGAIN);
                                ret = _ret;
                        }
                }

To express the API invariant.

Jason
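
The invariant above can also be written as a small truth-table helper and checked in userspace. This is a sketch under assumptions: notifier_ret_is_bug() and notifier_invariant_selftest() are hypothetical names, not kernel functions, and the early return for 0 assumes the suggested WARN_ON sits in a branch that is only reached for a non-zero _ret, as the "callback failed" message suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when a callback result breaks the
 * rule stated above.
 *   blockable == true  -> the callback must return 0
 *   blockable == false -> the callback may return 0 or -EAGAIN
 */
bool notifier_ret_is_bug(bool blockable, int ret)
{
	if (ret == 0)
		return false;	/* success is always allowed */

	/* Same expression as the suggested WARN_ON condition. */
	return blockable || ret != -EAGAIN;
}

/* Walks the interesting corners of the truth table. */
int notifier_invariant_selftest(void)
{
	assert(!notifier_ret_is_bug(true, 0));
	assert(notifier_ret_is_bug(true, -EAGAIN));	/* may not fail */
	assert(!notifier_ret_is_bug(false, -EAGAIN));	/* allowed failure */
	assert(notifier_ret_is_bug(false, -ENOMEM));	/* wrong error code */
	return 0;
}
```

Only one failing combination is legal in this model: a non-blockable range with -EAGAIN. Every other non-zero return would trip the check.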
Daniel Vetter June 19, 2019, 9:20 p.m. UTC | #8
On Wed, Jun 19, 2019 at 10:42 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> I was actually just writing something very similar when looking at the
> hmm things..
>
> Also, is the test backwards?

Yes, in the last rebase I screwed things up :-/
-Daniel

Patch

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index ee36068077b6..c05e406a7cd7 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -181,6 +181,9 @@  int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
 					!mmu_notifier_range_blockable(range) ? "non-" : "");
+				if (!mmu_notifier_range_blockable(range))
+					pr_warn("%pS callback failure not allowed\n",
+						mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}