Message ID | 20181210103641.31259-2-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
Series | mmu notifier debug checks v2 |
Patches #1 and #3 are Reviewed-by: Christian König <christian.koenig@amd.com>

Patch #2 is Acked-by: Christian König <christian.koenig@amd.com> because I can't judge if adding the counter in the thread structure is actually a good idea.

In patch #4 I honestly don't understand at all how this stuff works, so no-comment from my side on this.

Christian.

On 10.12.18 at 11:36, Daniel Vetter wrote:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
>
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
>
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.
>
> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  mm/mmu_notifier.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..ccc22f21b735 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>  						mn->ops->invalidate_range_start, _ret,
>  						!blockable ? "non-" : "");
> +				if (blockable)
> +					pr_warn("%pS callback failure not allowed\n",
> +						mn->ops->invalidate_range_start);
>  				ret = _ret;
>  			}
>  		}
On Mon 10-12-18 11:36:38, Daniel Vetter wrote:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
>
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
>
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.

OK, fair enough. If this is going to help with testing then I do not have any objections of course.

> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).

Thanks!

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  mm/mmu_notifier.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..ccc22f21b735 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>  						mn->ops->invalidate_range_start, _ret,
>  						!blockable ? "non-" : "");
> +				if (blockable)
> +					pr_warn("%pS callback failure not allowed\n",
> +						mn->ops->invalidate_range_start);
>  				ret = _ret;
>  			}
>  		}
> --
> 2.20.0.rc1
>
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5119ff846769..ccc22f21b735 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 						mn->ops->invalidate_range_start, _ret,
 						!blockable ? "non-" : "");
+				if (blockable)
+					pr_warn("%pS callback failure not allowed\n",
+						mn->ops->invalidate_range_start);
 				ret = _ret;
 			}
 		}
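For readers who want to poke at the rule outside the kernel, here is a minimal userspace sketch of the check the hunk adds. It is an illustration only, not the kernel code: the callback type `invalidate_start_fn`, the `notify_result` struct, the `notify_range_start()` helper and the three example callbacks are all hypothetical names, and `fprintf(stderr, ...)` stands in for `pr_info`/`pr_warn`. The point it tries to show is that every failure is reported at info level, while only a failure in blockable context counts as a bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a notifier callback (not the kernel signature):
 * returns 0 on success, a negative errno value on failure. */
typedef int (*invalidate_start_fn)(bool blockable);

/* Outcome of one simulated pass over all registered callbacks. */
struct notify_result {
	int ret;      /* last non-zero callback return value, or 0 */
	int warnings; /* failures that happened in blockable context */
};

static struct notify_result
notify_range_start(const invalidate_start_fn *cbs, size_t n, bool blockable)
{
	struct notify_result res = { 0, 0 };

	assert(cbs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int _ret = cbs[i](blockable);

		if (!_ret)
			continue;

		/* Stands in for the existing pr_info: emitted for every failure. */
		fprintf(stderr,
			"info: callback %zu failed with %d in %sblockable context.\n",
			i, _ret, !blockable ? "non-" : "");

		/* Stands in for the added pr_warn: failing is only a bug
		 * when the callback was allowed to block. */
		if (blockable) {
			fprintf(stderr, "warn: callback %zu failure not allowed\n", i);
			res.warnings++;
		}
		res.ret = _ret;
	}
	return res;
}

/* Always succeeds. */
static int cb_ok(bool blockable)
{
	(void)blockable;
	return 0;
}

/* Well-behaved: fails only when it is not allowed to block. */
static int cb_would_block(bool blockable)
{
	return blockable ? 0 : -EAGAIN;
}

/* Buggy: fails even though blocking is allowed. */
static int cb_buggy(bool blockable)
{
	(void)blockable;
	return -EAGAIN;
}
```

Passing `cb_would_block` with `blockable == false` produces an info line but no warning, which is the legitimate failure case; passing `cb_buggy` with `blockable == true` is the only combination that bumps `warnings`.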
Just a bit of paranoia, since if we start pushing this deep into
callchains it's hard to spot all places where an mmu notifier
implementation might fail when it's not allowed to.

Inspired by some confusion we had discussing i915 mmu notifiers and
whether we could use the newly-introduced return value to handle some
corner cases. Until we realized that these are only for when a task
has been killed by the oom reaper.

An alternative approach would be to split the callback into two
versions, one with the int return value, and the other with void
return value like in older kernels. But that's a lot more churn for
fairly little gain I think.

Summary from the m-l discussion on why we want something at warning
level: This allows automated tooling in CI to catch bugs without
humans having to look at everything. If we just upgrade the existing
pr_info to a pr_warn, then we'll have false positives. And as-is, no
one will ever spot the problem since it's lost in the massive amounts
of overall dmesg noise.

v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
the problematic case (Michal Hocko).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 3 +++
 1 file changed, 3 insertions(+)
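
The CI argument in the commit message can be made concrete with a rough sketch of a log filter that flags anything at warning level or more severe. This is not the tooling used by any actual CI system; it assumes log lines carry a syslog-style `<N>` priority prefix (as raw kernel log output does), and the helper names `line_loglevel()` and `count_warnings()` are made up for illustration. Warning is level 4 and info is level 6 in the kernel's log-level numbering, so an info-only message passes such a filter while a message printed with pr_warn does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Kernel log-level numbers: lower means more severe. */
#define LOGLEVEL_WARNING 4
#define LOGLEVEL_INFO    6

/* Parse a leading "<N>" priority prefix (assumed input format).
 * Returns the log level 0..7, or -1 if the line has no such prefix. */
static int line_loglevel(const char *line)
{
	char *end;
	long prio;

	assert(line != NULL);

	if (line[0] != '<')
		return -1;

	prio = strtol(line + 1, &end, 10);
	if (end == line + 1 || *end != '>' || prio < 0)
		return -1;

	/* The low three bits of a syslog priority hold the level. */
	return (int)(prio & 7);
}

/* Count lines logged at warning level or more severe. */
static size_t count_warnings(const char *const *lines, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		int level = line_loglevel(lines[i]);

		if (level >= 0 && level <= LOGLEVEL_WARNING)
			hits++;
	}
	return hits;
}
```

A filter like this stays quiet for the existing info-level message, which also fires for legitimate failures in non-blockable context, and only trips on the new warning. That matches the reasoning for not simply upgrading the existing pr_info to a pr_warn.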