
[2/2] mm, debug: report when GFP_NO{FS,IO} is used explicitly from memalloc_no{fs,io}_{save,restore} context

Message ID 1461671772-1269-3-git-send-email-mhocko@kernel.org (mailing list archive)
State New, archived

Commit Message

Michal Hocko April 26, 2016, 11:56 a.m. UTC
From: Michal Hocko <mhocko@suse.com>

THIS PATCH IS FOR TESTING ONLY AND NOT MEANT TO HIT LINUS TREE

It is desirable to reduce direct GFP_NO{FS,IO} usage to a minimum and
prefer the scope usage defined by the memalloc_no{fs,io}_{save,restore} API.

Let's help this process along and add a debugging tool to catch explicit
allocation requests for GFP_NO{FS,IO} that are made from within such a
scope context. The printed stack trace should help to identify the caller
and evaluate whether it can be changed to use a wider context or whether
it is called from another potentially dangerous context which needs
scope protection as well.

The checks have to be enabled explicitly via the debug_scope_gfp kernel
command line parameter.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/page_alloc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)
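
For context, a minimal sketch of the scope usage the commit message refers
to. memalloc_noio_save/restore already exists; memalloc_nofs_save/restore is
what patch 1/2 of this series introduces, assumed here to return the previous
task flags. The function called inside the scope is a made-up placeholder:

#include <linux/sched.h>

static void my_fs_do_transaction_work(void);	/* placeholder, not a real helper */

/*
 * Scope-style protection: every allocation issued between save and restore
 * is implicitly treated as GFP_NOFS, so callees can keep using plain
 * GFP_KERNEL instead of passing GFP_NOFS explicitly.
 */
static void my_fs_run_transaction(void)
{
	unsigned int flags;

	flags = memalloc_nofs_save();
	my_fs_do_transaction_work();	/* may allocate with GFP_KERNEL */
	memalloc_nofs_restore(flags);
}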

Comments

Dave Chinner April 26, 2016, 10:58 p.m. UTC | #1
On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> THIS PATCH IS FOR TESTING ONLY AND NOT MEANT TO HIT LINUS TREE
> 
> It is desirable to reduce the direct GFP_NO{FS,IO} usage at minimum and
> prefer scope usage defined by memalloc_no{fs,io}_{save,restore} API.
> 
> Let's help this process and add a debugging tool to catch when an
> explicit allocation request for GFP_NO{FS,IO} is done from the scope
> context. The printed stacktrace should help to identify the caller
> and evaluate whether it can be changed to use a wider context or whether
> it is called from another potentially dangerous context which needs
> a scope protection as well.

You're going to get a large number of these from XFS. There are call
paths in XFS that get called both inside and outside transaction
context, and many of them are marked with GFP_NOFS to prevent issues
that have cropped up in the past.

Often these are to silence lockdep warnings (e.g. commit b17cb36
("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
lockdep gets very unhappy about the same functions being called with
different reclaim contexts. E.g. directory block mapping might
occur from readdir (no transaction context) or within transactions
(create/unlink). Hence paths like this are tagged with GFP_NOFS to
stop lockdep emitting false positive warnings....

Removing the GFP_NOFS flags in situations like this is simply going
to restart the flood of false positive lockdep warnings we've
silenced over the years, so perhaps lockdep needs to be made smarter
as well...
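
To make that concrete, a schematic example (not quoted from fs/xfs) of such
a dual-context path, using the kmem_zalloc()/kmem_free() helpers from
fs/xfs/kmem.h; the function name is made up, and the KM_NOFS tag is there
purely to keep lockdep quiet for the callers that hold no transaction:

/*
 * Schematic only: a helper reachable both from readdir (no transaction
 * held) and from create/unlink transactions.  KM_NOFS is used instead of
 * KM_SLEEP so lockdep does not treat the transaction-free callers as a
 * reclaim recursion risk, not because every caller needs NOFS semantics.
 */
static int xfs_dir_map_blocks_example(struct xfs_inode *dp, size_t len)
{
	void	*buf;

	buf = kmem_zalloc(len, KM_NOFS);
	if (!buf)
		return -ENOMEM;
	/* ... map the directory blocks for dp ... */
	kmem_free(buf);
	return 0;
}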

Cheers,

Dave.
Michal Hocko April 27, 2016, 8:03 a.m. UTC | #2
On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT TO HIT LINUS TREE
> > 
> > It is desirable to reduce the direct GFP_NO{FS,IO} usage at minimum and
> > prefer scope usage defined by memalloc_no{fs,io}_{save,restore} API.
> > 
> > Let's help this process and add a debugging tool to catch when an
> > explicit allocation request for GFP_NO{FS,IO} is done from the scope
> > context. The printed stacktrace should help to identify the caller
> > and evaluate whether it can be changed to use a wider context or whether
> > it is called from another potentially dangerous context which needs
> > a scope protection as well.
> 
> You're going to get a large number of these from XFS. There are call
> paths in XFs that get called both inside and outside transaction
> context, and many of them are marked with GFP_NOFS to prevent issues
> that have cropped up in the past.
> 
> Often these are to silence lockdep warnings (e.g. commit b17cb36
> ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> lockdep gets very unhappy about the same functions being called with
> different reclaim contexts. e.g.  directory block mapping might
> occur from readdir (no transaction context) or within transactions
> (create/unlink). hence paths like this are tagged with GFP_NOFS to
> stop lockdep emitting false positive warnings....

I would much rather see lockdep being fixed than abusing GFP_NOFS to
work around its limitations. GFP_NOFS has real consequences for
memory reclaim. I will go and check the commit you mentioned and try
to understand why that is a problem. From what you described above,
I would like to get rid of exactly this kind of usage, which is not
really needed for recursion protection.

Thanks!
Dave Chinner April 27, 2016, 10:55 p.m. UTC | #3
On Wed, Apr 27, 2016 at 10:03:11AM +0200, Michal Hocko wrote:
> On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > On Tue, Apr 26, 2016 at 01:56:12PM +0200, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > THIS PATCH IS FOR TESTING ONLY AND NOT MEANT TO HIT LINUS TREE
> > > 
> > > It is desirable to reduce the direct GFP_NO{FS,IO} usage at minimum and
> > > prefer scope usage defined by memalloc_no{fs,io}_{save,restore} API.
> > > 
> > > Let's help this process and add a debugging tool to catch when an
> > > explicit allocation request for GFP_NO{FS,IO} is done from the scope
> > > context. The printed stacktrace should help to identify the caller
> > > and evaluate whether it can be changed to use a wider context or whether
> > > it is called from another potentially dangerous context which needs
> > > a scope protection as well.
> > 
> > You're going to get a large number of these from XFS. There are call
> > paths in XFs that get called both inside and outside transaction
> > context, and many of them are marked with GFP_NOFS to prevent issues
> > that have cropped up in the past.
> > 
> > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > lockdep gets very unhappy about the same functions being called with
> > different reclaim contexts. e.g.  directory block mapping might
> > occur from readdir (no transaction context) or within transactions
> > (create/unlink). hence paths like this are tagged with GFP_NOFS to
> > stop lockdep emitting false positive warnings....
> 
> I would much rather see lockdep being fixed than abusing GFP_NOFS to
> workaround its limitations. GFP_NOFS has a real consequences to the
> memory reclaim. I will go and check the commit you mentioned and try
> to understand why that is a problem. From what you described above
> I would like to get rid of exactly this kind of usage which is not
> really needed for the recursion protection.

The problem is that every time we come across this, the answer is
"use lockdep annotations". Our lockdep annotations are freakin'
complex because of this, and more often than not lockdep false
positives occur due to bugs in the annotations. E.g. see
fs/xfs/xfs_inode.h for all the inode locking annotations we have to
use and the hoops we have to jump through because we are limited to
8 subclasses and we have to be able to annotate nested inode locks
5 deep in places (RENAME_WHITEOUT, thanks).

At one point, we had to reset lockdep classes for inodes in reclaim
so that they didn't throw lockdep false positives the moment an
inode was locked in a memory reclaim context. We had to change
locking to remove that problem (commit 4f59af7 ("xfs: remove iolock
lock classes"). Then there were all the problems with reclaim
triggering lockdep warnings on directory inodes - we had to add a
separate directory inode class for them, and even then we still need
GFP_NOFS in places to minimise reclaim noise (as per the above
commit).

Put simply: we've had to resort to designing locking and allocation
strategies around the limitations of lockdep annotations, as opposed
to what is actually possible or even optimal. I.e. when the choice
is a 2-minute fix to add GFP_NOFS in cases like this, versus another
week-long effort to rewrite the inode annotations (again) like this
one last year:

commit 0952c8183c1575a78dc416b5e168987ff98728bb
Author: Dave Chinner <dchinner@redhat.com>
Date:   Wed Aug 19 10:32:49 2015 +1000

    xfs: clean up inode lockdep annotations
    
    Lockdep annotations are a maintenance nightmare. Locking has to be
    modified to suit the limitations of the annotations, and we're
    always having to fix the annotations because they are unable to
    express the complexity of locking heirarchies correctly.
.....

It's a no-brainer to see why GFP_NOFS will be added to the
allocation in question. I've been saying for years that I consider
lockdep harmful - if you want to get rid of GFP_NOFS, then you're
going to need to sort out the lockdep reclaim annotation mess at the
same time...
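
For readers not familiar with the machinery being criticised here: lockdep
distinguishes nesting levels of the same lock class only through a small
integer subclass (at most 8 per class), passed via the *_nested() locking
primitives. A generic sketch of what such an annotation boils down to, not
XFS's actual helpers:

#include <linux/rwsem.h>

/*
 * Illustrative only: two locks of the same class taken in a fixed order,
 * each with an explicit lockdep subclass so the nesting is not reported
 * as a self-deadlock.  XFS hides this behind xfs_ilock() and the subclass
 * encoding in fs/xfs/xfs_inode.h.
 */
static void lock_parent_then_child(struct rw_semaphore *parent,
				   struct rw_semaphore *child)
{
	down_write_nested(parent, 0);	/* subclass 0 */
	down_write_nested(child, 1);	/* subclass 1 */
}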

Cheers,

Dave.
Michal Hocko April 28, 2016, 8:17 a.m. UTC | #4
[Trim the CC list]
On Wed 27-04-16 08:58:45, Dave Chinner wrote:
[...]
> Often these are to silence lockdep warnings (e.g. commit b17cb36
> ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> lockdep gets very unhappy about the same functions being called with
> different reclaim contexts. e.g.  directory block mapping might
> occur from readdir (no transaction context) or within transactions
> (create/unlink). hence paths like this are tagged with GFP_NOFS to
> stop lockdep emitting false positive warnings....

As already said in another email, I have tried to revert the above
commit and run it with some fs workloads but didn't manage
to hit any lockdep splats (after I fixed my bug in patch 1/2). I
have tried to find the reports which led to this commit but without
much success. Everything is from much earlier or later. Do you happen to
remember which loads triggered them, what they looked like, or have an
idea what to try in order to reproduce them? So far I have been trying
heavy parallel fs_mark and kernbench inside a tiny virtual machine, so
both of those have triggered direct reclaim all the time.

Thanks!
Dave Chinner April 28, 2016, 9:51 p.m. UTC | #5
On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> [Trim the CC list]
> On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> [...]
> > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > lockdep gets very unhappy about the same functions being called with
> > different reclaim contexts. e.g.  directory block mapping might
> > occur from readdir (no transaction context) or within transactions
> > (create/unlink). hence paths like this are tagged with GFP_NOFS to
> > stop lockdep emitting false positive warnings....
> 
> As already said in other email, I have tried to revert the above
> commit and tried to run it with some fs workloads but didn't manage
> to hit any lockdep splats (after I fixed my bug in the patch 1.2). I
> have tried to find reports which led to this commit but didn't succeed
> much. Everything is from much earlier or later. Do you happen to
> remember which loads triggered them, what they looked like or have an
> idea what to try to reproduce them? So far I was trying heavy parallel
> fs_mark, kernbench inside a tiny virtual machine so any of those have
> triggered direct reclaim all the time.

Most of those issues were reported by users and not reproducible by
any obvious means. They may have been fixed since, but I'm sceptical
of that because, generally speaking, developer testing only catches
the obvious lockdep issues. I.e. it's users that report all the
really twisty issues, and they are generally not reproducible except
under their production workloads...

IOWs, the absence of reports in your testing does not mean there
isn't a problem, and that is one of the biggest problems with
lockdep annotations - we have no way of ever knowing if they are
still necessary or not without exposing users to regressions and
potential deadlocks.....

Cheers,

Dave.
Michal Hocko April 29, 2016, 12:12 p.m. UTC | #6
On Fri 29-04-16 07:51:45, Dave Chinner wrote:
> On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> > [Trim the CC list]
> > On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > [...]
> > > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > > lockdep gets very unhappy about the same functions being called with
> > > different reclaim contexts. e.g.  directory block mapping might
> > > occur from readdir (no transaction context) or within transactions
> > > (create/unlink). hence paths like this are tagged with GFP_NOFS to
> > > stop lockdep emitting false positive warnings....
> > 
> > As already said in other email, I have tried to revert the above
> > commit and tried to run it with some fs workloads but didn't manage
> > to hit any lockdep splats (after I fixed my bug in the patch 1.2). I
> > have tried to find reports which led to this commit but didn't succeed
> > much. Everything is from much earlier or later. Do you happen to
> > remember which loads triggered them, what they looked like or have an
> > idea what to try to reproduce them? So far I was trying heavy parallel
> > fs_mark, kernbench inside a tiny virtual machine so any of those have
> > triggered direct reclaim all the time.
> 
> Most of those issues were reported by users and not reproducable by
> any obvious means.

I would really appreciate a reference to some of those (my google-fu has
failed me) or at least a pattern of those splats - was it 
"inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
or a different class of report?

> They may have been fixed since, but I'm sceptical
> of that because, generally speaking, developer testing only catches
> the obvious lockdep issues. i.e. it's users that report all the
> really twisty issues, and they are generally not reproducable except
> under their production workloads...
> 
> IOWs, the absence of reports in your testing does not mean there
> isn't a problem, and that is one of the biggest problems with
> lockdep annotations - we have no way of ever knowing if they are
> still necessary or not without exposing users to regressions and
> potential deadlocks.....

I understand your points here, but if we are sure that those lockdep
reports are just false positives then we should rather provide an API to
silence lockdep for those paths than abuse GFP_NOFS, which a) hurts
overall reclaim health and b) works around a problem that does not even
exist when lockdep is disabled, which is the vast majority of
configurations.

Thanks!
Dave Chinner April 29, 2016, 11:40 p.m. UTC | #7
On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
> On Fri 29-04-16 07:51:45, Dave Chinner wrote:
> > On Thu, Apr 28, 2016 at 10:17:59AM +0200, Michal Hocko wrote:
> > > [Trim the CC list]
> > > On Wed 27-04-16 08:58:45, Dave Chinner wrote:
> > > [...]
> > > > Often these are to silence lockdep warnings (e.g. commit b17cb36
> > > > ("xfs: fix missing KM_NOFS tags to keep lockdep happy")) because
> > > > lockdep gets very unhappy about the same functions being called with
> > > > different reclaim contexts. e.g.  directory block mapping might
> > > > occur from readdir (no transaction context) or within transactions
> > > > (create/unlink). hence paths like this are tagged with GFP_NOFS to
> > > > stop lockdep emitting false positive warnings....
> > > 
> > > As already said in other email, I have tried to revert the above
> > > commit and tried to run it with some fs workloads but didn't manage
> > > to hit any lockdep splats (after I fixed my bug in the patch 1.2). I
> > > have tried to find reports which led to this commit but didn't succeed
> > > much. Everything is from much earlier or later. Do you happen to
> > > remember which loads triggered them, what they looked like or have an
> > > idea what to try to reproduce them? So far I was trying heavy parallel
> > > fs_mark, kernbench inside a tiny virtual machine so any of those have
> > > triggered direct reclaim all the time.
> > 
> > Most of those issues were reported by users and not reproducable by
> > any obvious means.
> 
> I would really appreciate a reference to some of those (my google-fu has
> failed me) or at least a pattern of those splats

If you can't find them with google, then I won't. Google is mostly
useless as a patch/mailing list search tool these days. You can try
looking through this list:

https://www.google.com.au/search?q=XFS+lockdep+site:oss.sgi.com+-splice

but I'm not seeing anything particularly relevant in that list -
there isn't a single reclaim related lockdep report in that...

> - was it 
> "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
> or a different class reports?

Typically that was involved, but quite often there'd be a number
of locks and sometimes even interrupt stacks in an interaction
between 5 or 6 different processes. Lockdep covers all sorts of
stuff now (like fs freeze annotations as well as locks and memory
reclaim) so sometimes the only thing we can do is remove the
reclaim context from the stack and see if that makes it go away...
> 
> > They may have been fixed since, but I'm sceptical
> > of that because, generally speaking, developer testing only catches
> > the obvious lockdep issues. i.e. it's users that report all the
> > really twisty issues, and they are generally not reproducable except
> > under their production workloads...
> > 
> > IOWs, the absence of reports in your testing does not mean there
> > isn't a problem, and that is one of the biggest problems with
> > lockdep annotations - we have no way of ever knowing if they are
> > still necessary or not without exposing users to regressions and
> > potential deadlocks.....
> 
> I understand your points here but if we are sure that those lockdep
> reports are just false positives then we should rather provide an api to
> silence lockdep for those paths

I agree with this - please provide such infrastructure before we
need it...

> than abusing GFP_NOFS which a) hurts
> the overal reclaim healthiness

Which doesn't actually seem to be a problem for the vast majority of
users.

> and b) works around a non-existing
> problem with lockdep disabled which is the vast majority of
> configurations.

But the moment we have a lockdep problem, we get bug reports from
all over the place and people complaining about it, so we are
*required* to silence them one way or another. And, like I said,
when the choice is simply adding GFP_NOFS or spending a week or two
completely reworking complex code that has functioned correctly for
15 years, the risk/reward *always* falls on the side of "just add
GFP_NOFS".

Please keep in mind that there is as much code in fs/xfs as there is
in the mm/ subsystem, and XFS has twice that in userspace as well.
I say this because we only have 3-4 full-time developers to do
all the work required on this code base, unlike the mm/ subsystem
which had 30-40 full time MM developers attending LSFMM. This is why
I push back on suggestions that require significant redesign of
subsystem code to handle memory allocation/reclaim quirks - most
subsystems simply don't have the resources available to do such
work, and so will always look for the quick 2 minute fix when it is
available....

Cheers,

Dave.
Michal Hocko May 3, 2016, 3:38 p.m. UTC | #8
On Sat 30-04-16 09:40:08, Dave Chinner wrote:
> On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
[...]
> > - was it 
> > "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
> > or a different class reports?
> 
> Typically that was involved, but it quite often there'd be a number
> of locks and sometimes even interrupt stacks in an interaction
> between 5 or 6 different processes. Lockdep covers all sorts of
> stuff now (like fs freeze annotations as well as locks and memory
> reclaim) so sometimes the only thing we can do is remove the
> reclaim context from the stack and see if that makes it go away...

That is what I was thinking of. lockdep_reclaim_{disable,enable} or
something like that to tell __lockdep_trace_alloc to skip
mark_held_locks(). This would effectively help to get rid of
reclaim-specific reports. It is hard to tell whether there would be
others, though.
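
A purely hypothetical sketch of what such an interface could look like; none
of these names exist, and the task_struct field is invented here only to
illustrate the idea of a per-task switch consulted before held locks are
marked:

#include <linux/sched.h>

/* hypothetical helpers -- nothing below exists in the kernel */
static inline void lockdep_reclaim_disable(void)
{
	current->lockdep_reclaim_off = 1;	/* invented task_struct field */
}

static inline void lockdep_reclaim_enable(void)
{
	current->lockdep_reclaim_off = 0;
}

__lockdep_trace_alloc() would then return early for tasks with the flag set,
much like it already does for allocations without __GFP_FS, instead of
calling mark_held_locks().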

> > > They may have been fixed since, but I'm sceptical
> > > of that because, generally speaking, developer testing only catches
> > > the obvious lockdep issues. i.e. it's users that report all the
> > > really twisty issues, and they are generally not reproducable except
> > > under their production workloads...
> > > 
> > > IOWs, the absence of reports in your testing does not mean there
> > > isn't a problem, and that is one of the biggest problems with
> > > lockdep annotations - we have no way of ever knowing if they are
> > > still necessary or not without exposing users to regressions and
> > > potential deadlocks.....
> > 
> > I understand your points here but if we are sure that those lockdep
> > reports are just false positives then we should rather provide an api to
> > silence lockdep for those paths
> 
> I agree with this - please provide such infrastructure before we
> need it...

Do you think a reclaim specific lockdep annotation would be sufficient?

> > than abusing GFP_NOFS which a) hurts
> > the overal reclaim healthiness
> 
> Which doesn't actually seem to be a problem for the vast majority of
> users.

Yes, most users are OK. But those allocations can be triggered from
userspace (read: a malicious user) quite easily and be harmful without a
good way to contain them.
 
> > and b) works around a non-existing
> > problem with lockdep disabled which is the vast majority of
> > configurations.
> 
> But the moment we have a lockdep problem, we get bug reports from
> all over the place and people complaining about it, so we are
> *required* to silence them one way or another. And, like I said,
> when the choice is simply adding GFP_NOFS or spending a week or two
> completely reworking complex code that has functioned correctly for
> 15 years, the risk/reward *always* falls on the side of "just add
> GFP_NOFS".
> 
> Please keep in mind that there is as much code in fs/xfs as there is
> in the mm/ subsystem, and XFS has twice that in userspace as well.
> I say this, because we have only have 3-4 full time developers to do
> all the work required on this code base, unlike the mm/ subsystem
> which had 30-40 full time MM developers attending LSFMM. This is why
> I push back on suggestions that require significant redesign of
> subsystem code to handle memory allocation/reclaim quirks - most
> subsystems simply don't have the resources available to do such
> work, and so will always look for the quick 2 minute fix when it is
> available....

I do understand your concerns and I am really not asking you to redesign
your code. I would like to make the code more maintainable, and reducing the
number of (undocumented) GFP_NOFS usages to a minimum seems like
a first step. Right now the direct usage of GFP_NOFS (resp. KM_NOFS) in xfs
is not that large. If we can reduce the few instances which use the
flag only to silence lockdep and replace them with a better annotation, then
I think this would be an improvement as well. If we can go one step
further and get rid of mapping_set_gfp_mask(inode->i_mapping,
(gfp_mask & ~(__GFP_FS))) then I would be even happier.

I think other filesystems, and code which interacts with the FS layer, need
many more changes than xfs, to be honest.
Dave Chinner May 4, 2016, 12:07 a.m. UTC | #9
On Tue, May 03, 2016 at 05:38:23PM +0200, Michal Hocko wrote:
> On Sat 30-04-16 09:40:08, Dave Chinner wrote:
> > On Fri, Apr 29, 2016 at 02:12:20PM +0200, Michal Hocko wrote:
> [...]
> > > - was it 
> > > "inconsistent {RECLAIM_FS-ON-[RW]} -> {IN-RECLAIM_FS-[WR]} usage"
> > > or a different class reports?
> > 
> > Typically that was involved, but it quite often there'd be a number
> > of locks and sometimes even interrupt stacks in an interaction
> > between 5 or 6 different processes. Lockdep covers all sorts of
> > stuff now (like fs freeze annotations as well as locks and memory
> > reclaim) so sometimes the only thing we can do is remove the
> > reclaim context from the stack and see if that makes it go away...
> 
> That is what I was thinking of. lockdep_reclaim_{disable,enable} or
> something like that to tell __lockdep_trace_alloc to skip
> mark_held_locks(). This would effectively help to get rid of
> reclaim-specific reports. It is hard to tell whether there would be
> others, though.

Yeah, though I suspect this would get messy, having to scatter it
around the code. I can encapsulate it via internal XFS KM flags,
though, so I don't think that will be a real issue.

> > > > They may have been fixed since, but I'm sceptical
> > > > of that because, generally speaking, developer testing only catches
> > > > the obvious lockdep issues. i.e. it's users that report all the
> > > > really twisty issues, and they are generally not reproducable except
> > > > under their production workloads...
> > > > 
> > > > IOWs, the absence of reports in your testing does not mean there
> > > > isn't a problem, and that is one of the biggest problems with
> > > > lockdep annotations - we have no way of ever knowing if they are
> > > > still necessary or not without exposing users to regressions and
> > > > potential deadlocks.....
> > > 
> > > I understand your points here but if we are sure that those lockdep
> > > reports are just false positives then we should rather provide an api to
> > > silence lockdep for those paths
> > 
> > I agree with this - please provide such infrastructure before we
> > need it...
> 
> Do you think a reclaim specific lockdep annotation would be sufficient?

It will help - it'll take some time to work through all the explicit
KM_NOFS calls in XFS, though, to determine if they are just working
around lockdep false positives or some other potential problem....

> I do understand your concerns and I really do not ask you to redesign
> your code. I would like make the code more maintainable and reducing the
> number of (undocumented) GFP_NOFS usage to the minimum seems to be like
> a first step. Now the direct usage of GFP_NOFS (resp. KM_NOFS) in xfs is
> not that large.

That's true, and if we can reduce them to real cases of GFP_NOFS
being needed vs annotations to silence lockdep false positives we'll
then know what problems we really need to fix...

Cheers,

Dave.

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 86bb5d6ddd7d..085d00280496 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3750,6 +3750,61 @@  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
+static bool debug_scope_gfp;
+
+static int __init enable_debug_scope_gfp(char *unused)
+{
+	debug_scope_gfp = true;
+	return 0;
+}
+
+/*
+ * Print the stack trace if the given gfp_mask clears flags which are already
+ * cleared context-wide. Such a caller can remove the special flag clearing
+ * and rely on the context-wide mask.
+ */
+static inline void debug_scope_gfp_context(gfp_t gfp_mask)
+{
+	gfp_t restrict_mask;
+
+	if (likely(!debug_scope_gfp))
+		return;
+
+	/* both NOFS, NOIO are irrelevant when direct reclaim is disabled */
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
+		return;
+
+	if (current->flags & PF_MEMALLOC_NOIO)
+		restrict_mask = __GFP_IO;
+	else if ((current->flags & PF_MEMALLOC_NOFS) && (gfp_mask & __GFP_IO))
+		restrict_mask = __GFP_FS;
+	else
+		return;
+
+	if ((gfp_mask & restrict_mask) != restrict_mask) {
+		/*
+		 * If you see this warning then the code does:
+		 * memalloc_no{fs,io}_save()
+		 * ...
+		 *    foo()
+		 *      alloc_page(GFP_NO{FS,IO})
+		 * ...
+		 * memalloc_no{fs,io}_restore()
+		 *
+		 * i.e. the explicit GFP_NO{FS,IO} in the allocation above is
+		 * unnecessary because the scope gfp context already applies the
+		 * restriction to all allocation requests. If foo() is called from
+		 * multiple contexts then make sure the other contexts are safe
+		 * wrt. the GFP_NO{FS,IO} semantic and either add scope protection
+		 * to those paths or change the gfp mask to GFP_KERNEL.
+		 */
+		pr_info("Unnecessarily specific gfp mask:%#x(%pGg) for the %s task wide context\n", gfp_mask, &gfp_mask,
+				(current->flags & PF_MEMALLOC_NOIO) ? "NOIO" : "NOFS");
+		dump_stack();
+	}
+}
+early_param("debug_scope_gfp", enable_debug_scope_gfp);
+
 /*
  * This is the 'heart' of the zoned buddy allocator.
  */
@@ -3796,6 +3851,7 @@  __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 				ac.nodemask);
 
 	/* First allocation attempt */
+	debug_scope_gfp_context(gfp_mask);
 	page = get_page_from_freelist(alloc_mask, order, alloc_flags, &ac);
 	if (likely(page))
 		goto out;
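
For reference, a hedged sketch of the caller pattern the check above is meant
to flag once the kernel is booted with debug_scope_gfp on the command line
(memalloc_nofs_save/restore comes from patch 1/2 of this series; the function
itself is illustrative only):

#include <linux/gfp.h>
#include <linux/sched.h>

static void example_redundant_nofs(void)
{
	unsigned int flags;
	struct page *page;

	flags = memalloc_nofs_save();
	/*
	 * Redundant: the scope above already strips __GFP_FS from every
	 * allocation this task makes, so this explicit GFP_NOFS would now
	 * hit the pr_info() + dump_stack() in debug_scope_gfp_context().
	 */
	page = alloc_page(GFP_NOFS);
	if (page)
		__free_page(page);
	memalloc_nofs_restore(flags);
}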