
[2/9] xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false positives

Message ID 20161215140715.12732-3-mhocko@kernel.org (mailing list archive)
State New, archived

Commit Message

Michal Hocko Dec. 15, 2016, 2:07 p.m. UTC
From: Michal Hocko <mhocko@suse.com>

Now that the page allocator offers __GFP_NOLOCKDEP, let's introduce a
KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it,
also change the KM_NOFS users introduced by b17cb364dbbb ("xfs: fix
missing KM_NOFS tags to keep lockdep happy") to use the new flag
instead. There is really no reason to make these allocation contexts
weaker just because of lockdep, which might not even be enabled in
most cases.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/xfs/kmem.h                | 4 ++++
 fs/xfs/libxfs/xfs_da_btree.c | 4 ++--
 fs/xfs/xfs_buf.c             | 2 +-
 fs/xfs/xfs_dir2_readdir.c    | 2 +-
 4 files changed, 8 insertions(+), 4 deletions(-)

Comments

Dave Chinner Dec. 19, 2016, 9:24 p.m. UTC | #1
On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Now that the page allocator offers __GFP_NOLOCKDEP, let's introduce a
> KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it,
> also change the KM_NOFS users introduced by b17cb364dbbb ("xfs: fix
> missing KM_NOFS tags to keep lockdep happy") to use the new flag
> instead. There is really no reason to make these allocation contexts
> weaker just because of lockdep, which might not even be enabled in
> most cases.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

I'd suggest that it might be better to drop this patch for now -
it's not necessary for the context flag changeover but does
introduce a risk of regressions if the conversion is wrong.

Hence I think this is better as a completely separate series
which audits and changes all the unnecessary KM_NOFS allocations
in one go. I've never liked whack-a-mole style changes like this -
do it once, do it properly....

Cheers,

Dave.
Darrick J. Wong Dec. 19, 2016, 10:06 p.m. UTC | #2
On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote:
> On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Now that the page allocator offers __GFP_NOLOCKDEP, let's introduce a
> > KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it,
> > also change the KM_NOFS users introduced by b17cb364dbbb ("xfs: fix
> > missing KM_NOFS tags to keep lockdep happy") to use the new flag
> > instead. There is really no reason to make these allocation contexts
> > weaker just because of lockdep, which might not even be enabled in
> > most cases.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> I'd suggest that it might be better to drop this patch for now -
> it's not necessary for the context flag changeover but does
> introduce a risk of regressions if the conversion is wrong.

I was just about to write in to say that while I didn't see anything
obviously wrong with the NOFS removals, I also don't know for sure that
we can't end up recursing into those code paths (specifically the
directory traversal thing).

--D

> Hence I think this is better as a completely separate series
> which audits and changes all the unnecessary KM_NOFS allocations
> in one go. I've never liked whack-a-mole style changes like this -
> do it once, do it properly....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Michal Hocko Dec. 20, 2016, 8:38 a.m. UTC | #3
On Tue 20-12-16 08:24:13, Dave Chinner wrote:
> On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Now that the page allocator offers __GFP_NOLOCKDEP, let's introduce a
> > KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it,
> > also change the KM_NOFS users introduced by b17cb364dbbb ("xfs: fix
> > missing KM_NOFS tags to keep lockdep happy") to use the new flag
> > instead. There is really no reason to make these allocation contexts
> > weaker just because of lockdep, which might not even be enabled in
> > most cases.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> I'd suggest that it might be better to drop this patch for now -
> it's not necessary for the context flag changeover but does
> introduce a risk of regressions if the conversion is wrong.
> 
> Hence I think this is better as a completely separate series
> which audits and changes all the unnecessary KM_NOFS allocations
> in one go. I've never liked whack-a-mole style changes like this -
> do it once, do it properly....

OK, fair enough. I thought it might be better to have an example user so
that others can follow, but as you say, the risk of regression is real
and these kinds of changes definitely need a thorough review.

I am not sure I will be able to post more of those changes because that
requires intimate knowledge of the fs, so I hope somebody can take
over and follow up.

Thanks!
Dave Chinner Dec. 20, 2016, 9:39 p.m. UTC | #4
On Mon, Dec 19, 2016 at 02:06:19PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 20, 2016 at 08:24:13AM +1100, Dave Chinner wrote:
> > On Thu, Dec 15, 2016 at 03:07:08PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > Now that the page allocator offers __GFP_NOLOCKDEP, let's introduce a
> > > KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it,
> > > also change the KM_NOFS users introduced by b17cb364dbbb ("xfs: fix
> > > missing KM_NOFS tags to keep lockdep happy") to use the new flag
> > > instead. There is really no reason to make these allocation contexts
> > > weaker just because of lockdep, which might not even be enabled in
> > > most cases.
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > 
> > I'd suggest that it might be better to drop this patch for now -
> > it's not necessary for the context flag changeover but does
> > introduce a risk of regressions if the conversion is wrong.
> 
> I was just about to write in to say that while I didn't see anything
> obviously wrong with the NOFS removals, I also don't know for sure that
> we can't end up recursing into those code paths (specifically the
> directory traversal thing).

The issue is with code paths that can be called from both inside and
outside transaction context - lockdep complains when it sees an
allocation path that is used with both GFP_NOFS and GFP_KERNEL
context, as it doesn't know whether the GFP_KERNEL usage is safe.

So things like the directory buffer path, which can be called from
readdir without a transaction context, have various KM_NOFS flags
scattered through it so that lockdep doesn't get all upset every
time readdir is called...

There are other cases like this - btree manipulation via bunmapi()
can be called without transaction context to remove delayed alloc
extents, and that puts all of the btree cursor and incore extent
list handling in the same boat (all those allocations are KM_NOFS),
etc.

So it's not really recursion that is the problem here - it's
different allocation contexts that lockdep can't know about unless
it's told about them. We've done that with KM_NOFS in the past; in
future we should use this KM_NOLOCKDEP flag, though I'd prefer a
better name for it, e.g. KM_NOTRANS to indicate that the allocation
can occur both inside and outside of transaction context....

Cheers,

Dave.

Patch

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 689f746224e7..ea3984091d58 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -33,6 +33,7 @@  typedef unsigned __bitwise xfs_km_flags_t;
 #define KM_NOFS		((__force xfs_km_flags_t)0x0004u)
 #define KM_MAYFAIL	((__force xfs_km_flags_t)0x0008u)
 #define KM_ZERO		((__force xfs_km_flags_t)0x0010u)
+#define KM_NOLOCKDEP	((__force xfs_km_flags_t)0x0020u)
 
 /*
  * We use a special process flag to avoid recursive callbacks into
@@ -57,6 +58,9 @@  kmem_flags_convert(xfs_km_flags_t flags)
 	if (flags & KM_ZERO)
 		lflags |= __GFP_ZERO;
 
+	if (flags & KM_NOLOCKDEP)
+		lflags |= __GFP_NOLOCKDEP;
+
 	return lflags;
 }
 
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index f2dc1a950c85..b8b5f6914863 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2429,7 +2429,7 @@  xfs_buf_map_from_irec(
 
 	if (nirecs > 1) {
 		map = kmem_zalloc(nirecs * sizeof(struct xfs_buf_map),
-				  KM_SLEEP | KM_NOFS);
+				  KM_SLEEP | KM_NOLOCKDEP);
 		if (!map)
 			return -ENOMEM;
 		*mapp = map;
@@ -2488,7 +2488,7 @@  xfs_dabuf_map(
 		 */
 		if (nfsb != 1)
 			irecs = kmem_zalloc(sizeof(irec) * nfsb,
-					    KM_SLEEP | KM_NOFS);
+					    KM_SLEEP | KM_NOLOCKDEP);
 
 		nirecs = nfsb;
 		error = xfs_bmapi_read(dp, (xfs_fileoff_t)bno, nfsb, irecs,
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 7f0a01f7b592..f31ae592dcae 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1785,7 +1785,7 @@  xfs_alloc_buftarg(
 {
 	xfs_buftarg_t		*btp;
 
-	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP | KM_NOFS);
+	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP | KM_NOLOCKDEP);
 
 	btp->bt_mount = mp;
 	btp->bt_dev =  bdev->bd_dev;
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index 003a99b83bd8..033ed65d7ce6 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -503,7 +503,7 @@  xfs_dir2_leaf_getdents(
 	length = howmany(bufsize + geo->blksize, (1 << geo->fsblog));
 	map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
 				(length * sizeof(struct xfs_bmbt_irec)),
-			       KM_SLEEP | KM_NOFS);
+			       KM_SLEEP | KM_NOLOCKDEP);
 	map_info->map_size = length;
 
 	/*