diff mbox

btrfs list corruption and soft lockups while testing writeback error handling

Message ID 1494591145.2787.5.camel@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jeff Layton May 12, 2017, 12:12 p.m. UTC
On Thu, 2017-05-11 at 15:56 -0400, Chris Mason wrote:
> On 05/11/2017 03:52 PM, Jeff Layton wrote:
> > On Thu, 2017-05-11 at 07:13 -0400, Jeff Layton wrote:
> > > I finally got my writeback error handling test to work on btrfs (thanks,
> > > Chris!), by making the filesystem stripe the data and mirror the
> > > metadata across two devices. The test passes now, but on one run, I got
> > > the following list corruption warning and then a soft lockup (which is
> > > probably fallout from the list corruption).
> > > 
> > > I ran the test several times before and since then without this failure,
> > > so I don't have a clear reproducer. The kernel in this instance is
> > > basically a v4.11 kernel with my pile of writeback error handling
> > > patches on top:
> > > 
> > >     https://urldefense.proofpoint.com/v2/url?u=https-3A__git.samba.org_-3Fp-3Djlayton_linux.git-3Ba-3Dshortlog-3Bh-3Drefs_heads_wberr&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=9QPtTAxcitoznaWRKKHoEQ&m=BXXwaUFQNFNaGGFYHEVlvNBwkrXiIoH7K5iOdR_PvxM&s=xE6pIXeQ1rlaxAV8aTYBSiI06pb3WZoiRJW8Vo1L3NQ&e=
> > > 
> > > It may be that they are a contributing factor, but this smells more like
> > > a bug down in btrfs. Let me know if you need other info:
> 
> [ btrfs inode logging ]
> 
> > (cc'ing Liu Bo since we were discussing this earlier this week)
> > 
> > I can't reproduce this on stock v4.11, so I think this is a bug in my
> > series.
> > 
> > I think this is due to the differences in how errors are being reported
> > from filemap_fdatawait_range now causing some transactions to end up
> > being freed while they're still on the log_ctxs list. I'm working on
> > hunting down the problem now.
> > 
> > Sorry for the noise!
> > 
> 
> There's a list in the inode logging code that we consistently seem to 
> find list debugging assertions with.  We've fixed up all the known 
> issues, but I wouldn't be surprised if we've got a goto fail in there.
> 
> I'll take a look ;)
> 

Thanks. I'm running test 999 here in a loop to reproduce it on a kernel
with my patch series applied:

https://git.samba.org/?p=jlayton/xfstests.git;a=shortlog;h=refs/heads/wberr

The patch below seems to prevent it from crashing, but I'm not at all
sure that this is a correct fix. Still, I think that the way errors are
tracked within btrfs might need some rework around errseq_t's. In
principle, it could make things even simpler now that we don't need to
worry about resetting errors that have been cleared, etc...

--------------------8<--------------------------

[PATCH] btrfs: make btrfs_log_ctx->io_err an errseq_t

The btrfs_log_ctx has an io_err field in it that gets an error stored
in it when there is an I/O error. The way this is done now requires a
lot of extra machinery. Instead, convert it over to using errseq_t
to tell whether there was an error since the context was initialized.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/btrfs/file.c     |  7 ++++---
 fs/btrfs/tree-log.c | 19 -------------------
 fs/btrfs/tree-log.h |  2 ++
 3 files changed, 6 insertions(+), 22 deletions(-)
diff mbox

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e15faf240b51..c4afbf556e3a 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1959,7 +1959,7 @@  int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_trans_handle *trans;
 	struct btrfs_log_ctx ctx;
-	int ret = 0;
+	int ret = 0, wb_ret;
 	bool full_sync = 0;
 	u64 len;
 	errseq_t wb_since = READ_ONCE(file->f_wb_err);
@@ -2143,9 +2143,10 @@  int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * therefore we need to check for errors in the ordered operations,
 	 * which are indicated by ctx.io_err.
 	 */
-	if (ctx.io_err) {
+	wb_ret = filemap_check_wb_error(inode->i_mapping, ctx.io_err);
+	if (wb_ret) {
+		ret = wb_ret;
 		btrfs_end_transaction(trans);
-		ret = ctx.io_err;
 		goto out;
 	}
 
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index d0a123dbb199..da414e488c4b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4079,11 +4079,6 @@  static int log_one_extent(struct btrfs_trans_handle *trans,
 	if (ret)
 		return ret;
 
-	if (ordered_io_err) {
-		ctx->io_err = -EIO;
-		return 0;
-	}
-
 	btrfs_init_map_token(&token);
 
 	ret = __btrfs_drop_extents(trans, log, &inode->vfs_inode, path, em->start,
@@ -4165,7 +4160,6 @@  static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 	u64 test_gen;
 	int ret = 0;
 	int num = 0;
-	errseq_t since = filemap_sample_wb_error(inode->vfs_inode.i_mapping);
 
 	INIT_LIST_HEAD(&extents);
 
@@ -4199,19 +4193,6 @@  static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 
 	list_sort(NULL, &extents, extent_cmp);
 	btrfs_get_logged_extents(inode, logged_list, start, end);
-	/*
-	 * Some ordered extents started by fsync might have completed
-	 * before we could collect them into the list logged_list, which
-	 * means they're gone, not in our logged_list nor in the inode's
-	 * ordered tree. We want the application/user space to know an
-	 * error happened while attempting to persist file data so that
-	 * it can take proper action. If such error happened, we leave
-	 * without writing to the log tree and the fsync must report the
-	 * file data write error and not commit the current transaction.
-	 */
-	ret = filemap_check_wb_error(inode->vfs_inode.i_mapping, since);
-	if (ret)
-		ctx->io_err = ret;
 process:
 	while (!list_empty(&extents)) {
 		em = list_entry(extents.next, struct extent_map, list);
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index 483027f9a7f4..97a6143842a4 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -42,6 +42,8 @@  static inline void btrfs_init_log_ctx(struct btrfs_log_ctx *ctx,
 	ctx->io_err = 0;
 	ctx->log_new_dentries = false;
 	ctx->inode = inode;
+	if (inode)
+		ctx->io_err = filemap_sample_wb_error(inode->i_mapping);
 	INIT_LIST_HEAD(&ctx->list);
 }