diff mbox

[v7,16/22] block: convert to errseq_t based writeback error tracking

Message ID 20170616193427.13955-17-jlayton@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jeff Layton June 16, 2017, 7:34 p.m. UTC
This is a very minimal conversion to errseq_t based error tracking
for raw block device access.

Only real change that is strictly required is that we must
unconditionally call filemap_report_wb_err in blkdev_fsync.
That ensures that the file's errseq_t is always advanced to
the latest value in the mapping.

Note that there are internal callers that call sync_blockdev
and the like that are not affected by this. They'll continue
to use the AS_EIO/AS_ENOSPC flags for error reporting like
they always have for now.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/block_dev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig June 20, 2017, 12:35 p.m. UTC | #1
>  	error = filemap_write_and_wait_range(filp->f_mapping, start, end);
>  	if (error)
> -		return error;
> +		goto out;
>  
>  	/*
>  	 * There is no need to serialise calls to blkdev_issue_flush with
> @@ -640,6 +640,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
>  	if (error == -EOPNOTSUPP)
>  		error = 0;
>  
> +out:
> +	wberr = filemap_report_wb_err(filp);
> +	if (!error)
> +		error = wberr;

Just curious: what's the reason filemap_write_and_wait_range couldn't
query for the error using filemap_report_wb_err internally?
Jeff Layton June 20, 2017, 5:44 p.m. UTC | #2
On Tue, 2017-06-20 at 05:35 -0700, Christoph Hellwig wrote:
> >  	error = filemap_write_and_wait_range(filp->f_mapping, start, end);
> >  	if (error)
> > -		return error;
> > +		goto out;
> >  
> >  	/*
> >  	 * There is no need to serialise calls to blkdev_issue_flush with
> > @@ -640,6 +640,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
> >  	if (error == -EOPNOTSUPP)
> >  		error = 0;
> >  
> > +out:
> > +	wberr = filemap_report_wb_err(filp);
> > +	if (!error)
> > +		error = wberr;
> 
> Just curious: what's the reason filemap_write_and_wait_range couldn't
> query for the error using filemap_report_wb_err internally?

In order to query for errors with errseq_t, you need a previously-
sampled point from which to check. When you call
filemap_write_and_wait_range though you don't have a struct file and so
no previously-sampled value.
Christoph Hellwig June 24, 2017, 11:59 a.m. UTC | #3
On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> In order to query for errors with errseq_t, you need a previously-
> sampled point from which to check. When you call
> filemap_write_and_wait_range though you don't have a struct file and so
> no previously-sampled value.

So can we simply introduce variants of them that take a struct file?
That would be:

 a) less churn
 b) less code
 c) less chance to get data integrity wrong
Jeff Layton June 24, 2017, 1:16 p.m. UTC | #4
On Sat, 2017-06-24 at 04:59 -0700, Christoph Hellwig wrote:
> On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> > In order to query for errors with errseq_t, you need a previously-
> > sampled point from which to check. When you call
> > filemap_write_and_wait_range though you don't have a struct file and so
> > no previously-sampled value.
> 
> So can we simply introduce variants of them that take a struct file?
> That would be:
> 
>  a) less churn
>  b) less code
>  c) less chance to get data integrity wrong

Yeah, I had that thought after I sent the reply to you earlier.

The main reason I didn't do that before was that I had myself convinced
that we needed to do the check_and_advance as late as possible in the
fsync process, after the metadata had been written.

Now that I think about it more, I think you're probably correct. As long
as we do the check and advance at some point after doing the
write_and_wait, we're fine here and shouldn't violate exactly once
semantics on the fsync return.
Jeff Layton June 26, 2017, 2:34 p.m. UTC | #5
On Sat, 2017-06-24 at 09:16 -0400, Jeff Layton wrote:
> On Sat, 2017-06-24 at 04:59 -0700, Christoph Hellwig wrote:
> > On Tue, Jun 20, 2017 at 01:44:44PM -0400, Jeff Layton wrote:
> > > In order to query for errors with errseq_t, you need a previously-
> > > sampled point from which to check. When you call
> > > filemap_write_and_wait_range though you don't have a struct file and so
> > > no previously-sampled value.
> > 
> > So can we simply introduce variants of them that take a struct file?
> > That would be:
> > 
> >  a) less churn
> >  b) less code
> >  c) less chance to get data integrity wrong
> 
> Yeah, I had that thought after I sent the reply to you earlier.
> 
> The main reason I didn't do that before was that I had myself convinced
> that we needed to do the check_and_advance as late as possible in the
> fsync process, after the metadata had been written.
> 
> Now that I think about it more, I think you're probably correct. As long
> as we do the check and advance at some point after doing the
> write_and_wait, we're fine here and shouldn't violate exactly once
> semantics on the fsync return.

So I have a file_write_and_wait_range now that should DTRT for this
patch.

The bigger question is -- what about more complex filesystems like
ext4?  There are a couple of cases where we can return -EIO or -EROFS on
fsync before filemap_write_and_wait_range is ever called. Like this one
for instance:

        if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
                return -EIO;

...and the EXT4_MF_FS_ABORTED case.

Are those conditions ever recoverable, such that a later fsync could
succeed? IOW, could I do a remount or something such that the existing
fds are left open and become usable again? 

If so, then we really ought to advance the errseq_t in the file when we
catch those cases as well. If we have to do that, then it probably makes
sense to leave the ext4 patch as-is.
Christoph Hellwig June 27, 2017, 3:20 p.m. UTC | #6
On Mon, Jun 26, 2017 at 10:34:18AM -0400, Jeff Layton wrote:
> The bigger question is -- what about more complex filesystems like
> ext4?  There are a couple of cases where we can return -EIO or -EROFS on
> fsync before filemap_write_and_wait_range is ever called. Like this one
> for instance:
> 
>         if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
>                 return -EIO;
> 
> ...and the EXT4_MF_FS_ABORTED case.
> 
> Are those conditions ever recoverable, such that a later fsync could
> succeed? IOW, could I do a remount or something such that the existing
> fds are left open and become usable again? 

This looks copied from the xfs forced shutdown code, and in that
case it's final and permanent - you'll need an unmount to
clear it.

> If so, then we really ought to advance the errseq_t in the file when we
> catch those cases as well. If we have to do that, then it probably makes
> sense to leave the ext4 patch as-is.

I think it can switch to the new file helper.
diff mbox

Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index dc839f8f0ba5..9e8e13b097ef 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -625,11 +625,11 @@  int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 {
 	struct inode *bd_inode = bdev_file_inode(filp);
 	struct block_device *bdev = I_BDEV(bd_inode);
-	int error;
+	int error, wberr;
 	
 	error = filemap_write_and_wait_range(filp->f_mapping, start, end);
 	if (error)
-		return error;
+		goto out;
 
 	/*
 	 * There is no need to serialise calls to blkdev_issue_flush with
@@ -640,6 +640,10 @@  int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	if (error == -EOPNOTSUPP)
 		error = 0;
 
+out:
+	wberr = filemap_report_wb_err(filp);
+	if (!error)
+		error = wberr;
 	return error;
 }
 EXPORT_SYMBOL(blkdev_fsync);