[v6,12/20] fs: add a new fstype flag to indicate how writeback errors are tracked
diff mbox

Message ID 20170612122316.13244-15-jlayton@redhat.com
State New
Headers show

Commit Message

Jeff Layton June 12, 2017, 12:23 p.m. UTC
Now that we have new infrastructure for handling writeback errors using
errseq_t, we need to convert the existing code to use it. We could
attempt to retrofit the old interfaces on top of the new, but there is
a conceptual disconnect here in the case of internal callers that
invoke filemap_fdatawait and the like.

When reporting writeback errors, we will always report errors that have
occurred since a particular point in time. With the old writeback error
reporting, the time we used was "since it was last tested/cleared" which
is entirely arbitrary and potentially racy. Now, we can report the
latest error that has occurred since an arbitrary point in time
(represented as a sampled errseq_t value).

This means that we need to touch each filesystem that calls
filemap_check_errors in some fashion and ensure that we establish sane
"since" values for those callers. But...some code is shared between
filesystems and needs to be able to handle both error tracking schemes.

Add a new FS_WB_ERRSEQ flag to the fstype. Later patches will set and
key off of that to decide what behavior should be used.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Christoph Hellwig June 12, 2017, 12:45 p.m. UTC | #1
On Mon, Jun 12, 2017 at 08:23:06AM -0400, Jeff Layton wrote:
> Add a new FS_WB_ERRSEQ flag to the fstype. Later patches will set and
> key off of that to decide what behavior should be used.

Please invert this so that only file systems that keep the old semantics
need a flag.
Jeff Layton June 13, 2017, 10:24 a.m. UTC | #2
On Mon, 2017-06-12 at 05:45 -0700, Christoph Hellwig wrote:
> On Mon, Jun 12, 2017 at 08:23:06AM -0400, Jeff Layton wrote:
> > Add a new FS_WB_ERRSEQ flag to the fstype. Later patches will set and
> > key off of that to decide what behavior should be used.
> 
> Please invert this so that only file systems that keep the old semantics
> need a flag.


That's definitely what I want for the endgame here. My plan was to add
this flag for now, and then eventually reverse it (or drop it) once all
or most filesystems are converted.

We can do it that way from the get-go if you like. It'll mean tossing in
 a patch add this flag to all filesystems that have an fsync operation
and that use the pagecache, and then gradually remove it from them as we
convert them.

Which method do you prefer?
Christoph Hellwig June 14, 2017, 6:47 a.m. UTC | #3
On Tue, Jun 13, 2017 at 06:24:32AM -0400, Jeff Layton wrote:
> That's definitely what I want for the endgame here. My plan was to add
> this flag for now, and then eventually reverse it (or drop it) once all
> or most filesystems are converted.
> 
> We can do it that way from the get-go if you like. It'll mean tossing in
>  a patch add this flag to all filesystems that have an fsync operation
> and that use the pagecache, and then gradually remove it from them as we
> convert them.
> 
> Which method do you prefer?

Please do it from the get-go.  Or in fact figure out if we can get
away without it entirely.  Moving the error reporting into ->fsync
should help greatly with that, so what's missing after that?
Jeff Layton June 14, 2017, 5:24 p.m. UTC | #4
On Tue, 2017-06-13 at 23:47 -0700, Christoph Hellwig wrote:
> On Tue, Jun 13, 2017 at 06:24:32AM -0400, Jeff Layton wrote:
> > That's definitely what I want for the endgame here. My plan was to add
> > this flag for now, and then eventually reverse it (or drop it) once all
> > or most filesystems are converted.
> > 
> > We can do it that way from the get-go if you like. It'll mean tossing in
> >  a patch add this flag to all filesystems that have an fsync operation
> > and that use the pagecache, and then gradually remove it from them as we
> > convert them.
> > 
> > Which method do you prefer?
> 
> Please do it from the get-go.  Or in fact figure out if we can get
> away without it entirely.  Moving the error reporting into ->fsync
> should help greatly with that, so what's missing after that?

In this smaller set, it's only really used for DAX. In the larger patch
series I have (which needs updating on top of this), there are other
things that key off of it:

sync_file_range: ->fsync isn't called directly there, and I think we
probably want similar semantics to fsync() for it

JBD2: will try to re-set the error after clearing it with
filemap_fdatawait. That's problematic with the new infrastructure so we
need some way to avoid it.

What I think I'll do for now is add a new FS_DAX_WB_ERRSEQ flag that
will go away by the end of the series. As the need arises for a similar
flag in other areas, I'll add them then.

The overall goal is not to need these flags. It may take a bit of time
to get there though.

Thanks for the review so far!
Christoph Hellwig June 15, 2017, 8:22 a.m. UTC | #5
On Wed, Jun 14, 2017 at 01:24:43PM -0400, Jeff Layton wrote:
> In this smaller set, it's only really used for DAX.

DAX only is implemented by three filesystems, please just fix them
up in one go.

> sync_file_range: ->fsync isn't called directly there, and I think we
> probably want similar semantics to fsync() for it

sync_file_range is only supposed to sync data, so it should not call
->fsync.

> JBD2: will try to re-set the error after clearing it with
> filemap_fdatawait. That's problematic with the new infrastructure so we
> need some way to avoid it.

JBD2 only has two users, please fix them up in one go.
Jeff Layton June 15, 2017, 10:42 a.m. UTC | #6
On Thu, 2017-06-15 at 01:22 -0700, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 01:24:43PM -0400, Jeff Layton wrote:
> > In this smaller set, it's only really used for DAX.
> 
> DAX only is implemented by three filesystems, please just fix them
> up in one go.
> 

Ok.

> > sync_file_range: ->fsync isn't called directly there, and I think we
> > probably want similar semantics to fsync() for it
> 
> sync_file_range is only supposed to sync data, so it should not call
> ->fsync.
> 

Correct.

But if there is a data writeback error, should we report an error on all
open fds at that time (like we will for fsync)?

I think we probably do want to do that, but like you say...there is no
file op for sync_file_range. It'll need some way to figure out what sort
of error tracking is in play.

> > JBD2: will try to re-set the error after clearing it with
> > filemap_fdatawait. That's problematic with the new infrastructure so we
> > need some way to avoid it.
> 
> JBD2 only has two users, please fix them up in one go.

I came up with a fix yesterday that makes the flag unnecessary there.

Thanks,
Christoph Hellwig June 15, 2017, 2:57 p.m. UTC | #7
On Thu, Jun 15, 2017 at 06:42:12AM -0400, Jeff Layton wrote:
> Correct.
> 
> But if there is a data writeback error, should we report an error on all
> open fds at that time (like we will for fsync)?

We should in theory, but I don't see how to properly do it.  In addition
sync_file_range just can't be used for data integrity to start with, so
I don't think it's worth it.  At some point we should add a proper
fsync_range syscall, though.
Jeff Layton June 15, 2017, 3:03 p.m. UTC | #8
On Thu, 2017-06-15 at 07:57 -0700, Christoph Hellwig wrote:
> On Thu, Jun 15, 2017 at 06:42:12AM -0400, Jeff Layton wrote:
> > Correct.
> > 
> > But if there is a data writeback error, should we report an error on all
> > open fds at that time (like we will for fsync)?
> 
> We should in theory, but I don't see how to properly do it.  In addition
> sync_file_range just can't be used for data integrity to start with, so
> I don't think it's worth it.  At some point we should add a proper
> fsync_range syscall, though.

filemap_report_wb_err will always return 0 if the inode never has
mapping_set_error called on it. So, I think we should be able to do it
there once we get all of the fs' converted over. That'll have to happen
at the end of the series however.

Patch
diff mbox

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6cd87887430b..17ba6284ab14 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2023,6 +2023,7 @@  struct file_system_type {
 #define FS_BINARY_MOUNTDATA	2
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
+#define FS_WB_ERRSEQ		16	/* errseq_t writeback err tracking */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);