diff mbox

[2/5] buffer: record blockdev write errors in super_block that backs them

Message ID 20180604180304.9662-3-jlayton@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Jeff Layton June 4, 2018, 6:03 p.m. UTC
From: Jeff Layton <jlayton@redhat.com>

When syncing out a block device (a'la __sync_blockdev), any error
encountered will only be recorded in the bd_inode's mapping. When the
blockdev contains a filesystem however, we'd like to also record the
error in the super_block that's stored there.

Make mark_buffer_write_io_error also record the error in the
corresponding super_block when a writeback error occurs and the block
device contains a mounted superblock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/buffer.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jeff Layton June 6, 2018, 3:56 p.m. UTC | #1
On Mon, 2018-06-04 at 14:03 -0400, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> When syncing out a block device (a'la __sync_blockdev), any error
> encountered will only be recorded in the bd_inode's mapping. When the
> blockdev contains a filesystem however, we'd like to also record the
> error in the super_block that's stored there.
> 
> Make mark_buffer_write_io_error also record the error in the
> corresponding super_block when a writeback error occurs and the block
> device contains a mounted superblock.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/buffer.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 249b83fafe48..dae2a857d5bc 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1117,6 +1117,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
>  		mapping_set_error(bh->b_page->mapping, -EIO);
>  	if (bh->b_assoc_map)
>  		mapping_set_error(bh->b_assoc_map, -EIO);
> +	if (bh->b_bdev->bd_super)
> +		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
>  }
>  EXPORT_SYMBOL(mark_buffer_write_io_error);
>  

(cc'ing linux-block and Jens)

I'm wondering whether this patch might turn out to be racy. For
instance, could a call to __sync_blockdev race with an unmount in such
a way that bd_super goes NULL after we check it but before errseq_set
is called?

If so, what can we do to ensure that that doesn't happen? Any insight
here would be appreciated.

Thanks,
Jeff Layton June 19, 2018, 10:40 a.m. UTC | #2
On Wed, 2018-06-06 at 11:56 -0400, Jeff Layton wrote:
> On Mon, 2018-06-04 at 14:03 -0400, Jeff Layton wrote:
> > From: Jeff Layton <jlayton@redhat.com>
> > 
> > When syncing out a block device (a'la __sync_blockdev), any error
> > encountered will only be recorded in the bd_inode's mapping. When the
> > blockdev contains a filesystem however, we'd like to also record the
> > error in the super_block that's stored there.
> > 
> > Make mark_buffer_write_io_error also record the error in the
> > corresponding super_block when a writeback error occurs and the block
> > device contains a mounted superblock.
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/buffer.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 249b83fafe48..dae2a857d5bc 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1117,6 +1117,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
> >  		mapping_set_error(bh->b_page->mapping, -EIO);
> >  	if (bh->b_assoc_map)
> >  		mapping_set_error(bh->b_assoc_map, -EIO);
> > +	if (bh->b_bdev->bd_super)
> > +		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
> >  }
> >  EXPORT_SYMBOL(mark_buffer_write_io_error);
> >  
> 
> (cc'ing linux-block and Jens)
> 
> I'm wondering whether this patch might turn out to be racy. For
> instance, could a call to __sync_blockdev race with an unmount in such
> a way that bd_super goes NULL after we check it but before errseq_set
> is called?
> 
> If so, what can we do to ensure that that doesn't happen? Any insight
> here would be appreciated.
> 
> Thanks,

Jens, ping? I never got a response on the above.

After looking over it some more, I suspect that this may be racy with
some filesystems. Some of them seem to just flush out data to the
bd_inode on unmount, and trust the system to take care of the rest.

One possible fix there might be to turn bd_super into an RCU managed
pointer. We already free super_blocks under RCU, so we could do
something there like:

rcu_read_lock();
sb = rcu_dereference(bh->b_bdev->bd_super);
if (sb)
	errseq_set(&sb->s_wb_err, -EIO);
rcu_read_unlock();

There aren't that many accessors of bd_super, so that seems like it'd be
fairly simple to do.

Still, I'd like someone to sanity check me here. Is there something that
would prevent the above race that I'm not seeing?

Thanks,
Jeff Layton June 19, 2018, 1:03 p.m. UTC | #3
On Tue, 2018-06-19 at 06:40 -0400, Jeff Layton wrote:
> On Wed, 2018-06-06 at 11:56 -0400, Jeff Layton wrote:
> > On Mon, 2018-06-04 at 14:03 -0400, Jeff Layton wrote:
> > > From: Jeff Layton <jlayton@redhat.com>
> > > 
> > > When syncing out a block device (a'la __sync_blockdev), any error
> > > encountered will only be recorded in the bd_inode's mapping. When the
> > > blockdev contains a filesystem however, we'd like to also record the
> > > error in the super_block that's stored there.
> > > 
> > > Make mark_buffer_write_io_error also record the error in the
> > > corresponding super_block when a writeback error occurs and the block
> > > device contains a mounted superblock.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > ---
> > >  fs/buffer.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > index 249b83fafe48..dae2a857d5bc 100644
> > > --- a/fs/buffer.c
> > > +++ b/fs/buffer.c
> > > @@ -1117,6 +1117,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
> > >  		mapping_set_error(bh->b_page->mapping, -EIO);
> > >  	if (bh->b_assoc_map)
> > >  		mapping_set_error(bh->b_assoc_map, -EIO);
> > > +	if (bh->b_bdev->bd_super)
> > > +		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
> > >  }
> > >  EXPORT_SYMBOL(mark_buffer_write_io_error);
> > >  
> > 
> > (cc'ing linux-block and Jens)
> > 
> > I'm wondering whether this patch might turn out to be racy. For
> > instance, could a call to __sync_blockdev race with an unmount in such
> > a way that bd_super goes NULL after we check it but before errseq_set
> > is called?
> > 
> > If so, what can we do to ensure that that doesn't happen? Any insight
> > here would be appreciated.
> > 
> > Thanks,
> 
> Jens, ping? I never got a response on the above.
> 
> After looking over it some more, I suspect that this may be racy with
> some filesystems. Some of them seem to just flush out data to the
> bd_inode on unmount, and trust the system to take care of the rest.
> 
> One possible fix there might be to turn bd_super into an RCU managed
> pointer. We already free super_blocks under RCU, so we could do
> something there like:
> 
> rcu_read_lock();
> sb = rcu_dereference(bh->b_bdev->bd_super);
> if (sb)
> 	errseq_set(&sb->s_wb_err, -EIO);
> rcu_read_unlock();
> 
> There aren't that many accessors of bd_super, so that seems like it'd be
> fairly simple to do.
> 
> Still, I'd like someone to sanity check me here. Is there something that
> would prevent the above race that I'm not seeing?
> 

(cc'ing Ted since he added blkdev_releasepage in 2009)

Corollary question:

What makes it safe to dereference bd_super in blkdev_releasepage?

bd_super can go NULL in kill_sb and eventually the super_block will be
freed. Is there a ToC/ToU race in that function?
diff mbox

Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index 249b83fafe48..dae2a857d5bc 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1117,6 +1117,8 @@  void mark_buffer_write_io_error(struct buffer_head *bh)
 		mapping_set_error(bh->b_page->mapping, -EIO);
 	if (bh->b_assoc_map)
 		mapping_set_error(bh->b_assoc_map, -EIO);
+	if (bh->b_bdev->bd_super)
+		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
 }
 EXPORT_SYMBOL(mark_buffer_write_io_error);