diff mbox series

[v5,2/2] buffer: record blockdev write errors in super_block that it backs

Message ID 20200415121300.228017-3-jlayton@kernel.org (mailing list archive)
State New, archived
Headers show
Series vfs: have syncfs() return error when there are writeback errors | expand

Commit Message

Jeffrey Layton April 15, 2020, 12:13 p.m. UTC
From: Jeff Layton <jlayton@redhat.com>

When syncing out a block device (a'la __sync_blockdev), any error
encountered will only be recorded in the bd_inode's mapping. When the
blockdev contains a filesystem however, we'd like to also record the
error in the super_block that's stored there.

Make mark_buffer_write_io_error also record the error in the
corresponding super_block when a writeback error occurs and the block
device contains a mounted superblock.

Since superblocks are RCU freed, hold the rcu_read_lock to ensure
that the superblock doesn't go away while we're marking it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/buffer.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Jan Kara April 15, 2020, 2:06 p.m. UTC | #1
On Wed 15-04-20 08:13:00, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> When syncing out a block device (a'la __sync_blockdev), any error
> encountered will only be recorded in the bd_inode's mapping. When the
> blockdev contains a filesystem however, we'd like to also record the
> error in the super_block that's stored there.
> 
> Make mark_buffer_write_io_error also record the error in the
> corresponding super_block when a writeback error occurs and the block
> device contains a mounted superblock.
> 
> Since superblocks are RCU freed, hold the rcu_read_lock to ensure
> that the superblock doesn't go away while we're marking it.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/buffer.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index f73276d746bb..2a4a5cc20418 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1154,12 +1154,19 @@ EXPORT_SYMBOL(mark_buffer_dirty);
>  
>  void mark_buffer_write_io_error(struct buffer_head *bh)
>  {
> +	struct super_block *sb;
> +
>  	set_buffer_write_io_error(bh);
>  	/* FIXME: do we need to set this in both places? */
>  	if (bh->b_page && bh->b_page->mapping)
>  		mapping_set_error(bh->b_page->mapping, -EIO);
>  	if (bh->b_assoc_map)
>  		mapping_set_error(bh->b_assoc_map, -EIO);
> +	rcu_read_lock();
> +	sb = bh->b_bdev->bd_super;

You still need READ_ONCE() here. Otherwise the dereference below can still
result in refetch and NULL ptr deref.

								Honza

> +	if (sb)
> +		errseq_set(&sb->s_wb_err, -EIO);
> +	rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(mark_buffer_write_io_error);
>  
> -- 
> 2.25.2
>
Jeffrey Layton April 15, 2020, 4:22 p.m. UTC | #2
On Wed, 2020-04-15 at 16:06 +0200, Jan Kara wrote:
> On Wed 15-04-20 08:13:00, Jeff Layton wrote:
> > From: Jeff Layton <jlayton@redhat.com>
> > 
> > When syncing out a block device (a'la __sync_blockdev), any error
> > encountered will only be recorded in the bd_inode's mapping. When the
> > blockdev contains a filesystem however, we'd like to also record the
> > error in the super_block that's stored there.
> > 
> > Make mark_buffer_write_io_error also record the error in the
> > corresponding super_block when a writeback error occurs and the block
> > device contains a mounted superblock.
> > 
> > Since superblocks are RCU freed, hold the rcu_read_lock to ensure
> > that the superblock doesn't go away while we're marking it.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/buffer.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index f73276d746bb..2a4a5cc20418 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1154,12 +1154,19 @@ EXPORT_SYMBOL(mark_buffer_dirty);
> >  
> >  void mark_buffer_write_io_error(struct buffer_head *bh)
> >  {
> > +	struct super_block *sb;
> > +
> >  	set_buffer_write_io_error(bh);
> >  	/* FIXME: do we need to set this in both places? */
> >  	if (bh->b_page && bh->b_page->mapping)
> >  		mapping_set_error(bh->b_page->mapping, -EIO);
> >  	if (bh->b_assoc_map)
> >  		mapping_set_error(bh->b_assoc_map, -EIO);
> > +	rcu_read_lock();
> > +	sb = bh->b_bdev->bd_super;
> 
> You still need READ_ONCE() here. Otherwise the dereference below can still
> result in refetch and NULL ptr deref.
> 
> 								Honza
> 

Huh? That seems like a really suspicious thing for the compiler/arch to
do. We are checking that sb isn't NULL before we dereference it. Doesn't
that imply a data dependency? How could the value of "sb" change after
that?

I'm also not sure I understand how using READ_ONCE really helps there if
we can't count on the value of a local variable not changing.

> > +	if (sb)
> > +		errseq_set(&sb->s_wb_err, -EIO);
> > +	rcu_read_unlock();
> >  }
> >  EXPORT_SYMBOL(mark_buffer_write_io_error);
> >  
> > -- 
> > 2.25.2
> >
Jan Kara April 16, 2020, 9:35 a.m. UTC | #3
On Wed 15-04-20 12:22:27, Jeff Layton wrote:
> On Wed, 2020-04-15 at 16:06 +0200, Jan Kara wrote:
> > On Wed 15-04-20 08:13:00, Jeff Layton wrote:
> > > From: Jeff Layton <jlayton@redhat.com>
> > > 
> > > When syncing out a block device (a'la __sync_blockdev), any error
> > > encountered will only be recorded in the bd_inode's mapping. When the
> > > blockdev contains a filesystem however, we'd like to also record the
> > > error in the super_block that's stored there.
> > > 
> > > Make mark_buffer_write_io_error also record the error in the
> > > corresponding super_block when a writeback error occurs and the block
> > > device contains a mounted superblock.
> > > 
> > > Since superblocks are RCU freed, hold the rcu_read_lock to ensure
> > > that the superblock doesn't go away while we're marking it.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/buffer.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > index f73276d746bb..2a4a5cc20418 100644
> > > --- a/fs/buffer.c
> > > +++ b/fs/buffer.c
> > > @@ -1154,12 +1154,19 @@ EXPORT_SYMBOL(mark_buffer_dirty);
> > >  
> > >  void mark_buffer_write_io_error(struct buffer_head *bh)
> > >  {
> > > +	struct super_block *sb;
> > > +
> > >  	set_buffer_write_io_error(bh);
> > >  	/* FIXME: do we need to set this in both places? */
> > >  	if (bh->b_page && bh->b_page->mapping)
> > >  		mapping_set_error(bh->b_page->mapping, -EIO);
> > >  	if (bh->b_assoc_map)
> > >  		mapping_set_error(bh->b_assoc_map, -EIO);
> > > +	rcu_read_lock();
> > > +	sb = bh->b_bdev->bd_super;
> > 
> > You still need READ_ONCE() here. Otherwise the dereference below can still
> > result in refetch and NULL ptr deref.
> > 
> > 								Honza
> > 
> 
> Huh? That seems like a really suspicious thing for the compiler/arch to
> do. We are checking that sb isn't NULL before we dereference it. Doesn't
> that imply a data dependency? How could the value of "sb" change after
> that?

Because the compiler is free to optimize the local variable away and
actually compile the dereference below as bh->b_bdev->bd_super->s_wb_err
(from C11 standard POV such code is equivalent since in C11 memory model
it is assumed there are no concurrent accesses). And READ_ONCE() is a way
to forbid compiler from doing such optimization - through 'volatile'
keyword it tells the compiler there may be concurrent accesses happening
and makes sure the value is really fetched into the local variable and used
from there. There are good articles about this on LWN - I'd give you a link
but LWN seems to be down today. But the latest article is about KCSAN and
from there are links to older articles about compiler optimizations.

> I'm also not sure I understand how using READ_ONCE really helps there if
> we can't count on the value of a local variable not changing.

I hope I've explained this above.

								Honza

> > > +	if (sb)
> > > +		errseq_set(&sb->s_wb_err, -EIO);
> > > +	rcu_read_unlock();
> > >  }
> > >  EXPORT_SYMBOL(mark_buffer_write_io_error);
> > >  
> > > -- 
> > > 2.25.2
> > > 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
>
Jeffrey Layton April 16, 2020, 11:31 a.m. UTC | #4
On Thu, 2020-04-16 at 11:35 +0200, Jan Kara wrote:
> On Wed 15-04-20 12:22:27, Jeff Layton wrote:
> > On Wed, 2020-04-15 at 16:06 +0200, Jan Kara wrote:
> > > On Wed 15-04-20 08:13:00, Jeff Layton wrote:
> > > > From: Jeff Layton <jlayton@redhat.com>
> > > > 
> > > > When syncing out a block device (a'la __sync_blockdev), any error
> > > > encountered will only be recorded in the bd_inode's mapping. When the
> > > > blockdev contains a filesystem however, we'd like to also record the
> > > > error in the super_block that's stored there.
> > > > 
> > > > Make mark_buffer_write_io_error also record the error in the
> > > > corresponding super_block when a writeback error occurs and the block
> > > > device contains a mounted superblock.
> > > > 
> > > > Since superblocks are RCU freed, hold the rcu_read_lock to ensure
> > > > that the superblock doesn't go away while we're marking it.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/buffer.c | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > > 
> > > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > > index f73276d746bb..2a4a5cc20418 100644
> > > > --- a/fs/buffer.c
> > > > +++ b/fs/buffer.c
> > > > @@ -1154,12 +1154,19 @@ EXPORT_SYMBOL(mark_buffer_dirty);
> > > >  
> > > >  void mark_buffer_write_io_error(struct buffer_head *bh)
> > > >  {
> > > > +	struct super_block *sb;
> > > > +
> > > >  	set_buffer_write_io_error(bh);
> > > >  	/* FIXME: do we need to set this in both places? */
> > > >  	if (bh->b_page && bh->b_page->mapping)
> > > >  		mapping_set_error(bh->b_page->mapping, -EIO);
> > > >  	if (bh->b_assoc_map)
> > > >  		mapping_set_error(bh->b_assoc_map, -EIO);
> > > > +	rcu_read_lock();
> > > > +	sb = bh->b_bdev->bd_super;
> > > 
> > > You still need READ_ONCE() here. Otherwise the dereference below can still
> > > result in refetch and NULL ptr deref.
> > > 
> > > 								Honza
> > > 
> > 
> > Huh? That seems like a really suspicious thing for the compiler/arch to
> > do. We are checking that sb isn't NULL before we dereference it. Doesn't
> > that imply a data dependency? How could the value of "sb" change after
> > that?
> 
> Because the compiler is free to optimize the local variable away and
> actually compile the dereference below as bh->b_bdev->bd_super->s_wb_err
> (from C11 standard POV such code is equivalent since in C11 memory model
> it is assumed there are no concurrent accesses). And READ_ONCE() is a way
> to forbid compiler from doing such optimization - through 'volatile'
> keyword it tells the compiler there may be concurrent accesses happening
> and makes sure the value is really fetched into the local variable and used
> from there. There are good articles about this on LWN - I'd give you a link
> but LWN seems to be down today. But the latest article is about KCSAN and
> from there are links to older articles about compiler optimizations.
> 
> > I'm also not sure I understand how using READ_ONCE really helps there if
> > we can't count on the value of a local variable not changing.
> 
> I hope I've explained this above.
> 

Got it. Thanks for the explanation. Now I'll have nightmares about all
of the race conditions I've created in the past by making this
assumption!

I'll send a v6 set in a few mins.

> > > > +	if (sb)
> > > > +		errseq_set(&sb->s_wb_err, -EIO);
> > > > +	rcu_read_unlock();
> > > >  }
> > > >  EXPORT_SYMBOL(mark_buffer_write_io_error);
> > > >  
> > > > -- 
> > > > 2.25.2
> > > > 
> > 
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> >
diff mbox series

Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index f73276d746bb..2a4a5cc20418 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1154,12 +1154,19 @@  EXPORT_SYMBOL(mark_buffer_dirty);
 
 void mark_buffer_write_io_error(struct buffer_head *bh)
 {
+	struct super_block *sb;
+
 	set_buffer_write_io_error(bh);
 	/* FIXME: do we need to set this in both places? */
 	if (bh->b_page && bh->b_page->mapping)
 		mapping_set_error(bh->b_page->mapping, -EIO);
 	if (bh->b_assoc_map)
 		mapping_set_error(bh->b_assoc_map, -EIO);
+	rcu_read_lock();
+	sb = bh->b_bdev->bd_super;
+	if (sb)
+		errseq_set(&sb->s_wb_err, -EIO);
+	rcu_read_unlock();
 }
 EXPORT_SYMBOL(mark_buffer_write_io_error);