diff mbox series

fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb

Message ID 20201224044954.1349459-1-satyat@google.com (mailing list archive)
State New, archived
Headers show
Series fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb | expand

Commit Message

Satya Tangirala Dec. 24, 2020, 4:49 a.m. UTC
freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
bd_fsfreeze_sb.

But this means a freeze_bdev() call followed by a thaw_bdev() call can
leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
zero. If freeze_bdev() is called again, and this time
get_active_super() returns NULL (e.g. because the FS is unmounted),
we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
*untouched* - it stays the same (now garbage) value. A subsequent
thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
(since bd_fsfreeze_count > 0), and attempt to use it.

Fix this by always setting bd_fsfreeze_sb to NULL when
bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
Alternatively, we could set bd_fsfreeze_sb to whatever
get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
is successfully incremented to 1 from 0 (which can be achieved cleanly
by moving the line currently setting bd_fsfreeze_sb to immediately
after the "sync:" label, but it might be a little too subtle/easily
overlooked in future).

This fixes the currently panicking xfstests generic/085.

Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev")
Signed-off-by: Satya Tangirala <satyat@google.com>
---
 fs/block_dev.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Darrick J. Wong Jan. 4, 2021, 9:58 p.m. UTC | #1
On Thu, Dec 24, 2020 at 04:49:54AM +0000, Satya Tangirala wrote:
> freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
> whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
> bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
> bd_fsfreeze_sb.
> 
> But this means a freeze_bdev() call followed by a thaw_bdev() call can
> leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
> zero. If freeze_bdev() is called again, and this time
> get_active_super() returns NULL (e.g. because the FS is unmounted),
> we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
> *untouched* - it stays the same (now garbage) value. A subsequent
> thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
> (since bd_fsfreeze_count > 0), and attempt to use it.
> 
> Fix this by always setting bd_fsfreeze_sb to NULL when
> bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
> Alternatively, we could set bd_fsfreeze_sb to whatever
> get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
> is successfully incremented to 1 from 0 (which can be achieved cleanly
> by moving the line currently setting bd_fsfreeze_sb to immediately
> after the "sync:" label, but it might be a little too subtle/easily
> overlooked in future).
> 
> This fixes the currently panicking xfstests generic/085.
> 
> Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev")
> Signed-off-by: Satya Tangirala <satyat@google.com>

I came up with the same solution to the same crash, so:

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/block_dev.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 9e56ee1f2652..12a811a9ae4b 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -606,6 +606,8 @@ int thaw_bdev(struct block_device *bdev)
>  		error = thaw_super(sb);
>  	if (error)
>  		bdev->bd_fsfreeze_count++;
> +	else
> +		bdev->bd_fsfreeze_sb = NULL;
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
> -- 
> 2.29.2.729.g45daf8777d-goog
>
Christoph Hellwig Jan. 5, 2021, 7:50 a.m. UTC | #2
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig Jan. 7, 2021, 4:20 p.m. UTC | #3
Can someone pick this up?  Maybe through Jens' block tree as that is
where my commit this is fixing up came from.

For reference:


Reviewed-by: Christoph Hellwig <hch@lst.de>

On Thu, Dec 24, 2020 at 04:49:54AM +0000, Satya Tangirala wrote:
> freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
> whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
> bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
> bd_fsfreeze_sb.
> 
> But this means a freeze_bdev() call followed by a thaw_bdev() call can
> leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
> zero. If freeze_bdev() is called again, and this time
> get_active_super() returns NULL (e.g. because the FS is unmounted),
> we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
> *untouched* - it stays the same (now garbage) value. A subsequent
> thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
> (since bd_fsfreeze_count > 0), and attempt to use it.
> 
> Fix this by always setting bd_fsfreeze_sb to NULL when
> bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
> Alternatively, we could set bd_fsfreeze_sb to whatever
> get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
> is successfully incremented to 1 from 0 (which can be achieved cleanly
> by moving the line currently setting bd_fsfreeze_sb to immediately
> after the "sync:" label, but it might be a little too subtle/easily
> overlooked in future).
> 
> This fixes the currently panicking xfstests generic/085.
> 
> Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev")
> Signed-off-by: Satya Tangirala <satyat@google.com>
> ---
>  fs/block_dev.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 9e56ee1f2652..12a811a9ae4b 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -606,6 +606,8 @@ int thaw_bdev(struct block_device *bdev)
>  		error = thaw_super(sb);
>  	if (error)
>  		bdev->bd_fsfreeze_count++;
> +	else
> +		bdev->bd_fsfreeze_sb = NULL;
>  out:
>  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
>  	return error;
> -- 
> 2.29.2.729.g45daf8777d-goog
---end quoted text---
Bob Peterson Jan. 7, 2021, 4:26 p.m. UTC | #4
----- Original Message -----
> Can someone pick this up?  Maybe through Jens' block tree as that is
> where my commit this is fixing up came from.
> 
> For reference:
> 
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> On Thu, Dec 24, 2020 at 04:49:54AM +0000, Satya Tangirala wrote:
> > freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
> > whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
> > bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
> > bd_fsfreeze_sb.
> > 
> > But this means a freeze_bdev() call followed by a thaw_bdev() call can
> > leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
> > zero. If freeze_bdev() is called again, and this time
> > get_active_super() returns NULL (e.g. because the FS is unmounted),
> > we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
> > *untouched* - it stays the same (now garbage) value. A subsequent
> > thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
> > (since bd_fsfreeze_count > 0), and attempt to use it.
> > 
> > Fix this by always setting bd_fsfreeze_sb to NULL when
> > bd_fsfreeze_count is successfully decremented to 0 in thaw_sb().
> > Alternatively, we could set bd_fsfreeze_sb to whatever
> > get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
> > is successfully incremented to 1 from 0 (which can be achieved cleanly
> > by moving the line currently setting bd_fsfreeze_sb to immediately
> > after the "sync:" label, but it might be a little too subtle/easily
> > overlooked in future).
> > 
> > This fixes the currently panicking xfstests generic/085.
> > 
> > Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev")
> > Signed-off-by: Satya Tangirala <satyat@google.com>
> > ---
> >  fs/block_dev.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 9e56ee1f2652..12a811a9ae4b 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -606,6 +606,8 @@ int thaw_bdev(struct block_device *bdev)
> >  		error = thaw_super(sb);
> >  	if (error)
> >  		bdev->bd_fsfreeze_count++;
> > +	else
> > +		bdev->bd_fsfreeze_sb = NULL;
> >  out:
> >  	mutex_unlock(&bdev->bd_fsfreeze_mutex);
> >  	return error;
> > --
> > 2.29.2.729.g45daf8777d-goog
> ---end quoted text---
> 
> 
Funny you should ask. I came across this bug in my testing of gfs2
and my patch is slightly different. I was wondering who to send it to.
Perhaps Viro?

Regards,

Bob Peterson
Jens Axboe Jan. 7, 2021, 4:26 p.m. UTC | #5
On 1/7/21 9:20 AM, Christoph Hellwig wrote:
> Can someone pick this up?  Maybe through Jens' block tree as that is
> where my commit this is fixing up came from.
> 
> For reference:
> 
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Applied, thanks.
Bob Peterson Jan. 7, 2021, 4:27 p.m. UTC | #6
----- Original Message -----
> Can someone pick this up?  Maybe through Jens' block tree as that is
> where my commit this is fixing up came from.
Christoph and Al,

Here is my version:

Bob Peterson

fs: fix freeze count problem in freeze_bdev

Before this patch, if you tried to freeze a device (function freeze_bdev)
while it was being unmounted, it would get NULL back from get_active_super
and correctly bypass the freeze calls. Unfortunately, it forgot to decrement
its bd_fsfreeze_count. Subsequent calls to device thaw (thaw_bdev) would
see the non-zero bd_fsfreeze_count and assume the bd_fsfreeze_sb value was
still valid. That's not a safe assumption and resulted in use-after-free,
which often caused fatal kernel errors like: "unable to handle page fault
for address."

This patch adds the necessary decrement of bd_fsfreeze_count for that
error path. It also adds code to set the bd_fsfreeze_sb to NULL when the
last reference is reached in thaw_bdev.

Reviewed-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/block_dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9e56ee1f2652..c6daf7d12546 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -555,8 +555,10 @@ int freeze_bdev(struct block_device *bdev)
 		goto done;
 
 	sb = get_active_super(bdev);
-	if (!sb)
+	if (!sb) {
+		bdev->bd_fsfreeze_count--;
 		goto sync;
+	}
 	if (sb->s_op->freeze_super)
 		error = sb->s_op->freeze_super(sb);
 	else
@@ -600,6 +602,7 @@ int thaw_bdev(struct block_device *bdev)
 	if (!sb)
 		goto out;
 
+	bdev->bd_fsfreeze_sb = NULL;
 	if (sb->s_op->thaw_super)
 		error = sb->s_op->thaw_super(sb);
 	else
Satya Tangirala Jan. 7, 2021, 11:08 p.m. UTC | #7
On Thu, Jan 07, 2021 at 11:27:37AM -0500, Bob Peterson wrote:
> ----- Original Message -----
> > Can someone pick this up?  Maybe through Jens' block tree as that is
> > where my commit this is fixing up came from.
> Christoph and Al,
> 
> Here is my version:
> 
> Bob Peterson
> 
> fs: fix freeze count problem in freeze_bdev
> 
> Before this patch, if you tried to freeze a device (function freeze_bdev)
> while it was being unmounted, it would get NULL back from get_active_super
> and correctly bypass the freeze calls. Unfortunately, it forgot to decrement
> its bd_fsfreeze_count. Subsequent calls to device thaw (thaw_bdev) would
> see the non-zero bd_fsfreeze_count and assume the bd_fsfreeze_sb value was
> still valid. That's not a safe assumption and resulted in use-after-free,
> which often caused fatal kernel errors like: "unable to handle page fault
> for address."
> 
> This patch adds the necessary decrement of bd_fsfreeze_count for that
> error path. It also adds code to set the bd_fsfreeze_sb to NULL when the
> last reference is reached in thaw_bdev.
> 
> Reviewed-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/block_dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 9e56ee1f2652..c6daf7d12546 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -555,8 +555,10 @@ int freeze_bdev(struct block_device *bdev)
>  		goto done;
>  
>  	sb = get_active_super(bdev);
> -	if (!sb)
> +	if (!sb) {
> +		bdev->bd_fsfreeze_count--;
>  		goto sync;
> +	}
>  	if (sb->s_op->freeze_super)
>  		error = sb->s_op->freeze_super(sb);
>  	else
> @@ -600,6 +602,7 @@ int thaw_bdev(struct block_device *bdev)
>  	if (!sb)
>  		goto out;
>  
> +	bdev->bd_fsfreeze_sb = NULL;
This causes bdev->bd_fsfreeze_sb to be set to NULL even if the call to
thaw_super right after this line fail. So if a caller tries to call
thaw_bdev() again after receiving such an error, that next call won't even
try to call thaw_super(). Is that what we want here?  (I don't know much
about this code, but from a cursory glance I think this difference is
visible to emergency_thaw_bdev() in fs/buffer.c)

In my version of the patch, I set bdev->bd_fsfreeze_sb to NULL only
*after* we check that the call to thaw_super() succeeded to avoid this.
>  	if (sb->s_op->thaw_super)
>  		error = sb->s_op->thaw_super(sb);
>  	else
>
In another mail, you mentioned
> I wrote this patch to fix the freeze/thaw device problem before I saw
> the patch "fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb"
> from Satya Tangirala. That one, however, does not fix the bd_freeze_count
> problem and this patch does.
Thanks a lot for investigating the bug and the patch I sent :)
Was there actually an issue with that patch I sent? As you said, the bug
is very reproduceable on master with generic/085. But with my patch, I
don't see any issues even after having run the test many, many times
(admittedly, I tested it on f2fs and ext4 rather than gfs2, but I don't
think that should cause much differences). Did you have a test case that
actually causes a failure with my patch?

The only two differences between the patch I sent and this patch are

1) The point at which the bd_fsfreeze_bd is set to NULL in thaw_bdev(), as
I mentioned above already.

2) Whether or not to decrement bd_fsfreeze_count when we get a NULL from
get_active_super() in freeze_bdev() - I don't do this in my patch.

I think setting bd_fsfreeze_sb to NULL in thaw_bdev (in either the place
your patch does it or my patch does it) is enough to fix the bug w.r.t the
use-after-free. Fwiw, I do think it should be set to NULL after we check for
all the errors though.

I think the second difference (decrementing bd_fsfreeze_count when
get_active_super() returns NULL) doesn't change anything w.r.t the
use-after-free. It does however, change the behaviour of the function
slightly, and it might be caller visible (because from a cursory glance, it
looks like we're reading the bd_fsfreeze_count from some other places like
fs/super.c). Even before 040f04bd2e82, the code wouldn't decrement
bd_fsfreeze_count when get_active_super() returned NULL - so is this change
in behaviour intentional? And if so, maybe it should go in a separate
patch?
Christoph Hellwig Jan. 8, 2021, 9:36 a.m. UTC | #8
On Thu, Jan 07, 2021 at 11:08:39PM +0000, Satya Tangirala wrote:
> >  		error = sb->s_op->freeze_super(sb);
> >  	else
> > @@ -600,6 +602,7 @@ int thaw_bdev(struct block_device *bdev)
> >  	if (!sb)
> >  		goto out;
> >  
> > +	bdev->bd_fsfreeze_sb = NULL;
> This causes bdev->bd_fsfreeze_sb to be set to NULL even if the call to
> thaw_super right after this line fail. So if a caller tries to call
> thaw_bdev() again after receiving such an error, that next call won't even
> try to call thaw_super(). Is that what we want here?  (I don't know much
> about this code, but from a cursory glance I think this difference is
> visible to emergency_thaw_bdev() in fs/buffer.c)

Yes, that definitively is an issue.

> 
> I think the second difference (decrementing bd_fsfreeze_count when
> get_active_super() returns NULL) doesn't change anything w.r.t the
> use-after-free. It does however, change the behaviour of the function
> slightly, and it might be caller visible (because from a cursory glance, it
> looks like we're reading the bd_fsfreeze_count from some other places like
> fs/super.c). Even before 040f04bd2e82, the code wouldn't decrement
> bd_fsfreeze_count when get_active_super() returned NULL - so is this change
> in behaviour intentional? And if so, maybe it should go in a separate
> patch?

Yes, that would be a change in behavior.  And I'm not sure why we would
want to change it.  But if so we should do it in a separate patch that
documents the why, on top of the patch that already is in the block tree.
Bob Peterson Jan. 8, 2021, 1:17 p.m. UTC | #9
----- Original Message -----
> This causes bdev->bd_fsfreeze_sb to be set to NULL even if the call to
> thaw_super right after this line fail. So if a caller tries to call
> thaw_bdev() again after receiving such an error, that next call won't even
> try to call thaw_super(). Is that what we want here?  (I don't know much
> about this code, but from a cursory glance I think this difference is
> visible to emergency_thaw_bdev() in fs/buffer.c)
> 
> In my version of the patch, I set bdev->bd_fsfreeze_sb to NULL only
> *after* we check that the call to thaw_super() succeeded to avoid this.

Yes, I see your point. Your patch is superior and I'll mine accordingly.

> Thanks a lot for investigating the bug and the patch I sent :)
> Was there actually an issue with that patch I sent? As you said, the bug

No, I never saw your patch until I saw Christoph's reference to it yesterday,
after I had been using my patch to fix the problem. AFAIK, there is no
problem with your patch.

> I think the second difference (decrementing bd_fsfreeze_count when
> get_active_super() returns NULL) doesn't change anything w.r.t the
> use-after-free. It does however, change the behaviour of the function
> slightly, and it might be caller visible (because from a cursory glance, it
> looks like we're reading the bd_fsfreeze_count from some other places like
> fs/super.c). Even before 040f04bd2e82, the code wouldn't decrement
> bd_fsfreeze_count when get_active_super() returned NULL - so is this change
> in behaviour intentional? And if so, maybe it should go in a separate
> patch?

This is the bigger issue, and I'm not very familiar with this code either,
so I'll defer to the experts. Yes, it's a change in behavior, but I think
it makes sense to decrement the bd_fsfreeze_count in this case. Here's why:

If the blockdev is frozen by freeze_bdev while it's being unmounted, the
bd_fsfreeze_count is incremented, but the freeze is ignored. Subsequent
attempts to thaw the device will be ignored but return 0 because the sb
is not found. When the device is mounted again, calls to freeze_bdev
will bypass the call to freeze_super for the newly mounted sb, because
bdev->bd_fsfreeze_count was then incremented from 1 to 2 in freeze_bdev.

	if (++bdev->bd_fsfreeze_count > 1)
		goto done;

So you're freezing the device without really freezing the superblock.
Seems like dangerous behavior to me. The new sb will only be frozen if
a second thaw is done, which gets them back in sync. I suppose we could
say this is acceptable loss, and your number of thaws should match your
freezes, and if they don't: user error. Still, it seems like we should do
something about it, like refuse to mount a frozen device. Perhaps it already
does that; I'll need to do some research.

Like I said, I don't know this code. I'm just trying to fix a problem
I observed. I'll defer to the experts.

Regards,

Bob Peterson
Bob Peterson Jan. 8, 2021, 2:58 p.m. UTC | #10
Hi,

> This is the bigger issue, and I'm not very familiar with this code either,
> so I'll defer to the experts. Yes, it's a change in behavior, but I think
> it makes sense to decrement the bd_fsfreeze_count in this case. Here's why:
> 
> If the blockdev is frozen by freeze_bdev while it's being unmounted, the
> bd_fsfreeze_count is incremented, but the freeze is ignored. Subsequent
> attempts to thaw the device will be ignored but return 0 because the sb
> is not found. When the device is mounted again, calls to freeze_bdev
> will bypass the call to freeze_super for the newly mounted sb, because
> bdev->bd_fsfreeze_count was then incremented from 1 to 2 in freeze_bdev.
> 
> 	if (++bdev->bd_fsfreeze_count > 1)
> 		goto done;
> 
> So you're freezing the device without really freezing the superblock.
> Seems like dangerous behavior to me. The new sb will only be frozen if
> a second thaw is done, which gets them back in sync. I suppose we could
> say this is acceptable loss, and your number of thaws should match your
> freezes, and if they don't: user error. Still, it seems like we should do
> something about it, like refuse to mount a frozen device. Perhaps it already
> does that; I'll need to do some research.

After some experiments, I've determined that my fears about the count are unfounded.
Consider my patch withdrawn. Sorry for the noise.

Bob Peterson
diff mbox series

Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 9e56ee1f2652..12a811a9ae4b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -606,6 +606,8 @@  int thaw_bdev(struct block_device *bdev)
 		error = thaw_super(sb);
 	if (error)
 		bdev->bd_fsfreeze_count++;
+	else
+		bdev->bd_fsfreeze_sb = NULL;
 out:
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	return error;