Message ID | 20201224044954.1349459-1-satyat@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb | expand |
On Thu, Dec 24, 2020 at 04:49:54AM +0000, Satya Tangirala wrote: > freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer > whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff > bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify > bd_fsfreeze_sb. > > But this means a freeze_bdev() call followed by a thaw_bdev() call can > leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is > zero. If freeze_bdev() is called again, and this time > get_active_super() returns NULL (e.g. because the FS is unmounted), > we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is > *untouched* - it stays the same (now garbage) value. A subsequent > thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate > (since bd_fsfreeze_count > 0), and attempt to use it. > > Fix this by always setting bd_fsfreeze_sb to NULL when > bd_fsfreeze_count is successfully decremented to 0 in thaw_sb(). > Alternatively, we could set bd_fsfreeze_sb to whatever > get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count > is successfully incremented to 1 from 0 (which can be achieved cleanly > by moving the line currently setting bd_fsfreeze_sb to immediately > after the "sync:" label, but it might be a little too subtle/easily > overlooked in future). > > This fixes the currently panicking xfstests generic/085. > > Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev") > Signed-off-by: Satya Tangirala <satyat@google.com> I came up with the same solution to the same crash, so: Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> --D > --- > fs/block_dev.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 9e56ee1f2652..12a811a9ae4b 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -606,6 +606,8 @@ int thaw_bdev(struct block_device *bdev) > error = thaw_super(sb); > if (error) > bdev->bd_fsfreeze_count++; > + else > + bdev->bd_fsfreeze_sb = NULL; > out: > mutex_unlock(&bdev->bd_fsfreeze_mutex); > return error; > -- > 2.29.2.729.g45daf8777d-goog >
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
Can someone pick this up? Maybe through Jens' block tree as that is where my commit this is fixing up came from. For reference: Reviewed-by: Christoph Hellwig <hch@lst.de> On Thu, Dec 24, 2020 at 04:49:54AM +0000, Satya Tangirala wrote: > freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer > whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff > bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify > bd_fsfreeze_sb. > > But this means a freeze_bdev() call followed by a thaw_bdev() call can > leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is > zero. If freeze_bdev() is called again, and this time > get_active_super() returns NULL (e.g. because the FS is unmounted), > we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is > *untouched* - it stays the same (now garbage) value. A subsequent > thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate > (since bd_fsfreeze_count > 0), and attempt to use it. > > Fix this by always setting bd_fsfreeze_sb to NULL when > bd_fsfreeze_count is successfully decremented to 0 in thaw_sb(). > Alternatively, we could set bd_fsfreeze_sb to whatever > get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count > is successfully incremented to 1 from 0 (which can be achieved cleanly > by moving the line currently setting bd_fsfreeze_sb to immediately > after the "sync:" label, but it might be a little too subtle/easily > overlooked in future). > > This fixes the currently panicking xfstests generic/085. > > Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev") > Signed-off-by: Satya Tangirala <satyat@google.com> > --- > fs/block_dev.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 9e56ee1f2652..12a811a9ae4b 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -606,6 +606,8 @@ int thaw_bdev(struct block_device *bdev) > error = thaw_super(sb); > if (error) > bdev->bd_fsfreeze_count++; > + else > + bdev->bd_fsfreeze_sb = NULL; > out: > mutex_unlock(&bdev->bd_fsfreeze_mutex); > return error; > -- > 2.29.2.729.g45daf8777d-goog ---end quoted text---
----- Original Message ----- > Can someone pick this up? Maybe through Jens' block tree as that is > where my commit this is fixing up came from. > > For reference: > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > On Thu, Dec 24, 2020 at 04:49:54AM +0000, Satya Tangirala wrote: > > freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer > > whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff > > bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify > > bd_fsfreeze_sb. > > > > But this means a freeze_bdev() call followed by a thaw_bdev() call can > > leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is > > zero. If freeze_bdev() is called again, and this time > > get_active_super() returns NULL (e.g. because the FS is unmounted), > > we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is > > *untouched* - it stays the same (now garbage) value. A subsequent > > thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate > > (since bd_fsfreeze_count > 0), and attempt to use it. > > > > Fix this by always setting bd_fsfreeze_sb to NULL when > > bd_fsfreeze_count is successfully decremented to 0 in thaw_sb(). > > Alternatively, we could set bd_fsfreeze_sb to whatever > > get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count > > is successfully incremented to 1 from 0 (which can be achieved cleanly > > by moving the line currently setting bd_fsfreeze_sb to immediately > > after the "sync:" label, but it might be a little too subtle/easily > > overlooked in future). > > > > This fixes the currently panicking xfstests generic/085. > > > > Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev") > > Signed-off-by: Satya Tangirala <satyat@google.com> > > --- > > fs/block_dev.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/fs/block_dev.c b/fs/block_dev.c > > index 9e56ee1f2652..12a811a9ae4b 100644 > > --- a/fs/block_dev.c > > +++ b/fs/block_dev.c > > @@ -606,6 +606,8 @@ int thaw_bdev(struct block_device *bdev) > > error = thaw_super(sb); > > if (error) > > bdev->bd_fsfreeze_count++; > > + else > > + bdev->bd_fsfreeze_sb = NULL; > > out: > > mutex_unlock(&bdev->bd_fsfreeze_mutex); > > return error; > > -- > > 2.29.2.729.g45daf8777d-goog > ---end quoted text--- > > Funny you should ask. I came across this bug in my testing of gfs2 and my patch is slightly different. I was wondering who to send it to. Perhaps Viro? Regards, Bob Peterson
On 1/7/21 9:20 AM, Christoph Hellwig wrote: > Can someone pick this up? Maybe through Jens' block tree as that is > where my commit this is fixing up came from. > > For reference: > > > Reviewed-by: Christoph Hellwig <hch@lst.de> Applied, thanks.
----- Original Message ----- > Can someone pick this up? Maybe through Jens' block tree as that is > where my commit this is fixing up came from. Christoph and Al, Here is my version: Bob Peterson fs: fix freeze count problem in freeze_bdev Before this patch, if you tried to freeze a device (function freeze_bdev) while it was being unmounted, it would get NULL back from get_active_super and correctly bypass the freeze calls. Unfortunately, it forgot to decrement its bd_fsfreeze_count. Subsequent calls to device thaw (thaw_bdev) would see the non-zero bd_fsfreeze_count and assume the bd_fsfreeze_sb value was still valid. That's not a safe assumption and resulted in use-after-free, which often caused fatal kernel errors like: "unable to handle page fault for address." This patch adds the necessary decrement of bd_fsfreeze_count for that error path. It also adds code to set the bd_fsfreeze_sb to NULL when the last reference is reached in thaw_bdev. Reviewed-by: Bob Peterson <rpeterso@redhat.com> --- fs/block_dev.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 9e56ee1f2652..c6daf7d12546 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -555,8 +555,10 @@ int freeze_bdev(struct block_device *bdev) goto done; sb = get_active_super(bdev); - if (!sb) + if (!sb) { + bdev->bd_fsfreeze_count--; goto sync; + } if (sb->s_op->freeze_super) error = sb->s_op->freeze_super(sb); else @@ -600,6 +602,7 @@ int thaw_bdev(struct block_device *bdev) if (!sb) goto out; + bdev->bd_fsfreeze_sb = NULL; if (sb->s_op->thaw_super) error = sb->s_op->thaw_super(sb); else
On Thu, Jan 07, 2021 at 11:27:37AM -0500, Bob Peterson wrote: > ----- Original Message ----- > > Can someone pick this up? Maybe through Jens' block tree as that is > > where my commit this is fixing up came from. > Christoph and Al, > > Here is my version: > > Bob Peterson > > fs: fix freeze count problem in freeze_bdev > > Before this patch, if you tried to freeze a device (function freeze_bdev) > while it was being unmounted, it would get NULL back from get_active_super > and correctly bypass the freeze calls. Unfortunately, it forgot to decrement > its bd_fsfreeze_count. Subsequent calls to device thaw (thaw_bdev) would > see the non-zero bd_fsfreeze_count and assume the bd_fsfreeze_sb value was > still valid. That's not a safe assumption and resulted in use-after-free, > which often caused fatal kernel errors like: "unable to handle page fault > for address." > > This patch adds the necessary decrement of bd_fsfreeze_count for that > error path. It also adds code to set the bd_fsfreeze_sb to NULL when the > last reference is reached in thaw_bdev. > > Reviewed-by: Bob Peterson <rpeterso@redhat.com> > --- > fs/block_dev.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 9e56ee1f2652..c6daf7d12546 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -555,8 +555,10 @@ int freeze_bdev(struct block_device *bdev) > goto done; > > sb = get_active_super(bdev); > - if (!sb) > + if (!sb) { > + bdev->bd_fsfreeze_count--; > goto sync; > + } > if (sb->s_op->freeze_super) > error = sb->s_op->freeze_super(sb); > else > @@ -600,6 +602,7 @@ int thaw_bdev(struct block_device *bdev) > if (!sb) > goto out; > > + bdev->bd_fsfreeze_sb = NULL; This causes bdev->bd_fsfreeze_sb to be set to NULL even if the call to thaw_super right after this line fail. So if a caller tries to call thaw_bdev() again after receiving such an error, that next call won't even try to call thaw_super(). Is that what we want here? (I don't know much about this code, but from a cursory glance I think this difference is visible to emergency_thaw_bdev() in fs/buffer.c) In my version of the patch, I set bdev->bd_fsfreeze_sb to NULL only *after* we check that the call to thaw_super() succeeded to avoid this. > if (sb->s_op->thaw_super) > error = sb->s_op->thaw_super(sb); > else > In another mail, you mentioned > I wrote this patch to fix the freeze/thaw device problem before I saw > the patch "fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb" > from Satya Tangirala. That one, however, does not fix the bd_freeze_count > problem and this patch does. Thanks a lot for investigating the bug and the patch I sent :) Was there actually an issue with that patch I sent? As you said, the bug is very reproduceable on master with generic/085. But with my patch, I don't see any issues even after having run the test many, many times (admittedly, I tested it on f2fs and ext4 rather than gfs2, but I don't think that should cause much differences). Did you have a test case that actually causes a failure with my patch? The only two differences between the patch I sent and this patch are 1) The point at which the bd_fsfreeze_bd is set to NULL in thaw_bdev(), as I mentioned above already. 2) Whether or not to decrement bd_fsfreeze_count when we get a NULL from get_active_super() in freeze_bdev() - I don't do this in my patch. I think setting bd_fsfreeze_sb to NULL in thaw_bdev (in either the place your patch does it or my patch does it) is enough to fix the bug w.r.t the use-after-free. Fwiw, I do think it should be set to NULL after we check for all the errors though. I think the second difference (decrementing bd_fsfreeze_count when get_active_super() returns NULL) doesn't change anything w.r.t the use-after-free. It does however, change the behaviour of the function slightly, and it might be caller visible (because from a cursory glance, it looks like we're reading the bd_fsfreeze_count from some other places like fs/super.c). Even before 040f04bd2e82, the code wouldn't decrement bd_fsfreeze_count when get_active_super() returned NULL - so is this change in behaviour intentional? And if so, maybe it should go in a separate patch?
On Thu, Jan 07, 2021 at 11:08:39PM +0000, Satya Tangirala wrote: > > error = sb->s_op->freeze_super(sb); > > else > > @@ -600,6 +602,7 @@ int thaw_bdev(struct block_device *bdev) > > if (!sb) > > goto out; > > > > + bdev->bd_fsfreeze_sb = NULL; > This causes bdev->bd_fsfreeze_sb to be set to NULL even if the call to > thaw_super right after this line fail. So if a caller tries to call > thaw_bdev() again after receiving such an error, that next call won't even > try to call thaw_super(). Is that what we want here? (I don't know much > about this code, but from a cursory glance I think this difference is > visible to emergency_thaw_bdev() in fs/buffer.c) Yes, that definitively is an issue. > > I think the second difference (decrementing bd_fsfreeze_count when > get_active_super() returns NULL) doesn't change anything w.r.t the > use-after-free. It does however, change the behaviour of the function > slightly, and it might be caller visible (because from a cursory glance, it > looks like we're reading the bd_fsfreeze_count from some other places like > fs/super.c). Even before 040f04bd2e82, the code wouldn't decrement > bd_fsfreeze_count when get_active_super() returned NULL - so is this change > in behaviour intentional? And if so, maybe it should go in a separate > patch? Yes, that would be a change in behavior. And I'm not sure why we would want to change it. But if so we should do it in a separate patch that documents the why, on top of the patch that already is in the block tree.
----- Original Message ----- > This causes bdev->bd_fsfreeze_sb to be set to NULL even if the call to > thaw_super right after this line fail. So if a caller tries to call > thaw_bdev() again after receiving such an error, that next call won't even > try to call thaw_super(). Is that what we want here? (I don't know much > about this code, but from a cursory glance I think this difference is > visible to emergency_thaw_bdev() in fs/buffer.c) > > In my version of the patch, I set bdev->bd_fsfreeze_sb to NULL only > *after* we check that the call to thaw_super() succeeded to avoid this. Yes, I see your point. Your patch is superior and I'll mine accordingly. > Thanks a lot for investigating the bug and the patch I sent :) > Was there actually an issue with that patch I sent? As you said, the bug No, I never saw your patch until I saw Christoph's reference to it yesterday, after I had been using my patch to fix the problem. AFAIK, there is no problem with your patch. > I think the second difference (decrementing bd_fsfreeze_count when > get_active_super() returns NULL) doesn't change anything w.r.t the > use-after-free. It does however, change the behaviour of the function > slightly, and it might be caller visible (because from a cursory glance, it > looks like we're reading the bd_fsfreeze_count from some other places like > fs/super.c). Even before 040f04bd2e82, the code wouldn't decrement > bd_fsfreeze_count when get_active_super() returned NULL - so is this change > in behaviour intentional? And if so, maybe it should go in a separate > patch? This is the bigger issue, and I'm not very familiar with this code either, so I'll defer to the experts. Yes, it's a change in behavior, but I think it makes sense to decrement the bd_fsfreeze_count in this case. Here's why: If the blockdev is frozen by freeze_bdev while it's being unmounted, the bd_fsfreeze_count is incremented, but the freeze is ignored. Subsequent attempts to thaw the device will be ignored but return 0 because the sb is not found. When the device is mounted again, calls to freeze_bdev will bypass the call to freeze_super for the newly mounted sb, because bdev->bd_fsfreeze_count was then incremented from 1 to 2 in freeze_bdev. if (++bdev->bd_fsfreeze_count > 1) goto done; So you're freezing the device without really freezing the superblock. Seems like dangerous behavior to me. The new sb will only be frozen if a second thaw is done, which gets them back in sync. I suppose we could say this is acceptable loss, and your number of thaws should match your freezes, and if they don't: user error. Still, it seems like we should do something about it, like refuse to mount a frozen device. Perhaps it already does that; I'll need to do some research. Like I said, I don't know this code. I'm just trying to fix a problem I observed. I'll defer to the experts. Regards, Bob Peterson
Hi, > This is the bigger issue, and I'm not very familiar with this code either, > so I'll defer to the experts. Yes, it's a change in behavior, but I think > it makes sense to decrement the bd_fsfreeze_count in this case. Here's why: > > If the blockdev is frozen by freeze_bdev while it's being unmounted, the > bd_fsfreeze_count is incremented, but the freeze is ignored. Subsequent > attempts to thaw the device will be ignored but return 0 because the sb > is not found. When the device is mounted again, calls to freeze_bdev > will bypass the call to freeze_super for the newly mounted sb, because > bdev->bd_fsfreeze_count was then incremented from 1 to 2 in freeze_bdev. > > if (++bdev->bd_fsfreeze_count > 1) > goto done; > > So you're freezing the device without really freezing the superblock. > Seems like dangerous behavior to me. The new sb will only be frozen if > a second thaw is done, which gets them back in sync. I suppose we could > say this is acceptable loss, and your number of thaws should match your > freezes, and if they don't: user error. Still, it seems like we should do > something about it, like refuse to mount a frozen device. Perhaps it already > does that; I'll need to do some research. After some experiments, I've determined that my fears about the count are unfounded. Consider my patch withdrawn. Sorry for the noise. Bob Peterson
diff --git a/fs/block_dev.c b/fs/block_dev.c index 9e56ee1f2652..12a811a9ae4b 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -606,6 +606,8 @@ int thaw_bdev(struct block_device *bdev) error = thaw_super(sb); if (error) bdev->bd_fsfreeze_count++; + else + bdev->bd_fsfreeze_sb = NULL; out: mutex_unlock(&bdev->bd_fsfreeze_mutex); return error;
freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify bd_fsfreeze_sb. But this means a freeze_bdev() call followed by a thaw_bdev() call can leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is zero. If freeze_bdev() is called again, and this time get_active_super() returns NULL (e.g. because the FS is unmounted), we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is *untouched* - it stays the same (now garbage) value. A subsequent thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate (since bd_fsfreeze_count > 0), and attempt to use it. Fix this by always setting bd_fsfreeze_sb to NULL when bd_fsfreeze_count is successfully decremented to 0 in thaw_sb(). Alternatively, we could set bd_fsfreeze_sb to whatever get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count is successfully incremented to 1 from 0 (which can be achieved cleanly by moving the line currently setting bd_fsfreeze_sb to immediately after the "sync:" label, but it might be a little too subtle/easily overlooked in future). This fixes the currently panicking xfstests generic/085. Fixes: 040f04bd2e82 ("fs: simplify freeze_bdev/thaw_bdev") Signed-off-by: Satya Tangirala <satyat@google.com> --- fs/block_dev.c | 2 ++ 1 file changed, 2 insertions(+)