diff mbox

[5/7] btrfs: explictly delete unused block groups in close_ctree and ro-remount

Message ID 1434375680-4115-5-git-send-email-jeffm@suse.com (mailing list archive)
State Accepted
Headers show

Commit Message

Jeff Mahoney June 15, 2015, 1:41 p.m. UTC
From: Jeff Mahoney <jeffm@suse.com>

The cleaner thread may already be sleeping by the time we enter
close_ctree.  If that's the case, we'll skip removing any unused
block groups queued for removal, even during a normal umount.
They'll be cleaned up automatically at next mount, but users
expect a umount to be a clean synchronization point, especially
when used on thin-provisioned storage with -odiscard.  We also
explicitly remove unused block groups in the ro-remount path
for the same reason.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
---
 fs/btrfs/disk-io.c |  9 +++++++++
 fs/btrfs/super.c   | 11 +++++++++++
 2 files changed, 20 insertions(+)

Comments

Filipe Manana June 17, 2015, 10:04 a.m. UTC | #1
On Mon, Jun 15, 2015 at 2:41 PM,  <jeffm@suse.com> wrote:
> From: Jeff Mahoney <jeffm@suse.com>
>
> The cleaner thread may already be sleeping by the time we enter
> close_ctree.  If that's the case, we'll skip removing any unused
> block groups queued for removal, even during a normal umount.
> They'll be cleaned up automatically at next mount, but users
> expect a umount to be a clean synchronization point, especially
> when used on thin-provisioned storage with -odiscard.  We also
> explicitly remove unused block groups in the ro-remount path
> for the same reason.
>
> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Filipe Manana <fdmanana@suse.com>

> ---
>  fs/btrfs/disk-io.c |  9 +++++++++
>  fs/btrfs/super.c   | 11 +++++++++++
>  2 files changed, 20 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 2ef9a4b..2e47fef 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3710,6 +3710,15 @@ void close_ctree(struct btrfs_root *root)
>         cancel_work_sync(&fs_info->async_reclaim_work);
>
>         if (!(fs_info->sb->s_flags & MS_RDONLY)) {
> +               /*
> +                * If the cleaner thread is stopped and there are
> +                * block groups queued for removal, the deletion will be
> +                * skipped when we quit the cleaner thread.
> +                */
> +               mutex_lock(&root->fs_info->cleaner_mutex);
> +               btrfs_delete_unused_bgs(root->fs_info);
> +               mutex_unlock(&root->fs_info->cleaner_mutex);
> +
>                 ret = btrfs_commit_super(root);
>                 if (ret)
>                         btrfs_err(fs_info, "commit super ret %d", ret);
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 9e66f5e..2ccd8d4 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1539,6 +1539,17 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>
>                 sb->s_flags |= MS_RDONLY;
>
> +               /*
> +                * Setting MS_RDONLY will put the cleaner thread to
> +                * sleep at the next loop if it's already active.
> +                * If it's already asleep, we'll leave unused block
> +                * groups on disk until we're mounted read-write again
> +                * unless we clean them up here.
> +                */
> +               mutex_lock(&root->fs_info->cleaner_mutex);
> +               btrfs_delete_unused_bgs(fs_info);
> +               mutex_unlock(&root->fs_info->cleaner_mutex);
> +
>                 btrfs_dev_replace_suspend_for_unmount(fs_info);
>                 btrfs_scrub_cancel(fs_info);
>                 btrfs_pause_balance(fs_info);
> --
> 2.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana June 17, 2015, 1:24 p.m. UTC | #2
On Wed, Jun 17, 2015 at 11:04 AM, Filipe David Manana
<fdmanana@gmail.com> wrote:
> On Mon, Jun 15, 2015 at 2:41 PM,  <jeffm@suse.com> wrote:
>> From: Jeff Mahoney <jeffm@suse.com>
>>
>> The cleaner thread may already be sleeping by the time we enter
>> close_ctree.  If that's the case, we'll skip removing any unused
>> block groups queued for removal, even during a normal umount.
>> They'll be cleaned up automatically at next mount, but users
>> expect a umount to be a clean synchronization point, especially
>> when used on thin-provisioned storage with -odiscard.  We also
>> explicitly remove unused block groups in the ro-remount path
>> for the same reason.
>>
>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> Tested-by: Filipe Manana <fdmanana@suse.com>
>
>> ---
>>  fs/btrfs/disk-io.c |  9 +++++++++
>>  fs/btrfs/super.c   | 11 +++++++++++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 2ef9a4b..2e47fef 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3710,6 +3710,15 @@ void close_ctree(struct btrfs_root *root)
>>         cancel_work_sync(&fs_info->async_reclaim_work);
>>
>>         if (!(fs_info->sb->s_flags & MS_RDONLY)) {
>> +               /*
>> +                * If the cleaner thread is stopped and there are
>> +                * block groups queued for removal, the deletion will be
>> +                * skipped when we quit the cleaner thread.
>> +                */
>> +               mutex_lock(&root->fs_info->cleaner_mutex);
>> +               btrfs_delete_unused_bgs(root->fs_info);
>> +               mutex_unlock(&root->fs_info->cleaner_mutex);
>> +
>>                 ret = btrfs_commit_super(root);
>>                 if (ret)
>>                         btrfs_err(fs_info, "commit super ret %d", ret);
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 9e66f5e..2ccd8d4 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1539,6 +1539,17 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
>>
>>                 sb->s_flags |= MS_RDONLY;
>>
>> +               /*
>> +                * Setting MS_RDONLY will put the cleaner thread to
>> +                * sleep at the next loop if it's already active.
>> +                * If it's already asleep, we'll leave unused block
>> +                * groups on disk until we're mounted read-write again
>> +                * unless we clean them up here.
>> +                */
>> +               mutex_lock(&root->fs_info->cleaner_mutex);
>> +               btrfs_delete_unused_bgs(fs_info);
>> +               mutex_unlock(&root->fs_info->cleaner_mutex);

So actually, this allows for a deadlock after the patch I sent out last week:

https://patchwork.kernel.org/patch/6586811/

In that patch delete_unused_bgs is no longer called under the
cleaner_mutex, and making it so, will cause a deadlock with
relocation.

Even without that patch, I don't think you need using this mutex
anyway - no 2 tasks running this function can get the same bg from the
fs_info->unused_bgs list.

thanks


>> +
>>                 btrfs_dev_replace_suspend_for_unmount(fs_info);
>>                 btrfs_scrub_cancel(fs_info);
>>                 btrfs_pause_balance(fs_info);
>> --
>> 2.4.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
Jeff Mahoney June 17, 2015, 2:32 p.m. UTC | #3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/17/15 9:24 AM, Filipe David Manana wrote:
> On Wed, Jun 17, 2015 at 11:04 AM, Filipe David Manana 
> <fdmanana@gmail.com> wrote:
>> On Mon, Jun 15, 2015 at 2:41 PM,  <jeffm@suse.com> wrote:
>>> From: Jeff Mahoney <jeffm@suse.com>
>>> 
>>> The cleaner thread may already be sleeping by the time we
>>> enter close_ctree.  If that's the case, we'll skip removing any
>>> unused block groups queued for removal, even during a normal
>>> umount. They'll be cleaned up automatically at next mount, but
>>> users expect a umount to be a clean synchronization point,
>>> especially when used on thin-provisioned storage with
>>> -odiscard.  We also explicitly remove unused block groups in
>>> the ro-remount path for the same reason.
>>> 
>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>> Reviewed-by: Filipe Manana <fdmanana@suse.com> Tested-by: Filipe
>> Manana <fdmanana@suse.com>
>> 
>>> --- fs/btrfs/disk-io.c |  9 +++++++++ fs/btrfs/super.c   | 11
>>> +++++++++++ 2 files changed, 20 insertions(+)
>>> 
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index
>>> 2ef9a4b..2e47fef 100644 --- a/fs/btrfs/disk-io.c +++
>>> b/fs/btrfs/disk-io.c @@ -3710,6 +3710,15 @@ void
>>> close_ctree(struct btrfs_root *root) 
>>> cancel_work_sync(&fs_info->async_reclaim_work);
>>> 
>>> if (!(fs_info->sb->s_flags & MS_RDONLY)) { +               /* +
>>> * If the cleaner thread is stopped and there are +
>>> * block groups queued for removal, the deletion will be +
>>> * skipped when we quit the cleaner thread. +                */ 
>>> +               mutex_lock(&root->fs_info->cleaner_mutex); +
>>> btrfs_delete_unused_bgs(root->fs_info); +
>>> mutex_unlock(&root->fs_info->cleaner_mutex); + ret =
>>> btrfs_commit_super(root); if (ret) btrfs_err(fs_info, "commit
>>> super ret %d", ret); diff --git a/fs/btrfs/super.c
>>> b/fs/btrfs/super.c index 9e66f5e..2ccd8d4 100644 ---
>>> a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1539,6 +1539,17
>>> @@ static int btrfs_remount(struct super_block *sb, int *flags,
>>> char *data)
>>> 
>>> sb->s_flags |= MS_RDONLY;
>>> 
>>> +               /* +                * Setting MS_RDONLY will
>>> put the cleaner thread to +                * sleep at the next
>>> loop if it's already active. +                * If it's already
>>> asleep, we'll leave unused block +                * groups on
>>> disk until we're mounted read-write again +                *
>>> unless we clean them up here. +                */ +
>>> mutex_lock(&root->fs_info->cleaner_mutex); +
>>> btrfs_delete_unused_bgs(fs_info); +
>>> mutex_unlock(&root->fs_info->cleaner_mutex);
> 
> So actually, this allows for a deadlock after the patch I sent out
> last week:
> 
> https://patchwork.kernel.org/patch/6586811/
> 
> In that patch delete_unused_bgs is no longer called under the 
> cleaner_mutex, and making it so, will cause a deadlock with/ru 
> relocation.
> 
> Even without that patch, I don't think you need using this mutex 
> anyway - no 2 tasks running this function can get the same bg from
> the fs_info->unused_bgs list.

I was hitting crashes during umount when xfstests would do remount-ro
and umount in quick succession.  I can go back and confirm this, but I
believe I was encountering a race between the cleaner thread and
umount after being set read-only.  It didn't trigger all the time.  My
hypothesis is that if the cleaner thread was running and had a lot of
work to do, it could start before set MS_RDONLY and still be
performing work through the remount and into the umount.  Ro-remount
would have set MS_RDONLY so we skip the btrfs_super_commit in
close_ctree and then blow up afterwards.

Taking the cleaner mutex means we either wait until the cleaner thread
has finished or we put it to sleep on the next loop before it does
anything.  In either case, it's safe.  It could just has easily been:

               mutex_lock(&root->fs_info->cleaner_mutex);
               mutex_unlock(&root->fs_info->cleaner_mutex);

               btrfs_delete_unused_bgs(fs_info);

I think it actually was in a previous version I was testing.  It
probably should go back to that version so that we don't end up
confusing it with the new mutex you introduced in your patch.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJVgYThAAoJEB57S2MheeWymvMP/jPnCFslZfEphccGlqsDUQeb
Ua9SVQJ5XjS0BbnVfuMGmzxew30BkUBdpnlWsufdVIKIeR9DNcvuDHJtcXMUI+Uw
FU/Asik//xiDPJ1hldPc4d0CJjsFBKHVLKjirkeE7kuvwa+XmfUYfKrhfzt6ZGvt
sWrCwMJRWFAS88ayR+NAelwaMzIy+Rbs5gZYg6dd2OCvIa4GuTh/szx8RaPOjNWQ
QcQHy2FlCcV/AtCA+ZaXh8NLmATIA8613biP7ATGIYHEdaZf7Oivov/u154QVwkt
c4omauofHKbBmlz2d//PS/T/n9nT7F7p1YvFaDnLLyQ0Ew3VBq+M9gyuWF8IGxti
iHdGkgQxnPSY0gGLA5bIt0D+su1RcTqa/71LOsBqbmk7KioNF4bp9FmaykHx2LAL
NpKGPD6BEcTTZAXfGdV6/IxTuii1temxcyawJrijakFTseA/GOmODI3K1kg+nLZA
OBjzFmzzLFir8SuiIWLO5ncbbsoM6rHhbl08DeKZ6tOH4JQm2ROciVgTn67SVxB5
bmjzl/zhhePfPgmbf5WoLsT4cbuGK00r+M3U79vzIjfEPmKAfGFbu9jEGPvQahKT
tOBRw7IaL8vCrBLFGUhhQzECwOK6Zms4r2ZTino30MwSNHegPjUbt8xDmFHw+gp3
Td6o4o23By9ygZgac0KI
=iHjc
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Mahoney June 17, 2015, 2:36 p.m. UTC | #4
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 6/17/15 10:32 AM, Jeff Mahoney wrote:
> On 6/17/15 9:24 AM, Filipe David Manana wrote:
>> On Wed, Jun 17, 2015 at 11:04 AM, Filipe David Manana 
>> <fdmanana@gmail.com> wrote:
>>> On Mon, Jun 15, 2015 at 2:41 PM,  <jeffm@suse.com> wrote:
>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>> 
>>>> The cleaner thread may already be sleeping by the time we 
>>>> enter close_ctree.  If that's the case, we'll skip removing
>>>> any unused block groups queued for removal, even during a
>>>> normal umount. They'll be cleaned up automatically at next
>>>> mount, but users expect a umount to be a clean
>>>> synchronization point, especially when used on
>>>> thin-provisioned storage with -odiscard.  We also explicitly
>>>> remove unused block groups in the ro-remount path for the
>>>> same reason.
>>>> 
>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>> Reviewed-by: Filipe Manana <fdmanana@suse.com> Tested-by:
>>> Filipe Manana <fdmanana@suse.com>
>>> 
>>>> --- fs/btrfs/disk-io.c |  9 +++++++++ fs/btrfs/super.c   |
>>>> 11 +++++++++++ 2 files changed, 20 insertions(+)
>>>> 
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 
>>>> 2ef9a4b..2e47fef 100644 --- a/fs/btrfs/disk-io.c +++ 
>>>> b/fs/btrfs/disk-io.c @@ -3710,6 +3710,15 @@ void 
>>>> close_ctree(struct btrfs_root *root) 
>>>> cancel_work_sync(&fs_info->async_reclaim_work);
>>>> 
>>>> if (!(fs_info->sb->s_flags & MS_RDONLY)) { +               /*
>>>> + * If the cleaner thread is stopped and there are + * block
>>>> groups queued for removal, the deletion will be + * skipped
>>>> when we quit the cleaner thread. +                */ +
>>>> mutex_lock(&root->fs_info->cleaner_mutex); + 
>>>> btrfs_delete_unused_bgs(root->fs_info); + 
>>>> mutex_unlock(&root->fs_info->cleaner_mutex); + ret = 
>>>> btrfs_commit_super(root); if (ret) btrfs_err(fs_info,
>>>> "commit super ret %d", ret); diff --git a/fs/btrfs/super.c 
>>>> b/fs/btrfs/super.c index 9e66f5e..2ccd8d4 100644 --- 
>>>> a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1539,6
>>>> +1539,17 @@ static int btrfs_remount(struct super_block *sb,
>>>> int *flags, char *data)
>>>> 
>>>> sb->s_flags |= MS_RDONLY;
>>>> 
>>>> +               /* +                * Setting MS_RDONLY will 
>>>> put the cleaner thread to +                * sleep at the
>>>> next loop if it's already active. +                * If it's
>>>> already asleep, we'll leave unused block +                *
>>>> groups on disk until we're mounted read-write again +
>>>> * unless we clean them up here. +                */ + 
>>>> mutex_lock(&root->fs_info->cleaner_mutex); + 
>>>> btrfs_delete_unused_bgs(fs_info); + 
>>>> mutex_unlock(&root->fs_info->cleaner_mutex);
> 
>> So actually, this allows for a deadlock after the patch I sent
>> out last week:
> 
>> https://patchwork.kernel.org/patch/6586811/
> 
>> In that patch delete_unused_bgs is no longer called under the 
>> cleaner_mutex, and making it so, will cause a deadlock with/ru 
>> relocation.
> 
>> Even without that patch, I don't think you need using this mutex
>>  anyway - no 2 tasks running this function can get the same bg
>> from the fs_info->unused_bgs list.
> 
> I was hitting crashes during umount when xfstests would do
> remount-ro and umount in quick succession.  I can go back and
> confirm this, but I believe I was encountering a race between the
> cleaner thread and umount after being set read-only.  It didn't
> trigger all the time.  My hypothesis is that if the cleaner thread
> was running and had a lot of work to do, it could start before set
> MS_RDONLY and still be performing work through the remount and into
> the umount.  Ro-remount would have set MS_RDONLY so we skip the
> btrfs_super_commit in close_ctree and then blow up afterwards.
> 
> Taking the cleaner mutex means we either wait until the cleaner
> thread has finished or we put it to sleep on the next loop before
> it does anything.  In either case, it's safe.  It could just has
> easily been:
> 
> mutex_lock(&root->fs_info->cleaner_mutex); 
> mutex_unlock(&root->fs_info->cleaner_mutex);
> 
> btrfs_delete_unused_bgs(fs_info);
> 
> I think it actually was in a previous version I was testing.  It 
> probably should go back to that version so that we don't end up 
> confusing it with the new mutex you introduced in your patch.

It looks like your:
[PATCH] Btrfs: fix crash on close_ctree() if cleaner starts new
transaction

would also fix this in a more general case.  We can drop taking the
cleaner mutex here.

- -Jeff

- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJVgYYDAAoJEB57S2MheeWyxIcQAIGwFvP1bL4C8Oa3WyFL/tjE
QITNDQZGYXEKfFqRWdHEAeFJ8kv234xo/tx7Ml0Txd8DFrqzDwXSxv6deLzDiiTT
gymMdBKO3x7TLKZTxnyDXYEUDHM72IMOUS2el3wOOsc61rL1KajFEWySGtAA80pk
bIUH6uosRTXhpXBRe080mc9XPhtfIQyCC8nroJHYazNwT3VWrvbhDaZPM3npNttj
5glsCz7ieseiWKqFCIlYC5yCgpst79U7D8M75Jo0yslvtZNpZOMR3YhvyQakj5hG
p/CFRfbdFGnl3wKv+ACyu7XlewqoA9LwkB5Sbjzd4XbS3n7J4gch043b+BbIl2SA
VghNTTI+tm7KKvMa3fghtedooVYu6DjdhU58VEWOBtHaDiWntSmd0FqzUCqAotxC
fwEmMWCWCWR1E0etRUrnbO1DGltkR38ost7cvXOPXUUdvv3Hy22mTfWW73YwsWXW
kwmG2V+IdgOWHDMxQCnj55/NbYep+/TiVjDPJnOuCn8tD5Tw+zHxtRbXhVcyKpGj
jJXKb9uxDhKpsisz8HQJHf1uMLFJ3qzCqgYxysbc2PqlzylFfY2aefYWSPmrE6y4
6OJW7gTr75PzrGGm7gM1sPiPQLuFNEFBEi0Ak7ad6Q6SAAV339r+h00sg4Q1adVu
2JedYHUeFDUjAGAgft0G
=SQ/a
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Filipe Manana June 17, 2015, 2:37 p.m. UTC | #5
On Wed, Jun 17, 2015 at 3:36 PM, Jeff Mahoney <jeffm@suse.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 6/17/15 10:32 AM, Jeff Mahoney wrote:
>> On 6/17/15 9:24 AM, Filipe David Manana wrote:
>>> On Wed, Jun 17, 2015 at 11:04 AM, Filipe David Manana
>>> <fdmanana@gmail.com> wrote:
>>>> On Mon, Jun 15, 2015 at 2:41 PM,  <jeffm@suse.com> wrote:
>>>>> From: Jeff Mahoney <jeffm@suse.com>
>>>>>
>>>>> The cleaner thread may already be sleeping by the time we
>>>>> enter close_ctree.  If that's the case, we'll skip removing
>>>>> any unused block groups queued for removal, even during a
>>>>> normal umount. They'll be cleaned up automatically at next
>>>>> mount, but users expect a umount to be a clean
>>>>> synchronization point, especially when used on
>>>>> thin-provisioned storage with -odiscard.  We also explicitly
>>>>> remove unused block groups in the ro-remount path for the
>>>>> same reason.
>>>>>
>>>>> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
>>>> Reviewed-by: Filipe Manana <fdmanana@suse.com> Tested-by:
>>>> Filipe Manana <fdmanana@suse.com>
>>>>
>>>>> --- fs/btrfs/disk-io.c |  9 +++++++++ fs/btrfs/super.c   |
>>>>> 11 +++++++++++ 2 files changed, 20 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index
>>>>> 2ef9a4b..2e47fef 100644 --- a/fs/btrfs/disk-io.c +++
>>>>> b/fs/btrfs/disk-io.c @@ -3710,6 +3710,15 @@ void
>>>>> close_ctree(struct btrfs_root *root)
>>>>> cancel_work_sync(&fs_info->async_reclaim_work);
>>>>>
>>>>> if (!(fs_info->sb->s_flags & MS_RDONLY)) { +               /*
>>>>> + * If the cleaner thread is stopped and there are + * block
>>>>> groups queued for removal, the deletion will be + * skipped
>>>>> when we quit the cleaner thread. +                */ +
>>>>> mutex_lock(&root->fs_info->cleaner_mutex); +
>>>>> btrfs_delete_unused_bgs(root->fs_info); +
>>>>> mutex_unlock(&root->fs_info->cleaner_mutex); + ret =
>>>>> btrfs_commit_super(root); if (ret) btrfs_err(fs_info,
>>>>> "commit super ret %d", ret); diff --git a/fs/btrfs/super.c
>>>>> b/fs/btrfs/super.c index 9e66f5e..2ccd8d4 100644 ---
>>>>> a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1539,6
>>>>> +1539,17 @@ static int btrfs_remount(struct super_block *sb,
>>>>> int *flags, char *data)
>>>>>
>>>>> sb->s_flags |= MS_RDONLY;
>>>>>
>>>>> +               /* +                * Setting MS_RDONLY will
>>>>> put the cleaner thread to +                * sleep at the
>>>>> next loop if it's already active. +                * If it's
>>>>> already asleep, we'll leave unused block +                *
>>>>> groups on disk until we're mounted read-write again +
>>>>> * unless we clean them up here. +                */ +
>>>>> mutex_lock(&root->fs_info->cleaner_mutex); +
>>>>> btrfs_delete_unused_bgs(fs_info); +
>>>>> mutex_unlock(&root->fs_info->cleaner_mutex);
>>
>>> So actually, this allows for a deadlock after the patch I sent
>>> out last week:
>>
>>> https://patchwork.kernel.org/patch/6586811/
>>
>>> In that patch delete_unused_bgs is no longer called under the
>>> cleaner_mutex, and making it so, will cause a deadlock with/ru
>>> relocation.
>>
>>> Even without that patch, I don't think you need using this mutex
>>>  anyway - no 2 tasks running this function can get the same bg
>>> from the fs_info->unused_bgs list.
>>
>> I was hitting crashes during umount when xfstests would do
>> remount-ro and umount in quick succession.  I can go back and
>> confirm this, but I believe I was encountering a race between the
>> cleaner thread and umount after being set read-only.  It didn't
>> trigger all the time.  My hypothesis is that if the cleaner thread
>> was running and had a lot of work to do, it could start before set
>> MS_RDONLY and still be performing work through the remount and into
>> the umount.  Ro-remount would have set MS_RDONLY so we skip the
>> btrfs_super_commit in close_ctree and then blow up afterwards.
>>
>> Taking the cleaner mutex means we either wait until the cleaner
>> thread has finished or we put it to sleep on the next loop before
>> it does anything.  In either case, it's safe.  It could just has
>> easily been:
>>
>> mutex_lock(&root->fs_info->cleaner_mutex);
>> mutex_unlock(&root->fs_info->cleaner_mutex);
>>
>> btrfs_delete_unused_bgs(fs_info);
>>
>> I think it actually was in a previous version I was testing.  It
>> probably should go back to that version so that we don't end up
>> confusing it with the new mutex you introduced in your patch.
>
> It looks like your:
> [PATCH] Btrfs: fix crash on close_ctree() if cleaner starts new
> transaction
>
> would also fix this in a more general case.  We can drop taking the
> cleaner mutex here.

Cool, thanks Jeff.

>
> - -Jeff
>
> - --
> Jeff Mahoney
> SUSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
>
> iQIcBAEBAgAGBQJVgYYDAAoJEB57S2MheeWyxIcQAIGwFvP1bL4C8Oa3WyFL/tjE
> QITNDQZGYXEKfFqRWdHEAeFJ8kv234xo/tx7Ml0Txd8DFrqzDwXSxv6deLzDiiTT
> gymMdBKO3x7TLKZTxnyDXYEUDHM72IMOUS2el3wOOsc61rL1KajFEWySGtAA80pk
> bIUH6uosRTXhpXBRe080mc9XPhtfIQyCC8nroJHYazNwT3VWrvbhDaZPM3npNttj
> 5glsCz7ieseiWKqFCIlYC5yCgpst79U7D8M75Jo0yslvtZNpZOMR3YhvyQakj5hG
> p/CFRfbdFGnl3wKv+ACyu7XlewqoA9LwkB5Sbjzd4XbS3n7J4gch043b+BbIl2SA
> VghNTTI+tm7KKvMa3fghtedooVYu6DjdhU58VEWOBtHaDiWntSmd0FqzUCqAotxC
> fwEmMWCWCWR1E0etRUrnbO1DGltkR38ost7cvXOPXUUdvv3Hy22mTfWW73YwsWXW
> kwmG2V+IdgOWHDMxQCnj55/NbYep+/TiVjDPJnOuCn8tD5Tw+zHxtRbXhVcyKpGj
> jJXKb9uxDhKpsisz8HQJHf1uMLFJ3qzCqgYxysbc2PqlzylFfY2aefYWSPmrE6y4
> 6OJW7gTr75PzrGGm7gM1sPiPQLuFNEFBEi0Ak7ad6Q6SAAV339r+h00sg4Q1adVu
> 2JedYHUeFDUjAGAgft0G
> =SQ/a
> -----END PGP SIGNATURE-----
diff mbox

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2ef9a4b..2e47fef 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3710,6 +3710,15 @@  void close_ctree(struct btrfs_root *root)
 	cancel_work_sync(&fs_info->async_reclaim_work);
 
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
+		/*
+		 * If the cleaner thread is stopped and there are
+		 * block groups queued for removal, the deletion will be
+		 * skipped when we quit the cleaner thread.
+		 */
+		mutex_lock(&root->fs_info->cleaner_mutex);
+		btrfs_delete_unused_bgs(root->fs_info);
+		mutex_unlock(&root->fs_info->cleaner_mutex);
+
 		ret = btrfs_commit_super(root);
 		if (ret)
 			btrfs_err(fs_info, "commit super ret %d", ret);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 9e66f5e..2ccd8d4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1539,6 +1539,17 @@  static int btrfs_remount(struct super_block *sb, int *flags, char *data)
 
 		sb->s_flags |= MS_RDONLY;
 
+		/*
+		 * Setting MS_RDONLY will put the cleaner thread to
+		 * sleep at the next loop if it's already active.
+		 * If it's already asleep, we'll leave unused block
+		 * groups on disk until we're mounted read-write again
+		 * unless we clean them up here.
+		 */
+		mutex_lock(&root->fs_info->cleaner_mutex);
+		btrfs_delete_unused_bgs(fs_info);
+		mutex_unlock(&root->fs_info->cleaner_mutex);
+
 		btrfs_dev_replace_suspend_for_unmount(fs_info);
 		btrfs_scrub_cancel(fs_info);
 		btrfs_pause_balance(fs_info);