bug? fstrim only trims unallocated space, not unused in bg's
diff mbox

Message ID 0f6c7514-1cf2-1d88-1307-c43f7642b525@gmx.com
State New
Headers show

Commit Message

Qu Wenruo Nov. 20, 2017, 2:42 a.m. UTC
On 2017年11月20日 10:24, Chris Murphy wrote:
> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>> fstrim should trim free space, but it only trims unallocated. This is
>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>> behaved this way with 4.12 also.
>>
>> Tested with 4.14-rc7, can't reproduce it.
> 
> $ sudo btrfs fi us /
> Overall:
>     Device size:          70.00GiB
>     Device allocated:          31.03GiB
>     Device unallocated:          38.97GiB
>     Device missing:             0.00B
>     Used:              22.12GiB
>     Free (estimated):          47.62GiB    (min: 47.62GiB)
> ...snip...
> 
> $ sudo fstrim -v /
> /: 39 GiB (41841328128 bytes) trimmed
> 
> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
> 
> block group offset   174202028032 len 1073741824 used   89206784
> chunk_objectid 256 flags 1 usage 0.08
> 
> And balance that block group:
> 
> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
> Done, had to relocate 1 out of 32 chunks
> 
> And trim again:
> 
> /: 39 GiB (41841328128 bytes) trimmed
> 
> 
>> Any special mount options or setup?
>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
> 
> 
> /dev/nvme0n1p8 on / type btrfs
> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)

Nothing special at all.

And unfortunately, no trace point inside btrfs_trim_block_group() at all.

But a quick glance shows me that, the loop to iterate existing block
groups to trim free space inside them has a return value overwrite bug.

So only unallocated space get trimmed.

Would you please try this diff to get the return value?

------
                        ret = btrfs_trim_block_group(cache,
@@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
struct fstrim_range *range)
                        trimmed += group_trimmed;
                        if (ret) {
                                btrfs_put_block_group(cache);
-                               break;
+                               goto out;
                        }
                }

@@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
struct fstrim_range *range)
        }
        mutex_unlock(&fs_info->fs_devices->device_list_mutex);

+out:
        range->len = trimmed;
        return ret;
 }
------

Thanks,
Qu

> 
> 
> Would a strace of fstrim help?
> 
>

Comments

Chris Murphy Nov. 20, 2017, 10:23 p.m. UTC | #1
On Sun, Nov 19, 2017 at 7:42 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017年11月20日 10:24, Chris Murphy wrote:
>> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>>> fstrim should trim free space, but it only trims unallocated. This is
>>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>>> behaved this way with 4.12 also.
>>>
>>> Tested with 4.14-rc7, can't reproduce it.
>>
>> $ sudo btrfs fi us /
>> Overall:
>>     Device size:          70.00GiB
>>     Device allocated:          31.03GiB
>>     Device unallocated:          38.97GiB
>>     Device missing:             0.00B
>>     Used:              22.12GiB
>>     Free (estimated):          47.62GiB    (min: 47.62GiB)
>> ...snip...
>>
>> $ sudo fstrim -v /
>> /: 39 GiB (41841328128 bytes) trimmed
>>
>> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
>>
>> block group offset   174202028032 len 1073741824 used   89206784
>> chunk_objectid 256 flags 1 usage 0.08
>>
>> And balance that block group:
>>
>> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
>> Done, had to relocate 1 out of 32 chunks
>>
>> And trim again:
>>
>> /: 39 GiB (41841328128 bytes) trimmed
>>
>>
>>> Any special mount options or setup?
>>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
>>
>>
>> /dev/nvme0n1p8 on / type btrfs
>> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)
>
> Nothing special at all.
>
> And unfortunately, no trace point inside btrfs_trim_block_group() at all.
>
> But a quick glance shows me that, the loop to iterate existing block
> groups to trim free space inside them has a return value overwrite bug.
>
> So only unallocated space get trimmed.
>
> Would you please try this diff to get the return value?
>
> ------
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 309a109069f1..dbec05dc8810 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info
> *fs_info, struct fstrim_range *range)
>                                 ret = cache_block_group(cache, 0);
>                                 if (ret) {
>                                         btrfs_put_block_group(cache);
> -                                       break;
> +                                       goto out;
>                                 }
>                                 ret = wait_block_group_cache_done(cache);
>                                 if (ret) {
>                                         btrfs_put_block_group(cache);
> -                                       break;
> +                                       goto out;
>                                 }
>                         }
>                         ret = btrfs_trim_block_group(cache,
> @@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
> struct fstrim_range *range)
>                         trimmed += group_trimmed;
>                         if (ret) {
>                                 btrfs_put_block_group(cache);
> -                               break;
> +                               goto out;
>                         }
>                 }
>
> @@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
> struct fstrim_range *range)
>         }
>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>
> +out:
>         range->len = trimmed;
>         return ret;
>  }
> ------

This won't apply on tag v4.14 for some reason.

[chris@f27s linux]$ git apply -v ~/qutrim1.patch
Checking patch fs/btrfs/extent-tree.c...
error: while searching for:
                               ret = cache_block_group(cache, 0);
                               if (ret) {
                                       btrfs_put_block_group(cache);
                                       break;
                               }
                               ret = wait_block_group_cache_done(cache);
                               if (ret) {
                                       btrfs_put_block_group(cache);
                                       break;
                               }
                       }
                       ret = btrfs_trim_block_group(cache,

error: patch failed: fs/btrfs/extent-tree.c:10983
error: fs/btrfs/extent-tree.c: patch does not apply
[chris@f27s linux]$


If I do it manually (just adding the goto and build it, reboot, I
still get the same result for fstrim and nothing in dmesg.
Qu Wenruo Nov. 21, 2017, 1:16 a.m. UTC | #2
On 2017年11月21日 06:23, Chris Murphy wrote:
> On Sun, Nov 19, 2017 at 7:42 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2017年11月20日 10:24, Chris Murphy wrote:
>>> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>>>> fstrim should trim free space, but it only trims unallocated. This is
>>>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>>>> behaved this way with 4.12 also.
>>>>
>>>> Tested with 4.14-rc7, can't reproduce it.
>>>
>>> $ sudo btrfs fi us /
>>> Overall:
>>>     Device size:          70.00GiB
>>>     Device allocated:          31.03GiB
>>>     Device unallocated:          38.97GiB
>>>     Device missing:             0.00B
>>>     Used:              22.12GiB
>>>     Free (estimated):          47.62GiB    (min: 47.62GiB)
>>> ...snip...
>>>
>>> $ sudo fstrim -v /
>>> /: 39 GiB (41841328128 bytes) trimmed
>>>
>>> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
>>>
>>> block group offset   174202028032 len 1073741824 used   89206784
>>> chunk_objectid 256 flags 1 usage 0.08
>>>
>>> And balance that block group:
>>>
>>> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
>>> Done, had to relocate 1 out of 32 chunks
>>>
>>> And trim again:
>>>
>>> /: 39 GiB (41841328128 bytes) trimmed
>>>
>>>
>>>> Any special mount options or setup?
>>>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
>>>
>>>
>>> /dev/nvme0n1p8 on / type btrfs
>>> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)
>>
>> Nothing special at all.
>>
>> And unfortunately, no trace point inside btrfs_trim_block_group() at all.
>>
>> But a quick glance shows me that, the loop to iterate existing block
>> groups to trim free space inside them has a return value overwrite bug.
>>
>> So only unallocated space get trimmed.
>>
>> Would you please try this diff to get the return value?
>>
>> ------
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 309a109069f1..dbec05dc8810 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info
>> *fs_info, struct fstrim_range *range)
>>                                 ret = cache_block_group(cache, 0);
>>                                 if (ret) {
>>                                         btrfs_put_block_group(cache);
>> -                                       break;
>> +                                       goto out;
>>                                 }
>>                                 ret = wait_block_group_cache_done(cache);
>>                                 if (ret) {
>>                                         btrfs_put_block_group(cache);
>> -                                       break;
>> +                                       goto out;
>>                                 }
>>                         }
>>                         ret = btrfs_trim_block_group(cache,
>> @@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>> struct fstrim_range *range)
>>                         trimmed += group_trimmed;
>>                         if (ret) {
>>                                 btrfs_put_block_group(cache);
>> -                               break;
>> +                               goto out;
>>                         }
>>                 }
>>
>> @@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>> struct fstrim_range *range)
>>         }
>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>
>> +out:
>>         range->len = trimmed;
>>         return ret;
>>  }
>> ------
> 
> This won't apply on tag v4.14 for some reason.
> 
> [chris@f27s linux]$ git apply -v ~/qutrim1.patch
> Checking patch fs/btrfs/extent-tree.c...
> error: while searching for:
>                                ret = cache_block_group(cache, 0);
>                                if (ret) {
>                                        btrfs_put_block_group(cache);
>                                        break;
>                                }
>                                ret = wait_block_group_cache_done(cache);
>                                if (ret) {
>                                        btrfs_put_block_group(cache);
>                                        break;
>                                }
>                        }
>                        ret = btrfs_trim_block_group(cache,
> 
> error: patch failed: fs/btrfs/extent-tree.c:10983
> error: fs/btrfs/extent-tree.c: patch does not apply
> [chris@f27s linux]$
> 
> 
> If I do it manually (just adding the goto and build it, reboot, I
> still get the same result for fstrim and nothing in dmesg.

Sorry, that diff will not output extra info. Just to abort the process
and return true error code.

I have update the patch to output more verbose output.
You could find it in patchwork:
https://patchwork.kernel.org/patch/10065991/

Thanks,
Qu
Chris Murphy Nov. 21, 2017, 4:06 a.m. UTC | #3
On Mon, Nov 20, 2017 at 6:16 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017年11月21日 06:23, Chris Murphy wrote:
>> On Sun, Nov 19, 2017 at 7:42 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>> On 2017年11月20日 10:24, Chris Murphy wrote:
>>>> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>>>>> fstrim should trim free space, but it only trims unallocated. This is
>>>>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>>>>> behaved this way with 4.12 also.
>>>>>
>>>>> Tested with 4.14-rc7, can't reproduce it.
>>>>
>>>> $ sudo btrfs fi us /
>>>> Overall:
>>>>     Device size:          70.00GiB
>>>>     Device allocated:          31.03GiB
>>>>     Device unallocated:          38.97GiB
>>>>     Device missing:             0.00B
>>>>     Used:              22.12GiB
>>>>     Free (estimated):          47.62GiB    (min: 47.62GiB)
>>>> ...snip...
>>>>
>>>> $ sudo fstrim -v /
>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>
>>>> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
>>>>
>>>> block group offset   174202028032 len 1073741824 used   89206784
>>>> chunk_objectid 256 flags 1 usage 0.08
>>>>
>>>> And balance that block group:
>>>>
>>>> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
>>>> Done, had to relocate 1 out of 32 chunks
>>>>
>>>> And trim again:
>>>>
>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>
>>>>
>>>>> Any special mount options or setup?
>>>>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
>>>>
>>>>
>>>> /dev/nvme0n1p8 on / type btrfs
>>>> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)
>>>
>>> Nothing special at all.
>>>
>>> And unfortunately, no trace point inside btrfs_trim_block_group() at all.
>>>
>>> But a quick glance shows me that, the loop to iterate existing block
>>> groups to trim free space inside them has a return value overwrite bug.
>>>
>>> So only unallocated space get trimmed.
>>>
>>> Would you please try this diff to get the return value?
>>>
>>> ------
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 309a109069f1..dbec05dc8810 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info
>>> *fs_info, struct fstrim_range *range)
>>>                                 ret = cache_block_group(cache, 0);
>>>                                 if (ret) {
>>>                                         btrfs_put_block_group(cache);
>>> -                                       break;
>>> +                                       goto out;
>>>                                 }
>>>                                 ret = wait_block_group_cache_done(cache);
>>>                                 if (ret) {
>>>                                         btrfs_put_block_group(cache);
>>> -                                       break;
>>> +                                       goto out;
>>>                                 }
>>>                         }
>>>                         ret = btrfs_trim_block_group(cache,
>>> @@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>> struct fstrim_range *range)
>>>                         trimmed += group_trimmed;
>>>                         if (ret) {
>>>                                 btrfs_put_block_group(cache);
>>> -                               break;
>>> +                               goto out;
>>>                         }
>>>                 }
>>>
>>> @@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>> struct fstrim_range *range)
>>>         }
>>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>>
>>> +out:
>>>         range->len = trimmed;
>>>         return ret;
>>>  }
>>> ------
>>
>> This won't apply on tag v4.14 for some reason.
>>
>> [chris@f27s linux]$ git apply -v ~/qutrim1.patch
>> Checking patch fs/btrfs/extent-tree.c...
>> error: while searching for:
>>                                ret = cache_block_group(cache, 0);
>>                                if (ret) {
>>                                        btrfs_put_block_group(cache);
>>                                        break;
>>                                }
>>                                ret = wait_block_group_cache_done(cache);
>>                                if (ret) {
>>                                        btrfs_put_block_group(cache);
>>                                        break;
>>                                }
>>                        }
>>                        ret = btrfs_trim_block_group(cache,
>>
>> error: patch failed: fs/btrfs/extent-tree.c:10983
>> error: fs/btrfs/extent-tree.c: patch does not apply
>> [chris@f27s linux]$
>>
>>
>> If I do it manually (just adding the goto and build it, reboot, I
>> still get the same result for fstrim and nothing in dmesg.
>
> Sorry, that diff will not output extra info. Just to abort the process
> and return true error code.

OK? It didn't seem to do that either. I see no change.


>
> I have update the patch to output more verbose output.
> You could find it in patchwork:
> https://patchwork.kernel.org/patch/10065991/

Patch applies on v4.14.0, and still nothing in dmesg, or in user space
when issuing fstrim.

# fstrim -v /
/: 38 GiB (40767586304 bytes) trimmed
# dmesg | grep -i btrfs
[    2.745902] btrfs: loading out-of-tree module taints kernel.
[    2.749905] Btrfs loaded, crc32c=crc32c-intel
[    2.751072] BTRFS: device label fedora devid 1 transid 252048 /dev/nvme0n1p8
[    4.295891] BTRFS info (device nvme0n1p8): disk space caching is enabled
[    4.295892] BTRFS info (device nvme0n1p8): has skinny extents
[    4.307326] BTRFS info (device nvme0n1p8): enabling ssd optimizations
[    4.959467] BTRFS info (device nvme0n1p8): disk space caching is enabled
[root@f27h ~]#


Pretty sure the patch is applied because of the first message about
the out of tree module, which I do not get with Fedora kernels.
Chris Murphy Nov. 21, 2017, 4:09 a.m. UTC | #4
Do I need btrfs debug stuff enabled for this patch to work?

$ grep -i btrfs /home/chris/linux/.config
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
# CONFIG_BTRFS_FS_CHECK_INTEGRITY is not set
# CONFIG_BTRFS_FS_RUN_SANITY_TESTS is not set
# CONFIG_BTRFS_DEBUG is not set
# CONFIG_BTRFS_ASSERT is not set
$
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 309a109069f1..dbec05dc8810 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10983,12 +10983,12 @@  int btrfs_trim_fs(struct btrfs_fs_info
*fs_info, struct fstrim_range *range)
                                ret = cache_block_group(cache, 0);
                                if (ret) {
                                        btrfs_put_block_group(cache);
-                                       break;
+                                       goto out;
                                }
                                ret = wait_block_group_cache_done(cache);
                                if (ret) {
                                        btrfs_put_block_group(cache);
-                                       break;
+                                       goto out;
                                }
                        }