diff mbox

bug? fstrim only trims unallocated space, not unused in bg's

Message ID af39cb86-693c-2f3b-7831-60968d3b5c3d@gmx.com (mailing list archive)
State New, archived
Headers show

Commit Message

Qu Wenruo Nov. 21, 2017, 4:29 a.m. UTC
On 2017年11月21日 12:06, Chris Murphy wrote:
> On Mon, Nov 20, 2017 at 6:16 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2017年11月21日 06:23, Chris Murphy wrote:
>>> On Sun, Nov 19, 2017 at 7:42 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>> On 2017年11月20日 10:24, Chris Murphy wrote:
>>>>> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>>>>>> fstrim should trim free space, but it only trims unallocated. This is
>>>>>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>>>>>> behaved this way with 4.12 also.
>>>>>>
>>>>>> Tested with 4.14-rc7, can't reproduce it.
>>>>>
>>>>> $ sudo btrfs fi us /
>>>>> Overall:
>>>>>     Device size:          70.00GiB
>>>>>     Device allocated:          31.03GiB
>>>>>     Device unallocated:          38.97GiB
>>>>>     Device missing:             0.00B
>>>>>     Used:              22.12GiB
>>>>>     Free (estimated):          47.62GiB    (min: 47.62GiB)
>>>>> ...snip...
>>>>>
>>>>> $ sudo fstrim -v /
>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>
>>>>> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
>>>>>
>>>>> block group offset   174202028032 len 1073741824 used   89206784
>>>>> chunk_objectid 256 flags 1 usage 0.08
>>>>>
>>>>> And balance that block group:
>>>>>
>>>>> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
>>>>> Done, had to relocate 1 out of 32 chunks
>>>>>
>>>>> And trim again:
>>>>>
>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>
>>>>>
>>>>>> Any special mount options or setup?
>>>>>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
>>>>>
>>>>>
>>>>> /dev/nvme0n1p8 on / type btrfs
>>>>> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)
>>>>
>>>> Nothing special at all.
>>>>
>>>> And unfortunately, no trace point inside btrfs_trim_block_group() at all.
>>>>
>>>> But a quick glance shows me that, the loop to iterate existing block
>>>> groups to trim free space inside them has a return value overwrite bug.
>>>>
>>>> So only unallocated space get trimmed.
>>>>
>>>> Would you please try this diff to get the return value?
>>>>
>>>> ------
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 309a109069f1..dbec05dc8810 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info
>>>> *fs_info, struct fstrim_range *range)
>>>>                                 ret = cache_block_group(cache, 0);
>>>>                                 if (ret) {
>>>>                                         btrfs_put_block_group(cache);
>>>> -                                       break;
>>>> +                                       goto out;
>>>>                                 }
>>>>                                 ret = wait_block_group_cache_done(cache);
>>>>                                 if (ret) {
>>>>                                         btrfs_put_block_group(cache);
>>>> -                                       break;
>>>> +                                       goto out;
>>>>                                 }
>>>>                         }
>>>>                         ret = btrfs_trim_block_group(cache,
>>>> @@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>> struct fstrim_range *range)
>>>>                         trimmed += group_trimmed;
>>>>                         if (ret) {
>>>>                                 btrfs_put_block_group(cache);
>>>> -                               break;
>>>> +                               goto out;
>>>>                         }
>>>>                 }
>>>>
>>>> @@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>> struct fstrim_range *range)
>>>>         }
>>>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>>>
>>>> +out:
>>>>         range->len = trimmed;
>>>>         return ret;
>>>>  }
>>>> ------
>>>
>>> This won't apply on tag v4.14 for some reason.
>>>
>>> [chris@f27s linux]$ git apply -v ~/qutrim1.patch
>>> Checking patch fs/btrfs/extent-tree.c...
>>> error: while searching for:
>>>                                ret = cache_block_group(cache, 0);
>>>                                if (ret) {
>>>                                        btrfs_put_block_group(cache);
>>>                                        break;
>>>                                }
>>>                                ret = wait_block_group_cache_done(cache);
>>>                                if (ret) {
>>>                                        btrfs_put_block_group(cache);
>>>                                        break;
>>>                                }
>>>                        }
>>>                        ret = btrfs_trim_block_group(cache,
>>>
>>> error: patch failed: fs/btrfs/extent-tree.c:10983
>>> error: fs/btrfs/extent-tree.c: patch does not apply
>>> [chris@f27s linux]$
>>>
>>>
>>> If I do it manually (just adding the goto and build it, reboot, I
>>> still get the same result for fstrim and nothing in dmesg.
>>
>> Sorry, that diff will not output extra info. Just to abort the process
>> and return true error code.
> 
> OK? It didn't seem to do that either. I see no change.
> 
> 
>>
>> I have update the patch to output more verbose output.
>> You could find it in patchwork:
>> https://patchwork.kernel.org/patch/10065991/
> 
> Patch applies on v4.14.0, and still nothing in dmesg, or in user space
> when issuing fstrim.
> 
> # fstrim -v /
> /: 38 GiB (40767586304 bytes) trimmed
> # dmesg | grep -i btrfs
> [    2.745902] btrfs: loading out-of-tree module taints kernel.
> [    2.749905] Btrfs loaded, crc32c=crc32c-intel
> [    2.751072] BTRFS: device label fedora devid 1 transid 252048 /dev/nvme0n1p8
> [    4.295891] BTRFS info (device nvme0n1p8): disk space caching is enabled
> [    4.295892] BTRFS info (device nvme0n1p8): has skinny extents
> [    4.307326] BTRFS info (device nvme0n1p8): enabling ssd optimizations
> [    4.959467] BTRFS info (device nvme0n1p8): disk space caching is enabled
> [root@f27h ~]#
> 
> 
> Pretty sure the patch is applied because of the first message about
> the out of tree module, which I do not get with Fedora kernels.

So that's not something wrong happened to make you skip trimming one
chunk, but something else just skipped the block group trimming.

And I don't think DEBUG config is related to this.

I doubt if it's the @fstrim_range passed in has something strange that
prevent us from trimming block groups.

Would you please try this diff based on the patch from patchwork?

------
         * try to trim all FS space, our block group may start from
non-zero.
         */
@@ -10981,6 +10983,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
struct fstrim_range *range)
                cache = btrfs_lookup_block_group(fs_info, range->start);

        for (; cache; cache = next_block_group(fs_info, cache)) {
+               btrfs_info(fs_info, "bg start=%llu len=%llu",
+                          cache->key.objectid, cache->key.offset);
                if (cache->key.objectid >= (range->start + range->len)) {
                        btrfs_put_block_group(cache);
                        break;
@@ -11045,6 +11049,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
struct fstrim_range *range)
        mutex_unlock(&fs_info->fs_devices->device_list_mutex);

        range->len = trimmed;
+       btrfs_info(fs_info, "trimming done");
        if (bg_ret)
                return bg_ret;
        return dev_ret;

------

Thanks,
Qu

Comments

Chris Murphy Nov. 21, 2017, 4:34 a.m. UTC | #1
On Mon, Nov 20, 2017 at 9:29 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017年11月21日 12:06, Chris Murphy wrote:
>> On Mon, Nov 20, 2017 at 6:16 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>> On 2017年11月21日 06:23, Chris Murphy wrote:
>>>> On Sun, Nov 19, 2017 at 7:42 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>> On 2017年11月20日 10:24, Chris Murphy wrote:
>>>>>> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>>>>>>> fstrim should trim free space, but it only trims unallocated. This is
>>>>>>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>>>>>>> behaved this way with 4.12 also.
>>>>>>>
>>>>>>> Tested with 4.14-rc7, can't reproduce it.
>>>>>>
>>>>>> $ sudo btrfs fi us /
>>>>>> Overall:
>>>>>>     Device size:          70.00GiB
>>>>>>     Device allocated:          31.03GiB
>>>>>>     Device unallocated:          38.97GiB
>>>>>>     Device missing:             0.00B
>>>>>>     Used:              22.12GiB
>>>>>>     Free (estimated):          47.62GiB    (min: 47.62GiB)
>>>>>> ...snip...
>>>>>>
>>>>>> $ sudo fstrim -v /
>>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>>
>>>>>> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
>>>>>>
>>>>>> block group offset   174202028032 len 1073741824 used   89206784
>>>>>> chunk_objectid 256 flags 1 usage 0.08
>>>>>>
>>>>>> And balance that block group:
>>>>>>
>>>>>> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
>>>>>> Done, had to relocate 1 out of 32 chunks
>>>>>>
>>>>>> And trim again:
>>>>>>
>>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>>
>>>>>>
>>>>>>> Any special mount options or setup?
>>>>>>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
>>>>>>
>>>>>>
>>>>>> /dev/nvme0n1p8 on / type btrfs
>>>>>> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)
>>>>>
>>>>> Nothing special at all.
>>>>>
>>>>> And unfortunately, no trace point inside btrfs_trim_block_group() at all.
>>>>>
>>>>> But a quick glance shows me that, the loop to iterate existing block
>>>>> groups to trim free space inside them has a return value overwrite bug.
>>>>>
>>>>> So only unallocated space get trimmed.
>>>>>
>>>>> Would you please try this diff to get the return value?
>>>>>
>>>>> ------
>>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>>> index 309a109069f1..dbec05dc8810 100644
>>>>> --- a/fs/btrfs/extent-tree.c
>>>>> +++ b/fs/btrfs/extent-tree.c
>>>>> @@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info
>>>>> *fs_info, struct fstrim_range *range)
>>>>>                                 ret = cache_block_group(cache, 0);
>>>>>                                 if (ret) {
>>>>>                                         btrfs_put_block_group(cache);
>>>>> -                                       break;
>>>>> +                                       goto out;
>>>>>                                 }
>>>>>                                 ret = wait_block_group_cache_done(cache);
>>>>>                                 if (ret) {
>>>>>                                         btrfs_put_block_group(cache);
>>>>> -                                       break;
>>>>> +                                       goto out;
>>>>>                                 }
>>>>>                         }
>>>>>                         ret = btrfs_trim_block_group(cache,
>>>>> @@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>>> struct fstrim_range *range)
>>>>>                         trimmed += group_trimmed;
>>>>>                         if (ret) {
>>>>>                                 btrfs_put_block_group(cache);
>>>>> -                               break;
>>>>> +                               goto out;
>>>>>                         }
>>>>>                 }
>>>>>
>>>>> @@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>>> struct fstrim_range *range)
>>>>>         }
>>>>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>>>>
>>>>> +out:
>>>>>         range->len = trimmed;
>>>>>         return ret;
>>>>>  }
>>>>> ------
>>>>
>>>> This won't apply on tag v4.14 for some reason.
>>>>
>>>> [chris@f27s linux]$ git apply -v ~/qutrim1.patch
>>>> Checking patch fs/btrfs/extent-tree.c...
>>>> error: while searching for:
>>>>                                ret = cache_block_group(cache, 0);
>>>>                                if (ret) {
>>>>                                        btrfs_put_block_group(cache);
>>>>                                        break;
>>>>                                }
>>>>                                ret = wait_block_group_cache_done(cache);
>>>>                                if (ret) {
>>>>                                        btrfs_put_block_group(cache);
>>>>                                        break;
>>>>                                }
>>>>                        }
>>>>                        ret = btrfs_trim_block_group(cache,
>>>>
>>>> error: patch failed: fs/btrfs/extent-tree.c:10983
>>>> error: fs/btrfs/extent-tree.c: patch does not apply
>>>> [chris@f27s linux]$
>>>>
>>>>
>>>> If I do it manually (just adding the goto and build it, reboot, I
>>>> still get the same result for fstrim and nothing in dmesg.
>>>
>>> Sorry, that diff will not output extra info. Just to abort the process
>>> and return true error code.
>>
>> OK? It didn't seem to do that either. I see no change.
>>
>>
>>>
>>> I have update the patch to output more verbose output.
>>> You could find it in patchwork:
>>> https://patchwork.kernel.org/patch/10065991/
>>
>> Patch applies on v4.14.0, and still nothing in dmesg, or in user space
>> when issuing fstrim.
>>
>> # fstrim -v /
>> /: 38 GiB (40767586304 bytes) trimmed
>> # dmesg | grep -i btrfs
>> [    2.745902] btrfs: loading out-of-tree module taints kernel.
>> [    2.749905] Btrfs loaded, crc32c=crc32c-intel
>> [    2.751072] BTRFS: device label fedora devid 1 transid 252048 /dev/nvme0n1p8
>> [    4.295891] BTRFS info (device nvme0n1p8): disk space caching is enabled
>> [    4.295892] BTRFS info (device nvme0n1p8): has skinny extents
>> [    4.307326] BTRFS info (device nvme0n1p8): enabling ssd optimizations
>> [    4.959467] BTRFS info (device nvme0n1p8): disk space caching is enabled
>> [root@f27h ~]#
>>
>>
>> Pretty sure the patch is applied because of the first message about
>> the out of tree module, which I do not get with Fedora kernels.
>
> So that's not something wrong happened to make you skip trimming one
> chunk, but something else just skipped the block group trimming.
>
> And I don't think DEBUG config is related to this.
>
> I doubt if it's the @fstrim_range passed in has something strange that
> prevent us from trimming block groups.
>
> Would you please try this diff based on the patch from patchwork?

Apply in addition to previous patch? Or apply to clean v4.14?


>
> ------
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index f830aa91ac3d..a4bf29a4a860 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10972,6 +10972,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
> struct fstrim_range *range)
>         int dev_ret = 0;
>         int ret = 0;
>
> +       btrfs_info(fs_info, "trimming btrfs, start=%llu len=%llu
> minlen=%llu",
> +                  range->start, range->len, range->minlen);
>         /*
>          * try to trim all FS space, our block group may start from
> non-zero.
>          */
> @@ -10981,6 +10983,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
> struct fstrim_range *range)
>                 cache = btrfs_lookup_block_group(fs_info, range->start);
>
>         for (; cache; cache = next_block_group(fs_info, cache)) {
> +               btrfs_info(fs_info, "bg start=%llu len=%llu",
> +                          cache->key.objectid, cache->key.offset);
>                 if (cache->key.objectid >= (range->start + range->len)) {
>                         btrfs_put_block_group(cache);
>                         break;
> @@ -11045,6 +11049,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
> struct fstrim_range *range)
>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>
>         range->len = trimmed;
> +       btrfs_info(fs_info, "trimming done");
>         if (bg_ret)
>                 return bg_ret;
>         return dev_ret;
>
> ------
>
> Thanks,
> Qu
>
Qu Wenruo Nov. 21, 2017, 4:43 a.m. UTC | #2
On 2017年11月21日 12:34, Chris Murphy wrote:
> On Mon, Nov 20, 2017 at 9:29 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2017年11月21日 12:06, Chris Murphy wrote:
>>> On Mon, Nov 20, 2017 at 6:16 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>> On 2017年11月21日 06:23, Chris Murphy wrote:
>>>>> On Sun, Nov 19, 2017 at 7:42 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 2017年11月20日 10:24, Chris Murphy wrote:
>>>>>>> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>>>>>>>> fstrim should trim free space, but it only trims unallocated. This is
>>>>>>>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>>>>>>>> behaved this way with 4.12 also.
>>>>>>>>
>>>>>>>> Tested with 4.14-rc7, can't reproduce it.
>>>>>>>
>>>>>>> $ sudo btrfs fi us /
>>>>>>> Overall:
>>>>>>>     Device size:          70.00GiB
>>>>>>>     Device allocated:          31.03GiB
>>>>>>>     Device unallocated:          38.97GiB
>>>>>>>     Device missing:             0.00B
>>>>>>>     Used:              22.12GiB
>>>>>>>     Free (estimated):          47.62GiB    (min: 47.62GiB)
>>>>>>> ...snip...
>>>>>>>
>>>>>>> $ sudo fstrim -v /
>>>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>>>
>>>>>>> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
>>>>>>>
>>>>>>> block group offset   174202028032 len 1073741824 used   89206784
>>>>>>> chunk_objectid 256 flags 1 usage 0.08
>>>>>>>
>>>>>>> And balance that block group:
>>>>>>>
>>>>>>> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
>>>>>>> Done, had to relocate 1 out of 32 chunks
>>>>>>>
>>>>>>> And trim again:
>>>>>>>
>>>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>>>
>>>>>>>
>>>>>>>> Any special mount options or setup?
>>>>>>>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
>>>>>>>
>>>>>>>
>>>>>>> /dev/nvme0n1p8 on / type btrfs
>>>>>>> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)
>>>>>>
>>>>>> Nothing special at all.
>>>>>>
>>>>>> And unfortunately, no trace point inside btrfs_trim_block_group() at all.
>>>>>>
>>>>>> But a quick glance shows me that, the loop to iterate existing block
>>>>>> groups to trim free space inside them has a return value overwrite bug.
>>>>>>
>>>>>> So only unallocated space get trimmed.
>>>>>>
>>>>>> Would you please try this diff to get the return value?
>>>>>>
>>>>>> ------
>>>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>>>> index 309a109069f1..dbec05dc8810 100644
>>>>>> --- a/fs/btrfs/extent-tree.c
>>>>>> +++ b/fs/btrfs/extent-tree.c
>>>>>> @@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info
>>>>>> *fs_info, struct fstrim_range *range)
>>>>>>                                 ret = cache_block_group(cache, 0);
>>>>>>                                 if (ret) {
>>>>>>                                         btrfs_put_block_group(cache);
>>>>>> -                                       break;
>>>>>> +                                       goto out;
>>>>>>                                 }
>>>>>>                                 ret = wait_block_group_cache_done(cache);
>>>>>>                                 if (ret) {
>>>>>>                                         btrfs_put_block_group(cache);
>>>>>> -                                       break;
>>>>>> +                                       goto out;
>>>>>>                                 }
>>>>>>                         }
>>>>>>                         ret = btrfs_trim_block_group(cache,
>>>>>> @@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>>>> struct fstrim_range *range)
>>>>>>                         trimmed += group_trimmed;
>>>>>>                         if (ret) {
>>>>>>                                 btrfs_put_block_group(cache);
>>>>>> -                               break;
>>>>>> +                               goto out;
>>>>>>                         }
>>>>>>                 }
>>>>>>
>>>>>> @@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>>>> struct fstrim_range *range)
>>>>>>         }
>>>>>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>>>>>
>>>>>> +out:
>>>>>>         range->len = trimmed;
>>>>>>         return ret;
>>>>>>  }
>>>>>> ------
>>>>>
>>>>> This won't apply on tag v4.14 for some reason.
>>>>>
>>>>> [chris@f27s linux]$ git apply -v ~/qutrim1.patch
>>>>> Checking patch fs/btrfs/extent-tree.c...
>>>>> error: while searching for:
>>>>>                                ret = cache_block_group(cache, 0);
>>>>>                                if (ret) {
>>>>>                                        btrfs_put_block_group(cache);
>>>>>                                        break;
>>>>>                                }
>>>>>                                ret = wait_block_group_cache_done(cache);
>>>>>                                if (ret) {
>>>>>                                        btrfs_put_block_group(cache);
>>>>>                                        break;
>>>>>                                }
>>>>>                        }
>>>>>                        ret = btrfs_trim_block_group(cache,
>>>>>
>>>>> error: patch failed: fs/btrfs/extent-tree.c:10983
>>>>> error: fs/btrfs/extent-tree.c: patch does not apply
>>>>> [chris@f27s linux]$
>>>>>
>>>>>
>>>>> If I do it manually (just adding the goto and build it, reboot, I
>>>>> still get the same result for fstrim and nothing in dmesg.
>>>>
>>>> Sorry, that diff will not output extra info. Just to abort the process
>>>> and return true error code.
>>>
>>> OK? It didn't seem to do that either. I see no change.
>>>
>>>
>>>>
>>>> I have update the patch to output more verbose output.
>>>> You could find it in patchwork:
>>>> https://patchwork.kernel.org/patch/10065991/
>>>
>>> Patch applies on v4.14.0, and still nothing in dmesg, or in user space
>>> when issuing fstrim.
>>>
>>> # fstrim -v /
>>> /: 38 GiB (40767586304 bytes) trimmed
>>> # dmesg | grep -i btrfs
>>> [    2.745902] btrfs: loading out-of-tree module taints kernel.
>>> [    2.749905] Btrfs loaded, crc32c=crc32c-intel
>>> [    2.751072] BTRFS: device label fedora devid 1 transid 252048 /dev/nvme0n1p8
>>> [    4.295891] BTRFS info (device nvme0n1p8): disk space caching is enabled
>>> [    4.295892] BTRFS info (device nvme0n1p8): has skinny extents
>>> [    4.307326] BTRFS info (device nvme0n1p8): enabling ssd optimizations
>>> [    4.959467] BTRFS info (device nvme0n1p8): disk space caching is enabled
>>> [root@f27h ~]#
>>>
>>>
>>> Pretty sure the patch is applied because of the first message about
>>> the out of tree module, which I do not get with Fedora kernels.
>>
>> So that's not something wrong happened to make you skip trimming one
>> chunk, but something else just skipped the block group trimming.
>>
>> And I don't think DEBUG config is related to this.
>>
>> I doubt if it's the @fstrim_range passed in has something strange that
>> prevent us from trimming block groups.
>>
>> Would you please try this diff based on the patch from patchwork?
> 
> Apply in addition to previous patch? Or apply to clean v4.14?

On previous patch.

And BTW, according to your sysfs discard output, it seems that you're
using Intel 600P 256G NVME ssd, which I'm also using.

I will also try to create such case on my side.

Thanks,
Qu
> 
> 
>>
>> ------
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index f830aa91ac3d..a4bf29a4a860 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -10972,6 +10972,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>> struct fstrim_range *range)
>>         int dev_ret = 0;
>>         int ret = 0;
>>
>> +       btrfs_info(fs_info, "trimming btrfs, start=%llu len=%llu
>> minlen=%llu",
>> +                  range->start, range->len, range->minlen);
>>         /*
>>          * try to trim all FS space, our block group may start from
>> non-zero.
>>          */
>> @@ -10981,6 +10983,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>> struct fstrim_range *range)
>>                 cache = btrfs_lookup_block_group(fs_info, range->start);
>>
>>         for (; cache; cache = next_block_group(fs_info, cache)) {
>> +               btrfs_info(fs_info, "bg start=%llu len=%llu",
>> +                          cache->key.objectid, cache->key.offset);
>>                 if (cache->key.objectid >= (range->start + range->len)) {
>>                         btrfs_put_block_group(cache);
>>                         break;
>> @@ -11045,6 +11049,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>> struct fstrim_range *range)
>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>
>>         range->len = trimmed;
>> +       btrfs_info(fs_info, "trimming done");
>>         if (bg_ret)
>>                 return bg_ret;
>>         return dev_ret;
>>
>> ------
>>
>> Thanks,
>> Qu
>>
> 
> 
>
Chris Murphy Nov. 21, 2017, 4:46 a.m. UTC | #3
On Mon, Nov 20, 2017 at 9:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

>
> And BTW, according to your sysfs discard output, it seems that you're
> using Intel 600P 256G NVME ssd, which I'm also using.

No. 'nvme list' shows it as:
SAMSUNG MZVLV256HCHP-000H1

And lscpi shows it as:
6d:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co
Ltd NVMe SSD Controller SM951/PM951 [144d:a802] (rev 01) (prog-if 02
[NVM Express])
Chris Murphy Nov. 21, 2017, 4:49 a.m. UTC | #4
On Mon, Nov 20, 2017 at 9:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017年11月21日 12:34, Chris Murphy wrote:
>> On Mon, Nov 20, 2017 at 9:29 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>> On 2017年11月21日 12:06, Chris Murphy wrote:
>>>> On Mon, Nov 20, 2017 at 6:16 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>> On 2017年11月21日 06:23, Chris Murphy wrote:
>>>>>> On Sun, Nov 19, 2017 at 7:42 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2017年11月20日 10:24, Chris Murphy wrote:
>>>>>>>> On Sun, Nov 19, 2017 at 7:13 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2017年11月19日 14:17, Chris Murphy wrote:
>>>>>>>>>> fstrim should trim free space, but it only trims unallocated. This is
>>>>>>>>>> with kernel 4.14.0 and the entire 4.13 series. I'm pretty sure it
>>>>>>>>>> behaved this way with 4.12 also.
>>>>>>>>>
>>>>>>>>> Tested with 4.14-rc7, can't reproduce it.
>>>>>>>>
>>>>>>>> $ sudo btrfs fi us /
>>>>>>>> Overall:
>>>>>>>>     Device size:          70.00GiB
>>>>>>>>     Device allocated:          31.03GiB
>>>>>>>>     Device unallocated:          38.97GiB
>>>>>>>>     Device missing:             0.00B
>>>>>>>>     Used:              22.12GiB
>>>>>>>>     Free (estimated):          47.62GiB    (min: 47.62GiB)
>>>>>>>> ...snip...
>>>>>>>>
>>>>>>>> $ sudo fstrim -v /
>>>>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>>>>
>>>>>>>> Then I run btrfs-debug -b / and find the least used block group, at 8% usage;
>>>>>>>>
>>>>>>>> block group offset   174202028032 len 1073741824 used   89206784
>>>>>>>> chunk_objectid 256 flags 1 usage 0.08
>>>>>>>>
>>>>>>>> And balance that block group:
>>>>>>>>
>>>>>>>> $ sudo btrfs balance start -dvrange=174202028032..174202028033 -dlimit=1 /
>>>>>>>> Done, had to relocate 1 out of 32 chunks
>>>>>>>>
>>>>>>>> And trim again:
>>>>>>>>
>>>>>>>> /: 39 GiB (41841328128 bytes) trimmed
>>>>>>>>
>>>>>>>>
>>>>>>>>> Any special mount options or setup?
>>>>>>>>> (BTW, I also tried space_cache=v2 and default v1, no obvious difference)
>>>>>>>>
>>>>>>>>
>>>>>>>> /dev/nvme0n1p8 on / type btrfs
>>>>>>>> (rw,relatime,seclabel,ssd,space_cache,subvolid=333,subvol=/root27)
>>>>>>>
>>>>>>> Nothing special at all.
>>>>>>>
>>>>>>> And unfortunately, no trace point inside btrfs_trim_block_group() at all.
>>>>>>>
>>>>>>> But a quick glance shows me that, the loop to iterate existing block
>>>>>>> groups to trim free space inside them has a return value overwrite bug.
>>>>>>>
>>>>>>> So only unallocated space get trimmed.
>>>>>>>
>>>>>>> Would you please try this diff to get the return value?
>>>>>>>
>>>>>>> ------
>>>>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>>>>> index 309a109069f1..dbec05dc8810 100644
>>>>>>> --- a/fs/btrfs/extent-tree.c
>>>>>>> +++ b/fs/btrfs/extent-tree.c
>>>>>>> @@ -10983,12 +10983,12 @@ int btrfs_trim_fs(struct btrfs_fs_info
>>>>>>> *fs_info, struct fstrim_range *range)
>>>>>>>                                 ret = cache_block_group(cache, 0);
>>>>>>>                                 if (ret) {
>>>>>>>                                         btrfs_put_block_group(cache);
>>>>>>> -                                       break;
>>>>>>> +                                       goto out;
>>>>>>>                                 }
>>>>>>>                                 ret = wait_block_group_cache_done(cache);
>>>>>>>                                 if (ret) {
>>>>>>>                                         btrfs_put_block_group(cache);
>>>>>>> -                                       break;
>>>>>>> +                                       goto out;
>>>>>>>                                 }
>>>>>>>                         }
>>>>>>>                         ret = btrfs_trim_block_group(cache,
>>>>>>> @@ -11000,7 +11000,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>>>>> struct fstrim_range *range)
>>>>>>>                         trimmed += group_trimmed;
>>>>>>>                         if (ret) {
>>>>>>>                                 btrfs_put_block_group(cache);
>>>>>>> -                               break;
>>>>>>> +                               goto out;
>>>>>>>                         }
>>>>>>>                 }
>>>>>>>
>>>>>>> @@ -11019,6 +11019,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>>>>>> struct fstrim_range *range)
>>>>>>>         }
>>>>>>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>>>>>>
>>>>>>> +out:
>>>>>>>         range->len = trimmed;
>>>>>>>         return ret;
>>>>>>>  }
>>>>>>> ------
>>>>>>
>>>>>> This won't apply on tag v4.14 for some reason.
>>>>>>
>>>>>> [chris@f27s linux]$ git apply -v ~/qutrim1.patch
>>>>>> Checking patch fs/btrfs/extent-tree.c...
>>>>>> error: while searching for:
>>>>>>                                ret = cache_block_group(cache, 0);
>>>>>>                                if (ret) {
>>>>>>                                        btrfs_put_block_group(cache);
>>>>>>                                        break;
>>>>>>                                }
>>>>>>                                ret = wait_block_group_cache_done(cache);
>>>>>>                                if (ret) {
>>>>>>                                        btrfs_put_block_group(cache);
>>>>>>                                        break;
>>>>>>                                }
>>>>>>                        }
>>>>>>                        ret = btrfs_trim_block_group(cache,
>>>>>>
>>>>>> error: patch failed: fs/btrfs/extent-tree.c:10983
>>>>>> error: fs/btrfs/extent-tree.c: patch does not apply
>>>>>> [chris@f27s linux]$
>>>>>>
>>>>>>
>>>>>> If I do it manually (just adding the goto and build it, reboot, I
>>>>>> still get the same result for fstrim and nothing in dmesg.
>>>>>
>>>>> Sorry, that diff will not output extra info. Just to abort the process
>>>>> and return true error code.
>>>>
>>>> OK? It didn't seem to do that either. I see no change.
>>>>
>>>>
>>>>>
>>>>> I have update the patch to output more verbose output.
>>>>> You could find it in patchwork:
>>>>> https://patchwork.kernel.org/patch/10065991/
>>>>
>>>> Patch applies on v4.14.0, and still nothing in dmesg, or in user space
>>>> when issuing fstrim.
>>>>
>>>> # fstrim -v /
>>>> /: 38 GiB (40767586304 bytes) trimmed
>>>> # dmesg | grep -i btrfs
>>>> [    2.745902] btrfs: loading out-of-tree module taints kernel.
>>>> [    2.749905] Btrfs loaded, crc32c=crc32c-intel
>>>> [    2.751072] BTRFS: device label fedora devid 1 transid 252048 /dev/nvme0n1p8
>>>> [    4.295891] BTRFS info (device nvme0n1p8): disk space caching is enabled
>>>> [    4.295892] BTRFS info (device nvme0n1p8): has skinny extents
>>>> [    4.307326] BTRFS info (device nvme0n1p8): enabling ssd optimizations
>>>> [    4.959467] BTRFS info (device nvme0n1p8): disk space caching is enabled
>>>> [root@f27h ~]#
>>>>
>>>>
>>>> Pretty sure the patch is applied because of the first message about
>>>> the out of tree module, which I do not get with Fedora kernels.
>>>
>>> So that's not something wrong happened to make you skip trimming one
>>> chunk, but something else just skipped the block group trimming.
>>>
>>> And I don't think DEBUG config is related to this.
>>>
>>> I doubt if it's the @fstrim_range passed in has something strange that
>>> prevent us from trimming block groups.
>>>
>>> Would you please try this diff based on the patch from patchwork?
>>
>> Apply in addition to previous patch? Or apply to clean v4.14?
>
> On previous patch.

Refuses to apply with or without previous patch.

$ git apply -v ~/qufstrim3.patch
Checking patch fs/btrfs/extent-tree.c...
error: while searching for:
       int dev_ret = 0;
       int ret = 0;

       /*
        * try to trim all FS space, our block group may start from non-zero.
        */

error: patch failed: fs/btrfs/extent-tree.c:10972
error: fs/btrfs/extent-tree.c: patch does not apply





>
> And BTW, according to your sysfs discard output, it seems that you're
> using Intel 600P 256G NVME ssd, which I'm also using.
>
> I will also try to create such case on my side.
>
> Thanks,
> Qu
>>
>>
>>>
>>> ------
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index f830aa91ac3d..a4bf29a4a860 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -10972,6 +10972,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>> struct fstrim_range *range)
>>>         int dev_ret = 0;
>>>         int ret = 0;
>>>
>>> +       btrfs_info(fs_info, "trimming btrfs, start=%llu len=%llu
>>> minlen=%llu",
>>> +                  range->start, range->len, range->minlen);
>>>         /*
>>>          * try to trim all FS space, our block group may start from
>>> non-zero.
>>>          */
>>> @@ -10981,6 +10983,8 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>> struct fstrim_range *range)
>>>                 cache = btrfs_lookup_block_group(fs_info, range->start);
>>>
>>>         for (; cache; cache = next_block_group(fs_info, cache)) {
>>> +               btrfs_info(fs_info, "bg start=%llu len=%llu",
>>> +                          cache->key.objectid, cache->key.offset);
>>>                 if (cache->key.objectid >= (range->start + range->len)) {
>>>                         btrfs_put_block_group(cache);
>>>                         break;
>>> @@ -11045,6 +11049,7 @@ int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
>>> struct fstrim_range *range)
>>>         mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>>
>>>         range->len = trimmed;
>>> +       btrfs_info(fs_info, "trimming done");
>>>         if (bg_ret)
>>>                 return bg_ret;
>>>         return dev_ret;
>>>
>>> ------
>>>
>>> Thanks,
>>> Qu
>>>
>>
>>
>>
>
Qu Wenruo Nov. 21, 2017, 4:58 a.m. UTC | #5
On 2017年11月21日 12:49, Chris Murphy wrote:
> On Mon, Nov 20, 2017 at 9:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>>
>>> Apply in addition to previous patch? Or apply to clean v4.14?
>>
>> On previous patch.
> 
> Refuses to apply with or without previous patch.
> 
> $ git apply -v ~/qufstrim3.patch
> Checking patch fs/btrfs/extent-tree.c...
> error: while searching for:
>        int dev_ret = 0;
>        int ret = 0;
> 
>        /*
>         * try to trim all FS space, our block group may start from non-zero.
>         */
> 
> error: patch failed: fs/btrfs/extent-tree.c:10972
> error: fs/btrfs/extent-tree.c: patch does not apply
> 

Please try this branch.

It's just previous patch and diff merged together and applied on v4.14
tag from torvalds.

https://github.com/adam900710/linux/tree/tmp

Thanks,
Qu
Chris Murphy Nov. 21, 2017, 5:58 a.m. UTC | #6
On Mon, Nov 20, 2017 at 9:58 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017年11月21日 12:49, Chris Murphy wrote:
>> On Mon, Nov 20, 2017 at 9:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>>>
>>>> Apply in addition to previous patch? Or apply to clean v4.14?
>>>
>>> On previous patch.
>>
>> Refuses to apply with or without previous patch.
>>
>> $ git apply -v ~/qufstrim3.patch
>> Checking patch fs/btrfs/extent-tree.c...
>> error: while searching for:
>>        int dev_ret = 0;
>>        int ret = 0;
>>
>>        /*
>>         * try to trim all FS space, our block group may start from non-zero.
>>         */
>>
>> error: patch failed: fs/btrfs/extent-tree.c:10972
>> error: fs/btrfs/extent-tree.c: patch does not apply
>>
>
> Please try this branch.
>
> It's just previous patch and diff merged together and applied on v4.14
> tag from torvalds.
>
> https://github.com/adam900710/linux/tree/tmp

# fstrim -v /
/: 38 GiB (40767586304 bytes) trimmed
# dmesg

..snip...
[   46.408792] BTRFS info (device nvme0n1p8): trimming btrfs, start=0
len=75161927680 minlen=512
[   46.408800] BTRFS info (device nvme0n1p8): bg start=140882477056
len=1073741824
[   46.433867] BTRFS info (device nvme0n1p8): trimming done

Attaching 'btrfs-debug -b /' to get an idea about the block groups present.
Qu Wenruo Nov. 21, 2017, 6:10 a.m. UTC | #7
On 2017年11月21日 13:58, Chris Murphy wrote:
> On Mon, Nov 20, 2017 at 9:58 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2017年11月21日 12:49, Chris Murphy wrote:
>>> On Mon, Nov 20, 2017 at 9:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>>>
>>>>> Apply in addition to previous patch? Or apply to clean v4.14?
>>>>
>>>> On previous patch.
>>>
>>> Refuses to apply with or without previous patch.
>>>
>>> $ git apply -v ~/qufstrim3.patch
>>> Checking patch fs/btrfs/extent-tree.c...
>>> error: while searching for:
>>>        int dev_ret = 0;
>>>        int ret = 0;
>>>
>>>        /*
>>>         * try to trim all FS space, our block group may start from non-zero.
>>>         */
>>>
>>> error: patch failed: fs/btrfs/extent-tree.c:10972
>>> error: fs/btrfs/extent-tree.c: patch does not apply
>>>
>>
>> Please try this branch.
>>
>> It's just previous patch and diff merged together and applied on v4.14
>> tag from torvalds.
>>
>> https://github.com/adam900710/linux/tree/tmp
> 
> # fstrim -v /
> /: 38 GiB (40767586304 bytes) trimmed
> # dmesg
> 
> ..snip...
> [   46.408792] BTRFS info (device nvme0n1p8): trimming btrfs, start=0
> len=75161927680 minlen=512
> [   46.408800] BTRFS info (device nvme0n1p8): bg start=140882477056
> len=1073741824
> [   46.433867] BTRFS info (device nvme0n1p8): trimming done

Great (for the output, not for the trimming failure).

And the problem is very obvious now.
140882477056 << First chunk start
75161927680  << length of fstrim_range passed in

Obviously, fstrim_range passed in is using the filesystem size it
assumes to be.

While we stupidly use the range in fstrim_range without considering the
fact that, we're dealing with *btrfs logical address space*.
Where our chunk can start from any bytenr (well, at least aligned with
sectorsize).
When I read the code I also think the range check has nothing wrong at all.

So the truth here is, we should not ever try to check the range from
fstrim_range.

And the problem means that, a normal btrfs with some usage and after
several full balance, fstrim will only trim the unallocated space for btrfs.

Now the fix should not be a hard to craft.

Great thanks for all your help to locate the problem.

Thanks,
Qu
> 
> Attaching 'btrfs-debug -b /' to get an idea about the block groups present.
> 
> 
>
Chris Murphy April 1, 2018, 3:28 a.m. UTC | #8
On Mon, Nov 20, 2017 at 11:10 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
> On 2017年11月21日 13:58, Chris Murphy wrote:
>> On Mon, Nov 20, 2017 at 9:58 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>
>>>
>>> On 2017年11月21日 12:49, Chris Murphy wrote:
>>>> On Mon, Nov 20, 2017 at 9:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Apply in addition to previous patch? Or apply to clean v4.14?
>>>>>
>>>>> On previous patch.
>>>>
>>>> Refuses to apply with or without previous patch.
>>>>
>>>> $ git apply -v ~/qufstrim3.patch
>>>> Checking patch fs/btrfs/extent-tree.c...
>>>> error: while searching for:
>>>>        int dev_ret = 0;
>>>>        int ret = 0;
>>>>
>>>>        /*
>>>>         * try to trim all FS space, our block group may start from non-zero.
>>>>         */
>>>>
>>>> error: patch failed: fs/btrfs/extent-tree.c:10972
>>>> error: fs/btrfs/extent-tree.c: patch does not apply
>>>>
>>>
>>> Please try this branch.
>>>
>>> It's just previous patch and diff merged together and applied on v4.14
>>> tag from torvalds.
>>>
>>> https://github.com/adam900710/linux/tree/tmp
>>
>> # fstrim -v /
>> /: 38 GiB (40767586304 bytes) trimmed
>> # dmesg
>>
>> ..snip...
>> [   46.408792] BTRFS info (device nvme0n1p8): trimming btrfs, start=0
>> len=75161927680 minlen=512
>> [   46.408800] BTRFS info (device nvme0n1p8): bg start=140882477056
>> len=1073741824
>> [   46.433867] BTRFS info (device nvme0n1p8): trimming done
>
> Great (for the output, not for the trimming failure).
>
> And the problem is very obvious now.
> 140882477056 << First chunk start
> 75161927680  << length of fstrim_range passed in
>
> Obviously, fstrim_range passed in is using the filesystem size it
> assumes to be.
>
> While we stupidly use the range in fstrim_range without considering the
> fact that, we're dealing with *btrfs logical address space*.
> Where our chunk can start from any bytenr (well, at least aligned with
> sectorsize).
> When I read the code I also think the range check has nothing wrong at all.
>
> So the truth here is, we should not ever try to check the range from
> fstrim_range.
>
> And the problem means that, a normal btrfs with some usage and after
> several full balance, fstrim will only trim the unallocated space for btrfs.
>
> Now the fix should not be a hard to craft.
>
> Great thanks for all your help to locate the problem.

I still see this problem in 4.16.0-0.rc7.git1.1.fc29.x86_64. Are there
any patches to test?

[chris@f27h mnt]$ sudo btrfs fi us /
Overall:
    Device size:          70.00GiB
    Device allocated:          60.06GiB
    Device unallocated:           9.94GiB
    Device missing:             0.00B
    Used:              36.89GiB
    Free (estimated):          29.75GiB    (min: 24.78GiB)
    Data ratio:                  1.00
    Metadata ratio:              1.00
    Global reserve:         107.77MiB    (used: 12.38MiB)

Data,single: Size:55.00GiB, Used:35.19GiB
   /dev/nvme0n1p9      55.00GiB

Metadata,single: Size:5.00GiB, Used:1.70GiB
   /dev/nvme0n1p9       5.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/nvme0n1p9      64.00MiB

Unallocated:
   /dev/nvme0n1p9       9.94GiB
[chris@f27h mnt]$ sudo fstrim -v /
[sudo] password for chris:
/: 10 GiB (10669260800 bytes) trimmed
Qu Wenruo April 1, 2018, 10:34 a.m. UTC | #9
On 2018年04月01日 11:28, Chris Murphy wrote:
> On Mon, Nov 20, 2017 at 11:10 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>> On 2017年11月21日 13:58, Chris Murphy wrote:
>>> On Mon, Nov 20, 2017 at 9:58 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>> On 2017年11月21日 12:49, Chris Murphy wrote:
>>>>> On Mon, Nov 20, 2017 at 9:43 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Apply in addition to previous patch? Or apply to clean v4.14?
>>>>>>
>>>>>> On previous patch.
>>>>>
>>>>> Refuses to apply with or without previous patch.
>>>>>
>>>>> $ git apply -v ~/qufstrim3.patch
>>>>> Checking patch fs/btrfs/extent-tree.c...
>>>>> error: while searching for:
>>>>>        int dev_ret = 0;
>>>>>        int ret = 0;
>>>>>
>>>>>        /*
>>>>>         * try to trim all FS space, our block group may start from non-zero.
>>>>>         */
>>>>>
>>>>> error: patch failed: fs/btrfs/extent-tree.c:10972
>>>>> error: fs/btrfs/extent-tree.c: patch does not apply
>>>>>
>>>>
>>>> Please try this branch.
>>>>
>>>> It's just previous patch and diff merged together and applied on v4.14
>>>> tag from torvalds.
>>>>
>>>> https://github.com/adam900710/linux/tree/tmp
>>>
>>> # fstrim -v /
>>> /: 38 GiB (40767586304 bytes) trimmed
>>> # dmesg
>>>
>>> ..snip...
>>> [   46.408792] BTRFS info (device nvme0n1p8): trimming btrfs, start=0
>>> len=75161927680 minlen=512
>>> [   46.408800] BTRFS info (device nvme0n1p8): bg start=140882477056
>>> len=1073741824
>>> [   46.433867] BTRFS info (device nvme0n1p8): trimming done
>>
>> Great (for the output, not for the trimming failure).
>>
>> And the problem is very obvious now.
>> 140882477056 << First chunk start
>> 75161927680  << length of fstrim_range passed in
>>
>> Obviously, fstrim_range passed in is using the filesystem size it
>> assumes to be.
>>
>> While we stupidly use the range in fstrim_range without considering the
>> fact that, we're dealing with *btrfs logical address space*.
>> Where our chunk can start from any bytenr (well, at least aligned with
>> sectorsize).
>> When I read the code I also think the range check has nothing wrong at all.
>>
>> So the truth here is, we should not ever try to check the range from
>> fstrim_range.
>>
>> And the problem means that, a normal btrfs with some usage and after
>> several full balance, fstrim will only trim the unallocated space for btrfs.
>>
>> Now the fix should not be a hard to craft.
>>
>> Great thanks for all your help to locate the problem.
> 
> I still see this problem in 4.16.0-0.rc7.git1.1.fc29.x86_64. Are there
> any patches to test?
> 
> [chris@f27h mnt]$ sudo btrfs fi us /
> Overall:
>     Device size:          70.00GiB
>     Device allocated:          60.06GiB
>     Device unallocated:           9.94GiB
>     Device missing:             0.00B
>     Used:              36.89GiB
>     Free (estimated):          29.75GiB    (min: 24.78GiB)
>     Data ratio:                  1.00
>     Metadata ratio:              1.00
>     Global reserve:         107.77MiB    (used: 12.38MiB)
> 
> Data,single: Size:55.00GiB, Used:35.19GiB
>    /dev/nvme0n1p9      55.00GiB
> 
> Metadata,single: Size:5.00GiB, Used:1.70GiB
>    /dev/nvme0n1p9       5.00GiB
> 
> System,DUP: Size:32.00MiB, Used:16.00KiB
>    /dev/nvme0n1p9      64.00MiB
> 
> Unallocated:
>    /dev/nvme0n1p9       9.94GiB
> [chris@f27h mnt]$ sudo fstrim -v /
> [sudo] password for chris:
> /: 10 GiB (10669260800 bytes) trimmed

The latest version is here
https://patchwork.kernel.org/patch/10078773/
https://patchwork.kernel.org/patch/10083979/

And I didn't see it in misc-next either, I may need to ping it soon.

Thanks,
Qu
Chris Murphy April 1, 2018, 11:30 p.m. UTC | #10
[chris@f27h linux]$ git apply -v
~/RESEND-1-2-btrfs-Enhance-btrfs_trim_fs-function-to-handle-error-better.patch
Checking patch fs/btrfs/extent-tree.c...
Hunk #1 succeeded at 10942 (offset -6 lines).
Hunk #2 succeeded at 10962 (offset -6 lines).
Hunk #3 succeeded at 10974 (offset -6 lines).
Hunk #4 succeeded at 10988 (offset -6 lines).
Hunk #5 succeeded at 11025 (offset -6 lines).
Applied patch fs/btrfs/extent-tree.c cleanly.
[chris@f27h linux]$ git apply -v
~/v2.2-2-2-btrfs-Ensure-btrfs_trim_fs-can-trim-the-whole-fs.patch
Checking patch fs/btrfs/extent-tree.c...
Hunk #1 succeeded at 10961 (offset -6 lines).
Checking patch fs/btrfs/ioctl.c...
Hunk #1 succeeded at 364 (offset -1 lines).
Hunk #2 succeeded at 387 (offset -1 lines).
Applied patch fs/btrfs/extent-tree.c cleanly.
Applied patch fs/btrfs/ioctl.c cleanly.

compiles fine, and test appears to trim more than before, and the file
system still works and scrubs with no errors (did scrub with
4.15.14.fc28).

[chris@f27h ~]$ uname -r
4.16.0-rc7 (mainline, unpatched)
[chris@f27h ~]$ sudo btrfs fi us /
Overall:
    Device size:          70.00GiB
    Device allocated:          46.06GiB
    Device unallocated:          23.94GiB
    Device missing:             0.00B
    Used:              43.90GiB
    Free (estimated):          25.86GiB    (min: 13.89GiB)
    Data ratio:                  1.00
    Metadata ratio:              1.00
    Global reserve:         109.94MiB    (used: 0.00B)

Data,single: Size:44.00GiB, Used:42.08GiB
   /dev/nvme0n1p9      44.00GiB

Metadata,single: Size:2.00GiB, Used:1.82GiB
   /dev/nvme0n1p9       2.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/nvme0n1p9      64.00MiB

Unallocated:
   /dev/nvme0n1p9      23.94GiB
[chris@f27h ~]$ sudo fstrim -v /
[sudo] password for chris:
/: 24 GiB (25701646336 bytes) trimmed
[chris@f27h ~]$

==============

[chris@f27h ~]$ uname -r
4.16.0-rc7+ (mainline, plus patch)
[chris@f27h ~]$ sudo btrfs fi us /
Overall:
    Device size:          70.00GiB
    Device allocated:          46.06GiB
    Device unallocated:          23.94GiB
    Device missing:             0.00B
    Used:              43.90GiB
    Free (estimated):          25.86GiB    (min: 13.89GiB)
    Data ratio:                  1.00
    Metadata ratio:              1.00
    Global reserve:         110.06MiB    (used: 0.00B)

Data,single: Size:44.00GiB, Used:42.08GiB
   /dev/nvme0n1p9      44.00GiB

Metadata,single: Size:2.00GiB, Used:1.82GiB
   /dev/nvme0n1p9       2.00GiB

System,DUP: Size:32.00MiB, Used:16.00KiB
   /dev/nvme0n1p9      64.00MiB

Unallocated:
   /dev/nvme0n1p9      23.94GiB
[chris@f27h ~]$ sudo fstrim -v /
[sudo] password for chris:
/: 26.5 GiB (28394635264 bytes) trimmed
[chris@f27h ~]$


---
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f830aa91ac3d..a4bf29a4a860 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10972,6 +10972,8 @@  int btrfs_trim_fs(struct btrfs_fs_info *fs_info,
struct fstrim_range *range)
        int dev_ret = 0;
        int ret = 0;

+       btrfs_info(fs_info, "trimming btrfs, start=%llu len=%llu
minlen=%llu",
+                  range->start, range->len, range->minlen);
        /*