
[0/5] btrfs: scrub: improve the scrub performance

Message ID cover.1690542141.git.wqu@suse.com (mailing list archive)

Message

Qu Wenruo July 28, 2023, 11:14 a.m. UTC
[REPO]
https://github.com/adam900710/linux/tree/scrub_testing

[CHANGELOG]
v1:
- Rebased to latest misc-next

- Rework the read IO grouping patch
  David found some crashes, mostly related to the scrub performance
  fixes; meanwhile the original grouping patch had one extra flag,
  SCRUB_FLAG_READ_SUBMITTED, to avoid double submission.

  But this flag is unnecessary, as double submission can be avoided
  simply by properly checking the sctx->nr_stripe variable (see the
  sketch after this changelog).

  This reworked grouping read IO patch should be safer compared to the
  initial version, with better code structure.

  Unfortunately, the final performance is worse than the initial
  version (2.2GiB/s vs 2.5GiB/s), but it should be less racy and thus
  safer.

- Re-order the patches
  The first 3 patches are the main fixes, and I put the safer patches
  first, so even if David still hits a crash at a certain patch, the
  remaining ones can be dropped if needed.
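
To illustrate the idea behind dropping SCRUB_FLAG_READ_SUBMITTED, here is
a minimal sketch (all names below are illustrative stand-ins, not the
actual btrfs structures or helpers): submission is driven purely by the
count of queued stripes, so a repeated flush becomes a harmless no-op
instead of a double submission.

    /* Illustrative sketch only; not the real scrub_ctx/scrub_stripe. */
    #define SCRUB_STRIPES_PER_GROUP 8

    struct scrub_stripe_sketch {
            unsigned long long logical;     /* start of one 64K stripe */
    };

    struct scrub_ctx_sketch {
            struct scrub_stripe_sketch *stripes[SCRUB_STRIPES_PER_GROUP];
            int nr_stripe;                  /* queued but not yet submitted */
    };

    /* Stand-in for the real read submission of a single stripe. */
    static void submit_stripe_read_sketch(struct scrub_stripe_sketch *stripe)
    {
            (void)stripe;
    }

    static void flush_scrub_stripes_sketch(struct scrub_ctx_sketch *sctx)
    {
            int i;

            /*
             * Nothing queued means nothing to submit, so calling this
             * again right after a flush cannot double-submit anything.
             */
            if (sctx->nr_stripe == 0)
                    return;

            for (i = 0; i < sctx->nr_stripe; i++)
                    submit_stripe_read_sketch(sctx->stripes[i]);

            /* The queued count is the only submission state tracked. */
            sctx->nr_stripe = 0;
    }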

There is a huge scrub performance drop introduced by the v6.4 kernel:
for large data extents, scrub performance is only around 1/3 of what it
used to be.

There are several causes:

- Missing blk plug
  This means read requests won't be merged by the block layer, which
  can hugely reduce read performance (see the sketch after this list).

- Extra time spent on extent/csum tree searches
  This includes extra path allocation/freeing and tree searches.
  It is especially obvious for large data extents: previously we did
  one csum search per 512K, but now we do one per 64K, an 8x increase
  in csum tree searches.

- Less concurrency
  Mostly due to the fact that we're doing submit-and-wait, which leads
  to a much lower queue depth and hurts devices like NVMe that benefit
  a lot from high concurrency.
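
As a minimal sketch of the blk plug point above (generic block-layer
usage, not the scrub patch itself): holding a plug across the read
submissions gives the block layer a chance to merge adjacent bios into
larger requests before they reach the device.

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    /*
     * Sketch only: submit a batch of read bios under one plug so the
     * block layer can merge adjacent requests.
     */
    static void submit_grouped_reads_sketch(struct bio **bios, int nr_bios)
    {
            struct blk_plug plug;
            int i;

            blk_start_plug(&plug);          /* hold bios in a per-task list */
            for (i = 0; i < nr_bios; i++)
                    submit_bio(bios[i]);
            blk_finish_plug(&plug);         /* flush, allowing merges */
    }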

The first 3 patches greatly improve the scrub read performance.
Unfortunately it's still not as fast as the pre-6.4 kernels (2.2GiB/s
vs 3.0GiB/s), but it's much better than the 6.4 kernel (2.2GiB/s vs
1.0GiB/s).

Qu Wenruo (5):
  btrfs: scrub: avoid unnecessary extent tree search preparing stripes
  btrfs: scrub: avoid unnecessary csum tree search preparing stripes
  btrfs: scrub: fix grouping of read IO
  btrfs: scrub: don't go ordered workqueue for dev-replace
  btrfs: scrub: move write back of repaired sectors into
    scrub_stripe_read_repair_worker()

 fs/btrfs/file-item.c |  33 +++---
 fs/btrfs/file-item.h |   6 +-
 fs/btrfs/raid56.c    |   4 +-
 fs/btrfs/scrub.c     | 234 ++++++++++++++++++++++++++-----------------
 4 files changed, 169 insertions(+), 108 deletions(-)

Comments

Martin Steigerwald July 28, 2023, 12:38 p.m. UTC | #1
Qu Wenruo - 28.07.23, 13:14:03 CEST:
> The first 3 patches would greately improve the scrub read performance,
> but unfortunately it's still not as fast as the pre-6.4 kernels.
> (2.2GiB/s vs 3.0GiB/s), but still much better than 6.4 kernels
> (2.2GiB vs 1.0GiB/s).

Thanks for the patch set.

What is the reason for not going back to the performance of the pre-6.4
kernel? Isn't it possible with the new scrubbing method? In that case,
what improvements does the new scrubbing code have that warrant the
lower performance?

I'd just like to understand the background of this a bit more. I don't
mind slightly lower performance too much, especially if it is
outweighed by other benefits.
David Sterba July 28, 2023, 4:50 p.m. UTC | #2
On Fri, Jul 28, 2023 at 02:38:35PM +0200, Martin Steigerwald wrote:
> Qu Wenruo - 28.07.23, 13:14:03 CEST:
> > The first 3 patches would greately improve the scrub read performance,
> > but unfortunately it's still not as fast as the pre-6.4 kernels.
> > (2.2GiB/s vs 3.0GiB/s), but still much better than 6.4 kernels
> > (2.2GiB vs 1.0GiB/s).
> 
> Thanks for the patch set.
> 
> What is the reason for not going back to the performance of the pre-6.4 
> kernel? Isn't it possible with the new scrubbing method? In that case 
> what improvements does the new scrubbing code have that warrant to have 
> a lower performance?

Lower performance was not expected, and the original performance needs
to be brought back. A minor decrease would be tolerable, but that means
something around 5%, not 60%.

> Just like to understand the background of this a bit more. I do not mind 
> a bit lower performance too much, especially in case it is outweighed by 
> other benefits.

The code in scrub dates from the 3.0 times, and new features have been
implemented since then. Extending the code became hard over time, so a
bigger update was done, restructuring how the IO is done.
Martin Steigerwald July 28, 2023, 9:14 p.m. UTC | #3
David Sterba - 28.07.23, 18:50:37 CEST:
> On Fri, Jul 28, 2023 at 02:38:35PM +0200, Martin Steigerwald wrote:
> > Qu Wenruo - 28.07.23, 13:14:03 CEST:
> > > The first 3 patches would greately improve the scrub read
> > > performance, but unfortunately it's still not as fast as the
> > > pre-6.4 kernels. (2.2GiB/s vs 3.0GiB/s), but still much better
> > > than 6.4 kernels (2.2GiB vs 1.0GiB/s).
> > 
> > Thanks for the patch set.
> > 
> > What is the reason for not going back to the performance of the
> > pre-6.4 kernel? Isn't it possible with the new scrubbing method? In
> > that case what improvements does the new scrubbing code have that
> > warrant to have a lower performance?
> 
> Lower performance was not expected and needs to be brought back. A
> minor decrease would be tolerable but that's something around 5%, not
> 60%.

Okay. Best of success with improving performance again.

> > Just like to understand the background of this a bit more. I do not
> > mind a bit lower performance too much, especially in case it is
> > outweighed by other benefits.
> 
> The code in scrub was from 3.0 times and since then new features have
> been implemented, extending the code became hard over time so a bigger
> update was done restructuring how the IO is done.

Okay, thanks for explaining.
Jani Partanen Aug. 1, 2023, 8:14 p.m. UTC | #4
Hello, I did some testing with 4 x 320GB HDDs: metadata RAID1C4 and
data RAID5.

Kernel  6.3.12

btrfs scrub start -B /dev/sdb

scrub done for 6691cb53-271b-4abd-b2ab-143c41027924
Scrub started:    Tue Aug  1 04:00:39 2023
Status:           finished
Duration:         2:37:35
Total to scrub:   149.58GiB
Rate:             16.20MiB/s
Error summary:    no errors found


Kernel 6.5.0-rc3

btrfs scrub start -B /dev/sdb

scrub done for 6691cb53-271b-4abd-b2ab-143c41027924
Scrub started:    Tue Aug  1 08:41:12 2023
Status:           finished
Duration:         1:31:03
Total to scrub:   299.16GiB
Rate:             56.08MiB/s
Error summary:    no errors found


So, much better speed, but the "Total to scrub" reporting seems strange.

df -h /dev/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        1,2T  599G  292G  68% /mnt


Looks like the old kernel scrubbed about 1/4 of the total data, which
seems right because I have 4 drives.

The new one scrubbed about 1/2 of the total data, which seems wrong.

And if I do scrub against mount point:

btrfs scrub start -B /mnt/
scrub done for 6691cb53-271b-4abd-b2ab-143c41027924
Scrub started:    Tue Aug  1 11:03:56 2023
Status:           finished
Duration:         10:02:44
Total to scrub:   1.17TiB
Rate:             33.89MiB/s
Error summary:    no errors found


Then performance goes down the toilet, and now the "Total to scrub"
reporting is roughly 2:1.

btrfs version
btrfs-progs v6.3.3

Is this a btrfs-progs reporting issue? And what about RAID5 scrub
performance, why is it so bad?


About the disks: they are old WD Blue drives that can do about 100MB/s
read/write on average.


Qu Wenruo Aug. 1, 2023, 10:06 p.m. UTC | #5
On 2023/8/2 04:14, Jani Partanen wrote:
> Hello, I did some testing with 4 x 320GB HDD's. Meta raid1c4 and data
> raid 5.

RAID5 has other problems related to scrub performance unfortunately.

>
> Kernel  6.3.12
>
> btrfs scrub start -B /dev/sdb
>
> scrub done for 6691cb53-271b-4abd-b2ab-143c41027924
> Scrub started:    Tue Aug  1 04:00:39 2023
> Status:           finished
> Duration:         2:37:35
> Total to scrub:   149.58GiB
> Rate:             16.20MiB/s
> Error summary:    no errors found
>
>
> Kernel 6.5.0-rc3
>
> btrfs scrub start -B /dev/sdb
>
> scrub done for 6691cb53-271b-4abd-b2ab-143c41027924
> Scrub started:    Tue Aug  1 08:41:12 2023
> Status:           finished
> Duration:         1:31:03
> Total to scrub:   299.16GiB
> Rate:             56.08MiB/s
> Error summary:    no errors found
>
>
> So much better speed but Total to scrub reporting seems strange.
>
> df -h /dev/sdb
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb        1,2T  599G  292G  68% /mnt
>
>
> Looks like old did like 1/4 of total data what seems like right because
> I have 4 drives.
>
> New did  about 1/2 of total data what seems wrong.

I checked the kernel part of the progress reporting: for a single-device
scrub of RAID56, a data stripe contributes to the scrubbed bytes, while
a P/Q stripe should not contribute to the value.

Thus 1/4 should be the correct value.

However, there is another factor in btrfs-progs, which determines how
to report the numbers.

There is a fix for that already merged in v6.3.2, but it seems there
are other problems involved.

>
> And if I do scrub against mount point:
>
> btrfs scrub start -B /mnt/
> scrub done for 6691cb53-271b-4abd-b2ab-143c41027924
> Scrub started:    Tue Aug  1 11:03:56 2023
> Status:           finished
> Duration:         10:02:44
> Total to scrub:   1.17TiB
> Rate:             33.89MiB/s
> Error summary:    no errors found
>
>
> Then performance goes down to toilet and now Total to scrub reporting is
> like 2/1
>
> btrfs version
> btrfs-progs v6.3.3
>
> Is it btrfs-progs issue with reporting?

Can you try with the -BdR option?

It shows the raw numbers, which is the easiest way to determine whether
it's a bug in btrfs-progs or in the kernel.

> What about raid 5 scrub
> performance, why it is so bad?

It's explained in this cover letter:
https://lore.kernel.org/linux-btrfs/cover.1688368617.git.wqu@suse.com/

In short, a RAID56 full-filesystem scrub causes too many duplicated
reads, and the root cause is that per-device scrub is never a good idea
for RAID56.

That's why I'm trying to introduce a new scrub flag for it.

Thanks,
Qu

Jani Partanen Aug. 1, 2023, 11:48 p.m. UTC | #6
On 02/08/2023 1.06, Qu Wenruo wrote:
>
> Can you try with -BdR option?
>
> It shows the raw numbers, which is the easiest way to determine if it's
> a bug in btrfs-progs or kernel.
>

Here is single device result:

btrfs scrub start -BdR /dev/sdb

Scrub device /dev/sdb (id 1) done
Scrub started:    Wed Aug  2 01:33:21 2023
Status:           finished
Duration:         0:44:29
         data_extents_scrubbed: 4902956
         tree_extents_scrubbed: 60494
         data_bytes_scrubbed: 321301020672
         tree_bytes_scrubbed: 991133696
         read_errors: 0
         csum_errors: 0
         verify_errors: 0
         no_csum: 22015840
         csum_discards: 0
         super_errors: 0
         malloc_errors: 0
         uncorrectable_errors: 0
         unverified_errors: 0
         corrected_errors: 0
         last_physical: 256679870464


I'll run it against the mountpoint when I go to sleep, because it's
going to take long.

>> What about raid 5 scrub
>> performance, why it is so bad?
>
> It's explained in this cover letter:
> https://lore.kernel.org/linux-btrfs/cover.1688368617.git.wqu@suse.com/
>
> In short, RAID56 full fs scrub is causing too many duplicated reads, and
> the root cause is, the per-device scrub is never a good idea for RAID56.
>
> That's why I'm trying to introduce the new scrub flag for that.
>
Ah, so there is a different patchset for RAID5 scrub, good to know. I'm
going to build that branch and test it. Also, let me know if I can help
somehow with that stress testing. These drives are dedicated to
testing. I am running a VM under Hyper-V and the disks are passed
through directly to the VM.
Qu Wenruo Aug. 2, 2023, 1:56 a.m. UTC | #7
On 2023/8/2 07:48, Jani Partanen wrote:
>
> On 02/08/2023 1.06, Qu Wenruo wrote:
>>
>> Can you try with -BdR option?
>>
>> It shows the raw numbers, which is the easiest way to determine if it's
>> a bug in btrfs-progs or kernel.
>>
>
> Here is single device result:
>
> btrfs scrub start -BdR /dev/sdb
>
> Scrub device /dev/sdb (id 1) done
> Scrub started:    Wed Aug  2 01:33:21 2023
> Status:           finished
> Duration:         0:44:29
>          data_extents_scrubbed: 4902956
>          tree_extents_scrubbed: 60494
>          data_bytes_scrubbed: 321301020672

So the btrfs scrub report is correct, using the values from the kernel.

And considering the used space is around 600G, divided across 4 disks
(i.e. 3 data stripes + 1 parity stripe), it's not that weird, as we
would get around 200G per device (parity doesn't contribute to the
scrubbed bytes).

Especially considering your metadata is RAID1C4, we should see a little
more than 200G.
Instead it's the old report of less than 200G that doesn't seem correct.

Would you mind providing the output of "btrfs fi usage <mnt>" to verify
my assumption?

>          tree_bytes_scrubbed: 991133696
>          read_errors: 0
>          csum_errors: 0
>          verify_errors: 0
>          no_csum: 22015840
>          csum_discards: 0
>          super_errors: 0
>          malloc_errors: 0
>          uncorrectable_errors: 0
>          unverified_errors: 0
>          corrected_errors: 0
>          last_physical: 256679870464
>
>
> I'll do against mountpoint when I go to sleep because it gonna take long.
>
>>> What about raid 5 scrub
>>> performance, why it is so bad?
>>
>> It's explained in this cover letter:
>> https://lore.kernel.org/linux-btrfs/cover.1688368617.git.wqu@suse.com/
>>
>> In short, RAID56 full fs scrub is causing too many duplicated reads, and
>> the root cause is, the per-device scrub is never a good idea for RAID56.
>>
>> That's why I'm trying to introduce the new scrub flag for that.
>>
> Ah, so there is different patchset for raid5 scrub, good to know. I'm
> gonna build that branch and test it.

Although it's not recommended to test it for now: we're still handling
the performance drop, so that patchset may not apply cleanly.

> Also let me know if I could help
> somehow to do that stress testing. These drives are deticated for
> testing. I am running VM under Hyper-V and disk are passthrough directly
> to VM.
>

Sure, I'll CC you when refreshing the patchset, extra tests are always
appreciated.

Thanks,
Qu
Jani Partanen Aug. 2, 2023, 2:15 a.m. UTC | #8
On 02/08/2023 4.56, Qu Wenruo wrote:
>
> So the btrfs scrub report is doing the correct report using the values
> from kernel.
>
> And considering the used space is around 600G, divided by 4 disks (aka,
> 3 data stripes + 1 parity stripes), it's not that weird, as we would got
> around 200G per device (parity doesn't contribute to the scrubbed bytes).
>
> Especially considering your metadata is RAID1C4, it means we should only
> have more than 200G.
> Instead it's the old report of less than 200G doesn't seem correct.
>
> Mind to provide the output of "btrfs fi usage <mnt>" to verify my
> assumption?
>
btrfs fi usage /mnt/
Overall:
     Device size:                   1.16TiB
     Device allocated:            844.25GiB
     Device unallocated:          348.11GiB
     Device missing:                  0.00B
     Device slack:                    0.00B
     Used:                        799.86GiB
     Free (estimated):            289.58GiB      (min: 115.52GiB)
     Free (statfs, df):           289.55GiB
     Data ratio:                       1.33
     Metadata ratio:                   4.00
     Global reserve:              471.80MiB      (used: 0.00B)
     Multiple profiles:                  no

Data,RAID5: Size:627.00GiB, Used:598.51GiB (95.46%)
    /dev/sdb      209.00GiB
    /dev/sdc      209.00GiB
    /dev/sdd      209.00GiB
    /dev/sde      209.00GiB

Metadata,RAID1C4: Size:2.00GiB, Used:472.56MiB (23.07%)
    /dev/sdb        2.00GiB
    /dev/sdc        2.00GiB
    /dev/sdd        2.00GiB
    /dev/sde        2.00GiB

System,RAID1C4: Size:64.00MiB, Used:64.00KiB (0.10%)
    /dev/sdb       64.00MiB
    /dev/sdc       64.00MiB
    /dev/sdd       64.00MiB
    /dev/sde       64.00MiB

Unallocated:
    /dev/sdb       87.03GiB
    /dev/sdc       87.03GiB
    /dev/sdd       87.03GiB
    /dev/sde       87.03GiB


There is 1 extra 2GB file now, so that's why it shows a little more
usage now.


> Sure, I'll CC you when refreshing the patchset, extra tests are always
> appreciated.
>
Sounds good, thanks!
Qu Wenruo Aug. 2, 2023, 2:20 a.m. UTC | #9
On 2023/8/2 10:15, Jani Partanen wrote:
> 
> On 02/08/2023 4.56, Qu Wenruo wrote:
>>
>> So the btrfs scrub report is doing the correct report using the values
>> from kernel.
>>
>> And considering the used space is around 600G, divided by 4 disks (aka,
>> 3 data stripes + 1 parity stripes), it's not that weird, as we would got
>> around 200G per device (parity doesn't contribute to the scrubbed bytes).
>>
>> Especially considering your metadata is RAID1C4, it means we should only
>> have more than 200G.
>> Instead it's the old report of less than 200G doesn't seem correct.
>>
>> Mind to provide the output of "btrfs fi usage <mnt>" to verify my
>> assumption?
>>
> btrfs fi usage /mnt/
> Overall:
>      Device size:                   1.16TiB
>      Device allocated:            844.25GiB
>      Device unallocated:          348.11GiB
>      Device missing:                  0.00B
>      Device slack:                    0.00B
>      Used:                        799.86GiB
>      Free (estimated):            289.58GiB      (min: 115.52GiB)
>      Free (statfs, df):           289.55GiB
>      Data ratio:                       1.33
>      Metadata ratio:                   4.00
>      Global reserve:              471.80MiB      (used: 0.00B)
>      Multiple profiles:                  no
> 
> Data,RAID5: Size:627.00GiB, Used:598.51GiB (95.46%)
>     /dev/sdb      209.00GiB
>     /dev/sdc      209.00GiB
>     /dev/sdd      209.00GiB
>     /dev/sde      209.00GiB

OK, my previous calculation was incorrect...

For each device there should be 209GiB used by RAID5 chunks, and only
3/4 of that contributes to the scrubbed data bytes (roughly 157GiB
expected per device, versus the ~299GiB that was reported, i.e. about
2x).

Thus there seems to be some double accounting.

This situation definitely needs extra digging.

Thanks,
Qu

Qu Wenruo Aug. 3, 2023, 6:30 a.m. UTC | #10
On 2023/8/2 10:20, Qu Wenruo wrote:
> 
> 
> On 2023/8/2 10:15, Jani Partanen wrote:
>>
>> On 02/08/2023 4.56, Qu Wenruo wrote:
>>>
>>> So the btrfs scrub report is doing the correct report using the values
>>> from kernel.
>>>
>>> And considering the used space is around 600G, divided by 4 disks (aka,
>>> 3 data stripes + 1 parity stripes), it's not that weird, as we would got
>>> around 200G per device (parity doesn't contribute to the scrubbed 
>>> bytes).
>>>
>>> Especially considering your metadata is RAID1C4, it means we should only
>>> have more than 200G.
>>> Instead it's the old report of less than 200G doesn't seem correct.
>>>
>>> Mind to provide the output of "btrfs fi usage <mnt>" to verify my
>>> assumption?
>>>
>> btrfs fi usage /mnt/
>> Overall:
>>      Device size:                   1.16TiB
>>      Device allocated:            844.25GiB
>>      Device unallocated:          348.11GiB
>>      Device missing:                  0.00B
>>      Device slack:                    0.00B
>>      Used:                        799.86GiB
>>      Free (estimated):            289.58GiB      (min: 115.52GiB)
>>      Free (statfs, df):           289.55GiB
>>      Data ratio:                       1.33
>>      Metadata ratio:                   4.00
>>      Global reserve:              471.80MiB      (used: 0.00B)
>>      Multiple profiles:                  no
>>
>> Data,RAID5: Size:627.00GiB, Used:598.51GiB (95.46%)
>>     /dev/sdb      209.00GiB
>>     /dev/sdc      209.00GiB
>>     /dev/sdd      209.00GiB
>>     /dev/sde      209.00GiB
> 
> OK, my previous calculation is incorrect...
> 
> For each device there should be 209GiB used by RAID5 chunks, and only 
> 3/4 of them contributes to the scrubbed data bytes.
> 
> Thus there seems to be some double accounting.
> 
> Definitely needs extra digging for this situation.

Well, this turns out to be something related to the patchset.

If you don't apply the patchset, the reporting is correct.

The problem is in the last patch, which calls
scrub_stripe_report_errors() twice, thus double-accounting the values.

I'll fix it soon.

Thanks for spotting this one!
Qu
