diff mbox series

[2/7] btrfs: reduce the log level for btrfs_dev_stat_inc_and_print()

Message ID c54030e9a9e202f36e6002fb533810bc5e8a6b9b.1710409033.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: scrub: refine the error messages | expand

Commit Message

Qu Wenruo March 14, 2024, 9:50 a.m. UTC
Currently when we increase the device statistics, it would always lead
to an error message in the kernel log.

I would argue this behavior is not ideal:

- It would flood the dmesg and bury real important messages
  One common scenario is scrub.
  If scrub hit some errors, it would cause both scrub and
  btrfs_dev_stat_inc_and_print() to print error messages.

  And in that case, btrfs_dev_stat_inc_and_print() is completely
  useless.

- The results of btrfs_dev_stat_inc_and_print() is mostly for history
  monitoring, doesn't has enough details

  If we trigger the errors during regular read, such messages from
  btrfs_dev_stat_inc_and_print() won't help us to locate the cause
  either.

The real usage for the btrfs device statistics is for some user space
daemon to check if there is any new errors, acting like some checks on
SMART, thus we don't really need/want those messages in dmesg.

This patch would reduce the log level to debug (disabled by default) for
btrfs_dev_stat_inc_and_print().
For users really want to utilize btrfs devices statistics, they should
go check "btrfs device stats" periodically, and we should focus the
kernel error messages to more important things.

CC: stable@vger.kernel.org
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Filipe Manana March 14, 2024, 5:17 p.m. UTC | #1
On Thu, Mar 14, 2024 at 9:54 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Currently when we increase the device statistics, it would always lead
> to an error message in the kernel log.
>
> I would argue this behavior is not ideal:
>
> - It would flood the dmesg and bury real important messages
>   One common scenario is scrub.
>   If scrub hit some errors, it would cause both scrub and
>   btrfs_dev_stat_inc_and_print() to print error messages.
>
>   And in that case, btrfs_dev_stat_inc_and_print() is completely
>   useless.
>
> - The results of btrfs_dev_stat_inc_and_print() is mostly for history
>   monitoring, doesn't has enough details
>
>   If we trigger the errors during regular read, such messages from
>   btrfs_dev_stat_inc_and_print() won't help us to locate the cause
>   either.
>
> The real usage for the btrfs device statistics is for some user space
> daemon to check if there is any new errors, acting like some checks on
> SMART, thus we don't really need/want those messages in dmesg.
>
> This patch would reduce the log level to debug (disabled by default) for
> btrfs_dev_stat_inc_and_print().
> For users really want to utilize btrfs devices statistics, they should
> go check "btrfs device stats" periodically, and we should focus the
> kernel error messages to more important things.

Not sure if this is the right thing to do.

In the scrub context it can be annoying for sure.
Other cases I'm not so sure about, because having error messages in
dmesg/syslog may help notice issues more quickly.

>
> CC: stable@vger.kernel.org
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e49935a54da0..126145950ed3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7828,7 +7828,7 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index)
>
>         if (!dev->dev_stats_valid)
>                 return;
> -       btrfs_err_rl_in_rcu(dev->fs_info,
> +       btrfs_debug_rl_in_rcu(dev->fs_info,
>                 "bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
>                            btrfs_dev_name(dev),
>                            btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS),
> --
> 2.44.0
>
>
Qu Wenruo March 14, 2024, 8:26 p.m. UTC | #2
在 2024/3/15 03:47, Filipe Manana 写道:
> On Thu, Mar 14, 2024 at 9:54 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Currently when we increase the device statistics, it would always lead
>> to an error message in the kernel log.
>>
>> I would argue this behavior is not ideal:
>>
>> - It would flood the dmesg and bury real important messages
>>    One common scenario is scrub.
>>    If scrub hit some errors, it would cause both scrub and
>>    btrfs_dev_stat_inc_and_print() to print error messages.
>>
>>    And in that case, btrfs_dev_stat_inc_and_print() is completely
>>    useless.
>>
>> - The results of btrfs_dev_stat_inc_and_print() is mostly for history
>>    monitoring, doesn't has enough details
>>
>>    If we trigger the errors during regular read, such messages from
>>    btrfs_dev_stat_inc_and_print() won't help us to locate the cause
>>    either.
>>
>> The real usage for the btrfs device statistics is for some user space
>> daemon to check if there is any new errors, acting like some checks on
>> SMART, thus we don't really need/want those messages in dmesg.
>>
>> This patch would reduce the log level to debug (disabled by default) for
>> btrfs_dev_stat_inc_and_print().
>> For users really want to utilize btrfs devices statistics, they should
>> go check "btrfs device stats" periodically, and we should focus the
>> kernel error messages to more important things.
>
> Not sure if this is the right thing to do.
>
> In the scrub context it can be annoying for sure.
> Other cases I'm not so sure about, because having error messages in
> dmesg/syslog may help notice issues more quickly.

For non-scrub cases, I'd argue we already have enough output:

No matter if the error is fixed or not, every time a mirror got csum
mismatch or other errors, we already have error message output:

  Data: btrfs_print_data_csum_error()
  Meta: btrfs_validate_extent_buffer()

For repaired ones, we have extra output from bio layer for both metadata
and data:
  btrfs_repair_io_failure()

So I'd say the dev_stat ones are already duplicated.

Thanks,
Qu
>
>>
>> CC: stable@vger.kernel.org
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/volumes.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index e49935a54da0..126145950ed3 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -7828,7 +7828,7 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index)
>>
>>          if (!dev->dev_stats_valid)
>>                  return;
>> -       btrfs_err_rl_in_rcu(dev->fs_info,
>> +       btrfs_debug_rl_in_rcu(dev->fs_info,
>>                  "bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
>>                             btrfs_dev_name(dev),
>>                             btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS),
>> --
>> 2.44.0
>>
>>
>
Filipe Manana March 18, 2024, 7:54 p.m. UTC | #3
On Thu, Mar 14, 2024 at 8:26 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2024/3/15 03:47, Filipe Manana 写道:
> > On Thu, Mar 14, 2024 at 9:54 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> Currently when we increase the device statistics, it would always lead
> >> to an error message in the kernel log.
> >>
> >> I would argue this behavior is not ideal:
> >>
> >> - It would flood the dmesg and bury real important messages
> >>    One common scenario is scrub.
> >>    If scrub hit some errors, it would cause both scrub and
> >>    btrfs_dev_stat_inc_and_print() to print error messages.
> >>
> >>    And in that case, btrfs_dev_stat_inc_and_print() is completely
> >>    useless.
> >>
> >> - The results of btrfs_dev_stat_inc_and_print() is mostly for history
> >>    monitoring, doesn't has enough details
> >>
> >>    If we trigger the errors during regular read, such messages from
> >>    btrfs_dev_stat_inc_and_print() won't help us to locate the cause
> >>    either.
> >>
> >> The real usage for the btrfs device statistics is for some user space
> >> daemon to check if there is any new errors, acting like some checks on
> >> SMART, thus we don't really need/want those messages in dmesg.
> >>
> >> This patch would reduce the log level to debug (disabled by default) for
> >> btrfs_dev_stat_inc_and_print().
> >> For users really want to utilize btrfs devices statistics, they should
> >> go check "btrfs device stats" periodically, and we should focus the
> >> kernel error messages to more important things.
> >
> > Not sure if this is the right thing to do.
> >
> > In the scrub context it can be annoying for sure.
> > Other cases I'm not so sure about, because having error messages in
> > dmesg/syslog may help notice issues more quickly.
>
> For non-scrub cases, I'd argue we already have enough output:
>
> No matter if the error is fixed or not, every time a mirror got csum
> mismatch or other errors, we already have error message output:
>
>   Data: btrfs_print_data_csum_error()
>   Meta: btrfs_validate_extent_buffer()
>
> For repaired ones, we have extra output from bio layer for both metadata
> and data:
>   btrfs_repair_io_failure()
>
> So I'd say the dev_stat ones are already duplicated.

Ok, I suppose it's fine then.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

>
> Thanks,
> Qu
> >
> >>
> >> CC: stable@vger.kernel.org
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>   fs/btrfs/volumes.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index e49935a54da0..126145950ed3 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -7828,7 +7828,7 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index)
> >>
> >>          if (!dev->dev_stats_valid)
> >>                  return;
> >> -       btrfs_err_rl_in_rcu(dev->fs_info,
> >> +       btrfs_debug_rl_in_rcu(dev->fs_info,
> >>                  "bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
> >>                             btrfs_dev_name(dev),
> >>                             btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS),
> >> --
> >> 2.44.0
> >>
> >>
> >
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e49935a54da0..126145950ed3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7828,7 +7828,7 @@  void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index)
 
 	if (!dev->dev_stats_valid)
 		return;
-	btrfs_err_rl_in_rcu(dev->fs_info,
+	btrfs_debug_rl_in_rcu(dev->fs_info,
 		"bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
 			   btrfs_dev_name(dev),
 			   btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS),