diff mbox series

[6/7] btrfs: scrub: unify and shorten the error message

Message ID 6ba44b940e4e3eea573cad667ab8c0b2dd8f2c06.1710409033.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: scrub: refine the error messages | expand

Commit Message

Qu Wenruo March 14, 2024, 9:50 a.m. UTC
Currently the scrub error report is pretty long:

 BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872
 BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1)

Since we have so many things to output, it's not a surprise we got long
error lines.

To make the lines a little shorter, and make important info more
obvious, change the unify output to something like this:

 BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
 BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872

The idea is, to put large values/small values/string separated meanwhile
skip the some descriptor directly:

- "logical LOGICAL(MIRROR)"
  LOGICAL is always a large value, meanwhile MIRROR is always a single
  digit.
  Thus combining them together, human can still split them instinctively.

- "physical DEVID(DEVPATH)PHYSICAL"
  DEVID is normally a short number, DEVPATH is a long string, and PHYSICAL
  is a normally a large number.
  And for most end users, the most important device path is still easy
  to catch, even surrounded by some large numbers.

- inode ROOT/INO(PATH)
  To locate a btrfs inode, we have to provide both root and inode
  number. Normally ROOT should be a much smaller number (3 digits)
  meanwhile the INO can be pretty large.

  However for the end user, the most important thing is the PATH, which
  is still not hard to locate.

Any better ideas would be appreciated to make those lines shorter.

However what we really need is a proper way to report all those info
(maybe put the file/dev name resolution into the user space) to user
space reliably (without rate limit).

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 55 +++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 24 deletions(-)

Comments

Filipe Manana March 14, 2024, 5:40 p.m. UTC | #1
On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
>
> Currently the scrub error report is pretty long:
>
>  BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872
>  BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1)
>
> Since we have so many things to output, it's not a surprise we got long
> error lines.
>
> To make the lines a little shorter, and make important info more
> obvious, change the unify output to something like this:
>
>  BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
>  BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872

I find that hard to read because:

1) Please leave spaces before opening parenthesis.
   That makes things less cluttered and easier to the eyes, more
consistent with what we generally do and it's the formal way to do
things in English (see
https://www.scribens.com/grammar-rules/typography/spacing.html);

2) Instead of "inode 5/257(file1)", something a lot easier to
understand like "inode 5 root 257 path \"file1\"", which leaves no
margin for doubt about what each number is;

3) Same with "logical 13647872(1)" - what is the 1? That will be the
question anyone reading such a log message will likely have.
    Something like "logical 13647872 mirror 1" makes it clear;

4) Same with "physical 1(/dev/mapper/test-scratch1)13647872".
    Something like "physical 13647872 device ID 1 device path
\"/dev/mapper/test-scratch1\"", makes it clear what each number is and
easier to read.

Thanks.

>
> The idea is, to put large values/small values/string separated meanwhile
> skip the some descriptor directly:
>
> - "logical LOGICAL(MIRROR)"
>   LOGICAL is always a large value, meanwhile MIRROR is always a single
>   digit.
>   Thus combining them together, human can still split them instinctively.
>
> - "physical DEVID(DEVPATH)PHYSICAL"
>   DEVID is normally a short number, DEVPATH is a long string, and PHYSICAL
>   is a normally a large number.
>   And for most end users, the most important device path is still easy
>   to catch, even surrounded by some large numbers.
>
> - inode ROOT/INO(PATH)
>   To locate a btrfs inode, we have to provide both root and inode
>   number. Normally ROOT should be a much smaller number (3 digits)
>   meanwhile the INO can be pretty large.
>
>   However for the end user, the most important thing is the PATH, which
>   is still not hard to locate.
>
> Any better ideas would be appreciated to make those lines shorter.
>
> However what we really need is a proper way to report all those info
> (maybe put the file/dev name resolution into the user space) to user
> space reliably (without rate limit).
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/scrub.c | 55 +++++++++++++++++++++++++++---------------------
>  1 file changed, 31 insertions(+), 24 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 18b2ee3b1616..17e492076bf8 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -226,6 +226,7 @@ struct scrub_warning {
>         u64                     physical;
>         u64                     logical;
>         struct btrfs_device     *dev;
> +       int                     mirror_num;
>  };
>
>  static void release_scrub_stripe(struct scrub_stripe *stripe)
> @@ -427,12 +428,12 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
>          */
>         for (i = 0; i < ipath->fspath->elem_cnt; ++i)
>                 btrfs_warn_in_rcu(fs_info,
> -"%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu, path: %s",
> -                                 swarn->errstr, swarn->logical,
> -                                 btrfs_dev_name(swarn->dev),
> -                                 swarn->physical,
> -                                 root, inum, offset,
> -                                 (char *)(unsigned long)ipath->fspath->val[i]);
> +"%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
> +                                 swarn->errstr, root, inum,
> +                                 (char *)ipath->fspath->val[i], offset,
> +                                 swarn->logical, swarn->mirror_num,
> +                                 swarn->dev->devid, btrfs_dev_name(swarn->dev),
> +                                 swarn->physical);
>
>         btrfs_put_root(local_root);
>         free_ipath(ipath);
> @@ -440,18 +441,17 @@ static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
>
>  err:
>         btrfs_warn_in_rcu(fs_info,
> -                         "%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu: path resolving failed with ret=%d",
> -                         swarn->errstr, swarn->logical,
> -                         btrfs_dev_name(swarn->dev),
> -                         swarn->physical,
> -                         root, inum, offset, ret);
> -
> +       "%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
> +                         swarn->errstr, root, inum, offset,
> +                         swarn->logical, swarn->mirror_num,
> +                         swarn->dev->devid, btrfs_dev_name(swarn->dev),
> +                         swarn->physical);
>         free_ipath(ipath);
>         return 0;
>  }
>
>  static void scrub_print_common_warning(const char *errstr, struct btrfs_device *dev,
> -                                      u64 logical, u64 physical)
> +                                      u64 logical, u64 physical, int mirror_num)
>  {
>         struct btrfs_fs_info *fs_info = dev->fs_info;
>         struct btrfs_path *path;
> @@ -471,6 +471,7 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device *
>         swarn.logical = logical;
>         swarn.errstr = errstr;
>         swarn.dev = NULL;
> +       swarn.mirror_num = mirror_num;
>
>         ret = extent_from_logical(fs_info, swarn.logical, path, &found_key,
>                                   &flags);
> @@ -501,10 +502,11 @@ static void scrub_print_common_warning(const char *errstr, struct btrfs_device *
>                         if (ret > 0)
>                                 break;
>                         btrfs_warn_in_rcu(fs_info,
> -"%s at logical %llu on dev %s, physical %llu: metadata %s (level %d) in tree %llu",
> -                               errstr, swarn.logical, btrfs_dev_name(dev),
> -                               swarn.physical, (ref_level ? "node" : "leaf"),
> -                               ref_level, ref_root);
> +"%s at metadata in tree %llu(%u), logical %llu(%u) physical %llu(%s)%llu",
> +                               errstr, ref_root, ref_level,
> +                               swarn.logical, swarn.mirror_num,
> +                               dev->devid, btrfs_dev_name(dev),
> +                               swarn.physical);
>                 }
>                 btrfs_release_path(path);
>         } else {
> @@ -879,27 +881,32 @@ static void scrub_stripe_report_errors(struct scrub_ctx *sctx,
>                  */
>                 if (repaired) {
>                         btrfs_err_rl_in_rcu(fs_info,
> -                       "fixed up error at logical %llu on dev %s physical %llu",
> -                                           logical, btrfs_dev_name(dev),
> +                       "fixed up error at logical %llu(%u) physical %llu(%s)%llu",
> +                                           logical, stripe->mirror_num,
> +                                           dev->devid, btrfs_dev_name(dev),
>                                             physical);
>                         continue;
>                 }
>
>                 /* The remaining are all for unrepaired. */
>                 btrfs_err_rl_in_rcu(fs_info,
> -       "unable to fixup (regular) error at logical %llu on dev %s physical %llu",
> -                                           logical, btrfs_dev_name(dev),
> +       "unable to fixup (regular) error at logical %llu(%u) physical %llu(%s)%llu",
> +                                           logical, stripe->mirror_num,
> +                                           dev->devid, btrfs_dev_name(dev),
>                                             physical);
>
>                 if (test_bit(sector_nr, &stripe->io_error_bitmap))
>                         scrub_print_common_warning("i/o error", dev,
> -                                                    logical, physical);
> +                                                    logical, physical,
> +                                                    stripe->mirror_num);
>                 if (test_bit(sector_nr, &stripe->csum_error_bitmap))
>                         scrub_print_common_warning("checksum error", dev,
> -                                                    logical, physical);
> +                                                    logical, physical,
> +                                                    stripe->mirror_num);
>                 if (test_bit(sector_nr, &stripe->meta_error_bitmap))
>                         scrub_print_common_warning("header error", dev,
> -                                                    logical, physical);
> +                                                    logical, physical,
> +                                                    stripe->mirror_num);
>         }
>
>         spin_lock(&sctx->stat_lock);
> --
> 2.44.0
>
>
Qu Wenruo March 14, 2024, 8:56 p.m. UTC | #2
在 2024/3/15 04:10, Filipe Manana 写道:
> On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> Currently the scrub error report is pretty long:
>>
>>   BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872
>>   BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1)
>>
>> Since we have so many things to output, it's not a surprise we got long
>> error lines.
>>
>> To make the lines a little shorter, and make important info more
>> obvious, change the unify output to something like this:
>>
>>   BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
>>   BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
> 
> I find that hard to read because:
> 
> 1) Please leave spaces before opening parenthesis.
>     That makes things less cluttered and easier to the eyes, more
> consistent with what we generally do and it's the formal way to do
> things in English (see
> https://www.scribens.com/grammar-rules/typography/spacing.html);

Yep, I can do that, but I also want to keep the involved members 
together thus closer.

Not sure if adding spaces would still keep the close relationships 
between the values.

E.g: inode 5/256 (file1) fileoff 16384, logical 123456 (1) physical 1 
(scratch1) 123456

It makes it a little harder to indicate that "(1)" is related to logical 
address (thus mirror number).

> 
> 2) Instead of "inode 5/257(file1)", something a lot easier to
> understand like "inode 5 root 257 path \"file1\"", which leaves no
> margin for doubt about what each number is;
> 
> 3) Same with "logical 13647872(1)" - what is the 1? That will be the
> question anyone reading such a log message will likely have.
>      Something like "logical 13647872 mirror 1" makes it clear;
> 
> 4) Same with "physical 1(/dev/mapper/test-scratch1)13647872".
>      Something like "physical 13647872 device ID 1 device path
> \"/dev/mapper/test-scratch1\"", makes it clear what each number is and
> easier to read.

I totally understand the original output format is much easier to read 
on its own.

However if we have hundreds lines of similar output, it's a different 
story, a small change in any of the value is much harder to catch, and 
the extra helpful prompt is in fact a burden other than a bless.

(That's why things like spreadsheet is much easier to reader for 
multiple similarly structured data, and in that case it's much better to 
craft a script to convert the lines into a csv)

Unfortunately we don't have the benefit (at least yet) to structure all 
these info into a structured output.


But what I'm doing here is to reduce the prompts to minimal (at most 4 
prompts, "inode", "fileoff", "logical", "physical"), so less duplicated 
strings for multiple lines of similar error messages.

The patchset is in the middle ground between fully detailed output (the 
old one, focusing on single line) and the fully script orient output (no 
prompt at all, just all numbers/strings, focusing on CSV like output).


Furthermore, I would also argue that, for entry level end users, they 
can still catch the critical info (like file path and device path) 
without much difficult, and those info is enough for them to take action.
(E.g. deleting the file, or replace the disk)

Yes, they would get confused on the devid or rootid, but that's fine and 
everyone can learn something new.

For experienced users who understand basics of btrfs, or us developers, 
the split in values are already arranged in a way similar sized values 
are never close together (aka, string/large/small value split).

Thanks,
Qu

> 
> Thanks.
>
kernel test robot March 14, 2024, 11:05 p.m. UTC | #3
Hi Qu,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on linus/master v6.8 next-20240314]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-fix-incorrectly-reported-logical-physical-address/20240314-215457
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/6ba44b940e4e3eea573cad667ab8c0b2dd8f2c06.1710409033.git.wqu%40suse.com
patch subject: [PATCH 6/7] btrfs: scrub: unify and shorten the error message
config: m68k-defconfig (https://download.01.org/0day-ci/archive/20240315/202403150650.dNFtHzxf-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240315/202403150650.dNFtHzxf-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403150650.dNFtHzxf-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from fs/btrfs/extent_io.h:16,
                    from fs/btrfs/locking.h:13,
                    from fs/btrfs/ctree.h:19,
                    from fs/btrfs/scrub.c:10:
   fs/btrfs/scrub.c: In function 'scrub_print_warning_inode':
>> fs/btrfs/scrub.c:433:35: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     433 |                                   (char *)ipath->fspath->val[i], offset,
         |                                   ^
   fs/btrfs/messages.h:27:39: note: in definition of macro 'btrfs_printk'
      27 |         _btrfs_printk(fs_info, fmt, ##args)
         |                                       ^~~~
   fs/btrfs/messages.h:66:9: note: in expansion of macro 'btrfs_printk_in_rcu'
      66 |         btrfs_printk_in_rcu(fs_info, KERN_WARNING fmt, ##args)
         |         ^~~~~~~~~~~~~~~~~~~
   fs/btrfs/scrub.c:430:17: note: in expansion of macro 'btrfs_warn_in_rcu'
     430 |                 btrfs_warn_in_rcu(fs_info,
         |                 ^~~~~~~~~~~~~~~~~


vim +433 fs/btrfs/scrub.c

   388	
   389	static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
   390					     u64 root, void *warn_ctx)
   391	{
   392		int ret;
   393		int i;
   394		unsigned nofs_flag;
   395		struct scrub_warning *swarn = warn_ctx;
   396		struct btrfs_fs_info *fs_info = swarn->dev->fs_info;
   397		struct inode_fs_paths *ipath = NULL;
   398		struct btrfs_root *local_root;
   399	
   400		local_root = btrfs_get_fs_root(fs_info, root, true);
   401		if (IS_ERR(local_root)) {
   402			ret = PTR_ERR(local_root);
   403			goto err;
   404		}
   405	
   406		/*
   407		 * init_path might indirectly call vmalloc, or use GFP_KERNEL. Scrub
   408		 * uses GFP_NOFS in this context, so we keep it consistent but it does
   409		 * not seem to be strictly necessary.
   410		 */
   411		nofs_flag = memalloc_nofs_save();
   412		ipath = init_ipath(4096, local_root, swarn->path);
   413		memalloc_nofs_restore(nofs_flag);
   414		if (IS_ERR(ipath)) {
   415			btrfs_put_root(local_root);
   416			ret = PTR_ERR(ipath);
   417			ipath = NULL;
   418			goto err;
   419		}
   420		ret = paths_from_inode(inum, ipath);
   421	
   422		if (ret < 0)
   423			goto err;
   424	
   425		/*
   426		 * we deliberately ignore the bit ipath might have been too small to
   427		 * hold all of the paths here
   428		 */
   429		for (i = 0; i < ipath->fspath->elem_cnt; ++i)
   430			btrfs_warn_in_rcu(fs_info,
   431	"%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
   432					  swarn->errstr, root, inum,
 > 433					  (char *)ipath->fspath->val[i], offset,
   434					  swarn->logical, swarn->mirror_num,
   435					  swarn->dev->devid, btrfs_dev_name(swarn->dev),
   436					  swarn->physical);
   437	
   438		btrfs_put_root(local_root);
   439		free_ipath(ipath);
   440		return 0;
   441	
   442	err:
   443		btrfs_warn_in_rcu(fs_info,
   444		"%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
   445				  swarn->errstr, root, inum, offset,
   446				  swarn->logical, swarn->mirror_num,
   447				  swarn->dev->devid, btrfs_dev_name(swarn->dev),
   448				  swarn->physical);
   449		free_ipath(ipath);
   450		return 0;
   451	}
   452
kernel test robot March 15, 2024, 11:44 a.m. UTC | #4
Hi Qu,

kernel test robot noticed the following build warnings:

[auto build test WARNING on kdave/for-next]
[also build test WARNING on linus/master v6.8 next-20240315]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-fix-incorrectly-reported-logical-physical-address/20240314-215457
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/6ba44b940e4e3eea573cad667ab8c0b2dd8f2c06.1710409033.git.wqu%40suse.com
patch subject: [PATCH 6/7] btrfs: scrub: unify and shorten the error message
config: hexagon-randconfig-r111-20240315 (https://download.01.org/0day-ci/archive/20240315/202403151954.2aDLBPQm-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 8f68022f8e6e54d1aeae4ed301f5a015963089b7)
reproduce: (https://download.01.org/0day-ci/archive/20240315/202403151954.2aDLBPQm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403151954.2aDLBPQm-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> fs/btrfs/scrub.c:430:17: sparse: sparse: non size-preserving integer to pointer cast

vim +430 fs/btrfs/scrub.c

a2de733c78fa7a Arne Jansen    2011-03-08  388  
c7499a64dcf62b Filipe Manana  2022-11-01  389  static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
c7499a64dcf62b Filipe Manana  2022-11-01  390  				     u64 root, void *warn_ctx)
558540c17771ea Jan Schmidt    2011-06-13  391  {
558540c17771ea Jan Schmidt    2011-06-13  392  	int ret;
558540c17771ea Jan Schmidt    2011-06-13  393  	int i;
de2491fdefe7e5 David Sterba   2017-05-31  394  	unsigned nofs_flag;
ff023aac31198e Stefan Behrens 2012-11-06  395  	struct scrub_warning *swarn = warn_ctx;
fb456252d3d9c0 Jeff Mahoney   2016-06-22  396  	struct btrfs_fs_info *fs_info = swarn->dev->fs_info;
558540c17771ea Jan Schmidt    2011-06-13  397  	struct inode_fs_paths *ipath = NULL;
558540c17771ea Jan Schmidt    2011-06-13  398  	struct btrfs_root *local_root;
558540c17771ea Jan Schmidt    2011-06-13  399  
56e9357a1e8167 David Sterba   2020-05-15  400  	local_root = btrfs_get_fs_root(fs_info, root, true);
558540c17771ea Jan Schmidt    2011-06-13  401  	if (IS_ERR(local_root)) {
558540c17771ea Jan Schmidt    2011-06-13  402  		ret = PTR_ERR(local_root);
558540c17771ea Jan Schmidt    2011-06-13  403  		goto err;
558540c17771ea Jan Schmidt    2011-06-13  404  	}
558540c17771ea Jan Schmidt    2011-06-13  405  
de2491fdefe7e5 David Sterba   2017-05-31  406  	/*
de2491fdefe7e5 David Sterba   2017-05-31  407  	 * init_path might indirectly call vmalloc, or use GFP_KERNEL. Scrub
de2491fdefe7e5 David Sterba   2017-05-31  408  	 * uses GFP_NOFS in this context, so we keep it consistent but it does
de2491fdefe7e5 David Sterba   2017-05-31  409  	 * not seem to be strictly necessary.
de2491fdefe7e5 David Sterba   2017-05-31  410  	 */
de2491fdefe7e5 David Sterba   2017-05-31  411  	nofs_flag = memalloc_nofs_save();
558540c17771ea Jan Schmidt    2011-06-13  412  	ipath = init_ipath(4096, local_root, swarn->path);
de2491fdefe7e5 David Sterba   2017-05-31  413  	memalloc_nofs_restore(nofs_flag);
26bdef541d26fd Dan Carpenter  2011-11-16  414  	if (IS_ERR(ipath)) {
0024652895e347 Josef Bacik    2020-01-24  415  		btrfs_put_root(local_root);
26bdef541d26fd Dan Carpenter  2011-11-16  416  		ret = PTR_ERR(ipath);
26bdef541d26fd Dan Carpenter  2011-11-16  417  		ipath = NULL;
26bdef541d26fd Dan Carpenter  2011-11-16  418  		goto err;
26bdef541d26fd Dan Carpenter  2011-11-16  419  	}
558540c17771ea Jan Schmidt    2011-06-13  420  	ret = paths_from_inode(inum, ipath);
558540c17771ea Jan Schmidt    2011-06-13  421  
558540c17771ea Jan Schmidt    2011-06-13  422  	if (ret < 0)
558540c17771ea Jan Schmidt    2011-06-13  423  		goto err;
558540c17771ea Jan Schmidt    2011-06-13  424  
558540c17771ea Jan Schmidt    2011-06-13  425  	/*
558540c17771ea Jan Schmidt    2011-06-13  426  	 * we deliberately ignore the bit ipath might have been too small to
558540c17771ea Jan Schmidt    2011-06-13  427  	 * hold all of the paths here
558540c17771ea Jan Schmidt    2011-06-13  428  	 */
558540c17771ea Jan Schmidt    2011-06-13  429  	for (i = 0; i < ipath->fspath->elem_cnt; ++i)
5d163e0e68ce74 Jeff Mahoney   2016-09-20 @430  		btrfs_warn_in_rcu(fs_info,
023af88ce10a69 Qu Wenruo      2024-03-14  431  "%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
023af88ce10a69 Qu Wenruo      2024-03-14  432  				  swarn->errstr, root, inum,
023af88ce10a69 Qu Wenruo      2024-03-14  433  				  (char *)ipath->fspath->val[i], offset,
023af88ce10a69 Qu Wenruo      2024-03-14  434  				  swarn->logical, swarn->mirror_num,
023af88ce10a69 Qu Wenruo      2024-03-14  435  				  swarn->dev->devid, btrfs_dev_name(swarn->dev),
023af88ce10a69 Qu Wenruo      2024-03-14  436  				  swarn->physical);
558540c17771ea Jan Schmidt    2011-06-13  437  
0024652895e347 Josef Bacik    2020-01-24  438  	btrfs_put_root(local_root);
558540c17771ea Jan Schmidt    2011-06-13  439  	free_ipath(ipath);
558540c17771ea Jan Schmidt    2011-06-13  440  	return 0;
558540c17771ea Jan Schmidt    2011-06-13  441  
558540c17771ea Jan Schmidt    2011-06-13  442  err:
5d163e0e68ce74 Jeff Mahoney   2016-09-20  443  	btrfs_warn_in_rcu(fs_info,
023af88ce10a69 Qu Wenruo      2024-03-14  444  	"%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
023af88ce10a69 Qu Wenruo      2024-03-14  445  			  swarn->errstr, root, inum, offset,
023af88ce10a69 Qu Wenruo      2024-03-14  446  			  swarn->logical, swarn->mirror_num,
023af88ce10a69 Qu Wenruo      2024-03-14  447  			  swarn->dev->devid, btrfs_dev_name(swarn->dev),
023af88ce10a69 Qu Wenruo      2024-03-14  448  			  swarn->physical);
558540c17771ea Jan Schmidt    2011-06-13  449  	free_ipath(ipath);
558540c17771ea Jan Schmidt    2011-06-13  450  	return 0;
558540c17771ea Jan Schmidt    2011-06-13  451  }
558540c17771ea Jan Schmidt    2011-06-13  452
Filipe Manana March 18, 2024, 4:26 p.m. UTC | #5
On Thu, Mar 14, 2024 at 8:56 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> 在 2024/3/15 04:10, Filipe Manana 写道:
> > On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >> Currently the scrub error report is pretty long:
> >>
> >>   BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872
> >>   BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1)
> >>
> >> Since we have so many things to output, it's not a surprise we got long
> >> error lines.
> >>
> >> To make the lines a little shorter, and make important info more
> >> obvious, change the unify output to something like this:
> >>
> >>   BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
> >>   BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
> >
> > I find that hard to read because:
> >
> > 1) Please leave spaces before opening parenthesis.
> >     That makes things less cluttered and easier to the eyes, more
> > consistent with what we generally do and it's the formal way to do
> > things in English (see
> > https://www.scribens.com/grammar-rules/typography/spacing.html);
>
> Yep, I can do that, but I also want to keep the involved members
> together thus closer.
>
> Not sure if adding spaces would still keep the close relationships
> between the values.
>
> E.g: inode 5/256 (file1) fileoff 16384, logical 123456 (1) physical 1
> (scratch1) 123456
>
> It makes it a little harder to indicate that "(1)" is related to logical
> address (thus mirror number).
>
> >
> > 2) Instead of "inode 5/257(file1)", something a lot easier to
> > understand like "inode 5 root 257 path \"file1\"", which leaves no
> > margin for doubt about what each number is;
> >
> > 3) Same with "logical 13647872(1)" - what is the 1? That will be the
> > question anyone reading such a log message will likely have.
> >      Something like "logical 13647872 mirror 1" makes it clear;
> >
> > 4) Same with "physical 1(/dev/mapper/test-scratch1)13647872".
> >      Something like "physical 13647872 device ID 1 device path
> > \"/dev/mapper/test-scratch1\"", makes it clear what each number is and
> > easier to read.
>
> I totally understand the original output format is much easier to read
> on its own.
>
> However if we have hundreds lines of similar output, it's a different
> story, a small change in any of the value is much harder to catch, and
> the extra helpful prompt is in fact a burden other than a bless.
>
> (That's why things like spreadsheet is much easier to reader for
> multiple similarly structured data, and in that case it's much better to
> craft a script to convert the lines into a csv)
>
> Unfortunately we don't have the benefit (at least yet) to structure all
> these info into a structured output.
>
>
> But what I'm doing here is to reduce the prompts to minimal (at most 4
> prompts, "inode", "fileoff", "logical", "physical"), so less duplicated
> strings for multiple lines of similar error messages.
>
> The patchset is in the middle ground between fully detailed output (the
> old one, focusing on single line) and the fully script orient output (no
> prompt at all, just all numbers/strings, focusing on CSV like output).
>
>
> Furthermore, I would also argue that, for entry level end users, they
> can still catch the critical info (like file path and device path)
> without much difficult, and those info is enough for them to take action.
> (E.g. deleting the file, or replace the disk)
>
> Yes, they would get confused on the devid or rootid, but that's fine and
> everyone can learn something new.
>
> For experienced users who understand basics of btrfs, or us developers,
> the split in values are already arranged in a way similar sized values
> are never close together (aka, string/large/small value split).

Well, I'm a developer and I can tell you if I ever see such log
messages, I'll have to go to the source code to figure out what each
value is supposed to be.
I'll be able to memorize for some hours or even a few days, but after
that I'll forget again and have to look at the source code again.


>
> Thanks,
> Qu
>
> >
> > Thanks.
> >
Qu Wenruo March 18, 2024, 8 p.m. UTC | #6
在 2024/3/19 02:56, Filipe Manana 写道:
> On Thu, Mar 14, 2024 at 8:56 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> 在 2024/3/15 04:10, Filipe Manana 写道:
>>> On Thu, Mar 14, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
>>>>
>>>> Currently the scrub error report is pretty long:
>>>>
>>>>    BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872 on dev /dev/mapper/test-scratch1 physical 13647872
>>>>    BTRFS warning (device dm-2): checksum error at logical 13647872 on dev /dev/mapper/test-scratch1, physical 13647872, root 5, inode 257, offset 16384, length 4096, links 1 (path: file1)
>>>>
>>>> Since we have so many things to output, it's not a surprise we got long
>>>> error lines.
>>>>
>>>> To make the lines a little shorter, and make important info more
>>>> obvious, change the unify output to something like this:
>>>>
>>>>    BTRFS error (device dm-2): unable to fixup (regular) error at logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
>>>>    BTRFS warning (device dm-2): checksum error at inode 5/257(file1) fileoff 16384, logical 13647872(1) physical 1(/dev/mapper/test-scratch1)13647872
>>>
>>> I find that hard to read because:
>>>
>>> 1) Please leave spaces before opening parenthesis.
>>>      That makes things less cluttered and easier to the eyes, more
>>> consistent with what we generally do and it's the formal way to do
>>> things in English (see
>>> https://www.scribens.com/grammar-rules/typography/spacing.html);
>>
>> Yep, I can do that, but I also want to keep the involved members
>> together thus closer.
>>
>> Not sure if adding spaces would still keep the close relationships
>> between the values.
>>
>> E.g: inode 5/256 (file1) fileoff 16384, logical 123456 (1) physical 1
>> (scratch1) 123456
>>
>> It makes it a little harder to indicate that "(1)" is related to logical
>> address (thus mirror number).
>>
>>>
>>> 2) Instead of "inode 5/257(file1)", something a lot easier to
>>> understand like "inode 5 root 257 path \"file1\"", which leaves no
>>> margin for doubt about what each number is;
>>>
>>> 3) Same with "logical 13647872(1)" - what is the 1? That will be the
>>> question anyone reading such a log message will likely have.
>>>       Something like "logical 13647872 mirror 1" makes it clear;
>>>
>>> 4) Same with "physical 1(/dev/mapper/test-scratch1)13647872".
>>>       Something like "physical 13647872 device ID 1 device path
>>> \"/dev/mapper/test-scratch1\"", makes it clear what each number is and
>>> easier to read.
>>
>> I totally understand the original output format is much easier to read
>> on its own.
>>
>> However if we have hundreds lines of similar output, it's a different
>> story, a small change in any of the value is much harder to catch, and
>> the extra helpful prompt is in fact a burden other than a bless.
>>
>> (That's why things like spreadsheet is much easier to reader for
>> multiple similarly structured data, and in that case it's much better to
>> craft a script to convert the lines into a csv)
>>
>> Unfortunately we don't have the benefit (at least yet) to structure all
>> these info into a structured output.
>>
>>
>> But what I'm doing here is to reduce the prompts to minimal (at most 4
>> prompts, "inode", "fileoff", "logical", "physical"), so less duplicated
>> strings for multiple lines of similar error messages.
>>
>> The patchset is in the middle ground between fully detailed output (the
>> old one, focusing on single line) and the fully script orient output (no
>> prompt at all, just all numbers/strings, focusing on CSV like output).
>>
>>
>> Furthermore, I would also argue that, for entry level end users, they
>> can still catch the critical info (like file path and device path)
>> without much difficult, and those info is enough for them to take action.
>> (E.g. deleting the file, or replace the disk)
>>
>> Yes, they would get confused on the devid or rootid, but that's fine and
>> everyone can learn something new.
>>
>> For experienced users who understand basics of btrfs, or us developers,
>> the split in values are already arranged in a way similar sized values
>> are never close together (aka, string/large/small value split).
>
> Well, I'm a developer and I can tell you if I ever see such log
> messages, I'll have to go to the source code to figure out what each
> value is supposed to be.
> I'll be able to memorize for some hours or even a few days, but after
> that I'll forget again and have to look at the source code again.

As the core problem is, if you just see one such error, and you need to
investigate, yes it needs more digging.

But in the real world, the support cases I hit are all have tons of
similar lines almost flooding the dmesg (well, in that case they don't
include any useful info either).

In that case, digging the format once is not a big deal at all, in fact
the main heavy lifting part is to determine how severe the corruption is.

And I'm not going full csv-like output, we still have several prompts to
provide some help.

Thanks,
Qu

>
>
>>
>> Thanks,
>> Qu
>>
>>>
>>> Thanks.
>>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 18b2ee3b1616..17e492076bf8 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -226,6 +226,7 @@  struct scrub_warning {
 	u64			physical;
 	u64			logical;
 	struct btrfs_device	*dev;
+	int			mirror_num;
 };
 
 static void release_scrub_stripe(struct scrub_stripe *stripe)
@@ -427,12 +428,12 @@  static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
 	 */
 	for (i = 0; i < ipath->fspath->elem_cnt; ++i)
 		btrfs_warn_in_rcu(fs_info,
-"%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu, path: %s",
-				  swarn->errstr, swarn->logical,
-				  btrfs_dev_name(swarn->dev),
-				  swarn->physical,
-				  root, inum, offset,
-				  (char *)(unsigned long)ipath->fspath->val[i]);
+"%s at inode %lld/%llu(%s) fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
+				  swarn->errstr, root, inum,
+				  (char *)ipath->fspath->val[i], offset,
+				  swarn->logical, swarn->mirror_num,
+				  swarn->dev->devid, btrfs_dev_name(swarn->dev),
+				  swarn->physical);
 
 	btrfs_put_root(local_root);
 	free_ipath(ipath);
@@ -440,18 +441,17 @@  static int scrub_print_warning_inode(u64 inum, u64 offset, u64 num_bytes,
 
 err:
 	btrfs_warn_in_rcu(fs_info,
-			  "%s at logical %llu on dev %s, physical %llu, root %llu, inode %llu, offset %llu: path resolving failed with ret=%d",
-			  swarn->errstr, swarn->logical,
-			  btrfs_dev_name(swarn->dev),
-			  swarn->physical,
-			  root, inum, offset, ret);
-
+	"%s at inode %lld/%llu fileoff %llu, logical %llu(%u) physical %llu(%s)%llu",
+			  swarn->errstr, root, inum, offset,
+			  swarn->logical, swarn->mirror_num,
+			  swarn->dev->devid, btrfs_dev_name(swarn->dev),
+			  swarn->physical);
 	free_ipath(ipath);
 	return 0;
 }
 
 static void scrub_print_common_warning(const char *errstr, struct btrfs_device *dev,
-				       u64 logical, u64 physical)
+				       u64 logical, u64 physical, int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = dev->fs_info;
 	struct btrfs_path *path;
@@ -471,6 +471,7 @@  static void scrub_print_common_warning(const char *errstr, struct btrfs_device *
 	swarn.logical = logical;
 	swarn.errstr = errstr;
 	swarn.dev = NULL;
+	swarn.mirror_num = mirror_num;
 
 	ret = extent_from_logical(fs_info, swarn.logical, path, &found_key,
 				  &flags);
@@ -501,10 +502,11 @@  static void scrub_print_common_warning(const char *errstr, struct btrfs_device *
 			if (ret > 0)
 				break;
 			btrfs_warn_in_rcu(fs_info,
-"%s at logical %llu on dev %s, physical %llu: metadata %s (level %d) in tree %llu",
-				errstr, swarn.logical, btrfs_dev_name(dev),
-				swarn.physical, (ref_level ? "node" : "leaf"),
-				ref_level, ref_root);
+"%s at metadata in tree %llu(%u), logical %llu(%u) physical %llu(%s)%llu",
+				errstr, ref_root, ref_level,
+				swarn.logical, swarn.mirror_num,
+				dev->devid, btrfs_dev_name(dev),
+				swarn.physical);
 		}
 		btrfs_release_path(path);
 	} else {
@@ -879,27 +881,32 @@  static void scrub_stripe_report_errors(struct scrub_ctx *sctx,
 		 */
 		if (repaired) {
 			btrfs_err_rl_in_rcu(fs_info,
-			"fixed up error at logical %llu on dev %s physical %llu",
-					    logical, btrfs_dev_name(dev),
+			"fixed up error at logical %llu(%u) physical %llu(%s)%llu",
+					    logical, stripe->mirror_num,
+					    dev->devid, btrfs_dev_name(dev),
 					    physical);
 			continue;
 		}
 
 		/* The remaining are all for unrepaired. */
 		btrfs_err_rl_in_rcu(fs_info,
-	"unable to fixup (regular) error at logical %llu on dev %s physical %llu",
-					    logical, btrfs_dev_name(dev),
+	"unable to fixup (regular) error at logical %llu(%u) physical %llu(%s)%llu",
+					    logical, stripe->mirror_num,
+					    dev->devid, btrfs_dev_name(dev),
 					    physical);
 
 		if (test_bit(sector_nr, &stripe->io_error_bitmap))
 			scrub_print_common_warning("i/o error", dev,
-						     logical, physical);
+						     logical, physical,
+						     stripe->mirror_num);
 		if (test_bit(sector_nr, &stripe->csum_error_bitmap))
 			scrub_print_common_warning("checksum error", dev,
-						     logical, physical);
+						     logical, physical,
+						     stripe->mirror_num);
 		if (test_bit(sector_nr, &stripe->meta_error_bitmap))
 			scrub_print_common_warning("header error", dev,
-						     logical, physical);
+						     logical, physical,
+						     stripe->mirror_num);
 	}
 
 	spin_lock(&sctx->stat_lock);