diff mbox series

[v2,2/2] btrfs: Expose the BTRFS commit stats through sysfs

Message ID 20220614222231.2582876-3-iangelak@fb.com (mailing list archive)
State New, archived
Headers show
Series btrfs: Expose BTRFS commit stats through sysfs | expand

Commit Message

Ioannis Angelakopoulos June 14, 2022, 10:22 p.m. UTC
Create a new sysfs entry named "commit_stats" under /sys/fs/btrfs/<UUID>/
for each mounted BTRFS filesystem. The entry exposes: 1) The number of
commits so far, 2) The duration of the last commit in ms, 3) The maximum
commit duration seen so far in ms and 4) The total duration for all commits
so far in ms.

The function "btrfs_commit_stats_show" in fs/btrfs/sysfs.c is responsible
for exposing the stats to user space.

The function "btrfs_commit_stats_store" in fs/btrfs/sysfs.c is responsible
for resetting the above values to zero.

Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
---
 fs/btrfs/sysfs.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

Comments

Nikolay Borisov June 15, 2022, 12:47 p.m. UTC | #1
On 15.06.22 г. 1:22 ч., Ioannis Angelakopoulos wrote:
> Create a new sysfs entry named "commit_stats" under /sys/fs/btrfs/<UUID>/
> for each mounted BTRFS filesystem. The entry exposes: 1) The number of
> commits so far, 2) The duration of the last commit in ms, 3) The maximum
> commit duration seen so far in ms and 4) The total duration for all commits
> so far in ms.
> 
> The function "btrfs_commit_stats_show" in fs/btrfs/sysfs.c is responsible
> for exposing the stats to user space.
> 
> The function "btrfs_commit_stats_store" in fs/btrfs/sysfs.c is responsible
> for resetting the above values to zero.
> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
> ---
>   fs/btrfs/sysfs.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index db3736de14a5..54b26aef290b 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -991,6 +991,55 @@ static ssize_t btrfs_sectorsize_show(struct kobject *kobj,
>   
>   BTRFS_ATTR(, sectorsize, btrfs_sectorsize_show);
>   
> +static ssize_t btrfs_commit_stats_show(struct kobject *kobj,
> +				struct kobj_attribute *a, char *buf)
> +{
> +	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
> +
> +	return sysfs_emit(buf,
> +		"commits %llu\n"
> +		"last_commit_dur %llu ms\n"
> +		"max_commit_dur %llu ms\n"
> +		"total_commit_dur %llu ms\n",
> +		fs_info->commit_stats.commit_counter,
> +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
> +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
> +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
> +}
> +
> +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
> +						struct kobj_attribute *a,
> +						const char *buf, size_t len)
> +{

Is there really value in being able to zero out the current stats?


<snip>
David Sterba June 15, 2022, 1:31 p.m. UTC | #2
On Wed, Jun 15, 2022 at 03:47:51PM +0300, Nikolay Borisov wrote:
> > +
> > +	return sysfs_emit(buf,
> > +		"commits %llu\n"
> > +		"last_commit_dur %llu ms\n"
> > +		"max_commit_dur %llu ms\n"
> > +		"total_commit_dur %llu ms\n",
> > +		fs_info->commit_stats.commit_counter,
> > +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
> > +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
> > +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
> > +}
> > +
> > +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
> > +						struct kobj_attribute *a,
> > +						const char *buf, size_t len)
> > +{
> 
> Is there really value in being able to zero out the current stats?

I think it makes sense for the max commit time, one might want to track
that for some workload and then reset. For the other it can go both
ways, eg. a monitoring tool saves the stats completely and resets them.
OTOH long term stats would be lost in case there's more than one
application reading it.
kernel test robot June 15, 2022, 8:57 p.m. UTC | #3
Hi Ioannis,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on v5.19-rc2 next-20220615]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Ioannis-Angelakopoulos/btrfs-Expose-BTRFS-commit-stats-through-sysfs/20220615-062725
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: i386-debian-10.3-kselftests (https://download.01.org/0day-ci/archive/20220616/202206160408.sNjzgOSw-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/7fc5bb8a0de0b8d21313846a3eca99a70ca25da4
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Ioannis-Angelakopoulos/btrfs-Expose-BTRFS-commit-stats-through-sysfs/20220615-062725
        git checkout 7fc5bb8a0de0b8d21313846a3eca99a70ca25da4
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!
kernel test robot June 15, 2022, 9:07 p.m. UTC | #4
Hi Ioannis,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on v5.19-rc2 next-20220615]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Ioannis-Angelakopoulos/btrfs-Expose-BTRFS-commit-stats-through-sysfs/20220615-062725
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
config: xtensa-allyesconfig (https://download.01.org/0day-ci/archive/20220616/202206160433.omRNx0tv-lkp@intel.com/config)
compiler: xtensa-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/7fc5bb8a0de0b8d21313846a3eca99a70ca25da4
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Ioannis-Angelakopoulos/btrfs-Expose-BTRFS-commit-stats-through-sysfs/20220615-062725
        git checkout 7fc5bb8a0de0b8d21313846a3eca99a70ca25da4
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=xtensa SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   `.exit.text' referenced in section `__jump_table' of fs/cifs/cifsfs.o: defined in discarded section `.exit.text' of fs/cifs/cifsfs.o
   `.exit.text' referenced in section `__jump_table' of fs/cifs/cifsfs.o: defined in discarded section `.exit.text' of fs/cifs/cifsfs.o
   `.exit.text' referenced in section `__jump_table' of fs/fuse/inode.o: defined in discarded section `.exit.text' of fs/fuse/inode.o
   `.exit.text' referenced in section `__jump_table' of fs/fuse/inode.o: defined in discarded section `.exit.text' of fs/fuse/inode.o
   xtensa-linux-ld: fs/btrfs/sysfs.o: in function `rmdir_subvol_show':
>> sysfs.c:(.text+0x1f0): undefined reference to `__udivdi3'
   xtensa-linux-ld: fs/btrfs/sysfs.o: in function `btrfs_commit_stats_show':
   sysfs.c:(.text+0x267): undefined reference to `__udivdi3'
   xtensa-linux-ld: fs/btrfs/sysfs.o: in function `rmdir_subvol_show':
   sysfs.c:(.text+0x1f8): undefined reference to `__udivdi3'
   xtensa-linux-ld: fs/btrfs/sysfs.o: in function `btrfs_commit_stats_show':
   sysfs.c:(.text+0x28b): undefined reference to `__udivdi3'
   xtensa-linux-ld: fs/btrfs/sysfs.o: in function `rmdir_subvol_show':
   sysfs.c:(.text+0x200): undefined reference to `__udivdi3'
   xtensa-linux-ld: fs/btrfs/sysfs.o:sysfs.c:(.text+0x2af): more undefined references to `__udivdi3' follow
   `.exit.text' referenced in section `__jump_table' of fs/ceph/super.o: defined in discarded section `.exit.text' of fs/ceph/super.o
   `.exit.text' referenced in section `__jump_table' of fs/ceph/super.o: defined in discarded section `.exit.text' of fs/ceph/super.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/rio_cm.o: defined in discarded section `.exit.text' of drivers/rapidio/rio_cm.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/rio_cm.o: defined in discarded section `.exit.text' of drivers/rapidio/rio_cm.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen2.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen2.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen2.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen2.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen2.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen2.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen2.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen2.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen3.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen3.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen3.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen3.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen3.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen3.o
   `.exit.text' referenced in section `__jump_table' of drivers/rapidio/switches/idt_gen3.o: defined in discarded section `.exit.text' of drivers/rapidio/switches/idt_gen3.o
   `.exit.text' referenced in section `__jump_table' of drivers/video/fbdev/vt8623fb.o: defined in discarded section `.exit.text' of drivers/video/fbdev/vt8623fb.o
   `.exit.text' referenced in section `__jump_table' of drivers/video/fbdev/vt8623fb.o: defined in discarded section `.exit.text' of drivers/video/fbdev/vt8623fb.o
   `.exit.text' referenced in section `__jump_table' of drivers/video/fbdev/s3fb.o: defined in discarded section `.exit.text' of drivers/video/fbdev/s3fb.o
   `.exit.text' referenced in section `__jump_table' of drivers/video/fbdev/s3fb.o: defined in discarded section `.exit.text' of drivers/video/fbdev/s3fb.o
   `.exit.text' referenced in section `__jump_table' of drivers/video/fbdev/arkfb.o: defined in discarded section `.exit.text' of drivers/video/fbdev/arkfb.o
   `.exit.text' referenced in section `__jump_table' of drivers/video/fbdev/arkfb.o: defined in discarded section `.exit.text' of drivers/video/fbdev/arkfb.o
   `.exit.text' referenced in section `__jump_table' of drivers/misc/phantom.o: defined in discarded section `.exit.text' of drivers/misc/phantom.o
   `.exit.text' referenced in section `__jump_table' of drivers/misc/phantom.o: defined in discarded section `.exit.text' of drivers/misc/phantom.o
   `.exit.text' referenced in section `__jump_table' of drivers/misc/habanalabs/common/habanalabs_drv.o: defined in discarded section `.exit.text' of drivers/misc/habanalabs/common/habanalabs_drv.o
   `.exit.text' referenced in section `__jump_table' of drivers/misc/habanalabs/common/habanalabs_drv.o: defined in discarded section `.exit.text' of drivers/misc/habanalabs/common/habanalabs_drv.o
   `.exit.text' referenced in section `__jump_table' of drivers/scsi/fcoe/fcoe.o: defined in discarded section `.exit.text' of drivers/scsi/fcoe/fcoe.o
   `.exit.text' referenced in section `__jump_table' of drivers/scsi/fcoe/fcoe.o: defined in discarded section `.exit.text' of drivers/scsi/fcoe/fcoe.o
   `.exit.text' referenced in section `__jump_table' of drivers/scsi/cxgbi/libcxgbi.o: defined in discarded section `.exit.text' of drivers/scsi/cxgbi/libcxgbi.o
   `.exit.text' referenced in section `__jump_table' of drivers/scsi/cxgbi/libcxgbi.o: defined in discarded section `.exit.text' of drivers/scsi/cxgbi/libcxgbi.o
   `.exit.text' referenced in section `__jump_table' of drivers/target/target_core_configfs.o: defined in discarded section `.exit.text' of drivers/target/target_core_configfs.o
   `.exit.text' referenced in section `__jump_table' of drivers/target/target_core_configfs.o: defined in discarded section `.exit.text' of drivers/target/target_core_configfs.o
   `.exit.text' referenced in section `__jump_table' of drivers/mtd/maps/pcmciamtd.o: defined in discarded section `.exit.text' of drivers/mtd/maps/pcmciamtd.o
   `.exit.text' referenced in section `__jump_table' of drivers/mtd/maps/pcmciamtd.o: defined in discarded section `.exit.text' of drivers/mtd/maps/pcmciamtd.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/purelifi/plfxlc/usb.o: defined in discarded section `.exit.text' of drivers/net/wireless/purelifi/plfxlc/usb.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/purelifi/plfxlc/usb.o: defined in discarded section `.exit.text' of drivers/net/wireless/purelifi/plfxlc/usb.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/zydas/zd1211rw/zd_usb.o: defined in discarded section `.exit.text' of drivers/net/wireless/zydas/zd1211rw/zd_usb.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/zydas/zd1211rw/zd_usb.o: defined in discarded section `.exit.text' of drivers/net/wireless/zydas/zd1211rw/zd_usb.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/ray_cs.o: defined in discarded section `.exit.text' of drivers/net/wireless/ray_cs.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/ray_cs.o: defined in discarded section `.exit.text' of drivers/net/wireless/ray_cs.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/mac80211_hwsim.o: defined in discarded section `.exit.text' of drivers/net/wireless/mac80211_hwsim.o
   `.exit.text' referenced in section `__jump_table' of drivers/net/wireless/mac80211_hwsim.o: defined in discarded section `.exit.text' of drivers/net/wireless/mac80211_hwsim.o
   `.exit.text' referenced in section `__jump_table' of drivers/usb/gadget/legacy/inode.o: defined in discarded section `.exit.text' of drivers/usb/gadget/legacy/inode.o
   `.exit.text' referenced in section `__jump_table' of drivers/usb/gadget/legacy/inode.o: defined in discarded section `.exit.text' of drivers/usb/gadget/legacy/inode.o
   `.exit.text' referenced in section `__jump_table' of drivers/usb/gadget/legacy/g_ffs.o: defined in discarded section `.exit.text' of drivers/usb/gadget/legacy/g_ffs.o
   `.exit.text' referenced in section `__jump_table' of drivers/usb/gadget/legacy/g_ffs.o: defined in discarded section `.exit.text' of drivers/usb/gadget/legacy/g_ffs.o
   `.exit.text' referenced in section `__jump_table' of drivers/media/common/siano/smscoreapi.o: defined in discarded section `.exit.text' of drivers/media/common/siano/smscoreapi.o
   `.exit.text' referenced in section `__jump_table' of drivers/media/common/siano/smscoreapi.o: defined in discarded section `.exit.text' of drivers/media/common/siano/smscoreapi.o
   `.exit.text' referenced in section `__jump_table' of drivers/vme/bridges/vme_fake.o: defined in discarded section `.exit.text' of drivers/vme/bridges/vme_fake.o
   `.exit.text' referenced in section `__jump_table' of drivers/vme/bridges/vme_fake.o: defined in discarded section `.exit.text' of drivers/vme/bridges/vme_fake.o
   `.exit.text' referenced in section `.data' of sound/soc/codecs/tlv320adc3xxx.o: defined in discarded section `.exit.text' of sound/soc/codecs/tlv320adc3xxx.o
   `.exit.text' referenced in section `__jump_table' of net/netfilter/nf_conntrack_h323_main.o: defined in discarded section `.exit.text' of net/netfilter/nf_conntrack_h323_main.o
   `.exit.text' referenced in section `__jump_table' of net/netfilter/nf_conntrack_h323_main.o: defined in discarded section `.exit.text' of net/netfilter/nf_conntrack_h323_main.o
   `.exit.text' referenced in section `__jump_table' of net/netfilter/ipset/ip_set_core.o: defined in discarded section `.exit.text' of net/netfilter/ipset/ip_set_core.o
   `.exit.text' referenced in section `__jump_table' of net/netfilter/ipset/ip_set_core.o: defined in discarded section `.exit.text' of net/netfilter/ipset/ip_set_core.o
   `.exit.text' referenced in section `__jump_table' of net/ceph/ceph_common.o: defined in discarded section `.exit.text' of net/ceph/ceph_common.o
   `.exit.text' referenced in section `__jump_table' of net/ceph/ceph_common.o: defined in discarded section `.exit.text' of net/ceph/ceph_common.o
Boris Burkov June 15, 2022, 9:32 p.m. UTC | #5
On Wed, Jun 15, 2022 at 03:31:30PM +0200, David Sterba wrote:
> On Wed, Jun 15, 2022 at 03:47:51PM +0300, Nikolay Borisov wrote:
> > > +
> > > +	return sysfs_emit(buf,
> > > +		"commits %llu\n"
> > > +		"last_commit_dur %llu ms\n"
> > > +		"max_commit_dur %llu ms\n"
> > > +		"total_commit_dur %llu ms\n",
> > > +		fs_info->commit_stats.commit_counter,
> > > +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
> > > +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
> > > +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
> > > +}
> > > +
> > > +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
> > > +						struct kobj_attribute *a,
> > > +						const char *buf, size_t len)
> > > +{
> > 
> > Is there really value in being able to zero out the current stats?
> 
> I think it makes sense for the max commit time, one might want to track
> that for some workload and then reset. For the other it can go both
> ways, eg. a monitoring tool saves the stats completely and resets them.
> OTOH long term stats would be lost in case there's more than one
> application reading it.

As far as I can see, our options are roughly:
1. separate files per stat, only max file has clear
2. clear only max when clearing the joint file
3. clear everything in the joint file (current patch)
4. clear bitmap to control which fields to clear

1 seems the clearest, but is sort of messy in terms of spamming lots of
files. There can be a "1b" variant which is one file with
count/total/last and a second file with max (rather than one each for all
four). 2 is a bit weird, just due to asymmetry. The "multiple separate
clearers" problem Dave came up with seems  serious for 3: it means if I
want to clear max and you want to clear total, we might make each other
lose data. 4 would work around that, but is an untuitive interface.

One other reason clearing total could be good is if we are counting in
nanoseconds, overflow becomes a non-trivial risk. For this reason, I
think I vote for 1 (separate files), but 2 (only clear max in a single
file) seems like a decent compromise. 4 feels overengineered, but is
kind of a souped-up version of 2.
Nikolay Borisov June 16, 2022, 3:24 p.m. UTC | #6
On 16.06.22 г. 0:32 ч., Boris Burkov wrote:
> On Wed, Jun 15, 2022 at 03:31:30PM +0200, David Sterba wrote:
>> On Wed, Jun 15, 2022 at 03:47:51PM +0300, Nikolay Borisov wrote:
>>>> +
>>>> +	return sysfs_emit(buf,
>>>> +		"commits %llu\n"
>>>> +		"last_commit_dur %llu ms\n"
>>>> +		"max_commit_dur %llu ms\n"
>>>> +		"total_commit_dur %llu ms\n",
>>>> +		fs_info->commit_stats.commit_counter,
>>>> +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
>>>> +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
>>>> +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
>>>> +}
>>>> +
>>>> +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
>>>> +						struct kobj_attribute *a,
>>>> +						const char *buf, size_t len)
>>>> +{
>>>
>>> Is there really value in being able to zero out the current stats?
>>
>> I think it makes sense for the max commit time, one might want to track
>> that for some workload and then reset. For the other it can go both
>> ways, eg. a monitoring tool saves the stats completely and resets them.
>> OTOH long term stats would be lost in case there's more than one
>> application reading it.
> 
> As far as I can see, our options are roughly:
> 1. separate files per stat, only max file has clear
> 2. clear only max when clearing the joint file
> 3. clear everything in the joint file (current patch)
> 4. clear bitmap to control which fields to clear
> 
> 1 seems the clearest, but is sort of messy in terms of spamming lots of
> files. There can be a "1b" variant which is one file with
> count/total/last and a second file with max (rather than one each for all
> four). 2 is a bit weird, just due to asymmetry. The "multiple separate
> clearers" problem Dave came up with seems  serious for 3: it means if I
> want to clear max and you want to clear total, we might make each other
> lose data. 4 would work around that, but is an untuitive interface.
> 
> One other reason clearing total could be good is if we are counting in
> nanoseconds, overflow becomes a non-trivial risk. For this reason, I
> think I vote for 1 (separate files), but 2 (only clear max in a single
> file) seems like a decent compromise. 4 feels overengineered, but is
> kind of a souped-up version of 2.


I don't know why but I like 2, even though when I think more about it it 
indeed introduces somewhat non-trivial asymmetry. But given that sysfs 
interfaces aren't considered ABI we can get it wrong the first time and 
we won't have to bother to support until the end of times.

A different POV would be that those stats would mostly be useful when 
doing measurements of a particular workload and in those cases you can 
reset the stats by remounting the fs, no ? From a monitoring POV I'd 
expect that the most interesting stats would be last_commit_dur as you 
can read it every x seconds, plot it and see how transaction latency 
varies over time. From such stats you can derive a max value, probably 
not THE max, but it should be within the ballpark. Total_commit and 
commit_counter - yeah, I dunno about those.
Boris Burkov June 16, 2022, 4:53 p.m. UTC | #7
On Thu, Jun 16, 2022 at 06:24:51PM +0300, Nikolay Borisov wrote:
> 
> 
> On 16.06.22 г. 0:32 ч., Boris Burkov wrote:
> > On Wed, Jun 15, 2022 at 03:31:30PM +0200, David Sterba wrote:
> > > On Wed, Jun 15, 2022 at 03:47:51PM +0300, Nikolay Borisov wrote:
> > > > > +
> > > > > +	return sysfs_emit(buf,
> > > > > +		"commits %llu\n"
> > > > > +		"last_commit_dur %llu ms\n"
> > > > > +		"max_commit_dur %llu ms\n"
> > > > > +		"total_commit_dur %llu ms\n",
> > > > > +		fs_info->commit_stats.commit_counter,
> > > > > +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
> > > > > +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
> > > > > +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
> > > > > +}
> > > > > +
> > > > > +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
> > > > > +						struct kobj_attribute *a,
> > > > > +						const char *buf, size_t len)
> > > > > +{
> > > > 
> > > > Is there really value in being able to zero out the current stats?
> > > 
> > > I think it makes sense for the max commit time, one might want to track
> > > that for some workload and then reset. For the other it can go both
> > > ways, eg. a monitoring tool saves the stats completely and resets them.
> > > OTOH long term stats would be lost in case there's more than one
> > > application reading it.
> > 
> > As far as I can see, our options are roughly:
> > 1. separate files per stat, only max file has clear
> > 2. clear only max when clearing the joint file
> > 3. clear everything in the joint file (current patch)
> > 4. clear bitmap to control which fields to clear
> > 
> > 1 seems the clearest, but is sort of messy in terms of spamming lots of
> > files. There can be a "1b" variant which is one file with
> > count/total/last and a second file with max (rather than one each for all
> > four). 2 is a bit weird, just due to asymmetry. The "multiple separate
> > clearers" problem Dave came up with seems  serious for 3: it means if I
> > want to clear max and you want to clear total, we might make each other
> > lose data. 4 would work around that, but is an untuitive interface.
> > 
> > One other reason clearing total could be good is if we are counting in
> > nanoseconds, overflow becomes a non-trivial risk. For this reason, I
> > think I vote for 1 (separate files), but 2 (only clear max in a single
> > file) seems like a decent compromise. 4 feels overengineered, but is
> > kind of a souped-up version of 2.
> 
> 
> I don't know why but I like 2, even though when I think more about it it
> indeed introduces somewhat non-trivial asymmetry. But given that sysfs
> interfaces aren't considered ABI we can get it wrong the first time and we
> won't have to bother to support until the end of times.
> 
> A different POV would be that those stats would mostly be useful when doing
> measurements of a particular workload and in those cases you can reset the
> stats by remounting the fs, no ? From a monitoring POV I'd expect that the
> most interesting stats would be last_commit_dur as you can read it every x
> seconds, plot it and see how transaction latency varies over time. From such

My 2c: What's most useful for monitoring are total+count and max.

You have a process periodically read the total/count and get an average
over the collection interval, and it reads/clears the max to track the
max over the collection interval. From there it sends what it has
collected off to some DB to be stored/plotted/aggregated/whatever.

Our collection interval is typically 60s. I imagine if you collected
more frequently, your idea of tracking last duration would work a lot
better than in our setup.

With all that said, I think I agree that 2 is the best interface. I
think it also lets us get rid of the lock, since you no longer care
about racing setting the max with clearing it, since it is always
self-consistent.

> stats you can derive a max value, probably not THE max, but it should be
> within the ballpark. Total_commit and commit_counter - yeah, I dunno about
> those.
Ioannis Angelakopoulos June 20, 2022, 8:37 p.m. UTC | #8
On 6/16/22 9:53 AM, Boris Burkov wrote:
> On Thu, Jun 16, 2022 at 06:24:51PM +0300, Nikolay Borisov wrote:
>>
>>
>> On 16.06.22 г. 0:32 ч., Boris Burkov wrote:
>>> On Wed, Jun 15, 2022 at 03:31:30PM +0200, David Sterba wrote:
>>>> On Wed, Jun 15, 2022 at 03:47:51PM +0300, Nikolay Borisov wrote:
>>>>>> +
>>>>>> +	return sysfs_emit(buf,
>>>>>> +		"commits %llu\n"
>>>>>> +		"last_commit_dur %llu ms\n"
>>>>>> +		"max_commit_dur %llu ms\n"
>>>>>> +		"total_commit_dur %llu ms\n",
>>>>>> +		fs_info->commit_stats.commit_counter,
>>>>>> +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
>>>>>> +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
>>>>>> +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
>>>>>> +}
>>>>>> +
>>>>>> +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
>>>>>> +						struct kobj_attribute *a,
>>>>>> +						const char *buf, size_t len)
>>>>>> +{
>>>>>
>>>>> Is there really value in being able to zero out the current stats?
>>>>
>>>> I think it makes sense for the max commit time, one might want to track
>>>> that for some workload and then reset. For the other it can go both
>>>> ways, eg. a monitoring tool saves the stats completely and resets them.
>>>> OTOH long term stats would be lost in case there's more than one
>>>> application reading it.
>>>
>>> As far as I can see, our options are roughly:
>>> 1. separate files per stat, only max file has clear
>>> 2. clear only max when clearing the joint file
>>> 3. clear everything in the joint file (current patch)
>>> 4. clear bitmap to control which fields to clear
>>>
>>> 1 seems the clearest, but is sort of messy in terms of spamming lots of
>>> files. There can be a "1b" variant which is one file with
>>> count/total/last and a second file with max (rather than one each for all
>>> four). 2 is a bit weird, just due to asymmetry. The "multiple separate
>>> clearers" problem Dave came up with seems  serious for 3: it means if I
>>> want to clear max and you want to clear total, we might make each other
>>> lose data. 4 would work around that, but is an untuitive interface.
>>>
>>> One other reason clearing total could be good is if we are counting in
>>> nanoseconds, overflow becomes a non-trivial risk. For this reason, I
>>> think I vote for 1 (separate files), but 2 (only clear max in a single
>>> file) seems like a decent compromise. 4 feels overengineered, but is
>>> kind of a souped-up version of 2.
>>
>>
>> I don't know why but I like 2, even though when I think more about it it
>> indeed introduces somewhat non-trivial asymmetry. But given that sysfs
>> interfaces aren't considered ABI we can get it wrong the first time and we
>> won't have to bother to support until the end of times.
>>
>> A different POV would be that those stats would mostly be useful when doing
>> measurements of a particular workload and in those cases you can reset the
>> stats by remounting the fs, no ? From a monitoring POV I'd expect that the
>> most interesting stats would be last_commit_dur as you can read it every x
>> seconds, plot it and see how transaction latency varies over time. From such
> 
> My 2c: What's most useful for monitoring are total+count and max.
> 
> You have a process periodically read the total/count and get an average
> over the collection interval, and it reads/clears the max to track the
> max over the collection interval. From there it sends what it has
> collected off to some DB to be stored/plotted/aggregated/whatever.
> 
> Our collection interval is typically 60s. I imagine if you collected
> more frequently, your idea of tracking last duration would work a lot
> better than in our setup.
> 
> With all that said, I think I agree that 2 is the best interface. I
> think it also lets us get rid of the lock, since you no longer care
> about racing setting the max with clearing it, since it is always
> self-consistent.

Just to make sure, do we proceed with option 2? That is, clearing only 
the max and getting rid of the lock?

>> stats you can derive a max value, probably not THE max, but it should be
>> within the ballpark. Total_commit and commit_counter - yeah, I dunno about
>> those.
David Sterba June 21, 2022, 2:45 p.m. UTC | #9
On Mon, Jun 20, 2022 at 08:37:57PM +0000, Ioannis Angelakopoulos wrote:
> On 6/16/22 9:53 AM, Boris Burkov wrote:
> > On Thu, Jun 16, 2022 at 06:24:51PM +0300, Nikolay Borisov wrote:
> >>
> >>
> >> On 16.06.22 г. 0:32 ч., Boris Burkov wrote:
> >>> On Wed, Jun 15, 2022 at 03:31:30PM +0200, David Sterba wrote:
> >>>> On Wed, Jun 15, 2022 at 03:47:51PM +0300, Nikolay Borisov wrote:
> >>>>>> +
> >>>>>> +	return sysfs_emit(buf,
> >>>>>> +		"commits %llu\n"
> >>>>>> +		"last_commit_dur %llu ms\n"
> >>>>>> +		"max_commit_dur %llu ms\n"
> >>>>>> +		"total_commit_dur %llu ms\n",
> >>>>>> +		fs_info->commit_stats.commit_counter,
> >>>>>> +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
> >>>>>> +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
> >>>>>> +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
> >>>>>> +}
> >>>>>> +
> >>>>>> +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
> >>>>>> +						struct kobj_attribute *a,
> >>>>>> +						const char *buf, size_t len)
> >>>>>> +{
> >>>>>
> >>>>> Is there really value in being able to zero out the current stats?
> >>>>
> >>>> I think it makes sense for the max commit time, one might want to track
> >>>> that for some workload and then reset. For the other it can go both
> >>>> ways, eg. a monitoring tool saves the stats completely and resets them.
> >>>> OTOH long term stats would be lost in case there's more than one
> >>>> application reading it.
> >>>
> >>> As far as I can see, our options are roughly:
> >>> 1. separate files per stat, only max file has clear
> >>> 2. clear only max when clearing the joint file
> >>> 3. clear everything in the joint file (current patch)
> >>> 4. clear bitmap to control which fields to clear
> >>>
> >>> 1 seems the clearest, but is sort of messy in terms of spamming lots of
> >>> files. There can be a "1b" variant which is one file with
> >>> count/total/last and a second file with max (rather than one each for all
> >>> four). 2 is a bit weird, just due to asymmetry. The "multiple separate
> >>> clearers" problem Dave came up with seems  serious for 3: it means if I
> >>> want to clear max and you want to clear total, we might make each other
> >>> lose data. 4 would work around that, but is an untuitive interface.
> >>>
> >>> One other reason clearing total could be good is if we are counting in
> >>> nanoseconds, overflow becomes a non-trivial risk. For this reason, I
> >>> think I vote for 1 (separate files), but 2 (only clear max in a single
> >>> file) seems like a decent compromise. 4 feels overengineered, but is
> >>> kind of a souped-up version of 2.
> >>
> >>
> >> I don't know why but I like 2, even though when I think more about it it
> >> indeed introduces somewhat non-trivial asymmetry. But given that sysfs
> >> interfaces aren't considered ABI we can get it wrong the first time and we
> >> won't have to bother to support until the end of times.
> >>
> >> A different POV would be that those stats would mostly be useful when doing
> >> measurements of a particular workload and in those cases you can reset the
> >> stats by remounting the fs, no ? From a monitoring POV I'd expect that the
> >> most interesting stats would be last_commit_dur as you can read it every x
> >> seconds, plot it and see how transaction latency varies over time. From such
> > 
> > My 2c: What's most useful for monitoring are total+count and max.
> > 
> > You have a process periodically read the total/count and get an average
> > over the collection interval, and it reads/clears the max to track the
> > max over the collection interval. From there it sends what it has
> > collected off to some DB to be stored/plotted/aggregated/whatever.
> > 
> > Our collection interval is typically 60s. I imagine if you collected
> > more frequently, your idea of tracking last duration would work a lot
> > better than in our setup.
> > 
> > With all that said, I think I agree that 2 is the best interface. I
> > think it also lets us get rid of the lock, since you no longer care
> > about racing setting the max with clearing it, since it is always
> > self-consistent.
> 
> Just to make sure, do we proceed with option 2? That is, clearing only 
> the max and getting rid of the lock?

I agree that 2 is reasonable. Clearing all values does not seem very
useful as it would lose information, clearing just the maximum resets a
cumulative value and suites long sampling period, while for more fine
granied monitoring the other values can be used.

If there's a use case that can't work with current values and clearning
semantics we can revisit that, but I'd like to hear about first.
David Sterba June 21, 2022, 2:49 p.m. UTC | #10
On Tue, Jun 14, 2022 at 03:22:34PM -0700, Ioannis Angelakopoulos wrote:
> Create a new sysfs entry named "commit_stats" under /sys/fs/btrfs/<UUID>/
> for each mounted BTRFS filesystem. The entry exposes: 1) The number of
> commits so far, 2) The duration of the last commit in ms, 3) The maximum
> commit duration seen so far in ms and 4) The total duration for all commits
> so far in ms.
> 
> The function "btrfs_commit_stats_show" in fs/btrfs/sysfs.c is responsible
> for exposing the stats to user space.
> 
> The function "btrfs_commit_stats_store" in fs/btrfs/sysfs.c is responsible
> for resetting the above values to zero.
> 
> Signed-off-by: Ioannis Angelakopoulos <iangelak@fb.com>
> ---
>  fs/btrfs/sysfs.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index db3736de14a5..54b26aef290b 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -991,6 +991,55 @@ static ssize_t btrfs_sectorsize_show(struct kobject *kobj,
>  
>  BTRFS_ATTR(, sectorsize, btrfs_sectorsize_show);
>  
> +static ssize_t btrfs_commit_stats_show(struct kobject *kobj,
> +				struct kobj_attribute *a, char *buf)
> +{
> +	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
> +
> +	return sysfs_emit(buf,
> +		"commits %llu\n"
> +		"last_commit_dur %llu ms\n"

		last_commit_dur_ms %llu

so the name and units is in one string and "name value" on each line. In
case we'd want to add _ns variants for better precision. Sysfs is often
used as "grep value_name /sys/.../file" so it's future proof.

> +		"max_commit_dur %llu ms\n"
> +		"total_commit_dur %llu ms\n",
> +		fs_info->commit_stats.commit_counter,
> +		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
> +		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
> +		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
> +}
> +
> +static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
> +						struct kobj_attribute *a,
> +						const char *buf, size_t len)
> +{
> +	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
> +	unsigned long val;
> +	int ret;
> +
> +	if (!fs_info)
> +		return -EPERM;
> +
> +	if (!capable(CAP_SYS_RESOURCE))
> +		return -EPERM;
> +
> +	ret = kstrtoul(buf, 10, &val);
> +	if (ret)
> +		return ret;
> +	if (val)
> +		return -EINVAL;
> +
> +	spin_lock(&fs_info->super_lock);
> +	fs_info->commit_stats.commit_counter = 0;
> +	fs_info->commit_stats.last_commit_dur = 0;
> +	fs_info->commit_stats.max_commit_dur = 0;
> +	fs_info->commit_stats.total_commit_dur = 0;
> +	spin_unlock(&fs_info->super_lock);
> +
> +	return len;
> +}

No newline

> +
> +BTRFS_ATTR_RW(, commit_stats, btrfs_commit_stats_show,
> +			  btrfs_commit_stats_store);
> +
>  static ssize_t btrfs_clone_alignment_show(struct kobject *kobj,
>  				struct kobj_attribute *a, char *buf)
>  {
> @@ -1230,6 +1279,7 @@ static const struct attribute *btrfs_attrs[] = {
>  	BTRFS_ATTR_PTR(, generation),
>  	BTRFS_ATTR_PTR(, read_policy),
>  	BTRFS_ATTR_PTR(, bg_reclaim_threshold),
> +	BTRFS_ATTR_PTR(, commit_stats),
>  	NULL,
>  };
>  
> @@ -2236,4 +2286,3 @@ void __cold btrfs_exit_sysfs(void)
>  #endif
>  	kset_unregister(btrfs_kset);
>  }
> -
> -- 
> 2.30.2
>
diff mbox series

Patch

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index db3736de14a5..54b26aef290b 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -991,6 +991,55 @@  static ssize_t btrfs_sectorsize_show(struct kobject *kobj,
 
 BTRFS_ATTR(, sectorsize, btrfs_sectorsize_show);
 
+static ssize_t btrfs_commit_stats_show(struct kobject *kobj,
+				struct kobj_attribute *a, char *buf)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+
+	return sysfs_emit(buf,
+		"commits %llu\n"
+		"last_commit_dur %llu ms\n"
+		"max_commit_dur %llu ms\n"
+		"total_commit_dur %llu ms\n",
+		fs_info->commit_stats.commit_counter,
+		fs_info->commit_stats.last_commit_dur / NSEC_PER_MSEC,
+		fs_info->commit_stats.max_commit_dur / NSEC_PER_MSEC,
+		fs_info->commit_stats.total_commit_dur / NSEC_PER_MSEC);
+}
+
+static ssize_t btrfs_commit_stats_store(struct kobject *kobj,
+						struct kobj_attribute *a,
+						const char *buf, size_t len)
+{
+	struct btrfs_fs_info *fs_info = to_fs_info(kobj);
+	unsigned long val;
+	int ret;
+
+	if (!fs_info)
+		return -EPERM;
+
+	if (!capable(CAP_SYS_RESOURCE))
+		return -EPERM;
+
+	ret = kstrtoul(buf, 10, &val);
+	if (ret)
+		return ret;
+	if (val)
+		return -EINVAL;
+
+	spin_lock(&fs_info->super_lock);
+	fs_info->commit_stats.commit_counter = 0;
+	fs_info->commit_stats.last_commit_dur = 0;
+	fs_info->commit_stats.max_commit_dur = 0;
+	fs_info->commit_stats.total_commit_dur = 0;
+	spin_unlock(&fs_info->super_lock);
+
+	return len;
+}
+
+BTRFS_ATTR_RW(, commit_stats, btrfs_commit_stats_show,
+			  btrfs_commit_stats_store);
+
 static ssize_t btrfs_clone_alignment_show(struct kobject *kobj,
 				struct kobj_attribute *a, char *buf)
 {
@@ -1230,6 +1279,7 @@  static const struct attribute *btrfs_attrs[] = {
 	BTRFS_ATTR_PTR(, generation),
 	BTRFS_ATTR_PTR(, read_policy),
 	BTRFS_ATTR_PTR(, bg_reclaim_threshold),
+	BTRFS_ATTR_PTR(, commit_stats),
 	NULL,
 };
 
@@ -2236,4 +2286,3 @@  void __cold btrfs_exit_sysfs(void)
 #endif
 	kset_unregister(btrfs_kset);
 }
-