Message ID | 20190702132918.114818-1-houtao1@huawei.com (mailing list archive) |
---|---|
Series | md: export internal stats through debugfs |
On Tue, Jul 2, 2019 at 6:25 AM Hou Tao <houtao1@huawei.com> wrote:
>
> Hi,
>
> There are so many io counters, stats and flags in md that I think
> exporting this info to userspace will be helpful for online debugging,
> especially when the vmlinux file and the crash utility are not
> available. This info can also be used when studying the code.
>
> MD already exports some stats through sysfs files under
> /sys/block/mdX/md, but using sysfs files to export more internal
> stats is not a good choice, because sysfs convention requires a
> separate file for each internal stat and there are too many internal
> stats. Further, the newly created sysfs files would become APIs for
> userspace tools, which is not what we want, because these files
> expose internal stats and internal stats may change from time to time.
>
> I think debugfs is a better choice, because we can show multiple
> related stats in one debugfs file, and the debugfs file will never be
> used as a userspace API.
>
> Two debugfs files are created to expose these internal stats:
> * iostat: io counters and io related stats (e.g., mddev->active_io,
>   r1conf->nr_pending, or r1conf->retry_list)
> * stat: normal stats/flags (e.g., mddev->recovery, conf->array_frozen)
>
> Because internal stats are spread all over md-core and md-personality,
> both md-core and md-personality create these two debugfs files,
> each under its own debugfs directory.
>
> Patch 1 factors out the debugfs file creation routine for md-core and
> md-personality, patch 2 creates two debugfs files, iostat & stat, under
> /sys/kernel/debug/block/mdX for md-core, and patch 3 creates two debugfs
> files, iostat & stat, under /sys/kernel/debug/block/mdX/raid1 for md-raid1.
>
> The following lines show the hierarchy and the content of these debugfs
> files for a RAID1 device:
>
> $ pwd
> /sys/kernel/debug/block/md0
> $ tree
> .
> ├── iostat
> ├── raid1
> │   ├── iostat
> │   └── stat
> └── stat
>
> $ cat iostat
> active_io 0
> sb_wait 0 pending_writes 0
> recovery_active 0
> bitmap pending_writes 0
>
> $ cat stat
> flags 0x20
> sb_flags 0x0
> recovery 0x0
>
> $ cat raid1/iostat
> retry_list active 0
> bio_end_io_list active 0
> pending_bio_list active 0 cnt 0
> sync_pending 0
> nr_pending 0
> nr_waiting 0
> nr_queued 0
> barrier 0

Hi,

Sorry for the late reply.

I think this information is really debug information that we should not
show in /sys. Once we expose it in /sys, we need to support it,
because some user space tools may start reading data from it.

Thanks,
Song
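The `stat` file in the quoted output prints bitmask fields in hex (`flags 0x20`, `sb_flags 0x0`, `recovery 0x0`). As a minimal sketch, assuming only that each line has the form `name 0x...`, the set bit positions can be recovered as below. The helper names `set_bits` and `parse_flag_line` are hypothetical, and mapping a bit position to a flag name depends on the kernel's enum definitions, which are not shown in this thread.

```python
from typing import List, Tuple


def set_bits(value: int) -> List[int]:
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def parse_flag_line(line: str) -> Tuple[str, List[int]]:
    """Split a 'name 0x..' line into the name and its set bit positions."""
    name, raw = line.split()
    # int() with base 16 accepts the leading "0x" prefix.
    return name, set_bits(int(raw, 16))


if __name__ == "__main__":
    # Lines copied from the `cat stat` output quoted in the cover letter.
    for line in ("flags 0x20", "sb_flags 0x0", "recovery 0x0"):
        print(parse_flag_line(line))
```

For `flags 0x20` this reports bit 5 as the only set bit; which flag that bit denotes has to be looked up in the md headers of the kernel that produced the output.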
On 7/2/19 9:29 PM, Hou Tao wrote:
> Hi,
>
> There are so many io counters, stats and flags in md that I think
> exporting this info to userspace will be helpful for online debugging,

For online debugging, I'd suggest trying eBPF, which can be very helpful.

-Bob

> especially when the vmlinux file and the crash utility are not
> available. This info can also be used when studying the code.
>
> MD already exports some stats through sysfs files under
> /sys/block/mdX/md, but using sysfs files to export more internal
> stats is not a good choice, because sysfs convention requires a
> separate file for each internal stat and there are too many internal
> stats. Further, the newly created sysfs files would become APIs for
> userspace tools, which is not what we want, because these files
> expose internal stats and internal stats may change from time to time.
>
> I think debugfs is a better choice, because we can show multiple
> related stats in one debugfs file, and the debugfs file will never be
> used as a userspace API.
>
> Two debugfs files are created to expose these internal stats:
> * iostat: io counters and io related stats (e.g., mddev->active_io,
>   r1conf->nr_pending, or r1conf->retry_list)
> * stat: normal stats/flags (e.g., mddev->recovery, conf->array_frozen)
>
> Because internal stats are spread all over md-core and md-personality,
> both md-core and md-personality create these two debugfs files,
> each under its own debugfs directory.
>
> Patch 1 factors out the debugfs file creation routine for md-core and
> md-personality, patch 2 creates two debugfs files, iostat & stat, under
> /sys/kernel/debug/block/mdX for md-core, and patch 3 creates two debugfs
> files, iostat & stat, under /sys/kernel/debug/block/mdX/raid1 for md-raid1.
>
> The following lines show the hierarchy and the content of these debugfs
> files for a RAID1 device:
>
> $ pwd
> /sys/kernel/debug/block/md0
> $ tree
> .
> ├── iostat
> ├── raid1
> │   ├── iostat
> │   └── stat
> └── stat
>
> $ cat iostat
> active_io 0
> sb_wait 0 pending_writes 0
> recovery_active 0
> bitmap pending_writes 0
>
> $ cat stat
> flags 0x20
> sb_flags 0x0
> recovery 0x0
>
> $ cat raid1/iostat
> retry_list active 0
> bio_end_io_list active 0
> pending_bio_list active 0 cnt 0
> sync_pending 0
> nr_pending 0
> nr_waiting 0
> nr_queued 0
> barrier 0
>
> $ cat raid1/stat
> array_frozen 0
>
> I'm not sure whether the division of internal stats is appropriate and
> whether the internal stats in the debugfs files are sufficient, so
> questions and comments are welcome.
>
> Regards,
> Tao
>
> Hou Tao (3):
>   md-debugfs: add md_debugfs_create_files()
>   md: export inflight io counters and internal stats in debugfs
>   raid1: export inflight io counters and internal stats in debugfs
>
>  drivers/md/Makefile     |  2 +-
>  drivers/md/md-debugfs.c | 35 ++++++++++++++++++
>  drivers/md/md-debugfs.h | 16 +++++++++
>  drivers/md/md.c         | 65 ++++++++++++++++++++++++++++++++++
>  drivers/md/md.h         |  1 +
>  drivers/md/raid1.c      | 78 +++++++++++++++++++++++++++++++++++++++++
>  drivers/md/raid1.h      |  1 +
>  7 files changed, 197 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/md/md-debugfs.c
>  create mode 100644 drivers/md/md-debugfs.h
>
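The sample output in the cover letter mixes three line shapes: `key value`, `key value key value`, and `prefix key value ...` (for example `bitmap pending_writes 0`). A debugging script could turn such text into a dictionary roughly as follows. This is only a sketch: the rule "an odd token count means the first token is a prefix" is an assumption inferred from the sample output, not something the patches document, and `parse_stats` is a hypothetical helper name.

```python
from typing import Dict


def parse_stats(text: str) -> Dict[str, int]:
    """Parse 'key value' style lines into a flat name -> integer mapping."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        prefix = ""
        # Assumption: an odd number of tokens means a leading group name,
        # as in "retry_list active 0" or "pending_bio_list active 0 cnt 0".
        if len(tokens) % 2 == 1:
            prefix = tokens[0] + "."
            tokens = tokens[1:]
        for key, raw in zip(tokens[0::2], tokens[1::2]):
            # Base 0 accepts both decimal ("0") and hex ("0x20") values.
            stats[prefix + key] = int(raw, 0)
    return stats


if __name__ == "__main__":
    sample = (
        "active_io 0\n"
        "sb_wait 0 pending_writes 0\n"
        "recovery_active 0\n"
        "bitmap pending_writes 0\n"
    )
    print(parse_stats(sample))
```

Prefixing grouped counters (`bitmap.pending_writes`) keeps them distinct from top-level counters with the same name, which matters here because `pending_writes` appears twice in the md-core `iostat` sample.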
Hi,

On 2019/7/23 5:31, Song Liu wrote:
> On Tue, Jul 2, 2019 at 6:25 AM Hou Tao <houtao1@huawei.com> wrote:
>>
>> Hi,
>>
>> [...]
>
> Hi,
>
> Sorry for the late reply.
>
> I think this information is really debug information that we should not
> show in /sys. Once we expose it in /sys, we need to support it,
> because some user space tools may start reading data from it.

That is why debugfs, rather than sysfs, is used for this debug information.
It's OK for user-space tools to read data from these files as long as
they don't expect the information to be stable. The most likely users of
these files are test programs, and if some user-space tool truly needs
stable information from a debugfs file, maybe we should move that
information from debugfs to a sysfs file.

Regards,
Tao

>
> Thanks,
> Song
>
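The condition stated in this reply, that tools may read the debugfs files as long as they do not expect them to be stable, suggests a reader that treats every file and every counter as optional. The following is a minimal sketch under that assumption; `read_counter` is a hypothetical helper, and it only handles plain two-token `key value` lines such as `nr_pending 0`.

```python
import os
import tempfile
from typing import Optional


def read_counter(path: str, key: str) -> Optional[int]:
    """Return the value on a 'key value' line, or None if unavailable."""
    try:
        with open(path) as stats_file:
            for line in stats_file:
                tokens = line.split()
                if len(tokens) == 2 and tokens[0] == key:
                    return int(tokens[1], 0)
    except (OSError, ValueError):
        # Missing file, no permission, or a value format that changed.
        return None
    # The file exists but no longer carries this counter.
    return None


if __name__ == "__main__":
    # Demo against a temporary file standing in for a debugfs entry.
    with tempfile.TemporaryDirectory() as tmp_dir:
        fake_iostat = os.path.join(tmp_dir, "iostat")
        with open(fake_iostat, "w") as out:
            out.write("nr_pending 3\nbarrier 0\n")
        print(read_counter(fake_iostat, "nr_pending"))
        print(read_counter(fake_iostat, "removed_counter"))
```

Returning `None` instead of raising lets a test program skip a check when a counter has been renamed or dropped, rather than failing on a kernel that simply lays out its debug output differently.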
Hi,

On 2019/7/23 7:30, Bob Liu wrote:
> On 7/2/19 9:29 PM, Hou Tao wrote:
>> Hi,
>>
>> There are so many io counters, stats and flags in md that I think
>> exporting this info to userspace will be helpful for online debugging,
>
> For online debugging, I'd suggest trying eBPF, which can be very helpful.
>

Thanks for your suggestion. Using an eBPF program to read this internal
state from a host that is fully under your control is a good choice, but
when the dependencies of the eBPF program (e.g., the minor version of the
kernel, and the kernel configuration, which influences the struct layout)
are out of your control, it's not a good choice.

Thanks,
Tao

> -Bob
>
>> [...]
>
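The struct-layout dependency mentioned in this reply can be illustrated in userspace with `ctypes`. The two structures below are hypothetical stand-ins, not the real `r1conf`: the second one has an extra member, as a configuration option might add, and that shifts the offset of `nr_pending`. A reader that hard-codes the offset from the first layout then picks up the wrong field. This is a sketch of the general problem only and makes no claim about how any particular eBPF toolchain resolves offsets.

```python
import ctypes
import sys


class ConfWithoutDebug(ctypes.Structure):
    """Hypothetical layout built without an optional debug member."""
    _fields_ = [("barrier", ctypes.c_int), ("nr_pending", ctypes.c_int)]


class ConfWithDebug(ctypes.Structure):
    """Same hypothetical struct with one optional member inserted."""
    _fields_ = [
        ("barrier", ctypes.c_int),
        ("debug_cookie", ctypes.c_int),
        ("nr_pending", ctypes.c_int),
    ]


def field_offset(struct_type, field_name: str) -> int:
    """Byte offset of a field inside a ctypes structure type."""
    return getattr(struct_type, field_name).offset


def read_int_at(raw: bytes, offset: int) -> int:
    """Read a native-endian C int from raw memory at a fixed offset."""
    size = ctypes.sizeof(ctypes.c_int)
    return int.from_bytes(raw[offset:offset + size], sys.byteorder, signed=True)


if __name__ == "__main__":
    stale_offset = field_offset(ConfWithoutDebug, "nr_pending")
    conf = ConfWithDebug(barrier=1, debug_cookie=99, nr_pending=7)
    raw = bytes(conf)
    # The stale offset lands on debug_cookie, not on nr_pending.
    print(read_int_at(raw, stale_offset))
    print(read_int_at(raw, field_offset(ConfWithDebug, "nr_pending")))
```

A text file produced by the kernel itself avoids this class of mismatch, because the code that formats the output is compiled against the same layout it reads.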
On Fri, Jul 26, 2019 at 10:48 PM Hou Tao <houtao1@huawei.com> wrote:
>
> Hi,
>
[...]
> >
> > Hi,
> >
> > Sorry for the late reply.
> >
> > I think this information is really debug information that we should not
> > show in /sys. Once we expose it in /sys, we need to support it,
> > because some user space tools may start reading data from it.
> That is why debugfs, rather than sysfs, is used for this debug information.

I don't think we should dump random information into debugfs. It is common
for developers to carry local patches that dump information for
debugging. We cannot get these patches upstream.

Thanks,
Song