
[RFC] btrfs: introduce procfs interface for the device list

Message ID 1411967340-23802-1-git-send-email-anand.jain@oracle.com (mailing list archive)
State New, archived

Commit Message

Anand Jain Sept. 29, 2014, 5:09 a.m. UTC
From: Anand Jain <Anand.Jain@oracle.com>

(added RFC prefix to the patch header)
(as of now just an experimental interface)

This patch introduces a procfs interface, /proc/fs/btrfs/devlist,
which as of now exports all the members of the kernel's fs_devices.

The current /sys/fs/btrfs interface works only when the fs is
mounted, is laid out as a file/directory hierarchy, and also has
the sysfs limitation of a maximum output of a U64 per file.

Here the btrfs procfs code uses seq_file to export all the members
of fs_devices. It also shows the contents when a device is not
mounted but has registered with btrfs in the kernel (useful as an
alternative to the buggy ready ioctl).

An attempt is made to follow a standard file format for the
output, such as ini, so that a simple Python wrapper script can
provide useful interfaces to the end user.

Further, I am planning to add a few more members to the interface,
such as group profile info. The long-term idea is to make the procfs
interface a one-stop btrfs application interface for device and
fs info from the kernel, which a simple Python script can make
use of.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/Makefile  |  2 +-
 fs/btrfs/ctree.h   |  4 +++
 fs/btrfs/procfs.c  | 45 ++++++++++++++++++++++++++
 fs/btrfs/super.c   |  4 +++
 fs/btrfs/volumes.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  1 +
 6 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/procfs.c
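
The commit message above describes an ini-like output meant to be consumed by a small Python script. As a rough sketch of what such a consumer could look like, the parser below handles the section and key/value shapes that the format strings in btrfs_print_devlist() appear to produce. The sample text is hand-built from those format strings rather than captured from a real system, and the UUIDs, device path, and the names parse_devlist and SAMPLE are made up for illustration.

```python
import re

# Hypothetical sample, hand-built from the format strings in
# btrfs_print_devlist(); the UUIDs and the device path are made up.
SAMPLE = """
#Its Experimental, parameters may change without notice.

[fsid: 11111111-2222-3333-4444-555555555555]
\tnum_devices:\t\t1
\topen_devices:\t\t1
\topened:\t\t\t1
\t[[uuid: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee]]
\t\tdevice:\t\t/dev/sdb
\t\tdevid:\t\t1
\t\tmissing:\t0
"""

# Matches "[fsid: X]", "[[seed_fsid: X]]" and "[[uuid: X]]".
SECTION = re.compile(r"^(\[{1,2})(\w+): (\S+?)\]{1,2}$")


def parse_devlist(text):
    """Return a list of filesystems, each with 'fields' and 'devices'."""
    filesystems = []
    target = None  # dict that receives the following key/value pairs
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the leading comment
        m = SECTION.match(line)
        if m:
            kind, ident = m.group(2), m.group(3)
            if kind in ("fsid", "seed_fsid"):
                fs = {"kind": kind, "id": ident, "fields": {}, "devices": []}
                filesystems.append(fs)
                target = fs["fields"]
            elif kind == "uuid" and filesystems:
                dev = {"uuid": ident}
                filesystems[-1]["devices"].append(dev)
                target = dev
            continue
        # Plain "key:<tabs>value" line; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep and target is not None:
            target[key.strip()] = value.strip()
    return filesystems
```

Calling parse_devlist(SAMPLE) yields one filesystem entry whose "devices" list holds a single dict for /dev/sdb. All values are kept as strings; converting counters to integers is left to the caller.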

Comments

Chris Mason Sept. 30, 2014, 2:23 p.m. UTC | #1
On Mon, Sep 29, 2014 at 1:09 AM, Anand Jain <anand.jain@oracle.com> 
wrote:
> From: Anand Jain <Anand.Jain@oracle.com>
> 
> (added RFC prefix to the patch header)
> (as of now just an experimental interface)
> 
> This patch introduces a procfs interface, /proc/fs/btrfs/devlist,
> which as of now exports all the members of the kernel's fs_devices.
> 
> The current /sys/fs/btrfs interface works only when the fs is
> mounted, is laid out as a file/directory hierarchy, and also has
> the sysfs limitation of a maximum output of a U64 per file.
> 
> Here the btrfs procfs code uses seq_file to export all the members
> of fs_devices. It also shows the contents when a device is not
> mounted but has registered with btrfs in the kernel (useful as an
> alternative to the buggy ready ioctl).
> 
> An attempt is made to follow a standard file format for the
> output, such as ini, so that a simple Python wrapper script can
> provide useful interfaces to the end user.
> 
> Further, I am planning to add a few more members to the interface,
> such as group profile info. The long-term idea is to make the procfs
> interface a one-stop btrfs application interface for device and
> fs info from the kernel, which a simple Python script can make
> use of.

Hi Anand,

We're going to have a really hard time getting a new proc interface 
merged in, and after we've recently fixed up all (most?) of our sysfs 
races, I'd rather not have to do it all over again with /proc. I know 
the lack of a seq interface is a difficult compromise to make in sysfs, 
but at this point I think we're stuck with it.  Which specific part do 
you hope to improve by dumping more information out in a single file?

-chris



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anand Jain Oct. 1, 2014, 7:41 a.m. UTC | #2
Hi Chris,

  Thanks for commenting. Some clarifications are below.


On 30/09/2014 22:23, Chris Mason wrote:
> Hi Anand,
>
> We're going to have a really hard time getting a new proc interface
> merged in, and after we've recently fixed up all (most?) of our sysfs
> races, I'd rather not have to do it all over again with /proc.

  This does not use an fsid/devid based file-directory layout, so the
  races that were in the sysfs implementation do not apply here. (But
  there is an opportunity to optimize the code at the place marked
  as a todo in the code.)

> I know
> the lack of a seq interface is a difficult compromise to make in sysfs,
> but at this point I think we're stuck with it.  Which specific part do
> you hope to improve by dumping more information out in a single file?

  Since it's a single file that dumps most of the members of fs_devices,
  we can ensure the interface will remain unchanged for a long time,
  and it helps debugging. This is hard to do when we lay out one file
  per parameter value.

  Less clutter. But it needs a Python script abstraction to provide
  what the user wants. Better than using ioctls.

  A file-per-parameter layout might introduce races. So here there is
  no file-per-parameter layout, it's just one file,
  /proc/fs/btrfs/devlist, which provides an interface compatible with
  parsers such as Python's configparser, with which an application can
  organize it using a simple script.


Further,
  This also exports all registered devices, which may not be mounted.
  (The sysfs implementation does not.)

  We need an alternative to btrfs-progs check_mounted(). check_mounted()
  is too heavy, as it has to scan all devices when num_devices > 1.
  This interface can help make check_mounted() lightweight.

  This provides an alternative to the following ioctls and thus avoids
  their bugs mentioned below.
     BTRFS_IOC_DEVICES_READY
     This ioctl should have been read-only, but it can update the
     device path.

     BTRFS_IOC_FS_INFO and BTRFS_IOC_DEV_INFO conflict on the slots
     when a seed disk is used.

  It could provide RAID volume status information, mainly for
  enterprise users.
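
To illustrate the lightweight check_mounted() idea in the points above, here is a hedged sketch of a single-pass lookup over the proposed devlist text. It assumes that a non-zero "opened" value in a section is a usable approximation of "mounted", and that "opened" is printed before the per-device entries, as in the patch. The function name device_is_opened, the sample text, the UUID-like strings, and the device paths are all hypothetical.

```python
# Hypothetical two-filesystem sample in the proposed devlist layout;
# identifiers and device paths are made up for illustration.
SAMPLE = """
[fsid: fsid-one]
\topened:\t\t\t1
\t[[uuid: dev-one]]
\t\tdevice:\t\t/dev/sdb
[fsid: fsid-two]
\topened:\t\t\t0
\t[[uuid: dev-two]]
\t\tdevice:\t\t/dev/sdc
"""


def device_is_opened(devlist_text, dev_path):
    """Return True if dev_path sits in a section whose 'opened' is non-zero.

    Assumption: non-zero 'opened' approximates 'mounted'.
    """
    opened = False
    for raw in devlist_text.splitlines():
        key, _, value = raw.strip().partition(":")
        value = value.strip()
        if key in ("[fsid", "[[seed_fsid"):
            opened = False  # new section: forget the previous state
        elif key == "opened":
            opened = value != "0"
        elif key == "device" and value == dev_path:
            return opened
    return False  # device not registered at all
```

With the sample above, /dev/sdb is reported as opened and /dev/sdc is not. A device that does not appear in the list at all is also reported as not opened, without scanning any block devices.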

Thanks, Anand



> -chris
>
>
>
Chris Mason Oct. 1, 2014, 2:09 p.m. UTC | #3
On Wed, Oct 1, 2014 at 3:41 AM, Anand Jain <Anand.Jain@oracle.com> 
wrote:
> 
> Hi Chris,
> 
>  Thanks for commenting. Some clarifying comments as below.
> 
> 
> On 30/09/2014 22:23, Chris Mason wrote:
>> Hi Anand,
>> 
>> We're going to have a really hard time getting a new proc interface
>> merged in, and after we've recently fixed up all (most?) of our sysfs
>> races, I'd rather not have to do it all over again with /proc.
> 
>  This does not use an fsid/devid based file-directory layout, so the
>  races that were in the sysfs implementation do not apply here. (But
>  there is an opportunity to optimize the code at the place marked
>  as a todo in the code.)

Right, proc has different races ;)  Again the bar for new interfaces in 
proc is really very high.  It's not the direction the rest of the 
kernel is using.

> 
> 
>> I know
>> the lack of a seq interface is a difficult compromise to make in sysfs,
>> but at this point I think we're stuck with it.  Which specific part do
>> you hope to improve by dumping more information out in a single file?
> 
>  Since it's a single file that dumps most of the members of fs_devices,
>  we can ensure the interface will remain unchanged for a long time,
>  and it helps debugging. This is hard to do when we lay out one file
>  per parameter value.
> 
>  Less clutter. But it needs a Python script abstraction to provide
>  what the user wants. Better than using ioctls.
> 
>  A file-per-parameter layout might introduce races. So here there is
>  no file-per-parameter layout, it's just one file,
>  /proc/fs/btrfs/devlist, which provides an interface compatible with
>  parsers such as Python's configparser, with which an application can
>  organize it using a simple script.
> 
> 
> Further,
>  This also exports all registered devices, which may not be mounted.
>  (The sysfs implementation does not.)

For these features, we need to work within the sysfs and udev 
frameworks.  It will integrate better with the direction the distros 
are using for management in general.  I really understand that in some 
ways the proc interface would be easier to write and easier to use, but 
this is one of those times that consistency with the rest of the kernel 
comes first.

Thanks again for the time you've spent improving the device management 
side of things.  For now, sysfs and udev are the best choices overall.

-chris



Duncan Oct. 1, 2014, 11:09 p.m. UTC | #4
Chris Mason posted on Wed, 01 Oct 2014 10:09:12 -0400 as excerpted:

>>> We're going to have a really hard time getting a new proc interface
>>> merged in, and after we've recently fixed up all (most?) of our sysfs
>>> races, I'd rather not have to do it all over again with /proc.
>> 
>>  This does not use an fsid/devid based file-directory layout, so the
>>  races that were in the sysfs implementation do not apply here. (But
>>  there is an opportunity to optimize the code at the place marked
>>  as a todo in the code.)
> 
> Right, proc has different races ;)  Again the bar for new interfaces in
> proc is really very high.  It's not the direction the rest of the kernel
> is using.

Put differently...

Proc has fallen out of favor as an early experiment in virtual filesystem 
kernel interfaces that ran amok due to lack of governing rules at the 
time and is effectively legacy/deprecated.  From this viewpoint the most 
simple explanation for its continued existence is Linus' "prime 
directive" that you don't break userspace -- being the primary/only 
kernel/userspace virtual filesystem interface for quite some time, 
there's a *LOT* of stuff that depends on proc, and despite what many 
might want, it's not going to disappear overnight.

That's the kind of resistance you're looking at to get something new in 
proc.  Basically, it's not going to happen.

So as Chris recommends, go tilt at a different windmill.  Getting this 
one to move is going to require moving heaven and hell both, and it's 
just not worth it.

Sys does have stricter rules, but they are there for a reason, to ensure 
the mistakes that were made with proc don't get made with sys as well.  
That's the accepted place to put stuff that might have, in an earlier 
time, gone into proc, but now following the rules for sys, of course.
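
For contrast with the single-file proposal, the sysfs style that the replies above recommend exposes one small value per file, so a consumer walks a directory instead of parsing one larger document. The sketch below shows that access pattern only. It runs against a temporary directory standing in for a sysfs directory, and the attribute names, read_attr_dir, and demo are hypothetical rather than the real /sys/fs/btrfs layout.

```python
import os
import tempfile


def read_attr_dir(path):
    """Read a directory of one-value-per-file attributes into a dict."""
    attrs = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full) as f:
                attrs[name] = f.read().strip()
    return attrs


def demo():
    # Stand-in for a sysfs directory; the attribute names are hypothetical.
    with tempfile.TemporaryDirectory() as d:
        for name, value in (("num_devices", "2"), ("missing_devices", "0")):
            with open(os.path.join(d, name), "w") as f:
                f.write(value + "\n")
        return read_attr_dir(d)
```

Each attribute is read with a separate open(), so values read in one pass are not guaranteed to be mutually consistent. That is part of the trade-off against a single seq_file dump discussed in the thread.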

Patch

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 6d1d0b9..134a62f 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -4,7 +4,7 @@  obj-$(CONFIG_BTRFS_FS) := btrfs.o
 btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   file-item.o inode-item.o inode-map.o disk-io.o \
 	   transaction.o inode.o file.o tree-defrag.o \
-	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
+	   extent_map.o sysfs.o procfs.o struct-funcs.o xattr.o ordered-data.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 6db3d4b..600adc30 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3909,6 +3909,10 @@  ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 int btrfs_parse_options(struct btrfs_root *root, char *options);
 int btrfs_sync_fs(struct super_block *sb, int wait);
 
+/* procfs.c */
+void btrfs_exit_procfs(void);
+void btrfs_init_procfs(void);
+
 #ifdef CONFIG_PRINTK
 __printf(2, 3)
 void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...);
diff --git a/fs/btrfs/procfs.c b/fs/btrfs/procfs.c
new file mode 100644
index 0000000..9c94c6c
--- /dev/null
+++ b/fs/btrfs/procfs.c
@@ -0,0 +1,45 @@ 
+#include <linux/seq_file.h>
+#include <linux/vmalloc.h>
+#include <linux/proc_fs.h>
+#include "ctree.h"
+#include "volumes.h"
+
+#define BTRFS_PROC_PATH		"fs/btrfs"
+#define BTRFS_PROC_DEVLIST	"devlist"
+
+struct proc_dir_entry	*btrfs_proc_root;
+
+static int btrfs_devlist_show(struct seq_file *seq, void *offset)
+{
+	btrfs_print_devlist(seq);
+	return 0;
+}
+
+static int btrfs_seq_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, btrfs_devlist_show, PDE_DATA(inode));
+}
+
+static const struct file_operations btrfs_seq_fops = {
+	.owner   = THIS_MODULE,
+	.open    = btrfs_seq_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = single_release,
+};
+
+void btrfs_init_procfs(void)
+{
+	btrfs_proc_root = proc_mkdir(BTRFS_PROC_PATH, NULL);
+	if (btrfs_proc_root)
+		proc_create_data(BTRFS_PROC_DEVLIST, S_IRUGO, btrfs_proc_root,
+					&btrfs_seq_fops, NULL);
+	return;
+}
+
+void btrfs_exit_procfs(void)
+{
+	if (btrfs_proc_root)
+		remove_proc_entry(BTRFS_PROC_DEVLIST, btrfs_proc_root);
+	remove_proc_entry(BTRFS_PROC_PATH, NULL);
+}
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d0f44d9..f545c24 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1976,6 +1976,8 @@  static int __init init_btrfs_fs(void)
 	if (err)
 		goto free_hash;
 
+	btrfs_init_procfs();
+
 	btrfs_init_compress();
 
 	err = btrfs_init_cachep();
@@ -2048,6 +2050,7 @@  free_cachep:
 	btrfs_destroy_cachep();
 free_compress:
 	btrfs_exit_compress();
+	btrfs_exit_procfs();
 	btrfs_exit_sysfs();
 free_hash:
 	btrfs_hash_exit();
@@ -2066,6 +2069,7 @@  static void __exit exit_btrfs_fs(void)
 	extent_io_exit();
 	btrfs_interface_exit();
 	unregister_filesystem(&btrfs_fs_type);
+	btrfs_exit_procfs();
 	btrfs_exit_sysfs();
 	btrfs_cleanup_fs_uuids();
 	btrfs_exit_compress();
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3e803c1..a526722 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -27,6 +27,7 @@ 
 #include <linux/kthread.h>
 #include <linux/raid/pq.h>
 #include <linux/semaphore.h>
+#include <linux/seq_file.h>
 #include <asm/div64.h>
 #include "ctree.h"
 #include "extent_map.h"
@@ -6615,3 +6616,96 @@  void btrfs_update_commit_device_bytes_used(struct btrfs_root *root,
 	}
 	unlock_chunks(root);
 }
+
+
+void btrfs_print_devlist(struct seq_file *seq)
+{
+
+	/* Btrfs Procfs String Len */
+#define BPSL	256
+#define BTRFS_SEQ_PRINT(plist, arg)\
+		snprintf(str, BPSL, plist, arg);\
+		if (sprt)\
+			seq_printf(seq, "\t");\
+		seq_printf(seq, "%s", str)
+
+	char str[BPSL];
+	struct btrfs_device *device;
+	struct btrfs_fs_devices *fs_devices;
+	struct btrfs_fs_devices *cur_fs_devices;
+	struct btrfs_fs_devices *sprt; /* sprout fs devices */
+
+	seq_printf(seq, "\n#Its Experimental, parameters may change without notice.\n\n");
+
+	mutex_lock(&uuid_mutex);
+	/* Todo: there must be better way than nested locks */
+	list_for_each_entry(cur_fs_devices, &fs_uuids, list) {
+
+		mutex_lock(&cur_fs_devices->device_list_mutex);
+
+		fs_devices = cur_fs_devices;
+		sprt = NULL;
+
+again_fs_devs:
+		if (sprt) {
+			BTRFS_SEQ_PRINT("[[seed_fsid: %pU]]\n", fs_devices->fsid);
+			BTRFS_SEQ_PRINT("\tsprout_fsid:\t\t%pU\n", sprt->fsid);
+		} else {
+			BTRFS_SEQ_PRINT("[fsid: %pU]\n", fs_devices->fsid);
+		}
+		if (fs_devices->seed) {
+			BTRFS_SEQ_PRINT("\tseed_fsid:\t\t%pU\n", fs_devices->seed->fsid);
+		}
+		BTRFS_SEQ_PRINT("\tnum_devices:\t\t%llu\n", fs_devices->num_devices);
+		BTRFS_SEQ_PRINT("\topen_devices:\t\t%llu\n", fs_devices->open_devices);
+		BTRFS_SEQ_PRINT("\trw_devices:\t\t%llu\n", fs_devices->rw_devices);
+		BTRFS_SEQ_PRINT("\tmissing_devices:\t%llu\n", fs_devices->missing_devices);
+		BTRFS_SEQ_PRINT("\ttotal_rw_bytes:\t\t%llu\n", fs_devices->total_rw_bytes);
+		BTRFS_SEQ_PRINT("\ttotal_devices:\t\t%llu\n", fs_devices->total_devices);
+		BTRFS_SEQ_PRINT("\topened:\t\t\t%d\n", fs_devices->opened);
+		BTRFS_SEQ_PRINT("\tseeding:\t\t%d\n", fs_devices->seeding);
+		BTRFS_SEQ_PRINT("\trotating:\t\t%d\n", fs_devices->rotating);
+
+		list_for_each_entry(device, &fs_devices->devices, dev_list) {
+			BTRFS_SEQ_PRINT("\t[[uuid: %pU]]\n", device->uuid);
+			rcu_read_lock();
+			BTRFS_SEQ_PRINT("\t\tdevice:\t\t%s\n",
+				device->name ? rcu_str_deref(device->name): "(null)");
+			rcu_read_unlock();
+			BTRFS_SEQ_PRINT("\t\tdevid:\t\t%llu\n", device->devid);
+			if (device->dev_root) {
+				BTRFS_SEQ_PRINT("\t\tdev_root_fsid:\t%pU\n",
+						device->dev_root->fs_info->fsid);
+			}
+			BTRFS_SEQ_PRINT("\t\tgeneration:\t%llu\n", device->generation);
+			BTRFS_SEQ_PRINT("\t\ttotal_bytes:\t%llu\n", device->total_bytes);
+			BTRFS_SEQ_PRINT("\t\tdev_totalbytes:\t%llu\n", device->disk_total_bytes);
+			BTRFS_SEQ_PRINT("\t\tbytes_used:\t%llu\n", device->bytes_used);
+			BTRFS_SEQ_PRINT("\t\ttype:\t\t%llu\n", device->type);
+			BTRFS_SEQ_PRINT("\t\tio_align:\t%u\n", device->io_align);
+			BTRFS_SEQ_PRINT("\t\tio_width:\t%u\n", device->io_width);
+			BTRFS_SEQ_PRINT("\t\tsector_size:\t%u\n", device->sector_size);
+			BTRFS_SEQ_PRINT("\t\tmode:\t\t0x%llx\n", (u64)device->mode);
+			BTRFS_SEQ_PRINT("\t\twriteable:\t%d\n", device->writeable);
+			BTRFS_SEQ_PRINT("\t\tin_fs_metadata:\t%d\n", device->in_fs_metadata);
+			BTRFS_SEQ_PRINT("\t\tmissing:\t%d\n", device->missing);
+			BTRFS_SEQ_PRINT("\t\tcan_discard:\t%d\n", device->can_discard);
+			BTRFS_SEQ_PRINT("\t\treplace_tgtdev:\t%d\n",
+								device->is_tgtdev_for_dev_replace);
+			BTRFS_SEQ_PRINT("\t\tactive_pending:\t%d\n", device->running_pending);
+			BTRFS_SEQ_PRINT("\t\tnobarriers:\t%d\n", device->nobarriers);
+			BTRFS_SEQ_PRINT("\t\tdevstats_valid:\t%d\n", device->dev_stats_valid);
+			BTRFS_SEQ_PRINT("\t\tbdev:\t\t%s\n", device->bdev ? "not_null":"null");
+		}
+
+		if (fs_devices->seed) {
+			sprt = fs_devices;
+			fs_devices = fs_devices->seed;
+			goto again_fs_devs;
+		}
+		seq_printf(seq, "\n");
+
+		mutex_unlock(&cur_fs_devices->device_list_mutex);
+	}
+	mutex_unlock(&uuid_mutex);
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2b37da3..81f255e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -510,4 +510,5 @@  static inline void btrfs_dev_stat_reset(struct btrfs_device *dev,
 void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info);
 void btrfs_update_commit_device_bytes_used(struct btrfs_root *root,
 					struct btrfs_transaction *transaction);
+void btrfs_print_devlist(struct seq_file *seq);
 #endif