diff mbox

fix inconsistent device between /proc/pid/maps and stat

Message ID 871strcs56.wl-satoru.takeuchi@gmail.com (mailing list archive)
State Superseded
Headers show

Commit Message

Satoru Takeuchi March 21, 2017, 1:53 a.m. UTC
There have been some discussions about inconsistent device between /proc/pid/maps and stat(2).

http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
https://www.spinics.net/lists/linux-btrfs/msg09044.html
https://patchwork.kernel.org/patch/2825842/
https://patchwork.kernel.org/patch/2831525/

It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem.
However, unfortunately, that patch is not merged so far.

  NOTE:
  1) This patch is inspired by the Mark's one and I tweak it as follows.
     - move a flag for this fix from sb->s_flags to kernel internal sb->s_iflags
     - change that flag's name from MS_STAT_FOR_DEV to SB_I_LOGICALVOL, to make its meaning clearer
  2) This patch is for Chris's for-linus-4.11 (commit e1699d2d7bf6e6cce3e1baff19f9dd4595a58664 ("")). It should modify to apply to 4.11-rcX because of statx patches changes
     struct inode_operations->getattr() interface.

For more information about this problem, please refer to the patchat the end of this mail.

---
Subject: [PATCH] fix inconsistent devie between /proc/pid/maps and stat

/proc/pid/maps returns each device as follows.

Comments

David Sterba March 24, 2017, 11:58 a.m. UTC | #1
On Tue, Mar 21, 2017 at 10:53:09AM +0900, Satoru Takeuchi wrote:
> There have been some discussions about inconsistent device between /proc/pid/maps and stat(2).
> 
> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
> https://www.spinics.net/lists/linux-btrfs/msg09044.html
> https://patchwork.kernel.org/patch/2825842/
> https://patchwork.kernel.org/patch/2831525/
> 
> It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem.
> However, unfortunately, that patch is not merged so far.

And no variant of the get_map_dev will ever be merged. Reworking this
requires extensions to the superblock and subvolume structures, making
it more generic and suitable for VFS.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Satoru Takeuchi March 24, 2017, 1:32 p.m. UTC | #2
2017-03-24 20:58 GMT+09:00 David Sterba <dsterba@suse.cz>:
> On Tue, Mar 21, 2017 at 10:53:09AM +0900, Satoru Takeuchi wrote:
>> There have been some discussions about inconsistent device between /proc/pid/maps and stat(2).
>>
>> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
>> https://www.spinics.net/lists/linux-btrfs/msg09044.html
>> https://patchwork.kernel.org/patch/2825842/
>> https://patchwork.kernel.org/patch/2831525/
>>
>> It's important since it breaks user programs like lsof(8). There was a patche by Mark to fix this problem.
>> However, unfortunately, that patch is not merged so far.
>
> And no variant of the get_map_dev will ever be merged. Reworking this
> requires extensions to the superblock and subvolume structures, making
> it more generic and suitable for VFS.

Thank you for your comment. I'm going to reconsider how to fix this problem.

Thanks,
Satoru
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

===
dev = inode->i_sb->s_dev;
===

However, stat(2) returns it as follows.

===
stat->dev = BTRFS_I(inode)->root->anon_dev;
===

It results in device mismatch and this inconsistency breaks users programs like lsof(8) as follows.

Without this patch:
===
$ mount | grep /home/vagrant/mnt
/dev/vda5 on /home/vagrant/mnt type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/)
$ /home/vagrant/mnt/lsof | grep /home/vagrant/mnt
lsof      19292                  root  txt       REG               0,44   163136        257 /home/sat/mnt/lsof
lsof      19292                  root  mem       REG               0,42                 257 /home/sat/mnt/lsof (path dev=0,44)
lsof      19294                  root  txt       REG               0,44   163136        257 /home/sat/mnt/lsof
lsof      19294                  root  mem       REG               0,42                 257 /home/sat/mnt/lsof (path dev=0,44)
===

With this patch:
===
$ mount | grep /home/vagrant/mnt
/dev/vda5 on /home/vagrant/mnt type btrfs (rw,relatime,noacl,space_cache,subvolid=5,subvol=/)
$ /home/vagrant/mnt/lsof | grep /home/vagrant/mnt
lsof      752                 root  txt       REG               0,44   163224        263 /home/vagrant/mnt/lsof
lsof      754                 root  txt       REG               0,44   163224        263 /home/vagrant/mnt/lsof
===

Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
CC: Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/super.c     |  1 +
 fs/proc/generic.c    | 13 +++++++++++++
 fs/proc/internal.h   |  1 +
 fs/proc/nommu.c      |  2 +-
 fs/proc/task_mmu.c   |  2 +-
 fs/proc/task_nommu.c |  2 +-
 include/linux/fs.h   |  1 +
 7 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index da687dc79cce..2ccfdb107ba0 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1133,6 +1133,7 @@  static int btrfs_fill_super(struct super_block *sb,
 #endif
 	sb->s_flags |= MS_I_VERSION;
 	sb->s_iflags |= SB_I_CGROUPWB;
+	sb->s_iflags |= SB_I_LOGICALVOL;
 	err = open_ctree(sb, fs_devices, (char *)data);
 	if (err) {
 		btrfs_err(fs_info, "open_ctree failed");
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index f6a01f09f79d..d38cd77b297c 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -646,3 +646,16 @@  void *PDE_DATA(const struct inode *inode)
 	return __PDE_DATA(inode);
 }
 EXPORT_SYMBOL(PDE_DATA);
+
+dev_t proc_get_map_dev(struct dentry *dentry)
+{
+	struct inode *inode = dentry->d_inode;
+	struct kstat kstat;
+
+	if (inode->i_sb->s_iflags & SB_I_LOGICALVOL &&
+	    inode->i_op->getattr &&
+	    inode->i_op->getattr(NULL, dentry, &kstat) == 0)
+		return kstat.dev;
+
+	return inode->i_sb->s_dev;
+}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 2de5194ba378..bf0f11fc209b 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -190,6 +190,7 @@  static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde)
 	return pde;
 }
 extern void pde_put(struct proc_dir_entry *);
+dev_t proc_get_map_dev(struct dentry *);
 
 static inline bool is_empty_pde(const struct proc_dir_entry *pde)
 {
diff --git a/fs/proc/nommu.c b/fs/proc/nommu.c
index 75634379f82e..e9c29864a50e 100644
--- a/fs/proc/nommu.c
+++ b/fs/proc/nommu.c
@@ -46,7 +46,7 @@  static int nommu_region_show(struct seq_file *m, struct vm_region *region)
 
 	if (file) {
 		struct inode *inode = file_inode(region->vm_file);
-		dev = inode->i_sb->s_dev;
+		dev = proc_get_map_dev(vma->vm_file->f_path.dentry);
 		ino = inode->i_ino;
 	}
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8f96a49178d0..3529adf0faa8 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -292,7 +292,7 @@  show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 
 	if (file) {
 		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
+		dev = proc_get_map_dev(vma->vm_file->f_path.dentry);
 		ino = inode->i_ino;
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 37175621e890..4f17f91225c3 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -156,7 +156,7 @@  static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
 
 	if (file) {
 		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
+		dev = proc_get_map_dev(vma->vm_file->f_path.dentry);
 		ino = inode->i_ino;
 		pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2ba074328894..eec62a5bfd4d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1262,6 +1262,7 @@  struct mm_struct;
 #define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
 #define SB_I_NODEV	0x00000004	/* Ignore devices on this fs */
+#define SB_I_LOGICALVOL	0x00000008	/* this fs supports logical volumes */
 
 /* sb->s_iflags to limit user namespace mounts */
 #define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */