diff mbox

btrfs: stat(2) and /proc/pid/maps returns different devices

Message ID 20130710174545.GS32502@wotan.suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Mark Fasheh July 10, 2013, 5:45 p.m. UTC
On Wed, Jul 10, 2013 at 09:31:05AM -0700, Mark Fasheh wrote:
> As far as I can tell we'll be carrying this patch until a better
> solution is possible.
> 
> When that will happen, I don't know.
> 	--Mark

Well, what do I get when I pretend I don't care any more? The little voice
in my head says "keep plugging away". Here's another attempt at fixing this
problem in a sane manner. Basically, this time we're adding a flag to
s_flags which btrfs sets. Proc will see the flag and call ->getattr().

This compiles, but it needs testing (which I will get to soon). It still has
a bunch of problems in my honest opinion but maybe if we get something
acceptable upstream we can work from there.

Also, as Andrew pointed out there's more than one place which is return
different device than from stat(2) so I probably need to update more sites
to deal with this.

Does anyone see a problem with this approach?
	--Mark

--
Mark Fasheh

From: Mark Fasheh <mfasheh@suse.de>

vfs: allow /proc/PID/maps to get device from stat

stat(2) on btrfs returns a custom device, but proc uses s_dev from the super
block. This causes problems because software (and users) are not expecting
the kernel to return different devices from these calls.

This patch fixes the problem by adding a new superblock flag,
MS_PROC_USE_ST. When the proc code sees this flag, it will call the file
systems ->getattr() method to extract a device as opposed to getting it
directly from s_dev.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/super.c        |  1 +
 fs/proc/generic.c       | 30 ++++++++++++++++++++++++++++++
 fs/proc/internal.h      |  1 +
 fs/proc/task_mmu.c      |  2 +-
 fs/proc/task_nommu.c    |  2 +-
 include/uapi/linux/fs.h |  1 +
 6 files changed, 35 insertions(+), 2 deletions(-)

Comments

David Sterba July 10, 2013, 10:26 p.m. UTC | #1
On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> Well, what do I get when I pretend I don't care any more? The little voice
> in my head says "keep plugging away". Here's another attempt at fixing this
> problem in a sane manner. Basically, this time we're adding a flag to
> s_flags which btrfs sets. Proc will see the flag and call ->getattr().
> 
> This compiles, but it needs testing (which I will get to soon). It still has
> a bunch of problems in my honest opinion but maybe if we get something
> acceptable upstream we can work from there.
> 
> Also, as Andrew pointed out there's more than one place which is return
> different device than from stat(2) so I probably need to update more sites
> to deal with this.
> 
> Does anyone see a problem with this approach?

The approach looks ok to me, the implementation is internal to vfs and
fairly minimal. The bit that bothers me is the name of the flag, it's
completely unobvious what it means.

There are some differences to the linked suse patch:

> +static dev_t proc_get_dev_from_stat(struct inode *inode)
> +{
> +	struct dentry *dentry = d_find_any_alias(inode);

This does the dentry -> inode mapping, while originally there was

	&file->f_path

passing just the inode to proc_get_dev_from_stat unnecessarily drops the
available information that's about to be retrieved again.

> +	struct kstat kstat;
> +
> +	if (!dentry)
> +		goto out_error;
> +	if (inode->i_op->getattr(NULL, dentry, &kstat))

The suse patch calls vfs_getattr that in turn calls

	security_inode_getattr(path->mnt, path->dentry);

That would be missing.

Plus checks for presence of the ->getattr operation. Though this is
superfluous with btrfs, I suggest to use vfs_getattr here, which will
fix all of the above.

> +		goto out_error_dput;
> +
> +	dput(dentry);
> +	return kstat.dev;
> +
> +out_error_dput:
> +	dput(dentry);
> +out_error:
> +	return inode->i_sb->s_dev;
> +}
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrei Vagin July 11, 2013, 9:25 p.m. UTC | #2
On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> On Wed, Jul 10, 2013 at 09:31:05AM -0700, Mark Fasheh wrote:
> > As far as I can tell we'll be carrying this patch until a better
> > solution is possible.
> > 
> > When that will happen, I don't know.
> > 	--Mark
> 
> Well, what do I get when I pretend I don't care any more? The little voice
> in my head says "keep plugging away". Here's another attempt at fixing this
> problem in a sane manner. Basically, this time we're adding a flag to
> s_flags which btrfs sets. Proc will see the flag and call ->getattr().
> 
> This compiles, but it needs testing (which I will get to soon). It still has
> a bunch of problems in my honest opinion but maybe if we get something
> acceptable upstream we can work from there.
> 
> Also, as Andrew pointed out there's more than one place which is return
> different device than from stat(2) so I probably need to update more sites
> to deal with this.

Yes, we need to fix unix_diag, fanotify fdinfo, ...

> 
> Does anyone see a problem with this approach?

Looks good for me. Thanks.

> 	--Mark
> 
> --
> Mark Fasheh
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Fasheh July 19, 2013, 8:51 p.m. UTC | #3
On Thu, Jul 11, 2013 at 12:26:50AM +0200, David Sterba wrote:
> On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> > Well, what do I get when I pretend I don't care any more? The little voice
> > in my head says "keep plugging away". Here's another attempt at fixing this
> > problem in a sane manner. Basically, this time we're adding a flag to
> > s_flags which btrfs sets. Proc will see the flag and call ->getattr().
> > 
> > This compiles, but it needs testing (which I will get to soon). It still has
> > a bunch of problems in my honest opinion but maybe if we get something
> > acceptable upstream we can work from there.
> > 
> > Also, as Andrew pointed out there's more than one place which is return
> > different device than from stat(2) so I probably need to update more sites
> > to deal with this.
> > 
> > Does anyone see a problem with this approach?
> 
> The approach looks ok to me, the implementation is internal to vfs and
> fairly minimal. The bit that bothers me is the name of the flag, it's
> completely unobvious what it means.

I'll come up with something better for my next revision :)

> 
> There are some differences to the linked suse patch:
> 
> > +static dev_t proc_get_dev_from_stat(struct inode *inode)
> > +{
> > +	struct dentry *dentry = d_find_any_alias(inode);
> 
> This does the dentry -> inode mapping, while originally there was
> 
> 	&file->f_path
> 
> passing just the inode to proc_get_dev_from_stat unnecessarily drops the
> available information that's about to be retrieved again.

Good catch, thanks.


> 
> > +	struct kstat kstat;
> > +
> > +	if (!dentry)
> > +		goto out_error;
> > +	if (inode->i_op->getattr(NULL, dentry, &kstat))
> 
> The suse patch calls vfs_getattr that in turn calls
> 
> 	security_inode_getattr(path->mnt, path->dentry);
> 
> That would be missing.
> 
> Plus checks for presence of the ->getattr operation. Though this is
> superfluous with btrfs, I suggest to use vfs_getattr here, which will
> fix all of the above.

Ok checking for the operation is definitely needed. I'll check for
->getattr(). The rest of the stuff in our suse patch must have been added
after my own commit. Do you know why this was added? Since this is all
internal to proc I didn't think we needed all this extra stuff to replicate
vfs_gettatr().
	--Mark

--
Mark Fasheh
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba July 31, 2013, 6:24 p.m. UTC | #4
On Fri, Jul 19, 2013 at 01:51:15PM -0700, Mark Fasheh wrote:
> Ok checking for the operation is definitely needed. I'll check for
> ->getattr(). The rest of the stuff in our suse patch must have been added
> after my own commit. Do you know why this was added? Since this is all
> internal to proc I didn't think we needed all this extra stuff to replicate
> vfs_gettatr().

I don't remember, probably for code simplicity. The updated version
you've sent looks ok to me.

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pavel Emelyanov Oct. 24, 2013, 10:41 a.m. UTC | #5
On 07/20/2013 12:51 AM, Mark Fasheh wrote:
> On Thu, Jul 11, 2013 at 12:26:50AM +0200, David Sterba wrote:
>> On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
>>> Well, what do I get when I pretend I don't care any more? The little voice
>>> in my head says "keep plugging away". Here's another attempt at fixing this
>>> problem in a sane manner. Basically, this time we're adding a flag to
>>> s_flags which btrfs sets. Proc will see the flag and call ->getattr().
>>>
>>> This compiles, but it needs testing (which I will get to soon). It still has
>>> a bunch of problems in my honest opinion but maybe if we get something
>>> acceptable upstream we can work from there.
>>>
>>> Also, as Andrew pointed out there's more than one place which is return
>>> different device than from stat(2) so I probably need to update more sites
>>> to deal with this.
>>>
>>> Does anyone see a problem with this approach?
>>
>> The approach looks ok to me, the implementation is internal to vfs and
>> fairly minimal. The bit that bothers me is the name of the flag, it's
>> completely unobvious what it means.
> 
> I'll come up with something better for my next revision :)

Mark, David,

What are your plans about the next version? Any chance we can see it in the
3.13 merge window? (unless I've missed the fact, that it's already there)

I'd really love to see it, as this thing is a blocker for checkpoint-restore
on btrfs.

Thanks,
Pavel
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f0857e0..67be4ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -822,6 +822,7 @@  static int btrfs_fill_super(struct super_block *sb,
 	sb->s_flags |= MS_POSIXACL;
 #endif
 	sb->s_flags |= MS_I_VERSION;
+	sb->s_flags |= MS_PROC_USE_ST;
 	err = open_ctree(sb, fs_devices, (char *)data);
 	if (err) {
 		printk("btrfs: open_ctree failed\n");
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index a2596af..eca8195 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -24,6 +24,8 @@ 
 #include <linux/spinlock.h>
 #include <linux/completion.h>
 #include <asm/uaccess.h>
+#include <linux/fs.h>
+#include <linux/dcache.h>
 
 #include "internal.h"
 
@@ -637,3 +639,31 @@  void *PDE_DATA(const struct inode *inode)
 	return __PDE_DATA(inode);
 }
 EXPORT_SYMBOL(PDE_DATA);
+
+static dev_t proc_get_dev_from_stat(struct inode *inode)
+{
+	struct dentry *dentry = d_find_any_alias(inode);
+	struct kstat kstat;
+
+	if (!dentry)
+		goto out_error;
+
+	if (inode->i_op->getattr(NULL, dentry, &kstat))
+		goto out_error_dput;
+
+	dput(dentry);
+	return kstat.dev;
+
+out_error_dput:
+	dput(dentry);
+out_error:
+	return inode->i_sb->s_dev;
+}
+
+dev_t proc_get_map_dev(struct inode *inode)
+{
+	if (inode->i_sb->s_flags & MS_PROC_USE_ST)
+		return proc_get_dev_from_stat(inode);
+	else
+		return inode->i_sb->s_dev;
+}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index d600fb0..24808b0 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -192,6 +192,7 @@  static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde)
 	return pde;
 }
 extern void pde_put(struct proc_dir_entry *);
+dev_t proc_get_map_dev(struct inode *inode);
 
 /*
  * inode.c
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3e636d8..9226600 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -272,7 +272,7 @@  show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 
 	if (file) {
 		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
+		dev = proc_get_map_dev(inode);
 		ino = inode->i_ino;
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 56123a6..892d84a 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -150,7 +150,7 @@  static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
 
 	if (file) {
 		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
+		dev = proc_get_map_dev(inode);
 		ino = inode->i_ino;
 		pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
 	}
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index a4ed56c..b4173a3 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -88,6 +88,7 @@  struct inodes_stat_t {
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
 
 /* These sb flags are internal to the kernel */
+#define MS_PROC_USE_ST	(1<<27)
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)