diff mbox

why is i_ino unsigned long, anyway?

Message ID 20130912160324.GE1462@fieldses.org (mailing list archive)
State New, archived
Headers show

Commit Message

J. Bruce Fields Sept. 12, 2013, 4:03 p.m. UTC
Somebody noticed an "ls -l" over nfs failing on entries with inode
numbers greater than 2^32 on a 32-bit NFS server.  The cause is some
code that tries to compare i_ino to the full 64-bit inode number.

I think the following will fix it, but I'm curious: why is i_ino
"unsigned long", anyway?  Is there something know that depends on that,
or is it just that the sheer number of users makes it too scary to
change?

--b.

commit 0cc784eb430285535ae7a79dd5133ab66e9ce839
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Tue Sep 10 11:41:12 2013 -0400

    exportfs: fix 32-bit nfsd handling of 64-bit inode numbers
    
    Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
    32-bit NFS server exporting a very large XFS filesystem, when the
    server's cache is cold (so the inodes in question are not in cache).
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Al Viro Sept. 12, 2013, 7:33 p.m. UTC | #1
On Thu, Sep 12, 2013 at 12:03:24PM -0400, J. Bruce Fields wrote:
> Somebody noticed an "ls -l" over nfs failing on entries with inode
> numbers greater than 2^32 on a 32-bit NFS server.  The cause is some
> code that tries to compare i_ino to the full 64-bit inode number.
> 
> I think the following will fix it, but I'm curious: why is i_ino
> "unsigned long", anyway?  Is there something know that depends on that,
> or is it just that the sheer number of users makes it too scary to
> change?

i_ino use is entirely up to filesystem; it may be used by library helpers,
provided that the choice of using or not using those is, again, up to
filesystem in question.  NFSD has no damn business looking at it; library
helpers in fs/exportfs might, but that makes them not suitable for use
by filesystems without inode numbers or with 64bit ones.

The reason why it's there at all is that it serves as convenient icache
search key for many filesystems.  IOW, it's used by iget_locked() and
avoiding the overhead of 64bit comparisons on 32bit hosts is the main
reason to avoid making it u64.

Again, no fs-independent code has any business looking at it, 64bit or
not.  From the VFS point of view there is no such thing as inode number.
And get_name() is just a library helper.  For many fs types it works
as suitable ->s_export_op->get_name() instance, but decision to use it
or not belongs to filesystem in question and frankly, it's probably better
to provide an instance of your own anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Sept. 29, 2013, 11:54 a.m. UTC | #2
On Thu, Sep 12, 2013 at 08:33:28PM +0100, Al Viro wrote:
> i_ino use is entirely up to filesystem; it may be used by library helpers,
> provided that the choice of using or not using those is, again, up to
> filesystem in question.

With more and more filesystems using large inode numbers I'm starting to
wonder if this still makes sense.  But given that it's been this way for
a long time we should have more documentation of it for sure.

> NFSD has no damn business looking at it; library
> helpers in fs/exportfs might, but that makes them not suitable for use
> by filesystems without inode numbers or with 64bit ones.
> 
> The reason why it's there at all is that it serves as convenient icache
> search key for many filesystems.  IOW, it's used by iget_locked() and
> avoiding the overhead of 64bit comparisons on 32bit hosts is the main
> reason to avoid making it u64.
> 
> Again, no fs-independent code has any business looking at it, 64bit or
> not.  From the VFS point of view there is no such thing as inode number.
> And get_name() is just a library helper.  For many fs types it works
> as suitable ->s_export_op->get_name() instance, but decision to use it
> or not belongs to filesystem in question and frankly, it's probably better
> to provide an instance of your own anyway.

Given that these days most exportable filesystems use 64-bit inode
numbers I think we should put the patch from Bruce in.  Nevermind that
it's in a slow path, so the overhead of vfs_getattr really doesn't hurt.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Oct. 2, 2013, 2:25 p.m. UTC | #3
On Sun, Sep 29, 2013 at 04:54:54AM -0700, Christoph Hellwig wrote:
> On Thu, Sep 12, 2013 at 08:33:28PM +0100, Al Viro wrote:
> > i_ino use is entirely up to filesystem; it may be used by library helpers,
> > provided that the choice of using or not using those is, again, up to
> > filesystem in question.
> 
> With more and more filesystems using large inode numbers I'm starting to
> wonder if this still makes sense.  But given that it's been this way for
> a long time we should have more documentation of it for sure.
> 
> > NFSD has no damn business looking at it; library
> > helpers in fs/exportfs might, but that makes them not suitable for use
> > by filesystems without inode numbers or with 64bit ones.
> > 
> > The reason why it's there at all is that it serves as convenient icache
> > search key for many filesystems.  IOW, it's used by iget_locked() and
> > avoiding the overhead of 64bit comparisons on 32bit hosts is the main
> > reason to avoid making it u64.
> > 
> > Again, no fs-independent code has any business looking at it, 64bit or
> > not.  From the VFS point of view there is no such thing as inode number.
> > And get_name() is just a library helper.  For many fs types it works
> > as suitable ->s_export_op->get_name() instance, but decision to use it
> > or not belongs to filesystem in question and frankly, it's probably better
> > to provide an instance of your own anyway.
> 
> Given that these days most exportable filesystems use 64-bit inode
> numbers I think we should put the patch from Bruce in.  Nevermind that
> it's in a slow path, so the overhead of vfs_getattr really doesn't hurt.

Calling vfs_getattr adds a security_inode_getattr() call that wasn't
there before.  Any chance of that being a problem?

If so then it's no huge code duplication to it by hand:

	if (inode->i_op->getattr)
		inode->i_op->getattr(path->mnt, path->dentry, &stat);
	else
		generic_fillattr(inode, &stat);

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 2, 2013, 4:05 p.m. UTC | #4
On Wed, Oct 02, 2013 at 10:25:27AM -0400, J. Bruce Fields wrote:
> If so then it's no huge code duplication to it by hand:
> 
> 	if (inode->i_op->getattr)
> 		inode->i_op->getattr(path->mnt, path->dentry, &stat);
> 	else
> 		generic_fillattr(inode, &stat);

Maybe make that a vfs_getattr_nosec and let vfs_getattr call it?

Including a proper kerneldoc comment explaining when to use it, please.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 293bc2e..6a79bb8 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -215,7 +215,7 @@  struct getdents_callback {
 	struct dir_context ctx;
 	char *name;		/* name that was found. It already points to a
 				   buffer NAME_MAX+1 is size */
-	unsigned long ino;	/* the inum we are looking for */
+	u64 ino;		/* the inum we are looking for */
 	int found;		/* inode matched? */
 	int sequence;		/* sequence counter */
 };
@@ -255,10 +255,10 @@  static int get_name(const struct path *path, char *name, struct dentry *child)
 	struct inode *dir = path->dentry->d_inode;
 	int error;
 	struct file *file;
+	struct kstat stat;
 	struct getdents_callback buffer = {
 		.ctx.actor = filldir_one,
 		.name = name,
-		.ino = child->d_inode->i_ino
 	};
 
 	error = -ENOTDIR;
@@ -268,6 +268,16 @@  static int get_name(const struct path *path, char *name, struct dentry *child)
 	if (!dir->i_fop)
 		goto out;
 	/*
+	 * inode->i_ino is unsigned long, kstat->ino is u64, so the
+	 * former would be insufficient on 32-bit hosts when the
+	 * filesystem supports 64-bit inode numbers.  So we need to
+	 * actually call ->getattr, not just read i_ino:
+	 */
+	error = vfs_getattr(path, &stat);
+	if (error)
+		return error;
+	buffer.ino = stat.ino;
+	/*
 	 * Open the directory ...
 	 */
 	file = dentry_open(path, O_RDONLY, cred);