[RFC,1/1] shiftfs: uid/gid shifting bind mount

Message ID 1487474678.15793.2.camel@HansenPartnership.com
State New

Commit Message

James Bottomley Feb. 19, 2017, 3:24 a.m. UTC
On Fri, 2017-02-17 at 15:35 -0500, Vivek Goyal wrote:
> On Fri, Feb 17, 2017 at 09:34:07AM -0800, James Bottomley wrote:
> > On Fri, 2017-02-17 at 02:55 +0000, Al Viro wrote:
> > > On Thu, Feb 16, 2017 at 07:56:30AM -0800, James Bottomley wrote:
> > > 
> > > > > Hi James,
> > > > > 
> > > > > Should it be "return d_splice_alias()" so that if we find an 
> > > > > alias it is returned back to caller and passed in dentry can 
> > > > > be freed. Though I don't know in what cases alias can be 
> > > > > found. And if alias is found how do we make sure alias_dentry
> > > > > ->d_fsdata is pointing to new (real dentry).
> > > > 
> > > > It probably should be for the sake of the pattern.  In our case 
> > > > I don't think we can have any root aliases because the root
> > > > dentry is always pinned in the cache, so cache lookup should 
> > > > always find it.
> > > 
> > > What does that have to do with root dentry?  The real reason why 
> > > that code works (FVerySVO) is that the damn thing allocates a new
> > > inode every time. Including the hardlinks, BTW.
> > 
> > Yes, this is a known characteristic of stacked filesystems.  Is 
> > there some magic I don't know about that would make it easier to 
> > reflect hard links as aliases?
> 
> I think overlayfs had the same issue in the beginning and miklos
> fixed it.
> 
> commit 51f7e52dc943468c6929fa0a82d4afac3c8e9636
> Author: Miklos Szeredi <mszeredi@redhat.com>
> Date:   Fri Jul 29 12:05:24 2016 +0200
> 
>     ovl: share inode for hard link

That's rather complex, but the principle is simple: use the inode hash
for all upper inodes that may have aliases.  Aliasable means the
underlying inode isn't a directory and has i_nlink > 1, so all I have
to do is perform a lookup through the hash if the underlying is
aliasable, invalidate the dentry in d_revalidate if the aliasing
conditions to the underlying change and manually handle hard links and
it should all work.

Like this?

James
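
As a rough illustration of the aliasing rule described above (this is not the patch itself), here is a minimal user-space sketch in C. Everything in it is a hypothetical stand-in invented for the example: struct real_inode, struct upper_inode, the buckets table, and the aliasable(), upper_get() and upper_revalidate() helpers. In the kernel, the equivalent bookkeeping would be done by the inode hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for the underlying (real) inode. */
struct real_inode {
	bool is_dir;
	unsigned int nlink;
};

/* Hypothetical stand-in for the stacked (upper) inode. */
struct upper_inode {
	struct real_inode *real;
	bool hashed;			/* true once inserted into buckets[] */
	struct upper_inode *next;	/* hash chain */
};

#define NBUCKETS 64
static struct upper_inode *buckets[NBUCKETS];

/* Aliasable: the underlying inode is not a directory and has nlink > 1. */
static bool aliasable(const struct real_inode *real)
{
	return !real->is_dir && real->nlink > 1;
}

static size_t bucket_of(const struct real_inode *real)
{
	return (size_t)(((uintptr_t)real >> 4) % NBUCKETS);
}

/*
 * Return an upper inode for 'real'.  Aliasable inodes are looked up in
 * the hash and shared; everything else gets a fresh, unhashed inode.
 */
static struct upper_inode *upper_get(struct real_inode *real)
{
	struct upper_inode *u;
	size_t b;

	assert(real != NULL);
	b = bucket_of(real);

	if (aliasable(real)) {
		for (u = buckets[b]; u; u = u->next)
			if (u->real == real)
				return u;
	}

	u = calloc(1, sizeof(*u));
	if (!u)
		return NULL;
	u->real = real;
	if (aliasable(real)) {
		u->hashed = true;
		u->next = buckets[b];
		buckets[b] = u;
	}
	return u;
}

/*
 * Revalidation rule: return 0 (drop and look up again) when the
 * underlying inode has become aliasable but the upper inode is not
 * in the hash; return 1 (still valid) otherwise.
 */
static int upper_revalidate(const struct upper_inode *u)
{
	if (aliasable(u->real) && !u->hashed)
		return 0;
	return 1;
}
```

In this model, two calls to upper_get() on a file with nlink == 1 return distinct objects, while once nlink rises above 1 the stale object fails upper_revalidate() and subsequent calls to upper_get() share a single hashed object. Directories are never shared.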

---

--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Vivek Goyal Feb. 20, 2017, 7:26 p.m. UTC | #1
On Sat, Feb 18, 2017 at 07:24:38PM -0800, James Bottomley wrote:

[..]
> > > Yes, this is a known characteristic of stacked filesystems.  Is 
> > > there some magic I don't know about that would make it easier to 
> > > reflect hard links as aliases?
> > 
> > I think overlayfs had the same issue in the beginning and miklos
> > fixed it.
> > 
> > commit 51f7e52dc943468c6929fa0a82d4afac3c8e9636
> > Author: Miklos Szeredi <mszeredi@redhat.com>
> > Date:   Fri Jul 29 12:05:24 2016 +0200
> > 
> >     ovl: share inode for hard link
> 
> That's rather complex, but the principle is simple: use the inode hash
> for all upper inodes that may have aliases.  Aliasable means the
> underlying inode isn't a directory and has i_nlink > 1, so all I have
> to do is perform a lookup through the hash if the underlying is
> aliasable, invalidate the dentry in d_revalidate if the aliasing
> conditions to the underlying change and manually handle hard links and
> it should all work.
> 
> Like this?

Sounds reasonable to me. I did basic testing and this seems to work for me.

In general, I am having random crashes. I just get the following on the
serial console

------[Cut Here]----------

And nothing after that.

Still trying to narrow down.

Vivek
James Bottomley Feb. 21, 2017, 12:38 a.m. UTC | #2
On Mon, 2017-02-20 at 14:26 -0500, Vivek Goyal wrote:
> On Sat, Feb 18, 2017 at 07:24:38PM -0800, James Bottomley wrote:
> 
> [..]
> > > > Yes, this is a known characteristic of stacked filesystems.  Is
> > > > there some magic I don't know about that would make it easier to
> > > > reflect hard links as aliases?
> > > 
> > > I think overlayfs had the same issue in the beginning and miklos
> > > fixed it.
> > > 
> > > commit 51f7e52dc943468c6929fa0a82d4afac3c8e9636
> > > Author: Miklos Szeredi <mszeredi@redhat.com>
> > > Date:   Fri Jul 29 12:05:24 2016 +0200
> > > 
> > >     ovl: share inode for hard link
> > 
> > > That's rather complex, but the principle is simple: use the inode hash
> > > for all upper inodes that may have aliases.  Aliasable means the
> > > underlying inode isn't a directory and has i_nlink > 1, so all I have
> > > to do is perform a lookup through the hash if the underlying is
> > > aliasable, invalidate the dentry in d_revalidate if the aliasing
> > > conditions to the underlying change and manually handle hard links and
> > > it should all work.
> > 
> > Like this?
> 
> Sounds reasonable to me. I did basic testing and this seems to work
> for me.
> 
> In general, I am having random crashes. I just get the following on the
> serial console
> 
> ------[Cut Here]----------
> 
> And nothing after that.

That's indicative of some hard lockup.  I don't see this, but I'm also
using a second laptop for testing, which is suboptimal.  I'm going to
try moving to xfstests inside a VM tomorrow (that's what long aeroplane
flights are for).

> Still trying to narrow down.

Thanks.  There've been a lot of patches flying around, so I'll do a
collected repost under a v2 header to make sure we're all in sync.

James


Patch

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 5b50447..c659812 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -134,6 +134,7 @@  static int shiftfs_d_weak_revalidate(struct dentry *dentry, unsigned int flags)
 static int shiftfs_d_revalidate(struct dentry *dentry, unsigned int flags)
 {
 	struct dentry *real = dentry->d_fsdata;
+	struct inode *reali = d_inode(real), *inode = d_inode(dentry);
 	int ret;
 
 	if (d_unhashed(real))
@@ -146,6 +147,15 @@  static int shiftfs_d_revalidate(struct dentry *dentry, unsigned int flags)
 	if (d_is_negative(real) != d_is_negative(dentry))
 		return 0;
 
+	/*
+	 * non dir link count is > 1 and our inode is currently not in
+	 * the inode hash => need to drop and reget our dentry to make
+	 * sure we're aliasing it correctly.
+	 */
+	if (reali && !S_ISDIR(reali->i_mode) && reali->i_nlink > 1 &&
+	    (!inode || inode_unhashed(inode)))
+		return 0;
+
 	if (!(real->d_flags & DCACHE_OP_REVALIDATE))
 		return 1;
 
@@ -285,7 +295,8 @@  static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
 			       umode_t mode, const char *symlink,
 			       struct dentry *hardlink, bool excl)
 {
-	struct dentry *real = dir->i_private, *new = dentry->d_fsdata;
+	struct dentry *real = dir->i_private, *new = dentry->d_fsdata,
+		*realhardlink = NULL;
 	struct inode *reali = real->d_inode, *newi;
 	const struct inode_operations *iop = reali->i_op;
 	int err;
@@ -293,6 +304,7 @@  static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
 	bool op_ok = false;
 
 	if (hardlink) {
+		realhardlink = hardlink->d_fsdata;
 		op_ok = iop->link;
 	} else {
 		switch (mode & S_IFMT) {
@@ -310,7 +322,7 @@  static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
 		return -EINVAL;
 
 
-	newi = shiftfs_new_inode(dentry->d_sb, mode, NULL);
+	newi = shiftfs_new_inode(dentry->d_sb, mode, realhardlink);
 	if (!newi)
 		return -ENOMEM;
 
@@ -320,8 +332,6 @@  static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
 
 	err = -EINVAL;		/* shut gcc up about uninit var */
 	if (hardlink) {
-		struct dentry *realhardlink = hardlink->d_fsdata;
-
 		err = vfs_link(realhardlink, reali, new, NULL);
 	} else {
 		switch (mode & S_IFMT) {
@@ -341,7 +351,16 @@  static int shiftfs_make_object(struct inode *dir, struct dentry *dentry,
 	if (err)
 		goto out_dput;
 
-	shiftfs_fill_inode(newi, new);
+	if (!hardlink)
+		shiftfs_fill_inode(newi, new);
+	else if (inode_unhashed(newi) && !S_ISDIR(newi->i_mode))
+		/*
+		 * although dentry and hardlink now each point to
+		 * newi, the link count was 1 when they were created,
+		 * so insert into the inode cache now that the link
+		 * count has gone above one.
+		 */
+		__insert_inode_hash(newi, (unsigned long)d_inode(new));
 
 	d_instantiate(dentry, newi);
 
@@ -569,12 +588,55 @@  static const struct inode_operations shiftfs_inode_ops = {
 	.listxattr	= shiftfs_listxattr,
 };
 
+static int shiftfs_test(struct inode *inode, void *data)
+{
+	struct dentry *d1 = inode->i_private, *d2 = data;
+	struct inode *i1 = d_inode(d1), *i2 = d_inode(d2);
+
+	return i1 && i1 == i2;
+}
+
+static int shiftfs_set(struct inode *inode, void *data)
+{
+	struct dentry *dentry = data;
+
+	shiftfs_fill_inode(inode, dentry);
+
+	return 0;
+}
+
 static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode,
 				       struct dentry *dentry)
 {
 	struct inode *inode;
+	struct inode *reali = dentry ? d_inode(dentry) : NULL;
+	bool use_inode_hash = false;
+
+	/*
+	 * Here we hash the inode only if the underlying link count is
+	 * greater than one and it's not a directory (meaning the hash
+	 * contains all items that might be aliases).  We keep this
+	 * accurate by checking the underlying link count on
+	 * revalidation and forcing a new lookup if the underlying
+	 * link count is raised.
+	 *
+	 * Note: if the link count drops again, we don't remove the
+	 * inode from the hash, so the hash contains all inodes that
+	 * may be aliases plus a few others.
+	 */
+	if (reali)
+		use_inode_hash = ACCESS_ONCE(reali->i_nlink) > 1 &&
+			!S_ISDIR(reali->i_mode);
+
+	if (use_inode_hash) {
+		inode = iget5_locked(sb, (unsigned long)reali, shiftfs_test,
+				     shiftfs_set, dentry);
+		if (inode && !(inode->i_state & I_NEW))
+			return inode;
+	} else {
+		inode = new_inode(sb);
+	}
 
-	inode = new_inode(sb);
 	if (!inode)
 		return NULL;
 
@@ -586,7 +648,10 @@  static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode,
 
 	inode->i_op = &shiftfs_inode_ops;
 
-	shiftfs_fill_inode(inode, dentry);
+	if (use_inode_hash)
+		unlock_new_inode(inode);
+	else
+		shiftfs_fill_inode(inode, dentry);
 
 	return inode;
 }