diff mbox

[RFC,03/13] ovl: lookup redirect by file handle

Message ID 1492387183-18847-4-git-send-email-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Amir Goldstein April 16, 2017, 11:59 p.m. UTC
When redirect_fh feature is enabled and when underlying
layers are all on the same fs, construct the dentry stack
of a merged dir by following a file handle chain of all
merged dir layers.

When overlay.fh xattr is found in upper inode, instead
of lookup of the dentry in next lower layer by name,
try to get it by calling exportfs_decode_fh().

On failure to follow file handle from upper to lower, fall back
to regular name lookup with or without path redirect.

On failure to decode file handle from middle layer, after
already following file handle from upper, there is no fall back
to lookup by name.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/namei.c     | 167 +++++++++++++++++++++++++++++++++++++++++++----
 fs/overlayfs/overlayfs.h |   1 +
 fs/overlayfs/util.c      |  14 ++++
 3 files changed, 168 insertions(+), 14 deletions(-)

Comments

Vivek Goyal April 18, 2017, 1:05 p.m. UTC | #1
On Mon, Apr 17, 2017 at 02:59:33AM +0300, Amir Goldstein wrote:

[..]
> @@ -272,6 +385,33 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>  			goto out_put_upper;
>  	}
>  
> +	/* Try to lookup lower layers by file handle (at root) */
> +	d.by_path = false;
> +	for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
> +		struct path lowerpath = roe->lowerstack[i];
> +
> +		d.last = i == roe->numlower - 1;
> +		err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
> +		if (err)
> +			goto out_put;
> +
> +		if (!this)
> +			continue;
> +
> +		stack[ctr].dentry = this;
> +		stack[ctr].mnt = lowerpath.mnt;
> +		ctr++;
> +	}
> +
> +	/* Fallback to lookup lower layers by name (at parent) */
> +	if (ctr) {
> +		d.stop = true;

Hi Amir,

Got a very basic question. So say I two lower layers and a directory is
in present in both, say lower1/dir1 and lower2/dir1. Now this directory
gets copied up to upper/dir1. Assume lower1/dir1 is being copied up. So
upper/dir1 will save file handle of lower1/dir1 right? This file handle
does not represent lower2/dir1?

IOW, later when I am doing lookup, then using file handle I will find
dentry of lower1/dir1 but not lower2/dir1. And looks like we will not
path based lookup as ctr will be non-zero (as we found one dentry in
one path).

What am I missing.

(This ovl_lookup() logic has become really twisted now. I wished it was
 little easier to read.)

Vivek
Amir Goldstein April 18, 2017, 2:05 p.m. UTC | #2
On Tue, Apr 18, 2017 at 4:05 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Mon, Apr 17, 2017 at 02:59:33AM +0300, Amir Goldstein wrote:
>
> [..]
>> @@ -272,6 +385,33 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
>>                       goto out_put_upper;
>>       }
>>
>> +     /* Try to lookup lower layers by file handle (at root) */
>> +     d.by_path = false;
>> +     for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
>> +             struct path lowerpath = roe->lowerstack[i];
>> +
>> +             d.last = i == roe->numlower - 1;
>> +             err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
>> +             if (err)
>> +                     goto out_put;
>> +
>> +             if (!this)
>> +                     continue;
>> +
>> +             stack[ctr].dentry = this;
>> +             stack[ctr].mnt = lowerpath.mnt;
>> +             ctr++;
>> +     }
>> +
>> +     /* Fallback to lookup lower layers by name (at parent) */
>> +     if (ctr) {
>> +             d.stop = true;
>
> Hi Amir,
>
> Got a very basic question. So say I two lower layers and a directory is
> in present in both, say lower1/dir1 and lower2/dir1. Now this directory
> gets copied up to upper/dir1. Assume lower1/dir1 is being copied up. So
> upper/dir1 will save file handle of lower1/dir1 right? This file handle
> does not represent lower2/dir1?
>
> IOW, later when I am doing lookup, then using file handle I will find
> dentry of lower1/dir1 but not lower2/dir1. And looks like we will not
> path based lookup as ctr will be non-zero (as we found one dentry in
> one path).
>
> What am I missing.
>

You are not missing anything - I am.

The reason I did not bump into this is because I have only 2 testing modes
with unionmount testsuite:
- upper layers recycled to lower layers without redirect fh (./run --ov=N)
- upper layers recycled to lower layers with redirect fh (./run --ov=N --samefs)

I have no tests that compose lower layers without redirect_fh and
upper layer with redirect_fh.
So in my tests, lower1/dir1 will always have a redirect_fh to lower2/dir1.

One way I consider adressing this issue is:
When mounting redirect_fh, check that all lowerstack[i] has a redirect_fh
to lowerstack[i+1] and set a redirect_fh to lowerstack[0] from upperdir.
If any of the redirect_fh are missing or stale (because lowerdirs were copied)
disable redirect_fh.

I think that any attempt to fallback from lookup by fh to lookup by path
is going to be risky and I wish to eliminate the per lookup fallback.
When redirect_fh does not fallback to lookup by path, then the semantics
of "implicit opaque" are always correct and there is no longer a need to
check directories for opaque xattr explicitly (the lack of redirect fh xattr
means opaque).

> (This ovl_lookup() logic has become really twisted now. I wished it was
>  little easier to read.)
>

Me too. IMO, most of the complexity is in the fallbacks from redirect
by fh to full path to relative path. Eliminating some of these fallbacks
and maybe having separate lookup op per mode, may simplify the code.

Amir.
Vivek Goyal April 18, 2017, 6:32 p.m. UTC | #3
On Tue, Apr 18, 2017 at 05:05:12PM +0300, Amir Goldstein wrote:

[..]
> > (This ovl_lookup() logic has become really twisted now. I wished it was
> >  little easier to read.)
> >
> 
> Me too. IMO, most of the complexity is in the fallbacks from redirect
> by fh to full path to relative path. Eliminating some of these fallbacks
> and maybe having separate lookup op per mode, may simplify the code.

In general, why are we falling back to path based lookup. If lower dirs
were copied or something else, that's a configuration error and overlayfs
will have undefined behavior?

Vivek
Amir Goldstein April 18, 2017, 6:57 p.m. UTC | #4
On Tue, Apr 18, 2017 at 9:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, Apr 18, 2017 at 05:05:12PM +0300, Amir Goldstein wrote:
>
> [..]
>> > (This ovl_lookup() logic has become really twisted now. I wished it was
>> >  little easier to read.)
>> >
>>
>> Me too. IMO, most of the complexity is in the fallbacks from redirect
>> by fh to full path to relative path. Eliminating some of these fallbacks
>> and maybe having separate lookup op per mode, may simplify the code.
>
> In general, why are we falling back to path based lookup. If lower dirs
> were copied or something else, that's a configuration error and overlayfs
> will have undefined behavior?
>

Heh, I just sent out a summary of what happens when lower dirs are copied.
The fallback is needed because when lower/upper dirs are copied
(which is a perfectly valid practice), the xattr are copied with them.
So the fallback is needed because ovl_lookup() find the fh on upper dir
and tries to follow it only to realize that it is stale - then it assumes that
layers were copied and resorts to path lookup.

My point earlier is that I wish to move this "was I copied?" test to mount
time instead of doing it on every lookup. But then user won't be able
to enjoy all the benefits of stable inodes after layers were copied, so
this is not ideal.

I guess what we could do is disable the redirect_dir by fh optimization
for directories if layers were copied and always follow non-dirs by fh.
That will eliminate the fallback and none of the stable inode features
would miss anything.
This should work well with the copied layers use case.

Amir.
Miklos Szeredi April 19, 2017, 3:21 p.m. UTC | #5
On Tue, Apr 18, 2017 at 8:57 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 9:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Tue, Apr 18, 2017 at 05:05:12PM +0300, Amir Goldstein wrote:
>>
>> [..]
>>> > (This ovl_lookup() logic has become really twisted now. I wished it was
>>> >  little easier to read.)
>>> >
>>>
>>> Me too. IMO, most of the complexity is in the fallbacks from redirect
>>> by fh to full path to relative path. Eliminating some of these fallbacks
>>> and maybe having separate lookup op per mode, may simplify the code.
>>
>> In general, why are we falling back to path based lookup. If lower dirs
>> were copied or something else, that's a configuration error and overlayfs
>> will have undefined behavior?
>>
>
> Heh, I just sent out a summary of what happens when lower dirs are copied.
> The fallback is needed because when lower/upper dirs are copied
> (which is a perfectly valid practice), the xattr are copied with them.
> So the fallback is needed because ovl_lookup() find the fh on upper dir
> and tries to follow it only to realize that it is stale - then it assumes that
> layers were copied and resorts to path lookup.
>
> My point earlier is that I wish to move this "was I copied?" test to mount
> time instead of doing it on every lookup. But then user won't be able
> to enjoy all the benefits of stable inodes after layers were copied, so
> this is not ideal.

How so?  Nobody is expecting the copy to have the same inode numbers
as the original.

So it's fine if the relation to the original lower layer file is
broken, we don't care about that anymore.  Except for the hard link
case, but that relation can be preserved in the reverse mapping.

Thanks,
Miklos
Amir Goldstein April 19, 2017, 3:35 p.m. UTC | #6
On Wed, Apr 19, 2017 at 6:21 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Apr 18, 2017 at 8:57 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Tue, Apr 18, 2017 at 9:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>>> On Tue, Apr 18, 2017 at 05:05:12PM +0300, Amir Goldstein wrote:
>>>
>>> [..]
>>>> > (This ovl_lookup() logic has become really twisted now. I wished it was
>>>> >  little easier to read.)
>>>> >
>>>>
>>>> Me too. IMO, most of the complexity is in the fallbacks from redirect
>>>> by fh to full path to relative path. Eliminating some of these fallbacks
>>>> and maybe having separate lookup op per mode, may simplify the code.
>>>
>>> In general, why are we falling back to path based lookup. If lower dirs
>>> were copied or something else, that's a configuration error and overlayfs
>>> will have undefined behavior?
>>>
>>
>> Heh, I just sent out a summary of what happens when lower dirs are copied.
>> The fallback is needed because when lower/upper dirs are copied
>> (which is a perfectly valid practice), the xattr are copied with them.
>> So the fallback is needed because ovl_lookup() find the fh on upper dir
>> and tries to follow it only to realize that it is stale - then it assumes that
>> layers were copied and resorts to path lookup.
>>
>> My point earlier is that I wish to move this "was I copied?" test to mount
>> time instead of doing it on every lookup. But then user won't be able
>> to enjoy all the benefits of stable inodes after layers were copied, so
>> this is not ideal.
>
> How so?  Nobody is expecting the copy to have the same inode numbers
> as the original.
>
> So it's fine if the relation to the original lower layer file is
> broken, we don't care about that anymore.  Except for the hard link
> case, but that relation can be preserved in the reverse mapping.
>

I was addressing Vivek's concern about the complexity of the
"fallback from fh to path" code.
I managed to get the following use case wrong:
- 2 or more lower layers that have some dir redirects by path
- upper layer that now creates redirect_fh
After following from upper to lower1 by fh, I failed to follow by
path to lower2.

So I contemplated detecting this mixed mode at mount time
and disabling redirect_fh.

Nevermind, I'll just fix the fallback in mid layers case.
diff mbox

Patch

diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c
index b8b0778..fcb7c15 100644
--- a/fs/overlayfs/namei.c
+++ b/fs/overlayfs/namei.c
@@ -9,9 +9,11 @@ 
 
 #include <linux/fs.h>
 #include <linux/cred.h>
+#include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/xattr.h>
 #include <linux/ratelimit.h>
+#include <linux/exportfs.h>
 #include "overlayfs.h"
 #include "ovl_entry.h"
 
@@ -21,7 +23,10 @@  struct ovl_lookup_data {
 	bool opaque;
 	bool stop;
 	bool last;
-	char *redirect;
+	bool by_path;		/* redirect by path */
+	bool by_fh;		/* redirect by file handle */
+	char *redirect;		/* path to follow */
+	struct ovl_fh *fh;	/* file handle to follow */
 };
 
 static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
@@ -81,6 +86,41 @@  static int ovl_check_redirect(struct dentry *dentry, struct ovl_lookup_data *d,
 	goto err_free;
 }
 
+static int ovl_check_redirect_fh(struct dentry *dentry, struct ovl_lookup_data *d)
+{
+	int res;
+	void *buf = NULL;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_FH, NULL, 0);
+	if (res < 0) {
+		if (res == -ENODATA || res == -EOPNOTSUPP)
+			return 0;
+		goto fail;
+	}
+	buf = kzalloc(res, GFP_TEMPORARY);
+	if (!buf)
+		return -ENOMEM;
+
+	if (res == 0)
+		goto fail;
+
+	res = vfs_getxattr(dentry, OVL_XATTR_FH, buf, res);
+	if (res < 0 || !ovl_redirect_fh_ok(buf, res))
+		goto fail;
+
+	kfree(d->fh);
+	d->fh = buf;
+
+	return 0;
+
+err_free:
+	kfree(buf);
+	return 0;
+fail:
+	pr_warn_ratelimited("overlayfs: failed to get file handle (%i)\n", res);
+	goto err_free;
+}
+
 static bool ovl_is_opaquedir(struct dentry *dentry)
 {
 	int res;
@@ -96,22 +136,76 @@  static bool ovl_is_opaquedir(struct dentry *dentry)
 	return false;
 }
 
+/* Check if p1 is connected with a chain of hashed dentries to p2 */
+static bool ovl_is_lookable(struct dentry *p1, struct dentry *p2)
+{
+	struct dentry *p;
+
+	for (p = p2; !IS_ROOT(p); p = p->d_parent) {
+		if (d_unhashed(p))
+			return false;
+		if (p->d_parent == p1)
+			return true;
+	}
+	return false;
+}
+
+/* Check if dentry is reachable from mnt via path lookup */
+static int ovl_dentry_under_mnt(void *ctx, struct dentry *dentry)
+{
+	struct vfsmount *mnt = ctx;
+
+	return ovl_is_lookable(mnt->mnt_root, dentry);
+}
+
+static struct dentry *ovl_lookup_fh(struct vfsmount *mnt,
+				    const struct ovl_fh *fh)
+{
+	int bytes = (fh->len - offsetof(struct ovl_fh, fid));
+
+	/*
+	 * Several layers can be on the same fs and decoded dentry may be in
+	 * either one of those layers. We are looking for a match of dentry
+	 * and mnt to find out to which layer the decoded dentry belongs to.
+	 */
+	return exportfs_decode_fh(mnt, (struct fid *)fh->fid,
+				  bytes >> 2, (int)fh->type,
+				  ovl_dentry_under_mnt, mnt);
+}
+
 static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 			     const char *name, unsigned int namelen,
 			     size_t prelen, const char *post,
-			     struct dentry **ret)
+			     struct vfsmount *mnt, struct dentry **ret)
 {
 	struct dentry *this;
 	int err;
 
-	this = lookup_one_len_unlocked(name, base, namelen);
+	/*
+	 * Lookup of upper is with null d->fh.
+	 * Lookup of lower is either by_fh with non-null d->fh
+	 * or by_path with null d->fh.
+	 */
+	if (d->fh)
+		this = ovl_lookup_fh(mnt, d->fh);
+	else
+		this = lookup_one_len_unlocked(name, base, namelen);
 	if (IS_ERR(this)) {
 		err = PTR_ERR(this);
 		this = NULL;
 		if (err == -ENOENT || err == -ENAMETOOLONG)
 			goto out;
+		if (d->fh && err == -ESTALE)
+			goto out;
 		goto out_err;
 	}
+
+	/* Lower found by file handle - don't follow that handle again */
+	if (d->fh) {
+		kfree(d->fh);
+		d->fh = NULL;
+	}
+
 	if (!this->d_inode)
 		goto put_and_out;
 
@@ -135,9 +229,18 @@  static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 		d->stop = d->opaque = true;
 		goto out;
 	}
-	err = ovl_check_redirect(this, d, prelen, post);
-	if (err)
-		goto out_err;
+	if (d->last)
+		goto out;
+	if (d->by_fh) {
+		err = ovl_check_redirect_fh(this, d);
+		if (err)
+			goto out_err;
+	}
+	if (d->by_path) {
+		err = ovl_check_redirect(this, d, prelen, post);
+		if (err)
+			goto out_err;
+	}
 out:
 	*ret = this;
 	return 0;
@@ -152,6 +255,12 @@  static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
 	return err;
 }
 
+static int ovl_lookup_layer_fh(struct path *path, struct ovl_lookup_data *d,
+			       struct dentry **ret)
+{
+	return ovl_lookup_single(path->dentry, d, "", 0, 0, "", path->mnt, ret);
+}
+
 static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 			    struct dentry **ret)
 {
@@ -162,7 +271,7 @@  static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 
 	if (d->name.name[0] != '/')
 		return ovl_lookup_single(base, d, d->name.name, d->name.len,
-					 0, "", ret);
+					 0, "", NULL, ret);
 
 	while (!IS_ERR_OR_NULL(base) && d_can_lookup(base)) {
 		const char *s = d->name.name + d->name.len - rem;
@@ -175,7 +284,7 @@  static int ovl_lookup_layer(struct dentry *base, struct ovl_lookup_data *d,
 			return -EIO;
 
 		err = ovl_lookup_single(base, d, s, thislen,
-					d->name.len - rem, next, &base);
+					d->name.len - rem, next, NULL, &base);
 		dput(dentry);
 		if (err)
 			return err;
@@ -220,6 +329,7 @@  struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	const struct cred *old_cred;
 	struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
 	struct ovl_entry *poe = dentry->d_parent->d_fsdata;
+	struct ovl_entry *roe = dentry->d_sb->s_root->d_fsdata;
 	struct path *stack = NULL;
 	struct dentry *upperdir, *upperdentry = NULL;
 	unsigned int ctr = 0;
@@ -235,7 +345,10 @@  struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		.opaque = false,
 		.stop = false,
 		.last = !poe->numlower,
+		.by_path = true,
+		.by_fh = ofs->config.redirect_fh,
 		.redirect = NULL,
+		.fh = NULL,
 	};
 
 	if (dentry->d_name.len > ofs->namelen)
@@ -259,12 +372,12 @@  struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			if (!upperredirect)
 				goto out_put_upper;
 			if (d.redirect[0] == '/')
-				poe = dentry->d_sb->s_root->d_fsdata;
+				poe = roe;
 		}
 		upperopaque = d.opaque;
 	}
 
-	if (!d.stop && poe->numlower) {
+	if (!d.stop && (poe->numlower || d.fh)) {
 		err = -ENOMEM;
 		stack = kcalloc(ofs->numlower, sizeof(struct path),
 				GFP_TEMPORARY);
@@ -272,6 +385,33 @@  struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 			goto out_put_upper;
 	}
 
+	/* Try to lookup lower layers by file handle (at root) */
+	d.by_path = false;
+	for (i = 0; !d.stop && d.fh && i < roe->numlower; i++) {
+		struct path lowerpath = roe->lowerstack[i];
+
+		d.last = i == roe->numlower - 1;
+		err = ovl_lookup_layer_fh(&lowerpath, &d, &this);
+		if (err)
+			goto out_put;
+
+		if (!this)
+			continue;
+
+		stack[ctr].dentry = this;
+		stack[ctr].mnt = lowerpath.mnt;
+		ctr++;
+	}
+
+	/* Fallback to lookup lower layers by name (at parent) */
+	if (ctr) {
+		d.stop = true;
+	} else {
+		d.by_path = true;
+		d.by_fh = false;
+		kfree(d.fh);
+		d.fh = NULL;
+	}
 	for (i = 0; !d.stop && i < poe->numlower; i++) {
 		struct path lowerpath = poe->lowerstack[i];
 
@@ -290,10 +430,8 @@  struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 		if (d.stop)
 			break;
 
-		if (d.redirect &&
-		    d.redirect[0] == '/' &&
-		    poe != dentry->d_sb->s_root->d_fsdata) {
-			poe = dentry->d_sb->s_root->d_fsdata;
+		if (d.redirect && d.redirect[0] == '/' && poe != roe) {
+			poe = roe;
 
 			/* Find the current layer on the root dentry */
 			for (i = 0; i < poe->numlower; i++)
@@ -352,6 +490,7 @@  struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 	dput(upperdentry);
 	kfree(upperredirect);
 out:
+	kfree(d.fh);
 	kfree(d.redirect);
 	revert_creds(old_cred);
 	return ERR_PTR(err);
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 49be199..6257c5b 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -186,6 +186,7 @@  const char *ovl_dentry_get_redirect(struct dentry *dentry);
 void ovl_dentry_set_redirect(struct dentry *dentry, const char *redirect);
 bool ovl_redirect_fh(struct super_block *sb);
 void ovl_clear_redirect_fh(struct super_block *sb);
+bool ovl_redirect_fh_ok(const char *redirect, size_t size);
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
 void ovl_inode_init(struct inode *inode, struct inode *realinode,
 		    bool is_upper);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 17530c5..6538ec5 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -229,6 +229,20 @@  void ovl_clear_redirect_fh(struct super_block *sb)
 	ofs->config.redirect_fh = false;
 }
 
+bool ovl_redirect_fh_ok(const char *redirect, size_t size)
+{
+	struct ovl_fh *fh = (void *)redirect;
+
+	if (size < sizeof(struct ovl_fh) || size < fh->len)
+		return false;
+
+	if (fh->version > OVL_FH_VERSION ||
+	    fh->magic != OVL_FH_MAGIC)
+		return false;
+
+	return true;
+}
+
 void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry)
 {
 	struct ovl_entry *oe = dentry->d_fsdata;