diff mbox

[v4,00/15] overlayfs constant inode numbers

Message ID CAOQ4uxitv9FcZO9Ch-3dx0vPZTpZMxcRo=WmCRxj7fUqNJqCGg@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Amir Goldstein May 5, 2017, 7:25 a.m. UTC
On Fri, May 5, 2017 at 12:03 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, May 4, 2017 at 4:14 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> One minor review comment.
>>
>> I used uuid_le_cmp(*uuid, NULL_UUID_LE) arbitrarily in my original patch,
>> but if we want to stick to semantic sb->s_uuid is probably more accurately
>> described as uuid_be, because filesystems most likely copy it in raw format
>> from disk.
>>
>> This is purely semantic of course, but if you think that matters, may as well
>> replace uuid_le with uuid_be.
>
> Okay.
>
> More changes:
>
>   - use vfs_getattr() to get the lower inode number instead of
> lower->d_inode->i_ino.  For one i_ino is 32bit on 32bit archs, while
> kstat.ino is always 64bit
>
>   - merge the dir and non-dir getattr, they look very similar now

Nice, see suggestion below for 'improvement'.

You also forgot to mention in changes since v6:

   - store 'null' fh instead of 'invalid' fh

Reviewed and tested v7.

Regarding ovl_getattr(), it is a good example for code that could make
use of OVL_TYPE_COPYUP() for better readability.

OVL_TYPE_COPYUP() was originally meant to mark also dentries
with null origin fh, but it is useful IMO also to mark dentries for which
we hold an origin (and maybe grow the original meaning later).

Attached a patch to add OVL_TYPE_COPYUP() with diminished
interpretation, which also makes use of it in ovl_getattr().
If you take that patch, you probably want to apply it before the
ovl_getattr() change though.

BTW, I tested with that patch.

Cheers,
Amir.

Comments

Amir Goldstein May 5, 2017, 7:55 a.m. UTC | #1
On Fri, May 5, 2017 at 10:25 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>
> You also forgot to mention in changes since v6:
>
>    - store 'null' fh instead of 'invalid' fh

And w.r.t. that change, the following comment in ovl_get_origin()
is a bit confusing now, because the comment referring to
'invalid' file handles was removed from the function.

/* Treat empty origin as "invalid" */
Miklos Szeredi May 5, 2017, 9:53 a.m. UTC | #2
On Fri, May 5, 2017 at 9:55 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, May 5, 2017 at 10:25 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>> You also forgot to mention in changes since v6:
>>
>>    - store 'null' fh instead of 'invalid' fh
>
> And w.r.t. that change, the following comment in ovl_get_origin()
> is a bit confusing now, because the comment referring to
> 'invalid' file handles was removed from the function.
>
> /* Treat empty origin as "invalid" */

Okay, with cosmetic fixes pushed to overlayfs-next.

I renamed OVL_TYPE_COPYUP to OVL_TYPE_ORIGIN, because, like you said,
the meaning is slightly different.  COPYUP means it has been copied
up, ORIGIN means we have a dentry pointing to the origin.

Thanks,
Miklos
Amir Goldstein May 5, 2017, 9:58 a.m. UTC | #3
On Fri, May 5, 2017 at 12:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, May 5, 2017 at 9:55 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Fri, May 5, 2017 at 10:25 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>
>>> You also forgot to mention in changes since v6:
>>>
>>>    - store 'null' fh instead of 'invalid' fh
>>
>> And w.r.t. that change, the following comment in ovl_get_origin()
>> is a bit confusing now, because the comment referring to
>> 'invalid' file handles was removed from the function.
>>
>> /* Treat empty origin as "invalid" */
>
> Okay, with cosmetic fixes pushed to overlayfs-next.

Looks good.

>
> I renamed OVL_TYPE_COPYUP to OVL_TYPE_ORIGIN, because, like you said,
> the meaning is slightly different.  COPYUP means it has been copied
> up, ORIGIN means we have a dentry pointing to the origin.
>

Ack.

Thanks,
Amir.
Amir Goldstein May 10, 2017, 8:58 a.m. UTC | #4
On Fri, May 5, 2017 at 12:58 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, May 5, 2017 at 12:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, May 5, 2017 at 9:55 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Fri, May 5, 2017 at 10:25 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>
>>>> You also forgot to mention in changes since v6:
>>>>
>>>>    - store 'null' fh instead of 'invalid' fh
>>>
>>> And w.r.t. that change, the following comment in ovl_get_origin()
>>> is a bit confusing now, because the comment referring to
>>> 'invalid' file handles was removed from the function.
>>>
>>> /* Treat empty origin as "invalid" */
>>
>> Okay, with cosmetic fixes pushed to overlayfs-next.
>
> Looks good.
>

Miklos,

FYI, I have implemented verify_lower mount option:
https://github.com/amir73il/linux/commits/ovl-verify-lower
and tested it below overlay snapshots tests, so for what its worth,
'store file handle' and 'lookup file handle' from overlayfs-next
(merged to current master) have now been also exercised with lots
of lower changes and mount cycles.

Is it too soon to nudge you about the pull request? ;-)

Amir.
Miklos Szeredi May 10, 2017, 9:21 a.m. UTC | #5
On Wed, May 10, 2017 at 10:58 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Fri, May 5, 2017 at 12:58 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Fri, May 5, 2017 at 12:53 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Fri, May 5, 2017 at 9:55 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>> On Fri, May 5, 2017 at 10:25 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>>
>>>>> You also forgot to mention in changes since v6:
>>>>>
>>>>>    - store 'null' fh instead of 'invalid' fh
>>>>
>>>> And w.r.t. that change, the following comment in ovl_get_origin()
>>>> is a bit confusing now, because the comment referring to
>>>> 'invalid' file handles was removed from the function.
>>>>
>>>> /* Treat empty origin as "invalid" */
>>>
>>> Okay, with cosmetic fixes pushed to overlayfs-next.
>>
>> Looks good.
>>
>
> Miklos,
>
> FYI, I have implemented verify_lower mount option:
> https://github.com/amir73il/linux/commits/ovl-verify-lower
> and tested it below overlay snapshots tests, so for what its worth,
> 'store file handle' and 'lookup file handle' from overlayfs-next
> (merged to current master) have now been also exercised with lots
> of lower changes and mount cycles.
>
> Is it too soon to nudge you about the pull request? ;-)

I'll have a look.

My plan is to send a pull request to Linus for the const ino stuff,
and then leave the rest to 4.13.

This is the list I have in my head for what's missing:

- lookup origin dir for snapshots
- const ino for non-samefs
- correct d_ino for copied up entries
- NFS export support
- hardlink unbreaking
- sharing pages for reflink (*)
- ro/rw correctness for samefs with temp reflink (**)
- sharing pages in the general case (*)

(*) very preliminary design
(**) need to check overhead: I have a feeling that it's a heavyweight
solution for a tiny problem

The non-starred ones don't seem too hard and should aim for 4.13.

Thanks,
Miklos
Amir Goldstein May 10, 2017, 10:09 a.m. UTC | #6
On Wed, May 10, 2017 at 12:21 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, May 10, 2017 at 10:58 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>> Is it too soon to nudge you about the pull request? ;-)
>
> I'll have a look.
>
> My plan is to send a pull request to Linus for the const ino stuff,
> and then leave the rest to 4.13.
>

Sure, no rush about verify_lower, I just meant to nudge about the pull
request for const ino.

> This is the list I have in my head for what's missing:
>
> - lookup origin dir for snapshots
> - const ino for non-samefs
> - correct d_ino for copied up entries
> - NFS export support
> - hardlink unbreaking
> - sharing pages for reflink (*)
> - ro/rw correctness for samefs with temp reflink (**)
> - sharing pages in the general case (*)
>

Just updated TODO with the a similar list yesterday ;)
https://github.com/amir73il/overlayfs/wiki/Overlayfs-TODO

> (*) very preliminary design
> (**) need to check overhead: I have a feeling that it's a heavyweight
> solution for a tiny problem
>

I am hoping to reduce the code complexity (of rocopyup) after
we have the workdir/inodes/ index, because taking care of
rocopyup/rwcopyup races can be similar to taking care of
link/unlink/link races... we'll see how it goes.

The read pages performance overhead should be reasonable with
sharing pages.

The inode create/delete overhead is something we will need to measure,
but from my experience with xfs, I suspect that the cost is going to be
fair enough that some people would like to pay it to get the "tiny" problem
solved.

Also, the "tiny" problem is also going to be manifested with hardlinks
copy up, where it may need to be fixed anyway.
see this commit message for the details:
https://github.com/amir73il/linux/commit/62a16f3389187d3c3efa22293bf65cdb7bd619c5



> The non-starred ones don't seem too hard and should aim for 4.13.
>

Indeed. I am going to start with hardlinks and const d_ino soon.
The case of non-dir is easier to handle first because:
1. workdir/inodes/#lower_ino is a hard link to upper,
    so fewer follow by file handles involved
2. const d_ino for directory is not very interesting -
    'find' always uses stat() to get inode number of directory,
    so 'find -ino N' is not affected by non-const dir d_ino.

Cheers,
Amir.
Amir Goldstein May 10, 2017, 4 p.m. UTC | #7
On Wed, May 10, 2017 at 12:21 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
[...]
>
> My plan is to send a pull request to Linus for the const ino stuff,
> and then leave the rest to 4.13.
>
> This is the list I have in my head for what's missing:
>
> - lookup origin dir for snapshots
> - const ino for non-samefs
> - correct d_ino for copied up entries
> - NFS export support
> - hardlink unbreaking
[...]
>
> The non-starred ones don't seem too hard and should aim for 4.13.
>

Just to clarify, did you mean that we should aim for NFS export and hardlinks
for 4.13 for samefs? or non-samefs?
Because I can see how non-samefs is possible, but complicates the reverse
inode map, so I rather start with samefs.

It turns out that constant d_ino wasn't that hard.
It's quite an isolated patch based on your old overlay.ino POC, see:
https://github.com/amir73il/linux/tree/ovl-constino

Basically, it's working and I also have xfstests to verify it,
which I will post those and the patch later.

I think that the toll on readdir performance is not such a big problem,
because stat'ing the dir entries is served from cache, so 'ls -lR' is
not going to suffer, only 'find -ino' with cold cache will suffer.

There is still room for optimization for iterating 'very pure upper' dirs
(with no copy ups in them).

Amir.
diff mbox

Patch

From cb0b6dacef3cf3b1c65459d174db2392095da9fa Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@gmail.com>
Date: Sun, 23 Apr 2017 23:12:34 +0300
Subject: [PATCH v7] ovl: set the COPYUP type flag for non-dirs

For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE.
Define a new type flag OVL_TYPE_COPYUP to indicate that an entry holds
a reference to its lower copy up origin.

For directory entries COPYUP := MERGE && UPPER. For non-dir entries
COPYUP means that a lower type dentry has been recently copied up or
that we were able to find the copy up origin from overlay.fh xattr.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/inode.c     |  5 ++---
 fs/overlayfs/overlayfs.h |  2 ++
 fs/overlayfs/util.c      | 10 ++++++----
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index fa202cf..1195a2c 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -84,12 +84,11 @@  int ovl_getattr(const struct path *path, struct kstat *stat,
 	 * persistent st_ino across mount cycle.
 	 */
 	if (ovl_same_sb(dentry->d_sb)) {
-		ovl_path_lower(dentry, &realpath);
-
-		if (OVL_TYPE_UPPER(type) && realpath.dentry) {
+		if (OVL_TYPE_COPYUP(type)) {
 			struct kstat lowerstat;
 			u32 lowermask = STATX_INO | (!is_dir ? STATX_NLINK : 0);
 
+			ovl_path_lower(dentry, &realpath);
 			err = vfs_getattr(&realpath, &lowerstat,
 					  lowermask, flags);
 			if (err)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index f3dd5f3..d3f2ee8 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -13,10 +13,12 @@ 
 enum ovl_path_type {
 	__OVL_PATH_UPPER	= (1 << 0),
 	__OVL_PATH_MERGE	= (1 << 1),
+	__OVL_PATH_COPYUP	= (1 << 2),
 };
 
 #define OVL_TYPE_UPPER(type)	((type) & __OVL_PATH_UPPER)
 #define OVL_TYPE_MERGE(type)	((type) & __OVL_PATH_MERGE)
+#define OVL_TYPE_COPYUP(type)	((type) & __OVL_PATH_COPYUP)
 
 #define OVL_XATTR_PREFIX XATTR_TRUSTED_PREFIX "overlay."
 #define OVL_XATTR_OPAQUE OVL_XATTR_PREFIX "opaque"
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 4a7e5c8..fd4f78c 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -83,11 +83,13 @@  enum ovl_path_type ovl_path_type(struct dentry *dentry)
 		type = __OVL_PATH_UPPER;
 
 		/*
-		 * Non-dir dentry can hold lower dentry from previous
-		 * location.
+		 * Non-dir dentry can hold lower dentry of its copy up origin.
 		 */
-		if (oe->numlower && d_is_dir(dentry))
-			type |= __OVL_PATH_MERGE;
+		if (oe->numlower) {
+			type |= __OVL_PATH_COPYUP;
+			if (d_is_dir(dentry))
+				type |= __OVL_PATH_MERGE;
+		}
 	} else {
 		if (oe->numlower > 1)
 			type |= __OVL_PATH_MERGE;
-- 
2.7.4