Message ID | CAOQ4uxiBmFdcueorKV7zwPLCDq4DE+H8x=8H1f7+3v3zysW9qA@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Amir, Thanks for the patch and the clarification for which changes to the underlying filesystem are likely to cause havoc. I'll play with this some to see how usable it might be. -JE On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote: >> So here's the use case: lowerdir is an NFS mounted root filesystem >> (shared by a bunch of nodes). upperdir is a tmpfs RAM disk to allow >> for writes to happen. This works great with the caveat being I cannot >> make 'live' changes to the root filesystem, which poses the problem. >> Any access to a changed file causes a 'Stale file handle' error. >> >> With some experimenting, I've discovered that remounting the overlay >> filesystem (mount -o remount / /) registers any changes that have >> been made to the lower NFS filesystem. In addition, dumping cache >> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors >> go away and reads pass through to the lower dir and correctly show >> changes. >> >> I'd like to make this use case feasible by allowing changes to the NFS >> lowerdir to work more or less transparently. It seems like if the >> overlay did not do any caching at all, all reads would fall through to >> either the upperdir ram disk or the NFS lower, which is precisely what >> I want. >> >> So, let me pose this somewhat naive question: Would it be possible to >> simply disable any cacheing performed by the overlay to force all >> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower? >> Would this be enough to accomplish my goal of being able to change the >> lowerdir of an active overlayfs? >> > > There is no need to disable caching. There is already a mechanism > in place in VFS to revalidate inode cache entries. > NFS implements d_revalidate() and overlayfs implements d_revalidate() > by calling into the lower fs d_revalidate(). > > However overlayfs intentionally errors when lower entry has been modified. > (see: 7c03b5d ovl: allow distributed fs as lower layer) > > You can try this (untested) patch to revert this behavior, just to see if it > works for your use case, but it won't change this fact > from Documentation/filesystems/overlayfs.txt: > " Changes to the underlying filesystems while part of a mounted overlay > filesystem are not allowed. If the underlying filesystem is changed, > the behavior of the overlay is undefined, though it will not result in > a crash or deadlock." > > Specifically, renaming directories and files in lower that were already > copied up is going to have a weird outcome. > > Also, the situation with changing files in lower remote fs could be worse > than changing files on lower local fs, simply because right now, this > use case is not tested (i.e. it results in ESTALE). > > I believe that fixing this use case, if at all possible, would require quite > a bit of work, a lot of documentation (about expected behavior) and > even more testing. > > Amir. > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index e8ef9d1..6806ef3 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry > *dentry, unsigned int flags) > > if (d->d_flags & DCACHE_OP_REVALIDATE) { > ret = d->d_op->d_revalidate(d, flags); > - if (ret < 0) > + if (ret =< 0) > return ret; > - if (!ret) { > - if (!(flags & LOOKUP_RCU)) > - d_invalidate(d); > - return -ESTALE; > - } > } > } > - return 1; > + return ret; > }
Amir, After playing with it some, this patch seems to provide precisely the behavior I need for my use case. Do you think it makes sense to turn this behavior into a module parameter (eg: allow_revalidate)? -JE On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote: >> So here's the use case: lowerdir is an NFS mounted root filesystem >> (shared by a bunch of nodes). upperdir is a tmpfs RAM disk to allow >> for writes to happen. This works great with the caveat being I cannot >> make 'live' changes to the root filesystem, which poses the problem. >> Any access to a changed file causes a 'Stale file handle' error. >> >> With some experimenting, I've discovered that remounting the overlay >> filesystem (mount -o remount / /) registers any changes that have >> been made to the lower NFS filesystem. In addition, dumping cache >> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors >> go away and reads pass through to the lower dir and correctly show >> changes. >> >> I'd like to make this use case feasible by allowing changes to the NFS >> lowerdir to work more or less transparently. It seems like if the >> overlay did not do any caching at all, all reads would fall through to >> either the upperdir ram disk or the NFS lower, which is precisely what >> I want. >> >> So, let me pose this somewhat naive question: Would it be possible to >> simply disable any cacheing performed by the overlay to force all >> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower? >> Would this be enough to accomplish my goal of being able to change the >> lowerdir of an active overlayfs? >> > > There is no need to disable caching. There is already a mechanism > in place in VFS to revalidate inode cache entries. > NFS implements d_revalidate() and overlayfs implements d_revalidate() > by calling into the lower fs d_revalidate(). > > However overlayfs intentionally errors when lower entry has been modified. > (see: 7c03b5d ovl: allow distributed fs as lower layer) > > You can try this (untested) patch to revert this behavior, just to see if it > works for your use case, but it won't change this fact > from Documentation/filesystems/overlayfs.txt: > " Changes to the underlying filesystems while part of a mounted overlay > filesystem are not allowed. If the underlying filesystem is changed, > the behavior of the overlay is undefined, though it will not result in > a crash or deadlock." > > Specifically, renaming directories and files in lower that were already > copied up is going to have a weird outcome. > > Also, the situation with changing files in lower remote fs could be worse > than changing files on lower local fs, simply because right now, this > use case is not tested (i.e. it results in ESTALE). > > I believe that fixing this use case, if at all possible, would require quite > a bit of work, a lot of documentation (about expected behavior) and > even more testing. > > Amir. > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index e8ef9d1..6806ef3 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry > *dentry, unsigned int flags) > > if (d->d_flags & DCACHE_OP_REVALIDATE) { > ret = d->d_op->d_revalidate(d, flags); > - if (ret < 0) > + if (ret =< 0) > return ret; > - if (!ret) { > - if (!(flags & LOOKUP_RCU)) > - d_invalidate(d); > - return -ESTALE; > - } > } > } > - return 1; > + return ret; > }
On Tue, 2017-02-21 at 15:08 -0800, Josh England wrote: > Amir, > > After playing with it some, this patch seems to provide precisely the > behavior I need for my use case. Do you think it makes sense to turn > this behavior into a module parameter (eg: allow_revalidate)? This has been a hot topic for me recently, and not only for the overlayfs use case. Any case where the inode of a mount point changes (file or directory) is a problem with current kernels. The problem is that, if the invalidated mount point belongs to a file system that has been propagated to other namespaces (which is mostly always the case) the invalid dentry will be exposed in the other namespaces where it is not a mountpoint. I'm not sure how this can be exploited but apparently it can be by using user namespaces. To implement this without exposing these dentrys when they shouldn't be the VFS would need to retain the invalidated dentrys and allow valid dentrys to be created (leading to multiple dentrys of the same name within directories), hide them from external VFS lookups, but be able to find them itself when it needs them and check if they are a mount point in the current namespace, avoid further revalidation, etc, etc, and the implications go on. Not only would retaining these dentrys probably lead to some fairly ugly code, locating and checking these dentrys would need to be done on very hot execution paths in the VFS which is a challenge in itself (the current kernel mount data structures don't lend themselves to this at all). So yes, on the face of it, it's straight forward to get something fairly simple that will "appear to work" but would not be anywhere near good enough for inclusion in the mainline kernel. > > -JE > > > On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote: > > On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote: > > > So here's the use case: lowerdir is an NFS mounted root filesystem > > > (shared by a bunch of nodes). upperdir is a tmpfs RAM disk to allow > > > for writes to happen. This works great with the caveat being I cannot > > > make 'live' changes to the root filesystem, which poses the problem. > > > Any access to a changed file causes a 'Stale file handle' error. > > > > > > With some experimenting, I've discovered that remounting the overlay > > > filesystem (mount -o remount / /) registers any changes that have > > > been made to the lower NFS filesystem. In addition, dumping cache > > > (via /proc/sys/vm/drop_caches) also makes the stale file handle errors > > > go away and reads pass through to the lower dir and correctly show > > > changes. > > > > > > I'd like to make this use case feasible by allowing changes to the NFS > > > lowerdir to work more or less transparently. It seems like if the > > > overlay did not do any caching at all, all reads would fall through to > > > either the upperdir ram disk or the NFS lower, which is precisely what > > > I want. > > > > > > So, let me pose this somewhat naive question: Would it be possible to > > > simply disable any cacheing performed by the overlay to force all > > > reads to go to either the tmpfs upper or the (VFS-cached) NFS lower? > > > Would this be enough to accomplish my goal of being able to change the > > > lowerdir of an active overlayfs? > > > > > > > There is no need to disable caching. There is already a mechanism > > in place in VFS to revalidate inode cache entries. > > NFS implements d_revalidate() and overlayfs implements d_revalidate() > > by calling into the lower fs d_revalidate(). > > > > However overlayfs intentionally errors when lower entry has been modified. > > (see: 7c03b5d ovl: allow distributed fs as lower layer) > > > > You can try this (untested) patch to revert this behavior, just to see if it > > works for your use case, but it won't change this fact > > from Documentation/filesystems/overlayfs.txt: > > " Changes to the underlying filesystems while part of a mounted overlay > > filesystem are not allowed. If the underlying filesystem is changed, > > the behavior of the overlay is undefined, though it will not result in > > a crash or deadlock." > > > > Specifically, renaming directories and files in lower that were already > > copied up is going to have a weird outcome. > > > > Also, the situation with changing files in lower remote fs could be worse > > than changing files on lower local fs, simply because right now, this > > use case is not tested (i.e. it results in ESTALE). > > > > I believe that fixing this use case, if at all possible, would require quite > > a bit of work, a lot of documentation (about expected behavior) and > > even more testing. > > > > Amir. > > > > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > > index e8ef9d1..6806ef3 100644 > > --- a/fs/overlayfs/super.c > > +++ b/fs/overlayfs/super.c > > @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry > > *dentry, unsigned int flags) > > > > if (d->d_flags & DCACHE_OP_REVALIDATE) { > > ret = d->d_op->d_revalidate(d, flags); > > - if (ret < 0) > > + if (ret =< 0) > > return ret; > > - if (!ret) { > > - if (!(flags & LOOKUP_RCU)) > > - d_invalidate(d); > > - return -ESTALE; > > - } > > } > > } > > - return 1; > > + return ret; > > } > > -- > To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 22, 2017 at 1:08 AM, Josh England <jjengla@gmail.com> wrote: > Amir, > > After playing with it some, this patch seems to provide precisely the > behavior I need for my use case. Do you think it makes sense to turn > this behavior into a module parameter (eg: allow_revalidate)? > I don't know, because I don't know the reason that Miklos chose to error on revalidate of remote lower fs. But it would be strange to introduce a feature that changes one undefined behavior (maybe ESTALE) with another undefined behavior. It may be easier if you can argue for a use case which does have defined behavior, for example, lower fs has some directory subtrees that are not modified via overlayfs and only modified directry via lower fs. I think this *may* result in defined behavior over lower remote fs, but can't tell for sure. Anyway, you will have to argue why such a setup is useful. > -JE > > > On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote: >> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote: >>> So here's the use case: lowerdir is an NFS mounted root filesystem >>> (shared by a bunch of nodes). upperdir is a tmpfs RAM disk to allow >>> for writes to happen. This works great with the caveat being I cannot >>> make 'live' changes to the root filesystem, which poses the problem. >>> Any access to a changed file causes a 'Stale file handle' error. >>> >>> With some experimenting, I've discovered that remounting the overlay >>> filesystem (mount -o remount / /) registers any changes that have >>> been made to the lower NFS filesystem. In addition, dumping cache >>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors >>> go away and reads pass through to the lower dir and correctly show >>> changes. >>> >>> I'd like to make this use case feasible by allowing changes to the NFS >>> lowerdir to work more or less transparently. It seems like if the >>> overlay did not do any caching at all, all reads would fall through to >>> either the upperdir ram disk or the NFS lower, which is precisely what >>> I want. >>> >>> So, let me pose this somewhat naive question: Would it be possible to >>> simply disable any cacheing performed by the overlay to force all >>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower? >>> Would this be enough to accomplish my goal of being able to change the >>> lowerdir of an active overlayfs? >>> >> >> There is no need to disable caching. There is already a mechanism >> in place in VFS to revalidate inode cache entries. >> NFS implements d_revalidate() and overlayfs implements d_revalidate() >> by calling into the lower fs d_revalidate(). >> >> However overlayfs intentionally errors when lower entry has been modified. >> (see: 7c03b5d ovl: allow distributed fs as lower layer) >> >> You can try this (untested) patch to revert this behavior, just to see if it >> works for your use case, but it won't change this fact >> from Documentation/filesystems/overlayfs.txt: >> " Changes to the underlying filesystems while part of a mounted overlay >> filesystem are not allowed. If the underlying filesystem is changed, >> the behavior of the overlay is undefined, though it will not result in >> a crash or deadlock." >> >> Specifically, renaming directories and files in lower that were already >> copied up is going to have a weird outcome. >> >> Also, the situation with changing files in lower remote fs could be worse >> than changing files on lower local fs, simply because right now, this >> use case is not tested (i.e. it results in ESTALE). >> >> I believe that fixing this use case, if at all possible, would require quite >> a bit of work, a lot of documentation (about expected behavior) and >> even more testing. >> >> Amir. >> >> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c >> index e8ef9d1..6806ef3 100644 >> --- a/fs/overlayfs/super.c >> +++ b/fs/overlayfs/super.c >> @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry >> *dentry, unsigned int flags) >> >> if (d->d_flags & DCACHE_OP_REVALIDATE) { >> ret = d->d_op->d_revalidate(d, flags); >> - if (ret < 0) >> + if (ret =< 0) >> return ret; >> - if (!ret) { >> - if (!(flags & LOOKUP_RCU)) >> - d_invalidate(d); >> - return -ESTALE; >> - } >> } >> } >> - return 1; >> + return ret; >> }
On Mon, Feb 27, 2017 at 2:40 AM, Amir Goldstein <amir73il@gmail.com> wrote: > It may be easier if you can argue for a use case which does have > defined behavior, > for example, lower fs has some directory subtrees that are not > modified via overlayfs > and only modified directry via lower fs. > I think this *may* result in defined behavior over lower remote fs, > but can't tell for sure. > Anyway, you will have to argue why such a setup is useful. My use case is actually one that involves very little (but necessary) writes to the overlayfs, and the vast majority of changes happen directly to the lower fs. I maintain cluster management software that has been used on some of the largest clusters in the world. The premise is to network boot and mount a shared NFS root on an arbitrarily large number of cluster nodes. It has always been an annoyance to have to find the little bits and pieces in the OS (think of a full RHEL/CentOS distro) that require the ability to write somewhere and so we've always had to play silly tricks to get around that. Using a ram overlay as the upper allows these minor writes to happen transparently and save end users massive amounts of time having to track down and handle each one individually. The behavior I've noted so far from this patch seems to be very well defined and follows the rule of least surprise (at least for me). I can change files/directories in the shared root and the changes are instantly visible to all clients. If a file/directory has been copied up on the client that change takes precedence over changes in the lower. Deletes work as expected. I haven't found a downside yet. In fact I've bundled a release of my software that configures this mode of operation as on option. The given patch is included with that release, but it would be 1000 times cleaner as a stock kernel module parameter with no patches required. -JE
On Tue, Feb 28, 2017 at 11:08:31AM -0800, Josh England wrote: > The behavior I've noted so far from this patch seems to be very well > defined and follows the rule of least surprise (at least for me). I > can change files/directories in the shared root and the changes are > instantly visible to all clients. If a file/directory has been copied > up on the client that change takes precedence over changes in the > lower. Deletes work as expected. I haven't found a downside yet. Try to do something that invalidates a directory in underlying layer with stuff in overlay on its subdirectories.
On Tue, Feb 28, 2017 at 9:44 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Feb 28, 2017 at 11:08:31AM -0800, Josh England wrote: > >> The behavior I've noted so far from this patch seems to be very well >> defined and follows the rule of least surprise (at least for me). I >> can change files/directories in the shared root and the changes are >> instantly visible to all clients. If a file/directory has been copied >> up on the client that change takes precedence over changes in the >> lower. Deletes work as expected. I haven't found a downside yet. > > Try to do something that invalidates a directory in underlying layer > with stuff in overlay on its subdirectories. That's an interesting test to run, but why does ovl_dentry_revalidate() need to return 1/-ESTALE instead of ret? Why not propagate the return values 0 and -ECHILD from NFS? After all, NFS lower directory inode can be invalidated after lookup and then other overlayfs ops (e.g. readdir) could fail with -ESTALE just the same as those ops would fail directly over NFS. No? What am I missing?
Directory /foo/bar Client (overlay) creates /foo/bar/biz /foo/bar gets deleted from lower (on NFS server) All of /foo/bar (including biz) still exists on client because it had been copied up. This is actually the behavior I'd expect given the nature of overlayfs. -JE On Tue, Feb 28, 2017 at 11:44 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > On Tue, Feb 28, 2017 at 11:08:31AM -0800, Josh England wrote: > >> The behavior I've noted so far from this patch seems to be very well >> defined and follows the rule of least surprise (at least for me). I >> can change files/directories in the shared root and the changes are >> instantly visible to all clients. If a file/directory has been copied >> up on the client that change takes precedence over changes in the >> lower. Deletes work as expected. I haven't found a downside yet. > > Try to do something that invalidates a directory in underlying layer > with stuff in overlay on its subdirectories.
On Tue, Feb 28, 2017, at 02:08 PM, Josh England wrote: > It has always been an > annoyance to have to find the little bits and pieces in the OS (think > of a full RHEL/CentOS distro) that require the ability to write > somewhere As part of driving the [OSTree](https://ostree.readthedocs.io/en/latest/) model into Fedora/CentOS (well really rpm-ostree, which is part of Atomic Host), we have a read-only bind mount over /usr, and the only two writable directories are /etc and /var. In a diskless mode, then you could have /usr as a read-only NFS/cachefiles mount, and use tmpfs for /etc and /var. Ideally /etc can be read-only too during operation but that is definitely one where we hit things like LVM userspace writing there in the background: https://bugzilla.redhat.com/show_bug.cgi?id=1366584 But if something required writing to /usr at runtime in Fedora/CentOS that should be considered a bug.
On Tue, Feb 14, 2017 at 3:01 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote: >> So here's the use case: lowerdir is an NFS mounted root filesystem >> (shared by a bunch of nodes). upperdir is a tmpfs RAM disk to allow >> for writes to happen. This works great with the caveat being I cannot >> make 'live' changes to the root filesystem, which poses the problem. >> Any access to a changed file causes a 'Stale file handle' error. >> >> With some experimenting, I've discovered that remounting the overlay >> filesystem (mount -o remount / /) registers any changes that have >> been made to the lower NFS filesystem. In addition, dumping cache >> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors >> go away and reads pass through to the lower dir and correctly show >> changes. >> >> I'd like to make this use case feasible by allowing changes to the NFS >> lowerdir to work more or less transparently. It seems like if the >> overlay did not do any caching at all, all reads would fall through to >> either the upperdir ram disk or the NFS lower, which is precisely what >> I want. >> >> So, let me pose this somewhat naive question: Would it be possible to >> simply disable any cacheing performed by the overlay to force all >> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower? >> Would this be enough to accomplish my goal of being able to change the >> lowerdir of an active overlayfs? >> > > There is no need to disable caching. There is already a mechanism > in place in VFS to revalidate inode cache entries. > NFS implements d_revalidate() and overlayfs implements d_revalidate() > by calling into the lower fs d_revalidate(). > > However overlayfs intentionally errors when lower entry has been modified. > (see: 7c03b5d ovl: allow distributed fs as lower layer) > > You can try this (untested) patch to revert this behavior, just to see if it > works for your use case, but it won't change this fact > from Documentation/filesystems/overlayfs.txt: > " Changes to the underlying filesystems while part of a mounted overlay > filesystem are not allowed. If the underlying filesystem is changed, > the behavior of the overlay is undefined, though it will not result in > a crash or deadlock." Best way to keep things simple is to only add functionality when someone actually needs it (and can test it). This has been the design policy in overlayfs and it worked wonderfully. So we could probably fix the undefined behavior in the above case to some extent. > > Specifically, renaming directories and files in lower that were already > copied up is going to have a weird outcome. > > Also, the situation with changing files in lower remote fs could be worse > than changing files on lower local fs, simply because right now, this > use case is not tested (i.e. it results in ESTALE). > > I believe that fixing this use case, if at all possible, would require quite > a bit of work, a lot of documentation (about expected behavior) and > even more testing. Well, your patch seems to be safe: if remote fs says something changed, throw away node and subtree on the overlay level. We could introduce the same thing for local fs. Just need to verify in .d_revalidate() that underlying dentry's parent and name matches overlay dentry's parent and name. It's an overhead, and makes no sense in the case when we know the lower layers won't change, so it may be best to keep this check optional. Note, that overlay would still return ESTALE if the change on the lower layer happens on a dentry already looked up (e.g. cwd, open file, race of lookup with rename on underlying layer). Same as NFS. Thanks, Miklos
On Thu, Mar 9, 2017 at 12:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Tue, Feb 14, 2017 at 3:01 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote: >>> So here's the use case: lowerdir is an NFS mounted root filesystem >>> (shared by a bunch of nodes). upperdir is a tmpfs RAM disk to allow >>> for writes to happen. This works great with the caveat being I cannot >>> make 'live' changes to the root filesystem, which poses the problem. >>> Any access to a changed file causes a 'Stale file handle' error. >>> >>> With some experimenting, I've discovered that remounting the overlay >>> filesystem (mount -o remount / /) registers any changes that have >>> been made to the lower NFS filesystem. In addition, dumping cache >>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors >>> go away and reads pass through to the lower dir and correctly show >>> changes. >>> >>> I'd like to make this use case feasible by allowing changes to the NFS >>> lowerdir to work more or less transparently. It seems like if the >>> overlay did not do any caching at all, all reads would fall through to >>> either the upperdir ram disk or the NFS lower, which is precisely what >>> I want. >>> >>> So, let me pose this somewhat naive question: Would it be possible to >>> simply disable any cacheing performed by the overlay to force all >>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower? >>> Would this be enough to accomplish my goal of being able to change the >>> lowerdir of an active overlayfs? >>> >> >> There is no need to disable caching. There is already a mechanism >> in place in VFS to revalidate inode cache entries. >> NFS implements d_revalidate() and overlayfs implements d_revalidate() >> by calling into the lower fs d_revalidate(). >> >> However overlayfs intentionally errors when lower entry has been modified. >> (see: 7c03b5d ovl: allow distributed fs as lower layer) >> >> You can try this (untested) patch to revert this behavior, just to see if it >> works for your use case, but it won't change this fact >> from Documentation/filesystems/overlayfs.txt: >> " Changes to the underlying filesystems while part of a mounted overlay >> filesystem are not allowed. If the underlying filesystem is changed, >> the behavior of the overlay is undefined, though it will not result in >> a crash or deadlock." > > Best way to keep things simple is to only add functionality when > someone actually needs it (and can test it). This has been the design > policy in overlayfs and it worked wonderfully. > > So we could probably fix the undefined behavior in the above case to > some extent. > >> >> Specifically, renaming directories and files in lower that were already >> copied up is going to have a weird outcome. >> >> Also, the situation with changing files in lower remote fs could be worse >> than changing files on lower local fs, simply because right now, this >> use case is not tested (i.e. it results in ESTALE). >> >> I believe that fixing this use case, if at all possible, would require quite >> a bit of work, a lot of documentation (about expected behavior) and >> even more testing. > > Well, your patch seems to be safe: if remote fs says something > changed, throw away node and subtree on the overlay level. > > We could introduce the same thing for local fs. Just need to verify > in .d_revalidate() that underlying dentry's parent and name matches > overlay dentry's parent and name. It's an overhead, and makes no > sense in the case when we know the lower layers won't change, so it > may be best to keep this check optional. > > Note, that overlay would still return ESTALE if the change on the > lower layer happens on a dentry already looked up (e.g. cwd, open > file, race of lookup with rename on underlying layer). Same as NFS. > Naturally. Could you explain what was the reason for special casing (ret == 0) in ovl_dentry_revalidate()?
On Thu, Mar 9, 2017 at 12:22 PM, Amir Goldstein <amir73il@gmail.com> wrote: > On Thu, Mar 9, 2017 at 12:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Tue, Feb 14, 2017 at 3:01 PM, Amir Goldstein <amir73il@gmail.com> wrote: >>> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote: >>>> So here's the use case: lowerdir is an NFS mounted root filesystem >>>> (shared by a bunch of nodes). upperdir is a tmpfs RAM disk to allow >>>> for writes to happen. This works great with the caveat being I cannot >>>> make 'live' changes to the root filesystem, which poses the problem. >>>> Any access to a changed file causes a 'Stale file handle' error. >>>> >>>> With some experimenting, I've discovered that remounting the overlay >>>> filesystem (mount -o remount / /) registers any changes that have >>>> been made to the lower NFS filesystem. In addition, dumping cache >>>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors >>>> go away and reads pass through to the lower dir and correctly show >>>> changes. >>>> >>>> I'd like to make this use case feasible by allowing changes to the NFS >>>> lowerdir to work more or less transparently. It seems like if the >>>> overlay did not do any caching at all, all reads would fall through to >>>> either the upperdir ram disk or the NFS lower, which is precisely what >>>> I want. >>>> >>>> So, let me pose this somewhat naive question: Would it be possible to >>>> simply disable any cacheing performed by the overlay to force all >>>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower? >>>> Would this be enough to accomplish my goal of being able to change the >>>> lowerdir of an active overlayfs? >>>> >>> >>> There is no need to disable caching. There is already a mechanism >>> in place in VFS to revalidate inode cache entries. >>> NFS implements d_revalidate() and overlayfs implements d_revalidate() >>> by calling into the lower fs d_revalidate(). >>> >>> However overlayfs intentionally errors when lower entry has been modified. >>> (see: 7c03b5d ovl: allow distributed fs as lower layer) >>> >>> You can try this (untested) patch to revert this behavior, just to see if it >>> works for your use case, but it won't change this fact >>> from Documentation/filesystems/overlayfs.txt: >>> " Changes to the underlying filesystems while part of a mounted overlay >>> filesystem are not allowed. If the underlying filesystem is changed, >>> the behavior of the overlay is undefined, though it will not result in >>> a crash or deadlock." >> >> Best way to keep things simple is to only add functionality when >> someone actually needs it (and can test it). This has been the design >> policy in overlayfs and it worked wonderfully. >> >> So we could probably fix the undefined behavior in the above case to >> some extent. >> >>> >>> Specifically, renaming directories and files in lower that were already >>> copied up is going to have a weird outcome. >>> >>> Also, the situation with changing files in lower remote fs could be worse >>> than changing files on lower local fs, simply because right now, this >>> use case is not tested (i.e. it results in ESTALE). >>> >>> I believe that fixing this use case, if at all possible, would require quite >>> a bit of work, a lot of documentation (about expected behavior) and >>> even more testing. >> >> Well, your patch seems to be safe: if remote fs says something >> changed, throw away node and subtree on the overlay level. >> >> We could introduce the same thing for local fs. Just need to verify >> in .d_revalidate() that underlying dentry's parent and name matches >> overlay dentry's parent and name. It's an overhead, and makes no >> sense in the case when we know the lower layers won't change, so it >> may be best to keep this check optional. >> >> Note, that overlay would still return ESTALE if the change on the >> lower layer happens on a dentry already looked up (e.g. cwd, open >> file, race of lookup with rename on underlying layer). Same as NFS. >> > > Naturally. > > Could you explain what was the reason for special casing (ret == 0) > in ovl_dentry_revalidate()? I think my reasoning was: if lower dentry is invalid, it means lower was changed; this is against the rules (as per documentation) so return error. Thanks, Miklos
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index e8ef9d1..6806ef3 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry *dentry, unsigned int flags) if (d->d_flags & DCACHE_OP_REVALIDATE) { ret = d->d_op->d_revalidate(d, flags); - if (ret < 0) + if (ret =< 0) return ret; - if (!ret) { - if (!(flags & LOOKUP_RCU)) - d_invalidate(d); - return -ESTALE; - } } } - return 1; + return ret; }