Message ID | 20190809225833.6657-3-ira.weiny@intel.com (mailing list archive) |
---|---|
State | RFC |
Series | RDMA/FS DAX truncate proposal V1,000,002 ;-) |
On Fri, 2019-08-09 at 15:58 -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> Add an exclusive lease flag which indicates that the layout mechanism
> can not be broken.
>
> Exclusive layout leases allow the file system to know that pages may be
> GUP pined and that attempts to change the layout, ie truncate, should be
> failed.
>
> A process which attempts to break it's own exclusive lease gets an
> EDEADLOCK return to help determine that this is likely a programming bug
> vs someone else holding a resource.
>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  fs/locks.c                       | 23 +++++++++++++++++++++--
>  include/linux/fs.h               |  1 +
>  include/uapi/asm-generic/fcntl.h |  2 ++
>  3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index ad17c6ffca06..0c7359cdab92 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -626,6 +626,8 @@ static int lease_init(struct file *filp, long type, unsigned int flags,
>  	fl->fl_flags = FL_LEASE;
>  	if (flags & FL_LAYOUT)
>  		fl->fl_flags |= FL_LAYOUT;
> +	if (flags & FL_EXCLUSIVE)
> +		fl->fl_flags |= FL_EXCLUSIVE;
>  	fl->fl_start = 0;
>  	fl->fl_end = OFFSET_MAX;
>  	fl->fl_ops = NULL;
> @@ -1619,6 +1621,14 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  	list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list) {
>  		if (!leases_conflict(fl, new_fl))
>  			continue;
> +		if (fl->fl_flags & FL_EXCLUSIVE) {
> +			error = -ETXTBSY;
> +			if (new_fl->fl_pid == fl->fl_pid) {
> +				error = -EDEADLOCK;
> +				goto out;
> +			}
> +			continue;
> +		}
>  		if (want_write) {
>  			if (fl->fl_flags & FL_UNLOCK_PENDING)
>  				continue;
> @@ -1634,6 +1644,13 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  		locks_delete_lock_ctx(fl, &dispose);
>  	}
>
> +	/* We differentiate between -EDEADLOCK and -ETXTBSY so the above loop
> +	 * continues with -ETXTBSY looking for a potential deadlock instead.
> +	 * If deadlock is not found go ahead and return -ETXTBSY.
> +	 */
> +	if (error == -ETXTBSY)
> +		goto out;
> +
>  	if (list_empty(&ctx->flc_lease))
>  		goto out;
>
> @@ -2044,9 +2061,11 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
>  	 * to revoke the lease in break_layout() And this is done by using
>  	 * F_WRLCK in the break code.
>  	 */
> -	if (arg == F_LAYOUT) {
> +	if ((arg & F_LAYOUT) == F_LAYOUT) {
> +		if ((arg & F_EXCLUSIVE) == F_EXCLUSIVE)
> +			flags |= FL_EXCLUSIVE;
>  		arg = F_RDLCK;
> -		flags = FL_LAYOUT;
> +		flags |= FL_LAYOUT;
>  	}
>
>  	fl = lease_alloc(filp, arg, flags);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index dd60d5be9886..2e41ce547913 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1005,6 +1005,7 @@ static inline struct file *get_file(struct file *f)
>  #define FL_UNLOCK_PENDING	512	/* Lease is being broken */
>  #define FL_OFDLCK	1024	/* lock is "owned" by struct file */
>  #define FL_LAYOUT	2048	/* outstanding pNFS layout or user held pin */
> +#define FL_EXCLUSIVE	4096	/* Layout lease is exclusive */
>
>  #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
>
> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index baddd54f3031..88b175ceccbc 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -176,6 +176,8 @@ struct f_owner_ex {
>
>  #define F_LAYOUT	16	/* layout lease to allow longterm pins such as
>  				   RDMA */
> +#define F_EXCLUSIVE	32	/* layout lease is exclusive */
> +				/* FIXME or shoudl this be F_EXLCK??? */
>
>  /* operations for bsd flock(), also used by the kernel implementation */
>  #define LOCK_SH		1	/* shared lock */

This interface just seems weird to me. The existing F_*LCK values aren't
really set up to be flags, but are enumerated values (even if there are
some gaps on some arches). For instance, on parisc and sparc:

    /* for posix fcntl() and lockf() */
    #define F_RDLCK 01
    #define F_WRLCK 02
    #define F_UNLCK 03

While your new flag values are well above these values, it's still a bit
sketchy to do what you're proposing from a cross-platform interface
standpoint.

I think this would be a lot cleaner if you weren't overloading the
F_SETLEASE command with new flags, and instead added new
F_SETLAYOUT/F_GETLAYOUT cmd values.

You'd then be free to define a new set of "arg" values for use with
layouts, and there'd be a clear distinction interface-wise between
setting a layout and a lease.
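The enum-versus-flag concern can be made concrete with a small user-space sketch. It assumes only the parisc/sparc values quoted in this mail and the F_LAYOUT/F_EXCLUSIVE values from the patch; the PARISC_* and PROPOSED_* names and the helper functions are hypothetical, invented for illustration, and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock type values quoted in the thread for parisc and sparc. */
#define PARISC_F_RDLCK 01
#define PARISC_F_WRLCK 02
#define PARISC_F_UNLCK 03

/* Values proposed by the patch in asm-generic/fcntl.h. */
#define PROPOSED_F_LAYOUT    16
#define PROPOSED_F_EXCLUSIVE 32

/* Enumerated comparison: the argument must equal one exact value. */
bool is_unlock_enum(long arg)
{
	return arg == PARISC_F_UNLCK;
}

/* Flag-style test, in the style the patch uses for F_LAYOUT. */
bool has_bits(long arg, long bits)
{
	return (arg & bits) == bits;
}

/* Hypothetical self-check showing where the two styles disagree. */
void enum_vs_flag_self_check(void)
{
	/* Read as flags, F_UNLCK (03) looks like F_RDLCK | F_WRLCK. */
	assert(has_bits(PARISC_F_UNLCK, PARISC_F_RDLCK));
	assert(has_bits(PARISC_F_UNLCK, PARISC_F_WRLCK));

	/* OR-ing a flag into an enumerated value defeats '==' matching. */
	assert(!is_unlock_enum(PARISC_F_UNLCK | PROPOSED_F_LAYOUT));

	/* The proposed bits sit above the enumerated range, so they do
	 * not collide with these particular values. */
	assert(!has_bits(PARISC_F_UNLCK, PROPOSED_F_LAYOUT));
	assert(has_bits(PROPOSED_F_LAYOUT | PROPOSED_F_EXCLUSIVE,
			PROPOSED_F_EXCLUSIVE));
}
```

Calling enum_vs_flag_self_check() from a test program exercises all five cases. The sketch only shows that mixing the two decoding styles in one argument is fragile; it says nothing about which interface the kernel should adopt.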
On Wed, Aug 14, 2019 at 10:15:06AM -0400, Jeff Layton wrote:
> On Fri, 2019-08-09 at 15:58 -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > Add an exclusive lease flag which indicates that the layout mechanism
> > can not be broken.
> >
> > Exclusive layout leases allow the file system to know that pages may be
> > GUP pined and that attempts to change the layout, ie truncate, should be
> > failed.
> >
> > A process which attempts to break it's own exclusive lease gets an
> > EDEADLOCK return to help determine that this is likely a programming bug
> > vs someone else holding a resource.
.....
> > diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> > index baddd54f3031..88b175ceccbc 100644
> > --- a/include/uapi/asm-generic/fcntl.h
> > +++ b/include/uapi/asm-generic/fcntl.h
> > @@ -176,6 +176,8 @@ struct f_owner_ex {
> >
> >  #define F_LAYOUT	16	/* layout lease to allow longterm pins such as
> >  				   RDMA */
> > +#define F_EXCLUSIVE	32	/* layout lease is exclusive */
> > +				/* FIXME or shoudl this be F_EXLCK??? */
> >
> >  /* operations for bsd flock(), also used by the kernel implementation */
> >  #define LOCK_SH		1	/* shared lock */
>
> This interface just seems weird to me. The existing F_*LCK values aren't
> really set up to be flags, but are enumerated values (even if there are
> some gaps on some arches). For instance, on parisc and sparc:

I don't think we need to worry about this - the F_WRLCK version of
the layout lease should have these exclusive access semantics (i.e
other ops fail rather than block waiting for lease recall) and hence
the API shouldn't need a new flag to specify them.

i.e. the primary difference between F_RDLCK and F_WRLCK layout
leases is that the F_RDLCK is a shared, co-operative lease model
where only delays in operations will be seen, while F_WRLCK is a
"guarantee exclusive access and I don't care what it breaks"
model... :)

Cheers,

Dave.
On Thu, 2019-08-15 at 07:56 +1000, Dave Chinner wrote:
> On Wed, Aug 14, 2019 at 10:15:06AM -0400, Jeff Layton wrote:
> > On Fri, 2019-08-09 at 15:58 -0700, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > Add an exclusive lease flag which indicates that the layout mechanism
> > > can not be broken.
> > >
> > > Exclusive layout leases allow the file system to know that pages may be
> > > GUP pined and that attempts to change the layout, ie truncate, should be
> > > failed.
> > >
> > > A process which attempts to break it's own exclusive lease gets an
> > > EDEADLOCK return to help determine that this is likely a programming bug
> > > vs someone else holding a resource.
> .....
> > > diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> > > index baddd54f3031..88b175ceccbc 100644
> > > --- a/include/uapi/asm-generic/fcntl.h
> > > +++ b/include/uapi/asm-generic/fcntl.h
> > > @@ -176,6 +176,8 @@ struct f_owner_ex {
> > >
> > >  #define F_LAYOUT	16	/* layout lease to allow longterm pins such as
> > >  				   RDMA */
> > > +#define F_EXCLUSIVE	32	/* layout lease is exclusive */
> > > +				/* FIXME or shoudl this be F_EXLCK??? */
> > >
> > >  /* operations for bsd flock(), also used by the kernel implementation */
> > >  #define LOCK_SH		1	/* shared lock */
> >
> > This interface just seems weird to me. The existing F_*LCK values aren't
> > really set up to be flags, but are enumerated values (even if there are
> > some gaps on some arches). For instance, on parisc and sparc:
>
> I don't think we need to worry about this - the F_WRLCK version of
> the layout lease should have these exclusive access semantics (i.e
> other ops fail rather than block waiting for lease recall) and hence
> the API shouldn't need a new flag to specify them.
>
> i.e. the primary difference between F_RDLCK and F_WRLCK layout
> leases is that the F_RDLCK is a shared, co-operative lease model
> where only delays in operations will be seen, while F_WRLCK is a
> "guarantee exclusive access and I don't care what it breaks"
> model... :)
>

Not exactly...

F_WRLCK and F_RDLCK leases can both be broken, and will eventually time
out if there is conflicting access. The F_EXCLUSIVE flag on the other
hand is there to prevent any sort of lease break from

I'm guessing what Ira really wants with the F_EXCLUSIVE flag is
something akin to what happens when we set fl_break_time to 0 in the
nfsd code. nfsd never wants the locks code to time out a lease of any
sort, since it handles that timeout itself.

If you're going to add this functionality, it'd be good to also convert
knfsd to use it as well, so we don't end up with multiple ways to deal
with that situation.
Missed this. sorry.

On Mon, Aug 26, 2019 at 06:41:07AM -0400, Jeff Layton wrote:
> On Thu, 2019-08-15 at 07:56 +1000, Dave Chinner wrote:
> > On Wed, Aug 14, 2019 at 10:15:06AM -0400, Jeff Layton wrote:
> > > On Fri, 2019-08-09 at 15:58 -0700, ira.weiny@intel.com wrote:
> > > > From: Ira Weiny <ira.weiny@intel.com>
> > > >
> > > > Add an exclusive lease flag which indicates that the layout mechanism
> > > > can not be broken.
> > > >
> > > > Exclusive layout leases allow the file system to know that pages may be
> > > > GUP pined and that attempts to change the layout, ie truncate, should be
> > > > failed.
> > > >
> > > > A process which attempts to break it's own exclusive lease gets an
> > > > EDEADLOCK return to help determine that this is likely a programming bug
> > > > vs someone else holding a resource.
> > .....
> > > > diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> > > > index baddd54f3031..88b175ceccbc 100644
> > > > --- a/include/uapi/asm-generic/fcntl.h
> > > > +++ b/include/uapi/asm-generic/fcntl.h
> > > > @@ -176,6 +176,8 @@ struct f_owner_ex {
> > > >
> > > >  #define F_LAYOUT	16	/* layout lease to allow longterm pins such as
> > > >  				   RDMA */
> > > > +#define F_EXCLUSIVE	32	/* layout lease is exclusive */
> > > > +				/* FIXME or shoudl this be F_EXLCK??? */
> > > >
> > > >  /* operations for bsd flock(), also used by the kernel implementation */
> > > >  #define LOCK_SH		1	/* shared lock */
> > >
> > > This interface just seems weird to me. The existing F_*LCK values aren't
> > > really set up to be flags, but are enumerated values (even if there are
> > > some gaps on some arches). For instance, on parisc and sparc:
> >
> > I don't think we need to worry about this - the F_WRLCK version of
> > the layout lease should have these exclusive access semantics (i.e
> > other ops fail rather than block waiting for lease recall) and hence
> > the API shouldn't need a new flag to specify them.
> >
> > i.e. the primary difference between F_RDLCK and F_WRLCK layout
> > leases is that the F_RDLCK is a shared, co-operative lease model
> > where only delays in operations will be seen, while F_WRLCK is a
> > "guarantee exclusive access and I don't care what it breaks"
> > model... :)
> >
>
> Not exactly...
>
> F_WRLCK and F_RDLCK leases can both be broken, and will eventually time
> out if there is conflicting access. The F_EXCLUSIVE flag on the other
> hand is there to prevent any sort of lease break from

Right EXCLUSIVE will not break for any reason. It will fail truncate and hole
punch as we discussed back in June. This is for the use case where the user
has handed this file/pages off to some hardware for which removing the lease
would be impossible. _And_ we don't anticipate any valid use case that someone
will need to truncate short of killing the process to free up file system
space.

>
> I'm guessing what Ira really wants with the F_EXCLUSIVE flag is
> something akin to what happens when we set fl_break_time to 0 in the
> nfsd code. nfsd never wants the locks code to time out a lease of any
> sort, since it handles that timeout itself.
>
> If you're going to add this functionality, it'd be good to also convert
> knfsd to use it as well, so we don't end up with multiple ways to deal
> with that situation.

Could you point me at the source for knfsd? I looked in

git://git.linux-nfs.org/projects/steved/nfs-utils.git

but I don't see anywhere leases are used in that source?

Thanks,
Ira
On Thu, 2019-08-29 at 16:34 -0700, Ira Weiny wrote:
> Missed this. sorry.
>
> On Mon, Aug 26, 2019 at 06:41:07AM -0400, Jeff Layton wrote:
> > On Thu, 2019-08-15 at 07:56 +1000, Dave Chinner wrote:
> > > On Wed, Aug 14, 2019 at 10:15:06AM -0400, Jeff Layton wrote:
> > > > On Fri, 2019-08-09 at 15:58 -0700, ira.weiny@intel.com wrote:
> > > > > From: Ira Weiny <ira.weiny@intel.com>
> > > > >
> > > > > Add an exclusive lease flag which indicates that the layout mechanism
> > > > > can not be broken.
> > > > >
> > > > > Exclusive layout leases allow the file system to know that pages may be
> > > > > GUP pined and that attempts to change the layout, ie truncate, should be
> > > > > failed.
> > > > >
> > > > > A process which attempts to break it's own exclusive lease gets an
> > > > > EDEADLOCK return to help determine that this is likely a programming bug
> > > > > vs someone else holding a resource.
> > > .....
> > > > > diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> > > > > index baddd54f3031..88b175ceccbc 100644
> > > > > --- a/include/uapi/asm-generic/fcntl.h
> > > > > +++ b/include/uapi/asm-generic/fcntl.h
> > > > > @@ -176,6 +176,8 @@ struct f_owner_ex {
> > > > >
> > > > >  #define F_LAYOUT	16	/* layout lease to allow longterm pins such as
> > > > >  				   RDMA */
> > > > > +#define F_EXCLUSIVE	32	/* layout lease is exclusive */
> > > > > +				/* FIXME or shoudl this be F_EXLCK??? */
> > > > >
> > > > >  /* operations for bsd flock(), also used by the kernel implementation */
> > > > >  #define LOCK_SH		1	/* shared lock */
> > > >
> > > > This interface just seems weird to me. The existing F_*LCK values aren't
> > > > really set up to be flags, but are enumerated values (even if there are
> > > > some gaps on some arches). For instance, on parisc and sparc:
> > >
> > > I don't think we need to worry about this - the F_WRLCK version of
> > > the layout lease should have these exclusive access semantics (i.e
> > > other ops fail rather than block waiting for lease recall) and hence
> > > the API shouldn't need a new flag to specify them.
> > >
> > > i.e. the primary difference between F_RDLCK and F_WRLCK layout
> > > leases is that the F_RDLCK is a shared, co-operative lease model
> > > where only delays in operations will be seen, while F_WRLCK is a
> > > "guarantee exclusive access and I don't care what it breaks"
> > > model... :)
> > >
> >
> > Not exactly...
> >
> > F_WRLCK and F_RDLCK leases can both be broken, and will eventually time
> > out if there is conflicting access. The F_EXCLUSIVE flag on the other
> > hand is there to prevent any sort of lease break from
>
> Right EXCLUSIVE will not break for any reason. It will fail truncate and hole
> punch as we discussed back in June. This is for the use case where the user
> has handed this file/pages off to some hardware for which removing the lease
> would be impossible. _And_ we don't anticipate any valid use case that someone
> will need to truncate short of killing the process to free up file system
> space.
>
> > I'm guessing what Ira really wants with the F_EXCLUSIVE flag is
> > something akin to what happens when we set fl_break_time to 0 in the
> > nfsd code. nfsd never wants the locks code to time out a lease of any
> > sort, since it handles that timeout itself.
> >
> > If you're going to add this functionality, it'd be good to also convert
> > knfsd to use it as well, so we don't end up with multiple ways to deal
> > with that situation.
>
> Could you point me at the source for knfsd? I looked in
>
> git://git.linux-nfs.org/projects/steved/nfs-utils.git
>
> but I don't see anywhere leases are used in that source?
>

Ahh sorry that wasn't clear. It's the fs/nfsd directory in the Linux
kernel sources. See nfsd4_layout_lm_break and nfsd_break_deleg_cb in
particular.
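The "fl_break_time of 0 means the locks code never times the lease out" behaviour described in this subthread can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel code: model_lease_timed_out() and its parameters are hypothetical names, and the wraparound-safe comparison is an assumption about how a jiffies-style tick counter would be compared.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: has a lease whose break deadline is 'break_time'
 * expired at tick 'now'?  A deadline of 0 is treated as the special
 * value "never expires", which is the behaviour the thread attributes
 * to nfsd setting fl_break_time to 0.
 */
bool model_lease_timed_out(unsigned long break_time, unsigned long now)
{
	if (break_time == 0)
		return false;	/* owner handles the timeout itself */

	/* Wraparound-safe "now is after break_time" (assumed idiom). */
	return (long)(break_time - now) < 0;
}

/* Hypothetical self-check for the three interesting cases. */
void model_timeout_self_check(void)
{
	assert(!model_lease_timed_out(0, 1000));   /* never expires */
	assert(model_lease_timed_out(100, 200));   /* deadline passed */
	assert(!model_lease_timed_out(300, 200));  /* deadline pending */
}
```

An unbreakable lease flag would differ from this model in one respect the thread makes explicit: with break_time 0 the break is still initiated and only the timeout is suppressed, whereas the proposed flag makes the conflicting operation fail outright.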
On 8/9/19 3:58 PM, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> Add an exclusive lease flag which indicates that the layout mechanism
> can not be broken.

After studying the rest of these discussions extensively, I think in all
cases FL_EXCLUSIVE is better named "unbreakable", rather than exclusive.

If you read your sentence above, it basically reinforces that idea: "add an
exclusive flag to mean it is unbreakable" is a bit of a disconnect. It would
be better to say,

    Add an "unbreakable" lease flag which indicates that the layout lease
    cannot be broken.

Furthermore, while this may or may not be a way forward on the "we cannot
have more than one process take a layout lease on a file/range", it at
least stops making it impossible. In other words, no one is going to
write a patch that allows sharing an exclusive layout lease--but someone
might well update some of these patches here to make it possible to
have multiple processes take unbreakable leases on the same file/range.

I haven't worked through everything there yet, but again:

* FL_UNBREAKABLE is the name you're looking for here, and

* I think we want to allow multiple processes to take FL_UNBREAKABLE
  leases on the same file/range, so that we can make RDMA setups
  reasonable. By "reasonable" I mean, "no need to have a lead process
  that owns all the leases".

thanks,
diff --git a/fs/locks.c b/fs/locks.c
index ad17c6ffca06..0c7359cdab92 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -626,6 +626,8 @@ static int lease_init(struct file *filp, long type, unsigned int flags,
 	fl->fl_flags = FL_LEASE;
 	if (flags & FL_LAYOUT)
 		fl->fl_flags |= FL_LAYOUT;
+	if (flags & FL_EXCLUSIVE)
+		fl->fl_flags |= FL_EXCLUSIVE;
 	fl->fl_start = 0;
 	fl->fl_end = OFFSET_MAX;
 	fl->fl_ops = NULL;
@@ -1619,6 +1621,14 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 	list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list) {
 		if (!leases_conflict(fl, new_fl))
 			continue;
+		if (fl->fl_flags & FL_EXCLUSIVE) {
+			error = -ETXTBSY;
+			if (new_fl->fl_pid == fl->fl_pid) {
+				error = -EDEADLOCK;
+				goto out;
+			}
+			continue;
+		}
 		if (want_write) {
 			if (fl->fl_flags & FL_UNLOCK_PENDING)
 				continue;
@@ -1634,6 +1644,13 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 		locks_delete_lock_ctx(fl, &dispose);
 	}
 
+	/* We differentiate between -EDEADLOCK and -ETXTBSY so the above loop
+	 * continues with -ETXTBSY looking for a potential deadlock instead.
+	 * If deadlock is not found go ahead and return -ETXTBSY.
+	 */
+	if (error == -ETXTBSY)
+		goto out;
+
 	if (list_empty(&ctx->flc_lease))
 		goto out;
 
@@ -2044,9 +2061,11 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
 	 * to revoke the lease in break_layout() And this is done by using
 	 * F_WRLCK in the break code.
 	 */
-	if (arg == F_LAYOUT) {
+	if ((arg & F_LAYOUT) == F_LAYOUT) {
+		if ((arg & F_EXCLUSIVE) == F_EXCLUSIVE)
+			flags |= FL_EXCLUSIVE;
 		arg = F_RDLCK;
-		flags = FL_LAYOUT;
+		flags |= FL_LAYOUT;
 	}
 
 	fl = lease_alloc(filp, arg, flags);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd60d5be9886..2e41ce547913 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1005,6 +1005,7 @@ static inline struct file *get_file(struct file *f)
 #define FL_UNLOCK_PENDING	512	/* Lease is being broken */
 #define FL_OFDLCK	1024	/* lock is "owned" by struct file */
 #define FL_LAYOUT	2048	/* outstanding pNFS layout or user held pin */
+#define FL_EXCLUSIVE	4096	/* Layout lease is exclusive */
 
 #define FL_CLOSE_POSIX (FL_POSIX | FL_CLOSE)
 
diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
index baddd54f3031..88b175ceccbc 100644
--- a/include/uapi/asm-generic/fcntl.h
+++ b/include/uapi/asm-generic/fcntl.h
@@ -176,6 +176,8 @@ struct f_owner_ex {
 
 #define F_LAYOUT	16	/* layout lease to allow longterm pins such as
 				   RDMA */
+#define F_EXCLUSIVE	32	/* layout lease is exclusive */
+				/* FIXME or shoudl this be F_EXLCK??? */
 
 /* operations for bsd flock(), also used by the kernel implementation */
 #define LOCK_SH		1	/* shared lock */
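The error selection added to __break_lease() in the patch can be followed more easily in a stripped-down user-space model. The sketch below is an illustration only: struct model_lease and model_break_lease() are hypothetical names, every lease in the array is assumed to conflict with the breaker, and the actual breaking of non-exclusive leases is omitted. The FL_* values are the ones shown in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EDEADLOCK
#define EDEADLOCK EDEADLK	/* fallback for non-Linux hosts */
#endif

#define FL_LAYOUT    2048	/* value from the patch */
#define FL_EXCLUSIVE 4096	/* value from the patch */

/* Hypothetical, minimal stand-in for a held lease. */
struct model_lease {
	int pid;
	unsigned int flags;
};

/*
 * Model of the patched loop: any exclusive lease makes the break fail
 * with -ETXTBSY, but the scan continues so that an exclusive lease
 * owned by the breaker itself is reported as -EDEADLOCK instead.
 */
int model_break_lease(const struct model_lease *leases, size_t n,
		      int breaker_pid)
{
	int error = 0;

	for (size_t i = 0; i < n; i++) {
		if (leases[i].flags & FL_EXCLUSIVE) {
			error = -ETXTBSY;
			if (leases[i].pid == breaker_pid)
				return -EDEADLOCK;
			continue;
		}
		/* A breakable lease would be broken here in the kernel. */
	}
	return error;
}

/* Hypothetical self-check covering the three outcomes. */
void model_break_self_check(void)
{
	const struct model_lease held[] = {
		{ 100, FL_LAYOUT },
		{ 200, FL_LAYOUT | FL_EXCLUSIVE },
	};

	assert(model_break_lease(held, 2, 300) == -ETXTBSY);
	assert(model_break_lease(held, 2, 200) == -EDEADLOCK);
	assert(model_break_lease(held, 1, 100) == 0);
}
```

The model makes one property of the patch visible: -EDEADLOCK takes priority over -ETXTBSY regardless of where the breaker's own lease sits in the list, because the loop keeps scanning after it first records -ETXTBSY.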