[RFC] filelock: new helper: vfs_file_has_locks

Message ID	20221114140747.134928-1-jlayton@kernel.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <ceph-devel-owner@kernel.org> From: Jeff Layton <jlayton@kernel.org> To: chuck.lever@oracle.com, xiubli@redhat.com Cc: linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org Subject: [RFC PATCH] filelock: new helper: vfs_file_has_locks Date: Mon, 14 Nov 2022 09:07:47 -0500 Message-Id: <20221114140747.134928-1-jlayton@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[RFC] filelock: new helper: vfs_file_has_locks \| expand [RFC] filelock: new helper: vfs_file_has_locks

Jeff Layton Nov. 14, 2022, 2:07 p.m. UTC

Ceph has a need to know whether a particular file has any locks set on
it. It's currently tracking that by a num_locks field in its
filp->private_data, but that's problematic as it tries to decrement this
field when releasing locks and that can race with the file being torn
down.

Add a new vfs_file_has_locks helper that will scan the flock and posix
lists, and return true if any of the locks have a fl_file that matches
the given one. Ceph can then call this instead of doing its own
tracking.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  1 +
 2 files changed, 37 insertions(+)

Xiubo,

Here's what I was thinking instead of trying to track this within ceph.
Most inodes never have locks set, so in most cases this will be a NULL
pointer check.

Chuck Lever Nov. 14, 2022, 2:19 p.m. UTC | #1

> On Nov 14, 2022, at 9:07 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> Ceph has a need to know whether a particular file has any locks set on
> it. It's currently tracking that by a num_locks field in its
> filp->private_data, but that's problematic as it tries to decrement this
> field when releasing locks and that can race with the file being torn
> down.
> 
> Add a new vfs_file_has_locks helper that will scan the flock and posix
> lists, and return true if any of the locks have a fl_file that matches
> the given one. Ceph can then call this instead of doing its own
> tracking.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

LGTM.


> ---
> fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
> include/linux/fs.h |  1 +
> 2 files changed, 37 insertions(+)
> 
> Xiubo,
> 
> Here's what I was thinking instead of trying to track this within ceph.
> Most inodes never have locks set, so in most cases this will be a NULL
> pointer check.
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index 5876c8ff0edc..c7f903b63a53 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2672,6 +2672,42 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
> }
> EXPORT_SYMBOL_GPL(vfs_cancel_lock);
> 
> +/**
> + * vfs_file_has_locks - are any locks held that were set on @filp?
> + * @filp: open file to check for locks
> + *
> + * Return true if are any FL_POSIX or FL_FLOCK locks currently held
> + * on @filp.
> + */
> +bool vfs_file_has_locks(struct file *filp)
> +{
> +	struct file_lock_context *ctx;
> +	struct file_lock *fl;
> +	bool ret = false;
> +
> +	ctx = smp_load_acquire(&locks_inode(filp)->i_flctx);
> +	if (!ctx)
> +		return false;
> +
> +	spin_lock(&ctx->flc_lock);
> +	list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> +		if (fl->fl_file == filp) {
> +			ret = true;
> +			goto out;
> +		}
> +	}
> +	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
> +		if (fl->fl_file == filp) {
> +			ret = true;
> +			break;
> +		}
> +	}
> +out:
> +	spin_unlock(&ctx->flc_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfs_file_has_locks);
> +
> #ifdef CONFIG_PROC_FS
> #include <linux/proc_fs.h>
> #include <linux/seq_file.h>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e654435f1651..e4d0f1fa7f9f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
> extern int vfs_test_lock(struct file *, struct file_lock *);
> extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
> extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> +bool vfs_file_has_locks(struct file *file);
> extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
> extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
> extern void lease_get_mtime(struct inode *, struct timespec64 *time);
> -- 
> 2.38.1
> 

--
Chuck Lever

Jeff Layton Nov. 14, 2022, 7:46 p.m. UTC | #2

On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote:
> Ceph has a need to know whether a particular file has any locks set on
> it. It's currently tracking that by a num_locks field in its
> filp->private_data, but that's problematic as it tries to decrement this
> field when releasing locks and that can race with the file being torn
> down.
> 
> Add a new vfs_file_has_locks helper that will scan the flock and posix
> lists, and return true if any of the locks have a fl_file that matches
> the given one. Ceph can then call this instead of doing its own
> tracking.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |  1 +
>  2 files changed, 37 insertions(+)
> 
> Xiubo,
> 
> Here's what I was thinking instead of trying to track this within ceph.
> Most inodes never have locks set, so in most cases this will be a NULL
> pointer check.
> 
> 
> 

I went ahead and added a slightly updated version of this this to my
locks-next branch for now, but...

Thinking about this more...I'm not sure this whole concept of what the
ceph code is trying to do makes sense. Locks only conflict if they have
different owners, and POSIX locks are owned by the process. Consider
this scenario (obviously, this is not a problem with OFD locks).

A process has the same file open via two different fds. It sets lock A
from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via
fd 2. The two locks will be merged, because they don't conflict (because
it's the same process).

Against which fd should the merged lock record be counted?

Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when
the fd hasn't had any locks explicitly set on it?

> diff --git a/fs/locks.c b/fs/locks.c
> index 5876c8ff0edc..c7f903b63a53 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2672,6 +2672,42 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
>  }
>  EXPORT_SYMBOL_GPL(vfs_cancel_lock);
>  
> +/**
> + * vfs_file_has_locks - are any locks held that were set on @filp?
> + * @filp: open file to check for locks
> + *
> + * Return true if are any FL_POSIX or FL_FLOCK locks currently held
> + * on @filp.
> + */
> +bool vfs_file_has_locks(struct file *filp)
> +{
> +	struct file_lock_context *ctx;
> +	struct file_lock *fl;
> +	bool ret = false;
> +
> +	ctx = smp_load_acquire(&locks_inode(filp)->i_flctx);
> +	if (!ctx)
> +		return false;
> +
> +	spin_lock(&ctx->flc_lock);
> +	list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> +		if (fl->fl_file == filp) {
> +			ret = true;
> +			goto out;
> +		}
> +	}
> +	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
> +		if (fl->fl_file == filp) {
> +			ret = true;
> +			break;
> +		}
> +	}
> +out:
> +	spin_unlock(&ctx->flc_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfs_file_has_locks);
> +
>  #ifdef CONFIG_PROC_FS
>  #include <linux/proc_fs.h>
>  #include <linux/seq_file.h>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e654435f1651..e4d0f1fa7f9f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
>  extern int vfs_test_lock(struct file *, struct file_lock *);
>  extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>  extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> +bool vfs_file_has_locks(struct file *file);
>  extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
>  extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>  extern void lease_get_mtime(struct inode *, struct timespec64 *time);

Xiubo Li Nov. 15, 2022, 5:43 a.m. UTC | #3

On 15/11/2022 03:46, Jeff Layton wrote:
> On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote:
>> Ceph has a need to know whether a particular file has any locks set on
>> it. It's currently tracking that by a num_locks field in its
>> filp->private_data, but that's problematic as it tries to decrement this
>> field when releasing locks and that can race with the file being torn
>> down.
>>
>> Add a new vfs_file_has_locks helper that will scan the flock and posix
>> lists, and return true if any of the locks have a fl_file that matches
>> the given one. Ceph can then call this instead of doing its own
>> tracking.
>>
>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> ---
>>   fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
>>   include/linux/fs.h |  1 +
>>   2 files changed, 37 insertions(+)
>>
>> Xiubo,
>>
>> Here's what I was thinking instead of trying to track this within ceph.
>> Most inodes never have locks set, so in most cases this will be a NULL
>> pointer check.
>>
>>
>>
> I went ahead and added a slightly updated version of this this to my
> locks-next branch for now, but...
>
> Thinking about this more...I'm not sure this whole concept of what the
> ceph code is trying to do makes sense. Locks only conflict if they have
> different owners, and POSIX locks are owned by the process. Consider
> this scenario (obviously, this is not a problem with OFD locks).
>
> A process has the same file open via two different fds. It sets lock A
> from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via
> fd 2. The two locks will be merged, because they don't conflict (because
> it's the same process).
>
> Against which fd should the merged lock record be counted?

Thanks Jeff.

For the above example as you mentioned, from my reading of the lock code 
after being merged it will always keep the old file_lock's fl_file.

There is another case that if the Inode already has LockA and LockB:

Lock A --> [0, 9] --> fileA

Lock B --> [15, 20] --> fileB

And then LockC comes:

Lock C --> [8, 16] --> fileC

Then the inode will only have the LockB:

Lock B --> [0, 20] --> fileB.

So the exiting ceph code seems buggy!

>
> Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when
> the fd hasn't had any locks explicitly set on it?

Maybe we should check whether any POSIX lock exist, if so we should 
check CEPH_I_ERROR_FILELOCK always. Or we need to check it depending on 
each fd ?

Thanks!

- Xiubo


>> diff --git a/fs/locks.c b/fs/locks.c
>> index 5876c8ff0edc..c7f903b63a53 100644
>> --- a/fs/locks.c
>> +++ b/fs/locks.c
>> @@ -2672,6 +2672,42 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
>>   }
>>   EXPORT_SYMBOL_GPL(vfs_cancel_lock);
>>   
>> +/**
>> + * vfs_file_has_locks - are any locks held that were set on @filp?
>> + * @filp: open file to check for locks
>> + *
>> + * Return true if are any FL_POSIX or FL_FLOCK locks currently held
>> + * on @filp.
>> + */
>> +bool vfs_file_has_locks(struct file *filp)
>> +{
>> +	struct file_lock_context *ctx;
>> +	struct file_lock *fl;
>> +	bool ret = false;
>> +
>> +	ctx = smp_load_acquire(&locks_inode(filp)->i_flctx);
>> +	if (!ctx)
>> +		return false;
>> +
>> +	spin_lock(&ctx->flc_lock);
>> +	list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
>> +		if (fl->fl_file == filp) {
>> +			ret = true;
>> +			goto out;
>> +		}
>> +	}
>> +	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
>> +		if (fl->fl_file == filp) {
>> +			ret = true;
>> +			break;
>> +		}
>> +	}
>> +out:
>> +	spin_unlock(&ctx->flc_lock);
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(vfs_file_has_locks);
>> +
>>   #ifdef CONFIG_PROC_FS
>>   #include <linux/proc_fs.h>
>>   #include <linux/seq_file.h>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index e654435f1651..e4d0f1fa7f9f 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
>>   extern int vfs_test_lock(struct file *, struct file_lock *);
>>   extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>>   extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
>> +bool vfs_file_has_locks(struct file *file);
>>   extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
>>   extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>>   extern void lease_get_mtime(struct inode *, struct timespec64 *time);

Christoph Hellwig Nov. 15, 2022, 8:54 a.m. UTC | #4

On Mon, Nov 14, 2022 at 09:07:47AM -0500, Jeff Layton wrote:
> +bool vfs_file_has_locks(struct file *filp)
> +{
> +	struct file_lock_context *ctx;
> +	struct file_lock *fl;
> +	bool ret = false;
> +
> +	ctx = smp_load_acquire(&locks_inode(filp)->i_flctx);
> +	if (!ctx)
> +		return false;
> +
> +	spin_lock(&ctx->flc_lock);
> +	list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
> +		if (fl->fl_file == filp) {
> +			ret = true;
> +			goto out;
> +		}
> +	}
> +	list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
> +		if (fl->fl_file == filp) {
> +			ret = true;
> +			break;
> +		}
> +	}

Maybe a little helper for the list lookup would be nice here:
static inline bool __vfs_file_has_locks(struct file *file)
{
	struct file_lock *fl;

	list_for_each_entry(fl, &ctx->flc_flock, fl_list)
		if (fl->fl_file == filp)
			return true;
	return false;
}

simplifying the check in the caller to:

	ret = __vfs_file_has_locks(&ctx->flc_posix) ||
	      __vfs_file_has_locks(&ctx->flc_flock);

> +EXPORT_SYMBOL(vfs_file_has_locks);

EXPORT_SYMBOL_GPL for any new network-fsy functionality would be nice.

Jeff Layton Nov. 15, 2022, 2:40 p.m. UTC | #5

On Tue, 2022-11-15 at 13:43 +0800, Xiubo Li wrote:
> On 15/11/2022 03:46, Jeff Layton wrote:
> > On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote:
> > > Ceph has a need to know whether a particular file has any locks set on
> > > it. It's currently tracking that by a num_locks field in its
> > > filp->private_data, but that's problematic as it tries to decrement this
> > > field when releasing locks and that can race with the file being torn
> > > down.
> > > 
> > > Add a new vfs_file_has_locks helper that will scan the flock and posix
> > > lists, and return true if any of the locks have a fl_file that matches
> > > the given one. Ceph can then call this instead of doing its own
> > > tracking.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >   fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
> > >   include/linux/fs.h |  1 +
> > >   2 files changed, 37 insertions(+)
> > > 
> > > Xiubo,
> > > 
> > > Here's what I was thinking instead of trying to track this within ceph.
> > > Most inodes never have locks set, so in most cases this will be a NULL
> > > pointer check.
> > > 
> > > 
> > > 
> > I went ahead and added a slightly updated version of this this to my
> > locks-next branch for now, but...
> > 
> > Thinking about this more...I'm not sure this whole concept of what the
> > ceph code is trying to do makes sense. Locks only conflict if they have
> > different owners, and POSIX locks are owned by the process. Consider
> > this scenario (obviously, this is not a problem with OFD locks).
> > 
> > A process has the same file open via two different fds. It sets lock A
> > from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via
> > fd 2. The two locks will be merged, because they don't conflict (because
> > it's the same process).
> > 
> > Against which fd should the merged lock record be counted?
> 
> Thanks Jeff.
> 
> For the above example as you mentioned, from my reading of the lock code 
> after being merged it will always keep the old file_lock's fl_file.
> 
> There is another case that if the Inode already has LockA and LockB:
> 
> Lock A --> [0, 9] --> fileA
> 
> Lock B --> [15, 20] --> fileB
> 
> And then LockC comes:
> 
> Lock C --> [8, 16] --> fileC
> 
> Then the inode will only have the LockB:
> 
> Lock B --> [0, 20] --> fileB.
> 
> So the exiting ceph code seems buggy!
> 

Yeah, there are a number of ways to end up with a different fl_file than
you started with.
 
> > 
> > Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when
> > the fd hasn't had any locks explicitly set on it?
> 
> Maybe we should check whether any POSIX lock exist, if so we should 
> check CEPH_I_ERROR_FILELOCK always. Or we need to check it depending on 
> each fd ?
> 
> 

It was originally added here:

commit ff5d913dfc7142974eb1694d5fd6284658e46bc6
Author: Yan, Zheng <zyan@redhat.com>
Date:   Thu Jul 25 20:16:45 2019 +0800

    ceph: return -EIO if read/write against filp that lost file locks
    
    After mds evicts session, file locks get lost sliently. It's not safe to
    let programs continue to do read/write.
    
    Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

So I guess with the current code if you have the file open and set a
lock on it, you'll get back EIO when you try to get caps for it, but if
you never set a lock on the fd, then you wouldn't get an error. We don't
reliably keep track of what fd was used to set a lock (as noted above),
so we can't really do what Zheng was trying to do here.

Having a file where some openers use locking and others don't is a
really odd usage pattern though. Locks are like stoplights -- they only
work if everyone pays attention to them.

I think we should probably switch ceph_get_caps to just check whether
any locks are set on the file. If there are POSIX/OFD/FLOCK locks on the
file at the time, we should set CHECK_FILELOCK, regardless of what fd
was used to set the lock.

In practical terms, we probably want a vfs_inode_has_locks function,
that just tests whether the flc_posix and flc_flock lists are empty.

Maybe something like this instead? Then ceph could call this from
ceph_get_caps and set CHECK_FILELOCK if it returns true.

-------------8<---------------

[PATCH] filelock: new helper: vfs_inode_has_locks

Ceph has a need to know whether a particular inode has any locks set on
it. It's currently tracking that by a num_locks field in its
filp->private_data, but that's problematic as it tries to decrement this
field when releasing locks and that can race with the file being torn
down.

Add a new vfs_inode_has_locks helper that just returns whether any locks
are currently held on the inode.

Cc: Xiubo Li <xiubli@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/locks.c         | 23 +++++++++++++++++++++++
 include/linux/fs.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 5876c8ff0edc..9ccf89b6c95d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2672,6 +2672,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
 }
 EXPORT_SYMBOL_GPL(vfs_cancel_lock);
 
+/**
+ * vfs_inode_has_locks - are any file locks held on @inode?
+ * @inode: inode to check for locks
+ *
+ * Return true if there are any FL_POSIX or FL_FLOCK locks currently
+ * set on @inode.
+ */
+bool vfs_inode_has_locks(struct inode *inode)
+{
+	struct file_lock_context *ctx;
+	bool ret;
+
+	ctx = smp_load_acquire(&inode->i_flctx);
+	if (!ctx)
+		return false;
+
+	spin_lock(&ctx->flc_lock);
+	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
+	spin_unlock(&ctx->flc_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
+
 #ifdef CONFIG_PROC_FS
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e654435f1651..d6cb42b7e91c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
 extern int vfs_test_lock(struct file *, struct file_lock *);
 extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
 extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
+bool vfs_inode_has_locks(struct inode *inode);
 extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
 extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
 extern void lease_get_mtime(struct inode *, struct timespec64 *time);

Christoph Hellwig Nov. 16, 2022, 6:16 a.m. UTC | #6

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Xiubo Li Nov. 16, 2022, 6:49 a.m. UTC | #7

On 15/11/2022 22:40, Jeff Layton wrote:
> On Tue, 2022-11-15 at 13:43 +0800, Xiubo Li wrote:
>> On 15/11/2022 03:46, Jeff Layton wrote:
>>> On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote:
>>>> Ceph has a need to know whether a particular file has any locks set on
>>>> it. It's currently tracking that by a num_locks field in its
>>>> filp->private_data, but that's problematic as it tries to decrement this
>>>> field when releasing locks and that can race with the file being torn
>>>> down.
>>>>
>>>> Add a new vfs_file_has_locks helper that will scan the flock and posix
>>>> lists, and return true if any of the locks have a fl_file that matches
>>>> the given one. Ceph can then call this instead of doing its own
>>>> tracking.
>>>>
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>>    fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
>>>>    include/linux/fs.h |  1 +
>>>>    2 files changed, 37 insertions(+)
>>>>
>>>> Xiubo,
>>>>
>>>> Here's what I was thinking instead of trying to track this within ceph.
>>>> Most inodes never have locks set, so in most cases this will be a NULL
>>>> pointer check.
>>>>
>>>>
>>>>
>>> I went ahead and added a slightly updated version of this this to my
>>> locks-next branch for now, but...
>>>
>>> Thinking about this more...I'm not sure this whole concept of what the
>>> ceph code is trying to do makes sense. Locks only conflict if they have
>>> different owners, and POSIX locks are owned by the process. Consider
>>> this scenario (obviously, this is not a problem with OFD locks).
>>>
>>> A process has the same file open via two different fds. It sets lock A
>>> from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via
>>> fd 2. The two locks will be merged, because they don't conflict (because
>>> it's the same process).
>>>
>>> Against which fd should the merged lock record be counted?
>> Thanks Jeff.
>>
>> For the above example as you mentioned, from my reading of the lock code
>> after being merged it will always keep the old file_lock's fl_file.
>>
>> There is another case that if the Inode already has LockA and LockB:
>>
>> Lock A --> [0, 9] --> fileA
>>
>> Lock B --> [15, 20] --> fileB
>>
>> And then LockC comes:
>>
>> Lock C --> [8, 16] --> fileC
>>
>> Then the inode will only have the LockB:
>>
>> Lock B --> [0, 20] --> fileB.
>>
>> So the exiting ceph code seems buggy!
>>
> Yeah, there are a number of ways to end up with a different fl_file than
> you started with.
>   
>>> Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when
>>> the fd hasn't had any locks explicitly set on it?
>> Maybe we should check whether any POSIX lock exist, if so we should
>> check CEPH_I_ERROR_FILELOCK always. Or we need to check it depending on
>> each fd ?
>>
>>
> It was originally added here:
>
> commit ff5d913dfc7142974eb1694d5fd6284658e46bc6
> Author: Yan, Zheng <zyan@redhat.com>
> Date:   Thu Jul 25 20:16:45 2019 +0800
>
>      ceph: return -EIO if read/write against filp that lost file locks
>      
>      After mds evicts session, file locks get lost sliently. It's not safe to
>      let programs continue to do read/write.
>      
>      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
>      Reviewed-by: Jeff Layton <jlayton@kernel.org>
>      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
>
> So I guess with the current code if you have the file open and set a
> lock on it, you'll get back EIO when you try to get caps for it, but if
> you never set a lock on the fd, then you wouldn't get an error. We don't
> reliably keep track of what fd was used to set a lock (as noted above),
> so we can't really do what Zheng was trying to do here.
>
> Having a file where some openers use locking and others don't is a
> really odd usage pattern though. Locks are like stoplights -- they only
> work if everyone pays attention to them.
>
> I think we should probably switch ceph_get_caps to just check whether
> any locks are set on the file. If there are POSIX/OFD/FLOCK locks on the
> file at the time, we should set CHECK_FILELOCK, regardless of what fd
> was used to set the lock.
>
> In practical terms, we probably want a vfs_inode_has_locks function,
> that just tests whether the flc_posix and flc_flock lists are empty.

Jeff,

Yeah, this sounds good to me.


> Maybe something like this instead? Then ceph could call this from
> ceph_get_caps and set CHECK_FILELOCK if it returns true.
>
> -------------8<---------------
>
> [PATCH] filelock: new helper: vfs_inode_has_locks
>
> Ceph has a need to know whether a particular inode has any locks set on
> it. It's currently tracking that by a num_locks field in its
> filp->private_data, but that's problematic as it tries to decrement this
> field when releasing locks and that can race with the file being torn
> down.
>
> Add a new vfs_inode_has_locks helper that just returns whether any locks
> are currently held on the inode.
>
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/locks.c         | 23 +++++++++++++++++++++++
>   include/linux/fs.h |  1 +
>   2 files changed, 24 insertions(+)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 5876c8ff0edc..9ccf89b6c95d 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2672,6 +2672,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
>   }
>   EXPORT_SYMBOL_GPL(vfs_cancel_lock);
>   
> +/**
> + * vfs_inode_has_locks - are any file locks held on @inode?
> + * @inode: inode to check for locks
> + *
> + * Return true if there are any FL_POSIX or FL_FLOCK locks currently
> + * set on @inode.
> + */
> +bool vfs_inode_has_locks(struct inode *inode)
> +{
> +	struct file_lock_context *ctx;
> +	bool ret;
> +
> +	ctx = smp_load_acquire(&inode->i_flctx);
> +	if (!ctx)
> +		return false;
> +
> +	spin_lock(&ctx->flc_lock);
> +	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
> +	spin_unlock(&ctx->flc_lock);

BTW, is the spin_lock/spin_unlock here really needed ?

> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
> +
>   #ifdef CONFIG_PROC_FS
>   #include <linux/proc_fs.h>
>   #include <linux/seq_file.h>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e654435f1651..d6cb42b7e91c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
>   extern int vfs_test_lock(struct file *, struct file_lock *);
>   extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>   extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> +bool vfs_inode_has_locks(struct inode *inode);
>   extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
>   extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>   extern void lease_get_mtime(struct inode *, struct timespec64 *time);

All the others LGTM.

Thanks.

- Xiubo

Jeff Layton Nov. 16, 2022, 10:55 a.m. UTC | #8

On Wed, 2022-11-16 at 14:49 +0800, Xiubo Li wrote:
> On 15/11/2022 22:40, Jeff Layton wrote:
> > On Tue, 2022-11-15 at 13:43 +0800, Xiubo Li wrote:
> > > On 15/11/2022 03:46, Jeff Layton wrote:
> > > > On Mon, 2022-11-14 at 09:07 -0500, Jeff Layton wrote:
> > > > > Ceph has a need to know whether a particular file has any locks set on
> > > > > it. It's currently tracking that by a num_locks field in its
> > > > > filp->private_data, but that's problematic as it tries to decrement this
> > > > > field when releasing locks and that can race with the file being torn
> > > > > down.
> > > > > 
> > > > > Add a new vfs_file_has_locks helper that will scan the flock and posix
> > > > > lists, and return true if any of the locks have a fl_file that matches
> > > > > the given one. Ceph can then call this instead of doing its own
> > > > > tracking.
> > > > > 
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > ---
> > > > >    fs/locks.c         | 36 ++++++++++++++++++++++++++++++++++++
> > > > >    include/linux/fs.h |  1 +
> > > > >    2 files changed, 37 insertions(+)
> > > > > 
> > > > > Xiubo,
> > > > > 
> > > > > Here's what I was thinking instead of trying to track this within ceph.
> > > > > Most inodes never have locks set, so in most cases this will be a NULL
> > > > > pointer check.
> > > > > 
> > > > > 
> > > > > 
> > > > I went ahead and added a slightly updated version of this this to my
> > > > locks-next branch for now, but...
> > > > 
> > > > Thinking about this more...I'm not sure this whole concept of what the
> > > > ceph code is trying to do makes sense. Locks only conflict if they have
> > > > different owners, and POSIX locks are owned by the process. Consider
> > > > this scenario (obviously, this is not a problem with OFD locks).
> > > > 
> > > > A process has the same file open via two different fds. It sets lock A
> > > > from offset 0..9 via fd 1. Now, same process sets lock B from 10..19 via
> > > > fd 2. The two locks will be merged, because they don't conflict (because
> > > > it's the same process).
> > > > 
> > > > Against which fd should the merged lock record be counted?
> > > Thanks Jeff.
> > > 
> > > For the above example as you mentioned, from my reading of the lock code
> > > after being merged it will always keep the old file_lock's fl_file.
> > > 
> > > There is another case that if the Inode already has LockA and LockB:
> > > 
> > > Lock A --> [0, 9] --> fileA
> > > 
> > > Lock B --> [15, 20] --> fileB
> > > 
> > > And then LockC comes:
> > > 
> > > Lock C --> [8, 16] --> fileC
> > > 
> > > Then the inode will only have the LockB:
> > > 
> > > Lock B --> [0, 20] --> fileB.
> > > 
> > > So the exiting ceph code seems buggy!
> > > 
> > Yeah, there are a number of ways to end up with a different fl_file than
> > you started with.
> >   
> > > > Would it be better to always check for CEPH_I_ERROR_FILELOCK, even when
> > > > the fd hasn't had any locks explicitly set on it?
> > > Maybe we should check whether any POSIX lock exist, if so we should
> > > check CEPH_I_ERROR_FILELOCK always. Or we need to check it depending on
> > > each fd ?
> > > 
> > > 
> > It was originally added here:
> > 
> > commit ff5d913dfc7142974eb1694d5fd6284658e46bc6
> > Author: Yan, Zheng <zyan@redhat.com>
> > Date:   Thu Jul 25 20:16:45 2019 +0800
> > 
> >      ceph: return -EIO if read/write against filp that lost file locks
> >      
> >      After mds evicts session, file locks get lost sliently. It's not safe to
> >      let programs continue to do read/write.
> >      
> >      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
> >      Reviewed-by: Jeff Layton <jlayton@kernel.org>
> >      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
> > 
> > So I guess with the current code if you have the file open and set a
> > lock on it, you'll get back EIO when you try to get caps for it, but if
> > you never set a lock on the fd, then you wouldn't get an error. We don't
> > reliably keep track of what fd was used to set a lock (as noted above),
> > so we can't really do what Zheng was trying to do here.
> > 
> > Having a file where some openers use locking and others don't is a
> > really odd usage pattern though. Locks are like stoplights -- they only
> > work if everyone pays attention to them.
> > 
> > I think we should probably switch ceph_get_caps to just check whether
> > any locks are set on the file. If there are POSIX/OFD/FLOCK locks on the
> > file at the time, we should set CHECK_FILELOCK, regardless of what fd
> > was used to set the lock.
> > 
> > In practical terms, we probably want a vfs_inode_has_locks function,
> > that just tests whether the flc_posix and flc_flock lists are empty.
> 
> Jeff,
> 
> Yeah, this sounds good to me.
> 
> 
> > Maybe something like this instead? Then ceph could call this from
> > ceph_get_caps and set CHECK_FILELOCK if it returns true.
> > 
> > -------------8<---------------
> > 
> > [PATCH] filelock: new helper: vfs_inode_has_locks
> > 
> > Ceph has a need to know whether a particular inode has any locks set on
> > it. It's currently tracking that by a num_locks field in its
> > filp->private_data, but that's problematic as it tries to decrement this
> > field when releasing locks and that can race with the file being torn
> > down.
> > 
> > Add a new vfs_inode_has_locks helper that just returns whether any locks
> > are currently held on the inode.
> > 
> > Cc: Xiubo Li <xiubli@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >   fs/locks.c         | 23 +++++++++++++++++++++++
> >   include/linux/fs.h |  1 +
> >   2 files changed, 24 insertions(+)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 5876c8ff0edc..9ccf89b6c95d 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -2672,6 +2672,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
> >   }
> >   EXPORT_SYMBOL_GPL(vfs_cancel_lock);
> >   
> > +/**
> > + * vfs_inode_has_locks - are any file locks held on @inode?
> > + * @inode: inode to check for locks
> > + *
> > + * Return true if there are any FL_POSIX or FL_FLOCK locks currently
> > + * set on @inode.
> > + */
> > +bool vfs_inode_has_locks(struct inode *inode)
> > +{
> > +	struct file_lock_context *ctx;
> > +	bool ret;
> > +
> > +	ctx = smp_load_acquire(&inode->i_flctx);
> > +	if (!ctx)
> > +		return false;
> > +
> > +	spin_lock(&ctx->flc_lock);
> > +	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
> > +	spin_unlock(&ctx->flc_lock);
> 
> BTW, is the spin_lock/spin_unlock here really needed ?
> 

We could probably achieve the same effect with barriers, but I doubt
it's worth it. The flc_lock only protects the lists in the
file_lock_context, so it should almost always be uncontended.


> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
> > +
> >   #ifdef CONFIG_PROC_FS
> >   #include <linux/proc_fs.h>
> >   #include <linux/seq_file.h>
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index e654435f1651..d6cb42b7e91c 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
> >   extern int vfs_test_lock(struct file *, struct file_lock *);
> >   extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
> >   extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> > +bool vfs_inode_has_locks(struct inode *inode);
> >   extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
> >   extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
> >   extern void lease_get_mtime(struct inode *, struct timespec64 *time);
> 
> All the others LGTM.
> 
> Thanks.
> 
> - Xiubo
> 
> 

Thanks. I'll re-post it "officially" in a bit and will queue it up for
v6.2.

Xiubo Li Nov. 16, 2022, 11:16 a.m. UTC | #9

On 16/11/2022 18:55, Jeff Layton wrote:
> On Wed, 2022-11-16 at 14:49 +0800, Xiubo Li wrote:
>> On 15/11/2022 22:40, Jeff Layton wrote:
>>
...
>>> +	spin_lock(&ctx->flc_lock);
>>> +	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
>>> +	spin_unlock(&ctx->flc_lock);
>> BTW, is the spin_lock/spin_unlock here really needed ?
>>
> We could probably achieve the same effect with barriers, but I doubt
> it's worth it. The flc_lock only protects the lists in the
> file_lock_context, so it should almost always be uncontended.
>
I just see some other places where are also checking this don't use the 
spin lock.

Thanks,

- Xiubo

>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
>>> +
>>>    #ifdef CONFIG_PROC_FS
>>>    #include <linux/proc_fs.h>
>>>    #include <linux/seq_file.h>
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index e654435f1651..d6cb42b7e91c 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
>>>    extern int vfs_test_lock(struct file *, struct file_lock *);
>>>    extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>>>    extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
>>> +bool vfs_inode_has_locks(struct inode *inode);
>>>    extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
>>>    extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>>>    extern void lease_get_mtime(struct inode *, struct timespec64 *time);
>> All the others LGTM.
>>
>> Thanks.
>>
>> - Xiubo
>>
>>
> Thanks. I'll re-post it "officially" in a bit and will queue it up for
> v6.2.

Jeff Layton Nov. 16, 2022, 11:25 a.m. UTC | #10

On Wed, 2022-11-16 at 19:16 +0800, Xiubo Li wrote:
> On 16/11/2022 18:55, Jeff Layton wrote:
> > On Wed, 2022-11-16 at 14:49 +0800, Xiubo Li wrote:
> > > On 15/11/2022 22:40, Jeff Layton wrote:
> > > 
> ...
> > > > +	spin_lock(&ctx->flc_lock);
> > > > +	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
> > > > +	spin_unlock(&ctx->flc_lock);
> > > BTW, is the spin_lock/spin_unlock here really needed ?
> > > 
> > We could probably achieve the same effect with barriers, but I doubt
> > it's worth it. The flc_lock only protects the lists in the
> > file_lock_context, so it should almost always be uncontended.
> > 
> I just see some other places where are also checking this don't use the 
> spin lock.
> 
> 

True.

There are a number of places that don't and that use list_empty_careful.
Some of those  We could convert to that here, but again, I'm not sure
it's worth it. Let's stick with using the spinlocks here, since this
isn't a performance-critical codepath anyway.

> 
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
> > > > +
> > > >    #ifdef CONFIG_PROC_FS
> > > >    #include <linux/proc_fs.h>
> > > >    #include <linux/seq_file.h>
> > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > index e654435f1651..d6cb42b7e91c 100644
> > > > --- a/include/linux/fs.h
> > > > +++ b/include/linux/fs.h
> > > > @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
> > > >    extern int vfs_test_lock(struct file *, struct file_lock *);
> > > >    extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
> > > >    extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> > > > +bool vfs_inode_has_locks(struct inode *inode);
> > > >    extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
> > > >    extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
> > > >    extern void lease_get_mtime(struct inode *, struct timespec64 *time);
> > > All the others LGTM.
> > > 
> > > Thanks.
> > > 
> > > - Xiubo
> > > 
> > > 
> > Thanks. I'll re-post it "officially" in a bit and will queue it up for
> > v6.2.
>

Xiubo Li Nov. 16, 2022, 1:24 p.m. UTC | #11

On 16/11/2022 19:25, Jeff Layton wrote:
> On Wed, 2022-11-16 at 19:16 +0800, Xiubo Li wrote:
>> On 16/11/2022 18:55, Jeff Layton wrote:
>>> On Wed, 2022-11-16 at 14:49 +0800, Xiubo Li wrote:
>>>> On 15/11/2022 22:40, Jeff Layton wrote:
>>>>
>> ...
>>>>> +	spin_lock(&ctx->flc_lock);
>>>>> +	ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock);
>>>>> +	spin_unlock(&ctx->flc_lock);
>>>> BTW, is the spin_lock/spin_unlock here really needed ?
>>>>
>>> We could probably achieve the same effect with barriers, but I doubt
>>> it's worth it. The flc_lock only protects the lists in the
>>> file_lock_context, so it should almost always be uncontended.
>>>
>> I just see some other places where are also checking this don't use the
>> spin lock.
>>
>>
> True.
>
> There are a number of places that don't and that use list_empty_careful.
> Some of those  We could convert to that here, but again, I'm not sure
> it's worth it. Let's stick with using the spinlocks here, since this
> isn't a performance-critical codepath anyway.
>
Okay.

Thanks!


>>>>> +	return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(vfs_inode_has_locks);
>>>>> +
>>>>>     #ifdef CONFIG_PROC_FS
>>>>>     #include <linux/proc_fs.h>
>>>>>     #include <linux/seq_file.h>
>>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>>> index e654435f1651..d6cb42b7e91c 100644
>>>>> --- a/include/linux/fs.h
>>>>> +++ b/include/linux/fs.h
>>>>> @@ -1170,6 +1170,7 @@ extern int locks_delete_block(struct file_lock *);
>>>>>     extern int vfs_test_lock(struct file *, struct file_lock *);
>>>>>     extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>>>>>     extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
>>>>> +bool vfs_inode_has_locks(struct inode *inode);
>>>>>     extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
>>>>>     extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>>>>>     extern void lease_get_mtime(struct inode *, struct timespec64 *time);
>>>> All the others LGTM.
>>>>
>>>> Thanks.
>>>>
>>>> - Xiubo
>>>>
>>>>
>>> Thanks. I'll re-post it "officially" in a bit and will queue it up for
>>> v6.2.

[RFC] filelock: new helper: vfs_file_has_locks

Commit Message

Comments

Patch