diff mbox series

fs/ceph/io: make ceph_start_io_*() killable

Message ID 20241206165014.165614-1-max.kellermann@ionos.com (mailing list archive)
State New
Headers show
Series fs/ceph/io: make ceph_start_io_*() killable | expand

Commit Message

Max Kellermann Dec. 6, 2024, 4:50 p.m. UTC
This allows killing processes that wait for a lock when one process is
stuck waiting for the Ceph server.  This is similar to the NFS commit
38a125b31504 ("fs/nfs/io: make nfs_start_io_*() killable").

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/ceph/file.c | 22 +++++++++++++---------
 fs/ceph/io.c   | 44 +++++++++++++++++++++++++++++++++-----------
 fs/ceph/io.h   |  8 +++++---
 3 files changed, 51 insertions(+), 23 deletions(-)

Comments

Viacheslav Dubeyko Dec. 6, 2024, 5:40 p.m. UTC | #1
On Fri, 2024-12-06 at 17:50 +0100, Max Kellermann wrote:
> This allows killing processes that wait for a lock when one process
> is
> stuck waiting for the Ceph server.  This is similar to the NFS commit
> 38a125b31504 ("fs/nfs/io: make nfs_start_io_*() killable").
> 
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---
>  fs/ceph/file.c | 22 +++++++++++++---------
>  fs/ceph/io.c   | 44 +++++++++++++++++++++++++++++++++-----------
>  fs/ceph/io.h   |  8 +++++---
>  3 files changed, 51 insertions(+), 23 deletions(-)
> 
> 

<skipped>

>  
>  /**
> diff --git a/fs/ceph/io.h b/fs/ceph/io.h
> index fa594cd77348..08d58253f533 100644
> --- a/fs/ceph/io.h
> +++ b/fs/ceph/io.h
> @@ -2,11 +2,13 @@
>  #ifndef _FS_CEPH_IO_H
>  #define _FS_CEPH_IO_H
>  
> -void ceph_start_io_read(struct inode *inode);
> +#include <linux/compiler_attributes.h> // for __must_check

Do we really need this comment (for __must_check)? It looks like not
very informative. What do you think?

I am not completely sure that it really needs to request compiler to
check that return value is processed. Do we really need to enforce it?

Thanks,
Slava.

> +
> +__must_check int ceph_start_io_read(struct inode *inode);
>  void ceph_end_io_read(struct inode *inode);
> -void ceph_start_io_write(struct inode *inode);
> +__must_check int ceph_start_io_write(struct inode *inode);
>  void ceph_end_io_write(struct inode *inode);
> -void ceph_start_io_direct(struct inode *inode);
> +__must_check int ceph_start_io_direct(struct inode *inode);
>  void ceph_end_io_direct(struct inode *inode);
>  
>  #endif /* FS_CEPH_IO_H */
Max Kellermann Dec. 6, 2024, 6:58 p.m. UTC | #2
On Fri, Dec 6, 2024 at 6:40 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> Do we really need this comment (for __must_check)? It looks like not
> very informative. What do you think?

That's a question of taste. For my taste, such comments are (not
needed but) helpful; many similar comments exist in the Linux kernel.

> I am not completely sure that it really needs to request compiler to
> check that return value is processed. Do we really need to enforce it?

Yes, should definitely be enforced. Callers which don't check the
return value are 100% buggy.
Viacheslav Dubeyko Dec. 6, 2024, 7:11 p.m. UTC | #3
On Fri, 2024-12-06 at 19:58 +0100, Max Kellermann wrote:
> On Fri, Dec 6, 2024 at 6:40 PM Viacheslav Dubeyko
> <Slava.Dubeyko@ibm.com> wrote:
> > Do we really need this comment (for __must_check)? It looks like
> > not
> > very informative. What do you think?
> 
> That's a question of taste. For my taste, such comments are (not
> needed but) helpful; many similar comments exist in the Linux kernel.

Yeah, I completely see your point. But I believe that #include
<linux/compiler_attributes.h> is already contains enough info. If
anybody would like to understand __must_check origin, then this guy
will end into compiler_attributes.h. Otherwise, we need to comment
every #include that sounds like overkill for my taste. :)

> 
> > I am not completely sure that it really needs to request compiler
> > to
> > check that return value is processed. Do we really need to enforce
> > it?
> 
> Yes, should definitely be enforced. Callers which don't check the
> return value are 100% buggy.

I definitely could agree with you here. But, frankly speaking, it could
depends on function's logic. There are many places in kernel where such
checking was skipped and no harm finally. In our case, we have return
value from down_write_killable() only, mostly. Should be the check of
this function's output mandatory? I am not fully sure. But I believe
you are more right here than me.

Thanks,
Slava.
Max Kellermann Dec. 6, 2024, 10:48 p.m. UTC | #4
On Fri, Dec 6, 2024 at 8:11 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
> Should be the check of
> this function's output mandatory? I am not fully sure.

But I am fully sure.
If you don't check the return value, you don't know whether the inode
was locked. If you don't know that, you can't decide whether you need
to unlock it. That being optional now (cancel locking if SIGKILL was
received) is the sole point of my patch. You MUST check the return
value. There is no other way. Don't trust my word - just read the
code.
Viacheslav Dubeyko Dec. 9, 2024, 6:59 p.m. UTC | #5
On Fri, 2024-12-06 at 23:48 +0100, Max Kellermann wrote:
> On Fri, Dec 6, 2024 at 8:11 PM Viacheslav Dubeyko
> <Slava.Dubeyko@ibm.com> wrote:
> > Should be the check of
> > this function's output mandatory? I am not fully sure.
> 
> But I am fully sure.
> If you don't check the return value, you don't know whether the inode
> was locked. If you don't know that, you can't decide whether you need
> to unlock it. That being optional now (cancel locking if SIGKILL was
> received) is the sole point of my patch. You MUST check the return
> value. There is no other way. Don't trust my word - just read the
> code.


The down_write_killable() can return -EINTR, currently:
https://elixir.bootlin.com/linux/v6.12.4/source/kernel/locking/rwsem.c#L1593

And -EINTR can imply that client has been killed:
https://elixir.bootlin.com/linux/v6.12.4/source/include/uapi/asm-generic/errno-base.h#L8

It sounds to me that we simply need not to execute the logic. But do we
really need to report
the error to the caller? I am simply trying to double check that
caller's logic is ready
to process the error condition in the correct way.

Thanks,
Slava.
diff mbox series

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 4b8d59ebda00..d79c0774dc6e 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -2127,10 +2127,11 @@  static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (ceph_inode_is_shutdown(inode))
 		return -ESTALE;
 
-	if (direct_lock)
-		ceph_start_io_direct(inode);
-	else
-		ceph_start_io_read(inode);
+	ret = direct_lock
+		? ceph_start_io_direct(inode)
+		: ceph_start_io_read(inode);
+	if (ret)
+		return ret;
 
 	if (!(fi->flags & CEPH_F_SYNC) && !direct_lock)
 		want |= CEPH_CAP_FILE_CACHE;
@@ -2283,7 +2284,9 @@  static ssize_t ceph_splice_read(struct file *in, loff_t *ppos,
 	    (fi->flags & CEPH_F_SYNC))
 		return copy_splice_read(in, ppos, pipe, len, flags);
 
-	ceph_start_io_read(inode);
+	ret = ceph_start_io_read(inode);
+	if (ret)
+		return ret;
 
 	want = CEPH_CAP_FILE_CACHE;
 	if (fi->fmode & CEPH_FILE_MODE_LAZY)
@@ -2362,10 +2365,11 @@  static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		direct_lock = true;
 
 retry_snap:
-	if (direct_lock)
-		ceph_start_io_direct(inode);
-	else
-		ceph_start_io_write(inode);
+	err = direct_lock
+		? ceph_start_io_direct(inode)
+		: ceph_start_io_write(inode);
+	if (err)
+		goto out_unlocked;
 
 	if (iocb->ki_flags & IOCB_APPEND) {
 		err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
diff --git a/fs/ceph/io.c b/fs/ceph/io.c
index c456509b31c3..2735503bc479 100644
--- a/fs/ceph/io.c
+++ b/fs/ceph/io.c
@@ -47,20 +47,30 @@  static void ceph_block_o_direct(struct ceph_inode_info *ci, struct inode *inode)
  * Note that buffered writes and truncates both take a write lock on
  * inode->i_rwsem, meaning that those are serialised w.r.t. the reads.
  */
-void
+int
 ceph_start_io_read(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	int err;
 
 	/* Be an optimist! */
-	down_read(&inode->i_rwsem);
+	err = down_read_killable(&inode->i_rwsem);
+	if (err)
+		return err;
+
 	if (!(READ_ONCE(ci->i_ceph_flags) & CEPH_I_ODIRECT))
-		return;
+		return 0;
 	up_read(&inode->i_rwsem);
+
 	/* Slow path.... */
-	down_write(&inode->i_rwsem);
+	err = down_write_killable(&inode->i_rwsem);
+	if (err)
+		return err;
+
 	ceph_block_o_direct(ci, inode);
 	downgrade_write(&inode->i_rwsem);
+
+	return 0;
 }
 
 /**
@@ -83,11 +93,13 @@  ceph_end_io_read(struct inode *inode)
  * Declare that a buffered write operation is about to start, and ensure
  * that we block all direct I/O.
  */
-void
+int
 ceph_start_io_write(struct inode *inode)
 {
-	down_write(&inode->i_rwsem);
-	ceph_block_o_direct(ceph_inode(inode), inode);
+	int err = down_write_killable(&inode->i_rwsem);
+	if (!err)
+		ceph_block_o_direct(ceph_inode(inode), inode);
+	return err;
 }
 
 /**
@@ -133,20 +145,30 @@  static void ceph_block_buffered(struct ceph_inode_info *ci, struct inode *inode)
  * Note that buffered writes and truncates both take a write lock on
  * inode->i_rwsem, meaning that those are serialised w.r.t. O_DIRECT.
  */
-void
+int
 ceph_start_io_direct(struct inode *inode)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	int err;
 
 	/* Be an optimist! */
-	down_read(&inode->i_rwsem);
+	err = down_read_killable(&inode->i_rwsem);
+	if (err)
+		return err;
+
 	if (READ_ONCE(ci->i_ceph_flags) & CEPH_I_ODIRECT)
-		return;
+		return 0;
 	up_read(&inode->i_rwsem);
+
 	/* Slow path.... */
-	down_write(&inode->i_rwsem);
+	err = down_write_killable(&inode->i_rwsem);
+	if (err)
+		return err;
+
 	ceph_block_buffered(ci, inode);
 	downgrade_write(&inode->i_rwsem);
+
+	return 0;
 }
 
 /**
diff --git a/fs/ceph/io.h b/fs/ceph/io.h
index fa594cd77348..08d58253f533 100644
--- a/fs/ceph/io.h
+++ b/fs/ceph/io.h
@@ -2,11 +2,13 @@ 
 #ifndef _FS_CEPH_IO_H
 #define _FS_CEPH_IO_H
 
-void ceph_start_io_read(struct inode *inode);
+#include <linux/compiler_attributes.h> // for __must_check
+
+__must_check int ceph_start_io_read(struct inode *inode);
 void ceph_end_io_read(struct inode *inode);
-void ceph_start_io_write(struct inode *inode);
+__must_check int ceph_start_io_write(struct inode *inode);
 void ceph_end_io_write(struct inode *inode);
-void ceph_start_io_direct(struct inode *inode);
+__must_check int ceph_start_io_direct(struct inode *inode);
 void ceph_end_io_direct(struct inode *inode);
 
 #endif /* FS_CEPH_IO_H */