diff mbox series

Bug in short splice to socket?

Message ID 499791.1685485603@warthog.procyon.org.uk (mailing list archive)
State New
Headers show
Series Bug in short splice to socket? | expand

Commit Message

David Howells May 30, 2023, 10:26 p.m. UTC
Jakub Kicinski <kuba@kernel.org> wrote:

> Will the TLS selftests under tools/.../net/tls.c exercise this?

Interesting.  Now that you've pointed me at it, I've tried running it.  Mostly
it passes, but I'm having some problems with the multi_chunk_sendfile tests
that time out.  I think that splice_direct_to_actor() has a bug.  The problem
is this bit of code:

		/*
		 * If more data is pending, set SPLICE_F_MORE
		 * If this is the last data and SPLICE_F_MORE was not set
		 * initially, clears it.
		 */
		if (read_len < len)
			sd->flags |= SPLICE_F_MORE;
		else if (!more)
			sd->flags &= ~SPLICE_F_MORE;

When used with sendfile(), it sets SPLICE_F_MORE (which causes MSG_MORE to be
passed to the network protocol) if we haven't yet read everything that the
user requested and clears it if we fulfilled what the user requested.

This has the weird effect that MSG_MORE gets kind of inverted.  It's never
seen by the actor if we can read the entire request into the pipe - except if
we hit the EOF first.  If we hit the EOF before we fulfil the entire request,
we get a short read and SPLICE_F_MORE and thus MSG_MORE *is* set.  The
upstream TLS code ignores it - but I'm changing this with my patches as
sendmsg() then uses it to mark the EOR.

I think we probably need to fix this in some way to check the size of source
file - which may not be a regular file:-/  With the attached change, all tests
pass; without it, a bunch of tests fail with timeouts.

David
---

Comments

Jakub Kicinski May 31, 2023, 12:32 a.m. UTC | #1
On Tue, 30 May 2023 23:26:43 +0100 David Howells wrote:
> Interesting.  Now that you've pointed me at it, I've tried running it.  Mostly
> it passes, but I'm having some problems with the multi_chunk_sendfile tests
> that time out.  I think that splice_direct_to_actor() has a bug.  The problem
> is this bit of code:
> 
> 		/*
> 		 * If more data is pending, set SPLICE_F_MORE
> 		 * If this is the last data and SPLICE_F_MORE was not set
> 		 * initially, clears it.
> 		 */
> 		if (read_len < len)
> 			sd->flags |= SPLICE_F_MORE;
> 		else if (!more)
> 			sd->flags &= ~SPLICE_F_MORE;
> 
> When used with sendfile(), it sets SPLICE_F_MORE (which causes MSG_MORE to be
> passed to the network protocol) if we haven't yet read everything that the
> user requested and clears it if we fulfilled what the user requested.
> 
> This has the weird effect that MSG_MORE gets kind of inverted.  It's never
> seen by the actor if we can read the entire request into the pipe - except if
> we hit the EOF first.  If we hit the EOF before we fulfil the entire request,
> we get a short read and SPLICE_F_MORE and thus MSG_MORE *is* set.  The
> upstream TLS code ignores it - but I'm changing this with my patches as
> sendmsg() then uses it to mark the EOR.
> 
> I think we probably need to fix this in some way to check the size of source
> file - which may not be a regular file:-/  With the attached change, all tests
> pass; without it, a bunch of tests fail with timeouts.

Yeah.. it's one of those 'known warts' which we worked around in TLS
because I don't know enough about VFS to confidently fix it in fs/.
Proper fix would be pretty nice to have.

The original-original report of the problem is here, FWIW:
https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/

And my somewhat hacky fix was d452d48b9f8.
diff mbox series

Patch

diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..a7cf216c02a7 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -982,10 +982,21 @@  ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		 * If this is the last data and SPLICE_F_MORE was not set
 		 * initially, clears it.
 		 */
-		if (read_len < len)
-			sd->flags |= SPLICE_F_MORE;
-		else if (!more)
+		if (read_len < len) {
+			struct inode *ii = in->f_mapping->host;
+
+			if (ii->i_fop->llseek != noop_llseek &&
+			    pos >= i_size_read(ii)) {
+				if (!more)
+					sd->flags &= ~SPLICE_F_MORE;
+			} else {
+				sd->flags |= SPLICE_F_MORE;
+			}
+
+		} else if (!more) {
 			sd->flags &= ~SPLICE_F_MORE;
+		}
+
 		/*
 		 * NOTE: nonblocking mode only applies to the input. We
 		 * must not do the output in nonblocking mode as then we