diff mbox series

[v3] input: Read standard input byte-wise

Message ID Y7PIHqN4ic4UBOBb@gondor.apana.org.au (mailing list archive)
State Accepted
Delegated to: Herbert Xu
Headers show
Series [v3] input: Read standard input byte-wise | expand

Commit Message

Herbert Xu Jan. 3, 2023, 6:15 a.m. UTC
наб <nabijaczleweli@nabijaczleweli.xyz> wrote:
> 
> POSIX Issue 7, XCU, sh, STDIN says:

Your patch breaks history support since that relies on there
being a whole line in the input buffer.

This works for me:

---8<---
POSIX Issue 7, XCU, sh, STDIN says:
 When the shell is using standard input and it invokes a command that
 also uses standard input, the shell shall ensure that the standard
 input file pointer points directly after the command it has read when
 the command begins execution. It shall not read ahead in such a manner
 that any characters intended to be read by the invoked command are
 consumed by the shell (whether interpreted by the shell or not) or
 that characters that are not read by the invoked command are not seen
 by the shell.

I.e.
 sh <<EOF
 id
 cat
 good!
 EOF
must execute id, then execute cat, then the cat must copy "good!"
to the standard output stream, and similarly
 sh <<"EOF"
 id
 read Q
 good!
 echo Q$Q
 EOF
must execute id, then read "good!" into Q, then echo "Qgood!".

Heretofor the output was as such:
 uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
 ./dash: 3: good!: not found
and as such (with -x):
 + id
 uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
 + read Q
 + good!
 sh: 3: good!: not found
 + echo Q
 Q
and a strace confirms:
 read(0, "id\ncat\ngood!\n", 8192)       = 13
 read(0, "id\nread Q\ngood!\necho Q$Q\n", 8192) = 25

Reading the standard input byte-by-byte is the obvious solution to this
issue, Just Works, and is how all of shells do it on non-seekable input
(we could, theoretically, read regular files block-wise,
then seek within them after parsing, but the complexity out-weighs
the rarity of running sh < program; we could also do whole-line reads
on teletypes in icanon mode, but, again, the gain here is miniscule
for an interactive session, and the mode can change at any time, so...).
Naturally, we keep reading block-wise from not-standard-input.

With this patch, we observe the correct
 uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
 good!
and
 + id
 uid=1000(nabijaczleweli) gid=100(users) groups=100(users)
 + read Q
 + echo Qgood!
 Qgood!

Link: https://bugs.debian.org/862907
Reported-by: наб <nabijaczleweli@nabijaczleweli.xyz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Harald van Dijk Jan. 3, 2023, 11:54 a.m. UTC | #1
On 03/01/2023 06:15, Herbert Xu wrote:
> Your patch breaks history support since that relies on there
> being a whole line in the input buffer.

Does it? preadbuffer() knows to call history() with the H_APPEND flag to 
append a next chunk of input to what was already added to the buffer.

Cheers,
Harald van Dijk
Herbert Xu Jan. 4, 2023, 8:59 a.m. UTC | #2
On Tue, Jan 03, 2023 at 11:54:55AM +0000, Harald van Dijk wrote:
>
> Does it? preadbuffer() knows to call history() with the H_APPEND flag to
> append a next chunk of input to what was already added to the buffer.

It only uses H_APPEND once a newline has been detected and
whichprompt switches over to PS2.

Cheers,
Harald van Dijk Jan. 4, 2023, 11:18 a.m. UTC | #3
On 04/01/2023 08:59, Herbert Xu wrote:
> On Tue, Jan 03, 2023 at 11:54:55AM +0000, Harald van Dijk wrote:
>>
>> Does it? preadbuffer() knows to call history() with the H_APPEND flag to
>> append a next chunk of input to what was already added to the buffer.
> 
> It only uses H_APPEND once a newline has been detected and
> whichprompt switches over to PS2.

Oh yeah, that's something I'd changed in my version already in 2019 for 
other reasons. I missed that dash didn't do the same. There are a few 
separate but related bugs in the history handling, and until they are 
fixed, agreed that simply reading one bye at a time will make these more 
prominent.

One is that because of what you say, the history recording for commands 
greater than BUFSIZ is already buggy, but unlikely to pose a problem in 
practice currently because normally, people don't write such long commands.

Another is that the stripping of blank lines from history breaks when 
blank lines are significant, as in

   $ cat <<EOF
   > hello
   >
   > world
   > EOF
   hello

   world
   $ fc 1
   26
   q
   hello
   world

Note how the blank line was lost.

If there was already "something" (as the code puts it), then when more 
data is appended, blanks must be preserved.

There may be more that I am not seeing right now.

Cheers,
Harald van Dijk
diff mbox series

Patch

diff --git a/src/input.c b/src/input.c
index ec075f5..43c9161 100644
--- a/src/input.c
+++ b/src/input.c
@@ -159,6 +159,17 @@  int pgetc(void)
 	return __pgetc();
 }
 
+static int stdin_clear_nonblock(void)
+{
+	int flags = fcntl(0, F_GETFL, 0);
+
+	if (flags >= 0) {
+		flags &=~ O_NONBLOCK;
+		flags = fcntl(0, F_SETFL, flags);
+	}
+
+	return flags;
+}
 
 static int
 preadfd(void)
@@ -195,22 +206,38 @@  retry:
 
 	} else
 #endif
+	if (parsefile->fd)
 		nr = read(parsefile->fd, buf, IBUFSIZ - 1);
+	else {
+		unsigned len = IBUFSIZ - 1;
+
+		nr = 0;
+
+		do {
+			int err;
 
+			err = read(0, buf, 1);
+			if (err <= 0) {
+				if (nr)
+					break;
+
+				nr = err;
+				if (errno != EWOULDBLOCK)
+					break;
+				if (stdin_clear_nonblock() < 0)
+					break;
+
+				out2str("sh: turning off NDELAY mode\n");
+				goto retry;
+			}
+
+			nr++;
+		} while (!IS_DEFINED_SMALL && *buf++ != '\n' && --len);
+	}
 
 	if (nr < 0) {
 		if (errno == EINTR)
 			goto retry;
-		if (parsefile->fd == 0 && errno == EWOULDBLOCK) {
-			int flags = fcntl(0, F_GETFL, 0);
-			if (flags >= 0 && flags & O_NONBLOCK) {
-				flags &=~ O_NONBLOCK;
-				if (fcntl(0, F_SETFL, flags) >= 0) {
-					out2str("sh: turning off NDELAY mode\n");
-					goto retry;
-				}
-			}
-		}
 	}
 	return nr;
 }
diff --git a/src/input.h b/src/input.h
index 8c39f33..8830b66 100644
--- a/src/input.h
+++ b/src/input.h
@@ -34,6 +34,12 @@ 
  *	@(#)input.h	8.2 (Berkeley) 5/4/95
  */
 
+#ifdef SMALL
+#define IS_DEFINED_SMALL 1
+#else
+#define IS_DEFINED_SMALL 0
+#endif
+
 /* PEOF (the end of file marker) is defined in syntax.h */
 
 enum {