diff mbox

[v3] builtin: Fix handling of trailing IFS white spaces

Message ID 20160607092539.GA30397@gondor.apana.org.au (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show

Commit Message

Herbert Xu June 7, 2016, 9:25 a.m. UTC
On Mon, Jun 06, 2016 at 10:43:39PM +0200, Harald van Dijk wrote:
>
> Thanks. If starting with an escaped IFS, the CTLESC should already
> have been skipped before the if (maxargs == 1) check gets executed.
> That's done at the start of the outer loop, but is not visible in
> the patch because it's unmodified from the original code. Or did I
> misunderstand you here?

If the very first character later gets zeroed, you'd be zeroing
the character after the CTLESC, leaving the CTLESC in the value
to be assigned to the last variable of read.

If you got rid of the very first q=p assignment it should just work.

> With the link sent by Martijn Dekker earlier in this thread:
> 
> 
> <http://web.archive.org/web/20050414022354/http://www.research.att.com/~gsf/public/ifs.sh>
> 
> it gives a lot of failures:
> 
> # tests 6856 passed 1844 failed 5012
> 
> I'll look into it in more detail later, if no one beats me to it.

Thanks.  I've fixed up the buglet causing the failures:

---8<--
The read built-in does not handle trailing IFS white spaces in
the right way, when there are more fields than variables.  Part
of the problem is that this case is handled outside of ifsbreakup.

Harald van Dijk wrote a patch to fix this by moving the magic
into ifsbreakup itself.

This patch further reorganises the ifsbreakup loop by having only
one loop over the whole string.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Harald van Dijk June 12, 2016, 10:35 a.m. UTC | #1
On 07/06/16 11:25, Herbert Xu wrote:
> If the very first character later gets zeroed, you'd be zeroing
> the character after the CTLESC, leaving the CTLESC in the value
> to be assigned to the last variable of read.

Ah, I see. Thanks for the explanation. While trying to make it misbehave 
I found that rmescapes removes trailing CTLESC, and readcmd_handle_line 
calls rmescapes before the trailing CTLESC gets a chance to cause 
problems. I was now still only able to see the problem when adding some 
extra debugging statements.

> If you got rid of the very first q=p assignment it should just work.

There is a later q=p assignment too that gets performed after CTLESC has 
been skipped. That one also isn't going to cause problems in practice, 
since it only gets executed when a character is both IFS and IFS 
whitespace, but when called from readcmd_handle_line, there's never 
going to be escaped IFS whitespace.

> Thanks.  I've fixed up the buglet causing the failures:

The results are a lot better now, but I did spot a problem:

src/dash -c 'X="x  "; echo $X'

This is supposed to print "x\n", but the IFS breakup of $X generates two 
words, one "x", one " ", meaning "x  \n" gets printed instead. I think 
this is intended to get fixed up in the if (ifsspc) block, but that 
block doesn't get executed when there are no more characters after the 
spaces.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu June 12, 2016, 11:06 a.m. UTC | #2
On Sun, Jun 12, 2016 at 12:35:15PM +0200, Harald van Dijk wrote:
>
> The results are a lot better now, but I did spot a problem:
> 
> src/dash -c 'X="x  "; echo $X'
> 
> This is supposed to print "x\n", but the IFS breakup of $X generates
> two words, one "x", one " ", meaning "x  \n" gets printed instead. I
> think this is intended to get fixed up in the if (ifsspc) block, but
> that block doesn't get executed when there are no more characters
> after the spaces.

Weird, I can't reproduce this at all:

$ src/dash -c 'X="x "; echo $X' | cat -A
x$
$ 

What am I doing wrong?

Thanks,
Harald van Dijk June 12, 2016, 11:12 a.m. UTC | #3
On 12/06/16 13:06, Herbert Xu wrote:
> On Sun, Jun 12, 2016 at 12:35:15PM +0200, Harald van Dijk wrote:
>>
>> The results are a lot better now, but I did spot a problem:
>>
>> src/dash -c 'X="x  "; echo $X'
>>
>> This is supposed to print "x\n", but the IFS breakup of $X generates
>> two words, one "x", one " ", meaning "x  \n" gets printed instead. I
>> think this is intended to get fixed up in the if (ifsspc) block, but
>> that block doesn't get executed when there are no more characters
>> after the spaces.
>
> Weird, I can't reproduce this at all:
>
> $ src/dash -c 'X="x "; echo $X' | cat -A
> x$
> $
>
> What am I doing wrong?

It needs two spaces after the "x", not just one. With a single space, it 
does work right.

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/expand.c b/src/expand.c
index b2d710d..2207eba 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -50,6 +50,7 @@ 
 #include <glob.h>
 #endif
 #include <ctype.h>
+#include <stdbool.h>
 
 /*
  * Routines to expand arguments to commands.  We have to deal with
@@ -203,7 +204,7 @@  expandarg(union node *arg, struct arglist *arglist, int flag)
 	 * TODO - EXP_REDIR
 	 */
 	if (flag & EXP_FULL) {
-		ifsbreakup(p, &exparg);
+		ifsbreakup(p, -1, &exparg);
 		*exparg.lastp = NULL;
 		exparg.lastp = &exparg.list;
 		expandmeta(exparg.list, flag);
@@ -1016,15 +1017,18 @@  recordregion(int start, int end, int nulonly)
  * Break the argument string into pieces based upon IFS and add the
  * strings to the argument list.  The regions of the string to be
  * searched for IFS characters have been stored by recordregion.
+ * If maxargs is non-negative, at most maxargs arguments will be created, by
+ * joining together the last arguments.
  */
 void
-ifsbreakup(char *string, struct arglist *arglist)
+ifsbreakup(char *string, int maxargs, struct arglist *arglist)
 {
 	struct ifsregion *ifsp;
 	struct strlist *sp;
 	char *start;
 	char *p;
 	char *q;
+	char *r = NULL;
 	const char *ifs, *realifs;
 	int ifsspc;
 	int nulonly;
@@ -1042,16 +1046,75 @@  ifsbreakup(char *string, struct arglist *arglist)
 			ifs = nulonly ? nullstr : realifs;
 			ifsspc = 0;
 			while (p < string + ifsp->endoff) {
+				int c;
+				bool isifs;
+				bool isdefifs;
+
 				q = p;
-				if (*p == (char)CTLESC)
-					p++;
-				if (strchr(ifs, *p)) {
+				c = *p++;
+				if (c == (char)CTLESC)
+					c = *p++;
+
+				isifs = strchr(ifs, c);
+				isdefifs = false;
+				if (isifs)
+					isdefifs = strchr(defifs, c);
+
+				/* If only reading one more argument:
+				 * If we have exactly one field,
+				 * read that field without its terminator.
+				 * If we have more than one field,
+				 * read all fields including their terminators,
+				 * except for trailing IFS whitespace.
+				 *
+				 * This means that if we have only IFS
+				 * characters left, and at most one
+				 * of them is non-whitespace, we stop
+				 * reading here.
+				 * Otherwise, we read all the remaining
+				 * characters except for trailing
+				 * IFS whitespace.
+				 *
+				 * In any case, r indicates the start
+				 * of the characters to remove, or NULL
+				 * if no characters should be removed.
+				 */
+				if (!maxargs) {
+					if (isdefifs) {
+						if (!r)
+							r = q;
+						continue;
+					}
+
+					if (!(isifs && ifsspc))
+						r = NULL;
+
+					ifsspc = 0;
+					continue;
+				}
+
+				if (ifsspc) {
+					if (isdefifs)
+						continue;
+
+					if (isifs)
+						q = p;
+
+					start = q;
+					isifs = false;
+				}
+
+				if (isifs) {
 					if (!nulonly)
-						ifsspc = (strchr(defifs, *p) != NULL);
+						ifsspc = isdefifs;
 					/* Ignore IFS whitespace at start */
 					if (q == start && ifsspc) {
-						p++;
 						start = p;
+						ifsspc = 0;
+						continue;
+					}
+					if (maxargs > 0 && !--maxargs) {
+						r = q;
 						continue;
 					}
 					*q = '\0';
@@ -1059,39 +1122,20 @@  ifsbreakup(char *string, struct arglist *arglist)
 					sp->text = start;
 					*arglist->lastp = sp;
 					arglist->lastp = &sp->next;
-					p++;
-					if (!nulonly) {
-						for (;;) {
-							if (p >= string + ifsp->endoff) {
-								break;
-							}
-							q = p;
-							if (*p == (char)CTLESC)
-								p++;
-							if (strchr(ifs, *p) == NULL ) {
-								p = q;
-								break;
-							} else if (strchr(defifs, *p) == NULL) {
-								if (ifsspc) {
-									p++;
-									ifsspc = 0;
-								} else {
-									p = q;
-									break;
-								}
-							} else
-								p++;
-						}
-					}
 					start = p;
-				} else
-					p++;
+					continue;
+				}
+
+				ifsspc = 0;
 			}
 		} while ((ifsp = ifsp->next) != NULL);
 		if (nulonly)
 			goto add;
 	}
 
+	if (r)
+		*r = '\0';
+
 	if (!*start)
 		return;
 
diff --git a/src/expand.h b/src/expand.h
index 6a90f67..26dc5b4 100644
--- a/src/expand.h
+++ b/src/expand.h
@@ -69,7 +69,7 @@  char *_rmescapes(char *, int);
 int casematch(union node *, char *);
 void recordregion(int, int, int);
 void removerecordregions(int); 
-void ifsbreakup(char *, struct arglist *);
+void ifsbreakup(char *, int, struct arglist *);
 void ifsfree(void);
 
 /* From arith.y */
diff --git a/src/miscbltin.c b/src/miscbltin.c
index b596fd2..39b9c47 100644
--- a/src/miscbltin.c
+++ b/src/miscbltin.c
@@ -67,28 +67,21 @@ 
  *  less fields than variables -> remaining variables unset.
  *
  *  @param line complete line of input
+ *  @param ac argument count
  *  @param ap argument (variable) list
  *  @param len length of line including trailing '\0'
  */
 static void
-readcmd_handle_line(char *s, char **ap)
+readcmd_handle_line(char *s, int ac, char **ap)
 {
 	struct arglist arglist;
 	struct strlist *sl;
-	char *backup;
-	char *line;
 
-	/* ifsbreakup will fiddle with stack region... */
-	line = stackblock();
 	s = grabstackstr(s);
 
-	/* need a copy, so that delimiters aren't lost
-	 * in case there are more fields than variables */
-	backup = sstrdup(line);
-
 	arglist.lastp = &arglist.list;
 	
-	ifsbreakup(s, &arglist);
+	ifsbreakup(s, ac, &arglist);
 	*arglist.lastp = NULL;
 	ifsfree();
 
@@ -104,21 +97,6 @@  readcmd_handle_line(char *s, char **ap)
 			return;
 		}
 
-		/* remaining fields present, but no variables left. */
-		if (!ap[1] && sl->next) {
-			size_t offset;
-			char *remainder;
-
-			/* FIXME little bit hacky, assuming that ifsbreakup 
-			 * will not modify the length of the string */
-			offset = sl->text - s;
-			remainder = backup + offset;
-			rmescapes(remainder);
-			setvar(*ap, remainder, 0);
-
-			return;
-		}
-		
 		/* set variable to field */
 		rmescapes(sl->text);
 		setvar(*ap, sl->text, 0);
@@ -211,7 +189,7 @@  start:
 out:
 	recordregion(startloc, p - (char *)stackblock(), 0);
 	STACKSTRNUL(p);
-	readcmd_handle_line(p + 1, ap);
+	readcmd_handle_line(p + 1, argc - (ap - argv), ap);
 	return status;
 }