diff mbox series

parser: Fix VSLENGTH parsing with trailing garbage

Message ID 20210621095719.GA9287@gondor.apana.org.au (mailing list archive)
State Accepted
Delegated to: Herbert Xu
Headers show
Series parser: Fix VSLENGTH parsing with trailing garbage | expand

Commit Message

Herbert Xu June 21, 2021, 9:57 a.m. UTC
On Sat, Jun 19, 2021 at 02:44:46PM +0200, Denys Vlasenko wrote:
> 
> CTLVAR and CTLBACKQ are not properly handled if encountered
> inside {$#...}. Testcase:
> 
> dash -c "`printf 'echo ${#1\x82}'`" 00 111 222
> 
> It should execute "echo ${#1 <byte 0x82> }" and thus print "3"
> (the length of $1, which is "111").
> 
> Instead, it segfaults.
> 
> (Ideally, it should fail since "1 <byte 0x82>" is not a valid
> variable name, but currently dash accepts e.g. "${#1abc}"
> as if it is "${#1}bc". A separate, less serious bug...).

In fact these two bugs are one and the same.  This patch fixes
both by detecting the invalid substitution and not emitting it
into the node tree.

Incidentally this reveals a bug in how we parse ${#10} that got
introduced recently, which is also fixed here.

Reported-by: Denys Vlasenko <vda.linux@googlemail.com>
Fixes: 7710a926b321 ("parser: Only accept single-digit parameter...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Denys Vlasenko June 21, 2021, 2:21 p.m. UTC | #1
On Mon, Jun 21, 2021 at 11:57 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Sat, Jun 19, 2021 at 02:44:46PM +0200, Denys Vlasenko wrote:
> > CTLVAR and CTLBACKQ are not properly handled if encountered
> > inside {$#...}. Testcase:
> >
> > dash -c "`printf 'echo ${#1\x82}'`" 00 111 222
...
> In fact these two bugs are one and the same.  This patch fixes
> both by detecting the invalid substitution and not emitting it
> into the node tree.
>
> Incidentally this reveals a bug in how we parse ${#10} that got
> introduced recently, which is also fixed here.
...
> --- a/src/parser.h
> +++ b/src/parser.h
> @@ -62,6 +62,7 @@
>  #define VSTRIMLEFT     0x8             /* ${var#pattern} */
>  #define VSTRIMLEFTMAX  0x9             /* ${var##pattern} */
>  #define VSLENGTH       0xa             /* ${#var} */
> +/* VSLENGTH must come last. */

The above assumption is not necessary if...


> diff --git a/src/parser.c b/src/parser.c
> index 3c80d17..13c2df5 100644
> --- a/src/parser.c
> +++ b/src/parser.c
> @@ -1252,7 +1252,8 @@ varname:
>                         do {
>                                 STPUTC(c, out);
>                                 c = pgetc_eatbnl();
> -                       } while (!subtype && is_digit(c));
> +                       } while ((subtype <= 0 || subtype >= VSLENGTH) &&
> +                                is_digit(c));

... you use (subtype == 0 || subtype == VSLENGTH) here.
Also, (subtype == 0 || subtype == VSLENGTH) is less confusing:
it says "loop if ${VAR} or ${#VAR} syntax", whereas <= >=
are a bit misleading.
Herbert Xu June 22, 2021, 12:19 a.m. UTC | #2
On Mon, Jun 21, 2021 at 04:21:40PM +0200, Denys Vlasenko wrote:
>
> > diff --git a/src/parser.c b/src/parser.c
> > index 3c80d17..13c2df5 100644
> > --- a/src/parser.c
> > +++ b/src/parser.c
> > @@ -1252,7 +1252,8 @@ varname:
> >                         do {
> >                                 STPUTC(c, out);
> >                                 c = pgetc_eatbnl();
> > -                       } while (!subtype && is_digit(c));
> > +                       } while ((subtype <= 0 || subtype >= VSLENGTH) &&
> > +                                is_digit(c));
> 
> ... you use (subtype == 0 || subtype == VSLENGTH) here.
> Also, (subtype == 0 || subtype == VSLENGTH) is less confusing:
> it says "loop if ${VAR} or ${#VAR} syntax", whereas <= >=
> are a bit misleading.

Yes it looks a bit confusing, but it turns into a single branch
instead of two.  Perhaps I should add a helper function for it.

Thanks,
Denys Vlasenko June 22, 2021, 8:34 a.m. UTC | #3
On Tue, Jun 22, 2021 at 2:19 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Jun 21, 2021 at 04:21:40PM +0200, Denys Vlasenko wrote:
> > > -                       } while (!subtype && is_digit(c));
> > > +                       } while ((subtype <= 0 || subtype >= VSLENGTH) &&
> > > +                                is_digit(c));
> >
> > ... you use (subtype == 0 || subtype == VSLENGTH) here.
> > Also, (subtype == 0 || subtype == VSLENGTH) is less confusing:
> > it says "loop if ${VAR} or ${#VAR} syntax", whereas <= >=
> > are a bit misleading.
>
> Yes it looks a bit confusing, but it turns into a single branch
> instead of two.

Yes, I know that. Compiler turns it into "(unsigned)(x-1) >= VSLENGTH-1"
expression.

But is it worth the obfuscation? Especially that it also has another
downside (it requires an additional free CPU register to hold (x-1)
result, which can force compiler to spill other values to stack).
diff mbox series

Patch

diff --git a/src/parser.c b/src/parser.c
index 3c80d17..13c2df5 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -1252,7 +1252,8 @@  varname:
 			do {
 				STPUTC(c, out);
 				c = pgetc_eatbnl();
-			} while (!subtype && is_digit(c));
+			} while ((subtype <= 0 || subtype >= VSLENGTH) &&
+				 is_digit(c));
 		} else if (c != '}') {
 			int cc = c;
 
@@ -1312,6 +1313,8 @@  varname:
 				break;
 			}
 		} else {
+			if (subtype == VSLENGTH && c != '}')
+				subtype = 0;
 badsub:
 			pungetc();
 		}
diff --git a/src/parser.h b/src/parser.h
index 524ac1c..7d2749b 100644
--- a/src/parser.h
+++ b/src/parser.h
@@ -62,6 +62,7 @@ 
 #define VSTRIMLEFT	0x8		/* ${var#pattern} */
 #define VSTRIMLEFTMAX	0x9		/* ${var##pattern} */
 #define VSLENGTH	0xa		/* ${#var} */
+/* VSLENGTH must come last. */
 
 /* values of checkkwd variable */
 #define CHKALIAS	0x1