diff mbox

dash bug: double-quoted "\" breaks glob protection for next char

Message ID 9f37ae19-6f74-f527-aa49-dd04c3c010f6@gigawatt.nl (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show

Commit Message

Harald van Dijk Feb. 19, 2018, 10:13 p.m. UTC
On 2/18/18 11:50 PM, Harald van Dijk wrote:
> On 2/14/18 11:50 PM, Harald van Dijk wrote:
>> On 2/14/18 10:44 PM, Harald van Dijk wrote:
>>> On 2/14/18 9:03 PM, Harald van Dijk wrote:
>>>> On 13/02/2018 14:53, Denys Vlasenko wrote:
>>>>> $ >'\zzzz'
>>>>> $ >'\wwww'
>>>>> $ dash -c 'echo "\*"'
>>>>> \wwww \zzzz
>>>>
>>>> [...]
>>>>
>>>> Currently:
>>>>
>>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>>> <>
>>>>
>>>> This is what I expect, and also what bash, ksh and posh do.
>>>>
>>>> With your patch:
>>>>
>>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>>> <a>
>>>
>>> Does the attached look right as an alternative? It treats a quoted 
>>> backslash the same way as if it were preceded by CTLESC in 
>>> _rmescapes. It passes your test case and mine, but I'll do more 
>>> extensive testing.
>>
>> It causes preglob's string to potentially grow larger than the 
>> original. When called with RMESCAPE_ALLOC, that can be handled by 
>> increasing the buffer size, but preglob also gets called without 
>> RMESCAPE_ALLOC to modify a string in-place. That's never going to work 
>> with this approach. Back to the drawing board...
> 
> There is a way to make it work: ensure sufficient memory is always 
> available. Instead of inserting CTLESC, which caused problems, 
> CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a 
> no-op here.

It required one obvious additional trivial change to the CHECKSTRSPACE 
invocation, but with that added, the attached passed all testing I could 
think of. Does this look okay to include, did I miss something, or is 
there perhaps a better alternative?

Cheers,
Harald van Dijk

Comments

Denys Vlasenko Feb. 21, 2018, 1:50 p.m. UTC | #1
On Mon, Feb 19, 2018 at 11:13 PM, Harald van Dijk <harald@gigawatt.nl> wrote:
> On 2/18/18 11:50 PM, Harald van Dijk wrote:
>> On 2/14/18 11:50 PM, Harald van Dijk wrote:
>>> On 2/14/18 10:44 PM, Harald van Dijk wrote:
>>>> On 2/14/18 9:03 PM, Harald van Dijk wrote:
>>>>> On 13/02/2018 14:53, Denys Vlasenko wrote:
>>>>>> $ >'\zzzz'
>>>>>> $ >'\wwww'
>>>>>> $ dash -c 'echo "\*"'
>>>>>> \wwww \zzzz
>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>> Currently:
>>>>>
>>>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>>>> <>
>>>>>
>>>>> This is what I expect, and also what bash, ksh and posh do.
>>>>>
>>>>> With your patch:
>>>>>
>>>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>>>> <a>
>>>>
>>>> Does the attached look right as an alternative? It treats a quoted
>>>> backslash the same way as if it were preceded by CTLESC in _rmescapes. It
>>>> passes your test case and mine, but I'll do more extensive testing.
>>>
>>> It causes preglob's string to potentially grow larger than the original.
>>> When called with RMESCAPE_ALLOC, that can be handled by increasing the
>>> buffer size, but preglob also gets called without RMESCAPE_ALLOC to modify a
>>> string in-place. That's never going to work with this approach. Back to the
>>> drawing board...
>>
>> There is a way to make it work: ensure sufficient memory is always
>> available. Instead of inserting CTLESC, which caused problems,
>> CTLQUOTEMARK+CTLQUOTEMARK can be inserted instead. It's effectively a no-op
>> here.
>
> It required one obvious additional trivial change to the CHECKSTRSPACE
> invocation, but with that added, the attached passed all testing I could
> think of. Does this look okay to include, did I miss something, or is there
> perhaps a better alternative?

I propose replacing this:

                if (*p == (char)CTLESC) {
                        p++;
+                       goto escape;
+               } else if (*p == '\\') {
+                       if (inquotes) {
+escape:
+                               if (notescaped)
+                                       *q++ = '\\';
+                       } else {
+                               /* naked back slash */
+                               notescaped = 0;
+                               goto copy;
+                       }

with equivalent

                if (*p == (char)CTLESC) {
                        p++;
+                       goto escape;
+               }
+               if (*p == '\\') {
+                       if (!inquotes) {
+                               /* naked back slash */
+                               notescaped = 0;
+                               goto copy;
+                       }
+escape:
+                       if (notescaped)
+                               *q++ = '\\';
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Harald van Dijk Feb. 21, 2018, 10:47 p.m. UTC | #2
On 2/21/18 2:50 PM, Denys Vlasenko wrote:
> I propose replacing this:

Agreed, that looks better. Thanks!

Cheers,
Harald van Dijk
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/expand.c b/src/expand.c
index 2a50830..af88a69 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1686,12 +1686,17 @@  _rmescapes(char *str, int flag)
 		}
 		if (*p == (char)CTLESC) {
 			p++;
-			if (notescaped)
-				*q++ = '\\';
-		} else if (*p == '\\' && !inquotes) {
-			/* naked back slash */
-			notescaped = 0;
-			goto copy;
+			goto escape;
+		} else if (*p == '\\') {
+			if (inquotes) {
+escape:
+				if (notescaped)
+					*q++ = '\\';
+			} else {
+				/* naked back slash */
+				notescaped = 0;
+				goto copy;
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/parser.c b/src/parser.c
index 382658e..a847b2e 100644
--- a/src/parser.c
+++ b/src/parser.c
@@ -909,7 +909,7 @@  readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 #endif
 		CHECKEND();	/* set c to PEOF if at end of here document */
 		for (;;) {	/* until end of line or end of word */
-			CHECKSTRSPACE(4, out);	/* permit 4 calls to USTPUTC */
+			CHECKSTRSPACE(5, out);	/* permit 5 calls to USTPUTC */
 			switch(syntax[c]) {
 			case CNL:	/* '\n' */
 				if (syntax == BASESYNTAX)
@@ -944,6 +944,9 @@  readtoken1(int firstc, char const *syntax, char *eofmark, int striptabs)
 							eofmark != NULL
 						)
 					) {
+						/* Reserve extra memory in case this backslash will require later escaping. */
+						USTPUTC(CTLQUOTEMARK, out);
+						USTPUTC(CTLQUOTEMARK, out);
 						USTPUTC('\\', out);
 					}
 					USTPUTC(CTLESC, out);