diff mbox series

expand: Fix naked backslah leakage

Message ID ZizbYRti9EydVOg4@gondor.apana.org.au (mailing list archive)
State Superseded
Delegated to: Herbert Xu
Headers show
Series expand: Fix naked backslah leakage | expand

Commit Message

Herbert Xu April 27, 2024, 11:02 a.m. UTC
Naked backslashes in patterns may incorrectly unquote subsequent
wild characters that are themselves quoted.  Fix this by adding
an extra backslash when necessary.

Test case:

	a="\\*bc"; b="\\"; c="*"; echo "<${a##$b"$c"}>"

Old result:

	<>

New result:

	<bc>

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 src/expand.c   | 10 ++++++++--
 src/mystring.c |  1 +
 2 files changed, 9 insertions(+), 2 deletions(-)

Comments

Harald van Dijk April 27, 2024, 12:12 p.m. UTC | #1
On 27/04/2024 12:02, Herbert Xu wrote:
> Naked backslashes in patterns may incorrectly unquote subsequent
> wild characters that are themselves quoted.  Fix this by adding
> an extra backslash when necessary.
> 
> Test case:
> 
> 	a="\\*bc"; b="\\"; c="*"; echo "<${a##$b"$c"}>"
> 
> Old result:
> 
> 	<>
> 
> New result:
> 
> 	<bc>

The correct result here, I believe, is <\*bc>. The backslash means the 
next character is taken as literal, even if that character is already 
also taken as literal because of the quoting. There is nothing to cause 
the backslash to be taken literally. The result should be that only a 
leading * is removed from $a.

Cheers,
Harald van Dijk
Herbert Xu April 27, 2024, 12:22 p.m. UTC | #2
On Sat, Apr 27, 2024 at 01:12:19PM +0100, Harald van Dijk wrote:
>
> The correct result here, I believe, is <\*bc>. The backslash means the next
> character is taken as literal, even if that character is already also taken
> as literal because of the quoting. There is nothing to cause the backslash
> to be taken literally. The result should be that only a leading * is removed
> from $a.

While I do see the logic in your reasoning, no other shell implements
this behaviour:

*sh -c 'a="*bc"; b="\\"; c="*"; echo "<${a##$b"$c"}>"'

always produces <*bc> across all shells on my system.

Thanks,
Harald van Dijk April 27, 2024, 1:46 p.m. UTC | #3
On 27/04/2024 13:22, Herbert Xu wrote:
> On Sat, Apr 27, 2024 at 01:12:19PM +0100, Harald van Dijk wrote:
>>
>> The correct result here, I believe, is <\*bc>. The backslash means the next
>> character is taken as literal, even if that character is already also taken
>> as literal because of the quoting. There is nothing to cause the backslash
>> to be taken literally. The result should be that only a leading * is removed
>> from $a.
> 
> While I do see the logic in your reasoning, no other shell implements
> this behaviour:

I'm aware this isn't what most shells do, but it's not quite the case 
that no other shell implements this.

> *sh -c 'a="*bc"; b="\\"; c="*"; echo "<${a##$b"$c"}>"'
> 
> always produces <*bc> across all shells on my system.

This produces <bc> with my shell, at least.

If we are comparing what other shells do, it is probably worth pointing 
out that shells disagree on your original example.

   a="\\*bc"; b="\\"; c="*"; echo "<${a##$b"$c"}>"

Many shells share the bug where this prints <>, but of shells that do 
not have this bug, while many print <bc> like what your patch makes dash 
do, the most commonly used one, bash (as of version 4), does not: it 
prints <\*bc>. I am not sure what exactly $b"$c" is trying to strip here.

Cheers,
Harald van Dijk
diff mbox series

Patch

diff --git a/src/expand.c b/src/expand.c
index 2ed02d6..0db2b29 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1658,6 +1658,7 @@  _rmescapes(char *str, int flag)
 	char *p, *q, *r;
 	int notescaped;
 	int globbing;
+	int inquotes;
 
 	p = strpbrk(str, cqchars);
 	if (!p) {
@@ -1692,16 +1693,17 @@  _rmescapes(char *str, int flag)
 			q = mempcpy(q, str, len);
 		}
 	}
+	inquotes = 0;
 	notescaped = globbing;
 	while (*p) {
 		if (*p == (char)CTLQUOTEMARK) {
 			p++;
-			notescaped = globbing;
+			inquotes ^= globbing;
 			continue;
 		}
 		if (*p == '\\') {
 			/* naked back slash */
-			notescaped = 0;
+			notescaped ^= globbing;
 			goto copy;
 		}
 		if (FNMATCH_IS_ENABLED && *p == '^')
@@ -1711,6 +1713,10 @@  _rmescapes(char *str, int flag)
 add_escape:
 			if (notescaped)
 				*q++ = '\\';
+			else if (inquotes) {
+				*q++ = '\\';
+				*q++ = '\\';
+			}
 		}
 		notescaped = globbing;
 copy:
diff --git a/src/mystring.c b/src/mystring.c
index f651521..5eace6c 100644
--- a/src/mystring.c
+++ b/src/mystring.c
@@ -63,6 +63,7 @@  const char snlfmt[] = "%s\n";
 const char dolatstr[] = { CTLQUOTEMARK, CTLVAR, VSNORMAL | VSBIT, '@', '=',
 			  CTLQUOTEMARK, '\0' };
 const char cqchars[] = {
+	'\\',
 #ifdef HAVE_FNMATCH
 	'^',
 #endif