expand: Recognize '^' as a negation character in BE

Message ID 20210716094227.15177-1-mscalindt@gmail.com (mailing list archive)
State Superseded
Delegated to: Herbert Xu

Commit Message

Dimitar Yurukov July 16, 2021, 9:42 a.m. UTC
While parsing a bracket expression ('[...]'), DASH recognizes only '!' as
a special character for negation/inversion, but POSIX specifies '^'.

The POSIX specification (2018 edition) states:

  ^ The <circumflex> shall signify a non-matching list expression when
    it occurs first in a list, immediately following a
    <left-square-bracket> (see RE Bracket Expression).

DASH:
    $ i='123 asd' && printf "%s\n" "${i##*[!a-z]}"
    asd
    $ i='123 asd' && printf "%s\n" "${i##*[^a-z]}"
    <empty expansion>

BASH (with --posix):
    $ i='123 asd' && printf "%s\n" "${i##*[!a-z]}"
    asd
    $ i='123 asd' && printf "%s\n" "${i##*[^a-z]}"
    asd

Make <circumflex> ('^') a special character used to specify
negation/inversion in bracket expressions:

    $ i='123 asd' && printf "%s\n" "${i##*[^a-z]}"
    asd

Signed-off-by: Dimitar Yurukov <mscalindt@gmail.com>
---
 src/expand.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Harald van Dijk July 16, 2021, 10 a.m. UTC | #1
On 16/07/2021 10:42, Dimitar Yurukov wrote:
> While parsing a bracket expression ('[...]'), DASH recognizes only '!' as
> a special character for negation/inversion, but POSIX specifies '^'.
> 
> The POSIX specification (2018 edition) states:
> 
>    ^ The <circumflex> shall signify a non-matching list expression when
>      it occurs first in a list, immediately following a
>      <left-square-bracket> (see RE Bracket Expression).

It also states:

   the <exclamation-mark> character ( '!' ) shall replace the
   <circumflex> character ( '^' ) in its role in a non-matching list in
   the regular expression notation

and

   A bracket expression starting with an unquoted <circumflex> character
   produces unspecified results.

See 2.13.1 Patterns Matching a Single Character.

So both the dash and the bash behaviour are permitted and this patch 
does not address a correctness issue. Scripts that rely on ^ for 
negation should be modified to use !.
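
As a minimal sketch of the portable form, assuming a C caller that goes
through the POSIX fnmatch() interface rather than dash's own matcher: the
helper name has_non_lower() is hypothetical, and the program is assumed to
run in the default "C" locale so that the a-z range means the ASCII
lowercase letters. Only the '!' spelling is used, since that is the one
whose meaning is specified for shell patterns.

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of dash): report whether s contains at
 * least one character outside a-z, using the '!' non-matching list.
 * fnmatch() returns 0 when the whole string matches the pattern.
 */
static bool has_non_lower(const char *s)
{
	return fnmatch("*[!a-z]*", s, 0) == 0;
}
```

Writing the same pattern with '^' may well behave identically on a given
libc, but that would be relying on unspecified behaviour.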

The patch may still be worthwhile to increase compatibility, but in that 
case the same change also needs to be made to expmeta().

Cheers,
Harald van Dijk
Dimitar Yurukov July 16, 2021, 12:46 p.m. UTC | #2
On 16/07/2021 11:00, Harald van Dijk wrote:
> On 16/07/2021 10:42, Dimitar Yurukov wrote:
> > While parsing a bracket expression ('[...]'), DASH recognizes only '!' as
> > a special character for negation/inversion, but POSIX specifies '^'.
> > 
> > The POSIX specification (2018 edition) states:
> > 
> >    ^ The <circumflex> shall signify a non-matching list expression when
> >      it occurs first in a list, immediately following a
> >      <left-square-bracket> (see RE Bracket Expression).
> 
> It also states:
> 
>    the <exclamation-mark> character ( '!' ) shall replace the
>    <circumflex> character ( '^' ) in its role in a non-matching list in
>    the regular expression notation
> 
> and
> 
>    A bracket expression starting with an unquoted <circumflex> character
>    produces unspecified results.
> 
> See 2.13.1 Patterns Matching a Single Character.

Oh, my bad, sorry for the noise.

> The patch may still be worthwhile to increase compatibility, but in that
> case the same change also needs to be made to expmeta().

Oops, you are right. Will attach v2.

Patch

diff --git a/src/expand.c b/src/expand.c
index 1730670..06392ff 100644
--- a/src/expand.c
+++ b/src/expand.c
@@ -1565,7 +1565,7 @@  pmatch(const char *pattern, const char *string)
 
 			startp = p;
 			invert = 0;
-			if (*p == '!') {
+			if (*p == '!' || *p == '^') {
 				invert++;
 				p++;
 			}
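
The effect of the changed condition can be illustrated outside dash with a
small standalone sketch. This is not the pmatch() code: bracket_match() is
a hypothetical helper that handles only literal characters and simple
ranges (no character classes, no quoting), and it is meant only to show
how accepting either '!' or '^' as the first character flips the result.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not dash code: match the single character c against
 * a bracket expression whose contents start at p (just after the '[').
 * Returns 1 on match, 0 on no match, -1 if the closing ']' is missing.
 */
static int bracket_match(const char *p, char c)
{
	bool invert = false;
	bool found = false;
	const char *startp;

	/* '!' is the specified negation character; '^' is accepted as well. */
	if (*p == '!' || *p == '^') {
		invert = true;
		p++;
	}
	startp = p;

	for (;;) {
		unsigned char lo = (unsigned char)*p;
		unsigned char hi;

		if (lo == '\0')
			return -1;	/* unterminated bracket expression */
		if (lo == ']' && p != startp)
			break;		/* a ']' in first position is literal */
		p++;
		hi = lo;
		/* "x-y" is a range unless the '-' is the last list member. */
		if (p[0] == '-' && p[1] != ']' && p[1] != '\0') {
			hi = (unsigned char)p[1];
			p += 2;
		}
		if ((unsigned char)c >= lo && (unsigned char)c <= hi)
			found = true;
	}
	/* A non-matching list succeeds exactly when nothing in it matched. */
	return found != invert;
}
```

With this sketch, "!a-z]" and "^a-z]" give the same answer for any
character, which is the behaviour the patch aims for in pmatch().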