diff mbox series

[1/6] t4061: use POSIX compliance regex(7)

Message ID c45d6383173d8d3e73cdcdd6e993d3259d519a68.1584625896.git.congdanhqx@gmail.com (mailing list archive)
State New, archived
Headers show
Series fix test failure with busybox | expand

Commit Message

Đoàn Trần Công Danh March 19, 2020, 2 p.m. UTC
BRE interprets `+` literally, and
`\+` is undefined for POSIX BRE, from:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_02

> The interpretation of an ordinary character preceded
> by an unescaped <backslash> ( '\\' ) is undefined, except for:
> - The characters ')', '(', '{', and '}'
> - The digits 1 to 9 inclusive
> - A character inside a bracket expression

This test is failing with busybox sed, the default sed of Alpine Linux

Fix it by using literal `+` instead.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
---
 t/t4061-diff-indent.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jeff King March 19, 2020, 3:53 p.m. UTC | #1
On Thu, Mar 19, 2020 at 09:00:02PM +0700, Đoàn Trần Công Danh wrote:

> BRE interprets `+` literally, and
> `\+` is undefined for POSIX BRE, from:
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_02
> 
> > The interpretation of an ordinary character preceded
> > by an unescaped <backslash> ( '\\' ) is undefined, except for:
> > - The characters ')', '(', '{', and '}'
> > - The digits 1 to 9 inclusive
> > - A character inside a bracket expression
> 
> This test is failing with busybox sed, the default sed of Alpine Linux
> 
> Fix it by using literal `+` instead.

This makes sense, I think. It could hurt a sed which is expected ERE and
needs the "+" escaped, but I think such a sed would be wrong (and I
imagine would break things elsewhere).

-Peff
Eric Sunshine March 19, 2020, 4:01 p.m. UTC | #2
On Thu, Mar 19, 2020 at 11:53 AM Jeff King <peff@peff.net> wrote:
> On Thu, Mar 19, 2020 at 09:00:02PM +0700, Đoàn Trần Công Danh wrote:
> > Fix it by using literal `+` instead.
>
> This makes sense, I think. It could hurt a sed which is expected ERE and
> needs the "+" escaped, but I think such a sed would be wrong (and I
> imagine would break things elsewhere).

I had the same thought and considered suggesting a character class:

    sed -n -e "1,4d" -e "s/^[+]//p" <"$1" >.tmp-1

to make it painfully obvious that "+" is not special in the
expression. But then I thought better of it -- for the same reason as
you (to wit: such a 'sed' would be wrong) -- and decided against
saying anything.
Junio C Hamano March 19, 2020, 10:02 p.m. UTC | #3
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Thu, Mar 19, 2020 at 11:53 AM Jeff King <peff@peff.net> wrote:
>> On Thu, Mar 19, 2020 at 09:00:02PM +0700, Đoàn Trần Công Danh wrote:
>> > Fix it by using literal `+` instead.
>>
>> This makes sense, I think. It could hurt a sed which is expected ERE and
>> needs the "+" escaped, but I think such a sed would be wrong (and I
>> imagine would break things elsewhere).
>
> I had the same thought and considered suggesting a character class:
>
>     sed -n -e "1,4d" -e "s/^[+]//p" <"$1" >.tmp-1
>
> to make it painfully obvious that "+" is not special in the
> expression. But then I thought better of it -- for the same reason as
> you (to wit: such a 'sed' would be wrong) -- and decided against
> saying anything.

I have only one thing that needs fixing, which is s/compliance/compliant/;
on the title.  Other than that, it looks good.

Having said that, I would have done the [+] thing if I were doing
this patch myself.  As long as we see no "wrong" sed that is broken
by this change, I am OK with it, though.

Thanks.
Đoàn Trần Công Danh March 20, 2020, 1:35 a.m. UTC | #4
On 2020-03-19 15:02:37-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
> > On Thu, Mar 19, 2020 at 11:53 AM Jeff King <peff@peff.net> wrote:
> >> On Thu, Mar 19, 2020 at 09:00:02PM +0700, Đoàn Trần Công Danh wrote:
> >> > Fix it by using literal `+` instead.
> >>
> >> This makes sense, I think. It could hurt a sed which is expected ERE and
> >> needs the "+" escaped, but I think such a sed would be wrong (and I
> >> imagine would break things elsewhere).
> >
> > I had the same thought and considered suggesting a character class:
> >
> >     sed -n -e "1,4d" -e "s/^[+]//p" <"$1" >.tmp-1
> >
> > to make it painfully obvious that "+" is not special in the
> > expression. But then I thought better of it -- for the same reason as
> > you (to wit: such a 'sed' would be wrong) -- and decided against
> > saying anything.
> 
> I have only one thing that needs fixing, which is s/compliance/compliant/;
> on the title.  Other than that, it looks good.
> 
> Having said that, I would have done the [+] thing if I were doing
> this patch myself.  As long as we see no "wrong" sed that is broken
> by this change, I am OK with it, though.

Well, `[+]` thing was my first version for this change,
but I change in to this version afterward.

However, your comment in a later patch:

> IOW, I do not have trouble changing the test so that it works with
> noncompliant "diff".  But then in the same series, I would prefer to
> see the existing test keeps working with a possibly noncompliant
> "sed" implementation that has been working well with the tests.

changed my mind.

I will use `[+]` here in the reroll.
diff mbox series

Patch

diff --git a/t/t4061-diff-indent.sh b/t/t4061-diff-indent.sh
index 2affd7a100..0f7a6d97a8 100755
--- a/t/t4061-diff-indent.sh
+++ b/t/t4061-diff-indent.sh
@@ -17,7 +17,7 @@  compare_diff () {
 # Compare blame output using the expectation for a diff as reference.
 # Only look for the lines coming from non-boundary commits.
 compare_blame () {
-	sed -n -e "1,4d" -e "s/^\+//p" <"$1" >.tmp-1
+	sed -n -e "1,4d" -e "s/^+//p" <"$1" >.tmp-1
 	sed -ne "s/^[^^][^)]*) *//p" <"$2" >.tmp-2
 	test_cmp .tmp-1 .tmp-2 && rm -f .tmp-1 .tmp-2
 }