diff mbox series

[3/4] check-non-portable-shell: improve `VAR=val shell-func` detection

Message ID 20240722065915.80760-4-ericsunshine@charter.net (mailing list archive)
State Superseded
Headers show
Series improve one-shot variable detection with shell function | expand

Commit Message

Eric Sunshine July 22, 2024, 6:59 a.m. UTC
From: Eric Sunshine <sunshine@sunshineco.com>

Unlike "VAR=val cmd" one-shot environment variable assignments which
exist only for the invocation of 'cmd', those assigned by "VAR=val
shell-func" exist within the running shell and continue to do so until
the process exits. check-non-portable-shell.pl warns when it detects
such usage since, more often than not, the author who writes such an
invocation is unaware of the undesirable behavior.

However, a limitation of the check is that it only detects such
invocations when variable assignment (i.e. `VAR=val`) is the first
thing on the line. Thus, it can easily be fooled by an invocation such
as:

    echo X | VAR=val shell-func

Address this shortcoming by loosening the check so that the variable
assignment can be recognized even when not at the beginning of the line.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/check-non-portable-shell.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Rubén Justo July 22, 2024, 2:46 p.m. UTC | #1
On Mon, Jul 22, 2024 at 02:59:13AM -0400, Eric Sunshine wrote:
> From: Eric Sunshine <sunshine@sunshineco.com>
> 
> Unlike "VAR=val cmd" one-shot environment variable assignments which
> exist only for the invocation of 'cmd', those assigned by "VAR=val
> shell-func" exist within the running shell and continue to do so until
> the process exits. check-non-portable-shell.pl warns when it detects
> such usage since, more often than not, the author who writes such an
> invocation is unaware of the undesirable behavior.
> 
> However, a limitation of the check is that it only detects such
> invocations when variable assignment (i.e. `VAR=val`) is the first
> thing on the line. Thus, it can easily be fooled by an invocation such
> as:
> 
>     echo X | VAR=val shell-func
> 
> Address this shortcoming by loosening the check so that the variable
> assignment can be recognized even when not at the beginning of the line.
> 
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
>  t/check-non-portable-shell.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
> index b2b28c2ced..44b23d6ddd 100755
> --- a/t/check-non-portable-shell.pl
> +++ b/t/check-non-portable-shell.pl
> @@ -49,7 +49,7 @@ sub err {
>  	/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
>  	/\blocal\s+[A-Za-z0-9_]*=\$([A-Za-z0-9_{]|[(][^(])/ and
>  		err q(quote "$val" in 'local var=$val');
> -	/^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
> +	/\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and

Losing "^\s*" means we'll cause false positives, such as:

    # VAR=VAL shell-func
    echo VAR=VAL shell-func

Regardless of that, the regex will continue to pose problems with:

  VAR=$OTHER_VALUE shell-func
  VAR=$(cmd) shell-func
  VAR=VAL\ UE shell-func
  VAR="\"val\" shell-func UE" non-shell-func 

Which, of course, should be cases that should be written in a more
orthodox way. 

But we will start to detect errors like the ones mentioned in the
message, which are more likely to happen.

I think this change is a good step forward, and I'm happy with it as it
is.

Thanks

>  		err '"FOO=bar shell_func" assignment extends beyond "shell_func"';
>  	$line = '';
>  	# this resets our $. for each file
> -- 
> 2.45.2
>
Kyle Lippincott July 22, 2024, 5:26 p.m. UTC | #2
On Mon, Jul 22, 2024 at 12:01 AM Eric Sunshine <ericsunshine@charter.net> wrote:
>
> From: Eric Sunshine <sunshine@sunshineco.com>
>
> Unlike "VAR=val cmd" one-shot environment variable assignments which
> exist only for the invocation of 'cmd', those assigned by "VAR=val
> shell-func" exist within the running shell and continue to do so until
> the process exits. check-non-portable-shell.pl warns when it detects
> such usage since, more often than not, the author who writes such an
> invocation is unaware of the undesirable behavior.
>
> However, a limitation of the check is that it only detects such
> invocations when variable assignment (i.e. `VAR=val`) is the first
> thing on the line. Thus, it can easily be fooled by an invocation such
> as:
>
>     echo X | VAR=val shell-func
>
> Address this shortcoming by loosening the check so that the variable
> assignment can be recognized even when not at the beginning of the line.
>
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
>  t/check-non-portable-shell.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
> index b2b28c2ced..44b23d6ddd 100755
> --- a/t/check-non-portable-shell.pl
> +++ b/t/check-non-portable-shell.pl
> @@ -49,7 +49,7 @@ sub err {
>         /\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
>         /\blocal\s+[A-Za-z0-9_]*=\$([A-Za-z0-9_{]|[(][^(])/ and
>                 err q(quote "$val" in 'local var=$val');
> -       /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
> +       /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and
>                 err '"FOO=bar shell_func" assignment extends beyond "shell_func"';

Is there an example of a shell on Linux that has this behavior that I
can observe, and/or reproduction steps? `bash --posix` does not do
this, as far as I can tell. I tried:

```
bash --posix
echo_hi() { echo hi; }
FOO=BAR echo_hi
echo $FOO
<no output>
```

and the simpler:

```
bash --posix
FOO=BAR echo hi
echo $FOO
```

Both attempts were done interactively, not via a script.

I'm asking mostly because of the recent "platform support document"
patch series - this is a very surprising behavior (which is presumably
why it was added to this test), and it's possible this was just a bug
in a shell that isn't really used anymore. Having documentation on how
to reproduce the issue lets us know when we can remove these kinds of
restrictions on our codebase that we took in the name of
compatibility, possibly over a decade ago.

>         $line = '';
>         # this resets our $. for each file
> --
> 2.45.2
>
>
Junio C Hamano July 22, 2024, 6:10 p.m. UTC | #3
Kyle Lippincott <spectral@google.com> writes:

> Is there an example of a shell on Linux that has this behavior that I
> can observe, and/or reproduction steps?

Every once in a while this comes up and we fix, e.g.

https://lore.kernel.org/git/528CE716.8060307@ramsay1.demon.co.uk/
https://lore.kernel.org/git/c6efda03848abc00cf8bf8d84fc34ef0d652b64c.1264151435.git.mhagger@alum.mit.edu/
https://lore.kernel.org/git/Koa4iojOlOQ_YENPwWXKt7G8Aa1x6UaBnFFtliKdZmpcrrqOBhY7NQ@cipher.nrlssc.navy.mil/
https://lore.kernel.org/git/20180713055205.32351-2-sunshine@sunshineco.com/
https://lore.kernel.org/git/574E27A4.6040804@ramsayjones.plus.com/

which is from a query

    https://lore.kernel.org/git/?q=one-shot+export+shell+function

but unfortunately we do not document which exact shell the observed
breakage happened with.

The closest article I found that is suitable as a discussion
reignitor talks about what POSIX requires, which may be more
relevant:

  https://lore.kernel.org/git/4B5027B8.2090507@viscovery.net/
Kyle Lippincott July 22, 2024, 9:35 p.m. UTC | #4
On Mon, Jul 22, 2024 at 11:10 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Kyle Lippincott <spectral@google.com> writes:
>
> > Is there an example of a shell on Linux that has this behavior that I
> > can observe, and/or reproduction steps?
>
> Every once in a while this comes up and we fix, e.g.
>
> https://lore.kernel.org/git/528CE716.8060307@ramsay1.demon.co.uk/
> https://lore.kernel.org/git/c6efda03848abc00cf8bf8d84fc34ef0d652b64c.1264151435.git.mhagger@alum.mit.edu/
> https://lore.kernel.org/git/Koa4iojOlOQ_YENPwWXKt7G8Aa1x6UaBnFFtliKdZmpcrrqOBhY7NQ@cipher.nrlssc.navy.mil/
> https://lore.kernel.org/git/20180713055205.32351-2-sunshine@sunshineco.com/

Thanks, this one leads to
https://lore.kernel.org/git/20180713055205.32351-1-sunshine@sunshineco.com/,
which references
https://public-inbox.org/git/xmqqefg8w73c.fsf@gitster-ct.c.googlers.com/T/,
which claims that `dash` has this behavior 6 years ago. The version of
`dash` I have on my machine right now doesn't seem to have this issue,
but I can believe some older version does.

> https://lore.kernel.org/git/574E27A4.6040804@ramsayjones.plus.com/
>
> which is from a query
>
>     https://lore.kernel.org/git/?q=one-shot+export+shell+function
>
> but unfortunately we do not document which exact shell the observed
> breakage happened with.
>
> The closest article I found that is suitable as a discussion
> reignitor talks about what POSIX requires, which may be more
> relevant:
>
>   https://lore.kernel.org/git/4B5027B8.2090507@viscovery.net/

This claims that `ksh` "gets it right", and I can confirm that ksh
does behave this way on my Linux machine.

Having just looked at the POSIX standard (I don't think I'm allowed to
copy from this document), the POSIX standard (POSIX.1-2024, at least)
explicitly leaves it unspecified whether the variable assignments
remain in effect after function execution.

Thanks for indulging my curiosity; should we include a statement in
the linter along the lines of `# POSIX.1-2024 explicitly does not
specify if variable assignment persists after executing a shell
function; some shells, such as ksh, have these variables remain.`?
Junio C Hamano July 22, 2024, 9:57 p.m. UTC | #5
Kyle Lippincott <spectral@google.com> writes:

> Having just looked at the POSIX standard (I don't think I'm allowed to
> copy from this document), the POSIX standard (POSIX.1-2024, at least)
> explicitly leaves it unspecified whether the variable assignments
> remain in effect after function execution.

True.

https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_09_01_02

also says that it is unspecified if the variable gets exported, and
older version of dash that comes on Ubuntu 20.04 chooses *not* to
export, which was the test breakage that triggered this whole
discussion.  The thread can be seen here:

https://lore.kernel.org/git/xmqqbk2p9lwi.fsf_-_@gitster.g/

> Thanks for indulging my curiosity; should we include a statement in
> the linter along the lines of `# POSIX.1-2024 explicitly does not
> specify if variable assignment persists after executing a shell
> function; some shells, such as ksh, have these variables remain.`?

Giving a review (either positive or negative is fine, as long as it
is constructive) on the update to CodingGuidelines

  https://lore.kernel.org/git/xmqqbk2p9lwi.fsf_-_@gitster.g/

may be a good place to start.

Thanks.
Eric Sunshine July 26, 2024, 6:45 a.m. UTC | #6
On Mon, Jul 22, 2024 at 10:46 AM Rubén Justo <rjusto@gmail.com> wrote:
> On Mon, Jul 22, 2024 at 02:59:13AM -0400, Eric Sunshine wrote:
> > -     /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
> > +     /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and
>
> Losing "^\s*" means we'll cause false positives, such as:
>
>     # VAR=VAL shell-func
>     echo VAR=VAL shell-func

True, though, considering that "shell-func" in these examples must
match the name of a function actually defined in one of the input
files, one would expect (or at least hope) that this sort of
false-positive will be exceedingly rare. Indeed, there are no such
false-positives in the existing test scripts. Of course, we can always
tighten the regex later if it proves to be problematic.

> Regardless of that, the regex will continue to pose problems with:
>
>   VAR=$OTHER_VALUE shell-func
>   VAR=$(cmd) shell-func
>   VAR=VAL\ UE shell-func
>   VAR="\"val\" shell-func UE" non-shell-func
>
> Which, of course, should be cases that should be written in a more
> orthodox way.

Yes, it can be difficult to be thorough when "linting" a programming
language merely via regular-expressions, and this particular
expression is already almost unreadable. The effort involved in trying
to make it perfect may very well outweigh the potential gain in
coverage.

> But we will start to detect errors like the ones mentioned in the
> message, which are more likely to happen.

Indeed.
Rubén Justo July 26, 2024, 1:15 p.m. UTC | #7
On Fri, Jul 26, 2024 at 02:45:59AM -0400, Eric Sunshine wrote:
> On Mon, Jul 22, 2024 at 10:46 AM Rubén Justo <rjusto@gmail.com> wrote:
> > On Mon, Jul 22, 2024 at 02:59:13AM -0400, Eric Sunshine wrote:
> > > -     /^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
> > > +     /\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and
> >
> > Losing "^\s*" means we'll cause false positives, such as:
> >
> >     # VAR=VAL shell-func
> >     echo VAR=VAL shell-func
> 
> True, though, considering that "shell-func" in these examples must
> match the name of a function actually defined in one of the input
> files, one would expect (or at least hope) that this sort of
> false-positive will be exceedingly rare. Indeed, there are no such
> false-positives in the existing test scripts. Of course, we can always
> tighten the regex later if it proves to be problematic.
> 
> > Regardless of that, the regex will continue to pose problems with:
> >
> >   VAR=$OTHER_VALUE shell-func
> >   VAR=$(cmd) shell-func
> >   VAR=VAL\ UE shell-func
> >   VAR="\"val\" shell-func UE" non-shell-func
> >
> > Which, of course, should be cases that should be written in a more
> > orthodox way.
> 
> Yes, it can be difficult to be thorough when "linting" a programming
> language merely via regular-expressions, and this particular
> expression is already almost unreadable. The effort involved in trying
> to make it perfect may very well outweigh the potential gain in
> coverage.

I tried to be exhaustive in the analysis of the change and explicit in
the conclusions so that it is clear, and documented in the list, that we
acknowledge the magnitude of the change.

I agree.  I don't think it's worth refining the regex any further.  It
might even be counterproductive.  It covers the cases it was already
covering and the new ones that have occurred.

A simple 'Acked-by: Rubén Justo <rjusto@gmail.com>' didn't seem
sufficient to me :), but perhaps it would have been clearer.

I have the same positive opinion of your new iteration:
20240722065915.80760-5-ericsunshine@charter.net

> 
> > But we will start to detect errors like the ones mentioned in the
> > message, which are more likely to happen.
> 
> Indeed.
diff mbox series

Patch

diff --git a/t/check-non-portable-shell.pl b/t/check-non-portable-shell.pl
index b2b28c2ced..44b23d6ddd 100755
--- a/t/check-non-portable-shell.pl
+++ b/t/check-non-portable-shell.pl
@@ -49,7 +49,7 @@  sub err {
 	/\bexport\s+[A-Za-z0-9_]*=/ and err '"export FOO=bar" is not portable (use FOO=bar && export FOO)';
 	/\blocal\s+[A-Za-z0-9_]*=\$([A-Za-z0-9_{]|[(][^(])/ and
 		err q(quote "$val" in 'local var=$val');
-	/^\s*([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and exists($func{$4}) and
+	/\b([A-Z0-9_]+=(\w*|(["']).*?\3)\s+)+(\w+)/ and !/test_env.+=/ and exists($func{$4}) and
 		err '"FOO=bar shell_func" assignment extends beyond "shell_func"';
 	$line = '';
 	# this resets our $. for each file