diff mbox series

[v2,1/6] CodingGuidelines: add shell piping guidelines

Message ID c625bfe2205d51b3158ef71e4bf472708642c146.1537223021.git.matvore@google.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/6] CodingGuidelines: add shell piping guidelines | expand

Commit Message

Matthew DeVore Sept. 17, 2018, 10:24 p.m. UTC
Add two guidelines:

 - pipe characters should appear at the end of lines, and not cause
   indentation
 - pipes should be avoided when they swallow exit codes that can
   potentially fail
---
 Documentation/CodingGuidelines | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

Comments

Eric Sunshine Sept. 18, 2018, 12:16 a.m. UTC | #1
On Mon, Sep 17, 2018 at 6:24 PM Matthew DeVore <matvore@google.com> wrote:
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> @@ -163,6 +163,35 @@ For shell scripts specifically (not exhaustive):
> + - In a piped sequence which spans multiple lines, put each statement
> +   on a separate line and put pipes on the end of each line, rather
> +   than the start. This means you don't need to use \ to join lines,
> +   since | implies a join already. Also, do not indent subsequent
> +   lines; if you need a sequence to visually stand apart from the
> +   surrounding code, use a blank line before and/or after the piped
> +   sequence.
> +
> +       (incorrect)
> +       [...]
> +       (correct)
> +       echo '...' > expected

Existing tests seem to favor the name "expect" over "expected", so
perhaps use that instead.

    $ git grep '>expect\b' -- t | wc -l
    2674
    $ git grep '>expected\b' -- t | wc -l
    1406

> +       git ls-files -s file.1 file.2 file.3 file.4 file.5 |
> +       awk '{print $1}' |
> +       sort >observed

This is not a great example since it flatly contradicts the very next
bit of advice added by this patch about not placing a Git command
upstream in a pipe. Perhaps come up with an example which doesn't
suffer this shortcoming.

I've seen the advice earlier in the thread of not indenting the
sub-commands in a pipe, but I find that the result makes it far more
difficult to see which commands are part of the pipe sequence than
with them indented, so I'm not convinced that this advice should be in
the guidelines. (But that just my opinion.)

> + - In a pipe, any non-zero exit codes returned by processes besides
> +   the last will be ignored. If there is any possibility some
> +   non-final command in the pipe will raise an error, prefer writing
> +   the output of that command to a temporary file with '>' rather than
> +   pipe it.

It's not so much that we care about losing a non-zero exit code (which
might be perfectly acceptable depending upon the context) but that we
care about missing a Git command which outright crashes. So, it might
make sense to make this text more specific by saying that ("exit code
indicating a crash" and "Git command") rather than being generic in
saying only "exit code" and "command".

Also, what about expression like $(git foo) by which a crash of a Git
command can also be lost? Do we want to talk about that, as well?
Matthew DeVore Sept. 19, 2018, 2:11 a.m. UTC | #2
On Mon, Sep 17, 2018 at 5:16 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Mon, Sep 17, 2018 at 6:24 PM Matthew DeVore <matvore@google.com> wrote:
> > diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> > @@ -163,6 +163,35 @@ For shell scripts specifically (not exhaustive):
> > + - In a piped sequence which spans multiple lines, put each statement
> > +   on a separate line and put pipes on the end of each line, rather
> > +   than the start. This means you don't need to use \ to join lines,
> > +   since | implies a join already. Also, do not indent subsequent
> > +   lines; if you need a sequence to visually stand apart from the
> > +   surrounding code, use a blank line before and/or after the piped
> > +   sequence.
> > +
> > +       (incorrect)
> > +       [...]
> > +       (correct)
> > +       echo '...' > expected
>
> Existing tests seem to favor the name "expect" over "expected", so
> perhaps use that instead.
>
>     $ git grep '>expect\b' -- t | wc -l
>     2674
>     $ git grep '>expected\b' -- t | wc -l
>     1406
Thank you for clarifying that out for me, but I'm not longer using
that example, so it's moot.

>
> > +       git ls-files -s file.1 file.2 file.3 file.4 file.5 |
> > +       awk '{print $1}' |
> > +       sort >observed
>
> This is not a great example since it flatly contradicts the very next
> bit of advice added by this patch about not placing a Git command
> upstream in a pipe. Perhaps come up with an example which doesn't
> suffer this shortcoming.
Done.

>
> I've seen the advice earlier in the thread of not indenting the
> sub-commands in a pipe, but I find that the result makes it far more
> difficult to see which commands are part of the pipe sequence than
> with them indented, so I'm not convinced that this advice should be in
> the guidelines. (But that just my opinion.)
I'm not totally sure either way, nor do I have a strong opinion. I
agree it's probably better to not codify this in the documentation
until there's a great reason to.

>
> > + - In a pipe, any non-zero exit codes returned by processes besides
> > +   the last will be ignored. If there is any possibility some
> > +   non-final command in the pipe will raise an error, prefer writing
> > +   the output of that command to a temporary file with '>' rather than
> > +   pipe it.
>
> It's not so much that we care about losing a non-zero exit code (which
> might be perfectly acceptable depending upon the context) but that we
> care about missing a Git command which outright crashes. So, it might
> make sense to make this text more specific by saying that ("exit code
> indicating a crash" and "Git command") rather than being generic in
> saying only "exit code" and "command".
Fixed.

>
> Also, what about expression like $(git foo) by which a crash of a Git
> command can also be lost? Do we want to talk about that, as well?
Yes, it's probably better to add a point about that. Here is the new
documentation after applying your suggestions:

 - If a piped sequence which spans multiple lines, put each statement
   on a separate line and put pipes on the end of each line, rather
   than the start. This means you don't need to use \ to join lines,
   since | implies a join already.

        (incorrect)
        grep blob verify_pack_result \
        | awk -f print_1.awk \
        | sort >actual &&
        ...

        (correct)
        grep blob verify_pack_result |
        awk -f print_1.awk |
        sort >actual &&
        ...

 - In a pipe, any exit codes returned by processes besides the last
   are ignored. This means that if git crashes at the beginning or
   middle of a pipe, it may go undetected. Prefer writing the output
   of that command to a temporary file with '>' rather than pipe it.

 - The $(git ...) construct also discards git's exit code, so if the
   goal is to test that particular command, redirect its output to a
   temporary file rather than wrap it with $( ).
Eric Sunshine Sept. 19, 2018, 12:36 p.m. UTC | #3
On Tue, Sep 18, 2018 at 10:11 PM Matthew DeVore <matvore@google.com> wrote:
> Yes, it's probably better to add a point about that. Here is the new
> documentation after applying your suggestions:
>
>  - If a piped sequence which spans multiple lines, put each statement

s/which//

>    on a separate line and put pipes on the end of each line, rather
>    than the start. This means you don't need to use \ to join lines,
>    since | implies a join already.
>         [...]
>  - In a pipe, any exit codes returned by processes besides the last
>    are ignored. This means that if git crashes at the beginning or
>    middle of a pipe, it may go undetected. Prefer writing the output
>    of that command to a temporary file with '>' rather than pipe it.
>
>  - The $(git ...) construct also discards git's exit code, so if the
>    goal is to test that particular command, redirect its output to a
>    temporary file rather than wrap it with $( ).

This all sounds better.
Junio C Hamano Sept. 19, 2018, 2:24 p.m. UTC | #4
Matthew DeVore <matvore@google.com> writes:

> Yes, it's probably better to add a point about that. Here is the new
> documentation after applying your suggestions:
>
>  - If a piped sequence which spans multiple lines, put each statement
>    on a separate line and put pipes on the end of each line, rather
>    than the start. This means you don't need to use \ to join lines,
>    since | implies a join already.
>
>         (incorrect)
>         grep blob verify_pack_result \
>         | awk -f print_1.awk \
>         | sort >actual &&
>         ...
>
>         (correct)
>         grep blob verify_pack_result |
>         awk -f print_1.awk |
>         sort >actual &&
>         ...

The formatting advice to place '|' at the end applies equally to
'&&' and '||' because these three syntactic elements share exactly
the same trait: the shell knows you haven't finished speaking when
it sees them at the end of the line and keeps listening, and humans
would know that too, so there is no need for explicitly continuing
the line with backslash.

Organizationally speaking, I wonder if the above about formatting
would better appear separate from the latter two points that are
about semantics.

>  - In a pipe, any exit codes returned by processes besides the last
>    are ignored. This means that if git crashes at the beginning or
>    middle of a pipe, it may go undetected. Prefer writing the output
>    of that command to a temporary file with '>' rather than pipe it.
>
>  - The $(git ...) construct also discards git's exit code, so if the
>    goal is to test that particular command, redirect its output to a
>    temporary file rather than wrap it with $( ).
Matthew DeVore Sept. 19, 2018, 8:06 p.m. UTC | #5
On Wed, Sep 19, 2018 at 5:36 AM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Tue, Sep 18, 2018 at 10:11 PM Matthew DeVore <matvore@google.com> wrote:
> > Yes, it's probably better to add a point about that. Here is the new
> > documentation after applying your suggestions:
> >
> >  - If a piped sequence which spans multiple lines, put each statement
>
> s/which//
Done.

On Wed, Sep 19, 2018 at 7:24 AM Junio C Hamano <gitster@pobox.com> wrote:
> The formatting advice to place '|' at the end applies equally to
> '&&' and '||' because these three syntactic elements share exactly
> the same trait: the shell knows you haven't finished speaking when
> it sees them at the end of the line and keeps listening, and humans
> would know that too, so there is no need for explicitly continuing
> the line with backslash.
>
I've reworded the text to indicate the advice applies to && and || as well.

> Organizationally speaking, I wonder if the above about formatting
> would better appear separate from the latter two points that are
> about semantics.
>
I moved the formatting point to right under the point about formatting
if statements, which does seem like a more natural progression.

Here is the new patch to summarize the changes (warning: tabs are mangled):

--------------------------------------------------------------------------------

    CodingGuidelines: add shell piping guidelines

    Add two guidelines:

     - pipe characters should appear at the end of lines, and not cause
       indentation
     - pipes should be avoided when they swallow exit codes that can
       potentially fail

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfb..6d265327c 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -118,6 +118,24 @@ For shell scripts specifically (not exhaustive):
                 do this
         fi

+ - If a command sequence joined with && or || or | spans multiple
+   lines, put each command on a separate line and put && and || and |
+   operators at the end of each line, rather than the start. This
+   means you don't need to use \ to join lines, since the above
+   operators imply the sequence isn't finished.
+
+        (incorrect)
+        grep blob verify_pack_result \
+        | awk -f print_1.awk \
+        | sort >actual &&
+        ...
+
+        (correct)
+        grep blob verify_pack_result |
+        awk -f print_1.awk |
+        sort >actual &&
+        ...
+
  - We prefer "test" over "[ ... ]".

  - We do not write the noiseword "function" in front of shell
@@ -163,6 +181,15 @@ For shell scripts specifically (not exhaustive):

    does not have such a problem.

+ - In a piped chain such as "grep blob objects | sort", the exit codes
+   returned by processes besides the last are ignored. This means that
+   if git crashes at the beginning or middle of a chain, it may go
+   undetected. Prefer writing the output of that command to a
+   temporary file with '>' rather than pipe it.
+
+ - The $(git ...) construct also discards git's exit code, so if the
+   goal is to test that particular command, redirect its output to a
+   temporary file rather than wrap it with $( ).

 For C programs:
diff mbox series

Patch

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfb..7f903c1aa 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -163,6 +163,35 @@  For shell scripts specifically (not exhaustive):
 
    does not have such a problem.
 
+ - In a piped sequence which spans multiple lines, put each statement
+   on a separate line and put pipes on the end of each line, rather
+   than the start. This means you don't need to use \ to join lines,
+   since | implies a join already. Also, do not indent subsequent
+   lines; if you need a sequence to visually stand apart from the
+   surrounding code, use a blank line before and/or after the piped
+   sequence.
+
+	(incorrect)
+	echo '...' > expected
+	git ls-files -s file.1 file.2 file.3 file.4 file.5 \
+		| awk '{print $1}' \
+		| sort >observed
+	test_cmp expected actual
+
+	(correct)
+	echo '...' > expected
+
+	git ls-files -s file.1 file.2 file.3 file.4 file.5 |
+	awk '{print $1}' |
+	sort >observed
+
+	test_cmp expected actual
+
+ - In a pipe, any non-zero exit codes returned by processes besides
+   the last will be ignored. If there is any possibility some
+   non-final command in the pipe will raise an error, prefer writing
+   the output of that command to a temporary file with '>' rather than
+   pipe it.
 
 For C programs: