diff mbox series

[v2,3/3] t6421: fix test to work when repo dir contains d0

Message ID 818dc9e6b3e8a4d449cb9dbce689bfadb95099ff.1722632287.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Small fixes for issues detected during internal CI runs | expand

Commit Message

Kyle Lippincott Aug. 2, 2024, 8:58 p.m. UTC
From: Kyle Lippincott <spectral@google.com>

The `grep` statement in this test looks for `d0.*<string>`, attempting
to filter to only show lines that had tabular output where the 2nd
column had `d0` and the final column had a substring of
[`git -c `]`fetch.negotiationAlgorithm`. These lines also have
`child_start` in the 4th column, but this isn't part of the condition.

A subsequent line will have `d1` in the 2nd column, `start` in the 4th
column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
column. If `/path/to/git/git` contains the substring `d0`, then this
line is included by `grep` as well as the desired line, leading to an
effective doubling of the number of lines, and test failures.

Tighten the grep expression to require `d0` to be surrounded by spaces,
and to have the `child_start` label.

Signed-off-by: Kyle Lippincott <spectral@google.com>
---
 t/t6421-merge-partial-clone.sh | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Junio C Hamano Aug. 2, 2024, 9:41 p.m. UTC | #1
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Kyle Lippincott <spectral@google.com>
>
> The `grep` statement in this test looks for `d0.*<string>`, attempting
> to filter to only show lines that had tabular output where the 2nd
> column had `d0` and the final column had a substring of
> [`git -c `]`fetch.negotiationAlgorithm`. These lines also have
> `child_start` in the 4th column, but this isn't part of the condition.
>
> A subsequent line will have `d1` in the 2nd column, `start` in the 4th
> column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
> column. If `/path/to/git/git` contains the substring `d0`, then this
> line is included by `grep` as well as the desired line, leading to an
> effective doubling of the number of lines, and test failures.
>
> Tighten the grep expression to require `d0` to be surrounded by spaces,
> and to have the `child_start` label.

OK.

I think I actually misinterpreted what you meant with these changes.
It is not what the patterns are picking.  It is some _other_ trace
entry we do not necessarily care about, like label:do_write_index
that has the path to the .git/index.lock file, that can accidentally
contain d0, that can be picked up with a pattern that is too loose.
So it really didn't have to clarify what it is looking for, as it
would not help seeing what false positives the patterns are designed
to avoid matching.  Sorry about that.

Will queue.
Kyle Lippincott Aug. 3, 2024, 12:03 a.m. UTC | #2
On Fri, Aug 2, 2024 at 2:41 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Kyle Lippincott <spectral@google.com>
> >
> > The `grep` statement in this test looks for `d0.*<string>`, attempting
> > to filter to only show lines that had tabular output where the 2nd
> > column had `d0` and the final column had a substring of
> > [`git -c `]`fetch.negotiationAlgorithm`. These lines also have
> > `child_start` in the 4th column, but this isn't part of the condition.
> >
> > A subsequent line will have `d1` in the 2nd column, `start` in the 4th
> > column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final
> > column. If `/path/to/git/git` contains the substring `d0`, then this
> > line is included by `grep` as well as the desired line, leading to an
> > effective doubling of the number of lines, and test failures.
> >
> > Tighten the grep expression to require `d0` to be surrounded by spaces,
> > and to have the `child_start` label.
>
> OK.
>
> I think I actually misinterpreted what you meant with these changes.
> It is not what the patterns are picking.  It is some _other_ trace
> entry we do not necessarily care about, like label:do_write_index
> that has the path to the .git/index.lock file, that can accidentally
> contain d0, that can be picked up with a pattern that is too loose.
> So it really didn't have to clarify what it is looking for, as it
> would not help seeing what false positives the patterns are designed
> to avoid matching.  Sorry about that.

I would have included examples, but they're quite long (>>>80 chars),
so seemed very out of place in both commit description and in the
codebase. With line wrapping, it wasn't very readable either. At the
risk of this also getting line-wrapped into unreadability:

test_line_count: line count for fetches != 1
23:59:48.794453 run-command.c:733            | d0 | main
      | child_start  |     |  0.027328 |           |              |
..........[ch1] class:? argv:[git -c fetch.negotiationAlgorithm=noop
fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no
--filter=blob:none --stdin]
23:59:48.798901 common-main.c:58             | d1 | main
      | start        |     |  0.000852 |           |              |
/usr/local/google/home/spectral/src/oss/d0/git/git -c
fetch.negotiationAlgorithm=noop fetch origin --no-tags
--no-write-fetch-head --recurse-submodules=no --filter=blob:none
--stdin

where each line in the `fetches` file starts with `23:59:48` here.
It's 9 columns, separated by `|` characters, and the line we don't
want is the second one; the regex `d0.*fetch.negotiationAlgorithm`
includes it because of the `d0` in the path.

I considered using `awk -F"|" "\$2~/d0/ &&
\$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`,
but it was longer, possibly less clear, and less specific (since it
didn't include the $4~/child_start/ condition)

>
> Will queue.
>
>
Junio C Hamano Aug. 3, 2024, 12:27 a.m. UTC | #3
Kyle Lippincott <spectral@google.com> writes:

>> So it really didn't have to clarify what it is looking for, as it
>> would not help seeing what false positives the patterns are designed
>> to avoid matching.  Sorry about that.
>
> I would have included examples, but they're quite long (>>>80 chars),
> so seemed very out of place in both commit description and in the
> codebase.

Absolutely.  It turned out not to be so useful to show the shape of
potential matches, like this one:

 ... run-command.c:733            | d0 | main      | child_start  |...

To explain why spaces around " d0 " matters, the readers need to
understand that other trace entries that are irrelevant for our
purpose, like this one

    ... | label:do_write_index /path/to/t/trash directory.../.git/index.lock

we want reject, and for that we want the pattern to be specific
enough by looking for " do " that is followed by " child_start ".
Otherwise the leading paths that can contain anything won't easily
match, and the original of looking for just "d0" was way too error
prone.  But it is hard to leave a concise hint for that there.

So, again, sorry about the bad suggestion.

> I considered using `awk -F"|" "\$2~/d0/ &&
> \$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`,
> but it was longer, possibly less clear, and less specific (since it
> didn't include the $4~/child_start/ condition)

Yeah, using the syntactic clue -F"|" would also be a way to convey
the intention (i.e. "we are dealing with tabular output and we
expect nth column to be X"), but what you have is probably good
enough---it certainly is simpler to read and understand.  I briefly
considered that looking for "| d0 |" (i.e. explicitly mentioning the
column separator in the pattern) would make it even more obvious
what we are looking for, but having to worry about quoting "|" in
regexp would negate the benefit of obviousness out of the approach
to use "grep".

Thanks.
diff mbox series

Patch

diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh
index 711b709e755..b99f29ef9ba 100755
--- a/t/t6421-merge-partial-clone.sh
+++ b/t/t6421-merge-partial-clone.sh
@@ -230,8 +230,9 @@  test_expect_merge_algorithm failure success 'Objects downloaded for single relev
 		grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
 		test_cmp expect actual &&
 
-		# Check the number of fetch commands exec-ed
-		grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+		# Check the number of fetch commands exec-ed by filtering trace to
+		# child_start events by the top-level program (2nd field == d0)
+		grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
 		test_line_count = 2 fetches &&
 
 		git rev-list --objects --all --missing=print |
@@ -318,8 +319,9 @@  test_expect_merge_algorithm failure success 'Objects downloaded when a directory
 		grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
 		test_cmp expect actual &&
 
-		# Check the number of fetch commands exec-ed
-		grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+		# Check the number of fetch commands exec-ed by filtering trace to
+		# child_start events by the top-level program (2nd field == d0)
+		grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
 		test_line_count = 1 fetches &&
 
 		git rev-list --objects --all --missing=print |
@@ -422,8 +424,9 @@  test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren
 		grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual &&
 		test_cmp expect actual &&
 
-		# Check the number of fetch commands exec-ed
-		grep d0.*fetch.negotiationAlgorithm trace.output >fetches &&
+		# Check the number of fetch commands exec-ed by filtering trace to
+		# child_start events by the top-level program (2nd field == d0)
+		grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches &&
 		test_line_count = 4 fetches &&
 
 		git rev-list --objects --all --missing=print |