Message ID | 818dc9e6b3e8a4d449cb9dbce689bfadb95099ff.1722632287.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Small fixes for issues detected during internal CI runs | expand |
"Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Kyle Lippincott <spectral@google.com> > > The `grep` statement in this test looks for `d0.*<string>`, attempting > to filter to only show lines that had tabular output where the 2nd > column had `d0` and the final column had a substring of > [`git -c `]`fetch.negotiationAlgorithm`. These lines also have > `child_start` in the 4th column, but this isn't part of the condition. > > A subsequent line will have `d1` in the 2nd column, `start` in the 4th > column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final > column. If `/path/to/git/git` contains the substring `d0`, then this > line is included by `grep` as well as the desired line, leading to an > effective doubling of the number of lines, and test failures. > > Tighten the grep expression to require `d0` to be surrounded by spaces, > and to have the `child_start` label. OK. I think I actually misinterpreted what you meant with these changes. It is not what the patterns are picking. It is some _other_ trace entry we do not necessarily care about, like label:do_write_index that has the path to the .git/index.lock file, that can accidentally contain d0, that can be picked up with a pattern that is too loose. So it really didn't have to clarify what it is looking for, as it would not help seeing what false positives the patterns are designed to avoid matching. Sorry about that. Will queue.
On Fri, Aug 2, 2024 at 2:41 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Kyle Lippincott via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Kyle Lippincott <spectral@google.com> > > > > The `grep` statement in this test looks for `d0.*<string>`, attempting > > to filter to only show lines that had tabular output where the 2nd > > column had `d0` and the final column had a substring of > > [`git -c `]`fetch.negotiationAlgorithm`. These lines also have > > `child_start` in the 4th column, but this isn't part of the condition. > > > > A subsequent line will have `d1` in the 2nd column, `start` in the 4th > > column, and `/path/to/git/git -c fetch.negotiationAlgorihm` in the final > > column. If `/path/to/git/git` contains the substring `d0`, then this > > line is included by `grep` as well as the desired line, leading to an > > effective doubling of the number of lines, and test failures. > > > > Tighten the grep expression to require `d0` to be surrounded by spaces, > > and to have the `child_start` label. > > OK. > > I think I actually misinterpreted what you meant with these changes. > It is not what the patterns are picking. It is some _other_ trace > entry we do not necessarily care about, like label:do_write_index > that has the path to the .git/index.lock file, that can accidentally > contain d0, that can be picked up with a pattern that is too loose. > So it really didn't have to clarify what it is looking for, as it > would not help seeing what false positives the patterns are designed > to avoid matching. Sorry about that. I would have included examples, but they're quite long (>>>80 chars), so seemed very out of place in both commit description and in the codebase. With line wrapping, it wasn't very readable either. At the risk of this also getting line-wrapped into unreadability: test_line_count: line count for fetches != 1 23:59:48.794453 run-command.c:733 | d0 | main | child_start | | 0.027328 | | | ..........[ch1] class:? argv:[git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin] 23:59:48.798901 common-main.c:58 | d1 | main | start | | 0.000852 | | | /usr/local/google/home/spectral/src/oss/d0/git/git -c fetch.negotiationAlgorithm=noop fetch origin --no-tags --no-write-fetch-head --recurse-submodules=no --filter=blob:none --stdin where each line in the `fetches` file starts with `23:59:48` here. It's 9 columns, separated by `|` characters, and the line we don't want is the second one; the regex `d0.*fetch.negotiationAlgorithm` includes it because of the `d0` in the path. I considered using `awk -F"|" "\$2~/d0/ && \$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`, but it was longer, possibly less clear, and less specific (since it didn't include the $4~/child_start/ condition) > > Will queue. > >
Kyle Lippincott <spectral@google.com> writes: >> So it really didn't have to clarify what it is looking for, as it >> would not help seeing what false positives the patterns are designed >> to avoid matching. Sorry about that. > > I would have included examples, but they're quite long (>>>80 chars), > so seemed very out of place in both commit description and in the > codebase. Absolutely. It turned out not to be so useful to show the shape of potential matches, like this one: ... run-command.c:733 | d0 | main | child_start |... To explain why spaces around " d0 " matters, the readers need to understand that other trace entries that are irrelevant for our purpose, like this one ... | label:do_write_index /path/to/t/trash directory.../.git/index.lock we want reject, and for that we want the pattern to be specific enough by looking for " do " that is followed by " child_start ". Otherwise the leading paths that can contain anything won't easily match, and the original of looking for just "d0" was way too error prone. But it is hard to leave a concise hint for that there. So, again, sorry about the bad suggestion. > I considered using `awk -F"|" "\$2~/d0/ && > \$9~/fetch\\.negotiationAlgorithm/{ print }" trace.output >fetches`, > but it was longer, possibly less clear, and less specific (since it > didn't include the $4~/child_start/ condition) Yeah, using the syntactic clue -F"|" would also be a way to convey the intention (i.e. "we are dealing with tabular output and we expect nth column to be X"), but what you have is probably good enough---it certainly is simpler to read and understand. I briefly considered that looking for "| d0 |" (i.e. explicitly mentioning the column separator in the pattern) would make it even more obvious what we are looking for, but having to worry about quoting "|" in regexp would negate the benefit of obviousness out of the approach to use "grep". Thanks.
diff --git a/t/t6421-merge-partial-clone.sh b/t/t6421-merge-partial-clone.sh index 711b709e755..b99f29ef9ba 100755 --- a/t/t6421-merge-partial-clone.sh +++ b/t/t6421-merge-partial-clone.sh @@ -230,8 +230,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded for single relev grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 2 fetches && git rev-list --objects --all --missing=print | @@ -318,8 +319,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded when a directory grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 1 fetches && git rev-list --objects --all --missing=print | @@ -422,8 +424,9 @@ test_expect_merge_algorithm failure success 'Objects downloaded with lots of ren grep fetch_count trace.output | cut -d "|" -f 9 | tr -d " ." >actual && test_cmp expect actual && - # Check the number of fetch commands exec-ed - grep d0.*fetch.negotiationAlgorithm trace.output >fetches && + # Check the number of fetch commands exec-ed by filtering trace to + # child_start events by the top-level program (2nd field == d0) + grep " d0 .* child_start .*fetch.negotiationAlgorithm" trace.output >fetches && test_line_count = 4 fetches && git rev-list --objects --all --missing=print |