Message ID: a99a45cb6f2cc7328ef0e52fc2ea8fec537bfba9.1574884302.git.liu.denton@gmail.com (mailing list archive)
State: New, archived
Series: t: test cleanup stemming from experimentally enabling pipefail
Denton Liu <liu.denton@gmail.com> writes:

> The code to test that objects were not missing from the packfile was
> duplicated many times. Extract the duplicated code into
> test_no_missing_in_packs() and use that instead.
>
> Refactor the resulting extraction so that if any git commands fail,
> their return codes are not silently lost.
>
> We were using sed to filter lines. Although not incorrect, this is
> exactly what grep is built for. Replace this invocation of sed with grep
> so that we use the correct tool for the job.

Well,

	$ sed -n -e 's/required match/desired part of the line/p'

is much much more appropriate than

	$ grep -e "required match" |
	  extract desired part of the line

"grep" is better only if the original were

	$ sed -n -e '/required match/p'

but everybody would write it with grep to begin with ;-)

So, I dunno about this part of the conversion.

> Instead of verifying each file of `alt_objects/pack/*.idx` individually
> in a for-loop, batch them together into one verification step.

Do you mean this one?

	git verify-pack -v alt_objects/pack/*.idx

where we may pass one or more .idx files to the command? At first my
reading was interrupted by a "Huh?", but that does look good. We'd
need to be a bit careful to make sure that we have at least one .idx
file, as the shell will happily feed a file whose name is "*.idx",
which verify-pack would be unhappy about.

> The original testing construct was O(n^2): it used a grep in a loop to
> test whether any objects were missing in the packfile. Rewrite this to
> sort the files then use `comm -23` so that finding missing lines from
> the original file is done more efficiently.

OK. If we can show measurable speedups, it would be great, but the
loop structure does look O(n^2) and unnecessarily costly.

> +test_no_missing_in_packs () {
> +	myidx=$(ls -1 .git/objects/pack/*.idx) &&
> +	test_path_is_file "$myidx" &&

If there are 2 or more .idx files, or if there is none, $myidx would
hopefully be a concatenation of these filenames or a string that ends
with asterisk-dot-idx and would fail path_is_file. Sounds OK.

Ah, I do not have to review this part---these are repeated patterns
in the original.

> +	git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
> +	grep "^[0-9a-f]\{40\}" orig.raw | cut -d" " -f1 | sort >orig &&

If output from 'grep' can be used as-is, it is worth doing, but if
you have to pipe it to cut, the original that used sed to filter and
edit the line would probably be a better way to write it.

> +	git verify-pack -v $myidx >dest.raw &&

This part does not quote "$myidx" (inherited from the original); it
probably is OK, as any potentially problematic value in $myidx would
have been caught as an error much earlier in this test.
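As an illustration of the unmatched-glob concern raised above, here is a
minimal sketch (the `set --`/`test -e` spelling is not part of the patch
under review; it only shows one way the failure mode could be caught):

	# If no .idx file exists, the pattern does not expand and "$1" is the
	# literal string "alt_objects/pack/*.idx"; catch that before handing
	# the arguments to verify-pack.
	set -- alt_objects/pack/*.idx
	test -e "$1" || { echo >&2 "no .idx files to verify"; exit 1; }
	git verify-pack -v "$@" >orig.raw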
Hi Junio,

On Fri, Nov 29, 2019 at 01:39:30PM -0800, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
>
> > The code to test that objects were not missing from the packfile was
> > duplicated many times. Extract the duplicated code into
> > test_no_missing_in_packs() and use that instead.
> >
> > Refactor the resulting extraction so that if any git commands fail,
> > their return codes are not silently lost.
> >
> > We were using sed to filter lines. Although not incorrect, this is
> > exactly what grep is built for. Replace this invocation of sed with grep
> > so that we use the correct tool for the job.
>
> Well,
>
> 	$ sed -n -e 's/required match/desired part of the line/p'
>
> is much much more appropriate than
>
> 	$ grep -e "required match" |
> 	  extract desired part of the line
>
> "grep" is better only if the original were
>
> 	$ sed -n -e '/required match/p'
>
> but everybody would write it with grep to begin with ;-)

This was what I was intending. It was originally written like the above
and it made sense to convert it to use grep. I guess "filter lines" in
my commit message is a little bit vague. Could we change this to "filter
matching lines" perhaps?

>
> So, I dunno about this part of the conversion.
>
> > Instead of verifying each file of `alt_objects/pack/*.idx` individually
> > in a for-loop, batch them together into one verification step.
>
> Do you mean this one?
>
> 	git verify-pack -v alt_objects/pack/*.idx
>
> where we may pass one or more .idx files to the command? At first my
> reading was interrupted by a "Huh?", but that does look good. We'd
> need to be a bit careful to make sure that we have at least one .idx
> file, as the shell will happily feed a file whose name is "*.idx",
> which verify-pack would be unhappy about.
>
> > The original testing construct was O(n^2): it used a grep in a loop to
> > test whether any objects were missing in the packfile. Rewrite this to
> > sort the files then use `comm -23` so that finding missing lines from
> > the original file is done more efficiently.
>
> OK. If we can show measurable speedups, it would be great, but the
> loop structure does look O(n^2) and unnecessarily costly.
>
> > +test_no_missing_in_packs () {
> > +	myidx=$(ls -1 .git/objects/pack/*.idx) &&
> > +	test_path_is_file "$myidx" &&
>
> If there are 2 or more .idx files, or if there is none, $myidx would
> hopefully be a concatenation of these filenames or a string that ends
> with asterisk-dot-idx and would fail path_is_file. Sounds OK.
>
> Ah, I do not have to review this part---these are repeated patterns
> in the original.
>
> > +	git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
> > +	grep "^[0-9a-f]\{40\}" orig.raw | cut -d" " -f1 | sort >orig &&
>
> If output from 'grep' can be used as-is, it is worth doing, but if
> you have to pipe it to cut, the original that used sed to filter and
> edit the line would probably be a better way to write it.

The original sed actually only filtered the line; no editing done. The
cut invocation was a consequence of using comm. Previously, in the while
loop, we would separate the line into `sha1` and `rest` components and
only match using the `sha1`. Since we use comm now, we have to use cut
to grab the sha1 and omit the rest of the line.

We could rewrite it with sed like this:

	sed -n -e "/^[0-9a-f]\{40\}/s/^\([0-9a-f]\{40\}\).*/\1/p" orig.raw

but I believe that breaking it into grep and cut makes the intent much
more clear. What do you think?
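To make the comm/cut point above concrete, a tiny sketch with made-up
object names (not real verify-pack output):

	# verify-pack -v prints "sha1 type size ..." per object; comm compares
	# whole lines, so everything after the first field has to be stripped
	# before the two sorted lists can be compared.
	printf '%s\n' "1111 commit 200 150 1" "2222 blob 10 9 2" |
	cut -d" " -f1 | sort >orig
	printf '%s\n' "1111" | sort >dest
	comm -23 orig dest	# prints "2222": listed in orig, missing from dest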
(By the way, if I were to reroll this series, should I keep sending out
the entire patchset? It feels very noisy to send out 20-something emails
every reroll when I'm just making a small one or two line change.)

Thanks,

Denton

> > +	git verify-pack -v $myidx >dest.raw &&
>
> This part does not quote "$myidx" (inherited from the original); it
> probably is OK, as any potentially problematic value in $myidx would
> have been caught as an error much earlier in this test.
Denton Liu <liu.denton@gmail.com> writes:

> Hi Junio,
> ...
>> "grep" is better only if the original were
>>
>> 	$ sed -n -e '/required match/p'
>>
>> but everybody would write it with grep to begin with ;-)
>
> This was what I was intending. It was originally written like the above
> and it made sense to convert it to use grep. I guess "filter lines" in
> my commit message is a little bit vague. Could we change this to "filter
> matching lines" perhaps?

Ah, I see. I somehow thought that some of the "sed" invocations in the
original version were doing "find lines and filter their contents"
(i.e. "-n -e 's/find/munge/p'"), but all three of them are just "find
lines" (i.e. "-n -e '/find/p'"). So I think the change made by the
patch is OK.

I think I was reacting to the output of "grep" being piped to "cut".
IOW, the original

	... |
	sed -n -e '/find/p' |
	while read sha1 garbage
	do
		... use sha1 ...

was rewritten to

	... >raw &&
	grep -e 'find' raw | cut -d" " -f1 >orig
	... use orig as a list of sha1s ...

But the "grep piped to cut" can be a single process

	... >raw &&
	sed -n -e 's/\(find\).*/\1/p' raw >orig
	... use orig as a list of sha1s ...

So in the tiniest picture, turning "sed -n -e /find/p" into "grep" is
not wrong per-se, but if you step back a bit and see a larger picture,
using "sed" a bit more effectively turns out to be still a better
rewrite.

... and I wrote the above before I read the remainder of your response,
where you considered which one is easier to read between "grep piped to
cut" and "sed" ;-)

> (By the way, if I were to reroll this series, should I keep sending out
> the entire patchset? It feels very noisy to send out 20-something emails
> every reroll when I'm just making a small one or two line change.)

Especially if it is near the end of the series, just a single step is
OK. But is there anything that is glaringly wrong that needs a reroll?
Or would it be "this is good enough, so let's have them cook in 'next'
and graduate to 'master'---further clean-up can be done after all the
dust settles"? I have an impression that we reached the latter by now.

Thanks.
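For what it is worth, on a sample input line the two spellings compared
above do extract the same field (the object name below is made up):

	line='0123456789abcdef0123456789abcdef01234567 commit 237 123 12'
	echo "$line" | grep "^[0-9a-f]\{40\}" | cut -d" " -f1
	echo "$line" | sed -n -e 's/^\([0-9a-f]\{40\}\).*/\1/p'
	# both pipelines print 0123456789abcdef0123456789abcdef01234567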
Hi Junio,

On Mon, Dec 02, 2019 at 02:53:09PM -0800, Junio C Hamano wrote:
> > (By the way, if I were to reroll this series, should I keep sending out
> > the entire patchset? It feels very noisy to send out 20-something emails
> > every reroll when I'm just making a small one or two line change.)
>
> Especially if it is near the end of the series, just a single step is
> OK. But is there anything that is glaringly wrong that needs a reroll?
> Or would it be "this is good enough, so let's have them cook in 'next'
> and graduate to 'master'---further clean-up can be done after all the
> dust settles"? I have an impression that we reached the latter by now.

Perhaps the log message could use some improvement to document the
discussion we had? I don't know if that's worth a reroll, though. Aside
from that, I agree that it's ready for 'next'.

>
> Thanks.
Denton Liu <liu.denton@gmail.com> writes:

>> Especially if it is near the end of the series, just a single step is
>> OK. But is there anything that is glaringly wrong that needs a reroll?
>> Or would it be "this is good enough, so let's have them cook in 'next'
>> and graduate to 'master'---further clean-up can be done after all the
>> dust settles"? I have an impression that we reached the latter by now.
>
> Perhaps the log message could use some improvement to document the
> discussion we had? I don't know if that's worth a reroll, though. Aside
> from that, I agree that it's ready for 'next'.

Sure, let's see what you have in mind.

Thanks for working on this.
Hi Junio,

On Tue, Dec 03, 2019 at 07:41:19AM -0800, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
>
> >> Especially if it is near the end of the series, just a single step is
> >> OK. But is there anything that is glaringly wrong that needs a reroll?
> >> Or would it be "this is good enough, so let's have them cook in 'next'
> >> and graduate to 'master'---further clean-up can be done after all the
> >> dust settles"? I have an impression that we reached the latter by now.
> >
> > Perhaps the log message could use some improvement to document the
> > discussion we had? I don't know if that's worth a reroll, though. Aside
> > from that, I agree that it's ready for 'next'.
>
> Sure, let's see what you have in mind.

Here's a complete replacement for the commit message:

t7700: consolidate code into test_no_missing_in_packs()

The code to test that objects were not missing from the packfile was
duplicated many times. Extract the duplicated code into
test_no_missing_in_packs() and use that instead.

Refactor the resulting extraction so that if any git commands fail,
their return codes are not silently lost.

We were using sed to filter lines. Although not incorrect, this is
exactly what grep is built for. Replace this invocation of sed with grep
so that we use the correct tool for the job.

Instead of verifying each file of `alt_objects/pack/*.idx` individually
in a for-loop, batch them together into one verification step.

The original testing construct was O(n^2): it used a grep in a loop to
test whether any objects were missing in the packfile. Rewrite this to
sort the files then use `comm -23` so that finding missing lines from
the original file is done more efficiently.

The result of this is that we end up with a `grep | cut | sort`
pipeline. Previously, we were extracting the `sha1` as part of the
`while read sha1 rest` loop. Since we removed the while-loop, we need to
use `cut` to extract the `sha1` field. Note that we could have chosen to
combine the `grep | cut` into a single `sed` invocation but we
consciously leave it separate as it makes the intent more clear.

While we're at it, add a space to `commit_and_pack ()` for style.

Signed-off-by: Denton Liu <liu.denton@gmail.com>

The only change between this and the old commit message is the addition
of the "The result of this..." paragraph.

Thanks,

Denton

>
> Thanks for working on this.
Denton Liu <liu.denton@gmail.com> writes:

>> Sure, let's see what you have in mind.
>
> Here's a complete replacement for the commit message:
>
> t7700: consolidate code into test_no_missing_in_packs()
>
> The code to test that objects were not missing from the packfile was
> duplicated many times. Extract the duplicated code into
> test_no_missing_in_packs() and use that instead.
>
> Refactor the resulting extraction so that if any git commands fail,
> their return codes are not silently lost.
>
> We were using sed to filter lines. Although not incorrect, this is
> exactly what grep is built for. Replace this invocation of sed with grep
> so that we use the correct tool for the job.
>
> Instead of verifying each file of `alt_objects/pack/*.idx` individually
> in a for-loop, batch them together into one verification step.
>
> The original testing construct was O(n^2): it used a grep in a loop to
> test whether any objects were missing in the packfile. Rewrite this to
> sort the files then use `comm -23` so that finding missing lines from
> the original file is done more efficiently.
>
> The result of this is that we end up with a `grep | cut | sort`
> pipeline. Previously, we were extracting the `sha1` as part of the
> `while read sha1 rest` loop. Since we removed the while-loop, we need to
> use `cut` to extract the `sha1` field. Note that we could have chosen to
> combine the `grep | cut` into a single `sed` invocation but we
> consciously leave it separate as it makes the intent more clear.
>
> While we're at it, add a space to `commit_and_pack ()` for style.
>
> Signed-off-by: Denton Liu <liu.denton@gmail.com>
>
> The only change between this and the old commit message is the addition
> of the "The result of this..." paragraph.

Ah, you were planning to only update the log message? Then let's not
bother. I do not think we would want to encourage "grep piped to cut"
as a good pattern for others to follow, and a single sed that finds the
relevant lines and munges the content of these lines into what is
desired conveys the intent clearly and more concisely (I was hoping
that that was what you had in mind for a reroll).
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 1d14ddcbdb..4bcd9fcc80 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -4,12 +4,23 @@ test_description='git repack works correctly'
 
 . ./test-lib.sh
 
-commit_and_pack() {
+commit_and_pack () {
 	test_commit "$@" 1>&2 &&
 	SHA1=$(git pack-objects --all --unpacked --incremental .git/objects/pack/pack </dev/null) &&
 	echo pack-${SHA1}.pack
 }
 
+test_no_missing_in_packs () {
+	myidx=$(ls -1 .git/objects/pack/*.idx) &&
+	test_path_is_file "$myidx" &&
+	git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
+	grep "^[0-9a-f]\{40\}" orig.raw | cut -d" " -f1 | sort >orig &&
+	git verify-pack -v $myidx >dest.raw &&
+	cut -d" " -f1 dest.raw | sort >dest &&
+	comm -23 orig dest >missing &&
+	test_must_be_empty missing
+}
+
 test_expect_success 'objects in packs marked .keep are not repacked' '
 	echo content1 >file1 &&
 	echo content2 >file2 &&
@@ -105,19 +116,7 @@ test_expect_success 'packed obs in alt ODB are repacked even when local repo is
 	mkdir alt_objects/pack &&
 	mv .git/objects/pack/* alt_objects/pack &&
 	git repack -a &&
-	myidx=$(ls -1 .git/objects/pack/*.idx) &&
-	test_path_is_file "$myidx" &&
-	for p in alt_objects/pack/*.idx
-	do
-		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
-	done | while read sha1 rest
-	do
-		if ! ( git verify-pack -v $myidx | grep "^$sha1" )
-		then
-			echo "Missing object in local pack: $sha1"
-			return 1
-		fi
-	done
+	test_no_missing_in_packs
 '
 
 test_expect_success 'packed obs in alt ODB are repacked when local repo has packs' '
@@ -128,19 +127,7 @@ test_expect_success 'packed obs in alt ODB are repacked when local repo has pack
 	git commit -m more_content &&
 	git repack &&
 	git repack -a -d &&
-	myidx=$(ls -1 .git/objects/pack/*.idx) &&
-	test_path_is_file "$myidx" &&
-	for p in alt_objects/pack/*.idx
-	do
-		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
-	done | while read sha1 rest
-	do
-		if ! ( git verify-pack -v $myidx | grep "^$sha1" )
-		then
-			echo "Missing object in local pack: $sha1"
-			return 1
-		fi
-	done
+	test_no_missing_in_packs
 '
 
 test_expect_success 'packed obs in alternate ODB kept pack are repacked' '
@@ -156,19 +143,7 @@ test_expect_success 'packed obs in alternate ODB kept pack are repacked' '
 	fi
 	done &&
 	git repack -a -d &&
-	myidx=$(ls -1 .git/objects/pack/*.idx) &&
-	test_path_is_file "$myidx" &&
-	for p in alt_objects/pack/*.idx
-	do
-		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
-	done | while read sha1 rest
-	do
-		if ! ( git verify-pack -v $myidx | grep "^$sha1" )
-		then
-			echo "Missing object in local pack: $sha1"
-			return 1
-		fi
-	done
+	test_no_missing_in_packs
 '
 
 test_expect_success 'packed unreachable obs in alternate ODB are not loosened' '
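For comparison with the patch above, this is roughly how the extraction
step could look with the single sed invocation Junio prefers (a sketch
only, not the version posted here; it relies on the same test-lib
helpers, test_path_is_file and test_must_be_empty, that the patch uses):

	test_no_missing_in_packs () {
		myidx=$(ls -1 .git/objects/pack/*.idx) &&
		test_path_is_file "$myidx" &&
		git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
		# one sed call both selects the object lines and keeps only the name
		sed -n -e "s/^\([0-9a-f]\{40\}\).*/\1/p" orig.raw | sort >orig &&
		git verify-pack -v $myidx >dest.raw &&
		cut -d" " -f1 dest.raw | sort >dest &&
		comm -23 orig dest >missing &&
		test_must_be_empty missing
	}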
The code to test that objects were not missing from the packfile was
duplicated many times. Extract the duplicated code into
test_no_missing_in_packs() and use that instead.

Refactor the resulting extraction so that if any git commands fail,
their return codes are not silently lost.

We were using sed to filter lines. Although not incorrect, this is
exactly what grep is built for. Replace this invocation of sed with grep
so that we use the correct tool for the job.

Instead of verifying each file of `alt_objects/pack/*.idx` individually
in a for-loop, batch them together into one verification step.

The original testing construct was O(n^2): it used a grep in a loop to
test whether any objects were missing in the packfile. Rewrite this to
sort the files then use `comm -23` so that finding missing lines from
the original file is done more efficiently.

While we're at it, add a space to `commit_and_pack ()` for style.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 t/t7700-repack.sh | 55 +++++++++++++----------------------------------
 1 file changed, 15 insertions(+), 40 deletions(-)