Message ID | 20200709203336.GA2748777@coredump.intra.peff.net (mailing list archive) |
---|---|
Headers | show |
Series | deterministic commit timestamps in tests | expand |
Jeff King <peff@peff.net> writes: > Most tests use test_tick or test_commit to get deterministic timestamps > in commits. Some don't because they don't care about the timestamps. But > they can still be annoying to debug, since the output changes from run > to run. > > This series sets a default timestamp for those scripts, in a way that > doesn't interfere with anybody using test_tick. The change is in the > final commit. The others are preparatory to deal with fallout in the > test suite. Normally fallout would give me pause about whether this is a > good idea, but a) there's not much of it and b) I think it helped shake > out questionable assumptions in those tests. > > This a replacement for the patches being discussed in: > > https://lore.kernel.org/git/pull.816.git.git.1594149804313.gitgitgadget@gmail.com/ > > It looks like Junio already picked up one of the fixes as > jk/t6000-timestamp-fix. The first patch here is identical (so we can > either drop that branch, or drop the first patch from this series and > apply on top). > > [1/4]: t6000: use test_tick consistently > [2/4]: t9700: loosen ident timezone regex > [3/4]: t5539: make timestamp requirements more explicit > [4/4]: test-lib: set deterministic default author/committer date > > t/t5539-fetch-http-shallow.sh | 4 +++- > t/t6000-rev-list-misc.sh | 7 +++++-- > t/t9700/test.pl | 6 +++--- > t/test-lib.sh | 3 +++ > 4 files changed, 14 insertions(+), 6 deletions(-) > > -Peff I have this queued on top for today's integration run. The last step is something worth doing in the longer term, but certainly not for the upcoming release ;-). -- >8 -- Subject: [PATCH] BANDAID: t9100.17 & t9100.18 --- t/t9100-git-svn-basic.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh index 2c309a57d9..33ea813585 100755 --- a/t/t9100-git-svn-basic.sh +++ b/t/t9100-git-svn-basic.sh @@ -8,6 +8,8 @@ GIT_SVN_LC_ALL=${LC_ALL:-$LANG} . ./lib-git-svn.sh +unset GIT_AUTHOR_DATE GIT_COMMITTER_DATE + case "$GIT_SVN_LC_ALL" in *.UTF-8) test_set_prereq UTF8
On Fri, Jul 10, 2020 at 03:02:01PM -0700, Junio C Hamano wrote: > I have this queued on top for today's integration run. The last > step is something worth doing in the longer term, but certainly not > for the upcoming release ;-). > > -- >8 -- > Subject: [PATCH] BANDAID: t9100.17 & t9100.18 Oof, thanks for catching. I don't usually have subversion on my system at all. I have been trying to pay more attention to our CI, but I have another topic I've been working on that has been causing failures in my integration runs, so I didn't notice this one. It took me a while to untangle just what's happening in the test, but I think your "unset" workaround is actually what we want. The patch below could go second-to-last in jk/tests-timestamp-fix, before the actual switch to setting GIT_COMMITTER_DATE. -- >8 -- Subject: [PATCH] t9100: explicitly unset GIT_COMMITTER_DATE The early part of t9100 creates an unusual "doubled" history in the "git-svn" ref. When we get to t9100.17, it looks like this: $ git log --oneline --graph git-svn [...] * efd0303 detect node change from file to directory #2 |\ * | 3e727c0 detect node change from file to directory #2 |/ * 3b00468 try a deep --rmdir with a commit |\ * | b4832d8 try a deep --rmdir with a commit |/ * f0d7bd5 import for git svn Each commit we make with "git commit" is paired with one from "git svn set-tree", with the latter as a merge of the first and its grandparent. Later, t9100.17 wants to check that "git svn fetch" gets the same trees. And it does, but just one copy of each. So it uses rev-list to get the tree of each commit and pipes it to "uniq" to drop the duplicates. Our input isn't sorted, but it will find adjacent duplicates. This works reliably because the order of commits from rev-list always shows the duplicates next to each other. For any one of those merges, we could choose to show its duplicate or the grandparent first. But barring clocks running backwards, the duplicate will always have a time equal to or greater than the grandparent. Even if equal, we break ties by showing the first-parent first, so the duplicates remain adjacent. But this would break if the timestamps stopped moving in chronological order. Normally we would rely on test_tick for this, but we have _two_ sources of time here: - "git commit" creates one commit based on GIT_COMMITTER_DATE (which respects test_tick) - the "svn set-tree" one is based on subversion, which does not have an easy way to specify a timestamp So using test_tick actually breaks the test, because now the duplicates are far in the past, and we'll show the grandparent before the duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in all scripts will break it. We _could_ fix this by sorting before removing duplicates, but presumably it's a useful part of the test to make sure the trees appear in the same order in both spots. Likewise, we could use something like: perl -ne 'print unless $seen{$_}++' to remove duplicates without impacting the order. But that doesn't work either, because there are actually multiple (non-duplicate) commits with the same trees (we change a file mode and then change it back). So we'd actually have to de-duplicate the combination of subject and tree. Which then further throws off t9100.18, which compares the tree hashes exactly; we'd have to strip the result back down. Since this test _isn't_ buggy, the simplest thing is to just work around the proposed change by documenting our expectation that git-created commits are correctly interleaved using the current time. Signed-off-by: Jeff King <peff@peff.net> --- t/t9100-git-svn-basic.sh | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh index 9f2d19ecc4..b80952f0ac 100755 --- a/t/t9100-git-svn-basic.sh +++ b/t/t9100-git-svn-basic.sh @@ -8,6 +8,10 @@ GIT_SVN_LC_ALL=${LC_ALL:-$LANG} . ./lib-git-svn.sh +# Make sure timestamps of commits created by Git interleave +# with those created by "svn set-tree". +unset GIT_COMMITTER_DATE + case "$GIT_SVN_LC_ALL" in *.UTF-8) test_set_prereq UTF8
On Tue, Jul 14, 2020 at 08:31:42AM -0400, Jeff King wrote: > We _could_ fix this by sorting before removing duplicates, but > presumably it's a useful part of the test to make sure the trees appear > in the same order in both spots. Likewise, we could use something like: > > perl -ne 'print unless $seen{$_}++' > > to remove duplicates without impacting the order. But that doesn't work > either, because there are actually multiple (non-duplicate) commits with > the same trees (we change a file mode and then change it back). So we'd > actually have to de-duplicate the combination of subject and tree. Which > then further throws off t9100.18, which compares the tree hashes > exactly; we'd have to strip the result back down. Actually, that last one isn't _too_ bad. It looks something like this: diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh index b80952f0ac..4502c5f97d 100755 --- a/t/t9100-git-svn-basic.sh +++ b/t/t9100-git-svn-basic.sh @@ -204,8 +204,10 @@ GIT_SVN_ID=alt export GIT_SVN_ID test_expect_success "$name" \ 'git svn init "$svnrepo" && git svn fetch && - git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a && - git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b && + git log --format="tree %T %s" remotes/git-svn | + perl -lne "print unless \$seen{\$_}++" | + cut -d" " -f1-2 >a && + git log --format="tree %T" remotes/alt >b && test_cmp a b' name='check imported tree checksums expected tree checksums' It does lose a little bit of information, which is that in the original we confirmed that the duplicates were always next to each other. But I'm not sure that's important. We'd get confused if the same subject appeared twice, but all of the commits have distinct hard-coded subjects in the earlier tests. -Peff
Jeff King <peff@peff.net> wrote: > On Tue, Jul 14, 2020 at 08:31:42AM -0400, Jeff King wrote: > > > We _could_ fix this by sorting before removing duplicates, but > > presumably it's a useful part of the test to make sure the trees appear > > in the same order in both spots. Likewise, we could use something like: > > > > perl -ne 'print unless $seen{$_}++' > > > > to remove duplicates without impacting the order. But that doesn't work > > either, because there are actually multiple (non-duplicate) commits with > > the same trees (we change a file mode and then change it back). So we'd > > actually have to de-duplicate the combination of subject and tree. Which > > then further throws off t9100.18, which compares the tree hashes > > exactly; we'd have to strip the result back down. Right, log order matters, so sorting isn't ideal. > Actually, that last one isn't _too_ bad. It looks something like this: > > diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh > index b80952f0ac..4502c5f97d 100755 > --- a/t/t9100-git-svn-basic.sh > +++ b/t/t9100-git-svn-basic.sh > @@ -204,8 +204,10 @@ GIT_SVN_ID=alt > export GIT_SVN_ID > test_expect_success "$name" \ > 'git svn init "$svnrepo" && git svn fetch && > - git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a && > - git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b && > + git log --format="tree %T %s" remotes/git-svn | > + perl -lne "print unless \$seen{\$_}++" | > + cut -d" " -f1-2 >a && > + git log --format="tree %T" remotes/alt >b && > test_cmp a b' The future of non-strict one-liners with Perl7 on the horizon seems uncertain :< cut is unnecessary either way, but I suggest awk, here: awk "!seen[\$0]++ { print \$1, \$2 }' > name='check imported tree checksums expected tree checksums' > > It does lose a little bit of information, which is that in the original > we confirmed that the duplicates were always next to each other. But I'm > not sure that's important. We'd get confused if the same subject > appeared twice, but all of the commits have distinct hard-coded > subjects in the earlier tests. Yeah, but I think it's fine. It's been a while since I wrote this
On Tue, Jul 14, 2020 at 09:47:28PM +0000, Eric Wong wrote: > > - git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a && > > - git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b && > > + git log --format="tree %T %s" remotes/git-svn | > > + perl -lne "print unless \$seen{\$_}++" | > > + cut -d" " -f1-2 >a && > > + git log --format="tree %T" remotes/alt >b && > > test_cmp a b' > > The future of non-strict one-liners with Perl7 on the horizon > seems uncertain :< cut is unnecessary either way, but > I suggest awk, here: > > awk "!seen[\$0]++ { print \$1, \$2 }' Thanks, that is nicer. > > name='check imported tree checksums expected tree checksums' > > > > It does lose a little bit of information, which is that in the original > > we confirmed that the duplicates were always next to each other. But I'm > > not sure that's important. We'd get confused if the same subject > > appeared twice, but all of the commits have distinct hard-coded > > subjects in the earlier tests. > > Yeah, but I think it's fine. It's been a while since I wrote > this OK. If you're on board, then I think doing it this way is slightly nicer, as it's less likely to be confusing or bite somebody in the future. Here's a revised patch (I see Junio already picked up the other fix; if that ends up being merged instead, that's not the end of the world). (compared to the earlier version, you can skip everything in the commit message before "One fix would be..."). -- >8 -- Subject: [PATCH] t9100: stop depending on commit timestamps The early part of t9100 creates an unusual "doubled" history in the "git-svn" ref. When we get to t9100.17, it looks like this: $ git log --oneline --graph git-svn [...] * efd0303 detect node change from file to directory #2 |\ * | 3e727c0 detect node change from file to directory #2 |/ * 3b00468 try a deep --rmdir with a commit |\ * | b4832d8 try a deep --rmdir with a commit |/ * f0d7bd5 import for git svn Each commit we make with "git commit" is paired with one from "git svn set-tree", with the latter as a merge of the first and its grandparent. Later, t9100.17 wants to check that "git svn fetch" gets the same trees. And it does, but just one copy of each. So it uses rev-list to get the tree of each commit and pipes it to "uniq" to drop the duplicates. Our input isn't sorted, but it will find adjacent duplicates. This works reliably because the order of commits from rev-list always shows the duplicates next to each other. For any one of those merges, we could choose to show its duplicate or the grandparent first. But barring clocks running backwards, the duplicate will always have a time equal to or greater than the grandparent. Even if equal, we break ties by showing the first-parent first, so the duplicates remain adjacent. But this would break if the timestamps stopped moving in chronological order. Normally we would rely on test_tick for this, but we have _two_ sources of time here: - "git commit" creates one commit based on GIT_COMMITTER_DATE (which respects test_tick) - the "svn set-tree" one is based on subversion, which does not have an easy way to specify a timestamp So using test_tick actually breaks the test, because now the duplicates are far in the past, and we'll show the grandparent before the duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in all scripts will break it. One fix would be to sort the list of trees before removing duplicates, but that loses information: - we do care that the fetched history is in the same order - there's a tree which appears twice in the history, and we'd want to make sure that it's there both times So instead, let's de-duplicate using a hash (preserving the order), and drop only lines with identical trees and subjects (preserving the tree which appears twice, since it has different subjects each time). Signed-off-by: Jeff King <peff@peff.net> --- t/t9100-git-svn-basic.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh index 9f2d19ecc4..3055943a22 100755 --- a/t/t9100-git-svn-basic.sh +++ b/t/t9100-git-svn-basic.sh @@ -200,8 +200,9 @@ GIT_SVN_ID=alt export GIT_SVN_ID test_expect_success "$name" \ 'git svn init "$svnrepo" && git svn fetch && - git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a && - git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b && + git log --format="tree %T %s" remotes/git-svn | + awk "!seen[\$0]++ { print \$1, \$2 }" >a && + git log --format="tree %T" alt >b && test_cmp a b' name='check imported tree checksums expected tree checksums'
Jeff King <peff@peff.net> writes: > Here's a revised patch (I see Junio already picked up the other fix; if > that ends up being merged instead, that's not the end of the world). > > (compared to the earlier version, you can skip everything in the commit > message before "One fix would be..."). Heh, it already is in 'next' and I do not think it is worth rewinding and rebuilding to cause downstream folks an additional trouble. I however think this updated solution is nicer and do not mind building on top i.e. "while an earlier change did unbreak the svn tests, relying on the current timestamps is not nice and here is an update". Having said that, I think it is more urgent to address the "ouch, we made it clera that repos with extensions.worktreeconfig set without marking them as repoformat v1 are broken and without giving users enough hints to recover" issue discussed elsewhere before -rc1 (and for that reason I do not think I can tag -rc1 today), so I'd backburner it. This topic won't merge down from 'next' until final anyway. Thanks. > -- >8 -- > Subject: [PATCH] t9100: stop depending on commit timestamps > > The early part of t9100 creates an unusual "doubled" history in the > "git-svn" ref. When we get to t9100.17, it looks like this: > > $ git log --oneline --graph git-svn > [...] > * efd0303 detect node change from file to directory #2 > |\ > * | 3e727c0 detect node change from file to directory #2 > |/ > * 3b00468 try a deep --rmdir with a commit > |\ > * | b4832d8 try a deep --rmdir with a commit > |/ > * f0d7bd5 import for git svn > > Each commit we make with "git commit" is paired with one from "git svn > set-tree", with the latter as a merge of the first and its grandparent. > > Later, t9100.17 wants to check that "git svn fetch" gets the same trees. > And it does, but just one copy of each. So it uses rev-list to get the > tree of each commit and pipes it to "uniq" to drop the duplicates. Our > input isn't sorted, but it will find adjacent duplicates. This works > reliably because the order of commits from rev-list always shows the > duplicates next to each other. For any one of those merges, we could > choose to show its duplicate or the grandparent first. But barring > clocks running backwards, the duplicate will always have a time equal to > or greater than the grandparent. Even if equal, we break ties by showing > the first-parent first, so the duplicates remain adjacent. > > But this would break if the timestamps stopped moving in chronological > order. Normally we would rely on test_tick for this, but we have _two_ > sources of time here: > > - "git commit" creates one commit based on GIT_COMMITTER_DATE (which > respects test_tick) > > - the "svn set-tree" one is based on subversion, which does not have > an easy way to specify a timestamp > > So using test_tick actually breaks the test, because now the duplicates > are far in the past, and we'll show the grandparent before the > duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in > all scripts will break it. > > One fix would be to sort the list of trees before removing duplicates, > but that loses information: > > - we do care that the fetched history is in the same order > > - there's a tree which appears twice in the history, and we'd want to > make sure that it's there both times > > So instead, let's de-duplicate using a hash (preserving the order), and > drop only lines with identical trees and subjects (preserving the tree > which appears twice, since it has different subjects each time). > > Signed-off-by: Jeff King <peff@peff.net> > --- > t/t9100-git-svn-basic.sh | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh > index 9f2d19ecc4..3055943a22 100755 > --- a/t/t9100-git-svn-basic.sh > +++ b/t/t9100-git-svn-basic.sh > @@ -200,8 +200,9 @@ GIT_SVN_ID=alt > export GIT_SVN_ID > test_expect_success "$name" \ > 'git svn init "$svnrepo" && git svn fetch && > - git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a && > - git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b && > + git log --format="tree %T %s" remotes/git-svn | > + awk "!seen[\$0]++ { print \$1, \$2 }" >a && > + git log --format="tree %T" alt >b && > test_cmp a b' > > name='check imported tree checksums expected tree checksums'
Junio C Hamano <gitster@pobox.com> writes: > Heh, it already is in 'next' and I do not think it is worth > rewinding and rebuilding to cause downstream folks an additional > trouble. I however think this updated solution is nicer and do not > mind building on top i.e. "while an earlier change did unbreak the > svn tests, relying on the current timestamps is not nice and here is > an update". > > Having said that, I think it is more urgent to address the "ouch, we > made it clera that repos with extensions.worktreeconfig set without > marking them as repoformat v1 are broken and without giving users > enough hints to recover" issue discussed elsewhere before -rc1 (and > for that reason I do not think I can tag -rc1 today), so I'd > backburner it. This topic won't merge down from 'next' until final > anyway. Before I forget, here is what could be applied on top of the "fix the starting timestamp for the world" step as a "further polishing on top of the completed series". -- >8 -- From: Jeff King <peff@peff.net> Date: Wed, 15 Jul 2020 03:42:50 -0400 Subject: [PATCH] t9100: stop depending on commit timestamps An earlier "fix" to this script gave up updating it not to rely on the current time because we cannot control what timestamp subversion gives its commits. We however could solve the issue in a different way and still use deterministic timestamps on Git commits. One fix would be to sort the list of trees before removing duplicates, but that loses information: - we do care that the fetched history is in the same order - there's a tree which appears twice in the history, and we'd want to make sure that it's there both times So instead, let's de-duplicate using a hash (preserving the order), and drop only lines with identical trees and subjects (preserving the tree which appears twice, since it has different subjects each time). Signed-off-by: Jeff King <peff@peff.net> Acked-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- t/t9100-git-svn-basic.sh | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/t/t9100-git-svn-basic.sh b/t/t9100-git-svn-basic.sh index a89d338aa7..8dd9645ce5 100755 --- a/t/t9100-git-svn-basic.sh +++ b/t/t9100-git-svn-basic.sh @@ -8,10 +8,6 @@ GIT_SVN_LC_ALL=${LC_ALL:-$LANG} . ./lib-git-svn.sh -# Make sure timestamps of commits created by Git interleave -# with those created by "svn set-tree". -unset GIT_COMMITTER_DATE - case "$GIT_SVN_LC_ALL" in *.UTF-8) test_set_prereq UTF8 @@ -204,8 +200,9 @@ GIT_SVN_ID=alt export GIT_SVN_ID test_expect_success "$name" \ 'git svn init "$svnrepo" && git svn fetch && - git rev-list --pretty=raw remotes/git-svn | grep ^tree | uniq > a && - git rev-list --pretty=raw remotes/alt | grep ^tree | uniq > b && + git log --format="tree %T %s" remotes/git-svn | + awk "!seen[\$0]++ { print \$1, \$2 }" >a && + git log --format="tree %T" alt >b && test_cmp a b' name='check imported tree checksums expected tree checksums'
On Wed, Jul 15, 2020 at 08:04:52AM -0700, Junio C Hamano wrote: > Before I forget, here is what could be applied on top of the "fix > the starting timestamp for the world" step as a "further polishing > on top of the completed series". > > -- >8 -- > From: Jeff King <peff@peff.net> > Date: Wed, 15 Jul 2020 03:42:50 -0400 > Subject: [PATCH] t9100: stop depending on commit timestamps Thanks, this looks good and sets up to merge it post-v2.28. -Peff