Message ID | pull.816.git.git.1594149804313.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Make some commit hashes in tests reproducible | expand |
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Han-Wen Nienhuys <hanwen@google.com> > > Adds test_tick to t5801-remote-helpers.sh and t3203-branch-output.sh That can be read from the patch. Also the subject tells us a half of what you want to achieve with this change (by the way, your subject is malformatted and lacks the <area>: prefix; perhaps "[PATCH] tests: make commit object names reproducible" or something), but the readers are left hanging without knowing what motivated the change. Do any test pieces in these scripts change their behaviour based on what exact object names are assigned to them, making them flaky and hard to test, and if so which one and in what way? Thanks. > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > --- > Make commit hashes in test t3203-branch-output.sh reproducible > > Signed-off-by: Han-Wen Nienhuys hanwen@google.com [hanwen@google.com] > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-816%2Fhanwen%2Frepro-hash-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-816/hanwen/repro-hash-v1 > Pull-Request: https://github.com/git/git/pull/816 > > t/t3203-branch-output.sh | 2 ++ > t/t5801-remote-helpers.sh | 2 ++ > 2 files changed, 4 insertions(+) > > diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh > index 71818b90f0..6b614739af 100755 > --- a/t/t3203-branch-output.sh > +++ b/t/t3203-branch-output.sh > @@ -4,6 +4,8 @@ test_description='git branch display tests' > . ./test-lib.sh > . "$TEST_DIRECTORY"/lib-terminal.sh > > +test_tick > + > test_expect_success 'make commits' ' > echo content >file && > git add file && > diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh > index 0f04b6cddb..de51584173 100755 > --- a/t/t5801-remote-helpers.sh > +++ b/t/t5801-remote-helpers.sh > @@ -22,6 +22,8 @@ compare_refs() { > eval $fail test_cmp expect actual > } > > +test_tick > + > test_expect_success 'setup repository' ' > git init server && > (cd server && > > base-commit: a08a83db2bf27f015bec9a435f6d73e223c21c5e
On Tue, Jul 07, 2020 at 12:50:33PM -0700, Junio C Hamano wrote: > "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Han-Wen Nienhuys <hanwen@google.com> > > > > Adds test_tick to t5801-remote-helpers.sh and t3203-branch-output.sh > > That can be read from the patch. Also the subject tells us a half > of what you want to achieve with this change (by the way, your > subject is malformatted and lacks the <area>: prefix; perhaps > "[PATCH] tests: make commit object names reproducible" or something), > but the readers are left hanging without knowing what motivated the > change. Do any test pieces in these scripts change their behaviour > based on what exact object names are assigned to them, making them > flaky and hard to test, and if so which one and in what way? I agree that more discussion would be nice. But I kind of wonder if we should be aiming for more determinism in general, just to make debugging and reproduction simpler. I.e., rather than pointing to _these_ tests, I think we could make an argument for setting up a known timestamp in the test environment. test_tick would continue to tick forward as usual, but for any tests that don't use it, they'd by default get a deterministic outcome. Something like this: diff --git a/t/test-lib.sh b/t/test-lib.sh index 618a7c8d5b..d8adf5a199 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -441,15 +441,18 @@ TEST_AUTHOR_LOCALNAME=author TEST_AUTHOR_DOMAIN=example.com GIT_AUTHOR_EMAIL=${TEST_AUTHOR_LOCALNAME}@${TEST_AUTHOR_DOMAIN} GIT_AUTHOR_NAME='A U Thor' +GIT_AUTHOR_DATE='1112911993 -0700' TEST_COMMITTER_LOCALNAME=committer TEST_COMMITTER_DOMAIN=example.com GIT_COMMITTER_EMAIL=${TEST_COMMITTER_LOCALNAME}@${TEST_COMMITTER_DOMAIN} GIT_COMMITTER_NAME='C O Mitter' +GIT_COMMITTER_DATE='1112911993 -0700' GIT_MERGE_VERBOSITY=5 GIT_MERGE_AUTOEDIT=no export GIT_MERGE_VERBOSITY GIT_MERGE_AUTOEDIT export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME +export GIT_COMMITTER_DATE GIT_AUTHOR_DATE export EDITOR # Tests using GIT_TRACE typically don't want <timestamp> <file>:<line> output That's using the same start point as test_tick, though really it could be anything. I've intentionally _not_ called test_tick at the beginning of each script, because that would throw off all of the scripts that do use it by one tick (whereas the first test_tick will overwrite these values). Trying to devil's advocate against this line of reasoning: - using the current timestamp introduces more randomness into the test suite, which could uncover problems. I'm somewhat skeptical, as the usual outcome I see here is that we realize a test's expected output is simply racy, and we remove the raciness by using test_tick - using the current timestamp could alert us to problems that occur only as the clock ticks forward (e.g., if we had a Y2021 bug, we'd notice when the clock rolled forward). - some tests may rely on having a "recent" timestamp in commits (e.g., when looking at relative date handling). I think all of the relative-time tests already use a specific date, though, because otherwise we have too many problems with raciness. Note that the patch above does seem to cause two tests to fail. One of them I _suspect_ is a raciness problem (order of commits output changes, which implies the original was expecting the time to increment between two commits without running test_tick). And the other looks like some weird interaction with the perl test harness. I'd be happy to dig into both if this direction seems sane. -Peff
Jeff King <peff@peff.net> writes: > That's using the same start point as test_tick, though really it could > be anything. I've intentionally _not_ called test_tick at the beginning > of each script, because that would throw off all of the scripts that do > use it by one tick (whereas the first test_tick will overwrite these > values). Yup, I think that is a sensible approach. > Trying to devil's advocate against this line of reasoning: > > - using the current timestamp introduces more randomness into the test > suite, which could uncover problems. I'm somewhat skeptical, as the > usual outcome I see here is that we realize a test's expected output > is simply racy, and we remove the raciness by using test_tick True. > - using the current timestamp could alert us to problems that occur > only as the clock ticks forward (e.g., if we had a Y2021 bug, we'd > notice when the clock rolled forward). True. > - some tests may rely on having a "recent" timestamp in commits (e.g., > when looking at relative date handling). I think all of the > relative-time tests already use a specific date, though, because > otherwise we have too many problems with raciness. True. Another thing we could do is under DEVELOPER=YesPlease we can set timestamps you just added here to some random time. The ones that do care about reproducibility guarantee by using test_tick would not be affected, and those that were happy with the current time should be happy with such a random timestamp. Or we could just drop what this patch does only under DEVELOPER=YesPlease which amounts to be the same as setting random time. > Note that the patch above does seem to cause two tests to fail. One of > them I _suspect_ is a raciness problem (order of commits output changes, > which implies the original was expecting the time to increment between > two commits without running test_tick). Ahh, ok, "git commit && git commit" without these environment variable set and exported may or may not get the same timestamp from the wallclock during the testing. With your patch, they are guaranteed to get the same timestamp. On the surface, it sounds like a valid approach to smoke out any tests that rely on the passage of time, but realistically the machines are fast enough that it won't be a rare occasion for these two "git commit" invocations to be done within a single second, so I am not sure how much we would be gaining. We _are_ getting more reproducible test environment, though, which may be a plus.
On Tue, Jul 07, 2020 at 04:54:18PM -0400, Jeff King wrote: > Note that the patch above does seem to cause two tests to fail. One of > them I _suspect_ is a raciness problem (order of commits output changes, > which implies the original was expecting the time to increment between > two commits without running test_tick). It turns out this does depend on the system clock, but isn't racy in the traditional sense. See the explanation below (which might be worth doing even if we don't switch to a deterministic time everywhere else). -- >8 -- Subject: [PATCH] t6000: use test_tick consistently The first two commits created in t6000 are done without test_tick, meaning they use the current system clock. After that, we create one with test_tick, which means it uses a deterministic time in the past. The result of the "symleft flag bit is propagated down from tag" test relies on the output order of commits from git-log, which in turn depends on these timestamps. So this test is technically dependent on the system clock time, though in practice it would only matter if your system clock was set before test_tick's default time (which is in 2005). However, let's use test_tick consistently for those early commits (and update the expected output to match). This makes the test deterministic, which is in turn easier to reason about and debug. Note that there's also a fourth commit here, and it does not use test_tick. It does have a deterministic timestamp because of the prior use of test_tick in the script, but it will always be the same time as the third commit. Let's use test_tick here, too, for consistency. The matching timestamps between the third and fourth commit are not an important part of the test. We could also use test_commit in all of these cases, as it runs test_tick under the hood. But it would be awkward to do so, as these tests diverge from the usual test_commit patterns (e.g., by creating multiple files in a single commit). Signed-off-by: Jeff King <peff@peff.net> --- t/t6000-rev-list-misc.sh | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh index 3dc1ad8f71..3bb0e4ff8f 100755 --- a/t/t6000-rev-list-misc.sh +++ b/t/t6000-rev-list-misc.sh @@ -8,6 +8,7 @@ test_expect_success setup ' echo content1 >wanted_file && echo content2 >unwanted_file && git add wanted_file unwanted_file && + test_tick && git commit -m one ' @@ -21,6 +22,7 @@ test_expect_success 'rev-list --objects with pathspecs and deeper paths' ' mkdir foo && >foo/file && git add foo/file && + test_tick && git commit -m two && git rev-list --objects HEAD -- foo >output && @@ -69,6 +71,7 @@ test_expect_success '--no-object-names and --object-names are last-one-wins' ' ' test_expect_success 'rev-list A..B and rev-list ^A B are the same' ' + test_tick && git commit --allow-empty -m another && git tag -a -m "annotated" v1.0 && git rev-list --objects ^v1.0^ v1.0 >expect && @@ -84,10 +87,10 @@ test_expect_success 'propagate uninteresting flag down correctly' ' test_expect_success 'symleft flag bit is propagated down from tag' ' git log --format="%m %s" --left-right v1.0...master >actual && cat >expect <<-\EOF && - > two - > one < another < that + > two + > one EOF test_cmp expect actual '
On Tue, Jul 07, 2020 at 02:35:57PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > That's using the same start point as test_tick, though really it could > > be anything. I've intentionally _not_ called test_tick at the beginning > > of each script, because that would throw off all of the scripts that do > > use it by one tick (whereas the first test_tick will overwrite these > > values). > > Yup, I think that is a sensible approach. OK, I'll see if I can polish it into a series. > Another thing we could do is under DEVELOPER=YesPlease we can set > timestamps you just added here to some random time. > > The ones that do care about reproducibility guarantee by using > test_tick would not be affected, and those that were happy with the > current time should be happy with such a random timestamp. Or we > could just drop what this patch does only under DEVELOPER=YesPlease > which amounts to be the same as setting random time. That should be easy to do. I wonder if it's really worth the trouble to maintain two parallel worlds. I.e., one of the goals would be to stop worrying about this non-determinism. If we keep it as a requirement, then we might as well do so all the time (and making it random under DEVELOPER is effectively keeping it as a requirement, since we expect all tests to pass under that flag). I'm also skeptical how often we use system times anyway, because _any_ use of test_commit or test_tick in a script is enough to make all of the subsequent commands deterministic. I'd be more inclined to let a particular script say "I'm interested in random times". But then, I'd think such a script would be better written to trigger its interesting cases with a well-crafted set of deterministic times. -Peff
Jeff King <peff@peff.net> writes: > I'm also skeptical how often we use system times anyway, because _any_ > use of test_commit or test_tick in a script is enough to make all of the > subsequent commands deterministic. I'd be more inclined to let a > particular script say "I'm interested in random times". But then, I'd > think such a script would be better written to trigger its interesting > cases with a well-crafted set of deterministic times. Yeah, the DEVELOPER=YesPlease was a tongue-in-cheek comment. Reproducibility would mean we see less test breakages due to raciness, but I do not recall a case where digging into test breakage due to raciness led to discovery of and fix to a real breakage in Git itself, so it is probably a good thing. Thanks.
On Tue, Jul 7, 2020 at 9:50 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Han-Wen Nienhuys <hanwen@google.com> > > > > Adds test_tick to t5801-remote-helpers.sh and t3203-branch-output.sh > > That can be read from the patch. Also the subject tells us a half > of what you want to achieve with this change (by the way, your > subject is malformatted and lacks the <area>: prefix; perhaps > "[PATCH] tests: make commit object names reproducible" or something), > but the readers are left hanging without knowing what motivated the > change. Do any test pieces in these scripts change their behaviour > based on what exact object names are assigned to them, making them > flaky and hard to test, and if so which one and in what way? You're right, I forgot to provide a piece of the puzzle. These tests have been failing either legitimately or flakily with reftable. I built a GIT_DEBUG_REFS functionality, which dumps ref backend interactions onto stderr. This should be reworked to use the tracing functionality, but you can look at the patch in my reftable series. When a test fails with reftable, the easiest way to debug is running it both with and without GIT_DEBUG_REFS=1, and running a diff on the output. This needs the commit SHA1s to be deterministic, or the SHA1s will generate spurious diffs. I agree with Jeff, that a more structural approach to making sha1s deterministic would be preferable.
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh index 71818b90f0..6b614739af 100755 --- a/t/t3203-branch-output.sh +++ b/t/t3203-branch-output.sh @@ -4,6 +4,8 @@ test_description='git branch display tests' . ./test-lib.sh . "$TEST_DIRECTORY"/lib-terminal.sh +test_tick + test_expect_success 'make commits' ' echo content >file && git add file && diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh index 0f04b6cddb..de51584173 100755 --- a/t/t5801-remote-helpers.sh +++ b/t/t5801-remote-helpers.sh @@ -22,6 +22,8 @@ compare_refs() { eval $fail test_cmp expect actual } +test_tick + test_expect_success 'setup repository' ' git init server && (cd server &&