Make some commit hashes in tests reproducible

Message ID	pull.816.git.git.1594149804313.gitgitgadget@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=WOg5=AS=vger.kernel.org=git-owner@kernel.org> Message-Id: <pull.816.git.git.1594149804313.gitgitgadget@gmail.com> From: "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> Date: Tue, 07 Jul 2020 19:23:24 +0000 Subject: [PATCH] Make some commit hashes in tests reproducible Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Han-Wen Nienhuys <hanwenn@gmail.com>, Han-Wen Nienhuys <hanwen@google.com> Sender: git-owner@vger.kernel.org Precedence: bulk
Series	Make some commit hashes in tests reproducible \| expand Make some commit hashes in tests reproducible

John Cai via GitGitGadget July 7, 2020, 7:23 p.m. UTC

From: Han-Wen Nienhuys <hanwen@google.com>

Adds test_tick to t5801-remote-helpers.sh and t3203-branch-output.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
    Make commit hashes in test t3203-branch-output.sh reproducible
    
    Signed-off-by: Han-Wen Nienhuys hanwen@google.com [hanwen@google.com]

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-816%2Fhanwen%2Frepro-hash-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-816/hanwen/repro-hash-v1
Pull-Request: https://github.com/git/git/pull/816

 t/t3203-branch-output.sh  | 2 ++
 t/t5801-remote-helpers.sh | 2 ++
 2 files changed, 4 insertions(+)


base-commit: a08a83db2bf27f015bec9a435f6d73e223c21c5e

Junio C Hamano July 7, 2020, 7:50 p.m. UTC | #1

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Adds test_tick to t5801-remote-helpers.sh and t3203-branch-output.sh

That can be read from the patch.  Also the subject tells us a half
of what you want to achieve with this change (by the way, your
subject is malformatted and lacks the <area>: prefix; perhaps
"[PATCH] tests: make commit object names reproducible" or something),
but the readers are left hanging without knowing what motivated the
change.  Do any test pieces in these scripts change their behaviour
based on what exact object names are assigned to them, making them
flaky and hard to test, and if so which one and in what way?

Thanks.

> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>     Make commit hashes in test t3203-branch-output.sh reproducible
>     
>     Signed-off-by: Han-Wen Nienhuys hanwen@google.com [hanwen@google.com]
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-816%2Fhanwen%2Frepro-hash-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-816/hanwen/repro-hash-v1
> Pull-Request: https://github.com/git/git/pull/816
>
>  t/t3203-branch-output.sh  | 2 ++
>  t/t5801-remote-helpers.sh | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
> index 71818b90f0..6b614739af 100755
> --- a/t/t3203-branch-output.sh
> +++ b/t/t3203-branch-output.sh
> @@ -4,6 +4,8 @@ test_description='git branch display tests'
>  . ./test-lib.sh
>  . "$TEST_DIRECTORY"/lib-terminal.sh
>  
> +test_tick
> +
>  test_expect_success 'make commits' '
>  	echo content >file &&
>  	git add file &&
> diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
> index 0f04b6cddb..de51584173 100755
> --- a/t/t5801-remote-helpers.sh
> +++ b/t/t5801-remote-helpers.sh
> @@ -22,6 +22,8 @@ compare_refs() {
>  	eval $fail test_cmp expect actual
>  }
>  
> +test_tick
> +
>  test_expect_success 'setup repository' '
>  	git init server &&
>  	(cd server &&
>
> base-commit: a08a83db2bf27f015bec9a435f6d73e223c21c5e

Jeff King July 7, 2020, 8:54 p.m. UTC | #2

On Tue, Jul 07, 2020 at 12:50:33PM -0700, Junio C Hamano wrote:

> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > Adds test_tick to t5801-remote-helpers.sh and t3203-branch-output.sh
> 
> That can be read from the patch.  Also the subject tells us a half
> of what you want to achieve with this change (by the way, your
> subject is malformatted and lacks the <area>: prefix; perhaps
> "[PATCH] tests: make commit object names reproducible" or something),
> but the readers are left hanging without knowing what motivated the
> change.  Do any test pieces in these scripts change their behaviour
> based on what exact object names are assigned to them, making them
> flaky and hard to test, and if so which one and in what way?

I agree that more discussion would be nice.

But I kind of wonder if we should be aiming for more determinism in
general, just to make debugging and reproduction simpler.

I.e., rather than pointing to _these_ tests, I think we could make an
argument for setting up a known timestamp in the test environment.
test_tick would continue to tick forward as usual, but for any tests
that don't use it, they'd by default get a deterministic outcome.

Something like this:

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 618a7c8d5b..d8adf5a199 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -441,15 +441,18 @@ TEST_AUTHOR_LOCALNAME=author
 TEST_AUTHOR_DOMAIN=example.com
 GIT_AUTHOR_EMAIL=${TEST_AUTHOR_LOCALNAME}@${TEST_AUTHOR_DOMAIN}
 GIT_AUTHOR_NAME='A U Thor'
+GIT_AUTHOR_DATE='1112911993 -0700'
 TEST_COMMITTER_LOCALNAME=committer
 TEST_COMMITTER_DOMAIN=example.com
 GIT_COMMITTER_EMAIL=${TEST_COMMITTER_LOCALNAME}@${TEST_COMMITTER_DOMAIN}
 GIT_COMMITTER_NAME='C O Mitter'
+GIT_COMMITTER_DATE='1112911993 -0700'
 GIT_MERGE_VERBOSITY=5
 GIT_MERGE_AUTOEDIT=no
 export GIT_MERGE_VERBOSITY GIT_MERGE_AUTOEDIT
 export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
 export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
+export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 export EDITOR

 # Tests using GIT_TRACE typically don't want <timestamp> <file>:<line> output

That's using the same start point as test_tick, though really it could
be anything. I've intentionally _not_ called test_tick at the beginning
of each script, because that would throw off all of the scripts that do
use it by one tick (whereas the first test_tick will overwrite these
values).

Trying to devil's advocate against this line of reasoning:

  - using the current timestamp introduces more randomness into the test
    suite, which could uncover problems. I'm somewhat skeptical, as the
    usual outcome I see here is that we realize a test's expected output
    is simply racy, and we remove the raciness by using test_tick

  - using the current timestamp could alert us to problems that occur
    only as the clock ticks forward (e.g., if we had a Y2021 bug, we'd
    notice when the clock rolled forward).

  - some tests may rely on having a "recent" timestamp in commits (e.g.,
    when looking at relative date handling). I think all of the
    relative-time tests already use a specific date, though, because
    otherwise we have too many problems with raciness.

Note that the patch above does seem to cause two tests to fail. One of
them I _suspect_ is a raciness problem (order of commits output changes,
which implies the original was expecting the time to increment between
two commits without running test_tick). And the other looks like some
weird interaction with the perl test harness. I'd be happy to dig into
both if this direction seems sane.

-Peff

Junio C Hamano July 7, 2020, 9:35 p.m. UTC | #3

Jeff King <peff@peff.net> writes:

> That's using the same start point as test_tick, though really it could
> be anything. I've intentionally _not_ called test_tick at the beginning
> of each script, because that would throw off all of the scripts that do
> use it by one tick (whereas the first test_tick will overwrite these
> values).

Yup, I think that is a sensible approach.

> Trying to devil's advocate against this line of reasoning:
>
>   - using the current timestamp introduces more randomness into the test
>     suite, which could uncover problems. I'm somewhat skeptical, as the
>     usual outcome I see here is that we realize a test's expected output
>     is simply racy, and we remove the raciness by using test_tick

True.

>   - using the current timestamp could alert us to problems that occur
>     only as the clock ticks forward (e.g., if we had a Y2021 bug, we'd
>     notice when the clock rolled forward).

True.

>   - some tests may rely on having a "recent" timestamp in commits (e.g.,
>     when looking at relative date handling). I think all of the
>     relative-time tests already use a specific date, though, because
>     otherwise we have too many problems with raciness.

True.

Another thing we could do is under DEVELOPER=YesPlease we can set
timestamps you just added here to some random time.

The ones that do care about reproducibility guarantee by using
test_tick would not be affected, and those that were happy with the
current time should be happy with such a random timestamp.  Or we
could just drop what this patch does only under DEVELOPER=YesPlease
which amounts to be the same as setting random time.

> Note that the patch above does seem to cause two tests to fail. One of
> them I _suspect_ is a raciness problem (order of commits output changes,
> which implies the original was expecting the time to increment between
> two commits without running test_tick).

Ahh, ok, "git commit && git commit" without these environment
variable set and exported may or may not get the same timestamp from
the wallclock during the testing.  With your patch, they are
guaranteed to get the same timestamp.  On the surface, it sounds
like a valid approach to smoke out any tests that rely on the
passage of time, but realistically the machines are fast enough that
it won't be a rare occasion for these two "git commit" invocations
to be done within a single second, so I am not sure how much we
would be gaining.  We _are_ getting more reproducible test
environment, though, which may be a plus.

Jeff King July 7, 2020, 9:41 p.m. UTC | #4

On Tue, Jul 07, 2020 at 04:54:18PM -0400, Jeff King wrote:

> Note that the patch above does seem to cause two tests to fail. One of
> them I _suspect_ is a raciness problem (order of commits output changes,
> which implies the original was expecting the time to increment between
> two commits without running test_tick).

It turns out this does depend on the system clock, but isn't racy in the
traditional sense. See the explanation below (which might be worth doing
even if we don't switch to a deterministic time everywhere else).

-- >8 --
Subject: [PATCH] t6000: use test_tick consistently

The first two commits created in t6000 are done without test_tick,
meaning they use the current system clock. After that, we create one
with test_tick, which means it uses a deterministic time in the past.

The result of the "symleft flag bit is propagated down from tag" test
relies on the output order of commits from git-log, which in turn
depends on these timestamps. So this test is technically dependent on
the system clock time, though in practice it would only matter if your
system clock was set before test_tick's default time (which is in 2005).

However, let's use test_tick consistently for those early commits (and
update the expected output to match). This makes the test deterministic,
which is in turn easier to reason about and debug.

Note that there's also a fourth commit here, and it does not use
test_tick. It does have a deterministic timestamp because of the prior
use of test_tick in the script, but it will always be the same time as
the third commit. Let's use test_tick here, too, for consistency.  The
matching timestamps between the third and fourth commit are not an
important part of the test.

We could also use test_commit in all of these cases, as it runs
test_tick under the hood. But it would be awkward to do so, as these
tests diverge from the usual test_commit patterns (e.g., by creating
multiple files in a single commit).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t6000-rev-list-misc.sh | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 3dc1ad8f71..3bb0e4ff8f 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -8,6 +8,7 @@ test_expect_success setup '
 	echo content1 >wanted_file &&
 	echo content2 >unwanted_file &&
 	git add wanted_file unwanted_file &&
+	test_tick &&
 	git commit -m one
 '

@@ -21,6 +22,7 @@ test_expect_success 'rev-list --objects with pathspecs and deeper paths' '
 	mkdir foo &&
 	>foo/file &&
 	git add foo/file &&
+	test_tick &&
 	git commit -m two &&

 	git rev-list --objects HEAD -- foo >output &&
@@ -69,6 +71,7 @@ test_expect_success '--no-object-names and --object-names are last-one-wins' '
 '

 test_expect_success 'rev-list A..B and rev-list ^A B are the same' '
+	test_tick &&
 	git commit --allow-empty -m another &&
 	git tag -a -m "annotated" v1.0 &&
 	git rev-list --objects ^v1.0^ v1.0 >expect &&
@@ -84,10 +87,10 @@ test_expect_success 'propagate uninteresting flag down correctly' '
 test_expect_success 'symleft flag bit is propagated down from tag' '
 	git log --format="%m %s" --left-right v1.0...master >actual &&
 	cat >expect <<-\EOF &&
-	> two
-	> one
 	< another
 	< that
+	> two
+	> one
 	EOF
 	test_cmp expect actual
 '

Jeff King July 7, 2020, 9:52 p.m. UTC | #5

On Tue, Jul 07, 2020 at 02:35:57PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > That's using the same start point as test_tick, though really it could
> > be anything. I've intentionally _not_ called test_tick at the beginning
> > of each script, because that would throw off all of the scripts that do
> > use it by one tick (whereas the first test_tick will overwrite these
> > values).
> 
> Yup, I think that is a sensible approach.

OK, I'll see if I can polish it into a series.

> Another thing we could do is under DEVELOPER=YesPlease we can set
> timestamps you just added here to some random time.
> 
> The ones that do care about reproducibility guarantee by using
> test_tick would not be affected, and those that were happy with the
> current time should be happy with such a random timestamp.  Or we
> could just drop what this patch does only under DEVELOPER=YesPlease
> which amounts to be the same as setting random time.

That should be easy to do. I wonder if it's really worth the trouble to
maintain two parallel worlds. I.e., one of the goals would be to stop
worrying about this non-determinism. If we keep it as a requirement,
then we might as well do so all the time (and making it random under
DEVELOPER is effectively keeping it as a requirement, since we expect
all tests to pass under that flag).

I'm also skeptical how often we use system times anyway, because _any_
use of test_commit or test_tick in a script is enough to make all of the
subsequent commands deterministic. I'd be more inclined to let a
particular script say "I'm interested in random times". But then, I'd
think such a script would be better written to trigger its interesting
cases with a well-crafted set of deterministic times.

-Peff

Junio C Hamano July 7, 2020, 10:37 p.m. UTC | #6

Jeff King <peff@peff.net> writes:

> I'm also skeptical how often we use system times anyway, because _any_
> use of test_commit or test_tick in a script is enough to make all of the
> subsequent commands deterministic. I'd be more inclined to let a
> particular script say "I'm interested in random times". But then, I'd
> think such a script would be better written to trigger its interesting
> cases with a well-crafted set of deterministic times.

Yeah, the DEVELOPER=YesPlease was a tongue-in-cheek comment.  

Reproducibility would mean we see less test breakages due to
raciness, but I do not recall a case where digging into test
breakage due to raciness led to discovery of and fix to a real
breakage in Git itself, so it is probably a good thing.

Thanks.

Han-Wen Nienhuys July 8, 2020, 5:06 a.m. UTC | #7

On Tue, Jul 7, 2020 at 9:50 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > Adds test_tick to t5801-remote-helpers.sh and t3203-branch-output.sh
>
> That can be read from the patch.  Also the subject tells us a half
> of what you want to achieve with this change (by the way, your
> subject is malformatted and lacks the <area>: prefix; perhaps
> "[PATCH] tests: make commit object names reproducible" or something),
> but the readers are left hanging without knowing what motivated the
> change.  Do any test pieces in these scripts change their behaviour
> based on what exact object names are assigned to them, making them
> flaky and hard to test, and if so which one and in what way?

You're right, I forgot to provide a piece of the puzzle. These tests
have been failing either legitimately or flakily with reftable.

I built a GIT_DEBUG_REFS functionality, which dumps ref backend
interactions onto stderr. This should be reworked to use the tracing
functionality, but you can look at the patch in my reftable series.

When a test fails with reftable, the easiest way to debug is running
it both with and without GIT_DEBUG_REFS=1, and running a diff on the
output.

This needs the commit SHA1s to be deterministic, or the SHA1s will
generate spurious diffs.

I agree with Jeff, that a more structural approach to making sha1s
deterministic would be preferable.

Make some commit hashes in tests reproducible

Commit Message

Comments

Patch