mbox series

[v2,0/5] Fix the racy split index problem

Message ID 20180927124434.30835-1-szeder.dev@gmail.com (mailing list archive)
Headers show
Series Fix the racy split index problem | expand

Message

SZEDER Gábor Sept. 27, 2018, 12:44 p.m. UTC
This is the second attempt to fix the racy split index problem, which
causes occasional failures in several random test scripts when run
with 'GIT_TEST_SPLIT_INDEX=yes'.  The important details are in patches
1 and 5 (corresponding to v1's 3 and 5).

Changes since v1:

  - Most importanly, patch 5 adds a second is_racy_timestamp()
    condition to prepare_to_write_split_index() to deal with those
    racily clean cache entries that were copied from the shared index
    by unpack_trees() while it created a new index.
    This should fix the remainig failures in 't3903-stash.sh' with
    split index enabled.

    Furthermore, together with patch 4 they add a bunch of comments to
    the most (alas, not all) conditions in
    prepare_to_write_split_index()'s first loop, trying to explain in
    plain English which cache entries are handled by each of those
    conditions.  I might have gone a bit overboard with the comments,
    but I think they would have helped me a lot while working on these
    fixes...

  - Patch 1 adds one more failing test to demonstrate the raciness
    after unpack_trees() created a new index.

    Updated some of the comments in those tests, and I also reordered
    some tests, because I found that it's more logical to explain them
    in this order.  (However, this made 'range-diff' think that it's a
    total rewrite, which is not really true... but even though I could
    convince it otherwise by adjusting the --creation-factor, the
    results were utterly unreadable.)

  - Patch 4 is new and...  interesting?  I'd like to hear Duy's
    thoughts on this one.

  - Patch 3 is new and...  I hesitate to call it a bugfix, because,
    luckily, there is no real and visible bug, but it's a fix
    nonetheless.

  - Commit message updates here and there.

  - Addressed Ævar's nits in patch 2.

  - The first two patches from v1 have already been merged to
    'master' (cff90bdc5c (Merge branch 'sg/split-index-test',
    2018-09-24)).


SZEDER Gábor (5):
  split-index: add tests to demonstrate the racy split index problem
  t1700-split-index: date back files to avoid racy situations
  split-index: count the number of deleted entries
  split-index: don't compare stat data of entries already marked for
    split index
  split-index: smudge and add racily clean cache entries to split index

 cache.h                     |   2 +
 read-cache.c                |   2 +-
 split-index.c               | 121 +++++++++++++++++---
 t/t1700-split-index.sh      |  49 +++++----
 t/t1701-racy-split-index.sh | 214 ++++++++++++++++++++++++++++++++++++
 5 files changed, 348 insertions(+), 40 deletions(-)
 create mode 100755 t/t1701-racy-split-index.sh

Range-diff:
1:  71d973e4c0 < -:  ---------- t1700-split-index: drop unnecessary 'grep'
2:  5d46f58a07 < -:  ---------- t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index
3:  e60fa2dd9e < -:  ---------- split index: add a test to demonstrate the racy split index problem
-:  ---------- > 1:  a5b46af0ff split-index: add tests to demonstrate the racy split index problem
4:  799c9c7739 ! 2:  ed26c9703e t1700-split-index: date back files to avoid racy situations
    @@ -43,7 +43,7 @@
      
     +# Create a file named as $1 with content read from stdin.
     +# Set the file's mtime to a few seconds in the past to avoid racy situations.
    -+create_file () {
    ++create_non_racy_file () {
     +	cat >"$1" &&
     +	test-tool chmtime =-5 "$1"
     +}
    @@ -56,7 +56,7 @@
      
      test_expect_success 'add one file' '
     -	: >one &&
    -+	create_file one &&
    ++	create_non_racy_file one &&
      	git update-index --add one &&
      	git ls-files --stage >ls-files.actual &&
      	cat >ls-files.expect <<-EOF &&
    @@ -65,7 +65,7 @@
      
      test_expect_success 'modify original file, base index untouched' '
     -	echo modified >one &&
    -+	echo modified |create_file one &&
    ++	echo modified | create_non_racy_file one &&
      	git update-index one &&
      	git ls-files --stage >ls-files.actual &&
      	cat >ls-files.expect <<-EOF &&
    @@ -74,7 +74,7 @@
      
      test_expect_success 'add another file, which stays index' '
     -	: >two &&
    -+	create_file two &&
    ++	create_non_racy_file two &&
      	git update-index --add two &&
      	git ls-files --stage >ls-files.actual &&
      	cat >ls-files.expect <<-EOF &&
    @@ -83,7 +83,7 @@
      
      test_expect_success 'add original file back' '
     -	: >one &&
    -+	create_file one &&
    ++	create_non_racy_file one &&
      	git update-index --add one &&
      	git ls-files --stage >ls-files.actual &&
      	cat >ls-files.expect <<-EOF &&
    @@ -92,7 +92,7 @@
      
      test_expect_success 'add new file' '
     -	: >two &&
    -+	create_file two &&
    ++	create_non_racy_file two &&
      	git update-index --add two &&
      	git ls-files --stage >actual &&
      	cat >expect <<-EOF &&
    @@ -101,7 +101,7 @@
      test_expect_success 'set core.splitIndex config variable to true' '
      	git config core.splitIndex true &&
     -	: >three &&
    -+	create_file three &&
    ++	create_non_racy_file three &&
      	git update-index --add three &&
      	git ls-files --stage >ls-files.actual &&
      	cat >ls-files.expect <<-EOF &&
    @@ -113,7 +113,7 @@
     +test_expect_success 'set core.splitIndex config variable back to true' '
      	git config core.splitIndex true &&
     -	: >three &&
    -+	create_file three &&
    ++	create_non_racy_file three &&
      	git update-index --add three &&
      	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
      	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
    @@ -122,7 +122,7 @@
      	EOF
      	test_cmp expect actual &&
     -	: >four &&
    -+	create_file four &&
    ++	create_non_racy_file four &&
      	git update-index --add four &&
      	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
      	cat >expect <<-EOF &&
    @@ -131,7 +131,7 @@
      test_expect_success 'check behavior with splitIndex.maxPercentChange unset' '
      	git config --unset splitIndex.maxPercentChange &&
     -	: >five &&
    -+	create_file five &&
    ++	create_non_racy_file five &&
      	git update-index --add five &&
      	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
      	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
    @@ -140,7 +140,7 @@
      	EOF
      	test_cmp expect actual &&
     -	: >six &&
    -+	create_file six &&
    ++	create_non_racy_file six &&
      	git update-index --add six &&
      	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
      	cat >expect <<-EOF &&
    @@ -149,7 +149,7 @@
      test_expect_success 'check splitIndex.maxPercentChange set to 0' '
      	git config splitIndex.maxPercentChange 0 &&
     -	: >seven &&
    -+	create_file seven &&
    ++	create_non_racy_file seven &&
      	git update-index --add seven &&
      	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
      	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
    @@ -158,7 +158,7 @@
      	EOF
      	test_cmp expect actual &&
     -	: >eight &&
    -+	create_file eight &&
    ++	create_non_racy_file eight &&
      	git update-index --add eight &&
      	BASE=$(test-tool dump-split-index .git/index | grep "^base") &&
      	test-tool dump-split-index .git/index | sed "/^own/d" >actual &&
    @@ -167,19 +167,19 @@
      
      test_expect_success 'shared index files expire after 2 weeks by default' '
     -	: >ten &&
    -+	create_file ten &&
    ++	create_non_racy_file ten &&
      	git update-index --add ten &&
      	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
      	just_under_2_weeks_ago=$((5-14*86400)) &&
      	test-tool chmtime =$just_under_2_weeks_ago .git/sharedindex.* &&
     -	: >eleven &&
    -+	create_file eleven &&
    ++	create_non_racy_file eleven &&
      	git update-index --add eleven &&
      	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
      	just_over_2_weeks_ago=$((-1-14*86400)) &&
      	test-tool chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
     -	: >twelve &&
    -+	create_file twelve &&
    ++	create_non_racy_file twelve &&
      	git update-index --add twelve &&
      	test $(ls .git/sharedindex.* | wc -l) -le 2
      '
    @@ -188,13 +188,13 @@
      	git config splitIndex.sharedIndexExpire "16.days.ago" &&
      	test-tool chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
     -	: >thirteen &&
    -+	create_file thirteen &&
    ++	create_non_racy_file thirteen &&
      	git update-index --add thirteen &&
      	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
      	just_over_16_days_ago=$((-1-16*86400)) &&
      	test-tool chmtime =$just_over_16_days_ago .git/sharedindex.* &&
     -	: >fourteen &&
    -+	create_file fourteen &&
    ++	create_non_racy_file fourteen &&
      	git update-index --add fourteen &&
      	test $(ls .git/sharedindex.* | wc -l) -le 2
      '
    @@ -203,14 +203,14 @@
      	just_10_years_ago=$((-365*10*86400)) &&
      	test-tool chmtime =$just_10_years_ago .git/sharedindex.* &&
     -	: >fifteen &&
    -+	create_file fifteen &&
    ++	create_non_racy_file fifteen &&
      	git update-index --add fifteen &&
      	test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
      	git config splitIndex.sharedIndexExpire now &&
      	just_1_second_ago=-1 &&
      	test-tool chmtime =$just_1_second_ago .git/sharedindex.* &&
     -	: >sixteen &&
    -+	create_file sixteen &&
    ++	create_non_racy_file sixteen &&
      	git update-index --add sixteen &&
      	test $(ls .git/sharedindex.* | wc -l) -le 2
      '
    @@ -219,7 +219,7 @@
      		git config core.sharedrepository "$mode" &&
      		git config core.splitIndex true &&
     -		: >one &&
    -+		create_file one &&
    ++		create_non_racy_file one &&
      		git update-index --add one &&
      		echo "$modebits" >expect &&
      		test_modebits .git/index >actual &&
-:  ---------- > 3:  899e5090d3 split-index: count the number of deleted entries
-:  ---------- > 4:  02646f1a1b split-index: don't compare stat data of entries already marked for split index
5:  38b5f0fe72 ! 5:  290ac26055 split-index: smudge and add racily clean cache entries to split index
    @@ -22,25 +22,29 @@
         them with smudged stat data, i.e. with file size set to 0.  When
         subsequent git commands read the index, they will notice that the
         smudged stat data doesn't match with the file in the worktree, and
    -    then go on to check the file's content.
    +    then go on to check the file's content and notice its dirtiness.
     
         In the above example, however, in the second 'git update-index'
    -    prepare_to_write_split_index() gathers all cache entries that should
    -    be written to the new split index.  Alas, this function never looks
    -    out for racily clean cache entries, and since the file's stat data in
    -    the worktree hasn't changed since the shared index was written, it
    -    won't be replaced in the new split index.  Consequently,
    -    do_write_index() doesn't even get this racily clean cache entry, and
    -    can't smudge its stat data.  Subsequent git commands will then see
    -    that the index has more recent mtime than the file and that the (not
    -    smudged) cached stat data still matches with the file in the worktree,
    -    and, ultimately, will erroneously consider the file clean.
    +    prepare_to_write_split_index() decides which cache entries stored only
    +    in the shared index should be replaced in the new split index.  Alas,
    +    this function never looks out for racily clean cache entries, and
    +    since the file's stat data in the worktree hasn't changed since the
    +    shared index was written, it won't be replaced in the new split index.
    +    Consequently, do_write_index() doesn't even get this racily clean
    +    cache entry, and can't smudge its stat data.  Subsequent git commands
    +    will then see that the index has more recent mtime than the file and
    +    that the (not smudged) cached stat data still matches with the file in
    +    the worktree, and, ultimately, will erroneously consider the file
    +    clean.
     
         Modify prepare_to_write_split_index() to recognize racily clean cache
    -    entries, and mark them to be added to the split index.  This way
    -    do_write_index() will get these racily clean cache entries as well,
    -    and will then write them with smudged stat data to the new split
    -    index.
    +    entries, and mark them to be added to the split index.  Note that
    +    there are two places where it should check raciness: first those cache
    +    entries that are only stored in the shared index, and then those that
    +    have been copied by unpack_trees() from the shared index while it
    +    constructed a new index.  This way do_write_index() will get these
    +    racily clean cache entries as well, and will then write them with
    +    smudged stat data to the new split index.
     
         Note that after this change if the index is split when it contains a
         racily clean cache entry, then a smudged cache entry will be written
    @@ -49,19 +53,19 @@
         an entry in the split index replacing an outdated entry in the shared
         index.  It did affect a few tests in 't1700-split-index.sh', though,
         because they actually check which entries are stored in the split
    -    index; the previous patch made the necessary adjustments.  And racily
    -    clean cache entries and index splitting are rare enough to not worry
    -    about the resulting duplicated smudged cache entries, and the
    -    additional complexity required to prevent them is not worth it.
    +    index; a previous patch in this series has already made the necessary
    +    adjustments in 't1700'.  And racily clean cache entries and index
    +    splitting are rare enough to not worry about the resulting duplicated
    +    smudged cache entries, and the additional complexity required to
    +    prevent them is not worth it.
     
         Several tests failed occasionally when the test suite was run with
         'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
         back to this racy split index problem, starting with those failing
         more frequently, with a link to a failing Travis CI build job for
    -    each.  The highlighted line shows when the racy file was written,
    -    which is not always in the failing test (but note that those lines are
    -    in the 'after failure' fold, and your browser might unhelpfully fold
    -    it up before you could take a good look).
    +    each.  The highlighted line [2] shows when the racy file was written,
    +    which is not always in the failing test but in a preceeding setup
    +    test.
     
           t3903-stash.sh:
             https://travis-ci.org/git/git/jobs/385542084#L5858
    @@ -75,7 +79,7 @@
           t2200-add-update.sh:
             https://travis-ci.org/git/git/jobs/382543426#L3051
     
    -      t0090-cache-tree:
    +      t0090-cache-tree.sh:
             https://travis-ci.org/git/git/jobs/416583010#L3679
     
         There might be others, e.g. perhaps 't1000-read-tree-m-3way.sh' and
    @@ -84,6 +88,10 @@
         [1] In the branch leading to the merge commit v2.1.0-rc0~45 (Merge
             branch 'nd/split-index', 2014-07-16).
     
    +    [2] Note that those highlighted lines are in the 'after failure' fold,
    +        and your browser might unhelpfully fold it up before you could
    +        take a good look.
    +
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
     
      diff --git a/cache.h b/cache.h
    @@ -121,39 +129,94 @@
      			base = si->base->cache[ce->index - 1];
     -			if (ce == base)
     +			if (ce == base) {
    -+				/*
    -+				 * Racily clean cache entries must be
    -+				 * written to the split index, so the
    -+				 * subsequent do_write_index() can smudge
    -+				 * their stat data.
    -+				 */
    -+				if (!ce_uptodate(ce) &&
    -+				    is_racy_timestamp(istate, ce))
    ++				/* The entry is present in the shared index. */
    ++				if (ce->ce_flags & CE_UPDATE_IN_BASE) {
    ++					/*
    ++					 * Already marked for inclusion in
    ++					 * the split index, either because
    ++					 * the corresponding file was
    ++					 * modified and the cached stat data
    ++					 * was refreshed, or because there
    ++					 * is already a replacement entry in
    ++					 * the split index.
    ++					 * Nothing more to do here.
    ++					 */
    ++				} else if (!ce_uptodate(ce) &&
    ++					   is_racy_timestamp(istate, ce)) {
    ++					/*
    ++					 * A racily clean cache entry stored
    ++					 * only in the shared index: it must
    ++					 * be added to the split index, so
    ++					 * the subsequent do_write_index()
    ++					 * can smudge its stat data.
    ++					 */
     +					ce->ce_flags |= CE_UPDATE_IN_BASE;
    ++				} else {
    ++					/*
    ++					 * The entry is only present in the
    ++					 * shared index and it was not
    ++					 * refreshed.
    ++					 * Just leave it there.
    ++					 */
    ++				}
      				continue;
     +			}
      			if (ce->ce_namelen != base->ce_namelen ||
      			    strcmp(ce->name, base->name)) {
      				ce->index = 0;
    +@@
    + 				 * the split index.
    + 				 * Nothing to do.
    + 				 */
    ++			} else if (!ce_uptodate(ce) &&
    ++				   is_racy_timestamp(istate, ce)) {
    ++				/*
    ++				 * A copy of a racily clean cache entry from
    ++				 * the shared index.  It must be added to
    ++				 * the split index, so the subsequent
    ++				 * do_write_index() can smudge its stat data.
    ++				 */
    ++				ce->ce_flags |= CE_UPDATE_IN_BASE;
    + 			} else {
    + 				/*
    + 				 * Thoroughly compare the cached data to see
     
      diff --git a/t/t1701-racy-split-index.sh b/t/t1701-racy-split-index.sh
      --- a/t/t1701-racy-split-index.sh
      +++ b/t/t1701-racy-split-index.sh
     @@
      
    - for trial in 0 1 2 3 4
    + for trial in $trials
      do
    --	test_expect_failure "update the split index when the shared index contains a racily clean cache entry #$trial" '
    -+	test_expect_success "update the split index when the shared index contains a racily clean cache entry #$trial" '
    - 		test_when_finished "rm -f .git/index .git/sharedindex.*" &&
    +-	test_expect_failure "update the split index when a racily clean cache entry is stored only in the shared index $trial" '
    ++	test_expect_success "update the split index when a racily clean cache entry is stored only in the shared index #$trial" '
    + 		rm -f .git/index .git/sharedindex.* &&
      
      		# The next three commands must be run within the same
     @@
    - 		# corresponding replacement cache entry with smudged
    - 		# stat data should be added to the new split index, so
    - 		# the file wont appear clean for subsequent git commands.
    + 		# entry of racy-file is only stored in the shared index.
    + 		# A corresponding replacement cache entry with smudged
    + 		# stat data should be added to the new split index.
     -		#
     -		# Alas, such a smudged replacement entry is not added!
      		git update-index --add other-file &&
      
    - 		check_cached_diff
    + 		# Subsequent git commands should notice the smudged
    +@@
    + 
    + for trial in $trials
    + do
    +-	test_expect_failure "update the split index after unpack trees() copied a racily clean cache entry from the shared index $trial" '
    ++	test_expect_success "update the split index after unpack trees() copied a racily clean cache entry from the shared index #$trial" '
    + 		rm -f .git/index .git/sharedindex.* &&
    + 
    + 		# The next three commands must be run within the same
    +@@
    + 		# index.  A corresponding replacement cache entry
    + 		# with smudged stat data should be added to the new
    + 		# split index.
    +-		#
    +-		# Alas, such a smudged replacement entry is not added!
    + 		git read-tree -m HEAD &&
    + 
    + 		# Subsequent git commands should notice the smudged

Comments

Ævar Arnfjörð Bjarmason Sept. 27, 2018, 1:53 p.m. UTC | #1
On Thu, Sep 27 2018, SZEDER Gábor wrote:

> This is the second attempt to fix the racy split index problem, which
> causes occasional failures in several random test scripts when run
> with 'GIT_TEST_SPLIT_INDEX=yes'.  The important details are in patches
> 1 and 5 (corresponding to v1's 3 and 5).

Thanks. I'm running the same sorts of tests I noted in
https://public-inbox.org/git/87va7ireuu.fsf@evledraar.gmail.com/ on
this. The fix Jeff had that you noted in
https://public-inbox.org/git/20180906151439.GA8016@localhost/ is now in
"master".

I take it your
https://github.com/szeder/git/commits/racy-split-index-fix is the same
as this submission? Anyway, I'm testing that cherry-picked on top of the
latest master.

Unfortunate that we couldn't get the isolated test you made in
https://public-inbox.org/git/20180907034942.GA10370@localhost/ but I
don't see how it could be added without some very liberal
getenv("GIT_TEST_blahblah"), so it's probably best to not add it,
particularly with the C rewrite of git-stash in-flight.

I'll report back when I have enough test data to say how these patches
affect the intermittent test failures under GIT_TEST_SPLIT_INDEX=yes.
SZEDER Gábor Sept. 27, 2018, 2:23 p.m. UTC | #2
On Thu, Sep 27, 2018 at 03:53:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Sep 27 2018, SZEDER Gábor wrote:
> 
> > This is the second attempt to fix the racy split index problem, which
> > causes occasional failures in several random test scripts when run
> > with 'GIT_TEST_SPLIT_INDEX=yes'.  The important details are in patches
> > 1 and 5 (corresponding to v1's 3 and 5).
> 
> Thanks. I'm running the same sorts of tests I noted in
> https://public-inbox.org/git/87va7ireuu.fsf@evledraar.gmail.com/ on
> this. The fix Jeff had that you noted in
> https://public-inbox.org/git/20180906151439.GA8016@localhost/ is now in
> "master".
> 
> I take it your
> https://github.com/szeder/git/commits/racy-split-index-fix is the same
> as this submission?

Yes.

> Anyway, I'm testing that cherry-picked on top of the
> latest master.
> 
> Unfortunate that we couldn't get the isolated test you made in
> https://public-inbox.org/git/20180907034942.GA10370@localhost/

Nah, that's not an isolated test case, that's only a somewhat
narrowed-down, but rather reliable reproduction recipe while I still
had no idea what was going on :)

The _real_ isolated test is the last test in t1701, that's what it
eventually boiled down to.

> but I
> don't see how it could be added without some very liberal
> getenv("GIT_TEST_blahblah"), so it's probably best to not add it,
> particularly with the C rewrite of git-stash in-flight.
> 
> I'll report back when I have enough test data to say how these patches
> affect the intermittent test failures under GIT_TEST_SPLIT_INDEX=yes.
Ævar Arnfjörð Bjarmason Sept. 27, 2018, 3:25 p.m. UTC | #3
On Thu, Sep 27 2018, SZEDER Gábor wrote:

> On Thu, Sep 27, 2018 at 03:53:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Sep 27 2018, SZEDER Gábor wrote:
>>
>> > This is the second attempt to fix the racy split index problem, which
>> > causes occasional failures in several random test scripts when run
>> > with 'GIT_TEST_SPLIT_INDEX=yes'.  The important details are in patches
>> > 1 and 5 (corresponding to v1's 3 and 5).
>>
>> Thanks. I'm running the same sorts of tests I noted in
>> https://public-inbox.org/git/87va7ireuu.fsf@evledraar.gmail.com/ on
>> this. The fix Jeff had that you noted in
>> https://public-inbox.org/git/20180906151439.GA8016@localhost/ is now in
>> "master".
>>
>> I take it your
>> https://github.com/szeder/git/commits/racy-split-index-fix is the same
>> as this submission?
>
> Yes.
>
>> Anyway, I'm testing that cherry-picked on top of the
>> latest master.
>>
>> Unfortunate that we couldn't get the isolated test you made in
>> https://public-inbox.org/git/20180907034942.GA10370@localhost/
>
> Nah, that's not an isolated test case, that's only a somewhat
> narrowed-down, but rather reliable reproduction recipe while I still
> had no idea what was going on :)
>
> The _real_ isolated test is the last test in t1701, that's what it
> eventually boiled down to.

Thanks. I had ~400 runs of the tests I ran before and they were all
OK. Now trying also with t1701 (which I hadn't noticed was a new
test...).


>> but I
>> don't see how it could be added without some very liberal
>> getenv("GIT_TEST_blahblah"), so it's probably best to not add it,
>> particularly with the C rewrite of git-stash in-flight.
>>
>> I'll report back when I have enough test data to say how these patches
>> affect the intermittent test failures under GIT_TEST_SPLIT_INDEX=yes.
Ævar Arnfjörð Bjarmason Sept. 28, 2018, 6:57 a.m. UTC | #4
On Thu, Sep 27 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Sep 27 2018, SZEDER Gábor wrote:
>
>> On Thu, Sep 27, 2018 at 03:53:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Thu, Sep 27 2018, SZEDER Gábor wrote:
>>>
>>> > This is the second attempt to fix the racy split index problem, which
>>> > causes occasional failures in several random test scripts when run
>>> > with 'GIT_TEST_SPLIT_INDEX=yes'.  The important details are in patches
>>> > 1 and 5 (corresponding to v1's 3 and 5).
>>>
>>> Thanks. I'm running the same sorts of tests I noted in
>>> https://public-inbox.org/git/87va7ireuu.fsf@evledraar.gmail.com/ on
>>> this. The fix Jeff had that you noted in
>>> https://public-inbox.org/git/20180906151439.GA8016@localhost/ is now in
>>> "master".
>>>
>>> I take it your
>>> https://github.com/szeder/git/commits/racy-split-index-fix is the same
>>> as this submission?
>>
>> Yes.
>>
>>> Anyway, I'm testing that cherry-picked on top of the
>>> latest master.
>>>
>>> Unfortunate that we couldn't get the isolated test you made in
>>> https://public-inbox.org/git/20180907034942.GA10370@localhost/
>>
>> Nah, that's not an isolated test case, that's only a somewhat
>> narrowed-down, but rather reliable reproduction recipe while I still
>> had no idea what was going on :)
>>
>> The _real_ isolated test is the last test in t1701, that's what it
>> eventually boiled down to.
>
> Thanks. I had ~400 runs of the tests I ran before and they were all
> OK. Now trying also with t1701 (which I hadn't noticed was a new
> test...).

Ran that overnight with the same conditions as before. 2683 OK runs and
0 failures (and counting). So it seems like the combination of the two
fixed the split index bugs.

>>> but I
>>> don't see how it could be added without some very liberal
>>> getenv("GIT_TEST_blahblah"), so it's probably best to not add it,
>>> particularly with the C rewrite of git-stash in-flight.
>>>
>>> I'll report back when I have enough test data to say how these patches
>>> affect the intermittent test failures under GIT_TEST_SPLIT_INDEX=yes.
SZEDER Gábor Sept. 28, 2018, 10:17 a.m. UTC | #5
On Fri, Sep 28, 2018 at 08:57:53AM +0200, Ævar Arnfjörð Bjarmason wrote:
> > Thanks. I had ~400 runs of the tests I ran before and they were all
> > OK. Now trying also with t1701 (which I hadn't noticed was a new
> > test...).
> 
> Ran that overnight with the same conditions as before. 2683 OK runs and
> 0 failures (and counting). So it seems like the combination of the two
> fixed the split index bugs.

Yeah, I thought they would.  If you look at the first loop of
prepare_to_write_split_index() classifying which cache entries should
be included in the new split index, you'll see only two code paths
that could leave out an entry from the split index, i.e. where an
entry could be left with a non-zero 'ce->index' and without its
CE_UPDATE_IN_BASE flag set.  Now, with the fix in patch 5/5 both of
those code paths have the is_race_timestamp() check.
Ævar Arnfjörð Bjarmason Oct. 8, 2018, 2:54 p.m. UTC | #6
On Fri, Sep 28 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Sep 27 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Thu, Sep 27 2018, SZEDER Gábor wrote:
>>
>>> On Thu, Sep 27, 2018 at 03:53:24PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>>
>>>> On Thu, Sep 27 2018, SZEDER Gábor wrote:
>>>>
>>>> > This is the second attempt to fix the racy split index problem, which
>>>> > causes occasional failures in several random test scripts when run
>>>> > with 'GIT_TEST_SPLIT_INDEX=yes'.  The important details are in patches
>>>> > 1 and 5 (corresponding to v1's 3 and 5).
>>>>
>>>> Thanks. I'm running the same sorts of tests I noted in
>>>> https://public-inbox.org/git/87va7ireuu.fsf@evledraar.gmail.com/ on
>>>> this. The fix Jeff had that you noted in
>>>> https://public-inbox.org/git/20180906151439.GA8016@localhost/ is now in
>>>> "master".
>>>>
>>>> I take it your
>>>> https://github.com/szeder/git/commits/racy-split-index-fix is the same
>>>> as this submission?
>>>
>>> Yes.
>>>
>>>> Anyway, I'm testing that cherry-picked on top of the
>>>> latest master.
>>>>
>>>> Unfortunate that we couldn't get the isolated test you made in
>>>> https://public-inbox.org/git/20180907034942.GA10370@localhost/
>>>
>>> Nah, that's not an isolated test case, that's only a somewhat
>>> narrowed-down, but rather reliable reproduction recipe while I still
>>> had no idea what was going on :)
>>>
>>> The _real_ isolated test is the last test in t1701, that's what it
>>> eventually boiled down to.
>>
>> Thanks. I had ~400 runs of the tests I ran before and they were all
>> OK. Now trying also with t1701 (which I hadn't noticed was a new
>> test...).
>
> Ran that overnight with the same conditions as before. 2683 OK runs and
> 0 failures (and counting). So it seems like the combination of the two
> fixed the split index bugs.

I forgot I ad this running, and got up to 45482 OKs and 0 FAILs before
finally Ctrl+C-ing it now :)

>>>> but I
>>>> don't see how it could be added without some very liberal
>>>> getenv("GIT_TEST_blahblah"), so it's probably best to not add it,
>>>> particularly with the C rewrite of git-stash in-flight.
>>>>
>>>> I'll report back when I have enough test data to say how these patches
>>>> affect the intermittent test failures under GIT_TEST_SPLIT_INDEX=yes.
SZEDER Gábor Oct. 8, 2018, 3:41 p.m. UTC | #7
On Mon, Oct 08, 2018 at 04:54:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> Thanks. I had ~400 runs of the tests I ran before and they were all
> >> OK. Now trying also with t1701 (which I hadn't noticed was a new
> >> test...).
> >
> > Ran that overnight with the same conditions as before. 2683 OK runs and
> > 0 failures (and counting). So it seems like the combination of the two
> > fixed the split index bugs.
> 
> I forgot I ad this running, and got up to 45482 OKs and 0 FAILs before
> finally Ctrl+C-ing it now :)

Yay! \o/

Thanks for testing, and I feel sorry for your poor machine...

I will send an updated version to address the (minor) points raised in
[1]...  well, most likely not today, but hopefully soon-ish.

1 - https://public-inbox.org/git/20180929091429.GF23446@localhost/