[4/4] midx: report checksum mismatches during 'verify'

Message ID	94e9de44e3b52513c5ab48aecd74f809dc34cbe3.1624473543.git.me@ttaylorr.com (mailing list archive)
State	Accepted
Commit	f89ecf79888a48e0adf14d0e05c69ee09e853fd5
Headers	show Return-Path: <git-owner@kernel.org> Date: Wed, 23 Jun 2021 14:39:15 -0400 From: Taylor Blau <me@ttaylorr.com> To: git@vger.kernel.org Cc: peff@peff.net, dstolee@microsoft.com Subject: [PATCH 4/4] midx: report checksum mismatches during 'verify' Message-ID: <94e9de44e3b52513c5ab48aecd74f809dc34cbe3.1624473543.git.me@ttaylorr.com> References: <cover.1624473543.git.me@ttaylorr.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <cover.1624473543.git.me@ttaylorr.com> Precedence: bulk
Series	midx: verify MIDX checksum before reusing \| expand [0/4] midx: verify MIDX checksum before reusing [1/4] csum-file: introduce checksum_valid() [2/4] commit-graph: rewrite to use checksum_valid() [3/4] midx: don't reuse corrupt MIDXs when writing [4/4] midx: report checksum mismatches during 'verify'

Taylor Blau June 23, 2021, 6:39 p.m. UTC

'git multi-pack-index verify' inspects the data in an existing MIDX for
correctness by checking that the recorded object offsets are correct,
and so on.

But it does not check that the file's trailing checksum matches the data
that it records. So, if an on-disk corruption happened to occur in the
final few bytes (and all other data was recorded correctly), we would:

  - get a clean result from 'git multi-pack-index verify', but
  - be unable to reuse the existing MIDX when writing a new one (since
    we now check for checksum mismatches before reusing a MIDX)

Teach the 'verify' sub-command to recognize corruption in the checksum
by calling midx_checksum_valid().

Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                      | 3 +++
 t/t5319-multi-pack-index.sh | 5 +++++
 2 files changed, 8 insertions(+)

Bagas Sanjaya June 24, 2021, 4:22 a.m. UTC | #1

On 24/06/21 01.39, Taylor Blau wrote:
> But it does not check that the file's trailing checksum matches the data
> that it records. So, if an on-disk corruption happened to occur in the
> final few bytes (and all other data was recorded correctly), we would:
> 
>    - get a clean result from 'git multi-pack-index verify', but
>    - be unable to reuse the existing MIDX when writing a new one (since
>      we now check for checksum mismatches before reusing a MIDX)
> 
> Teach the 'verify' sub-command to recognize corruption in the checksum
> by calling midx_checksum_valid().
...
> +	if (!midx_checksum_valid(m))
> +		midx_report(_("incorrect checksum"));
> +
...
> +test_expect_success 'verify incorrect checksum' '
> +	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
> +	corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"
> +'
> +

We're in the context of checksum mismatches because of on-disk 
corruption, so why didn't the message say "incorrect checksum, possibly 
corrupt"?

Jeff King June 24, 2021, 8:10 p.m. UTC | #2

On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote:

> 'git multi-pack-index verify' inspects the data in an existing MIDX for
> correctness by checking that the recorded object offsets are correct,
> and so on.
> 
> But it does not check that the file's trailing checksum matches the data
> that it records. So, if an on-disk corruption happened to occur in the
> final few bytes (and all other data was recorded correctly), we would:
> 
>   - get a clean result from 'git multi-pack-index verify', but
>   - be unable to reuse the existing MIDX when writing a new one (since
>     we now check for checksum mismatches before reusing a MIDX)
> 
> Teach the 'verify' sub-command to recognize corruption in the checksum
> by calling midx_checksum_valid().

Makes sense. I was a little surprised we didn't do this already, but I
guess it does not do the same "regenerate and make sure hashfile
produces the same checksum" trick that the pack idx verifier does (as an
aside, I think what the midx code is doing here is much _better_,
because it is looking at semantic problems in the file, and is more
robust against irrelevant changes in the format).

> @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
>  		return result;
>  	}
>  
> +	if (!midx_checksum_valid(m))
> +		midx_report(_("incorrect checksum"));

This "midx_report()" function doesn't provide much context on stderr. I
get:

  $ echo foo >>.git/objects/pack/multi-pack-index
  $ git multi-pack-index verify
  incorrect checksum
  Verifying OID order in multi-pack-index: 100% (282/282), done.
  Sorting objects by packfile: 100% (283/283), done.
  Verifying object offsets: 100% (283/283), done.

I think we should at least say "error:", but something along the lines
of "midx file at %s does not match its trailing checksum (possibly
corruption?)". Or something like that.

I think all of the existing calls to midx_report() share this issue,
though. We probably want to at least say "error:" here, but maybe even
something like:

diff --git a/midx.c b/midx.c
index 9a35b0255d..e464907a7c 100644
--- a/midx.c
+++ b/midx.c
@@ -1172,10 +1172,12 @@ void clear_midx_file(struct repository *r)

 static int verify_midx_error;

-static void midx_report(const char *fmt, ...)
+static void midx_report(struct multi_pack_index *m, const char *fmt, ...)
 {
 	va_list ap;
 	verify_midx_error = 1;
+	/* do we need to care about the "next" pointer here? */
+	fprintf(stderr, ("error: %s/multi-pack-index: "), m->object_dir);
 	va_start(ap, fmt);
 	vfprintf(stderr, fmt, ap);
 	fprintf(stderr, "\n");

Also, a side note: we should use __attribute__((format)) on this
function to get compile-time checks of our format strings.

-Peff

SZEDER Gábor Nov. 10, 2021, 11:11 p.m. UTC | #3

On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote:
> 'git multi-pack-index verify' inspects the data in an existing MIDX for
> correctness by checking that the recorded object offsets are correct,
> and so on.
> 
> But it does not check that the file's trailing checksum matches the data
> that it records. So, if an on-disk corruption happened to occur in the
> final few bytes (and all other data was recorded correctly), we would:
> 
>   - get a clean result from 'git multi-pack-index verify', but
>   - be unable to reuse the existing MIDX when writing a new one (since
>     we now check for checksum mismatches before reusing a MIDX)
> 
> Teach the 'verify' sub-command to recognize corruption in the checksum
> by calling midx_checksum_valid().
> 
> Suggested-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c                      | 3 +++
>  t/t5319-multi-pack-index.sh | 5 +++++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/midx.c b/midx.c
> index a12cbbf928..9a35b0255d 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
>  		return result;
>  	}
>  
> +	if (!midx_checksum_valid(m))
> +		midx_report(_("incorrect checksum"));
> +
>  	if (flags & MIDX_PROGRESS)
>  		progress = start_delayed_progress(_("Looking for referenced packfiles"),
>  					  m->num_packs);
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index d582f370c4..7609f1ea64 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -418,6 +418,11 @@ test_expect_success 'corrupt MIDX is not reused' '
>  	git multi-pack-index verify
>  '
>  
> +test_expect_success 'verify incorrect checksum' '
> +	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
> +	corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"

This test is flaky and can fail with:

  ...
  + printf \377
  + dd of=.git/objects/pack/multi-pack-index bs=1 seek=3839 conv=notrunc
  1+0 records in
  1+0 records out
  1 byte copied, 5.0656e-05 s, 19.7 kB/s
  + test_must_fail git multi-pack-index verify --object-dir=.git/objects
  test_must_fail: command succeeded: git multi-pack-index verify --object-dir=.git/objects
  error: last command exited with $?=1
  + mv midx-backup .git/objects/pack/multi-pack-index
  not ok 44 - verify incorrect checksum

So the test corrupts the checksum trailer in the 'multi-pack-index'
file by overwriting its last byte with 0xff, but if that byte were
already 0xff, then the file would be left as is, and 'git
multi-pack-index verify' wouldn't find anything amiss.

Since SHA1 is essentially random, there's a 1:256 chance of that
happening, assuming that the file's content is random.  That't not
really the case, however, because both the test repository's objects
and the resulting packfiles are deterministic, and, consequently, the
content of the MIDX is somewhat deterministic.  Only "somewhat",
though, because several of those objects appear in multiple packfiles,
and the MIDX selects a copy in the most recently modified packfile, so
filesystem mtime resolution and second boundaries become significant,
and cause some variance in the contents of the OOFF chunk.

Recently a laptop crossed my way that is somehow exceptionally good at
generating MIDX files ending with 0xff:

  # ad-hoc checksum statistics:
  $ git diff
  diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
  index f1ee2ce56d..605713b518 100755
  --- a/t/t5319-multi-pack-index.sh
  +++ b/t/t5319-multi-pack-index.sh
  @@ -482,9 +482,13 @@ test_expect_success 'corrupt MIDX is not reused' '
   '
   
   test_expect_success 'verify incorrect checksum' '
  +	skip=$(($(wc -c <$objdir/pack/multi-pack-index) - 20)) &&
  +	printf "checksum: " >&5 &&
  +	od -x -w20 -j$skip --endian=big -An "$objdir/pack/multi-pack-index" >&5 &&
   	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
   	corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"
   '
  +test_done
   
   test_expect_success 'repack progress off for redirected stderr' '
   	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err &&
  $ for i in {1..500} ; do ./t5319-multi-pack-index.sh |sed -n -e 's/^checksum:  //p' ; done |sort |uniq -c
       31 1a70 3b1c 8ed3 56a6 5101 2a38 057e 698d 6faf fbaa
      340 5fc0 552f 0ac0 c876 f229 d9e3 ef13 a314 5847 89ff
      129 ce7d 3710 fd21 ef7b 8316 2b99 4e6c e5d5 5e7c 7b08

That's almost 70% failure rate, though I haven't been able to trigger
this failure on any of the other machines that I have access to.

Jeff King Nov. 11, 2021, 10:05 a.m. UTC | #4

On Thu, Nov 11, 2021 at 12:11:32AM +0100, SZEDER Gábor wrote:

> So the test corrupts the checksum trailer in the 'multi-pack-index'
> file by overwriting its last byte with 0xff, but if that byte were
> already 0xff, then the file would be left as is, and 'git
> multi-pack-index verify' wouldn't find anything amiss.

Good find. I tried running with the patch you showed:

>   $ for i in {1..500} ; do ./t5319-multi-pack-index.sh |sed -n -e 's/^checksum:  //p' ; done |sort |uniq -c
>        31 1a70 3b1c 8ed3 56a6 5101 2a38 057e 698d 6faf fbaa
>       340 5fc0 552f 0ac0 c876 f229 d9e3 ef13 a314 5847 89ff
>       129 ce7d 3710 fd21 ef7b 8316 2b99 4e6c e5d5 5e7c 7b08

but I only ever got the last one, even under load. However, it's easy
enough to tweak the mtimes to stimulate a failure:

  diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
  index 3f69e43178..b84e96779f 100755
  --- a/t/t5319-multi-pack-index.sh
  +++ b/t/t5319-multi-pack-index.sh
  @@ -481,10 +481,25 @@ test_expect_success 'corrupt MIDX is not reused' '
   	git multi-pack-index verify
   '
   
  +while true; do
   test_expect_success 'verify incorrect checksum' '
  +	t=1234567890 &&
  +	for i in $(ls $objdir/pack/*.pack | shuf)
  +	do
  +		touch -d @$t $i &&
  +		t=$((t+100))
  +	done &&
  +	rm -f $objdir/pack/multi-pack-index &&
  +	git multi-pack-index write &&
  +	skip=$(($(wc -c <$objdir/pack/multi-pack-index) - 20)) &&
  +	printf "checksum: " >&5 &&
  +	od -x -w20 -j$skip --endian=big -An "$objdir/pack/multi-pack-index" >&5 &&
   	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
   	corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"
   '
  +done
  +
  +test_done
   
   test_expect_success 'repack progress off for redirected stderr' '
   	GIT_PROGRESS_DELAY=0 git multi-pack-index --object-dir=$objdir repack 2>err &&

Brute force, but it fails after a few hundred attempts. :)

> Since SHA1 is essentially random, there's a 1:256 chance of that
> happening, assuming that the file's content is random.

The most exact fix here would be to read the final byte, increment it
mod-256, and write it back, which would ensure it was always changed.
But that's a bit annoying to do in shell. Perhaps just corrupting more
bytes is a good solution? The patch below should reduce your changes to
1 in 2^80.

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 3f69e43178..a612e44547 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -482,8 +482,10 @@ test_expect_success 'corrupt MIDX is not reused' '
 '
 
 test_expect_success 'verify incorrect checksum' '
-	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
-	corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"
+	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 10)) &&
+	corrupt_midx_and_verify $pos \
+		"\377\377\377\377\377\377\377\377\377\377" \
+		$objdir "incorrect checksum"
 '
 
 test_expect_success 'repack progress off for redirected stderr' '

There are other variants, of course. Just appending a single byte to the
file is enough to give you a high probability of failing the checksum
(since it shifts it all by one byte, making it essentially random), but
the corrupt_midx() helper doesn't support that.

-Peff

Taylor Blau Nov. 16, 2021, 9:10 p.m. UTC | #5

On Thu, Nov 11, 2021 at 05:05:06AM -0500, Jeff King wrote:
> > Since SHA1 is essentially random, there's a 1:256 chance of that
> > happening, assuming that the file's content is random.
>
> The most exact fix here would be to read the final byte, increment it
> mod-256, and write it back, which would ensure it was always changed.
> But that's a bit annoying to do in shell. Perhaps just corrupting more
> bytes is a good solution? The patch below should reduce your changes to
> 1 in 2^80.
>
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> index 3f69e43178..a612e44547 100755
> --- a/t/t5319-multi-pack-index.sh
> +++ b/t/t5319-multi-pack-index.sh
> @@ -482,8 +482,10 @@ test_expect_success 'corrupt MIDX is not reused' '
>  '
>
>  test_expect_success 'verify incorrect checksum' '
> -	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 1)) &&
> -	corrupt_midx_and_verify $pos "\377" $objdir "incorrect checksum"
> +	pos=$(($(wc -c <$objdir/pack/multi-pack-index) - 10)) &&
> +	corrupt_midx_and_verify $pos \
> +		"\377\377\377\377\377\377\377\377\377\377" \
> +		$objdir "incorrect checksum"
>  '
>
>  test_expect_success 'repack progress off for redirected stderr' '
>
> There are other variants, of course. Just appending a single byte to the
> file is enough to give you a high probability of failing the checksum
> (since it shifts it all by one byte, making it essentially random), but
> the corrupt_midx() helper doesn't support that.

Thanks SZEDER for the report, and thanks Peff for the fix :).

I agree that it's annoyingly cumbersome to write back the last byte
incremented mod-256. So I'm content to just make it astronomically
unlikely to run into a collision in practice. (As a matter of fact, I'm
surprised that the current implementation hasn't produced failures for
us more often).

Peff: do you want me to turn this into a patch or were you planning on
it?

Thanks,
Taylor

[4/4] midx: report checksum mismatches during 'verify'

Commit Message

Comments

Patch