diff mbox series

[3/4] fsck: check rev-index position values

Message ID adbe9c8ee90e087e864bd9e0c67338974b5dbc8d.1681748502.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 5f658d1b577722111564f51962d6af33d1fe96c6
Headers show
Series git fsck: check pack rev-index files | expand

Commit Message

Derrick Stolee April 17, 2023, 4:21 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

When checking a rev-index file, it may be helpful to identify exactly
which positions are incorrect. Compare the rev-index to a
freshly-computed in-memory rev-index and report the comparison failures.

This additional check (on top of the checksum validation) can help find
files that were corrupt by a single bit flip on-disk or perhaps were
written incorrectly due to a bug in Git.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 pack-revindex.c          | 25 +++++++++++++++++++++----
 t/t5325-reverse-index.sh |  5 +++++
 2 files changed, 26 insertions(+), 4 deletions(-)

Comments

Junio C Hamano April 17, 2023, 10:01 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  	if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) {
>  		error(_("invalid checksum"));
> -		return -1;
> +		res = -1;
>  	}
>  
> -	return 0;
> +	/* This may fail due to a broken .idx. */
> +	if (create_pack_revindex_in_memory(p))
> +		return res;

Here, if the revindex file itself is self consistent, res would
still be 0, so we will end up silently returning.  Is the idea that
while whatever causes in-memory revindex fail to be computed is a
concern from the point of view of the repository's health, it would
be caught elsewhere as a problem for the <pack,idx> pair we are
seeing here?

> +	for (size_t i = 0; i < p->num_objects; i++) {
> +		uint32_t nr = p->revindex[i].nr;
> +		uint32_t rev_val = get_be32(p->revindex_data + i);
> +
> +		if (nr != rev_val) {
> +			error(_("invalid rev-index position at %"PRIu64": %"PRIu32" != %"PRIu32""),
> +			      (uint64_t)i, nr, rev_val);
> +			res = -1;
> +		}
> +	}
> +
> +	return res;
>  }
>  
>  int load_midx_revindex(struct multi_pack_index *m)
> diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
> index 6b7c709a1f6..5c3c80f88f0 100755
> --- a/t/t5325-reverse-index.sh
> +++ b/t/t5325-reverse-index.sh
> @@ -185,4 +185,9 @@ test_expect_success 'fsck catches invalid checksum' '
>  		"invalid checksum"
>  '
>  
> +test_expect_success 'fsck catches invalid row position' '
> +	corrupt_rev_and_verify 14 "\07" \

;-)

I wondered how the patch made this "\07" portably available; it is
fed as the format string to printf in the helper, which is clever
but obviously correct way to do this.  Nice.

> +		"invalid rev-index position"
> +'
> +
>  test_done
Taylor Blau April 17, 2023, 10:52 p.m. UTC | #2
On Mon, Apr 17, 2023 at 04:21:40PM +0000, Derrick Stolee via GitGitGadget wrote:
> +	for (size_t i = 0; i < p->num_objects; i++) {
> +		uint32_t nr = p->revindex[i].nr;
> +		uint32_t rev_val = get_be32(p->revindex_data + i);
> +
> +		if (nr != rev_val) {
> +			error(_("invalid rev-index position at %"PRIu64": %"PRIu32" != %"PRIu32""),
> +			      (uint64_t)i, nr, rev_val);
> +			res = -1;
> +		}
> +	}
> +
> +	return res;

Very clever.

I thought about it for a while, and I am pretty confident that this is
the cleanest you can get this loop to be. You could do something like
building the forward index out of inverting the values in the .rev file,
but everything I came up with based on that ended up being circular and
not demonstrating anything interesting at all.

Thanks,
Taylor
Derrick Stolee April 18, 2023, 2:32 p.m. UTC | #3
On 4/17/2023 6:01 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>  	if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) {
>>  		error(_("invalid checksum"));
>> -		return -1;
>> +		res = -1;
>>  	}
>>  
>> -	return 0;
>> +	/* This may fail due to a broken .idx. */
>> +	if (create_pack_revindex_in_memory(p))
>> +		return res;
> 
> Here, if the revindex file itself is self consistent, res would
> still be 0, so we will end up silently returning.  Is the idea that
> while whatever causes in-memory revindex fail to be computed is a
> concern from the point of view of the repository's health, it would
> be caught elsewhere as a problem for the <pack,idx> pair we are
> seeing here?

This is something I noticed during a test suite run, where the .idx
was corrupt, so the .rev file could not be checked. In this situation,
we are not using the .rev file for anything since Git processes would
fail to lookup objects in the associated packfile.

In this case, we have no way to validate the rest of the .rev file's
contents, but at least we checked its header and checksum, which is
the best we can hope to do in this case.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/pack-revindex.c b/pack-revindex.c
index 007a806994f..62a9846470c 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -310,16 +310,33 @@  int load_pack_revindex(struct repository *r, struct packed_git *p)
  */
 int verify_pack_revindex(struct packed_git *p)
 {
+	int res = 0;
+
 	/* Do not bother checking if not initialized. */
-	if (!p->revindex_map)
-		return 0;
+	if (!p->revindex_map || !p->revindex_data)
+		return res;
 
 	if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) {
 		error(_("invalid checksum"));
-		return -1;
+		res = -1;
 	}
 
-	return 0;
+	/* This may fail due to a broken .idx. */
+	if (create_pack_revindex_in_memory(p))
+		return res;
+
+	for (size_t i = 0; i < p->num_objects; i++) {
+		uint32_t nr = p->revindex[i].nr;
+		uint32_t rev_val = get_be32(p->revindex_data + i);
+
+		if (nr != rev_val) {
+			error(_("invalid rev-index position at %"PRIu64": %"PRIu32" != %"PRIu32""),
+			      (uint64_t)i, nr, rev_val);
+			res = -1;
+		}
+	}
+
+	return res;
 }
 
 int load_midx_revindex(struct multi_pack_index *m)
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 6b7c709a1f6..5c3c80f88f0 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -185,4 +185,9 @@  test_expect_success 'fsck catches invalid checksum' '
 		"invalid checksum"
 '
 
+test_expect_success 'fsck catches invalid row position' '
+	corrupt_rev_and_verify 14 "\07" \
+		"invalid rev-index position"
+'
+
 test_done