Message ID | adbe9c8ee90e087e864bd9e0c67338974b5dbc8d.1681748502.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 5f658d1b577722111564f51962d6af33d1fe96c6 |
Headers | show |
Series | git fsck: check pack rev-index files | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) { > error(_("invalid checksum")); > - return -1; > + res = -1; > } > > - return 0; > + /* This may fail due to a broken .idx. */ > + if (create_pack_revindex_in_memory(p)) > + return res; Here, if the revindex file itself is self consistent, res would still be 0, so we will end up silently returning. Is the idea that while whatever causes in-memory revindex fail to be computed is a concern from the point of view of the repository's health, it would be caught elsewhere as a problem for the <pack,idx> pair we are seeing here? > + for (size_t i = 0; i < p->num_objects; i++) { > + uint32_t nr = p->revindex[i].nr; > + uint32_t rev_val = get_be32(p->revindex_data + i); > + > + if (nr != rev_val) { > + error(_("invalid rev-index position at %"PRIu64": %"PRIu32" != %"PRIu32""), > + (uint64_t)i, nr, rev_val); > + res = -1; > + } > + } > + > + return res; > } > > int load_midx_revindex(struct multi_pack_index *m) > diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh > index 6b7c709a1f6..5c3c80f88f0 100755 > --- a/t/t5325-reverse-index.sh > +++ b/t/t5325-reverse-index.sh > @@ -185,4 +185,9 @@ test_expect_success 'fsck catches invalid checksum' ' > "invalid checksum" > ' > > +test_expect_success 'fsck catches invalid row position' ' > + corrupt_rev_and_verify 14 "\07" \ ;-) I wondered how the patch made this "\07" portably available; it is fed as the format string to printf in the helper, which is clever but obviously correct way to do this. Nice. > + "invalid rev-index position" > +' > + > test_done
On Mon, Apr 17, 2023 at 04:21:40PM +0000, Derrick Stolee via GitGitGadget wrote: > + for (size_t i = 0; i < p->num_objects; i++) { > + uint32_t nr = p->revindex[i].nr; > + uint32_t rev_val = get_be32(p->revindex_data + i); > + > + if (nr != rev_val) { > + error(_("invalid rev-index position at %"PRIu64": %"PRIu32" != %"PRIu32""), > + (uint64_t)i, nr, rev_val); > + res = -1; > + } > + } > + > + return res; Very clever. I thought about it for a while, and I am pretty confident that this is the cleanest you can get this loop to be. You could do something like building the forward index out of inverting the values in the .rev file, but everything I came up with based on that ended up being circular and not demonstrating anything interesting at all. Thanks, Taylor
On 4/17/2023 6:01 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) { >> error(_("invalid checksum")); >> - return -1; >> + res = -1; >> } >> >> - return 0; >> + /* This may fail due to a broken .idx. */ >> + if (create_pack_revindex_in_memory(p)) >> + return res; > > Here, if the revindex file itself is self consistent, res would > still be 0, so we will end up silently returning. Is the idea that > while whatever causes in-memory revindex fail to be computed is a > concern from the point of view of the repository's health, it would > be caught elsewhere as a problem for the <pack,idx> pair we are > seeing here? This is something I noticed during a test suite run, where the .idx was corrupt, so the .rev file could not be checked. In this situation, we are not using the .rev file for anything since Git processes would fail to lookup objects in the associated packfile. In this case, we have no way to validate the rest of the .rev file's contents, but at least we checked its header and checksum, which is the best we can hope to do in this case. Thanks, -Stolee
diff --git a/pack-revindex.c b/pack-revindex.c index 007a806994f..62a9846470c 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -310,16 +310,33 @@ int load_pack_revindex(struct repository *r, struct packed_git *p) */ int verify_pack_revindex(struct packed_git *p) { + int res = 0; + /* Do not bother checking if not initialized. */ - if (!p->revindex_map) - return 0; + if (!p->revindex_map || !p->revindex_data) + return res; if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) { error(_("invalid checksum")); - return -1; + res = -1; } - return 0; + /* This may fail due to a broken .idx. */ + if (create_pack_revindex_in_memory(p)) + return res; + + for (size_t i = 0; i < p->num_objects; i++) { + uint32_t nr = p->revindex[i].nr; + uint32_t rev_val = get_be32(p->revindex_data + i); + + if (nr != rev_val) { + error(_("invalid rev-index position at %"PRIu64": %"PRIu32" != %"PRIu32""), + (uint64_t)i, nr, rev_val); + res = -1; + } + } + + return res; } int load_midx_revindex(struct multi_pack_index *m) diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 6b7c709a1f6..5c3c80f88f0 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -185,4 +185,9 @@ test_expect_success 'fsck catches invalid checksum' ' "invalid checksum" ' +test_expect_success 'fsck catches invalid row position' ' + corrupt_rev_and_verify 14 "\07" \ + "invalid rev-index position" +' + test_done