mbox series

[0/4] git fsck: check pack rev-index files

Message ID pull.1512.git.1681748502.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series git fsck: check pack rev-index files | expand

Message

Jean-Noël Avila via GitGitGadget April 17, 2023, 4:21 p.m. UTC
This series is based on tb/pack-revindex-on-disk.

The git fsck builtin does not look at the .rev files that pair with .pack
and .idx files, but should. If these files suffer a bit flip on disk, then
the invalid data may cause fetches and clones from that repository to start
failing. The fix is simple: delete the .rev file (and regenerate, if
necessary), but detection is the first step.

This series adds those checks. The initial check to verify the checksum is
probably sufficient for most real-world scenarios, but going the extra mile
to verify the rev-index contents against an in-memory rev-index helps us be
sure that there isn't a bug in the rev-index writing code of Git (which
would result in a valid checksum). Much like other file formats, an invalid
header needs to be handled separately as a malformed header may prevent the
data structures from being initialized in the first place.

This series does not validate a multi-pack-index-<hash>.rev file or the
rev-index chunk of a multi-pack-index file. These could be fast-follows,
except that there is no existing equivalent for an in-memory rev-index for
easy comparison. The rev-index chunk (which is the most-common way for a
multi-pack-index to have this information) is already covered by existing
checksum validation, at least.

Thanks, -Stolee

Derrick Stolee (4):
  fsck: create scaffolding for rev-index checks
  fsck: check rev-index checksums
  fsck: check rev-index position values
  fsck: validate .rev file header

 builtin/fsck.c           | 36 +++++++++++++++++++
 pack-bitmap.c            |  4 +--
 pack-revindex.c          | 43 +++++++++++++++++++++--
 pack-revindex.h          | 16 +++++++++
 t/t5325-reverse-index.sh | 74 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 169 insertions(+), 4 deletions(-)


base-commit: 9f7f10a282d8adeb9da0990aa0eb2adf93a47ca7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1512%2Fderrickstolee%2Ffsck-rev-indexes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1512/derrickstolee/fsck-rev-indexes-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1512

Comments

Junio C Hamano April 17, 2023, 9:37 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series is based on tb/pack-revindex-on-disk.
>
> The git fsck builtin does not look at the .rev files that pair with .pack
> and .idx files, but should.

Yes, it should.

> ... The fix is simple: delete the .rev file (and regenerate, if
> necessary), but detection is the first step.

> This series adds those checks. The initial check to verify the checksum is
> probably sufficient for most real-world scenarios, but going the extra mile
> to verify the rev-index contents against an in-memory rev-index helps us be
> sure that there isn't a bug in the rev-index writing code of Git (which
> would result in a valid checksum).

Good description.

> Much like other file formats, an invalid
> header needs to be handled separately as a malformed header may prevent the
> data structures from being initialized in the first place.

> This series does not validate a multi-pack-index-<hash>.rev file or the
> rev-index chunk of a multi-pack-index file. These could be fast-follows,
> except that there is no existing equivalent for an in-memory rev-index for
> easy comparison. The rev-index chunk (which is the most-common way for a
> multi-pack-index to have this information) is already covered by existing
> checksum validation, at least.

TIL what "fast-follow" means ;-)
Taylor Blau April 18, 2023, 3:23 p.m. UTC | #2
On Mon, Apr 17, 2023 at 04:21:37PM +0000, Derrick Stolee via GitGitGadget wrote:
>  builtin/fsck.c           | 36 +++++++++++++++++++
>  pack-bitmap.c            |  4 +--
>  pack-revindex.c          | 43 +++++++++++++++++++++--
>  pack-revindex.h          | 16 +++++++++
>  t/t5325-reverse-index.sh | 74 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 169 insertions(+), 4 deletions(-)

I gave this a thorough look and it all looks good to me.

I left a couple of minor comments throughout, e.g., s/set up/setup, and
replacing 'wc -c' with 'test_file_size', but I doubt either of those are
a big enough deal to merit a reroll.

So I'd be happy to see this start moving forward as-is. I think that the
topic it depends on, tb/pack-revindex-on-disk is similarly ready for
merging.

Thanks,
Taylor
Junio C Hamano April 18, 2023, 4:59 p.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> On Mon, Apr 17, 2023 at 04:21:37PM +0000, Derrick Stolee via GitGitGadget wrote:
>>  builtin/fsck.c           | 36 +++++++++++++++++++
>>  pack-bitmap.c            |  4 +--
>>  pack-revindex.c          | 43 +++++++++++++++++++++--
>>  pack-revindex.h          | 16 +++++++++
>>  t/t5325-reverse-index.sh | 74 ++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 169 insertions(+), 4 deletions(-)
>
> I gave this a thorough look and it all looks good to me.

This looked good to me, too.  I was planning to wait for a few days
to see what others find.

Thanks.