mbox series

[v3,0/2] index-pack: fsck honor checks

Message ID pull.1658.v3.git.git.1706302749.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series index-pack: fsck honor checks | expand

Message

Koji Nakamaru via GitGitGadget Jan. 26, 2024, 8:59 p.m. UTC
git-index-pack has a --strict mode that can take an optional argument to
provide a list of fsck issues to change their severity. --fsck-objects does
not have such a utility, which would be useful if one would like to be more
lenient or strict on data integrity in a repository.

Like --strict, Allow --fsck-objects to also take a list of fsck msgs to
change the severity.

Changes since V2:

 * fixed some typos in the documentation
 * added commit trailers

Change since V1:

 * edited commit messages
 * clarified formatting in documentation for --strict= and --fsck-objects=

John Cai (2):
  index-pack: test and document --strict=<msg-id>=<severity>...
  index-pack: --fsck-objects to take an optional argument for fsck msgs

 Documentation/git-index-pack.txt | 26 +++++++++++++-------
 builtin/index-pack.c             |  5 ++--
 t/t5300-pack-object.sh           | 41 ++++++++++++++++++++++++++++++++
 3 files changed, 62 insertions(+), 10 deletions(-)


base-commit: 186b115d3062e6230ee296d1ddaa0c4b72a464b5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1658%2Fjohn-cai%2Fjc%2Findex-pack-fsck-honor-checks-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1658/john-cai/jc/index-pack-fsck-honor-checks-v3
Pull-Request: https://github.com/git/git/pull/1658

Range-diff vs v2:

 1:  b3b3e8bd0bf ! 1:  cdf7fc7fe8a index-pack: test and document --strict=<msg>
     @@ Metadata
      Author: John Cai <johncai86@gmail.com>
      
       ## Commit message ##
     -    index-pack: test and document --strict=<msg>
     +    index-pack: test and document --strict=<msg-id>=<severity>...
      
          5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
          2015-06-22) allowed a list of fsck msg to downgrade to be passed to
     @@ Commit message
          directly, (nor use index-pack for that matter) it is still useful to
          document and test this feature.
      
     +    Reviewed-by: Christian Couder <christian.couder@gmail.com>
          Signed-off-by: John Cai <johncai86@gmail.com>
      
       ## Documentation/git-index-pack.txt ##
     @@ Documentation/git-index-pack.txt: OPTIONS
      ---strict::
      -	Die, if the pack contains broken objects or links.
      +--strict[=<msg-id>=<severity>...]::
     -+	Die, if the pack contains broken objects or links. If `<msg-ids>` is passed,
     -+	it should be a comma-separated list of `<msg-id>=<severity>` elements where
     -+	`<msg-id>` and `<severity>` are used to change the severity of some possible
     -+	issues, e.g., `--strict="missingEmail=ignore,badTagName=error"`. See the entry
     -+	for the `fsck.<msg-id>` configuration options in `linkgit:git-fsck[1] for
     -+	more information on the possible values of `<msg-id>` and `<severity>`.
     ++	Die, if the pack contains broken objects or links. An optional
     ++	comma-separated list of `<msg-id>=<severity>` can be passed to change
     ++	the severity of some possible issues, e.g.,
     ++	 `--strict="missingEmail=ignore,badTagName=error"`. See the entry for the
     ++	`fsck.<msg-id>` configuration options in linkgit:git-fsck[1] for more
     ++	information on the possible values of `<msg-id>` and `<severity>`.
       
       --progress-title::
       	For internal use only.
     @@ builtin/index-pack.c
       
       static const char index_pack_usage[] =
      -"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
     -+"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict[=<msg-ids>]] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
     ++"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict[=<msg-id>=<severity>...]] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
       
       struct object_entry {
       	struct pack_idx_entry idx;
 2:  cce63c6465f ! 2:  a2b9adb93d8 index-pack: --fsck-objects to take an optional argument for fsck msgs
     @@ Commit message
          the option. This won't often be used by the normal end user, but it
          turns out it is useful for Git forges like GitLab.
      
     +    Reviewed-by: Christian Couder <christian.couder@gmail.com>
          Signed-off-by: John Cai <johncai86@gmail.com>
      
       ## Documentation/git-index-pack.txt ##
     @@ Documentation/git-index-pack.txt: default and "Indexing objects" when `--stdin`
       
      ---fsck-objects::
      -	For internal use only.
     -+--fsck-objects[=<msg-ids>=<severity>...]::
     -+	Instructs index-pack to check for broken objects, but unlike `--strict`,
     -+	does not choke on broken links. If `<msg-ids>` is passed, it should be
     -+	a comma-separated list of `<msg-id>=<severity>` where `<msg-id>` and
     -+	`<severity>` are used to change the severity of `fsck` errors e.g.,
     -+	`--strict="missingEmail=ignore,badTagName=ignore"`. See the entry for
     -+	the `fsck.<msg-id>` configuration options in `linkgit:git-fsck[1] for
     -+	more information on the possible values of `<msg-id>` and `<severity>`.
     ++--fsck-objects[=<msg-id>=<severity>...]::
     ++	Die if the pack contains broken objects. If the pack contains a tree
     ++	pointing to a .gitmodules blob that does not exist, prints the hash of
     ++	that blob (for the caller to check) after the hash that goes into the
     ++	name of the pack/idx file (see "Notes").
       +
     - Die if the pack contains broken objects. If the pack contains a tree
     - pointing to a .gitmodules blob that does not exist, prints the hash of
     +-Die if the pack contains broken objects. If the pack contains a tree
     +-pointing to a .gitmodules blob that does not exist, prints the hash of
     +-that blob (for the caller to check) after the hash that goes into the
     +-name of the pack/idx file (see "Notes").
     ++Unlike `--strict` however, don't choke on broken links. An optional
     ++comma-separated list of `<msg-id>=<severity>` can be passed to change the
     ++severity of some possible issues, e.g.,
     ++`--fsck-objects="missingEmail=ignore,badTagName=ignore"`. See the entry for the
     ++`fsck.<msg-id>` configuration options in linkgit:git-fsck[1] for more
     ++information on the possible values of `<msg-id>` and `<severity>`.
     + 
     + --threads=<n>::
     + 	Specifies the number of threads to spawn when resolving
      
       ## builtin/index-pack.c ##
      @@
       #include "setup.h"
       
       static const char index_pack_usage[] =
     --"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict[=<msg-ids>]] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
     -+"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict[=<msg-ids>]] [--fsck-objects[=<msg-ids>]] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
     +-"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict[=<msg-id>=<severity>...]] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
     ++"git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>] [--[no-]rev-index] [--verify] [--strict[=<msg-id>=<severity>...]] [--fsck-objects[=<msg-id>=<severity>...]] (<pack-file> | --stdin [--fix-thin] [<pack-file>])";
       
       struct object_entry {
       	struct pack_idx_entry idx;

Comments

Junio C Hamano Jan. 26, 2024, 9:18 p.m. UTC | #1
"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  1:  b3b3e8bd0bf ! 1:  cdf7fc7fe8a index-pack: test and document --strict=<msg>
>      @@ Metadata
>       Author: John Cai <johncai86@gmail.com>
>       
>        ## Commit message ##
>      -    index-pack: test and document --strict=<msg>
>      +    index-pack: test and document --strict=<msg-id>=<severity>...

Ah, I missed this one.  Nice spotting.

>           5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
>           2015-06-22) allowed a list of fsck msg to downgrade to be passed to
>      @@ Commit message
>           directly, (nor use index-pack for that matter) it is still useful to
>           document and test this feature.
>       
>      +    Reviewed-by: Christian Couder <christian.couder@gmail.com>
>           Signed-off-by: John Cai <johncai86@gmail.com>

I haven't seen Christian involved (by getting Cc'ed these patches,
sending out review comments, or giving his Reviewed-by:) during
these three rounds of this topic.  I'll wait until I hear from him
before queuing this, just to be safe.

>      ++	Die, if the pack contains broken objects or links. An optional
>      ++	comma-separated list of `<msg-id>=<severity>` can be passed to change
>      ++	the severity of some possible issues, e.g.,
>      ++	 `--strict="missingEmail=ignore,badTagName=error"`. See the entry for the
>      ++	`fsck.<msg-id>` configuration options in linkgit:git-fsck[1] for more
>      ++	information on the possible values of `<msg-id>` and `<severity>`.

This is much better than the tentative text I tweaked.  Nice.

>      ++--fsck-objects[=<msg-id>=<severity>...]::
>      ++	Die if the pack contains broken objects. If the pack contains a tree
>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
>      ++	that blob (for the caller to check) after the hash that goes into the
>      ++	name of the pack/idx file (see "Notes").

Not a new problem bit I have to wonder what happens if the pack
contains many trees that point at different blobs for ".gitmodules"
path and many of these blobs are not included in the packfile?  Will
the caller receive all of these blob object names so that they can
be verified?  The reference to the "Notes" only refer to the fact
that usually a single hash value that is used in constructing the
name of the packfile "pack-<Hashvalue>.pack" is emitted to the
standard output, which is not wrong per se, but does not help
readers very much wrt to understanding this.

[jc: dragging JTan into the thread, as this comes from his 5476e1ef
(fetch-pack: print and use dangling .gitmodules, 2021-02-22)].

>        +
>      ++Unlike `--strict` however, don't choke on broken links. An optional

You'd need a comma on both sides of "however" used like this, I
think.  

In any case, I thought your original construction to have this
"unlike" immediately after "die on broken objects" was far easier to
follow.

Thanks.
John Cai Jan. 26, 2024, 10:11 p.m. UTC | #2
Hi Junio,

On 26 Jan 2024, at 16:18, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  1:  b3b3e8bd0bf ! 1:  cdf7fc7fe8a index-pack: test and document --strict=<msg>
>>      @@ Metadata
>>       Author: John Cai <johncai86@gmail.com>
>>
>>        ## Commit message ##
>>      -    index-pack: test and document --strict=<msg>
>>      +    index-pack: test and document --strict=<msg-id>=<severity>...
>
> Ah, I missed this one.  Nice spotting.
>
>>           5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
>>           2015-06-22) allowed a list of fsck msg to downgrade to be passed to
>>      @@ Commit message
>>           directly, (nor use index-pack for that matter) it is still useful to
>>           document and test this feature.
>>
>>      +    Reviewed-by: Christian Couder <christian.couder@gmail.com>
>>           Signed-off-by: John Cai <johncai86@gmail.com>
>
> I haven't seen Christian involved (by getting Cc'ed these patches,
> sending out review comments, or giving his Reviewed-by:) during
> these three rounds of this topic.  I'll wait until I hear from him
> before queuing this, just to be safe.

Christian was involved on an off-list review of this patch series. You can see
it in [1].


1. https://gitlab.com/gitlab-org/git/-/merge_requests/88
>
>>      ++	Die, if the pack contains broken objects or links. An optional
>>      ++	comma-separated list of `<msg-id>=<severity>` can be passed to change
>>      ++	the severity of some possible issues, e.g.,
>>      ++	 `--strict="missingEmail=ignore,badTagName=error"`. See the entry for the
>>      ++	`fsck.<msg-id>` configuration options in linkgit:git-fsck[1] for more
>>      ++	information on the possible values of `<msg-id>` and `<severity>`.
>
> This is much better than the tentative text I tweaked.  Nice.
>
>>      ++--fsck-objects[=<msg-id>=<severity>...]::
>>      ++	Die if the pack contains broken objects. If the pack contains a tree
>>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
>>      ++	that blob (for the caller to check) after the hash that goes into the
>>      ++	name of the pack/idx file (see "Notes").
>
> Not a new problem bit I have to wonder what happens if the pack
> contains many trees that point at different blobs for ".gitmodules"
> path and many of these blobs are not included in the packfile?  Will
> the caller receive all of these blob object names so that they can
> be verified?  The reference to the "Notes" only refer to the fact
> that usually a single hash value that is used in constructing the
> name of the packfile "pack-<Hashvalue>.pack" is emitted to the
> standard output, which is not wrong per se, but does not help
> readers very much wrt to understanding this.
>
> [jc: dragging JTan into the thread, as this comes from his 5476e1ef
> (fetch-pack: print and use dangling .gitmodules, 2021-02-22)].

sounds good, will wait for some clarification here
>
>>        +
>>      ++Unlike `--strict` however, don't choke on broken links. An optional
>
> You'd need a comma on both sides of "however" used like this, I
> think.

good catch

>
> In any case, I thought your original construction to have this
> "unlike" immediately after "die on broken objects" was far easier to
> follow.

I'll reformulate this to be clearer. From the previous version I realized I
didn't take into account the pre-existing "Die if the pack contains broken
objects" block so I put it at the beginning. But now I think you're right in
that the "Unlike..." comes too late.

>
> Thanks.
Jonathan Tan Jan. 26, 2024, 10:13 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:
> >      ++--fsck-objects[=<msg-id>=<severity>...]::
> >      ++	Die if the pack contains broken objects. If the pack contains a tree
> >      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
> >      ++	that blob (for the caller to check) after the hash that goes into the
> >      ++	name of the pack/idx file (see "Notes").
> 
> Not a new problem bit I have to wonder what happens if the pack
> contains many trees that point at different blobs for ".gitmodules"
> path and many of these blobs are not included in the packfile?  Will
> the caller receive all of these blob object names so that they can
> be verified?  The reference to the "Notes" only refer to the fact
> that usually a single hash value that is used in constructing the
> name of the packfile "pack-<Hashvalue>.pack" is emitted to the
> standard output, which is not wrong per se, but does not help
> readers very much wrt to understanding this.
> 
> [jc: dragging JTan into the thread, as this comes from his 5476e1ef
> (fetch-pack: print and use dangling .gitmodules, 2021-02-22)].

Ah...I can see how that documentation isn't clear. The intention of that
commit is to check every link to a .gitmodules blob. The tests perhaps
should have been written with 2 .gitmodules blobs (in separate commits),
but I think the production code works: I tried changing the test to have
2 commits each with their own .gitmodules blob, and error messages were
printed for both blobs.

(If someone changes that test, e.g. to have 2 blobs, the ">h" in the
"configure_exclusion" invocations look superfluous and is perhaps a
copy-and-paste error from other tests that needed the hash later.)
John Cai Jan. 27, 2024, 2:31 a.m. UTC | #4
Hi Jonathan,

On 26 Jan 2024, at 17:13, Jonathan Tan wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>>>      ++--fsck-objects[=<msg-id>=<severity>...]::
>>>      ++	Die if the pack contains broken objects. If the pack contains a tree
>>>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
>>>      ++	that blob (for the caller to check) after the hash that goes into the
>>>      ++	name of the pack/idx file (see "Notes").
>>
>> Not a new problem bit I have to wonder what happens if the pack
>> contains many trees that point at different blobs for ".gitmodules"
>> path and many of these blobs are not included in the packfile?  Will
>> the caller receive all of these blob object names so that they can
>> be verified?  The reference to the "Notes" only refer to the fact
>> that usually a single hash value that is used in constructing the
>> name of the packfile "pack-<Hashvalue>.pack" is emitted to the
>> standard output, which is not wrong per se, but does not help
>> readers very much wrt to understanding this.
>>
>> [jc: dragging JTan into the thread, as this comes from his 5476e1ef
>> (fetch-pack: print and use dangling .gitmodules, 2021-02-22)].
>
> Ah...I can see how that documentation isn't clear. The intention of that
> commit is to check every link to a .gitmodules blob. The tests perhaps
> should have been written with 2 .gitmodules blobs (in separate commits),
> but I think the production code works: I tried changing the test to have
> 2 commits each with their own .gitmodules blob, and error messages were
> printed for both blobs.

Thanks for clarifying! Would you mind providing a patch to revise the wording
here to make it clearer? I would try but I feel like I might get the wording
wrong.
>
> (If someone changes that test, e.g. to have 2 blobs, the ">h" in the
> "configure_exclusion" invocations look superfluous and is perhaps a
> copy-and-paste error from other tests that needed the hash later.)

thanks
John
Patrick Steinhardt Jan. 29, 2024, 11:15 a.m. UTC | #5
On Fri, Jan 26, 2024 at 05:11:14PM -0500, John Cai wrote:
> Hi Junio,
> 
> On 26 Jan 2024, at 16:18, Junio C Hamano wrote:
> 
> > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >>  1:  b3b3e8bd0bf ! 1:  cdf7fc7fe8a index-pack: test and document --strict=<msg>
> >>      @@ Metadata
> >>       Author: John Cai <johncai86@gmail.com>
> >>
> >>        ## Commit message ##
> >>      -    index-pack: test and document --strict=<msg>
> >>      +    index-pack: test and document --strict=<msg-id>=<severity>...
> >
> > Ah, I missed this one.  Nice spotting.
> >
> >>           5d477a334a (fsck (receive-pack): allow demoting errors to warnings,
> >>           2015-06-22) allowed a list of fsck msg to downgrade to be passed to
> >>      @@ Commit message
> >>           directly, (nor use index-pack for that matter) it is still useful to
> >>           document and test this feature.
> >>
> >>      +    Reviewed-by: Christian Couder <christian.couder@gmail.com>
> >>           Signed-off-by: John Cai <johncai86@gmail.com>
> >
> > I haven't seen Christian involved (by getting Cc'ed these patches,
> > sending out review comments, or giving his Reviewed-by:) during
> > these three rounds of this topic.  I'll wait until I hear from him
> > before queuing this, just to be safe.
> 
> Christian was involved on an off-list review of this patch series. You can see
> it in [1].
> 
> 1. https://gitlab.com/gitlab-org/git/-/merge_requests/88

I'm always a bit hesitant to add trailers referring to off-list reviews
to commits. It's impossible for a future reader to discover how that
trailer came to be by just using the mailing list archive, and expecting
them to use third-party services to verify them feels wrong to me.

It's part of the reason why I'm pushing more into the direction of
on-list reviews at GitLab. It makes it a lot more obvious how such a
Reviewed-by came to be and keeps things self-contained on the mailing
list. It also grows new contributors who are becoming more familiar with
how the Git mailing list works. If such a review already happened
internally due to whatever reason then I think it ought to be fine for
that reviewer to chime in saying that they have already reviewed the
patch series and that things look good to them.

Patrick
Junio C Hamano Jan. 29, 2024, 5:18 p.m. UTC | #6
Patrick Steinhardt <ps@pks.im> writes:

> I'm always a bit hesitant to add trailers referring to off-list reviews
> to commits. It's impossible for a future reader to discover how that
> trailer came to be by just using the mailing list archive, and expecting
> them to use third-party services to verify them feels wrong to me.
>
> It's part of the reason why I'm pushing more into the direction of
> on-list reviews at GitLab. It makes it a lot more obvious how such a
> Reviewed-by came to be and keeps things self-contained on the mailing
> list. It also grows new contributors who are becoming more familiar with
> how the Git mailing list works. If such a review already happened
> internally due to whatever reason then I think it ought to be fine for
> that reviewer to chime in saying that they have already reviewed the
> patch series and that things look good to them.

Thanks.  That would improve clarifying a situation like this one
(eh, actually, once it is done this particular situation wouldn't
need any clarification).
Jonathan Tan Jan. 31, 2024, 10:30 p.m. UTC | #7
John Cai <johncai86@gmail.com> writes:
> Hi Jonathan,
> 
> On 26 Jan 2024, at 17:13, Jonathan Tan wrote:
> 
> > Junio C Hamano <gitster@pobox.com> writes:
> >>>      ++--fsck-objects[=<msg-id>=<severity>...]::
> >>>      ++	Die if the pack contains broken objects. If the pack contains a tree
> >>>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
> >>>      ++	that blob (for the caller to check) after the hash that goes into the
> >>>      ++	name of the pack/idx file (see "Notes").

> Thanks for clarifying! Would you mind providing a patch to revise the wording
> here to make it clearer? I would try but I feel like I might get the wording
> wrong.

I think the wording there is already mostly correct, except maybe make
everything plural (a tree -> trees, a .gitmodules blob -> .gitmodules
blobs, hash of that blob -> hashes of those blobs). We might also need
to modify a test to show that the current code indeed handles the plural
situation correctly. I don't have time right now to get to this, so
hopefully someone could pick this up.
John Cai Feb. 1, 2024, 1:34 a.m. UTC | #8
Hi Jonathan,

On 31 Jan 2024, at 17:30, Jonathan Tan wrote:

> John Cai <johncai86@gmail.com> writes:
>> Hi Jonathan,
>>
>> On 26 Jan 2024, at 17:13, Jonathan Tan wrote:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>>>      ++--fsck-objects[=<msg-id>=<severity>...]::
>>>>>      ++	Die if the pack contains broken objects. If the pack contains a tree
>>>>>      ++	pointing to a .gitmodules blob that does not exist, prints the hash of
>>>>>      ++	that blob (for the caller to check) after the hash that goes into the
>>>>>      ++	name of the pack/idx file (see "Notes").
>
>> Thanks for clarifying! Would you mind providing a patch to revise the wording
>> here to make it clearer? I would try but I feel like I might get the wording
>> wrong.
>
> I think the wording there is already mostly correct, except maybe make
> everything plural (a tree -> trees, a .gitmodules blob -> .gitmodules
> blobs, hash of that blob -> hashes of those blobs). We might also need
> to modify a test to show that the current code indeed handles the plural
> situation correctly. I don't have time right now to get to this, so
> hopefully someone could pick this up.

Thanks! It sounds like we may want to tackle this as part of another patch.

John
Junio C Hamano Feb. 1, 2024, 4:44 p.m. UTC | #9
John Cai <johncai86@gmail.com> writes:

>>> Thanks for clarifying! Would you mind providing a patch to revise the wording
>>> here to make it clearer? I would try but I feel like I might get the wording
>>> wrong.
>>
>> I think the wording there is already mostly correct, except maybe make
>> everything plural (a tree -> trees, a .gitmodules blob -> .gitmodules
>> blobs, hash of that blob -> hashes of those blobs). We might also need
>> to modify a test to show that the current code indeed handles the plural
>> situation correctly. I don't have time right now to get to this, so
>> hopefully someone could pick this up.
>
> Thanks! It sounds like we may want to tackle this as part of another patch.

Yeah, the existing documentation has been with our users for some
time, and it is not ultra urgent to fix it in that sense.  I'd say
that it can even wait until JTan gets bored with what he's doing and
needs some distraction himself ;-) 

As long as our collective mind remembers it as #leftoverbits it
would be sufficient.

Thanks, both.