diff mbox series

[GSoC,v14,09/11] builtin/fsck: add `git-refs verify` child process

Message ID ZqumdJz3-0mqh6Rc@ArchLinux (mailing list archive)
State Superseded
Headers show
Series ref consistency check infra setup | expand

Commit Message

shejialuo Aug. 1, 2024, 3:15 p.m. UTC
Introduce a new function "fsck_refs" that initializes and runs a child
process to execute the "git-refs verify" command.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/fsck.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Patrick Steinhardt Aug. 5, 2024, 12:58 p.m. UTC | #1
On Thu, Aug 01, 2024 at 11:15:00PM +0800, shejialuo wrote:
> Introduce a new function "fsck_refs" that initializes and runs a child
> process to execute the "git-refs verify" command.

It's `git refs verify`, not `git-refs verify` both in the commit body
and subject.

> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  builtin/fsck.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 766bbd014d..b6ac878270 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -899,6 +899,21 @@ static int check_pack_rev_indexes(struct repository *r, int show_progress)
>  	return res;
>  }
>  
> +static void fsck_refs(void)
> +{
> +	struct child_process refs_verify = CHILD_PROCESS_INIT;
> +	child_process_init(&refs_verify);
> +	refs_verify.git_cmd = 1;
> +	strvec_pushl(&refs_verify.args, "refs", "verify", NULL);
> +	if (verbose)
> +		strvec_push(&refs_verify.args, "--verbose");
> +	if (check_strict)
> +		strvec_push(&refs_verify.args, "--strict");
> +
> +	if (run_command(&refs_verify))
> +		errors_found |= ERROR_REFS;
> +}

Okay. I think that it's sensible to execute this as part of git-fsck(1).
But do we want to provide an option to disable this new check, as well?

It does feel a bit like opening a can of worms, though. None of the
other checks have trivial ways to disable them, and git-fsck(1) is
gaining more and more checks. So if we can disable ref checks, we may
also want to have options to disable checks for objects, connectivity,
reverse indices, indices, commit graphs and whatnot. In other words, in
my opinion we need to think a bit bigger and design a proper UI around
this.

But I don't think that should happen as part of this commit series, as
it is already big enough. So either we just accept this patch as-is. Or
we evict it from this series and handle it in the future together with
all the other taks that one may potentially want to disable.

I'd rather pick option two.

Patrick
shejialuo Aug. 5, 2024, 3:18 p.m. UTC | #2
> > +static void fsck_refs(void)
> > +{
> > +	struct child_process refs_verify = CHILD_PROCESS_INIT;
> > +	child_process_init(&refs_verify);
> > +	refs_verify.git_cmd = 1;
> > +	strvec_pushl(&refs_verify.args, "refs", "verify", NULL);
> > +	if (verbose)
> > +		strvec_push(&refs_verify.args, "--verbose");
> > +	if (check_strict)
> > +		strvec_push(&refs_verify.args, "--strict");
> > +
> > +	if (run_command(&refs_verify))
> > +		errors_found |= ERROR_REFS;
> > +}
> 
> Okay. I think that it's sensible to execute this as part of git-fsck(1).
> But do we want to provide an option to disable this new check, as well?
> 
> It does feel a bit like opening a can of worms, though. None of the
> other checks have trivial ways to disable them, and git-fsck(1) is
> gaining more and more checks. So if we can disable ref checks, we may
> also want to have options to disable checks for objects, connectivity,
> reverse indices, indices, commit graphs and whatnot. In other words, in
> my opinion we need to think a bit bigger and design a proper UI around
> this.
> 
> But I don't think that should happen as part of this commit series, as
> it is already big enough. So either we just accept this patch as-is. Or
> we evict it from this series and handle it in the future together with
> all the other taks that one may potentially want to disable.
> 
> I'd rather pick option two.
> 

After talking with Patrick offline, we decide to drop this patch. At
current, we should put this change slowly for the user. Because many
people use "git-fsck(1)", currently we don't have a way to disable ref
checks by default. It's a little beyond this series.

We may consider later.

> Patrick
Junio C Hamano Aug. 5, 2024, 6:11 p.m. UTC | #3
shejialuo <shejialuo@gmail.com> writes:

>> > +static void fsck_refs(void)
>> > +{
>> > +	struct child_process refs_verify = CHILD_PROCESS_INIT;
>> > +	child_process_init(&refs_verify);
>> > +	refs_verify.git_cmd = 1;
>> > +	strvec_pushl(&refs_verify.args, "refs", "verify", NULL);
>> > +	if (verbose)
>> > +		strvec_push(&refs_verify.args, "--verbose");
>> > +	if (check_strict)
>> > +		strvec_push(&refs_verify.args, "--strict");
>> > +
>> > +	if (run_command(&refs_verify))
>> > +		errors_found |= ERROR_REFS;
>> > +}
>> 
>> Okay. I think that it's sensible to execute this as part of git-fsck(1).
>> But do we want to provide an option to disable this new check, as well?
>> 
>> It does feel a bit like opening a can of worms, though. None of the
>> other checks have trivial ways to disable them, and git-fsck(1) is
>> gaining more and more checks. So if we can disable ref checks, we may
>> also want to have options to disable checks for objects, connectivity,
>> reverse indices, indices, commit graphs and whatnot. In other words, in
>> my opinion we need to think a bit bigger and design a proper UI around
>> this.
>> 
>> But I don't think that should happen as part of this commit series, as
>> it is already big enough. So either we just accept this patch as-is. Or
>> we evict it from this series and handle it in the future together with
>> all the other taks that one may potentially want to disable.
>> 
>> I'd rather pick option two.
>> 
>
> After talking with Patrick offline, we decide to drop this patch. At
> current, we should put this change slowly for the user. Because many
> people use "git-fsck(1)", currently we don't have a way to disable ref
> checks by default. It's a little beyond this series.
>
> We may consider later.

Hmph, I am fine with the approach to take it slower.

BUT.

Here is what the diffstat for the whole thing in the updated round
v15 looked like.  Do most of these changes outside refs/ still
needed if we do not give any way to even optionally enable it via
"fsck" for those who feel adventurous?  Should we still be touching
fsck.[ch] and other fsck related infrastructure in the series?

 Documentation/fsck-msgids.txt |   6 ++
 Documentation/git-refs.txt    |  13 +++++
 builtin/fsck.c                |  17 +++---
 builtin/mktag.c               |   3 +-
 builtin/refs.c                |  34 +++++++++++
 fsck.c                        | 127 +++++++++++++++++++++++++++++++++---------
 fsck.h                        |  76 +++++++++++++++++++------
 object-file.c                 |   9 ++-
 refs.c                        |   5 ++
 refs.h                        |   8 +++
 refs/debug.c                  |  11 ++++
 refs/files-backend.c          | 116 +++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.c         |   8 +++
 refs/refs-internal.h          |   6 ++
 refs/reftable-backend.c       |   8 +++
 t/t0602-reffiles-fsck.sh      |  92 ++++++++++++++++++++++++++++++
 16 files changed, 480 insertions(+), 59 deletions(-)
shejialuo Aug. 5, 2024, 6:36 p.m. UTC | #4
> 
> Hmph, I am fine with the approach to take it slower.
> 
> BUT.
> 
> Here is what the diffstat for the whole thing in the updated round
> v15 looked like.  Do most of these changes outside refs/ still
> needed if we do not give any way to even optionally enable it via
> "fsck" for those who feel adventurous?  Should we still be touching
> fsck.[ch] and other fsck related infrastructure in the series?
> 

From my current standing, regardless of whether we run the "git refs
verify" subprocess inside "git-fsck(1)", we must change the fsck
infrastructure illustrated by the following:

From the development of this series, we can know the main problem is
that fsck error message is highly coupled with the object checks. Even
if we don't perform "git refs verify" in "git-fsck(1)", we still need to
follow the fsck msg framework when executing "git refs verify". We
cannot avoid this.

The content we change for "fsck.[ch]" is mainly the fsck msg part. We do
not change anything about the objects.

I agree with you that it would be strange if we do not expose any
interfaces for user who are adventurous. Actually we may simply add an
option "--refs-experimental" or simply "--refs" to allow the users check
ref consistency by using "git-fsck(1)".

I guess the concern that Patrick cares about is that we ONLY make refs
optional here, but do not provide options for other checks. It will be
strange from this perspective.

>  Documentation/fsck-msgids.txt |   6 ++
>  Documentation/git-refs.txt    |  13 +++++
>  builtin/fsck.c                |  17 +++---
>  builtin/mktag.c               |   3 +-
>  builtin/refs.c                |  34 +++++++++++
>  fsck.c                        | 127 +++++++++++++++++++++++++++++++++---------
>  fsck.h                        |  76 +++++++++++++++++++------
>  object-file.c                 |   9 ++-
>  refs.c                        |   5 ++
>  refs.h                        |   8 +++
>  refs/debug.c                  |  11 ++++
>  refs/files-backend.c          | 116 +++++++++++++++++++++++++++++++++++++-
>  refs/packed-backend.c         |   8 +++
>  refs/refs-internal.h          |   6 ++
>  refs/reftable-backend.c       |   8 +++
>  t/t0602-reffiles-fsck.sh      |  92 ++++++++++++++++++++++++++++++
>  16 files changed, 480 insertions(+), 59 deletions(-)
Junio C Hamano Aug. 5, 2024, 7:38 p.m. UTC | #5
shejialuo <shejialuo@gmail.com> writes:

> I agree with you that it would be strange if we do not expose any
> interfaces for user who are adventurous. Actually we may simply add an
> option "--refs-experimental" or simply "--refs" to allow the users check
> ref consistency by using "git-fsck(1)".
>
> I guess the concern that Patrick cares about is that we ONLY make refs
> optional here, but do not provide options for other checks. It will be
> strange from this perspective.

I do not care about strange all that much.  I however care about new
complexity in the code, complexity that is not taken advantage of
and is not exercised.  

You said

> From the development of this series, we can know the main problem is
> that fsck error message is highly coupled with the object checks.

and even if it is true and we have problem in fsck code paths, we
cannot see if _your_ solution to that problem is a good one without
having the code that exercises your additional code.

But if "git refs verify" does exercise all the new code paths (and
the refactored code that existed before this series, sitting now in
different places), then I do not have to worry about it.  My question
was primarily to extract "even though we do not wire this up to fsck,
we already have another code paths that uses all these changes" out
of you.

Thanks.
shejialuo Aug. 6, 2024, 12:42 a.m. UTC | #6
On Mon, Aug 05, 2024 at 12:38:43PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > I agree with you that it would be strange if we do not expose any
> > interfaces for user who are adventurous. Actually we may simply add an
> > option "--refs-experimental" or simply "--refs" to allow the users check
> > ref consistency by using "git-fsck(1)".
> >
> > I guess the concern that Patrick cares about is that we ONLY make refs
> > optional here, but do not provide options for other checks. It will be
> > strange from this perspective.
> 
> I do not care about strange all that much.  I however care about new
> complexity in the code, complexity that is not taken advantage of
> and is not exercised.  
> 
> You said
> 
> > From the development of this series, we can know the main problem is
> > that fsck error message is highly coupled with the object checks.
> 
> and even if it is true and we have problem in fsck code paths, we
> cannot see if _your_ solution to that problem is a good one without
> having the code that exercises your additional code.
> 
> But if "git refs verify" does exercise all the new code paths (and
> the refactored code that existed before this series, sitting now in
> different places), then I do not have to worry about it.  My question
> was primarily to extract "even though we do not wire this up to fsck,
> we already have another code paths that uses all these changes" out
> of you.
> 

I understand what you mean here. I can say that "git refs verify" only
exercises a part of the new code paths. The main reason why this series
changes a lot of "fsck.[ch]" is that I want to expose the
"fsck_report_ref" interface to report refs-related errors. So I guess
this part should be covered.

At the current implementation, we change the "fsck.[ch]" for the
following three things:

1. Refactor "report" to use "fsck_vreport"
2. Create "fsck_report_ref" for refs check.
3. Do some simple renames to distinguish between refs and objects.

We do cover the second case in "git refs verify". But sadly, the first
case and third case are covered in "git-fsck(1)". So, "git refs verify"
does not exercise the refactored code.

However, I am a little confused. Actually, in the implementation, refs
check and objects check are independent.

I think we should wire up "git refs verify" to "git-fsck(1)". Because we
have no way to exercise the above case 1 and 3. If we do not, we will
bring a lot of complexity here.

> Thanks.
Patrick Steinhardt Aug. 6, 2024, 5:29 a.m. UTC | #7
On Mon, Aug 05, 2024 at 12:38:43PM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > I agree with you that it would be strange if we do not expose any
> > interfaces for user who are adventurous. Actually we may simply add an
> > option "--refs-experimental" or simply "--refs" to allow the users check
> > ref consistency by using "git-fsck(1)".
> >
> > I guess the concern that Patrick cares about is that we ONLY make refs
> > optional here, but do not provide options for other checks. It will be
> > strange from this perspective.
> 
> I do not care about strange all that much.  I however care about new
> complexity in the code, complexity that is not taken advantage of
> and is not exercised.  
> 
> You said
> 
> > From the development of this series, we can know the main problem is
> > that fsck error message is highly coupled with the object checks.
> 
> and even if it is true and we have problem in fsck code paths, we
> cannot see if _your_ solution to that problem is a good one without
> having the code that exercises your additional code.
> 
> But if "git refs verify" does exercise all the new code paths (and
> the refactored code that existed before this series, sitting now in
> different places), then I do not have to worry about it.  My question
> was primarily to extract "even though we do not wire this up to fsck,
> we already have another code paths that uses all these changes" out
> of you.

Yeah, there is quite some complexity introduced in fsck code, where most
of the complexity comes from making the reportin functions more generic.
And we do end up using that in this series even if we don't integrate
`git refs verify` with `git fsck`. It brings us the ability to configure
the checks performed by `git refs verify` in the same way like we can
configure all the other checks in `git fsck`, such that we can enable,
disable or change severity levels for each of the messages reported by
it.

I think that this is a sensible way to go about it, and it leaves us
with a more flexible and consistent error reporting infrastructure that
we can eventually also use for other commands that git-fsck(1) shells
out to.

Patrick
Patrick Steinhardt Aug. 6, 2024, 6:04 a.m. UTC | #8
On Tue, Aug 06, 2024 at 08:42:23AM +0800, shejialuo wrote:
> On Mon, Aug 05, 2024 at 12:38:43PM -0700, Junio C Hamano wrote:
> > shejialuo <shejialuo@gmail.com> writes:
> > 
> > > I agree with you that it would be strange if we do not expose any
> > > interfaces for user who are adventurous. Actually we may simply add an
> > > option "--refs-experimental" or simply "--refs" to allow the users check
> > > ref consistency by using "git-fsck(1)".
> > >
> > > I guess the concern that Patrick cares about is that we ONLY make refs
> > > optional here, but do not provide options for other checks. It will be
> > > strange from this perspective.
> > 
> > I do not care about strange all that much.  I however care about new
> > complexity in the code, complexity that is not taken advantage of
> > and is not exercised.  
> > 
> > You said
> > 
> > > From the development of this series, we can know the main problem is
> > > that fsck error message is highly coupled with the object checks.
> > 
> > and even if it is true and we have problem in fsck code paths, we
> > cannot see if _your_ solution to that problem is a good one without
> > having the code that exercises your additional code.
> > 
> > But if "git refs verify" does exercise all the new code paths (and
> > the refactored code that existed before this series, sitting now in
> > different places), then I do not have to worry about it.  My question
> > was primarily to extract "even though we do not wire this up to fsck,
> > we already have another code paths that uses all these changes" out
> > of you.
> > 
> 
> I understand what you mean here. I can say that "git refs verify" only
> exercises a part of the new code paths. The main reason why this series
> changes a lot of "fsck.[ch]" is that I want to expose the
> "fsck_report_ref" interface to report refs-related errors. So I guess
> this part should be covered.
> 
> At the current implementation, we change the "fsck.[ch]" for the
> following three things:
> 
> 1. Refactor "report" to use "fsck_vreport"
> 2. Create "fsck_report_ref" for refs check.
> 3. Do some simple renames to distinguish between refs and objects.
> 
> We do cover the second case in "git refs verify". But sadly, the first
> case and third case are covered in "git-fsck(1)". So, "git refs verify"
> does not exercise the refactored code.
> 
> However, I am a little confused. Actually, in the implementation, refs
> check and objects check are independent.
> 
> I think we should wire up "git refs verify" to "git-fsck(1)". Because we
> have no way to exercise the above case 1 and 3. If we do not, we will
> bring a lot of complexity here.

I don't think that is necessary. Basically, what you are saying is that
we seem to be testing only the new refs-related reporting that you have
introduced via your refactorings, but not the preexisting objects
related functionality.

That isn't true though. The object-specific reporting functions that you
have refactored already had callers before, and it still has callers now
because you adapted those accordingly. You can assume that those callers
already had tests before your refactorings, and as no tests broken you
can be reasonably sure that your refactorings are sound for the object
related code.

Furthermore, you do exercise the new ref-related parts via a couple of
tests that exercise `git refs verify`. Consequently, all parts of the
refactoring are covered by either old or new tests, and we should be
good here.

So even though we do not exercise the ref-related parts via git-fsck(1),
it is still subjected to tests. Eventually, when we start calling `git
refs verify` in git-fsck(1), we'd introduce more tests that verify that
the integration of those two commands works as expected, as well.

So, to summarize: the refactored functionality is both used and tested
and I think it's sensible to defer the integration of git-fsck(1) and
git-refs(1).

Thanks!

Patrick
Junio C Hamano Aug. 6, 2024, 3:54 p.m. UTC | #9
Patrick Steinhardt <ps@pks.im> writes:

>> > But if "git refs verify" does exercise all the new code paths (and
>> > the refactored code that existed before this series, sitting now in
>> > different places), then I do not have to worry about it.  My question
>> > was primarily to extract "even though we do not wire this up to fsck,
>> > we already have another code paths that uses all these changes" out
>> > of you.
>> ...
> So, to summarize: the refactored functionality is both used and tested
> and I think it's sensible to defer the integration of git-fsck(1) and
> git-refs(1).

After refactoring, existing functionality about objects are used, of
course, (there is no other code that does so), the refactoring lets
the code to learn to perform checks on references, and these new
checks are exercised by "git refs verify".

I took what shejialuo said that way, and that is fine by me when I
said the above ;-).  So I think we all are on the same page?

Thanks.
Patrick Steinhardt Aug. 7, 2024, 4:49 a.m. UTC | #10
On Tue, Aug 06, 2024 at 08:54:34AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> > But if "git refs verify" does exercise all the new code paths (and
> >> > the refactored code that existed before this series, sitting now in
> >> > different places), then I do not have to worry about it.  My question
> >> > was primarily to extract "even though we do not wire this up to fsck,
> >> > we already have another code paths that uses all these changes" out
> >> > of you.
> >> ...
> > So, to summarize: the refactored functionality is both used and tested
> > and I think it's sensible to defer the integration of git-fsck(1) and
> > git-refs(1).
> 
> After refactoring, existing functionality about objects are used, of
> course, (there is no other code that does so), the refactoring lets
> the code to learn to perform checks on references, and these new
> checks are exercised by "git refs verify".
> 
> I took what shejialuo said that way, and that is fine by me when I
> said the above ;-).  So I think we all are on the same page?

Yup, we are. I was mostly aiming to reassure Jialuo, who was a bit
uncertain whether his answers had been sufficient or not.

Thanks!

Patrick
diff mbox series

Patch

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 766bbd014d..b6ac878270 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -899,6 +899,21 @@  static int check_pack_rev_indexes(struct repository *r, int show_progress)
 	return res;
 }
 
+static void fsck_refs(void)
+{
+	struct child_process refs_verify = CHILD_PROCESS_INIT;
+	child_process_init(&refs_verify);
+	refs_verify.git_cmd = 1;
+	strvec_pushl(&refs_verify.args, "refs", "verify", NULL);
+	if (verbose)
+		strvec_push(&refs_verify.args, "--verbose");
+	if (check_strict)
+		strvec_push(&refs_verify.args, "--strict");
+
+	if (run_command(&refs_verify))
+		errors_found |= ERROR_REFS;
+}
+
 static char const * const fsck_usage[] = {
 	N_("git fsck [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]\n"
 	   "         [--[no-]full] [--strict] [--verbose] [--lost-found]\n"
@@ -1068,6 +1083,8 @@  int cmd_fsck(int argc, const char **argv, const char *prefix)
 
 	check_connectivity();
 
+	fsck_refs();
+
 	if (the_repository->settings.core_commit_graph) {
 		struct child_process commit_graph_verify = CHILD_PROCESS_INIT;