diff mbox series

[v2,8/8] builtin/fsck: add `git refs verify` child process

Message ID Z5r7NnzvirWEljwV@ArchLinux (mailing list archive)
State New
Headers show
Series add more ref consistency checks | expand

Commit Message

shejialuo Jan. 30, 2025, 4:08 a.m. UTC
At now, we have already implemented the ref consistency checks for both
"files-backend" and "packed-backend". Although we would check some
redundant things, it won't cause trouble. So, let's integrate it into
the "git-fsck(1)" command to get feedback from the users. And also by
calling "git refs verify" in "git-fsck(1)", we make sure that the new
added checks don't break.

Introduce a new function "fsck_refs" that initializes and runs a child
process to execute the "git refs verify" command. In order to provide
the user interface create a progress which makes the total task be 1.
It's hard to know how many loose refs we will check now. We might
improve this later.

And we run this function in the first execution sequence of
"git-fsck(1)" because we don't want the existing code of "git-fsck(1)"
which implicitly checks the consistency of refs to die the program.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 builtin/fsck.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

Junio C Hamano Jan. 30, 2025, 7:03 p.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> +static void fsck_refs(struct repository *r)
> +{
> +	struct child_process refs_verify = CHILD_PROCESS_INIT;
> +	struct progress *progress = NULL;
> +	uint64_t progress_num = 1;
> +
> +	if (show_progress)
> +		progress = start_progress(r, _("Checking ref database"),
> +					  progress_num);

I do not see why we need an extra variable progress_num here.  Just
passing a literal constant 1 should be sufficient.  The called
function has function prototype to help the compiler promite it to
the appropritate type.

Thanks.
shejialuo Jan. 31, 2025, 2:37 p.m. UTC | #2
On Thu, Jan 30, 2025 at 11:03:55AM -0800, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > +static void fsck_refs(struct repository *r)
> > +{
> > +	struct child_process refs_verify = CHILD_PROCESS_INIT;
> > +	struct progress *progress = NULL;
> > +	uint64_t progress_num = 1;
> > +
> > +	if (show_progress)
> > +		progress = start_progress(r, _("Checking ref database"),
> > +					  progress_num);
> 
> I do not see why we need an extra variable progress_num here.  Just
> passing a literal constant 1 should be sufficient.  The called
> function has function prototype to help the compiler promite it to
> the appropritate type.

You are correct, let me improve this in the next version.

Thanks,
Jialuo
Patrick Steinhardt Feb. 3, 2025, 8:40 a.m. UTC | #3
On Thu, Jan 30, 2025 at 12:08:22PM +0800, shejialuo wrote:
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 7a4dcb0716..9a8613d07f 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -905,6 +905,34 @@ static int check_pack_rev_indexes(struct repository *r, int show_progress)
>  	return res;
>  }
>  
> +static void fsck_refs(struct repository *r)
> +{
> +	struct child_process refs_verify = CHILD_PROCESS_INIT;
> +	struct progress *progress = NULL;
> +	uint64_t progress_num = 1;
> +
> +	if (show_progress)
> +		progress = start_progress(r, _("Checking ref database"),
> +					  progress_num);

Hm. I don't really think that this progress meter adds anything right
now. It only shows either 0 or 1, so it basically only tells you when
you're done. And that is something that the user can tell without a
progress meter.

> +
> +	if (verbose)
> +		fprintf_ln(stderr, _("Checking ref database"));
> +
> +	child_process_init(&refs_verify);
> +	refs_verify.git_cmd = 1;
> +	strvec_pushl(&refs_verify.args, "refs", "verify", NULL);
> +	if (verbose)
> +		strvec_push(&refs_verify.args, "--verbose");
> +	if (check_strict)
> +		strvec_push(&refs_verify.args, "--strict");
> +
> +	if (run_command(&refs_verify))
> +		errors_found |= ERROR_REFS;
> +
> +	display_progress(progress, 1);
> +	stop_progress(&progress);
> +}
> +
>  static char const * const fsck_usage[] = {
>  	N_("git fsck [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]\n"
>  	   "         [--[no-]full] [--strict] [--verbose] [--lost-found]\n"
> @@ -970,6 +998,8 @@ int cmd_fsck(int argc,
>  	git_config(git_fsck_config, &fsck_obj_options);
>  	prepare_repo_settings(the_repository);
>  
> +	fsck_refs(the_repository);

I think there needs to be a way to disable this. How about we add an
option `--[no-]references` to do so? I was briefly wondering whether we
also want to have `--only-references`, but if a user wants to do that
they can simply execute `git refs verify` directly.

Patrick
shejialuo Feb. 4, 2025, 5:32 a.m. UTC | #4
On Mon, Feb 03, 2025 at 09:40:43AM +0100, Patrick Steinhardt wrote:
> On Thu, Jan 30, 2025 at 12:08:22PM +0800, shejialuo wrote:
> > diff --git a/builtin/fsck.c b/builtin/fsck.c
> > index 7a4dcb0716..9a8613d07f 100644
> > --- a/builtin/fsck.c
> > +++ b/builtin/fsck.c
> > @@ -905,6 +905,34 @@ static int check_pack_rev_indexes(struct repository *r, int show_progress)
> >  	return res;
> >  }
> >  
> > +static void fsck_refs(struct repository *r)
> > +{
> > +	struct child_process refs_verify = CHILD_PROCESS_INIT;
> > +	struct progress *progress = NULL;
> > +	uint64_t progress_num = 1;
> > +
> > +	if (show_progress)
> > +		progress = start_progress(r, _("Checking ref database"),
> > +					  progress_num);
> 
> Hm. I don't really think that this progress meter adds anything right
> now. It only shows either 0 or 1, so it basically only tells you when
> you're done. And that is something that the user can tell without a
> progress meter.
> 

You are correct in the functionality part. Actually, my very initial
implementation is what you have said. I simply used the following way to
indicate the user that we are going to check ref database.

    fprintf_ln(stderr, _("Checking ref database"));

However, it will break a test in "t/t1050-large.sh::fsck large blobs". I
cite the shell script below:

	test_expect_success 'fsck large blobs' '
		git fsck 2>err &&
		test_must_be_empty err
	'

> > +
> > +	if (verbose)
> > +		fprintf_ln(stderr, _("Checking ref database"));
> > +

That's the reason why we need to use `verbose` to control the behavior
here. Put it futhermore, We either use `process` or `verbose` to print
the message to the user. This is a pattern widely used in "git-fsck(1)".
For example "builtin/fsck.c::fsck_object_dir", we have the following
code:

	if (verbose)
		fprintf_ln(stderr, _("Checking object directory"));

	if (show_progress)
		progress = start_progress(the_repository,
					  _("Checking object directories"), 256);

So, that's why I use progress here. We need this to print the
information to the user. I have also tried to print to the stdout like
the following

	fprintf_ln(stdout, _("Checking ref database"));

It will also break the test.

> > +	child_process_init(&refs_verify);
> > +	refs_verify.git_cmd = 1;
> > +	strvec_pushl(&refs_verify.args, "refs", "verify", NULL);
> > +	if (verbose)
> > +		strvec_push(&refs_verify.args, "--verbose");
> > +	if (check_strict)
> > +		strvec_push(&refs_verify.args, "--strict");
> > +
> > +	if (run_command(&refs_verify))
> > +		errors_found |= ERROR_REFS;
> > +
> > +	display_progress(progress, 1);
> > +	stop_progress(&progress);
> > +}
> > +
> >  static char const * const fsck_usage[] = {
> >  	N_("git fsck [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]\n"
> >  	   "         [--[no-]full] [--strict] [--verbose] [--lost-found]\n"
> > @@ -970,6 +998,8 @@ int cmd_fsck(int argc,
> >  	git_config(git_fsck_config, &fsck_obj_options);
> >  	prepare_repo_settings(the_repository);
> >  
> > +	fsck_refs(the_repository);
> 
> I think there needs to be a way to disable this. How about we add an
> option `--[no-]references` to do so? I was briefly wondering whether we
> also want to have `--only-references`, but if a user wants to do that
> they can simply execute `git refs verify` directly.
> 

Good idea, let me improve this in the next version.

Thanks,
Jialuo

> Patrick
diff mbox series

Patch

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 7a4dcb0716..9a8613d07f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -905,6 +905,34 @@  static int check_pack_rev_indexes(struct repository *r, int show_progress)
 	return res;
 }
 
+static void fsck_refs(struct repository *r)
+{
+	struct child_process refs_verify = CHILD_PROCESS_INIT;
+	struct progress *progress = NULL;
+	uint64_t progress_num = 1;
+
+	if (show_progress)
+		progress = start_progress(r, _("Checking ref database"),
+					  progress_num);
+
+	if (verbose)
+		fprintf_ln(stderr, _("Checking ref database"));
+
+	child_process_init(&refs_verify);
+	refs_verify.git_cmd = 1;
+	strvec_pushl(&refs_verify.args, "refs", "verify", NULL);
+	if (verbose)
+		strvec_push(&refs_verify.args, "--verbose");
+	if (check_strict)
+		strvec_push(&refs_verify.args, "--strict");
+
+	if (run_command(&refs_verify))
+		errors_found |= ERROR_REFS;
+
+	display_progress(progress, 1);
+	stop_progress(&progress);
+}
+
 static char const * const fsck_usage[] = {
 	N_("git fsck [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]\n"
 	   "         [--[no-]full] [--strict] [--verbose] [--lost-found]\n"
@@ -970,6 +998,8 @@  int cmd_fsck(int argc,
 	git_config(git_fsck_config, &fsck_obj_options);
 	prepare_repo_settings(the_repository);
 
+	fsck_refs(the_repository);
+
 	if (connectivity_only) {
 		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
 		for_each_packed_object(the_repository,