diff mbox series

[GSoC,v15,7/9] builtin/refs: add verify subcommand

Message ID ZrEB16WGGc28dxu2@ArchLinux (mailing list archive)
State Accepted
Commit bf061d26c7a595f1f46510aa584d50be1b1edb93
Headers show
Series [GSoC,v15,1/9] fsck: rename "skiplist" to "skip_oids" | expand

Commit Message

shejialuo Aug. 5, 2024, 4:46 p.m. UTC
Introduce a new subcommand "verify" in git-refs(1) to allow the user to
check the reference database consistency and also this subcommand will
be used as the entry point of checking refs for "git-fsck(1)".

Add "verbose" field into "fsck_options" to indicate whether we should
print verbose messages when checking refs and objects consistency.

Remove bit-field for "strict" field, this is because we cannot take
address of a bit-field which makes it unhandy to set member variables
when parsing the command line options.

The "git-fsck(1)" declares "fsck_options" variable with "static"
identifier which avoids complaint by the leak-checker. However, in
"git-refs verify", we need to do memory clean manually. Thus add
"fsck_options_clear" function in "fsck.c" to provide memory clean
operation.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/git-refs.txt | 13 +++++++++++++
 builtin/refs.c             | 34 ++++++++++++++++++++++++++++++++++
 fsck.c                     | 11 +++++++++++
 fsck.h                     |  8 +++++++-
 4 files changed, 65 insertions(+), 1 deletion(-)

Comments

Patrick Steinhardt Aug. 6, 2024, 7:32 a.m. UTC | #1
On Tue, Aug 06, 2024 at 12:46:15AM +0800, shejialuo wrote:
> @@ -58,15 +63,44 @@ static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
>  	return err;
>  }
>  
> +static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
> +{
> +	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
> +	const char * const verify_usage[] = {
> +		REFS_VERIFY_USAGE,
> +		NULL,
> +	};
> +	struct option options[] = {
> +		OPT_BOOL(0, "verbose", &fsck_refs_options.verbose, N_("be verbose")),
> +		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
> +		OPT_END(),
> +	};
> +	int ret;
> +
> +	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
> +	if (argc)
> +		usage(_("'git refs verify' takes no arguments"));

Junio has posted a patch series [1] where he wants to get rid of
messages that simply say "no arguments" or "too many arguments". I guess
we can play nice and also move into the same direction here, where we
instead tell the user which argument we didn't expect.

So I'd propose to make this:

    argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
    if (argc)
            usage(_("unknown argument: '%s'", argv[0]));

[1]: <20240806003539.3292562-1-gitster@pobox.com>

Patrick
Junio C Hamano Aug. 6, 2024, 4:15 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

>> +	if (argc)
>> +		usage(_("'git refs verify' takes no arguments"));
>
> Junio has posted a patch series [1] where he wants to get rid of
> messages that simply say "no arguments" or "too many arguments".
> ...
> So I'd propose to make this:
>
>     argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
>     if (argc)
>             usage(_("unknown argument: '%s'", argv[0]));

I probably should have said that I am fully behind the intent
against "too many arguments", but I am not 100% behind the
particular messaging used in the patch series I sent out.

One potential complaint I expected to hear, for example, was that "a
is unknown" given when you said "git cmd a a a a a" is not all that
clear ;-).  To alleviate, you would have to say "git cmd takes only
2 arguments" if 'a' you are complaining about is the third one.

Also, many people would consider that "unexpected argument" is
better than "unknown argument".

I personally think the message above is absolutely clear and good.

You say that 'git refs verify' takes no arguments, and for somebody
who said "git refs verify a b c d e", there is no doubt that all of
these a b c d e are unwanted.  And there is no room to misinterpret
the message as "'git refs' is ok but 'git refs verify' is already
unwelcome with extra argument", either [*].

In short, I think the message in the patch here is good, and it is
the other "war on 'too many arguments'" series whose messages need
to be thought further.


[Foornote]

 * ... which was the problem I was trying to address in the current
   message "too many arguments" that does not even say which early
   part of the command line we consider is "command" that was given
   "arguments"---to uninitiated who said "git refs verify foo", it
   is unclera if that's "git refs" command whose first argument is
   "verify", "git" command whose first two arguments are "refs
   verify", etc.
Patrick Steinhardt Aug. 7, 2024, 5:55 a.m. UTC | #3
On Tue, Aug 06, 2024 at 09:15:28AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> +	if (argc)
> >> +		usage(_("'git refs verify' takes no arguments"));
> >
> > Junio has posted a patch series [1] where he wants to get rid of
> > messages that simply say "no arguments" or "too many arguments".
> > ...
> > So I'd propose to make this:
> >
> >     argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
> >     if (argc)
> >             usage(_("unknown argument: '%s'", argv[0]));
> 
> I probably should have said that I am fully behind the intent
> against "too many arguments", but I am not 100% behind the
> particular messaging used in the patch series I sent out.
> 
> One potential complaint I expected to hear, for example, was that "a
> is unknown" given when you said "git cmd a a a a a" is not all that
> clear ;-).  To alleviate, you would have to say "git cmd takes only
> 2 arguments" if 'a' you are complaining about is the third one.
> 
> Also, many people would consider that "unexpected argument" is
> better than "unknown argument".
> 
> I personally think the message above is absolutely clear and good.
> 
> You say that 'git refs verify' takes no arguments, and for somebody
> who said "git refs verify a b c d e", there is no doubt that all of
> these a b c d e are unwanted.  And there is no room to misinterpret
> the message as "'git refs' is ok but 'git refs verify' is already
> unwelcome with extra argument", either [*].
> 
> In short, I think the message in the patch here is good, and it is
> the other "war on 'too many arguments'" series whose messages need
> to be thought further.

Just to clarify: with "the patch" you probably refer to the current
version that Jialuo has, right? In other words, keep the current version
that he has and adapt the message in the future, when we have decided
what to do about those "too many arguments" messages?

If so, then the only two I had were some spurious newlines. I'm not sure
whether these really would be worth rerolling the whole patch series.

Patrick
shejialuo Aug. 7, 2024, 12:50 p.m. UTC | #4
> If so, then the only two I had were some spurious newlines. I'm not sure
> whether these really would be worth rerolling the whole patch series.
> 

Karthik has given some reviews. I guess I need to reroll because there
is one typo error in commit message. It's important to make this fixed.

> Patrick
Junio C Hamano Aug. 7, 2024, 3:55 p.m. UTC | #5
Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Aug 06, 2024 at 09:15:28AM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> >> +	if (argc)
>> >> +		usage(_("'git refs verify' takes no arguments"));
>> > 
>> > Junio has posted a patch series [1] where he wants to get rid of
>> > messages that simply say "no arguments" or "too many arguments".
>> > ...
>> > So I'd propose to make this:
>> >
>> >     argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
>> >     if (argc)
>> >             usage(_("unknown argument: '%s'", argv[0]));
>> 
>> I probably should have said that I am fully behind the intent
>> against "too many arguments", but I am not 100% behind the
>> particular messaging used in the patch series I sent out.
>> 
>> One potential complaint I expected to hear, for example, was that "a
>> is unknown" given when you said "git cmd a a a a a" is not all that
>> clear ;-).  To alleviate, you would have to say "git cmd takes only
>> 2 arguments" if 'a' you are complaining about is the third one.
>> 
>> Also, many people would consider that "unexpected argument" is
>> better than "unknown argument".
>> 
>> I personally think the message above is absolutely clear and good.
>> 
>> You say that 'git refs verify' takes no arguments, and for somebody
>> who said "git refs verify a b c d e", there is no doubt that all of
>> these a b c d e are unwanted.  And there is no room to misinterpret
>> the message as "'git refs' is ok but 'git refs verify' is already
>> unwelcome with extra argument", either [*].
>> 
>> In short, I think the message in the patch here is good, and it is
>> the other "war on 'too many arguments'" series whose messages need
>> to be thought further.
>
> Just to clarify: with "the patch" you probably refer to the current
> version that Jialuo has, right? In other words, keep the current version
> that he has and adapt the message in the future, when we have decided
> what to do about those "too many arguments" messages?

I meant that I think (1) that "'git refs verify' takes no arguments"
is a good message, and (2) that there is no further change needed to
the patch that started this review thread, regardless of how we want
to deal with "too many arguments" messages.

> If so, then the only two I had were some spurious newlines. I'm not sure
> whether these really would be worth rerolling the whole patch series.

Yup, those blank lines were annoying while scanning the patches, but
they alone would not be something that makes a reroll _required_.  A
reroll that clearly shows that the incremental change since the last
round is only these blank line changes is not too much to process,
so "not worth the reviewer time" is not a huge reason to avoid it,
either.  I'd see that it is up to how perfectionist the patch
submitter wants to be ;-)

Thanks.
Karthik Nayak Aug. 8, 2024, 10:22 a.m. UTC | #6
shejialuo <shejialuo@gmail.com> writes:

>> If so, then the only two I had were some spurious newlines. I'm not sure
>> whether these really would be worth rerolling the whole patch series.
>>
>
> Karthik has given some reviews. I guess I need to reroll because there
> is one typo error in commit message. It's important to make this fixed.
>
>> Patrick

I would say, my nits by themselves don't require a re-roll, but if
you're re-rolling, it'd be nice to resolve them too :)
diff mbox series

Patch

diff --git a/Documentation/git-refs.txt b/Documentation/git-refs.txt
index 5b99e04385..ce31f93061 100644
--- a/Documentation/git-refs.txt
+++ b/Documentation/git-refs.txt
@@ -10,6 +10,7 @@  SYNOPSIS
 --------
 [verse]
 'git refs migrate' --ref-format=<format> [--dry-run]
+'git refs verify' [--strict] [--verbose]
 
 DESCRIPTION
 -----------
@@ -22,6 +23,9 @@  COMMANDS
 migrate::
 	Migrate ref store between different formats.
 
+verify::
+	Verify reference database consistency.
+
 OPTIONS
 -------
 
@@ -39,6 +43,15 @@  include::ref-storage-format.txt[]
 	can be used to double check that the migration works as expected before
 	performing the actual migration.
 
+The following options are specific to 'git refs verify':
+
+--strict::
+	Enable stricter error checking. This will cause warnings to be
+	reported as errors. See linkgit:git-fsck[1].
+
+--verbose::
+	When verifying the reference database consistency, be chatty.
+
 KNOWN LIMITATIONS
 -----------------
 
diff --git a/builtin/refs.c b/builtin/refs.c
index 46dcd150d4..131f98be98 100644
--- a/builtin/refs.c
+++ b/builtin/refs.c
@@ -1,4 +1,6 @@ 
 #include "builtin.h"
+#include "config.h"
+#include "fsck.h"
 #include "parse-options.h"
 #include "refs.h"
 #include "repository.h"
@@ -7,6 +9,9 @@ 
 #define REFS_MIGRATE_USAGE \
 	N_("git refs migrate --ref-format=<format> [--dry-run]")
 
+#define REFS_VERIFY_USAGE \
+	N_("git refs verify [--strict] [--verbose]")
+
 static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
 {
 	const char * const migrate_usage[] = {
@@ -58,15 +63,44 @@  static int cmd_refs_migrate(int argc, const char **argv, const char *prefix)
 	return err;
 }
 
+static int cmd_refs_verify(int argc, const char **argv, const char *prefix)
+{
+	struct fsck_options fsck_refs_options = FSCK_REFS_OPTIONS_DEFAULT;
+	const char * const verify_usage[] = {
+		REFS_VERIFY_USAGE,
+		NULL,
+	};
+	struct option options[] = {
+		OPT_BOOL(0, "verbose", &fsck_refs_options.verbose, N_("be verbose")),
+		OPT_BOOL(0, "strict", &fsck_refs_options.strict, N_("enable strict checking")),
+		OPT_END(),
+	};
+	int ret;
+
+	argc = parse_options(argc, argv, prefix, options, verify_usage, 0);
+	if (argc)
+		usage(_("'git refs verify' takes no arguments"));
+
+	git_config(git_fsck_config, &fsck_refs_options);
+	prepare_repo_settings(the_repository);
+
+	ret = refs_fsck(get_main_ref_store(the_repository), &fsck_refs_options);
+
+	fsck_options_clear(&fsck_refs_options);
+	return ret;
+}
+
 int cmd_refs(int argc, const char **argv, const char *prefix)
 {
 	const char * const refs_usage[] = {
 		REFS_MIGRATE_USAGE,
+		REFS_VERIFY_USAGE,
 		NULL,
 	};
 	parse_opt_subcommand_fn *fn = NULL;
 	struct option opts[] = {
 		OPT_SUBCOMMAND("migrate", &fn, cmd_refs_migrate),
+		OPT_SUBCOMMAND("verify", &fn, cmd_refs_verify),
 		OPT_END(),
 	};
 
diff --git a/fsck.c b/fsck.c
index 38554b626e..7eb5cdefdd 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1333,6 +1333,17 @@  int fsck_finish(struct fsck_options *options)
 	return ret;
 }
 
+void fsck_options_clear(struct fsck_options *options)
+{
+	free(options->msg_type);
+	oidset_clear(&options->skip_oids);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
+	oidset_clear(&options->gitattributes_found);
+	oidset_clear(&options->gitattributes_done);
+	kh_clear_oid_map(options->object_names);
+}
+
 int git_fsck_config(const char *var, const char *value,
 		    const struct config_context *ctx, void *cb)
 {
diff --git a/fsck.h b/fsck.h
index 2002590f60..d551a9fe86 100644
--- a/fsck.h
+++ b/fsck.h
@@ -153,7 +153,8 @@  struct fsck_ref_report {
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
-	unsigned strict:1;
+	unsigned strict;
+	unsigned verbose;
 	enum fsck_msg_type *msg_type;
 	struct oidset skip_oids;
 	struct oidset gitmodules_found;
@@ -231,6 +232,11 @@  int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
  */
 int fsck_finish(struct fsck_options *options);
 
+/*
+ * Clear the fsck_options struct, freeing any allocated memory.
+ */
+void fsck_options_clear(struct fsck_options *options);
+
 /*
  * Report an error or warning for refs.
  */