diff mbox series

[GSoC,v3,1/7] fsck: add refs check interfaces to interface with fsck error levels

Message ID ZnFCEYypdAyXMMlg@ArchLinux (mailing list archive)
State New, archived
Headers show
Series [GSoC,v3,1/7] fsck: add refs check interfaces to interface with fsck error levels | expand

Commit Message

shejialuo June 18, 2024, 8:15 a.m. UTC
The git-fsck(1) focuses on object database consistency check. It relies
on the "fsck_options" to interact with fsck error levels. However,
"fsck_options" aims at checking the object database which makes it
unsuitable to change the semantics of it. Instead, create
"fsck_refs_options" structure to handle refs consistency check.

The "git_fsck_config" sets up the "msg_type" and "skiplist" member of
the "fsck_options". For refs, we just need the "msg_type". In order to
allow setting up more refs-specific options easily later, add a separate
function "git_fsck_refs_config" to initialize the refs-specific options.

Move the "msg_type" and "strict" member to the top of the "fsck_options"
which allows us to convert "fsck_refs_options *" to "fsck_options *" to
reuse the interfaces provided by "fsck.h" without changing the original
code.

The static function "report" provided by "fsck.c" aims at reporting the
problems related to object database which cannot be reused for refs.
Provide "fsck_refs_report" function to integrate the fsck error levels
into reference consistency check.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 fsck.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fsck.h | 45 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 112 insertions(+), 2 deletions(-)

Comments

karthik nayak June 18, 2024, 8:38 a.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

[snip]

>  struct fsck_options {
> +	/*
> +	 * Reorder the fields to allow `fsck_ref_options` to use
> +	 * the interfaces using `struct fsck_options`.
> +	 */

Why is this added? It makes sense to have it in the commit message
because it talks about the change, but why make it persistent in the
code?

[snip]
shejialuo June 18, 2024, 8:46 a.m. UTC | #2
On Tue, Jun 18, 2024 at 04:38:07AM -0400, Karthik Nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> [snip]
> 
> >  struct fsck_options {
> > +	/*
> > +	 * Reorder the fields to allow `fsck_ref_options` to use
> > +	 * the interfaces using `struct fsck_options`.
> > +	 */
> 
> Why is this added? It makes sense to have it in the commit message
> because it talks about the change, but why make it persistent in the
> code?
> 

I explicitly add this comments due to the following reasons:

1. If someone needs to change the `fsck_options`, without this comment,
he might be just add some new fields at top. Although the change will
fail the tests here, I think we should mention this in code.
2. My later intention is that we should extract these common fields out
of the `fsck_options` and `fsck_ref_options`.

> [snip]
Junio C Hamano June 18, 2024, 3:25 p.m. UTC | #3
shejialuo <shejialuo@gmail.com> writes:

> On Tue, Jun 18, 2024 at 04:38:07AM -0400, Karthik Nayak wrote:
>> shejialuo <shejialuo@gmail.com> writes:
>> 
>> [snip]
>> 
>> >  struct fsck_options {
>> > +	/*
>> > +	 * Reorder the fields to allow `fsck_ref_options` to use
>> > +	 * the interfaces using `struct fsck_options`.
>> > +	 */
>> 
>> Why is this added? It makes sense to have it in the commit message
>> because it talks about the change, but why make it persistent in the
>> code?
>> 
>
> I explicitly add this comments due to the following reasons:
>
> 1. If someone needs to change the `fsck_options`, without this comment,
> he might be just add some new fields at top. Although the change will
> fail the tests here, I think we should mention this in code.

Do you mean you plan to take advantage of the fact that early
members of two structures are the same?  IOW, if there is a function
that takes a pointer to smaller fsck_refs_options, you plan to pass
a pointer to fsck_options from some callers, e.g.

    extern void func(struct fsck_refs_options *);
    void a_caller(struct fsck_options *o)
    {
	func((struct fsck_options *)o);
	...

If that is the case, then ...

Do not do that.

Your data structure design is broken.  Instead you would do this:

	struct fsck_options {
		struct fsck_refs_options refs;
		... other members ...
	};
	void a_caller(struct fsck_options *o)
	{
		func(&o->refs);
		...
shejialuo June 18, 2024, 3:47 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:
>
> shejialuo <shejialuo@gmail.com> writes:
>
> > On Tue, Jun 18, 2024 at 04:38:07AM -0400, Karthik Nayak wrote:
> >> shejialuo <shejialuo@gmail.com> writes:
> >>
> >> [snip]
> >>
> >> >  struct fsck_options {
> >> > +  /*
> >> > +   * Reorder the fields to allow `fsck_ref_options` to use
> >> > +   * the interfaces using `struct fsck_options`.
> >> > +   */
> >>
> >> Why is this added? It makes sense to have it in the commit message
> >> because it talks about the change, but why make it persistent in the
> >> code?
> >>
> >
> > I explicitly add this comments due to the following reasons:
> >
> > 1. If someone needs to change the `fsck_options`, without this comment,
> > he might be just add some new fields at top. Although the change will
> > fail the tests here, I think we should mention this in code.
>
> Do you mean you plan to take advantage of the fact that early
> members of two structures are the same?  IOW, if there is a function
> that takes a pointer to smaller fsck_refs_options, you plan to pass
> a pointer to fsck_options from some callers, e.g.
>
>     extern void func(struct fsck_refs_options *);
>     void a_caller(struct fsck_options *o)
>     {
>         func((struct fsck_options *)o);
>         ...
>

I do not want to convert "struct fsck_options*" to "struct fsck_refs_options*".
Instead, I want to convert "struct fsck_refs_options*" to "struct fsck_options*"
to reuse the functions which use the "struct fsck_options*" parameter.

Like the commit message said:

  Move the "msg_type" and "strict" member to the top of the "fsck_options"
  which allows us to convert "fsck_refs_options *" to "fsck_options *" to
  reuse the interfaces provided by "fsck.h" without changing the original
  code.

It may seem we should add some fields into `fsck_options`, but I don't think
it's a good idea. I will elaborate my design here:

The git-fsck(1) is highly relevant with the object database. I don't want to add
some new fields into "fsck_options" due to the following reason:

The fields in fsck_options are all related to objects except "fsck_msg_type"
and "strict". Adding some ref-related fields into "fsck_options" will break the
semantics.

Actually, it may be perfect that I may abstract some interfaces here, however,
it would be too complicated for this patch, many functions use "fsck_options *"
as the parameter. I think we may do this later.
shejialuo June 19, 2024, 2:39 a.m. UTC | #5
On Tue, Jun 18, 2024 at 08:25:34AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > On Tue, Jun 18, 2024 at 04:38:07AM -0400, Karthik Nayak wrote:
> >> shejialuo <shejialuo@gmail.com> writes:
> >> 
> >> [snip]
> >> 
> >> >  struct fsck_options {
> >> > +	/*
> >> > +	 * Reorder the fields to allow `fsck_ref_options` to use
> >> > +	 * the interfaces using `struct fsck_options`.
> >> > +	 */
> >> 
> >> Why is this added? It makes sense to have it in the commit message
> >> because it talks about the change, but why make it persistent in the
> >> code?
> >> 
> >
> > I explicitly add this comments due to the following reasons:
> >
> > 1. If someone needs to change the `fsck_options`, without this comment,
> > he might be just add some new fields at top. Although the change will
> > fail the tests here, I think we should mention this in code.
> 
> Do you mean you plan to take advantage of the fact that early
> members of two structures are the same?  IOW, if there is a function
> that takes a pointer to smaller fsck_refs_options, you plan to pass
> a pointer to fsck_options from some callers, e.g.
> 
>     extern void func(struct fsck_refs_options *);
>     void a_caller(struct fsck_options *o)
>     {
> 	func((struct fsck_options *)o);
> 	...
> 
> If that is the case, then ...
> 
> Do not do that.
> 
> Your data structure design is broken.  Instead you would do this:
> 
> 	struct fsck_options {
> 		struct fsck_refs_options refs;
> 		... other members ...
> 	};
> 	void a_caller(struct fsck_options *o)
> 	{
> 		func(&o->refs);
> 		...

Well, I totally agree with this. It's bad to convert the pointer type. I
will change the code in this patch to make it OK.

Thanks for your advice.
diff mbox series

Patch

diff --git a/fsck.c b/fsck.c
index 8ef962199f..13528c646e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1249,6 +1249,20 @@  int fsck_buffer(const struct object_id *oid, enum object_type type,
 		      type);
 }
 
+int fsck_refs_error_function(struct fsck_refs_options *o UNUSED,
+			     const char *name,
+			     enum fsck_msg_type msg_type,
+			     enum fsck_msg_id msg_id UNUSED,
+			     const char *message)
+{
+	if (msg_type == FSCK_WARN) {
+		warning("%s: %s", name, message);
+		return 0;
+	}
+	error("%s: %s", name, message);
+	return 1;
+}
+
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type UNUSED,
@@ -1323,6 +1337,61 @@  int fsck_finish(struct fsck_options *options)
 	return ret;
 }
 
+int fsck_refs_report(struct fsck_refs_options *o,
+		     const char *name,
+		     enum fsck_msg_id msg_id,
+		     const char *fmt, ...)
+{
+	va_list ap;
+	struct strbuf sb = STRBUF_INIT;
+	enum fsck_msg_type msg_type =
+		fsck_msg_type(msg_id, (struct fsck_options*)o);
+	int ret = 0;
+
+	if (msg_type == FSCK_IGNORE)
+		return 0;
+
+	if (msg_type == FSCK_FATAL)
+		msg_type = FSCK_ERROR;
+	else if (msg_type == FSCK_INFO)
+		msg_type = FSCK_WARN;
+
+	prepare_msg_ids();
+	strbuf_addf(&sb, "%s: ", msg_id_info[msg_id].camelcased);
+
+	va_start(ap, fmt);
+	strbuf_vaddf(&sb, fmt, ap);
+	ret = o->error_func(o, name, msg_type, msg_id, sb.buf);
+	strbuf_release(&sb);
+	va_end(ap);
+
+	return ret;
+}
+
+int git_fsck_refs_config(const char *var, const char *value,
+			 const struct config_context *ctx, void *cb)
+{
+	struct fsck_refs_options *options = cb;
+	const char *msg_id;
+
+	/*
+	 * We don't check the value of fsck.skiplist here, because it
+	 * is specific to object database, not reference database.
+	 */
+	if (strcmp(var, "fsck.skiplist") == 0) {
+		return 0;
+	}
+
+	if (skip_prefix(var, "fsck.", &msg_id)) {
+		if (!value)
+			return config_error_nonbool(var);
+		fsck_set_msg_type((struct fsck_options*)options, msg_id, value);
+		return 0;
+	}
+
+	return git_default_config(var, value, ctx, cb);
+}
+
 int git_fsck_config(const char *var, const char *value,
 		    const struct config_context *ctx, void *cb)
 {
diff --git a/fsck.h b/fsck.h
index 17fa2dda5d..6a38ac4a16 100644
--- a/fsck.h
+++ b/fsck.h
@@ -96,6 +96,7 @@  enum fsck_msg_id {
 };
 #undef MSG_ID
 
+struct fsck_refs_options;
 struct fsck_options;
 struct object;
 
@@ -107,6 +108,21 @@  void fsck_set_msg_type(struct fsck_options *options,
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
+/*
+ * callback function for fsck refs and reflogs.
+ */
+typedef int (*fsck_refs_error)(struct fsck_refs_options *o,
+			       const char *name,
+			       enum fsck_msg_type msg_type,
+			       enum fsck_msg_id msg_id,
+			       const char *message);
+
+int fsck_refs_error_function(struct fsck_refs_options *o,
+			     const char *name,
+			     enum fsck_msg_type msg_type,
+			     enum fsck_msg_id msg_id,
+			     const char *message);
+
 /*
  * callback function for fsck_walk
  * type is the expected type of the object or OBJ_ANY
@@ -135,11 +151,28 @@  int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
 					   enum fsck_msg_id msg_id,
 					   const char *message);
 
+struct fsck_refs_options {
+	enum fsck_msg_type *msg_type;
+	unsigned strict:1;
+
+	fsck_refs_error error_func;
+	unsigned verbose:1;
+};
+
+#define FSCK_REFS_OPTIONS_DEFAULT { \
+	.error_func = fsck_refs_error_function, \
+}
+
 struct fsck_options {
+	/*
+	 * Reorder the fields to allow `fsck_ref_options` to use
+	 * the interfaces using `struct fsck_options`.
+	 */
+	enum fsck_msg_type *msg_type;
+	unsigned strict:1;
+
 	fsck_walk_func walk;
 	fsck_error error_func;
-	unsigned strict:1;
-	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	struct oidset gitmodules_found;
 	struct oidset gitmodules_done;
@@ -221,6 +254,12 @@  int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
  */
 int fsck_finish(struct fsck_options *options);
 
+__attribute__((format (printf, 4, 5)))
+int fsck_refs_report(struct fsck_refs_options *o,
+		     const char *name,
+		     enum fsck_msg_id msg_id,
+		     const char *fmt, ...);
+
 /*
  * Subsystem for storing human-readable names for each object.
  *
@@ -247,6 +286,8 @@  const char *fsck_describe_object(struct fsck_options *options,
 				 const struct object_id *oid);
 
 struct key_value_info;
+int git_fsck_refs_config(const char *var, const char *value,
+			 const struct config_context *ctx, void *cb);
 /*
  * git_config() callback for use by fsck-y tools that want to support
  * fsck.<msg> fsck.skipList etc.