diff mbox series

[v2,6/8] packed-backend: add "packed-refs" entry consistency check

Message ID Z5r7Hlk_VS0jYU74@ArchLinux (mailing list archive)
State New
Headers show
Series add more ref consistency checks | expand

Commit Message

shejialuo Jan. 30, 2025, 4:07 a.m. UTC
"packed-backend.c::next_record" will parse the ref entry to check the
consistency. This function has already checked the following things:

1. Parse the main line of the ref entry, if the oid is not correct. It
   will die the program. And then it will check whether the next
   character of the oid is space. Then it will check whether the refname
   is correct.
2. If the next line starts with '^', it will continue to parse the oid
   of the peeled oid content and check whether the last character is
   '\n'.

We can iterate each line by using the "packed_fsck_ref_next_line"
function. Then, create a new fsck message "badPackedRefEntry(ERROR)" to
report to the user when something is wrong.

Create two new functions "packed_fsck_ref_main_line" and
"packed_fsck_ref_peeled_line" for case 1 and case 2 respectively. Last,
update the unit test to exercise the code.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/packed-backend.c         | 98 ++++++++++++++++++++++++++++++++++-
 t/t0602-reffiles-fsck.sh      | 42 +++++++++++++++
 4 files changed, 143 insertions(+), 1 deletion(-)

Comments

Patrick Steinhardt Feb. 3, 2025, 8:40 a.m. UTC | #1
On Thu, Jan 30, 2025 at 12:07:58PM +0800, shejialuo wrote:
> "packed-backend.c::next_record" will parse the ref entry to check the
> consistency. This function has already checked the following things:
> 
> 1. Parse the main line of the ref entry, if the oid is not correct. It
>    will die the program. And then it will check whether the next
>    character of the oid is space. Then it will check whether the refname
>    is correct.
> 2. If the next line starts with '^', it will continue to parse the oid
>    of the peeled oid content and check whether the last character is
>    '\n'.
> 
> We can iterate each line by using the "packed_fsck_ref_next_line"
> function. Then, create a new fsck message "badPackedRefEntry(ERROR)" to
> report to the user when something is wrong.
> 
> Create two new functions "packed_fsck_ref_main_line" and
> "packed_fsck_ref_peeled_line" for case 1 and case 2 respectively. Last,
> update the unit test to exercise the code.

I think this message is going into too much detail about _how_ you are
doing things compared to _what_ you are doing and what the intent is.

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 870c8e7aaa..271c740728 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1819,10 +1819,86 @@ static int packed_fsck_ref_header(struct fsck_options *o, const char *start, con
>  	return 0;
>  }
>  
> +static int packed_fsck_ref_peeled_line(struct fsck_options *o,
> +				       struct ref_store *ref_store,
> +				       struct strbuf *packed_entry,
> +				       const char *start, const char *eol)
> +{
> +	struct fsck_ref_report report = { 0 };
> +	struct object_id peeled;
> +	const char *p;
> +
> +	report.path = packed_entry->buf;
> +
> +	start++;

It's a bit weird that we increment `start` here, as it is very intimate
with how the caller calls us. Might be easier to reason about when the
caller did this for us.

> +	if (parse_oid_hex_algop(start, &peeled, &p, ref_store->repo->hash_algo)) {
> +		return fsck_report_ref(o, &report,
> +				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
> +				       "'%.*s' has invalid peeled oid",
> +				       (int)(eol - start), start);
> +	}

All the braces around those single-line return statements can go away.

> @@ -1843,6 +1919,26 @@ static int packed_fsck_ref_content(struct fsck_options *o,
>  				       "missing header line");
>  	}
>  
> +	while (start < eof) {
> +		strbuf_reset(&packed_entry);
> +		strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
> +		ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
> +		ret |= packed_fsck_ref_main_line(o, ref_store, &packed_entry, &refname, start, eol);

Don't we have to stop in case `next_line()` returns an error?

Patrick
shejialuo Feb. 4, 2025, 4:28 a.m. UTC | #2
On Mon, Feb 03, 2025 at 09:40:25AM +0100, Patrick Steinhardt wrote:
> On Thu, Jan 30, 2025 at 12:07:58PM +0800, shejialuo wrote:
> > "packed-backend.c::next_record" will parse the ref entry to check the
> > consistency. This function has already checked the following things:
> > 
> > 1. Parse the main line of the ref entry, if the oid is not correct. It
> >    will die the program. And then it will check whether the next
> >    character of the oid is space. Then it will check whether the refname
> >    is correct.
> > 2. If the next line starts with '^', it will continue to parse the oid
> >    of the peeled oid content and check whether the last character is
> >    '\n'.
> > 
> > We can iterate each line by using the "packed_fsck_ref_next_line"
> > function. Then, create a new fsck message "badPackedRefEntry(ERROR)" to
> > report to the user when something is wrong.
> > 
> > Create two new functions "packed_fsck_ref_main_line" and
> > "packed_fsck_ref_peeled_line" for case 1 and case 2 respectively. Last,
> > update the unit test to exercise the code.
> 
> I think this message is going into too much detail about _how_ you are
> doing things compared to _what_ you are doing and what the intent is.
> 

I think I have caused some confusion here. The reason why I mention what
"next_record" does is that I want to port these two checks. Let me
improve this in the next version. I will highlight more about the
motivation.

> > diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> > index 870c8e7aaa..271c740728 100644
> > --- a/refs/packed-backend.c
> > +++ b/refs/packed-backend.c
> > @@ -1819,10 +1819,86 @@ static int packed_fsck_ref_header(struct fsck_options *o, const char *start, con
> >  	return 0;
> >  }
> >  
> > +static int packed_fsck_ref_peeled_line(struct fsck_options *o,
> > +				       struct ref_store *ref_store,
> > +				       struct strbuf *packed_entry,
> > +				       const char *start, const char *eol)
> > +{
> > +	struct fsck_ref_report report = { 0 };
> > +	struct object_id peeled;
> > +	const char *p;
> > +
> > +	report.path = packed_entry->buf;
> > +
> > +	start++;
> 
> It's a bit weird that we increment `start` here, as it is very intimate
> with how the caller calls us. Might be easier to reason about when the
> caller did this for us.
> 

For each ref entry, we have two pointers, one is the `start` which is
used to indicate the start of the line and `eol` is the end of the line.

Let's see how we call this function:

		if (start < eof && *start == '^') {
			strbuf_reset(&packed_entry);
			strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
			ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
			ret |= packed_fsck_ref_peeled_line(o, ref_store, &packed_entry,
							   start, eol);
			start = eol + 1;
			line_number++;
		}

The reason why we do this is that we need to skip the '^' character. I
don't do this in the `if` statement. This is because I want to make the
semantics of the `start` variable unchanged.

I would add a comment here to explain why we need to execute "start++".

> > +	if (parse_oid_hex_algop(start, &peeled, &p, ref_store->repo->hash_algo)) {
> > +		return fsck_report_ref(o, &report,
> > +				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
> > +				       "'%.*s' has invalid peeled oid",
> > +				       (int)(eol - start), start);
> > +	}
> 
> All the braces around those single-line return statements can go away.
> 

I see. So, I have misunderstanding here. I have thought that we should
add braces because we have split this single statement into multiple
lines. Let me update this in the next version.

> > @@ -1843,6 +1919,26 @@ static int packed_fsck_ref_content(struct fsck_options *o,
> >  				       "missing header line");
> >  	}
> >  
> > +	while (start < eof) {
> > +		strbuf_reset(&packed_entry);
> > +		strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
> > +		ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
> > +		ret |= packed_fsck_ref_main_line(o, ref_store, &packed_entry, &refname, start, eol);
> 
> Don't we have to stop in case `next_line()` returns an error?
> 

No, we don't have to stop. We will continue to check the last ref entry,
this is intentional, we still need to check the last ref entry even
though there is no newline. I don't think we should ignore this part.

Thanks,
Jialuo
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 34375a3143..2a7ec7592e 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -16,6 +16,9 @@ 
 `badObjectSha1`::
 	(ERROR) An object has a bad sha1.
 
+`badPackedRefEntry`::
+	(ERROR) The "packed-refs" file contains an invalid entry.
+
 `badPackedRefHeader`::
 	(ERROR) The "packed-refs" file contains an invalid
 	header.
diff --git a/fsck.h b/fsck.h
index 3107a0093d..40126242a4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -30,6 +30,7 @@  enum fsck_msg_type {
 	FUNC(BAD_EMAIL, ERROR) \
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
 	FUNC(BAD_PACKED_REF_HEADER, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
 	FUNC(BAD_REF_CONTENT, ERROR) \
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 870c8e7aaa..271c740728 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1819,10 +1819,86 @@  static int packed_fsck_ref_header(struct fsck_options *o, const char *start, con
 	return 0;
 }
 
+static int packed_fsck_ref_peeled_line(struct fsck_options *o,
+				       struct ref_store *ref_store,
+				       struct strbuf *packed_entry,
+				       const char *start, const char *eol)
+{
+	struct fsck_ref_report report = { 0 };
+	struct object_id peeled;
+	const char *p;
+
+	report.path = packed_entry->buf;
+
+	start++;
+	if (parse_oid_hex_algop(start, &peeled, &p, ref_store->repo->hash_algo)) {
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "'%.*s' has invalid peeled oid",
+				       (int)(eol - start), start);
+	}
+
+	if (p != eol) {
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "has trailing garbage after peeled oid '%.*s'",
+				       (int)(eol - p), p);
+	}
+
+	return 0;
+}
+
+static int packed_fsck_ref_main_line(struct fsck_options *o,
+				     struct ref_store *ref_store,
+				     struct strbuf *packed_entry,
+				     struct strbuf *refname,
+				     const char *start, const char *eol)
+{
+	struct fsck_ref_report report = { 0 };
+	struct object_id oid;
+	const char *p;
+
+	report.path = packed_entry->buf;
+
+	if (parse_oid_hex_algop(start, &oid, &p, ref_store->repo->hash_algo)) {
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "'%.*s' has invalid oid",
+				       (int)(eol - start), start);
+	}
+
+	if (p == eol || !isspace(*p)) {
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "has no space after oid '%s' but with '%.*s'",
+				       oid_to_hex(&oid), (int)(eol - p), p);
+	}
+
+	p++;
+	strbuf_reset(refname);
+	strbuf_add(refname, p, eol - p);
+	if (refname_contains_nul(refname)) {
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_ENTRY,
+				       "refname '%s' contains NULL binaries",
+				       refname->buf);
+	}
+
+	if (check_refname_format(refname->buf, 0)) {
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_REF_NAME,
+				       "has bad refname '%s'", refname->buf);
+	}
+
+	return 0;
+}
+
 static int packed_fsck_ref_content(struct fsck_options *o,
+				   struct ref_store *ref_store,
 				   const char *start, const char *eof)
 {
 	struct strbuf packed_entry = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
 	int line_number = 1;
 	const char *eol;
 	int ret = 0;
@@ -1843,6 +1919,26 @@  static int packed_fsck_ref_content(struct fsck_options *o,
 				       "missing header line");
 	}
 
+	while (start < eof) {
+		strbuf_reset(&packed_entry);
+		strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
+		ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
+		ret |= packed_fsck_ref_main_line(o, ref_store, &packed_entry, &refname, start, eol);
+		start = eol + 1;
+		line_number++;
+		if (start < eof && *start == '^') {
+			strbuf_reset(&packed_entry);
+			strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
+			ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
+			ret |= packed_fsck_ref_peeled_line(o, ref_store, &packed_entry,
+							   start, eol);
+			start = eol + 1;
+			line_number++;
+		}
+	}
+
+	strbuf_release(&packed_entry);
+	strbuf_release(&refname);
 	strbuf_release(&packed_entry);
 	return ret;
 }
@@ -1890,7 +1986,7 @@  static int packed_fsck(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
-	ret = packed_fsck_ref_content(o, packed_ref_content.buf,
+	ret = packed_fsck_ref_content(o, ref_store, packed_ref_content.buf,
 				      packed_ref_content.buf + packed_ref_content.len);
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index a7b46b6cb9..e4b4a58684 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -685,4 +685,46 @@  test_expect_success 'packed-refs header should be checked' '
 	)
 '
 
+test_expect_success 'packed-refs content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit default &&
+		git branch branch-1 &&
+		git branch branch-2 &&
+		git tag -a annotated-tag-1 -m tag-1 &&
+		git tag -a annotated-tag-2 -m tag-2 &&
+
+		branch_1_oid=$(git rev-parse branch-1) &&
+		branch_2_oid=$(git rev-parse branch-2) &&
+		tag_1_oid=$(git rev-parse annotated-tag-1) &&
+		tag_2_oid=$(git rev-parse annotated-tag-2) &&
+		tag_1_peeled_oid=$(git rev-parse annotated-tag-1^{}) &&
+		tag_2_peeled_oid=$(git rev-parse annotated-tag-2^{}) &&
+		short_oid=$(printf "%s" $tag_1_peeled_oid | cut -c 1-4) &&
+
+		printf "# pack-refs with: peeled fully-peeled sorted \n"  >.git/packed-refs &&
+		printf "%s\n" "$short_oid refs/heads/branch-1" >>.git/packed-refs &&
+		printf "%sx\n" "$branch_1_oid" >>.git/packed-refs &&
+		printf "%s   refs/heads/bad-branch\n" "$branch_2_oid" >>.git/packed-refs &&
+		printf "%s refs/heads/branch.\n" "$branch_2_oid" >>.git/packed-refs &&
+		printf "%s refs/tags/annotated-tag-3\n" "$tag_1_oid" >>.git/packed-refs &&
+		printf "^%s\n" "$short_oid" >>.git/packed-refs &&
+		printf "%s refs/tags/annotated-tag-4.\n" "$tag_2_oid" >>.git/packed-refs &&
+		printf "^%s garbage\n" "$tag_2_peeled_oid" >>.git/packed-refs &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: packed-refs line 2: badPackedRefEntry: '\''$short_oid refs/heads/branch-1'\'' has invalid oid
+		error: packed-refs line 3: badPackedRefEntry: has no space after oid '\''$branch_1_oid'\'' but with '\''x'\''
+		error: packed-refs line 4: badRefName: has bad refname '\''  refs/heads/bad-branch'\''
+		error: packed-refs line 5: badRefName: has bad refname '\''refs/heads/branch.'\''
+		error: packed-refs line 7: badPackedRefEntry: '\''$short_oid'\'' has invalid peeled oid
+		error: packed-refs line 8: badRefName: has bad refname '\''refs/tags/annotated-tag-4.'\''
+		error: packed-refs line 9: badPackedRefEntry: has trailing garbage after peeled oid '\'' garbage'\''
+		EOF
+		test_cmp expect err
+	)
+'
+
 test_done