diff mbox series

[v3,4/8] packed-backend: add "packed-refs" header consistency check

Message ID Z6RPqE64ScsjzUg7@ArchLinux (mailing list archive)
State Superseded
Headers show
Series add more ref consistency checks | expand

Commit Message

shejialuo Feb. 6, 2025, 5:59 a.m. UTC
In "packed-backend.c::create_snapshot", if there is a header (the line
which starts with '#'), we will check whether the line starts with "#
pack-refs with:". As we are going to implement the header consistency
check, we should port this check into "packed_fsck".

However, we need to consider other situations and discuss whether we
need to add checks.

1. If the header does not exist, we should not report an error to the
   user. This is because in older Git version, we never write header in
   the "packed-refs" file. Also, we do allow no header in "packed-refs"
   in runtime.
2. If the header content does not start with "# packed-ref with:", we
   should report an error just like what "create_snapshot" does. So,
   create a new fsck message "badPackedRefHeader(ERROR)" for this.
3. If the header content is not the same as the constant string
   "PACKED_REFS_HEADER". This is expected because we make it extensible
   intentionally. So, there is no need to report.

As we have analyzed, we only need to check the case 2 in the above. In
order to do this, read the "packed-refs" file via "strbuf_read". Like
what "create_snapshot" and other functions do, we could split the line
by finding the next newline in the buffer. When we cannot find a
newline, we could report an error.

So, create a function "packed_fsck_ref_next_line" to find the next
newline and if there is no such newline, use
"packedRefEntryNotTerminated(ERROR)" to report an error to the user.

Then, parse the first line to apply the checks. Update the test to
exercise the code.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  8 ++++
 fsck.h                        |  2 +
 refs/packed-backend.c         | 73 +++++++++++++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 25 ++++++++++++
 4 files changed, 108 insertions(+)

Comments

Patrick Steinhardt Feb. 12, 2025, 9:56 a.m. UTC | #1
On Thu, Feb 06, 2025 at 01:59:04PM +0800, shejialuo wrote:
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 6401cecd5f..683cfe78dc 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1749,12 +1749,76 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
> +static int packed_fsck_ref_header(struct fsck_options *o,
> +				  const char *start, const char *eol)
> +{
> +	if (!starts_with(start, "# pack-refs with:")) {
> +		struct fsck_ref_report report = { 0 };
> +		report.path = "packed-refs.header";
> +
> +		return fsck_report_ref(o, &report,
> +				       FSCK_MSG_BAD_PACKED_REF_HEADER,
> +				       "'%.*s' does not start with '# pack-refs with:'",
> +				       (int)(eol - start), start);
> +	}
> +
> +	return 0;
> +}

Okay. We still complain about bad headers, but only if there is a line
starting with "#" and only if the prefix doesn't match. This addresses
Junio's comment that packfiles don't have to have a header, and that
they may contain capabilities that we don't understand.

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 42c8d4ca1e..da321f16c6 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -639,4 +639,29 @@ test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' '
>  	)
>  '
>  
> +test_expect_success 'packed-refs header should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit default &&
> +
> +		git refs verify 2>err &&
> +		test_must_be_empty err &&
> +
> +		for bad_header in "# pack-refs wit: peeled fully-peeled sorted " \
> +				  "# pack-refs with traits: peeled fully-peeled sorted " \
> +				  "# pack-refs with a: peeled fully-peeled"

Instead of verifying thrice that we complain about bad header prefixes,
should we maybe replace two of these with instances where we check a
packed-refs file _without_ a header and one with capabilities that we
don't understand?

Patrick
shejialuo Feb. 12, 2025, 10:12 a.m. UTC | #2
On Wed, Feb 12, 2025 at 10:56:43AM +0100, Patrick Steinhardt wrote:

[snip]

> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > index 42c8d4ca1e..da321f16c6 100755
> > --- a/t/t0602-reffiles-fsck.sh
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -639,4 +639,29 @@ test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' '
> >  	)
> >  '
> >  
> > +test_expect_success 'packed-refs header should be checked' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		test_commit default &&
> > +
> > +		git refs verify 2>err &&
> > +		test_must_be_empty err &&
> > +
> > +		for bad_header in "# pack-refs wit: peeled fully-peeled sorted " \
> > +				  "# pack-refs with traits: peeled fully-peeled sorted " \
> > +				  "# pack-refs with a: peeled fully-peeled"
> 
> Instead of verifying thrice that we complain about bad header prefixes,
> should we maybe replace two of these with instances where we check a
> packed-refs file _without_ a header and one with capabilities that we
> don't understand?
> 

I think we could add some tests to verify that we won't complain about
above two cases where packed-refs file without a header and one with
capabilities that we don't understand.

> Patrick
Junio C Hamano Feb. 12, 2025, 5:48 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> On Thu, Feb 06, 2025 at 01:59:04PM +0800, shejialuo wrote:
>> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
>> index 6401cecd5f..683cfe78dc 100644
>> --- a/refs/packed-backend.c
>> +++ b/refs/packed-backend.c
>> @@ -1749,12 +1749,76 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
>> +static int packed_fsck_ref_header(struct fsck_options *o,
>> +				  const char *start, const char *eol)
>> +{
>> +	if (!starts_with(start, "# pack-refs with:")) {
>> +		struct fsck_ref_report report = { 0 };
>> +		report.path = "packed-refs.header";
>> +
>> +		return fsck_report_ref(o, &report,
>> +				       FSCK_MSG_BAD_PACKED_REF_HEADER,
>> +				       "'%.*s' does not start with '# pack-refs with:'",
>> +				       (int)(eol - start), start);
>> +	}
>> +
>> +	return 0;
>> +}
>
> Okay. We still complain about bad headers, but only if there is a line
> starting with "#" and only if the prefix doesn't match. This addresses
> Junio's comment that packfiles don't have to have a header, and that
> they may contain capabilities that we don't understand.

We'd want to also ensure that there is a single trailing whitespace
after that colon, which we have always written after "with:", no?

>> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
>> index 42c8d4ca1e..da321f16c6 100755
>> --- a/t/t0602-reffiles-fsck.sh
>> +++ b/t/t0602-reffiles-fsck.sh
>> @@ -639,4 +639,29 @@ test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' '
>>  	)
>>  '
>>  
>> +test_expect_success 'packed-refs header should be checked' '
>> +	test_when_finished "rm -rf repo" &&
>> +	git init repo &&
>> +	(
>> +		cd repo &&
>> +		test_commit default &&
>> +
>> +		git refs verify 2>err &&
>> +		test_must_be_empty err &&
>> +
>> +		for bad_header in "# pack-refs wit: peeled fully-peeled sorted " \
>> +				  "# pack-refs with traits: peeled fully-peeled sorted " \
>> +				  "# pack-refs with a: peeled fully-peeled"
>
> Instead of verifying thrice that we complain about bad header prefixes,
> should we maybe replace two of these with instances where we check a
> packed-refs file _without_ a header and one with capabilities that we
> don't understand?

Yup.  I also notice that refs/packed-backend.c:create_snapshot()
would accept "# pack-refs with:peeled" if I am not reading it
correctly, which is an unrelated bug.

Thanks.
shejialuo Feb. 14, 2025, 3:53 a.m. UTC | #4
On Wed, Feb 12, 2025 at 09:48:09AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > On Thu, Feb 06, 2025 at 01:59:04PM +0800, shejialuo wrote:
> >> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> >> index 6401cecd5f..683cfe78dc 100644
> >> --- a/refs/packed-backend.c
> >> +++ b/refs/packed-backend.c
> >> @@ -1749,12 +1749,76 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
> >> +static int packed_fsck_ref_header(struct fsck_options *o,
> >> +				  const char *start, const char *eol)
> >> +{
> >> +	if (!starts_with(start, "# pack-refs with:")) {
> >> +		struct fsck_ref_report report = { 0 };
> >> +		report.path = "packed-refs.header";
> >> +
> >> +		return fsck_report_ref(o, &report,
> >> +				       FSCK_MSG_BAD_PACKED_REF_HEADER,
> >> +				       "'%.*s' does not start with '# pack-refs with:'",
> >> +				       (int)(eol - start), start);
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >
> > Okay. We still complain about bad headers, but only if there is a line
> > starting with "#" and only if the prefix doesn't match. This addresses
> > Junio's comment that packfiles don't have to have a header, and that
> > they may contain capabilities that we don't understand.
> 
> We'd want to also ensure that there is a single trailing whitespace
> after that colon, which we have always written after "with:", no?
> 

As you have commented below, I don't add this check due to the reason
that "create_snapshot" method does _not_ check this.

> >> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> >> index 42c8d4ca1e..da321f16c6 100755
> >> --- a/t/t0602-reffiles-fsck.sh
> >> +++ b/t/t0602-reffiles-fsck.sh
> >> @@ -639,4 +639,29 @@ test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' '
> >>  	)
> >>  '
> >>  
> >> +test_expect_success 'packed-refs header should be checked' '
> >> +	test_when_finished "rm -rf repo" &&
> >> +	git init repo &&
> >> +	(
> >> +		cd repo &&
> >> +		test_commit default &&
> >> +
> >> +		git refs verify 2>err &&
> >> +		test_must_be_empty err &&
> >> +
> >> +		for bad_header in "# pack-refs wit: peeled fully-peeled sorted " \
> >> +				  "# pack-refs with traits: peeled fully-peeled sorted " \
> >> +				  "# pack-refs with a: peeled fully-peeled"
> >
> > Instead of verifying thrice that we complain about bad header prefixes,
> > should we maybe replace two of these with instances where we check a
> > packed-refs file _without_ a header and one with capabilities that we
> > don't understand?
> 
> Yup.  I also notice that refs/packed-backend.c:create_snapshot()
> would accept "# pack-refs with:peeled" if I am not reading it
> correctly, which is an unrelated bug.
> 

Yes, you are correct. Let me fix this in the next version.

Thanks,
Jialuo
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index b14bc44ca4..11906f90fd 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -16,6 +16,10 @@ 
 `badObjectSha1`::
 	(ERROR) An object has a bad sha1.
 
+`badPackedRefHeader`::
+	(ERROR) The "packed-refs" file contains an invalid
+	header.
+
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
@@ -176,6 +180,10 @@ 
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`packedRefEntryNotTerminated`::
+	(ERROR) The "packed-refs" file contains an entry that is
+	not terminated by a newline.
+
 `refMissingNewline`::
 	(INFO) A loose ref that does not end with newline(LF). As
 	valid implementations of Git never created such a loose ref
diff --git a/fsck.h b/fsck.h
index a44c231a5f..67e3c97bc0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -30,6 +30,7 @@  enum fsck_msg_type {
 	FUNC(BAD_EMAIL, ERROR) \
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PACKED_REF_HEADER, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
@@ -53,6 +54,7 @@  enum fsck_msg_type {
 	FUNC(MISSING_TYPE, ERROR) \
 	FUNC(MISSING_TYPE_ENTRY, ERROR) \
 	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
 	FUNC(TREE_NOT_SORTED, ERROR) \
 	FUNC(UNKNOWN_TYPE, ERROR) \
 	FUNC(ZERO_PADDED_DATE, ERROR) \
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 6401cecd5f..683cfe78dc 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1749,12 +1749,76 @@  static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 	return empty_ref_iterator_begin();
 }
 
+static int packed_fsck_ref_next_line(struct fsck_options *o,
+				     struct strbuf *packed_entry, const char *start,
+				     const char *eof, const char **eol)
+{
+	int ret = 0;
+
+	*eol = memchr(start, '\n', eof - start);
+	if (!*eol) {
+		struct fsck_ref_report report = { 0 };
+
+		report.path = packed_entry->buf;
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_PACKED_REF_ENTRY_NOT_TERMINATED,
+				      "'%.*s' is not terminated with a newline",
+				      (int)(eof - start), start);
+
+		/*
+		 * There is no newline but we still want to parse it to the end of
+		 * the buffer.
+		 */
+		*eol = eof;
+	}
+
+	return ret;
+}
+
+static int packed_fsck_ref_header(struct fsck_options *o,
+				  const char *start, const char *eol)
+{
+	if (!starts_with(start, "# pack-refs with:")) {
+		struct fsck_ref_report report = { 0 };
+		report.path = "packed-refs.header";
+
+		return fsck_report_ref(o, &report,
+				       FSCK_MSG_BAD_PACKED_REF_HEADER,
+				       "'%.*s' does not start with '# pack-refs with:'",
+				       (int)(eol - start), start);
+	}
+
+	return 0;
+}
+
+static int packed_fsck_ref_content(struct fsck_options *o,
+				   const char *start, const char *eof)
+{
+	struct strbuf packed_entry = STRBUF_INIT;
+	unsigned long line_number = 1;
+	const char *eol;
+	int ret = 0;
+
+	strbuf_addf(&packed_entry, "packed-refs line %lu", line_number);
+	ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
+	if (*start == '#') {
+		ret |= packed_fsck_ref_header(o, start, eol);
+
+		start = eol + 1;
+		line_number++;
+	}
+
+	strbuf_release(&packed_entry);
+	return ret;
+}
+
 static int packed_fsck(struct ref_store *ref_store,
 		       struct fsck_options *o,
 		       struct worktree *wt)
 {
 	struct packed_ref_store *refs = packed_downcast(ref_store,
 							REF_STORE_READ, "fsck");
+	struct strbuf packed_ref_content = STRBUF_INIT;
 	int ret = 0;
 	int fd;
 
@@ -1786,7 +1850,16 @@  static int packed_fsck(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
+	if (strbuf_read(&packed_ref_content, fd, 0) < 0) {
+		ret = error_errno(_("unable to read %s"), refs->path);
+		goto cleanup;
+	}
+
+	ret = packed_fsck_ref_content(o, packed_ref_content.buf,
+				      packed_ref_content.buf + packed_ref_content.len);
+
 cleanup:
+	strbuf_release(&packed_ref_content);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 42c8d4ca1e..da321f16c6 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -639,4 +639,29 @@  test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' '
 	)
 '
 
+test_expect_success 'packed-refs header should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit default &&
+
+		git refs verify 2>err &&
+		test_must_be_empty err &&
+
+		for bad_header in "# pack-refs wit: peeled fully-peeled sorted " \
+				  "# pack-refs with traits: peeled fully-peeled sorted " \
+				  "# pack-refs with a: peeled fully-peeled"
+		do
+			printf "%s\n" "$bad_header" >.git/packed-refs &&
+			test_must_fail git refs verify 2>err &&
+			cat >expect <<-EOF &&
+			error: packed-refs.header: badPackedRefHeader: '\''$bad_header'\'' does not start with '\''# pack-refs with:'\''
+			EOF
+			rm .git/packed-refs &&
+			test_cmp expect err || return 1
+		done
+	)
+'
+
 test_done