diff mbox series

[v5,5/9] ref: add basic symref content check for files backend

Message ID Zvj-vbvQym1R4KJk@ArchLinux (mailing list archive)
State New
Headers show
Series add ref content check for files backend | expand

Commit Message

shejialuo Sept. 29, 2024, 7:16 a.m. UTC
We have code that checks regular ref contents, but we do not yet check
the contents of symbolic refs. By using "parse_loose_ref_content" for
symbolic refs, we will get the information of the "referent".

We do not need to check the "referent" by opening the file. This is
because if "referent" exists in the file system, we will eventually
check its correctness by inspecting every file in the "refs" directory.
If the "referent" does not exist in the filesystem, this is OK as it is
seen as the dangling symref.

So we just need to check the "referent" string content. A regular could
be accepted as a textual symref if it begins with "ref:", followed by
zero or more whitespaces, followed by the full refname, followed only by
whitespace characters. However, we always write a single SP after "ref:"
and a single LF after the refname. It may seem that we should report a
fsck error message when the "referent" does not apply above rules and we
should not be so aggressive because third-party reimplementations of Git
may have taken advantage of the looser syntax. Put it more specific, we
accept the following "referent":

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

When introducing the regular ref content checks, we created a new fsck
message "unofficialFormattedRef" which exactly represents above
situation. So we will reuse this fsck message to write checks to info
the user about these situations.

But we do not allow any other trailing garbage. The followings are bad
symref contents which will be reported as fsck error by "git-fsck(1)".

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

And we introduce a new "badReferent(ERROR)" fsck message to report above
errors by using "ref.c::check_refname_format". But we cannot just pass
the "referent" to this function because the "referent" might contain
some whitespaces which will cause "check_refname_format" failing.

In order to add checks, we will do the following things:

1. Record the untrimmed length "orig_len" and untrimmed last byte
   "orig_last_byte".
2. Use "strbuf_rtrim" to trim the whitespaces or newlines to make sure
   "check_refname_format" won't be failed by them.
3. Use "orig_len" and "orig_last_byte" to check whether the "referent"
   misses '\n' at the end or it has trailing whitespaces or newlines.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 40 +++++++++++++++
 t/t0602-reffiles-fsck.sh      | 97 +++++++++++++++++++++++++++++++++++
 4 files changed, 141 insertions(+)

Comments

Karthik Nayak Oct. 8, 2024, 7:58 a.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> We have code that checks regular ref contents, but we do not yet check
> the contents of symbolic refs. By using "parse_loose_ref_content" for
> symbolic refs, we will get the information of the "referent".
>
> We do not need to check the "referent" by opening the file. This is
> because if "referent" exists in the file system, we will eventually
> check its correctness by inspecting every file in the "refs" directory.
> If the "referent" does not exist in the filesystem, this is OK as it is
> seen as the dangling symref.
>
> So we just need to check the "referent" string content. A regular could

seems like we're missing the noun here, a regular what?

> be accepted as a textual symref if it begins with "ref:", followed by
> zero or more whitespaces, followed by the full refname, followed only by
> whitespace characters. However, we always write a single SP after "ref:"
> and a single LF after the refname. It may seem that we should report a
> fsck error message when the "referent" does not apply above rules and we
> should not be so aggressive because third-party reimplementations of Git
> may have taken advantage of the looser syntax. Put it more specific, we
> accept the following "referent":
>
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
>
> When introducing the regular ref content checks, we created a new fsck
> message "unofficialFormattedRef" which exactly represents above
> situation. So we will reuse this fsck message to write checks to info
> the user about these situations.
>

Plus to what Patrick said in the previous commit, it would be nice to
separate these issues with different message IDs.

> But we do not allow any other trailing garbage. The followings are bad
> symref contents which will be reported as fsck error by "git-fsck(1)".
>
> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
>
> And we introduce a new "badReferent(ERROR)" fsck message to report above
> errors by using "ref.c::check_refname_format". But we cannot just pass
> the "referent" to this function because the "referent" might contain
> some whitespaces which will cause "check_refname_format" failing.
>

It would be nice if you could elaborate here, or rather restructure to
say something like..

    Since 'check_refname_format' doesn't work with whitespaces, we use
    the trimmed version of 'referent' with the function.

[snip]
shejialuo Oct. 8, 2024, 12:18 p.m. UTC | #2
On Tue, Oct 08, 2024 at 12:58:16AM -0700, Karthik Nayak wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We have code that checks regular ref contents, but we do not yet check
> > the contents of symbolic refs. By using "parse_loose_ref_content" for
> > symbolic refs, we will get the information of the "referent".
> >
> > We do not need to check the "referent" by opening the file. This is
> > because if "referent" exists in the file system, we will eventually
> > check its correctness by inspecting every file in the "refs" directory.
> > If the "referent" does not exist in the filesystem, this is OK as it is
> > seen as the dangling symref.
> >
> > So we just need to check the "referent" string content. A regular could
> 
> seems like we're missing the noun here, a regular what?
> 

It should be "a regular ref". I copied the original commit message and
may carelessly type "daw" in vim to delete the "ref". Thanks.
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index e310b5bce9..e0e4519334 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@ 
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferent`::
+	(ERROR) The referent of a ref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index 7420add5c0..979d75cb53 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@  enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b2a790c884..57ac466b64 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3508,6 +3508,43 @@  typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent)
+{
+	char orig_last_byte;
+	size_t orig_len;
+	int ret = 0;
+
+	orig_len = referent->len;
+	orig_last_byte = referent->buf[orig_len - 1];
+	strbuf_rtrim(referent);
+
+	if (check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT,
+				      "points to invalid refname '%s'", referent->buf);
+		goto out;
+	}
+
+
+	if (referent->len == orig_len ||
+	    (referent->len < orig_len && orig_last_byte != '\n')) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+				      "misses LF at the end");
+	}
+
+	if (referent->len != orig_len && referent->len != orig_len - 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_UNOFFICIAL_FORMATTED_REF,
+				      "has trailing whitespaces or newlines");
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
@@ -3559,6 +3596,9 @@  static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "has trailing garbage: '%s'", trailing);
 			goto cleanup;
 		}
+	} else {
+		ret = files_fsck_symref_target(o, &report, &referent);
+		goto cleanup;
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 2f5c4a1926..718f6abb71 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -273,4 +273,101 @@  test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline-1: unofficialFormattedRef: misses LF at the end
+	EOF
+	rm $branch_dir_prefix/branch-no-newline-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: has trailing whitespaces or newlines
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferent: points to invalid refname '\''refs/heads/.branch'\''
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferent: points to invalid refname '\''refs/heads/.branch'\''
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-complicated: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-1: unofficialFormattedRef: misses LF at the end
+	warning: refs/heads/a/b/branch-trailing-2: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/a/b/branch-trailing-3: unofficialFormattedRef: has trailing whitespaces or newlines
+	warning: refs/heads/branch-no-newline-1: unofficialFormattedRef: misses LF at the end
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done