diff mbox series

[v1,3/4] ref: add symbolic ref content check for files backend

Message ID ZsIM4OZWfylcP5Ix@ArchLinux (mailing list archive)
State Superseded
Headers show
Series add ref content check for files backend | expand

Commit Message

shejialuo Aug. 18, 2024, 3:01 p.m. UTC
We have already introduced the checks for regular refs. There is no need
to check the consistency of the target which the symbolic ref points to.
Instead, we just check the content of the symbolic ref itself.

In order to check the content of the symbolic ref, create a function
"files_fsck_symref_target". It will first check whether the "pointee" is
under the "refs/" directory and then we will check the "pointee" itself.

There is no specification about the content of the symbolic ref.
Although we do write "ref: %s\n" to create a symbolic ref by using
"git-symbolic-ref(1)" command. However, this is not mandatory. We still
accept symbolic refs with null trailing garbage. Put it more specific,
the following are correct:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

But we do not allow any non-null trailing garbage. The following are bad
symbolic contents.

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

In order to provide above checks, we will traverse the "pointee" to
report the user whether this is null-garbage or no newline. And if
symbolic refs contain non-null garbage, we will report
"FSCK_MSG_BAD_REF_CONTENT" to the user.

Then, we will check the name of the "pointee" is correct by using
"check_refname_format". And then if we can access the "pointee_path" in
the file system, we should ensure that the file type is correct.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 87 +++++++++++++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 52 +++++++++++++++++++++
 4 files changed, 143 insertions(+)

Comments

Patrick Steinhardt Aug. 22, 2024, 8:53 a.m. UTC | #1
On Sun, Aug 18, 2024 at 11:01:52PM +0800, shejialuo wrote:
> We have already introduced the checks for regular refs. There is no need
> to check the consistency of the target which the symbolic ref points to.
> Instead, we just check the content of the symbolic ref itself.
> 
> In order to check the content of the symbolic ref, create a function
> "files_fsck_symref_target". It will first check whether the "pointee" is
> under the "refs/" directory and then we will check the "pointee" itself.
> 
> There is no specification about the content of the symbolic ref.
> Although we do write "ref: %s\n" to create a symbolic ref by using
> "git-symbolic-ref(1)" command. However, this is not mandatory. We still
> accept symbolic refs with null trailing garbage. Put it more specific,
> the following are correct:
> 
> 1. "ref: refs/heads/master   "
> 2. "ref: refs/heads/master   \n  \n"
> 3. "ref: refs/heads/master\n\n"
> 
> But we do not allow any non-null trailing garbage. The following are bad
> symbolic contents.
> 
> 1. "ref: refs/heads/master garbage\n"
> 2. "ref: refs/heads/master \n\n\n garbage  "
> 
> In order to provide above checks, we will traverse the "pointee" to
> report the user whether this is null-garbage or no newline. And if
> symbolic refs contain non-null garbage, we will report
> "FSCK_MSG_BAD_REF_CONTENT" to the user.
> 
> Then, we will check the name of the "pointee" is correct by using
> "check_refname_format". And then if we can access the "pointee_path" in
> the file system, we should ensure that the file type is correct.
> 
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  Documentation/fsck-msgids.txt |  3 ++
>  fsck.h                        |  1 +
>  refs/files-backend.c          | 87 +++++++++++++++++++++++++++++++++++
>  t/t0602-reffiles-fsck.sh      | 52 +++++++++++++++++++++
>  4 files changed, 143 insertions(+)
> 
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index 1688c2f1fe..73587661dc 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -28,6 +28,9 @@
>  `badRefName`::
>  	(ERROR) A ref has an invalid format.
>  
> +`badSymrefPointee`::
> +	(ERROR) The pointee of a symref is bad.
> +
>  `badTagName`::
>  	(INFO) A tag has an invalid format.
>  
> diff --git a/fsck.h b/fsck.h
> index 975d9b9da9..985b674dd9 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -34,6 +34,7 @@ enum fsck_msg_type {
>  	FUNC(BAD_REF_CONTENT, ERROR) \
>  	FUNC(BAD_REF_FILETYPE, ERROR) \
>  	FUNC(BAD_REF_NAME, ERROR) \
> +	FUNC(BAD_SYMREF_POINTEE, ERROR) \
>  	FUNC(BAD_TIMEZONE, ERROR) \
>  	FUNC(BAD_TREE, ERROR) \
>  	FUNC(BAD_TREE_SHA1, ERROR) \
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index ae71692f36..bfb8d338d2 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3434,12 +3434,92 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +/*
> + * Check the symref "pointee_name" and "pointee_path". The caller should
> + * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
> + * would be the content after "refs:".
> + */
> +static int files_fsck_symref_target(struct fsck_options *o,
> +				    struct fsck_ref_report *report,
> +				    const char *refname,
> +				    struct strbuf *pointee_name,
> +				    struct strbuf *pointee_path)
> +{
> +	unsigned int newline_num = 0;
> +	unsigned int space_num = 0;
> +	const char *p = NULL;
> +	struct stat st;
> +	int ret = 0;
> +
> +	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
> +
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to ref outside the refs directory");
> +		goto out;
> +	}
> +
> +	while (*p != '\0') {

We typically write this `while (*p)`.

> +		if ((space_num || newline_num) && !isspace(*p)) {
> +			ret = fsck_report_ref(o, report,
> +					      FSCK_MSG_BAD_REF_CONTENT,
> +					      "contains non-null garbage");
> +			goto out;
> +		}
> +
> +		if (*p == '\n') {
> +			newline_num++;
> +		} else if (*p == ' ') {
> +			space_num++;
> +		}
> +		p++;
> +	}

Can't we replace this with a single `strchr('\n')` call to check for the
newline and then verify that the next character is a `\0`? The check for
spaces would then be handled by `check_refname_format()`.

> +	/*
> +	 * Missing target should not be treated as any error worthy event and
> +	 * not even warn. It is a common case that a symbolic ref points to a
> +	 * ref that does not exist yet. If the target ref does not exist, just
> +	 * skip the check for the file type.
> +	 */
> +	if (lstat(pointee_path->buf, &st) < 0)
> +		goto out;
> +
> +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> +				      "points to an invalid file type");
> +		goto out;
> +	}

What exactly are we guarding against here? Don't we already verify that
files in `refs/` have the correct type? Or are we checking that it does
not point to a directory?

Patrick
shejialuo Aug. 22, 2024, 12:42 p.m. UTC | #2
On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote:


> > +		if ((space_num || newline_num) && !isspace(*p)) {
> > +			ret = fsck_report_ref(o, report,
> > +					      FSCK_MSG_BAD_REF_CONTENT,
> > +					      "contains non-null garbage");
> > +			goto out;
> > +		}
> > +
> > +		if (*p == '\n') {
> > +			newline_num++;
> > +		} else if (*p == ' ') {
> > +			space_num++;
> > +		}
> > +		p++;
> > +	}
> 
> Can't we replace this with a single `strchr('\n')` call to check for the
> newline and then verify that the next character is a `\0`? The check for
> spaces would then be handled by `check_refname_format()`.
> 

We cannot. Think about this situation.

  "ref: refs/heads/master  \n   "

We find that the next character of '\n' is not '\0'. Then we leave it to
"check_refname_format". But "check_refname_format" will report an error
here, but this is an allowed symref.

But I think using `strchr` is a nice way. I will try to find an elegant
way here to handle this logic here.

> > +	/*
> > +	 * Missing target should not be treated as any error worthy event and
> > +	 * not even warn. It is a common case that a symbolic ref points to a
> > +	 * ref that does not exist yet. If the target ref does not exist, just
> > +	 * skip the check for the file type.
> > +	 */
> > +	if (lstat(pointee_path->buf, &st) < 0)
> > +		goto out;
> > +
> > +	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_BAD_SYMREF_POINTEE,
> > +				      "points to an invalid file type");
> > +		goto out;
> > +	}
> 
> What exactly are we guarding against here? Don't we already verify that
> files in `refs/` have the correct type? Or are we checking that it does
> not point to a directory?
> 

When scanning the "refs" directory, we will check the file in the ref
database, but we ignore the directory. So we are checking to know
whether it does not point to a directory. If the ref points to a bad
file type for example "ref/heads/bad-file"

If it is a block type file. We will first report that "refs/heads/bad-file"
is a bad file and then report ref points to bad file
"refs/heads/bad-file".

Actually, I think this is a little redundant here, but we can be
tolerant about this because we need to guard against directory. We need
to consider this situation.

So we could let this be.

> Patrick
Patrick Steinhardt Aug. 23, 2024, 5:36 a.m. UTC | #3
On Thu, Aug 22, 2024 at 08:42:02PM +0800, shejialuo wrote:
> On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote:
> 
> 
> > > +		if ((space_num || newline_num) && !isspace(*p)) {
> > > +			ret = fsck_report_ref(o, report,
> > > +					      FSCK_MSG_BAD_REF_CONTENT,
> > > +					      "contains non-null garbage");
> > > +			goto out;
> > > +		}
> > > +
> > > +		if (*p == '\n') {
> > > +			newline_num++;
> > > +		} else if (*p == ' ') {
> > > +			space_num++;
> > > +		}
> > > +		p++;
> > > +	}
> > 
> > Can't we replace this with a single `strchr('\n')` call to check for the
> > newline and then verify that the next character is a `\0`? The check for
> > spaces would then be handled by `check_refname_format()`.
> > 
> 
> We cannot. Think about this situation.
> 
>   "ref: refs/heads/master  \n   "
> 
> We find that the next character of '\n' is not '\0'. Then we leave it to
> "check_refname_format". But "check_refname_format" will report an error
> here, but this is an allowed symref.

Wouldn't it be correct to warn about this? To me the above very much
looks like garbage after the refname, same like we'd also warn about
such garbage for direct refs.

Patrick
shejialuo Aug. 23, 2024, 11:37 a.m. UTC | #4
On Fri, Aug 23, 2024 at 07:36:10AM +0200, Patrick Steinhardt wrote:
> On Thu, Aug 22, 2024 at 08:42:02PM +0800, shejialuo wrote:
> > On Thu, Aug 22, 2024 at 10:53:57AM +0200, Patrick Steinhardt wrote:
> > 
> > 
> > > > +		if ((space_num || newline_num) && !isspace(*p)) {
> > > > +			ret = fsck_report_ref(o, report,
> > > > +					      FSCK_MSG_BAD_REF_CONTENT,
> > > > +					      "contains non-null garbage");
> > > > +			goto out;
> > > > +		}
> > > > +
> > > > +		if (*p == '\n') {
> > > > +			newline_num++;
> > > > +		} else if (*p == ' ') {
> > > > +			space_num++;
> > > > +		}
> > > > +		p++;
> > > > +	}
> > > 
> > > Can't we replace this with a single `strchr('\n')` call to check for the
> > > newline and then verify that the next character is a `\0`? The check for
> > > spaces would then be handled by `check_refname_format()`.
> > > 
> > 
> > We cannot. Think about this situation.
> > 
> >   "ref: refs/heads/master  \n   "
> > 
> > We find that the next character of '\n' is not '\0'. Then we leave it to
> > "check_refname_format". But "check_refname_format" will report an error
> > here, but this is an allowed symref.
> 
> Wouldn't it be correct to warn about this? To me the above very much
> looks like garbage after the refname, same like we'd also warn about
> such garbage for direct refs.
> 

Yes, we should warn about this. But only null-garbage is allowed for
symref. The following situation is bad:

  "ref: refs/heads/master  \n   garbage\n"

We should report error here, from my perspective, it's a FATAL ERROR.
However, let's decide how to do this when we know what fsck error level
we should set.

> Patrick
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 1688c2f1fe..73587661dc 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,9 @@ 
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badSymrefPointee`::
+	(ERROR) The pointee of a symref is bad.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index 975d9b9da9..985b674dd9 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,6 +34,7 @@  enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_SYMREF_POINTEE, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ae71692f36..bfb8d338d2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3434,12 +3434,92 @@  typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+/*
+ * Check the symref "pointee_name" and "pointee_path". The caller should
+ * make sure that "pointee_path" is absolute. For symbolic ref, "pointee_name"
+ * would be the content after "refs:".
+ */
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    const char *refname,
+				    struct strbuf *pointee_name,
+				    struct strbuf *pointee_path)
+{
+	unsigned int newline_num = 0;
+	unsigned int space_num = 0;
+	const char *p = NULL;
+	struct stat st;
+	int ret = 0;
+
+	if (!skip_prefix(pointee_name->buf, "refs/", &p)) {
+
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to ref outside the refs directory");
+		goto out;
+	}
+
+	while (*p != '\0') {
+		if ((space_num || newline_num) && !isspace(*p)) {
+			ret = fsck_report_ref(o, report,
+					      FSCK_MSG_BAD_REF_CONTENT,
+					      "contains non-null garbage");
+			goto out;
+		}
+
+		if (*p == '\n') {
+			newline_num++;
+		} else if (*p == ' ') {
+			space_num++;
+		}
+		p++;
+	}
+
+	if (space_num || newline_num > 1) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "trailing null-garbage");
+	} else if (!newline_num) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "missing newline");
+	}
+
+	strbuf_rtrim(pointee_name);
+
+	if (check_refname_format(pointee_name->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to refname with invalid format");
+	}
+
+	/*
+	 * Missing target should not be treated as any error worthy event and
+	 * not even warn. It is a common case that a symbolic ref points to a
+	 * ref that does not exist yet. If the target ref does not exist, just
+	 * skip the check for the file type.
+	 */
+	if (lstat(pointee_path->buf, &st) < 0)
+		goto out;
+
+	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_SYMREF_POINTEE,
+				      "points to an invalid file type");
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
 				   struct dir_iterator *iter)
 {
 	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
+	struct strbuf pointee_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
@@ -3482,6 +3562,12 @@  static int files_fsck_refs_content(struct ref_store *ref_store,
 						      "trailing garbage in ref");
 				goto cleanup;
 			}
+		} else {
+			strbuf_addf(&pointee_path, "%s/%s",
+				    ref_store->gitdir, referent.buf);
+			ret = files_fsck_symref_target(o, &report, refname.buf,
+						       &referent,
+						       &pointee_path);
 		}
 	}
 
@@ -3489,6 +3575,7 @@  static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&pointee_path);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 7c1910d784..e8fc2ef015 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -176,4 +176,56 @@  test_expect_success 'regular ref content should be checked' '
 	test_cmp expect err
 '
 
+test_expect_success 'symbolic ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git checkout -b a/b/branch-2 &&
+
+	printf "ref: refs/heads/branch" > $branch_dir_prefix/branch-1-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-1-no-newline &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n\n " > $branch_dir_prefix/a/b/branch-trailing &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing: trailingRefContent: trailing null-garbage
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" > $branch_dir_prefix/branch-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-2-bad: badSymrefPointee: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-2-bad &&
+	test_cmp expect err
+'
+
 test_done