diff mbox series

[v4,2/5] ref: port git-fsck(1) regular refs check for files backend

Message ID ZuRzwKTFd65RL4HC@ArchLinux (mailing list archive)
State New
Headers show
Series [v4,1/5] ref: initialize "fsck_ref_report" with zero | expand

Commit Message

shejialuo Sept. 13, 2024, 5:17 p.m. UTC
We implicitly rely on "git-fsck(1)" to check the consistency of regular
refs. However, we have already set up the infrastructure of the ref
consistency checks. We need to port original checks from "git-fsck(1)".
Thus, we could clean the "git-fsck(1)" code by removing these implicit
checks.

The "git-fsck(1)" command reports an error when the ref content is
invalid. Following this, add a similar check to "git refs verify".
Add a new fsck error message called "badRefContent(ERROR)" to represent
that a ref has an invalid content.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 43 +++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 60 +++++++++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)

Comments

Junio C Hamano Sept. 18, 2024, 6:59 p.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> We implicitly rely on "git-fsck(1)" to check the consistency of regular
> refs. However, we have already set up the infrastructure of the ref
> consistency checks. We need to port original checks from "git-fsck(1)".
> Thus, we could clean the "git-fsck(1)" code by removing these implicit
> checks.

The above reads as if you are, in preparation to "port" the checks
we have in "fsck" to elsewhere (presumably to "refs verify"), you
are removing the checks that _will_ become redundant from "fsck".

But that does not seem to be what is happening.  Let me try to
paraphrase, in order to check my understanding of what you wanted to
say:

    "git-fsck(1) has some consistency checks for regular refs.  As
    we want to align the checks "git refs verify" performs with
    them (and eventually call the unified code that checks refs from
    both), port the logic "git fsck" has to "git refs verify".

If we fail to achieve the "a single unified code to check called by
both fsck and refs-verify" at the end of this series, and instead
end up with duplicated code that implements the checks in two
separate code, risking them to be slightly different and drift away
over time from each other, that is fine, as long as our intention is
to continue the effort for unification in a follow up series.  

But such a plan needs to be spelled out.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 890d0324e1..b1ed2e5c04 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3430,6 +3430,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	struct fsck_ref_report report = {0};
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISLNK(iter->st.st_mode))
> +		goto cleanup;

"symbolic links are OK" for now.  We'll add sanity checks for them
in later steps.  OK.

> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = error_errno(_("unable to read ref '%s/%s'"),
> +				  refs_check_dir, iter->relative_path);

Is there a reason why we cannot to use report.path aka refname.buf,
and instead we have to recompute the same path again?

Should this error be propagated back to the caller, not just to the
end-user, by a call to fsck_report_ref(), like you do for a ref file
that has questionable contents?  If ref iteration (like for-each-ref)
claims there is this ref, and you cannot read its value when you try
to use it, it is just as bad as having a loose ref file that has
unusable contents, isn't it?

It is a separate matter if such a failure mode deserves its own
error code (FSCK_MSG_UNREADABLE_REF) or can be rolled into the same
FSCK_MSG_BAD_REF_CONTENT.  I can see arguments for both sides and
offhand have no strong preference either way.

Thanks.

> +		goto cleanup;
> +	}
> +
> +	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
> +				     ref_content.buf, &oid, &referent,
> +				     &type, &failure_errno)) {
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_BAD_REF_CONTENT,
> +				      "invalid ref content");
> +		goto cleanup;
> +	}
> +
> +cleanup:
> +	strbuf_release(&refname);
> +	strbuf_release(&ref_content);
> +	strbuf_release(&referent);
> +	return ret;
> +}
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@ 
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@  enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 890d0324e1..b1ed2e5c04 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3430,6 +3430,48 @@  typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	struct fsck_ref_report report = {0};
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = error_errno(_("unable to read ref '%s/%s'"),
+				  refs_check_dir, iter->relative_path);
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "invalid ref content");
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3554,7 @@  static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..a1205b3a3b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,64 @@  test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-1 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-2 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done