diff mbox series

[v7,5/9] ref: port git-fsck(1) regular refs check for files backend

Message ID ZzCij4ilPLhlKXS6@ArchLinux (mailing list archive)
State Superseded
Headers show
Series add ref content check for files backend | expand

Commit Message

shejialuo Nov. 10, 2024, 12:09 p.m. UTC
"git-fsck(1)" implicitly checks the ref content by passing the
callback "fsck_handle_ref" to the "refs.c::refs_for_each_rawref".
Then, it will check whether the ref content (eventually "oid")
is valid. If not, it will report the following error to the user.

  error: refs/heads/main: invalid sha1 pointer 0000...

And it will also report above errors when there are dangling symrefs
in the repository wrongly. This does not align with the behavior of
the "git symbolic-ref" command which allows users to create dangling
symrefs.

As we have already introduced the "git refs verify" command, we'd better
check the ref content explicitly in the "git refs verify" command thus
later we could remove these checks in "git-fsck(1)" and launch a
subprocess to call "git refs verify" in "git-fsck(1)" to make the
"git-fsck(1)" more clean.

Following what "git-fsck(1)" does, add a similar check to "git refs
verify". Then add a new fsck error message "badRefContent(ERROR)" to
represent that a ref has an invalid content.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  43 ++++++++++++++
 t/t0602-reffiles-fsck.sh      | 105 ++++++++++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

Comments

Patrick Steinhardt Nov. 13, 2024, 7:36 a.m. UTC | #1
On Sun, Nov 10, 2024 at 08:09:51PM +0800, shejialuo wrote:
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8bfdce64bc..2d126ecbbe 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3505,6 +3505,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refname,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *target_name,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct fsck_ref_report report = { 0 };
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	report.path = target_name;
> +
> +	if (S_ISLNK(iter->st.st_mode))
> +		goto cleanup;
> +
> +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> +		ret = fsck_report_ref(o, &report,
> +				      FSCK_MSG_BAD_REF_CONTENT,
> +				      "cannot read ref file '%s': %s",
> +				      iter->path.buf, strerror(errno));
> +		goto cleanup;
> +	}

I didn't catch this in previous rounds, but it's a little dubious
whether we should report this as an actual fsck error. I can expect
multiple situations:

  - The file has weird permissions and thus cannot be read, failing with
    EPERM, which doesn't match well with BAD_REF_CONTENT.

  - The file does not exist anymore because we were racing with a
    concurrent writer, failing with ENOENT. This is benign and expected
    to happen in busy repos, so generating an error here feels wrong.

  - The file cannot be read at all due to an I/O error. This may be
    reported with BAD_REF_CONTENT, but conflating this with the case
    where we have actually bad content may not be the best idea.

So maybe we should ignore ENOENT, report bad permissions and otherwise
return an actual error to the caller?

Patrick
shejialuo Nov. 14, 2024, 12:09 p.m. UTC | #2
On Wed, Nov 13, 2024 at 08:36:12AM +0100, Patrick Steinhardt wrote:
> On Sun, Nov 10, 2024 at 08:09:51PM +0800, shejialuo wrote:
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 8bfdce64bc..2d126ecbbe 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3505,6 +3505,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
> >  				  const char *refname,
> >  				  struct dir_iterator *iter);
> >  
> > +static int files_fsck_refs_content(struct ref_store *ref_store,
> > +				   struct fsck_options *o,
> > +				   const char *target_name,
> > +				   struct dir_iterator *iter)
> > +{
> > +	struct strbuf ref_content = STRBUF_INIT;
> > +	struct strbuf referent = STRBUF_INIT;
> > +	struct fsck_ref_report report = { 0 };
> > +	unsigned int type = 0;
> > +	int failure_errno = 0;
> > +	struct object_id oid;
> > +	int ret = 0;
> > +
> > +	report.path = target_name;
> > +
> > +	if (S_ISLNK(iter->st.st_mode))
> > +		goto cleanup;
> > +
> > +	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
> > +		ret = fsck_report_ref(o, &report,
> > +				      FSCK_MSG_BAD_REF_CONTENT,
> > +				      "cannot read ref file '%s': %s",
> > +				      iter->path.buf, strerror(errno));
> > +		goto cleanup;
> > +	}
> 
> I didn't catch this in previous rounds, but it's a little dubious
> whether we should report this as an actual fsck error. I can expect
> multiple situations:
> 
>   - The file has weird permissions and thus cannot be read, failing with
>     EPERM, which doesn't match well with BAD_REF_CONTENT.
> 
>   - The file does not exist anymore because we were racing with a
>     concurrent writer, failing with ENOENT. This is benign and expected
>     to happen in busy repos, so generating an error here feels wrong.
> 
>   - The file cannot be read at all due to an I/O error. This may be
>     reported with BAD_REF_CONTENT, but conflating this with the case
>     where we have actually bad content may not be the best idea.
> 
> So maybe we should ignore ENOENT, report bad permissions and otherwise
> return an actual error to the caller?
> 

So, I think we should just use "error_errno" method to report the actual
error to the caller. And we also need to add some comments.

Thanks for this wonderful suggestion.


> Patrick
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@ 
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@  enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8bfdce64bc..2d126ecbbe 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3505,6 +3505,48 @@  typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refname,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *target_name,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct fsck_ref_report report = { 0 };
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	report.path = target_name;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "cannot read ref file '%s': %s",
+				      iter->path.buf, strerror(errno));
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		strbuf_rtrim(&ref_content);
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "%s", ref_content.buf);
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refname,
@@ -3600,6 +3642,7 @@  static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 1e17393a3d..162370077b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -158,4 +158,109 @@  test_expect_success 'ref name check should work for multiple worktrees' '
 	done
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/branch-bad &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse main)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$branch_dir_prefix/a/b/branch-bad &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/a/b/branch-bad: badRefContent: $bad_content
+		EOF
+		rm $branch_dir_prefix/a/b/branch-bad &&
+		test_cmp expect err || return 1
+	done
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	bad_content_1=$(git rev-parse main)x &&
+	bad_content_2=xfsazqfxcadas &&
+	bad_content_3=Xfsazqfxcadas &&
+	printf "%s" $bad_content_1 >$tag_dir_prefix/tag-bad-1 &&
+	printf "%s" $bad_content_2 >$tag_dir_prefix/tag-bad-2 &&
+	printf "%s" $bad_content_3 >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: $bad_content_3
+	error: refs/tags/tag-bad-1: badRefContent: $bad_content_1
+	error: refs/tags/tag-bad-2: badRefContent: $bad_content_2
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
+test_expect_success 'ref content checks should work with worktrees' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	cd repo &&
+	test_commit default &&
+	git branch branch-1 &&
+	git branch branch-2 &&
+	git branch branch-3 &&
+	git worktree add ./worktree-1 branch-2 &&
+	git worktree add ./worktree-2 branch-3 &&
+	worktree1_refdir_prefix=.git/worktrees/worktree-1/refs/worktree &&
+	worktree2_refdir_prefix=.git/worktrees/worktree-2/refs/worktree &&
+
+	(
+		cd worktree-1 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+	(
+		cd worktree-2 &&
+		git update-ref refs/worktree/branch-4 refs/heads/branch-1
+	) &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree1_refdir_prefix/bad-branch-1 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-1/refs/worktree/bad-branch-1: badRefContent: $bad_content
+		EOF
+		rm $worktree1_refdir_prefix/bad-branch-1 &&
+		test_cmp expect err || return 1
+	done &&
+
+	for bad_content in "$(git rev-parse HEAD)x" "xfsazqfxcadas" "Xfsazqfxcadas"
+	do
+		printf "%s" $bad_content >$worktree2_refdir_prefix/bad-branch-2 &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: worktrees/worktree-2/refs/worktree/bad-branch-2: badRefContent: $bad_content
+		EOF
+		rm $worktree2_refdir_prefix/bad-branch-2 &&
+		test_cmp expect err || return 1
+	done
+'
+
 test_done