[GSoC,v13,09/10] fsck: add ref name check for files backend

Message ID	ZqeY0eHNZjKhNvIH@ArchLinux (mailing list archive)
State	Superseded
Headers
Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E2F514A62A
	for <git@vger.kernel.org>; Mon, 29 Jul 2024 13:27:27 +0000 (UTC)
Date: Mon, 29 Jul 2024 21:27:45 +0800
From: shejialuo <shejialuo@gmail.com>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>, Karthik Nayak <karthik.188@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Justin Tobler <jltobler@gmail.com>
Subject: [GSoC][PATCH v13 09/10] fsck: add ref name check for files backend
Message-ID: <ZqeY0eHNZjKhNvIH@ArchLinux>
References: <ZqeXrPROpEg_pRS2@ArchLinux>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ZqeXrPROpEg_pRS2@ArchLinux>
Series	ref consistency check infra setup
	[GSoC,v13,00/10] ref consistency check infra setup
	[GSoC,v13,01/10] fsck: rename "skiplist" to "skip_oids"
	[GSoC,v13,02/10] fsck: add a unified interface for reporting fsck messages
	[GSoC,v13,03/10] fsck: rename objects-related fsck error functions
	[GSoC,v13,04/10] fsck: add refs-related error report function
	[GSoC,v13,05/10] refs: set up ref consistency check infrastructure
	[GSoC,v13,06/10] git refs: add verify subcommand
	[GSoC,v13,07/10] builtin/fsck: add `git-refs verify` child process
	[GSoC,v13,08/10] files-backend: add unified interface for refs scanning
	[GSoC,v13,09/10] fsck: add ref name check for files backend
	[GSoC,v13,10/10] fsck: add ref content check for files backend

Commit Message

shejialuo July 29, 2024, 1:27 p.m. UTC

git-fsck(1) only checks references implicitly; it does not fully check
refs whose names are badly formatted, such as a standalone "@" or a name
ending with ".lock".

To provide such checks, add a new fsck message id "badRefName" with
default type ERROR. Use the existing "check_refname_format" to check the
ref name explicitly, and add a new test to verify the functionality.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/files-backend.c          |  22 ++++++++
 t/t0602-reffiles-fsck.sh      | 101 ++++++++++++++++++++++++++++++++++
 4 files changed, 127 insertions(+)
 create mode 100755 t/t0602-reffiles-fsck.sh
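
As a rough illustration of the kind of per-component rules such a check
relies on, the sketch below rejects the names mentioned in the commit
message (a standalone "@", a name ending with ".lock") along with a few
other documented refname restrictions. It is not git's
check_refname_format(): the name sketch_check_refname_component and the
subset of rules are assumptions made for this example, and the real
function also handles flags and multi-level names.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git. Returns 0 when `name` passes this
 * subset of rules for a single path component, -1 otherwise.
 */
static int sketch_check_refname_component(const char *name)
{
	size_t len;
	const char *p;

	assert(name != NULL);
	len = strlen(name);

	if (len == 0)
		return -1;	/* empty component */
	if (!strcmp(name, "@"))
		return -1;	/* standalone "@" */
	if (name[0] == '.' || name[len - 1] == '.')
		return -1;	/* leading or trailing dot */
	if (len >= 5 && !strcmp(name + len - 5, ".lock"))
		return -1;	/* ".lock" suffix */
	if (strstr(name, "..") || strstr(name, "@{"))
		return -1;	/* forbidden sequences */

	for (p = name; *p; p++) {
		unsigned char c = (unsigned char)*p;

		if (c < 0x20 || c == 0x7f)
			return -1;	/* control characters */
		if (strchr(" ~^:?*[\\/", c))
			return -1;	/* forbidden characters; "/" since this is one component */
	}
	return 0;
}
```

With this sketch, "branch-1" and "tag-2" are accepted, while ".branch-1",
"tag-1.lock" and "@" are rejected.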

Comments

Patrick Steinhardt July 30, 2024, 8:31 a.m. UTC | #1

On Mon, Jul 29, 2024 at 09:27:45PM +0800, shejialuo wrote:
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index cb184953c1..0d4fc27768 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3419,6 +3419,27 @@ typedef int (*files_fsck_refs_fn)(struct fsck_options *o,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_name(struct fsck_options *o,
> +				const char *gitdir UNUSED,
> +				const char *refs_check_dir,
> +				struct dir_iterator *iter)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	struct fsck_refs_info info;
> +	int ret = 0;
> +
> +	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
> +		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
> +		info.path = sb.buf;
> +		ret = fsck_refs_report(o, NULL, &info,
> +				       FSCK_MSG_BAD_REF_NAME,
> +				       "invalid refname format");
> +	}
> +
> +	strbuf_release(&sb);
> +	return ret;
> +}
> +
>  static int files_fsck_refs_dir(struct ref_store *ref_store,
>  			       struct fsck_options *o,
>  			       const char *refs_check_dir,
> @@ -3469,6 +3490,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
>  			   struct fsck_options *o)
>  {
>  	files_fsck_refs_fn fsck_refs_fns[]= {
> +		files_fsck_refs_name,
>  		NULL

Neat. I very much like that we can simply add new checks to this
function and the rest is handled for us already. Makes this whole thing
nicely extensible.
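
The extensibility comes from the NULL-terminated table of function
pointers. A minimal standalone sketch of that pattern follows; the
sketch_* names and the two toy checks are hypothetical and much simpler
than the real files_fsck_refs_fn callbacks, and the choice to keep
running the remaining checks after a failure is an assumption of this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check signature; the real callbacks take more arguments. */
typedef int (*sketch_check_fn)(const char *path);

static int sketch_check_not_empty(const char *path)
{
	return path[0] ? 0 : -1;
}

static int sketch_check_no_leading_dot(const char *path)
{
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;	/* last path component */
	return base[0] == '.' ? -1 : 0;
}

/* Run every check in the table; adding a check means adding one line. */
static int sketch_run_checks(const char *path)
{
	static const sketch_check_fn checks[] = {
		sketch_check_not_empty,
		sketch_check_no_leading_dot,
		NULL	/* sentinel that terminates the loop */
	};
	int ret = 0;
	size_t i;

	assert(path != NULL);
	for (i = 0; checks[i]; i++)
		if (checks[i](path))
			ret = -1;	/* record the failure, keep running the rest */
	return ret;
}
```

Here sketch_run_checks("refs/heads/branch-1") returns 0, while
sketch_run_checks("refs/heads/.branch-1") returns -1.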

>  	};
>  
> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> new file mode 100755
> index 0000000000..b2db58d2c6
> --- /dev/null
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -0,0 +1,101 @@
> +#!/bin/sh
> +
> +test_description='Test reffiles backend consistency check'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +GIT_TEST_DEFAULT_REF_FORMAT=files
> +export GIT_TEST_DEFAULT_REF_FORMAT
> +
> +. ./test-lib.sh

Is this test suite intentionally not marked with
`TEST_PASSES_SANITIZE_LEAK=true`?

> +
> +test_expect_success 'ref name should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	branch_dir_prefix=.git/refs/heads &&
> +	tag_dir_prefix=.git/refs/tags &&
> +	(
> +		cd repo &&
> +		git commit --allow-empty -m initial &&
> +		git checkout -b branch-1 &&
> +		git tag tag-1 &&
> +		git commit --allow-empty -m second &&
> +		git checkout -b branch-2 &&
> +		git tag tag-2 &&
> +		git tag multi_hierarchy/tag-2
> +	) &&

I don't quite get why you create several subshells only to cd into
`repo` in each of them. Isn't a single subshell sufficient for all of
those tests? If you want to delimit blocks, then you can simply add an
empty newline between them.

> +	(
> +		cd repo &&
> +		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
> +		test_must_fail git fsck 2>err &&
> +		cat >expect <<-EOF &&
> +		error: refs/heads/.branch-1: badRefName: invalid refname format
> +		EOF
> +		rm $branch_dir_prefix/.branch-1 &&
> +		test_cmp expect err
> +	) &&
> +	(
> +		cd repo &&
> +		cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
> +		test_must_fail git fsck 2>err &&
> +		cat >expect <<-EOF &&
> +		error: refs/tags/tag-1.lock: badRefName: invalid refname format
> +		EOF
> +		rm $tag_dir_prefix/tag-1.lock &&
> +		test_cmp expect err
> +	) &&

The other cases all make sense, but I don't think that a file ending
with ".lock" should be marked as having a "badRefName". It is expected
that concurrent writers may have such lock files.

What could make sense is to eventually mark stale lock files older than
X amount of time as errors or warnings. But I'd think that this is
outside of the scope of this patch series.

Patrick
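
The stale-lock idea above could look roughly like the following sketch,
which compares a file's modification time against a maximum age. Nothing
like this is part of the patch series: the sketch_* names, the use of
POSIX stat(), and the caller-supplied threshold are all assumptions made
for illustration.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: is a file modified at `mtime` older than `max_age` seconds? */
static int sketch_is_older_than(time_t mtime, time_t now, time_t max_age)
{
	return (now - mtime) > max_age;
}

/*
 * Hypothetical helper: returns 1 if the lock file is stale, 0 if it is
 * recent enough to belong to a concurrent writer, and -1 if it cannot
 * be examined (for example because it no longer exists).
 */
static int sketch_lock_is_stale(const char *lock_path, time_t now, time_t max_age)
{
	struct stat st;

	assert(lock_path != NULL);
	if (stat(lock_path, &st) < 0)
		return -1;
	return sketch_is_older_than(st.st_mtime, now, max_age);
}
```

A caller would pass time(NULL) as `now` and pick the threshold ("X amount
of time") itself; whether a stale lock is then reported as an error or a
warning is left open here, as it is in the discussion.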

shejialuo July 30, 2024, 4:14 p.m. UTC | #2

> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > new file mode 100755
> > index 0000000000..b2db58d2c6
> > --- /dev/null
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -0,0 +1,101 @@
> > +#!/bin/sh
> > +
> > +test_description='Test reffiles backend consistency check'
> > +
> > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> > +GIT_TEST_DEFAULT_REF_FORMAT=files
> > +export GIT_TEST_DEFAULT_REF_FORMAT
> > +
> > +. ./test-lib.sh
> 
> Is this test suite intentionally not marked with
> `TEST_PASSES_SANITIZE_LEAK=true`?
> 

No, I didn't know about this. I will add `TEST_PASSES_SANITIZE_LEAK=true`
and export this environment variable.

> > +
> > +test_expect_success 'ref name should be checked' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	branch_dir_prefix=.git/refs/heads &&
> > +	tag_dir_prefix=.git/refs/tags &&
> > +	(
> > +		cd repo &&
> > +		git commit --allow-empty -m initial &&
> > +		git checkout -b branch-1 &&
> > +		git tag tag-1 &&
> > +		git commit --allow-empty -m second &&
> > +		git checkout -b branch-2 &&
> > +		git tag tag-2 &&
> > +		git tag multi_hierarchy/tag-2
> > +	) &&
> 
> I don't quite get why you create several subshells only to cd into
> `repo` in each of them. Isn't a single subshell sufficient for all of
> those tests? If you want to delimit blocks, then you can simply add an
> empty newline between them.
> 

I just wanted to delimit the blocks. I will use empty lines instead in
the next version.

> > +	(
> > +		cd repo &&
> > +		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
> > +		test_must_fail git fsck 2>err &&
> > +		cat >expect <<-EOF &&
> > +		error: refs/heads/.branch-1: badRefName: invalid refname format
> > +		EOF
> > +		rm $branch_dir_prefix/.branch-1 &&
> > +		test_cmp expect err
> > +	) &&
> > +	(
> > +		cd repo &&
> > +		cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
> > +		test_must_fail git fsck 2>err &&
> > +		cat >expect <<-EOF &&
> > +		error: refs/tags/tag-1.lock: badRefName: invalid refname format
> > +		EOF
> > +		rm $tag_dir_prefix/tag-1.lock &&
> > +		test_cmp expect err
> > +	) &&
> 
> The other cases all make sense, but I don't think that a file ending
> with ".lock" should be marked as having a "badRefName". It is expected
> that concurrent writers may have such lock files.
> 
> What could make sense is to eventually mark stale lock files older than
> X amount of time as errors or warnings. But I'd think that this is
> outside of the scope of this patch series.
> 

If so, let us just ignore the ".lock" case for now.

> Patrick

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index f643585a34..d8e437a043 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@ 
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefName`::
+	(ERROR) A ref has an invalid format.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
diff --git a/fsck.h b/fsck.h
index b03dba442e..ce56ce4bef 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@  enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index cb184953c1..0d4fc27768 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3419,6 +3419,27 @@  typedef int (*files_fsck_refs_fn)(struct fsck_options *o,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_name(struct fsck_options *o,
+				const char *gitdir UNUSED,
+				const char *refs_check_dir,
+				struct dir_iterator *iter)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct fsck_refs_info info;
+	int ret = 0;
+
+	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
+		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
+		info.path = sb.buf;
+		ret = fsck_refs_report(o, NULL, &info,
+				       FSCK_MSG_BAD_REF_NAME,
+				       "invalid refname format");
+	}
+
+	strbuf_release(&sb);
+	return ret;
+}
+
 static int files_fsck_refs_dir(struct ref_store *ref_store,
 			       struct fsck_options *o,
 			       const char *refs_check_dir,
@@ -3469,6 +3490,7 @@  static int files_fsck_refs(struct ref_store *ref_store,
 			   struct fsck_options *o)
 {
 	files_fsck_refs_fn fsck_refs_fns[]= {
+		files_fsck_refs_name,
 		NULL
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
new file mode 100755
index 0000000000..b2db58d2c6
--- /dev/null
+++ b/t/t0602-reffiles-fsck.sh
@@ -0,0 +1,101 @@ 
+#!/bin/sh
+
+test_description='Test reffiles backend consistency check'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+GIT_TEST_DEFAULT_REF_FORMAT=files
+export GIT_TEST_DEFAULT_REF_FORMAT
+
+. ./test-lib.sh
+
+test_expect_success 'ref name should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+		git checkout -b branch-1 &&
+		git tag tag-1 &&
+		git commit --allow-empty -m second &&
+		git checkout -b branch-2 &&
+		git tag tag-2 &&
+		git tag multi_hierarchy/tag-2
+	) &&
+	(
+		cd repo &&
+		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/.branch-1: badRefName: invalid refname format
+		EOF
+		rm $branch_dir_prefix/.branch-1 &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		cp $tag_dir_prefix/tag-1 $tag_dir_prefix/tag-1.lock &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/tag-1.lock: badRefName: invalid refname format
+		EOF
+		rm $tag_dir_prefix/tag-1.lock &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/heads/@: badRefName: invalid refname format
+		EOF
+		rm $branch_dir_prefix/@ &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		cp $tag_dir_prefix/multi_hierarchy/tag-2 $tag_dir_prefix/multi_hierarchy/@ &&
+		test_must_fail git fsck 2>err &&
+		cat >expect <<-EOF &&
+		error: refs/tags/multi_hierarchy/@: badRefName: invalid refname format
+		EOF
+		rm $tag_dir_prefix/multi_hierarchy/@ &&
+		test_cmp expect err
+	)
+'
+
+test_expect_success 'ref name check should be adapted into fsck messages' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	(
+		cd repo &&
+		git commit --allow-empty -m initial &&
+		git checkout -b branch-1 &&
+		git tag tag-1 &&
+		git commit --allow-empty -m second &&
+		git checkout -b branch-2 &&
+		git tag tag-2
+	) &&
+	(
+		cd repo &&
+		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/.branch-1 &&
+		git -c fsck.badRefName=warn fsck 2>err &&
+		cat >expect <<-EOF &&
+		warning: refs/heads/.branch-1: badRefName: invalid refname format
+		EOF
+		rm $branch_dir_prefix/.branch-1 &&
+		test_cmp expect err
+	) &&
+	(
+		cd repo &&
+		cp $branch_dir_prefix/branch-1 $branch_dir_prefix/@ &&
+		git -c fsck.badRefName=ignore fsck 2>err &&
+		test_must_be_empty err
+	)
+'
+
+test_done
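
The second test relies on the message severity being configurable
(fsck.badRefName=warn or ignore). The sketch below models the behavior
the tests expect: an error produces a message and a failing status, a
warning produces a message but succeeds, and an ignored message produces
nothing. The sketch_* names and the buffer-based interface are
hypothetical; this is not how fsck_refs_report() is implemented.

```c
#include <assert.h>
#include <stdio.h>

enum sketch_severity { SKETCH_IGNORE, SKETCH_WARN, SKETCH_ERROR };

/*
 * Hypothetical reporter: formats "<severity>: <path>: <msg-id>: <message>"
 * into `out` and returns -1 only for errors, so warnings do not fail the run.
 */
static int sketch_report(char *out, size_t outlen, enum sketch_severity sev,
			 const char *path, const char *msg_id, const char *msg)
{
	assert(out != NULL && outlen > 0);
	out[0] = '\0';

	if (sev == SKETCH_IGNORE)
		return 0;	/* nothing is printed for ignored messages */

	snprintf(out, outlen, "%s: %s: %s: %s",
		 sev == SKETCH_ERROR ? "error" : "warning",
		 path, msg_id, msg);
	return sev == SKETCH_ERROR ? -1 : 0;
}
```

Called with SKETCH_WARN, "refs/heads/.branch-1", "badRefName" and
"invalid refname format", the sketch fills the buffer with the same
warning line the test expects and returns 0.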
