diff mbox series

[v5,6/9] ref: add escape check for the referent of symref

Message ID Zvj-xaa_j26Auig7@ArchLinux (mailing list archive)
State New
Headers show
Series add ref content check for files backend | expand

Commit Message

shejialuo Sept. 29, 2024, 7:16 a.m. UTC
Ideally, we want to the users use "git symbolic-ref" to create symrefs
instead of writing raw contents into the filesystem. However, "git
symbolic-ref" is strict with the refname but not strict with the
referent. For example, we can make the "referent" located at the
"$(gitdir)/logs/aaa" and manually write the content into this where we
can still successfully parse this symref by using "git rev-parse".

  $ git init repo && cd repo && git commit --allow-empty -mx
  $ git symbolic-ref refs/heads/test logs/aaa
  $ echo $(git rev-parse HEAD) > .git/logs/aaa
  $ git rev-parse test

We may need to add some restrictions for "referent" parameter when using
"git symbolic-ref" to create symrefs because ideally all the
nonpeudo-refs should be located under the "refs" directory and we may
tighten this in the future.

In order to tell the user we may tighten the "escape" situation, create
a new fsck message "escapeReferent" to notify the user that this may
become an error in the future.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  8 ++++++++
 fsck.h                        |  1 +
 refs/files-backend.c          |  7 +++++++
 t/t0602-reffiles-fsck.sh      | 18 ++++++++++++++++++
 4 files changed, 34 insertions(+)

Comments

Patrick Steinhardt Oct. 7, 2024, 6:58 a.m. UTC | #1
On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote:
> Ideally, we want to the users use "git symbolic-ref" to create symrefs
> instead of writing raw contents into the filesystem. However, "git
> symbolic-ref" is strict with the refname but not strict with the
> referent. For example, we can make the "referent" located at the
> "$(gitdir)/logs/aaa" and manually write the content into this where we
> can still successfully parse this symref by using "git rev-parse".
> 
>   $ git init repo && cd repo && git commit --allow-empty -mx
>   $ git symbolic-ref refs/heads/test logs/aaa
>   $ echo $(git rev-parse HEAD) > .git/logs/aaa
>   $ git rev-parse test

Oh, curious. This should definitely be tightened in git-symbolic-ref(1)
itself. The target should either be a root ref or something starting
with "refs/". Anyway, that is of course outside of the scope of this
patch series.

> We may need to add some restrictions for "referent" parameter when using
> "git symbolic-ref" to create symrefs because ideally all the
> nonpeudo-refs should be located under the "refs" directory and we may
> tighten this in the future.

Agreed.

> In order to tell the user we may tighten the "escape" situation, create
> a new fsck message "escapeReferent" to notify the user that this may
> become an error in the future.
> 
> Mentored-by: Patrick Steinhardt <ps@pks.im>
> Mentored-by: Karthik Nayak <karthik.188@gmail.com>
> Signed-off-by: shejialuo <shejialuo@gmail.com>
> ---
>  Documentation/fsck-msgids.txt |  8 ++++++++
>  fsck.h                        |  1 +
>  refs/files-backend.c          |  7 +++++++
>  t/t0602-reffiles-fsck.sh      | 18 ++++++++++++++++++
>  4 files changed, 34 insertions(+)
> 
> diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> index e0e4519334..223974057d 100644
> --- a/Documentation/fsck-msgids.txt
> +++ b/Documentation/fsck-msgids.txt
> @@ -52,6 +52,14 @@
>  `emptyName`::
>  	(WARN) A path contains an empty name.
>  
> +`escapeReferent`::
> +	(INFO) The referent of a symref is outside the "ref" directory.

Proposal: 'The referent of a symbolic reference points neither to a root
reference nor to a reference starting with "refs/".'

I'd also rename this to e.g. "symrefTargetIsNotAReference" or something
like that, because it's not really about whether or not the referent is
"escaping". It's a bit of a mouthful, but I don't really have a better
name. So feel free to pick something different that describes the error
better.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 57ac466b64..bd215c8d08 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o,
>  	orig_last_byte = referent->buf[orig_len - 1];
>  	strbuf_rtrim(referent);
>  
> +	if (!starts_with(referent->buf, "refs/")) {
> +		ret = fsck_report_ref(o, report,
> +				      FSCK_MSG_ESCAPE_REFERENT,
> +				      "referent '%s' is outside of refs/",
> +				      referent->buf);
> +	}
> +
>  	if (check_refname_format(referent->buf, 0)) {
>  		ret = fsck_report_ref(o, report,
>  				      FSCK_MSG_BAD_REFERENT,

This check is invalid, because referents can also point to root refs. So
you should probably also add a call to `is_root_ref()` here.

We also have `is_pseudo_ref()`, and one might be tempted to also allow
that. But pseudo refs aren't proper refs, so I'd argue that a symref
pointing to a pseudo ref is invalid, too.

Patrick
shejialuo Oct. 7, 2024, 8:44 a.m. UTC | #2
On Mon, Oct 07, 2024 at 08:58:55AM +0200, Patrick Steinhardt wrote:
> On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote:
> > Ideally, we want to the users use "git symbolic-ref" to create symrefs
> > instead of writing raw contents into the filesystem. However, "git
> > symbolic-ref" is strict with the refname but not strict with the
> > referent. For example, we can make the "referent" located at the
> > "$(gitdir)/logs/aaa" and manually write the content into this where we
> > can still successfully parse this symref by using "git rev-parse".
> > 
> >   $ git init repo && cd repo && git commit --allow-empty -mx
> >   $ git symbolic-ref refs/heads/test logs/aaa
> >   $ echo $(git rev-parse HEAD) > .git/logs/aaa
> >   $ git rev-parse test
> 
> Oh, curious. This should definitely be tightened in git-symbolic-ref(1)
> itself. The target should either be a root ref or something starting
> with "refs/". Anyway, that is of course outside of the scope of this
> patch series.
> 

I am curious here too when I did experiments when writing the code.
Because Junio have told me this could happen, so I dive into this.
However, it's not reasonable. If we want to tighten the rule, we need to
also let "git symbolic-ref" to align with the behavior. That's another
question though.

[snip]

> > diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
> > index e0e4519334..223974057d 100644
> > --- a/Documentation/fsck-msgids.txt
> > +++ b/Documentation/fsck-msgids.txt
> > @@ -52,6 +52,14 @@
> >  `emptyName`::
> >  	(WARN) A path contains an empty name.
> >  
> > +`escapeReferent`::
> > +	(INFO) The referent of a symref is outside the "ref" directory.
> 
> Proposal: 'The referent of a symbolic reference points neither to a root
> reference nor to a reference starting with "refs/".'
> 

That's much better.

> I'd also rename this to e.g. "symrefTargetIsNotAReference" or something
> like that, because it's not really about whether or not the referent is
> "escaping". It's a bit of a mouthful, but I don't really have a better
> name. So feel free to pick something different that describes the error
> better.
> 

I guess "symrefTargetIsNotAReference" is a little too long. If we decide
to convert it to error later. Why not just put it into the "badReferent"
fsck message?

So, I do not think we need to rename. As I have talked about, we don't
need to map error case to fsck message id one by one.

> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 57ac466b64..bd215c8d08 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3520,6 +3520,13 @@ static int files_fsck_symref_target(struct fsck_options *o,
> >  	orig_last_byte = referent->buf[orig_len - 1];
> >  	strbuf_rtrim(referent);
> >  
> > +	if (!starts_with(referent->buf, "refs/")) {
> > +		ret = fsck_report_ref(o, report,
> > +				      FSCK_MSG_ESCAPE_REFERENT,
> > +				      "referent '%s' is outside of refs/",
> > +				      referent->buf);
> > +	}
> > +
> >  	if (check_refname_format(referent->buf, 0)) {
> >  		ret = fsck_report_ref(o, report,
> >  				      FSCK_MSG_BAD_REFERENT,
> 
> This check is invalid, because referents can also point to root refs. So
> you should probably also add a call to `is_root_ref()` here.
> 

Thanks, I omit this situation here.

> We also have `is_pseudo_ref()`, and one might be tempted to also allow
> that. But pseudo refs aren't proper refs, so I'd argue that a symref
> pointing to a pseudo ref is invalid, too.
> 

I agree.

> Patrick

Thanks,
Jialuo
Patrick Steinhardt Oct. 7, 2024, 9:26 a.m. UTC | #3
On Mon, Oct 07, 2024 at 04:44:44PM +0800, shejialuo wrote:
> On Mon, Oct 07, 2024 at 08:58:55AM +0200, Patrick Steinhardt wrote:
> > On Sun, Sep 29, 2024 at 03:16:21PM +0800, shejialuo wrote:
> > I'd also rename this to e.g. "symrefTargetIsNotAReference" or something
> > like that, because it's not really about whether or not the referent is
> > "escaping". It's a bit of a mouthful, but I don't really have a better
> > name. So feel free to pick something different that describes the error
> > better.
> > 
> 
> I guess "symrefTargetIsNotAReference" is a little too long. If we decide
> to convert it to error later. Why not just put it into the "badReferent"
> fsck message?
> 
> So, I do not think we need to rename. As I have talked about, we don't
> need to map error case to fsck message id one by one.

Mostly because I disagree with this here. I think there should be a 1:1
mapping, and "badReferent" is too generic to provide that.

Patrick
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index e0e4519334..223974057d 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -52,6 +52,14 @@ 
 `emptyName`::
 	(WARN) A path contains an empty name.
 
+`escapeReferent`::
+	(INFO) The referent of a symref is outside the "ref" directory.
+	Although we allow create a symref pointing to the referent which
+	is outside the "ref" by using `git symbolic-ref`, we may tighten
+	the rule in the future. Report to the git@vger.kernel.org
+	mailing list if you see this error, as we need to know what tools
+	created such a file.
+
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
diff --git a/fsck.h b/fsck.h
index 979d75cb53..5ecee0fda5 100644
--- a/fsck.h
+++ b/fsck.h
@@ -80,6 +80,7 @@  enum fsck_msg_type {
 	FUNC(LARGE_PATHNAME, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(BAD_FILEMODE, INFO) \
+	FUNC(ESCAPE_REFERENT, INFO) \
 	FUNC(GITMODULES_PARSE, INFO) \
 	FUNC(GITIGNORE_SYMLINK, INFO) \
 	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 57ac466b64..bd215c8d08 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3520,6 +3520,13 @@  static int files_fsck_symref_target(struct fsck_options *o,
 	orig_last_byte = referent->buf[orig_len - 1];
 	strbuf_rtrim(referent);
 
+	if (!starts_with(referent->buf, "refs/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_ESCAPE_REFERENT,
+				      "referent '%s' is outside of refs/",
+				      referent->buf);
+	}
+
 	if (check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT,
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 718f6abb71..585f562245 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -370,4 +370,22 @@  test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref should be checked whether it is escaped' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs-back/heads/main\n" >$branch_dir_prefix/branch-bad-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-bad-1: escapeReferent: referent '\''refs-back/heads/main'\'' is outside of refs/
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err
+'
+
 test_done