diff mbox series

[v2,2/4] ref: add regular ref content check for files backend

Message ID Zs353oLDaw2SbNQs@ArchLinux (mailing list archive)
State Superseded
Headers show
Series add ref content check for files backend | expand

Commit Message

shejialuo Aug. 27, 2024, 4:07 p.m. UTC
We implicitly rely on "git-fsck(1)" to check the consistency of regular
refs. However, when parsing the regular refs for files backend by using
"files-backend.c::parse_loose_ref_contents", we allow the ref content to
be end with no newline or contain some garbages.

It may seem that we should report an error or warn fsck message to the
user about above situations. However, there may be some third-party
tools customizing the content of refs. We should not report an error
fsck message.

And we cannot either report a warn fsck message to the user. This is
because if the caller set the "strict" field in "fsck_options" to
to upgrade the fsck warnings to errors.

We should not allow the user to upgrade the fsck warnings to errors. It
might cause compatibility issue which will break the legacy repository.
So we add the following two fsck infos to represent the situation where
the ref content ends without newline or has garbages:

1. "refMissingNewline(INFO)": A valid ref does not end with newline.
2. "trailingRefContent(INFO)": A ref has trailing contents.

In "fsck.c::fsck_vreport", we will convert "FSCK_INFO" to "FSCK_WARN",
and we can still warn the user about these situations when using
"git-refs verify" without introducing compatibility issue.

In current "git-fsck(1)", it will report an error when the ref content
is bad, so we should following this to report an error to the user when
"parse_loose_ref_contents" fails. And we add a new fsck error message
called "badRefContent(ERROR)" to represent that a ref has a bad content.

In order to tell whether the ref has trailing content, add a new
parameter "trailing" to "parse_loose_ref_contents". Then introduce a new
function "files_fsck_refs_content" to check the regular refs to enhance
the "git-refs verify".

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 ++++
 fsck.h                        |  3 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 68 ++++++++++++++++++++++++++-
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 87 +++++++++++++++++++++++++++++++++++
 6 files changed, 167 insertions(+), 4 deletions(-)

Comments

shejialuo Aug. 27, 2024, 4:19 p.m. UTC | #1
On Wed, Aug 28, 2024 at 12:08:03AM +0800, shejialuo wrote:

> And we cannot either report a warn fsck message to the user. This is
> because if the caller set the "strict" field in "fsck_options" to
> to upgrade the fsck warnings to errors.
> 

Sorry for this paragraph, because I have changed commit message for this
patch, the range-diff part would be outdated. But the code is still the
same. So I hope this will not cause much trouble. And this paragraph
should be the following:

  And we cannot either report a warn fsck message to the user. This is
  because if the caller set the "strict" field in "fsck_options", fsck
  warnings will be converted to errors.

I will fix this in the next version until I receive enough feedback.

Thanks.
Junio C Hamano Aug. 27, 2024, 6:21 p.m. UTC | #2
shejialuo <shejialuo@gmail.com> writes:

> We implicitly rely on "git-fsck(1)" to check the consistency of regular
> refs. However, when parsing the regular refs for files backend by using
> "files-backend.c::parse_loose_ref_contents", we allow the ref content to
> be end with no newline or contain some garbages.

"to be end with" -> "to end with".
"or contain" -> "or to contain" (optional, I think).

Or "... the ref content without terminating newline, or with extra
bytes after the terminating newline."

> It may seem that we should report an error or warn fsck message to the
> user about above situations. However, there may be some third-party
> tools customizing the content of refs. We should not report an error
> fsck message.

    Even though we never created such loose refs ourselves, we have
    accepted such loose refs forever, so it is entirely possible
    that third-party tools may rely on such loose refs being valid.
    Let's notice such a "curiously formatted" loose ref files and
    tell the users our findings, so that we can assess the possible
    extent of damage if/when we retroactively tightened the parsing
    rules in the future.

> We should not allow the user to upgrade the fsck warnings to errors. It
> might cause compatibility issue which will break the legacy repository.

I am not sure this is a right thing to say.  If the user wants to
ensure that the tool they use in their repository, which may include
some third-party reimplementation of Git, would never create such a
(semi-)malformed loose ref files, it is within their right, and it
is the most reasonable way, to promote these "curiously formatted
loose ref" fsck warnings to errors.

Is your "We should not allow" above backed by code that prevents
them from promoting the warnings to errors, or is it merely a
declaration of your intention?

> So we add the following two fsck infos to represent the situation where
> the ref content ends without newline or has garbages:
>
> 1. "refMissingNewline(INFO)": A valid ref does not end with newline.
> 2. "trailingRefContent(INFO)": A ref has trailing contents.

OK.

> In "fsck.c::fsck_vreport", we will convert "FSCK_INFO" to "FSCK_WARN",
> and we can still warn the user about these situations when using
> "git-refs verify" without introducing compatibility issue.

OK.

> In current "git-fsck(1)", it will report an error when the ref content
> is bad, so we should following this to report an error to the user when
> "parse_loose_ref_contents" fails. And we add a new fsck error message
> called "badRefContent(ERROR)" to represent that a ref has a bad content.

Good.

> @@ -170,6 +173,12 @@
>  `nullSha1`::
>  	(WARN) Tree contains entries pointing to a null sha1.
>  
> +`refMissingNewline`::
> +	(INFO) A valid ref does not end with newline.
> +
> +`trailingRefContent`::
> +	(INFO) A ref has trailing contents.
> +
>  `treeNotSorted`::
>  	(ERROR) A tree is not properly sorted.

There is no mention of "you shouldn't promote these to error" here,
which is good.  But wouldn't we want to tell users to report such
curiously formatted loose refs, after figuring out who created them,
to help us to eventually make the check stricter in the future?

Git 3.0 boundary might be a good time to tighten interoperability
rules such that we won't accept anything we wouldn't have written
ourselves (not limited to loose ref format, but this applies to
anything on-disk or on-wire), but we'd need enough preparation if we
want to be able to do so in the future.

Thanks.
Patrick Steinhardt Aug. 28, 2024, 12:50 p.m. UTC | #3
On Wed, Aug 28, 2024 at 12:07:58AM +0800, shejialuo wrote:
> @@ -170,6 +173,12 @@
>  `nullSha1`::
>  	(WARN) Tree contains entries pointing to a null sha1.
>  
> +`refMissingNewline`::
> +	(INFO) A valid ref does not end with newline.

This reads a bit funny to me. If the ref is valid, why do we complain?

Maybe this would read better if you said "An otherwise valid ref does
not end with a newline".

> @@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
>  				  const char *refs_check_dir,
>  				  struct dir_iterator *iter);
>  
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	struct fsck_ref_report report = {0};
> +	const char *trailing = NULL;
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISREG(iter->st.st_mode)) {

This is still indenting the whole body. You mentioned that you don't
want to use `goto`, but in our codebase it's actually quite idiomatic.
And you already use it anyway.

> diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> index 71a4d1a5ae..7c1910d784 100755
> --- a/t/t0602-reffiles-fsck.sh
> +++ b/t/t0602-reffiles-fsck.sh
> @@ -89,4 +89,91 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
>  	test_must_be_empty err
>  '
>  
> +test_expect_success 'regular ref content should be checked' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	branch_dir_prefix=.git/refs/heads &&
> +	tag_dir_prefix=.git/refs/tags &&
> +	cd repo &&
> +	git commit --allow-empty -m initial &&
> +	git checkout -b branch-1 &&
> +	git tag tag-1 &&
> +	git commit --allow-empty -m second &&
> +	git checkout -b branch-2 &&
> +	git tag tag-2 &&
> +	git checkout -b a/b/tag-2 &&

Wouldn't it be sufficient to only create a single commit, e.g. via
`test_commit`? From all I can see all you need is some object ID, so
creating the tags and second commit doesn't seem to be necessary.

> +	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&

We don't typically have spaces after the redirect. So you should remove
them here and in all the subsequent instances.

> +	git refs verify 2>err &&
> +	cat >expect <<-EOF &&
> +	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
> +	EOF
> +	rm $branch_dir_prefix/branch-1-no-newline &&
> +	test_cmp expect err &&

I was wondering whether each of these cases should be a separate test,
but that may be a bit wasteful. Alternatively, can we maybe set up a
single repository with all the garbage that we want to verify and then
double check that executing `git refs verify` surfaces them all in a
single invocation?

Patrick
Patrick Steinhardt Aug. 28, 2024, 12:50 p.m. UTC | #4
On Tue, Aug 27, 2024 at 11:21:34AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> > @@ -170,6 +173,12 @@
> >  `nullSha1`::
> >  	(WARN) Tree contains entries pointing to a null sha1.
> >  
> > +`refMissingNewline`::
> > +	(INFO) A valid ref does not end with newline.
> > +
> > +`trailingRefContent`::
> > +	(INFO) A ref has trailing contents.
> > +
> >  `treeNotSorted`::
> >  	(ERROR) A tree is not properly sorted.
> 
> There is no mention of "you shouldn't promote these to error" here,
> which is good.  But wouldn't we want to tell users to report such
> curiously formatted loose refs, after figuring out who created them,
> to help us to eventually make the check stricter in the future?
> 
> Git 3.0 boundary might be a good time to tighten interoperability
> rules such that we won't accept anything we wouldn't have written
> ourselves (not limited to loose ref format, but this applies to
> anything on-disk or on-wire), but we'd need enough preparation if we
> want to be able to do so in the future.

I quite like this idea. Jialuo, would you maybe want to include another
patch on top that adds a paragraph to Documentation/BreakingChanges.txt?
It should note that this is not yet settled and depends on whether or
not we see complaints with your new checks.

I guess another prereq for the change is to integrate `git refs verify`
with git-fsck(1), because otherwise people likely wouldn't see the new
messages in the first place.

Patrick
shejialuo Aug. 28, 2024, 2:31 p.m. UTC | #5
On Tue, Aug 27, 2024 at 11:21:34AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We implicitly rely on "git-fsck(1)" to check the consistency of regular
> > refs. However, when parsing the regular refs for files backend by using
> > "files-backend.c::parse_loose_ref_contents", we allow the ref content to
> > be end with no newline or contain some garbages.
> 
> "to be end with" -> "to end with".
> "or contain" -> "or to contain" (optional, I think).
> 
> Or "... the ref content without terminating newline, or with extra
> bytes after the terminating newline."
> 

Thanks, I will fix this in the next version.

> > It may seem that we should report an error or warn fsck message to the
> > user about above situations. However, there may be some third-party
> > tools customizing the content of refs. We should not report an error
> > fsck message.
> 
> Even though we never created such loose refs ourselves, we have
> accepted such loose refs forever, so it is entirely possible
> that third-party tools may rely on such loose refs being valid.
> Let's notice such a "curiously formatted" loose ref files and
> tell the users our findings, so that we can assess the possible
> extent of damage if/when we retroactively tightened the parsing
> rules in the future.
> 

I think I could organize the above to the commit message to better show
the motivation why we should not report an error fsck message.

> > We should not allow the user to upgrade the fsck warnings to errors. It
> > might cause compatibility issue which will break the legacy repository.
> 
> I am not sure this is a right thing to say.  If the user wants to
> ensure that the tool they use in their repository, which may include
> some third-party reimplementation of Git, would never create such a
> (semi-)malformed loose ref files, it is within their right, and it
> is the most reasonable way, to promote these "curiously formatted
> loose ref" fsck warnings to errors.
> 
> Is your "We should not allow" above backed by code that prevents
> them from promoting the warnings to errors, or is it merely a
> declaration of your intention?
> 

I have introduced some misunderstanding here. In the previous paragraph,
I have mentioned that if the caller set the "strict" field in
"fsck_options", the fsck warns would be automatically converted to fsck
errors which may cause some trouble.

So I think here we should move this paragraph just after the previous
paragraph to indicate why we do want to make a info fsck message here.
Actually, the user could still explicitly use the following command

  git -c fsck.refMissingNewline=error refs verify

to upgrade the fsck info to fsck error. But if the user use "--strict"
like the following:

  git refs verify --strict

The fsck warns would be automatically converted to fsck errors. But
actually at current, we do not want to the user implicitly upgrade fsck
warns to fsck errors by using "--strict" flag. That's why we need to
introduce the "FSCK_INO" here.

Actually, I was inspired by the Jeff King's commit:

  4dd3b045f5 (fsck: downgrade tree badFilemode to "info", 2022-08-10)

In this commit, Jeff downgrades badFilemode to "info" to avoid above
situation. I will improve the commit message to make things clearer.

However, from my perspective, the semantic of "FSCK_INFO" is a little
unsuitable here. The comment says:

  /* infos (reported as warnings, but ignored by default) */

The "ignored by default" here is very confusing. Actually, we make the
"info" lower than the "warn" to avoid automatically converting the "warn"
to "error" by setting "strict" field in "fsck_options".

But "ignored by default" will make the user think "oh, it's info, but we
report it as warnings". We cannot know the real intention of the
"FSCK_INFO" unless we have above context.

But I guess this is too far from the intention of this patch. We may
improve this later.

> > @@ -170,6 +173,12 @@
> >  `nullSha1`::
> >  	(WARN) Tree contains entries pointing to a null sha1.
> >  
> > +`refMissingNewline`::
> > +	(INFO) A valid ref does not end with newline.
> > +
> > +`trailingRefContent`::
> > +	(INFO) A ref has trailing contents.
> > +
> >  `treeNotSorted`::
> >  	(ERROR) A tree is not properly sorted.
> 
> There is no mention of "you shouldn't promote these to error" here,
> which is good.  But wouldn't we want to tell users to report such
> curiously formatted loose refs, after figuring out who created them,
> to help us to eventually make the check stricter in the future?
> 

From the review from the Patrick, I will add another patch in the
"Documentation/BreakingChanges.txt" later.

> Git 3.0 boundary might be a good time to tighten interoperability
> rules such that we won't accept anything we wouldn't have written
> ourselves (not limited to loose ref format, but this applies to
> anything on-disk or on-wire), but we'd need enough preparation if we
> want to be able to do so in the future.
> 
> Thanks.
> 

Thanks,
Jialuo
shejialuo Aug. 28, 2024, 2:41 p.m. UTC | #6
On Wed, Aug 28, 2024 at 02:50:01PM +0200, Patrick Steinhardt wrote:
> On Wed, Aug 28, 2024 at 12:07:58AM +0800, shejialuo wrote:
> > @@ -170,6 +173,12 @@
> >  `nullSha1`::
> >  	(WARN) Tree contains entries pointing to a null sha1.
> >  
> > +`refMissingNewline`::
> > +	(INFO) A valid ref does not end with newline.
> 
> This reads a bit funny to me. If the ref is valid, why do we complain?
> 
> Maybe this would read better if you said "An otherwise valid ref does
> not end with a newline".
> 

I think we should just drop the "valid" here. Because for symref, it
may miss newline and is NOT valid.

I will improve this in the next version.

> > @@ -3430,6 +3434,65 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
> >  				  const char *refs_check_dir,
> >  				  struct dir_iterator *iter);
> >  
> > +static int files_fsck_refs_content(struct ref_store *ref_store,
> > +				   struct fsck_options *o,
> > +				   const char *refs_check_dir,
> > +				   struct dir_iterator *iter)
> > +{
> > +	struct strbuf ref_content = STRBUF_INIT;
> > +	struct strbuf referent = STRBUF_INIT;
> > +	struct strbuf refname = STRBUF_INIT;
> > +	struct fsck_ref_report report = {0};
> > +	const char *trailing = NULL;
> > +	unsigned int type = 0;
> > +	int failure_errno = 0;
> > +	struct object_id oid;
> > +	int ret = 0;
> > +
> > +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> > +	report.path = refname.buf;
> > +
> > +	if (S_ISREG(iter->st.st_mode)) {
> 
> This is still indenting the whole body. You mentioned that you don't
> want to use `goto`, but in our codebase it's actually quite idiomatic.
> And you already use it anyway.
> 

I think so, indenting is noisy. Will use "goto" to avoid indenting.

> > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
> > index 71a4d1a5ae..7c1910d784 100755
> > --- a/t/t0602-reffiles-fsck.sh
> > +++ b/t/t0602-reffiles-fsck.sh
> > @@ -89,4 +89,91 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
> >  	test_must_be_empty err
> >  '
> >  
> > +test_expect_success 'regular ref content should be checked' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	branch_dir_prefix=.git/refs/heads &&
> > +	tag_dir_prefix=.git/refs/tags &&
> > +	cd repo &&
> > +	git commit --allow-empty -m initial &&
> > +	git checkout -b branch-1 &&
> > +	git tag tag-1 &&
> > +	git commit --allow-empty -m second &&
> > +	git checkout -b branch-2 &&
> > +	git tag tag-2 &&
> > +	git checkout -b a/b/tag-2 &&
> 
> Wouldn't it be sufficient to only create a single commit, e.g. via
> `test_commit`? From all I can see all you need is some object ID, so
> creating the tags and second commit doesn't seem to be necessary.
> 

I agree with this. I will clean the code for the next version.

> > +	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&
> 
> We don't typically have spaces after the redirect. So you should remove
> them here and in all the subsequent instances.
> 

I will clean the code style here.

> > +	git refs verify 2>err &&
> > +	cat >expect <<-EOF &&
> > +	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
> > +	EOF
> > +	rm $branch_dir_prefix/branch-1-no-newline &&
> > +	test_cmp expect err &&
> 
> I was wondering whether each of these cases should be a separate test,
> but that may be a bit wasteful. Alternatively, can we maybe set up a
> single repository with all the garbage that we want to verify and then
> double check that executing `git refs verify` surfaces them all in a
> single invocation?
> 

Actually, I have also thought about separating the tests which may
clear and I dropped this idea due to the reason the same as yours. I DO
agree that we should set up a single repository with all the garbage
that we want to verify. This is necessary.

Thanks,
Jialuo
Junio C Hamano Aug. 28, 2024, 3:30 p.m. UTC | #7
Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Aug 28, 2024 at 12:07:58AM +0800, shejialuo wrote:
>> @@ -170,6 +173,12 @@
>>  `nullSha1`::
>>  	(WARN) Tree contains entries pointing to a null sha1.
>>  
>> +`refMissingNewline`::
>> +	(INFO) A valid ref does not end with newline.
>
> This reads a bit funny to me. If the ref is valid, why do we complain?

I think you understood after reading the series through and
responded to my "curiously formatted" comment.  I understand that
these marked as INFO are not about "to complain" but are for us to
ask the user to report so that we can learn of any third-party tools
that may get in our way to later tighten the parsing rules
retroactively.  

> Maybe this would read better if you said "An otherwise valid ref does
> not end with a newline".

So I do agree that the text above is less than optimal.  It is "this
is valid, but something we wouldn't have written.  Who creates such
a ref?"
Junio C Hamano Aug. 28, 2024, 4:32 p.m. UTC | #8
Patrick Steinhardt <ps@pks.im> writes:

>> Git 3.0 boundary might be a good time to tighten interoperability
>> rules such that we won't accept anything we wouldn't have written
>> ourselves (not limited to loose ref format, but this applies to
>> anything on-disk or on-wire), but we'd need enough preparation if we
>> want to be able to do so in the future.
>
> I quite like this idea.

I wouldn't say that I wrote it as a devil's advocate comment, but I
was hoping that somebody quote Postel in response, as the above
advocates a directly opposite position, which I wouldn't usually
take.

> I guess another prereq for the change is to integrate `git refs verify`
> with git-fsck(1), because otherwise people likely wouldn't see the new
> messages in the first place.

Absolutely.
Junio C Hamano Aug. 28, 2024, 4:45 p.m. UTC | #9
shejialuo <shejialuo@gmail.com> writes:

>> > @@ -170,6 +173,12 @@
>> >  `nullSha1`::
>> >  	(WARN) Tree contains entries pointing to a null sha1.
>> >  
>> > +`refMissingNewline`::
>> > +	(INFO) A valid ref does not end with newline.
>> > +
>> > +`trailingRefContent`::
>> > +	(INFO) A ref has trailing contents.
>> > +
>> >  `treeNotSorted`::
>> >  	(ERROR) A tree is not properly sorted.
>> 
>> There is no mention of "you shouldn't promote these to error" here,
>> which is good.  But wouldn't we want to tell users to report such
>> curiously formatted loose refs, after figuring out who created them,
>> to help us to eventually make the check stricter in the future?
>
> From the review from the Patrick, I will add another patch in the
> "Documentation/BreakingChanges.txt" later.

As that documentation is not end-user facing, it is orthogonal and
unrelated.

What I meant was that we need to tell the user that the refs they
have (and the third-party tools they used to create them) may be
declared invalid in a future version of Git and they would want to
report it, in order to influence our possible future direction.  And
we need to do so in an end-user facing documentation (i.e. the part
of the patch quoted above) and/or in the info messages themselves.
Patrick Steinhardt Aug. 29, 2024, 10:19 a.m. UTC | #10
On Wed, Aug 28, 2024 at 09:32:16AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> Git 3.0 boundary might be a good time to tighten interoperability
> >> rules such that we won't accept anything we wouldn't have written
> >> ourselves (not limited to loose ref format, but this applies to
> >> anything on-disk or on-wire), but we'd need enough preparation if we
> >> want to be able to do so in the future.
> >
> > I quite like this idea.
> 
> I wouldn't say that I wrote it as a devil's advocate comment, but I
> was hoping that somebody quote Postel in response, as the above
> advocates a directly opposite position, which I wouldn't usually
> take.

For context, this is the quote you probably refer to: "be conservative
in what you do, be liberal in what you accept from others".

In any case, I still think it is sensible to at least warn about refs
like this. It is unexpected to me and may indicate real issues in the
understanding of others that end up writing to the refdb. If there are
implementations of Git out there that intentionally use our lax parsing
to e.g. stuff additional metadata into refs, then we need to tell them
that this is not okay.

This may have been fine in the past where there was only a single ref
backend, but now with multiple ref backends the picture has changed in
my opinion.

Patrick
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..fc074fc571 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@ 
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has a bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
@@ -170,6 +173,12 @@ 
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A valid ref does not end with newline.
+
+`trailingRefContent`::
+	(INFO) A ref has trailing contents.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@  enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
@@ -84,6 +85,8 @@  enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 74de3d3009..5e74881945 100644
--- a/refs.c
+++ b/refs.c
@@ -1758,7 +1758,7 @@  static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index d6fc3bd67c..69c00073eb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -560,7 +560,7 @@  static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -597,7 +597,7 @@  static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -619,6 +619,10 @@  int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3430,6 +3434,65 @@  typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	struct fsck_ref_report report = {0};
+	const char *trailing = NULL;
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISREG(iter->st.st_mode)) {
+		if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+			ret = error_errno(_("%s/%s: unable to read the ref"),
+					  refs_check_dir, iter->relative_path);
+			goto cleanup;
+		}
+
+		if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+					     ref_content.buf, &oid, &referent,
+					     &type, &trailing, &failure_errno)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_BAD_REF_CONTENT,
+					      "invalid ref content");
+			goto cleanup;
+		}
+
+		if (!(type & REF_ISSYMREF)) {
+			if (*trailing == '\0') {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_REF_MISSING_NEWLINE,
+						      "missing newline");
+				goto cleanup;
+			}
+
+			if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_TRAILING_REF_CONTENT,
+						      "trailing garbage in ref");
+				goto cleanup;
+			}
+		}
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3575,7 @@  static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@  struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..7c1910d784 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,91 @@  test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git commit --allow-empty -m second &&
+	git checkout -b branch-2 &&
+	git tag tag-2 &&
+	git checkout -b a/b/tag-2 &&
+
+	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-1-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/branch-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\n\na" "$(git rev-parse tag-2)" > $tag_dir_prefix/tag-2-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-2-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-2-garbage &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%sx" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-1-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $tag_dir_prefix/tag-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-2-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $branch_dir_prefix/a/b/branch-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-2-bad &&
+	test_cmp expect err
+'
+
 test_done