diff mbox series

[v1,2/4] ref: add regular ref content check for files backend

Message ID ZsIM2DRDbJsvNjAM@ArchLinux (mailing list archive)
State Superseded
Headers show
Series add ref content check for files backend | expand

Commit Message

shejialuo Aug. 18, 2024, 3:01 p.m. UTC
We implicitly reply on "git-fsck(1)" to check the consistency of regular
refs. However, when parsing the regular refs for files backend, we allow
the ref content to end with no newline or contain some garbages. We
should warn the user about above situations.

In order to provide above functionality, enhance the "git-refs verify"
command by adding consistency check for regular refs for files backend.

Add the following three fsck messages to represent the above situations:

1. "badRefContent(ERROR)": A ref has a bad content.
2. "refMissingNewline(WARN)": A valid ref does not end with newline.
3. "trailingRefContent(WARN)": A ref has trailing contents.

In order to tell whether the ref has trailing content, add a new
parameter "trailing" to "parse_loose_ref_contents". Then introduce a new
function "files_fsck_refs_content" to check the regular refs.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  9 ++++
 fsck.h                        |  3 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 67 ++++++++++++++++++++++++++-
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 87 +++++++++++++++++++++++++++++++++++
 6 files changed, 166 insertions(+), 4 deletions(-)

Comments

Junio C Hamano Aug. 20, 2024, 4:49 p.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> We implicitly reply on "git-fsck(1)" to check the consistency of regular

"reply" -> "rely", I think.

> refs. However, when parsing the regular refs for files backend, we allow
> the ref content to end with no newline or contain some garbages. We
> should warn the user about above situations.

Hmph, should we?  

If the content is short (e.g., in SHA-1 repository it only has 39
hexdigit) even if that may be sufficient to uniquely name the
object, we should warn about it, of course.  A file that has
64-hexdigit with a terminating LF at the end may be a valid file to
be in $GIT_DIR/refs/ hierarchy in a SHA-256 repository, but such a
file in a SHA-1 repository should also be subject to a warning, as
it could be a sign that somebody screwed up object format
conversion.

But a file that has only 40-hexdigit without a terminating LF at the
end?  Or a file that has 40-hexdigit followed by a CRLF instead of
LF?  Or a file that has the identical content as a valid ref on its
first line, but has extra stuff on its second and subsequent lines?

What does the name-to-object-name-mapping layer (aka "get_oid" API)
do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
they are treated as valid ref in the "normal" code path, it needs a
strong justification to tighten the rules retroactively, much
stronger than "Our current code, and any of our older versions,
would have written such a file as a loose ref with our code."

"What are we protecting us from with this tightening?" is the
question we should be asking ourselves, when evaluating each of
these new rules that fsck used not to care about.
shejialuo Aug. 21, 2024, 2:21 p.m. UTC | #2
On Tue, Aug 20, 2024 at 09:49:23AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We implicitly reply on "git-fsck(1)" to check the consistency of regular
> 
> "reply" -> "rely", I think.

I will fix in the next version.

> > refs. However, when parsing the regular refs for files backend, we allow
> > the ref content to end with no newline or contain some garbages. We
> > should warn the user about above situations.
> 
> Hmph, should we?  
>

I am very sorry about this. Actually, I should not use "should". I don't
give compelling reasons here why we need to introduce such checks. I
just told the reviewer "we should warn". I will try to avoid above
mistakes where I didn't give enough motivation.

> What does the name-to-object-name-mapping layer (aka "get_oid" API)
> do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
> they are treated as valid ref in the "normal" code path, it needs a
> strong justification to tighten the rules retroactively, much
> stronger than "Our current code, and any of our older versions,
> would have written such a file as a loose ref with our code."
> 

Let me first talk about what will happen when we use the following
command:

  $ git checkout bad-branch

I use "gdb" to find the following call sequence:

  "cmd_checkout" -> "checkout_main" -> "parse_branchname_arg" ->
  ... -> "get_oid_basic" -> "repo_dwim_ref" -> ... ->
  "parse_loose_ref_contents" -> "parse_oid_hex_algop" ->
  "get_oid_hex_algop"

I dive into the "object-name.c::get_oid_basic" function. If we pass the
actually "oid", it will call the "get_oid_hex_algop" directly.
Otherwise, it will execute the following code:

  if (!len && reflog_len)
      refs_found = ...;
  else if (reflog_len)
      refs_found = ...
  else
      refs_found = repo_dwim_ref(r, str, len, oid, &real_ref, !fatal);

  if (!refs_found)
      return -1;

As we can see, when there is no corresponding refs found by calling
"repo_dwim_ref" function, "get_oid_basic" function will return -1. And
here we could have one important conclusion:

  The "get_oid_basic" function relies on "repo_dwim_ref" function to
  parse the ref and get the pointee "oid". So, it uses the interfaces
  provided by ref backend.

Next, we look at what will "parse_loose_ref_contents" do for regular
refs.

  int parse_loose_ref_contents(...)
  {
      ...
      if (parse_oid_hex_algop(buf, oid, *p, algop) ||
         (*p != '\0' && !isspace(*p))) {
            *type |= REF_ISBROKEN;
            *failure_errno = EINVAL;
            return -1;
      }
      return 0;
  }

Let's continue to see what "parse_oid_hex_algop" will do:

  int parse_oid_hex_algop(...)
  {
      int ret = get_oid_hex_algop(hex, oid, algop);
      if (!ret) {
          *end = hex + algop->hexsz;
      }
      return ret;
  }

If the result of "get_oid_hex_algop" is successful. We will set the
"end" pointer here. The "get_oid_hex_algop" will eventually call the
"get_hash_hex_algop" function

  static int get_hash_hex_algop(...)
  {
      int i;
      for (i = 0; i < algop->rawsz; i++) {
          int val = hex2chr(hex);
          if (val < 0)
              return -1;
          *hash+= = val;
          hex += 2;
      }
      return 0;
  }

This function will convert the hex to char by the raw size of the
algorithm. And by the following code, we could conclude the following
things:

1. "41053a9084501db79c72b14e8a5a0b67de3f91ae" is correct, because it
will be parsed successfully by "get_hash_hex_algop" and "*p == '\0'".
2. "41053a9084501db79c72b14e8a5a0b67de3f91aef" is not correct, it will
be parsed successfully by "get_hash_hex_algop" but "*p != '\0'"
and "isspace(*p)" is false. So the check in "parse_loose_ref_contents"
cannot be passed.
3. "1053a9084501db79c72b14e8a5a0b67de3f91a" is not correct, it cannot be
parsed successfully by "get_hash_hex_algop".
4. "41053a9084501db79c72b14e8a5a0b67de3f91ae garbage" is correct,
because it will be parsed successfully by "get_hash_hex_algop" and
"isspace(*p)" is true.

By the above discussion, I could answer you comments one by one.

> If the content is short (e.g., in SHA-1 repository it only has 39
> hexdigit) even if that may be sufficient to uniquely name the
> object, we should warn about it, of course.

When the content is short, although it may be sufficient to identify the
object, we should still report an error here. This is because we care
about the ref. As we can see from above discussion, the "object-name.c"
totally relies on the interfaces provided by the ref backend. And
"get_hash_hex_algop" is unhappy about this situation. And eventually the
"object-name.c::get_oid_basic" will be unhappy, return -1.

> A file that has 64-hexdigit with a terminating LF at the end may be
> a valid file to be in $GIT_DIR/refs/ hierarchy in a SHA-256
> repository, but such a file in a SHA-1 repository should also be
> subject to a warning, as it could be a sign that somebody screwed up
> object format conversion.

I agree with this idea. But in this implementation, we want to reuse the
"parse_loose_ref_contents" to check the consistency of the regular refs.
If we are in a SHA-1 repository, "parse_loose_ref_contents" will be
unhappy about this. However, I don't think we need to provide user that
"the content is 64-hexdigit ...". We just report "bad ref content" to
the user. This will also indicate the user something is wrong, you need
to check the ref database.

> But a file that has only 40-hexdigit without a terminating LF at the
> end?  Or a file that has 40-hexdigit followed by a CRLF instead of
> LF?  Or a file that has the identical content as a valid ref on its
> first line, but has extra stuff on its second and subsequent lines?

This is the core problem why we want to introduce more strict check.
Because in the current "parse_loose_ref_contents" function, as long as
the next byte of the end of the hex is '\0', spaces, LF, CRLF. We could
know that the content of the ref is OK.

But in my view, we should warn the user about this situation. This is
because in the original code, we do not check the ref strictly for files
backend. And I think at current, the normal user should not interact
with the git database. If there are some garbages we found in the ref
database, I guess this could be a sign for the user: "Watch out! there
may be something wrong".

> "What are we protecting us from with this tightening?" is the
> question we should be asking ourselves, when evaluating each of
> these new rules that fsck used not to care about.

That's a hard question, really. I find it hard to know what should we
do? The motivation is hard to describe. But I think this reply could
make thing more clear here.

Thanks,
Jialuo
Patrick Steinhardt Aug. 22, 2024, 8:46 a.m. UTC | #3
On Tue, Aug 20, 2024 at 09:49:23AM -0700, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We implicitly reply on "git-fsck(1)" to check the consistency of regular
> 
> "reply" -> "rely", I think.
> 
> > refs. However, when parsing the regular refs for files backend, we allow
> > the ref content to end with no newline or contain some garbages. We
> > should warn the user about above situations.
> 
> Hmph, should we?  
> 
> If the content is short (e.g., in SHA-1 repository it only has 39
> hexdigit) even if that may be sufficient to uniquely name the
> object, we should warn about it, of course.  A file that has
> 64-hexdigit with a terminating LF at the end may be a valid file to
> be in $GIT_DIR/refs/ hierarchy in a SHA-256 repository, but such a
> file in a SHA-1 repository should also be subject to a warning, as
> it could be a sign that somebody screwed up object format
> conversion.
> 
> But a file that has only 40-hexdigit without a terminating LF at the
> end?  Or a file that has 40-hexdigit followed by a CRLF instead of
> LF?  Or a file that has the identical content as a valid ref on its
> first line, but has extra stuff on its second and subsequent lines?
> 
> What does the name-to-object-name-mapping layer (aka "get_oid" API)
> do when they see such a file in the $GIT_DIR/refs/ hierarchy?  If
> they are treated as valid ref in the "normal" code path, it needs a
> strong justification to tighten the rules retroactively, much
> stronger than "Our current code, and any of our older versions,
> would have written such a file as a loose ref with our code."
> 
> "What are we protecting us from with this tightening?" is the
> question we should be asking ourselves, when evaluating each of
> these new rules that fsck used not to care about.

I'd say filesystem corruption, buggy implementations and compatibility
with other implementations of Git. The format for refs does not allow
for any other information than either an object ID for plain refs, and
the referee for symbolic refs. The fact that we do accept that is a mere
implementation detail because we reuse the same function to parse refs
that we also use for pseudorefs. And these _can_ have additional data.

So any reference that contains additional data is not a proper ref and
thus should be warned about from my point of view. No Git tooling should
write them, so if something does it's a red flag to me.

Patrick
Patrick Steinhardt Aug. 22, 2024, 8:48 a.m. UTC | #4
On Sun, Aug 18, 2024 at 11:01:44PM +0800, shejialuo wrote:
> +static int files_fsck_refs_content(struct ref_store *ref_store,
> +				   struct fsck_options *o,
> +				   const char *refs_check_dir,
> +				   struct dir_iterator *iter)
> +{
> +	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
> +	struct strbuf ref_content = STRBUF_INIT;
> +	struct strbuf referent = STRBUF_INIT;
> +	struct strbuf refname = STRBUF_INIT;
> +	const char *trailing = NULL;
> +	unsigned int type = 0;
> +	int failure_errno = 0;
> +	struct object_id oid;
> +	int ret = 0;
> +
> +	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
> +	report.path = refname.buf;
> +
> +	if (S_ISREG(iter->st.st_mode)) {

We can avoid having to indent the remainder of this function if we `goto
cleanup` here.

Patrick
shejialuo Aug. 22, 2024, 12:06 p.m. UTC | #5
On Thu, Aug 22, 2024 at 10:48:30AM +0200, Patrick Steinhardt wrote:
>
> We can avoid having to indent the remainder of this function if we `goto
> cleanup` here.
> 

Yes, actually I have thought about this way. But I don't want to use
"goto". However, ident is noisy too. I will fix in the next version.
Junio C Hamano Aug. 22, 2024, 4:13 p.m. UTC | #6
Patrick Steinhardt <ps@pks.im> writes:

> So any reference that contains additional data is not a proper ref and
> thus should be warned about from my point of view. No Git tooling should
> write them, so if something does it's a red flag to me.

If you find such a file in $GIT_DIR/refs/ hierarchy, because our
consumer side has been looser than necessary forever, and we never
have written such a file ourselves, it is a sign that a third-party
tool wrote it, and that the third-party tool used our reader
implementation as the specification.  That is why I am hesitant to
retroactively tighten the rules like this patch does.

Thanks.
Junio C Hamano Aug. 22, 2024, 4:17 p.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> So any reference that contains additional data is not a proper ref and
>> thus should be warned about from my point of view. No Git tooling should
>> write them, so if something does it's a red flag to me.
>
> If you find such a file in $GIT_DIR/refs/ hierarchy, because our
> consumer side has been looser than necessary forever, and we never
> have written such a file ourselves, it is a sign that a third-party
> tool wrote it, and that the third-party tool used our reader
> implementation as the specification.  That is why I am hesitant to
> retroactively tighten the rules like this patch does.

I forgot to add my recommended course of action, without which a
review is worth much less X-<.

I am OK if we tightened the rules retroactively, as long as it
starts as a probing check (i.e. "info: we found an unusual thing
in the wild. Please report this to us so that we can ask you for
more details like how such a ref that would violate a rule that was
retroactively tightened got there", not "error: malformed ref").

Thanks.
Patrick Steinhardt Aug. 23, 2024, 7:21 a.m. UTC | #8
On Thu, Aug 22, 2024 at 09:17:08AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >> So any reference that contains additional data is not a proper ref and
> >> thus should be warned about from my point of view. No Git tooling should
> >> write them, so if something does it's a red flag to me.
> >
> > If you find such a file in $GIT_DIR/refs/ hierarchy, because our
> > consumer side has been looser than necessary forever, and we never
> > have written such a file ourselves, it is a sign that a third-party
> > tool wrote it, and that the third-party tool used our reader
> > implementation as the specification.  That is why I am hesitant to
> > retroactively tighten the rules like this patch does.
> 
> I forgot to add my recommended course of action, without which a
> review is worth much less X-<.
> 
> I am OK if we tightened the rules retroactively, as long as it
> starts as a probing check (i.e. "info: we found an unusual thing
> in the wild. Please report this to us so that we can ask you for
> more details like how such a ref that would violate a rule that was
> retroactively tightened got there", not "error: malformed ref").

Okay, that makes sense. The fsck infrastructure does have info message
types, so this should certainly be doable. I'd argue that we might want
to make this an `FSCK_WARN`, but I'm also fine with iteratively bumping
up the severity from INFO to WARN to ERROR when we don't observe any
complaints about this tightening.

Patrick
shejialuo Aug. 23, 2024, 11:30 a.m. UTC | #9
On Fri, Aug 23, 2024 at 09:21:14AM +0200, Patrick Steinhardt wrote:
> On Thu, Aug 22, 2024 at 09:17:08AM -0700, Junio C Hamano wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> > 
> > > Patrick Steinhardt <ps@pks.im> writes:
> > >
> > >> So any reference that contains additional data is not a proper ref and
> > >> thus should be warned about from my point of view. No Git tooling should
> > >> write them, so if something does it's a red flag to me.
> > >
> > > If you find such a file in $GIT_DIR/refs/ hierarchy, because our
> > > consumer side has been looser than necessary forever, and we never
> > > have written such a file ourselves, it is a sign that a third-party
> > > tool wrote it, and that the third-party tool used our reader
> > > implementation as the specification.  That is why I am hesitant to
> > > retroactively tighten the rules like this patch does.
> > 
> > I forgot to add my recommended course of action, without which a
> > review is worth much less X-<.
> > 
> > I am OK if we tightened the rules retroactively, as long as it
> > starts as a probing check (i.e. "info: we found an unusual thing
> > in the wild. Please report this to us so that we can ask you for
> > more details like how such a ref that would violate a rule that was
> > retroactively tightened got there", not "error: malformed ref").
> 
> Okay, that makes sense. The fsck infrastructure does have info message
> types, so this should certainly be doable. I'd argue that we might want
> to make this an `FSCK_WARN`, but I'm also fine with iteratively bumping
> up the severity from INFO to WARN to ERROR when we don't observe any
> complaints about this tightening.
> 

From the perspective of the implementation, there is no difference
between the info and warn. But I have a doubt here. Do we really
distinguish the info and warn in code?

Let's see the "fsck_vreport" (although this is a new function, but I
never change the implementation) function:

  static int fsck_vreport(...)
  {
      enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);

      if (msg_type == FSCK_FATAL)
          msg_type = FSCK_ERROR;
      if (msg_type == FSCK_INFO)
          msg_type = FSCK_WARN;

      ...
  }

We eventually convert the "FSCK_INFO" to "FSCK_WARN". Confusing.
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..1688c2f1fe 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@ 
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has a bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
@@ -170,6 +173,12 @@ 
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(WARN) A valid ref does not end with newline.
+
+`trailingRefContent`::
+	(WARN) A ref has trailing contents.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 8894394d16..975d9b9da9 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@  enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
@@ -73,6 +74,8 @@  enum fsck_msg_type {
 	FUNC(HAS_DOTDOT, WARN) \
 	FUNC(HAS_DOTGIT, WARN) \
 	FUNC(NULL_SHA1, WARN) \
+	FUNC(REF_MISSING_NEWLINE, WARN) \
+	FUNC(TRAILING_REF_CONTENT, WARN) \
 	FUNC(ZERO_PADDED_FILEMODE, WARN) \
 	FUNC(NUL_IN_COMMIT, WARN) \
 	FUNC(LARGE_PATHNAME, WARN) \
diff --git a/refs.c b/refs.c
index 74de3d3009..5e74881945 100644
--- a/refs.c
+++ b/refs.c
@@ -1758,7 +1758,7 @@  static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 725a4f52e3..ae71692f36 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -560,7 +560,7 @@  static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -597,7 +597,7 @@  static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -619,6 +619,10 @@  int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3430,6 +3434,64 @@  typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct fsck_ref_report report = FSCK_REF_REPORT_DEFAULT;
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	const char *trailing = NULL;
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISREG(iter->st.st_mode)) {
+		if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+			ret = error_errno(_("%s/%s: unable to read the ref"),
+					  refs_check_dir, iter->relative_path);
+			goto cleanup;
+		}
+
+		if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+					    ref_content.buf, &oid, &referent,
+					    &type, &trailing, &failure_errno)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_BAD_REF_CONTENT,
+					      "invalid ref content");
+			goto cleanup;
+		}
+
+		if (!(type & REF_ISSYMREF)) {
+			if (*trailing == '\0') {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_REF_MISSING_NEWLINE,
+						      "missing newline");
+				goto cleanup;
+			}
+
+			if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
+				ret = fsck_report_ref(o, &report,
+						      FSCK_MSG_TRAILING_REF_CONTENT,
+						      "trailing garbage in ref");
+				goto cleanup;
+			}
+		}
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3574,7 @@  static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@  struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..7c1910d784 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,91 @@  test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	git commit --allow-empty -m initial &&
+	git checkout -b branch-1 &&
+	git tag tag-1 &&
+	git commit --allow-empty -m second &&
+	git checkout -b branch-2 &&
+	git tag tag-2 &&
+	git checkout -b a/b/tag-2 &&
+
+	printf "%s" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-1-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse branch-1)" > $branch_dir_prefix/branch-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/branch-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\n\na" "$(git rev-parse tag-2)" > $tag_dir_prefix/tag-2-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-2-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-2-garbage &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-garbage &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-1-garbage &&
+	test_cmp expect err &&
+
+	printf "%sx" "$(git rev-parse tag-1)" > $tag_dir_prefix/tag-1-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-1-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-1-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $tag_dir_prefix/tag-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-2-bad &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" > $branch_dir_prefix/a/b/branch-2-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-2-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-2-bad &&
+	test_cmp expect err
+'
+
 test_done