diff mbox series

[v4,3/4] object-file: emit corruption errors when detected

Message ID 07d28db92c2c61358755b3d501bc5bd35a760de1.1670622176.git.jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series Don't lazy-fetch commits when parsing them | expand

Commit Message

Jonathan Tan Dec. 9, 2022, 9:44 p.m. UTC
Instead of relying on errno being preserved across function calls, teach
do_oid_object_info_extended() to itself report object corruption when
it first detects it. There are 3 types of corruption being detected:
 - when a replacement object is missing
 - when a loose object is corrupt
 - when a packed object is corrupt and the object cannot be read
   in another way

Note that in the RHS of this patch's diff, a check for ENOENT that was
introduced in 3ba7a06552 (A loose object is not corrupt if it cannot
be read due to EMFILE, 2010-10-28) is also removed. The purpose of this
check is to avoid a false report of corruption if the errno contains
something like EMFILE (or anything that is not ENOENT), in which case
a more generic report is presented. Because, as of this patch, we no
longer rely on such a heuristic to determine corruption, but surface
the error message at the point when we read something that we did not
expect, this check is no longer necessary.

Besides being more resilient, this also prepares for a future patch in
which an indirect caller of do_oid_object_info_extended() will need
such functionality.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 object-file.c  | 55 +++++++++++++++++++++++++-------------------------
 object-store.h |  3 +++
 2 files changed, 31 insertions(+), 27 deletions(-)

Comments

Junio C Hamano Dec. 10, 2022, 12:16 a.m. UTC | #1
Jonathan Tan <jonathantanmy@google.com> writes:

> +	fd = open_loose_object(r, oid, &path);
> +	if (fd < 0) {
> +		if (errno != ENOENT)
> +			error_errno(_("unable to open loose object %s"), path);
> +		return -1;
> +	}

I know there was a discussion in the previous round, but is this use
of path truly safe?  Currently it may happen to be as long as there
is at least one element on the odb list, but when thinking things
through with future-proofing point of view, I do not think assuming
that path is always computable is a healthy thing to do in the
longer term.

Our "struct object_id" may be extended in the future and allow us to
express "invalid" object name, in which case the error return we get
may not even be about "loose object file not openable" but "there
will never be a loose object file for such an invalid object name",
in which case there won't be any path returned from the function.

Other than that, the series looks quite clearly written.  Nicely
done.

Thanks.
Jonathan Tan Dec. 12, 2022, 8:38 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > +	fd = open_loose_object(r, oid, &path);
> > +	if (fd < 0) {
> > +		if (errno != ENOENT)
> > +			error_errno(_("unable to open loose object %s"), path);
> > +		return -1;
> > +	}
> 
> I know there was a discussion in the previous round, but is this use
> of path truly safe?  Currently it may happen to be as long as there
> is at least one element on the odb list, but when thinking things
> through with future-proofing point of view, I do not think assuming
> that path is always computable is a healthy thing to do in the
> longer term.
> 
> Our "struct object_id" may be extended in the future and allow us to
> express "invalid" object name, in which case the error return we get
> may not even be about "loose object file not openable" but "there
> will never be a loose object file for such an invalid object name",
> in which case there won't be any path returned from the function.

Ah, good point. I think what I can do is to document the function to
only return a path if a path was involved in the error somehow, and make
anything that uses "path" in the caller check for NULL.
 
> Other than that, the series looks quite clearly written.  Nicely
> done.
> 
> Thanks.

Thanks for taking a look.
Jeff King Dec. 12, 2022, 8:49 p.m. UTC | #3
On Sat, Dec 10, 2022 at 09:16:42AM +0900, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > +	fd = open_loose_object(r, oid, &path);
> > +	if (fd < 0) {
> > +		if (errno != ENOENT)
> > +			error_errno(_("unable to open loose object %s"), path);
> > +		return -1;
> > +	}
> 
> I know there was a discussion in the previous round, but is this use
> of path truly safe?  Currently it may happen to be as long as there
> is at least one element on the odb list, but when thinking things
> through with future-proofing point of view, I do not think assuming
> that path is always computable is a healthy thing to do in the
> longer term.
> 
> Our "struct object_id" may be extended in the future and allow us to
> express "invalid" object name, in which case the error return we get
> may not even be about "loose object file not openable" but "there
> will never be a loose object file for such an invalid object name",
> in which case there won't be any path returned from the function.

Actually, I think it is much worse than that. The code as written above
is already buggy (which is my fault, as I suggested it).

In open_loose_object() we'll continue to iterate and pick out the "most
interesting errno". But we'll throw away the path that gave us that
errno. So we might well say:

  unable to open loose object /some/alternate/12/34abcd: permission denied

when the actual problem is in /main/objdir/12/34abcd.

It's fixable, but with some pain in handling the allocations. I think it
would be sufficient to just say:

  error_errno(_("unable to open loose object %s"), oid_to_hex(oid));

here. And possibly put a comment above open_loose_object() that "path"
is only guaranteed to point to something sensible when a non-negative
value is returned.

-Peff
Jonathan Tan Dec. 12, 2022, 8:59 p.m. UTC | #4
Jeff King <peff@peff.net> writes:
> Actually, I think it is much worse than that. The code as written above
> is already buggy (which is my fault, as I suggested it).
> 
> In open_loose_object() we'll continue to iterate and pick out the "most
> interesting errno". But we'll throw away the path that gave us that
> errno. So we might well say:
> 
>   unable to open loose object /some/alternate/12/34abcd: permission denied
> 
> when the actual problem is in /main/objdir/12/34abcd.
> 
> It's fixable, but with some pain in handling the allocations. I think it
> would be sufficient to just say:
> 
>   error_errno(_("unable to open loose object %s"), oid_to_hex(oid));
> 
> here. 

OK, let's go with this.

> And possibly put a comment above open_loose_object() that "path"
> is only guaranteed to point to something sensible when a non-negative
> value is returned.

Junio made a point that there could, for example, be no path when the
odb list is empty (maybe in the future) so I don't think this would be
sufficient. But there is already a comment there pointing to a comment
in another function that states "path ... (if any)" so this is something
that callers should already take care of. In my changes, I'll initialize
it to NULL and whenever I use it, I'll check for non-NULL first.
Jeff King Dec. 12, 2022, 9:20 p.m. UTC | #5
On Mon, Dec 12, 2022 at 12:59:55PM -0800, Jonathan Tan wrote:

> > And possibly put a comment above open_loose_object() that "path"
> > is only guaranteed to point to something sensible when a non-negative
> > value is returned.
> 
> Junio made a point that there could, for example, be no path when the
> odb list is empty (maybe in the future) so I don't think this would be
> sufficient. But there is already a comment there pointing to a comment
> in another function that states "path ... (if any)" so this is something
> that callers should already take care of. In my changes, I'll initialize
> it to NULL and whenever I use it, I'll check for non-NULL first.

If we return a non-negative value, then we opened something, so by
definition, don't we have a path of the thing we opened?

I think the case Junio mentioned was if we for some reason didn't look
at _any_ path. In which case we'd be returning an error.

-Peff
Jonathan Tan Dec. 12, 2022, 9:29 p.m. UTC | #6
Jeff King <peff@peff.net> writes:
> On Mon, Dec 12, 2022 at 12:59:55PM -0800, Jonathan Tan wrote:
> 
> > > And possibly put a comment above open_loose_object() that "path"
> > > is only guaranteed to point to something sensible when a non-negative
> > > value is returned.
> > 
> > Junio made a point that there could, for example, be no path when the
> > odb list is empty (maybe in the future) so I don't think this would be
> > sufficient. But there is already a comment there pointing to a comment
> > in another function that states "path ... (if any)" so this is something
> > that callers should already take care of. In my changes, I'll initialize
> > it to NULL and whenever I use it, I'll check for non-NULL first.
> 
> If we return a non-negative value, then we opened something, so by
> definition, don't we have a path of the thing we opened?
> 
> I think the case Junio mentioned was if we for some reason didn't look
> at _any_ path. In which case we'd be returning an error.

Ah, my reading comprehension is failing me, sorry. We do want "path"
to point to something sensible (well, whenever we can) when an error
occurs, though, since we want to include that path in our error message
when DIE_IF_CORRUPT is used. So guaranteeing "path" when a non-negative
value is returned (and hence, no error occurred) is not so useful.
Jeff King Dec. 12, 2022, 10:17 p.m. UTC | #7
On Mon, Dec 12, 2022 at 01:29:47PM -0800, Jonathan Tan wrote:

> > If we return a non-negative value, then we opened something, so by
> > definition, don't we have a path of the thing we opened?
> > 
> > I think the case Junio mentioned was if we for some reason didn't look
> > at _any_ path. In which case we'd be returning an error.
> 
> Ah, my reading comprehension is failing me, sorry. We do want "path"
> to point to something sensible (well, whenever we can) when an error
> occurs, though, since we want to include that path in our error message
> when DIE_IF_CORRUPT is used. So guaranteeing "path" when a non-negative
> value is returned (and hence, no error occurred) is not so useful.

But we only DIE_IF_CORRUPT when there is actual corruption, which means
we've opened an object file, and "path" is valid.

The only time "path" would be invalid is if open_loose_object() itself
returns an error, which is the message under discussion:

	fd = open_loose_object(r, oid, &path);
	if (fd < 0) {
		if (errno != ENOENT)
			error_errno(_("unable to open loose object %s"), path);
		return -1;
	}

If that stops expecting "path" to be valid (and just mentions the oid),
then the rest of loose_object_info() should be fine.

-Peff
Jonathan Tan Dec. 12, 2022, 10:52 p.m. UTC | #8
Jeff King <peff@peff.net> writes:
> On Mon, Dec 12, 2022 at 12:59:55PM -0800, Jonathan Tan wrote:
> 
> > > And possibly put a comment above open_loose_object() that "path"
> > > is only guaranteed to point to something sensible when a non-negative
> > > value is returned.
> > 
> > Junio made a point that there could, for example, be no path when the
> > odb list is empty (maybe in the future) so I don't think this would be
> > sufficient. But there is already a comment there pointing to a comment
> > in another function that states "path ... (if any)" so this is something
> > that callers should already take care of. In my changes, I'll initialize
> > it to NULL and whenever I use it, I'll check for non-NULL first.
> 
> If we return a non-negative value, then we opened something, so by
> definition, don't we have a path of the thing we opened?

Hmm...are you saying "path is guaranteed when there is no error; when
there is an error, the caller must check"? If yes, I think we are in
agreement. In any case, to make things more concrete, I've just sent a
new version [1].

[1] https://lore.kernel.org/git/cover.1670885252.git.jonathantanmy@google.com/
Jeff King Dec. 13, 2022, 10:37 a.m. UTC | #9
On Mon, Dec 12, 2022 at 02:52:12PM -0800, Jonathan Tan wrote:

> > > Junio made a point that there could, for example, be no path when the
> > > odb list is empty (maybe in the future) so I don't think this would be
> > > sufficient. But there is already a comment there pointing to a comment
> > > in another function that states "path ... (if any)" so this is something
> > > that callers should already take care of. In my changes, I'll initialize
> > > it to NULL and whenever I use it, I'll check for non-NULL first.
> > 
> > If we return a non-negative value, then we opened something, so by
> > definition, don't we have a path of the thing we opened?
> 
> Hmm...are you saying "path is guaranteed when there is no error; when
> there is an error, the caller must check"? If yes, I think we are in
> agreement. In any case, to make things more concrete, I've just sent a
> new version [1].

Almost. I'm saying "path is guaranteed when there is no error; when
there is an error, the value of path is meaningless and should not be
looked at".

If you want to enforce that open_loose_object() sets "path" to NULL on
error, and then say "the caller must check", that would be valid. But
without that, even checking it for NULL is pointless (because you may
see a path which got ENOENT, even though we got an interesting errno
from an earlier path).

-Peff
diff mbox series

Patch

diff --git a/object-file.c b/object-file.c
index 429e3a746d..2a0df39822 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1422,7 +1422,9 @@  static int loose_object_info(struct repository *r,
 			     struct object_info *oi, int flags)
 {
 	int status = 0;
+	int fd;
 	unsigned long mapsize;
+	const char *path;
 	void *map;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
@@ -1443,7 +1445,6 @@  static int loose_object_info(struct repository *r,
 	 * object even exists.
 	 */
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
-		const char *path;
 		struct stat st;
 		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
 			return quick_has_loose(r, oid) ? 0 : -1;
@@ -1454,7 +1455,13 @@  static int loose_object_info(struct repository *r,
 		return 0;
 	}
 
-	map = map_loose_object(r, oid, &mapsize);
+	fd = open_loose_object(r, oid, &path);
+	if (fd < 0) {
+		if (errno != ENOENT)
+			error_errno(_("unable to open loose object %s"), path);
+		return -1;
+	}
+	map = map_fd(fd, path, &mapsize);
 	if (!map)
 		return -1;
 
@@ -1492,6 +1499,10 @@  static int loose_object_info(struct repository *r,
 		break;
 	}
 
+	if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
+		die(_("loose object %s (stored in %s) is corrupt"),
+		    oid_to_hex(oid), path);
+
 	git_inflate_end(&stream);
 cleanup:
 	munmap(map, mapsize);
@@ -1601,6 +1612,15 @@  static int do_oid_object_info_extended(struct repository *r,
 			continue;
 		}
 
+		if (flags & OBJECT_INFO_DIE_IF_CORRUPT) {
+			const struct packed_git *p;
+			if ((flags & OBJECT_INFO_LOOKUP_REPLACE) && !oideq(real, oid))
+				die(_("replacement %s not found for %s"),
+				    oid_to_hex(real), oid_to_hex(oid));
+			if ((p = has_packed_and_bad(r, real)))
+				die(_("packed object %s (stored in %s) is corrupt"),
+				    oid_to_hex(real), p->pack_name);
+		}
 		return -1;
 	}
 
@@ -1653,7 +1673,8 @@  int oid_object_info(struct repository *r,
 
 static void *read_object(struct repository *r,
 			 const struct object_id *oid, enum object_type *type,
-			 unsigned long *size)
+			 unsigned long *size,
+			 int die_if_corrupt)
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	void *content;
@@ -1661,7 +1682,8 @@  static void *read_object(struct repository *r,
 	oi.sizep = size;
 	oi.contentp = &content;
 
-	if (oid_object_info_extended(r, oid, &oi, 0) < 0)
+	if (oid_object_info_extended(r, oid, &oi, die_if_corrupt
+				     ? OBJECT_INFO_DIE_IF_CORRUPT : 0) < 0)
 		return NULL;
 	return content;
 }
@@ -1697,35 +1719,14 @@  void *read_object_file_extended(struct repository *r,
 				int lookup_replace)
 {
 	void *data;
-	const struct packed_git *p;
-	const char *path;
-	struct stat st;
 	const struct object_id *repl = lookup_replace ?
 		lookup_replace_object(r, oid) : oid;
 
 	errno = 0;
-	data = read_object(r, repl, type, size);
+	data = read_object(r, repl, type, size, 1);
 	if (data)
 		return data;
 
-	obj_read_lock();
-	if (errno && errno != ENOENT)
-		die_errno(_("failed to read object %s"), oid_to_hex(oid));
-
-	/* die if we replaced an object with one that does not exist */
-	if (repl != oid)
-		die(_("replacement %s not found for %s"),
-		    oid_to_hex(repl), oid_to_hex(oid));
-
-	if (!stat_loose_object(r, repl, &st, &path))
-		die(_("loose object %s (stored in %s) is corrupt"),
-		    oid_to_hex(repl), path);
-
-	if ((p = has_packed_and_bad(r, repl)))
-		die(_("packed object %s (stored in %s) is corrupt"),
-		    oid_to_hex(repl), p->pack_name);
-	obj_read_unlock();
-
 	return NULL;
 }
 
@@ -2268,7 +2269,7 @@  int force_object_loose(const struct object_id *oid, time_t mtime)
 
 	if (has_loose_object(oid))
 		return 0;
-	buf = read_object(the_repository, oid, &type, &len);
+	buf = read_object(the_repository, oid, &type, &len, 0);
 	if (!buf)
 		return error(_("cannot read object for %s"), oid_to_hex(oid));
 	hdrlen = format_object_header(hdr, sizeof(hdr), type, len);
diff --git a/object-store.h b/object-store.h
index b1ec0bde82..98c1d67946 100644
--- a/object-store.h
+++ b/object-store.h
@@ -445,6 +445,9 @@  struct object_info {
  */
 #define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
 
+/* Die if object corruption (not just an object being missing) was detected. */
+#define OBJECT_INFO_DIE_IF_CORRUPT 32
+
 int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);