diff mbox series

[v6,08/22] object-file.c: don't set "typep" when returning non-zero

Message ID patch-v6-08.22-38e4266772d-20210907T104559Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series fsck: lib-ify object-file.c & better fsck "invalid object" error reporting | expand

Commit Message

Ævar Arnfjörð Bjarmason Sept. 7, 2021, 10:58 a.m. UTC
When the loose_object_info() function returns an error stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Taylor Blau Sept. 16, 2021, 9:29 p.m. UTC | #1
On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> When the loose_object_info() function returns an error stop faking up
> the "oi->typep" to OBJ_BAD. Let the return value of the function
> itself suffice. This code cleanup simplifies subsequent changes.

The obvious danger (which you mention) is that somebody is relying on
what typep points to, and is reading it even if we returned non-zero
from whatever called this function.

Hopefully nobody is, but this change makes me a little uncomfortable
nonetheless, since there are so many potential callers (even though this
function has only one caller, it doesn't take long before the number of
indirect callers explodes).

So it would be nice if we could do without it, but you claim that it
simplifies changes that happen later on. So let's continue to see if we
really do need it...

Thanks,
Taylor
Jeff King Sept. 16, 2021, 9:56 p.m. UTC | #2
On Thu, Sep 16, 2021 at 05:29:30PM -0400, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > When the loose_object_info() function returns an error stop faking up
> > the "oi->typep" to OBJ_BAD. Let the return value of the function
> > itself suffice. This code cleanup simplifies subsequent changes.
> 
> The obvious danger (which you mention) is that somebody is relying on
> what typep points to, and is reading it even if we returned non-zero
> from whatever called this function.
> 
> Hopefully nobody is, but this change makes me a little uncomfortable
> nonetheless, since there are so many potential callers (even though this
> function has only one caller, it doesn't take long before the number of
> indirect callers explodes).
> 
> So it would be nice if we could do without it, but you claim that it
> simplifies changes that happen later on. So let's continue to see if we
> really do need it...

I'm actually reasonable comfortable with this patch. If we return an
error from the *_object_info() functions, then I think all bets are off
on what is in the resulting object_info struct. E.g., we'd already leave
sizep uninitialized in such a case.

It feels like oi->typep may be a little bit special because we conflate
"error" and "type" in the return from oid_object_info(). But
oid_object_info_extended() does not do that, and the innards of
oid_object_info() do the right thing.

Of course we _have_ been setting typep in this way for a while, so it's
worth making sure nobody is depending on. Notably packed_object_info()
does not behave in this way; if it hits an error, typep may be left
unset. So any oid_object_info_extended() callers depending on this were
already potentially buggy. I'd be OK with a quick sweep of the hits of
"git grep typep" here.

I just did that, and all the sites look pretty reasonable (they call
oid_object_info_extended() and bail as soon as they see that it fails).

-Peff
diff mbox series

Patch

diff --git a/object-file.c b/object-file.c
index a8be8994814..bda3497d5ca 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1503,8 +1503,6 @@  static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);