diff mbox series

[v3,08/27] revisions API users: add "goto cleanup" for "rev_info" early exit

Message ID patch-v3-08.27-f8a9443fe6f-20220325T171340Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series revision.[ch]: add and use release_revisions() | expand

Commit Message

Ævar Arnfjörð Bjarmason March 25, 2022, 5:18 p.m. UTC
Add release_revisions() in various users of "struct rev_info" that can
mostly use a "goto cleanup" pattern, but also have an early "return"
before we've called repo_init_revisions(). We need to avoid calling
release_revisions() with uninitialized memory.

It would be a lot cleaner to be able to initialize "struct rev_info"
with "{ 0 }" here, or if a "REV_INFO_INIT" existed, we'll hopefully
get around to making the initialization easier in the future (now it
can't be done via a macro).

Until then let's leave a "cleanup_no_rev[s]" in place to document the
intention here. Only status_submodule() in builtin/submodule--helper.c
strictly speaking needs this, the other ones could keep their "return"
for the early exit. But let's have them also use the "goto
cleanup[...]" for consistency, and for the eventual migration to
simpler initialization.

For the bundle.c code see the early exit case added in
3bbbe467f29 (bundle verify: error out if called without an object
database, 2019-05-27).

For the relevant bisect.c code see 45b6370812c (bisect: libify
`check_good_are_ancestors_of_bad` and its dependents, 2020-02-17).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 bisect.c                    | 17 ++++++++++++-----
 builtin/submodule--helper.c |  6 ++++--
 bundle.c                    | 12 +++++++++---
 3 files changed, 25 insertions(+), 10 deletions(-)

Comments

Junio C Hamano March 25, 2022, 8:30 p.m. UTC | #1
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It would be a lot cleaner to be able to initialize "struct rev_info"
> with "{ 0 }" here, or if a "REV_INFO_INIT" existed, we'll hopefully
> get around to making the initialization easier in the future (now it
> can't be done via a macro).

If "struct rev_info" can be initialized with "{ 0 }" here, i.e.

-	struct rev_info rev;
+	struct rev_info rev = { 0 };

to give us a valid solution, why wouldn't you be able to do

+#define REV_INFO_INIT { 0 }

elsewhere in a common *.h file, and then

-	struct rev_info rev;
+	struct rev_info rev = REV_INFO_INIT;

to make the fact that "rev" is initialized (and ready to be handed
to the releaser) even more explicit?  It's like arguing against
fixing a code like this:

	struct char *pointer;
	...
	if (condition)
		pointer = malloc(...);
	...
        /* pointer leaks */
 
by initializing

	struct char *pointer = NULL;
	...
	if (condition)
		pointer = malloc(...);
	...
        free(pointer);

because for some reason you are against the macro NULL but you are
willing to spell it out as "0" (without double-quotes)?

Puzzled.
Ævar Arnfjörð Bjarmason March 26, 2022, 12:37 a.m. UTC | #2
On Fri, Mar 25 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> It would be a lot cleaner to be able to initialize "struct rev_info"
>> with "{ 0 }" here, or if a "REV_INFO_INIT" existed, we'll hopefully
>> get around to making the initialization easier in the future (now it
>> can't be done via a macro).
>
> If "struct rev_info" can be initialized with "{ 0 }" here, i.e.
>
> -	struct rev_info rev;
> +	struct rev_info rev = { 0 };
>
> to give us a valid solution, why wouldn't you be able to do
>
> +#define REV_INFO_INIT { 0 }
>
> elsewhere in a common *.h file, and then
>
> -	struct rev_info rev;
> +	struct rev_info rev = REV_INFO_INIT;
>
> to make the fact that "rev" is initialized (and ready to be handed
> to the releaser) even more explicit?  It's like arguing against
> fixing a code like this:
>
> 	struct char *pointer;
> 	...
> 	if (condition)
> 		pointer = malloc(...);
> 	...
>         /* pointer leaks */
>  
> by initializing
>
> 	struct char *pointer = NULL;
> 	...
> 	if (condition)
> 		pointer = malloc(...);
> 	...
>         free(pointer);
>
> because for some reason you are against the macro NULL but you are
> willing to spell it out as "0" (without double-quotes)?

I was fine with having it be { 0 } and have "release" functions
everywhere assume that memset-ing structures to zero makes them safe to
pass to release(), but my reading of your v2 feedback was that you
didn't like making that assumption, so I changed it in the v3 so we
wouldn't assume that.

I then took that REV_INFO_INIT suggestion to mean that this series
could/should be predicated on some cross-codebase refactoring to change
all of the "struct rev_info" initialization, which would be a rather
large digression.

Because I don't see how it makes any sense to have a REV_INFO_INIT if it
doesn't actually give you an init'd "struct rev_info" that's ready for
use. I.e. if you still need to call repo_init_revisions() it's just a
misleading interface.

I also think the cosmetic issues of initialization here really aren't
important at all, fixing these leaks is. So whatever idioms you're OK
with us using here are fine by me, as long as I can be made to clearly
understand what you want for a re-roll :)

Very preferably something that doesn't require refactoring the entirety
of "struct rev_info" init across the codebase...
Junio C Hamano March 26, 2022, 5:24 a.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Because I don't see how it makes any sense to have a REV_INFO_INIT if it
> doesn't actually give you an init'd "struct rev_info" that's ready for
> use. I.e. if you still need to call repo_init_revisions() it's just a
> misleading interface.

You can say

	struct something *foo = NULL;

	if (some condition)
		foo = alloc_and_init_foo();

	...

	free_and_destruct(foo);

and it is correct that "initialize with NULL" alone would not let
you use the thing as full fledged 'foo', but you can still safely
pass it to free_and_destruct() (or if "something" does not hold
external resources, it could just be free()).  A flow like this:

	struct rev_info revs = REV_INFO_INIT;

	if (!some condition)
		goto leave;
	init(&revs);
	... use revs ...
leave:
        release(&revs);

would exactly be the same thing.

In other words, you say "I do not see how it makes any sense" but to
me it looks more like what does not make sense is your argument
against what was suggested.
Derrick Stolee March 28, 2022, 5:55 p.m. UTC | #4
On 3/26/2022 1:24 AM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
>> Because I don't see how it makes any sense to have a REV_INFO_INIT if it
>> doesn't actually give you an init'd "struct rev_info" that's ready for
>> use. I.e. if you still need to call repo_init_revisions() it's just a
>> misleading interface.
> 
> You can say
> 
> 	struct something *foo = NULL;
> 
> 	if (some condition)
> 		foo = alloc_and_init_foo();
> 
> 	...
> 
> 	free_and_destruct(foo);
> 
> and it is correct that "initialize with NULL" alone would not let
> you use the thing as full fledged 'foo', but you can still safely
> pass it to free_and_destruct() (or if "something" does not hold
> external resources, it could just be free()).  A flow like this:
> 
> 	struct rev_info revs = REV_INFO_INIT;
> 
> 	if (!some condition)
> 		goto leave;
> 	init(&revs);
> 	... use revs ...
> leave:
>         release(&revs);
> 
> would exactly be the same thing.
> 
> In other words, you say "I do not see how it makes any sense" but to
> me it looks more like what does not make sense is your argument
> against what was suggested.

Ævar has stated in multiple threads that he prefers to not
initialize data so that static analysis tools can detect a
use-before-initialization of specific members.

However, now that we are intending to free rev_info structs,
we need them to be initialized with NULL pointers because
otherwise the release_revisions() method won't know which
portions were legitimately initialized and which ones were
not.

Maybe this NULL assignment happens as part of
repo_init_revisions(), but that also assumes that there is no
code path that would jump to a "leave" or "cleanup" tag before
running that initialization method (which is the broken case
that Junio mentions above).

Maybe there are tools that would identify that Junio's example
would be bad, but it is also likely that a compiler doesn't
catch that issue and tests don't cover that error condition.

It's my preference to initialize things to all-zeroes whenever
there is any chance of complexity, which is why this topic has
come to my attention on multiple threads.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason March 28, 2022, 6:55 p.m. UTC | #5
On Mon, Mar 28 2022, Derrick Stolee wrote:

> On 3/26/2022 1:24 AM, Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> 
>>> Because I don't see how it makes any sense to have a REV_INFO_INIT if it
>>> doesn't actually give you an init'd "struct rev_info" that's ready for
>>> use. I.e. if you still need to call repo_init_revisions() it's just a
>>> misleading interface.
>> 
>> You can say
>> 
>> 	struct something *foo = NULL;
>> 
>> 	if (some condition)
>> 		foo = alloc_and_init_foo();
>> 
>> 	...
>> 
>> 	free_and_destruct(foo);
>> 
>> and it is correct that "initialize with NULL" alone would not let
>> you use the thing as full fledged 'foo', but you can still safely
>> pass it to free_and_destruct() (or if "something" does not hold
>> external resources, it could just be free()).  A flow like this:
>> 
>> 	struct rev_info revs = REV_INFO_INIT;
>> 
>> 	if (!some condition)
>> 		goto leave;
>> 	init(&revs);
>> 	... use revs ...
>> leave:
>>         release(&revs);
>> 
>> would exactly be the same thing.
>> 
>> In other words, you say "I do not see how it makes any sense" but to
>> me it looks more like what does not make sense is your argument
>> against what was suggested.
>

[re-arranging a bit]

> However, now that we are intending to free rev_info structs,
> we need them to be initialized with NULL pointers because
> otherwise the release_revisions() method won't know which
> portions were legitimately initialized and which ones were
> not.
>
> Maybe this NULL assignment happens as part of
> repo_init_revisions(), but that also assumes that there is no
> code path that would jump to a "leave" or "cleanup" tag before
> running that initialization method (which is the broken case
> that Junio mentions above).
>
> Maybe there are tools that would identify that Junio's example
> would be bad, but it is also likely that a compiler doesn't
> catch that issue and tests don't cover that error condition.
>
> It's my preference to initialize things to all-zeroes whenever
> there is any chance of complexity, which is why this topic has
> come to my attention on multiple threads.

Mine too, IOW I really don't like the version in my own v3 here, but
wanted to just use an initialization to { 0 } as in the v2.

But I read Junio's reply to
https://lore.kernel.org/git/220324.868rszmga6.gmgdl@evledraar.gmail.com/
as objecting to that general assumption, as we do e.g. in some of this
code:

    git grep 'struct.*=.*\{ 0 \}'

So the v3 rewrite of those was seeking a way around that which didn't
require implementing a full init-from-macro of REV_INFO_INIT.

Junio, would you be OK/prefer to basically have the v2 verison, with
just a dummy macro like this in revision.h?:

    #define REV_INFO_INIT { 0 }

Which we'd then do something more useful with down the line?

> Ævar has stated in multiple threads that he prefers to not
> initialize data so that static analysis tools can detect a
> use-before-initialization of specific members.

Just to be clear, I prefer not doing initialization in cases where the
compiler is guaranteed to help you by skipping them, i.e. init to NULL
within a function where say 2 branchesh both init the value, but if you
were to change that the init to NULL would hide the bug.

(And I'm sure it's not something I came up with, but got pointed out to
me repeatedly in review (by Jeff King or Junio?), until I got it...)

But skipping zero-ing out a struct like "struct rev_info" is not such a
case where the compiler can really help, and we often use that struct at
a far distance from where it was init'd. So I'd really like to see us
not do that as a habit, even if say valgrind can sometimes catch it (but
only if we have 100% test coverage!).

You're probably thinking more of the case of [1]. I consider that a
pretty clear case of running with scissors, I just figured that one was
narrow enough to be OK, but it's good to have [2] instead.

1. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/
2. https://lore.kernel.org/git/patch-v2-1.1-9951d92176e-20220328T154049Z-avarab@gmail.com/
Junio C Hamano March 28, 2022, 8:03 p.m. UTC | #6
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Junio, would you be OK/prefer to basically have the v2 verison, with
> just a dummy macro like this in revision.h?:
>
>     #define REV_INFO_INIT { 0 }

FWIW, I do not see that as "dummy" at all.  We may further extend it
in the future, but it is what is minimally required to safely pass
revs to init and release, and what I have been saying since the
previous round of the review,
https://lore.kernel.org/git/xmqqwngji72a.fsf@gitster.g/
diff mbox series

Patch

diff --git a/bisect.c b/bisect.c
index cc6b8b6230d..c2e9f6ca9f6 100644
--- a/bisect.c
+++ b/bisect.c
@@ -1035,7 +1035,7 @@  enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 
 	res = check_good_are_ancestors_of_bad(r, prefix, no_checkout);
 	if (res)
-		return res;
+		goto cleanup_no_revs;
 
 	bisect_rev_setup(r, &revs, prefix, "%s", "^%s", 1);
 
@@ -1060,14 +1060,16 @@  enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 		       term_good,
 		       term_bad);
 
-		return BISECT_FAILED;
+		res = BISECT_FAILED;
+		goto cleanup;
 	}
 
 	if (!all) {
 		fprintf(stderr, _("No testable commit found.\n"
 			"Maybe you started with bad path arguments?\n"));
 
-		return BISECT_NO_TESTABLE_COMMIT;
+		res = BISECT_NO_TESTABLE_COMMIT;
+		goto cleanup;
 	}
 
 	bisect_rev = &revs.commits->item->object.oid;
@@ -1087,7 +1089,8 @@  enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 		 * for negative return values for early returns up
 		 * until the cmd_bisect__helper() caller.
 		 */
-		return BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND;
+		res = BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND;
+		goto cleanup;
 	}
 
 	nr = all - reaches - 1;
@@ -1106,7 +1109,11 @@  enum bisect_error bisect_next_all(struct repository *r, const char *prefix)
 	/* Clean up objects used, as they will be reused. */
 	repo_clear_commit_marks(r, ALL_REV_FLAGS);
 
-	return bisect_checkout(bisect_rev, no_checkout);
+	res = bisect_checkout(bisect_rev, no_checkout);
+cleanup:
+	release_revisions(&revs);
+cleanup_no_revs:
+	return res;
 }
 
 static inline int log2i(int n)
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 24980863f68..d1b656c0909 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -779,7 +779,7 @@  static void status_submodule(const char *path, const struct object_id *ce_oid,
 
 	if ((CE_STAGEMASK & ce_flags) >> CE_STAGESHIFT) {
 		print_status(flags, 'U', path, null_oid(), displaypath);
-		goto cleanup;
+		goto cleanup_no_rev;
 	}
 
 	strbuf_addf(&buf, "%s/.git", path);
@@ -791,7 +791,7 @@  static void status_submodule(const char *path, const struct object_id *ce_oid,
 	    !is_git_directory(git_dir)) {
 		print_status(flags, '-', path, ce_oid, displaypath);
 		strbuf_release(&buf);
-		goto cleanup;
+		goto cleanup_no_rev;
 	}
 	strbuf_release(&buf);
 
@@ -851,6 +851,8 @@  static void status_submodule(const char *path, const struct object_id *ce_oid,
 	}
 
 cleanup:
+	release_revisions(&rev);
+cleanup_no_rev:
 	strvec_clear(&diff_files_args);
 	free(displaypath);
 }
diff --git a/bundle.c b/bundle.c
index e359370cfcd..9b7c0bc55c4 100644
--- a/bundle.c
+++ b/bundle.c
@@ -202,8 +202,10 @@  int verify_bundle(struct repository *r,
 	int i, ret = 0, req_nr;
 	const char *message = _("Repository lacks these prerequisite commits:");
 
-	if (!r || !r->objects || !r->objects->odb)
-		return error(_("need a repository to verify a bundle"));
+	if (!r || !r->objects || !r->objects->odb) {
+		ret = error(_("need a repository to verify a bundle"));
+		goto cleanup_no_revs;
+	}
 
 	repo_init_revisions(r, &revs, NULL);
 	for (i = 0; i < p->nr; i++) {
@@ -221,7 +223,7 @@  int verify_bundle(struct repository *r,
 		error("%s %s", oid_to_hex(oid), name);
 	}
 	if (revs.pending.nr != p->nr)
-		return ret;
+		goto cleanup;
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
 
@@ -283,6 +285,10 @@  int verify_bundle(struct repository *r,
 			list_refs(r, 0, NULL);
 		}
 	}
+
+cleanup:
+	release_revisions(&revs);
+cleanup_no_revs:
 	return ret;
 }