Message ID | patch-v3-08.27-f8a9443fe6f-20220325T171340Z-avarab@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | revision.[ch]: add and use release_revisions() | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > It would be a lot cleaner to be able to initialize "struct rev_info" > with "{ 0 }" here, or if a "REV_INFO_INIT" existed, we'll hopefully > get around to making the initialization easier in the future (now it > can't be done via a macro). If "struct rev_info" can be initialized with "{ 0 }" here, i.e. - struct rev_info rev; + struct rev_info rev = { 0 }; to give us a valid solution, why wouldn't you be able to do +#define REV_INFO_INIT { 0 } elsewhere in a common *.h file, and then - struct rev_info rev; + struct rev_info rev = REV_INFO_INIT; to make the fact that "rev" is initialized (and ready to be handed to the releaser) even more explicit? It's like arguing against fixing a code like this: struct char *pointer; ... if (condition) pointer = malloc(...); ... /* pointer leaks */ by initializing struct char *pointer = NULL; ... if (condition) pointer = malloc(...); ... free(pointer); because for some reason you are against the macro NULL but you are willing to spell it out as "0" (without double-quotes)? Puzzled.
On Fri, Mar 25 2022, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> It would be a lot cleaner to be able to initialize "struct rev_info" >> with "{ 0 }" here, or if a "REV_INFO_INIT" existed, we'll hopefully >> get around to making the initialization easier in the future (now it >> can't be done via a macro). > > If "struct rev_info" can be initialized with "{ 0 }" here, i.e. > > - struct rev_info rev; > + struct rev_info rev = { 0 }; > > to give us a valid solution, why wouldn't you be able to do > > +#define REV_INFO_INIT { 0 } > > elsewhere in a common *.h file, and then > > - struct rev_info rev; > + struct rev_info rev = REV_INFO_INIT; > > to make the fact that "rev" is initialized (and ready to be handed > to the releaser) even more explicit? It's like arguing against > fixing a code like this: > > struct char *pointer; > ... > if (condition) > pointer = malloc(...); > ... > /* pointer leaks */ > > by initializing > > struct char *pointer = NULL; > ... > if (condition) > pointer = malloc(...); > ... > free(pointer); > > because for some reason you are against the macro NULL but you are > willing to spell it out as "0" (without double-quotes)? I was fine with having it be { 0 } and have "release" functions everywhere assume that memset-ing structures to zero makes them safe to pass to release(), but my reading of your v2 feedback was that you didn't like making that assumption, so I changed it in the v3 so we wouldn't assume that. I then took that REV_INFO_INIT suggestion to mean that this series could/should be predicated on some cross-codebase refactoring to change all of the "struct rev_info" initialization, which would be a rather large digression. Because I don't see how it makes any sense to have a REV_INFO_INIT if it doesn't actually give you an init'd "struct rev_info" that's ready for use. I.e. if you still need to call repo_init_revisions() it's just a misleading interface. I also think the cosmetic issues of initialization here really aren't important at all, fixing these leaks is. So whatever idioms you're OK with us using here are fine by me, as long as I can be made to clearly understand what you want for a re-roll :) Very preferably something that doesn't require refactoring the entirety of "struct rev_info" init across the codebase...
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Because I don't see how it makes any sense to have a REV_INFO_INIT if it > doesn't actually give you an init'd "struct rev_info" that's ready for > use. I.e. if you still need to call repo_init_revisions() it's just a > misleading interface. You can say struct something *foo = NULL; if (some condition) foo = alloc_and_init_foo(); ... free_and_destruct(foo); and it is correct that "initialize with NULL" alone would not let you use the thing as full fledged 'foo', but you can still safely pass it to free_and_destruct() (or if "something" does not hold external resources, it could just be free()). A flow like this: struct rev_info revs = REV_INFO_INIT; if (!some condition) goto leave; init(&revs); ... use revs ... leave: release(&revs); would exactly be the same thing. In other words, you say "I do not see how it makes any sense" but to me it looks more like what does not make sense is your argument against what was suggested.
On 3/26/2022 1:24 AM, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> Because I don't see how it makes any sense to have a REV_INFO_INIT if it >> doesn't actually give you an init'd "struct rev_info" that's ready for >> use. I.e. if you still need to call repo_init_revisions() it's just a >> misleading interface. > > You can say > > struct something *foo = NULL; > > if (some condition) > foo = alloc_and_init_foo(); > > ... > > free_and_destruct(foo); > > and it is correct that "initialize with NULL" alone would not let > you use the thing as full fledged 'foo', but you can still safely > pass it to free_and_destruct() (or if "something" does not hold > external resources, it could just be free()). A flow like this: > > struct rev_info revs = REV_INFO_INIT; > > if (!some condition) > goto leave; > init(&revs); > ... use revs ... > leave: > release(&revs); > > would exactly be the same thing. > > In other words, you say "I do not see how it makes any sense" but to > me it looks more like what does not make sense is your argument > against what was suggested. Ævar has stated in multiple threads that he prefers to not initialize data so that static analysis tools can detect a use-before-initialization of specific members. However, now that we are intending to free rev_info structs, we need them to be initialized with NULL pointers because otherwise the release_revisions() method won't know which portions were legitimately initialized and which ones were not. Maybe this NULL assignment happens as part of repo_init_revisions(), but that also assumes that there is no code path that would jump to a "leave" or "cleanup" tag before running that initialization method (which is the broken case that Junio mentions above). Maybe there are tools that would identify that Junio's example would be bad, but it is also likely that a compiler doesn't catch that issue and tests don't cover that error condition. It's my preference to initialize things to all-zeroes whenever there is any chance of complexity, which is why this topic has come to my attention on multiple threads. Thanks, -Stolee
On Mon, Mar 28 2022, Derrick Stolee wrote: > On 3/26/2022 1:24 AM, Junio C Hamano wrote: >> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> >>> Because I don't see how it makes any sense to have a REV_INFO_INIT if it >>> doesn't actually give you an init'd "struct rev_info" that's ready for >>> use. I.e. if you still need to call repo_init_revisions() it's just a >>> misleading interface. >> >> You can say >> >> struct something *foo = NULL; >> >> if (some condition) >> foo = alloc_and_init_foo(); >> >> ... >> >> free_and_destruct(foo); >> >> and it is correct that "initialize with NULL" alone would not let >> you use the thing as full fledged 'foo', but you can still safely >> pass it to free_and_destruct() (or if "something" does not hold >> external resources, it could just be free()). A flow like this: >> >> struct rev_info revs = REV_INFO_INIT; >> >> if (!some condition) >> goto leave; >> init(&revs); >> ... use revs ... >> leave: >> release(&revs); >> >> would exactly be the same thing. >> >> In other words, you say "I do not see how it makes any sense" but to >> me it looks more like what does not make sense is your argument >> against what was suggested. > [re-arranging a bit] > However, now that we are intending to free rev_info structs, > we need them to be initialized with NULL pointers because > otherwise the release_revisions() method won't know which > portions were legitimately initialized and which ones were > not. > > Maybe this NULL assignment happens as part of > repo_init_revisions(), but that also assumes that there is no > code path that would jump to a "leave" or "cleanup" tag before > running that initialization method (which is the broken case > that Junio mentions above). > > Maybe there are tools that would identify that Junio's example > would be bad, but it is also likely that a compiler doesn't > catch that issue and tests don't cover that error condition. > > It's my preference to initialize things to all-zeroes whenever > there is any chance of complexity, which is why this topic has > come to my attention on multiple threads. Mine too, IOW I really don't like the version in my own v3 here, but wanted to just use an initialization to { 0 } as in the v2. But I read Junio's reply to https://lore.kernel.org/git/220324.868rszmga6.gmgdl@evledraar.gmail.com/ as objecting to that general assumption, as we do e.g. in some of this code: git grep 'struct.*=.*\{ 0 \}' So the v3 rewrite of those was seeking a way around that which didn't require implementing a full init-from-macro of REV_INFO_INIT. Junio, would you be OK/prefer to basically have the v2 verison, with just a dummy macro like this in revision.h?: #define REV_INFO_INIT { 0 } Which we'd then do something more useful with down the line? > Ævar has stated in multiple threads that he prefers to not > initialize data so that static analysis tools can detect a > use-before-initialization of specific members. Just to be clear, I prefer not doing initialization in cases where the compiler is guaranteed to help you by skipping them, i.e. init to NULL within a function where say 2 branchesh both init the value, but if you were to change that the init to NULL would hide the bug. (And I'm sure it's not something I came up with, but got pointed out to me repeatedly in review (by Jeff King or Junio?), until I got it...) But skipping zero-ing out a struct like "struct rev_info" is not such a case where the compiler can really help, and we often use that struct at a far distance from where it was init'd. So I'd really like to see us not do that as a habit, even if say valgrind can sometimes catch it (but only if we have 100% test coverage!). You're probably thinking more of the case of [1]. I consider that a pretty clear case of running with scissors, I just figured that one was narrow enough to be OK, but it's good to have [2] instead. 1. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/ 2. https://lore.kernel.org/git/patch-v2-1.1-9951d92176e-20220328T154049Z-avarab@gmail.com/
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Junio, would you be OK/prefer to basically have the v2 verison, with > just a dummy macro like this in revision.h?: > > #define REV_INFO_INIT { 0 } FWIW, I do not see that as "dummy" at all. We may further extend it in the future, but it is what is minimally required to safely pass revs to init and release, and what I have been saying since the previous round of the review, https://lore.kernel.org/git/xmqqwngji72a.fsf@gitster.g/
diff --git a/bisect.c b/bisect.c index cc6b8b6230d..c2e9f6ca9f6 100644 --- a/bisect.c +++ b/bisect.c @@ -1035,7 +1035,7 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix) res = check_good_are_ancestors_of_bad(r, prefix, no_checkout); if (res) - return res; + goto cleanup_no_revs; bisect_rev_setup(r, &revs, prefix, "%s", "^%s", 1); @@ -1060,14 +1060,16 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix) term_good, term_bad); - return BISECT_FAILED; + res = BISECT_FAILED; + goto cleanup; } if (!all) { fprintf(stderr, _("No testable commit found.\n" "Maybe you started with bad path arguments?\n")); - return BISECT_NO_TESTABLE_COMMIT; + res = BISECT_NO_TESTABLE_COMMIT; + goto cleanup; } bisect_rev = &revs.commits->item->object.oid; @@ -1087,7 +1089,8 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix) * for negative return values for early returns up * until the cmd_bisect__helper() caller. */ - return BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND; + res = BISECT_INTERNAL_SUCCESS_1ST_BAD_FOUND; + goto cleanup; } nr = all - reaches - 1; @@ -1106,7 +1109,11 @@ enum bisect_error bisect_next_all(struct repository *r, const char *prefix) /* Clean up objects used, as they will be reused. */ repo_clear_commit_marks(r, ALL_REV_FLAGS); - return bisect_checkout(bisect_rev, no_checkout); + res = bisect_checkout(bisect_rev, no_checkout); +cleanup: + release_revisions(&revs); +cleanup_no_revs: + return res; } static inline int log2i(int n) diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 24980863f68..d1b656c0909 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -779,7 +779,7 @@ static void status_submodule(const char *path, const struct object_id *ce_oid, if ((CE_STAGEMASK & ce_flags) >> CE_STAGESHIFT) { print_status(flags, 'U', path, null_oid(), displaypath); - goto cleanup; + goto cleanup_no_rev; } strbuf_addf(&buf, "%s/.git", path); @@ -791,7 +791,7 @@ static void status_submodule(const char *path, const struct object_id *ce_oid, !is_git_directory(git_dir)) { print_status(flags, '-', path, ce_oid, displaypath); strbuf_release(&buf); - goto cleanup; + goto cleanup_no_rev; } strbuf_release(&buf); @@ -851,6 +851,8 @@ static void status_submodule(const char *path, const struct object_id *ce_oid, } cleanup: + release_revisions(&rev); +cleanup_no_rev: strvec_clear(&diff_files_args); free(displaypath); } diff --git a/bundle.c b/bundle.c index e359370cfcd..9b7c0bc55c4 100644 --- a/bundle.c +++ b/bundle.c @@ -202,8 +202,10 @@ int verify_bundle(struct repository *r, int i, ret = 0, req_nr; const char *message = _("Repository lacks these prerequisite commits:"); - if (!r || !r->objects || !r->objects->odb) - return error(_("need a repository to verify a bundle")); + if (!r || !r->objects || !r->objects->odb) { + ret = error(_("need a repository to verify a bundle")); + goto cleanup_no_revs; + } repo_init_revisions(r, &revs, NULL); for (i = 0; i < p->nr; i++) { @@ -221,7 +223,7 @@ int verify_bundle(struct repository *r, error("%s %s", oid_to_hex(oid), name); } if (revs.pending.nr != p->nr) - return ret; + goto cleanup; req_nr = revs.pending.nr; setup_revisions(2, argv, &revs, NULL); @@ -283,6 +285,10 @@ int verify_bundle(struct repository *r, list_refs(r, 0, NULL); } } + +cleanup: + release_revisions(&revs); +cleanup_no_revs: return ret; }
Add release_revisions() in various users of "struct rev_info" that can mostly use a "goto cleanup" pattern, but also have an early "return" before we've called repo_init_revisions(). We need to avoid calling release_revisions() with uninitialized memory. It would be a lot cleaner to be able to initialize "struct rev_info" with "{ 0 }" here, or if a "REV_INFO_INIT" existed, we'll hopefully get around to making the initialization easier in the future (now it can't be done via a macro). Until then let's leave a "cleanup_no_rev[s]" in place to document the intention here. Only status_submodule() in builtin/submodule--helper.c strictly speaking needs this, the other ones could keep their "return" for the early exit. But let's have them also use the "goto cleanup[...]" for consistency, and for the eventual migration to simpler initialization. For the bundle.c code see the early exit case added in 3bbbe467f29 (bundle verify: error out if called without an object database, 2019-05-27). For the relevant bisect.c code see 45b6370812c (bisect: libify `check_good_are_ancestors_of_bad` and its dependents, 2020-02-17). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- bisect.c | 17 ++++++++++++----- builtin/submodule--helper.c | 6 ++++-- bundle.c | 12 +++++++++--- 3 files changed, 25 insertions(+), 10 deletions(-)