Message ID | 20230517-unit-tests-v2-v2-1-21b5b60f4b32@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add an external testing library for unit tests | expand |
steadmon@google.com writes: > It is convenient to have common_exit() in its own object file so that > standalone programs may link to it (and any other object files that > depend on it) while still having their own independent main() function. I am not so sure if this is a good direction to go in, though. The common_exit() function does two things that are very specific to and dependent on what Git runtime has supposed to have done, like initializing trace2 subsystem and linking with usage.c to make bug_called_must_BUG exist. I understand that a third-party or standalone non-Git programs may want to do _more_ than what our main() does when starting up, but it should be doable if make our main() call out to a hook function, whose definition in Git is a no-op, that can be replaced by their own implementation to do whatever they want to happen in main(), no? The reason why I am not comfortable with this patch is because I cannot say why this split is better than other possible split. For example, we could instead split only our 'main' out to a separate file, say "main.c", and put main.o together with common-main.o to libgit.a to be found by the linker, and that arrangement will also help your "standalone programs" having their own main() function. Now with these two possible ways to split (and there may be other split that may be even more convenient; I simply do not know), which one is better, and what's the argument for each approach? > So let's move it to a new common-exit.c file and update the Makefile > accordingly. > > Change-Id: I41b90059eb9031f40c9f65374b4b047e7ba3aac0 > --- > Makefile | 1 + > common-exit.c | 26 ++++++++++++++++++++++++++ > common-main.c | 24 ------------------------ > 3 files changed, 27 insertions(+), 24 deletions(-)
Hi, I'd like to revisit this as it's also relevant to a non-unit-test issue (`make fuzz-all` is currently broken). I have some questions inline below: On 2023.05.18 10:17, Junio C Hamano wrote: > steadmon@google.com writes: > > > It is convenient to have common_exit() in its own object file so that > > standalone programs may link to it (and any other object files that > > depend on it) while still having their own independent main() function. > > I am not so sure if this is a good direction to go in, though. The > common_exit() function does two things that are very specific to and > dependent on what Git runtime has supposed to have done, like > initializing trace2 subsystem and linking with usage.c to make > bug_called_must_BUG exist. True. We won't call common_exit() unless we're trying to exit() from a file that also includes git-compat-util.h, but I guess that's not a guarantee that trace2 is initialized or that usage.o is linked. > I understand that a third-party or standalone non-Git programs may > want to do _more_ than what our main() does when starting up, but it > should be doable if make our main() call out to a hook function, > whose definition in Git is a no-op, that can be replaced by their > own implementation to do whatever they want to happen in main(), no? > > The reason why I am not comfortable with this patch is because I > cannot say why this split is better than other possible split. For > example, we could instead split only our 'main' out to a separate > file, say "main.c", and put main.o together with common-main.o to > libgit.a to be found by the linker, and that arrangement will also > help your "standalone programs" having their own main() function. > Now with these two possible ways to split (and there may be other > split that may be even more convenient; I simply do not know), which > one is better, and what's the argument for each approach? Sorry, I don't think I'm understanding your proposal here properly, please let me know where I'm going wrong: isn't this functionally equivalent to my patch, just with different filenames? Now main() would live in main.c (vs. my common-main.c), while check_bug_if_BUG() and common_exit() would live in common-main.c (now a misnomer, vs. my common-exit.c). I'm not following how that changes anything so I'm pretty sure I've misunderstood. The issue I was trying to solve (whether for a unit-test framework or for the fuzzing engine) is that we don't have direct control over their main(), and so we can't rename it to avoid conflicts with our main(). I guess there may be some linker magic we could do to avoid the conflict and have (our) main() call (their, renamed) main()? I don't know offhand if that's actually possible, just speculating. Even if possible, it feels more complicated to me, but again that may just be due to my lack of linker knowledge.
Josh Steadmon <steadmon@google.com> writes: > Sorry, I don't think I'm understanding your proposal here properly, > please let me know where I'm going wrong: isn't this functionally > equivalent to my patch, just with different filenames? Now main() would > live in main.c (vs. my common-main.c), while check_bug_if_BUG() and > common_exit() would live in common-main.c (now a misnomer, vs. my > common-exit.c). I'm not following how that changes anything so I'm > pretty sure I've misunderstood. Sorry, the old discussion has expired out of my brain, and asking what I had in mind back then is too late. Your common-main.c has stuff _other than_ main(), and the remaining main() has tons of Git specific stuff. It may be one way to split, but I did not find a reason to convince myself that it was a good split. What I was wondering as a straw-man alternative was to have main.c that has only this and nothing else: $ cat >main.c <<\EOF #include "git-compat-util.h" /* or whatever defines git_main() */ int main(int ac, char **av) { return git_main(ac, av); } EOF Then in common-main.c, rename main() to git_main(). I was not saying such a split would be superiour compared to how you moved only some stuff out of common-main.c to a separate file. I had trouble equally with your split and with the above strawman, because I did not (and do not) quite see how one would evaluate the result (go back to the message you are responding to for details). > The issue I was trying to solve (whether for a unit-test framework or > for the fuzzing engine) is that we don't have direct control over their > main(), and so we can't rename it to avoid conflicts with our main(). Sure. And if we by default use a very thin main() that calls git_main(), it would be very easy for them to replace that main.o file with their own implementation of main(); as long as they eventually call git_main(), they can borrow what we do in ours. > I guess there may be some linker magic we could do to avoid the conflict > and have (our) main() call (their, renamed) main()? We can throw a main.o that has the implementation of our default "main" function into "libgit.a". Then, when we link our "git" program (either built-in programs that are reachable from git.o, or standalone programs like remote-curl.o that have their own cmd_main()), we list our object files (but we do not have to list main.o) and tuck libgit.a at the end of the linker command line. As the start-up runtime code needs to find symbol "main", and the linker sees no object files listed has "main", the linker goes in and finds main.o stored in libgit.a (which has "main" defined) and that will end up being linked. If on the other hand when we link somebody else's program that has its own "main()", we list the object files that make up the program, including the one that has their "main()", before "libgit.a" and the linker does not bother trying to find "main" in libgit.a:main.o so the resulting binary will use their main(). Is that what you are looking for?
On 7/14/23 7:38 PM, Josh Steadmon wrote: > Hi, I'd like to revisit this as it's also relevant to a non-unit-test > issue (`make fuzz-all` is currently broken). I have some questions > inline below: > > On 2023.05.18 10:17, Junio C Hamano wrote: >> steadmon@google.com writes: >> >>> It is convenient to have common_exit() in its own object file so that >>> standalone programs may link to it (and any other object files that >>> depend on it) while still having their own independent main() function. >> >> I am not so sure if this is a good direction to go in, though. The >> common_exit() function does two things that are very specific to and >> dependent on what Git runtime has supposed to have done, like >> initializing trace2 subsystem and linking with usage.c to make >> bug_called_must_BUG exist. > > True. We won't call common_exit() unless we're trying to exit() from a > file that also includes git-compat-util.h, but I guess that's not a > guarantee that trace2 is initialized or that usage.o is linked. > >> I understand that a third-party or standalone non-Git programs may >> want to do _more_ than what our main() does when starting up, but it >> should be doable if make our main() call out to a hook function, >> whose definition in Git is a no-op, that can be replaced by their >> own implementation to do whatever they want to happen in main(), no? >> >> The reason why I am not comfortable with this patch is because I >> cannot say why this split is better than other possible split. For >> example, we could instead split only our 'main' out to a separate >> file, say "main.c", and put main.o together with common-main.o to >> libgit.a to be found by the linker, and that arrangement will also >> help your "standalone programs" having their own main() function. >> Now with these two possible ways to split (and there may be other >> split that may be even more convenient; I simply do not know), which >> one is better, and what's the argument for each approach? > > Sorry, I don't think I'm understanding your proposal here properly, > please let me know where I'm going wrong: isn't this functionally > equivalent to my patch, just with different filenames? Now main() would > live in main.c (vs. my common-main.c), while check_bug_if_BUG() and > common_exit() would live in common-main.c (now a misnomer, vs. my > common-exit.c). I'm not following how that changes anything so I'm > pretty sure I've misunderstood. > > The issue I was trying to solve (whether for a unit-test framework or > for the fuzzing engine) is that we don't have direct control over their > main(), and so we can't rename it to avoid conflicts with our main(). > > I guess there may be some linker magic we could do to avoid the conflict > and have (our) main() call (their, renamed) main()? I don't know offhand > if that's actually possible, just speculating. Even if possible, it > feels more complicated to me, but again that may just be due to my lack > of linker knowledge. I missed the original discussion and am definitely late to the party, but an FYI there is also a `wmain()` in `compat/mingw.c` that is used for MSVC builds on Windows. It sets up some OS process stuff before calling the actual `main()`. Jeff
diff --git a/Makefile b/Makefile index e440728c24..8ee7c7e5a8 100644 --- a/Makefile +++ b/Makefile @@ -987,6 +987,7 @@ LIB_OBJS += combine-diff.o LIB_OBJS += commit-graph.o LIB_OBJS += commit-reach.o LIB_OBJS += commit.o +LIB_OBJS += common-exit.o LIB_OBJS += compat/nonblock.o LIB_OBJS += compat/obstack.o LIB_OBJS += compat/terminal.o diff --git a/common-exit.c b/common-exit.c new file mode 100644 index 0000000000..1aaa538be3 --- /dev/null +++ b/common-exit.c @@ -0,0 +1,26 @@ +#include "git-compat-util.h" +#include "trace2.h" + +static void check_bug_if_BUG(void) +{ + if (!bug_called_must_BUG) + return; + BUG("on exit(): had bug() call(s) in this process without explicit BUG_if_bug()"); +} + +/* We wrap exit() to call common_exit() in git-compat-util.h */ +int common_exit(const char *file, int line, int code) +{ + /* + * For non-POSIX systems: Take the lowest 8 bits of the "code" + * to e.g. turn -1 into 255. On a POSIX system this is + * redundant, see exit(3) and wait(2), but as it doesn't harm + * anything there we don't need to guard this with an "ifdef". + */ + code &= 0xff; + + check_bug_if_BUG(); + trace2_cmd_exit_fl(file, line, code); + + return code; +} diff --git a/common-main.c b/common-main.c index f319317353..a8627b4b25 100644 --- a/common-main.c +++ b/common-main.c @@ -62,27 +62,3 @@ int main(int argc, const char **argv) /* Not exit(3), but a wrapper calling our common_exit() */ exit(result); } - -static void check_bug_if_BUG(void) -{ - if (!bug_called_must_BUG) - return; - BUG("on exit(): had bug() call(s) in this process without explicit BUG_if_bug()"); -} - -/* We wrap exit() to call common_exit() in git-compat-util.h */ -int common_exit(const char *file, int line, int code) -{ - /* - * For non-POSIX systems: Take the lowest 8 bits of the "code" - * to e.g. turn -1 into 255. On a POSIX system this is - * redundant, see exit(3) and wait(2), but as it doesn't harm - * anything there we don't need to guard this with an "ifdef". - */ - code &= 0xff; - - check_bug_if_BUG(); - trace2_cmd_exit_fl(file, line, code); - - return code; -}