Message ID | pull.834.git.1609857770445.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | for-each-repo: do nothing on empty config | expand |
On Tue, Jan 5, 2021 at 9:44 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > 'git for-each-repo --config=X' should return success without calling any > subcommands when the config key 'X' has no value. The current > implementation instead segfaults. > [...] > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c > @@ -51,6 +51,9 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix) > values = repo_config_get_value_multi(the_repository, > config_key); > + if (!values) > + return result; Probably not worth a re-roll, but the above has higher cognitive load than: if (!value) return 0; which indicates clearly that the command succeeded, whereas `return result` makes the reader scan all the code above the conditional to figure out what values `result` could possibly hold. > diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh > @@ -27,4 +27,8 @@ test_expect_success 'run based on configured value' ' > +test_expect_success 'do nothing on empty config' ' > + git for-each-repo --config=bogus.config -- these args would fail > +' The `these args would fail` arguments add mystery to the test, prompting the reader to wonder what exactly is going on: "Fail how?", "Is it supposed to fail?". It might be less confusing to use something more direct like `nonexistent` or `does not exist`.
On 1/5/2021 12:55 PM, Eric Sunshine wrote: > On Tue, Jan 5, 2021 at 9:44 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> 'git for-each-repo --config=X' should return success without calling any >> subcommands when the config key 'X' has no value. The current >> implementation instead segfaults. >> [...] >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> >> --- >> diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c >> @@ -51,6 +51,9 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix) >> values = repo_config_get_value_multi(the_repository, >> config_key); >> + if (!values) >> + return result; > > Probably not worth a re-roll, but the above has higher cognitive load than: > > if (!value) > return 0; > > which indicates clearly that the command succeeded, whereas `return > result` makes the reader scan all the code above the conditional to > figure out what values `result` could possibly hold. True. Looking at this again, it might be better to just update the loop to be for (i = 0; values && i < values->nr; i++) { which would run the loop zero times when there are zero results. >> diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh >> @@ -27,4 +27,8 @@ test_expect_success 'run based on configured value' ' >> +test_expect_success 'do nothing on empty config' ' >> + git for-each-repo --config=bogus.config -- these args would fail >> +' > > The `these args would fail` arguments add mystery to the test, > prompting the reader to wonder what exactly is going on: "Fail how?", > "Is it supposed to fail?". It might be less confusing to use something > more direct like `nonexistent` or `does not exist`. I guess the point is that if we actually did try running a subcommand on a repo (for instance, accidentally running it on the current repo) then the command would fail when running "git these args would fail". To me, "nonexistent" or "does not exist" doesn't adequately describe this (hypothetical) situation. Perhaps "fail-subcommand" might be better? -Stolee
On Tue, Jan 5, 2021 at 9:20 PM Derrick Stolee <stolee@gmail.com> wrote: > On 1/5/2021 12:55 PM, Eric Sunshine wrote: > > On Tue, Jan 5, 2021 at 9:44 AM Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > > Probably not worth a re-roll, but the above has higher cognitive load than: > > > > if (!value) > > return 0; > > > > which indicates clearly that the command succeeded, whereas `return > > result` makes the reader scan all the code above the conditional to > > figure out what values `result` could possibly hold. > > True. Looking at this again, it might be better to just update the > loop to be > > for (i = 0; values && i < values->nr; i++) { > > which would run the loop zero times when there are zero results. I find the explicit `return 0` easier to reason about since I can see immediately that it handles a particular boundary condition, after which I no longer have to think about that condition as I continue reading the code. That said, I don't think it matters greatly one way or the other whether you use `return result`, `return 0`, or adjust the loop condition. It might matter if more functionality is added later, but we don't have to worry about it now, and it's not worth re-rolling just for this, or spending a lot of time discussing it. > >> + git for-each-repo --config=bogus.config -- these args would fail > > > > The `these args would fail` arguments add mystery to the test, > > prompting the reader to wonder what exactly is going on: "Fail how?", > > "Is it supposed to fail?". It might be less confusing to use something > > more direct like `nonexistent` or `does not exist`. > > I guess the point is that if we actually did try running a subcommand > on a repo (for instance, accidentally running it on the current repo) > then the command would fail when running "git these args would fail". > > To me, "nonexistent" or "does not exist" doesn't adequately describe > this (hypothetical) situation. > > Perhaps "fail-subcommand" might be better? I had never even looked at git-for-each-repo before, so I took the time to read the documentation and the source code before writing this reply. Now that I understand what is supposed to happen, I might be tempted to suggest `this-command-wont-be-run` as an argument, but that's a mouthful. If you want to be really explicit that it should fail if a bug gets re-introduced which causes the argument to be run even though the config is empty, then perhaps use `false`: git for-each-repo --config=bogus.config -- false By the way, when reading the documentation, some questions came to mind. Is the behavior implemented by this patch desirable? That is, if the user mistypes the name of the configuration variable, would it be better to let the user know about the problem by returning an error code rather than succeeding silently? Or do you foresee people intentionally running the command against an empty config variable? That might be legitimate in automation situations. If legitimate to have an empty config, then would it be helpful to somehow distinguish between an empty config variable and a non-existent one (assuming we can even do that)? Is the API of this command ideal? It feels odd to force the user to specify required input via a command-line option rather than just as a positional argument. In other words, since the config variable name is mandatory, an alternative invocation format would be: git for-each-repo <config-var> <cmd> Or do you foresee future enhancements expanding the ways in which the repo list can be specified (such as from a file or stdin or directly via the command-line), in which case the `--config` option makes sense.
Derrick Stolee <stolee@gmail.com> writes: >> Probably not worth a re-roll, but the above has higher cognitive load than: >> >> if (!value) >> return 0; >> >> which indicates clearly that the command succeeded, whereas `return >> result` makes the reader scan all the code above the conditional to >> figure out what values `result` could possibly hold. > > True. Looking at this again, it might be better to just update the > loop to be > > for (i = 0; values && i < values->nr; i++) { > > which would run the loop zero times when there are zero results. It is misleading, though. It forces readers to first assume that the loop body may update "values" from non-NULL to NULL in some condition and hunt for such a code in vain. If some clean-up starts to become needed after the loop, the "if the value array is empty, immediately return 0" may have to be rewritten to "if empty, jump to the clean-up part after the loop", but until then, what Eric gave us would be the easiest to read, I would think. > To me, "nonexistent" or "does not exist" doesn't adequately describe > this (hypothetical) situation. > > Perhaps "fail-subcommand" might be better? I wonder if "false" or "exit 1" would fit the bill. In any case, a comment may help, perhaps? test_expect_success 'do nothing and succeed on empty/missing config' ' # if this runs even once, "false" ensures a failure git for-each-repo --config=bogus.config -- false '
On 1/6/2021 3:05 AM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >>> Probably not worth a re-roll, but the above has higher cognitive load than: >>> >>> if (!value) >>> return 0; >>> >>> which indicates clearly that the command succeeded, whereas `return >>> result` makes the reader scan all the code above the conditional to >>> figure out what values `result` could possibly hold. >> >> True. Looking at this again, it might be better to just update the >> loop to be >> >> for (i = 0; values && i < values->nr; i++) { >> >> which would run the loop zero times when there are zero results. > > It is misleading, though. It forces readers to first assume that > the loop body may update "values" from non-NULL to NULL in some > condition and hunt for such a code in vain. > > If some clean-up starts to become needed after the loop, the "if the > value array is empty, immediately return 0" may have to be rewritten > to "if empty, jump to the clean-up part after the loop", but until > then, what Eric gave us would be the easiest to read, I would think. Ok. That works for me. >> To me, "nonexistent" or "does not exist" doesn't adequately describe >> this (hypothetical) situation. >> >> Perhaps "fail-subcommand" might be better? > > I wonder if "false" or "exit 1" would fit the bill. In any case, a > comment may help, perhaps? > > test_expect_success 'do nothing and succeed on empty/missing config' ' > # if this runs even once, "false" ensures a failure > git for-each-repo --config=bogus.config -- false > ' I can add a comment, but keep in mind that this example would run the subcommand as "git false". This isn't intended as an arbitrary script runner, but a "please run the same Git command on a list of repos". Thanks, -Stolee
On 1/5/2021 11:20 PM, Eric Sunshine wrote: > On Tue, Jan 5, 2021 at 9:20 PM Derrick Stolee <stolee@gmail.com> wrote: >> On 1/5/2021 12:55 PM, Eric Sunshine wrote: >>> On Tue, Jan 5, 2021 at 9:44 AM Derrick Stolee via GitGitGadget >>> <gitgitgadget@gmail.com> wrote: >>> Probably not worth a re-roll, but the above has higher cognitive load than: >>> >>> if (!value) >>> return 0; >>> >>> which indicates clearly that the command succeeded, whereas `return >>> result` makes the reader scan all the code above the conditional to >>> figure out what values `result` could possibly hold. >> >> True. Looking at this again, it might be better to just update the >> loop to be >> >> for (i = 0; values && i < values->nr; i++) { >> >> which would run the loop zero times when there are zero results. > > I find the explicit `return 0` easier to reason about since I can see > immediately that it handles a particular boundary condition, after > which I no longer have to think about that condition as I continue > reading the code. > > That said, I don't think it matters greatly one way or the other > whether you use `return result`, `return 0`, or adjust the loop > condition. It might matter if more functionality is added later, but > we don't have to worry about it now, and it's not worth re-rolling > just for this, or spending a lot of time discussing it. Junio made a good point that 'values' doesn't change during the loop, so I shouldn't use it in the sentinel. I'll use your "return 0;" >>>> + git for-each-repo --config=bogus.config -- these args would fail >>> >>> The `these args would fail` arguments add mystery to the test, >>> prompting the reader to wonder what exactly is going on: "Fail how?", >>> "Is it supposed to fail?". It might be less confusing to use something >>> more direct like `nonexistent` or `does not exist`. >> >> I guess the point is that if we actually did try running a subcommand >> on a repo (for instance, accidentally running it on the current repo) >> then the command would fail when running "git these args would fail". >> >> To me, "nonexistent" or "does not exist" doesn't adequately describe >> this (hypothetical) situation. >> >> Perhaps "fail-subcommand" might be better? > > I had never even looked at git-for-each-repo before, so I took the > time to read the documentation and the source code before writing this > reply. Now that I understand what is supposed to happen, I might be > tempted to suggest `this-command-wont-be-run` as an argument, but > that's a mouthful. If you want to be really explicit that it should > fail if a bug gets re-introduced which causes the argument to be run > even though the config is empty, then perhaps use `false`: > > git for-each-repo --config=bogus.config -- false I'm just going to repeat that this would run "git false". It achieves the same goal where we interpret this as a failure if the subcommand is run. > By the way, when reading the documentation, some questions came to mind. > > Is the behavior implemented by this patch desirable? That is, if the > user mistypes the name of the configuration variable, would it be > better to let the user know about the problem by returning an error > code rather than succeeding silently? Or do you foresee people > intentionally running the command against an empty config variable? > That might be legitimate in automation situations. If legitimate to > have an empty config, then would it be helpful to somehow distinguish > between an empty config variable and a non-existent one (assuming we > can even do that)? As mentioned in the message, this is the situation the background maintenance would see if a user used 'git maintenance unregister' on all of their repos or removed the 'maintenance.repo' config values from their global config. I think it would be better to not start failing the background commands. Even outside of that context, we have no way to specify an "existing but empty" multi-valued config value over a non-existing config value. I'd rather prefer the interpretation "I succeeded in doing nothing" instead of "I think you made a mistake, because there's nothing to do." Could we meet in the middle and print a warning to stderr? warning(_("supplied config key '%s' has no values")); That would present a useful message to users running this command manually (or constructing automation) without breaking scripts that parse the output. > Is the API of this command ideal? It feels odd to force the user to > specify required input via a command-line option rather than just as a > positional argument. In other words, since the config variable name is > mandatory, an alternative invocation format would be: > > git for-each-repo <config-var> <cmd> > > Or do you foresee future enhancements expanding the ways in which the > repo list can be specified (such as from a file or stdin or directly > via the command-line), in which case the `--config` option makes > sense. I believe some voice interest in specifying a list of repositories by providing a directory instead of a config value. That would require scanning the immediate subdirectories to see if they are Git repos. A --stdin option would also be a good addition, if desired. Since this command was intended for automation, not end-users, it seemed that future-proofing the arguments in this way would be a good idea. We want to remain flexible for future additions without breaking compatibility. Thanks, -Stolee
On Wed, Jan 6, 2021 at 6:54 AM Derrick Stolee <stolee@gmail.com> wrote: > On 1/5/2021 11:20 PM, Eric Sunshine wrote: > > Is the behavior implemented by this patch desirable? That is, if the > > user mistypes the name of the configuration variable, would it be > > better to let the user know about the problem by returning an error > > code rather than succeeding silently? Or do you foresee people > > intentionally running the command against an empty config variable? > > As mentioned in the message, this is the situation the background > maintenance would see if a user used 'git maintenance unregister' on > all of their repos or removed the 'maintenance.repo' config values > from their global config. I think it would be better to not start > failing the background commands. > > Even outside of that context, we have no way to specify an "existing > but empty" multi-valued config value over a non-existing config > value. I'd rather prefer the interpretation "I succeeded in doing > nothing" instead of "I think you made a mistake, because there's > nothing to do." > > Could we meet in the middle and print a warning to stderr? > > warning(_("supplied config key '%s' has no values")); > > That would present a useful message to users running this command > manually (or constructing automation) without breaking scripts > that parse the output. I don't know. My knee-jerk response is that such a warning could get annoying quickly if this is used mostly in an automated environment, and an empty config value is likely to come up in practice. In that case, I'm somewhat negative on adding the warning. (I could perhaps see a --dry-run or --verbose mode issuing the above warning, but that's outside the scope of this series.) > > Is the API of this command ideal? It feels odd to force the user to > > specify required input via a command-line option rather than just as a > > positional argument. In other words, since the config variable name is > > mandatory, an alternative invocation format would be: > > > > git for-each-repo <config-var> <cmd> > > > > Or do you foresee future enhancements expanding the ways in which the > > repo list can be specified (such as from a file or stdin or directly > > via the command-line), in which case the `--config` option makes > > sense. > > I believe some voice interest in specifying a list of repositories > by providing a directory instead of a config value. That would > require scanning the immediate subdirectories to see if they are Git > repos. > > A --stdin option would also be a good addition, if desired. > > Since this command was intended for automation, not end-users, it > seemed that future-proofing the arguments in this way would be a good > idea. We want to remain flexible for future additions without > breaking compatibility. Fair enough.
Derrick Stolee <stolee@gmail.com> writes: >> I wonder if "false" or "exit 1" would fit the bill. In any case, a >> comment may help, perhaps? >> >> test_expect_success 'do nothing and succeed on empty/missing config' ' >> # if this runs even once, "false" ensures a failure >> git for-each-repo --config=bogus.config -- false >> ' > > I can add a comment, but keep in mind that this example would run the > subcommand as "git false". This isn't intended as an arbitrary script > runner, but a "please run the same Git command on a list of repos". Ah, that is a good point. The comment needs to explain: # the command fails if it attempts to run even once because # 'git false' does not exist and at that point, it does not have to be spelled 'false'. It could be 'no-such-git-subcommand' (and I wonder if that makes the comment unnecessary). That reminds me. If I have ~/bin/git-false and ~/bin on my $PATH, would this test fail to catch breakage?
Eric Sunshine <sunshine@sunshineco.com> writes: > Is the behavior implemented by this patch desirable? That is, if the > user mistypes the name of the configuration variable, would it be > better to let the user know about the problem by returning an error > code rather than succeeding silently? Or do you foresee people > intentionally running the command against an empty config variable? > That might be legitimate in automation situations. If legitimate to > have an empty config, then would it be helpful to somehow distinguish > between an empty config variable and a non-existent one (assuming we > can even do that)? I am guessing, without reading the mind of those who designed the feature, that the envisioned use pattern is that a configuration variable is designated, "git for-each-ref --config=<var> <cmd>" is carved into stone in a script that is run periodically with the hardcoded variable name in it, but the value of the variable may change from time to time. If it is expected that the variable can sometimes be empty, then erroring out or even issuing a warning would be counter-productive. > Is the API of this command ideal? It feels odd to force the user to > specify required input via a command-line option rather than just as a > positional argument. In other words, since the config variable name is > mandatory, an alternative invocation format would be: > > git for-each-repo <config-var> <cmd> Or not to use any configuration variable (or not to have the for-each-repo subcommand at all) and let the users write something along the lines of... git config --get-all <config-var> | while read repo do ( cd "$repo" && <cmd> ) done which is not all that bad and much more flexible. > Or do you foresee future enhancements expanding the ways in which the > repo list can be specified (such as from a file or stdin or directly > via the command-line), in which case the `--config` option makes > sense.
Junio C Hamano <gitster@pobox.com> writes: > Derrick Stolee <stolee@gmail.com> writes: > >>> I wonder if "false" or "exit 1" would fit the bill. In any case, a >>> comment may help, perhaps? >>> >>> test_expect_success 'do nothing and succeed on empty/missing config' ' >>> # if this runs even once, "false" ensures a failure >>> git for-each-repo --config=bogus.config -- false >>> ' >> >> I can add a comment, but keep in mind that this example would run the >> subcommand as "git false". This isn't intended as an arbitrary script >> runner, but a "please run the same Git command on a list of repos". > > Ah, that is a good point. > > The comment needs to explain: > > # the command fails if it attempts to run even once because > # 'git false' does not exist > > and at that point, it does not have to be spelled 'false'. It could > be 'no-such-git-subcommand' (and I wonder if that makes the comment > unnecessary). > > That reminds me. If I have ~/bin/git-false and ~/bin on my $PATH, > would this test fail to catch breakage? Yes, I think $PATH in the test environment starts from the original $PATH and modified only by prepending, so my ~/bin/git-false would kick in. We cannot reliably depend on the absence of a subcommand. We can instead use # the whole thing would fail if for-each-ref iterated even # once, because 'git help --no-such-option' would fail git for-each-ref --config=<var> -- help --no-such-option and I think that would be much more reliable; if an invocation of "git help" inside our test suite fails to refer to the "git help" from the version of Git being tested, we already have a serious problem. Thanks.
On 1/6/2021 4:40 PM, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Derrick Stolee <stolee@gmail.com> writes: >> >>>> I wonder if "false" or "exit 1" would fit the bill. In any case, a >>>> comment may help, perhaps? >>>> >>>> test_expect_success 'do nothing and succeed on empty/missing config' ' >>>> # if this runs even once, "false" ensures a failure >>>> git for-each-repo --config=bogus.config -- false >>>> ' >>> >>> I can add a comment, but keep in mind that this example would run the >>> subcommand as "git false". This isn't intended as an arbitrary script >>> runner, but a "please run the same Git command on a list of repos". >> >> Ah, that is a good point. >> >> The comment needs to explain: >> >> # the command fails if it attempts to run even once because >> # 'git false' does not exist >> >> and at that point, it does not have to be spelled 'false'. It could >> be 'no-such-git-subcommand' (and I wonder if that makes the comment >> unnecessary). >> >> That reminds me. If I have ~/bin/git-false and ~/bin on my $PATH, >> would this test fail to catch breakage? > > Yes, I think $PATH in the test environment starts from the original > $PATH and modified only by prepending, so my ~/bin/git-false would > kick in. We cannot reliably depend on the absence of a subcommand. > > We can instead use > > # the whole thing would fail if for-each-ref iterated even > # once, because 'git help --no-such-option' would fail > git for-each-ref --config=<var> -- help --no-such-option > > and I think that would be much more reliable; if an invocation of > "git help" inside our test suite fails to refer to the "git help" > from the version of Git being tested, we already have a serious > problem. A very good point. I'll include this in v3. Thanks, -Stolee
On Wed, Jan 6, 2021 at 3:50 PM Junio C Hamano <gitster@pobox.com> wrote: > Eric Sunshine <sunshine@sunshineco.com> writes: > > Is the API of this command ideal? It feels odd to force the user to > > specify required input via a command-line option rather than just as a > > positional argument. In other words, since the config variable name is > > mandatory, an alternative invocation format would be: > > > > git for-each-repo <config-var> <cmd> > > Or not to use any configuration variable (or not to have the > for-each-repo subcommand at all) and let the users write > something along the lines of... > > git config --get-all <config-var> | > while read repo > do > ( cd "$repo" && <cmd> ) > done > > which is not all that bad and much more flexible. I had the same thought/question about why git-for-each-repo exists, though I didn't verbalize it since I assumed the reason was covered during the original discussion or patch submission, which I did not follow. I can see this command possibly being useful for Windows users who don't necessarily have a Unix-like shell or MS PowerShell with which to open-code the loop you illustrated. This may be especially important when this is used for some sort of scheduled maintenance on Windows, as a guess.
diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c index 5bba623ff12..9b9d2364f1a 100644 --- a/builtin/for-each-repo.c +++ b/builtin/for-each-repo.c @@ -51,6 +51,9 @@ int cmd_for_each_repo(int argc, const char **argv, const char *prefix) values = repo_config_get_value_multi(the_repository, config_key); + if (!values) + return result; + for (i = 0; !result && i < values->nr; i++) result = run_command_on_repo(values->items[i].string, &args); diff --git a/t/t0068-for-each-repo.sh b/t/t0068-for-each-repo.sh index 136b4ec8392..adc1b81826a 100755 --- a/t/t0068-for-each-repo.sh +++ b/t/t0068-for-each-repo.sh @@ -27,4 +27,8 @@ test_expect_success 'run based on configured value' ' grep again message ' +test_expect_success 'do nothing on empty config' ' + git for-each-repo --config=bogus.config -- these args would fail +' + test_done