Message ID | 63ec602a07756a41f8ccddd745562c567a4b3ed7.1595527000.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Maintenance builtin, allowing 'gc --auto' customization | expand |
On Thu, Jul 23, 2020 at 05:56:23PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> Everything in this patch looks very sensible to me, and I think that implementing 'git maintenance' in the same file as the current 'git gc' builtin makes a lot of sense to me. Reviewed-by: Taylor Blau <me@ttaylorr.com> Thanks, Taylor
On 2020-07-23 17:56:23+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > From: Derrick Stolee <dstolee@microsoft.com> > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh > new file mode 100755 > index 0000000000..d00641c4dd > --- /dev/null > +++ b/t/t7900-maintenance.sh > @@ -0,0 +1,22 @@ > +#!/bin/sh > + > +test_description='git maintenance builtin' > + > +GIT_TEST_COMMIT_GRAPH=0 > +GIT_TEST_MULTI_PACK_INDEX=0 > + > +. ./test-lib.sh > + > +test_expect_success 'help text' ' > + test_must_fail git maintenance -h 2>err && > + test_i18ngrep "usage: git maintenance run" err > +' I think this test has been tested better already in t0012, Anyway, if we would like to check the word "git maintenance run" explicitly here, it would be clearer to s/test_must_fail/test_expect_code 129/ > + > +test_expect_success 'gc [--auto]' ' > + GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run && > + GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && > + grep ",\"gc\"]" run-no-auto.txt && > + grep ",\"gc\",\"--auto\"]" run-auto.txt > +' > + > +test_done > -- > gitgitgadget >
Hi, Derrick Stolee wrote: > [Subject: maintenance: create basic maintenance runner] Seems sensible, and a good way to set up for the later patches. Let's take a look at how it does that. [...] > --- /dev/null > +++ b/Documentation/git-maintenance.txt > @@ -0,0 +1,57 @@ [...] > +SUBCOMMANDS > +----------- > + > +run:: > + Run one or more maintenance tasks. [jrnieder] How do I supply the tasks on the command line? Are they parameters to this subcommand? If so, it could make sense for this to say something like run <task>...:: What is the exit code convention for "git maintenance run"? (Many Git commands don't document this, but since we're making a new command it seems worth building the habit.) [...] > --- a/builtin/gc.c > +++ b/builtin/gc.c > @@ -699,3 +699,62 @@ int cmd_gc(int argc, const char **argv, const char *prefix) [...] > +static struct maintenance_opts { > + int auto_flag; > +} opts; Packing this in a struct feels a bit unusual. Is the struct going to be passed somewhere, or could these be individual locals in cmd_maintenance? [...] > + > +static int maintenance_task_gc(void) > +{ > + int result; > + struct argv_array cmd = ARGV_ARRAY_INIT; > + > + argv_array_pushl(&cmd, "gc", NULL); > + > + if (opts.auto_flag) > + argv_array_pushl(&cmd, "--auto", NULL); These are both pushing single strings, so they can use argv_array_push. [...] > --- /dev/null > +++ b/t/t7900-maintenance.sh > @@ -0,0 +1,22 @@ > +#!/bin/sh > + > +test_description='git maintenance builtin' > + > +GIT_TEST_COMMIT_GRAPH=0 > +GIT_TEST_MULTI_PACK_INDEX=0 Why does this disable commit graph and multipack index? Is that setting up for something that comes later? [...] > +test_expect_success 'gc [--auto]' ' > + GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run && > + GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && > + grep ",\"gc\"]" run-no-auto.txt && > + grep ",\"gc\",\"--auto\"]" run-auto.txt This feels a little odd in two ways: - the use of "git gc" isn't a user-facing detail, so this is testing implementation instead of the user-facing behavior. That's okay, though --- it can be useful to test internals sometimes. - the way that this tests for "git gc" feels brittle: if the trace emitter changes some day to include a space after the comma, for example, then the test would start failing. I thought that in the spirit of fakes, it could make sense to write a custom "git gc" command using test_write_script, but that isn't likely to work either since gc is a builtin. Perhaps this is suggesting that we need some central test helper for parsing traces so we can do this reliably in one place. What do you think? Thanks, Jonathan
On 7/29/2020 6:19 PM, Jonathan Nieder wrote: > Hi, > > Derrick Stolee wrote: > >> [Subject: maintenance: create basic maintenance runner] > > Seems sensible, and a good way to set up for the later patches. Let's > take a look at how it does that. > > [...] >> --- /dev/null >> +++ b/Documentation/git-maintenance.txt >> @@ -0,0 +1,57 @@ > [...] >> +SUBCOMMANDS >> +----------- >> + >> +run:: >> + Run one or more maintenance tasks. > > [jrnieder] How do I supply the tasks on the command line? Are they > parameters to this subcommand? If so, it could make sense for this to > say something like > > run <task>...:: Hopefully this is documented to your satisfaction when the ability to customize the tasks is implemented. > What is the exit code convention for "git maintenance run"? (Many Git > commands don't document this, but since we're making a new command > it seems worth building the habit.) Is this worth doing? Do we really want every command to document that "0 means success, everything else means failure, and some of those exit codes mean a specific kind of failure that is global to Git"? > [...] >> --- a/builtin/gc.c >> +++ b/builtin/gc.c >> @@ -699,3 +699,62 @@ int cmd_gc(int argc, const char **argv, const char *prefix) > [...] >> +static struct maintenance_opts { >> + int auto_flag; >> +} opts; > > Packing this in a struct feels a bit unusual. Is the struct going to > be passed somewhere, or could these be individual locals in > cmd_maintenance? This will grow, and I'd rather have one global struct than many individual global items. It makes it clearer when I use "opts.auto_flag" that this corresponds to whether "--auto" was provided as a command-line option. > [...] >> + >> +static int maintenance_task_gc(void) >> +{ >> + int result; >> + struct argv_array cmd = ARGV_ARRAY_INIT; >> + >> + argv_array_pushl(&cmd, "gc", NULL); >> + >> + if (opts.auto_flag) >> + argv_array_pushl(&cmd, "--auto", NULL); > > These are both pushing single strings, so they can use argv_array_push. Thanks. I noticed a few of these myself. Luckily, I'll be going through all of these invocations and replacing them with new methods soon ;) [1] https://lore.kernel.org/git/30933a71-3130-5478-cbfd-0ca5bb308cf2@gmail.com/ > [...] >> --- /dev/null >> +++ b/t/t7900-maintenance.sh >> @@ -0,0 +1,22 @@ >> +#!/bin/sh >> + >> +test_description='git maintenance builtin' >> + >> +GIT_TEST_COMMIT_GRAPH=0 >> +GIT_TEST_MULTI_PACK_INDEX=0 > > Why does this disable commit graph and multipack index? Is that setting > up for something that comes later? Yes, these don't need to be here yet. > [...] >> +test_expect_success 'gc [--auto]' ' >> + GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run && >> + GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && >> + grep ",\"gc\"]" run-no-auto.txt && >> + grep ",\"gc\",\"--auto\"]" run-auto.txt > > This feels a little odd in two ways: > > - the use of "git gc" isn't a user-facing detail, so this is testing > implementation instead of the user-facing behavior. That's okay, > though --- it can be useful to test internals sometimes. Consider this a "unit test" of the builtin. I'm testing whether the command-line arguments had an effect on the child process. > - the way that this tests for "git gc" feels brittle: if the trace > emitter changes some day to include a space after the comma, for > example, then the test would start failing. I thought that in the > spirit of fakes, it could make sense to write a custom "git gc" > command using test_write_script, but that isn't likely to work > either since gc is a builtin. > > Perhaps this is suggesting that we need some central test helper for > parsing traces so we can do this reliably in one place. What do you > think? I'm specifically using GIT_TRACE2_EVENT, which is intended for consumption by computer, not humans. Adding whitespace or otherwise changing the output format would be an unnecessary disruption that is more likely to have downstream effects with external tools. In some way, adding extra dependencies like this in our own test suite can help ensure the output format is more stable. If there is a better way to ask "Did my command call 'git gc' (with no arguments|with these arguments)?" then I'm happy to consider it. Thanks, -Stolee
Derrick Stolee wrote: > On 7/29/2020 6:19 PM, Jonathan Nieder wrote: >> [jrnieder] How do I supply the tasks on the command line? Are they >> parameters to this subcommand? If so, it could make sense for this to >> say something like >> >> run <task>...:: > > Hopefully this is documented to your satisfaction when the ability > to customize the tasks is implemented. Hm, do you mean that this change is too difficult to squash into the first patch? >> What is the exit code convention for "git maintenance run"? (Many Git >> commands don't document this, but since we're making a new command >> it seems worth building the habit.) > > Is this worth doing? Do we really want every command to document > that "0 means success, everything else means failure, and some of > those exit codes mean a specific kind of failure that is global to > Git"? I mean things like "what exit code does it run if there is no maintenance to do?" [...] >>> +static struct maintenance_opts { >>> + int auto_flag; >>> +} opts; >> >> Packing this in a struct feels a bit unusual. Is the struct going to >> be passed somewhere, or could these be individual locals in >> cmd_maintenance? > > This will grow, and I'd rather have one global struct than many > individual global items. It makes it clearer when I use > "opts.auto_flag" that this corresponds to whether "--auto" was > provided as a command-line option. That doesn't seem idiomatic within the Git codebase. Can they be locals instead? [...] >> Perhaps this is suggesting that we need some central test helper for >> parsing traces so we can do this reliably in one place. What do you >> think? > > I'm specifically using GIT_TRACE2_EVENT, which is intended for > consumption by computer, not humans. Adding whitespace or otherwise > changing the output format would be an unnecessary disruption that > is more likely to have downstream effects with external tools. > > In some way, adding extra dependencies like this in our own test > suite can help ensure the output format is more stable. Hm, I thought the contract for GIT_TRACE2_EVENT is that it is JSON, which is a well defined format. To that end, I would sincerely hope that external tools are not assuming anything beyond that the format is JSON. If that wasn't the intent, I'd rather that we not use a JSON-like format at all. That would make the expectations for evolving this interface clearer. > If there is a better way to ask "Did my command call 'git gc' (with > no arguments|with these arguments)?" then I'm happy to consider it. My proposal was just to factor this out into a function in test-lib-functions.sh so it's easy to evolve over time in one place. Thanks and hope that helps, Jonathan
Derrick Stolee wrote: > On 7/29/2020 6:19 PM, Jonathan Nieder wrote: > > Derrick Stolee wrote: >>> --- /dev/null >>> +++ b/Documentation/git-maintenance.txt >>> @@ -0,0 +1,57 @@ >> [...] >>> +SUBCOMMANDS >>> +----------- >>> + >>> +run:: >>> + Run one or more maintenance tasks. >> >> [jrnieder] How do I supply the tasks on the command line? Are they >> parameters to this subcommand? If so, it could make sense for this to >> say something like >> >> run <task>...:: > > Hopefully this is documented to your satisfaction when the ability > to customize the tasks is implemented. I think my last reply to this part missed the point a little, so let me try again. In multi-patch series, requests to make improvements to early parts that would be obsoleted by later parts may seem strange --- isn't this just wasted work? Sure, occasionally someone will stumble on the early part using "git bisect" or "git log", but isn't that rare enough to not be worth worrying too much about? And I'd agree with that reaction --- for a documentation nit like this one, that benefit alone is probably not enough to be worth the fuss of tweaking things. When I focus more on early parts of a series, it is for a different reason: from a release management point of view, it is useful to get the early part of a series "cooked" as soon as possible so it can be exposed to users in "next" and then become a foundation for later patches in "master". So we try to get as much of the early part of a series into that state as we can, and to merge the early part even without the later part as soon as the former is ready and valuable enough to users on its own. Relatedly, ensuring each commit is "complete" in this sense on its own gives us more flexibility in case a problem is discovered later: we can revert a patch that is causing issues without having to revert the preceding part. I think that might be part of the context I didn't convey well, and why I want to get this in tip-top shape and start using it. Thanks, Jonathan
On 7/30/2020 8:30 PM, Jonathan Nieder wrote: > Derrick Stolee wrote: >> On 7/29/2020 6:19 PM, Jonathan Nieder wrote: > >>> [jrnieder] How do I supply the tasks on the command line? Are they >>> parameters to this subcommand? If so, it could make sense for this to >>> say something like >>> >>> run <task>...:: >> >> Hopefully this is documented to your satisfaction when the ability >> to customize the tasks is implemented. > > Hm, do you mean that this change is too difficult to squash into the > first patch? I mean that there is only one task right now. Until the commit-graph task is implemented, there is no need to have a --task=<task> option. Creating a --task=<task> option in this patch would unnecessarily bloat the patch by implementing multiple tasks while filling in the boilerplate of creating a new builtin. >>> What is the exit code convention for "git maintenance run"? (Many Git >>> commands don't document this, but since we're making a new command >>> it seems worth building the habit.) >> >> Is this worth doing? Do we really want every command to document >> that "0 means success, everything else means failure, and some of >> those exit codes mean a specific kind of failure that is global to >> Git"? > > I mean things like "what exit code does it run if there is no > maintenance to do?" > > [...] >>>> +static struct maintenance_opts { >>>> + int auto_flag; >>>> +} opts; >>> >>> Packing this in a struct feels a bit unusual. Is the struct going to >>> be passed somewhere, or could these be individual locals in >>> cmd_maintenance? >> >> This will grow, and I'd rather have one global struct than many >> individual global items. It makes it clearer when I use >> "opts.auto_flag" that this corresponds to whether "--auto" was >> provided as a command-line option. > > That doesn't seem idiomatic within the Git codebase. > > Can they be locals instead? Which part do you want to be idiomatic? The fact that the parse-options library typically refers to static global values or the fact that the data is not organized in a struct? Turning these into locals is _even worse_ than the current model of static globals because that then requires updating the prototypes of all the (static) local methods that operate on the arguments. Every time a new option is added, that would cascade in method prototypes all over again. The natural thing to do to "fix" that situation is to create a struct that holds the information from the parsed command-line arguments. But then why is it a local parameter that is passed through all of the local methods instead of a global (as presented here in this patch)? > [...] >>> Perhaps this is suggesting that we need some central test helper for >>> parsing traces so we can do this reliably in one place. What do you >>> think? >> >> I'm specifically using GIT_TRACE2_EVENT, which is intended for >> consumption by computer, not humans. Adding whitespace or otherwise >> changing the output format would be an unnecessary disruption that >> is more likely to have downstream effects with external tools. >> >> In some way, adding extra dependencies like this in our own test >> suite can help ensure the output format is more stable. > > Hm, I thought the contract for GIT_TRACE2_EVENT is that it is JSON, > which is a well defined format. To that end, I would sincerely hope > that external tools are not assuming anything beyond that the format > is JSON. > > If that wasn't the intent, I'd rather that we not use a JSON-like > format at all. That would make the expectations for evolving this > interface clearer. > >> If there is a better way to ask "Did my command call 'git gc' (with >> no arguments|with these arguments)?" then I'm happy to consider it. > > My proposal was just to factor this out into a function in > test-lib-functions.sh so it's easy to evolve over time in one place. This is a valuable suggestion, but this series is already too large to make such a change in addition to the patches already here. I'd consider this more if there were more tests that checked "did we run this subcommand with these arguments?" Better to extract a helper when this is actually used in multiple places. Thanks, -Stolee
Derrick Stolee wrote: > On 7/30/2020 8:30 PM, Jonathan Nieder wrote: >> Derrick Stolee wrote: >>> On 7/29/2020 6:19 PM, Jonathan Nieder wrote: >>>> [jrnieder] How do I supply the tasks on the command line? Are they >>>> parameters to this subcommand? If so, it could make sense for this to >>>> say something like >>>> >>>> run <task>...:: >>> >>> Hopefully this is documented to your satisfaction when the ability >>> to customize the tasks is implemented. [...] > I mean that there is only one task right now. Until the commit-graph > task is implemented, there is no need to have a --task=<task> option. Ah, that wasn't clear to me from the docs. You're saying that "git maintenance run" runs the "gc" task? Can the documentation say so? [...] >>>>> +static struct maintenance_opts { >>>>> + int auto_flag; >>>>> +} opts; >>>> >>>> Packing this in a struct feels a bit unusual. Is the struct going to >>>> be passed somewhere, or could these be individual locals in >>>> cmd_maintenance? >>> >>> This will grow, and I'd rather have one global struct than many >>> individual global items. It makes it clearer when I use >>> "opts.auto_flag" that this corresponds to whether "--auto" was >>> provided as a command-line option. >> >> That doesn't seem idiomatic within the Git codebase. >> >> Can they be locals instead? > > Which part do you want to be idiomatic? The fact that the parse-options > library typically refers to static global values or the fact that the > data is not organized in a struct? parse-options has no requirement about the values being global. Some older code does that, but newer code tends to use locals when appropriate. Putting it in a struct is fine as long as that struct is being passed around. What isn't idiomatic in Git is using a global struct for namespacing. [...] > The natural thing to do to "fix" that situation is to create a struct > that holds the information from the parsed command-line arguments. But > then why is it a local parameter that is passed through all of the > local methods instead of a global (as presented here in this patch)? I'm having trouble parsing that last sentence. You're saying a global is preferable over passing around a pointer to a local "opts" struct? [...] >>> If there is a better way to ask "Did my command call 'git gc' (with >>> no arguments|with these arguments)?" then I'm happy to consider it. >> >> My proposal was just to factor this out into a function in >> test-lib-functions.sh so it's easy to evolve over time in one place. > > This is a valuable suggestion, but this series is already too large > to make such a change in addition to the patches already here. Hm, it's not clear to me that this would make the series significantly larger. And on the contrary, it would make the code less fragile. I think this is important. https://chromium.googlesource.com/chromium/src/+/master/docs/cl_respect.md#remember-communication-can-be-hard: if you'd like to meet out of band, let me know, and I'd be happy to go into this further. Thanks, Jonathan
On Mon, Aug 03, 2020 at 10:46:54AM -0700, Jonathan Nieder wrote: > Derrick Stolee wrote: > > On 7/30/2020 8:30 PM, Jonathan Nieder wrote: > >> Derrick Stolee wrote: > >>> On 7/29/2020 6:19 PM, Jonathan Nieder wrote: > > >>>> [jrnieder] How do I supply the tasks on the command line? Are they > >>>> parameters to this subcommand? If so, it could make sense for this to > >>>> say something like > >>>> > >>>> run <task>...:: > >>> > >>> Hopefully this is documented to your satisfaction when the ability > >>> to customize the tasks is implemented. > [...] > > I mean that there is only one task right now. Until the commit-graph > > task is implemented, there is no need to have a --task=<task> option. > > Ah, that wasn't clear to me from the docs. > > You're saying that "git maintenance run" runs the "gc" task? Can the > documentation say so? > > [...] > >>>>> +static struct maintenance_opts { > >>>>> + int auto_flag; > >>>>> +} opts; > >>>> > >>>> Packing this in a struct feels a bit unusual. Is the struct going to > >>>> be passed somewhere, or could these be individual locals in > >>>> cmd_maintenance? > >>> > >>> This will grow, and I'd rather have one global struct than many > >>> individual global items. It makes it clearer when I use > >>> "opts.auto_flag" that this corresponds to whether "--auto" was > >>> provided as a command-line option. > >> > >> That doesn't seem idiomatic within the Git codebase. > >> > >> Can they be locals instead? > > > > Which part do you want to be idiomatic? The fact that the parse-options > > library typically refers to static global values or the fact that the > > data is not organized in a struct? > > parse-options has no requirement about the values being global. Some > older code does that, but newer code tends to use locals when > appropriate. > > Putting it in a struct is fine as long as that struct is being passed > around. What isn't idiomatic in Git is using a global struct for > namespacing. > > [...] > > The natural thing to do to "fix" that situation is to create a struct > > that holds the information from the parsed command-line arguments. But > > then why is it a local parameter that is passed through all of the > > local methods instead of a global (as presented here in this patch)? > > I'm having trouble parsing that last sentence. You're saying a global > is preferable over passing around a pointer to a local "opts" struct? > > [...] > >>> If there is a better way to ask "Did my command call 'git gc' (with > >>> no arguments|with these arguments)?" then I'm happy to consider it. > >> > >> My proposal was just to factor this out into a function in > >> test-lib-functions.sh so it's easy to evolve over time in one place. > > > > This is a valuable suggestion, but this series is already too large > > to make such a change in addition to the patches already here. > > Hm, it's not clear to me that this would make the series significantly > larger. I think what Stolee is trying to say is less about a change that would make the series larger, it's about changing an already-large series. > And on the contrary, it would make the code less fragile. I think this > is important. I'm not sure that I see your argument. What we are really discussing is whether or not we should have a static struct outside of 'cmd_gc()', or a zero-initialized frame local struct within 'cmd_gc()'. I fully understand the arguments in favor of one or the other, but I struggle to grasp that this is worth our time to debate in this much detail. > https://chromium.googlesource.com/chromium/src/+/master/docs/cl_respect.md#remember-communication-can-be-hard: > if you'd like to meet out of band, let me know, and I'd be happy to go > into this further. I assume that the part of this document you were linking to was about meeting out-of-band to discuss an issue instead of emailing back and forth. I appreciate your willingness to resolve this expediently, but I would much prefer if these discussions (that is, the ones about the technical content of this series) happened on-list for the benefit of other reviewers. > Thanks, > Jonathan Thanks, Taylor
Taylor Blau wrote: > On Mon, Aug 03, 2020 at 10:46:54AM -0700, Jonathan Nieder wrote: >> Derrick Stolee wrote: >>> On 7/30/2020 8:30 PM, Jonathan Nieder wrote: >>>> Derrick Stolee wrote: >>>>> If there is a better way to ask "Did my command call 'git gc' (with >>>>> no arguments|with these arguments)?" then I'm happy to consider it. >>>> >>>> My proposal was just to factor this out into a function in >>>> test-lib-functions.sh so it's easy to evolve over time in one place. >>> >>> This is a valuable suggestion, but this series is already too large >>> to make such a change in addition to the patches already here. >> >> Hm, it's not clear to me that this would make the series significantly >> larger. > > I think what Stolee is trying to say is less about a change that would > make the series larger, it's about changing an already-large series. > >> And on the contrary, it would make the code less fragile. I think this >> is important. > > I'm not sure that I see your argument. What we are really discussing is > whether or not we should have a static struct outside of 'cmd_gc()', or > a zero-initialized frame local struct within 'cmd_gc()'. I fully > understand the arguments in favor of one or the other, but I struggle to > grasp that this is worth our time to debate in this much detail. Sorry for the lack of clarity. The proposal Stolee is pushing back on above is to have a helper in test-lib-functions.sh that tells a test whether an event it cares about happened in traces (in this example, invocation of "git gc"). I consider it important because if we are getting into the habit of having our tests assume the current exact byte-for-byte trace JSON output, then that is an assumption that is going to be painful to unravel. Factoring out a helper would also make the test easier to read, but that is less important than "an ounce of prevention is worth a pound of cure" to me in this example. Jonathan
On Mon, Aug 03, 2020 at 04:01:34PM -0700, Jonathan Nieder wrote: > Taylor Blau wrote: > > On Mon, Aug 03, 2020 at 10:46:54AM -0700, Jonathan Nieder wrote: > >> Derrick Stolee wrote: > >>> On 7/30/2020 8:30 PM, Jonathan Nieder wrote: > >>>> Derrick Stolee wrote: > > >>>>> If there is a better way to ask "Did my command call 'git gc' (with > >>>>> no arguments|with these arguments)?" then I'm happy to consider it. > >>>> > >>>> My proposal was just to factor this out into a function in > >>>> test-lib-functions.sh so it's easy to evolve over time in one place. > >>> > >>> This is a valuable suggestion, but this series is already too large > >>> to make such a change in addition to the patches already here. > >> > >> Hm, it's not clear to me that this would make the series significantly > >> larger. > > > > I think what Stolee is trying to say is less about a change that would > > make the series larger, it's about changing an already-large series. > > > >> And on the contrary, it would make the code less fragile. I think this > >> is important. > > > > I'm not sure that I see your argument. What we are really discussing is > > whether or not we should have a static struct outside of 'cmd_gc()', or > > a zero-initialized frame local struct within 'cmd_gc()'. I fully > > understand the arguments in favor of one or the other, but I struggle to > > grasp that this is worth our time to debate in this much detail. > > Sorry for the lack of clarity. The proposal Stolee is pushing back on > above is to have a helper in test-lib-functions.sh that tells a test > whether an event it cares about happened in traces (in this example, > invocation of "git gc"). Serves me right for reading too quickly ;). > I consider it important because if we are getting into the habit of > having our tests assume the current exact byte-for-byte trace JSON > output, then that is an assumption that is going to be painful to > unravel. Factoring out a helper would also make the test easier to > read, but that is less important than "an ounce of prevention is worth > a pound of cure" to me in this example. I still only partially buy into this, though. I had to deal with some rather fragile grepping through trace2 JSON in t4216 (? or something, the log-bloom tests) recently, and found it a little fragile, but not overly so. I'd rather just move forward, but I do feel strongly that we discuss these matters on the list. > Jonathan Thanks, Taylor
Taylor Blau wrote: > I still only partially buy into this, though. I had to deal with some > rather fragile grepping through trace2 JSON in t4216 (? or something, > the log-bloom tests) recently, and found it a little fragile, but not > overly so. > > I'd rather just move forward, This means we're promising to never change the JSON serialization, or that we're willing to pay with time in the future to update all tests that assume the current JSON serialization. I do not want to make that promise. I've lived through the same work of updating tests that assumed particular sha1s and having to change them to be more generic and do not want to have to do that again. If you're saying that you are willing to do that work when the time comes, then that counts for a lot. If you're saying that we should not change the JSON serialization, then it makes me worry that we made a mistake in choosing JSON as a format, since using a well defined format with an ecosystem of serialization + deserialization libraries was the whole point. Jonathan
Derrick Stolee wrote: > On 7/29/2020 6:19 PM, Jonathan Nieder wrote: >> Perhaps this is suggesting that we need some central test helper for >> parsing traces so we can do this reliably in one place. What do you >> think? > > I'm specifically using GIT_TRACE2_EVENT, which is intended for > consumption by computer, not humans. Adding whitespace or otherwise > changing the output format would be an unnecessary disruption that > is more likely to have downstream effects with external tools. > > In some way, adding extra dependencies like this in our own test > suite can help ensure the output format is more stable. I've discussed this a bit more at [1]: I do think this is output that we cannot promise to keep stable. Can you say a little more about why extracting a helper would be problematic? For example, would doing that in a separate patch at the end of the series be workable for you? More generally, one theme I'm sensing in the responses to the code review comments I sent is that the series may have gotten too long to be able to easily amend the beginning of it. To that end, I'm happy to make a reroll or make this easier in any other way I can. This also suggests to me that it may make sense to split off an early part of the series and get that into a shape you're comfortable with before rebasing the rest of the series on top as a separate topic (a tried and true method, used recently for the "git bugreport" command, for example). Thoughts? Thanks, Jonathan [1] https://lore.kernel.org/git/20200803231745.GB58587@google.com/
Jonathan Nieder <jrnieder@gmail.com> writes: > ... If you're saying that we should > not change the JSON serialization, then it makes me worry that we made > a mistake in choosing JSON as a format, since using a well defined > format with an ecosystem of serialization + deserialization libraries > was the whole point. Well said.
On 8/3/2020 7:17 PM, Jonathan Nieder wrote: > Taylor Blau wrote: > >> I still only partially buy into this, though. I had to deal with some >> rather fragile grepping through trace2 JSON in t4216 (? or something, >> the log-bloom tests) recently, and found it a little fragile, but not >> overly so. >> >> I'd rather just move forward, > > This means we're promising to never change the JSON serialization, or > that we're willing to pay with time in the future to update all tests > that assume the current JSON serialization. > > I do not want to make that promise. I've lived through the same work > of updating tests that assumed particular sha1s and having to change > them to be more generic and do not want to have to do that again. > > If you're saying that you are willing to do that work when the time > comes, then that counts for a lot. If you're saying that we should > not change the JSON serialization, then it makes me worry that we made > a mistake in choosing JSON as a format, since using a well defined > format with an ecosystem of serialization + deserialization libraries > was the whole point. The benefit of JSON is that it is simple to be _additive_. We can always jam more data into the format. What I'm doing is making the following assumptions: 1. There will always be an event line corresponding to launching a subcommand. 2. This line will contain the list of arguments used for that subcommand in a JSON array. 3. We don't add extra whitespace in that JSON array. The first two items are important promises made by the trace2 event output. By depending on them, I'm only adding more places where we expect the JSON *schema* to remain consistent. Rather, I only require that the JSON schema does not _remove_ that data which is expected by tools that are reading and reporting that data. Only the third item is something that adds an additional restriction on the JSON format. But also: why would we ever add extraneous whitespace characters to the output intended for computer parsing? This concern raises a giant "YAGNI" warning in my head. What is seems like you are asking instead is for me to create a tool in the test suite that parses each JSON line, extracts a specific member from that JSON object, reconstructs a command-line invocation from the JSON array, and reports whether that process worked for any line in the event output. That seems overly complicated and puts even more requirements on the JSON schema than I have placed in my current implementation. If this is to truly be a hard requirement for these tests to move forward, then I would rather do an alternate approach that would be a better test mechanism than scraping trace data: mock the subcommands. I would create a new directory, "t/mock", and create a new 'git' executable there. By setting --exec-path in our test commands, we would launch this mocked version of Git instead of our built version. The mock could report each set of command-line arguments to a known file. With some care, we could even provide an instructions file to tell it which error code to return or whether it should write data to stdout. This would allow testing things like the "rewrite after verify fails" behavior already in the maintenance builtin. If I'm to spend time engineering something more complicated just to check "did this subcommand run with these arguments?" then I'd rather do it this way. Thanks, -Stolee
Derrick Stolee wrote: > What is seems like you are asking instead is for me to create a tool > in the test suite that parses each JSON line, extracts a specific > member from that JSON object, reconstructs a command-line invocation > from the JSON array, and reports whether that process worked for any > line in the event output. No, that isn't what I'm asking. I'm asking for this patch series to take the existing "grep" lines and put them in a function in test-lib-functions.sh, so that we can change them in one place when the trace emitter changes. [...] > If this is to truly be a hard requirement for these tests to move > forward, Yes, from my point of view it really is. But that "is this truly a hard requirement?" comes up tells me I have not done a good job of communicating in this review. A review is about participants in the project working together to improve a patch, not people making demands at each other. [...] > If I'm to spend time engineering something more complicated just to > check "did this subcommand run with these arguments?" then I don't see why this is more complicated than what is in patch 1. In fact, I think it would be a little more simple. Jonathan
On 8/4/2020 10:42 AM, Jonathan Nieder wrote: > Derrick Stolee wrote: > >> What is seems like you are asking instead is for me to create a tool >> in the test suite that parses each JSON line, extracts a specific >> member from that JSON object, reconstructs a command-line invocation >> from the JSON array, and reports whether that process worked for any >> line in the event output. > > No, that isn't what I'm asking. > > I'm asking for this patch series to take the existing "grep" lines > and put them in a function in test-lib-functions.sh, so that we can > change them in one place when the trace emitter changes. Thanks for clarifying, clearing up my misinterpretation of your request. I'm coming around to this idea. > [...] >> If this is to truly be a hard requirement for these tests to move >> forward, > > Yes, from my point of view it really is. > > But that "is this truly a hard requirement?" comes up tells me I have > not done a good job of communicating in this review. A review is > about participants in the project working together to improve a patch, > not people making demands at each other. It's also about a balance. A patch author and a reviewer can have disagreements, and it is important to discover exactly how hard each is holding to their opinion. > [...] >> If I'm to spend time engineering something more complicated just to >> check "did this subcommand run with these arguments?" then > > I don't see why this is more complicated than what is in patch 1. In > fact, I think it would be a little more simple. My interpretation (parsing JSON) _was_ more complicated, hence my reservations. The helper still requires non-trivial script-fu (for someone like me who is bad at scripting languages) but centralizing the grep has value. Here is my attempt so far: trace2_subcommand_data () { command="$1" && commas=$(echo $command | sed s/\ /\",\"/g) && printf "[\"$commas\"]" } test_subcommand_called () { data=$(trace2_subcommand_data $1) && grep $data $2 } test_subcommand_not_called () { ! test_subcommand_called $1 $2 } with callers looking like test_subcommand_called "gc" run-no-auto.txt && test_subcommand_not_called "gc --auto" run-auto.txt && test_subcommand_called "gc --quiet" run-quiet.txt Thanks, -Stolee
Derrick Stolee wrote: > Here is my attempt so far: [...] > test_subcommand_called "gc" run-no-auto.txt && > test_subcommand_not_called "gc --auto" run-auto.txt && > test_subcommand_called "gc --quiet" run-quiet.txt I like it. Some tweaks: - can simplify by using 'printf' instead of sed - passing the trace in stdin - following the convention that helpers like test_i18ngrep use of accepting ! as a parameter to negate the exit code That would make (untested) # Check that the given command was invoked as part of the # trace2-format trace on stdin. # # test_subcommand_called [!] <command> <args>... < <trace> # # For example, to look for an invocation of "git upload-pack # /path/to/repo" # # GIT_TRACE2_EVENT=event.log git fetch ... && # test_subcommand_called git upload-pack "$PATH" <event.log # # If the first parameter passed is !, this instead checks that # the given command was not called. # test_subcommand_called () { local negate= if test "$1" = "!" then negate=t shift fi local expr=$(printf '"%s",' "$@") expr="${expr%,}" if test -n "$negate" then ! grep "\[$expr\]" else grep "\[$expr\]" fi }
On 8/4/2020 1:02 PM, Jonathan Nieder wrote: > Derrick Stolee wrote: > >> Here is my attempt so far: > [...] >> test_subcommand_called "gc" run-no-auto.txt && >> test_subcommand_not_called "gc --auto" run-auto.txt && >> test_subcommand_called "gc --quiet" run-quiet.txt > > I like it. Some tweaks: > - can simplify by using 'printf' instead of sed > - passing the trace in stdin > - following the convention that helpers like test_i18ngrep use of > accepting ! as a parameter to negate the exit code > > That would make (untested) > > # Check that the given command was invoked as part of the > # trace2-format trace on stdin. > # > # test_subcommand_called [!] <command> <args>... < <trace> > # > # For example, to look for an invocation of "git upload-pack > # /path/to/repo" > # > # GIT_TRACE2_EVENT=event.log git fetch ... && > # test_subcommand_called git upload-pack "$PATH" <event.log > # > # If the first parameter passed is !, this instead checks that > # the given command was not called. > # > test_subcommand_called () { > local negate= > if test "$1" = "!" > then > negate=t > shift > fi > > local expr=$(printf '"%s",' "$@") > expr="${expr%,}" > > if test -n "$negate" > then > ! grep "\[$expr\]" > else > grep "\[$expr\]" > fi > } This works! Thanks. With the ! usage, I think a rename to 'test_subcommand' makes sense. Further, I was somehow wrong about not including "git" in the beginning, so here is my example use of this method; test_subcommand git gc <run-no-auto.txt && test_subcommand ! git gc --auto <run-auto.txt && test_subcommand git gc --quiet <run-quiet.txt Thanks, -Stolee
On 8/3/2020 1:46 PM, Jonathan Nieder wrote: > Derrick Stolee wrote: >> On 7/30/2020 8:30 PM, Jonathan Nieder wrote: >>> Derrick Stolee wrote: >>>> On 7/29/2020 6:19 PM, Jonathan Nieder wrote: Revisiting the first part of this thread, as it got lost in the discussion about trace2 and JSON parsing... >>>>> [jrnieder] How do I supply the tasks on the command line? Are they >>>>> parameters to this subcommand? If so, it could make sense for this to >>>>> say something like >>>>> >>>>> run <task>...:: >>>> >>>> Hopefully this is documented to your satisfaction when the ability >>>> to customize the tasks is implemented. > [...] >> I mean that there is only one task right now. Until the commit-graph >> task is implemented, there is no need to have a --task=<task> option. > > Ah, that wasn't clear to me from the docs. > > You're saying that "git maintenance run" runs the "gc" task? Can the > documentation say so? In this patch, the documentation says Run tasks to optimize Git repository data the plural is a bit incorrect at this point, but not a huge deal. The run subcommand says "Run one or more maintenance tasks" which again is a bit misleading since "or more" is not technically possible. This becomes a bit obvious when we see that the task list only contains a 'gc' task. So no, the documentation in this patch doesn't say "Run the 'gc' task" because that could be gathered from context _and_ it will be incorrect by the time this patch is integrated as part of the series. So I struggle with finding a way to improve the documentation in this patch to satisfy your concerns. Any change made here will just be un-done by a later patch in the same series. I'd rather have something vague now and improve the documentation incrementally along with the implementation. Another point you made earlier, but seems to be dropped in this email: you wanted the return code documented. This also seems like wasteful text, because the best I can think of is This command will return 0 unless a maintenance task failed in an unrecoverable way, or other typical failure cases such as incorrect usage or the command being run outside of a Git repository. Please recommend something better, because that description suffers from being too generic/obvious when talking about the maintenance task and being an obvious repetition of Git's existing error code behavior. Perhaps this would be easier if there was an established pattern of documenting the return code. So, I went digging. The examples I could find are: 1. 'git cat-file -e' 2. 'git check-ref-format' 3. 'git commit-graph --object-dir=<dir>' 4. 'git cvsimport' 5. 'git fast-import' 6. 'git filter-branch' 7. 'git grep --quiet' 8. 'git gui citool' 9. 'git ls-remote --exit-code' 10. 'git merge-base --is-ancestor' 11. 'git push --recurse-submodules' 12. 'git reflog exists' 13. 'git rev-parse --parseopt' 14. 'git update-index --test-untracked-cache' Most of these are for specific options, and the exit code is documented with relation to how that specific option may lead to a non-zero exit code (1, 3, 7, 8, 9, 10, 11, 12, 13, 14). The one that is most clearly part of the main description is for the plumbing command 'git check-ref-format': Checks if a given 'refname' is acceptable, and exits with a non-zero status if it is not. This seems appropriate for something intended for use as a helper in script, with a very simple behavior. The exit code is the only way it provides any feedback to the caller! This use of the exit code as the only output is used by some other cases (1, 7, 9, 10, 11, 12, 13, 14). Of course, we also have examples like this in git-cvsimport.txt: Otherwise, success is indicated the Unix way, i.e. by simply exiting with a zero exit status. This phrase should be _assumed_ as part of Git itself, instead of written in the documentation for specific commands. The documentation for 'git fast-import' is more useful here: If fast-import is supplied invalid input it will terminate with a non-zero exit status and create a crash report... But, you can see that the exit code is not the focus, but the crash report _is_ important to document. 'git filter-branch' has a special exit code for the case that no commits were found to update. 'git gui citool' returns an error code if the window is closed before the command can complete. This is also not a core feature, so is not a good example. My only conclusion from this investigation is that the exit code is documented when it is useful, but otherwise the standard Unix pattern is assumed and not worth documenting. If this is something we want to change, then I don't think it valuable to have that discussion here. Instead, let's start that discussion elsewhere with a series that updates existing docs to catch up to a new standard of documenting the exit code. [1] https://git-scm.com/docs/git-cat-file#Documentation/git-cat-file.txt--e [2] https://git-scm.com/docs/git-check-ref-format#_description [3] https://git-scm.com/docs/git-commit-graph#Documentation/git-commit-graph.txt---object-dir [4] https://git-scm.com/docs/git-cvsimport#_output [5] https://git-scm.com/docs/git-fast-import#_crash_reports [6] https://git-scm.com/docs/git-filter-branch#_exit_status [7] https://git-scm.com/docs/git-grep#Documentation/git-grep.txt---quiet [8] https://git-scm.com/docs/git-gui#Documentation/git-gui.txt-codegitguicitoolcode [9] https://git-scm.com/docs/git-ls-remote#Documentation/git-ls-remote.txt---exit-code [10] https://git-scm.com/docs/git-merge-base#Documentation/git-merge-base.txt---is-ancestor [11] https://git-scm.com/docs/git-push#Documentation/git-push.txt---recurse-submodulescheckon-demandonlyno [12] https://git-scm.com/docs/git-reflog [13] https://git-scm.com/docs/git-rev-parse#_parseopt [14] https://git-scm.com/docs/git-update-index#Documentation/git-update-index.txt---test-untracked-cache > [...] >>>>>> +static struct maintenance_opts { >>>>>> + int auto_flag; >>>>>> +} opts; >>>>> >>>>> Packing this in a struct feels a bit unusual. Is the struct going to >>>>> be passed somewhere, or could these be individual locals in >>>>> cmd_maintenance? >>>> >>>> This will grow, and I'd rather have one global struct than many >>>> individual global items. It makes it clearer when I use >>>> "opts.auto_flag" that this corresponds to whether "--auto" was >>>> provided as a command-line option. >>> >>> That doesn't seem idiomatic within the Git codebase. >>> >>> Can they be locals instead? >> >> Which part do you want to be idiomatic? The fact that the parse-options >> library typically refers to static global values or the fact that the >> data is not organized in a struct? > > parse-options has no requirement about the values being global. Some > older code does that, but newer code tends to use locals when > appropriate. You're right. Global is not required. The struct needs to be 'static' or initialized for compilation. While trying to get rid of globals based on your recommendation, I see that the callback functions in the option-parsing _do_ require global state. This is used in "maintenance: add --task option" with the global tasks array, but it also uses a global count of "how many tasks have been selected" currently in the struct. Two ways to get around that issue include (1) making a global int, or (2) looping through the task list to count how many have been selected so far. Neither of these solutions is clean. (In the diff below, I chose option 2.) > Putting it in a struct is fine as long as that struct is being passed > around. What isn't idiomatic in Git is using a global struct for > namespacing. In my opinion, using a global struct would be better than the existing pattern of a bunch of global variables, but I get your point about precedence. More on this below. > [...] >> The natural thing to do to "fix" that situation is to create a struct >> that holds the information from the parsed command-line arguments. But >> then why is it a local parameter that is passed through all of the >> local methods instead of a global (as presented here in this patch)? > > I'm having trouble parsing that last sentence. You're saying a global > is preferable over passing around a pointer to a local "opts" struct? Using a "singleton" like the opts struct here, the code is definitely much simpler to read. There is only ever one instance, because the struct definition comes along with the instance declaration. By making it a local instance, we then need to pass around the struct as a parameter to all of the static methods in the builtin, even though there is only ever one instance of this struct per process. The two points above are the ones that led me to use the global instance in the first place. To get back to your point about being idiomatic, these builtins use a struct to group options, then pass it around as a parameter: * cat-file (struct batch_options) * checkout (struct checkout_opts) * rebase (struct rebase_options) * tag (struct create_tag_options) * worktree (struct add_opts) Unfortunately, there _is_ an example of the behavior I was following in this series: my own example in the sparse-checkout builtin. So, I'm just repeating my own mistakes that were not challenged before. I updated my v4-in-progress branch with that change, and the tip-to-tip diff is below. It required some non-trivial care around the --task=<task> callback (which can't see the local options struct) and other cases. Thanks, -Stolee --- diff --git a/builtin/gc.c b/builtin/gc.c index 1f0c519d14..3e473a1293 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -709,11 +709,10 @@ static const char * const builtin_maintenance_usage[] = { NULL }; -static struct maintenance_opts { +struct maintenance_opts { int auto_flag; int quiet; - int tasks_selected; -} opts; +}; /* Remember to update object flag allocation in object.h */ #define PARENT1 (1u<<16) @@ -789,7 +788,7 @@ static int should_write_commit_graph(void) return result; } -static int run_write_commit_graph(void) +static int run_write_commit_graph(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; @@ -797,13 +796,13 @@ static int run_write_commit_graph(void) strvec_pushl(&child.args, "commit-graph", "write", "--split", "--reachable", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); return !!run_command(&child); } -static int run_verify_commit_graph(void) +static int run_verify_commit_graph(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; @@ -811,24 +810,24 @@ static int run_verify_commit_graph(void) strvec_pushl(&child.args, "commit-graph", "verify", "--shallow", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); return !!run_command(&child); } -static int maintenance_task_commit_graph(void) +static int maintenance_task_commit_graph(struct maintenance_opts *opts) { struct repository *r = the_repository; char *chain_path; close_object_store(r->objects); - if (run_write_commit_graph()) { + if (run_write_commit_graph(opts)) { error(_("failed to write commit-graph")); return 1; } - if (!run_verify_commit_graph()) + if (!run_verify_commit_graph(opts)) return 0; warning(_("commit-graph verify caught error, rewriting")); @@ -840,14 +839,14 @@ static int maintenance_task_commit_graph(void) } free(chain_path); - if (!run_write_commit_graph()) + if (!run_write_commit_graph(opts)) return 0; error(_("failed to rewrite commit-graph")); return 1; } -static int fetch_remote(const char *remote) +static int fetch_remote(const char *remote, struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; @@ -855,7 +854,7 @@ static int fetch_remote(const char *remote) strvec_pushl(&child.args, "fetch", remote, "--prune", "--no-tags", "--no-write-fetch-head", "--refmap=", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--quiet"); strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote); @@ -871,7 +870,7 @@ static int fill_each_remote(struct remote *remote, void *cbdata) return 0; } -static int maintenance_task_prefetch(void) +static int maintenance_task_prefetch(struct maintenance_opts *opts) { int result = 0; struct string_list_item *item; @@ -886,23 +885,23 @@ static int maintenance_task_prefetch(void) for (item = remotes.items; item && item < remotes.items + remotes.nr; item++) - result |= fetch_remote(item->string); + result |= fetch_remote(item->string, opts); cleanup: string_list_clear(&remotes, 0); return result; } -static int maintenance_task_gc(void) +static int maintenance_task_gc(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_push(&child.args, "gc"); - if (opts.auto_flag) + if (opts->auto_flag) strvec_push(&child.args, "--auto"); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--quiet"); else strvec_push(&child.args, "--no-quiet"); @@ -911,14 +910,14 @@ static int maintenance_task_gc(void) return run_command(&child); } -static int prune_packed(void) +static int prune_packed(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_push(&child.args, "prune-packed"); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--quiet"); return !!run_command(&child); @@ -977,7 +976,7 @@ static int write_loose_object_to_stdin(const struct object_id *oid, return ++(d->count) > d->batch_size; } -static int pack_loose(void) +static int pack_loose(struct maintenance_opts *opts) { struct repository *r = the_repository; int result = 0; @@ -996,7 +995,7 @@ static int pack_loose(void) pack_proc.git_cmd = 1; strvec_push(&pack_proc.args, "pack-objects"); - if (opts.quiet) + if (opts->quiet) strvec_push(&pack_proc.args, "--quiet"); strvec_pushf(&pack_proc.args, "%s/pack/loose", r->objects->odb->path); @@ -1027,9 +1026,9 @@ static int pack_loose(void) return result; } -static int maintenance_task_loose_objects(void) +static int maintenance_task_loose_objects(struct maintenance_opts *opts) { - return prune_packed() || pack_loose(); + return prune_packed(opts) || pack_loose(opts); } static int incremental_repack_auto_condition(void) @@ -1061,14 +1060,14 @@ static int incremental_repack_auto_condition(void) return count >= incremental_repack_auto_limit; } -static int multi_pack_index_write(void) +static int multi_pack_index_write(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "write", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); if (run_command(&child)) @@ -1077,7 +1076,7 @@ static int multi_pack_index_write(void) return 0; } -static int rewrite_multi_pack_index(void) +static int rewrite_multi_pack_index(struct maintenance_opts *opts) { struct repository *r = the_repository; char *midx_name = get_midx_filename(r->objects->odb->path); @@ -1085,17 +1084,18 @@ static int rewrite_multi_pack_index(void) unlink(midx_name); free(midx_name); - return multi_pack_index_write(); + return multi_pack_index_write(opts); } -static int multi_pack_index_verify(const char *message) +static int multi_pack_index_verify(struct maintenance_opts *opts, + const char *message) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "verify", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); if (run_command(&child)) { @@ -1106,14 +1106,14 @@ static int multi_pack_index_verify(const char *message) return 0; } -static int multi_pack_index_expire(void) +static int multi_pack_index_expire(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "expire", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); close_object_store(the_repository->objects); @@ -1164,14 +1164,14 @@ static off_t get_auto_pack_size(void) return result_size; } -static int multi_pack_index_repack(void) +static int multi_pack_index_repack(struct maintenance_opts *opts) { struct child_process child = CHILD_PROCESS_INIT; child.git_cmd = 1; strvec_pushl(&child.args, "multi-pack-index", "repack", NULL); - if (opts.quiet) + if (opts->quiet) strvec_push(&child.args, "--no-progress"); strvec_pushf(&child.args, "--batch-size=%"PRIuMAX, @@ -1185,7 +1185,7 @@ static int multi_pack_index_repack(void) return 0; } -static int maintenance_task_incremental_repack(void) +static int maintenance_task_incremental_repack(struct maintenance_opts *opts) { prepare_repo_settings(the_repository); if (!the_repository->settings.core_multi_pack_index) { @@ -1193,22 +1193,22 @@ static int maintenance_task_incremental_repack(void) return 0; } - if (multi_pack_index_write()) + if (multi_pack_index_write(opts)) return 1; - if (multi_pack_index_verify("after initial write")) - return rewrite_multi_pack_index(); - if (multi_pack_index_expire()) + if (multi_pack_index_verify(opts, "after initial write")) + return rewrite_multi_pack_index(opts); + if (multi_pack_index_expire(opts)) return 1; - if (multi_pack_index_verify("after expire step")) - return !!rewrite_multi_pack_index(); - if (multi_pack_index_repack()) + if (multi_pack_index_verify(opts, "after expire step")) + return !!rewrite_multi_pack_index(opts); + if (multi_pack_index_repack(opts)) return 1; - if (multi_pack_index_verify("after repack step")) - return !!rewrite_multi_pack_index(); + if (multi_pack_index_verify(opts, "after repack step")) + return !!rewrite_multi_pack_index(opts); return 0; } -typedef int maintenance_task_fn(void); +typedef int maintenance_task_fn(struct maintenance_opts *opts); /* * An auto condition function returns 1 if the task should run @@ -1275,9 +1275,9 @@ static int compare_tasks_by_selection(const void *a_, const void *b_) return b->selected_order - a->selected_order; } -static int maintenance_run(void) +static int maintenance_run(struct maintenance_opts *opts) { - int i; + int i, found_selected = 0; int result = 0; struct lock_file lk; struct repository *r = the_repository; @@ -1291,7 +1291,7 @@ static int maintenance_run(void) * recursive process stack. Do not report an error in * that case. */ - if (!opts.auto_flag && !opts.quiet) + if (!opts->auto_flag && !opts->quiet) error(_("lock file '%s' exists, skipping maintenance"), lock_path); free(lock_path); @@ -1299,23 +1299,26 @@ static int maintenance_run(void) } free(lock_path); - if (opts.tasks_selected) + for (i = 0; !found_selected && i < TASK__COUNT; i++) + found_selected = tasks[i].selected; + + if (found_selected) QSORT(tasks, TASK__COUNT, compare_tasks_by_selection); for (i = 0; i < TASK__COUNT; i++) { - if (opts.tasks_selected && !tasks[i].selected) + if (found_selected && !tasks[i].selected) continue; - if (!opts.tasks_selected && !tasks[i].enabled) + if (!found_selected && !tasks[i].enabled) continue; - if (opts.auto_flag && + if (opts->auto_flag && (!tasks[i].auto_condition || !tasks[i].auto_condition())) continue; trace2_region_enter("maintenance", tasks[i].name, r); - if (tasks[i].fn()) { + if (tasks[i].fn(opts)) { error(_("task '%s' failed"), tasks[i].name); result = 1; } @@ -1349,17 +1352,15 @@ static void initialize_task_config(void) static int task_option_parse(const struct option *opt, const char *arg, int unset) { - int i; + int i, num_selected = 0; struct maintenance_task *task = NULL; BUG_ON_OPT_NEG(unset); - opts.tasks_selected++; - for (i = 0; i < TASK__COUNT; i++) { + num_selected += tasks[i].selected; if (!strcasecmp(tasks[i].name, arg)) { task = &tasks[i]; - break; } } @@ -1374,13 +1375,14 @@ static int task_option_parse(const struct option *opt, } task->selected = 1; - task->selected_order = opts.tasks_selected; + task->selected_order = num_selected + 1; return 0; } int cmd_maintenance(int argc, const char **argv, const char *prefix) { + static struct maintenance_opts opts; static struct option builtin_maintenance_options[] = { OPT_BOOL(0, "auto", &opts.auto_flag, N_("run tasks based on the state of the repository")), @@ -1408,7 +1410,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix) if (argc == 1) { if (!strcmp(argv[0], "run")) - return maintenance_run(); + return maintenance_run(&opts); } usage_with_options(builtin_maintenance_usage,
diff --git a/.gitignore b/.gitignore index ee509a2ad2..a5808fa30d 100644 --- a/.gitignore +++ b/.gitignore @@ -90,6 +90,7 @@ /git-ls-tree /git-mailinfo /git-mailsplit +/git-maintenance /git-merge /git-merge-base /git-merge-index diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt new file mode 100644 index 0000000000..34cd2b4417 --- /dev/null +++ b/Documentation/git-maintenance.txt @@ -0,0 +1,57 @@ +git-maintenance(1) +================== + +NAME +---- +git-maintenance - Run tasks to optimize Git repository data + + +SYNOPSIS +-------- +[verse] +'git maintenance' run [<options>] + + +DESCRIPTION +----------- +Run tasks to optimize Git repository data, speeding up other Git commands +and reducing storage requirements for the repository. ++ +Git commands that add repository data, such as `git add` or `git fetch`, +are optimized for a responsive user experience. These commands do not take +time to optimize the Git data, since such optimizations scale with the full +size of the repository while these user commands each perform a relatively +small action. ++ +The `git maintenance` command provides flexibility for how to optimize the +Git repository. + +SUBCOMMANDS +----------- + +run:: + Run one or more maintenance tasks. + +TASKS +----- + +gc:: + Cleanup unnecessary files and optimize the local repository. "GC" + stands for "garbage collection," but this task performs many + smaller tasks. This task can be rather expensive for large + repositories, as it repacks all Git objects into a single pack-file. + It can also be disruptive in some situations, as it deletes stale + data. + +OPTIONS +------- +--auto:: + When combined with the `run` subcommand, run maintenance tasks + only if certain thresholds are met. For example, the `gc` task + runs when the number of loose objects exceeds the number stored + in the `gc.auto` config setting, or when the number of pack-files + exceeds the `gc.autoPackLimit` config setting. + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/builtin.h b/builtin.h index a5ae15bfe5..17c1c0ce49 100644 --- a/builtin.h +++ b/builtin.h @@ -167,6 +167,7 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix); int cmd_ls_remote(int argc, const char **argv, const char *prefix); int cmd_mailinfo(int argc, const char **argv, const char *prefix); int cmd_mailsplit(int argc, const char **argv, const char *prefix); +int cmd_maintenance(int argc, const char **argv, const char *prefix); int cmd_merge(int argc, const char **argv, const char *prefix); int cmd_merge_base(int argc, const char **argv, const char *prefix); int cmd_merge_index(int argc, const char **argv, const char *prefix); diff --git a/builtin/gc.c b/builtin/gc.c index 8e0b9cf41b..8d73c77f3a 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -699,3 +699,62 @@ int cmd_gc(int argc, const char **argv, const char *prefix) return 0; } + +static const char * const builtin_maintenance_usage[] = { + N_("git maintenance run [<options>]"), + NULL +}; + +static struct maintenance_opts { + int auto_flag; +} opts; + +static int maintenance_task_gc(void) +{ + int result; + struct argv_array cmd = ARGV_ARRAY_INIT; + + argv_array_pushl(&cmd, "gc", NULL); + + if (opts.auto_flag) + argv_array_pushl(&cmd, "--auto", NULL); + + close_object_store(the_repository->objects); + result = run_command_v_opt(cmd.argv, RUN_GIT_CMD); + argv_array_clear(&cmd); + + return result; +} + +static int maintenance_run(void) +{ + return maintenance_task_gc(); +} + +int cmd_maintenance(int argc, const char **argv, const char *prefix) +{ + static struct option builtin_maintenance_options[] = { + OPT_BOOL(0, "auto", &opts.auto_flag, + N_("run tasks based on the state of the repository")), + OPT_END() + }; + + memset(&opts, 0, sizeof(opts)); + + if (argc == 2 && !strcmp(argv[1], "-h")) + usage_with_options(builtin_maintenance_usage, + builtin_maintenance_options); + + argc = parse_options(argc, argv, prefix, + builtin_maintenance_options, + builtin_maintenance_usage, + PARSE_OPT_KEEP_UNKNOWN); + + if (argc == 1) { + if (!strcmp(argv[0], "run")) + return maintenance_run(); + } + + usage_with_options(builtin_maintenance_usage, + builtin_maintenance_options); +} diff --git a/git.c b/git.c index 2f021b97f3..ff56d1df24 100644 --- a/git.c +++ b/git.c @@ -527,6 +527,7 @@ static struct cmd_struct commands[] = { { "ls-tree", cmd_ls_tree, RUN_SETUP }, { "mailinfo", cmd_mailinfo, RUN_SETUP_GENTLY | NO_PARSEOPT }, { "mailsplit", cmd_mailsplit, NO_PARSEOPT }, + { "maintenance", cmd_maintenance, RUN_SETUP_GENTLY | NO_PARSEOPT }, { "merge", cmd_merge, RUN_SETUP | NEED_WORK_TREE }, { "merge-base", cmd_merge_base, RUN_SETUP }, { "merge-file", cmd_merge_file, RUN_SETUP_GENTLY }, diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh new file mode 100755 index 0000000000..d00641c4dd --- /dev/null +++ b/t/t7900-maintenance.sh @@ -0,0 +1,22 @@ +#!/bin/sh + +test_description='git maintenance builtin' + +GIT_TEST_COMMIT_GRAPH=0 +GIT_TEST_MULTI_PACK_INDEX=0 + +. ./test-lib.sh + +test_expect_success 'help text' ' + test_must_fail git maintenance -h 2>err && + test_i18ngrep "usage: git maintenance run" err +' + +test_expect_success 'gc [--auto]' ' + GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" git maintenance run && + GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" git maintenance run --auto && + grep ",\"gc\"]" run-no-auto.txt && + grep ",\"gc\",\"--auto\"]" run-auto.txt +' + +test_done