Message ID | 3cfd23073a036ea426569769b3e31290076257a6.1733515638.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | PATH WALK III: Add 'git backfill' command | expand |
On Fri, Dec 06, 2024 at 08:07:16PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <derrickstolee@github.com> > > Users may want to specify a minimum batch size for their needs. This is only > a minimum: the path-walk API provides a list of OIDs that correspond to the > same path, and thus it is optimal to allow delta compression across those > objects in a single server request. Okay, here you explicitly say that this is a minimum batch size, so this is by design and with a proper reason. Good. > We could consider limiting the request to have a maximum batch size in the > future. For now, we let the path-walk API batches determine the > boundaries. Should we maybe rename `--batch-size` to `--min-batch-size` so that it does not become awkward if we ever want to have a maximum batch size, as well? Also helps to set expectations with the user. [snip] > Based on these experiments, a batch size of 50,000 was chosen as the > default value. Thanks for all the data, this is really helpful! > diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt > index 0e10f066fef..9b0bae04e9d 100644 > --- a/Documentation/git-backfill.txt > +++ b/Documentation/git-backfill.txt > @@ -38,6 +38,14 @@ delta compression in the packfile sent by the server. > By default, `git backfill` downloads all blobs reachable from the `HEAD` > commit. This set can be restricted or expanded using various options. > > +OPTIONS > +------- > + > +--batch-size=<n>:: > + Specify a minimum size for a batch of missing objects to request > + from the server. This size may be exceeded by the last set of > + blobs seen at a given path. Default batch size is 16,000. This is stale: s/16,000/50,000/ > diff --git a/builtin/backfill.c b/builtin/backfill.c > index e5f2000d5e0..127333daef8 100644 > --- a/builtin/backfill.c > +++ b/builtin/backfill.c > @@ -112,6 +112,8 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, > struct reposit > .batch_size = 50000, > }; > struct option options[] = { > + OPT_INTEGER(0, "batch-size", &ctx.batch_size, > + N_("Minimun number of objects to request at a time")), s/Minimun/Minimum Patrick
On 12/16/24 3:01 AM, Patrick Steinhardt wrote: > On Fri, Dec 06, 2024 at 08:07:16PM +0000, Derrick Stolee via GitGitGadget wrote: >> From: Derrick Stolee <derrickstolee@github.com> >> >> Users may want to specify a minimum batch size for their needs. This is only >> a minimum: the path-walk API provides a list of OIDs that correspond to the >> same path, and thus it is optimal to allow delta compression across those >> objects in a single server request. > > Okay, here you explicitly say that this is a minimum batch size, so this > is by design and with a proper reason. Good. > >> We could consider limiting the request to have a maximum batch size in the >> future. For now, we let the path-walk API batches determine the >> boundaries. > > Should we maybe rename `--batch-size` to `--min-batch-size` so that it > does not become awkward if we ever want to have a maximum batch size, as > well? Also helps to set expectations with the user. Good idea. Will do. >> diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt >> index 0e10f066fef..9b0bae04e9d 100644 >> --- a/Documentation/git-backfill.txt >> +++ b/Documentation/git-backfill.txt >> @@ -38,6 +38,14 @@ delta compression in the packfile sent by the server. >> By default, `git backfill` downloads all blobs reachable from the `HEAD` >> commit. This set can be restricted or expanded using various options. >> >> +OPTIONS >> +------- >> + >> +--batch-size=<n>:: >> + Specify a minimum size for a batch of missing objects to request >> + from the server. This size may be exceeded by the last set of >> + blobs seen at a given path. Default batch size is 16,000. > > This is stale: s/16,000/50,000/ Thanks! >> diff --git a/builtin/backfill.c b/builtin/backfill.c >> index e5f2000d5e0..127333daef8 100644 >> --- a/builtin/backfill.c >> +++ b/builtin/backfill.c >> @@ -112,6 +112,8 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, >> struct reposit >> .batch_size = 50000, >> }; >> struct option options[] = { >> + OPT_INTEGER(0, "batch-size", &ctx.batch_size, >> + N_("Minimun number of objects to request at a time")), > > s/Minimun/Minimum Thanks for the careful eye for detail. -Stolee
diff --git a/Documentation/git-backfill.txt b/Documentation/git-backfill.txt index 0e10f066fef..9b0bae04e9d 100644 --- a/Documentation/git-backfill.txt +++ b/Documentation/git-backfill.txt @@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone SYNOPSIS -------- [verse] -'git backfill' [<options>] +'git backfill' [--batch-size=<n>] DESCRIPTION ----------- @@ -38,6 +38,14 @@ delta compression in the packfile sent by the server. By default, `git backfill` downloads all blobs reachable from the `HEAD` commit. This set can be restricted or expanded using various options. +OPTIONS +------- + +--batch-size=<n>:: + Specify a minimum size for a batch of missing objects to request + from the server. This size may be exceeded by the last set of + blobs seen at a given path. Default batch size is 16,000. + SEE ALSO -------- linkgit:git-clone[1]. diff --git a/builtin/backfill.c b/builtin/backfill.c index e5f2000d5e0..127333daef8 100644 --- a/builtin/backfill.c +++ b/builtin/backfill.c @@ -21,7 +21,7 @@ #include "path-walk.h" static const char * const builtin_backfill_usage[] = { - N_("git backfill [<options>]"), + N_("git backfill [--batch-size=<n>]"), NULL }; @@ -112,6 +112,8 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit .batch_size = 50000, }; struct option options[] = { + OPT_INTEGER(0, "batch-size", &ctx.batch_size, + N_("Minimun number of objects to request at a time")), OPT_END(), }; diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh index 64326362d80..32e2bb1c132 100755 --- a/t/t5620-backfill.sh +++ b/t/t5620-backfill.sh @@ -59,6 +59,24 @@ test_expect_success 'do partial clone 1, backfill gets all objects' ' test_line_count = 0 revs2 ' +test_expect_success 'do partial clone 2, backfill batch size' ' + git clone --no-checkout --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" backfill2 && + + GIT_TRACE2_EVENT="$(pwd)/batch-trace" git \ + -C backfill2 backfill --batch-size=20 && + + # Batches were used + test_trace2_data promisor fetch_count 20 <batch-trace >matches && + test_line_count = 2 matches && + test_trace2_data promisor fetch_count 8 <batch-trace && + + # No more missing objects! + git -C backfill2 rev-list --quiet --objects --missing=print HEAD >revs2 && + test_line_count = 0 revs2 +' + . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd