Message ID | pull.1206.git.git.1643248180.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Series | repack: add --filter= |
Hi Johannes,

I'm not sure where I went wrong on GGG. Somehow the cc list didn't get
translated into cc fields. Here's the PR: https://github.com/git/git/pull/1206.

Thanks!

cc'ing folks I meant to cc for this patch series.

On 8 Feb 2022, at 21:10, John Cai via GitGitGadget wrote:

> This patch series makes partial clone more useful by making it possible to
> run repack to remove objects from a repository (replacing them with promisor
> objects). This is useful when we want to offload large blobs from a git
> server onto another git server, or even use an http server through a remote
> helper.
>
> In [A], a --refilter option on fetch and fetch-pack is being discussed where
> either a less restrictive or more restrictive filter can be used. In the
> more restrictive case, the objects that already exist will not be deleted.
> But, one can imagine that users might want the ability to delete objects
> when they apply a more restrictive filter in order to save space, and this
> patch series would also allow that.
>
> There are a couple of things we need to adjust to make this possible. This
> patch has four parts.
>
> 1. Allow --filter in pack-objects without --stdout
> 2. Add a --filter flag for repack
> 3. Allow missing promisor objects in upload-pack
> 4. Tests that demonstrate the ability to offload objects onto an http
>    remote
>
> cc: Christian Couder christian.couder@gmail.com cc: Derrick Stolee
> stolee@gmail.com cc: Robert Coup robert@coup.net.nz
>
> A.
> https://lore.kernel.org/git/pull.1138.git.1643730593.gitgitgadget@gmail.com/
>
> John Cai (4):
>   pack-objects: allow --filter without --stdout
>   repack: add --filter=<filter-spec> option
>   upload-pack: allow missing promisor objects
>   tests for repack --filter mode
>
>  Documentation/git-repack.txt   |   5 +
>  builtin/pack-objects.c         |   2 -
>  builtin/repack.c               |  22 +++--
>  t/lib-httpd.sh                 |   2 +
>  t/lib-httpd/apache.conf        |   8 ++
>  t/lib-httpd/list.sh            |  43 +++++++++
>  t/lib-httpd/upload.sh          |  46 +++++++++
>  t/t0410-partial-clone.sh       |  81 ++++++++++++++++
>  t/t0410/git-remote-testhttpgit | 170 +++++++++++++++++++++++++++++++++
>  t/t7700-repack.sh              |  20 ++++
>  upload-pack.c                  |   5 +
>  11 files changed, 395 insertions(+), 9 deletions(-)
>  create mode 100644 t/lib-httpd/list.sh
>  create mode 100644 t/lib-httpd/upload.sh
>  create mode 100755 t/t0410/git-remote-testhttpgit
>
>
> base-commit: 38062e73e009f27ea192d50481fcb5e7b0e9d6eb
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1206%2Fjohn-cai%2Fjc-repack-filter-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1206/john-cai/jc-repack-filter-v2
> Pull-Request: https://github.com/git/git/pull/1206
>
> Range-diff vs v1:
>
>  1:  0eec9b117da = 1:  f43b76ca650 pack-objects: allow --filter without --stdout
>  -:  ----------- > 2:  6e7c8410b8d repack: add --filter=<filter-spec> option
>  -:  ----------- > 3:  40612b9663b upload-pack: allow missing promisor objects
>  2:  a3166381572 ! 4:  d76faa1f16e repack: add --filter=<filter-spec> option
>     @@ Metadata
>      Author: John Cai <johncai86@gmail.com>
>
>       ## Commit message ##
>     -    repack: add --filter=<filter-spec> option
>     +    tests for repack --filter mode
>
>     -    Currently, repack does not work with partial clones. When repack is run
>     -    on a partially cloned repository, it grabs all missing objects from
>     -    promisor remotes. This also means that when gc is run for repository
>     -    maintenance on a partially cloned repository, it will end up getting
>     -    missing objects, which is not what we want.
>     -
>     -    In order to make repack work with partial clone, teach repack a new
>     -    option --filter, which takes a <filter-spec> argument. repack will skip
>     -    any objects that are matched by <filter-spec> similar to how the clone
>     -    command will skip fetching certain objects.
>     -
>     -    The final goal of this feature, is to be able to store objects on a
>     -    server other than the regular git server itself.
>     +    This patch adds tests to test both repack --filter functionality in
>     +    isolation (in t7700-repack.sh) as well as how it can be used to offload
>     +    large blobs (in t0410-partial-clone.sh)
>
>          There are several scripts added so we can test the process of using a
>     -    remote helper to upload blobs to an http server:
>     +    remote helper to upload blobs to an http server.
>
>          - t/lib-httpd/list.sh lists blobs uploaded to the http server.
>          - t/lib-httpd/upload.sh uploads blobs to the http server.
>     @@ Commit message
>          Based-on-patch-by: Christian Couder <chriscool@tuxfamily.org>
>          Signed-off-by: John Cai <johncai86@gmail.com>
>
>     - ## Documentation/git-repack.txt ##
>     -@@ Documentation/git-repack.txt: depth is 4095.
>     - 	a larger and slower repository; see the discussion in
>     - 	`pack.packSizeLimit`.
>     -
>     -+--filter=<filter-spec>::
>     -+	Omits certain objects (usually blobs) from the resulting
>     -+	packfile. See linkgit:git-rev-list[1] for valid
>     -+	`<filter-spec>` forms.
>     -+
>     - -b::
>     - --write-bitmap-index::
>     - 	Write a reachability bitmap index as part of the repack. This
>     -
>     - ## builtin/repack.c ##
>     -@@ builtin/repack.c: struct pack_objects_args {
>     - 	const char *depth;
>     - 	const char *threads;
>     - 	const char *max_pack_size;
>     -+	const char *filter;
>     - 	int no_reuse_delta;
>     - 	int no_reuse_object;
>     - 	int quiet;
>     -@@ builtin/repack.c: static void prepare_pack_objects(struct child_process *cmd,
>     - 		strvec_pushf(&cmd->args, "--threads=%s", args->threads);
>     - 	if (args->max_pack_size)
>     - 		strvec_pushf(&cmd->args, "--max-pack-size=%s", args->max_pack_size);
>     -+	if (args->filter)
>     -+		strvec_pushf(&cmd->args, "--filter=%s", args->filter);
>     - 	if (args->no_reuse_delta)
>     - 		strvec_pushf(&cmd->args, "--no-reuse-delta");
>     - 	if (args->no_reuse_object)
>     -@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
>     - 				N_("limits the maximum number of threads")),
>     - 		OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
>     - 				N_("maximum size of each packfile")),
>     -+		OPT_STRING(0, "filter", &po_args.filter, N_("args"),
>     -+				N_("object filtering")),
>     - 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
>     - 				N_("repack objects in packs marked with .keep")),
>     - 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
>     -@@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
>     - 		if (line.len != the_hash_algo->hexsz)
>     - 			die(_("repack: Expecting full hex object ID lines only from pack-objects."));
>     - 		string_list_append(&names, line.buf);
>     -+		if (po_args.filter) {
>     -+			char *promisor_name = mkpathdup("%s-%s.promisor", packtmp,
>     -+							line.buf);
>     -+			write_promisor_file(promisor_name, NULL, 0);
>     -+		}
>     - 	}
>     - 	fclose(out);
>     - 	ret = finish_command(&cmd);
>     -
>       ## t/lib-httpd.sh ##
>      @@ t/lib-httpd.sh: prepare_httpd() {
>       	install_script error-smart-http.sh
>      @@ t/t0410-partial-clone.sh: test_expect_success 'fetching of missing objects from
>      +	git -C server rev-list --objects --all --missing=print >objects &&
>      +	grep "$sha" objects
>      +'
>     ++
>     ++test_expect_success 'fetch does not cause server to fetch missing objects' '
>     ++	rm -rf origin server client &&
>     ++	test_create_repo origin &&
>     ++	dd if=/dev/zero of=origin/file1 bs=801k count=1 &&
>     ++	git -C origin add file1 &&
>     ++	git -C origin commit -m "large blob" &&
>     ++	sha="$(git -C origin rev-parse :file1)" &&
>     ++	expected="?$(git -C origin rev-parse :file1)" &&
>     ++	git clone --bare --no-local origin server &&
>     ++	git -C server remote add httpremote "testhttpgit::${PWD}/server" &&
>     ++	git -C server config remote.httpremote.promisor true &&
>     ++	git -C server config --remove-section remote.origin &&
>     ++	git -C server rev-list --all --objects --filter-print-omitted \
>     ++		--filter=blob:limit=800k | perl -ne "print if s/^[~]//" \
>     ++		>large_blobs.txt &&
>     ++	upload_blobs_from_stdin server <large_blobs.txt &&
>     ++	git -C server -c repack.writebitmaps=false repack -a -d \
>     ++		--filter=blob:limit=800k &&
>     ++	git -C server config uploadpack.allowmissingpromisor true &&
>     ++	git clone -c remote.httpremote.url="testhttpgit::${PWD}/server" \
>     ++	-c remote.httpremote.fetch='+refs/heads/*:refs/remotes/httpremote/*' \
>     ++	-c remote.httpremote.promisor=true --bare --no-local \
>     ++	--filter=blob:limit=800k server client &&
>     ++	git -C client rev-list --objects --all --missing=print >client_objects &&
>     ++	grep "$expected" client_objects &&
>     ++	git -C server rev-list --objects --all --missing=print >server_objects &&
>     ++	grep "$expected" server_objects
>     ++'
>      +
>       # DO NOT add non-httpd-specific tests here, because the last part of this
>       # test script is only executed when httpd is available and enabled.
>
> --
> gitgitgadget
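
For anyone who wants to try the blob-selection step from the quoted t0410
test outside the test suite, here is a rough sketch. It only uses rev-list
options that the quoted test already relies on (--filter and
--filter-print-omitted). The helper name list_omitted_blobs, the throwaway
repository and the demo identity are made up for illustration and are not
part of the series, and sed stands in for the perl one-liner.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch series): print the IDs of
# the objects that <filter-spec> would omit from <repo>.
# Usage: list_omitted_blobs <repo> <filter-spec>
list_omitted_blobs () {
	# --filter-print-omitted marks each omitted object with a "~"
	# prefix; keep only those lines and strip the prefix.
	git -C "$1" rev-list --all --objects --filter="$2" \
		--filter-print-omitted | sed -n 's/^~//p'
}

# Demo in a throwaway repository; the path, file size and identity
# below are illustrative assumptions.
demo=$(mktemp -d) &&
git init -q "$demo" &&
dd if=/dev/zero of="$demo/file1" bs=1024 count=801 2>/dev/null &&
git -C "$demo" add file1 &&
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
	commit -q -m "large blob" &&
# file1 is 801 KiB, so a blob:limit=800k filter omits it.
list_omitted_blobs "$demo" blob:limit=800k
```

The printed ID should match `git -C "$demo" rev-parse :file1`; in the quoted
test, that list is what gets fed to upload_blobs_from_stdin before the
filtered repack.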