From patchwork Sat Aug 12 00:00:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christian Couder X-Patchwork-Id: 13351584 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0FADC001B0 for ; Sat, 12 Aug 2023 00:00:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237261AbjHLAAm (ORCPT ); Fri, 11 Aug 2023 20:00:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37226 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235924AbjHLAAi (ORCPT ); Fri, 11 Aug 2023 20:00:38 -0400 Received: from mail-ot1-x330.google.com (mail-ot1-x330.google.com [IPv6:2607:f8b0:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AED831719 for ; Fri, 11 Aug 2023 17:00:37 -0700 (PDT) Received: by mail-ot1-x330.google.com with SMTP id 46e09a7af769-6bd3317144fso829795a34.1 for ; Fri, 11 Aug 2023 17:00:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691798436; x=1692403236; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tZxoPxNblmOwSetYVOqsVtOz24b+OzDiRmBwJUcA1PU=; b=pO5bMR+THwZpj240bKMqdO5Q0INaC7cvNbWS9GVCFeI1FPogakyiVv/ExZ/s6DHzfB GCxABbxxhKwlCuQW1GZ4Bi8YaNgxs5Eq8oTtbjOGaavMjIiP6D1XpWUAROLG4Rz4sia4 OQCAupOyc5Q0i70BD2wicVM3sEY87Hk/klMhXtdDuIxPFk9MtQQAkBT1GgzONRuDgA/2 AXACwpJUVh5jNQ7575zaaGP49eXgEvkkF8ExMoyz3VfUqO0tBDl+R35OPCiWXa2abc/H nxeJgYsoPEDLb7/QosUVKJUfe+9+90NX0WPf1jg4OvLw/JJx2ERUgwbgvbEzQMBy/Cmh 4XyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691798436; x=1692403236; h=content-transfer-encoding:mime-version:references:in-reply-to 
:message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tZxoPxNblmOwSetYVOqsVtOz24b+OzDiRmBwJUcA1PU=; b=ZClsQc8oVObgh9996pDRp1iJ6Elg/pVaV4E7LxoaW4k9PPjJWqO9MiyOty+mCUiZ0W TK2909NWfRHmfTaL5aw373GQx1/qSrzDf6mXZRJm8VNLkoClp4XuKuZxdT8oMy5Aa7xw WBnTssvw8bHafDAtp9kqPUmzHzmH+KN9OQ+NrDVjrgVijAteHZkhH5VRGqQ3tkc6pT4N HOhXJEy0LHrXkTpykwnVdVg5xNjLR+YHpGlCUfm8HJvNyzVWFZFQ2uWuCHR0sa+CdT0W 6ZYSSpH0Ogd9KPy92f0t+bKl5/we1e/L/HglfvyEYiDIcy8CvDmhDEouJEB08ghbGnir 8hKQ== X-Gm-Message-State: AOJu0YzM5Q6KbHB5rtIoGdrU9wmA9pmpH9yTiQx431vioT9cwTDesatU WasHgaHbQZKvF+cJ6/zAFSmn6elmU5cPIg== X-Google-Smtp-Source: AGHT+IFoq5RhV56m3wF+fRThKHI8u+LQF7CCm3rLbyg5lOO9u8Ah4a+nYqOGPtFDwmIv816k+lOXyQ== X-Received: by 2002:a05:6830:1e4e:b0:6b9:b67e:ea8a with SMTP id e14-20020a0568301e4e00b006b9b67eea8amr3233246otj.14.1691798436321; Fri, 11 Aug 2023 17:00:36 -0700 (PDT) Received: from christian-Precision-5550.. ([129.126.215.52]) by smtp.gmail.com with ESMTPSA id z5-20020a1709028f8500b001b8a7e1b116sm4478308plo.191.2023.08.11.17.00.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Aug 2023 17:00:35 -0700 (PDT) From: Christian Couder To: git@vger.kernel.org Cc: Junio C Hamano , John Cai , Jonathan Tan , Jonathan Nieder , Taylor Blau , Derrick Stolee , Patrick Steinhardt , Christian Couder , Christian Couder Subject: [PATCH v5 1/8] pack-objects: allow `--filter` without `--stdout` Date: Sat, 12 Aug 2023 02:00:04 +0200 Message-ID: <20230812000011.1227371-2-christian.couder@gmail.com> X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com> References: <20230808082608.582319-1-christian.couder@gmail.com> <20230812000011.1227371-1-christian.couder@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 9535ce7337 (pack-objects: add list-objects filtering, 2017-11-21) taught `git pack-objects` to use `--filter`, but required the use of `--stdout` since 
a partial clone mechanism was not yet in place to handle missing
objects. Since then, changes like 9e27beaa23 (promisor-remote:
implement promisor_remote_get_direct(), 2019-06-25) and others added
support to dynamically fetch objects that were missing.

Even without a promisor remote, filtering out objects can also be useful
if we can put the filtered out objects in a separate pack, and in this
case it also makes sense for pack-objects to write the packfile directly
to an actual file rather than on stdout.

Remove the `--stdout` requirement when using `--filter`, so that in a
follow-up commit, repack can pass `--filter` to pack-objects to omit
certain objects from the resulting packfile.

Signed-off-by: John Cai
Signed-off-by: Christian Couder
---
 Documentation/git-pack-objects.txt     | 4 ++--
 builtin/pack-objects.c                 | 8 ++------
 t/t5317-pack-objects-filter-objects.sh | 8 ++++++++
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index a9995a932c..583270a85f 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -298,8 +298,8 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
 	nevertheless.
 
 --filter=<filter-spec>::
-	Requires `--stdout`. Omits certain objects (usually blobs) from
-	the resulting packfile. See linkgit:git-rev-list[1] for valid
+	Omits certain objects (usually blobs) from the resulting
+	packfile. See linkgit:git-rev-list[1] for valid
 	`<filter-spec>` forms.
 
 --no-filter::
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d2a162d528..000ebec7ab 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4400,12 +4400,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (!rev_list_all || !rev_list_reflog || !rev_list_index)
 		unpack_unreachable_expiration = 0;
 
-	if (filter_options.choice) {
-		if (!pack_to_stdout)
-			die(_("cannot use --filter without --stdout"));
-		if (stdin_packs)
-			die(_("cannot use --filter with --stdin-packs"));
-	}
+	if (stdin_packs && filter_options.choice)
+		die(_("cannot use --filter with --stdin-packs"));
 
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index b26d476c64..2ff3eef9a3 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -53,6 +53,14 @@ test_expect_success 'verify blob:none packfile has no blobs' '
 	! grep blob verify_result
 '
 
+test_expect_success 'verify blob:none packfile without --stdout' '
+	git -C r1 pack-objects --revs --filter=blob:none mypackname >packhash <<-EOF &&
+	HEAD
+	EOF
+	git -C r1 verify-pack -v "mypackname-$(cat packhash).pack" >verify_result &&
+	! grep blob verify_result
+'
+
 test_expect_success 'verify normal and blob:none packfiles have same commits/trees' '
 	git -C r1 verify-pack -v ../all.pack >verify_result &&
 	grep -E "commit|tree" verify_result |

From patchwork Sat Aug 12 00:00:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Couder
X-Patchwork-Id: 13351585
From: Christian Couder
To: git@vger.kernel.org
Cc: Junio C Hamano, John Cai, Jonathan Tan, Jonathan Nieder, Taylor Blau,
 Derrick Stolee, Patrick Steinhardt, Christian Couder, Christian Couder
Subject: [PATCH v5 2/8] t/helper: add 'find-pack' test-tool
Date: Sat, 12 Aug 2023 02:00:05 +0200
Message-ID: <20230812000011.1227371-3-christian.couder@gmail.com>
X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db
In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com>
References: <20230808082608.582319-1-christian.couder@gmail.com>
 <20230812000011.1227371-1-christian.couder@gmail.com>
X-Mailing-List: git@vger.kernel.org

In a following commit, we will make it possible to separate objects in
different packfiles depending on a filter.

To make sure that the right objects are in the right packs, let's add a
new test-tool that can display which packfile(s) a given object is in.

Let's also make it possible to check if a given object is in the
expected number of packfiles with a `--check-count <n>` option.
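
As a rough illustration of the `--check-count` rule described above,
here is a minimal, self-contained C sketch. It is not the helper itself:
`struct pack_sketch`, `count_packs_containing()` and
`check_pack_count()` are hypothetical names made up for this example,
and packs are modeled as plain arrays of object names rather than real
packfiles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a packfile: a name plus the object names it holds. */
struct pack_sketch {
	const char *name;
	const char *const *objects;
	size_t nr;
};

/* Return how many of the 'nr_packs' packs contain 'object'. */
static int count_packs_containing(const struct pack_sketch *packs,
				  size_t nr_packs, const char *object)
{
	int actual_count = 0;
	size_t i, j;

	assert(packs || !nr_packs);
	for (i = 0; i < nr_packs; i++)
		for (j = 0; j < packs[i].nr; j++)
			if (!strcmp(packs[i].objects[j], object)) {
				/* Count each pack at most once. */
				actual_count++;
				break;
			}
	return actual_count;
}

/*
 * Mirror the `--check-count` rule: a negative 'expected' means that no
 * check was requested; otherwise the actual count must match exactly.
 * Returns 0 on success and -1 on a mismatch (an assumption of this
 * sketch; the helper in the patch calls die() on a mismatch).
 */
static int check_pack_count(int expected, int actual)
{
	if (expected > -1 && expected != actual)
		return -1;
	return 0;
}
```

With two packs that both hold the same tree, count_packs_containing()
reports 2 for that tree, so an expected count of 2 passes and an
expected count of 1 fails.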

Signed-off-by: Christian Couder
---
 Makefile                  |  1 +
 t/helper/test-find-pack.c | 50 ++++++++++++++++++++++++
 t/helper/test-tool.c      |  1 +
 t/helper/test-tool.h      |  1 +
 t/t0080-find-pack.sh      | 82 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 135 insertions(+)
 create mode 100644 t/helper/test-find-pack.c
 create mode 100755 t/t0080-find-pack.sh

diff --git a/Makefile b/Makefile
index ace3e5a506..2534c831e8 100644
--- a/Makefile
+++ b/Makefile
@@ -800,6 +800,7 @@ TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-env-helper.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-fast-rebase.o
+TEST_BUILTINS_OBJS += test-find-pack.o
 TEST_BUILTINS_OBJS += test-fsmonitor-client.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-genzeros.o
diff --git a/t/helper/test-find-pack.c b/t/helper/test-find-pack.c
new file mode 100644
index 0000000000..e8bd793e58
--- /dev/null
+++ b/t/helper/test-find-pack.c
@@ -0,0 +1,50 @@
+#include "test-tool.h"
+#include "object-name.h"
+#include "object-store.h"
+#include "packfile.h"
+#include "parse-options.h"
+#include "setup.h"
+
+/*
+ * Display the path(s), one per line, of the packfile(s) containing
+ * the given object.
+ *
+ * If '--check-count <n>' is passed, then error out if the number of
+ * packfiles containing the object is not <n>.
+ */
+
+static const char *find_pack_usage[] = {
+	"test-tool find-pack [--check-count <n>] <object>",
+	NULL
+};
+
+int cmd__find_pack(int argc, const char **argv)
+{
+	struct object_id oid;
+	struct packed_git *p;
+	int count = -1, actual_count = 0;
+	const char *prefix = setup_git_directory();
+
+	struct option options[] = {
+		OPT_INTEGER('c', "check-count", &count, "expected number of packs"),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, prefix, options, find_pack_usage, 0);
+	if (argc != 1)
+		usage(find_pack_usage[0]);
+
+	if (repo_get_oid(the_repository, argv[0], &oid))
+		die("cannot parse %s as an object name", argv[0]);
+
+	for (p = get_all_packs(the_repository); p; p = p->next)
+		if (find_pack_entry_one(oid.hash, p)) {
+			printf("%s\n", p->pack_name);
+			actual_count++;
+		}
+
+	if (count > -1 && count != actual_count)
+		die("bad packfile count %d instead of %d", actual_count, count);
+
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index abe8a785eb..41da40c296 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -31,6 +31,7 @@ static struct test_cmd cmds[] = {
 	{ "env-helper", cmd__env_helper },
 	{ "example-decorate", cmd__example_decorate },
 	{ "fast-rebase", cmd__fast_rebase },
+	{ "find-pack", cmd__find_pack },
 	{ "fsmonitor-client", cmd__fsmonitor_client },
 	{ "genrandom", cmd__genrandom },
 	{ "genzeros", cmd__genzeros },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ea2672436c..411dbf2db4 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -25,6 +25,7 @@ int cmd__dump_reftable(int argc, const char **argv);
 int cmd__env_helper(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__fast_rebase(int argc, const char **argv);
+int cmd__find_pack(int argc, const char **argv);
 int cmd__fsmonitor_client(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__genzeros(int argc, const char **argv);
diff --git a/t/t0080-find-pack.sh b/t/t0080-find-pack.sh
new file mode 100755
index 0000000000..67b11216a3
--- /dev/null
+++ b/t/t0080-find-pack.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+
+test_description='test `test-tool find-pack`'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit one &&
+	test_commit two &&
+	test_commit three &&
+	test_commit four &&
+	test_commit five
+'
+
+test_expect_success 'repack everything into a single packfile' '
+	git repack -a -d --no-write-bitmap-index &&
+
+	head_commit_pack=$(test-tool find-pack HEAD) &&
+	head_tree_pack=$(test-tool find-pack HEAD^{tree}) &&
+	one_pack=$(test-tool find-pack HEAD:one.t) &&
+	three_pack=$(test-tool find-pack HEAD:three.t) &&
+	old_commit_pack=$(test-tool find-pack HEAD~4) &&
+
+	test-tool find-pack --check-count 1 HEAD &&
+	test-tool find-pack --check-count=1 HEAD^{tree} &&
+	! test-tool find-pack --check-count=0 HEAD:one.t &&
+	! test-tool find-pack -c 2 HEAD:one.t &&
+	test-tool find-pack -c 1 HEAD:three.t &&
+
+	# Packfile exists at the right path
+	case "$head_commit_pack" in
+		".git/objects/pack/pack-"*".pack") true ;;
+		*) false ;;
+	esac &&
+	test -f "$head_commit_pack" &&
+
+	# Everything is in the same pack
+	test "$head_commit_pack" = "$head_tree_pack" &&
+	test "$head_commit_pack" = "$one_pack" &&
+	test "$head_commit_pack" = "$three_pack" &&
+	test "$head_commit_pack" = "$old_commit_pack"
+'
+
+test_expect_success 'add more packfiles' '
+	git rev-parse HEAD^{tree} HEAD:two.t HEAD:four.t >objects &&
+	git pack-objects .git/objects/pack/mypackname1 >packhash1 objects &&
+	git pack-objects .git/objects/pack/mypackname2 >packhash2 head_tree_packs &&
+	grep "$head_commit_pack" head_tree_packs &&
+	grep mypackname1 head_tree_packs &&
+	! grep mypackname2 head_tree_packs &&
+	test-tool find-pack --check-count 2 HEAD^{tree} &&
+	! test-tool find-pack --check-count 1 HEAD^{tree} &&
+
+	# HEAD:five.t is also in 2 packfiles
+	test-tool find-pack HEAD:five.t >five_packs &&
+	grep "$head_commit_pack" five_packs &&
+	! grep mypackname1 five_packs &&
+	grep mypackname2 five_packs &&
+	test-tool find-pack -c 2 HEAD:five.t &&
+	! test-tool find-pack --check-count=0 HEAD:five.t
+'
+
+test_expect_success 'add more commits (as loose objects)' '
+	test_commit six &&
+	test_commit seven &&
+
+	test -z "$(test-tool find-pack HEAD)" &&
+	test -z "$(test-tool find-pack HEAD:six.t)" &&
+	test-tool find-pack --check-count 0 HEAD &&
+	test-tool find-pack -c 0 HEAD:six.t &&
+	! test-tool find-pack -c 1 HEAD:seven.t
+'
+
+test_done

From patchwork Sat Aug 12 00:00:06 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Couder
X-Patchwork-Id: 13351586
From: Christian Couder
To: git@vger.kernel.org
Cc: Junio C Hamano, John Cai, Jonathan Tan, Jonathan Nieder, Taylor Blau,
 Derrick Stolee, Patrick Steinhardt, Christian Couder
Subject: [PATCH v5 3/8] repack: refactor finishing pack-objects command
Date: Sat, 12 Aug 2023 02:00:06 +0200
Message-ID: <20230812000011.1227371-4-christian.couder@gmail.com>
X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db
In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com>
References: <20230808082608.582319-1-christian.couder@gmail.com>
 <20230812000011.1227371-1-christian.couder@gmail.com>
X-Mailing-List: git@vger.kernel.org

Create a new finish_pack_objects_cmd() to refactor duplicated code
that handles reading the packfile names from the output of a
`git pack-objects` command and putting it into a string_list, as well as
calling finish_command().

While at it, beautify a code comment a bit in the new function.

Signed-off-by: Christian Couder
out, "r");
+	while (strbuf_getline_lf(&line, out) != EOF) {
+		struct string_list_item *item;
+
+		if (line.len != the_hash_algo->hexsz)
+			die(_("repack: Expecting full hex object ID lines only "
+			      "from pack-objects."));
+		/*
+		 * Avoid putting packs written outside of the repository in the
+		 * list of names.
+		 */
+		if (local) {
+			item = string_list_append(names, line.buf);
+			item->util = populate_pack_exts(line.buf);
+		}
+	}
+	fclose(out);
+
+	strbuf_release(&line);
+
+	return finish_command(cmd);
+}
+
 static int write_cruft_pack(const struct pack_objects_args *args,
 			    const char *destination,
 			    const char *pack_prefix,
@@ -705,9 +735,8 @@ static int write_cruft_pack(const struct pack_objects_args *args,
 			    struct string_list *existing_kept_packs)
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
-	struct strbuf line = STRBUF_INIT;
 	struct string_list_item *item;
-	FILE *in, *out;
+	FILE *in;
 	int ret;
 	const char *scratch;
 	int local = skip_prefix(destination, packdir, &scratch);
@@ -751,27 +780,7 @@ static int write_cruft_pack(const struct pack_objects_args *args,
 		fprintf(in, "%s.pack\n", item->string);
 	fclose(in);
 
-	out = xfdopen(cmd.out, "r");
-	while (strbuf_getline_lf(&line, out) != EOF) {
-		struct string_list_item *item;
-
-		if (line.len != the_hash_algo->hexsz)
-			die(_("repack: Expecting full hex object ID lines only "
-			      "from pack-objects."));
-		/*
-		 * avoid putting packs written outside of the repository in the
-		 * list of names
-		 */
-		if (local) {
-			item = string_list_append(names, line.buf);
-			item->util = populate_pack_exts(line.buf);
-		}
-	}
-	fclose(out);
-
-	strbuf_release(&line);
-
-	return finish_command(&cmd);
+	return finish_pack_objects_cmd(&cmd, names, local);
 }
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
@@ -782,10 +791,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	struct string_list existing_nonkept_packs = STRING_LIST_INIT_DUP;
 	struct string_list existing_kept_packs = STRING_LIST_INIT_DUP;
 	struct pack_geometry *geometry = NULL;
-	struct strbuf line = STRBUF_INIT;
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
-	FILE *out;
 	int show_progress;
 
 	/* variables to be filled by option parsing */
@@ -1016,18 +1023,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			fclose(in);
 	}
 
-	out = xfdopen(cmd.out, "r");
-	while (strbuf_getline_lf(&line, out) != EOF) {
-		struct string_list_item *item;
-
-		if (line.len != the_hash_algo->hexsz)
-			die(_("repack: Expecting full hex object ID lines only from pack-objects."));
-		item = string_list_append(&names, line.buf);
-		item->util = populate_pack_exts(item->string);
-	}
-	strbuf_release(&line);
-	fclose(out);
-	ret = finish_command(&cmd);
+	ret = finish_pack_objects_cmd(&cmd, &names, 1);
 	if (ret)
 		goto cleanup;
 

From patchwork Sat Aug 12 00:00:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Couder
X-Patchwork-Id: 13351587
From: Christian Couder
To: git@vger.kernel.org
Cc: Junio C Hamano, John Cai, Jonathan Tan, Jonathan Nieder, Taylor Blau,
 Derrick Stolee, Patrick Steinhardt, Christian Couder
Subject: [PATCH v5 4/8] repack: refactor finding pack prefix
Date: Sat, 12 Aug 2023 02:00:07 +0200
Message-ID: <20230812000011.1227371-5-christian.couder@gmail.com>
X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db
In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com>
References: <20230808082608.582319-1-christian.couder@gmail.com>
 <20230812000011.1227371-1-christian.couder@gmail.com>
X-Mailing-List: git@vger.kernel.org

Create a new find_pack_prefix() to refactor code that handles finding
the pack prefix from the packtmp and packdir global variables, as we are
going to need this feature again in a following commit.
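
The idea can be sketched in a few lines of plain C. This is only an
illustration under assumptions, not the code from this patch:
`find_pack_prefix_sketch()` is a hypothetical name, it takes both paths
as parameters instead of reading the globals, and it returns NULL where
a real implementation would more likely die().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: return the part of 'packtmp' that follows
 * 'packdir' and one optional directory separator, or NULL if 'packtmp'
 * does not begin with 'packdir'.
 */
static const char *find_pack_prefix_sketch(const char *packdir,
					   const char *packtmp)
{
	size_t len;

	assert(packdir && packtmp);
	len = strlen(packdir);
	if (strncmp(packtmp, packdir, len))
		return NULL;
	packtmp += len;
	if (*packtmp == '/')
		packtmp++;
	return packtmp;
}
```

The returned pointer aliases 'packtmp', so it stays valid only as long
as that string does. The skip_prefix() call used elsewhere in this
series to compute 'local' performs the same kind of prefix test.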
Signed-off-by: Christian Couder X-Patchwork-Id: 13351588 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F119C001DB for ; Sat, 12 Aug 2023 00:00:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237311AbjHLAAw (ORCPT ); Fri, 11 Aug 2023 20:00:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237292AbjHLAAv (ORCPT ); Fri, 11 Aug 2023 20:00:51 -0400 Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B1BA171D for ; Fri, 11 Aug 2023 17:00:50 -0700 (PDT) Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-1bbc06f830aso18114205ad.0 for ; Fri, 11 Aug 2023 17:00:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691798449; x=1692403249; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=adWJrslufNZkxXttdxxfCaY9za+ZGB56QbK90yXLlKw=; b=kJTBOmC1Ij8ljKcipGSsNSiBn7RsCCgPjwB1+CYrxWl89Xp1P6l5MYwXMGQ0DFnSrL rdEuPeNH+JPgxuTKZMbHvlvGhFZefZAbXYduke0ZxpDCxgRIW7RUQF4f+y5FplDJsA7O In5waasUNe9mCEpjmfTGEDx50UrqsxrHfxIZZ8Ph94EVwb4Ljgc0tKnTL0LUl46RRVft 0axpWO5g2oaudiMcCGgSA9BYNTMtecKzNVE15GtM5THahhGR708xQp9txBstpfRc4IAp uJJAbLXradb//PMq8kpHGPAGiOl3whxM+R8dw5erzFQ3nA5cxiXjOLzFu0G/1ArgnmLm RKrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691798449; x=1692403249; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=adWJrslufNZkxXttdxxfCaY9za+ZGB56QbK90yXLlKw=; b=XaVjTR/SP+9J8DEtBAXvdODhNmJXaC54eOmBZv2E1QrYgxsKL95znGPSGqfJ96PnuM JhLhUNctbB0l0PCCp02nhWgsrD3Tb4prAUlZbmQprIMUsbtYMploE/u6DRs7mGL+NRMB A3YjxD/OcqkOOffyjGEaOcMlvVEgt0mlrijuTWXYq4u/239uslIJMlHwg2GipuA7uNM2 ueSvV/gGPtwkmiBNoWJ5kowiPLTNIETDHsIzcma1+Z6KDX+UHJViSZy7DJHtmYyg03C9 EGxlGMbu7Jc//rypQPIHeG8zQzcWYmX6IbcoMRPRmc2QS1yZAewrzQuH4Gbrtit9mhE5 MMyg== X-Gm-Message-State: AOJu0YyPBhMt8rPOZOQeoEzTQIfI/14ioN/5nCn/+bUc9bECgFUt/zFN Hc1JHL974Pf8kOqH/DBQid/Nh6jhghfztg== X-Google-Smtp-Source: AGHT+IGOfzd/f3u1lQjujrM+zKFpUbm/Wml3DWymel7deAwFRXA2+nxhOCxBhNKMWq5XjyqplxJzHQ== X-Received: by 2002:a17:902:ab1a:b0:1bc:4b77:c74 with SMTP id ik26-20020a170902ab1a00b001bc4b770c74mr6729228plb.0.1691798449419; Fri, 11 Aug 2023 17:00:49 -0700 (PDT) Received: from christian-Precision-5550.. ([129.126.215.52]) by smtp.gmail.com with ESMTPSA id z5-20020a1709028f8500b001b8a7e1b116sm4478308plo.191.2023.08.11.17.00.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 11 Aug 2023 17:00:48 -0700 (PDT) From: Christian Couder To: git@vger.kernel.org Cc: Junio C Hamano , John Cai , Jonathan Tan , Jonathan Nieder , Taylor Blau , Derrick Stolee , Patrick Steinhardt , Christian Couder , Christian Couder Subject: [PATCH v5 5/8] repack: add `--filter=` option Date: Sat, 12 Aug 2023 02:00:08 +0200 Message-ID: <20230812000011.1227371-6-christian.couder@gmail.com> X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com> References: <20230808082608.582319-1-christian.couder@gmail.com> <20230812000011.1227371-1-christian.couder@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org This new option puts the objects specified by `` into a separate packfile. This could be useful if, for example, some blobs take up a lot of precious space on fast storage while they are rarely accessed. 
It could make sense to move them into a separate cheaper, though
slower, storage.

It's possible to find which new packfile contains the filtered out
objects using one of the following:

  - `git verify-pack -v ...`,
  - `test-tool find-pack ...`, which a previous commit added,
  - `--filter-to=<dir>`, which a following commit will add to specify
    where the pack containing the filtered out objects will be.

This feature is implemented by running `git pack-objects` twice in a
row. The first command is run with `--filter=<filter-spec>`, using the
specified filter. It packs objects while omitting the objects
specified by the filter. Then another `git pack-objects` command is
launched using `--stdin-packs`. We pass it all the previously existing
packs into its stdin, so that it will pack all the objects in the
previously existing packs. But we also pass into its stdin, the pack
created by the previous `git pack-objects --filter=<filter-spec>`
command as well as the kept packs, all prefixed with '^', so that the
objects in these packs will be omitted from the resulting pack. The
result is that only the objects filtered out by the first `git
pack-objects` command are in the pack resulting from the second `git
pack-objects` command.

As the interactions with kept packs are a bit tricky, a few related
tests are added.

Signed-off-by: John Cai
Signed-off-by: Christian Couder
---
 Documentation/git-repack.txt |  12 ++++
 builtin/repack.c             |  73 +++++++++++++++++++
 t/t7700-repack.sh            | 135 +++++++++++++++++++++++++++++++++++
 3 files changed, 220 insertions(+)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 4017157949..6d5bec7716 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -143,6 +143,18 @@ depth is 4095.
 	a larger and slower repository; see the discussion in
 	`pack.packSizeLimit`.
 
+--filter=<filter-spec>::
+	Remove objects matching the filter specification from the
+	resulting packfile and put them into a separate packfile. Note
+	that objects used in the working directory are not filtered
+	out. So for the split to fully work, it's best to perform it
+	in a bare repo and to use the `-a` and `-d` options along with
+	this option. Also `--no-write-bitmap-index` (or the
+	`repack.writebitmaps` config option set to `false`) should be
+	used otherwise writing bitmap index will fail, as it supposes
+	a single packfile containing all the objects. See
+	linkgit:git-rev-list[1] for valid `<filter-spec>` forms.
+
 -b::
 --write-bitmap-index::
 	Write a reachability bitmap index as part of the repack. This
diff --git a/builtin/repack.c b/builtin/repack.c
index 825da1caca..c672387ab9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -21,6 +21,7 @@
 #include "pack.h"
 #include "pack-bitmap.h"
 #include "refs.h"
+#include "list-objects-filter-options.h"
 
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
@@ -57,6 +58,7 @@ struct pack_objects_args {
 	int no_reuse_object;
 	int quiet;
 	int local;
+	struct list_objects_filter_options filter_options;
 };
 
 static int repack_config(const char *var, const char *value,
@@ -726,6 +728,57 @@ static int finish_pack_objects_cmd(struct child_process *cmd,
 	return finish_command(cmd);
 }
 
+static int write_filtered_pack(const struct pack_objects_args *args,
+			       const char *destination,
+			       const char *pack_prefix,
+			       struct string_list *keep_pack_list,
+			       struct string_list *names,
+			       struct string_list *existing_packs,
+			       struct string_list *existing_kept_packs)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct string_list_item *item;
+	FILE *in;
+	int ret, i;
+	const char *caret;
+	const char *scratch;
+	int local = skip_prefix(destination, packdir, &scratch);
+
+	prepare_pack_objects(&cmd, args, destination);
+
+	strvec_push(&cmd.args, "--stdin-packs");
+
+	if (!pack_kept_objects)
+		strvec_push(&cmd.args, "--honor-pack-keep");
+	for (i = 0; i < keep_pack_list->nr; i++)
+		strvec_pushf(&cmd.args, "--keep-pack=%s",
+			     keep_pack_list->items[i].string);
+
+	cmd.in = -1;
+
+	ret = start_command(&cmd);
+	if (ret)
+		return ret;
+
+	/*
+	 * Here 'names' contains only the pack(s) that were just
+	 * written, which is exactly the packs we want to keep. Also
+	 * 'existing_kept_packs' already contains the packs in
+	 * 'keep_pack_list'.
+	 */
+	in = xfdopen(cmd.in, "w");
+	for_each_string_list_item(item, names)
+		fprintf(in, "^%s-%s.pack\n", pack_prefix, item->string);
+	for_each_string_list_item(item, existing_packs)
+		fprintf(in, "%s.pack\n", item->string);
+	caret = pack_kept_objects ? "" : "^";
+	for_each_string_list_item(item, existing_kept_packs)
+		fprintf(in, "%s%s.pack\n", caret, item->string);
+	fclose(in);
+
+	return finish_pack_objects_cmd(&cmd, names, local);
+}
+
 static int write_cruft_pack(const struct pack_objects_args *args,
 			    const char *destination,
 			    const char *pack_prefix,
@@ -858,6 +911,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("limits the maximum number of threads")),
 		OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
 				N_("maximum size of each packfile")),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&po_args.filter_options),
 		OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
 				N_("repack objects in packs marked with .keep")),
 		OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
@@ -871,6 +925,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 
+	list_objects_filter_init(&po_args.filter_options);
+
 	git_config(repack_config, &cruft_po_args);
 
 	argc = parse_options(argc, argv, prefix, builtin_repack_options,
@@ -1011,6 +1067,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			strvec_push(&cmd.args, "--incremental");
 	}
 
+	if (po_args.filter_options.choice)
+		strvec_pushf(&cmd.args, "--filter=%s",
+			     expand_list_objects_filter_spec(&po_args.filter_options));
+
 	if (geometry)
 		cmd.in = -1;
 	else
@@ -1097,6 +1157,18 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (po_args.filter_options.choice) {
+		ret = write_filtered_pack(&po_args,
+					  packtmp,
+					  find_pack_prefix(packdir, packtmp),
+					  &keep_pack_list,
+					  &names,
+					  &existing_nonkept_packs,
+					  &existing_kept_packs);
+		if (ret)
+			goto cleanup;
+	}
+
 	string_list_sort(&names);
 
 	close_object_store(the_repository->objects);
@@ -1231,6 +1303,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	string_list_clear(&existing_nonkept_packs, 0);
 	string_list_clear(&existing_kept_packs, 0);
 	clear_pack_geometry(geometry);
+	list_objects_filter_release(&po_args.filter_options);
 
 	return ret;
 }
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 27b66807cd..39e89445fd 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -327,6 +327,141 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'repacking with a filter works' '
+	git -C bare.git repack -a -d &&
+	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
+	git -C bare.git -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
+	test_stdout_line_count = 2 ls bare.git/objects/pack/*.pack &&
+	commit_pack=$(test-tool -C bare.git find-pack -c 1 HEAD) &&
+	blob_pack=$(test-tool -C bare.git find-pack -c 1 HEAD:file1) &&
+	test "$commit_pack" != "$blob_pack" &&
+	tree_pack=$(test-tool -C bare.git find-pack -c 1 HEAD^{tree}) &&
+	test "$tree_pack" = "$commit_pack" &&
+	blob_pack2=$(test-tool -C bare.git find-pack -c 1 HEAD:file2) &&
+	test "$blob_pack2" = "$blob_pack"
+'
+
+test_expect_success '--filter fails with --write-bitmap-index' '
+	test_must_fail \
+		env GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
+'
+
+test_expect_success 'repacking with two filters works' '
+	git init two-filters &&
+	(
+		cd two-filters &&
+		mkdir subdir &&
+		test_commit foo &&
+		test_commit subdir_bar subdir/bar &&
+		test_commit subdir_baz subdir/baz
+	) &&
+	git clone --no-local --bare two-filters two-filters.git &&
+	(
+		cd two-filters.git &&
+		test_stdout_line_count = 1 ls objects/pack/*.pack &&
+		git -c repack.writebitmaps=false repack -a -d \
+			--filter=blob:none --filter=tree:1 &&
+		test_stdout_line_count = 2 ls objects/pack/*.pack &&
+		commit_pack=$(test-tool find-pack -c 1 HEAD) &&
+		blob_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
+		root_tree_pack=$(test-tool find-pack -c 1 HEAD^{tree}) &&
+		subdir_tree_hash=$(git ls-tree --object-only HEAD -- subdir) &&
+		subdir_tree_pack=$(test-tool find-pack -c 1 "$subdir_tree_hash") &&
+
+		# Root tree and subdir tree are not in the same packfiles
+		test "$commit_pack" != "$blob_pack" &&
+		test "$commit_pack" = "$root_tree_pack" &&
+		test "$blob_pack" = "$subdir_tree_pack"
+	)
+'
+
+prepare_for_keep_packs () {
+	git init keep-packs &&
+	(
+		cd keep-packs &&
+		test_commit foo &&
+		test_commit bar
+	) &&
+	git clone --no-local --bare keep-packs keep-packs.git &&
+	(
+		cd keep-packs.git &&
+
+		# Create two packs
+		# The first pack will contain all of the objects except one blob
+		git rev-list --objects --all >objs &&
+		grep -v "bar.t" objs | git pack-objects pack &&
+		# The second pack will contain the excluded object and be kept
+		packid=$(grep "bar.t" objs | git pack-objects pack) &&
+		>pack-$packid.keep &&
+
+		# Replace the existing pack with the 2 new ones
+		rm -f objects/pack/pack* &&
+		mv pack-* objects/pack/
+	)
+}
+
+test_expect_success '--filter works with .keep packs' '
+	prepare_for_keep_packs &&
+	(
+		cd keep-packs.git &&
+
+		foo_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
+		bar_pack=$(test-tool find-pack -c 1 HEAD:bar.t) &&
+		head_pack=$(test-tool find-pack -c 1 HEAD) &&
+
+		test "$foo_pack" != "$bar_pack" &&
+		test "$foo_pack" = "$head_pack" &&
+
+		git -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
+
+		foo_pack_1=$(test-tool find-pack -c 1 HEAD:foo.t) &&
+		bar_pack_1=$(test-tool find-pack -c 1 HEAD:bar.t) &&
+		head_pack_1=$(test-tool find-pack -c 1 HEAD) &&
+
+		# Object bar is still only in the old .keep pack
+		test "$foo_pack_1" != "$foo_pack" &&
+		test "$bar_pack_1" = "$bar_pack" &&
+		test "$head_pack_1" != "$head_pack" &&
+
+		test "$foo_pack_1" != "$bar_pack_1" &&
+		test "$foo_pack_1" != "$head_pack_1" &&
+		test "$bar_pack_1" != "$head_pack_1"
+	)
+'
+
+test_expect_success '--filter works with --pack-kept-objects and .keep packs' '
+	rm -rf keep-packs keep-packs.git &&
+	prepare_for_keep_packs &&
+	(
+		cd keep-packs.git &&
+
+		foo_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
+		bar_pack=$(test-tool find-pack -c 1 HEAD:bar.t) &&
+		head_pack=$(test-tool find-pack -c 1 HEAD) &&
+
+		test "$foo_pack" != "$bar_pack" &&
+		test "$foo_pack" = "$head_pack" &&
+
+		git -c repack.writebitmaps=false repack -a -d --filter=blob:none \
+			--pack-kept-objects &&
+
+		foo_pack_1=$(test-tool find-pack -c 1 HEAD:foo.t) &&
+		test-tool find-pack -c 2 HEAD:bar.t >bar_pack_1 &&
+		head_pack_1=$(test-tool find-pack -c 1 HEAD) &&
+
+		test "$foo_pack_1" != "$foo_pack" &&
+		test "$foo_pack_1" != "$bar_pack" &&
+		test "$head_pack_1" != "$head_pack" &&
+
+		# Object bar is in both the old .keep pack and the new
+		# pack that contained the filtered out objects
+		grep "$bar_pack" bar_pack_1 &&
+		grep "$foo_pack_1" bar_pack_1 &&
+		test "$foo_pack_1" != "$head_pack_1"
+	)
+'
+
 objdir=.git/objects
 midx=$objdir/pack/multi-pack-index
 

From patchwork Sat Aug 12 00:00:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Couder
X-Patchwork-Id: 13351589
From: Christian Couder
To: git@vger.kernel.org
Cc: Junio C Hamano, John Cai, Jonathan Tan, Jonathan Nieder, Taylor Blau, Derrick Stolee, Patrick Steinhardt, Christian Couder, Christian Couder
Subject: [PATCH v5 6/8] gc: add `gc.repackFilter` config option
Date: Sat, 12 Aug 2023 02:00:09 +0200
Message-ID: <20230812000011.1227371-7-christian.couder@gmail.com>
X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db
In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com>
References: <20230808082608.582319-1-christian.couder@gmail.com> <20230812000011.1227371-1-christian.couder@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.

Users might want to perform such a cleanup regularly at the same time as
they perform other repacks and cleanups, that is, as part of `git gc`.

Let's allow them to configure a `<filter-spec>` for that purpose using a
new `gc.repackFilter` config option.

Now when `git gc` performs a repack and this option is set to a
non-empty `<filter-spec>`, the repack process is passed a corresponding
`--filter=<filter-spec>` argument.
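
As a rough illustration of the rule described above, here is a minimal
shell sketch. It is not the C code from this patch:
`gc_repack_filter_arg` is a hypothetical helper name, used only to show
that the extra argument is produced when the configured value is
non-empty and omitted otherwise.

```shell
# Hypothetical helper (not part of Git): mimic how `git gc` decides
# whether to hand a filter over to `git repack`.
gc_repack_filter_arg () {
	# $1 stands for the value of gc.repackFilter; it may be unset or empty.
	if test -n "${1-}"
	then
		printf '%s\n' "--filter=$1"
	fi
}

gc_repack_filter_arg blob:none   # prints --filter=blob:none
gc_repack_filter_arg ""          # prints nothing
```

In a real repository the equivalent setup would presumably be
`git config gc.repackFilter blob:none` followed by `git gc`, with
`repack.writeBitmaps` set to `false` as in the test added by this patch.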
Signed-off-by: Christian Couder
---
 Documentation/config/gc.txt |  5 +++++
 builtin/gc.c                |  6 ++++++
 t/t6500-gc.sh               | 13 +++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index ca47eb2008..2153bde7ac 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -145,6 +145,11 @@ Multiple hooks are supported, but all must exit successfully, else the
 operation (either generating a cruft pack or unpacking unreachable
 objects) will be halted.
 
+gc.repackFilter::
+	When repacking, use the specified filter to move certain
+	objects into a separate packfile. See the
+	`--filter=<filter-spec>` option of linkgit:git-repack[1].
+
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
 	kept for this many days when 'git rerere gc' is run.
diff --git a/builtin/gc.c b/builtin/gc.c
index 19d73067aa..9b0984f301 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -61,6 +61,7 @@ static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
+static char *repack_filter;
 static unsigned long big_pack_threshold;
 static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
 
@@ -170,6 +171,8 @@ static void gc_config(void)
 	git_config_get_ulong("gc.bigpackthreshold", &big_pack_threshold);
 	git_config_get_ulong("pack.deltacachesize", &max_delta_cache_size);
 
+	git_config_get_string("gc.repackfilter", &repack_filter);
+
 	git_config(git_default_config, NULL);
 }
 
@@ -355,6 +358,9 @@ static void add_repack_all_option(struct string_list *keep_pack)
 
 	if (keep_pack)
 		for_each_string_list(keep_pack, keep_one_pack, NULL);
+
+	if (repack_filter && *repack_filter)
+		strvec_pushf(&repack, "--filter=%s", repack_filter);
 }
 
 static void add_repack_incremental_option(void)
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 69509d0c11..232e403b66 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -202,6 +202,19 @@ test_expect_success 'one of gc.reflogExpire{Unreachable,}=never does not skip "e
 	grep -E "^trace: (built-in|exec|run_command): git reflog expire --" trace.out
 '
 
+test_expect_success 'gc.repackFilter launches repack with a filter' '
+	test_when_finished "rm -rf bare.git" &&
+	git clone --no-local --bare . bare.git &&
+
+	git -C bare.git -c gc.cruftPacks=false gc &&
+	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
+
+	GIT_TRACE=$(pwd)/trace.out git -C bare.git -c gc.repackFilter=blob:none \
+		-c repack.writeBitmaps=false -c gc.cruftPacks=false gc &&
+	test_stdout_line_count = 2 ls bare.git/objects/pack/*.pack &&
+	grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out
+'
+
 prepare_cruft_history () {
 	test_commit base &&
 

From patchwork Sat Aug 12 00:00:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Couder
X-Patchwork-Id: 13351590
From: Christian Couder
To: git@vger.kernel.org
Cc: Junio C Hamano, John Cai, Jonathan Tan, Jonathan Nieder, Taylor Blau, Derrick Stolee, Patrick Steinhardt, Christian Couder, Christian Couder
Subject: [PATCH v5 7/8] repack: implement `--filter-to` for storing filtered out objects
Date: Sat, 12 Aug 2023 02:00:10 +0200
Message-ID: <20230812000011.1227371-8-christian.couder@gmail.com>
X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db
In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com>
References: <20230808082608.582319-1-christian.couder@gmail.com> <20230812000011.1227371-1-christian.couder@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.

It would be nice if this new different pack could be created in a
different directory than the regular pack. This would make it possible
to move large blobs into a pack on a different kind of storage, for
example cheaper storage.

Even in a different directory, this pack can be accessible if, for
example, the Git alternates mechanism is used to point to it. In fact
not using the Git alternates mechanism can corrupt a repo as the
generated pack containing the filtered objects might not be accessible
from the repo any more. So setting up the Git alternates mechanism
should be done before using this feature if the user wants the repo to
be fully usable while this feature is used.

In some cases, like when a repo has just been cloned or when there is
no other activity in the repo, it's OK to set up the Git alternates
mechanism afterwards though.
It's also OK to just inspect the generated packfile containing the
filtered objects and then just move it into the '.git/objects/pack/'
directory manually. That's why it's not necessary for this command to
check that the Git alternates mechanism has already been set up.

While at it, as an example to show that `--filter` and `--filter-to`
work well with other options, let's also add a test to check that these
options work well with `--max-pack-size`.

Signed-off-by: Christian Couder
---
 Documentation/git-repack.txt | 11 +++++++
 builtin/repack.c             | 10 +++++-
 t/t7700-repack.sh            | 62 ++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 6d5bec7716..8545a32667 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -155,6 +155,17 @@ depth is 4095.
 	a single packfile containing all the objects. See
 	linkgit:git-rev-list[1] for valid `<filter-spec>` forms.
 
+--filter-to=<dir>::
+	Write the pack containing filtered out objects to the
+	directory `<dir>`. Only useful with `--filter`. This can be
+	used for putting the pack on a separate object directory that
+	is accessed through the Git alternates mechanism. **WARNING:**
+	If the packfile containing the filtered out objects is not
+	accessible, the repo can become corrupt as it might not be
+	possible to access the objects in that packfile. See the
+	`objects` and `objects/info/alternates` sections of
+	linkgit:gitrepository-layout[5].
+
 -b::
 --write-bitmap-index::
 	Write a reachability bitmap index as part of the repack. This
diff --git a/builtin/repack.c b/builtin/repack.c
index c672387ab9..c396029ec9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -870,6 +870,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int write_midx = 0;
 	const char *cruft_expiration = NULL;
 	const char *expire_to = NULL;
+	const char *filter_to = NULL;
 
 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -922,6 +923,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 				N_("write a multi-pack index of the resulting packs")),
 		OPT_STRING(0, "expire-to", &expire_to, N_("dir"),
 			   N_("pack prefix to store a pack containing pruned objects")),
+		OPT_STRING(0, "filter-to", &filter_to, N_("dir"),
+			   N_("pack prefix to store a pack containing filtered out objects")),
 		OPT_END()
 	};
 
@@ -1070,6 +1073,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (po_args.filter_options.choice)
 		strvec_pushf(&cmd.args, "--filter=%s",
 			     expand_list_objects_filter_spec(&po_args.filter_options));
+	else if (filter_to)
+		die(_("option '%s' can only be used along with '%s'"), "--filter-to", "--filter");
 
 	if (geometry)
 		cmd.in = -1;
@@ -1158,8 +1163,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	}
 
 	if (po_args.filter_options.choice) {
+		if (!filter_to)
+			filter_to = packtmp;
+
 		ret = write_filtered_pack(&po_args,
-					  packtmp,
+					  filter_to,
 					  find_pack_prefix(packdir, packtmp),
 					  &keep_pack_list,
 					  &names,
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 39e89445fd..48e92aa6f7 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -462,6 +462,68 @@ test_expect_success '--filter works with --pack-kept-objects and .keep packs' '
 	)
 '
 
+test_expect_success '--filter-to stores filtered out objects' '
+	git -C bare.git repack -a -d &&
+	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
+
+	git init --bare filtered.git &&
+	git -C bare.git -c repack.writebitmaps=false repack -a -d \
+		--filter=blob:none \
+		--filter-to=../filtered.git/objects/pack/pack &&
+	test_stdout_line_count = 1 ls bare.git/objects/pack/pack-*.pack &&
+	test_stdout_line_count = 1 ls filtered.git/objects/pack/pack-*.pack &&
+
+	commit_pack=$(test-tool -C bare.git find-pack -c 1 HEAD) &&
+	blob_pack=$(test-tool -C bare.git find-pack -c 0 HEAD:file1) &&
+	blob_hash=$(git -C bare.git rev-parse HEAD:file1) &&
+	test -n "$blob_hash" &&
+	blob_pack=$(test-tool -C filtered.git find-pack -c 1 $blob_hash) &&
+
+	echo $(pwd)/filtered.git/objects >bare.git/objects/info/alternates &&
+	blob_pack=$(test-tool -C bare.git find-pack -c 1 HEAD:file1) &&
+	blob_content=$(git -C bare.git show $blob_hash) &&
+	test "$blob_content" = "content1"
+'
+
+test_expect_success '--filter works with --max-pack-size' '
+	rm -rf filtered.git &&
+	git init --bare filtered.git &&
+	git init max-pack-size &&
+	(
+		cd max-pack-size &&
+		test_commit base &&
+		# two blobs which exceed the maximum pack size
+		test-tool genrandom foo 1048576 >foo &&
+		git hash-object -w foo &&
+		test-tool genrandom bar 1048576 >bar &&
+		git hash-object -w bar &&
+		git add foo bar &&
+		git commit -m "adding foo and bar"
+	) &&
+	git clone --no-local --bare max-pack-size max-pack-size.git &&
+	(
+		cd max-pack-size.git &&
+		git -c repack.writebitmaps=false repack -a -d --filter=blob:none \
+			--max-pack-size=1M \
+			--filter-to=../filtered.git/objects/pack/pack &&
+		echo $(cd .. && pwd)/filtered.git/objects >objects/info/alternates &&
+
+		# Check that the 3 blobs are in different packfiles in filtered.git
+		test_stdout_line_count = 3 ls ../filtered.git/objects/pack/pack-*.pack &&
+		test_stdout_line_count = 1 ls objects/pack/pack-*.pack &&
+		foo_pack=$(test-tool find-pack -c 1 HEAD:foo) &&
+		bar_pack=$(test-tool find-pack -c 1 HEAD:bar) &&
+		base_pack=$(test-tool find-pack -c 1 HEAD:base.t) &&
+		test "$foo_pack" != "$bar_pack" &&
+		test "$foo_pack" != "$base_pack" &&
+		test "$bar_pack" != "$base_pack" &&
+		for pack in "$foo_pack" "$bar_pack" "$base_pack"
+		do
+			case "$pack" in */filtered.git/objects/pack/*) true ;; *) return 1 ;; esac
+		done
+	)
+'
+
 objdir=.git/objects
 midx=$objdir/pack/multi-pack-index
 

From patchwork Sat Aug 12 00:00:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Couder
X-Patchwork-Id: 13351591
From: Christian Couder
To: git@vger.kernel.org
Cc: Junio C Hamano, John Cai, Jonathan Tan, Jonathan Nieder, Taylor Blau, Derrick Stolee, Patrick Steinhardt, Christian Couder, Christian Couder
Subject: [PATCH v5 8/8] gc: add `gc.repackFilterTo` config option
Date: Sat, 12 Aug 2023 02:00:11 +0200
Message-ID: <20230812000011.1227371-9-christian.couder@gmail.com>
X-Mailer: git-send-email 2.42.0.rc1.8.ga52e3a71db
In-Reply-To: <20230812000011.1227371-1-christian.couder@gmail.com>
References: <20230808082608.582319-1-christian.couder@gmail.com> <20230812000011.1227371-1-christian.couder@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

A previous commit implemented the `gc.repackFilter` config option to
specify a filter that should be used by `git gc` when performing
repacks.

Another previous commit has implemented `git repack --filter-to=<dir>`
to specify the location of the packfile containing filtered out objects
when using a filter.

Let's implement the `gc.repackFilterTo` config option to specify that
location in the config when `gc.repackFilter` is used.

Now when `git gc` performs a repack and this option is set to a
non-empty `<dir>`, the repack process is passed a corresponding
`--filter-to=<dir>` argument.

Signed-off-by: Christian Couder
---
 Documentation/config/gc.txt | 11 +++++++++++
 builtin/gc.c                |  4 ++++
 t/t6500-gc.sh               | 13 ++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/gc.txt b/Documentation/config/gc.txt
index 2153bde7ac..466466d6cc 100644
--- a/Documentation/config/gc.txt
+++ b/Documentation/config/gc.txt
@@ -150,6 +150,17 @@ gc.repackFilter::
 	objects into a separate packfile. See the
 	`--filter=<filter-spec>` option of linkgit:git-repack[1].
 
+gc.repackFilterTo::
+	When repacking and using a filter, see `gc.repackFilter`, the
+	specified location will be used to create the packfile
+	containing the filtered out objects. **WARNING:** The
+	specified location should be accessible, using for example the
+	Git alternates mechanism, otherwise the repo could be
+	considered corrupt by Git as it might not be able to access the
+	objects in that packfile. See the `--filter-to=<dir>` option
+	of linkgit:git-repack[1] and the `objects/info/alternates`
+	section of linkgit:gitrepository-layout[5].
+
 gc.rerereResolved::
 	Records of conflicted merge you resolved earlier are
 	kept for this many days when 'git rerere gc' is run.
diff --git a/builtin/gc.c b/builtin/gc.c
index 9b0984f301..1b7c775d94 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -62,6 +62,7 @@ static const char *gc_log_expire = "1.day.ago";
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 static char *repack_filter;
+static char *repack_filter_to;
 static unsigned long big_pack_threshold;
 static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
 
@@ -172,6 +173,7 @@ static void gc_config(void)
 	git_config_get_ulong("pack.deltacachesize", &max_delta_cache_size);
 
 	git_config_get_string("gc.repackfilter", &repack_filter);
+	git_config_get_string("gc.repackfilterto", &repack_filter_to);
 
 	git_config(git_default_config, NULL);
 }
@@ -361,6 +363,8 @@ static void add_repack_all_option(struct string_list *keep_pack)
 
 	if (repack_filter && *repack_filter)
 		strvec_pushf(&repack, "--filter=%s", repack_filter);
+	if (repack_filter_to && *repack_filter_to)
+		strvec_pushf(&repack, "--filter-to=%s", repack_filter_to);
 }
 
 static void add_repack_incremental_option(void)
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 232e403b66..e412cf8daf 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -203,7 +203,6 @@ test_expect_success 'one of gc.reflogExpire{Unreachable,}=never does not skip "e
 '
 
 test_expect_success 'gc.repackFilter launches repack with a filter' '
-	test_when_finished "rm -rf bare.git" &&
 	git clone --no-local --bare . bare.git &&
 
 	git -C bare.git -c gc.cruftPacks=false gc &&
@@ -215,6 +214,18 @@ test_expect_success 'gc.repackFilter launches repack with a filter' '
 	grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out
 '
 
+test_expect_success 'gc.repackFilterTo stores filtered out objects' '
+	test_when_finished "rm -rf bare.git filtered.git" &&
+
+	git init --bare filtered.git &&
+	git -C bare.git -c gc.repackFilter=blob:none \
+		-c gc.repackFilterTo=../filtered.git/objects/pack/pack \
+		-c repack.writeBitmaps=false -c gc.cruftPacks=false gc &&
+
+	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
+	test_stdout_line_count = 1 ls filtered.git/objects/pack/*.pack
+'
+
 prepare_cruft_history () {
 	test_commit base &&