From patchwork Tue Oct 8 08:13:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han Young X-Patchwork-Id: 13825952 Received: from mail-pg1-f173.google.com (mail-pg1-f173.google.com [209.85.215.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 43A74176FCE for ; Tue, 8 Oct 2024 08:14:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728375247; cv=none; b=kvsFCZZAKDzHmzBQxsRkOFYmGFLSsopuylc5vFjMY1O20mHoazI6n3GMB285XkeBgQ6F/PecvvYQsiWKaD9BOuX+xryP9dMP1XnqoJHsfL/8eO6nBFNp8vDYFfEarLNMXHqNrcOa7s3rjSYJ6KtkZVb7M2VthRywRx6pn1EWUaU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728375247; c=relaxed/simple; bh=gHEcPIzzayJohWr4swmJWRmGWWQSP8LL1o4rEjTvnTU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=eetsCwHO/4JDohadx0lbeuZmFhx1WT+dTW0DE5OvtQhy0OvKJHri8Ifo/7uBx5MLfaGKlQWMBmCHAx2fI5Mqpr/ZyzIeznYXrOBQm3fWkS2XnKOJ4iMsdMzoYa6grytMOVSHN/aFZHyo/T9CMPiflp+ogAqhHNfYdy0JUj1csdU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=HJvz3Qvd; arc=none smtp.client-ip=209.85.215.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="HJvz3Qvd" Received: by mail-pg1-f173.google.com with SMTP id 41be03b00d2f7-7e9ff6fb4c6so3497656a12.3 for ; Tue, 08 Oct 2024 01:14:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1728375244; x=1728980044; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=iWJBWfKv9Al91OhxSxJXS0vP//RUxD01/MXPJIuz314=; b=HJvz3QvdZxS7ZjxbeMQuiTUr1nw9qPBvSKP/0Tf2qIq/OdCPFmNUAXdsjLRGHc8ArY +DVryJoBTJ72/CoTGmUQEla3I4duUHiXsUfHjk/bt3w/AcPvowMbHcIXlRwns2BbiOMp YXq1JhOloJ9yJMGIGPHi16gfAezNYLdUb1T0SC800WIVnMMk08MtbqY0J7k2f0U5kKJk iT3hvvpaR9r1nXEJ9mWZEflZVuDPCvL38fysc0qwjXxwPyvkTNdVuqc2yK+aCgCZmtlR AqyWp9+DFUMcTPbgWCunQnB0/JR3qI3eOroh+Z1kjGD3QAXy/pf+z3K1j/9GINTCOahB fNWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728375244; x=1728980044; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=iWJBWfKv9Al91OhxSxJXS0vP//RUxD01/MXPJIuz314=; b=T2NCZvI2742bg0DOUaJpaMOmH19qSX8BF84CCFuPONKDp78dp8JqJT52rUO4pPFVKm yL9B23XfaJPhlEy2BjLJxBvDP/nAXKZcLEBHrB4It9/RY5LDjvnS4w0j70yfBf6YeGcU 4hBaNE6xU4dydC1uwLhTsmHCQreYZktUlorA5I4fMmb//j9f14c5ov5vCRkWaSSUSfLl /qZ8mAsmsZAi2yDl7JRAXnphOSKN3dFZtIkc86AVO5J1HGKezvjWIXcFJJl7ODAAcA6F JHS5N32GSYZ3AJ7EkmSej9Iv/pDa3ln4ksjwbwpffCvHvxAw0Bm1UBV91MFKXZjNfQ51 w3hA== X-Gm-Message-State: AOJu0Yx5a0zw8vSS6Xo5Sz5zi0arUilojyjZXG5AnObf77CS5awGhRpQ hpmzwP3l0IlHBohdbhNu15n7XNviDL35Xx6Vm9A5TxN8JxGFyA+cpI9m33Ow/5HyguXLWeE5Gu/ 6 X-Google-Smtp-Source: AGHT+IGNoMdqYra+ZvksOUl/ZjOLY3OvzLIUyFwxPZ9LC+in754ns+rw/VcOikjOBe22bs3hvnqp0A== X-Received: by 2002:a17:90a:f311:b0:2e2:8c6f:3dd0 with SMTP id 98e67ed59e1d1-2e28c6f3ec8mr773071a91.14.1728375243918; Tue, 08 Oct 2024 01:14:03 -0700 (PDT) Received: from localhost.localdomain ([63.216.146.178]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2e1e83ca28fsm8739647a91.7.2024.10.08.01.14.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Oct 2024 01:14:03 -0700 (PDT) From: Han Young To: git@vger.kernel.org Cc: calvinwan@google.com, jonathantanmy@google.com, sokcevic@google.com, gitster@pobox.com, phillip.wood123@gmail.com, Han Young Subject: [PATCH v2 1/3] repack: pack everything into packfile Date: Tue, 8 Oct 2024 16:13:48 +0800 Message-ID: <20241008081350.8950-2-hanyang.tony@bytedance.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241008081350.8950-1-hanyang.tony@bytedance.com> References: <20240802073143.56731-1-hanyang.tony@bytedance.com> <20241008081350.8950-1-hanyang.tony@bytedance.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In a partial repository, creating a local commit and then fetching causes the following state to occur: commit tree blob C3 ---- T3 -- B3 (fetched from remote, in promisor pack) | C2 ---- T2 -- B2 (created locally, in non-promisor pack) | C1 ---- T1 -- B1 (fetched from remote, in promisor pack) During garbage collection, parents of promisor objects are marked as UNINTERESTING and are subsequently garbage collected. In this case, C2 would be deleted and attempts to access that commit would result in "bad object" errors (originally reported here[1]). For partial repos, repacking is done in two steps. We first repack all the objects in promisor packfile, then repack all the non-promisor objects. It turns out C2, T2 and B2 are not repacked in either steps, ended up deleted. We can avoid this by packing everything into a promisor packfile, if the repo is partial cloned. [1] https://lore.kernel.org/git/20240802073143.56731-1-hanyang.tony@bytedance.com/ Helped-by: Calvin Wan Signed-off-by: Han Young --- builtin/repack.c | 257 ++++++++++++++++++++++++++--------------------- 1 file changed, 143 insertions(+), 114 deletions(-) diff --git a/builtin/repack.c b/builtin/repack.c index cb4420f085..e9e18a31fe 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -321,6 +321,23 @@ static int write_oid(const struct object_id *oid, return 0; } +static int write_loose_oid(const struct object_id *oid, + const char *path UNUSED, + void *data) +{ + struct child_process *cmd = data; + + if (cmd->in == -1) { + if (start_command(cmd)) + die(_("could not start pack-objects to repack promisor objects")); + } + + if (write_in_full(cmd->in, oid_to_hex(oid), the_hash_algo->hexsz) < 0 || + write_in_full(cmd->in, "\n", 1) < 0) + die(_("failed to feed promisor objects to pack-objects")); + return 0; +} + static struct { const char *name; unsigned optional:1; @@ -370,12 +387,15 @@ static int has_pack_ext(const struct generated_pack_data *data, BUG("unknown pack extension: '%s'", ext); } -static void repack_promisor_objects(const struct pack_objects_args *args, - struct string_list *names) +static int repack_promisor_objects(const struct pack_objects_args *args, + struct string_list *names, + struct string_list *list, + int pack_all) { struct child_process cmd = CHILD_PROCESS_INIT; FILE *out; struct strbuf line = STRBUF_INIT; + struct string_list_item *item; prepare_pack_objects(&cmd, args, packtmp); cmd.in = -1; @@ -387,13 +407,19 @@ static void repack_promisor_objects(const struct pack_objects_args *args, * {type -> existing pack order} ordering when computing deltas instead * of a {type -> size} ordering, which may produce better deltas. */ - for_each_packed_object(write_oid, &cmd, - FOR_EACH_OBJECT_PROMISOR_ONLY); + if (pack_all) + for_each_packed_object(write_oid, &cmd, 0); + else + for_each_string_list_item(item, list) { + pack_mark_retained(item); + } + + for_each_loose_object(write_loose_oid, &cmd, 0); if (cmd.in == -1) { /* No packed objects; cmd was never started */ child_process_clear(&cmd); - return; + return 0; } close(cmd.in); @@ -431,6 +457,7 @@ static void repack_promisor_objects(const struct pack_objects_args *args, if (finish_command(&cmd)) die(_("could not finish pack-objects to repack promisor objects")); strbuf_release(&line); + return 0; } struct pack_geometry { @@ -1312,8 +1339,7 @@ int cmd_repack(int argc, strvec_push(&cmd.args, "--reflog"); strvec_push(&cmd.args, "--indexed-objects"); } - if (repo_has_promisor_remote(the_repository)) - strvec_push(&cmd.args, "--exclude-promisor-objects"); + if (!write_midx) { if (write_bitmaps > 0) strvec_push(&cmd.args, "--write-bitmap-index"); @@ -1323,125 +1349,128 @@ int cmd_repack(int argc, if (use_delta_islands) strvec_push(&cmd.args, "--delta-islands"); - if (pack_everything & ALL_INTO_ONE) { - repack_promisor_objects(&po_args, &names); - - if (has_existing_non_kept_packs(&existing) && - delete_redundant && - !(pack_everything & PACK_CRUFT)) { - for_each_string_list_item(item, &names) { - strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack", - packtmp_name, item->string); - } - if (unpack_unreachable) { - strvec_pushf(&cmd.args, - "--unpack-unreachable=%s", - unpack_unreachable); - } else if (pack_everything & LOOSEN_UNREACHABLE) { - strvec_push(&cmd.args, - "--unpack-unreachable"); - } else if (keep_unreachable) { - strvec_push(&cmd.args, "--keep-unreachable"); - strvec_push(&cmd.args, "--pack-loose-unreachable"); + if (repo_has_promisor_remote(the_repository)) { + ret = repack_promisor_objects(&po_args, &names, + &existing.non_kept_packs, pack_everything & ALL_INTO_ONE); + } else { + if (pack_everything & ALL_INTO_ONE) { + if (has_existing_non_kept_packs(&existing) && + delete_redundant && + !(pack_everything & PACK_CRUFT)) { + for_each_string_list_item(item, &names) { + strvec_pushf(&cmd.args, "--keep-pack=%s-%s.pack", + packtmp_name, item->string); + } + if (unpack_unreachable) { + strvec_pushf(&cmd.args, + "--unpack-unreachable=%s", + unpack_unreachable); + } else if (pack_everything & LOOSEN_UNREACHABLE) { + strvec_push(&cmd.args, + "--unpack-unreachable"); + } else if (keep_unreachable) { + strvec_push(&cmd.args, "--keep-unreachable"); + strvec_push(&cmd.args, "--pack-loose-unreachable"); + } } + } else if (geometry.split_factor) { + strvec_push(&cmd.args, "--stdin-packs"); + strvec_push(&cmd.args, "--unpacked"); + } else { + strvec_push(&cmd.args, "--unpacked"); + strvec_push(&cmd.args, "--incremental"); } - } else if (geometry.split_factor) { - strvec_push(&cmd.args, "--stdin-packs"); - strvec_push(&cmd.args, "--unpacked"); - } else { - strvec_push(&cmd.args, "--unpacked"); - strvec_push(&cmd.args, "--incremental"); - } - if (po_args.filter_options.choice) - strvec_pushf(&cmd.args, "--filter=%s", - expand_list_objects_filter_spec(&po_args.filter_options)); - else if (filter_to) - die(_("option '%s' can only be used along with '%s'"), "--filter-to", "--filter"); + if (po_args.filter_options.choice) + strvec_pushf(&cmd.args, "--filter=%s", + expand_list_objects_filter_spec(&po_args.filter_options)); + else if (filter_to) + die(_("option '%s' can only be used along with '%s'"), "--filter-to", "--filter"); - if (geometry.split_factor) - cmd.in = -1; - else - cmd.no_stdin = 1; + if (geometry.split_factor) + cmd.in = -1; + else + cmd.no_stdin = 1; - ret = start_command(&cmd); - if (ret) - goto cleanup; + ret = start_command(&cmd); + if (ret) + goto cleanup; - if (geometry.split_factor) { - FILE *in = xfdopen(cmd.in, "w"); - /* - * The resulting pack should contain all objects in packs that - * are going to be rolled up, but exclude objects in packs which - * are being left alone. - */ - for (i = 0; i < geometry.split; i++) - fprintf(in, "%s\n", pack_basename(geometry.pack[i])); - for (i = geometry.split; i < geometry.pack_nr; i++) - fprintf(in, "^%s\n", pack_basename(geometry.pack[i])); - fclose(in); - } + if (geometry.split_factor) { + FILE *in = xfdopen(cmd.in, "w"); + /* + * The resulting pack should contain all objects in packs that + * are going to be rolled up, but exclude objects in packs which + * are being left alone. + */ + for (i = 0; i < geometry.split; i++) + fprintf(in, "%s\n", pack_basename(geometry.pack[i])); + for (i = geometry.split; i < geometry.pack_nr; i++) + fprintf(in, "^%s\n", pack_basename(geometry.pack[i])); + fclose(in); + } - ret = finish_pack_objects_cmd(&cmd, &names, 1); - if (ret) - goto cleanup; - - if (!names.nr && !po_args.quiet) - printf_ln(_("Nothing new to pack.")); - - if (pack_everything & PACK_CRUFT) { - const char *pack_prefix = find_pack_prefix(packdir, packtmp); - - if (!cruft_po_args.window) - cruft_po_args.window = po_args.window; - if (!cruft_po_args.window_memory) - cruft_po_args.window_memory = po_args.window_memory; - if (!cruft_po_args.depth) - cruft_po_args.depth = po_args.depth; - if (!cruft_po_args.threads) - cruft_po_args.threads = po_args.threads; - if (!cruft_po_args.max_pack_size) - cruft_po_args.max_pack_size = po_args.max_pack_size; - - cruft_po_args.local = po_args.local; - cruft_po_args.quiet = po_args.quiet; - - ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix, - cruft_expiration, &names, - &existing); + ret = finish_pack_objects_cmd(&cmd, &names, 1); if (ret) goto cleanup; - if (delete_redundant && expire_to) { - /* - * If `--expire-to` is given with `-d`, it's possible - * that we're about to prune some objects. With cruft - * packs, pruning is implicit: any objects from existing - * packs that weren't picked up by new packs are removed - * when their packs are deleted. - * - * Generate an additional cruft pack, with one twist: - * `names` now includes the name of the cruft pack - * written in the previous step. So the contents of - * _this_ cruft pack exclude everything contained in the - * existing cruft pack (that is, all of the unreachable - * objects which are no older than - * `--cruft-expiration`). - * - * To make this work, cruft_expiration must become NULL - * so that this cruft pack doesn't actually prune any - * objects. If it were non-NULL, this call would always - * generate an empty pack (since every object not in the - * cruft pack generated above will have an mtime older - * than the expiration). - */ - ret = write_cruft_pack(&cruft_po_args, expire_to, - pack_prefix, - NULL, - &names, - &existing); + if (!names.nr && !po_args.quiet) + printf_ln(_("Nothing new to pack.")); + + if (pack_everything & PACK_CRUFT) { + const char *pack_prefix = find_pack_prefix(packdir, packtmp); + + if (!cruft_po_args.window) + cruft_po_args.window = po_args.window; + if (!cruft_po_args.window_memory) + cruft_po_args.window_memory = po_args.window_memory; + if (!cruft_po_args.depth) + cruft_po_args.depth = po_args.depth; + if (!cruft_po_args.threads) + cruft_po_args.threads = po_args.threads; + if (!cruft_po_args.max_pack_size) + cruft_po_args.max_pack_size = po_args.max_pack_size; + + cruft_po_args.local = po_args.local; + cruft_po_args.quiet = po_args.quiet; + + ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix, + cruft_expiration, &names, + &existing); if (ret) goto cleanup; + + if (delete_redundant && expire_to) { + /* + * If `--expire-to` is given with `-d`, it's possible + * that we're about to prune some objects. With cruft + * packs, pruning is implicit: any objects from existing + * packs that weren't picked up by new packs are removed + * when their packs are deleted. + * + * Generate an additional cruft pack, with one twist: + * `names` now includes the name of the cruft pack + * written in the previous step. So the contents of + * _this_ cruft pack exclude everything contained in the + * existing cruft pack (that is, all of the unreachable + * objects which are no older than + * `--cruft-expiration`). + * + * To make this work, cruft_expiration must become NULL + * so that this cruft pack doesn't actually prune any + * objects. If it were non-NULL, this call would always + * generate an empty pack (since every object not in the + * cruft pack generated above will have an mtime older + * than the expiration). + */ + ret = write_cruft_pack(&cruft_po_args, expire_to, + pack_prefix, + NULL, + &names, + &existing); + if (ret) + goto cleanup; + } } } From patchwork Tue Oct 8 08:13:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han Young X-Patchwork-Id: 13825953 Received: from mail-pj1-f43.google.com (mail-pj1-f43.google.com [209.85.216.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C1B6E1DC046 for ; Tue, 8 Oct 2024 08:14:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728375250; cv=none; b=sSiWyLNFK3l1xVHoiv9zQQc94XhqEzAAVJzd9J3eUJ8CfFPncSetvVGXVaniCFI8rjkcoyp13GgjoH0Xwv8nN1tQXg3WLlvbwOH4TWRMfUbP56/R0LqurPczyB3ws0SZCCVIEH9bAK6ZZ3hXRGuZZqsGFwbaQC7OSQcMQKm9mKU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728375250; c=relaxed/simple; bh=cC0RZZIWkpyHuRJz3IoEoS7wQDuApojvG++sJnvsvQE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=r8OGghATUuWciJ2POnWbN3Aa8s69nT/yhjsb0jlx7uGixisYWjwfRT7unaKT7kyNrrwSMKKbgSlsP8d26ZSUCsMxS4v2iXAGrtJL+/3/DMh9mlirPdf0e7/RDokftY2tZDpL9LYTwSORl6S7KO0aCbbVqlI/9WornzYx+Al9J2k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=i5LfItdp; arc=none smtp.client-ip=209.85.216.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="i5LfItdp" Received: by mail-pj1-f43.google.com with SMTP id 98e67ed59e1d1-2e18aa9f06dso3790667a91.0 for ; Tue, 08 Oct 2024 01:14:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1728375247; x=1728980047; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=lGR3mW3SExbQVu2QHULPkFA06M+rTTCGJlowrDVvC74=; b=i5LfItdpleQB8H7pyMQMFp/TeZwIdia3qLeeVOUpp4w+xAeqC3Znvp+a0dvZRdny0p 7JT0cDED6f4gr/wwBqqab7jtC8olg14Z0soZY4qYd0b5IQXQt+40ABCYtgsnAFbLJavB L/NaR03I3RdA/o5tDGRWN2CRmd0uHc1QfSCw2a/mPjhUcfh5fj2Cpm/UgH1IJjH7/Xxx kg9v5iRrOwbu7xCTjJc/twBEJiCTh1PEytjVrW2jCVw0bxZEt+7Z3OACO5gQPyMRjtUN 4irqALhvx1d3s3sUK2lDCUZEe9cEntPC3l1KqCwHBZyWAen7VRaeqARabZQmBVdTEJ5d FRXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728375247; x=1728980047; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lGR3mW3SExbQVu2QHULPkFA06M+rTTCGJlowrDVvC74=; b=ZNqayNDTC0vBikRusGAfQiu/bDwA5LyxZyJd9UnYqQTQhoKRqGks40nJ1f2E51DaZT qdk4k2aWaj3GesAj60D3iTaCHtlGuRg2i9er7vyxxHx7BnFSyhS+c8e//zzLVO9RMhnd 2ul4dztYrJRD3+jvURpDEalIZmgvbd3chsouJXHHqcDSRbuCNWhItsqZbH9axBp6Hiwz ZlFWg4oX90w/7lXURnQYAPENFFJ1/PpbYjROXMIQ8l9x5R7axBgGgMg431Sx0v/i3qRV 9wRqRGUyoBxM2xz8jh3pSzimyUg3C0A7DmnRG5XUOT1Z2HxwjBWTg+RgwGszoh1WBN6K eFQQ== X-Gm-Message-State: AOJu0YxvYjC1ZNwvkJajQz43v307Pl12U53JKc2igB8Jy2O+DXcVmd/c xYF1M5+b7jjfI0rsrkAqcAE5Dj27LUYBWs333T1V6Mcln8vNlsXeQc0gb95WgP81cO2S3AS7Uha a X-Google-Smtp-Source: AGHT+IGLEVPpO4QtbytIU5Sj6F12AKzYbjuWBX9Ru/gVQeJil3QriFKcR+OiRwZHTI9FJ+PVBwRtzA== X-Received: by 2002:a17:90b:46cd:b0:2e2:85b8:14e with SMTP id 98e67ed59e1d1-2e285b801c0mr1924810a91.15.1728375247427; Tue, 08 Oct 2024 01:14:07 -0700 (PDT) Received: from localhost.localdomain ([63.216.146.178]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2e1e83ca28fsm8739647a91.7.2024.10.08.01.14.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Oct 2024 01:14:06 -0700 (PDT) From: Han Young To: git@vger.kernel.org Cc: calvinwan@google.com, jonathantanmy@google.com, sokcevic@google.com, gitster@pobox.com, phillip.wood123@gmail.com, Han Young Subject: [PATCH v2 2/3] t0410: adapt tests to repack changes Date: Tue, 8 Oct 2024 16:13:49 +0800 Message-ID: <20241008081350.8950-3-hanyang.tony@bytedance.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241008081350.8950-1-hanyang.tony@bytedance.com> References: <20240802073143.56731-1-hanyang.tony@bytedance.com> <20241008081350.8950-1-hanyang.tony@bytedance.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 In the previous commit, we changed how partial repo is cloned. Adapt tests to these changes. Also check gc does not delete normal objects too. Signed-off-by: Han Young --- t/t0410-partial-clone.sh | 68 +--------------------------------------- t/t5616-partial-clone.sh | 9 +----- 2 files changed, 2 insertions(+), 75 deletions(-) diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index 34bdb3ab1f..c169b47160 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -499,46 +499,6 @@ test_expect_success 'single promisor remote can be re-initialized gracefully' ' git -C repo fetch --filter=blob:none foo ' -test_expect_success 'gc repacks promisor objects separately from non-promisor objects' ' - rm -rf repo && - test_create_repo repo && - test_commit -C repo one && - test_commit -C repo two && - - TREE_ONE=$(git -C repo rev-parse one^{tree}) && - printf "$TREE_ONE\n" | pack_as_from_promisor && - TREE_TWO=$(git -C repo rev-parse two^{tree}) && - printf "$TREE_TWO\n" | pack_as_from_promisor && - - git -C repo config core.repositoryformatversion 1 && - git -C repo config extensions.partialclone "arbitrary string" && - git -C repo gc && - - # Ensure that exactly one promisor packfile exists, and that it - # contains the trees but not the commits - ls repo/.git/objects/pack/pack-*.promisor >promisorlist && - test_line_count = 1 promisorlist && - PROMISOR_PACKFILE=$(sed "s/.promisor/.pack/" out && - grep "$TREE_ONE" out && - grep "$TREE_TWO" out && - ! grep "$(git -C repo rev-parse one)" out && - ! grep "$(git -C repo rev-parse two)" out && - - # Remove the promisor packfile and associated files - rm $(sed "s/.promisor//" packlist && - test_line_count = 1 packlist && - git verify-pack repo/.git/objects/pack/pack-*.pack -v >out && - grep "$(git -C repo rev-parse one)" out && - grep "$(git -C repo rev-parse two)" out && - ! grep "$TREE_ONE" out && - ! grep "$TREE_TWO" out -' - test_expect_success 'gc does not repack promisor objects if there are none' ' rm -rf repo && test_create_repo repo && @@ -569,7 +529,7 @@ repack_and_check () { git -C repo2 cat-file -e $3 } -test_expect_success 'repack -d does not irreversibly delete promisor objects' ' +test_expect_success 'repack -d does not irreversibly delete objects' ' rm -rf repo && test_create_repo repo && git -C repo config core.repositoryformatversion 1 && @@ -583,40 +543,14 @@ test_expect_success 'repack -d does not irreversibly delete promisor objects' ' TWO=$(git -C repo rev-parse HEAD^^) && THREE=$(git -C repo rev-parse HEAD^) && - printf "$TWO\n" | pack_as_from_promisor && printf "$THREE\n" | pack_as_from_promisor && delete_object repo "$ONE" && - repack_and_check --must-fail -ab "$TWO" "$THREE" && repack_and_check -a "$TWO" "$THREE" && repack_and_check -A "$TWO" "$THREE" && repack_and_check -l "$TWO" "$THREE" ' -test_expect_success 'gc stops traversal when a missing but promised object is reached' ' - rm -rf repo && - test_create_repo repo && - test_commit -C repo my_commit && - - TREE_HASH=$(git -C repo rev-parse HEAD^{tree}) && - HASH=$(promise_and_delete $TREE_HASH) && - - git -C repo config core.repositoryformatversion 1 && - git -C repo config extensions.partialclone "arbitrary string" && - git -C repo gc && - - # Ensure that the promisor packfile still exists, and remove it - test -e repo/.git/objects/pack/pack-$HASH.pack && - rm repo/.git/objects/pack/pack-$HASH.* && - - # Ensure that the single other pack contains the commit, but not the tree - ls repo/.git/objects/pack/pack-*.pack >packlist && - test_line_count = 1 packlist && - git verify-pack repo/.git/objects/pack/pack-*.pack -v >out && - grep "$(git -C repo rev-parse HEAD)" out && - ! grep "$TREE_HASH" out -' - test_expect_success 'do not fetch when checking existence of tree we construct ourselves' ' rm -rf repo && test_create_repo repo && diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index c53e93be2f..2c6f10026f 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -643,16 +643,9 @@ test_expect_success 'fetch from a partial clone, protocol v2' ' grep "version 2" trace ' -test_expect_success 'repack does not loosen promisor objects' ' - rm -rf client trace && - git clone --bare --filter=blob:none "file://$(pwd)/srv.bare" client && - test_when_finished "rm -rf client trace" && - GIT_TRACE2_PERF="$(pwd)/trace" git -C client repack -A -d && - grep "loosen_unused_packed_objects/loosened:0" trace -' - test_expect_success 'lazy-fetch in submodule succeeds' ' # setup + rm -rf client && test_config_global protocol.file.allow always && test_when_finished "rm -rf src-sub" && From patchwork Tue Oct 8 08:13:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Han Young X-Patchwork-Id: 13825954 Received: from mail-pj1-f52.google.com (mail-pj1-f52.google.com [209.85.216.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4669217ADE1 for ; Tue, 8 Oct 2024 08:14:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728375253; cv=none; b=M6mkBJ8lDRwmr2VLgOk0BpEq05bjjO+6aEhFpjzFgD8NEgXbpx0mbG5LbLAautaE9ZIkKxi8rb0Om+uKJY7sff8HpT2uQ61O5QhsIlv72NrjCM43T85zYRjFBV1D4jf9luaqmLsEZIcM4KomPRPM86eWxi16AoLPTp/emIEqcV8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728375253; c=relaxed/simple; bh=Q0ZkySsPRgXuKb99NgsCtB5/hyfsAxgtEAsGiYMgx2Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uzemVt1UL3ys5Wx8JPfHHnPaS0Hs7efhsWAEeuSCiGF1Az3rg95vokRPqHG3Aiv/oiqpMabTcTsHzrkpMy1MGscOXJeiMGA23bD3Q0aEoojYwHsinK1l0lJ9YMNLbZotff+MYVenZszhqWxSajMCI1PURza7BCV/HhyjmKqyKTU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=ZOKKNURs; arc=none smtp.client-ip=209.85.216.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="ZOKKNURs" Received: by mail-pj1-f52.google.com with SMTP id 98e67ed59e1d1-2e0946f9a8eso3653642a91.1 for ; Tue, 08 Oct 2024 01:14:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1728375251; x=1728980051; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=3WVxW3kjCUYpN9aCAR5CYmn0Sg+BnR1VvxehgUzJ1yE=; b=ZOKKNURsMpixLgXbsq4Y8X1+uK9eg5OOn++1d5zPqRdIqFWXz6zOicjdeLW3JDk8Uy I82DabiG3BrpOI790mTFxQIUsjjAeIFvBFdHQuGlTjpt4Ixf5YSAXoMYz7Yee35teB6v axLpxYfroVf8EPzv0b4DnoWBgRKzvWPn0bU/yAai34XqcIyVBB59C2wboep7tcs3Ezb3 /XjnIVlPUO3p+2CVqkoa1ueDHpJnWgmFUu3JDWP2NPNMR6jYkbkT7BkECqswsLV0B0sG 5kcSEY7y969s/fFbiMNGQOFdFlIp06YOk1DaMfFrhqDlb8jHbW+c+VGZlx+1qRjVtW4e UKUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728375251; x=1728980051; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=3WVxW3kjCUYpN9aCAR5CYmn0Sg+BnR1VvxehgUzJ1yE=; b=pJypD3FrM18X1u7hoKvcW5FX1YpHFehDDX4BxAsmGTuvgCnKBFJpT9RGfrJvDyr1WE +lxo5emZRDXKbaQwO2HnjDQjAOc1P9TuF9wvmczfCGNCBX+Pr09W5dbqPsFOuCkpGAgo vGpYxnr7fK5unWwNrRD/W6aCoeLVSG3VNqMhRuo98lzHZbTg9uqQU0dKBMtSdMnXkqor Zd+Ob1El0aVv+5ETXAlhgoxcBHj9NhcUrB6T+REoll9xfAYUssoRhcwlTK4w+Nwj3UQj 5CAEoviR5cOYR3uw6fgec9Mmc89H2g4soT4T1q0Eif402WYP83ZBOjdPHk3ntZdleONX 2PDg== X-Gm-Message-State: AOJu0Yzu04QQnTjfft/cZEUHEPq4unTqQLl/kos+Exlq3re21/NUL0sz Pvms5xwPdEJx59q3WvifHKLAQKWPZgMCQq5vYC0Q4+h3SwG51uuhUGswqCYJoOqDyIN5D6KmhWd K X-Google-Smtp-Source: AGHT+IH+zw+8nboMV/aYOrvEq5lQDHXyJobwoN99AEsUNiX98Z1Zw8rrFjvRTNCA2LjCCLoq4uHrVQ== X-Received: by 2002:a17:90b:370f:b0:2cc:ef14:89e3 with SMTP id 98e67ed59e1d1-2e1e622226emr16255975a91.15.1728375251022; Tue, 08 Oct 2024 01:14:11 -0700 (PDT) Received: from localhost.localdomain ([63.216.146.178]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2e1e83ca28fsm8739647a91.7.2024.10.08.01.14.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Oct 2024 01:14:10 -0700 (PDT) From: Han Young To: git@vger.kernel.org Cc: calvinwan@google.com, jonathantanmy@google.com, sokcevic@google.com, gitster@pobox.com, phillip.wood123@gmail.com, Han Young Subject: [PATCH v2 3/3] partial-clone: update doc Date: Tue, 8 Oct 2024 16:13:50 +0800 Message-ID: <20241008081350.8950-4-hanyang.tony@bytedance.com> X-Mailer: git-send-email 2.46.0 In-Reply-To: <20241008081350.8950-1-hanyang.tony@bytedance.com> References: <20240802073143.56731-1-hanyang.tony@bytedance.com> <20241008081350.8950-1-hanyang.tony@bytedance.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Document new repack behavior for partial repo Signed-off-by: Han Young --- Documentation/technical/partial-clone.txt | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt index cd948b0072..9791c9ac24 100644 --- a/Documentation/technical/partial-clone.txt +++ b/Documentation/technical/partial-clone.txt @@ -124,6 +124,10 @@ their ".pack" and ".idx" files. + When Git encounters a missing object, Git can see if it is a promisor object and handle it appropriately. If not, Git can report a corruption. + +To prevent `repack` from removing locally created objects, `repack` packs all +the objects into one promisor packfile. It's no longer possible to determine +the cause of missing objects after `gc`.[7] + This means that there is no need for the client to explicitly maintain an expensive-to-modify list of missing objects.[a] @@ -156,8 +160,9 @@ and prefetch those objects in bulk. - `fsck` has been updated to be fully aware of promisor objects. -- `repack` in GC has been updated to not touch promisor packfiles at all, - and to only repack other objects. +- `repack` in GC has been taught to handle partial clone repo differently. + `repack` will pack every objects into one promisor packfile for partial + repos. - The global variable "fetch_if_missing" is used to control whether an object lookup will attempt to dynamically fetch a missing object or @@ -244,8 +249,7 @@ remote in a specific order. objects. We assume that promisor remotes have a complete view of the repository and can satisfy all such requests. -- Repack essentially treats promisor and non-promisor packfiles as 2 - distinct partitions and does not mix them. +- It's not possible to discard dangling objects in repack. - Dynamic object fetching invokes fetch-pack once *for each item* because most algorithms stumble upon a missing object and need to have @@ -365,3 +369,7 @@ Related Links [6] https://lore.kernel.org/git/20170714132651.170708-1-benpeart@microsoft.com/ + Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand + Date: Fri, 14 Jul 2017 09:26:50 -0400 + +[7] https://lore.kernel.org/git/20240802073143.56731-1-hanyang.tony@bytedance.com/ + + Subject: [PATCH 0/1] revision: fix reachable objects being gc'ed in no blob clone repo + + Date: Fri, 2 Aug 2024 15:31:42 +0800