From patchwork Thu Jul 18 19:55:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13736670 Received: from mail-wm1-f46.google.com (mail-wm1-f46.google.com [209.85.128.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B00C144D28 for ; Thu, 18 Jul 2024 19:55:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721332554; cv=none; b=fDXfK2iKmkvtPZzEa+SbfXqdy3NrTfRGEKrya8RWFOv+dQlt9Iyn/AZYRs3LexGd9Awl8VbLrZtT9JcLNwpi49XDkeVflwswPXS41PwSuFmKMfRVCMmOriBNnaz5YTh+xdJOlHp289FVLUv+ye3XUl/6Hgz0sRcTqtzikhVKEIo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721332554; c=relaxed/simple; bh=lescR3G70Vw4KeNxEz4uGRz+9G7NhoALFzbIldAUyQ0=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=m2jAMh+ZfWzH/IdU3rWlXxhSrmtPyVutjDtID1gS3Q9qg7wcpCnaiW0i8YNP/a1DRk81sBMgL+p+lbpWxtUOdz0hqGNPnuXa3Jtdzyiztporay9dveAZIfs7SVP+A5OADQtzC+r2KVgoLznT2y0DxI4Uy/uiMPUjGzIKyva1Vgc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=STSma1Sl; arc=none smtp.client-ip=209.85.128.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="STSma1Sl" Received: by mail-wm1-f46.google.com with SMTP id 5b1f17b1804b1-426717a2d12so4715855e9.0 for ; Thu, 18 Jul 2024 12:55:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1721332550; x=1721937350; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=CWRq4v+BNYFQ8GqQO3d79v9l/tDeROjQFjgs4Ku4+KQ=; b=STSma1Sl3v7ZNlWYfwL2JfqFQCKp1bq87E61YjdvIzYt+LERmzQg9NZhvk31OhrnX1 EZ26opU9IJRtaOPHgjcey0xNKUq5VrfemBkSc9Bk7J+/JtbRrxoK/lEcH5GhkW0hUPMh 5YkqnW1UR27sUJPm8fw8bLUuX7pTCf6lt/KqPc1yTWxsTHx8g4+QhuX4hfHovHaADCf5 VWjTh37iZ6j2THwndfqdRKebl5covGPidM3S8SRAoNwgRJcVO8pXFIoWUlyyP/beuWUy Zkow9V9b+DYj+HLToGl6UuQ6LUcjRUFO8JF/yyTKSPcXoFJVlBmMIkjJhdVX4HKKYlc9 hs2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721332550; x=1721937350; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CWRq4v+BNYFQ8GqQO3d79v9l/tDeROjQFjgs4Ku4+KQ=; b=IaFmf07Hwpusm29yL9BRbqfsGyHcJgWlEefVTD3/yJfZ+A0myZG7o3CLyAvwphVkWL 7Mw/CC8S9kDhqgxDJIvvCJL0Qe1b5KsHdHmL3huhDY4BfXCCsb+yvoIJLn6aqRQ/1h66 q0tGVGnITZfV6TFwMEsGEjEZOs0rwXrHfhtw2VF+Q56+ydX6yVOYtgiM7a6kfRcCNC+b 41HcmxObOqU/ffBq3uzXxlLRhIF/GpET7Lt7vNC5EiaKX9+efOMWFuky8WotcIRT7VI7 MI9uQIXrYPeGtaGyMVfdtAA5DGS7Z1qrKucA990tjdOSU24QxEjTaE2I8a0Ozo7+MOKn 55Ag== X-Gm-Message-State: AOJu0YwQd5T8BWKNtogVDMv1FPpEZAEecvGvdx8JB4zliHWyBeZjMcD4 pgoig7vfnA1G5SwLWz3+CSTbx8uVAgdJRA2wFc1FlECmpBHd8o3LCENkHQ== X-Google-Smtp-Source: AGHT+IExI44iMUsbKV5eLIuuVGz9u5uulY/EJ+Pt0PQZD/l1q3XDl6zzoasL0YiARzMIW1UG2Vdqpg== X-Received: by 2002:a05:600c:4751:b0:426:5cc7:82f with SMTP id 5b1f17b1804b1-427d2b527demr14755515e9.13.1721332549706; Thu, 18 Jul 2024 12:55:49 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-427d2a7257esm26035645e9.28.2024.07.18.12.55.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Jul 2024 12:55:48 -0700 (PDT) Message-Id: <9b8e2012c9107f99e19c541113ae6a405e38a92f.1721332546.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Thu, 18 Jul 2024 19:55:45 +0000 Subject: [PATCH 1/2] t5319: add failing test case for repack/expire Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttayllorr.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee Git 2.45.0 included the change b7d6f23a171 (midx-write.c: use `--stdin-packs` when repacking, 2024-04-01) which caused the 'git multi-pack-index repack' command to use 'git pack-objects --stdin-packs' instead of listing the objects to repack. While this change was motivated by efficient cross-process communication and the ability to improve delta compression, it breaks a fundamental function of the 'incremental-repack' task that is enabled by default in Scalar clones or Git repositories that run 'git maintenance start'. The 'incremental-repack' task performs a two-step process of the 'expire' and 'repack' subcommands of the 'git multi-pack-index' builtin. The 'expire' command removes any pack-files listed in the multi-pack-index but without any referenced objects. The 'repack' task then finds a batch of pack-files to repack and sends their objects to 'git pack-objects'. Both the pack-files chosen for the batch and the objects chosen to repack are based on the ones that the multi-pack-index references. Objects that appear in a pack-file but have a duplicate copy in a newer pack-file are not considered in this case. Since the multi-pack-index references only the newest copy of an object, this allows the next 'incremental-repack' task to remove the pack-files in the next 'expire' task. This delay is intentional due to how Windows handles may block deletion of files with open read handles. However, the mentioned commit changed this behavior to divorce the set of objects referenced by the multi-pack-index and instead use a set of "included" and "excluded" pack-files in the 'git pack-objects' builtin. When a pack-file is selected as "included", only the objects it contains but are not in any "excluded" pack-files are considered for repacking. This has led to client repositories failing to remove old pack-files as they still have some referenced objects. This grows over time until the point that Git is trying to repack the same pack-files over and over. For now, create a test case that demonstrates the expected behavior, but also fails in its final line. The setup here it attempting to recreate a typical situation for a repository that uses a blobless partial clone. There would be a large initial pack-file from the clone that is never selected in the 'repack' batch. There are other pack-files that have a combination of new objects from incremental fetches and possibly blobs that are not connected to those incremental fetches; these blobs could be filled in from commands like 'git checkout' or 'git blame'. The pack-files also have some overlap on purpose so test-1 has some duplicates in test-2 and test-2 has some duplicates in test-3. At the end of the test, the test-2 pack-file still exists though it should have been expired. This test will pass when reverting the offending commit. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 55 +++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index dd09134db03..327376233c5 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -1004,6 +1004,61 @@ test_expect_success 'repack --batch-size= repacks everything' ' ) ' +test_expect_failure 'repack/expire loop' ' + git init repack-expire && + test_when_finished "rm -fr repack-expire" && + ( + cd repack-expire && + + test_commit_bulk 5 && + + # Create three overlapping pack-files + git rev-list --objects HEAD~3 >in-1 && + git rev-list --objects HEAD~4..HEAD~2 >in-2 && + git rev-list --objects HEAD~3..HEAD >in-3 && + + # Create disconnected blobs + obj1=$(git hash-object -w in-1) && + obj2=$(git hash-object -w in-2) && + obj3=$(git hash-object -w in-3) && + + echo $obj2 >>in-2 && + echo $obj3 >>in-3 && + + for i in $(test_seq 3) + do + git pack-objects .git/objects/pack/test-$i X-Patchwork-Id: 13736671 Received: from mail-wr1-f46.google.com (mail-wr1-f46.google.com [209.85.221.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B930145A1F for ; Thu, 18 Jul 2024 19:55:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721332555; cv=none; b=tfYp6z/Tl9biZ5p4dlj3aF04mWij+W/SOnw1ieNAmnSQcSMKsc3g4xNFS5c8mP5Ag66g/436j+sA0Fm3lzJrv4IQVZsyY7mlsnKk9GUuX8F0pum8rRryyRxHh3ONRu+9I6EIYRAeh3MGmZnsf9F49Yo1biMbmL0QNhSCMJS8ojU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721332555; c=relaxed/simple; bh=1bQe/3g7JyEqMVLdLQDZcHy7LDgcVD3zaTvBCmXPwE4=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=JhNNCGwM9gmfRhjF+bJzN145IIHux38F7lcTQX/wGpUEQvBN7X1tgB3I3xPrSJfNYv8zoF/oOBiArXTt4LHU9Me5FRTeyfE+wDPUoqvGqvSYILsxjVTLxk8fvEZJXypYiP7OMcPG2dm9VwqsaBipUBXQbym83KvGvB5rIERXL3M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ScYy1LIs; arc=none smtp.client-ip=209.85.221.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ScYy1LIs" Received: by mail-wr1-f46.google.com with SMTP id ffacd0b85a97d-3686b285969so130830f8f.0 for ; Thu, 18 Jul 2024 12:55:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1721332551; x=1721937351; darn=vger.kernel.org; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=Ye4j3W8FwjAhukIrNPgom0agTMcOPBUbhtjSqUmBekk=; b=ScYy1LIs6Mtct1Fv3T7YSHS5pO1MeP7MbrWaGsmjHlK4I0z3HBxwWJv/Lve0qaXlkY TdFQL5PqcTFE7Ou7A7QO+eCAUU2WidgUta42EJQUzUaGzC+qna6f0xSL8SvEQ1+OhkfG O8K5i5bGWN+YrDSBKtxKO8S9S3CiptoVbSKGdCqOKP0ms5mEXmNM1Sry+DGzqaJlNe9G T9uCZdXJupG6x3NybxL33K2nLWdpfyyz3cG6ntMcmgeiWwxLBKgMswy9/qLBo+bUkDwU 6hUYzPnS+DQvsRwYgmd5w+KFO8Hs6i8ad3so9aF2rc+sEN9ZGECh7FE3aaOxopmmDqTS b0Rg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721332551; x=1721937351; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Ye4j3W8FwjAhukIrNPgom0agTMcOPBUbhtjSqUmBekk=; b=CqWEJWkzEFzFTJCsOx3xwERGc3UJdq0/uBb9ayHDy0cgb26ZPcISvac25yktrY+w6Y GWSG90tQjiinkNE+TJc8c2J+n4RxBTV144Ga+RpQ4p/lYmq9aCjHi4lXaQaLtf332Xe1 G64ooMd7U5ErrLrjeo2DiecrrqBpyzWHBodaHyMD9Hl8EJNQHkq8SZG9N/Fg2hMKb3X+ UYfKZ4hFbPu9ISG+227FPASvm1+06H3/F3ZJBmiYbC85hboPYlMTFJTLfeKTqGbwXh28 QJwrD3dlmSXP51keOyioMSloIqwLd5vO6zWwwS2a4pO67Oo62DrHEdfhg74K0VhqXVtW KVJA== X-Gm-Message-State: AOJu0YyHfFnnlUBlsyiaI5dYuTmBcrOJ0FLgzTWSNLjzOdS/jHIIEZBZ Y51DXzzi1RRZ3mlyv8d1oRW2ikkW/yzrcI1YdXTtNFj4J61W2uu9sf5Osw== X-Google-Smtp-Source: AGHT+IE/HLcpv9hpvcuG78jK4gou6BSSb2e8ZOT/Ql6nTSwfe5Lhg9YMxVZjnojm3wEmZtmQVfrmXg== X-Received: by 2002:a5d:6091:0:b0:367:9625:bd06 with SMTP id ffacd0b85a97d-36831707925mr3838032f8f.42.1721332551428; Thu, 18 Jul 2024 12:55:51 -0700 (PDT) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-368727f0791sm440492f8f.44.2024.07.18.12.55.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Jul 2024 12:55:50 -0700 (PDT) Message-Id: In-Reply-To: References: Date: Thu, 18 Jul 2024 19:55:46 +0000 Subject: [PATCH 2/2] midx-write: revert use of --stdin-packs Fcc: Sent Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttayllorr.com, Derrick Stolee , Derrick Stolee From: Derrick Stolee From: Derrick Stolee This reverts b7d6f23a171 (midx-write.c: use `--stdin-packs` when repacking, 2024-04-01) and then marks the test created in the previous change as passing. The fundamental issue with the reverted change is that the focus on pack-files separates the object selection from how the multi-pack-index selects a single pack-file for an object ID with multiple copies among the tracked pack-files. The change was made with the intention of improving delta compression in the resulting pack-file, but that can be resolved with the existing object list mechanism. There are other potential pitfalls of doing an object walk at this time if the repository is a blobless partial clone, and that will require additional testing on top of the one that changes here. Signed-off-by: Derrick Stolee --- midx-write.c | 18 +++++++++--------- t/t5319-multi-pack-index.sh | 2 +- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/midx-write.c b/midx-write.c index 65e69d2de78..960cc46250e 100644 --- a/midx-write.c +++ b/midx-write.c @@ -1474,8 +1474,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset); repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands); - strvec_pushl(&cmd.args, "pack-objects", "--stdin-packs", "--non-empty", - NULL); + strvec_push(&cmd.args, "pack-objects"); strvec_pushf(&cmd.args, "%s/pack/pack", object_dir); @@ -1499,15 +1498,16 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, } cmd_in = xfdopen(cmd.in, "w"); - for (i = 0; i < m->num_packs; i++) { - struct packed_git *p = m->packs[i]; - if (!p) + + for (i = 0; i < m->num_objects; i++) { + struct object_id oid; + uint32_t pack_int_id = nth_midxed_pack_int_id(m, i); + + if (!include_pack[pack_int_id]) continue; - if (include_pack[i]) - fprintf(cmd_in, "%s\n", pack_basename(p)); - else - fprintf(cmd_in, "^%s\n", pack_basename(p)); + nth_midxed_object_oid(&oid, m, i); + fprintf(cmd_in, "%s\n", oid_to_hex(&oid)); } fclose(cmd_in); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 327376233c5..5cce0be19e6 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -1004,7 +1004,7 @@ test_expect_success 'repack --batch-size= repacks everything' ' ) ' -test_expect_failure 'repack/expire loop' ' +test_expect_success 'repack/expire loop' ' git init repack-expire && test_when_finished "rm -fr repack-expire" && (