From patchwork Tue Nov 28 19:07:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471561 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="oLw4lSe3" Received: from mail-qk1-x731.google.com (mail-qk1-x731.google.com [IPv6:2607:f8b0:4864:20::731]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 66322C3 for ; Tue, 28 Nov 2023 11:07:59 -0800 (PST) Received: by mail-qk1-x731.google.com with SMTP id af79cd13be357-77dc2981a58so44404585a.3 for ; Tue, 28 Nov 2023 11:07:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198478; x=1701803278; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=5cLCn8orZeq2NVlYlW/PfjjK2vcYVaNJmJlwkuyMQ2E=; b=oLw4lSe398J0+lsbK21HXksWlxjFZQW+N3A1psgSgcEP7F/o/W1IHIz8sUwfZnQgbD zNgjJKzC4FNfWUJUYK0w2oO+QSqaS3sS9ksKykES37mWYChlXrLz8xnv0mIdPrBFs2pm ey0XNQLHwfcOHKz7Ai1aQpGsFk5IZ8OdddKJRiIxGKnAYfD6vnGK18supxZ/cG5BhGZq 42d+EmAr8xBCR1aMo/2+86mqR2LU4zyiFEbNHJWPuX76hzX2Ag5AFz5ZZMSEb/3JWFpU JMrXxckjS/j2zsV0FtFUwMqGbrGwHAOMtNN7xSZyq2cdctBMv0aqrl+vYLF8pkmwQQT9 LNQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198478; x=1701803278; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=5cLCn8orZeq2NVlYlW/PfjjK2vcYVaNJmJlwkuyMQ2E=; b=fC6l80zx3mML+vr0WfEfIQnVprl5Zh/nzdsOZWl/gz4sGMF2hgUMaWUiL8kEKvm0lO cwZ0ROEqYqvy+kV5BQw8GlHPR7GWEocqYKZjuMr0R5f57sTPSXyJi221qSFvifOozblT dc7wBYUB1iqsX/CbgaXwKeZJH9cdEW16OLDHK/8J9aZF1a9nraTEJ9BMuB7E0YQLhYfB yzlVE1knjHkWcVR+RakuEqKD1gIl6uZPOmo1HKXyVKrRDPG28XwIB0F4OKCkTpLlXYcE 6uutlpytKev/q381XNQMg8ixJxlolAUJAzyZUBuwxmkQ7JcOXtL5lVnPDhX/s97O5siN rMjQ== X-Gm-Message-State: AOJu0YyeowU7DebJA7AmikDsvDI0ceVGmkYnhimniT4bqg2jhc45Z4jc cqdkqyxfy3y9a0eMd6keG/3MTTdk4ImKHiokUDI= X-Google-Smtp-Source: AGHT+IFsIUgoqu/MdVf+vgPy5KGKoUi/qMcn8Av/EbyCuwEX6ITZ3JnwYl/DbUi5qu5u9o1b9OV9Xw== X-Received: by 2002:a05:620a:674b:b0:77d:756d:34cc with SMTP id rq11-20020a05620a674b00b0077d756d34ccmr15187797qkn.20.1701198478280; Tue, 28 Nov 2023 11:07:58 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id ea11-20020a05620a488b00b0077d93c7c785sm3026312qkb.119.2023.11.28.11.07.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:07:58 -0800 (PST) Date: Tue, 28 Nov 2023 14:07:57 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 01/24] pack-objects: free packing_data in more places Message-ID: <101d6a2841a909aa9c99225ebf1b65695cf116cb.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The pack-objects internals use a packing_data struct to track what objects are part of the pack(s) being formed. Since these structures contain allocated fields, failing to appropriately free() them results in a leak. Plug that leak by introducing a free_packing_data() function, and call it in the appropriate spots. This is a fairly straightforward leak to plug, since none of the callers expect to read any values or have any references to parts of the address space being freed. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 + midx.c | 5 +++++ pack-objects.c | 15 +++++++++++++++ pack-objects.h | 1 + 4 files changed, 22 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 89a8b5a976..bfa60359d4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4522,6 +4522,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) reuse_packfile_objects); cleanup: + free_packing_data(&to_pack); list_objects_filter_release(&filter_options); strvec_clear(&rp); diff --git a/midx.c b/midx.c index 2f3863c936..3b727dc633 100644 --- a/midx.c +++ b/midx.c @@ -1592,8 +1592,13 @@ static int write_midx_internal(const char *object_dir, flags) < 0) { error(_("could not write multi-pack bitmap")); result = 1; + free_packing_data(&pdata); + free(commits); goto cleanup; } + + free_packing_data(&pdata); + free(commits); } /* * NOTE: Do not use ctx.entries beyond this point, since it might diff --git a/pack-objects.c b/pack-objects.c index f403ca6986..1c7bedcc94 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -151,6 +151,21 @@ void prepare_packing_data(struct repository *r, struct packing_data *pdata) init_recursive_mutex(&pdata->odb_lock); } +void free_packing_data(struct packing_data *pdata) +{ + if (!pdata) + return; + + free(pdata->cruft_mtime); + free(pdata->in_pack); + free(pdata->in_pack_by_idx); + free(pdata->in_pack_pos); + free(pdata->index); + free(pdata->layer); + free(pdata->objects); + free(pdata->tree_depth); +} + struct object_entry *packlist_alloc(struct packing_data *pdata, const struct object_id *oid) { diff --git a/pack-objects.h b/pack-objects.h index 0d78db40cb..336217e8cd 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -169,6 +169,7 @@ struct packing_data { }; void prepare_packing_data(struct repository *r, struct packing_data *pdata); +void free_packing_data(struct packing_data *pdata); /* Protect access to object database */ static inline void packing_data_lock(struct packing_data *pdata) From patchwork Tue Nov 28 19:07:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471562 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="UlScft13" Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2278BD63 for ; Tue, 28 Nov 2023 11:08:02 -0800 (PST) Received: by mail-qk1-x729.google.com with SMTP id af79cd13be357-77d5cf15280so318222485a.0 for ; Tue, 28 Nov 2023 11:08:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198481; x=1701803281; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=QNHSdvJxUhh1bobqc9K0tlmjnWuuFEhSc/suqvGlddM=; b=UlScft13hCYz15dOdHDtvEV3hB+aMAh6Ifm0Twbip3LZKvSwg/ej+S62kv33vJU0Qv 1KDiWFMEfLgQBMBFVWrDBB9cpYATk170PzCx91TBIiuPeGamnwVCefgPcW8xDSRYLwck 8KQYc56SGGeXTGtAYSh2VF403/5f+jwUXmYSjBc83gTz85/uiHZlEv2BcDDjnyqyb/Fq BVrguR6fzxpozan9T1uHLDqXmu52ZTDiq9rd9K6kBP8lMYoD8iu922lVmsm4PVN7wLeu MQDm3zltKQ2rRps+X3CLtdTaw893VL9kU1g8bG0/jDAcYbMZ6mKPnbpnL3NsK8KWxhfq sQEg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198481; x=1701803281; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=QNHSdvJxUhh1bobqc9K0tlmjnWuuFEhSc/suqvGlddM=; b=JI+hKg2LFyXt1OnrQ8q+CQpG7LeIbVtttMVBDaavAfndXsecnNoZlW45yOPam/EBrz 4Fx9gBYDIQUR2ObSERodHnSjPesxX7/cl7N8d4wTPamT0IVsG2pO/Kwz0qSREh+NHtJO Aa/tRA4cGr8SQ2zwI68itSEdrQ3McUpdPd0fVVGqgLrsFUSkOqjRZC5jlo0QJ+3DrI3S uR3J/u2EogYtkx38aNKRDRf2m++IyePaRGSwwRs/nZo4hzOAipUGFopuQXfJY/S4kVKX ljm3zF909mbJyKUx3wXaucxi8TGUzD4dbP+8cgfEo3jTgbR4Q6t7uBVK8y9KVAIzlq4Y HAiA== X-Gm-Message-State: AOJu0Yx7IA7f1G/XDF9r6RP1NQpOi83u4ECM5lBwQH54+pTcf2IO84OZ jo44b6VexpRCyFEDRT3KQU6Pdc2Ng+Da2bccMx4= X-Google-Smtp-Source: AGHT+IHILdUxJ8W9ai8usxuAQMoR469YyRnyJBFksl5Zz28nyE2zNr2uuztipeKcOFFZUVZo/fMbCQ== X-Received: by 2002:a05:620a:6409:b0:77d:6804:aeda with SMTP id pz9-20020a05620a640900b0077d6804aedamr20712114qkn.19.1701198480965; Tue, 28 Nov 2023 11:08:00 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id c19-20020a05620a11b300b0077dc7cb232asm289063qkk.103.2023.11.28.11.08.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:00 -0800 (PST) Date: Tue, 28 Nov 2023 14:07:59 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 02/24] pack-bitmap-write: deep-clear the `bb_commit` slab Message-ID: <6f5ff96998946f3f49da56fd05c096b949521339.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The `bb_commit` commit slab is used by the pack-bitmap-write machinery to track various pieces of bookkeeping used to generate reachability bitmaps. Even though we clear the slab when freeing the bitmap_builder struct (with `bitmap_builder_clear()`), there are still pointers which point to locations in memory that have not yet been freed, resulting in a leak. Plug the leak by introducing a suitable `free_fn` for the `struct bb_commit` type, and make sure it is called on each member of the slab via the `deep_clear_bb_data()` function. Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index f4ecdf8b0e..dd3a415b9d 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -198,6 +198,13 @@ struct bb_commit { unsigned idx; /* within selected array */ }; +static void clear_bb_commit(struct bb_commit *commit) +{ + free(commit->reverse_edges); + bitmap_free(commit->commit_mask); + bitmap_free(commit->bitmap); +} + define_commit_slab(bb_data, struct bb_commit); struct bitmap_builder { @@ -339,7 +346,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, static void bitmap_builder_clear(struct bitmap_builder *bb) { - clear_bb_data(&bb->data); + deep_clear_bb_data(&bb->data, clear_bb_commit); free(bb->commits); bb->commits_nr = bb->commits_alloc = 0; } From patchwork Tue Nov 28 19:08:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471563 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="DhngvcuN" Received: from mail-qk1-x72e.google.com (mail-qk1-x72e.google.com [IPv6:2607:f8b0:4864:20::72e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA375C3 for ; Tue, 28 Nov 2023 11:08:04 -0800 (PST) Received: by mail-qk1-x72e.google.com with SMTP id af79cd13be357-77d90497b86so184368585a.0 for ; Tue, 28 Nov 2023 11:08:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198483; x=1701803283; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=CA5ONnhb7PPzj475MAowp+4NXW6jj44d07pJCN46+3g=; b=DhngvcuNdRfvEQIxZk7MPLhfqmz1q7Urjx0PxIUZNdKS2sfuedRalcwpS0u0ndYNt4 KI1PT73ktvM0s1PBNacF7lfIwfAnJrtYBEurXdVX53dms3itiClD2RoNLlJr2KyCQmTJ w40I/F6b7JzMkFdRIwCoWPmqmAaEyJYZvAd27KY8SXD+/Xx3VLFd5E7kW9ZUxHqft2Nh V8PMlL/sviq6U9uq2Je5ptRfXrwdvSqzc1iE9SaYHyLImE/w2Ioa2Uc9Rfj0nAuidGvH VQQc/g6OuvM8spjRm+eWdWxSnqmNjzZcr+46dDHmcAlF5dfWtShpufQidQE0k1A7CZLP cG3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198483; x=1701803283; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=CA5ONnhb7PPzj475MAowp+4NXW6jj44d07pJCN46+3g=; b=G1ZHfO4d/Jr+eNbP3IurPPJtdlKEdqxguXnWk7O+J++n6OQWC7Gvw3UXq+59jUcZeZ 8yOAwaOjJtSEF1HdMkwTcOk57KyKCBelZv+/wZjL4YfkauCEF6X+/hzvipV01OVLwC9I LzAfueNWN4V5eOYvmRJDeTHGIE85tdGnY4IvvG1jPPivUBiqSDn3SbMqfT750Yt9RKMz qPHHJSlkp7o6MICE/AtPjh3MBsbRVVZJiYf+4fYa/iNsfYNBtj7D5uZ6sXgDNjXJkBK9 aa1aQOGeT2WVmBrS/cFuKktifOstcRyTw27nuyjLlRA+6EfkILkXW82ejm/BHqZTmsad mmdg== X-Gm-Message-State: AOJu0Yw9+f6PV2u8NAuHkuQRe5dnKux1FBUZfDrGlCE5Xw7f7vEw9343 nB8fsC9as2OujVKuBCEfAGAckcuk3QKwOJ98Gzc= X-Google-Smtp-Source: AGHT+IH+NcO+XCA1NpBdDkuenAUF8SL7YIrZcg0rXWLaPFFgPLcv+fTn9mUWYVxY3msihL49hKyy3A== X-Received: by 2002:a05:620a:802:b0:77d:9aa8:80f0 with SMTP id s2-20020a05620a080200b0077d9aa880f0mr11282223qks.70.1701198483510; Tue, 28 Nov 2023 11:08:03 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id o4-20020a05620a228400b0077d606bec92sm4695576qkh.108.2023.11.28.11.08.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:03 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:02 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 03/24] pack-bitmap: plug leak in find_objects() Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The `find_objects()` function creates an object_list for any tips of the reachability query which do not have corresponding bitmaps. The object_list is not used outside of `find_objects()`, but we never free it with `object_list_free()`, resulting in a leak. Let's plug that leak by calling `object_list_free()`, which results in t6113 becoming leak-free. Signed-off-by: Taylor Blau --- pack-bitmap.c | 2 ++ t/t6113-rev-list-bitmap-filters.sh | 2 ++ 2 files changed, 4 insertions(+) diff --git a/pack-bitmap.c b/pack-bitmap.c index 0260890341..d2f1306960 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1280,6 +1280,8 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, base = fill_in_bitmap(bitmap_git, revs, base, seen); } + object_list_free(¬_mapped); + return base; } diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh index 86c70521f1..459f0d7412 100755 --- a/t/t6113-rev-list-bitmap-filters.sh +++ b/t/t6113-rev-list-bitmap-filters.sh @@ -4,6 +4,8 @@ test_description='rev-list combining bitmaps and filters' . ./test-lib.sh . "$TEST_DIRECTORY"/lib-bitmap.sh +TEST_PASSES_SANITIZE_LEAK=true + test_expect_success 'set up bitmapped repo' ' # one commit will have bitmaps, the other will not test_commit one && From patchwork Tue Nov 28 19:08:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471564 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="ApWRMuPQ" Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D1AED66 for ; Tue, 28 Nov 2023 11:08:07 -0800 (PST) Received: by mail-qt1-x831.google.com with SMTP id d75a77b69052e-4239f2c1209so23772181cf.2 for ; Tue, 28 Nov 2023 11:08:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198486; x=1701803286; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=NT0SmoFD7rUuFip4tMJQaG/hpI4OvIVp7m3Ixsk9Xg4=; b=ApWRMuPQAyer/5bbnjaoJypJdwP4rZVCEYNfYOSihjZ0C41yFMAE8kG11DCMQIQ9cz xDYkhj7Mpo7rheZix9LzkZpgbZliQbFHaysytXQDRmuUV5694u5vwBg99Jf2PY0gd0ml zHxnK7NxairA18Yrn2GWYwORvFE3CWntmDwfqjnGG4pGU9xxjTzCOIFQGqPj2XymjWup rpY7e912vFxG+PVweD+W4P/8/IIpH8oBkQ+mg5R2K7mklVeNypMQsLdtLqk2KmEpU5v3 UHWHiLUkplngIO9xsmotIgCzDOcypPPKTB/XEuztmZ5bZovIYT8Qwwd15jN8XKQ/UVIi AFwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198486; x=1701803286; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=NT0SmoFD7rUuFip4tMJQaG/hpI4OvIVp7m3Ixsk9Xg4=; b=OJ7RqCAdKVBLyTQuvAVm39gelzugsxeVIydc6fas4fwaRV1xrVf/GZ0V29sLoatCUP ptcaaSzH5ceKu1hKcMPrcVOYpMi2E21bpH6NFQVOlkyIMPfLqKDsZpqsqJxo2RglyjWW IQx3pahvyqmOgwYqCD4DzqD73akvz4kx9Bcp67nCP+KDpjoEhVSkNztg1lJrO+/Y3WDk td5KfKzoKzBI1WShLZW4YABmzdKreVNrQ2mOMUqI6y3of6SbuXMpg9p4LGQRNsV9b2Mk gX2dWLOvRCxxQD0u9JGjyM1PO8YQK/7YkD0Cxs4IO3JuqX8/uVX4AV3Qy9C0+UdvsD87 PGCw== X-Gm-Message-State: AOJu0YzKcOnXHmpqNs1qRwvZrHbCNTp4og0HNOeBMuENLCaIZtpopukn Y/Z0u9NyPGbiDiq4GiDPS/Jv5hdNLTKjiM3AnPs= X-Google-Smtp-Source: AGHT+IHy3qRYO6uvmkojf0Z5ZlkmyoCy5dAya4XPLOLF1R6+1cNkIydlUUfyt/DVF/ZHdtS34MRSUw== X-Received: by 2002:ac8:7dcf:0:b0:41e:1b18:f4a2 with SMTP id c15-20020ac87dcf000000b0041e1b18f4a2mr21781682qte.36.1701198486343; Tue, 28 Nov 2023 11:08:06 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id o1-20020ac84281000000b0041cb8947ed2sm4810354qtl.26.2023.11.28.11.08.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:06 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:05 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 04/24] midx: factor out `fill_pack_info()` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When selecting which packfiles will be written while generating a MIDX, the MIDX internals fill out a 'struct pack_info' with various pieces of book-keeping. Instead of filling out each field of the `pack_info` structure individually in each of the two spots that modify the array of such structures (`ctx->info`), extract a common routine that does this for us. This reduces the code duplication by a modest amount. But more importantly, it zero-initializes the structure before assigning values into it. This hardens us for a future change which will add additional fields to this structure which (until this patch) was not zero-initialized. As a result, any new fields added to the `pack_info` structure need only be updated in a single location, instead of at each spot within midx.c. There are no functional changes in this patch. Signed-off-by: Taylor Blau --- midx.c | 35 +++++++++++++++++++---------------- 1 file changed, 19 insertions(+), 16 deletions(-) diff --git a/midx.c b/midx.c index 3b727dc633..591b3c636e 100644 --- a/midx.c +++ b/midx.c @@ -464,6 +464,17 @@ struct pack_info { unsigned expired : 1; }; +static void fill_pack_info(struct pack_info *info, + struct packed_git *p, char *pack_name, + uint32_t orig_pack_int_id) +{ + memset(info, 0, sizeof(struct pack_info)); + + info->orig_pack_int_id = orig_pack_int_id; + info->pack_name = pack_name; + info->p = p; +} + static int pack_info_compare(const void *_a, const void *_b) { struct pack_info *a = (struct pack_info *)_a; @@ -504,6 +515,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, const char *file_name, void *data) { struct write_midx_context *ctx = data; + struct packed_git *p; if (ends_with(file_name, ".idx")) { display_progress(ctx->progress, ++ctx->pack_paths_checked); @@ -530,17 +542,14 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); - ctx->info[ctx->nr].p = add_packed_git(full_path, - full_path_len, - 0); - - if (!ctx->info[ctx->nr].p) { + p = add_packed_git(full_path, full_path_len, 0); + if (!p) { warning(_("failed to add packfile '%s'"), full_path); return; } - if (open_pack_index(ctx->info[ctx->nr].p)) { + if (open_pack_index(p)) { warning(_("failed to open pack-index '%s'"), full_path); close_pack(ctx->info[ctx->nr].p); @@ -548,9 +557,8 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, return; } - ctx->info[ctx->nr].pack_name = xstrdup(file_name); - ctx->info[ctx->nr].orig_pack_int_id = ctx->nr; - ctx->info[ctx->nr].expired = 0; + fill_pack_info(&ctx->info[ctx->nr], p, xstrdup(file_name), + ctx->nr); ctx->nr++; } } @@ -1310,11 +1318,6 @@ static int write_midx_internal(const char *object_dir, for (i = 0; i < ctx.m->num_packs; i++) { ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); - ctx.info[ctx.nr].orig_pack_int_id = i; - ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]); - ctx.info[ctx.nr].p = ctx.m->packs[i]; - ctx.info[ctx.nr].expired = 0; - if (flags & MIDX_WRITE_REV_INDEX) { /* * If generating a reverse index, need to have @@ -1330,10 +1333,10 @@ static int write_midx_internal(const char *object_dir, if (open_pack_index(ctx.m->packs[i])) die(_("could not open index for %s"), ctx.m->packs[i]->pack_name); - ctx.info[ctx.nr].p = ctx.m->packs[i]; } - ctx.nr++; + fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i], + xstrdup(ctx.m->pack_names[i]), i); } } From patchwork Tue Nov 28 19:08:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471566 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="ipwbJoUW" Received: from mail-qv1-xf2a.google.com (mail-qv1-xf2a.google.com [IPv6:2607:f8b0:4864:20::f2a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B58D1733 for ; Tue, 28 Nov 2023 11:08:10 -0800 (PST) Received: by mail-qv1-xf2a.google.com with SMTP id 6a1803df08f44-67a31b64a9aso18686186d6.0 for ; Tue, 28 Nov 2023 11:08:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198489; x=1701803289; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=kQrr3leSKb3XekPDx8zb8zAgnsSrKl75AYfTDMuWnS4=; b=ipwbJoUWzEwrOrW9ntwohoIosDg3g5kPQSd+7ytOVzOpidg7ujN4GcdCn2pWhviI1a DZ/ViKExk3D3jCZtts44rpLoYR+OV2U7Sicaxe7wvA1lFsBN+D1CDKewlMNixkr8asBz Zcwc2gybwEsU5OkADv3YJAIg2KKVKaVNteFh6OPXAfoSEguzVr1e5Vov03gORzqHg5LV 2q6FZr8mXz/7MY5hRbMPXz7z8iT0Yscdsnb8hlS4VmJ6bUrBNuRXQb3tX9WjBouVwNPj yd8Axq0pfW9bd5i8ISLhbueD+zpMwOvw+Lc0gqn/fluWrHfFfGrPCrEceLYIKc0vdzgW BrYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198489; x=1701803289; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=kQrr3leSKb3XekPDx8zb8zAgnsSrKl75AYfTDMuWnS4=; b=onKBTarpDXfJKOw5N/wvChXZeT+stMDaAmKxHtLYL6Ret9V2YUfrmOuF0stU0ry4w2 6bvwl/a2uyU5h+7j7FKZ2Zk7sNewZxevk8SqoyPXYr2YWi1OMyB5T1ygBMUqtpAOCJXu YBybdmZBRiEFHnSCKHtdJsRx88/xTaEz0fSTgttPO3NsbeBj7gN4xTMxwJLgbStPUUcu ItQ95OYDshOUGQmqnKWDfT+ezojP9O05jDkIK5f2ayJEwV8ps6wApkcycSTzCJAVdJAd 4JgrNqN7+746pOVoYj+IfYdtyFpzYrwjKNdAY1U2axYzph21nLStCZpWTyEsJZheleGh VRRg== X-Gm-Message-State: AOJu0YxMp7/ojrMAX0009TqGJ3IARNSpp5p9Bo+HsBy6uuPPBPY9KCjo QBzgvZOmimEIlXos95D2shTR99LPZQX6+elk/14= X-Google-Smtp-Source: AGHT+IHk1G9co26ZDkBXbg9V9CRi5iWY7HM4cymzGVOJNw8+X39/z/MP3MUEIXm7gJtHpznZnqECbw== X-Received: by 2002:a05:6214:86:b0:67a:2b0b:c591 with SMTP id n6-20020a056214008600b0067a2b0bc591mr13129514qvr.25.1701198489111; Tue, 28 Nov 2023 11:08:09 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id e3-20020ad44183000000b0067a2637e3e9sm3655739qvp.121.2023.11.28.11.08.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:08 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:08 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 05/24] midx: implement `DISP` chunk Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When a multi-pack bitmap is used to implement verbatim pack reuse (that is, when verbatim chunks from an on-disk packfile are copied directly[^1]), it does so by using its "preferred pack" as the source for pack-reuse. This allows repositories to pack the majority of their objects into a single (often large) pack, and then use it as the single source for verbatim pack reuse. This increases the amount of objects that are reused verbatim (and consequently, decrease the amount of time it takes to generate many packs). But this performance comes at a cost, which is that the preferred packfile must pace its growth with that of the entire repository in order to maintain the utility of verbatim pack reuse. As repositories grow beyond what we can reasonably store in a single packfile, the utility of verbatim pack reuse diminishes. Or, at the very least, it becomes increasingly more expensive to maintain as the pack grows larger and larger. It would be beneficial to be able to perform this same optimization over multiple packs, provided some modest constraints (most importantly, that the set of packs eligible for verbatim reuse are disjoint with respect to the objects that they contain). If we assume that the packs which we treat as candidates for verbatim reuse are disjoint with respect to their objects, we need to make only modest modifications to the verbatim pack-reuse code itself. Most notably, we need to remove the assumption that the bits in the reachability bitmap corresponding to objects from the single reuse pack begin at the first bit position. Future patches will unwind these assumptions and reimplement their existing functionality as special cases of the more general assumptions (e.g. that reuse bits can start anywhere within the bitset, but happen to start at 0 for all existing cases). This patch does not yet relax any of those assumptions. Instead, it implements a foundational data-structure, the "Disjoint Packs" (`DISP`) chunk of the multi-pack index. The `DISP` chunk's contents are described in detail here. Importantly, the `DISP` chunk contains information to map regions of a multi-pack index's reachability bitmap to the packs whose objects they represent. For now, this chunk is only written, not read (outside of the test-tool used in this patch to test the new chunk's behavior). Future patches will begin to make use of this new chunk. This patch implements reading (though no callers outside of the above one perform any reading) and writing this new chunk. It also extends the `--stdin-packs` format used by the `git multi-pack-index write` builtin to be able to designate that a given pack is to be marked as "disjoint" by prefixing it with a '+' character. [^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the pack that wasn't used verbatim. Signed-off-by: Taylor Blau --- Documentation/git-multi-pack-index.txt | 4 + Documentation/gitformat-pack.txt | 109 +++++++++++++++++++++++ builtin/multi-pack-index.c | 10 ++- midx.c | 116 ++++++++++++++++++++++--- midx.h | 5 ++ pack-bitmap.h | 9 ++ t/helper/test-read-midx.c | 31 ++++++- t/t5319-multi-pack-index.sh | 58 +++++++++++++ 8 files changed, 325 insertions(+), 17 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 3696506eb3..d130e65b28 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -49,6 +49,10 @@ write:: --stdin-packs:: Write a multi-pack index containing only the set of line-delimited pack index basenames provided over stdin. + Lines beginning with a '+' character (followed by the + pack index basename as before) have their pack marked as + "disjoint". See the "`DISP` chunk and disjoint packs" + section in linkgit:gitformat-pack[5] for more. --refs-snapshot=:: With `--bitmap`, optionally specify a file which diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt index 9fcb29a9c8..658682ddd5 100644 --- a/Documentation/gitformat-pack.txt +++ b/Documentation/gitformat-pack.txt @@ -396,6 +396,22 @@ CHUNK DATA: is padded at the end with between 0 and 3 NUL bytes to make the chunk size a multiple of 4 bytes. + Disjoint Packfiles (ID: {'D', 'I', 'S', 'P'}) + Stores a table of three 4-byte unsigned integers in network order. + Each table entry corresponds to a single pack (in the order that + they appear above in the `PNAM` chunk). The values for each table + entry are as follows: + - The first bit position (in psuedo-pack order, see below) to + contain an object from that pack. + - The number of bits whose objects are selected from that pack. + - A "meta" value, whose least-significant bit indicates whether or + not the pack is disjoint with respect to other packs. The + remaining bits are unused. + Two packs are "disjoint" with respect to one another when they have + disjoint sets of objects. In other words, any object found in a pack + contained in the set of disjoint packfiles is guaranteed to be + uniquely located among those packs. + OID Fanout (ID: {'O', 'I', 'D', 'F'}) The ith entry, F[i], stores the number of OIDs with first byte at most i. Thus F[255] stores the total @@ -509,6 +525,99 @@ packs arranged in MIDX order (with the preferred pack coming first). The MIDX's reverse index is stored in the optional 'RIDX' chunk within the MIDX itself. +=== `DISP` chunk and disjoint packs + +The Disjoint Packfiles (`DISP`) chunk encodes additional information +about the objects in the multi-pack index's reachability bitmap. Recall +that objects from the MIDX are arranged in "pseudo-pack" order (see: +above) for reachability bitmaps. + +From the example above, suppose we have packs "a", "b", and "c", with +10, 15, and 20 objects, respectively. In pseudo-pack order, those would +be arranged as follows: + + |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19| + +When working with single-pack bitmaps (or, equivalently, multi-pack +reachability bitmaps without any packs marked as disjoint), +linkgit:git-pack-objects[1] performs ``verbatim'' reuse, attempting to +reuse chunks of the existing packfile instead of adding objects to the +packing list. + +When a chunk of bytes are reused from an existing pack, any objects +contained therein do not need to be added to the packing list, saving +memory and CPU time. But a chunk from an existing packfile can only be +reused when the following conditions are met: + + - The chunk contains only objects which were requested by the caller + (i.e. does not contain any objects which the caller didn't ask for + explicitly or implicitly). + + - All objects stored as offset- or reference-deltas also include their + base object in the resulting pack. + +Additionally, packfiles many not contain more than one copy of any given +object. This introduces an additional constraint over the set of packs +we may want to reuse. The most straightforward approach is to mandate +that the set of packs is disjoint with respect to the set of objects +contained in each pack. In other words, for each object `o` in the union +of all objects stored by the disjoint set of packs, `o` is contained in +exactly one pack from the disjoint set. + +One alternative design choice for multi-pack reuse might instead involve +imposing a chunk-level constraint that allows packs in the reusable set +to contain multiple copies across different packs, but restricts each +chunk against including more than one copy of such an object. This is in +theory possible to implement, but significantly more complicated than +forcing packs themselves to be disjoint. Most notably, we would have to +keep track of which objects have already been sent during verbatim +pack-reuse, defeating the main purpose of verbatim pack reuse (that we +don't have to keep track of individual objects). + +The `DISP` chunk encodes the necessary information in order to implement +multi-pack reuse over a disjoint set of packs as described above. +Specifically, the `DISP` chunk encodes three pieces of information (all +32-bit unsigned integers in network byte-order) for each packfile `p` +that is stored in the MIDX, as follows: + +`bitmap_pos`:: The first bit position (in pseudo-pack order) in the + multi-pack index's reachability bitmap occupied by an object from `p`. + +`bitmap_nr`:: The number of bit positions (including the one at + `bitmap_pos`) that encode objects from that pack `p`. + +`disjoint`:: Metadata, including whether or not the pack `p` is + ``disjoint''. The least significant bit stores whether or not the pack + is disjoint. The remaining bits are reserved for future use. + +For example, the `DISP` chunk corresponding to the above example (with +packs ``a'', ``b'', and ``c'') would look like: + +[cols="1,2,2,2"] +|=== +| |`bitmap_pos` |`bitmap_nr` |`disjoint` + +|packfile ``a'' +|`0` +|`10` +|`0x1` + +|packfile ``b'' +|`10` +|`15` +|`0x1` + +|packfile ``c'' +|`25` +|`20` +|`0x1` +|=== + +With these constraints and information in place, we can treat each +packfile marked as disjoint as individually reusable in the same fashion +as verbatim pack reuse is performed on individual packs prior to the +implementation of the `DISP` chunk. + == cruft packs The cruft packs feature offer an alternative to Git's traditional mechanism of diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index a72aebecaa..0f1dd4651d 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -106,11 +106,17 @@ static int git_multi_pack_index_write_config(const char *var, const char *value, return 0; } +#define DISJOINT ((void*)(uintptr_t)1) + static void read_packs_from_stdin(struct string_list *to) { struct strbuf buf = STRBUF_INIT; - while (strbuf_getline(&buf, stdin) != EOF) - string_list_append(to, buf.buf); + while (strbuf_getline(&buf, stdin) != EOF) { + if (*buf.buf == '+') + string_list_append(to, buf.buf + 1)->util = DISJOINT; + else + string_list_append(to, buf.buf); + } string_list_sort(to); strbuf_release(&buf); diff --git a/midx.c b/midx.c index 591b3c636e..f55020072f 100644 --- a/midx.c +++ b/midx.c @@ -33,6 +33,7 @@ #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKID_DISJOINTPACKS 0x44495350 /* "DISP" */ #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */ @@ -182,6 +183,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets, &m->chunk_large_offsets_len); + pair_chunk(cf, MIDX_CHUNKID_DISJOINTPACKS, + (const unsigned char **)&m->chunk_disjoint_packs, + &m->chunk_disjoint_packs_len); if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1)) pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex, @@ -275,6 +279,23 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t return 0; } +int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m, + struct bitmapped_pack *bp, uint32_t pack_int_id) +{ + if (!m->chunk_disjoint_packs) + return error(_("MIDX does not contain the DISP chunk")); + + if (prepare_midx_pack(r, m, pack_int_id)) + return error(_("could not load disjoint pack %"PRIu32), pack_int_id); + + bp->p = m->packs[pack_int_id]; + bp->bitmap_pos = get_be32(m->chunk_disjoint_packs + 3 * pack_int_id); + bp->bitmap_nr = get_be32(m->chunk_disjoint_packs + 3 * pack_int_id + 1); + bp->disjoint = !!get_be32(m->chunk_disjoint_packs + 3 * pack_int_id + 2); + + return 0; +} + int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result) { return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup, @@ -457,11 +478,18 @@ static size_t write_midx_header(struct hashfile *f, return MIDX_HEADER_SIZE; } +#define BITMAP_POS_UNKNOWN (~((uint32_t)0)) + struct pack_info { uint32_t orig_pack_int_id; char *pack_name; struct packed_git *p; - unsigned expired : 1; + + uint32_t bitmap_pos; + uint32_t bitmap_nr; + + unsigned expired : 1, + disjoint : 1; }; static void fill_pack_info(struct pack_info *info, @@ -473,6 +501,7 @@ static void fill_pack_info(struct pack_info *info, info->orig_pack_int_id = orig_pack_int_id; info->pack_name = pack_name; info->p = p; + info->bitmap_pos = BITMAP_POS_UNKNOWN; } static int pack_info_compare(const void *_a, const void *_b) @@ -516,6 +545,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, { struct write_midx_context *ctx = data; struct packed_git *p; + struct string_list_item *item = NULL; if (ends_with(file_name, ".idx")) { display_progress(ctx->progress, ++ctx->pack_paths_checked); @@ -534,11 +564,13 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, * should be performed independently (likely checking * to_include before the existing MIDX). */ - if (ctx->m && midx_contains_pack(ctx->m, file_name)) - return; - else if (ctx->to_include && - !string_list_has_string(ctx->to_include, file_name)) + if (ctx->m && midx_contains_pack(ctx->m, file_name)) { return; + } else if (ctx->to_include) { + item = string_list_lookup(ctx->to_include, file_name); + if (!item) + return; + } ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); @@ -559,6 +591,8 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, fill_pack_info(&ctx->info[ctx->nr], p, xstrdup(file_name), ctx->nr); + if (item) + ctx->info[ctx->nr].disjoint = !!item->util; ctx->nr++; } } @@ -568,7 +602,8 @@ struct pack_midx_entry { uint32_t pack_int_id; time_t pack_mtime; uint64_t offset; - unsigned preferred : 1; + unsigned preferred : 1, + disjoint : 1; }; static int midx_oid_compare(const void *_a, const void *_b) @@ -586,6 +621,12 @@ static int midx_oid_compare(const void *_a, const void *_b) if (a->preferred < b->preferred) return 1; + /* Sort objects in a disjoint pack last when multiple copies exist. */ + if (a->disjoint < b->disjoint) + return -1; + if (a->disjoint > b->disjoint) + return 1; + if (a->pack_mtime > b->pack_mtime) return -1; else if (a->pack_mtime < b->pack_mtime) @@ -671,6 +712,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, &fanout->entries[fanout->nr], cur_object); fanout->entries[fanout->nr].preferred = 0; + fanout->entries[fanout->nr].disjoint = 0; fanout->nr++; } } @@ -696,6 +738,7 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout, cur_object, &fanout->entries[fanout->nr], preferred); + fanout->entries[fanout->nr].disjoint = info->disjoint; fanout->nr++; } } @@ -764,14 +807,22 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, * Take only the first duplicate. */ for (cur_object = 0; cur_object < fanout.nr; cur_object++) { - if (cur_object && oideq(&fanout.entries[cur_object - 1].oid, - &fanout.entries[cur_object].oid)) - continue; + struct pack_midx_entry *ours = &fanout.entries[cur_object]; + if (cur_object) { + struct pack_midx_entry *prev = &fanout.entries[cur_object - 1]; + if (oideq(&prev->oid, &ours->oid)) { + if (prev->disjoint && ours->disjoint) + die(_("duplicate object '%s' among disjoint packs '%s', '%s'"), + oid_to_hex(&prev->oid), + info[prev->pack_int_id].pack_name, + info[ours->pack_int_id].pack_name); + continue; + } + } ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1), alloc_objects); - memcpy(&deduplicated_entries[*nr_objects], - &fanout.entries[cur_object], + memcpy(&deduplicated_entries[*nr_objects], ours, sizeof(struct pack_midx_entry)); (*nr_objects)++; } @@ -814,6 +865,27 @@ static int write_midx_pack_names(struct hashfile *f, void *data) return 0; } +static int write_midx_disjoint_packs(struct hashfile *f, void *data) +{ + struct write_midx_context *ctx = data; + size_t i; + + for (i = 0; i < ctx->nr; i++) { + struct pack_info *pack = &ctx->info[i]; + if (pack->expired) + continue; + + if (pack->bitmap_pos == BITMAP_POS_UNKNOWN && pack->bitmap_nr) + BUG("pack '%s' has no bitmap position, but has %d bitmapped object(s)", + pack->pack_name, pack->bitmap_nr); + + hashwrite_be32(f, pack->bitmap_pos); + hashwrite_be32(f, pack->bitmap_nr); + hashwrite_be32(f, !!pack->disjoint); + } + return 0; +} + static int write_midx_oid_fanout(struct hashfile *f, void *data) { @@ -981,8 +1053,19 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx) QSORT(data, ctx->entries_nr, midx_pack_order_cmp); ALLOC_ARRAY(pack_order, ctx->entries_nr); - for (i = 0; i < ctx->entries_nr; i++) + for (i = 0; i < ctx->entries_nr; i++) { + struct pack_midx_entry *e = &ctx->entries[data[i].nr]; + struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]]; + if (pack->bitmap_pos == BITMAP_POS_UNKNOWN) + pack->bitmap_pos = i; + pack->bitmap_nr++; pack_order[i] = data[i].nr; + } + for (i = 0; i < ctx->nr; i++) { + struct pack_info *pack = &ctx->info[ctx->pack_perm[i]]; + if (pack->bitmap_pos == BITMAP_POS_UNKNOWN) + pack->bitmap_pos = 0; + } free(data); trace2_region_leave("midx", "midx_pack_order", the_repository); @@ -1283,6 +1366,7 @@ static int write_midx_internal(const char *object_dir, struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; + int pack_disjoint_concat_len = 0; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -1495,8 +1579,10 @@ static int write_midx_internal(const char *object_dir, } for (i = 0; i < ctx.nr; i++) { - if (!ctx.info[i].expired) - pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; + if (ctx.info[i].expired) + continue; + pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; + pack_disjoint_concat_len += 3 * sizeof(uint32_t); } /* Check that the preferred pack wasn't expired (if given). */ @@ -1556,6 +1642,8 @@ static int write_midx_internal(const char *object_dir, add_chunk(cf, MIDX_CHUNKID_REVINDEX, st_mult(ctx.entries_nr, sizeof(uint32_t)), write_midx_revindex); + add_chunk(cf, MIDX_CHUNKID_DISJOINTPACKS, + pack_disjoint_concat_len, write_midx_disjoint_packs); } write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); diff --git a/midx.h b/midx.h index a5d98919c8..cdd16a8378 100644 --- a/midx.h +++ b/midx.h @@ -7,6 +7,7 @@ struct object_id; struct pack_entry; struct repository; +struct bitmapped_pack; #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX" #define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \ @@ -33,6 +34,8 @@ struct multi_pack_index { const unsigned char *chunk_pack_names; size_t chunk_pack_names_len; + const uint32_t *chunk_disjoint_packs; + size_t chunk_disjoint_packs_len; const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; const unsigned char *chunk_object_offsets; @@ -58,6 +61,8 @@ void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m); struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local); int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id); +int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m, + struct bitmapped_pack *bp, uint32_t pack_int_id); int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos); uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos); diff --git a/pack-bitmap.h b/pack-bitmap.h index 5273a6a019..b7fa1a42a9 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -52,6 +52,15 @@ typedef int (*show_reachable_fn)( struct bitmap_index; +struct bitmapped_pack { + struct packed_git *p; + + uint32_t bitmap_pos; + uint32_t bitmap_nr; + + unsigned disjoint : 1; +}; + struct bitmap_index *prepare_bitmap_git(struct repository *r); struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx); void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits, diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index e9a444ddba..4b44995dca 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -100,10 +100,37 @@ static int read_midx_preferred_pack(const char *object_dir) return 0; } +static int read_midx_bitmapped_packs(const char *object_dir) +{ + struct multi_pack_index *midx = NULL; + struct bitmapped_pack pack; + uint32_t i; + + setup_git_directory(); + + midx = load_multi_pack_index(object_dir, 1); + if (!midx) + return 1; + + for (i = 0; i < midx->num_packs; i++) { + if (nth_bitmapped_pack(the_repository, midx, &pack, i) < 0) + return 1; + + printf("%s\n", pack_basename(pack.p)); + printf(" bitmap_pos: %"PRIuMAX"\n", (uintmax_t)pack.bitmap_pos); + printf(" bitmap_nr: %"PRIuMAX"\n", (uintmax_t)pack.bitmap_nr); + printf(" disjoint: %s\n", pack.disjoint & 0x1 ? "yes" : "no"); + } + + close_midx(midx); + + return 0; +} + int cmd__read_midx(int argc, const char **argv) { if (!(argc == 2 || argc == 3)) - usage("read-midx [--show-objects|--checksum|--preferred-pack] "); + usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] "); if (!strcmp(argv[1], "--show-objects")) return read_midx_file(argv[2], 1); @@ -111,5 +138,7 @@ int cmd__read_midx(int argc, const char **argv) return read_midx_checksum(argv[2]); else if (!strcmp(argv[1], "--preferred-pack")) return read_midx_preferred_pack(argv[2]); + else if (!strcmp(argv[1], "--bitmap")) + return read_midx_bitmapped_packs(argv[2]); return read_midx_file(argv[1], 0); } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index c4c6060cee..fd24e0c952 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -1157,4 +1157,62 @@ test_expect_success 'reader notices too-small revindex chunk' ' test_cmp expect.err err ' +test_expect_success 'disjoint packs are stored via the DISP chunk' ' + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + for i in 1 2 3 4 5 + do + test_commit "$i" && + git repack -d || return 1 + done && + + find $objdir/pack -type f -name "*.idx" | xargs -n 1 basename | sort >packs && + + git multi-pack-index write --stdin-packs err && + cat >expect <<-\EOF && + error: MIDX does not contain the DISP chunk + EOF + test_cmp expect err && + + sed -e "s/^/+/g" packs >in && + git multi-pack-index write --stdin-packs --bitmap \ + --preferred-pack="$(head -n1 actual && + for i in $(test_seq $(wc -l expect && + test_cmp expect actual + ) +' + +test_expect_success 'non-disjoint packs are detected' ' + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + test_commit base && + git repack -d && + test_commit other && + git repack -a && + + ls -la .git/objects/pack/ && + + find $objdir/pack -type f -name "*.idx" | + sed -e "s/.*\/\(.*\)$/+\1/g" >in && + + test_must_fail git multi-pack-index write --stdin-packs \ + --bitmap err && + grep "duplicate object.* among disjoint packs" err + ) +' + test_done From patchwork Tue Nov 28 19:08:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471565 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="Tl5dz+v6" Received: from mail-qk1-x72c.google.com (mail-qk1-x72c.google.com [IPv6:2607:f8b0:4864:20::72c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0985410EC for ; Tue, 28 Nov 2023 11:08:12 -0800 (PST) Received: by mail-qk1-x72c.google.com with SMTP id af79cd13be357-778ac9c898dso276704385a.0 for ; Tue, 28 Nov 2023 11:08:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198492; x=1701803292; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=GEDtWcgKkKFMfj4d8K6Y4PlGFqXS3yAeCGaW38EJJH8=; b=Tl5dz+v6fpKCwd1IHrvwBkP9bGNQHH9lKBnlUWK2zQ7WlGaVzsWMrkejoEXlPU4lZA tL7Y5o23O4vij/707OyScX4afkvZnvjhvl6R6PVrFOiIqq9UknyQOtNzHEqShZGc90dV MtfqhHU3nlA5wmklVP5Vk+lAxqEkQUS6uuYaCU4QHFo4i9WBxSSiWd1DWwqyPfk/iib3 Uc1L3VGaLgPSNeiitduuvrh8AGC1XDVxUADy3URP/DHGSn2LpcDVdDko3oDuPgimgG/b 2ubKuHQ8Na2pIWeyDl1xWYGxuyHeuOEG/lteoBGOtIBxU5Vr1vY/dJdei3stOFyMXmpz hqBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198492; x=1701803292; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=GEDtWcgKkKFMfj4d8K6Y4PlGFqXS3yAeCGaW38EJJH8=; b=PVlNvXA5JUyTcxfCTCLzkzU5gFODUFBwzsB54SKdXOQkfQXMAFWSw8E9ykejRrjI/U jz3PGu6p/dwyW1cn6HEAneFNt2GJCBiek1lJsJaO/aRNT6GYavRmgKcxG8ZR6XSzT9Gt nKsjPeDcWsbri1x03QLD9Lsyx48Un3LRD/1/C1NtcicdFMWWcJ1/+pbw7hx1iTUdD2O8 JBdvjmnMFfassBBOxkl5VLSCHACVeHXi0BcN9XQ/xn4piFtmK/T+pme5lXM0CSpl94Mu dn1qu5EzgOXem1xhqYBV2DAkap5M2mTrQp4XDsGa7Cuv/QCpnlUsN1+OV7PxLuKXdo+q fpKA== X-Gm-Message-State: AOJu0YwMNB+9p7daGONd88lFofeA3DU4oSfm3EaQPDfzvpKi9TYF10TX GxJvDbtCatwLfmNANrHuoEbxyepnQHo4s8L/0T4= X-Google-Smtp-Source: AGHT+IH6H6DgOw5Y7jLzNCRRfxul3ern0EAsO3FLTSo/OfWTr1neQ05GtUEUSReRHcUz6ohtIn+Isw== X-Received: by 2002:a05:620a:1992:b0:77d:bbb7:4690 with SMTP id bm18-20020a05620a199200b0077dbbb74690mr4263950qkb.12.1701198491804; Tue, 28 Nov 2023 11:08:11 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id dt32-20020a05620a47a000b0077d71d46887sm4702414qkb.112.2023.11.28.11.08.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:11 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:10 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 06/24] midx: implement `midx_locate_pack()` Message-ID: <541fbb442b78942dd6319821a340b2298c836138.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The multi-pack index API exposes a `midx_contains_pack()` function that takes in a string ending in either ".idx" or ".pack" and returns whether or not the MIDX contains a given pack corresponding to that string. There is no corresponding function to locate the position of a pack within the MIDX's pack order (sorted lexically by pack filename). We could add an optional out parameter to `midx_contains_pack()` that is filled out with the pack's position when the parameter is non-NULL. To minimize the amount of fallout from this change, instead introduce a new function by renaming `midx_contains_pack()` to `midx_locate_pack()`, adding that output parameter, and then reimplementing `midx_contains_pack()` in terms of it. Future patches will make use of this new function. Signed-off-by: Taylor Blau --- midx.c | 13 +++++++++++-- midx.h | 5 ++++- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/midx.c b/midx.c index f55020072f..65ba0c70fe 100644 --- a/midx.c +++ b/midx.c @@ -413,7 +413,8 @@ static int cmp_idx_or_pack_name(const char *idx_or_pack_name, return strcmp(idx_or_pack_name, idx_name); } -int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) +int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name, + uint32_t *pos) { uint32_t first = 0, last = m->num_packs; @@ -424,8 +425,11 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) current = m->pack_names[mid]; cmp = cmp_idx_or_pack_name(idx_or_pack_name, current); - if (!cmp) + if (!cmp) { + if (pos) + *pos = mid; return 1; + } if (cmp > 0) { first = mid + 1; continue; @@ -436,6 +440,11 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) return 0; } +int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) +{ + return midx_locate_pack(m, idx_or_pack_name, NULL); +} + int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local) { struct multi_pack_index *m; diff --git a/midx.h b/midx.h index cdd16a8378..a6e969c2ea 100644 --- a/midx.h +++ b/midx.h @@ -70,7 +70,10 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid, struct multi_pack_index *m, uint32_t n); int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m); -int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name); +int midx_contains_pack(struct multi_pack_index *m, + const char *idx_or_pack_name); +int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name, + uint32_t *pos); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); /* From patchwork Tue Nov 28 19:08:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471567 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="XnNqJjCj" Received: from mail-qk1-x72c.google.com (mail-qk1-x72c.google.com [IPv6:2607:f8b0:4864:20::72c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4AF6C1733 for ; Tue, 28 Nov 2023 11:08:15 -0800 (PST) Received: by mail-qk1-x72c.google.com with SMTP id af79cd13be357-77dc404d926so40329785a.2 for ; Tue, 28 Nov 2023 11:08:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198494; x=1701803294; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=EX8bDPgtUd9ApCe6VVDMhUFO7Vc2E735OK+M34xcrFw=; b=XnNqJjCj0cNFZwyJrS28rAfuJmSL624QDMfytSj04OTOoiCNbeXqDwsX0HXA3BSfqZ kiiiNtsFD629j9MYZ9ERAqQWhzs1MiJIxwFhTaJAdnWJKawieg+t5/Zxi57mhLIEoxhy fvhIH5nww0RB0IiZlFh5+4J35X3KhgUvseMZx8TFLhsTkBCCahAHK9DMyT0HZraNoN96 +tIMiamTsqfUdM8ZgnfDoeNu6yrGgojqAKGlgnMHHqAn79G/lzwGOCQ7rs43fXH88bei MnBh1k0xlshiA7Fp9ItKgISywW2spzY/uU8S4o3uCKYjsm94/sdrHKrDdhYH7JScQmYT ZFuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198494; x=1701803294; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=EX8bDPgtUd9ApCe6VVDMhUFO7Vc2E735OK+M34xcrFw=; b=XGzXvwbMfOyAtyPwPMOqBXBEtnz4axQ27WHh2z2cZwTEFauHdDvnMF7uzB4+fpCvh8 T0MWTIAZRiJOte8tSU4EaIRJ1DbV6lFh2rlxYtSQxLtZya7C8PP7U3HnP4GsBDe64Ms5 oQ1PSNvSAiqAsr4L75V3lUrHdWlurBhkPW+Slna/EGCT6RzWiVu0palttbShQptN69Oj WHPNaC3hyowhsiepQK9g12C0LIGARImu+FD1vWMyzpZqoThorRtWV63PP+hFnwYrcnN/ H0Nko40ukXbVqgrsIYNaGXsTNKIE5fCEnmW41x4vZEgfOUqpAOzzOl3lkbYBsLLiVKYe r0Qg== X-Gm-Message-State: AOJu0Yx8liD10RHY0wqNQjpO9yLzXMSUUkrDgkDDpQE5SU55HTsXGMJy uHkM2ppnKO1LAaeeSsh6Xlj+ycCJnJsWAf5+7Sg= X-Google-Smtp-Source: AGHT+IHajgjVZCVwr941lGjjnUvUmlyMpYvVwBORCUNvSvfL/84ma088on1D16qLi0WCrVrRqnY+Xw== X-Received: by 2002:a05:620a:a4e:b0:76e:ef17:d37e with SMTP id j14-20020a05620a0a4e00b0076eef17d37emr16183541qka.71.1701198494490; Tue, 28 Nov 2023 11:08:14 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id f12-20020a05620a15ac00b0077dc395df88sm472205qkk.32.2023.11.28.11.08.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:14 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:13 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 07/24] midx: implement `--retain-disjoint` mode Message-ID: <3019738b52ba8cd78ea696a3b800fa91e722eb66.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Once multi-pack reachability bitmaps learn how to perform pack reuse over the set of disjoint packs, we will want to teach `git repack` to evolve the set of disjoint packs over time. To evolve the set of disjoint packs means any new packs made by `repack` should be disjoint with respect to the existing set of disjoint packs so as to be able to join that set when updating the multi-pack index. The details of generating such packs will be left to future commits. But any new pack(s) created by repack as disjoint will be marked as such by passing them over `--stdin-packs` with the special '+' marker when generating a new MIDX. This patch, however, addresses the question of how we retain the existing set of disjoint packs when updating the multi-pack index. One option would be for `repack` to keep track of the set of disjoint packs itself by querying the MIDX, and then adding the special '+' marker appropriately when generating the input for `--stdin-packs`. But this is verbose and error-prone, since two different parts of Git would need to maintain the same notion of the set of disjoint packs. When one disagrees with the other, the set of so-called disjoint packs may actually contain two or more packs which have one or more object(s) in common, making the set non-disjoint. Instead, introduce a `--retain-disjoint` mode for the `git multi-pack-index write` sub-command which keeps any packs which are: - marked as disjoint in the existing MIDX, and - not deleted (e.g., they are not excluded from the input for `--stdin-packs`). This will allow the `repack` command to not have to keep track of the set of currently-disjoint packs itself, reducing the number of lines of code necessary to implement this feature, and making the resulting implementation less error-prone. Signed-off-by: Taylor Blau --- Documentation/git-multi-pack-index.txt | 8 +++ builtin/multi-pack-index.c | 3 + midx.c | 49 +++++++++++++++ midx.h | 1 + t/lib-disjoint.sh | 38 ++++++++++++ t/t5319-multi-pack-index.sh | 82 ++++++++++++++++++++++++++ 6 files changed, 181 insertions(+) create mode 100644 t/lib-disjoint.sh diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index d130e65b28..ac0c7b124b 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -54,6 +54,14 @@ write:: "disjoint". See the "`DISP` chunk and disjoint packs" section in linkgit:gitformat-pack[5] for more. + --retain-disjoint:: + When writing a multi-pack index with a reachability + bitmap, keep any packs marked as disjoint in the + existing MIDX (if any) as such in the new MIDX. Existing + disjoint packs which are removed (e.g., not listed via + `--stdin-packs`) are ignored. This option works in + addition to the '+' marker for `--stdin-packs`. + --refs-snapshot=:: With `--bitmap`, optionally specify a file which contains a "refs snapshot" taken prior to repacking. diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 0f1dd4651d..dcfabf2626 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -138,6 +138,9 @@ static int cmd_multi_pack_index_write(int argc, const char **argv, N_("write multi-pack index containing only given indexes")), OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot, N_("refs snapshot for selecting bitmap commits")), + OPT_BIT(0, "retain-disjoint", &opts.flags, + N_("retain non-deleted disjoint packs"), + MIDX_WRITE_RETAIN_DISJOINT), OPT_END(), }; diff --git a/midx.c b/midx.c index 65ba0c70fe..ce67da9f85 100644 --- a/midx.c +++ b/midx.c @@ -721,6 +721,12 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, &fanout->entries[fanout->nr], cur_object); fanout->entries[fanout->nr].preferred = 0; + /* + * It's OK to set disjoint to 0 here, even with + * `--retain-disjoint`, since we will always see the disjoint + * copy of some object below in get_sorted_entries(), causing us + * to die(). + */ fanout->entries[fanout->nr].disjoint = 0; fanout->nr++; } @@ -1362,6 +1368,37 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r, return result; } +static int midx_retain_existing_disjoint(struct repository *r, + struct multi_pack_index *from, + struct write_midx_context *ctx) +{ + struct bitmapped_pack bp; + uint32_t i, midx_pos; + + for (i = 0; i < ctx->nr; i++) { + struct pack_info *info = &ctx->info[i]; + /* + * Having to call `midx_locate_pack()` in a loop is + * sub-optimal, since it is O(n*log(n)) in the number + * of packs. + * + * When reusing an existing MIDX, we know that the first + * 'n' packs appear in the same order, so we could avoid + * this when reusing an existing MIDX. But we may be + * instead relying on the order given to us by + * for_each_file_in_pack_dir(), in which case we can't + * make any such guarantees. + */ + if (!midx_locate_pack(from, info->pack_name, &midx_pos)) + continue; + + if (nth_bitmapped_pack(r, from, &bp, midx_pos) < 0) + return -1; + info->disjoint = bp.disjoint; + } + return 0; +} + static int write_midx_internal(const char *object_dir, struct string_list *packs_to_include, struct string_list *packs_to_drop, @@ -1444,6 +1481,18 @@ static int write_midx_internal(const char *object_dir, for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx); stop_progress(&ctx.progress); + if (flags & MIDX_WRITE_RETAIN_DISJOINT) { + struct multi_pack_index *m = ctx.m; + if (!m) + m = lookup_multi_pack_index(the_repository, object_dir); + + if (m) { + result = midx_retain_existing_disjoint(the_repository, m, &ctx); + if (result) + goto cleanup; + } + } + if ((ctx.m && ctx.nr == ctx.m->num_packs) && !(packs_to_include || packs_to_drop)) { struct bitmap_index *bitmap_git; diff --git a/midx.h b/midx.h index a6e969c2ea..d7ce52ff7b 100644 --- a/midx.h +++ b/midx.h @@ -54,6 +54,7 @@ struct multi_pack_index { #define MIDX_WRITE_BITMAP (1 << 2) #define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3) #define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4) +#define MIDX_WRITE_RETAIN_DISJOINT (1 << 5) const unsigned char *get_midx_checksum(struct multi_pack_index *m); void get_midx_filename(struct strbuf *out, const char *object_dir); diff --git a/t/lib-disjoint.sh b/t/lib-disjoint.sh new file mode 100644 index 0000000000..c6c6e74aba --- /dev/null +++ b/t/lib-disjoint.sh @@ -0,0 +1,38 @@ +# Helpers for scripts testing disjoint packs; see t5319 for example usage. + +objdir=.git/objects + +test_disjoint_1 () { + local pack="$1" + local want="$2" + + test-tool read-midx --bitmap $objdir >out && + grep -A 3 "$pack" out >found && + + if ! test -s found + then + echo >&2 "could not find '$pack' in MIDX" + return 1 + fi + + if ! grep -q "disjoint: $want" found + then + echo >&2 "incorrect disjoint state for pack '$pack'" + return 1 + fi + return 0 +} + +# test_must_be_disjoint +# +# Ensures that the given pack is marked as disjoint. +test_must_be_disjoint () { + test_disjoint_1 "$1" "yes" +} + +# test_must_not_be_disjoint +# +# Ensures that the given pack is not marked as disjoint. +test_must_not_be_disjoint () { + test_disjoint_1 "$1" "no" +} diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index fd24e0c952..02cfddf151 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -3,6 +3,7 @@ test_description='multi-pack-indexes' . ./test-lib.sh . "$TEST_DIRECTORY"/lib-chunk.sh +. "$TEST_DIRECTORY"/lib-disjoint.sh GIT_TEST_MULTI_PACK_INDEX=0 objdir=.git/objects @@ -1215,4 +1216,85 @@ test_expect_success 'non-disjoint packs are detected' ' ) ' +test_expect_success 'retain disjoint packs while writing' ' + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + for i in 1 2 + do + test_commit "$i" && git repack -d || return 1 + done && + + find $objdir/pack -type f -name "pack-*.idx" | + sed -e "s/^.*\/\(.*\)/\1/g" | sort >packs.old && + + test_line_count = 2 packs.old && + disjoint="$(head -n 1 packs.old)" && + non_disjoint="$(tail -n 1 packs.old)" && + + cat >in <<-EOF && + +$disjoint + $non_disjoint + EOF + git multi-pack-index write --stdin-packs --bitmap packs.new && + + new_disjoint="$(comm -13 packs.old packs.new)" && + cat >in <<-EOF && + $disjoint + $non_disjoint + +$new_disjoint + EOF + git multi-pack-index write --stdin-packs --bitmap \ + --retain-disjoint in <<-EOF && + +pack-$base.idx + EOF + git multi-pack-index write --stdin-packs --bitmap in <<-EOF && + pack-$base.idx + +pack-$other.idx + EOF + test_must_fail git multi-pack-index write --stdin-packs --retain-disjoint --bitmap err && + grep "duplicate object.* among disjoint packs" err && + + test_must_fail git multi-pack-index write --retain-disjoint --bitmap 2>err && + grep "duplicate object.* among disjoint packs" err + ) +' + test_done From patchwork Tue Nov 28 19:08:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471568 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="dXQRAjS1" Received: from mail-qv1-xf2f.google.com (mail-qv1-xf2f.google.com [IPv6:2607:f8b0:4864:20::f2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 65EE910FB for ; Tue, 28 Nov 2023 11:08:18 -0800 (PST) Received: by mail-qv1-xf2f.google.com with SMTP id 6a1803df08f44-67a42549764so14889936d6.1 for ; Tue, 28 Nov 2023 11:08:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198497; x=1701803297; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=4IxPIRJ5+f4FdzHV3JPmJVntgL2kpL2TSGZ7bDzH3iY=; b=dXQRAjS196hWaueofLs3CRaZwF+JVbguAFCTVAhhGQKZgRO5FFA3wZc7p3/7/+cGjV C6LMZUCS1k6FyaPqNSS+5NtpqTN9clb0j2K67+bnGlOzUNQ+7ZLugQoDfjb//idxN3XY mNPYWnaa49MomioNxk7/SO/IQGZrS4X3OP0FMoviwmkQva/vij5bBZA+knb1jfaq0X8O UBrvPZAxPxeF5Wx3jEMYeJQgiPpSUIXjVy4ueWJbPrHsymtcjz+0silu6euMYt5hr7kl WHTNQfyqNmXeHq/jUuF0LmTOG9wi96pVj0GbAEUbuR749HQMcmY0KfYB8i0eFw7CagB+ ynrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198497; x=1701803297; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=4IxPIRJ5+f4FdzHV3JPmJVntgL2kpL2TSGZ7bDzH3iY=; b=MLaquVk2OQ4fCPdp8RamumWOBN+yRRydQH2MX7d3YlzxZylEaz5HPofEDJ+DvbqCq2 ce6WNxlsTzHZ4D+oo9fy5NaMyKBLLyGWnP9qa4WQZLJiRMebTCawTxqpnbfmA3QsnyNq 0r1hQYLJtLH/momutwyR4HsDKDqLHdKzklDdE6MHVu3aZwI33LlVbxCidRlwOdiu7v05 Od0l0otZu8VKIiouDt+K5WL4PDcl/zRfL6qfA1h6+kS0uMzqD2t+eK22Zy6ywJq8L215 N1i34sSpcJOnaMlBkDK/nEoNro9jtos9u642X+P+UgyF1yo5gFfhBQEcFBBI9j4s0qe4 j1bg== X-Gm-Message-State: AOJu0YyJod9m1aZYYIEQwd7sjWWUjtlDKTxweX/RaA1BnfyiPGO6l3YQ jcMishJv7jfbCYw3nawF4H0IJuzsVnwo8eHlUow= X-Google-Smtp-Source: AGHT+IE9PFFszQ0tdmoya3hoLm/zpF+WKxo1hcUIZw8EoUWuQ1JySZ2PdrjQ/FY7z5g6HooqZbiTKw== X-Received: by 2002:a0c:e950:0:b0:67a:35f5:bdea with SMTP id n16-20020a0ce950000000b0067a35f5bdeamr10814242qvo.43.1701198497172; Tue, 28 Nov 2023 11:08:17 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id h17-20020a0cedb1000000b0067a3ad49979sm2562517qvr.96.2023.11.28.11.08.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:16 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:16 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 08/24] pack-objects: implement `--ignore-disjoint` mode Message-ID: <0368f7ab37669163b50b82185725935bde5bc946.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Before multi-pack reachability bitmaps learn how to perform pack reuse over the set of disjoint packs, we will need a way to generate packs that are known to be disjoint with respect to the currently marked set of disjoint packs. In other words, we want a way to make a pack which does not have any objects contained in the union of the set of packs which are currently marked as disjoint. There are a various ways that we could go about this, for example: - passing `--unpacked`, which would exclude all packed objects (and thus would not contain any objects from the disjoint pack) - passing `--stdin-packs` with the set of packs currently marked as disjoint as "excluded", indicating that `pack-objects` should discard any objects present in any of the excluded packs (thus producing a disjoint pack) - marking each of the disjoint packs as kept in-core with the `--keep-pack` flag, and then passing `--honor-pack-keep` to similarly ignore any object(s) from kept packs (thus also producing a pack which is disjoint with respect to the current set) `git repack` is the main entry-point to generating a new pack, by invoking `pack-objects` and then adding the new pack to the set of disjoint packs if generating a new MIDX. However, `repack` has a number of ways to invoke `pack-objects` (e.g., all-into-one repacks, geometric repacks, incremental repacks, etc.), all of which would require careful reasoning in order to prove that the resulting set of packs is disjoint. The most appealing option of the above would be to pass the set of disjoint packs as kept (via `--keep-pack`) and then ignore their contents (with `--honor-pack-keep`), doing so for all kinds of `pack-objects` invocations. But there may be more disjoint packs than we can easily fit into the command-line arguments. Instead, teach `pack-objects` a special `--ignore-disjoint` which is the moral equivalent of marking the set of disjoint packs as kept, and ignoring their contents, even if it would have otherwise been packed. In fact, this similarity extends down to the implementation, where each disjoint pack is first loaded, then has its `pack_keep_in_core` bit set. With this in place, we can use the kept-pack cache from 20b031fede (packfile: add kept-pack cache for find_kept_pack_entry(), 2021-02-22), which looks up objects first in a cache containing just the set of kept (in this case, disjoint) packs. Assuming that the set of disjoint packs is a relatively small portion of the entire repository (which should be a safe assumption to make), each object lookup will be very inexpensive. The only place we want to avoid using `--ignore-disjoint` is in conjunction with `--cruft`, since doing so may cause us to omit an object which would have been included in a new cruft pack in order to freshen it. In other words, failing to do so might cause that object to be pruned from the repository earlier than expected. Otherwise, `--ignore-disjoint` is compatible with most other modes of `pack-objects`. These various combinations are tested below. As a result, `repack` will be able to unconditionally (except for the cruft pack) pass `--ignore-disjoint` when trying to add a new pack to the disjoint set, and the result will be usable, without having to carefully consider and reason about each individual case. Signed-off-by: Taylor Blau --- Documentation/git-pack-objects.txt | 8 ++ builtin/pack-objects.c | 31 +++++- t/lib-disjoint.sh | 11 ++ t/t5331-pack-objects-stdin.sh | 156 +++++++++++++++++++++++++++++ 4 files changed, 203 insertions(+), 3 deletions(-) diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt index e32404c6aa..592c4ce742 100644 --- a/Documentation/git-pack-objects.txt +++ b/Documentation/git-pack-objects.txt @@ -96,6 +96,14 @@ base-name:: Incompatible with `--revs`, or options that imply `--revs` (such as `--all`), with the exception of `--unpacked`, which is compatible. +--ignore-disjoint:: + This flag causes an object that appears in any pack marked as + "disjoint" by the multi-pack index to be ignored, even if it + would have otherwise been packed. When used with + `--stdin-packs`, objects from disjoint packs may be included if + and only if a disjoint pack is explicitly given as an input pack + to `--stdin-packs`. Incompatible with `--cruft`. + --cruft:: Packs unreachable objects into a separate "cruft" pack, denoted by the existence of a `.mtimes` file. Typically used by `git diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index bfa60359d4..107154db34 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -207,6 +207,7 @@ static int have_non_local_packs; static int incremental; static int ignore_packed_keep_on_disk; static int ignore_packed_keep_in_core; +static int ignore_midx_disjoint_packs; static int allow_ofs_delta; static struct pack_idx_option pack_idx_opts; static const char *base_name; @@ -1403,7 +1404,8 @@ static int want_found_object(const struct object_id *oid, int exclude, /* * Then handle .keep first, as we have a fast(er) path there. */ - if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) { + if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core || + ignore_midx_disjoint_packs) { /* * Set the flags for the kept-pack cache to be the ones we want * to ignore. @@ -1415,7 +1417,7 @@ static int want_found_object(const struct object_id *oid, int exclude, unsigned flags = 0; if (ignore_packed_keep_on_disk) flags |= ON_DISK_KEEP_PACKS; - if (ignore_packed_keep_in_core) + if (ignore_packed_keep_in_core || ignore_midx_disjoint_packs) flags |= IN_CORE_KEEP_PACKS; if (ignore_packed_keep_on_disk && p->pack_keep) @@ -3389,6 +3391,7 @@ static void read_packs_list_from_stdin(void) die(_("could not find pack '%s'"), item->string); if (!is_pack_valid(p)) die(_("packfile %s cannot be accessed"), p->pack_name); + p->pack_keep_in_core = 0; } /* @@ -4266,6 +4269,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) N_("create packs suitable for shallow fetches")), OPT_BOOL(0, "honor-pack-keep", &ignore_packed_keep_on_disk, N_("ignore packs that have companion .keep file")), + OPT_BOOL(0, "ignore-disjoint", &ignore_midx_disjoint_packs, + N_("ignore packs that are marked disjoint in the MIDX")), OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"), N_("ignore this pack")), OPT_INTEGER(0, "compression", &pack_compression_level, @@ -4412,7 +4417,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (use_internal_rev_list) die(_("cannot use internal rev list with --cruft")); if (stdin_packs) - die(_("cannot use --stdin-packs with --cruft")); + die(_("cannot use %s with %s"), "--stdin-packs", "--cruft"); + if (ignore_midx_disjoint_packs) + die(_("cannot use %s with %s"), "--ignore-disjoint", "--cruft"); } /* @@ -4452,6 +4459,24 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (!p) /* no keep-able packs found */ ignore_packed_keep_on_disk = 0; } + if (ignore_midx_disjoint_packs) { + struct multi_pack_index *m = get_multi_pack_index(the_repository); + struct bitmapped_pack pack; + unsigned any_disjoint = 0; + uint32_t i; + + for (i = 0; m && m->chunk_disjoint_packs && i < m->num_packs; i++) { + if (nth_bitmapped_pack(the_repository, m, &pack, i) < 0) + die(_("could not load bitmapped pack %i"), i); + if (pack.disjoint) { + pack.p->pack_keep_in_core = 1; + any_disjoint = 1; + } + } + + if (!any_disjoint) /* no disjoint packs to ignore */ + ignore_midx_disjoint_packs = 0; + } if (local) { /* * unlike ignore_packed_keep_on_disk above, we do not diff --git a/t/lib-disjoint.sh b/t/lib-disjoint.sh index c6c6e74aba..c802ca6940 100644 --- a/t/lib-disjoint.sh +++ b/t/lib-disjoint.sh @@ -36,3 +36,14 @@ test_must_be_disjoint () { test_must_not_be_disjoint () { test_disjoint_1 "$1" "no" } + +# packed_contents +# +# Prints the set of objects packed in the given pack indexes. +packed_contents () { + for idx in "$@" + do + git show-index <$idx || return 1 + done >tmp && + cut -d" " -f2 tmp-object-list && @@ -237,4 +238,159 @@ test_expect_success 'pack-objects --stdin with packfiles from main and alternate test_cmp expected-objects actual-objects ' +objdir=.git/objects +packdir=$objdir/pack + +test_expect_success 'loose objects also in disjoint packs are ignored' ' + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + # create a pack containing the objects in each commit below, but + # do not delete their loose copies + test_commit base && + base_pack="$(echo base | git pack-objects --revs $packdir/pack)" && + + test_commit other && + other_pack="$(echo base..other | git pack-objects --revs $packdir/pack)" && + + cat >in <<-EOF && + pack-$base_pack.idx + +pack-$other_pack.idx + EOF + git multi-pack-index write --stdin-packs --bitmap all && + packed_contents "$packdir/pack-$other_pack.idx" >disjoint && + + # make sure that the set of objects we just generated matches + # "all \ disjoint" + packed_contents "$packdir/pack-$out.idx" >got && + comm -23 all disjoint >want && + test_cmp want got + ) +' + +test_expect_success 'objects in disjoint packs are ignored (--unpacked)' ' + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + for c in A B + do + test_commit "$c" || return 1 + done && + + A="$(echo "A" | git pack-objects --revs $packdir/pack)" && + B="$(echo "A..B" | git pack-objects --revs $packdir/pack)" && + + cat >in <<-EOF && + pack-$A.idx + +pack-$B.idx + EOF + git multi-pack-index write --stdin-packs --bitmap actual && + + git rev-list --objects --no-object-names B..C >expect.raw && + sort expect && + + test_cmp expect actual + ) +' + +test_expect_success 'objects in disjoint packs are ignored (--stdin-packs)' ' + # Create objects in three separate packs: + # + # - pack A (midx, non disjoint) + # - pack B (midx, disjoint) + # - pack C (non-midx) + # + # Then create a new pack with `--stdin-packs` and `--ignore-disjoint` + # including packs A, B, and C. The resulting pack should contain + # only the objects from packs A, and C, excluding those from + # pack B as it is marked as disjoint. + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + for c in A B C + do + test_commit "$c" || return 1 + done && + + A="$(echo "A" | git pack-objects --revs $packdir/pack)" && + B="$(echo "A..B" | git pack-objects --revs $packdir/pack)" && + C="$(echo "B..C" | git pack-objects --revs $packdir/pack)" && + + cat >in <<-EOF && + pack-$A.idx + +pack-$B.idx + EOF + git multi-pack-index write --stdin-packs --bitmap in <<-EOF && + pack-$A.pack + ^pack-$B.pack + pack-$C.pack + EOF + got="$(git pack-objects --stdin-packs --ignore-disjoint $packdir/pack actual && + packed_contents "$packdir/pack-$A.idx" \ + "$packdir/pack-$C.idx" >expect && + test_cmp expect actual && + + # Generate another pack with `--stdin-packs`, this time + # using packs "B" and "C". The objects from pack "B" are + # expected to be in the final pack, despite it being a + # disjoint pack, because "B" was mentioned explicitly + # via `stdin-packs`. + cat >in <<-EOF && + pack-$B.pack + pack-$C.pack + EOF + got="$(git pack-objects --stdin-packs --ignore-disjoint $packdir/pack actual && + packed_contents "$packdir/pack-$B.idx" \ + "$packdir/pack-$C.idx" >expect && + test_cmp expect actual + ) +' + +test_expect_success '--cruft is incompatible with --ignore-disjoint' ' + test_must_fail git pack-objects --cruft --ignore-disjoint --stdout \ + /dev/null 2>actual && + cat >expect <<-\EOF && + fatal: cannot use --ignore-disjoint with --cruft + EOF + test_cmp expect actual +' + test_done From patchwork Tue Nov 28 19:08:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471569 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="AsSDdN3+" Received: from mail-qk1-x736.google.com (mail-qk1-x736.google.com [IPv6:2607:f8b0:4864:20::736]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 42CCAC3 for ; Tue, 28 Nov 2023 11:08:21 -0800 (PST) Received: by mail-qk1-x736.google.com with SMTP id af79cd13be357-77d85cf1ff5so218532385a.0 for ; Tue, 28 Nov 2023 11:08:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198500; x=1701803300; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=lHpbPyq4Mhf3Mg0XIfRDWzZwaFUDI2lMyaTsUTp389w=; b=AsSDdN3+C3fuSPHUjpZVoWLBZPh7xfFa01Xd9kk6tHfxv3dMyI8F+zHRFlbCJSZVl4 CGRaBROQoYiunQQrGMzcgVPzzKABtTVCkvYnVx28vCsIThGwywqxYhbcBUsORkRi5i5G VTvSdtNQEzKhwWLFWbp/8U4IoslAzwvv7uhnWJeODB8JIjXj8WCJ8Gnbw20bTOdY/118 IyW4dt/y7I+Mu4MkK6jHFoqCwO4WdiIaw2Hh0NldWTJ9D5idae8vrtFZzeXVSUFIcqwQ pOfa77evzTS3fNJMV5N+KFdMkACuUZvH1ROuM4vlPrn3RG05V8PAVh/3xPeepwMIaXJ4 YECw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198500; x=1701803300; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=lHpbPyq4Mhf3Mg0XIfRDWzZwaFUDI2lMyaTsUTp389w=; b=Zg7k4se8+K4syu7tKRAsbSCYPGRY3v0wbgPyA4rCwicr9nElL6mlveuP2CybZwrmrC w76Zy9we+2UyCtWB3TxSXD4EH/OklbjqIOtJHvD8sAyO9LNmQvVBWzIRvoR9CJVAkUHT xiavMLB6JDWNFF3JayCSmHXJWZCVTk97RxXixCdEapPJJpa/vOzJOGV89Q8jL83NmWGU A0oOKCo8tVHhqLVjIWZ46nTF4CZcsDwKxO8qx6rBDLSwXs9D7Yu2YwADUdJl2AszxKlj a7osItUrNiOcbhVouR1Rz250SudyXzeqwvpm6VE84440e+OMemTnoUOfxq3JYoQv76Dx oraA== X-Gm-Message-State: AOJu0Yy2ybc6aVtKNctxgRWcl754fYOCwO74Q7ki+cWxuOhRgSz6n8UN ogfgBzYimByZ5iZN6qHW5XW7x0RwRJK9AD1S9M0= X-Google-Smtp-Source: AGHT+IF45aEaCdR1d8KaKcrk9Hvu+vvrJsFNlVY4CFhhKxHQUr/qTk+pdY8OYzIXHUzUyPf/6Z8oOQ== X-Received: by 2002:a05:620a:113b:b0:76c:da86:3169 with SMTP id p27-20020a05620a113b00b0076cda863169mr16573251qkk.40.1701198499963; Tue, 28 Nov 2023 11:08:19 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id y25-20020a37e319000000b0077d74f884d9sm2147133qki.117.2023.11.28.11.08.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:19 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:18 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 09/24] repack: implement `--extend-disjoint` mode Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that we can generate packs which are disjoint with respect to the set of currently-disjoint packs, implement a mode of `git repack` which extends the set of disjoint packs with any new (non-cruft) pack(s) generated during the repack. The idea is mostly straightforward, with a couple of gotcha's. The straightforward part is to make sure that any new packs are disjoint with respect to the set of currently disjoint packs which are _not_ being removed from the repository as a result of the repack. If a pack which is currently marked as disjoint is, on the other hand, about to be removed from the repository, it is OK (and expected) that new pack(s) will contain some or all of its objects. Since the pack originally marked as disjoint will be removed, it will necessarily leave the disjoint set, making room for new packs with its same objects to take its place. In other words, the resulting set of disjoint packs will be disjoint with respect to one another. The gotchas mostly have to do with making sure that we do not generate a disjoint pack in the following scenarios: - promisor packs - cruft packs (which may necessarily need to include an object from a disjoint pack in order to freshen it in certain circumstances) - all-into-one repacks without '-d' - `--filter-to`, which conceptually could work with the new `--extend-disjoint` option, but only in limited circumstances Otherwise, we mark which packs were created as disjoint by using a new bit in the `generated_pack_data` struct, and then marking those pack(s) as disjoint accordingly when generating the MIDX. Non-deleted packs which are marked as disjoint are retained as such by passing the equivalent of `--retain-disjoint` when calling the MIDX API to update the MIDX. Signed-off-by: Taylor Blau --- Documentation/git-repack.txt | 12 +++ builtin/repack.c | 57 +++++++++--- t/t7700-repack.sh | 4 +- t/t7705-repack-extend-disjoint.sh | 142 ++++++++++++++++++++++++++++++ 4 files changed, 203 insertions(+), 12 deletions(-) create mode 100755 t/t7705-repack-extend-disjoint.sh diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt index c902512a9e..50ba5e7f9c 100644 --- a/Documentation/git-repack.txt +++ b/Documentation/git-repack.txt @@ -249,6 +249,18 @@ linkgit:git-multi-pack-index[1]). Write a multi-pack index (see linkgit:git-multi-pack-index[1]) containing the non-redundant packs. +--extend-disjoint:: + Extends the set of disjoint packs. All new non-cruft pack(s) + generated are constructed to be disjoint with respect to the set + of currently disjoint packs, excluding any packs that will be + removed as a result of the repack operation. For more on + disjoint packs, see the details in linkgit:gitformat-pack[5], + under the section "`DISP` chunk and disjoint packs". ++ +Useful only with the combination of `--write-midx` and +`--write-bitmap-index`. Incompatible with `--filter-to`. Incompatible +with `-A`, `-a`, or `--cruft` unless `-d` is given. + CONFIGURATION ------------- diff --git a/builtin/repack.c b/builtin/repack.c index edaee4dbec..0601bd16c4 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -58,6 +58,7 @@ struct pack_objects_args { int no_reuse_object; int quiet; int local; + int ignore_disjoint; struct list_objects_filter_options filter_options; }; @@ -293,6 +294,8 @@ static void prepare_pack_objects(struct child_process *cmd, strvec_push(&cmd->args, "--local"); if (args->quiet) strvec_push(&cmd->args, "--quiet"); + if (args->ignore_disjoint) + strvec_push(&cmd->args, "--ignore-disjoint"); if (delta_base_offset) strvec_push(&cmd->args, "--delta-base-offset"); strvec_push(&cmd->args, out); @@ -334,9 +337,11 @@ static struct { struct generated_pack_data { struct tempfile *tempfiles[ARRAY_SIZE(exts)]; + unsigned disjoint : 1; }; -static struct generated_pack_data *populate_pack_exts(const char *name) +static struct generated_pack_data *populate_pack_exts(const char *name, + unsigned disjoint) { struct stat statbuf; struct strbuf path = STRBUF_INIT; @@ -353,6 +358,8 @@ static struct generated_pack_data *populate_pack_exts(const char *name) data->tempfiles[i] = register_tempfile(path.buf); } + data->disjoint = disjoint; + strbuf_release(&path); return data; } @@ -379,6 +386,8 @@ static void repack_promisor_objects(const struct pack_objects_args *args, prepare_pack_objects(&cmd, args, packtmp); cmd.in = -1; + strvec_pushf(&cmd.args, "--no-ignore-disjoint"); + /* * NEEDSWORK: Giving pack-objects only the OIDs without any ordering * hints may result in suboptimal deltas in the resulting pack. See if @@ -421,7 +430,7 @@ static void repack_promisor_objects(const struct pack_objects_args *args, line.buf); write_promisor_file(promisor_name, NULL, 0); - item->util = populate_pack_exts(item->string); + item->util = populate_pack_exts(item->string, 0); free(promisor_name); } @@ -731,8 +740,13 @@ static void midx_included_packs(struct string_list *include, for_each_string_list_item(item, &existing->kept_packs) string_list_insert(include, xstrfmt("%s.idx", item->string)); - for_each_string_list_item(item, names) - string_list_insert(include, xstrfmt("pack-%s.idx", item->string)); + for_each_string_list_item(item, names) { + const char *marker = ""; + struct generated_pack_data *data = item->util; + if (data->disjoint) + marker = "+"; + string_list_insert(include, xstrfmt("%spack-%s.idx", marker, item->string)); + } if (geometry->split_factor) { struct strbuf buf = STRBUF_INIT; uint32_t i; @@ -788,7 +802,8 @@ static int write_midx_included_packs(struct string_list *include, struct pack_geometry *geometry, struct string_list *names, const char *refs_snapshot, - int show_progress, int write_bitmaps) + int show_progress, int write_bitmaps, + int exclude_disjoint) { struct child_process cmd = CHILD_PROCESS_INIT; struct string_list_item *item; @@ -852,6 +867,9 @@ static int write_midx_included_packs(struct string_list *include, if (refs_snapshot) strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot); + if (exclude_disjoint) + strvec_push(&cmd.args, "--retain-disjoint"); + ret = start_command(&cmd); if (ret) return ret; @@ -895,7 +913,7 @@ static void remove_redundant_bitmaps(struct string_list *include, static int finish_pack_objects_cmd(struct child_process *cmd, struct string_list *names, - int local) + int local, int disjoint) { FILE *out; struct strbuf line = STRBUF_INIT; @@ -913,7 +931,7 @@ static int finish_pack_objects_cmd(struct child_process *cmd, */ if (local) { item = string_list_append(names, line.buf); - item->util = populate_pack_exts(line.buf); + item->util = populate_pack_exts(line.buf, disjoint); } } fclose(out); @@ -970,7 +988,7 @@ static int write_filtered_pack(const struct pack_objects_args *args, fprintf(in, "%s%s.pack\n", caret, item->string); fclose(in); - return finish_pack_objects_cmd(&cmd, names, local); + return finish_pack_objects_cmd(&cmd, names, local, 0); } static int existing_cruft_pack_cmp(const void *va, const void *vb) @@ -1098,7 +1116,7 @@ static int write_cruft_pack(const struct pack_objects_args *args, fprintf(in, "%s.pack\n", item->string); fclose(in); - return finish_pack_objects_cmd(&cmd, names, local); + return finish_pack_objects_cmd(&cmd, names, local, 0); } static const char *find_pack_prefix(const char *packdir, const char *packtmp) @@ -1190,6 +1208,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) N_("pack prefix to store a pack containing pruned objects")), OPT_STRING(0, "filter-to", &filter_to, N_("dir"), N_("pack prefix to store a pack containing filtered out objects")), + OPT_BOOL(0, "extend-disjoint", &po_args.ignore_disjoint, + N_("add new packs to the set of disjoint ones")), OPT_END() }; @@ -1255,6 +1275,16 @@ int cmd_repack(int argc, const char **argv, const char *prefix) strbuf_release(&path); } + if (po_args.ignore_disjoint) { + if (filter_to) + die(_("options '%s' and '%s' cannot be used together"), + "--filter-to", "--extend-disjoint"); + if (pack_everything && !delete_redundant) + die(_("cannot use '--extend-disjoint' with '%s' but not '-d'"), + pack_everything & LOOSEN_UNREACHABLE ? "-A" : + pack_everything & PACK_CRUFT ? "--cruft" : "-a"); + } + packdir = mkpathdup("%s/pack", get_object_directory()); packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); packtmp = mkpathdup("%s/%s", packdir, packtmp_name); @@ -1308,6 +1338,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) if (pack_everything & ALL_INTO_ONE) { repack_promisor_objects(&po_args, &names); + if (delete_redundant) + strvec_pushf(&cmd.args, "--no-ignore-disjoint"); + if (has_existing_non_kept_packs(&existing) && delete_redundant && !(pack_everything & PACK_CRUFT)) { @@ -1364,7 +1397,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) fclose(in); } - ret = finish_pack_objects_cmd(&cmd, &names, 1); + ret = finish_pack_objects_cmd(&cmd, &names, 1, po_args.ignore_disjoint); if (ret) goto cleanup; @@ -1387,6 +1420,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix) cruft_po_args.local = po_args.local; cruft_po_args.quiet = po_args.quiet; + cruft_po_args.ignore_disjoint = 0; ret = write_cruft_pack(&cruft_po_args, packtmp, pack_prefix, cruft_expiration, &names, @@ -1487,7 +1521,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix) ret = write_midx_included_packs(&include, &geometry, &names, refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL, - show_progress, write_bitmaps > 0); + show_progress, write_bitmaps > 0, + po_args.ignore_disjoint); if (!ret && write_bitmaps) remove_redundant_bitmaps(&include, packdir); diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index d2975e6c93..277f1ff1d7 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -6,6 +6,7 @@ test_description='git repack works correctly' . "${TEST_DIRECTORY}/lib-bitmap.sh" . "${TEST_DIRECTORY}/lib-midx.sh" . "${TEST_DIRECTORY}/lib-terminal.sh" +. "${TEST_DIRECTORY}/lib-disjoint.sh" commit_and_pack () { test_commit "$@" 1>&2 && @@ -525,7 +526,8 @@ test_expect_success '--filter works with --max-pack-size' ' ' objdir=.git/objects -midx=$objdir/pack/multi-pack-index +packdir=$objdir/pack +midx=$packdir/multi-pack-index test_expect_success 'setup for --write-midx tests' ' git init midx && diff --git a/t/t7705-repack-extend-disjoint.sh b/t/t7705-repack-extend-disjoint.sh new file mode 100755 index 0000000000..0c8be1cb3f --- /dev/null +++ b/t/t7705-repack-extend-disjoint.sh @@ -0,0 +1,142 @@ +#!/bin/sh + +test_description='git repack --extend-disjoint works correctly' + +. ./test-lib.sh +. "${TEST_DIRECTORY}/lib-disjoint.sh" + +packdir=.git/objects/pack + +GIT_TEST_MULTI=0 +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 + +test_expect_success 'repack --extend-disjoint creates new disjoint packs' ' + git init repo && + ( + cd repo && + + test_commit A && + test_commit B && + + A="$(echo A | git pack-objects --revs $packdir/pack)" && + B="$(echo A..B | git pack-objects --revs $packdir/pack)" && + + git prune-packed && + + cat >in <<-EOF && + pack-$A.idx + +pack-$B.idx + EOF + git multi-pack-index write --bitmap --stdin-packs packs.before && + git repack --write-midx --write-bitmap-index --extend-disjoint && + find $packdir -type f -name "*.idx" | sort >packs.after && + + comm -13 packs.before packs.after >packs.new && + + test_line_count = 1 packs.new && + + test_must_not_be_disjoint "pack-$A.pack" && + test_must_be_disjoint "pack-$B.pack" && + test_must_be_disjoint "$(basename $(cat packs.new) .idx).pack" + ) +' + +test_expect_success 'repack --extend-disjoint combines existing disjoint packs' ' + ( + cd repo && + + test_commit D && + + git repack -a -d --write-midx --write-bitmap-index --extend-disjoint && + + find $packdir -type f -name "*.pack" >packs && + test_line_count = 1 packs && + + test_must_be_disjoint "$(basename $(cat packs))" + + ) +' + +test_expect_success 'repack --extend-disjoint with --geometric' ' + git init disjoint-geometric && + ( + cd disjoint-geometric && + + test_commit_bulk 8 && + base="$(basename $(ls $packdir/pack-*.idx))" && + echo "+$base" >>in && + + test_commit A && + A="$(echo HEAD^.. | git pack-objects --revs $packdir/pack)" && + test_commit B && + B="$(echo HEAD^.. | git pack-objects --revs $packdir/pack)" && + + git prune-packed && + + cat >>in <<-EOF && + +pack-$A.idx + +pack-$B.idx + EOF + git multi-pack-index write --bitmap --stdin-packs packs.before && + git repack --geometric=2 -d --write-midx --write-bitmap-index --extend-disjoint && + find $packdir -type f -name "*.pack" | sort >packs.after && + + comm -12 packs.before packs.after >packs.unchanged && + comm -23 packs.before packs.after >packs.removed && + comm -13 packs.before packs.after >packs.new && + + cat >expect <<-EOF && + $packdir/${base%.idx}.pack + EOF + test_cmp expect packs.unchanged && + + sort >expect <<-EOF && + $packdir/pack-$A.pack + $packdir/pack-$B.pack + EOF + test_cmp expect packs.removed && + + test_line_count = 1 packs.new && + + test_must_be_disjoint "$(basename $(cat packs.new))" && + test_must_be_disjoint "${base%.idx}.pack" + ) +' + +for flag in "-A" "-a" "--cruft" +do + test_expect_success "repack --extend-disjoint incompatible with $flag without -d" ' + test_must_fail git repack $flag --extend-disjoint \ + --write-midx --write-bitmap-index 2>actual && + cat >expect <<-EOF && + fatal: cannot use $SQ--extend-disjoint$SQ with $SQ$flag$SQ but not $SQ-d$SQ + EOF + test_cmp expect actual + ' +done + +test_expect_success 'repack --extend-disjoint is incompatible with --filter-to' ' + test_must_fail git repack --extend-disjoint --filter-to=dir 2>actual && + + cat >expect <<-EOF && + fatal: options $SQ--filter-to$SQ and $SQ--extend-disjoint$SQ cannot be used together + EOF + test_cmp expect actual +' + +test_done From patchwork Tue Nov 28 19:08:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471570 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="DriYqYCx" Received: from mail-qk1-x731.google.com (mail-qk1-x731.google.com [IPv6:2607:f8b0:4864:20::731]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AFAF41702 for ; Tue, 28 Nov 2023 11:08:23 -0800 (PST) Received: by mail-qk1-x731.google.com with SMTP id af79cd13be357-77d8d1b7952so186903885a.2 for ; Tue, 28 Nov 2023 11:08:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198502; x=1701803302; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=0LBgFElpi9Acco6bJTnDZid/m0EVpOrN/ZoGoWrUS4A=; b=DriYqYCxrF6XmGFhEkTil2JL4hVOVA5jie+hCoxS1fuCZs7FMnvLipFQ0fcbQ2QBMG y1dqH7dUvK2Sjv7f2ghUjYXORJaimQy0iUJPkkiG+E+Ui6ocrh15DsNM38V/D3DUmpFH FxCRle4IaOc0X587Q102cq6Yi/yPCI9RzZQgQnQo68IV5b+HJh1Ryx3v0hfLiMxfQ2wK OK9SClq6vwBs9XqG94ls7Mmn0c7apFaaA9nxgNAlfMCbDKyNYYNEFingOufinyNcdchp 5QZIuwBD/3FHqSV0n8bwCmuzKyfkN5gok8Rm6IIdvm0xoXWL7h0aEsPUpjRzYjnDNbsm gy5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198502; x=1701803302; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=0LBgFElpi9Acco6bJTnDZid/m0EVpOrN/ZoGoWrUS4A=; b=eGCwPmb/j4T1070ic2VuBvA1W/Zqjv1vowKjn1XbA2JVsR265TrqInFH4mzsZftfEi +HE8ENEMycjSFcfsMbwMwOz9CuTVbggvUoYj/ZMJf4kPpDL2lKMDCQc0zZoD6642qFCh AN/DxkGicazW9nCBZboFkT056stAZnmIfkclJJF157+uEHGUmWbyuvkRixM4da2d/n6f zfoula44nUpwIQZYXrZqOKXzxZfA3CwRxXeAqo5zEfgncz/nm3IwPyX1ieOT1ZglcOxB O8g8YC/XLgIkNmI0Ic9XWx6aR6KyIYFw0lFNwNhyeV0PPZydPQSx01lh4HB75HSXPCHb onoQ== X-Gm-Message-State: AOJu0Yxv+l7jvpeVyELvND0TzSpMEsOQl210ws3SkJ3xGeJYvgvZMIxF p8mmz87KUw+Ei/I4J9I0dvgD5wLbW+7hmOecIf0= X-Google-Smtp-Source: AGHT+IHlY/nj8Vay3d7RIk0XjVPY5EQshc40yl0vsBiwuxv7EjL3EiP4EtsvxYDQlOjlsKKkftD1Mg== X-Received: by 2002:a05:620a:1087:b0:77b:d8aa:6756 with SMTP id g7-20020a05620a108700b0077bd8aa6756mr16216898qkk.49.1701198502533; Tue, 28 Nov 2023 11:08:22 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id g15-20020a05620a278f00b0077d7649ef06sm4527922qkp.14.2023.11.28.11.08.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:22 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:21 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 10/24] pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions Message-ID: <970bd9eaeae038adb6e7d4c399c9b033668a8864.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When trying to assemble a pack with bitmaps using `--use-bitmap-index`, `pack-objects` asks the pack-bitmap machinery for a bitmap which indicates the set of objects we can "reuse" verbatim from on-disk. This set is roughly comprised of: a prefix of objects in the bitmapped pack (or preferred pack, in the case of a multi-pack reachability bitmap), plus any other objects not included in the prefix, excluding any deltas whose base we are not sending in the resulting pack. The pack-bitmap machinery is responsible for computing this bitmap, and does so with the following functions: - reuse_partial_packfile_from_bitmap() - try_partial_reuse() In the existing implementation, the first function is responsible for (a) marking the prefix of objects in the reusable pack, and then (b) calling try_partial_reuse() on any remaining objects to ensure that they are also reusable (and removing them from the bitmapped set if they are not). Likewise, the `try_partial_reuse()` function is responsible for checking whether an isolated object (that is, an object from the bitmapped pack/preferred pack not contained in the prefix from earlier) may be reused, i.e. that it isn't a delta of an object that we are not sending in the resulting pack. These functions are based on two core assumptions, which we will unwind in this and the following commits: 1. There is only a single pack from the bitmap which is eligible for verbatim pack-reuse. For single-pack bitmaps, this is trivially the bitmapped pack. For multi-pack bitmaps, this is (currently) the MIDX's preferred pack. 2. The pack eligible for reuse has its first object in bit position 0, and all objects from that pack follow in pack-order from that first bit position. In order to perform verbatim pack reuse over multiple packs, we must unwind these two assumptions. Most notably, in order to reuse bits from a given packfile, we need to know the first bit position occupied by an object form that packfile. To propagate this information around, pass a `struct bitmapped_pack *` anywhere we previously passed a `struct packed_git *`, since the former contains the bitmap position we're interested in (as well as a pointer to the latter). As an additional step, factor out a sub-routine from the main `reuse_partial_packfile_from_bitmap()` function, called `reuse_partial_packfile_from_bitmap_1()`. This new function will be responsible for figuring out which objects may be reused from a single pack, and the existing function will dispatch multiple calls to its new helper function for each reusable pack. Consequently, `reuse_partial_packfile_from_bitmap()` will now maintain an array of reusable packs instead of a single such pack. We currently expect that array to have only a single element, so this awkward state is short-lived. It will serve as useful scaffolding in subsequent commits as we begin to work towards enabling multi-pack reuse. Signed-off-by: Taylor Blau --- pack-bitmap.c | 105 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 74 insertions(+), 31 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index d2f1306960..2ebe2c314e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1836,7 +1836,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, * -1 means "stop trying further objects"; 0 means we may or may not have * reused, but you can keep feeding bits. */ -static int try_partial_reuse(struct packed_git *pack, +static int try_partial_reuse(struct bitmapped_pack *pack, size_t pos, struct bitmap *reuse, struct pack_window **w_curs) @@ -1868,11 +1868,11 @@ static int try_partial_reuse(struct packed_git *pack, * preferred pack precede all bits from other packs. */ - if (pos >= pack->num_objects) + if (pos >= pack->p->num_objects) return -1; /* not actually in the pack or MIDX preferred pack */ - offset = delta_obj_offset = pack_pos_to_offset(pack, pos); - type = unpack_object_header(pack, w_curs, &offset, &size); + offset = delta_obj_offset = pack_pos_to_offset(pack->p, pos); + type = unpack_object_header(pack->p, w_curs, &offset, &size); if (type < 0) return -1; /* broken packfile, punt */ @@ -1888,11 +1888,11 @@ static int try_partial_reuse(struct packed_git *pack, * and the normal slow path will complain about it in * more detail. */ - base_offset = get_delta_base(pack, w_curs, &offset, type, + base_offset = get_delta_base(pack->p, w_curs, &offset, type, delta_obj_offset); if (!base_offset) return 0; - if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0) + if (offset_to_pack_pos(pack->p, base_offset, &base_pos) < 0) return 0; /* @@ -1915,14 +1915,14 @@ static int try_partial_reuse(struct packed_git *pack, * to REF_DELTA on the fly. Better to just let the normal * object_entry code path handle it. */ - if (!bitmap_get(reuse, base_pos)) + if (!bitmap_get(reuse, pack->bitmap_pos + base_pos)) return 0; } /* * If we got here, then the object is OK to reuse. Mark it. */ - bitmap_set(reuse, pos); + bitmap_set(reuse, pack->bitmap_pos + pos); return 0; } @@ -1934,29 +1934,13 @@ uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git) return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0)); } -int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - struct packed_git **packfile_out, - uint32_t *entries, - struct bitmap **reuse_out) +static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git, + struct bitmapped_pack *pack, + struct bitmap *reuse) { - struct repository *r = the_repository; - struct packed_git *pack; struct bitmap *result = bitmap_git->result; - struct bitmap *reuse; struct pack_window *w_curs = NULL; size_t i = 0; - uint32_t offset; - uint32_t objects_nr; - - assert(result); - - load_reverse_index(r, bitmap_git); - - if (bitmap_is_midx(bitmap_git)) - pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; - else - pack = bitmap_git->pack; - objects_nr = pack->num_objects; while (i < result->word_alloc && result->words[i] == (eword_t)~0) i++; @@ -1969,15 +1953,15 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, * we use it instead of another pack. In single-pack bitmaps, the choice * is made for us. */ - if (i > objects_nr / BITS_IN_EWORD) - i = objects_nr / BITS_IN_EWORD; + if (i > pack->p->num_objects / BITS_IN_EWORD) + i = pack->p->num_objects / BITS_IN_EWORD; - reuse = bitmap_word_alloc(i); memset(reuse->words, 0xFF, i * sizeof(eword_t)); for (; i < result->word_alloc; ++i) { eword_t word = result->words[i]; size_t pos = (i * BITS_IN_EWORD); + size_t offset; for (offset = 0; offset < BITS_IN_EWORD; ++offset) { if ((word >> offset) == 0) @@ -2002,6 +1986,65 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, done: unuse_pack(&w_curs); +} + +int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + struct packed_git **packfile_out, + uint32_t *entries, + struct bitmap **reuse_out) +{ + struct repository *r = the_repository; + struct bitmapped_pack *packs = NULL; + struct bitmap *result = bitmap_git->result; + struct bitmap *reuse; + size_t i; + size_t packs_nr = 0, packs_alloc = 0; + size_t word_alloc; + uint32_t objects_nr = 0; + + assert(result); + + load_reverse_index(r, bitmap_git); + + if (bitmap_is_midx(bitmap_git)) { + for (i = 0; i < bitmap_git->midx->num_packs; i++) { + struct bitmapped_pack pack; + if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) { + warning(_("unable to load pack: '%s', disabling pack-reuse"), + bitmap_git->midx->pack_names[i]); + free(packs); + return -1; + } + if (!pack.bitmap_nr) + continue; /* no objects from this pack */ + if (pack.bitmap_pos) + continue; /* not preferred pack */ + + ALLOC_GROW(packs, packs_nr + 1, packs_alloc); + memcpy(&packs[packs_nr++], &pack, sizeof(pack)); + + objects_nr += pack.p->num_objects; + } + } else { + ALLOC_GROW(packs, packs_nr + 1, packs_alloc); + + packs[packs_nr].p = bitmap_git->pack; + packs[packs_nr].bitmap_pos = 0; + packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects; + packs[packs_nr].disjoint = 1; + + objects_nr = packs[packs_nr++].p->num_objects; + } + + word_alloc = objects_nr / BITS_IN_EWORD; + if (objects_nr % BITS_IN_EWORD) + word_alloc++; + reuse = bitmap_word_alloc(word_alloc); + + if (packs_nr != 1) + BUG("pack reuse not yet implemented for multiple packs"); + + reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse); *entries = bitmap_popcount(reuse); if (!*entries) { @@ -2014,7 +2057,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, * need to be handled separately. */ bitmap_and_not(result, reuse); - *packfile_out = pack; + *packfile_out = packs[0].p; *reuse_out = reuse; return 0; } From patchwork Tue Nov 28 19:08:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471571 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="F77eVjgY" Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com [IPv6:2607:f8b0:4864:20::72d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 531EC1BC for ; Tue, 28 Nov 2023 11:08:26 -0800 (PST) Received: by mail-qk1-x72d.google.com with SMTP id af79cd13be357-77d895c298eso197719085a.3 for ; Tue, 28 Nov 2023 11:08:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198505; x=1701803305; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=D3NH9e0B7TEOaLke+IJC8tguC6phHHxMJ1QTwr176tI=; b=F77eVjgYi7k3DIP5obR9R3peA73pUO26Y5kNMULYwnwG/XMtwmpvXAuWINf3GrWDeM mDxmu7BUcVIgnjgogNUGOazIEm+wQv8Kvdalz/0wM1iQXt3sfnKoXHNubE9RqCc06qo8 5Kba5/YycW+7I0XqquJUujEFT8/B8sxLINfuN/t/SeDt8a5+WbkLYM444K7gezz5H2Uv oRe7UN1S3gMYP/an+RTzCmLtlwpGak2Da/vpQmwYM90klmkz1YXN2pXRbscpWGGJMHVP Hyl6tg3ZgeiRVrriQHPxqCxc/FKkLoxvNZbn8w+/cu5T7YwBZyX6kSeYIAfRbyEkqFio U+7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198505; x=1701803305; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=D3NH9e0B7TEOaLke+IJC8tguC6phHHxMJ1QTwr176tI=; b=A3zWpnFUWtXo4BsmZw3VkcfkIWUtSVixnjuKNRRfFv1MkDbWRppWqRpCSB62xhOUTO WZnVQJ8r87M/1H7V+eFwJgbF4ucEsCjfWTe1Mr50drP1OrGOaw/wHgHKyCQhwvib6pFb 5ueoUMshmOLASCWt4MR3xeYsOo/hak4u+Jt6f3o0GTu6ARo85hW1o4GKedKmaYJuHxr2 vGH79d0pP6RqULMF049RpGp4FoEeVf0HVpr55pw/QOsyG6fSIpNFkgjsh0NW9DhTjnvD VZzy9N+tCobc888plIe6EIGnZzLWE86GNsfn8xavfbwFnjXLHjxQE+K4q+ySNy8YHWPZ 3mSw== X-Gm-Message-State: AOJu0YyEfC0PafDRcWjYs4q18mi7Podvu559/1nFifOdpC7Bs0PWIlk9 vdHQ4y5okuGwyB1J9cGLGm0Ky9N7v7X/N3DCr4E= X-Google-Smtp-Source: AGHT+IE/Jdvi1/gu2vWjA3AwPysC0qMpYuMIOc+27FVn+9ToiXT9Hto4bbVBpfdZCB9NHQwlra4B3w== X-Received: by 2002:a05:620a:6507:b0:76d:aa3b:ac9c with SMTP id qb7-20020a05620a650700b0076daa3bac9cmr18861838qkn.46.1701198505176; Tue, 28 Nov 2023 11:08:25 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id h19-20020a05620a285300b0077d897deb89sm3345508qkp.127.2023.11.28.11.08.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:25 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:24 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 11/24] pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature Message-ID: <432854b27c6731bd6ab1fa739b3a086ec0a90be8.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The signature of `reuse_partial_packfile_from_bitmap()` currently takes in a bitmap, as well as three output parameters (filled through pointers, and passed as arguments), and also returns an integer result. The output parameters are filled out with: (a) the packfile used for pack-reuse, (b) the number of objects from that pack that we can reuse, and (c) a bitmap indicating which objects we can reuse. The return value is either -1 (when there are no objects to reuse), or 0 (when there is at least one object to reuse). Some of these parameters are redundant. Notably, we can infer from the bitmap how many objects are reused by calling bitmap_popcount(). And we can similar compute the return value based on that number as well. As such, clean up the signature of this function to drop the "*entries" parameter, as well as the int return value, since the single caller of this function can infer these values themself. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 16 +++++++++------- pack-bitmap.c | 16 +++++++--------- pack-bitmap.h | 7 +++---- 3 files changed, 19 insertions(+), 20 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 107154db34..2bb1b64e8f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3946,13 +3946,15 @@ static int get_object_list_from_bitmap(struct rev_info *revs) if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) return -1; - if (pack_options_allow_reuse() && - !reuse_partial_packfile_from_bitmap( - bitmap_git, - &reuse_packfile, - &reuse_packfile_objects, - &reuse_packfile_bitmap)) { - assert(reuse_packfile_objects); + if (pack_options_allow_reuse()) + reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile, + &reuse_packfile_bitmap); + + if (reuse_packfile) { + reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); + if (!reuse_packfile_objects) + BUG("expected non-empty reuse bitmap"); + nr_result += reuse_packfile_objects; nr_seen += reuse_packfile_objects; display_progress(progress_state, nr_seen); diff --git a/pack-bitmap.c b/pack-bitmap.c index 2ebe2c314e..614fc09a4e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1988,10 +1988,9 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git unuse_pack(&w_curs); } -int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - struct packed_git **packfile_out, - uint32_t *entries, - struct bitmap **reuse_out) +void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + struct packed_git **packfile_out, + struct bitmap **reuse_out) { struct repository *r = the_repository; struct bitmapped_pack *packs = NULL; @@ -2013,7 +2012,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, warning(_("unable to load pack: '%s', disabling pack-reuse"), bitmap_git->midx->pack_names[i]); free(packs); - return -1; + return; } if (!pack.bitmap_nr) continue; /* no objects from this pack */ @@ -2046,10 +2045,10 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse); - *entries = bitmap_popcount(reuse); - if (!*entries) { + if (!bitmap_popcount(reuse)) { + free(packs); bitmap_free(reuse); - return -1; + return; } /* @@ -2059,7 +2058,6 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, bitmap_and_not(result, reuse); *packfile_out = packs[0].p; *reuse_out = reuse; - return 0; } int bitmap_walk_contains(struct bitmap_index *bitmap_git, diff --git a/pack-bitmap.h b/pack-bitmap.h index b7fa1a42a9..5bc1ca5b65 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,10 +78,9 @@ int test_bitmap_hashes(struct repository *r); struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, int filter_provided_objects); uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); -int reuse_partial_packfile_from_bitmap(struct bitmap_index *, - struct packed_git **packfile, - uint32_t *entries, - struct bitmap **reuse_out); +void reuse_partial_packfile_from_bitmap(struct bitmap_index *, + struct packed_git **packfile, + struct bitmap **reuse_out); int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping, kh_oid_map_t *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *); From patchwork Tue Nov 28 19:08:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471572 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="ZijxkSRC" Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4E3592 for ; Tue, 28 Nov 2023 11:08:28 -0800 (PST) Received: by mail-qt1-x832.google.com with SMTP id d75a77b69052e-42033328ad0so32729931cf.0 for ; Tue, 28 Nov 2023 11:08:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198508; x=1701803308; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=ilkXkGh4UQ+/aeFm6DMB3/MRHBoihpCdwNtE5FkgYiA=; b=ZijxkSRCa+MVoufQw8pX1zEPDmbJQHMIc7ryEPNYdxRW7uE1elAtPSEoO6TO2fdL3C YYQJaFPp7+XmVgUuPBLsbh8kUw7SKdejWYyghinEmlcDe0ULhtA4MtIqR+jEN2WlinHT rQR4rmtfQIe+M14yTYYVAo/CiHl6BtQ7g8BeE9AjKV0icbDZr85XZipLoE5eQ/zLcrOA qq6Tj17eJ8pSn1YjOQTW9YbR6/34H1KTm3kmb+xrriLIfhgDr5sWZExCw/RkRJt6cLvF qUjuUgvkfcuJD0BY8W1g6soQTICypHyEZWBA0EEUACro/YqszjDcPamlITZYdOO88J3I N9Eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198508; x=1701803308; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=ilkXkGh4UQ+/aeFm6DMB3/MRHBoihpCdwNtE5FkgYiA=; b=NE6fw6dyDPggY6zRVv/CLXMPR1R+e6gvEFLNFbp2pOh9xG4RA2kXKRZN8OuEXchcRO A2WDrinb8GgvJnY1lZyT05ru36H1G3qYt2iNoWzaDYU2eYk5EEMYEBZRob7Tij8eiIbq J8mIgR8OBEpYuW7yZTy1eZA/NEIf3Y+rIB+jDhoY1cttvRkVdLsxjbc5OYIEHDB/bKVc uiKtoRrlhiOiq0r0OKjJEKELKihgdDog4qBh1EGriVNQ/MEan+q4doOrpdwqEOZLMdXV OKPH1feufGbFuwnt/4JIwDeSofdllGO41B1/QRuWg0hWBcBhLXen0hy7GnEec6dMv0fd ZJSA== X-Gm-Message-State: AOJu0Yyjk4yTKABNoGhCZrL0SktZgZkI8Uv8btj2J9q8n2UrpT0WiPJ4 IpRGO0nraH+IQS8T30RRVmH/mecsovGXWMryuEw= X-Google-Smtp-Source: AGHT+IEGpHLFMvQAcGb5Zz6cRwfBxzkgLcTjbWXuREtroKUjuMbHF8V8wVZ0zR6J4StKXf6JL7HAoA== X-Received: by 2002:a05:622a:7188:b0:423:a3c9:8e70 with SMTP id jd8-20020a05622a718800b00423a3c98e70mr15553662qtb.17.1701198507813; Tue, 28 Nov 2023 11:08:27 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id w26-20020ac86b1a000000b0041e383d527esm4754403qts.66.2023.11.28.11.08.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:27 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:26 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 12/24] pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()` Message-ID: <36b794d9e1eaa413b3b87eb7e5a0d82098d05dca.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Further prepare for enabling verbatim pack-reuse over multiple packfiles by changing the signature of reuse_partial_packfile_from_bitmap() to populate an array of `struct bitmapped_pack *`'s instead of a pointer to a single packfile. Since the array we're filling out is sized dynamically[^1], add an additional `size_t *` parameter which will hold the number of reusable packs (equal to the number of elements in the array). Note that since we still have not implemented true multi-pack reuse, these changes aren't propagated out to the rest of the caller in builtin/pack-objects.c. In the interim state, we expect that the array has a single element, and we use that element to fill out the static `reuse_packfile` variable (which is a bog-standard `struct packed_git *`). Future commits will continue to push this change further out through the pack-objects code. [^1]: That is, even though we know the number of packs which are candidates for pack-reuse, we do not know how many of those candidates we can actually reuse. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 9 +++++++-- pack-bitmap.c | 6 ++++-- pack-bitmap.h | 5 +++-- 3 files changed, 14 insertions(+), 6 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 2bb1b64e8f..89de23f39a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3943,14 +3943,19 @@ static int pack_options_allow_reuse(void) static int get_object_list_from_bitmap(struct rev_info *revs) { + struct bitmapped_pack *packs = NULL; + size_t packs_nr = 0; + if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) return -1; if (pack_options_allow_reuse()) - reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile, + reuse_partial_packfile_from_bitmap(bitmap_git, &packs, + &packs_nr, &reuse_packfile_bitmap); - if (reuse_packfile) { + if (packs) { + reuse_packfile = packs[0].p; reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); if (!reuse_packfile_objects) BUG("expected non-empty reuse bitmap"); diff --git a/pack-bitmap.c b/pack-bitmap.c index 614fc09a4e..670deec909 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1989,7 +1989,8 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git } void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - struct packed_git **packfile_out, + struct bitmapped_pack **packs_out, + size_t *packs_nr_out, struct bitmap **reuse_out) { struct repository *r = the_repository; @@ -2056,7 +2057,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, * need to be handled separately. */ bitmap_and_not(result, reuse); - *packfile_out = packs[0].p; + *packs_out = packs; + *packs_nr_out = packs_nr; *reuse_out = reuse; } diff --git a/pack-bitmap.h b/pack-bitmap.h index 5bc1ca5b65..901a3b86ed 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,8 +78,9 @@ int test_bitmap_hashes(struct repository *r); struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, int filter_provided_objects); uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); -void reuse_partial_packfile_from_bitmap(struct bitmap_index *, - struct packed_git **packfile, +void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + struct bitmapped_pack **packs_out, + size_t *packs_nr_out, struct bitmap **reuse_out); int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping, kh_oid_map_t *reused_bitmaps, int show_progress); From patchwork Tue Nov 28 19:08:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471573 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="eqPFVPgm" Received: from mail-qv1-xf2c.google.com (mail-qv1-xf2c.google.com [IPv6:2607:f8b0:4864:20::f2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9CEC210F6 for ; Tue, 28 Nov 2023 11:08:31 -0800 (PST) Received: by mail-qv1-xf2c.google.com with SMTP id 6a1803df08f44-67a3e0fb11aso15709856d6.2 for ; Tue, 28 Nov 2023 11:08:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198510; x=1701803310; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=zzd1kLFDSKSrvJElIbHY8ojcKXX1VK/68PrAguZ/47Q=; b=eqPFVPgmxeRQ9F1C8A3gf1iI3Mj8eXW8hOsgfxStqIawGveduCvB9Xjmstugk2jnc3 cW8BZNoesJ11K2bO9lpyMCgFM9TESn/TQjqW6dn6aUeK8f8CQ2//OFDs/UKkmUeFJEjC CS23gBs+A0Z31GgddMxRu+IVYlFFnDVheu1Hn7wimbIwu9u0cecG1DSCeNcFtv74UkAF 6pArJ59n3VIGbBvNvXb9y+WrOEfrBkdKdFHHQfxhO5ZTq7Vcfs547MQcntrsJi+prjfd P6zrTQ4dWc5SR21X/AzCpw1Z8d4dz+LXawNcxk3SgOYYW3mEfoMyLLpFyB+l8mHmVm0/ 0Q6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198510; x=1701803310; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=zzd1kLFDSKSrvJElIbHY8ojcKXX1VK/68PrAguZ/47Q=; b=OmSPAqv/Gf7PMZ7BBw9n0PIBYZWUiOkOCnnF/dlAgdSeEhK86mtfWpW4MHet7FK1m8 QyNqy1YcAzNL+oglliVyR77IRwwEWo3bqAPlR4nKGXV7DO520v0joC6BPFHMgJdQtXHz ta6JB+5Q6duORTVKkuT4Ko0BdYS1RhkH8/Ftyvo/Z3UCc/S7uDa3dttem18PkBm0waRS QpNzftQHYBMjJSnUsXiqUqTlDnzroKtGFPDadtPHKBlJm7bweKAA7sd1KY7bI4Gv47lF dyyo2MBgSQ4VaA8Rn80jZveaMdUSS/yk8sqmYnsKOTAiapnr+JgkC5KI+Oogx5qVLd1M 0Now== X-Gm-Message-State: AOJu0YwOrebp+wW+VLBSou0z1xno5tjRZNPtJNRW5MD6jT+Rx1xeMYla O8XOVl+y5BiPvYOdsKgDGu5c4opB+V5RhgmfEd0= X-Google-Smtp-Source: AGHT+IF0xib+eSb9iU/oCk7Xd5fOlUnniG7XOjyVve5Ow4JAVgtNfTnCNU7+HaSyhk3ocN/DRU9y5Q== X-Received: by 2002:a05:6214:4a8f:b0:67a:2bf7:5275 with SMTP id pi15-20020a0562144a8f00b0067a2bf75275mr17590562qvb.46.1701198510521; Tue, 28 Nov 2023 11:08:30 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id dp11-20020a05621409cb00b0067a4b72b6besm1654490qvb.105.2023.11.28.11.08.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:30 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:29 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 13/24] pack-objects: parameterize pack-reuse routines over a single pack Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The routines pack-objects uses to perform verbatim pack-reuse are: - write_reused_pack_one() - write_reused_pack_verbatim() - write_reused_pack() , all of which assume that there is exactly one packfile being reused: the global constant `reuse_packfile`. Prepare for reusing objects from multiple packs by making reuse packfile a parameter of each of the above functions in preparation for calling these functions in a loop with multiple packfiles. Note that we still have the global "reuse_packfile", but pass it through each of the above function's parameter lists, eliminating all but one direct access (the top-level caller in `write_pack_file()`). Even after this series, we will still have a global, but it will hold the array of reusable packfiles, and we'll pass them one at a time to these functions in a loop. Note also that we will eventually need to pass a `bitmapped_pack` instead of a `packed_git` in order to hold onto additional information required for reuse (such as the bit position of the first object belonging to that pack). But that change will be made in a future commit so as to minimize the noise below as much as possible. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 89de23f39a..7682bd65bb 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1014,7 +1014,8 @@ static off_t find_reused_offset(off_t where) return reused_chunks[lo-1].difference; } -static void write_reused_pack_one(size_t pos, struct hashfile *out, +static void write_reused_pack_one(struct packed_git *reuse_packfile, + size_t pos, struct hashfile *out, struct pack_window **w_curs) { off_t offset, next, cur; @@ -1092,7 +1093,8 @@ static void write_reused_pack_one(size_t pos, struct hashfile *out, copy_pack_data(out, reuse_packfile, w_curs, offset, next - offset); } -static size_t write_reused_pack_verbatim(struct hashfile *out, +static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, + struct hashfile *out, struct pack_window **w_curs) { size_t pos = 0; @@ -1119,14 +1121,15 @@ static size_t write_reused_pack_verbatim(struct hashfile *out, return pos; } -static void write_reused_pack(struct hashfile *f) +static void write_reused_pack(struct packed_git *reuse_packfile, + struct hashfile *f) { size_t i = 0; uint32_t offset; struct pack_window *w_curs = NULL; if (allow_ofs_delta) - i = write_reused_pack_verbatim(f, &w_curs); + i = write_reused_pack_verbatim(reuse_packfile, f, &w_curs); for (; i < reuse_packfile_bitmap->word_alloc; ++i) { eword_t word = reuse_packfile_bitmap->words[i]; @@ -1142,7 +1145,8 @@ static void write_reused_pack(struct hashfile *f) * bitmaps. See comment in try_partial_reuse() * for why. */ - write_reused_pack_one(pos + offset, f, &w_curs); + write_reused_pack_one(reuse_packfile, pos + offset, f, + &w_curs); display_progress(progress_state, ++written); } } @@ -1200,7 +1204,7 @@ static void write_pack_file(void) if (reuse_packfile) { assert(pack_to_stdout); - write_reused_pack(f); + write_reused_pack(reuse_packfile, f); offset = hashfile_total(f); } From patchwork Tue Nov 28 19:08:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471574 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="AZGr/vwr" Received: from mail-qk1-x730.google.com (mail-qk1-x730.google.com [IPv6:2607:f8b0:4864:20::730]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13CC0C3 for ; Tue, 28 Nov 2023 11:08:34 -0800 (PST) Received: by mail-qk1-x730.google.com with SMTP id af79cd13be357-77d67000b69so333897285a.2 for ; Tue, 28 Nov 2023 11:08:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198513; x=1701803313; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=+gJXQz2t/K9qukTZoVFmWxkjVYWThf7iWKhcma2Kg/4=; b=AZGr/vwr5WrFuXrzGwruXHJNO5Jer5LvhU3jpAsjHqDq5lSI6y6IaZ7Aa7b93yZJFe ukZJkzyMK0JOvcda5Y1Ui/qclsqp1QsaeInlMJewJDUpapM5ZAOyQLdXm1V049HkyEec 4egis6hyTPy0N+tNyVD9vFdFfDOEomi9r9491IriVo6IZ00L+f/uy1mwgQZtUg4SvBeo M7mfFU+eL5FLokT9uVvseA4HFdNAtnURpyvJmUY9UKDeipjPadwWGwu0UFLpefW+ITkA GCb3KZevBkAKk42YQ72g0cJ8cjSOyCoIco4ZitdC+aMpowVvmyQn4XCn6sdA6OoyaQ06 Ld2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198513; x=1701803313; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=+gJXQz2t/K9qukTZoVFmWxkjVYWThf7iWKhcma2Kg/4=; b=sXI7kaX4w5jFKUZeFQ/51ZXccv6kq8jtftljad0OvvzEMZtDQHkEdm9emVJ7Ty5PF7 MVoEhoS+TlyZCzKxAe1kWsmCdfCvq6HOt3t0P0nZblp22KjsBm76vNOQ0MokwTUOpmSw rBxGRBKSmlX+lzkEjfxHjPYY0lYe0fT+8TIYHITVEGsWfreQ8qssHMMIwTjp2sS8yDan FP12JdmT3hqkcG+aCj1eo3wmn/Ja9niz6k3BJI+Umrs5rexy6l99vjYPY8z/zUxjSloE pfSKuAiC8p/VAUJf/ybM3T0ZsjAMSUoboh8bCwPdsOr37eFodRA7T+o1So9OwMDUb9SV Ky9g== X-Gm-Message-State: AOJu0YyM2xATp2Gpc8piqKMTxoL4QPOUDy9QjVA6wh0ebEWD5TKGkIgD fLtypfzAWsmOkmJM4e7psgSRrrztBtlt+YbLvhw= X-Google-Smtp-Source: AGHT+IGjrVhyrG/ZO8n+pl234gxDQQfpm3DDL1xXheVTIz68Cs/XdRYfqAMEe12p2lMJnrFTdgjUmw== X-Received: by 2002:a05:620a:8087:b0:774:1ac2:79be with SMTP id ef7-20020a05620a808700b007741ac279bemr18423634qkb.53.1701198513084; Tue, 28 Nov 2023 11:08:33 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id vq2-20020a05620a558200b0077da7a46b0fsm2016966qkn.69.2023.11.28.11.08.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:32 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:32 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 14/24] pack-objects: keep track of `pack_start` for each reuse pack Message-ID: <6f4fba861b59f909148775ee64c3ba89afc872b5.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When reusing objects from a pack, we keep track of a set of one or more `reused_chunk`s, corresponding to sections of one or more object(s) from a source pack that we are reusing. Each chunk contains two pieces of information: - the offset of the first object in the source pack (relative to the beginning of the source pack) - the difference between that offset, and the corresponding offset in the pack we're generating The purpose of keeping track of these is so that we can patch an OFS_DELTAs that cross over a section of the reuse pack that we didn't take. For instance, consider a hypothetical pack as shown below: (chunk #2) __________... / / +--------+---------+-------------------+---------+ ... | | | (unused) | | ... +--------+---------+-------------------+---------+ \ / \______________/ (chunk #1) Suppose that we are sending objects "base", "other", and "delta", and that the "delta" object is stored as an OFS_DELTA, and that its base is "base". If we don't send any objects in the "(unused)" range, we can't copy the delta'd object directly, since its delta offset includes a range of the pack that we didn't copy, so we have to account for that difference when patching and reassembling the delta. In order to compute this value correctly, we need to know not only where we are in the packfile we're assembling (with `hashfile_total(f)`) but also the position of the first byte of the packfile that we are currently reusing. Together, these two allow us to compute the reused chunk's offset difference relative to the start of the reused pack, as desired. Helped-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 7682bd65bb..eb8be514d1 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1016,6 +1016,7 @@ static off_t find_reused_offset(off_t where) static void write_reused_pack_one(struct packed_git *reuse_packfile, size_t pos, struct hashfile *out, + off_t pack_start, struct pack_window **w_curs) { off_t offset, next, cur; @@ -1025,7 +1026,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, offset = pack_pos_to_offset(reuse_packfile, pos); next = pack_pos_to_offset(reuse_packfile, pos + 1); - record_reused_object(offset, offset - hashfile_total(out)); + record_reused_object(offset, + offset - (hashfile_total(out) - pack_start)); cur = offset; type = unpack_object_header(reuse_packfile, w_curs, &cur, &size); @@ -1095,6 +1097,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, struct hashfile *out, + off_t pack_start UNUSED, struct pack_window **w_curs) { size_t pos = 0; @@ -1126,10 +1129,12 @@ static void write_reused_pack(struct packed_git *reuse_packfile, { size_t i = 0; uint32_t offset; + off_t pack_start = hashfile_total(f) - sizeof(struct pack_header); struct pack_window *w_curs = NULL; if (allow_ofs_delta) - i = write_reused_pack_verbatim(reuse_packfile, f, &w_curs); + i = write_reused_pack_verbatim(reuse_packfile, f, pack_start, + &w_curs); for (; i < reuse_packfile_bitmap->word_alloc; ++i) { eword_t word = reuse_packfile_bitmap->words[i]; @@ -1146,7 +1151,7 @@ static void write_reused_pack(struct packed_git *reuse_packfile, * for why. */ write_reused_pack_one(reuse_packfile, pos + offset, f, - &w_curs); + pack_start, &w_curs); display_progress(progress_state, ++written); } } From patchwork Tue Nov 28 19:08:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471575 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="qRnv5MC9" Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A459410EC for ; Tue, 28 Nov 2023 11:08:36 -0800 (PST) Received: by mail-qt1-x831.google.com with SMTP id d75a77b69052e-41e3e77e675so32689671cf.1 for ; Tue, 28 Nov 2023 11:08:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198515; x=1701803315; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=seMXIWlc+7KjgA00SKA47VNvcWo21GjGL3BPgYiPQ4M=; b=qRnv5MC9e85F/ZOGfzsNQUOZnUUJ/xIlQu8k38FfOy6GVnKt2NFV5DuTZLCOZtfjva IeIt6BIJu7kXQeJqOmfciSXbW5nvcgzTGboJLKftNhXV1cLXZRp9rV4Ccf/IKPJq6OKU EN+B+5+N62posSJvSbXOQE1SdRObmwKBFu05P3I9BRdtukXjae38RPlsXuN55lwliDHo Wxz71BcgSd5TOcyZVvv4PrQ10EFafaPZ07w315XJ+4Q7y5MlzjcYcMcQV+A4Z3ZI61so 63JP4CS6TuRjD8eaqr0oDaiKyzvXTwRABGn/hoCFBIxN7qvm2FKhB8XMxTxgp2RnMK11 2mWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198515; x=1701803315; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=seMXIWlc+7KjgA00SKA47VNvcWo21GjGL3BPgYiPQ4M=; b=Xx6iBSH+n9zcjcRDJ4V0hbCE27KiFnNPhnO6I2vzqkV7WCIXeLlZzNlh+kVT9BAQfB iWJVODDlhXaFS0oYLjOGTboRTi6ADkR+m8+ZH2Ez672dAlcfCu19spIDNISSMGoZjSo/ 9ZULUyouLjzkiTym2zKCCSMr4iA47wktPK9HobuHp23n8Qey8l7LsjcyH9znv3RBKcmC 3g01FsKnc2gJTItGAjFnH+qFF6NNM76aR/lPUlGy4t+tHTA9wGEu58CHLZmSrZC1Amp9 +0BkKo/GoJfIa5eqS0hDXjDERwAMeKZBy1YDlzK7tyo/ekejLf5ZYjEy20FyIV70Ydtr fvPw== X-Gm-Message-State: AOJu0YwuNRbSClK9iH7W52xxSdNVgIhFSjRvq4s5tLf3GInvul34WXfB bBzoZTHHO65EkySH1Rt+BoQDvi4eeGwQF3qMo2o= X-Google-Smtp-Source: AGHT+IG1hy++/iv6rdBh06N0m7Odxno/5+8ErlwuyZi0LT5o8uBcA16MR4rKKpKNFcrexeFxoxbxeg== X-Received: by 2002:a05:622a:578c:b0:423:a391:68ae with SMTP id eh12-20020a05622a578c00b00423a39168aemr11311387qtb.26.1701198515681; Tue, 28 Nov 2023 11:08:35 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id cq9-20020a05622a424900b00423dbb19262sm660286qtb.78.2023.11.28.11.08.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:35 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:34 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 15/24] pack-objects: pass `bitmapped_pack`'s to pack-reuse functions Message-ID: <2bb01e2370fd01d8468e69da474c24266ba851fb.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Further prepare pack-objects to perform verbatim pack-reuse over multiple packfiles by converting functions that take in a pointer to a `struct packed_git` to instead take in a pointer to a `struct bitmapped_pack`. The additional information found in the bitmapped_pack struct (such as the bit position corresponding to the beginning of the pack) will be necessary in order to perform verbatim pack-reuse. Note that we don't use any of the extra pieces of information contained in the bitmapped_pack struct, so this step is merely preparatory and does not introduce any functional changes. Note further that we do not change the argument type to write_reused_pack_one(). That function is responsible for copying sections of the packfile directly and optionally patching any OFS_DELTAs to account for not reusing sections of the packfile in between a delta and its base. As such, that function is (and should remain) oblivious to multi-pack reuse, and does not require any of the extra pieces of information stored in the bitmapped_pack struct. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index eb8be514d1..3b7704d062 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -222,7 +222,8 @@ static int thin; static int num_preferred_base; static struct progress *progress_state; -static struct packed_git *reuse_packfile; +static struct bitmapped_pack *reuse_packfiles; +static size_t reuse_packfiles_nr; static uint32_t reuse_packfile_objects; static struct bitmap *reuse_packfile_bitmap; @@ -1095,7 +1096,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, copy_pack_data(out, reuse_packfile, w_curs, offset, next - offset); } -static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, +static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile, struct hashfile *out, off_t pack_start UNUSED, struct pack_window **w_curs) @@ -1110,13 +1111,13 @@ static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, off_t to_write; written = (pos * BITS_IN_EWORD); - to_write = pack_pos_to_offset(reuse_packfile, written) + to_write = pack_pos_to_offset(reuse_packfile->p, written) - sizeof(struct pack_header); /* We're recording one chunk, not one object. */ record_reused_object(sizeof(struct pack_header), 0); hashflush(out); - copy_pack_data(out, reuse_packfile, w_curs, + copy_pack_data(out, reuse_packfile->p, w_curs, sizeof(struct pack_header), to_write); display_progress(progress_state, written); @@ -1124,7 +1125,7 @@ static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, return pos; } -static void write_reused_pack(struct packed_git *reuse_packfile, +static void write_reused_pack(struct bitmapped_pack *reuse_packfile, struct hashfile *f) { size_t i = 0; @@ -1150,8 +1151,8 @@ static void write_reused_pack(struct packed_git *reuse_packfile, * bitmaps. See comment in try_partial_reuse() * for why. */ - write_reused_pack_one(reuse_packfile, pos + offset, f, - pack_start, &w_curs); + write_reused_pack_one(reuse_packfile->p, pos + offset, + f, pack_start, &w_curs); display_progress(progress_state, ++written); } } @@ -1207,9 +1208,12 @@ static void write_pack_file(void) offset = write_pack_header(f, nr_remaining); - if (reuse_packfile) { + if (reuse_packfiles_nr) { assert(pack_to_stdout); - write_reused_pack(reuse_packfile, f); + for (j = 0; j < reuse_packfiles_nr; j++) { + reused_chunks_nr = 0; + write_reused_pack(&reuse_packfiles[j], f); + } offset = hashfile_total(f); } @@ -3952,19 +3956,16 @@ static int pack_options_allow_reuse(void) static int get_object_list_from_bitmap(struct rev_info *revs) { - struct bitmapped_pack *packs = NULL; - size_t packs_nr = 0; - if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) return -1; if (pack_options_allow_reuse()) - reuse_partial_packfile_from_bitmap(bitmap_git, &packs, - &packs_nr, + reuse_partial_packfile_from_bitmap(bitmap_git, + &reuse_packfiles, + &reuse_packfiles_nr, &reuse_packfile_bitmap); - if (packs) { - reuse_packfile = packs[0].p; + if (reuse_packfiles) { reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); if (!reuse_packfile_objects) BUG("expected non-empty reuse bitmap"); From patchwork Tue Nov 28 19:08:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471576 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="IiTadKG0" Received: from mail-qk1-x72c.google.com (mail-qk1-x72c.google.com [IPv6:2607:f8b0:4864:20::72c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F2F910FB for ; Tue, 28 Nov 2023 11:08:39 -0800 (PST) Received: by mail-qk1-x72c.google.com with SMTP id af79cd13be357-77bc5d8490dso342118985a.2 for ; Tue, 28 Nov 2023 11:08:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198518; x=1701803318; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=M9jusTEKxOqvtQziRVIyM7EH2YLqR3mDeyS0923frbk=; b=IiTadKG0dsxG2UsEN1csrrO9Jf/QBcL8cRNclFURUas90MSIB1TTgbl5zslJ4A/BqU apZNSnXjkbcy6KNDFQYuv6Fgc6fYogK4sLi30zJeDE0ud+MY1wXuR6DQZaffELng+mjo lfKdCw3gT/psQdD0PrlRSNc2/y6HX7RBT/5urys6FfEJJC+IFK4G9fH3vEub7q1qHzS6 stYl9KvqJuqjTa7tJ5eoAr36JRe82Ia+sS6wDNjOUHe+FOEcKwRDTn1rluJ1N8Q+mRvg B2dSRMMM0hVEwhO1fJ6ZyAk/QgS4mX9xGLtW8zjP3aO2+mYFevCtOyLimn/+/EKsz8ce xVOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198518; x=1701803318; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=M9jusTEKxOqvtQziRVIyM7EH2YLqR3mDeyS0923frbk=; b=mu6zQ0w35h+ZhmE+tq1750I6mqYTL1+VcEgWhX3RmESgvoTE6ZyRQWP3rPenF0s+ji 7/eu3FkgRuBB0gIepboeCY5hy9YZYKmUNL7QavMxUcne4fL9QLR8Va0hhCGp7FJSICRj K9JZFIHWikf/xkBLMCJWqJVmyFpsC2NGQt9RRHeR9FTm5IEEXU57EX7UbukHOlG26zyJ HO2IWAEsSkDez1i9DQDsVKIzrqKEOQpiIM0Xem0pq82++1AUER4MeYwlMSL667FGvoG5 N9Qmxn9SCh8JjcXY3hrDe1uIEJq8+DAONPRFDp7U6juLRaxoW62qxJ0Praa7CLJ1QYTS 1O5w== X-Gm-Message-State: AOJu0Yzl9/jgfjwk0VBEe32QhYfsQKFSlbqOzWE/9Cbp1AAXye3/dM1e Qys6DlmJlQ8qyZYEDO/ECq+2PN1FKgB4aOD6L6U= X-Google-Smtp-Source: AGHT+IHjFhA4NNlmUgs6h9HgUf6Q1yNsX54zyZ9kiKeLlmztQC8xWh77JKJKDrdCoqthHKDau3w9PQ== X-Received: by 2002:a0c:f950:0:b0:67a:45d8:7a6e with SMTP id i16-20020a0cf950000000b0067a45d87a6emr7501734qvo.57.1701198518405; Tue, 28 Nov 2023 11:08:38 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 10-20020a05621420ca00b0067a2749baa5sm3556499qve.60.2023.11.28.11.08.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:38 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:37 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 16/24] pack-objects: prepare `write_reused_pack()` for multi-pack reuse Message-ID: <67a8196978244b56d4f60861f140b78c59d15e30.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The function `write_reused_pack()` within `builtin/pack-objects.c` is responsible for performing pack-reuse on a single pack, and has two main functions: - it dispatches a call to `write_reused_pack_verbatim()` to see if we can reuse portions of the packfile in whole-word chunks - for any remaining objects (that is, any objects that appear after the first "gap" in the bitmap), call write_reused_pack_one() on that object to record it for reuse. Prepare this function for multi-pack reuse by removing the assumption that the bit position corresponding to the first object being reused from a given pack may not be at bit position zero. The changes in this function are mostly straightforward. Initialize `i` to the position of the first word to contain bits corresponding to that reuse pack. In most situations, we throw the initialized value away, since we end up replacing it with the return value from write_reused_pack_verbatim(), moving us past the section of whole words that we reused. Likewise, modify the per-object loop to ignore any bits at the beginning of the first word that do not belong to the pack currently being reused, as well as skip to the "done" section once we have processed the last bit corresponding to this pack. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 3b7704d062..b5e6f6377a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1128,7 +1128,7 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile, static void write_reused_pack(struct bitmapped_pack *reuse_packfile, struct hashfile *f) { - size_t i = 0; + size_t i = reuse_packfile->bitmap_pos / BITS_IN_EWORD; uint32_t offset; off_t pack_start = hashfile_total(f) - sizeof(struct pack_header); struct pack_window *w_curs = NULL; @@ -1146,17 +1146,23 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile, break; offset += ewah_bit_ctz64(word >> offset); + if (pos + offset < reuse_packfile->bitmap_pos) + continue; + if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr) + goto done; /* * Can use bit positions directly, even for MIDX * bitmaps. See comment in try_partial_reuse() * for why. */ - write_reused_pack_one(reuse_packfile->p, pos + offset, + write_reused_pack_one(reuse_packfile->p, + pos + offset - reuse_packfile->bitmap_pos, f, pack_start, &w_curs); display_progress(progress_state, ++written); } } +done: unuse_pack(&w_curs); } From patchwork Tue Nov 28 19:08:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471577 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="KRIFk0JR" Received: from mail-qk1-x730.google.com (mail-qk1-x730.google.com [IPv6:2607:f8b0:4864:20::730]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1A921C3 for ; Tue, 28 Nov 2023 11:08:42 -0800 (PST) Received: by mail-qk1-x730.google.com with SMTP id af79cd13be357-77d90497b86so184410185a.0 for ; Tue, 28 Nov 2023 11:08:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198521; x=1701803321; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=EbKNY7VNNH+QCW70ZspylXD6r6Zjz9ZM4nRfzSIaKf4=; b=KRIFk0JR3s2PZOZZ+U8XvECgp/JmR3zGCGTRJP8uDOKAsV3DMMKuYZo1HJQQzCYH4c SZCqErDdkG6U+4P9m20r9DVQsWTS6MIdmwOTDJNW83DTR5XjbHPOfwwVv4PPLOCc+vR/ WoEPs1rmy6D1nk3PCNsnl4/J5tFPTRAlvfup+lCAWG4cX7UARTlTvQxLVxdFMMGmGdJt s+hNYwvymNs4LJykDts9otWzILcnqR3JgmGB7M5rnhAUFqzPLVF1SO0yeSu2lZCsMgQ2 l1htOkK7hC3vXtoL7Xefuf+8Xc3oh3ZEUVnCpgt4TJWJfGMIy69g0x4zKGtKCI+syork Y4yA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198521; x=1701803321; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=EbKNY7VNNH+QCW70ZspylXD6r6Zjz9ZM4nRfzSIaKf4=; b=DQYmYzMPIwK4yX4B8rexb4bKN/d30NXRGqmej0X466CL2WjkeyrRwBHrc3rmJENuBl xG0su9AzFOV1OSshrNOrYAXDavqhuuJgPPvxiaWkTe4geIWyr4wOWUs8mBuDAkzgRTVJ iVKJ4mtgE5SDKY4Is+734fIDQyWASSorZWnffck2dv1wJf0noVVUJDe4qeAYEfU15CDB HnlhdctXJzLEc4WLzarxpHF6ttKzQrCzYxY5TJ/6WoEbsSSEfyJLCu2m+4/mPQzFwGxv 0tTiIVSQ2GuwdDQv3MZyLms9uILpQRmVcFR0aPPgOmV+NSgwZUgCBd/Z6vjO7CzQx+as J6SQ== X-Gm-Message-State: AOJu0YzDqJM/a7wdXG5gYGr/sDreugLWY8xDGwtgazRRoAuen/iF2+Qd 9zGIwFzFEy3ZaIYqCbj/5eP73u0D1hkeTWHkbe8= X-Google-Smtp-Source: AGHT+IHks86Jl2+dFFRowpcz9ARa1ei7WhDjOqPmzqLwsNMJMXYjxIKXvXoz23XWptWyT7JcypR77w== X-Received: by 2002:a05:620a:1455:b0:77d:73fd:c79 with SMTP id i21-20020a05620a145500b0077d73fd0c79mr15964483qkl.53.1701198520953; Tue, 28 Nov 2023 11:08:40 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id bi40-20020a05620a31a800b0077da68b8801sm2066973qkb.65.2023.11.28.11.08.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:40 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:40 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 17/24] pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse Message-ID: <1f766648df5f80a178d6eae31cb25fc8484ef84e.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The function `write_reused_pack_verbatim()` within `builtin/pack-objects.c` is responsible for writing out a continuous set of objects beginning at the start of the reuse packfile. In the existing implementation, we did something like: while (pos < reuse_packfile_bitmap->word_alloc && reuse_packfile_bitmap->words[pos] == (eword_t)~0) pos++; if (pos) /* write first `pos * BITS_IN_WORD` objects from pack */ as an optimization to record a single chunk for the longest continuous prefix of objects wanted out of the reuse pack, instead of having a chunk for each individual object. For more details, see bb514de356 (pack-objects: improve partial packfile reuse, 2019-12-18). In order to retain this optimization in a multi-pack reuse world, we can no longer assume that the first object in a pack is on a word boundary in the bitmap storing the set of reusable objects. Assuming that all objects from the beginning of the reuse packfile up to the object corresponding to the first bit on a word boundary are part of the result, consume whole words at a time until the last whole word belonging to the reuse packfile. Copy those objects to the resulting packfile, and track that we reused them by recording a single chunk. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 73 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 60 insertions(+), 13 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index b5e6f6377a..e37509568b 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1098,31 +1098,78 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile, struct hashfile *out, - off_t pack_start UNUSED, + off_t pack_start, struct pack_window **w_curs) { - size_t pos = 0; + size_t pos = reuse_packfile->bitmap_pos; + size_t end; - while (pos < reuse_packfile_bitmap->word_alloc && - reuse_packfile_bitmap->words[pos] == (eword_t)~0) - pos++; + if (pos % BITS_IN_EWORD) { + size_t word_pos = (pos / BITS_IN_EWORD); + size_t offset = pos % BITS_IN_EWORD; + size_t last; + eword_t word = reuse_packfile_bitmap->words[word_pos]; - if (pos) { - off_t to_write; + if (offset + reuse_packfile->bitmap_nr < BITS_IN_EWORD) + last = offset + reuse_packfile->bitmap_nr; + else + last = BITS_IN_EWORD; - written = (pos * BITS_IN_EWORD); - to_write = pack_pos_to_offset(reuse_packfile->p, written) - - sizeof(struct pack_header); + for (; offset < last; offset++) { + if (word >> offset == 0) + return word_pos; + if (!bitmap_get(reuse_packfile_bitmap, + word_pos * BITS_IN_EWORD + offset)) + return word_pos; + } + + pos += BITS_IN_EWORD - (pos % BITS_IN_EWORD); + } + + /* + * Now we're going to copy as many whole eword_t's as possible. + * "end" is the index of the last whole eword_t we copy, but + * there may be additional bits to process. Those are handled + * individually by write_reused_pack(). + * + * Begin by advancing to the first word boundary in range of the + * bit positions occupied by objects in "reuse_packfile". Then + * pick the last word boundary in the same range. If we have at + * least one word's worth of bits to process, continue on. + */ + end = reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr; + if (end % BITS_IN_EWORD) + end -= end % BITS_IN_EWORD; + if (pos >= end) + return reuse_packfile->bitmap_pos / BITS_IN_EWORD; + + while (pos < end && + reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0) + pos += BITS_IN_EWORD; + + if (pos > end) + pos = end; + + if (reuse_packfile->bitmap_pos < pos) { + off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0); + off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p, + pos - reuse_packfile->bitmap_pos); + + written += pos - reuse_packfile->bitmap_pos; /* We're recording one chunk, not one object. */ - record_reused_object(sizeof(struct pack_header), 0); + record_reused_object(pack_start_off, + pack_start_off - (hashfile_total(out) - pack_start)); hashflush(out); copy_pack_data(out, reuse_packfile->p, w_curs, - sizeof(struct pack_header), to_write); + pack_start_off, pack_end_off - pack_start_off); display_progress(progress_state, written); } - return pos; + if (pos % BITS_IN_EWORD) + BUG("attempted to jump past a word boundary to %"PRIuMAX, + (uintmax_t)pos); + return pos / BITS_IN_EWORD; } static void write_reused_pack(struct bitmapped_pack *reuse_packfile, From patchwork Tue Nov 28 19:08:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471578 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="nvVTDDYK" Received: from mail-ot1-x336.google.com (mail-ot1-x336.google.com [IPv6:2607:f8b0:4864:20::336]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A74C92 for ; Tue, 28 Nov 2023 11:08:44 -0800 (PST) Received: by mail-ot1-x336.google.com with SMTP id 46e09a7af769-6d817ccaa6dso2067134a34.2 for ; Tue, 28 Nov 2023 11:08:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198523; x=1701803323; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=3XYPYw4QcsT+ncSJlBUK7JQLmFa9VjpCKiOD0dkIe5E=; b=nvVTDDYKLWokrI+7ZpDK/djx2AkY60cG+oRpiJk5QBL1gjv3T/QbSKVEkCG7wEsapk KbnY/dPb3czGoC6COMxqqiXDLTlvd1tovtIf+tmjSWu9SiIEEsrd8Auz40MfvX5s7oGK nvFQAZCnppgCxPVxMZomkNR2NW2wMfiP55SpMSLZOGtbL6vMSGjWaBJ6CNyNH2OLvS5R KqDrC3UspK+OXkv7PYjBp7u1//nXetm4fIsMjcMjOF/Wjk0d8l43D2ITkyoKRiGg2Tq2 qLqYhg0D7nQ3OpO47fXfQEEDPsZWahqm4S8G1jdcYrL2MO6T3IwgfVdoCXsfmh1HyB/x yZQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198523; x=1701803323; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=3XYPYw4QcsT+ncSJlBUK7JQLmFa9VjpCKiOD0dkIe5E=; b=uptQas0y6hUs5bpughkMBZrNx3OfPluJSHJvftE0bNxwfJZQN9bN4kq/qzWtmIBDp5 eOakbC0fzez9wcpwlhsYD6N134I7A8tNgA+WIuZH0+lP4PrPHe2ZnOgXVSPGYB5kbgaC ZgwnWqz+bAyaQfJE+QB0Lfmy6g9dqlafO8dnxu1Gf+IBO3hq+xfxcRGxApnzAAImRV7U PplcnKBJvdsw5+BIkljy+0UzF45/pkg7R7HqaGOltafo5tQLlFt+5LY+5fYiv1gZsCmx QqRKTgiXpBvlr9HYJOI7sU/FKoJRfW7MGx9slvij0oyhBPSH/V/et4/YkMKPBC9uya9x +HlA== X-Gm-Message-State: AOJu0Yx8niMX80Fn9Tv0xhpyjFsRmAy/mpelbbPTLtPz0NYRGcjeakUA NqpzwXvgXOWGBeGx0X+ijYuyR7l4OwVS/H5ZGrerwA== X-Google-Smtp-Source: AGHT+IFMTfbOGU/R2LDK7fQF+9EIK1fL/ZBJ18szCO7R2j9yATrQ+lLLzwbVG/PLfs511GAVdJ8e/g== X-Received: by 2002:a9d:7382:0:b0:6d7:f8c7:1a73 with SMTP id j2-20020a9d7382000000b006d7f8c71a73mr17675235otk.24.1701198523508; Tue, 28 Nov 2023 11:08:43 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id gv9-20020a056214262900b0067a2354f8besm3808396qvb.91.2023.11.28.11.08.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:43 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:42 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 18/24] pack-objects: include number of packs reused in output Message-ID: <4cd9f99bfdda0fa7db2372ca194572df1000304f.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In addition to including the number of objects reused verbatim from a reuse-pack, include the number of packs from which objects were reused. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index e37509568b..902e70abc5 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -224,6 +224,7 @@ static struct progress *progress_state; static struct bitmapped_pack *reuse_packfiles; static size_t reuse_packfiles_nr; +static size_t reuse_packfiles_used_nr; static uint32_t reuse_packfile_objects; static struct bitmap *reuse_packfile_bitmap; @@ -1266,6 +1267,8 @@ static void write_pack_file(void) for (j = 0; j < reuse_packfiles_nr; j++) { reused_chunks_nr = 0; write_reused_pack(&reuse_packfiles[j], f); + if (reused_chunks_nr) + reuse_packfiles_used_nr++; } offset = hashfile_total(f); } @@ -4612,9 +4615,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," " reused %"PRIu32" (delta %"PRIu32")," - " pack-reused %"PRIu32), + " pack-reused %"PRIu32" (from %"PRIuMAX")"), written, written_delta, reused, reused_delta, - reuse_packfile_objects); + reuse_packfile_objects, + (uintmax_t)reuse_packfiles_used_nr); cleanup: free_packing_data(&to_pack); From patchwork Tue Nov 28 19:08:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471579 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="VzUPdWqp" Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A970198D for ; Tue, 28 Nov 2023 11:08:47 -0800 (PST) Received: by mail-qk1-x729.google.com with SMTP id af79cd13be357-77a453eb01cso311571685a.0 for ; Tue, 28 Nov 2023 11:08:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198526; x=1701803326; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Oe0M/K9IEuvFNDS0SqHzhQrVS8Rynou0B7htzngm67E=; b=VzUPdWqp8R+aiA0jN9vIObiQhvtkEBzr3/FHB/M76gkdIPbdPUpHR2E8iy0ZfZNpXU Lsw+7HtEEtglm6F+LbZoyh4qPtNkOx09Vx1fmuIIl0tBltNw9Fy6bSjh5ahKJqF/Pxbq gvdPkssdpcxaw5YhDEwT9iED8HRiuPljYMnS5Q1cHOm6XRIAOpH9u4FyrsI6iPkMBkUv pUTelLmDt0oOQ4bvYGoHXSthveCLoDdYYUHSQUFle3trYk9ncMRVnJQGxcITymPd8DLn xZtzzBJCVe8cbv9xiixCyGMQ9RQcPlUp0OX5b0yJnaTBGCbpKPXDINjRBpdYUmnATYAU 1maA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198526; x=1701803326; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Oe0M/K9IEuvFNDS0SqHzhQrVS8Rynou0B7htzngm67E=; b=E2tDfaidBn0JxyZ0swJ5kjhTup83je7VMdCZYDYzvntcuqEQ6VyoLC7gsJiKrYVJoP +uKGuAEI3hQaBUqwnQpNfUWdfuHgDCQ9/rosoLVL45m/M0iVPGrts+gJsnGb1hYOWEIe eXRzfvJ7/VreNeA4+q8AFI0z9K46jUUMdw3nXkJV6JzBVE05BPb5Zz7Zu6QpBb5glZ5W m9OjhnnO2Uk5UeALUxXs7ksD6to5cEbsqLPvm5JKjuVpDjWA9g7/sfejGkpBULW/tN0e pGUefQtPte0PEMvB8JUL9YYGxnlM6nKLuY2Er5ypX+y7kyWau4U8IB6irhhDQ8g4TEtF UcmA== X-Gm-Message-State: AOJu0YxOGoENDga8TvY1lgi8Q2TA0LsWpla6cB4GEDvl8o0a9KB7oexU uiIqRPABtEID3Ob3e1kjE+Y6dsAIks3bPg/xZ6w= X-Google-Smtp-Source: AGHT+IFFPr/RYubkQXaptnXUmKcaXLsuNQ+fj7YFmyeLLN4iljBtRLlBNY0QmJ3jSjprj3ZnXHlUFQ== X-Received: by 2002:ae9:ee05:0:b0:77d:645a:85e7 with SMTP id i5-20020ae9ee05000000b0077d645a85e7mr14906765qkg.42.1701198526221; Tue, 28 Nov 2023 11:08:46 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id de8-20020a05620a370800b0077d78afc513sm3468060qkb.110.2023.11.28.11.08.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:46 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:45 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 19/24] pack-bitmap: prepare to mark objects from multiple packs for reuse Message-ID: <176c4c95ac90a37b9ccf684e39126836d2cadc97.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that the pack-objects code is equipped to handle reusing objects from multiple packs, prepare the pack-bitmap code to mark objects from multiple packs as reuse candidates. In order to prepare the pack-bitmap code for this change, remove the same set of assumptions we unwound in previous commits from the helper function `reuse_partial_packfile_from_bitmap_1()`, in preparation for it to be called in a loop over the set of disjoint packs in a following commit. Specifically, remove the assumption that the bit position corresponding to the first object in a given reuse pack candidate is at a word boundary. Like in previous commits, we have to walk up to the first word boundary before marking whole words at a time for reuse. Unlike in previous commits, however, we have to keep track of whether all of the objects in the run-up to the first word boundary are wanted in the resulting pack. This is because we cannot blindly reuse whole words at a time unless we know for certain that we are sending all bases for any objects stored as deltas within each word. Once we're on a word boundary (provided that we want a complete prefix of objects from the pack), we can then reuse the same "whole-words" optimization from previous patches, marking all of the bits in a single word at a time. Any remaining objects (either from a partial word corresponding to objects at the end of a pack, or starting from the middle of the pack if we do not have a complete prefix) are dealt with individually via try_partial_reuse(), which (among other things) ensures that we are sending the necessary bases for all objects packed as OFS_DELTA or REF_DELTA. Signed-off-by: Taylor Blau --- pack-bitmap.c | 113 +++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 93 insertions(+), 20 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 670deec909..be53fc6da5 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1940,36 +1940,109 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git { struct bitmap *result = bitmap_git->result; struct pack_window *w_curs = NULL; - size_t i = 0; + size_t pos, offset; + unsigned complete_prefix = 1; - while (i < result->word_alloc && result->words[i] == (eword_t)~0) - i++; + pos = pack->bitmap_pos / BITS_IN_EWORD; + offset = pack->bitmap_pos % BITS_IN_EWORD; /* - * Don't mark objects not in the packfile or preferred pack. This bitmap - * marks objects eligible for reuse, but the pack-reuse code only - * understands how to reuse a single pack. Since the preferred pack is - * guaranteed to have all bases for its deltas (in a multi-pack bitmap), - * we use it instead of another pack. In single-pack bitmaps, the choice - * is made for us. + * If the position of our first object is not on a word + * boundary, check all bits individually until we reach the + * first word boundary. + * + * If no bits are missing between pack->bitmap_pos and the next + * word boundary, then we can move by whole words instead of by + * individual objects. If one or more of the objects are missing + * in that range, we must evaluate each subsequent object + * individually in order to exclude deltas whose base we are not + * sending, etc. */ - if (i > pack->p->num_objects / BITS_IN_EWORD) - i = pack->p->num_objects / BITS_IN_EWORD; + if (offset) { + /* + * Scan to the next word boundary, or through the last + * object in this bitmap, whichever occurs earlier. + */ + size_t last; + eword_t word = result->words[pos]; + if (pack->bitmap_nr < BITS_IN_EWORD - offset) + last = offset + pack->bitmap_nr; + else + last = BITS_IN_EWORD; - memset(reuse->words, 0xFF, i * sizeof(eword_t)); + for (; offset < last; offset++) { + size_t pack_pos; + if (word >> offset == 0) { + complete_prefix = 0; + continue; + } - for (; i < result->word_alloc; ++i) { - eword_t word = result->words[i]; - size_t pos = (i * BITS_IN_EWORD); - size_t offset; + offset += ewah_bit_ctz64(word >> offset); - for (offset = 0; offset < BITS_IN_EWORD; ++offset) { - if ((word >> offset) == 0) + pack_pos = pos * BITS_IN_EWORD + offset; + pack_pos -= pack->bitmap_pos; + + try_partial_reuse(pack, pack_pos, reuse, &w_curs); + if (!bitmap_get(reuse, pos + pack->bitmap_pos)) + complete_prefix = 0; + } + + pos++; + } + + if (complete_prefix) { + /* + * If we are using all of the objects at the beginning + * of this pack, we can safely reuse objects in eword_t + * sized chunks, since we are guaranteed to send all + * potential delta bases. + * + * Scan the nearest word boundaries within range of this + * pack's bit positions. If the pack does not start on a + * word boundary, skip to the next boundary, since we + * have already checked above. + */ + size_t start = pos; + size_t word_end = (pack->bitmap_pos + pack->bitmap_nr) / BITS_IN_EWORD; + while (start <= pos && pos < word_end && + pos < result->word_alloc && + result->words[pos] == (eword_t)~0) + pos++; + memset(reuse->words + start, 0xFF, (pos - start) * sizeof(eword_t)); + } + + /* + * At this point, we know that we are at an eword boundary, + * either because: + * + * - we started at one and used zero or more whole words + * following pack->bitmap_pos + * + * - we started in between two word boundaries, advanced + * forward to the next word boundary, and then used zero or + * more (assuming a complete prefix) whole words following. + */ + for (; pos < result->word_alloc; pos++) { + eword_t word = result->words[pos]; + + for (offset = 0; offset < BITS_IN_EWORD; offset++) { + size_t bit_pos, pack_pos; + if (word >> offset == 0) break; offset += ewah_bit_ctz64(word >> offset); - if (try_partial_reuse(pack, pos + offset, - reuse, &w_curs) < 0) { + + bit_pos = pos * BITS_IN_EWORD + offset; + if (bit_pos >= pack->bitmap_pos + pack->bitmap_nr) + goto done; + + pack_pos = bit_pos - pack->bitmap_pos; + if (pack_pos >= pack->p->num_objects) + BUG("advanced beyond the end of pack %s (%"PRIuMAX" > %"PRIu32")", + pack_basename(pack->p), (uintmax_t)pack_pos, + pack->p->num_objects); + + if (try_partial_reuse(pack, pack_pos, reuse, &w_curs) < 0) { /* * try_partial_reuse indicated we couldn't reuse * any bits, so there is no point in trying more From patchwork Tue Nov 28 19:08:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471580 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="ZFYeW17+" Received: from mail-qt1-x82f.google.com (mail-qt1-x82f.google.com [IPv6:2607:f8b0:4864:20::82f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBB3F170B for ; Tue, 28 Nov 2023 11:08:49 -0800 (PST) Received: by mail-qt1-x82f.google.com with SMTP id d75a77b69052e-423cd566ffeso11445321cf.3 for ; Tue, 28 Nov 2023 11:08:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198529; x=1701803329; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=kiCJmkcXn9qQCJTM9uPvAKRKRLAuz4ZM0prseVj+urw=; b=ZFYeW17+0yIGXCtfL0aj7xYR8Z+reIDS/lyUisytWnhgna+sgaxPU3D3r/VfSZSnH1 3oRFVwkCHENSiltEENmaVQF5qys+G9ETtYQIqJ0LVoUgPWIgvc+xIHUHmuGCF5gj52CN rd2jUTKiy1M5GV3V6bYiK2bCuUgicfGQTcwXo5fE4QA3rZnpsDQO+UqaDMkW+JsNxOzk ggO1hFqXB5l2Wldtx7oLRMCZ1sqkXyouIvO8jjQ3+JO+Buvu6TTwtxP7vf72d76xXdyl NEcrHbaxyVEVXv/NFhDT8kLdnK7LG9q1CXhyGUzHKcGqpegDHmQfR9RHyniJtf3j1l41 BAng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198529; x=1701803329; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=kiCJmkcXn9qQCJTM9uPvAKRKRLAuz4ZM0prseVj+urw=; b=N865jszPzfoaeGATKJz+Dk5HjEtRC1xEqcARLgzmeM6dTM0lyCXHqmzgGJ3D86J/4J nE15NL7I2ogjVpaKDmHaMGC7ivVetj4nAYFTyrfRjM/+kqEuoAUhR4Qiulmu9pPFT/xq 0Kt6jK0aNniDA0fSPrQ+lJvYbOC/brwsVAKs3UCri7Ktnp2reac83K92aeWdo56SSXxE 2OGSFPhGwUUarPe1ix3vyCbFYLq9QraCFwHgNwotFk/qnuX3xbGaxuUdCLRvW70f+hs6 2VOml5H/a7086NxUa2BiBkY7hN7bQoqiwxOUowKn1WintQnT5ZdAuqdqnAXx0HMVNqQ+ QqQQ== X-Gm-Message-State: AOJu0YzT8UrtYveJ+BNZNgJXl8OvS0PQzUyr9h2M/RuqU/+NzW258ffN w7tmFpOsaMHpbutyOu3gRWPqaHjAUM1NOPlaktA= X-Google-Smtp-Source: AGHT+IE8Xu8lleWaYBp1rvv5e+4j+2e5vimacSwsajpA2purrZobdsVSFP8/yJLTaNNTF8JsJuWiWA== X-Received: by 2002:ac8:58ca:0:b0:413:5e4d:bf46 with SMTP id u10-20020ac858ca000000b004135e4dbf46mr24273798qta.11.1701198528874; Tue, 28 Nov 2023 11:08:48 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id d5-20020ac85345000000b00417f330026bsm4796874qto.49.2023.11.28.11.08.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:48 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:47 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 20/24] pack-objects: add tracing for various packfile metrics Message-ID: <46f1a03b8b15f0c4432c26aca1e842670638c798.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: As part of the multi-pack reuse effort, we will want to add some tests that assert that we reused a certain number of objects from a certain number of packs. We could do this by grepping through the stderr output of `pack-objects`, but doing so would be brittle in case the output format changed. Instead, let's use the trace2 mechanism to log various pieces of information about the generated packfile, which we can then use to compare against desired values. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 902e70abc5..fa71fe1ccf 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4620,6 +4620,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) reuse_packfile_objects, (uintmax_t)reuse_packfiles_used_nr); + trace2_data_intmax("pack-objects", the_repository, "written", written); + trace2_data_intmax("pack-objects", the_repository, "written/delta", written_delta); + trace2_data_intmax("pack-objects", the_repository, "reused", reused); + trace2_data_intmax("pack-objects", the_repository, "reused/delta", reused_delta); + trace2_data_intmax("pack-objects", the_repository, "pack-reused", reuse_packfile_objects); + trace2_data_intmax("pack-objects", the_repository, "packs-reused", reuse_packfiles_used_nr); + cleanup: free_packing_data(&to_pack); list_objects_filter_release(&filter_options); From patchwork Tue Nov 28 19:08:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471581 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="b1qipxcP" Received: from mail-qt1-x82b.google.com (mail-qt1-x82b.google.com [IPv6:2607:f8b0:4864:20::82b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF5C01702 for ; Tue, 28 Nov 2023 11:08:52 -0800 (PST) Received: by mail-qt1-x82b.google.com with SMTP id d75a77b69052e-423d9d508d1so6051131cf.1 for ; Tue, 28 Nov 2023 11:08:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198531; x=1701803331; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=vmeNHyN2c0V60ZBnVmcpeDCxDkDyAskNROhNrlgkMg0=; b=b1qipxcP+T+FON0bdpTx6+zkn/mzZKUYc3C5yas+CbWci5+DGUUO3hLzi7d3onrOeH Z8HEjERg3qihq5t3oU4RbqkUb5FwBREtlfssP+kYkChw2DRzs97ZNVufQKaH8q1ZHpEL PUfX/101XtvNLKysUyLOj4aeOwo4P8OIhXE8QGaNTIpG+paNjcmz8uKo7HCQo+0qXAK8 whAfKMiUAgQ/UqApIo3vUcxHa9aKH11tFGQdss5pGqkN7Z6Uj6Zb7rbBV/2T55rKpU9u sjdx3RC+IsjKcmufmfoB7MEO/ldchXJ2B3i/6/bf1EANpsur3HURkqqPAPSPDAbkuUDt +YrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198531; x=1701803331; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=vmeNHyN2c0V60ZBnVmcpeDCxDkDyAskNROhNrlgkMg0=; b=AQcuDKyNc3L6WQj+TYW2rJr6grBrWXSrmP5bO8j02cSLrGNE1kLn9vG/wy5s3LT11d n6XsWPGdKQhKTpV1oxw3WkQ1gueBEfwe4sRVGH/62oeRW36H/TdozEvXjDHtnistqisk UNkKYZbi0kCUknW9HOmQN9+BWmmLEzhngjfAD1dVuH+dB6bZ/VNtHfP5jltTgfRLpmvE 3w6p7D4NiCd9JtDDe+2ziNdHqbeDN/1NOWDs0TJEJ41P+onndaR6X0g6k9fdFLYAHn4S 8tMr+YO8VS0lcS9yR29PePsCQLtzMFPDFI1vgTKWM2l8ubxE/59eoiagJs/vDbB8L7NH HJWQ== X-Gm-Message-State: AOJu0YyTrM1cGvLKsKd4Q3WhTz5CggCAHGlfyTPzZEukuIYFfVTiI8GV h08BIHHy8HDXH3ay1qnINsVWAr08ggseNqph57w= X-Google-Smtp-Source: AGHT+IEzxAZAQ4K7axNN2gIMpvod88X2nlvkDx12/1G3ec9wP0tXTChnqyizDjx8lxuinzOZ9573QA== X-Received: by 2002:ac8:5794:0:b0:41e:2d8e:b364 with SMTP id v20-20020ac85794000000b0041e2d8eb364mr22399252qta.53.1701198531477; Tue, 28 Nov 2023 11:08:51 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id u6-20020a05622a14c600b00423a3f5ea75sm3254360qtx.21.2023.11.28.11.08.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:51 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:50 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 21/24] t/test-lib-functions.sh: implement `test_trace2_data` helper Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Introduce a helper function which looks for a specific (category, key, value) tuple in the output of a trace2 event stream. We will use this function in a future patch to ensure that the expected number of objects are reused from an expected number of packs. Signed-off-by: Taylor Blau --- t/test-lib-functions.sh | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 9c3cf12b26..93fe819b0a 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1874,6 +1874,20 @@ test_region () { return 0 } +# Check that the given data fragment was included as part of the +# trace2-format trace on stdin. +# +# test_trace2_data +# +# For example, to look for trace2_data_intmax("pack-objects", repo, +# "reused", N) in an invocation of "git pack-objects", run: +# +# GIT_TRACE2_EVENT="$(pwd)/trace.txt" git pack-objects ... && +# test_trace2_data pack-objects reused N X-Patchwork-Id: 13471582 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="xXUmDOHp" Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 82CB819A7 for ; Tue, 28 Nov 2023 11:08:55 -0800 (PST) Received: by mail-qk1-x729.google.com with SMTP id af79cd13be357-77d708e4916so306775885a.3 for ; Tue, 28 Nov 2023 11:08:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198534; x=1701803334; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=R+HnTrDUlLfFrEvRBf2+sp7Svlm3XV3W17W8ck0s8Vg=; b=xXUmDOHpWucW17oWjqCOG5I07/K3izTR1WgC/NVEgc3jRTvvBi1TXUinH/QSVZtj4d EtDYO60Tu2g6O/+PFti46prfj6Vm4rZ+qk4somsGT5hYUzJ0F3uYwJrXsw3k22UlqxXs vUeyO/+H57ite9TSw3JFpDPen7cIMn0pobsdQImidFCq/gHeImPJJXFOK2jaX8w3Hyot frdAyZs/ycWbvwmOB70SHExTCPcqzlrQPKissK5NqKunJNslZt5IjBJgJoZc9ChH63v2 ciu7fR+fEiREIohBcnRlimS4AOVjQtvEMQNjRrNV/LtjmO9qWLr5OZCYhpZhz61n0L4V Irwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198534; x=1701803334; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=R+HnTrDUlLfFrEvRBf2+sp7Svlm3XV3W17W8ck0s8Vg=; b=q6f48r/L4BSwcpJ6Vg6UH04viGlb+NMZicGUcuJf4PuDLGIot176JWlih6JLX7K2Hu 1Be7f29Hqw86HnhRLeByDSiCirSULC2dSXlVq9weiGBnQyq9EhUrAyTFCRUNnInF5P38 sXJ1KAhgTIGYiYFV/kQTy0xn5so/uFFVUg950k437RUpSCRzhXinTbM5v++PwrEgY348 2N6rtq3bLeYvgoyUMxK4FZpKEDuC+LYLGGr6D9FltXlCsOpBCv7c3/ONzbGoBHs1rtt4 H9sDBNEKtI9gf5ekok971ehh9ORd5bXwtcSpA0CesMFheGbywWn28G+aMXZ3NuWOZ4EQ 3kXg== X-Gm-Message-State: AOJu0YzNS3ySsdrOg37NfdV8KqxEXkhA9qw2iap9dJzWsO4tzUPZ0xDq glEjj0hxUEj+rIxYsItk/cd/BRN7fnCUpHy/3C0= X-Google-Smtp-Source: AGHT+IGxaE8ImNoF7U7VZoKd8C5gGQMeIABvjNTGv1s3cE7DFAsy8tYyuYyS0//76fFALcU2jWdIbw== X-Received: by 2002:a05:620a:1402:b0:77b:c536:3722 with SMTP id d2-20020a05620a140200b0077bc5363722mr16353079qkj.25.1701198534518; Tue, 28 Nov 2023 11:08:54 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id u21-20020ae9c015000000b0076efaec147csm4704615qkk.45.2023.11.28.11.08.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:53 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:53 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 22/24] pack-objects: allow setting `pack.allowPackReuse` to "single" Message-ID: <3140a1703a7a0dbf2787b953c7cdb0c0e1803a6f.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In e704fc7978 (pack-objects: introduce pack.allowPackReuse, 2019-12-18), the `pack.allowPackReuse` configuration option was introduced, allowing users to disable the pack reuse mechanism. To prepare for debugging multi-pack reuse, allow setting configuration to "single" in addition to the usual bool-or-int values. "single" implies the same behavior as "true", "1", "yes", and so on. But it will complement a new "multi" value (to be introduced in a future commit). When set to "single", we will only perform pack reuse on a single pack, regardless of whether or not there are multiple disjoint packs. This requires no code changes (yet), since we only support single pack reuse. Signed-off-by: Taylor Blau --- Documentation/config/pack.txt | 2 +- builtin/pack-objects.c | 19 ++++++++++++++++--- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt index f50df9dbce..fe100d0fb7 100644 --- a/Documentation/config/pack.txt +++ b/Documentation/config/pack.txt @@ -28,7 +28,7 @@ all existing objects. You can force recompression by passing the -F option to linkgit:git-repack[1]. pack.allowPackReuse:: - When true, and when reachability bitmaps are enabled, + When true or "single", and when reachability bitmaps are enabled, pack-objects will try to send parts of the bitmapped packfile verbatim. This can reduce memory and CPU usage to serve fetches, but might result in sending a slightly larger pack. Defaults to diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index fa71fe1ccf..4853e91251 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -230,7 +230,10 @@ static struct bitmap *reuse_packfile_bitmap; static int use_bitmap_index_default = 1; static int use_bitmap_index = -1; -static int allow_pack_reuse = 1; +static enum { + NO_PACK_REUSE = 0, + SINGLE_PACK_REUSE, +} allow_pack_reuse = SINGLE_PACK_REUSE; static enum { WRITE_BITMAP_FALSE = 0, WRITE_BITMAP_QUIET, @@ -3246,7 +3249,17 @@ static int git_pack_config(const char *k, const char *v, return 0; } if (!strcmp(k, "pack.allowpackreuse")) { - allow_pack_reuse = git_config_bool(k, v); + int res = git_parse_maybe_bool_text(v); + if (res < 0) { + if (!strcasecmp(v, "single")) + allow_pack_reuse = SINGLE_PACK_REUSE; + else + die(_("invalid pack.allowPackReuse value: '%s'"), v); + } else if (res) { + allow_pack_reuse = SINGLE_PACK_REUSE; + } else { + allow_pack_reuse = NO_PACK_REUSE; + } return 0; } if (!strcmp(k, "pack.threads")) { @@ -4002,7 +4015,7 @@ static void loosen_unused_packed_objects(void) */ static int pack_options_allow_reuse(void) { - return allow_pack_reuse && + return allow_pack_reuse != NO_PACK_REUSE && pack_to_stdout && !ignore_packed_keep_on_disk && !ignore_packed_keep_in_core && From patchwork Tue Nov 28 19:08:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13471583 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="EKZBdRCO" Received: from mail-qv1-xf30.google.com (mail-qv1-xf30.google.com [IPv6:2607:f8b0:4864:20::f30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5002619B7 for ; Tue, 28 Nov 2023 11:08:58 -0800 (PST) Received: by mail-qv1-xf30.google.com with SMTP id 6a1803df08f44-67a12079162so25628056d6.1 for ; Tue, 28 Nov 2023 11:08:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198537; x=1701803337; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=PiEG9Jo34bwgYs5P7LAA1JzjVLTB5iGtzHWSpByPLeE=; b=EKZBdRCOdmEePikDu/rDfpnsShIqjigJo0tQippx6YIV9QI2AuQRf26ghBKCdEczeC cC3LxmJmRv59AtjjMX3EF8XmWJw5Zi8kfnF7HOF9ziZ3ATtFTbXBUHIjEBnD9VKw/jeA S+zMnZDjr/fJTIX/wBfqtJWdtxDSnAsEP4kx6RMRxunaPLGTquNLFII2vTJzsSB1x5s8 mc7B8Md1+HufCnYIjkaDaRokxmXTxtEOp8MgbmgFF9rC2Qf4Isol7Rtn8jKtSca6VRAo ITFXKx/1EUoQVAbViBNn9pPlxTVsLMBfnU+hQElbBKo3pLGCL1RwlBCeXegcrhoiiqbl 5O3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198537; x=1701803337; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=PiEG9Jo34bwgYs5P7LAA1JzjVLTB5iGtzHWSpByPLeE=; b=SQ+hw9HQik6FnBiRa12oQ9QMAKUHw6AmHtvSogcPp3WTp7mW0/oFKsODoN+wnGOWvR GQH2xeNqgDbCPjIh3rjrq21gp6/csO3OecEight9XypAd23N4RbAHiLnhuP83Il6OV5F 1U+6e6/cEow++hoXukyYnj7nW7FDiX2LnI9csaK8DBp8SoXTSjImEMvFpyi4h6DUPI6l wThFo1hvjfVxQXyzG8aqdIBcfje2b98aTqi3rZRWKToSrP7HX3MmPZmyDv8C/pbi6B9N gkGpaczTejtblEcm1l1Yk+TX4SIglLYqqaML6WOJn5FCZnI6AFdTx+tashANtNktXqYI Lsyw== X-Gm-Message-State: AOJu0YwVmVlszDN0pvQiZAJeCMefvJX/305bbtq62EPEISacyedV8/49 XDVAr6vEPL2wjGP+Tskj0qqSZrFIqQX/NwuDJh4= X-Google-Smtp-Source: AGHT+IHGTHIuC7CpnyMAhwCQca5yauPxGSGuj5a56b54gvzEYRsz7Y9X3haIpkt5rUdq1B5ObP1N3w== X-Received: by 2002:ad4:4690:0:b0:66d:65a9:8a17 with SMTP id pl16-20020ad44690000000b0066d65a98a17mr16957596qvb.2.1701198537219; Tue, 28 Nov 2023 11:08:57 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 10-20020ad45baa000000b0067a21a7397bsm3919911qvq.12.2023.11.28.11.08.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:56 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:56 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 23/24] pack-bitmap: reuse objects from all disjoint packs Message-ID: <7345e3946743cd098a57c384436a3f44592e5fc6.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that both the pack-bitmap and pack-objects code are prepared to handle marking and using objects from multiple disjoint packs for verbatim reuse, allow marking objects from all disjoint packs as eligible for reuse. Within the `reuse_partial_packfile_from_bitmap()` function, we no longer only mark the pack whose first object is at bit position zero for reuse, and instead mark any pack which is flagged as disjoint by the MIDX as a reuse candidate. If no such packs exist (i.e because we are reading a MIDX written before the "DISP" chunk was introduced), then treat the preferred pack as disjoint for the purposes of reuse. This is a safe assumption to make since all duplicate objects are resolved in favor of the preferred pack. Provide a handful of test cases in a new script (t5332) exercising interesting behavior for multi-pack reuse to ensure that we performed all of the previous steps correctly. Signed-off-by: Taylor Blau --- Documentation/config/pack.txt | 6 +- builtin/pack-objects.c | 6 +- pack-bitmap.c | 73 +++++++++--- pack-bitmap.h | 3 +- t/t5332-multi-pack-reuse.sh | 219 ++++++++++++++++++++++++++++++++++ 5 files changed, 290 insertions(+), 17 deletions(-) create mode 100755 t/t5332-multi-pack-reuse.sh diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt index fe100d0fb7..9fe48d41c9 100644 --- a/Documentation/config/pack.txt +++ b/Documentation/config/pack.txt @@ -30,7 +30,11 @@ to linkgit:git-repack[1]. pack.allowPackReuse:: When true or "single", and when reachability bitmaps are enabled, pack-objects will try to send parts of the bitmapped packfile - verbatim. This can reduce memory and CPU usage to serve fetches, + verbatim. When "multi", and when a multi-pack reachability bitmap is + available, pack-objects will try to send parts of all packs marked as + disjoint by the MIDX. If only a single pack bitmap is available, and + `pack.allowPackReuse` is set to "multi", reuse parts of just the + bitmapped packfile. This can reduce memory and CPU usage to serve fetches, but might result in sending a slightly larger pack. Defaults to true. diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 4853e91251..43b77bff7c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -233,6 +233,7 @@ static int use_bitmap_index = -1; static enum { NO_PACK_REUSE = 0, SINGLE_PACK_REUSE, + MULTI_PACK_REUSE, } allow_pack_reuse = SINGLE_PACK_REUSE; static enum { WRITE_BITMAP_FALSE = 0, @@ -3253,6 +3254,8 @@ static int git_pack_config(const char *k, const char *v, if (res < 0) { if (!strcasecmp(v, "single")) allow_pack_reuse = SINGLE_PACK_REUSE; + else if (!strcasecmp(v, "multi")) + allow_pack_reuse = MULTI_PACK_REUSE; else die(_("invalid pack.allowPackReuse value: '%s'"), v); } else if (res) { @@ -4032,7 +4035,8 @@ static int get_object_list_from_bitmap(struct rev_info *revs) reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfiles, &reuse_packfiles_nr, - &reuse_packfile_bitmap); + &reuse_packfile_bitmap, + allow_pack_reuse == MULTI_PACK_REUSE); if (reuse_packfiles) { reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); diff --git a/pack-bitmap.c b/pack-bitmap.c index be53fc6da5..561690c679 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -2061,10 +2061,19 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git unuse_pack(&w_curs); } +static void make_disjoint_pack(struct bitmapped_pack *out, struct packed_git *p) +{ + out->p = p; + out->bitmap_pos = 0; + out->bitmap_nr = p->num_objects; + out->disjoint = 1; +} + void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, struct bitmapped_pack **packs_out, size_t *packs_nr_out, - struct bitmap **reuse_out) + struct bitmap **reuse_out, + int multi_pack_reuse) { struct repository *r = the_repository; struct bitmapped_pack *packs = NULL; @@ -2088,24 +2097,62 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, free(packs); return; } - if (!pack.bitmap_nr) - continue; /* no objects from this pack */ - if (pack.bitmap_pos) - continue; /* not preferred pack */ + + if (!pack.disjoint) + continue; + + if (!multi_pack_reuse && pack.bitmap_pos) { + /* + * If we're only reusing a single pack, skip + * over any packs which are not positioned at + * the beginning of the MIDX bitmap. + * + * This is consistent with the existing + * single-pack reuse behavior, which only reuses + * parts of the MIDX's preferred pack. + */ + continue; + } ALLOC_GROW(packs, packs_nr + 1, packs_alloc); memcpy(&packs[packs_nr++], &pack, sizeof(pack)); objects_nr += pack.p->num_objects; + + if (!multi_pack_reuse) + break; + } + + if (!packs_nr) { + /* + * Old MIDXs (i.e. those written before the "DISP" chunk + * existed) will not have any packs marked as disjoint. + * + * But we still want to perform pack reuse with the + * special "preferred pack" as before. To do this, form + * the singleton set containing just the preferred pack, + * which is trivially disjoint with itself. + * + * Moreover, the MIDX is guaranteed to resolve duplicate + * objects in favor of the copy in the preferred pack + * (if one exists). Thus, we can safely perform pack + * reuse on this pack. + */ + uint32_t preferred_pack_pos; + struct packed_git *preferred_pack; + + preferred_pack_pos = midx_preferred_pack(bitmap_git); + preferred_pack = bitmap_git->midx->packs[preferred_pack_pos]; + + ALLOC_GROW(packs, packs_nr + 1, packs_alloc); + + make_disjoint_pack(&packs[packs_nr], preferred_pack); + objects_nr = packs[packs_nr++].p->num_objects; } } else { ALLOC_GROW(packs, packs_nr + 1, packs_alloc); - packs[packs_nr].p = bitmap_git->pack; - packs[packs_nr].bitmap_pos = 0; - packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects; - packs[packs_nr].disjoint = 1; - + make_disjoint_pack(&packs[packs_nr], bitmap_git->pack); objects_nr = packs[packs_nr++].p->num_objects; } @@ -2114,10 +2161,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, word_alloc++; reuse = bitmap_word_alloc(word_alloc); - if (packs_nr != 1) - BUG("pack reuse not yet implemented for multiple packs"); - - reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse); + for (i = 0; i < packs_nr; i++) + reuse_partial_packfile_from_bitmap_1(bitmap_git, &packs[i], reuse); if (!bitmap_popcount(reuse)) { free(packs); diff --git a/pack-bitmap.h b/pack-bitmap.h index 901a3b86ed..8bb316ce52 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -81,7 +81,8 @@ uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, struct bitmapped_pack **packs_out, size_t *packs_nr_out, - struct bitmap **reuse_out); + struct bitmap **reuse_out, + int multi_pack_reuse); int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping, kh_oid_map_t *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *); diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh new file mode 100755 index 0000000000..a9bd3870e6 --- /dev/null +++ b/t/t5332-multi-pack-reuse.sh @@ -0,0 +1,219 @@ +#!/bin/sh + +test_description='pack-objects multi-pack reuse' + +. ./test-lib.sh +. "$TEST_DIRECTORY"/lib-bitmap.sh +. "$TEST_DIRECTORY"/lib-disjoint.sh + +objdir=.git/objects +packdir=$objdir/pack + +all_packs () { + find $packdir -type f -name "*.idx" | sed -e 's/^.*\/\([^\/]\)/\1/g' +} + +all_disjoint () { + all_packs | sed -e 's/^/+/g' +} + +test_pack_reused () { + test_trace2_data pack-objects pack-reused "$1" +} + +test_packs_reused () { + test_trace2_data pack-objects packs-reused "$1" +} + + +# pack_position objects && + grep "$1" objects | cut -d" " -f1 +} + +test_expect_success 'setup' ' + git config pack.allowPackReuse multi +' + +test_expect_success 'preferred pack is reused without packs marked disjoint' ' + test_commit A && + test_commit B && + + A="$(echo A | git pack-objects --unpacked --delta-base-offset $packdir/pack)" && + B="$(echo B | git pack-objects --unpacked --delta-base-offset $packdir/pack)" && + + git prune-packed && + + git multi-pack-index write --bitmap && + + test_must_not_be_disjoint "pack-$A.pack" && + test_must_not_be_disjoint "pack-$B.pack" && + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --revs --all >/dev/null && + + test_pack_reused 3 in <<-EOF && + pack-$A.idx + +pack-$B.idx + +pack-$C.idx + EOF + git multi-pack-index write --bitmap --stdin-packs trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --revs --all >/dev/null && + + test_pack_reused 6 in && + git multi-pack-index write --bitmap --stdin-packs trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --revs --all >/dev/null && + + test_pack_reused 9 in && + git multi-pack-index write --bitmap --preferred-pack="pack-$D.idx" \ + --stdin-packs in <<-EOF && + $(git rev-parse E) + ^$(git rev-parse D) + EOF + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --delta-base-offset --revs /dev/null && + + test_pack_reused 3 in && + git multi-pack-index write --bitmap --preferred-pack="pack-$A.idx" \ + --stdin-packs in <<-EOF && + $(git rev-parse E) + ^$(git rev-parse D) + EOF + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --delta-base-offset --revs /dev/null && + + test_pack_reused 3 f && + git add f && + test_tick && + git commit -m "delta" && + delta="$(git rev-parse HEAD)" && + + test_seq 64 >f && + test_tick && + git commit -a -m "base" && + base="$(git rev-parse HEAD)" && + + test_commit other && + + git repack -d && + + have_delta "$(git rev-parse $delta:f)" "$(git rev-parse $base:f)" && + + all_disjoint >in && + git multi-pack-index write --bitmap --stdin-packs in <<-EOF && + $(git rev-parse other) + ^$base + EOF + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --delta-base-offset --revs /dev/null && + + # Even though all packs are marked disjoint, we can only reuse + # the 3 objects corresponding to "other" from the latest pack. + # + # This is because even though we want "delta", we do not want + # "base", meaning that we have to inflate the delta/base-pair + # corresponding to the blob in commit "delta", which bypasses + # the pack-reuse mechanism. + # + # The remaining objects from the other pack are similarly not + # reused because their objects are on the uninteresting side of + # the query. + test_pack_reused 3 X-Patchwork-Id: 13471584 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="SbaF+87Y" Received: from mail-qk1-x72e.google.com (mail-qk1-x72e.google.com [IPv6:2607:f8b0:4864:20::72e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1F5F19A6 for ; Tue, 28 Nov 2023 11:09:00 -0800 (PST) Received: by mail-qk1-x72e.google.com with SMTP id af79cd13be357-778a6c440faso275189785a.3 for ; Tue, 28 Nov 2023 11:09:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1701198540; x=1701803340; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=yyjU3+2aACcGusXPzHPS4F+h43r7SvevI+1svAuY+R8=; b=SbaF+87Yl4pFHRI5d8jjYXIPqj8iTMfK7lrBc/G/5zzyBw9UT2Hj/LqT6/+iGoypm/ 72Lk2cfCli34Lcvu7fu6+hj/BnzUBhd9h+0kxa6YZ5qPrx6IB/4z5z+x7c5ixloxQ4nR 0RNWn09Xh996zHBJK+fqTM1kEjJWxMSvX+zCgDU/qtHLi4UyOL+L95RET4QtsZOEiib2 4jQKRAzvOQLmoDwWOVYQJZ1zDuEQN1wfJ4McTgZJG/XSFwc9EKd8gzRcnGuWCp6UURau xM8Ajul7cwuFumVLd256+BYjtMV+/SJ2cED8S3S8WRLujKXUA1iw0/OWQgS1UsynCsrr gQZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701198540; x=1701803340; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=yyjU3+2aACcGusXPzHPS4F+h43r7SvevI+1svAuY+R8=; b=IDEZO1Gq/8pbdanKMzO7rXTlmseS5ekkGBd8lAWVy8AeWrkGo11/OhfJnoTUcugiqb D2Vt/oV1JNtOGuoBpq/NMP0poTtt5mYEOpkCbNXZEEJXmIyqZ71DVsgKNfldoit/mW4c paKHfaSmCqpv4SjBsB7lGbPFfWyrINoEq5CWY7JWRZIpj1uFIN8gpUj/pBfrT9Bl1z7O 4mfPcSvDN0NOJezlvIMnRnfg/fbM1igP9x6gbk2k1Yjd7CqgB56sFI+Hk0Ko2TC5fPfQ eUZa5Xxgk9wlGnmNDm3SV1fcqkL7d1Ymg/6rqSdJb8Hu2kMIPFXGAUGNGO571mOYf89n N/9A== X-Gm-Message-State: AOJu0YzPN3z6vQUv33k0LipWaL4CZlocOxSMOqnVu/ifyF6G56+5p2DU zuFuegC6ySPrP9T2pEF9IwyazrgJ25jdix23H3k= X-Google-Smtp-Source: AGHT+IHHE5lXlE61vzqUDnstW5pQJdSlLw58Y1XqlmtuuJ+/AjUdUB2MsHn0SNrnAB5ofMxDRZ43DA== X-Received: by 2002:a05:6214:90c:b0:67a:2969:dd3e with SMTP id dj12-20020a056214090c00b0067a2969dd3emr10055928qvb.62.1701198539824; Tue, 28 Nov 2023 11:08:59 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id x5-20020a0ce785000000b0067a5571bca6sm1078464qvn.74.2023.11.28.11.08.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Nov 2023 11:08:59 -0800 (PST) Date: Tue, 28 Nov 2023 14:08:58 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH 24/24] t/perf: add performance tests for multi-pack reuse Message-ID: <980b318f98143d8ba26443127e515c0c98a2bd6b.1701198172.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: To ensure that we don't regress either the size or runtime performance of multi-pack reuse, add a performance test to measure both of these. The test partitions the objects in GIT_TEST_PERF_LARGE_REPO into 1, 10, and 100 packs, and then tries to perform a "clone" at each stage with both single- and multi-pack reuse enabled. Note that the `repack_into_n_chunks()` function in this new test script differs from the existing `repack_into_n()`. The former partitions the repository into N equal-sized chunks, while the latter produces N packs of five commits each (plus their objects), and then another pack with the remainder. On git.git, I can produce the following results on my machine: Test this tree -------------------------------------------------------------------------------- 5332.3: clone for 1-pack scenario (single-pack reuse) 1.57(2.99+0.15) 5332.4: clone size for 1-pack scenario (single-pack reuse) 231.8M 5332.5: clone for 1-pack scenario (multi-pack reuse) 1.79(2.96+0.21) 5332.6: clone size for 1-pack scenario (multi-pack reuse) 231.7M 5332.9: clone for 10-pack scenario (single-pack reuse) 3.89(16.75+0.35) 5332.10: clone size for 10-pack scenario (single-pack reuse) 209.9M 5332.11: clone for 10-pack scenario (multi-pack reuse) 1.56(2.99+0.17) 5332.12: clone size for 10-pack scenario (multi-pack reuse) 224.4M 5332.15: clone for 100-pack scenario (single-pack reuse) 8.24(54.31+0.59) 5332.16: clone size for 100-pack scenario (single-pack reuse) 278.3M 5332.17: clone for 100-pack scenario (multi-pack reuse) 2.13(2.44+0.33) 5332.18: clone size for 100-pack scenario (multi-pack reuse) 357.9M Signed-off-by: Taylor Blau --- t/perf/p5332-multi-pack-reuse.sh | 81 ++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100755 t/perf/p5332-multi-pack-reuse.sh diff --git a/t/perf/p5332-multi-pack-reuse.sh b/t/perf/p5332-multi-pack-reuse.sh new file mode 100755 index 0000000000..5c6c575d62 --- /dev/null +++ b/t/perf/p5332-multi-pack-reuse.sh @@ -0,0 +1,81 @@ +#!/bin/sh + +test_description='tests pack performance with multi-pack reuse' + +. ./perf-lib.sh +. "${TEST_DIRECTORY}/perf/lib-pack.sh" + +packdir=.git/objects/pack + +test_perf_large_repo + +find_pack () { + for idx in $packdir/pack-*.idx + do + if git show-index <$idx | grep -q "$1" + then + basename $idx + fi || return 1 + done +} + +repack_into_n_chunks () { + git repack -adk && + + test "$1" -eq 1 && return || + + find $packdir -type f | sort >packs.before && + + # partition the repository into $1 chunks of consecutive commits, and + # then create $1 packs with the objects reachable from each chunk + # (excluding any objects reachable from the previous chunks) + sz="$(($(git rev-list --count --all) / $1))" + for rev in $(git rev-list --all | awk "NR % $sz == 0" | tac) + do + pack="$(echo "$rev" | git pack-objects --revs \ + --honor-pack-keep --delta-base-offset $packdir/pack)" && + touch $packdir/pack-$pack.keep || return 1 + done + + # grab any remaining objects not packed by the previous step(s) + git pack-objects --revs --all --honor-pack-keep --delta-base-offset \ + $packdir/pack && + + find $packdir -type f | sort >packs.after && + + # and install the whole thing + for f in $(comm -12 packs.before packs.after) + do + rm -f "$f" || return 1 + done + rm -fr $packdir/*.keep +} + +for nr_packs in 1 10 100 +do + test_expect_success "create $nr_packs-pack scenario" ' + repack_into_n_chunks $nr_packs + ' + + test_expect_success "setup bitmaps for $nr_packs-pack scenario" ' + find $packdir -type f -name "*.idx" | sed -e "s/.*\/\(.*\)$/+\1/g" | + git multi-pack-index write --stdin-packs --bitmap \ + --preferred-pack="$(find_pack $(git rev-parse HEAD))" + ' + + for reuse in single multi + do + test_perf "clone for $nr_packs-pack scenario ($reuse-pack reuse)" " + git for-each-ref --format='%(objectname)' refs/heads refs/tags >in && + git -c pack.allowPackReuse=$reuse pack-objects \ + --revs --delta-base-offset --use-bitmap-index \ + --stdout result + " + + test_size "clone size for $nr_packs-pack scenario ($reuse-pack reuse)" ' + wc -c