From patchwork Thu Dec 14 22:23:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493743 Received: from mail-oa1-f53.google.com (mail-oa1-f53.google.com [209.85.160.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4591766AB6 for ; Thu, 14 Dec 2023 22:23:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="wZHB2Qec" Received: by mail-oa1-f53.google.com with SMTP id 586e51a60fabf-1efabc436e4so35010fac.1 for ; Thu, 14 Dec 2023 14:23:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592621; x=1703197421; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=uXgE8FVumua/OgsY6+EN7x41FTQBQNWf+Toq+QSaXGo=; b=wZHB2QecUsaTBcIAGPOfV4ab63T12cK7e9/3b8e7nksVr7K/ugUd9cmt8dEhsAkQd4 Xr3OlOX3i9OENAHHFdnrdngxP61ccHAsIP3WQJ8dijPps0HczcTXkC3jmG0Wd3Gkl61X FZ7OdnehtKMrfZ5ovWQoBF7fGnrXLoMocpeHmkVs+JSHAwXIMDAOlImTXO/YZfDVTVCO TqHPUa1JNLVH0U5rrytzgssvXrhgImPHAeQ6IRirxwcNWb1Sp6O7+XbE/R0apYIO9V3x 0kWfbN5CHOSbsNl++rS+GnFYe70G3YglXN/ZwIczVkmWgTbnun3Pu0ycg+JdaouK1F2C 0XUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592621; x=1703197421; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; 
bh=uXgE8FVumua/OgsY6+EN7x41FTQBQNWf+Toq+QSaXGo=; b=V9Dxm/ykE5FEQ7sxgIohyTz4B45NCoGgMJUsuYF7Ll0hqsq7stUgpR4IJzB9MxZ7vQ NKsm9LQx1k0udVwMIyk0GAD3t76tsjbLbaN55RmviOiwBZdF7Z+p5s3tI4LNkRRFlvyb 1vSoE3zsaylrUJEsiRxxRhQSWe+e0mAFRc09E/AjxuPVuEA5fIahEKQxPWMXsA2ZKzDp DEDT1/HlXQkfdeP8jivnSRCA9uGSMO5EdTMKy/qB0a38lxKJXoxVIuU3ojYyo6IfT0ts DII3Rdhl11Fc0wcFiOTSfgrIMA4OAGMwPIbdWc804toXYKLfiHZmbdQKtoKc4vdfvIUF RRAA== X-Gm-Message-State: AOJu0YxZE/XXy3eASiUo5TyJcHTIvzWR5WqpILO4Gars9ombSqxnckqt p5Np/C5rVPmnQARKGGEp2Pxwg9DOyYI1T3iDtWaMNg== X-Google-Smtp-Source: AGHT+IHhGSE201A3Plge5osbbx708Cgl3sSaetQq1MhFQxFiMmVGEHciWGjq5aDqnBRqqzikgPF2Dg== X-Received: by 2002:a05:6870:9f07:b0:1fb:75a:c43f with SMTP id xl7-20020a0568709f0700b001fb075ac43fmr10075837oab.104.1702592620752; Thu, 14 Dec 2023 14:23:40 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id hg5-20020a056870790500b001fa1db68eecsm4759249oab.4.2023.12.14.14.23.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:23:40 -0800 (PST) Date: Thu, 14 Dec 2023 17:23:39 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 01/26] pack-objects: free packing_data in more places Message-ID: <7d65abfa1d38c8bfa59a06514c5bbe6a07f3c6da.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The pack-objects internals use a packing_data struct to track what objects are part of the pack(s) being formed. Since these structures contain allocated fields, failing to appropriately free() them results in a leak. Plug that leak by introducing a clear_packing_data() function, and call it in the appropriate spots. 
This is a fairly straightforward leak to plug, since none of the callers expect to read any values or have any references to parts of the address space being freed. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 + midx.c | 5 +++++ pack-objects.c | 15 +++++++++++++++ pack-objects.h | 1 + 4 files changed, 22 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 89a8b5a976..321d7effb0 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4522,6 +4522,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) reuse_packfile_objects); cleanup: + clear_packing_data(&to_pack); list_objects_filter_release(&filter_options); strvec_clear(&rp); diff --git a/midx.c b/midx.c index 1d14661dad..778dd536c8 100644 --- a/midx.c +++ b/midx.c @@ -1603,8 +1603,13 @@ static int write_midx_internal(const char *object_dir, flags) < 0) { error(_("could not write multi-pack bitmap")); result = 1; + clear_packing_data(&pdata); + free(commits); goto cleanup; } + + clear_packing_data(&pdata); + free(commits); } /* * NOTE: Do not use ctx.entries beyond this point, since it might diff --git a/pack-objects.c b/pack-objects.c index f403ca6986..a9d9855063 100644 --- a/pack-objects.c +++ b/pack-objects.c @@ -151,6 +151,21 @@ void prepare_packing_data(struct repository *r, struct packing_data *pdata) init_recursive_mutex(&pdata->odb_lock); } +void clear_packing_data(struct packing_data *pdata) +{ + if (!pdata) + return; + + free(pdata->cruft_mtime); + free(pdata->in_pack); + free(pdata->in_pack_by_idx); + free(pdata->in_pack_pos); + free(pdata->index); + free(pdata->layer); + free(pdata->objects); + free(pdata->tree_depth); +} + struct object_entry *packlist_alloc(struct packing_data *pdata, const struct object_id *oid) { diff --git a/pack-objects.h b/pack-objects.h index 0d78db40cb..b9898a4e64 100644 --- a/pack-objects.h +++ b/pack-objects.h @@ -169,6 +169,7 @@ struct packing_data { }; void prepare_packing_data(struct repository 
*r, struct packing_data *pdata); +void clear_packing_data(struct packing_data *pdata); /* Protect access to object database */ static inline void packing_data_lock(struct packing_data *pdata) From patchwork Thu Dec 14 22:23:42 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493728 Received: from mail-oo1-f48.google.com (mail-oo1-f48.google.com [209.85.161.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D6DB6E2BB for ; Thu, 14 Dec 2023 22:23:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="kBszcGR7" Received: by mail-oo1-f48.google.com with SMTP id 006d021491bc7-5906df1d2adso44348eaf.2 for ; Thu, 14 Dec 2023 14:23:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592623; x=1703197423; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=rpb97+wmWvddLNa6TbWiGBUlqgUoWA+hzhvJsz6L1dI=; b=kBszcGR77vEoZBg3Pp272zW7qKV+ajcKiDMcK4pR5t3Fnd4m5kXln/4bc2MaMd2xlE nrJTScMwppAdR4LRq4DVUEC/jMrMK3xFQgs+d6g9E8No1IkxLWecjxm//H9ZFy9vVlRN yf5m0Gmln9EglK9fIFmk1ZI5/S7XigGhEQw0zVgXxGLQ9kdKCytLFO6ugBTMQr9bhz9o t6PTxppYP6X3cIquqwKdLQXdO3OgFx0XgSb7eghVU7y0ieH0qcf0GgvWnoISiIwC1JKh 5xVMJmRLSx+vugaYwUtH1zn6wJIFwtBwnbpW2fvRKok5NlxL994KA0+1t+Lcm5Fbp0so CvXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592623; x=1703197423; 
h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=rpb97+wmWvddLNa6TbWiGBUlqgUoWA+hzhvJsz6L1dI=; b=uXIz/c97XyQ2RsvGi8+640l+vaK/wjnieHgy5BFrBga9wsEGWoPGRIIpIG88uxZko5 ozy+a2Gc1Ywn8rxRnb8Mq7k1/7fwhknZjtSrC0yF8FORl/+fheNttDMCk4OD8eBE97LL BHMlAALIVCVxXetbYodCuUjX88GuXc4QmkZyT7b1RWaMHX6o21bfD//By6WT4ObH22LB rPfXCOmIRtGW/tyMfckyXWexCMdYbG6bHQEl+sgAqDATvGBnBwBp0NeYWM05HFQ1V46W w2QkCHhYa3PvBGPz9jJX/9fPZRzqFDiO94scjIpS/Y5KClu83pFbWWX4mZhMP0MSZUxo 1TjQ== X-Gm-Message-State: AOJu0YwbtylvuVtl6slSEYIacpbFhEuDsvQbxiDmQxbPZKKED+k4zt/u Z2ATaMl7g4RX4vijSUaQNXSUw9ZvTmAdPxXLMwyvvg== X-Google-Smtp-Source: AGHT+IEMN5vMlodgeVWhTWhAn/54up8yip+Z4r54OJR/MR6TiA1wrbZCcm1wPmR4a5V5zNh1cwG4UQ== X-Received: by 2002:a4a:dfb6:0:b0:590:6fd5:68f2 with SMTP id k22-20020a4adfb6000000b005906fd568f2mr5140508ook.18.1702592623687; Thu, 14 Dec 2023 14:23:43 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id s14-20020a0568301e0e00b006d87dc31cddsm3361038otr.37.2023.12.14.14.23.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:23:43 -0800 (PST) Date: Thu, 14 Dec 2023 17:23:42 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 02/26] pack-bitmap-write: deep-clear the `bb_commit` slab Message-ID: <19cdaf59c5e8c2aa58b757f7013ccb4ba1cc7f98.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The `bb_commit` commit slab is used by the pack-bitmap-write machinery to track various pieces of bookkeeping used to generate reachability bitmaps. 
Even though we clear the slab when freeing the bitmap_builder struct (with `bitmap_builder_clear()`), there are still pointers which point to locations in memory that have not yet been freed, resulting in a leak. Plug the leak by introducing a suitable `free_fn` for the `struct bb_commit` type, and make sure it is called on each member of the slab via the `deep_clear_bb_data()` function. Note that it is possible for both of the arguments to `bitmap_free()` to be NULL, but `bitmap_free()` is a noop for NULL arguments, so it is OK to pass them unconditionally. Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index f4ecdf8b0e..ae37fb6976 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -198,6 +198,13 @@ struct bb_commit { unsigned idx; /* within selected array */ }; +static void clear_bb_commit(struct bb_commit *commit) +{ + free_commit_list(commit->reverse_edges); + bitmap_free(commit->commit_mask); + bitmap_free(commit->bitmap); +} + define_commit_slab(bb_data, struct bb_commit); struct bitmap_builder { @@ -339,7 +346,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, static void bitmap_builder_clear(struct bitmap_builder *bb) { - clear_bb_data(&bb->data); + deep_clear_bb_data(&bb->data, clear_bb_commit); free(bb->commits); bb->commits_nr = bb->commits_alloc = 0; } From patchwork Thu Dec 14 22:23:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493729 Received: from mail-oa1-f53.google.com (mail-oa1-f53.google.com [209.85.160.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 494C2671E8 for ; Thu, 14 Dec 2023 22:23:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) 
header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="jkl4UK9m" Received: by mail-oa1-f53.google.com with SMTP id 586e51a60fabf-1f055438492so30671fac.3 for ; Thu, 14 Dec 2023 14:23:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592627; x=1703197427; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=V8TZpuvfY0jvCMNagKiqmDEug8KHOO6QlTzJG+1k+VA=; b=jkl4UK9mk8EqTeDXZhQ2IlYYXjXo/bjQk8cG3f2phQ6ib2fgdYWsOODuwGKG2hFvQE 1aqYioLej1jQjg3f0nyTDmHmdF/iRcPCAWleoIpR1+dcGS/ttCvwA2JsdsJ3B5VkW4bK vkJZA6BWnVSRpkMyFpImK4rfR0hbNmZzmxx+j2m+OTOSDuGrmZ/y6qnIBBGierUUTjYX DeZ78VX2kqnub3SWyjke5r9yO5WkYsj+ICol+azQuHtmj0yeetgdBcEi8vAOyCUywRh3 vp+2mDXwkrImzVsmC7pasSaDBFL7dn2xFLka35eeviIaxBMTMC1crLqy8DvMl9OOOd9W gIZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592627; x=1703197427; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=V8TZpuvfY0jvCMNagKiqmDEug8KHOO6QlTzJG+1k+VA=; b=fJk3smf99RfKQr5AolgRuX80SeWJt621zJU3MAuP9CZwrBH0N1vUhj7C2jTRTlIWMC dhvzs9UUSzzxEf0WPjHUlRnsITs1p4negERyk09OZsNx3m6iHMO1y4nNIuMUSoSL62w0 t22Fjk6uTkB/LYc8w3H3t8hp9CCm+w1TVDRmoVp2S+XSu2X6JvikowQWSuOR4mtwKoLZ LSbG4qrkHkfeggC/KBlOco8iVr28cYF7qWGP1Z2qMuKy2aPOiJX+TptBEZ6AL9n+Ukno L/NG0e4BG6VkDmjzCV7LnV7XcpyRF1v/czD2g9O/pc2g09u2guq+rKMWynOA+6IQ2VU2 s0cQ== X-Gm-Message-State: AOJu0Yz9XM4nDGOtckmMMeON9f6GvlG0zwNqbAio95PEENHIMYfBi/iE VtzlrCdx13faiQ/T5f6lDktR2iAF9UhpM7Mb76xkfg== X-Google-Smtp-Source: 
AGHT+IH7iFX2wTvs+q0vQPRAxZSkrsR4XsMP2cFL3Yen82E+oTJE0cdE535nLQt9DpiGoHweL8ccyQ== X-Received: by 2002:a05:6870:56ac:b0:203:4ca8:fdd5 with SMTP id p44-20020a05687056ac00b002034ca8fdd5mr2058456oao.97.1702592626900; Thu, 14 Dec 2023 14:23:46 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id ov19-20020a056870cb9300b001fb0168cc3esm4731057oab.42.2023.12.14.14.23.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:23:46 -0800 (PST) Date: Thu, 14 Dec 2023 17:23:45 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 03/26] pack-bitmap: plug leak in find_objects() Message-ID: <477df6c974bf5ba7bf91d2f720e9de4f0e91f246.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The `find_objects()` function creates an object_list for any tips of the reachability query which do not have corresponding bitmaps. The object_list is not used outside of `find_objects()`, but we never free it with `object_list_free()`, resulting in a leak. Let's plug that leak by calling `object_list_free()`, which results in t6113 becoming leak-free. 
Signed-off-by: Taylor Blau --- pack-bitmap.c | 2 ++ t/t6113-rev-list-bitmap-filters.sh | 2 ++ 2 files changed, 4 insertions(+) diff --git a/pack-bitmap.c b/pack-bitmap.c index 0260890341..d2f1306960 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1280,6 +1280,8 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, base = fill_in_bitmap(bitmap_git, revs, base, seen); } + object_list_free(&not_mapped); + return base; } diff --git a/t/t6113-rev-list-bitmap-filters.sh b/t/t6113-rev-list-bitmap-filters.sh index 86c70521f1..459f0d7412 100755 --- a/t/t6113-rev-list-bitmap-filters.sh +++ b/t/t6113-rev-list-bitmap-filters.sh @@ -4,6 +4,8 @@ test_description='rev-list combining bitmaps and filters' . ./test-lib.sh . "$TEST_DIRECTORY"/lib-bitmap.sh +TEST_PASSES_SANITIZE_LEAK=true + test_expect_success 'set up bitmapped repo' ' # one commit will have bitmaps, the other will not test_commit one && From patchwork Thu Dec 14 22:23:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493752 Received: from mail-oo1-f41.google.com (mail-oo1-f41.google.com [209.85.161.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 72EF06D1DE for ; Thu, 14 Dec 2023 22:23:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="ff0sMCkX" Received: by mail-oo1-f41.google.com with SMTP id 006d021491bc7-5906988ab8dso58211eaf.0 for ; Thu, 14 Dec 2023 14:23:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592629; x=1703197429; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=CJn5qRs6lqTbZ3HymPXKMtcJ4ZUxdDzTr5RaipMgT4A=; b=ff0sMCkX9fwugW1uzkA5qmhl20t9GRaLzkQ/2oKeCWJWeyD0NHe8EvYpP8cEiRYbzX 7I7nKS1DyE0GUrOy6s+oqsKDmIbaCAr5yQUjYPkTzuAdFL6TEZHU7Fr4kqnD5FWQu/EN NW1RWXMf7scA8vqGHtkLJRBlbremjujtP9fMwk1iZqet8mGdZygBbxH8BId27GIR1XrC GW97ZtTw308wmDexV60qSes2XNkWC+Fh90Xoo838/zMKVymaKRWN2S9Kk6NOpoaPgYgK fizdrTfZNr+SCLxCh0l4vhsAhT/Vpyt+ZVXNmLs14jWx7d30QDGaCMRUg4XMfWwCEB7o 7j6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592629; x=1703197429; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=CJn5qRs6lqTbZ3HymPXKMtcJ4ZUxdDzTr5RaipMgT4A=; b=INxvKYF5gPm8lAP5jzcBnp46w2uaBa3T3ZbIp4XR1RiYVPvKwkW0/AJZN+XmR68O8N id11h9ZkftmM0Gyt/2Or5RXb3mCOhr2294i3FiyJMJ7fEchUux6Gdtm7OTBeEB9ZEWZT Is5ZW0eThEBxMpsyTadMAoBfr2rtxqNLuq1TpACr+nOCym/1Wd23wVKrGlPX8YTLz/eQ psqnHW/jb3+V5zw1kdWw+IIKlcWHT/2UaSpV1Zjzux59zRrDPR9dK61IKN2WFPZ32wVK z5b6DPkxpNpjOzHhsHx+erm38eRx62bq7v2EDl5QZjdy8MJnOMtSGKqhHCTK/vnvaauf luMg== X-Gm-Message-State: AOJu0Yzu6kPUz17UwpL/ZFyYAz+yUertB1Hr1+RmOYZo397Yduvg9kS+ aVsoEK4aL9oX5bKu3KwSJK9Bq4SMjXgKVlz7rlcOvQ== X-Google-Smtp-Source: AGHT+IEyAyxIEqv5kskArvqU0iw5bGtY+Ru6qLVDG3Ku4806wwKSc48Ik9cxrJrRsx/C3YoEEKkq8w== X-Received: by 2002:a05:6870:6190:b0:1fa:1ebb:8e86 with SMTP id a16-20020a056870619000b001fa1ebb8e86mr7019478oah.52.1702592629548; Thu, 14 Dec 2023 14:23:49 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id xm7-20020a0568709f8700b001fb2acf9a66sm1735958oab.51.2023.12.14.14.23.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:23:49 -0800 (PST) Date: Thu, 14 Dec 2023 17:23:48 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 04/26] midx: factor out `fill_pack_info()` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When selecting which packfiles will be written while generating a MIDX, the MIDX internals fill out a 'struct pack_info' with various pieces of book-keeping. Instead of filling out each field of the `pack_info` structure individually in each of the two spots that modify the array of such structures (`ctx->info`), extract a common routine that does this for us. This reduces the code duplication by a modest amount. But more importantly, it zero-initializes the structure before assigning values into it. This hardens us for a future change which will add additional fields to this structure which (until this patch) was not zero-initialized. As a result, any new fields added to the `pack_info` structure need only be updated in a single location, instead of at each spot within midx.c. There are no functional changes in this patch. 
Signed-off-by: Taylor Blau --- midx.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/midx.c b/midx.c index 778dd536c8..8dba67ddbe 100644 --- a/midx.c +++ b/midx.c @@ -475,6 +475,17 @@ struct pack_info { unsigned expired : 1; }; +static void fill_pack_info(struct pack_info *info, + struct packed_git *p, const char *pack_name, + uint32_t orig_pack_int_id) +{ + memset(info, 0, sizeof(struct pack_info)); + + info->orig_pack_int_id = orig_pack_int_id; + info->pack_name = xstrdup(pack_name); + info->p = p; +} + static int pack_info_compare(const void *_a, const void *_b) { struct pack_info *a = (struct pack_info *)_a; @@ -515,6 +526,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, const char *file_name, void *data) { struct write_midx_context *ctx = data; + struct packed_git *p; if (ends_with(file_name, ".idx")) { display_progress(ctx->progress, ++ctx->pack_paths_checked); @@ -541,27 +553,22 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); - ctx->info[ctx->nr].p = add_packed_git(full_path, - full_path_len, - 0); - - if (!ctx->info[ctx->nr].p) { + p = add_packed_git(full_path, full_path_len, 0); + if (!p) { warning(_("failed to add packfile '%s'"), full_path); return; } - if (open_pack_index(ctx->info[ctx->nr].p)) { + if (open_pack_index(p)) { warning(_("failed to open pack-index '%s'"), full_path); - close_pack(ctx->info[ctx->nr].p); - FREE_AND_NULL(ctx->info[ctx->nr].p); + close_pack(p); + free(p); return; } - ctx->info[ctx->nr].pack_name = xstrdup(file_name); - ctx->info[ctx->nr].orig_pack_int_id = ctx->nr; - ctx->info[ctx->nr].expired = 0; + fill_pack_info(&ctx->info[ctx->nr], p, file_name, ctx->nr); ctx->nr++; } } @@ -1321,11 +1328,6 @@ static int write_midx_internal(const char *object_dir, for (i = 0; i < ctx.m->num_packs; i++) { ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc); - 
ctx.info[ctx.nr].orig_pack_int_id = i; - ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]); - ctx.info[ctx.nr].p = ctx.m->packs[i]; - ctx.info[ctx.nr].expired = 0; - if (flags & MIDX_WRITE_REV_INDEX) { /* * If generating a reverse index, need to have @@ -1341,10 +1343,10 @@ static int write_midx_internal(const char *object_dir, if (open_pack_index(ctx.m->packs[i])) die(_("could not open index for %s"), ctx.m->packs[i]->pack_name); - ctx.info[ctx.nr].p = ctx.m->packs[i]; } - ctx.nr++; + fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i], + ctx.m->pack_names[i], i); } } From patchwork Thu Dec 14 22:23:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493733 Received: from mail-oo1-f49.google.com (mail-oo1-f49.google.com [209.85.161.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B3B62C698 for ; Thu, 14 Dec 2023 22:23:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="U8bblx+b" Received: by mail-oo1-f49.google.com with SMTP id 006d021491bc7-589d4033e84so67387eaf.1 for ; Thu, 14 Dec 2023 14:23:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592632; x=1703197432; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=C7l7hngAwNvJhZLNLsoaio/dfa11PQj0fp8hhM8HT90=; b=U8bblx+bhlbPXlbZ1cc3+VLMZ7NZl+FoOOSLN2rt1FP0cd26Al8mRe115zdtZYfzj0 
NYnFxWpeY+zbuzO+l0YVVsqzXJhJbou1tQAV1Wf/DkF8nzEHSORBvbspbacfem5jogsl ll4jCUrsAJgNo6G4R75VyZf1dLP2JsaSgB8Tz7O9pWCtnDkJe8QTMg6l5l25J2N2WmRB EPUxJQslZSUXTDengX+KthHWgrb0awxOpn19RpbCd0wx+Nb1fIQfKohLgV//TSiY3QvW ROcJ286sI3bc6N0fP+YHFOsEbdKFFRsF63qe+duP0H611qsH3e+Uf7E+duTBek6p+0qA Nq4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592632; x=1703197432; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=C7l7hngAwNvJhZLNLsoaio/dfa11PQj0fp8hhM8HT90=; b=TfTwwolBjuNM8KF6uQBInkhee7EXE1//PsuE2fNCfcj6uInw6UOmqKaHUGLndQ9Wtf G6jMHzWa9PRSTB6uua0n5h3mbeYxz+Ut71SeQVdFJW6Ju0/bXhr7TLwEorGkQ0lQSwq4 BOwL0xMoo10D1oK/veNkcTk0nAymZmXC+1hTLjKW2c2iBz9TbNKP9QFkIBUtrYj7PaZS w+Is/9i5KYWvFjxtmhIkL8sCKhXd7RYMtlFR3KkZQRdjLc+tCfwGI8Ipu2Gqk/EV7jck 32oa9sj+MHFiq6Q0eVwtTfVRjrv5Mq2G4oCTpsKcGLELh9BklYJXAhjbE5KabHdpVJ7T 7JnQ== X-Gm-Message-State: AOJu0YwNnXKxNkHuDpAGu7hHb0FZBSt09iofHg64/yNGsoWyiPCjOmTx k2s5BgDFmQZKp2lZOH8Esiu2WMsxAqUUvFQTDMvwGA== X-Google-Smtp-Source: AGHT+IFLhkELFWZXnErcZGzsUkYNOI3lMH0OdU1HMg8FTWFWegb+ligLVJhhu/JI6aCKfzCehmN3hg== X-Received: by 2002:a05:6870:9f82:b0:1fb:75b:99ad with SMTP id xm2-20020a0568709f8200b001fb075b99admr12088026oab.92.1702592632461; Thu, 14 Dec 2023 14:23:52 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id du37-20020a0568703a2500b00203184540easm1448915oab.50.2023.12.14.14.23.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:23:52 -0800 (PST) Date: Thu, 14 Dec 2023 17:23:51 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 05/26] midx: implement `BTMP` chunk Message-ID: <6fdc68418f196ad4d35866321760c3e4629c7ff7.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When a multi-pack bitmap is used to implement verbatim pack reuse (that is, when verbatim chunks from an on-disk packfile are copied directly[^1]), it does so by using its "preferred pack" as the source for pack-reuse. This allows repositories to pack the majority of their objects into a single (often large) pack, and then use it as the single source for verbatim pack reuse. This increases the number of objects that are reused verbatim (and consequently, decreases the amount of time it takes to generate many packs). But this performance comes at a cost, which is that the preferred packfile must pace its growth with that of the entire repository in order to maintain the utility of verbatim pack reuse. As repositories grow beyond what we can reasonably store in a single packfile, the utility of verbatim pack reuse diminishes. Or, at the very least, it becomes increasingly more expensive to maintain as the pack grows larger and larger. It would be beneficial to be able to perform this same optimization over multiple packs, provided some modest constraints (most importantly, that the set of packs eligible for verbatim reuse are disjoint with respect to the subset of their objects being sent). 
If we assume that the packs which we treat as candidates for verbatim reuse are disjoint with respect to any of their objects we may output, we need to make only modest modifications to the verbatim pack-reuse code itself. Most notably, we need to remove the assumption that the bits in the reachability bitmap corresponding to objects from the single reuse pack begin at the first bit position. Future patches will unwind these assumptions and reimplement their existing functionality as special cases of the more general assumptions (e.g. that reuse bits can start anywhere within the bitset, but happen to start at 0 for all existing cases). This patch does not yet relax any of those assumptions. Instead, it implements a foundational data-structure, the "Bitmapped Packs" (`BTMP`) chunk of the multi-pack index. The `BTMP` chunk's contents are described in detail here. Importantly, the `BTMP` chunk contains information to map regions of a multi-pack index's reachability bitmap to the packs whose objects they represent. For now, this chunk is only written, not read (outside of the test-tool used in this patch to test the new chunk's behavior). Future patches will begin to make use of this new chunk. [^1]: Modulo patching any `OFS_DELTA`'s that cross over a region of the pack that wasn't used verbatim. Signed-off-by: Taylor Blau --- Documentation/gitformat-pack.txt | 76 ++++++++++++++++++++++++++++++++ midx.c | 75 +++++++++++++++++++++++++++++-- midx.h | 5 +++ pack-bitmap.h | 9 ++++ t/helper/test-read-midx.c | 30 ++++++++++++- t/t5319-multi-pack-index.sh | 35 +++++++++++++++ 6 files changed, 226 insertions(+), 4 deletions(-) diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt index 9fcb29a9c8..d6ae229be5 100644 --- a/Documentation/gitformat-pack.txt +++ b/Documentation/gitformat-pack.txt @@ -396,6 +396,15 @@ CHUNK DATA: is padded at the end with between 0 and 3 NUL bytes to make the chunk size a multiple of 4 bytes. 
+ Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'}) + Stores a table of two 4-byte unsigned integers in network order. + Each table entry corresponds to a single pack (in the order that + they appear above in the `PNAM` chunk). The values for each table + entry are as follows: + - The first bit position (in pseudo-pack order, see below) to + contain an object from that pack. + - The number of bits whose objects are selected from that pack. + OID Fanout (ID: {'O', 'I', 'D', 'F'}) The ith entry, F[i], stores the number of OIDs with first byte at most i. Thus F[255] stores the total @@ -509,6 +518,73 @@ packs arranged in MIDX order (with the preferred pack coming first). The MIDX's reverse index is stored in the optional 'RIDX' chunk within the MIDX itself. +=== `BTMP` chunk + +The Bitmapped Packfiles (`BTMP`) chunk encodes additional information +about the objects in the multi-pack index's reachability bitmap. Recall +that objects from the MIDX are arranged in "pseudo-pack" order (see +above) for reachability bitmaps. + +From the example above, suppose we have packs "a", "b", and "c", with +10, 15, and 20 objects, respectively. In pseudo-pack order, those would +be arranged as follows: + + |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19| + +When working with single-pack bitmaps (or, equivalently, multi-pack +reachability bitmaps with a preferred pack), linkgit:git-pack-objects[1] +performs ``verbatim'' reuse, attempting to reuse chunks of the bitmapped +or preferred packfile instead of adding objects to the packing list. + +When a chunk of bytes is reused from an existing pack, any objects +contained therein do not need to be added to the packing list, saving +memory and CPU time. But a chunk from an existing packfile can only be +reused when the following conditions are met: + + - The chunk contains only objects which were requested by the caller + (i.e. does not contain any objects which the caller didn't ask for + explicitly or implicitly). 
+ + - All objects stored in non-thin packs as offset- or reference-deltas + also include their base object in the resulting pack. + +The `BTMP` chunk encodes the necessary information in order to implement +multi-pack reuse over a set of packfiles as described above. +Specifically, the `BTMP` chunk encodes two pieces of information (all +32-bit unsigned integers in network byte-order) for each packfile `p` +that is stored in the MIDX, as follows: + +`bitmap_pos`:: The first bit position (in pseudo-pack order) in the + multi-pack index's reachability bitmap occupied by an object from `p`. + +`bitmap_nr`:: The number of bit positions (including the one at + `bitmap_pos`) that encode objects from that pack `p`. + +For example, the `BTMP` chunk corresponding to the above example (with +packs ``a'', ``b'', and ``c'') would look like: + +[cols="1,2,2"] +|=== +| |`bitmap_pos` |`bitmap_nr` + +|packfile ``a'' +|`0` +|`10` + +|packfile ``b'' +|`10` +|`15` + +|packfile ``c'' +|`25` +|`20` +|=== + +With this information in place, we can treat each packfile as +individually reusable in the same fashion as verbatim pack reuse is +performed on individual packs prior to the implementation of the `BTMP` +chunk. 
+ == cruft packs The cruft packs feature offer an alternative to Git's traditional mechanism of diff --git a/midx.c b/midx.c index 8dba67ddbe..de25612b0c 100644 --- a/midx.c +++ b/midx.c @@ -33,6 +33,7 @@ #define MIDX_CHUNK_ALIGNMENT 4 #define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */ +#define MIDX_CHUNKID_BITMAPPEDPACKS 0x42544d50 /* "BTMP" */ #define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */ @@ -41,6 +42,7 @@ #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256) #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t)) #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t)) +#define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t)) #define MIDX_LARGE_OFFSET_NEEDED 0x80000000 #define PACK_EXPIRED UINT_MAX @@ -193,6 +195,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets, &m->chunk_large_offsets_len); + pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS, + (const unsigned char **)&m->chunk_bitmapped_packs, + &m->chunk_bitmapped_packs_len); if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1)) pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex, @@ -286,6 +291,26 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t return 0; } +int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m, + struct bitmapped_pack *bp, uint32_t pack_int_id) +{ + if (!m->chunk_bitmapped_packs) + return error(_("MIDX does not contain the BTMP chunk")); + + if (prepare_midx_pack(r, m, pack_int_id)) + return error(_("could not load bitmapped pack %"PRIu32), pack_int_id); + + bp->p = m->packs[pack_int_id]; + bp->bitmap_pos = get_be32((char *)m->chunk_bitmapped_packs + + MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id); + bp->bitmap_nr = get_be32((char *)m->chunk_bitmapped_packs + + MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * pack_int_id + + 
sizeof(uint32_t)); + bp->pack_int_id = pack_int_id; + + return 0; +} + int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result) { return bsearch_hash(oid->hash, m->chunk_oid_fanout, m->chunk_oid_lookup, @@ -468,10 +493,16 @@ static size_t write_midx_header(struct hashfile *f, return MIDX_HEADER_SIZE; } +#define BITMAP_POS_UNKNOWN (~((uint32_t)0)) + struct pack_info { uint32_t orig_pack_int_id; char *pack_name; struct packed_git *p; + + uint32_t bitmap_pos; + uint32_t bitmap_nr; + unsigned expired : 1; }; @@ -484,6 +515,7 @@ static void fill_pack_info(struct pack_info *info, info->orig_pack_int_id = orig_pack_int_id; info->pack_name = xstrdup(pack_name); info->p = p; + info->bitmap_pos = BITMAP_POS_UNKNOWN; } static int pack_info_compare(const void *_a, const void *_b) @@ -824,6 +856,26 @@ static int write_midx_pack_names(struct hashfile *f, void *data) return 0; } +static int write_midx_bitmapped_packs(struct hashfile *f, void *data) +{ + struct write_midx_context *ctx = data; + size_t i; + + for (i = 0; i < ctx->nr; i++) { + struct pack_info *pack = &ctx->info[i]; + if (pack->expired) + continue; + + if (pack->bitmap_pos == BITMAP_POS_UNKNOWN && pack->bitmap_nr) + BUG("pack '%s' has no bitmap position, but has %d bitmapped object(s)", + pack->pack_name, pack->bitmap_nr); + + hashwrite_be32(f, pack->bitmap_pos); + hashwrite_be32(f, pack->bitmap_nr); + } + return 0; +} + static int write_midx_oid_fanout(struct hashfile *f, void *data) { @@ -991,8 +1043,19 @@ static uint32_t *midx_pack_order(struct write_midx_context *ctx) QSORT(data, ctx->entries_nr, midx_pack_order_cmp); ALLOC_ARRAY(pack_order, ctx->entries_nr); - for (i = 0; i < ctx->entries_nr; i++) + for (i = 0; i < ctx->entries_nr; i++) { + struct pack_midx_entry *e = &ctx->entries[data[i].nr]; + struct pack_info *pack = &ctx->info[ctx->pack_perm[e->pack_int_id]]; + if (pack->bitmap_pos == BITMAP_POS_UNKNOWN) + pack->bitmap_pos = i; + pack->bitmap_nr++; pack_order[i] = 
data[i].nr; + } + for (i = 0; i < ctx->nr; i++) { + struct pack_info *pack = &ctx->info[ctx->pack_perm[i]]; + if (pack->bitmap_pos == BITMAP_POS_UNKNOWN) + pack->bitmap_pos = 0; + } free(data); trace2_region_leave("midx", "midx_pack_order", the_repository); @@ -1293,6 +1356,7 @@ static int write_midx_internal(const char *object_dir, struct hashfile *f = NULL; struct lock_file lk; struct write_midx_context ctx = { 0 }; + int bitmapped_packs_concat_len = 0; int pack_name_concat_len = 0; int dropped_packs = 0; int result = 0; @@ -1505,8 +1569,10 @@ static int write_midx_internal(const char *object_dir, } for (i = 0; i < ctx.nr; i++) { - if (!ctx.info[i].expired) - pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; + if (ctx.info[i].expired) + continue; + pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1; + bitmapped_packs_concat_len += 2 * sizeof(uint32_t); } /* Check that the preferred pack wasn't expired (if given). */ @@ -1566,6 +1632,9 @@ static int write_midx_internal(const char *object_dir, add_chunk(cf, MIDX_CHUNKID_REVINDEX, st_mult(ctx.entries_nr, sizeof(uint32_t)), write_midx_revindex); + add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS, + bitmapped_packs_concat_len, + write_midx_bitmapped_packs); } write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); diff --git a/midx.h b/midx.h index a5d98919c8..b404235db5 100644 --- a/midx.h +++ b/midx.h @@ -7,6 +7,7 @@ struct object_id; struct pack_entry; struct repository; +struct bitmapped_pack; #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX" #define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \ @@ -33,6 +34,8 @@ struct multi_pack_index { const unsigned char *chunk_pack_names; size_t chunk_pack_names_len; + const uint32_t *chunk_bitmapped_packs; + size_t chunk_bitmapped_packs_len; const uint32_t *chunk_oid_fanout; const unsigned char *chunk_oid_lookup; const unsigned char *chunk_object_offsets; @@ -58,6 +61,8 @@ void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m); 
struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local); int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id); +int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m, + struct bitmapped_pack *bp, uint32_t pack_int_id); int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result); off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos); uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos); diff --git a/pack-bitmap.h b/pack-bitmap.h index 5273a6a019..b68b213388 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -52,6 +52,15 @@ typedef int (*show_reachable_fn)( struct bitmap_index; +struct bitmapped_pack { + struct packed_git *p; + + uint32_t bitmap_pos; + uint32_t bitmap_nr; + + uint32_t pack_int_id; /* MIDX only */ +}; + struct bitmap_index *prepare_bitmap_git(struct repository *r); struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx); void count_bitmap_commit_list(struct bitmap_index *, uint32_t *commits, diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index e9a444ddba..e48557aba1 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -100,10 +100,36 @@ static int read_midx_preferred_pack(const char *object_dir) return 0; } +static int read_midx_bitmapped_packs(const char *object_dir) +{ + struct multi_pack_index *midx = NULL; + struct bitmapped_pack pack; + uint32_t i; + + setup_git_directory(); + + midx = load_multi_pack_index(object_dir, 1); + if (!midx) + return 1; + + for (i = 0; i < midx->num_packs; i++) { + if (nth_bitmapped_pack(the_repository, midx, &pack, i) < 0) + return 1; + + printf("%s\n", pack_basename(pack.p)); + printf(" bitmap_pos: %"PRIuMAX"\n", (uintmax_t)pack.bitmap_pos); + printf(" bitmap_nr: %"PRIuMAX"\n", (uintmax_t)pack.bitmap_nr); + } + + close_midx(midx); + + return 0; +} + int cmd__read_midx(int argc, const char **argv) { if (!(argc 
== 2 || argc == 3)) - usage("read-midx [--show-objects|--checksum|--preferred-pack] "); + usage("read-midx [--show-objects|--checksum|--preferred-pack|--bitmap] "); if (!strcmp(argv[1], "--show-objects")) return read_midx_file(argv[2], 1); @@ -111,5 +137,7 @@ int cmd__read_midx(int argc, const char **argv) return read_midx_checksum(argv[2]); else if (!strcmp(argv[1], "--preferred-pack")) return read_midx_preferred_pack(argv[2]); + else if (!strcmp(argv[1], "--bitmap")) + return read_midx_bitmapped_packs(argv[2]); return read_midx_file(argv[1], 0); } diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index c20aafe99a..dd09134db0 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -1171,4 +1171,39 @@ test_expect_success 'reader notices out-of-bounds fanout' ' test_cmp expect err ' +test_expect_success 'bitmapped packs are stored via the BTMP chunk' ' + test_when_finished "rm -fr repo" && + git init repo && + ( + cd repo && + + for i in 1 2 3 4 5 + do + test_commit "$i" && + git repack -d || return 1 + done && + + find $objdir/pack -type f -name "*.idx" | xargs -n 1 basename | + sort >packs && + + git multi-pack-index write --stdin-packs err && + cat >expect <<-\EOF && + error: MIDX does not contain the BTMP chunk + EOF + test_cmp expect err && + + git multi-pack-index write --stdin-packs --bitmap \ + --preferred-pack="$(head -n1 actual && + for i in $(test_seq $(wc -l expect && + test_cmp expect actual + ) +' + test_done From patchwork Thu Dec 14 22:23:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493730 Received: from mail-oa1-f54.google.com (mail-oa1-f54.google.com [209.85.160.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A8FF367215 for ; Thu, 14 Dec 2023 22:23:56 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com
Date: Thu, 14 Dec 2023 17:23:54 -0500
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v2 06/26] midx: implement `midx_locate_pack()`
Message-ID: <96f397a2b2a48db87975cee4789a91e45bd3bc39.1702592604.git.me@ttaylorr.com>

The multi-pack index API exposes a `midx_contains_pack()` function that
takes in a string ending in either ".idx" or ".pack" and returns whether
or not the MIDX contains a given pack corresponding to that string.

There is no corresponding function to locate the position of a pack
within the MIDX's pack order (sorted lexically by pack filename).

We could add an optional out parameter to `midx_contains_pack()` that is
filled out with the pack's position when the parameter is non-NULL. To
minimize the amount of fallout from this change, instead introduce a new
function by renaming `midx_contains_pack()` to `midx_locate_pack()`,
adding that output parameter, and then reimplementing
`midx_contains_pack()` in terms of it.

Future patches will make use of this new function.
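
The shape of this refactoring can be sketched in isolation as follows. This is a simplified illustration, not the code from the patch: `locate_pack()` and `contains_pack()` are hypothetical names, and the sketch compares names with plain `strcmp()` over a lexically sorted array, whereas the real function goes through `cmp_idx_or_pack_name()` so that both ".idx" and ".pack" names match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Binary search over a lexically sorted array of pack names.
 * Returns 1 on a hit and 0 otherwise; `pos` is optional and is only
 * written when the name is found.
 */
int locate_pack(const char **names, uint32_t nr, const char *name,
		uint32_t *pos)
{
	uint32_t first = 0, last = nr;

	while (first < last) {
		uint32_t mid = first + (last - first) / 2;
		int cmp = strcmp(name, names[mid]);

		if (!cmp) {
			if (pos)
				*pos = mid;
			return 1;
		}
		if (cmp > 0)
			first = mid + 1;
		else
			last = mid;
	}
	return 0;
}

/* The boolean query is the locate call with the out-parameter unused. */
int contains_pack(const char **names, uint32_t nr, const char *name)
{
	return locate_pack(names, nr, name, NULL);
}
```

Keeping the old name as a thin wrapper means that existing callers do not need to change, which is the "minimize fallout" point made above.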
Signed-off-by: Taylor Blau --- midx.c | 13 +++++++++++-- midx.h | 5 ++++- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/midx.c b/midx.c index de25612b0c..beaf0c0de4 100644 --- a/midx.c +++ b/midx.c @@ -428,7 +428,8 @@ static int cmp_idx_or_pack_name(const char *idx_or_pack_name, return strcmp(idx_or_pack_name, idx_name); } -int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) +int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name, + uint32_t *pos) { uint32_t first = 0, last = m->num_packs; @@ -439,8 +440,11 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) current = m->pack_names[mid]; cmp = cmp_idx_or_pack_name(idx_or_pack_name, current); - if (!cmp) + if (!cmp) { + if (pos) + *pos = mid; return 1; + } if (cmp > 0) { first = mid + 1; continue; @@ -451,6 +455,11 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) return 0; } +int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) +{ + return midx_locate_pack(m, idx_or_pack_name, NULL); +} + int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local) { struct multi_pack_index *m; diff --git a/midx.h b/midx.h index b404235db5..89c5aa637e 100644 --- a/midx.h +++ b/midx.h @@ -70,7 +70,10 @@ struct object_id *nth_midxed_object_oid(struct object_id *oid, struct multi_pack_index *m, uint32_t n); int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m); -int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name); +int midx_contains_pack(struct multi_pack_index *m, + const char *idx_or_pack_name); +int midx_locate_pack(struct multi_pack_index *m, const char *idx_or_pack_name, + uint32_t *pos); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); /* From patchwork Thu Dec 14 22:23:56 2023 Content-Type: text/plain; 
charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13493732
Date: Thu, 14 Dec 2023 17:23:56 -0500
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v2 07/26] pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions

When trying to assemble a pack with bitmaps using `--use-bitmap-index`,
`pack-objects` asks the pack-bitmap machinery for a bitmap which
indicates the set of objects we can "reuse" verbatim from on-disk. This
set is roughly comprised of: a prefix of objects in the bitmapped pack
(or preferred pack, in the case of a multi-pack reachability bitmap),
plus any other objects not included in the prefix, excluding any deltas
whose base we are not sending in the resulting pack.
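
The selection rule described above can be modelled roughly as follows. This is a simplified sketch and not the pack-bitmap code: `mark_reusable()` and its array-based inputs are hypothetical, and a delta whose base appears later in pack order is conservatively treated as not reusable.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a single pack: object i is wanted when
 * wanted[i] is non-zero, and base[i] is the pack position of its delta
 * base, or -1 when the object is not stored as a delta.
 * Sets reuse[i] to 1 for every reusable object and returns their count.
 */
size_t mark_reusable(const int *wanted, const int *base, size_t nr,
		     unsigned char *reuse)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++) {
		reuse[i] = 0;
		if (!wanted[i])
			continue; /* the caller did not ask for this object */
		if (base[i] >= 0 &&
		    ((size_t)base[i] >= i || !reuse[base[i]]))
			continue; /* delta whose base is not being reused */
		reuse[i] = 1;
		count++;
	}
	return count;
}
```

In this model the leading run of wanted objects plays the role of the "prefix", and the later objects are the ones that need the per-object delta check.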

The pack-bitmap machinery is responsible for computing this bitmap, and
does so with the following functions:

  - reuse_partial_packfile_from_bitmap()
  - try_partial_reuse()

In the existing implementation, the first function is responsible for
(a) marking the prefix of objects in the reusable pack, and then (b)
calling try_partial_reuse() on any remaining objects to ensure that they
are also reusable (and removing them from the bitmapped set if they are
not).

Likewise, the `try_partial_reuse()` function is responsible for checking
whether an isolated object (that is, an object from the bitmapped
pack/preferred pack not contained in the prefix from earlier) may be
reused, i.e. that it isn't a delta of an object that we are not sending
in the resulting pack.

These functions are based on two core assumptions, which we will unwind
in this and the following commits:

  1. There is only a single pack from the bitmap which is eligible for
     verbatim pack-reuse. For single-pack bitmaps, this is trivially the
     bitmapped pack. For multi-pack bitmaps, this is (currently) the
     MIDX's preferred pack.

  2. The pack eligible for reuse has its first object in bit position 0,
     and all objects from that pack follow in pack-order from that first
     bit position.

In order to perform verbatim pack reuse over multiple packs, we must
unwind these two assumptions. Most notably, in order to reuse bits from
a given packfile, we need to know the first bit position occupied by an
object from that packfile. To propagate this information around, pass a
`struct bitmapped_pack *` anywhere we previously passed a `struct
packed_git *`, since the former contains the bitmap position we're
interested in (as well as a pointer to the latter).

As an additional step, factor out a sub-routine from the main
`reuse_partial_packfile_from_bitmap()` function, called
`reuse_partial_packfile_from_bitmap_1()`.
This new function will be responsible for figuring out which objects may be reused from a single pack, and the existing function will dispatch multiple calls to its new helper function for each reusable pack. Consequently, `reuse_partial_packfile_from_bitmap()` will now maintain an array of reusable packs instead of a single such pack. We currently expect that array to have only a single element, so this awkward state is short-lived. It will serve as useful scaffolding in subsequent commits as we begin to work towards enabling multi-pack reuse. Signed-off-by: Taylor Blau --- pack-bitmap.c | 118 +++++++++++++++++++++++++++++++++++++------------- 1 file changed, 87 insertions(+), 31 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index d2f1306960..d64a80c30c 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1836,7 +1836,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, * -1 means "stop trying further objects"; 0 means we may or may not have * reused, but you can keep feeding bits. */ -static int try_partial_reuse(struct packed_git *pack, +static int try_partial_reuse(struct bitmapped_pack *pack, size_t pos, struct bitmap *reuse, struct pack_window **w_curs) @@ -1868,11 +1868,11 @@ static int try_partial_reuse(struct packed_git *pack, * preferred pack precede all bits from other packs. */ - if (pos >= pack->num_objects) + if (pos >= pack->p->num_objects) return -1; /* not actually in the pack or MIDX preferred pack */ - offset = delta_obj_offset = pack_pos_to_offset(pack, pos); - type = unpack_object_header(pack, w_curs, &offset, &size); + offset = delta_obj_offset = pack_pos_to_offset(pack->p, pos); + type = unpack_object_header(pack->p, w_curs, &offset, &size); if (type < 0) return -1; /* broken packfile, punt */ @@ -1888,11 +1888,11 @@ static int try_partial_reuse(struct packed_git *pack, * and the normal slow path will complain about it in * more detail. 
*/ - base_offset = get_delta_base(pack, w_curs, &offset, type, + base_offset = get_delta_base(pack->p, w_curs, &offset, type, delta_obj_offset); if (!base_offset) return 0; - if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0) + if (offset_to_pack_pos(pack->p, base_offset, &base_pos) < 0) return 0; /* @@ -1915,14 +1915,14 @@ static int try_partial_reuse(struct packed_git *pack, * to REF_DELTA on the fly. Better to just let the normal * object_entry code path handle it. */ - if (!bitmap_get(reuse, base_pos)) + if (!bitmap_get(reuse, pack->bitmap_pos + base_pos)) return 0; } /* * If we got here, then the object is OK to reuse. Mark it. */ - bitmap_set(reuse, pos); + bitmap_set(reuse, pack->bitmap_pos + pos); return 0; } @@ -1934,29 +1934,13 @@ uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git) return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0)); } -int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - struct packed_git **packfile_out, - uint32_t *entries, - struct bitmap **reuse_out) +static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git, + struct bitmapped_pack *pack, + struct bitmap *reuse) { - struct repository *r = the_repository; - struct packed_git *pack; struct bitmap *result = bitmap_git->result; - struct bitmap *reuse; struct pack_window *w_curs = NULL; size_t i = 0; - uint32_t offset; - uint32_t objects_nr; - - assert(result); - - load_reverse_index(r, bitmap_git); - - if (bitmap_is_midx(bitmap_git)) - pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; - else - pack = bitmap_git->pack; - objects_nr = pack->num_objects; while (i < result->word_alloc && result->words[i] == (eword_t)~0) i++; @@ -1969,15 +1953,15 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, * we use it instead of another pack. In single-pack bitmaps, the choice * is made for us. 
*/ - if (i > objects_nr / BITS_IN_EWORD) - i = objects_nr / BITS_IN_EWORD; + if (i > pack->p->num_objects / BITS_IN_EWORD) + i = pack->p->num_objects / BITS_IN_EWORD; - reuse = bitmap_word_alloc(i); memset(reuse->words, 0xFF, i * sizeof(eword_t)); for (; i < result->word_alloc; ++i) { eword_t word = result->words[i]; size_t pos = (i * BITS_IN_EWORD); + size_t offset; for (offset = 0; offset < BITS_IN_EWORD; ++offset) { if ((word >> offset) == 0) @@ -2002,6 +1986,78 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, done: unuse_pack(&w_curs); +} + +static int bitmapped_pack_cmp(const void *va, const void *vb) +{ + const struct bitmapped_pack *a = va; + const struct bitmapped_pack *b = vb; + + if (a->bitmap_pos < b->bitmap_pos) + return -1; + if (a->bitmap_pos > b->bitmap_pos) + return 1; + return 0; +} + +int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + struct packed_git **packfile_out, + uint32_t *entries, + struct bitmap **reuse_out) +{ + struct repository *r = the_repository; + struct bitmapped_pack *packs = NULL; + struct bitmap *result = bitmap_git->result; + struct bitmap *reuse; + size_t i; + size_t packs_nr = 0, packs_alloc = 0; + size_t word_alloc; + uint32_t objects_nr = 0; + + assert(result); + + load_reverse_index(r, bitmap_git); + + if (bitmap_is_midx(bitmap_git)) { + for (i = 0; i < bitmap_git->midx->num_packs; i++) { + struct bitmapped_pack pack; + if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) { + warning(_("unable to load pack: '%s', disabling pack-reuse"), + bitmap_git->midx->pack_names[i]); + free(packs); + return -1; + } + if (!pack.bitmap_nr) + continue; /* no objects from this pack */ + if (pack.bitmap_pos) + continue; /* not preferred pack */ + + ALLOC_GROW(packs, packs_nr + 1, packs_alloc); + memcpy(&packs[packs_nr++], &pack, sizeof(pack)); + + objects_nr += pack.p->num_objects; + } + + QSORT(packs, packs_nr, bitmapped_pack_cmp); + } else { + ALLOC_GROW(packs, packs_nr + 1, 
packs_alloc); + + packs[packs_nr].p = bitmap_git->pack; + packs[packs_nr].bitmap_pos = 0; + packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects; + + objects_nr = packs[packs_nr++].p->num_objects; + } + + word_alloc = objects_nr / BITS_IN_EWORD; + if (objects_nr % BITS_IN_EWORD) + word_alloc++; + reuse = bitmap_word_alloc(word_alloc); + + if (packs_nr != 1) + BUG("pack reuse not yet implemented for multiple packs"); + + reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse); *entries = bitmap_popcount(reuse); if (!*entries) { @@ -2014,7 +2070,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, * need to be handled separately. */ bitmap_and_not(result, reuse); - *packfile_out = pack; + *packfile_out = packs[0].p; *reuse_out = reuse; return 0; } From patchwork Thu Dec 14 22:23:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493738 Received: from mail-oo1-f46.google.com (mail-oo1-f46.google.com [209.85.161.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A3D2D67208 for ; Thu, 14 Dec 2023 22:24:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="AosW+pcm" Received: by mail-oo1-f46.google.com with SMTP id 006d021491bc7-58e256505f7so55345eaf.3 for ; Thu, 14 Dec 2023 14:24:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592640; x=1703197440; darn=vger.kernel.org; 
Date: Thu, 14 Dec 2023 17:23:59 -0500
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Patrick Steinhardt, Junio C Hamano
Subject: [PATCH v2 08/26] ewah: implement `bitmap_is_empty()`
Message-ID: <595b3b698615cd341cef49623523c0d3ea1b6802.1702592604.git.me@ttaylorr.com>

In a future commit, we will want to check whether or not a bitmap has
any bits set in any of its words. The best way to do this (prior to the
existence of this patch) is to call `bitmap_popcount()` and check
whether the result is non-zero.

But this is semi-wasteful, since we do not need to know the exact number
of bits set, only whether or not there is at least one of them.

Implement a new helper function to check just that.
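
To make the difference concrete, here is a small standalone sketch of the two approaches. The `mini_bitmap` struct and the `mini_popcount()` and `mini_is_empty()` functions are hypothetical stand-ins using `uint64_t` words, not the ewah types themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bitmap; the real struct bitmap uses eword_t. */
struct mini_bitmap {
	uint64_t *words;
	size_t word_alloc;
};

/* Counts every set bit, so it always has to visit every word. */
size_t mini_popcount(const struct mini_bitmap *b)
{
	size_t i, count = 0;

	for (i = 0; i < b->word_alloc; i++) {
		uint64_t w = b->words[i];

		while (w) {
			w &= w - 1; /* clear the lowest set bit */
			count++;
		}
	}
	return count;
}

/* Stops at the first non-zero word instead of counting bits. */
int mini_is_empty(const struct mini_bitmap *b)
{
	size_t i;

	for (i = 0; i < b->word_alloc; i++)
		if (b->words[i])
			return 0;
	return 1;
}
```

Both functions answer the "is anything set?" question, but only the second one can return early once it sees a non-zero word.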
Suggested-by: Patrick Steinhardt
Signed-off-by: Taylor Blau
---
 ewah/bitmap.c | 9 +++++++++
 ewah/ewok.h   | 1 +
 2 files changed, 10 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 7b525b1ecd..ac7e0af622 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -169,6 +169,15 @@ size_t bitmap_popcount(struct bitmap *self)
 	return count;
 }
 
+int bitmap_is_empty(struct bitmap *self)
+{
+	size_t i;
+	for (i = 0; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+	return 1;
+}
+
 int bitmap_equals(struct bitmap *self, struct bitmap *other)
 {
 	struct bitmap *big, *small;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7eb8b9b630..c11d76c6f3 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -189,5 +189,6 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other);
 
 void bitmap_or(struct bitmap *self, const struct bitmap *other);
 size_t bitmap_popcount(struct bitmap *self);
+int bitmap_is_empty(struct bitmap *self);
 
 #endif

From patchwork Thu Dec 14 22:24:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493734 Received: from mail-oi1-f178.google.com (mail-oi1-f178.google.com [209.85.167.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3FAE82C6B7 for ; Thu, 14 Dec 2023 22:24:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="IByVytpN" Received: by mail-oi1-f178.google.com with SMTP id 5614622812f47-3b9fd22bb1aso41410b6e.2 for ; Thu, 14 Dec 2023 14:24:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592643; x=1703197443; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=nqgbEuvVcDmJPtDsqf7xfDXiJRrRES42+XUwX9wFkfI=; b=IByVytpN1d6IHafe3hxB7VcqUrBzoYNeH8TH0E/+kx/bPA+BKE0i1kZzKdmlqzWhEf 1adAwEoO+yg0Uw40hivX1uxWsMfKc9W0/EewOZ9e4Fdey17TmV1U/r/AWQ/VqFEohi5T eu7L4h37NTGN1YrJbypbbNmrHQXjKjVchye7hRrnpNwylp0ie4Slic5Fw43wTjcCAfd1 hoY/swfdaFX4LxRDDmglTzG1HF1Y1dB89JRGElqZdqmQ7xO7wuiaNyetwAPcSEvI78pm 6J/JGCLiimE2TB1fEhIGdqcdQW6ZvzoFjNyKBmVX6d5aYw3AJQqtQnmYR5CoZVGK2m8t ThBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592643; x=1703197443; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=nqgbEuvVcDmJPtDsqf7xfDXiJRrRES42+XUwX9wFkfI=; b=vMumIgnv2BR1JjU/DLI6oOij+G4EtZGu9exj9RzIn+KSnuPM2CehBWRspV4JsEymSU +gHl/PzTDQltW4jD+lw5xTuQBGRnqkloOkeAnOyNYG5EkqvLEitJu2L5UhEEpweGimH6 rYXe5HDSTJkVdCcHWYMMh3s9wroPZHROUdGtF6SArQ5uGlIjEQTAxl9YTLLERW4PZWOd QyEX70/42wLpul89iLk1HSnwqY3kD0lT6rlnFEJ07e2cLxyZ4KJCFbnUzqhy1z33ckCl EuNsM92rf//1uiONHQNhVC5ZR17u9qAX31GxAVYW3aH0UJYdTrcjp3VhKbtahvl4cp9A xQSA== X-Gm-Message-State: AOJu0YziIK7CxqKtaMKwc14rDAWCPrcsD/PLFmPSVwmmBYrCyU80BWEp izA/mjG0bEwU9EwhGW9dhE3o6aiTAR5lKDgvzhrGSA== X-Google-Smtp-Source: AGHT+IGWsoUe+BmOtlTCdIZ0odAYAkhJuO6cK5h0tAPGwgCQmsIk2yv6a69wMOLR2nNiqKUJOktLUA== X-Received: by 2002:a05:6808:30a3:b0:3b9:e8f4:a488 with SMTP id bl35-20020a05680830a300b003b9e8f4a488mr12710045oib.26.1702592642895; Thu, 14 Dec 2023 14:24:02 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id l3-20020a056808020300b003b8388ffaffsm3535219oie.41.2023.12.14.14.24.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:02 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:01 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 09/26] pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To:

The signature of `reuse_partial_packfile_from_bitmap()` currently takes in a bitmap, as well as three output parameters (filled through pointers, and passed as arguments), and also returns an integer result.

The output parameters are filled out with: (a) the packfile used for pack-reuse, (b) the number of objects from that pack that we can reuse, and (c) a bitmap indicating which objects we can reuse. The return value is either -1 (when there are no objects to reuse), or 0 (when there is at least one object to reuse).

Some of these parameters are redundant. Notably, we can infer from the bitmap how many objects are reused by calling bitmap_popcount(). And we can similarly compute the return value based on that number as well.

As such, clean up the signature of this function to drop the "*entries" parameter, as well as the int return value, since the single caller of this function can infer these values itself.
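As a rough sketch of the calling convention described above (not the actual pack-bitmap code), the following uses hypothetical names (`struct toy_pack`, `toy_pick_reuse()`, `toy_reusable_objects()`) and models the reuse bitmap as a single 64-bit word. The callee returns nothing and leaves its outputs untouched when there is nothing to reuse, and the caller recovers both the old return value and the old "*entries" count from the outputs alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packfile handle. */
struct toy_pack {
	const char *name;
};

/* Number of set bits in one 64-bit word. */
unsigned toy_count_bits(uint64_t word)
{
	unsigned count = 0;

	while (word) {
		word &= word - 1; /* clear the lowest set bit */
		count++;
	}
	return count;
}

/*
 * Simplified signature: no return value and no "entries" output.
 * On "nothing to reuse" the outputs are simply left as the caller set them.
 */
void toy_pick_reuse(struct toy_pack *candidate, uint64_t reusable,
		    struct toy_pack **pack_out, uint64_t *reuse_out)
{
	if (!reusable)
		return;
	*pack_out = candidate;
	*reuse_out = reusable;
}

/* The caller infers success from the pointer and the count from the bits. */
uint32_t toy_reusable_objects(struct toy_pack *candidate, uint64_t reusable)
{
	struct toy_pack *pack = NULL;
	uint64_t reuse = 0;

	toy_pick_reuse(candidate, reusable, &pack, &reuse);
	if (!pack)
		return 0;
	return toy_count_bits(reuse);
}
```

The design relies on the caller initializing its outputs to NULL/zero before the call, since the callee no longer signals failure explicitly.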
Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 16 +++++++++------- pack-bitmap.c | 16 +++++++--------- pack-bitmap.h | 7 +++---- 3 files changed, 19 insertions(+), 20 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 321d7effb0..c3df6d9657 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3943,13 +3943,15 @@ static int get_object_list_from_bitmap(struct rev_info *revs) if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) return -1; - if (pack_options_allow_reuse() && - !reuse_partial_packfile_from_bitmap( - bitmap_git, - &reuse_packfile, - &reuse_packfile_objects, - &reuse_packfile_bitmap)) { - assert(reuse_packfile_objects); + if (pack_options_allow_reuse()) + reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile, + &reuse_packfile_bitmap); + + if (reuse_packfile) { + reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); + if (!reuse_packfile_objects) + BUG("expected non-empty reuse bitmap"); + nr_result += reuse_packfile_objects; nr_seen += reuse_packfile_objects; display_progress(progress_state, nr_seen); diff --git a/pack-bitmap.c b/pack-bitmap.c index d64a80c30c..c75a83e9cc 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -2000,10 +2000,9 @@ static int bitmapped_pack_cmp(const void *va, const void *vb) return 0; } -int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - struct packed_git **packfile_out, - uint32_t *entries, - struct bitmap **reuse_out) +void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + struct packed_git **packfile_out, + struct bitmap **reuse_out) { struct repository *r = the_repository; struct bitmapped_pack *packs = NULL; @@ -2025,7 +2024,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, warning(_("unable to load pack: '%s', disabling pack-reuse"), bitmap_git->midx->pack_names[i]); free(packs); - return -1; + return; } if (!pack.bitmap_nr) continue; /* no objects from this pack */ @@ -2059,10 
+2058,10 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse); - *entries = bitmap_popcount(reuse); - if (!*entries) { + if (bitmap_is_empty(reuse)) { + free(packs); bitmap_free(reuse); - return -1; + return; } /* @@ -2072,7 +2071,6 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, bitmap_and_not(result, reuse); *packfile_out = packs[0].p; *reuse_out = reuse; - return 0; } int bitmap_walk_contains(struct bitmap_index *bitmap_git, diff --git a/pack-bitmap.h b/pack-bitmap.h index b68b213388..ab3fdcde6b 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,10 +78,9 @@ int test_bitmap_hashes(struct repository *r); struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, int filter_provided_objects); uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); -int reuse_partial_packfile_from_bitmap(struct bitmap_index *, - struct packed_git **packfile, - uint32_t *entries, - struct bitmap **reuse_out); +void reuse_partial_packfile_from_bitmap(struct bitmap_index *, + struct packed_git **packfile, + struct bitmap **reuse_out); int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping, kh_oid_map_t *reused_bitmaps, int show_progress); void free_bitmap_index(struct bitmap_index *); From patchwork Thu Dec 14 22:24:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493737 Received: from mail-ot1-f49.google.com (mail-ot1-f49.google.com [209.85.210.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E73FE66AC4 for ; Thu, 14 Dec 2023 22:24:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none 
smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="0dSclZ5q" Received: by mail-ot1-f49.google.com with SMTP id 46e09a7af769-6d9f879f784so60338a34.2 for ; Thu, 14 Dec 2023 14:24:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592645; x=1703197445; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=lt9BCHjvdrmObONlTkDEC+OLN2lDb4n9PEi2jy5LUrs=; b=0dSclZ5qlfX44SmLucj43/mdzJAshHwCgEezrWOlLgS734dTgxJp7rY1+6Y4O7ih91 LKP4HGq9rWQfMJkSVOKoV63SD41PttUo6CD0R7KAQCEFVMLYz9DtqZ2WT6aYl1dhGGRG 2rpBe5G6rAJDcnQFlR7BLAF6NkFKdSzDNspcJiGx8AH0+uQmbPjrxv0aiuea+4dH4n1h 4O8TkhPd9HniSbi4+kCNAoICMicJujX8XsYZlmpT6S88Y5AgCKmJbSvbq1pFrdmJLb5D klX/Tbrl91+nXus4MYxtpGyRiYEoxNwkFvhvHqMpTaMZnHJe9qTqHTYowPVkvxv4s7hF fJyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592645; x=1703197445; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=lt9BCHjvdrmObONlTkDEC+OLN2lDb4n9PEi2jy5LUrs=; b=FDd6rMXIBSB7VKuqX99jWalKUzlnbOhiTRyhfj0x0/tHQjDYoMbW0TqGKg5ZreIMMm yWdFgR1tiVDn3RqHuPY4sGWQv50rvbRX6gcGpo2PV95ZlgnzYqSUIN9+KZbz/KxUgu5k ihvxJSymwqQl1GnI+kZPrLGmP5RzdolnQme4e0B+qRZdXuig15tMmb1WuAhDyUIIcSeT 7CaPGEyUjv4x8mCyb3MiBSxrkuD5CzZkk1/hVMg3PPMwF6DqIn+k7OzKUCet/8XFj1gz bkE8rulD1eQAEZiyZMHdDUDCz0R0vB+VQ3zLjqQneslbT/Ho/pihtOronLvheAKkJAAz KsQw== X-Gm-Message-State: AOJu0YxuG3Xm2yj13ReJuBtLv33Q7jZaiA98I9VAuhAlkxNlJDyhX943 /oOwZoN+IINfaUL6WjIzrIBT1W2eQfZ3ZgvYCnST4w== X-Google-Smtp-Source: AGHT+IGOVzyHRaunzRx8BM1ChCZ5VzOp3+kyGJS5FgtIZ/JNs3bbSAazbwDZOXCgHQSSSepJ7I202g== X-Received: by 2002:a9d:6a10:0:b0:6d9:e01c:6b79 with SMTP 
id g16-20020a9d6a10000000b006d9e01c6b79mr10756559otn.30.1702592645492; Thu, 14 Dec 2023 14:24:05 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id da15-20020a0568306a8f00b006ce28044207sm1098118otb.58.2023.12.14.14.24.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:05 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:04 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 10/26] pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Further prepare for enabling verbatim pack-reuse over multiple packfiles by changing the signature of reuse_partial_packfile_from_bitmap() to populate an array of `struct bitmapped_pack *`'s instead of a pointer to a single packfile. Since the array we're filling out is sized dynamically[^1], add an additional `size_t *` parameter which will hold the number of reusable packs (equal to the number of elements in the array). Note that since we still have not implemented true multi-pack reuse, these changes aren't propagated out to the rest of the caller in builtin/pack-objects.c. In the interim state, we expect that the array has a single element, and we use that element to fill out the static `reuse_packfile` variable (which is a bog-standard `struct packed_git *`). Future commits will continue to push this change further out through the pack-objects code. [^1]: That is, even though we know the number of packs which are candidates for pack-reuse, we do not know how many of those candidates we can actually reuse. 
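The array-plus-count output convention described above can be sketched as follows. This is an illustration under stated assumptions, not the series' implementation: `struct toy_bitmapped_pack` and `toy_collect_reusable()` are hypothetical, and the array is sized up front for all candidates with plain malloc() purely for brevity. It shows why the count has to be reported separately: the number of candidates is known, but the number that survive filtering is not.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for a bitmapped pack. */
struct toy_bitmapped_pack {
	int id;
	uint32_t bitmap_nr; /* objects contributed by this pack; 0 means skip */
};

/*
 * Fills *packs_out with a heap-allocated array of the reusable candidates
 * and *packs_nr_out with its length. Both outputs are left untouched when
 * nothing is reusable. The caller owns (and must free) the array.
 */
void toy_collect_reusable(const struct toy_bitmapped_pack *candidates,
			  size_t candidates_nr,
			  struct toy_bitmapped_pack **packs_out,
			  size_t *packs_nr_out)
{
	struct toy_bitmapped_pack *packs = NULL;
	size_t packs_nr = 0, i;

	if (candidates_nr)
		packs = malloc(candidates_nr * sizeof(*packs));
	if (!packs)
		return;

	for (i = 0; i < candidates_nr; i++) {
		if (!candidates[i].bitmap_nr)
			continue; /* no objects from this pack */
		packs[packs_nr++] = candidates[i];
	}

	if (!packs_nr) {
		free(packs);
		return;
	}

	*packs_out = packs;
	*packs_nr_out = packs_nr;
}
```

A caller that only supports a single reusable pack for now can read element 0 and ignore the rest, which mirrors the interim state the message describes.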
Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 9 +++++++-- pack-bitmap.c | 6 ++++-- pack-bitmap.h | 5 +++-- 3 files changed, 14 insertions(+), 6 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index c3df6d9657..87e16636a8 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3940,14 +3940,19 @@ static int pack_options_allow_reuse(void) static int get_object_list_from_bitmap(struct rev_info *revs) { + struct bitmapped_pack *packs = NULL; + size_t packs_nr = 0; + if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) return -1; if (pack_options_allow_reuse()) - reuse_partial_packfile_from_bitmap(bitmap_git, &reuse_packfile, + reuse_partial_packfile_from_bitmap(bitmap_git, &packs, + &packs_nr, &reuse_packfile_bitmap); - if (reuse_packfile) { + if (packs) { + reuse_packfile = packs[0].p; reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); if (!reuse_packfile_objects) BUG("expected non-empty reuse bitmap"); diff --git a/pack-bitmap.c b/pack-bitmap.c index c75a83e9cc..4d5a484678 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -2001,7 +2001,8 @@ static int bitmapped_pack_cmp(const void *va, const void *vb) } void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, - struct packed_git **packfile_out, + struct bitmapped_pack **packs_out, + size_t *packs_nr_out, struct bitmap **reuse_out) { struct repository *r = the_repository; @@ -2069,7 +2070,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, * need to be handled separately. 
*/ bitmap_and_not(result, reuse); - *packfile_out = packs[0].p; + *packs_out = packs; + *packs_nr_out = packs_nr; *reuse_out = reuse; } diff --git a/pack-bitmap.h b/pack-bitmap.h index ab3fdcde6b..7a12a2ce81 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,8 +78,9 @@ int test_bitmap_hashes(struct repository *r); struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, int filter_provided_objects); uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); -void reuse_partial_packfile_from_bitmap(struct bitmap_index *, - struct packed_git **packfile, +void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, + struct bitmapped_pack **packs_out, + size_t *packs_nr_out, struct bitmap **reuse_out); int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping, kh_oid_map_t *reused_bitmaps, int show_progress); From patchwork Thu Dec 14 22:24:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493735 Received: from mail-oa1-f52.google.com (mail-oa1-f52.google.com [209.85.160.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB8C32C697 for ; Thu, 14 Dec 2023 22:24:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="Ca1j7Pv2" Received: by mail-oa1-f52.google.com with SMTP id 586e51a60fabf-20335dcec64so31881fac.3 for ; Thu, 14 Dec 2023 14:24:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592648; x=1703197448; 
darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=dmySEqTNnPQFnlPQzIl5fkFaXq8X0shgNEuk2aLPPK8=; b=Ca1j7Pv2IMoFQuN3vQAQkCJi6kU3JRN4HQJMTocWtnSri8ovmc2BJjzjFnAqWfGsoZ 0lhRDCty6bGG6iL4vStgkzYG4fPalkMdsL4yx4597KRmwaDTz0cSmKqSmw1xUwdAUh/U qh/xAcBqd9obawviLyEB5/L3mPTQASbTybXIilOmZuFUgtQ1b5X7w/wJWOljhmj4xz3k GtZ0jvnMcAH5uTs91y3OghZmOGFw8A2iQ1yw54dRJvdMAoXdPxwVsr5skjxHUh79F41Y lg/03dmjFMrqRLYP6ZsVrywBu7mHyZ6A1tuqq1MU5gMjlyBSeAL9fqqDMh4gMUFavmPe dIRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592648; x=1703197448; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=dmySEqTNnPQFnlPQzIl5fkFaXq8X0shgNEuk2aLPPK8=; b=wBG3TRX9VBYU2l6cl0vhDSuPWgyhM01krr/EBvHi/INM1F66pM0OlBEzGIXu6WnHID vSaVMbEmAFezf/1cKMkgtTFCb9/4SiLFqcT9qksykiD17crt+BcbM/Nkq9FqKDHGTs0I PNe8/uwraMqv8X/8aM/w1+w6M6kxXIZY9tGIFRVeFuzxwybVzsXZNE1wA31TVByyTnQg x7BkjcTv+uR18LogYYuHJ298cj0BhVhD7tNKP4UbAKF9rEd803bs3nKJ0frT9LV6YBT5 5ann9da1lFVCcflCmMy5B1L+xrg+bAwJMWKLcAFQLIesmkAqLikIlZLT+Fagv1w2s5fi PVpw== X-Gm-Message-State: AOJu0Yy/B7CYWBo4LG8GXHiKALqna6eaFrqnykDmXAJh1thSC2P6lU5n ttlhosBgT7JY/ePtFAd5X2myeuO/bzJ/3nHcAYhnZg== X-Google-Smtp-Source: AGHT+IEGg8n6tyr6h/j1LGosNe7opT0uBwgC6oQp/rLuS6/Q/NdsSAGswpxupivCJmGDlGq1cDQB3Q== X-Received: by 2002:a05:6871:e407:b0:203:1696:1dd1 with SMTP id py7-20020a056871e40700b0020316961dd1mr4569726oac.100.1702592648191; Thu, 14 Dec 2023 14:24:08 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id m18-20020a9d6ad2000000b006d7e23c58b6sm3344242otq.38.2023.12.14.14.24.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:08 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:07 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 11/26] pack-objects: parameterize pack-reuse routines over a single pack Message-ID: <67e4fd8a061fdc008031f6d3746fd65664965ff2.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To:

The routines pack-objects uses to perform verbatim pack-reuse are:

  - write_reused_pack_one()
  - write_reused_pack_verbatim()
  - write_reused_pack()

All of these assume that there is exactly one packfile being reused: the global variable `reuse_packfile`.

Prepare for reusing objects from multiple packs by making the reuse packfile a parameter of each of the above functions, so that they can later be called in a loop over multiple packfiles.

Note that we still have the global "reuse_packfile", but pass it through each of the above functions' parameter lists, eliminating all but one direct access (the top-level caller in `write_pack_file()`). Even after this series, we will still have a global, but it will hold the array of reusable packfiles, and we'll pass them one at a time to these functions in a loop.

Note also that we will eventually need to pass a `bitmapped_pack` instead of a `packed_git` in order to hold onto additional information required for reuse (such as the bit position of the first object belonging to that pack). But that change will be made in a future commit so as to minimize the noise below as much as possible.
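The shape of this refactoring can be sketched in a few lines. The names below (`struct toy_pack`, `toy_write_reused_pack()`, `toy_write_all()`) are hypothetical and the "writing" is reduced to counting objects; the sketch only shows that once a routine takes its pack as an argument instead of reading a file-scope global, a top-level caller can invoke it once per pack.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reusable packfile. */
struct toy_pack {
	size_t nr_objects;
};

/*
 * Takes the pack as a parameter rather than reading a global, so it
 * makes no assumption about how many packs exist. Returns the running
 * total of objects "written" so far.
 */
size_t toy_write_reused_pack(const struct toy_pack *pack, size_t written)
{
	return written + pack->nr_objects;
}

/* Top-level caller: the only place that knows about the whole array. */
size_t toy_write_all(const struct toy_pack *packs, size_t packs_nr)
{
	size_t i, written = 0;

	for (i = 0; i < packs_nr; i++)
		written = toy_write_reused_pack(&packs[i], written);
	return written;
}
```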
Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 87e16636a8..102fe9a4f8 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1013,7 +1013,8 @@ static off_t find_reused_offset(off_t where) return reused_chunks[lo-1].difference; } -static void write_reused_pack_one(size_t pos, struct hashfile *out, +static void write_reused_pack_one(struct packed_git *reuse_packfile, + size_t pos, struct hashfile *out, struct pack_window **w_curs) { off_t offset, next, cur; @@ -1091,7 +1092,8 @@ static void write_reused_pack_one(size_t pos, struct hashfile *out, copy_pack_data(out, reuse_packfile, w_curs, offset, next - offset); } -static size_t write_reused_pack_verbatim(struct hashfile *out, +static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, + struct hashfile *out, struct pack_window **w_curs) { size_t pos = 0; @@ -1118,14 +1120,15 @@ static size_t write_reused_pack_verbatim(struct hashfile *out, return pos; } -static void write_reused_pack(struct hashfile *f) +static void write_reused_pack(struct packed_git *reuse_packfile, + struct hashfile *f) { size_t i = 0; uint32_t offset; struct pack_window *w_curs = NULL; if (allow_ofs_delta) - i = write_reused_pack_verbatim(f, &w_curs); + i = write_reused_pack_verbatim(reuse_packfile, f, &w_curs); for (; i < reuse_packfile_bitmap->word_alloc; ++i) { eword_t word = reuse_packfile_bitmap->words[i]; @@ -1141,7 +1144,8 @@ static void write_reused_pack(struct hashfile *f) * bitmaps. See comment in try_partial_reuse() * for why. 
*/ - write_reused_pack_one(pos + offset, f, &w_curs); + write_reused_pack_one(reuse_packfile, pos + offset, f, + &w_curs); display_progress(progress_state, ++written); } } @@ -1199,7 +1203,7 @@ static void write_pack_file(void) if (reuse_packfile) { assert(pack_to_stdout); - write_reused_pack(f); + write_reused_pack(reuse_packfile, f); offset = hashfile_total(f); } From patchwork Thu Dec 14 22:24:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493736 Received: from mail-ot1-f48.google.com (mail-ot1-f48.google.com [209.85.210.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2BA806720B for ; Thu, 14 Dec 2023 22:24:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="NDqSAlKb" Received: by mail-ot1-f48.google.com with SMTP id 46e09a7af769-6d9f9fbfd11so61217a34.2 for ; Thu, 14 Dec 2023 14:24:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592651; x=1703197451; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=IhGfCoK2tH7CSsD31Uea+efJ+DNSx0N9h9XZ47wrw/8=; b=NDqSAlKbMExTVmQze3Hq4goJEsanXcRA7TsntaRbwki6o5YCG1vrYFLtA4oLnqEzo5 lKU06P4p3EwoyVQNk/ZhXFq+P9fNWwvoIv1pGL10sVOBE550aZU8bwNkj0u7F9gyZgAg lr9AXnk/7zDPlN18xQ8hByE66PewnyENC+faIci8r+Uiuzto8fhVPIRdjkJTJiYWSs0J q/BC4CsB4XPKUjbUPbz1ZmrN+yg0Ps/yaLjIRgvqL5gUX8bMw55fulvh5tAAbIC/E9Qf 
LK2ZZ4ORy8glF25rLI5ZaoeyS7v8/5iCAc0Ic4HSZrHUgUnfLRJ14iWtKmQYGI+3fjGF gzzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592651; x=1703197451; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=IhGfCoK2tH7CSsD31Uea+efJ+DNSx0N9h9XZ47wrw/8=; b=G2lJybYh2VJ1Z9VqwlydzhB006K+HXu2q+JGdkjv2jIdpTUoRX5TyldwTbdq6hScMD FT60pfhTr+UbtHyjxFjQCD18/SKIAGynE0tDDXWqjvpMzLWFovJyPhTvYIK3AzQ36cE7 QfR7qpEuCjy67OVbH+0kmlaClARxjI2hjrPopRtgnZsnfSsDd2wr0EE4jvLGXwxLOjS4 OG1shVrRMMWLviAX8MbA1SguLuz6KtB6CPbANjEANvYUbRKXWF3mR4xtwoVvZWoKdI9k kEnMzICV5Fd820H2WR/d2MKn/VrKciYLEvi6uz2B7K5Pmweg5KKobDR4DavR5PHSi2Ko WE8g== X-Gm-Message-State: AOJu0Yy2+MqjgpCmMs3LjCA0/92VCPTHBapFlaLHs3jMY8QL4QAITuTs d3UAlj+qcazC6zXzESYxXUqcOqonxtgMxh/KxKkr5A== X-Google-Smtp-Source: AGHT+IGC6oS2gmVri538ikrtq0Co2V9TNixeYvSH6cThmAt5OjmjsK/6kjOyVaBSJshjYmTp5txW9A== X-Received: by 2002:a05:6830:1395:b0:6d8:7a4e:37c1 with SMTP id d21-20020a056830139500b006d87a4e37c1mr9095638otq.9.1702592650784; Thu, 14 Dec 2023 14:24:10 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id cn14-20020a056830658e00b006d87e38f91asm920385otb.56.2023.12.14.14.24.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:10 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:09 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 12/26] pack-objects: keep track of `pack_start` for each reuse pack Message-ID: <9a5c38514bb6dedc44c28bab71a98fe1c0150ecb.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To:

When reusing objects from a pack, we keep track of a set of one or more `reused_chunk`s, corresponding to sections of one or more object(s) from a source pack that we are reusing. Each chunk contains two pieces of information:

  - the offset of the first object in the source pack (relative to the beginning of the source pack)
  - the difference between that offset, and the corresponding offset in the pack we're generating

The purpose of keeping track of these is so that we can patch any OFS_DELTAs that cross over a section of the reuse pack that we didn't take.

For instance, consider a hypothetical pack as shown below:

                                                (chunk #2)
                                                __________...
                                               /
                                              /
      +--------+---------+-------------------+---------+
  ... | <base> | <other> |      (unused)     | <delta> | ...
      +--------+---------+-------------------+---------+
       \                /
        \______________/
           (chunk #1)

Suppose that we are sending objects "base", "other", and "delta", and that the "delta" object is stored as an OFS_DELTA, and that its base is "base". If we don't send any objects in the "(unused)" range, we can't copy the delta'd object directly, since its delta offset includes a range of the pack that we didn't copy, so we have to account for that difference when patching and reassembling the delta.
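To make the arithmetic in the example above concrete, here is a small sketch with made-up byte offsets. The names (`struct toy_chunk`, `toy_find_difference()`, `toy_patched_delta_offset()`) are hypothetical, and the lookup is a linear scan for brevity rather than whatever search the real code uses. Each chunk records its source offset and the difference between source and output offsets; the patched OFS_DELTA distance is the original distance minus the gap that was skipped between base and delta.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk record, mirroring the two fields described above. */
struct toy_chunk {
	int64_t original;   /* offset of the chunk's first byte in the source pack */
	int64_t difference; /* source offset minus offset in the generated pack */
};

/*
 * Returns the difference recorded for the chunk that contains `where`.
 * Chunks are assumed to be sorted by `original`.
 */
int64_t toy_find_difference(const struct toy_chunk *chunks, size_t nr,
			    int64_t where)
{
	int64_t difference = 0;
	size_t i;

	for (i = 0; i < nr && chunks[i].original <= where; i++)
		difference = chunks[i].difference;
	return difference;
}

/*
 * Distance from a delta back to its base in the generated pack, given
 * both objects' offsets in the source pack.
 */
int64_t toy_patched_delta_offset(const struct toy_chunk *chunks, size_t nr,
				 int64_t delta_at, int64_t base_at)
{
	int64_t fixup = toy_find_difference(chunks, nr, delta_at) -
			toy_find_difference(chunks, nr, base_at);

	return (delta_at - base_at) - fixup;
}
```

With illustrative numbers: if "base" sits at source offset 12, "other" at 112, the unused range covers 212 through 511, and "delta" sits at 512, then chunk #1 is {12, 0} and chunk #2 is {512, 300}. The delta's stored distance of 500 becomes 200 in the generated pack, while a delta inside chunk #1 needs no adjustment.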
In order to compute this value correctly, we need to know not only where we are in the packfile we're assembling (with `hashfile_total(f)`) but also the position of the first byte of the packfile that we are currently reusing. Currently, this works just fine, since when reusing only a single pack those two values are always identical (because verbatim reuse is the first thing pack-objects does when enabled after writing the pack header). But when reusing multiple packs which have one or more gaps, we'll need to account for these two values diverging. Together, these two allow us to compute the reused chunk's offset difference relative to the start of the reused pack, as desired. Helped-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 102fe9a4f8..f51b86d99f 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1015,6 +1015,7 @@ static off_t find_reused_offset(off_t where) static void write_reused_pack_one(struct packed_git *reuse_packfile, size_t pos, struct hashfile *out, + off_t pack_start, struct pack_window **w_curs) { off_t offset, next, cur; @@ -1024,7 +1025,8 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, offset = pack_pos_to_offset(reuse_packfile, pos); next = pack_pos_to_offset(reuse_packfile, pos + 1); - record_reused_object(offset, offset - hashfile_total(out)); + record_reused_object(offset, + offset - (hashfile_total(out) - pack_start)); cur = offset; type = unpack_object_header(reuse_packfile, w_curs, &cur, &size); @@ -1094,6 +1096,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, struct hashfile *out, + off_t pack_start UNUSED, struct pack_window **w_curs) { size_t pos = 0; @@ -1125,10 +1128,12 @@ static void write_reused_pack(struct packed_git *reuse_packfile, { 
size_t i = 0; uint32_t offset; + off_t pack_start = hashfile_total(f) - sizeof(struct pack_header); struct pack_window *w_curs = NULL; if (allow_ofs_delta) - i = write_reused_pack_verbatim(reuse_packfile, f, &w_curs); + i = write_reused_pack_verbatim(reuse_packfile, f, pack_start, + &w_curs); for (; i < reuse_packfile_bitmap->word_alloc; ++i) { eword_t word = reuse_packfile_bitmap->words[i]; @@ -1145,7 +1150,7 @@ static void write_reused_pack(struct packed_git *reuse_packfile, * for why. */ write_reused_pack_one(reuse_packfile, pos + offset, f, - &w_curs); + pack_start, &w_curs); display_progress(progress_state, ++written); } } From patchwork Thu Dec 14 22:24:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493739 Received: from mail-oa1-f46.google.com (mail-oa1-f46.google.com [209.85.160.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CCAF72C696 for ; Thu, 14 Dec 2023 22:24:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="iKBW55nu" Received: by mail-oa1-f46.google.com with SMTP id 586e51a60fabf-1f055438492so30902fac.3 for ; Thu, 14 Dec 2023 14:24:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592653; x=1703197453; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=uwZrOkhPjBG+WXIE3BSRWg+0jUXkdoVE/GCNZunL+Ao=; 
b=iKBW55nueRrlOKzdpuczCs5EjU1IgT9ibzCd+hkjdcDiV/4xnUpmC0djPI3fX57xHh lRZQ934LZjh71v4GuE7jh6e3nNFtjmwlwmLF1h2edTPsEVaqhyfZxh0JDApjljt0NMHv Cffpk6Ir4uuMxiyodSa+/FB8DfZIEUNmD71kW/Xu+VywB69OorIef9jnjChVl0TDfpoW ERGFulzqCXKaMtsliNSwdqfsOKDuTbnkwvdpPRkIRnqkvMbzRr9lv7TMndfCOYsVQ9qP DI3IW6gVYDnJioHyvBFxpqtguvVgSamczaZFR+jRza5LRfb7i6Suc1KZxAVz1LGSTX7N qTMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592653; x=1703197453; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=uwZrOkhPjBG+WXIE3BSRWg+0jUXkdoVE/GCNZunL+Ao=; b=eQErEHLXl/x3oQJh8dW7P1OsKh8wlZygO0HCJSI8OCbadnk9cnEIniej5im83jIHRU CvLLbPTAn1f0HAv580AKnCeor+1HrKLjVMdfaxQqa+c2j4w6Q14kK3qmqsV5ekFUyqpV m3PG127SfyqB4fIce2XN37jGRrnuZhT17yXh8+ahRlbRnFUY5fvXYHivmnw7BaqHuy4q ki7Ajt0Ecw/72JkBz7uHTxe79sF7QRC5JvgmIK92Q8qPgpzdEn6++xl71DLjYpZ++OrJ 4P8dGLwj43AyTyJ9t1nrApgektkTKJfU8g0yDTmtWNZLV2WsrXkpXsLoNF+4kNRjDv/L /lXw== X-Gm-Message-State: AOJu0YwMvOL0ZP33ZEMg5oWpc5orgv6xGNr0QnBFO6rT92q+IAO9VOUF qTBg/O+ej4KMJxXlLgDONxAGBQgJx+EapFWwV3mLpA== X-Google-Smtp-Source: AGHT+IEdyCR7JZMl1zY9/OpBk04tul+0I1P5ebKE2fh75HhvE6RWcQs08teGqoE11wPSSVVj1ZACNg== X-Received: by 2002:a05:6870:d91b:b0:1fa:f6e2:6fc7 with SMTP id gq27-20020a056870d91b00b001faf6e26fc7mr13328963oab.29.1702592653372; Thu, 14 Dec 2023 14:24:13 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id pp23-20020a0568709d1700b001fa3c734bc5sm4733200oab.46.2023.12.14.14.24.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:13 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:12 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 13/26] pack-objects: pass `bitmapped_pack`'s to pack-reuse functions Message-ID: <5492d11f25ec54c8eff59e3fa266abd766a1f40c.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Further prepare pack-objects to perform verbatim pack-reuse over multiple packfiles by converting functions that take in a pointer to a `struct packed_git` to instead take in a pointer to a `struct bitmapped_pack`. The additional information found in the bitmapped_pack struct (such as the bit position corresponding to the beginning of the pack) will be necessary in order to perform verbatim pack-reuse. Note that we don't use any of the extra pieces of information contained in the bitmapped_pack struct, so this step is merely preparatory and does not introduce any functional changes. Note further that we do not change the argument type to write_reused_pack_one(). That function is responsible for copying sections of the packfile directly and optionally patching any OFS_DELTAs to account for not reusing sections of the packfile in between a delta and its base. As such, that function is (and should remain) oblivious to multi-pack reuse, and does not require any of the extra pieces of information stored in the bitmapped_pack struct. 
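The split described above (multi-pack-aware callers take the wrapper, while the low-level copier keeps taking the plain pack) can be sketched in isolation. This is only an illustration under assumptions: the structs below are reduced stand-ins rather than Git's real definitions, the field types are guesses, and `copy_from_pack()` and `reuse_from_pack()` are hypothetical names that do not exist in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for Git's struct packed_git (assumption: only a name). */
struct packed_git {
	const char *pack_name;
};

/*
 * Reduced stand-in for struct bitmapped_pack. The members p, bitmap_pos
 * and bitmap_nr are the ones the series dereferences; the types used
 * here are assumptions made for the sake of the sketch.
 */
struct bitmapped_pack {
	struct packed_git *p;	/* the underlying pack */
	uint32_t bitmap_pos;	/* first bit owned by this pack */
	uint32_t bitmap_nr;	/* number of bits owned by this pack */
};

/*
 * Hypothetical low-level helper: like write_reused_pack_one(), it only
 * needs the pack itself and stays oblivious to multi-pack reuse.
 */
static const char *copy_from_pack(const struct packed_git *p)
{
	return p->pack_name;
}

/*
 * Hypothetical multi-pack-aware caller: it receives the wrapper and
 * unwraps it with ->p before calling the low-level helper.
 */
static const char *reuse_from_pack(const struct bitmapped_pack *pack)
{
	return copy_from_pack(pack->p);
}
```

Keeping the low-level helper on the plain pack type means the extra bitmap bookkeeping is only visible to the callers that will eventually need it.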
Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index f51b86d99f..07c849b5d4 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -221,7 +221,8 @@ static int thin; static int num_preferred_base; static struct progress *progress_state; -static struct packed_git *reuse_packfile; +static struct bitmapped_pack *reuse_packfiles; +static size_t reuse_packfiles_nr; static uint32_t reuse_packfile_objects; static struct bitmap *reuse_packfile_bitmap; @@ -1094,7 +1095,7 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, copy_pack_data(out, reuse_packfile, w_curs, offset, next - offset); } -static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, +static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile, struct hashfile *out, off_t pack_start UNUSED, struct pack_window **w_curs) @@ -1109,13 +1110,13 @@ static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, off_t to_write; written = (pos * BITS_IN_EWORD); - to_write = pack_pos_to_offset(reuse_packfile, written) + to_write = pack_pos_to_offset(reuse_packfile->p, written) - sizeof(struct pack_header); /* We're recording one chunk, not one object. */ record_reused_object(sizeof(struct pack_header), 0); hashflush(out); - copy_pack_data(out, reuse_packfile, w_curs, + copy_pack_data(out, reuse_packfile->p, w_curs, sizeof(struct pack_header), to_write); display_progress(progress_state, written); @@ -1123,7 +1124,7 @@ static size_t write_reused_pack_verbatim(struct packed_git *reuse_packfile, return pos; } -static void write_reused_pack(struct packed_git *reuse_packfile, +static void write_reused_pack(struct bitmapped_pack *reuse_packfile, struct hashfile *f) { size_t i = 0; @@ -1149,8 +1150,8 @@ static void write_reused_pack(struct packed_git *reuse_packfile, * bitmaps. 
See comment in try_partial_reuse() * for why. */ - write_reused_pack_one(reuse_packfile, pos + offset, f, - pack_start, &w_curs); + write_reused_pack_one(reuse_packfile->p, pos + offset, + f, pack_start, &w_curs); display_progress(progress_state, ++written); } } @@ -1206,9 +1207,12 @@ static void write_pack_file(void) offset = write_pack_header(f, nr_remaining); - if (reuse_packfile) { + if (reuse_packfiles_nr) { assert(pack_to_stdout); - write_reused_pack(reuse_packfile, f); + for (j = 0; j < reuse_packfiles_nr; j++) { + reused_chunks_nr = 0; + write_reused_pack(&reuse_packfiles[j], f); + } offset = hashfile_total(f); } @@ -3949,19 +3953,16 @@ static int pack_options_allow_reuse(void) static int get_object_list_from_bitmap(struct rev_info *revs) { - struct bitmapped_pack *packs = NULL; - size_t packs_nr = 0; - if (!(bitmap_git = prepare_bitmap_walk(revs, 0))) return -1; if (pack_options_allow_reuse()) - reuse_partial_packfile_from_bitmap(bitmap_git, &packs, - &packs_nr, + reuse_partial_packfile_from_bitmap(bitmap_git, + &reuse_packfiles, + &reuse_packfiles_nr, &reuse_packfile_bitmap); - if (packs) { - reuse_packfile = packs[0].p; + if (reuse_packfiles) { reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap); if (!reuse_packfile_objects) BUG("expected non-empty reuse bitmap"); From patchwork Thu Dec 14 22:24:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493740 Received: from mail-ot1-f54.google.com (mail-ot1-f54.google.com [209.85.210.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 353F36720C for ; Thu, 14 Dec 2023 22:24:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="MbJqBnBa" Received: by mail-ot1-f54.google.com with SMTP id 46e09a7af769-6d9f069e9b0so63447a34.3 for ; Thu, 14 Dec 2023 14:24:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592656; x=1703197456; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=W4XNhNNhFFLFDKKlC4eR3SMBA65aA/jbaNCQOeCQ6bI=; b=MbJqBnBap7jVQLodTgtN3gETHcybHbJtrT1Rurh78JrlvRvEGGZxsyPZbQuH614zTE nF/K1/jVd1CvyRrMxztP1oFRSnC51Yv71QiKVqRX0csMCnJ2DYXWv75+2aXxDY20OazB 1k/wzF1N1797+ubIeOkQNZGxe1O3pV652fuuFGJmLNh9u2TN65cLWhys/AHWtWxal4ib I3UWXjaXtaq3ufiLR+809Yu5yeFxUnFD0rd412TqtLDEuoZkPLM1GjK3Huz1LOfF3njn IgJQX2vJ8RKkGZcaUB9Fm+YqoghsItBiY/w6ZXH206zWqGti1A6IKU7ctQ91OAKxnNNT 1FgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592656; x=1703197456; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=W4XNhNNhFFLFDKKlC4eR3SMBA65aA/jbaNCQOeCQ6bI=; b=wc3NZnTTr+wg79NScjAIWUixpjP61bJz+yrQtNvlioR0yCBRNRW7rVrXNaRlLllvyP fK0h4O4FNIKECsMCO58blpxXQc81V9kf3cMqnBevijljBH8THnbZ/1dEIm6+z5ZFXH3j GHmzYSf0+sLVds58brKgTmxKrEmzqAlnAyv2GVtTeTmn8ovZeMXiGXQ76mE/kEv/APKK Uv4Ua0fPgv/zgyjx9GCTAg6wYxne4LWATAWKuDzj5o1yEznxXi+MPkXJ6DroEARzMaYF KPMCPZ481teNfqkSk/3PrISF6uIXxfZeHHkQNk37g0MduOfDbumZk09zKTEWkF3WJ80V SjFg== X-Gm-Message-State: AOJu0YxExFEVsLbDjgiIG3HSHXaKJIt/GPwqYn1dRxE9rTT8zKm8KTm3 2LtuSa3Wr8eFx6IoUuMnQ6cCq+bNJ10qDnz/WLrmgg== X-Google-Smtp-Source: AGHT+IHJE6l0iY/onrivYq9/wiK9+XcjoaAcKU9AnnAeWGmya+cmpcNgO4x9Lds7vqjO/zu6g1rIYQ== X-Received: by 2002:a9d:7f11:0:b0:6d9:a1f8:6b9a with SMTP id 
j17-20020a9d7f11000000b006d9a1f86b9amr9816362otq.47.1702592655878; Thu, 14 Dec 2023 14:24:15 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id cq8-20020a056830668800b006d7f6638afesm1108494otb.0.2023.12.14.14.24.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:15 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:14 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 14/26] pack-objects: prepare `write_reused_pack()` for multi-pack reuse Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The function `write_reused_pack()` within `builtin/pack-objects.c` is responsible for performing pack-reuse on a single pack, and has two main functions: - it dispatches a call to `write_reused_pack_verbatim()` to see if we can reuse portions of the packfile in whole-word chunks - for any remaining objects (that is, any objects that appear after the first "gap" in the bitmap), call write_reused_pack_one() on that object to record it for reuse. Prepare this function for multi-pack reuse by removing the assumption that the bit position corresponding to the first object being reused from a given pack must be at bit position zero. The changes in this function are mostly straightforward. Initialize `i` to the position of the first word to contain bits corresponding to that reuse pack. In most situations, we throw the initialized value away, since we end up replacing it with the return value from write_reused_pack_verbatim(), moving us past the section of whole words that we reused. 
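As a rough, standalone sketch of the word-index arithmetic described above: the scan starts at word `bitmap_pos / BITS_IN_EWORD`, and because that word may also hold bits owned by a neighboring pack, each set bit is checked against the pack's range before it is reported. The function name `pack_relative_bits()`, the plain array standing in for the reuse bitmap, and the `out`/`out_max` parameters are assumptions made for illustration; this is not the code in `write_reused_pack()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t eword_t;
#define BITS_IN_EWORD 64

/*
 * Hypothetical helper: collect the pack-relative positions of all set
 * bits in [bitmap_pos, bitmap_pos + bitmap_nr). Returns how many such
 * bits were found; at most out_max positions are stored in out.
 */
static size_t pack_relative_bits(const eword_t *words, size_t word_alloc,
				 size_t bitmap_pos, size_t bitmap_nr,
				 size_t *out, size_t out_max)
{
	/* First word that can contain bits belonging to this pack. */
	size_t i = bitmap_pos / BITS_IN_EWORD;
	size_t end = bitmap_pos + bitmap_nr;
	size_t n = 0;

	for (; i < word_alloc; i++) {
		size_t pos = i * BITS_IN_EWORD;
		size_t offset;

		for (offset = 0; offset < BITS_IN_EWORD; offset++) {
			if (!((words[i] >> offset) & 1))
				continue;
			if (pos + offset < bitmap_pos)
				continue;	/* bit belongs to an earlier pack */
			if (pos + offset >= end)
				return n;	/* past this pack's last bit */
			if (n < out_max)
				out[n] = pos + offset - bitmap_pos;
			n++;
		}
	}
	return n;
}
```

For a pack occupying bits 70 through 99, the scan begins at word 1 (bits 64 through 127); a set bit at position 64 is skipped, and bits 70 and 72 are reported as pack-relative positions 0 and 2.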
Likewise, modify the per-object loop to ignore any bits at the beginning of the first word that do not belong to the pack currently being reused, as well as skip to the "done" section once we have processed the last bit corresponding to this pack. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 07c849b5d4..6ce52d88a9 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1127,7 +1127,7 @@ static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile, static void write_reused_pack(struct bitmapped_pack *reuse_packfile, struct hashfile *f) { - size_t i = 0; + size_t i = reuse_packfile->bitmap_pos / BITS_IN_EWORD; uint32_t offset; off_t pack_start = hashfile_total(f) - sizeof(struct pack_header); struct pack_window *w_curs = NULL; @@ -1145,17 +1145,23 @@ static void write_reused_pack(struct bitmapped_pack *reuse_packfile, break; offset += ewah_bit_ctz64(word >> offset); + if (pos + offset < reuse_packfile->bitmap_pos) + continue; + if (pos + offset >= reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr) + goto done; /* * Can use bit positions directly, even for MIDX * bitmaps. See comment in try_partial_reuse() * for why. 
*/ - write_reused_pack_one(reuse_packfile->p, pos + offset, + write_reused_pack_one(reuse_packfile->p, + pos + offset - reuse_packfile->bitmap_pos, f, pack_start, &w_curs); display_progress(progress_state, ++written); } } +done: unuse_pack(&w_curs); } From patchwork Thu Dec 14 22:24:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493747 Received: from mail-ot1-f50.google.com (mail-ot1-f50.google.com [209.85.210.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DFD7F2C6B8 for ; Thu, 14 Dec 2023 22:24:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="2WUa3Ug5" Received: by mail-ot1-f50.google.com with SMTP id 46e09a7af769-6d9e9b72ecfso59886a34.3 for ; Thu, 14 Dec 2023 14:24:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592658; x=1703197458; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=m98ij0UquASMbSrqlkhGVvX4Kku1DVwI/fwUpawfsf4=; b=2WUa3Ug5GN/rHNXItIfiaWlMsOnfdeOKEA3DvX+Jj52MJbAM3IfPdlMqZI+uxsfFem Rf6AQZKzjbxjQTzSjD9xbqB6w7BQmkzmFRgHZowqY1oheQPwgoOkiRW/tN/OEUmbgXEb LBeLYSYnilwD1AEJ4GKKbBF+HlbgQ74DCezLFT6/Lk6PmIgyCF+hu5AP1gOvDjYXREjp wb79dOq9Dayx/+Q2AmihjjFQg2srlsdGnqKVCtOKl07jNimyx6OAwdraQ/djN51WtFgd Xsa/9b8e9fyaMen9PP0zmK9YCRfnWADRl/Cl3gk+K/BueQUj07zozbLycOrhGcFxR9Oy pHEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592658; x=1703197458; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=m98ij0UquASMbSrqlkhGVvX4Kku1DVwI/fwUpawfsf4=; b=ZXYDg2Ov1Yy/bYufoLVHlkFisN05KuMJCtrPc3flAYf8ICe5bOE2Mp+Z3ueg5ceHKt P3MHNxZkXAz6MCNlIrDwB3U33khxlc2BBb0XAzaq3PgdyGaLsjkrgZ5772rTAfsVUjzZ Gys0eIebu8ayjphQ/DpV0K0Mjy1LTfbzxRpIumOgDJrQw21e3u9aLoXXfAPJbBFOTVUN bv9H1cdWm8W3vC6+gBHjkavVPSwV/iD08ucDOtx235S1uTFJRekpTAwsg77rbBWVPeFF Ni/vI0fvcY6Mh4C7GJ0f5XWlzduyxmHV326AFAcil5e97QQ9w9ITT2PYAJrO9B4RGXd5 rJeQ== X-Gm-Message-State: AOJu0Yw9eI0TsPvU1GiR/wolVTYCDbVGJjoFkFau2KcjVAgOydJPDwqV n0z6LcnCg7v7NVxBARK6bO6FJGgcRv50JOUVU/HzpQ== X-Google-Smtp-Source: AGHT+IGGHBnEkGGNly6peoTpfTPCGY4SWBJ9EnKEzuXlc/3wQ3GCz0cjpPHiDnqX4JFauTPNtUTKRg== X-Received: by 2002:a9d:744b:0:b0:6d9:d278:3f2b with SMTP id p11-20020a9d744b000000b006d9d2783f2bmr11392149otk.45.1702592658544; Thu, 14 Dec 2023 14:24:18 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id bi13-20020a056830378d00b006d9ccfddbdcsm3052672otb.68.2023.12.14.14.24.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:18 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:17 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 15/26] pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse Message-ID: <805c42185ab92bb4303e91fb15e884940380fc87.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The function `write_reused_pack_verbatim()` within `builtin/pack-objects.c` is responsible for writing out a continuous set of objects beginning at the start of the reuse packfile. 
In the existing implementation, we did something like:

    while (pos < reuse_packfile_bitmap->word_alloc &&
           reuse_packfile_bitmap->words[pos] == (eword_t)~0)
        pos++;

    if (pos)
        /* write first `pos * BITS_IN_EWORD` objects from pack */

as an optimization to record a single chunk for the longest continuous prefix of objects wanted out of the reuse pack, instead of having a chunk for each individual object. For more details, see bb514de356 (pack-objects: improve partial packfile reuse, 2019-12-18).

In order to retain this optimization in a multi-pack reuse world, we can no longer assume that the first object in a pack is on a word boundary in the bitmap storing the set of reusable objects.

Assuming that all objects from the beginning of the reuse packfile up to the object corresponding to the first bit on a word boundary are part of the result, consume whole words at a time until the last whole word belonging to the reuse packfile. Copy those objects to the resulting packfile, and track that we reused them by recording a single chunk.
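The boundary handling described above can be illustrated with a simplified, standalone sketch. It assumes a plain array of 64-bit words in place of the reuse bitmap, and it returns the bit position at which the verbatim prefix ends rather than a word index. The names `get_bit()` and `verbatim_prefix_end()` are hypothetical, and no pack data is copied; only the arithmetic is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t eword_t;
#define BITS_IN_EWORD 64

/* Hypothetical helper: test a single bit in a plain word array. */
static int get_bit(const eword_t *words, size_t bit)
{
	return (int)((words[bit / BITS_IN_EWORD] >> (bit % BITS_IN_EWORD)) & 1);
}

/*
 * Hypothetical sketch: return the bit position where the verbatim
 * prefix of the pack occupying [bitmap_pos, bitmap_pos + bitmap_nr)
 * ends. A result equal to bitmap_pos means nothing can be reused in
 * whole-word chunks.
 */
static size_t verbatim_prefix_end(const eword_t *words,
				  size_t bitmap_pos, size_t bitmap_nr)
{
	size_t pos = bitmap_pos;
	size_t end = bitmap_pos + bitmap_nr;

	/*
	 * If the pack starts mid-word, every bit up to the next word
	 * boundary (or the end of the pack) must be set.
	 */
	if (pos % BITS_IN_EWORD) {
		size_t boundary = pos + (BITS_IN_EWORD - pos % BITS_IN_EWORD);
		size_t stop = boundary < end ? boundary : end;
		size_t b;

		for (b = pos; b < stop; b++)
			if (!get_bit(words, b))
				return bitmap_pos;
		pos = boundary;
	}

	/* Round the end of the pack down to a word boundary. */
	end -= end % BITS_IN_EWORD;
	if (pos >= end)
		return bitmap_pos;

	/* Consume whole words for as long as every bit in them is set. */
	while (pos < end && words[pos / BITS_IN_EWORD] == ~(eword_t)0)
		pos += BITS_IN_EWORD;

	return pos;
}
```

With a pack occupying bits 40 through 239 whose first gap is at bit 133, the sketch returns 128: the 88 objects at bits 40 through 127 form one verbatim chunk, and the remaining bits would be left to the per-object path.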
Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 73 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 60 insertions(+), 13 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 6ce52d88a9..31053128fc 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1097,31 +1097,78 @@ static void write_reused_pack_one(struct packed_git *reuse_packfile, static size_t write_reused_pack_verbatim(struct bitmapped_pack *reuse_packfile, struct hashfile *out, - off_t pack_start UNUSED, + off_t pack_start, struct pack_window **w_curs) { - size_t pos = 0; + size_t pos = reuse_packfile->bitmap_pos; + size_t end; - while (pos < reuse_packfile_bitmap->word_alloc && - reuse_packfile_bitmap->words[pos] == (eword_t)~0) - pos++; + if (pos % BITS_IN_EWORD) { + size_t word_pos = (pos / BITS_IN_EWORD); + size_t offset = pos % BITS_IN_EWORD; + size_t last; + eword_t word = reuse_packfile_bitmap->words[word_pos]; - if (pos) { - off_t to_write; + if (offset + reuse_packfile->bitmap_nr < BITS_IN_EWORD) + last = offset + reuse_packfile->bitmap_nr; + else + last = BITS_IN_EWORD; - written = (pos * BITS_IN_EWORD); - to_write = pack_pos_to_offset(reuse_packfile->p, written) - - sizeof(struct pack_header); + for (; offset < last; offset++) { + if (word >> offset == 0) + return word_pos; + if (!bitmap_get(reuse_packfile_bitmap, + word_pos * BITS_IN_EWORD + offset)) + return word_pos; + } + + pos += BITS_IN_EWORD - (pos % BITS_IN_EWORD); + } + + /* + * Now we're going to copy as many whole eword_t's as possible. + * "end" is the index of the last whole eword_t we copy, but + * there may be additional bits to process. Those are handled + * individually by write_reused_pack(). + * + * Begin by advancing to the first word boundary in range of the + * bit positions occupied by objects in "reuse_packfile". Then + * pick the last word boundary in the same range. If we have at + * least one word's worth of bits to process, continue on. 
+ */ + end = reuse_packfile->bitmap_pos + reuse_packfile->bitmap_nr; + if (end % BITS_IN_EWORD) + end -= end % BITS_IN_EWORD; + if (pos >= end) + return reuse_packfile->bitmap_pos / BITS_IN_EWORD; + + while (pos < end && + reuse_packfile_bitmap->words[pos / BITS_IN_EWORD] == (eword_t)~0) + pos += BITS_IN_EWORD; + + if (pos > end) + pos = end; + + if (reuse_packfile->bitmap_pos < pos) { + off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0); + off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p, + pos - reuse_packfile->bitmap_pos); + + written += pos - reuse_packfile->bitmap_pos; /* We're recording one chunk, not one object. */ - record_reused_object(sizeof(struct pack_header), 0); + record_reused_object(pack_start_off, + pack_start_off - (hashfile_total(out) - pack_start)); hashflush(out); copy_pack_data(out, reuse_packfile->p, w_curs, - sizeof(struct pack_header), to_write); + pack_start_off, pack_end_off - pack_start_off); display_progress(progress_state, written); } - return pos; + if (pos % BITS_IN_EWORD) + BUG("attempted to jump past a word boundary to %"PRIuMAX, + (uintmax_t)pos); + return pos / BITS_IN_EWORD; } static void write_reused_pack(struct bitmapped_pack *reuse_packfile, From patchwork Thu Dec 14 22:24:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493741 Received: from mail-oa1-f41.google.com (mail-oa1-f41.google.com [209.85.160.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A00B167207 for ; Thu, 14 Dec 2023 22:24:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="PDYjqyUd" Received: by mail-oa1-f41.google.com with SMTP id 586e51a60fabf-20307e91258so46180fac.0 for ; Thu, 14 Dec 2023 14:24:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592661; x=1703197461; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=5fxM0jOl38/bE95GkRDzvnW1PJiCClC1Gv4ljevaEUQ=; b=PDYjqyUdXWQmn/D2pNs3n374r3CAJvWr62saMstXcJhlVAOwgowOWYD1AKFonaLDTH rMwRVZgovJSrlWHOnk7t0LkX1Dysil9OX/hOcRFPK+XGJdSI8YJeq0MOjItiAWcCXC8H NnWD9fw3UPNEJuXc8uF/1SeU92031YSJYuFPyoj79czJ3TrLvJM1RH4clgctxs/lWvy7 g8rS2uDM9Ip01BTeWPXCAnr7YVh00HASfBWq1nXglwu4ls4CcWYI8Mrg/X5Y5EaIZX8V 0+ktDU5GXewMfSMwJpM8U5+uGE1YaHQ00soAlnOYZsRkLT+fhtH28dU6rMhY4wCN/C3V gmsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592661; x=1703197461; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=5fxM0jOl38/bE95GkRDzvnW1PJiCClC1Gv4ljevaEUQ=; b=fEP1Vb92EvToCjhIdTK7oP+O9LJR5vVXb8mn9nH31iBoVmyXqChKCnmC59irj2H1aU 6WehGvfVexuyrnh2qWIkuPk8J2NP9lE1c9shL4YEl5rFzPJXYaaRqRTj1NmYr4b0f35k fAN2FnCACPlfZK5135wYhQYCucINdE8yEohcem0rEBcU/m5YluSHxmqQPfrhdciMNa26 0/gdk1HaWTIalXVaEivjIvMDFnLnjEPLK04RdpYYEe5LrhoQCMn2irT0mv9fj5kucwuD /IYQ3zglQUfGwZvJEKH6botSyDRxvd10z+0sWOZoxIqH0QifWu61FzP5y/rPCdRcEnbN fr2w== X-Gm-Message-State: AOJu0YyxS98WbF8TyvL12DVmaeFD19582gKknQax6ryCGnE7QJCxZEqn NwX1UEAawEfirv+hYWTFP93MzerXZA+XcTTbcWN6Tw== X-Google-Smtp-Source: AGHT+IEtXHaQFSvV7+0nogLJ9X25WxCL+2oJXgXU00bi6OAgZXqUBZhg0HiddFREz/Uy1m4DOa691g== X-Received: by 2002:a05:6871:288f:b0:203:270:96b3 with SMTP id bq15-20020a056871288f00b00203027096b3mr5167740oac.118.1702592661252; Thu, 14 Dec 2023 14:24:21 
-0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id ot8-20020a056870cc8800b001fb2c8d6d05sm4767069oab.5.2023.12.14.14.24.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:21 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:20 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 16/26] pack-objects: include number of packs reused in output Message-ID: <55696bc1c9fc4d8d99fc870b9f9f5c97dbb181e9.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In addition to including the number of objects reused verbatim from a reuse-pack, include the number of packs from which objects were reused. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 31053128fc..7eb035eb7d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -223,6 +223,7 @@ static struct progress *progress_state; static struct bitmapped_pack *reuse_packfiles; static size_t reuse_packfiles_nr; +static size_t reuse_packfiles_used_nr; static uint32_t reuse_packfile_objects; static struct bitmap *reuse_packfile_bitmap; @@ -1265,6 +1266,8 @@ static void write_pack_file(void) for (j = 0; j < reuse_packfiles_nr; j++) { reused_chunks_nr = 0; write_reused_pack(&reuse_packfiles[j], f); + if (reused_chunks_nr) + reuse_packfiles_used_nr++; } offset = hashfile_total(f); } @@ -4587,9 +4590,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," " reused %"PRIu32" (delta %"PRIu32")," - " pack-reused %"PRIu32), + " pack-reused %"PRIu32" (from %"PRIuMAX")"), written, written_delta, reused, 
reused_delta, - reuse_packfile_objects); + reuse_packfile_objects, + (uintmax_t)reuse_packfiles_used_nr); cleanup: clear_packing_data(&to_pack); From patchwork Thu Dec 14 22:24:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493742 Received: from mail-ot1-f47.google.com (mail-ot1-f47.google.com [209.85.210.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 34A5D67214 for ; Thu, 14 Dec 2023 22:24:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="Z45AaP0z" Received: by mail-ot1-f47.google.com with SMTP id 46e09a7af769-6d9ac148ca3so72762a34.0 for ; Thu, 14 Dec 2023 14:24:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592664; x=1703197464; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=9z4Ks2WyJoTkeVpY3qX4VmXVrOs2KVewpxKx1HUbyG0=; b=Z45AaP0z7coqYs9IgJ+gSNId6pfpXuh4AbcjEuiI9RUT2fBAnuhpW5JVJ2VdWrr+o5 /1/iqYGKiljw9GgnCAr9VCCFoJ2HPMb2ozgx61jkjP6aK3aczQ2J/bc5CXofPfvmb5Xq WBECK03oXLsDuN5RYiAtDyi/a8JWt2BaJ6ocf8tdnxjb3S+EF/u3QvoSlHQRJrfO7iIB pDcr4LOu/5etwOIXvAvtf17nkCrullKLwsTNOIqyOXJaRvtAVOlJtM22eoOwMPlm/ppi aLmaMMPYWqevz+TrgjJ+5d6DCI2PMbv2ff4WKw+Ko5nG3MX5GLIv9i+sH/KG4thwzBMa 20uQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592664; x=1703197464; 
h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=9z4Ks2WyJoTkeVpY3qX4VmXVrOs2KVewpxKx1HUbyG0=; b=JxVGmGtYuYn1Zx6wBcDjVtcj7DHP4ZnF/3hF7vbm982gQ0Ct7EAk+RYu2filBCEvbo mlZ/Ej2gjkPy0dsWRWjIEyaLfLT71oT6Vjdfbi/pkmBgZ7krtZYx9l2RHwUoace4cZaw ApfTFapkRfmbs7gpM2AEfq6F4vS/KFn+XCuIVYhouogSDnGnt0OjUtDOr4jmq/cngXzr acOF3MPBP5OQ5TQASIkREiV5Uup1C0M3F+4IIGyT7K41wMkUn9W8c2DMr4XbHz8LZemJ OSRPmX4y/Z4VjACwmjQYrmKjlJQfZXcFilQlyR6V+CktbAFID6sizEnwU/dL2++9NO5X bsNA== X-Gm-Message-State: AOJu0YyKVat5CCQElLYJn05pC5oyJM/JyqiiHBvUJg4bvLpr4jt+QSOd EKmRlF6Y2vIwdGf/uLDqXr1r9f/596PJeuO2C0QxcQ== X-Google-Smtp-Source: AGHT+IHO9t3GhQzMC9bRekJDhJepjxv7G3lJFwTon2QTZHTMeKMXtFY82VSR9B7iAN5ptHnyqsVjxA== X-Received: by 2002:a05:6870:71c2:b0:203:5dc1:2e14 with SMTP id p2-20020a05687071c200b002035dc12e14mr745366oag.85.1702592663819; Thu, 14 Dec 2023 14:24:23 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id r9-20020a056870e8c900b001fb25872d59sm4760930oan.2.2023.12.14.14.24.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:23 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:22 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 17/26] git-compat-util.h: implement checked size_t to uint32_t conversion Message-ID: <6ede9e060332a0850269fe69a7a3a15ac229344b.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In a similar fashion as other checked cast functions in this header (such as `cast_size_t_to_ulong()` and `cast_size_t_to_int()`), implement a checked cast function for going from a size_t to a uint32_t value. 
This function will be utilized in a future commit which needs to make such a conversion. Signed-off-by: Taylor Blau --- git-compat-util.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/git-compat-util.h b/git-compat-util.h index 3e7a59b5ff..c3b6c2c226 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -1013,6 +1013,15 @@ static inline unsigned long cast_size_t_to_ulong(size_t a) return (unsigned long)a; } +static inline uint32_t cast_size_t_to_uint32_t(size_t a) +{ + if (a != (uint32_t)a) + die("object too large to read on this platform: %" + PRIuMAX" is cut off to %u", + (uintmax_t)a, (uint32_t)a); + return (uint32_t)a; +} + static inline int cast_size_t_to_int(size_t a) { if (a > INT_MAX) From patchwork Thu Dec 14 22:24:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493745 Received: from mail-ot1-f47.google.com (mail-ot1-f47.google.com [209.85.210.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B7D182C695 for ; Thu, 14 Dec 2023 22:24:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="iv/kuNnS" Received: by mail-ot1-f47.google.com with SMTP id 46e09a7af769-6da4e123a96so76392a34.0 for ; Thu, 14 Dec 2023 14:24:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592666; x=1703197466; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id 
:subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=szECaOx7WJurAk3hx1kksvbITTjbggMIrpyIu8xsP6k=; b=iv/kuNnSKVPrwPtiTpGqoC6NCFnJ22wFg4ePu1buQtB9O8WZ2D1bZNcXkkAHNv+f8l oNT7pJbhRlDo9wx117L47DoecgDPVHidYIUfui07OJFuR6/PSFm21vFpyRu37BJDH+GE oVZUGzBEtms72xOlDMEin1mxx6qXbSyNbVfQxmqHc+otiwOGUMwNHAC7+D1GhnSOssJ+ WpEbyo8twANSqh2mwcMIZ1EV2O0ig9G+7dfNjqSjrGcnBOEtkwmthjUfb2hyvA2bgt+r 8WerNORK3tK8BBXdeVJDD42O3FdLC3qKMjml0H7vGrN9wzpHzMVHKWBHwATaoG23Sn/n Naog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592666; x=1703197466; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=szECaOx7WJurAk3hx1kksvbITTjbggMIrpyIu8xsP6k=; b=KLEC/TNymn7QNlcQFmM2wtpQfLBu0jMfJoJdQIm9YJmWD/jyXlaxCW1n4vBL+bJLKL Cy/pENpT3BsDT+MlwWQhw9lqFXv/F/y/hMM1MZVW8kJcoD/UnIe/nIOvfRMGjROliIML PtTu86qJ1AavJCBxx0cDLyZpICVDL1SI9uR0Ap1yvY63KtSDDHL28GC8dmhXYxA2btQG sJ4kBBTN1N2cRGT4ed0slSdJAFVtYxb3Uva2xn9I3mz8+cXyzQzZd5rADledOXirrCnL npEzSjKQpsy4kY8VyZ3Gz2z2O+KFxxwIeW6njuehrUAwbd9lD4twZckSlWh05p9KfTll 7GdQ== X-Gm-Message-State: AOJu0YyYCYaJvLKt0rRvHB71h17HdnEVjz+Qv8nt20ykBPvhek+HlJ3R SLg9n2ZKYMLrZu+P80HYCsU998/lJrYk2CAkgD7unA== X-Google-Smtp-Source: AGHT+IEs6FV2lIfI9LVcmaHEfP+VMwuDfO2dxHWdCjyWLao/6rBdWLXVmLWmBTOxVGpVhpJkABpnsQ== X-Received: by 2002:a05:6830:457:b0:6da:4a6c:44a7 with SMTP id d23-20020a056830045700b006da4a6c44a7mr869197otc.27.1702592666415; Thu, 14 Dec 2023 14:24:26 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id g4-20020a9d6c44000000b006d879b8e68csm3343902otq.69.2023.12.14.14.24.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:26 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:25 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 18/26] midx: implement `midx_preferred_pack()` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: When performing a binary search over the objects in a MIDX's bitmap (i.e. in pseudo-pack order), the reader reconstructs the pseudo-pack ordering using a combination of (a) the preferred pack, (b) the pack's lexical position in the MIDX based on pack names, and (c) the object offset within the pack. In order to perform this binary search, the reader must know the identity of the preferred pack. This could be stored in the MIDX, but isn't for historical reasons, mostly because it can easily be inferred at read-time by looking at the object in the first bit position and finding out which pack it was selected from in the MIDX, like so: nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); In midx_to_pack_pos() which performs this binary search, we look up the identity of the preferred pack before each search. This is relatively quick, since it involves two table-driven lookups (one in the MIDX's revindex for `pack_pos_to_midx()`, and another in the MIDX's object table for `nth_midxed_pack_int_id()`). But since the preferred pack does not change after the MIDX is written, it is safe to cache this value on the MIDX itself. Write a helper to do just that, and rewrite all of the existing call-sites that care about the identity of the preferred pack in terms of this new helper. 
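The caching idea can be sketched on its own, assuming a reduced stand-in struct and two sentinel values (-1 for "not computed yet", -2 for "lookup failed"). The struct `midx_sketch`, its `has_revindex`, `first_pack` and `lookups` members, and the function `preferred_pack_cached()` are hypothetical and exist only to make the example self-contained; they are not part of Git's API.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical stand-in for a multi-pack index. */
struct midx_sketch {
	int preferred_pack_idx;	/* -1: unknown, -2: lookup failed, else cached id */
	int has_revindex;	/* assumption: whether the reverse index loads */
	uint32_t first_pack;	/* assumption: pack owning bit position 0 */
	unsigned lookups;	/* counts how often the slow path ran */
};

static void midx_sketch_init(struct midx_sketch *m, int has_revindex,
			     uint32_t first_pack)
{
	m->preferred_pack_idx = -1;
	m->has_revindex = has_revindex;
	m->first_pack = first_pack;
	m->lookups = 0;
}

/*
 * Return 0 and store the preferred pack in *pack_int_id, or return -1
 * when it cannot be determined. The slow path runs at most once; both
 * success and failure are remembered on the struct.
 */
static int preferred_pack_cached(struct midx_sketch *m, uint32_t *pack_int_id)
{
	if (m->preferred_pack_idx == -1) {
		m->lookups++;
		if (!m->has_revindex) {
			m->preferred_pack_idx = -2;
			return -1;
		}
		m->preferred_pack_idx = (int)m->first_pack;
	} else if (m->preferred_pack_idx == -2) {
		return -1;	/* failure was cached earlier */
	}

	*pack_int_id = (uint32_t)m->preferred_pack_idx;
	return 0;
}
```

Remembering the failure as well as the success means that repeated searches neither retry a lookup that cannot succeed nor repeat one that already did.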
This will prepare us for a subsequent patch where we will need to binary search through the MIDX's pseudo-pack order multiple times. Signed-off-by: Taylor Blau --- midx.c | 20 ++++++++++++++++++++ midx.h | 2 ++ pack-bitmap.c | 17 +++++++---------- pack-bitmap.h | 1 - pack-revindex.c | 4 +++- t/helper/test-read-midx.c | 13 +++++-------- 6 files changed, 37 insertions(+), 20 deletions(-) diff --git a/midx.c b/midx.c index beaf0c0de4..85e1c2cd12 100644 --- a/midx.c +++ b/midx.c @@ -21,6 +21,7 @@ #include "refs.h" #include "revision.h" #include "list-objects.h" +#include "pack-revindex.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 @@ -177,6 +178,8 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS); + m->preferred_pack_idx = -1; + cf = init_chunkfile(NULL); if (read_table_of_contents(cf, m->data, midx_size, @@ -460,6 +463,23 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) return midx_locate_pack(m, idx_or_pack_name, NULL); } +int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) +{ + if (m->preferred_pack_idx == -1) { + if (load_midx_revindex(m) < 0) { + m->preferred_pack_idx = -2; + return -1; + } + + m->preferred_pack_idx = + nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); + } else if (m->preferred_pack_idx == -2) + return -1; /* no revindex */ + + *pack_int_id = m->preferred_pack_idx; + return 0; +} + int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local) { struct multi_pack_index *m; diff --git a/midx.h b/midx.h index 89c5aa637e..f87a8fff26 100644 --- a/midx.h +++ b/midx.h @@ -29,6 +29,7 @@ struct multi_pack_index { unsigned char num_chunks; uint32_t num_packs; uint32_t num_objects; + int preferred_pack_idx; int local; @@ -74,6 +75,7 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name); int midx_locate_pack(struct 
multi_pack_index *m, const char *idx_or_pack_name, uint32_t *pos); +int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id); int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); /* diff --git a/pack-bitmap.c b/pack-bitmap.c index 4d5a484678..1682f99596 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -338,7 +338,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, struct stat st; char *bitmap_name = midx_bitmap_filename(midx); int fd = git_open(bitmap_name); - uint32_t i; + uint32_t i, preferred_pack; struct packed_git *preferred; if (fd < 0) { @@ -393,7 +393,12 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, } } - preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; + if (midx_preferred_pack(bitmap_git->midx, &preferred_pack) < 0) { + warning(_("could not determine MIDX preferred pack")); + goto cleanup; + } + + preferred = bitmap_git->midx->packs[preferred_pack]; if (!is_pack_valid(preferred)) { warning(_("preferred pack (%s) is invalid"), preferred->pack_name); @@ -1926,14 +1931,6 @@ static int try_partial_reuse(struct bitmapped_pack *pack, return 0; } -uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git) -{ - struct multi_pack_index *m = bitmap_git->midx; - if (!m) - BUG("midx_preferred_pack: requires non-empty MIDX"); - return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0)); -} - static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git, struct bitmapped_pack *pack, struct bitmap *reuse) diff --git a/pack-bitmap.h b/pack-bitmap.h index 7a12a2ce81..179b343912 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -77,7 +77,6 @@ int test_bitmap_hashes(struct repository *r); struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, int filter_provided_objects); -uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git); void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git, struct 
bitmapped_pack **packs_out, size_t *packs_nr_out, diff --git a/pack-revindex.c b/pack-revindex.c index acf1dd9786..7dc6c776d5 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -542,7 +542,9 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos) * implicitly is preferred (and includes all its objects, since ties are * broken first by pack identifier). */ - key.preferred_pack = nth_midxed_pack_int_id(m, pack_pos_to_midx(m, 0)); + if (midx_preferred_pack(key.midx, &key.preferred_pack) < 0) + return error(_("could not determine preferred pack")); + found = bsearch(&key, m->revindex_data, m->num_objects, sizeof(*m->revindex_data), midx_pack_order_cmp); diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c index e48557aba1..4acae41bb9 100644 --- a/t/helper/test-read-midx.c +++ b/t/helper/test-read-midx.c @@ -6,6 +6,7 @@ #include "pack-bitmap.h" #include "packfile.h" #include "setup.h" +#include "gettext.h" static int read_midx_file(const char *object_dir, int show_objects) { @@ -79,7 +80,7 @@ static int read_midx_checksum(const char *object_dir) static int read_midx_preferred_pack(const char *object_dir) { struct multi_pack_index *midx = NULL; - struct bitmap_index *bitmap = NULL; + uint32_t preferred_pack; setup_git_directory(); @@ -87,16 +88,12 @@ static int read_midx_preferred_pack(const char *object_dir) if (!midx) return 1; - bitmap = prepare_bitmap_git(the_repository); - if (!bitmap) - return 1; - if (!bitmap_is_midx(bitmap)) { - free_bitmap_index(bitmap); + if (midx_preferred_pack(midx, &preferred_pack) < 0) { + warning(_("could not determine MIDX preferred pack")); return 1; } - printf("%s\n", midx->pack_names[midx_preferred_pack(bitmap)]); - free_bitmap_index(bitmap); + printf("%s\n", midx->pack_names[preferred_pack]); return 0; } From patchwork Thu Dec 14 22:24:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493744 
Received: from mail-oo1-f46.google.com (mail-oo1-f46.google.com [209.85.161.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4819B2C68E for ; Thu, 14 Dec 2023 22:24:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="ENb03PvX" Received: by mail-oo1-f46.google.com with SMTP id 006d021491bc7-59067ccb090so73009eaf.1 for ; Thu, 14 Dec 2023 14:24:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592669; x=1703197469; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=g3CjAgzRsJDCAUE1qxgrABGgz8llngccG1BYa1wyBXg=; b=ENb03PvXBNIBs16sKsiOaynfMZYUcX35WHzB2S+c7Y7HqOeSnXLzcWkCPlu6fY+0Yd G3YH2VQp7Jwlxwk2yHNItl2XrDkvFngctIBEIToNfweaVExMcaH7LrrVsSsd3taFpBls 5wFenTjicEj4+JkdsgbL3okSg31lVWtJOh0NjYUtT+bXyHtjbTmG9xHI8M/Whjb1HriP bct+Pz2abaCXrCH5RIYEVf1UEFTtbOfFac3+5Wa6UcLNCy91sbY84zOLx9FEkjO87Kv8 Gdx3pctFyWkIeGW7FQ2TM+99KvvAQAnPl+4kJX2I9buExP/t/x9KfpSc+ZxqNrdnwMye cpKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592669; x=1703197469; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=g3CjAgzRsJDCAUE1qxgrABGgz8llngccG1BYa1wyBXg=; b=pyhIuyqsotPswJYfERfo3CEyoKTs4bLmCRJX2LrnAnKqcUvzASwOTvPfKl3D8S3yB8 z/zcXeiMATBKD9/n8477cIwAWRq7v9jfb+Xk6da/wRwW+NIic8a/gXtXfoLZUumVHuRF 
R5eUtm35Xl14/Ze4934AbgUgng2GDw5JYk2cGgj5hee6vS2q/cYUswDHQzgWCzI2INJY 06HHpIw1qfdIMVzzJJfFqaR6k+N4mH+uoB6kfEtKhA34ocn/VciWT1VPaWWWcftbeRtK SOfDK7J/UyMBFkdMLw4m/Ky2Ep6VG//x2t06VD5FhfUsW3HfIowFiDTUv+Dp+xQUMtSi 0MWQ== X-Gm-Message-State: AOJu0YzEngh+vV+M1a3iN0jTgYdK65+aGpIxqgYDVohgV6zrN3+LU5wG fDMLgWXXVQ7oEhh7V1ixvWDyXbMjaJvJ9Bn7Xa6d5g== X-Google-Smtp-Source: AGHT+IGx9bVR9KJ1MTk9AB72mBGfPnSONQDAtEtaj2ZDI0JlFaQkUPBYdlG1xvBmuSMSZaK+hQxUMQ== X-Received: by 2002:a4a:58cd:0:b0:590:c350:34c3 with SMTP id f196-20020a4a58cd000000b00590c35034c3mr6485132oob.5.1702592668999; Thu, 14 Dec 2023 14:24:28 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id z20-20020a4a6554000000b0059089f2e461sm3621033oog.0.2023.12.14.14.24.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:28 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:28 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 19/26] pack-revindex: factor out `midx_key_to_pack_pos()` helper Message-ID: <14b054d27283ffdda1d3f2a4078a4dae54868bfa.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The `midx_to_pack_pos()` function implements a binary search over objects in the MIDX between lexical and pseudo-pack order. It takes in an index into the lexical order (i.e. the same argument you'd use for `nth_midxed_object_id()` and similar) and spits out a position in the pseudo-pack order. This works for all callers, since they are all currently translating from lexical order to pseudo-pack order. But future callers may want to translate a known (offset, pack_id) tuple into an index into the pseudo-pack order, without knowing where that (offset, pack_id) tuple appears in lexical order.
Prepare for implementing a function that translates an (offset, pack_id) tuple into an index into the pseudo-pack order by extracting a helper function which does just that, and then reimplementing `midx_to_pack_pos()` in terms of it. Signed-off-by: Taylor Blau --- pack-revindex.c | 39 ++++++++++++++++++++++++--------------- 1 file changed, 24 insertions(+), 15 deletions(-) diff --git a/pack-revindex.c b/pack-revindex.c index 7dc6c776d5..baa4657ed3 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -520,19 +520,12 @@ static int midx_pack_order_cmp(const void *va, const void *vb) return 0; } -int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos) +static int midx_key_to_pack_pos(struct multi_pack_index *m, + struct midx_pack_key *key, + uint32_t *pos) { - struct midx_pack_key key; uint32_t *found; - if (!m->revindex_data) - BUG("midx_to_pack_pos: reverse index not yet loaded"); - if (m->num_objects <= at) - BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at); - - key.pack = nth_midxed_pack_int_id(m, at); - key.offset = nth_midxed_offset(m, at); - key.midx = m; /* * The preferred pack sorts first, so determine its identifier by * looking at the first object in pseudo-pack order. @@ -542,16 +535,32 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos) * implicitly is preferred (and includes all its objects, since ties are * broken first by pack identifier).
*/ - if (midx_preferred_pack(key.midx, &key.preferred_pack) < 0) + if (midx_preferred_pack(key->midx, &key->preferred_pack) < 0) return error(_("could not determine preferred pack")); - - found = bsearch(&key, m->revindex_data, m->num_objects, - sizeof(*m->revindex_data), midx_pack_order_cmp); + found = bsearch(key, m->revindex_data, m->num_objects, + sizeof(*m->revindex_data), + midx_pack_order_cmp); if (!found) - return error("bad offset for revindex"); + return -1; *pos = found - m->revindex_data; return 0; } + +int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos) +{ + struct midx_pack_key key; + + if (!m->revindex_data) + BUG("midx_to_pack_pos: reverse index not yet loaded"); + if (m->num_objects <= at) + BUG("midx_to_pack_pos: out-of-bounds object at %"PRIu32, at); + + key.pack = nth_midxed_pack_int_id(m, at); + key.offset = nth_midxed_offset(m, at); + key.midx = m; + + return midx_key_to_pack_pos(m, &key, pos); +} From patchwork Thu Dec 14 22:24:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493746 Received: from mail-oa1-f53.google.com (mail-oa1-f53.google.com [209.85.160.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE2962C6AE for ; Thu, 14 Dec 2023 22:24:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="WBN0G/EY" Received: by mail-oa1-f53.google.com with SMTP id 586e51a60fabf-1ef36a04931so31497fac.2 for ; Thu, 14 Dec 2023 14:24:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592672; x=1703197472; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Mi3NkmKbTEGlRgCqHBBRwBibXS5a8/ymuIbwWMc+rFk=; b=WBN0G/EYi+Liuoqaq0V9J7MQYPSwQuObEIi7+WxteQA5J4E21u+Sd2PVZVvt8aWgND 9We7PAuTzu5wvJkTvi8ZJizUubYk9phSB8jyYBSOIqXfbuplK+JSkz6EV7EfVXnPC/VS dpOBdWd2WS8PhZhtPUPJzdt0RZ05ZPsGSBKouKJ1Inn8XauquQZrIueorXm6MeH3OfWR L0XR3nQ8bqSXQt04efokL++Fne69Jb8+CJsdNRrzHjKYNpH2Qzel4OI0oeHZyg5wdVZ2 bxiamFsjjLCjkAParjIFr+jm98rEnf7yn+BONWy5YDtrU+sBwtTrz4N6oby+3MJAQcp9 n4qA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592672; x=1703197472; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Mi3NkmKbTEGlRgCqHBBRwBibXS5a8/ymuIbwWMc+rFk=; b=mxJePzRUS9Zb0V3xGZMCBy16YP4pP2pE+/9/rfmLdCGaK+cptV4QgqAk/hBShMNAM7 uWubkudV/Yra9wInLrfe9zR6FSRSCNf/CvrAic3SynDWPc7dDsQWMNtbJmeHhuf2G8pt SkxALs6MIFjPIQqdUooDQK9OeFfC1WSGboSmNDAkl5WMINC6YBCLVWxuLavR1RGYLHCx uF/85SskxNELjy1gWppUGwAcsMjYbwcXJcqVsLzqG34iNrC7+ONkVLhcZkNji6b9XoKU ecK9vWDKoQ3R0eXjsT+6ayUHz0zU858JrDkTkf0MO2m0uIZJ/AuK+Wq1cXADAsvmhchi jpMw== X-Gm-Message-State: AOJu0YxjvvAEqGYGvy5wYaQuZyGdGuywG2WMmoJJxDFc8BvVrWIIhZKi NewpdgpxpS1HIJtvQ0LV/XLolShniW32WiHAIiIHwg== X-Google-Smtp-Source: AGHT+IEP8cVmQqnhzh7+d7NTBjyi6uwL9ayHlnEsjvJhgi1KYidHKLD5CKquonO3/n1ShuXRqO24Og== X-Received: by 2002:a05:6870:55cc:b0:203:5cd1:b639 with SMTP id qk12-20020a05687055cc00b002035cd1b639mr892910oac.116.1702592672335; Thu, 14 Dec 2023 14:24:32 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id pn9-20020a056871d30900b001fb5551af6fsm4716635oac.16.2023.12.14.14.24.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:31 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:30 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 20/26] pack-revindex: implement `midx_pair_to_pack_pos()` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that we have extracted the `midx_key_to_pack_pos()` function, we can implement the `midx_pair_to_pack_pos()` function which accepts (pack_id, offset) tuples and returns an index into the pseudo-pack order. This will be used in a following commit in order to figure out whether or not the MIDX chose a given delta's base object from the same pack as the delta resides in. It will do so by locating the base object's offset in the pack, and then performing a binary search using the same pack ID with the base object's offset. If (and only if) it finds a match (at any position) we can guarantee that the MIDX selected both halves of the delta/base pair from the same pack.
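To make the "match means same pack" argument above concrete, here is a minimal toy model of the lookup. It is only a sketch under stated assumptions: `toy_entry`, `toy_key`, `toy_pack_order_cmp()` and `toy_pair_to_pos()` are hypothetical names, and a plain array of (pack, offset) pairs, already sorted in pseudo-pack order, stands in for the MIDX's reverse index. The comparator orders by preferred pack first, then pack identifier, then offset within the pack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One object copy selected by the (toy) MIDX: source pack and offset. */
struct toy_entry {
	uint32_t pack;
	uint64_t offset;
};

/* Search key; carries the preferred pack so the comparator can see it. */
struct toy_key {
	uint32_t pack;
	uint64_t offset;
	uint32_t preferred_pack;
};

/* Orders by: preferred pack first, then pack id, then offset in pack. */
static int toy_pack_order_cmp(const void *va, const void *vb)
{
	const struct toy_key *key = va;
	const struct toy_entry *ent = vb;
	int key_pref = key->pack == key->preferred_pack;
	int ent_pref = ent->pack == key->preferred_pack;

	if (key_pref != ent_pref)
		return key_pref ? -1 : 1;
	if (key->pack != ent->pack)
		return key->pack < ent->pack ? -1 : 1;
	if (key->offset != ent->offset)
		return key->offset < ent->offset ? -1 : 1;
	return 0;
}

/*
 * Looks up (pack, offset) in `order`, which must already be sorted in
 * pseudo-pack order. Returns 0 and sets *pos on a match, -1 otherwise.
 */
static int toy_pair_to_pos(const struct toy_entry *order, size_t nr,
			   uint32_t preferred_pack, uint32_t pack,
			   uint64_t offset, uint32_t *pos)
{
	struct toy_key key = { pack, offset, preferred_pack };
	const struct toy_entry *found;

	found = bsearch(&key, order, nr, sizeof(*order), toy_pack_order_cmp);
	if (!found)
		return -1;
	*pos = (uint32_t)(found - order);
	return 0;
}
```

In this model, a delta stored in pack 1 whose base sits at offset 12 of pack 1 is reusable only if `toy_pair_to_pos()` finds (1, 12). If the toy MIDX had instead selected that base from another pack, no (1, 12) entry exists and the lookup fails.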
Signed-off-by: Taylor Blau --- pack-revindex.c | 11 +++++++++++ pack-revindex.h | 3 +++ 2 files changed, 14 insertions(+) diff --git a/pack-revindex.c b/pack-revindex.c index baa4657ed3..a7624d8be8 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -564,3 +564,14 @@ int midx_to_pack_pos(struct multi_pack_index *m, uint32_t at, uint32_t *pos) return midx_key_to_pack_pos(m, &key, pos); } + +int midx_pair_to_pack_pos(struct multi_pack_index *m, uint32_t pack_int_id, + off_t ofs, uint32_t *pos) +{ + struct midx_pack_key key = { + .pack = pack_int_id, + .offset = ofs, + .midx = m, + }; + return midx_key_to_pack_pos(m, &key, pos); +} diff --git a/pack-revindex.h b/pack-revindex.h index 6dd47efea1..422c2487ae 100644 --- a/pack-revindex.h +++ b/pack-revindex.h @@ -142,4 +142,7 @@ uint32_t pack_pos_to_midx(struct multi_pack_index *m, uint32_t pos); */ int midx_to_pack_pos(struct multi_pack_index *midx, uint32_t at, uint32_t *pos); +int midx_pair_to_pack_pos(struct multi_pack_index *midx, uint32_t pack_id, + off_t ofs, uint32_t *pos); + #endif From patchwork Thu Dec 14 22:24:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493749 Received: from mail-ot1-f43.google.com (mail-ot1-f43.google.com [209.85.210.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DB892C6B9 for ; Thu, 14 Dec 2023 22:24:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="XsQSDYNi" Received: by mail-ot1-f43.google.com with SMTP id 
46e09a7af769-6d9f879f784so60644a34.2 for ; Thu, 14 Dec 2023 14:24:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592675; x=1703197475; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=IH/3jJZSH1yuHnsBAD5VZnKewlomdJS+VanA6yTZaGo=; b=XsQSDYNisJp+d/t3t7ajE1VocQ+4fuHM1/SRBSeC/t7b3Nj5ZCYluOrXeOBABsQRND S/XJSFWlt0Y+1kbcj97aXFKUHxHIQVfCmY/7tZlCcNJK/MPAUWPKRTBkCFlGviz33zFT PzoVQXB8jiBH1VXpN44MFsYnwD5hVMezCoNbvgP9UTh49nJh102bp9IlAhykJDs6rZAl VKLfWC4g02VMBQCWlahXdOWbefuDcACCZEPuC8LaDfr+xeldZttLl8ary20k20rBRF7H TnCV6QSwn64CDntfYABuJf9AfzvVppvF7uBXxlUwDjdUS+IEjnQifg4BUMmLD3UAPqn/ g0iA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592675; x=1703197475; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=IH/3jJZSH1yuHnsBAD5VZnKewlomdJS+VanA6yTZaGo=; b=NxgVQgVF+IgcjNNMgcIsSldzheVjKgQ/rdNYvU8kF/2u80BEKMO1xovxhQ8DK85kVO 9WZ5atv10PYAfV+aRg4mE5OBkQsF8qgELyG+l1Loyrd9h61KUHabPWZTH2f+vU4/QXRh VwGFfIfdNHwkwV/E3JoNYxNca2ef+V8QSUJzy/DcY8sNOYg+MakzWJHEnuZasQCzhsSI d+In4h4PlBMK4Wr3viEUi8zPAjIEKWw+V0DbPi9nOy3daigREEG32hXVMYQL3grWQ7zn mMHYJZM0xVW9gHxFyjm7mQdLBkubAsglPjtk2Y86E+D9BkUKApUKvhpHzbQOpw0jIWzH JZug== X-Gm-Message-State: AOJu0YzxGlbI57ysyU6WQH5pz98KIiS2pgKtx3QRMnR5UxU0JHhBzwpE wFwYbeQRrpCJI56K/EvKV1Gf/S+1hS/qERJ1A4bNEA== X-Google-Smtp-Source: AGHT+IER+43TX+GL6LORx6kt/ZjRybvc5k6m9yPDWsTvfpSkism7NtXEokOPGP07ZzSISsmdEDEtkA== X-Received: by 2002:a9d:7f14:0:b0:6d9:e37f:5c44 with SMTP id j20-20020a9d7f14000000b006d9e37f5c44mr10747723otq.57.1702592674986; Thu, 14 Dec 2023 14:24:34 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id q17-20020a056830019100b006d87ae1b15dsm3347348ota.62.2023.12.14.14.24.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:34 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:34 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 21/26] pack-bitmap: prepare to mark objects from multiple packs for reuse Message-ID: <3e3625aebe58e2c8048f9decb80c28c84b6cb0b4.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that the pack-objects code is equipped to handle reusing objects from multiple packs, prepare the pack-bitmap code to mark objects from multiple packs as reuse candidates. To do so, remove the same set of assumptions we unwound in previous commits from the helper function `reuse_partial_packfile_from_bitmap_1()`, so that it can be called in a loop over the set of bitmapped packs in a following commit. Most importantly, we can no longer assume that the bit position corresponding to the first object in a given reuse pack candidate is at the beginning of the bitmap itself. For the single pack for which this assumption still holds (in MIDX bitmaps, this is the preferred pack; in single-pack bitmaps, it is the pack the bitmap is tied to), we can still use our whole-words optimization. But for all subsequent packs, we cannot make use of this optimization, since it assumes that all delta bases are being sent from the same pack, which would break if we are sending OFS_DELTAs down to the client.
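For reference, the whole-words optimization mentioned above can be sketched on plain 64-bit words. This is a simplified illustration, not the pack-bitmap code itself: `toy_reuse_whole_words()` and `TOY_BITS_IN_WORD` are hypothetical names, and raw `uint64_t` arrays stand in for the result and reuse bitmaps. The idea is that, for the pack whose bits start at position zero, every leading word that is entirely set can be copied in bulk without inspecting individual objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS_IN_WORD 64

/*
 * Marks the leading run of all-ones words of `result` as reusable in
 * `reuse` and returns the number of words marked. `pack_bits` is the
 * number of bits belonging to the first pack; a word that extends past
 * that boundary is left for per-bit handling.
 */
static size_t toy_reuse_whole_words(const uint64_t *result, size_t word_nr,
				    size_t pack_bits, uint64_t *reuse)
{
	size_t pos = 0;

	while (pos < word_nr &&
	       pos < pack_bits / TOY_BITS_IN_WORD &&
	       result[pos] == ~(uint64_t)0)
		pos++;
	memset(reuse, 0xFF, pos * sizeof(*reuse));
	return pos;
}
```

The shortcut is only sound when every object in those words is known to have its delta base selected from the same pack, which is why it is limited to the pack whose bits begin at position zero.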
To understand why, consider two packs, P1 and P2 where:

  - P1 has object A which is a delta on base B

  - P2 has its own copy of B, in addition to other objects

Suppose that the MIDX which covers P1 and P2 selected its copy of A from P1, but selected its copy of B from P2. Since A is a delta of B, but the base was selected from a different pack, sending the bytes corresponding to A as an OFS_DELTA verbatim from P1 would be incorrect, since we don't guarantee that B is in the same place relative to A in the generated pack as in P1.

For now, we detect and reject these cross-pack deltas by searching for the (pack_id, offset) pair for the delta's base object (using the same pack_id as the pack containing the delta'd object) in the MIDX. If we find a match, that means that the MIDX did indeed pick the base object from the same pack, and we are OK to reuse the delta. If we don't find a match, however, that means that the base object was selected from a different pack in the MIDX, and we can let the slower path handle re-delta'ing our candidate object.

In the future, there are a couple of other things we could do, namely:

  - Turn any cross-pack deltas (which are stored as OFS_DELTAs) into REF_DELTAs. We already do this today when reusing an OFS_DELTA without `--delta-base-offset` enabled, so it's not a huge stretch to do the same for cross-pack deltas even when `--delta-base-offset` is enabled.

    This would work, but would obviously result in larger-than-necessary packs, as we in theory *could* represent these cross-pack deltas by patching an existing OFS_DELTA. But it's not clear how much that would matter in practice. I suspect it would have a lot to do with how you pack your repository in the first place.

  - Finally, we could patch OFS_DELTAs across packs in a similar fashion as we do today for OFS_DELTAs within a single pack on either side of a gap. This would result in the smallest packs of the three options here, but implementing this would be more involved.
At minimum, you'd have to keep the reusable chunks list for all reused packs, not just the one we're currently processing. And you'd have to ensure that any bases which are a part of cross-pack deltas appear before the delta. I think this is possible to do, but would require assembling the reusable chunks list potentially in a different order than they appear in the source packs. For now, let's pursue the simplest approach and reject any cross-pack deltas. Signed-off-by: Taylor Blau --- pack-bitmap.c | 172 +++++++++++++++++++++++++++++++------------------- 1 file changed, 106 insertions(+), 66 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 1682f99596..242a5908f7 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1841,8 +1841,10 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, * -1 means "stop trying further objects"; 0 means we may or may not have * reused, but you can keep feeding bits. */ -static int try_partial_reuse(struct bitmapped_pack *pack, - size_t pos, +static int try_partial_reuse(struct bitmap_index *bitmap_git, + struct bitmapped_pack *pack, + size_t bitmap_pos, + uint32_t pack_pos, struct bitmap *reuse, struct pack_window **w_curs) { @@ -1850,33 +1852,10 @@ static int try_partial_reuse(struct bitmapped_pack *pack, enum object_type type; unsigned long size; - /* - * try_partial_reuse() is called either on (a) objects in the - * bitmapped pack (in the case of a single-pack bitmap) or (b) - * objects in the preferred pack of a multi-pack bitmap. - * Importantly, the latter can pretend as if only a single pack - * exists because: - * - * - The first pack->num_objects bits of a MIDX bitmap are - * reserved for the preferred pack, and - * - * - Ties due to duplicate objects are always resolved in - * favor of the preferred pack. - * - * Therefore we do not need to ever ask the MIDX for its copy of - * an object by OID, since it will always select it from the - * preferred pack. 
Likewise, the selected copy of the base - * object for any deltas will reside in the same pack. - * - * This means that we can reuse pos when looking up the bit in - * the reuse bitmap, too, since bits corresponding to the - * preferred pack precede all bits from other packs. - */ + if (pack_pos >= pack->p->num_objects) + return -1; /* not actually in the pack */ - if (pos >= pack->p->num_objects) - return -1; /* not actually in the pack or MIDX preferred pack */ - - offset = delta_obj_offset = pack_pos_to_offset(pack->p, pos); + offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos); type = unpack_object_header(pack->p, w_curs, &offset, &size); if (type < 0) return -1; /* broken packfile, punt */ @@ -1884,6 +1863,7 @@ static int try_partial_reuse(struct bitmapped_pack *pack, if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) { off_t base_offset; uint32_t base_pos; + uint32_t base_bitmap_pos; /* * Find the position of the base object so we can look it up @@ -1897,20 +1877,44 @@ static int try_partial_reuse(struct bitmapped_pack *pack, delta_obj_offset); if (!base_offset) return 0; - if (offset_to_pack_pos(pack->p, base_offset, &base_pos) < 0) - return 0; - /* - * We assume delta dependencies always point backwards. This - * lets us do a single pass, and is basically always true - * due to the way OFS_DELTAs work. You would not typically - * find REF_DELTA in a bitmapped pack, since we only bitmap - * packs we write fresh, and OFS_DELTA is the default). But - * let's double check to make sure the pack wasn't written with - * odd parameters. - */ - if (base_pos >= pos) - return 0; + offset_to_pack_pos(pack->p, base_offset, &base_pos); + + if (bitmap_is_midx(bitmap_git)) { + /* + * Cross-pack deltas are rejected for now, but could + * theoretically be supported in the future. + * + * We would need to ensure that we're sending both + * halves of the delta/base pair, regardless of whether + * or not the two cross a pack boundary. 
If they do, + * then we must convert the delta to an REF_DELTA to + * refer back to the base in the other pack. + * */ + if (midx_pair_to_pack_pos(bitmap_git->midx, + pack->pack_int_id, + base_offset, + &base_bitmap_pos) < 0) { + return 0; + } + } else { + if (offset_to_pack_pos(pack->p, base_offset, + &base_pos) < 0) + return 0; + /* + * We assume delta dependencies always point backwards. + * This lets us do a single pass, and is basically + * always true due to the way OFS_DELTAs work. You would + * not typically find REF_DELTA in a bitmapped pack, + * since we only bitmap packs we write fresh, and + * OFS_DELTA is the default). But let's double check to + * make sure the pack wasn't written with odd + * parameters. + */ + if (base_pos >= pack_pos) + return 0; + base_bitmap_pos = pack->bitmap_pos + base_pos; + } /* * And finally, if we're not sending the base as part of our @@ -1920,14 +1924,14 @@ static int try_partial_reuse(struct bitmapped_pack *pack, * to REF_DELTA on the fly. Better to just let the normal * object_entry code path handle it. */ - if (!bitmap_get(reuse, pack->bitmap_pos + base_pos)) + if (!bitmap_get(reuse, base_bitmap_pos)) return 0; } /* * If we got here, then the object is OK to reuse. Mark it. */ - bitmap_set(reuse, pack->bitmap_pos + pos); + bitmap_set(reuse, bitmap_pos); return 0; } @@ -1937,36 +1941,72 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git { struct bitmap *result = bitmap_git->result; struct pack_window *w_curs = NULL; - size_t i = 0; + size_t pos = pack->bitmap_pos / BITS_IN_EWORD; - while (i < result->word_alloc && result->words[i] == (eword_t)~0) - i++; + if (!pack->bitmap_pos) { + /* + * If we're processing the first (in the case of a MIDX, the + * preferred pack) or the only (in the case of single-pack + * bitmaps) pack, then we can reuse whole words at a time. 
+ * + * This is because we know that any deltas in this range *must* + * have their bases chosen from the same pack, since: + * + * - In the single pack case, there is no other pack to choose + * them from. + * + * - In the MIDX case, the first pack is the preferred pack, so + * all ties are broken in favor of that pack (i.e. the one + * we're currently processing). So any duplicate bases will be + * resolved in favor of the pack we're processing. + */ + while (pos < result->word_alloc && + pos < pack->bitmap_nr / BITS_IN_EWORD && + result->words[pos] == (eword_t)~0) + pos++; + memset(reuse->words, 0xFF, pos * sizeof(eword_t)); + } - /* - * Don't mark objects not in the packfile or preferred pack. This bitmap - * marks objects eligible for reuse, but the pack-reuse code only - * understands how to reuse a single pack. Since the preferred pack is - * guaranteed to have all bases for its deltas (in a multi-pack bitmap), - * we use it instead of another pack. In single-pack bitmaps, the choice - * is made for us. 
- */ - if (i > pack->p->num_objects / BITS_IN_EWORD) - i = pack->p->num_objects / BITS_IN_EWORD; - - memset(reuse->words, 0xFF, i * sizeof(eword_t)); - - for (; i < result->word_alloc; ++i) { - eword_t word = result->words[i]; - size_t pos = (i * BITS_IN_EWORD); + for (; pos < result->word_alloc; pos++) { + eword_t word = result->words[pos]; size_t offset; - for (offset = 0; offset < BITS_IN_EWORD; ++offset) { - if ((word >> offset) == 0) + for (offset = 0; offset < BITS_IN_EWORD; offset++) { + size_t bit_pos; + uint32_t pack_pos; + + if (word >> offset == 0) break; offset += ewah_bit_ctz64(word >> offset); - if (try_partial_reuse(pack, pos + offset, - reuse, &w_curs) < 0) { + + bit_pos = pos * BITS_IN_EWORD + offset; + if (bit_pos < pack->bitmap_pos) + continue; + if (bit_pos >= pack->bitmap_pos + pack->bitmap_nr) + goto done; + + if (bitmap_is_midx(bitmap_git)) { + uint32_t midx_pos; + off_t ofs; + + midx_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos); + ofs = nth_midxed_offset(bitmap_git->midx, midx_pos); + + if (offset_to_pack_pos(pack->p, ofs, &pack_pos) < 0) + BUG("could not find object in pack %s " + "at offset %"PRIuMAX" in MIDX", + pack_basename(pack->p), (uintmax_t)ofs); + } else { + pack_pos = cast_size_t_to_uint32_t(st_sub(bit_pos, pack->bitmap_pos)); + if (pack_pos >= pack->p->num_objects) + BUG("advanced beyond the end of pack %s (%"PRIuMAX" > %"PRIu32")", + pack_basename(pack->p), (uintmax_t)pack_pos, + pack->p->num_objects); + } + + if (try_partial_reuse(bitmap_git, pack, bit_pos, + pack_pos, reuse, &w_curs) < 0) { /* * try_partial_reuse indicated we couldn't reuse * any bits, so there is no point in trying more From patchwork Thu Dec 14 22:24:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13493748 Received: from mail-oi1-f182.google.com (mail-oi1-f182.google.com [209.85.167.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 
bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 25D2E6721B for ; Thu, 14 Dec 2023 22:24:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="d8f00qCO" Received: by mail-oi1-f182.google.com with SMTP id 5614622812f47-3b9fd7b14cdso512083b6e.0 for ; Thu, 14 Dec 2023 14:24:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592678; x=1703197478; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=GRsYYhPgPUf8u728XV2u8BkxysmiQ6wX1U/0wFiUVNg=; b=d8f00qCONe8woQ74m2UmtJGn/9xNyY2YnVreIp6GBYN/LecUAwQbfy36hRIKjUsEX5 gakwds4SuXXdWUMewjsKImlcLbQRgLOK1CEGOT00+cUr+xDOssTaMOngMlHIbWdVGCcP YAWP2h4jfEbfwTcmT70vIMHzxyALaemC0E7f7xH2TcrpVNy6/ovxyZxrBy/RFPNfTBK/ 4BtdCHBQqjnjei7RKWhsVEvIjXQwWAdohSS/gXOMQwDXK5r0ZVCFNbbxTbEHpC5XijRY RY8ssCfCFgBSttJgxkkxIZO+m80eVUIr2B11HLgnoFRoQyc0DfkB6fZzJs79+/V8HorL Eb1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592678; x=1703197478; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=GRsYYhPgPUf8u728XV2u8BkxysmiQ6wX1U/0wFiUVNg=; b=Je01bn2ZJ/uo/kUCbf+HCnKq0g4EstyTQl/aR3EYktrn/C/xCT+WE31WREShQf2pA3 WQcG84IGMu3mO/mYwEJrhZwAf8/qYnwCQnWpQPYtnLVyd9ZtXqYxb5Ds4a6aS5bnrURt 23XJ7YJtAgb0cxJXfYSL3yieRSqBoXP9YpbZ7Foaz0QGXd2Qa4GmhV7r6SzO7DJmzypb xYbTwdjXXV9lIOE4cp1Su02qiKvrdmnxpPZytSEFkGCNulhaTqUoBHcLVQd0AOw+Qwjg 
hlagaIa0ds3R5Z9oUkCu5fiar75O1/cgjoY2tl1hPcKAYVWi0m1dAj1gwYUe4YRvXVjn UDKQ== X-Gm-Message-State: AOJu0Yze9bOLfj7WALUR/I63M8ZlUyigt64s6CtODtVn+EUbTKtL9Lvl zIhcp7zn0ssCyF9HKsT+vZ9xoAovHddup/3s77pwlQ== X-Google-Smtp-Source: AGHT+IGyWcWdwzmsOM2GGOSv770RPtctdMgvWxPyLF2HUaFEAGDL4hqMGEfTF93Hkb6NRUT9CXzfog== X-Received: by 2002:a05:6808:3989:b0:3b8:b06b:97f8 with SMTP id gq9-20020a056808398900b003b8b06b97f8mr5656374oib.40.1702592677613; Thu, 14 Dec 2023 14:24:37 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id y9-20020a4a2d09000000b00584017f57a9sm3675566ooy.30.2023.12.14.14.24.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:37 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:36 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 22/26] pack-objects: add tracing for various packfile metrics Message-ID: <1723cd0384b56bbf5ae77ed249327c16cb937634.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: As part of the multi-pack reuse effort, we will want to add some tests that assert that we reused a certain number of objects from a certain number of packs. We could do this by grepping through the stderr output of `pack-objects`, but doing so would be brittle in case the output format changed. Instead, let's use the trace2 mechanism to log various pieces of information about the generated packfile, which we can then use to compare against desired values. 
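As a rough illustration of how a test could consume these data points, the sketch below searches a trace2 event stream for one (category, key, value) tuple. It assumes each data event is written as a single JSON line in which the "category", "key", and "value" fields appear adjacent and in that order. The name `trace2_has_data` is hypothetical and is not a helper provided by this series.

```shell
# Hypothetical helper: succeed if stdin contains a trace2 "data" event
# for the given category ($1), key ($2), and value ($3).
# Assumption: the event is one JSON line with these three fields
# adjacent and in this order.
trace2_has_data () {
	grep -q -F "\"category\":\"$1\",\"key\":\"$2\",\"value\":\"$3\""
}

# A hand-written event line stands in for real GIT_TRACE2_EVENT output.
sample='{"event":"data","category":"pack-objects","key":"reused","value":"3"}'
if printf '%s\n' "$sample" | trace2_has_data pack-objects reused 3
then
	echo "found"
fi
```

Because the pattern includes the closing quote of the value, looking for 3 does not match an event whose value is 30.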
Signed-off-by: Taylor Blau
---
 builtin/pack-objects.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7eb035eb7d..7aae9f104b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4595,6 +4595,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			   reuse_packfile_objects,
 			   (uintmax_t)reuse_packfiles_used_nr);

+	trace2_data_intmax("pack-objects", the_repository, "written", written);
+	trace2_data_intmax("pack-objects", the_repository, "written/delta", written_delta);
+	trace2_data_intmax("pack-objects", the_repository, "reused", reused);
+	trace2_data_intmax("pack-objects", the_repository, "reused/delta", reused_delta);
+	trace2_data_intmax("pack-objects", the_repository, "pack-reused", reuse_packfile_objects);
+	trace2_data_intmax("pack-objects", the_repository, "packs-reused", reuse_packfiles_used_nr);
+
 cleanup:
 	clear_packing_data(&to_pack);
 	list_objects_filter_release(&filter_options);

From patchwork Thu Dec 14 22:24:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13493751
Received: from mail-oa1-f44.google.com (mail-oa1-f44.google.com [209.85.160.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B05696720F for ; Thu, 14 Dec 2023 22:24:41 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="wsD1LPD3"
Received: by mail-oa1-f44.google.com with SMTP id 586e51a60fabf-20308664c13so30969fac.3 for ; Thu, 14 Dec 2023 14:24:41 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592680; x=1703197480; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=VF17IDQorzmO9WV0u1VPJ82QQN6gFJDb0Wcyvjc5H9M=; b=wsD1LPD35GyS9ekCCveVZ65Vd/vheTxKuKH/CBxAmUOD+WKsj9NOgD95XX2lP8xRWG tvzWVjmwadcrfrX1z+tcAqP9v1zpjds8n/8Co360Odn8LSpNbEkEBZGEz0G4WX8kPyFu /IMqjeZlwbD0LX8eORuvIohztGZEbNzpy1U2upv3UYyy3eXr6GEyQg+urhEVoz9wHFGL SwqRoujSW3xdeom7WaYMteFCLPnlZNQqd0bM4vq2CKlMhGRJu/mMXZ0ihnsyEczIkOIo h30Jgoy865HutE4B7HGxbnLEHpaDDGEayO2ifvfLhE6btdmGphtv2ox8dH/5QOPNlW6n Ukfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592680; x=1703197480; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=VF17IDQorzmO9WV0u1VPJ82QQN6gFJDb0Wcyvjc5H9M=; b=Y8KWltNPUiAVZzNmspJfMSEQ+W8n+EBGrPWtc9mhu43q1RlXt2u774vfV9dHIwFBA9 voIixrqiItiFYFA0UAWZ3KKj7xfZ6y67DuyrbqKyrObMYUvnykD1U0ELyzKzVzFvKNu4 4e4mY8C4PSUqIQZyVc81obTuBLZWg1cs7Z3Pfgt8p5vw7yqfpUER7omuIqujkt09iYAi 4fwTAw7SOxFZyF3x8Mg126ou7lFQsB8ukXp+OFD0MEwYui59F1iAda0qnYuW8Sq8dJUb PfHtejjBSMRascDRCNSnQGJWC4O8YFPHQiRR0szsHHtagXoeholksCDs5TX8crP0LHBm HNuw== X-Gm-Message-State: AOJu0Yzx9P2BYxus5X2mMTnV/dIc2UGtUPsbhZCLNa0MBgPMeYFXXWj3 36aBHWPjDkPVRGO/37e8vEyp0Y1COzeLqBOXZl7AIA== X-Google-Smtp-Source: AGHT+IH+D7rfA056aLM7owiH0V4zVP1TvL9lTDwDWmrz1Gph04GD54CytFvD6M4xY36hbjajr9Ff2w== X-Received: by 2002:a05:6870:e388:b0:203:268:7773 with SMTP id x8-20020a056870e38800b0020302687773mr5649813oad.33.1702592680268; Thu, 14 Dec 2023 14:24:40 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id hp10-20020a0568709a8a00b0020312c31095sm865880oab.28.2023.12.14.14.24.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:40 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:39 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 23/26] t/test-lib-functions.sh: implement `test_trace2_data` helper Message-ID: <79c830e37ae7acec826bc41b8473309b38ed006f.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Introduce a helper function which looks for a specific (category, key, value) tuple in the output of a trace2 event stream. We will use this function in a future patch to ensure that the expected number of objects are reused from an expected number of packs. Signed-off-by: Taylor Blau --- t/test-lib-functions.sh | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 9c3cf12b26..93fe819b0a 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1874,6 +1874,20 @@ test_region () { return 0 } +# Check that the given data fragment was included as part of the +# trace2-format trace on stdin. +# +# test_trace2_data +# +# For example, to look for trace2_data_intmax("pack-objects", repo, +# "reused", N) in an invocation of "git pack-objects", run: +# +# GIT_TRACE2_EVENT="$(pwd)/trace.txt" git pack-objects ... 
&& +# test_trace2_data pack-objects reused N X-Patchwork-Id: 13493750 Received: from mail-oi1-f178.google.com (mail-oi1-f178.google.com [209.85.167.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2BA6766AB7 for ; Thu, 14 Dec 2023 22:24:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="v6bw8MfU" Received: by mail-oi1-f178.google.com with SMTP id 5614622812f47-3ba2dd905f9so38502b6e.2 for ; Thu, 14 Dec 2023 14:24:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592683; x=1703197483; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=XKqbkIfE9C+xd6Y3exjFR6YVRIjozkCh/v7fzBRb+QM=; b=v6bw8MfUnCCxni5Njkusy8j8esrw6XJWZ2a76mRgwo2DuFL1myw9V3tir2ngKzZlIA g5/LpiF6/SgpB/H8a6CQpOlKqK8MBhliOlcPnfJCM1/apUrxfnoO6fbSFoqtZwl1ocg2 AWAjkq9pKOlOKJIU774PrIs+VBIw3dtCwZBQebXslPoNR80F1YMclSiubMOhv7YJwDE5 yOwWq0DbVh2Air5/eUTL41I+hHPz3E188NsCphwfXV6NqG2NEjMyEEgiQzb9j88vFe5/ PicwediXOTIJYMV2zRFXMbMDvgsZrLPHZGsi+Rn9oW6Adlke0SgK0dRPqNukUvSD2o8b iPuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592683; x=1703197483; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=XKqbkIfE9C+xd6Y3exjFR6YVRIjozkCh/v7fzBRb+QM=; b=PAzApScnXjK1LHcxaRjAH4x31iMflX41w3Z37DEAXNfTD8R1xDeQt9B9qIlYUcDiBc 
zVcMUODGvFq/v5WB6zBjSDDvw/gzSQ22TA9EtCDoeoI4YTynjR8ibiDmTqkf46Ph8lu1 hjFpuwEf1Hok1FKlWzsILU/nr7RNZ0V9Uwj7Vi2WE1RTEuNbx/CqLMK15UYKh/T325oM Otywc8sopVLgjBD99yAvsR47JC8pU3FAglKGtt5scxyZTU6OlxH9f3rtWQtLJ7WdNKPQ XWDimOV8cra91sIqXkFFrh0ESHziBYmYNfHpuqjhp/ijtFREIEm8ZBw9SbDFhXPOJaxI JNHQ== X-Gm-Message-State: AOJu0YxTSB0puVh94QPhoLAzSQ4BFpCz930e8AmcterpMjzQtm0JR1uq 1Qpbzjno98VkKuPt1d/E9iN4/aJwnnNfthK4yNLMeA== X-Google-Smtp-Source: AGHT+IF5zwimxicbhsjfhOX5r8oHLv8UjOhb3vdkojeKnGJCcVjESudUR69aKY9IbbbHg2UwZp+RTw== X-Received: by 2002:a05:6808:22a4:b0:3ae:1298:257a with SMTP id bo36-20020a05680822a400b003ae1298257amr11246154oib.1.1702592682915; Thu, 14 Dec 2023 14:24:42 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id bp6-20020a056808238600b003b2e4511f22sm634092oib.17.2023.12.14.14.24.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:42 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:42 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 24/26] pack-objects: allow setting `pack.allowPackReuse` to "single" Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: In e704fc7978 (pack-objects: introduce pack.allowPackReuse, 2019-12-18), the `pack.allowPackReuse` configuration option was introduced, allowing users to disable the pack reuse mechanism. To prepare for debugging multi-pack reuse, allow setting configuration to "single" in addition to the usual bool-or-int values. "single" implies the same behavior as "true", "1", "yes", and so on. But it will complement a new "multi" value (to be introduced in a future commit). When set to "single", we will only perform pack reuse on a single pack, regardless of whether or not there are multiple MIDX'd packs. 
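The value mapping described above can be summarized with a small sketch. It is an illustration in shell of the behavior as the message describes it, not the C parser itself, and `classify_pack_reuse` is a hypothetical name.

```shell
# Hypothetical illustration of the mapping described in the message:
# "single" and the usual true-ish spellings select single-pack reuse,
# the false-ish spellings disable reuse, anything else is rejected.
classify_pack_reuse () {
	# Lowercase the input so the comparison is case-insensitive.
	value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
	case "$value" in
	single|true|yes|on|1)
		echo single ;;
	false|no|off|0)
		echo none ;;
	*)
		echo "invalid pack.allowPackReuse value: '$1'" >&2
		return 1 ;;
	esac
}

classify_pack_reuse single
classify_pack_reuse TRUE
classify_pack_reuse no
```

A later "multi" value would add one more arm to the `case` statement without changing the existing ones.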
This requires no code changes (yet), since we only support single pack
reuse.

Signed-off-by: Taylor Blau
---
 Documentation/config/pack.txt |  2 +-
 builtin/pack-objects.c        | 19 ++++++++++++++++---
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index f50df9dbce..fe100d0fb7 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -28,7 +28,7 @@ all existing objects. You can force recompression by passing the -F option
 to linkgit:git-repack[1].

 pack.allowPackReuse::
-	When true, and when reachability bitmaps are enabled,
+	When true or "single", and when reachability bitmaps are enabled,
 	pack-objects will try to send parts of the bitmapped packfile
 	verbatim. This can reduce memory and CPU usage to serve fetches,
 	but might result in sending a slightly larger pack. Defaults to
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7aae9f104b..684698f679 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -229,7 +229,10 @@ static struct bitmap *reuse_packfile_bitmap;

 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
-static int allow_pack_reuse = 1;
+static enum {
+	NO_PACK_REUSE = 0,
+	SINGLE_PACK_REUSE,
+} allow_pack_reuse = SINGLE_PACK_REUSE;
 static enum {
 	WRITE_BITMAP_FALSE = 0,
 	WRITE_BITMAP_QUIET,
@@ -3244,7 +3247,17 @@ static int git_pack_config(const char *k, const char *v,
 		return 0;
 	}
 	if (!strcmp(k, "pack.allowpackreuse")) {
-		allow_pack_reuse = git_config_bool(k, v);
+		int res = git_parse_maybe_bool_text(v);
+		if (res < 0) {
+			if (!strcasecmp(v, "single"))
+				allow_pack_reuse = SINGLE_PACK_REUSE;
+			else
+				die(_("invalid pack.allowPackReuse value: '%s'"), v);
+		} else if (res) {
+			allow_pack_reuse = SINGLE_PACK_REUSE;
+		} else {
+			allow_pack_reuse = NO_PACK_REUSE;
+		}
 		return 0;
 	}
 	if (!strcmp(k, "pack.threads")) {
@@ -3999,7 +4012,7 @@ static void loosen_unused_packed_objects(void)
  */
 static int pack_options_allow_reuse(void)
 {
-	return allow_pack_reuse &&
+	return allow_pack_reuse != NO_PACK_REUSE &&
 	       pack_to_stdout &&
 	       !ignore_packed_keep_on_disk &&
 	       !ignore_packed_keep_in_core &&

From patchwork Thu Dec 14 22:24:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13493753
Received: from mail-oi1-f170.google.com (mail-oi1-f170.google.com [209.85.167.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 04A4366ACB for ; Thu, 14 Dec 2023 22:24:46 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="MfawSoxB"
Received: by mail-oi1-f170.google.com with SMTP id 5614622812f47-3ba1be5ad0aso51708b6e.0 for ; Thu, 14 Dec 2023 14:24:46 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592686; x=1703197486; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=CkXBRJmZ/gCGMW3Z/EKezE9boZCgs6DFU8gHUaHQTCk=; b=MfawSoxBZBEPOA3GAY7Zl/NXok/W5BEhWfBE/p1WioG92I+oWYE2OB82nF/X3CZ3QL hkEhNgdoMuha3Jd17Qj0ErcfN8FZJ7KXAqVz6FZmXLIDwM6+1tC4X/NWpY9eMDB4lia+ SbvpRGn/Ud0+QVCCIFsn3qEh0BuJ3Nt2ab/BmfAlYo16Gr/T0QjP7UyACj8gTkzMWqDO vJBuwqokH0KcuRBsVPkADwUvqIbM3dOlv7s2MANIatKFqRNmJmOmxAtgUiF8SRD1FHa0 JRRGKfVqdnW3Qv20ZajH/22V8Aj/QYb3dxdwLixpvtJDtlgdIWKgvAxlcdJ8heGD1e+b GjAA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592686; x=1703197486;
h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=CkXBRJmZ/gCGMW3Z/EKezE9boZCgs6DFU8gHUaHQTCk=; b=iRtXeZr4Knq6tPg8cjIVMTYF7f2ac9LNyCHhFQ4OBgwpDvGeBReL00GgrFaBxZ5CrR 3RAmO46qaE3S8t04ow13dfL+4VYUfMmtpTX4tEbqTZb7oh0dYZ1MXGN3OGnL6zIf5OIb 0BBc/SnhC1pHJ32aHjV2SMzYM3wlUUbrN8QYOlPdp86XW6Yxr5tjoB1DZRn86pjxwEZX Bl8LmICAoupZ5w0/b3+EPzZkH0caZopkTI837JDKFUm9FTjKeW19kLPqtkZ7I58LQO0B 4LKfBedh9kqPBJtmUPHWK8VVVG5raUV64KsGKJ5R+68/QN88Lve3xHezuSulVAYfmgQ+ n32g== X-Gm-Message-State: AOJu0YwrCljEH0X31e+bhgYpAwM8V4gYYyLjA548JothEoVhwptahV/8 XFdCKXTBjV4uSUx6t+TaTfh0JI5tVBMKgvhWjsWRyw== X-Google-Smtp-Source: AGHT+IGbZ5eqjFMV+bV96YXxwsh/65EcEVZz3RYoGMn1zM0EW2jAcR9EMLnExHO2gFel5/xpHGQs9g== X-Received: by 2002:a05:6808:f91:b0:3b8:b063:5d65 with SMTP id o17-20020a0568080f9100b003b8b0635d65mr12166463oiw.76.1702592685700; Thu, 14 Dec 2023 14:24:45 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id i26-20020a54409a000000b003b2e2d134a5sm3562554oii.35.2023.12.14.14.24.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:45 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:44 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 25/26] pack-bitmap: enable reuse from all bitmapped packs Message-ID: <7002cf08fe301f1de28137b798fab3c8c32337fa.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that both the pack-bitmap and pack-objects code are prepared to handle marking and using objects from multiple bitmapped packs for verbatim reuse, allow marking objects from all bitmapped packs as eligible for reuse. 
Within the `reuse_partial_packfile_from_bitmap()` function, we no longer
only mark the pack whose first object is at bit position zero for reuse,
and instead mark any pack contained in the MIDX as a reuse candidate.

Provide a handful of test cases in a new script (t5332) exercising
interesting behavior for multi-pack reuse to ensure that we performed
all of the previous steps correctly.

Signed-off-by: Taylor Blau
---
 Documentation/config/pack.txt |  16 ++-
 builtin/pack-objects.c        |   6 +-
 pack-bitmap.c                 |  34 ++++--
 pack-bitmap.h                 |   3 +-
 t/t5332-multi-pack-reuse.sh   | 203 ++++++++++++++++++++++++++++++++++
 5 files changed, 245 insertions(+), 17 deletions(-)
 create mode 100755 t/t5332-multi-pack-reuse.sh

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index fe100d0fb7..9c630863e6 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -28,11 +28,17 @@ all existing objects. You can force recompression by passing the -F option
 to linkgit:git-repack[1].

 pack.allowPackReuse::
-	When true or "single", and when reachability bitmaps are enabled,
-	pack-objects will try to send parts of the bitmapped packfile
-	verbatim. This can reduce memory and CPU usage to serve fetches,
-	but might result in sending a slightly larger pack. Defaults to
-	true.
+	When true or "single", and when reachability bitmaps are
+	enabled, pack-objects will try to send parts of the bitmapped
+	packfile verbatim. When "multi", and when a multi-pack
+	reachability bitmap is available, pack-objects will try to send
+	parts of all packs in the MIDX.
++
+	If only a single pack bitmap is available, and
+	`pack.allowPackReuse` is set to "multi", reuse parts of just the
+	bitmapped packfile. This can reduce memory and CPU usage to
+	serve fetches, but might result in sending a slightly larger
+	pack. Defaults to true.

 pack.island::
 	An extended regular expression configuring a set of delta
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 684698f679..5d3c42035b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -232,6 +232,7 @@ static int use_bitmap_index = -1;
 static enum {
 	NO_PACK_REUSE = 0,
 	SINGLE_PACK_REUSE,
+	MULTI_PACK_REUSE,
 } allow_pack_reuse = SINGLE_PACK_REUSE;
 static enum {
 	WRITE_BITMAP_FALSE = 0,
@@ -3251,6 +3252,8 @@ static int git_pack_config(const char *k, const char *v,
 		if (res < 0) {
 			if (!strcasecmp(v, "single"))
 				allow_pack_reuse = SINGLE_PACK_REUSE;
+			else if (!strcasecmp(v, "multi"))
+				allow_pack_reuse = MULTI_PACK_REUSE;
 			else
 				die(_("invalid pack.allowPackReuse value: '%s'"), v);
 		} else if (res) {
@@ -4029,7 +4032,8 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
 		reuse_partial_packfile_from_bitmap(bitmap_git,
 						   &reuse_packfiles,
 						   &reuse_packfiles_nr,
-						   &reuse_packfile_bitmap);
+						   &reuse_packfile_bitmap,
+						   allow_pack_reuse == MULTI_PACK_REUSE);

 	if (reuse_packfiles) {
 		reuse_packfile_objects = bitmap_popcount(reuse_packfile_bitmap);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 242a5908f7..229a11fb00 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2040,7 +2040,8 @@ static int bitmapped_pack_cmp(const void *va, const void *vb)
 void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 					struct bitmapped_pack **packs_out,
 					size_t *packs_nr_out,
-					struct bitmap **reuse_out)
+					struct bitmap **reuse_out,
+					int multi_pack_reuse)
 {
 	struct repository *r = the_repository;
 	struct bitmapped_pack *packs = NULL;
@@ -2064,15 +2065,30 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				free(packs);
 				return;
 			}
+
 			if (!pack.bitmap_nr)
-				continue; /* no objects from this pack */
-			if (pack.bitmap_pos)
-				continue; /* not preferred pack */
+				continue;
+
+			if (!multi_pack_reuse && pack.bitmap_pos) {
+				/*
+				 * If we're only reusing a single pack, skip
+				 * over any packs which are not positioned at
+				 * the beginning of the MIDX bitmap.
+				 *
+				 * This is consistent with the existing
+				 * single-pack reuse behavior, which only reuses
+				 * parts of the MIDX's preferred pack.
+				 */
+				continue;
+			}

 			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
 			memcpy(&packs[packs_nr++], &pack, sizeof(pack));

 			objects_nr += pack.p->num_objects;
+
+			if (!multi_pack_reuse)
+				break;
 		}

 		QSORT(packs, packs_nr, bitmapped_pack_cmp);
@@ -2080,10 +2096,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);

 		packs[packs_nr].p = bitmap_git->pack;
-		packs[packs_nr].bitmap_pos = 0;
 		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
+		packs[packs_nr].bitmap_pos = 0;

-		objects_nr = packs[packs_nr++].p->num_objects;
+		objects_nr = packs[packs_nr++].bitmap_nr;
 	}

 	word_alloc = objects_nr / BITS_IN_EWORD;
@@ -2091,10 +2107,8 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		word_alloc++;
 	reuse = bitmap_word_alloc(word_alloc);

-	if (packs_nr != 1)
-		BUG("pack reuse not yet implemented for multiple packs");
-
-	reuse_partial_packfile_from_bitmap_1(bitmap_git, packs, reuse);
+	for (i = 0; i < packs_nr; i++)
+		reuse_partial_packfile_from_bitmap_1(bitmap_git, &packs[i], reuse);

 	if (bitmap_is_empty(reuse)) {
 		free(packs);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 179b343912..c7dea13217 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -80,7 +80,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 					struct bitmapped_pack **packs_out,
 					size_t *packs_nr_out,
-					struct bitmap **reuse_out);
+					struct bitmap **reuse_out,
+					int multi_pack_reuse);
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data *mapping,
 			     kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
new file mode 100755
index
0000000000..2ba788b042 --- /dev/null +++ b/t/t5332-multi-pack-reuse.sh @@ -0,0 +1,203 @@ +#!/bin/sh + +test_description='pack-objects multi-pack reuse' + +. ./test-lib.sh +. "$TEST_DIRECTORY"/lib-bitmap.sh + +objdir=.git/objects +packdir=$objdir/pack + +test_pack_reused () { + test_trace2_data pack-objects pack-reused "$1" +} + +test_packs_reused () { + test_trace2_data pack-objects packs-reused "$1" +} + + +# pack_position objects && + grep "$1" objects | cut -d" " -f1 +} + +test_expect_success 'preferred pack is reused for single-pack reuse' ' + test_config pack.allowPackReuse single && + + for i in A B + do + test_commit "$i" && + git repack -d || return 1 + done && + + git multi-pack-index write --bitmap && + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --revs --all >/dev/null && + + test_pack_reused 3 in <<-EOF && + $(git rev-parse C) + ^$(git rev-parse A) + EOF + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --revs /dev/null && + + test_pack_reused 6 trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --revs --all >/dev/null && + + test_pack_reused 9 in <<-EOF && + $(git rev-parse E) + ^$(git rev-parse D) + EOF + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --delta-base-offset --revs /dev/null && + + test_pack_reused 3 in <<-EOF && + $(git rev-parse E) + ^$(git rev-parse D) + EOF + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --delta-base-offset --revs /dev/null && + + test_pack_reused 3 f && + git add f && + test_tick && + git commit -m "delta" && + delta="$(git rev-parse HEAD)" && + + test_seq 64 >f && + test_tick && + git commit -a -m "base" && + base="$(git rev-parse HEAD)" && + + test_commit other && + + git repack -d && + + have_delta "$(git rev-parse $delta:f)" "$(git rev-parse $base:f)" && + + git multi-pack-index write --bitmap && + + cat >in <<-EOF && + $(git 
rev-parse other) + ^$base + EOF + + : >trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --delta-base-offset --revs /dev/null && + + # We can only reuse the 3 objects corresponding to "other" from + # the latest pack. + # + # This is because even though we want "delta", we do not want + # "base", meaning that we have to inflate the delta/base-pair + # corresponding to the blob in commit "delta", which bypasses + # the pack-reuse mechanism. + # + # The remaining objects from the other pack are similarly not + # reused because their objects are on the uninteresting side of + # the query. + test_pack_reused 3 in <<-EOF && + $(git rev-parse $base) + ^$(git rev-parse $delta) + EOF + + P="$(git pack-objects --revs $packdir/pack trace2.txt && + GIT_TRACE2_EVENT="$PWD/trace2.txt" \ + git pack-objects --stdout --delta-base-offset --all >/dev/null && + + packs_nr="$(find $packdir -type f -name "pack-*.pack" | wc -l)" && + objects_nr="$(git rev-list --count --all --objects)" && + + test_pack_reused $(($objects_nr - 1)) X-Patchwork-Id: 13493754 Received: from mail-ot1-f51.google.com (mail-ot1-f51.google.com [209.85.210.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 979DC671E8 for ; Thu, 14 Dec 2023 22:24:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="ALvmIvgN" Received: by mail-ot1-f51.google.com with SMTP id 46e09a7af769-6da1b71a085so48266a34.2 for ; Thu, 14 Dec 2023 14:24:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1702592688; 
x=1703197488; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=WO7jHHA81HsziybRHje3jIUVkg0779cnJFsesAoqDpc=; b=ALvmIvgNxxKondmlaxXalGet04lSBK3ohZ0aovhwe8FJ6+akVFlnU3AwL0TtVfe4wu YdUSu3O/Eg4RUmTVICh3ATzT8WZQ5frkF6HZ7dmvxLSFhOPydc8xqgHpMKngEKZcgyBq aq/nlvObXCB7DEZlyOodiu+gtYm20bTLwukfLjcYJR7wxxYt6xmAVNX8d4OzOGKcveMp nnjh+vDb9C00q9aiIzs1JE2TFIynIiaAOm/l41KCRv3GMOe8ruVQdzQd06MJ/IPLMGHm AN7X3yHVDmUitpCA0XcOiqn1bStwnhdDGt/SiuNi5c6RG1lgg58Ovk96GrGl4LNsCMM7 43tQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702592688; x=1703197488; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=WO7jHHA81HsziybRHje3jIUVkg0779cnJFsesAoqDpc=; b=OP1WjMNG8EDQsEMeLZNuCxq5itUClb/S/Gmw/V3fP9OfoOWjd6BtDwRfYD9S81L8Kw 3dPlO9VT+Z/l2N8HIKL6KeJOF50A5SfHjjuw4gna/XQBtz9rdZeN8ioMkPS8+1xU5ACj Yn4LgBleUZl6zgtJa134RrhrS9AGUEP/o4lEF+IKiz8U6ieEiQUb46W8TZXpAXXt0Ozl Aiv1cZrf0ISknHFFcu5frw9dnGxzuUCXooKYXGqhVXPTGRteyBDaK9t1SIl8yEevBjyG 65l8KACe6GfqBPPfzT8DNTMINIs/WNrl8jqdK2Ek9xuoxYR8auCgzGY7OZSOUXLugNHl +hKA== X-Gm-Message-State: AOJu0YxblFwPrBdaE/70gf0nsyfJTpaSNXuYnV0iVBRG92VNUREIV4za Zl2RA8/6FY5wfOdAFqYuNvLyNQB5N4qojMivgibU0w== X-Google-Smtp-Source: AGHT+IERGBvQ2y9xKwNU4auQn/K0v/wV2kIXr4ZhWtuclsY6iwltDqAIOuCulOefv4/4bdNaQcX8ZA== X-Received: by 2002:a9d:6f85:0:b0:6d8:7a7a:2d7f with SMTP id h5-20020a9d6f85000000b006d87a7a2d7fmr6422041otq.41.1702592688302; Thu, 14 Dec 2023 14:24:48 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id a25-20020a056830009900b006d7f3e00bc2sm3349472oto.52.2023.12.14.14.24.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Dec 2023 14:24:48 -0800 (PST) Date: Thu, 14 Dec 2023 17:24:47 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , Junio C Hamano Subject: [PATCH v2 26/26] t/perf: add performance tests for multi-pack reuse Message-ID: <94e5ae4cf6e0f53d4141fc486f32d73d168cf993.1702592604.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: To ensure that we don't regress either the size or runtime performance of multi-pack reuse, add a performance test to measure both of these. The test partitions the objects in GIT_TEST_PERF_LARGE_REPO into 1, 10, and 100 packs, and then tries to perform a "clone" at each stage with both single- and multi-pack reuse enabled. Note that the `repack_into_n_chunks()` function in this new test script differs from the existing `repack_into_n()`. The former partitions the repository into N equal-sized chunks, while the latter produces N packs of five commits each (plus their objects), and then another pack with the remainder. 
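The idea of cutting a list of commits into consecutive chunks can be sketched in isolation. The helper below is a simplified stand-in rather than the function from the test script: `chunk_boundaries` is a hypothetical name, it works on arbitrary lines of input instead of `git rev-list` output, and it prints every sz-th line, where sz is the line count divided by the requested number of chunks (rounded down).

```shell
# Hypothetical helper: read items on stdin, one per line, and print
# every sz-th item, where sz = floor(count / $1). Each printed item
# marks the end of one chunk. Items after the last full chunk are not
# printed; a caller would handle that remainder separately.
chunk_boundaries () {
	awk -v n="$1" '
		{ line[NR] = $0 }
		END {
			sz = int(NR / n)
			if (sz < 1)
				sz = 1
			for (i = sz; i <= NR; i += sz)
				print line[i]
		}
	'
}

# Ten items cut into chunks of three: prints 3, 6, and 9, leaving
# item 10 as the remainder.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | chunk_boundaries 3
```

When the item count is not a multiple of the requested chunk count, this can yield more boundaries than requested (for example, 12 items and 5 chunks give a chunk size of 2 and six boundaries).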
On git.git, I can produce the following results on my machine:

    Test                                                            this tree
    --------------------------------------------------------------------------------
    5332.3: clone for 1-pack scenario (single-pack reuse)           1.57(2.99+0.15)
    5332.4: clone size for 1-pack scenario (single-pack reuse)      231.8M
    5332.5: clone for 1-pack scenario (multi-pack reuse)            1.79(2.96+0.21)
    5332.6: clone size for 1-pack scenario (multi-pack reuse)       231.7M
    5332.9: clone for 10-pack scenario (single-pack reuse)          3.89(16.75+0.35)
    5332.10: clone size for 10-pack scenario (single-pack reuse)    209.9M
    5332.11: clone for 10-pack scenario (multi-pack reuse)          1.56(2.99+0.17)
    5332.12: clone size for 10-pack scenario (multi-pack reuse)     224.4M
    5332.15: clone for 100-pack scenario (single-pack reuse)        8.24(54.31+0.59)
    5332.16: clone size for 100-pack scenario (single-pack reuse)   278.3M
    5332.17: clone for 100-pack scenario (multi-pack reuse)         2.13(2.44+0.33)
    5332.18: clone size for 100-pack scenario (multi-pack reuse)    357.9M

Signed-off-by: Taylor Blau
---
 t/perf/p5332-multi-pack-reuse.sh | 81 ++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100755 t/perf/p5332-multi-pack-reuse.sh

diff --git a/t/perf/p5332-multi-pack-reuse.sh b/t/perf/p5332-multi-pack-reuse.sh
new file mode 100755
index 0000000000..5c6c575d62
--- /dev/null
+++ b/t/perf/p5332-multi-pack-reuse.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+test_description='tests pack performance with multi-pack reuse'
+
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-pack.sh"
+
+packdir=.git/objects/pack
+
+test_perf_large_repo
+
+find_pack () {
+	for idx in $packdir/pack-*.idx
+	do
+		if git show-index <$idx | grep -q "$1"
+		then
+			basename $idx
+		fi || return 1
+	done
+}
+
+repack_into_n_chunks () {
+	git repack -adk &&
+
+	test "$1" -eq 1 && return ||
+
+	find $packdir -type f | sort >packs.before &&
+
+	# partition the repository into $1 chunks of consecutive commits, and
+	# then create $1 packs with the objects reachable from each chunk
+	# (excluding any objects reachable from the previous chunks)
+	sz="$(($(git rev-list --count --all) / $1))"
+	for rev in $(git rev-list --all | awk "NR % $sz == 0" | tac)
+	do
+		pack="$(echo "$rev" | git pack-objects --revs \
+			--honor-pack-keep --delta-base-offset $packdir/pack)" &&
+		touch $packdir/pack-$pack.keep || return 1
+	done
+
+	# grab any remaining objects not packed by the previous step(s)
+	git pack-objects --revs --all --honor-pack-keep --delta-base-offset \
+		$packdir/pack &&
+
+	find $packdir -type f | sort >packs.after &&
+
+	# and install the whole thing
+	for f in $(comm -12 packs.before packs.after)
+	do
+		rm -f "$f" || return 1
+	done
+	rm -fr $packdir/*.keep
+}
+
+for nr_packs in 1 10 100
+do
+	test_expect_success "create $nr_packs-pack scenario" '
+		repack_into_n_chunks $nr_packs
+	'
+
+	test_expect_success "setup bitmaps for $nr_packs-pack scenario" '
+		find $packdir -type f -name "*.idx" | sed -e "s/.*\/\(.*\)$/+\1/g" |
+		git multi-pack-index write --stdin-packs --bitmap \
+			--preferred-pack="$(find_pack $(git rev-parse HEAD))"
+	'
+
+	for reuse in single multi
+	do
+		test_perf "clone for $nr_packs-pack scenario ($reuse-pack reuse)" "
+			git for-each-ref --format='%(objectname)' refs/heads refs/tags >in &&
+			git -c pack.allowPackReuse=$reuse pack-objects \
+				--revs --delta-base-offset --use-bitmap-index \
+				--stdout result
+		"
+
+		test_size "clone size for $nr_packs-pack scenario ($reuse-pack reuse)" '
+			wc -c