From patchwork Mon Mar 24 15:22:37 2025
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 14027490
Message-Id: <57c1cc20de0c80b84ad11a6546763826fe4b1a09.1742829770.git.gitgitgadget@gmail.com>
Date: Mon, 24 Mar 2025 15:22:37 +0000
Subject: [PATCH v2 01/13] pack-objects: extract should_attempt_deltas()
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
    johannes.schindelin@gmx.de, johncai86@gmail.com,
    jonathantanmy@google.com, karthik.188@gmail.com,
    kristofferhaugsbakk@fastmail.com, me@ttaylorr.com, newren@gmail.com,
    peff@peff.net, ps@pks.im, Derrick Stolee
From: Derrick Stolee

This will be helpful in a future change, which will reuse this logic.

Signed-off-by: Derrick Stolee
---
 builtin/pack-objects.c | 56 ++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 58a9b161262..7805429f5d1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3196,6 +3196,36 @@ static int add_ref_tag(const char *tag UNUSED, const char *referent UNUSED, cons
 	return 0;
 }
 
+static int should_attempt_deltas(struct object_entry *entry)
+{
+	if (DELTA(entry))
+		/* This happens if we decided to reuse existing
+		 * delta from a pack. "reuse_delta &&" is implied.
+		 */
+		return 0;
+
+	if (!entry->type_valid ||
+	    oe_size_less_than(&to_pack, entry, 50))
+		return 0;
+
+	if (entry->no_try_delta)
+		return 0;
+
+	if (!entry->preferred_base) {
+		if (oe_type(entry) < 0)
+			die(_("unable to get type of object %s"),
+			    oid_to_hex(&entry->idx.oid));
+	} else if (oe_type(entry) < 0) {
+		/*
+		 * This object is not found, but we
+		 * don't have to include it anyway.
+		 */
+		return 0;
+	}
+
+	return 1;
+}
+
 static void prepare_pack(int window, int depth)
 {
 	struct object_entry **delta_list;
@@ -3226,33 +3256,11 @@ static void prepare_pack(int window, int depth)
 	for (i = 0; i < to_pack.nr_objects; i++) {
 		struct object_entry *entry = to_pack.objects + i;
 
-		if (DELTA(entry))
-			/* This happens if we decided to reuse existing
-			 * delta from a pack. "reuse_delta &&" is implied.
-			 */
-			continue;
-
-		if (!entry->type_valid ||
-		    oe_size_less_than(&to_pack, entry, 50))
+		if (!should_attempt_deltas(entry))
 			continue;
 
-		if (entry->no_try_delta)
-			continue;
-
-		if (!entry->preferred_base) {
+		if (!entry->preferred_base)
 			nr_deltas++;
-			if (oe_type(entry) < 0)
-				die(_("unable to get type of object %s"),
-				    oid_to_hex(&entry->idx.oid));
-		} else {
-			if (oe_type(entry) < 0) {
-				/*
-				 * This object is not found, but we
-				 * don't have to include it anyway.
-				 */
-				continue;
-			}
-		}
 
 		delta_list[n++] = entry;
 	}

From patchwork Mon Mar 24 15:22:38 2025
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 14027491
Date: Mon, 24 Mar 2025 15:22:38 +0000
Subject: [PATCH v2 02/13] pack-objects: add --path-walk option
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
    johannes.schindelin@gmx.de, johncai86@gmail.com,
    jonathantanmy@google.com, karthik.188@gmail.com,
    kristofferhaugsbakk@fastmail.com, me@ttaylorr.com, newren@gmail.com,
    peff@peff.net, ps@pks.im, Derrick Stolee
From: Derrick Stolee

In order to more easily compute delta bases among objects that appear at
the exact same path, add a --path-walk option to 'git pack-objects'.

This option will use the path-walk API instead of the object walk given
by the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.

The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.

After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.

The current implementation performs delta calculations while walking
objects, which is not ideal for a few reasons. First, this will cause
the "Enumerating objects" phase to be much longer than usual. Second, it
does not take advantage of threading during the path-scoped delta
calculations. Even with this lack of threading, the path-walk option is
sometimes faster than the usual approach. Future changes will refactor
this code to allow for threading, but that complexity is deferred until
later to keep this patch as simple as possible.

This new walk is incompatible with some features and is ignored by
others:

 * Object filters, such as sparse-checkout or tree depth, are not
   currently integrated with the path-walk API. A blobless packfile
   could be integrated easily, but that is deferred for later.

 * Server-focused features such as delta islands, shallow packs, and
   using a bitmap index are incompatible with the path-walk API.

 * The path-walk API is only compatible with the --revs option, not
   taking object lists or pack lists over stdin. These alternative ways
   to specify the objects currently ignore the --path-walk option
   without even a warning.

Future changes will create performance tests that demonstrate the power
of this approach.

Signed-off-by: Derrick Stolee
---
 Documentation/git-pack-objects.adoc        |  14 +-
 Documentation/technical/api-path-walk.adoc |   1 +
 builtin/pack-objects.c                     | 147 +++++++++++++++++++--
 t/t5300-pack-object.sh                     |  15 +++
 4 files changed, 167 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 7f69ae4855f..7065758eddf 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -16,7 +16,7 @@ SYNOPSIS
 	[--cruft] [--cruft-expiration=