From patchwork Mon Mar 10 01:50:44 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 14009160
Message-Id: <9b31dc87bb61f4d73eced02a24baea58bc51aa5e.1741571455.git.gitgitgadget@gmail.com>
Date: Mon, 10 Mar 2025 01:50:44 +0000
Subject: [PATCH 02/13] pack-objects: add --path-walk option
X-Mailing-List: git@vger.kernel.org
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
    johannes.schindelin@gmx.de, johncai86@gmail.com,
    jonathantanmy@google.com, karthik.188@gmail.com,
    kristofferhaugsbakk@fastmail.com, me@ttaylorr.com, newren@gmail.com,
    peff@peff.net, ps@pks.im, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

In order to more easily compute delta bases among objects that appear at
the exact same path, add a --path-walk option to 'git pack-objects'.

This option will use the path-walk API instead of the object walk given
by the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.

The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.

After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.

The current implementation performs delta calculations while walking
objects, which is not ideal for a few reasons.
First, this will cause the "Enumerating objects" phase to be much longer
than usual. Second, it does not take advantage of threading during the
path-scoped delta calculations. Even with this lack of threading, the
path-walk option is sometimes faster than the usual approach. Future
changes will refactor this code to allow for threading, but that
complexity is deferred until later to keep this patch as simple as
possible.

This new walk is incompatible with some features and is ignored by
others:

 * Object filters, such as sparse-checkout or tree depth, are not
   currently integrated with the path-walk API. A blobless packfile
   could be integrated easily, but that is deferred for later.

 * Server-focused features such as delta islands, shallow packs, and
   using a bitmap index are incompatible with the path-walk API.

 * The path-walk API is only compatible with the --revs option, not
   taking object lists or pack lists over stdin. These alternative ways
   to specify the objects currently ignore the --path-walk option
   without even a warning.

Future changes will create performance tests that demonstrate the power
of this approach.

Signed-off-by: Derrick Stolee
---
 Documentation/git-pack-objects.adoc        |  13 +-
 Documentation/technical/api-path-walk.adoc |   1 +
 builtin/pack-objects.c                     | 147 +++++++++++++++++++--
 t/t5300-pack-object.sh                     |  15 +++
 4 files changed, 166 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 7f69ae4855f..7dbbe6d54d2 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -16,7 +16,7 @@ SYNOPSIS
 [--cruft] [--cruft-expiration=