From patchwork Wed Mar 26 20:18:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030536 Received: from out-182.mta0.migadu.com (out-182.mta0.migadu.com [91.218.175.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 475DC1A08BC for ; Wed, 26 Mar 2025 20:19:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020375; cv=none; b=hjtwn21wcXPllAY8TafzQk8ulmMYg6Tdm8KMX76czj4rZJXpo7lDFnyZ9mzqKToLDZ0zWeftSgAgtm+WNLFHlhfgP+YgWPgWOinyN1nxZ38sYIxcqSaaIy7n2AU1p51jPUcbxxUE9ugTTV/U6602NDcIpZa3jexGKARuX4JZonE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020375; c=relaxed/simple; bh=o8NlAsZD7BkALT2zOK835T9eDLqzj+22ubrEF9P9gyU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Gw7RSEGLUJJjYvHCpN0O3l3nlUCwmJrcx+DZFVKykus6J7wMOb9Mf9seiE7Vun69kXcWPMtbRrKG08tfsFBMC2HSrnfsAsY3YxbVt5sXBpSckpt1XQ42qrHv6tShT1z5pZuvUL31kuAwdpBW0iyHQTTzHu98AAXsOBcce1yGoRA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=IYNrvrco; arc=none smtp.client-ip=91.218.175.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="IYNrvrco" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020370; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IyDiiCtlyRbrzxRTZ+qqxZqLXT3pzUj3mrr82SVnQ5I=; b=IYNrvrcowIzVOepdynevHjgX4pSd+w8YJcLE2UK2NqTvFAbBdSBgH7I3sr+oemhFrwQZii 5nO/xl7QR/cxCXK5T5NvDmlGrSmrrFw7NuLstwBevgPmM3DCpYI4Y7/47jvZfiE4NQTgC+ s/4SqswuEsyFzOwUVAxrz8s1P26cRKg= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:25 +0100 Subject: [PATCH 1/8] combine-diff: zero memory used for callback filepairs Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-1-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt X-Migadu-Flow: FLOW_OUT From: Jeff King In commit 25e5e2bf85 (combine-diff: support format_callback, 2011-08-19), the combined-diff code learned how to make a multi-sourced diff_filepair to pass to a diff callback. When we create each filepair, we do not bother to fill in many of the fields, because they would make no sense (e.g., there can be no rename score or broken_pair flag because we do not go through the diffcore filters). However, we did not even bother to zero them, leading to random values. Let's make sure everything is blank with xcalloc, just as the regular diff code does. We would potentially want to set the "status" flag to something non-zero, but it is not clear to what. Possibly a new DIFF_STATUS_COMBINED would make sense, as this is not strictly a modification, nor does it fit any other category. Since it is not yet clear what callers would want, this patch simply leaves it as "0", the same empty flag that is seen when diffcore_std is not used at all. Signed-off-by: Jeff King --- combine-diff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/combine-diff.c b/combine-diff.c index 9527f3160d..1deb8c7374 100644 --- a/combine-diff.c +++ b/combine-diff.c @@ -1315,7 +1315,7 @@ static struct diff_filepair *combined_pair(struct combine_diff_path *p, struct diff_filepair *pair; struct diff_filespec *pool; - pair = xmalloc(sizeof(*pair)); + CALLOC_ARRAY(pair, 1); CALLOC_ARRAY(pool, st_add(num_parent, 1)); pair->one = pool + 1; pair->two = pool; From patchwork Wed Mar 26 20:18:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030537 Received: from out-170.mta1.migadu.com (out-170.mta1.migadu.com [95.215.58.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0BB83214A8F for ; Wed, 26 Mar 2025 20:19:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020379; cv=none; b=eBBUuIHKZgoN19aFKajkVRJLyOQZqJWAOzWZT5aaXDIYIZM6ZgOgCgvrG3A0JYhHy0psAOTIM+IMmCOgOMMSOiQ8eE3Q6jc0xWOyXr44/cl1tVso70oxGDtM3/fDXQEqkjMgwthEEmXL4N5w3uezReD7CpFC3Rs3VgrIGs0V8KU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020379; c=relaxed/simple; bh=co6sp4XhrW8QICdllWV/8LKV704rKIry+jXVEJOQCVU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=KywvwLhRVp/g0Z5pfqvbkXPnunc9ocM6sg0hgp3axNnbn88g2lk8tE+HdQnh1ey9sTh7sMYihAUAvcYmvPX5V89WeNufGuyPqBJcd/ZugtLMoN47oyLP0P3iQmXy+btMJBWFiKKY6VjW5dotCajLrOFSwhG1covI15fKwXiXpbs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=kN+0j2Wn; arc=none smtp.client-ip=95.215.58.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="kN+0j2Wn" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020375; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=agIwgCHQn9n3n6cUiz7PIJoQZis536XvaW6vG413CBc=; b=kN+0j2Wn5BeAGZP/Jo8jrsStefKmoiJ1Kxzhmih0lN8FRSOU2k0G3hqC+fS+ZliB8xydiX 4KNq0zYyp5I10dtrrTdQKxnTqS2hfnM/qd82KImCGhrcC/ZTXWrz5PFd498MjbWoV3b6T9 ze3fcGKSLdYf2pdJhl7q7rUvQr0YFh4= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:26 +0100 Subject: [PATCH 2/8] within_depth: fix return for empty path Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-2-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt X-Migadu-Flow: FLOW_OUT From: Jeff King The within_depth function is used to check whether pathspecs limited by a max-depth parameter are acceptable. It takes a path to check, a maximum depth, and a "base" depth. It counts the components in the path (by counting slashes), adds them to the base, and compare them to the maximum. However, if the base does not have any slashes at all, we always return "true". If the base depth is 0, then this is correct; no matter what the maximum is, we are always within it. However, if the base depth is greater than 0, then we might return an erroneous result. This ends up not causing any user-visible bugs in the current code. The call sites in dir.c always pass a base depth of 0, so are unaffected. But tree_entry_interesting uses this function differently: it will pass the prefix of the current entry, along with a "1" if the entry is a directory, in essence checking whether items inside the entry would be of interest. It turns out not to make a difference in behavior, but the reasoning is complex. Given a tree like: file a/file a/b/file walking the tree and calling tree_entry_interesting will yield the following results: (with max_depth=0): file: yes a: yes a/file: no a/b: no (with max_depth=1): file: yes a: yes a/file: yes a/b: no So we have inconsistent behavior in considering directories interesting. If they are at the edge of our depth but at the root, we will recurse into them, but then find all of their entries uninteresting (e.g., in the first case, we will look at "a" but find "a/*" uninteresting). But if they are at the edge of our depth and not at the root, then we will not recurse (in the second example, we do not even bother entering "a/b"). This turns out not to matter because the only caller which uses max-depth pathspecs is cmd_grep, which only cares about blob entries. From its perspective, it is exactly the same to not recurse into a subtree, or to recurse and find that it contains no matching entries. Not recursing is merely an optimization. It is debatable whether tree_entry_interesting should consider such an entry interesting. The only caller does not care if it sees the tree itself, and can benefit from the optimization. But if we add a "max-depth" limiter to regular diffs, then a diff with DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself, but not what it contains. This patch just fixes within_depth, which means we consider such entries uninteresting (and makes the current caller happy). If we want to change that in the future, then this fix is still the correct first step, as the current behavior is simply inconsistent. Signed-off-by: Jeff King --- dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dir.c b/dir.c index cbd82be6c9..3aead2a599 100644 --- a/dir.c +++ b/dir.c @@ -278,7 +278,7 @@ int within_depth(const char *name, int namelen, if (depth > max_depth) return 0; } - return 1; + return depth <= max_depth; } /* From patchwork Wed Mar 26 20:18:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030538 Received: from out-183.mta1.migadu.com (out-183.mta1.migadu.com [95.215.58.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0308E1A08BC for ; Wed, 26 Mar 2025 20:19:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.183 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020382; cv=none; b=scezgee6zSKozWaHWDfMhGSF2oyG7G1xAJyJPNELBisyXb6onW4cn8FV+newNVNkPcVTrZ3NaOKUi9DnzFa9JtNlX8QDbzaHy4Zcaz2ZlulaQinoIntoFuU+dkSQ49JmR8RryC1vyLrWErFLSJX1D6DpN4vJPb5BXTqlnrnsYlQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020382; c=relaxed/simple; bh=DTeLNtf/zB+0/NL9uD1FFEVNLz7vS4GIf+rxeAaTWuA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=kprDUlB4vYdquRgHuhJbYB16os+yO+XKjrpfrHLL03KwkeJPBYouqDKVyi8uD0iU8hPr8Ak5j8xMP4A7YDj9EjYu08BwgK2DuHsJjL4h3FnM37BjCN85P7N7AouPXpVeq1huMn3IOsam5WfC1aHJ8xO+X8H9kGSjMHWXjy9O5lc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=OL94wubv; arc=none smtp.client-ip=95.215.58.183 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="OL94wubv" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020378; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=J9N47mHmD+30oxVZbJnhp+KlpZw21t8Qhgt7NuwGQDo=; b=OL94wubv3BmN6eSYhJCm7yb5xA/2Q+qGjdj/CnVNFpTXJxufApKCMV3eDy1RqdpxOYuVhj LWoWB5unYqmKN3noXOCIxUszTS7/DDBLG7CyZB3/yMnmnS1pRm0+WaOEWXWaH+eJudpVY+ V6XlVtgC4TeiIooGY8oXM6YzSKHT9Mo= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:27 +0100 Subject: [PATCH 3/8] diff: teach tree-diff a max-depth parameter Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-3-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt X-Migadu-Flow: FLOW_OUT From: Jeff King When you are doing a tree-diff, there are basically two options: do not recurse into subtrees at all, or recurse indefinitely. While most callers would want to always recurse and see full pathnames, some may want the efficiency of looking only at a particular level of the tree. This is currently easy to do for the top-level (just turn off recursion), but you cannot say "show me what changed in subdir/, but do not recurse". This patch adds a max-depth parameter which is measured from the closest pathspec match, so that you can do: git log --raw --max-depth=1 a/b/c and see the contents of a/b/c/, but not those of a/b/c/d/ (you would see the --raw entry for a/b/c/d in such a case). Signed-off-by: Jeff King --- Documentation/diff-options.adoc | 26 ++++++++++ diff-lib.c | 5 ++ diff.c | 18 +++++++ diff.h | 9 ++++ t/meson.build | 1 + t/t4071-diff-max-depth.sh | 109 ++++++++++++++++++++++++++++++++++++++++ tree-diff.c | 78 ++++++++++++++++++++++++++-- 7 files changed, 243 insertions(+), 3 deletions(-) diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc index 640eb6e7db..8bb523d933 100644 --- a/Documentation/diff-options.adoc +++ b/Documentation/diff-options.adoc @@ -887,5 +887,31 @@ endif::git-format-patch[] reverted with `--ita-visible-in-index`. Both options are experimental and could be removed in future. +--max-depth=:: + + Limit diff recursion to `` levels (implies `-r`). The depth + is measured from the closest pathspec. Given a tree containing + `foo/bar/baz`, the following list shows the matches generated by + each set of options: ++ +-- + - `--max-depth=0 -- foo`: `foo` + + - `--max-depth=1 -- foo`: `foo/bar` + + - `--max-depth=2 -- foo`: `foo/bar/baz` + + - `--max-depth=1 -- foo/bar`: `foo/bar/baz` +-- ++ +If no pathspec is given, the depth is measured as if all +top-level entries were specified. Note that this is different +than measuring from the root, in that `--max-depth=0` would +still return `foo`. This allows you to still limit depth while +asking for a subset of the top-level entries. ++ +Note that this option is only supported for diffs between tree objects, +not against the index or working tree. + For more detailed explanation on these common options, see also linkgit:gitdiffcore[7]. diff --git a/diff-lib.c b/diff-lib.c index 353b473ed5..34d61cc6a0 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -115,6 +115,9 @@ void run_diff_files(struct rev_info *revs, unsigned int option) uint64_t start = getnanotime(); struct index_state *istate = revs->diffopt.repo->index; + if (revs->diffopt.max_depth_valid) + die("max-depth is not supported for worktree diffs"); + diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/"); refresh_fsmonitor(istate); @@ -560,6 +563,8 @@ static int diff_cache(struct rev_info *revs, opts.dst_index = NULL; opts.pathspec = &revs->diffopt.pathspec; opts.pathspec->recursive = 1; + if (revs->diffopt.max_depth_valid) + die("max-depth is not supported for index diffs"); init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size); return unpack_trees(1, &t, &opts); diff --git a/diff.c b/diff.c index c89c15d98e..acf2212a4a 100644 --- a/diff.c +++ b/diff.c @@ -4986,6 +4986,9 @@ void diff_setup_done(struct diff_options *options) options->filter = ~filter_bit[DIFF_STATUS_FILTER_AON]; options->filter &= ~options->filter_not; } + + if (options->pathspec.has_wildcard && options->max_depth_valid) + die("max-depth cannot be used with wildcard pathspecs"); } int parse_long_opt(const char *opt, const char **argv, @@ -5620,6 +5623,18 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns return 0; } +static int diff_opt_max_depth(const struct option *opt, + const char *arg, int unset) +{ + struct diff_options *options = opt->value; + + BUG_ON_OPT_NEG(unset); + options->flags.recursive = 1; + options->max_depth = strtol(arg, NULL, 10); + options->max_depth_valid = 1; + return 0; +} + /* * Consider adding new flags to __git_diff_common_options * in contrib/completion/git-completion.bash @@ -5895,6 +5910,9 @@ struct option *add_diff_options(const struct option *opts, { OPTION_CALLBACK, 0, "output", options, N_(""), N_("output to a specific file"), PARSE_OPT_NONEG, NULL, 0, diff_opt_output }, + OPT_CALLBACK_F(0, "max-depth", options, N_(""), + N_("maximum tree depth to recurse"), + PARSE_OPT_NONEG, diff_opt_max_depth), OPT_END() }; diff --git a/diff.h b/diff.h index ff0348e4a9..a07d2178c4 100644 --- a/diff.h +++ b/diff.h @@ -396,6 +396,15 @@ struct diff_options { struct strmap *additional_path_headers; int no_free; + + /* + * The extra "valid" flag is a slight hack. The value "0" is perfectly + * valid for max-depth. We would normally use -1 to indicate "not set", + * but there are many code paths which assume that assume that just + * zero-ing out a diff_options is enough to initialize it. + */ + int max_depth; + int max_depth_valid; }; unsigned diff_filter_bit(char status); diff --git a/t/meson.build b/t/meson.build index a59da26be3..950d1b7483 100644 --- a/t/meson.build +++ b/t/meson.build @@ -500,6 +500,7 @@ integration_tests = [ 't4067-diff-partial-clone.sh', 't4068-diff-symmetric-merge-base.sh', 't4069-remerge-diff.sh', + 't4071-diff-max-depth.sh', 't4100-apply-stat.sh', 't4101-apply-nonl.sh', 't4102-apply-rename.sh', diff --git a/t/t4071-diff-max-depth.sh b/t/t4071-diff-max-depth.sh new file mode 100755 index 0000000000..1545ebb869 --- /dev/null +++ b/t/t4071-diff-max-depth.sh @@ -0,0 +1,109 @@ +#!/bin/sh + +test_description='check that diff --max-depth will limit recursion' +. ./test-lib.sh + +make_dir() { + mkdir -p "$1" && + echo "$2" >"$1/file" +} + +make_files() { + echo "$1" >file && + make_dir one "$1" && + make_dir one/two "$1" && + make_dir one/two/three "$1" +} + +test_expect_success 'setup' ' + git commit --allow-empty -m empty && + git tag empty && + make_files added && + git add . && + git commit -m added && + make_files modified && + git add . && + git commit -m modified && + make_files index && + git add . && + make_files worktree +' + +test_expect_success '--max-depth is disallowed with wildcard pathspecs' ' + test_must_fail git diff-tree --max-depth=0 HEAD^ HEAD -- "f*" +' + +check_one() { + type=$1; shift + args=$1; shift + path=$1; shift + depth=$1; shift + test_expect_${expect:-success} "diff-$type $args, path=$path, depth=$depth" " + for i in $*; do echo \$i; done >expect && + git diff-$type --max-depth=$depth --name-only $args -- $path >actual && + test_cmp expect actual + " +} + +# For tree comparisons, we expect to see subtrees at the boundary +# get their own entry. +check_trees() { + check_one tree "$*" '' 0 file one + check_one tree "$*" '' 1 file one/file one/two + check_one tree "$*" '' 2 file one/file one/two/file one/two/three + check_one tree "$*" '' 3 file one/file one/two/file one/two/three/file + check_one tree "$*" one 0 one + check_one tree "$*" one 1 one/file one/two + check_one tree "$*" one 2 one/file one/two/file one/two/three + check_one tree "$*" one 3 one/file one/two/file one/two/three/file + check_one tree "$*" one/two 0 one/two + check_one tree "$*" one/two 1 one/two/file one/two/three + check_one tree "$*" one/two 2 one/two/file one/two/three/file + check_one tree "$*" one/two/three 0 one/two/three + check_one tree "$*" one/two/three 1 one/two/three/file +} + +# But for index comparisons, we do not store subtrees at all, so we do not +# expect them. +check_index() { + check_one "$@" '' 0 file + check_one "$@" '' 1 file one/file + check_one "$@" '' 2 file one/file one/two/file + check_one "$@" '' 3 file one/file one/two/file one/two/three/file + check_one "$@" one 0 + check_one "$@" one 1 one/file + check_one "$@" one 2 one/file one/two/file + check_one "$@" one 3 one/file one/two/file one/two/three/file + check_one "$@" one/two 0 + check_one "$@" one/two 1 one/two/file + check_one "$@" one/two 2 one/two/file one/two/three/file + check_one "$@" one/two/three 0 + check_one "$@" one/two/three 1 one/two/three/file +} + +# Check as a modification... +check_trees HEAD^ HEAD +# ...and as an addition... +check_trees empty HEAD +# ...and as a deletion. +check_trees HEAD empty + +# We currently only implement max-depth for trees. +expect=failure +# Check index against a tree +check_index index "--cached HEAD" +# and index against the worktree +check_index files "" +expect= + +test_expect_success 'find shortest path within embedded pathspecs' ' + cat >expect <<-\EOF && + one/file + one/two/file + one/two/three/file + EOF + git diff-tree --max-depth=2 --name-only HEAD^ HEAD -- one one/two >actual && + test_cmp expect actual +' + +test_done diff --git a/tree-diff.c b/tree-diff.c index 60c558c2b5..2a744dfaec 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -13,6 +13,7 @@ #include "tree-walk.h" #include "environment.h" #include "repository.h" +#include "dir.h" /* * Some mode bits are also used internally for computations. @@ -48,6 +49,73 @@ free((x)); \ } while(0) +/* Returns true if and only if "dir" is a leading directory of "path" */ +static int is_dir_prefix(const char *path, const char *dir, int dirlen) +{ + return !strncmp(path, dir, dirlen) && + (!path[dirlen] || path[dirlen] == '/'); +} + +static int check_recursion_depth(struct strbuf *name, + const struct pathspec *ps, + int max_depth) +{ + int i; + + if (!ps->nr) + return within_depth(name->buf, name->len, 1, max_depth); + + /* + * We look through the pathspecs in reverse-sorted order, because we + * want to find the longest match first (e.g., "a/b" is better for + * checking depth than "a/b/c"). + */ + for (i = ps->nr - 1; i >= 0; i--) { + const struct pathspec_item *item = ps->items+i; + + /* + * If the name to match is longer than the pathspec, then we + * are only interested if the pathspec matches and we are + * within the allowed depth. + */ + if (name->len >= item->len) { + if (!is_dir_prefix(name->buf, item->match, item->len)) + continue; + return within_depth(name->buf + item->len, + name->len - item->len, + 1, max_depth); + } + + /* + * Otherwise, our name is shorter than the pathspec. We need to + * check if it is a prefix of the pathspec; if so, we must + * always recurse in order to process further (the resulting + * paths we find might or might not match our pathspec, but we + * cannot know until we recurse). + */ + if (is_dir_prefix(item->match, name->buf, name->len)) + return 1; + } + return 0; +} + +static int should_recurse(struct strbuf *name, struct diff_options *opt) +{ + if (!opt->flags.recursive) + return 0; + if (!opt->max_depth_valid) + return 1; + + /* + * We catch this during diff_setup_done, but let's double-check + * against any internal munging. + */ + if (opt->pathspec.has_wildcard) + die("BUG: wildcard pathspecs are incompatible with max-depth"); + + return check_recursion_depth(name, &opt->pathspec, opt->max_depth); +} + static void ll_diff_tree_paths( struct combine_diff_path ***tail, const struct object_id *oid, const struct object_id **parents_oid, int nparent, @@ -170,9 +238,13 @@ static void emit_path(struct combine_diff_path ***tail, mode = 0; } - if (opt->flags.recursive && isdir) { - recurse = 1; - emitthis = opt->flags.tree_in_recursive; + if (isdir) { + strbuf_add(base, path, pathlen); + if (should_recurse(base, opt)) { + recurse = 1; + emitthis = opt->flags.tree_in_recursive; + } + strbuf_setlen(base, old_baselen); } if (emitthis) { From patchwork Wed Mar 26 20:18:28 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030539 Received: from out-173.mta1.migadu.com (out-173.mta1.migadu.com [95.215.58.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F7CA2512EE for ; Wed, 26 Mar 2025 20:19:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020387; cv=none; b=rXzbkdL/k9NlUaJ84mkTp2WIYYOaSciZKePQdmfLCLqqfm3mnSb1czbaX48QAtLAHGELwOn2Q5jW9rlg5hkx2X+M861ekf0v6+BBfyz5kinin5kYLFjhW31TTNkQp63dnNOPnANeGNmjwQfYQQlnzeEp0Zs8TBPINKJMqFRhA64= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020387; c=relaxed/simple; bh=0O2WvGs2ykmym5YUzBe7rlmucIYLRnIsN2j3UU8eaZo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=eCrmk6oxBas5oLy3wQm+dyTMgtRsbZYu0t3L8Ry6B4oB3dnmmkoRiJShe5HccfrTqYTTM0lYDFh5UbJt0Ah4Doqk6MXChJFIUQufxgAfFe/nmxvZVPlU88aMaLXWg1n2nEy57dF3uUOiPs0qau2bmUjrOLccRVvCgJN9/1ySd7s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=0nU2L4UQ; arc=none smtp.client-ip=95.215.58.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="0nU2L4UQ" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020382; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FPlFTI5WyODzn9QvpIhXIEqvRzM0lNrWQHcv5HTkSMQ=; b=0nU2L4UQMgEybCz27AZmC6rbc8yX+pmsimHSrD05UEwz5/Kg+cewQIPK/5ByasvVuOAqSz Dq1zbAnCoLIeNgDE5YfdxylZ+iwqQsPKcKBubeuWdE2weE+OdUEm4nMptMPkiJZ2t13zhh PsstawfHLkaJ+VnqQyq8iqcYiS+dgwo= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:28 +0100 Subject: [PATCH 4/8] blame-tree: introduce new subcommand to blame files Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-4-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt , =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsCBCamFybWFzb24=?= , Toon Claes X-Migadu-Flow: FLOW_OUT Similar to git-blame(1), introduce a new subcommand git-blame-tree(1). This command shows the most recent modification to paths in a tree. It does so by expanding the tree at a given commit, taking note of the current state of each path, and then walking backwards through history looking for commits where each path changed into its final commit ID. Based-on-a-patch-by: Jeff King Improved-by: "Ævar Arnfjörð Bjarmason" Signed-off-by: Toon Claes --- .gitignore | 1 + Makefile | 2 + blame-tree.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++ blame-tree.h | 43 +++++++++++ builtin.h | 1 + builtin/blame-tree.c | 67 +++++++++++++++++ git.c | 1 + meson.build | 2 + t/helper/test-tool.h | 1 + t/meson.build | 1 + t/t8020-blame-tree.sh | 142 ++++++++++++++++++++++++++++++++++++ 11 files changed, 459 insertions(+) diff --git a/.gitignore b/.gitignore index 08a66ca508..27faa0ce90 100644 --- a/.gitignore +++ b/.gitignore @@ -22,6 +22,7 @@ /git-backfill /git-bisect /git-blame +/git-blame-tree /git-branch /git-bugreport /git-bundle diff --git a/Makefile b/Makefile index 7315507381..92fdfc76df 100644 --- a/Makefile +++ b/Makefile @@ -972,6 +972,7 @@ LIB_OBJS += archive.o LIB_OBJS += attr.o LIB_OBJS += base85.o LIB_OBJS += bisect.o +LIB_OBJS += blame-tree.o LIB_OBJS += blame.o LIB_OBJS += blob.o LIB_OBJS += bloom.o @@ -1215,6 +1216,7 @@ BUILTIN_OBJS += builtin/archive.o BUILTIN_OBJS += builtin/backfill.o BUILTIN_OBJS += builtin/bisect.o BUILTIN_OBJS += builtin/blame.o +BUILTIN_OBJS += builtin/blame-tree.o BUILTIN_OBJS += builtin/branch.o BUILTIN_OBJS += builtin/bugreport.o BUILTIN_OBJS += builtin/bundle.o diff --git a/blame-tree.c b/blame-tree.c new file mode 100644 index 0000000000..ed4ec1e694 --- /dev/null +++ b/blame-tree.c @@ -0,0 +1,198 @@ +#include "git-compat-util.h" +#include "blame-tree.h" +#include "strvec.h" +#include "hex.h" +#include "commit.h" +#include "diffcore.h" +#include "diff.h" +#include "object.h" +#include "revision.h" +#include "repository.h" +#include "log-tree.h" + +void blame_tree_opts_release(struct blame_tree_options *bto) +{ + strvec_clear(&bto->args); +} + +struct blame_tree_entry { + struct object_id oid; + struct commit *commit; +}; + +static void add_from_diff(struct diff_queue_struct *q, + struct diff_options *opt UNUSED, void *data) +{ + struct blame_tree *bt = data; + + for (int i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + struct blame_tree_entry *ent = xcalloc(1, sizeof(*ent)); + struct string_list_item *it; + + oidcpy(&ent->oid, &p->two->oid); + it = string_list_append(&bt->paths, p->two->path); + it->util = ent; + } +} + +static int add_from_revs(struct blame_tree *bt) +{ + size_t count = 0; + struct diff_options diffopt; + + memcpy(&diffopt, &bt->rev.diffopt, sizeof(diffopt)); + diffopt.output_format = DIFF_FORMAT_CALLBACK; + diffopt.format_callback = add_from_diff; + diffopt.format_callback_data = bt; + diffopt.no_free = 1; + + for (size_t i = 0; i < bt->rev.pending.nr; i++) { + struct object_array_entry *obj = bt->rev.pending.objects + i; + + if (obj->item->flags & UNINTERESTING) + continue; + + if (count++) + return error(_("can only blame one tree at a time")); + diff_tree_oid(bt->rev.repo->hash_algo->empty_tree, + &obj->item->oid, "", &diffopt); + diff_flush(&diffopt); + } + + string_list_sort(&bt->paths); + return 0; +} + +void blame_tree_init(struct repository *r, struct blame_tree *bt, + const struct blame_tree_options *opts) +{ + repo_init_revisions(r, &bt->rev, opts->prefix); + bt->rev.def = oid_to_hex(&opts->oid); + bt->rev.combine_merges = 1; + bt->rev.show_root_diff = 1; + bt->rev.boundary = 1; + bt->rev.no_commit_id = 1; + bt->rev.diff = 1; + bt->rev.diffopt.flags.recursive = opts->recursive; + setup_revisions(opts->args.nr, opts->args.v, &bt->rev, NULL); + + if (add_from_revs(bt) < 0) + die(_("unable to setup blame-tree")); +} + +void blame_tree_release(struct blame_tree *bt) +{ + string_list_clear(&bt->paths, 1); + release_revisions(&bt->rev); +} + +struct blame_tree_callback_data { + struct commit *commit; + struct string_list *paths; + size_t num_interesting; + + blame_tree_fn callback; + void *callback_data; +}; + +static void mark_path(const char *path, const struct object_id *oid, + struct blame_tree_callback_data *data) +{ + struct string_list_item *item = string_list_lookup(data->paths, path); + struct blame_tree_entry *ent; + + /* Is it even a path that exists in our tree? */ + if (!item) + return; + + /* Have we already blamed a commit? */ + ent = item->util; + if (ent->commit) + return; + + /* + * Is it arriving at a version of interest, or is it from a side branch + * which did not contribute to the final state? + */ + if (!oideq(oid, &ent->oid)) + return; + + ent->commit = data->commit; + data->num_interesting--; + if (data->callback) + data->callback(path, data->commit, data->callback_data); +} + +static void blame_diff(struct diff_queue_struct *q, + struct diff_options *opt UNUSED, void *cbdata) +{ + struct blame_tree_callback_data *data = cbdata; + + for (int i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + switch (p->status) { + case DIFF_STATUS_DELETED: + /* + * There's no point in feeding a deletion, as it could + * not have resulted in our current state, which + * actually has the file. + */ + break; + + default: + /* + * Otherwise, we care only that we somehow arrived at + * a final path/sha1 state. Note that this covers some + * potentially controversial areas, including: + * + * 1. A rename or copy will be blamed, as it is the + * first time the content has arrived at the given + * path. + * + * 2. Even a non-content modification like a mode or + * type change will trigger it. + * + * We take the inclusive approach for now, and blame + * anything which impacts the path. Options to tweak + * the behavior (e.g., to "--follow" the content across + * renames) can come later. + */ + mark_path(p->two->path, &p->two->oid, data); + break; + } + } +} + +int blame_tree_run(struct blame_tree *bt, blame_tree_fn cb, void *cbdata) +{ + struct blame_tree_callback_data data; + + data.paths = &bt->paths; + data.num_interesting = bt->paths.nr; + data.callback = cb; + data.callback_data = cbdata; + + bt->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; + bt->rev.diffopt.format_callback = blame_diff; + bt->rev.diffopt.format_callback_data = &data; + + prepare_revision_walk(&bt->rev); + + while (data.num_interesting) { + data.commit = get_revision(&bt->rev); + if (!data.commit) + break; + + if (data.commit->object.flags & BOUNDARY) { + diff_tree_oid(bt->rev.repo->hash_algo->empty_tree, + &data.commit->object.oid, + "", &bt->rev.diffopt); + diff_flush(&bt->rev.diffopt); + } else { + log_tree_commit(&bt->rev, data.commit); + } + } + + return 0; +} diff --git a/blame-tree.h b/blame-tree.h new file mode 100644 index 0000000000..ea06298f88 --- /dev/null +++ b/blame-tree.h @@ -0,0 +1,43 @@ +#ifndef BLAME_TREE_H +#define BLAME_TREE_H + +#include "hash.h" +#include "strvec.h" +#include "string-list.h" +#include "revision.h" +#include "commit.h" + +struct blame_tree_options { + struct object_id oid; + const char *prefix; + unsigned int recursive; + struct strvec args; +}; + +#define BLAME_TREE_OPTIONS_INIT(...) { \ + .args = STRVEC_INIT, \ + __VA_ARGS__ \ +} + +void blame_tree_opts_release(struct blame_tree_options *bto); + +struct blame_tree { + struct string_list paths; + struct rev_info rev; + struct repository *repository; +}; +#define BLAME_TREE_INIT { \ + .paths = STRING_LIST_INIT_DUP, \ + .rev = REV_INFO_INIT, \ +} + +void blame_tree_init(struct repository *r, struct blame_tree *bt, + const struct blame_tree_options *opts); + +void blame_tree_release(struct blame_tree *); + +typedef void (*blame_tree_fn)(const char *path, const struct commit *commit, + void *data); +int blame_tree_run(struct blame_tree *bt, blame_tree_fn cb, void *data); + +#endif /* BLAME_TREE_H */ diff --git a/builtin.h b/builtin.h index 993a583872..be3176924d 100644 --- a/builtin.h +++ b/builtin.h @@ -123,6 +123,7 @@ int cmd_archive(int argc, const char **argv, const char *prefix, struct reposito int cmd_backfill(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_bisect(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_blame(int argc, const char **argv, const char *prefix, struct repository *repo); +int cmd_blame_tree(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_branch(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_bugreport(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_bundle(int argc, const char **argv, const char *prefix, struct repository *repo); diff --git a/builtin/blame-tree.c b/builtin/blame-tree.c new file mode 100644 index 0000000000..bc404b63f3 --- /dev/null +++ b/builtin/blame-tree.c @@ -0,0 +1,67 @@ +#define USE_THE_REPOSITORY_VARIABLE + +#include "git-compat-util.h" +#include "blame-tree.h" +#include "strvec.h" +#include "hex.h" +#include "quote.h" +#include "config.h" +#include "environment.h" +#include "object-name.h" +#include "parse-options.h" +#include "builtin.h" +#include "setup.h" + +static void show_entry(const char *path, const struct commit *commit, void *d) +{ + struct blame_tree *bt = d; + + if (commit->object.flags & BOUNDARY) + putchar('^'); + printf("%s\t", oid_to_hex(&commit->object.oid)); + + if (bt->rev.diffopt.line_termination) + write_name_quoted(path, stdout, '\n'); + else + printf("%s%c", path, '\0'); + + fflush(stdout); +} + +int cmd_blame_tree(int argc, const char **argv, const char *prefix, struct repository *repo) +{ + struct blame_tree bt = BLAME_TREE_INIT; + struct blame_tree_options opts = BLAME_TREE_OPTIONS_INIT( + .prefix = prefix, + ); + + struct option options[] = { + OPT_BOOL(0, "recursive", &opts.recursive, + "recurse into to subtrees"), + OPT_END() + }; + + const char * const blame_tree_usage[] = { + N_("git blame-tree [--no-recursive] []"), + NULL, + }; + + git_config(git_default_config, NULL); + + if (repo_get_oid(the_repository, "HEAD", &opts.oid)) + die("unable to get HEAD"); + + argc = parse_options(argc, argv, prefix, options, blame_tree_usage, + PARSE_OPT_KEEP_ARGV0 | PARSE_OPT_KEEP_UNKNOWN_OPT); + if (argc) + strvec_pushv(&opts.args, argv); + + blame_tree_init(repo, &bt, &opts); + + if (blame_tree_run(&bt, show_entry, &bt) < 0) + die(_("error running blame-tree traversal")); + blame_tree_release(&bt); + blame_tree_opts_release(&opts); + + return 0; +} diff --git a/git.c b/git.c index 450d6aaa86..42de740378 100644 --- a/git.c +++ b/git.c @@ -509,6 +509,7 @@ static struct cmd_struct commands[] = { { "backfill", cmd_backfill, RUN_SETUP }, { "bisect", cmd_bisect, RUN_SETUP }, { "blame", cmd_blame, RUN_SETUP }, + { "blame-tree", cmd_blame_tree, RUN_SETUP }, { "branch", cmd_branch, RUN_SETUP | DELAY_PAGER_CONFIG }, { "bugreport", cmd_bugreport, RUN_SETUP_GENTLY }, { "bundle", cmd_bundle, RUN_SETUP_GENTLY }, diff --git a/meson.build b/meson.build index efe2871c9d..dd231b669b 100644 --- a/meson.build +++ b/meson.build @@ -241,6 +241,7 @@ libgit_sources = [ 'attr.c', 'base85.c', 'bisect.c', + 'blame-tree.c', 'blame.c', 'blob.c', 'bloom.c', @@ -512,6 +513,7 @@ builtin_sources = [ 'builtin/archive.c', 'builtin/backfill.c', 'builtin/bisect.c', + 'builtin/blame-tree.c', 'builtin/blame.c', 'builtin/branch.c', 'builtin/bugreport.c', diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 6d62a5b53d..41cc3730dc 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -5,6 +5,7 @@ int cmd__advise_if_enabled(int argc, const char **argv); int cmd__bitmap(int argc, const char **argv); +int cmd__blame_tree(int argc, const char **argv); int cmd__bloom(int argc, const char **argv); int cmd__bundle_uri(int argc, const char **argv); int cmd__cache_tree(int argc, const char **argv); diff --git a/t/meson.build b/t/meson.build index 950d1b7483..6f6485c8b4 100644 --- a/t/meson.build +++ b/t/meson.build @@ -960,6 +960,7 @@ integration_tests = [ 't8012-blame-colors.sh', 't8013-blame-ignore-revs.sh', 't8014-blame-ignore-fuzzy.sh', + 't8020-blame-tree.sh', 't9001-send-email.sh', 't9002-column.sh', 't9003-help-autocorrect.sh', diff --git a/t/t8020-blame-tree.sh b/t/t8020-blame-tree.sh new file mode 100755 index 0000000000..5fd2c079fe --- /dev/null +++ b/t/t8020-blame-tree.sh @@ -0,0 +1,142 @@ +#!/bin/sh + +test_description='blame-tree tests' + +. ./test-lib.sh + +test_expect_success 'setup' ' + test_commit 1 file && + mkdir a && + test_commit 2 a/file && + mkdir a/b && + test_commit 3 a/b/file +' + +test_expect_success 'cannot blame two trees' ' + test_must_fail git blame-tree HEAD HEAD~1 +' + +check_blame() { + local indir= && + while test $# != 0 + do + case "$1" in + -C) + indir="$2" + shift + ;; + *) + break + ;; + esac && + shift + done && + + cat >expect && + test_when_finished "rm -f tmp.*" && + git ${indir:+-C "$indir"} blame-tree "$@" >tmp.1 && + git name-rev --annotate-stdin --name-only --tags \ + tmp.2 && + tr '\t' ' ' tmp.3 && + sort tmp.3 >actual && + test_cmp expect actual +} + +test_expect_success 'blame recursive' ' + check_blame --recursive <<-\EOF + 1 file + 2 a/file + 3 a/b/file + EOF +' + +test_expect_success 'blame non-recursive' ' + check_blame --no-recursive <<-\EOF + 1 file + 3 a + EOF +' + +test_expect_success 'blame subdir' ' + check_blame a <<-\EOF + 3 a + EOF +' + +test_expect_success 'blame subdir recursive' ' + check_blame --recursive a <<-\EOF + 2 a/file + 3 a/b/file + EOF +' + +test_expect_success 'blame from non-HEAD commit' ' + check_blame --no-recursive HEAD^ <<-\EOF + 1 file + 2 a + EOF +' + +test_expect_success 'blame from subdir defaults to root' ' + check_blame -C a --no-recursive <<-\EOF + 1 file + 3 a + EOF +' + +test_expect_success 'blame from subdir uses relative pathspecs' ' + check_blame -C a --recursive b <<-\EOF + 3 a/b/file + EOF +' + +test_expect_failure 'limit blame traversal by count' ' + check_blame --no-recursive -1 <<-\EOF + 3 a + EOF +' + +test_expect_success 'limit blame traversal by commit' ' + check_blame --no-recursive HEAD~2..HEAD <<-\EOF + 3 a + ^1 file + EOF +' + +test_expect_success 'only blame files in the current tree' ' + git rm -rf a && + git commit -m "remove a" && + check_blame <<-\EOF + 1 file + EOF +' + +test_expect_success 'cross merge boundaries in blaming' ' + git checkout HEAD^0 && + git rm -rf . && + test_commit m1 && + git checkout HEAD^ && + git rm -rf . && + test_commit m2 && + git merge m1 && + check_blame <<-\EOF + m1 m1.t + m2 m2.t + EOF +' + +test_expect_success 'blame merge for resolved conflicts' ' + git checkout HEAD^0 && + git rm -rf . && + test_commit c1 conflict && + git checkout HEAD^ && + git rm -rf . && + test_commit c2 conflict && + test_must_fail git merge c1 && + test_commit resolved conflict && + check_blame conflict <<-\EOF + resolved conflict + EOF +' + +test_done From patchwork Wed Mar 26 20:18:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030540 Received: from out-179.mta0.migadu.com (out-179.mta0.migadu.com [91.218.175.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 866BD24EF97 for ; Wed, 26 Mar 2025 20:19:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020391; cv=none; b=gInk6Ib+LcH31iBDZMAlkRbOt2ADuKlxRwLE8skgY4qDmvL3llVb7GiZkppPDwtikkySPdeXSV4jKVtz78flERqerrgVek4BPgkMqFO6fMkUPr5DQ3g2yvVBUj5R2cRYZ0odzm6DeXMWLif14AuTAOavvuYbN3aCUE7fQBEO25g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020391; c=relaxed/simple; bh=4bAywy4PIYpYXPv0TSkIXjNtit4YwQYek4bv729yTtM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=VZi+fcB31xZgBhVCYOk8XGwZysJOwM3vc9aSbBLQ6omQeEZFx8jhA83hVQ212MMKlwbDnv6XeTtf0KWvYAAjTAHZrZ2sc6r3zcnGBGyPKo0yk1MK1gKGPj5Zhs73y8OgFewrB+FsIBJt4qdlIK1yp4/blg+X8p7SfYeNT50xy3o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=inlJlPPk; arc=none smtp.client-ip=91.218.175.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="inlJlPPk" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020385; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i3MnCuhwVlFg/EBSuI7HaR8/5BJIwom/0fpo7Son0OA=; b=inlJlPPkC4LfccZlDAIBIEi0pQxK9bHXoIZ/1FVzsEKecg2C9OFsowc7K1P6hqZDgMg2rw sJylV8uBgCqwbUpl1HpYY4IEZw3nLquehegY/aL8yhL2OVDZP0Irw/9t1Rm5j9CUQaK9Vi qt8F5xUCFW0BcKV5vD7+pBRPsea1Q0o= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:29 +0100 Subject: [PATCH 5/8] pathspec: add optional trie index Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-5-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt X-Migadu-Flow: FLOW_OUT From: Jeff King When checking if a path matches our pathspec list, in the worst case we walk the whole list linearly to check each pathspec. As a result, an operation like a tree diff does `O(paths * pathspecs)` comparisons. If we assume that the number of pathspecs may approach the number of paths, this is `O(n^2)` in the number of paths. We have to do it this way in the general case because pathspecs may be arbitrary fnmatch expressions, and we cannot create any meaningful index. However, it is common that all of the pathspecs are vanilla prefixes (e.g., "git.c" or "Documentation/") that do not use any wildcards or magic features. In such a case, we can index the pathspec list to give us faster lookups. The simplest solution would be to keep the pathspec list sorted, and use a binary search over the entries for each path. This turns out to be rather tricky, though, as we are not looking for an exact match in the list. We must also find prefix matches, and take care to handle trailing slashes for directories. Instead, this patch introduces a pathspec_trie struct, which represents the pathspecs as a graph of entries. For example, the pathspec list ["foo/bar", "foo/baz/"] would be represented as: (bar, terminal) / (foo) \ (baz, terminal, must_be_dir) To see if a path "foo/eek" is covered by the pathspec, we walk the trie, matching "foo" but ultimately seeing that "eek" is not mentioned. This provides us with two optimizations: 1. The children of each trie node are simple strings representing directory components. This means we can sort and binary-search them, giving us logarithmic lookups (but note that rather than one log lookup on the whole pathname, it is a sequence of smaller log lookups on components of the pathname). 2. Many users of the pathspec code do not feed full pathnames, but instead are walking a tree hierarchy themselves, and limiting the walk as they go. They can walk the trie at the same time, meanining they avoid looking repeatedly at parts of the pathspec we already know are matched. The current code, by contrast, will repeatedly match "foo/" against each pathspec when looking at "foo/bar", "foo/baz", etc. Note that this patch just introduces the data structure; the code is not used anywhere yet. Signed-off-by: Jeff King --- Makefile | 1 + pathspec.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++ pathspec.h | 23 +++++++++ t/helper/meson.build | 1 + t/helper/test-pathspec.c | 98 ++++++++++++++++++++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + t/meson.build | 1 + t/t6137-pathspec-trie.sh | 57 +++++++++++++++++++++++ 9 files changed, 302 insertions(+) diff --git a/Makefile b/Makefile index 92fdfc76df..c90607baa1 100644 --- a/Makefile +++ b/Makefile @@ -827,6 +827,7 @@ TEST_BUILTINS_OBJS += test-parse-pathspec-file.o TEST_BUILTINS_OBJS += test-partial-clone.o TEST_BUILTINS_OBJS += test-path-utils.o TEST_BUILTINS_OBJS += test-path-walk.o +TEST_BUILTINS_OBJS += test-pathspec.o TEST_BUILTINS_OBJS += test-pcre2-config.o TEST_BUILTINS_OBJS += test-pkt-line.o TEST_BUILTINS_OBJS += test-proc-receive.o diff --git a/pathspec.c b/pathspec.c index 89663645e1..0e381fd748 100644 --- a/pathspec.c +++ b/pathspec.c @@ -17,6 +17,125 @@ #include "quote.h" #include "wildmatch.h" +/* + * This is basically a strcmp, but we do not want the caller + * to have to terminate "a", so we pretend as if it had a NUL. + */ +static int pathspec_trie_cmp(const char *a, size_t a_len, + const char *b) +{ + int ret = strncmp(a, b, a_len); + return ret ? + ret : + (unsigned char)0 - (unsigned char )b[a_len]; +} + +static struct pathspec_trie *pathspec_trie_alloc(const char *path, size_t len) +{ + struct pathspec_trie *ret = xcalloc(1, sizeof(*ret) + len); + memcpy(ret->path, path, len); + return ret; +} + +/* + * Add "path" to the trie rooted at "t". + */ +static void pathspec_trie_add(struct pathspec_trie *t, const char *path) +{ + /* + * Special case the empty path (i.e., "."), as our splitting algorithm + * below assumes at least one component. + */ + if (!*path) { + t->terminal = 1; + return; + } + + while (1) { + const char *slash = strchrnul(path, '/'); + size_t len = slash - path; + int pos; + + pos = pathspec_trie_lookup(t, path, len); + if (pos < 0) { + ALLOC_GROW(t->entries, t->nr + 1, t->alloc); + + pos = -pos - 1; + if (pos < t->nr) + memmove(t->entries + pos + 1, + t->entries + pos, + sizeof(*t->entries) * (t->nr - pos)); + t->entries[pos] = pathspec_trie_alloc(path, len); + t->nr++; + } + + t = t->entries[pos]; + path += len; + + if (!*path) { + t->must_be_dir = 0; + t->terminal = 1; + return; + } + + while (*path == '/') + path++; + if (!*path) { + /* + * if we were already a terminal, then do not set + * must_be_dir; we are "foo/", but we already had a + * pathspec "foo", which is a superset of us. + */ + if (!t->terminal) + t->must_be_dir = 1; + t->terminal = 1; + return; + } + } +} + +struct pathspec_trie *pathspec_trie_build(const struct pathspec *pathspec) +{ + struct pathspec_trie *ret; + int i; + + /* we only make a trie for plain-vanilla pathspecs */ + if (pathspec->has_wildcard || (pathspec->magic & ~PATHSPEC_LITERAL)) + return NULL; + + ret = pathspec_trie_alloc("", 0); + + /* + * XXX we could construct the trie more efficiently by creating each + * node with all of its entries in sorted order. But this is much + * simpler, and since we only do this once at the start of a traversal, + * it's probably fast enough. + */ + for (i = 0; i < pathspec->nr; i++) + pathspec_trie_add(ret, pathspec->items[i].match); + + return ret; +} + +int pathspec_trie_lookup(const struct pathspec_trie *parent, + const char *path, size_t len) +{ + int lo = 0, hi = parent->nr; + while (lo < hi) { + int mi = lo + ((hi - lo) / 2); + int cmp; + + cmp = pathspec_trie_cmp(path, len, parent->entries[mi]->path); + if (!cmp) + return mi; + if (cmp < 0) + hi = mi; + else + lo = mi + 1; + } + return -lo - 1; +} + /* * Finds which of the given pathspecs match items in the index. * diff --git a/pathspec.h b/pathspec.h index de537cff3c..71bafb78a9 100644 --- a/pathspec.h +++ b/pathspec.h @@ -2,6 +2,7 @@ #define PATHSPEC_H struct index_state; +struct pathspec; /* Pathspec magic */ #define PATHSPEC_FROMTOP (1<<0) @@ -22,6 +23,28 @@ struct index_state; #define PATHSPEC_ONESTAR 1 /* the pathspec pattern satisfies GFNM_ONESTAR */ +struct pathspec_trie { + struct pathspec_trie **entries; + int nr, alloc; + unsigned terminal:1, + must_be_dir:1; + char path[FLEX_ARRAY]; +}; + +/* + * Build a pathspec_trie for the given pathspec. + */ +struct pathspec_trie *pathspec_trie_build(const struct pathspec *); + +/* + * Do a binary search on one level of the pathspec_trie. If found, + * returns the offset of the item in the entry list. If not found, + * return a negative value encoding the offset where it would be inserted + * (you can recover the true offset with "-pos - 1"). + */ +int pathspec_trie_lookup(const struct pathspec_trie *pst, + const char *path, size_t len); + /** * See glossary-content.txt for the syntax of pathspec. * In memory, a pathspec set is represented by "struct pathspec" and is diff --git a/t/helper/meson.build b/t/helper/meson.build index d2cabaa2bc..f6dc3e443b 100644 --- a/t/helper/meson.build +++ b/t/helper/meson.build @@ -42,6 +42,7 @@ test_tool_sources = [ 'test-partial-clone.c', 'test-path-utils.c', 'test-path-walk.c', + 'test-pathspec.c', 'test-pcre2-config.c', 'test-pkt-line.c', 'test-proc-receive.c', diff --git a/t/helper/test-pathspec.c b/t/helper/test-pathspec.c new file mode 100644 index 0000000000..c04417681f --- /dev/null +++ b/t/helper/test-pathspec.c @@ -0,0 +1,98 @@ +#include "test-tool.h" +#include "pathspec.h" + +static const char usage_msg[] = +"test-tool pathspec trie [pathspecs...] -- [paths....]"; + +/* + * XXX Yuck. This is a lot of complicated code specific to our test. Even if it + * runs correctly, we have no real guarantee that the actual trie users are + * doing it right. And reusing their code is tough, because it happens as part + * of their own traversals (e.g., we walk the pathspec trie while walking the + * tree objects themselves). + * + * This whole test program should probably go away in favor of directly testing + * the tree-diff code. + */ +static int trie_match(const struct pathspec_trie *pst, + const char *path) +{ + int pathlen = strlen(path); + int is_dir = 0; + + if (pathlen > 0 && path[pathlen-1] == '/') { + is_dir = 1; + pathlen--; + } + + while (pathlen) { + const char *slash = memchr(path, '/', pathlen); + int component_len; + int pos; + + if (slash) + component_len = slash - path; + else + component_len = pathlen; + + pos = pathspec_trie_lookup(pst, path, component_len); + if (pos < 0) + return 0; + + pst = pst->entries[pos]; + path += component_len; + pathlen -= component_len; + + while (pathlen && *path == '/') { + path++; + pathlen--; + } + + if (pst->terminal) { + if (!pst->must_be_dir) + return 1; + if (pathlen) + return 1; + return is_dir; + } + } + return 0; +} + +static int cmd_trie(const char **argv) +{ + const char **specs, **paths; + struct pathspec pathspec; + struct pathspec_trie *trie; + + paths = specs = argv; + while (*paths && strcmp(*paths, "--")) + paths++; + if (*paths) + *paths++ = NULL; + + parse_pathspec(&pathspec, 0, 0, "", specs); + trie = pathspec_trie_build(&pathspec); + if (!trie) + die("unable to make trie from pathspec"); + + for (; *paths; paths++) { + if (trie_match(trie, *paths)) + printf("yes\n"); + else + printf("no\n"); + } + return 0; +} + +int cmd__pathspec(int argc UNUSED, const char **argv) +{ + const char *cmd = argv[1]; + + if (!cmd) + usage(usage_msg); + else if (!strcmp(cmd, "trie")) + return cmd_trie(argv + 2); + else + die("unknown cmd: %s", cmd); +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 50dc4dac4e..bc0a1b9245 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -54,6 +54,7 @@ static struct test_cmd cmds[] = { { "partial-clone", cmd__partial_clone }, { "path-utils", cmd__path_utils }, { "path-walk", cmd__path_walk }, + { "pathspec", cmd__pathspec }, { "pcre2-config", cmd__pcre2_config }, { "pkt-line", cmd__pkt_line }, { "proc-receive", cmd__proc_receive }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 41cc3730dc..5e79fa38ca 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -46,6 +46,7 @@ int cmd__parse_options_flags(int argc, const char **argv); int cmd__parse_pathspec_file(int argc, const char** argv); int cmd__parse_subcommand(int argc, const char **argv); int cmd__partial_clone(int argc, const char **argv); +int cmd__pathspec(int argc, const char **argv); int cmd__path_utils(int argc, const char **argv); int cmd__path_walk(int argc, const char **argv); int cmd__pcre2_config(int argc, const char **argv); diff --git a/t/meson.build b/t/meson.build index 6f6485c8b4..bb29c36b2a 100644 --- a/t/meson.build +++ b/t/meson.build @@ -788,6 +788,7 @@ integration_tests = [ 't6134-pathspec-in-submodule.sh', 't6135-pathspec-with-attrs.sh', 't6136-pathspec-in-bare.sh', + 't6137-pathspec-trie.sh', 't6200-fmt-merge-msg.sh', 't6300-for-each-ref.sh', 't6301-for-each-ref-errors.sh', diff --git a/t/t6137-pathspec-trie.sh b/t/t6137-pathspec-trie.sh new file mode 100755 index 0000000000..ca2935c7db --- /dev/null +++ b/t/t6137-pathspec-trie.sh @@ -0,0 +1,57 @@ +#!/bin/sh + +test_description='test optimized pathspec trie lookup' +. ./test-lib.sh + +# This is a basic lookup test; the offsets are there to provide +# some variation in where we land in our binary search. +ps="faa fzz foo/bar foo/baa foo/bzz" +for offset in a "a b" "a b c"; do + test_expect_success "lookups with ps=$offset $ps" " + cat >expect <<-\\EOF && + no + yes + yes + no + EOF + test-tool pathspec trie $offset $ps -- f faa foo/bar foo/baz >actual && + test_cmp expect actual + " +done + +test_expect_success 'pathspecs match by prefix' ' + cat >expect <<-\EOF && + yes + yes + yes + EOF + test-tool pathspec trie foo -- foo foo/bar foo/with/deep/subdirs >actual && + test_cmp expect actual +' + +test_expect_success 'trailing slash sets must_be_dir' ' + cat >expect <<-\EOF && + no + yes + yes + EOF + test-tool pathspec trie dir/ -- dir dir/ dir/foo +' + +test_expect_success 'overlapping pathspecs allow the "loose" side' ' + echo yes >expect && + test-tool pathspec trie foo foo/ -- foo >actual && + test_cmp expect actual && + test-tool pathspec trie foo/ foo -- foo >actual && + test_cmp expect actual +' + +test_expect_success '"." at the root matches everything' ' + cat >expect <<-\EOF && + yes + yes + EOF + test-tool pathspec trie . -- foo foo/bar +' + +test_done From patchwork Wed Mar 26 20:18:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030541 Received: from out-172.mta0.migadu.com (out-172.mta0.migadu.com [91.218.175.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E246214A8F for ; Wed, 26 Mar 2025 20:19:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020396; cv=none; b=FwoN2xMBXSG6vTOVzEh93t8eW8kRdkLUH62i53S5qI5vgR5IsMVkMSXYp0ubIzMA/BdKNkrvEtGT29/pgI7PtcGbuFOrlSn6L489tTCP2kEL4XSY65jDqResCq0f21VFAZjCXGK3qdH/RyJXZG094oDu8jye2rEC1nWHSVJHT14= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020396; c=relaxed/simple; bh=g7G5U0ib82y4oTPSle8iKL8Mj8IzRt5eLSqCBaOiG44=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=e/h7cWDZN6pPAh98xQSg51cagpgMfjCU8EEO9y261n+/7TRFqj2uL0G1zr/WyE3+G+XeZsagvflq/LDJjLYqnpeGa880e/+xUKM6GTlUESM9nzO8/d5Q5KcCSW1lznf7i5pzYhugjsveB4Q709McHoDlwZJ6xbmY+71tHzUkCf4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=V+8/4vhg; arc=none smtp.client-ip=91.218.175.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="V+8/4vhg" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020392; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=77bAUie27+ANIS2Evl6uua+NwfDNh6OrEC9u7U29vpw=; b=V+8/4vhgEPogXU4EaeIOWIlUtnmv11m0h4Nnc2uNNTZ2p0L8li9QcAXIDiJMaLJPsnvfgf 1pzOaWAwLVGNN6BMIhgM6QYiNogjEWuNNxkNApw0Swd72g7X8zpjqew3d6RzWuzJUTmQD4 6zZ9kuRhxEnoto07++5AnaX2pNjUjXU= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:30 +0100 Subject: [PATCH 6/8] pathspec: turn on tries when appropriate Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-6-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt X-Migadu-Flow: FLOW_OUT From: Jeff King An earlier commit introduced pathspec_tries, but we did not actually generate them by default. This patch causes us to do so when it is possible (i.e., when no wildcards or other pathspec magic are in use). This doesn't actually do anything yet, though, as none of the pathspec users have learned to make use of the tries. We embed the pathspec_trie directly inside the "struct pathspec". This is not strictly necessary, as once created, the trie does not depend on the original pathspec. However, since the intended use is to optimize existing pathspec callers, passing the trie around as part of the pathspec will minimize disruption to the call chain. Signed-off-by: Jeff King --- pathspec.c | 19 +++++++++++++++++++ pathspec.h | 1 + t/helper/test-pathspec.c | 6 ++---- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/pathspec.c b/pathspec.c index 0e381fd748..c174edef32 100644 --- a/pathspec.c +++ b/pathspec.c @@ -117,6 +117,19 @@ struct pathspec_trie *pathspec_trie_build(const struct pathspec *pathspec) return ret; } +static void pathspec_trie_clear(struct pathspec_trie *t) +{ + if (t) { + for (size_t i = 0; i < t->nr; i++) { + pathspec_trie_clear(t->entries[i]); + FREE_AND_NULL(t->entries[i]); + } + + t->nr = 0; + FREE_AND_NULL(t->entries); + } +} + int pathspec_trie_lookup(const struct pathspec_trie *parent, const char *path, size_t len) { @@ -799,6 +812,8 @@ void parse_pathspec(struct pathspec *pathspec, BUG("PATHSPEC_MAXDEPTH_VALID and PATHSPEC_KEEP_ORDER are incompatible"); QSORT(pathspec->items, pathspec->nr, pathspec_item_cmp); } + + pathspec->trie = pathspec_trie_build(pathspec); } void parse_pathspec_file(struct pathspec *pathspec, unsigned magic_mask, @@ -859,6 +874,8 @@ void copy_pathspec(struct pathspec *dst, const struct pathspec *src) d->attr_check = attr_check_dup(s->attr_check); } + + dst->trie = pathspec_trie_build(dst); } void clear_pathspec(struct pathspec *pathspec) @@ -877,6 +894,8 @@ void clear_pathspec(struct pathspec *pathspec) attr_check_free(pathspec->items[i].attr_check); } + pathspec_trie_clear(pathspec->trie); + FREE_AND_NULL(pathspec->trie); FREE_AND_NULL(pathspec->items); pathspec->nr = 0; } diff --git a/pathspec.h b/pathspec.h index 71bafb78a9..ff11599d25 100644 --- a/pathspec.h +++ b/pathspec.h @@ -76,6 +76,7 @@ struct pathspec { } *attr_match; struct attr_check *attr_check; } *items; + struct pathspec_trie *trie; }; #define GUARD_PATHSPEC(ps, mask) \ diff --git a/t/helper/test-pathspec.c b/t/helper/test-pathspec.c index c04417681f..72e335abc1 100644 --- a/t/helper/test-pathspec.c +++ b/t/helper/test-pathspec.c @@ -63,7 +63,6 @@ static int cmd_trie(const char **argv) { const char **specs, **paths; struct pathspec pathspec; - struct pathspec_trie *trie; paths = specs = argv; while (*paths && strcmp(*paths, "--")) @@ -72,12 +71,11 @@ static int cmd_trie(const char **argv) *paths++ = NULL; parse_pathspec(&pathspec, 0, 0, "", specs); - trie = pathspec_trie_build(&pathspec); - if (!trie) + if (!pathspec.trie) die("unable to make trie from pathspec"); for (; *paths; paths++) { - if (trie_match(trie, *paths)) + if (trie_match(pathspec.trie, *paths)) printf("yes\n"); else printf("no\n"); From patchwork Wed Mar 26 20:18:31 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030542 Received: from out-173.mta1.migadu.com (out-173.mta1.migadu.com [95.215.58.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9EEA9255E55 for ; Wed, 26 Mar 2025 20:19:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020399; cv=none; b=G/fsnEpharF8Z3OUD+jRQ/HDwb8GfTsolvkgp8HocV+nzAgcKK3Awl3rI1eoLAODXSyaXpObAV8m4Ipofae55zpVV+W4Y1kQqQf0/eqMluWpBHoszORjFPxSIPn7+HHguM6eI/sff7SUELP+F0lOVHTHhIqzx1D+BB3FM34Skuo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020399; c=relaxed/simple; bh=Qb8O1CLZbww2BvKRKk4p2z5pz2dFNkLVnqaU5euU7bA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=FOpHpo/Za0wfq9aSAMqB2vfRvymKIDJKHMpFWyX4CiOzy8zhzmiMmJ0qSdNttcffQ17YdTLVN2KF/IBea/csyWAHKSHnCl65DEhTZJr5CEYQvtr9JUwkPP85sDJormZfIBp5HW6/Rwt+fFhwK4laiMtMj+MfMRoTh5S94Q7nRvs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=KM+kWJnC; arc=none smtp.client-ip=95.215.58.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="KM+kWJnC" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020395; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ICR27UkxxjH7++DZkx80GSOaDA5KH2fxXg7+Kx5RtxA=; b=KM+kWJnCryy86YhHUBJ7LNJVaWchB2wa7tBinsZfB2wJvph3C3i+NyVKTkZhVbdzx6HdlA TyXwpK8hiyTUmwibku3IDga4L7Ar1jkZPRiZpp1gNo8cv9hVQnKrSAo3BCIYJr9ojFvvT3 uObccpLUYr7GaRlqGNHBwRhYkCt5/GM= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:31 +0100 Subject: [PATCH 7/8] tree-diff: use pathspec tries Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-7-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt X-Migadu-Flow: FLOW_OUT From: Jeff King The tree-diff currently matches each pathspec against every entry of the tree. For the common case of a handful of pathspecs, this is not a big deal. However, if you have a large number of pathspecs, it gets noticeably slow. Now that we have the pathspec_trie optimization, we can do much better (at least for simple cases without wildcards). Here are numbers for running "git rev-list" with limiting pathspecs of varying sizes, both before and after this patch: Test origin HEAD --------------------------------------------------------------- 4003.2: size=1 0.12(0.11+0.00) 0.12(0.12+0.00) +0.0% 4003.3: size=2 0.17(0.16+0.00) 0.16(0.15+0.01) -5.9% 4003.4: size=4 0.17(0.17+0.00) 0.17(0.17+0.00) +0.0% 4003.5: size=8 0.21(0.20+0.00) 0.20(0.20+0.00) -4.8% 4003.6: size=16 0.25(0.24+0.00) 0.21(0.20+0.00) -16.0% 4003.7: size=32 0.31(0.31+0.00) 0.21(0.20+0.00) -32.3% 4003.8: size=64 0.43(0.41+0.01) 0.21(0.21+0.00) -51.2% 4003.9: size=128 0.73(0.72+0.00) 0.22(0.21+0.00) -69.9% 4003.10: size=256 2.02(2.02+0.00) 0.37(0.36+0.00) -81.7% 4003.11: size=512 6.78(6.78+0.00) 0.64(0.64+0.00) -90.6% 4003.12: size=1024 23.67(23.67+0.02) 1.22(1.20+0.01) -94.8% For small pathspecs, we produce no real difference (which is good; we know we are asymptotically better, but we have not regressed our constant factor). Between 16 and 32 pathspecs we start to see some small improvement, and the benefit keeps growing with the number of pathspecs. Obviously these large-pathspec cases are unusual. But you might use them, for example, if the pathspecs were generated programatically (e.g., if you want the history of all files that are in Documentation/ _now_, not what was historically ever there, you would expand the pathspec at the current tree, and feed the result to rev-list). Signed-off-by: Jeff King --- t/perf/p4003-diff-pathspec.sh | 26 ++++++++++++ tree-diff.c | 98 +++++++++++++++++++++++++++++++++++++------ 2 files changed, 112 insertions(+), 12 deletions(-) diff --git a/t/perf/p4003-diff-pathspec.sh b/t/perf/p4003-diff-pathspec.sh new file mode 100755 index 0000000000..02312d1b0c --- /dev/null +++ b/t/perf/p4003-diff-pathspec.sh @@ -0,0 +1,26 @@ +#!/bin/sh + +test_description='diff performance with many pathspecs' +. ./perf-lib.sh + +test_perf_default_repo + +sizes='1 2 4 8 16 32 64 128 256 512 1024' + +test_expect_success 'create pathspec lists' ' + git ls-tree --name-only -r HEAD >all && + for i in $sizes; do + { + printf "%s\n" -- && + head -$i all + } >ps-$i || return 1 + done +' + +for i in $sizes; do + test_perf "size=$i" " + git rev-list HEAD --stdin /dev/null + " +done + +test_done diff --git a/tree-diff.c b/tree-diff.c index 2a744dfaec..f3d916201b 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -120,7 +120,8 @@ static void ll_diff_tree_paths( struct combine_diff_path ***tail, const struct object_id *oid, const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt, - int depth); + int depth, struct pathspec_trie *pst); + static void ll_diff_tree_oid(const struct object_id *old_oid, const struct object_id *new_oid, struct strbuf *base, struct diff_options *opt); @@ -205,7 +206,7 @@ static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_ static void emit_path(struct combine_diff_path ***tail, struct strbuf *base, struct diff_options *opt, int nparent, struct tree_desc *t, struct tree_desc *tp, - int imin, int depth) + int imin, int depth, struct pathspec_trie *pst) { unsigned short mode; const char *path; @@ -309,23 +310,95 @@ static void emit_path(struct combine_diff_path ***tail, parents_oid[i] = tpi_valid ? &tp[i].entry.oid : NULL; } + /* + * As we recurse through the tree objects, move through + * our pathspec trie, as well. The one exception is if + * we already hit a terminal node. This means we have a strict + * prefix match (e.g., "foo/" matched, and we are in + * "foo/bar"). We don't have to bother with checking the + * pathspec at all anymore in that case. + * + * Note that the "pos < 0" case should not happen here, + * as we would have skipped the tree entry as uninteresting + * earlier. As a safety measure, we turn off the trie + * optimization and fall back to doing regular pathspec + * matching in this case. + */ + if (pst && !pst->terminal) { + int pos = pathspec_trie_lookup(pst, path, pathlen); + if (pos < 0) + pst = NULL; + else + pst = pst->entries[pos]; + } + strbuf_add(base, path, pathlen); strbuf_addch(base, '/'); ll_diff_tree_paths(tail, oid, parents_oid, nparent, base, opt, - depth + 1); + depth + 1, pst); FAST_ARRAY_FREE(parents_oid, nparent); } strbuf_setlen(base, old_baselen); } +static enum interesting match_pathspec_trie_entry(struct pathspec_trie *pst, + const struct name_entry *entry) +{ + int pos; + + /* + * If our base directory is matched, then everything below is + * interesting (i.e., a prefix match). + */ + if (pst->terminal) + return entry_interesting; + + /* + * Otherwise, look up the actual entry. If we don't mention it at all, + * it's definitely uninteresting. But furthermore, if we're at the + * end of our sorted list, we know that nothing after it is + * interesting, either. + * + * XXX It seems like we should have to make special consideration here + * for the sort order of trees. But tree_entry_interesting does not + * seem to. Is it OK, is tree_entry_interesting buggy too, or am I + * reading it wrong? This optimization gives substantial speedups, so + * we really need to keep it or something like it. + */ + pos = pathspec_trie_lookup(pst, entry->path, tree_entry_len(entry)); + if (pos < 0) { + if (-pos - 1 == pst->nr) + return all_entries_not_interesting; + else + return entry_not_interesting; + } + + /* + * We definitely have the entry. First we have to resolve any directory + * restrictions; if there aren't any, then it's definitely interesting. + * + * Note that we do not need to check the "terminal" flag of the + * resulting trie node. If it is not set, then this particular entry + * does not match our pathspec, but we do still need to traverse + * through it to get to the interesting things inside. It's interesting + * either way. + */ + if (pst->entries[pos]->must_be_dir) + return !!S_ISDIR(entry->mode); + return entry_interesting; +} + static void skip_uninteresting(struct tree_desc *t, struct strbuf *base, - struct diff_options *opt) + struct diff_options *opt, + struct pathspec_trie *pst) { enum interesting match; while (t->size) { - match = tree_entry_interesting(opt->repo->index, &t->entry, + match = pst ? + match_pathspec_trie_entry(pst, &t->entry) : + tree_entry_interesting(opt->repo->index, &t->entry, base, &opt->pathspec); if (match) { if (match == all_entries_not_interesting) @@ -433,7 +506,7 @@ static void ll_diff_tree_paths( struct combine_diff_path ***tail, const struct object_id *oid, const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt, - int depth) + int depth, struct pathspec_trie *pst) { struct tree_desc t, *tp; void *ttree, **tptree; @@ -468,9 +541,9 @@ static void ll_diff_tree_paths( break; if (opt->pathspec.nr) { - skip_uninteresting(&t, base, opt); + skip_uninteresting(&t, base, opt, pst); for (i = 0; i < nparent; i++) - skip_uninteresting(&tp[i], base, opt); + skip_uninteresting(&tp[i], base, opt, pst); } /* comparing is finished when all trees are done */ @@ -535,7 +608,7 @@ static void ll_diff_tree_paths( /* D += {δ(t,pi) if pi=p[imin]; "+a" if pi > p[imin]} */ emit_path(tail, base, opt, nparent, - &t, tp, imin, depth); + &t, tp, imin, depth, pst); skip_emit_t_tp: /* t↓, ∀ pi=p[imin] pi↓ */ @@ -547,7 +620,7 @@ static void ll_diff_tree_paths( else if (cmp < 0) { /* D += "+t" */ emit_path(tail, base, opt, nparent, - &t, /*tp=*/NULL, -1, depth); + &t, /*tp=*/NULL, -1, depth, pst); /* t↓ */ update_tree_entry(&t); @@ -563,7 +636,7 @@ static void ll_diff_tree_paths( } emit_path(tail, base, opt, nparent, - /*t=*/NULL, tp, imin, depth); + /*t=*/NULL, tp, imin, depth, pst); skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ @@ -584,7 +657,8 @@ struct combine_diff_path *diff_tree_paths( struct strbuf *base, struct diff_options *opt) { struct combine_diff_path *head = NULL, **tail = &head; - ll_diff_tree_paths(&tail, oid, parents_oid, nparent, base, opt, 0); + ll_diff_tree_paths(&tail, oid, parents_oid, nparent, base, opt, + 0, opt->pathspec.trie); return head; } From patchwork Wed Mar 26 20:18:32 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Toon Claes X-Patchwork-Id: 14030543 Received: from out-186.mta1.migadu.com (out-186.mta1.migadu.com [95.215.58.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC10F214A8F for ; Wed, 26 Mar 2025 20:19:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.186 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020402; cv=none; b=JODtx1MQYxTEJSPEWm9Iu09j4IbKQAxVeHSb3nkpjp28+4jHBeG3JMEOUbMDZnrO86FmdA40yjoEGe3EMvtbjLx9W3iBdVX6eshjc79ZZV8dLnqoDVRmbB9feU7GNgSK+rBs674iEbJ/hKr6sAEmdknq2B5aRy1fs5yCe7yfc5c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743020402; c=relaxed/simple; bh=DlIbwCVA4Z8fR6YFuZYrAxXpBtYZamNwX+6SHp00/sc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=cgA2fRe2ZoXesddZMEaoqpjTbAYYVdhocTSHbLKgLcDqUYlS4rMpjgUAl3N6uwVtXq0Zeb5p+o9Jzk7ixsogz2JlUlCaaO/1jA0OxbthbSUME5ZJ+5T9CvRjiqAIxIgNz3wtaM+J0h17UL8aM3kAVR0fz2p7cyxVQCKy2e48Jvc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com; spf=fail smtp.mailfrom=iotcl.com; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b=1sPnY0Uz; arc=none smtp.client-ip=95.215.58.186 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=iotcl.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=iotcl.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=iotcl.com header.i=@iotcl.com header.b="1sPnY0Uz" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iotcl.com; s=key1; t=1743020398; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RWymkKJjSTq1VhSrlAN4k2u3aqs/e4N/vBlfvHlBAL8=; b=1sPnY0UzhwtU/s4aZ2bSjZPSSgSovZT64/358hD9yHc4azmZMPSWlanZWNAtfl0+kX+QlC cxFqXv5VrO0/z8rUXwKvJ/Qc2EwHJgyF8rFnnS/YoT4m72G9rENOOUgayRfVnzdnrMKbSj 2Pa8OVNXD6AwrO5aZXK3hK2UA/OaqK4= From: Toon Claes Date: Wed, 26 Mar 2025 21:18:32 +0100 Subject: [PATCH 8/8] blame-tree: narrow pathspec as we traverse Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20250326-toon-blame-tree-v1-8-4173133f3786@iotcl.com> References: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> In-Reply-To: <20250326-toon-blame-tree-v1-0-4173133f3786@iotcl.com> To: git@vger.kernel.org Cc: Jeff King , Patrick Steinhardt X-Migadu-Flow: FLOW_OUT From: Jeff King As we find out the "last touched" for each path, we cease to care about that path. However, we traditionally have not told the revision-walking machinery about this, for two reasons: 1. Munging the pathspecs mid-traversal is not an officially supported thing to be doing. It does seem to work, though, and with Duy's recent pathspec refactoring, it's fairly straightforward. 2. The pathspec machinery is slow. We know have to look at each pathspec to see if each tree entry is interesting. And since our cost to handle a diff_filepair is so cheap, it doesn't really help us much. It turns out that (2) is not quite right, though. For the diffs themselves, limiting the pathspec doesn't help at all. But it does mean that we can simplify history more aggressively, which may mean avoiding whole swaths of history that don't touch interesting paths. This happens especially near the end of the traversal, when we are only interested in a few paths, and there are many side branches that do not touch any of them. And as of the last commit, the pathspec machinery is no longer quadratic, so the cost to use it is easily worth paying, even for large trees. Here are runs of `git blame-tree --max-depth=0` for torvalds/linux (a complex history with a small root tree) and git/git (a flat hierarchy with a large root tree): For git/git: Benchmark 1: ./git.old blame-tree --max-depth=0 Time (mean ± σ): 2.223 s ± 0.360 s [User: 1.808 s, System: 0.369 s] Range (min … max): 1.866 s … 2.927 s 10 runs Benchmark 2: ./git.new blame-tree --max-depth=0 Time (mean ± σ): 945.0 ms ± 93.4 ms [User: 783.6 ms, System: 131.0 ms] Range (min … max): 846.4 ms … 1071.5 ms 10 runs Summary ./git.new blame-tree --max-depth=0 ran 2.35 ± 0.45 times faster than git.old blame-tree --max-depth=0 For torvalds/linux: Benchmark 1: ./git.old blame-tree --max-depth=0 Time (mean ± σ): 13.504 s ± 0.545 s [User: 12.605 s, System: 0.583 s] Range (min … max): 12.947 s … 14.488 s 10 runs Benchmark 2: ./git.new blame-tree --max-depth=0 Time (mean ± σ): 622.8 ms ± 39.2 ms [User: 522.1 ms, System: 95.7 ms] Range (min … max): 556.6 ms … 681.6 ms 10 runs Summary ./git.new blame-tree --max-depth=0 ran 21.68 ± 1.62 times faster than ./git.old blame-tree --max-depth=0 Note that the output may change slightly in some cases due to the history pruning. This should be acceptable, as either output is correct in these cases (e.g., scenarios like a cherry-picked commit on two parallel branches). Signed-off-by: Jeff King --- blame-tree.c | 18 ++++++++++++++ pathspec.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ pathspec.h | 3 +++ 3 files changed, 99 insertions(+) diff --git a/blame-tree.c b/blame-tree.c index ed4ec1e694..7be67520f8 100644 --- a/blame-tree.c +++ b/blame-tree.c @@ -79,6 +79,18 @@ void blame_tree_init(struct repository *r, struct blame_tree *bt, if (add_from_revs(bt) < 0) die(_("unable to setup blame-tree")); + + pathspec_setup(&bt->rev.prune_data, &bt->paths); + copy_pathspec(&bt->rev.pruning.pathspec, &bt->rev.prune_data); + copy_pathspec(&bt->rev.diffopt.pathspec, &bt->rev.prune_data); + bt->rev.prune = 1; + + /* + * Have the diff engine tell us about everything, including trees. + * We may have used --max-depth to get our list of paths to blame, + * in which case we would mention trees explicitly. + */ + bt->rev.diffopt.flags.tree_in_recursive = 1; } void blame_tree_release(struct blame_tree *bt) @@ -91,6 +103,7 @@ struct blame_tree_callback_data { struct commit *commit; struct string_list *paths; size_t num_interesting; + struct rev_info *rev; blame_tree_fn callback; void *callback_data; @@ -122,6 +135,10 @@ static void mark_path(const char *path, const struct object_id *oid, data->num_interesting--; if (data->callback) data->callback(path, data->commit, data->callback_data); + + pathspec_drop(&data->rev->pruning.pathspec, path); + clear_pathspec(&data->rev->diffopt.pathspec); + copy_pathspec(&data->rev->diffopt.pathspec, &data->rev->pruning.pathspec); } static void blame_diff(struct diff_queue_struct *q, @@ -172,6 +189,7 @@ int blame_tree_run(struct blame_tree *bt, blame_tree_fn cb, void *cbdata) data.num_interesting = bt->paths.nr; data.callback = cb; data.callback_data = cbdata; + data.rev = &bt->rev; bt->rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; bt->rev.diffopt.format_callback = blame_diff; diff --git a/pathspec.c b/pathspec.c index c174edef32..7ef2a55280 100644 --- a/pathspec.c +++ b/pathspec.c @@ -117,6 +117,50 @@ struct pathspec_trie *pathspec_trie_build(const struct pathspec *pathspec) return ret; } +static void pathspec_drop_from_trie(struct pathspec *ps, + const char *path) +{ + struct pathspec_trie *prev = NULL, *cur = ps->trie; + int pos = -1; + + if (!cur) + return; + + while (*path) { + const char *end = strchrnul(path, '/'); + size_t len = end - path; + pos = pathspec_trie_lookup(cur, path, len); + + if (pos < 0) + die("BUG: didn't find the pathspec trie we matched"); + + prev = cur; + cur = cur->entries[pos]; + path = end; + while (*path == '/') + path++; + } + + if (!cur->terminal) + die("BUG: pathspec trie we found isn't terminal?"); + + if (cur->nr) { + cur->terminal = 0; + cur->must_be_dir = 0; + return; + } + + free(cur); + if (pos < 0) + ps->trie = NULL; + else if (prev) { + prev->nr--; + memmove(prev->entries + pos, + prev->entries + pos + 1, + sizeof(*prev->entries) * (prev->nr - pos)); + } +} + static void pathspec_trie_clear(struct pathspec_trie *t) { if (t) { @@ -878,6 +922,40 @@ void copy_pathspec(struct pathspec *dst, const struct pathspec *src) dst->trie = pathspec_trie_build(dst); } +void pathspec_setup(struct pathspec *ps, const struct string_list *paths) +{ + struct strvec argv = STRVEC_INIT; + size_t i; + + for (i = 0; i < paths->nr; i++) + strvec_push(&argv, paths->items[i].string); + + clear_pathspec(ps); + parse_pathspec(ps, PATHSPEC_ALL_MAGIC & ~PATHSPEC_LITERAL, + PATHSPEC_PREFER_FULL | PATHSPEC_LITERAL_PATH, "", + argv.v); + strvec_clear(&argv); +} + +void pathspec_drop(struct pathspec *ps, const char *path) +{ + int i; + + /* We know these are literals, so we can just strcmp */ + for (i = 0; i < ps->nr; i++) + if (!strcmp(ps->items[i].match, path)) + break; + + if (i == ps->nr) + die("BUG: didn't find the pathspec we just matched"); + + memmove(ps->items + i, ps->items + i + 1, + sizeof(*ps->items) * (ps->nr - i - 1)); + ps->nr--; + + pathspec_drop_from_trie(ps, path); +} + void clear_pathspec(struct pathspec *pathspec) { int i, j; diff --git a/pathspec.h b/pathspec.h index ff11599d25..a46b3177e3 100644 --- a/pathspec.h +++ b/pathspec.h @@ -3,6 +3,7 @@ struct index_state; struct pathspec; +struct string_list; /* Pathspec magic */ #define PATHSPEC_FROMTOP (1<<0) @@ -152,6 +153,8 @@ void parse_pathspec_file(struct pathspec *pathspec, int nul_term_line); void copy_pathspec(struct pathspec *dst, const struct pathspec *src); +void pathspec_setup(struct pathspec *ps, const struct string_list *paths); +void pathspec_drop(struct pathspec *ps, const char *path); void clear_pathspec(struct pathspec *); /*