diff mbox series

[v2,15/17] mktree: optionally add to an existing tree

Message ID 4b88f84b933b1598d12e3620f0c9fb85c559e8fb.1718834285.git.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series mktree: support more flexible usage | expand

Commit Message

Victoria Dye June 19, 2024, 9:58 p.m. UTC
From: Victoria Dye <vdye@github.com>

Allow users to specify a single "tree-ish" value as a positional argument.
If provided, the contents of the given tree serve as the basis for the new
tree (or trees, in --batch mode) created by 'mktree', on top of which all of
the stdin-provided tree entries are applied.

At a high level, the entries are "applied" to a base tree by iterating
through the base tree using 'read_tree' in parallel with iterating through
the sorted & deduplicated stdin entries via their iterator. That is, for
each call to the 'build_index_from_tree callback of 'read_tree':

* If the iterator entry precedes the base tree entry, add it to the in-core
  index, increment the iterator, and repeat.
* If the iterator entry has the same name as the base tree entry, add the
  iterator entry to the index, increment the iterator, and return from the
  callback to continue the 'read_tree' iteration.
* If the iterator entry follows the base tree entry, first check
  'df_name_hash' to ensure we won't be adding an entry with the same name
  later (with a different mode). If there's no directory/file conflict, add
  the base tree entry to the index. In either case, return from the callback
  to continue the 'read_tree' iteration.

Finally, once 'read_tree' is complete, add the remaining entries in the
iterator to the index and write out the index as a tree.

Signed-off-by: Victoria Dye <vdye@github.com>
---
 Documentation/git-mktree.txt |   7 +-
 builtin/mktree.c             | 138 +++++++++++++++++++++++++++++------
 t/t1010-mktree.sh            |  36 +++++++++
 3 files changed, 159 insertions(+), 22 deletions(-)

Comments

Junio C Hamano June 26, 2024, 9:23 p.m. UTC | #1
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Victoria Dye <vdye@github.com>
>
> Allow users to specify a single "tree-ish" value as a positional argument.
> If provided, the contents of the given tree serve as the basis for the new
> tree (or trees, in --batch mode) created by 'mktree', on top of which all of
> the stdin-provided tree entries are applied.
>
> At a high level, the entries are "applied" to a base tree by iterating
> through the base tree using 'read_tree' in parallel with iterating through
> the sorted & deduplicated stdin entries via their iterator. That is, for
> each call to the 'build_index_from_tree callback of 'read_tree':
>
> * If the iterator entry precedes the base tree entry, add it to the in-core
>   index, increment the iterator, and repeat.

"add it" -> "add the base tree entry"?  The next bullet point
explicitly says it adds "the iterator entry", which makes it crystal
clear what is going on.

> * If the iterator entry has the same name as the base tree entry, add the
>   iterator entry to the index, increment the iterator, and return from the
>   callback to continue the 'read_tree' iteration.
> * If the iterator entry follows the base tree entry, first check
>   'df_name_hash' to ensure we won't be adding an entry with the same name
>   later (with a different mode). If there's no directory/file conflict, add
>   the base tree entry to the index. In either case, return from the callback
>   to continue the 'read_tree' iteration.

IOW, we take advantage of the fact that iteration over the base tree
and iteration over the sorted-and-deduped entries from the standard
input are already sorted, and do a simple bog-standard "merge" of
two lists?

We'd probably have many common pitfalls to avoid with the read-tree
walking the index and tree(s) in parallel (I still remember the pain
of maintaining the cache_bottom for the side that walks the index).
Makes me wonder if this opens a way to a future where somehow
read-tree also shares code with this new code in mktree (or vice
versa).

> Finally, once 'read_tree' is complete, add the remaining entries in the
> iterator to the index and write out the index as a tree.

Or vice versa?  We may finish iterating over the entries read from
the standard input but there still are entries from the base tree
side remaining, which would need to be added to complete the index,
right?

> +<tree-ish>::
> +	If provided, the tree entries provided in stdin are added to this
> +	tree rather than a new empty one, replacing existing entries with
> +	identical names. Not compatible with `--literally`.

"replacing" might need a bit more clarification when we start
reading paths with multiple pathname components concatenated with
slashes.  In the base tree, we may have

    100644 blob 536e55524db72bd2acf175208aef4f3dfc148d42    D

and it can (indirectly) replaced by the standard input stream
feeding entries like these

    100644 blob b0517166ae2ad92f3b17638cbdee0f04b8170d99    D/a
    100644 blob 495a54bc1397e2fd3177c2733baf4899b48d30bd    D/b


which also leads us to compute a tree entry

    040000 tree eccdce44520aa3ef4ac5ba090df53eadb01229ef    D/

in the top-level tree?

The code looks good to me.  Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-mktree.txt b/Documentation/git-mktree.txt
index cf1fd82f754..260d0e0bd7b 100644
--- a/Documentation/git-mktree.txt
+++ b/Documentation/git-mktree.txt
@@ -9,7 +9,7 @@  git-mktree - Build a tree-object from formatted tree entries
 SYNOPSIS
 --------
 [verse]
-'git mktree' [-z] [--missing] [--literally] [--batch]
+'git mktree' [-z] [--missing] [--literally] [--batch] [<tree-ish>]
 
 DESCRIPTION
 -----------
@@ -41,6 +41,11 @@  OPTIONS
 	optional.  Note - if the `-z` option is used, lines are terminated
 	with NUL.
 
+<tree-ish>::
+	If provided, the tree entries provided in stdin are added to this
+	tree rather than a new empty one, replacing existing entries with
+	identical names. Not compatible with `--literally`.
+
 INPUT FORMAT
 ------------
 Tree entries may be specified in any of the formats compatible with the
diff --git a/builtin/mktree.c b/builtin/mktree.c
index b4d71dcdd02..96f06547a2a 100644
--- a/builtin/mktree.c
+++ b/builtin/mktree.c
@@ -12,7 +12,9 @@ 
 #include "read-cache-ll.h"
 #include "strbuf.h"
 #include "tree.h"
+#include "object-name.h"
 #include "parse-options.h"
+#include "pathspec.h"
 #include "object-store-ll.h"
 
 struct tree_entry {
@@ -204,47 +206,124 @@  static void tree_entry_iterator_advance(struct tree_entry_iterator *iter)
 			: NULL;
 }
 
-static int add_tree_entry_to_index(struct index_state *istate,
+struct build_index_data {
+	struct tree_entry_iterator iter;
+	struct hashmap *df_name_hash;
+	struct index_state istate;
+};
+
+static int add_tree_entry_to_index(struct build_index_data *data,
 				   struct tree_entry *ent)
 {
 	struct cache_entry *ce;
-	struct strbuf ce_name = STRBUF_INIT;
-	strbuf_add(&ce_name, ent->name, ent->len);
-
-	ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
+	ce = make_cache_entry(&data->istate, ent->mode, &ent->oid, ent->name, 0, 0);
 	if (!ce)
 		return error(_("make_cache_entry failed for path '%s'"), ent->name);
 
-	add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
-	strbuf_release(&ce_name);
+	add_index_entry(&data->istate, ce, ADD_CACHE_JUST_APPEND);
 	return 0;
 }
 
-static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
+static int build_index_from_tree(const struct object_id *oid,
+				 struct strbuf *base, const char *filename,
+				 unsigned mode, void *context)
 {
-	struct tree_entry_iterator iter = { NULL };
-	struct index_state istate = INDEX_STATE_INIT(the_repository);
-	istate.sparse_index = 1;
+	int result;
+	struct tree_entry *base_tree_ent;
+	struct build_index_data *cbdata = context;
+	size_t filename_len = strlen(filename);
+	size_t path_len = S_ISDIR(mode) ? st_add3(filename_len, base->len, 1)
+					: st_add(filename_len, base->len);
+
+	/* Create a tree entry from the current entry in read_tree iteration */
+	base_tree_ent = xcalloc(1, st_add3(sizeof(struct tree_entry), path_len, 1));
+	base_tree_ent->len = path_len;
+	base_tree_ent->mode = mode;
+	oidcpy(&base_tree_ent->oid, oid);
+
+	memcpy(base_tree_ent->name, base->buf, base->len);
+	memcpy(base_tree_ent->name + base->len, filename, filename_len);
+	if (S_ISDIR(mode))
+		base_tree_ent->name[base_tree_ent->len - 1] = '/';
+
+	while (cbdata->iter.current) {
+		struct tree_entry *ent = cbdata->iter.current;
+
+		int cmp = name_compare(ent->name, ent->len,
+				       base_tree_ent->name, base_tree_ent->len);
+		if (!cmp || cmp < 0) {
+			tree_entry_iterator_advance(&cbdata->iter);
+
+			if (add_tree_entry_to_index(cbdata, ent) < 0) {
+				result = error(_("failed to add tree entry '%s'"), ent->name);
+				goto cleanup_and_return;
+			}
+
+			if (!cmp) {
+				result = 0;
+				goto cleanup_and_return;
+			} else
+				continue;
+		}
+
+		break;
+	}
+
+	/*
+	 * If the tree entry should be replaced with an entry with the same name
+	 * (but different mode), skip it.
+	 */
+	hashmap_entry_init(&base_tree_ent->ent,
+			   memhash(base_tree_ent->name, df_path_len(base_tree_ent->len, base_tree_ent->mode)));
+	if (hashmap_get_entry(cbdata->df_name_hash, base_tree_ent, ent, NULL)) {
+		result = 0;
+		goto cleanup_and_return;
+	}
+
+	if (add_tree_entry_to_index(cbdata, base_tree_ent)) {
+		result = -1;
+		goto cleanup_and_return;
+	}
+
+	result = 0;
+
+cleanup_and_return:
+	FREE_AND_NULL(base_tree_ent);
+	return result;
+}
+
+static void write_tree(struct tree_entry_array *arr, struct tree *base_tree,
+		       struct object_id *oid)
+{
+	struct build_index_data cbdata = { 0 };
+	struct pathspec ps = { 0 };
 
 	sort_and_dedup_tree_entry_array(arr);
 
-	tree_entry_iterator_init(&iter, arr);
+	index_state_init(&cbdata.istate, the_repository);
+	cbdata.istate.sparse_index = 1;
+	tree_entry_iterator_init(&cbdata.iter, arr);
+	cbdata.df_name_hash = &arr->df_name_hash;
 
 	/* Construct an in-memory index from the provided entries & base tree */
-	while (iter.current) {
-		struct tree_entry *ent = iter.current;
-		tree_entry_iterator_advance(&iter);
+	if (base_tree &&
+	    read_tree(the_repository, base_tree, &ps, build_index_from_tree, &cbdata) < 0)
+		die(_("failed to create tree"));
+
+	while (cbdata.iter.current) {
+		struct tree_entry *ent = cbdata.iter.current;
+		tree_entry_iterator_advance(&cbdata.iter);
 
-		if (add_tree_entry_to_index(&istate, ent))
+		if (add_tree_entry_to_index(&cbdata, ent))
 			die(_("failed to add tree entry '%s'"), ent->name);
 	}
 
 	/* Write out new tree */
-	if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
+	if (cache_tree_update(&cbdata.istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
 		die(_("failed to write tree"));
-	oidcpy(oid, &istate.cache_tree->oid);
+	oidcpy(oid, &cbdata.istate.cache_tree->oid);
 
-	release_index(&istate);
+	release_index(&cbdata.istate);
 }
 
 static void write_tree_literally(struct tree_entry_array *arr,
@@ -268,7 +347,7 @@  static void write_tree_literally(struct tree_entry_array *arr,
 }
 
 static const char *mktree_usage[] = {
-	"git mktree [-z] [--missing] [--literally] [--batch]",
+	"git mktree [-z] [--missing] [--literally] [--batch] [<tree-ish>]",
 	NULL
 };
 
@@ -334,6 +413,7 @@  int cmd_mktree(int ac, const char **av, const char *prefix)
 	struct tree_entry_array arr = { 0 };
 	struct mktree_line_data mktree_line_data = { .arr = &arr };
 	struct strbuf line = STRBUF_INIT;
+	struct tree *base_tree = NULL;
 	int ret;
 
 	const struct option option[] = {
@@ -346,6 +426,22 @@  int cmd_mktree(int ac, const char **av, const char *prefix)
 	};
 
 	ac = parse_options(ac, av, prefix, option, mktree_usage, 0);
+	if (ac > 1)
+		usage_with_options(mktree_usage, option);
+
+	if (ac) {
+		struct object_id base_tree_oid;
+
+		if (mktree_line_data.literally)
+			die(_("option '%s' and tree-ish cannot be used together"), "--literally");
+
+		if (repo_get_oid(the_repository, av[0], &base_tree_oid))
+			die(_("not a valid object name %s"), av[0]);
+
+		base_tree = parse_tree_indirect(&base_tree_oid);
+		if (!base_tree)
+			die(_("not a tree object: %s"), oid_to_hex(&base_tree_oid));
+	}
 
 	tree_entry_array_init(&arr);
 
@@ -373,7 +469,7 @@  int cmd_mktree(int ac, const char **av, const char *prefix)
 			if (mktree_line_data.literally)
 				write_tree_literally(&arr, &oid);
 			else
-				write_tree(&arr, &oid);
+				write_tree(&arr, base_tree, &oid);
 			puts(oid_to_hex(&oid));
 			fflush(stdout);
 		}
diff --git a/t/t1010-mktree.sh b/t/t1010-mktree.sh
index 08760141d6f..435ac23bd50 100755
--- a/t/t1010-mktree.sh
+++ b/t/t1010-mktree.sh
@@ -234,4 +234,40 @@  test_expect_success 'mktree with duplicate entries' '
 	test_cmp expect actual
 '
 
+test_expect_success 'mktree with base tree' '
+	tree_oid=$(cat tree) &&
+	folder_oid=$(git rev-parse ${tree_oid}:folder) &&
+	before_oid=$(git rev-parse ${tree_oid}:before) &&
+	head_oid=$(git rev-parse HEAD) &&
+
+	{
+		printf "040000 tree $folder_oid\ttest\n" &&
+		printf "100644 blob $before_oid\ttest.txt\n" &&
+		printf "040000 tree $folder_oid\ttest-\n" &&
+		printf "160000 commit $head_oid\ttest0\n"
+	} >top.base &&
+	git mktree <top.base >tree.base &&
+
+	{
+		printf "100755 blob $before_oid\tz\n" &&
+		printf "160000 commit $head_oid\ttest.xyz\n" &&
+		printf "040000 tree $folder_oid\ta\n" &&
+		printf "100644 blob $before_oid\ttest\n"
+	} >top.append &&
+	git mktree $(cat tree.base) <top.append >tree.actual &&
+
+	{
+		printf "040000 tree $folder_oid\ta\n" &&
+		printf "100644 blob $before_oid\ttest\n" &&
+		printf "040000 tree $folder_oid\ttest-\n" &&
+		printf "100644 blob $before_oid\ttest.txt\n" &&
+		printf "160000 commit $head_oid\ttest.xyz\n" &&
+		printf "160000 commit $head_oid\ttest0\n" &&
+		printf "100755 blob $before_oid\tz\n"
+	} >expect &&
+	git ls-tree $(cat tree.actual) >actual &&
+
+	test_cmp expect actual
+'
+
 test_done