Message ID | b9b97e0112939d1787ff1d2a13c48e5b406408db.1623069252.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse-index: integrate with status | expand |
On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> ... > + old_entry = make_transient_cache_entry( > + entry[0].mode, &entry[0].oid, > + old_path, /* stage */ 0); I didn't realize this before I started integrating with v2.32.0 (which I should have done before submitting v5) that make_transient_cache_entry() has changed its prototype to include a memory pool parameter. I'm working on a v6 that makes only this update and it will probably be ready tomorrow. Thanks, -Stolee
Derrick Stolee <stolee@gmail.com> writes: > On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote: >> From: Derrick Stolee <dstolee@microsoft.com> > ... >> + old_entry = make_transient_cache_entry( >> + entry[0].mode, &entry[0].oid, >> + old_path, /* stage */ 0); > > I didn't realize this before I started integrating with > v2.32.0 (which I should have done before submitting v5) that > make_transient_cache_entry() has changed its prototype to > include a memory pool parameter. Sorry for the trouble---these are usually all known to me for topics I happened to have picked up in 'seen', since I try to make it a rule that 'seen' must be a descendant of 'master'. How can I usefully communicate the conflicts I find out during the integration cycles to topic owners, I wonder. Thanks.
On 6/7/2021 9:05 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> On 6/7/2021 8:34 AM, Derrick Stolee via GitGitGadget wrote: >>> From: Derrick Stolee <dstolee@microsoft.com> >> ... >>> + old_entry = make_transient_cache_entry( >>> + entry[0].mode, &entry[0].oid, >>> + old_path, /* stage */ 0); >> >> I didn't realize this before I started integrating with >> v2.32.0 (which I should have done before submitting v5) that >> make_transient_cache_entry() has changed its prototype to >> include a memory pool parameter. > > Sorry for the trouble---these are usually all known to me for topics > I happened to have picked up in 'seen', since I try to make it a rule > that 'seen' must be a descendant of 'master'. > > How can I usefully communicate the conflicts I find out during the > integration cycles to topic owners, I wonder. This is my fault for stacking topics. I used a GitGitGadget PR to target a custom merge of other topics in flight, so my merges were testing against a static target. When those topics were merged, I should have updated my PR to point to 'master' or even 'next'. Thanks, -Stolee
On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Derrick Stolee <dstolee@microsoft.com> > > While comparing an index to a tree, we may see a sparse directory entry. > In this case, we should compare that portion of the tree to the tree > represented by that entry. This could include a new tree which needs to > be expanded to a full list of added files. It could also include an > existing tree, in which case all of the changes inside are important to > describe, including the modifications, additions, and deletions. Note > that the case where the tree has a path and the index does not remains > identical to before: the lack of a cache entry is the same with a sparse > index. > > In the case where a tree is modified, we need to expand the tree > recursively, and start comparing each contained entry as either an > addition, deletion, or modification. This causes an interesting > recursion that did not exist before. So, I haven't read through this in detail yet...but there's a big question I'm curious about: Git already has code for comparing an index to a tree, a tree to a tree, or a tree to the working directory, right? So, when comparing a sparse-index to a tree...can't we re-use the compare a tree to a tree code when we hit a sparse directory? Maybe there's a really good reason to conceptually duplicate the compare a tree to a tree code, but it seems the commit message should at least address that reason and why we need to reimplement that logic. > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > diff-lib.c | 188 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 188 insertions(+) > > diff --git a/diff-lib.c b/diff-lib.c > index b73cc1859a49..ba4c683d4bc4 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -314,6 +314,48 @@ static int get_stat_data(const struct cache_entry *ce, > return 0; > } > > +struct show_new_tree_context { > + struct rev_info *revs; > + unsigned added:1; > +}; > + > +static int show_new_file_from_tree(const struct object_id *oid, > + struct strbuf *base, const char *path, > + unsigned int mode, void *context) > +{ > + struct show_new_tree_context *ctx = context; > + struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0); > + > + diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0); > + discard_cache_entry(new_file); > + return 0; > +} > + > +static void show_directory(struct rev_info *revs, > + const struct cache_entry *new_dir, > + int added) > +{ > + /* > + * new_dir is a sparse directory entry, so we want to collect all > + * of the new files within the tree. This requires recursively > + * expanding the trees. > + */ > + struct show_new_tree_context ctx = { revs, added }; > + struct repository *r = revs->repo; > + struct strbuf base = STRBUF_INIT; > + struct pathspec ps; > + struct tree *tree = lookup_tree(r, &new_dir->oid); > + > + memset(&ps, 0, sizeof(ps)); > + ps.recursive = 1; > + ps.has_wildcard = 1; > + ps.max_depth = -1; > + > + strbuf_add(&base, new_dir->name, new_dir->ce_namelen); > + read_tree_at(r, tree, &base, &ps, > + show_new_file_from_tree, &ctx); > +} > + > static void show_new_file(struct rev_info *revs, > const struct cache_entry *new_file, > int cached, int match_missing) > @@ -322,6 +364,11 @@ static void show_new_file(struct rev_info *revs, > unsigned int mode; > unsigned dirty_submodule = 0; > > + if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) { > + show_directory(revs, new_file, /*added */ 1); > + return; > + } > + > /* > * New file in the index: it might actually be different in > * the working tree. > @@ -333,6 +380,136 @@ static void show_new_file(struct rev_info *revs, > diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule); > } > > +static int show_modified(struct rev_info *revs, > + const struct cache_entry *old_entry, > + const struct cache_entry *new_entry, > + int report_missing, > + int cached, int match_missing); > + > +static int compare_within_sparse_dir(int n, unsigned long mask, > + unsigned long dirmask, struct name_entry *entry, > + struct traverse_info *info) > +{ > + struct rev_info *revs = info->data; > + struct object_id *oid0 = &entry[0].oid; > + struct object_id *oid1 = &entry[1].oid; > + > + if (oideq(oid0, oid1)) > + return mask; > + > + /* Directory/file conflicts are handled earlier. */ > + if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) { > + struct tree_desc t[2]; > + void *buf[2]; > + struct traverse_info info_r = { NULL, }; > + > + info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path); > + info_r.namelen = strlen(info_r.name); > + info_r.traverse_path = xstrfmt("%s/", info_r.name); > + info_r.fn = compare_within_sparse_dir; > + info_r.prev = info; > + info_r.mode = entry[0].mode; > + info_r.pathlen = entry[0].pathlen; > + info_r.df_conflicts = 0; > + info_r.data = revs; > + > + buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0); > + buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1); > + > + traverse_trees(NULL, 2, t, &info_r); > + > + free((char *)info_r.name); > + free((char *)info_r.traverse_path); > + free(buf[0]); > + free(buf[1]); > + } else { > + char *old_path = NULL, *new_path = NULL; > + struct cache_entry *old_entry = NULL, *new_entry = NULL; > + > + if (entry[0].path) { > + old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path); > + old_entry = make_transient_cache_entry( > + entry[0].mode, &entry[0].oid, > + old_path, /* stage */ 0); > + old_entry->ce_flags |= CE_SKIP_WORKTREE; > + } > + if (entry[1].path) { > + new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path); > + new_entry = make_transient_cache_entry( > + entry[1].mode, &entry[1].oid, > + new_path, /* stage */ 0); > + new_entry->ce_flags |= CE_SKIP_WORKTREE; > + } > + > + if (entry[0].path && entry[1].path) > + show_modified(revs, old_entry, new_entry, 0, 1, 0); > + else if (entry[0].path) > + diff_index_show_file(revs, revs->prefix, > + old_entry, &entry[0].oid, > + 0, entry[0].mode, 0); > + else if (entry[1].path) > + show_new_file(revs, new_entry, 1, 0); > + > + discard_cache_entry(old_entry); > + discard_cache_entry(new_entry); > + free(old_path); > + free(new_path); > + } > + > + return mask; > +} > + > +static void show_modified_sparse_directory(struct rev_info *revs, > + const struct cache_entry *old_entry, > + const struct cache_entry *new_entry, > + int report_missing, > + int cached, int match_missing) > +{ > + struct tree_desc t[2]; > + void *buf[2]; > + struct traverse_info info = { NULL }; > + struct strbuf name = STRBUF_INIT; > + struct strbuf parent_path = STRBUF_INIT; > + char *last_dir_sep; > + > + if (oideq(&old_entry->oid, &new_entry->oid)) > + return; > + > + info.fn = compare_within_sparse_dir; > + info.prev = &info; > + > + strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1); > + info.name = name.buf; > + info.namelen = name.len; > + > + strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1); > + if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf) > + strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1); > + else > + strbuf_setlen(&parent_path, 0); > + > + info.pathlen = parent_path.len; > + > + if (parent_path.len) > + info.traverse_path = parent_path.buf; > + else > + info.traverse_path = ""; > + > + info.mode = new_entry->ce_mode; > + info.df_conflicts = 0; > + info.data = revs; > + > + buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid); > + buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid); > + > + traverse_trees(NULL, 2, t, &info); > + > + free(buf[0]); > + free(buf[1]); > + strbuf_release(&name); > + strbuf_release(&parent_path); > +} > + > static int show_modified(struct rev_info *revs, > const struct cache_entry *old_entry, > const struct cache_entry *new_entry, > @@ -343,6 +520,17 @@ static int show_modified(struct rev_info *revs, > const struct object_id *oid; > unsigned dirty_submodule = 0; > > + /* > + * If both are sparse directory entries, then expand the > + * modifications to the file level. > + */ > + if (old_entry && new_entry && > + S_ISSPARSEDIR(old_entry->ce_mode) && > + S_ISSPARSEDIR(new_entry->ce_mode)) { > + show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing); > + return 0; > + } > + > if (get_stat_data(new_entry, &oid, &mode, cached, match_missing, > &dirty_submodule, &revs->diffopt) < 0) { > if (report_missing) > -- > gitgitgadget >
Elijah Newren <newren@gmail.com> writes: > On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> >> From: Derrick Stolee <dstolee@microsoft.com> >> >> While comparing an index to a tree, we may see a sparse directory entry. >> In this case, we should compare that portion of the tree to the tree >> represented by that entry. This could include a new tree which needs to >> be expanded to a full list of added files. It could also include an >> existing tree, in which case all of the changes inside are important to >> describe, including the modifications, additions, and deletions. Note >> that the case where the tree has a path and the index does not remains >> identical to before: the lack of a cache entry is the same with a sparse >> index. >> >> In the case where a tree is modified, we need to expand the tree >> recursively, and start comparing each contained entry as either an >> addition, deletion, or modification. This causes an interesting >> recursion that did not exist before. > > So, I haven't read through this in detail yet...but there's a big > question I'm curious about: > > Git already has code for comparing an index to a tree, a tree to a > tree, or a tree to the working directory, right? So, when comparing a > sparse-index to a tree...can't we re-use the compare a tree to a tree > code when we hit a sparse directory? Offhand I do not think of a reason why that cannot work. The tree-diff machinery takes two trees, walks them in parallel and repeatedly calls either diff_addremove() or diff_change(), which appends diff_filepair() to the diff_queue[] structure. If you see an unexpanded tree on the index side, you should be able to pass that tree with the subtree you are comparing against to the tree-diff machinery to come up with a series of filepairs, and then tweak the pathnames of these filepairs (as such a two-tree comparison would be comparing two trees representing a single subdirectory of two different vintages) before adding them to the diff_queue[] you are collecting the index-vs-tree diff, for example. But if a part of the index is represented as a tree because it is outside the cone of interest, should we even be showing the difference in that part of the tree? If t/ directory is outside the cone of interest, should "git diff HEAD~100 HEAD t/" show anything to begin with (the same question for "git diff --cached HEAD t/")?
On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > >> > >> From: Derrick Stolee <dstolee@microsoft.com> > >> > >> While comparing an index to a tree, we may see a sparse directory entry. > >> In this case, we should compare that portion of the tree to the tree > >> represented by that entry. This could include a new tree which needs to > >> be expanded to a full list of added files. It could also include an > >> existing tree, in which case all of the changes inside are important to > >> describe, including the modifications, additions, and deletions. Note > >> that the case where the tree has a path and the index does not remains > >> identical to before: the lack of a cache entry is the same with a sparse > >> index. > >> > >> In the case where a tree is modified, we need to expand the tree > >> recursively, and start comparing each contained entry as either an > >> addition, deletion, or modification. This causes an interesting > >> recursion that did not exist before. > > > > So, I haven't read through this in detail yet...but there's a big > > question I'm curious about: > > > > Git already has code for comparing an index to a tree, a tree to a > > tree, or a tree to the working directory, right? So, when comparing a > > sparse-index to a tree...can't we re-use the compare a tree to a tree > > code when we hit a sparse directory? > > Offhand I do not think of a reason why that cannot work. > > The tree-diff machinery takes two trees, walks them in parallel and > repeatedly calls either diff_addremove() or diff_change(), which > appends diff_filepair() to the diff_queue[] structure. If you see > an unexpanded tree on the index side, you should be able to pass > that tree with the subtree you are comparing against to the tree-diff > machinery to come up with a series of filepairs, and then tweak the > pathnames of these filepairs (as such a two-tree comparison would be > comparing two trees representing a single subdirectory of two different > vintages) before adding them to the diff_queue[] you are collecting > the index-vs-tree diff, for example. Good to know it seems my idea might be reasonable. > But if a part of the index is represented as a tree because it is > outside the cone of interest, should we even be showing the > difference in that part of the tree? If t/ directory is outside the > cone of interest, should "git diff HEAD~100 HEAD t/" show anything > to begin with (the same question for "git diff --cached HEAD t/")? Excellent question...and not just for diff, but log, grep with revisions, and other commands. We discussed this a while back[1] and we seemed to lean towards eventually adding a flag because there are usecases both for (1) viewing full history while having sparsity paths restrict just the working copy, and (2) also restricting the view of history to the sparsity paths. [1] It's been discussed a few times, but there's a relatively comprehensive discussion at the "Commands that would change for behavior A" section from https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
On 6/9/2021 4:11 AM, Elijah Newren wrote: > On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Elijah Newren <newren@gmail.com> writes: >> >>> On Mon, Jun 7, 2021 at 5:34 AM Derrick Stolee via GitGitGadget >>> <gitgitgadget@gmail.com> wrote: >>>> >>>> From: Derrick Stolee <dstolee@microsoft.com> >>>> >>>> While comparing an index to a tree, we may see a sparse directory entry. >>>> In this case, we should compare that portion of the tree to the tree >>>> represented by that entry. This could include a new tree which needs to >>>> be expanded to a full list of added files. It could also include an >>>> existing tree, in which case all of the changes inside are important to >>>> describe, including the modifications, additions, and deletions. Note >>>> that the case where the tree has a path and the index does not remains >>>> identical to before: the lack of a cache entry is the same with a sparse >>>> index. >>>> >>>> In the case where a tree is modified, we need to expand the tree >>>> recursively, and start comparing each contained entry as either an >>>> addition, deletion, or modification. This causes an interesting >>>> recursion that did not exist before. >>> >>> So, I haven't read through this in detail yet...but there's a big >>> question I'm curious about: >>> >>> Git already has code for comparing an index to a tree, a tree to a >>> tree, or a tree to the working directory, right? So, when comparing a >>> sparse-index to a tree...can't we re-use the compare a tree to a tree >>> code when we hit a sparse directory? >> >> Offhand I do not think of a reason why that cannot work. >> >> The tree-diff machinery takes two trees, walks them in parallel and >> repeatedly calls either diff_addremove() or diff_change(), which >> appends diff_filepair() to the diff_queue[] structure. If you see >> an unexpanded tree on the index side, you should be able to pass >> that tree with the subtree you are comparing against to the tree-diff >> machinery to come up with a series of filepairs, and then tweak the >> pathnames of these filepairs (as such a two-tree comparison would be >> comparing two trees representing a single subdirectory of two different >> vintages) before adding them to the diff_queue[] you are collecting >> the index-vs-tree diff, for example. > > Good to know it seems my idea might be reasonable. I agree that this is reasonable. I just didn't look hard enough to find existing code for this, since I found traverse_trees and thought that _was_ the library for this. >> But if a part of the index is represented as a tree because it is >> outside the cone of interest, should we even be showing the >> difference in that part of the tree? If t/ directory is outside the >> cone of interest, should "git diff HEAD~100 HEAD t/" show anything >> to begin with (the same question for "git diff --cached HEAD t/")? > > Excellent question...and not just for diff, but log, grep with > revisions, and other commands. We discussed this a while back[1] and > we seemed to lean towards eventually adding a flag because there are > usecases both for (1) viewing full history while having sparsity paths > restrict just the working copy, and (2) also restricting the view of > history to the sparsity paths. > > [1] It's been discussed a few times, but there's a relatively > comprehensive discussion at the "Commands that would change for > behavior A" section from > https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ Yes, we could investigate this behavior change in the future. The good thing is that these points that handle sparse directory entries create clear branching points for that future behavior change. Thanks, -Stolee
On 6/9/2021 4:33 PM, Derrick Stolee wrote: > On 6/9/2021 4:11 AM, Elijah Newren wrote: >> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote: >>> >>> Elijah Newren <newren@gmail.com> writes: >>> >>> The tree-diff machinery takes two trees, walks them in parallel and >>> repeatedly calls either diff_addremove() or diff_change(), which >>> appends diff_filepair() to the diff_queue[] structure. If you see >>> an unexpanded tree on the index side, you should be able to pass >>> that tree with the subtree you are comparing against to the tree-diff >>> machinery to come up with a series of filepairs, and then tweak the >>> pathnames of these filepairs (as such a two-tree comparison would be >>> comparing two trees representing a single subdirectory of two different >>> vintages) before adding them to the diff_queue[] you are collecting >>> the index-vs-tree diff, for example. >> >> Good to know it seems my idea might be reasonable. > > I agree that this is reasonable. I just didn't look hard enough > to find existing code for this, since I found traverse_trees and > thought that _was_ the library for this. This was surprisingly simple, since most of the complicated stuff is built into diff_tree_oid() and its use of revs->diffopt. The new patch works as shown below the cut-line. I was incredibly suspicious of how quickly this came together, but it passes all the tests I have for it (including Scalar functional tests with the commit, checkout, and add integrations). I'll send a new version with this patch tomorrow, as well as the other recommended edits. Thanks, -Stolee --- >8 --- diff --git a/diff-lib.c b/diff-lib.c index c2ac9250fe9..b631df89343 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -316,6 +316,13 @@ static int get_stat_data(const struct index_state *istate, return 0; } +static void show_directory(struct rev_info *revs, + const struct cache_entry *new_dir, + int added) +{ + diff_tree_oid(NULL, &new_dir->oid, new_dir->name, &revs->diffopt); +} + static void show_new_file(struct rev_info *revs, const struct cache_entry *new_file, int cached, int match_missing) @@ -325,6 +332,11 @@ static void show_new_file(struct rev_info *revs, unsigned dirty_submodule = 0; struct index_state *istate = revs->diffopt.repo->index; + if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) { + show_directory(revs, new_file, /*added */ 1); + return; + } + /* * New file in the index: it might actually be different in * the working tree. @@ -336,6 +348,15 @@ static void show_new_file(struct rev_info *revs, diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule); } +static void show_modified_sparse_directory(struct rev_info *revs, + const struct cache_entry *old_entry, + const struct cache_entry *new_entry, + int report_missing, + int cached, int match_missing) +{ + diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt); +} + static int show_modified(struct rev_info *revs, const struct cache_entry *old_entry, const struct cache_entry *new_entry, @@ -347,6 +368,17 @@ static int show_modified(struct rev_info *revs, unsigned dirty_submodule = 0; struct index_state *istate = revs->diffopt.repo->index; + /* + * If both are sparse directory entries, then expand the + * modifications to the file level. + */ + if (old_entry && new_entry && + S_ISSPARSEDIR(old_entry->ce_mode) && + S_ISSPARSEDIR(new_entry->ce_mode)) { + show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing); + return 0; + } + if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing, &dirty_submodule, &revs->diffopt) < 0) { if (report_missing)
On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote: > > On 6/9/2021 4:33 PM, Derrick Stolee wrote: > > On 6/9/2021 4:11 AM, Elijah Newren wrote: > >> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote: > >>> > >>> Elijah Newren <newren@gmail.com> writes: > >>> > >>> The tree-diff machinery takes two trees, walks them in parallel and > >>> repeatedly calls either diff_addremove() or diff_change(), which > >>> appends diff_filepair() to the diff_queue[] structure. If you see > >>> an unexpanded tree on the index side, you should be able to pass > >>> that tree with the subtree you are comparing against to the tree-diff > >>> machinery to come up with a series of filepairs, and then tweak the > >>> pathnames of these filepairs (as such a two-tree comparison would be > >>> comparing two trees representing a single subdirectory of two different > >>> vintages) before adding them to the diff_queue[] you are collecting > >>> the index-vs-tree diff, for example. > >> > >> Good to know it seems my idea might be reasonable. > > > > I agree that this is reasonable. I just didn't look hard enough > > to find existing code for this, since I found traverse_trees and > > thought that _was_ the library for this. > > This was surprisingly simple, since most of the complicated stuff > is built into diff_tree_oid() and its use of revs->diffopt. The > new patch works as shown below the cut-line. > > I was incredibly suspicious of how quickly this came together, > but it passes all the tests I have for it (including Scalar > functional tests with the commit, checkout, and add integrations). Nice! > I'll send a new version with this patch tomorrow, as well as the > other recommended edits. > > Thanks, > -Stolee > > --- >8 --- > > > diff --git a/diff-lib.c b/diff-lib.c > index c2ac9250fe9..b631df89343 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -316,6 +316,13 @@ static int get_stat_data(const struct index_state *istate, > return 0; > } > > +static void show_directory(struct rev_info *revs, > + const struct cache_entry *new_dir, > + int added) > +{ > + diff_tree_oid(NULL, &new_dir->oid, new_dir->name, &revs->diffopt); > +} > + > static void show_new_file(struct rev_info *revs, > const struct cache_entry *new_file, > int cached, int match_missing) > @@ -325,6 +332,11 @@ static void show_new_file(struct rev_info *revs, > unsigned dirty_submodule = 0; > struct index_state *istate = revs->diffopt.repo->index; > > + if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) { > + show_directory(revs, new_file, /*added */ 1); > + return; > + } > + > /* > * New file in the index: it might actually be different in > * the working tree. > @@ -336,6 +348,15 @@ static void show_new_file(struct rev_info *revs, > diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule); > } > > +static void show_modified_sparse_directory(struct rev_info *revs, > + const struct cache_entry *old_entry, > + const struct cache_entry *new_entry, > + int report_missing, > + int cached, int match_missing) > +{ > + diff_tree_oid(&old_entry->oid, &new_entry->oid, new_entry->name, &revs->diffopt); > +} > + > static int show_modified(struct rev_info *revs, > const struct cache_entry *old_entry, > const struct cache_entry *new_entry, > @@ -347,6 +368,17 @@ static int show_modified(struct rev_info *revs, > unsigned dirty_submodule = 0; > struct index_state *istate = revs->diffopt.repo->index; > > + /* > + * If both are sparse directory entries, then expand the > + * modifications to the file level. > + */ > + if (old_entry && new_entry && > + S_ISSPARSEDIR(old_entry->ce_mode) && > + S_ISSPARSEDIR(new_entry->ce_mode)) { > + show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing); > + return 0; > + } What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ? > + > if (get_stat_data(istate, new_entry, &oid, &mode, cached, match_missing, > &dirty_submodule, &revs->diffopt) < 0) { > if (report_missing)
On 6/10/2021 5:31 PM, Elijah Newren wrote: > On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 6/9/2021 4:33 PM, Derrick Stolee wrote: >>> On 6/9/2021 4:11 AM, Elijah Newren wrote: >>>> On Tue, Jun 8, 2021 at 11:32 PM Junio C Hamano <gitster@pobox.com> wrote: >>>>> >>>>> Elijah Newren <newren@gmail.com> writes: >>>>> >>>>> The tree-diff machinery takes two trees, walks them in parallel and >>>>> repeatedly calls either diff_addremove() or diff_change(), which >>>>> appends diff_filepair() to the diff_queue[] structure. If you see >>>>> an unexpanded tree on the index side, you should be able to pass >>>>> that tree with the subtree you are comparing against to the tree-diff >>>>> machinery to come up with a series of filepairs, and then tweak the >>>>> pathnames of these filepairs (as such a two-tree comparison would be >>>>> comparing two trees representing a single subdirectory of two different >>>>> vintages) before adding them to the diff_queue[] you are collecting >>>>> the index-vs-tree diff, for example. >>>> >>>> Good to know it seems my idea might be reasonable. >>> >>> I agree that this is reasonable. I just didn't look hard enough >>> to find existing code for this, since I found traverse_trees and >>> thought that _was_ the library for this. >> >> This was surprisingly simple, since most of the complicated stuff >> is built into diff_tree_oid() and its use of revs->diffopt. The >> new patch works as shown below the cut-line. >> >> I was incredibly suspicious of how quickly this came together, >> but it passes all the tests I have for it (including Scalar >> functional tests with the commit, checkout, and add integrations). > > Nice! > >> I'll send a new version with this patch tomorrow, as well as the >> other recommended edits. ...still planning on this today, but... >> + /* >> + * If both are sparse directory entries, then expand the >> + * modifications to the file level. >> + */ >> + if (old_entry && new_entry && >> + S_ISSPARSEDIR(old_entry->ce_mode) && >> + S_ISSPARSEDIR(new_entry->ce_mode)) { >> + show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing); >> + return 0; >> + } > > What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ? You make a good point that something different would happen in the case of a directory/file conflict on the sparse checkout boundary. This can be as simple as the trivial "only files at root" cone-mode sparse-checkout definition, with "folder/" (tree) changing to "folder" (blob). I'll see what I can do to create a test scenario for this and add the correct cases. Thanks, -Stolee
On 6/11/2021 8:57 AM, Derrick Stolee wrote: > On 6/10/2021 5:31 PM, Elijah Newren wrote: >> On Thu, Jun 10, 2021 at 10:45 AM Derrick Stolee <stolee@gmail.com> wrote: >>> >>> I'll send a new version with this patch tomorrow, as well as the >>> other recommended edits. > > ...still planning on this today, but... So optimistic! >>> + /* >>> + * If both are sparse directory entries, then expand the >>> + * modifications to the file level. >>> + */ >>> + if (old_entry && new_entry && >>> + S_ISSPARSEDIR(old_entry->ce_mode) && >>> + S_ISSPARSEDIR(new_entry->ce_mode)) { >>> + show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing); >>> + return 0; >>> + } >> >> What if S_ISSPARSEDIR(old_entry->ce_mode) != S_ISSPARSEDIR(new_entry->ce_mode) ? > > You make a good point that something different would happen > in the case of a directory/file conflict on the sparse checkout > boundary. This can be as simple as the trivial "only files at > root" cone-mode sparse-checkout definition, with "folder/" (tree) > changing to "folder" (blob). > > I'll see what I can do to create a test scenario for > this and add the correct cases. Creating a directory/file conflict in this way exposes a bug in a different codepath in unpack_trees(), although it isn't visible until 'git checkout' allows the index to stay sparse. It's due to the code in unpack_callback() that handles blobs and trees differently, and hence the blob/tree conflict isn't handled appropriately there. The changes from Patch 8 are to blame for these first errors. At least, those are the first errors I have discovered with these conflicts. There might be other scenarios that care about this section of diff-lib.c, but I have not gotten to a point where such behavior would be exposed. I don't expect to succeed in squashing this bug today, so I'll try again next week. Thanks, -Stolee
diff --git a/diff-lib.c b/diff-lib.c index b73cc1859a49..ba4c683d4bc4 100644 --- a/diff-lib.c +++ b/diff-lib.c @@ -314,6 +314,48 @@ static int get_stat_data(const struct cache_entry *ce, return 0; } +struct show_new_tree_context { + struct rev_info *revs; + unsigned added:1; +}; + +static int show_new_file_from_tree(const struct object_id *oid, + struct strbuf *base, const char *path, + unsigned int mode, void *context) +{ + struct show_new_tree_context *ctx = context; + struct cache_entry *new_file = make_transient_cache_entry(mode, oid, path, /* stage */ 0); + + diff_index_show_file(ctx->revs, ctx->added ? "+" : "-", new_file, oid, !is_null_oid(oid), mode, 0); + discard_cache_entry(new_file); + return 0; +} + +static void show_directory(struct rev_info *revs, + const struct cache_entry *new_dir, + int added) +{ + /* + * new_dir is a sparse directory entry, so we want to collect all + * of the new files within the tree. This requires recursively + * expanding the trees. + */ + struct show_new_tree_context ctx = { revs, added }; + struct repository *r = revs->repo; + struct strbuf base = STRBUF_INIT; + struct pathspec ps; + struct tree *tree = lookup_tree(r, &new_dir->oid); + + memset(&ps, 0, sizeof(ps)); + ps.recursive = 1; + ps.has_wildcard = 1; + ps.max_depth = -1; + + strbuf_add(&base, new_dir->name, new_dir->ce_namelen); + read_tree_at(r, tree, &base, &ps, + show_new_file_from_tree, &ctx); +} + static void show_new_file(struct rev_info *revs, const struct cache_entry *new_file, int cached, int match_missing) @@ -322,6 +364,11 @@ static void show_new_file(struct rev_info *revs, unsigned int mode; unsigned dirty_submodule = 0; + if (new_file && S_ISSPARSEDIR(new_file->ce_mode)) { + show_directory(revs, new_file, /*added */ 1); + return; + } + /* * New file in the index: it might actually be different in * the working tree. @@ -333,6 +380,136 @@ static void show_new_file(struct rev_info *revs, diff_index_show_file(revs, "+", new_file, oid, !is_null_oid(oid), mode, dirty_submodule); } +static int show_modified(struct rev_info *revs, + const struct cache_entry *old_entry, + const struct cache_entry *new_entry, + int report_missing, + int cached, int match_missing); + +static int compare_within_sparse_dir(int n, unsigned long mask, + unsigned long dirmask, struct name_entry *entry, + struct traverse_info *info) +{ + struct rev_info *revs = info->data; + struct object_id *oid0 = &entry[0].oid; + struct object_id *oid1 = &entry[1].oid; + + if (oideq(oid0, oid1)) + return mask; + + /* Directory/file conflicts are handled earlier. */ + if (S_ISDIR(entry[0].mode) && S_ISDIR(entry[1].mode)) { + struct tree_desc t[2]; + void *buf[2]; + struct traverse_info info_r = { NULL, }; + + info_r.name = xstrfmt("%s%s", info->traverse_path, entry[0].path); + info_r.namelen = strlen(info_r.name); + info_r.traverse_path = xstrfmt("%s/", info_r.name); + info_r.fn = compare_within_sparse_dir; + info_r.prev = info; + info_r.mode = entry[0].mode; + info_r.pathlen = entry[0].pathlen; + info_r.df_conflicts = 0; + info_r.data = revs; + + buf[0] = fill_tree_descriptor(revs->repo, &t[0], oid0); + buf[1] = fill_tree_descriptor(revs->repo, &t[1], oid1); + + traverse_trees(NULL, 2, t, &info_r); + + free((char *)info_r.name); + free((char *)info_r.traverse_path); + free(buf[0]); + free(buf[1]); + } else { + char *old_path = NULL, *new_path = NULL; + struct cache_entry *old_entry = NULL, *new_entry = NULL; + + if (entry[0].path) { + old_path = xstrfmt("%s%s", info->traverse_path, entry[0].path); + old_entry = make_transient_cache_entry( + entry[0].mode, &entry[0].oid, + old_path, /* stage */ 0); + old_entry->ce_flags |= CE_SKIP_WORKTREE; + } + if (entry[1].path) { + new_path = xstrfmt("%s%s", info->traverse_path, entry[1].path); + new_entry = make_transient_cache_entry( + entry[1].mode, &entry[1].oid, + new_path, /* stage */ 0); + new_entry->ce_flags |= CE_SKIP_WORKTREE; + } + + if (entry[0].path && entry[1].path) + show_modified(revs, old_entry, new_entry, 0, 1, 0); + else if (entry[0].path) + diff_index_show_file(revs, revs->prefix, + old_entry, &entry[0].oid, + 0, entry[0].mode, 0); + else if (entry[1].path) + show_new_file(revs, new_entry, 1, 0); + + discard_cache_entry(old_entry); + discard_cache_entry(new_entry); + free(old_path); + free(new_path); + } + + return mask; +} + +static void show_modified_sparse_directory(struct rev_info *revs, + const struct cache_entry *old_entry, + const struct cache_entry *new_entry, + int report_missing, + int cached, int match_missing) +{ + struct tree_desc t[2]; + void *buf[2]; + struct traverse_info info = { NULL }; + struct strbuf name = STRBUF_INIT; + struct strbuf parent_path = STRBUF_INIT; + char *last_dir_sep; + + if (oideq(&old_entry->oid, &new_entry->oid)) + return; + + info.fn = compare_within_sparse_dir; + info.prev = &info; + + strbuf_add(&name, new_entry->name, new_entry->ce_namelen - 1); + info.name = name.buf; + info.namelen = name.len; + + strbuf_add(&parent_path, new_entry->name, new_entry->ce_namelen - 1); + if ((last_dir_sep = find_last_dir_sep(parent_path.buf)) > parent_path.buf) + strbuf_setlen(&parent_path, (last_dir_sep - parent_path.buf) - 1); + else + strbuf_setlen(&parent_path, 0); + + info.pathlen = parent_path.len; + + if (parent_path.len) + info.traverse_path = parent_path.buf; + else + info.traverse_path = ""; + + info.mode = new_entry->ce_mode; + info.df_conflicts = 0; + info.data = revs; + + buf[0] = fill_tree_descriptor(revs->repo, &t[0], &old_entry->oid); + buf[1] = fill_tree_descriptor(revs->repo, &t[1], &new_entry->oid); + + traverse_trees(NULL, 2, t, &info); + + free(buf[0]); + free(buf[1]); + strbuf_release(&name); + strbuf_release(&parent_path); +} + static int show_modified(struct rev_info *revs, const struct cache_entry *old_entry, const struct cache_entry *new_entry, @@ -343,6 +520,17 @@ static int show_modified(struct rev_info *revs, const struct object_id *oid; unsigned dirty_submodule = 0; + /* + * If both are sparse directory entries, then expand the + * modifications to the file level. + */ + if (old_entry && new_entry && + S_ISSPARSEDIR(old_entry->ce_mode) && + S_ISSPARSEDIR(new_entry->ce_mode)) { + show_modified_sparse_directory(revs, old_entry, new_entry, report_missing, cached, match_missing); + return 0; + } + if (get_stat_data(new_entry, &oid, &mode, cached, match_missing, &dirty_submodule, &revs->diffopt) < 0) { if (report_missing)