diff mbox series

[v4,1/6] revision: add mark_tree_uninteresting_sparse

Message ID 817e30a287e12ce8e94ce41fcb969dd8ae53b9ce.1544822533.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Add a new "sparse" tree walk algorithm | expand

Commit Message

Linus Arver via GitGitGadget Dec. 14, 2018, 9:22 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

In preparation for a new algorithm that walks fewer trees when
creating a pack from a set of revisions, create a method that
takes an oidset of tree oids and marks reachable objects as
UNINTERESTING.

The current implementation uses the existing
mark_tree_uninteresting to recursively walk the trees and blobs.
This will walk the same number of trees as the old mechanism.

There is one new assumption in this approach: we are also given
the oids of the interesting trees. This implementation does not
use those trees at the moment, but we will use them in a later
rewrite of this method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 revision.c | 25 +++++++++++++++++++++++++
 revision.h |  2 ++
 2 files changed, 27 insertions(+)

Comments

Junio C Hamano Jan. 11, 2019, 7:43 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> In preparation for a new algorithm that walks fewer trees when
> creating a pack from a set of revisions, create a method that
> takes an oidset of tree oids and marks reachable objects as
> UNINTERESTING.
>
> The current implementation uses the existing
> mark_tree_uninteresting to recursively walk the trees and blobs.
> This will walk the same number of trees as the old mechanism.
>
> There is one new assumption in this approach: we are also given
> the oids of the interesting trees. This implementation does not
> use those trees at the moment, but we will use them in a later
> rewrite of this method.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  revision.c | 25 +++++++++++++++++++++++++
>  revision.h |  2 ++
>  2 files changed, 27 insertions(+)
>
> diff --git a/revision.c b/revision.c
> index 13e0519c02..f9eb6400f1 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -99,6 +99,31 @@ void mark_tree_uninteresting(struct repository *r, struct tree *tree)
>  	mark_tree_contents_uninteresting(r, tree);
>  }
>  
> +void mark_trees_uninteresting_sparse(struct repository *r,
> +				     struct oidset *set)
> +{
> +	struct object_id *oid;
> +	struct oidset_iter iter;
> +
> +	oidset_iter_init(set, &iter);
> +	while ((oid = oidset_iter_next(&iter))) {
> +		struct tree *tree = lookup_tree(r, oid);
> +
> +		if (!tree)
> +			continue;
> +
> +		if (tree->object.flags & UNINTERESTING) {
> +			/*
> +			 * Remove the flag so the next call
> +			 * is not a no-op. The flag is added
> +			 * in mark_tree_unintersting().
> +			 */
> +			tree->object.flags ^= UNINTERESTING;
> +			mark_tree_uninteresting(r, tree);
> +		}

The proposed log message claims that the method takes an oidset and
marks reachable objects as uninteresting, but the implementation
only marks those that are reachable from already uninteresting
trees.  Either one of them must be wrong.

Did you mean to have this instead?

		if (!tree)
			continue;
		/*
		 * Force traversal of the tree, even if it has been
		 * already marked as UNINTERESTING.
		 */
		tree->object.flags &= ~UNINTERESTING;
		mark_tree_uninteresting(r, tree);

By the way, one of the bigger reasons why I have to ask, instead of
making an educated guess, is because "struct oidset *set" parameter
does not give any useful information with the variable name to the
readers.  We know it is a set because its type is oidset; readers
need to know what meaning the 'set' has, what it is used for, why
the caller wants to (or decides not to) place a tree object in the
set when it calls it.  None of that can be read from its name.
Junio C Hamano Jan. 11, 2019, 8:25 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>> In preparation for a new algorithm that walks fewer trees when
>> creating a pack from a set of revisions, create a method that
>> takes an oidset of tree oids and marks reachable objects as
>> UNINTERESTING.
>> ...
>> There is one new assumption in this approach: we are also given
>> the oids of the interesting trees. This implementation does not
>> use those trees at the moment, but we will use them in a later
>> rewrite of this method.

Ahh....


> The proposed log message claims that the method takes an oidset and
> marks reachable objects as uninteresting, but the implementation
> only marks those that are reachable from already uninteresting
> trees.  Either one of them must be wrong.
>
> Did you mean to have this instead?
>
> 		if (!tree)
> 			continue;
> 		/*
> 		 * Force traversal of the tree, even if it has been
> 		 * already marked as UNINTERESTING.
> 		 */
> 		tree->object.flags &= ~UNINTERESTING;
> 		mark_tree_uninteresting(r, tree);

So, I assumed that the implementation was wrong, but it is the other
way around.  You do mean to pick only already uninteresting trees
out of "set" and mark its reachables.

One thing that would make me worried is what help the callers of
this function will get (or they may have to devise the way
themselves) to avoid having to traverse the same tree number of
times.  A tree can be made uninteresting after a traversal of
another tree that contains it, but the logic in this function

> +		if (tree->object.flags & UNINTERESTING) {
> +			/*
> +			 * Remove the flag so the next call
> +			 * is not a no-op. The flag is added
> +			 * in mark_tree_unintersting().
> +			 */
> +			tree->object.flags ^= UNINTERESTING;
> +			mark_tree_uninteresting(r, tree);
> +		}

ignores the fact that it is already UNINTERESTING (in fact, in a
sense it is even worse---it cannot be used to make a not-yet
UNINTERESTING one into UNINTERESTING), drops the UNINTERESING bit
and forces the traversal of that tree.  The only thing I see that
would work as a saving grace is that mark_tree_uninteresting()
itself would honor existing UNINTERESING bit and refuses to recurse
into its subtrees, but that means blobs at the top-level of such a
tree would be marked UNINTERESING while those in its subtrees can be
left not-UNINTERESING, which sounds like inviting a mess.

It does *not* immediately mean this function is misdesigned.  It
just means that the caller needs to carefully follow whatever
calling convention this series will establish in the later patches
(which we haven't seen yet at this point).

> By the way, one of the bigger reasons why I have to ask, instead of
> making an educated guess, is because "struct oidset *set" parameter
> does not give any useful information with the variable name to the
> readers.  We know it is a set because its type is oidset; readers
> need to know what meaning the 'set' has, what it is used for, why
> the caller wants to (or decides not to) place a tree object in the
> set when it calls it.  None of that can be read from its name.
Derrick Stolee Jan. 11, 2019, 10:05 p.m. UTC | #3
On 1/11/2019 3:25 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> So, I assumed that the implementation was wrong, but it is the other
> way around.  You do mean to pick only already uninteresting trees
> out of "set" and mark its reachables.
>
> One thing that would make me worried is what help the callers of
> this function will get (or they may have to devise the way
> themselves) to avoid having to traverse the same tree number of
> times.  A tree can be made uninteresting after a traversal of
> another tree that contains it, but the logic in this function
>
>> +		if (tree->object.flags & UNINTERESTING) {
>> +			/*
>> +			 * Remove the flag so the next call
>> +			 * is not a no-op. The flag is added
>> +			 * in mark_tree_unintersting().
>> +			 */
>> +			tree->object.flags ^= UNINTERESTING;
>> +			mark_tree_uninteresting(r, tree);
>> +		}
> ignores the fact that it is already UNINTERESTING (in fact, in a
> sense it is even worse---it cannot be used to make a not-yet
> UNINTERESTING one into UNINTERESTING), drops the UNINTERESING bit
> and forces the traversal of that tree.  The only thing I see that
> would work as a saving grace is that mark_tree_uninteresting()
> itself would honor existing UNINTERESING bit and refuses to recurse
> into its subtrees, but that means blobs at the top-level of such a
> tree would be marked UNINTERESING while those in its subtrees can be
> left not-UNINTERESING, which sounds like inviting a mess.
>
> It does *not* immediately mean this function is misdesigned.  It
> just means that the caller needs to carefully follow whatever
> calling convention this series will establish in the later patches
> (which we haven't seen yet at this point).
>
I'm sorry that this implementation is particularly confusing. It is 
created only as a placeholder so we can wire up the --sparse option to 
use this method without being "wrong" but is then removed entirely in 
PATCH 4/6:

...

void mark_trees_uninteresting_sparse(struct repository *r,
  				     struct oidset *set)
  {
+	unsigned has_interesting = 0, has_uninteresting = 0;
+	struct hashmap map;
+	struct hashmap_iter map_iter;
+	struct path_and_oids_entry *entry;
  	struct object_id *oid;
  	struct oidset_iter iter;
  
  	oidset_iter_init(set, &iter);
-	while ((oid = oidset_iter_next(&iter))) {
+	while ((!has_interesting || !has_uninteresting) &&
+	       (oid = oidset_iter_next(&iter))) {
  		struct tree *tree = lookup_tree(r, oid);
  
  		if (!tree)
  			continue;
  
-		if (tree->object.flags & UNINTERESTING) {
-			/*
-			 * Remove the flag so the next call
-			 * is not a no-op. The flag is added
-			 * in mark_tree_unintersting().
-			 */
-			tree->object.flags ^= UNINTERESTING;
-			mark_tree_uninteresting(r, tree);
-		}
+		if (tree->object.flags & UNINTERESTING)
+			has_uninteresting = 1;
+		else
+			has_interesting = 1;
+	}
...

You are definitely correct that "set" is not a valuable variable name. It could instead be "tree_oids" to be slightly more informative.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 13e0519c02..f9eb6400f1 100644
--- a/revision.c
+++ b/revision.c
@@ -99,6 +99,31 @@  void mark_tree_uninteresting(struct repository *r, struct tree *tree)
 	mark_tree_contents_uninteresting(r, tree);
 }
 
+void mark_trees_uninteresting_sparse(struct repository *r,
+				     struct oidset *set)
+{
+	struct object_id *oid;
+	struct oidset_iter iter;
+
+	oidset_iter_init(set, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct tree *tree = lookup_tree(r, oid);
+
+		if (!tree)
+			continue;
+
+		if (tree->object.flags & UNINTERESTING) {
+			/*
+			 * Remove the flag so the next call
+			 * is not a no-op. The flag is added
+			 * in mark_tree_unintersting().
+			 */
+			tree->object.flags ^= UNINTERESTING;
+			mark_tree_uninteresting(r, tree);
+		}
+	}
+}
+
 struct commit_stack {
 	struct commit **items;
 	size_t nr, alloc;
diff --git a/revision.h b/revision.h
index 7987bfcd2e..f828e91ae9 100644
--- a/revision.h
+++ b/revision.h
@@ -67,6 +67,7 @@  struct rev_cmdline_info {
 #define REVISION_WALK_NO_WALK_SORTED 1
 #define REVISION_WALK_NO_WALK_UNSORTED 2
 
+struct oidset;
 struct topo_walk_info;
 
 struct rev_info {
@@ -327,6 +328,7 @@  void put_revision_mark(const struct rev_info *revs,
 
 void mark_parents_uninteresting(struct commit *commit);
 void mark_tree_uninteresting(struct repository *r, struct tree *tree);
+void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *set);
 
 void show_object_with_name(FILE *, struct object *, const char *);