Message ID | e340f743e6749e65525e1342dc47faaa6540f04b.1539298957.git.matvore@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | support for filtering trees and blobs based on depth | expand |
Matthew DeVore <matvore@google.com> writes: > The long-term goal at the end of this is to allow a partial clone to > eagerly fetch an entire directory of files by fetching a tree and > specifying <depth>=1. This, for instance, would make a build operation > fast and convenient This would reduce round-trip, as you do not have to fetch the tree to see what its contents are before issuing another set of fetches for them. Those who are building virtual filesystem that let you mount a specific tree object, perhaps via fuse, may find it useful, too, even though I suspect that may not be your primary focus. > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt > index c2c1c40e6..c78985c41 100644 > --- a/Documentation/rev-list-options.txt > +++ b/Documentation/rev-list-options.txt > @@ -734,8 +734,12 @@ specification contained in <path>. > + > The form '--filter=tree:<depth>' omits all blobs and trees whose depth > from the root tree is >= <depth> (minimum depth if an object is located > -at multiple depths in the commits traversed). Currently, only <depth>=0 > -is supported, which omits all blobs and trees. > +at multiple depths in the commits traversed). <depth>=0 will not include > +any trees or blobs unless included explicitly in <object>. <depth>=1 Here, <object> refers to the objects directly requested on the command line (or --stdin)? Triggering this question from me is a sign that this description may want to say a bit more to avoid the same question from the real readers. Perhaps replace "included explicitly in <object>" with "explicitly requested by listing them on the command line or feeding them with `--stdin`", or something like that? > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c > index e8da2e858..9dc61d6e6 100644 > --- a/list-objects-filter-options.c > +++ b/list-objects-filter-options.c > @@ -50,12 +50,12 @@ static int gently_parse_list_objects_filter( > } > > } else if (skip_prefix(arg, "tree:", &v0)) { > - unsigned long depth; > - if (!git_parse_ulong(v0, &depth) || depth != 0) { > + if (!git_parse_ulong(v0, > + &filter_options->tree_depth_limit_value)) { > if (errbuf) { > strbuf_addstr( > errbuf, > - _("only 'tree:0' is supported")); > + _("expected 'tree:<int>'")); We do not accept "tree:-1", even though "-1" is an int. Is it too obvious to worry about? I do not think we want to say tree:<uint> even if we do want to make it clear we reject "tree:-1" I am wondering if "expected 'tree:<depth>'" would work better. > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h > index af64e5c66..c1ae70cd8 100644 > --- a/list-objects-filter-options.h > +++ b/list-objects-filter-options.h > @@ -44,6 +44,7 @@ struct list_objects_filter_options { > struct object_id *sparse_oid_value; > char *sparse_path_value; > unsigned long blob_limit_value; > + unsigned long tree_depth_limit_value; > }; This change gets it right by adding "depth" in the field name and it is not a comment on this patch, but someday not in too distant future we should rename the "blob_limit_value" to make it clear that it is filtering by number of bytes, as other filtering criteria on blobs that can be expressed in ulong are quite possible. > -static enum list_objects_filter_result filter_trees_none( > +static enum list_objects_filter_result filter_trees_depth( > enum list_objects_filter_situation filter_situation, > struct object *obj, > const char *pathname, > const char *filename, > void *filter_data_) > { > - struct filter_trees_none_data *filter_data = filter_data_; > + struct filter_trees_depth_data *filter_data = filter_data_; > + > + int too_deep = filter_data->current_depth >= filter_data->max_depth; Does max mean "maximum allowed", or "this and anything larger are rejected". The latter sound wrong, but I offhand do not know if your current_depth counts from 0 or 1, so there may not be off-by-one. - dir.c::within_depth() that is used by pathspec matching that in turn is used by "grep --max-depth=1" does "if (depth > max_depth)", which sounds more in line with the usual convention, I would think. - pack-objects has max_delta_cache_size, which also is used as "maximum allowed", not "this is already too big". So is its max_depth. There may be other examples. One existing violator I noticed was the "reject blobs that is this size or larger" in this file; it is called "max_bytes", but it is apparently not "maximum allowed", which we probably would want to fix. > + /* > + * Note that we do not use _MARK_SEEN in order to allow re-traversal in > + * case we encounter a tree or blob again at a shallower depth. > + */ Hmph. Earlier tree:0 support never even read the actual tree, so this was not a problem. We wouldn't have found a tree in deeper places first and then at a shallower depth, as we wouldn't have seen any tree in any depth deeper than the surface anyway. Now we need to worry about a tree that originally gets seen in a deeper depth (that is still below the allowed maximum) to reappear at a shallower place, so a subtree within it that used to be too deep may now be within the allowed maximum depth. Step 1 of these three patches made sure trees are not even opened under "tree:0", so it was not just optimizing/shrinking the output of rev-list but also optimizing the traversal. When we are collecting omits, however, this one now returns _ZERO which means we still traverse into the tree, even under "tree:0"? I must be reading the code incorrectly (in general, when we are seeing a tree object that itself is at the maximum allowed depth, trees found by reading its contents will never become eligible for output, even if they are found at a shallower depth than their other copies were previously found, I would think). > switch (filter_situation) { > default: > BUG("unknown filter_situation: %d", filter_situation); > > - case LOFS_BEGIN_TREE: > case LOFS_BLOB: > + if (!too_deep) goto include_it; Style: on two lines, like you did below for the next if() statement. > + > + if (filter_data->omits) > + oidset_insert(filter_data->omits, &obj->oid); > + > + return LOFR_ZERO; > + > + case LOFS_BEGIN_TREE: > + filter_data->current_depth++; > + > + if (!too_deep) goto include_it; > + > if (filter_data->omits) { > oidset_insert(filter_data->omits, &obj->oid); > - /* _MARK_SEEN but not _DO_SHOW (hard omit) */ > - return LOFR_MARK_SEEN; > + return LOFR_ZERO; > } > else > /* > * Not collecting omits so no need to to traverse tree. > */ > - return LOFR_SKIP_TREE | LOFR_MARK_SEEN; > + return LOFR_SKIP_TREE; > > case LOFS_END_TREE: > assert(obj->type == OBJ_TREE); > + filter_data->current_depth--; > return LOFR_ZERO; > > } > + > +include_it: > + if (filter_data->omits) > + oidset_remove(filter_data->omits, &obj->oid); > + return LOFR_MARK_SEEN | LOFR_DO_SHOW; > }
A quick update: I've read Junio's comments on this patchset and basically agree with them, but I haven't had a chance to apply them yet. I plan to pick this patchset up again (as well as the patch "md/list-lazy-objects-fix") once things settle down in my day job. On Sun, Oct 14, 2018 at 7:31 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matthew DeVore <matvore@google.com> writes: > > > The long-term goal at the end of this is to allow a partial clone to > > eagerly fetch an entire directory of files by fetching a tree and > > specifying <depth>=1. This, for instance, would make a build operation > > fast and convenient > > This would reduce round-trip, as you do not have to fetch the tree > to see what its contents are before issuing another set of fetches > for them. Those who are building virtual filesystem that let you > mount a specific tree object, perhaps via fuse, may find it useful, > too, even though I suspect that may not be your primary focus. > > > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt > > index c2c1c40e6..c78985c41 100644 > > --- a/Documentation/rev-list-options.txt > > +++ b/Documentation/rev-list-options.txt > > @@ -734,8 +734,12 @@ specification contained in <path>. > > + > > The form '--filter=tree:<depth>' omits all blobs and trees whose depth > > from the root tree is >= <depth> (minimum depth if an object is located > > -at multiple depths in the commits traversed). Currently, only <depth>=0 > > -is supported, which omits all blobs and trees. > > +at multiple depths in the commits traversed). <depth>=0 will not include > > +any trees or blobs unless included explicitly in <object>. <depth>=1 > > Here, <object> refers to the objects directly requested on the > command line (or --stdin)? Triggering this question from me is a > sign that this description may want to say a bit more to avoid the > same question from the real readers. Perhaps replace "included > explicitly in <object>" with "explicitly requested by listing them > on the command line or feeding them with `--stdin`", or something > like that? > > > diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c > > index e8da2e858..9dc61d6e6 100644 > > --- a/list-objects-filter-options.c > > +++ b/list-objects-filter-options.c > > @@ -50,12 +50,12 @@ static int gently_parse_list_objects_filter( > > } > > > > } else if (skip_prefix(arg, "tree:", &v0)) { > > - unsigned long depth; > > - if (!git_parse_ulong(v0, &depth) || depth != 0) { > > + if (!git_parse_ulong(v0, > > + &filter_options->tree_depth_limit_value)) { > > if (errbuf) { > > strbuf_addstr( > > errbuf, > > - _("only 'tree:0' is supported")); > > + _("expected 'tree:<int>'")); > > We do not accept "tree:-1", even though "-1" is an int. Is it too > obvious to worry about? I do not think we want to say tree:<uint> > even if we do want to make it clear we reject "tree:-1" > > I am wondering if "expected 'tree:<depth>'" would work better. > > > diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h > > index af64e5c66..c1ae70cd8 100644 > > --- a/list-objects-filter-options.h > > +++ b/list-objects-filter-options.h > > @@ -44,6 +44,7 @@ struct list_objects_filter_options { > > struct object_id *sparse_oid_value; > > char *sparse_path_value; > > unsigned long blob_limit_value; > > + unsigned long tree_depth_limit_value; > > }; > > This change gets it right by adding "depth" in the field name and it > is not a comment on this patch, but someday not in too distant > future we should rename the "blob_limit_value" to make it clear that > it is filtering by number of bytes, as other filtering criteria on > blobs that can be expressed in ulong are quite possible. > > > -static enum list_objects_filter_result filter_trees_none( > > +static enum list_objects_filter_result filter_trees_depth( > > enum list_objects_filter_situation filter_situation, > > struct object *obj, > > const char *pathname, > > const char *filename, > > void *filter_data_) > > { > > - struct filter_trees_none_data *filter_data = filter_data_; > > + struct filter_trees_depth_data *filter_data = filter_data_; > > + > > + int too_deep = filter_data->current_depth >= filter_data->max_depth; > > Does max mean "maximum allowed", or "this and anything larger are > rejected". The latter sound wrong, but I offhand do not know if > your current_depth counts from 0 or 1, so there may not be > off-by-one. > > - dir.c::within_depth() that is used by pathspec matching that in turn > is used by "grep --max-depth=1" does "if (depth > max_depth)", which > sounds more in line with the usual convention, I would think. > > - pack-objects has max_delta_cache_size, which also is used as > "maximum allowed", not "this is already too big". So is its > max_depth. > > There may be other examples. One existing violator I noticed was > the "reject blobs that is this size or larger" in this file; it is > called "max_bytes", but it is apparently not "maximum allowed", > which we probably would want to fix. > > > + /* > > + * Note that we do not use _MARK_SEEN in order to allow re-traversal in > > + * case we encounter a tree or blob again at a shallower depth. > > + */ > > Hmph. Earlier tree:0 support never even read the actual tree, so > this was not a problem. We wouldn't have found a tree in deeper > places first and then at a shallower depth, as we wouldn't have seen > any tree in any depth deeper than the surface anyway. > > Now we need to worry about a tree that originally gets seen in a > deeper depth (that is still below the allowed maximum) to reappear > at a shallower place, so a subtree within it that used to be too > deep may now be within the allowed maximum depth. > > Step 1 of these three patches made sure trees are not even opened > under "tree:0", so it was not just optimizing/shrinking the output > of rev-list but also optimizing the traversal. When we are > collecting omits, however, this one now returns _ZERO which means we > still traverse into the tree, even under "tree:0"? I must be > reading the code incorrectly (in general, when we are seeing a tree > object that itself is at the maximum allowed depth, trees found by > reading its contents will never become eligible for output, even if > they are found at a shallower depth than their other copies were > previously found, I would think). > > > switch (filter_situation) { > > default: > > BUG("unknown filter_situation: %d", filter_situation); > > > > - case LOFS_BEGIN_TREE: > > case LOFS_BLOB: > > + if (!too_deep) goto include_it; > > Style: on two lines, like you did below for the next if() statement. > > > + > > + if (filter_data->omits) > > + oidset_insert(filter_data->omits, &obj->oid); > > + > > + return LOFR_ZERO; > > + > > + case LOFS_BEGIN_TREE: > > + filter_data->current_depth++; > > + > > + if (!too_deep) goto include_it; > > + > > if (filter_data->omits) { > > oidset_insert(filter_data->omits, &obj->oid); > > - /* _MARK_SEEN but not _DO_SHOW (hard omit) */ > > - return LOFR_MARK_SEEN; > > + return LOFR_ZERO; > > } > > else > > /* > > * Not collecting omits so no need to to traverse tree. > > */ > > - return LOFR_SKIP_TREE | LOFR_MARK_SEEN; > > + return LOFR_SKIP_TREE; > > > > case LOFS_END_TREE: > > assert(obj->type == OBJ_TREE); > > + filter_data->current_depth--; > > return LOFR_ZERO; > > > > } > > + > > +include_it: > > + if (filter_data->omits) > > + oidset_remove(filter_data->omits, &obj->oid); > > + return LOFR_MARK_SEEN | LOFR_DO_SHOW; > > }
On Sun, Oct 14, 2018 at 7:31 PM Junio C Hamano <gitster@pobox.com> wrote: > > Matthew DeVore <matvore@google.com> writes: > > > The long-term goal at the end of this is to allow a partial clone to > > eagerly fetch an entire directory of files by fetching a tree and > > specifying <depth>=1. This, for instance, would make a build operation > > fast and convenient > > This would reduce round-trip, as you do not have to fetch the tree > to see what its contents are before issuing another set of fetches > for them. Those who are building virtual filesystem that let you > mount a specific tree object, perhaps via fuse, may find it useful, > too, even though I suspect that may not be your primary focus. I added a paragraph mentioning the "roundtrip" aspect to the commit message, since this *may* help someone find a use for it. > > > diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt > > index c2c1c40e6..c78985c41 100644 > > --- a/Documentation/rev-list-options.txt > > +++ b/Documentation/rev-list-options.txt > > @@ -734,8 +734,12 @@ specification contained in <path>. > > + > > The form '--filter=tree:<depth>' omits all blobs and trees whose depth > > from the root tree is >= <depth> (minimum depth if an object is located > > -at multiple depths in the commits traversed). Currently, only <depth>=0 > > -is supported, which omits all blobs and trees. > > +at multiple depths in the commits traversed). <depth>=0 will not include > > +any trees or blobs unless included explicitly in <object>. <depth>=1 > > Here, <object> refers to the objects directly requested on the > command line (or --stdin)? Triggering this question from me is a > sign that this description may want to say a bit more to avoid the > same question from the real readers. Perhaps replace "included > explicitly in <object>" with "explicitly requested by listing them > on the command line or feeding them with `--stdin`", or something > like that? I have reworded this to be more explicit about "stdin," and also to avoid directly admitting that trees are allowed in the <commit> argument, since I've dropped the s/<commit>/<object>/ patch. Here is the new paragraph: The form '--filter=tree:<depth>' omits all blobs and trees whose depth from the root tree is >= <depth> (minimum depth if an object is located at multiple depths in the commits traversed). <depth>=0 will not include any trees or blobs unless included explicitly in the command-line (or standard input when --stdin is used). <depth>=1 will include only the tree and blobs which are referenced directly by a commit reachable from <commit> or an explicitly-given object. <depth>=2 is like <depth>=1 while also including trees and blobs one more level removed from an explicitly-given commit or tree. > > + _("expected 'tree:<int>'")); > > We do not accept "tree:-1", even though "-1" is an int. Is it too > obvious to worry about? I do not think we want to say tree:<uint> > even if we do want to make it clear we reject "tree:-1" > > I am wondering if "expected 'tree:<depth>'" would work better. Yes, I agree, <depth> is more helpful and not ambiguous in a significant way. Fixed. > > + unsigned long tree_depth_limit_value; > > }; > > This change gets it right by adding "depth" in the field name and it > is not a comment on this patch, but someday not in too distant > future we should rename the "blob_limit_value" to make it clear that > it is filtering by number of bytes, as other filtering criteria on > blobs that can be expressed in ulong are quite possible. Agreed. > > > -static enum list_objects_filter_result filter_trees_none( > > +static enum list_objects_filter_result filter_trees_depth( > > enum list_objects_filter_situation filter_situation, > > struct object *obj, > > const char *pathname, > > const char *filename, > > void *filter_data_) > > { > > - struct filter_trees_none_data *filter_data = filter_data_; > > + struct filter_trees_depth_data *filter_data = filter_data_; > > + > > + int too_deep = filter_data->current_depth >= filter_data->max_depth; > > Does max mean "maximum allowed", or "this and anything larger are > rejected". The latter sound wrong, but I offhand do not know if > your current_depth counts from 0 or 1, so there may not be > off-by-one. > It means "this and anything larger are rejected." The documentation words it similarly: " The form '--filter=tree:<depth>' omits all blobs and trees whose depth from the root tree is >= <depth> ... " There is no intuitive phrase to mean "distance from root tree minus one" (left operand) and I don't want to change the filter option field to something different from what the user entered (right operand), so I think we'd best use ">=" and I've renamed the field to "exclusion_trigger_depth". > There may be other examples. One existing violator I noticed was > the "reject blobs that is this size or larger" in this file; it is > called "max_bytes", but it is apparently not "maximum allowed", > which we probably would want to fix. > That's not ideal. The documentation suggests it means "maximum allowed," and JGit apparently is interpreting the value as "maximum allowed," so we should probably fix the semantics in Git. Or were you suggesting to keep the behavior the same but fix the naming and docs? > > + /* > > + * Note that we do not use _MARK_SEEN in order to allow re-traversal in > > + * case we encounter a tree or blob again at a shallower depth. > > + */ > > Hmph. Earlier tree:0 support never even read the actual tree, so > this was not a problem. We wouldn't have found a tree in deeper > places first and then at a shallower depth, as we wouldn't have seen > any tree in any depth deeper than the surface anyway. > > Now we need to worry about a tree that originally gets seen in a > deeper depth (that is still below the allowed maximum) to reappear > at a shallower place, so a subtree within it that used to be too > deep may now be within the allowed maximum depth. That's true, and it points out the fact that we can't use LOFR_MARK_SEEN even if we do LOFR_DO_SHOW, since subtrees may or may not be within the limit in the future. I've fixed that for v+1 of the patch, and added a corresponding test. I'm also refactoring the filters_tree_depth function to not use goto. I think it is easier to follow that way. > > Step 1 of these three patches made sure trees are not even opened > under "tree:0", so it was not just optimizing/shrinking the output > of rev-list but also optimizing the traversal. When we are > collecting omits, however, this one now returns _ZERO which means we > still traverse into the tree, even under "tree:0"? I must be > reading the code incorrectly (in general, when we are seeing a tree > object that itself is at the maximum allowed depth, trees found by > reading its contents will never become eligible for output, even if > they are found at a shallower depth than their other copies were > previously found, I would think). You are reading the code right - when the tree is too deep, we sometimes return LOFR_ZERO, sometimes _SKIP_TREE - the latter if we are collecting omits. But as you observe, we don't need to traverse an omitted tree *if it has already been marked as omitted*, since that suggests that its children have also been marked as omitted. So I've added a new commit to this patchset which optimizes this case. The next patchset will have 2 commits (subtracting the documentation rewrite one and ones that are already scheduled to merge): Patch 1/2 - implement tree:<depth> for positive depths Patch 2/2 - stop iterating through trees for collecting omits if the omits are already marked
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index c2c1c40e6..c78985c41 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -734,8 +734,12 @@ specification contained in <path>. + The form '--filter=tree:<depth>' omits all blobs and trees whose depth from the root tree is >= <depth> (minimum depth if an object is located -at multiple depths in the commits traversed). Currently, only <depth>=0 -is supported, which omits all blobs and trees. +at multiple depths in the commits traversed). <depth>=0 will not include +any trees or blobs unless included explicitly in <object>. <depth>=1 +will include only the tree and blobs which are referenced directly by a +commit reachable from <object> or an object given in <object>. <depth>=2 +is like <depth>=1 while also including trees and blobs one more level +removed from <object> or a reachable commit. --no-filter:: Turn off any previous `--filter=` argument. diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c index e8da2e858..9dc61d6e6 100644 --- a/list-objects-filter-options.c +++ b/list-objects-filter-options.c @@ -50,12 +50,12 @@ static int gently_parse_list_objects_filter( } } else if (skip_prefix(arg, "tree:", &v0)) { - unsigned long depth; - if (!git_parse_ulong(v0, &depth) || depth != 0) { + if (!git_parse_ulong(v0, + &filter_options->tree_depth_limit_value)) { if (errbuf) { strbuf_addstr( errbuf, - _("only 'tree:0' is supported")); + _("expected 'tree:<int>'")); } return 1; } diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h index af64e5c66..c1ae70cd8 100644 --- a/list-objects-filter-options.h +++ b/list-objects-filter-options.h @@ -44,6 +44,7 @@ struct list_objects_filter_options { struct object_id *sparse_oid_value; char *sparse_path_value; unsigned long blob_limit_value; + unsigned long tree_depth_limit_value; }; /* Normalized command line arguments */ diff --git a/list-objects-filter.c b/list-objects-filter.c index 37fba456d..e69fb9b82 100644 --- a/list-objects-filter.c +++ b/list-objects-filter.c @@ -83,53 +83,80 @@ static void *filter_blobs_none__init( * A filter for list-objects to omit ALL trees and blobs from the traversal. * Can OPTIONALLY collect a list of the omitted OIDs. */ -struct filter_trees_none_data { +struct filter_trees_depth_data { struct oidset *omits; + unsigned long max_depth; + unsigned long current_depth; }; -static enum list_objects_filter_result filter_trees_none( +static enum list_objects_filter_result filter_trees_depth( enum list_objects_filter_situation filter_situation, struct object *obj, const char *pathname, const char *filename, void *filter_data_) { - struct filter_trees_none_data *filter_data = filter_data_; + struct filter_trees_depth_data *filter_data = filter_data_; + + int too_deep = filter_data->current_depth >= filter_data->max_depth; + + /* + * Note that we do not use _MARK_SEEN in order to allow re-traversal in + * case we encounter a tree or blob again at a shallower depth. + */ switch (filter_situation) { default: BUG("unknown filter_situation: %d", filter_situation); - case LOFS_BEGIN_TREE: case LOFS_BLOB: + if (!too_deep) goto include_it; + + if (filter_data->omits) + oidset_insert(filter_data->omits, &obj->oid); + + return LOFR_ZERO; + + case LOFS_BEGIN_TREE: + filter_data->current_depth++; + + if (!too_deep) goto include_it; + if (filter_data->omits) { oidset_insert(filter_data->omits, &obj->oid); - /* _MARK_SEEN but not _DO_SHOW (hard omit) */ - return LOFR_MARK_SEEN; + return LOFR_ZERO; } else /* * Not collecting omits so no need to to traverse tree. */ - return LOFR_SKIP_TREE | LOFR_MARK_SEEN; + return LOFR_SKIP_TREE; case LOFS_END_TREE: assert(obj->type == OBJ_TREE); + filter_data->current_depth--; return LOFR_ZERO; } + +include_it: + if (filter_data->omits) + oidset_remove(filter_data->omits, &obj->oid); + return LOFR_MARK_SEEN | LOFR_DO_SHOW; } -static void* filter_trees_none__init( +static void* filter_trees_depth__init( struct oidset *omitted, struct list_objects_filter_options *filter_options, filter_object_fn *filter_fn, filter_free_fn *filter_free_fn) { - struct filter_trees_none_data *d = xcalloc(1, sizeof(*d)); + struct filter_trees_depth_data *d = xcalloc(1, sizeof(*d)); d->omits = omitted; + d->max_depth = filter_options->tree_depth_limit_value; + d->current_depth = 0; - *filter_fn = filter_trees_none; + *filter_fn = filter_trees_depth; *filter_free_fn = free; return d; } @@ -426,7 +453,7 @@ static filter_init_fn s_filters[] = { NULL, filter_blobs_none__init, filter_blobs_limit__init, - filter_trees_none__init, + filter_trees_depth__init, filter_sparse_oid__init, filter_sparse_path__init, }; diff --git a/t/t6112-rev-list-filters-objects.sh b/t/t6112-rev-list-filters-objects.sh index efb1bee2e..43ee0df80 100755 --- a/t/t6112-rev-list-filters-objects.sh +++ b/t/t6112-rev-list-filters-objects.sh @@ -254,6 +254,90 @@ test_expect_success 'filter a GIANT tree through tree:0' ' test_line_count = 2 actual ' +# Test tree:# filters. + +expect_has () { + commit=$1 && + name=$2 && + + hash=$(git -C r3 rev-parse $commit:$name) && + grep "^$hash $name$" actual +} + +test_expect_success 'verify tree:1 includes root trees' ' + git -C r3 rev-list --objects --filter=tree:1 HEAD >actual && + + # We should get two root directories and two commits. + expect_has HEAD "" && + expect_has HEAD~1 "" && + test_line_count = 4 actual +' + +test_expect_success 'verify tree:2 includes root trees and immediate children' ' + git -C r3 rev-list --objects --filter=tree:2 HEAD >actual && + + expect_has HEAD "" && + expect_has HEAD~1 "" && + expect_has HEAD dir1 && + expect_has HEAD pattern && + expect_has HEAD sparse1 && + expect_has HEAD sparse2 && + + # There are also 2 commit objects + test_line_count = 8 actual +' + +test_expect_success 'verify tree:3 includes everything expected' ' + git -C r3 rev-list --objects --filter=tree:3 HEAD >actual && + + expect_has HEAD "" && + expect_has HEAD~1 "" && + expect_has HEAD dir1 && + expect_has HEAD dir1/sparse1 && + expect_has HEAD dir1/sparse2 && + expect_has HEAD pattern && + expect_has HEAD sparse1 && + expect_has HEAD sparse2 && + + # There are also 2 commit objects + test_line_count = 10 actual +' + +# Test provisional omit collection logic with a repo that has objects appearing +# at multiple depths - first deeper than the filter's threshold, then shallow. + +test_expect_success 'setup r4' ' + git init r4 && + + echo foo > r4/foo && + mkdir r4/subdir && + echo bar > r4/subdir/bar && + + mkdir r4/filt && + cp -r r4/foo r4/subdir r4/filt && + + git -C r4 add foo subdir filt && + git -C r4 commit -m "commit msg" +' + +expect_has_with_different_name () { + commit=$1 && + name=$2 && + + hash=$(git -C r4 rev-parse $commit:$name) && + ! grep "^$hash $name$" actual && + grep "^$hash " actual && + ! grep "~$hash" actual +} + +test_expect_success 'test tree:# filter provisional omit for blob and tree' ' + git -C r4 rev-list --objects --filter-print-omitted --filter=tree:2 \ + HEAD >actual && + + expect_has_with_different_name HEAD filt/foo && + expect_has_with_different_name HEAD filt/subdir +' + # Delete some loose objects and use rev-list, but WITHOUT any filtering. # This models previously omitted objects that we did not receive.
Implement positive values for <depth> in the tree:<depth> filter. The exact semantics are described in Documentation/rev-list-options.txt. The long-term goal at the end of this is to allow a partial clone to eagerly fetch an entire directory of files by fetching a tree and specifying <depth>=1. This, for instance, would make a build operation fast and convenient. It is fast because the partial clone does not need to fetch each file individually, and convenient because the user does not need to supply a sparse-checkout specification. Signed-off-by: Matthew DeVore <matvore@google.com> --- Documentation/rev-list-options.txt | 8 ++- list-objects-filter-options.c | 6 +-- list-objects-filter-options.h | 1 + list-objects-filter.c | 49 +++++++++++++---- t/t6112-rev-list-filters-objects.sh | 84 +++++++++++++++++++++++++++++ 5 files changed, 132 insertions(+), 16 deletions(-)