mbox series

[v4,0/8] rev-list: implement object type filter

Message ID cover.1618234575.git.ps@pks.im (mailing list archive)
Headers show
Series rev-list: implement object type filter | expand

Message

Patrick Steinhardt April 12, 2021, 1:37 p.m. UTC
Hi,

this is the fourth version of my patch series which implements a new
`object:type` filter for git-rev-list(1) and git-upload-pack(1) and
extends support for bitmap indices to work with combined filters.

Changes compared to v3:

    - Small style fix to not pass an empty string and `0`, but instead
      simply pass two `NULL` pointers to
      `list_objects_filter__filter_object, pointed out by Junio.

    - I've changed patch 7/8 as proposed by Peff: support of combined
      filters is now determined directly in `filter_bitmap()`, without
      having to mirror all filter types in the new `filter_supported()`
      function.

    - Renamed `--filter-provided-revisions` to
      `--filter-provided-objects` as proposed by Peff and addressed both
      commit message and tests as pointed out by Philip.

Thanks for all your feedback! As alawys, the range-diff is attached
below.

Patrick

Patrick Steinhardt (8):
  uploadpack.txt: document implication of `uploadpackfilter.allow`
  revision: mark commit parents as NOT_USER_GIVEN
  list-objects: move tag processing into its own function
  list-objects: support filtering by tag and commit
  list-objects: implement object type filter
  pack-bitmap: implement object type filter
  pack-bitmap: implement combined filter
  rev-list: allow filtering of provided items

 Documentation/config/uploadpack.txt |   9 ++-
 Documentation/rev-list-options.txt  |   8 ++
 builtin/pack-objects.c              |   2 +-
 builtin/rev-list.c                  |  36 ++++++---
 list-objects-filter-options.c       |  14 ++++
 list-objects-filter-options.h       |   2 +
 list-objects-filter.c               | 116 ++++++++++++++++++++++++++++
 list-objects-filter.h               |   2 +
 list-objects.c                      |  30 ++++++-
 pack-bitmap.c                       |  45 +++++++++--
 pack-bitmap.h                       |   3 +-
 reachable.c                         |   2 +-
 revision.c                          |   4 +-
 revision.h                          |   3 -
 t/t6112-rev-list-filters-objects.sh |  76 ++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  |  68 +++++++++++++++-
 16 files changed, 390 insertions(+), 30 deletions(-)

Range-diff against v3:
1:  f80b9570d4 = 1:  f80b9570d4 uploadpack.txt: document implication of `uploadpackfilter.allow`
2:  46c1952405 = 2:  46c1952405 revision: mark commit parents as NOT_USER_GIVEN
3:  3d792f6339 = 3:  3d792f6339 list-objects: move tag processing into its own function
4:  80193d6ba3 ! 4:  674da0f9ac list-objects: support filtering by tag and commit
    @@ list-objects.c: static void process_tag(struct traversal_context *ctx,
     +	enum list_objects_filter_result r;
     +
     +	r = list_objects_filter__filter_object(ctx->revs->repo, LOFS_TAG,
    -+					       &tag->object, "", 0, ctx->filter);
    ++					       &tag->object, NULL, NULL,
    ++					       ctx->filter);
     +	if (r & LOFR_MARK_SEEN)
     +		tag->object.flags |= SEEN;
     +	if (r & LOFR_DO_SHOW)
5:  e2a14abf92 = 5:  d22a5fd37d list-objects: implement object type filter
6:  46d4450d38 = 6:  17c9f66bbc pack-bitmap: implement object type filter
7:  06a376399b ! 7:  759ac54bb2 pack-bitmap: implement combined filter
    @@ Commit message
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## pack-bitmap.c ##
    -@@ pack-bitmap.c: static void filter_bitmap_object_type(struct bitmap_index *bitmap_git,
    - 		filter_bitmap_exclude_type(bitmap_git, tip_objects, to_filter, OBJ_BLOB);
    - }
    - 
    -+static int filter_supported(struct list_objects_filter_options *filter)
    -+{
    -+	int i;
    -+
    -+	switch (filter->choice) {
    -+	case LOFC_BLOB_NONE:
    -+	case LOFC_BLOB_LIMIT:
    -+	case LOFC_OBJECT_TYPE:
    -+		return 1;
    -+	case LOFC_TREE_DEPTH:
    -+		if (filter->tree_exclude_depth == 0)
    -+			return 1;
    -+		return 0;
    -+	case LOFC_COMBINE:
    -+		for (i = 0; i < filter->sub_nr; i++)
    -+			if (!filter_supported(&filter->sub[i]))
    -+				return 0;
    -+		return 1;
    -+	default:
    -+		return 0;
    -+	}
    -+}
    -+
    - static int filter_bitmap(struct bitmap_index *bitmap_git,
    - 			 struct object_list *tip_objects,
    - 			 struct bitmap *to_filter,
    -@@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
    - {
    - 	if (!filter || filter->choice == LOFC_DISABLED)
    - 		return 0;
    -+	if (!filter_supported(filter))
    -+		return -1;
    - 
    - 	if (filter->choice == LOFC_BLOB_NONE) {
    - 		if (bitmap_git)
     @@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
      		return 0;
      	}
      
    --	if (filter->choice == LOFC_TREE_DEPTH &&
    --	    filter->tree_exclude_depth == 0) {
    -+	if (filter->choice == LOFC_TREE_DEPTH) {
    - 		if (bitmap_git)
    - 			filter_bitmap_tree_depth(bitmap_git, tip_objects,
    - 						 to_filter,
    -@@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
    - 		return 0;
    - 	}
    - 
    --	/* filter choice not handled */
    --	return -1;
     +	if (filter->choice == LOFC_COMBINE) {
     +		int i;
     +		for (i = 0; i < filter->sub_nr; i++) {
    @@ pack-bitmap.c: static int filter_bitmap(struct bitmap_index *bitmap_git,
     +		return 0;
     +	}
     +
    -+	BUG("unsupported filter choice");
    + 	/* filter choice not handled */
    + 	return -1;
      }
    - 
    - static int can_filter_bitmap(struct list_objects_filter_options *filter)
     
      ## t/t6113-rev-list-bitmap-filters.sh ##
     @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'object:type filter' '
8:  cf2297b413 ! 8:  c779d222cf rev-list: allow filtering of provided items
    @@ Commit message
         that even if the user wants to only see blobs, he'll still see commits
         of provided references.
     
    -    Improve this by introducing a new `--filter-provided` option to the
    -    git-rev-parse(1) command. If given, then all user-provided references
    -    will be subject to filtering.
    +    Improve this by introducing a new `--filter-provided-objects` option
    +    to the git-rev-parse(1) command. If given, then all user-provided
    +    references will be subject to filtering.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    @@ Documentation/rev-list-options.txt: equivalent.
      --no-filter::
      	Turn off any previous `--filter=` argument.
      
    -+--filter-provided-revisions::
    -+	Filter the list of explicitly provided revisions, which would otherwise
    ++--filter-provided-objects::
    ++	Filter the list of explicitly provided objects, which would otherwise
     +	always be printed even if they did not match any of the filters. Only
     +	useful with `--filter=`.
     +
    @@ builtin/rev-list.c: static inline int parse_missing_action_value(const char *val
      static int try_bitmap_count(struct rev_info *revs,
     -			    struct list_objects_filter_options *filter)
     +			    struct list_objects_filter_options *filter,
    -+			    int filter_provided_revs)
    ++			    int filter_provided_objects)
      {
      	uint32_t commit_count = 0,
      		 tag_count = 0,
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
      	max_count = revs->max_count;
      
     -	bitmap_git = prepare_bitmap_walk(revs, filter);
    -+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
      	if (!bitmap_git)
      		return -1;
      
    @@ builtin/rev-list.c: static int try_bitmap_count(struct rev_info *revs,
      static int try_bitmap_traversal(struct rev_info *revs,
     -				struct list_objects_filter_options *filter)
     +				struct list_objects_filter_options *filter,
    -+				int filter_provided_revs)
    ++				int filter_provided_objects)
      {
      	struct bitmap_index *bitmap_git;
      
    @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs, filter);
    -+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
      	if (!bitmap_git)
      		return -1;
      
    @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
      static int try_bitmap_disk_usage(struct rev_info *revs,
     -				 struct list_objects_filter_options *filter)
     +				 struct list_objects_filter_options *filter,
    -+				 int filter_provided_revs)
    ++				 int filter_provided_objects)
      {
      	struct bitmap_index *bitmap_git;
      
    @@ builtin/rev-list.c: static int try_bitmap_traversal(struct rev_info *revs,
      		return -1;
      
     -	bitmap_git = prepare_bitmap_walk(revs, filter);
    -+	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_revs);
    ++	bitmap_git = prepare_bitmap_walk(revs, filter, filter_provided_objects);
      	if (!bitmap_git)
      		return -1;
      
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      	int bisect_show_vars = 0;
      	int bisect_find_all = 0;
      	int use_bitmap_index = 0;
    -+	int filter_provided_revs = 0;
    ++	int filter_provided_objects = 0;
      	const char *show_progress = NULL;
      
      	if (argc == 2 && !strcmp(argv[1], "-h"))
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      			list_objects_filter_set_no_filter(&filter_options);
      			continue;
      		}
    -+		if (!strcmp(arg, "--filter-provided-revisions")) {
    -+			filter_provided_revs = 1;
    ++		if (!strcmp(arg, "--filter-provided-objects")) {
    ++			filter_provided_objects = 1;
     +			continue;
     +		}
      		if (!strcmp(arg, "--filter-print-omitted")) {
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      
      	if (use_bitmap_index) {
     -		if (!try_bitmap_count(&revs, &filter_options))
    -+		if (!try_bitmap_count(&revs, &filter_options, filter_provided_revs))
    ++		if (!try_bitmap_count(&revs, &filter_options, filter_provided_objects))
      			return 0;
     -		if (!try_bitmap_disk_usage(&revs, &filter_options))
    -+		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_revs))
    ++		if (!try_bitmap_disk_usage(&revs, &filter_options, filter_provided_objects))
      			return 0;
     -		if (!try_bitmap_traversal(&revs, &filter_options))
    -+		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_revs))
    ++		if (!try_bitmap_traversal(&revs, &filter_options, filter_provided_objects))
      			return 0;
      	}
      
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      			return show_bisect_vars(&info, reaches, all);
      	}
      
    -+	if (filter_provided_revs) {
    ++	if (filter_provided_objects) {
     +		struct commit_list *c;
     +		for (i = 0; i < revs.pending.nr; i++) {
     +			struct object_array_entry *pending = revs.pending.objects + i;
    @@ pack-bitmap.c: static int can_filter_bitmap(struct list_objects_filter_options *
      struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
     -					 struct list_objects_filter_options *filter)
     +					 struct list_objects_filter_options *filter,
    -+					 int filter_provided_revs)
    ++					 int filter_provided_objects)
      {
      	unsigned int i;
      
    @@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
      		bitmap_and_not(wants_bitmap, haves_bitmap);
      
     -	filter_bitmap(bitmap_git, wants, wants_bitmap, filter);
    -+	filter_bitmap(bitmap_git, (filter && filter_provided_revs) ? NULL : wants,
    ++	filter_bitmap(bitmap_git, (filter && filter_provided_objects) ? NULL : wants,
     +		      wants_bitmap, filter);
      
      	bitmap_git->result = wants_bitmap;
    @@ pack-bitmap.h: void traverse_bitmap_commit_list(struct bitmap_index *,
      struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
     -					 struct list_objects_filter_options *filter);
     +					 struct list_objects_filter_options *filter,
    -+					 int filter_provided_revs);
    ++					 int filter_provided_objects);
      int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
      				       struct packed_git **packfile,
      				       uint32_t *entries,
    @@ t/t6112-rev-list-filters-objects.sh: test_expect_success 'verify object:type=tag
      	test_cmp expected actual
      '
      
    -+test_expect_success 'verify object:type=blob prints only blob with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=blob prints only blob with --filter-provided-objects' '
     +	printf "%s blob\n" $(git -C object-type rev-parse HEAD:blob) >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=blob --filter-provided-revisions HEAD >actual &&
    ++		--filter=object:type=blob --filter-provided-objects HEAD >actual &&
     +	test_cmp expected actual
     +'
     +
    -+test_expect_success 'verify object:type=tree prints only tree with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=tree prints only tree with --filter-provided-objects' '
     +	printf "%s \n" $(git -C object-type rev-parse HEAD^{tree}) >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=tree HEAD --filter-provided-revisions >actual &&
    ++		--filter=object:type=tree HEAD --filter-provided-objects >actual &&
     +	test_cmp expected actual
     +'
     +
    -+test_expect_success 'verify object:type=commit prints only commit with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=commit prints only commit with --filter-provided-objects' '
     +	git -C object-type rev-parse HEAD >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=commit --filter-provided-revisions HEAD >actual &&
    ++		--filter=object:type=commit --filter-provided-objects HEAD >actual &&
     +	test_cmp expected actual
     +'
     +
    -+test_expect_success 'verify object:type=tag prints only tag with --filter-provided-revisions' '
    ++test_expect_success 'verify object:type=tag prints only tag with --filter-provided-objects' '
     +	printf "%s tag\n" $(git -C object-type rev-parse tag) >expected &&
     +	git -C object-type rev-list --objects \
    -+		--filter=object:type=tag --filter-provided-revisions tag >actual &&
    ++		--filter=object:type=tag --filter-provided-objects tag >actual &&
     +	test_cmp expected actual
     +'
     +
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'object:type filter' '
      	test_bitmap_traversal expect actual
      '
      
    -+test_expect_success 'object:type filter with --filter-provided-revisions' '
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=tag tag >expect &&
    ++test_expect_success 'object:type filter with --filter-provided-objects' '
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=tag tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=tag tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=tag tag >actual &&
     +	test_cmp expect actual &&
     +
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=commit tag >expect &&
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=commit tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=commit tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=commit tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=tree tag >expect &&
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=tree tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=tree tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=tree tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter-provided-revisions --filter=object:type=blob tag >expect &&
    ++	git rev-list --objects --filter-provided-objects --filter=object:type=blob tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=object:type=blob tag >actual &&
    ++		     --objects --filter-provided-objects --filter=object:type=blob tag >actual &&
     +	test_bitmap_traversal expect actual
     +'
     +
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'combine filter' '
      	test_bitmap_traversal expect actual
      '
      
    -+test_expect_success 'combine filter with --filter-provided-revisions' '
    -+	git rev-list --objects --filter-provided-revisions --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
    ++test_expect_success 'combine filter with --filter-provided-objects' '
    ++	git rev-list --objects --filter-provided-objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter-provided-revisions --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
    ++		     --objects --filter-provided-objects --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
     +	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&

Comments

Jeff King April 13, 2021, 7:45 a.m. UTC | #1
On Mon, Apr 12, 2021 at 03:37:14PM +0200, Patrick Steinhardt wrote:

> this is the fourth version of my patch series which implements a new
> `object:type` filter for git-rev-list(1) and git-upload-pack(1) and
> extends support for bitmap indices to work with combined filters.
> 
> Changes compared to v3:
> 
>     - Small style fix to not pass an empty string and `0`, but instead
>       simply pass two `NULL` pointers to
>       `list_objects_filter__filter_object, pointed out by Junio.
> 
>     - I've changed patch 7/8 as proposed by Peff: support of combined
>       filters is now determined directly in `filter_bitmap()`, without
>       having to mirror all filter types in the new `filter_supported()`
>       function.
> 
>     - Renamed `--filter-provided-revisions` to
>       `--filter-provided-objects` as proposed by Peff and addressed both
>       commit message and tests as pointed out by Philip.

Thanks. I have no more nits to pick. The only question left for me is
the big one of "do we like this with --filter, or should it be some kind
of rev-list option", as discussed in:

  https://lore.kernel.org/git/YHB7R8hVRt+V+i2W@coredump.intra.peff.net/

As I said elsewhere, I'm OK going with this route. I just wanted to call
attention to that earlier response in case you had any final thoughts
(I'm guessing your final thoughts are "jeez, I already wrote it with
--filter, so let's just go with that" which is perfectly reasonable to
me. ;) ).

-Peff
Patrick Steinhardt April 13, 2021, 8:06 a.m. UTC | #2
On Tue, Apr 13, 2021 at 03:45:21AM -0400, Jeff King wrote:
> On Mon, Apr 12, 2021 at 03:37:14PM +0200, Patrick Steinhardt wrote:
> 
> > this is the fourth version of my patch series which implements a new
> > `object:type` filter for git-rev-list(1) and git-upload-pack(1) and
> > extends support for bitmap indices to work with combined filters.
> > 
> > Changes compared to v3:
> > 
> >     - Small style fix to not pass an empty string and `0`, but instead
> >       simply pass two `NULL` pointers to
> >       `list_objects_filter__filter_object, pointed out by Junio.
> > 
> >     - I've changed patch 7/8 as proposed by Peff: support of combined
> >       filters is now determined directly in `filter_bitmap()`, without
> >       having to mirror all filter types in the new `filter_supported()`
> >       function.
> > 
> >     - Renamed `--filter-provided-revisions` to
> >       `--filter-provided-objects` as proposed by Peff and addressed both
> >       commit message and tests as pointed out by Philip.
> 
> Thanks. I have no more nits to pick. The only question left for me is
> the big one of "do we like this with --filter, or should it be some kind
> of rev-list option", as discussed in:
> 
>   https://lore.kernel.org/git/YHB7R8hVRt+V+i2W@coredump.intra.peff.net/
> 
> As I said elsewhere, I'm OK going with this route. I just wanted to call
> attention to that earlier response in case you had any final thoughts
> (I'm guessing your final thoughts are "jeez, I already wrote it with
> --filter, so let's just go with that" which is perfectly reasonable to
> me. ;) ).

I don't think it would help usability to add new `--show-blobs` and
`--show-trees` options. The user interface to show this kind of
information exists already with `--objects`, and by adding another way
of asking a similar query would raise the question of how these two ways
interact with each other:

    - Does `--show-blobs` have effect if `--objects` is not set?

    - Is `--objects` redundant if we have `--show-blobs`, or would
      `--objects --show-blobs` list all objects regardless of whether
      they're blobs or not?

    - What would happen if the user says `--show-blobs --no-objects`?

    - Are these options mutually exclusive?

We avoid all these questions by just adding it as a filter.

Furthermore, the filter also allows future iterations which build on top
of this. If we had a combined OR filter in addition to the existing
combined AND filter, the user could say "Give me all blobs which aren't
bigger than a specific size PLUS all trees with a depth smaller than 5
PLUS all commits and tags". It's not like I'd know of a specific usecase
for this right now, but I think the potential of having such filters in
the future is a plus.

Patrick
Junio C Hamano April 13, 2021, 9:03 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> Thanks. I have no more nits to pick. The only question left for me is
> the big one of "do we like this with --filter, or should it be some kind
> of rev-list option", as discussed in:
>
>   https://lore.kernel.org/git/YHB7R8hVRt+V+i2W@coredump.intra.peff.net/

I do agree that adding "--blobs", "--trees", and "--tags" options to
the "--objects" (and implicit --commits) to rev-list parameters is a
lot more in line with the original design by Linus when we added
"--objects" (vs not giving it).  We even internally have had the "do
we show trees?" "do we show blobs?" separate bits from the very
beginning of the "--objects" feature at 9de48752 (git-rev-list: add
option to list all objects (not just commits), 2005-06-24), which
was extended to cover tags at 3c90f03d (Prepare git-rev-list for
tracking tag objects too, 2005-06-29).  The basic design principle
hasn't changed when the code was reorganized to be closer to the
current shape at ae563542 (First cut at libifying revlist
generation, 2006-02-25).

But back then, we didn't have mechanisms to filter rev-list output
using other criteria, which brought us the umbrella notation to use
"--filter=...", so as a notation, it might be OK, provided if

	git rev-list \
	    --filter=object:type=tag \
	    --filter=object:type=commit \
	    --filter=object:type=tree \
	    --filter=object:type=blob "$@ther args"

is an equivalent to existing

	git rev-list --objects "$@ther args"

I however have to wonder why this need so much code (in other words,
why do we need more than just adding something similar to this block
in the revision.c machinery:

	} else if (!strcmp(arg, "--objects")) {
		revs->tag_objects = 1;
		revs->tree_objects = 1;
		revs->blob_objects = 1;

that flips <type>_objects member bits?) though.
Patrick Steinhardt April 14, 2021, 11:59 a.m. UTC | #4
On Tue, Apr 13, 2021 at 02:03:12PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > Thanks. I have no more nits to pick. The only question left for me is
> > the big one of "do we like this with --filter, or should it be some kind
> > of rev-list option", as discussed in:
> >
> >   https://lore.kernel.org/git/YHB7R8hVRt+V+i2W@coredump.intra.peff.net/
> 
> I do agree that adding "--blobs", "--trees", and "--tags" options to
> the "--objects" (and implicit --commits) to rev-list parameters is a
> lot more in line with the original design by Linus when we added
> "--objects" (vs not giving it).  We even internally have had the "do
> we show trees?" "do we show blobs?" separate bits from the very
> beginning of the "--objects" feature at 9de48752 (git-rev-list: add
> option to list all objects (not just commits), 2005-06-24), which
> was extended to cover tags at 3c90f03d (Prepare git-rev-list for
> tracking tag objects too, 2005-06-29).  The basic design principle
> hasn't changed when the code was reorganized to be closer to the
> current shape at ae563542 (First cut at libifying revlist
> generation, 2006-02-25).
> 
> But back then, we didn't have mechanisms to filter rev-list output
> using other criteria, which brought us the umbrella notation to use
> "--filter=...", so as a notation, it might be OK, provided if
> 
> 	git rev-list \
> 	    --filter=object:type=tag \
> 	    --filter=object:type=commit \
> 	    --filter=object:type=tree \
> 	    --filter=object:type=blob "$@ther args"
> 
> is an equivalent to existing
> 
> 	git rev-list --objects "$@ther args"

Filters are currently AND filters, so specifying them multiple times
would cause us to print nothing. And I don't think we should treat
object:type filters any different compared to the other filters, because
that could limit our ability to iterate in the future where we may want
to add OR filters.

I initially said that I didn't want to add `object:type=tag,commit`
filters as a way to say that all types which are either a tag or commit
should be printed because I thought that having the ability to combine
filters with an OR instead of an AND would be the better design here.
But on the other hand, there is no reason we cannot have both, and it
would implement your above usecase, even though syntax is different.

> I however have to wonder why this need so much code (in other words,
> why do we need more than just adding something similar to this block
> in the revision.c machinery:
> 
> 	} else if (!strcmp(arg, "--objects")) {
> 		revs->tag_objects = 1;
> 		revs->tree_objects = 1;
> 		revs->blob_objects = 1;
> 
> that flips <type>_objects member bits?) though.

So if we make this part of the rev machinery, I guess your idea is that
we can just put the logic into `show_object()`? That would indeed be a
lot simpler to implement with a lot less code. But on the other hand,
it's also less efficient: we cannot stop walking down the graph like we
can with the current design when e.g. saying "Only give me
{tags,commits,trees}". And with bitmap indices, we can even skip all
objects of a certain type at once, without having to load the object,
right inside the filtering logic. If this was part of `show_object()`,
we wouldn't be able to do so (easily).

Anyway, this is assuming that I'm not misinterpreting what you mean by
your above comment. So please let me know if I misunderstood.

Patrick
Junio C Hamano April 14, 2021, 9:07 p.m. UTC | #5
Patrick Steinhardt <ps@pks.im> writes:

>> I however have to wonder why this need so much code (in other words,
>> why do we need more than just adding something similar to this block
>> in the revision.c machinery:
>> 
>> 	} else if (!strcmp(arg, "--objects")) {
>> 		revs->tag_objects = 1;
>> 		revs->tree_objects = 1;
>> 		revs->blob_objects = 1;
>> 
>> that flips <type>_objects member bits?) though.
>
> So if we make this part of the rev machinery, I guess your idea is that
> we can just put the logic into `show_object()`?

Not necessarily.  I was making a lot more naïve observations.

 * Even though we have 3 independent (tag|tree|blob)_objects bits,
   they are all set to true with "--objects" or they are cleared
   without.  There is no codepath that flips these bits to set some
   but not all of them to be set.

 * But if we look at the the hits from

    $ git grep -C2 -E -e '([.]|->)(tag|tree|blob)_objects' \*.[ch]

   it is clear that the code is (at least trying to be) prepared for
   them to be set independently.  The .tree_objects member is often
   checked without checking others to mark the tree objects on the
   edge of the range uninteresting, for example.

   It of course is unknown how well the code is actually prepared
   for these three bits to be set independently, as nobody can set
   these bits independently.

 * Which makes a naïve reader to wonder if it would be sufficient
   to have a silly option, like this:

 	} else if (!strcmp(arg, "--filter=object:type=tree")) {
 		revs->tag_objects = 0;
 		revs->tree_objects = 1;
 		revs->blob_objects = 0;

   in the same if/else if/... cascade as the above (and other types
   as well), in order to do the same thing as this series.

And, the above led to the question---the patches in your series
apparently do a lot more (even if we discount the option parsing
part), and I was wondering if that is because the independence
between these three bits the existing code aspires to maintain is
broken.

> Anyway, this is assuming that I'm not misinterpreting what you mean by
> your above comment. So please let me know if I misunderstood.
Jeff King April 15, 2021, 9:42 a.m. UTC | #6
On Tue, Apr 13, 2021 at 10:06:13AM +0200, Patrick Steinhardt wrote:

> I don't think it would help usability to add new `--show-blobs` and
> `--show-trees` options. The user interface to show this kind of
> information exists already with `--objects`, and by adding another way
> of asking a similar query would raise the question of how these two ways
> interact with each other:
> 
>     - Does `--show-blobs` have effect if `--objects` is not set?
> 
>     - Is `--objects` redundant if we have `--show-blobs`, or would
>       `--objects --show-blobs` list all objects regardless of whether
>       they're blobs or not?
> 
>     - What would happen if the user says `--show-blobs --no-objects`?
> 
>     - Are these options mutually exclusive?
> 
> We avoid all these questions by just adding it as a filter.

I'm not too worried about those. I'd imagine that "--objects" becomes a
documented synonym for "--show-trees --show-blobs --show-commits
--show-tags", and then the usual interactions take over.

But...

> Furthermore, the filter also allows future iterations which build on top
> of this. If we had a combined OR filter in addition to the existing
> combined AND filter, the user could say "Give me all blobs which aren't
> bigger than a specific size PLUS all trees with a depth smaller than 5
> PLUS all commits and tags". It's not like I'd know of a specific usecase
> for this right now, but I think the potential of having such filters in
> the future is a plus.

Yeah, that's true. My biggest complaint is lack of an OR filter, but we
could add that later. And then we would be _more_ flexible, as you note,
since we could and/or more filters.

So I'm OK proceeding with this direction.

-Peff
Jeff King April 15, 2021, 9:57 a.m. UTC | #7
On Wed, Apr 14, 2021 at 02:07:07PM -0700, Junio C Hamano wrote:

>  * But if we look at the the hits from
> 
>     $ git grep -C2 -E -e '([.]|->)(tag|tree|blob)_objects' \*.[ch]
> 
>    it is clear that the code is (at least trying to be) prepared for
>    them to be set independently.  The .tree_objects member is often
>    checked without checking others to mark the tree objects on the
>    edge of the range uninteresting, for example.
> 
>    It of course is unknown how well the code is actually prepared
>    for these three bits to be set independently, as nobody can set
>    these bits independently.

Yeah, as somebody who has added or touched a lot of those paths, I've
often wondered this myself: what would break if you asked for blobs but
not trees? I think it does not work like you'd expect, because
list-objects.c:process_tree() will not recurse the trees at all (and
hence you'd never see any blob).

>  * Which makes a naïve reader to wonder if it would be sufficient
>    to have a silly option, like this:
> 
>  	} else if (!strcmp(arg, "--filter=object:type=tree")) {
>  		revs->tag_objects = 0;
>  		revs->tree_objects = 1;
>  		revs->blob_objects = 0;
> 
>    in the same if/else if/... cascade as the above (and other types
>    as well), in order to do the same thing as this series.
> 
> And, the above led to the question---the patches in your series
> apparently do a lot more (even if we discount the option parsing
> part), and I was wondering if that is because the independence
> between these three bits the existing code aspires to maintain is
> broken.

I don't think the code Patrick added is that much more complex. Most if
it was cleaning up rough edges in the filtering system, and making sure
that bitmaps supported this style of filtering. I _think_ the existing
bitmap code would Just Work with the example you showed above. It uses
list-objects.c to do any fill-in traversal, and then
traverse_bitmap_commit_list() uses the type-bitmaps to filter the
results.

But it would be likewise broken for the case of "no trees, just
blobs", because of the problem in list-objects.c that I mentioned (but
worse, it would only be _half_ broken; we might even produce the right
answer if we don't have to do any fill-in traversal!).

Anyway. I do think this all could be done using the existing bits in
rev_info. But there is value in making it all part of the modular
filtering system.

-Peff
Junio C Hamano April 15, 2021, 5:53 p.m. UTC | #8
Jeff King <peff@peff.net> writes:

> Yeah, as somebody who has added or touched a lot of those paths, I've
> often wondered this myself: what would break if you asked for blobs but
> not trees? I think it does not work like you'd expect, because
> list-objects.c:process_tree() will not recurse the trees at all (and
> hence you'd never see any blob).

Ah, OK, that much I figured.  I was wondering if it is worth
"fixing", IOW, if we see blobs being asked without trees being
asked, perhaps process_tree() should be taught to descend---even if
it won't show the trees themselves, blobs contained within would.

>> And, the above led to the question---the patches in your series
>> apparently do a lot more (even if we discount the option parsing
>> part), and I was wondering if that is because the independence
>> between these three bits the existing code aspires to maintain is
>> broken.
> ...
> Anyway. I do think this all could be done using the existing bits in
> rev_info. But there is value in making it all part of the modular
> filtering system.

Yes, I do not have a problem with the approach to add new features
as part of the "modular filtering system".  But that leads me to
wonder into a different direction---coalesce (tag|tree|blob)_objects
members into a single bit, say all_objects, have "--objects" and
friends set that single bit, and update places like these to check
that single bit.

The patch is for illustration purposes; I didn't even aim to cover
the entirety of these two files.

 builtin/rev-list.c | 9 ++++-----
 pack-bitmap.c      | 4 +---
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git c/builtin/rev-list.c w/builtin/rev-list.c
index 7677b1af5a..b6f0d7e9a3 100644
--- c/builtin/rev-list.c
+++ w/builtin/rev-list.c
@@ -424,8 +424,7 @@ static int try_bitmap_count(struct rev_info *revs,
 	 * commits to traverse, since we don't know which objects go with which
 	 * commit.
 	 */
-	if (revs->max_count >= 0 &&
-	    (revs->tag_objects || revs->tree_objects || revs->blob_objects))
+	if (revs->max_count >= 0 && revs->all_objects)
 		return -1;
 
 	/*
@@ -439,9 +438,9 @@ static int try_bitmap_count(struct rev_info *revs,
 		return -1;
 
 	count_bitmap_commit_list(bitmap_git, &commit_count,
-				 revs->tree_objects ? &tree_count : NULL,
-				 revs->blob_objects ? &blob_count : NULL,
-				 revs->tag_objects ? &tag_count : NULL);
+				 revs->all_objects ? &tree_count : NULL,
+				 revs->all_objects ? &blob_count : NULL,
+				 revs->all_objects ? &tag_count : NULL);
 	if (max_count >= 0 && max_count < commit_count)
 		commit_count = max_count;
 
diff --git c/pack-bitmap.c w/pack-bitmap.c
index d7e9f14f92..918c80a391 100644
--- c/pack-bitmap.c
+++ w/pack-bitmap.c
@@ -801,9 +801,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 			continue;
 
 		obj = eindex->objects[i];
-		if ((obj->type == OBJ_BLOB && !revs->blob_objects) ||
-		    (obj->type == OBJ_TREE && !revs->tree_objects) ||
-		    (obj->type == OBJ_TAG && !revs->tag_objects))
+		if (!revs->all_objects && obj->type != OBJ_COMMIT)
 			continue;
 
 		show_reach(&obj->oid, obj->type, 0, eindex->hashes[i], NULL, 0);
Junio C Hamano April 15, 2021, 5:57 p.m. UTC | #9
Junio C Hamano <gitster@pobox.com> writes:

> ...  But that leads me to
> wonder into a different direction---coalesce (tag|tree|blob)_objects
> members into a single bit, say all_objects, have "--objects" and
> friends set that single bit, and update places like these to check
> that single bit.

Just to avoid misunderstanding, I am not saying this topic needs to
address any of this unifying of three bits.

It is just an idea for those interested to think about, when they
have nothing better to do, when the codebase is quiescent.
Junio C Hamano April 16, 2021, 10:06 p.m. UTC | #10
Jeff King <peff@peff.net> writes:

>> Furthermore, the filter also allows future iterations which build on top
>> of this. If we had a combined OR filter in addition to the existing
>> combined AND filter, the user could say "Give me all blobs which aren't
>> bigger than a specific size PLUS all trees with a depth smaller than 5
>> PLUS all commits and tags". It's not like I'd know of a specific usecase
>> for this right now, but I think the potential of having such filters in
>> the future is a plus.
>
> Yeah, that's true. My biggest complaint is lack of an OR filter, but we
> could add that later. And then we would be _more_ flexible, as you note,
> since we could and/or more filters.
>
> So I'm OK proceeding with this direction.

I think the only remaining issues are the comments on 5/8 on tests,
then?  Hopefully we can have one more iteration to finalize the
topic and merge it down to 'next'?

Thanks.
Junio C Hamano April 16, 2021, 11:15 p.m. UTC | #11
Junio C Hamano <gitster@pobox.com> writes:

> I think the only remaining issues are the comments on 5/8 on tests,
> then?  Hopefully we can have one more iteration to finalize the
> topic and merge it down to 'next'?
>
> Thanks.

I guess not.  I am guessing this topic is responsible for

  https://github.com/git/git/runs/2366364023?check_suite_focus=true#step:4:115
Ramsay Jones April 17, 2021, 1:17 a.m. UTC | #12
On Fri, Apr 16, 2021 at 04:15:39PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > I think the only remaining issues are the comments on 5/8 on tests,
> > then?  Hopefully we can have one more iteration to finalize the
> > topic and merge it down to 'next'?
> >
> > Thanks.
> 
> I guess not.  I am guessing this topic is responsible for
> 
>   https://github.com/git/git/runs/2366364023?check_suite_focus=true#step:4:115

Yes, I noticed this a few days ago, and tried the obvious fix (ie to
#include "cache.h" at the start of the list-objects-filter-options.h
header file) which does indeed work fine. However, I then thought that
moving the definition of 'enum object_type' (along with the TYPE_BITS
#define) to the 'object.h' header file (and #include "object.h" into
cache.h) would be a better idea...

Having done that, I wondered how many '#include "cache.h"' could be
replaced by "object.h", and ... well, that was a few days ago and
something came up...

ATB,
Ramsay Jones
Jeff King April 17, 2021, 8:58 a.m. UTC | #13
On Thu, Apr 15, 2021 at 10:57:51AM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > ...  But that leads me to
> > wonder into a different direction---coalesce (tag|tree|blob)_objects
> > members into a single bit, say all_objects, have "--objects" and
> > friends set that single bit, and update places like these to check
> > that single bit.
> 
> Just to avoid misunderstanding, I am not saying this topic needs to
> address any of this unifying of three bits.
> 
> It is just an idea for those interested to think about, when they
> have nothing better to do, when the codebase is quiescent.

It does feel like going "backwards" in a sense. We have the three flags
mostly split, and we'd lose that distinction. On the other hand, if the
current split is imperfect, it may be leading people down a confusing
path (I _think_ this "trees must be set in order to see blobs" thing is
the only real gotcha, but there could be others).

There's some other discussion in this old thread:

  https://lore.kernel.org/git/06a84f8c77924b275606384ead8bb2fd7d75f7b6.1487984670.git.jonathantanmy@google.com/

(I didn't remember it, but my spider sense tingling caused me to dig in
the archive a bit).

-Peff
Jeff King April 17, 2021, 9:01 a.m. UTC | #14
On Sat, Apr 17, 2021 at 02:17:32AM +0100, Ramsay Jones wrote:

> On Fri, Apr 16, 2021 at 04:15:39PM -0700, Junio C Hamano wrote:
> > Junio C Hamano <gitster@pobox.com> writes:
> > 
> > > I think the only remaining issues are the comments on 5/8 on tests,
> > > then?  Hopefully we can have one more iteration to finalize the
> > > topic and merge it down to 'next'?
> > >
> > > Thanks.
> > 
> > I guess not.  I am guessing this topic is responsible for
> > 
> >   https://github.com/git/git/runs/2366364023?check_suite_focus=true#step:4:115
> 
> Yes, I noticed this a few days ago, and tried the obvious fix (ie to
> #include "cache.h" at the start of the list-objects-filter-options.h
> header file) which does indeed work fine. However, I then thought that
> moving the definition of 'enum object_type' (along with the TYPE_BITS
> #define) to the 'object.h' header file (and #include "object.h" into
> cache.h) would be a better idea...
> 
> Having done that, I wondered how many '#include "cache.h"' could be
> replaced by "object.h", and ... well, that was a few days ago and
> something came up...

I agree that would probably be nice, but let's not hold up the topic. It
can include "cache.h" like a bunch of other headers in the meantime.

(Too bad we can't just forward-declare the enum like we do for some
other types, but it's impossible in C).

-Peff
Junio C Hamano April 17, 2021, 9:45 p.m. UTC | #15
Jeff King <peff@peff.net> writes:

> I agree that would probably be nice, but let's not hold up the topic. It
> can include "cache.h" like a bunch of other headers in the meantime.
>
> (Too bad we can't just forward-declare the enum like we do for some
> other types, but it's impossible in C).

For now...

From f08346ce05e5512e5914e760aba769d0ef8035bc Mon Sep 17 00:00:00 2001
From: Junio C Hamano <gitster@pobox.com>
Date: Sat, 17 Apr 2021 14:44:37 -0700
Subject: [PATCH] fixup! list-objects: implement object type filter

---
 list-objects-filter-options.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index 4d0d0588cc..da5b6737e2 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -1,6 +1,7 @@
 #ifndef LIST_OBJECTS_FILTER_OPTIONS_H
 #define LIST_OBJECTS_FILTER_OPTIONS_H
 
+#include "cache.h"
 #include "parse-options.h"
 #include "string-list.h"