mbox series

[v2,0/8] rev-parse: implement object type filter

Message ID cover.1615813673.git.ps@pks.im (mailing list archive)
Headers show
Series rev-parse: implement object type filter | expand

Message

Patrick Steinhardt March 15, 2021, 1:14 p.m. UTC
Hi,

this is the second version of my patch series which implements a new
`object:type` filter for git-rev-parse(1) and git-upload-pack(1) and
extends support for bitmap indices to work with combined filters.

Changes compared to v1:

    - I've added a patch up front which changes the uploadpack
      documentation to explicitly document that setting
      `uploadpackfilter.allow=true` will enable all future filters. I'm
      not yet saying that this is the correct thing to do, but rather
      added this patch such that we have a proper place to discuss this
      topic. In the context of object-type filters, I do think though
      that it's not an issue to default-enable type filters: they're not
      expensive to compute anyway.

    - `uploadpackfilter.<filter>.allow` documentation was updated to
      mention the new filter.

    - A bug was fixed which caused us to reset `--filter-allowed` in
      case a normal filter was converted to a combined filter. I've
      added tests to more thoroughly verify that filters work as
      expected and also filter provided objects.

Please see the attached range-diff for more details.

Patrick

Patrick Steinhardt (8):
  uploadpack.txt: document implication of `uploadpackfilter.allow`
  revision: mark commit parents as NOT_USER_GIVEN
  list-objects: move tag processing into its own function
  list-objects: support filtering by tag and commit
  list-objects: implement object type filter
  pack-bitmap: implement object type filter
  pack-bitmap: implement combined filter
  rev-list: allow filtering of provided items

 Documentation/config/uploadpack.txt |   9 ++-
 Documentation/rev-list-options.txt  |   3 +
 builtin/rev-list.c                  |  14 ++++
 list-objects-filter-options.c       |  18 +++++
 list-objects-filter-options.h       |   8 ++
 list-objects-filter.c               | 116 ++++++++++++++++++++++++++++
 list-objects-filter.h               |   2 +
 list-objects.c                      |  32 +++++++-
 pack-bitmap.c                       |  71 +++++++++++++++--
 revision.c                          |   4 +-
 revision.h                          |   3 -
 t/t6112-rev-list-filters-objects.sh |  76 ++++++++++++++++++
 t/t6113-rev-list-bitmap-filters.sh  |  68 +++++++++++++++-
 13 files changed, 403 insertions(+), 21 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  270ff80dac uploadpack.txt: document implication of `uploadpackfilter.allow`
1:  f2ce5dac89 = 2:  ddbec75986 revision: mark commit parents as NOT_USER_GIVEN
2:  9feadba124 = 3:  d8da0b24f4 list-objects: move tag processing into its own function
3:  4aa13ee83f = 4:  5545c189c5 list-objects: support filtering by tag and commit
4:  01b9fdbb9c ! 5:  acf01472af list-objects: implement object type filter
    @@ Commit message
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    + ## Documentation/config/uploadpack.txt ##
    +@@ Documentation/config/uploadpack.txt: uploadpackfilter.allow::
    + uploadpackfilter.<filter>.allow::
    + 	Explicitly allow or ban the object filter corresponding to
    + 	`<filter>`, where `<filter>` may be one of: `blob:none`,
    +-	`blob:limit`, `tree`, `sparse:oid`, or `combine`. If using
    +-	combined filters, both `combine` and all of the nested filter
    +-	kinds must be allowed. Defaults to `uploadpackfilter.allow`.
    ++	`blob:limit`, `object:type`, `tree`, `sparse:oid`, or `combine`.
    ++	If using combined filters, both `combine` and all of the nested
    ++	filter kinds must be allowed. Defaults to `uploadpackfilter.allow`.
    + 
    + uploadpackfilter.tree.maxDepth::
    + 	Only allow `--filter=tree:<n>` when `<n>` is no more than the value of
    +
      ## Documentation/rev-list-options.txt ##
     @@ Documentation/rev-list-options.txt: or units.  n may be zero.  The suffixes k, m, and g can be used to name
      units in KiB, MiB, or GiB.  For example, 'blob:limit=1k' is the same
5:  c97fd28d8f = 6:  8073ab665b pack-bitmap: implement object type filter
6:  fe2b7a1e55 = 7:  fac3477d97 pack-bitmap: implement combined filter
7:  b43bf401df ! 8:  0e26fee8b3 rev-list: allow filtering of provided items
    @@ builtin/rev-list.c: int cmd_rev_list(int argc, const char **argv, const char *pr
      		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
      	if (arg_missing_action == MA_PRINT)
     
    + ## list-objects-filter-options.c ##
    +@@ list-objects-filter-options.c: static void transform_to_combine_type(
    + 		memset(filter_options, 0, sizeof(*filter_options));
    + 		filter_options->sub = sub_array;
    + 		filter_options->sub_alloc = initial_sub_alloc;
    ++		filter_options->filter_wants = sub_array[0].filter_wants;
    + 	}
    + 	filter_options->sub_nr = 1;
    + 	filter_options->choice = LOFC_COMBINE;
    +@@ list-objects-filter-options.c: void parse_list_objects_filter(
    + 		parse_error = gently_parse_list_objects_filter(
    + 			&filter_options->sub[filter_options->sub_nr - 1], arg,
    + 			&errbuf);
    ++		if (!parse_error)
    ++			filter_options->sub[filter_options->sub_nr - 1].filter_wants =
    ++				filter_options->filter_wants;
    + 	}
    + 	if (parse_error)
    + 		die("%s", errbuf.buf);
    +
      ## list-objects-filter-options.h ##
     @@ list-objects-filter-options.h: struct list_objects_filter_options {
      	 */
    @@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'object:type filter' '
      '
      
     +test_expect_success 'object:type filter with --filter-provided' '
    -+	git rev-list --objects --filter=object:type=tag --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=tag tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=tag --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=tag tag >actual &&
     +	test_cmp expect actual &&
     +
    -+	git rev-list --objects --filter=object:type=commit --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=commit tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=commit --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=commit tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter=object:type=tree --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=tree tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=tree --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=tree tag >actual &&
     +	test_bitmap_traversal expect actual &&
     +
    -+	git rev-list --objects --filter=object:type=blob --filter-provided tag >expect &&
    ++	git rev-list --objects --filter-provided --filter=object:type=blob tag >expect &&
     +	git rev-list --use-bitmap-index \
    -+		     --objects --filter=object:type=blob --filter-provided tag >actual &&
    ++		     --objects --filter-provided --filter=object:type=blob tag >actual &&
     +	test_bitmap_traversal expect actual
     +'
     +
      test_expect_success 'combine filter' '
      	git rev-list --objects --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
      	git rev-list --use-bitmap-index \
    +@@ t/t6113-rev-list-bitmap-filters.sh: test_expect_success 'combine filter' '
    + 	test_bitmap_traversal expect actual
    + '
    + 
    ++test_expect_success 'combine filter with --filter-provided' '
    ++	git rev-list --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >expect &&
    ++	git rev-list --use-bitmap-index \
    ++		     --objects --filter-provided --filter=blob:limit=1000 --filter=object:type=blob tag >actual &&
    ++	test_bitmap_traversal expect actual &&
    ++
    ++	git cat-file --batch-check="%(objecttype) %(objectsize)" <actual >objects &&
    ++	while read objecttype objectsize
    ++	do
    ++		test "$objecttype" = blob || return 1
    ++		test "$objectsize" -le 1000 || return 1
    ++	done <objects
    ++'
    ++
    + test_done

Comments

Junio C Hamano March 20, 2021, 9:10 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> this is the second version of my patch series which implements a new
> `object:type` filter for git-rev-parse(1) and git-upload-pack(1) and
> extends support for bitmap indices to work with combined filters.
> ...
> Please see the attached range-diff for more details.

Any comment from stakeholders?

Thanks.
Jeff King April 6, 2021, 6:08 p.m. UTC | #2
On Sat, Mar 20, 2021 at 02:10:41PM -0700, Junio C Hamano wrote:

> Patrick Steinhardt <ps@pks.im> writes:
> 
> > this is the second version of my patch series which implements a new
> > `object:type` filter for git-rev-parse(1) and git-upload-pack(1) and
> > extends support for bitmap indices to work with combined filters.
> > ...
> > Please see the attached range-diff for more details.
> 
> Any comment from stakeholders?

Sorry, this languished on my to-review list for a while.

I took a careful look. I found a few small nits, but the code overall
looks pretty good.

I do still find the use of the filter code here a _little_ bit
off-putting. It makes perfect sense in some ways: we are asking rev-list
to filter the output, and it keeps our implementation nice and simple.
It took me a while to figure out what I think makes it weird, but I
think it's:

  - the partial-clone feature exposes the filter mechanism in a very
    transparent way. So while it's not _wrong_ to be able to ask for a
    partial clone of only trees, it's an odd thing that nobody would
    really use in practice. And so it's a bit funny that it gets
    documented alongside blob:limit, etc.

  - for the same reason, it's very rigid. We have no way to say "this
    filter OR that filter", and are unlikely to grow them (because this
    is all part of the network protocol). Whereas it's perfectly
    reasonable for somebody to ask for "trees and blobs" via rev-list.

I dunno. Those aren't objections exactly. Just trying to put my finger
on why my initial reaction was "huh, why --filter?".

-Peff
Patrick Steinhardt April 9, 2021, 11:14 a.m. UTC | #3
On Tue, Apr 06, 2021 at 02:08:52PM -0400, Jeff King wrote:
> On Sat, Mar 20, 2021 at 02:10:41PM -0700, Junio C Hamano wrote:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> > 
> > > this is the second version of my patch series which implements a new
> > > `object:type` filter for git-rev-parse(1) and git-upload-pack(1) and
> > > extends support for bitmap indices to work with combined filters.
> > > ...
> > > Please see the attached range-diff for more details.
> > 
> > Any comment from stakeholders?
> 
> Sorry, this languished on my to-review list for a while.
> 
> I took a careful look. I found a few small nits, but the code overall
> looks pretty good.
> 
> I do still find the use of the filter code here a _little_ bit
> off-putting. It makes perfect sense in some ways: we are asking rev-list
> to filter the output, and it keeps our implementation nice and simple.
> It took me a while to figure out what I think makes it weird, but I
> think it's:
> 
>   - the partial-clone feature exposes the filter mechanism in a very
>     transparent way. So while it's not _wrong_ to be able to ask for a
>     partial clone of only trees, it's an odd thing that nobody would
>     really use in practice. And so it's a bit funny that it gets
>     documented alongside blob:limit, etc.
> 
>   - for the same reason, it's very rigid. We have no way to say "this
>     filter OR that filter", and are unlikely to grow them (because this
>     is all part of the network protocol). Whereas it's perfectly
>     reasonable for somebody to ask for "trees and blobs" via rev-list.
> 
> I dunno. Those aren't objections exactly. Just trying to put my finger
> on why my initial reaction was "huh, why --filter?".

Yeah, I do kind of share these concerns. Ideally, we'd provide a nicer
only-user-facing interface to query the repository for various objects.
git-cat-file(1) would be the obvious thing that first gets into my mind,
where it would be nice to have it filter stuff. But then on the other
hand, it's really rather a simple "Give me what I tell you to" binary,
which is probably a good thing. Other than that I don't think there's
any executable that'd be a good fit -- we could do this via a new
git-list-objects(1), but then again git-rev-list(1) already does most of
what git-list-objects(1) would do, so why bother.

It kind of feels like git-checkout(1) to me: it does many things, and if
you know how to wield it it works perfectly fine. But the user interface
is lacking, which is why it was split up into git-switch(1) and
git-restore(1). It's telling already that the summary of git-rev-list(1)
is "Lists commit objects in reverse chronological order". I mean yes,
that's what it does in many cases. But there's just as many cases where
it doesn't.

Patrick
Jeff King April 9, 2021, 4:05 p.m. UTC | #4
On Fri, Apr 09, 2021 at 01:14:26PM +0200, Patrick Steinhardt wrote:

> > I dunno. Those aren't objections exactly. Just trying to put my finger
> > on why my initial reaction was "huh, why --filter?".
> 
> Yeah, I do kind of share these concerns. Ideally, we'd provide a nicer
> only-user-facing interface to query the repository for various objects.
> git-cat-file(1) would be the obvious thing that first gets into my mind,
> where it would be nice to have it filter stuff. But then on the other
> hand, it's really rather a simple "Give me what I tell you to" binary,
> which is probably a good thing. Other than that I don't think there's
> any executable that'd be a good fit -- we could do this via a new
> git-list-objects(1), but then again git-rev-list(1) already does most of
> what git-list-objects(1) would do, so why bother.

I don't think cat-file does quite the same thing. An important part of
rev-list is that it is traversing. So it is determining both
reachability, but also eliminating excluded objects. For example, there
is no cat-file equivalent (and can never be) of:

  git rev-list --objects --filter=object:type=blob $old..$new

Likewise for list-objects (which cat-file really _does_ cover, with
--batch-all-objects). Obviously you can pair rev-list with cat-file to
traverse and then filter, but the whole point of this series is to do so
more efficiently.

So I think putting this into rev-list is the only sensible option. The
question is just whether to use --filter, or if it should be:

  git rev-list --show-blobs --show-trees $old..$new

with rules like:

  - if no --show-X is given, show only commits

  - if one or more --show-X is given, show all of them (but nothing else)

  - --objects is equivalent to providing each of --show-commits
    --show-blobs --show-trees --show-tags

-Peff