mbox series

[RFC,0/3] implement composite filters

Message ID cover.1558030802.git.matvore@google.com (mailing list archive)
Headers show
Series implement composite filters | expand

Message

Matthew DeVore May 16, 2019, 6:56 p.m. UTC
Here is a first stab at composite filters. It does not actually support omits,
but the biggest difficulties of the implementation are already addressed. So I
decided to send out an early version to give interested people an idea of just
what is needed to implement it, and then give them a chance to change my mind
(Jonathan T. especially was concerned about code complexity).

Thank you,

Matthew DeVore (3):
  list-objects-filter: refactor into a context struct
  list-objects-filter-options: error is localizeable
  list-objects-filter: implement composite filters

 Documentation/rev-list-options.txt     |   8 +
 contrib/completion/git-completion.bash |   2 +-
 list-objects-filter-options.c          | 132 +++++++++++-
 list-objects-filter-options.h          |  14 +-
 list-objects-filter.c                  | 272 ++++++++++++++++---------
 list-objects-filter.h                  |  31 ++-
 list-objects.c                         |  45 ++--
 t/t6112-rev-list-filters-objects.sh    | 108 +++++++++-
 8 files changed, 470 insertions(+), 142 deletions(-)

Comments

Jonathan Tan May 16, 2019, 10:41 p.m. UTC | #1
> Here is a first stab at composite filters. It does not actually support omits,
> but the biggest difficulties of the implementation are already addressed. So I
> decided to send out an early version to give interested people an idea of just
> what is needed to implement it, and then give them a chance to change my mind
> (Jonathan T. especially was concerned about code complexity).

Thanks - seeing these patches reduces my concerns significantly. A
composite filter has LHS and RHS filters, either of which can be
composite themselves, so we support compositing arbitrary numbers of
filters. I see that the approach is to run LHS and RHS in lockstep, with
our own handling of the result flags:

- LOFR_MARK_SEEN is tracked for LHS and RHS separately. To support an
  arbitrary number of filters, we don't use object flags to track this,
  so we use oidsets instead. I don't think that the extra memory usage
  will be a problem (we already allocate more for all the struct
  object). If this is an issue in the future, we can switch to using
  object flags for the first N filters, and oidsets thereafter.

- LOFR_SKIP_TREE is simulated if only one filter wants to skip the tree.

- I haven't fully figured out LOFR_DO_SHOW yet. It seems to me that if
  an object appears twice in the walk, and the LHS says LOFR_DO_SHOW on
  the first occurrence, if the RHS says LOFR_DO_SHOW on the second
  occurrence, the object will be shown twice. But perhaps this isn't a
  problem - as it is, I think that a filter can call LOFR_DO_SHOW
  multiple times on the same object anyway. In any case, if this turns
  out to be a problem, it seems surmountable to me.
Matthew DeVore May 17, 2019, 12:01 a.m. UTC | #2
> On 2019/05/16, at 15:41, Jonathan Tan <jonathantanmy@google.com> wrote:
> 
> Thanks - seeing these patches reduces my concerns significantly. A

Thank you for taking a look :)

> 
> - LOFR_MARK_SEEN is tracked for LHS and RHS separately. To support an
>  arbitrary number of filters, we don't use object flags to track this,
>  so we use oidsets instead. I don't think that the extra memory usage
>  will be a problem (we already allocate more for all the struct
>  object). If this is an issue in the future, we can switch to using
>  object flags for the first N filters, and oidsets thereafter.

Yup. Another possibility that comes to mind is that when both the lhs and rhs seen sets contain the same object, we promote it to a combined set, and remove it from the individual ones at that time.

> 
> - LOFR_SKIP_TREE is simulated if only one filter wants to skip the tree.
> 
> - I haven't fully figured out LOFR_DO_SHOW yet. It seems to me that if
>  an object appears twice in the walk, and the LHS says LOFR_DO_SHOW on
>  the first occurrence, if the RHS says LOFR_DO_SHOW on the second
>  occurrence, the object will be shown twice. But perhaps this isn't a

LOFR_DO_SHOW is only propagated upward from the combine: filter if both children indicate LOFR_DO_SHOW for the same object at the same point in the traversal (see the line that has "combined_result |= LOFR_DO_SHOW"). In the scenario you draw out, the object won’t be shown at all, since the first occurrence is filtered out for one reason, and the second is filtered out for a separate reason. This may happen when a sparse: filter includes a "deep" blob but excludes the same blob at a shallower point in the tree, and a tree: filter does the opposite with the same blob. Just thinking about it now for a moment, this seems intuitive and reasonable.