[0/8] Additional metadata for filter processes

Message ID 20200310182046.748959-1-sandals@crustytoothpaste.net (mailing list archive)

brian m. carlson March 10, 2020, 6:20 p.m. UTC
Smudge and clean filters are currently provided with only one piece
of data: the pathname of the file being filtered.  While this is
helpful, there are a variety of situations where people would like to
have more data.

One such situation is for users who would like to have a custom
ident-style filter that contains the branch name.  In many cases, it's
sufficient to look up this information based on HEAD, but during
checkout, HEAD does not point to the right place, since it's updated
after the files are written.

Users also frequently want to know the commit's object ID and the
object ID of the blob being filtered.  For example, if
filtering is expensive and the filter process sees duplicate blobs
during checkout, it may cache the results and avoid having to compute
the filter twice.
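
As a rough sketch of the caching idea above, a filter process could key
its results on the blob object ID.  The names here (BlobCache,
expensive_filter) are hypothetical and are not part of Git or of this
series; the sketch assumes the blob ID reaches the filter as a hex
string and that the filter's output depends only on the blob contents.

```python
from typing import Callable, Dict, Optional


class BlobCache:
    """Memoize filter output by blob object ID (hypothetical helper)."""

    def __init__(self, run_filter: Callable[[bytes], bytes]) -> None:
        self._run_filter = run_filter
        self._results: Dict[str, bytes] = {}
        self.misses = 0  # number of times the real filter had to run

    def smudge(self, blob_id: Optional[str], content: bytes) -> bytes:
        # Without a blob ID there is nothing safe to key on, so just filter.
        if blob_id is None:
            self.misses += 1
            return self._run_filter(content)
        # Run the real filter only the first time this blob is seen.
        if blob_id not in self._results:
            self.misses += 1
            self._results[blob_id] = self._run_filter(content)
        return self._results[blob_id]


def expensive_filter(content: bytes) -> bytes:
    # Stand-in for a costly transformation; an assumption for illustration.
    return content.upper()
```

A cache like this is only valid when the output does not also depend on
the pathname or the ref; a filter that uses those values would need to
include them in the cache key.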

This series provides an additional set of metadata to the filter
process with the keys "ref", "treeish", and "blob".  We prefer to
provide a commit as the treeish whenever possible, but in some cases,
such as when git archive is invoked with a tree, there is no commit, and
we use the tree instead.
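
To illustrate how a filter process might pick up these keys, here is a
minimal sketch that reads the key=value packets of one request up to
the flush packet.  It assumes the new keys arrive in the same pkt-line
list as "command" and "pathname"; the helper names (pkt_line,
read_metadata) are hypothetical.  Because the keys are not always
provided, callers should look them up with a default rather than
assume they are present.

```python
import io
from typing import BinaryIO, Dict


def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def read_metadata(stream: BinaryIO) -> Dict[str, str]:
    """Read key=value pkt-lines until the flush packet (0000)."""
    metadata: Dict[str, str] = {}
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise EOFError("stream ended before flush packet")
        length = int(header.decode("ascii"), 16)
        if length == 0:  # flush packet terminates the list
            return metadata
        # The length counts the 4 header bytes, so the payload is length - 4.
        line = stream.read(length - 4).decode("utf-8").rstrip("\n")
        key, _, value = line.partition("=")
        metadata[key] = value
```

For example, feeding it the packets "command=smudge", "pathname=a.txt",
"ref=refs/heads/main", "treeish=<oid>", and "blob=<oid>" followed by a
flush packet yields a dictionary with those five keys; the ref and
pathname shown here are made-up values.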

Note that we don't provide this metadata in all cases.  Sometimes it is
trivial for the filter to do a simple "git rev-parse HEAD", and in such
cases, metadata other than the blob may not be provided.  We also don't
handle the case where the user is using a smudge or clean command
instead of a filter process command: if the user wants the additional
metadata, it should be possible for them to write a small filter
process, which is reasonably trivial in most languages.  Our
documentation already permits us to add additional metadata and
guarantees only that the pathname will be provided.

My particular use case for this is prefetching and precomputing data
during archive generation, since we don't permit delayed filters there
due to archives needing to be in a predictable order.  I have tried to
make it as generally applicable as possible, since I can imagine (and
have indeed seen requests for) many other useful applications of this
elsewhere.

Feedback is of course welcome.

brian m. carlson (8):
  builtin/checkout: pass branch info down to checkout_worktree
  convert: permit passing additional metadata to filter processes
  convert: provide additional metadata to filters
  builtin/checkout: compute checkout metadata for checkouts
  builtin/clone: compute checkout metadata for clones
  builtin/rebase: compute checkout metadata for rebases
  builtin/reset: compute checkout metadata for reset
  t0021: test filter metadata for additional cases

 apply.c                 |   2 +-
 archive.c               |  13 ++-
 archive.h               |   1 +
 builtin/cat-file.c      |   5 +-
 builtin/checkout.c      |  54 +++++++----
 builtin/clone.c         |   6 +-
 builtin/rebase.c        |   1 +
 builtin/reset.c         |  16 +++-
 cache.h                 |   1 +
 convert.c               |  66 ++++++++++++--
 convert.h               |  29 +++++-
 diff.c                  |   5 +-
 entry.c                 |   7 +-
 merge-recursive.c       |   2 +-
 merge.c                 |   1 +
 sequencer.c             |   1 +
 t/t0021-conversion.sh   | 198 ++++++++++++++++++++++++++++++++++------
 t/t0021/rot13-filter.pl |   6 ++
 unpack-trees.c          |   1 +
 unpack-trees.h          |   1 +
 20 files changed, 341 insertions(+), 75 deletions(-)