mbox series

[v3,00/19] Adds reftable library code from https://github.com/hanwen/reftable.

Message ID pull.1081.v3.git.git.1632841817.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Adds reftable library code from https://github.com/hanwen/reftable. | expand

Message

Koji Nakamaru via GitGitGadget Sept. 28, 2021, 3:09 p.m. UTC
The reftable format is described in Documentation/technical/reftable.txt.

This is a fully reentrant implementation of reading and writing the reftable
file format, and should be suitable for embedding in libgit2 too. It does
not hook the code up to git to function as a ref storage backend yet.

v2:

 * fold in OpenBSD fixes by Carlo Belón.

Han-Wen Nienhuys (19):
  hash.h: provide constants for the hash IDs
  reftable: RFC: add LICENSE
  reftable: add error related functionality
  reftable: utility functions
  reftable: add blocksource, an abstraction for random access reads
  reftable: (de)serialization for the polymorphic record type.
  Provide zlib's uncompress2 from compat/zlib-compat.c
  reftable: reading/writing blocks
  reftable: a generic binary tree implementation
  reftable: write reftable files
  reftable: generic interface to tables
  reftable: read reftable files
  reftable: reftable file level tests
  reftable: add a heap-based priority queue for reftable records
  reftable: add merged table view
  reftable: implement refname validation
  reftable: implement stack, a mutable database of reftable files.
  reftable: add dump utility
  Add "test-tool dump-reftable" command.

 Makefile                                   |   53 +-
 ci/lib.sh                                  |    1 +
 compat/.gitattributes                      |    1 +
 compat/zlib-uncompress2.c                  |   95 ++
 config.mak.uname                           |    4 +
 configure.ac                               |   13 +
 contrib/buildsystems/CMakeLists.txt        |   14 +-
 contrib/buildsystems/Generators/Vcxproj.pm |   11 +-
 hash.h                                     |    6 +
 object-file.c                              |    7 +-
 reftable/LICENSE                           |   31 +
 reftable/basics.c                          |  128 ++
 reftable/basics.h                          |   60 +
 reftable/basics_test.c                     |   98 ++
 reftable/block.c                           |  439 ++++++
 reftable/block.h                           |  127 ++
 reftable/block_test.c                      |  120 ++
 reftable/blocksource.c                     |  148 +++
 reftable/blocksource.h                     |   22 +
 reftable/constants.h                       |   21 +
 reftable/dump.c                            |  107 ++
 reftable/error.c                           |   41 +
 reftable/generic.c                         |  169 +++
 reftable/generic.h                         |   32 +
 reftable/iter.c                            |  194 +++
 reftable/iter.h                            |   69 +
 reftable/merged.c                          |  362 +++++
 reftable/merged.h                          |   35 +
 reftable/merged_test.c                     |  292 ++++
 reftable/pq.c                              |  105 ++
 reftable/pq.h                              |   33 +
 reftable/pq_test.c                         |   82 ++
 reftable/publicbasics.c                    |   58 +
 reftable/reader.c                          |  801 +++++++++++
 reftable/reader.h                          |   66 +
 reftable/readwrite_test.c                  |  652 +++++++++
 reftable/record.c                          | 1212 +++++++++++++++++
 reftable/record.h                          |  139 ++
 reftable/record_test.c                     |  412 ++++++
 reftable/refname.c                         |  209 +++
 reftable/refname.h                         |   29 +
 reftable/refname_test.c                    |  102 ++
 reftable/reftable-blocksource.h            |   49 +
 reftable/reftable-error.h                  |   62 +
 reftable/reftable-generic.h                |   47 +
 reftable/reftable-iterator.h               |   39 +
 reftable/reftable-malloc.h                 |   18 +
 reftable/reftable-merged.h                 |   72 +
 reftable/reftable-reader.h                 |  101 ++
 reftable/reftable-record.h                 |  114 ++
 reftable/reftable-stack.h                  |  128 ++
 reftable/reftable-tests.h                  |   23 +
 reftable/reftable-writer.h                 |  148 +++
 reftable/reftable.c                        |  115 ++
 reftable/stack.c                           | 1396 ++++++++++++++++++++
 reftable/stack.h                           |   41 +
 reftable/stack_test.c                      |  949 +++++++++++++
 reftable/system.h                          |   32 +
 reftable/test_framework.c                  |   23 +
 reftable/test_framework.h                  |   53 +
 reftable/tree.c                            |   63 +
 reftable/tree.h                            |   34 +
 reftable/tree_test.c                       |   61 +
 reftable/writer.c                          |  690 ++++++++++
 reftable/writer.h                          |   50 +
 t/helper/test-reftable.c                   |   21 +
 t/helper/test-tool.c                       |    4 +-
 t/helper/test-tool.h                       |    2 +
 t/t0032-reftable-unittest.sh               |   15 +
 69 files changed, 10938 insertions(+), 12 deletions(-)
 create mode 100644 compat/.gitattributes
 create mode 100644 compat/zlib-uncompress2.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/basics_test.c
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/blocksource.c
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/error.c
 create mode 100644 reftable/generic.c
 create mode 100644 reftable/generic.h
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/pq_test.c
 create mode 100644 reftable/publicbasics.c
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/readwrite_test.c
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-blocksource.h
 create mode 100644 reftable/reftable-error.h
 create mode 100644 reftable/reftable-generic.h
 create mode 100644 reftable/reftable-iterator.h
 create mode 100644 reftable/reftable-malloc.h
 create mode 100644 reftable/reftable-merged.h
 create mode 100644 reftable/reftable-reader.h
 create mode 100644 reftable/reftable-record.h
 create mode 100644 reftable/reftable-stack.h
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable-writer.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 t/helper/test-reftable.c
 create mode 100755 t/t0032-reftable-unittest.sh


base-commit: ddb1055343948e0d0bc81f8d20245f1ada6430a0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1081%2Fhanwen%2Fjust-library-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1081/hanwen/just-library-v3
Pull-Request: https://github.com/git/git/pull/1081

Range-diff vs v2:

  1:  8534c69cf84 =  1:  b2f8dc32bfc hash.h: provide constants for the hash IDs
  2:  52968d8e83f =  2:  2597ddfeae1 reftable: RFC: add LICENSE
  3:  84c8b396df5 =  3:  41b1ff1293f reftable: add error related functionality
  4:  fa58f3d73e7 =  4:  c03043c5e63 reftable: utility functions
  5:  e9206532126 =  5:  317a6aaa357 reftable: add blocksource, an abstraction for random access reads
  6:  651e195fe77 =  6:  55e66b7a74f reftable: (de)serialization for the polymorphic record type.
  7:  e595dbc6c84 !  7:  4f6ecd3646a Provide zlib's uncompress2 from compat/zlib-compat.c
     @@ config.mak.uname: ifeq ($(uname_S),FreeBSD)
       	FILENO_IS_A_MACRO = UnfortunatelyYes
       endif
       ifeq ($(uname_S),OpenBSD)
     -+	NO_UNCOMPRESS2 = YesPlease
      +	# Versions < 7.0 need compatibility layer
      +	ifeq ($(shell expr "$(uname_R)" : "[1-6]\."),2)
      +		NO_UNCOMPRESS2 = UnfortunatelyYes
  8:  97f7ee04886 =  8:  04f2d74df51 reftable: reading/writing blocks
  9:  8d2e8be5795 =  9:  f6c2da61208 reftable: a generic binary tree implementation
 10:  965af6ec065 = 10:  277a1f662e3 reftable: write reftable files
 11:  4e259380fda = 11:  f97182dec1f reftable: generic interface to tables
 12:  a43d3512164 = 12:  13de5d03a71 reftable: read reftable files
 13:  f0e159cd853 = 13:  8efc3c2b6ab reftable: reftable file level tests
 14:  5a5c108d601 = 14:  e89b16a37e6 reftable: add a heap-based priority queue for reftable records
 15:  905ffa58f36 = 15:  40a91b14be8 reftable: add merged table view
 16:  e9598889af2 = 16:  46c2ddf07cf reftable: implement refname validation
 17:  8b0ee68c636 ! 17:  3db887bc078 reftable: implement stack, a mutable database of reftable files.
     @@ reftable/stack.c (new)
      +
      +int read_lines(const char *filename, char ***namesp)
      +{
     -+	int fd = open(filename, O_RDONLY, 0644);
     ++	int fd = open(filename, O_RDONLY);
      +	int err = 0;
      +	if (fd < 0) {
      +		if (errno == ENOENT) {
 18:  a9bebe45bde = 18:  795a15000fb reftable: add dump utility
 19:  104fbc7502f = 19:  7b3215aef39 Add "test-tool dump-reftable" command.

Comments

Junio C Hamano Sept. 28, 2021, 6:17 p.m. UTC | #1
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The reftable format is described in Documentation/technical/reftable.txt.
>
> This is a fully reentrant implementation of reading and writing the reftable
> file format, and should be suitable for embedding in libgit2 too. It does
> not hook the code up to git to function as a ref storage backend yet.

Elsewhere you solicited comments on the "RFC - add LICENSE" step.

As long as the copyright holders of this library are happy with BSD,
I do not see a problem with your plan in which the project borrows
this code under the license and then owns it, giving access to
others under the same license.

If somebody sees problems with it, please raise your hands ;-)

Thanks.
Carlo Marcelo Arenas Belón Sept. 30, 2021, 5:40 a.m. UTC | #2
Patch 1, was discussed before and might be solved in a better way as an
alternative.

Patch 2 and 3 are "nice to have" for portability and hopefully not controversial
but could be dropped, if someone feels strongly against it.

Patch 4 is not something I'd found failing anywhere, but the fact that Microsoft
mentions it is only supported as an extension and it needs to be supported by
the dynamic linker and I couldn't find anything clear about it in POSIX means,
that is probably safer this way.

  [PATCH 1/4] fixup! reftable: add a heap-based priority queue for
  [PATCH 2/4] fixup! reftable: implement stack, a mutable database of
  [PATCH 3/4] config.mak.uname: last release and snapshots of Minix
  [PATCH 4/4] reftable: avoid non portable compile time pointer to
Junio C Hamano Sept. 30, 2021, 8:35 p.m. UTC | #3
Carlo Marcelo Arenas Belón  <carenas@gmail.com> writes:

> Patch 1, was discussed before and might be solved in a better way as an
> alternative.
>
> Patch 2 and 3 are "nice to have" for portability and hopefully not controversial
> but could be dropped, if someone feels strongly against it.
>
> Patch 4 is not something I'd found failing anywhere, but the fact that Microsoft
> mentions it is only supported as an extension and it needs to be supported by
> the dynamic linker and I couldn't find anything clear about it in POSIX means,
> that is probably safer this way.

All of them look sensible.  Thanks.
Ævar Arnfjörð Bjarmason Oct. 2, 2021, 9:20 a.m. UTC | #4
On Tue, Sep 28 2021, Junio C Hamano wrote:

> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> The reftable format is described in Documentation/technical/reftable.txt.
>>
>> This is a fully reentrant implementation of reading and writing the reftable
>> file format, and should be suitable for embedding in libgit2 too. It does
>> not hook the code up to git to function as a ref storage backend yet.
>
> Elsewhere you solicited comments on the "RFC - add LICENSE" step.
>
> As long as the copyright holders of this library are happy with BSD,
> I do not see a problem with your plan in which the project borrows
> this code under the license and then owns it, giving access to
> others under the same license.
>
> If somebody sees problems with it, please raise your hands ;-)

No objection, but as noted in previous discussions about it I think we
should have some clarity about what it means for contributors. I.e. that
when they're contributing to reftable/ it's BSD license, but for the
rest of the codebase it's GPLv2.

I've just submitted a series to address that more generally:
https://lore.kernel.org/git/cover-0.5-00000000000-20211002T091212Z-avarab@gmail.com/

I think with that clarification of what these in-tree subdirectory
"COPYING" files mean the only thing left is for the "RFC" label to be
dropped here.

From what I as not-a-lawyer think I know about how licenses work the
concept that we can have BSD licensed content in-tree that can be
extracted and legally used by anyone outside of it seems rather dubious.

I.e. if I author a cross-tree where "reftable/" uses some updated API,
isn't the result a derived work and now GPLv2 and not "cleanly" BSD
licensed? But that's a separate question, and ultimately a worry for any
out-of-tree users. So I don't really care. I also believe that we've had
that situation before with (I think) libgit2 taking snapshots of our
"xdiff", so maybe it's already deemed to be a complete non-issue.

All I'd like is for contributors to git.git not to be confused, which I
think with my series above won't be the case.