mbox series

[v2,00/10] reftable: stop using `struct strbuf`

Message ID cover.1728910726.git.ps@pks.im (mailing list archive)
Headers show
Series reftable: stop using `struct strbuf` | expand

Message

Patrick Steinhardt Oct. 14, 2024, 1:02 p.m. UTC
Hi,

this is the second part of my patch series that stop using `struct
strbuf` in the reftable library. This is done such that the reftable
library becomes standalone again and so that we can use the pluggable
allocators part of the library.

Changes compared to v1:

  - Point out a while-at-it refactoring when getting rid of
    `strbuf_addf()`.

  - Fix grammar issue in another commit message.

  - Use REFTABLE_OUT_OF_MEMORY_ERROR error codes on allocation failure.

  - Add documentation for `struct reftable_buf` functions.

  - Clarify that the conversion to use the new interface is not wholly
    mechanical via sed(1).

Thanks!

Patrick

Patrick Steinhardt (10):
  reftable: stop using `strbuf_addbuf()`
  reftable: stop using `strbuf_addf()`
  reftable/basics: provide new `reftable_buf` interface
  reftable: convert from `strbuf` to `reftable_buf`
  reftable/blocksource: adapt interface name
  t/unit-tests: check for `reftable_buf` allocation errors
  reftable/stack: adapt `format_name()` to handle allocation failures
  reftable/record: adapt `reftable_record_key()` to handle allocation
    failures
  reftable/stack: adapt `stack_filename()` to handle allocation failures
  reftable: handle trivial `reftable_buf` errors

 reftable/basics.c                   |  76 +++++++++-
 reftable/basics.h                   |  59 +++++++-
 reftable/block.c                    |  61 +++++---
 reftable/block.h                    |  14 +-
 reftable/blocksource.c              |  30 ++--
 reftable/blocksource.h              |   5 +-
 reftable/iter.c                     |   9 +-
 reftable/iter.h                     |   8 +-
 reftable/reader.c                   |  27 ++--
 reftable/record.c                   | 114 ++++++++------
 reftable/record.h                   |  21 +--
 reftable/stack.c                    | 221 ++++++++++++++++++----------
 reftable/system.h                   |   1 -
 reftable/writer.c                   | 102 ++++++++-----
 reftable/writer.h                   |   2 +-
 t/unit-tests/lib-reftable.c         |   4 +-
 t/unit-tests/lib-reftable.h         |   7 +-
 t/unit-tests/t-reftable-basics.c    |  16 +-
 t/unit-tests/t-reftable-block.c     |  53 +++----
 t/unit-tests/t-reftable-merged.c    |  32 ++--
 t/unit-tests/t-reftable-reader.c    |  12 +-
 t/unit-tests/t-reftable-readwrite.c | 134 +++++++++--------
 t/unit-tests/t-reftable-record.c    |  74 +++++-----
 t/unit-tests/t-reftable-stack.c     |  96 ++++++------
 24 files changed, 726 insertions(+), 452 deletions(-)

Range-diff against v1:
 1:  7408482c152 =  1:  7408482c152 reftable: stop using `strbuf_addbuf()`
 2:  abc28d7664f !  2:  6a7333b275e reftable: stop using `strbuf_addf()`
    @@ Commit message
         `strbuf`. Get rid of the seldomly-used `strbuf_addf()` function such
         that we have to reimplement one less function.
     
    +    While at it, remove a useless call to `strbuf_reset()` in
    +    `t_reftable_stack_auto_compaction_with_locked_tables()`. We don't write
    +    to the buffer before this and initialize it with `STRBUF_INIT`, so there
    +    is no need to reset anything.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## reftable/stack.c ##
 3:  24e31619b93 !  3:  0ddc8c0c896 reftable/basics: provide new `reftable_buf` interface
    @@ Commit message
     
           - The `strbuf` interfaces do not handle memory allocation failures and
             instead causes us to die. This is okay in the context of Git, but is
    -        not context of the reftable library, which is supposed to be usable
    -        by third-party applications.
    +        not in the context of the reftable library, which is supposed to be
    +        usable by third-party applications.
     
           - The `strbuf` interface is quite deeply tied into Git, which makes it
             hard to use the reftable library as a standalone library. Any
    @@ Commit message
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## reftable/basics.c ##
    +@@ reftable/basics.c: license that can be found in the LICENSE file or at
    + #define REFTABLE_ALLOW_BANNED_ALLOCATORS
    + #include "basics.h"
    + #include "reftable-basics.h"
    ++#include "reftable-error.h"
    + 
    + static void *(*reftable_malloc_ptr)(size_t sz);
    + static void *(*reftable_realloc_ptr)(void *, size_t);
     @@ reftable/basics.c: void reftable_set_alloc(void *(*malloc)(size_t),
      	reftable_free_ptr = free;
      }
    @@ reftable/basics.c: void reftable_set_alloc(void *(*malloc)(size_t),
     +		char *reallocated = buf->buf;
     +		REFTABLE_ALLOC_GROW(reallocated, newlen + 1, buf->alloc);
     +		if (!reallocated)
    -+			return -1;
    ++			return REFTABLE_OUT_OF_MEMORY_ERROR;
     +		buf->buf = reallocated;
     +	}
     +
    @@ reftable/basics.h: license that can be found in the LICENSE file or at
     +};
     +#define REFTABLE_BUF_INIT { 0 }
     +
    ++/*
    ++ * Initialize the buffer such that it is ready for use. This is equivalent to
    ++ * using REFTABLE_BUF_INIT for stack-allocated variables.
    ++ */
     +void reftable_buf_init(struct reftable_buf *buf);
    ++
    ++/*
    ++ * Release memory associated with the buffer. The buffer is reinitialized such
    ++ * that it can be reused for subsequent operations.
    ++ */
     +void reftable_buf_release(struct reftable_buf *buf);
    ++
    ++/*
    ++ * Reset the buffer such that it is effectively empty, without releasing the
    ++ * memory that this structure holds on to. This is equivalent to calling
    ++ * `reftable_buf_setlen(buf, 0)`.
    ++ */
     +void reftable_buf_reset(struct reftable_buf *buf);
    ++
    ++/*
    ++ * Trim the buffer to a shorter length by updating the `len` member and writing
    ++ * a NUL byte to `buf[len]`. Returns 0 on success, -1 when `len` points outside
    ++ * of the array.
    ++ */
     +int reftable_buf_setlen(struct reftable_buf *buf, size_t len);
    ++
    ++/*
    ++ * Lexicographically compare the two buffers. Returns 0 when both buffers have
    ++ * the same contents, -1 when `a` is lexicographically smaller than `b`, and 1
    ++ * otherwise.
    ++ */
     +int reftable_buf_cmp(const struct reftable_buf *a, const struct reftable_buf *b);
    ++
    ++/*
    ++ * Add the given bytes to the buffer. Returns 0 on success,
    ++ * REFTABLE_OUT_OF_MEMORY_ERROR on allocation failure.
    ++ */
     +int reftable_buf_add(struct reftable_buf *buf, const void *data, size_t len);
    ++
    ++/* Equivalent to `reftable_buf_add(buf, s, strlen(s))`. */
     +int reftable_buf_addstr(struct reftable_buf *buf, const char *s);
    ++
    ++/*
    ++ * Detach the buffer from the structure such that the underlying memory is now
    ++ * owned by the caller. The buffer is reinitialized such that it can be reused
    ++ * for subsequent operations.
    ++ */
     +char *reftable_buf_detach(struct reftable_buf *buf);
     +
      /* Bigendian en/decoding of integers */
 4:  e2ac27dbca0 !  4:  e1ff1af1f30 reftable: convert from `strbuf` to `reftable_buf`
    @@ Commit message
         reftable: convert from `strbuf` to `reftable_buf`
     
         Convert the reftable library to use the `reftable_buf` interface instead
    -    of the `strbuf` interface. This is a mechanical change via sed(1) and
    -    does not yet handle allocation failures. These will be addressed in
    -    subsequent commits.
    +    of the `strbuf` interface. This is mostly a mechanical change via sed(1)
    +    with some manual fixes where functions for `strbuf` and `reftable_buf`
    +    differ. The converted code does not yet handle allocation failures. This
    +    will be handled in subsequent commits.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
 5:  da432d7a384 =  5:  fe8c9ace463 reftable/blocksource: adapt interface name
 6:  797e435ed2e =  6:  8c98745233a t/unit-tests: check for `reftable_buf` allocation errors
 7:  66ee431db46 =  7:  1f08163009b reftable/stack: adapt `format_name()` to handle allocation failures
 8:  f5ae5ec27d6 =  8:  5798d76d7a4 reftable/record: adapt `reftable_record_key()` to handle allocation failures
 9:  d66d9e50e06 =  9:  a9582d51dd1 reftable/stack: adapt `stack_filename()` to handle allocation failures
10:  8406948ae74 = 10:  90819c90f38 reftable: handle trivial `reftable_buf` errors

Comments

Taylor Blau Oct. 14, 2024, 10:44 p.m. UTC | #1
On Mon, Oct 14, 2024 at 03:02:16PM +0200, Patrick Steinhardt wrote:
> Hi,
>
> this is the second part of my patch series that stop using `struct
> strbuf` in the reftable library. This is done such that the reftable
> library becomes standalone again and so that we can use the pluggable
> allocators part of the library.

I reviewed this round, and it looks generally good to me. I feel
somewhat unhappy to have to force the reftable backend to implement its
own strbuf-like functionality.

So I think it may be worth considering whether or not we can reuse Git's
strbuf implementation through a vtable or similar. But it may not be
immediately possible since that implementation just die()s on error,
can't easily swap out the allocator, etc. So perhaps this is the best
path forward, it just feels somewhat unsatisfying to me.

Thanks,
Taylor
Patrick Steinhardt Oct. 15, 2024, 4:37 a.m. UTC | #2
On Mon, Oct 14, 2024 at 06:44:35PM -0400, Taylor Blau wrote:
> On Mon, Oct 14, 2024 at 03:02:16PM +0200, Patrick Steinhardt wrote:
> > Hi,
> >
> > this is the second part of my patch series that stop using `struct
> > strbuf` in the reftable library. This is done such that the reftable
> > library becomes standalone again and so that we can use the pluggable
> > allocators part of the library.
> 
> I reviewed this round, and it looks generally good to me. I feel
> somewhat unhappy to have to force the reftable backend to implement its
> own strbuf-like functionality.
> 
> So I think it may be worth considering whether or not we can reuse Git's
> strbuf implementation through a vtable or similar. But it may not be
> immediately possible since that implementation just die()s on error,
> can't easily swap out the allocator, etc. So perhaps this is the best
> path forward, it just feels somewhat unsatisfying to me.

It's not perfect, I agree. I initially tried to do something like a
vtable or to even compile this into code with something like a wrapper
structure. But that approach in the end fell flat. So I decided to be
pragmatic about this whole issue and just duplicate some code --
overall, we are talking about ~200 lines of code to completely detangle
the reftable library from libgit.a.

For what it's worth, I also had this discussion with Junio in [1], in
which we ultimately agreed that this is probably the best way forward.

Patrick

[1]: <ZvVPiIzzLTTb75b8@pks.im>
shejialuo Oct. 15, 2024, 10:33 a.m. UTC | #3
On Tue, Oct 15, 2024 at 06:37:37AM +0200, Patrick Steinhardt wrote:
> On Mon, Oct 14, 2024 at 06:44:35PM -0400, Taylor Blau wrote:
> > On Mon, Oct 14, 2024 at 03:02:16PM +0200, Patrick Steinhardt wrote:
> > > Hi,
> > >
> > > this is the second part of my patch series that stop using `struct
> > > strbuf` in the reftable library. This is done such that the reftable
> > > library becomes standalone again and so that we can use the pluggable
> > > allocators part of the library.
> > 
> > I reviewed this round, and it looks generally good to me. I feel
> > somewhat unhappy to have to force the reftable backend to implement its
> > own strbuf-like functionality.
> > 
> > So I think it may be worth considering whether or not we can reuse Git's
> > strbuf implementation through a vtable or similar. But it may not be
> > immediately possible since that implementation just die()s on error,
> > can't easily swap out the allocator, etc. So perhaps this is the best
> > path forward, it just feels somewhat unsatisfying to me.
> 
> It's not perfect, I agree. I initially tried to do something like a
> vtable or to even compile this into code with something like a wrapper
> structure. But that approach in the end fell flat. So I decided to be
> pragmatic about this whole issue and just duplicate some code --
> overall, we are talking about ~200 lines of code to completely detangle
> the reftable library from libgit.a.
> 

I have read some patches yesterday, I feel quite strange that we need to
make repetition. Could we provide a header file which requires the users
who need to use the reftable library to implement the interfaces?

    reftable_strbuf_addf(void *buf, char *fmt, va_list ap);

Thus, we could reuse "strbuf_addf" to implement this interface in Git.
As for libgit2, could we let it implement these interfaces? Although I
have never read the source code of libgit2, I think there should be some
code which could be reuse to implement these interfaces?

However, I do not know the context. Maybe the above is totally wrong. If
so, please ignore.

Thanks,
Jialuo
Patrick Steinhardt Oct. 15, 2024, 10:44 a.m. UTC | #4
On Tue, Oct 15, 2024 at 06:33:21PM +0800, shejialuo wrote:
> On Tue, Oct 15, 2024 at 06:37:37AM +0200, Patrick Steinhardt wrote:
> > On Mon, Oct 14, 2024 at 06:44:35PM -0400, Taylor Blau wrote:
> > > On Mon, Oct 14, 2024 at 03:02:16PM +0200, Patrick Steinhardt wrote:
> > > > Hi,
> > > >
> > > > this is the second part of my patch series that stop using `struct
> > > > strbuf` in the reftable library. This is done such that the reftable
> > > > library becomes standalone again and so that we can use the pluggable
> > > > allocators part of the library.
> > > 
> > > I reviewed this round, and it looks generally good to me. I feel
> > > somewhat unhappy to have to force the reftable backend to implement its
> > > own strbuf-like functionality.
> > > 
> > > So I think it may be worth considering whether or not we can reuse Git's
> > > strbuf implementation through a vtable or similar. But it may not be
> > > immediately possible since that implementation just die()s on error,
> > > can't easily swap out the allocator, etc. So perhaps this is the best
> > > path forward, it just feels somewhat unsatisfying to me.
> > 
> > It's not perfect, I agree. I initially tried to do something like a
> > vtable or to even compile this into code with something like a wrapper
> > structure. But that approach in the end fell flat. So I decided to be
> > pragmatic about this whole issue and just duplicate some code --
> > overall, we are talking about ~200 lines of code to completely detangle
> > the reftable library from libgit.a.
> > 
> 
> I have read some patches yesterday, I feel quite strange that we need to
> make repetition. Could we provide a header file which requires the users
> who need to use the reftable library to implement the interfaces?
> 
>     reftable_strbuf_addf(void *buf, char *fmt, va_list ap);
> 
> Thus, we could reuse "strbuf_addf" to implement this interface in Git.
> As for libgit2, could we let it implement these interfaces? Although I
> have never read the source code of libgit2, I think there should be some
> code which could be reuse to implement these interfaces?
> 
> However, I do not know the context. Maybe the above is totally wrong. If
> so, please ignore.

The thing is that we'll have repetition regardless of what we end up
doing:

  - We could either have repetition once in the reftable library,
    reimplementing `struct strbuf`. This can then be reused by every
    single user of the reftable library.

  - Or we can have repetition for every single user of the reftable
    library. For now that'd only be Git and libgit2, but we'd still have
    repetition.

The second kind of repetition is way worse though, because now every
user of the reftable library has a different implementation of a type
that is as basic as a buffer. These _must_ behave the exact same across
implementations or we will hit issues. So I'd rather have the repetition
a single time in the reftable library such that all users of the library
will behave the same rather than having downstream users copy the
implementation of `struct strbuf` and making it work for their library.

Also, due to the nature of `struct strbuf` not allowing for allocation
failures we'd already have diverging behaviour. In Git you would never
hit error code paths for allocation failures, whereas every library user
potentially can.

So we really have to treat the reftable code base special. If we want to
be a good citizen and be a proper upstream for projects like libgit2 we
don't really have much of a choice than to detangle it from libgit.a. If
we don't we may be saving 20 lines of code, but we make everybody elses
life harder.

Patrick
shejialuo Oct. 15, 2024, 11:23 a.m. UTC | #5
On Tue, Oct 15, 2024 at 12:44:53PM +0200, Patrick Steinhardt wrote:

[snip]

> > I have read some patches yesterday, I feel quite strange that we need to
> > make repetition. Could we provide a header file which requires the users
> > who need to use the reftable library to implement the interfaces?
> > 
> >     reftable_strbuf_addf(void *buf, char *fmt, va_list ap);
> > 
> > Thus, we could reuse "strbuf_addf" to implement this interface in Git.
> > As for libgit2, could we let it implement these interfaces? Although I
> > have never read the source code of libgit2, I think there should be some
> > code which could be reuse to implement these interfaces?
> > 
> > However, I do not know the context. Maybe the above is totally wrong. If
> > so, please ignore.
> 
> The thing is that we'll have repetition regardless of what we end up
> doing:
> 
>   - We could either have repetition once in the reftable library,
>     reimplementing `struct strbuf`. This can then be reused by every
>     single user of the reftable library.
> 
>   - Or we can have repetition for every single user of the reftable
>     library. For now that'd only be Git and libgit2, but we'd still have
>     repetition.
> 
> The second kind of repetition is way worse though, because now every
> user of the reftable library has a different implementation of a type
> that is as basic as a buffer. These _must_ behave the exact same across
> implementations or we will hit issues. So I'd rather have the repetition
> a single time in the reftable library such that all users of the library
> will behave the same rather than having downstream users copy the
> implementation of `struct strbuf` and making it work for their library.
> 

Yes. I agree with you it is worse to let every downstream to implement
the interfaces. I know the motivation here, we want to make the whole
reftable library be independent of the Git which allows the downstream
to easily use the reftable library.

> Also, due to the nature of `struct strbuf` not allowing for allocation
> failures we'd already have diverging behaviour. In Git you would never
> hit error code paths for allocation failures, whereas every library user
> potentially can.
> 
> So we really have to treat the reftable code base special. If we want to
> be a good citizen and be a proper upstream for projects like libgit2 we
> don't really have much of a choice than to detangle it from libgit.a. If
> we don't we may be saving 20 lines of code, but we make everybody elses
> life harder.
> 

Yes. And I do not think this is a problem right now. Thanks for this
wonderful explanation.

> Patrick

Jialuo