mbox series

[00/11] reftable: expose write options as config

Message ID cover.1714630191.git.ps@pks.im (mailing list archive)
Headers show
Series reftable: expose write options as config | expand

Message

Patrick Steinhardt May 2, 2024, 6:51 a.m. UTC
Hi,

the reftable format has some flexibility with regards to how exactly it
writes the respective tables:

  - The block size allows you to control how large each block is
    supposed to be. The bigger the block, the more records you can fit
    into it.

  - Restart intervals control how often a restart point is written that
    breaks prefix compression. The lower the interval, the less disk
    space savings you get.

  - Object indices can be enabled or disabled. These are optional and
    Git doesn't use them right now, so disabling them may be a sensible
    thing to do if you want to save some disk space.

  - The geometric factor controls when we compact tables during auto
    compaction.

This patch series exposes all of these via a new set of configs so that
they can be tweaked by the user as-needed. It's not expected that those
are really of much importance for the "normal" user -- the defaults
should be good enough. But for edge cases (huge repos with millions of
refs) and for hosting providers these knobs can be helpful.

This patch series applies on top of d4cc1ec35f (Start the 2.46 cycle,
2024-04-30).

Patrick

Patrick Steinhardt (11):
  reftable: consistently refer to `reftable_write_options` as `opts`
  reftable: consistently pass write opts as value
  reftable/writer: drop static variable used to initialize strbuf
  reftable/writer: improve error when passed an invalid block size
  reftable/dump: support dumping a table's block structure
  refs/reftable: allow configuring block size
  reftable: use `uint16_t` to track restart interval
  refs/reftable: allow configuring restart interval
  refs/reftable: allow disabling writing the object index
  reftable: make the compaction factor configurable
  refs/reftable: allow configuring geometric factor

 Documentation/config.txt          |   2 +
 Documentation/config/reftable.txt |  49 +++++
 refs/reftable-backend.c           |  46 ++++-
 reftable/block.h                  |   2 +-
 reftable/dump.c                   |  12 +-
 reftable/merged_test.c            |   6 +-
 reftable/reader.c                 |  63 +++++++
 reftable/readwrite_test.c         |  26 +--
 reftable/refname_test.c           |   2 +-
 reftable/reftable-reader.h        |   2 +
 reftable/reftable-stack.h         |   2 +-
 reftable/reftable-writer.h        |  10 +-
 reftable/stack.c                  |  56 +++---
 reftable/stack.h                  |   5 +-
 reftable/stack_test.c             | 118 ++++++------
 reftable/writer.c                 |  20 +--
 t/t0613-reftable-write-options.sh | 286 ++++++++++++++++++++++++++++++
 17 files changed, 577 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/config/reftable.txt
 create mode 100755 t/t0613-reftable-write-options.sh


base-commit: d4cc1ec35f3bcce816b69986ca41943f6ce21377

Comments

Patrick Steinhardt May 2, 2024, 7:29 a.m. UTC | #1
On Thu, May 02, 2024 at 08:51:27AM +0200, Patrick Steinhardt wrote:
> Hi,
> 
> the reftable format has some flexibility with regards to how exactly it
> writes the respective tables:
> 
>   - The block size allows you to control how large each block is
>     supposed to be. The bigger the block, the more records you can fit
>     into it.
> 
>   - Restart intervals control how often a restart point is written that
>     breaks prefix compression. The lower the interval, the less disk
>     space savings you get.
> 
>   - Object indices can be enabled or disabled. These are optional and
>     Git doesn't use them right now, so disabling them may be a sensible
>     thing to do if you want to save some disk space.
> 
>   - The geometric factor controls when we compact tables during auto
>     compaction.
> 
> This patch series exposes all of these via a new set of configs so that
> they can be tweaked by the user as-needed. It's not expected that those
> are really of much importance for the "normal" user -- the defaults
> should be good enough. But for edge cases (huge repos with millions of
> refs) and for hosting providers these knobs can be helpful.
> 
> This patch series applies on top of d4cc1ec35f (Start the 2.46 cycle,
> 2024-04-30).

Ugh. I actually intended to pull in ps/reftable-write-optim as a
dependency because I know it causes conflicts. But I screwed this up as
I thought that the topic was merged into "master" already, even though
it has only hit "next".

I'll refrain from sending a new version immediately though and will wait
for reviews first. Once those are in I will pull in the above topic.

Patrick
Junio C Hamano May 3, 2024, 8:38 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> Ugh. I actually intended to pull in ps/reftable-write-optim as a
> dependency because I know it causes conflicts. But I screwed this up as
> I thought that the topic was merged into "master" already, even though
> it has only hit "next".
>
> I'll refrain from sending a new version immediately though and will wait
> for reviews first. Once those are in I will pull in the above topic.

I saw [01/11] has changes to stack_check_addition() that disappeared
by the other topic, and the resolution is to remove the function (as
nobody calls it anyway).  Also [02/11] has changes to refname_test.c
that can be resolved by just removing the file.

If there are other semantic conflict resolution needed, such a
rebasing is appreciated, but otherwise, there is no strong need to
rebase.

The mention of the name of the other topic that has interactions was
very much helpful.

Thanks.
Patrick Steinhardt May 6, 2024, 6:51 a.m. UTC | #3
On Fri, May 03, 2024 at 01:38:51PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Ugh. I actually intended to pull in ps/reftable-write-optim as a
> > dependency because I know it causes conflicts. But I screwed this up as
> > I thought that the topic was merged into "master" already, even though
> > it has only hit "next".
> >
> > I'll refrain from sending a new version immediately though and will wait
> > for reviews first. Once those are in I will pull in the above topic.
> 
> I saw [01/11] has changes to stack_check_addition() that disappeared
> by the other topic, and the resolution is to remove the function (as
> nobody calls it anyway).  Also [02/11] has changes to refname_test.c
> that can be resolved by just removing the file.

Yup, sounds right to me.

> If there are other semantic conflict resolution needed, such a
> rebasing is appreciated, but otherwise, there is no strong need to
> rebase.

No other semantic conflicts I'm aware of, no. Thanks!

Patrick

> The mention of the name of the other topic that has interactions was
> very much helpful.
> 
> Thanks.
Justin Tobler May 6, 2024, 9:29 p.m. UTC | #4
On 24/05/02 08:51AM, Patrick Steinhardt wrote:
> Hi,
> 
> the reftable format has some flexibility with regards to how exactly it
> writes the respective tables:
> 
>   - The block size allows you to control how large each block is
>     supposed to be. The bigger the block, the more records you can fit
>     into it.
> 
>   - Restart intervals control how often a restart point is written that
>     breaks prefix compression. The lower the interval, the less disk
>     space savings you get.
> 
>   - Object indices can be enabled or disabled. These are optional and
>     Git doesn't use them right now, so disabling them may be a sensible
>     thing to do if you want to save some disk space.
> 
>   - The geometric factor controls when we compact tables during auto
>     compaction.
> 
> This patch series exposes all of these via a new set of configs so that
> they can be tweaked by the user as-needed. It's not expected that those
> are really of much importance for the "normal" user -- the defaults
> should be good enough. But for edge cases (huge repos with millions of
> refs) and for hosting providers these knobs can be helpful.
> 
> This patch series applies on top of d4cc1ec35f (Start the 2.46 cycle,
> 2024-04-30).

I have reviewed these patches and I have nothing to add. Thanks Patrick!

-Justin
Karthik Nayak May 10, 2024, 10 a.m. UTC | #5
Hello Patrick,

Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> the reftable format has some flexibility with regards to how exactly it
> writes the respective tables:
>
>   - The block size allows you to control how large each block is
>     supposed to be. The bigger the block, the more records you can fit
>     into it.
>
>   - Restart intervals control how often a restart point is written that
>     breaks prefix compression. The lower the interval, the less disk
>     space savings you get.
>
>   - Object indices can be enabled or disabled. These are optional and
>     Git doesn't use them right now, so disabling them may be a sensible
>     thing to do if you want to save some disk space.
>
>   - The geometric factor controls when we compact tables during auto
>     compaction.
>
> This patch series exposes all of these via a new set of configs so that
> they can be tweaked by the user as-needed. It's not expected that those
> are really of much importance for the "normal" user -- the defaults
> should be good enough. But for edge cases (huge repos with millions of
> refs) and for hosting providers these knobs can be helpful.
>
> This patch series applies on top of d4cc1ec35f (Start the 2.46 cycle,
> 2024-04-30).
>
> Patrick
>

I'v gone through the commits and apart from a few small comments, I
think it looks great already.

Thanks,
Karthik
Patrick Steinhardt May 10, 2024, 10:14 a.m. UTC | #6
On Fri, May 10, 2024 at 03:00:41AM -0700, Karthik Nayak wrote:
> I'v gone through the commits and apart from a few small comments, I
> think it looks great already.
> 
> Thanks,
> Karthik

Thanks for your review!

Patrick