mbox series

[v2,00/13] reftable: prepare for re-seekable iterators

Message ID cover.1715589670.git.ps@pks.im (mailing list archive)
Headers show
Series reftable: prepare for re-seekable iterators | expand

Message

Patrick Steinhardt May 13, 2024, 8:46 a.m. UTC
Hi,

this is the second version of my patch series that prepares the reftable
iterators to become re-seekable. These refactorings will eventually
allow us to reuse data structures by iterators and optimize for certain
cases.

Changes compared to v1:

  - Various fixes to commit messages.

  - Fixed a copy & pasted comment to refer to logs instead of refs.

The series continues to build on top of ps/reftable-write-optim. There
is a merge conflict with the in-flight ps/reftable-write-options, but
given that this series has not yet been merged to next and because Junio
has already resolved the conflict in "seen" I decided to not pull it in
as an additional dependency.

Thanks!

Patrick

Patrick Steinhardt (13):
  reftable/block: use `size_t` to track restart point index
  reftable/reader: avoid copying index iterator
  reftable/reader: unify indexed and linear seeking
  reftable/reader: separate concerns of table iter and reftable reader
  reftable/reader: inline `reader_seek_internal()`
  reftable/reader: set up the reader when initializing table iterator
  reftable/merged: split up initialization and seeking of records
  reftable/merged: simplify indices for subiterators
  reftable/generic: move seeking of records into the iterator
  reftable/generic: adapt interface to allow reuse of iterators
  reftable/reader: adapt interface to allow reuse of iterators
  reftable/stack: provide convenience functions to create iterators
  reftable/merged: adapt interface to allow reuse of iterators

 refs/reftable-backend.c      |  48 ++++----
 reftable/block.c             |   4 +-
 reftable/generic.c           |  94 +++++++++++----
 reftable/generic.h           |   9 +-
 reftable/iter.c              |  23 +++-
 reftable/merged.c            | 148 ++++++++----------------
 reftable/merged.h            |   6 +
 reftable/merged_test.c       |  19 ++-
 reftable/reader.c            | 218 +++++++++++++++--------------------
 reftable/readwrite_test.c    |  35 ++++--
 reftable/reftable-generic.h  |   8 +-
 reftable/reftable-iterator.h |  21 ++++
 reftable/reftable-merged.h   |  15 ---
 reftable/reftable-reader.h   |  45 ++------
 reftable/reftable-stack.h    |  18 +++
 reftable/stack.c             |  29 ++++-
 16 files changed, 378 insertions(+), 362 deletions(-)

Range-diff against v1:
 1:  ca86a8b58d =  1:  ca86a8b58d reftable/block: use `size_t` to track restart point index
 2:  bdbebdbd36 =  2:  bdbebdbd36 reftable/reader: avoid copying index iterator
 3:  716863a580 =  3:  716863a580 reftable/reader: unify indexed and linear seeking
 4:  91db2f18c1 =  4:  91db2f18c1 reftable/reader: separate concerns of table iter and reftable reader
 5:  4d498ef342 =  5:  4d498ef342 reftable/reader: inline `reader_seek_internal()`
 6:  5a10a11584 =  6:  5a10a11584 reftable/reader: set up the reader when initializing table iterator
 7:  21b3e3ab5f !  7:  12c10fd366 reftable/merged: split up initialization and seeking of records
    @@ Commit message
         To initialize a `struct merged_iter`, we need to seek all subiterators
         to the wanted record and then add their results to the priority queue
         used to sort the records. This logic is split up across two functions,
    -    `merged_table_seek_record()` and `merged_table_iter()`. The scope of
    -    these functions is somewhat weird though, where `merged_table_iter()` is
    +    `merged_table_seek_record()` and `merged_iter_init()`. The scope of
    +    these functions is somewhat weird though, where `merged_iter_init()` is
         only responsible for adding the records of the subiterators to the
         priority queue.
     
    -    Clarify the scope of those functions such that `merged_table_iter()` is
    +    Clarify the scope of those functions such that `merged_iter_init()` is
         only responsible for initializing the iterator's structure. Performing
         the subiterator seeks are now part of `merged_table_seek_record()`.
     
 8:  f0f42cd56b !  8:  31bdfc1e34 reftable/merged: simplify indices for subiterators
    @@ Commit message
         reftable/merged: simplify indices for subiterators
     
         When seeking on a merged table, we perform the seek for each of the
    -    subiterators. If the subiterator hasa the desired record we add it to
    -    the priority queue, otherwise we skip it and don't add it to the stack
    -    of subiterators hosted by the merged table.
    +    subiterators. If the subiterator has the desired record we add it to the
    +    priority queue, otherwise we skip it and don't add it to the stack of
    +    subiterators hosted by the merged table.
     
         The consequence of this is that the index of the subiterator in the
         merged table does not necessarily correspond to the index of it in the
 9:  859b399e71 !  9:  3e91c3830e reftable/generic: move seeking of records into the iterator
    @@ Commit message
         The second and more significant downside is that it is impossible to
         reuse iterators for multiple seeks. Whenever you want to look up a
         record, you need to re-create the whole infrastructure again, which is
    -    quite a waste of time. Furthermore, it is impossible to for example
    -    optimize seeks, for example when seeking the same record multiple times.
    +    quite a waste of time. Furthermore, it is impossible to optimize seeks,
    +    such as when seeking the same record multiple times.
     
         To address this, we essentially split up the concerns properly such that
         the parent data structure is responsible for setting up the iterator via
10:  727b8fa432 = 10:  b0641dd800 reftable/generic: adapt interface to allow reuse of iterators
11:  f688f8f59a ! 11:  0745951650 reftable/reader: adapt interface to allow reuse of iterators
    @@ reftable/reftable-reader.h: struct reftable_table;
     +void reftable_reader_init_ref_iterator(struct reftable_reader *r,
     +				       struct reftable_iterator *it);
     +
    -+/* Initialize a reftable iterator for reading refs. */
    ++/* Initialize a reftable iterator for reading logs. */
     +void reftable_reader_init_log_iterator(struct reftable_reader *r,
     +				       struct reftable_iterator *it);
      
12:  68cc35919b = 12:  51480c4328 reftable/stack: provide convenience functions to create iterators
13:  be4da295c6 = 13:  2586e93c44 reftable/merged: adapt interface to allow reuse of iterators

Comments

Justin Tobler May 15, 2024, 8:15 p.m. UTC | #1
On 24/05/13 10:46AM, Patrick Steinhardt wrote:
> Hi,
> 
> this is the second version of my patch series that prepares the reftable
> iterators to become re-seekable. These refactorings will eventually
> allow us to reuse data structures by iterators and optimize for certain
> cases.
> 
> Changes compared to v1:
> 
>   - Various fixes to commit messages.
> 
>   - Fixed a copy & pasted comment to refer to logs instead of refs.
> 
> The series continues to build on top of ps/reftable-write-optim. There
> is a merge conflict with the in-flight ps/reftable-write-options, but
> given that this series has not yet been merged to next and because Junio
> has already resolved the conflict in "seen" I decided to not pull it in
> as an additional dependency.

Thanks Patrick! From the range-diff, this version looks good to me :)

-Justin
Karthik Nayak May 21, 2024, 3:31 p.m. UTC | #2
Hello,

Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this is the second version of my patch series that prepares the reftable
> iterators to become re-seekable. These refactorings will eventually
> allow us to reuse data structures by iterators and optimize for certain
> cases.
>
> Changes compared to v1:
>
>   - Various fixes to commit messages.
>
>   - Fixed a copy & pasted comment to refer to logs instead of refs.
>
> The series continues to build on top of ps/reftable-write-optim. There
> is a merge conflict with the in-flight ps/reftable-write-options, but
> given that this series has not yet been merged to next and because Junio
> has already resolved the conflict in "seen" I decided to not pull it in
> as an additional dependency.
>
> Thanks!
>
> Patrick

I didn't review the earlier version. I did go through the patches in
this version, I only have a nit in the first patch, not worth a reroll.
Thanks for the series, I have nothing more to add.

Karthik
Patrick Steinhardt May 22, 2024, 7:23 a.m. UTC | #3
On Tue, May 21, 2024 at 03:31:15PM +0000, Karthik Nayak wrote:
> Hello,
> 
> Patrick Steinhardt <ps@pks.im> writes:
> > Hi,
> >
> > this is the second version of my patch series that prepares the reftable
> > iterators to become re-seekable. These refactorings will eventually
> > allow us to reuse data structures by iterators and optimize for certain
> > cases.
> >
> > Changes compared to v1:
> >
> >   - Various fixes to commit messages.
> >
> >   - Fixed a copy & pasted comment to refer to logs instead of refs.
> >
> > The series continues to build on top of ps/reftable-write-optim. There
> > is a merge conflict with the in-flight ps/reftable-write-options, but
> > given that this series has not yet been merged to next and because Junio
> > has already resolved the conflict in "seen" I decided to not pull it in
> > as an additional dependency.
> >
> > Thanks!
> >
> > Patrick
> 
> I didn't review the earlier version. I did go through the patches in
> this version, I only have a nit in the first patch, not worth a reroll.
> Thanks for the series, I have nothing more to add.

Thanks!

Patrick