[00/11] pack-bitmap: convert offset to ref deltas where possible

Message ID	cover.1728505840.git.me@ttaylorr.com (mailing list archive)
Headers	show Received: from mail-yw1-f177.google.com (mail-yw1-f177.google.com [209.85.128.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9F591714A4 for <git@vger.kernel.org>; Wed, 9 Oct 2024 20:31:00 +0000 (UTC) Date: Wed, 9 Oct 2024 16:30:52 -0400 From: Taylor Blau <me@ttaylorr.com> To: git@vger.kernel.org Cc: Elijah Newren <newren@gmail.com>, Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>, Patrick Steinhardt <ps@pks.im> Subject: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Message-ID: <cover.1728505840.git.me@ttaylorr.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline
Series	pack-bitmap: convert offset to ref deltas where possible \| expand [00/11] pack-bitmap: convert offset to ref deltas where possible [01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()` [02/11] pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()` [03/11] pack-bitmap.c: delay calling 'offset_to_pack_pos()' [04/11] pack-bitmap.c: compare `base_offset` to `delta_obj_offset` [05/11] pack-bitmap.c: extract `find_base_bitmap_pos()` [06/11] pack-bitmap: drop `from_midx` field from `bitmapped_pack` [07/11] write_reused_pack_one(): translate bit positions directly [08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused [09/11] pack-bitmap: enable cross-pack delta reuse [10/11] pack-bitmap.c: record whether the result was filtered [11/11] pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap

Taylor Blau Oct. 9, 2024, 8:30 p.m. UTC

This patch series enables more objects to be candidates for verbatim
reuse when generating a pack with the aide of reachability bitmaps.

By the end of this series, two new classes of objects are now reuse
candidates which were not before. They are:

  - Cross-pack deltas. In multi-pack bitmaps, if the delta and base
    were selected from different packs, the delta was not reusable.

  - Thin deltas. In both single- and multi-pack bitmaps, we did not
    consider reusing deltas whose base object appears in the 'haves'
    bitmap.

This series teaches the pack-objects machinery to convert cross-pack
and thin deltas to reference deltas so that they may be reused.

The series started off as a couple of patches, each addressing one of
the two classes from the above. But as I started refining these
patches, the series grew longer, as there were a handful of
opportunities to clean up some of the existing pack-reuse bitmap code.

The series is organized as follows:

  - The first seven patches are various cleanups that make the
    pack-reuse code more readable and prepare us to enable delta reuse
    in more situation as above.

      pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()`
      pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()`
      pack-bitmap.c: delay calling 'offset_to_pack_pos()'
      pack-bitmap.c: compare `base_offset` to `delta_obj_offset`
      pack-bitmap.c: extract `find_base_bitmap_pos()`
      pack-bitmap: drop `from_midx` field from `bitmapped_pack`
      write_reused_pack_one(): translate bit positions directly

  - The next patch is a test fixup:

      t5332: enable OFS_DELTAs via test_pack_objects_reused

  - The next patch after that enables us to reuse deltas whose base
    appears in another pack:

      pack-bitmap: enable cross-pack delta reuse

  - The final two patches enable the "thin deltas" optimization
    above:

      pack-bitmap.c: record whether the result was filtered
      pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap

I think that the first seven patches are worthwhile on their own. I
considered splitting those out into their own series, but decided
against it to so that we could realize their benefit more readily.

The first optimization (cross-pack deltas) should help clones and
fetches, but the second optimization (thin deltas) will only help
fetches, since the 'haves' bitmap will necessarily be empty for
clones.

Of course, REF_DELTAs have a less compact representation than
OFS_DELTAs, so the resulting packs will trade off some CPU time for a
slightly larger pack. But we're only talking about a handful of extra
bytes, and network bandwidth is pretty cheap, so I think the
optimization is worthwhile here too.

Thanks in advance for your review!

Taylor Blau (11):
  pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()`
  pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()`
  pack-bitmap.c: delay calling 'offset_to_pack_pos()'
  pack-bitmap.c: compare `base_offset` to `delta_obj_offset`
  pack-bitmap.c: extract `find_base_bitmap_pos()`
  pack-bitmap: drop `from_midx` field from `bitmapped_pack`
  write_reused_pack_one(): translate bit positions directly
  t5332: enable OFS_DELTAs via test_pack_objects_reused
  pack-bitmap: enable cross-pack delta reuse
  pack-bitmap.c: record whether the result was filtered
  pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap

 builtin/pack-objects.c      | 110 ++++++++++++--------
 midx.c                      |   1 -
 pack-bitmap.c               | 198 ++++++++++++++++++++++++------------
 pack-bitmap.h               |   6 +-
 t/t5332-multi-pack-reuse.sh |  72 ++++++++++++-
 5 files changed, 275 insertions(+), 112 deletions(-)


base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2

Junio C Hamano Oct. 10, 2024, 4:46 p.m. UTC | #1

Taylor Blau <me@ttaylorr.com> writes:

> This patch series enables more objects to be candidates for verbatim
> reuse when generating a pack with the aide of reachability bitmaps.
>
> By the end of this series, two new classes of objects are now reuse
> candidates which were not before. They are:
>
>   - Cross-pack deltas. In multi-pack bitmaps, if the delta and base
>     were selected from different packs, the delta was not reusable.

Hmph.  Suppose that you need to send object X, you happen to have X
as a ofs-delta against Y, but Y may appear in multiple packs.

Even if the copy of Y you are going to send together with X is from
the same packfile, you may not be sending all the objects between X
and Y in the original local packfile, so you would need to recompute
the offset to give ofs-delta X to the distance between X and Y in
the resulting packstream, no?

So when you pick the copy of Y out of another pack, what's so
different?  After emitting Y to the resulting pack stream (and
remembering where in the packstream you did so), when it is X's turn
to be emitted, shouldn't you be able to compute the distance in the
resulting packstream to represent X as an ofs-delta against Y, which
should already be happening when you had both X and Y in the same
original pack?

>   - Thin deltas. In both single- and multi-pack bitmaps, we did not
>     consider reusing deltas whose base object appears in the 'haves'
>     bitmap.

I hope this optimization does not kick in unless the receiving end
is prepared to do "index-pack --fix-thin".

I've never thought about this specifically, but it is interesting to
realize that by definition "thin" deltas cannot be ofs-deltas.

> Of course, REF_DELTAs have a less compact representation than
> OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> slightly larger pack.

Is comparing ref- vs ofs- delta a meaningful thing to do in the
context of this series?

What does the current code without these patches do in the same
situation?  Give up on reusing the existing delta and then?  If we
send the base representation instead, the comparison is "we
currently do not use delta, but with this change we can reuse delta
(even though we do not bother recompute the offset and instead use
ref-delta)".

Do we recompute the delta on the fly and show it as an ofs-delta
with the current code?  Then the comparison would be "we spend time
doing diff-delta once right now but instead reuse an existing one
(even though we do not bother recompute the offset and instead use
ref-delta)".

Taylor Blau Oct. 10, 2024, 5:07 p.m. UTC | #2

On Thu, Oct 10, 2024 at 09:46:00AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > This patch series enables more objects to be candidates for verbatim
> > reuse when generating a pack with the aide of reachability bitmaps.
> >
> > By the end of this series, two new classes of objects are now reuse
> > candidates which were not before. They are:
> >
> >   - Cross-pack deltas. In multi-pack bitmaps, if the delta and base
> >     were selected from different packs, the delta was not reusable.
>
> Hmph.  Suppose that you need to send object X, you happen to have X
> as a ofs-delta against Y, but Y may appear in multiple packs.
>
> Even if the copy of Y you are going to send together with X is from
> the same packfile, you may not be sending all the objects between X
> and Y in the original local packfile, so you would need to recompute
> the offset to give ofs-delta X to the distance between X and Y in
> the resulting packstream, no?

That's right. When doing verbatim pack reuse, we mark reused "chunks"
(c.f. 'struct reused_chunk'), where each chunk records the offset of the
beginning of the chunk in the source pack, and the difference between
that value and the corresponding byte in the assembled pack.

So suppose like in your example we have X and Y from the same pack, and
we are sending both objects, but not necessarily every object in between
the two from the source pack. In that case, X and Y will end up in
different chunks.

When we go to actually rewrite the OFS_DELTA for X, we'll compute:

    off_t fixup = find_reused_offset(X) - find_reused_offset(Y);

(in the code, X and Y are actually the offsets for each object in the
source pack, but I'm using the object names here for clarity).

When there is a non-zero fixup value, we'll patch the OFS_DELTA to
account for the gap between it and its base object. The details of how
that is done are in builtin/pack-objects.c::write_reused_pack_one(), in
the 'if (fixup) { ... }' block.

> So when you pick the copy of Y out of another pack, what's so
> different?  After emitting Y to the resulting pack stream (and
> remembering where in the packstream you did so), when it is X's turn
> to be emitted, shouldn't you be able to compute the distance in the
> resulting packstream to represent X as an ofs-delta against Y, which
> should already be happening when you had both X and Y in the same
> original pack?

Good question. The difference is that if you're reusing X and Y from
same pack, you know that Y occurs some number of bytes *before* X in the
resulting pack.

But if Y comes from a different pack, it may get pushed further back in
the MIDX pseudo-pack order. So in that case the assembled pack may list
X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
deltas require that the base object appears first.

> >   - Thin deltas. In both single- and multi-pack bitmaps, we did not
> >     consider reusing deltas whose base object appears in the 'haves'
> >     bitmap.
>
> I hope this optimization does not kick in unless the receiving end
> is prepared to do "index-pack --fix-thin".

It does. We only allow computing "thin deltas" when pack-objects was
invoked with `--thin`.

> I've never thought about this specifically, but it is interesting to
> realize that by definition "thin" deltas cannot be ofs-deltas.

Yep.

> > Of course, REF_DELTAs have a less compact representation than
> > OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> > slightly larger pack.
>
> Is comparing ref- vs ofs- delta a meaningful thing to do in the
> context of this series?
>
> What does the current code without these patches do in the same
> situation?  Give up on reusing the existing delta and then?  If we
> send the base representation instead, the comparison is "we
> currently do not use delta, but with this change we can reuse delta
> (even though we do not bother recompute the offset and instead use
> ref-delta)".

The current code does not reuse deltas when we're either (a) not sending
the base, (b) we are sending the base, but it's in a different pack, or
(c) both. When any of those conditions are met, we do not reuse the
existing on-disk representation of the delta object verbatim, and
instead kick it back to the normal pack generation routine.

That might result in us finding a new base, or applying sending the
object's contents as a standalone object.

> Do we recompute the delta on the fly and show it as an ofs-delta
> with the current code?  Then the comparison would be "we spend time
> doing diff-delta once right now but instead reuse an existing one
> (even though we do not bother recompute the offset and instead use
> ref-delta)".

Fair.

Thanks,
Taylor

Junio C Hamano Oct. 10, 2024, 8:20 p.m. UTC | #3

Taylor Blau <me@ttaylorr.com> writes:

>> So when you pick the copy of Y out of another pack, what's so
>> different?  After emitting Y to the resulting pack stream (and
>> remembering where in the packstream you did so), when it is X's turn
>> to be emitted, shouldn't you be able to compute the distance in the
>> resulting packstream to represent X as an ofs-delta against Y, which
>> should already be happening when you had both X and Y in the same
>> original pack?
>
> Good question. The difference is that if you're reusing X and Y from
> same pack, you know that Y occurs some number of bytes *before* X in the
> resulting pack.
>
> But if Y comes from a different pack, it may get pushed further back in
> the MIDX pseudo-pack order. So in that case the assembled pack may list
> X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> deltas require that the base object appears first.

That is what we have always done even before we started bitmap based
optimization.  If we happen to write Y before X, we consider doing
ofs-delta for X, but otherwise we do ref-delta for X.  We do reorder
fairly late in the pipeline when we notice that X that we are about
to write out depends on Y that we haven't emitted to avoid this,
though.  All of that the bitmap-based optimization code path should
be able to imitate, I would think.

Taylor Blau Oct. 10, 2024, 8:32 p.m. UTC | #4

On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >> So when you pick the copy of Y out of another pack, what's so
> >> different?  After emitting Y to the resulting pack stream (and
> >> remembering where in the packstream you did so), when it is X's turn
> >> to be emitted, shouldn't you be able to compute the distance in the
> >> resulting packstream to represent X as an ofs-delta against Y, which
> >> should already be happening when you had both X and Y in the same
> >> original pack?
> >
> > Good question. The difference is that if you're reusing X and Y from
> > same pack, you know that Y occurs some number of bytes *before* X in the
> > resulting pack.
> >
> > But if Y comes from a different pack, it may get pushed further back in
> > the MIDX pseudo-pack order. So in that case the assembled pack may list
> > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> > deltas require that the base object appears first.
>
> That is what we have always done even before we started bitmap based
> optimization.  If we happen to write Y before X, we consider doing
> ofs-delta for X, but otherwise we do ref-delta for X.  We do reorder
> fairly late in the pipeline when we notice that X that we are about
> to write out depends on Y that we haven't emitted to avoid this,
> though.  All of that the bitmap-based optimization code path should
> be able to imitate, I would think.

I see..., so yeah: before this patch series we would have not processed
X via the bitmap-powered pack reuse mechanism, and after this series we
will.

Thanks,
Taylor

Jeff King Oct. 11, 2024, 7:54 a.m. UTC | #5

On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote:

> Taylor Blau <me@ttaylorr.com> writes:
> 
> >> So when you pick the copy of Y out of another pack, what's so
> >> different?  After emitting Y to the resulting pack stream (and
> >> remembering where in the packstream you did so), when it is X's turn
> >> to be emitted, shouldn't you be able to compute the distance in the
> >> resulting packstream to represent X as an ofs-delta against Y, which
> >> should already be happening when you had both X and Y in the same
> >> original pack?
> >
> > Good question. The difference is that if you're reusing X and Y from
> > same pack, you know that Y occurs some number of bytes *before* X in the
> > resulting pack.
> >
> > But if Y comes from a different pack, it may get pushed further back in
> > the MIDX pseudo-pack order. So in that case the assembled pack may list
> > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
> > deltas require that the base object appears first.
> 
> That is what we have always done even before we started bitmap based
> optimization.  If we happen to write Y before X, we consider doing
> ofs-delta for X, but otherwise we do ref-delta for X.  We do reorder
> fairly late in the pipeline when we notice that X that we are about
> to write out depends on Y that we haven't emitted to avoid this,
> though.  All of that the bitmap-based optimization code path should
> be able to imitate, I would think.

A small nitpick on your final sentence here. As you note, we do not ever
write Y before X, because compute_write_order() always places bases
before their deltas in the output pack (and we do not allow cycles of
deltas, of course).

And even with bitmaps we'd do the same, as long as those objects are
both fed to the regular pack-writing machinery.

It is only the special verbatim-pack-reuse[1] code that is trying to
blit out the start of an existing pack that is affected. And in theory
there it _could_ try to reorder to produce an ofs delta, but in practice
the whole point is to take a single very cheap pass over the start of
the pack (or multiple packs in the case of the midx). Doing any
reordering would be counterproductive to the "cheap" adjective there (it
does not even keep a list of object ids it is sending), so we are better
to leave those objects for the regular output code (which does make such
a list).

Taylor's series introduces an in-between where we choose not to reorder,
but switch to REF_DELTA. That is still cheap on CPU on the generating
side, though the resulting pack is slightly larger.

-Peff

[1] I wish we had good names to distinguish the various cases, because
    the term "reuse" is kind of overloaded. The "slower" regular
    object-sending path may still reuse verbatim bytes found in an
    on-disk path. But this "blit out matching parts of a pack without
    otherwise considering the objects" feature happens outside of that.
    We called it "pack reuse" back in 2013, but that was not a good name
    even then. I don't have a good suggestion, though.

Jeff King Oct. 11, 2024, 8:01 a.m. UTC | #6

On Fri, Oct 11, 2024 at 03:54:51AM -0400, Jeff King wrote:

> [1] I wish we had good names to distinguish the various cases, because
>     the term "reuse" is kind of overloaded. The "slower" regular
>     object-sending path may still reuse verbatim bytes found in an
>     on-disk path. But this "blit out matching parts of a pack without
>     otherwise considering the objects" feature happens outside of that.
>     We called it "pack reuse" back in 2013, but that was not a good name
>     even then. I don't have a good suggestion, though.

Actually, confusing things more, there are really _three_ layers of
reuse:

  1. At the beginning of a pack, we can blit out the bytes for objects
     starting from the beginning of the pack that are being sent (we
     know any delta will be satisfied since its base comes before it).

  2. After that, we process objects one by one, but do so very cheaply
     by just deciding if we can blit them out one by one, fixing up
     delta base offsets to account for gaps.

  3. Otherwise, we generate an object_entry struct in packing_data for
     them, try to find new deltas, and so on. We may then reuse the
     on-disk bytes after deciding they're suitable.

We call all of these "reuse", and certainly both (1) and (2) are "pack
reuse", but I think that term is sufficiently vague that it could apply
to (3) as well.

-Peff

Jeff King Oct. 11, 2024, 8:38 a.m. UTC | #7

On Wed, Oct 09, 2024 at 04:30:52PM -0400, Taylor Blau wrote:

> The first optimization (cross-pack deltas) should help clones and
> fetches, but the second optimization (thin deltas) will only help
> fetches, since the 'haves' bitmap will necessarily be empty for
> clones.
> 
> Of course, REF_DELTAs have a less compact representation than
> OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> slightly larger pack. But we're only talking about a handful of extra
> bytes, and network bandwidth is pretty cheap, so I think the
> optimization is worthwhile here too.

It would be nice to see some numbers, even simulated ones from t/perf.
Of course we'd like to show off any reduced server CPU for generating a
fetch response. But I'd also like to see if the extra steps to find the
cross-pack bases have any measurable negative effect (so the ideal there
would be a big midx repo that doesn't have a lot of those bases, as it
would not be helped much by the optimization and would be hurt by the
time spent trying to apply it).

-Peff

Junio C Hamano Oct. 11, 2024, 4:23 p.m. UTC | #8

Jeff King <peff@peff.net> writes:

> On Fri, Oct 11, 2024 at 03:54:51AM -0400, Jeff King wrote:
>
>> [1] I wish we had good names to distinguish the various cases, because
>>     the term "reuse" is kind of overloaded. The "slower" regular
>>     object-sending path may still reuse verbatim bytes found in an
>>     on-disk path. But this "blit out matching parts of a pack without
>>     otherwise considering the objects" feature happens outside of that.
>>     We called it "pack reuse" back in 2013, but that was not a good name
>>     even then. I don't have a good suggestion, though.
>
> Actually, confusing things more, there are really _three_ layers of
> reuse:
>
>   1. At the beginning of a pack, we can blit out the bytes for objects
>      starting from the beginning of the pack that are being sent (we
>      know any delta will be satisfied since its base comes before it).

Yes, I wouldn't be worried about that one.  The data encoded as an
ofs-delta in this section already point at their base correctly in
the original pack, and in the resulting pack.

>   2. After that, we process objects one by one, but do so very cheaply
>      by just deciding if we can blit them out one by one, fixing up
>      delta base offsets to account for gaps.

This is the part I said "we have to remember where the base was emitted
and subtract it from the offset of the delta anyway even if we are
reusing delta from the same pack, so what do we need a separate code path
for this?" in my initial response.

I guess, "fixing up" could be done by using the difference between
offsets in the original pack for this step, which would be an
unfortunate design that prevents it from getting reused.

>   3. Otherwise, we generate an object_entry struct in packing_data for
>      them, try to find new deltas, and so on. We may then reuse the
>      on-disk bytes after deciding they're suitable.

It is a bit unfortunate, if we were to trust the existing delta base
selection in the pack like we did since Feb 2006 [*], we should be
omitting the "try to find new deltas" step.  Perhaps that comes for
free as the object_entry knows that our object has a delta_base?

> We call all of these "reuse", and certainly both (1) and (2) are "pack
> reuse", but I think that term is sufficiently vague that it could apply
> to (3) as well.
>
> -Peff

[00/11] pack-bitmap: convert offset to ref deltas where possible

Message

Comments