Message ID | cover.1728505840.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | pack-bitmap: convert offset to ref deltas where possible | expand |
Taylor Blau <me@ttaylorr.com> writes: > This patch series enables more objects to be candidates for verbatim > reuse when generating a pack with the aide of reachability bitmaps. > > By the end of this series, two new classes of objects are now reuse > candidates which were not before. They are: > > - Cross-pack deltas. In multi-pack bitmaps, if the delta and base > were selected from different packs, the delta was not reusable. Hmph. Suppose that you need to send object X, you happen to have X as a ofs-delta against Y, but Y may appear in multiple packs. Even if the copy of Y you are going to send together with X is from the same packfile, you may not be sending all the objects between X and Y in the original local packfile, so you would need to recompute the offset to give ofs-delta X to the distance between X and Y in the resulting packstream, no? So when you pick the copy of Y out of another pack, what's so different? After emitting Y to the resulting pack stream (and remembering where in the packstream you did so), when it is X's turn to be emitted, shouldn't you be able to compute the distance in the resulting packstream to represent X as an ofs-delta against Y, which should already be happening when you had both X and Y in the same original pack? > - Thin deltas. In both single- and multi-pack bitmaps, we did not > consider reusing deltas whose base object appears in the 'haves' > bitmap. I hope this optimization does not kick in unless the receiving end is prepared to do "index-pack --fix-thin". I've never thought about this specifically, but it is interesting to realize that by definition "thin" deltas cannot be ofs-deltas. > Of course, REF_DELTAs have a less compact representation than > OFS_DELTAs, so the resulting packs will trade off some CPU time for a > slightly larger pack. Is comparing ref- vs ofs- delta a meaningful thing to do in the context of this series? What does the current code without these patches do in the same situation? Give up on reusing the existing delta and then? If we send the base representation instead, the comparison is "we currently do not use delta, but with this change we can reuse delta (even though we do not bother recompute the offset and instead use ref-delta)". Do we recompute the delta on the fly and show it as an ofs-delta with the current code? Then the comparison would be "we spend time doing diff-delta once right now but instead reuse an existing one (even though we do not bother recompute the offset and instead use ref-delta)".
On Thu, Oct 10, 2024 at 09:46:00AM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > This patch series enables more objects to be candidates for verbatim > > reuse when generating a pack with the aide of reachability bitmaps. > > > > By the end of this series, two new classes of objects are now reuse > > candidates which were not before. They are: > > > > - Cross-pack deltas. In multi-pack bitmaps, if the delta and base > > were selected from different packs, the delta was not reusable. > > Hmph. Suppose that you need to send object X, you happen to have X > as a ofs-delta against Y, but Y may appear in multiple packs. > > Even if the copy of Y you are going to send together with X is from > the same packfile, you may not be sending all the objects between X > and Y in the original local packfile, so you would need to recompute > the offset to give ofs-delta X to the distance between X and Y in > the resulting packstream, no? That's right. When doing verbatim pack reuse, we mark reused "chunks" (c.f. 'struct reused_chunk'), where each chunk records the offset of the beginning of the chunk in the source pack, and the difference between that value and the corresponding byte in the assembled pack. So suppose like in your example we have X and Y from the same pack, and we are sending both objects, but not necessarily every object in between the two from the source pack. In that case, X and Y will end up in different chunks. When we go to actually rewrite the OFS_DELTA for X, we'll compute: off_t fixup = find_reused_offset(X) - find_reused_offset(Y); (in the code, X and Y are actually the offsets for each object in the source pack, but I'm using the object names here for clarity). When there is a non-zero fixup value, we'll patch the OFS_DELTA to account for the gap between it and its base object. The details of how that is done are in builtin/pack-objects.c::write_reused_pack_one(), in the 'if (fixup) { ... }' block. > So when you pick the copy of Y out of another pack, what's so > different? After emitting Y to the resulting pack stream (and > remembering where in the packstream you did so), when it is X's turn > to be emitted, shouldn't you be able to compute the distance in the > resulting packstream to represent X as an ofs-delta against Y, which > should already be happening when you had both X and Y in the same > original pack? Good question. The difference is that if you're reusing X and Y from same pack, you know that Y occurs some number of bytes *before* X in the resulting pack. But if Y comes from a different pack, it may get pushed further back in the MIDX pseudo-pack order. So in that case the assembled pack may list X before Y, in which case X cannot be an OFS_DELTA of Y, since offset deltas require that the base object appears first. > > - Thin deltas. In both single- and multi-pack bitmaps, we did not > > consider reusing deltas whose base object appears in the 'haves' > > bitmap. > > I hope this optimization does not kick in unless the receiving end > is prepared to do "index-pack --fix-thin". It does. We only allow computing "thin deltas" when pack-objects was invoked with `--thin`. > I've never thought about this specifically, but it is interesting to > realize that by definition "thin" deltas cannot be ofs-deltas. Yep. > > Of course, REF_DELTAs have a less compact representation than > > OFS_DELTAs, so the resulting packs will trade off some CPU time for a > > slightly larger pack. > > Is comparing ref- vs ofs- delta a meaningful thing to do in the > context of this series? > > What does the current code without these patches do in the same > situation? Give up on reusing the existing delta and then? If we > send the base representation instead, the comparison is "we > currently do not use delta, but with this change we can reuse delta > (even though we do not bother recompute the offset and instead use > ref-delta)". The current code does not reuse deltas when we're either (a) not sending the base, (b) we are sending the base, but it's in a different pack, or (c) both. When any of those conditions are met, we do not reuse the existing on-disk representation of the delta object verbatim, and instead kick it back to the normal pack generation routine. That might result in us finding a new base, or applying sending the object's contents as a standalone object. > Do we recompute the delta on the fly and show it as an ofs-delta > with the current code? Then the comparison would be "we spend time > doing diff-delta once right now but instead reuse an existing one > (even though we do not bother recompute the offset and instead use > ref-delta)". Fair. Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: >> So when you pick the copy of Y out of another pack, what's so >> different? After emitting Y to the resulting pack stream (and >> remembering where in the packstream you did so), when it is X's turn >> to be emitted, shouldn't you be able to compute the distance in the >> resulting packstream to represent X as an ofs-delta against Y, which >> should already be happening when you had both X and Y in the same >> original pack? > > Good question. The difference is that if you're reusing X and Y from > same pack, you know that Y occurs some number of bytes *before* X in the > resulting pack. > > But if Y comes from a different pack, it may get pushed further back in > the MIDX pseudo-pack order. So in that case the assembled pack may list > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset > deltas require that the base object appears first. That is what we have always done even before we started bitmap based optimization. If we happen to write Y before X, we consider doing ofs-delta for X, but otherwise we do ref-delta for X. We do reorder fairly late in the pipeline when we notice that X that we are about to write out depends on Y that we haven't emitted to avoid this, though. All of that the bitmap-based optimization code path should be able to imitate, I would think.
On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > >> So when you pick the copy of Y out of another pack, what's so > >> different? After emitting Y to the resulting pack stream (and > >> remembering where in the packstream you did so), when it is X's turn > >> to be emitted, shouldn't you be able to compute the distance in the > >> resulting packstream to represent X as an ofs-delta against Y, which > >> should already be happening when you had both X and Y in the same > >> original pack? > > > > Good question. The difference is that if you're reusing X and Y from > > same pack, you know that Y occurs some number of bytes *before* X in the > > resulting pack. > > > > But if Y comes from a different pack, it may get pushed further back in > > the MIDX pseudo-pack order. So in that case the assembled pack may list > > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset > > deltas require that the base object appears first. > > That is what we have always done even before we started bitmap based > optimization. If we happen to write Y before X, we consider doing > ofs-delta for X, but otherwise we do ref-delta for X. We do reorder > fairly late in the pipeline when we notice that X that we are about > to write out depends on Y that we haven't emitted to avoid this, > though. All of that the bitmap-based optimization code path should > be able to imitate, I would think. I see..., so yeah: before this patch series we would have not processed X via the bitmap-powered pack reuse mechanism, and after this series we will. Thanks, Taylor
On Thu, Oct 10, 2024 at 01:20:06PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > >> So when you pick the copy of Y out of another pack, what's so > >> different? After emitting Y to the resulting pack stream (and > >> remembering where in the packstream you did so), when it is X's turn > >> to be emitted, shouldn't you be able to compute the distance in the > >> resulting packstream to represent X as an ofs-delta against Y, which > >> should already be happening when you had both X and Y in the same > >> original pack? > > > > Good question. The difference is that if you're reusing X and Y from > > same pack, you know that Y occurs some number of bytes *before* X in the > > resulting pack. > > > > But if Y comes from a different pack, it may get pushed further back in > > the MIDX pseudo-pack order. So in that case the assembled pack may list > > X before Y, in which case X cannot be an OFS_DELTA of Y, since offset > > deltas require that the base object appears first. > > That is what we have always done even before we started bitmap based > optimization. If we happen to write Y before X, we consider doing > ofs-delta for X, but otherwise we do ref-delta for X. We do reorder > fairly late in the pipeline when we notice that X that we are about > to write out depends on Y that we haven't emitted to avoid this, > though. All of that the bitmap-based optimization code path should > be able to imitate, I would think. A small nitpick on your final sentence here. As you note, we do not ever write Y before X, because compute_write_order() always places bases before their deltas in the output pack (and we do not allow cycles of deltas, of course). And even with bitmaps we'd do the same, as long as those objects are both fed to the regular pack-writing machinery. It is only the special verbatim-pack-reuse[1] code that is trying to blit out the start of an existing pack that is affected. And in theory there it _could_ try to reorder to produce an ofs delta, but in practice the whole point is to take a single very cheap pass over the start of the pack (or multiple packs in the case of the midx). Doing any reordering would be counterproductive to the "cheap" adjective there (it does not even keep a list of object ids it is sending), so we are better to leave those objects for the regular output code (which does make such a list). Taylor's series introduces an in-between where we choose not to reorder, but switch to REF_DELTA. That is still cheap on CPU on the generating side, though the resulting pack is slightly larger. -Peff [1] I wish we had good names to distinguish the various cases, because the term "reuse" is kind of overloaded. The "slower" regular object-sending path may still reuse verbatim bytes found in an on-disk path. But this "blit out matching parts of a pack without otherwise considering the objects" feature happens outside of that. We called it "pack reuse" back in 2013, but that was not a good name even then. I don't have a good suggestion, though.
On Fri, Oct 11, 2024 at 03:54:51AM -0400, Jeff King wrote: > [1] I wish we had good names to distinguish the various cases, because > the term "reuse" is kind of overloaded. The "slower" regular > object-sending path may still reuse verbatim bytes found in an > on-disk path. But this "blit out matching parts of a pack without > otherwise considering the objects" feature happens outside of that. > We called it "pack reuse" back in 2013, but that was not a good name > even then. I don't have a good suggestion, though. Actually, confusing things more, there are really _three_ layers of reuse: 1. At the beginning of a pack, we can blit out the bytes for objects starting from the beginning of the pack that are being sent (we know any delta will be satisfied since its base comes before it). 2. After that, we process objects one by one, but do so very cheaply by just deciding if we can blit them out one by one, fixing up delta base offsets to account for gaps. 3. Otherwise, we generate an object_entry struct in packing_data for them, try to find new deltas, and so on. We may then reuse the on-disk bytes after deciding they're suitable. We call all of these "reuse", and certainly both (1) and (2) are "pack reuse", but I think that term is sufficiently vague that it could apply to (3) as well. -Peff
On Wed, Oct 09, 2024 at 04:30:52PM -0400, Taylor Blau wrote: > The first optimization (cross-pack deltas) should help clones and > fetches, but the second optimization (thin deltas) will only help > fetches, since the 'haves' bitmap will necessarily be empty for > clones. > > Of course, REF_DELTAs have a less compact representation than > OFS_DELTAs, so the resulting packs will trade off some CPU time for a > slightly larger pack. But we're only talking about a handful of extra > bytes, and network bandwidth is pretty cheap, so I think the > optimization is worthwhile here too. It would be nice to see some numbers, even simulated ones from t/perf. Of course we'd like to show off any reduced server CPU for generating a fetch response. But I'd also like to see if the extra steps to find the cross-pack bases have any measurable negative effect (so the ideal there would be a big midx repo that doesn't have a lot of those bases, as it would not be helped much by the optimization and would be hurt by the time spent trying to apply it). -Peff
Jeff King <peff@peff.net> writes: > On Fri, Oct 11, 2024 at 03:54:51AM -0400, Jeff King wrote: > >> [1] I wish we had good names to distinguish the various cases, because >> the term "reuse" is kind of overloaded. The "slower" regular >> object-sending path may still reuse verbatim bytes found in an >> on-disk path. But this "blit out matching parts of a pack without >> otherwise considering the objects" feature happens outside of that. >> We called it "pack reuse" back in 2013, but that was not a good name >> even then. I don't have a good suggestion, though. > > Actually, confusing things more, there are really _three_ layers of > reuse: > > 1. At the beginning of a pack, we can blit out the bytes for objects > starting from the beginning of the pack that are being sent (we > know any delta will be satisfied since its base comes before it). Yes, I wouldn't be worried about that one. The data encoded as an ofs-delta in this section already point at their base correctly in the original pack, and in the resulting pack. > 2. After that, we process objects one by one, but do so very cheaply > by just deciding if we can blit them out one by one, fixing up > delta base offsets to account for gaps. This is the part I said "we have to remember where the base was emitted and subtract it from the offset of the delta anyway even if we are reusing delta from the same pack, so what do we need a separate code path for this?" in my initial response. I guess, "fixing up" could be done by using the difference between offsets in the original pack for this step, which would be an unfortunate design that prevents it from getting reused. > 3. Otherwise, we generate an object_entry struct in packing_data for > them, try to find new deltas, and so on. We may then reuse the > on-disk bytes after deciding they're suitable. It is a bit unfortunate, if we were to trust the existing delta base selection in the pack like we did since Feb 2006 [*], we should be omitting the "try to find new deltas" step. Perhaps that comes for free as the object_entry knows that our object has a delta_base? > We call all of these "reuse", and certainly both (1) and (2) are "pack > reuse", but I think that term is sufficiently vague that it could apply > to (3) as well. > > -Peff