Message ID | 20240925072021.77078-1-hanyang.tony@bytedance.com (mailing list archive) |
---|---|
Headers | show |
Series | repack: pack everything into promisor packfile in partial repos | expand |
Hi Han On 25/09/2024 08:20, Han Young wrote: > As suggested by Jonathan[1], there are number of ways to fix this issue. > We have already explored some of them in this thread, and so far none of them > is satisfiable. Calvin and I tried to address the problem from fetch-pack side > and rev-list side. But the fix either consumes too much CPU power or results > in inefficient bandwidth use. I was wondering if it would be possible to cache the tip commits in promisor packs when repacking so that a subsequent repack only has to walk the commits added since the last repack when it is trying to figure out if a local object should be moved into a promisor pack. > So let's attack the problem from repack side. The goal is to prevent repack > from discarding local objects, previously it is done by carefully > separating promisor objects and normal objects in rev-list. > The implementation is flawed and no solution have been found so far. > Instead, we can get ride of rev-list and just pack everything into promisor > files. This way, no objects would be lost. > > By using 'repack everything', repacking requires less work and we are not > using more bandwidth. The only downside is normal objects packing does not > benefiting from the history and path based delta calculation. I've just been looking at Documentation/technical/partial-clone.txt and I think there are a couple of other implications of this change > An object may be missing due to a partial clone or fetch, or missing > due to repository corruption. To differentiate these cases, the > local repository specially indicates such filtered packfiles > obtained from promisor remotes as "promisor packfiles". Packing local objects into promisor packfiles means that it is no longer possible to detect if an object is missing due to repository corruption or because we need to fetch it from a promisor remote. > `repack` in GC has been updated to not touch promisor packfiles at > all, and to only repack other objects. Packing local objects into promisor packfiles means that GC will no-longer remove unreachable local objects. It would be helpful if the cover letter or commit messages discussed the tradeoffs of these changes and updated that document accordingly. Best Wishes Phillip > Majority of > objects in a partial repo is promisor objects, so the impact of worse normal > objects repacking is negligible. > > [1] https://lore.kernel.org/git/20240813004508.2768102-1-jonathantanmy@google.com/ > > Han Young (2): > repack: pack everything into promisor packfile in partial repos > t0410: adapt tests to repack changes > > builtin/repack.c | 258 ++++++++++++++++++++++----------------- > t/t0410-partial-clone.sh | 68 +---------- > 2 files changed, 145 insertions(+), 181 deletions(-) >
Phillip Wood <phillip.wood123@gmail.com> writes: > I was wondering if it would be possible to cache the tip commits in > promisor packs when repacking so that a subsequent repack only has to > walk the commits added since the last repack when it is trying to > figure out if a local object should be moved into a promisor pack. I was wondering the same thing. If packfiles (and bundles) record the entry points and the exit points of the DAG, it would help quite a bit. > It would be helpful if the cover letter or commit messages discussed > the tradeoffs of these changes and updated that document accordingly. I like the suggestion very much. Thanks for a review.
Han Young <hanyang.tony@bytedance.com> writes: > By using 'repack everything', repacking requires less work and we are not > using more bandwidth. The only downside is normal objects packing does not > benefiting from the history and path based delta calculation. Majority of > objects in a partial repo is promisor objects, so the impact of worse normal > objects repacking is negligible. There is an important assumption that any objects in promisor packs *and* any objects that are (directly or indirectly) referenced by these objects in promisor packs can safely be expunged from the local object store because they can be later fetched again from the promisor remote. In that (in)famous failure case topology of the history: commit tree blob C3 ---- T3 -- B3 (fetched from remote, in promisor pack) | C2 ---- T2 -- B2 (created locally, in non-promisor pack) | C1 ---- T1 -- B1 (fetched from remote, in promisor pack) even though the objects associated with the commit C2 are created locally, the fact that C3 in promisor pack references it alone is sufficient for us to also assume that these "locally created" are now refetchable from the promisor remote that gave us C3, hence it is safe to repack the history leading to C3 and all objects involved in the history and mark the resulting pack(s) promisor packs. OK. That sounds workable, but aren't there downsides? Thanks for working on this topic.