Message ID | c3868c7df92484f0527ce500ad1156275be334e8.1612411124.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | repack: support repacking into a geometric sequence | expand |
On Wed, Feb 03, 2021 at 10:59:17PM -0500, Taylor Blau wrote: > @@ -1209,22 +1210,73 @@ static int want_found_object(int exclude, struct packed_git *p) > * Otherwise, we signal "-1" at the end to tell the caller that we do > * not know either way, and it needs to check more packs. > */ > - if (!ignore_packed_keep_on_disk && > - !ignore_packed_keep_in_core && > - (!local || !have_non_local_packs)) > + > + /* > + * Handle .keep first, as we have a fast(er) path there. > + */ > + if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) { > + /* > + * Set the flags for the kept-pack cache to be the ones we want > + * to ignore. > + * > + * That is, if we are ignoring objects in on-disk keep packs, > + * then we want to search through the on-disk keep and ignore > + * the in-core ones. > + */ > + unsigned flags = 0; > + if (ignore_packed_keep_on_disk) > + flags |= ON_DISK_KEEP_PACKS; > + if (ignore_packed_keep_in_core) > + flags |= IN_CORE_KEEP_PACKS; > + > + if (ignore_packed_keep_on_disk && p->pack_keep) > + return 0; > + if (ignore_packed_keep_in_core && p->pack_keep_in_core) > + return 0; > + if (has_object_kept_pack(oid, flags)) > + return 0; > + } > + > + /* > + * At this point we know definitively that either we don't care about > + * keep-packs, or the object is not in one. Keep checking other > + * conditions... > + */ > + > + if (!local || !have_non_local_packs) > return 1; > - > if (local && !p->pack_local) > return 0; > - if (p->pack_local && > - ((ignore_packed_keep_on_disk && p->pack_keep) || > - (ignore_packed_keep_in_core && p->pack_keep_in_core))) > - return 0; > > /* we don't know yet; keep looking for more packs */ > return -1; I know I wrote this patch, but just looking it over again with a critical eye: it looks like more re-ordering could avoid work in some cases. In particular, has_object_kept_pack() is a potentially expensive call. But if "(local && !p->pack_local)" is true, then we could cheaply exit the function with "0", regardless of what the keep requirement says. That's not a case that I think anybody cares that deeply about (and it certainly is not covered by t/perf). But I think it does regress in this patch. Prior to the patch, we'd check that condition before returning -1, and it was the caller who would then continue to search through all the kept packs. Now we do it preemptively. I think just bumping that: if (local && !p->pack_local) return 0; above the new code would fix it. Or to lay out the logic more fully, the order of checks should be: - does _this_ pack we found the object in disqualify it. If so, we can cheaply return 0. And that applies to both keep and local rules. - otherwise, check all packs via has_object_kept_pack(), which is cheaper than continuing to iterate through all packs by returning -1. - once we know definitively about keep-packs, then check any shortcuts related to local packs (like !have_non_local_packs) - and then if no shortcuts, we return -1 I think that might be easier to express by rewriting the patch. :) -Peff
On Wed, Feb 17, 2021 at 11:05:22AM -0500, Jeff King wrote: > I think just bumping that: > > if (local && !p->pack_local) > return 0; > above the new code would fix it. Or to lay out the logic more fully, the > order of checks should be: > - does _this_ pack we found the object in disqualify it. If so, we can > cheaply return 0. And that applies to both keep and local rules. > > - otherwise, check all packs via has_object_kept_pack(), which is > cheaper than continuing to iterate through all packs by returning > -1. > > - once we know definitively about keep-packs, then check any shortcuts > related to local packs (like !have_non_local_packs) > > - and then if no shortcuts, we return -1 I don't understand what you're suggesting. Is the (local && !p->pack_local) a disqualifying condition? Reading the comment, I think it is, and so we could do something like: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 36c2fa3aff..be3ba60bc2 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1205,14 +1205,21 @@ static int want_found_object(const struct object_id *oid, int exclude, * make sure no copy of this object appears in _any_ pack that makes us * to omit the object, so we need to check all the packs. * - * We can however first check whether these options can possible matter; + * We can however first check whether these options can possibly matter; * if they do not matter we know we want the object in generated pack. * Otherwise, we signal "-1" at the end to tell the caller that we do * not know either way, and it needs to check more packs. */ /* - * Handle .keep first, as we have a fast(er) path there. + * Objects in packs borrowed from elsewhere are discarded regardless of + * if they appear in other packs that weren't borrowed. + */ + if (local && !p->pack_local) + return 0; + + /* + * Then handle .keep first, as we have a fast(er) path there. */ if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) { /* @@ -1242,11 +1249,8 @@ static int want_found_object(const struct object_id *oid, int exclude, * keep-packs, or the object is not in one. Keep checking other * conditions... */ - if (!local || !have_non_local_packs) return 1; - if (local && !p->pack_local) - return 0; /* we don't know yet; keep looking for more packs */ return -1; But your "check any shortcuts related to local packs" makes me think that we should leave the code as-is. Which are you suggesting? Thanks, Taylor
On Wed, Feb 17, 2021 at 02:23:27PM -0500, Taylor Blau wrote: > On Wed, Feb 17, 2021 at 11:05:22AM -0500, Jeff King wrote: > > I think just bumping that: > > > > if (local && !p->pack_local) > > return 0; > > > above the new code would fix it. Or to lay out the logic more fully, the > > order of checks should be: > > > - does _this_ pack we found the object in disqualify it. If so, we can > > cheaply return 0. And that applies to both keep and local rules. > > > > - otherwise, check all packs via has_object_kept_pack(), which is > > cheaper than continuing to iterate through all packs by returning > > -1. > > > > - once we know definitively about keep-packs, then check any shortcuts > > related to local packs (like !have_non_local_packs) > > > > - and then if no shortcuts, we return -1 > > I don't understand what you're suggesting. Is the (local && > !p->pack_local) a disqualifying condition? Reading the comment, I think > it is, and so we could do something like: That's exactly what I'm suggesting. If we have a non-local pack and were given --local, then we can shortcut immediately without caring about kept packs: we know that we do not want the object. > [...] > But your "check any shortcuts related to local packs" makes me think > that we should leave the code as-is. No, the "shortcuts" there is the opposite: if (!local || !have_non_local_packs) return 1; If either of those is true, we can say "definitely include" but only with respect to the --local requirement. So we _can't_ bump that up, but must check it only after we've definitively resolved the keep-pack requirement. -Peff
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 6d19eb000a..fbd7b54d70 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1188,7 +1188,8 @@ static int have_duplicate_entry(const struct object_id *oid, return 1; } -static int want_found_object(int exclude, struct packed_git *p) +static int want_found_object(const struct object_id *oid, int exclude, + struct packed_git *p) { if (exclude) return 1; @@ -1209,22 +1210,73 @@ static int want_found_object(int exclude, struct packed_git *p) * Otherwise, we signal "-1" at the end to tell the caller that we do * not know either way, and it needs to check more packs. */ - if (!ignore_packed_keep_on_disk && - !ignore_packed_keep_in_core && - (!local || !have_non_local_packs)) + + /* + * Handle .keep first, as we have a fast(er) path there. + */ + if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) { + /* + * Set the flags for the kept-pack cache to be the ones we want + * to ignore. + * + * That is, if we are ignoring objects in on-disk keep packs, + * then we want to search through the on-disk keep and ignore + * the in-core ones. + */ + unsigned flags = 0; + if (ignore_packed_keep_on_disk) + flags |= ON_DISK_KEEP_PACKS; + if (ignore_packed_keep_in_core) + flags |= IN_CORE_KEEP_PACKS; + + if (ignore_packed_keep_on_disk && p->pack_keep) + return 0; + if (ignore_packed_keep_in_core && p->pack_keep_in_core) + return 0; + if (has_object_kept_pack(oid, flags)) + return 0; + } + + /* + * At this point we know definitively that either we don't care about + * keep-packs, or the object is not in one. Keep checking other + * conditions... + */ + + if (!local || !have_non_local_packs) return 1; - if (local && !p->pack_local) return 0; - if (p->pack_local && - ((ignore_packed_keep_on_disk && p->pack_keep) || - (ignore_packed_keep_in_core && p->pack_keep_in_core))) - return 0; /* we don't know yet; keep looking for more packs */ return -1; } +static int want_object_in_pack_one(struct packed_git *p, + const struct object_id *oid, + int exclude, + struct packed_git **found_pack, + off_t *found_offset) +{ + off_t offset; + + if (p == *found_pack) + offset = *found_offset; + else + offset = find_pack_entry_one(oid->hash, p); + + if (offset) { + if (!*found_pack) { + if (!is_pack_valid(p)) + return -1; + *found_offset = offset; + *found_pack = p; + } + return want_found_object(oid, exclude, p); + } + return -1; +} + /* * Check whether we want the object in the pack (e.g., we do not want * objects found in non-local stores if the "--local" option was used). @@ -1252,7 +1304,7 @@ static int want_object_in_pack(const struct object_id *oid, * are present we will determine the answer right now. */ if (*found_pack) { - want = want_found_object(exclude, *found_pack); + want = want_found_object(oid, exclude, *found_pack); if (want != -1) return want; } @@ -1260,53 +1312,22 @@ static int want_object_in_pack(const struct object_id *oid, for (m = get_multi_pack_index(the_repository); m; m = m->next) { struct pack_entry e; if (fill_midx_entry(the_repository, oid, &e, m)) { - struct packed_git *p = e.p; - off_t offset; - - if (p == *found_pack) - offset = *found_offset; - else - offset = find_pack_entry_one(oid->hash, p); - - if (offset) { - if (!*found_pack) { - if (!is_pack_valid(p)) - continue; - *found_offset = offset; - *found_pack = p; - } - want = want_found_object(exclude, p); - if (want != -1) - return want; - } - } - } - - list_for_each(pos, get_packed_git_mru(the_repository)) { - struct packed_git *p = list_entry(pos, struct packed_git, mru); - off_t offset; - - if (p == *found_pack) - offset = *found_offset; - else - offset = find_pack_entry_one(oid->hash, p); - - if (offset) { - if (!*found_pack) { - if (!is_pack_valid(p)) - continue; - *found_offset = offset; - *found_pack = p; - } - want = want_found_object(exclude, p); - if (!exclude && want > 0) - list_move(&p->mru, - get_packed_git_mru(the_repository)); + want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset); if (want != -1) return want; } } + list_for_each(pos, get_packed_git_mru(the_repository)) { + struct packed_git *p = list_entry(pos, struct packed_git, mru); + want = want_object_in_pack_one(p, oid, exclude, found_pack, found_offset); + if (!exclude && want > 0) + list_move(&p->mru, + get_packed_git_mru(the_repository)); + if (want != -1) + return want; + } + if (uri_protocols.nr) { struct configured_exclusion *ex = oidmap_get(&configured_exclusions, oid);