Message ID | 20c29d37f9c1ba1367145331d25dd27f966312cd.1675171759.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | d9fd674c8b26c376b37e02d974b92033acb99732 |
Headers | show |
Series | Bundle URIs V: creationToken heuristic for incremental fetches | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > Thus, the code in verify_bundle() has previously had the additional > check that all prerequisite commits are reachable from repository > references. This is done via a revision walk from all references, > stopping only if all prerequisite commits are discovered or all commits > are walked. This uses a custom walk to verify_bundle(). > > This check is more strict than what Git applies to fetched pack-files. > In the fetch case, Git guarantees that the new references are closed > under reachability by walking from the new references until walking > commits that are reachable from repository refs. This is done through > the well-used check_connected() method. > > To better align with the restrictions required by 'git fetch', > reimplement this check in verify_bundle() to use check_connected(). This > also simplifies the code significantly. As I often say, breaking repository faster is not the kind of performance gain we want to have in Git, and I am in favor of this iteration compared to the earlier version that mostly punted on ensuring the correctness (rather, it relied on assumption that "most of the time this should be OK"). > bundle.c | 75 ++++++++++++++++-------------------------- > t/t6020-bundle-misc.sh | 8 ++--- > 2 files changed, 33 insertions(+), 50 deletions(-) The diffstat is very pleasing to see. Let me read the postimage along aloud (preimage omitted). > diff --git a/bundle.c b/bundle.c > index 4ef7256aa11..76c3a904898 100644 > --- a/bundle.c > +++ b/bundle.c > @@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv) > /* Remember to update object flag allocation in object.h */ > #define PREREQ_MARK (1u<<16) > > +struct string_list_iterator { > + struct string_list *list; > + size_t cur; > +}; > + > +static const struct object_id *iterate_ref_map(void *cb_data) > +{ > + struct string_list_iterator *iter = cb_data; > + > + if (iter->cur >= iter->list->nr) > + return NULL; > + > + return iter->list->items[iter->cur++].util; > +} This is to let check_connected() collect all the prerequisite object names. OK. > int verify_bundle(struct repository *r, > struct bundle_header *header, > enum verify_bundle_flags flags) > { > /* > * Do fast check, then if any prereqs are missing then go line by line > * to be verbose about the errors > */ > struct string_list *p = &header->prerequisites; > + int i, ret = 0; > const char *message = _("Repository lacks these prerequisite commits:"); > + struct string_list_iterator iter = { > + .list = p, > + }; > + struct check_connected_options opts = { > + .quiet = 1, > + }; > > if (!r || !r->objects || !r->objects->odb) > return error(_("need a repository to verify a bundle")); > > for (i = 0; i < p->nr; i++) { > struct string_list_item *e = p->items + i; > const char *name = e->string; > struct object_id *oid = e->util; > struct object *o = parse_object(r, oid); > + if (o) > continue; > ret++; > if (flags & VERIFY_BUNDLE_QUIET) > continue; > if (ret == 1) > error("%s", message); > error("%s %s", oid_to_hex(oid), name); > } > + if (ret) > goto cleanup; The "quick fail" logic as before. Looking sensible. > > + if ((ret = check_connected(iterate_ref_map, &iter, &opts))) > + error(_("some prerequisite commits exist in the object store, " > + "but are not connected to the repository's history")); And then we let check_connected() to ensure that traversing from these prerequisite objects down to the DAG formed by existing refs will not die from missing objects. Makes sense. > + /* TODO: preserve this verbose language. */ I am lost -- aren't we preserving the BUNDLE_VERBOSE code below? > if (flags & VERIFY_BUNDLE_VERBOSE) { > diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh > index 38dbbf89155..7d40994991e 100755 > --- a/t/t6020-bundle-misc.sh > +++ b/t/t6020-bundle-misc.sh > @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' ' > # Verify should fail > test_must_fail git bundle verify \ > ../clone-from/tip.bundle 2>err && > - grep "Could not read $BAD_OID" err && > - grep "Failed to traverse parents of commit $TIP_OID" err && > + grep "some prerequisite commits .* are not connected" err && > + test_line_count = 1 err && > > # Unbundling should fail > test_must_fail git bundle unbundle \ > ../clone-from/tip.bundle 2>err && > - grep "Could not read $BAD_OID" err && > - grep "Failed to traverse parents of commit $TIP_OID" err > + grep "some prerequisite commits .* are not connected" err && > + test_line_count = 1 err > ) > ' Especially with the new test added in the previous step, we know we are not trading correctness off. Excellent. I wonder how much the performance hit does this version incur over the "not safe at all" version and over the "use custom and stricter-than-needed" version, by the way? Thanks.
On 1/31/2023 12:35 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >> + /* TODO: preserve this verbose language. */ > > I am lost -- aren't we preserving the BUNDLE_VERBOSE code below? Sorry, I put this in so I wouldn't forget to test that the verbose message sticks, but I did forget to remove it. >> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh >> index 38dbbf89155..7d40994991e 100755 >> --- a/t/t6020-bundle-misc.sh >> +++ b/t/t6020-bundle-misc.sh >> @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' ' >> # Verify should fail >> test_must_fail git bundle verify \ >> ../clone-from/tip.bundle 2>err && >> - grep "Could not read $BAD_OID" err && >> - grep "Failed to traverse parents of commit $TIP_OID" err && >> + grep "some prerequisite commits .* are not connected" err && >> + test_line_count = 1 err && >> >> # Unbundling should fail >> test_must_fail git bundle unbundle \ >> ../clone-from/tip.bundle 2>err && >> - grep "Could not read $BAD_OID" err && >> - grep "Failed to traverse parents of commit $TIP_OID" err >> + grep "some prerequisite commits .* are not connected" err && >> + test_line_count = 1 err >> ) >> ' > > Especially with the new test added in the previous step, we know we > are not trading correctness off. Excellent. > > I wonder how much the performance hit does this version incur over > the "not safe at all" version and over the "use custom and > stricter-than-needed" version, by the way? I was able to simulate this in an extreme situation by taking a clone of the normal Git fork, creating a ref at every first parent, then creating a bundle of the difference between git/git and gitster/git. Finally, I compared the performance of 'git bundle verify' on Git compiled before this change and after this change: Benchmark 1: old Time (mean ± σ): 324.7 ms ± 7.5 ms [User: 228.0 ms, System: 95.7 ms] Range (min … max): 315.3 ms … 338.0 ms 10 runs Benchmark 2: new Time (mean ± σ): 331.2 ms ± 20.2 ms [User: 230.6 ms, System: 99.7 ms] Range (min … max): 302.8 ms … 360.2 ms 10 runs Summary 'old' ran 1.02 ± 0.07 times faster than 'new' So, there is a performance difference in the two situations, but it is very slight, in what I can detect. Of course, there is the case where the behavior differs because of the more permissible behavior, but I expect the old algorithm to be slower than the new case, because check_connected() can terminate with success (due to seeing all the prerequisite commits) faster than the old walk can terminate with failure (due to walking all reachable commits). Thanks, -Stolee
Derrick Stolee <derrickstolee@github.com> writes: > ... but I expect the old algorithm to be > slower than the new case, because check_connected() can terminate > with success (due to seeing all the prerequisite commits) faster > than the old walk can terminate with failure (due to walking all > reachable commits). That is good to hear. As long as we are not making it worse than before, I am happy as a clam ;-) Thanks.
diff --git a/bundle.c b/bundle.c index 4ef7256aa11..76c3a904898 100644 --- a/bundle.c +++ b/bundle.c @@ -12,6 +12,7 @@ #include "refs.h" #include "strvec.h" #include "list-objects-filter-options.h" +#include "connected.h" static const char v2_bundle_signature[] = "# v2 git bundle\n"; static const char v3_bundle_signature[] = "# v3 git bundle\n"; @@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv) /* Remember to update object flag allocation in object.h */ #define PREREQ_MARK (1u<<16) +struct string_list_iterator { + struct string_list *list; + size_t cur; +}; + +static const struct object_id *iterate_ref_map(void *cb_data) +{ + struct string_list_iterator *iter = cb_data; + + if (iter->cur >= iter->list->nr) + return NULL; + + return iter->list->items[iter->cur++].util; +} + int verify_bundle(struct repository *r, struct bundle_header *header, enum verify_bundle_flags flags) @@ -196,26 +212,25 @@ int verify_bundle(struct repository *r, * to be verbose about the errors */ struct string_list *p = &header->prerequisites; - struct rev_info revs = REV_INFO_INIT; - const char *argv[] = {NULL, "--all", NULL}; - struct commit *commit; - int i, ret = 0, req_nr; + int i, ret = 0; const char *message = _("Repository lacks these prerequisite commits:"); + struct string_list_iterator iter = { + .list = p, + }; + struct check_connected_options opts = { + .quiet = 1, + }; if (!r || !r->objects || !r->objects->odb) return error(_("need a repository to verify a bundle")); - repo_init_revisions(r, &revs, NULL); for (i = 0; i < p->nr; i++) { struct string_list_item *e = p->items + i; const char *name = e->string; struct object_id *oid = e->util; struct object *o = parse_object(r, oid); - if (o) { - o->flags |= PREREQ_MARK; - add_pending_object(&revs, o, name); + if (o) continue; - } ret++; if (flags & VERIFY_BUNDLE_QUIET) continue; @@ -223,37 +238,14 @@ int verify_bundle(struct repository *r, error("%s", message); error("%s %s", oid_to_hex(oid), name); } - if (revs.pending.nr != p->nr) + if (ret) goto cleanup; - req_nr = revs.pending.nr; - setup_revisions(2, argv, &revs, NULL); - - list_objects_filter_copy(&revs.filter, &header->filter); - - if (prepare_revision_walk(&revs)) - die(_("revision walk setup failed")); - i = req_nr; - while (i && (commit = get_revision(&revs))) - if (commit->object.flags & PREREQ_MARK) - i--; - - for (i = 0; i < p->nr; i++) { - struct string_list_item *e = p->items + i; - const char *name = e->string; - const struct object_id *oid = e->util; - struct object *o = parse_object(r, oid); - assert(o); /* otherwise we'd have returned early */ - if (o->flags & SHOWN) - continue; - ret++; - if (flags & VERIFY_BUNDLE_QUIET) - continue; - if (ret == 1) - error("%s", message); - error("%s %s", oid_to_hex(oid), name); - } + if ((ret = check_connected(iterate_ref_map, &iter, &opts))) + error(_("some prerequisite commits exist in the object store, " + "but are not connected to the repository's history")); + /* TODO: preserve this verbose language. */ if (flags & VERIFY_BUNDLE_VERBOSE) { struct string_list *r; @@ -282,15 +274,6 @@ int verify_bundle(struct repository *r, list_objects_filter_spec(&header->filter)); } cleanup: - /* Clean up objects used, as they will be reused. */ - for (i = 0; i < p->nr; i++) { - struct string_list_item *e = p->items + i; - struct object_id *oid = e->util; - commit = lookup_commit_reference_gently(r, oid, 1); - if (commit) - clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK); - } - release_revisions(&revs); return ret; } diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh index 38dbbf89155..7d40994991e 100755 --- a/t/t6020-bundle-misc.sh +++ b/t/t6020-bundle-misc.sh @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' ' # Verify should fail test_must_fail git bundle verify \ ../clone-from/tip.bundle 2>err && - grep "Could not read $BAD_OID" err && - grep "Failed to traverse parents of commit $TIP_OID" err && + grep "some prerequisite commits .* are not connected" err && + test_line_count = 1 err && # Unbundling should fail test_must_fail git bundle unbundle \ ../clone-from/tip.bundle 2>err && - grep "Could not read $BAD_OID" err && - grep "Failed to traverse parents of commit $TIP_OID" err + grep "some prerequisite commits .* are not connected" err && + test_line_count = 1 err ) '