Message ID | ecc6b167-f5c4-48ce-3973-461d1659ed40@github.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2.5,01/11] bundle: test unbundling with incomplete history | expand |
Derrick Stolee <derrickstolee@github.com> writes: > When Git verifies a bundle to see if it is safe for unbundling, it first > looks to see if the prerequisite commits are in the object store. This > is usually a sufficient filter, and those missing commits are indicated > clearly in the error messages. I am not sure if our early check is because "does the prerequisite even exist?" is sufficient. It is a short-cut that is cheap and can be done without preparing the commit traversal. > However, if the commits are present in > the object store, then there could still be issues if those commits are > not reachable from the repository's references. The repository only has > guarantees that its object store is closed under reachability for the > objects that are reachable from references. Correct. > Thus, the code in verify_bundle() has previously had the additional > check that all prerequisite commits are reachable from repository > references. This is done via a revision walk from all references, > stopping only if all prerequisite commits are discovered or all commits > are walked. This uses a custom walk to verify_bundle(). Correct. > This check is more strict than what Git applies even to fetched > pack-files. I do not see the need to say "even" here. In what other situation do we make connectivity checks, and is there a need to be more strict than others when checking fetched packfiles? > In the fetch case, Git guarantees that the new references > are closed under reachability by walking from the new references until > walking commits that are reachable from repository refs. This is done > through the well-used check_connected() method. Correct and is a good point to make. > To better align with the restrictions required by 'git fetch', > reimplement this check in verify_bundle() to use check_connected(). This > also simplifies the code significantly. Wonderful. I never liked the custom check done in unbundle code, which I am reasonably sure came from scripted hack to unbundle I wrote eons ago.
On 1/24/2023 12:33 PM, Junio C Hamano wrote: > Derrick Stolee <derrickstolee@github.com> writes: > >> When Git verifies a bundle to see if it is safe for unbundling, it first >> looks to see if the prerequisite commits are in the object store. This >> is usually a sufficient filter, and those missing commits are indicated >> clearly in the error messages. > > I am not sure if our early check is because "does the prerequisite > even exist?" is sufficient. It is a short-cut that is cheap and can > be done without preparing the commit traversal. I suppose I should say "Usually, existence in the object store is correlated with having all reachable objects, but this is not guaranteed." I'll also mention that it is a short-cut that can fail faster than the reachability check. >> This check is more strict than what Git applies even to fetched >> pack-files. > > I do not see the need to say "even" here. In what other situation > do we make connectivity checks, and is there a need to be more > strict than others when checking fetched packfiles? I suppose that I was implying that fetches are the more common operation, and the scrutiny applied to an arbitrary pack-file from a remote is probably higher there. However, who knows where a bundle came from, so the scrutiny should be the same. >> To better align with the restrictions required by 'git fetch', >> reimplement this check in verify_bundle() to use check_connected(). This >> also simplifies the code significantly. > > Wonderful. I never liked the custom check done in unbundle code, > which I am reasonably sure came from scripted hack to unbundle I > wrote eons ago. Excellent. Thanks for your feedback. -Stolee
Derrick Stolee <derrickstolee@github.com> writes: >> I do not see the need to say "even" here. In what other situation >> do we make connectivity checks, and is there a need to be more >> strict than others when checking fetched packfiles? > > I suppose that I was implying that fetches are the more common > operation, and the scrutiny applied to an arbitrary pack-file from > a remote is probably higher there. However, who knows where a > bundle came from, so the scrutiny should be the same. Ah, I see that is where that "even" came from. And yes, I agree that unbundling and fetch should be suspicious of their input to the same degree.
diff --git a/bundle.c b/bundle.c index 4ef7256aa11..76c3a904898 100644 --- a/bundle.c +++ b/bundle.c @@ -12,6 +12,7 @@ #include "refs.h" #include "strvec.h" #include "list-objects-filter-options.h" +#include "connected.h" static const char v2_bundle_signature[] = "# v2 git bundle\n"; static const char v3_bundle_signature[] = "# v3 git bundle\n"; @@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv) /* Remember to update object flag allocation in object.h */ #define PREREQ_MARK (1u<<16) +struct string_list_iterator { + struct string_list *list; + size_t cur; +}; + +static const struct object_id *iterate_ref_map(void *cb_data) +{ + struct string_list_iterator *iter = cb_data; + + if (iter->cur >= iter->list->nr) + return NULL; + + return iter->list->items[iter->cur++].util; +} + int verify_bundle(struct repository *r, struct bundle_header *header, enum verify_bundle_flags flags) @@ -196,26 +212,25 @@ int verify_bundle(struct repository *r, * to be verbose about the errors */ struct string_list *p = &header->prerequisites; - struct rev_info revs = REV_INFO_INIT; - const char *argv[] = {NULL, "--all", NULL}; - struct commit *commit; - int i, ret = 0, req_nr; + int i, ret = 0; const char *message = _("Repository lacks these prerequisite commits:"); + struct string_list_iterator iter = { + .list = p, + }; + struct check_connected_options opts = { + .quiet = 1, + }; if (!r || !r->objects || !r->objects->odb) return error(_("need a repository to verify a bundle")); - repo_init_revisions(r, &revs, NULL); for (i = 0; i < p->nr; i++) { struct string_list_item *e = p->items + i; const char *name = e->string; struct object_id *oid = e->util; struct object *o = parse_object(r, oid); - if (o) { - o->flags |= PREREQ_MARK; - add_pending_object(&revs, o, name); + if (o) continue; - } ret++; if (flags & VERIFY_BUNDLE_QUIET) continue; @@ -223,37 +238,14 @@ int verify_bundle(struct repository *r, error("%s", message); error("%s %s", oid_to_hex(oid), name); } - if (revs.pending.nr != p->nr) + if (ret) goto cleanup; - req_nr = revs.pending.nr; - setup_revisions(2, argv, &revs, NULL); - - list_objects_filter_copy(&revs.filter, &header->filter); - - if (prepare_revision_walk(&revs)) - die(_("revision walk setup failed")); - i = req_nr; - while (i && (commit = get_revision(&revs))) - if (commit->object.flags & PREREQ_MARK) - i--; - - for (i = 0; i < p->nr; i++) { - struct string_list_item *e = p->items + i; - const char *name = e->string; - const struct object_id *oid = e->util; - struct object *o = parse_object(r, oid); - assert(o); /* otherwise we'd have returned early */ - if (o->flags & SHOWN) - continue; - ret++; - if (flags & VERIFY_BUNDLE_QUIET) - continue; - if (ret == 1) - error("%s", message); - error("%s %s", oid_to_hex(oid), name); - } + if ((ret = check_connected(iterate_ref_map, &iter, &opts))) + error(_("some prerequisite commits exist in the object store, " + "but are not connected to the repository's history")); + /* TODO: preserve this verbose language. */ if (flags & VERIFY_BUNDLE_VERBOSE) { struct string_list *r; @@ -282,15 +274,6 @@ int verify_bundle(struct repository *r, list_objects_filter_spec(&header->filter)); } cleanup: - /* Clean up objects used, as they will be reused. */ - for (i = 0; i < p->nr; i++) { - struct string_list_item *e = p->items + i; - struct object_id *oid = e->util; - commit = lookup_commit_reference_gently(r, oid, 1); - if (commit) - clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK); - } - release_revisions(&revs); return ret; } diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh index 38dbbf89155..7d40994991e 100755 --- a/t/t6020-bundle-misc.sh +++ b/t/t6020-bundle-misc.sh @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' ' # Verify should fail test_must_fail git bundle verify \ ../clone-from/tip.bundle 2>err && - grep "Could not read $BAD_OID" err && - grep "Failed to traverse parents of commit $TIP_OID" err && + grep "some prerequisite commits .* are not connected" err && + test_line_count = 1 err && # Unbundling should fail test_must_fail git bundle unbundle \ ../clone-from/tip.bundle 2>err && - grep "Could not read $BAD_OID" err && - grep "Failed to traverse parents of commit $TIP_OID" err + grep "some prerequisite commits .* are not connected" err && + test_line_count = 1 err ) '