From patchwork Tue Jan 31 13:29:09 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123001
Date: Tue, 31 Jan 2023 13:29:09 +0000
Subject: [PATCH v3 01/11] bundle: test unbundling with incomplete history
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee
From: Derrick Stolee

When verifying a bundle, Git checks first that all prerequisite commits
exist in the object store, then adds an additional check: those
prerequisite commits must be reachable from references in the
repository.

This check is stronger than what is checked for refs being added during
'git fetch', which simply guarantees that the new refs have a complete
history up to the point where it intersects with the current reachable
history. However, we also do not have any tests that check the behavior
under this condition. Create a test that demonstrates its behavior.

In order to construct a broken history, perform a shallow clone of a
repository with a linear history, but whose default branch ('base') has
a single commit, so dropping the shallow markers leaves a complete
history from that reference. However, the 'tip' reference adds a shallow
commit whose parent is missing in the cloned repository. Trying to
unbundle a bundle with the 'tip' as a prerequisite will succeed past the
object store check and move into the reachability check.

The two errors that are reported are of this form:

  error: Could not read
  fatal: Failed to traverse parents of commit

These messages are not particularly helpful for the person running the
unbundle command, but they do prevent the command from succeeding.

Signed-off-by: Derrick Stolee
---
 t/t6020-bundle-misc.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 3a1cf30b1d7..38dbbf89155 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -566,4 +566,44 @@ test_expect_success 'cloning from filtered bundle has useful error' '
 	grep "cannot clone from filtered bundle" err
 '
 
+test_expect_success 'verify catches unreachable, broken prerequisites' '
+	test_when_finished rm -rf clone-from clone-to &&
+	git init clone-from &&
+	(
+		cd clone-from &&
+		git checkout -b base &&
+		test_commit A &&
+		git checkout -b tip &&
+		git commit --allow-empty -m "will drop by shallow" &&
+		git commit --allow-empty -m "will keep by shallow" &&
+		git commit --allow-empty -m "for bundle, not clone" &&
+		git bundle create tip.bundle tip~1..tip &&
+		git reset --hard HEAD~1 &&
+		git checkout base
+	) &&
+	BAD_OID=$(git -C clone-from rev-parse tip~1) &&
+	TIP_OID=$(git -C clone-from rev-parse tip) &&
+	git clone --depth=1 --no-single-branch \
+		"file://$(pwd)/clone-from" clone-to &&
+	(
+		cd clone-to &&
+
+		# Set up broken history by removing shallow markers
+		git update-ref -d refs/remotes/origin/tip &&
+		rm .git/shallow &&
+
+		# Verify should fail
+		test_must_fail git bundle verify \
+			../clone-from/tip.bundle 2>err &&
+		grep "Could not read $BAD_OID" err &&
+		grep "Failed to traverse parents of commit $TIP_OID" err &&
+
+		# Unbundling should fail
+		test_must_fail git bundle unbundle \
+			../clone-from/tip.bundle 2>err &&
+		grep "Could not read $BAD_OID" err &&
+		grep "Failed to traverse parents of commit $TIP_OID" err
+	)
+'
+
 test_done

From patchwork Tue Jan 31 13:29:10 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123003
Message-Id: <20c29d37f9c1ba1367145331d25dd27f966312cd.1675171759.git.gitgitgadget@gmail.com>
Date: Tue, 31 Jan 2023 13:29:10 +0000
Subject: [PATCH v3 02/11] bundle: verify using check_connected()
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee
From: Derrick Stolee

When Git verifies a bundle to see if it is safe for unbundling, it first
looks to see if the prerequisite commits are in the object store. This
is an easy way to "fail fast" but it is not a sufficient check for
updating refs that guarantee closure under reachability. There could
still be issues if those commits are not reachable from the repository's
references. The repository only has guarantees that its object store is
closed under reachability for the objects that are reachable from
references.

Thus, the code in verify_bundle() has previously had the additional
check that all prerequisite commits are reachable from repository
references. This is done via a revision walk from all references,
stopping only if all prerequisite commits are discovered or all commits
are walked. This walk is custom to verify_bundle().

This check is more strict than what Git applies to fetched pack-files.
In the fetch case, Git guarantees that the new references are closed
under reachability by walking from the new references until walking
commits that are reachable from repository refs. This is done through
the well-used check_connected() method.

To better align with the restrictions required by 'git fetch',
reimplement this check in verify_bundle() to use check_connected(). This
also simplifies the code significantly.

The previous change added a test that verified the behavior of 'git
bundle verify' and 'git bundle unbundle' in this case, and the error
messages looked like this:

  error: Could not read
  fatal: Failed to traverse parents of commit

However, by changing the revision walk slightly within check_connected()
and using its quiet mode, we can omit those messages. Instead, we get
only this message, tailored to describing the current state of the
repository:

  error: some prerequisite commits exist in the object store,
         but are not connected to the repository's history

(Line break added here for the commit message formatting, only.)

While this message does not include any object IDs, there is no
guarantee that those object IDs would help the user diagnose what is
going on, as they could be separated from the prerequisite commits by
some distance. At minimum, this message describes the situation in a
more informative way than the previous error messages.

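For readers unfamiliar with check_connected(), the connectivity question
it asks can be approximated from the command line with 'git rev-list'.
The sketch below is only an illustration under stated assumptions: the
helper name tips_are_connected and the scratch repository are
hypothetical, and the real function passes further options (for example
for promisor remotes and alternates) that are left out here.

```shell
#!/bin/sh
# Rough command-line approximation of the connectivity check. This is an
# illustration only, not the code path used by verify_bundle().

# Hypothetical helper: read candidate object IDs on stdin and succeed only
# if their history is complete and connected to the refs of repository $1.
tips_are_connected () {
	git -C "$1" rev-list --objects --stdin --not --all --quiet 2>/dev/null
}

# Demo in a scratch repository containing a single empty commit.
demo=$(mktemp -d) &&
git init -q "$demo" &&
git -C "$demo" -c user.name=Example -c user.email=example@example.com \
	commit -q --allow-empty -m "base" &&

if git -C "$demo" rev-parse HEAD | tips_are_connected "$demo"
then
	echo "connected"
else
	echo "not connected"
fi
```

A commit that is present in the object store but whose parent is
missing, as in the test from the previous patch, makes this walk fail,
which is the situation the new error message describes.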
Signed-off-by: Derrick Stolee
---
 bundle.c               | 75 ++++++++++++++++--------------------------
 t/t6020-bundle-misc.sh |  8 ++---
 2 files changed, 33 insertions(+), 50 deletions(-)

diff --git a/bundle.c b/bundle.c
index 4ef7256aa11..76c3a904898 100644
--- a/bundle.c
+++ b/bundle.c
@@ -12,6 +12,7 @@
 #include "refs.h"
 #include "strvec.h"
 #include "list-objects-filter-options.h"
+#include "connected.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
 /* Remember to update object flag allocation in object.h */
 #define PREREQ_MARK (1u<<16)
 
+struct string_list_iterator {
+	struct string_list *list;
+	size_t cur;
+};
+
+static const struct object_id *iterate_ref_map(void *cb_data)
+{
+	struct string_list_iterator *iter = cb_data;
+
+	if (iter->cur >= iter->list->nr)
+		return NULL;
+
+	return iter->list->items[iter->cur++].util;
+}
+
 int verify_bundle(struct repository *r,
 		  struct bundle_header *header,
 		  enum verify_bundle_flags flags)
@@ -196,26 +212,25 @@ int verify_bundle(struct repository *r,
 	 * to be verbose about the errors
 	 */
 	struct string_list *p = &header->prerequisites;
-	struct rev_info revs = REV_INFO_INIT;
-	const char *argv[] = {NULL, "--all", NULL};
-	struct commit *commit;
-	int i, ret = 0, req_nr;
+	int i, ret = 0;
 	const char *message = _("Repository lacks these prerequisite commits:");
+	struct string_list_iterator iter = {
+		.list = p,
+	};
+	struct check_connected_options opts = {
+		.quiet = 1,
+	};
 
 	if (!r || !r->objects || !r->objects->odb)
 		return error(_("need a repository to verify a bundle"));
 
-	repo_init_revisions(r, &revs, NULL);
 	for (i = 0; i < p->nr; i++) {
 		struct string_list_item *e = p->items + i;
 		const char *name = e->string;
 		struct object_id *oid = e->util;
 		struct object *o = parse_object(r, oid);
-		if (o) {
-			o->flags |= PREREQ_MARK;
-			add_pending_object(&revs, o, name);
+		if (o)
 			continue;
-		}
 		ret++;
 		if (flags & VERIFY_BUNDLE_QUIET)
 			continue;
@@ -223,37 +238,14 @@ int verify_bundle(struct repository *r,
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
-	if (revs.pending.nr != p->nr)
+	if (ret)
 		goto cleanup;
-	req_nr = revs.pending.nr;
-	setup_revisions(2, argv, &revs, NULL);
-
-	list_objects_filter_copy(&revs.filter, &header->filter);
-
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
 
-	i = req_nr;
-	while (i && (commit = get_revision(&revs)))
-		if (commit->object.flags & PREREQ_MARK)
-			i--;
-
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		const char *name = e->string;
-		const struct object_id *oid = e->util;
-		struct object *o = parse_object(r, oid);
-		assert(o); /* otherwise we'd have returned early */
-		if (o->flags & SHOWN)
-			continue;
-		ret++;
-		if (flags & VERIFY_BUNDLE_QUIET)
-			continue;
-		if (ret == 1)
-			error("%s", message);
-		error("%s %s", oid_to_hex(oid), name);
-	}
+	if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
+		error(_("some prerequisite commits exist in the object store, "
+			"but are not connected to the repository's history"));
 
+	/* TODO: preserve this verbose language. */
 	if (flags & VERIFY_BUNDLE_VERBOSE) {
 		struct string_list *r;
 
@@ -282,15 +274,6 @@ int verify_bundle(struct repository *r,
 				  list_objects_filter_spec(&header->filter));
 	}
 cleanup:
-	/* Clean up objects used, as they will be reused. */
-	for (i = 0; i < p->nr; i++) {
-		struct string_list_item *e = p->items + i;
-		struct object_id *oid = e->util;
-		commit = lookup_commit_reference_gently(r, oid, 1);
-		if (commit)
-			clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK);
-	}
-	release_revisions(&revs);
 	return ret;
 }
 
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 38dbbf89155..7d40994991e 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
 		# Verify should fail
 		test_must_fail git bundle verify \
 			../clone-from/tip.bundle 2>err &&
-		grep "Could not read $BAD_OID" err &&
-		grep "Failed to traverse parents of commit $TIP_OID" err &&
+		grep "some prerequisite commits .* are not connected" err &&
+		test_line_count = 1 err &&
 
 		# Unbundling should fail
 		test_must_fail git bundle unbundle \
 			../clone-from/tip.bundle 2>err &&
-		grep "Could not read $BAD_OID" err &&
-		grep "Failed to traverse parents of commit $TIP_OID" err
+		grep "some prerequisite commits .* are not connected" err &&
+		test_line_count = 1 err
 	)
 '
 

From patchwork Tue Jan 31 13:29:11 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123004
Message-Id: <45cdf9d13a7e0b1e68932defe46a790f6813ed96.1675171759.git.gitgitgadget@gmail.com>
Date: Tue, 31 Jan 2023 13:29:11 +0000
Subject: [PATCH v3 03/11] t5558: add tests for creationToken heuristic
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee
From: Derrick Stolee

As documented in the bundle URI design doc in 2da14fad8fe (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.

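To see what the helper's filter does outside the test suite, it can be
run on a few hand-written trace lines. This is a minimal sketch: the
function name extract_remote_https_urls, the sample_trace helper, the
example.com URLs, and the trimmed-down JSON events are all made up for
illustration, and real GIT_TRACE2_EVENT lines carry more fields.

```shell
#!/bin/sh
# Same grep/sed filter as test_remote_https_urls in this patch, under a
# different (hypothetical) name so it can be tried in isolation.
extract_remote_https_urls () {
	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
	sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
	    -e 's/"\]}//g'
}

# Hypothetical, simplified trace2 event lines. Only the child_start events
# whose argv begins with git-remote-https should survive the filter.
sample_trace () {
	cat <<\EOF
{"event":"version","sid":"s1","evt":"3"}
{"event":"child_start","sid":"s1","child_id":0,"argv":["git-remote-https","https://example.com/bundle-2.bundle"]}
{"event":"child_start","sid":"s1","child_id":1,"argv":["git","index-pack","--stdin"]}
{"event":"child_start","sid":"s1","child_id":2,"argv":["git-remote-https","https://example.com/bundle-1.bundle"]}
EOF
}

# Sort the result, as the tests do while the request order is undefined.
sample_trace | extract_remote_https_urls | sort
```

Sorting is what makes the comparison against a fixed 'expect' file
stable while the client has no defined download order.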
Signed-off-by: Derrick Stolee
---
 t/t5558-clone-bundle-uri.sh | 69 +++++++++++++++++++++++++++++++++++--
 t/test-lib-functions.sh     |  8 +++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..474432c8ace 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -285,6 +285,8 @@ test_expect_success 'clone HTTP bundle' '
 '
 
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -304,12 +306,26 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
 		clone-from clone-list-http 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http cat-file --batch-check <oids
+	git -C clone-list-http cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Sort the list, since the order is not well-defined
+	# without a heuristic.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'clone bundle list (HTTP, any mode)' '
@@ -350,6 +366,55 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
+		clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
+
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Since the creationToken heuristic is not yet understood by the
+	# client, the order cannot be verified at this moment. Sort the
+	# list for consistent results.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index f036c4d3003..ace542f4226 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1833,6 +1833,14 @@ test_region () {
 	return 0
 }
 
+# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
+# sent to git-remote-https child processes.
+test_remote_https_urls() {
+	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
+		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
+		    -e 's/"\]}//g'
+}
+
 # Print the destination of symlink(s) provided as arguments. Basically
 # the same as the readlink command, but it's not available everywhere.
 test_readlink () {

From patchwork Tue Jan 31 13:29:12 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123005
Message-Id: <49bf10e0fd4468720cd86cc47e425bf446a1a4f7.1675171759.git.gitgitgadget@gmail.com>
Date: Tue, 31 Jan 2023 13:29:12 +0000
Subject: [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee
From: Derrick Stolee

The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.

Signed-off-by: Derrick Stolee
---
 Documentation/config/bundle.txt |  7 +++++++
 bundle-uri.c                    | 34 +++++++++++++++++++++++++++++++++
 bundle-uri.h                    | 14 ++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 19 ++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
 	complete understanding of the bundled information (`all`) or if any one
 	of the listed bundle URIs is sufficient (`any`).
 
+bundle.heuristic::
+	If this string-valued key exists, then the bundle list is designed to
+	work well with incremental `git fetch` commands. The heuristic signals
+	that there are additional keys available for each bundle that help
+	determine which subset of bundles the client should download. The
+	only value currently understood is `creationToken`.
+
 bundle.<id>.*::
 	The `bundle.<id>.*` keys are used to describe a single item in the
 	bundle list, grouped under `<id>` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c index 36268dda172..36ec542718d 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -9,6 +9,14 @@ #include "config.h" #include "remote.h" +static struct { + enum bundle_list_heuristic heuristic; + const char *name; +} heuristics[BUNDLE_HEURISTIC__COUNT] = { + { BUNDLE_HEURISTIC_NONE, ""}, + { BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" }, +}; + static int compare_bundles(const void *hashmap_cmp_fn_data, const struct hashmap_entry *he1, const struct hashmap_entry *he2, @@ -100,6 +108,17 @@ void print_bundle_list(FILE *fp, struct bundle_list *list) fprintf(fp, "\tversion = %d\n", list->version); fprintf(fp, "\tmode = %s\n", mode); + if (list->heuristic) { + int i; + for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) { + if (heuristics[i].heuristic == list->heuristic) { + printf("\theuristic = %s\n", + heuristics[list->heuristic].name); + break; + } + } + } + for_all_bundles_in_list(list, summarize_bundle, fp); } @@ -142,6 +161,21 @@ static int bundle_list_update(const char *key, const char *value, return 0; } + if (!strcmp(subkey, "heuristic")) { + int i; + for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) { + if (heuristics[i].heuristic && + heuristics[i].name && + !strcmp(value, heuristics[i].name)) { + list->heuristic = heuristics[i].heuristic; + return 0; + } + } + + /* Ignore unknown heuristics. */ + return 0; + } + /* Ignore other unknown global keys. */ return 0; } diff --git a/bundle-uri.h b/bundle-uri.h index d5e89f1671c..2e44a50a90b 100644 --- a/bundle-uri.h +++ b/bundle-uri.h @@ -52,6 +52,14 @@ enum bundle_list_mode { BUNDLE_MODE_ANY }; +enum bundle_list_heuristic { + BUNDLE_HEURISTIC_NONE = 0, + BUNDLE_HEURISTIC_CREATIONTOKEN, + + /* Must be last. */ + BUNDLE_HEURISTIC__COUNT +}; + /** * A bundle_list contains an unordered set of remote_bundle_info structs, * as well as information about the bundle listing, such as version and @@ -75,6 +83,12 @@ struct bundle_list { * advertised by the bundle list at that location. 
 	 */
 	char *baseURI;
+
+	/**
+	 * A list can have a heuristic, which helps reduce the number of
+	 * downloaded bundles.
+	 */
+	enum bundle_list_heuristic heuristic;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_done

From patchwork Tue Jan 31 13:29:13 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123006
Date: Tue, 31 Jan 2023 13:29:13 +0000
Subject: [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 36ec542718d..d4277b2e3a7 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -83,6 +83,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -203,6 +206,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index 2e44a50a90b..ef32840bfa6 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done

From patchwork Tue Jan 31 13:29:14 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123009
Message-Id: <366db5f6931df1a2fc832d998a69ac010e59293c.1675171760.git.gitgitgadget@gmail.com>
Date: Tue, 31 Jan 2023 13:29:14 +0000
Subject: [PATCH v3 06/11] bundle-uri: download in creationToken order
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting to
apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee
---
 bundle-uri.c                | 156 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  40 +++++++--
 t/t5601-clone.sh            |  46 +++++++++++
 3 files changed, 233 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index d4277b2e3a7..af48938d243 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -447,6 +447,139 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct bundles_for_sorting {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundles_for_sorting *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int move_direction = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct bundles_for_sorting bundles = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+	for_all_bundles_in_list(list, append_bundle, &bundles);
+
+	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+	/*
+	 * Attempt to download and unbundle the minimum number of bundles by
+	 * creationToken in decreasing order. If we fail to unbundle (after
+	 * a successful download) then move to the next non-downloaded bundle
+	 * and attempt downloading. Once we succeed in applying a bundle,
+	 * move to the previous unapplied bundle and attempt to unbundle it
+	 * again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < bundles.nr) {
+		struct remote_bundle_info *bundle = bundles.items[cur];
+		if (!bundle->file) {
+			/*
+			 * Not downloaded yet. Try downloading.
+			 *
+			 * Note that bundle->file is non-NULL if a download
+			 * was attempted, even if it failed to download.
+			 */
+			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+				/* Mark as unbundled so we do not retry. */
+				bundle->unbundled = 1;
+
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+				goto move;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+			} else {
+				/*
+				 * Succeeded in unbundle. Retry bundles
+				 * that previously failed to unbundle.
+				 */
+				move_direction = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+move:
+		/* Move in the specified direction and repeat. */
+		cur += move_direction;
+	}
+
+	free(bundles.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -484,7 +617,15 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -626,6 +767,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -634,7 +783,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 474432c8ace..6f9417a0afb 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 	git -C clone-list-http-2 cat-file --batch-check <oids &&
 
 	cat >expect <<-EOF &&
-	$HTTPD_URL/bundle-1.bundle
-	$HTTPD_URL/bundle-2.bundle
-	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		--single-branch --branch=base --no-tags \
+		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
 	EOF
 
-	# Since the creationToken heuristic is not yet understood by the
-	# client, the order cannot be verified at this moment. Sort the
-	# list for consistent results.
-	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_remote_https_urls <trace-clone.txt >actual &&
 	test_cmp expect actual
 '
 
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..b7d5551262c 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,52 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git -c protocol.version=2 \
+	    -c transfer.bundleURI=true clone \
+	    "$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/newest.bundle
+	$HTTPD_URL/new.bundle
+	$HTTPD_URL/everything.bundle
+	EOF
+
+	# We should fetch all bundles in the expected order.
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
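
As an illustration only (this is not part of the patch), the walk in
fetch_bundles_by_token() can be simulated on the four-bundle example
from the commit message. This is a minimal sketch under stated
assumptions: struct sim_bundle, the C1..C4 bitmask encoding of commits
and sim_fetch_by_token() are hypothetical names invented here, and
"download" and "unbundle" are reduced to a flag and a bitmask check.
The walk also starts with a direction of +1 rather than 0, so that it
cannot stall if the first entry is already marked as applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical encoding: one bit per commit in the example graph. */
enum { C1 = 1u << 0, C2 = 1u << 1, C3 = 1u << 2, C4 = 1u << 3 };

/* Hypothetical stand-in for struct remote_bundle_info. */
struct sim_bundle {
	uint64_t creation_token;
	unsigned provides;	/* commits this bundle adds */
	unsigned requires;	/* commits that must already exist */
	int downloaded;
	int unbundled;
};

/* qsort() comparator: largest creation token first. */
static int cmp_token_decreasing(const void *va, const void *vb)
{
	const struct sim_bundle *a = va;
	const struct sim_bundle *b = vb;

	if (a->creation_token > b->creation_token)
		return -1;
	if (a->creation_token < b->creation_token)
		return 1;
	return 0;
}

/*
 * Walk the bundles newest-first. Returns the number of simulated
 * downloads when every visited bundle was applied, or -1 when the
 * walk ran off the end of the list without succeeding.
 */
static int sim_fetch_by_token(struct sim_bundle *b, size_t nr, unsigned *have)
{
	long cur = 0;
	int dir = 1;
	int downloads = 0;

	qsort(b, nr, sizeof(*b), cmp_token_decreasing);

	while (cur >= 0 && (size_t)cur < nr) {
		struct sim_bundle *x = &b[cur];

		if (!x->downloaded) {
			x->downloaded = 1;
			downloads++;
		}
		if (!x->unbundled) {
			if ((x->requires & *have) == x->requires) {
				/* Applied: go back and retry newer bundles. */
				*have |= x->provides;
				x->unbundled = 1;
				dir = -1;
			} else {
				/* Missing prerequisites: look deeper. */
				dir = 1;
			}
		}
		/* Already applied: keep moving in the same direction. */
		cur += dir;
	}
	return cur < 0 ? downloads : -1;
}
```

With the objects of bundle-1 already present (have = C1), the simulated
walk tries bundle-4, applies bundle-3, retries bundle-4, applies
bundle-2, and finally applies bundle-4, for three downloads in total;
bundle-1 is never requested. With an empty store, all four bundles are
downloaded before the oldest one applies.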

From patchwork Tue Jan 31 13:29:15 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123007
Date: Tue, 31 Jan 2023 13:29:15 +0000
Subject: [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee
---
 Documentation/config/fetch.txt |  8 +++++++
 builtin/clone.c                |  6 +++++-
 bundle-uri.c                   |  5 ++++-
 bundle-uri.h                   |  8 ++++++-
 t/t5558-clone-bundle-uri.sh    | 39 ++++++++++++++++++++++++++++++++++
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..244f44d460f 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,11 @@ fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.bundleURI::
+	This value stores a URI for downloading Git object data from a bundle
+	URI before performing an incremental fetch from the origin Git server.
+	This is similar to how the `--bundle-uri` option behaves in
+	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
+	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
+	list that is organized for incremental fetches.
diff --git a/builtin/clone.c b/builtin/clone.c
index 5453ba5277f..5370617664d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * data from the --bundle-uri option.
 	 */
 	if (bundle_uri) {
+		int has_heuristic = 0;
+
 		/* At this point, we need the_repository to match the cloned repo. */
 		if (repo_init(the_repository, git_dir, work_tree))
 			warning(_("failed to initialize the repo, skipping bundle URI"));
-		else if (fetch_bundle_uri(the_repository, bundle_uri))
+		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
+		else if (has_heuristic)
+			git_config_set_gently("fetch.bundleuri", bundle_uri);
 	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
diff --git a/bundle-uri.c b/bundle-uri.c
index af48938d243..7a1b6d94bf5 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -736,7 +736,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data)
 	return 0;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic)
 {
 	int result;
 	struct bundle_list list;
@@ -756,6 +757,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
 	result = unbundle_all_bundles(r, &list);
 
 cleanup:
+	if (has_heuristic)
+		*has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
 	for_all_bundles_in_list(&list, unlink_bundle, NULL);
 	clear_bundle_list(&list);
 	clear_remote_bundle_info(&bundle, NULL);
diff --git a/bundle-uri.h b/bundle-uri.h
index ef32840bfa6..6dbc780f661 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri,
  * based on that information.
  *
  * Returns non-zero if no bundle information is found at the given 'uri'.
+ *
+ * If the pointer 'has_heuristic' is non-NULL, then the value it points to
+ * will be set to be non-zero if and only if the fetched list has a
+ * heuristic value. Such a value indicates that the list was designed for
+ * incremental fetches.
  */
-int fetch_bundle_uri(struct repository *r, const char *uri);
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic);
 
 /**
  * Given a bundle list that was already advertised (likely by the
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 6f9417a0afb..b2d15e141ca 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -432,6 +432,8 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		--single-branch --branch=base --no-tags \
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
+	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-1.bundle
@@ -441,6 +443,43 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
+	test_when_finished rm -rf fetch-http-4 trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
+
+	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual &&
+
+	# only received base ref from bundle-1
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
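
As an illustration only (this is not part of the patch), the decision
that 'git clone' makes above can be reduced to a few lines: the config
value is recorded only when the bundle fetch succeeded and the list
reported a heuristic through the optional out-parameter. This is a
rough sketch; every sketch_* name below is a hypothetical stand-in
invented for this example, and the fetch result is passed in directly
instead of performing any real download.

```c
#include <stddef.h>

/* Hypothetical mirror of enum bundle_list_heuristic. */
enum sketch_heuristic {
	SKETCH_HEURISTIC_NONE = 0,
	SKETCH_HEURISTIC_CREATIONTOKEN
};

/* Hypothetical stand-in for the parsed bundle list. */
struct sketch_list {
	enum sketch_heuristic heuristic;
};

/*
 * Stand-in for fetch_bundle_uri(): returns 'fetch_result' (non-zero
 * means failure) and, when 'has_heuristic' is non-NULL, reports
 * whether the list carries any heuristic.
 */
static int sketch_fetch(const struct sketch_list *list, int fetch_result,
			int *has_heuristic)
{
	if (has_heuristic)
		*has_heuristic = (list->heuristic != SKETCH_HEURISTIC_NONE);
	return fetch_result;
}

/* Returns 1 when the clone would record fetch.bundleURI, else 0. */
static int sketch_should_record_uri(const struct sketch_list *list,
				    int fetch_result)
{
	int has_heuristic = 0;

	if (sketch_fetch(list, fetch_result, &has_heuristic))
		return 0; /* fetch failed: only a warning, no config */
	return has_heuristic;
}
```

Passing the flag back through a nullable pointer lets callers that do
not care about the heuristic pass NULL and leave their behavior
unchanged.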
From patchwork Tue Jan 31 13:29:16 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123008
Message-Id: <83f49b37c6975a5898e66a420f13a70ef25afc59.1675171760.git.gitgitgadget@gmail.com>
Date: Tue, 31 Jan 2023 13:29:16 +0000
Subject: [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key. For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/bundle-uri.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
    (This choice is an opt-in via a config option and a command-line
    option.)
 
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration
    key and the `bundle..creationToken` heuristic. When `git clone`
-   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
-   client repository to check that bundle URI during later `git fetch `
+   discovers a bundle URI with `bundle.heuristic`, it configures the client
+   repository to check that bundle URI during later `git fetch `
    commands.
 
 5. Allow clients to discover bundle URIs during `git fetch` and configure
-   a bundle URI for later fetches if `bundle.flag=forFetch`.
+   a bundle URI for later fetches if `bundle.heuristic` is set.
 
 6. Implement the "inspect headers" heuristic to reduce data downloads when
    the `bundle..creationToken` heuristic is not available.

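Item 4 of the plan names the creationToken heuristic, and a later patch
in this series stores the largest token already applied in
'fetch.bundleCreationToken'. As a rough illustration of how such a
stored value narrows the work, the sketch below sorts a list by
descending token and counts the entries whose token is strictly larger
than the stored maximum. It is an assumption-laden simplification: the
"sketch_" names are hypothetical, nothing is downloaded, and the real
client may stop earlier, as soon as the bundles it has can be applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced view of one entry in a bundle list. */
struct sketch_bundle {
	const char *uri;
	uint64_t creation_token;
};

/* qsort() comparator: larger creation tokens sort first. */
static int sketch_token_desc(const void *a, const void *b)
{
	const struct sketch_bundle *x = a;
	const struct sketch_bundle *y = b;

	if (x->creation_token < y->creation_token)
		return 1;
	if (x->creation_token > y->creation_token)
		return -1;
	return 0;
}

/*
 * Sort 'bundles' by descending creation token and return how many
 * leading entries have a token strictly greater than 'stored_max'.
 * A return value of zero means that only the list itself needed to be
 * downloaded.
 */
static size_t sketch_candidates(struct sketch_bundle *bundles, size_t nr,
				uint64_t stored_max)
{
	size_t n = 0;

	if (nr)
		qsort(bundles, nr, sizeof(*bundles), sketch_token_desc);
	while (n < nr && bundles[n].creation_token > stored_max)
		n++;
	return n;
}
```

With tokens 1 through 4 in the list and a stored maximum of 2, only the
bundles with tokens 4 and 3 remain candidates; with a stored maximum of
4, none do.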
From patchwork Tue Jan 31 13:29:17 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123010
Message-Id: <314c60f2ae4cf54a0d99cc4e6d8c4edd77fd6b22.1675171760.git.gitgitgadget@gmail.com>
Date: Tue, 31 Jan 2023 13:29:17 +0000
Subject: [PATCH v3 09/11] fetch: fetch from an external bundle URI
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s).
Likely, the bundle provider has configured a heuristic (such as "creationToken") that will allow the Git client to download only a portion of the bundles before continuing the fetch. Since this URI is completely independent of the remote server, we want to be sure that we connect to the bundle URI before creating a connection to the Git remote. We do not want to hold a stateful connection for too long if we can avoid it. To test that this works correctly, extend the previous tests that set 'fetch.bundleURI' to do follow-up fetches. The bundle list is updated incrementally at each phase to demonstrate that the heuristic avoids downloading older bundles. This includes the middle fetch downloading the objects in bundle-3.bundle from the Git remote, and therefore not needing that bundle in the third fetch. Signed-off-by: Derrick Stolee --- builtin/fetch.c | 6 ++ t/t5558-clone-bundle-uri.sh | 113 +++++++++++++++++++++++++++++++++++- 2 files changed, 118 insertions(+), 1 deletion(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 7378cafeec9..0477c379369 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -29,6 +29,7 @@ #include "commit-graph.h" #include "shallow.h" #include "worktree.h" +#include "bundle-uri.h" #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000) @@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv, int cmd_fetch(int argc, const char **argv, const char *prefix) { int i; + const char *bundle_uri; struct string_list list = STRING_LIST_INIT_DUP; struct remote *remote = NULL; int result = 0; @@ -2194,6 +2196,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (dry_run) write_fetch_head = 0; + if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) && + fetch_bundle_uri(the_repository, bundle_uri, NULL)) + warning(_("failed to fetch bundles from '%s'"), bundle_uri); + if (all) { if (argc == 1) die(_("fetch --all does not take a repository argument")); diff --git 
a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index b2d15e141ca..7deeb4b8ad1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -440,7 +440,55 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	EOF
 
 	test_remote_https_urls actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	# We now have only one bundle ref.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs &&
+
+	# Add remaining bundles, exercising the "deepening" strategy
+	# for downloading via the creationToken heuristic.
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C clone-token-http fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls actual &&
+	test_cmp expect actual &&
+
+	# We now have all bundle refs.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
 '
 
 test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
@@ -477,6 +525,69 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	cat >expect <<-\EOF &&
 	refs/bundles/base
 	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	# Fetch the objects for bundle-2 _and_ bundle-3.
+ GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/left:refs/heads/left \ + refs/heads/right:refs/heads/right && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-2.bundle + EOF + + test_remote_https_urls actual && + test_cmp expect actual && + + # received left from bundle-2 + git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect refs && + + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + EOF + + # This fetch should skip bundle-3.bundle, since its objects are + # already local (we have the requisite commits for bundle-4.bundle). + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/merge:refs/heads/merge && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-4.bundle + EOF + + test_remote_https_urls actual && + test_cmp expect actual && + + # received merge ref from bundle-4, but right is missing + # because we did not download bundle-3. 
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	EOF
 	test_cmp expect refs
 '

From patchwork Tue Jan 31 13:29:18 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123012
Message-Id: <4e0465efd19798e7a14d653440233b9b3319b6de.1675171760.git.gitgitgadget@gmail.com>
Date: Tue, 31 Jan 2023 13:29:18 +0000
Subject: [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order.
This process stops as soon as all downloaded bundles can be applied to the repository (because all required commits are present in the repository or in the downloaded bundles). When checking the same bundle list twice, this strategy requires downloading the bundle with the maximum creationToken again, which is wasteful. The creationToken heuristic promises that the client will not have a use for that bundle if its creationToken value is at most the previous creationToken value. To prevent these wasteful downloads, create a fetch.bundleCreationToken config setting that the Git client sets after downloading bundles. This value allows skipping that maximum bundle download when this config value is the same value (or larger). To test that this works correctly, we can insert some "duplicate" fetches into existing tests and demonstrate that only the bundle list is downloaded. The previous logic for downloading bundles by creationToken worked even if the bundle list was empty, but now we have logic that depends on the first entry of the list. Terminate early in the (non-sensical) case of an empty bundle list. Signed-off-by: Derrick Stolee --- Documentation/config/fetch.txt | 16 ++++++++++++ bundle-uri.c | 48 ++++++++++++++++++++++++++++++++-- t/t5558-clone-bundle-uri.sh | 29 +++++++++++++++++++- 3 files changed, 90 insertions(+), 3 deletions(-) diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt index 244f44d460f..568f0f75b30 100644 --- a/Documentation/config/fetch.txt +++ b/Documentation/config/fetch.txt @@ -104,3 +104,19 @@ fetch.bundleURI:: linkgit:git-clone[1]. `git clone --bundle-uri` will set the `fetch.bundleURI` value if the supplied bundle URI contains a bundle list that is organized for incremental fetches. ++ +If you modify this value and your repository has a `fetch.bundleCreationToken` +value, then remove that `fetch.bundleCreationToken` value before fetching from +the new bundle URI. 
+
+fetch.bundleCreationToken::
+	When using `fetch.bundleURI` to fetch incrementally from a bundle
+	list that uses the "creationToken" heuristic, this config value
+	stores the maximum `creationToken` value of the downloaded bundles.
+	This value is used to prevent downloading bundles in the future
+	if the advertised `creationToken` is not strictly larger than this
+	value.
++
+The creation token values are chosen by the provider serving the specific
+bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
+remove the `fetch.bundleCreationToken` value before fetching.
diff --git a/bundle-uri.c b/bundle-uri.c
index 7a1b6d94bf5..d6f7df7350f 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -481,6 +481,8 @@ static int fetch_bundles_by_token(struct repository *r,
 {
 	int cur;
 	int move_direction = 0;
+	const char *creationTokenStr;
+	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
 	struct bundle_list_context ctx = {
 		.r = r,
 		.list = list,
@@ -494,8 +496,27 @@ static int fetch_bundles_by_token(struct repository *r,
 
 	for_all_bundles_in_list(list, append_bundle, &bundles);
 
+	if (!bundles.nr) {
+		free(bundles.items);
+		return 0;
+	}
+
 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
 
+	/*
+	 * If fetch.bundleCreationToken exists, parses to a uint64_t, and
+	 * is not strictly smaller than the maximum creation token in the
+	 * bundle list, then do not download any bundles.
+	 */
+	if (!repo_config_get_value(r,
+				   "fetch.bundlecreationtoken",
+				   &creationTokenStr) &&
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    bundles.items[0]->creationToken <= maxCreationToken) {
+		free(bundles.items);
+		return 0;
+	}
+
 	/*
 	 * Attempt to download and unbundle the minimum number of bundles by
 	 * creationToken in decreasing order.
If we fail to unbundle (after @@ -516,6 +537,16 @@ static int fetch_bundles_by_token(struct repository *r, cur = 0; while (cur >= 0 && cur < bundles.nr) { struct remote_bundle_info *bundle = bundles.items[cur]; + + /* + * If we need to dig into bundles below the previous + * creation token value, then likely we are in an erroneous + * state due to missing or invalid bundles. Halt the process + * instead of continuing to download extra data. + */ + if (bundle->creationToken <= maxCreationToken) + break; + if (!bundle->file) { /* * Not downloaded yet. Try downloading. @@ -555,6 +586,9 @@ static int fetch_bundles_by_token(struct repository *r, */ move_direction = -1; bundle->unbundled = 1; + + if (bundle->creationToken > newMaxCreationToken) + newMaxCreationToken = bundle->creationToken; } } @@ -569,14 +603,24 @@ move: cur += move_direction; } - free(bundles.items); - /* * We succeed if the loop terminates because 'cur' drops below * zero. The other case is that we terminate because 'cur' * reaches the end of the list, so we have a failure no matter * which bundles we apply from the list. 
*/ + if (cur < 0) { + struct strbuf value = STRBUF_INIT; + strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken); + if (repo_config_set_multivar_gently(ctx.r, + "fetch.bundleCreationToken", + value.buf, NULL, 0)) + warning(_("failed to store maximum creation token")); + + strbuf_release(&value); + } + + free(bundles.items); return cur >= 0; } diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 7deeb4b8ad1..9c2b7934b9b 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -433,6 +433,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' ' "$HTTPD_URL/smart/fetch.git" clone-token-http && test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri && + test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -468,6 +469,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' ' GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \ git -C clone-token-http fetch origin --no-tags \ refs/heads/merge:refs/heads/merge && + test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -511,6 +513,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' "$HTTPD_URL/smart/fetch.git" fetch-http-4 && test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri && + test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -538,6 +541,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' git -C fetch-http-4 fetch origin --no-tags \ refs/heads/left:refs/heads/left \ refs/heads/right:refs/heads/right && + test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -555,6 +559,18 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' EOF test_cmp expect refs && + # No-op fetch 
+	GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls actual &&
+	test_cmp expect actual &&
+
 	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle "bundle-3"]
 		uri = bundle-3.bundle
@@ -570,6 +586,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -588,7 +605,17 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	refs/bundles/left
 	refs/bundles/merge
 	EOF
-	test_cmp expect refs
+	test_cmp expect refs &&
+
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
+		git -C fetch-http-4 fetch origin &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls actual &&
+	test_cmp expect actual
 '
 
 # Do not add tests here unless they use the HTTP server, as they will

From patchwork Tue Jan 31 13:29:19 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13123011
Date: Tue, 31 Jan 2023 13:29:19 +0000
Subject: [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

The creationToken heuristic uses a different mechanism for downloading
bundles than the "standard" approach. Specifically: it uses a concrete
order based on the creationToken values and attempts to download as few
bundles as possible. It also modifies local config to store a value for
future fetches to avoid downloading bundles, if possible.

However, if any of the individual bundles has a failed download, then
the logic for the ordering comes into question. It is important to avoid
infinite loops and to avoid assigning invalid creation token values in
config, but also to be as opportunistic as possible when downloading as
many bundles as seem appropriate.

These tests were used to inform the implementation of
fetch_bundles_by_token() in bundle-uri.c, but are being added
independently here to allow focusing on faulty downloads. There may be
more cases that could be added that result in modifications to
fetch_bundles_by_token() as interesting data shapes reveal themselves in
real scenarios.

Signed-off-by: Derrick Stolee --- t/t5558-clone-bundle-uri.sh | 400 ++++++++++++++++++++++++++++++++++++ 1 file changed, 400 insertions(+) diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 9c2b7934b9b..afd56926c53 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -618,6 +618,406 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' test_cmp expect actual ' +test_expect_success 'creationToken heuristic with failed downloads (clone)' ' + test_when_finished rm -rf download-* trace*.txt && + + # Case 1: base bundle does not exist, nothing can unbundle + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = fake.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + EOF + + GIT_TRACE2_EVENT="$(pwd)/trace-clone-1.txt" \ + git clone --single-branch --branch=base \ + --bundle-uri="$HTTPD_URL/bundle-list" \ + "$HTTPD_URL/smart/fetch.git" download-1 && + + # Bundle failure does not set these configs. 
+	test_must_fail git -C download-1 config fetch.bundleuri &&
+	test_must_fail git -C download-1 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/fake.bundle
+	EOF
+	test_remote_https_urls <trace-clone-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# All bundles failed to unbundle
+	git -C download-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	test_must_be_empty refs &&
+
+	# Case 2: middle bundle does not exist, only two bundles can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = fake.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-2 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-2 config fetch.bundleuri &&
+	test_must_fail git -C download-2 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# bundle-1 and bundle-3 could unbundle, but bundle-4 could not
+	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = fake.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-3 &&
+
+	# As long as we have contiguous successful downloads,
+	# we _do_ set these configs.
+	test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C download-3 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# fake.bundle did not unbundle, but the others did.
+	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
+'
+
+# Expand the bundle list to include other interesting shapes, specifically
+# interesting for use when fetching from a previous state.
+#
+# ---------------- bundle-7
+#       7
+#     _/|\_
+# ---/--|--\------ bundle-6
+#   5   |   6
+# --|---|---|----- bundle-4
+#   |   4   |
+#   |  / \  /
+# --|-|---|/------ bundle-3 (the client will be caught up to this point.)
+#   \ |   3
+# ---\|---|------- bundle-2
+#     2   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'expand incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b lefter left &&
+		test_commit 5 &&
+		git checkout -b righter right &&
+		test_commit 6 &&
+		git checkout -b top lefter &&
+		git merge -m "7" merge righter &&
+
+		git bundle create bundle-6.bundle lefter righter --not left right &&
+		git bundle create bundle-7.bundle top --not lefter merge righter &&
+
+		cp bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/"
+	) &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" fetch origin +refs/heads/*:refs/heads/*
+'
+
+test_expect_success 'creationToken heuristic with failed downloads (fetch)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+	EOF
+
+	git clone --single-branch --branch=left \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-base &&
+	test_cmp_config -C fetch-base "$HTTPD_URL/bundle-list" fetch.bundleURI &&
+	test_cmp_config -C fetch-base 3 fetch.bundleCreationToken &&
+
+	# Case 1: all bundles exist: successful unbundling of all bundles
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-1.txt" \
+		git -C fetch-1 fetch origin &&
+	test_cmp_config -C fetch-1 7 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/bundle-6.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/merge
+	refs/bundles/right
+	refs/bundles/righter
+	refs/bundles/top
+	EOF
+	test_cmp expect refs &&
+
+	# Case 2: middle bundle does not exist, only bundle-4 can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = fake.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-2.txt" \
+		git -C fetch-2 fetch origin &&
+
+	# Since bundle-7 fails to unbundle, do not update creation token.
+	test_cmp_config -C fetch-2 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = fake.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-3 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-3.txt" \
+		git -C fetch-3 fetch origin &&
+
+	# As long as we have contiguous successful downloads,
+	# we _do_ set the maximum creation token.
+	test_cmp_config -C fetch-3 6 fetch.bundlecreationtoken &&
+
+	# NOTE: the fetch skips bundle-4 since bundle-6 successfully
+	# unbundles itself and bundle-7 failed to download.
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-6.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/right
+	refs/bundles/righter
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.