From patchwork Mon Jan 23 15:21:41 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112401
Date: Mon, 23 Jan 2023 15:21:41 +0000
Subject: [PATCH v2 01/10] bundle: optionally skip reachability walk
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

When unbundling a bundle, the verify_bundle() method checks two things
with regards to the prerequisite commits:

 1. Those commits are in the object store, and
 2. Those commits are reachable from refs.

During testing of the bundle URI feature, where multiple bundles are
unbundled in the same process, the ref store did not appear to be
refreshing with the new refs/bundles/* references added within that
process.
This caused the second half -- the reachability walk -- to report that
some commits were not present, despite actually being present.

One way to attempt to fix this would be to create a way to force-refresh
the ref state. That would correct this for these cases where the
refs/bundles/* references have been updated. However, this still is an
expensive operation in a repository with many references.

Instead, optionally allow callers to skip this portion by instead just
checking for presence within the object store. Use this when unbundling
in bundle-uri.c.

Signed-off-by: Derrick Stolee
---
 bundle-uri.c | 8 +++++++-
 bundle.c     | 3 ++-
 bundle.h     | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda172..2f079f713cf 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -322,9 +322,15 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * Skip the reachability walk here, since we will be adding
 	 * a reachable ref pointing to the new tips, which will reach
 	 * the prerequisite commits.
+	 *
+	 * Since multiple iterations of unbundle_from_file() can create
+	 * new commits in the object store that are not reachable from
+	 * the current cached state of the ref store, skip the reachability
+	 * walk and move forward as long as the objects are present in the
+	 * object store.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET)))
+			       VERIFY_BUNDLE_QUIET | VERIFY_BUNDLE_SKIP_REACHABLE)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 4ef7256aa11..b51974f0806 100644
--- a/bundle.c
+++ b/bundle.c
@@ -223,7 +223,8 @@ int verify_bundle(struct repository *r,
 			error("%s", message);
 		error("%s %s", oid_to_hex(oid), name);
 	}
-	if (revs.pending.nr != p->nr)
+	if (revs.pending.nr != p->nr ||
+	    (flags & VERIFY_BUNDLE_SKIP_REACHABLE))
 		goto cleanup;
 	req_nr = revs.pending.nr;
 	setup_revisions(2, argv, &revs, NULL);
diff --git a/bundle.h b/bundle.h
index 9f2bd733a6a..24c30e5f74a 100644
--- a/bundle.h
+++ b/bundle.h
@@ -34,6 +34,7 @@ int create_bundle(struct repository *r, const char *path,
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
 	VERIFY_BUNDLE_QUIET = (1 << 1),
+	VERIFY_BUNDLE_SKIP_REACHABLE = (1 << 2),
 };
 
 int verify_bundle(struct repository *r, struct bundle_header *header,

From patchwork Mon Jan 23 15:21:42 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112402
Message-Id: <427aff4d5e5c85b601f43af8b664515380e11453.1674487310.git.gitgitgadget@gmail.com>
Date: Mon, 23 Jan 2023 15:21:42 +0000
Subject: [PATCH v2 02/10] t5558: add tests for creationToken heuristic
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

As documented in the bundle URI design doc in 2da14fad8fe (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.
Signed-off-by: Derrick Stolee
---
 t/t5558-clone-bundle-uri.sh | 69 +++++++++++++++++++++++++++++++++++--
 t/test-lib-functions.sh     |  8 +++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..474432c8ace 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -285,6 +285,8 @@ test_expect_success 'clone HTTP bundle' '
 '
 
 test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+
 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle]
@@ -304,12 +306,26 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
 		uri = $HTTPD_URL/bundle-4.bundle
 	EOF
 
-	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git clone --bundle-uri="$HTTPD_URL/bundle-list" \
 		clone-from clone-list-http 2>err &&
 	! grep "Repository lacks these prerequisite commits" err &&
 
 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
-	git -C clone-list-http cat-file --batch-check <oids
+	git -C clone-list-http cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Sort the list, since the order is not well-defined
+	# without a heuristic.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
 '
 
 test_expect_success 'clone bundle list (HTTP, any mode)' '
@@ -350,6 +366,55 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
+		clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
+
+	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+	git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-1.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-list
+	EOF
+
+	# Since the creationToken heuristic is not yet understood by the
+	# client, the order cannot be verified at this moment. Sort the
+	# list for consistent results.
+	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_cmp expect actual
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index f036c4d3003..ace542f4226 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1833,6 +1833,14 @@ test_region () {
 	return 0
 }
 
+# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
+# sent to git-remote-https child processes.
+test_remote_https_urls() {
+	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
+		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
+		    -e 's/"\]}//g'
+}
+
 # Print the destination of symlink(s) provided as arguments. Basically
 # the same as the readlink command, but it's not available everywhere.
 test_readlink () {

From patchwork Mon Jan 23 15:21:43 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112404
Date: Mon, 23 Jan 2023 15:21:43 +0000
Subject: [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.

Signed-off-by: Derrick Stolee
---
 Documentation/config/bundle.txt |  7 +++++++
 bundle-uri.c                    | 34 +++++++++++++++++++++++++++++++++
 bundle-uri.h                    | 14 ++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 19 ++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
 	complete understanding of the bundled information (`all`) or if any one
 	of the listed bundle URIs is sufficient (`any`).
 
+bundle.heuristic::
+	If this string-valued key exists, then the bundle list is designed to
+	work well with incremental `git fetch` commands. The heuristic signals
+	that there are additional keys available for each bundle that help
+	determine which subset of bundles the client should download. The
+	only value currently understood is `creationToken`.
+
 bundle.<id>.*::
 	The `bundle.<id>.*` keys are used to describe a single item in the
 	bundle list, grouped under `<id>` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c
index 2f079f713cf..0d64b1d84ba 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,14 @@
 #include "config.h"
 #include "remote.h"
 
+static struct {
+	enum bundle_list_heuristic heuristic;
+	const char *name;
+} heuristics[BUNDLE_HEURISTIC__COUNT] = {
+	{ BUNDLE_HEURISTIC_NONE, ""},
+	{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
+};
+
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
 			   const struct hashmap_entry *he2,
@@ -100,6 +108,17 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
 	fprintf(fp, "\tversion = %d\n", list->version);
 	fprintf(fp, "\tmode = %s\n", mode);
 
+	if (list->heuristic) {
+		int i;
+		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+			if (heuristics[i].heuristic == list->heuristic) {
+				printf("\theuristic = %s\n",
+				       heuristics[list->heuristic].name);
+				break;
+			}
+		}
+	}
+
 	for_all_bundles_in_list(list, summarize_bundle, fp);
 }
 
@@ -142,6 +161,21 @@ static int bundle_list_update(const char *key, const char *value,
 			return 0;
 		}
 
+		if (!strcmp(subkey, "heuristic")) {
+			int i;
+			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+				if (heuristics[i].heuristic &&
+				    heuristics[i].name &&
+				    !strcmp(value, heuristics[i].name)) {
+					list->heuristic = heuristics[i].heuristic;
+					return 0;
+				}
+			}
+
+			/* Ignore unknown heuristics. */
+			return 0;
+		}
+
 		/* Ignore other unknown global keys. */
 		return 0;
 	}
diff --git a/bundle-uri.h b/bundle-uri.h
index d5e89f1671c..2e44a50a90b 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -52,6 +52,14 @@ enum bundle_list_mode {
 	BUNDLE_MODE_ANY
 };
 
+enum bundle_list_heuristic {
+	BUNDLE_HEURISTIC_NONE = 0,
+	BUNDLE_HEURISTIC_CREATIONTOKEN,
+
+	/* Must be last. */
+	BUNDLE_HEURISTIC__COUNT
+};
+
 /**
  * A bundle_list contains an unordered set of remote_bundle_info structs,
  * as well as information about the bundle listing, such as version and
@@ -75,6 +83,12 @@ struct bundle_list {
 	 * advertised by the bundle list at that location.
 	 */
 	char *baseURI;
+
+	/**
+	 * A list can have a heuristic, which helps reduce the number of
+	 * downloaded bundles.
+	 */
+	enum bundle_list_heuristic heuristic;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_done

From patchwork Mon Jan 23 15:21:44 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112403
Message-Id: <12efa228d04c9379ec0598c974045ed851b068d9.1674487310.git.gitgitgadget@gmail.com>
Date: Mon, 23 Jan 2023 15:21:44 +0000
Subject: [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.
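
The parsing step described above can be sketched in isolation. This is
only an illustration under stated assumptions: parse_creation_token() is
a hypothetical helper name (the patch does the sscanf() call inline in
bundle_list_update()), and the sketch uses SCNu64, the scanf-family
macro from <inttypes.h>, where the patch passes PRIu64.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: parse a creationToken
 * config value as an unsigned 64-bit integer.
 * Returns 0 on success and -1 if no number could be read.
 */
static int parse_creation_token(const char *value, uint64_t *token)
{
	assert(value && token);

	if (sscanf(value, "%" SCNu64, token) != 1)
		return -1;
	return 0;
}
```

With the values used in the t5750 tests, "123456" and
"12345678901234567890" would parse, while "bogus" would be rejected,
which is the case where the patch emits its warning.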
Signed-off-by: Derrick Stolee
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0d64b1d84ba..f46ab5c1743 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -83,6 +83,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -203,6 +206,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index 2e44a50a90b..ef32840bfa6 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done

From patchwork Mon Jan 23 15:21:45 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112405
Message-Id: <7cfaa3c518cbedb65c585cc02015bb21ae24e9fa.1674487310.git.gitgitgadget@gmail.com>
Date: Mon, 23 Jan 2023 15:21:45 +0000
Subject: [PATCH v2 05/10] bundle-uri: download in creationToken order
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
 steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting to
apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store.
The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles. However, we are not satisfied by the naive approach of downloading bundles until one successfully unbundles, expecting the earlier bundles to successfully unbundle now. The example repository in t5558 demonstrates this well: ---------------- bundle-4 4 / \ ----|---|------- bundle-3 | | | 3 | | ----|---|------- bundle-2 | | 2 | | | ----|---|------- bundle-1 \ / 1 | (previous commits) In this repository, if we already have the objects for bundle-1 and then try to fetch from this list, the naive approach will fail. bundle-4 requires both bundle-3 and bundle-2, though bundle-3 will successfully unbundle without bundle-2. Thus, the algorithm needs to keep this in mind. A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client seeks to download from an identical bundle list since its previous download, it will download the most-recent bundle then stop since its required commits are already in the object store. Add tests that exercise this behavior, but we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values. 
Signed-off-by: Derrick Stolee
---
 bundle-uri.c                | 156 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  40 +++++++--
 t/t5601-clone.sh            |  46 +++++++++++
 3 files changed, 233 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index f46ab5c1743..39acd856fb9 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -453,6 +453,139 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct bundles_for_sorting {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundles_for_sorting *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int move_direction = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct bundles_for_sorting bundles = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+	for_all_bundles_in_list(list, append_bundle, &bundles);
+
+	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+	/*
+	 * Attempt to download and unbundle the minimum number of bundles by
+	 * creationToken in decreasing order. If we fail to unbundle (after
+	 * a successful download) then move to the next non-downloaded bundle
+	 * and attempt downloading. Once we succeed in applying a bundle,
+	 * move to the previous unapplied bundle and attempt to unbundle it
+	 * again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < bundles.nr) {
+		struct remote_bundle_info *bundle = bundles.items[cur];
+		if (!bundle->file) {
+			/*
+			 * Not downloaded yet. Try downloading.
+			 *
+			 * Note that bundle->file is non-NULL if a download
+			 * was attempted, even if it failed to download.
+			 */
+			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+				/* Mark as unbundled so we do not retry. */
+				bundle->unbundled = 1;
+
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+				goto stack_operation;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+			} else {
+				/*
+				 * Succeeded in unbundle. Retry bundles
+				 * that previously failed to unbundle.
+				 */
+				move_direction = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+stack_operation:
+		/* Move in the specified direction and repeat. */
+		cur += move_direction;
+	}
+
+	free(bundles.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -490,7 +623,15 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -632,6 +773,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -640,7 +789,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 474432c8ace..6f9417a0afb 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 	git -C clone-list-http-2 cat-file --batch-check <oids &&
 
 	cat >expect <<-EOF &&
-	$HTTPD_URL/bundle-1.bundle
-	$HTTPD_URL/bundle-2.bundle
-	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		--single-branch --branch=base --no-tags \
+		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
 	EOF
 
-	# Since the creationToken heuristic is not yet understood by the
-	# client, the order cannot be verified at this moment. Sort the
-	# list for consistent results.
-	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_remote_https_urls <trace-clone.txt >actual &&
 	test_cmp expect actual
 '
 
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..b7d5551262c 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,52 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git -c protocol.version=2 \
+		    -c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/newest.bundle
+	$HTTPD_URL/new.bundle
+	$HTTPD_URL/everything.bundle
+	EOF
+
+	# We should fetch all bundles in the expected order.
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
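The stack-like walk that fetch_bundles_by_token() performs in the patch above can be tried out in isolation. What follows is a minimal sketch under stated assumptions, not Git's implementation: downloads always succeed, each bundle's required commits are modeled as a bitmask of other bundles' objects, and the names sim_bundle, simulate_walk, provides, needs and have are all hypothetical. The comparator follows the same compare-don't-subtract shape as compare_creation_token_decreasing(), since the difference of two uint64_t tokens does not fit in an int.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the bundle info tracked by the real code. */
struct sim_bundle {
	uint64_t creation_token;
	unsigned provides;	/* bit naming the objects this bundle adds */
	unsigned needs;		/* bits that must be present to unbundle */
	int downloaded;
	int unbundled;
};

/* Sort by creation_token, largest first. */
static int cmp_token_decreasing(const void *va, const void *vb)
{
	const struct sim_bundle *a = va;
	const struct sim_bundle *b = vb;

	if (a->creation_token > b->creation_token)
		return -1;
	if (a->creation_token < b->creation_token)
		return 1;
	return 0;
}

/*
 * Simulate the walk. 'have' is the set of objects already present.
 * The tokens of downloaded bundles are recorded, in download order,
 * in 'order' (which must have room for 'nr' entries). Returns 0 when
 * every visited bundle was applied, non-zero when the walk ran off
 * the end of the list.
 */
static int simulate_walk(struct sim_bundle *b, size_t nr, unsigned have,
			 uint64_t *order, size_t *nr_downloaded)
{
	long cur = 0;
	int move_direction = 0;
	size_t i;

	*nr_downloaded = 0;
	for (i = 0; i < nr; i++)
		b[i].downloaded = b[i].unbundled = 0;
	qsort(b, nr, sizeof(*b), cmp_token_decreasing);

	while (cur >= 0 && (size_t)cur < nr) {
		struct sim_bundle *bundle = &b[cur];

		if (!bundle->downloaded) {
			bundle->downloaded = 1;
			order[(*nr_downloaded)++] = bundle->creation_token;
		}

		if (!bundle->unbundled) {
			if ((bundle->needs & have) != bundle->needs) {
				/* Missing prerequisites: look deeper. */
				move_direction = 1;
			} else {
				/* Applied: retry the newer bundles. */
				have |= bundle->provides;
				bundle->unbundled = 1;
				move_direction = -1;
			}
		}

		/* Already applied: keep moving in the previous direction. */
		cur += move_direction;
	}

	return cur >= 0;
}
```

With the t5558 layout (bundle-2 and bundle-3 each need bundle-1, bundle-4 needs both bundle-2 and bundle-3) and an empty object store, this sketch downloads tokens 4, 3, 2, 1. Starting with bundle-1's objects already present, it downloads 4, 3, 2 and never touches bundle-1, which is the short-circuit the commit message describes.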

From patchwork Mon Jan 23 15:21:46 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112406
Message-Id: <17c404c1b836d7c160defe97e1667c631d292fb0.1674487310.git.gitgitgadget@gmail.com>
Date: Mon, 23 Jan 2023 15:21:46 +0000
Subject: [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
    steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone.
This is done by specifying a bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee
---
 Documentation/config/fetch.txt |  8 +++++++
 builtin/clone.c                |  6 +++++-
 bundle-uri.c                   |  5 ++++-
 bundle-uri.h                   |  8 ++++++-
 t/t5558-clone-bundle-uri.sh    | 39 ++++++++++++++++++++++++++++++++++
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..244f44d460f 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,11 @@ fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.bundleURI::
+	This value stores a URI for downloading Git object data from a bundle
+	URI before performing an incremental fetch from the origin Git server.
+	This is similar to how the `--bundle-uri` option behaves in
+	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
+	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
+	list that is organized for incremental fetches.
diff --git a/builtin/clone.c b/builtin/clone.c
index 5453ba5277f..5370617664d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * data from the --bundle-uri option.
 	 */
 	if (bundle_uri) {
+		int has_heuristic = 0;
+
 		/* At this point, we need the_repository to match the cloned repo. */
 		if (repo_init(the_repository, git_dir, work_tree))
 			warning(_("failed to initialize the repo, skipping bundle URI"));
-		else if (fetch_bundle_uri(the_repository, bundle_uri))
+		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
+		else if (has_heuristic)
+			git_config_set_gently("fetch.bundleuri", bundle_uri);
 	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
diff --git a/bundle-uri.c b/bundle-uri.c
index 39acd856fb9..162a9276f31 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -742,7 +742,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data)
 	return 0;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic)
 {
 	int result;
 	struct bundle_list list;
@@ -762,6 +763,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
 	result = unbundle_all_bundles(r, &list);
 
 cleanup:
+	if (has_heuristic)
+		*has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
 	for_all_bundles_in_list(&list, unlink_bundle, NULL);
 	clear_bundle_list(&list);
 	clear_remote_bundle_info(&bundle, NULL);
diff --git a/bundle-uri.h b/bundle-uri.h
index ef32840bfa6..6dbc780f661 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri,
  * based on that information.
  *
  * Returns non-zero if no bundle information is found at the given 'uri'.
+ *
+ * If the pointer 'has_heuristic' is non-NULL, then the value it points to
+ * will be set to be non-zero if and only if the fetched list has a
+ * heuristic value. Such a value indicates that the list was designed for
+ * incremental fetches.
  */
-int fetch_bundle_uri(struct repository *r, const char *uri);
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic);
 
 /**
  * Given a bundle list that was already advertised (likely by the
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 6f9417a0afb..b2d15e141ca 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -432,6 +432,8 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		--single-branch --branch=base --no-tags \
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
+	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-1.bundle
@@ -441,6 +443,43 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
+	test_when_finished rm -rf fetch-http-4 trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
+
+	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual &&
+
+	# only received base ref from bundle-1
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 

From patchwork Mon Jan 23 15:21:47 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112407
Date: Mon, 23 Jan 2023 15:21:47 +0000
Subject: [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
    steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key.
For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/bundle-uri.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
    (This choice is an opt-in via a config option and a command-line
    option.)
 
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration
    key and the `bundle.<id>.creationToken` heuristic. When `git clone`
-   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
-   client repository to check that bundle URI during later `git fetch <remote>`
+   discovers a bundle URI with `bundle.heuristic`, it configures the client
+   repository to check that bundle URI during later `git fetch <remote>`
    commands.
 
 5. Allow clients to discover bundle URIs during `git fetch` and configure
-   a bundle URI for later fetches if `bundle.flag=forFetch`.
+   a bundle URI for later fetches if `bundle.heuristic` is set.
 
 6. Implement the "inspect headers" heuristic to reduce data downloads when
    the `bundle.<id>.creationToken` heuristic is not available.
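
The rule in step 4 of the plan above, where a clone remembers the bundle URI for later fetches only when the list advertises a heuristic, can be sketched on its own. This is a hedged illustration rather than Git's code: sim_heuristic, parse_heuristic() and bundle_uri_to_store() are hypothetical names, the exact-match string comparison is an assumption of the sketch, and creationToken is the only heuristic value assumed because it is the only one this series implements. The optional out-parameter mirrors the nullable 'has_heuristic' pointer that patch 06 adds to fetch_bundle_uri().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the heuristic values discussed in the series. */
enum sim_heuristic {
	SIM_HEURISTIC_NONE = 0,
	SIM_HEURISTIC_CREATIONTOKEN
};

/* Map a bundle.heuristic value to the enum; anything unknown means none. */
static enum sim_heuristic parse_heuristic(const char *value)
{
	if (value && !strcmp(value, "creationToken"))
		return SIM_HEURISTIC_CREATIONTOKEN;
	return SIM_HEURISTIC_NONE;
}

/*
 * Decide whether 'uri' should be remembered for later fetches.
 * Returns 'uri' when the list advertised a heuristic, NULL otherwise.
 * 'has_heuristic' may be NULL; when it is not, it receives 1 or 0.
 */
static const char *bundle_uri_to_store(const char *uri,
				       const char *heuristic_value,
				       int *has_heuristic)
{
	int found = parse_heuristic(heuristic_value) != SIM_HEURISTIC_NONE;

	if (has_heuristic)
		*has_heuristic = found;
	return found ? uri : NULL;
}
```

A caller that does not care about the flag, as 'git fetch' does in patch 08, can pass NULL for the last argument.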

From patchwork Mon Jan 23 15:21:48 2023
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13112411
Message-Id: <59e57e049683e42248c270b3bfcad2d72769219d.1674487310.git.gitgitgadget@gmail.com>
Date: Mon, 23 Jan 2023 15:21:48 +0000
Subject: [PATCH v2 08/10] fetch: fetch from an external bundle URI
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com,
    steadmon@google.com, chooglen@google.com, Derrick Stolee, Derrick Stolee
From: Derrick Stolee

From: Derrick Stolee

When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s).
Likely, the bundle provider has configured a heuristic (such as "creationToken") that will allow the Git client to download only a portion of the bundles before continuing the fetch. Since this URI is completely independent of the remote server, we want to be sure that we connect to the bundle URI before creating a connection to the Git remote. We do not want to hold a stateful connection for too long if we can avoid it. To test that this works correctly, extend the previous tests that set 'fetch.bundleURI' to do follow-up fetches. The bundle list is updated incrementally at each phase to demonstrate that the heuristic avoids downloading older bundles. This includes the middle fetch downloading the objects in bundle-3.bundle from the Git remote, and therefore not needing that bundle in the third fetch. Signed-off-by: Derrick Stolee --- builtin/fetch.c | 7 +++ t/t5558-clone-bundle-uri.sh | 113 +++++++++++++++++++++++++++++++++++- 2 files changed, 119 insertions(+), 1 deletion(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 7378cafeec9..f101e454dc9 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -29,6 +29,7 @@ #include "commit-graph.h" #include "shallow.h" #include "worktree.h" +#include "bundle-uri.h" #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000) @@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv, int cmd_fetch(int argc, const char **argv, const char *prefix) { int i; + const char *bundle_uri; struct string_list list = STRING_LIST_INIT_DUP; struct remote *remote = NULL; int result = 0; @@ -2194,6 +2196,11 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) if (dry_run) write_fetch_head = 0; + if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) { + if (fetch_bundle_uri(the_repository, bundle_uri, NULL)) + warning(_("failed to fetch bundles from '%s'"), bundle_uri); + } + if (all) { if (argc == 1) die(_("fetch --all does not take a repository argument")); diff --git 
a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index b2d15e141ca..7deeb4b8ad1 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -440,7 +440,55 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' ' EOF test_remote_https_urls actual && - test_cmp expect actual + test_cmp expect actual && + + # We now have only one bundle ref. + git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-\EOF && + refs/bundles/base + EOF + test_cmp expect refs && + + # Add remaining bundles, exercising the "deepening" strategy + # for downloading via the creationToken heuristic. + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + EOF + + GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \ + git -C clone-token-http fetch origin --no-tags \ + refs/heads/merge:refs/heads/merge && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-4.bundle + $HTTPD_URL/bundle-3.bundle + $HTTPD_URL/bundle-2.bundle + EOF + + test_remote_https_urls actual && + test_cmp expect actual && + + # We now have all bundle refs. + git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + refs/bundles/right + EOF + test_cmp expect refs ' test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' @@ -477,6 +525,69 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' cat >expect <<-\EOF && refs/bundles/base EOF + test_cmp expect refs && + + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + EOF + + # Fetch the objects for bundle-2 _and_ bundle-3. 
+ GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/left:refs/heads/left \ + refs/heads/right:refs/heads/right && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-2.bundle + EOF + + test_remote_https_urls actual && + test_cmp expect actual && + + # received left from bundle-2 + git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + EOF + test_cmp expect refs && + + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + EOF + + # This fetch should skip bundle-3.bundle, since its objects are + # already local (we have the requisite commits for bundle-4.bundle). + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/merge:refs/heads/merge && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-4.bundle + EOF + + test_remote_https_urls actual && + test_cmp expect actual && + + # received merge ref from bundle-4, but right is missing + # because we did not download bundle-3. 
+ git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + + cat >expect <<-\EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + EOF test_cmp expect refs ' From patchwork Mon Jan 23 15:21:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13112409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A12E9C38142 for ; Mon, 23 Jan 2023 15:23:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232456AbjAWPXA (ORCPT ); Mon, 23 Jan 2023 10:23:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232296AbjAWPWd (ORCPT ); Mon, 23 Jan 2023 10:22:33 -0500 Received: from mail-wm1-x32f.google.com (mail-wm1-x32f.google.com [IPv6:2a00:1450:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3CEC2A169 for ; Mon, 23 Jan 2023 07:22:04 -0800 (PST) Received: by mail-wm1-x32f.google.com with SMTP id l41-20020a05600c1d2900b003daf986faaeso8831440wms.3 for ; Mon, 23 Jan 2023 07:22:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=SfnOow34r53R8ZEx76D1TLBXZKqDDbz6TSPtoa9Dcf8=; b=BTj13uRnuveZKAHLsUnmMvKXqdpxVvYP8avC5qBLFHFO5j4GJqvE3FNGcgFbF1mZkm jJVcn2S52oHxS/UpEQMK+fox99URPV5qLlq9UAv+BUd5peGwiL5/H+e4J+864hLHIySf SdVOxHDyZNSwUP20kksA3TRnbSk0LFJ0SWNUCJMet5R5IPNhcITJKNXtWcrdZD/GtsQ8 CLsJGg69w8BCbFZiNJCwhc+Y1Ij77KoXibWLKj+GmNFkYCjcEHxmnT0rkk7vxKhSFJOc 4MDVj60ADNkfIz+bCmMl32DaweuuH9VmVhxds/YPh212lxrt3UhKFU3UBP4nHICn9Mxh 
gbzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SfnOow34r53R8ZEx76D1TLBXZKqDDbz6TSPtoa9Dcf8=; b=PXJ1UivzW3xmGryFJp3wfRhCr64Qao4sziwSdGVrAJirCs2DrHoVljA+SgZQraY2hn DtLfmn2hPNZEGG8VTE6fmftk5JtmyoIn66BbxXkR/ep/ga6gpbniM75gBKqnbVkVahqR 2xP2DsaPidzJBgJHJTES4RvFNn3En/uGowU/lFX7tILIIoTD2HIldkCW3pa3yH2Ukv65 ct8y/knYcutY84lhAldBy7JpM/FuF3y4UydvQ2jtP8rS8bJk90HF6Kr5Y5qqrUXo1L9U 0JvNF0TLTChQqDLGCTvtdttWGQgxufZQgDsx4ooJ4QylqzqfvCneBgc6IiMgZWT4rRCM DGcg== X-Gm-Message-State: AFqh2kr1wAiOvq6+mo0raN1A/CDADoz/5Zh27AcERJq+BQjo2Wblzn+Z I6kqtdNE8M74eVYoyP8fE6N9xkReojc= X-Google-Smtp-Source: AMrXdXs6uuMkWwO2gYBN5kE+RzZMjxgOzmbvzEt9yHkG9oRlX0bRhlZwQbC6aAcqf2VoK/84vI7c8Q== X-Received: by 2002:a05:600c:1d8e:b0:3d1:fcb4:4074 with SMTP id p14-20020a05600c1d8e00b003d1fcb44074mr25307300wms.22.1674487320140; Mon, 23 Jan 2023 07:22:00 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id z4-20020a05600c0a0400b003db01178b62sm12365005wmp.40.2023.01.23.07.21.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Jan 2023 07:21:59 -0800 (PST) Message-Id: <6a1504b1c3a24b45d48c093285dfcc9a3d6afd68.1674487310.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 23 Jan 2023 15:21:49 +0000 Subject: [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When a bundle list specifies the "creationToken" heuristic, the Git client downloads the list and then starts downloading bundles in descending creationToken order. 
This process stops as soon as all downloaded bundles can be applied to the repository (because all required commits are present in the repository or in the downloaded bundles). When checking the same bundle list twice, this strategy requires downloading the bundle with the maximum creationToken again, which is wasteful. The creationToken heuristic promises that the client will not have a use for that bundle if its creationToken value is at most the previous creationToken value. To prevent these wasteful downloads, create a fetch.bundleCreationToken config setting that the Git client sets after downloading bundles. This value allows skipping that download when the stored value is at least the maximum creationToken advertised in the bundle list. To test that this works correctly, we can insert some "duplicate" fetches into existing tests and demonstrate that only the bundle list is downloaded. The previous logic for downloading bundles by creationToken worked even if the bundle list was empty, but now we have logic that depends on the first entry of the list. Terminate early in the (nonsensical) case of an empty bundle list. Signed-off-by: Derrick Stolee --- Documentation/config/fetch.txt | 16 ++++++++++++ bundle-uri.c | 48 ++++++++++++++++++++++++++++++++-- t/t5558-clone-bundle-uri.sh | 29 +++++++++++++++++++- 3 files changed, 90 insertions(+), 3 deletions(-) diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt index 244f44d460f..568f0f75b30 100644 --- a/Documentation/config/fetch.txt +++ b/Documentation/config/fetch.txt @@ -104,3 +104,19 @@ fetch.bundleURI:: linkgit:git-clone[1]. `git clone --bundle-uri` will set the `fetch.bundleURI` value if the supplied bundle URI contains a bundle list that is organized for incremental fetches. ++ +If you modify this value and your repository has a `fetch.bundleCreationToken` +value, then remove that `fetch.bundleCreationToken` value before fetching from +the new bundle URI. 
+ +fetch.bundleCreationToken:: + When using `fetch.bundleURI` to fetch incrementally from a bundle + list that uses the "creationToken" heuristic, this config value + stores the maximum `creationToken` value of the downloaded bundles. + This value is used to prevent downloading bundles in the future + if the advertised `creationToken` is not strictly larger than this + value. ++ +The creation token values are chosen by the provider serving the specific +bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to +remove the `fetch.bundleCreationToken` value before fetching. diff --git a/bundle-uri.c b/bundle-uri.c index 162a9276f31..691853b2c56 100644 --- a/bundle-uri.c +++ b/bundle-uri.c @@ -487,6 +487,8 @@ static int fetch_bundles_by_token(struct repository *r, { int cur; int move_direction = 0; + const char *creationTokenStr; + uint64_t maxCreationToken = 0, newMaxCreationToken = 0; struct bundle_list_context ctx = { .r = r, .list = list, @@ -500,8 +502,27 @@ static int fetch_bundles_by_token(struct repository *r, for_all_bundles_in_list(list, append_bundle, &bundles); + if (!bundles.nr) { + free(bundles.items); + return 0; + } + QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing); + /* + * If fetch.bundleCreationToken exists, parses to a uint64_t, and + * is not strictly smaller than the maximum creation token in the + * bundle list, then do not download any bundles. + */ + if (!repo_config_get_value(r, + "fetch.bundlecreationtoken", + &creationTokenStr) && + sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 && + bundles.items[0]->creationToken <= maxCreationToken) { + free(bundles.items); + return 0; + } + /* * Attempt to download and unbundle the minimum number of bundles by * creationToken in decreasing order. 
If we fail to unbundle (after @@ -522,6 +543,16 @@ static int fetch_bundles_by_token(struct repository *r, cur = 0; while (cur >= 0 && cur < bundles.nr) { struct remote_bundle_info *bundle = bundles.items[cur]; + + /* + * If we need to dig into bundles below the previous + * creation token value, then likely we are in an erroneous + * state due to missing or invalid bundles. Halt the process + * instead of continuing to download extra data. + */ + if (bundle->creationToken <= maxCreationToken) + break; + if (!bundle->file) { /* * Not downloaded yet. Try downloading. @@ -561,6 +592,9 @@ static int fetch_bundles_by_token(struct repository *r, */ move_direction = -1; bundle->unbundled = 1; + + if (bundle->creationToken > newMaxCreationToken) + newMaxCreationToken = bundle->creationToken; } } @@ -575,14 +609,24 @@ stack_operation: cur += move_direction; } - free(bundles.items); - /* * We succeed if the loop terminates because 'cur' drops below * zero. The other case is that we terminate because 'cur' * reaches the end of the list, so we have a failure no matter * which bundles we apply from the list. 
*/ + if (cur < 0) { + struct strbuf value = STRBUF_INIT; + strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken); + if (repo_config_set_multivar_gently(ctx.r, + "fetch.bundleCreationToken", + value.buf, NULL, 0)) + warning(_("failed to store maximum creation token")); + + strbuf_release(&value); + } + + free(bundles.items); return cur >= 0; } diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 7deeb4b8ad1..9c2b7934b9b 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -433,6 +433,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' ' "$HTTPD_URL/smart/fetch.git" clone-token-http && test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri && + test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -468,6 +469,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' ' GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \ git -C clone-token-http fetch origin --no-tags \ refs/heads/merge:refs/heads/merge && + test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -511,6 +513,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' "$HTTPD_URL/smart/fetch.git" fetch-http-4 && test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri && + test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -538,6 +541,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' git -C fetch-http-4 fetch origin --no-tags \ refs/heads/left:refs/heads/left \ refs/heads/right:refs/heads/right && + test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -555,6 +559,18 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' EOF test_cmp expect refs && + # No-op fetch 
+ GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \ + git -C fetch-http-4 fetch origin --no-tags \ + refs/heads/left:refs/heads/left \ + refs/heads/right:refs/heads/right && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + EOF + test_remote_https_urls actual && + test_cmp expect actual && + cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && [bundle "bundle-3"] uri = bundle-3.bundle @@ -570,6 +586,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \ git -C fetch-http-4 fetch origin --no-tags \ refs/heads/merge:refs/heads/merge && + test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken && cat >expect <<-EOF && $HTTPD_URL/bundle-list @@ -588,7 +605,17 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' refs/bundles/left refs/bundles/merge EOF - test_cmp expect refs + test_cmp expect refs && + + # No-op fetch + GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \ + git -C fetch-http-4 fetch origin && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + EOF + test_remote_https_urls actual && + test_cmp expect actual ' # Do not add tests here unless they use the HTTP server, as they will From patchwork Mon Jan 23 15:21:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13112410 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 974C3C05027 for ; Mon, 23 Jan 2023 15:23:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232532AbjAWPXU (ORCPT ); Mon, 23 Jan 2023 10:23:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53940 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232419AbjAWPWe (ORCPT ); Mon, 23 Jan 
2023 10:22:34 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3C6A2A164 for ; Mon, 23 Jan 2023 07:22:04 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id c4-20020a1c3504000000b003d9e2f72093so10868818wma.1 for ; Mon, 23 Jan 2023 07:22:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=YV9hjl1OVRrp505HrIIt+0EfH9Q3G3aaQK8uCxUSDp8=; b=pYp9npiTSjFN3BsE8sW9HQERYMFAWaU8YaYjYl4rn9E/meTmmSnMvTrDtF5f5+4TJ5 dye/MObx9KMjkC+wpvmSbLzCgumn4WLEyf3emETwNQaq1Q+/83W1ARJSsA+c+Gk31wbV +OxYQ5uMMM/AALCJVgNA5m7t2NBEtQKJT6srfMrTy82+Sc4m4EWyyDP4tjLGFhwhPIk3 rRVxNLJm76qgfFkmDad3d/9aht7CxgftxKuKFdYk0okxJsoVtltr1DItKSF2ib8wOGyd 8twZdctsCCgL+gEjFkuBWqMbiW/wpZiCYJpZRnSUU0V8IfP2rmRcOR7nbB3SYep0qLKl AQhQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YV9hjl1OVRrp505HrIIt+0EfH9Q3G3aaQK8uCxUSDp8=; b=AMQH7SCb20fakjHFXD/QwbcA60Rp5TvZ4fTHHHigUSWuoBDAdKIsxiJhDF/VIPZGMc +kft3pVFfqUsQ1oB9bkV0wyWPR/D0cQ+jgUwjJhzvaRiGxuvclF9S7WlYxQiE/rJCuET eei48w8YClFVTy01n4m0QbZajXqn/DLwenGf5sJIpOJ+UACNLVrmYSF+sraUFO4I3xs2 6sRbCje72qyxye17XsCALbjRHrLgRAFJgoYrJSlqMtDupKJj8pRvLB14x9zXpFgWAvaV KmvzfCZj3QkHa0a/UMMB/7f/aH9SLRPbaRfdPPMCtcVqmyQquxnsfxo4eDPqzuipsS9M QeGQ== X-Gm-Message-State: AFqh2kq8kK3+XuSmx4ip1pq4KegQwrBPqJ5kAa3HttSg8r0bQ15dIZlc taHAZp9/1R0HLr2WjHVBFgN3SFn/was= X-Google-Smtp-Source: AMrXdXsXXU3qUP6tzyGtSx9KqLFo03F0sHJ6JRcvuQHEupHUY5riYN7/xdBG6sck5vIGMnve9OvfUQ== X-Received: by 2002:a1c:4b14:0:b0:3cf:8957:a441 with SMTP id 
y20-20020a1c4b14000000b003cf8957a441mr23327633wma.12.1674487320982; Mon, 23 Jan 2023 07:22:00 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m31-20020a05600c3b1f00b003dafadd2f77sm12038458wms.1.2023.01.23.07.22.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Jan 2023 07:22:00 -0800 (PST) Message-Id: <676522615ad0e8f24099ef35a0f39367e5f688ae.1674487310.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 23 Jan 2023 15:21:50 +0000 Subject: [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com, avarab@gmail.com, steadmon@google.com, chooglen@google.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The creationToken heuristic downloads bundles differently from the "standard" approach. Specifically: it uses a concrete order based on the creationToken values and attempts to download as few bundles as possible. It also modifies local config to store a value for future fetches to avoid downloading bundles, if possible. However, if any of the individual bundles has a failed download, then the logic for the ordering comes into question. It is important to avoid infinite loops and to avoid assigning invalid creation token values in config, while still being as opportunistic as possible and downloading as many bundles as seem appropriate. These tests were used to inform the implementation of fetch_bundles_by_token() in bundle-uri.c, but are being added independently here to allow focusing on faulty downloads. There may be more cases that could be added that result in modifications to fetch_bundles_by_token() as interesting data shapes reveal themselves in real scenarios. 
Signed-off-by: Derrick Stolee --- t/t5558-clone-bundle-uri.sh | 400 ++++++++++++++++++++++++++++++++++++ 1 file changed, 400 insertions(+) diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 9c2b7934b9b..e3ccfe872c4 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -618,6 +618,406 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' ' test_cmp expect actual ' +test_expect_success 'creationToken heuristic with failed downloads (clone)' ' + test_when_finished rm -rf download-* trace*.txt && + + # Case 1: base bundle does not exist, nothing can unbundle + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = fake.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + EOF + + GIT_TRACE2_EVENT="$(pwd)/trace-clone-1.txt" \ + git clone --single-branch --branch=base \ + --bundle-uri="$HTTPD_URL/bundle-list" \ + "$HTTPD_URL/smart/fetch.git" download-1 && + + # Bundle failure does not set these configs. 
+ test_must_fail git -C download-1 config fetch.bundleuri && + test_must_fail git -C download-1 config fetch.bundlecreationtoken && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-4.bundle + $HTTPD_URL/bundle-3.bundle + $HTTPD_URL/bundle-2.bundle + $HTTPD_URL/fake.bundle + EOF + test_remote_https_urls actual && + test_cmp expect actual && + + # All bundles failed to unbundle + git -C download-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + test_must_be_empty refs && + + # Case 2: middle bundle does not exist, only two bundles can unbundle + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = fake.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + EOF + + GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \ + git clone --single-branch --branch=base \ + --bundle-uri="$HTTPD_URL/bundle-list" \ + "$HTTPD_URL/smart/fetch.git" download-2 && + + # Bundle failure does not set these configs. + test_must_fail git -C download-2 config fetch.bundleuri && + test_must_fail git -C download-2 config fetch.bundlecreationtoken && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-4.bundle + $HTTPD_URL/bundle-3.bundle + $HTTPD_URL/fake.bundle + $HTTPD_URL/bundle-1.bundle + EOF + test_remote_https_urls actual && + test_cmp expect actual && + + # Only the base and right bundles unbundled. + git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-EOF && + refs/bundles/base + refs/bundles/right + EOF + test_cmp expect refs && + + # Case 3: top bundle does not exist, rest unbundle fine. 
+ cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = fake.bundle + creationToken = 4 + EOF + + GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \ + git clone --single-branch --branch=base \ + --bundle-uri="$HTTPD_URL/bundle-list" \ + "$HTTPD_URL/smart/fetch.git" download-3 && + + # As long as we have contiguous successful downloads, + # we _do_ set these configs. + test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri && + test_cmp_config -C download-3 3 fetch.bundlecreationtoken && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/fake.bundle + $HTTPD_URL/bundle-3.bundle + $HTTPD_URL/bundle-2.bundle + $HTTPD_URL/bundle-1.bundle + EOF + test_remote_https_urls actual && + test_cmp expect actual && + + # All bundles except the missing top bundle unbundled + git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/right + EOF + test_cmp expect refs +' + +# Expand the bundle list to include other interesting shapes, specifically +# interesting for use when fetching from a previous state. +# +# ---------------- bundle-7 +# 7 +# _/|\_ +# ---/--|--\------ bundle-6 +# 5 | 6 +# --|---|---|----- bundle-4 +# | 4 | +# | / \ / +# --|-|---|/------ bundle-3 (the client will be caught up to this point.) 
+# \ | 3 +# ---\|---|------- bundle-2 +# 2 | +# ----|---|------- bundle-1 +# \ / +# 1 +# | +# (previous commits) +test_expect_success 'expand incremental bundle list' ' + ( + cd clone-from && + git checkout -b lefter left && + test_commit 5 && + git checkout -b righter right && + test_commit 6 && + git checkout -b top lefter && + git merge -m "7" merge righter && + + git bundle create bundle-6.bundle lefter righter --not left right && + git bundle create bundle-7.bundle top --not lefter merge righter && + + cp bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" + ) && + git -C "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" fetch origin +refs/heads/*:refs/heads/* +' + +test_expect_success 'creationToken heuristic with failed downloads (fetch)' ' + test_when_finished rm -rf download-* trace*.txt && + + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + EOF + + git clone --single-branch --branch=left \ + --bundle-uri="$HTTPD_URL/bundle-list" \ + "$HTTPD_URL/smart/fetch.git" fetch-base && + test_cmp_config -C fetch-base "$HTTPD_URL/bundle-list" fetch.bundleURI && + test_cmp_config -C fetch-base 3 fetch.bundleCreationToken && + + # Case 1: all bundles exist: successful unbundling of all bundles + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + + [bundle "bundle-6"] + uri = bundle-6.bundle + creationToken = 6 + + [bundle "bundle-7"] + uri = bundle-7.bundle + creationToken = 7 
+ EOF + + cp -r fetch-base fetch-1 && + GIT_TRACE2_EVENT="$(pwd)/trace-fetch-1.txt" \ + git -C fetch-1 fetch origin && + test_cmp_config -C fetch-1 7 fetch.bundlecreationtoken && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-7.bundle + $HTTPD_URL/bundle-6.bundle + $HTTPD_URL/bundle-4.bundle + EOF + test_remote_https_urls actual && + test_cmp expect actual && + + # Check which bundles have unbundled by refs + git -C fetch-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/lefter + refs/bundles/merge + refs/bundles/right + refs/bundles/righter + refs/bundles/top + EOF + test_cmp expect refs && + + # Case 2: middle bundle does not exist, only bundle-4 can unbundle + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + + [bundle "bundle-6"] + uri = fake.bundle + creationToken = 6 + + [bundle "bundle-7"] + uri = bundle-7.bundle + creationToken = 7 + EOF + + cp -r fetch-base fetch-2 && + GIT_TRACE2_EVENT="$(pwd)/trace-fetch-2.txt" \ + git -C fetch-2 fetch origin && + + # Since bundle-7 fails to unbundle, do not update creation token. 
+ test_cmp_config -C fetch-2 3 fetch.bundlecreationtoken && + + cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/bundle-7.bundle + $HTTPD_URL/fake.bundle + $HTTPD_URL/bundle-4.bundle + EOF + test_remote_https_urls actual && + test_cmp expect actual && + + # Check which bundles have unbundled by refs + git -C fetch-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/merge + refs/bundles/right + EOF + test_cmp expect refs && + + # Case 3: top bundle does not exist, rest unbundle fine. + cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF && + [bundle] + version = 1 + mode = all + heuristic = creationToken + + [bundle "bundle-1"] + uri = bundle-1.bundle + creationToken = 1 + + [bundle "bundle-2"] + uri = bundle-2.bundle + creationToken = 2 + + [bundle "bundle-3"] + uri = bundle-3.bundle + creationToken = 3 + + [bundle "bundle-4"] + uri = bundle-4.bundle + creationToken = 4 + + [bundle "bundle-6"] + uri = bundle-6.bundle + creationToken = 6 + + [bundle "bundle-7"] + uri = fake.bundle + creationToken = 7 + EOF + + cp -r fetch-base fetch-3 && + GIT_TRACE2_EVENT="$(pwd)/trace-fetch-3.txt" \ + git -C fetch-3 fetch origin && + + # As long as we have contiguous successful downloads, + # we _do_ set the maximum creation token. + test_cmp_config -C fetch-3 6 fetch.bundlecreationtoken && + + # NOTE: the fetch skips bundle-4 since bundle-6 successfully + # unbundles itself and bundle-7 failed to download. 
+ cat >expect <<-EOF && + $HTTPD_URL/bundle-list + $HTTPD_URL/fake.bundle + $HTTPD_URL/bundle-6.bundle + EOF + test_remote_https_urls actual && + test_cmp expect actual && + + # Check which bundles have unbundled by refs + git -C fetch-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs && + cat >expect <<-EOF && + refs/bundles/base + refs/bundles/left + refs/bundles/lefter + refs/bundles/right + refs/bundles/righter + EOF + test_cmp expect refs +' + # Do not add tests here unless they use the HTTP server, as they will # not run unless the HTTP dependencies exist.
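
For readers following the series, the early exit that patch 09/10 adds to fetch_bundles_by_token() can be illustrated outside of Git. What follows is only a rough sketch under stated assumptions: can_skip_bundle_list() and cmp_token_decreasing() are hypothetical names that do not exist in Git, a plain array of uint64_t stands in for the bundle list, and a C string stands in for the fetch.bundleCreationToken config value (NULL meaning "unset"). The sketch parses with SCNu64 rather than the PRIu64 used in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Sort helper: larger creationToken values come first. */
static int cmp_token_decreasing(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	if (x > y)
		return -1;
	return x < y;
}

/*
 * Hypothetical helper (not part of Git): return 1 when no bundle needs
 * to be downloaded, 0 otherwise. 'stored' stands in for the value of
 * fetch.bundleCreationToken and may be NULL when the config is unset.
 * The 'tokens' array is sorted in place, largest token first.
 */
static int can_skip_bundle_list(uint64_t *tokens, size_t nr, const char *stored)
{
	uint64_t max_seen = 0;

	assert(tokens || !nr);

	/* An empty list has nothing to download. */
	if (!nr)
		return 1;

	qsort(tokens, nr, sizeof(*tokens), cmp_token_decreasing);

	/*
	 * Skip only if the stored value parses as an unsigned 64-bit
	 * integer and is not strictly smaller than the largest
	 * advertised token.
	 */
	if (stored && sscanf(stored, "%" SCNu64, &max_seen) == 1 &&
	    tokens[0] <= max_seen)
		return 1;

	return 0;
}
```

With a list advertising tokens 1 through 4, a stored value of "4" would skip every download, while a stored value of "3", an unset value, or an unparsable value would fall through to the normal download loop. That matches the "No-op fetch" cases in the tests, where only the bundle list itself is requested.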