From patchwork Tue May 14 21:10:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 10943759 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4923E924 for ; Tue, 14 May 2019 21:11:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 39A3228986 for ; Tue, 14 May 2019 21:11:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2DDC728984; Tue, 14 May 2019 21:11:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B525E28980 for ; Tue, 14 May 2019 21:11:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726339AbfENVLC (ORCPT ); Tue, 14 May 2019 17:11:02 -0400 Received: from mail-pl1-f201.google.com ([209.85.214.201]:54623 "EHLO mail-pl1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726148AbfENVLB (ORCPT ); Tue, 14 May 2019 17:11:01 -0400 Received: by mail-pl1-f201.google.com with SMTP id 61so321173plr.21 for ; Tue, 14 May 2019 14:11:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=uzdUejCGyE5iId1yrUQPOKFbOC1aKEuHXKtaammMnrI=; b=VE5a0N3N0T+7tVhu2ALUGjDrCsnvyAD6bnYvudM7PeZOvH5z6FKGESYe1vpF6UpVhZ FfhUzOZvhs/E1kecp3m3Y7KR2roB4tX7sX1BzBGHy5gHlwglBWIQoma2rbmpC/FY2uqD y4WXjUx+/VVlyIV2LAlVLXqsmfsITzFLIj3FxyvOGT1+jItbwr5zhf97mO/I32znx8az P8Ay50Ay0XMoAt2TDZDXWuQPzFM5LhOdqGu9iotujkhAogZhq2CdfFUUbC6c2OJlwnVe peAXco/NxuAGQUQW1GSc8RPW26QTJR8V3FZmRKgZ2hEK6K2h4jNXqnvC5LNJHKHUKOJS OmQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=uzdUejCGyE5iId1yrUQPOKFbOC1aKEuHXKtaammMnrI=; b=qe1KlItPamIC+B7p5sOc8tlz2+7iCBvCVfRL5o9fKK9sEZpM8hXNyM6ad7uzlK6ICT kA227CO7wEf/j3ynv55a2tghF+nZXNvl3hTkYSnVatZ/B6HPDUQd7BotCVxCt5FpZjNB IM8F+aym9aTL3nOfdSPVEwE0Oxqr/tOiMwQne06NIrVw4UwWiQFweqG/u1cC11XaoNsO 4gQRgo/3nF+3ABlazMaTN7LbGt7M0J3xq1clSzw1BSdSASlyo+gr+0HRUaKbPEahYx7C L6s2Y9uICgT3qHh1OjcSaNxB7owtJMuwYixkLoxnEDGy+cGwdukhLX2/elkSLCwfgW+c smOQ== X-Gm-Message-State: APjAAAX2gWgzBC+2tjUjnMNu0ghnKflh8ODleiGNYxqjeIN0MuyHPbse yJ9kxJqHbRA1Y/EaI6nDtspbhbpS8E984F9PvfkvnE0p4843jVQqoGYU23Jzw3r+4zdXLKlyQjD gKaPNj0s3tG8yLAzK5Wgkf7STrBMp9U27gA5QELl6UjllqWFaY8F1+gH2L1wrjXmDYHQ+jfvsIB 3f X-Google-Smtp-Source: APXvYqyuMMJG4zugvVPKKsO8nLzTchmVY3uxxKZCEravu0LHQ7zWuEB0gZNRusjPxOC0+fTlxwkYEfBq8/BB6I8KbbkR X-Received: by 2002:a63:dc15:: with SMTP id s21mr1732503pgg.215.1557868260936; Tue, 14 May 2019 14:11:00 -0700 (PDT) Date: Tue, 14 May 2019 14:10:54 -0700 In-Reply-To: Message-Id: <991a3aa27dd7fe67adbed2e03502790932b5059c.1557868134.git.jonathantanmy@google.com> Mime-Version: 1.0 References: X-Mailer: git-send-email 2.21.0.1020.gf2820cf01a-goog Subject: [PATCH 1/2] t5616: refactor packfile replacement From: Jonathan Tan To: git@vger.kernel.org Cc: Jonathan Tan Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP A subsequent patch will perform the same packfile replacement that is already done twice, so refactor it into its own function. Also, the same subsequent patch will use, in another way, part of the packfile replacement functionality, so extract those out too. Signed-off-by: Jonathan Tan --- t/t5616-partial-clone.sh | 34 +++++++++++++++++++++------------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index 9a8f9886b3..7cc0c71556 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -244,11 +244,25 @@ test_expect_success 'fetch what is specified on CLI even if already promised' ' . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd -# Converts bytes into a form suitable for inclusion in a sed command. For -# example, "printf 'ab\r\n' | hex_unpack" results in '\x61\x62\x0d\x0a'. -sed_escape () { - perl -e '$/ = undef; $input = <>; print unpack("H2" x length($input), $input)' | - sed 's/\(..\)/\\x\1/g' +# Converts bytes into their hexadecimal representation. For example, +# "printf 'ab\r\n' | hex_unpack" results in '61620d0a'. +hex_unpack () { + perl -e '$/ = undef; $input = <>; print unpack("H2" x length($input), $input)' +} + +# Inserts $1 at the start of the string and every 2 characters thereafter. +intersperse () { + sed 's/\(..\)/'$1'\1/g' +} + +# Create a one-time-sed command to replace the existing packfile with $1. +replace_packfile () { + # The protocol requires that the packfile be sent in sideband 1, hence + # the extra \x01 byte at the beginning. + printf "1,/packfile/!c %04x\\\\x01%s0000" \ + "$(($(wc -c <$1) + 5))" \ + "$(hex_unpack <$1 | intersperse '\\x')" \ + >"$HTTPD_ROOT_PATH/one-time-sed" } test_expect_success 'upon cloning, check that all refs point to objects' ' @@ -270,10 +284,7 @@ test_expect_success 'upon cloning, check that all refs point to objects' ' # Replace the existing packfile with the crafted one. The protocol # requires that the packfile be sent in sideband 1, hence the extra # \x01 byte at the beginning. - printf "1,/packfile/!c %04x\\\\x01%s0000" \ - "$(($(wc -c "$HTTPD_ROOT_PATH/one-time-sed" && + replace_packfile incomplete.pack && # Use protocol v2 because the sed command looks for the "packfile" # section header. @@ -313,10 +324,7 @@ test_expect_success 'when partial cloning, tolerate server not sending target of # Replace the existing packfile with the crafted one. The protocol # requires that the packfile be sent in sideband 1, hence the extra # \x01 byte at the beginning. - printf "1,/packfile/!c %04x\\\\x01%s0000" \ - "$(($(wc -c "$HTTPD_ROOT_PATH/one-time-sed" && + replace_packfile incomplete.pack && # Use protocol v2 because the sed command looks for the "packfile" # section header. From patchwork Tue May 14 21:10:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Tan X-Patchwork-Id: 10943761 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3758924 for ; Tue, 14 May 2019 21:11:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C879F2896D for ; Tue, 14 May 2019 21:11:06 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BCFE32898D; Tue, 14 May 2019 21:11:06 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.5 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, USER_IN_DEF_DKIM_WL autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 25F952896D for ; Tue, 14 May 2019 21:11:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726429AbfENVLF (ORCPT ); Tue, 14 May 2019 17:11:05 -0400 Received: from mail-oi1-f201.google.com ([209.85.167.201]:43879 "EHLO mail-oi1-f201.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726148AbfENVLE (ORCPT ); Tue, 14 May 2019 17:11:04 -0400 Received: by mail-oi1-f201.google.com with SMTP id l12so182834oii.10 for ; Tue, 14 May 2019 14:11:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ljS0W6FVKQYnkZNyAH2XaU6gTPEF8XCjTfmlTU+yS5w=; b=O8ni3XHxoLC0b0nbrvDr/xajmMcKTnEXHlF3gql+Hoz/CEgoPOXdULxSMEELYuAB3W NZrKvlTiUk6F3i+VtJnAmDl5pFAMiUnGnZ4AzDgHJIxaoYIa1nHr0DwlRHqK9f8AWD0h Uds/dCJ+Ldrd8ma1gz06p2c1b1OMY2P93rYvNNy4upBeEyp45dCp5RHVJJ8StfTIv6O3 hhCeJQ3alUR6p47M2S1FsXO1T+rTa2hSUfxcamWJJoxJut4ZkHybNw4F1Nsw31ba7OJn PeyMrf/kkaQZ/IihKvzOdcEsMuHsJlzkn4J0ztEV2zdraUGw0unmDPH0U7SrwG0SyPls YuRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ljS0W6FVKQYnkZNyAH2XaU6gTPEF8XCjTfmlTU+yS5w=; b=YINSpM5V+tL22ZATQVNJpt0mMcBSU6tKTCO0sL9rVoVKgfe5kYXWetZZZyg2IYGwFV HCzw/blosNLCGwsC8iVQ1If+RA9FEuvrHw5q6uJ4PyH62Rnzf5dZnCb91zUttrzUN9hN uZKk79fpxemGqvh01DsD4o3GzCT4ofy22ySoT7IJoK9jclFKP46G4yWd7tAughpY44Ju o/yng4SXzuDwyeeSB8euUfyie3r4n7oDTQXl+Qetj1RiTetcubZTEm+V3MYFXbvxTlBL z2x5zY8+V62exQR5lm+azRz49VxRAcJCso8w5yaqcjYsRSerQpRtepT/Iz1sDh8iGs9N EjNA== X-Gm-Message-State: APjAAAWVGSvTc+QkjDJy8KRESxH8CFGgBP6IQk6lA4lcADxn2sOBzM4i bZNeyC/KzFbNVxKzAGnbfxxRPQxI3xna33qAgm8OmNVQ7EtETuFwxFx1wiqAPoFHoME6IfuTbZe eb26mGAsTqKz+JG6+Rxq5jfv1EuaZf1ax842aIdXWIVvEd6d7IuNqkvDkNuFUqDCfoZkU9xZqyw mk X-Google-Smtp-Source: APXvYqyEA5BQZ/rfAWTvKCCfZcgldgWLAVY9gm02A4GUNn0zE00bjv1+m8IA8hPA9cogn9+4n/osY3MHypwnw+COMsmv X-Received: by 2002:aca:40c1:: with SMTP id n184mr72924oia.73.1557868263350; Tue, 14 May 2019 14:11:03 -0700 (PDT) Date: Tue, 14 May 2019 14:10:55 -0700 In-Reply-To: Message-Id: <4fcaa4481b5fd2a76aa21263f997e00913db0e0f.1557868134.git.jonathantanmy@google.com> Mime-Version: 1.0 References: X-Mailer: git-send-email 2.21.0.1020.gf2820cf01a-goog Subject: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases From: Jonathan Tan To: git@vger.kernel.org Cc: Jonathan Tan Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When fetching, the client sends "have" commit IDs indicating that the server does not need to send any object referenced by those commits, reducing network I/O. When the client is a partial clone, the client still sends "have"s in this way, even if it does not have every object referenced by a commit it sent as "have". If a server omits such an object, it is fine: the client could lazily fetch that object before this fetch, and it can still do so after. The issue is when the server sends a thin pack containing an object that is a REF_DELTA against such a missing object: index-pack fails to fix the thin pack. When support for lazily fetching missing objects was added in 8b4c0103a9 ("sha1_file: support lazily fetching missing objects", 2017-12-08), support in index-pack was turned off in the belief that it accesses the repo only to do hash collision checks. However, this is not true: it also needs to access the repo to resolve REF_DELTA bases. Support for lazy fetching should still generally be turned off in index-pack because it is used as part of the lazy fetching process itself (if not, infinite loops may occur), but we do need to fetch the REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that those are REF_DELTA themselves, because we do not send "have" when making such fetches.) To resolve this, prefetch all missing REF_DELTA bases before attempting to resolve them. This both ensures that all bases are attempted to be fetched, and ensures that we make only one request per index-pack invocation, and not one request per missing object. Signed-off-by: Jonathan Tan Signed-off-by: Jeff King Tested-by: Jonathan Nieder --- builtin/index-pack.c | 26 +++++++++++++++-- t/t5616-partial-clone.sh | 61 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 2 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index ccf4eb7e9b..0d55f73b0b 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -14,6 +14,7 @@ #include "thread-utils.h" #include "packfile.h" #include "object-store.h" +#include "fetch-object.h" static const char index_pack_usage[] = "git index-pack [-v] [-o ] [--keep | --keep=] [--verify] [--strict] ( | --stdin [--fix-thin] [])"; @@ -1351,6 +1352,25 @@ static void fix_unresolved_deltas(struct hashfile *f) sorted_by_pos[i] = &ref_deltas[i]; QSORT(sorted_by_pos, nr_ref_deltas, delta_pos_compare); + if (repository_format_partial_clone) { + /* + * Prefetch the delta bases. + */ + struct oid_array to_fetch = OID_ARRAY_INIT; + for (i = 0; i < nr_ref_deltas; i++) { + struct ref_delta_entry *d = sorted_by_pos[i]; + if (!oid_object_info_extended(the_repository, &d->oid, + NULL, + OBJECT_INFO_FOR_PREFETCH)) + continue; + oid_array_append(&to_fetch, &d->oid); + } + if (to_fetch.nr) + fetch_objects(repository_format_partial_clone, + to_fetch.oid, to_fetch.nr); + oid_array_clear(&to_fetch); + } + for (i = 0; i < nr_ref_deltas; i++) { struct ref_delta_entry *d = sorted_by_pos[i]; enum object_type type; @@ -1650,8 +1670,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) int report_end_of_input = 0; /* - * index-pack never needs to fetch missing objects, since it only - * accesses the repo to do hash collision checks + * index-pack never needs to fetch missing objects except when + * REF_DELTA bases are missing (which are explicitly handled). It only + * accesses the repo to do hash collision checks and to check which + * REF_DELTA bases need to be fetched. */ fetch_if_missing = 0; diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh index 7cc0c71556..f1baf83502 100755 --- a/t/t5616-partial-clone.sh +++ b/t/t5616-partial-clone.sh @@ -339,4 +339,65 @@ test_expect_success 'when partial cloning, tolerate server not sending target of ! test -e "$HTTPD_ROOT_PATH/one-time-sed" ' +test_expect_success 'tolerate server sending REF_DELTA against missing promisor objects' ' + SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" && + rm -rf "$SERVER" repo && + test_create_repo "$SERVER" && + test_config -C "$SERVER" uploadpack.allowfilter 1 && + test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 && + + # Create a commit with a blob to be used as a delta base. + for i in $(test_seq 10) + do + echo "this is a line" >>"$SERVER/foo.txt" + done && + git -C "$SERVER" add foo.txt && + git -C "$SERVER" commit -m bar && + git -C "$SERVER" rev-parse HEAD:foo.txt >deltabase && + + git -c protocol.version=2 clone --no-checkout \ + --filter=blob:none $HTTPD_URL/one_time_sed/server repo && + + # Sanity check to ensure that the client does not have that blob. + git -C repo rev-list --objects --exclude-promisor-objects \ + -- $(cat deltabase) >objlist && + test_line_count = 0 objlist && + + # Another commit. This commit will be fetched by the client. + echo "abcdefghijklmnopqrstuvwxyz" >>"$SERVER/foo.txt" && + git -C "$SERVER" add foo.txt && + git -C "$SERVER" commit -m baz && + + # Pack a thin pack containing, among other things, HEAD:foo.txt + # delta-ed against HEAD^:foo.txt. + printf "%s\n--not\n%s\n" \ + $(git -C "$SERVER" rev-parse HEAD) \ + $(git -C "$SERVER" rev-parse HEAD^) | + git -C "$SERVER" pack-objects --thin --stdout >thin.pack && + + # Ensure that the pack contains one delta against HEAD^:foo.txt. Since + # the delta contains at least 26 novel characters, the size cannot be + # contained in 4 bits, so the object header will take up 2 bytes. The + # most significant nybble of the first byte is 0b1111 (0b1 to indicate + # that the header continues, and 0b111 to indicate REF_DELTA), followed + # by any 3 nybbles, then the OID of the delta base. + git -C "$SERVER" rev-parse HEAD^:foo.txt >deltabase && + printf "f.,..%s" $(intersperse "," want && + hex_unpack have && + grep $(cat want) have && + + replace_packfile thin.pack && + + # Use protocol v2 because the sed command looks for the "packfile" + # section header. + test_config -C "$SERVER" protocol.version 2 && + + # Fetch the thin pack and ensure that index-pack is able to handle the + # REF_DELTA object with a missing promisor delta base. + git -C repo -c protocol.version=2 fetch && + + # Ensure that the one-time-sed script was used. + ! test -e "$HTTPD_ROOT_PATH/one-time-sed" +' + test_done