X-Patchwork-Submitter: Geert Jansen
X-Patchwork-Id: 10656397
From: "Jansen, Geert"
To: "git@vger.kernel.org"
Date: Thu, 25 Oct 2018 18:38:09 +0000
Subject: [RFC PATCH] index-pack: improve performance on NFS

The index-pack command determines whether a SHA-1 collision test is needed by checking whether a loose object with the given hash already exists.
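For reference, here is a minimal sketch of such an existence check, assuming Git's standard loose-object fan-out layout, in which an object named `aabbcc…` lives at `objects/aa/bbcc…`. The helpers `loose_object_path()` and `loose_object_exists()` are hypothetical and are not Git's code (the patch shows that index-pack actually calls `has_sha1_file_with_flags()` with `OBJECT_INFO_QUICK`); the sketch only shows why each check ends in one stat() on a path under `objects/XX`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Length of a SHA-1 object name in hexadecimal. */
#define HEX_SZ 40

/*
 * Hypothetical helper: build the loose-object path for a 40-character
 * hex object name using the fan-out layout objdir/XX/<remaining 38>.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int loose_object_path(char *buf, size_t len,
			     const char *objdir, const char *hex)
{
	int n;

	assert(strlen(hex) == HEX_SZ);
	n = snprintf(buf, len, "%s/%.2s/%s", objdir, hex, hex + 2);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: returns 1 if the loose object exists, 0 otherwise.
 * If the fan-out directory objdir/XX does not exist yet (as in a fresh
 * clone), path resolution fails at that component; on NFS this is the
 * LOOKUP of "XX" in the objects directory that returns ENOENT.
 */
static int loose_object_exists(const char *objdir, const char *hex)
{
	char path[4096];
	struct stat st;

	if (loose_object_path(path, sizeof(path), objdir, hex) < 0)
		return 0;
	return stat(path, &st) == 0;
}
```

For example, `loose_object_exists(".git/objects", "0123…")` stats `.git/objects/01/23…`, and in an empty repository it always returns 0.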
In my tests, skipping this check improves the performance of “git clone” on Amazon EFS by 8x when the file system is mounted with a non-default option (lookupcache=pos) that is required for a GitLab HA setup. My assumption is that the check is unnecessary when cloning into a new repository, because that repository is empty.

By default, the Linux NFS client caches directory entries as well as the non-existence of directory entries. The latter means that when client c1 calls stat() on a file that does not exist, the non-existence is cached, and any subsequent stat() of that file returns -ENOENT until the cache entry expires or is invalidated, even if client c2 has created the file in the meantime. This leads to errors in a GitLab HA setup, which distributes jobs over multiple worker nodes and assumes that every node has the same view of the shared file system.

GitLab's recommended workaround is the “lookupcache=pos” NFS mount option, which disables the negative lookup cache. This option has a large performance impact. Cloning the gitlab-ce repository (https://gitlab.com/gitlab-org/gitlab-ce.git) into an NFS-mounted directory gives the following results:

    lookupcache=all (default): 624 seconds
    lookupcache=pos:           4957 seconds

The poor performance occurs because index-pack issues a stat() call for every object in the repository when checking whether a collision test is needed. Each of these stat() calls results in the following NFS operation:

    LOOKUP dirfh=".git/objects", name="01" -> NFS4ERR_ENOENT

With lookupcache=all, the non-existence of the .git/objects/XX directories is cached, so there are at most 256 LOOKUP calls (one per two-hex-digit fan-out directory). With lookupcache=pos, there is one LOOKUP per object in the repository, which for the gitlab-ce repository is about 1.3 million.

The attached patch removes the collision check when cloning into a new repository.
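The 256-versus-1.3-million LOOKUP comparison can be illustrated with a toy model. It assumes an empty .git/objects directory (as in a fresh clone, where index-pack writes a pack rather than loose objects), that every existence check misses at the objects/XX fan-out directory, and that a cached negative entry never expires during the clone. The names `count_lookups`, `hexval`, `enum lookup_policy`, `CACHE_ALL` and `CACHE_POS` are hypothetical, and the model ignores the real Linux NFS client's cache timeouts and invalidation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache policies mirroring the two NFS mount options. */
enum lookup_policy {
	CACHE_ALL, /* lookupcache=all: positive and negative entries cached */
	CACHE_POS  /* lookupcache=pos: only positive entries cached */
};

/* Convert one lowercase hex digit to its value, or -1 if invalid. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Count LOOKUP RPCs issued when checking n object names against an
 * empty objects directory. Every check misses on "objects/XX", so the
 * only question is whether that miss is already in the negative cache.
 */
static size_t count_lookups(const char *const *hexes, size_t n,
			    enum lookup_policy policy)
{
	unsigned char neg_cached[256];
	size_t i, rpcs = 0;

	memset(neg_cached, 0, sizeof(neg_cached));
	for (i = 0; i < n; i++) {
		int hi = hexval(hexes[i][0]), lo = hexval(hexes[i][1]);
		int dir;

		assert(hi >= 0 && lo >= 0);
		dir = hi * 16 + lo; /* fan-out directory index, 0..255 */
		if (policy == CACHE_ALL && neg_cached[dir])
			continue; /* answered from the negative cache */
		rpcs++; /* LOOKUP dirfh=".git/objects", name="XX" -> ENOENT */
		if (policy == CACHE_ALL)
			neg_cached[dir] = 1;
	}
	return rpcs;
}
```

Under CACHE_ALL the count can never exceed 256, whereas under CACHE_POS it equals the number of objects checked, which matches the roughly 1.3 million LOOKUPs described above for gitlab-ce.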
The performance of “git clone” with this patch is:

    lookupcache=pos (with patch): 577 seconds

I'd welcome feedback on the attached patch, and in particular on whether my assumption is correct that the SHA-1 collision check can be safely omitted when cloning into a new repository.

Signed-off-by: Geert Jansen
---
 builtin/index-pack.c | 5 ++++-
 fetch-pack.c         | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2004e25da..22b3d40fb 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -84,6 +84,7 @@ static int verbose;
 static int show_resolving_progress;
 static int show_stat;
 static int check_self_contained_and_connected;
+static int cloning;
 
 static struct progress *progress;
 
@@ -794,7 +795,7 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 
 	assert(data || obj_entry);
 
-	if (startup_info->have_repository) {
+	if (startup_info->have_repository && !cloning) {
 		read_lock();
 		collision_test_needed =
 			has_sha1_file_with_flags(oid->hash, OBJECT_INFO_QUICK);
@@ -1705,6 +1706,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 				check_self_contained_and_connected = 1;
 			} else if (!strcmp(arg, "--fsck-objects")) {
 				do_fsck_object = 1;
+			} else if (!strcmp(arg, "--cloning")) {
+				cloning = 1;
 			} else if (!strcmp(arg, "--verify")) {
 				verify = 1;
 			} else if (!strcmp(arg, "--verify-stat")) {
diff --git a/fetch-pack.c b/fetch-pack.c
index b3ed7121b..c75bfb8aa 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -843,6 +843,8 @@ static int get_pack(struct fetch_pack_args *args,
 			argv_array_push(&cmd.args, "--check-self-contained-and-connected");
 		if (args->from_promisor)
 			argv_array_push(&cmd.args, "--promisor");
+		if (args->cloning)
+			argv_array_pushf(&cmd.args, "--cloning");
 	} else {
 		cmd_name = "unpack-objects";