From patchwork Wed Nov 25 17:17:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11931913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A44D9C56201 for ; Wed, 25 Nov 2020 17:17:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 49938206D9 for ; Wed, 25 Nov 2020 17:17:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="MxO8KhIy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730875AbgKYRRd (ORCPT ); Wed, 25 Nov 2020 12:17:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730653AbgKYRRc (ORCPT ); Wed, 25 Nov 2020 12:17:32 -0500 Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com [IPv6:2607:f8b0:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 457FCC0613D4 for ; Wed, 25 Nov 2020 09:17:32 -0800 (PST) Received: by mail-pf1-x429.google.com with SMTP id b63so2921294pfg.12 for ; Wed, 25 Nov 2020 09:17:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=kSUzz6oZiVkN7bwi7Iy4qoKpkmDgc8biv7EHgaXpvgY=; b=MxO8KhIyY8nfjbojRDfTjuRH2fPFqueJ+nfGjzrBMJP/T8er+2FjSTvk6DuD6KwSWm kF+k4X8nMQx7Bfd8VPqpM/YyqaCoSgXntM+31P9XF8qBhuh4CfQgGROgs6Hxlhf9TmiW xcgPvIiv8QBWOnhY9u4GzIISkrvvfPuEH88mzlpHf7vVGqK28iv2lWSxn1Uidn6XWBdf JClw5Qv2uH6MnhB1mSXXBUik0OuArEetI1ds8Ws7MKdpc+4p4xr92A5FAWuRvvvGQQ33 304TyC6Kyi99aaOjnoZ81wdHNsQpT/n5fLFFGi4kDg+1A+kClUb2cSLAdXubC3rRm7wB ghFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=kSUzz6oZiVkN7bwi7Iy4qoKpkmDgc8biv7EHgaXpvgY=; b=uN3wqoY+WfbcbvWMxkREqySIVpmArJhhguO3c4WnRztXJ9MWIxnQgJie+WjBeuBmlr bSzSNflpzjtrk+mcZmcFsTEBurkOPNMOzlT6d0A5NFbrDtM/L96aD9Ma41tk7dIJKVy0 7ruF13gUd9ilyTfkOnOamYCUpc8bRx90HQ9C1Jz5fcDP0eWHHW1GuuYQIiFVFRSi4YrI i7Ux9S55xPwADNPvBaT5c/fjr73zPebzvmH4tSZpzOxzEav/zY0GgnuYPa/YNS7GLWbN n1zk+ZlnO609U/ACJJK8JnvujXE05bZ4tnvZgwb+E00SneWnxtNi+00MWpq4FJroGVsJ hDhg== X-Gm-Message-State: AOAM530tKdxUjJjRM/Tu3dNpIt08O0VDfIpl3SCIN/sKJc0eqsb82teP VWiIWEbuDW15yQIy2EkvYXZWC78I2uLIpWjF X-Google-Smtp-Source: ABdhPJz/wQvHCV6vTRrH7BXj6kBV2tseNimOu4qi6DFdTa3dewaVGiAdxezlj49Ykkt6xkFrhbXxIA== X-Received: by 2002:a05:6a00:13a3:b029:18b:d5d2:196 with SMTP id t35-20020a056a0013a3b029018bd5d20196mr635689pfg.62.1606324651429; Wed, 25 Nov 2020 09:17:31 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id p127sm2553895pfp.93.2020.11.25.09.17.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Nov 2020 09:17:30 -0800 (PST) Date: Wed, 25 Nov 2020 12:17:28 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, peff@peff.net Subject: [PATCH 1/2] packfile.c: protect against disappearing indexes Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In 17c35c8969 (packfile: skip loading index if in multi-pack-index, 2018-07-12) we stopped loading the .idx file for packs that are contained within a multi-pack index. This saves us the effort of loading an .idx and doing some lightweight validity checks by way of 'packfile.c:load_idx()', but introduces a race between processes that need to load the index (e.g., to generate a reverse index) and processes that can delete the index. For example, running the following in your shell: $ git init repo && cd repo $ git commit --allow-empty -m 'base' $ git repack -ad && git multi-pack-index write followed by: $ rm -f .git/objects/pack/pack-*.idx $ git rev-parse HEAD | git cat-file --batch-check='%(objectsize:disk)' will result in a segfault prior to this patch. What's happening here is that we notice that the pack is in the multi-pack index, and so don't check that it still has a .idx. When we then try and load that index to generate a reverse index, we don't have it, so the call to 'find_pack_revindex()' in 'packfile.c:packed_object_info()' returns NULL, and then dereferencing it causes a segfault. Of course, we don't ever expect someone to remove the index file by hand, or to be in a state where we never wrote it to begin with (yet find that pack in the multi-pack-index). But, this can happen in a timing race with 'git repack -ad', which removes all existing packs after writing a new pack containing all of their objects. Avoid this by reverting the hunk of 17c35c8969 which stops loading the index when the pack is contained in a MIDX. This makes the latter half of 17c35c8969 useless, since we'll always have a non-NULL 'p->index_data', in which case that if statement isn't guarding anything. These two together effectively revert 17c35c8969, and avoid the race explained above. Co-authored-by: Jeff King Signed-off-by: Taylor Blau --- packfile.c | 19 ++----------------- t/t5319-multi-pack-index.sh | 24 ++++++++++++++++++++++-- 2 files changed, 24 insertions(+), 19 deletions(-) diff --git a/packfile.c b/packfile.c index 0929ebe4fc..8d7f37a5f6 100644 --- a/packfile.c +++ b/packfile.c @@ -514,19 +514,8 @@ static int open_packed_git_1(struct packed_git *p) ssize_t read_result; const unsigned hashsz = the_hash_algo->rawsz; - if (!p->index_data) { - struct multi_pack_index *m; - const char *pack_name = pack_basename(p); - - for (m = the_repository->objects->multi_pack_index; - m; m = m->next) { - if (midx_contains_pack(m, pack_name)) - break; - } - - if (!m && open_pack_index(p)) - return error("packfile %s index unavailable", p->pack_name); - } + if (open_pack_index(p)) + return error("packfile %s index unavailable", p->pack_name); if (!pack_max_fds) { unsigned int max_fds = get_max_fd_limit(); @@ -567,10 +556,6 @@ static int open_packed_git_1(struct packed_git *p) " supported (try upgrading GIT to a newer version)", p->pack_name, ntohl(hdr.hdr_version)); - /* Skip index checking if in multi-pack-index */ - if (!p->index_data) - return 0; - /* Verify the pack matches its index. */ if (p->num_objects != ntohl(hdr.hdr_entries)) return error("packfile %s claims to have %"PRIu32" objects" diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index ace469c95c..d4607daec1 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -138,7 +138,7 @@ test_expect_success 'write midx with one v2 pack' ' compare_results_with_midx "one v2 pack" -test_expect_success 'corrupt idx not opened' ' +test_expect_success 'corrupt idx reports errors' ' idx=$(test-tool read-midx $objdir | grep "\.idx\$") && mv $objdir/pack/$idx backup-$idx && test_when_finished "mv backup-\$idx \$objdir/pack/\$idx" && @@ -149,7 +149,7 @@ test_expect_success 'corrupt idx not opened' ' test_copy_bytes 1064 $objdir/pack/$idx && git -c core.multiPackIndex=true rev-list --objects --all 2>err && - test_must_be_empty err + grep "index unavailable" err ' test_expect_success 'add more objects' ' @@ -755,4 +755,24 @@ test_expect_success 'repack --batch-size= repacks everything' ' ) ' +test_expect_success 'load reverse index when missing .idx' ' + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + git config core.multiPackIndex true && + + test_commit base && + git repack -ad && + git multi-pack-index write && + + git rev-parse HEAD >tip && + idx=$(ls .git/objects/pack/pack-*.idx) && + + mv $idx $idx.bak && + git cat-file --batch-check="%(objectsize:disk)" X-Patchwork-Id: 11931915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5D6D7C56202 for ; Wed, 25 Nov 2020 17:17:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 041E7206D9 for ; Wed, 25 Nov 2020 17:17:37 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="egUXoa/a" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731064AbgKYRRh (ORCPT ); Wed, 25 Nov 2020 12:17:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730653AbgKYRRg (ORCPT ); Wed, 25 Nov 2020 12:17:36 -0500 Received: from mail-pf1-x441.google.com (mail-pf1-x441.google.com [IPv6:2607:f8b0:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCF96C0613D4 for ; Wed, 25 Nov 2020 09:17:36 -0800 (PST) Received: by mail-pf1-x441.google.com with SMTP id b63so2921498pfg.12 for ; Wed, 25 Nov 2020 09:17:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=H1bZ0lj9xGSJTWkocKn/mshKb+Tw72RucmICD7z5Ruc=; b=egUXoa/ayDW/6+J66xsalL4ULH7KhAJ27HL9s8RYwvlaI8110/FgEn6WjJ8GsjZqr6 TRsdIOmNtLSY0oxfWCeC3FkgrQatf+rHtsypBSXxXXLsfm0Nw3JqeZd+o2kvla2YUcPd NRspZ++P4FnGvvqDE5xiCaxee/YYORtX4RPp5Z2HQYa+C0LStWP9nDzAK6qywNbD8R5F YRagXzEzaBgPXCQ9IQ8mdsXkYxTtD9ctSkDaiGZozTHgyj4tOX4QhLkF2QCofaa2JHzy nF7O3qTHyULwRc90TJ4Pl4htdzxoH2keLXQc4AX1tAqOIoyetdPTHuslMB4oXkmTGRIU Wdtw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=H1bZ0lj9xGSJTWkocKn/mshKb+Tw72RucmICD7z5Ruc=; b=E5zJSNvn73w3JDrs1PaYXg3NnxWnhbmQin6JyAN++cpxCdLZ2GKAfRzpavGQ39tDxe 5hWvr7YTloEOtsdbrx3G9jfeKTIle2Kvy8f89yVIZ66XMX4RdfHfoSa1UDU86HA1d52k pqoCPpvFWPOQUkwGjIXBmpVz/37E+QpwsTQdt3yflPk1A5oR1iHyz793ypMdCKR05f/C 0dS/tXGElWeuAToejIL1ErsQEbfU4fui8Cjj81uiG4OIOzRR6AwW12HxmSvBWAtIZqGc nY9V+TNDiyt8sy6Hmf2+rqv5XULNxP5Hr//FebOw/HKnFtD0vgCHyZm5eSUf312Rhz7Z YcTg== X-Gm-Message-State: AOAM5300iiu2KrktMwv9ArX+d3h6WFNDyEAJrdlN9eoOuvsRqwGfnd57 bxKRLPxoNlYgWrmFOStAo/NSxCBLRZrGSKW2 X-Google-Smtp-Source: ABdhPJw3zFazljIgTEpaU3KAbSm2zg2JMVNASU3ZWOmap0ybxVL1uhyHSDlJzCZvYfVZKwJd7P56Jg== X-Received: by 2002:a62:1896:0:b029:197:491c:be38 with SMTP id 144-20020a6218960000b0290197491cbe38mr3985394pfy.15.1606324656064; Wed, 25 Nov 2020 09:17:36 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id m18sm2618374pff.144.2020.11.25.09.17.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Nov 2020 09:17:35 -0800 (PST) Date: Wed, 25 Nov 2020 12:17:33 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, peff@peff.net Subject: [PATCH 2/2] midx.c: protect against disappearing packs Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When a packed object is stored in a multi-pack index, but that pack has racily gone away, the MIDX code simply calls die(), when it could be returning an error to the caller, which would in turn lead to re-scanning the pack directory. A pack can racily disappear, for example, due to a simultaneous 'git repack -ad', You can also reproduce this with two terminals, where one is running: git init while true; do git commit -q --allow-empty -m foo git repack -ad git multi-pack-index write done (in effect, constantly writing new MIDXs), and the other is running: obj=$(git rev-parse HEAD) while true; do echo $obj | git cat-file --batch-check='%(objectsize:disk)' || break done That will sometimes hit the error preparing packfile from multi-pack-index message, which this patch fixes. Right now, that path to discovering a missing pack looks something like 'find_pack_entry()' calling 'fill_midx_entry()' and eventually making its way to call 'nth_midxed_pack_entry()'. 'nth_midxed_pack_entry()' already checks 'is_pack_valid()' and propagates an error if the pack is invalid. So, this works if the pack has gone away between calling 'prepare_midx_pack()' and before calling 'is_pack_valid()', but not if it disappears before then. Catch the case where the pack has already disappeared before 'prepare_midx_pack()' by returning an error in that case, too. Co-authored-by: Jeff King Signed-off-by: Taylor Blau --- midx.c | 2 +- t/t5319-multi-pack-index.sh | 8 +++++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/midx.c b/midx.c index d233b54ac7..1d2179a61f 100644 --- a/midx.c +++ b/midx.c @@ -298,7 +298,7 @@ static int nth_midxed_pack_entry(struct repository *r, pack_int_id = nth_midxed_pack_int_id(m, pos); if (prepare_midx_pack(r, m, pack_int_id)) - die(_("error preparing packfile from multi-pack-index")); + return 0; p = m->packs[pack_int_id]; /* diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index d4607daec1..297de502a9 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -755,7 +755,7 @@ test_expect_success 'repack --batch-size= repacks everything' ' ) ' -test_expect_success 'load reverse index when missing .idx' ' +test_expect_success 'load reverse index when missing .idx, .pack' ' git init repo && test_when_finished "rm -fr repo" && ( @@ -768,9 +768,15 @@ test_expect_success 'load reverse index when missing .idx' ' git multi-pack-index write && git rev-parse HEAD >tip && + pack=$(ls .git/objects/pack/pack-*.pack) && idx=$(ls .git/objects/pack/pack-*.idx) && mv $idx $idx.bak && + git cat-file --batch-check="%(objectsize:disk)"