From patchwork Tue May 24 18:54:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12860411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A064C433F5 for ; Tue, 24 May 2022 18:54:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240495AbiEXSyb (ORCPT ); Tue, 24 May 2022 14:54:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45062 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240491AbiEXSy1 (ORCPT ); Tue, 24 May 2022 14:54:27 -0400 Received: from mail-oa1-x2c.google.com (mail-oa1-x2c.google.com [IPv6:2001:4860:4864:20::2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D06E5A168 for ; Tue, 24 May 2022 11:54:26 -0700 (PDT) Received: by mail-oa1-x2c.google.com with SMTP id 586e51a60fabf-f17f1acffeso23381884fac.4 for ; Tue, 24 May 2022 11:54:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=mQp8NliBj5JpIDXrjp5fkKh0Z7L6r2JU92UGn9OVt6s=; b=ydmyN3x3ORauPTvQ9WbOd9si5JrFwWwPXdWvVJgojRW9pJZIGXAMWBUUIKbaDDYsSD diisAWkxojLBKEfyCIAkICy3GEACdACR7OqUpMiGoIZqTaJKiFqEcBtlP4CEeWhFq94a 2Wlzn7h/8ASwgw5qkKIY1cQNnf9OjVkep+nCe4KqR8Y3X8QhmnB3QScvA+qNBf1Q6sgP yDTPwUffqTldkLa9bpmAA31vaPIP/otfCh5OAYRoLaKieQRl5QEHVbTu8cS14GbDj2qg MhF2DUTFAO09LVgCqOYN8BEgsRhknZLuk3yFjACtHdBmXAs20VkXLsHj2YZ95pdIYorB UJeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=mQp8NliBj5JpIDXrjp5fkKh0Z7L6r2JU92UGn9OVt6s=; b=TF5u0AiECkX0VNnl2yHelx35Xx3s7Al5XYsgpnaa2vHsQA3Yg53Z5dIhZQpxprKuW/ BljsMpRro7S2iMuZ25RE2p3xQmB4JVwCbCf8YCNMNLZQfdF0z1YwRCF8HeUpFedaG3DZ EVVYCAx9Nixmtx34uSLEuK186gXXh57Fu9I9un8mKhLaDljiykORaYVBKG6IREcsQemY kDsHo5Jw77fWp1as1fRXejTfoA/+jE+HZDPmsHmg3svkEzfLGuUy3FnaPP+DASo0LjiW dVXd9njLeY6Vmlg8e07e6SbFBXxNXU5dc7IZXdE14OWza4SxuCFl/VeaNzjPNieTFNVI mDTA== X-Gm-Message-State: AOAM530mHUxgDpdZ+IPXTSgbtUpc/za2eYDEC9BNflMl0nQz+32qGcJh 6EsRVB0eQo6N8vY7znrbp+3A1TTypUWvhw== X-Google-Smtp-Source: ABdhPJzryHNzWBuMS5hAx1Gs56Bec1rTNKulJAB+uQ+MbAfHyu9xpT4EOrBoRkIMP4fuqDfwOL78RA== X-Received: by 2002:a05:6870:b14a:b0:f2:3221:be57 with SMTP id a10-20020a056870b14a00b000f23221be57mr3343092oal.10.1653418465394; Tue, 24 May 2022 11:54:25 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id s7-20020a05687087c700b000f2911d1987sm1396028oam.39.2022.05.24.11.54.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 May 2022 11:54:24 -0700 (PDT) Date: Tue, 24 May 2022 14:54:22 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: vdye@github.com, jonathantanmy@google.com, gitster@pobox.com Subject: [PATCH v2 1/4] pack-bitmap.c: check preferred pack validity when opening MIDX bitmap Message-ID: <618e8a6166473d238e62ce6243d9a0b2b72ee2f0.1653418457.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When pack-objects adds an entry to its packing list, it marks the packfile and offset containing the object, which we may later use during verbatim reuse (c.f., `write_reused_pack_verbatim()`). If the packfile in question is deleted in the background (e.g., due to a concurrent `git repack`), we'll die() as a result of calling use_pack(), unless we have an open file descriptor on the pack itself. 4c08018204 (pack-objects: protect against disappearing packs, 2011-10-14) worked around this by opening the pack ahead of time before recording it as a valid source for reuse. 4c08018204's treatment meant that we could tolerate disappearing packs, since it ensures we always have an open file descriptor on any pack that we mark as a valid source for reuse. This tightens the race to only happen when we need to close an open pack's file descriptor (c.f., the caller of `packfile.c::get_max_fd_limit()`) _and_ that pack was deleted, in which case we'll complain that a pack could not be accessed and die(). The pack bitmap code does this, too, since prior to dc1daacdcc (pack-bitmap: check pack validity when opening bitmap, 2021-07-23) it was vulnerable to the same race. The MIDX bitmap code does not do this, and is vulnerable to the same race. Apply the same treatment as dc1daacdcc to the routine responsible for opening the multi-pack bitmap's preferred pack to close this race. This patch handles the "preferred" pack (c.f., the section "multi-pack-index reverse indexes" in Documentation/technical/pack-format.txt) specially, since pack-objects depends on reusing exact chunks of that pack verbatim in reuse_partial_packfile_from_bitmap(). So if that pack cannot be loaded, the utility of a bitmap is significantly diminished. Similar to dc1daacdcc, we could technically just add this check in reuse_partial_packfile_from_bitmap(), since it's possible to use a MIDX .bitmap without needing to open any of its packs. But it's simpler to do the check as early as possible, covering all direct uses of the preferred pack. Note that doing this check early requires us to call prepare_midx_pack() early, too, so move the relevant part of that loop from load_reverse_index() into open_midx_bitmap_1(). Subsequent patches handle the non-preferred packs in a slightly different fashion. Signed-off-by: Taylor Blau --- pack-bitmap.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 97909d48da..d607918407 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -315,6 +315,8 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, struct stat st; char *idx_name = midx_bitmap_filename(midx); int fd = git_open(idx_name); + uint32_t i; + struct packed_git *preferred; free(idx_name); @@ -353,6 +355,20 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git, warning(_("multi-pack bitmap is missing required reverse index")); goto cleanup; } + + for (i = 0; i < bitmap_git->midx->num_packs; i++) { + if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) + die(_("could not open pack %s"), + bitmap_git->midx->pack_names[i]); + } + + preferred = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)]; + if (!is_pack_valid(preferred)) { + warning(_("preferred pack (%s) is invalid"), + preferred->pack_name); + goto cleanup; + } + return 0; cleanup: @@ -429,8 +445,6 @@ static int load_reverse_index(struct bitmap_index *bitmap_git) * since we will need to make use of them in pack-objects. */ for (i = 0; i < bitmap_git->midx->num_packs; i++) { - if (prepare_midx_pack(the_repository, bitmap_git->midx, i)) - die(_("load_reverse_index: could not open pack")); ret = load_pack_revindex(bitmap_git->midx->packs[i]); if (ret) return ret; From patchwork Tue May 24 18:54:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12860412 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 667DAC433F5 for ; Tue, 24 May 2022 18:54:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235226AbiEXSyk (ORCPT ); Tue, 24 May 2022 14:54:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45150 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240498AbiEXSyc (ORCPT ); Tue, 24 May 2022 14:54:32 -0400 Received: from mail-oa1-x2f.google.com (mail-oa1-x2f.google.com [IPv6:2001:4860:4864:20::2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E07B55A17C for ; Tue, 24 May 2022 11:54:30 -0700 (PDT) Received: by mail-oa1-x2f.google.com with SMTP id 586e51a60fabf-e93bbb54f9so23345563fac.12 for ; Tue, 24 May 2022 11:54:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=D7LV9vFMN1YJTuI2n9SEbJ5iJsxmMil2bdChPnPBV94=; b=plOK/KXSBpXUrUukKzdk5/9fcQMGLyd+MVb9m973UnVpwYhP8Q8Iv/PpbBqIfRwyvW LZOXInR6046rpShXZWRCfW/6HXY1W/0o5i9W+o3LIV33n4RwxTAOeyDUoX/oLo0WsFbC BRHuM5nzUJ2x03kCd/IMpgNNUz6fzBCrNvYlaaFy/V5DqyGX7m8FsNL3FPEBxVusSCPo K0Zl3TTNpjGeo/Pm10ldyBXLqRzASaqVeFsMhAvKb3wuQUwI6pFC2L/A1mXgKZSmb32t 69HUsFDYxRR+yXttK5izxbViAhu1Uko7E6yG+YVNrX5zN6r03XYpVWB/vklKvoCEv6L2 Vygg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=D7LV9vFMN1YJTuI2n9SEbJ5iJsxmMil2bdChPnPBV94=; b=W1n0V6x5HR5VFGNI3tnpXNMyh4kRgnqAvBtuNDTdee9UwZa+Ji/KIn/qJ2i6H+ompI g8P0Iz/bKFCb8hSN0xjRjV/yT8oqJSDt/FQO+Yz2eCW0MW6aD7SMbGogDJou2YYhyqTk iXVbNNn2Gz2yFJCBQEEmVcakYiUwKgnPruFfAIBob3E8LyU/aJf0nkA0lhwQeyJbWKY1 qXSkbDOgGf1/f0Ci9aQ4QigYUbVuuSsLAH9RBmNg1yHsyoB3uq/KQygzkJLU96KLA3oG 5j1o2XsUseF+XDrSqLSpIQsyAbRTfldROJ0A6JX3HJ/RNTljFqAegqHwG7/Ca4GvtfNS uf3Q== X-Gm-Message-State: AOAM531NbIlsZ22N6pVlMzBhepMCjfswF2XEuvdxLUj7L3Yghc6Y75O2 BVwodjwyjSC14xIEQmxhbgLoXq7c75U+KQ== X-Google-Smtp-Source: ABdhPJzoFZnpatUo0uSNBjW5BIz7nIe6JtlhHxjzXYFeK10XekPUc9NQliulNXF4SQLTHqoRN7Ydxg== X-Received: by 2002:a05:6870:fb94:b0:f1:92e7:5501 with SMTP id kv20-20020a056870fb9400b000f192e75501mr3345624oab.283.1653418470086; Tue, 24 May 2022 11:54:30 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id e2-20020a056870920200b000f2776dd2acsm2240061oaf.27.2022.05.24.11.54.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 May 2022 11:54:29 -0700 (PDT) Date: Tue, 24 May 2022 14:54:27 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: vdye@github.com, jonathantanmy@google.com, gitster@pobox.com Subject: [PATCH v2 2/4] builtin/pack-objects.c: avoid redundant NULL check Message-ID: <2719d33f328e03239cdb2f5cca2fef9a4e9cbb93.1653418457.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Before calling `for_each_object_in_pack()`, the caller `read_packs_list_from_stdin()` loops through each of the `include_packs` and checks that its `->util` pointer (which is used to store the `struct packed_git *` itself) is non-NULL. This check is redundant, because `read_packs_list_from_stdin()` already checks that the included packs are non-NULL earlier on in the same function (and it does not add any new entries in between). Remove this check, since it is not doing anything in the meantime. Co-authored-by: Victoria Dye Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 014dcd4bc9..ec3193fd95 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3369,8 +3369,6 @@ static void read_packs_list_from_stdin(void) for_each_string_list_item(item, &include_packs) { struct packed_git *p = item->util; - if (!p) - die(_("could not find pack '%s'"), item->string); for_each_object_in_pack(p, add_object_entry_from_pack, &revs, From patchwork Tue May 24 18:54:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12860413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CB9C7C433EF for ; Tue, 24 May 2022 18:54:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240502AbiEXSym (ORCPT ); Tue, 24 May 2022 14:54:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240509AbiEXSyh (ORCPT ); Tue, 24 May 2022 14:54:37 -0400 Received: from mail-oa1-x36.google.com (mail-oa1-x36.google.com [IPv6:2001:4860:4864:20::36]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E83C35A168 for ; Tue, 24 May 2022 11:54:34 -0700 (PDT) Received: by mail-oa1-x36.google.com with SMTP id 586e51a60fabf-f1d5464c48so23394187fac.6 for ; Tue, 24 May 2022 11:54:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=gugJpDy48BPcqmPFt4PTAbwM7LzN68z05teuPcY5Kco=; b=z71BigBDhoVVKXO20DoKulAeCN99ePcB9nUw6iFxJPIb5sLyQTH6D+c6iHC+VkxvfX t3P32UDAl7U7Pc6fUAizGivBEWkxzFmTqJQ/q4yOG42i2zEyF2lLmJfWDDx8YOrAAjOg S6U4FkacPC+AYOPJygFKT3Un/KyjmbjIPfghNkaTTRlujqOAIv+2rNo8J11gLA0BDpbV TxVJ+UATfzIR6C1HLJH/+9MtCWFXbVKo1P32N5DN1wHbz/pVwTHrQCeJsbggCGMVJbjS j5XXPt4y5Dxpu6NnMuB5t1XUNZ9g80SRkp+8fRu7CxjZgwfKMzhiWUHWDPMaUqaCKz5w fq5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=gugJpDy48BPcqmPFt4PTAbwM7LzN68z05teuPcY5Kco=; b=1AUQ7CLOyTUNcA+j39kuWbRRMm3HYU1Hhom2YBFjc45yzTiagNiJw1OEKQNu5+zxcC ZHWVYKCoBTTqqvkYUkCJXGI7bS6zDrjSbwioLfmtYMbnCrQM/G8B4CcBQF37k5OB/mFA ++84sxNBeiCdLmUWV+pk/cRm6eE8DnrDkxfIgdGbBgQ4CdwKkd0B6sISPr9jxkU/Nxe8 CnksM0lvwbyORNsAykv0+VQ3pefLrBZKv4DxU5p/lcqMKA5832hrp2DMtlKNbg15XYXq 2MHwDR360q9vjwt5Lir0UeYv+zg9Uam0+k753iCaLqhnEgvm+FJBEprDf87wdNK7+lr0 fuwg== X-Gm-Message-State: AOAM530Caxr3U/LWsjQtFan3xkYQsCFBfbucx6GcrVvSJ4ZsYcOabz6o DrMHfX367RzUuYk+0hT0QbOsBaFs+Sf37g== X-Google-Smtp-Source: ABdhPJzIWErygw4aEbdYKUPyKWXBP+46PM4T0ZGqnbAG8LVAMeZnM63q0R5Y1ZpHsrgW4QCshqnLDw== X-Received: by 2002:a05:6870:f225:b0:f1:fc2d:af76 with SMTP id t37-20020a056870f22500b000f1fc2daf76mr3534953oao.87.1653418474127; Tue, 24 May 2022 11:54:34 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id i186-20020acab8c3000000b00325cda1ffb9sm5475326oif.56.2022.05.24.11.54.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 May 2022 11:54:33 -0700 (PDT) Date: Tue, 24 May 2022 14:54:31 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: vdye@github.com, jonathantanmy@google.com, gitster@pobox.com Subject: [PATCH v2 3/4] builtin/pack-objects.c: ensure included `--stdin-packs` exist Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A subsequent patch will teach `want_object_in_pack()` to set its `*found_pack` and `*found_offset` poitners to NULL when the provided pack does not pass the `is_pack_valid()` check. The `--stdin-packs` mode of `pack-objects` is not quite prepared to handle this. To prepare it for this change, do the following two things: - Ensure provided packs pass the `is_pack_valid()` check when collecting the caller-provided packs into the "included" and "excluded" lists. - Gracefully handle any _invalid_ packs being passed to `want_object_in_pack()`. Calling `is_pack_valid()` early on makes it substantially less likely that we will have to deal with a pack going away, since we'll have an open file descriptor on its contents much earlier. But even packs with open descriptors can become invalid in the future if we (a) hit our open descriptor limit, forcing us to close some open packs, and (b) one of those just-closed packs has gone away in the meantime. `add_object_entry_from_pack()` depends on having a non-NULL `*found_pack`, since it passes that pointer to `packed_object_info()`, meaning that we would SEGV if the pointer became NULL (like we propose to do in `want_object_in_pack()` in the following patch). But avoiding calling `packed_object_info()` entirely is OK, too, since its only purpose is to identify which objects in the included packs are commits, so that they can form the tips of the advisory traversal used to discover the object namehashes. Failing to do this means that at worst we will produce lower-quality deltas, but it does not prevent us from generating the pack as long as we can find a copy of each object from the disappearing pack in some other part of the repository. Co-authored-by: Victoria Dye Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 35 ++++++++++++++++++++--------------- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index ec3193fd95..ffeaecd1d8 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3201,10 +3201,8 @@ static int add_object_entry_from_pack(const struct object_id *oid, uint32_t pos, void *_data) { - struct rev_info *revs = _data; - struct object_info oi = OBJECT_INFO_INIT; off_t ofs; - enum object_type type; + enum object_type type = OBJ_NONE; display_progress(progress_state, ++nr_seen); @@ -3215,20 +3213,25 @@ static int add_object_entry_from_pack(const struct object_id *oid, if (!want_object_in_pack(oid, 0, &p, &ofs)) return 0; - oi.typep = &type; - if (packed_object_info(the_repository, p, ofs, &oi) < 0) - die(_("could not get type of object %s in pack %s"), - oid_to_hex(oid), p->pack_name); - else if (type == OBJ_COMMIT) { - /* - * commits in included packs are used as starting points for the - * subsequent revision walk - */ - add_pending_oid(revs, NULL, oid, 0); + if (p) { + struct rev_info *revs = _data; + struct object_info oi = OBJECT_INFO_INIT; + + oi.typep = &type; + if (packed_object_info(the_repository, p, ofs, &oi) < 0) { + die(_("could not get type of object %s in pack %s"), + oid_to_hex(oid), p->pack_name); + } else if (type == OBJ_COMMIT) { + /* + * commits in included packs are used as starting points for the + * subsequent revision walk + */ + add_pending_oid(revs, NULL, oid, 0); + } + + stdin_packs_found_nr++; } - stdin_packs_found_nr++; - create_object_entry(oid, type, 0, 0, 0, p, ofs); return 0; @@ -3346,6 +3349,8 @@ static void read_packs_list_from_stdin(void) struct packed_git *p = item->util; if (!p) die(_("could not find pack '%s'"), item->string); + if (!is_pack_valid(p)) + die(_("packfile %s cannot be accessed"), p->pack_name); } /* From patchwork Tue May 24 18:54:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12860414 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4935AC433FE for ; Tue, 24 May 2022 18:54:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240506AbiEXSyo (ORCPT ); Tue, 24 May 2022 14:54:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45344 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240491AbiEXSyk (ORCPT ); Tue, 24 May 2022 14:54:40 -0400 Received: from mail-ot1-x32a.google.com (mail-ot1-x32a.google.com [IPv6:2607:f8b0:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA5F25A17C for ; Tue, 24 May 2022 11:54:39 -0700 (PDT) Received: by mail-ot1-x32a.google.com with SMTP id t14-20020a9d66ce000000b0060af9ed4b87so7612783otm.9 for ; Tue, 24 May 2022 11:54:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=E7ITLcB0vJfRa74a/TzQmn4Qk5yV5raJ2AmNs5hd+1w=; b=QVeFJen9AT57R2hvxuLFX9UNC317xSov5sXkfhJjUMFTqs33d5H26P5biADsCa6iPM 8ia4acaqudaf2BpYnAE86OBBbgJmRt9zH3DI15qQJ3FjEf2+YJ6YMpuZHzwZaBqfIbFe yeav4kUF7Wlkq2rQDbaFRI+0aywP/ln3cx8aSVF8rBoU4C6aXkrzCxGC+g45w74hYStR 2uVY9hFXZDkq7B8qjFQ6Uj2jsdhqEyKcRflRc575hv2iaU1qYdnkkB7f8qsfHHCLfuYo V62LXwiKWY/JyP0uQcJ+6TzEbKt6sadXz+ogdiN19EOcWaoWRVHRMjkzImwB3mQha3zS vzOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=E7ITLcB0vJfRa74a/TzQmn4Qk5yV5raJ2AmNs5hd+1w=; b=iqMktEQEWDxapt1x7x+/zsXmkp5XMBWrwBXUk8p3inphzWTs8sgRegIeIFcGZbZ3Jh nCELlaxY1XstIsNmGU6CFtNrCljc90QXLxS3JqAU3iu+NwNPh67YkHJJbCeY4SpvVd+m vtvRaVb+TaTrfm0fZIbAiUgDLuUMPTzghSl5I/dniGdSu6t2ys0tMRm0sVh2CUPYixnb n58HWRUvoH3epUuKoG7KS72cQfwNlDMQTQfakPxBBvtTVsG5dzNNAuqdiwV2Oyk3Dn4P H7lYfDUXmLFbkoSASVsWlby0EmSJbFuRJkA32Otn57NxjY9Xd48pHPwZDhOI2xMCiIv2 WTmw== X-Gm-Message-State: AOAM533hzs1v9KKvs97BlhYBlnjjeYZfEQvqGudfjIHtHhn2hqP6f2qq 5ebU0qhugmN6ndM6bgs461mwmTTaxeWThw== X-Google-Smtp-Source: ABdhPJx3oa7ClV3bNfmXp88ZJ3UCe4aCeCWKlgE6kIu1qrPE3aKXepJepxCePLJkeX93w0zefC30Sw== X-Received: by 2002:a9d:482:0:b0:5cd:57a1:8082 with SMTP id 2-20020a9d0482000000b005cd57a18082mr11337016otm.360.1653418478839; Tue, 24 May 2022 11:54:38 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id x25-20020a9d6d99000000b0060603221280sm5380085otp.80.2022.05.24.11.54.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 24 May 2022 11:54:38 -0700 (PDT) Date: Tue, 24 May 2022 14:54:36 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: vdye@github.com, jonathantanmy@google.com, gitster@pobox.com Subject: [PATCH v2 4/4] builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects Message-ID: <3fc3a95517558005b940b65d0e5357de50f81d98.1653418457.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When using a multi-pack bitmap, pack-objects will try to perform its traversal using a call to `traverse_bitmap_commit_list()`, which calls `add_object_entry_from_bitmap()` to add each object it finds to its packing list. This path can cause pack-objects to add objects from packs that don't have open pack_fds on them, by avoiding a call to `is_pack_valid()`. This is because we only call `is_pack_valid()` on the preferred pack (in order to do verbatim reuse via `reuse_partial_packfile_from_bitmap()`) and not others when loading a MIDX bitmap. In this case, `add_object_entry_from_bitmap()` will check whether it wants each object entry by calling `want_object_in_pack()`, which will call `want_found_object` (since its caller already supplied a `found_pack`). In most cases (particularly without `--local`, and when `ignored_packed_keep_on_disk` and `ignored_packed_keep_in_core` are both "0"), we'll take the entry from the pack contained in the MIDX bitmap, all without an open pack_fd. When we then try to use that entry later to assemble the actual pack, we'll be susceptible to any simultaneous writers moving that pack out of the way (e.g., due to a concurrent repack) without having an open file descriptor, causing races that result in errors like: remote: Enumerating objects: 1498802, done. remote: fatal: packfile ./objects/pack/pack-e57d433b5a588daa37fbe946e2b28dfaec03a93e.pack cannot be accessed remote: aborting due to possible repository corruption on the remote side. This race can happen even with multi-pack bitmaps, since we may open a MIDX bitmap that is being rewritten long before its packs are actually unlinked. Work around this by calling `is_pack_valid()` from within `want_found_object()`, matching the behavior in `want_object_in_pack_one()` (which has an analogous call). Most calls to `is_pack_valid()` should be basically no-ops, since only the first call requires us to open a file (subsequent calls realize the file is already open, and return immediately). Importantly, when `want_object_in_pack()` is given a non-NULL `*found_pack`, but `want_found_object()` rejects the copy of the object in that pack, we must reset `*found_pack` and `*found_offset` to NULL and 0, respectively. Failing to do so could lead to other checks in `want_object_in_pack()` (such as `want_object_in_pack_one()`) using the same (invalid) pack as `*found_pack`, meaning that we don't call `is_pack_valid()` because `p == *found_pack`. This can lead the caller to believe it can use a copy of an object from an invalid pack. An alternative approach to closing this race would have been to call `is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This has a couple of problems: - it is unnecessarily expensive in the cases where we don't actually need to open any packs (e.g., in `git rev-list --use-bitmap-index --count`) - more importantly, it means any time we would have hit this race, we'll avoid using bitmaps altogether, leading to significant slowdowns by forcing a full object traversal Co-authored-by: Victoria Dye Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index ffeaecd1d8..0a26de166d 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1357,6 +1357,9 @@ static int want_found_object(const struct object_id *oid, int exclude, if (incremental) return 0; + if (!is_pack_valid(p)) + return -1; + /* * When asked to do --local (do not include an object that appears in a * pack we borrow from elsewhere) or --honor-pack-keep (do not include @@ -1472,6 +1475,9 @@ static int want_object_in_pack(const struct object_id *oid, want = want_found_object(oid, exclude, *found_pack); if (want != -1) return want; + + *found_pack = NULL; + *found_offset = 0; } for (m = get_multi_pack_index(the_repository); m; m = m->next) {