From patchwork Mon Aug 22 19:50:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12951316 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A22CC28D13 for ; Mon, 22 Aug 2022 19:50:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237501AbiHVTuk (ORCPT ); Mon, 22 Aug 2022 15:50:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39400 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237368AbiHVTuf (ORCPT ); Mon, 22 Aug 2022 15:50:35 -0400 Received: from mail-qt1-x82d.google.com (mail-qt1-x82d.google.com [IPv6:2607:f8b0:4864:20::82d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCF933CBCD for ; Mon, 22 Aug 2022 12:50:34 -0700 (PDT) Received: by mail-qt1-x82d.google.com with SMTP id cr9so8749681qtb.13 for ; Mon, 22 Aug 2022 12:50:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=nZ4LwTwuNjIxq/I6FM2nNIZ0mIOMCp6EDYqdGvsUI7o=; b=PWkiLJEkq0Ibe+uyvtXT3uD4EyvuA07eORTACbosOmrddo8R2BXrCPeFZYx8Qye0yy aOYiY0OCo8ur9Kuyn2FRFPU+y4Enzvh183vLYS8VYmfvscUZRQzDikKeIrqBq/z8Nxz4 29jly2oQvWCcoXWVOuiVf39iog7buSJmxPEgSXdNI66q3lYam2A+DU71PrVvWiMHNh54 dGGVEI5Yn6yotlAXwEWv3JIue8Mb3jLOdbnHBHLdhDDWYnsUwmndjP/WCYmUmtbcNdHk vSAtIofOfFGuPc9FOc+EgkKKt0LrrztM3p3a3ZeKsnw16/gQZM1Of+wF4BmLaO8R6sI+ qK+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=nZ4LwTwuNjIxq/I6FM2nNIZ0mIOMCp6EDYqdGvsUI7o=; b=46zz4WrpF8vYW7Fe/ofL+sOV/8H2zCsPTAqOIhsB/ELfRph1B4ebNaVS61B07cClfi RmdLHH3bLfeKNM1+QgiNFC4C5d/ZBIXsIYS6ehlVBSS+Aahx5aCk3CmPaM+QY4m48Ci8 IfMp0p7fNBZ4sA13LP9G3yU0eo4z1DvDGoaYxSJT9N+7gdcXatO0pAOFDa2buDiQlhjQ BrS3OAevyGEfePH2hYkidaz/3yPqX0H3R/34j47giiUZCIKhmUGvje3HdYsiJUxuSh4W BRXo/PDVb6yWhrgQRWTqLNPnw2U8bTKeym3emrvxwXmoFnoz0jWqVI8axQZQpArnpiNy FhIA== X-Gm-Message-State: ACgBeo0vUGQN3rWGxGnWRXl52YcPsPg5HAqHrpQ+BkUpJIPUg5Tq0jhP S5GYpcw8iSO3FnfBvmoeIZ5aPiAU9LPoY/Ru X-Google-Smtp-Source: AA6agR5not16ABYfBmvCxeVqy7L0VekhOJ/j3fotf8s8tSVM6JdWiA8+W+4tUF9A0ypq+f6EFIbqZg== X-Received: by 2002:ac8:7f53:0:b0:343:652:ce62 with SMTP id g19-20020ac87f53000000b003430652ce62mr16123511qtk.514.1661197833830; Mon, 22 Aug 2022 12:50:33 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id m9-20020a05620a24c900b006af0ce13499sm12226402qkn.115.2022.08.22.12.50.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 12:50:33 -0700 (PDT) Date: Mon, 22 Aug 2022 15:50:32 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: gitster@pobox.com, Johannes.Schindelin@gmx.de, chakrabortyabhradeep79@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, kaartic.sivaraam@gmail.com Subject: [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Message-ID: <6b38bfcd2c2bced350cea198f7d576ffb81f3481.1661197803.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org It is possible to generate a corrupt MIDX bitmap when certain conditions are met. This happens when the preferred pack "P" changes to one (say, "Q") that: - "Q" has objects included in an existing MIDX, - but "Q" is different than "P", - and "Q" and "P" have some objects in common When this is the case, not all objects from "Q" will be selected from "Q" (ie., the generated MIDX will represent them as coming from a different pack), despite "Q" being preferred. This is an invariant violation, since all objects contained in the MIDX's preferred pack are supposed to originate from the preferred pack. In other words, all duplicate objects are resolved in favor of the copy that comes from the MIDX's preferred pack, if any. This violation results in a corrupt object order, which cannot be interpreted by the pack-bitmap code, leading to broken clones and other defects. This test demonstrates the above problem by constructing a minimal reproduction, and showing that the final `git clone` invocation fails. The reproduction is mostly straightforward, except that the new pack generated between MIDX writes (which is necessary in order to prevent that operation from being a noop) must sort ahead of all existing packs in order to prevent a different pack (neither "P" nor "Q") from appearing as preferred (meaning all its objects appear in order at the beginning of the pseudo-pack order). Subsequent commits will first refactor the midx.c::get_sorted_entries() function, and then fix this bug. Reported-by: Abhradeep Chakraborty Signed-off-by: Taylor Blau --- t/t5326-multi-pack-bitmaps.sh | 47 +++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 4fe57414c1..c364677ae8 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -307,4 +307,51 @@ test_expect_success 'graceful fallback when missing reverse index' ' ) ' +test_expect_success 'preferred pack change with existing MIDX bitmap' ' + git init preferred-pack-with-existing && + ( + cd preferred-pack-with-existing && + + test_commit base && + test_commit other && + + git rev-list --objects --no-object-names base >p1.objects && + git rev-list --objects --no-object-names other >p2.objects && + + p1="$(git pack-objects "$objdir/pack/pack" \ + --delta-base-offset X-Patchwork-Id: 12951317 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF46DC28D13 for ; Mon, 22 Aug 2022 19:50:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237698AbiHVTun (ORCPT ); Mon, 22 Aug 2022 15:50:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39448 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237477AbiHVTuj (ORCPT ); Mon, 22 Aug 2022 15:50:39 -0400 Received: from mail-qk1-x72e.google.com (mail-qk1-x72e.google.com [IPv6:2607:f8b0:4864:20::72e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C98A4DB70 for ; Mon, 22 Aug 2022 12:50:37 -0700 (PDT) Received: by mail-qk1-x72e.google.com with SMTP id b2so8718372qkh.12 for ; Mon, 22 Aug 2022 12:50:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=tuvUAhGRN5xXKXxLp64O3OCnHzZvI5R33q/Tv4ZPE2c=; b=ZS6obIg7d7WqcUwXSr43zI5dEwcgQO3vOF5ubkZ0gg2gzTgN4ZKu8W6tP7dHpDzmus rSIwonyKsu3lb/lQD/UkRBk6m91SCZkFlx872yrZ+SAYBaPPfPycLZwrQ2ShY9fGz07g GfyAcd5g4pOvDAKl6u3S74cZ6ggkIv3K4GZCKCAc2esk02Z+Jbw3v9+esnbDB6AzPIbe RSKOXaQ0fO3AKgLn0/NNJ/KNG3xdde1mVZ7knY4a2y8T0O6P04wctSg3x/HPKjJWh6J2 YNCR7zDoMRdNzmZd3Xw27F1MsJXM3NAvI98xsWQ5EhrApC/VlZI2Gr5JqaAoUSkdPcq+ PuBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=tuvUAhGRN5xXKXxLp64O3OCnHzZvI5R33q/Tv4ZPE2c=; b=f4oH3Gq6+4/WrNzC1f0X8IYJbORU9aP7j0wf8ryAK2SFPZ8lyLZWhLDS1mauSSW7k7 Zd1svk5zHrGVRgFj1MrsHUDNDgDYmrpvwAXX9JqUv8bTgoq81ZS+fjvvkdXyzidsJmic V8UQ5DBo5e1h/rSkE0Bn1DLXPFsGL5v2nnqImjVNFYFSqBVX0WJ9eRoimRODm81INnTN M7wslJZDT5q6jwINb2DLnMZ3Wb0363xa+0si15/U5UGCykmWdyGIekigAC0OsN6xVYI1 T2rDtKDe/0YK/LBZfUln7SAOqlrgnc6puTQ1U/LgqCRE1oJmnmvYK6fjJ2/IkQxFE70+ WwfQ== X-Gm-Message-State: ACgBeo3pje/XS5Aa8MKDJRbYVVjWSRjvygQbmq/moqXK0Ftv2+g9HA9G U9Nriir6w+WnYyRDeNwFakX41vuiOmulzJyY X-Google-Smtp-Source: AA6agR46Bev+nENVlSi9krC0oMCP3076gUq/FeqQX4CKhTW/3azlfFQVkq89Z/zE6LsBX3LbTPOM3Q== X-Received: by 2002:a05:620a:385:b0:6bb:5a16:54fc with SMTP id q5-20020a05620a038500b006bb5a1654fcmr13607666qkm.46.1661197836623; Mon, 22 Aug 2022 12:50:36 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id bb27-20020a05622a1b1b00b00342f917444csm9243621qtb.85.2022.08.22.12.50.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 12:50:36 -0700 (PDT) Date: Mon, 22 Aug 2022 15:50:35 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: gitster@pobox.com, Johannes.Schindelin@gmx.de, chakrabortyabhradeep79@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, kaartic.sivaraam@gmail.com Subject: [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The midx_bitmap_partial_tests() function is responsible for setting up a state where some (but not all) packs in the repository are covered by a MIDX (and bitmap). This function has redirected the `git multi-pack-index write --bitmap`'s stderr to a file "err" since its introduction back in c51f5a6437 (t5326: test multi-pack bitmap behavior, 2021-08-31). This was likely a stray change left over from a slightly different version of this test, since the file "err" is never read after being written. This leads to confusingly-missing output, especially when the contents of stderr are important. Resolve this confusion by avoiding silencing stderr in this case. Signed-off-by: Taylor Blau --- t/lib-bitmap.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh index a95537e759..f595937094 100644 --- a/t/lib-bitmap.sh +++ b/t/lib-bitmap.sh @@ -440,7 +440,7 @@ midx_bitmap_partial_tests () { test_commit packed && git repack && test_commit loose && - git multi-pack-index write --bitmap 2>err && + git multi-pack-index write --bitmap && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' From patchwork Mon Aug 22 19:50:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12951318 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37D64C28D13 for ; Mon, 22 Aug 2022 19:50:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238017AbiHVTuu (ORCPT ); Mon, 22 Aug 2022 15:50:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39510 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237697AbiHVTum (ORCPT ); Mon, 22 Aug 2022 15:50:42 -0400 Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com [IPv6:2607:f8b0:4864:20::732]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85E064D4F5 for ; Mon, 22 Aug 2022 12:50:40 -0700 (PDT) Received: by mail-qk1-x732.google.com with SMTP id n21so8738027qkk.3 for ; Mon, 22 Aug 2022 12:50:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=/gi8CYbMyQmjBWk4p4hGJoYwPgWEnR36155X72Fh47g=; b=IXMMlQcGg3zwQSBKMoRfQTmb0yNLb7ZMc84IAr/23DlDkoAEWW8EU01326kXD60YIj hcDXphij//afb+/JSayUzdmAXL6AT/CB//HCjsG3HrYtlStVdpjkmf52pxqPYViG/LRX KdNVAsALi2B9YE9NcE+aOr7C8GVmRYqo7jmoisuThEBDVnlwWSjA4B1yJMutlgChO6u9 0PaPVtVD8Vkrmmrf4oOYNKp0YqnbayfQIGv/lngmmOJM1Pt/9vAiIp2OzEOY9mAQ/Ig3 8tG+7eC7QHR+aaA5HxiHSKbuKupvNLzbVV2XoFxGhW4M1ROuNc2vA/gGxVGsykAF/XSX 1OgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=/gi8CYbMyQmjBWk4p4hGJoYwPgWEnR36155X72Fh47g=; b=nsr1OqRTlxhmrcwutBK/uGbc5jVdTLFwjeBpp1BLskXM2JvNEVBe5Y6PqRy2Xcnx7u qdWqpqn0oHsa1y+5zn++qhQad8mhQ7PBgn5ErGk2MRdhFIJweHn8Cu5DVf5nsfpeskWz DNkJyVmUBIDnal9pBOW2hLwE+gf8c3VdOiZUm3Br/myzH89NkXuyTDVUkyRYSu8m2EJl yNBIkz1aEUz+JziSGcsr0Se+ZpzxuqrS3w9fOxUEzv/W56tZfcYvoij/st3APtKbrzNN kiqTje2TCNgaoq4IzBjJSxDSvFZebsKLKUz8gXaIYpz42teKhX9UXsgrwzZc9h3m5FBp xH2A== X-Gm-Message-State: ACgBeo0UBEyMH8RDWvufUkdf+cCgAAp4NbTChm5VxNM6Des616TxOEtF PKPyeKRJ1MpK6VRvfLP6hsbnu5QDo0zr8YCT X-Google-Smtp-Source: AA6agR6O6RLk3bDJePopqkMB6BsWMF2kX4VHBCtZaUqJidMSicc6VQqcgDuXTzYWiFfmQlL1xUA4nw== X-Received: by 2002:a05:620a:1402:b0:6bc:2055:6da0 with SMTP id d2-20020a05620a140200b006bc20556da0mr3322017qkj.534.1661197839429; Mon, 22 Aug 2022 12:50:39 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id t201-20020a37aad2000000b006bacf4703c5sm12131074qke.111.2022.08.22.12.50.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 12:50:39 -0700 (PDT) Date: Mon, 22 Aug 2022 15:50:38 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: gitster@pobox.com, Johannes.Schindelin@gmx.de, chakrabortyabhradeep79@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, kaartic.sivaraam@gmail.com Subject: [PATCH v2 3/7] midx.c: extract `struct midx_fanout` Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org To build up a list of objects (along with their packs, and the offsets within those packs that each object appears at), the MIDX code implements `get_sorted_entries()` which builds up a list of candidates, sorts them, and then removes duplicate entries. To do this, it keeps an array of `pack_midx_entry` structures that it builds up once for each fanout level (ie., for all possible values of the first byte of each object's ID). This array is a function-local variable of `get_sorted_entries()`. Since it uses the ALLOC_GROW() macro, having the `alloc_fanout` variable also be local to that function, and only modified within that function is convenient. However, subsequent changes will extract the two ways this array is filled (from a pack at some fanout value, and from an existing MIDX at some fanout value) into separate functions. Instead of passing around pointers to the entries array, along with `nr_fanout` and `alloc_fanout`, encapsulate these three into a structure instead. Then pass around a pointer to this structure instead. This patch does not yet extract the above two functions, but sets us up to begin doing so in the following commit. For now, the implementation of get_sorted_entries() is only modified to replace `entries_by_fanout` with `fanout.entries`, `nr_fanout` with `fanout.nr`, and so on. Signed-off-by: Taylor Blau --- midx.c | 54 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index 4e956cacb7..cdb6c481c7 100644 --- a/midx.c +++ b/midx.c @@ -577,6 +577,22 @@ static void fill_pack_entry(uint32_t pack_int_id, entry->preferred = !!preferred; } +struct midx_fanout { + struct pack_midx_entry *entries; + uint32_t nr; + uint32_t alloc; +}; + +static void midx_fanout_grow(struct midx_fanout *fanout, uint32_t nr) +{ + ALLOC_GROW(fanout->entries, nr, fanout->alloc); +} + +static void midx_fanout_sort(struct midx_fanout *fanout) +{ + QSORT(fanout->entries, fanout->nr, midx_oid_compare); +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -595,8 +611,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, int preferred_pack) { uint32_t cur_fanout, cur_pack, cur_object; - uint32_t alloc_fanout, alloc_objects, total_objects = 0; - struct pack_midx_entry *entries_by_fanout = NULL; + uint32_t alloc_objects, total_objects = 0; + struct midx_fanout fanout = { 0 }; struct pack_midx_entry *deduplicated_entries = NULL; uint32_t start_pack = m ? m->num_packs : 0; @@ -608,14 +624,14 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, * slices to be evenly distributed, with some noise. Hence, * allocate slightly more than one 256th. */ - alloc_objects = alloc_fanout = total_objects > 3200 ? total_objects / 200 : 16; + alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16; - ALLOC_ARRAY(entries_by_fanout, alloc_fanout); + ALLOC_ARRAY(fanout.entries, fanout.alloc); ALLOC_ARRAY(deduplicated_entries, alloc_objects); *nr_objects = 0; for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { - uint32_t nr_fanout = 0; + fanout.nr = 0; if (m) { uint32_t start = 0, end; @@ -625,15 +641,15 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, end = ntohl(m->chunk_oid_fanout[cur_fanout]); for (cur_object = start; cur_object < end; cur_object++) { - ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + midx_fanout_grow(&fanout, fanout.nr + 1); nth_midxed_pack_midx_entry(m, - &entries_by_fanout[nr_fanout], + &fanout.entries[fanout.nr], cur_object); if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - entries_by_fanout[nr_fanout].preferred = 1; + fanout.entries[fanout.nr].preferred = 1; else - entries_by_fanout[nr_fanout].preferred = 0; - nr_fanout++; + fanout.entries[fanout.nr].preferred = 0; + fanout.nr++; } } @@ -646,36 +662,36 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, end = get_pack_fanout(info[cur_pack].p, cur_fanout); for (cur_object = start; cur_object < end; cur_object++) { - ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + midx_fanout_grow(&fanout, fanout.nr + 1); fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, - &entries_by_fanout[nr_fanout], + &fanout.entries[fanout.nr], preferred); - nr_fanout++; + fanout.nr++; } } - QSORT(entries_by_fanout, nr_fanout, midx_oid_compare); + midx_fanout_sort(&fanout); /* * The batch is now sorted by OID and then mtime (descending). * Take only the first duplicate. */ - for (cur_object = 0; cur_object < nr_fanout; cur_object++) { - if (cur_object && oideq(&entries_by_fanout[cur_object - 1].oid, - &entries_by_fanout[cur_object].oid)) + for (cur_object = 0; cur_object < fanout.nr; cur_object++) { + if (cur_object && oideq(&fanout.entries[cur_object - 1].oid, + &fanout.entries[cur_object].oid)) continue; ALLOC_GROW(deduplicated_entries, *nr_objects + 1, alloc_objects); memcpy(&deduplicated_entries[*nr_objects], - &entries_by_fanout[cur_object], + &fanout.entries[cur_object], sizeof(struct pack_midx_entry)); (*nr_objects)++; } } - free(entries_by_fanout); + free(fanout.entries); return deduplicated_entries; } From patchwork Mon Aug 22 19:50:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12951319 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9BB5C28D13 for ; Mon, 22 Aug 2022 19:50:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238148AbiHVTuz (ORCPT ); Mon, 22 Aug 2022 15:50:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237717AbiHVTut (ORCPT ); Mon, 22 Aug 2022 15:50:49 -0400 Received: from mail-qk1-x72b.google.com (mail-qk1-x72b.google.com [IPv6:2607:f8b0:4864:20::72b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CE545006A for ; Mon, 22 Aug 2022 12:50:43 -0700 (PDT) Received: by mail-qk1-x72b.google.com with SMTP id j6so8720774qkl.10 for ; Mon, 22 Aug 2022 12:50:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=ZaKuBtWVTlemWiVOiVNpqCfcFRaKxUfBq3eH7HUa1gU=; b=eNd9L1CQKoD1v4WdtiB7/beZ5HQUdzL6p9ucc2h2FVHGzuN/tycqfdxv9wp3HJJxY3 nVDTFxr5MCEVXquvQEtCl0OS0/XuzI4jE6TiwT09VWdisOD0cfwtkJDemk0jc0tIaiUy lxl0TVpxynZ8qv93m4CPOGkFTIoQF5guP9xbUhRKNKG2bc4USrsp0FHSpe8j+ImUO3rv NaA+qTfvbHyD74r6k4rMLdZ+TD/r4WiLce+TRPETTOVEFSzEEnMwlMTVIcglvjffQhon hmKP5jWd2JU8kSmfW3Ul0HaCFYxOWerKDGITsjJr2pgXiEUCJ7sCh/3/19vnPenQDbE+ p0oQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=ZaKuBtWVTlemWiVOiVNpqCfcFRaKxUfBq3eH7HUa1gU=; b=lJNsSM2atxEis7gFM+ryupaNCp9laoacxUvasHlLaA7PkqK36+Za7L53WLwovQIW09 cCI1kSaZqTorzfOnqmExYDpWE3CZit775opI1+4KCFMRtwaYDfEs9Yo9O175bAxcIzvi XmVcz/J0/sMfWEt0bnDCBMbDqjBBCasC7m5KQ/qgxHLzZ70Gs0UEr1J/Yuzx1XHs88oQ wMH7JRXNdn9z1M1lnGby1JeQr5yl9egkFTRes+SGVX/E0RTTfVRg03XjBk7mbtG4STOA 6gr+Jyhq3x8WG60aINPoIhy0eSLtL7em2d3yioP+hxig+Uha9iRTdKbFnZ/JDSgyyVNy Z+mA== X-Gm-Message-State: ACgBeo0DUZw1+mJjGNP6y//NQxkp5v2mGcTuHktNM5z/HjA+jh52UP+1 vQr1lrxpIq8d64T0WMhK96zAaXxdA7ixKwBR X-Google-Smtp-Source: AA6agR4110MgiynTtBUfbImx7+1zpKuxxmGwuGM7kXhEA7gwvUHHGNcn0J/HCM46YsJgXppHoD/ogw== X-Received: by 2002:a37:87c7:0:b0:6bb:20e8:ee3c with SMTP id j190-20020a3787c7000000b006bb20e8ee3cmr13783713qkd.474.1661197842267; Mon, 22 Aug 2022 12:50:42 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id t201-20020a37aad2000000b006b9264191b5sm12192958qke.32.2022.08.22.12.50.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 12:50:41 -0700 (PDT) Date: Mon, 22 Aug 2022 15:50:41 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: gitster@pobox.com, Johannes.Schindelin@gmx.de, chakrabortyabhradeep79@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, kaartic.sivaraam@gmail.com Subject: [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` Message-ID: <2351a9fc27140b99d9306eebcdb306df711f3c83.1661197803.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from an existing MIDX. This function will only be called once, so extracting it is purely cosmetic to improve the readability of `get_sorted_entries()` (its sole caller) below. The functionality is unchanged in this commit. Signed-off-by: Taylor Blau --- midx.c | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index cdb6c481c7..0d40089c4d 100644 --- a/midx.c +++ b/midx.c @@ -593,6 +593,31 @@ static void midx_fanout_sort(struct midx_fanout *fanout) QSORT(fanout->entries, fanout->nr, midx_oid_compare); } +static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, + struct multi_pack_index *m, + int preferred_pack, + uint32_t cur_fanout) +{ + uint32_t start = 0, end; + uint32_t cur_object; + + if (cur_fanout) + start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); + end = ntohl(m->chunk_oid_fanout[cur_fanout]); + + for (cur_object = start; cur_object < end; cur_object++) { + midx_fanout_grow(fanout, fanout->nr + 1); + nth_midxed_pack_midx_entry(m, + &fanout->entries[fanout->nr], + cur_object); + if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) + fanout->entries[fanout->nr].preferred = 1; + else + fanout->entries[fanout->nr].preferred = 0; + fanout->nr++; + } +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -633,25 +658,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { fanout.nr = 0; - if (m) { - uint32_t start = 0, end; - - if (cur_fanout) - start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); - end = ntohl(m->chunk_oid_fanout[cur_fanout]); - - for (cur_object = start; cur_object < end; cur_object++) { - midx_fanout_grow(&fanout, fanout.nr + 1); - nth_midxed_pack_midx_entry(m, - &fanout.entries[fanout.nr], - cur_object); - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - fanout.entries[fanout.nr].preferred = 1; - else - fanout.entries[fanout.nr].preferred = 0; - fanout.nr++; - } - } + if (m) + midx_fanout_add_midx_fanout(&fanout, m, preferred_pack, + cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { uint32_t start = 0, end; From patchwork Mon Aug 22 19:50:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12951320 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29F25C28D13 for ; Mon, 22 Aug 2022 19:51:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237932AbiHVTvM (ORCPT ); Mon, 22 Aug 2022 15:51:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237920AbiHVTut (ORCPT ); Mon, 22 Aug 2022 15:50:49 -0400 Received: from mail-qv1-xf35.google.com (mail-qv1-xf35.google.com [IPv6:2607:f8b0:4864:20::f35]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 174434DB08 for ; Mon, 22 Aug 2022 12:50:46 -0700 (PDT) Received: by mail-qv1-xf35.google.com with SMTP id e4so8954753qvr.2 for ; Mon, 22 Aug 2022 12:50:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=arZxlmoQFLEphCbCK0kNlqnrKdx4dYYcctDBCHQpTIU=; b=b3ImRhFDw1xW161y7U1NBd8z/ZcZ0y9BfuZyo3FQAeRxaSSbCezFZZkveJsXUUxO46 ch3o/Qdh95LSRaCKJaGdyUrLFaRTHNxSPvA+oQ0c6iCiNB32FRblvE7FpYFY5lgf/+bY PRDi31WE5nlIRneJblaMtmpN1M8xhmmYW6u0b2/JNeqLsZ9yIlJ6Sd90ot9y2GHIatWq /yo6U+9X0hq9Yg6EOwAAUXz1CSzs9rDUbj5MkE07aHLZLFHJQ9yEdTC2ptfLz4JNEQ1t daRZOCbAficUYzF/TTnnnmo5tF6Ws7OoPZ0OXliT4Yl5ubhIfm2pTWLsTIkgR9m7C6uS 2RHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=arZxlmoQFLEphCbCK0kNlqnrKdx4dYYcctDBCHQpTIU=; b=g7h4W4LROlUH9nlZ86amhQXU9ewHe5xrHkU/ofTKwUafvEHyDfRq59OCsrVmy0vMv3 wYK3IX+4QeY8uu6XsjjVerUMSwcJZ9zbSy2BkWuOAPDKzeOeJYC14zrCli+EeXfmwZhd aliuhe1PJARFpHcTilDCvK3v4VAyC/TDcVukBgLWumcDI5SanltEu3OY2aFZFEaqv70E /bxdN6AGzyoXmfe1T8yCIlCzz9tEzvCgRqD2QvNx6Mf3DsYLj+qG53lZhfSK+GzMzF4/ I94ZJ6cViWdxLOkq1EaqqfIZvMYmNwrInQNx8Gr7rIB2XlpGPSVPERJDAwSfg+n2vED2 5tUw== X-Gm-Message-State: ACgBeo3SjobRO7dNK/aQ1LRB5NFo09sb8orHys0Bd/GkKKtih7Fb114s 1BVD8mILCxfVbek/AKka/NMluLnah0+K4ANk X-Google-Smtp-Source: AA6agR7UMtL4wYg4nhlVWCtGbqQacNdltHB7XtGKP7dyBOrGwQ7b8nSi0cFL+vaW7ndwUAokDkZawQ== X-Received: by 2002:ad4:5f8d:0:b0:496:d7c3:17eb with SMTP id jp13-20020ad45f8d000000b00496d7c317ebmr8012887qvb.114.1661197844989; Mon, 22 Aug 2022 12:50:44 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id s19-20020a05622a019300b00342f97dc84fsm10225574qtw.76.2022.08.22.12.50.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 12:50:44 -0700 (PDT) Date: Mon, 22 Aug 2022 15:50:43 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: gitster@pobox.com, Johannes.Schindelin@gmx.de, chakrabortyabhradeep79@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, kaartic.sivaraam@gmail.com Subject: [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` Message-ID: <845e1484b49f88bdd64faec2b7e61eddf2e3045c.1661197803.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from a given pack (identified by its index into the `struct pack_info` array maintained by the MIDX writing routine). Unlike the previous extraction (for `midx_fanout_add_midx_fanout()`), this function will be called twice, once for all new packs, and again for the preferred pack (if it appears in an existing MIDX). The latter change is to resolve the bug described a few patches ago, and will be made in the subsequent commit. Similar to the previous refactoring, this function also enhances the readability of its caller in `get_sorted_entries()`. Its functionality is unchanged in this commit. Signed-off-by: Taylor Blau --- midx.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/midx.c b/midx.c index 0d40089c4d..be8186eec2 100644 --- a/midx.c +++ b/midx.c @@ -618,6 +618,31 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, } } +static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout, + struct pack_info *info, + uint32_t cur_pack, + int preferred, + uint32_t cur_fanout) +{ + struct packed_git *pack = info[cur_pack].p; + uint32_t start = 0, end; + uint32_t cur_object; + + if (cur_fanout) + start = get_pack_fanout(pack, cur_fanout - 1); + end = get_pack_fanout(pack, cur_fanout); + + for (cur_object = start; cur_object < end; cur_object++) { + midx_fanout_grow(fanout, fanout->nr + 1); + fill_pack_entry(cur_pack, + info[cur_pack].p, + cur_object, + &fanout->entries[fanout->nr], + preferred); + fanout->nr++; + } +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -663,22 +688,10 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { - uint32_t start = 0, end; int preferred = cur_pack == preferred_pack; - - if (cur_fanout) - start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1); - end = get_pack_fanout(info[cur_pack].p, cur_fanout); - - for (cur_object = start; cur_object < end; cur_object++) { - midx_fanout_grow(&fanout, fanout.nr + 1); - fill_pack_entry(cur_pack, - info[cur_pack].p, - cur_object, - &fanout.entries[fanout.nr], - preferred); - fanout.nr++; - } + midx_fanout_add_pack_fanout(&fanout, + info, cur_pack, + preferred, cur_fanout); } midx_fanout_sort(&fanout); From patchwork Mon Aug 22 19:50:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12951322 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CF421C28D13 for ; Mon, 22 Aug 2022 19:51:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237410AbiHVTvQ (ORCPT ); Mon, 22 Aug 2022 15:51:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238081AbiHVTuw (ORCPT ); Mon, 22 Aug 2022 15:50:52 -0400 Received: from mail-qt1-x835.google.com (mail-qt1-x835.google.com [IPv6:2607:f8b0:4864:20::835]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D30095283F for ; Mon, 22 Aug 2022 12:50:48 -0700 (PDT) Received: by mail-qt1-x835.google.com with SMTP id w28so8777634qtc.7 for ; Mon, 22 Aug 2022 12:50:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=IJc+d+l0tYTwna4fuOdWhSK5he40CommzX41O51/Bug=; b=nb3LyijBTs0PsRiD9sfAfIt7F8G/Eu+XeLD/N7uiJPRn3VSfLgtD5oOqdD0qMRMEe/ fKql7UU47IwbG0BeYJVaOaGn3sS1EGXtVNbRX7NXdymbwDuMaaRkYaTZGK+LX9k9zRt4 OcMzjoicwjoxgySpNA1i7U8/+KPraCzNXz84OZTkFDi85ntzMi9YkYZy0CoTSZWsA9jj qp7fQgGeSA4CRzB9pCb3eQDd1mA+2qz9S550shN0i8hQPNc5pZU3u1jaC5d90zCn28gY TVa/UVUn9dbtRCb1SYG+WAUZOO7Rc74sUwu0n8OkwmNmR6AIFOocvP5WSPOTZn+FUPWj n5hQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=IJc+d+l0tYTwna4fuOdWhSK5he40CommzX41O51/Bug=; b=yUmvdMK3PbKNLndzBw1ajBvGjK5r+4c5Hzn46rCkYI2CfkUnO9gIe99vLNcoEuJEjF Cc1ZsQFRam8/SkmMmiBXi4wSPPdBStfNFjn/WPf20h7Jo8idVlgPL0Jz6fNPA+Zud2bG RO18wUIb/A0hH7dkIwwz+piETbMvK5Z1XIF75YQmqhxxs93r9fezDplGDc1yWr6bYamb T81wRItTl7VVKezaCqd3KNwAwVBXsWfZe0z1mvNBGtfVTVRN6QR1gwAtEH3bLQnp974u JgHD3npbnrPSGEfy74+SNotj4m/YSnte04rAJiLkaUCbCibExlOaG3afo/+r4AiGE8Li Bzdg== X-Gm-Message-State: ACgBeo3efdevyowuIDmqqmmep63e3C/IWp3rCtxyGgjN15txc3K72Z6A jr8OSA7cpjW75dHnPmpAb25mqhltyp4whU2o X-Google-Smtp-Source: AA6agR6HziyWn85XmsOH6ST1KT+MiDZ2TooHdlRkgxABBP4uwgfjLglnSH7qfIOQM5qaQpMT49ms+w== X-Received: by 2002:a05:622a:8a:b0:344:5611:7a8a with SMTP id o10-20020a05622a008a00b0034456117a8amr16883035qtw.565.1661197847734; Mon, 22 Aug 2022 12:50:47 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id d134-20020ae9ef8c000000b006b942f4ffe3sm11001800qkg.18.2022.08.22.12.50.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 12:50:47 -0700 (PDT) Date: Mon, 22 Aug 2022 15:50:46 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: gitster@pobox.com, Johannes.Schindelin@gmx.de, chakrabortyabhradeep79@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, kaartic.sivaraam@gmail.com Subject: [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org This patch resolves an issue where the object order used to generate a MIDX bitmap would violate an invariant that all of the preferred pack's objects are represented by that pack in the MIDX. The problem arises when reusing an existing MIDX while generating a new one, and occurs specifically when the identity of the preferred pack changes from one MIDX to another, along with a few other conditions: - the new preferred pack must also be present in the existing MIDX - the new preferred pack must *not* have been the preferred pack in the existing MIDX - most importantly, there must be at least one object present in the physical preferred pack (ie., it shows up in that pack's index) but was selected from a *different* pack when the previous MIDX was generated When the above conditions are all met, we end up (incorrectly) discarding copies of some objects in the pack selected as the preferred pack. This is because `get_sorted_entries()` adds objects to its list by doing the following at each fanout level: - first, adding all objects from that fanout level from an existing MIDX - then, adding all objects from that fanout level in each pack *not* included in the existing MIDX So if some object was not selected from the to-be-preferred pack when writing the previous MIDX, then we will never consider it as a candidate when generating the new MIDX. This means that it's possible for the preferred pack to not include all of its objects in the MIDX's pseudo-pack object order, which is an invariant violation of that order. Resolve this by adding all objects from the preferred pack separately when it appears in the existing MIDX (if one was present). This will duplicate objects from that pack that *did* appear in the MIDX, but this is fine, since get_sorted_entries() already handles duplicates. (A future optimization in this area could avoid adding copies of objects that we know already existing in the MIDX.) Note that we no longer need to compute the preferred-ness of objects added from the MIDX, since we only want to select the preferred objects from a single source. (We could still mark these preferred bits, but doing so is redundant and unnecessary). This resolves the bug demonstrated by t5326.174 ("preferred pack change with existing MIDX bitmap"). Signed-off-by: Taylor Blau --- midx.c | 14 +++++++------- t/t5326-multi-pack-bitmaps.sh | 13 +++++-------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/midx.c b/midx.c index be8186eec2..bd1d27090e 100644 --- a/midx.c +++ b/midx.c @@ -595,7 +595,6 @@ static void midx_fanout_sort(struct midx_fanout *fanout) static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, struct multi_pack_index *m, - int preferred_pack, uint32_t cur_fanout) { uint32_t start = 0, end; @@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, nth_midxed_pack_midx_entry(m, &fanout->entries[fanout->nr], cur_object); - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - fanout->entries[fanout->nr].preferred = 1; - else - fanout->entries[fanout->nr].preferred = 0; + fanout->entries[fanout->nr].preferred = 0; fanout->nr++; } } @@ -684,8 +680,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, fanout.nr = 0; if (m) - midx_fanout_add_midx_fanout(&fanout, m, preferred_pack, - cur_fanout); + midx_fanout_add_midx_fanout(&fanout, m, cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { int preferred = cur_pack == preferred_pack; @@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, preferred, cur_fanout); } + if (-1 < preferred_pack && preferred_pack < start_pack) + midx_fanout_add_pack_fanout(&fanout, info, + preferred_pack, 1, + cur_fanout); + midx_fanout_sort(&fanout); /* diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index c364677ae8..89ecd1062c 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -338,19 +338,16 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' ' git pack-objects --all --unpacked $objdir/pack/pack0 && # Generate a new MIDX which changes the preferred pack - # to a pack contained in the existing MIDX, such that - # not all objects from p2 that appear in the MIDX had - # their copy selected from p2. + # to a pack contained in the existing MIDX. git multi-pack-index write --bitmap \ --preferred-pack="pack-$p2.pack" && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - # When the above circumstances are met, an existing bug - # in the MIDX machinery will cause the reverse index to - # be read incorrectly, resulting in failed clones (among - # other things). - test_must_fail git clone --no-local . clone2 + # When the above circumstances are met, the preferred + # pack should change appropriately and clones should + # (still) succeed. + git clone --no-local . clone2 ) ' From patchwork Mon Aug 22 19:50:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 12951321 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE93DC32792 for ; Mon, 22 Aug 2022 19:51:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237689AbiHVTvO (ORCPT ); Mon, 22 Aug 2022 15:51:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237613AbiHVTvG (ORCPT ); Mon, 22 Aug 2022 15:51:06 -0400 Received: from mail-qv1-xf34.google.com (mail-qv1-xf34.google.com [IPv6:2607:f8b0:4864:20::f34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61B005006A for ; Mon, 22 Aug 2022 12:50:51 -0700 (PDT) Received: by mail-qv1-xf34.google.com with SMTP id u6so4371683qvp.5 for ; Mon, 22 Aug 2022 12:50:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc; bh=61WnEJtEOMxH7DdglrD6RuoqWGFPUieBr48HGPuFatM=; b=RGYH+XEKhWNjlK/rqDYZe3jZqtAY4A0PmXRglmLHnhwcfq6vBcglKNr/7HBb9dIIgG fJbDIll3lNG6TMHr1NtIZEFfYA1UjqL9HG5VxhoKBK+Xunmr0y7P6Y7b/Jo5sjOX1qWG Ge4xAq7psFDP4lgiHHsLVdRd/KlhEHSWIfYqkhZE/EQbbZZAF6TEPWzvxXOaDqHyVLo7 vM8DkCC+dtn6zVznCi7eQapv12aLqxhJcruPVsbwgDJKUN2ATxyRIgxeQWTKGShmH3Ti MOmB+Ik8aKRmJhQwiqegvQnX/gMzQcUGjdzUQHM5UFwoFrv8ai33K7EHMrfZyjZ+XL9p 1iSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc; bh=61WnEJtEOMxH7DdglrD6RuoqWGFPUieBr48HGPuFatM=; b=BEhQG/E3UQANZzI8+TxGPrbOP6HdnOsUUPxXJ5T+Ey06Nt8fDaQXMpoIRjpT7cgqr7 njOCxeEcqUJDOmFhzxQsHhvXInEZ5b9/5doucOdhKVBjj6B7tZIivVA1E8X+WYeVOw9y JuGPHiegmVpCJk8KS2YW1FPT0l/jJSE2Oj8V31DM3z8zO+zWCoiSiVIwa7hBNL3s9e2R CN1bic0LqcPeC3McIlAHkmunB77+EOJovdTgHvz9cKD0JYV9ymPFJYBH036pOAYFJ/kS w1wd1/bT1JBmTf2TRrIvd6An5/87A4QO4IPLP/y5nk8neChywtCyPTnaQklGpWuhR4yi wdAw== X-Gm-Message-State: ACgBeo2fFSkSGv0DwGXVJlO7o5mZ1uLtWwEku9V2T64E7lmx2u1/enJU csaTsBhbXw5n427uLTJoTj1yj4uGZFHHixUp X-Google-Smtp-Source: AA6agR5QNOhk+WjLn13BUaIDBuR9xSds9LDTl7dTHyJb0eEX3/tBnH49RcBPp3p+NluAbbARzDI+iw== X-Received: by 2002:a05:6214:262f:b0:477:734:3177 with SMTP id gv15-20020a056214262f00b0047707343177mr17323290qvb.67.1661197850391; Mon, 22 Aug 2022 12:50:50 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id h11-20020ac8514b000000b00342f4fc290esm9511877qtn.71.2022.08.22.12.50.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Aug 2022 12:50:50 -0700 (PDT) Date: Mon, 22 Aug 2022 15:50:49 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: gitster@pobox.com, Johannes.Schindelin@gmx.de, chakrabortyabhradeep79@gmail.com, derrickstolee@github.com, jonathantanmy@google.com, kaartic.sivaraam@gmail.com Subject: [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Message-ID: <887ab9485faa21f5a5cd889d97895ed41013803d.1661197803.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The last commit changes the behavior of midx.c's `get_sorted_objects()` function to handle the case of writing a MIDX bitmap while reusing an existing MIDX and changing the identity of the preferred pack separately. As part of this change, all objects from the (new) preferred pack are added to the fanout table in a separate pass. Since these copies of the objects all have their preferred bits set, any duplicates will be resolved in their favor. Importantly, this includes any copies of those same objects that come from the existing MIDX. We know at the time of adding them that they'll be redundant if their source pack is the (new) preferred one, so we can avoid adding them to the list in this case. Signed-off-by: Taylor Blau --- midx.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/midx.c b/midx.c index bd1d27090e..148ecc2f14 100644 --- a/midx.c +++ b/midx.c @@ -595,7 +595,8 @@ static void midx_fanout_sort(struct midx_fanout *fanout) static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, struct multi_pack_index *m, - uint32_t cur_fanout) + uint32_t cur_fanout, + int preferred_pack) { uint32_t start = 0, end; uint32_t cur_object; @@ -605,6 +606,15 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, end = ntohl(m->chunk_oid_fanout[cur_fanout]); for (cur_object = start; cur_object < end; cur_object++) { + if ((preferred_pack > -1) && + (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) { + /* + * Objects from preferred packs are added + * separately. + */ + continue; + } + midx_fanout_grow(fanout, fanout->nr + 1); nth_midxed_pack_midx_entry(m, &fanout->entries[fanout->nr], @@ -680,7 +690,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, fanout.nr = 0; if (m) - midx_fanout_add_midx_fanout(&fanout, m, cur_fanout); + midx_fanout_add_midx_fanout(&fanout, m, cur_fanout, + preferred_pack); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { int preferred = cur_pack == preferred_pack;