From patchwork Tue Dec 8 22:03:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959859 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 335FDC4167B for ; Tue, 8 Dec 2020 22:04:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0431C207B0 for ; Tue, 8 Dec 2020 22:04:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730607AbgLHWEH (ORCPT ); Tue, 8 Dec 2020 17:04:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730592AbgLHWEE (ORCPT ); Tue, 8 Dec 2020 17:04:04 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1166FC061793 for ; Tue, 8 Dec 2020 14:03:18 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id p126so190047oif.7 for ; Tue, 08 Dec 2020 14:03:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=7VnFwsY7xM/vegQ12SEFiiCQRRPhzMko6e1us1C3KBY=; b=T+S0awvBUMN3Hk/YnRIoGx2jSQ+7a+JaeNr6DKlAiQ+aweMRtQPnIAKZzNmMesUGgg a7I4A5JNCb4Jh2tmvEPea3rjx4WGohQxnZp+SA2FqFwtintuYwavi0184yGIzuzntzAm CHo8mlbTstajiu7wKuUz1lTcviqBqFRwYIpMR6JzkAq+F87uffKUDkrUlJI0ggeW7Otv lRJBKD30NSseRYb0hQ7TqvutR6eovNCb40HL04DnE6FWfEZM0LE20HmpVvnDRrz2SqLY u2ilfiUh6Ozft9AhcXn6CYlLKIeHlp152zZYu5mzjGm53/7WqnRxJDMLRRy/W/B0Coah poxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=7VnFwsY7xM/vegQ12SEFiiCQRRPhzMko6e1us1C3KBY=; b=ZKl1bd1uj236BnxNHTpedO0WE2SGLIX2pIuWRx9nJRbmIKBLsrjN/qZpcSCQPxngZg nZv+EmaU3F0NHdkTnnRYMW2PgogN793oYwGfz7Hil7DX4C6Sr/n7NlFxYLTcgVixr6sw b0ErMt5mBkMMYfK6chXtd1Rkfijmbzh/g4t2ncrojAxlzGp0xQuJw+4/yKm/cZz+lJ9N Sre6U8vTOWKXOSOXdJvgDrLFeO+vplp+E6AfGya/VfuY7hPGAjHDwZks+jfd2pzgYLrd JWQNUu5dUwMYt3BpZL+Mgvyfs07bzR81RUvGqXP4Nf09LWqMiIGakfY4DaycOTIjoCAr kHfg== X-Gm-Message-State: AOAM53061erTw+EZMkmLCHkfrQxGjutSMEv+Fg6jYIvX6yn8S2fIc571 nDZFN2TSx+adxzNmhaRjV3sIyhxQl+ONQBUJ X-Google-Smtp-Source: ABdhPJzL/MYrD7Zr5tqIW0b+P14ZRez0oUBe9b53jcF/+DxnuiGTX73sXYO8GNXzcINNJv1ska60nQ== X-Received: by 2002:aca:eccb:: with SMTP id k194mr4478603oih.112.1607464997205; Tue, 08 Dec 2020 14:03:17 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id b82sm11649oif.49.2020.12.08.14.03.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:16 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:14 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 01/24] ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'ewah/ewah_bitmap.c:buffer_grow()' is responsible for growing the buffer used to store the bits of an EWAH bitmap. It is essentially doing the same task as the 'ALLOC_GROW()' macro, so use that instead. This simplifies the callers of 'buffer_grow()', who no longer have to ask for a specific size, but rather specify how much of the buffer they need. They also no longer need to guard 'buffer_grow()' behind an if statement, since 'ALLOC_GROW()' (and, by extension, 'buffer_grow()') is a noop if the buffer is already large enough. But, the most significant change is that this fixes a bug when calling buffer_grow() with both 'alloc_size' and 'new_size' set to 1. In this case, truncating integer math will leave the new size set to 1, causing the buffer to never grow. Instead, let alloc_nr() handle this, which asks for '(new_size + 16) * 3 / 2' instead of 'new_size * 3 / 2'. Signed-off-by: Taylor Blau --- ewah/ewah_bitmap.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c index d59b1afe3d..2a8c7c5c33 100644 --- a/ewah/ewah_bitmap.c +++ b/ewah/ewah_bitmap.c @@ -19,6 +19,7 @@ #include "git-compat-util.h" #include "ewok.h" #include "ewok_rlw.h" +#include "cache.h" static inline size_t min_size(size_t a, size_t b) { @@ -33,20 +34,13 @@ static inline size_t max_size(size_t a, size_t b) static inline void buffer_grow(struct ewah_bitmap *self, size_t new_size) { size_t rlw_offset = (uint8_t *)self->rlw - (uint8_t *)self->buffer; - - if (self->alloc_size >= new_size) - return; - - self->alloc_size = new_size; - REALLOC_ARRAY(self->buffer, self->alloc_size); + ALLOC_GROW(self->buffer, new_size, self->alloc_size); self->rlw = self->buffer + (rlw_offset / sizeof(eword_t)); } static inline void buffer_push(struct ewah_bitmap *self, eword_t value) { - if (self->buffer_size + 1 >= self->alloc_size) - buffer_grow(self, self->buffer_size * 3 / 2); - + buffer_grow(self, self->buffer_size + 1); self->buffer[self->buffer_size++] = value; } @@ -137,8 +131,7 @@ void ewah_add_dirty_words( rlw_set_literal_words(self->rlw, literals + can_add); - if (self->buffer_size + can_add >= self->alloc_size) - buffer_grow(self, (self->buffer_size + can_add) * 3 / 2); + buffer_grow(self, self->buffer_size + can_add); if (negate) { size_t i; From patchwork Tue Dec 8 22:03:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959857 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 555C4C1B0D9 for ; Tue, 8 Dec 2020 22:04:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 25BC322472 for ; Tue, 8 Dec 2020 22:04:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730597AbgLHWEH (ORCPT ); Tue, 8 Dec 2020 17:04:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53980 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729455AbgLHWED (ORCPT ); Tue, 8 Dec 2020 17:04:03 -0500 Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com [IPv6:2607:f8b0:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C829C0613CF for ; Tue, 8 Dec 2020 14:03:23 -0800 (PST) Received: by mail-ot1-x343.google.com with SMTP id b18so281763ots.0 for ; Tue, 08 Dec 2020 14:03:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=0biD3z7zmbAd3yPu0/4ZJpBDWIf7gZwf31aSZXDeAwY=; b=Rj7ZrtgzURLvVzZC8VVInduZrv0kABGBICz5NN1/cr9QyrW0PqWT4FByRNZshzlc1W ByhvvZlnEJJi6JefiHXOFedTYoJjEfGfR5HRbHSvzgnahMVDUV3uH/bij8q9llzKPZNm bFXeDp/KwVxohf4TLVG37CxwtBL7s8XuyC+aa6FP0cdraTBEtlspLfzYYxMQqNbwrNKh pUqRyH1rz5EQlPO8iBTUhHA5H5HjezCzURd0u3bU8yPNmnyzUoEzwH3HkpwuC/g8l5B2 23XDeSJpMAA7vxXkaw5u0wttXsrwU/xKRm5+HlO+XMCCA4cBid0COJmFRXVAKTBY7Ovo hqng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=0biD3z7zmbAd3yPu0/4ZJpBDWIf7gZwf31aSZXDeAwY=; b=OPA6NC84hthcNCX/UY0pj86aUjjRskFeW+dtP1FCtwEu6OKE6NIkrpGJur467MoKif wVbJzER3INgmxDcwJiobqqEeXv3Omt+0I0JLg5cDo52yY9FTADQnKzG5wjVRG9fKR86x vP0lOx4eiiY9DZ+Yd6MKQ30h43KE7JQ/G825IEvpxtkXojr1Srm1QboUI1W8T0UHzH+F twf7/qCwOemgJtiFWrwC8Oz5Ilt/oSdr5bKOQRzAz+NsonndzsF04ytJjfJrYr0b00Hv CWByQQbXC6eWcjC9mPS5x7byaUYdFq9bvMSHelW7jhrXjU3jQC8zjTLhl/WjyfVku2J9 Q08w== X-Gm-Message-State: AOAM532N4qK/MyTjXd3vd8Jb7fiqRTnXq2gtdZ+RGfktBPr6zH3+OqMg F8Q174szHmTy3+SU6yjtYdGOzsTeTjOCIg3A X-Google-Smtp-Source: ABdhPJyuHwFbS+6CawE8nxGR7uAizAhXpsop3RrN/9AeMCcdIfAAkRgJxEI9qcS5bqbhOEe+QN1mow== X-Received: by 2002:a9d:3a2:: with SMTP id f31mr151544otf.216.1607465002235; Tue, 08 Dec 2020 14:03:22 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id a18sm17545oia.29.2020.12.08.14.03.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:21 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:19 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 02/24] pack-bitmap: fix header size check Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King When we parse a .bitmap header, we first check that we have enough bytes to make a valid header. We do that based on sizeof(struct bitmap_disk_header). However, as of 0f4d6cada8 (pack-bitmap: make bitmap header handling hash agnostic, 2019-02-19), that struct oversizes its checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the difference between that constant and the size of the actual hash we're using. That commit adjusted the code which moves our pointer forward, but forgot to update the size check. This meant we were overly strict about the header size (requiring room for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in practice it didn't matter because bitmap files tend to have at least 12 bytes of actual data anyway, so it was unlikely for a valid file to be caught by this. Let's fix it by pulling the header size into a separate variable and using it in both spots. That fixes the bug and simplifies the code to make it harder to have a mismatch like this in the future. It will also come in handy in the next patch for more bounds checking. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 4077e731e8..fe5647e72e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -138,9 +138,10 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) static int load_bitmap_header(struct bitmap_index *index) { struct bitmap_disk_header *header = (void *)index->map; + size_t header_size = sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; - if (index->map_size < sizeof(*header) + the_hash_algo->rawsz) - return error("Corrupted bitmap index (missing header data)"); + if (index->map_size < header_size + the_hash_algo->rawsz) + return error("Corrupted bitmap index (too small)"); if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0) return error("Corrupted bitmap index file (wrong header)"); @@ -164,7 +165,7 @@ static int load_bitmap_header(struct bitmap_index *index) } index->entry_count = ntohl(header->entry_count); - index->map_pos += sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; + index->map_pos += header_size; return 0; } From patchwork Tue Dec 8 22:03:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959861 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19710C4361B for ; Tue, 8 Dec 2020 22:04:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DA0A723A7C for ; Tue, 8 Dec 2020 22:04:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730627AbgLHWEP (ORCPT ); Tue, 8 Dec 2020 17:04:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53992 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730609AbgLHWEO (ORCPT ); Tue, 8 Dec 2020 17:04:14 -0500 Received: from mail-oi1-x235.google.com (mail-oi1-x235.google.com [IPv6:2607:f8b0:4864:20::235]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C98AC061794 for ; Tue, 8 Dec 2020 14:03:27 -0800 (PST) Received: by mail-oi1-x235.google.com with SMTP id k2so150459oic.13 for ; Tue, 08 Dec 2020 14:03:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=4+YaVsaIeql0R9c8t74ily3XElRCcVkPGK0I++sGLHk=; b=Mhp0jXbf+oPozHd6q+1GuhMKeklre+V+tm8bswskKOXA+9zG8fVdO/R8/DArMRTRpQ jIp4q4fwW5ygxLK4gN9Y4kTKY6iYbUhunaD5bLyj+yROso6PRGyd2RbvUkz4akxLQaZF RbKMDtSHUgTh4OTEfe/AH7/5oe3F6i8aJxDjyQDaBy5RdI4XN0bzW6u0g0o3KVQNtwBl MyPoQtr5QYhIDQVbG+lTTbCBA/j5X4oTJ7Xk3xIra39cJDPTfTTSt8jtCNIlGOxPJqhv JKfOFfCAle+3pXZG4hi4kmmsko3SY18i6IDT9fYmLfGIf7joJY30hWl3K6dQUI08B7u8 0qyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=4+YaVsaIeql0R9c8t74ily3XElRCcVkPGK0I++sGLHk=; b=mS1ECZufltJEqLPqfnn4nh4ve9/C3xCXQpXagB9v+3wXILKAIZ8nbmpdf/L5amcUw1 KCsqaVSJABwEfDkhrCWHfnUPuqfMBURk5fXkslnbEuxBOv1wMjlfrSC3Lz5EF8abFMG8 ZmHFXAt3O4KxtPzVXEdaf/On57pUoAJC+Apl4NUQueSv0nqlklGAkmRN8S84oUVZ3kmB 1uUhqBicKwz7YuBQmqfb/tobYJvRhcMJSGYmtf3idcJ9dY9Ssuny3iPkxEXpM9LLyP9+ MR90BuOBFLA9bvtiuiPhzIaPfZqPvi7itDQZBwUDOh/VI2ptP5vCpxWIh8wED0W9vHpM L04Q== X-Gm-Message-State: AOAM532Ud2yF+nWfgXLPlHx5fz9eRGcvzDxT8yQA2UQyyEMCgoqRhPAu agclLuhAkYJGpRXEPNUMiyd0b5HLfRHVRJP2 X-Google-Smtp-Source: ABdhPJy5S7rRsYV43puAHd7TIs1uon/XLWuXLfR3d7sjOkxvbcv7RxkoFaYn/36uWLFqSmYtiY+Ycw== X-Received: by 2002:aca:e146:: with SMTP id y67mr2973oig.70.1607465006725; Tue, 08 Dec 2020 14:03:26 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id c18sm16220oib.31.2020.12.08.14.03.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:26 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:24 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 03/24] pack-bitmap: bounds-check size of cache extension Message-ID: <97533dba2720f7709699aaf42afede65a2cb540b.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t values (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps. We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 8 ++++++-- t/t5310-pack-bitmaps.sh | 17 +++++++++++++++-- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index fe5647e72e..074d9ac8f2 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -153,14 +153,18 @@ static int load_bitmap_header(struct bitmap_index *index) /* Parse known bitmap format options */ { uint32_t flags = ntohs(header->options); + size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t)); + unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz; if ((flags & BITMAP_OPT_FULL_DAG) == 0) return error("Unsupported options for bitmap index file " "(Git requires BITMAP_OPT_FULL_DAG)"); if (flags & BITMAP_OPT_HASH_CACHE) { - unsigned char *end = index->map + index->map_size - the_hash_algo->rawsz; - index->hashes = ((uint32_t *)end) - index->pack->num_objects; + if (cache_size > index_end - index->map - header_size) + return error("corrupted bitmap index file (too short to fit hash cache)"); + index->hashes = (void *)(index_end - cache_size); + index_end -= cache_size; } } diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 1d40fcad39..dbe1ffc88a 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -343,7 +343,8 @@ test_expect_success 'pack reuse respects --incremental' ' test_must_be_empty actual ' -test_expect_success 'truncated bitmap fails gracefully' ' +test_expect_success 'truncated bitmap fails gracefully (ewah)' ' + test_config pack.writebitmaphashcache false && git repack -ad && git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && @@ -352,7 +353,19 @@ test_expect_success 'truncated bitmap fails gracefully' ' mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && - test_i18ngrep corrupt stderr + test_i18ngrep corrupt.ewah.bitmap stderr +' + +test_expect_success 'truncated bitmap fails gracefully (cache)' ' + git repack -ad && + git rev-list --use-bitmap-index --count --all >expect && + bitmap=$(ls .git/objects/pack/*.bitmap) && + test_when_finished "rm -f $bitmap" && + test_copy_bytes 512 <$bitmap >$bitmap.tmp && + mv -f $bitmap.tmp $bitmap && + git rev-list --use-bitmap-index --count --all >actual 2>stderr && + test_cmp expect actual && + test_i18ngrep corrupted.bitmap.index stderr ' # have_delta From patchwork Tue Dec 8 22:03:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959863 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F19F3C433FE for ; Tue, 8 Dec 2020 22:04:45 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C2A5F222B3 for ; Tue, 8 Dec 2020 22:04:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730621AbgLHWEP (ORCPT ); Tue, 8 Dec 2020 17:04:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730592AbgLHWEM (ORCPT ); Tue, 8 Dec 2020 17:04:12 -0500 Received: from mail-oi1-x242.google.com (mail-oi1-x242.google.com [IPv6:2607:f8b0:4864:20::242]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF459C06179C for ; Tue, 8 Dec 2020 14:03:31 -0800 (PST) Received: by mail-oi1-x242.google.com with SMTP id s2so229755oij.2 for ; Tue, 08 Dec 2020 14:03:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=OhlqUiYnccM3S4S5W6mpurQ7phrdH7FfjG761/iIhc8=; b=Fv824w8N0lp4gitj4dS9ZflSpzZysPGXyWppLFi5m9JAUC3zupdaL4FXyGZXF5FHuJ KAzwgrNhrDOWFmJ2vZRJS2OPvB/LByfLDuvt9YHq5H98NxqFbVhQhxWH8/H68W3lSmPJ VWh7OqanZOUR7iuiNko3ZN+H7F9bgpey6t84GkeFiZSykooJ2E5CSLkRagnIaDwPrNWg Tj+L6/BqhhZXW+jnC4/MgTI+uC4S8ha/EYWz2qTk+md4AzubSEV5uGZPr9zZaVUSgQ5V bF+cn+4C6Ap6qaCxGlftP4VCMhnbOjd1ehqn37aSSXH/jlYGG+6UZzw2sLd7uF+HUsY7 84mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=OhlqUiYnccM3S4S5W6mpurQ7phrdH7FfjG761/iIhc8=; b=kXQWTKkAETaz/FcbGVv+5rpKutVPEipu1b4lQFSa8Ntos/6yDXKtkqw/Z4dksg8eLY PJZGz0qyIhhG+StftGkNlVS83WYcNU4oafmpzrok05ArIbB3hFUFROCAhWFnC8sYDiBY bXigthsHOD4W5Ved+mrKv6gWCOGR8JXMe+yswYdxH8QXTq7vf9ODMEEgaDrvrQ5+sF2P swnl5gl358HqexZmuh8ri7dtaJupGPZCOMzLNUf9oa2fNTCOx16m4nHH53QCK855jhHE O5R5K6SFg82cJtN6/BktJBtiC/wkLgHxuiagx5O/IvpCfIZ/C3eSPWlg3YCcGI3L8PnT vefA== X-Gm-Message-State: AOAM531ppVgnoJSLAupgwLVHs0vESScPaYb4pxcPnqHuDsvQfbcSuEwL RiiQlPAQ8Ykbb/XL2A2X4qUy5Oc9BMOIIuJX X-Google-Smtp-Source: ABdhPJw4aVIXg1L5yMPsPf76M7P9XnhTdWH/ETugh6p71bRoAOoZJzuGpKTnYjUXcocTnMLlVYNK9w== X-Received: by 2002:aca:c0d7:: with SMTP id q206mr4349019oif.89.1607465011121; Tue, 08 Dec 2020 14:03:31 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id d15sm11951otk.62.2020.12.08.14.03.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:30 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:28 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 04/24] t5310: drop size of truncated ewah bitmap Message-ID: <2e7454d7b9bf2f1953bee54d578434e6831632cd.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We truncate the .bitmap file to 512 bytes and expect to run into problems reading an individual ewah file. But this length is somewhat arbitrary, and just happened to work when the test was added in 9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14). An upcoming commit will change the size of the history we create in the test repo, which will cause this test to fail. We can future-proof it a bit more by reducing the size of the truncated bitmap file. Signed-off-by: Jeff King Helped-by: Junio C Hamano Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index dbe1ffc88a..bf094cfe42 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -22,10 +22,11 @@ has_any () { test_expect_success 'setup repo with moderate-sized history' ' test_commit_bulk --id=file 100 && + git branch -M second && git checkout -b other HEAD~5 && test_commit_bulk --id=side 10 && - git checkout master && - bitmaptip=$(git rev-parse master) && + git checkout second && + bitmaptip=$(git rev-parse second) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && git config repack.writebitmaps true @@ -63,8 +64,8 @@ rev_list_tests() { ' test_expect_success "counting non-linear history ($state)" ' - git rev-list --count other...master >expect && - git rev-list --use-bitmap-index --count other...master >actual && + git rev-list --count other...second >expect && + git rev-list --use-bitmap-index --count other...second >actual && test_cmp expect actual ' @@ -128,7 +129,7 @@ test_expect_success 'setup further non-bitmapped commits' ' rev_list_tests 'partial bitmap' test_expect_success 'fetch (partial bitmap)' ' - git --git-dir=clone.git fetch origin master:master && + git --git-dir=clone.git fetch origin second:second && git rev-parse HEAD >expect && git --git-dir=clone.git rev-parse HEAD >actual && test_cmp expect actual @@ -230,7 +231,7 @@ test_expect_success 'full repack, reusing previous bitmaps' ' ' test_expect_success 'fetch (full bitmap)' ' - git --git-dir=clone.git fetch origin master:master && + git --git-dir=clone.git fetch origin second:second && git rev-parse HEAD >expect && git --git-dir=clone.git rev-parse HEAD >actual && test_cmp expect actual @@ -349,7 +350,7 @@ test_expect_success 'truncated bitmap fails gracefully (ewah)' ' git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && test_when_finished "rm -f $bitmap" && - test_copy_bytes 512 <$bitmap >$bitmap.tmp && + test_copy_bytes 256 <$bitmap >$bitmap.tmp && mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && From patchwork Tue Dec 8 22:03:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959865 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D8E4C1B0D9 for ; Tue, 8 Dec 2020 22:04:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1857B222B3 for ; Tue, 8 Dec 2020 22:04:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730645AbgLHWEW (ORCPT ); Tue, 8 Dec 2020 17:04:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54020 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730609AbgLHWER (ORCPT ); Tue, 8 Dec 2020 17:04:17 -0500 Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E9F7C0617A6 for ; Tue, 8 Dec 2020 14:03:37 -0800 (PST) Received: by mail-ot1-x341.google.com with SMTP id y24so263612otk.3 for ; Tue, 08 Dec 2020 14:03:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=aTPqehF/iR92lHl5qU6Qtb+pF/tTsVor4twzj8uNe5U=; b=OrBiAJ9M+vzL8kOQecVNx+9U7zWgXrxGunr64Gq4NDvdBbOg7WVo9auAWwwiz08oIj DeUqPFh0C0p19hLadl8VhIZuyL9mu6/N1BMTSLathkLufRCGTBLaYqDs2cFu/RqcrtIx Mr2OiUqyObIYm2JmRcv7a2v+Nr6r6YjBv8tjqLW37svG7pSStvyebQUFbpSQgN5G/4D6 qxluIUzC1sZT9JcnJA/isewx2uphV4vpXmHgI2aXPbrPru1rEWCnFkZ0xNUhpUDcCcie Pk3rfpMBzH49Uhu5gbQR/mJoejTtrMxVJ8xREV9OTLJbxksSjaQaY2xaJgY2u6Pn6H8Q kdWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=aTPqehF/iR92lHl5qU6Qtb+pF/tTsVor4twzj8uNe5U=; b=P3mWa5Wu0Vt6OyuKMqrJoSx42aFuLZBt3/ItZf4a2vMJTZNfsKWQCvyo3Vm1B0ZC+N 3eJHfdx8aqZ0MxPoP4RdLyIrUndZCdOV1KHplFKRcL1K97kZ+bDqTnGXbYzjxKPbrdQH 9W4xaJo5kBftgG5jRHwbSfeYTCYQrAtHPL5QJfPpvNC39qfSRk/lMcCmaZuQQOJB/nwl OM+EpTxuoitHxXN0fXQ5DNpo+QA9ZgtCAzjNqpjiQpTosY3LYeQObeU3IStf8zlBbWzX vgmLZjQk7725RAJBZ+gWS+bcSMY7bGx1+FJi9taQKsfmiKGaaqn+U98Y82yZ0mOw2ZVF wDoA== X-Gm-Message-State: AOAM531rDceN3/V2RUHb3NVoZWOuXay1PTSGLmIz7BjaMA2qHi/5Bxh9 CawGwCtSr9MVi7loY6OJ1e9VX4mwZDofbgqW X-Google-Smtp-Source: ABdhPJyHurmSs0e46sPwPI/PFiHwYkCiqfu33wSST/+e3u8Pt3K9FNW4loiY8DKYLAGKwv7g2OnS4w== X-Received: by 2002:a9d:2ac3:: with SMTP id e61mr144775otb.252.1607465015649; Tue, 08 Dec 2020 14:03:35 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id v3sm8391oth.80.2020.12.08.14.03.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:35 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:33 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 05/24] rev-list: die when --test-bitmap detects a mismatch Message-ID: <3cb41563725a16f87f65eb75c5c4f61b79abe1cc.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King You can use "git rev-list --test-bitmap HEAD" to check that bitmaps produce the same answer we'd get from a regular traversal. But if we detect an error, we only print "mismatch", and still exit with a successful error code. That makes the uses of --test-bitmap in the test suite (e.g., in t5310) mostly pointless: even if we saw an error, the tests wouldn't notice. Let's instead call die(), which will let these tests work as designed, and alert us if the bitmaps are bogus. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 074d9ac8f2..4431f9f120 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1328,7 +1328,7 @@ void test_bitmap_walk(struct rev_info *revs) if (bitmap_equals(result, tdata.base)) fprintf(stderr, "OK!\n"); else - fprintf(stderr, "Mismatch!\n"); + die("mismatch in bitmap results"); free_bitmap_index(bitmap_git); } From patchwork Tue Dec 8 22:03:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959867 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53ABBC2BB40 for ; Tue, 8 Dec 2020 22:04:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3048523A7C for ; Tue, 8 Dec 2020 22:04:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730666AbgLHWEa (ORCPT ); Tue, 8 Dec 2020 17:04:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54032 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730608AbgLHWE1 (ORCPT ); Tue, 8 Dec 2020 17:04:27 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4E7DFC0617A7 for ; Tue, 8 Dec 2020 14:03:41 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id h18so205798otq.12 for ; Tue, 08 Dec 2020 14:03:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=S+PfCUh8SD4AoMOsjP36g0QEn/qzOI2Cq9Wdhx/41HM=; b=L8eTbS4aQZ3hxwgtGK3f1X6bNgbZcUNaFLzvqdFJQpZCxt/Hli1XILrGnEpixRk092 6C78kog7v0FhqpuwQ3SOIe1HB0JrBRvA0Nhd0YRAcG8b7VFYDqCKE/Xc4zYFhtVs2Mxh a2HLQpvFMR/OdM1OGl6ogrWWJ54veOQE+tRCE5x6lw34ChFYpl4XvYdWnkgStBIKcG6r aotw4Enytmsq6HIdmIuWpKsMO5QPzSie6va1gC/jKRdfUSFViKQExkVb4TBLcCZWH7C7 eQQjPqjaeUtfe46/wIGWHN77WfT4mTwWZlJQU1N8XFUbqjqlA0t+U8l6XXlEOFho7pv0 e1Rw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=S+PfCUh8SD4AoMOsjP36g0QEn/qzOI2Cq9Wdhx/41HM=; b=nWGACugTvRS2mwrohiFeU/s+wif4NHpVHp9cNg3nIm8NnbTCCi7Gb2UVL5ZH8fnpEx uE2QMUpfFAe31TD02yaWhbMtGyYJoSr86O2sOWWa95psbrhcXkTGfQznmhRPgBhnZtVA Ds9ePr6JDdzpYKeCudE0KqA334M2It5BfvL+WI+fLIAhJNI5ohKRQZWC8DgJf5vAfTAd r57xVVDT+gAoMk6cEie+dWAxB5ntVJWFbGxVMnZV6sHDX0fa41lKT+hLH66j8fEREi4t HE3Ka76ndJvfNPqnRFoBU3af39ZoEB6f3lteCgya6CxQFpZVuuWPKGlrtmsjcvpWCkHB PTGA== X-Gm-Message-State: AOAM532//uBlIXAvZ4eW+5kwggObcUZwThmidIXUubwTkbaKD7Tj23dY nDTRTn9r2tc8Ia1aVvq0T4xFFYGVKFvT4Wom X-Google-Smtp-Source: ABdhPJzXXhFPpRhUATanC/NuUxLSSW85lnl+YAVqbfyktZnUhbWfs9MawPBvGEgN8gW84HuMtob1hw== X-Received: by 2002:a05:6830:1f50:: with SMTP id u16mr104415oth.265.1607465020429; Tue, 08 Dec 2020 14:03:40 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id a4sm67416oot.6.2020.12.08.14.03.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:39 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:38 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 06/24] ewah: factor out bitmap growth Message-ID: <570bf22425b1b9d9088846e39b26b07265079ece.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We auto-grow bitmaps when somebody asks to set a bit whose position is outside of our currently allocated range. Other operations besides single bit-setting might need to do this, too, so let's pull it into its own function. Note that we change the semantics a little: you now ask for the number of words you'd like to have, not the id of the block you'd like to write to. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index d8cec585af..7c1ecfa6fd 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,18 +35,22 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } -void bitmap_set(struct bitmap *self, size_t pos) +static void bitmap_grow(struct bitmap *self, size_t word_alloc) { - size_t block = EWAH_BLOCK(pos); - - if (block >= self->word_alloc) { + if (word_alloc > self->word_alloc) { size_t old_size = self->word_alloc; - self->word_alloc = block ? block * 2 : 1; + self->word_alloc = word_alloc * 2; REALLOC_ARRAY(self->words, self->word_alloc); memset(self->words + old_size, 0x0, (self->word_alloc - old_size) * sizeof(eword_t)); } +} +void bitmap_set(struct bitmap *self, size_t pos) +{ + size_t block = EWAH_BLOCK(pos); + + bitmap_grow(self, block + 1); self->words[block] |= EWAH_MASK(pos); } From patchwork Tue Dec 8 22:03:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959869 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D125CC2BBCA for ; Tue, 8 Dec 2020 22:04:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8AB8A23A7C for ; Tue, 8 Dec 2020 22:04:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730677AbgLHWEf (ORCPT ); Tue, 8 Dec 2020 17:04:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730668AbgLHWEb (ORCPT ); Tue, 8 Dec 2020 17:04:31 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E32FC0617B0 for ; Tue, 8 Dec 2020 14:03:45 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id i6so269983otr.2 for ; Tue, 08 Dec 2020 14:03:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=TKUdskbrHTgE2dMosZfIVMUOZBpta233l6JHsaPug8M=; b=dbKhEtJ5K+jZCdJLmzN34XNwMKR3c/YFIrAMBTM6/Y1rfzvd9VzAsR9MqtI/Uw2/lq hTYMXKko9Nl8ECVXXgjGMoRm1VGSXagJT/isIp9jyOzh4xBPzSTxqMSw6cwnaYnNYrd8 kvGDIGb1LniStuhwkUM7yBF5gQPJIIVX3W6FzAcNjNETikPXjFskFK61aY5WJyLWHCuz xMbkNrejYA1ZF9ozJjQAnInvdh5Ea/vG7JTgkvNzsV8euS8zOAIIE3WvNUUH1M80BPRs B6FAP/pIKRDrBaX5wZDv1TVldzTRsouPP1VSf+G+7cuk2qlNTcvdWpBTc/CjKShQe4Dw B1mg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=TKUdskbrHTgE2dMosZfIVMUOZBpta233l6JHsaPug8M=; b=OezKNzZNCr8olzJMfSzEnZOWkQDzMhvCR0x6aQ5LROs17IXXX7YV2yO8mB59F77RhB are1SpkeqFVtHM5Oh0VXGs+Mbd6+jBiQPN8zf56nYXdRHAWzIjAZMG4f0zVc2YNtnQu9 DvbWqyboFYfa4Nj5b44Js3XNE08CqxwDbCvsHPCv8bxWxey4mYjzBgxtOoPGaggxmjyB ldQQWjsqYI5kQfCnifs+V8X9o7E34uEYzRxCwvWWXRZOJpG/N8D8JA8Ec6siLhLj6a5Y 6L41+NBMHnCefMgGogiNZskmPo+eNy+hmIiF3JzEE6HOdfFvdDLX5Q3BwI017k2S3m5d oprg== X-Gm-Message-State: AOAM5310BUQ9pF/wNglw8RE6+EE9FVtRL7oSYJE0rnp7+Mkxl2P9j/sJ CZCSKBNtIFmzMrcbu1uZ7kI6JGXCNBO/50gx X-Google-Smtp-Source: ABdhPJyYvvmp8tuVn3el5RQUpKesIE41+P9y6ZNUD/T/KO1SI4VrrivKzKJ+m82NPbyYlZdvX+wLpA== X-Received: by 2002:a9d:5f03:: with SMTP id f3mr124661oti.91.1607465024641; Tue, 08 Dec 2020 14:03:44 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id 60sm16630ott.32.2020.12.08.14.03.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:44 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:42 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 07/24] ewah: make bitmap growth less aggressive Message-ID: <48a1949ee62113fdb187287de4e3584c45f444e8.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King If you ask to set a bit in the Nth word and we haven't yet allocated that many slots in our array, we'll increase the bitmap size to 2*N. This means we might frequently end up with bitmaps that are twice the necessary size (as soon as you ask for the biggest bit, we'll size up to twice that). But if we just allocate as many words as were asked for, we may not grow fast enough. The worst case there is setting bit 0, then 1, etc. Each time we grow we'd just extend by one more word, giving us linear reallocations (and quadratic memory copies). A middle ground is relying on alloc_nr(), which causes us to grow by a factor of roughly 3/2 instead of 2. That's less aggressive than doubling, and it may help avoid fragmenting memory. (If we start with N, then grow twice, our total is N*(3/2)^2 = 9N/4. After growing twice, that array of size 9N/4 can fit into the space vacated by the original array and first growth, N+3N/2 = 10N/4 > 9N/4, leading to less fragmentation in memory). Our worst case is still 3/2N wasted bits (you set bit N-1, then setting bit N causes us to grow by 3/2), but our average should be much better. This isn't usually that big a deal, but it will matter as we shift the reachability bitmap generation code to store more bitmaps in memory. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 7c1ecfa6fd..6f9e5c529b 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -37,13 +37,10 @@ struct bitmap *bitmap_new(void) static void bitmap_grow(struct bitmap *self, size_t word_alloc) { - if (word_alloc > self->word_alloc) { - size_t old_size = self->word_alloc; - self->word_alloc = word_alloc * 2; - REALLOC_ARRAY(self->words, self->word_alloc); - memset(self->words + old_size, 0x0, - (self->word_alloc - old_size) * sizeof(eword_t)); - } + size_t old_size = self->word_alloc; + ALLOC_GROW(self->words, word_alloc, self->word_alloc); + memset(self->words + old_size, 0x0, + (self->word_alloc - old_size) * sizeof(eword_t)); } void bitmap_set(struct bitmap *self, size_t pos) From patchwork Tue Dec 8 22:03:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959877 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1621C4167B for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9E34223B54 for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730726AbgLHWEz (ORCPT ); Tue, 8 Dec 2020 17:04:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730698AbgLHWEs (ORCPT ); Tue, 8 Dec 2020 17:04:48 -0500 Received: from mail-oi1-x241.google.com (mail-oi1-x241.google.com [IPv6:2607:f8b0:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A69DEC06138C for ; Tue, 8 Dec 2020 14:03:49 -0800 (PST) Received: by mail-oi1-x241.google.com with SMTP id s2so230485oij.2 for ; Tue, 08 Dec 2020 14:03:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=bsuIqyXlLj77HI6Jh/NOCqFh+idsp+iFSjpkcZnqfL8=; b=NMgc3IS/5LpOalCHIkhuudbDn/wV2TXvI1cRQzmU97HVVmxBaakIt+77GL97+/cwKD AFNzGRZdUdh3zYz6Kz2KgwneXd7BYr3zb0/tPvWnW771G/8DkfJjG0h3ojLEbS4hM7+s LuEiampXEdW3Dhqzw1CE0Ij9r0YI5OeLcNp/+oL9CFDvtrr3i99VMhzzjtQS9BlUbxuC zsTeDBJtfLQYKqBMK5xna93vnMgxXwFTYnx43tiR9JuI2WUwicYuQcyFLVyFtWbQ3lBk Ncs+uJJg8T1cW0YfYlAITRtLZmwGufNgoSoj32T6KEhF4KUtFE0Mi41MXdNRHBLft4DK XTLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=bsuIqyXlLj77HI6Jh/NOCqFh+idsp+iFSjpkcZnqfL8=; b=B90LjeZlx/vPGTw0KP+fBY99sG6Ac0qg7qan+J0I2/tmNkf6gG2epa3O9QIYp6sx5r FI3Em8FV0ggQbcVHnjqRS4Yoe5KSxnh8ueBGw/PmxxND93iH9rgWEQDy5HO20vXNTtPG I497AisySzQTotuuFntayolLo6955lf5ZQsjwIGmkmTkbnPy9vH8nUlf2o5wY5x+/Aue JuqvZ6AQhJO5bqXLygTMZ86bVMmuOvhNNtLQJgQ/J25hegLQPYqM/EpM/w03N+bInhQK 1MPsfKJBy5ZxWLGGLoCaz2zsrY5lhQ2C8jSd7b1HR5WBUESbgswXwt5X3oPl1A2LBF+6 5p+Q== X-Gm-Message-State: AOAM531WMR23G7+/BMHkKKwsD8kLLPtLF8XPchOI/Xh0P83NTX/Zmjnx KTtTfAcHsyRrSuCmG0wCEzujUKjW0UV/xUZO X-Google-Smtp-Source: ABdhPJzsJgh5AyExeMDo91cKpRnILWKIh+xPiVytEVxrSFiaZl5/pwQQwaX/FMifbvch/BFDPrf8og== X-Received: by 2002:a05:6808:9b7:: with SMTP id e23mr4406850oig.167.1607465028850; Tue, 08 Dec 2020 14:03:48 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id q4sm67298ooo.1.2020.12.08.14.03.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:48 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:46 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 08/24] ewah: implement bitmap_or() Message-ID: <04bf0de4742386b2c5bce7bbfdb07cbaefc637c8.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We have a function to bitwise-OR an ewah into an uncompressed bitmap, but not to OR two uncompressed bitmaps. Let's add it. Interestingly, we have a public header declaration going back to e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the function was never implemented. That was all OK since there were no users of 'bitmap_or()', but a first caller will be added in a couple of patches. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 6f9e5c529b..0a3502603f 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -122,6 +122,15 @@ void bitmap_and_not(struct bitmap *self, struct bitmap *other) self->words[i] &= ~other->words[i]; } +void bitmap_or(struct bitmap *self, const struct bitmap *other) +{ + size_t i; + + bitmap_grow(self, other->word_alloc); + for (i = 0; i < other->word_alloc; i++) + self->words[i] |= other->words[i]; +} + void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other) { size_t original_size = self->word_alloc; From patchwork Tue Dec 8 22:03:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959875 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B03E6C19437 for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 87D6323AC4 for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730716AbgLHWEz (ORCPT ); Tue, 8 Dec 2020 17:04:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54088 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730697AbgLHWEs (ORCPT ); Tue, 8 Dec 2020 17:04:48 -0500 Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49603C061282 for ; Tue, 8 Dec 2020 14:03:54 -0800 (PST) Received: by mail-ot1-x341.google.com with SMTP id o11so255543ote.4 for ; Tue, 08 Dec 2020 14:03:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=tgwQSV0Jx2RFdjihG4qsp3emQ8nruJR26Qc29XYgvWk=; b=oDWgLk2eutaDLo36krbya8wCZEvFHPhyPDejQJVfqZ8UZFRzwmPeE9wC25nhWimTx5 2NqBnZaK9skVuS9LfMwTPuzEH+XNT2tslkHFv17SQjvgXBmrqa2VWV/Wm0n6HwAPOVPT sxkMbAJQf+3Vag4kd0qS+BalrZJOJiI5IntnRYAKgkPr6I05tc6vUZP0nAGovXcaLzm1 jbyuJLgTIc+1H4VIWOfKaFeMHMRAqNuLiInC6AUmUVZIjGish4BX3CqbpWNEITaTa9JD 0eEEp5rWDxNAI7ll7RFnboYT7u1oNqSheKoZ4rm5y9v7folZsdJl6V1Ltu60r8rpPZ3R L9Gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=tgwQSV0Jx2RFdjihG4qsp3emQ8nruJR26Qc29XYgvWk=; b=rMPCDUPwIy8lCK1fvLADKfFQ4MVlwAyci90p3U3+A5kJ+SJuwlya5fGud75NcqZTEX KzDOML22ZZm191kveLKf1NDqyEbIdo3Abb3rfJ1QgBkw0QqvLk5XFtVKPP0a9aS3QXX7 8vhE5O+BCcug0uwBpqdlvTo1Va3clsGiGaM/gzA6U5Lcr5OjdZSoizaWdPf6VdlvDHIw ab/Tk5onIBsY4ywb80WtSkzj9xsiNVlTYT2uw8Abu8qxrQy4lR08Ql4fikVdgYJxTtrr Lxq3xO7tNEHFqV9IEtA+9Ed7DPeAWeMkOLMqPdOtMoaLEWQq1OrCzjnIFQDAptA/S04G MVrw== X-Gm-Message-State: AOAM530WQ+W5oD/4Gba2ggtQNC1BdTzvvNSC5YoI7u8xaGiKvwX7yXcQ JeMl0jBSzqqkDMz7ax1PgJIHT7SgPK8ygvhv X-Google-Smtp-Source: ABdhPJxlI/MRTD6NlzPCWOVIsKxqCL/m6ORPvwHJAHUBfQ6VGRyt1xIZ4sWiemls0yHUjvgN6UMjrA== X-Received: by 2002:a9d:39f5:: with SMTP id y108mr129616otb.63.1607465033396; Tue, 08 Dec 2020 14:03:53 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id l1sm55670ooi.48.2020.12.08.14.03.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:52 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:50 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 09/24] ewah: add bitmap_dup() function Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King There's no easy way to make a copy of a bitmap. Obviously a caller can iterate over the bits and set them one by one in a new bitmap, but we can go much faster by copying whole words with memcpy(). Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 7 +++++++ ewah/ewok.h | 1 + 2 files changed, 8 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 0a3502603f..b5f6376282 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,6 +35,13 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } +struct bitmap *bitmap_dup(const struct bitmap *src) +{ + struct bitmap *dst = bitmap_word_alloc(src->word_alloc); + COPY_ARRAY(dst->words, src->words, src->word_alloc); + return dst; +} + static void bitmap_grow(struct bitmap *self, size_t word_alloc) { size_t old_size = self->word_alloc; diff --git a/ewah/ewok.h b/ewah/ewok.h index 011852bef1..1fc555e672 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -173,6 +173,7 @@ struct bitmap { struct bitmap *bitmap_new(void); struct bitmap *bitmap_word_alloc(size_t word_alloc); +struct bitmap *bitmap_dup(const struct bitmap *src); void bitmap_set(struct bitmap *self, size_t pos); void bitmap_unset(struct bitmap *self, size_t pos); int bitmap_get(struct bitmap *self, size_t pos); From patchwork Tue Dec 8 22:03:55 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959873 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B645C4361B for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4700523A33 for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730691AbgLHWEr (ORCPT ); Tue, 8 Dec 2020 17:04:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54106 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730688AbgLHWEr (ORCPT ); Tue, 8 Dec 2020 17:04:47 -0500 Received: from mail-ot1-x32e.google.com (mail-ot1-x32e.google.com [IPv6:2607:f8b0:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1A7AC061285 for ; Tue, 8 Dec 2020 14:03:58 -0800 (PST) Received: by mail-ot1-x32e.google.com with SMTP id h18so206545otq.12 for ; Tue, 08 Dec 2020 14:03:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=8a1DVfwodNSPG/KEl4b7OreWUlMbtW7oinAE1F5TJy0=; b=RzbxPXlmQwr9c1B2rjqsfX4vaDDp0bIMjwsZhYC12usQC7OaOeNp05awFwVWUYhGN0 w6hBGsPxnd8YQsTb77WWwcqDf5xZCgDd3IBE+Htnp7xUkmwZYvIe2xS0HXDX7/X6bOWk LAxgPsVxz5glgoxXZNO/ElkEFNqEycxGXwbSVlvyzNMzfrGRHa5ibeoUvUdamSpzVYuM Bf97HiGNkgOHH+6LTP3b4foBjuCJfrCpgf1GDOJaRHy47afBefHOeTMTZqBLvXkP6Rs7 Q9MrDDQ73Xo0NmKrmuwnLv1fYiltbtaLXxfzEqCs+x7tEvUXTOdj3Ubonlyatt0lYjz8 APnA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=8a1DVfwodNSPG/KEl4b7OreWUlMbtW7oinAE1F5TJy0=; b=WMxrU2RRIr7SwSdwXGa2FeiT8Ti/XC+c2u1D0j29JV8im4qMjF07qB0YnStt/TDOdz OczIeF9hA09eGGszT9p5g43VDSWzqrqiJw1Z56+0a+lDsuAkijtn2o8l7DH1Sh/xQpJa QTFK1potj6vJhbYH7kkwdrc2QFIXpXuvIOQhaEjK/DXsLpP0RQvgbEx8zQjqTEHvNrPQ c4ocoW7h3z3FBS3IaSq4WtizDciE3aHfK13pY7FhfBFSgHa5bUnQrjp8VrRm6sBjxGgS G2HSB12HiFfXgXSDD9xoy9SHhl5TevUyrD15FdLDxhi/DlTb1dNnJqA4ZdQMApryjXOy yzQw== X-Gm-Message-State: AOAM530gFhoWN8QwBQShxPLF3yfEQe97dL5ZQqGsdE9BvTPan/mAPg3o cgthGA/JpNcBqLysQNQ+rjrexI1CIfiP4JRf X-Google-Smtp-Source: ABdhPJzws9gByl+S1GJ2frJaNqEIk6n0XqdElzyGa+vWWZ0JRl1prrkYS40hx2dHrJmuBKrtl+3+Kg== X-Received: by 2002:a9d:708e:: with SMTP id l14mr128140otj.87.1607465037824; Tue, 08 Dec 2020 14:03:57 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id x12sm11982oic.51.2020.12.08.14.03.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:03:57 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:55 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 10/24] pack-bitmap-write: reimplement bitmap writing Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The bitmap generation code works by iterating over the set of commits for which we plan to write bitmaps, and then for each one performing a traditional traversal over the reachable commits and trees, filling in the bitmap. Between two traversals, we can often reuse the previous bitmap result as long as the first commit is an ancestor of the second. However, our worst case is that we may end up doing "n" complete complete traversals to the root in order to create "n" bitmaps. In a real-world case (the shared-storage repo consisting of all GitHub forks of chromium/chromium), we perform very poorly: generating bitmaps takes ~3 hours, whereas we can walk the whole object graph in ~3 minutes. This commit completely rewrites the algorithm, with the goal of accessing each object only once. It works roughly like this: - generate a list of commits in topo-order using a single traversal - invert the edges of the graph (so have parents point at their children) - make one pass in reverse topo-order, generating a bitmap for each commit and passing the result along to child nodes We generate correct results because each node we visit has already had all of its ancestors added to the bitmap. And we make only two linear passes over the commits. We also visit each tree usually only once. When filling in a bitmap, we don't bother to recurse into trees whose bit is already set in the bitmap (since we know we've already done so when setting their bit). That means that if commit A references tree T, none of its descendants will need to open T again. I say "usually", though, because it is possible for a given tree to be mentioned in unrelated parts of history (e.g., cherry-picking to a parallel branch). So we've accomplished our goal, and the resulting algorithm is pretty simple to understand. But there are some downsides, at least with this initial implementation: - we no longer reuse the results of any on-disk bitmaps when generating. So we'd expect to sometimes be slower than the original when bitmaps already exist. However, this is something we'll be able to add back in later. - we use much more memory. Instead of keeping one bitmap in memory at a time, we're passing them up through the graph. So our memory use should scale with the graph width (times the size of a bitmap). So how does it perform? For a clone of linux.git, generating bitmaps from scratch with the old algorithm took 63s. Using this algorithm it takes 205s. Which is much worse, but _might_ be acceptable if it behaved linearly as the size grew. It also increases peak heap usage by ~1G. That's not impossibly large, but not encouraging. On the complete fork-network of torvalds/linux, it increases the peak RAM usage by 40GB. Yikes. (I forgot to record the time it took, but the memory usage was too much to consider this reasonable anyway). On the complete fork-network of chromium/chromium, I ran out of memory before succeeding. Some back-of-the-envelope calculations indicate it would need 80+GB to complete. So at this stage, we've managed to make things much worse. But because of the way this new algorithm is structured, there are a lot of opportunities for optimization on top. We'll start implementing those in the follow-on patches. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 306 +++++++++++++++++++++++++------------------- 1 file changed, 172 insertions(+), 134 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 5e998bdaa7..bcd059ccd9 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -110,8 +110,6 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, /** * Compute the actual bitmaps */ -static struct object **seen_objects; -static unsigned int seen_objects_nr, seen_objects_alloc; static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) { @@ -127,21 +125,6 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm writer.selected_nr++; } -static inline void mark_as_seen(struct object *object) -{ - ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc); - seen_objects[seen_objects_nr++] = object; -} - -static inline void reset_all_seen(void) -{ - unsigned int i; - for (i = 0; i < seen_objects_nr; ++i) { - seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN); - } - seen_objects_nr = 0; -} - static uint32_t find_object_pos(const struct object_id *oid) { struct object_entry *entry = packlist_find(writer.to_pack, oid); @@ -154,60 +137,6 @@ static uint32_t find_object_pos(const struct object_id *oid) return oe_in_pack_pos(writer.to_pack, entry); } -static void show_object(struct object *object, const char *name, void *data) -{ - struct bitmap *base = data; - bitmap_set(base, find_object_pos(&object->oid)); - mark_as_seen(object); -} - -static void show_commit(struct commit *commit, void *data) -{ - mark_as_seen((struct object *)commit); -} - -static int -add_to_include_set(struct bitmap *base, struct commit *commit) -{ - khiter_t hash_pos; - uint32_t bitmap_pos = find_object_pos(&commit->object.oid); - - if (bitmap_get(base, bitmap_pos)) - return 0; - - hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid); - if (hash_pos < kh_end(writer.bitmaps)) { - struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos); - bitmap_or_ewah(base, bc->bitmap); - return 0; - } - - bitmap_set(base, bitmap_pos); - return 1; -} - -static int -should_include(struct commit *commit, void *_data) -{ - struct bitmap *base = _data; - - if (!add_to_include_set(base, commit)) { - struct commit_list *parent = commit->parents; - - mark_as_seen((struct object *)commit); - - while (parent) { - parent->item->object.flags |= SEEN; - mark_as_seen((struct object *)parent->item); - parent = parent->next; - } - - return 0; - } - - return 1; -} - static void compute_xor_offsets(void) { static const int MAX_XOR_OFFSET_SEARCH = 10; @@ -248,79 +177,188 @@ static void compute_xor_offsets(void) } } -void bitmap_writer_build(struct packing_data *to_pack) +struct bb_commit { + struct commit_list *children; + struct bitmap *bitmap; + unsigned selected:1; + unsigned idx; /* within selected array */ +}; + +define_commit_slab(bb_data, struct bb_commit); + +struct bitmap_builder { + struct bb_data data; + struct commit **commits; + size_t commits_nr, commits_alloc; +}; + +static void bitmap_builder_init(struct bitmap_builder *bb, + struct bitmap_writer *writer) { - static const double REUSE_BITMAP_THRESHOLD = 0.2; - - int i, reuse_after, need_reset; - struct bitmap *base = bitmap_new(); struct rev_info revs; + struct commit *commit; + unsigned int i; + + memset(bb, 0, sizeof(*bb)); + init_bb_data(&bb->data); + + reset_revision_walk(); + repo_init_revisions(writer->to_pack->repo, &revs, NULL); + revs.topo_order = 1; + + for (i = 0; i < writer->selected_nr; i++) { + struct commit *c = writer->selected[i].commit; + struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->idx = i; + add_pending_object(&revs, &c->object, ""); + } + + if (prepare_revision_walk(&revs)) + die("revision walk setup failed"); + + while ((commit = get_revision(&revs))) { + struct commit_list *p; + + parse_commit_or_die(commit); + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + + for (p = commit->parents; p; p = p->next) { + struct bb_commit *ent = bb_data_at(&bb->data, p->item); + commit_list_insert(commit, &ent->children); + } + } +} + +static void bitmap_builder_clear(struct bitmap_builder *bb) +{ + clear_bb_data(&bb->data); + free(bb->commits); + bb->commits_nr = bb->commits_alloc = 0; +} + +static void fill_bitmap_tree(struct bitmap *bitmap, + struct tree *tree) +{ + uint32_t pos; + struct tree_desc desc; + struct name_entry entry; + + /* + * If our bit is already set, then there is nothing to do. Both this + * tree and all of its children will be set. + */ + pos = find_object_pos(&tree->object.oid); + if (bitmap_get(bitmap, pos)) + return; + bitmap_set(bitmap, pos); + + if (parse_tree(tree) < 0) + die("unable to load tree object %s", + oid_to_hex(&tree->object.oid)); + init_tree_desc(&desc, tree->buffer, tree->size); + + while (tree_entry(&desc, &entry)) { + switch (object_type(entry.mode)) { + case OBJ_TREE: + fill_bitmap_tree(bitmap, + lookup_tree(the_repository, &entry.oid)); + break; + case OBJ_BLOB: + bitmap_set(bitmap, find_object_pos(&entry.oid)); + break; + default: + /* Gitlink, etc; not reachable */ + break; + } + } + + free_tree_buffer(tree); +} + +static void fill_bitmap_commit(struct bb_commit *ent, + struct commit *commit) +{ + if (!ent->bitmap) + ent->bitmap = bitmap_new(); + + /* + * mark ourselves, but do not bother with parents; their values + * will already have been propagated to us + */ + bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); +} + +static void store_selected(struct bb_commit *ent, struct commit *commit) +{ + struct bitmapped_commit *stored = &writer.selected[ent->idx]; + khiter_t hash_pos; + int hash_ret; + + /* + * the "reuse bitmaps" phase may have stored something here, but + * our new algorithm doesn't use it. Drop it. + */ + if (stored->bitmap) + ewah_free(stored->bitmap); + + stored->bitmap = bitmap_to_ewah(ent->bitmap); + + hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); + if (hash_ret == 0) + die("Duplicate entry when writing index: %s", + oid_to_hex(&commit->object.oid)); + kh_value(writer.bitmaps, hash_pos) = stored; +} + +void bitmap_writer_build(struct packing_data *to_pack) +{ + struct bitmap_builder bb; + size_t i; + int nr_stored = 0; /* for progress */ writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; if (writer.show_progress) writer.progress = start_progress("Building bitmaps", writer.selected_nr); - - repo_init_revisions(to_pack->repo, &revs, NULL); - revs.tag_objects = 1; - revs.tree_objects = 1; - revs.blob_objects = 1; - revs.no_walk = 0; - - revs.include_check = should_include; - reset_revision_walk(); - - reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD; - need_reset = 0; - - for (i = writer.selected_nr - 1; i >= 0; --i) { - struct bitmapped_commit *stored; - struct object *object; - - khiter_t hash_pos; - int hash_ret; - - stored = &writer.selected[i]; - object = (struct object *)stored->commit; - - if (stored->bitmap == NULL) { - if (i < writer.selected_nr - 1 && - (need_reset || - !in_merge_bases(writer.selected[i + 1].commit, - stored->commit))) { - bitmap_reset(base); - reset_all_seen(); - } - - add_pending_object(&revs, object, ""); - revs.include_check_data = base; - - if (prepare_revision_walk(&revs)) - die("revision walk setup failed"); - - traverse_commit_list(&revs, show_commit, show_object, base); - - object_array_clear(&revs.pending); - - stored->bitmap = bitmap_to_ewah(base); - need_reset = 0; - } else - need_reset = 1; - - if (i >= reuse_after) - stored->flags |= BITMAP_FLAG_REUSE; - - hash_pos = kh_put_oid_map(writer.bitmaps, object->oid, &hash_ret); - if (hash_ret == 0) - die("Duplicate entry when writing index: %s", - oid_to_hex(&object->oid)); - - kh_value(writer.bitmaps, hash_pos) = stored; - display_progress(writer.progress, writer.selected_nr - i); + trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", + the_repository); + + bitmap_builder_init(&bb, &writer); + for (i = bb.commits_nr; i > 0; i--) { + struct commit *commit = bb.commits[i-1]; + struct bb_commit *ent = bb_data_at(&bb.data, commit); + struct commit *child; + + fill_bitmap_commit(ent, commit); + + if (ent->selected) { + store_selected(ent, commit); + nr_stored++; + display_progress(writer.progress, nr_stored); + } + + while ((child = pop_commit(&ent->children))) { + struct bb_commit *child_ent = + bb_data_at(&bb.data, child); + + if (child_ent->bitmap) + bitmap_or(child_ent->bitmap, ent->bitmap); + else + child_ent->bitmap = bitmap_dup(ent->bitmap); + } + bitmap_free(ent->bitmap); + ent->bitmap = NULL; } + bitmap_builder_clear(&bb); + + trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", + the_repository); - bitmap_free(base); stop_progress(&writer.progress); compute_xor_offsets(); From patchwork Tue Dec 8 22:03:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959879 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DFCB4C1B0D9 for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B22FE23B55 for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730733AbgLHWE6 (ORCPT ); Tue, 8 Dec 2020 17:04:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730715AbgLHWEw (ORCPT ); Tue, 8 Dec 2020 17:04:52 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 187AAC0613D6 for ; Tue, 8 Dec 2020 14:04:03 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id d8so245103otq.6 for ; Tue, 08 Dec 2020 14:04:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=41w+8dMxR92mbEFLoU66S+t8xwPreIFljCrA8SXjhWo=; b=P3D79hlhKv8xjGZb3FYa+9U/TgbMdaBTjQ+/Niesc0D1PGGjL8tWWyJuY7TQIiw8px SHkBx2XVhKYUYVTMFc9e8IN4kLGWYYMH5yLoRe0fj0JnczRLveUiCFdiUgI07OYIgCzk 3mJg4cKvpw4POBLCtTxgRY3MaevG3PLMWAeMpCrVDR1nlyn3gJ/WZmVgyMcfbWKZ7z3J D16se0UxFDkP4RlzZ8fOjcm3px6aekC7ZYV3IAfKt1w8vU1T9CEhREo6y4Vl0eLm3MSs RowqhjQ8oU3hK8v2AL3MGCN1YQGsxqtTmfG2esUwp2z5C3YTBTF7lbJ80dUe4UkZzud3 BXYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=41w+8dMxR92mbEFLoU66S+t8xwPreIFljCrA8SXjhWo=; b=GfqOdoD7tYN2w0Z7opcRGOLQBPFFCvfRb+L88ehQEgdrhEAuOF9I3Y8LTYsurC1CUc kR62ND/eWZ1g7CCaLjkcW047II1EdeZxE3C+ybRVjFC9Ndv4HxbZ3/Fw94Qc+brnlNxi WPH+6BAsbhlAjxCEtO8b0oLVjBADxGBcg27NMnqTo95IR/bLwSc3NOpO1F1OM1TSwj70 uGPnXCbStz29+SkRWpjMjoff1tBDUWnjnIXk49DsbxvRjlvdW/X1RuazikTiJGpu6rew zLzNoPzZK4yBSO2dh2dAgWa7q78qwcWO14xVmve2nssx5VMwr56yxXC5j1WAcaQcEz/R cO+A== X-Gm-Message-State: AOAM533zGKfVDYcGiCgk0Ooo9QBrV7uCcuO5rDdp5c20GP0wiwnYUM85 a3TCV9pYxGu/wgoLVoTMEwWbKOaMDQjYhCfT X-Google-Smtp-Source: ABdhPJxb1dZc0fFA2bPa/3yOYyMXhd1SNx6Fa7gFu57C5JtYqnchgRSUoU5WVvh8t+9zZKCEvVl4lg== X-Received: by 2002:a9d:774a:: with SMTP id t10mr150037otl.190.1607465042212; Tue, 08 Dec 2020 14:04:02 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id z10sm69578oom.3.2020.12.08.14.04.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:01 -0800 (PST) Date: Tue, 8 Dec 2020 17:03:59 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King Our algorithm to generate reachability bitmaps walks through the commit graph from the bottom up, passing bitmap data from each commit to its descendants. For a linear stretch of history like: A -- B -- C our sequence of steps is: - compute the bitmap for A by walking its trees, etc - duplicate A's bitmap as a starting point for B; we can now free A's bitmap, since we only needed it as an intermediate result - OR in any extra objects that B can reach into its bitmap - duplicate B's bitmap as a starting point for C; likewise, free B's bitmap - OR in objects for C, and so on... Rather than duplicating bitmaps and immediately freeing the original, we can just pass ownership from commit to commit. Note that this doesn't always work: - the recipient may be a merge which already has an intermediate bitmap from its other ancestor. In that case we have to OR our result into it. Note that the first ancestor to reach the merge does get to pass ownership, though. - we may have multiple children; we can only pass ownership to one of them However, it happens often enough and copying bitmaps is expensive enough that this provides a noticeable speedup. On a clone of linux.git, this reduces the time to generate bitmaps from 205s to 70s. This is about the same amount of time it took to generate bitmaps using our old "many traversals" algorithm (the previous commit measures the identical scenario as taking 63s). It unfortunately provides only a very modest reduction in the peak memory usage, though. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index bcd059ccd9..1eb9615df8 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -333,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit); struct commit *child; + int reused = 0; fill_bitmap_commit(ent, commit); @@ -348,10 +349,15 @@ void bitmap_writer_build(struct packing_data *to_pack) if (child_ent->bitmap) bitmap_or(child_ent->bitmap, ent->bitmap); - else + else if (reused) child_ent->bitmap = bitmap_dup(ent->bitmap); + else { + child_ent->bitmap = ent->bitmap; + reused = 1; + } } - bitmap_free(ent->bitmap); + if (!reused) + bitmap_free(ent->bitmap); ent->bitmap = NULL; } bitmap_builder_clear(&bb); From patchwork Tue Dec 8 22:04:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959871 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 997E5C433FE for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 66DFE23A7C for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730707AbgLHWEs (ORCPT ); Tue, 8 Dec 2020 17:04:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54110 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730688AbgLHWEr (ORCPT ); Tue, 8 Dec 2020 17:04:47 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 80BEFC0613CF for ; Tue, 8 Dec 2020 14:04:07 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id j12so239297ota.7 for ; Tue, 08 Dec 2020 14:04:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=69NZYdya+y3fH6tvgWnX9Me1m2KxucJctZMIwxwOhUo=; b=Gbesx6Ne5RqUvPgXiK628oaB1N58cLluN+o668vf+L1zbAxdWHgMPy4JpNwolchzD2 TF+8ixbIQ9+GxI+BboWFsb4Sav6mC4vbLxdrZEEBg2Z1rp8svyIBLYP2k9sdB4reI1Nr KWrAXpjA3jgYjN/vBXxUCKvbJNzJ+vx6nOv6i9h1zmMrlkPinBu3kR7Vn+aUwlI2k2ki 88HxRarPj/hiQ3q92LlPdlS8yy9hnsKKAnTLMBGB2mgNKhPrHM2AZFt3a3kNh7U3UK9W hQ0PBUgvWCU7Tx45bcQGq/zJa/Urj/OinOOa+m/KO2qWYb+ueskphGd14MojozPcT9pb SIfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=69NZYdya+y3fH6tvgWnX9Me1m2KxucJctZMIwxwOhUo=; b=NH/om0x1pBIlsGnBe6+Efddg7WdPjKMOV/PmPMdpc7K61jrx+kQCqNq0olRF/StHtw Vc4Ea0jEHhNekLJiYQV6BwgnEKyGQO8Lddj1X2u13S7ZRIM4xwAqs8x4SmtCEfKfrhUX iYiAnXBtewb82JGoLF5EaxZ/COAJi5oTlNZrFzBaRV8KvVy0LiLh7eAXb6VyICCpyPhg qMusQ62z2XEtK6kXGwt2SaPIM1lkAbvJKfi7J4Q7cDbIQIjO6s4dmcEVpqMTa1rtD8Ux 04+5ui9TqKW232LNipq+0iDOnMGOx0fGYZ8hBrYczJ3XON3EASR2J2Ga3efaJYyte3gj wBuw== X-Gm-Message-State: AOAM530gJ4DhV4u8NgYdkaFhOi8clzSZPbt1Q89/ONPlk2KAsYErPds8 uKYSA1KEXAG6sARcKeisBs9pmpaMg58qeHna X-Google-Smtp-Source: ABdhPJx4fgZU/34EVAGXE8eTXYXbgwhNiqvvYCS8n6Zzfbo4TvadT0XQ6BSCmCZ8FjYlxKPKqCsfbQ== X-Received: by 2002:a05:6830:1aec:: with SMTP id c12mr99930otd.337.1607465046660; Tue, 08 Dec 2020 14:04:06 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id p3sm22242otf.3.2020.12.08.14.04.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:06 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:03 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 12/24] pack-bitmap-write: fill bitmap with commit history Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The current implementation of bitmap_writer_build() creates a reachability bitmap for every walked commit. After computing a bitmap for a commit, those bits are pushed to an in-progress bitmap for its children. fill_bitmap_commit() assumes the bits corresponding to objects reachable from the parents of a commit are already set. This means that when visiting a new commit, we only have to walk the objects reachable between it and any of its parents. A future change to bitmap_writer_build() will relax this condition so not all parents have their bits set. Prepare for that by having 'fill_bitmap_commit()' walk parents until reaching commits whose bits are already set. Then, walk the trees for these commits as well. This has no functional change with the current implementation of bitmap_writer_build(). Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 1eb9615df8..957639241e 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -12,6 +12,7 @@ #include "sha1-lookup.h" #include "pack-objects.h" #include "commit-reach.h" +#include "prio-queue.h" struct bitmapped_commit { struct commit *commit; @@ -279,17 +280,30 @@ static void fill_bitmap_tree(struct bitmap *bitmap, } static void fill_bitmap_commit(struct bb_commit *ent, - struct commit *commit) + struct commit *commit, + struct prio_queue *queue) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - /* - * mark ourselves, but do not bother with parents; their values - * will already have been propagated to us - */ bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); + prio_queue_put(queue, commit); + + while (queue->nr) { + struct commit_list *p; + struct commit *c = prio_queue_get(queue); + + bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + + for (p = c->parents; p; p = p->next) { + int pos = find_object_pos(&p->item->object.oid); + if (!bitmap_get(ent->bitmap, pos)) { + bitmap_set(ent->bitmap, pos); + prio_queue_put(queue, p->item); + } + } + } } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -319,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct bitmap_builder bb; size_t i; int nr_stored = 0; /* for progress */ + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -335,7 +350,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit); + fill_bitmap_commit(ent, commit, &queue); if (ent->selected) { store_selected(ent, commit); @@ -360,6 +375,7 @@ void bitmap_writer_build(struct packing_data *to_pack) bitmap_free(ent->bitmap); ent->bitmap = NULL; } + clear_prio_queue(&queue); bitmap_builder_clear(&bb); trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", From patchwork Tue Dec 8 22:04:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959881 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A529C2BB40 for ; Tue, 8 Dec 2020 22:05:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D06B6222B3 for ; Tue, 8 Dec 2020 22:05:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730744AbgLHWFB (ORCPT ); Tue, 8 Dec 2020 17:05:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54122 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730728AbgLHWE6 (ORCPT ); Tue, 8 Dec 2020 17:04:58 -0500 Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com [IPv6:2607:f8b0:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11758C061793 for ; Tue, 8 Dec 2020 14:04:12 -0800 (PST) Received: by mail-ot1-x343.google.com with SMTP id b62so251898otc.5 for ; Tue, 08 Dec 2020 14:04:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=9mqvfpcaLJPH1p9MC9DOD+j9GQThZNPIDACNQV59NaA=; b=JwszJnqBTszbZBZY18Y8jzQ1zBnYsMvWjbGDRbKVxIHeprYg3dFo1Yu8dqPgZILDoS U/Q/q/ILvCUiRjcYCSn1FGfjfG3MPYXYbSyFBY9WME82AF2X7MQ/vooFrK/KdBXlflZi PZTK3ehF45iralk+q3gKIzOnrW+X0UHGgSeQftIjcbewz/GtNeRQdxUmNxP497oSnIMg ewXB1kBSCI9o03rEf80nm1hfTcKzAUk2buoItt6EFo4boaA6S7JgqxHiMTFNUU9BWb6M FULGeCBl7ggzNmm1zpx6G+2aYpokUvmOZKP2HY5TFpGnXk6iagvtcyWr7t3xgofod1p4 3RwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=9mqvfpcaLJPH1p9MC9DOD+j9GQThZNPIDACNQV59NaA=; b=Obg7gg9Qub/xn0UdAa1aDWCYybv3tcvah3oPH4LGYH7O8mUCtOOPlLeeK8FJM3zHHG fDFALjp5AJK1KS3LXO4DU5+XY5CKcv9cOSwCkanaf5/Xd3pAdvXZYOy3PkQMD5DQQAk9 CO6kIX25mDLdw+IyhzNBXt8u59YFzrVz9Fh3149I92c6JDZcPRae4qXWPUTJJGmRryaM nmA9gs14yJ5c6/8PzfY8sBluERtxUmTld1fIri/83sKYwpHCAxku3CPCZ6NAXtpuLslF 9vE0gwtKpzlpFd5kTykJg8YB9G/nj6ibv2GclQHRDF9o+aGP7IHN4F3zHzqOhWL2JuFL G5mA== X-Gm-Message-State: AOAM532aIpN0Vg5MYA3v4HoJclkBLc7a/cBE+Vaj1FPblNWVTvOclQyh uQfx+keK+b2HMqHLBkgKFhBOYhXhn8zjwOLt X-Google-Smtp-Source: ABdhPJx6XjNZwZMqiULfFtntbTv1SJa6MQWP0myE23ogFRR/BK9NbW3BrvFsGCIhnZQNGm7E9s8Ajg== X-Received: by 2002:a05:6830:3154:: with SMTP id c20mr150275ots.286.1607465051204; Tue, 08 Dec 2020 14:04:11 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id o135sm58325ooo.38.2020.12.08.14.04.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:10 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:08 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 13/24] bitmap: implement bitmap_is_subset() Message-ID: <0cfa932b7175ab2863f42c195c2eea6c711f6553.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_is_subset() function checks if the 'self' bitmap contains any bitmaps that are not on in the 'other' bitmap. Up until this patch, it had a declaration, but no implementation or callers. A subsequent patch will want this function, so implement it here. Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- ewah/bitmap.c | 21 +++++++++++++++++++++ ewah/ewok.h | 2 +- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index b5f6376282..0d31cdc866 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -195,6 +195,27 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other) return 1; } +int bitmap_is_subset(struct bitmap *self, struct bitmap *other) +{ + size_t common_size, i; + + if (self->word_alloc < other->word_alloc) + common_size = self->word_alloc; + else { + common_size = other->word_alloc; + for (i = common_size; i < self->word_alloc; i++) { + if (self->words[i]) + return 1; + } + } + + for (i = 0; i < common_size; i++) { + if (self->words[i] & ~other->words[i]) + return 1; + } + return 0; +} + void bitmap_reset(struct bitmap *bitmap) { memset(bitmap->words, 0x0, bitmap->word_alloc * sizeof(eword_t)); diff --git a/ewah/ewok.h b/ewah/ewok.h index 1fc555e672..66920965da 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -180,7 +180,7 @@ int bitmap_get(struct bitmap *self, size_t pos); void bitmap_reset(struct bitmap *self); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); -int bitmap_is_subset(struct bitmap *self, struct bitmap *super); +int bitmap_is_subset(struct bitmap *self, struct bitmap *other); struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap); struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah); From patchwork Tue Dec 8 22:04:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959895 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B0D5C2BBCF for ; Tue, 8 Dec 2020 22:06:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DEC3823A1D for ; Tue, 8 Dec 2020 22:06:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730757AbgLHWFD (ORCPT ); Tue, 8 Dec 2020 17:05:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54138 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730747AbgLHWFC (ORCPT ); Tue, 8 Dec 2020 17:05:02 -0500 Received: from mail-oo1-xc30.google.com (mail-oo1-xc30.google.com [IPv6:2607:f8b0:4864:20::c30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B4D16C061794 for ; Tue, 8 Dec 2020 14:04:16 -0800 (PST) Received: by mail-oo1-xc30.google.com with SMTP id t63so1832272ooa.1 for ; Tue, 08 Dec 2020 14:04:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=cEQOsB+BOx66j/LFEO/fQE8ubCTEUkji902LrnsMttQ=; b=GN5h+hcIA9Wz1YOLpd5u2dqsHWTNabt7S+resS4QChqkx/oG6/w6J8cxqr0FmMEM4h 9Bhqg20goj0jJDUSGGRyu690y1ycrGzPAcjhfr3QLGcdbkgJlBftEPwanTYzW7+BnElA gFf6GC7A+YhkZ0WAx+Q5ucGLbNC9YYRSEP+Xc1p+HzfahHcqLwdsPh9rwegZltPw4hLz 6hT6cpIB0pDW5WcFvFr7vhXUKYV/qavoN6tzYW9UqHeOcSoHxIcJH1LfZ+Sa93kcemJ7 8zz4n+jcJnL7IMctqQP63qpWrwu0cyr1VIRO0ikoKA1V6DlWxnmm3UPzhhQtTQ162mNW m+3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=cEQOsB+BOx66j/LFEO/fQE8ubCTEUkji902LrnsMttQ=; b=rVDjDBwuXN4yOUqTvcXuTgYcljVfvNhkTZUI62z++0P+5ypn/ffM4NxxLMNLcOFk3g /iHF+Rxnm21YSoL6dU0H1WzWt9EgIFKyuA4g1jYYsaYX60U77EHD3e+fEgD9g/cdj5mV iH3uLw+4WOJ+FrEMkYu/oH7kGuEbVOrGKnyNmGnkiDpuV11a2rOudNnqhi61rUrZjffS KAfj5cXikX9fnFnlMi7GrTmeG5L9RbszXI60OlkKBFBAi5MDEeiRH2knTvwDccDqW4EO iIpbSZWvQz4MdYiz4kLR2fk8qiyzHpSz1WGKHAquG41ecSTVfoiXMCqLKYXDWPBAinlV mDNA== X-Gm-Message-State: AOAM533NvhXj/368u0jpqFVij49VY1Yr5id/k2tc4P0vL9madSsxsrxX fK2eA4L0z5mkQvkkK2IsZSJrrnBe/Xt+9Mj5 X-Google-Smtp-Source: ABdhPJwIinURTFyIlhL7nxcsu/sJHHBHF72YQxlNNXtFMu9NA3fg8WRuHPOIN5mN8+XO3E03ZA/k3A== X-Received: by 2002:a4a:b4c4:: with SMTP id g4mr58776ooo.7.1607465055786; Tue, 08 Dec 2020 14:04:15 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id q18sm59918ood.35.2020.12.08.14.04.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:15 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:13 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 14/24] commit: implement commit_list_contains() Message-ID: <033fb2ed55fcab150bc019c4d6b6749ab59e3274.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee It can be helpful to check if a commit_list contains a commit. Use pointer equality, assuming lookup_commit() was used. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- commit.c | 11 +++++++++++ commit.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/commit.c b/commit.c index fe1fa3dc41..9a785bf906 100644 --- a/commit.c +++ b/commit.c @@ -544,6 +544,17 @@ struct commit_list *commit_list_insert(struct commit *item, struct commit_list * return new_list; } +int commit_list_contains(struct commit *item, struct commit_list *list) +{ + while (list) { + if (list->item == item) + return 1; + list = list->next; + } + + return 0; +} + unsigned commit_list_count(const struct commit_list *l) { unsigned c = 0; diff --git a/commit.h b/commit.h index 5467786c7b..742a6de460 100644 --- a/commit.h +++ b/commit.h @@ -167,6 +167,8 @@ int find_commit_subject(const char *commit_buffer, const char **subject); struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list); +int commit_list_contains(struct commit *item, + struct commit_list *list); struct commit_list **commit_list_append(struct commit *commit, struct commit_list **next); unsigned commit_list_count(const struct commit_list *l); From patchwork Tue Dec 8 22:04:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959883 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49B0EC2BBCD for ; Tue, 8 Dec 2020 22:05:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 24D56222B3 for ; Tue, 8 Dec 2020 22:05:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730803AbgLHWFM (ORCPT ); Tue, 8 Dec 2020 17:05:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54150 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730781AbgLHWFH (ORCPT ); Tue, 8 Dec 2020 17:05:07 -0500 Received: from mail-oi1-x241.google.com (mail-oi1-x241.google.com [IPv6:2607:f8b0:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D468C06179C for ; Tue, 8 Dec 2020 14:04:21 -0800 (PST) Received: by mail-oi1-x241.google.com with SMTP id y74so166730oia.11 for ; Tue, 08 Dec 2020 14:04:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ULT/P2uDpVwqEmdBgLYbNeuKth8g/ttngXB8MGQmitc=; b=SmFN+dtWoZ/86hxiy7bwz7jXyVlARAa5S46kr5bFuS9Qf6rHHP/w0mhUmDh+zXfMcj /o0tu0N37RFKjeQKoM6qZ7nnNHd+F96vkPpR3fIMTEuhjA1ZQ+dgqqmvIkcqE+VcheQ1 xJcyPVHBe81wjsg10PcmQYliVHAzATJ79wdR7Ro7Jsjt5UfJT8oOhCDgntIYE+B79NQ9 Her/mW5Vi5RvEZ9L2pF9t9SgN+qeVFKaGeaVlcO44KaDT1EL/k5YCLOEv7WQj/jXqvSE jjiwiaGlpZUeyj0QE6gJ6Pv58zuBbHKAzd1dB5fXihlEnyoPTTlLerWsCrm7nnMduF2D HtpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ULT/P2uDpVwqEmdBgLYbNeuKth8g/ttngXB8MGQmitc=; b=ONJZtSsvfrZDX5BXQFInHYIAlFemWLbUWXm2S2635T3I8u/v1kyNbjd7wbbRl7dO3Z dnYXao5NhM1UsVAqAqPgsMmoQXn2L9wdSrL0avVhnzvLkYvvPysMLJVSlBrJA25bEHYV n5rQiCJijpvxV8+gpccxWODwXA5RfBiDorHYfpMMQqYNBizBfoFcrsd+Qu2F7Tq8epHY JzgalUSU4FSlkpq1WJmtZjCPIxnEGYeToiItqaS0Uerh3TNmSi6s0z/OrHInkMn04scs C3+QQILC/pt99dc8sYoblltEVCdhvs1L2SLELKH4PSarze/IyAnF2JPcMqseaGvLhwj7 kPfw== X-Gm-Message-State: AOAM531hgiyccO6Aly07iObALKg+W1lfsIxRDqGODofm4x+OGCygO8w/ S97GLV/Fs2Wag4A0l/2sDJZseLeLYfKmH5uj X-Google-Smtp-Source: ABdhPJwyAqO6N1azhIM0nl7diKgZhpQwFUe70ncFD41AdkwqXEs4bVPJcWz41M3RmQJasmVw9YldsA== X-Received: by 2002:aca:4d8b:: with SMTP id a133mr4537785oib.79.1607465060159; Tue, 08 Dec 2020 14:04:20 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id j15sm16898ota.39.2020.12.08.14.04.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:19 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:17 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 15/24] t5310: add branch-based checks Message-ID: <76071f9f4e04c67bf8b4191df880aa5bfd3348c8.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The current rev-list tests that check the bitmap data only work on HEAD instead of multiple branches. Expand the test cases to handle both 'master' and 'other' branches. Signed-off-by: Derrick Stolee Helped-by: Junio C Hamano Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 61 +++++++++++++++++++++++------------------ 1 file changed, 34 insertions(+), 27 deletions(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index bf094cfe42..8bf02336d9 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -42,63 +42,70 @@ test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' git rev-list --test-bitmap HEAD ' -rev_list_tests() { - state=$1 - - test_expect_success "counting commits via bitmap ($state)" ' - git rev-list --count HEAD >expect && - git rev-list --use-bitmap-index --count HEAD >actual && +rev_list_tests_head () { + test_expect_success "counting commits via bitmap ($state, $branch)" ' + git rev-list --count $branch >expect && + git rev-list --use-bitmap-index --count $branch >actual && test_cmp expect actual ' - test_expect_success "counting partial commits via bitmap ($state)" ' - git rev-list --count HEAD~5..HEAD >expect && - git rev-list --use-bitmap-index --count HEAD~5..HEAD >actual && + test_expect_success "counting partial commits via bitmap ($state, $branch)" ' + git rev-list --count $branch~5..$branch >expect && + git rev-list --use-bitmap-index --count $branch~5..$branch >actual && test_cmp expect actual ' - test_expect_success "counting commits with limit ($state)" ' - git rev-list --count -n 1 HEAD >expect && - git rev-list --use-bitmap-index --count -n 1 HEAD >actual && + test_expect_success "counting commits with limit ($state, $branch)" ' + git rev-list --count -n 1 $branch >expect && + git rev-list --use-bitmap-index --count -n 1 $branch >actual && test_cmp expect actual ' - test_expect_success "counting non-linear history ($state)" ' + test_expect_success "counting non-linear history ($state, $branch)" ' git rev-list --count other...second >expect && git rev-list --use-bitmap-index --count other...second >actual && test_cmp expect actual ' - test_expect_success "counting commits with limiting ($state)" ' - git rev-list --count HEAD -- 1.t >expect && - git rev-list --use-bitmap-index --count HEAD -- 1.t >actual && + test_expect_success "counting commits with limiting ($state, $branch)" ' + git rev-list --count $branch -- 1.t >expect && + git rev-list --use-bitmap-index --count $branch -- 1.t >actual && test_cmp expect actual ' - test_expect_success "counting objects via bitmap ($state)" ' - git rev-list --count --objects HEAD >expect && - git rev-list --use-bitmap-index --count --objects HEAD >actual && + test_expect_success "counting objects via bitmap ($state, $branch)" ' + git rev-list --count --objects $branch >expect && + git rev-list --use-bitmap-index --count --objects $branch >actual && test_cmp expect actual ' - test_expect_success "enumerate commits ($state)" ' - git rev-list --use-bitmap-index HEAD >actual && - git rev-list HEAD >expect && + test_expect_success "enumerate commits ($state, $branch)" ' + git rev-list --use-bitmap-index $branch >actual && + git rev-list $branch >expect && test_bitmap_traversal --no-confirm-bitmaps expect actual ' - test_expect_success "enumerate --objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD >actual && - git rev-list --objects HEAD >expect && + test_expect_success "enumerate --objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch >actual && + git rev-list --objects $branch >expect && test_bitmap_traversal expect actual ' - test_expect_success "bitmap --objects handles non-commit objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD tagged-blob >actual && + test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch tagged-blob >actual && grep $blob actual ' } +rev_list_tests () { + state=$1 + + for branch in "second" "other" + do + rev_list_tests_head + done +} + rev_list_tests 'full bitmap' test_expect_success 'clone from bitmapped repository' ' From patchwork Tue Dec 8 22:04:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959885 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B7BDC2BBCF for ; Tue, 8 Dec 2020 22:05:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 38EF923A33 for ; Tue, 8 Dec 2020 22:05:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730805AbgLHWFN (ORCPT ); Tue, 8 Dec 2020 17:05:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54162 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730775AbgLHWFF (ORCPT ); Tue, 8 Dec 2020 17:05:05 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 655E0C0617A6 for ; Tue, 8 Dec 2020 14:04:25 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id o11so256998ote.4 for ; Tue, 08 Dec 2020 14:04:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=WoZWPxmAj0x1vYLK6v7ZELgMGpSwPf3gkePQcJB85eg=; b=PwJ16WwPRv3nKvDjFVXy7ZfmtjWAA/860D3aUbslURE5RZwZVym0FiyNB8mgorkr4B 0JuxIhhYKxkLlHu5gmaI7opfumANFUq2RJTkaSBMUb65bSaQkO2AfKZ5iR28/FSBQH/V P58JvInnN8tkgcIb8148heNbQOYCJ0NsVoO6GPuzNso4EV8ed9LCXYsC6mkFTSlKtHxC SzWzXn+amRRDjR188+IGsX3VkFd3z5NPgQB/7YwFWlDXMzVzCQm+sFkAQE40B7HWYFmP s5KgO+KTTyoum6eSMVpSwVz/toFx0Tr3hQN9nJ6VocqJ7Z53s5+Izkv/UcF4QO1NeWbu Nkuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=WoZWPxmAj0x1vYLK6v7ZELgMGpSwPf3gkePQcJB85eg=; b=DnIHeqTJYwovHgCinf5d1aMggfM6Ts/TjPW3ODipvLoBt8PH+izQXazvMIqgw/yaq9 fzP6b2nkwOKJq5Z2TGcBtZSb0C0aJ8LncU2X6Bz7x1ZfZwWnqxnltAMv+cdkDdSOoq9r IsC0zByFvI8WGp9Z5MK49szAE/00VrNsVA7PVtYKt6iaxr4IhSjkI90OAGJduKGAWWz/ Bx+g4SFh8+hwH1mFTfASReJhbIBmyPPTM8NJHaMbKzzLT3koFbLS/K0edwVDzzzOAM1e tTx2I49s5BD3N0zadzALvtZ5qoZVwty9/j7sjZZdevyfMGac0G+432lHokz7cuv7yT1W zdGw== X-Gm-Message-State: AOAM531QcagNarioUNRCpHkZlogjY0eW6OQYKOqUeOxuhDr4msWyFWHH ukHUftkIjqixSctTNE4RexE+6jK1pM3gtS2T X-Google-Smtp-Source: ABdhPJy2wk99/93Pqwl1bEEf47MTcvAChUNSfeaCKfP9ijNYs7YNtzc1QGdfd5iLJrSgC1I0ysCeEA== X-Received: by 2002:a9d:5e97:: with SMTP id f23mr148341otl.204.1607465064460; Tue, 08 Dec 2020 14:04:24 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id k30sm60834ool.34.2020.12.08.14.04.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:23 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:22 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 16/24] pack-bitmap-write: rename children to reverse_edges Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_builder_init() method walks the reachable commits in topological order and constructs a "reverse graph" along the way. At the moment, this reverse graph contains an edge from commit A to commit B if and only if A is a parent of B. Thus, the name "children" is appropriate for for this reverse graph. In the next change, we will repurpose the reverse graph to not be directly-adjacent commits in the commit-graph, but instead a more abstract relationship. The previous changes have already incorporated the necessary updates to fill_bitmap_commit() that allow these edges to not be immediate children. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 957639241e..7e218d02a6 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -179,7 +179,7 @@ static void compute_xor_offsets(void) } struct bb_commit { - struct commit_list *children; + struct commit_list *reverse_edges; struct bitmap *bitmap; unsigned selected:1; unsigned idx; /* within selected array */ @@ -228,7 +228,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (p = commit->parents; p; p = p->next) { struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->children); + commit_list_insert(commit, &ent->reverse_edges); } } } @@ -358,7 +358,7 @@ void bitmap_writer_build(struct packing_data *to_pack) display_progress(writer.progress, nr_stored); } - while ((child = pop_commit(&ent->children))) { + while ((child = pop_commit(&ent->reverse_edges))) { struct bb_commit *child_ent = bb_data_at(&bb.data, child); From patchwork Tue Dec 8 22:04:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959887 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B525C4361B for ; Tue, 8 Dec 2020 22:06:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D3666222B3 for ; Tue, 8 Dec 2020 22:06:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730739AbgLHWFT (ORCPT ); Tue, 8 Dec 2020 17:05:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730775AbgLHWFP (ORCPT ); Tue, 8 Dec 2020 17:05:15 -0500 Received: from mail-ot1-x344.google.com (mail-ot1-x344.google.com [IPv6:2607:f8b0:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83556C061257 for ; Tue, 8 Dec 2020 14:04:29 -0800 (PST) Received: by mail-ot1-x344.google.com with SMTP id i6so271880otr.2 for ; Tue, 08 Dec 2020 14:04:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=/xbW6H63n+yti/OsJjfkYsqvsHOXl31qvIGu38SrEno=; b=WYm9nK13l24tH+5h5/2izJ7alJzlSUi6+dGxz0Srag6gzbWw4ns2fJR45RMMyK744n UF6wfIGtTg0uopHQSx9jMddy84QMh7dLORxlMUJKVLdkA9p89zXMJJWpd0rvYF4ZTW9O ODb6/cDI8doKhzoq2pOF+ZPWCHpkqk2D5/OhxUvnBCtjpNQRifx8+JSwuAI8Etfz1ou6 OGm/agURTyu71Lrhl3N7QrshoUoUIs3RsewE5uvBm7hlEg7tYIRZ5wXgfaoUhQDXa2/e y5YnqJajC9AfWbhgk5MvrDP83keS6RysaeqZQLeLKDrjFW+gFVEN2lzo9CRm5bQe4lJN MNvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=/xbW6H63n+yti/OsJjfkYsqvsHOXl31qvIGu38SrEno=; b=YWum8XPIiIGOA1gdu5pzmvUoLwbPd4aY9Hn0S5LdgPYNyJi9/R5vGL8chRbqYM9H/y WZaz0/wf5iyP49GsSZxe1an0eTkbMyEBdE3csDNOc2Knm5ROyQfk7RtpUfLllo13APPZ Ukw1XldIwJ6RUMMcNTJIqiDmJxKu7RhCGb1a3Qw/k6Yder3HeiQHKWEsIQv4uTLym4tq +0GyoAP5pMYV+sfwjSzw20J0f4hntldGSaLH7zkS0u3CDibIz4ZsIoNwJeS87GLw8yQs Sl+u5wOxixaLLqvd3CkHfM0PPnP2UuXLoxjQ85ij/ut2Mo1w0gmEA1fke0AXVgURO2Zb ayMw== X-Gm-Message-State: AOAM5319t37ch/ebCtOmwLppGqIX39Dojes24HSwwmwDXqHUctfbdX6u x5/NCBim56vDQvsvm7OA540YzpOujdlWu9EY X-Google-Smtp-Source: ABdhPJxbNi/EFDjjXGrP66NFN9Ze5DW9eXCDU678+LbALhn5iR4g6QPcqw8dZWqWjmrvlKXuDtCPwQ== X-Received: by 2002:a05:6830:402c:: with SMTP id i12mr129361ots.25.1607465068645; Tue, 08 Dec 2020 14:04:28 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id n63sm14980oih.39.2020.12.08.14.04.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:28 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:26 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 17/24] pack-bitmap.c: check reads more aggressively when loading Message-ID: <2e082437060c43b7a6410be1f9ddd2eeb104e4bc.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Before 'load_bitmap_entries_v1()' reads an actual EWAH bitmap, it should check that it can safely do so by ensuring that there are at least 6 bytes available to be read (four for the commit's index position, and then two more for the xor offset and flags, respectively). Likewise, it should check that the commit index it read refers to a legitimate object in the pack. The first fix catches a truncation bug that was exposed when testing, and the second is purely precautionary. There are some possible future improvements, not pursued here. They are: - Computing the correct boundary of the bitmap itself in the caller and ensuring that we don't read past it. This may or may not be worth it, since in a truncation situation, all bets are off: (is the trailer still there and the bitmap entries malformed, or is the trailer truncated?). The best we can do is try to read what's there as if it's correct data (and protect ourselves when it's obviously bogus). - Avoid the magic "6" by teaching read_be32() and read_u8() (both of which are custom helpers for this function) to check sizes before advancing the pointers. - Adding more tests in this area. Testing these truncation situations are remarkably fragile to even subtle changes in the bitmap generation. So, the resulting tests are likely to be quite brittle. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 4431f9f120..60c781d100 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -229,11 +229,16 @@ static int load_bitmap_entries_v1(struct bitmap_index *index) uint32_t commit_idx_pos; struct object_id oid; + if (index->map_size - index->map_pos < 6) + return error("corrupt ewah bitmap: truncated header for entry %d", i); + commit_idx_pos = read_be32(index->map, &index->map_pos); xor_offset = read_u8(index->map, &index->map_pos); flags = read_u8(index->map, &index->map_pos); - nth_packed_object_id(&oid, index->pack, commit_idx_pos); + if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0) + return error("corrupt ewah bitmap: commit index %u out of range", + (unsigned)commit_idx_pos); bitmap = read_bitmap_1(index); if (!bitmap) From patchwork Tue Dec 8 22:04:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959897 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5AF07C2BBD4 for ; Tue, 8 Dec 2020 22:06:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 21A7F23AA8 for ; Tue, 8 Dec 2020 22:06:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730976AbgLHWFx (ORCPT ); Tue, 8 Dec 2020 17:05:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54192 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730801AbgLHWFU (ORCPT ); Tue, 8 Dec 2020 17:05:20 -0500 Received: from mail-oi1-x229.google.com (mail-oi1-x229.google.com [IPv6:2607:f8b0:4864:20::229]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 34D6EC0617A7 for ; Tue, 8 Dec 2020 14:04:34 -0800 (PST) Received: by mail-oi1-x229.google.com with SMTP id o25so207682oie.5 for ; Tue, 08 Dec 2020 14:04:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=fWXmhjodka59g2xFVNz1E/fdENe696wq2q53jDUOQsY=; b=GfGXStx19tZhUVQE3HbR4fPhgjP1GywQVzMcH4QCm1UrKRyyCrXr2iH5aiWsg+YPWv uRfpgLzFPeD+ENgXD2zbSW8otf3TUj/e8Xouj9rD+aZ9uUHgUaUFQ1e/dRsuzxvp74t4 QQRi8HnRx3ieP6RGsihqZKZZxupmK4+IDTk537DmPqxbA64WANohFHqMPTeTDGI7FSBj L2luEITIA15/8tZyR6CQLYtmFsLA7Y9j3vHaeckgI+9uEHuD3Ux8hCqN3TGDXnhh8Dda Kw4drHZCp5uOzodIotGRHd2I/oiMT44YqNHE09MOm9IVd+XDMMKvOLc+tH+fEcAxUjt5 KWtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=fWXmhjodka59g2xFVNz1E/fdENe696wq2q53jDUOQsY=; b=jOCQm5t+QmPYsaiXzQnnwFAq11zSNNaxRifb6VUXLVGkpQcCHzOGk7FVrBEKBxiJUq V1B083bb775yW1rYZIP61JyLjxtyhxDUMJWkJYVrZihTK4R0zdyM2KEUCnlBnyoLAnHS yivU2UbCBx0u057I1CstZE+1hVD8CJh2vi7e0F2bM/cplsGvNX5rIsLK+1REKx/9qqJF tfGRYZChblS5H9MtRqHC6TE6rvdzvbgMGxPDzZrBof4vbD9gGLBDeLXHTtM412Wkb552 OzzeADWybYxqf5JpuyKyVpoQi4tVKVtGsNnN8ahjYfbBJSkt71GmMWuZOGQWosHT0Bmd Rf/A== X-Gm-Message-State: AOAM532FyDfpmfO30NKCN7LxLp3IclqOFmyH2mf7URL7aVAKKgwbd9WA JjWa90wwcB98YcThQ5GR/6ndt2g/X0SGO/Zk X-Google-Smtp-Source: ABdhPJyRvBJJkGc8XvsTwofiY2cxQxRf19btW3AoyeoznUuvfVyo1CIE9S4siwcsHVm8oVSrQpu7MA== X-Received: by 2002:aca:6208:: with SMTP id w8mr4640735oib.120.1607465072843; Tue, 08 Dec 2020 14:04:32 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id v12sm54776ooi.46.2020.12.08.14.04.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:32 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:30 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 18/24] pack-bitmap-write: build fewer intermediate bitmaps Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_writer_build() method calls bitmap_builder_init() to construct a list of commits reachable from the selected commits along with a "reverse graph". This reverse graph has edges pointing from a commit to other commits that can reach that commit. After computing a reachability bitmap for a commit, the values in that bitmap are then copied to the reachability bitmaps across the edges in the reverse graph. We can now relax the role of the reverse graph to greatly reduce the number of intermediate reachability bitmaps we compute during this reverse walk. The end result is that we walk objects the same number of times as before when constructing the reachability bitmaps, but we also spend much less time copying bits between bitmaps and have much lower memory pressure in the process. The core idea is to select a set of "important" commits based on interactions among the sets of commits reachable from each selected commit. The first technical concept is to create a new 'commit_mask' member in the bb_commit struct. Note that the selected commits are provided in an ordered array. The first thing to do is to mark the ith bit in the commit_mask for the ith selected commit. As we walk the commit-graph, we copy the bits in a commit's commit_mask to its parents. At the end of the walk, the ith bit in the commit_mask for a commit C stores a boolean representing "The ith selected commit can reach C." As we walk, we will discover non-selected commits that are important. We will get into this later, but those important commits must also receive bit positions, growing the width of the bitmasks as we walk. At the true end of the walk, the ith bit means "the ith _important_ commit can reach C." MAXIMAL COMMITS --------------- We use a new 'maximal' bit in the bb_commit struct to represent whether a commit is important or not. The term "maximal" comes from the partially-ordered set of commits in the commit-graph where C >= P if P is a parent of C, and then extending the relationship transitively. Instead of taking the maximal commits across the entire commit-graph, we instead focus on selecting each commit that is maximal among commits with the same bits on in their commit_mask. This definition is important, so let's consider an example. Suppose we have three selected commits A, B, and C. These are assigned bitmasks 100, 010, and 001 to start. Each of these can be marked as maximal immediately because they each will be the uniquely maximal commit that contains their own bit. Keep in mind that that these commits may have different bitmasks after the walk; for example, if B can reach C but A cannot, then the final bitmask for C is 011. Even in these cases, C would still be a maximal commit among all commits with the third bit on in their masks. Now define sets X, Y, and Z to be the sets of commits reachable from A, B, and C, respectively. The intersections of these sets correspond to different bitmasks: * 100: X - (Y union Z) * 010: Y - (X union Z) * 001: Z - (X union Y) * 110: (X intersect Y) - Z * 101: (X intersect Z) - Y * 011: (Y intersect Z) - X * 111: X intersect Y intersect Z This can be visualized with the following Hasse diagram: 100 010 001 | \ / \ / | | \/ \/ | | /\ /\ | | / \ / \ | 110 101 011 \___ | ___/ \ | / 111 Some of these bitmasks may not be represented, depending on the topology of the commit-graph. In fact, we are counting on it, since the number of possible bitmasks is exponential in the number of selected commits, but is also limited by the total number of commits. In practice, very few bitmasks are possible because most commits converge on a common "trunk" in the commit history. With this three-bit example, we wish to find commits that are maximal for each bitmask. How can we identify this as we are walking? As we walk, we visit a commit C. Since we are walking the commits in topo-order, we know that C is visited after all of its children are visited. Thus, when we get C from the revision walk we inspect the 'maximal' property of its bb_data and use that to determine if C is truly important. Its commit_mask is also nearly final. If C is not one of the originally-selected commits, then assign a bit position to C (by incrementing num_maximal) and set that bit on in commit_mask. See "MULTIPLE MAXIMAL COMMITS" below for more detail on this. Now that the commit C is known to be maximal or not, consider each parent P of C. Compute two new values: * c_not_p : true if and only if the commit_mask for C contains a bit that is not contained in the commit_mask for P. * p_not_c : true if and only if the commit_mask for P contains a bit that is not contained in the commit_mask for P. If c_not_p is false, then P already has all of the bits that C would provide to its commit_mask. In this case, move on to other parents as C has nothing to contribute to P's state that was not already provided by other children of P. We continue with the case that c_not_p is true. This means there are bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or() to add those bits. If p_not_c is also true, then set the maximal bit for P to one. This means that if no other commit has P as a parent, then P is definitely maximal. This is because no child had the same bitmask. It is important to think about the maximal bit for P at this point as a temporary state: "P is maximal based on current information." In contrast, if p_not_c is false, then set the maximal bit for P to zero. Further, clear all reverse_edges for P since any edges that were previously assigned to P are no longer important. P will gain all reverse edges based on C. The final thing we need to do is to update the reverse edges for P. These reverse edges respresent "which closest maximal commits contributed bits to my commit_mask?" Since C contributed bits to P's commit_mask in this case, C must add to the reverse edges of P. If C is maximal, then C is a 'closest' maximal commit that contributed bits to P. Add C to P's reverse_edges list. Otherwise, C has a list of maximal commits that contributed bits to its bitmask (and this list is exactly one element). Add all of these items to P's reverse_edges list. Be careful to ignore duplicates here. After inspecting all parents P for a commit C, we can clear the commit_mask for C. This reduces the memory load to be limited to the "width" of the commit graph. Consider our ABC/XYZ example from earlier and let's inspect the state of the commits for an interesting bitmask, say 011. Suppose that D is the only maximal commit with this bitmask (in the first three bits). All other commits with bitmask 011 have D as the only entry in their reverse_edges list. D's reverse_edges list contains B and C. COMPUTING REACHABILITY BITMAPS ------------------------------ Now that we have our definition, let's zoom out and consider what happens with our new reverse graph when computing reachability bitmaps. We walk the reverse graph in reverse-topo-order, so we visit commits with largest commit_masks first. After we compute the reachability bitmap for a commit C, we push the bits in that bitmap to each commit D in the reverse edge list for C. Then, when we finally visit D we already have the bits for everything reachable from maximal commits that D can reach and we only need to walk the objects in the set-difference. In our ABC/XYZ example, when we finally walk for the commit A we only need to walk commits with bitmask equal to A's bitmask. If that bitmask is 100, then we are only walking commits in X - (Y union Z) because the bitmap already contains the bits for objects reachable from (X intersect Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps for the maximal commits with bitmasks 110 and 101). The behavior is intended to walk each commit (and the trees that commit introduces) at most once while allocating and copying fewer reachability bitmaps. There is one caveat: what happens when there are multiple maximal commits with the same bitmask, with respect to the initial set of selected commits? MULTIPLE MAXIMAL COMMITS ------------------------ Earlier, we mentioned that when we discover a new maximal commit, we assign a new bit position to that commit and set that bit position to one for that commit. This is absolutely important for interesting commit-graphs such as git/git and torvalds/linux. The reason is due to the existence of "butterflies" in the commit-graph partial order. Here is an example of four commits forming a butterfly: I J |\ /| | \/ | | /\ | |/ \| M N \ / |/ Q Here, I and J both have parents M and N. In general, these do not need to be exact parent relationships, but reachability relationships. The most important part is that M and N cannot reach each other, so they are independent in the partial order. If I had commit_mask 10 and J had commit_mask 01, then M and N would both be assigned commit_mask 11 and be maximal commits with the bitmask 11. Then, what happens when M and N can both reach a commit Q? If Q is also assigned the bitmask 11, then it is not maximal but is reachable from both M and N. While this is not necessarily a deal-breaker for our abstract definition of finding maximal commits according to a given bitmask, we have a few issues that can come up in our larger picture of constructing reachability bitmaps. In particular, if we do not also consider Q to be a "maximal" commit, then we will walk commits reachable from Q twice: once when computing the reachability bitmap for M and another time when computing the reachability bitmap for N. This becomes much worse if the topology continues this pattern with multiple butterflies. The solution has already been mentioned: each of M and N are assigned their own bits to the bitmask and hence they become uniquely maximal for their bitmasks. Finally, Q also becomes maximal and thus we do not need to walk its commits multiple times. The final bitmasks for these commits are as follows: I:10 J:01 |\ /| | \ _____/ | | /\____ | |/ \ | M:111 N:1101 \ / Q:1111 Further, Q's reverse edge list is { M, N }, while M and N both have reverse edge list { I, J }. PERFORMANCE MEASUREMENTS ------------------------ Now that we've spent a LOT of time on the theory of this algorithm, let's show that this is actually worth all that effort. To test the performance, use GIT_TRACE2_PERF=1 when running 'git repack -abd' in a repository with no existing reachability bitmaps. This avoids any issues with keeping existing bitmaps to skew the numbers. Inspect the "building_bitmaps_total" region in the trace2 output to focus on the portion of work that is affected by this change. Here are the performance comparisons for a few repositories. The timings are for the following versions of Git: "multi" is the timing from before any reverse graph is constructed, where we might perform multiple traversals. "reverse" is for the previous change where the reverse graph has every reachable commit. Finally "maximal" is the version introduced here where the reverse graph only contains the maximal commits. Repository: git/git multi: 2.628 sec reverse: 2.344 sec maximal: 2.047 sec Repository: torvalds/linux multi: 64.7 sec reverse: 205.3 sec maximal: 44.7 sec So in all cases we've not only recovered any time lost to switching to the reverse-edge algorithm, but we come out ahead of "multi" in all cases. Likewise, peak heap has gone back to something reasonable: Repository: torvalds/linux multi: 2.087 GB reverse: 3.141 GB maximal: 2.288 GB While I do not have access to full fork networks on GitHub, Peff has run this algorithm on the chromium/chromium fork network and reported a change from 3 hours to ~233 seconds. That network is particularly beneficial for this approach because it has a long, linear history along with many tags. The "multi" approach was obviously quadratic and the new approach is linear. Helped-by: Jeff King Signed-off-by: Derrick Stolee Helped-by: Johannes Schindelin Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 72 +++++++++++++++++++++++++++++++--- t/t5310-pack-bitmaps.sh | 85 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 148 insertions(+), 9 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 7e218d02a6..0af93193d8 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -180,8 +180,10 @@ static void compute_xor_offsets(void) struct bb_commit { struct commit_list *reverse_edges; + struct bitmap *commit_mask; struct bitmap *bitmap; - unsigned selected:1; + unsigned selected:1, + maximal:1; unsigned idx; /* within selected array */ }; @@ -198,7 +200,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i; + unsigned int i, num_maximal; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -210,27 +212,85 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->maximal = 1; ent->idx = i; + + ent->commit_mask = bitmap_new(); + bitmap_set(ent->commit_mask, i); + add_pending_object(&revs, &c->object, ""); } + num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { struct commit_list *p; + struct bb_commit *c_ent; parse_commit_or_die(commit); - ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); - bb->commits[bb->commits_nr++] = commit; + c_ent = bb_data_at(&bb->data, commit); + + if (c_ent->maximal) { + if (!c_ent->selected) { + bitmap_set(c_ent->commit_mask, num_maximal); + num_maximal++; + } + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + } for (p = commit->parents; p; p = p->next) { - struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->reverse_edges); + struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); + int c_not_p, p_not_c; + + if (!p_ent->commit_mask) { + p_ent->commit_mask = bitmap_new(); + c_not_p = 1; + p_not_c = 0; + } else { + c_not_p = bitmap_is_subset(c_ent->commit_mask, p_ent->commit_mask); + p_not_c = bitmap_is_subset(p_ent->commit_mask, c_ent->commit_mask); + } + + if (!c_not_p) + continue; + + bitmap_or(p_ent->commit_mask, c_ent->commit_mask); + + if (p_not_c) + p_ent->maximal = 1; + else { + p_ent->maximal = 0; + free_commit_list(p_ent->reverse_edges); + p_ent->reverse_edges = NULL; + } + + if (c_ent->maximal) { + commit_list_insert(commit, &p_ent->reverse_edges); + } else { + struct commit_list *cc = c_ent->reverse_edges; + + for (; cc; cc = cc->next) { + if (!commit_list_contains(cc->item, p_ent->reverse_edges)) + commit_list_insert(cc->item, &p_ent->reverse_edges); + } + } } + + bitmap_free(c_ent->commit_mask); + c_ent->commit_mask = NULL; } + + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_selected_commits", writer->selected_nr); + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_maximal_commits", num_maximal); } static void bitmap_builder_clear(struct bitmap_builder *bb) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 8bf02336d9..6815fb6a4e 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -20,12 +20,88 @@ has_any () { grep -Ff "$1" "$2" } +# To ensure the logic for "maximal commits" is exercised, make +# the repository a bit more complicated. +# +# other master +# * * +# (99 commits) (99 commits) +# * * +# |\ /| +# | * octo-other octo-master * | +# |/|\_________ ____________/|\| +# | \ \/ __________/ | +# | | ________/\ / | +# * |/ * merge-right * +# | _|__________/ \____________ | +# |/ | \| +# (l1) * * merge-left * (r1) +# | / \________________________ | +# |/ \| +# (l2) * * (r2) +# \___________________________ | +# \| +# * (base) +# +# The important part for the maximal commit algorithm is how +# the bitmasks are extended. Assuming starting bit positions +# for master (bit 0) and other (bit 1), and some flexibility +# in the order that merge bases are visited, the bitmasks at +# the end should be: +# +# master: 1 (maximal, selected) +# other: 01 (maximal, selected) +# octo-master: 1 +# octo-other: 01 +# merge-right: 111 (maximal) +# (l1): 111 +# (r1): 111 +# merge-left: 1101 (maximal) +# (l2): 11111 (maximal) +# (r2): 111101 (maximal) +# (base): 1111111 (maximal) + test_expect_success 'setup repo with moderate-sized history' ' - test_commit_bulk --id=file 100 && + test_commit_bulk --id=file 10 && git branch -M second && git checkout -b other HEAD~5 && test_commit_bulk --id=side 10 && + + # add complicated history setup, including merges and + # ambiguous merge-bases + + git checkout -b merge-left other~2 && + git merge second~2 -m "merge-left" && + + git checkout -b merge-right second~1 && + git merge other~1 -m "merge-right" && + + git checkout -b octo-second second && + git merge merge-left merge-right -m "octopus-second" && + + git checkout -b octo-other other && + git merge merge-left merge-right -m "octopus-other" && + + git checkout other && + git merge octo-other -m "pull octopus" && + git checkout second && + git merge octo-second -m "pull octopus" && + + # Remove these branches so they are not selected + # as bitmap tips + git branch -D merge-left && + git branch -D merge-right && + git branch -D octo-other && + git branch -D octo-second && + + # add padding to make these merges less interesting + # and avoid having them selected for bitmaps + test_commit_bulk --id=file 100 && + git checkout other && + test_commit_bulk --id=side 100 && + git checkout second && + bitmaptip=$(git rev-parse second) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && @@ -33,9 +109,12 @@ test_expect_success 'setup repo with moderate-sized history' ' ' test_expect_success 'full repack creates bitmaps' ' - git repack -ad && + GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \ + git repack -ad && ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output + test_line_count = 1 output && + grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && + grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' From patchwork Tue Dec 8 22:04:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959889 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2DBD9C433FE for ; Tue, 8 Dec 2020 22:06:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E9D1023A1D for ; Tue, 8 Dec 2020 22:06:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730834AbgLHWFY (ORCPT ); Tue, 8 Dec 2020 17:05:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54198 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730811AbgLHWFS (ORCPT ); Tue, 8 Dec 2020 17:05:18 -0500 Received: from mail-ot1-x32d.google.com (mail-ot1-x32d.google.com [IPv6:2607:f8b0:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F0C10C0617B0 for ; Tue, 8 Dec 2020 14:04:37 -0800 (PST) Received: by mail-ot1-x32d.google.com with SMTP id w3so206251otp.13 for ; Tue, 08 Dec 2020 14:04:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5oK9d8MRvwLAr4PgAX2C7YmL72QhE9Q310xr6XLqZic=; b=wKbDFcOUE8qavn+MkPwGTFR5vBn8ktTJfLokzkc1U37um0Ng95FWmjxDj5cCgohuuq yWKS64YzKKPdHz0oaQrYTZBgklWroBzwGYfgOeZDnj4UbgJN54jam6xDE946vTnX8kOD YuQcphW2kGoVyG8HsqxuGGJoYMiMrcZ3rtIrtvoSBi4T+6SOVj28S8q4yj5U1qKQeQf0 8+pYtz9GqnYpwUHm6qa7FV/1ACq+OoEC9YsfbBEec2c3evu1QhKrKxskxtXS22W40f3H cJJUggPVmwkUKSBVKNy6rJ3LILPdJwseXXzWPiy0ee2+1DQWlVe6cEZIN5snv2W71xJ1 oGug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5oK9d8MRvwLAr4PgAX2C7YmL72QhE9Q310xr6XLqZic=; b=j02u5O4fcg9ECITTBFNx3qlRedoEoeKah+imBx021EIEkbsU/yV6L7uZuUM/L8VGwW 1Ddak8VbhTDl1QXCAFSOuv3zUJT+s5AZIPaeUFUl+4rS9w08A5sEJO5E3NevLNB7LWHD ylTISo38VVsw+H62khbj290v+CMTeHL+ZfC7BiKOuv6nLDcfd1hCt63y9cCluj/6m/IK ck6UcODMgdqlbe2Ov6DXmTeh1yRTlSp+HHYuhaDkpV6FH9OZcK2WCRVMP4seMFt+a4TF igxDJhz9FN0qFU/gpoDonsIafWHQ6i+a/SWEClttb4Fecb9yq/WWK2/A9DpWpWojbqsb jGeQ== X-Gm-Message-State: AOAM531Qyxp8wLT8/GKCUnKTBO+AZGpnLPNlFwXt+cVs3uEDhqz8puuv dxhR0KUPZ8EQlGmu0lsquTQ0YjU/E2EEBzYF X-Google-Smtp-Source: ABdhPJx7E9zkTKc0KYI+K45toqFe21yxDEfF2bX14ir37t07P4Y/3ohAdgMOMhgnTN9jYkgasTT+5A== X-Received: by 2002:a05:6830:22d3:: with SMTP id q19mr119349otc.115.1607465076958; Tue, 08 Dec 2020 14:04:36 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id l132sm19018oia.23.2020.12.08.14.04.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:36 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:34 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 - pack-bitmap-write.c | 50 +++++------------------------------------- pack-bitmap.c | 46 +++++--------------------------------- pack-bitmap.h | 6 ++++- 4 files changed, 16 insertions(+), 87 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5617c01b5a..2a00358f34 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1104,7 +1104,6 @@ static void write_pack_file(void) stop_progress(&progress_state); bitmap_writer_show_progress(progress); - bitmap_writer_reuse_bitmaps(&to_pack); bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1); bitmap_writer_build(&to_pack); bitmap_writer_finish(written_list, nr_written, diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 0af93193d8..333058854d 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -30,7 +30,6 @@ struct bitmap_writer { struct ewah_bitmap *tags; kh_oid_map_t *bitmaps; - kh_oid_map_t *reused; struct packing_data *to_pack; struct bitmapped_commit *selected; @@ -112,7 +111,7 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, * Compute the actual bitmaps */ -static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) +static inline void push_bitmapped_commit(struct commit *commit) { if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; @@ -120,7 +119,7 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm } writer.selected[writer.selected_nr].commit = commit; - writer.selected[writer.selected_nr].bitmap = reused; + writer.selected[writer.selected_nr].bitmap = NULL; writer.selected[writer.selected_nr].flags = 0; writer.selected_nr++; @@ -372,13 +371,6 @@ static void store_selected(struct bb_commit *ent, struct commit *commit) khiter_t hash_pos; int hash_ret; - /* - * the "reuse bitmaps" phase may have stored something here, but - * our new algorithm doesn't use it. Drop it. - */ - if (stored->bitmap) - ewah_free(stored->bitmap); - stored->bitmap = bitmap_to_ewah(ent->bitmap); hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); @@ -480,35 +472,6 @@ static int date_compare(const void *_a, const void *_b) return (long)b->date - (long)a->date; } -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack) -{ - struct bitmap_index *bitmap_git; - if (!(bitmap_git = prepare_bitmap_git(to_pack->repo))) - return; - - writer.reused = kh_init_oid_map(); - rebuild_existing_bitmaps(bitmap_git, to_pack, writer.reused, - writer.show_progress); - /* - * NEEDSWORK: rebuild_existing_bitmaps() makes writer.reused reference - * some bitmaps in bitmap_git, so we can't free the latter. - */ -} - -static struct ewah_bitmap *find_reused_bitmap(const struct object_id *oid) -{ - khiter_t hash_pos; - - if (!writer.reused) - return NULL; - - hash_pos = kh_get_oid_map(writer.reused, *oid); - if (hash_pos >= kh_end(writer.reused)) - return NULL; - - return kh_value(writer.reused, hash_pos); -} - void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps) @@ -522,12 +485,11 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (indexed_commits_nr < 100) { for (i = 0; i < indexed_commits_nr; ++i) - push_bitmapped_commit(indexed_commits[i], NULL); + push_bitmapped_commit(indexed_commits[i]); return; } for (;;) { - struct ewah_bitmap *reused_bitmap = NULL; struct commit *chosen = NULL; next = next_commit_index(i); @@ -542,15 +504,13 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (next == 0) { chosen = indexed_commits[i]; - reused_bitmap = find_reused_bitmap(&chosen->object.oid); } else { chosen = indexed_commits[i + next]; for (j = 0; j <= next; ++j) { struct commit *cm = indexed_commits[i + j]; - reused_bitmap = find_reused_bitmap(&cm->object.oid); - if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) { + if ((cm->object.flags & NEEDS_BITMAP) != 0) { chosen = cm; break; } @@ -560,7 +520,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, } } - push_bitmapped_commit(chosen, reused_bitmap); + push_bitmapped_commit(chosen); i += next + 1; display_progress(writer.progress, i); diff --git a/pack-bitmap.c b/pack-bitmap.c index 60c781d100..d1368b69bb 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1338,9 +1338,9 @@ void test_bitmap_walk(struct rev_info *revs) free_bitmap_index(bitmap_git); } -static int rebuild_bitmap(uint32_t *reposition, - struct ewah_bitmap *source, - struct bitmap *dest) +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest) { uint32_t pos = 0; struct ewah_iterator it; @@ -1369,19 +1369,11 @@ static int rebuild_bitmap(uint32_t *reposition, return 0; } -int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, - struct packing_data *mapping, - kh_oid_map_t *reused_bitmaps, - int show_progress) +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping) { uint32_t i, num_objects; uint32_t *reposition; - struct bitmap *rebuild; - struct stored_bitmap *stored; - struct progress *progress = NULL; - - khiter_t hash_pos; - int hash_ret; num_objects = bitmap_git->pack->num_objects; reposition = xcalloc(num_objects, sizeof(uint32_t)); @@ -1399,33 +1391,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, reposition[i] = oe_in_pack_pos(mapping, oe) + 1; } - rebuild = bitmap_new(); - i = 0; - - if (show_progress) - progress = start_progress("Reusing bitmaps", 0); - - kh_foreach_value(bitmap_git->bitmaps, stored, { - if (stored->flags & BITMAP_FLAG_REUSE) { - if (!rebuild_bitmap(reposition, - lookup_stored_bitmap(stored), - rebuild)) { - hash_pos = kh_put_oid_map(reused_bitmaps, - stored->oid, - &hash_ret); - kh_value(reused_bitmaps, hash_pos) = - bitmap_to_ewah(rebuild); - } - bitmap_reset(rebuild); - display_progress(progress, ++i); - } - }); - - stop_progress(&progress); - - free(reposition); - bitmap_free(rebuild); - return 0; + return reposition; } void free_bitmap_index(struct bitmap_index *b) diff --git a/pack-bitmap.h b/pack-bitmap.h index 1203120c43..afa4115136 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -73,7 +73,11 @@ void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, struct pack_idx_entry **index, uint32_t index_nr); -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping); +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Tue Dec 8 22:04:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959891 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B923C1B0D9 for ; Tue, 8 Dec 2020 22:06:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 32ED123A33 for ; Tue, 8 Dec 2020 22:06:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730851AbgLHWF3 (ORCPT ); Tue, 8 Dec 2020 17:05:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730823AbgLHWFW (ORCPT ); Tue, 8 Dec 2020 17:05:22 -0500 Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11A49C0611C5 for ; Tue, 8 Dec 2020 14:04:42 -0800 (PST) Received: by mail-ot1-x341.google.com with SMTP id w3so206431otp.13 for ; Tue, 08 Dec 2020 14:04:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LViEVk8iVCd/Fue2tk1lTGL+H9g9t3D0HHU/a47NVcM=; b=WJJc9T2dTmzJKUZTiFKZk/LaSvr4E5kCNTDb0tVr+wMGjLHjw+AvePsPHTAmHrsEKB 3bcbiRQGNF1RQ7P1pSEoo+fVbLSykXJz+fRbSnl/4nhKxZwt8LF8yU70j4/921qMo6UO dU0gOFruEfXVMFCetodF6kHVHRpP2Mua4mlCsB+c7An+Lp4kbOejpoH2xdcS+PBaJdub zzppijdYF9UATQh9Mf9xFxkYTvbYpNTODp2TVXyPDXShkkOma/0EUjq1q3c3eebjyKrl NnKUsN3Xn/e0mfesBhgeAUQ9e2a8OgBmLV8cK/Of55cPUa0ungk5odoexw2MzMHmYQa1 Z6Kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LViEVk8iVCd/Fue2tk1lTGL+H9g9t3D0HHU/a47NVcM=; b=Gz6sojtZ/0BzKGuvOaHZv0I+dSZh6xtC0pOqpy5ZzXsdg2xqaDPU6i9Hv2Y5fpmEt0 DvywyoOHKgSJL4lDdBu9cXudllqg+ypxPpZDHLPlKGHU436sL70sY1xP2APMCUxCpB2H 9M/W/oTl1hCr5Rlzl/SkmBU6HV5gtk4028oX6Z9+v/IlG1SZZ9+8muVCB/gxctFVamXi bT0JYuIeLje2BsnmSkIOgIkV4++Z7pE7608FmX2zX6EK24cO2T4w4YLUKZFhTTq7Kc1y 92MHGKtP1qWB08ZjlWlQkdOulLAVCgQeYpUSFrh35rdpgtNLPCaRgACerTEmgcq+EzYD tuvA== X-Gm-Message-State: AOAM5333fMmtICU7Adks29+5c4hyMdthViaKA9/oR01eUVRxhL8SS+dr QUiKVS7dJCYtGCmIzKcKJy/ERymbXrepdMcd X-Google-Smtp-Source: ABdhPJxLLaCEEikBDGiWwi2+jlLiHghmxo+a4NmvpMI15l36VyiejiK2JRJlWc7R2GrOIeh4LXpweQ== X-Received: by 2002:a9d:7e7:: with SMTP id 94mr141497oto.370.1607465081145; Tue, 08 Dec 2020 14:04:41 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id v92sm10052otb.75.2020.12.08.14.04.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:04:40 -0800 (PST) Date: Tue, 8 Dec 2020 17:04:38 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Message-ID: <4d7a4184ac141115ef2ede59d9af30ae47b3a387.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A couple of callers within pack-bitmap.c duplicate logic to lookup a given object id in the bitamps khash. Factor this out into a new function, 'bitmap_for_commit()' to reduce some code duplication. Make this new function non-static, since it will be used in later commits from outside of pack-bitmap.c. Signed-off-by: Taylor Blau --- pack-bitmap.c | 33 +++++++++++++++++++-------------- pack-bitmap.h | 2 ++ 2 files changed, 21 insertions(+), 14 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index d1368b69bb..5efb8af121 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -380,6 +380,16 @@ struct include_data { struct bitmap *seen; }; +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit) +{ + khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps, + commit->object.oid); + if (hash_pos >= kh_end(bitmap_git->bitmaps)) + return NULL; + return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos)); +} + static inline int bitmap_position_extended(struct bitmap_index *bitmap_git, const struct object_id *oid) { @@ -465,10 +475,10 @@ static void show_commit(struct commit *commit, void *data) static int add_to_include_set(struct bitmap_index *bitmap_git, struct include_data *data, - const struct object_id *oid, + struct commit *commit, int bitmap_pos) { - khiter_t hash_pos; + struct ewah_bitmap *partial; if (data->seen && bitmap_get(data->seen, bitmap_pos)) return 0; @@ -476,10 +486,9 @@ static int add_to_include_set(struct bitmap_index *bitmap_git, if (bitmap_get(data->base, bitmap_pos)) return 0; - hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid); - if (hash_pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos); - bitmap_or_ewah(data->base, lookup_stored_bitmap(st)); + partial = bitmap_for_commit(bitmap_git, commit); + if (partial) { + bitmap_or_ewah(data->base, partial); return 0; } @@ -498,8 +507,7 @@ static int should_include(struct commit *commit, void *_data) (struct object *)commit, NULL); - if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid, - bitmap_pos)) { + if (!add_to_include_set(data->bitmap_git, data, commit, bitmap_pos)) { struct commit_list *parent = commit->parents; while (parent) { @@ -1282,10 +1290,10 @@ void test_bitmap_walk(struct rev_info *revs) { struct object *root; struct bitmap *result = NULL; - khiter_t pos; size_t result_popcnt; struct bitmap_test_data tdata; struct bitmap_index *bitmap_git; + struct ewah_bitmap *bm; if (!(bitmap_git = prepare_bitmap_git(revs->repo))) die("failed to load bitmap indexes"); @@ -1297,12 +1305,9 @@ void test_bitmap_walk(struct rev_info *revs) bitmap_git->version, bitmap_git->entry_count); root = revs->pending.objects[0].item; - pos = kh_get_oid_map(bitmap_git->bitmaps, root->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *bm = lookup_stored_bitmap(st); + bm = bitmap_for_commit(bitmap_git, (struct commit *)root); + if (bm) { fprintf(stderr, "Found bitmap for %s. %d bits / %08x checksum\n", oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm)); diff --git a/pack-bitmap.h b/pack-bitmap.h index afa4115136..25dfcf5615 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,6 +78,8 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, int rebuild_bitmap(const uint32_t *reposition, struct ewah_bitmap *source, struct bitmap *dest); +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Tue Dec 8 22:05:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E33F6C2BB9A for ; Tue, 8 Dec 2020 22:06:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BE5BD222B3 for ; Tue, 8 Dec 2020 22:06:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731014AbgLHWGD (ORCPT ); Tue, 8 Dec 2020 17:06:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54312 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730821AbgLHWF6 (ORCPT ); Tue, 8 Dec 2020 17:05:58 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7DE9DC061794 for ; Tue, 8 Dec 2020 14:05:18 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id l200so183819oig.9 for ; Tue, 08 Dec 2020 14:05:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=/3ATCqzziAQFVwBi3ny+EmDvKvp43Oa289NvDRWL6DI=; b=SZPQkqIe9zQEVomKpEuRgLMOvUcoVFXBYipJvAOcaaJCbEZHIZ2WDA85pJwDpFNhZM n+aMmP9jGw4JAz+cDOCBxdcZIzw1u8LQSdTZPQ826HWvIWHkDGAuvWGzV14SM5aibjqE rUarZA2J20fTiTl4Q/jCg187WXSyc+o8kg2MJe+feCstOpXODDAD4GPpkt3kxk8gn+CK eEVKsotpwtG62Om2/uHVhkayX22I3YrZmMhcLJdc8hj8S7/H+2WyrupXrz0fmD2P2EKb QJuk6D4n1S2cQe8N+x/Z80WP4VLnZsTiOwSOJF/7rce2zVBQKi9Qrj3gJS5HseKtpmB4 cjow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=/3ATCqzziAQFVwBi3ny+EmDvKvp43Oa289NvDRWL6DI=; b=SqMAc0en8pd9quYYUqzkpJBGrtw/gUT6+FrSjcoGj690B7brBrLp+3iJwL+EIdgB15 g/MAh0yGbx4HHT39sWuQ6zxxsVvw3pNbRMqWxOXeZiuhOGPrQjFl/d1KTNF58JVjw6UQ 9Ufj5GZBFqQ0Ngx7Fq7qZfmPegFZt4u3kHE1+WzuxmMb5fDd3ziDpx9DhMoH7ROoeYzx toQbEmif1eMnj5Gz7BWrEwySYNzQwQuwrLyuVTOA4Bfq2majPuR7SDDkTgDppeEdnzUY 6haKjlRKO9h4ZqiqE+q0IVhI6i2VCTkhYtt05Xt1OQYHezaJmNkcxvKrkpnHOKYbBx5C 053A== X-Gm-Message-State: AOAM533nCDzivbelPmy/PWR7TxP00rkUzDIdm1vZRI6SKsE5hMbCtWex Zb12SeJNX0c+3q0O7sBuBq5mB0H8pylR/EDa X-Google-Smtp-Source: ABdhPJyFxMUduJ2srEsUO8Gzs/EnsvWcLG+pBmiMPjqY8VEBeZI5qQ5Ld/T4nUWdBftTobxQe+166A== X-Received: by 2002:aca:bc41:: with SMTP id m62mr18157oif.16.1607465117681; Tue, 08 Dec 2020 14:05:17 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id w131sm22911oif.8.2020.12.08.14.05.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:05:17 -0800 (PST) Date: Tue, 8 Dec 2020 17:05:14 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'find_objects()' currently needs to interact with the bitmaps khash pretty closely. To make 'find_objects()' read a little more straightforwardly, remove some of the khash-level details into a new function that describes what it does: 'add_commit_to_bitmap()'. Signed-off-by: Taylor Blau --- pack-bitmap.c | 36 +++++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 5efb8af121..d88745fb02 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -521,6 +521,23 @@ static int should_include(struct commit *commit, void *_data) return 1; } +static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, + struct bitmap **base, + struct commit *commit) +{ + struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit); + + if (!or_with) + return 0; + + if (*base == NULL) + *base = ewah_to_bitmap(or_with); + else + bitmap_or_ewah(*base, or_with); + + return 1; +} + static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct rev_info *revs, struct object_list *roots, @@ -544,21 +561,10 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct object *object = roots->item; roots = roots->next; - if (object->type == OBJ_COMMIT) { - khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *or_with = lookup_stored_bitmap(st); - - if (base == NULL) - base = ewah_to_bitmap(or_with); - else - bitmap_or_ewah(base, or_with); - - object->flags |= SEEN; - continue; - } + if (object->type == OBJ_COMMIT && + add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) { + object->flags |= SEEN; + continue; } object_list_insert(object, ¬_mapped); From patchwork Tue Dec 8 22:05:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959893 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DDC7EC2BB48 for ; Tue, 8 Dec 2020 22:06:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AA45923A33 for ; Tue, 8 Dec 2020 22:06:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730942AbgLHWFo (ORCPT ); Tue, 8 Dec 2020 17:05:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730829AbgLHWFk (ORCPT ); Tue, 8 Dec 2020 17:05:40 -0500 Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com [IPv6:2607:f8b0:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 20BF7C0613CF for ; Tue, 8 Dec 2020 14:05:25 -0800 (PST) Received: by mail-ot1-x343.google.com with SMTP id b18so287380ots.0 for ; Tue, 08 Dec 2020 14:05:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=g8jjZe/eZNNaHHg17+lxlVYaOvNcp9G5BR/nfwQZ4yI=; b=dcRwEr5Oo6/0pYftCOUCot6SnQhS1eNhI81dn/Qp0DNcaSObQSlLYJX6ZuRThr9fBQ VPlqSBClQkoC0L025FgHqBF+W7E/O1EvtA/amseSuegIy9CXOcktLVXSYrTXUvBT7Zlv cBWQj+Z2XXM052Sv6PIrpHlkezjJ9h8D5cUvtIUZDlwwK4ASsGh7J8+bhZerG3cMESEM nujXnMAEYppBxS3d86dMLVQzMvskStX4o/S+YF0XOhe/b4WX2WSLTFoiCNcl/Zzi0gE3 a9yVBgbPc8dE63th7nwftZG3/L2unU26qNkJhi+bDGjJyHqZqg5skfC2EgKA6PFvpQWN blxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=g8jjZe/eZNNaHHg17+lxlVYaOvNcp9G5BR/nfwQZ4yI=; b=biXNi//HXrVkPJ71W0mdQ2EIDKKNk3253/9IhDil9ulVk8OFGbZJUjUhf2O90DPUVD pYYZGf3X/f+/7vLD/2urlGQHlhQMbvCXr7/S1s2BWUoh22Vo6H3hdiSvQQKG+niEOLas HdKN2SbJyMT7VqKT0eWangA2msjXzJxOAqlKmhuH6FZ/Aw7Sr/BD5Z4Hqigfho6ELG33 4x1PqfE5q8F4padBFCfH+LXSRaNbKMi5IE4haz+Cf+n0ADfgl4TibQMLiCFqGtFHjqez wdYzPpp55gS0DnQA8ecoLf/Zs2gJ2kJo3kYMebdWZ4NGKj8J9f/+/1R5y91uTpcOHDUc RbPA== X-Gm-Message-State: AOAM532XZPVIQV+HXppro3I4kIMwBJixWSbrIV3zFxiBQl+NPMQMwPRV W/XmbECzmGnCs7xkUnAA6e5JB5fdJ1EXqXkX X-Google-Smtp-Source: ABdhPJwxlYFxoeEUAsFOYtO+lVTDe0BhDVV1kBNgwngQcrnMWsDijgBqAMGG5qaugRCmOPHUCxl9eQ== X-Received: by 2002:a9d:5ad:: with SMTP id 42mr159124otd.154.1607465124151; Tue, 08 Dec 2020 14:05:24 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id 60sm16755otn.35.2020.12.08.14.05.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:05:23 -0800 (PST) Date: Tue, 8 Dec 2020 17:05:21 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 22/24] pack-bitmap-write: use existing bitmaps Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee When constructing new bitmaps, we perform a commit and tree walk in fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit from using existing bitmaps when available. We must track the existing bitmaps and translate them into the new object order, but this is generally faster than parsing trees. In fill_bitmap_commit(), we must reorder thing somewhat. The priority queue walks commits from newest-to-oldest, which means we correctly stop walking when reaching a commit with a bitmap. However, if we walk trees interleaved with the commits, then we might be parsing trees that are actually part of a re-used bitmap. To avoid over-walking trees, add them to a LIFO queue and walk them after exploring commits completely. On git.git, this reduces a second immediate bitmap computation from 2.0s to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork network, we go from 227s to 198s. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 40 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 333058854d..76c8236f94 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -340,20 +340,37 @@ static void fill_bitmap_tree(struct bitmap *bitmap, static void fill_bitmap_commit(struct bb_commit *ent, struct commit *commit, - struct prio_queue *queue) + struct prio_queue *queue, + struct prio_queue *tree_queue, + struct bitmap_index *old_bitmap, + const uint32_t *mapping) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); prio_queue_put(queue, commit); while (queue->nr) { struct commit_list *p; struct commit *c = prio_queue_get(queue); + if (old_bitmap && mapping) { + struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c); + /* + * If this commit has an old bitmap, then translate that + * bitmap and add its bits to this one. No need to walk + * parents or the tree for this commit. + */ + if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) + continue; + } + + /* + * Mark ourselves and queue our tree. The commit + * walk ensures we cover all parents. + */ bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + prio_queue_put(tree_queue, get_commit_tree(c)); for (p = c->parents; p; p = p->next) { int pos = find_object_pos(&p->item->object.oid); @@ -363,6 +380,9 @@ static void fill_bitmap_commit(struct bb_commit *ent, } } } + + while (tree_queue->nr) + fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue)); } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -386,6 +406,9 @@ void bitmap_writer_build(struct packing_data *to_pack) size_t i; int nr_stored = 0; /* for progress */ struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + struct prio_queue tree_queue = { NULL }; + struct bitmap_index *old_bitmap; + uint32_t *mapping; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -395,6 +418,12 @@ void bitmap_writer_build(struct packing_data *to_pack) trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", the_repository); + old_bitmap = prepare_bitmap_git(to_pack->repo); + if (old_bitmap) + mapping = create_bitmap_mapping(old_bitmap, to_pack); + else + mapping = NULL; + bitmap_builder_init(&bb, &writer); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; @@ -402,7 +431,8 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit, &queue); + fill_bitmap_commit(ent, commit, &queue, &tree_queue, + old_bitmap, mapping); if (ent->selected) { store_selected(ent, commit); @@ -428,7 +458,9 @@ void bitmap_writer_build(struct packing_data *to_pack) ent->bitmap = NULL; } clear_prio_queue(&queue); + clear_prio_queue(&tree_queue); bitmap_builder_clear(&bb); + free(mapping); trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", the_repository); From patchwork Tue Dec 8 22:05:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A237C1B0D9 for ; Tue, 8 Dec 2020 22:06:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 315FD23A1D for ; Tue, 8 Dec 2020 22:06:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731108AbgLHWGR (ORCPT ); Tue, 8 Dec 2020 17:06:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54340 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731062AbgLHWGP (ORCPT ); Tue, 8 Dec 2020 17:06:15 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6B2DDC06179C for ; Tue, 8 Dec 2020 14:05:29 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id f132so163180oib.12 for ; Tue, 08 Dec 2020 14:05:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=w1xB3i7H2Z5DNsLz2Xah7JmKQyWhg02742HJdA61YU0=; b=pCqX8y+Fevi9q8BeLp4N7xBaRE2pxPoBrM1SRj8Vcyr0OZATGXpNKoTi8A1M/Vby+X 4phtQSwqsPMZ5MFzh3EFTWo3MRy0B4FbWWzTNXUZqxT0lIHpGfezUilk3H6gmLcmQLIP k495XmWlMayrLZJxwEFQAjgTLSFVZgjnVEInEmqA6s8GFrJxsm3hdJNu7dHRvyh/F97w zMHV+5MCxMARAcQTueM4qwyx167wga6Vgo/ppihPPuUF1WUzToYhzSgQhD9dUggeBgwz 8t41LsOXGFnaCgRA1c+/ZPLHqYabosc/8zIywI3L1zATspLo0ko8G+u6DkGVJRKzD6Ow U7mQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=w1xB3i7H2Z5DNsLz2Xah7JmKQyWhg02742HJdA61YU0=; b=Lj+FCd547YIdiIPDoCAZyMv8kh1eGmMcKd/oomNQw6+/Lgt13kw9bAbJbZT5hkOozs LpiCgksVL+KxGX4lgYElmZJP5b/vOawAa9gwtup8StiayKVFkdxBFmnp977J7maoaA0O 1ObYGfi30nrMBJKs46+lz9ziJs/lBjqKYPWQ930W1vPgvLZM+qJXklvr7GpRAbXA9PWi 4JgzrMJEEZxxoxJ3eKbmxUrWNR+ZR0r6FDEhTvUB6EFaNoCbBYGQ5CEOQ4mCqOHwxFzP iZbPodKobf02rkBcbaIg2NEf3C/g+0JwkHTqNYMv6fL5fc6aK/ar5nr/C9V1rJ3ollQ0 Bb9A== X-Gm-Message-State: AOAM5330lnX0mhEc9etWMJLi1t+CyQuxlo3Ok03YjuNfTtpJ71SBIV7o DAF5/jerpcSdrhRX4J8Hi08JmKZAvYRdBMak X-Google-Smtp-Source: ABdhPJzzQ/6IVbCt/zOpHmsVkbf1ci0qgKqTssx+EjXroMNkal8x4o8lW63NqGDfLLGN1VLKO2neTw== X-Received: by 2002:aca:5a42:: with SMTP id o63mr4252310oib.69.1607465128462; Tue, 08 Dec 2020 14:05:28 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id j62sm14562otc.49.2020.12.08.14.05.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:05:27 -0800 (PST) Date: Tue, 8 Dec 2020 17:05:26 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 23/24] pack-bitmap-write: relax unique revwalk condition Message-ID: <8f9fdb0f43ef6aa6796ef50d39a4c911259aa3db.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The previous commits improved the bitmap computation process for very long, linear histories with many refs by removing quadratic growth in how many objects were walked. The strategy of computing "intermediate commits" using bitmasks for which refs can reach those commits partitioned the poset of reachable objects so each part could be walked exactly once. This was effective for linear histories. However, there was a (significant) drawback: wide histories with many refs had an explosion of memory costs to compute the commit bitmasks during the exploration that discovers these intermediate commits. Since these wide histories are unlikely to repeat walking objects, the benefit of walking objects multiple times was not expensive before. But now, the commit walk *before computing bitmaps* is incredibly expensive. In an effort to discover a happy medium, this change reduces the walk for intermediate commits to only the first-parent history. This focuses the walk on how the histories converge, which still has significant reduction in repeat object walks. It is still possible to create quadratic behavior in this version, but it is probably less likely in realistic data shapes. Here is some data taken on a fresh clone of the kernel: | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- original | 64.044 | 83.241 | 2.088 | 2.194 | last patch | 45.049 | 37.624 | 2.267 | 2.334 | this patch | 88.478 | 53.218 | 2.157 | 2.224 | Signed-off-by: Derrick Stolee Helped-by: Johannes Schindelin Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 14 +++++--------- t/t5310-pack-bitmaps.sh | 33 +++++++++++++++++---------------- 2 files changed, 22 insertions(+), 25 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 76c8236f94..d2af4a974f 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -199,7 +199,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i, num_maximal; + unsigned int i, num_maximal = 0; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -207,6 +207,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, reset_revision_walk(); repo_init_revisions(writer->to_pack->repo, &revs, NULL); revs.topo_order = 1; + revs.first_parent_only = 1; for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; @@ -221,13 +222,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, add_pending_object(&revs, &c->object, ""); } - num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { - struct commit_list *p; + struct commit_list *p = commit->parents; struct bb_commit *c_ent; parse_commit_or_die(commit); @@ -235,16 +235,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); if (c_ent->maximal) { - if (!c_ent->selected) { - bitmap_set(c_ent->commit_mask, num_maximal); - num_maximal++; - } - + num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); bb->commits[bb->commits_nr++] = commit; } - for (p = commit->parents; p; p = p->next) { + if (p) { struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); int c_not_p, p_not_c; diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 6815fb6a4e..3a2c9d2d8e 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -23,12 +23,12 @@ has_any () { # To ensure the logic for "maximal commits" is exercised, make # the repository a bit more complicated. # -# other master +# other second # * * # (99 commits) (99 commits) # * * # |\ /| -# | * octo-other octo-master * | +# | * octo-other octo-second * | # |/|\_________ ____________/|\| # | \ \/ __________/ | # | | ________/\ / | @@ -43,23 +43,24 @@ has_any () { # \| # * (base) # +# We only push bits down the first-parent history, which +# makes some of these commits unimportant! +# # The important part for the maximal commit algorithm is how # the bitmasks are extended. Assuming starting bit positions -# for master (bit 0) and other (bit 1), and some flexibility -# in the order that merge bases are visited, the bitmasks at -# the end should be: +# for second (bit 0) and other (bit 1), the bitmasks at the +# end should be: # -# master: 1 (maximal, selected) +# second: 1 (maximal, selected) # other: 01 (maximal, selected) -# octo-master: 1 -# octo-other: 01 -# merge-right: 111 (maximal) -# (l1): 111 -# (r1): 111 -# merge-left: 1101 (maximal) -# (l2): 11111 (maximal) -# (r2): 111101 (maximal) -# (base): 1111111 (maximal) +# (base): 11 (maximal) +# +# This complicated history was important for a previous +# version of the walk that guarantees never walking a +# commit multiple times. That goal might be important +# again, so preserve this complicated case. For now, this +# test will guarantee that the bitmaps are computed +# correctly, even with the repeat calculations. test_expect_success 'setup repo with moderate-sized history' ' test_commit_bulk --id=file 10 && @@ -114,7 +115,7 @@ test_expect_success 'full repack creates bitmaps' ' ls .git/objects/pack/ | grep bitmap >output && test_line_count = 1 output && grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && - grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace + grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' From patchwork Tue Dec 8 22:05:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11959913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD20FC2BB48 for ; Tue, 8 Dec 2020 22:06:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 83D8C23A1D for ; Tue, 8 Dec 2020 22:06:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730468AbgLHWG1 (ORCPT ); Tue, 8 Dec 2020 17:06:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730971AbgLHWGO (ORCPT ); Tue, 8 Dec 2020 17:06:14 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E27DC0617A6 for ; Tue, 8 Dec 2020 14:05:34 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id f132so163453oib.12 for ; Tue, 08 Dec 2020 14:05:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=QL9l0cHKGO73C4wCfiM9Q7odgpoKI5hiF7YgL6COI5Y=; b=cfgB3vlSZ/k+twq9mbxJ5QwReuUG10539HvOdLLYLcUO8Nr7iBGn1pek7DSnf82/A9 N7dv9P2Lghwr4rK7mRAyIVlPsn4LlArpo3th5q20aKSPfWviVgpMPrj/u4PhGXbG+nmc G0inTpcOGVe7ViOTh3LdYQvjqiqbuRMkb3IvFwkkjimN8Bp6PEz9bPCPhvtye/KV21lf bvpGP3MEi8OIlrWXP6mi5Y4P/jyyzjP9Peut5+nkvzUlSoFOP8iK7POOAo3ysJuY7kmv Jd4MOJknoy1cVzZy6ZGiiDZQ/nHr+7GP3zIXyap24rW/ievZQGdeeWxe1mUFK9vu+Dk6 +UXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=QL9l0cHKGO73C4wCfiM9Q7odgpoKI5hiF7YgL6COI5Y=; b=Paw3adlbqEAqJFNAT/s42ZqxL+JCLLrd2PewKk28oiqhjGreKY+/6+B8kx0EWvd8OV NeIJjxQ/Il+fA87Y/IQzar2rWuBMG73q2ANOO4LFKlEjxoO081u/POZTDU3Hbcu50Fwr cSUCHT09l+3MPAgwijf40R/1I5lIO1YzuQmU6Qy6+5rfpSxrjrtzrgZWm3o+b792vwic jZgofMj+y/RsSegVu4oMkTr3HtqUOLRqWLfXaoQPxfwPkL4N1Szm23qkHj12RFJPhSBD OpomeNjszRYV/W815rLsyCKRhXFqa0z51UOf+BIYSJ/z9SL1h+dUYJ3RtOvPARrQMJY7 mdug== X-Gm-Message-State: AOAM530sbxKEKEBW942XReuKdr05nUPhqlALEXLsLtP5/zc0UOvxCAvm u+gIKrGAGZUsXnosrBdbuq+M9hXYl2TeNltY X-Google-Smtp-Source: ABdhPJxXru8+cXKqORtE4BJiYEzoGQDpAGnpV7FYLSsXCJi+5JQTIfDfRTZ7CP/fY17JIe9fd24wBA== X-Received: by 2002:aca:5a86:: with SMTP id o128mr4442303oib.23.1607465133262; Tue, 08 Dec 2020 14:05:33 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id m10sm20112oig.27.2020.12.08.14.05.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 08 Dec 2020 14:05:32 -0800 (PST) Date: Tue, 8 Dec 2020 17:05:30 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v4 24/24] pack-bitmap-write: better reuse bitmaps Message-ID: <720b6e0dc7d69bf3454c57002a67a488c25050e4.1607464775.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee If the old bitmap file contains a bitmap for a given commit, then that commit does not need help from intermediate commits in its history to compute its final bitmap. Eject that commit from the walk and insert it into a separate list of reusable commits that are eventually stored in the list of commits for computing bitmaps. This helps the repeat bitmap computation task, even if the selected commits shift drastically. This helps when a previously-bitmapped commit exists in the first-parent history of a newly-selected commit. Since we stop the walk at these commits and we use a first-parent walk, it is harder to walk "around" these bitmapped commits. It's not impossible, but we can greatly reduce the computation time for many selected commits. | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- last patch | 88.478 | 53.218 | 2.157 | 2.224 | this patch | 86.681 | 16.164 | 2.157 | 2.222 | Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 40 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index d2af4a974f..cc5ead9990 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -195,10 +195,13 @@ struct bitmap_builder { }; static void bitmap_builder_init(struct bitmap_builder *bb, - struct bitmap_writer *writer) + struct bitmap_writer *writer, + struct bitmap_index *old_bitmap) { struct rev_info revs; struct commit *commit; + struct commit_list *reusable = NULL; + struct commit_list *r; unsigned int i, num_maximal = 0; memset(bb, 0, sizeof(*bb)); @@ -234,6 +237,31 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); + /* + * If there is no commit_mask, there is no reason to iterate + * over this commit; it is not selected (if it were, it would + * not have a blank commit mask) and all its children have + * existing bitmaps (see the comment starting with "This commit + * has an existing bitmap" below), so it does not contribute + * anything to the final bitmap file or its descendants. + */ + if (!c_ent->commit_mask) + continue; + + if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) { + /* + * This commit has an existing bitmap, so we can + * get its bits immediately without an object + * walk. That is, it is reusable as-is and there is no + * need to continue walking beyond it. + * + * Mark it as such and add it to bb->commits separately + * to avoid allocating a position in the commit mask. + */ + commit_list_insert(commit, &reusable); + goto next; + } + if (c_ent->maximal) { num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); @@ -278,14 +306,22 @@ static void bitmap_builder_init(struct bitmap_builder *bb, } } +next: bitmap_free(c_ent->commit_mask); c_ent->commit_mask = NULL; } + for (r = reusable; r; r = r->next) { + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = r->item; + } + trace2_data_intmax("pack-bitmap-write", the_repository, "num_selected_commits", writer->selected_nr); trace2_data_intmax("pack-bitmap-write", the_repository, "num_maximal_commits", num_maximal); + + free_commit_list(reusable); } static void bitmap_builder_clear(struct bitmap_builder *bb) @@ -420,7 +456,7 @@ void bitmap_writer_build(struct packing_data *to_pack) else mapping = NULL; - bitmap_builder_init(&bb, &writer); + bitmap_builder_init(&bb, &writer, old_bitmap); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit);