From patchwork Tue Dec 8 00:04:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD883C4361B for ; Tue, 8 Dec 2020 00:05:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 652C823A1D for ; Tue, 8 Dec 2020 00:05:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728405AbgLHAFC (ORCPT ); Mon, 7 Dec 2020 19:05:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726207AbgLHAFC (ORCPT ); Mon, 7 Dec 2020 19:05:02 -0500 Received: from mail-oi1-x243.google.com (mail-oi1-x243.google.com [IPv6:2607:f8b0:4864:20::243]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A53A3C061794 for ; Mon, 7 Dec 2020 16:04:16 -0800 (PST) Received: by mail-oi1-x243.google.com with SMTP id f132so3073124oib.12 for ; Mon, 07 Dec 2020 16:04:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=7VnFwsY7xM/vegQ12SEFiiCQRRPhzMko6e1us1C3KBY=; b=ABL9BeKYAHk+ThcnqfnAiQfwPEThnyKcMGQt09fQs9uxrlGayw2Vf4TYO6MZ/P2PqX zoRTTMt+04/6xIEtBTY2RlsmRrtEOpfSBxRwJ158XSDGAkSJzhGZxQtckh6zmxsoGr6Y Xv66lDHyuD2bLvKulMcrmCAk+EbHeJ+CHGnL9aBsz3cI8KJdWqyQSzTU7dQejWUKLaVK mGJHCITkynbAoRzNLIY0OrG/KFxR0/kS5z9rvntpKRrVNlf6pCmtRYzXCFo0KyxRjKcv C71UmWefbfZwMPRYge4HtqYUTFdAsM5WhXYSSI77N3ctB6r6lGUnZW/6nb2JYIU22pnd l2ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=7VnFwsY7xM/vegQ12SEFiiCQRRPhzMko6e1us1C3KBY=; b=UV9qtgbqb281Bazvecuw8fDV2ooz/Sr8K+fza4EQ88GqHUThs7bfYxxE5SB/uerXKa RF7Z97PWFgpq7fSdXn1/U6yiEsIyhPQsK05B+YnZRMr2raP/9ThVJi4yVUtCZ1y0zyj4 rkK1376L4hUR0347bNETEkXvaB7QeXLbxrXYkAxqhsio/CPMg41fs05TrTnkF0o1d4Vj s5oQadPa67fkGcnXrtw0jWHiPGiYltCYYmx7HpgpXFO756OJxZnr33rBc2jPEf5jH/wg /gYqVSCMpjNYewVOAbQ5Q1CjtOCd2bkHP3mJhGNhd6KfLrsVUda7zPuvwnXkplgFNIZk Ll2w== X-Gm-Message-State: AOAM5308qvYPgz2XqGkLSXB7C2rGQ4n9vlFXLj3aBqWZl7JwdXwPn3NU iZR7G7mQNcEfKuxHMzM9qtIyCeJ/vDfJHedO X-Google-Smtp-Source: ABdhPJx5DPO6Z+bTUq9TG8lpPu5Do3RS1erfVRcPv5kEhvhZhJcANtvqhC8ETQzVqCcc0sRq9V6wug== X-Received: by 2002:a05:6808:b26:: with SMTP id t6mr919099oij.169.1607385855822; Mon, 07 Dec 2020 16:04:15 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id g13sm2327221otl.47.2020.12.07.16.04.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:15 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:12 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 01/24] ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW() Message-ID: <0b25ba4ca7133be1f7b763ba9c3d18dbdd03f37a.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'ewah/ewah_bitmap.c:buffer_grow()' is responsible for growing the buffer used to store the bits of an EWAH bitmap. It is essentially doing the same task as the 'ALLOC_GROW()' macro, so use that instead. This simplifies the callers of 'buffer_grow()', who no longer have to ask for a specific size, but rather specify how much of the buffer they need. They also no longer need to guard 'buffer_grow()' behind an if statement, since 'ALLOC_GROW()' (and, by extension, 'buffer_grow()') is a noop if the buffer is already large enough. But, the most significant change is that this fixes a bug when calling buffer_grow() with both 'alloc_size' and 'new_size' set to 1. In this case, truncating integer math will leave the new size set to 1, causing the buffer to never grow. Instead, let alloc_nr() handle this, which asks for '(new_size + 16) * 3 / 2' instead of 'new_size * 3 / 2'. Signed-off-by: Taylor Blau --- ewah/ewah_bitmap.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c index d59b1afe3d..2a8c7c5c33 100644 --- a/ewah/ewah_bitmap.c +++ b/ewah/ewah_bitmap.c @@ -19,6 +19,7 @@ #include "git-compat-util.h" #include "ewok.h" #include "ewok_rlw.h" +#include "cache.h" static inline size_t min_size(size_t a, size_t b) { @@ -33,20 +34,13 @@ static inline size_t max_size(size_t a, size_t b) static inline void buffer_grow(struct ewah_bitmap *self, size_t new_size) { size_t rlw_offset = (uint8_t *)self->rlw - (uint8_t *)self->buffer; - - if (self->alloc_size >= new_size) - return; - - self->alloc_size = new_size; - REALLOC_ARRAY(self->buffer, self->alloc_size); + ALLOC_GROW(self->buffer, new_size, self->alloc_size); self->rlw = self->buffer + (rlw_offset / sizeof(eword_t)); } static inline void buffer_push(struct ewah_bitmap *self, eword_t value) { - if (self->buffer_size + 1 >= self->alloc_size) - buffer_grow(self, self->buffer_size * 3 / 2); - + buffer_grow(self, self->buffer_size + 1); self->buffer[self->buffer_size++] = value; } @@ -137,8 +131,7 @@ void ewah_add_dirty_words( rlw_set_literal_words(self->rlw, literals + can_add); - if (self->buffer_size + can_add >= self->alloc_size) - buffer_grow(self, (self->buffer_size + can_add) * 3 / 2); + buffer_grow(self, self->buffer_size + can_add); if (negate) { size_t i; From patchwork Tue Dec 8 00:04:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DBCC4C197BF for ; Tue, 8 Dec 2020 00:05:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A9D0123A22 for ; Tue, 8 Dec 2020 00:05:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728430AbgLHAFH (ORCPT ); Mon, 7 Dec 2020 19:05:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45334 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728311AbgLHAFH (ORCPT ); Mon, 7 Dec 2020 19:05:07 -0500 Received: from mail-oi1-x243.google.com (mail-oi1-x243.google.com [IPv6:2607:f8b0:4864:20::243]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E4F3C06179C for ; Mon, 7 Dec 2020 16:04:21 -0800 (PST) Received: by mail-oi1-x243.google.com with SMTP id v85so6968793oia.6 for ; Mon, 07 Dec 2020 16:04:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=0biD3z7zmbAd3yPu0/4ZJpBDWIf7gZwf31aSZXDeAwY=; b=BsFstG/chYQOyqJYvyeGq4IcYqyKHxA+N2e9TqzgH6pAOpReFMqCbrwInrmZ1fMSFe cemZVcHUGXKUCPGyDrKX25Ej/54C+6bh7t06u2Qt53wkFvibZYAWCoD5gj9psZ/gMJLS qPKunGY94e9OFHIkqOhpuJ31uSPy7D3sho7KKUrKzgdTp1MPaIkccI9AjnANAjqzNcfM iPX1WmzJYC/okMZZvFVsR+gvAxy9BDtyyBFRwmcWKQtvoif3WqSkNi/rjfkcyBxKl2T4 0etgcjDOJ4ESEtZUakGJczfSxtjIHtBPDPbRtzZEd8B66+NCh8a/yuI/VI67peHHD+6R tg3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=0biD3z7zmbAd3yPu0/4ZJpBDWIf7gZwf31aSZXDeAwY=; b=bfcTpSnXzz8YVzzyj8drHCXsHWcdn1H0AV17DUBm8SDGV9sEgcBDR8mCkmrjqD+iVs 2nTAOH/pTachiQeffCwvuqhjgWpLahVQGEH+AT2QWlxX3hrK8sVdwtRpExE79NSscB1w ekHr2myNMua5xZbqoUXghmYoGWVmqxrFbEqo1N2HyAJ4ykGr0AoOmpq+HA4C9S8PFCOd DMRwPwFlZbCJn3YFZqYHhrHVcthoxV1vu9u+iwbnrbNK3Mr9phVTNTSoxLeRJ27d81AX sZu1fAD5SwwqCCbksiq8nsFdyo7d4+DuRqI3tT7vOuWiegEazXcCVVQy1GK3IZFy3quu Yavg== X-Gm-Message-State: AOAM531sAfGc6aV5TDG7p1xfMXyrajwuhjj9yuR6Ma9+vDIEb0QmaEIF xWQg3CfMN3aUHDauE2wD4fpGtMhXXjugc9iO X-Google-Smtp-Source: ABdhPJzXO3IkJnY+scjzi+JeQN6vkwHz97+Dmk5lIBqc4NAnS2BCc1Xtb3zFd7uiFqL1x6o1GMhDMQ== X-Received: by 2002:aca:ef44:: with SMTP id n65mr1012874oih.90.1607385860707; Mon, 07 Dec 2020 16:04:20 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id t12sm1976021oot.21.2020.12.07.16.04.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:20 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:17 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 02/24] pack-bitmap: fix header size check Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King When we parse a .bitmap header, we first check that we have enough bytes to make a valid header. We do that based on sizeof(struct bitmap_disk_header). However, as of 0f4d6cada8 (pack-bitmap: make bitmap header handling hash agnostic, 2019-02-19), that struct oversizes its checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the difference between that constant and the size of the actual hash we're using. That commit adjusted the code which moves our pointer forward, but forgot to update the size check. This meant we were overly strict about the header size (requiring room for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in practice it didn't matter because bitmap files tend to have at least 12 bytes of actual data anyway, so it was unlikely for a valid file to be caught by this. Let's fix it by pulling the header size into a separate variable and using it in both spots. That fixes the bug and simplifies the code to make it harder to have a mismatch like this in the future. It will also come in handy in the next patch for more bounds checking. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 4077e731e8..fe5647e72e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -138,9 +138,10 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) static int load_bitmap_header(struct bitmap_index *index) { struct bitmap_disk_header *header = (void *)index->map; + size_t header_size = sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; - if (index->map_size < sizeof(*header) + the_hash_algo->rawsz) - return error("Corrupted bitmap index (missing header data)"); + if (index->map_size < header_size + the_hash_algo->rawsz) + return error("Corrupted bitmap index (too small)"); if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0) return error("Corrupted bitmap index file (wrong header)"); @@ -164,7 +165,7 @@ static int load_bitmap_header(struct bitmap_index *index) } index->entry_count = ntohl(header->entry_count); - index->map_pos += sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; + index->map_pos += header_size; return 0; } From patchwork Tue Dec 8 00:04:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BEF05C0018C for ; Tue, 8 Dec 2020 00:05:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 86F4D23A23 for ; Tue, 8 Dec 2020 00:05:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728414AbgLHAFG (ORCPT ); Mon, 7 Dec 2020 19:05:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45348 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728311AbgLHAFG (ORCPT ); Mon, 7 Dec 2020 19:05:06 -0500 Received: from mail-ot1-x331.google.com (mail-ot1-x331.google.com [IPv6:2607:f8b0:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D702C0617B0 for ; Mon, 7 Dec 2020 16:04:26 -0800 (PST) Received: by mail-ot1-x331.google.com with SMTP id h19so14338421otr.1 for ; Mon, 07 Dec 2020 16:04:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=4+YaVsaIeql0R9c8t74ily3XElRCcVkPGK0I++sGLHk=; b=T5xQ6P30JuZ2+Ax5QRAdn2jwIa4TEO1r52wg6nI04La1jELXsW3QXxDvm2PaEi9/ri WSqu/OPBdBed8+/d6K6uKXf3NCOriCqXDtGdhVTEYeRoYCouGPmMapdGfyH5BRfdsfup oifhpdyppZVsumhXr1I6/TYElZf4F6ZZCCBYj0GbNUqYStuc/OVS7Zb1XFIx8CnX09e9 v8QJRuECXaEE8HW2l0onc3ISyDDO8+73zpgAslHg17U97Fs48IcLlKbz7G7Aw4bfTziC prbEb9PvYvOFH/LSLl70RLHPH2LwcPjRFBSVhychDqsBAWimDRPf5sgXD/omdtSKFiqD VyMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=4+YaVsaIeql0R9c8t74ily3XElRCcVkPGK0I++sGLHk=; b=bTeBS9j6jZOv9/xjVvi7bCcfwCMJjTOuiLmaog2n+nFUVKuzeBsXXo2eiMieqU1Pew +SDvPbdTIVRFRScNGZbTZ2o53jFeQ8Z3ef/lEzEF0shw51TXw23QDz3qxUsXmCzY2pfe zETGW8ggQpVYAo9wX3hai6Y1iBB6LlIXbfSlelsgqUkViue5Ng0xjzRJfwnxZ2soF32J WTrKYQ76Fs6mWhLJzV5UtI/l/6LJCx9kGpiuJ67yPiTDNdSFs8Bs5POXuiIcRXAhI/ic e0Z34WSD3sYnv2Dsbsv/zVX7UmwJbfRBCs0Vk/pEGW98bE2tDgiApFuBZOSw/vfgLRwS wAHw== X-Gm-Message-State: AOAM532CdFpqmCB2icZ6TFZ39XPAOtBKOjQeb1h0iabAB3qnyLYIeAWA vmvhlo7LfvSN6Ge42geHwizvYMkI7IrX7ZUo X-Google-Smtp-Source: ABdhPJx5xT0bSS2on3YgM3bk/+pKxn6jpfP31F1eYHtSnUDeLy/sMa80UkrVDi2JBF4ecR1Qb5pT8Q== X-Received: by 2002:a9d:64da:: with SMTP id n26mr14569581otl.64.1607385865526; Mon, 07 Dec 2020 16:04:25 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id g3sm3327357oif.26.2020.12.07.16.04.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:24 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:22 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 03/24] pack-bitmap: bounds-check size of cache extension Message-ID: <73224274444ea1385bb98b26710725bbb66a9128.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t values (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps. We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 8 ++++++-- t/t5310-pack-bitmaps.sh | 17 +++++++++++++++-- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index fe5647e72e..074d9ac8f2 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -153,14 +153,18 @@ static int load_bitmap_header(struct bitmap_index *index) /* Parse known bitmap format options */ { uint32_t flags = ntohs(header->options); + size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t)); + unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz; if ((flags & BITMAP_OPT_FULL_DAG) == 0) return error("Unsupported options for bitmap index file " "(Git requires BITMAP_OPT_FULL_DAG)"); if (flags & BITMAP_OPT_HASH_CACHE) { - unsigned char *end = index->map + index->map_size - the_hash_algo->rawsz; - index->hashes = ((uint32_t *)end) - index->pack->num_objects; + if (cache_size > index_end - index->map - header_size) + return error("corrupted bitmap index file (too short to fit hash cache)"); + index->hashes = (void *)(index_end - cache_size); + index_end -= cache_size; } } diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 1d40fcad39..dbe1ffc88a 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -343,7 +343,8 @@ test_expect_success 'pack reuse respects --incremental' ' test_must_be_empty actual ' -test_expect_success 'truncated bitmap fails gracefully' ' +test_expect_success 'truncated bitmap fails gracefully (ewah)' ' + test_config pack.writebitmaphashcache false && git repack -ad && git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && @@ -352,7 +353,19 @@ test_expect_success 'truncated bitmap fails gracefully' ' mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && - test_i18ngrep corrupt stderr + test_i18ngrep corrupt.ewah.bitmap stderr +' + +test_expect_success 'truncated bitmap fails gracefully (cache)' ' + git repack -ad && + git rev-list --use-bitmap-index --count --all >expect && + bitmap=$(ls .git/objects/pack/*.bitmap) && + test_when_finished "rm -f $bitmap" && + test_copy_bytes 512 <$bitmap >$bitmap.tmp && + mv -f $bitmap.tmp $bitmap && + git rev-list --use-bitmap-index --count --all >actual 2>stderr && + test_cmp expect actual && + test_i18ngrep corrupted.bitmap.index stderr ' # have_delta From patchwork Tue Dec 8 00:04:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957137 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 765F8C4361B for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4766B23A1D for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728510AbgLHAFR (ORCPT ); Mon, 7 Dec 2020 19:05:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45360 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728426AbgLHAFR (ORCPT ); Mon, 7 Dec 2020 19:05:17 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B393C061793 for ; Mon, 7 Dec 2020 16:04:31 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id h19so14338575otr.1 for ; Mon, 07 Dec 2020 16:04:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=KUBJcsWeoRI5VoCgkX2lkcFq3ATa4y+wUU2KV7/4Hsk=; b=L7ztPi/gyhZsSxItNSCo/YyQR4VUz29DP9jG02LtTDcIDkydx6yhiSbZZgEnLgw2Ya zHbCcj1VyoPtymSrs+KI/oPWkshsSGrAEckg6w2UeJkphOPY3QhkSZdXLLVL88Y4YRFk TlFthoeTZ9k+5b4cemg5S+AoRQQ32A3hNsLw3e9sSRmnYKjFGsDm2LmThPZWEfe66Oh8 K++s/k49+2fXKGPbt7kCSlx7kqQDP4C8fD/0FBgPj9RM7iMD9+KV4sMQn0AzmyPa1uyC l9wfDvDEEbCCZlwOXnwQw6wOOaRryR7w1qE8ptlIzlIHsNu+rQ3Rf12Cig6iAZM/ttuI ymMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=KUBJcsWeoRI5VoCgkX2lkcFq3ATa4y+wUU2KV7/4Hsk=; b=CXddz+Wy6/HwwxUePT/m46XF1CXIIhESx/hdbLc3FWTRWv1WlFCarKrj4BOYnHF4rf L3gFYV+QT1b4JHvKUZJMMvr4gG0bAlJQ2c8cNLwY0T8UwUeBG1B3Wr0rpCFsoltRk7Cp YzRGArWSYKrDhxA2JARli7r65Nv32Jrf3UmsLjlVXgWe3WMx3POWi8YMBF76xOZdamp/ TfH+lGH/lG7DbvRFDrgc3tL/hYhGCIl80W0kRGKnygv6P8VsicsYgpPKhYyYMHYdOJp0 3Pz6l1TSZDagzXK9vj0y6zFEhxUNuP977Gy+EOMaH1EVvq9poBas8nU3AR7WfChNLfrJ bfxA== X-Gm-Message-State: AOAM53161dJQxWQcY2j7jwLwDmkqi5BbNW6rA2cNBmlDRDJpPTeUQSW/ LeZGwHhSpk7VVW/m4Awl3G9y3QRTPlkUP1/h X-Google-Smtp-Source: ABdhPJxz6hHn3mrFvycDQFs4pr5q/scXYQIzC6aLzJjAjmGKrqlyV7P+DHiZ+tI73Xd/vOgHStMIrw== X-Received: by 2002:a9d:508:: with SMTP id 8mr8596704otw.338.1607385870262; Mon, 07 Dec 2020 16:04:30 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id u15sm592094oiv.28.2020.12.07.16.04.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:29 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:27 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 04/24] t5310: drop size of truncated ewah bitmap Message-ID: <055bc1fe66d415bc8e6dcef3e7201d007608e27f.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We truncate the .bitmap file to 512 bytes and expect to run into problems reading an individual ewah file. But this length is somewhat arbitrary, and just happened to work when the test was added in 9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14). An upcoming commit will change the size of the history we create in the test repo, which will cause this test to fail. We can future-proof it a bit more by reducing the size of the truncated bitmap file. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index dbe1ffc88a..8a2a3b2114 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -349,7 +349,7 @@ test_expect_success 'truncated bitmap fails gracefully (ewah)' ' git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && test_when_finished "rm -f $bitmap" && - test_copy_bytes 512 <$bitmap >$bitmap.tmp && + test_copy_bytes 256 <$bitmap >$bitmap.tmp && mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && From patchwork Tue Dec 8 00:04:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957135 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 612C3C433FE for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2F40F23A04 for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728537AbgLHAFR (ORCPT ); Mon, 7 Dec 2020 19:05:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45376 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAFR (ORCPT ); Mon, 7 Dec 2020 19:05:17 -0500 Received: from mail-oi1-x242.google.com (mail-oi1-x242.google.com [IPv6:2607:f8b0:4864:20::242]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 012EFC06138C for ; Mon, 7 Dec 2020 16:04:37 -0800 (PST) Received: by mail-oi1-x242.google.com with SMTP id y74so17505712oia.11 for ; Mon, 07 Dec 2020 16:04:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=aTPqehF/iR92lHl5qU6Qtb+pF/tTsVor4twzj8uNe5U=; b=CoQkznRX2vLhrMm4wInB7eGzfOoBCY/H2eRTME8ZsIqvl2BsaCaH3lY+/BTMSerZMU KLin/yJ7JR1uV5A+EtfZsI1XRhkU/to26EGi2Givinv76IxvKQLc0jUHiip07qAVqsow uajIvLRuX+k2HD/1V+dkExod1EnfXhD5v4s5ZHolDTT8tp20v1MYE/IOdNag2FT8jl/0 gpg6JkvyGYjGkie3T4rP0C+j+mPI7zMxV+H3DLISG3a83IdLyeaHba+Q4ORBiJIDZqfy ejSXBVYusip9m+GGL63kAHkDibVKOR6P7FpATIkGE8w35puKge3CgL8cL1VWBdfnr+Un 4LXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=aTPqehF/iR92lHl5qU6Qtb+pF/tTsVor4twzj8uNe5U=; b=VydywP/5Sd3sZW4IhHSMSNesvIaKsx4LGga7Y5w/1vICmdO5Ud3msMItE41msErYI3 z7pyHT+gLx3LyqAHmtZNQQyqZ3K8pVwOYl9jBLWiQJQn/0P7cd/g2BG3SkSrxoQsqMCD QMEAhsmGEow8BreCWW0xwJPX6Y+zqWmS26+ScLP1sJ9f90ulk08rTa1iNdi7RAPhsFpk mIUWA3p0Ai3mHHcBzIr9OkDGRwU9BL+gvddzSI42k2fOPtXC4L4ZJxBMVgN/sUKgQH4R PwWj8uNRfO8UbyYIuymCBe7mGL6pa0hw8ba0gl7d+qOi/eOea13bcJdGUrAbaS5nvykl Reww== X-Gm-Message-State: AOAM531a/eqHhlq5u+orGH8ef0UTSCy8EymLktVLobM6cP+pd7jnNDMa Yc15iX3P/60IJQlSqAeED9yKTbuhg0rRtNXS X-Google-Smtp-Source: ABdhPJwV/jBhN3SEBoqD+2DAt5+aAO01p6ddOSJjMMBrBIcoIpSiZc2moSiF4Z5yOdHjLUv6p018wA== X-Received: by 2002:aca:3ad6:: with SMTP id h205mr939499oia.119.1607385876104; Mon, 07 Dec 2020 16:04:36 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id n31sm1085098otn.33.2020.12.07.16.04.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:35 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:32 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 05/24] rev-list: die when --test-bitmap detects a mismatch Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King You can use "git rev-list --test-bitmap HEAD" to check that bitmaps produce the same answer we'd get from a regular traversal. But if we detect an error, we only print "mismatch", and still exit with a successful error code. That makes the uses of --test-bitmap in the test suite (e.g., in t5310) mostly pointless: even if we saw an error, the tests wouldn't notice. Let's instead call die(), which will let these tests work as designed, and alert us if the bitmaps are bogus. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 074d9ac8f2..4431f9f120 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1328,7 +1328,7 @@ void test_bitmap_walk(struct rev_info *revs) if (bitmap_equals(result, tdata.base)) fprintf(stderr, "OK!\n"); else - fprintf(stderr, "Mismatch!\n"); + die("mismatch in bitmap results"); free_bitmap_index(bitmap_git); } From patchwork Tue Dec 8 00:04:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957147 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9809CC4167B for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 755DD23A04 for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728547AbgLHAFi (ORCPT ); Mon, 7 Dec 2020 19:05:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728514AbgLHAFi (ORCPT ); Mon, 7 Dec 2020 19:05:38 -0500 Received: from mail-oi1-x241.google.com (mail-oi1-x241.google.com [IPv6:2607:f8b0:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B4A36C061257 for ; Mon, 7 Dec 2020 16:04:41 -0800 (PST) Received: by mail-oi1-x241.google.com with SMTP id s75so14467023oih.1 for ; Mon, 07 Dec 2020 16:04:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=S+PfCUh8SD4AoMOsjP36g0QEn/qzOI2Cq9Wdhx/41HM=; b=fOl3IMuNceSm3Kka1Uxw7pxRpnpOupmFNK6xaZwEaYkUiOmQDBrRyEguuCAYQsSU1+ l4bke7XeJGv8WO3EetogxgxGB+tZAUHPnNWD32KMK06Hp0EW+4J6MbtCmEaKsCkGLFH5 BoXk6cCBEa4RL9lLD4n9UdFSRRRmjsYGWQ+VuKdde3L6FbQod+zcvuWWhqmHUhHF1VcH 64L9+/up8viG0gDc9TH6VTy7cJgW21zBqGvJm2O/aBT8G1B4MpeqTZGSe4yQqD7G91us V1a6GvnDComgH4T5e6dEshhT1ojAE3HiFQwfSt4lOJAT8bcP8I+hwhfSV2l9CJ7/oWL1 Vlsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=S+PfCUh8SD4AoMOsjP36g0QEn/qzOI2Cq9Wdhx/41HM=; b=KSbQosYMlnFrGNUTVQjbqXTOPaoLDOjU9pcRdShFYEQdBiuXz77deiHFGcfxNm4k6Y 82QOuYJ12sj09EfdyBc89mVsjxvMmC7OAcCkzcnBQbgUDPEvCt76Kb/Fru4VpqmHD3xL rLZZZWmZa+suZIXilY9EKz4F7wvGEGcXmKqgxn168yfNaDCb0iLD5F+J1HVkIVzLhqnu A46s7tepBg+0kIZdmcC0kO+DaNsup0CldtieJ+0un4qcYfNx6vz9VU8cOFBDwQjhsW3v lBUfKWFV0TbadmUiloaIIhbrWLXlAcJDf7ExNvRwwWQij3MUG0QdrN1Tof2dZ3eAF11R 2KSQ== X-Gm-Message-State: AOAM5309esj7aHeXFDKsKUkWk6PihlZaLNWPHH+IJCjXtlkxuZ/s1kWS j24sKJy6HfhFRevtcf6ojX3aVyol1zZQwnuz X-Google-Smtp-Source: ABdhPJx1HNg6XCDSqqQymokQ2bmYbomfqyJ5Vi7HClu27toBrIZts6Q/514PR+2poXnYXCl4Dsz+Gw== X-Received: by 2002:aca:c615:: with SMTP id w21mr970125oif.68.1607385880801; Mon, 07 Dec 2020 16:04:40 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id g6sm3014978otk.40.2020.12.07.16.04.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:40 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:37 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 06/24] ewah: factor out bitmap growth Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We auto-grow bitmaps when somebody asks to set a bit whose position is outside of our currently allocated range. Other operations besides single bit-setting might need to do this, too, so let's pull it into its own function. Note that we change the semantics a little: you now ask for the number of words you'd like to have, not the id of the block you'd like to write to. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index d8cec585af..7c1ecfa6fd 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,18 +35,22 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } -void bitmap_set(struct bitmap *self, size_t pos) +static void bitmap_grow(struct bitmap *self, size_t word_alloc) { - size_t block = EWAH_BLOCK(pos); - - if (block >= self->word_alloc) { + if (word_alloc > self->word_alloc) { size_t old_size = self->word_alloc; - self->word_alloc = block ? block * 2 : 1; + self->word_alloc = word_alloc * 2; REALLOC_ARRAY(self->words, self->word_alloc); memset(self->words + old_size, 0x0, (self->word_alloc - old_size) * sizeof(eword_t)); } +} +void bitmap_set(struct bitmap *self, size_t pos) +{ + size_t block = EWAH_BLOCK(pos); + + bitmap_grow(self, block + 1); self->words[block] |= EWAH_MASK(pos); } From patchwork Tue Dec 8 00:04:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957141 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83D66C0018C for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5F16823A22 for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728541AbgLHAFc (ORCPT ); Mon, 7 Dec 2020 19:05:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAFc (ORCPT ); Mon, 7 Dec 2020 19:05:32 -0500 Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4209FC0611C5 for ; Mon, 7 Dec 2020 16:04:46 -0800 (PST) Received: by mail-ot1-x341.google.com with SMTP id x13so6789356oto.8 for ; Mon, 07 Dec 2020 16:04:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=TKUdskbrHTgE2dMosZfIVMUOZBpta233l6JHsaPug8M=; b=KC4jZaKhX9ET7KbtFO6OzTIJPzXJQ0tbORMU7vLSLyOAUWVY1gOGGJvNbhccfShU9P 6ETyA3jGaYFGMDMFH7xbt+1gWLlCqqwlN7mOcF0lyNtvp2bErf/ev2VZ4FQh4MFf7Ki4 iEm6oryNfmubeughJic+PiCE/Uusynu9vHa1b5tMUqSje27vCf9CLUU8sYD90nfiQHQ4 KeHkp4A8XpXV/9z29ZqDMS+tPJt2dypHFdBWJ65GmEl/djwghT55KbrAyr2ZnW0drfpu xqCiP1l9OexGmkaUmCDRhiAUyWwTpO7tNObOE/i0rVNCZaxx3pYISmL1WryChrJprFsj 6oMA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=TKUdskbrHTgE2dMosZfIVMUOZBpta233l6JHsaPug8M=; b=cfYWMIhIGyyLtCW2WdlTG64S7dRiE+t1BP8T/j6laBaMZpVF94fxHJaTrWfRXeCCLm s2Xt64ILdMApCXLLyCxdx2OCAs0ZDBm18rqx0Dzh40+ChGzaTIgYiUEX5BdbcgLnzt9x OYRXTfVOPpQ4pUw2AtSmZMtqRUkhanZ5wIe6cnZl3HqROM4eEwE1DSFoW09MqQ8TcBqK cGSgFejWCBlV9f4NWJeOvW3jL2Io/bmrlqiWXIf6t3P8qYk4VeAoIJGJeFkN1aIGjMXj bUAD9V/taVwErLJGaBhPLlI9r+rvFqluczQpmgiuTDNIMgTQqM0Xbds5WetGSvW4w3It kS2w== X-Gm-Message-State: AOAM532FmuVYN7QCSW74Dzqefk/vn0ZazHfWemDxGHG/Fc2N3bvB/6n+ 2XTNqHizWaqZFngevUPHQ9rVcaQWaJiFUhqM X-Google-Smtp-Source: ABdhPJyLQnrmks6+EhwYSG9RtiYquBEZTg750qeeaF2ebhXWCVwS8IdI00thS7x47L9ri7VfHsLDvQ== X-Received: by 2002:a05:6830:916:: with SMTP id v22mr11471305ott.257.1607385885386; Mon, 07 Dec 2020 16:04:45 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id n190sm2969199oob.11.2020.12.07.16.04.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:44 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:42 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 07/24] ewah: make bitmap growth less aggressive Message-ID: <4b56f12932c0fd9e47a82a1adbeb4080a894013f.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King If you ask to set a bit in the Nth word and we haven't yet allocated that many slots in our array, we'll increase the bitmap size to 2*N. This means we might frequently end up with bitmaps that are twice the necessary size (as soon as you ask for the biggest bit, we'll size up to twice that). But if we just allocate as many words as were asked for, we may not grow fast enough. The worst case there is setting bit 0, then 1, etc. Each time we grow we'd just extend by one more word, giving us linear reallocations (and quadratic memory copies). A middle ground is relying on alloc_nr(), which causes us to grow by a factor of roughly 3/2 instead of 2. That's less aggressive than doubling, and it may help avoid fragmenting memory. (If we start with N, then grow twice, our total is N*(3/2)^2 = 9N/4. After growing twice, that array of size 9N/4 can fit into the space vacated by the original array and first growth, N+3N/2 = 10N/4 > 9N/4, leading to less fragmentation in memory). Our worst case is still 3/2N wasted bits (you set bit N-1, then setting bit N causes us to grow by 3/2), but our average should be much better. This isn't usually that big a deal, but it will matter as we shift the reachability bitmap generation code to store more bitmaps in memory. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 7c1ecfa6fd..6f9e5c529b 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -37,13 +37,10 @@ struct bitmap *bitmap_new(void) static void bitmap_grow(struct bitmap *self, size_t word_alloc) { - if (word_alloc > self->word_alloc) { - size_t old_size = self->word_alloc; - self->word_alloc = word_alloc * 2; - REALLOC_ARRAY(self->words, self->word_alloc); - memset(self->words + old_size, 0x0, - (self->word_alloc - old_size) * sizeof(eword_t)); - } + size_t old_size = self->word_alloc; + ALLOC_GROW(self->words, word_alloc, self->word_alloc); + memset(self->words + old_size, 0x0, + (self->word_alloc - old_size) * sizeof(eword_t)); } void bitmap_set(struct bitmap *self, size_t pos) From patchwork Tue Dec 8 00:04:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957139 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3D47C1B0D9 for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8B52E23A1D for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728549AbgLHAFn (ORCPT ); Mon, 7 Dec 2020 19:05:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727744AbgLHAFm (ORCPT ); Mon, 7 Dec 2020 19:05:42 -0500 Received: from mail-ot1-x344.google.com (mail-ot1-x344.google.com [IPv6:2607:f8b0:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B0F13C0611CA for ; Mon, 7 Dec 2020 16:04:50 -0800 (PST) Received: by mail-ot1-x344.google.com with SMTP id b18so14338255ots.0 for ; Mon, 07 Dec 2020 16:04:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=bsuIqyXlLj77HI6Jh/NOCqFh+idsp+iFSjpkcZnqfL8=; b=aQM4hreS6KtUL1R136w+xwOYm2V2sqqwsv2J/erkj887bXhFXcv3Gx+yU1WvhiWT6X uRdF2h3iz9rN7VTZFQOV297YD2YuilW//+/vyqVgtXPHOumQevv2zhg1wHJKzKDI4P37 ySrk+di+BKJ03hBoYr/dsiI8r8haOn2aZ8+c1S9kPfB1krcqCZpvC6w/skbYIIPoJQT3 C8jNm2YQkfZzVwp1h3GH9dsb3Dcy5GOITqAdP5FA0sN+Rg6Jqr6aSXF3jmChjzFlWOSE MtPAIvIl4wIaGkY8CamG/En/CX00NtlV7HCOKL7sblwSNdSiMsXn9NwDPx1nieRk58bA aOxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=bsuIqyXlLj77HI6Jh/NOCqFh+idsp+iFSjpkcZnqfL8=; b=hQhPp4zDdIdn53OD4mt1zsjiMztHvbjktta4ZdDdxgAaaIJb72CpofOa1f4uS1S4CT c0KrclMUUDYfI9wNDeo111mElm7nKpcbmvGUnd9pyqfeoSnNgkW/embjig9bhT/vwYq2 klfTe1JjZwXTIaj1sU9O9ld0QBr75+xTsaEbvhExTwBtZdk7pnq+PdGmI0b88LfYNvPH Hn/Ap1m0kXWJAQHJynFd7gK0KUnQLln+C7u/Fdj+xopmD/IsVewPOC0W8iGT60jwTKbd 4PG/p7K0Yy/Hn5P/dSlPFXVon/FMv65ijWz5CWVuAM3B5eiaDPakz2EDIa7JQ3M2C2+s 6aGg== X-Gm-Message-State: AOAM532m4MzvOU4fQ9T7sA0GIRHIJeI6AmWg1zU98YAeXCsSbIfk8K2f QcAokxZ2h4Ie+6Gwy+bgJSwgTBpgD0qJ9ShB X-Google-Smtp-Source: ABdhPJxfJgOKA7lgXWXOsB1ehCDGtRvcG+wL3MMoe3quKW3Kkeeq8NxK1HzSopN7ftMDs7DFCEtnaw== X-Received: by 2002:a9d:1725:: with SMTP id i37mr14866583ota.258.1607385889890; Mon, 07 Dec 2020 16:04:49 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id s189sm3318435oia.7.2020.12.07.16.04.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:49 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:47 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 08/24] ewah: implement bitmap_or() Message-ID: <34137a7f35835c923fe9db39049a124a41ca0839.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We have a function to bitwise-OR an ewah into an uncompressed bitmap, but not to OR two uncompressed bitmaps. Let's add it. Interestingly, we have a public header declaration going back to e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the function was never implemented. That was all OK since there were no users of 'bitmap_or()', but a first caller will be added in a couple of patches. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 6f9e5c529b..0a3502603f 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -122,6 +122,15 @@ void bitmap_and_not(struct bitmap *self, struct bitmap *other) self->words[i] &= ~other->words[i]; } +void bitmap_or(struct bitmap *self, const struct bitmap *other) +{ + size_t i; + + bitmap_grow(self, other->word_alloc); + for (i = 0; i < other->word_alloc; i++) + self->words[i] |= other->words[i]; +} + void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other) { size_t original_size = self->word_alloc; From patchwork Tue Dec 8 00:04:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957143 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D50C1C2BB40 for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B72B423A1D for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728565AbgLHAFt (ORCPT ); Mon, 7 Dec 2020 19:05:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45452 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728558AbgLHAFs (ORCPT ); Mon, 7 Dec 2020 19:05:48 -0500 Received: from mail-oo1-xc41.google.com (mail-oo1-xc41.google.com [IPv6:2607:f8b0:4864:20::c41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8765DC061749 for ; Mon, 7 Dec 2020 16:04:55 -0800 (PST) Received: by mail-oo1-xc41.google.com with SMTP id i7so3635267oot.8 for ; Mon, 07 Dec 2020 16:04:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=tgwQSV0Jx2RFdjihG4qsp3emQ8nruJR26Qc29XYgvWk=; b=1rTyzAF5kh5uFw97ocA9ZigNbKXPAsOLQcrKOG0rrksaSYXvKMCrzZfF2sJ0vzos4r 6CkuJw5Nyx0eJvJ9cq6gCO4y4e6c9mlmf8cv/HD/EZvRx988GeeEE2sGpEg7ERpeivph q90Jgg3LaqFPYY71DUmdyieucJla3A9SgOttFzmIYX7GIgHerXREC+9WUv62XsIKd0H/ R2QlVZyDxe3tf4I8MziPqrHy8woAxxAQ5S4SiKGeCYaMTV7tssJy0Vlb4r71BmiTLEsu OXrkV/4l3lvw9UEePpmkRcaYuC/P/2eCHFZsg227mWhs+WgGrMtJGspxDexSLrqVB4yf CzhQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=tgwQSV0Jx2RFdjihG4qsp3emQ8nruJR26Qc29XYgvWk=; b=dRYP6V2Gee2L1+iayEN359N3UBgwYuk9v5BO3VLcS1IDD5HoerCMlbCH41quS3TCjd A6UUunRKC6OgyW/CZC1ReJaXgFhs8WCdkHA2VedGVuBCmCW4uqehNonyuIVTc+heyLea vlrgYfSckFfg/J5IjNaIHEmoGaxppIVXhtDgbANg1/NqoXoRhDXG+EFeeNP6G1vgKfYO +UWV6YKz23Lpyj6kjl0+JTaHgRSRk8DjTuFdwyLekknDEt7jf+jj+x6+7XXaV6zYl30/ EJgt5Qyz5yMqOtv2/2/kcrka+KBp613NnW8VBrYT3EOZCce2HVMNCsWo07q9o21rpI9C o6MA== X-Gm-Message-State: AOAM530WFFhN33UXujULq8TLfAyYRE2Ktab1Z/GayVvfr7HutSmKGR7K 4747RyFPFt/Ukb8S/IeyzTWhrJjXXaFnq11M X-Google-Smtp-Source: ABdhPJyxZEGHsnadsvtvu5FLbuZTMAc+HUjOirWthXAouMWM3Dz8amRUEmPb1vT4uXQG3JG8ZUGWeQ== X-Received: by 2002:a4a:9b16:: with SMTP id a22mr4542898ook.6.1607385894667; Mon, 07 Dec 2020 16:04:54 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id 94sm3020477otw.41.2020.12.07.16.04.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:54 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:51 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 09/24] ewah: add bitmap_dup() function Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King There's no easy way to make a copy of a bitmap. Obviously a caller can iterate over the bits and set them one by one in a new bitmap, but we can go much faster by copying whole words with memcpy(). Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 7 +++++++ ewah/ewok.h | 1 + 2 files changed, 8 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 0a3502603f..b5f6376282 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,6 +35,13 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } +struct bitmap *bitmap_dup(const struct bitmap *src) +{ + struct bitmap *dst = bitmap_word_alloc(src->word_alloc); + COPY_ARRAY(dst->words, src->words, src->word_alloc); + return dst; +} + static void bitmap_grow(struct bitmap *self, size_t word_alloc) { size_t old_size = self->word_alloc; diff --git a/ewah/ewok.h b/ewah/ewok.h index 011852bef1..1fc555e672 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -173,6 +173,7 @@ struct bitmap { struct bitmap *bitmap_new(void); struct bitmap *bitmap_word_alloc(size_t word_alloc); +struct bitmap *bitmap_dup(const struct bitmap *src); void bitmap_set(struct bitmap *self, size_t pos); void bitmap_unset(struct bitmap *self, size_t pos); int bitmap_get(struct bitmap *self, size_t pos); From patchwork Tue Dec 8 00:04:56 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957155 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41893C2BB9A for ; Tue, 8 Dec 2020 00:05:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 114B323A04 for ; Tue, 8 Dec 2020 00:05:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728568AbgLHAFw (ORCPT ); Mon, 7 Dec 2020 19:05:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAFv (ORCPT ); Mon, 7 Dec 2020 19:05:51 -0500 Received: from mail-oo1-xc29.google.com (mail-oo1-xc29.google.com [IPv6:2607:f8b0:4864:20::c29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D2D8C0611CB for ; Mon, 7 Dec 2020 16:05:00 -0800 (PST) Received: by mail-oo1-xc29.google.com with SMTP id i18so1607518ooh.5 for ; Mon, 07 Dec 2020 16:05:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=8a1DVfwodNSPG/KEl4b7OreWUlMbtW7oinAE1F5TJy0=; b=ZDEM1VnEuPQ0DDN+cjkq8oZA/sQRRrtO36UxEOaFkwx1XcDAJl0iXYfKy+0Znb18px Mdxu3h3IsL3oatcss+76QEP0ZEvlxWo84kXOcylweLN189a4wfvWfTZRRIUjpRzq3UZG RLUD5DDF7mP6GWr67DT+Wpy0lXQi4NGwjoCemyiW8hBeQ75NoB674oBMiLD0nojECLzK 818aVf6vXpeB2OZLEiM9DuJLEfVwnLLemoCKeTrCABZakrTjiZ0qGvqRyft3CCUADKjs RgOyvXRCr/Wl/EnEPzqIYjzcYQ4rSandEQd+xZ9qyAUtkaoiUR79CFFN9yTP4fEjXJbn gAgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=8a1DVfwodNSPG/KEl4b7OreWUlMbtW7oinAE1F5TJy0=; b=IVvMZOGa8AsqEz7XmxfsZ+ElCyjjoyNYN3XIhZyykDxBJVYXfcz/zWWYDkeUsqxbvr UWdvJO7O7sHGo2feChAer5Bg0kZDt5hQMJaRr9uuZOryh/CADV3xpDRH6aGirhE12R/0 4T0+vHSe9LeoE3gAUWxcvsWrcslGG8iGUq/mkkeN3CY17zQgL2zqP96nXgTp/2whHbdl KhzTBJ9kvd9NaHIqEUOSceXyOrnLBTVZcgiB/JhOQF9ZrqYxWy3RhfHTkhOj+e3H26H3 Rhb8jky3X0Y72YWksotMEwksrAmDdP+KayD/LGGCl4e3fgj18/gVSvQO7AdgMEM767Bw x0Ig== X-Gm-Message-State: AOAM5329L9VdmsoAtxYvq/ioZJvXbz4ZC6QBZ7A2RzzvxkzIna7d+GWA ZIj0kXh042zlFy+RU0LrI67Leq7NaBcU4JhW X-Google-Smtp-Source: ABdhPJxQVrTJhGoiPfO1NgRfIPBhZluD6x+JvzaMaDL0zU5tZ/5y6Ha0odFgRbylb9Lpz5xq78SrSw== X-Received: by 2002:a4a:8f98:: with SMTP id c24mr14542175ooj.27.1607385899036; Mon, 07 Dec 2020 16:04:59 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id t13sm3313693oic.4.2020.12.07.16.04.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:04:58 -0800 (PST) Date: Mon, 7 Dec 2020 19:04:56 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 10/24] pack-bitmap-write: reimplement bitmap writing Message-ID: <91cd8b1a49290095ba55955d86f29cf3afd5e1ce.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The bitmap generation code works by iterating over the set of commits for which we plan to write bitmaps, and then for each one performing a traditional traversal over the reachable commits and trees, filling in the bitmap. Between two traversals, we can often reuse the previous bitmap result as long as the first commit is an ancestor of the second. However, our worst case is that we may end up doing "n" complete complete traversals to the root in order to create "n" bitmaps. In a real-world case (the shared-storage repo consisting of all GitHub forks of chromium/chromium), we perform very poorly: generating bitmaps takes ~3 hours, whereas we can walk the whole object graph in ~3 minutes. This commit completely rewrites the algorithm, with the goal of accessing each object only once. It works roughly like this: - generate a list of commits in topo-order using a single traversal - invert the edges of the graph (so have parents point at their children) - make one pass in reverse topo-order, generating a bitmap for each commit and passing the result along to child nodes We generate correct results because each node we visit has already had all of its ancestors added to the bitmap. And we make only two linear passes over the commits. We also visit each tree usually only once. When filling in a bitmap, we don't bother to recurse into trees whose bit is already set in the bitmap (since we know we've already done so when setting their bit). That means that if commit A references tree T, none of its descendants will need to open T again. I say "usually", though, because it is possible for a given tree to be mentioned in unrelated parts of history (e.g., cherry-picking to a parallel branch). So we've accomplished our goal, and the resulting algorithm is pretty simple to understand. But there are some downsides, at least with this initial implementation: - we no longer reuse the results of any on-disk bitmaps when generating. So we'd expect to sometimes be slower than the original when bitmaps already exist. However, this is something we'll be able to add back in later. - we use much more memory. Instead of keeping one bitmap in memory at a time, we're passing them up through the graph. So our memory use should scale with the graph width (times the size of a bitmap). So how does it perform? For a clone of linux.git, generating bitmaps from scratch with the old algorithm took 63s. Using this algorithm it takes 205s. Which is much worse, but _might_ be acceptable if it behaved linearly as the size grew. It also increases peak heap usage by ~1G. That's not impossibly large, but not encouraging. On the complete fork-network of torvalds/linux, it increases the peak RAM usage by 40GB. Yikes. (I forgot to record the time it took, but the memory usage was too much to consider this reasonable anyway). On the complete fork-network of chromium/chromium, I ran out of memory before succeeding. Some back-of-the-envelope calculations indicate it would need 80+GB to complete. So at this stage, we've managed to make things much worse. But because of the way this new algorithm is structured, there are a lot of opportunities for optimization on top. We'll start implementing those in the follow-on patches. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 306 +++++++++++++++++++++++++------------------- 1 file changed, 172 insertions(+), 134 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 5e998bdaa7..bcd059ccd9 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -110,8 +110,6 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, /** * Compute the actual bitmaps */ -static struct object **seen_objects; -static unsigned int seen_objects_nr, seen_objects_alloc; static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) { @@ -127,21 +125,6 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm writer.selected_nr++; } -static inline void mark_as_seen(struct object *object) -{ - ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc); - seen_objects[seen_objects_nr++] = object; -} - -static inline void reset_all_seen(void) -{ - unsigned int i; - for (i = 0; i < seen_objects_nr; ++i) { - seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN); - } - seen_objects_nr = 0; -} - static uint32_t find_object_pos(const struct object_id *oid) { struct object_entry *entry = packlist_find(writer.to_pack, oid); @@ -154,60 +137,6 @@ static uint32_t find_object_pos(const struct object_id *oid) return oe_in_pack_pos(writer.to_pack, entry); } -static void show_object(struct object *object, const char *name, void *data) -{ - struct bitmap *base = data; - bitmap_set(base, find_object_pos(&object->oid)); - mark_as_seen(object); -} - -static void show_commit(struct commit *commit, void *data) -{ - mark_as_seen((struct object *)commit); -} - -static int -add_to_include_set(struct bitmap *base, struct commit *commit) -{ - khiter_t hash_pos; - uint32_t bitmap_pos = find_object_pos(&commit->object.oid); - - if (bitmap_get(base, bitmap_pos)) - return 0; - - hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid); - if (hash_pos < kh_end(writer.bitmaps)) { - struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos); - bitmap_or_ewah(base, bc->bitmap); - return 0; - } - - bitmap_set(base, bitmap_pos); - return 1; -} - -static int -should_include(struct commit *commit, void *_data) -{ - struct bitmap *base = _data; - - if (!add_to_include_set(base, commit)) { - struct commit_list *parent = commit->parents; - - mark_as_seen((struct object *)commit); - - while (parent) { - parent->item->object.flags |= SEEN; - mark_as_seen((struct object *)parent->item); - parent = parent->next; - } - - return 0; - } - - return 1; -} - static void compute_xor_offsets(void) { static const int MAX_XOR_OFFSET_SEARCH = 10; @@ -248,79 +177,188 @@ static void compute_xor_offsets(void) } } -void bitmap_writer_build(struct packing_data *to_pack) +struct bb_commit { + struct commit_list *children; + struct bitmap *bitmap; + unsigned selected:1; + unsigned idx; /* within selected array */ +}; + +define_commit_slab(bb_data, struct bb_commit); + +struct bitmap_builder { + struct bb_data data; + struct commit **commits; + size_t commits_nr, commits_alloc; +}; + +static void bitmap_builder_init(struct bitmap_builder *bb, + struct bitmap_writer *writer) { - static const double REUSE_BITMAP_THRESHOLD = 0.2; - - int i, reuse_after, need_reset; - struct bitmap *base = bitmap_new(); struct rev_info revs; + struct commit *commit; + unsigned int i; + + memset(bb, 0, sizeof(*bb)); + init_bb_data(&bb->data); + + reset_revision_walk(); + repo_init_revisions(writer->to_pack->repo, &revs, NULL); + revs.topo_order = 1; + + for (i = 0; i < writer->selected_nr; i++) { + struct commit *c = writer->selected[i].commit; + struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->idx = i; + add_pending_object(&revs, &c->object, ""); + } + + if (prepare_revision_walk(&revs)) + die("revision walk setup failed"); + + while ((commit = get_revision(&revs))) { + struct commit_list *p; + + parse_commit_or_die(commit); + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + + for (p = commit->parents; p; p = p->next) { + struct bb_commit *ent = bb_data_at(&bb->data, p->item); + commit_list_insert(commit, &ent->children); + } + } +} + +static void bitmap_builder_clear(struct bitmap_builder *bb) +{ + clear_bb_data(&bb->data); + free(bb->commits); + bb->commits_nr = bb->commits_alloc = 0; +} + +static void fill_bitmap_tree(struct bitmap *bitmap, + struct tree *tree) +{ + uint32_t pos; + struct tree_desc desc; + struct name_entry entry; + + /* + * If our bit is already set, then there is nothing to do. Both this + * tree and all of its children will be set. + */ + pos = find_object_pos(&tree->object.oid); + if (bitmap_get(bitmap, pos)) + return; + bitmap_set(bitmap, pos); + + if (parse_tree(tree) < 0) + die("unable to load tree object %s", + oid_to_hex(&tree->object.oid)); + init_tree_desc(&desc, tree->buffer, tree->size); + + while (tree_entry(&desc, &entry)) { + switch (object_type(entry.mode)) { + case OBJ_TREE: + fill_bitmap_tree(bitmap, + lookup_tree(the_repository, &entry.oid)); + break; + case OBJ_BLOB: + bitmap_set(bitmap, find_object_pos(&entry.oid)); + break; + default: + /* Gitlink, etc; not reachable */ + break; + } + } + + free_tree_buffer(tree); +} + +static void fill_bitmap_commit(struct bb_commit *ent, + struct commit *commit) +{ + if (!ent->bitmap) + ent->bitmap = bitmap_new(); + + /* + * mark ourselves, but do not bother with parents; their values + * will already have been propagated to us + */ + bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); +} + +static void store_selected(struct bb_commit *ent, struct commit *commit) +{ + struct bitmapped_commit *stored = &writer.selected[ent->idx]; + khiter_t hash_pos; + int hash_ret; + + /* + * the "reuse bitmaps" phase may have stored something here, but + * our new algorithm doesn't use it. Drop it. + */ + if (stored->bitmap) + ewah_free(stored->bitmap); + + stored->bitmap = bitmap_to_ewah(ent->bitmap); + + hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); + if (hash_ret == 0) + die("Duplicate entry when writing index: %s", + oid_to_hex(&commit->object.oid)); + kh_value(writer.bitmaps, hash_pos) = stored; +} + +void bitmap_writer_build(struct packing_data *to_pack) +{ + struct bitmap_builder bb; + size_t i; + int nr_stored = 0; /* for progress */ writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; if (writer.show_progress) writer.progress = start_progress("Building bitmaps", writer.selected_nr); - - repo_init_revisions(to_pack->repo, &revs, NULL); - revs.tag_objects = 1; - revs.tree_objects = 1; - revs.blob_objects = 1; - revs.no_walk = 0; - - revs.include_check = should_include; - reset_revision_walk(); - - reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD; - need_reset = 0; - - for (i = writer.selected_nr - 1; i >= 0; --i) { - struct bitmapped_commit *stored; - struct object *object; - - khiter_t hash_pos; - int hash_ret; - - stored = &writer.selected[i]; - object = (struct object *)stored->commit; - - if (stored->bitmap == NULL) { - if (i < writer.selected_nr - 1 && - (need_reset || - !in_merge_bases(writer.selected[i + 1].commit, - stored->commit))) { - bitmap_reset(base); - reset_all_seen(); - } - - add_pending_object(&revs, object, ""); - revs.include_check_data = base; - - if (prepare_revision_walk(&revs)) - die("revision walk setup failed"); - - traverse_commit_list(&revs, show_commit, show_object, base); - - object_array_clear(&revs.pending); - - stored->bitmap = bitmap_to_ewah(base); - need_reset = 0; - } else - need_reset = 1; - - if (i >= reuse_after) - stored->flags |= BITMAP_FLAG_REUSE; - - hash_pos = kh_put_oid_map(writer.bitmaps, object->oid, &hash_ret); - if (hash_ret == 0) - die("Duplicate entry when writing index: %s", - oid_to_hex(&object->oid)); - - kh_value(writer.bitmaps, hash_pos) = stored; - display_progress(writer.progress, writer.selected_nr - i); + trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", + the_repository); + + bitmap_builder_init(&bb, &writer); + for (i = bb.commits_nr; i > 0; i--) { + struct commit *commit = bb.commits[i-1]; + struct bb_commit *ent = bb_data_at(&bb.data, commit); + struct commit *child; + + fill_bitmap_commit(ent, commit); + + if (ent->selected) { + store_selected(ent, commit); + nr_stored++; + display_progress(writer.progress, nr_stored); + } + + while ((child = pop_commit(&ent->children))) { + struct bb_commit *child_ent = + bb_data_at(&bb.data, child); + + if (child_ent->bitmap) + bitmap_or(child_ent->bitmap, ent->bitmap); + else + child_ent->bitmap = bitmap_dup(ent->bitmap); + } + bitmap_free(ent->bitmap); + ent->bitmap = NULL; } + bitmap_builder_clear(&bb); + + trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", + the_repository); - bitmap_free(base); stop_progress(&writer.progress); compute_xor_offsets(); From patchwork Tue Dec 8 00:05:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957145 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE46FC197BF for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A14FC23A04 for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728557AbgLHAFq (ORCPT ); Mon, 7 Dec 2020 19:05:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727744AbgLHAFp (ORCPT ); Mon, 7 Dec 2020 19:05:45 -0500 Received: from mail-oi1-x241.google.com (mail-oi1-x241.google.com [IPv6:2607:f8b0:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B4ACC061794 for ; Mon, 7 Dec 2020 16:05:04 -0800 (PST) Received: by mail-oi1-x241.google.com with SMTP id p126so17537472oif.7 for ; Mon, 07 Dec 2020 16:05:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=41w+8dMxR92mbEFLoU66S+t8xwPreIFljCrA8SXjhWo=; b=RZzr4HlnTWlNIgRSiAEI3rJ0MOjL9hOqxt7nGGlBf8hthTg8MDLaj77WUG5hg9s5qm xN8wRgCX199OpsYbTTkogssGNY1V4mHIqLs6+Tt6SD4NJg1CSHEkZ6gaJkJ9sXVClIyx Fjubv+CrZtA9JYHdcsJEJAG7DLez6cxwULn53gzpqmsXnjpuDdnwhv+59okbBrh7lBpm o8Q6XAyB1MC22luYPmhrM8kkhPJojvwiZnIqqG6l1/HxvxWgShC9FQsYgkhL6qdmDQAd DidNNaKIiaQl4guSnb7yDQbywhiqry8r+vmm3pOq8HSKKPx+ZKDswFNUldyCgk6xE/vF NExg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=41w+8dMxR92mbEFLoU66S+t8xwPreIFljCrA8SXjhWo=; b=JHgNud6U2HoB9CUOYCkf8lwAxX2igouRG9ucbGSDW5p1ok+k8xqRd5BCLQAciy0HN2 Kc0PeKNRsES/doHwV/yED0NoSHwoWxNvcWRagORMWpi8Wu3/twfOROc/eculcunlIVM+ 4mu6bfg0NvXrimAEIH7h5F/3sd1ThiCH/wEl2RXy3dGgv1O2j0a8xlOIrQHLddkCB2rm TOLA8NRpr7XLIdU0lAiuQyvr6Osso6uqX4USzWmn3+Fqopq71sem+mbDI16k4k1Cb2zi 03fJ8VlM4icPrfFNash/9TCCIO/NtexSbBAheNzsXLAnkN1eJs+/h5C3jMCCxpcqByAU VXNQ== X-Gm-Message-State: AOAM530ECLZFzONlzCLkqpztgF0J0Lh69F+JL7fQZ8m0ENhSfX56lzzQ RDupP8Ezqv1rpCR4sCv2Bwx43wuk1ckniFCV X-Google-Smtp-Source: ABdhPJxgC/wlxXXYrv8mYxYL4AKa1QBm00ZjRMDFfowV2zg2c7tl+tFoW8X0dDf9dHPPA60IwyB5Fg== X-Received: by 2002:aca:ed86:: with SMTP id l128mr988464oih.0.1607385903372; Mon, 07 Dec 2020 16:05:03 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id s189sm3318569oia.7.2020.12.07.16.05.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:02 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:00 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Message-ID: <64598024ecfe60428df800caabb4e2c6efc0f905.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King Our algorithm to generate reachability bitmaps walks through the commit graph from the bottom up, passing bitmap data from each commit to its descendants. For a linear stretch of history like: A -- B -- C our sequence of steps is: - compute the bitmap for A by walking its trees, etc - duplicate A's bitmap as a starting point for B; we can now free A's bitmap, since we only needed it as an intermediate result - OR in any extra objects that B can reach into its bitmap - duplicate B's bitmap as a starting point for C; likewise, free B's bitmap - OR in objects for C, and so on... Rather than duplicating bitmaps and immediately freeing the original, we can just pass ownership from commit to commit. Note that this doesn't always work: - the recipient may be a merge which already has an intermediate bitmap from its other ancestor. In that case we have to OR our result into it. Note that the first ancestor to reach the merge does get to pass ownership, though. - we may have multiple children; we can only pass ownership to one of them However, it happens often enough and copying bitmaps is expensive enough that this provides a noticeable speedup. On a clone of linux.git, this reduces the time to generate bitmaps from 205s to 70s. This is about the same amount of time it took to generate bitmaps using our old "many traversals" algorithm (the previous commit measures the identical scenario as taking 63s). It unfortunately provides only a very modest reduction in the peak memory usage, though. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index bcd059ccd9..1eb9615df8 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -333,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit); struct commit *child; + int reused = 0; fill_bitmap_commit(ent, commit); @@ -348,10 +349,15 @@ void bitmap_writer_build(struct packing_data *to_pack) if (child_ent->bitmap) bitmap_or(child_ent->bitmap, ent->bitmap); - else + else if (reused) child_ent->bitmap = bitmap_dup(ent->bitmap); + else { + child_ent->bitmap = ent->bitmap; + reused = 1; + } } - bitmap_free(ent->bitmap); + if (!reused) + bitmap_free(ent->bitmap); ent->bitmap = NULL; } bitmap_builder_clear(&bb); From patchwork Tue Dec 8 00:05:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957149 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0D24EC1B0E3 for ; Tue, 8 Dec 2020 00:05:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CCDB423A22 for ; Tue, 8 Dec 2020 00:05:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728561AbgLHAFt (ORCPT ); Mon, 7 Dec 2020 19:05:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727744AbgLHAFs (ORCPT ); Mon, 7 Dec 2020 19:05:48 -0500 Received: from mail-oi1-x243.google.com (mail-oi1-x243.google.com [IPv6:2607:f8b0:4864:20::243]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E844C06179C for ; Mon, 7 Dec 2020 16:05:08 -0800 (PST) Received: by mail-oi1-x243.google.com with SMTP id s75so14468190oih.1 for ; Mon, 07 Dec 2020 16:05:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=69NZYdya+y3fH6tvgWnX9Me1m2KxucJctZMIwxwOhUo=; b=YsbIXYOgrsrI0GQySj02mUhpd6BB8LG17Km4hqcb6vC9IHH2rAqRmF72ior2DKoULs /sgL8Ema1F43alkBDlzel/Ob14qPoTYB55ilN9tpPaAQptLuWISPkHUwTTkY0eDok4nK TgfdT0UiTZK2yNgrUgEkoOy/yVUcfo25u+kctdrWr8UJovlGG8sfgNVcs2XXill/ClRC fqbj82EswHSSISKO2SYTp7FsXq6FdrVDkTHHN0wlaEtOJChfEHJ6w52DVpHSjW7IP0ka pitPgZI4Bof27UmHNT4D2TzyXtKOEX/zvsGHNBn9ARlAmgKKkjoN004Gy3WzJVuMjqw0 Notg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=69NZYdya+y3fH6tvgWnX9Me1m2KxucJctZMIwxwOhUo=; b=Bxp5LHjv6P6aSQpHGYnWXRhxnx1lyuInmY8Bg3r/+FhUyCQ5vLgupcAqINKvGf+L/x SemnExrmsMCpTBJNovAr6GMsyPUC/3sLuvM0LZ2rrgo93BZCTCmOEIJ7t0oIOAqwElH5 dqJ0WWRujdvp+wsWnn3I7ZRnFb9VTVlYGz7Ao0X2AI9T+2ro2dHjr2EDyvkUCQvrC8vG XR1tyDMP/KILPblSygeRTIrV9ONLY3eoY+VQ0/UREBMYPnVfji+eW0UXB2P+vNgrUahm DKr3VnN6E2YJa4vbY6rILXSURSBZlxElm6qOGjK/xxNxhs53I08pT1fDA2/Q/VIvB+wX pKzg== X-Gm-Message-State: AOAM533DrzUgtU5WuaFJAHs12UmcJznbwve2dMBYv8ymDWq8Hgq5sv53 VLNIiMMsrbSxBgD7CL4jpxAdR1X6PoyCGS8I X-Google-Smtp-Source: ABdhPJzc8+auc140TZAR6/k4AzFGIyDUZkJU4ysrM/vjjttOshUwP+oXFOrOvELEGd+VBGVgoMivjw== X-Received: by 2002:a05:6808:7de:: with SMTP id f30mr972804oij.148.1607385907550; Mon, 07 Dec 2020 16:05:07 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id s206sm1923954oie.24.2020.12.07.16.05.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:07 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:05 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 12/24] pack-bitmap-write: fill bitmap with commit history Message-ID: <93fc437a3c1b4ac3bdf1c241b2178281348ba561.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The current implementation of bitmap_writer_build() creates a reachability bitmap for every walked commit. After computing a bitmap for a commit, those bits are pushed to an in-progress bitmap for its children. fill_bitmap_commit() assumes the bits corresponding to objects reachable from the parents of a commit are already set. This means that when visiting a new commit, we only have to walk the objects reachable between it and any of its parents. A future change to bitmap_writer_build() will relax this condition so not all parents have their bits set. Prepare for that by having 'fill_bitmap_commit()' walk parents until reaching commits whose bits are already set. Then, walk the trees for these commits as well. This has no functional change with the current implementation of bitmap_writer_build(). Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 1eb9615df8..957639241e 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -12,6 +12,7 @@ #include "sha1-lookup.h" #include "pack-objects.h" #include "commit-reach.h" +#include "prio-queue.h" struct bitmapped_commit { struct commit *commit; @@ -279,17 +280,30 @@ static void fill_bitmap_tree(struct bitmap *bitmap, } static void fill_bitmap_commit(struct bb_commit *ent, - struct commit *commit) + struct commit *commit, + struct prio_queue *queue) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - /* - * mark ourselves, but do not bother with parents; their values - * will already have been propagated to us - */ bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); + prio_queue_put(queue, commit); + + while (queue->nr) { + struct commit_list *p; + struct commit *c = prio_queue_get(queue); + + bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + + for (p = c->parents; p; p = p->next) { + int pos = find_object_pos(&p->item->object.oid); + if (!bitmap_get(ent->bitmap, pos)) { + bitmap_set(ent->bitmap, pos); + prio_queue_put(queue, p->item); + } + } + } } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -319,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct bitmap_builder bb; size_t i; int nr_stored = 0; /* for progress */ + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -335,7 +350,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit); + fill_bitmap_commit(ent, commit, &queue); if (ent->selected) { store_selected(ent, commit); @@ -360,6 +375,7 @@ void bitmap_writer_build(struct packing_data *to_pack) bitmap_free(ent->bitmap); ent->bitmap = NULL; } + clear_prio_queue(&queue); bitmap_builder_clear(&bb); trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", From patchwork Tue Dec 8 00:05:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957151 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C508C4361B for ; Tue, 8 Dec 2020 00:05:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D5E3323A04 for ; Tue, 8 Dec 2020 00:05:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728577AbgLHAFy (ORCPT ); Mon, 7 Dec 2020 19:05:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45486 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAFw (ORCPT ); Mon, 7 Dec 2020 19:05:52 -0500 Received: from mail-ot1-x343.google.com (mail-ot1-x343.google.com [IPv6:2607:f8b0:4864:20::343]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5A1FC0617B0 for ; Mon, 7 Dec 2020 16:05:12 -0800 (PST) Received: by mail-ot1-x343.google.com with SMTP id i6so8220948otr.2 for ; Mon, 07 Dec 2020 16:05:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=9mqvfpcaLJPH1p9MC9DOD+j9GQThZNPIDACNQV59NaA=; b=MIkJpbvKwY82mNrxHOKYopIy6iBYJJXya5nvsYzxEIQFT3aFw9QOKEyP+8UH17awos a/BjQQEW4NPv1lsomSpTgRVDIzW2wx6fDfNlxlRIjg1Iw6MXh80QpNzNAw0WCjuC6y+8 HGq2JKhNIddukOQ+9PhqzExtMLWiTP3WMSqmjBY1thi/LjNGI1UJncws85lF/Nq9Ix+t FioEBqjSYnNZLBfbruD2wN0oDHmMsJL5IjgFcp4FL0DLUyrxm3WSI9t/agJg1+ayZpOC flH9TD7eqtF2UyNKYaMz+sh2+CZiXpGxsGsbidivhc81F5tMqtu7h6Pke0lwheKYCWJo s77A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=9mqvfpcaLJPH1p9MC9DOD+j9GQThZNPIDACNQV59NaA=; b=Szsc2W9c7gTfrdKwJwhMyFWZ7WBRTnyPGbiCSiIzouO0F3N3jfLTuHrbHNe2oLGly0 xQdEGytAb6jRppQYl/h6q37w3Xrxq56uRsfsESOqqbxOga5Pla8FSl51Qr/aF2ncQm6r c27NBknAMAfH6dUn6pJXoj2Q+6Fi0lkEC5HIe2h4r0fkulUtfTcPsvKnUJ5dOatxS9oS 90DASrYk2OUQ2RnqgKUJgmqKb7Q6+r1f4vZsZXlZiBUglHoaP9blISOsivoDzasEsa3N nnKJ5ik8bkGcTCWYJFv/1QP75MVBXSf2DYvBIe0Thu0/YbFMxNbWaJI+VD89O8WWnhKf 3zsw== X-Gm-Message-State: AOAM5306MNT4ucg0Tz7eMHsK8LKu2ZCudEiexGh/d6dJpy2zUH14R43Y HiOUrK92/zH/OblM5LhBKt2ZF6Yy8IZvedY/ X-Google-Smtp-Source: ABdhPJwiKDU8uZ1cQQjgLvXKzBeL6QKaQ4iTX05+b7ZoXHDQtcFDkcFAiVfr/J2jxdUZU7kMdoaQWw== X-Received: by 2002:a9d:3a76:: with SMTP id j109mr14830084otc.186.1607385911898; Mon, 07 Dec 2020 16:05:11 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id 43sm3005709otf.28.2020.12.07.16.05.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:11 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:09 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 13/24] bitmap: implement bitmap_is_subset() Message-ID: <0d5213ba44351df71d4f1405683c3d0729031f14.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_is_subset() function checks if the 'self' bitmap contains any bitmaps that are not on in the 'other' bitmap. Up until this patch, it had a declaration, but no implementation or callers. A subsequent patch will want this function, so implement it here. Helped-by: Junio C Hamano Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- ewah/bitmap.c | 21 +++++++++++++++++++++ ewah/ewok.h | 2 +- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index b5f6376282..0d31cdc866 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -195,6 +195,27 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other) return 1; } +int bitmap_is_subset(struct bitmap *self, struct bitmap *other) +{ + size_t common_size, i; + + if (self->word_alloc < other->word_alloc) + common_size = self->word_alloc; + else { + common_size = other->word_alloc; + for (i = common_size; i < self->word_alloc; i++) { + if (self->words[i]) + return 1; + } + } + + for (i = 0; i < common_size; i++) { + if (self->words[i] & ~other->words[i]) + return 1; + } + return 0; +} + void bitmap_reset(struct bitmap *bitmap) { memset(bitmap->words, 0x0, bitmap->word_alloc * sizeof(eword_t)); diff --git a/ewah/ewok.h b/ewah/ewok.h index 1fc555e672..66920965da 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -180,7 +180,7 @@ int bitmap_get(struct bitmap *self, size_t pos); void bitmap_reset(struct bitmap *self); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); -int bitmap_is_subset(struct bitmap *self, struct bitmap *super); +int bitmap_is_subset(struct bitmap *self, struct bitmap *other); struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap); struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah); From patchwork Tue Dec 8 00:05:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957153 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93AE8C433FE for ; Tue, 8 Dec 2020 00:05:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 68E2B23A04 for ; Tue, 8 Dec 2020 00:05:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728582AbgLHAF5 (ORCPT ); Mon, 7 Dec 2020 19:05:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAF5 (ORCPT ); Mon, 7 Dec 2020 19:05:57 -0500 Received: from mail-ot1-x32a.google.com (mail-ot1-x32a.google.com [IPv6:2607:f8b0:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DFF1DC061285 for ; Mon, 7 Dec 2020 16:05:16 -0800 (PST) Received: by mail-ot1-x32a.google.com with SMTP id d8so6388051otq.6 for ; Mon, 07 Dec 2020 16:05:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=cEQOsB+BOx66j/LFEO/fQE8ubCTEUkji902LrnsMttQ=; b=qSES9HD9RzK4mzjrkCAUnKWqIS/FIGknKUJ3lYjXS4myejNY8+mgeCfKsLCiSp+73M c7Y2gRR0CZSeeyTm4sl+3+yTN7vKV8EKnaKyQebHo+gPOxUm4f/uZBH5vdYFC3uoeepF Vaw6c6EcVeSEvZ3Y4k2ES4u7cnFyPUDHQoUoj4mSkb/XkjwflN0XFnWkPqPKzTPyDR0e 8zFRtY7ec+nMn38U83NfWGAlkzWO/GDEeQoUjLgk18sC3PlA0JefreLgK+i3Wf9iOLMf D6w9uG3bBipjBoMPdGiyQErrRoUoqobnwYeZy9ddZNoDUfAoShJbUasyQFQg2FEbhrx3 GVSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=cEQOsB+BOx66j/LFEO/fQE8ubCTEUkji902LrnsMttQ=; b=hDfMNwbFWnnFhqefcH4haUiYK3Wic1YXlhcd6gdTvc4pBg1dwHt0WYlbVI1F93wugb bzl9Yum4BkyqvT/eVhg4dYv1OmZMgDZRplcHOX8or2EsgQ93vYIIKguU5bn3hCQXigeR DtJenik1Usio1lhVurUP9qozK+QOWlXK4cdKxb0FVOWOU0lu67hOHRyQqyGn+b7s7dFR Bv2FXbQCDrJ7S1iWJi53F6qurNas8wm4cagr9I9rgJtlm6IY9ihYA2zDVbsjGgoz8a7C M4WQjf+RETqhwax2b/tQm/5qvn0RaOvZdIJV8LPo565I/19uj2USYBPBqK/ZQYr4S7J+ eFDw== X-Gm-Message-State: AOAM5329bj+JUsFr2iIE3ZeuG7nFJtz7HyCJjcaIrYh9mW2dL8L0X73E 3IxWW5dWbECsyN0HBz6vpx8MbYvf34Ye2GKN X-Google-Smtp-Source: ABdhPJybsOJreDMrywXl9d/HCPIe0mjnCcT6Ss94HOSWOG7Lgec7Jtek4DkaczB0/nd8DcVU4OShdQ== X-Received: by 2002:a05:6830:1e0b:: with SMTP id s11mr15274457otr.352.1607385916063; Mon, 07 Dec 2020 16:05:16 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id k5sm2960731oot.30.2020.12.07.16.05.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:15 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:13 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 14/24] commit: implement commit_list_contains() Message-ID: <72e745fed8e357e8af6725ab8a0929752c0d2593.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee It can be helpful to check if a commit_list contains a commit. Use pointer equality, assuming lookup_commit() was used. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- commit.c | 11 +++++++++++ commit.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/commit.c b/commit.c index fe1fa3dc41..9a785bf906 100644 --- a/commit.c +++ b/commit.c @@ -544,6 +544,17 @@ struct commit_list *commit_list_insert(struct commit *item, struct commit_list * return new_list; } +int commit_list_contains(struct commit *item, struct commit_list *list) +{ + while (list) { + if (list->item == item) + return 1; + list = list->next; + } + + return 0; +} + unsigned commit_list_count(const struct commit_list *l) { unsigned c = 0; diff --git a/commit.h b/commit.h index 5467786c7b..742a6de460 100644 --- a/commit.h +++ b/commit.h @@ -167,6 +167,8 @@ int find_commit_subject(const char *commit_buffer, const char **subject); struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list); +int commit_list_contains(struct commit *item, + struct commit_list *list); struct commit_list **commit_list_append(struct commit *commit, struct commit_list **next); unsigned commit_list_count(const struct commit_list *l); From patchwork Tue Dec 8 00:05:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957157 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A0E5C4361B for ; Tue, 8 Dec 2020 00:06:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D1AC423A1D for ; Tue, 8 Dec 2020 00:06:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728586AbgLHAGJ (ORCPT ); Mon, 7 Dec 2020 19:06:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45516 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728583AbgLHAGI (ORCPT ); Mon, 7 Dec 2020 19:06:08 -0500 Received: from mail-oi1-x241.google.com (mail-oi1-x241.google.com [IPv6:2607:f8b0:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AD9EC061793 for ; Mon, 7 Dec 2020 16:05:22 -0800 (PST) Received: by mail-oi1-x241.google.com with SMTP id x16so17574656oic.3 for ; Mon, 07 Dec 2020 16:05:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ctuYLhWvckLzSg3MCjV41UN7lDM7tbErVdpu0ikEr+4=; b=tyim3oOZe01QqFAM/RlTTsQ1rjkd95uK+MqA9bIwOoCt7qfR+RqpgNhEvTTPQKt2yK 9AqGUa7VbN8wPQx0wt/0sd9oygzZsN7bel91wqDjIkVZLfQhf7S2PpHEa3dbutHEN/bJ SPHLBH6z2zhRIYp8NsOpwsxnPflm+tKKpASrfx3sECnX6MXd47RYbgHNQaZI4dBWOD45 fl2+YwHztIirttkHBuC6ed3MOJOyTW0+5HGGtgN4shJ8bF3FmqJK4dLSUodKZGYGccZp WlFhodiK9ccWYWwl8AOPtN30XQrLyPlb20OBXQH5RNvVIThw7UioXTAQ3BsEplgh17n1 U96Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ctuYLhWvckLzSg3MCjV41UN7lDM7tbErVdpu0ikEr+4=; b=ESy3c0m3LGaPjZU5Ll2vbj/oadt4mjCqpV1qvJFTb3MPTDFZucVuqNzGu6CCJLF2lN NJhDQuCxU0qTI/JWcSfFrhS7X2WCjKQHsEVOjYHm0byW5IQRKKO9pXI+zoXw5rE2rvCl ZcUte6jAjIXJvMxajTWbKdbZchYa+6lmzWuwfeVo5Q6j21J/9NccF0wWbiQKEYU61olw JQgsIdsEMQReybwb4FB9/Id2Ubht8qCR5M0352JrW4AJmiTvO0aKf7TamzUJp/lSOMHt f7XWWxNh1E858IAINwCtONPAjMKsaNRfnkj+P07923yNW9YZjBgwiKSJ9X1xFiq7Bv+3 lF7Q== X-Gm-Message-State: AOAM533/UA16VIFMdQ1epcAu2eC1070AHXOSmy84z8PcKrv9zZWCJtaP 0AJxGPQdM4mpDx4PhtMWIDTIt1fDQos/2SZQ X-Google-Smtp-Source: ABdhPJynebarMuI4y0pb/RwgGQRzYcUE2nDq/k4Ce0Fca30jx2zBHa5CwJHPlI8UGn+e/M6BVbm4xg== X-Received: by 2002:aca:f1c1:: with SMTP id p184mr990663oih.54.1607385921191; Mon, 07 Dec 2020 16:05:21 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id d15sm2880285otk.62.2020.12.07.16.05.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:20 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:17 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 15/24] t5310: add branch-based checks Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The current rev-list tests that check the bitmap data only work on HEAD instead of multiple branches. Expand the test cases to handle both 'master' and 'other' branches. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 61 +++++++++++++++++++++++------------------ 1 file changed, 34 insertions(+), 27 deletions(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 8a2a3b2114..b1248f1cc8 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -41,63 +41,70 @@ test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' git rev-list --test-bitmap HEAD ' -rev_list_tests() { - state=$1 - - test_expect_success "counting commits via bitmap ($state)" ' - git rev-list --count HEAD >expect && - git rev-list --use-bitmap-index --count HEAD >actual && +rev_list_tests_head () { + test_expect_success "counting commits via bitmap ($state, $branch)" ' + git rev-list --count $branch >expect && + git rev-list --use-bitmap-index --count $branch >actual && test_cmp expect actual ' - test_expect_success "counting partial commits via bitmap ($state)" ' - git rev-list --count HEAD~5..HEAD >expect && - git rev-list --use-bitmap-index --count HEAD~5..HEAD >actual && + test_expect_success "counting partial commits via bitmap ($state, $branch)" ' + git rev-list --count $branch~5..$branch >expect && + git rev-list --use-bitmap-index --count $branch~5..$branch >actual && test_cmp expect actual ' - test_expect_success "counting commits with limit ($state)" ' - git rev-list --count -n 1 HEAD >expect && - git rev-list --use-bitmap-index --count -n 1 HEAD >actual && + test_expect_success "counting commits with limit ($state, $branch)" ' + git rev-list --count -n 1 $branch >expect && + git rev-list --use-bitmap-index --count -n 1 $branch >actual && test_cmp expect actual ' - test_expect_success "counting non-linear history ($state)" ' + test_expect_success "counting non-linear history ($state, $branch)" ' git rev-list --count other...master >expect && git rev-list --use-bitmap-index --count other...master >actual && test_cmp expect actual ' - test_expect_success "counting commits with limiting ($state)" ' - git rev-list --count HEAD -- 1.t >expect && - git rev-list --use-bitmap-index --count HEAD -- 1.t >actual && + test_expect_success "counting commits with limiting ($state, $branch)" ' + git rev-list --count $branch -- 1.t >expect && + git rev-list --use-bitmap-index --count $branch -- 1.t >actual && test_cmp expect actual ' - test_expect_success "counting objects via bitmap ($state)" ' - git rev-list --count --objects HEAD >expect && - git rev-list --use-bitmap-index --count --objects HEAD >actual && + test_expect_success "counting objects via bitmap ($state, $branch)" ' + git rev-list --count --objects $branch >expect && + git rev-list --use-bitmap-index --count --objects $branch >actual && test_cmp expect actual ' - test_expect_success "enumerate commits ($state)" ' - git rev-list --use-bitmap-index HEAD >actual && - git rev-list HEAD >expect && + test_expect_success "enumerate commits ($state, $branch)" ' + git rev-list --use-bitmap-index $branch >actual && + git rev-list $branch >expect && test_bitmap_traversal --no-confirm-bitmaps expect actual ' - test_expect_success "enumerate --objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD >actual && - git rev-list --objects HEAD >expect && + test_expect_success "enumerate --objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch >actual && + git rev-list --objects $branch >expect && test_bitmap_traversal expect actual ' - test_expect_success "bitmap --objects handles non-commit objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD tagged-blob >actual && + test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch tagged-blob >actual && grep $blob actual ' } +rev_list_tests () { + state=$1 + + for branch in "master" "other" + do + rev_list_tests_head + done +} + rev_list_tests 'full bitmap' test_expect_success 'clone from bitmapped repository' ' From patchwork Tue Dec 8 00:05:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957167 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 82431C4361B for ; Tue, 8 Dec 2020 00:06:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 56F5523A1D for ; Tue, 8 Dec 2020 00:06:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728594AbgLHAGN (ORCPT ); Mon, 7 Dec 2020 19:06:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45528 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAGM (ORCPT ); Mon, 7 Dec 2020 19:06:12 -0500 Received: from mail-ot1-x344.google.com (mail-ot1-x344.google.com [IPv6:2607:f8b0:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 93546C06138C for ; Mon, 7 Dec 2020 16:05:26 -0800 (PST) Received: by mail-ot1-x344.google.com with SMTP id h18so10331374otq.12 for ; Mon, 07 Dec 2020 16:05:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=WoZWPxmAj0x1vYLK6v7ZELgMGpSwPf3gkePQcJB85eg=; b=Ul/q6hcyDBq/04VPQ0miufuF0f834SOUJSRrziAnr+BAYvY0P+LPs3NsHrSpXY5riF KtJRXGjpcNA/8iPgQy2nGSLWNsNt7MoePiWkBr0goDTzBFBuGaQawrU0lpNREyZPEv7n a1PRIDJpylnoZR5KRSbzGAsvT7sX83Pw0AjQUM1k6suRwZhfNR1l8KyMo+TWbGMeOIL1 RaT2t3hZZbEW+UHk+rQaWo/Tgu3ldbgSt6Uz+hcr4rkZEtoP5f4Jz2K7jH4U8O0+WG9m tc1HGsfn17aO2YfGso3Jz5rkoecnYOz5Duve+Sb/LoAoO6141shMWZvG02/E438QtrSB EWWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=WoZWPxmAj0x1vYLK6v7ZELgMGpSwPf3gkePQcJB85eg=; b=kj1f2IkiQgKAORSceYz3jGCsT149uKgtLj56Q8FG/wMOzPbAZM8TUaEBpJD5gXUMdw iyozhEj/egQJvUEINlOmZNyq5MiNDwzVvvNCjRR4BlBfeH1FwaVgj8+yTidogeqievVx 0eZN0je7kckJTmJ56ZUyDvBtz43HGO8rTAMe5KC5VFjX/Bi2PcgGGs3ttW0PlsY0KG8K qRhB9B+MSRAVWmR97Z81H8mO4PRt7DgVUQEGvPPSyoK0OCGK8+BWB5iRvD8niFk46wht ds/et1/rlp9e/u/sEolWrHgDiXbUpAok3ntxtqMTIz5vTFhJX5MS4VnzDePYK/QZvgtM PODA== X-Gm-Message-State: AOAM530jpzYffaN9tn9piRh2+wCvIk9E38aTadABDUm8cfNtthDsPEYF Zt5rbuwpjsMZLsueal9/Z5yrYaSAutRBiekw X-Google-Smtp-Source: ABdhPJyBs1hsp0LEHSnOK8P1F+JQ8XO1m0xRiXzeqwV5K8WVbB+oXeHfbWpT6Lxw3Sue24LiLZT/Uw== X-Received: by 2002:a9d:590c:: with SMTP id t12mr14696415oth.308.1607385925762; Mon, 07 Dec 2020 16:05:25 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id h2sm3028319otn.15.2020.12.07.16.05.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:25 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:22 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 16/24] pack-bitmap-write: rename children to reverse_edges Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_builder_init() method walks the reachable commits in topological order and constructs a "reverse graph" along the way. At the moment, this reverse graph contains an edge from commit A to commit B if and only if A is a parent of B. Thus, the name "children" is appropriate for for this reverse graph. In the next change, we will repurpose the reverse graph to not be directly-adjacent commits in the commit-graph, but instead a more abstract relationship. The previous changes have already incorporated the necessary updates to fill_bitmap_commit() that allow these edges to not be immediate children. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 957639241e..7e218d02a6 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -179,7 +179,7 @@ static void compute_xor_offsets(void) } struct bb_commit { - struct commit_list *children; + struct commit_list *reverse_edges; struct bitmap *bitmap; unsigned selected:1; unsigned idx; /* within selected array */ @@ -228,7 +228,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (p = commit->parents; p; p = p->next) { struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->children); + commit_list_insert(commit, &ent->reverse_edges); } } } @@ -358,7 +358,7 @@ void bitmap_writer_build(struct packing_data *to_pack) display_progress(writer.progress, nr_stored); } - while ((child = pop_commit(&ent->children))) { + while ((child = pop_commit(&ent->reverse_edges))) { struct bb_commit *child_ent = bb_data_at(&bb.data, child); From patchwork Tue Dec 8 00:05:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957165 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BE92C4361B for ; Tue, 8 Dec 2020 00:06:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6F14023A1D for ; Tue, 8 Dec 2020 00:06:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728588AbgLHAGL (ORCPT ); Mon, 7 Dec 2020 19:06:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAGL (ORCPT ); Mon, 7 Dec 2020 19:06:11 -0500 Received: from mail-ot1-x344.google.com (mail-ot1-x344.google.com [IPv6:2607:f8b0:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2D13C0611CC for ; Mon, 7 Dec 2020 16:05:30 -0800 (PST) Received: by mail-ot1-x344.google.com with SMTP id d8so6388497otq.6 for ; Mon, 07 Dec 2020 16:05:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=/xbW6H63n+yti/OsJjfkYsqvsHOXl31qvIGu38SrEno=; b=mesdvj6UHVOxJLDKkdh+ablDkj9rJY2jjm3lNHOsT/Z0XlRlN+PlMNny7diSMP+rwH waddUWaqE8dn0rVGT9ec6w7oXzYJRJii6lz+AX9PytctfI6YYyqtmCyytb/3bRseiery ecSvCYig5CsQM5GwST+5VZih2e2pwLbwQHk+G7adk7Vj/8eAPKMy2CdtjA3+gj4l2JjT dx3iQbGSqNizQPwbjugeDxgxP3IrTLiL/rMot+eiG4kX25v21xRGbUnDLC2UL17xNIcl dFjN022te2uuUrhfzyK3S0Y5QszvZoHooNSzLXT8JB6ahYvy+B8yJ/797xD9+Bl38nZi xCQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=/xbW6H63n+yti/OsJjfkYsqvsHOXl31qvIGu38SrEno=; b=O9qV6ToRz4i32zw26qeODI4e1MJFkrVl5oKb6zHKmy5tAJ6pjEADspOLUb3w0DkddR QZIY3FjwpNQOYt66ovtEJkVGAEHbz8FbM1w00pMnnwbHRO7nIK9yc8LUVGgvWKpmkCbh noUJBXhgf9B1t1UQuimjNvSp7452QeGJ1PzMFwBV/BEJZUpvZDXqTHHz2DaWPG6+T7aK wWKycmMiyx6jtGyGCy5IiaRlOmyDHQ4QAqpZY1TwG5vxlsjI2SNwkmMnhCEbQsc7OG1u 2fX42aDEo4JqX/4PXur5pwTQaEntNoLrOqrB51pGwADm+rLMzbYTEKwIJL1pwZehGPol 4Nxw== X-Gm-Message-State: AOAM5324rryiI5Wx+Dz+r1feLDLtzpbKRwbe7KNklsvm8wL//N5SMzKe fLmdhsMpN31dihoTP+TOwS5rdeud55ty+q7r X-Google-Smtp-Source: ABdhPJxzrqPZDpG81Kh7tJrjcTvyhVhhISQgHN8XRatEblpTnd3mjLn8iSz4+aYpT/mSlUyysm7tAg== X-Received: by 2002:a9d:5613:: with SMTP id e19mr14722115oti.153.1607385930135; Mon, 07 Dec 2020 16:05:30 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id j2sm291879otq.78.2020.12.07.16.05.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:29 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:27 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 17/24] pack-bitmap.c: check reads more aggressively when loading Message-ID: <37f96360983557f6653af85da45383dc16d8b00c.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Before 'load_bitmap_entries_v1()' reads an actual EWAH bitmap, it should check that it can safely do so by ensuring that there are at least 6 bytes available to be read (four for the commit's index position, and then two more for the xor offset and flags, respectively). Likewise, it should check that the commit index it read refers to a legitimate object in the pack. The first fix catches a truncation bug that was exposed when testing, and the second is purely precautionary. There are some possible future improvements, not pursued here. They are: - Computing the correct boundary of the bitmap itself in the caller and ensuring that we don't read past it. This may or may not be worth it, since in a truncation situation, all bets are off: (is the trailer still there and the bitmap entries malformed, or is the trailer truncated?). The best we can do is try to read what's there as if it's correct data (and protect ourselves when it's obviously bogus). - Avoid the magic "6" by teaching read_be32() and read_u8() (both of which are custom helpers for this function) to check sizes before advancing the pointers. - Adding more tests in this area. Testing these truncation situations are remarkably fragile to even subtle changes in the bitmap generation. So, the resulting tests are likely to be quite brittle. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 4431f9f120..60c781d100 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -229,11 +229,16 @@ static int load_bitmap_entries_v1(struct bitmap_index *index) uint32_t commit_idx_pos; struct object_id oid; + if (index->map_size - index->map_pos < 6) + return error("corrupt ewah bitmap: truncated header for entry %d", i); + commit_idx_pos = read_be32(index->map, &index->map_pos); xor_offset = read_u8(index->map, &index->map_pos); flags = read_u8(index->map, &index->map_pos); - nth_packed_object_id(&oid, index->pack, commit_idx_pos); + if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0) + return error("corrupt ewah bitmap: commit index %u out of range", + (unsigned)commit_idx_pos); bitmap = read_bitmap_1(index); if (!bitmap) From patchwork Tue Dec 8 00:05:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957171 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA618C4361B for ; Tue, 8 Dec 2020 00:06:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6AD9623A1D for ; Tue, 8 Dec 2020 00:06:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728601AbgLHAGQ (ORCPT ); Mon, 7 Dec 2020 19:06:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAGQ (ORCPT ); Mon, 7 Dec 2020 19:06:16 -0500 Received: from mail-oi1-x22a.google.com (mail-oi1-x22a.google.com [IPv6:2607:f8b0:4864:20::22a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D89A8C0611C5 for ; Mon, 7 Dec 2020 16:05:35 -0800 (PST) Received: by mail-oi1-x22a.google.com with SMTP id x16so17575239oic.3 for ; Mon, 07 Dec 2020 16:05:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=j/TDBrLI93V2C4lmm5ZGKarSky0RGSD6g7lSTqg4gn0=; b=A7FzOhQz+c711lC2v/kQdwYn3blLPdjX82oDxm1b6PUqberJyAi2eG1TJN4j6yVckM +qt3ssPvpIjLAzdBH2Ntu1IdAnEnm31tQzW+SafHJQ9AncvZFD3jNDlXs75pZ2THKWBW 0WA5F+Q4V+uTVxbHNlUAPHSu87qIJp3ze+c8XPmRYzBAhDBA2Jg/nz0ThNexXjfubmvw hRNYBsUl9wdCvkA2adscssF9LK3sj6B9uhTFrSzz4kMIM+T3xQRqDfdFnYQWLtDPqgDT a8OrI45VcOLgfLwyL5LdH/OmRhUbJ0KFEUR90FFqiKkuK3VOglf7h8poB1fzfuKX1C59 SeEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=j/TDBrLI93V2C4lmm5ZGKarSky0RGSD6g7lSTqg4gn0=; b=eHRYGj4yvbVCSHDbCuAt/yGWwfTQaar7BOzwu9FrwmiaOI2QYWRVxnzxP5BFtNNoGB hhm/95+aRHREOSf85Sn/vgJQAf4IhEivhsfxl3/IpzWY7Z+8aZ48UbGJbBOOpHPEHse3 qaIrQevJje4rQh7JYvKSTgB2ifdV7OzHY1g58orxJ5MfyzZiBEtmL3YzIjtbI9e92Cng ejl0KsxFoHsjAy8pv1HtbU02tNfEq5/jKFNBReDCpMPMawNrLm9+Mpr+KN1cKVV/0IXB mhv/E2XSUjqdywdmTuR8uDks70nrmw9WbxchRVIcN1gQa43ReJC0wbMIVmIrUOrErJg0 RtEA== X-Gm-Message-State: AOAM531YskdB9Zf9/1IjI5ZQljbUTMMQJcosWaz3qu5Suhu9f7ClCdJq obFArlURgPz1GN1FseL05zV+c5sJ+KkW7rha X-Google-Smtp-Source: ABdhPJzHSaul2LsEpL45Cq6G3wGj7HCZ7ReVOOkM1PJ3lniapJ1SLWxIGn8RNHp0UmMa+KxDB0N8Qw== X-Received: by 2002:a05:6808:685:: with SMTP id k5mr939312oig.135.1607385934566; Mon, 07 Dec 2020 16:05:34 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id q77sm514691ooq.15.2020.12.07.16.05.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:33 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:31 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 18/24] pack-bitmap-write: build fewer intermediate bitmaps Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_writer_build() method calls bitmap_builder_init() to construct a list of commits reachable from the selected commits along with a "reverse graph". This reverse graph has edges pointing from a commit to other commits that can reach that commit. After computing a reachability bitmap for a commit, the values in that bitmap are then copied to the reachability bitmaps across the edges in the reverse graph. We can now relax the role of the reverse graph to greatly reduce the number of intermediate reachability bitmaps we compute during this reverse walk. The end result is that we walk objects the same number of times as before when constructing the reachability bitmaps, but we also spend much less time copying bits between bitmaps and have much lower memory pressure in the process. The core idea is to select a set of "important" commits based on interactions among the sets of commits reachable from each selected commit. The first technical concept is to create a new 'commit_mask' member in the bb_commit struct. Note that the selected commits are provided in an ordered array. The first thing to do is to mark the ith bit in the commit_mask for the ith selected commit. As we walk the commit-graph, we copy the bits in a commit's commit_mask to its parents. At the end of the walk, the ith bit in the commit_mask for a commit C stores a boolean representing "The ith selected commit can reach C." As we walk, we will discover non-selected commits that are important. We will get into this later, but those important commits must also receive bit positions, growing the width of the bitmasks as we walk. At the true end of the walk, the ith bit means "the ith _important_ commit can reach C." MAXIMAL COMMITS --------------- We use a new 'maximal' bit in the bb_commit struct to represent whether a commit is important or not. The term "maximal" comes from the partially-ordered set of commits in the commit-graph where C >= P if P is a parent of C, and then extending the relationship transitively. Instead of taking the maximal commits across the entire commit-graph, we instead focus on selecting each commit that is maximal among commits with the same bits on in their commit_mask. This definition is important, so let's consider an example. Suppose we have three selected commits A, B, and C. These are assigned bitmasks 100, 010, and 001 to start. Each of these can be marked as maximal immediately because they each will be the uniquely maximal commit that contains their own bit. Keep in mind that that these commits may have different bitmasks after the walk; for example, if B can reach C but A cannot, then the final bitmask for C is 011. Even in these cases, C would still be a maximal commit among all commits with the third bit on in their masks. Now define sets X, Y, and Z to be the sets of commits reachable from A, B, and C, respectively. The intersections of these sets correspond to different bitmasks: * 100: X - (Y union Z) * 010: Y - (X union Z) * 001: Z - (X union Y) * 110: (X intersect Y) - Z * 101: (X intersect Z) - Y * 011: (Y intersect Z) - X * 111: X intersect Y intersect Z This can be visualized with the following Hasse diagram: 100 010 001 | \ / \ / | | \/ \/ | | /\ /\ | | / \ / \ | 110 101 011 \___ | ___/ \ | / 111 Some of these bitmasks may not be represented, depending on the topology of the commit-graph. In fact, we are counting on it, since the number of possible bitmasks is exponential in the number of selected commits, but is also limited by the total number of commits. In practice, very few bitmasks are possible because most commits converge on a common "trunk" in the commit history. With this three-bit example, we wish to find commits that are maximal for each bitmask. How can we identify this as we are walking? As we walk, we visit a commit C. Since we are walking the commits in topo-order, we know that C is visited after all of its children are visited. Thus, when we get C from the revision walk we inspect the 'maximal' property of its bb_data and use that to determine if C is truly important. Its commit_mask is also nearly final. If C is not one of the originally-selected commits, then assign a bit position to C (by incrementing num_maximal) and set that bit on in commit_mask. See "MULTIPLE MAXIMAL COMMITS" below for more detail on this. Now that the commit C is known to be maximal or not, consider each parent P of C. Compute two new values: * c_not_p : true if and only if the commit_mask for C contains a bit that is not contained in the commit_mask for P. * p_not_c : true if and only if the commit_mask for P contains a bit that is not contained in the commit_mask for P. If c_not_p is false, then P already has all of the bits that C would provide to its commit_mask. In this case, move on to other parents as C has nothing to contribute to P's state that was not already provided by other children of P. We continue with the case that c_not_p is true. This means there are bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or() to add those bits. If p_not_c is also true, then set the maximal bit for P to one. This means that if no other commit has P as a parent, then P is definitely maximal. This is because no child had the same bitmask. It is important to think about the maximal bit for P at this point as a temporary state: "P is maximal based on current information." In contrast, if p_not_c is false, then set the maximal bit for P to zero. Further, clear all reverse_edges for P since any edges that were previously assigned to P are no longer important. P will gain all reverse edges based on C. The final thing we need to do is to update the reverse edges for P. These reverse edges respresent "which closest maximal commits contributed bits to my commit_mask?" Since C contributed bits to P's commit_mask in this case, C must add to the reverse edges of P. If C is maximal, then C is a 'closest' maximal commit that contributed bits to P. Add C to P's reverse_edges list. Otherwise, C has a list of maximal commits that contributed bits to its bitmask (and this list is exactly one element). Add all of these items to P's reverse_edges list. Be careful to ignore duplicates here. After inspecting all parents P for a commit C, we can clear the commit_mask for C. This reduces the memory load to be limited to the "width" of the commit graph. Consider our ABC/XYZ example from earlier and let's inspect the state of the commits for an interesting bitmask, say 011. Suppose that D is the only maximal commit with this bitmask (in the first three bits). All other commits with bitmask 011 have D as the only entry in their reverse_edges list. D's reverse_edges list contains B and C. COMPUTING REACHABILITY BITMAPS ------------------------------ Now that we have our definition, let's zoom out and consider what happens with our new reverse graph when computing reachability bitmaps. We walk the reverse graph in reverse-topo-order, so we visit commits with largest commit_masks first. After we compute the reachability bitmap for a commit C, we push the bits in that bitmap to each commit D in the reverse edge list for C. Then, when we finally visit D we already have the bits for everything reachable from maximal commits that D can reach and we only need to walk the objects in the set-difference. In our ABC/XYZ example, when we finally walk for the commit A we only need to walk commits with bitmask equal to A's bitmask. If that bitmask is 100, then we are only walking commits in X - (Y union Z) because the bitmap already contains the bits for objects reachable from (X intersect Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps for the maximal commits with bitmasks 110 and 101). The behavior is intended to walk each commit (and the trees that commit introduces) at most once while allocating and copying fewer reachability bitmaps. There is one caveat: what happens when there are multiple maximal commits with the same bitmask, with respect to the initial set of selected commits? MULTIPLE MAXIMAL COMMITS ------------------------ Earlier, we mentioned that when we discover a new maximal commit, we assign a new bit position to that commit and set that bit position to one for that commit. This is absolutely important for interesting commit-graphs such as git/git and torvalds/linux. The reason is due to the existence of "butterflies" in the commit-graph partial order. Here is an example of four commits forming a butterfly: I J |\ /| | \/ | | /\ | |/ \| M N \ / |/ Q Here, I and J both have parents M and N. In general, these do not need to be exact parent relationships, but reachability relationships. The most important part is that M and N cannot reach each other, so they are independent in the partial order. If I had commit_mask 10 and J had commit_mask 01, then M and N would both be assigned commit_mask 11 and be maximal commits with the bitmask 11. Then, what happens when M and N can both reach a commit Q? If Q is also assigned the bitmask 11, then it is not maximal but is reachable from both M and N. While this is not necessarily a deal-breaker for our abstract definition of finding maximal commits according to a given bitmask, we have a few issues that can come up in our larger picture of constructing reachability bitmaps. In particular, if we do not also consider Q to be a "maximal" commit, then we will walk commits reachable from Q twice: once when computing the reachability bitmap for M and another time when computing the reachability bitmap for N. This becomes much worse if the topology continues this pattern with multiple butterflies. The solution has already been mentioned: each of M and N are assigned their own bits to the bitmask and hence they become uniquely maximal for their bitmasks. Finally, Q also becomes maximal and thus we do not need to walk its commits multiple times. The final bitmasks for these commits are as follows: I:10 J:01 |\ /| | \ _____/ | | /\____ | |/ \ | M:111 N:1101 \ / Q:1111 Further, Q's reverse edge list is { M, N }, while M and N both have reverse edge list { I, J }. PERFORMANCE MEASUREMENTS ------------------------ Now that we've spent a LOT of time on the theory of this algorithm, let's show that this is actually worth all that effort. To test the performance, use GIT_TRACE2_PERF=1 when running 'git repack -abd' in a repository with no existing reachability bitmaps. This avoids any issues with keeping existing bitmaps to skew the numbers. Inspect the "building_bitmaps_total" region in the trace2 output to focus on the portion of work that is affected by this change. Here are the performance comparisons for a few repositories. The timings are for the following versions of Git: "multi" is the timing from before any reverse graph is constructed, where we might perform multiple traversals. "reverse" is for the previous change where the reverse graph has every reachable commit. Finally "maximal" is the version introduced here where the reverse graph only contains the maximal commits. Repository: git/git multi: 2.628 sec reverse: 2.344 sec maximal: 2.047 sec Repository: torvalds/linux multi: 64.7 sec reverse: 205.3 sec maximal: 44.7 sec So in all cases we've not only recovered any time lost to switching to the reverse-edge algorithm, but we come out ahead of "multi" in all cases. Likewise, peak heap has gone back to something reasonable: Repository: torvalds/linux multi: 2.087 GB reverse: 3.141 GB maximal: 2.288 GB While I do not have access to full fork networks on GitHub, Peff has run this algorithm on the chromium/chromium fork network and reported a change from 3 hours to ~233 seconds. That network is particularly beneficial for this approach because it has a long, linear history along with many tags. The "multi" approach was obviously quadratic and the new approach is linear. Helped-by: Jeff King Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 72 +++++++++++++++++++++++++++++++--- t/t5310-pack-bitmaps.sh | 85 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 148 insertions(+), 9 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 7e218d02a6..0af93193d8 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -180,8 +180,10 @@ static void compute_xor_offsets(void) struct bb_commit { struct commit_list *reverse_edges; + struct bitmap *commit_mask; struct bitmap *bitmap; - unsigned selected:1; + unsigned selected:1, + maximal:1; unsigned idx; /* within selected array */ }; @@ -198,7 +200,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i; + unsigned int i, num_maximal; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -210,27 +212,85 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->maximal = 1; ent->idx = i; + + ent->commit_mask = bitmap_new(); + bitmap_set(ent->commit_mask, i); + add_pending_object(&revs, &c->object, ""); } + num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { struct commit_list *p; + struct bb_commit *c_ent; parse_commit_or_die(commit); - ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); - bb->commits[bb->commits_nr++] = commit; + c_ent = bb_data_at(&bb->data, commit); + + if (c_ent->maximal) { + if (!c_ent->selected) { + bitmap_set(c_ent->commit_mask, num_maximal); + num_maximal++; + } + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + } for (p = commit->parents; p; p = p->next) { - struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->reverse_edges); + struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); + int c_not_p, p_not_c; + + if (!p_ent->commit_mask) { + p_ent->commit_mask = bitmap_new(); + c_not_p = 1; + p_not_c = 0; + } else { + c_not_p = bitmap_is_subset(c_ent->commit_mask, p_ent->commit_mask); + p_not_c = bitmap_is_subset(p_ent->commit_mask, c_ent->commit_mask); + } + + if (!c_not_p) + continue; + + bitmap_or(p_ent->commit_mask, c_ent->commit_mask); + + if (p_not_c) + p_ent->maximal = 1; + else { + p_ent->maximal = 0; + free_commit_list(p_ent->reverse_edges); + p_ent->reverse_edges = NULL; + } + + if (c_ent->maximal) { + commit_list_insert(commit, &p_ent->reverse_edges); + } else { + struct commit_list *cc = c_ent->reverse_edges; + + for (; cc; cc = cc->next) { + if (!commit_list_contains(cc->item, p_ent->reverse_edges)) + commit_list_insert(cc->item, &p_ent->reverse_edges); + } + } } + + bitmap_free(c_ent->commit_mask); + c_ent->commit_mask = NULL; } + + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_selected_commits", writer->selected_nr); + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_maximal_commits", num_maximal); } static void bitmap_builder_clear(struct bitmap_builder *bb) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index b1248f1cc8..4c928221be 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -20,11 +20,87 @@ has_any () { grep -Ff "$1" "$2" } +# To ensure the logic for "maximal commits" is exercised, make +# the repository a bit more complicated. +# +# other master +# * * +# (99 commits) (99 commits) +# * * +# |\ /| +# | * octo-other octo-master * | +# |/|\_________ ____________/|\| +# | \ \/ __________/ | +# | | ________/\ / | +# * |/ * merge-right * +# | _|__________/ \____________ | +# |/ | \| +# (l1) * * merge-left * (r1) +# | / \________________________ | +# |/ \| +# (l2) * * (r2) +# \___________________________ | +# \| +# * (base) +# +# The important part for the maximal commit algorithm is how +# the bitmasks are extended. Assuming starting bit positions +# for master (bit 0) and other (bit 1), and some flexibility +# in the order that merge bases are visited, the bitmasks at +# the end should be: +# +# master: 1 (maximal, selected) +# other: 01 (maximal, selected) +# octo-master: 1 +# octo-other: 01 +# merge-right: 111 (maximal) +# (l1): 111 +# (r1): 111 +# merge-left: 1101 (maximal) +# (l2): 11111 (maximal) +# (r2): 111101 (maximal) +# (base): 1111111 (maximal) + test_expect_success 'setup repo with moderate-sized history' ' - test_commit_bulk --id=file 100 && + test_commit_bulk --id=file 10 && git checkout -b other HEAD~5 && test_commit_bulk --id=side 10 && + + # add complicated history setup, including merges and + # ambiguous merge-bases + + git checkout -b merge-left other~2 && + git merge master~2 -m "merge-left" && + + git checkout -b merge-right master~1 && + git merge other~1 -m "merge-right" && + + git checkout -b octo-master master && + git merge merge-left merge-right -m "octopus-master" && + + git checkout -b octo-other other && + git merge merge-left merge-right -m "octopus-other" && + + git checkout other && + git merge octo-other -m "pull octopus" && + git checkout master && + git merge octo-master -m "pull octopus" && + + # Remove these branches so they are not selected + # as bitmap tips + git branch -D merge-left && + git branch -D merge-right && + git branch -D octo-other && + git branch -D octo-master && + + # add padding to make these merges less interesting + # and avoid having them selected for bitmaps + test_commit_bulk --id=file 100 && + git checkout other && + test_commit_bulk --id=side 100 && + git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && @@ -32,9 +108,12 @@ test_expect_success 'setup repo with moderate-sized history' ' ' test_expect_success 'full repack creates bitmaps' ' - git repack -ad && + GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \ + git repack -ad && ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output + test_line_count = 1 output && + grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && + grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' From patchwork Tue Dec 8 00:05:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957175 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D28DC4167B for ; Tue, 8 Dec 2020 00:06:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 463BC23A1D for ; Tue, 8 Dec 2020 00:06:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728607AbgLHAG0 (ORCPT ); Mon, 7 Dec 2020 19:06:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728300AbgLHAGZ (ORCPT ); Mon, 7 Dec 2020 19:06:25 -0500 Received: from mail-ot1-x331.google.com (mail-ot1-x331.google.com [IPv6:2607:f8b0:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D8E8AC061257 for ; Mon, 7 Dec 2020 16:05:39 -0800 (PST) Received: by mail-ot1-x331.google.com with SMTP id h19so14340925otr.1 for ; Mon, 07 Dec 2020 16:05:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5oK9d8MRvwLAr4PgAX2C7YmL72QhE9Q310xr6XLqZic=; b=apmt2GF7L2sLHpAH0dNeJrVXrW7pWxS8xd9OeVNtvDV69j30dkAGP9TI4qG1fCaXEQ thBPuFfTJtoSXX1XQ7H11EWcPjj8JYgavSKUd9yYmv6dOhF1Hx/eMOVDPeWcrw+AmDqN 81s+/d47HY/pu4aGt+qlPzVKzpbAlHZCrcRfzPA4BYh+hLjO8u2jqemc9OYLA3OaSf+1 tfNrcEQKcLZvZMqj639iTkNBYXYbXommFZWs8YgT6qvkOZj6LpJcjUeo3o9iQnAjpZu0 K01Ted05PD5VuuVmD4z3w57OMPxwx3viFUlvD//25uC1qpWlsCzERk5MMDkQtJu3DSwr hguw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5oK9d8MRvwLAr4PgAX2C7YmL72QhE9Q310xr6XLqZic=; b=nmZU7rH7prksg2hqQO2NaQRKXGnNeRXD03xrDaeF9DSgFc2qpeHdQxvJ6TAZlQpU32 P7EM5u7YT+3YaAODLzSGWjg8u+ACfWS8aiVQPJmm66av/fWdlMaYsAFw9U/DCLw2WIka IhpDF2yvkgrpQvdc+L0CCRVDosbFAisfLTt/hfQ/bRxoLy/Vz/C4KlKjGQwLyePmZyLD G0vQIsSGCHjTEZQzIJRypLHJYG7K0B3qFM4OUjVWcUnBoWiWIDMfhwVw/uOQs3atiLSV UWA43nOHVtJ1xQx8m6RNlip4Zkbe5O+OTEMSnIjbjAo77wLhuXWHeW2vh2e8fXw0OwGo 0fyA== X-Gm-Message-State: AOAM533r76jWcwRln99nGqWQjxejZyFO3c8Y9/lK6yAcX9O3sbZdpDQp NgkR0ggPl/8Q3FxlcJ9Z9dazqYlKeaLipigw X-Google-Smtp-Source: ABdhPJyJFZrGH6Cf2jTQJu7Qnw/atyCH20E+BbsC0NAh4nv6ec6hXATZrFrtQpE2j4dq2sNLGQDV+A== X-Received: by 2002:a9d:7a97:: with SMTP id l23mr14763954otn.232.1607385938896; Mon, 07 Dec 2020 16:05:38 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id n31sm1085775otn.33.2020.12.07.16.05.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:38 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:36 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 - pack-bitmap-write.c | 50 +++++------------------------------------- pack-bitmap.c | 46 +++++--------------------------------- pack-bitmap.h | 6 ++++- 4 files changed, 16 insertions(+), 87 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5617c01b5a..2a00358f34 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1104,7 +1104,6 @@ static void write_pack_file(void) stop_progress(&progress_state); bitmap_writer_show_progress(progress); - bitmap_writer_reuse_bitmaps(&to_pack); bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1); bitmap_writer_build(&to_pack); bitmap_writer_finish(written_list, nr_written, diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 0af93193d8..333058854d 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -30,7 +30,6 @@ struct bitmap_writer { struct ewah_bitmap *tags; kh_oid_map_t *bitmaps; - kh_oid_map_t *reused; struct packing_data *to_pack; struct bitmapped_commit *selected; @@ -112,7 +111,7 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, * Compute the actual bitmaps */ -static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) +static inline void push_bitmapped_commit(struct commit *commit) { if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; @@ -120,7 +119,7 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm } writer.selected[writer.selected_nr].commit = commit; - writer.selected[writer.selected_nr].bitmap = reused; + writer.selected[writer.selected_nr].bitmap = NULL; writer.selected[writer.selected_nr].flags = 0; writer.selected_nr++; @@ -372,13 +371,6 @@ static void store_selected(struct bb_commit *ent, struct commit *commit) khiter_t hash_pos; int hash_ret; - /* - * the "reuse bitmaps" phase may have stored something here, but - * our new algorithm doesn't use it. Drop it. - */ - if (stored->bitmap) - ewah_free(stored->bitmap); - stored->bitmap = bitmap_to_ewah(ent->bitmap); hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); @@ -480,35 +472,6 @@ static int date_compare(const void *_a, const void *_b) return (long)b->date - (long)a->date; } -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack) -{ - struct bitmap_index *bitmap_git; - if (!(bitmap_git = prepare_bitmap_git(to_pack->repo))) - return; - - writer.reused = kh_init_oid_map(); - rebuild_existing_bitmaps(bitmap_git, to_pack, writer.reused, - writer.show_progress); - /* - * NEEDSWORK: rebuild_existing_bitmaps() makes writer.reused reference - * some bitmaps in bitmap_git, so we can't free the latter. - */ -} - -static struct ewah_bitmap *find_reused_bitmap(const struct object_id *oid) -{ - khiter_t hash_pos; - - if (!writer.reused) - return NULL; - - hash_pos = kh_get_oid_map(writer.reused, *oid); - if (hash_pos >= kh_end(writer.reused)) - return NULL; - - return kh_value(writer.reused, hash_pos); -} - void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps) @@ -522,12 +485,11 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (indexed_commits_nr < 100) { for (i = 0; i < indexed_commits_nr; ++i) - push_bitmapped_commit(indexed_commits[i], NULL); + push_bitmapped_commit(indexed_commits[i]); return; } for (;;) { - struct ewah_bitmap *reused_bitmap = NULL; struct commit *chosen = NULL; next = next_commit_index(i); @@ -542,15 +504,13 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (next == 0) { chosen = indexed_commits[i]; - reused_bitmap = find_reused_bitmap(&chosen->object.oid); } else { chosen = indexed_commits[i + next]; for (j = 0; j <= next; ++j) { struct commit *cm = indexed_commits[i + j]; - reused_bitmap = find_reused_bitmap(&cm->object.oid); - if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) { + if ((cm->object.flags & NEEDS_BITMAP) != 0) { chosen = cm; break; } @@ -560,7 +520,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, } } - push_bitmapped_commit(chosen, reused_bitmap); + push_bitmapped_commit(chosen); i += next + 1; display_progress(writer.progress, i); diff --git a/pack-bitmap.c b/pack-bitmap.c index 60c781d100..d1368b69bb 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1338,9 +1338,9 @@ void test_bitmap_walk(struct rev_info *revs) free_bitmap_index(bitmap_git); } -static int rebuild_bitmap(uint32_t *reposition, - struct ewah_bitmap *source, - struct bitmap *dest) +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest) { uint32_t pos = 0; struct ewah_iterator it; @@ -1369,19 +1369,11 @@ static int rebuild_bitmap(uint32_t *reposition, return 0; } -int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, - struct packing_data *mapping, - kh_oid_map_t *reused_bitmaps, - int show_progress) +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping) { uint32_t i, num_objects; uint32_t *reposition; - struct bitmap *rebuild; - struct stored_bitmap *stored; - struct progress *progress = NULL; - - khiter_t hash_pos; - int hash_ret; num_objects = bitmap_git->pack->num_objects; reposition = xcalloc(num_objects, sizeof(uint32_t)); @@ -1399,33 +1391,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, reposition[i] = oe_in_pack_pos(mapping, oe) + 1; } - rebuild = bitmap_new(); - i = 0; - - if (show_progress) - progress = start_progress("Reusing bitmaps", 0); - - kh_foreach_value(bitmap_git->bitmaps, stored, { - if (stored->flags & BITMAP_FLAG_REUSE) { - if (!rebuild_bitmap(reposition, - lookup_stored_bitmap(stored), - rebuild)) { - hash_pos = kh_put_oid_map(reused_bitmaps, - stored->oid, - &hash_ret); - kh_value(reused_bitmaps, hash_pos) = - bitmap_to_ewah(rebuild); - } - bitmap_reset(rebuild); - display_progress(progress, ++i); - } - }); - - stop_progress(&progress); - - free(reposition); - bitmap_free(rebuild); - return 0; + return reposition; } void free_bitmap_index(struct bitmap_index *b) diff --git a/pack-bitmap.h b/pack-bitmap.h index 1203120c43..afa4115136 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -73,7 +73,11 @@ void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, struct pack_idx_entry **index, uint32_t index_nr); -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping); +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Tue Dec 8 00:05:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957173 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1385C4361B for ; Tue, 8 Dec 2020 00:06:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B456A23A1D for ; Tue, 8 Dec 2020 00:06:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728606AbgLHAGY (ORCPT ); Mon, 7 Dec 2020 19:06:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45580 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728300AbgLHAGY (ORCPT ); Mon, 7 Dec 2020 19:06:24 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F3AFC0611CA for ; Mon, 7 Dec 2020 16:05:44 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id o25so17553816oie.5 for ; Mon, 07 Dec 2020 16:05:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=LViEVk8iVCd/Fue2tk1lTGL+H9g9t3D0HHU/a47NVcM=; b=PZ0qwWqea+1gSo3pCG66yDSQDQtNavM2ugrjq6cDQG+Dy2vqmlfQom5PzDIGaWRMyo 5SWChe1uCgOzpEBJ2N8H8lopnA8vBTjRl7Q2LchV+kVK27TtkqAHOch8/n53juY37Qt7 u1euhgHEjTz0Cxz93iXBDIifFfaghqL8WzDTDo6uYpCiWjccO0l1SbCmN136+I4WF32x uxcGd2X+ygWtpPOh1J9SdzLjc9VuV2LAJ3bHQFzwoesjCkWUct4srTDJYr+wpqswTlHP ZlU1aQIWGgB0bD6nbaMe2x4NhbKwWkUOjzefh5+XtahksjNe+l+GK1ISPrILlqFmtNDe utDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=LViEVk8iVCd/Fue2tk1lTGL+H9g9t3D0HHU/a47NVcM=; b=srt0Xq6CXqHkn+M4qRPTbAnElZGuWRRmY7mbOPtvYWdpcpGRs/tV60YMy9Ff0XFbPj FfGf6hCU6jsc3kdhUjSy6if1Yz3VRZ3NChBIFuc/smk1AZiT3E7XUF8aBRz3JYCPQamY taR0NcFMEEqBdKW+a9cscWoGP0yJwSssi1IwK7jrBR/bj7KByGi671/2QHtvLCWim4WJ 3v5N8E2lpZEhIeuBM1wqKQRrQDUYvYIvbmbGJD7dE9AFbDKFw65q7vzFtzz6Tso9Db3d GsJrPPWig1a5jjoUkInFTgxVOsochb2Yevnp99xuj3rLxTIl5tj7Y5mY8g0fRykkPXI9 bg/w== X-Gm-Message-State: AOAM530/NOJS1/LBgyluWHewZfTGSn3wFNQuGHXHupWdmtTFUxTXAS6W Jgja23Jip58ypnWnQT8snxzmi/dis2diDYMu X-Google-Smtp-Source: ABdhPJxJIlwuJTLZ+CV67vaR+RZzhwJv/qvxbcyFZ3yYeDeUppgPZ3TvV6DYYLE+DcFeuSm1DKlqmA== X-Received: by 2002:aca:e146:: with SMTP id y67mr969596oig.70.1607385943165; Mon, 07 Dec 2020 16:05:43 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id i82sm3314112oia.2.2020.12.07.16.05.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:42 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:40 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A couple of callers within pack-bitmap.c duplicate logic to lookup a given object id in the bitamps khash. Factor this out into a new function, 'bitmap_for_commit()' to reduce some code duplication. Make this new function non-static, since it will be used in later commits from outside of pack-bitmap.c. Signed-off-by: Taylor Blau --- pack-bitmap.c | 33 +++++++++++++++++++-------------- pack-bitmap.h | 2 ++ 2 files changed, 21 insertions(+), 14 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index d1368b69bb..5efb8af121 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -380,6 +380,16 @@ struct include_data { struct bitmap *seen; }; +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit) +{ + khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps, + commit->object.oid); + if (hash_pos >= kh_end(bitmap_git->bitmaps)) + return NULL; + return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos)); +} + static inline int bitmap_position_extended(struct bitmap_index *bitmap_git, const struct object_id *oid) { @@ -465,10 +475,10 @@ static void show_commit(struct commit *commit, void *data) static int add_to_include_set(struct bitmap_index *bitmap_git, struct include_data *data, - const struct object_id *oid, + struct commit *commit, int bitmap_pos) { - khiter_t hash_pos; + struct ewah_bitmap *partial; if (data->seen && bitmap_get(data->seen, bitmap_pos)) return 0; @@ -476,10 +486,9 @@ static int add_to_include_set(struct bitmap_index *bitmap_git, if (bitmap_get(data->base, bitmap_pos)) return 0; - hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid); - if (hash_pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos); - bitmap_or_ewah(data->base, lookup_stored_bitmap(st)); + partial = bitmap_for_commit(bitmap_git, commit); + if (partial) { + bitmap_or_ewah(data->base, partial); return 0; } @@ -498,8 +507,7 @@ static int should_include(struct commit *commit, void *_data) (struct object *)commit, NULL); - if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid, - bitmap_pos)) { + if (!add_to_include_set(data->bitmap_git, data, commit, bitmap_pos)) { struct commit_list *parent = commit->parents; while (parent) { @@ -1282,10 +1290,10 @@ void test_bitmap_walk(struct rev_info *revs) { struct object *root; struct bitmap *result = NULL; - khiter_t pos; size_t result_popcnt; struct bitmap_test_data tdata; struct bitmap_index *bitmap_git; + struct ewah_bitmap *bm; if (!(bitmap_git = prepare_bitmap_git(revs->repo))) die("failed to load bitmap indexes"); @@ -1297,12 +1305,9 @@ void test_bitmap_walk(struct rev_info *revs) bitmap_git->version, bitmap_git->entry_count); root = revs->pending.objects[0].item; - pos = kh_get_oid_map(bitmap_git->bitmaps, root->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *bm = lookup_stored_bitmap(st); + bm = bitmap_for_commit(bitmap_git, (struct commit *)root); + if (bm) { fprintf(stderr, "Found bitmap for %s. %d bits / %08x checksum\n", oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm)); diff --git a/pack-bitmap.h b/pack-bitmap.h index afa4115136..25dfcf5615 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,6 +78,8 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, int rebuild_bitmap(const uint32_t *reposition, struct ewah_bitmap *source, struct bitmap *dest); +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Tue Dec 8 00:05:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957177 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CABD5C4361B for ; Tue, 8 Dec 2020 00:06:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9987E23A1D for ; Tue, 8 Dec 2020 00:06:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728441AbgLHAGe (ORCPT ); Mon, 7 Dec 2020 19:06:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45592 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728300AbgLHAGe (ORCPT ); Mon, 7 Dec 2020 19:06:34 -0500 Received: from mail-ot1-x344.google.com (mail-ot1-x344.google.com [IPv6:2607:f8b0:4864:20::344]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 491FEC061794 for ; Mon, 7 Dec 2020 16:05:48 -0800 (PST) Received: by mail-ot1-x344.google.com with SMTP id i6so8222342otr.2 for ; Mon, 07 Dec 2020 16:05:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=/3ATCqzziAQFVwBi3ny+EmDvKvp43Oa289NvDRWL6DI=; b=WJE5/+PDOsdM5cx7TTmDZKVjZRkDLzTesBZdhNhsWGoGVqWasTo0FQz4DrlaRcBdup 6yOnszfh/hL5B3hmKBen1TMkrAXx+AUAIhMI8XGUxX7SPh98bSaRC9/w4vVV7prQ0cm9 vXJWCfV60+E8Fp3630QQFkQ2kb2O72bulDlPXpCgPMfqFPCvpwkYV6DbzNXfUZJKWCIz vUoh28TduATJl8KdkcSJC5mI3QyLjl+A1nf+TTtqKv3qvvplIarApEujzyrQm87rooAe wsMvI9nFD2wpeV3jUfFbITHlp0+qdWBiGcwmo7bDj4jPAOjNhch2K55qLPatp5F+yHBu 50MQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=/3ATCqzziAQFVwBi3ny+EmDvKvp43Oa289NvDRWL6DI=; b=g9cWJu4MCBCMFoZ5+L1DfO+9vzUgZA8AeN7FL0SAbuo5zKXkRgQ0TYUNnPlO4ngmfq IPGxPKP0P1TL98eA1EYMriCo2ls9E7SdK/QSGv88LtMwYb31CSj5YicVT1fORZ+sl2m1 sDaatLQNrp7tx70hrG/HVa8kvfJZ/Wha9MSb9meNk209ta1oEQZZDe8CG1c1peQJiAn1 478LtSYOxNdTkW/GL+d7OWC9OK1mAty8R8SXs7BY2kMYKjrNQOFPwbEMcu0Kn9E4B/la 3eTFh8DWisre1YSmcw71RmxJ7iWpDp3uCeqXp0muCkwajbIep/tZYrZez+oogfbYcmoF KPqA== X-Gm-Message-State: AOAM530rmUWN9BZzygs+7HPnMngCVAN5G0VBnNPag9H5swz7yNJYZY2k 72YkymJ6hssDSeUQuzZ+6Y1KFLlQkDNoaGAj X-Google-Smtp-Source: ABdhPJx8bIObmDKpvbNwnAuYC4EhKRmisXWXmWWfvp8jdlhjyh5zI0IlvuuWJN9qJGmRu8R3ZlVCgw== X-Received: by 2002:a05:6830:1551:: with SMTP id l17mr14716154otp.279.1607385947425; Mon, 07 Dec 2020 16:05:47 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id u10sm3542348oig.54.2020.12.07.16.05.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:46 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:44 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'find_objects()' currently needs to interact with the bitmaps khash pretty closely. To make 'find_objects()' read a little more straightforwardly, remove some of the khash-level details into a new function that describes what it does: 'add_commit_to_bitmap()'. Signed-off-by: Taylor Blau --- pack-bitmap.c | 36 +++++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 5efb8af121..d88745fb02 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -521,6 +521,23 @@ static int should_include(struct commit *commit, void *_data) return 1; } +static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, + struct bitmap **base, + struct commit *commit) +{ + struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit); + + if (!or_with) + return 0; + + if (*base == NULL) + *base = ewah_to_bitmap(or_with); + else + bitmap_or_ewah(*base, or_with); + + return 1; +} + static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct rev_info *revs, struct object_list *roots, @@ -544,21 +561,10 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct object *object = roots->item; roots = roots->next; - if (object->type == OBJ_COMMIT) { - khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *or_with = lookup_stored_bitmap(st); - - if (base == NULL) - base = ewah_to_bitmap(or_with); - else - bitmap_or_ewah(base, or_with); - - object->flags |= SEEN; - continue; - } + if (object->type == OBJ_COMMIT && + add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) { + object->flags |= SEEN; + continue; } object_list_insert(object, ¬_mapped); From patchwork Tue Dec 8 00:05:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957169 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91CD7C4167B for ; Tue, 8 Dec 2020 00:06:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6151D23A04 for ; Tue, 8 Dec 2020 00:06:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728596AbgLHAGP (ORCPT ); Mon, 7 Dec 2020 19:06:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728460AbgLHAGN (ORCPT ); Mon, 7 Dec 2020 19:06:13 -0500 Received: from mail-ot1-x342.google.com (mail-ot1-x342.google.com [IPv6:2607:f8b0:4864:20::342]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 741B0C061749 for ; Mon, 7 Dec 2020 16:05:52 -0800 (PST) Received: by mail-ot1-x342.google.com with SMTP id f16so14285448otl.11 for ; Mon, 07 Dec 2020 16:05:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=g8jjZe/eZNNaHHg17+lxlVYaOvNcp9G5BR/nfwQZ4yI=; b=tTwv5usyX1pfDBiWgFg8niTWEnnEQeWW4Cw5Og6rtQH5JE+m+auiwl1gq1E8lWG5Hb S9IrsMiH8vnHanxbtN6rJm2a0tcOIGlLtxqXFYZFQQW3A7lipV6s4xR9MuCPWkU11Efz mk05F969rcqCUwLszwYEEYcUB5WEfv7ekv3l25kR6ywbuXDX2bmZE7WZlLmxhUzi96RV Uy4flrfp5mKVcNk/45mvwLFchMmI78Mo1ferzd+1qD0XgTAO2tiG2HE/B6lOmJWpJ3BY 38VapbkvlvuCR1bHB+J56PyGsgmb6BeI6R7YeXm8MJ1s127CRsnNmylVyRhV8z7rpd5S MPgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=g8jjZe/eZNNaHHg17+lxlVYaOvNcp9G5BR/nfwQZ4yI=; b=QfK2+7LpDj5o0MqveFx3K1pzHjYnbZGGrdB0de6q0bPY7U3QZ9Q4rCMp4xaX/tPfY8 nzI3Ygk+bLQuV004YQhjoKJbYHUBie5EKX8uEKbVO+C6ex6VpfhX/AmtZnKboCLe8K39 bqXsCHAgJ4y32qlClZtFh2NTHhYCfbSGkotLCybtfPvnkAY7wocPOqWOHp6zopUmZFne 1Nn0skf7nXRHY2XfAlCNbW6A+hF+2McT6U696zy5tKKCTvqlf3i+nQxNFL0Xvs5ESLbu +UlqGtBMVUX1B0BqVUwl+xJp+26Y1PzBWU+i5InCndv5z9BIo11wpNgfUFM/0H8Z2png CWIw== X-Gm-Message-State: AOAM532WH5HuP7I0lGzFyjud8qYP9QWWOY4TYtvUgC0HcsJ/d5jrS9iX tRzaem918EHUpn3rq4ezFkZ+yxFpr9q6VAvo X-Google-Smtp-Source: ABdhPJywnavjtJpfJYpherdLJOYaNwZiQ34N2ZE+TgZfIOEsiRueOMs12+js9aArafQulMWKP8UEog== X-Received: by 2002:a9d:1725:: with SMTP id i37mr14869746ota.258.1607385951588; Mon, 07 Dec 2020 16:05:51 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id k3sm1709878oor.19.2020.12.07.16.05.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:51 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:49 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 22/24] pack-bitmap-write: use existing bitmaps Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee When constructing new bitmaps, we perform a commit and tree walk in fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit from using existing bitmaps when available. We must track the existing bitmaps and translate them into the new object order, but this is generally faster than parsing trees. In fill_bitmap_commit(), we must reorder thing somewhat. The priority queue walks commits from newest-to-oldest, which means we correctly stop walking when reaching a commit with a bitmap. However, if we walk trees interleaved with the commits, then we might be parsing trees that are actually part of a re-used bitmap. To avoid over-walking trees, add them to a LIFO queue and walk them after exploring commits completely. On git.git, this reduces a second immediate bitmap computation from 2.0s to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork network, we go from 227s to 198s. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 40 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 333058854d..76c8236f94 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -340,20 +340,37 @@ static void fill_bitmap_tree(struct bitmap *bitmap, static void fill_bitmap_commit(struct bb_commit *ent, struct commit *commit, - struct prio_queue *queue) + struct prio_queue *queue, + struct prio_queue *tree_queue, + struct bitmap_index *old_bitmap, + const uint32_t *mapping) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); prio_queue_put(queue, commit); while (queue->nr) { struct commit_list *p; struct commit *c = prio_queue_get(queue); + if (old_bitmap && mapping) { + struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c); + /* + * If this commit has an old bitmap, then translate that + * bitmap and add its bits to this one. No need to walk + * parents or the tree for this commit. + */ + if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) + continue; + } + + /* + * Mark ourselves and queue our tree. The commit + * walk ensures we cover all parents. + */ bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + prio_queue_put(tree_queue, get_commit_tree(c)); for (p = c->parents; p; p = p->next) { int pos = find_object_pos(&p->item->object.oid); @@ -363,6 +380,9 @@ static void fill_bitmap_commit(struct bb_commit *ent, } } } + + while (tree_queue->nr) + fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue)); } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -386,6 +406,9 @@ void bitmap_writer_build(struct packing_data *to_pack) size_t i; int nr_stored = 0; /* for progress */ struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + struct prio_queue tree_queue = { NULL }; + struct bitmap_index *old_bitmap; + uint32_t *mapping; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -395,6 +418,12 @@ void bitmap_writer_build(struct packing_data *to_pack) trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", the_repository); + old_bitmap = prepare_bitmap_git(to_pack->repo); + if (old_bitmap) + mapping = create_bitmap_mapping(old_bitmap, to_pack); + else + mapping = NULL; + bitmap_builder_init(&bb, &writer); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; @@ -402,7 +431,8 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit, &queue); + fill_bitmap_commit(ent, commit, &queue, &tree_queue, + old_bitmap, mapping); if (ent->selected) { store_selected(ent, commit); @@ -428,7 +458,9 @@ void bitmap_writer_build(struct packing_data *to_pack) ent->bitmap = NULL; } clear_prio_queue(&queue); + clear_prio_queue(&tree_queue); bitmap_builder_clear(&bb); + free(mapping); trace2_region_leave("pack-bitmap-write", "building_bitmaps_total", the_repository); From patchwork Tue Dec 8 00:05:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957179 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E700C433FE for ; Tue, 8 Dec 2020 00:06:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4DAD823A1D for ; Tue, 8 Dec 2020 00:06:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728300AbgLHAGh (ORCPT ); Mon, 7 Dec 2020 19:06:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728436AbgLHAGh (ORCPT ); Mon, 7 Dec 2020 19:06:37 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF428C06179C for ; Mon, 7 Dec 2020 16:05:56 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id f132so3077257oib.12 for ; Mon, 07 Dec 2020 16:05:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=FW9w8ap8XK7AB9uKdNqbrlGQyj6sw7ozjIgT4+fQKdE=; b=FTz/3u2iFym66aTf5UJWV9/OCFPfH4I9QazZRun+kJ4CuuEdMFha9ufEJMPcgwKtmx cB2qutUYQhMX3dT+qzSgMidGPzzNDsZFSzLR076NEE2tXuZ4cDkyM8D4Ip7nbddQeRB9 T4CEK0Iwil2ScLfWdE0woWaInlCX4IrLIxkXoiXh+U9yWknNLCONcy2zbHCJ4tmapRD4 ETpYK5VwORdhaXfeielQHmEoNF0kvIV0wEBSFCwFUPmEYW4UQX9q3q4OwzV41JqhEPtS i62zcwKVnUeqNyLWQZm7hx+nBHlxOnCfnBauq1chUpzPjsKF/jyraVsVBA8EkU67hUe1 8hXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=FW9w8ap8XK7AB9uKdNqbrlGQyj6sw7ozjIgT4+fQKdE=; b=WCZhkYtE0OgsOIMz+siU8x9SXzqTZeYgyAbzh7ENDKHQcoMwitAW5xjJdLY5oXShGa nReA3JVyTmT+KuFSfnEM/PzBhGokl0Vy6Iee+f/xBkaQHZnppLoUkaeBYwCc2oV/JjNp 94rNOd7tdSnk/ar3/sewtU7k45DlVjOkiq2306aOOj0T4A8PeAqIMsDzMMqC61OOmWO/ TAvF3Ehf4L+KZ6FNk2ft8J5VrVINbue9eueJYD7ctnMWt/z8GIuNkBq2Roag2TZHr1l7 tbRkEbUiGC/F7R5ZNF0+Nu+uJC4hijOb4XQ12iFr54R+Y6/azAr9cvOc2Vl9rE+ilG08 KIjA== X-Gm-Message-State: AOAM532xYgcPeZde7fkBGNjU8oZOKqu3lKQ4xxVMJnSBqwlWRPJc9eIQ FsDvFrCIPwQlCpN18LT/9LBPDv0urQ0CbNiB X-Google-Smtp-Source: ABdhPJyj2TY2Zov9lMgpoYEa3vsv6QDmY4QSTvcrm9Old3RKplQz23KBWTvLAT4FXt4kRFFp783Osw== X-Received: by 2002:aca:1b0a:: with SMTP id b10mr940610oib.9.1607385955821; Mon, 07 Dec 2020 16:05:55 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id t19sm1093917otp.36.2020.12.07.16.05.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:05:55 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:53 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 23/24] pack-bitmap-write: relax unique rewalk condition Message-ID: <50d2031debdc8e5fb627851738d2a5b0c4d75266.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The previous commits improved the bitmap computation process for very long, linear histories with many refs by removing quadratic growth in how many objects were walked. The strategy of computing "intermediate commits" using bitmasks for which refs can reach those commits partitioned the poset of reachable objects so each part could be walked exactly once. This was effective for linear histories. However, there was a (significant) drawback: wide histories with many refs had an explosion of memory costs to compute the commit bitmasks during the exploration that discovers these intermediate commits. Since these wide histories are unlikely to repeat walking objects, the benefit of walking objects multiple times was not expensive before. But now, the commit walk *before computing bitmaps* is incredibly expensive. In an effort to discover a happy medium, this change reduces the walk for intermediate commits to only the first-parent history. This focuses the walk on how the histories converge, which still has significant reduction in repeat object walks. It is still possible to create quadratic behavior in this version, but it is probably less likely in realistic data shapes. Here is some data taken on a fresh clone of the kernel: | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- original | 64.044 | 83.241 | 2.088 | 2.194 | last patch | 45.049 | 37.624 | 2.267 | 2.334 | this patch | 88.478 | 53.218 | 2.157 | 2.224 | Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 14 +++++--------- t/t5310-pack-bitmaps.sh | 27 ++++++++++++++------------- 2 files changed, 19 insertions(+), 22 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 76c8236f94..d2af4a974f 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -199,7 +199,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i, num_maximal; + unsigned int i, num_maximal = 0; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -207,6 +207,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, reset_revision_walk(); repo_init_revisions(writer->to_pack->repo, &revs, NULL); revs.topo_order = 1; + revs.first_parent_only = 1; for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; @@ -221,13 +222,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, add_pending_object(&revs, &c->object, ""); } - num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { - struct commit_list *p; + struct commit_list *p = commit->parents; struct bb_commit *c_ent; parse_commit_or_die(commit); @@ -235,16 +235,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); if (c_ent->maximal) { - if (!c_ent->selected) { - bitmap_set(c_ent->commit_mask, num_maximal); - num_maximal++; - } - + num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); bb->commits[bb->commits_nr++] = commit; } - for (p = commit->parents; p; p = p->next) { + if (p) { struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); int c_not_p, p_not_c; diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 4c928221be..332af446a8 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -43,23 +43,24 @@ has_any () { # \| # * (base) # +# We only push bits down the first-parent history, which +# makes some of these commits unimportant! +# # The important part for the maximal commit algorithm is how # the bitmasks are extended. Assuming starting bit positions -# for master (bit 0) and other (bit 1), and some flexibility -# in the order that merge bases are visited, the bitmasks at -# the end should be: +# for master (bit 0) and other (bit 1), the bitmasks at the +# end should be: # # master: 1 (maximal, selected) # other: 01 (maximal, selected) -# octo-master: 1 -# octo-other: 01 -# merge-right: 111 (maximal) -# (l1): 111 -# (r1): 111 -# merge-left: 1101 (maximal) -# (l2): 11111 (maximal) -# (r2): 111101 (maximal) -# (base): 1111111 (maximal) +# (base): 11 (maximal) +# +# This complicated history was important for a previous +# version of the walk that guarantees never walking a +# commit multiple times. That goal might be important +# again, so preserve this complicated case. For now, this +# test will guarantee that the bitmaps are computed +# correctly, even with the repeat calculations. test_expect_success 'setup repo with moderate-sized history' ' test_commit_bulk --id=file 10 && @@ -113,7 +114,7 @@ test_expect_success 'full repack creates bitmaps' ' ls .git/objects/pack/ | grep bitmap >output && test_line_count = 1 output && grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && - grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace + grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' From patchwork Tue Dec 8 00:05:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11957181 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7175DC433FE for ; Tue, 8 Dec 2020 00:06:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 399B623A1D for ; Tue, 8 Dec 2020 00:06:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728436AbgLHAGn (ORCPT ); Mon, 7 Dec 2020 19:06:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726918AbgLHAGm (ORCPT ); Mon, 7 Dec 2020 19:06:42 -0500 Received: from mail-oi1-x244.google.com (mail-oi1-x244.google.com [IPv6:2607:f8b0:4864:20::244]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3DEBFC0617B0 for ; Mon, 7 Dec 2020 16:06:02 -0800 (PST) Received: by mail-oi1-x244.google.com with SMTP id f132so3077617oib.12 for ; Mon, 07 Dec 2020 16:06:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=QL9l0cHKGO73C4wCfiM9Q7odgpoKI5hiF7YgL6COI5Y=; b=cMpmnObVKIJ5aFxX9S7oC7iG7UyeYj1Rhq/kjQfqmrczV24W/b3Z3csrKlwXAOPw3V cfzUuGLp18LQvG0i1RdU5dImsHblyIEPZSnKAWFK3MqNO0yyt6OJ93vaKywMtG3TthEb heg5um8d2tNPh43JLDquM7itHpHjX9mrrGB2bdomNtUPnG6+3MJp62PbPZKgM6Clonm6 deT0MmycZ6hFq55+Kzbf2nxYKiVUC7QrPbOdu8JK3XBK/ysoVortczJrRJbpXMHaAMMu FxfvSzDhYS1Hv2+aGVx/xhofsZpsBddm6HSH15zHuyyAw1JkI0alvSlipkGiQLsph2V9 UscQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=QL9l0cHKGO73C4wCfiM9Q7odgpoKI5hiF7YgL6COI5Y=; b=Kcc4hDQuNigs50qc1jReBuTFkKIFRId2NtVxSlDRKT5ybpOW/VHZnB87lzrgpRCR0r cKDid1lMzXQ0UCJqEYgHh6Bn729pivKg1CEtpv2jMZPb5C36WHW7Yaztpf6V61qy3eaq qRq9/BYgkbWtOL4sooiHMGsupwrXYm3GiP0ZREqReHcIHT76lq1sFWVud5viz2b1Tqar gBt/RIaAyt0Y41TbuCN6YfiuPHqKyOoG6SxXeeXiNO6Rvi1dzwHuliaTXvnYcVFkCthz 6/GUNE7ygWvQ9/WsAFDNr5r9kx2yWj6Y84/fa/sCd8T7Ub6LxIeTCH/9WwpDY9wzI+4h 9yrw== X-Gm-Message-State: AOAM531n5fjw47F+vvpHhksuVmjXZb7OF2ut+XIz80+d1up1j3cVjHJf ZtM1vyFHkycwR85VedIVWAdzFaY3vAN3cCnI X-Google-Smtp-Source: ABdhPJxhBLT0Mp7SzN6bbhxuriiwosv0/rxSVN6rWLLjPilzQFj7sp03CtcbJDZhJ4dPlQPtem3pFQ== X-Received: by 2002:a05:6808:b38:: with SMTP id t24mr911954oij.153.1607385961388; Mon, 07 Dec 2020 16:06:01 -0800 (PST) Received: from localhost ([8.44.146.30]) by smtp.gmail.com with ESMTPSA id j46sm3108071oof.36.2020.12.07.16.06.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Dec 2020 16:06:00 -0800 (PST) Date: Mon, 7 Dec 2020 19:05:57 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: peff@peff.net, jonathantanmy@google.com, dstolee@microsoft.com, gitster@pobox.com Subject: [PATCH v3 24/24] pack-bitmap-write: better reuse bitmaps Message-ID: <6b9950771e57bd89dadcddaee7f27269e6ded535.1607385833.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee If the old bitmap file contains a bitmap for a given commit, then that commit does not need help from intermediate commits in its history to compute its final bitmap. Eject that commit from the walk and insert it into a separate list of reusable commits that are eventually stored in the list of commits for computing bitmaps. This helps the repeat bitmap computation task, even if the selected commits shift drastically. This helps when a previously-bitmapped commit exists in the first-parent history of a newly-selected commit. Since we stop the walk at these commits and we use a first-parent walk, it is harder to walk "around" these bitmapped commits. It's not impossible, but we can greatly reduce the computation time for many selected commits. | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- last patch | 88.478 | 53.218 | 2.157 | 2.224 | this patch | 86.681 | 16.164 | 2.157 | 2.222 | Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 40 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index d2af4a974f..cc5ead9990 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -195,10 +195,13 @@ struct bitmap_builder { }; static void bitmap_builder_init(struct bitmap_builder *bb, - struct bitmap_writer *writer) + struct bitmap_writer *writer, + struct bitmap_index *old_bitmap) { struct rev_info revs; struct commit *commit; + struct commit_list *reusable = NULL; + struct commit_list *r; unsigned int i, num_maximal = 0; memset(bb, 0, sizeof(*bb)); @@ -234,6 +237,31 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); + /* + * If there is no commit_mask, there is no reason to iterate + * over this commit; it is not selected (if it were, it would + * not have a blank commit mask) and all its children have + * existing bitmaps (see the comment starting with "This commit + * has an existing bitmap" below), so it does not contribute + * anything to the final bitmap file or its descendants. + */ + if (!c_ent->commit_mask) + continue; + + if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) { + /* + * This commit has an existing bitmap, so we can + * get its bits immediately without an object + * walk. That is, it is reusable as-is and there is no + * need to continue walking beyond it. + * + * Mark it as such and add it to bb->commits separately + * to avoid allocating a position in the commit mask. + */ + commit_list_insert(commit, &reusable); + goto next; + } + if (c_ent->maximal) { num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); @@ -278,14 +306,22 @@ static void bitmap_builder_init(struct bitmap_builder *bb, } } +next: bitmap_free(c_ent->commit_mask); c_ent->commit_mask = NULL; } + for (r = reusable; r; r = r->next) { + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = r->item; + } + trace2_data_intmax("pack-bitmap-write", the_repository, "num_selected_commits", writer->selected_nr); trace2_data_intmax("pack-bitmap-write", the_repository, "num_maximal_commits", num_maximal); + + free_commit_list(reusable); } static void bitmap_builder_clear(struct bitmap_builder *bb) @@ -420,7 +456,7 @@ void bitmap_writer_build(struct packing_data *to_pack) else mapping = NULL; - bitmap_builder_init(&bb, &writer); + bitmap_builder_init(&bb, &writer, old_bitmap); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit);