From patchwork Wed Nov 11 19:41:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898385 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24844C388F9 for ; Wed, 11 Nov 2020 19:41:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B002F2087D for ; Wed, 11 Nov 2020 19:41:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="lbMk2yiB" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727489AbgKKTlv (ORCPT ); Wed, 11 Nov 2020 14:41:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726479AbgKKTlv (ORCPT ); Wed, 11 Nov 2020 14:41:51 -0500 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE744C0613D1 for ; Wed, 11 Nov 2020 11:41:49 -0800 (PST) Received: by mail-qk1-x743.google.com with SMTP id h15so2860601qkl.13 for ; Wed, 11 Nov 2020 11:41:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=YfB1LW/vqaO3YmLzAXyzg2hBvw20Dcx/Uldtx6zwBcs=; b=lbMk2yiB6Wh6jx84tSQ3+nQrxJUTAXZgkntCbG2nJ8W03dyO1YavYDx86WWJj556ND ps6+QThGc05S3D2BpXh9yOFGmO9EMqyIDBHlxDVA0ef2s9EiMVGokHFbjML1WsYT0IFl HF7lHRdZw8S2O912W8tPl6r4oSB5F2ZQuLa3tofk9j3T6ti69z/bUOA1zqBFxTIFwMx4 077CmXXiKYgOeQgXerWP1ve4/on64Gi4OraF/jlxsUed4UgC4cPNeOaN/wWc74k5WcXk kC9hPwNTkMTsn0XFcDX9eNW/+7sEtKQ2zr/1D2lgnhcwYZMHRH9YHP6OHq1lFmSYNtDz 2brg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=YfB1LW/vqaO3YmLzAXyzg2hBvw20Dcx/Uldtx6zwBcs=; b=BZtETV7T5JUFkdWSxcR7R/3ltbbAjzsUhxkzTzkEuQ7jK1wE26PJjh7xK6j1K+cK0n RPE/Qjia73ZEtOHwk3qKmUAF0lOmIUYcxu1pXy3PX91CW3SYirYDud7g3KNcgpiYTGYb nC+Fl3BF8BQK3WIJdOSqzJmoA66q1ygUSuzQq4yRgDVwIwiFIA4DlVz0U9zg1tQzDPCL tMH9AN2oD+YMRlPA9x+yMyL7hETKk8Z9lVHo5H3Bj76vyaX3Q5YARKyteq62TGI0jdXT eRRj+4ODFBgLm2U+5EpxfZ30wX37wotzwzk040qRvmQENjSVqmWQHekQMHFrhPXxd7Pe buZQ== X-Gm-Message-State: AOAM5304co05Z1+Yy5tPEFcci2D5QuX7SkDSqm+EAMHyUULfSd2IMX6E JelN0m8RAYbmOQMVqh9OzNXOfH3gSoAdcsrN X-Google-Smtp-Source: ABdhPJyV6fAtAmZ3NJmp3zf3LtszvCzEBMWha3QfW2Hw4BjDYgTSc1PTE/ZTgBGzu6SCTb7/Pg6F2g== X-Received: by 2002:ae9:ea14:: with SMTP id f20mr25928195qkg.239.1605123708683; Wed, 11 Nov 2020 11:41:48 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id z125sm3098243qke.54.2020.11.11.11.41.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:41:48 -0800 (PST) Date: Wed, 11 Nov 2020 14:41:44 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 01/23] ewah/ewah_bitmap.c: grow buffer past 1 Message-ID: <36deaad366d66d10b96755dd6969bfe51123a2d4.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When the buffer size is exactly 1, we fail to grow it properly, since the integer truncation means that 1 * 3 / 2 = 1. This can cause a bad write on the line below. Bandaid this by first padding the buffer by 16, and then growing it. This still allows old blocks to fit into new ones, but fixes the case where the block size equals 1. Co-authored-by: Jeff King Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/ewah_bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c index d59b1afe3d..3fae04ad00 100644 --- a/ewah/ewah_bitmap.c +++ b/ewah/ewah_bitmap.c @@ -45,7 +45,7 @@ static inline void buffer_grow(struct ewah_bitmap *self, size_t new_size) static inline void buffer_push(struct ewah_bitmap *self, eword_t value) { if (self->buffer_size + 1 >= self->alloc_size) - buffer_grow(self, self->buffer_size * 3 / 2); + buffer_grow(self, (self->buffer_size + 16) * 3 / 2); self->buffer[self->buffer_size++] = value; } From patchwork Wed Nov 11 19:41:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E094C388F9 for ; Wed, 11 Nov 2020 19:42:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 46BE42087D for ; Wed, 11 Nov 2020 19:42:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="wLojPmCR" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727553AbgKKTl7 (ORCPT ); Wed, 11 Nov 2020 14:41:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54742 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727468AbgKKTl7 (ORCPT ); Wed, 11 Nov 2020 14:41:59 -0500 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E84F9C0613D1 for ; Wed, 11 Nov 2020 11:41:58 -0800 (PST) Received: by mail-qk1-x743.google.com with SMTP id t191so2916311qka.4 for ; Wed, 11 Nov 2020 11:41:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=iP/7a2VntiLlnAaVrnGYm87IoWVvfW65anp38JXV3Qw=; b=wLojPmCRqZhpgAq5eLmbAR6g3EoqvnLrTh8h+ROuP1LtyjyWdDq3cc+ZbQ0Ut1KbbP 2C6LsUHVQ5LD9ypL+d3yR9a6iM45dkcIFKfjWe44wijQt9ALuQXeidRaYJILHw1ExqAX yb7rcu/mRKe3M+VKBwAHmULWYcNIC+GTgZqy5knN73uJP28EKN0OntLvqhyDhg2B063u 9LDxylTEoRwcgrCRkzMx7y149dA7S2ToLGBMNNXgyafO8f7Isf26wMSPjLYER/kSvjyL ku9rO6pwlw87v+gcnNq/ka0HF5qUbvQGgG4RhQPYBMk5rikqN4AZu6K029b7kvVDelcs H01w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=iP/7a2VntiLlnAaVrnGYm87IoWVvfW65anp38JXV3Qw=; b=DxzpW/XbzCv1wc/g4yEt8NR96aBplm7GRP+FX6Hm9W5co5EoADukPzGyUsbE/VFbfh RXJUkd6hHdiaPoQcmicmPENAG2dEE6p+Z45pB56AoJXwCWeKt8TBZ7GrwAko5l14qz37 seJUxVkEg3gArDGanISrQ2EyqdHFcED4uNmoEaiNiwxtgdrrBGu6Hc0IV2nbIVoJQ/Ld tH+puOhcMXsx4sTa/X7Z4HVnncOU5865nJSrL1OzldKPERZDiJklWi4Ob/Aq8oKIH4P8 mxMvzrTtLThxztqZ4Msp+JJ1/UUrxnSYvUW4rEr4RSGamiIG1FTHYMWFD44nT+TSRnNS 8qnQ== X-Gm-Message-State: AOAM533DU7nsc+ZPEWGVoHGSau8FPe3NDwO7y4yVSXe4xTaGRqxIF3hF MBQSwNVoFEqEKbe8v1/Km1CqFFUcs0j3C/Fd X-Google-Smtp-Source: ABdhPJzdAMMFdjTDyT18f1My43B8MMVz01HdNNRKnDevDXfuRk/XTXNPZAdKGfyaKLQiHStETtBRpQ== X-Received: by 2002:a37:98c7:: with SMTP id a190mr27174894qke.471.1605123717929; Wed, 11 Nov 2020 11:41:57 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id 9sm2814578qty.30.2020.11.11.11.41.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:41:57 -0800 (PST) Date: Wed, 11 Nov 2020 14:41:54 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 02/23] pack-bitmap: fix header size check Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King When we parse a .bitmap header, we first check that we have enough bytes to make a valid header. We do that based on sizeof(struct bitmap_disk_header). However, as of 0f4d6cada8 (pack-bitmap: make bitmap header handling hash agnostic, 2019-02-19), that struct oversizes its checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the difference between that constant and the size of the actual hash we're using. That commit adjusted the code which moves our pointer forward, but forgot to update the size check. This meant we were overly strict about the header size (requiring room for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in practice it didn't matter because bitmap files tend to have at least 12 bytes of actual data anyway, so it was unlikely for a valid file to be caught by this. Let's fix it by pulling the header size into a separate variable and using it in both spots. That fixes the bug and simplifies the code to make it harder to have a mismatch like this in the future. It will also come in handy in the next patch for more bounds checking. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 4077e731e8..cea3bb88bf 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -138,8 +138,9 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) static int load_bitmap_header(struct bitmap_index *index) { struct bitmap_disk_header *header = (void *)index->map; + size_t header_size = sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; - if (index->map_size < sizeof(*header) + the_hash_algo->rawsz) + if (index->map_size < header_size) return error("Corrupted bitmap index (missing header data)"); if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0) @@ -164,7 +165,7 @@ static int load_bitmap_header(struct bitmap_index *index) } index->entry_count = ntohl(header->entry_count); - index->map_pos += sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; + index->map_pos += header_size; return 0; } From patchwork Wed Nov 11 19:42:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 807DDC388F9 for ; Wed, 11 Nov 2020 19:42:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2E8702087D for ; Wed, 11 Nov 2020 19:42:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="cHQn3DX0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727645AbgKKTmG (ORCPT ); Wed, 11 Nov 2020 14:42:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726479AbgKKTmE (ORCPT ); Wed, 11 Nov 2020 14:42:04 -0500 Received: from mail-qt1-x835.google.com (mail-qt1-x835.google.com [IPv6:2607:f8b0:4864:20::835]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7ACC9C0613D1 for ; Wed, 11 Nov 2020 11:42:04 -0800 (PST) Received: by mail-qt1-x835.google.com with SMTP id i12so2204511qtj.0 for ; Wed, 11 Nov 2020 11:42:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=WJoyeKHouA+s3pojn9t5eLjcAQwqU/UHOR9tm2lp0R0=; b=cHQn3DX0DHvS1ULAOHu7p5dsQCBkQ6dPaUizKTo+z5czYs1x/O2nddTgeeyaGc7Hp/ n/PryO09EJdqZ2uTvtUIhaXNrxpb0VbfPUaI6lHuQHK4f5HRC/aV4GhYngvHGB5Z1nnN 1HTgYNSLU5d0ljeNYD1TAsAzpWPPmktN8OyeHumvEarpQoA40L18+9vOsxi2E/bcu8oy r63OGddljm8nzZzrVoPqWBniSeYtGfh9hAvDLLuUNFaAemzc5MVIj8fetb3Ezt+ezSu8 bzZvTCqllcC+hm9A5tMBl+GaC3+YqjrY9qJOecT8K+lT+gdxsF12N0p9f1znma1PIIRs f2VA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=WJoyeKHouA+s3pojn9t5eLjcAQwqU/UHOR9tm2lp0R0=; b=XVNBLyUlOeM2mbYBiqAB/hZMFJq3w5yejQValsvmuOXc1RJ+u3TpGcDnvJASCGuBdN qTDxkM47V8YjZFnuHq72S/l5L3Xf9YKARpkdL5c3bu8JGSqt6t+4a6p+PYNR10+ZrHSs gLeEsYv1aVB5WLsdsmVNAq2irb9JU8dGnS8XtcGFnN/9saAq0PVUl6xXjEJVyKCjj3lc NifExXm4s5FiZQltX6QaEd4hglalESvb1L17vzksdBRcqlb35tCF/NJ5K7XCkUlUEYiz WQfIDQxTxg4fHE9or2BpVn82eqMWbkwEZYdG782o1AXPuU30YCGiHC49L6TyGZj+Z7Zl MbDg== X-Gm-Message-State: AOAM530pwUFfSRqOqBsZKecG8Vjs+h9S+cPu3nVBPPwcc/Hp2Q5xdpFa cP47pQrgT0CFraQh2nnyVS/IwvBOYG26SDAj X-Google-Smtp-Source: ABdhPJwhCe/GcuJIFgxAhhSFk7gmNHSvSdO82uMjguQwhqUuDTbW4Dc74Le0KwQKcnhAL8mufW+xPg== X-Received: by 2002:a05:622a:c8:: with SMTP id p8mr5402929qtw.293.1605123723244; Wed, 11 Nov 2020 11:42:03 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id f130sm1774843qke.30.2020.11.11.11.42.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:42:02 -0800 (PST) Date: Wed, 11 Nov 2020 14:42:00 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 03/23] pack-bitmap: bounds-check size of cache extension Message-ID: <1573902df00e8a14a9cb68c37f55474388b1dc2e.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t bytes (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps. We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 8 ++++++-- t/t5310-pack-bitmaps.sh | 17 +++++++++++++++-- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index cea3bb88bf..42d4824c76 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -153,14 +153,18 @@ static int load_bitmap_header(struct bitmap_index *index) /* Parse known bitmap format options */ { uint32_t flags = ntohs(header->options); + uint32_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t)); + unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz; if ((flags & BITMAP_OPT_FULL_DAG) == 0) return error("Unsupported options for bitmap index file " "(Git requires BITMAP_OPT_FULL_DAG)"); if (flags & BITMAP_OPT_HASH_CACHE) { - unsigned char *end = index->map + index->map_size - the_hash_algo->rawsz; - index->hashes = ((uint32_t *)end) - index->pack->num_objects; + if (index->map + header_size + cache_size > index_end) + return error("corrupted bitmap index file (too short to fit hash cache)"); + index->hashes = (void *)(index_end - cache_size); + index_end -= cache_size; } } diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 8318781d2b..e2c3907a68 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -343,7 +343,8 @@ test_expect_success 'pack reuse respects --incremental' ' test_must_be_empty actual ' -test_expect_success 'truncated bitmap fails gracefully' ' +test_expect_success 'truncated bitmap fails gracefully (ewah)' ' + test_config pack.writebitmaphashcache false && git repack -ad && git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && @@ -352,7 +353,19 @@ test_expect_success 'truncated bitmap fails gracefully' ' mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && - test_i18ngrep corrupt stderr + test_i18ngrep corrupt.ewah.bitmap stderr +' + +test_expect_success 'truncated bitmap fails gracefully (cache)' ' + git repack -ad && + git rev-list --use-bitmap-index --count --all >expect && + bitmap=$(ls .git/objects/pack/*.bitmap) && + test_when_finished "rm -f $bitmap" && + test_copy_bytes 512 <$bitmap >$bitmap.tmp && + mv -f $bitmap.tmp $bitmap && + git rev-list --use-bitmap-index --count --all >actual 2>stderr && + test_cmp expect actual && + test_i18ngrep corrupted.bitmap.index stderr ' # have_delta From patchwork Wed Nov 11 19:42:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898393 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5344AC388F9 for ; Wed, 11 Nov 2020 19:42:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F1D9C207F7 for ; Wed, 11 Nov 2020 19:42:27 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="lVxk02Yy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727695AbgKKTmW (ORCPT ); Wed, 11 Nov 2020 14:42:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54780 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726479AbgKKTmO (ORCPT ); Wed, 11 Nov 2020 14:42:14 -0500 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C85D5C0613D1 for ; Wed, 11 Nov 2020 11:42:12 -0800 (PST) Received: by mail-qk1-x742.google.com with SMTP id 11so2911877qkd.5 for ; Wed, 11 Nov 2020 11:42:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=aGJpchHTDY1Z4e1hNCHdORMrTmXRdchiVOmu8aKmyl8=; b=lVxk02Yy+NKT7FMca74ajqDAEzMtXzbmen+gMQEWPLSYBqQP1BAVrXmnStrBHeB1Rf PKF11tfx690Ti5ulvLnkj4g6NeuALUf+i6QEVXqyRBg71pop7FCUf+rvmP0FdntR75TR ZA2tiUNBlt8k/eHVQJMz1WK21Z5COCYUx1L4NsLbsShamtGAtWgDaWWolkaIKQYZ5sTE 5rUgMa6jRLNgXBL23X39Cl04fm8MWZC9WqCvquO72hUN/RKRII72UWoNcULldeEH++H2 hjuAzB2dM7CEcBnmPuTEXGLdG6xiYt9tm2cSlto+VaCzLbDVh+L6eFZqSU9c13BG7YUs BH/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=aGJpchHTDY1Z4e1hNCHdORMrTmXRdchiVOmu8aKmyl8=; b=I8tmfDDi8P8fIl5X/Jz7dbyz2gpSoLIJ2YDqWIR4L90r6gc0cyltt1w+Dr1VF6yxxs +NsP+8ZU0WqFRtqHVc7yZVVMbVSdlmT7sFDQ9Er+EMHC2nA/NSPavYF1tsW1n5lVqYMx p38GKP2ma2WZwJ+1ryyk8Q3HxoQKQVLwMbCG/c7G51omoZWWF2TH+IEWIKevOuB4QckM I4UDQ6eouQoM7RRcN8Dx9EHrRcaDj6th//X9JmFbf0kLSquvx76s2OGMbJgGEUan2Ov6 5rQ89i3k09vcwT9FA5PIwW0RAgJ67Dl1NU7i+DCyVufEde3YMmc2Fg9uImhLNeldksoC IMiQ== X-Gm-Message-State: AOAM533F4b12XqcxTOtgi3KG5hVpQe1Ft22IIC3LevUgJKxyELdGGrER Za9dDQbIoLQWR3aERpW09DGuJBctubnd9J5p X-Google-Smtp-Source: ABdhPJwQamuw6LB0D5Hjzsa/K1OfPllbnphfN2Z0Sp6lUzrhl0JYwf0sRUqWYMRcBRXAdOUnqTFyrw== X-Received: by 2002:a37:f517:: with SMTP id l23mr14268528qkk.39.1605123731701; Wed, 11 Nov 2020 11:42:11 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id z20sm3330282qtb.31.2020.11.11.11.42.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:42:11 -0800 (PST) Date: Wed, 11 Nov 2020 14:42:08 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 04/23] t5310: drop size of truncated ewah bitmap Message-ID: <9db61a254af0f8369872e0ef91cbc38acfb079d0.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We truncate the .bitmap file to 512 bytes and expect to run into problems reading an individual ewah file. But this length is somewhat arbitrary, and just happened to work when the test was added in 9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14). An upcoming commit will change the size of the history we create in the test repo, which will cause this test to fail. We can future-proof it a bit more by reducing the size of the truncated bitmap file. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index e2c3907a68..70a4fc4843 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -349,7 +349,7 @@ test_expect_success 'truncated bitmap fails gracefully (ewah)' ' git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && test_when_finished "rm -f $bitmap" && - test_copy_bytes 512 <$bitmap >$bitmap.tmp && + test_copy_bytes 256 <$bitmap >$bitmap.tmp && mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && From patchwork Wed Nov 11 19:42:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95841C388F9 for ; Wed, 11 Nov 2020 19:42:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2A39D2087D for ; Wed, 11 Nov 2020 19:42:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="jYc8mMMn" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727753AbgKKTmu (ORCPT ); Wed, 11 Nov 2020 14:42:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54876 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727684AbgKKTmu (ORCPT ); Wed, 11 Nov 2020 14:42:50 -0500 Received: from mail-qv1-xf42.google.com (mail-qv1-xf42.google.com [IPv6:2607:f8b0:4864:20::f42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF050C0613D1 for ; Wed, 11 Nov 2020 11:42:48 -0800 (PST) Received: by mail-qv1-xf42.google.com with SMTP id 63so1515857qva.7 for ; Wed, 11 Nov 2020 11:42:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=t9OaGPAEVEXhVY/MRONUct124SfceUrdt8cXlZu/fug=; b=jYc8mMMn6v8/RlugAn4Hb6w7bIx+RSj+MnPEaCxKZpOlvWfu8mn3Ft9iHqkVqc2aCp 4iMqxX7f5JwMxnu4I1Je0A1cHH4QII6lvVfBcID+ik9l8ISwuemqffvdjQXQzwod6GgI 1T8NMVwMC+noZXsihwOB7Q0nngTTd019kn+6piKUUoMBouguF5wiAqe4z1spxfJWn0v7 M/LE5k8Mr3i4X75xHsS2mh/pN7X3iOZ1Th4oAtX7+yP0wqSQvcZ6/70K65uM6MpHv82V MUMgZVkrl3rQQUcQZXRUp9IYtEz+oXvzaKTuB8ozog4WRjna0IX2N4PPK/924HbfCYIy KWiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=t9OaGPAEVEXhVY/MRONUct124SfceUrdt8cXlZu/fug=; b=qjljlnIG/lArvgc95dYH3qMkDQzzpMRYKzjuwHBuFukAhIEPdcWgeOQ/r65GSCA253 ZNTQNzdUX9O/aqw1mnwGVUfWDK2ylzp7frsmd4Ys/UFhR3hshHFyZREqXVXlaZheRFWm NDUC2XqIlVTDrf7qh3dRe+E6PZnwn6LMFEombSemlYNMqGkqbjBnxUejhiLDKH9bQT5w 4EdaKiRXS072S9YPka1V8NJv1uAfjYk1qXBczWFfAL9aOmAvp4aTwJSk2K3D404P5e4s Qef0UcS/dozjRo24bD9N/rJGy3Vmu8X5Z1GXwum70EC6dcvs0/q5WEIroOFttBLsyiRI CZjg== X-Gm-Message-State: AOAM531O39RznbLr/fV5xkLjHzxnsXuqXr7yzqAtH2HnBxSlFjhLrAnA NQXGFWLuvIv4SLUzBbOq2eRITiSgH5LuyvFg X-Google-Smtp-Source: ABdhPJxTYHkAQlPNJn+jFlLI1mW3pG79kki8NDREePHwltyRRyQIk/nWwhPRRvQq5AgJU5N/IuWMeA== X-Received: by 2002:a05:6214:174f:: with SMTP id dc15mr26187514qvb.25.1605123767606; Wed, 11 Nov 2020 11:42:47 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id n187sm3255652qkc.133.2020.11.11.11.42.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:42:47 -0800 (PST) Date: Wed, 11 Nov 2020 14:42:44 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 05/23] rev-list: die when --test-bitmap detects a mismatch Message-ID: <37af96311503c3515ebec8d6215af70c48337d40.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King You can use "git rev-list --test-bitmap HEAD" to check that bitmaps produce the same answer we'd get from a regular traversal. But if we detect an error, we only print "mismatch", and still exit with a successful error code. That makes the uses of --test-bitmap in the test suite (e.g., in t5310) mostly pointless: even if we saw an error, the tests wouldn't notice. Let's instead call die(), which will let these tests work as designed, and alert us if the bitmaps are bogus. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 42d4824c76..82c6bf2843 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1328,7 +1328,7 @@ void test_bitmap_walk(struct rev_info *revs) if (bitmap_equals(result, tdata.base)) fprintf(stderr, "OK!\n"); else - fprintf(stderr, "Mismatch!\n"); + die("mismatch in bitmap results"); free_bitmap_index(bitmap_git); } From patchwork Wed Nov 11 19:42:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1E9F4C5517A for ; Wed, 11 Nov 2020 19:43:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B54BC207F7 for ; Wed, 11 Nov 2020 19:42:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="INuZNxNN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727776AbgKKTm5 (ORCPT ); Wed, 11 Nov 2020 14:42:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727767AbgKKTmz (ORCPT ); Wed, 11 Nov 2020 14:42:55 -0500 Received: from mail-qv1-xf43.google.com (mail-qv1-xf43.google.com [IPv6:2607:f8b0:4864:20::f43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C4FBC0613D1 for ; Wed, 11 Nov 2020 11:42:55 -0800 (PST) Received: by mail-qv1-xf43.google.com with SMTP id 63so1516052qva.7 for ; Wed, 11 Nov 2020 11:42:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=/v80Nm5n3Jy6++VHoC545Uu99XrlcvvkxM1yto8kTBg=; b=INuZNxNNYWL7SXAvMvIjwyJFupyUR2dSclLHHs6xfFMqLAj8z5v/MlNcf9o7IU3AWS VO1SroZcLick9gwd5QzsHsL+YY5+IKsXhLTbEKRbpnOWYTHnWHirJvJvV858eMBzMJKZ z04BIB3IqrUjt2FFaI3BhyXE5gC2bt2UsTdUh1IET4ecnILGGYLJ/+tHTNrdXv+4ZE1Q vNMM9DBnzvLFHhbeVZCNb2V1FqSKYFQMpfSmS+Rf1rm841TkN3ihS+9/pQnvx2lvac5f dzyzs/OAN9NP3smuwRbRrbp7LrOafqRFyvpaopk1fmf7YHye0BkBLl5nABoP79hs1ahi D3RA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=/v80Nm5n3Jy6++VHoC545Uu99XrlcvvkxM1yto8kTBg=; b=ds8G0KGjjYj+01X36RmzJuI5HQGAOkMDtrPIWwnc0tehyDnt+SL4jRmwfeh17/8k8+ LNxTMYKsjlzFa0f9KNXLwXYlrjyIFoxnPYF27UPrqTLk2yVGJeQ4lyO98TqHIuHcFh5d jU3UExWMYnWWEVf2Ar1Ot18B/bCEBIBvYPhYq/8cx20aQETzQa3QhDehb5ipT8MPw6nU IBBbiBU89V6zggRtoWOdvdWI1s9ITPTReh5nuJxpvDDuSJFEsMaVhNwqfvphLvO554B9 +y+uLwKiHkKNxGv3EuusVtPuAtW1VN3I9xuVps91n/ch6kk7MRmTaDen3e25yHvxeY63 zXxw== X-Gm-Message-State: AOAM53087AOE81YL5HsUU/0mzP0RDKSMbQUA6JkTm57qNnFLCH1EPgDQ pABr+edtUWxFkG5LJPg6RyxoGBhgEiHici9v X-Google-Smtp-Source: ABdhPJzXekRgV4s1N1YHxZOyNLhZv4eH9yvv4LwGfnoeVJzkuiJ6L4vW02bdaRGNY2kVQDBP9ks9PA== X-Received: by 2002:a0c:ba85:: with SMTP id x5mr26323534qvf.7.1605123774485; Wed, 11 Nov 2020 11:42:54 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id g9sm3051472qtq.21.2020.11.11.11.42.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:42:53 -0800 (PST) Date: Wed, 11 Nov 2020 14:42:51 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 06/23] ewah: factor out bitmap growth Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We auto-grow bitmaps when somebody asks to set a bit whose position is outside of our currently allocated range. Other operations besides single bit-setting might need to do this, too, so let's pull it into its own function. Note that we change the semantics a little: you now ask for the number of words you'd like to have, not the id of the block you'd like to write to. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index d8cec585af..7c1ecfa6fd 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,18 +35,22 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } -void bitmap_set(struct bitmap *self, size_t pos) +static void bitmap_grow(struct bitmap *self, size_t word_alloc) { - size_t block = EWAH_BLOCK(pos); - - if (block >= self->word_alloc) { + if (word_alloc > self->word_alloc) { size_t old_size = self->word_alloc; - self->word_alloc = block ? block * 2 : 1; + self->word_alloc = word_alloc * 2; REALLOC_ARRAY(self->words, self->word_alloc); memset(self->words + old_size, 0x0, (self->word_alloc - old_size) * sizeof(eword_t)); } +} +void bitmap_set(struct bitmap *self, size_t pos) +{ + size_t block = EWAH_BLOCK(pos); + + bitmap_grow(self, block + 1); self->words[block] |= EWAH_MASK(pos); } From patchwork Wed Nov 11 19:42:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93FB5C388F9 for ; Wed, 11 Nov 2020 19:43:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 43C0A208B8 for ; Wed, 11 Nov 2020 19:43:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Y8E05TL6" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727784AbgKKTnD (ORCPT ); Wed, 11 Nov 2020 14:43:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54916 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727684AbgKKTnC (ORCPT ); Wed, 11 Nov 2020 14:43:02 -0500 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6833FC0613D1 for ; Wed, 11 Nov 2020 11:43:02 -0800 (PST) Received: by mail-qv1-xf44.google.com with SMTP id 13so1516621qvr.5 for ; Wed, 11 Nov 2020 11:43:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=2IwwukICgO2aonhDeOrRN3bUyQ69oFdrn20lHy8bdOM=; b=Y8E05TL63EnuYw2AZg3D++DoSnlsslHZ97X+cCr0IQPxMITHCpQ2rJEtWop9JrvfXW qxQ3jSORhZMc9/FgVgnnkctwVHz6hwVBvAlvf0agzsZf7lBNGkxkhKRExM3jaO8NhzBb qE4SjSC+V3329EvtKmZpilFPE09MYkCwqfr3Ni13QBj2Tgp7B+I4ZrSfCAZ/KzykuveE J1F2iPWeg65KRjmjN59bV1qrxUrImWr/zLdXDNOKwqA0J4FQ4+Z+gbTkYQo7Q3QVWllF TXQxpv/Hmt2KhGoOI7gOnzbs4va47FQ7WVTvnO9vAyW9G1zyJAy/qIKm46to2+zlTTyq asgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=2IwwukICgO2aonhDeOrRN3bUyQ69oFdrn20lHy8bdOM=; b=e49drNO9QjAqIqouKFX+PsJd6tGmGE/McgiG9ldb8gJUE7eyye60tp/pR8v/0JpLLU GxN1Xmm7ROByQV0UcRn4rkkIc8fh9/BAtnRTq+Ps/r91iLe14KqmvWmz7x8Bz5yuwl2S h0aJUHyPNwopah+dleVgBDXLm2RG+T9KUV/poJI4CbmbbAb+nhfR2zIK6ufb3ZlC4qZs /VNQmvbu3DqaEHw2Ld+QXGez/7kRgZn4vqMP5Z6IfvivtEXRsuDibetiKm+OBDHmKTUe R0B8aGDsIspALvP7sn5Px1ZxOE9rmBLp9FwnLuiouYuYgtMbMhAwLf467goLhcIrnWMJ U+2w== X-Gm-Message-State: AOAM533DYvV7zYA02UlS1br1U8tOykiZu1wFBS/lpvjiRE5kV19VnhcF c3YrJBi8mNq0i0XeXQkwa05ucAsfyt5lfLwD X-Google-Smtp-Source: ABdhPJz0iC2NnBs5hJEjtaGzTtWE93azhJM8UQqsIGs0mMvxp4QxrGLJXZUvLcM7DAhf+UPzzUsZyQ== X-Received: by 2002:a05:6214:386:: with SMTP id l6mr2437700qvy.49.1605123781334; Wed, 11 Nov 2020 11:43:01 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id p187sm2919913qkf.70.2020.11.11.11.43.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:00 -0800 (PST) Date: Wed, 11 Nov 2020 14:42:58 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 07/23] ewah: make bitmap growth less aggressive Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King If you ask to set a bit in the Nth word and we haven't yet allocated that many slots in our array, we'll increase the bitmap size to 2*N. This means we might frequently end up with bitmaps that are twice the necessary size (as soon as you ask for the biggest bit, we'll size up to twice that). But if we just allocate as many words as were asked for, we may not grow fast enough. The worst case there is setting bit 0, then 1, etc. Each time we grow we'd just extend by one more word, giving us linear reallocations (and quadratic memory copies). Let's combine those by allocating the maximum of: - what the caller asked for - a geometric increase in existing size; we'll switch to 3/2 instead of 2 here. That's less aggressive and may help avoid fragmenting memory (N + 3N/2 > 9N/4, so old chunks can be reused as we scale up). Our worst case is still 3/2N wasted bits (you set bit N-1, then setting bit N causes us to grow by 3/2), but our average should be much better. This isn't usually that big a deal, but it will matter as we shift the reachability bitmap generation code to store more bitmaps in memory. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 7c1ecfa6fd..43a59d7fed 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -39,7 +39,9 @@ static void bitmap_grow(struct bitmap *self, size_t word_alloc) { if (word_alloc > self->word_alloc) { size_t old_size = self->word_alloc; - self->word_alloc = word_alloc * 2; + self->word_alloc = old_size * 3 / 2; + if (word_alloc > self->word_alloc) + self->word_alloc = word_alloc; REALLOC_ARRAY(self->words, self->word_alloc); memset(self->words + old_size, 0x0, (self->word_alloc - old_size) * sizeof(eword_t)); From patchwork Wed Nov 11 19:43:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C421C388F9 for ; Wed, 11 Nov 2020 19:43:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 11E2A2087D for ; Wed, 11 Nov 2020 19:43:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="CGBs3pBT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727794AbgKKTnH (ORCPT ); Wed, 11 Nov 2020 14:43:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727684AbgKKTnH (ORCPT ); Wed, 11 Nov 2020 14:43:07 -0500 Received: from mail-qv1-xf42.google.com (mail-qv1-xf42.google.com [IPv6:2607:f8b0:4864:20::f42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5971EC0613D1 for ; Wed, 11 Nov 2020 11:43:07 -0800 (PST) Received: by mail-qv1-xf42.google.com with SMTP id g19so1529608qvy.2 for ; Wed, 11 Nov 2020 11:43:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=sE5cm27BQBCiqVlBL8nTP7rYpEYlk03VQA5EA27Rtqc=; b=CGBs3pBTP+aMRwCd+O8j/p/yNCldMmPc2rIeLq9BV+Vj+YADIIf5/rIEm8F1NeGgPe rlZzNG2CE4KN4xywgL537bnMF1xx4t9wk1QK/0v7t7ZSsWKTYhW9WjEXnwNJjnl3Ko7U 8sw2hrNWBnyhUezqKp+eOXp8Cpv2GC5hcvLTpkoUK3B23gndN8U0Qov08RYx343YazA6 me+8W9cBZaBajt3FIKyTLtR7ECWHb6CgdPT2RoGkybr+vLvHpr5p/mlwzdtGPgcMm/s0 bETQGvxpnpLgbVlZqd3GBGu3DMY8YT/ZSqIRV+oExu08lXfVaERGexBenuJCQvmWdkEb 2/7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=sE5cm27BQBCiqVlBL8nTP7rYpEYlk03VQA5EA27Rtqc=; b=piLVxKFiPe45fJOE+bNdMzaTF5X46Dd4yJCWrzK5GLmX1blQ7S2QE1gsc9jnln3shO ruw2D9NSfFcRPPGPh9yLU/xfi78fCwqTML1pmmf5uVutNOJhbtEc1i7YFQIgC9lraTof QiWgJlsuK6h7RRLiD+mCVK4M/6/C/Zo968poiA8TgicSCOFZUZYplvBIwp1TODy6zipa 1Pyxo5IXlVMkj5FZnBCwvWxL1+zwslX2NuGt/tE46LRqbTZnfcrzU32kEi6wh217rpV0 EJc1MlCZSxaBTfHCrFke/RE0BduFvIJOXKbV58G3cJ/KcIUxXMpbHlK9OOOazcgKOHmG yTRA== X-Gm-Message-State: AOAM530BvaHfU7NcWwUlDYpsvYgav2M5H9HrLS4Ai1VW1nQMSHRyfemG yxZXeTPjyzgT+XoosvajYk+2X6YE/B+0KBkQ X-Google-Smtp-Source: ABdhPJzbMKMvBJECrJ1xY4OE+oMngLM8tZ74EQcLTOHmF0lCu3bqFH2QuCMAsoHY4G46jV/HL0+4Rw== X-Received: by 2002:a0c:e443:: with SMTP id d3mr15685381qvm.18.1605123786373; Wed, 11 Nov 2020 11:43:06 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id t20sm3162276qtt.70.2020.11.11.11.43.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:05 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:03 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 08/23] ewah: implement bitmap_or() Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We have a function to bitwise-OR an ewah into an uncompressed bitmap, but not to OR two uncompressed bitmaps. Let's add it. Interestingly, we have a public header declaration going back to e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the function was never implemented. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 43a59d7fed..c3f8e7242b 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -127,6 +127,15 @@ void bitmap_and_not(struct bitmap *self, struct bitmap *other) self->words[i] &= ~other->words[i]; } +void bitmap_or(struct bitmap *self, const struct bitmap *other) +{ + size_t i; + + bitmap_grow(self, other->word_alloc); + for (i = 0; i < other->word_alloc; i++) + self->words[i] |= other->words[i]; +} + void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other) { size_t original_size = self->word_alloc; From patchwork Wed Nov 11 19:43:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898401 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1700AC5517A for ; Wed, 11 Nov 2020 19:43:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A4A3D2087D for ; Wed, 11 Nov 2020 19:43:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="ySqkrOr8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727803AbgKKTnN (ORCPT ); Wed, 11 Nov 2020 14:43:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54952 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727773AbgKKTnN (ORCPT ); Wed, 11 Nov 2020 14:43:13 -0500 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1391CC0613D1 for ; Wed, 11 Nov 2020 11:43:12 -0800 (PST) Received: by mail-qt1-x842.google.com with SMTP id b16so1952012qtb.6 for ; Wed, 11 Nov 2020 11:43:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=bzeOBd/U/47Er7muYEe+SEG0TcfPAQqvipRIyq1W2Yw=; b=ySqkrOr8tR3GMgUtqy33ABBnoIZr03chDCS33nCUcIibJseTsluJBajvCmRq8oqPal WGm3qKATafMUvd0o824Nw6fOxPv82WUnxvJkyg4iRBBlOl29PvBND11lQNfs738EULTW AT4c73WVqHuzRAc61UIG2mP3eW51cSZLPsYjkUqKqTo9M+C0BMlS0GN8LIas4/fog5Rh gneuuLw94DObjzp1iz9LaUQ7wR+fdfvq4x19ShVEZijCX/bPez/6mUpM5sFevUIbPGBw 7pzdyvtkKtLHK0JjGjqagVVXMwQ0HYWtJywGKaRQJQ6cn9udl7ppe44iVBdoBAqdWK6z PwBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=bzeOBd/U/47Er7muYEe+SEG0TcfPAQqvipRIyq1W2Yw=; b=YN95rtIb5ROTuaKK1vJQ+vsheFVgVE8N2KeqimegdyTxcV57LYOxbNyqLcbd915DKn O3nUqnrKrOSt6JQ0pcZQppBjhXH33En67GKhOMXrrMVqRaDcabyMvJwixt72mPqhreF1 vK4YwC0fn0NIcGR8MltdLO9NtoF1PItcvkEV6qhpFSCbZ+JGFrkHberalELptPl/Kk56 xWHO/CiPd54jVmSog4e1qvU4GyQ5/vVLRDkMvqEKyf08RXDZEhAQ8PzWq5qBIxLxbyIX JyoffTvgz6MkkouVxeBNuis2zpjF0hLXv9zzPJwjr5lv2MSH+GUf+nU1gOjQJMoJkRft nQ/A== X-Gm-Message-State: AOAM532qUYO0BcSpblePm+G5QexE1sAMH9GMm3P1jjGjXcIM1dKBiLCV +OhLuTn64/AJ1lCBn041gvUTDq3YhgBSIdZc X-Google-Smtp-Source: ABdhPJzt63DGcuodKbUiKbfwMIQYU8iibHGIZnL18xMyJtWoZJ6TCL2KsfTX8bZwF+A0QKZt1dyP+g== X-Received: by 2002:a05:622a:4e:: with SMTP id y14mr17336842qtw.392.1605123791026; Wed, 11 Nov 2020 11:43:11 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id j50sm3297161qtc.5.2020.11.11.11.43.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:10 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:08 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 09/23] ewah: add bitmap_dup() function Message-ID: <242bd3f110fe43c127208a7351215f1b238c714b.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King There's no easy way to make a copy of a bitmap. Obviously a caller can iterate over the bits and set them one by one in a new bitmap, but we can go much faster by copying whole words with memcpy(). Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 7 +++++++ ewah/ewok.h | 1 + 2 files changed, 8 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index c3f8e7242b..eb7e2539be 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,6 +35,13 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } +struct bitmap *bitmap_dup(const struct bitmap *src) +{ + struct bitmap *dst = bitmap_word_alloc(src->word_alloc); + COPY_ARRAY(dst->words, src->words, src->word_alloc); + return dst; +} + static void bitmap_grow(struct bitmap *self, size_t word_alloc) { if (word_alloc > self->word_alloc) { diff --git a/ewah/ewok.h b/ewah/ewok.h index 011852bef1..1fc555e672 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -173,6 +173,7 @@ struct bitmap { struct bitmap *bitmap_new(void); struct bitmap *bitmap_word_alloc(size_t word_alloc); +struct bitmap *bitmap_dup(const struct bitmap *src); void bitmap_set(struct bitmap *self, size_t pos); void bitmap_unset(struct bitmap *self, size_t pos); int bitmap_get(struct bitmap *self, size_t pos); From patchwork Wed Nov 11 19:43:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6DC37C5517A for ; Wed, 11 Nov 2020 19:43:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0807C2087D for ; Wed, 11 Nov 2020 19:43:18 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="z8tMN/3z" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727805AbgKKTnS (ORCPT ); Wed, 11 Nov 2020 14:43:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54964 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726904AbgKKTnR (ORCPT ); Wed, 11 Nov 2020 14:43:17 -0500 Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0B5EDC0613D1 for ; Wed, 11 Nov 2020 11:43:17 -0800 (PST) Received: by mail-qt1-x832.google.com with SMTP id f93so2167054qtb.10 for ; Wed, 11 Nov 2020 11:43:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=DXAUWieoTNab9Z6/aRiTnGVEYxvjTtCtn6yRzBGlmbQ=; b=z8tMN/3znFOutfKRAuj0Ai3rsglpaPfi4JBbCUQb+Mdu45OS3W/o6tj+7ojittLO2a BhjEmPuufkqhbKTtzOAl0AfphIu4prqhDpPyMR4bIx6vhwq4h96QadJNFdWAgnKbYJx6 SeB6VgIX1X4OJxRDcFSnceVU8aLzQqQJtYvHOdN6Kpe8YtPpvb35nVI1++G8KyO1wCrn yOz+ts0cGUo79aujGFUAMLZ43jk6HL192pTj27KE2el5mvzQsjUqhpQA5lLv4vJ3Ht5L u71XgELy2FW4WNVIeOKdW5B18It5+c2Zj4w/JijorM+jOHgAtaL8BZFDFCfUIpUueT6u z95Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=DXAUWieoTNab9Z6/aRiTnGVEYxvjTtCtn6yRzBGlmbQ=; b=sJyqKkOKcMlrDB4n7PXDBWsNJHvvCC8IXrwuH/3GPsrTuO8UXXLoQlVDh9233Pfgti 2OiQLqA053c/0eYlRXQ1+x+sCYaKiIgeF3U8Aama0cTQ1BG4k5LZ7y5GmLvrfN/0DtRW O9VnunKlUoZCCFRbMSZzrbTylMz6meja2pOlnr3Rjv0uCHhSu3vABz1cujT40dcw0SZ/ ZIx3RSJwOLJpe4sbEwQK9CEeoad106icuq1shfqg8763NxtcOoVoonCE3kZPanryJ9ef Gk4gu41PaAvDwGXAakzKa6oJtxT6tt36KyFSirn0AtDkESEmJE7+ORlce/cmsTorHcOV /UTw== X-Gm-Message-State: AOAM531ylDC7f/hqJAIXzFYMDOtNj9TRmEcj7Sw6WGv6t+Fxf0IFqoJ7 b/e+7s7xOBPqjy5RF9MZkg7pZWrXth+LKS1l X-Google-Smtp-Source: ABdhPJyzc/Q/5XNRn0xLDqUs4Q1S+T3x3hMnpdAXpEYOwRKJwl0eI7QFm1l5eOi73mGhnbDDxqSqAg== X-Received: by 2002:ac8:6898:: with SMTP id m24mr25111028qtq.157.1605123795698; Wed, 11 Nov 2020 11:43:15 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id i4sm2995640qtw.22.2020.11.11.11.43.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:15 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:12 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 10/23] pack-bitmap-write: reimplement bitmap writing Message-ID: <2f18d76cfd2c6ab3df7f4391aa824de1ad5f2d5f.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The bitmap generation code works by iterating over the set of commits for which we plan to write bitmaps, and then for each one performing a traditional traversal over the reachable commits and trees, filling in the bitmap. Between two traversals, we can often reuse the previous bitmap result as long as the first commit is an ancestor of the second. However, our worst case is that we may end up doing "n" complete complete traversals to the root in order to create "n" bitmaps. In a real-world case (the shared-storage repo consisting of all GitHub forks of chromium/chromium), we perform very poorly: generating bitmaps takes ~3 hours, whereas we can walk the whole object graph in ~3 minutes. This commit completely rewrites the algorithm, with the goal of accessing each object only once. It works roughly like this: - generate a list of commits in topo-order using a single traversal - invert the edges of the graph (so have parents point at their children) - make one pass in reverse topo-order, generating a bitmap for each commit and passing the result along to child nodes We generate correct results because each node we visit has already had all of its ancestors added to the bitmap. And we make only two linear passes over the commits. We also visit each tree usually only once. When filling in a bitmap, we don't bother to recurse into trees whose bit is already set in the bitmap (since we know we've already done so when setting their bit). That means that if commit A references tree T, none of its descendants will need to open T again. I say "usually", though, because it is possible for a given tree to be mentioned in unrelated parts of history (e.g., cherry-picking to a parallel branch). So we've accomplished our goal, and the resulting algorithm is pretty simple to understand. But there are some downsides, at least with this initial implementation: - we no longer reuse the results of any on-disk bitmaps when generating. So we'd expect to sometimes be slower than the original when bitmaps already exist. However, this is something we'll be able to add back in later. - we use much more memory. Instead of keeping one bitmap in memory at a time, we're passing them up through the graph. So our memory use should scale with the graph width (times the size of a bitmap). So how does it perform? For a clone of linux.git, generating bitmaps from scratch with the old algorithm took 63s. Using this algorithm it takes 205s. Which is much worse, but _might_ be acceptable if it behaved linearly as the size grew. It also increases peak heap usage by ~1G. That's not impossibly large, but not encouraging. On the complete fork-network of torvalds/linux, it increases the peak RAM usage by 40GB. Yikes. (I forgot to record the time it took, but the memory usage was too much to consider this reasonable anyway). On the complete fork-network of chromium/chromium, I ran out of memory before succeeding. Some back-of-the-envelope calculations indicate it would need 80+GB to complete. So at this stage, we've managed to make things much worse. But because of the way this new algorithm is structured, there are a lot of opportunities for optimization on top. We'll start implementing those in the follow-on patches. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 303 ++++++++++++++++++++++++-------------------- 1 file changed, 169 insertions(+), 134 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 5e998bdaa7..f2f0b6b2c2 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -110,8 +110,6 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, /** * Compute the actual bitmaps */ -static struct object **seen_objects; -static unsigned int seen_objects_nr, seen_objects_alloc; static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) { @@ -127,21 +125,6 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm writer.selected_nr++; } -static inline void mark_as_seen(struct object *object) -{ - ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc); - seen_objects[seen_objects_nr++] = object; -} - -static inline void reset_all_seen(void) -{ - unsigned int i; - for (i = 0; i < seen_objects_nr; ++i) { - seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN); - } - seen_objects_nr = 0; -} - static uint32_t find_object_pos(const struct object_id *oid) { struct object_entry *entry = packlist_find(writer.to_pack, oid); @@ -154,60 +137,6 @@ static uint32_t find_object_pos(const struct object_id *oid) return oe_in_pack_pos(writer.to_pack, entry); } -static void show_object(struct object *object, const char *name, void *data) -{ - struct bitmap *base = data; - bitmap_set(base, find_object_pos(&object->oid)); - mark_as_seen(object); -} - -static void show_commit(struct commit *commit, void *data) -{ - mark_as_seen((struct object *)commit); -} - -static int -add_to_include_set(struct bitmap *base, struct commit *commit) -{ - khiter_t hash_pos; - uint32_t bitmap_pos = find_object_pos(&commit->object.oid); - - if (bitmap_get(base, bitmap_pos)) - return 0; - - hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid); - if (hash_pos < kh_end(writer.bitmaps)) { - struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos); - bitmap_or_ewah(base, bc->bitmap); - return 0; - } - - bitmap_set(base, bitmap_pos); - return 1; -} - -static int -should_include(struct commit *commit, void *_data) -{ - struct bitmap *base = _data; - - if (!add_to_include_set(base, commit)) { - struct commit_list *parent = commit->parents; - - mark_as_seen((struct object *)commit); - - while (parent) { - parent->item->object.flags |= SEEN; - mark_as_seen((struct object *)parent->item); - parent = parent->next; - } - - return 0; - } - - return 1; -} - static void compute_xor_offsets(void) { static const int MAX_XOR_OFFSET_SEARCH = 10; @@ -248,79 +177,185 @@ static void compute_xor_offsets(void) } } -void bitmap_writer_build(struct packing_data *to_pack) +struct bb_commit { + struct commit_list *children; + struct bitmap *bitmap; + unsigned selected:1; + unsigned idx; /* within selected array */ +}; + +define_commit_slab(bb_data, struct bb_commit); + +struct bitmap_builder { + struct bb_data data; + struct commit **commits; + size_t commits_nr, commits_alloc; +}; + +static void bitmap_builder_init(struct bitmap_builder *bb, + struct bitmap_writer *writer) { - static const double REUSE_BITMAP_THRESHOLD = 0.2; - - int i, reuse_after, need_reset; - struct bitmap *base = bitmap_new(); struct rev_info revs; + struct commit *commit; + unsigned int i; + + memset(bb, 0, sizeof(*bb)); + init_bb_data(&bb->data); + + reset_revision_walk(); + repo_init_revisions(writer->to_pack->repo, &revs, NULL); + revs.topo_order = 1; + + for (i = 0; i < writer->selected_nr; i++) { + struct commit *c = writer->selected[i].commit; + struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->idx = i; + add_pending_object(&revs, &c->object, ""); + } + + if (prepare_revision_walk(&revs)) + die("revision walk setup failed"); + + while ((commit = get_revision(&revs))) { + struct commit_list *p; + + parse_commit_or_die(commit); + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + + for (p = commit->parents; p; p = p->next) { + struct bb_commit *ent = bb_data_at(&bb->data, p->item); + commit_list_insert(commit, &ent->children); + } + } +} + +static void bitmap_builder_clear(struct bitmap_builder *bb) +{ + clear_bb_data(&bb->data); + free(bb->commits); + bb->commits_nr = bb->commits_alloc = 0; +} + +static void fill_bitmap_tree(struct bitmap *bitmap, + struct tree *tree) +{ + uint32_t pos; + struct tree_desc desc; + struct name_entry entry; + + /* + * If our bit is already set, then there is nothing to do. Both this + * tree and all of its children will be set. + */ + pos = find_object_pos(&tree->object.oid); + if (bitmap_get(bitmap, pos)) + return; + bitmap_set(bitmap, pos); + + if (parse_tree(tree) < 0) + die("unable to load tree object %s", + oid_to_hex(&tree->object.oid)); + init_tree_desc(&desc, tree->buffer, tree->size); + + while (tree_entry(&desc, &entry)) { + switch (object_type(entry.mode)) { + case OBJ_TREE: + fill_bitmap_tree(bitmap, + lookup_tree(the_repository, &entry.oid)); + break; + case OBJ_BLOB: + bitmap_set(bitmap, find_object_pos(&entry.oid)); + break; + default: + /* Gitlink, etc; not reachable */ + break; + } + } + + free_tree_buffer(tree); +} + +static void fill_bitmap_commit(struct bb_commit *ent, + struct commit *commit) +{ + if (!ent->bitmap) + ent->bitmap = bitmap_new(); + + /* + * mark ourselves, but do not bother with parents; their values + * will already have been propagated to us + */ + bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); +} + +static void store_selected(struct bb_commit *ent, struct commit *commit) +{ + struct bitmapped_commit *stored = &writer.selected[ent->idx]; + khiter_t hash_pos; + int hash_ret; + + /* + * the "reuse bitmaps" phase may have stored something here, but + * our new algorithm doesn't use it. Drop it. + */ + if (stored->bitmap) + ewah_free(stored->bitmap); + + stored->bitmap = bitmap_to_ewah(ent->bitmap); + + hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); + if (hash_ret == 0) + die("Duplicate entry when writing index: %s", + oid_to_hex(&commit->object.oid)); + kh_value(writer.bitmaps, hash_pos) = stored; +} + +void bitmap_writer_build(struct packing_data *to_pack) +{ + struct bitmap_builder bb; + size_t i; + int nr_stored = 0; /* for progress */ writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; if (writer.show_progress) writer.progress = start_progress("Building bitmaps", writer.selected_nr); - - repo_init_revisions(to_pack->repo, &revs, NULL); - revs.tag_objects = 1; - revs.tree_objects = 1; - revs.blob_objects = 1; - revs.no_walk = 0; - - revs.include_check = should_include; - reset_revision_walk(); - - reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD; - need_reset = 0; - - for (i = writer.selected_nr - 1; i >= 0; --i) { - struct bitmapped_commit *stored; - struct object *object; - - khiter_t hash_pos; - int hash_ret; - - stored = &writer.selected[i]; - object = (struct object *)stored->commit; - - if (stored->bitmap == NULL) { - if (i < writer.selected_nr - 1 && - (need_reset || - !in_merge_bases(writer.selected[i + 1].commit, - stored->commit))) { - bitmap_reset(base); - reset_all_seen(); - } - - add_pending_object(&revs, object, ""); - revs.include_check_data = base; - - if (prepare_revision_walk(&revs)) - die("revision walk setup failed"); - - traverse_commit_list(&revs, show_commit, show_object, base); - - object_array_clear(&revs.pending); - - stored->bitmap = bitmap_to_ewah(base); - need_reset = 0; - } else - need_reset = 1; - - if (i >= reuse_after) - stored->flags |= BITMAP_FLAG_REUSE; - - hash_pos = kh_put_oid_map(writer.bitmaps, object->oid, &hash_ret); - if (hash_ret == 0) - die("Duplicate entry when writing index: %s", - oid_to_hex(&object->oid)); - - kh_value(writer.bitmaps, hash_pos) = stored; - display_progress(writer.progress, writer.selected_nr - i); + trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", + the_repository); + + bitmap_builder_init(&bb, &writer); + for (i = bb.commits_nr; i > 0; i--) { + struct commit *commit = bb.commits[i-1]; + struct bb_commit *ent = bb_data_at(&bb.data, commit); + struct commit *child; + + fill_bitmap_commit(ent, commit); + + if (ent->selected) { + store_selected(ent, commit); + nr_stored++; + display_progress(writer.progress, nr_stored); + } + + while ((child = pop_commit(&ent->children))) { + struct bb_commit *child_ent = + bb_data_at(&bb.data, child); + + if (child_ent->bitmap) + bitmap_or(child_ent->bitmap, ent->bitmap); + else + child_ent->bitmap = bitmap_dup(ent->bitmap); + } + bitmap_free(ent->bitmap); + ent->bitmap = NULL; } + bitmap_builder_clear(&bb); - bitmap_free(base); stop_progress(&writer.progress); compute_xor_offsets(); From patchwork Wed Nov 11 19:43:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898407 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 620BDC5517A for ; Wed, 11 Nov 2020 19:43:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 02AEF207F7 for ; Wed, 11 Nov 2020 19:43:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="krWJMXj1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727811AbgKKTnY (ORCPT ); Wed, 11 Nov 2020 14:43:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726904AbgKKTnX (ORCPT ); Wed, 11 Nov 2020 14:43:23 -0500 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 275A8C0613D1 for ; Wed, 11 Nov 2020 11:43:23 -0800 (PST) Received: by mail-qt1-x843.google.com with SMTP id p12so2178076qtp.7 for ; Wed, 11 Nov 2020 11:43:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=x3WnFG+yS//yZUuDHlkQGLxSM+69kgHjwyV+D6gY31Q=; b=krWJMXj1+1hKgTBKiGeW8Cg6wcWIk5YJqL4n1wpjAkST+LBj8snRwrxzpj8MJXxHQ/ ycTtE+eGonAMCUmBN5D5bUh4w0A47BAb8k34PoIDrTDymmC/rnNb1jq2D/PhY1Q19bVy Vh2XMRbPid5h4j2Osp+OsvG3udFGz18HwBhlB5w2LTBIJrxThNKMbUP5M5C6qI1/2rIl qdJrGgbsX+d9ACegCB2mWHJ4OM1k+vFmYZCBhlbC0BFCNgIO4ulkRkUMZiwcBPZrQyxL J+/f86+JY/xNLuNqx+K1iBckC3PdKrcYa0mzjlxP3qQkSBNgGZ2mla5YkeTBW58R3KGp tfNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=x3WnFG+yS//yZUuDHlkQGLxSM+69kgHjwyV+D6gY31Q=; b=dMjLFtltEqpuME9vXWX94oNZYzrF9lyBq5buOgz7dJ49d16zBOqyiEDIz2l7sNFxM+ G61G69ckURYk4oFzVDPiEkEdeAW3NV2CEK/8iQIJS0zt3A8B22/L1ZtHgkwpVXdkff5X 3CWB5OmyQrq8fuzA75x9akNc1YUAGj1oZNWlZdsRU1xzCPl1r+mQ1wTI7P3PVANzQkIg TKZEvteVWJamJwJKD99+NDeclZZwtpB+o92N5yD4+snjB6S/sZUJVAMjypLyRmRf6UQ3 OhCEfgxsGuzhZbla03TmGM2kfKNsLc7kvnYMDbNo58VmPsva1wtk1Q0wqEsaYxX6Sm6A qC0A== X-Gm-Message-State: AOAM530SWGMh/IeM304nhcGZNBrhdBh6sRX+YIOaPKRV58d2TEeA+S2M u2JGLaZ9UIn6jCS37dZBej37IKz9K7aR6zDj X-Google-Smtp-Source: ABdhPJxnW1btbRO4kZQiSEKzyIcXL8GQP5LuSJ11gRQN3L4yYHrSnSqRswqtm1ijD5TZWZqHk28hrA== X-Received: by 2002:ac8:70cd:: with SMTP id g13mr24365421qtp.345.1605123802084; Wed, 11 Nov 2020 11:43:22 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id h125sm3192809qkc.36.2020.11.11.11.43.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:21 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:18 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 11/23] pack-bitmap-write: pass ownership of intermediate bitmaps Message-ID: <8328d8cc9930ca10d939e4b35abe1c4987aba115.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King Our algorithm to generate reachability bitmaps walks through the commit graph from the bottom up, passing bitmap data from each commit to its descendants. For a linear stretch of history like: A -- B -- C our sequence of steps is: - compute the bitmap for A by walking its trees, etc - duplicate A's bitmap as a starting point for B; we can now free A's bitmap, since we only needed it as an intermediate result - OR in any extra objects that B can reach into its bitmap - duplicate B's bitmap as a starting point for C; likewise, free B's bitmap - OR in objects for C, and so on... Rather than duplicating bitmaps and immediately freeing the original, we can just pass ownership from commit to commit. Note that this doesn't always work: - the recipient may be a merge which already has an intermediate bitmap from its other ancestor. In that case we have to OR our result into it. Note that the first ancestor to reach the merge does get to pass ownership, though. - we may have multiple children; we can only pass ownership to one of them However, it happens often enough and copying bitmaps is expensive enough that this provides a noticeable speedup. On a clone of linux.git, this reduces the time to generate bitmaps from 205s to 70s. This is about the same amount of time it took to generate bitmaps using our old "many traversals" algorithm (the previous commit measures the identical scenario as taking 63s). It unfortunately provides only a very modest reduction in the peak memory usage, though. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index f2f0b6b2c2..d2d46ff5f4 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -333,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit); struct commit *child; + int reused = 0; fill_bitmap_commit(ent, commit); @@ -348,10 +349,15 @@ void bitmap_writer_build(struct packing_data *to_pack) if (child_ent->bitmap) bitmap_or(child_ent->bitmap, ent->bitmap); - else + else if (reused) child_ent->bitmap = bitmap_dup(ent->bitmap); + else { + child_ent->bitmap = ent->bitmap; + reused = 1; + } } - bitmap_free(ent->bitmap); + if (!reused) + bitmap_free(ent->bitmap); ent->bitmap = NULL; } bitmap_builder_clear(&bb); From patchwork Wed Nov 11 19:43:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF73FC5517A for ; Wed, 11 Nov 2020 19:43:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 58ED62087D for ; Wed, 11 Nov 2020 19:43:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="zfqJgWRp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727814AbgKKTn3 (ORCPT ); Wed, 11 Nov 2020 14:43:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726904AbgKKTn3 (ORCPT ); Wed, 11 Nov 2020 14:43:29 -0500 Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3844C0613D1 for ; Wed, 11 Nov 2020 11:43:27 -0800 (PST) Received: by mail-qv1-xf41.google.com with SMTP id g19so1530192qvy.2 for ; Wed, 11 Nov 2020 11:43:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=eA8mts39wLjLDb5MEYgDHjwZHfJ1yYnd4tkZbI/5PSE=; b=zfqJgWRpoDrira+jtFKdmeZn6VJ1XdJUhCqYh5TSYb/93VNctvM3T8BAPwD1zStJsR T6Ysi85BcbB1YQsofjCRZ99AIQ8EQ2hJDlMKmZhNGEiYeJwBTWavp0G8Ab9vGWHWkO2b gjtNPGMmHXW8ammhqlavUfDxD+bhlkl5AlqnDpRtx0+0YxMeaDJHs2jxAh6tMTI4bLqW c4plBzN1i9I8Jd5pLolhYgzSWrVcLB7M+TqRHgG6ydIj/qsl2zEtuaPVkIEz7VMq7n4B Wl4nQRNwZ96bcDSPDM6YYORgrWfVwwfk+6FoZMU/GrVUMkg/JE5X7rmUbsOh8T7vD1w8 3uUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=eA8mts39wLjLDb5MEYgDHjwZHfJ1yYnd4tkZbI/5PSE=; b=aQ4Oyj2+IXFqhYVd9c/AiuZj38YtnmfSmsCd+UJfWzjF2GZzmaGVBmNnjnU4BjadMo JlWD4hDF/t8759ZvhpuUikf689ufYBkaGAO9O5GqSYhpJJdEiGg5dQB9GkD2wP1bQZmv ILB85pXZfK9EJGroqDAd90EObDmbs/SCXsETlRWgjckr3QZF+2BO0MwpUAlkHiRrASN4 s1rlhjkqNLacivHTjqbA2cevZGnJhc1aNRJGLnRGL/ADBj0AcGaLH8T10zhVXT0NB8/O wgburE7CGoC9qPNHSxz3mqmqwoWAYH9pLVdvlcWjmy8fk/OboOiIPIEcCk9O4W+TYdzr 5AzQ== X-Gm-Message-State: AOAM533wMqrIqyc1+nVSWaGjG3SSBytfkWR76ONmpkAAiuK+EIxCE/rC +Jqh5D4K2EB54g94FTC7LrKQ6BzwxrMTyWKR X-Google-Smtp-Source: ABdhPJyNO2W2j/EGYWSbptwBPLbNoXqvlNOr7dAqCigXYztGdgLeIxdEIGVGdHnrtdbpAdisUG96Ag== X-Received: by 2002:ad4:470d:: with SMTP id k13mr25141849qvz.40.1605123806964; Wed, 11 Nov 2020 11:43:26 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id 82sm3038643qke.76.2020.11.11.11.43.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:26 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:24 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 12/23] pack-bitmap-write: fill bitmap with commit history Message-ID: <88e7988751fca329a8e453727c614fdfbbba426a.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The fill_bitmap_commit() method assumes that every parent of the given commit is already part of the current bitmap. Instead of making that assumption, let's walk parents until we reach commits already part of the bitmap. Set the value for that parent immediately after querying to save time doing double calls to find_object_pos() and to avoid inserting the parent into the queue multiple times. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index d2d46ff5f4..361f3305a2 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -12,6 +12,7 @@ #include "sha1-lookup.h" #include "pack-objects.h" #include "commit-reach.h" +#include "prio-queue.h" struct bitmapped_commit { struct commit *commit; @@ -279,17 +280,30 @@ static void fill_bitmap_tree(struct bitmap *bitmap, } static void fill_bitmap_commit(struct bb_commit *ent, - struct commit *commit) + struct commit *commit, + struct prio_queue *queue) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - /* - * mark ourselves, but do not bother with parents; their values - * will already have been propagated to us - */ bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); + prio_queue_put(queue, commit); + + while (queue->nr) { + struct commit_list *p; + struct commit *c = prio_queue_get(queue); + + bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + + for (p = c->parents; p; p = p->next) { + int pos = find_object_pos(&p->item->object.oid); + if (!bitmap_get(ent->bitmap, pos)) { + bitmap_set(ent->bitmap, pos); + prio_queue_put(queue, p->item); + } + } + } } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -319,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct bitmap_builder bb; size_t i; int nr_stored = 0; /* for progress */ + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -335,7 +350,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit); + fill_bitmap_commit(ent, commit, &queue); if (ent->selected) { store_selected(ent, commit); @@ -360,6 +375,7 @@ void bitmap_writer_build(struct packing_data *to_pack) bitmap_free(ent->bitmap); ent->bitmap = NULL; } + clear_prio_queue(&queue); bitmap_builder_clear(&bb); stop_progress(&writer.progress); From patchwork Wed Nov 11 19:43:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898423 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6662C5517A for ; Wed, 11 Nov 2020 19:43:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 907872087D for ; Wed, 11 Nov 2020 19:43:36 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="MmshsSUS" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727735AbgKKTne (ORCPT ); Wed, 11 Nov 2020 14:43:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726904AbgKKTnd (ORCPT ); Wed, 11 Nov 2020 14:43:33 -0500 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99CF6C0613D1 for ; Wed, 11 Nov 2020 11:43:32 -0800 (PST) Received: by mail-qk1-x742.google.com with SMTP id l2so2935255qkf.0 for ; Wed, 11 Nov 2020 11:43:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Q0JDLDeqmjbZWetZBWxBOsVHgE1yq9OO5bMtjjsWEDk=; b=MmshsSUS1H+Djv/E+lwr0sasD2rR/XrmH4By+hQ79pvr6CPRH/vCDKRRdG4dA1pt4c 6oBLQiyqiYqzfeFWbc7vvD7FALxIkPOV8P5CFiUjAxCNt0kZsl1vLyWXwkWnXNmuL5WR 0N0kNINAQc5dHFgGqu6vDyu/W3SIAkrgJxu5i9MJf7d6UTrcuFA91/r24yPZkc/OsQmr dk7KMHjLrJv4MlPRyYiUz9XY/Nq5Y03/0dxk8l5Fon0Bsj6NijKCcTjwHIOAtuZ88jJn AXaUNGhAObTtLLqrUtcdiErgF+wBwgfgR3aumEeqHBKTU0ojPLb8aKlD5axtqinkHTc1 DVkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Q0JDLDeqmjbZWetZBWxBOsVHgE1yq9OO5bMtjjsWEDk=; b=pqugNoNLT1kzBZt71I85Vne8svEQKD2XCl8AGbGsWykLVTvRYBeuE3p+5Mpl0aCJC9 enArXrj4Vw+VxAExUJdouSvk6nI5k/9WLxTTcpuqwUnNfkMDqhDRpYTymhbAuCmWGhfb usNcm5blKrBw3OzjXtG6XrzDtyCgnT5XFGIgq5P5I0yDol4AF4pF+f/6ddsu8PakNzJn I6iIygMWYwVvnO6lQbmwskn0qMPnGpRpYshyDn9+V7EQOi4jnHO/F7wjr9WdX0u3UJpr niMKCEFlx5HgGqixtYcn4D2WxgFiidm3YqNI5aNjHiTP+wjGl0g67o6NQ7HB4jTRmWIS yiRQ== X-Gm-Message-State: AOAM532baWGArSsl0Ldp7ziE0K+Xqr+w6ZXlfXgUrqkJ+S37kCMTXaWo KMot9uvLYDXogOhpmI6kkq/k9Vt0at5KR/u1 X-Google-Smtp-Source: ABdhPJwbnH52xi/ERlV6GWego/nEpa2MjQ8/DBbJC6TN5jVFIUBGfVgHcR5mqmhot6CWiJKznHebMQ== X-Received: by 2002:a37:6403:: with SMTP id y3mr4011334qkb.204.1605123811505; Wed, 11 Nov 2020 11:43:31 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id o14sm3261897qto.16.2020.11.11.11.43.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:31 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:29 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 13/23] bitmap: add bitmap_diff_nonzero() Message-ID: <3f25315cf7960dcdb33e56ce8ea083b4b9688f99.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_diff_nonzero() checks if the 'self' bitmap contains any bits that are not on in the 'other' bitmap. Also, delete the declaration of bitmap_is_subset() as it is not used or implemented. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- ewah/bitmap.c | 24 ++++++++++++++++++++++++ ewah/ewok.h | 2 +- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index eb7e2539be..e2ebeac0e5 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -200,6 +200,30 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other) return 1; } +int bitmap_diff_nonzero(struct bitmap *self, struct bitmap *other) +{ + struct bitmap *small; + size_t i; + + if (self->word_alloc < other->word_alloc) { + small = self; + } else { + small = other; + + for (i = other->word_alloc; i < self->word_alloc; i++) { + if (self->words[i] != 0) + return 1; + } + } + + for (i = 0; i < small->word_alloc; i++) { + if ((self->words[i] & ~other->words[i])) + return 1; + } + + return 0; +} + void bitmap_reset(struct bitmap *bitmap) { memset(bitmap->words, 0x0, bitmap->word_alloc * sizeof(eword_t)); diff --git a/ewah/ewok.h b/ewah/ewok.h index 1fc555e672..156c71d06d 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -180,7 +180,7 @@ int bitmap_get(struct bitmap *self, size_t pos); void bitmap_reset(struct bitmap *self); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); -int bitmap_is_subset(struct bitmap *self, struct bitmap *super); +int bitmap_diff_nonzero(struct bitmap *self, struct bitmap *other); struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap); struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah); From patchwork Wed Nov 11 19:43:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F7AFC56201 for ; Wed, 11 Nov 2020 19:43:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1F8D0208B3 for ; Wed, 11 Nov 2020 19:43:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="JGosVA1W" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726904AbgKKTnm (ORCPT ); Wed, 11 Nov 2020 14:43:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55026 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727817AbgKKTnj (ORCPT ); Wed, 11 Nov 2020 14:43:39 -0500 Received: from mail-qk1-x734.google.com (mail-qk1-x734.google.com [IPv6:2607:f8b0:4864:20::734]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72B03C0613D1 for ; Wed, 11 Nov 2020 11:43:37 -0800 (PST) Received: by mail-qk1-x734.google.com with SMTP id n132so2936976qke.1 for ; Wed, 11 Nov 2020 11:43:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Vyxl/ZzskMDJlsODRF7Ik3qIATDvo6PKYLmRKflxJQg=; b=JGosVA1W6HsHthORVdGf9nwFFHx9nTY46Yshp9sI2ACo7GIq3cJPUKEDR+nurS+8Wy ofcdfFStJsaq5K008if9DArP1ZY60dBpNBTG8WtPrrzz7/YO+ezYoxUrhN0fqJaU5bH3 OFD9ukru3c0eYOdiXInTJ8GLt+l9XKdpyIuzx1kkOMi2faNGl8zxhKqFJ0K0LYX3HIXv pUt5fz3zftz+zdloAwADLH5SJqiw90awI4RdXrMTwQiZLhPAWyftJzL0+PwSOVv1PQad sPOIofdd/oYCSBy2BN5W+02boypqj148wFY9y302Qlh6PDeag2yd+fAb2FgXm3iw03fb IWww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Vyxl/ZzskMDJlsODRF7Ik3qIATDvo6PKYLmRKflxJQg=; b=o/taia71IKpGHFH/0EbWMvhN+375Xid/nARR0gVUa2OAXHZsjzBOUkQTp98X2DYsSW YUSkaAwkyyzIAve8EAvcNhZw2jm7UoBkgrwMFJwL1lnZntjwdRha0m2PHaqgl75K85WX GG61GPD/BT61C6VDD+D0mfYCHAIx+7UJgYbWgsoRW3QTe8lsOncenBXDxhNa479UArDY Jm4qxi5RTpVzCJsm2CKMXlMfgqtZ70sHwYVgEbH94M4nUnDJfC4C8LnTSdlEQQabWXfq b6JRaF6QkB42c+zXgYHyrcBrkJuNPTtNrpojVzBEHMkXjKKlR/CaJNNZsokaGyLZGTYS CNuQ== X-Gm-Message-State: AOAM531wQl9PHdPGF6QsOLyLRMJixp5tqdj8uHqdCO2ELY2gjVbBntWb JsBVqJZiX5462KUjPCirbAAJwsMQAL6A7teD X-Google-Smtp-Source: ABdhPJyNpv4Qc4CR8tJatmjVTgCIdFTKDKZg0P6U6WqowJU9WrWQixq0ek/Yo+WZS00ffEa+GKki4w== X-Received: by 2002:ae9:f714:: with SMTP id s20mr9140080qkg.45.1605123816377; Wed, 11 Nov 2020 11:43:36 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id r125sm1548104qke.129.2020.11.11.11.43.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:35 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:33 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 14/23] commit: implement commit_list_contains() Message-ID: <7f9b45b118cdd48372dd60d5e2f3e9a588175dab.1605123652.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee It can be helpful to check if a commit_list contains a commit. Use pointer equality, assuming lookup_commit() was used. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- commit.c | 11 +++++++++++ commit.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/commit.c b/commit.c index fe1fa3dc41..9a785bf906 100644 --- a/commit.c +++ b/commit.c @@ -544,6 +544,17 @@ struct commit_list *commit_list_insert(struct commit *item, struct commit_list * return new_list; } +int commit_list_contains(struct commit *item, struct commit_list *list) +{ + while (list) { + if (list->item == item) + return 1; + list = list->next; + } + + return 0; +} + unsigned commit_list_count(const struct commit_list *l) { unsigned c = 0; diff --git a/commit.h b/commit.h index 5467786c7b..742a6de460 100644 --- a/commit.h +++ b/commit.h @@ -167,6 +167,8 @@ int find_commit_subject(const char *commit_buffer, const char **subject); struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list); +int commit_list_contains(struct commit *item, + struct commit_list *list); struct commit_list **commit_list_append(struct commit *commit, struct commit_list **next); unsigned commit_list_count(const struct commit_list *l); From patchwork Wed Nov 11 19:43:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898421 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CA00C5517A for ; Wed, 11 Nov 2020 19:43:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9580F2087D for ; Wed, 11 Nov 2020 19:43:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="p8C3bHAo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727829AbgKKTnq (ORCPT ); Wed, 11 Nov 2020 14:43:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55048 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727817AbgKKTnq (ORCPT ); Wed, 11 Nov 2020 14:43:46 -0500 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2DD80C0613D1 for ; Wed, 11 Nov 2020 11:43:46 -0800 (PST) Received: by mail-qt1-x842.google.com with SMTP id g15so2158057qtq.13 for ; Wed, 11 Nov 2020 11:43:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=7JAF9xt14K8KYQ7q4EBRpopn6x8URZ3ztlGC19fCCI4=; b=p8C3bHAoMZPdUU/jDKZcaclR/uSeXp8P1iPbmYQsVgcLvqAyS9+MRAQcbBrNWJxDwH K+66b1rSuRi8DQKofJ65EKX/P36jw9EGPQ/oz7CjNIlXTdPgvIcHUapHUYJ0MorTVZl7 KwewtzmD9Jr2RgAn1vZx/qPcydGjmalkDTlX0YKj+kpMwN/HFutbw+jHwEXe5zF85QHI lvZLAOw/sJkCEOmTskUnIG/4zzkRl4iDZ+0d+yBCl1BJaYW0kwZeKeyVOQ5hLJZLMU62 OSmapRWJIOXXEuADX2/LJ/DGl64Smg3/qAdycqKbsWD1TEFar1TvBrB39kVShcNsYTWS X4yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=7JAF9xt14K8KYQ7q4EBRpopn6x8URZ3ztlGC19fCCI4=; b=mZgCm5YqDzXiSJpmWIOEAw0LUgXaBn9/fpfFJOe1Gj74/uabKW9g0Ia2bQV/j6nsTz nng8eW4u8662DbLwDztk2FIht4kTbZMeVWQjpd9/XuAE4LNgHORZH2/ZO0Nsy3Dx8MnJ /b/gtxLyym/0aQFX1N7YWgHc0P3CMfqVI2WDpIEi7fHVPyneFMT4C/vDUDSv9xww+mvh Px27KATBJTRln9k/5iAvTtEwwRVVmAzIEsIJkmAwABfLiQE80KTYwzndjUfBXmgXvdFC +NKE9bpfpJ8/4o/VjoxRoVKk1+a0HsWiCqe8Ry4bcuds42AYEKKWaOG8EtX0T22cSuWN 2iRQ== X-Gm-Message-State: AOAM531xtZaB/h8840+nijRu0cZmZXqiRnDD3FrwawrEtSJ5t5o/WKdk ueOUOcYijFj6EdolMmGnurdXgqPTGG4inxWi X-Google-Smtp-Source: ABdhPJxPxpOnxYqlEvHM4lewwkiKrVhfnBHroqXhaZRFqp+64+tdisyNm+asQlfRsmrUABOHAiOpOQ== X-Received: by 2002:aed:3383:: with SMTP id v3mr25132344qtd.353.1605123824133; Wed, 11 Nov 2020 11:43:44 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id q189sm3147890qkd.41.2020.11.11.11.43.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:43 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:41 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 15/23] t5310: add branch-based checks Message-ID: <9ab4b94b3573346b31e710486799ab3d95bade8e.1605123653.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The current rev-list tests that check the bitmap data only work on HEAD instead of multiple branches. Expand the test cases to handle both 'master' and 'other' branches. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 61 +++++++++++++++++++++++------------------ 1 file changed, 34 insertions(+), 27 deletions(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 70a4fc4843..6bf68fee85 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -41,63 +41,70 @@ test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' git rev-list --test-bitmap HEAD ' -rev_list_tests() { - state=$1 - - test_expect_success "counting commits via bitmap ($state)" ' - git rev-list --count HEAD >expect && - git rev-list --use-bitmap-index --count HEAD >actual && +rev_list_tests_head () { + test_expect_success "counting commits via bitmap ($state, $branch)" ' + git rev-list --count $branch >expect && + git rev-list --use-bitmap-index --count $branch >actual && test_cmp expect actual ' - test_expect_success "counting partial commits via bitmap ($state)" ' - git rev-list --count HEAD~5..HEAD >expect && - git rev-list --use-bitmap-index --count HEAD~5..HEAD >actual && + test_expect_success "counting partial commits via bitmap ($state, $branch)" ' + git rev-list --count $branch~5..$branch >expect && + git rev-list --use-bitmap-index --count $branch~5..$branch >actual && test_cmp expect actual ' - test_expect_success "counting commits with limit ($state)" ' - git rev-list --count -n 1 HEAD >expect && - git rev-list --use-bitmap-index --count -n 1 HEAD >actual && + test_expect_success "counting commits with limit ($state, $branch)" ' + git rev-list --count -n 1 $branch >expect && + git rev-list --use-bitmap-index --count -n 1 $branch >actual && test_cmp expect actual ' - test_expect_success "counting non-linear history ($state)" ' + test_expect_success "counting non-linear history ($state, $branch)" ' git rev-list --count other...master >expect && git rev-list --use-bitmap-index --count other...master >actual && test_cmp expect actual ' - test_expect_success "counting commits with limiting ($state)" ' - git rev-list --count HEAD -- 1.t >expect && - git rev-list --use-bitmap-index --count HEAD -- 1.t >actual && + test_expect_success "counting commits with limiting ($state, $branch)" ' + git rev-list --count $branch -- 1.t >expect && + git rev-list --use-bitmap-index --count $branch -- 1.t >actual && test_cmp expect actual ' - test_expect_success "counting objects via bitmap ($state)" ' - git rev-list --count --objects HEAD >expect && - git rev-list --use-bitmap-index --count --objects HEAD >actual && + test_expect_success "counting objects via bitmap ($state, $branch)" ' + git rev-list --count --objects $branch >expect && + git rev-list --use-bitmap-index --count --objects $branch >actual && test_cmp expect actual ' - test_expect_success "enumerate commits ($state)" ' - git rev-list --use-bitmap-index HEAD >actual && - git rev-list HEAD >expect && + test_expect_success "enumerate commits ($state, $branch)" ' + git rev-list --use-bitmap-index $branch >actual && + git rev-list $branch >expect && test_bitmap_traversal --no-confirm-bitmaps expect actual ' - test_expect_success "enumerate --objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD >actual && - git rev-list --objects HEAD >expect && + test_expect_success "enumerate --objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch >actual && + git rev-list --objects $branch >expect && test_bitmap_traversal expect actual ' - test_expect_success "bitmap --objects handles non-commit objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD tagged-blob >actual && + test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch tagged-blob >actual && grep $blob actual ' } +rev_list_tests () { + state=$1 + + for branch in "master" "other" + do + rev_list_tests_head + done +} + rev_list_tests 'full bitmap' test_expect_success 'clone from bitmapped repository' ' From patchwork Wed Nov 11 19:43:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D214C388F9 for ; Wed, 11 Nov 2020 19:43:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B08032087D for ; Wed, 11 Nov 2020 19:43:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="fQC8SDZu" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727832AbgKKTnw (ORCPT ); Wed, 11 Nov 2020 14:43:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727830AbgKKTnv (ORCPT ); Wed, 11 Nov 2020 14:43:51 -0500 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D121EC0613D1 for ; Wed, 11 Nov 2020 11:43:50 -0800 (PST) Received: by mail-qt1-x841.google.com with SMTP id v11so2158395qtq.12 for ; Wed, 11 Nov 2020 11:43:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ytlKQFGGG+ravvYTA3kptUK39TXZJYYvtg8yhG9mjuU=; b=fQC8SDZu8/hvxm3SeKYojJ5YCXDONdfFLvkUHasfuqeIo32ik9aH7O8jWs5GGKq1oY Kq84yOjlyT5OOZMuGenpISr0TaMGayAJUREgUcL98oeUMrZdqqpwPcW6R8Y77hY8eUej pQm/XAsALJ+4eYRMTTZys3s4FCR9MKqTEiFT0xPGfM9iKIR0fuT/kbmdbtojrxj/Zk2/ /YpYNfyWRQGb9o7YZLuoOkUYPxiBEgSnrQCnt/l7xtV75j6v99QkM8x0vANlLRLNCB5R N8+f4LG9Mn7H6Z/v1b4ZH1AzHcbDdxcOh80dD1BzpgZxcNH4Gk7RGWFc9JvTiziuWezR P/6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ytlKQFGGG+ravvYTA3kptUK39TXZJYYvtg8yhG9mjuU=; b=Ykg78UqZZd0ZctrntFawgZsyZEouHUZwE800DJfkxuFUq33vQ0g9H5h+fK4Ac3ElP8 TBUb0VauKVx941lm4cHzC7yxXcjb6djJTwzE6WwoSl0Uh3CjzwvPSCo+5qLV091na6rn e3fZ7g/70MGNtJqbHRkOvudwK3me85WD4yopkV8D/lXj3KhCcZjKGA2l1GTzd25BluMq XjhbLs3TF/0Eqwj9JQ/5IdIfC9eJ0S7Anjc0rp4rjSPT4779jD2eXWZJwAl+Tpggf9m9 l2RXjx3h3pE9viE/YqIxPXHBbVaex2lu/530Ia7xruVJQJtXL5mRWT6fzmHBSMYf7IlQ ShUg== X-Gm-Message-State: AOAM533FM43IM1QBwtM85n3Uk70kCV7kGzXQayV/Pau1MScxK0ctdO+F u9zLYlnTaLYEHmHBNVtIIYPmWI460o4MlJF0 X-Google-Smtp-Source: ABdhPJycwbGNuPiUbuPGkzIq+uLHZzSNmOyE0wGmuTGekQ84voXJDeO7efCJzAoGOAEWRYTvr6/nKw== X-Received: by 2002:ac8:7688:: with SMTP id g8mr18154824qtr.270.1605123829771; Wed, 11 Nov 2020 11:43:49 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id m2sm3005356qtu.62.2020.11.11.11.43.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:49 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:47 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 16/23] pack-bitmap-write: rename children to reverse_edges Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_builder_init() method walks the reachable commits in topological order and constructs a "reverse graph" along the way. At the moment, this reverse graph contains an edge from commit A to commit B if and only if A is a parent of B. Thus, the name "children" is appropriate for for this reverse graph. In the next change, we will repurpose the reverse graph to not be directly-adjacent commits in the commit-graph, but instead a more abstract relationship. The previous changes have already incorporated the necessary updates to fill_bitmap_commit() that allow these edges to not be immediate children. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 361f3305a2..369c76a87c 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -179,7 +179,7 @@ static void compute_xor_offsets(void) } struct bb_commit { - struct commit_list *children; + struct commit_list *reverse_edges; struct bitmap *bitmap; unsigned selected:1; unsigned idx; /* within selected array */ @@ -228,7 +228,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (p = commit->parents; p; p = p->next) { struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->children); + commit_list_insert(commit, &ent->reverse_edges); } } } @@ -358,7 +358,7 @@ void bitmap_writer_build(struct packing_data *to_pack) display_progress(writer.progress, nr_stored); } - while ((child = pop_commit(&ent->children))) { + while ((child = pop_commit(&ent->reverse_edges))) { struct bb_commit *child_ent = bb_data_at(&bb.data, child); From patchwork Wed Nov 11 19:43:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898429 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C17FC388F9 for ; Wed, 11 Nov 2020 19:43:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B4AF7207F7 for ; Wed, 11 Nov 2020 19:43:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="XnBmmDtO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727836AbgKKTn4 (ORCPT ); Wed, 11 Nov 2020 14:43:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55078 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727830AbgKKTn4 (ORCPT ); Wed, 11 Nov 2020 14:43:56 -0500 Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D60FDC0613D1 for ; Wed, 11 Nov 2020 11:43:55 -0800 (PST) Received: by mail-qt1-x82e.google.com with SMTP id b16so1953631qtb.6 for ; Wed, 11 Nov 2020 11:43:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ZbLlQr/7fT+H7a7hBRslD+OUa7spHV/OCBkYqEvFico=; b=XnBmmDtOrOtzQSApF+XxkeffFKcpVXzkaJoDnt92OF5RbIWdyQxWjpLRJRgGEqLrR0 arVnfX8lrnuXxdvBW834VJJQI2BpAGNA+CNcilaKe+YtP+qq5MJM2FFAQdbstctGSn4k oSo9VonknACJgY0XkGBRrfraqAEBBbUXiHE3Exg7yHVFnZRR6SuysBmwPQ9ZT0yXjJCb 7LTTDGblzYmy8EsUIHxFxybmk6bcrEJzgNeFuJ8HTLNEebwM3m+fjuLcNJCVFeSRixkr Rk/RQmq7aVN7R3Mb5bFfLUBISG/3ShjzdCXxe667f4NHSY45yxkGVXv5LacG+GksoHWQ GGVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ZbLlQr/7fT+H7a7hBRslD+OUa7spHV/OCBkYqEvFico=; b=r7A9Sm9p/LsM+GJQm1leW8BNTI0I+TuL0kb1sKO+UYQG8xGWap74r8dmx+dFLfkmjK K4aBEZcpKI0z3pw1OKnv4Zl/498gd1FnqduDsB70gGTxMYpP1LgAaIRhim3+YL1wXHIL Vj3xMuD37qhvCY4XXkyZN3yO/YL8zunMRqfGypFf42gX30C5ZnJ8Vo9/4UAqdbljxuuP sepcrTnrKs2TW7ADU+e8b2QOmdRO2PW9o4Z00+mGySYh0BbLnyYO8M78GUo+0JSWJ20y tv4rU3PqaBT1wkdp+DJmakC3CzzYRFYEev6F+Kp59ljZbZk8z22bk4/dmH0vGM/UHfFE bMQA== X-Gm-Message-State: AOAM531ADcNWW0wFXCbSBklXb5wdER0Sz755cB7XfYMP+8YGuhdb9BeQ PixvK+jfsm5Vcr1lVXfm7O1oP9hYxhOpo/2Z X-Google-Smtp-Source: ABdhPJwoOXkxsYLPXm3tRvS4NVbHtvgpLdNZK9ohktJ2qpjvHbmEOgSxFQbBYxKboXAfY7GG2mqXsQ== X-Received: by 2002:ac8:376b:: with SMTP id p40mr24686343qtb.231.1605123834341; Wed, 11 Nov 2020 11:43:54 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id s23sm3031076qke.11.2020.11.11.11.43.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:43:53 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:51 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 17/23] pack-bitmap-write: build fewer intermediate bitmaps Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_writer_build() method calls bitmap_builder_init() to construct a list of commits reachable from the selected commits along with a "reverse graph". This reverse graph has edges pointing from a commit to other commits that can reach that commit. After computing a reachability bitmap for a commit, the values in that bitmap are then copied to the reachability bitmaps across the edges in the reverse graph. We can now relax the role of the reverse graph to greatly reduce the number of intermediate reachability bitmaps we compute during this reverse walk. The end result is that we walk objects the same number of times as before when constructing the reachability bitmaps, but we also spend much less time copying bits between bitmaps and have much lower memory pressure in the process. The core idea is to select a set of "important" commits based on interactions among the sets of commits reachable from each selected commit. The first technical concept is to create a new 'commit_mask' member in the bb_commit struct. Note that the selected commits are provided in an ordered array. The first thing to do is to mark the ith bit in the commit_mask for the ith selected commit. As we walk the commit-graph, we copy the bits in a commit's commit_mask to its parents. At the end of the walk, the ith bit in the commit_mask for a commit C stores a boolean representing "The ith selected commit can reach C." As we walk, we will discover non-selected commits that are important. We will get into this later, but those important commits must also receive bit positions, growing the width of the bitmasks as we walk. At the true end of the walk, the ith bit means "the ith _important_ commit can reach C." MAXIMAL COMMITS --------------- We use a new 'maximal' bit in the bb_commit struct to represent whether a commit is important or not. The term "maximal" comes from the partially-ordered set of commits in the commit-graph where C >= P if P is a parent of C, and then extending the relationship transitively. Instead of taking the maximal commits across the entire commit-graph, we instead focus on selecting each commit that is maximal among commits with the same bits on in their commit_mask. This definition is important, so let's consider an example. Suppose we have three selected commits A, B, and C. These are assigned bitmasks 100, 010, and 001 to start. Each of these can be marked as maximal immediately because they each will be the uniquely maximal commit that contains their own bit. Keep in mind that that these commits may have different bitmasks after the walk; for example, if B can reach C but A cannot, then the final bitmask for C is 011. Even in these cases, C would still be a maximal commit among all commits with the third bit on in their masks. Now define sets X, Y, and Z to be the sets of commits reachable from A, B, and C, respectively. The intersections of these sets correspond to different bitmasks: * 100: X - (Y union Z) * 010: Y - (X union Z) * 001: Z - (X union Y) * 110: (X intersect Y) - Z * 101: (X intersect Z) - Y * 011: (Y intersect Z) - X * 111: X intersect Y intersect Z This can be visualized with the following Hasse diagram: 100 010 001 | \ / \ / | | \/ \/ | | /\ /\ | | / \ / \ | 110 101 011 \___ | ___/ \ | / 111 Some of these bitmasks may not be represented, depending on the topology of the commit-graph. In fact, we are counting on it, since the number of possible bitmasks is exponential in the number of selected commits, but is also limited by the total number of commits. In practice, very few bitmasks are possible because most commits converge on a common "trunk" in the commit history. With this three-bit example, we wish to find commits that are maximal for each bitmask. How can we identify this as we are walking? As we walk, we visit a commit C. Since we are walking the commits in topo-order, we know that C is visited after all of its children are visited. Thus, when we get C from the revision walk we inspect the 'maximal' property of its bb_data and use that to determine if C is truly important. Its commit_mask is also nearly final. If C is not one of the originally-selected commits, then assign a bit position to C (by incrementing num_maximal) and set that bit on in commit_mask. See "MULTIPLE MAXIMAL COMMITS" below for more detail on this. Now that the commit C is known to be maximal or not, consider each parent P of C. Compute two new values: * c_not_p : true if and only if the commit_mask for C contains a bit that is not contained in the commit_mask for P. * p_not_c : true if and only if the commit_mask for P contains a bit that is not contained in the commit_mask for P. If c_not_p is false, then P already has all of the bits that C would provide to its commit_mask. In this case, move on to other parents as C has nothing to contribute to P's state that was not already provided by other children of P. We continue with the case that c_not_p is true. This means there are bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or() to add those bits. If p_not_c is also true, then set the maximal bit for P to one. This means that if no other commit has P as a parent, then P is definitely maximal. This is because no child had the same bitmask. It is important to think about the maximal bit for P at this point as a temporary state: "P is maximal based on current information." In contrast, if p_not_c is false, then set the maximal bit for P to zero. Further, clear all reverse_edges for P since any edges that were previously assigned to P are no longer important. P will gain all reverse edges based on C. The final thing we need to do is to update the reverse edges for P. These reverse edges respresent "which closest maximal commits contributed bits to my commit_mask?" Since C contributed bits to P's commit_mask in this case, C must add to the reverse edges of P. If C is maximal, then C is a 'closest' maximal commit that contributed bits to P. Add C to P's reverse_edges list. Otherwise, C has a list of maximal commits that contributed bits to its bitmask (and this list is exactly one element). Add all of these items to P's reverse_edges list. Be careful to ignore duplicates here. After inspecting all parents P for a commit C, we can clear the commit_mask for C. This reduces the memory load to be limited to the "width" of the commit graph. Consider our ABC/XYZ example from earlier and let's inspect the state of the commits for an interesting bitmask, say 011. Suppose that D is the only maximal commit with this bitmask (in the first three bits). All other commits with bitmask 011 have D as the only entry in their reverse_edges list. D's reverse_edges list contains B and C. COMPUTING REACHABILITY BITMAPS ------------------------------ Now that we have our definition, let's zoom out and consider what happens with our new reverse graph when computing reachability bitmaps. We walk the reverse graph in reverse-topo-order, so we visit commits with largest commit_masks first. After we compute the reachability bitmap for a commit C, we push the bits in that bitmap to each commit D in the reverse edge list for C. Then, when we finally visit D we already have the bits for everything reachable from maximal commits that D can reach and we only need to walk the objects in the set-difference. In our ABC/XYZ example, when we finally walk for the commit A we only need to walk commits with bitmask equal to A's bitmask. If that bitmask is 100, then we are only walking commits in X - (Y union Z) because the bitmap already contains the bits for objects reachable from (X intersect Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps for the maximal commits with bitmasks 110 and 101). The behavior is intended to walk each commit (and the trees that commit introduces) at most once while allocating and copying fewer reachability bitmaps. There is one caveat: what happens when there are multiple maximal commits with the same bitmask, with respect to the initial set of selected commits? MULTIPLE MAXIMAL COMMITS ------------------------ Earlier, we mentioned that when we discover a new maximal commit, we assign a new bit position to that commit and set that bit position to one for that commit. This is absolutely important for interesting commit-graphs such as git/git and torvalds/linux. The reason is due to the existence of "butterflies" in the commit-graph partial order. Here is an example of four commits forming a butterfly: I J |\ /| | \/ | | /\ | |/ \| M N \ / |/ Q Here, I and J both have parents M and N. In general, these do not need to be exact parent relationships, but reachability relationships. The most important part is that M and N cannot reach each other, so they are independent in the partial order. If I had commit_mask 10 and J had commit_mask 01, then M and N would both be assigned commit_mask 11 and be maximal commits with the bitmask 11. Then, what happens when M and N can both reach a commit Q? If Q is also assigned the bitmask 11, then it is not maximal but is reachable from both M and N. While this is not necessarily a deal-breaker for our abstract definition of finding maximal commits according to a given bitmask, we have a few issues that can come up in our larger picture of constructing reachability bitmaps. In particular, if we do not also consider Q to be a "maximal" commit, then we will walk commits reachable from Q twice: once when computing the reachability bitmap for M and another time when computing the reachability bitmap for N. This becomes much worse if the topology continues this pattern with multiple butterflies. The solution has already been mentioned: each of M and N are assigned their own bits to the bitmask and hence they become uniquely maximal for their bitmasks. Finally, Q also becomes maximal and thus we do not need to walk its commits multiple times. The final bitmasks for these commits are as follows: I:10 J:01 |\ /| | \ _____/ | | /\____ | |/ \ | M:111 N:1101 \ / Q:1111 Further, Q's reverse edge list is { M, N }, while M and N both have reverse edge list { I, J }. PERFORMANCE MEASUREMENTS ------------------------ Now that we've spent a LOT of time on the theory of this algorithm, let's show that this is actually worth all that effort. To test the performance, use GIT_TRACE2_PERF=1 when running 'git repack -abd' in a repository with no existing reachability bitmaps. This avoids any issues with keeping existing bitmaps to skew the numbers. Inspect the "building_bitmaps_total" region in the trace2 output to focus on the portion of work that is affected by this change. Here are the performance comparisons for a few repositories. The timings are for the following versions of Git: "multi" is the timing from before any reverse graph is constructed, where we might perform multiple traversals. "reverse" is for the previous change where the reverse graph has every reachable commit. Finally "maximal" is the version introduced here where the reverse graph only contains the maximal commits. Repository: git/git multi: 2.628 sec reverse: 2.344 sec maximal: 2.047 sec Repository: torvalds/linux multi: 64.7 sec reverse: 205.3 sec maximal: 44.7 sec So in all cases we've not only recovered any time lost to switching to the reverse-edge algorithm, but we come out ahead of "multi" in all cases. Likewise, peak heap has gone back to something reasonable: Repository: torvalds/linux multi: 2.087 GB reverse: 3.141 GB maximal: 2.288 GB While I do not have access to full fork networks on GitHub, Peff has run this algorithm on the chromium/chromium fork network and reported a change from 3 hours to ~233 seconds. That network is particularly beneficial for this approach because it has a long, linear history along with many tags. The "multi" approach was obviously quadratic and the new approach is linear. Helped-by: Jeff King Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 72 +++++++++++++++++++++++++++++++--- t/t5310-pack-bitmaps.sh | 85 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 148 insertions(+), 9 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 369c76a87c..7b4fc0f304 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -180,8 +180,10 @@ static void compute_xor_offsets(void) struct bb_commit { struct commit_list *reverse_edges; + struct bitmap *commit_mask; struct bitmap *bitmap; - unsigned selected:1; + unsigned selected:1, + maximal:1; unsigned idx; /* within selected array */ }; @@ -198,7 +200,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i; + unsigned int i, num_maximal; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -210,27 +212,85 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->maximal = 1; ent->idx = i; + + ent->commit_mask = bitmap_new(); + bitmap_set(ent->commit_mask, i); + add_pending_object(&revs, &c->object, ""); } + num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { struct commit_list *p; + struct bb_commit *c_ent; parse_commit_or_die(commit); - ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); - bb->commits[bb->commits_nr++] = commit; + c_ent = bb_data_at(&bb->data, commit); + + if (c_ent->maximal) { + if (!c_ent->selected) { + bitmap_set(c_ent->commit_mask, num_maximal); + num_maximal++; + } + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + } for (p = commit->parents; p; p = p->next) { - struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->reverse_edges); + struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); + int c_not_p, p_not_c; + + if (!p_ent->commit_mask) { + p_ent->commit_mask = bitmap_new(); + c_not_p = 1; + p_not_c = 0; + } else { + c_not_p = bitmap_diff_nonzero(c_ent->commit_mask, p_ent->commit_mask); + p_not_c = bitmap_diff_nonzero(p_ent->commit_mask, c_ent->commit_mask); + } + + if (!c_not_p) + continue; + + bitmap_or(p_ent->commit_mask, c_ent->commit_mask); + + if (p_not_c) + p_ent->maximal = 1; + else { + p_ent->maximal = 0; + free_commit_list(p_ent->reverse_edges); + p_ent->reverse_edges = NULL; + } + + if (c_ent->maximal) { + commit_list_insert(commit, &p_ent->reverse_edges); + } else { + struct commit_list *cc = c_ent->reverse_edges; + + for (; cc; cc = cc->next) { + if (!commit_list_contains(cc->item, p_ent->reverse_edges)) + commit_list_insert(cc->item, &p_ent->reverse_edges); + } + } } + + bitmap_free(c_ent->commit_mask); + c_ent->commit_mask = NULL; } + + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_selected_commits", writer->selected_nr); + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_maximal_commits", num_maximal); } static void bitmap_builder_clear(struct bitmap_builder *bb) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 6bf68fee85..33ef9a098d 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -20,11 +20,87 @@ has_any () { grep -Ff "$1" "$2" } +# To ensure the logic for "maximal commits" is exercised, make +# the repository a bit more complicated. +# +# other master +# * * +# (99 commits) (99 commits) +# * * +# |\ /| +# | * octo-other octo-master * | +# |/|\_________ ____________/|\| +# | \ \/ __________/ | +# | | ________/\ / | +# * |/ * merge-right * +# | _|__________/ \____________ | +# |/ | \| +# (l1) * * merge-left * (r1) +# | / \________________________ | +# |/ \| +# (l2) * * (r2) +# \____________...____________ | +# \| +# * (base) +# +# The important part for the maximal commit algorithm is how +# the bitmasks are extended. Assuming starting bit positions +# for master (bit 0) and other (bit 1), and some flexibility +# in the order that merge bases are visited, the bitmasks at +# the end should be: +# +# master: 1 (maximal, selected) +# other: 01 (maximal, selected) +# octo-master: 1 +# octo-other: 01 +# merge-right: 111 (maximal) +# (l1): 111 +# (r1): 111 +# merge-left: 1101 (maximal) +# (l2): 11111 (maximal) +# (r2): 111101 (maximal) +# (base): 1111111 (maximal) + test_expect_success 'setup repo with moderate-sized history' ' - test_commit_bulk --id=file 100 && + test_commit_bulk --id=file 10 && git checkout -b other HEAD~5 && test_commit_bulk --id=side 10 && + + # add complicated history setup, including merges and + # ambiguous merge-bases + + git checkout -b merge-left other~2 && + git merge master~2 -m "merge-left" && + + git checkout -b merge-right master~1 && + git merge other~1 -m "merge-right" && + + git checkout -b octo-master master && + git merge merge-left merge-right -m "octopus-master" && + + git checkout -b octo-other other && + git merge merge-left merge-right -m "octopus-other" && + + git checkout other && + git merge octo-other -m "pull octopus" && + git checkout master && + git merge octo-master -m "pull octopus" && + + # Remove these branches so they are not selected + # as bitmap tips + git branch -D merge-left && + git branch -D merge-right && + git branch -D octo-other && + git branch -D octo-master && + + # add padding to make these merges less interesting + # and avoid having them selected for bitmaps + test_commit_bulk --id=file 100 && + git checkout other && + test_commit_bulk --id=side 100 && + git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && @@ -32,9 +108,12 @@ test_expect_success 'setup repo with moderate-sized history' ' ' test_expect_success 'full repack creates bitmaps' ' - git repack -ad && + GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \ + git repack -ad && ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output + test_line_count = 1 output && + grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && + grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' From patchwork Wed Nov 11 19:43:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43505C5517A for ; Wed, 11 Nov 2020 19:44:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D2C1E208B3 for ; Wed, 11 Nov 2020 19:44:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="f8QjcU2R" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727841AbgKKToD (ORCPT ); Wed, 11 Nov 2020 14:44:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727830AbgKKToD (ORCPT ); Wed, 11 Nov 2020 14:44:03 -0500 Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5FEDDC0613D1 for ; Wed, 11 Nov 2020 11:44:03 -0800 (PST) Received: by mail-qk1-x729.google.com with SMTP id d9so2903380qke.8 for ; Wed, 11 Nov 2020 11:44:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=HWQ/0PA991tIXD805VXbeNtT0ZCfavw0PqLbVnE1H9g=; b=f8QjcU2ROPDrR0ZFTjjzSjhnY+mzgb8+HDvL76xAzsnlItRAFEmw8mp6QA887XHGoP 0u3Syn0hPbV0T/IeaOS6mUDi2ftzHnzwkZV1jmtz/F5KzURY2544M/iJd+dc086Wv4ZF NH7aIqF0oz5xjo+fX6Wg9ks0JIfX1VyfmEkIccBEpHiS4Gx8/5HaOFCbjSU/v3hO9e2v nn3UbbSARQi2S5cDAyi24ZbrvKBFZ4TPz0nFtOhFTsbrnY5RD5UUhN13IBpRzr1rnGAa 5S2qUSBoeJ8D9umG20O3Jiudm7m4slEwPgsw3ohTWjh2xZYUIXhYERh+vqNYKVOQ9GEn UZ6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=HWQ/0PA991tIXD805VXbeNtT0ZCfavw0PqLbVnE1H9g=; b=FkDrLdOGovRsIJ2a2lXvidp54EcqfHnOLMhAY8gpNvh1xLfEK5UKIitsr7Q5P1vQ9M 8fv/9nkm+/dT5U2MGGzE5PBtPUGQqKlrxomeYpdv0k5LeuGPswt6nSo4X/JdZQe6V5Yu W7Cx3++GNrhpeqaNFYme0Fy+zxvewdAjAMgFGJeS34VldECFMAsKbU+4k4n8q5FbahVa WS6WPYG/5rVZ+c5VNO26zpT7/Yal/aFMQ84cu87jNR1ylqyYzhlzfDsqx7cumqP97Ivo HeVHjUVYoasVa5fPMaw0T+4MYk7TsmUpxpv7Izrl+SobIfdznJaM7XQoIJW9h4OUk8d3 7+zg== X-Gm-Message-State: AOAM531pNlQKp2nBy9Hpme6bQpxN+Asj7F9U+ac2GI1HZcGti5p0d37M BRMWnuVrsglStDojM/P4i5zdqX0r/Za+Xu6b X-Google-Smtp-Source: ABdhPJwktK5j5DPnV9yw04DQRmsPqRGG+LXyjC7/XmX+Ia6UYXbaVV0J1SXQ9LyJxrCezHovM7hkbw== X-Received: by 2002:a37:7345:: with SMTP id o66mr27393731qkc.222.1605123842141; Wed, 11 Nov 2020 11:44:02 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id q1sm2990054qti.95.2020.11.11.11.44.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:44:01 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:59 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 18/23] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Message-ID: <99a295416b2714c183aa39dcf515fb625090c2f2.1605123653.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 - pack-bitmap-write.c | 50 +++++------------------------------------- pack-bitmap.c | 46 +++++--------------------------------- pack-bitmap.h | 6 ++++- 4 files changed, 16 insertions(+), 87 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5617c01b5a..2a00358f34 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1104,7 +1104,6 @@ static void write_pack_file(void) stop_progress(&progress_state); bitmap_writer_show_progress(progress); - bitmap_writer_reuse_bitmaps(&to_pack); bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1); bitmap_writer_build(&to_pack); bitmap_writer_finish(written_list, nr_written, diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 7b4fc0f304..1995f75818 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -30,7 +30,6 @@ struct bitmap_writer { struct ewah_bitmap *tags; kh_oid_map_t *bitmaps; - kh_oid_map_t *reused; struct packing_data *to_pack; struct bitmapped_commit *selected; @@ -112,7 +111,7 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, * Compute the actual bitmaps */ -static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) +static inline void push_bitmapped_commit(struct commit *commit) { if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; @@ -120,7 +119,7 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm } writer.selected[writer.selected_nr].commit = commit; - writer.selected[writer.selected_nr].bitmap = reused; + writer.selected[writer.selected_nr].bitmap = NULL; writer.selected[writer.selected_nr].flags = 0; writer.selected_nr++; @@ -372,13 +371,6 @@ static void store_selected(struct bb_commit *ent, struct commit *commit) khiter_t hash_pos; int hash_ret; - /* - * the "reuse bitmaps" phase may have stored something here, but - * our new algorithm doesn't use it. Drop it. - */ - if (stored->bitmap) - ewah_free(stored->bitmap); - stored->bitmap = bitmap_to_ewah(ent->bitmap); hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); @@ -477,35 +469,6 @@ static int date_compare(const void *_a, const void *_b) return (long)b->date - (long)a->date; } -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack) -{ - struct bitmap_index *bitmap_git; - if (!(bitmap_git = prepare_bitmap_git(to_pack->repo))) - return; - - writer.reused = kh_init_oid_map(); - rebuild_existing_bitmaps(bitmap_git, to_pack, writer.reused, - writer.show_progress); - /* - * NEEDSWORK: rebuild_existing_bitmaps() makes writer.reused reference - * some bitmaps in bitmap_git, so we can't free the latter. - */ -} - -static struct ewah_bitmap *find_reused_bitmap(const struct object_id *oid) -{ - khiter_t hash_pos; - - if (!writer.reused) - return NULL; - - hash_pos = kh_get_oid_map(writer.reused, *oid); - if (hash_pos >= kh_end(writer.reused)) - return NULL; - - return kh_value(writer.reused, hash_pos); -} - void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps) @@ -519,12 +482,11 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (indexed_commits_nr < 100) { for (i = 0; i < indexed_commits_nr; ++i) - push_bitmapped_commit(indexed_commits[i], NULL); + push_bitmapped_commit(indexed_commits[i]); return; } for (;;) { - struct ewah_bitmap *reused_bitmap = NULL; struct commit *chosen = NULL; next = next_commit_index(i); @@ -539,15 +501,13 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (next == 0) { chosen = indexed_commits[i]; - reused_bitmap = find_reused_bitmap(&chosen->object.oid); } else { chosen = indexed_commits[i + next]; for (j = 0; j <= next; ++j) { struct commit *cm = indexed_commits[i + j]; - reused_bitmap = find_reused_bitmap(&cm->object.oid); - if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) { + if ((cm->object.flags & NEEDS_BITMAP) != 0) { chosen = cm; break; } @@ -557,7 +517,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, } } - push_bitmapped_commit(chosen, reused_bitmap); + push_bitmapped_commit(chosen); i += next + 1; display_progress(writer.progress, i); diff --git a/pack-bitmap.c b/pack-bitmap.c index 82c6bf2843..682f4d19dd 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1333,9 +1333,9 @@ void test_bitmap_walk(struct rev_info *revs) free_bitmap_index(bitmap_git); } -static int rebuild_bitmap(uint32_t *reposition, - struct ewah_bitmap *source, - struct bitmap *dest) +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest) { uint32_t pos = 0; struct ewah_iterator it; @@ -1364,19 +1364,11 @@ static int rebuild_bitmap(uint32_t *reposition, return 0; } -int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, - struct packing_data *mapping, - kh_oid_map_t *reused_bitmaps, - int show_progress) +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping) { uint32_t i, num_objects; uint32_t *reposition; - struct bitmap *rebuild; - struct stored_bitmap *stored; - struct progress *progress = NULL; - - khiter_t hash_pos; - int hash_ret; num_objects = bitmap_git->pack->num_objects; reposition = xcalloc(num_objects, sizeof(uint32_t)); @@ -1394,33 +1386,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, reposition[i] = oe_in_pack_pos(mapping, oe) + 1; } - rebuild = bitmap_new(); - i = 0; - - if (show_progress) - progress = start_progress("Reusing bitmaps", 0); - - kh_foreach_value(bitmap_git->bitmaps, stored, { - if (stored->flags & BITMAP_FLAG_REUSE) { - if (!rebuild_bitmap(reposition, - lookup_stored_bitmap(stored), - rebuild)) { - hash_pos = kh_put_oid_map(reused_bitmaps, - stored->oid, - &hash_ret); - kh_value(reused_bitmaps, hash_pos) = - bitmap_to_ewah(rebuild); - } - bitmap_reset(rebuild); - display_progress(progress, ++i); - } - }); - - stop_progress(&progress); - - free(reposition); - bitmap_free(rebuild); - return 0; + return reposition; } void free_bitmap_index(struct bitmap_index *b) diff --git a/pack-bitmap.h b/pack-bitmap.h index 1203120c43..afa4115136 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -73,7 +73,11 @@ void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, struct pack_idx_entry **index, uint32_t index_nr); -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping); +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Wed Nov 11 19:44:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12DB5C5517A for ; Wed, 11 Nov 2020 19:44:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B4644207F7 for ; Wed, 11 Nov 2020 19:44:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="0kWXM6Cf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727845AbgKKToK (ORCPT ); Wed, 11 Nov 2020 14:44:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55116 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727830AbgKKToK (ORCPT ); Wed, 11 Nov 2020 14:44:10 -0500 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C5C2C0613D1 for ; Wed, 11 Nov 2020 11:44:10 -0800 (PST) Received: by mail-qk1-x743.google.com with SMTP id q22so2917213qkq.6 for ; Wed, 11 Nov 2020 11:44:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=ToHhIdnfjJW8cH7okZum7KMkq/HVPVbdlqyPU4zMAZI=; b=0kWXM6CfVHbGDGQdY89jllyPc8pAOhJwUXUBRuznyULQy3t5MP5ppmvmAlg+soYxQe vxuv4E8wP56FWuVeTnhxjxB8QhqltluU6FwY2bL2Q0niCwpZpqEmKDlVNzODtqG4Jh36 R8LgsAoL0U8aLDi8qOdKjhFPmSgcLdUZV4hUETLdZkRpC/M6f7VDejeCdojFEkkl2Wwb dkYMI+i/wAANsO6bO7FXO/5chEBXPS3trDpT3WWGCip9xVyjFMXmYSWnPZjH24AnZzzl S03pixDjuTKzEw5AB4qkj8dZj1PA+6M/8qkjU7XZfBZstCQNHJpcs1KC/dd7/YBUsNre 9UaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=ToHhIdnfjJW8cH7okZum7KMkq/HVPVbdlqyPU4zMAZI=; b=VxzgLtV01nqIK2TzeYEIcXF8geNuvIRtZC/p0LtbXBeebZyE+8GfQiCQXJTwBY9Avi YoCWPW0/KvtktNq6KI5nEyKgEppBEO7oeeKEHwRYWvllmFniBFcFxRXER2pHhN0YVaUH 4LPZ5UJtb1HTv+jfNmpi0hLJyiNmW3QuQ+0xMDEDjypan1/RWhjsiPaGfHpj+Vw4Epb8 sVpwZlzE1YJ1KKMYLCU1v+NoIktx8HnAcH1GEKwxncJOqMiWNZxzbucNr5DkDK4vwF/D odLFKR3pzc/hfAOrJsuJBLkdys9J0O7BPhwpCcA1MOlDFG1yIFgeWcvSkDYdTNGGI31d YVPA== X-Gm-Message-State: AOAM533Tzes/RkWc7OZa7uR4abqT6oaVCZ+Lbn7uoDlnHonCtbU54cwu e8cHM0/kZDO41CCuzXbdSVnNjUrEbeHLtvpO X-Google-Smtp-Source: ABdhPJwOXr05fsOOcseGD6hokZAvQ8RHsRKRzP1oLkebitJxtHavQCW14veIOYOZHaCq/zgT/xUSMA== X-Received: by 2002:a37:8703:: with SMTP id j3mr22050441qkd.5.1605123849274; Wed, 11 Nov 2020 11:44:09 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id q20sm3126457qtl.69.2020.11.11.11.44.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:44:08 -0800 (PST) Date: Wed, 11 Nov 2020 14:44:06 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 19/23] pack-bitmap: factor out 'bitmap_for_commit()' Message-ID: <3aecb42215c31db61397edc9620f1e038753709e.1605123653.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A couple of callers within pack-bitmap.c duplicate logic to lookup a given object id in the bitamps khash. Factor this out into a new function, 'bitmap_for_commit()' to reduce some code duplication. Make this new function non-static, since it will be used in later commits from outside of pack-bitmap.c. Signed-off-by: Taylor Blau --- pack-bitmap.c | 33 +++++++++++++++++++-------------- pack-bitmap.h | 2 ++ 2 files changed, 21 insertions(+), 14 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 682f4d19dd..99a0683f49 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -375,6 +375,16 @@ struct include_data { struct bitmap *seen; }; +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit) +{ + khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps, + commit->object.oid); + if (hash_pos >= kh_end(bitmap_git->bitmaps)) + return NULL; + return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos)); +} + static inline int bitmap_position_extended(struct bitmap_index *bitmap_git, const struct object_id *oid) { @@ -460,10 +470,10 @@ static void show_commit(struct commit *commit, void *data) static int add_to_include_set(struct bitmap_index *bitmap_git, struct include_data *data, - const struct object_id *oid, + struct commit *commit, int bitmap_pos) { - khiter_t hash_pos; + struct ewah_bitmap *partial; if (data->seen && bitmap_get(data->seen, bitmap_pos)) return 0; @@ -471,10 +481,9 @@ static int add_to_include_set(struct bitmap_index *bitmap_git, if (bitmap_get(data->base, bitmap_pos)) return 0; - hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid); - if (hash_pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos); - bitmap_or_ewah(data->base, lookup_stored_bitmap(st)); + partial = bitmap_for_commit(bitmap_git, commit); + if (partial) { + bitmap_or_ewah(data->base, partial); return 0; } @@ -493,8 +502,7 @@ static int should_include(struct commit *commit, void *_data) (struct object *)commit, NULL); - if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid, - bitmap_pos)) { + if (!add_to_include_set(data->bitmap_git, data, commit, bitmap_pos)) { struct commit_list *parent = commit->parents; while (parent) { @@ -1277,10 +1285,10 @@ void test_bitmap_walk(struct rev_info *revs) { struct object *root; struct bitmap *result = NULL; - khiter_t pos; size_t result_popcnt; struct bitmap_test_data tdata; struct bitmap_index *bitmap_git; + struct ewah_bitmap *bm; if (!(bitmap_git = prepare_bitmap_git(revs->repo))) die("failed to load bitmap indexes"); @@ -1292,12 +1300,9 @@ void test_bitmap_walk(struct rev_info *revs) bitmap_git->version, bitmap_git->entry_count); root = revs->pending.objects[0].item; - pos = kh_get_oid_map(bitmap_git->bitmaps, root->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *bm = lookup_stored_bitmap(st); + bm = bitmap_for_commit(bitmap_git, (struct commit *)root); + if (bm) { fprintf(stderr, "Found bitmap for %s. %d bits / %08x checksum\n", oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm)); diff --git a/pack-bitmap.h b/pack-bitmap.h index afa4115136..25dfcf5615 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,6 +78,8 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, int rebuild_bitmap(const uint32_t *reposition, struct ewah_bitmap *source, struct bitmap *dest); +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Wed Nov 11 19:44:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898425 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DB55C5517A for ; Wed, 11 Nov 2020 19:44:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1CBC92087D for ; Wed, 11 Nov 2020 19:44:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="cHiUD2m/" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727626AbgKKToS (ORCPT ); Wed, 11 Nov 2020 14:44:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727132AbgKKToQ (ORCPT ); Wed, 11 Nov 2020 14:44:16 -0500 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27018C0613D1 for ; Wed, 11 Nov 2020 11:44:15 -0800 (PST) Received: by mail-qt1-x842.google.com with SMTP id b16so1954419qtb.6 for ; Wed, 11 Nov 2020 11:44:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=TZHmQ+gEVWeVWitBWD1fHXPDTy8E5OTpxkmbjVhKUS8=; b=cHiUD2m/U0bv+WzsVUidNHGA9uAVWvwj8RshQeGLifHpKZiIxARuE1BZffL6cWTj4P kCumZptrDNFYKKOyQaLu5wO43O9EVtrL9cB9sVPVtmwMfdyFD9o+RRkSouIU6Dui7qMY EPj/SJET12MTAYfiK2bbj7zuFo6q2yPlTptlycMPZoAosQM08RBYiyRxG+XOgFe3cGd9 lNTBonucKi/f6mNnZTKElk0uVgMWW6wHb8UcTIOOanWEK9Fd0sYMNJXQdIABfjaq+kDv TqRCp1E++3C6IFfPS6LFux2CTK8dpc0usy0GKbK4ZVlbsCrHkFOpUp1wGrTrGZnjhhLI EAcQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=TZHmQ+gEVWeVWitBWD1fHXPDTy8E5OTpxkmbjVhKUS8=; b=f3/TmwUr7sU2whD58fMxqs5BCeKX89WxWOJMYh/io5aP1BO2q4NJiHL+qm2CVEIHL2 vu4Z2AqmV1A9zNq4XUHbV9OTqZmD0P/TfzQxL+NBd1joqz5X+vIiTQE1pqftdGQCh1a3 Dlu13cp3d96rThgzF1SgRmiXt3CA1g9GNLY0QPOtQmDlZPomQEo3ldXn7zynYjn+BM8d IIKSbuikuIzBc9NqZt9QLDyfvlK3CMZp71Mmf4tBKatxC7droClQZ96wVZKxbkcNsv8W qhdiw1bO2GwP9g9hW2hLq4kyq1daDFUGUD19APDrc0H/EURFGlJpUkCBlfLXP9Apy8el KpqQ== X-Gm-Message-State: AOAM531J8qm82xpTgduUuaCJPgBgkXqDLEteXOd2boltrScv5M0hV0nw YZlN8aQtaTOosk+FHGdtCwDUn9v2gIeBzd2k X-Google-Smtp-Source: ABdhPJyH17xRKQqmJCtuOyVOTS7qblfx8knR0tR6CEwOQKAP80q/iIZcE7+1+3w7OFTiVr9bsouMUQ== X-Received: by 2002:aed:3ae5:: with SMTP id o92mr24535152qte.265.1605123854118; Wed, 11 Nov 2020 11:44:14 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id x3sm3034424qke.55.2020.11.11.11.44.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:44:13 -0800 (PST) Date: Wed, 11 Nov 2020 14:44:11 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 20/23] pack-bitmap: factor out 'add_commit_to_bitmap()' Message-ID: <2ee47f31ab7718e3b6b0d0079f70b0a9195677b3.1605123653.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'find_objects()' currently needs to interact with the bitmaps khash pretty closely. To make 'find_objects()' read a little more straightforwardly, remove some of the khash-level details into a new function that describes what it does: 'add_commit_to_bitmap()'. Signed-off-by: Taylor Blau --- pack-bitmap.c | 36 +++++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 99a0683f49..dc811ebae8 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -516,6 +516,23 @@ static int should_include(struct commit *commit, void *_data) return 1; } +static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, + struct bitmap **base, + struct commit *commit) +{ + struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit); + + if (!or_with) + return 0; + + if (*base == NULL) + *base = ewah_to_bitmap(or_with); + else + bitmap_or_ewah(*base, or_with); + + return 1; +} + static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct rev_info *revs, struct object_list *roots, @@ -539,21 +556,10 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct object *object = roots->item; roots = roots->next; - if (object->type == OBJ_COMMIT) { - khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *or_with = lookup_stored_bitmap(st); - - if (base == NULL) - base = ewah_to_bitmap(or_with); - else - bitmap_or_ewah(base, or_with); - - object->flags |= SEEN; - continue; - } + if (object->type == OBJ_COMMIT && + add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) { + object->flags |= SEEN; + continue; } object_list_insert(object, ¬_mapped); From patchwork Wed Nov 11 19:44:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898417 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E633C388F9 for ; Wed, 11 Nov 2020 19:44:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3AFC2087D for ; Wed, 11 Nov 2020 19:44:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="UwVYZLxI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727864AbgKKToY (ORCPT ); Wed, 11 Nov 2020 14:44:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55154 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727660AbgKKToW (ORCPT ); Wed, 11 Nov 2020 14:44:22 -0500 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15EF8C0613D1 for ; Wed, 11 Nov 2020 11:44:22 -0800 (PST) Received: by mail-qt1-x842.google.com with SMTP id v11so2159729qtq.12 for ; Wed, 11 Nov 2020 11:44:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=kNXQEb//1qY33cnwzPvcqXO19GELsXVnKcuW9OgoooY=; b=UwVYZLxIrXwlEw/Qaq9/meNHjsq8fjNoh7KW6vjIcsbjGMqjW95m1xCAN0ME1zkIP1 ZE4tJvabF6dp6oS2p0FzNERIc45JdXuj6+LRXNKzRCz3AE00fcmxqBm0o4XvHPGYiOze XU81UB9JJEV0rUgVsYpl0xkNFotYu3ncYDgQ0hsSXjtzj5m8e7Gnm2lPW0vPvX9Elo2g AazxGf5UzNyvKUM3Hh8Q6KJKuJ4c1SWcMHHVgQzWWJ/3hWoKAjgtW0FPDX/rVxbgCJFz 8cz6AZMJemwQpD3TzKQCf5eQfWQue4CHvq07JD5sOGrfjZZV7cktFl/ll6B9qOO4u83A KiFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=kNXQEb//1qY33cnwzPvcqXO19GELsXVnKcuW9OgoooY=; b=BY3OsHNn77SpOrjynvgJmulcYwGi2pNhpa1dO4AiRBHL17xW5tealEoIgLXnxGwIDB GGu7FQtSOodiRgZpBZHHL/N4l9Cvqnb0eSHd3Rm3eucf0z02Qk12PsvLSqAIKCh/VEy1 nfWGJ8cIIk7Mqbaxd6ayIx1MBRPqNnDQieW9n43NGobY/3DYwa6DX7x5HDgiTXwWUk+6 cpUIkLL+HcD5kmirqBpscSmspL6SNLSdq1BiukJKXlQzrfV4QaaByeKten2TVQxzRClB 6mKkNdWc9inAJdSDi2IJalvfZxsIDrBAN+4x5yfDFIdoHcP6A0nLhM/r0sTzpd2++vgs Y03A== X-Gm-Message-State: AOAM532jUSIJFapS+6/mfF/m4DLAGQjkxL5QM4KMpXFph3JDrgGvMEKj SBCNRnTanwAXrSMZ8hDgQlQ2cFXwb6BnSxmV X-Google-Smtp-Source: ABdhPJwtkid05INFExQgi8iwruGIP882/zF6SKE7HAX6tshOUhJfSIJnbYw7FnSZsy1nRyfHc4KmZw== X-Received: by 2002:ac8:578a:: with SMTP id v10mr24639126qta.277.1605123860997; Wed, 11 Nov 2020 11:44:20 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id o9sm3215946qte.35.2020.11.11.11.44.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:44:20 -0800 (PST) Date: Wed, 11 Nov 2020 14:44:18 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 21/23] pack-bitmap-write: use existing bitmaps Message-ID: <3f8235180636ea6735302c6e415d406676282090.1605123653.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee When constructing new bitmaps, we perform a commit and tree walk in fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit from using existing bitmaps when available. We must track the existing bitmaps and translate them into the new object order, but this is generally faster than parsing trees. In fill_bitmap_commit(), we must reorder thing somewhat. The priority queue walks commits from newest-to-oldest, which means we correctly stop walking when reaching a commit with a bitmap. However, if we walk trees from top to bottom, then we might be parsing trees that are actually part of a re-used bitmap. To avoid over-walking trees, add them to a LIFO queue and walk them from bottom-to-top after exploring commits completely. On git.git, this reduces a second immediate bitmap computation from 2.0s to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork network, we go from 227s to 198s. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 42 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 1995f75818..37204b691c 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -340,20 +340,39 @@ static void fill_bitmap_tree(struct bitmap *bitmap, static void fill_bitmap_commit(struct bb_commit *ent, struct commit *commit, - struct prio_queue *queue) + struct prio_queue *queue, + struct prio_queue *tree_queue, + struct bitmap_index *old_bitmap, + const uint32_t *mapping) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); prio_queue_put(queue, commit); while (queue->nr) { struct commit_list *p; struct commit *c = prio_queue_get(queue); + /* + * If this commit has an old bitmap, then translate that + * bitmap and add its bits to this one. No need to walk + * parents or the tree for this commit. + */ + if (old_bitmap && mapping) { + struct ewah_bitmap *old; + + old = bitmap_for_commit(old_bitmap, c); + if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) + continue; + } + + /* + * Mark ourselves and queue our tree. The commit + * walk ensures we cover all parents. + */ bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + prio_queue_put(tree_queue, get_commit_tree(c)); for (p = c->parents; p; p = p->next) { int pos = find_object_pos(&p->item->object.oid); @@ -363,6 +382,9 @@ static void fill_bitmap_commit(struct bb_commit *ent, } } } + + while (tree_queue->nr) + fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue)); } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -386,6 +408,9 @@ void bitmap_writer_build(struct packing_data *to_pack) size_t i; int nr_stored = 0; /* for progress */ struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + struct prio_queue tree_queue = { NULL }; + struct bitmap_index *old_bitmap; + uint32_t *mapping; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -395,6 +420,12 @@ void bitmap_writer_build(struct packing_data *to_pack) trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", the_repository); + old_bitmap = prepare_bitmap_git(to_pack->repo); + if (old_bitmap) + mapping = create_bitmap_mapping(old_bitmap, to_pack); + else + mapping = NULL; + bitmap_builder_init(&bb, &writer); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; @@ -402,7 +433,8 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit, &queue); + fill_bitmap_commit(ent, commit, &queue, &tree_queue, + old_bitmap, mapping); if (ent->selected) { store_selected(ent, commit); @@ -428,7 +460,9 @@ void bitmap_writer_build(struct packing_data *to_pack) ent->bitmap = NULL; } clear_prio_queue(&queue); + clear_prio_queue(&tree_queue); bitmap_builder_clear(&bb); + free(mapping); stop_progress(&writer.progress); From patchwork Wed Nov 11 19:44:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898431 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BB41C5517A for ; Wed, 11 Nov 2020 19:44:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AD6D7207F7 for ; Wed, 11 Nov 2020 19:44:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="m5KIE8l9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727874AbgKKTo2 (ORCPT ); Wed, 11 Nov 2020 14:44:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55172 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727868AbgKKTo2 (ORCPT ); Wed, 11 Nov 2020 14:44:28 -0500 Received: from mail-qv1-xf42.google.com (mail-qv1-xf42.google.com [IPv6:2607:f8b0:4864:20::f42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29008C0613D1 for ; Wed, 11 Nov 2020 11:44:28 -0800 (PST) Received: by mail-qv1-xf42.google.com with SMTP id ec16so1539788qvb.0 for ; Wed, 11 Nov 2020 11:44:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Bh/sxgDmzH4yBz/lJLa665MnoIXLK85KXlMvWajUJ6Y=; b=m5KIE8l95MqfdcP4zM8oN2qi4NPIudIQHITuK26m+qOv9FCBaBsxcCgBuAC+cL28xn twqvIAtEIn1LiRRbYYUoP9Ch4dexjJEgyRmlqzcyiBjgSUqPKga2xYm69KrEqAfKoaAm k8iY1fgBgTvGBlyL8xhKBpTAmduBpADZrOqnEd7qQK5+Zht+kykZa/9nAnzz8z9FsAWG Sn252zNpIZsfEgv7jN2d+tjLq8a4huRgyakssTNIYBk1+F6lJ4Ws6ngeSjvUWYdb7ZVE 93qtBuJJ4IAtLTkUpJRnS6/dZB1Xo71PJdhrL2Mjd0KSc6/BdzfJscuN7lbVeWwcLwrQ xDkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Bh/sxgDmzH4yBz/lJLa665MnoIXLK85KXlMvWajUJ6Y=; b=dsubNK9rLAzh34BuADbtJMyX7dk+0IDHTT9GIfwNB4yuu4SyMmLyTUbdrUfEFpRL1U kblSsV8S7Nc1ALKevhHtp7phHD/ckJaIae15xit2UtsXmmva5b6HV1jHijOpYM1d7qaK zOqrynIcFaVA3YzbOVy0OCXEO9gvakiwZgOFhyYkT7IBKtavuQMSHrZXQH9/nQDLIbA1 HvVMT+MCvmhqpZo8q6LQVjbW/YJa8ykEuu1dk9AHwKSP0kiC58mggkvgiC2MSiwDh9Ga 8R8QXUmE1wkO8dRtjixrdSD07kQEw1rB4eDRd1n7bCiGGNauAQdceeE5QzgLnD7IUOax pEhw== X-Gm-Message-State: AOAM532YABZcNsUZxH1pbTR8ubH4zIqBuCglNAlDnfgq6Il4okOTFg7A yIG9chPm1JIChWnbRflLBUnyxE0Sqvpm2rVH X-Google-Smtp-Source: ABdhPJzuE6eNmzSE8q1hfUY2RBIGW51tqYBJ+hluvk38cqDfOi1hQU2PaTTp0Xd4Ff79HA+8T31a1g== X-Received: by 2002:a0c:db8c:: with SMTP id m12mr3321359qvk.11.1605123867051; Wed, 11 Nov 2020 11:44:27 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id y3sm2946813qto.2.2020.11.11.11.44.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:44:26 -0800 (PST) Date: Wed, 11 Nov 2020 14:44:24 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 22/23] pack-bitmap-write: relax unique rewalk condition Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The previous commits improved the bitmap computation process for very long, linear histories with many refs by removing quadratic growth in how many objects were walked. The strategy of computing "intermediate commits" using bitmasks for which refs can reach those commits partitioned the poset of reachable objects so each part could be walked exactly once. This was effective for linear histories. However, there was a (significant) drawback: wide histories with many refs had an explosion of memory costs to compute the commit bitmasks during the exploration that discovers these intermediate commits. Since these wide histories are unlikely to repeat walking objects, the benefit of walking objects multiple times was not expensive before. But now, the commit walk *before computing bitmaps* is incredibly expensive. In an effort to discover a happy medium, this change reduces the walk for intermediate commits to only the first-parent history. This focuses the walk on how the histories converge, which still has significant reduction in repeat object walks. It is still possible to create quadratic behavior in this version, but it is probably less likely in realistic data shapes. Here is some data taken on a fresh clone of the kernel: | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- original | 64.044 | 83.241 | 2.088 | 2.194 | last patch | 44.811 | 27.828 | 2.289 | 2.358 | this patch | 100.641 | 35.560 | 2.152 | 2.224 | Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 14 +++++--------- t/t5310-pack-bitmaps.sh | 27 ++++++++++++++------------- 2 files changed, 19 insertions(+), 22 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 37204b691c..b0493d971d 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -199,7 +199,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i, num_maximal; + unsigned int i, num_maximal = 0; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -207,6 +207,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, reset_revision_walk(); repo_init_revisions(writer->to_pack->repo, &revs, NULL); revs.topo_order = 1; + revs.first_parent_only = 1; for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; @@ -221,13 +222,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, add_pending_object(&revs, &c->object, ""); } - num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { - struct commit_list *p; + struct commit_list *p = commit->parents; struct bb_commit *c_ent; parse_commit_or_die(commit); @@ -235,16 +235,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); if (c_ent->maximal) { - if (!c_ent->selected) { - bitmap_set(c_ent->commit_mask, num_maximal); - num_maximal++; - } - + num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); bb->commits[bb->commits_nr++] = commit; } - for (p = commit->parents; p; p = p->next) { + if (p) { struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); int c_not_p, p_not_c; diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 33ef9a098d..68badd63cb 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -43,23 +43,24 @@ has_any () { # \| # * (base) # +# We only push bits down the first-parent history, which +# makes some of these commits unimportant! +# # The important part for the maximal commit algorithm is how # the bitmasks are extended. Assuming starting bit positions -# for master (bit 0) and other (bit 1), and some flexibility -# in the order that merge bases are visited, the bitmasks at -# the end should be: +# for master (bit 0) and other (bit 1), the bitmasks at the +# end should be: # # master: 1 (maximal, selected) # other: 01 (maximal, selected) -# octo-master: 1 -# octo-other: 01 -# merge-right: 111 (maximal) -# (l1): 111 -# (r1): 111 -# merge-left: 1101 (maximal) -# (l2): 11111 (maximal) -# (r2): 111101 (maximal) -# (base): 1111111 (maximal) +# (base): 11 (maximal) +# +# This complicated history was important for a previous +# version of the walk that guarantees never walking a +# commit multiple times. That goal might be important +# again, so preserve this complicated case. For now, this +# test will guarantee that the bitmaps are computed +# correctly, even with the repeat calculations. test_expect_success 'setup repo with moderate-sized history' ' test_commit_bulk --id=file 10 && @@ -113,7 +114,7 @@ test_expect_success 'full repack creates bitmaps' ' ls .git/objects/pack/ | grep bitmap >output && test_line_count = 1 output && grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && - grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace + grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' From patchwork Wed Nov 11 19:44:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11898427 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD64CC388F9 for ; Wed, 11 Nov 2020 19:44:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 864A8207F7 for ; Wed, 11 Nov 2020 19:44:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="If+bFuul" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727875AbgKKTod (ORCPT ); Wed, 11 Nov 2020 14:44:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727660AbgKKTod (ORCPT ); Wed, 11 Nov 2020 14:44:33 -0500 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 91BE2C0613D1 for ; Wed, 11 Nov 2020 11:44:32 -0800 (PST) Received: by mail-qk1-x742.google.com with SMTP id n132so2939924qke.1 for ; Wed, 11 Nov 2020 11:44:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=QzJrkPUzW98AR8zeqXneGhd/FKkWTXJQDhZM9/p9aqI=; b=If+bFuul1fQz9BRBPMZmO6If5RWx8MGJN17zA6k8u+p2XNsXPszmz2sL5o7XnhLFFS 2vc1Yhs4Qm/AOGN20uu1ysWRGxiUJdH2xBhnY00GbT5xTnD4wdqgiDTRdyJcJqHGviBX CV81+dgKdPlGRFSYDszyaOmszA79TU8XMQxYF8CLzWBxnfR/K4kiFO/tGscHfQSBTbUo eHRtmze+lxV11WCJhYK1nKxy3ajw9zjM4TodfArI/4ASUwaGi1tU5FWz0CiB8wCrGOp+ lcojA6gy6XAQWqrZY2f7o/0J7ofDeKOun407vPSNyO24JhlTys8STgEMAuKaUcGzH2Lw cjdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=QzJrkPUzW98AR8zeqXneGhd/FKkWTXJQDhZM9/p9aqI=; b=SHLdnZHov1T65aGXAXW0Zly/E+vm7ZVDjk460LhQM3lu5L7PnjDPAVecMY038sTqn7 GQxeNMeazRHNglLix1OSUCmjwD5l8PdeEK9ZZTdbpZVp4J1pmwE8qh35U6pZzRXU+d5D eIwMc8TGF9Czy2/ciTN9axHYqJDxDONL7avKukkKmeYFTmbz3GCk5FxwHF2l1adiJVFc 9c8fEONcQi18woGu1XsciodeRk/6C5hY6mq9xJX7a+1NVjAZkciTc3vCUBnl0dwZxNpM 40ZZjKebCRakUVEHpSvkhHKNOWhbV8nnypVewzpZmr7HI3WIvMgJn7BHfvX5ln5TzSsO od8w== X-Gm-Message-State: AOAM531ujo22dfJcXqoKd1+LOKcTsZ7D6ttCrfvKdOqYE5QX/igYo4fA dYMTbhfGhv7jGXgCVkRIqlPnoaD/VU5RupL6 X-Google-Smtp-Source: ABdhPJwnCd3S53sYVmJx7Alj8mtadj8Jn4aFt4Kk3ULDp0+LQyq96Rqdm8kP9b0FgGjTRdJpEULnDg== X-Received: by 2002:a37:a185:: with SMTP id k127mr8786667qke.413.1605123871567; Wed, 11 Nov 2020 11:44:31 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id g66sm3061069qkb.91.2020.11.11.11.44.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:44:30 -0800 (PST) Date: Wed, 11 Nov 2020 14:44:28 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 23/23] pack-bitmap-write: better reuse bitmaps Message-ID: <35beb4f098991bb3886bbaf74f9d2fc5561bca42.1605123653.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee If the old bitmap file contains a bitmap for a given commit, then that commit does not need help from intermediate commits in its history to compute its final bitmap. Eject that commit from the walk and insert it as a maximal commit in the list of commits for computing bitmaps. This helps the repeat bitmap computation task, even if the selected commits shift drastically. This helps when a previously-bitmapped commit exists in the first-parent history of a newly-selected commit. Since we stop the walk at these commits and we use a first-parent walk, it is harder to walk "around" these bitmapped commits. It's not impossible, but we can greatly reduce the computation time for many selected commits. | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- last patch | 100.641 | 35.560 | 2.152 | 2.224 | this patch | 99.720 | 11.696 | 2.152 | 2.217 | Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index b0493d971d..3ac90ae410 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -195,7 +195,8 @@ struct bitmap_builder { }; static void bitmap_builder_init(struct bitmap_builder *bb, - struct bitmap_writer *writer) + struct bitmap_writer *writer, + struct bitmap_index *old_bitmap) { struct rev_info revs; struct commit *commit; @@ -234,12 +235,26 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); + if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) { + /* + * This commit has an existing bitmap, so we can + * get its bits immediately without an object + * walk. There is no need to continue walking + * beyond this commit. + */ + c_ent->maximal = 1; + p = NULL; + } + if (c_ent->maximal) { num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); bb->commits[bb->commits_nr++] = commit; } + if (!c_ent->commit_mask) + continue; + if (p) { struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); int c_not_p, p_not_c; @@ -422,7 +437,7 @@ void bitmap_writer_build(struct packing_data *to_pack) else mapping = NULL; - bitmap_builder_init(&bb, &writer); + bitmap_builder_init(&bb, &writer, old_bitmap); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit);