From patchwork Tue Nov 17 21:46:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913581 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47B8AC63697 for ; Tue, 17 Nov 2020 21:46:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E00212463B for ; Tue, 17 Nov 2020 21:46:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="ZWVI95Zm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728375AbgKQVq3 (ORCPT ); Tue, 17 Nov 2020 16:46:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728045AbgKQVq3 (ORCPT ); Tue, 17 Nov 2020 16:46:29 -0500 Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 053B4C0613CF for ; Tue, 17 Nov 2020 13:46:28 -0800 (PST) Received: by mail-qv1-xf41.google.com with SMTP id z17so11562135qvy.11 for ; Tue, 17 Nov 2020 13:46:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=UplR+lA1i/f+BbgtcWFX3ZDVG/ISKvLsoJrbR1qzcvg=; b=ZWVI95ZmMWMhkQKC5ovkr6Zz6w4jhLv7yZLdvYjGWQZOeZu2G+JWzyD+2YdNQc5B30 pDXPNEiCX4Kol+NleyYHutPeq0gSp7hCULvqdhaIkAVjARJdlAuQP8LluDJ/UaElMi/5 KnavQrXCcwYE9TzPdcWIrQqvhD+MCbNolYmx3uL72Iyt34XMYzAe5YyX1SCGrH5EyEXg g6VarKO3IkZzy0TKefu/MoSzrCMkroeTZe+++UQU4k7otISSR2X3Njfqe7o8JT+6yozA c69Vn33ABpL11bjY8B4myoMnH6VQNMRec9wEl03I/uorgETEJxCQifkb/xmmoC7YNdQy UYjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=UplR+lA1i/f+BbgtcWFX3ZDVG/ISKvLsoJrbR1qzcvg=; b=O4mbbE/8+ayw0T6KEmMeAmzUL0qrifhD1FNsPPFj8IVhQuj9deV19p2SHx6L8ZPpv1 LGH0r/FjANznH0nwcPYau3djoesMlRQofgL5p1rKEFFSwD5ZhNLGzjzD/5SRj8VV2dH3 TCtmZfjXrr6clqUne72wPKrjpGtMoxgOT47+dHk9Ka53Hhn/ewPcyckjDGGvYSVUHQps 3whY5ZrK6sHZSqji9dVmM/X+zGUdfamx7kgXTcEsM1g7x1MxaTrIxY/I+2Pef3EhAqjD M8cXgWUg389YpkGzkk9Jh4axsCRgZSeMeW4JKnbwNi7I4S/RTQoFD5Y19tGv3geVjyjy aT7g== X-Gm-Message-State: AOAM533ZwPsq3Wuvq6dEQ/MRVb4E8J3uVah2G62Zi4oG4yhB+GNJ8+dH +EhFYbFVoMJz/YVX9mq0w8pnM2ePoB8CwkV5 X-Google-Smtp-Source: ABdhPJyMvVpeHCB8QQJap5776Al83EHT6ysXacb7gTbEn5zC0IjX04LylhDHNyKH+oqcxvnCesDk3w== X-Received: by 2002:a0c:e608:: with SMTP id z8mr1908391qvm.2.1605649586936; Tue, 17 Nov 2020 13:46:26 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id v14sm15556520qkb.15.2020.11.17.13.46.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:46:26 -0800 (PST) Date: Tue, 17 Nov 2020 16:46:24 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 01/24] ewah/ewah_bitmap.c: grow buffer past 1 Message-ID: <07054ff8ee43c4361c472e40b72f767107f66ed8.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When the buffer size is exactly 1, we fail to grow it properly, since the integer truncation means that 1 * 3 / 2 = 1. This can cause a bad write on the line below. Bandaid this by first padding the buffer by 16, and then growing it. This still allows old blocks to fit into new ones, but fixes the case where the block size equals 1. Co-authored-by: Jeff King Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/ewah_bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c index d59b1afe3d..3fae04ad00 100644 --- a/ewah/ewah_bitmap.c +++ b/ewah/ewah_bitmap.c @@ -45,7 +45,7 @@ static inline void buffer_grow(struct ewah_bitmap *self, size_t new_size) static inline void buffer_push(struct ewah_bitmap *self, eword_t value) { if (self->buffer_size + 1 >= self->alloc_size) - buffer_grow(self, self->buffer_size * 3 / 2); + buffer_grow(self, (self->buffer_size + 16) * 3 / 2); self->buffer[self->buffer_size++] = value; } From patchwork Tue Nov 17 21:46:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913583 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9024FC63777 for ; Tue, 17 Nov 2020 21:46:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2D212241A7 for ; Tue, 17 Nov 2020 21:46:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="eJe8NBLN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728399AbgKQVqi (ORCPT ); Tue, 17 Nov 2020 16:46:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728391AbgKQVqh (ORCPT ); Tue, 17 Nov 2020 16:46:37 -0500 Received: from mail-qk1-x72a.google.com (mail-qk1-x72a.google.com [IPv6:2607:f8b0:4864:20::72a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACDB9C0613CF for ; Tue, 17 Nov 2020 13:46:37 -0800 (PST) Received: by mail-qk1-x72a.google.com with SMTP id v143so22408687qkb.2 for ; Tue, 17 Nov 2020 13:46:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=3oGTX7BFfV53cenzji9PUIDZ4Kg6jEo0uaSL1SEMJVo=; b=eJe8NBLNR2D69yo1hD4ZxBBrKW0rHuPIYXazNDeKmf4zuA5Vv8fK60sT+HBc8Z/yCd wfeCKrfvyuG0q8X+3zFLLlVGDq+Wsq+RQn5xQtZlzO9kg+uvFwMCsSKsuIP2fZFK+epo 4CY4VDgCc+ivuK2/zBIlqLX3hCVHxRnxFiqsh5S4YRmc2ka6XqHTXWDKsKyHqUJOAz3i myUhbj5e9+wgMqSNBmyeUiWBSjL7OYV89M7Jdykr+2kzQ2UzLNeGkHzrFKl0bj8OGEAr Fja2MuEltGvBlIitx5OUzf3lnk4mtoUDE6WpUzK0LnEwB/a/aKKjzwzLdrI9/kt2IHmR iaMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=3oGTX7BFfV53cenzji9PUIDZ4Kg6jEo0uaSL1SEMJVo=; b=NSoFr0LSlA7Y5BVkyJrASKMTLlO29/+hECsaattrjLtt3GG8o+MUZT75OsW27dy8x8 SZlqGJCRzG/45V03lALxza43bIxOxDZz8DPKIkJHSskwWPxyulcBPjnrvWQs1/7oZ/nR iyX/KiHxOLwSy3is+6ibvsmlHv7rIh/3Gv5aGPQHW9CjsV4d51tEA8YUuF0+O6p5mzc2 rlghDgQvCu44DhcCBiyqRDo0V8YDZQMEtxb0JyfLEUB7Yzv0qSOpJgcfnk45mdEw619p fgRrccUu0aS0tmHK9ctiLflu4Hp/TPeol9i5EQrdVppp3HIqHqDbEzISWo4XOa8vTOba ImBw== X-Gm-Message-State: AOAM533FxfvSlo3PY2Om9rF9MflFvVMuZxQjKL3Bo3vX11LttvBAHFte MlKi+trR4AlU9bUWcOX98aZ+7+GvyTtdLAan X-Google-Smtp-Source: ABdhPJyyjC6hcGwMNblfpkKM0WIEzHc9MLbqVOBrGxn47aT72VEVq/kPysEZaKeo7l2BduSSaHAHdA== X-Received: by 2002:a37:aad2:: with SMTP id t201mr1710938qke.61.1605649596518; Tue, 17 Nov 2020 13:46:36 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id z26sm14618337qki.40.2020.11.17.13.46.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:46:35 -0800 (PST) Date: Tue, 17 Nov 2020 16:46:33 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 02/24] pack-bitmap: fix header size check Message-ID: <74a13b4a6edaa8f9c679019017fafe2fd8512357.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King When we parse a .bitmap header, we first check that we have enough bytes to make a valid header. We do that based on sizeof(struct bitmap_disk_header). However, as of 0f4d6cada8 (pack-bitmap: make bitmap header handling hash agnostic, 2019-02-19), that struct oversizes its checksum member to GIT_MAX_RAWSZ. That means we need to adjust for the difference between that constant and the size of the actual hash we're using. That commit adjusted the code which moves our pointer forward, but forgot to update the size check. This meant we were overly strict about the header size (requiring room for a 32-byte worst-case hash, when sha1 is only 20 bytes). But in practice it didn't matter because bitmap files tend to have at least 12 bytes of actual data anyway, so it was unlikely for a valid file to be caught by this. Let's fix it by pulling the header size into a separate variable and using it in both spots. That fixes the bug and simplifies the code to make it harder to have a mismatch like this in the future. It will also come in handy in the next patch for more bounds checking. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 4077e731e8..fe5647e72e 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -138,9 +138,10 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) static int load_bitmap_header(struct bitmap_index *index) { struct bitmap_disk_header *header = (void *)index->map; + size_t header_size = sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; - if (index->map_size < sizeof(*header) + the_hash_algo->rawsz) - return error("Corrupted bitmap index (missing header data)"); + if (index->map_size < header_size + the_hash_algo->rawsz) + return error("Corrupted bitmap index (too small)"); if (memcmp(header->magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)) != 0) return error("Corrupted bitmap index file (wrong header)"); @@ -164,7 +165,7 @@ static int load_bitmap_header(struct bitmap_index *index) } index->entry_count = ntohl(header->entry_count); - index->map_pos += sizeof(*header) - GIT_MAX_RAWSZ + the_hash_algo->rawsz; + index->map_pos += header_size; return 0; } From patchwork Tue Nov 17 21:46:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913585 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27AEAC6379D for ; Tue, 17 Nov 2020 21:46:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BEDC6241A7 for ; Tue, 17 Nov 2020 21:46:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="PwWO4DV3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728410AbgKQVqp (ORCPT ); Tue, 17 Nov 2020 16:46:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728391AbgKQVqo (ORCPT ); Tue, 17 Nov 2020 16:46:44 -0500 Received: from mail-qt1-x834.google.com (mail-qt1-x834.google.com [IPv6:2607:f8b0:4864:20::834]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2FAF7C0613CF for ; Tue, 17 Nov 2020 13:46:43 -0800 (PST) Received: by mail-qt1-x834.google.com with SMTP id f93so17077814qtb.10 for ; Tue, 17 Nov 2020 13:46:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=4d20uyG9cZ5B5GqRS0mcoB4LsZ3zxHoVoqqlcKj8nJw=; b=PwWO4DV3oppq+sjcVRaPn+hk7WMsOg7a04C2ZACrd8PeEuhrxsHCDUhkn5PuqQHgkw rLZc53DEeXudjsEo8AjUQcLGObenBmPwQD0WNqyzUZ2A2s2WAYRGubFyojK3hq2+6QUM X2Cw+ggiEARfAh5DySlaGUQQFIogfSIQHoAlZnb3/KpstoTziCLQSB9qFlpTAvNXlz6M XOyJqbmk0DS9gK+bcLpqvos1Kl6WrB+2t4mQ55HmlNbNLtXLk9XSza3fBvILfVWMXvIS SACPUUm6X/NelpzEJfESWclnuJxONmWUPoIOvb4WhNRIMZAIo7IIJDLvE6Aw1+RppsH0 yFLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=4d20uyG9cZ5B5GqRS0mcoB4LsZ3zxHoVoqqlcKj8nJw=; b=EEm/cyEysfCSues2d1jIX9LMcRGIEJRYRnXjWb7/Gdy4/DkiDecXNXNl2p1f/CghPL w2PJFJ6YG7pHTiQ+IZz/wtepAFNq5fn4U3uyGrXjP7uAw4aRlPc/+62trgdBOmBh+5VT GBweD9Mr5Gl42s+4G9/PGnhEXvVlhts76sI10aHgjMzEwW9/1s6b77y2mUoS/yoLj0Ld RsLIzutJcX60LnyYfusRfSE1goxRrpgwWHU1CKZPREzPrmzHL9+agw9Goo0EocQtVOwk QbvbW6kXJ00ceiSj2sF9zz+Dim6oGLqHgsoyf8QKMKA9sBK8az+hlrVUMYIxRtDBenaq nISQ== X-Gm-Message-State: AOAM531gHtHDgHkkIKoFNfvmvQlo+yEv7orgqwA0sPwXteDkHEIJEUjT KsBNfOweNWHw8cmK1+IOnQaCr7P4xqahU71J X-Google-Smtp-Source: ABdhPJyENf8MX9syLITDvXVTLxs38jR1CSZx39qYt9EgI/Ijep6dUMIB0fsWawj0vm1iAuuOwTzgSQ== X-Received: by 2002:ac8:5c94:: with SMTP id r20mr1839630qta.158.1605649602060; Tue, 17 Nov 2020 13:46:42 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id n125sm11491325qkd.85.2020.11.17.13.46.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:46:41 -0800 (PST) Date: Tue, 17 Nov 2020 16:46:38 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 03/24] pack-bitmap: bounds-check size of cache extension Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King A .bitmap file may have a "name hash cache" extension, which puts a sequence of uint32_t values (one per object) at the end of the file. When we see a flag indicating this extension, we blindly subtract the appropriate number of bytes from our available length. However, if the .bitmap file is too short, we'll underflow our length variable and wrap around, thinking we have a very large length. This can lead to reading out-of-bounds bytes while loading individual ewah bitmaps. We can fix this by checking the number of available bytes when we parse the header. The existing "truncated bitmap" test is now split into two tests: one where we don't have this extension at all (and hence actually do try to read a truncated ewah bitmap) and one where we realize up-front that we can't even fit in the cache structure. We'll check stderr in each case to make sure we hit the error we're expecting. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 8 ++++++-- t/t5310-pack-bitmaps.sh | 17 +++++++++++++++-- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index fe5647e72e..074d9ac8f2 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -153,14 +153,18 @@ static int load_bitmap_header(struct bitmap_index *index) /* Parse known bitmap format options */ { uint32_t flags = ntohs(header->options); + size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t)); + unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz; if ((flags & BITMAP_OPT_FULL_DAG) == 0) return error("Unsupported options for bitmap index file " "(Git requires BITMAP_OPT_FULL_DAG)"); if (flags & BITMAP_OPT_HASH_CACHE) { - unsigned char *end = index->map + index->map_size - the_hash_algo->rawsz; - index->hashes = ((uint32_t *)end) - index->pack->num_objects; + if (cache_size > index_end - index->map - header_size) + return error("corrupted bitmap index file (too short to fit hash cache)"); + index->hashes = (void *)(index_end - cache_size); + index_end -= cache_size; } } diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 8318781d2b..e2c3907a68 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -343,7 +343,8 @@ test_expect_success 'pack reuse respects --incremental' ' test_must_be_empty actual ' -test_expect_success 'truncated bitmap fails gracefully' ' +test_expect_success 'truncated bitmap fails gracefully (ewah)' ' + test_config pack.writebitmaphashcache false && git repack -ad && git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && @@ -352,7 +353,19 @@ test_expect_success 'truncated bitmap fails gracefully' ' mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && - test_i18ngrep corrupt stderr + test_i18ngrep corrupt.ewah.bitmap stderr +' + +test_expect_success 'truncated bitmap fails gracefully (cache)' ' + git repack -ad && + git rev-list --use-bitmap-index --count --all >expect && + bitmap=$(ls .git/objects/pack/*.bitmap) && + test_when_finished "rm -f $bitmap" && + test_copy_bytes 512 <$bitmap >$bitmap.tmp && + mv -f $bitmap.tmp $bitmap && + git rev-list --use-bitmap-index --count --all >actual 2>stderr && + test_cmp expect actual && + test_i18ngrep corrupted.bitmap.index stderr ' # have_delta From patchwork Tue Nov 17 21:46:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913587 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A850EC2D0E4 for ; Tue, 17 Nov 2020 21:46:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 63AE4241A7 for ; Tue, 17 Nov 2020 21:46:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="jErih8+T" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728415AbgKQVqt (ORCPT ); Tue, 17 Nov 2020 16:46:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728391AbgKQVqt (ORCPT ); Tue, 17 Nov 2020 16:46:49 -0500 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B9459C0613CF for ; Tue, 17 Nov 2020 13:46:47 -0800 (PST) Received: by mail-qk1-x744.google.com with SMTP id u4so22344484qkk.10 for ; Tue, 17 Nov 2020 13:46:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=JtHKqRC+mJFp8b40zjnzmvm8xtAum+EmraJL4uzwst0=; b=jErih8+TIfvG4xvDTWIZO2JDJl2trrVXjm88991M+982/C+HuWRqf1Yhc9Xcf+jO7E dqVoJqQUKzSCaTvIPqicyxiJY5zzZDgQ539UyFRgIbqCo6t+x/NgSVxce9C8qpB2cCD5 +24ORzJOZx6yP8PcXsiyYxsV3Dwvf7VmBYO5tjSP3zvZpqBBF36xc8Ilm6YrS0NPf8oQ IEtDJrbxyAhyujhmDapFWjL12KUIlPjtzPqugUgKeF2WK6B4M5spwHUZpknXVniCefE5 2lf2vHpoiNGzoYjAUDJp4oz+vUquJcgak20OVJw6s7WCj3yR1fvvbPomeSc4BVNYcjtc 7daQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=JtHKqRC+mJFp8b40zjnzmvm8xtAum+EmraJL4uzwst0=; b=n20bDkuO9FjJmnxSUS2cLEnL/zTdly1eexHT42tn3A7qcju1zZVh0fde8gheNSgobz UmYoaW6Cw2k7Lecp7n0Vq+23B2cSqipYPN3RnvgRLlkXuJTK2F3DbE6fBjEScGFHcN1P +OV9X84fR5TPqt4PdPsaJtR4EjWdDYNE9edT84ZyXn4B69y8HBJTbrpOSNdkOEHdfCkJ 9c3VWLzELlN6aE1toYiJ/YJNS5NqUYdyuSPJ5MIo1X8TPqnD684s1kgGvY+ObfHFJDmg nXnYkUoypC4lVmmPv4LkwANH07GDOY4YKoFRx2P97XM5OroLQ7ILPJdlV0coWF0oDaY7 HctA== X-Gm-Message-State: AOAM532O2ljSiFZxe++ByV48nNiYRpi3UxvenyFTFX7DWacBVjSqjFg4 8aMstoyRdly0Esvz2qIbE1TdTS4JxmHWbI4A X-Google-Smtp-Source: ABdhPJy1DQmUB5hvbOP+xlyuEeatVbs1zf+PkKrJv3UbHcCwc6SmoJvQccBZqQEoFGszCKxlcZ3J0g== X-Received: by 2002:ae9:e919:: with SMTP id x25mr1808698qkf.50.1605649606718; Tue, 17 Nov 2020 13:46:46 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id q32sm10615354qtb.71.2020.11.17.13.46.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:46:46 -0800 (PST) Date: Tue, 17 Nov 2020 16:46:44 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 04/24] t5310: drop size of truncated ewah bitmap Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We truncate the .bitmap file to 512 bytes and expect to run into problems reading an individual ewah file. But this length is somewhat arbitrary, and just happened to work when the test was added in 9d2e330b17 (ewah_read_mmap: bounds-check mmap reads, 2018-06-14). An upcoming commit will change the size of the history we create in the test repo, which will cause this test to fail. We can future-proof it a bit more by reducing the size of the truncated bitmap file. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index e2c3907a68..70a4fc4843 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -349,7 +349,7 @@ test_expect_success 'truncated bitmap fails gracefully (ewah)' ' git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && test_when_finished "rm -f $bitmap" && - test_copy_bytes 512 <$bitmap >$bitmap.tmp && + test_copy_bytes 256 <$bitmap >$bitmap.tmp && mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && From patchwork Tue Nov 17 21:46:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913589 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1AC5AC63697 for ; Tue, 17 Nov 2020 21:46:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A6C81241A7 for ; Tue, 17 Nov 2020 21:46:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="cYjXYzjl" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728430AbgKQVqw (ORCPT ); Tue, 17 Nov 2020 16:46:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728391AbgKQVqw (ORCPT ); Tue, 17 Nov 2020 16:46:52 -0500 Received: from mail-qv1-xf44.google.com (mail-qv1-xf44.google.com [IPv6:2607:f8b0:4864:20::f44]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 513B9C0613CF for ; Tue, 17 Nov 2020 13:46:52 -0800 (PST) Received: by mail-qv1-xf44.google.com with SMTP id v20so8313914qvx.4 for ; Tue, 17 Nov 2020 13:46:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=BvjQlX/qMG8QjnQOEs4lX90tShF0U0rm3K0EXLIQjzw=; b=cYjXYzjlpsp0EPw0d2vCjjtjo0xmteB7Fit6oeBMsJdVKN2nVtlvCWftIdI7a/eZ1u RS01zkx16NLWFLdUC+B8JRULJBs75tGjxaMr9CgF40UDTr2BvDk4JaYSD6XWKjHLE365 QjEYbdbiNO/+MSNYyax5SCFKSPAZnK/QzQVkp7Rq5SdcaOFYvGmpuxAvnfAy7HPVSNDF Jv9Pgqdbx1cz8x9qWNtr7mI9Gbn0t1qJ+uaZUOflUwnABc6B0n0S2x4u/DbjaT/Y6yqW 6p70/ZhSo5Yr5y2gCaYXp7bp3cJ41rDTz3IJNwiO2D8Xni8YH/r9ZOiOte06K245D96/ TU1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=BvjQlX/qMG8QjnQOEs4lX90tShF0U0rm3K0EXLIQjzw=; b=NWdnpj3dirHgxULQiwfbj0HjYrGp5zctpFrPR3EuP3QD1EJd5FAVK4aMHueAkc7wlO /Ws25CKTqM9GoyjmpwOU/FRToe3T2IWBPwXzfm37XsbGJc130EroWHg+Yy1GQkH6KQQT LgGX3ZRtdEy6/wgRGht4xElCLxYLfDHgauQcaJhlFdN2JeThqFgTkaLyotM1elh8ZV6+ t8zyJSBNChg0rmSAqrma9E8NeOudC2aSUys7J9yPOme1O13HvKCoaLIVuFIp/Fiasvu8 z+9QCKS3LRUlH3TVesInBumbVF2iQ+TpXIXHTcGmN6BTHry1wpIATmCkLGXohxBXmjwQ 4i/w== X-Gm-Message-State: AOAM533XmEf9hV0v2NioL+VOIDiJy9pQFAWJQcNOwcxPdA2K6M/D1Ve+ cMW4Hc9TFvJuBjtuj++6yKO49gEOunEyBpPA X-Google-Smtp-Source: ABdhPJx2Ak8cJqg0+ILquN9cWdtbO71WWkxdtVeg5gMnXjoNGGZX+iTX4xPK8c6cKPEmpUi5ejRbqQ== X-Received: by 2002:a0c:80e1:: with SMTP id 88mr1947107qvb.10.1605649611282; Tue, 17 Nov 2020 13:46:51 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id d48sm15814626qta.26.2020.11.17.13.46.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:46:50 -0800 (PST) Date: Tue, 17 Nov 2020 16:46:48 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 05/24] rev-list: die when --test-bitmap detects a mismatch Message-ID: <1a9ac1c4aefcc8e4665106a39ce6c6c0b2a4a52c.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King You can use "git rev-list --test-bitmap HEAD" to check that bitmaps produce the same answer we'd get from a regular traversal. But if we detect an error, we only print "mismatch", and still exit with a successful error code. That makes the uses of --test-bitmap in the test suite (e.g., in t5310) mostly pointless: even if we saw an error, the tests wouldn't notice. Let's instead call die(), which will let these tests work as designed, and alert us if the bitmaps are bogus. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 074d9ac8f2..4431f9f120 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1328,7 +1328,7 @@ void test_bitmap_walk(struct rev_info *revs) if (bitmap_equals(result, tdata.base)) fprintf(stderr, "OK!\n"); else - fprintf(stderr, "Mismatch!\n"); + die("mismatch in bitmap results"); free_bitmap_index(bitmap_git); } From patchwork Tue Nov 17 21:46:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913591 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9595C63697 for ; Tue, 17 Nov 2020 21:47:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8D7A6241A7 for ; Tue, 17 Nov 2020 21:47:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="K2Yz+mbe" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728431AbgKQVq7 (ORCPT ); Tue, 17 Nov 2020 16:46:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVq7 (ORCPT ); Tue, 17 Nov 2020 16:46:59 -0500 Received: from mail-qt1-x842.google.com (mail-qt1-x842.google.com [IPv6:2607:f8b0:4864:20::842]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3589C0613CF for ; Tue, 17 Nov 2020 13:46:57 -0800 (PST) Received: by mail-qt1-x842.google.com with SMTP id v11so17079216qtq.12 for ; Tue, 17 Nov 2020 13:46:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=A1JWPrX0+V9iLB+u62uSdRbBNywDpECZ6X0CkSo1VC4=; b=K2Yz+mbeVME6pxLdkxuQiG095D9J31aTYuEV0En+eIWNyw82TG1AfmBvcxwKSyttS9 lDMca+Fb3A/UI300zhtFmnqRAihccnSX2N5TXOPpOoLhVGCML69A6K7PpXOvaSgjQzgp 1KF7B/ACDFz/+HnL8PUAXMxwOjmVYtEoO+BV40UxlF3lo6dUmdv+9G+MIcEwyE6htlNN A/J4iTQ4FVR63kaLT/fJduUijZUuC4LWTQ89/WcO2E7wiyp0jsKZRVpDYh1seaf+vzok sfG6wpv0X+bKPXKOONrLOenEXUyn6o/0YGIr5aePvYfDf5+lCnmv4phJAbRNulNJLTvH ZAOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=A1JWPrX0+V9iLB+u62uSdRbBNywDpECZ6X0CkSo1VC4=; b=TuK2JkfNuF68sqRDV0Ka00QuF5M4zfg38Q7ur8B7qHXO4mXXnp9KMtehqMc4wSal1l /ieJRLCLLFlLV59Wtm44B3Wwx0JrUjYsgXfWnjIkvBmQ0E68piZq+oMpbJHL9AEFyi3d QYhrEFk3sXYGEIjiq0WfCS6s4rZIMtoMOtOFaOWWf8hPAHZxpnvDObGL20smNj7P4c78 FgYBy31bHf2GDjcAS0TmTdiTVf20+GiKKHm3Yp6OZkfD5UKh9VslClj4Xc6GMeFEEQLx f4K3Tpx6gD794ZeIuSyD7ROn/GLpEjlmKl3bsb2r9AvQFASqznhuW0Y4BuiGnIdKO8+Q TuUA== X-Gm-Message-State: AOAM531qvq/FZIlG5wDfg6qqO86FaVTKkr9NaqswlioxV6rs6+bFoebp 4dEPB8XbVG216IOZJADbZodyIXjiWvYzu3ME X-Google-Smtp-Source: ABdhPJzCx6hOFpzsp9VSfkg3XzchswRfNaPW/5DrQCx51d8ay1SyLI2dFs/7yiD7Bi6+zj0c6dUJjA== X-Received: by 2002:ac8:5649:: with SMTP id 9mr1729711qtt.379.1605649616887; Tue, 17 Nov 2020 13:46:56 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id f189sm14982766qkb.84.2020.11.17.13.46.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:46:56 -0800 (PST) Date: Tue, 17 Nov 2020 16:46:54 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 06/24] ewah: factor out bitmap growth Message-ID: <9bb1ea3b19c86507b7485ae872a8ef350cd0aedc.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We auto-grow bitmaps when somebody asks to set a bit whose position is outside of our currently allocated range. Other operations besides single bit-setting might need to do this, too, so let's pull it into its own function. Note that we change the semantics a little: you now ask for the number of words you'd like to have, not the id of the block you'd like to write to. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index d8cec585af..7c1ecfa6fd 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,18 +35,22 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } -void bitmap_set(struct bitmap *self, size_t pos) +static void bitmap_grow(struct bitmap *self, size_t word_alloc) { - size_t block = EWAH_BLOCK(pos); - - if (block >= self->word_alloc) { + if (word_alloc > self->word_alloc) { size_t old_size = self->word_alloc; - self->word_alloc = block ? block * 2 : 1; + self->word_alloc = word_alloc * 2; REALLOC_ARRAY(self->words, self->word_alloc); memset(self->words + old_size, 0x0, (self->word_alloc - old_size) * sizeof(eword_t)); } +} +void bitmap_set(struct bitmap *self, size_t pos) +{ + size_t block = EWAH_BLOCK(pos); + + bitmap_grow(self, block + 1); self->words[block] |= EWAH_MASK(pos); } From patchwork Tue Nov 17 21:47:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913593 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AAFE2C63697 for ; Tue, 17 Nov 2020 21:47:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 55C10241A7 for ; Tue, 17 Nov 2020 21:47:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="NAzpINx7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728443AbgKQVrE (ORCPT ); Tue, 17 Nov 2020 16:47:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVrE (ORCPT ); Tue, 17 Nov 2020 16:47:04 -0500 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4AF84C0613CF for ; Tue, 17 Nov 2020 13:47:04 -0800 (PST) Received: by mail-qt1-x841.google.com with SMTP id g20so5282662qtu.4 for ; Tue, 17 Nov 2020 13:47:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=17lcZ0zqXOkBUp/MGOpfciq4MhQIFv8u9HznwaBQsZI=; b=NAzpINx7YyRB8FIOpCmUTiUpVLLtbNjRXMFJWkvg1NaNwuY0Ukc4d3YFz+1o8Y2gMZ 0rur92X+1MkCTKnrjpN+OW2kgK20WtEDAeC+sMAr1HT8H6v5jA7WlPtJy4sBXVyC6Sld 701JMhkxKZUZv3MgdBsGFm1k65lPDsWrzjcypj2o7M9E7PrWSskC0NvLfPiw5l6IadGV Xzo59sTaISjv0h0QT2YayuiAQKRqcB/u9IbVogqM/8jRmwhBVh15/jFKqrXhNgkEVOS5 Yr4oGPpeLzoFt6ZVPITDvwNKbJtdr0pAbjUuei6N8OhCJyqiaIMQ7uH2xMypUoNVTRzv X40A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=17lcZ0zqXOkBUp/MGOpfciq4MhQIFv8u9HznwaBQsZI=; b=ekHTFL6n1yX7CpA2Xm+pVa/FghpseDu37ZfKvsB9wPwduSQKHnnIdpRTzqMiIM3DlY Tt65NGiQsjqrJCWCn0TljFGaIRIW5JCwfaYtMSB//XOeARxVse17NTbWn1a0sYf7JDPf 1e5MF972NusR1Fo+g6syksQF2r66+tF2Tq6HMcg5JFmAGAmte0b7/bf40O2r2zSaeLyk SQc/bOxKTw4Etrmtj4d+r/Pgf7pHbxvScEB0J1yC4jDe5dcnHEJR22LIe/OqPy7QMoVz Jwy7iTiE9pip7/FT/q5Qa5VefvY/lys/dJy1+lN3kJDY+VSNyAJKorymn3LVwGFrExrf 3QEw== X-Gm-Message-State: AOAM530gQaHa3AX0f8ujA+D6X23vmJgvnTR0TVD8pLEY7161p1AHXs93 I3c00T8DahpDKlFjoJMvRKQWAeuUKIeocUYO X-Google-Smtp-Source: ABdhPJwblMlIuv1dZ61PMhmSP7N5/ELv0FIKwUH6K+H5Qtx3sWtSP/mh9XCvPN1ncJgzzLqhErPqDw== X-Received: by 2002:ac8:71d9:: with SMTP id i25mr1798728qtp.218.1605649623040; Tue, 17 Nov 2020 13:47:03 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id y23sm6431121qkb.26.2020.11.17.13.47.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:02 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:00 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 07/24] ewah: make bitmap growth less aggressive Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King If you ask to set a bit in the Nth word and we haven't yet allocated that many slots in our array, we'll increase the bitmap size to 2*N. This means we might frequently end up with bitmaps that are twice the necessary size (as soon as you ask for the biggest bit, we'll size up to twice that). But if we just allocate as many words as were asked for, we may not grow fast enough. The worst case there is setting bit 0, then 1, etc. Each time we grow we'd just extend by one more word, giving us linear reallocations (and quadratic memory copies). Let's combine those by allocating the maximum of: - what the caller asked for - a geometric increase in existing size; we'll switch to 3/2 instead of 2 here. That's less aggressive and may help avoid fragmenting memory (N + 3N/2 > 9N/4, so old chunks can be reused as we scale up). Our worst case is still 3/2N wasted bits (you set bit N-1, then setting bit N causes us to grow by 3/2), but our average should be much better. This isn't usually that big a deal, but it will matter as we shift the reachability bitmap generation code to store more bitmaps in memory. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 7c1ecfa6fd..43a59d7fed 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -39,7 +39,9 @@ static void bitmap_grow(struct bitmap *self, size_t word_alloc) { if (word_alloc > self->word_alloc) { size_t old_size = self->word_alloc; - self->word_alloc = word_alloc * 2; + self->word_alloc = old_size * 3 / 2; + if (word_alloc > self->word_alloc) + self->word_alloc = word_alloc; REALLOC_ARRAY(self->words, self->word_alloc); memset(self->words + old_size, 0x0, (self->word_alloc - old_size) * sizeof(eword_t)); From patchwork Tue Nov 17 21:47:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913595 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 752E8C63697 for ; Tue, 17 Nov 2020 21:47:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1EF92241A7 for ; Tue, 17 Nov 2020 21:47:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="fMX8usJ3" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728451AbgKQVrK (ORCPT ); Tue, 17 Nov 2020 16:47:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53590 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVrK (ORCPT ); Tue, 17 Nov 2020 16:47:10 -0500 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C3E98C0613CF for ; Tue, 17 Nov 2020 13:47:09 -0800 (PST) Received: by mail-qk1-x743.google.com with SMTP id y197so22349838qkb.7 for ; Tue, 17 Nov 2020 13:47:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=hFtrMf4H0HoMBwwREAKCo9ihlUTZl2c1Fk0BmPe3kV4=; b=fMX8usJ3foe4qQgdxfcVyO63tQSRLdEDKGhRR/Q6TU6ce6KWet/8DDpVQAxnpNfsl3 1r0kHebmYM0+A/q5FEfXyW4sexrI5iRZpar4A2N6f59J4OxrN5LAy4sOYs/zc8hXxYZ1 A8T1Nk34Gv3O9m76hveeeTRoU8nqoiy1OWWIYo6DBbkIl8gewzU7R7yKKIUXHMxZ15/J eB/ireWn1duWkaJuuH032/EnAgupzHtOkOBPmmcMLjK7dGlTZLLHzvVp9CXlCLtDsv+n ThEqgMsFKqAa90cbe0FFhjjh4mnwFK7sOEiBQ7mn2nEALaUhZH26xtoRv63dn5VCtUvK o5oA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=hFtrMf4H0HoMBwwREAKCo9ihlUTZl2c1Fk0BmPe3kV4=; b=NC2gKuDjz/P/jV30gMy7rw9Zy1a/UQixfYMZxE8Y0FJn3ME/qPC8WQdQ+sLiVg9vQd 7n649paAbNUCcyXXDp7ESnZyOwQzIg09rJYcAOMXLuemp6bXc3qGqBj40XelneqFlB9O 9OuotIwwu+GiLQs5THczv9VVJBylaZt+dO2e+BP2ojh/HL0uhPi01cKNTCOfuczD+HcL a7x9AVK5/X+2oQwPbuiqDO+nTW39H3SkcyRUHIwV2304zGStGaUoHz0Lut1h7Yq4gzGF RINbVYnxm2NeCVx/cfkNnexL5fxDaVukL0wRwnx63nCCJ9TeYW1pZqVnV1OPR7t5RR2/ O48g== X-Gm-Message-State: AOAM530IRlbuJ112ghCitDCfmk+4Cs5t4oaTmwzx9k4gP23+X0iaSQST OQDZleX8fbSrLUp71Hg7RAO2+jASwVHu2+dO X-Google-Smtp-Source: ABdhPJwmVhd3MmO0qGMezaZNzA4BgM+fPnJYnhIIB5Cjecb+OkrlYNo57+JOum9LZzvqcdVd2NYcjg== X-Received: by 2002:a37:9b06:: with SMTP id d6mr1732298qke.116.1605649628741; Tue, 17 Nov 2020 13:47:08 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id o15sm14779932qtq.91.2020.11.17.13.47.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:08 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:05 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 08/24] ewah: implement bitmap_or() Message-ID: <674e31f98e9ad3a69504125a9aaa6e914383858b.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King We have a function to bitwise-OR an ewah into an uncompressed bitmap, but not to OR two uncompressed bitmaps. Let's add it. Interestingly, we have a public header declaration going back to e1273106f6 (ewah: compressed bitmap implementation, 2013-11-14), but the function was never implemented. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index 43a59d7fed..c3f8e7242b 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -127,6 +127,15 @@ void bitmap_and_not(struct bitmap *self, struct bitmap *other) self->words[i] &= ~other->words[i]; } +void bitmap_or(struct bitmap *self, const struct bitmap *other) +{ + size_t i; + + bitmap_grow(self, other->word_alloc); + for (i = 0; i < other->word_alloc; i++) + self->words[i] |= other->words[i]; +} + void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other) { size_t original_size = self->word_alloc; From patchwork Tue Nov 17 21:47:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913597 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CC62C63697 for ; Tue, 17 Nov 2020 21:47:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AB445241A7 for ; Tue, 17 Nov 2020 21:47:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="G7VQy5t9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728460AbgKQVrP (ORCPT ); Tue, 17 Nov 2020 16:47:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVrO (ORCPT ); Tue, 17 Nov 2020 16:47:14 -0500 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BF25C0613CF for ; Tue, 17 Nov 2020 13:47:14 -0800 (PST) Received: by mail-qt1-x841.google.com with SMTP id m65so17069034qte.11 for ; Tue, 17 Nov 2020 13:47:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=1qPKW/h2owEnumCUEtXxvfiizBNn0zE3UKWmCTVbkm0=; b=G7VQy5t9IVgd3hYEqP2bARDnG1z0qm5M3+Mtowobs7gaNtH8+240wIROskt7KqxqGZ zCKGajoTBvuTejCSvoawis9aU48URvmVMGKG00QmzOlaOSdLZcypEMvoKnE5+oQ4BYyB 5Bs9tp17C7KfkVGvg0EWVwil0R8uu0qQnAhGklaxk3PQ1hD3bI0wMXhLXZbXbN1o4mlA q+qivmQGtPrJmMEf5AUsa1Zem0HFQYnu+NFrKtMHa1V5Ggraf6s1FuIPeSQTONhNIJpv LK/ap+3DTx3viHb9DJWk8LJo5T7rtgLKPC2uW7VWaYnEOALEtOL2Jv1KM9byF8MOIGSA jKnw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=1qPKW/h2owEnumCUEtXxvfiizBNn0zE3UKWmCTVbkm0=; b=h0IT61c6DVll8AoGuSk+v4mSeTFJpFTf//V9ZJypkTvZGZik358GHoJ3OW9AATU59h qj01oPEv0tm6sxY45NrPuPo9bjcuSuaPRDxZ5u6O/qVx9YLNaYHGoZwUxSJRkEyqYGUX FcM0veg2XKO/pMVp2DJDI0OHEs8zcwJwaSshXbIIlJXHbAMFJnPvuhZ4n2+8HONLFJFT aM6OUYMGXi50g1/huYeRyMc8+eAmMbZ9M9Y2w6h3F8KO8RTdB+agNjqXFxJT4zrH7V/D Ukfu1E2nedJExzh/NzYlEQ63vwUemH236sAtk/3Kt1HI0nS+srVWUNvLtlR5+qkcMDY6 15Gg== X-Gm-Message-State: AOAM530p8tLTU2wymH1iZrzWSEw98HGvXV3CMWXRbWoQ6ALiaFdpCE1Y e0P4+N1nNVWwdfYZH74YxY49SjxkiP9jZ49B X-Google-Smtp-Source: ABdhPJwoLX8fd+OfWGYVHL8IW+XktCgFXmnJgOmYY8RfEmPTK5YNwCEuReYJ7SSLElN6EF1GYk0f0g== X-Received: by 2002:ac8:6a16:: with SMTP id t22mr1332457qtr.304.1605649633316; Tue, 17 Nov 2020 13:47:13 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id 207sm10504469qki.91.2020.11.17.13.47.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:12 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:10 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 09/24] ewah: add bitmap_dup() function Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King There's no easy way to make a copy of a bitmap. Obviously a caller can iterate over the bits and set them one by one in a new bitmap, but we can go much faster by copying whole words with memcpy(). Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- ewah/bitmap.c | 7 +++++++ ewah/ewok.h | 1 + 2 files changed, 8 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index c3f8e7242b..eb7e2539be 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -35,6 +35,13 @@ struct bitmap *bitmap_new(void) return bitmap_word_alloc(32); } +struct bitmap *bitmap_dup(const struct bitmap *src) +{ + struct bitmap *dst = bitmap_word_alloc(src->word_alloc); + COPY_ARRAY(dst->words, src->words, src->word_alloc); + return dst; +} + static void bitmap_grow(struct bitmap *self, size_t word_alloc) { if (word_alloc > self->word_alloc) { diff --git a/ewah/ewok.h b/ewah/ewok.h index 011852bef1..1fc555e672 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -173,6 +173,7 @@ struct bitmap { struct bitmap *bitmap_new(void); struct bitmap *bitmap_word_alloc(size_t word_alloc); +struct bitmap *bitmap_dup(const struct bitmap *src); void bitmap_set(struct bitmap *self, size_t pos); void bitmap_unset(struct bitmap *self, size_t pos); int bitmap_get(struct bitmap *self, size_t pos); From patchwork Tue Nov 17 21:47:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913599 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB9DCC2D0E4 for ; Tue, 17 Nov 2020 21:47:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 833E6241A7 for ; Tue, 17 Nov 2020 21:47:21 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="qiXYZTUo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728480AbgKQVrU (ORCPT ); Tue, 17 Nov 2020 16:47:20 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVrU (ORCPT ); Tue, 17 Nov 2020 16:47:20 -0500 Received: from mail-qk1-x72a.google.com (mail-qk1-x72a.google.com [IPv6:2607:f8b0:4864:20::72a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24EE0C0613CF for ; Tue, 17 Nov 2020 13:47:19 -0800 (PST) Received: by mail-qk1-x72a.google.com with SMTP id q22so22379445qkq.6 for ; Tue, 17 Nov 2020 13:47:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=v0mO2fGveooN1+oJnam6oJuJzXQLTmYQxZmIXQ4D3mA=; b=qiXYZTUo7Vr338LJBszM3huRvni6QL1bTheaP68YhoQc0Cku3mZIbFvEdsrcP3b2Ye 462Ue/QJiCHiJI12PLSkyzStIcKZ+zz5EOwBhbuG4gyO8jbUBqNmBwTk33jo9HzkU8DB Sp4rMp2TNBl5BwsxRxegCxIVdR9FmdDmcqxoIXm+JVG+eyW3vsRYdO7WCpqx2Kr18WsZ bbclXcN+i4L8sf7yHUbDPso8dQtxioxfxGC4Px7LK8G5BMyYEsk3HBLkVR7vpdZY0hDi LzuSzeVGIJnH35tZhx7xbi1h3xB7xg1NvUtWH4pvVoXlh1cip95O5BdClfOJG6RdhoMl QNow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=v0mO2fGveooN1+oJnam6oJuJzXQLTmYQxZmIXQ4D3mA=; b=DmXBT4OtMHUlMw8zadE0a6GxiHSHTedn6VZunCZ+jTC8mclheMR02k1BDjTWAlxp2D 0HfDAYsRNuPD/x1T260/FLur2KBldAryNfXjQxBCy7TiQwUSe7pk+uAn4IuOgq6DTL1R ZtvUI2trxL1gJTnEUeKGhbQPiRP/P/o8O2id0A+o7ZUujl8vHYjdjJHZ4lQfm0kRxB4i 1aepCNr4d43VVKaojHYDyCNLU21qxh/32lNUT5gDaQrd3xtu1b2YI9hHCXkwDEWFckl5 Jav35Jtmj6aLUYdS2qk5+SbSyt61o9KQF31J32z+RHGxIgg/m4xj1CFYxZIAC0Mec/5j kqyA== X-Gm-Message-State: AOAM533BNEjsgtg883v9sV+MEjzw87SKzbPx5S+DCH3owVcnMkrogoXn uzBpSA+XtnSI3dTYLMGJFw+igv8sqWgwd4Bu X-Google-Smtp-Source: ABdhPJz8fAmCT4ds+3lkO1cd1tEW8JOBZo4WTvyS2/8jITWMJEhee9QyXI1eFVVIYXyEc/Q0ItrGrQ== X-Received: by 2002:ae9:e8d4:: with SMTP id a203mr1805083qkg.442.1605649637620; Tue, 17 Nov 2020 13:47:17 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id o187sm15158555qkb.120.2020.11.17.13.47.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:17 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:14 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 10/24] pack-bitmap-write: reimplement bitmap writing Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The bitmap generation code works by iterating over the set of commits for which we plan to write bitmaps, and then for each one performing a traditional traversal over the reachable commits and trees, filling in the bitmap. Between two traversals, we can often reuse the previous bitmap result as long as the first commit is an ancestor of the second. However, our worst case is that we may end up doing "n" complete complete traversals to the root in order to create "n" bitmaps. In a real-world case (the shared-storage repo consisting of all GitHub forks of chromium/chromium), we perform very poorly: generating bitmaps takes ~3 hours, whereas we can walk the whole object graph in ~3 minutes. This commit completely rewrites the algorithm, with the goal of accessing each object only once. It works roughly like this: - generate a list of commits in topo-order using a single traversal - invert the edges of the graph (so have parents point at their children) - make one pass in reverse topo-order, generating a bitmap for each commit and passing the result along to child nodes We generate correct results because each node we visit has already had all of its ancestors added to the bitmap. And we make only two linear passes over the commits. We also visit each tree usually only once. When filling in a bitmap, we don't bother to recurse into trees whose bit is already set in the bitmap (since we know we've already done so when setting their bit). That means that if commit A references tree T, none of its descendants will need to open T again. I say "usually", though, because it is possible for a given tree to be mentioned in unrelated parts of history (e.g., cherry-picking to a parallel branch). So we've accomplished our goal, and the resulting algorithm is pretty simple to understand. But there are some downsides, at least with this initial implementation: - we no longer reuse the results of any on-disk bitmaps when generating. So we'd expect to sometimes be slower than the original when bitmaps already exist. However, this is something we'll be able to add back in later. - we use much more memory. Instead of keeping one bitmap in memory at a time, we're passing them up through the graph. So our memory use should scale with the graph width (times the size of a bitmap). So how does it perform? For a clone of linux.git, generating bitmaps from scratch with the old algorithm took 63s. Using this algorithm it takes 205s. Which is much worse, but _might_ be acceptable if it behaved linearly as the size grew. It also increases peak heap usage by ~1G. That's not impossibly large, but not encouraging. On the complete fork-network of torvalds/linux, it increases the peak RAM usage by 40GB. Yikes. (I forgot to record the time it took, but the memory usage was too much to consider this reasonable anyway). On the complete fork-network of chromium/chromium, I ran out of memory before succeeding. Some back-of-the-envelope calculations indicate it would need 80+GB to complete. So at this stage, we've managed to make things much worse. But because of the way this new algorithm is structured, there are a lot of opportunities for optimization on top. We'll start implementing those in the follow-on patches. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 303 ++++++++++++++++++++++++-------------------- 1 file changed, 169 insertions(+), 134 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 5e998bdaa7..f2f0b6b2c2 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -110,8 +110,6 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, /** * Compute the actual bitmaps */ -static struct object **seen_objects; -static unsigned int seen_objects_nr, seen_objects_alloc; static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) { @@ -127,21 +125,6 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm writer.selected_nr++; } -static inline void mark_as_seen(struct object *object) -{ - ALLOC_GROW(seen_objects, seen_objects_nr + 1, seen_objects_alloc); - seen_objects[seen_objects_nr++] = object; -} - -static inline void reset_all_seen(void) -{ - unsigned int i; - for (i = 0; i < seen_objects_nr; ++i) { - seen_objects[i]->flags &= ~(SEEN | ADDED | SHOWN); - } - seen_objects_nr = 0; -} - static uint32_t find_object_pos(const struct object_id *oid) { struct object_entry *entry = packlist_find(writer.to_pack, oid); @@ -154,60 +137,6 @@ static uint32_t find_object_pos(const struct object_id *oid) return oe_in_pack_pos(writer.to_pack, entry); } -static void show_object(struct object *object, const char *name, void *data) -{ - struct bitmap *base = data; - bitmap_set(base, find_object_pos(&object->oid)); - mark_as_seen(object); -} - -static void show_commit(struct commit *commit, void *data) -{ - mark_as_seen((struct object *)commit); -} - -static int -add_to_include_set(struct bitmap *base, struct commit *commit) -{ - khiter_t hash_pos; - uint32_t bitmap_pos = find_object_pos(&commit->object.oid); - - if (bitmap_get(base, bitmap_pos)) - return 0; - - hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid); - if (hash_pos < kh_end(writer.bitmaps)) { - struct bitmapped_commit *bc = kh_value(writer.bitmaps, hash_pos); - bitmap_or_ewah(base, bc->bitmap); - return 0; - } - - bitmap_set(base, bitmap_pos); - return 1; -} - -static int -should_include(struct commit *commit, void *_data) -{ - struct bitmap *base = _data; - - if (!add_to_include_set(base, commit)) { - struct commit_list *parent = commit->parents; - - mark_as_seen((struct object *)commit); - - while (parent) { - parent->item->object.flags |= SEEN; - mark_as_seen((struct object *)parent->item); - parent = parent->next; - } - - return 0; - } - - return 1; -} - static void compute_xor_offsets(void) { static const int MAX_XOR_OFFSET_SEARCH = 10; @@ -248,79 +177,185 @@ static void compute_xor_offsets(void) } } -void bitmap_writer_build(struct packing_data *to_pack) +struct bb_commit { + struct commit_list *children; + struct bitmap *bitmap; + unsigned selected:1; + unsigned idx; /* within selected array */ +}; + +define_commit_slab(bb_data, struct bb_commit); + +struct bitmap_builder { + struct bb_data data; + struct commit **commits; + size_t commits_nr, commits_alloc; +}; + +static void bitmap_builder_init(struct bitmap_builder *bb, + struct bitmap_writer *writer) { - static const double REUSE_BITMAP_THRESHOLD = 0.2; - - int i, reuse_after, need_reset; - struct bitmap *base = bitmap_new(); struct rev_info revs; + struct commit *commit; + unsigned int i; + + memset(bb, 0, sizeof(*bb)); + init_bb_data(&bb->data); + + reset_revision_walk(); + repo_init_revisions(writer->to_pack->repo, &revs, NULL); + revs.topo_order = 1; + + for (i = 0; i < writer->selected_nr; i++) { + struct commit *c = writer->selected[i].commit; + struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->idx = i; + add_pending_object(&revs, &c->object, ""); + } + + if (prepare_revision_walk(&revs)) + die("revision walk setup failed"); + + while ((commit = get_revision(&revs))) { + struct commit_list *p; + + parse_commit_or_die(commit); + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + + for (p = commit->parents; p; p = p->next) { + struct bb_commit *ent = bb_data_at(&bb->data, p->item); + commit_list_insert(commit, &ent->children); + } + } +} + +static void bitmap_builder_clear(struct bitmap_builder *bb) +{ + clear_bb_data(&bb->data); + free(bb->commits); + bb->commits_nr = bb->commits_alloc = 0; +} + +static void fill_bitmap_tree(struct bitmap *bitmap, + struct tree *tree) +{ + uint32_t pos; + struct tree_desc desc; + struct name_entry entry; + + /* + * If our bit is already set, then there is nothing to do. Both this + * tree and all of its children will be set. + */ + pos = find_object_pos(&tree->object.oid); + if (bitmap_get(bitmap, pos)) + return; + bitmap_set(bitmap, pos); + + if (parse_tree(tree) < 0) + die("unable to load tree object %s", + oid_to_hex(&tree->object.oid)); + init_tree_desc(&desc, tree->buffer, tree->size); + + while (tree_entry(&desc, &entry)) { + switch (object_type(entry.mode)) { + case OBJ_TREE: + fill_bitmap_tree(bitmap, + lookup_tree(the_repository, &entry.oid)); + break; + case OBJ_BLOB: + bitmap_set(bitmap, find_object_pos(&entry.oid)); + break; + default: + /* Gitlink, etc; not reachable */ + break; + } + } + + free_tree_buffer(tree); +} + +static void fill_bitmap_commit(struct bb_commit *ent, + struct commit *commit) +{ + if (!ent->bitmap) + ent->bitmap = bitmap_new(); + + /* + * mark ourselves, but do not bother with parents; their values + * will already have been propagated to us + */ + bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); +} + +static void store_selected(struct bb_commit *ent, struct commit *commit) +{ + struct bitmapped_commit *stored = &writer.selected[ent->idx]; + khiter_t hash_pos; + int hash_ret; + + /* + * the "reuse bitmaps" phase may have stored something here, but + * our new algorithm doesn't use it. Drop it. + */ + if (stored->bitmap) + ewah_free(stored->bitmap); + + stored->bitmap = bitmap_to_ewah(ent->bitmap); + + hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); + if (hash_ret == 0) + die("Duplicate entry when writing index: %s", + oid_to_hex(&commit->object.oid)); + kh_value(writer.bitmaps, hash_pos) = stored; +} + +void bitmap_writer_build(struct packing_data *to_pack) +{ + struct bitmap_builder bb; + size_t i; + int nr_stored = 0; /* for progress */ writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; if (writer.show_progress) writer.progress = start_progress("Building bitmaps", writer.selected_nr); - - repo_init_revisions(to_pack->repo, &revs, NULL); - revs.tag_objects = 1; - revs.tree_objects = 1; - revs.blob_objects = 1; - revs.no_walk = 0; - - revs.include_check = should_include; - reset_revision_walk(); - - reuse_after = writer.selected_nr * REUSE_BITMAP_THRESHOLD; - need_reset = 0; - - for (i = writer.selected_nr - 1; i >= 0; --i) { - struct bitmapped_commit *stored; - struct object *object; - - khiter_t hash_pos; - int hash_ret; - - stored = &writer.selected[i]; - object = (struct object *)stored->commit; - - if (stored->bitmap == NULL) { - if (i < writer.selected_nr - 1 && - (need_reset || - !in_merge_bases(writer.selected[i + 1].commit, - stored->commit))) { - bitmap_reset(base); - reset_all_seen(); - } - - add_pending_object(&revs, object, ""); - revs.include_check_data = base; - - if (prepare_revision_walk(&revs)) - die("revision walk setup failed"); - - traverse_commit_list(&revs, show_commit, show_object, base); - - object_array_clear(&revs.pending); - - stored->bitmap = bitmap_to_ewah(base); - need_reset = 0; - } else - need_reset = 1; - - if (i >= reuse_after) - stored->flags |= BITMAP_FLAG_REUSE; - - hash_pos = kh_put_oid_map(writer.bitmaps, object->oid, &hash_ret); - if (hash_ret == 0) - die("Duplicate entry when writing index: %s", - oid_to_hex(&object->oid)); - - kh_value(writer.bitmaps, hash_pos) = stored; - display_progress(writer.progress, writer.selected_nr - i); + trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", + the_repository); + + bitmap_builder_init(&bb, &writer); + for (i = bb.commits_nr; i > 0; i--) { + struct commit *commit = bb.commits[i-1]; + struct bb_commit *ent = bb_data_at(&bb.data, commit); + struct commit *child; + + fill_bitmap_commit(ent, commit); + + if (ent->selected) { + store_selected(ent, commit); + nr_stored++; + display_progress(writer.progress, nr_stored); + } + + while ((child = pop_commit(&ent->children))) { + struct bb_commit *child_ent = + bb_data_at(&bb.data, child); + + if (child_ent->bitmap) + bitmap_or(child_ent->bitmap, ent->bitmap); + else + child_ent->bitmap = bitmap_dup(ent->bitmap); + } + bitmap_free(ent->bitmap); + ent->bitmap = NULL; } + bitmap_builder_clear(&bb); - bitmap_free(base); stop_progress(&writer.progress); compute_xor_offsets(); From patchwork Tue Nov 17 21:47:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913601 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D749CC63697 for ; Tue, 17 Nov 2020 21:47:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 761D624181 for ; Tue, 17 Nov 2020 21:47:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="Vdto+aVJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728494AbgKQVrY (ORCPT ); Tue, 17 Nov 2020 16:47:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVrY (ORCPT ); Tue, 17 Nov 2020 16:47:24 -0500 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 00D16C0613CF for ; Tue, 17 Nov 2020 13:47:24 -0800 (PST) Received: by mail-qt1-x843.google.com with SMTP id 3so17133883qtx.3 for ; Tue, 17 Nov 2020 13:47:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=SB/Np43TUqBuwpOVIoYb4n+uOEF85GV7A6IjS9o8uW4=; b=Vdto+aVJ1rS1aN5mXd6Xl78tyGWUvUpTsHxJ9BaolvsM74xo9DjpfRG2ECxQ7i0KhI vKNxWfcVU0ETU3v9CkjW9/gKIVzDyBx/MTS1IcfuDFhuihPzAs6E52q/gX1gNiAvWvp6 cGwyYYDoeCg4lSEAvOSCn1DupWeEsi+rAzFZNAE0j1aGnOF88Fy3lNCvkQWpxrFCPs1a +YhvCT8ULWSD456owJRE0G5Ux3/LLoMQTbTxwgzATcshOGFlLg8bb3W8oa06dPeIL2or +PWoS/D8C/IhlTI/0mEE/gy5YIfVJRNX5oK6NnlYooi+mPqwjS3g72eXunvGLbtH6Z6p Bt9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=SB/Np43TUqBuwpOVIoYb4n+uOEF85GV7A6IjS9o8uW4=; b=sbzRF3+nBZfS4PZDOVs2ywcLNoqhV4SIhluww/JkBk1EihOCjMx5gv3kyVQnh4Hul1 DKtr8g3siLnaGYEZ1h/nT62fXhYkpPIhnxDjccarC4sGKs2uI2mr2Wx6SzO5torpZaPs y0xo1/dlUPdSuqPja2W+WeVLMhqj3y+vHP1Tny5UE0IqlcwO/pFn7Fgpa/rknzCz/ob0 ERnbgTsJLDpRAIWU+lSU0FpAHewckuI6PfyK8b/rdUj2tbyJczd/I+eiWheSm25Y1xII bjRq7j17YYR2bbeBvVLg2cqvUVqT5opeu6zkYHEr6CcK4UR4KIf2TGGLzqJu3OyxbQ5c LmTA== X-Gm-Message-State: AOAM533TxlIIPlzW93qLujwDICs40jxwJyaMzMFvc7W4LQHCYg8Zntsm UX6bAszyLHSHIcDvhEz8frQo79uFnctuQFhp X-Google-Smtp-Source: ABdhPJxBMGZQCL5PHvvaPHcxNioLtKURi3BtythbxUp0S9ligH3gLBLUYg5PpEilX9YMUiaviXuUJQ== X-Received: by 2002:ac8:7b47:: with SMTP id m7mr1768617qtu.171.1605649642932; Tue, 17 Nov 2020 13:47:22 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id g70sm14727155qke.8.2020.11.17.13.47.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:22 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:20 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 11/24] pack-bitmap-write: pass ownership of intermediate bitmaps Message-ID: <466dd3036a8ca7dc9718a53f17cf87e0eb22c066.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King Our algorithm to generate reachability bitmaps walks through the commit graph from the bottom up, passing bitmap data from each commit to its descendants. For a linear stretch of history like: A -- B -- C our sequence of steps is: - compute the bitmap for A by walking its trees, etc - duplicate A's bitmap as a starting point for B; we can now free A's bitmap, since we only needed it as an intermediate result - OR in any extra objects that B can reach into its bitmap - duplicate B's bitmap as a starting point for C; likewise, free B's bitmap - OR in objects for C, and so on... Rather than duplicating bitmaps and immediately freeing the original, we can just pass ownership from commit to commit. Note that this doesn't always work: - the recipient may be a merge which already has an intermediate bitmap from its other ancestor. In that case we have to OR our result into it. Note that the first ancestor to reach the merge does get to pass ownership, though. - we may have multiple children; we can only pass ownership to one of them However, it happens often enough and copying bitmaps is expensive enough that this provides a noticeable speedup. On a clone of linux.git, this reduces the time to generate bitmaps from 205s to 70s. This is about the same amount of time it took to generate bitmaps using our old "many traversals" algorithm (the previous commit measures the identical scenario as taking 63s). It unfortunately provides only a very modest reduction in the peak memory usage, though. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index f2f0b6b2c2..d2d46ff5f4 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -333,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit); struct commit *child; + int reused = 0; fill_bitmap_commit(ent, commit); @@ -348,10 +349,15 @@ void bitmap_writer_build(struct packing_data *to_pack) if (child_ent->bitmap) bitmap_or(child_ent->bitmap, ent->bitmap); - else + else if (reused) child_ent->bitmap = bitmap_dup(ent->bitmap); + else { + child_ent->bitmap = ent->bitmap; + reused = 1; + } } - bitmap_free(ent->bitmap); + if (!reused) + bitmap_free(ent->bitmap); ent->bitmap = NULL; } bitmap_builder_clear(&bb); From patchwork Tue Nov 17 21:47:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913603 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD147C2D0E4 for ; Tue, 17 Nov 2020 21:47:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8179B241A7 for ; Tue, 17 Nov 2020 21:47:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="ZJ7d2AcG" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728500AbgKQVra (ORCPT ); Tue, 17 Nov 2020 16:47:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53644 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVra (ORCPT ); Tue, 17 Nov 2020 16:47:30 -0500 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8725C0613CF for ; Tue, 17 Nov 2020 13:47:28 -0800 (PST) Received: by mail-qk1-x744.google.com with SMTP id u4so22347127qkk.10 for ; Tue, 17 Nov 2020 13:47:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=JsFhLQbh5EuSVPg83T6V+6+RMKHaLo4hNx2UNqp8ct4=; b=ZJ7d2AcGZ0Wlhv4hdlpsUliHRuJVkxB/bnwdlslqNDezoeksda7XVVwbPcPO+rmHY7 hlXfWxPPTJykYnCuYfZtlomdjxWeKOR1TLs/QgqCoc/iK9qPsu4/ZdCE/nKVksL+332J BPl6lccz3xsUD7RXRI1o5HP+atkvbAvAuQp/A7hoLMccQFxYoorvQ/GYOp0Wv6eFNNxD RtEe/7ON2/wPkUHnIYFkJbSya40sgTkELaFUZxNwnZCgRbd41vz2iyGOB+BsQnL5ruWZ IAQvzY6/JdNRGISEXSElgbHUPS1fJ6ByjQwdKXxRnw0611TIQ8y5xB33fF2cEX6h82Vn XBBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=JsFhLQbh5EuSVPg83T6V+6+RMKHaLo4hNx2UNqp8ct4=; b=uBL0oQbuR5WcJ2c9sviejAytMAqWbyS7aTjCaQE+XzoFEzNZh31fWkKIVdofT7JOvI S9Jl472TDoUmjn4N5J2LDizfHhjpYpCuWuO6O5S+f0J1uGYGzdXmTZa/t9HksoKMVqdl SKTtF0WH+ptAIiPg3T4gnYYzfp8diVpMMgqGrtDHcvQL2BBsgfwfi9hGsmSQzgWt4u4q Psz+iafaCY45LmiLMWe0V41/IEY+IXGFKkGCNT14dfJOz5kXXB76ciYXc80UUIbMe+Gf sZLa5kVCqXGgHHo7nwAJuNdjoWMdD1Hcpo7/ohf1+2mjc7/k/7hT0btEgleXl89DD4Wl B35g== X-Gm-Message-State: AOAM530Ev//YsnnwowGzVW7MOAOVxe5dQitG98VDZFg4c7rEFsNwllg7 +yXIHQWWrTJXIsfe/zcYzuJoLX03VTUxcjpm X-Google-Smtp-Source: ABdhPJxBvcG73kKw4OVaxiQ26sxTIXYZcuH2dZ7k7jrJbKJ7lSKbFJuR5JvyUKlNu0tgKijDC/WeyA== X-Received: by 2002:a05:620a:1087:: with SMTP id g7mr1702402qkk.457.1605649647528; Tue, 17 Nov 2020 13:47:27 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id i4sm3563999qti.78.2020.11.17.13.47.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:26 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:24 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 12/24] pack-bitmap-write: fill bitmap with commit history Message-ID: <8e5607929d66a3c808dbe3a06c312d0cda1ef568.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The fill_bitmap_commit() method assumes that every parent of the given commit is already part of the current bitmap. Instead of making that assumption, let's walk parents until we reach commits already part of the bitmap. Set the value for that parent immediately after querying to save time doing double calls to find_object_pos() and to avoid inserting the parent into the queue multiple times. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index d2d46ff5f4..361f3305a2 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -12,6 +12,7 @@ #include "sha1-lookup.h" #include "pack-objects.h" #include "commit-reach.h" +#include "prio-queue.h" struct bitmapped_commit { struct commit *commit; @@ -279,17 +280,30 @@ static void fill_bitmap_tree(struct bitmap *bitmap, } static void fill_bitmap_commit(struct bb_commit *ent, - struct commit *commit) + struct commit *commit, + struct prio_queue *queue) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - /* - * mark ourselves, but do not bother with parents; their values - * will already have been propagated to us - */ bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(commit)); + prio_queue_put(queue, commit); + + while (queue->nr) { + struct commit_list *p; + struct commit *c = prio_queue_get(queue); + + bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); + fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + + for (p = c->parents; p; p = p->next) { + int pos = find_object_pos(&p->item->object.oid); + if (!bitmap_get(ent->bitmap, pos)) { + bitmap_set(ent->bitmap, pos); + prio_queue_put(queue, p->item); + } + } + } } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -319,6 +333,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct bitmap_builder bb; size_t i; int nr_stored = 0; /* for progress */ + struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -335,7 +350,7 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit); + fill_bitmap_commit(ent, commit, &queue); if (ent->selected) { store_selected(ent, commit); @@ -360,6 +375,7 @@ void bitmap_writer_build(struct packing_data *to_pack) bitmap_free(ent->bitmap); ent->bitmap = NULL; } + clear_prio_queue(&queue); bitmap_builder_clear(&bb); stop_progress(&writer.progress); From patchwork Tue Nov 17 21:47:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913605 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F004C2D0E4 for ; Tue, 17 Nov 2020 21:47:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2643F241A7 for ; Tue, 17 Nov 2020 21:47:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="h2a5iP5J" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728512AbgKQVre (ORCPT ); Tue, 17 Nov 2020 16:47:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53660 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726544AbgKQVrd (ORCPT ); Tue, 17 Nov 2020 16:47:33 -0500 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B25C8C0613CF for ; Tue, 17 Nov 2020 13:47:33 -0800 (PST) Received: by mail-qt1-x844.google.com with SMTP id t5so17116477qtp.2 for ; Tue, 17 Nov 2020 13:47:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=NBbvdD+apBGnqe7XsCbQb+AUlARGfFxhZp5Yln7ln9o=; b=h2a5iP5J02SK6lDYIBO9elVy9rII8AylH6PJfBHQIBknJ2nwELjOsBC5jNX94CkLhG 2XvR6QSSD/LRyGk38WfYwWFoglBvl67evyV6NJG5CFcQizyLy+1V63kkXCvgzbRxAuRr Tuldu16FTOO37pab8MNoY6GbyGj5MS9SE9MsqfIsvatDOU/84BeK7lhiX5AcpSsJZeZv E76HbrjfQvkIWamK/vADEPzs/W030FtFjzn+fW169le3QQ5amMpaVwgekkmskT/rNUVg pD+G1WOhWnDRDkedZfkn1IsfRC6nt62q1mXkAtEPxVBRgP01d+ByUEcdtLTMhqHrDdKp 1r6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=NBbvdD+apBGnqe7XsCbQb+AUlARGfFxhZp5Yln7ln9o=; b=tSyhEhD+vF+ITxmYGRZTErxkLqCTWhF9ws7hN7W8AA29aA3n7OLRF9Uf7xOhxlzkSx 1CffLA2P7RroBzgrmsAN7ztD31XR+HgX5quhfrnXM6EcksE5HyWrZU+qWzyajQjaKdSJ Woj6jqIOF2167aGevPV+6pwiv220w7dFnI8SPEKMMHZD2x+XH0RT9tb2CWSKe2kzQRzO fPE24BC8OOULRBSGtG0SdGQzmBUmLeP+4Z+3PE87y0eHh8/2PMI7qG5I1sRPNRqd7nPA lteg9FSNLybqJiT87G1XcNybHZSrtySpyJEh0uPo2BQOXvE8MuiQEgF7Immbv3WlN8bW Dc/Q== X-Gm-Message-State: AOAM53161Jr9DzxGLriORVX6b6oizxtUpQXWjSg+7EYi+p9b02iJ9k0W 6Yd0kr679nUvoQrHRKYhkwujXw2mpKSfJf0V X-Google-Smtp-Source: ABdhPJwyv37Vr7+RHbFwm3xCoirlJTzkxBiBa8kRPZ3476yLBVA26IHEXKdI8kY3sN3uAfKQLZjiaw== X-Received: by 2002:ac8:7619:: with SMTP id t25mr1834844qtq.244.1605649652633; Tue, 17 Nov 2020 13:47:32 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id k15sm15233472qke.75.2020.11.17.13.47.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:32 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:29 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 13/24] bitmap: add bitmap_diff_nonzero() Message-ID: <4840c64c51d65ea7bf1ebe03cad4588267db0207.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_diff_nonzero() checks if the 'self' bitmap contains any bits that are not on in the 'other' bitmap. Also, delete the declaration of bitmap_is_subset() as it is not used or implemented. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- ewah/bitmap.c | 24 ++++++++++++++++++++++++ ewah/ewok.h | 2 +- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index eb7e2539be..e2ebeac0e5 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -200,6 +200,30 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other) return 1; } +int bitmap_diff_nonzero(struct bitmap *self, struct bitmap *other) +{ + struct bitmap *small; + size_t i; + + if (self->word_alloc < other->word_alloc) { + small = self; + } else { + small = other; + + for (i = other->word_alloc; i < self->word_alloc; i++) { + if (self->words[i] != 0) + return 1; + } + } + + for (i = 0; i < small->word_alloc; i++) { + if ((self->words[i] & ~other->words[i])) + return 1; + } + + return 0; +} + void bitmap_reset(struct bitmap *bitmap) { memset(bitmap->words, 0x0, bitmap->word_alloc * sizeof(eword_t)); diff --git a/ewah/ewok.h b/ewah/ewok.h index 1fc555e672..156c71d06d 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -180,7 +180,7 @@ int bitmap_get(struct bitmap *self, size_t pos); void bitmap_reset(struct bitmap *self); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); -int bitmap_is_subset(struct bitmap *self, struct bitmap *super); +int bitmap_diff_nonzero(struct bitmap *self, struct bitmap *other); struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap); struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah); From patchwork Tue Nov 17 21:47:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913607 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7B1CC2D0E4 for ; Tue, 17 Nov 2020 21:47:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7B8972463D for ; Tue, 17 Nov 2020 21:47:41 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="f0PqAHvF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728536AbgKQVrk (ORCPT ); Tue, 17 Nov 2020 16:47:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728522AbgKQVrj (ORCPT ); Tue, 17 Nov 2020 16:47:39 -0500 Received: from mail-qk1-x730.google.com (mail-qk1-x730.google.com [IPv6:2607:f8b0:4864:20::730]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38AE7C0613CF for ; Tue, 17 Nov 2020 13:47:38 -0800 (PST) Received: by mail-qk1-x730.google.com with SMTP id 11so22360991qkd.5 for ; Tue, 17 Nov 2020 13:47:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=20LuDZ0/tqk6rFM5lLPtdr8IqDUAgw2aiaqSEtSZ7UU=; b=f0PqAHvFclJlaJvdFXmVXwhZxWUNCyYehhFcMrPxdJitoeC+9y4JMiedPDKfiq4uLc a2Sn7iqJolN6zqY0jT5ZIOWkxmUI31MOmVDJ1yMVZeQajvWZ95RzD3B7Q3gkokq+niK1 PksuMeHot6QBgYQ9SpJJ1RPsJvlpO/R1KdqNxkXN8A/+DiY4H9NHFOwb6rC7ANmVW6DA ydiEQMjGUQ/NcTf+VQSc3OFH+ozh5rwPPUcdDCSYf2XeUSU7TWkIcrzSxkRFxQZjuvT4 tCSqTQSpPwponXmC4nFjm9oQ+lF+Sc23nEEp58/uDOYPx9Mym/b1qwwZ4QyMm9i9Ghsa hiQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=20LuDZ0/tqk6rFM5lLPtdr8IqDUAgw2aiaqSEtSZ7UU=; b=fS54NS6weIMX/EnCQHUD0wKSbKItQ9HcCVud+KCEwF6E0r0itFLvSmDEMqEVAtXjHK XN4cNs2N1tk3qgdVsxfbUL/hqJk2uqmmHfn2yIc5dD2TGFUOaSj9Qfy7/bvdL6n/zp3P Me1KnDETsVkTP0ENVNO4fuAlpQei9ZPwLqpyRYkieLgP8Eujn+OMRqEbU6CnvTS3roop TN9fCtZuyWD3ouUsLmCeliGFH9hHinMHhimf8NAy3s75rfXV7FFgJ0E9Ws+vxpiIa44N bSDN271rhCwl0J/gZKC4m9nMwzdue9i80m5LthljNbhFYkTnn5GhZ8ePXSb38kE7U+Dh 0uxQ== X-Gm-Message-State: AOAM532i51JFcvvPVulQj9LqhLidl1v0VemVH2fPO2fbUm6twP1vB8bA Hd3Jp96UQfqOkmdCKMZfa3Db8rWt7ZwsIpYl X-Google-Smtp-Source: ABdhPJxziwp1PFRfGJYRicyTqGwN3ir8uqgJ3PAks8Tw9kZmVyAzrXck7JTufDslznGUYqmElPBqsA== X-Received: by 2002:ae9:e007:: with SMTP id m7mr1745542qkk.416.1605649657210; Tue, 17 Nov 2020 13:47:37 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id k8sm10647363qki.55.2020.11.17.13.47.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:36 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:34 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 14/24] commit: implement commit_list_contains() Message-ID: <63e846f4e8841edeecccb46efa1645293f7452a8.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee It can be helpful to check if a commit_list contains a commit. Use pointer equality, assuming lookup_commit() was used. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- commit.c | 11 +++++++++++ commit.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/commit.c b/commit.c index fe1fa3dc41..9a785bf906 100644 --- a/commit.c +++ b/commit.c @@ -544,6 +544,17 @@ struct commit_list *commit_list_insert(struct commit *item, struct commit_list * return new_list; } +int commit_list_contains(struct commit *item, struct commit_list *list) +{ + while (list) { + if (list->item == item) + return 1; + list = list->next; + } + + return 0; +} + unsigned commit_list_count(const struct commit_list *l) { unsigned c = 0; diff --git a/commit.h b/commit.h index 5467786c7b..742a6de460 100644 --- a/commit.h +++ b/commit.h @@ -167,6 +167,8 @@ int find_commit_subject(const char *commit_buffer, const char **subject); struct commit_list *commit_list_insert(struct commit *item, struct commit_list **list); +int commit_list_contains(struct commit *item, + struct commit_list *list); struct commit_list **commit_list_append(struct commit *commit, struct commit_list **next); unsigned commit_list_count(const struct commit_list *l); From patchwork Tue Nov 17 21:47:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913609 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8842CC2D0E4 for ; Tue, 17 Nov 2020 21:47:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 20D56241A7 for ; Tue, 17 Nov 2020 21:47:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="I+3YWqrU" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728522AbgKQVrq (ORCPT ); Tue, 17 Nov 2020 16:47:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726498AbgKQVrq (ORCPT ); Tue, 17 Nov 2020 16:47:46 -0500 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD5EBC0613CF for ; Tue, 17 Nov 2020 13:47:45 -0800 (PST) Received: by mail-qk1-x744.google.com with SMTP id k4so12429652qko.13 for ; Tue, 17 Nov 2020 13:47:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=g49Z2A6JCu5TR/TsL2TYsryee1VVppQfQySl90waqeY=; b=I+3YWqrUJMKBHFdN4p0E+5Ioh1SrLkh5NjZ4rGboORRkRRnGqQv5vKoTAzOXY3j5J0 Sk4tS8Z/1OesjeTzBa4Cgnu/R5lIhzLKScHZI8moyQqdFmwh5wQkuBLKqR1DFfXyPfak HYs+Qo/qMqOlfkPXUNURKpjcd1wj1dOIMWvISD4hKkJ1Ja9mGBFhJlumIq8czD+BKEAd 9oWmYsfwv/1+wmcAU0FJzBOsl1O4vzPHXtgJNGt2V27rhAmhgsTvIaCGxzLqywbZGiqP 1DsD+h454/mee4SSzDS2+kjAHsejJDRbCZgPb9xMMYolmvObkc52hZqe/6CS+siXHGyu X0DA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=g49Z2A6JCu5TR/TsL2TYsryee1VVppQfQySl90waqeY=; b=J+AQaVshMIlwpHkJiKhdDF2ENnupmLib4g/Hp7MqRtH8QQL7ZidMCR06eTaqdnXN7p Qe07a9pvJfh6cQFOj1Ic37Um5Aa/rAnNSUQP30xTNFm9wqniGoPD5xh3SlT/aUNCenz2 3s+e1BedtHSmuHUeoJxvDS2X8XkPsuqR5P2w8ZjnJhHUtskPuykhMp9vjmVxLYXh70cw m6JTCK83GzFG+1zw088ocoe4+eD1e56sBp6lZ5F/B1UFVjxQTECTAtf4yr347ea2gJsk j6AkiYMemb2V+Ku5UR6kkZGJ7WB3ZPj8/uZgUvMrq2BgZLQqzkXlJqK8r19o0nUzSNDt vuVw== X-Gm-Message-State: AOAM53179gkVjDJS28W3RA68Iu9WM4wCjFoDm0eC7v6pjLFCXpmU7KpL zFayze0cW3nwttpC/rJ/h//uSaneumuP2wRn X-Google-Smtp-Source: ABdhPJyP9VVHQUh6bhAyySgXUFRgMa1v0CZGYaWjz3mQ3ONq/ct0jxvcxRUv5ZxxpIarc4Bb+BbZzw== X-Received: by 2002:a37:7cb:: with SMTP id 194mr1732198qkh.289.1605649664663; Tue, 17 Nov 2020 13:47:44 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id 131sm15385555qkh.115.2020.11.17.13.47.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:44 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:41 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 15/24] t5310: add branch-based checks Message-ID: <8b5d23933301988e42ddd57687e0535f8749367f.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The current rev-list tests that check the bitmap data only work on HEAD instead of multiple branches. Expand the test cases to handle both 'master' and 'other' branches. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- t/t5310-pack-bitmaps.sh | 61 +++++++++++++++++++++++------------------ 1 file changed, 34 insertions(+), 27 deletions(-) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 70a4fc4843..6bf68fee85 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -41,63 +41,70 @@ test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' git rev-list --test-bitmap HEAD ' -rev_list_tests() { - state=$1 - - test_expect_success "counting commits via bitmap ($state)" ' - git rev-list --count HEAD >expect && - git rev-list --use-bitmap-index --count HEAD >actual && +rev_list_tests_head () { + test_expect_success "counting commits via bitmap ($state, $branch)" ' + git rev-list --count $branch >expect && + git rev-list --use-bitmap-index --count $branch >actual && test_cmp expect actual ' - test_expect_success "counting partial commits via bitmap ($state)" ' - git rev-list --count HEAD~5..HEAD >expect && - git rev-list --use-bitmap-index --count HEAD~5..HEAD >actual && + test_expect_success "counting partial commits via bitmap ($state, $branch)" ' + git rev-list --count $branch~5..$branch >expect && + git rev-list --use-bitmap-index --count $branch~5..$branch >actual && test_cmp expect actual ' - test_expect_success "counting commits with limit ($state)" ' - git rev-list --count -n 1 HEAD >expect && - git rev-list --use-bitmap-index --count -n 1 HEAD >actual && + test_expect_success "counting commits with limit ($state, $branch)" ' + git rev-list --count -n 1 $branch >expect && + git rev-list --use-bitmap-index --count -n 1 $branch >actual && test_cmp expect actual ' - test_expect_success "counting non-linear history ($state)" ' + test_expect_success "counting non-linear history ($state, $branch)" ' git rev-list --count other...master >expect && git rev-list --use-bitmap-index --count other...master >actual && test_cmp expect actual ' - test_expect_success "counting commits with limiting ($state)" ' - git rev-list --count HEAD -- 1.t >expect && - git rev-list --use-bitmap-index --count HEAD -- 1.t >actual && + test_expect_success "counting commits with limiting ($state, $branch)" ' + git rev-list --count $branch -- 1.t >expect && + git rev-list --use-bitmap-index --count $branch -- 1.t >actual && test_cmp expect actual ' - test_expect_success "counting objects via bitmap ($state)" ' - git rev-list --count --objects HEAD >expect && - git rev-list --use-bitmap-index --count --objects HEAD >actual && + test_expect_success "counting objects via bitmap ($state, $branch)" ' + git rev-list --count --objects $branch >expect && + git rev-list --use-bitmap-index --count --objects $branch >actual && test_cmp expect actual ' - test_expect_success "enumerate commits ($state)" ' - git rev-list --use-bitmap-index HEAD >actual && - git rev-list HEAD >expect && + test_expect_success "enumerate commits ($state, $branch)" ' + git rev-list --use-bitmap-index $branch >actual && + git rev-list $branch >expect && test_bitmap_traversal --no-confirm-bitmaps expect actual ' - test_expect_success "enumerate --objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD >actual && - git rev-list --objects HEAD >expect && + test_expect_success "enumerate --objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch >actual && + git rev-list --objects $branch >expect && test_bitmap_traversal expect actual ' - test_expect_success "bitmap --objects handles non-commit objects ($state)" ' - git rev-list --objects --use-bitmap-index HEAD tagged-blob >actual && + test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" ' + git rev-list --objects --use-bitmap-index $branch tagged-blob >actual && grep $blob actual ' } +rev_list_tests () { + state=$1 + + for branch in "master" "other" + do + rev_list_tests_head + done +} + rev_list_tests 'full bitmap' test_expect_success 'clone from bitmapped repository' ' From patchwork Tue Nov 17 21:47:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913611 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E2D2C63697 for ; Tue, 17 Nov 2020 21:47:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B8688241A7 for ; Tue, 17 Nov 2020 21:47:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="PxNOC8Lj" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728558AbgKQVrw (ORCPT ); Tue, 17 Nov 2020 16:47:52 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53712 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726498AbgKQVrv (ORCPT ); Tue, 17 Nov 2020 16:47:51 -0500 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7ABD5C0613CF for ; Tue, 17 Nov 2020 13:47:51 -0800 (PST) Received: by mail-qt1-x843.google.com with SMTP id f93so17081415qtb.10 for ; Tue, 17 Nov 2020 13:47:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=gpM7ZEJoHWnszhfZZzvCrqIDHE5CA/aVMdvCvRpJSxg=; b=PxNOC8Lj19iqo/vV/tSDGbdU/s1q8f4zsEYBSvXRCIAcknXLinW3oObFtTuCA+CYzD F4cisGiExAiorhZkcnq37DxBnD4VeLZCW3iOaYT2XHJyBaO1lU8WH3uK4BxAvQumEogD erLd+DFdWs6fy1BVhWEE6HPkaOoS0xHjtdZUNmRcbI9Mt7SGdJQ23TlbS9MPE1jSofW7 RA9aikrox3b67b2h36WYHcuL+rzLyz852nuf6HZIniCCV31OR6aw8FXQd9KO2w+K68ua zcExelouD3oID+IplicaHiKzo5S7KfWubbXFGU9bLhU5BROtgtnxTwfGER8zkAv/nVT5 UiEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=gpM7ZEJoHWnszhfZZzvCrqIDHE5CA/aVMdvCvRpJSxg=; b=JWC3+4pyrhBM+aMaLudgRZJ4ciOZQTMdquoA0HJKB7KJAerTs7HSbbAmJJDx0E1jy+ oBtMnTKQN94lQ0gzMxTvnmJA7LHtPWVjG3H8p5mevwIt5Eunj6g/e97Wa2MnM1AG46lY HK62wrjRs66V35RcUyrGYGXqbJPvA2eFHek1oJwfJchhhLTi/LGJQP9RzHjTfyY4l169 stILQX2+6YcxBoeTYSZ5aB7Ok+akXfCho7YZLd4T941Jvw4176+GKew2ax72L1nWzXae YS5WYKgW2fBqrk8o7p+e8SR9ZzBnpiODCDyXVKhu7PUIm4v7RgfvHgEwZrSKORzUX/dn MlGw== X-Gm-Message-State: AOAM532kTBvBYzIwa9dkp3B+7tohsyAJLrEriMgXvxqkWhrakZ9rCKKF DGrP4DWk/Y6t+L74jkAJCN+MVJnzY1uQKqsp X-Google-Smtp-Source: ABdhPJyMM9xCLe4RblfG0uGweEvsovUBQUKtFBxyQ4dEOmxGWMg/jQ9/vQLlpTvnnWoHZBFhXf0I0Q== X-Received: by 2002:ac8:7619:: with SMTP id t25mr1835803qtq.244.1605649670365; Tue, 17 Nov 2020 13:47:50 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id c14sm15257405qko.29.2020.11.17.13.47.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:49 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:47 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 16/24] pack-bitmap-write: rename children to reverse_edges Message-ID: <60a46091bb4661946985f66ec10062739770696a.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_builder_init() method walks the reachable commits in topological order and constructs a "reverse graph" along the way. At the moment, this reverse graph contains an edge from commit A to commit B if and only if A is a parent of B. Thus, the name "children" is appropriate for for this reverse graph. In the next change, we will repurpose the reverse graph to not be directly-adjacent commits in the commit-graph, but instead a more abstract relationship. The previous changes have already incorporated the necessary updates to fill_bitmap_commit() that allow these edges to not be immediate children. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 361f3305a2..369c76a87c 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -179,7 +179,7 @@ static void compute_xor_offsets(void) } struct bb_commit { - struct commit_list *children; + struct commit_list *reverse_edges; struct bitmap *bitmap; unsigned selected:1; unsigned idx; /* within selected array */ @@ -228,7 +228,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (p = commit->parents; p; p = p->next) { struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->children); + commit_list_insert(commit, &ent->reverse_edges); } } } @@ -358,7 +358,7 @@ void bitmap_writer_build(struct packing_data *to_pack) display_progress(writer.progress, nr_stored); } - while ((child = pop_commit(&ent->children))) { + while ((child = pop_commit(&ent->reverse_edges))) { struct bb_commit *child_ent = bb_data_at(&bb.data, child); From patchwork Tue Nov 17 21:47:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913613 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8671C63697 for ; Tue, 17 Nov 2020 21:47:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5728B241A7 for ; Tue, 17 Nov 2020 21:47:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="J3/gzJ+K" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728045AbgKQVr6 (ORCPT ); Tue, 17 Nov 2020 16:47:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726498AbgKQVr6 (ORCPT ); Tue, 17 Nov 2020 16:47:58 -0500 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F05A7C0613CF for ; Tue, 17 Nov 2020 13:47:57 -0800 (PST) Received: by mail-qt1-x841.google.com with SMTP id z3so13509912qtw.9 for ; Tue, 17 Nov 2020 13:47:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=DyjLrIPhWNLVZ54Jka/84KiziIeEi2U5qd9H2KkmKUA=; b=J3/gzJ+KteeO6Lbi8/E6m2m2UqnpaAfIP8Bawbr4a2D5yTpw6Ei2FiGtp+UGrc3ibp l8JR40KHC5GXXqCuxz72MEZM6Hk+1KNcJiTvMIvWLS52nQvC9kHWaBp1WlCyHNZaqwd+ zI/VJZiHXO9ixR08GeptEB6rBd/hY50i1CJeWFhXqmQZXhTB1REZoF1iotrNAu9RW/X0 t4susIAFL9Tnc4uwJryonshdFnATV8MbuB8I/tUkRchQm+HqS3l7Wy1vXo7gTr7vE2O7 FOG3y8NSjzsn6SCvxvw7/LFXK1xd2e7CqMe7r09gI+mrfpUfu5V1wbEewsp7bHouzmjV UvXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=DyjLrIPhWNLVZ54Jka/84KiziIeEi2U5qd9H2KkmKUA=; b=LqPfC9IYQuiUQe+1f+yb7bzG82eAGSCMuIg0ohu1Q/jdMik6X3OwcKS73N5yuXrBpx zpc/xwuC+gIMyd7adIYTdsYX57Yvuo0D+WLstzW06j2W/cRtj7/mPurmAWBno+w1yUWk zC7pzDnPlVk0JkWxxB6TTCsc961RufYhIXOz92M15xExz12PlJR4huKhfONlhoipNnIc ceSbK+1xiBJeug2+GJE7xZ6j3Ir3nPG3c+ZU0wtjgNNZ3oaaRMTKDlkyah7xFPB/BNG7 PNVCpGarY3oWIVNyuWIDT1QtLWrZWhKO/OO6QChfZm9sDFwMYmozBWES0DLf53lYlB2o 7yFQ== X-Gm-Message-State: AOAM531AtqgGtXddfcusRdmWXRVjgRDMNqkSrCcC5Ov6QLsjzQBOHnPW znZV/9cpNbZhUlp7F5AH9vFsQN3CuECT5Ngs X-Google-Smtp-Source: ABdhPJyQXKGzAU9YZMOxP11NzLEFmt3hLPxSn1DHy+GGwotnVkj5MYV6XSF7HAtXmLXrPgjXZzn17Q== X-Received: by 2002:a05:622a:c8:: with SMTP id p8mr1312697qtw.293.1605649676886; Tue, 17 Nov 2020 13:47:56 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id 9sm13982493qty.30.2020.11.17.13.47.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:47:56 -0800 (PST) Date: Tue, 17 Nov 2020 16:47:54 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 17/24] pack-bitmap.c: check reads more aggressively when loading Message-ID: <8f7bb2dd2e192562395bb815d891ec2ad28e6644.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Before 'load_bitmap_entries_v1()' reads an actual EWAH bitmap, it should check that it can safely do so by ensuring that there are at least 6 bytes available to be read (four for the commit's index position, and then two more for the xor offset and flags, respectively). Likewise, it should check that the commit index it read refers to a legitimate object in the pack. The first fix catches a truncation bug that was exposed when testing, and the second is purely precautionary. There are some possible future improvements, not pursued here. They are: - Computing the correct boundary of the bitmap itself in the caller and ensuring that we don't read past it. This may or may not be worth it, since in a truncation situation, all bets are off: (is the trailer still there and the bitmap entries malformed, or is the trailer truncated?). The best we can do is try to read what's there as if it's correct data (and protect ourselves when it's obviously bogus). - Avoid the magic "6" by teaching read_be32() and read_u8() (both of which are custom helpers for this function) to check sizes before advancing the pointers. - Adding more tests in this area. Testing these truncation situations are remarkably fragile to even subtle changes in the bitmap generation. So, the resulting tests are likely to be quite brittle. Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- pack-bitmap.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 4431f9f120..60c781d100 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -229,11 +229,16 @@ static int load_bitmap_entries_v1(struct bitmap_index *index) uint32_t commit_idx_pos; struct object_id oid; + if (index->map_size - index->map_pos < 6) + return error("corrupt ewah bitmap: truncated header for entry %d", i); + commit_idx_pos = read_be32(index->map, &index->map_pos); xor_offset = read_u8(index->map, &index->map_pos); flags = read_u8(index->map, &index->map_pos); - nth_packed_object_id(&oid, index->pack, commit_idx_pos); + if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0) + return error("corrupt ewah bitmap: commit index %u out of range", + (unsigned)commit_idx_pos); bitmap = read_bitmap_1(index); if (!bitmap) From patchwork Tue Nov 17 21:48:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913615 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75E0DC2D0E4 for ; Tue, 17 Nov 2020 21:48:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 040AA241A7 for ; Tue, 17 Nov 2020 21:48:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="YupCHoyC" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728381AbgKQVsI (ORCPT ); Tue, 17 Nov 2020 16:48:08 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726204AbgKQVsF (ORCPT ); Tue, 17 Nov 2020 16:48:05 -0500 Received: from mail-qv1-xf31.google.com (mail-qv1-xf31.google.com [IPv6:2607:f8b0:4864:20::f31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8B56CC0613CF for ; Tue, 17 Nov 2020 13:48:05 -0800 (PST) Received: by mail-qv1-xf31.google.com with SMTP id x13so11581083qvk.8 for ; Tue, 17 Nov 2020 13:48:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=tiILLqY7U1Y3QbpM1Mvip22YwICjH0jyS4H5r3mejDU=; b=YupCHoyCsotFrbTPPfFta+UslN+K/KRYBOKQeLWHNYUq9qKeN9gHbkNFwxBoWuogyo sjDo3hG/VeSHsYNviZmh/aV9rD2vLqz4w1ZbGL95ZVptAWD7B76uUpSj500ExErRxTtT qiCwYJ5AHWvIq+3Yp0gP7+bANyqRhMwNF4NDlnIMt52Sa/C+pp5/u3kmuAhqMyplSz9I WLr6oSZ0oWxglIJKEin5MsG8MLnisfBFzJRAm74RC8mwsVCIJ0bSNgwc49XvH90Qr06p R02s6R0sSZazflVa6asQC7rUQlOjqtcyVuz1S+b51srlK7FiWvtG9sIhbBNx3nKghePx tn3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=tiILLqY7U1Y3QbpM1Mvip22YwICjH0jyS4H5r3mejDU=; b=GLTWreh8VZLyQrL3e4H0vyJPU4Kmle0fic1WJL0OcIHYyxEc6TrM6tw+dRqKBMqw8y Htv/7ycsRiGYjWRyY1qSYmFhYnB9WDOgPkM8MEo38GcWaEPRzW5q1SRD9QJUK8y1XxTG LvJ7RWbOJGAKYPjWDC+jdgV9fRuJhqIH/xRmf+m6Li+uyD5mi3KH14f5IykyPQD8YL/B NlQffXEWvtJpEhKjA4B9cMvmlm+yfJT0nP0ZkbRGTb+Y9Crh/MScvfihrASIZvJOFjXw J7qHrFoS5Y3t/ZQ2ipdOKbGh43REmlp1eqcgA/XEDctIShgy6QI5wZZtrMvChVVIODoK N69w== X-Gm-Message-State: AOAM530Tivn2e4U/+Yvuv8K3eRPFg2dOTmf24K/sLpd/KzyIPmH8tCNk ROz86MwW3jEcMPmSZU3kNdrw3BfPbczH09FW X-Google-Smtp-Source: ABdhPJxn/zw1oz3EvNekE8xI7r4SW5dCEndDE/kZi/wuo+TBXRBnhkJ0+dwU8t76gV331bhhlKx0+w== X-Received: by 2002:a0c:90e6:: with SMTP id p93mr1429400qvp.47.1605649683972; Tue, 17 Nov 2020 13:48:03 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id h6sm14204082qtm.68.2020.11.17.13.48.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:48:03 -0800 (PST) Date: Tue, 17 Nov 2020 16:48:01 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 18/24] pack-bitmap-write: build fewer intermediate bitmaps Message-ID: <5262daa3300114fbaccdbc7393882c5435f95f4f.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The bitmap_writer_build() method calls bitmap_builder_init() to construct a list of commits reachable from the selected commits along with a "reverse graph". This reverse graph has edges pointing from a commit to other commits that can reach that commit. After computing a reachability bitmap for a commit, the values in that bitmap are then copied to the reachability bitmaps across the edges in the reverse graph. We can now relax the role of the reverse graph to greatly reduce the number of intermediate reachability bitmaps we compute during this reverse walk. The end result is that we walk objects the same number of times as before when constructing the reachability bitmaps, but we also spend much less time copying bits between bitmaps and have much lower memory pressure in the process. The core idea is to select a set of "important" commits based on interactions among the sets of commits reachable from each selected commit. The first technical concept is to create a new 'commit_mask' member in the bb_commit struct. Note that the selected commits are provided in an ordered array. The first thing to do is to mark the ith bit in the commit_mask for the ith selected commit. As we walk the commit-graph, we copy the bits in a commit's commit_mask to its parents. At the end of the walk, the ith bit in the commit_mask for a commit C stores a boolean representing "The ith selected commit can reach C." As we walk, we will discover non-selected commits that are important. We will get into this later, but those important commits must also receive bit positions, growing the width of the bitmasks as we walk. At the true end of the walk, the ith bit means "the ith _important_ commit can reach C." MAXIMAL COMMITS --------------- We use a new 'maximal' bit in the bb_commit struct to represent whether a commit is important or not. The term "maximal" comes from the partially-ordered set of commits in the commit-graph where C >= P if P is a parent of C, and then extending the relationship transitively. Instead of taking the maximal commits across the entire commit-graph, we instead focus on selecting each commit that is maximal among commits with the same bits on in their commit_mask. This definition is important, so let's consider an example. Suppose we have three selected commits A, B, and C. These are assigned bitmasks 100, 010, and 001 to start. Each of these can be marked as maximal immediately because they each will be the uniquely maximal commit that contains their own bit. Keep in mind that that these commits may have different bitmasks after the walk; for example, if B can reach C but A cannot, then the final bitmask for C is 011. Even in these cases, C would still be a maximal commit among all commits with the third bit on in their masks. Now define sets X, Y, and Z to be the sets of commits reachable from A, B, and C, respectively. The intersections of these sets correspond to different bitmasks: * 100: X - (Y union Z) * 010: Y - (X union Z) * 001: Z - (X union Y) * 110: (X intersect Y) - Z * 101: (X intersect Z) - Y * 011: (Y intersect Z) - X * 111: X intersect Y intersect Z This can be visualized with the following Hasse diagram: 100 010 001 | \ / \ / | | \/ \/ | | /\ /\ | | / \ / \ | 110 101 011 \___ | ___/ \ | / 111 Some of these bitmasks may not be represented, depending on the topology of the commit-graph. In fact, we are counting on it, since the number of possible bitmasks is exponential in the number of selected commits, but is also limited by the total number of commits. In practice, very few bitmasks are possible because most commits converge on a common "trunk" in the commit history. With this three-bit example, we wish to find commits that are maximal for each bitmask. How can we identify this as we are walking? As we walk, we visit a commit C. Since we are walking the commits in topo-order, we know that C is visited after all of its children are visited. Thus, when we get C from the revision walk we inspect the 'maximal' property of its bb_data and use that to determine if C is truly important. Its commit_mask is also nearly final. If C is not one of the originally-selected commits, then assign a bit position to C (by incrementing num_maximal) and set that bit on in commit_mask. See "MULTIPLE MAXIMAL COMMITS" below for more detail on this. Now that the commit C is known to be maximal or not, consider each parent P of C. Compute two new values: * c_not_p : true if and only if the commit_mask for C contains a bit that is not contained in the commit_mask for P. * p_not_c : true if and only if the commit_mask for P contains a bit that is not contained in the commit_mask for P. If c_not_p is false, then P already has all of the bits that C would provide to its commit_mask. In this case, move on to other parents as C has nothing to contribute to P's state that was not already provided by other children of P. We continue with the case that c_not_p is true. This means there are bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or() to add those bits. If p_not_c is also true, then set the maximal bit for P to one. This means that if no other commit has P as a parent, then P is definitely maximal. This is because no child had the same bitmask. It is important to think about the maximal bit for P at this point as a temporary state: "P is maximal based on current information." In contrast, if p_not_c is false, then set the maximal bit for P to zero. Further, clear all reverse_edges for P since any edges that were previously assigned to P are no longer important. P will gain all reverse edges based on C. The final thing we need to do is to update the reverse edges for P. These reverse edges respresent "which closest maximal commits contributed bits to my commit_mask?" Since C contributed bits to P's commit_mask in this case, C must add to the reverse edges of P. If C is maximal, then C is a 'closest' maximal commit that contributed bits to P. Add C to P's reverse_edges list. Otherwise, C has a list of maximal commits that contributed bits to its bitmask (and this list is exactly one element). Add all of these items to P's reverse_edges list. Be careful to ignore duplicates here. After inspecting all parents P for a commit C, we can clear the commit_mask for C. This reduces the memory load to be limited to the "width" of the commit graph. Consider our ABC/XYZ example from earlier and let's inspect the state of the commits for an interesting bitmask, say 011. Suppose that D is the only maximal commit with this bitmask (in the first three bits). All other commits with bitmask 011 have D as the only entry in their reverse_edges list. D's reverse_edges list contains B and C. COMPUTING REACHABILITY BITMAPS ------------------------------ Now that we have our definition, let's zoom out and consider what happens with our new reverse graph when computing reachability bitmaps. We walk the reverse graph in reverse-topo-order, so we visit commits with largest commit_masks first. After we compute the reachability bitmap for a commit C, we push the bits in that bitmap to each commit D in the reverse edge list for C. Then, when we finally visit D we already have the bits for everything reachable from maximal commits that D can reach and we only need to walk the objects in the set-difference. In our ABC/XYZ example, when we finally walk for the commit A we only need to walk commits with bitmask equal to A's bitmask. If that bitmask is 100, then we are only walking commits in X - (Y union Z) because the bitmap already contains the bits for objects reachable from (X intersect Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps for the maximal commits with bitmasks 110 and 101). The behavior is intended to walk each commit (and the trees that commit introduces) at most once while allocating and copying fewer reachability bitmaps. There is one caveat: what happens when there are multiple maximal commits with the same bitmask, with respect to the initial set of selected commits? MULTIPLE MAXIMAL COMMITS ------------------------ Earlier, we mentioned that when we discover a new maximal commit, we assign a new bit position to that commit and set that bit position to one for that commit. This is absolutely important for interesting commit-graphs such as git/git and torvalds/linux. The reason is due to the existence of "butterflies" in the commit-graph partial order. Here is an example of four commits forming a butterfly: I J |\ /| | \/ | | /\ | |/ \| M N \ / |/ Q Here, I and J both have parents M and N. In general, these do not need to be exact parent relationships, but reachability relationships. The most important part is that M and N cannot reach each other, so they are independent in the partial order. If I had commit_mask 10 and J had commit_mask 01, then M and N would both be assigned commit_mask 11 and be maximal commits with the bitmask 11. Then, what happens when M and N can both reach a commit Q? If Q is also assigned the bitmask 11, then it is not maximal but is reachable from both M and N. While this is not necessarily a deal-breaker for our abstract definition of finding maximal commits according to a given bitmask, we have a few issues that can come up in our larger picture of constructing reachability bitmaps. In particular, if we do not also consider Q to be a "maximal" commit, then we will walk commits reachable from Q twice: once when computing the reachability bitmap for M and another time when computing the reachability bitmap for N. This becomes much worse if the topology continues this pattern with multiple butterflies. The solution has already been mentioned: each of M and N are assigned their own bits to the bitmask and hence they become uniquely maximal for their bitmasks. Finally, Q also becomes maximal and thus we do not need to walk its commits multiple times. The final bitmasks for these commits are as follows: I:10 J:01 |\ /| | \ _____/ | | /\____ | |/ \ | M:111 N:1101 \ / Q:1111 Further, Q's reverse edge list is { M, N }, while M and N both have reverse edge list { I, J }. PERFORMANCE MEASUREMENTS ------------------------ Now that we've spent a LOT of time on the theory of this algorithm, let's show that this is actually worth all that effort. To test the performance, use GIT_TRACE2_PERF=1 when running 'git repack -abd' in a repository with no existing reachability bitmaps. This avoids any issues with keeping existing bitmaps to skew the numbers. Inspect the "building_bitmaps_total" region in the trace2 output to focus on the portion of work that is affected by this change. Here are the performance comparisons for a few repositories. The timings are for the following versions of Git: "multi" is the timing from before any reverse graph is constructed, where we might perform multiple traversals. "reverse" is for the previous change where the reverse graph has every reachable commit. Finally "maximal" is the version introduced here where the reverse graph only contains the maximal commits. Repository: git/git multi: 2.628 sec reverse: 2.344 sec maximal: 2.047 sec Repository: torvalds/linux multi: 64.7 sec reverse: 205.3 sec maximal: 44.7 sec So in all cases we've not only recovered any time lost to switching to the reverse-edge algorithm, but we come out ahead of "multi" in all cases. Likewise, peak heap has gone back to something reasonable: Repository: torvalds/linux multi: 2.087 GB reverse: 3.141 GB maximal: 2.288 GB While I do not have access to full fork networks on GitHub, Peff has run this algorithm on the chromium/chromium fork network and reported a change from 3 hours to ~233 seconds. That network is particularly beneficial for this approach because it has a long, linear history along with many tags. The "multi" approach was obviously quadratic and the new approach is linear. Helped-by: Jeff King Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 72 +++++++++++++++++++++++++++++++--- t/t5310-pack-bitmaps.sh | 87 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 149 insertions(+), 10 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 369c76a87c..7b4fc0f304 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -180,8 +180,10 @@ static void compute_xor_offsets(void) struct bb_commit { struct commit_list *reverse_edges; + struct bitmap *commit_mask; struct bitmap *bitmap; - unsigned selected:1; + unsigned selected:1, + maximal:1; unsigned idx; /* within selected array */ }; @@ -198,7 +200,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i; + unsigned int i, num_maximal; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -210,27 +212,85 @@ static void bitmap_builder_init(struct bitmap_builder *bb, for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; struct bb_commit *ent = bb_data_at(&bb->data, c); + ent->selected = 1; + ent->maximal = 1; ent->idx = i; + + ent->commit_mask = bitmap_new(); + bitmap_set(ent->commit_mask, i); + add_pending_object(&revs, &c->object, ""); } + num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { struct commit_list *p; + struct bb_commit *c_ent; parse_commit_or_die(commit); - ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); - bb->commits[bb->commits_nr++] = commit; + c_ent = bb_data_at(&bb->data, commit); + + if (c_ent->maximal) { + if (!c_ent->selected) { + bitmap_set(c_ent->commit_mask, num_maximal); + num_maximal++; + } + + ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); + bb->commits[bb->commits_nr++] = commit; + } for (p = commit->parents; p; p = p->next) { - struct bb_commit *ent = bb_data_at(&bb->data, p->item); - commit_list_insert(commit, &ent->reverse_edges); + struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); + int c_not_p, p_not_c; + + if (!p_ent->commit_mask) { + p_ent->commit_mask = bitmap_new(); + c_not_p = 1; + p_not_c = 0; + } else { + c_not_p = bitmap_diff_nonzero(c_ent->commit_mask, p_ent->commit_mask); + p_not_c = bitmap_diff_nonzero(p_ent->commit_mask, c_ent->commit_mask); + } + + if (!c_not_p) + continue; + + bitmap_or(p_ent->commit_mask, c_ent->commit_mask); + + if (p_not_c) + p_ent->maximal = 1; + else { + p_ent->maximal = 0; + free_commit_list(p_ent->reverse_edges); + p_ent->reverse_edges = NULL; + } + + if (c_ent->maximal) { + commit_list_insert(commit, &p_ent->reverse_edges); + } else { + struct commit_list *cc = c_ent->reverse_edges; + + for (; cc; cc = cc->next) { + if (!commit_list_contains(cc->item, p_ent->reverse_edges)) + commit_list_insert(cc->item, &p_ent->reverse_edges); + } + } } + + bitmap_free(c_ent->commit_mask); + c_ent->commit_mask = NULL; } + + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_selected_commits", writer->selected_nr); + trace2_data_intmax("pack-bitmap-write", the_repository, + "num_maximal_commits", num_maximal); } static void bitmap_builder_clear(struct bitmap_builder *bb) diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 6bf68fee85..1691710ec1 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -20,11 +20,87 @@ has_any () { grep -Ff "$1" "$2" } +# To ensure the logic for "maximal commits" is exercised, make +# the repository a bit more complicated. +# +# other master +# * * +# (99 commits) (99 commits) +# * * +# |\ /| +# | * octo-other octo-master * | +# |/|\_________ ____________/|\| +# | \ \/ __________/ | +# | | ________/\ / | +# * |/ * merge-right * +# | _|__________/ \____________ | +# |/ | \| +# (l1) * * merge-left * (r1) +# | / \________________________ | +# |/ \| +# (l2) * * (r2) +# \____________...____________ | +# \| +# * (base) +# +# The important part for the maximal commit algorithm is how +# the bitmasks are extended. Assuming starting bit positions +# for master (bit 0) and other (bit 1), and some flexibility +# in the order that merge bases are visited, the bitmasks at +# the end should be: +# +# master: 1 (maximal, selected) +# other: 01 (maximal, selected) +# octo-master: 1 +# octo-other: 01 +# merge-right: 111 (maximal) +# (l1): 111 +# (r1): 111 +# merge-left: 1101 (maximal) +# (l2): 11111 (maximal) +# (r2): 111101 (maximal) +# (base): 1111111 (maximal) + test_expect_success 'setup repo with moderate-sized history' ' - test_commit_bulk --id=file 100 && + test_commit_bulk --id=file 10 && git checkout -b other HEAD~5 && test_commit_bulk --id=side 10 && + + # add complicated history setup, including merges and + # ambiguous merge-bases + + git checkout -b merge-left other~2 && + git merge master~2 -m "merge-left" && + + git checkout -b merge-right master~1 && + git merge other~1 -m "merge-right" && + + git checkout -b octo-master master && + git merge merge-left merge-right -m "octopus-master" && + + git checkout -b octo-other other && + git merge merge-left merge-right -m "octopus-other" && + + git checkout other && + git merge octo-other -m "pull octopus" && + git checkout master && + git merge octo-master -m "pull octopus" && + + # Remove these branches so they are not selected + # as bitmap tips + git branch -D merge-left && + git branch -D merge-right && + git branch -D octo-other && + git branch -D octo-master && + + # add padding to make these merges less interesting + # and avoid having them selected for bitmaps + test_commit_bulk --id=file 100 && + git checkout other && + test_commit_bulk --id=side 100 && + git checkout master && + bitmaptip=$(git rev-parse master) && blob=$(echo tagged-blob | git hash-object -w --stdin) && git tag tagged-blob $blob && @@ -32,9 +108,12 @@ test_expect_success 'setup repo with moderate-sized history' ' ' test_expect_success 'full repack creates bitmaps' ' - git repack -ad && + GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \ + git repack -ad && ls .git/objects/pack/ | grep bitmap >output && - test_line_count = 1 output + test_line_count = 1 output && + grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && + grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' @@ -356,7 +435,7 @@ test_expect_success 'truncated bitmap fails gracefully (ewah)' ' git rev-list --use-bitmap-index --count --all >expect && bitmap=$(ls .git/objects/pack/*.bitmap) && test_when_finished "rm -f $bitmap" && - test_copy_bytes 256 <$bitmap >$bitmap.tmp && + test_copy_bytes 270 <$bitmap >$bitmap.tmp && mv -f $bitmap.tmp $bitmap && git rev-list --use-bitmap-index --count --all >actual 2>stderr && test_cmp expect actual && From patchwork Tue Nov 17 21:48:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913617 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2E70C2D0E4 for ; Tue, 17 Nov 2020 21:48:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7431F241A7 for ; Tue, 17 Nov 2020 21:48:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="wIB1vvhC" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728388AbgKQVsM (ORCPT ); Tue, 17 Nov 2020 16:48:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53770 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728385AbgKQVsM (ORCPT ); Tue, 17 Nov 2020 16:48:12 -0500 Received: from mail-qk1-x72f.google.com (mail-qk1-x72f.google.com [IPv6:2607:f8b0:4864:20::72f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E56FEC0613CF for ; Tue, 17 Nov 2020 13:48:11 -0800 (PST) Received: by mail-qk1-x72f.google.com with SMTP id v143so22414703qkb.2 for ; Tue, 17 Nov 2020 13:48:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=GEb9T142uLBdJ3Em6kJ88iWE9NBaAcI3uoSRB0KT4Xs=; b=wIB1vvhCPpLQEOmMqQSrERJPbTAAdg8MCGEnleQcIWEA/iwUPqk4WNRwuUHaKwFosQ IH7NlTUH8q8vkRGQoJi504E7CtaMEUw1h7eaPCRNTXOrsMVqJV0aO/wX6hZ0YWPUkZaY pN89417VrCEOTlSmL6aMbhiipZT3bWAdAbQGZuMfvfN45d6F5K+zw9dz1n34dk4CMjm6 nAXhAONDdLwdWEhSGc6M2jWRmZqhSeHgY8VrK85aofhg+tHoHU9gAAtcqaEpxrMqybbw PaoVtZ12rl68/MM0tcJ/tTa+cninONc8FoJZzjy3j22uMKozdaVjdb347HSMyMrEZ7Q3 aFIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=GEb9T142uLBdJ3Em6kJ88iWE9NBaAcI3uoSRB0KT4Xs=; b=WNmuqcPOBdgRxFGVHzo3y4plYQIha3ZqpmRt3a3mpWYa2VBJ+tfAovNGyofN9c3BSb 420heV2QJbuNlfXbnWEmCTl8BmLykAR1a6aklAnE9tvNPDzHgdF6prip5Uaj8vLCAeNL gcca39ymZAB9sILno3CVe9TpbeTgEc2tolVMfSlQcAaglaK00RPhOeYnXuUwozhmQHWS aWKPLQ0g8fQ4UyuYIpeSN1lt+ZNHyZoJUcZFKNdf7agIUpK60pK3HBaUCzpp3mschrzB I0aehSdtqpxkMaRp4e0n6KHLGJoy/VTGgw75s0jaGCcSIlqpBcSoYEsWMutY76v6nObB 3KwA== X-Gm-Message-State: AOAM530vy7IEuPISjrflIeOWH0sQQHQEnedzj3PHc+64ywkqpStuRlBA dbb7kKieKOlHcPm4KiV65ThuNwfPlSL0JFlD X-Google-Smtp-Source: ABdhPJyEyF12fb5h1rnAzXt8Q0UiwGEhmjfZ4bbgll2tUsbGIIC5YvVfo22qJJKmtFqXNugEpSVWlA== X-Received: by 2002:a37:47d6:: with SMTP id u205mr1722594qka.19.1605649690680; Tue, 17 Nov 2020 13:48:10 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id w30sm3417420qkw.24.2020.11.17.13.48.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:48:10 -0800 (PST) Date: Tue, 17 Nov 2020 16:48:07 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 19/24] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 - pack-bitmap-write.c | 50 +++++------------------------------------- pack-bitmap.c | 46 +++++--------------------------------- pack-bitmap.h | 6 ++++- 4 files changed, 16 insertions(+), 87 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5617c01b5a..2a00358f34 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1104,7 +1104,6 @@ static void write_pack_file(void) stop_progress(&progress_state); bitmap_writer_show_progress(progress); - bitmap_writer_reuse_bitmaps(&to_pack); bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1); bitmap_writer_build(&to_pack); bitmap_writer_finish(written_list, nr_written, diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 7b4fc0f304..1995f75818 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -30,7 +30,6 @@ struct bitmap_writer { struct ewah_bitmap *tags; kh_oid_map_t *bitmaps; - kh_oid_map_t *reused; struct packing_data *to_pack; struct bitmapped_commit *selected; @@ -112,7 +111,7 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, * Compute the actual bitmaps */ -static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) +static inline void push_bitmapped_commit(struct commit *commit) { if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; @@ -120,7 +119,7 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm } writer.selected[writer.selected_nr].commit = commit; - writer.selected[writer.selected_nr].bitmap = reused; + writer.selected[writer.selected_nr].bitmap = NULL; writer.selected[writer.selected_nr].flags = 0; writer.selected_nr++; @@ -372,13 +371,6 @@ static void store_selected(struct bb_commit *ent, struct commit *commit) khiter_t hash_pos; int hash_ret; - /* - * the "reuse bitmaps" phase may have stored something here, but - * our new algorithm doesn't use it. Drop it. - */ - if (stored->bitmap) - ewah_free(stored->bitmap); - stored->bitmap = bitmap_to_ewah(ent->bitmap); hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); @@ -477,35 +469,6 @@ static int date_compare(const void *_a, const void *_b) return (long)b->date - (long)a->date; } -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack) -{ - struct bitmap_index *bitmap_git; - if (!(bitmap_git = prepare_bitmap_git(to_pack->repo))) - return; - - writer.reused = kh_init_oid_map(); - rebuild_existing_bitmaps(bitmap_git, to_pack, writer.reused, - writer.show_progress); - /* - * NEEDSWORK: rebuild_existing_bitmaps() makes writer.reused reference - * some bitmaps in bitmap_git, so we can't free the latter. - */ -} - -static struct ewah_bitmap *find_reused_bitmap(const struct object_id *oid) -{ - khiter_t hash_pos; - - if (!writer.reused) - return NULL; - - hash_pos = kh_get_oid_map(writer.reused, *oid); - if (hash_pos >= kh_end(writer.reused)) - return NULL; - - return kh_value(writer.reused, hash_pos); -} - void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps) @@ -519,12 +482,11 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (indexed_commits_nr < 100) { for (i = 0; i < indexed_commits_nr; ++i) - push_bitmapped_commit(indexed_commits[i], NULL); + push_bitmapped_commit(indexed_commits[i]); return; } for (;;) { - struct ewah_bitmap *reused_bitmap = NULL; struct commit *chosen = NULL; next = next_commit_index(i); @@ -539,15 +501,13 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (next == 0) { chosen = indexed_commits[i]; - reused_bitmap = find_reused_bitmap(&chosen->object.oid); } else { chosen = indexed_commits[i + next]; for (j = 0; j <= next; ++j) { struct commit *cm = indexed_commits[i + j]; - reused_bitmap = find_reused_bitmap(&cm->object.oid); - if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) { + if ((cm->object.flags & NEEDS_BITMAP) != 0) { chosen = cm; break; } @@ -557,7 +517,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, } } - push_bitmapped_commit(chosen, reused_bitmap); + push_bitmapped_commit(chosen); i += next + 1; display_progress(writer.progress, i); diff --git a/pack-bitmap.c b/pack-bitmap.c index 60c781d100..d1368b69bb 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1338,9 +1338,9 @@ void test_bitmap_walk(struct rev_info *revs) free_bitmap_index(bitmap_git); } -static int rebuild_bitmap(uint32_t *reposition, - struct ewah_bitmap *source, - struct bitmap *dest) +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest) { uint32_t pos = 0; struct ewah_iterator it; @@ -1369,19 +1369,11 @@ static int rebuild_bitmap(uint32_t *reposition, return 0; } -int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, - struct packing_data *mapping, - kh_oid_map_t *reused_bitmaps, - int show_progress) +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping) { uint32_t i, num_objects; uint32_t *reposition; - struct bitmap *rebuild; - struct stored_bitmap *stored; - struct progress *progress = NULL; - - khiter_t hash_pos; - int hash_ret; num_objects = bitmap_git->pack->num_objects; reposition = xcalloc(num_objects, sizeof(uint32_t)); @@ -1399,33 +1391,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, reposition[i] = oe_in_pack_pos(mapping, oe) + 1; } - rebuild = bitmap_new(); - i = 0; - - if (show_progress) - progress = start_progress("Reusing bitmaps", 0); - - kh_foreach_value(bitmap_git->bitmaps, stored, { - if (stored->flags & BITMAP_FLAG_REUSE) { - if (!rebuild_bitmap(reposition, - lookup_stored_bitmap(stored), - rebuild)) { - hash_pos = kh_put_oid_map(reused_bitmaps, - stored->oid, - &hash_ret); - kh_value(reused_bitmaps, hash_pos) = - bitmap_to_ewah(rebuild); - } - bitmap_reset(rebuild); - display_progress(progress, ++i); - } - }); - - stop_progress(&progress); - - free(reposition); - bitmap_free(rebuild); - return 0; + return reposition; } void free_bitmap_index(struct bitmap_index *b) diff --git a/pack-bitmap.h b/pack-bitmap.h index 1203120c43..afa4115136 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -73,7 +73,11 @@ void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, struct pack_idx_entry **index, uint32_t index_nr); -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping); +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Tue Nov 17 21:48:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A0B5C2D0E4 for ; Tue, 17 Nov 2020 21:48:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2C86B241A7 for ; Tue, 17 Nov 2020 21:48:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="wDL3JinQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728568AbgKQVsT (ORCPT ); Tue, 17 Nov 2020 16:48:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53786 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728385AbgKQVsT (ORCPT ); Tue, 17 Nov 2020 16:48:19 -0500 Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8F2FC0613CF for ; Tue, 17 Nov 2020 13:48:18 -0800 (PST) Received: by mail-qv1-xf41.google.com with SMTP id u23so4175062qvf.1 for ; Tue, 17 Nov 2020 13:48:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=eid1orQflA5dQESKfzDY36urrmCU8GuKTyfyOBrz318=; b=wDL3JinQlj+o3czGxNiG0w85utgzdLuNcibAq0OuQ427aVT3Gsx9LUPWqrLmvafUJN R4zCCuvBhKJb3/hJw0XAkT97KJlRGdz20jZGEeDNGFcakGgROBB28G4OgomgXkyuTWWJ OufXuc6GuGm8Ak9oJlPM/LFulT4AVnyH0Gx7kCEv9+3JW0/zmbHENOeBAP9BOoVLkoVc 3okmO/fAUhBmMupSWVtkupqInQspuIM2gsW3odZ1E6ct6NzlCGeZK3sBtGB57hH+6lFB y9A73XYnuB2HVkJF9+op+UIHiMNrOoIqiTdov2/4MsznQP0SFkPKjmwt9mTgnjlg5dFE besQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=eid1orQflA5dQESKfzDY36urrmCU8GuKTyfyOBrz318=; b=MAv5LHASHR3qPyoVtT/x8rsRyGKzaIfiU7q+Ke/du5RAdkQtuloQ1MJoruHgkzlUJa UoqiXsmFuOatSbgx5AJeyTY3Ey5KfyremOD5+ah5ROvle3ySIcK/TvQ9JtptmpAHJRTq sYBR1/SM9ARXy7FiPPMvUjUbLsTJkRlhj0HSf7u9lhQUHzBerKQb5Y2PFzgTt7NpBp8j 9DHEVIcyjFu2k/DpkK71lFEaSdiOysbsmdC1km1D0a16M5I3AQ8HueRQdsnvdeD5Eb3i 6icQMaZe/XQZ3KllxSios5E2clVEKiD3JEdKhSRS7suLysuHgUdqGFz7hJtKHw6BPHwr 7aEQ== X-Gm-Message-State: AOAM53296GBzyqwaOI0w3p+2KfIjHPorRsPkY6DCIAW3KFHLB7fKv3td Zmy9jXh/DYL4IeMmbpLTQ2LNKfD/C+ZF6qhA X-Google-Smtp-Source: ABdhPJwj8ireI1QBqWnuf/R+OWFamPYvtmncj0hIcUVSQRUaHzkwLmREkF73NT7FWc5gUQGpclkN5g== X-Received: by 2002:a05:6214:247:: with SMTP id k7mr1482891qvt.32.1605649697636; Tue, 17 Nov 2020 13:48:17 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id p73sm15533051qka.79.2020.11.17.13.48.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:48:17 -0800 (PST) Date: Tue, 17 Nov 2020 16:48:15 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 20/24] pack-bitmap: factor out 'bitmap_for_commit()' Message-ID: <9928b3c7da33edd8d4beae006a74dd682daf5fa5.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org A couple of callers within pack-bitmap.c duplicate logic to lookup a given object id in the bitamps khash. Factor this out into a new function, 'bitmap_for_commit()' to reduce some code duplication. Make this new function non-static, since it will be used in later commits from outside of pack-bitmap.c. Signed-off-by: Taylor Blau --- pack-bitmap.c | 33 +++++++++++++++++++-------------- pack-bitmap.h | 2 ++ 2 files changed, 21 insertions(+), 14 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index d1368b69bb..5efb8af121 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -380,6 +380,16 @@ struct include_data { struct bitmap *seen; }; +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit) +{ + khiter_t hash_pos = kh_get_oid_map(bitmap_git->bitmaps, + commit->object.oid); + if (hash_pos >= kh_end(bitmap_git->bitmaps)) + return NULL; + return lookup_stored_bitmap(kh_value(bitmap_git->bitmaps, hash_pos)); +} + static inline int bitmap_position_extended(struct bitmap_index *bitmap_git, const struct object_id *oid) { @@ -465,10 +475,10 @@ static void show_commit(struct commit *commit, void *data) static int add_to_include_set(struct bitmap_index *bitmap_git, struct include_data *data, - const struct object_id *oid, + struct commit *commit, int bitmap_pos) { - khiter_t hash_pos; + struct ewah_bitmap *partial; if (data->seen && bitmap_get(data->seen, bitmap_pos)) return 0; @@ -476,10 +486,9 @@ static int add_to_include_set(struct bitmap_index *bitmap_git, if (bitmap_get(data->base, bitmap_pos)) return 0; - hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid); - if (hash_pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos); - bitmap_or_ewah(data->base, lookup_stored_bitmap(st)); + partial = bitmap_for_commit(bitmap_git, commit); + if (partial) { + bitmap_or_ewah(data->base, partial); return 0; } @@ -498,8 +507,7 @@ static int should_include(struct commit *commit, void *_data) (struct object *)commit, NULL); - if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid, - bitmap_pos)) { + if (!add_to_include_set(data->bitmap_git, data, commit, bitmap_pos)) { struct commit_list *parent = commit->parents; while (parent) { @@ -1282,10 +1290,10 @@ void test_bitmap_walk(struct rev_info *revs) { struct object *root; struct bitmap *result = NULL; - khiter_t pos; size_t result_popcnt; struct bitmap_test_data tdata; struct bitmap_index *bitmap_git; + struct ewah_bitmap *bm; if (!(bitmap_git = prepare_bitmap_git(revs->repo))) die("failed to load bitmap indexes"); @@ -1297,12 +1305,9 @@ void test_bitmap_walk(struct rev_info *revs) bitmap_git->version, bitmap_git->entry_count); root = revs->pending.objects[0].item; - pos = kh_get_oid_map(bitmap_git->bitmaps, root->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *bm = lookup_stored_bitmap(st); + bm = bitmap_for_commit(bitmap_git, (struct commit *)root); + if (bm) { fprintf(stderr, "Found bitmap for %s. %d bits / %08x checksum\n", oid_to_hex(&root->oid), (int)bm->bit_size, ewah_checksum(bm)); diff --git a/pack-bitmap.h b/pack-bitmap.h index afa4115136..25dfcf5615 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -78,6 +78,8 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, int rebuild_bitmap(const uint32_t *reposition, struct ewah_bitmap *source, struct bitmap *dest); +struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, + struct commit *commit); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); From patchwork Tue Nov 17 21:48:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C6C8C63697 for ; Tue, 17 Nov 2020 21:48:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F2A16241A7 for ; Tue, 17 Nov 2020 21:48:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="x5mOGwug" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728574AbgKQVsZ (ORCPT ); Tue, 17 Nov 2020 16:48:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728385AbgKQVsY (ORCPT ); Tue, 17 Nov 2020 16:48:24 -0500 Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 219BAC0613CF for ; Tue, 17 Nov 2020 13:48:23 -0800 (PST) Received: by mail-qv1-xf41.google.com with SMTP id e14so4438126qve.3 for ; Tue, 17 Nov 2020 13:48:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=IhkxgLth/z+cgMyj/QWQzcwniJPc/I0VOjeQDzXCE7Q=; b=x5mOGwugUhnpEM9xLOtOEyfUhSn/s+ZxMpM8TwqJJzpF4tgKxcEBZoUJGnezjS3MEg vVnwu9zqr/vSwOi2ce/7K7K/J8KsNofZz/fj6QfWQKaKvV+opc7FfKd5k20uFE9VTIZG Us9qvckD4jeqAK7fcYbzhyFGc7Ienwi+5e4bkGqsl1F9U8ZnoJrFovOgsnAspqSyOwnH aI/MKpfYEGBUpHqr2YB5ZUopTrcWoEG0M4emQnRDh5ptVn0bi4yIIR7fbWZ+anfs9mR2 Ow3j47vjqlMD7SI//yludgnplpcM8ztKrwbDKrNVg17hylHZh2J8kMj2KXt7SOKrSOOB uQtA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=IhkxgLth/z+cgMyj/QWQzcwniJPc/I0VOjeQDzXCE7Q=; b=fuKSKtpek8vQYr60rRIZ1gn4m72b5DRXwa7Zt/rMQjPZVRpXmXlYN//VvYHMvyM6C/ MN1AWL+0E24b/8cpmiNcQKy/SlCqahAx5ZtIE5lMqDxmMWbJnmpiyWtDKMcZqBzsLnlm RYSEdPRPA5n4+6LbMHWTRAnF0sgygf6JC6hZcv5vmp6cwEKVfWAD+AwDu10d26Rni6lm FTBAskRW+754z4WlpM7hg0mrFv/5eO5bGSS7Qw0XBkUeYU9XyYL2wADjodjyswd9NFZw Falc0awMo07kFilquOViIL9qBiRo2c0D6V03CvCfPqOFBwvz9PO7JftkDwLPhfUf//Po sDvw== X-Gm-Message-State: AOAM530ETjqbJ2uadB268W4lJqmRAOIXtJ982lCdeoJP58p3SNLo8D+l qM3zlxB1JICO5oM+4OdROP8bj+re5/g1FwKw X-Google-Smtp-Source: ABdhPJzZ3YtQhZ9d6DDe5WWjB7Mo+++JDbyDpWPNZ4YvPsrJ0mOFKCgp0syM8edZ87rD3cm7gSBO/A== X-Received: by 2002:a0c:b34a:: with SMTP id a10mr1790753qvf.15.1605649702015; Tue, 17 Nov 2020 13:48:22 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id 72sm15579116qkn.44.2020.11.17.13.48.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:48:21 -0800 (PST) Date: Tue, 17 Nov 2020 16:48:19 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 21/24] pack-bitmap: factor out 'add_commit_to_bitmap()' Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org 'find_objects()' currently needs to interact with the bitmaps khash pretty closely. To make 'find_objects()' read a little more straightforwardly, remove some of the khash-level details into a new function that describes what it does: 'add_commit_to_bitmap()'. Signed-off-by: Taylor Blau --- pack-bitmap.c | 36 +++++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 5efb8af121..d88745fb02 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -521,6 +521,23 @@ static int should_include(struct commit *commit, void *_data) return 1; } +static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, + struct bitmap **base, + struct commit *commit) +{ + struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit); + + if (!or_with) + return 0; + + if (*base == NULL) + *base = ewah_to_bitmap(or_with); + else + bitmap_or_ewah(*base, or_with); + + return 1; +} + static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct rev_info *revs, struct object_list *roots, @@ -544,21 +561,10 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct object *object = roots->item; roots = roots->next; - if (object->type == OBJ_COMMIT) { - khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *or_with = lookup_stored_bitmap(st); - - if (base == NULL) - base = ewah_to_bitmap(or_with); - else - bitmap_or_ewah(base, or_with); - - object->flags |= SEEN; - continue; - } + if (object->type == OBJ_COMMIT && + add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) { + object->flags |= SEEN; + continue; } object_list_insert(object, ¬_mapped); From patchwork Tue Nov 17 21:48:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F305C63697 for ; Tue, 17 Nov 2020 21:48:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A2A38241A7 for ; Tue, 17 Nov 2020 21:48:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="mOWzHUGw" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728579AbgKQVsb (ORCPT ); Tue, 17 Nov 2020 16:48:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53820 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727386AbgKQVsa (ORCPT ); Tue, 17 Nov 2020 16:48:30 -0500 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A866FC0613CF for ; Tue, 17 Nov 2020 13:48:30 -0800 (PST) Received: by mail-qt1-x843.google.com with SMTP id g15so17077641qtq.13 for ; Tue, 17 Nov 2020 13:48:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=gualvIaBaGBiJTzE/Bia/VmJUrpSkpUb+YalyQUJ8XY=; b=mOWzHUGwE4CFJoBCQzj0qJ4lh1SsfLIcqyao9XWFkrN8m5aAyI2gCo5XIF7wpdCF5q 5Ne+ToBPC08ygEXJxASTVmI2ostx9qSxw3iT6Rw9w4NhaKsweymNiDRAZPpJXLu0Y0dg r9zagL5AiHFzUmhPuv4oMaq9Hvg8AHFd1laZ3a8Ptsu0ZmP816Uk1rAum65HO42nbCQ0 AhUzFVtMf3bBen9nRZYrV8R+m5Ybjy+bxWo3Vyy8U1Z/aeZaGe9s/BwXGC0BU85a6zuG /6w9yn8r6ZA9gBwvIketngZWbmaO0gILome6AEajpQhmArtZBwUNxAbA6KAg+MNES0Wf TDYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=gualvIaBaGBiJTzE/Bia/VmJUrpSkpUb+YalyQUJ8XY=; b=t4A6jP2AupWRsCVP7t6qO2ZOiTse7RGtgGowIv0RnJXxwK51RPxJy3k3jupDsdMM0R rExXlKa9hgNC3bSoDUWEphBkBf92UW6T7+VDs/jcBJ2JcRtA4UHTRzCIaxxWKhh2cBNQ gMuqxcCU9FTcuPMQ0Wmg85+byhKgXg1G9CtP41bYTxUFJZMI3RskVtE08iCPPi/xeuPC oc5wSoPLSr95OL3lbCotuy7Tht/118K04FUbvZERsgIeewgX25/jn/nwmIBCuKxmfNCC Wt927fpvxuZ1T+qDxlW4SDz1Yt4sqRZnk3NjdJCWOWqonlkSpSzein0PCs11fCD0NlSC ls8Q== X-Gm-Message-State: AOAM5323VJvXWhtwTG80WDvxAa2Ac4ZMdCXcZX7qbMMYwAvv+FKjC5SE VT1Ptrf5eCt2CiLDIWzXFtzYzgF0VNjIr7rL X-Google-Smtp-Source: ABdhPJzEYnhzLMCavzfD/oBgFsm/B2MtNlussKHnNVdCSv/zHD7YgH32lN5ZsW/ak0165g1PPsKy0A== X-Received: by 2002:ac8:4247:: with SMTP id r7mr1817306qtm.291.1605649709539; Tue, 17 Nov 2020 13:48:29 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id d19sm4853014qtd.32.2020.11.17.13.48.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:48:28 -0800 (PST) Date: Tue, 17 Nov 2020 16:48:26 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 22/24] pack-bitmap-write: use existing bitmaps Message-ID: <4bf5e78a54dfdcbe13dd66ba4c5955a159ea181d.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee When constructing new bitmaps, we perform a commit and tree walk in fill_bitmap_commit() and fill_bitmap_tree(). This walk would benefit from using existing bitmaps when available. We must track the existing bitmaps and translate them into the new object order, but this is generally faster than parsing trees. In fill_bitmap_commit(), we must reorder thing somewhat. The priority queue walks commits from newest-to-oldest, which means we correctly stop walking when reaching a commit with a bitmap. However, if we walk trees from top to bottom, then we might be parsing trees that are actually part of a re-used bitmap. To avoid over-walking trees, add them to a LIFO queue and walk them from bottom-to-top after exploring commits completely. On git.git, this reduces a second immediate bitmap computation from 2.0s to 1.0s. On linux.git, we go from 32s to 22s. On chromium's fork network, we go from 227s to 198s. Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 42 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 1995f75818..37204b691c 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -340,20 +340,39 @@ static void fill_bitmap_tree(struct bitmap *bitmap, static void fill_bitmap_commit(struct bb_commit *ent, struct commit *commit, - struct prio_queue *queue) + struct prio_queue *queue, + struct prio_queue *tree_queue, + struct bitmap_index *old_bitmap, + const uint32_t *mapping) { if (!ent->bitmap) ent->bitmap = bitmap_new(); - bitmap_set(ent->bitmap, find_object_pos(&commit->object.oid)); prio_queue_put(queue, commit); while (queue->nr) { struct commit_list *p; struct commit *c = prio_queue_get(queue); + /* + * If this commit has an old bitmap, then translate that + * bitmap and add its bits to this one. No need to walk + * parents or the tree for this commit. + */ + if (old_bitmap && mapping) { + struct ewah_bitmap *old; + + old = bitmap_for_commit(old_bitmap, c); + if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) + continue; + } + + /* + * Mark ourselves and queue our tree. The commit + * walk ensures we cover all parents. + */ bitmap_set(ent->bitmap, find_object_pos(&c->object.oid)); - fill_bitmap_tree(ent->bitmap, get_commit_tree(c)); + prio_queue_put(tree_queue, get_commit_tree(c)); for (p = c->parents; p; p = p->next) { int pos = find_object_pos(&p->item->object.oid); @@ -363,6 +382,9 @@ static void fill_bitmap_commit(struct bb_commit *ent, } } } + + while (tree_queue->nr) + fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue)); } static void store_selected(struct bb_commit *ent, struct commit *commit) @@ -386,6 +408,9 @@ void bitmap_writer_build(struct packing_data *to_pack) size_t i; int nr_stored = 0; /* for progress */ struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; + struct prio_queue tree_queue = { NULL }; + struct bitmap_index *old_bitmap; + uint32_t *mapping; writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; @@ -395,6 +420,12 @@ void bitmap_writer_build(struct packing_data *to_pack) trace2_region_enter("pack-bitmap-write", "building_bitmaps_total", the_repository); + old_bitmap = prepare_bitmap_git(to_pack->repo); + if (old_bitmap) + mapping = create_bitmap_mapping(old_bitmap, to_pack); + else + mapping = NULL; + bitmap_builder_init(&bb, &writer); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; @@ -402,7 +433,8 @@ void bitmap_writer_build(struct packing_data *to_pack) struct commit *child; int reused = 0; - fill_bitmap_commit(ent, commit, &queue); + fill_bitmap_commit(ent, commit, &queue, &tree_queue, + old_bitmap, mapping); if (ent->selected) { store_selected(ent, commit); @@ -428,7 +460,9 @@ void bitmap_writer_build(struct packing_data *to_pack) ent->bitmap = NULL; } clear_prio_queue(&queue); + clear_prio_queue(&tree_queue); bitmap_builder_clear(&bb); + free(mapping); stop_progress(&writer.progress); From patchwork Tue Nov 17 21:48:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8354C2D0E4 for ; Tue, 17 Nov 2020 21:48:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8D95B241A7 for ; Tue, 17 Nov 2020 21:48:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="fXBns2uk" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728596AbgKQVsh (ORCPT ); Tue, 17 Nov 2020 16:48:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53838 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727386AbgKQVsh (ORCPT ); Tue, 17 Nov 2020 16:48:37 -0500 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37CE4C0613CF for ; Tue, 17 Nov 2020 13:48:36 -0800 (PST) Received: by mail-qt1-x843.google.com with SMTP id p12so17108990qtp.7 for ; Tue, 17 Nov 2020 13:48:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=T+oflGXLl8xWYLzNnpWt2kvd3xic8KRtedehKZGIHmA=; b=fXBns2ukmUVEECWBVda5xJLCvUYcwKlqZ8iF7scXQaVevKVSp37KlI8qOg/Ruks3IJ f/sn5R6KwlEsxts5JB49popu2Gly4UwfsNNyhRjwp2ZCY3PsbmYGnegjAce0M1u1ROzs vtMG6Y/CCx8fQUEMFLbU85vR3wEvY3UVTDmoa3VLiMHSanrCXTJOKWoC3we6wFa9RtFE T+RidtUUhmm5A2XPqs1THxxXSvzh+FmQkmLvfwNyIcBoSLDk9iARUD2InXzrXo4+fgn7 5luIK65GJNnamvm3gl+PK/AoLFPpNdEhs1UUgpO4KjGFms7SeBkf6YXvxfvaRNW3wO8E jdgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=T+oflGXLl8xWYLzNnpWt2kvd3xic8KRtedehKZGIHmA=; b=r3XkpI2aloOyE85aNcDQwamLW1dHWToUO3cx5wIb9kPOE9WdLUte9Ti2jMbpxmmN94 OSROY95ZcOXgp6t9qovwuW+NyzU55OcQykb9WPWfuqbiso37kkCxPP48mVJgI+HG2Ppt R/pZzjhTdm6DkDqQzkUF8spVCNO7SP/KfcwHfJLZ974och9RL4Ente3BKALw+JDiG6gE kNCc+BM0i5DP5zp15J9PvXIRwraBF2l8UpjlzMU9rSVmHB90Ng5KH7Ox1bkuCUkV5RTN 77pjSjLzAK7HYiJ0pDg9Msq3jRG6DXdx/Y8vnAv6tQTitm+NPM7hJeLYOSjQ6AWBPj/e 0Dhw== X-Gm-Message-State: AOAM533+Oc8zapaoCkNnpZnl2khLK5bFP9raGOU8MCT6w/hH6jePqAoL 5HRkNZDdAKdpGiWiCuaam0MO7D50NrwToHyI X-Google-Smtp-Source: ABdhPJxlz+1xpkDXwJHNlfi1rF9KEXe88XjeF782jkjNHq/4ybXy7kD3FKP1AR+OtA2v+uMJPDqRhw== X-Received: by 2002:ac8:4250:: with SMTP id r16mr1891647qtm.225.1605649715034; Tue, 17 Nov 2020 13:48:35 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id v32sm4943673qtb.42.2020.11.17.13.48.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:48:34 -0800 (PST) Date: Tue, 17 Nov 2020 16:48:32 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 23/24] pack-bitmap-write: relax unique rewalk condition Message-ID: <1da4fa0fb85fe848aa86987e767b33d296f8f878.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee The previous commits improved the bitmap computation process for very long, linear histories with many refs by removing quadratic growth in how many objects were walked. The strategy of computing "intermediate commits" using bitmasks for which refs can reach those commits partitioned the poset of reachable objects so each part could be walked exactly once. This was effective for linear histories. However, there was a (significant) drawback: wide histories with many refs had an explosion of memory costs to compute the commit bitmasks during the exploration that discovers these intermediate commits. Since these wide histories are unlikely to repeat walking objects, the benefit of walking objects multiple times was not expensive before. But now, the commit walk *before computing bitmaps* is incredibly expensive. In an effort to discover a happy medium, this change reduces the walk for intermediate commits to only the first-parent history. This focuses the walk on how the histories converge, which still has significant reduction in repeat object walks. It is still possible to create quadratic behavior in this version, but it is probably less likely in realistic data shapes. Here is some data taken on a fresh clone of the kernel: | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- original | 64.044 | 83.241 | 2.088 | 2.194 | last patch | 44.811 | 27.828 | 2.289 | 2.358 | this patch | 100.641 | 35.560 | 2.152 | 2.224 | Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 14 +++++--------- t/t5310-pack-bitmaps.sh | 27 ++++++++++++++------------- 2 files changed, 19 insertions(+), 22 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 37204b691c..b0493d971d 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -199,7 +199,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, { struct rev_info revs; struct commit *commit; - unsigned int i, num_maximal; + unsigned int i, num_maximal = 0; memset(bb, 0, sizeof(*bb)); init_bb_data(&bb->data); @@ -207,6 +207,7 @@ static void bitmap_builder_init(struct bitmap_builder *bb, reset_revision_walk(); repo_init_revisions(writer->to_pack->repo, &revs, NULL); revs.topo_order = 1; + revs.first_parent_only = 1; for (i = 0; i < writer->selected_nr; i++) { struct commit *c = writer->selected[i].commit; @@ -221,13 +222,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, add_pending_object(&revs, &c->object, ""); } - num_maximal = writer->selected_nr; if (prepare_revision_walk(&revs)) die("revision walk setup failed"); while ((commit = get_revision(&revs))) { - struct commit_list *p; + struct commit_list *p = commit->parents; struct bb_commit *c_ent; parse_commit_or_die(commit); @@ -235,16 +235,12 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); if (c_ent->maximal) { - if (!c_ent->selected) { - bitmap_set(c_ent->commit_mask, num_maximal); - num_maximal++; - } - + num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); bb->commits[bb->commits_nr++] = commit; } - for (p = commit->parents; p; p = p->next) { + if (p) { struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); int c_not_p, p_not_c; diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh index 1691710ec1..a83e7a93fb 100755 --- a/t/t5310-pack-bitmaps.sh +++ b/t/t5310-pack-bitmaps.sh @@ -43,23 +43,24 @@ has_any () { # \| # * (base) # +# We only push bits down the first-parent history, which +# makes some of these commits unimportant! +# # The important part for the maximal commit algorithm is how # the bitmasks are extended. Assuming starting bit positions -# for master (bit 0) and other (bit 1), and some flexibility -# in the order that merge bases are visited, the bitmasks at -# the end should be: +# for master (bit 0) and other (bit 1), the bitmasks at the +# end should be: # # master: 1 (maximal, selected) # other: 01 (maximal, selected) -# octo-master: 1 -# octo-other: 01 -# merge-right: 111 (maximal) -# (l1): 111 -# (r1): 111 -# merge-left: 1101 (maximal) -# (l2): 11111 (maximal) -# (r2): 111101 (maximal) -# (base): 1111111 (maximal) +# (base): 11 (maximal) +# +# This complicated history was important for a previous +# version of the walk that guarantees never walking a +# commit multiple times. That goal might be important +# again, so preserve this complicated case. For now, this +# test will guarantee that the bitmaps are computed +# correctly, even with the repeat calculations. test_expect_success 'setup repo with moderate-sized history' ' test_commit_bulk --id=file 10 && @@ -113,7 +114,7 @@ test_expect_success 'full repack creates bitmaps' ' ls .git/objects/pack/ | grep bitmap >output && test_line_count = 1 output && grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace && - grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace + grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace ' test_expect_success 'rev-list --test-bitmap verifies bitmaps' ' From patchwork Tue Nov 17 21:48:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 11913627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BCB0DC2D0E4 for ; Tue, 17 Nov 2020 21:48:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 635E4241A7 for ; Tue, 17 Nov 2020 21:48:44 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="JeonMeUJ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728601AbgKQVsn (ORCPT ); Tue, 17 Nov 2020 16:48:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53854 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727386AbgKQVsn (ORCPT ); Tue, 17 Nov 2020 16:48:43 -0500 Received: from mail-qv1-xf43.google.com (mail-qv1-xf43.google.com [IPv6:2607:f8b0:4864:20::f43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2A67C0613CF for ; Tue, 17 Nov 2020 13:48:41 -0800 (PST) Received: by mail-qv1-xf43.google.com with SMTP id u23so4175824qvf.1 for ; Tue, 17 Nov 2020 13:48:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=h62hnBTQj+3iLx4T5aRYU4yZUSJ3pyFT8cbw1VivSPY=; b=JeonMeUJsvOEUFeaRN3eNDVeV7J4sNR4elorMpofkQUYYFSXD+pFj0yuUY2tU2DlDm FYTUsLQoiPrD5ZnhOFoeNTA4wcqPmbeCW4UoIwzlv0up6XycpA7qNkvLgCHLEqsqvXJi qf312I907HfGT2iZRF4GYkCYL9zN+RfxksfDMuaRpJlnZ6nU6a8QUQQ9F9st58Wa1u9F mwGnOKd0Xp69JFxaiJ7nZEHX0h4HIAeZGnjoLtDmBPK5W7XlO2zxwFX5G+XMkz/Hq0ci OM3W7U1XzrdImBu/w/M/nn45Bw3jR44SWWUn0gSJXYhl64kY0UMvZV9aY/GkZWBY6Svh izhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=h62hnBTQj+3iLx4T5aRYU4yZUSJ3pyFT8cbw1VivSPY=; b=Pfd1ZcJQ6nFxx6pPyBVGCLIbjuxw8FQUrYj1luppk87auEdaTwuwyj1wRkGzurE7DG M+kKuZLL7tcDAvucje6OMZBj9jDz7G1aVGrQIsfHDfXcMPfvxFl64P5t1Gn3mrkuEDZr BR4jfYzrB1lozQbTkEMFt+tzzg1dhvhOQENiy5OZ6ytgsmCSDko3/84bxxeucAAR9aX8 ywXcNddyYE5+gD7pexeieL1WJ0BiWgf+ZOdy4HEDjo6ZfPxLB8W0Swe5qea/4ins4OJz 55EwkXlSdggXIsqrnJMLoQLzx7aY8+sM7oQwDhmyu/9O0PhqvEoLq/9e7sGp4gGih4Jg 8hcw== X-Gm-Message-State: AOAM532kbrBuZPlXsJyjDtTfnCF341Xb6WSYtgG7M9qMA0Tg29KpNwr6 0Cjt154J2fOm+WnwtrEqiaKGx7WfwWfTB29P X-Google-Smtp-Source: ABdhPJxxXfTgjjMsq+KVwGCRIZw/yqhv6bzwAE+jB7gXU0O/asKOTkeurG7ZJt3NYGDMJIEhGniQPg== X-Received: by 2002:ad4:4745:: with SMTP id c5mr1535151qvx.2.1605649720915; Tue, 17 Nov 2020 13:48:40 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7fe5:c4d6:f587:dc1f]) by smtp.gmail.com with ESMTPSA id 205sm15270396qki.50.2020.11.17.13.48.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Nov 2020 13:48:40 -0800 (PST) Date: Tue, 17 Nov 2020 16:48:36 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net, martin.agren@gmail.com, szeder.dev@gmail.com Subject: [PATCH v2 24/24] pack-bitmap-write: better reuse bitmaps Message-ID: <42399a1c2e52e1d055a2d0ad96af2ca4dce6b1a0.1605649533.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee If the old bitmap file contains a bitmap for a given commit, then that commit does not need help from intermediate commits in its history to compute its final bitmap. Eject that commit from the walk and insert it as a maximal commit in the list of commits for computing bitmaps. This helps the repeat bitmap computation task, even if the selected commits shift drastically. This helps when a previously-bitmapped commit exists in the first-parent history of a newly-selected commit. Since we stop the walk at these commits and we use a first-parent walk, it is harder to walk "around" these bitmapped commits. It's not impossible, but we can greatly reduce the computation time for many selected commits. | runtime (sec) | peak heap (GB) | | | | | from | with | from | with | | scratch | existing | scratch | existing | -----------+---------+----------+---------+----------- last patch | 100.641 | 35.560 | 2.152 | 2.224 | this patch | 99.720 | 11.696 | 2.152 | 2.217 | Signed-off-by: Derrick Stolee Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index b0493d971d..3ac90ae410 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -195,7 +195,8 @@ struct bitmap_builder { }; static void bitmap_builder_init(struct bitmap_builder *bb, - struct bitmap_writer *writer) + struct bitmap_writer *writer, + struct bitmap_index *old_bitmap) { struct rev_info revs; struct commit *commit; @@ -234,12 +235,26 @@ static void bitmap_builder_init(struct bitmap_builder *bb, c_ent = bb_data_at(&bb->data, commit); + if (old_bitmap && bitmap_for_commit(old_bitmap, commit)) { + /* + * This commit has an existing bitmap, so we can + * get its bits immediately without an object + * walk. There is no need to continue walking + * beyond this commit. + */ + c_ent->maximal = 1; + p = NULL; + } + if (c_ent->maximal) { num_maximal++; ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc); bb->commits[bb->commits_nr++] = commit; } + if (!c_ent->commit_mask) + continue; + if (p) { struct bb_commit *p_ent = bb_data_at(&bb->data, p->item); int c_not_p, p_not_c; @@ -422,7 +437,7 @@ void bitmap_writer_build(struct packing_data *to_pack) else mapping = NULL; - bitmap_builder_init(&bb, &writer); + bitmap_builder_init(&bb, &writer, old_bitmap); for (i = bb.commits_nr; i > 0; i--) { struct commit *commit = bb.commits[i-1]; struct bb_commit *ent = bb_data_at(&bb.data, commit);