From patchwork Mon Apr 29 20:42:54 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13647701
Date: Mon, 29 Apr 2024 16:42:54 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
Message-ID: <43fd5e3597151a86254e18e08ffd8cadbcb6e4f0.1714422410.git.me@ttaylorr.com>

Prepare to implement pseudo-merge bitmaps over the next several commits
by first describing the serialization format which will store the new
pseudo-merge bitmaps themselves.

This format is implemented as an optional extension within the bitmap v1
format, making it compatible with previous versions of Git, as well as
the original .bitmap implementation within JGit.

The format (as well as a general description of pseudo-merge bitmaps,
and motivating use-case(s)) is described in detail in the patch contents
below, but the high-level description is as follows:

  - An array of pseudo-merge bitmaps, each containing a pair of EWAH
    bitmaps: one describing the set of pseudo-merge "parents", and
    another describing the set of object(s) reachable from those
    parents.

  - A lookup table to determine which pseudo-merge(s) a given commit
    appears in. An optional extended lookup table follows when there is
    at least one commit which appears in multiple pseudo-merge groups.

  - Trailing metadata, including the number of pseudo-merge(s), number
    of unique parents, the offset within the .bitmap file for the
    pseudo-merge commit lookup table, and the size of the optional
    extension itself.

Signed-off-by: Taylor Blau
---
 Documentation/technical/bitmap-format.txt | 179 ++++++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f5d200939b0..63a7177ac08 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -255,3 +255,182 @@ triplet is
 	- xor_row (4 byte integer, network byte order): ::
 	The position of the triplet whose bitmap is used to compress
 	this one, or `0xffffffff` if no such bitmap exists.
+
+Pseudo-merge bitmaps
+--------------------
+
+If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
+bytes (preceding the name-hash cache, commit lookup table, and trailing
+checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
+
+A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
+follows:
+
+Commit bitmap::
+
+  A bitmap whose set bits describe the set of commits included in the
+  pseudo-merge's "merge" bitmap (as below).
+
+Merge bitmap::
+
+  A bitmap whose set bits describe the reachability closure over the set
+  of commits in the pseudo-merge's "commits" bitmap (as above). An
+  identical bitmap would be generated for an octopus merge with the same
+  set of parents as described in the commits bitmap.
+
+Pseudo-merge bitmaps can accelerate bitmap traversals when all commits
+for a given pseudo-merge are listed on either side of the traversal,
+either directly (by explicitly asking for them as part of the `HAVES`
+or `WANTS`) or indirectly (by encountering them during a fill-in
+traversal).
+
+=== Use-cases
+
+For example, suppose there exists a pseudo-merge bitmap with a large
+number of commits, all of which are listed in the `WANTS` section of
+some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
+bitmap machinery can quickly determine there is a pseudo-merge which
+satisfies some subset of the wanted objects on either side of the query.
+Then, we can inflate the EWAH-compressed bitmap, and `OR` it into the
+resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
+have to repeat the decompression and `OR`-ing step over a potentially
+large number of individual bitmaps, which can take proportionally more
+time.
+
+Another benefit of pseudo-merges arises when there is some combination
+of (a) a large number of references, with (b) poor bitmap coverage, and
+(c) deep, nested trees, making fill-in traversal relatively expensive.
+For example, suppose that there is a large enough number of tags that
+bitmapping each of the tags individually is infeasible. Without
+pseudo-merge bitmaps, computing the result of, say, `git rev-list
+--use-bitmap-index --count --objects --tags` would likely require a
+large amount of fill-in traversal. But when a large quantity of those
+tags are stored together in a pseudo-merge bitmap, the bitmap machinery
+can take advantage of the fact that we only care about the union of
+objects reachable from all of those tags, and answer the query much
+faster.
+
+=== File format
+
+If enabled, pseudo-merge bitmaps are stored in an optional section at
+the end of a `.bitmap` file. The format is as follows:
+
+....
++-------------------------------------------+
+|               .bitmap File                |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge bitmaps (Variable Length)   |
+|  +---------------------------+            |
+|  | commits_bitmap (EWAH)     |            |
+|  +---------------------------+            |
+|  | merge_bitmap (EWAH)       |            |
+|  +---------------------------+            |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Lookup Table                             |
+|  +------------+--------------+            |
+|  | commit_pos |    offset    |            |
+|  +------------+--------------+            |
+|  |  4 bytes   |   8 bytes    |            |
+|  +------------+--------------+            |
+|                                           |
+|  Offset Cases:                            |
+|  -------------                            |
+|                                           |
+|  1. MSB Unset: single pseudo-merge bitmap |
+|     + offset to pseudo-merge bitmap       |
+|                                           |
+|  2. MSB Set: multiple pseudo-merges       |
+|     + offset to extended lookup table     |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Extended Lookup Table (Optional)         |
+|                                           |
+|  +----+----------+----------+----------+  |
+|  | N  | Offset 1 |   ....   | Offset N |  |
+|  +----+----------+----------+----------+  |
+|  |    |  8 bytes |   ....   |  8 bytes |  |
+|  +----+----------+----------+----------+  |
+|                                           |
++-------------------------------------------+
+|                                           |
+|  Pseudo-merge Metadata                    |
+|  +------------------+----------------+    |
+|  | # pseudo-merges  | # Commits      |    |
+|  +------------------+----------------+    |
+|  | 4 bytes          | 4 bytes        |    |
+|  +------------------+----------------+    |
+|                                           |
+|  +------------------+----------------+    |
+|  | Lookup offset    | Extension size |    |
+|  +------------------+----------------+    |
+|  | 8 bytes          | 8 bytes        |    |
+|  +------------------+----------------+    |
+|                                           |
++-------------------------------------------+
+....
+
+* One or more pseudo-merge bitmaps, each containing:
+
+  ** `commits_bitmap`, an EWAH-compressed bitmap describing the set of
+     commits included in this pseudo-merge.
+
+  ** `merge_bitmap`, an EWAH-compressed bitmap describing the union of
+     the set of objects reachable from all commits listed in the
+     `commits_bitmap`.
+
+* A lookup table, mapping pseudo-merged commits to the pseudo-merges
+  they belong to. Entries appear in increasing order of each commit's
+  bit position. Each entry is 12 bytes wide, and consists of the
+  following:
+
+  ** `commit_pos`, a 4-byte unsigned value (in network byte-order)
+     containing the bit position for this commit.
+
+  ** `offset`, an 8-byte unsigned value (also in network byte-order)
+     containing one of two possible offsets, depending on whether or
+     not the most-significant bit is set.
+
+    *** If unset (i.e. `(offset & ((uint64_t)1<<63)) == 0`), the offset
+	(relative to the beginning of the `.bitmap` file) at which the
+	pseudo-merge bitmap for this commit can be read. This indicates
+	only a single pseudo-merge bitmap contains this commit.
+
+    *** If set (i.e. `(offset & ((uint64_t)1<<63)) != 0`), the offset
+	(again relative to the beginning of the `.bitmap` file) at which
+	the extended offset table can be located describing the set of
+	pseudo-merge bitmaps which contain this commit. This indicates
+	that multiple pseudo-merge bitmaps contain this commit.
+
+* An (optional) extended lookup table (written if and only if there is
+  at least one commit which appears in more than one pseudo-merge).
+  There are as many entries as commits which appear in multiple
+  pseudo-merges. Each entry contains the following:
+
+  ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
+     which contain a given commit.
+
+  ** An array of `N` 8-byte unsigned values, each of which is
+     interpreted as an offset (relative to the beginning of the
+     `.bitmap` file) at which a pseudo-merge bitmap for this commit can
+     be read. These values occur in no particular order.
+
+* Positions for all pseudo-merges, each stored as an 8-byte unsigned
+  value (in network byte-order) containing the offset (relative to the
+  beginning of the `.bitmap` file) of each consecutive pseudo-merge.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  pseudo-merges.
+
+* A 4-byte unsigned value (in network byte-order) equal to the number of
+  unique commits which appear in any pseudo-merge.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes between the start of the pseudo-merge section and the
+  beginning of the lookup table.
+
+* An 8-byte unsigned value (in network byte-order) equal to the number
+  of bytes in the pseudo-merge section (including this field).

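
The 12-byte lookup-table entry described above can be illustrated with a short decoding sketch. This is not Git's reader: the names `pm_lookup_entry`, `pm_decode_entry`, `pm_get_be32`, and `pm_get_be64` are hypothetical, and clearing the flag bit before using the value as an offset is an assumption about how a reader would treat the most-significant bit. The sketch only shows how `commit_pos` and `offset` could be pulled out of a raw buffer in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of Git's API. */
#define PM_ENTRY_WIDTH 12
#define PM_EXTENDED_FLAG ((uint64_t)1 << 63)

struct pm_lookup_entry {
	uint32_t commit_pos; /* bit position of the commit */
	uint64_t offset;     /* file offset with the flag bit cleared */
	int extended;        /* 1 if `offset` points at the extended table */
};

/* Read a 4-byte big-endian (network byte order) value. */
static uint32_t pm_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte big-endian value as two 4-byte halves. */
static uint64_t pm_get_be64(const unsigned char *p)
{
	return ((uint64_t)pm_get_be32(p) << 32) | pm_get_be32(p + 4);
}

/* Decode the PM_ENTRY_WIDTH-byte lookup entry starting at `p`. */
struct pm_lookup_entry pm_decode_entry(const unsigned char *p)
{
	struct pm_lookup_entry e;
	uint64_t raw = pm_get_be64(p + 4);

	e.commit_pos = pm_get_be32(p);
	/* The mask needs parentheses: `!=` binds tighter than `&` in C. */
	e.extended = (raw & PM_EXTENDED_FLAG) != 0;
	/* Assumption: the flag bit is not part of the offset itself. */
	e.offset = raw & ~PM_EXTENDED_FLAG;
	return e;
}
```

A caller would walk the table in `PM_ENTRY_WIDTH` strides; when `extended` is set, `offset` would locate an extended-table entry (an `N` count followed by `N` offsets) rather than a pseudo-merge bitmap.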

From patchwork Mon Apr 29 20:43:01 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13647703
Date: Mon, 29 Apr 2024 16:43:01 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 02/23] ewah: implement `ewah_bitmap_is_subset()`
Message-ID: <290d928325dedd89d8e95aa12e643434b0dd2501.1714422410.git.me@ttaylorr.com>

In order to know whether a given pseudo-merge (composed of a "parents"
bitmap and an "objects" bitmap) is "satisfied" and can be OR'd into the
bitmap result, we need to be able to quickly determine whether the
"parents" bitmap is a subset of the current set of objects reachable on
either side of a traversal.

Implement a helper function to prepare for that, which determines
whether an EWAH bitmap (the parents bitmap from the pseudo-merge) is a
subset of a non-EWAH bitmap (in this case, the results bitmap from
either side of the traversal).

This function makes use of the EWAH iterator to avoid inflating any part
of the EWAH bitmap after we determine it is not a subset of the non-EWAH
bitmap. This "fail-fast" behavior allows us to avoid a potentially large
amount of wasted effort.

Signed-off-by: Taylor Blau
---
 ewah/bitmap.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 ewah/ewok.h   |  6 ++++++
 2 files changed, 49 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index ac7e0af622a..d352fec54ce 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -138,6 +138,49 @@ void bitmap_or(struct bitmap *self, const struct bitmap *other)
 		self->words[i] |= other->words[i];
 }
 
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i;
+
+	ewah_iterator_init(&it, self);
+
+	for (i = 0; i < other->word_alloc; i++) {
+		if (!ewah_iterator_next(&word, &it)) {
+			/*
+			 * If we reached the end of `self`, and haven't
+			 * rejected `self` as a possible subset of
+			 * `other` yet, then we are done and `self` is
+			 * indeed a subset of `other`.
+			 */
+			return 1;
+		}
+		if (word & ~other->words[i]) {
+			/*
+			 * Otherwise, compare the next two pairs of
+			 * words. If the word from `self` has bit(s) not
+			 * in the word from `other`, `self` is not a
+			 * subset of `other`.
+			 */
+			return 0;
+		}
+	}
+
+	/*
+	 * If we got to this point, there may be zero or more words
+	 * remaining in `self`, with no remaining words left in `other`.
+	 * If there are any bits set in the remaining word(s) in `self`,
+	 * then `self` is not a subset of `other`.
+	 */
+	while (ewah_iterator_next(&word, &it))
+		if (word)
+			return 0;
+
+	/* `self` is definitely a subset of `other` */
+	return 1;
+}
+
 void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other)
 {
 	size_t original_size = self->word_alloc;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index c11d76c6f33..2b6c4ac499c 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,7 +179,13 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+
+/*
+ * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set
+ * of bits in 'self' is a subset of the bits in 'other'. Returns 0 otherwise.
+ */
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other);
+int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other);
 
 struct ewah_bitmap * bitmap_to_ewah(struct bitmap *bitmap);
 struct bitmap *ewah_to_bitmap(struct ewah_bitmap *ewah);

From patchwork Mon Apr 29 20:43:05 2024
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13647704
Date: Mon, 29 Apr 2024 16:43:05 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 03/23] pack-bitmap: drop unused `max_bitmaps` parameter
Message-ID: <5160859f7f3cf72de03a4644ee3d3743eaba2bc2.1714422410.git.me@ttaylorr.com>

The `max_bitmaps` parameter in `bitmap_writer_select_commits()` was
introduced back in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), making it original to the bitmap implementation in Git
itself.

When that patch was merged via 0f9e62e084 (Merge branch 'jk/pack-bitmap', 2014-02-27), its sole caller in builtin/pack-objects.c passed a value of "-1" for `max_bitmaps`, indicating no limit. Since then, the only other caller (in midx.c, added via c528e17966 (pack-bitmap: write multi-pack bitmaps, 2021-08-31)) also uses a value of "-1" for `max_bitmaps`. Since no callers have needed a finite limit for the `max_bitmaps` parameter in the nearly decade that has passed since 0f9e62e084, let's remove the parameter and any dead pieces of code connected to it. Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 2 +- midx-write.c | 2 +- pack-bitmap-write.c | 8 +------- pack-bitmap.h | 2 +- 4 files changed, 4 insertions(+), 10 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index baf0090fc8d..5060ce2dfba 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1359,7 +1359,7 @@ static void write_pack_file(void) stop_progress(&progress_state); bitmap_writer_show_progress(progress); - bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1); + bitmap_writer_select_commits(indexed_commits, indexed_commits_nr); if (bitmap_writer_build(&to_pack) < 0) die(_("failed to write bitmap index")); bitmap_writer_finish(written_list, nr_written, diff --git a/midx-write.c b/midx-write.c index 65e69d2de78..469cceaa583 100644 --- a/midx-write.c +++ b/midx-write.c @@ -838,7 +838,7 @@ static int write_midx_bitmap(const char *midx_name, for (i = 0; i < pdata->nr_objects; i++) index[pack_order[i]] = &pdata->objects[i].idx; - bitmap_writer_select_commits(commits, commits_nr, -1); + bitmap_writer_select_commits(commits, commits_nr); ret = bitmap_writer_build(pdata); if (ret < 0) goto cleanup; diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index c6c8f94cc51..c35bc81d00f 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -591,8 +591,7 @@ static int date_compare(const void *_a, const void *_b) } void 
bitmap_writer_select_commits(struct commit **indexed_commits, - unsigned int indexed_commits_nr, - int max_bitmaps) + unsigned int indexed_commits_nr) { unsigned int i = 0, j, next; @@ -615,11 +614,6 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (i + next >= indexed_commits_nr) break; - if (max_bitmaps > 0 && writer.selected_nr >= max_bitmaps) { - writer.selected_nr = max_bitmaps; - break; - } - if (next == 0) { chosen = indexed_commits[i]; } else { diff --git a/pack-bitmap.h b/pack-bitmap.h index c7dea13217a..3f96608d5c1 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -110,7 +110,7 @@ int rebuild_bitmap(const uint32_t *reposition, struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git, struct commit *commit); void bitmap_writer_select_commits(struct commit **indexed_commits, - unsigned int indexed_commits_nr, int max_bitmaps); + unsigned int indexed_commits_nr); int bitmap_writer_build(struct packing_data *to_pack); void bitmap_writer_finish(struct pack_idx_entry **index, uint32_t index_nr, From patchwork Mon Apr 29 20:43:08 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647705 Received: from mail-qv1-f50.google.com (mail-qv1-f50.google.com [209.85.219.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A0A41411EF for ; Mon, 29 Apr 2024 20:43:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423392; cv=none; b=aCaWg6l5L1TjPFmw4BQ2IYn3ITYvz22Lqb6Au9GlWWoUDZcdxvhVj158Po9X8pDQBiOzrDAdYfp3CfJMwN2UuGKrrjD54TeXRl1InOSTFK7Von1GmupgD/zcWBqGZCqrJ5QR/zoMU/cdlWGi3+cZo8+Jiir4A62suE5BlkH1f40= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423392; 
c=relaxed/simple; bh=R2R3PYNmR8O1rtREDVem/efjKmPJyiGIPUVZ9rL5wdY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=dEoUacioXkzCBcqsTvfX/mmEQJDxdU9twQrsNppdk3Z9swZ4e3teS9X797kALSXpmCVDiJAMN1vpQLaI44dTOXcBiJDOj2V0FqKbKgbHNJ66Z9zd8LWOU+rOdgAzIEnXqTbp3pVYHfUwNmVr/DZDts92P0GKKGCJ6B3k8aEB3/k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=yGlih9Ru; arc=none smtp.client-ip=209.85.219.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="yGlih9Ru" Received: by mail-qv1-f50.google.com with SMTP id 6a1803df08f44-69b4043b7b3so29331026d6.1 for ; Mon, 29 Apr 2024 13:43:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423390; x=1715028190; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=/G0CuWAteRaKdhzIRJGhjRw3j94B2GZN9Bg+THSA5jU=; b=yGlih9Ru9fAURzzyfWnLpw4fQPhd6WjrXpcl+BCJ+3i4uGcjUPxPSu35YP9rnrBdHE u1C/3iV1/BooHNtg47H74FC7XQhiZZZGxzktm7PLn5qcMvMjb8ASklR47YmeWUnULMOX NfV1ejHNfF7/9cFlRHq+lXpJHqlOdFAE7X3bFjA56KnpGmXUVZE/hJ5GqVJNC5pe7T8R KSGx68IoXd5x9v3MYjK4LD+Vv7yXJYSqXD8cOE6q9vHzNCSR/+8WGwEe3ecm8e1+ljDr eqnrSgwiufVWGQxUCCCTCHeH/v+AvpcIqXdXHgwC5RZhoa/Fat2yPBlVecp8/vaTAYJF iXnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423390; x=1715028190; 
h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=/G0CuWAteRaKdhzIRJGhjRw3j94B2GZN9Bg+THSA5jU=; b=DoHRxfLa6GN7RL7NXb41YSs2uTr/X5WDHL3Wc6MnjD39bh39S7hjDLD95MhVN6gp+J A5I7Wu/oNjHwMY9yUX9NtZ2ngwyL9DBtDxbMUB3zaPAT70r/e9jgAlp/ctiDCwhDgqYS OEmepl4VKxXt97YB3s5MNbBSOoWfBGRpExMJxyJFi3UNjJdVXxhovgKKLHA+J/yJXrsA DdQsNxtqlqwnYkxnQ6k9B8ZPLee4I3dP/KeK2cHjMGL6rVWVYUnxUAfEHAFt7QJaXrU0 Pg9rScK4ZztzqqM3Se4uNpFQrrImWwIzV/p60UYDTrUPYYtiywMxX2UOG3grBe6SKQ0O g67A== X-Gm-Message-State: AOJu0Yx5OmaAkBPYSkuQEEz9eAq6uO4HTl8iQV+q4onLwNf0lgR50phO 4e0PJel+IHIJVAMXAPLnD7LA5SPGZ61koBo1EjFH4i0GNrTyDTmfnEu9x0/Ieuo5/TnzugAiKQO +qjk= X-Google-Smtp-Source: AGHT+IGzTxYsC3+bbyL13olmrCn0CIvNoob9x04f3zwMRQEC8kLWnMtlNivk677IzhUepP0MZUJEIw== X-Received: by 2002:a05:6214:21cd:b0:6a0:cc66:3c74 with SMTP id d13-20020a05621421cd00b006a0cc663c74mr5538036qvh.18.1714423389947; Mon, 29 Apr 2024 13:43:09 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id a8-20020a0ce388000000b0069b2064b988sm10788557qvl.131.2024.04.29.13.43.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:09 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:08 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()` Message-ID: <3d7d930b1c5c4d122d8731ef0dc3fc90115573a2.1714422410.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To:

The pack-bitmap-writer machinery uses an oidmap (backed by khash.h) to map from commits selected for bitmaps (by OID) to a bitmapped_commit structure (containing the bitmap itself, among other things like its XOR offset, etc.).
This map was initialized at the start of `bitmap_writer_build()`. New entries are added in `pack-bitmap-write.c::store_selected()`, which is called by the bitmap_builder machinery (which is responsible for traversing history and generating the actual bitmaps).

Reorganize when this field is initialized and when entries are added to it so that we can quickly determine whether or not a commit is a candidate for pseudo-merge selection (a commit that was already selected to receive a bitmap is ineligible for pseudo-merge inclusion).

The changes are as follows:

  - Introduce a new `bitmap_writer_init()` function which initializes the `writer.bitmaps` field (instead of waiting until `bitmap_writer_build()` is called).

  - Add map entries in `push_bitmapped_commit()` (which is called via `bitmap_writer_select_commits()`) with OID keys and NULL values to track whether or not we *expect* to write a bitmap for some given commit.

  - Validate that an entry matching the given key (inserted with a NULL value at selection time) is found when we store a selected bitmap.
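As a rough illustration of the select-then-store protocol described above, here is a minimal sketch. It is not the code from this patch: `sketch_map`, `sketch_select()`, `sketch_store()` and `sketch_is_selected()` are hypothetical names, and a small fixed-size table keyed by an integer stands in for the khash-backed oidmap keyed by OID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SLOTS 64 /* fixed capacity; this sketch never resizes */

/* Hypothetical stand-in for the oidmap: integer keys instead of OIDs. */
struct sketch_map {
	uint32_t keys[SKETCH_SLOTS];
	const void *vals[SKETCH_SLOTS];
	unsigned char used[SKETCH_SLOTS];
};

/*
 * Linear probing: return the slot that holds `key`, or the empty slot
 * where it would be inserted, or -1 if the table is full.
 */
static int sketch_find(const struct sketch_map *m, uint32_t key)
{
	unsigned int i, pos = key % SKETCH_SLOTS;

	for (i = 0; i < SKETCH_SLOTS; i++) {
		unsigned int slot = (pos + i) % SKETCH_SLOTS;
		if (!m->used[slot] || m->keys[slot] == key)
			return (int)slot;
	}
	return -1;
}

/* Selection time: record the key with a NULL value; reject duplicates. */
static int sketch_select(struct sketch_map *m, uint32_t key)
{
	int slot = sketch_find(m, key);

	if (slot < 0 || m->used[slot])
		return -1; /* table full, or key was already selected */
	m->used[slot] = 1;
	m->keys[slot] = key;
	m->vals[slot] = NULL;
	return 0;
}

/* Store time: the key must already be present; only the value changes. */
static int sketch_store(struct sketch_map *m, uint32_t key, const void *val)
{
	int slot = sketch_find(m, key);

	if (slot < 0 || !m->used[slot])
		return -1; /* attempted to store a non-selected key */
	m->vals[slot] = val;
	return 0;
}

/* Membership query that works as soon as selection is done. */
static int sketch_is_selected(const struct sketch_map *m, uint32_t key)
{
	int slot = sketch_find(m, key);

	return slot >= 0 && m->used[slot];
}
```

The point of the split is that `sketch_is_selected()` can answer correctly between selection and bitmap generation, which is the window in which pseudo-merge candidates would need to be checked.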
Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 + midx-write.c | 1 + pack-bitmap-write.c | 23 ++++++++++++++++++----- pack-bitmap.h | 1 + 4 files changed, 21 insertions(+), 5 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5060ce2dfba..2958cdda499 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1339,6 +1339,7 @@ static void write_pack_file(void) hash_to_hex(hash)); if (write_bitmap_index) { + bitmap_writer_init(the_repository); bitmap_writer_set_checksum(hash); bitmap_writer_build_type_index( &to_pack, written_list, nr_written); diff --git a/midx-write.c b/midx-write.c index 469cceaa583..ed5f8b72b9c 100644 --- a/midx-write.c +++ b/midx-write.c @@ -819,6 +819,7 @@ static int write_midx_bitmap(const char *midx_name, for (i = 0; i < pdata->nr_objects; i++) index[i] = &pdata->objects[i].idx; + bitmap_writer_init(the_repository); bitmap_writer_show_progress(flags & MIDX_PROGRESS); bitmap_writer_build_type_index(pdata, index, pdata->nr_objects); diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index c35bc81d00f..9bc41a9e145 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -46,6 +46,11 @@ struct bitmap_writer { static struct bitmap_writer writer; +void bitmap_writer_init(struct repository *r) +{ + writer.bitmaps = kh_init_oid_map(); +} + void bitmap_writer_show_progress(int show) { writer.show_progress = show; @@ -117,11 +122,20 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, static inline void push_bitmapped_commit(struct commit *commit) { + int hash_ret; + khiter_t hash_pos; + if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; REALLOC_ARRAY(writer.selected, writer.selected_alloc); } + hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); + if (!hash_ret) + die(_("duplicate entry when writing bitmap index: %s"), + oid_to_hex(&commit->object.oid)); + kh_value(writer.bitmaps, hash_pos) = NULL; + 
writer.selected[writer.selected_nr].commit = commit; writer.selected[writer.selected_nr].bitmap = NULL; writer.selected[writer.selected_nr].flags = 0; @@ -466,14 +480,14 @@ static void store_selected(struct bb_commit *ent, struct commit *commit) { struct bitmapped_commit *stored = &writer.selected[ent->idx]; khiter_t hash_pos; - int hash_ret; stored->bitmap = bitmap_to_ewah(ent->bitmap); - hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); - if (hash_ret == 0) - die("Duplicate entry when writing index: %s", + hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid); + if (hash_pos == kh_end(writer.bitmaps)) + die(_("attempted to store non-selected commit: '%s'"), oid_to_hex(&commit->object.oid)); + kh_value(writer.bitmaps, hash_pos) = stored; } @@ -488,7 +502,6 @@ int bitmap_writer_build(struct packing_data *to_pack) uint32_t *mapping; int closed = 1; /* until proven otherwise */ - writer.bitmaps = kh_init_oid_map(); writer.to_pack = to_pack; if (writer.show_progress) diff --git a/pack-bitmap.h b/pack-bitmap.h index 3f96608d5c1..dae2d68a338 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -97,6 +97,7 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *); +void bitmap_writer_init(struct repository *r); void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(const unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, From patchwork Mon Apr 29 20:43:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647706 Received: from mail-qv1-f54.google.com (mail-qv1-f54.google.com [209.85.219.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAB271411EF for ; Mon, 29 Apr 2024 20:43:14 
+0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423396; cv=none; b=cIBikTYc4tWRaUX8FxLz3slHoTkyarcVw0aVfnODIkwI/QFKBTSZ4lmYjtIeGW7Wt69sqpkndJRHGrlY23h1Sv44srLJ0hjR/LKYoI/zrsySWqWk2PM+Bcq0YW5TLQK4+u6rZqC7T9gde0AYL9/aXcUozZoMzxPKMN6A3/uhU0Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423396; c=relaxed/simple; bh=cfOZevQxKQ7lU5FOH0CzJya2ZzWk0TwhcqA7oTLpGBU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=FJtFYJyqy+oYStFS1y2oydDbNu0xZiWWW/g0OUFTiwMaccoz2ZvlE9qtyDFlFbtxqbukrN6Aj/nPVI1aPztvTzDQJgA0mTOkLUVNDsiXkmWz9chtf4Twmwjbtqi7KrfK7M/aly00rsYMytyf4hrGs2d1U0oiaMPM9rlyuaPkaOI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=mXvbY0jc; arc=none smtp.client-ip=209.85.219.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="mXvbY0jc" Received: by mail-qv1-f54.google.com with SMTP id 6a1803df08f44-6a0c8fb3540so14402086d6.1 for ; Mon, 29 Apr 2024 13:43:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423393; x=1715028193; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=aOfHl3IsgyD4kmOqu7gRQ/cA2C0zX3zS64mKluQK8po=; 
b=mXvbY0jcPH3l/bhkRLedFn63hk9L5ICoO6NwInM6oUisDfPHtJ4nQ74b/EWo4A9YL9 MsmLcFw57Dc5EaovvN1NSlbzma798aANDM5y0wkjD7T4AmEdnADmL1jkcYgYaRpN1ouy pS4FVXfs6I2HppLg7HexmQ/YL3xzrr9UElrZLWRJ2XyBrpxjLpeDZ6MURmgyHaZ36Mrk E2Rf2yD7mSS9LMKcju3U+ozZnuYrqqYPBHtRQu+dO9nAJBcSFdsbn+thGCeGUgrlczrH 5x9I5rhRgGSfNMeNs9IV4GO7kOx9em7CIwJbYV+aq+dWchzIkY5bEbwkvGazDtPyBVh4 bjfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423393; x=1715028193; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=aOfHl3IsgyD4kmOqu7gRQ/cA2C0zX3zS64mKluQK8po=; b=MJVUxvtaDORmtII7tVS5MTIQhDlrwLZvmVAe5m2shxN41aIw3K/QDD1huQwJ/4MNbr xFSsWk0nHIO5fwLuMOf8n9lLgcRrqzrOgWE7mHcBVSy6XzN9k6twg6hwzhhWV00DltfW SCDdOEt5H2Rp8MGzeMQzqfRuq6kB8ICLjwWXzC/9ht9Acf7am8Lfz0nUarfADJp8+mWn cmlDws6RP838PELNlDhn49Evsgrr4rI4ObS2Dtwy9XemUqbqNd2zkw6kLKtXWEv8MJGz /Zg/VUFFUbQ6d5QCkOs5qSyqCWVzKsg7o+jvvNmD5YyNSTa+ugYjpAKExvqM6YLGOKQh GLCQ== X-Gm-Message-State: AOJu0YxV5w92O7SMisPB5UrYOg2Fa+lsoW3Mg/m92MWkEoNF+GXfXt8r KT4kSWRdk2Fne16wAvLGIOTPK8kehyB2jYADaZrX9ZJOqk5cqslxRjYRo9Am+u6JbSo4qa0nTuM xIEE= X-Google-Smtp-Source: AGHT+IFimIO3TRJUbxhsiZ8NEhRJvh828E+4SvPrfVMyDfy8RrkGige6n3NP5ZVNiKhhhHoJc6LKjQ== X-Received: by 2002:a05:6214:5099:b0:696:4086:5e1 with SMTP id kk25-20020a056214509900b00696408605e1mr1580246qvb.2.1714423393517; Mon, 29 Apr 2024 13:43:13 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id x8-20020ad44588000000b0069b432df140sm3847012qvu.121.2024.04.29.13.43.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:13 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:11 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 05/23] pseudo-merge.ch: initial commit Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Add a new (empty) header file to contain the implementation for selecting, reading, and applying pseudo-merge bitmaps. For now this header and its corresponding implementation are left empty, but they will evolve over the course of subsequent commit(s). Signed-off-by: Taylor Blau --- Makefile | 1 + pseudo-merge.c | 2 ++ pseudo-merge.h | 6 ++++++ 3 files changed, 9 insertions(+) create mode 100644 pseudo-merge.c create mode 100644 pseudo-merge.h diff --git a/Makefile b/Makefile index 1e31acc72ec..6a3d164fdf8 100644 --- a/Makefile +++ b/Makefile @@ -1119,6 +1119,7 @@ LIB_OBJS += prompt.o LIB_OBJS += protocol.o LIB_OBJS += protocol-caps.o LIB_OBJS += prune-packed.o +LIB_OBJS += pseudo-merge.o LIB_OBJS += quote.o LIB_OBJS += range-diff.o LIB_OBJS += reachable.o diff --git a/pseudo-merge.c b/pseudo-merge.c new file mode 100644 index 00000000000..37e037ba272 --- /dev/null +++ b/pseudo-merge.c @@ -0,0 +1,2 @@ +#include "git-compat-util.h" +#include "pseudo-merge.h" diff --git a/pseudo-merge.h b/pseudo-merge.h new file mode 100644 index 00000000000..cab8ff6960a --- /dev/null +++ b/pseudo-merge.h @@ -0,0 +1,6 @@ +#ifndef PSEUDO_MERGE_H +#define PSEUDO_MERGE_H + +#include "git-compat-util.h" + +#endif From patchwork Mon Apr 29 20:43:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647707 
Received: from mail-oi1-f174.google.com (mail-oi1-f174.google.com [209.85.167.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 23E241411EF for ; Mon, 29 Apr 2024 20:43:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423399; cv=none; b=g2QqS5HhY9qux9IlLCzhspI6PwGSoyeEUBrFJszuA4IQfYDqRual4NA+BpUS/ZVqikGi70u3L9hNQ9ZacoylvTdEjlLUUo/Qp4nIgdG06KSe1Rx/6loUFWfOlFsQpMn8GFKCPCt4P6AYPkUh/ndGy2xZ328PI04t0YWWv6Ghcxk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423399; c=relaxed/simple; bh=syWjFrn8ECzEYK9++Z+udP13OjgkpUyW0n6BjqKVB3Q=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=GaerpeSkRWxlckuBb/3rz0IoiS1cy6Rl/GEyg8C4FedKRrh0ZAxrOyewvtfkhcs51o0Td4ByMuEQsZE+BuM7y4yitpwA6T+T5yoMYywGBwednb+Wmy3dF3iMaHydkxcnl+yPgZdfoRoZlUMI9e2saTqZlFdE21M6l/YjUo2tUAE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=eKcaKj6p; arc=none smtp.client-ip=209.85.167.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="eKcaKj6p" Received: by mail-oi1-f174.google.com with SMTP id 5614622812f47-3c6f6c5bc37so2514905b6e.1 for ; Mon, 29 Apr 2024 13:43:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423397; x=1715028197; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=OfAXjcEOc8B55XNZDM5wggzSFw6UG91isjhZk5Ttxe8=; b=eKcaKj6pyI2AaSzlJTMJ7Sbb3LJssFNeLe244BXCDIsKwfVVmYVMD/BeIEiMttP/05 LV8mzkd7TOagQO4b++eMlwe6il6auxWP8ZhkaWoDq4VVuoHNw33FSNKQSxe7fBVo9oBy HHhFwbVh7mgraSK8ipCGSO476eTdTMkhEqeFo6rdJ8Rq/tmZIb64mqrpQZspHiXLbOP9 styL556t/5LeNN7K5IzExAoSjoqXUepgw9uUPCCxv8fl6pm/jXMTO67C8C9CQOJlIaOt Z9Nsa8Bkz7fo1a5cW0FiSPqR4sEDiU1aFeO8NMYF9rWIWTQFicij6ORjt+uGsb84z2Ge zGXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423397; x=1715028197; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=OfAXjcEOc8B55XNZDM5wggzSFw6UG91isjhZk5Ttxe8=; b=mtLTTo0XypkkyCxgt9sK28G18NQnCo5JVX8fLLKqEugsWwq7rtaeldr7/C9bLEjoeB nQKUdV2Dot1j02gTExaGMC6JrsQ/1omZCIQ8buORwLEqIRCSAE0NhGZ+TLn5fWovLGlm h0NRnu4DknVcA5aag+obj4W+fr31CJFrsonFwveyiNK0XCOyEmOb5etKPCvw6Z03Nt04 g1TTrcQsC6AHLEO9lKqePbQaerD86MmChU3xolL386QwV43KaJ1xhOcZG0l2XIDY+BXQ 05Vtu112fNz4N1JQuixxLLnzlvvcJht0CYVP+y2gfLlLC/DLFTldUfI/5M7s8KROzc6a srww== X-Gm-Message-State: AOJu0Yz/aqW6jk+Zc7RTK6DWPvVOZolXqLhW4drj1H97wXw1IX8wf8db m64QowuV4Vw0BzW++mLT9M3wdQqZL+KS1Vu4X+RsW631PcvtxSZIkDdT1N/ih67B4XtChi8jKw3 0T9c= X-Google-Smtp-Source: AGHT+IFNx0bbzd6dIiwYvwhvY5zNB3VqSDFjwP3FHOPc9trY6lZux5ebCCiiLFS5pJJvSaWPvgli7A== X-Received: by 2002:a05:6808:4348:b0:3c8:6957:fbae with SMTP id dx8-20020a056808434800b003c86957fbaemr805714oib.17.1714423396835; Mon, 29 Apr 2024 13:43:16 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id o17-20020ae9f511000000b0078f1044bd68sm10536283qkg.50.2024.04.29.13.43.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:16 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:15 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To:

Prepare to write pseudo-merge bitmaps by annotating individual bitmapped commits (which are represented by the `bitmapped_commit` structure) with an extra bit indicating whether or not they are a pseudo-merge.

In subsequent commits, pseudo-merge bitmaps will be generated by allocating a fake commit node with parents covering the full set of commits represented by the pseudo-merge bitmap. These commits will be added to the set of "selected" commits as usual, but will be written specially instead of being included with the rest of the selected commits.

Mechanically speaking, there are two parts of this change:

  - The bitmapped_commit struct gets a new bit indicating whether it is a pseudo-merge, or an ordinary commit selected for bitmaps.

  - A handful of changes to only write out the non-pseudo-merge commits when enumerating through the selected array (see the new `bitmap_writer_selected_nr()` function). Pseudo-merge commits appear after all non-pseudo-merge commits, so it is safe to enumerate through the selected array like so:

        for (i = 0; i < bitmap_writer_selected_nr(); i++)
            if (writer.selected[i].pseudo_merge)
                BUG("unexpected pseudo-merge");

    without encountering the BUG().
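To make the ordering invariant concrete, here is a small self-contained sketch. The `sketch_*` names are hypothetical, the array is fixed-size, and `sketch_push()` both bumps the pseudo-merge counter and rejects out-of-order pushes; those last two behaviors are assumptions of this sketch rather than something this patch does.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX 16 /* fixed capacity, for illustration only */

struct sketch_entry {
	int id; /* stands in for the commit pointer */
	unsigned pseudo_merge : 1;
};

struct sketch_writer {
	struct sketch_entry selected[SKETCH_MAX];
	unsigned int selected_nr;
	unsigned int pseudo_merges_nr;
};

/*
 * Append an entry. Ordinary commits may only be pushed while no
 * pseudo-merge has been pushed yet, so pseudo-merges always form the
 * tail of the array. (Enforcing this here is an assumption of the
 * sketch.)
 */
static int sketch_push(struct sketch_writer *w, int id, unsigned pseudo_merge)
{
	if (w->selected_nr >= SKETCH_MAX)
		return -1;
	if (!pseudo_merge && w->pseudo_merges_nr)
		return -1;
	w->selected[w->selected_nr].id = id;
	w->selected[w->selected_nr].pseudo_merge = pseudo_merge ? 1 : 0;
	w->selected_nr++;
	if (pseudo_merge)
		w->pseudo_merges_nr++;
	return 0;
}

/* Number of ordinary entries, i.e. the length of the prefix to write. */
static unsigned int sketch_ordinary_nr(const struct sketch_writer *w)
{
	return w->selected_nr - w->pseudo_merges_nr;
}

/* Returns 1 if no pseudo-merge appears within the ordinary prefix. */
static int sketch_prefix_is_ordinary(const struct sketch_writer *w)
{
	unsigned int i;

	for (i = 0; i < sketch_ordinary_nr(w); i++)
		if (w->selected[i].pseudo_merge)
			return 0;
	return 1;
}
```

Because pseudo-merges are confined to the tail, subtracting their count from the total yields exactly the length of the ordinary prefix, so loops bounded by that value never need to skip entries.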
Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 100 +++++++++++++++++++++++++++++--------------- pack-bitmap.h | 1 + 2 files changed, 67 insertions(+), 34 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 9bc41a9e145..fef02cd745a 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -24,7 +24,7 @@ struct bitmapped_commit { struct ewah_bitmap *write_as; int flags; int xor_offset; - uint32_t commit_pos; + unsigned pseudo_merge : 1; }; struct bitmap_writer { @@ -39,6 +39,8 @@ struct bitmap_writer { struct bitmapped_commit *selected; unsigned int selected_nr, selected_alloc; + uint32_t pseudo_merges_nr; + struct progress *progress; int show_progress; unsigned char pack_checksum[GIT_MAX_RAWSZ]; @@ -46,6 +48,11 @@ struct bitmap_writer { static struct bitmap_writer writer; +static inline int bitmap_writer_selected_nr(void) +{ + return writer.selected_nr - writer.pseudo_merges_nr; +} + void bitmap_writer_init(struct repository *r) { writer.bitmaps = kh_init_oid_map(); @@ -120,25 +127,30 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, * Compute the actual bitmaps */ -static inline void push_bitmapped_commit(struct commit *commit) +static void bitmap_writer_push_bitmapped_commit(struct commit *commit, + unsigned pseudo_merge) { - int hash_ret; - khiter_t hash_pos; - if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; REALLOC_ARRAY(writer.selected, writer.selected_alloc); } - hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); - if (!hash_ret) - die(_("duplicate entry when writing bitmap index: %s"), - oid_to_hex(&commit->object.oid)); - kh_value(writer.bitmaps, hash_pos) = NULL; + if (!pseudo_merge) { + int hash_ret; + khiter_t hash_pos = kh_put_oid_map(writer.bitmaps, + commit->object.oid, + &hash_ret); + + if (!hash_ret) + die(_("duplicate entry when writing bitmap index: %s"), 
oid_to_hex(&commit->object.oid)); + kh_value(writer.bitmaps, hash_pos) = NULL; + } writer.selected[writer.selected_nr].commit = commit; writer.selected[writer.selected_nr].bitmap = NULL; writer.selected[writer.selected_nr].flags = 0; + writer.selected[writer.selected_nr].pseudo_merge = pseudo_merge; writer.selected_nr++; } @@ -168,16 +180,20 @@ static void compute_xor_offsets(void) while (next < writer.selected_nr) { struct bitmapped_commit *stored = &writer.selected[next]; - int best_offset = 0; struct ewah_bitmap *best_bitmap = stored->bitmap; struct ewah_bitmap *test_xor; + if (stored->pseudo_merge) + goto next; + for (i = 1; i <= MAX_XOR_OFFSET_SEARCH; ++i) { int curr = next - i; if (curr < 0) break; + if (writer.selected[curr].pseudo_merge) + continue; test_xor = ewah_pool_new(); ewah_xor(writer.selected[curr].bitmap, stored->bitmap, test_xor); @@ -193,6 +209,7 @@ static void compute_xor_offsets(void) } } +next: stored->xor_offset = best_offset; stored->write_as = best_bitmap; @@ -205,7 +222,8 @@ struct bb_commit { struct bitmap *commit_mask; struct bitmap *bitmap; unsigned selected:1, - maximal:1; + maximal:1, + pseudo_merge:1; unsigned idx; /* within selected array */ }; @@ -243,17 +261,18 @@ static void bitmap_builder_init(struct bitmap_builder *bb, revs.first_parent_only = 1; for (i = 0; i < writer->selected_nr; i++) { - struct commit *c = writer->selected[i].commit; - struct bb_commit *ent = bb_data_at(&bb->data, c); + struct bitmapped_commit *bc = &writer->selected[i]; + struct bb_commit *ent = bb_data_at(&bb->data, bc->commit); ent->selected = 1; ent->maximal = 1; + ent->pseudo_merge = bc->pseudo_merge; ent->idx = i; ent->commit_mask = bitmap_new(); bitmap_set(ent->commit_mask, i); - add_pending_object(&revs, &c->object, ""); + add_pending_object(&revs, &bc->commit->object, ""); } if (prepare_revision_walk(&revs)) @@ -430,8 +449,13 @@ static int fill_bitmap_commit(struct bb_commit *ent, struct commit *c = prio_queue_get(queue); if (old_bitmap && 
mapping) { - struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c); + struct ewah_bitmap *old; struct bitmap *remapped = bitmap_new(); + + if (commit->object.flags & BITMAP_PSEUDO_MERGE) + old = NULL; + else + old = bitmap_for_commit(old_bitmap, c); /* * If this commit has an old bitmap, then translate that * bitmap and add its bits to this one. No need to walk @@ -450,12 +474,14 @@ static int fill_bitmap_commit(struct bb_commit *ent, * Mark ourselves and queue our tree. The commit * walk ensures we cover all parents. */ - pos = find_object_pos(&c->object.oid, &found); - if (!found) - return -1; - bitmap_set(ent->bitmap, pos); - prio_queue_put(tree_queue, - repo_get_commit_tree(the_repository, c)); + if (!(c->object.flags & BITMAP_PSEUDO_MERGE)) { + pos = find_object_pos(&c->object.oid, &found); + if (!found) + return -1; + bitmap_set(ent->bitmap, pos); + prio_queue_put(tree_queue, + repo_get_commit_tree(the_repository, c)); + } for (p = c->parents; p; p = p->next) { pos = find_object_pos(&p->item->object.oid, &found); @@ -483,6 +509,9 @@ static void store_selected(struct bb_commit *ent, struct commit *commit) stored->bitmap = bitmap_to_ewah(ent->bitmap); + if (ent->pseudo_merge) + return; + hash_pos = kh_get_oid_map(writer.bitmaps, commit->object.oid); if (hash_pos == kh_end(writer.bitmaps)) die(_("attempted to store non-selected commit: '%s'"), @@ -612,7 +641,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (indexed_commits_nr < 100) { for (i = 0; i < indexed_commits_nr; ++i) - push_bitmapped_commit(indexed_commits[i]); + bitmap_writer_push_bitmapped_commit(indexed_commits[i], 0); return; } @@ -645,7 +674,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, } } - push_bitmapped_commit(chosen); + bitmap_writer_push_bitmapped_commit(chosen, 0); i += next + 1; display_progress(writer.progress, i); @@ -683,8 +712,11 @@ static void write_selected_commits_v1(struct hashfile *f, { int i; - for (i = 0; i < 
writer.selected_nr; ++i) { + for (i = 0; i < bitmap_writer_selected_nr(); ++i) { struct bitmapped_commit *stored = &writer.selected[i]; + if (stored->pseudo_merge) + BUG("unexpected pseudo-merge among selected: %s", + oid_to_hex(&stored->commit->object.oid)); if (offsets) offsets[i] = hashfile_total(f); @@ -718,10 +750,10 @@ static void write_lookup_table(struct hashfile *f, uint32_t i; uint32_t *table, *table_inv; - ALLOC_ARRAY(table, writer.selected_nr); - ALLOC_ARRAY(table_inv, writer.selected_nr); + ALLOC_ARRAY(table, bitmap_writer_selected_nr()); + ALLOC_ARRAY(table_inv, bitmap_writer_selected_nr()); - for (i = 0; i < writer.selected_nr; i++) + for (i = 0; i < bitmap_writer_selected_nr(); i++) table[i] = i; /* @@ -729,16 +761,16 @@ static void write_lookup_table(struct hashfile *f, * bitmap corresponds to j'th bitmapped commit (among the selected * commits) in lex order of OIDs. */ - QSORT_S(table, writer.selected_nr, table_cmp, commit_positions); + QSORT_S(table, bitmap_writer_selected_nr(), table_cmp, commit_positions); /* table_inv helps us discover that relationship (i'th bitmap * to j'th commit by j = table_inv[i]) */ - for (i = 0; i < writer.selected_nr; i++) + for (i = 0; i < bitmap_writer_selected_nr(); i++) table_inv[table[i]] = i; trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository); - for (i = 0; i < writer.selected_nr; i++) { + for (i = 0; i < bitmap_writer_selected_nr(); i++) { struct bitmapped_commit *selected = &writer.selected[table[i]]; uint32_t xor_offset = selected->xor_offset; uint32_t xor_row; @@ -809,7 +841,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index, memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)); header.version = htons(default_version); header.options = htons(flags | options); - header.entry_count = htonl(writer.selected_nr); + header.entry_count = htonl(bitmap_writer_selected_nr()); hashcpy(header.checksum, writer.pack_checksum); hashwrite(f, &header, sizeof(header) 
- GIT_MAX_RAWSZ + the_hash_algo->rawsz); @@ -821,9 +853,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index, if (options & BITMAP_OPT_LOOKUP_TABLE) CALLOC_ARRAY(offsets, index_nr); - ALLOC_ARRAY(commit_positions, writer.selected_nr); + ALLOC_ARRAY(commit_positions, bitmap_writer_selected_nr()); - for (i = 0; i < writer.selected_nr; i++) { + for (i = 0; i < bitmap_writer_selected_nr(); i++) { struct bitmapped_commit *stored = &writer.selected[i]; int commit_pos = oid_pos(&stored->commit->object.oid, index, index_nr, oid_access); diff --git a/pack-bitmap.h b/pack-bitmap.h index dae2d68a338..ca9acd2f735 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -21,6 +21,7 @@ struct bitmap_disk_header { unsigned char checksum[GIT_MAX_RAWSZ]; }; +#define BITMAP_PSEUDO_MERGE (1u<<21) #define NEEDS_BITMAP (1u<<22) /* From patchwork Mon Apr 29 20:43:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647708 Received: from mail-yb1-f182.google.com (mail-yb1-f182.google.com [209.85.219.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 819A6178CE7 for ; Mon, 29 Apr 2024 20:43:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423402; cv=none; b=nj0xzYRAk/3f6CHvZVFZL6ThOLH09hfugDtQIQLoJS7exn9+tdZRZ24eK8lxjDwygHAbdjIsmoVTqJt72uEe+I9u9vLAZz+jDSrXU/6xnNebf4Wbs473GYJGnuB22nr6DY+Jv+KE/TlUSHk4DgKCZoAU6bqKDVqOFlkxlbOpvi0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423402; c=relaxed/simple; bh=lp6+12pYJTcvHGbcBqxpNoi3F3KEBEnIhVMIg4OYKuA=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; 
b=XE+Hd1WBSJwc0tMeONjdTiSPRA6cWs33l4nZTTlJ17BQd26Ond8TaE8zMEd4PFOgLDcAQoTPaFcgzBpRYFsCDugKlwEyTNoxp/Csy47wfodRHruooX37rDWOdvZHmD2gxCfx//43WrAJ6t7XZjkhiAetNQzI1+UBiGQe8BH+Kus= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=pT43n5cN; arc=none smtp.client-ip=209.85.219.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="pT43n5cN" Received: by mail-yb1-f182.google.com with SMTP id 3f1490d57ef6-de607ab52f4so691658276.2 for ; Mon, 29 Apr 2024 13:43:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423400; x=1715028200; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=phMBd8bIvedWnGXfGOIYfGH1sSspJcu4/f5+qrtXNmg=; b=pT43n5cNA9vaCC0YgOj+vq5tTiulKNv8KIx8EYKsj7WkFVgrdEQOgLpcwgLxRv6NLV dS8YMyWaah2nmBJqXBTpeBf35AYpCv24WdqWY7UhysvadPDyD9F8Jo0YDPw0UZbdGFOk z9WGllZU/4XEqhZ6E8QBJ5KpBzS6Ib45yBeQy2bx5IS4DZ11MRSPGGUG7OQPMbdIYVdn LWh5qC2pec3y/ym25PXqUQVXm9oFeqEbs/2Pk+ML/6o7/BkmGkOQWikJeICuwoQpwCy+ 6WVkfNXahPA/hn/Ys+Vr73N28ounCRln0GEZoMHSEyXF4WIHmTTCkEGxq4KhEUP1aQVm Kkyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423400; x=1715028200; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; 
bh=phMBd8bIvedWnGXfGOIYfGH1sSspJcu4/f5+qrtXNmg=; b=dSiddOt57cdJnY/HxpAmcBPLBV67dwGS7QquBWFEJYduSOb8iVRbEkXEkcpIZy22xz K2RQiGcyjTFM4lCbUJO3lGos5VfYcyt8YVhG8fZTkDNKhZm64CU8I+MrEOCKYiZ5cIhP xzNq9fVvQfnjjlttGOQmekwm3VCAsCufmekUnaVrlU64uok834oP/W2L/SCJrZu13fLt 6lfN4J9/BgGtuHXBWzhkUVl1avgVrGXU+B7P+V6x5LTKhrsLaTGld1yrDMPlb0F12ymU 5OTGoL1xTV/NGSDFhdxaELDAI+ykWrgiW3j8pdpX0aYPPUub/ud0HqP6xMtDrCb4/SnU ulRg== X-Gm-Message-State: AOJu0YxKBcxPdFL44H8YsiM/4wZxf0oE+c4h6nOXsKWSp+Aj900X4E53 777JFFHyZaIhSS//IQjX6KxsXai/+AlrnZvZrw0eLSitgakUEGlaAkGUUMTA7iVWh/ODLKXhFdL zO04= X-Google-Smtp-Source: AGHT+IE3vZ2w2wVBzY8fddfI3hKyqEB4ooK1oiMXnU3cZ5Enuiy1PmCj9hPfuNLqBNTlo68MfyVchA== X-Received: by 2002:a05:690c:6105:b0:61b:1b51:371f with SMTP id hi5-20020a05690c610500b0061b1b51371fmr12799283ywb.12.1714423400231; Mon, 29 Apr 2024 13:43:20 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id hg17-20020a05622a611100b0043ae2fd5a7esm1312355qtb.23.2024.04.29.13.43.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:19 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:18 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 07/23] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Message-ID: <9c6d09bf8742f880647067af91da16d656bab672.1714422410.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Prepare to implement pseudo-merge bitmap selection by implementing a necessary new function, `bitmap_writer_has_bitmapped_object_id()`. This function returns whether or not the bitmap_writer selected the given object ID for bitmapping. This will allow the pseudo-merge machinery to reject candidates for pseudo-merges if they have already been selected as an ordinary bitmap tip. 
Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 5 +++++ pack-bitmap.h | 2 ++ 2 files changed, 7 insertions(+) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index fef02cd745a..c7514a58407 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -123,6 +123,11 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, } } +int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid) +{ + return kh_get_oid_map(writer.bitmaps, *oid) != kh_end(writer.bitmaps); +} + /** * Compute the actual bitmaps */ diff --git a/pack-bitmap.h b/pack-bitmap.h index ca9acd2f735..995d664cc89 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -98,6 +98,8 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *); +int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid); + void bitmap_writer_init(struct repository *r); void bitmap_writer_show_progress(int show); void bitmap_writer_set_checksum(const unsigned char *sha1); From patchwork Mon Apr 29 20:43:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647709 Received: from mail-qv1-f52.google.com (mail-qv1-f52.google.com [209.85.219.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DF1C4178CEC for ; Mon, 29 Apr 2024 20:43:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423406; cv=none; b=tkQRuwjvlwWjPhpzs+KreyPSpnhJIS+jS+GHKGQIMLu7ogTU0jRX6+5As/ikxEkeupDiZbAkIH6mHr90dDRhkyej73vVeqPRh0+3XzYEXaPSNbsX90TmhETDfKmy8jvv01CkvKrQSBkGD7TZ4XKO67NVk0WPoGnMXLcnLZLDnfA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423406; 
c=relaxed/simple; bh=MhtoFFee8VEI8nIR8q+Kc1MDVwkatehkVx/4Jwmlya8=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=rVUcLrAY72fbROLGx5piDyjaSDVhR9DbU0RYPquE6Q4jru5aJJqikMenADFBS3FU0oD5wfzaCqKdrCFr348SidoXWMbT/nwoL/+x5K9cu4DE0s24hl3lX/9PWr0G6wDQBHCgA+aFsQbCsnhLKgGSdPL20i8Q2Kj6n6QD0Ex5eio= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=SntFHexL; arc=none smtp.client-ip=209.85.219.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="SntFHexL" Received: by mail-qv1-f52.google.com with SMTP id 6a1803df08f44-6969388c36fso20410996d6.1 for ; Mon, 29 Apr 2024 13:43:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423403; x=1715028203; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=qnJosCcbkQgZoHuN4GTVYVsyg93qmGK++c13D073jCY=; b=SntFHexLAj5nqT5tEUs2I3CqXSFJg/BSgfzpXSGXdNbgphUotcRwoxqAU7vAI5CDKJ 7c5Hk7jetNpVHs5BaunO5OmklvdTPv+wBeLKWEUk2rqRKuVTjR5dOwsOQcDRp8Ckg1NN FpSEin6m2xeUczReXeVrGuxkzug2gKLw3fpmhpeVL5NOWsO2eho7zaDKKeWHF6MKmVVR u/FRy94N96PwoWHNESw04rbN/Z0hKg2Mv+YVSsGyvXdHhd0bU/Np64zRxHaJM3/eHwJb LTnAjTY90f0HTUY66sWH/00BgdQVCwLBJneYxtide78ucAuuy911d3GfwtO/u+X4h92A 23DA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423403; x=1715028203; 
h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=qnJosCcbkQgZoHuN4GTVYVsyg93qmGK++c13D073jCY=; b=MHFSBhS0r5ch1xYxn7JFOLC0S6vqd5K6w+imkKJwmhrTCexlKzqqY9+3heRUfmdawC s0/Ao6hK9Gb4E6eMAK6+jrOif3H2cjV4taNS4A/1e00qKjF8ecXb/ITYraFb+gEJJEk3 XXWc0T+aHuZWQr2ZGf2Int/TdllDYu5b8ew8UdpAlpbN2Hfz073L4OjDmvvTgBSLkJei Dd/ncYmFRbop63ehKyEQu5CyDmIk0f2RThWFBbUvGw2uS8SgTpDUXL6OkbGtot16F/HH lzUF9bxWdFty7G3zhssDXZdJiNdc9NuWtCi0GRcaFzBaRGZTYLnUwpSwDShrHv035o+W 4lPw== X-Gm-Message-State: AOJu0YzSlPwsfMem63u7J7Yy3yYKZs8s4R1rsQ7T2VZelLRxsEmutfIu ado6ghuehuISvBF4MOrs3EZmWsfTT9lgu4hrLxZNUBg/PGtiUBsU0YVDRPRzU3XNZIYZk3zE6xX /+rQ= X-Google-Smtp-Source: AGHT+IH5TD3YRIDCytHRlzpFCLDwAn/lKbvMi03SsWBX4opmhXPvtqEwvRef26rK8kBzrbmPl5ujow== X-Received: by 2002:a05:6214:76e:b0:699:3a91:e25 with SMTP id f14-20020a056214076e00b006993a910e25mr628656qvz.11.1714423403479; Mon, 29 Apr 2024 13:43:23 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id l26-20020a0c979a000000b0069b10d78445sm10860126qvd.142.2024.04.29.13.43.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:23 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:22 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The pseudo-merge selection code will be added in a subsequent commit, and will need a way to push the allocated commit structures into the bitmap writer from a separate compilation unit. Make the `bitmap_writer_push_bitmapped_commit()` function part of the pack-bitmap.h header in order to make this possible. 
Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 4 ++-- pack-bitmap.h | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index c7514a58407..dab5bdea806 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -132,8 +132,8 @@ int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid) * Compute the actual bitmaps */ -static void bitmap_writer_push_bitmapped_commit(struct commit *commit, - unsigned pseudo_merge) +void bitmap_writer_push_bitmapped_commit(struct commit *commit, + unsigned pseudo_merge) { if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; diff --git a/pack-bitmap.h b/pack-bitmap.h index 995d664cc89..0f539d79cfd 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -99,6 +99,8 @@ int bitmap_has_oid_in_uninteresting(struct bitmap_index *, const struct object_i off_t get_disk_usage_from_bitmap(struct bitmap_index *, struct rev_info *); int bitmap_writer_has_bitmapped_object_id(const struct object_id *oid); +void bitmap_writer_push_bitmapped_commit(struct commit *commit, + unsigned pseudo_merge); void bitmap_writer_init(struct repository *r); void bitmap_writer_show_progress(int show); From patchwork Mon Apr 29 20:43:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647710 Received: from mail-ua1-f42.google.com (mail-ua1-f42.google.com [209.85.222.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8FD5176FC6 for ; Mon, 29 Apr 2024 20:43:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423423; cv=none; 
b=Z8jb8kaMBIH7a3z04NkbU5JQWPvSv+ElUBLwjhPyBLJJ4Mjpw/sjnCI72VTCZmipZRqk+y4js+UzEBDhIv0L5vVQfFpFXmXsYQyMU/RWif2Lr8qLNf5PD9tlg4mbPZBXZDlo3lIHj4wzb1rv1tszIdaKQlIZPQskPbfmdk+8tok= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423423; c=relaxed/simple; bh=x4I1IwSgpJtr6xLL5KP7QotW2wHG01iCJSBvHCxRu38=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=Le/G/S4hJcMJ7tXh+gfyFuBdIBkaJTjve6YbRYdcU2K8m3TnW6lSx9nptu1FzmkW12aW28LfIlhc2MBjSr2VmWr+uMtU1gYyQVqW2vJa+8uKS1Rtli7Zvx1YTSX2UZORj+YTrQpy7kwoZbHUhMDg3SNKIC7tP3dl4WEttn4g2SY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=mFfxTJLW; arc=none smtp.client-ip=209.85.222.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="mFfxTJLW" Received: by mail-ua1-f42.google.com with SMTP id a1e0cc1a2514c-7f01c1514c9so948996241.2 for ; Mon, 29 Apr 2024 13:43:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423419; x=1715028219; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Zz5hsleHJDZXaFttIlPUzSi9lW1GL66CFNaHU/sCTok=; b=mFfxTJLWoQQsCX3ixm5n+K415LmlnRTs8f/tRo01giWOzZ7AwNOvcDWkBFkcRXW2lx 8vVu7Tv6PZeVcMLK1dpDwTDF8kOhLVCyGSIWMNa0GIArZoEeGrCsZW+pLJ2LlQ5XWrVw 
sixtPHYc1gSG/p8F3lmhFzIm3P4Jy62RrruZHVP1yWMRJ+dUDsyUdYD4TTbYkE4Cg0l7 wPs+Xud93oWMtc3ZZ+ry0ZDObZK6FgSHB+s0GpzGX1j/DchwxR4uJ4ZjqXNnsosN6PMy +VjWw+rMtZfwDZc3uP52v42clYFLLIVAPIgliTeeTi2gfeZI3+zbtfyPAkEoOSxJdVpc YSOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423419; x=1715028219; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Zz5hsleHJDZXaFttIlPUzSi9lW1GL66CFNaHU/sCTok=; b=ven7yIHnW2yciMRft1uWmGZA3qB/MmCOmmPHMYn76uzoStpE2C5mHAsYhRxY7t9A47 76B78s73TwubMFAHcLWlP4eXkfgJMLFfiygm5rJ7DR6oF950qy/N9DVXbf35uvzK/9Mn dSQN4wufVQD/F601MYm8J6/5pcipMOUjQlh3sFcDOHb5xYJLi3Q/BbRSLmmPXMKo/Lk4 D5KwvXNGAmmVLO3aVwsIHbp0GCoaWOWurEomIChFgtICT+b+RlH8I8lQsxlvCjfQs+Jc 4GKTytYZW3d1KKm9A12oLnsVBRho6op9ltPcCyBzUrJ7RJnXr2v5Q3eY86B17r/+b5Gs HjRg== X-Gm-Message-State: AOJu0YyD/HvqkN9O5KuHPQZPIhrlcny1oFrMUq9kkl+wx+zABvoGq2lJ oxS6PjriydhK/t+xN7QpixnUIkfCWV5eTACDdAXmgq2HEDCzjYUhgER9QPOGOD+mZ1U7fjvzpr6 /fLU= X-Google-Smtp-Source: AGHT+IFuZUlB3VbsE8pmOzlzy5+Zj5XhX7a9QWWb7sM5nW5FNL+4CqTjh5GtvCnEUvqxv8683KYTdA== X-Received: by 2002:a05:6102:390e:b0:47b:b404:d63e with SMTP id e14-20020a056102390e00b0047bb404d63emr9650854vsu.15.1714423418937; Mon, 29 Apr 2024 13:43:38 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id w17-20020a056214013100b0069b57111a98sm9895034qvs.79.2024.04.29.13.43.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:38 -0700 (PDT)
Date: Mon, 29 Apr 2024 16:43:37 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
Message-ID: <86a1e4b8b9be99563836d1539fbf2ed4c4a6920d.1714422410.git.me@ttaylorr.com>
References:
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:

Teach the new pseudo-merge machinery how to select non-bitmapped commits
for inclusion in different pseudo-merge group(s) based on a handful of
criteria.

Pseudo-merges are derived first from named pseudo-merge groups (see the
`bitmapPseudoMerge.<name>.*` configuration options). They are
(optionally) further segmented within an individual pseudo-merge group
based on any capture group(s) within the pseudo-merge group's pattern.

For example, a configuration like so:

    [bitmapPseudoMerge "all"]
        pattern = "refs/"
        threshold = now
        stableThreshold = never
        sampleRate = 100
        maxMerges = 64

would group all non-bitmapped commits into up to 64 individual
pseudo-merge commits.

If you wanted to separate tags from branches when generating
pseudo-merge commits, and further segment them by which fork they
originate from (using the same "refs/virtual/" scheme as in the delta
islands documentation), you would instead write something like:

    [bitmapPseudoMerge "all"]
        pattern = "refs/virtual/([0-9]+)/(heads|tags)/"
        threshold = now
        stableThreshold = never
        sampleRate = 100
        maxMerges = 64

This would generate pseudo-merge group identifiers like "1234-heads" and
"5678-tags" (for branches in fork "1234" and tags in fork "5678",
respectively).

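The way capture groups turn into group identifiers can be sketched in a
few lines of standalone C. This is only an illustration, not code from
this series: `subgroup_name()` and `MAX_CAPTURES` are hypothetical names,
and the output buffer handling is an assumption made for the example. It
relies only on POSIX regcomp(3)/regexec(3), and mirrors the patch in
falling back to the whole match when the pattern has no capture groups.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_CAPTURES 16 /* assumption for this sketch */

/*
 * Hypothetical helper: match `refname` against the extended regex
 * `pattern` and write the '-'-joined capture groups into `out`.
 * If the pattern has no capture groups, the whole match is used.
 *
 * Returns 0 on a match, or -1 if the regex fails to compile, the
 * ref does not match, or `out` is too small.
 */
static int subgroup_name(const char *pattern, const char *refname,
			 char *out, size_t out_len)
{
	regex_t re;
	regmatch_t m[MAX_CAPTURES];
	size_t len = 0, i;
	int ret = -1;

	assert(out && out_len > 0);
	out[0] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	if (regexec(&re, refname, MAX_CAPTURES, m, 0))
		goto done;

	/* m[0] is the whole match; capture groups start at index 1. */
	for (i = re.re_nsub ? 1 : 0; i < MAX_CAPTURES && i <= re.re_nsub; i++) {
		size_t n;

		if (m[i].rm_so == -1)
			continue; /* group did not take part in the match */

		n = (size_t)(m[i].rm_eo - m[i].rm_so);
		if (len + n + 2 > out_len)
			goto done; /* no room for '-', the capture, and NUL */

		if (len)
			out[len++] = '-';
		memcpy(out + len, refname + m[i].rm_so, n);
		len += n;
		out[len] = '\0';
	}
	ret = 0;
done:
	regfree(&re);
	return ret;
}
```

Calling it with the second pattern above and the ref
"refs/virtual/1234/heads/main" fills the buffer with "1234-heads"; a ref
such as "refs/heads/main" does not match and returns -1.
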
Within pseudo-merge groups, there are a handful of other options used
to control the distribution of matching commits among individual
pseudo-merge commits:

  - bitmapPseudoMerge.<name>.decay
  - bitmapPseudoMerge.<name>.sampleRate
  - bitmapPseudoMerge.<name>.threshold
  - bitmapPseudoMerge.<name>.maxMerges
  - bitmapPseudoMerge.<name>.stableThreshold
  - bitmapPseudoMerge.<name>.stableSize

The decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`,
where `f(n)` describes the size of the `n`-th pseudo-merge group. The
sample rate controls what percentage of eligible commits are considered
as candidates. The threshold parameter indicates the minimum age (so as
to avoid including too-recent commits in a pseudo-merge group, making it
less likely to be valid). The "maxMerges" parameter sets an upper bound
on the number of pseudo-merge commits an individual group may contain.

The latter two "stable"-related parameters control "stable" pseudo-merge
groups, composed of a fixed number of commits which are older than the
configured "stable threshold" value and may be grouped together in
chunks of "stableSize" in order of age.

This patch implements the aforementioned selection routine, as well as
parsing the relevant configuration options.

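To make the decay behaviour concrete, here is a small standalone sketch
of the size computation, using the form given in the patch's code
comment, f(n) = C * n^-k with C chosen so that the sizes sum to the
total number of commits. It is an illustration under assumptions, not
the implementation in this series: `decayed_sizes()` and `ipow()` are
hypothetical names, and k is restricted to a non-negative integer so
that the example does not depend on libm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer power by repeated multiplication; avoids needing libm. */
static double ipow(double base, unsigned exp)
{
	double result = 1.0;

	while (exp--)
		result *= base;
	return result;
}

/*
 * Hypothetical helper: fill sizes[0..m-1] with f(n) = C / n^k for
 * n = 1..m, where C = total / sum_{n=1}^{m} n^-k, rounding each
 * size to the nearest integer.
 */
static void decayed_sizes(uint32_t total, unsigned k, size_t m,
			  uint32_t *sizes)
{
	double sum = 0.0, C;
	size_t n;

	assert(m > 0 && sizes);

	/* Normalising constant: the sizes should add up to `total`. */
	for (n = 1; n <= m; n++)
		sum += 1.0 / ipow((double)n, k);
	C = total / sum;

	for (n = 1; n <= m; n++)
		sizes[n - 1] = (uint32_t)(C / ipow((double)n, k) + 0.5);
}
```

With total = 10000, k = 1 and m = 6, this produces the sizes
{ 4082, 2041, 1361, 1020, 816, 680 }, which sum to 10000: the earliest
pseudo-merges are large and each later one is smaller.
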
Signed-off-by: Taylor Blau --- pseudo-merge.c | 441 +++++++++++++++++++++++++++++++++++++++++++++++++ pseudo-merge.h | 96 +++++++++++ 2 files changed, 537 insertions(+) diff --git a/pseudo-merge.c b/pseudo-merge.c index 37e037ba272..caccef942a1 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -1,2 +1,443 @@ #include "git-compat-util.h" #include "pseudo-merge.h" +#include "date.h" +#include "oid-array.h" +#include "strbuf.h" +#include "config.h" +#include "string-list.h" +#include "refs.h" +#include "pack-bitmap.h" +#include "commit.h" +#include "alloc.h" +#include "progress.h" + +#define DEFAULT_PSEUDO_MERGE_DECAY 1.0f +#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64 +#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 100 +#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago") +#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago") +#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512 + +static float gitexp(float base, int exp) +{ + float result = 1; + while (1) { + if (exp % 2) + result *= base; + exp >>= 1; + if (!exp) + break; + base *= base; + } + return result; +} + +static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group, + const struct pseudo_merge_matches *matches, + uint32_t i) +{ + float C = 0.0f; + uint32_t n; + + /* + * The size of pseudo-merge groups decays according to a power series, + * which looks like: + * + * f(n) = C * n^-k + * + * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k' + * is the decay rate, and 'C' is a scaling value. + * + * The value of C depends on the number of groups, decay rate, and total + * number of commits. It is computed such that if there are M and N + * total groups and commits, respectively, that: + * + * N = f(0) + f(1) + ... 
f(M-1) + * + * Rearranging to isolate C, we get: + * + * N = \sum_{n=1}^M C / n^k + * + * N / C = \sum_{n=1}^M n^-k + * + * C = N / \sum_{n=1}^M n^-k + * + * For example, if we have a decay rate of 'k' being equal to 1.5, 'N' + * total commits equal to 10,000, and 'M' being equal to 6 groups, then + * the (rounded) group sizes are: + * + * { 5469, 1934, 1053, 684, 489, 372 } + * + * increasing the number of total groups, say to 10, scales the group + * sizes appropriately: + * + * { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 } + */ + for (n = 0; n < group->max_merges; n++) + C += 1.0f / gitexp(n + 1, group->decay); + C = matches->unstable_nr / C; + + return (int)((C / gitexp(i + 1, group->decay)) + 0.5); +} + +static void init_pseudo_merge_group(struct pseudo_merge_group *group) +{ + memset(group, 0, sizeof(struct pseudo_merge_group)); + + strmap_init_with_options(&group->matches, NULL, 0); + + group->decay = DEFAULT_PSEUDO_MERGE_DECAY; + group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES; + group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE; + group->threshold = DEFAULT_PSEUDO_MERGE_THRESHOLD; + group->stable_threshold = DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD; + group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE; +} + +static int pseudo_merge_config(const char *var, const char *value, + const struct config_context *ctx, + void *cb_data) +{ + struct string_list *list = cb_data; + struct string_list_item *item; + struct pseudo_merge_group *group; + struct strbuf buf = STRBUF_INIT; + const char *sub, *key; + size_t sub_len; + + if (parse_config_key(var, "bitmappseudomerge", &sub, &sub_len, &key)) + return 0; + + if (!sub_len) + return 0; + + strbuf_add(&buf, sub, sub_len); + + item = string_list_lookup(list, buf.buf); + if (!item) { + item = string_list_insert(list, buf.buf); + + item->util = xmalloc(sizeof(struct pseudo_merge_group)); + init_pseudo_merge_group(item->util); + } + + group = item->util; + + if (!strcmp(key, "pattern")) { + struct strbuf 
re = STRBUF_INIT; + + free(group->pattern); + if (*value != '^') + strbuf_addch(&re, '^'); + strbuf_addstr(&re, value); + + group->pattern = xcalloc(1, sizeof(regex_t)); + if (regcomp(group->pattern, re.buf, REG_EXTENDED)) + die(_("failed to load pseudo-merge regex for %s: '%s'"), + sub, re.buf); + + strbuf_release(&re); + } else if (!strcmp(key, "decay")) { + group->decay = git_config_int(var, value, ctx->kvi); + if (group->decay < 0) { + warning(_("%s must be non-negative, using default"), var); + group->decay = DEFAULT_PSEUDO_MERGE_DECAY; + } + } else if (!strcmp(key, "samplerate")) { + group->sample_rate = git_config_int(var, value, ctx->kvi); + if (!(0 <= group->sample_rate && group->sample_rate <= 100)) { + warning(_("%s must be between 0 and 100, using default"), var); + group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE; + } + } else if (!strcmp(key, "threshold")) { + if (git_config_expiry_date(&group->threshold, var, value)) { + strbuf_release(&buf); + return -1; + } + } else if (!strcmp(key, "maxmerges")) { + group->max_merges = git_config_int(var, value, ctx->kvi); + if (group->max_merges < 0) { + warning(_("%s must be non-negative, using default"), var); + group->max_merges = DEFAULT_PSEUDO_MERGE_MAX_MERGES; + } + } else if (!strcmp(key, "stablethreshold")) { + if (git_config_expiry_date(&group->stable_threshold, var, value)) { + strbuf_release(&buf); + return -1; + } + } else if (!strcmp(key, "stablesize")) { + group->stable_size = git_config_int(var, value, ctx->kvi); + if (group->stable_size <= 0) { + warning(_("%s must be positive, using default"), var); + group->stable_size = DEFAULT_PSEUDO_MERGE_STABLE_SIZE; + } + } + + strbuf_release(&buf); + + return 0; +} + +void load_pseudo_merges_from_config(struct string_list *list) +{ + struct string_list_item *item; + + git_config(pseudo_merge_config, list); + + for_each_string_list_item(item, list) { + struct pseudo_merge_group *group = item->util; + if (!group->pattern) + die(_("pseudo-merge group 
'%s' missing required pattern"), + item->string); + if (group->threshold < group->stable_threshold) + die(_("pseudo-merge group '%s' has unstable threshold " + "before stable one"), item->string); + } +} + +static int find_pseudo_merge_group_for_ref(const char *refname, + const struct object_id *oid, + int flags UNUSED, + void *_data) +{ + struct string_list *list = _data; + struct object_id peeled; + struct commit *c; + uint32_t i; + int has_bitmap; + + if (!peel_iterated_oid(oid, &peeled)) + oid = &peeled; + + c = lookup_commit(the_repository, oid); + if (!c) + return 0; + + has_bitmap = bitmap_writer_has_bitmapped_object_id(oid); + + for (i = 0; i < list->nr; i++) { + struct pseudo_merge_group *group; + struct pseudo_merge_matches *matches; + struct strbuf group_name = STRBUF_INIT; + regmatch_t captures[16]; + size_t j; + + group = list->items[i].util; + if (regexec(group->pattern, refname, ARRAY_SIZE(captures), + captures, 0)) + continue; + + if (captures[ARRAY_SIZE(captures) - 1].rm_so != -1) + warning(_("pseudo-merge regex from config has too many capture " + "groups (max=%"PRIuMAX")"), + (uintmax_t)ARRAY_SIZE(captures) - 2); + + for (j = !!group->pattern->re_nsub; j < ARRAY_SIZE(captures); j++) { + regmatch_t *match = &captures[j]; + if (match->rm_so == -1) + continue; + + if (group_name.len) + strbuf_addch(&group_name, '-'); + + strbuf_add(&group_name, refname + match->rm_so, + match->rm_eo - match->rm_so); + } + + matches = strmap_get(&group->matches, group_name.buf); + if (!matches) { + matches = xcalloc(1, sizeof(*matches)); + strmap_put(&group->matches, strbuf_detach(&group_name, NULL), + matches); + } + + if (c->date <= group->stable_threshold) { + ALLOC_GROW(matches->stable, matches->stable_nr + 1, + matches->stable_alloc); + matches->stable[matches->stable_nr++] = c; + } else if (c->date <= group->threshold && !has_bitmap) { + ALLOC_GROW(matches->unstable, matches->unstable_nr + 1, + matches->unstable_alloc); + 
matches->unstable[matches->unstable_nr++] = c; + } + + strbuf_release(&group_name); + } + + return 0; +} + +static struct commit *push_pseudo_merge(struct pseudo_merge_group *group) +{ + struct commit *merge; + + ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc); + + merge = alloc_commit_node(the_repository); + merge->object.parsed = 1; + merge->object.flags |= BITMAP_PSEUDO_MERGE; + + group->merges[group->merges_nr++] = merge; + + return merge; +} + +static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits, + const struct object_id *oid) + +{ + struct pseudo_merge_commit_idx *pmc; + khiter_t hash_pos; + + hash_pos = kh_get_oid_map(pseudo_merge_commits, *oid); + if (hash_pos == kh_end(pseudo_merge_commits)) { + int hash_ret; + hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret); + + CALLOC_ARRAY(pmc, 1); + + kh_value(pseudo_merge_commits, hash_pos) = pmc; + } else { + pmc = kh_value(pseudo_merge_commits, hash_pos); + } + + return pmc; +} + +#define MIN_PSEUDO_MERGE_SIZE 8 + +static void select_pseudo_merges_1(struct pseudo_merge_group *group, + struct pseudo_merge_matches *matches, + kh_oid_map_t *pseudo_merge_commits, + uint32_t *pseudo_merges_nr) +{ + uint32_t i, j; + uint32_t stable_merges_nr; + + if (!matches->stable_nr && !matches->unstable_nr) + return; /* all tips in this group already have bitmaps */ + + stable_merges_nr = matches->stable_nr / group->stable_size; + if (matches->stable_nr % group->stable_size) + stable_merges_nr++; + + /* make stable_merges_nr pseudo merges for stable commits */ + for (i = 0, j = 0; i < stable_merges_nr; i++) { + struct commit *merge; + struct commit_list **p; + + merge = push_pseudo_merge(group); + p = &merge->parents; + + do { + struct commit *c; + struct pseudo_merge_commit_idx *pmc; + + if (j >= matches->stable_nr) + break; + + c = matches->stable[j++]; + pmc = pseudo_merge_idx(pseudo_merge_commits, + &c->object.oid); + + ALLOC_GROW(pmc->pseudo_merge, 
pmc->nr + 1, pmc->alloc); + + pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr; + p = commit_list_append(c, p); + } while (j % group->stable_size); + + bitmap_writer_push_bitmapped_commit(merge, 1); + (*pseudo_merges_nr)++; + } + + /* make up to group->max_merges pseudo merges for unstable commits */ + for (i = 0, j = 0; i < group->max_merges; i++) { + struct commit *merge; + struct commit_list **p; + uint32_t size, end; + + merge = push_pseudo_merge(group); + p = &merge->parents; + + size = pseudo_merge_group_size(group, matches, i); + end = size < MIN_PSEUDO_MERGE_SIZE ? matches->unstable_nr : j + size; + + for (; j < end && j < matches->unstable_nr; j++) { + struct commit *c = matches->unstable[j]; + struct pseudo_merge_commit_idx *pmc; + + if (j % (100 / group->sample_rate)) + continue; + + pmc = pseudo_merge_idx(pseudo_merge_commits, + &c->object.oid); + + ALLOC_GROW(pmc->pseudo_merge, pmc->nr + 1, pmc->alloc); + + pmc->pseudo_merge[pmc->nr++] = *pseudo_merges_nr; + p = commit_list_append(c, p); + } + + bitmap_writer_push_bitmapped_commit(merge, 1); + (*pseudo_merges_nr)++; + if (end >= matches->unstable_nr) + break; + } +} + +static int commit_date_cmp(const void *va, const void *vb) +{ + timestamp_t a = (*(const struct commit **)va)->date; + timestamp_t b = (*(const struct commit **)vb)->date; + + if (a < b) + return -1; + else if (a > b) + return 1; + return 0; +} + +static void sort_pseudo_merge_matches(struct pseudo_merge_matches *matches) +{ + QSORT(matches->stable, matches->stable_nr, commit_date_cmp); + QSORT(matches->unstable, matches->unstable_nr, commit_date_cmp); +} + +void select_pseudo_merges(struct string_list *list, + struct commit **commits, size_t commits_nr, + kh_oid_map_t *pseudo_merge_commits, + uint32_t *pseudo_merges_nr, + unsigned show_progress) +{ + struct progress *progress = NULL; + uint32_t i; + + if (!list->nr) + return; + + if (show_progress) + progress = start_progress("Selecting pseudo-merge commits", list->nr); + + 
for_each_ref(find_pseudo_merge_group_for_ref, list); + + for (i = 0; i < list->nr; i++) { + struct pseudo_merge_group *group; + struct hashmap_iter iter; + struct strmap_entry *e; + + group = list->items[i].util; + strmap_for_each_entry(&group->matches, &iter, e) { + struct pseudo_merge_matches *matches = e->value; + + sort_pseudo_merge_matches(matches); + + select_pseudo_merges_1(group, matches, + pseudo_merge_commits, + pseudo_merges_nr); + } + + display_progress(progress, i + 1); + } + + stop_progress(&progress); +} diff --git a/pseudo-merge.h b/pseudo-merge.h index cab8ff6960a..81888731864 100644 --- a/pseudo-merge.h +++ b/pseudo-merge.h @@ -2,5 +2,101 @@ #define PSEUDO_MERGE_H #include "git-compat-util.h" +#include "strmap.h" +#include "khash.h" +#include "ewah/ewok.h" + +struct commit; +struct string_list; +struct bitmap_index; + +/* + * A pseudo-merge group tracks the set of non-bitmapped reference tips + * that match the given pattern. + * + * Within those matches, they are further segmented by separating + * consecutive capture groups with '-' dash character capture groups + * with '-' dash characters. + * + * Those groups are then ordered by committer date and partitioned + * into individual pseudo-merge(s) according to the decay, max_merges, + * sample_rate, and threshold parameters. + */ +struct pseudo_merge_group { + regex_t *pattern; + + /* capture group(s) -> struct pseudo_merge_matches */ + struct strmap matches; + + /* + * The individual pseudo-merge(s) that are generated from the + * above array of matches, partitioned according to the below + * parameters. + */ + struct commit **merges; + size_t merges_nr; + size_t merges_alloc; + + /* + * Pseudo-merge grouping parameters. See git-config(1) for + * more information. 
+	 */
+	float decay;
+	int max_merges;
+	int sample_rate;
+	int stable_size;
+	timestamp_t threshold;
+	timestamp_t stable_threshold;
+};
+
+struct pseudo_merge_matches {
+	struct commit **stable;
+	struct commit **unstable;
+	size_t stable_nr, stable_alloc;
+	size_t unstable_nr, unstable_alloc;
+};
+
+/*
+ * Read the repository's configuration:
+ *
+ *   - bitmapPseudoMerge.<name>.pattern
+ *   - bitmapPseudoMerge.<name>.decay
+ *   - bitmapPseudoMerge.<name>.sampleRate
+ *   - bitmapPseudoMerge.<name>.threshold
+ *   - bitmapPseudoMerge.<name>.maxMerges
+ *   - bitmapPseudoMerge.<name>.stableThreshold
+ *   - bitmapPseudoMerge.<name>.stableSize
+ *
+ * and populates the given `list` with pseudo-merge groups. String
+ * entry keys are the pseudo-merge group names, and the values are
+ * pointers to the pseudo_merge_group structure itself.
+ */
+void load_pseudo_merges_from_config(struct string_list *list);
+
+/*
+ * A pseudo-merge commit index (pseudo_merge_commit_idx) maps a
+ * particular (non-pseudo-merge) commit to the list of pseudo-merge(s)
+ * it appears in.
+ */
+struct pseudo_merge_commit_idx {
+	uint32_t *pseudo_merge;
+	size_t nr, alloc;
+};
+
+/*
+ * Selects pseudo-merges from a list of commits, populating the given
+ * string_list of pseudo-merge groups.
+ *
+ * Populates the pseudo_merge_commits map with a commit_idx
+ * corresponding to each commit in the list. Counts the total number
+ * of pseudo-merges generated.
+ *
+ * Optionally shows a progress meter.
+ */ +void select_pseudo_merges(struct string_list *list, + struct commit **commits, size_t commits_nr, + kh_oid_map_t *pseudo_merge_commits, + uint32_t *pseudo_merges_nr, + unsigned show_progress); #endif From patchwork Mon Apr 29 20:43:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647711 Received: from mail-qt1-f180.google.com (mail-qt1-f180.google.com [209.85.160.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19006177998 for ; Mon, 29 Apr 2024 20:43:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.180 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423426; cv=none; b=PsCJVieZ4dRZP9s45lhycFYE2Tn/2SyD6S1pEe8wv/eF5eKQ2tdnhTBu8V3sYRf/LKNeImzv3ERJvccxdpsCXnMdHWgWEgb4MJPsmkdlPpaV66bwosZAhaWA6XYHwtbp1ymbONK/HdD+vv8v675zEcgGYyfn1faxsasVLgTLjWA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423426; c=relaxed/simple; bh=6iRdneNhefP3orMwpZbbkVN1EzXhDEWeMOX/M69Jxx8=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=bmr4d4KbYGtDJJQ3ghifJPX2ubxXg0V+UTb87/ibeAGLqtlZKc9F2KE7iJ48LsgTvXgTNb7E5a+Wu+kjLqPsb8hAnHFPFqze9xUdcXLJxJovHIvXyMOm54Oiy8In96PaCgPCiuwtl0NpyMCMjsq3f8hXQzukEpYHybsDB7SHey8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=0akAZysq; arc=none smtp.client-ip=209.85.160.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none 
smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="0akAZysq" Received: by mail-qt1-f180.google.com with SMTP id d75a77b69052e-4375ddb9eaeso31204741cf.3 for ; Mon, 29 Apr 2024 13:43:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423423; x=1715028223; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=GKbZlD30mfj9APCtKm3B1xV9xc5+CARQFHwiXtZWwfk=; b=0akAZysqKlrsna1lktZuTilWg1cOydveM1t1ZQ+5B333VOfRNVuP10zfAOPUDi7b3f GxH8cr8se13HcWUEAg1XON4H9RuPIi4/MNbjoFkliu/C+nB+IKdP7cZHOK+UqBMVPR8H Rk7ZRkWZFWEX5DgSdhmHVhPTOXnb+WTXoCidp1O92C1TlHgb8ARvQmeiEatat3I0Ovu/ RqSu9vey730dsq3j9Thfcm9mPLlBlNBMjAawefhFSTLxCMsbSFz1GxRliwhj2HN6NFqD HI70kedDY0sjcllVZrBggY8Fux/k9OLcitoZTDIBoX87FPeESZpVJh2JpQ6+nTf62TiN pEZg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423423; x=1715028223; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=GKbZlD30mfj9APCtKm3B1xV9xc5+CARQFHwiXtZWwfk=; b=b+nctlJeECEYfjulchkhqmhADv/MUwflC6+OrSJwuV3UZfkdJ/ugnNG9GUt8xCzkFG zH+TozIvRyMqyDxXbj83lAhGJhZLB3ZwiQIV4R3FYuS/JsB9uSrBFfFJa1mTUGP+vuj9 0j4IP0uXZzJL8BB5IKmIW/vgpwYKoA7HEGQEVx7MvIMmp+XgXQI33dmGo5fOIKAEC79/ /WyNC+oQlLSsKVfqjYOotuea5lE1QYu/guT262O597UGeBeKo8ANbybn5yBa+ulqTebr CuRtjrzyrFr+n32x3qBkwgrkWKh4Vrdn1fkgGJap0IjfS94hVXC3hI6e6xixzXKfG9Na 2dWg== X-Gm-Message-State: AOJu0YxuJHkMtVirRyfJ+m9AX9MKSQ/JK8XuKMaPXNb9u0WDyrL58TtU ZhocI5UvIIE1LK7VDXqY8X/uuGZJYzYSMwXwLTzwuEOXbcycLpOiwkjGPOQBlpZCu+f+9Ljfr13 nCks= X-Google-Smtp-Source: AGHT+IGoLpJBTFHzdCIyAnHhwm/4gJ3vByB+u/pmd7QlZs2TypjZWUmCC297SeJoAT8PnIJH0ZDIlQ== X-Received: by 
2002:a05:622a:578b:b0:439:dfc7:aca4 with SMTP id eh11-20020a05622a578b00b00439dfc7aca4mr12372144qtb.63.1714423423105; Mon, 29 Apr 2024 13:43:43 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id d12-20020ac851cc000000b00438527a4eb5sm9488520qtn.10.2024.04.29.13.43.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:42 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:41 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 10/23] pack-bitmap-write.c: select pseudo-merge commits Message-ID: <12b432e3a8adcda6228beae2b41b2363a6ce82a0.1714422410.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that the pseudo-merge machinery has learned how to select non-bitmapped commits and assign them into different pseudo-merge group(s), invoke this new API from within the pack-bitmap internals and store the results off. Note that the selected pseudo-merge commits aren't actually used or written anywhere yet. This will be done in the following commit. 
Signed-off-by: Taylor Blau
---
 Documentation/config.txt                     |  2 +
 Documentation/config/bitmap-pseudo-merge.txt | 75 ++++++++++++++++++++
 Documentation/technical/bitmap-format.txt    | 26 +++++++
 pack-bitmap-write.c                          | 14 ++++
 4 files changed, 117 insertions(+)
 create mode 100644 Documentation/config/bitmap-pseudo-merge.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 70b448b1326..bbedb7b9a06 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -383,6 +383,8 @@ include::config/apply.txt[]
 
 include::config/attr.txt[]
 
+include::config/bitmap-pseudo-merge.txt[]
+
 include::config/blame.txt[]
 
 include::config/branch.txt[]
diff --git a/Documentation/config/bitmap-pseudo-merge.txt b/Documentation/config/bitmap-pseudo-merge.txt
new file mode 100644
index 00000000000..90b72522046
--- /dev/null
+++ b/Documentation/config/bitmap-pseudo-merge.txt
@@ -0,0 +1,75 @@
+bitmapPseudoMerge.<name>.pattern::
+	Regular expression used to match reference names. Commits
+	pointed to by references matching this pattern (and meeting
+	the below criteria, like `bitmapPseudoMerge.<name>.sampleRate`
+	and `bitmapPseudoMerge.<name>.threshold`) will be considered
+	for inclusion in a pseudo-merge bitmap.
++
+Commits are grouped into pseudo-merge groups based on whether or not
+any reference(s) that point at a given commit match the pattern, which
+is an extended regular expression.
++
+Within a pseudo-merge group, commits may be further grouped into
+sub-groups based on the capture groups in the pattern. These
+sub-groupings are formed from the regular expressions by concatenating
+any capture groups from the regular expression, with a '-' dash in
+between.
++
+For example, if the pattern is `refs/tags/`, then all tags (provided
+they meet the below criteria) will be considered candidates for the
+same pseudo-merge group. However, if the pattern is instead
+`refs/remotes/([0-9]+)/tags/`, then tags from different remotes will
+be grouped into separate pseudo-merge groups, based on the remote
+number.
+
+bitmapPseudoMerge.<name>.decay::
+	Determines the rate at which consecutive pseudo-merge bitmap
+	groups decrease in size. Must be non-negative. This parameter
+	can be thought of as `k` in the function `f(n) = C *
+	n^(-k/100)`, where `f(n)` is the size of the `n`th group.
++
+Setting the decay rate equal to `0` will cause all groups to be the
+same size. Setting the decay rate equal to `100` will cause the `n`th
+group to be `1/n` the size of the initial group. Higher values of the
+decay rate cause consecutive groups to shrink at an increasing rate.
+The default is `100`.
+
+bitmapPseudoMerge.<name>.sampleRate::
+	Determines the proportion of non-bitmapped commits (among
+	reference tips) which are selected for inclusion in an
+	unstable pseudo-merge bitmap. Must be between `0` and `100`
+	(inclusive). The default is `100`.
+
+bitmapPseudoMerge.<name>.threshold::
+	Determines the minimum age of non-bitmapped commits (among
+	reference tips, as above) which are candidates for inclusion
+	in an unstable pseudo-merge bitmap. The default is
+	`1.week.ago`.
+
+bitmapPseudoMerge.<name>.maxMerges::
+	Determines the maximum number of pseudo-merge commits among
+	which commits may be distributed.
++
+For pseudo-merge groups whose pattern does not contain any capture
+groups, this setting is applied for all commits matching the regular
+expression. For patterns that have one or more capture groups, this
+setting is applied for each distinct capture group.
++
+For example, if your pattern is `refs/tags/`, then this setting
+will distribute all tags into a maximum of `maxMerges` pseudo-merge
+commits. However, if your pattern is, say,
+`refs/remotes/([0-9]+)/tags/`, then this setting will be applied to
+each remote's set of tags individually.
++
+Must be non-negative. The default value is 64.
+
+bitmapPseudoMerge.<name>.stableThreshold::
+	Determines the minimum age of commits (among reference tips,
+	as above, however stable commits are still considered
+	candidates even when they have been covered by a bitmap) which
+	are candidates for a stable pseudo-merge bitmap. The default
+	is `1.month.ago`.
+
+bitmapPseudoMerge.<name>.stableSize::
+	Determines the size (in number of commits) of a stable
+	pseudo-merge bitmap. The default is `512`.
diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index 63a7177ac08..ed7edf98034 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -434,3 +434,29 @@ the end of a `.bitmap` file. The format is as follows:
 
 * An 8-byte unsigned value (in network byte-order) equal to the number
   of bytes in the pseudo-merge section (including this field).
+
+=== Pseudo-merge selection
+
+Pseudo-merge commits are selected among non-bitmapped commits at the
+tip of one or more reference(s). In addition, there are a handful of
+constraints to further refine this selection:
+
+`pack.bitmapPseudoMergeDecay`:: Defines the "decay rate", which
+corresponds to how quickly (or not) consecutive pseudo-merges decrease
+in size relative to one another.
+
+`pack.bitmapPseudoMergeGroups`:: Defines the maximum number of
+pseudo-merge groups.
+
+`pack.bitmapPseudoMergeSampleRate`:: Defines the percentage of commits
+(matching the above criteria) which are selected.
+
+`pack.bitmapPseudoMergeThreshold`:: Defines the minimum age of a commit
+in order to be considered for inclusion within one or more pseudo-merge
+bitmaps.
+
+The size of consecutive pseudo-merge groups decays according to a
+power-law decay function, where the size of the `n`-th group is `f(n) =
+C*n^-k`. The value of `C` is chosen to match the number of
+desired groups, and `k` is 1/100th of the value of
+`pack.bitmapPseudoMergeDecay`.
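As a reading aid for the `pattern` documentation above, here is a minimal sketch of how a sub-group key could be derived from a reference name: match it against the extended regular expression, then join the capture groups with a '-' dash. The helper name `pseudo_merge_subgroup_key()`, the `MAX_CAPTURES` limit, and the return-value convention are assumptions made for illustration and are not taken from this series. Only the POSIX `regcomp()`/`regexec()` calls are real APIs.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_CAPTURES 8 /* arbitrary limit chosen for this sketch */

/*
 * Hypothetical helper, not part of the patch: match `refname` against
 * the extended regular expression `pattern` and write the sub-group
 * key (all capture groups joined by '-') into `key`.
 *
 * Returns 1 on a match, 0 when the reference does not match, and -1
 * when the pattern does not compile or the key does not fit.
 */
static int pseudo_merge_subgroup_key(const char *pattern, const char *refname,
				     char *key, size_t key_size)
{
	regex_t re;
	regmatch_t m[MAX_CAPTURES + 1];
	size_t len = 0, i;
	int ret = 1;

	assert(key && key_size > 0);
	key[0] = '\0';

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	if (regexec(&re, refname, MAX_CAPTURES + 1, m, 0)) {
		regfree(&re);
		return 0;
	}

	/* m[0] is the whole match; capture groups start at m[1]. */
	for (i = 1; i <= MAX_CAPTURES && i <= re.re_nsub; i++) {
		size_t n;

		if (m[i].rm_so < 0)
			continue; /* group did not participate in the match */

		n = (size_t)(m[i].rm_eo - m[i].rm_so);
		if (len + n + 2 > key_size) {
			ret = -1; /* separator, capture and NUL do not fit */
			break;
		}
		if (len)
			key[len++] = '-';
		memcpy(key + len, refname + m[i].rm_so, n);
		len += n;
		key[len] = '\0';
	}

	regfree(&re);
	return ret;
}
```

With the pattern `refs/remotes/([0-9]+)/tags/`, a reference under remote `12` yields the key `12`, so tags from different remotes land in different sub-groups. A pattern without capture groups, such as `refs/tags/`, yields an empty key, so every matching reference shares a single group.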
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index dab5bdea806..e06930e10b9 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -17,6 +17,7 @@ #include "trace2.h" #include "tree.h" #include "tree-walk.h" +#include "pseudo-merge.h" struct bitmapped_commit { struct commit *commit; @@ -39,6 +40,8 @@ struct bitmap_writer { struct bitmapped_commit *selected; unsigned int selected_nr, selected_alloc; + struct string_list pseudo_merge_groups; + kh_oid_map_t *pseudo_merge_commits; /* oid -> pseudo merge(s) */ uint32_t pseudo_merges_nr; struct progress *progress; @@ -56,6 +59,11 @@ static inline int bitmap_writer_selected_nr(void) void bitmap_writer_init(struct repository *r) { writer.bitmaps = kh_init_oid_map(); + writer.pseudo_merge_commits = kh_init_oid_map(); + + string_list_init_dup(&writer.pseudo_merge_groups); + + load_pseudo_merges_from_config(&writer.pseudo_merge_groups); } void bitmap_writer_show_progress(int show) @@ -686,6 +694,12 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, } stop_progress(&writer.progress); + + select_pseudo_merges(&writer.pseudo_merge_groups, + indexed_commits, indexed_commits_nr, + writer.pseudo_merge_commits, + &writer.pseudo_merges_nr, + writer.show_progress); } From patchwork Mon Apr 29 20:43:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647712 Received: from mail-qk1-f174.google.com (mail-qk1-f174.google.com [209.85.222.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7659D177998 for ; Mon, 29 Apr 2024 20:43:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423431; cv=none; 
b=a4OOhcsmtMPZhKJXIRJCiFXLQRZAfLPsf1RhdFvA1HO733aDB1v5/BzeOgzkUeeZPSs6+sUwaHTSjCMbzjVODBEYaAjeQKPRn6fmiFMJlp3TBs/AmFTmCbZns7Clg7TCnVtPOhVayYLE4+4rzmDYOs/o81sSfRD+Ydryfvy4sLU= Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net.
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id t17-20020a05620a451100b0078edf6393edsm10786991qkp.73.2024.04.29.13.43.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:46 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:45 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 11/23] pack-bitmap-write.c: write pseudo-merge table Message-ID: <6ce805d061e1e51dfd1b2ab3b8cd081292f42f3a.1714422410.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that the pack-bitmap writer machinery understands how to select and store pseudo-merge commits, teach it how to write the new optional pseudo-merge .bitmap extension. No readers yet exist for this new extension to the .bitmap format. The following commits will take any preparatory step(s) necessary before then implementing the routines necessary to read this new table. In the meantime, the new `write_pseudo_merges()` function implements writing this new format as described by a previous commit in Documentation/technical/bitmap-format.txt. 
Writing this table is fairly straightforward and consists of a few sub-components: - a pair of bitmaps for each pseudo-merge (one for the pseudo-merge "parents", and another for the objects reachable from those parents) - for each commit, the offset of either (a) the pseudo-merge it belongs to, or (b) an extended lookup table if it belongs to >1 pseudo-merge groups - if there are any commits belonging to >1 pseudo-merge group, the extended lookup tables (which each consist of the number of pseudo-merge groups a commit appears in, and then that many 4-byte unsigned ) Signed-off-by: Taylor Blau --- pack-bitmap-write.c | 128 ++++++++++++++++++++++++++++++++++++++++++++ pack-bitmap.h | 1 + 2 files changed, 129 insertions(+) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index e06930e10b9..d4894ace9ee 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -18,6 +18,7 @@ #include "tree.h" #include "tree-walk.h" #include "pseudo-merge.h" +#include "oid-array.h" struct bitmapped_commit { struct commit *commit; @@ -748,6 +749,127 @@ static void write_selected_commits_v1(struct hashfile *f, } } +static void write_pseudo_merges(struct hashfile *f) +{ + struct oid_array commits = OID_ARRAY_INIT; + struct bitmap **commits_bitmap = NULL; + off_t *pseudo_merge_ofs = NULL; + off_t start, table_start, next_ext; + + uint32_t base = bitmap_writer_selected_nr(); + size_t i, j = 0; + + CALLOC_ARRAY(commits_bitmap, writer.pseudo_merges_nr); + CALLOC_ARRAY(pseudo_merge_ofs, writer.pseudo_merges_nr); + + for (i = 0; i < writer.pseudo_merges_nr; i++) { + struct bitmapped_commit *merge = &writer.selected[base + i]; + struct commit_list *p; + + if (!merge->pseudo_merge) + BUG("found non-pseudo merge commit at %"PRIuMAX, (uintmax_t)i); + + commits_bitmap[i] = bitmap_new(); + + for (p = merge->commit->parents; p; p = p->next) + bitmap_set(commits_bitmap[i], + find_object_pos(&p->item->object.oid, NULL)); + } + + start = hashfile_total(f); + + for (i = 0; i < 
writer.pseudo_merges_nr; i++) { + struct ewah_bitmap *commits_ewah = bitmap_to_ewah(commits_bitmap[i]); + + pseudo_merge_ofs[i] = hashfile_total(f); + + dump_bitmap(f, commits_ewah); + dump_bitmap(f, writer.selected[base+i].write_as); + + ewah_free(commits_ewah); + } + + next_ext = st_add(hashfile_total(f), + st_mult(kh_size(writer.pseudo_merge_commits), + sizeof(uint64_t))); + + table_start = hashfile_total(f); + + commits.alloc = kh_size(writer.pseudo_merge_commits); + CALLOC_ARRAY(commits.oid, commits.alloc); + + for (i = kh_begin(writer.pseudo_merge_commits); i != kh_end(writer.pseudo_merge_commits); i++) { + if (!kh_exist(writer.pseudo_merge_commits, i)) + continue; + oid_array_append(&commits, &kh_key(writer.pseudo_merge_commits, i)); + } + + oid_array_sort(&commits); + + /* write lookup table (non-extended) */ + for (i = 0; i < commits.nr; i++) { + int hash_pos; + struct pseudo_merge_commit_idx *c; + + hash_pos = kh_get_oid_map(writer.pseudo_merge_commits, + commits.oid[i]); + if (hash_pos == kh_end(writer.pseudo_merge_commits)) + BUG("could not find pseudo-merge commit %s", + oid_to_hex(&commits.oid[i])); + + c = kh_value(writer.pseudo_merge_commits, hash_pos); + + hashwrite_be32(f, find_object_pos(&commits.oid[i], NULL)); + if (c->nr == 1) + hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[0]]); + else if (c->nr > 1) { + if (next_ext & ((uint64_t)1<<63)) + die(_("too many pseudo-merges")); + hashwrite_be64(f, next_ext | ((uint64_t)1<<63)); + next_ext = st_add3(next_ext, + sizeof(uint32_t), + st_mult(c->nr, sizeof(uint64_t))); + } else + BUG("expected commit '%s' to have at least one " + "pseudo-merge", oid_to_hex(&commits.oid[i])); + } + + /* write lookup table (extended) */ + for (i = 0; i < commits.nr; i++) { + int hash_pos; + struct pseudo_merge_commit_idx *c; + + hash_pos = kh_get_oid_map(writer.pseudo_merge_commits, + commits.oid[i]); + if (hash_pos == kh_end(writer.pseudo_merge_commits)) + BUG("could not find pseudo-merge commit %s", + 
oid_to_hex(&commits.oid[i])); + + c = kh_value(writer.pseudo_merge_commits, hash_pos); + if (c->nr == 1) + continue; + + hashwrite_be32(f, c->nr); + for (j = 0; j < c->nr; j++) + hashwrite_be64(f, pseudo_merge_ofs[c->pseudo_merge[j]]); + } + + /* write positions for all pseudo merges */ + for (i = 0; i < writer.pseudo_merges_nr; i++) + hashwrite_be64(f, pseudo_merge_ofs[i]); + + hashwrite_be32(f, writer.pseudo_merges_nr); + hashwrite_be32(f, kh_size(writer.pseudo_merge_commits)); + hashwrite_be64(f, table_start - start); + hashwrite_be64(f, hashfile_total(f) - start + sizeof(uint64_t)); + + for (i = 0; i < writer.pseudo_merges_nr; i++) + bitmap_free(commits_bitmap[i]); + + free(pseudo_merge_ofs); + free(commits_bitmap); +} + static int table_cmp(const void *_va, const void *_vb, void *_data) { uint32_t *commit_positions = _data; @@ -855,6 +977,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index, int fd = odb_mkstemp(&tmp_file, "pack/tmp_bitmap_XXXXXX"); + if (writer.pseudo_merges_nr) + options |= BITMAP_OPT_PSEUDO_MERGES; + f = hashfd(fd, tmp_file.buf); memcpy(header.magic, BITMAP_IDX_SIGNATURE, sizeof(BITMAP_IDX_SIGNATURE)); @@ -886,6 +1011,9 @@ void bitmap_writer_finish(struct pack_idx_entry **index, write_selected_commits_v1(f, commit_positions, offsets); + if (options & BITMAP_OPT_PSEUDO_MERGES) + write_pseudo_merges(f); + if (options & BITMAP_OPT_LOOKUP_TABLE) write_lookup_table(f, commit_positions, offsets); diff --git a/pack-bitmap.h b/pack-bitmap.h index 0f539d79cfd..55527f61cd9 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -37,6 +37,7 @@ enum pack_bitmap_opts { BITMAP_OPT_FULL_DAG = 0x1, BITMAP_OPT_HASH_CACHE = 0x4, BITMAP_OPT_LOOKUP_TABLE = 0x10, + BITMAP_OPT_PSEUDO_MERGES = 0x20, }; enum pack_bitmap_flags { From patchwork Mon Apr 29 20:43:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647713 Received: from mail-yw1-f173.google.com 
(mail-yw1-f173.google.com [209.85.128.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 791EC177998 for ; Mon, 29 Apr 2024 20:43:52 +0000 (UTC) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net.
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id bb40-20020a05622a1b2800b0043ad7ddda16sm1538618qtb.97.2024.04.29.13.43.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:50 -0700 (PDT)
Date: Mon, 29 Apr 2024 16:43:48 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King , Elijah Newren , Junio C Hamano
Subject: [PATCH v2 12/23] pack-bitmap: extract `read_bitmap()` function
Message-ID: <60f6b310213b86de25ecf23002cee7a3c8460415.1714422410.git.me@ttaylorr.com>
References:
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:

The pack-bitmap machinery uses the `read_bitmap_1()` function to read a
bitmap from within the mmap'd region corresponding to the .bitmap file.
As a side-effect of calling this function, `read_bitmap_1()` increments
the `index->map_pos` variable to reflect the number of bytes read.

Extract the core of this routine to a separate function (that operates
over a `const unsigned char *`, a `size_t` and a `size_t *` pointer)
instead of a `struct bitmap_index *` pointer.

This function (called `read_bitmap()`) is part of the pack-bitmap.h API
so that it can be used within the upcoming portion of the implementation
in pseudo-merge.[ch].

Rewrite the existing function, `read_bitmap_1()`, in terms of its more
generic counterpart.
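The refactoring described above follows a common pattern: a generic reader takes the buffer, its size, and a pointer to the read cursor, while a thin wrapper supplies those from a larger structure. The sketch below illustrates only that pattern. It decodes a 4-byte big-endian integer rather than an EWAH bitmap, and `struct demo_index`, `read_be32_at()` and `demo_index_read_be32()` are hypothetical names that do not exist in Git.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for `struct bitmap_index`: cursor fields only. */
struct demo_index {
	const unsigned char *map;
	size_t map_size;
	size_t map_pos;
};

/*
 * Generic reader: decode a 4-byte big-endian value at `*map_pos` and
 * advance the cursor by the number of bytes consumed. On a truncated
 * buffer, return -1 and leave the cursor untouched.
 */
static int read_be32_at(const unsigned char *map, size_t map_size,
			size_t *map_pos, uint32_t *out)
{
	const unsigned char *p;

	assert(map && map_pos && out);

	if (*map_pos > map_size || map_size - *map_pos < sizeof(uint32_t))
		return -1;

	p = map + *map_pos;
	*out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	*map_pos += sizeof(uint32_t);
	return 0;
}

/* Wrapper in the style of `read_bitmap_1()`: forwards the index's own cursor. */
static int demo_index_read_be32(struct demo_index *index, uint32_t *out)
{
	return read_be32_at(index->map, index->map_size, &index->map_pos, out);
}
```

Because the generic function never touches the index structure, a caller outside pack-bitmap.c can use it with its own buffer and cursor without needing the structure's definition.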
Signed-off-by: Taylor Blau --- pack-bitmap.c | 24 +++++++++++++++--------- pack-bitmap.h | 2 ++ 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 35c5ef9d3cd..3519edb896b 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -129,17 +129,13 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st) return composed; } -/* - * Read a bitmap from the current read position on the mmaped - * index, and increase the read position accordingly - */ -static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) +struct ewah_bitmap *read_bitmap(const unsigned char *map, + size_t map_size, size_t *map_pos) { struct ewah_bitmap *b = ewah_pool_new(); - ssize_t bitmap_size = ewah_read_mmap(b, - index->map + index->map_pos, - index->map_size - index->map_pos); + ssize_t bitmap_size = ewah_read_mmap(b, map + *map_pos, + map_size - *map_pos); if (bitmap_size < 0) { error(_("failed to load bitmap index (corrupted?)")); @@ -147,10 +143,20 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) return NULL; } - index->map_pos += bitmap_size; + *map_pos += bitmap_size; + return b; } +/* + * Read a bitmap from the current read position on the mmaped + * index, and increase the read position accordingly + */ +static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index) +{ + return read_bitmap(index->map, index->map_size, &index->map_pos); +} + static uint32_t bitmap_num_objects(struct bitmap_index *index) { if (index->midx) diff --git a/pack-bitmap.h b/pack-bitmap.h index 55527f61cd9..a5fe4f305ef 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -133,4 +133,6 @@ int bitmap_is_preferred_refname(struct repository *r, const char *refname); int verify_bitmap_files(struct repository *r); +struct ewah_bitmap *read_bitmap(const unsigned char *map, + size_t map_size, size_t *map_pos); #endif From patchwork Mon Apr 29 20:43:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647714 Received: from mail-qv1-f49.google.com (mail-qv1-f49.google.com [209.85.219.49]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DAF8517799F for ; Mon, 29 Apr 2024 20:43:56 +0000 (UTC) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net.
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id p18-20020a0cfd92000000b006a0d701ee4esm642548qvr.75.2024.04.29.13.43.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:53 -0700 (PDT)
Date: Mon, 29 Apr 2024 16:43:52 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King , Elijah Newren , Junio C Hamano
Subject: [PATCH v2 13/23] pseudo-merge: scaffolding for reads
Message-ID: <9465313691be43c00c6f03c0be58710edbeb9caf.1714422410.git.me@ttaylorr.com>
References:
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To:

Implement scaffolding within the new pseudo-merge compilation unit
necessary to use the pseudo-merge API from within the pack-bitmap.c
machinery.

The core of this scaffolding is two-fold:

  - The `pseudo_merge` structure itself, which represents an individual
    pseudo-merge bitmap. It has fields for both bitmaps, as well as
    metadata about its position within the memory-mapped region, and a
    few extra bits indicating whether or not it is satisfied, and which
    bitmap(s), if any, have been read, since they are initialized
    lazily.

  - The `pseudo_merge_map` structure, which holds an array of
    pseudo_merges, as well as a pointer to the memory-mapped region
    containing the pseudo-merge serialization from within a .bitmap
    file.

Note that the `bitmap_index` structure is defined statically within the
pack-bitmap.o compilation unit, so we can't take in a `struct
bitmap_index *`. Instead, wrap the primary components necessary to read
the pseudo-merges in this new structure to avoid exposing the
implementation details of the `bitmap_index` structure.
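To illustrate the lazy initialization mentioned above, the following sketch pairs a per-entry "loaded" bit with an offset into a shared memory-mapped buffer, so that an entry is only resolved on first use. It is deliberately simplified: the `demo_` structures, the `loads` counter, and `demo_load_bitmap()` are hypothetical, and the "bitmap" here is just a pointer into the map rather than a parsed EWAH bitmap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for `struct pseudo_merge`. */
struct demo_pseudo_merge {
	const unsigned char *bitmap; /* resolved lazily on first use */
	size_t bitmap_at;            /* offset of the bitmap within the map */
	unsigned loaded_bitmap : 1;  /* set once `bitmap` is valid */
};

/* Hypothetical, simplified stand-in for `struct pseudo_merge_map`. */
struct demo_pseudo_merge_map {
	struct demo_pseudo_merge *v;
	size_t nr;
	const unsigned char *map; /* borrowed; never freed by this code */
	size_t map_size;
	unsigned loads;           /* demo-only: counts actual loads */
};

/*
 * Return the bitmap for the i-th pseudo-merge, resolving it from the
 * map on first use only. Returns NULL if the recorded offset lies
 * outside the mapped region.
 */
static const unsigned char *demo_load_bitmap(struct demo_pseudo_merge_map *pm,
					     size_t i)
{
	struct demo_pseudo_merge *merge;

	assert(pm && i < pm->nr);
	merge = &pm->v[i];

	if (merge->loaded_bitmap)
		return merge->bitmap; /* already resolved: no extra work */
	if (merge->bitmap_at >= pm->map_size)
		return NULL;

	merge->bitmap = pm->map + merge->bitmap_at;
	merge->loaded_bitmap = 1;
	pm->loads++;
	return merge->bitmap;
}
```

Keeping the map pointer and size inside the wrapper structure is what lets this code stay independent of whatever larger structure owns the mapping, which is the design choice the commit message describes.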
Signed-off-by: Taylor Blau --- pseudo-merge.c | 10 ++++++++ pseudo-merge.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) diff --git a/pseudo-merge.c b/pseudo-merge.c index caccef942a1..d18de0a266b 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -441,3 +441,13 @@ void select_pseudo_merges(struct string_list *list, stop_progress(&progress); } + +void free_pseudo_merge_map(struct pseudo_merge_map *pm) +{ + uint32_t i; + for (i = 0; i < pm->nr; i++) { + ewah_pool_free(pm->v[i].commits); + ewah_pool_free(pm->v[i].bitmap); + } + free(pm->v); +} diff --git a/pseudo-merge.h b/pseudo-merge.h index 81888731864..2f652fc6767 100644 --- a/pseudo-merge.h +++ b/pseudo-merge.h @@ -99,4 +99,69 @@ void select_pseudo_merges(struct string_list *list, uint32_t *pseudo_merges_nr, unsigned show_progress); +/* + * Represents a serialized view of a file containing pseudo-merge(s) + * (see Documentation/technical/bitmap-format.txt for a specification + * of the format). + */ +struct pseudo_merge_map { + /* + * An array of pseudo-merge(s), lazily loaded from the .bitmap + * file. + */ + struct pseudo_merge *v; + size_t nr; + size_t commits_nr; + + /* + * Pointers into a memory-mapped view of the .bitmap file: + * + * - map: the beginning of the .bitmap file + * - commits: the beginning of the pseudo-merge commit index + * - map_size: the size of the .bitmap file + */ + const unsigned char *map; + const unsigned char *commits; + + size_t map_size; +}; + +/* + * An individual pseudo-merge, storing a pair of lazily-loaded + * bitmaps: + * + * - commits: the set of commit(s) that are part of the pseudo-merge + * - bitmap: the set of object(s) reachable from the above set of + * commits. + * + * The `at` and `bitmap_at` fields are used to store the locations of + * each of the above bitmaps in the .bitmap file. 
+ */ +struct pseudo_merge { + struct ewah_bitmap *commits; + struct ewah_bitmap *bitmap; + + off_t at; + off_t bitmap_at; + + /* + * `satisfied` indicates whether the given pseudo-merge has been + * used. + * + * `loaded_commits` and `loaded_bitmap` indicate whether the + * respective bitmaps have been loaded and read from the + * .bitmap file. + */ + unsigned satisfied : 1, + loaded_commits : 1, + loaded_bitmap : 1; +}; + +/* + * Frees the given pseudo-merge map, releasing any memory held by (a) + * parsed EWAH bitmaps, or (b) the array of pseudo-merges itself. Does + * not free the memory-mapped view of the .bitmap file. + */ +void free_pseudo_merge_map(struct pseudo_merge_map *pm); + #endif From patchwork Mon Apr 29 20:43:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647715 Received: from mail-qv1-f50.google.com (mail-qv1-f50.google.com [209.85.219.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8ABED178CC9 for ; Mon, 29 Apr 2024 20:44:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423441; cv=none; b=GAtz3+zwP1d3IUMDPklWIpIL/lUMeP43/kr7bV9RXMK6agC0aYsiIhNcLYuFQpjOgcH2Zv//rfNU9SKP4LePjJDA3HPkDDG8pcGW47VKsdqYxoO6q9nD1P9ImD14DPVKnlnk4TQWvSdn6R2FuDNcmm8X3sptwpFX4Qq8UBS/h44= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423441; c=relaxed/simple; bh=lfXWrOzhxJNC3jMCytXQzIXYiiVNQO1rGJ+IS7Uv5Ag=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=MyM/kqKn+au92T6so4Cg6OZO3eGIcExB8pCVkzji1h5EP4poPepW9z5sc7yYSJzGLGNF6iKlCIRDo0o/EMubEr77Dqw5MFE6TqpHRetyd1zcVKkbhNly+L8+tJM9vaDUxmGx7FLGswdCeAgCi3YlL7FC6q+tn7vHvq8OTLXWLVU= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=vAb5c//6; arc=none smtp.client-ip=209.85.219.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="vAb5c//6" Received: by mail-qv1-f50.google.com with SMTP id 6a1803df08f44-6a073f10e25so26626836d6.3 for ; Mon, 29 Apr 2024 13:44:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423438; x=1715028238; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=fQlkjKJe8kxy7uAvSD0ZpgVrwHZXTzknaiQ7fV9DxyM=; b=vAb5c//69TfYtW69fCWoI9ldisja0M96G/INXJ79waVczIfiQssrMQtiIhmpocKBWj hb8aimTvWNUkXwb3H/w7RWErCrsLWkjE6kuCH7UI2tan2x0RIEj8+IFhipUevVoli9IN ru+DhsbKNeFzjCbO4V4AxfD62u9vrPpTGkNxPlH0x1ikO+jn9+3Q8xqyj38Nigu2EaZp Ri514XCEPcHPTf5Mjrcul3xnHBbVmjpKxUgNjaGGV6+MN4IRcdFGCZZPwM15QWhxkBWa Q37EXVcWb94+ZJ2ugiw8c2lHqo41SWNQmmR3AabnsfceQgssn8Mmx01GMLsImZazwuLR k3PQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423438; x=1715028238; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=fQlkjKJe8kxy7uAvSD0ZpgVrwHZXTzknaiQ7fV9DxyM=; b=hlzdss2ef+5hoCjEMxSVT5apY/+/yoIR/AqDF1nT7WsI1VkvQ6PxHe5cl+fgOd1Zfx QZ5yEyDtK2UdimtuOR0dyL7bJohqxnTDg54L8xtUA9NZwjoB19LmHfw/ijooYZkAjG8T 
OC3sTfy9b58dR6i4zU83/HYA93GkI857wxQWfsSnz3wue7idV7hboQ54MTMs4ShH3V/K RyIkOTseRAj6ulTejaeSJocbVvlvS/z3058uawdiY7ldeS2QMy4BnEeFji/fq8GmH963 PBqI515jInY+ys9BkWuBUVBx8EBXOowgLqrvVI6zpEtKea82VBJFP4L7HGwybGesuiuf hRKw== X-Gm-Message-State: AOJu0YwYptgAYEj2wcAkdimm+1Au7sDmfukJ0EzYrbG77MfV7ywdnsaC dpey+Z//J513LyliX1mXpO82TfFSxrjDU4KVIR1xIDzqvnhXGQPl1+NbmDVB7D65F5Vn2Vc+3b5 HB5E= X-Google-Smtp-Source: AGHT+IEuNXguoVl8r7OVuZ0bS+goCENH3a9iQPavmcnHQLqCi6WRuSx7xKQ+1CIkMs/+Y8XrEHVRHA== X-Received: by 2002:a05:6214:21cd:b0:6a0:cc66:3c74 with SMTP id d13-20020a05621421cd00b006a0cc663c74mr5539866qvh.18.1714423438253; Mon, 29 Apr 2024 13:43:58 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id qd25-20020ad44819000000b006a0d991c638sm444052qvb.104.2024.04.29.13.43.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:43:57 -0700 (PDT) Date: Mon, 29 Apr 2024 16:43:56 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 14/23] pack-bitmap.c: read pseudo-merge extension Message-ID: <5894f3d536980abbb4115aa842e27995e93326af.1714422410.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Now that the scaffolding for reading the pseudo-merge extension has been laid, teach the pack-bitmap machinery to read the pseudo-merge extension when present. Note that pseudo-merges themselves are not yet used during traversal; that step will be taken by a future commit. In the meantime, read the table and initialize the pseudo_merge_map structure introduced by a previous commit. When the pseudo-merge extension is present, `load_bitmap_header()` performs basic sanity checks to make sure that the table is well-formed. 
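For illustration only (not part of the patch), here is a standalone sketch of reading the fixed-size fields at the tail of the extension, in the order the hunk below reads them relative to `index_end`: the 8-byte extension size, the 8-byte offset of the commit lookup table, the 4-byte commit count, the 4-byte pseudo-merge count, and before those one 8-byte offset per pseudo-merge. The `toy_` and `read_be*` names are hypothetical stand-ins for Git's own helpers, and the bounds checks are an assumption about what "well-formed" should minimally mean, not a copy of what the patch checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical big-endian readers (Git has its own get_be32()/get_be64()). */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

struct toy_trailer {
	uint64_t table_size;          /* size of the whole extension */
	uint64_t commits_ofs;         /* offset of the commit lookup table */
	uint32_t commits_nr;          /* entries in the commit lookup table */
	uint32_t nr;                  /* number of pseudo-merges */
	const unsigned char *offsets; /* nr big-endian 8-byte offsets */
};

/*
 * Parses the trailer found in the last bytes of buf[0..len).
 * Returns 0 on success, -1 if the buffer cannot hold what it claims to.
 */
static int toy_parse_trailer(const unsigned char *buf, size_t len,
			     struct toy_trailer *out)
{
	const unsigned char *end = buf + len;

	assert(buf && out);

	if (len < 24)
		return -1; /* too short for the fixed-size fields */

	out->table_size = read_be64(end - 8);
	if (out->table_size > len)
		return -1; /* claims to be larger than the data we have */

	out->commits_ofs = read_be64(end - 16);
	out->commits_nr = read_be32(end - 20);
	out->nr = read_be32(end - 24);

	if (out->nr > (len - 24) / sizeof(uint64_t))
		return -1; /* offset array would start before buf */

	out->offsets = end - 24 - (size_t)out->nr * sizeof(uint64_t);
	return 0;
}
```

Working backwards from the end lets a reader locate the extension without knowing in advance how many pseudo-merges it contains.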
Signed-off-by: Taylor Blau --- pack-bitmap.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/pack-bitmap.c b/pack-bitmap.c index 3519edb896b..fc9c3e2fc43 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -20,6 +20,7 @@ #include "list-objects-filter-options.h" #include "midx.h" #include "config.h" +#include "pseudo-merge.h" /* * An entry on the bitmap index, representing the bitmap for a given @@ -86,6 +87,9 @@ struct bitmap_index { */ unsigned char *table_lookup; + /* This contains the pseudo-merge cache within 'map' (if found). */ + struct pseudo_merge_map pseudo_merges; + /* * Extended index. * @@ -205,6 +209,41 @@ static int load_bitmap_header(struct bitmap_index *index) index->table_lookup = (void *)(index_end - table_size); index_end -= table_size; } + + if (flags & BITMAP_OPT_PSEUDO_MERGES) { + unsigned char *pseudo_merge_ofs; + size_t table_size; + uint32_t i; + + if (sizeof(table_size) > index_end - index->map - header_size) + return error(_("corrupted bitmap index file (too short to fit pseudo-merge table header)")); + + table_size = get_be64(index_end - 8); + if (table_size > index_end - index->map - header_size) + return error(_("corrupted bitmap index file (too short to fit pseudo-merge table)")); + + if (git_env_bool("GIT_TEST_USE_PSEUDO_MERGES", 1)) { + const unsigned char *ext = (index_end - table_size); + + index->pseudo_merges.map = index->map; + index->pseudo_merges.map_size = index->map_size; + index->pseudo_merges.commits = ext + get_be64(index_end - 16); + index->pseudo_merges.commits_nr = get_be32(index_end - 20); + index->pseudo_merges.nr = get_be32(index_end - 24); + + CALLOC_ARRAY(index->pseudo_merges.v, + index->pseudo_merges.nr); + + pseudo_merge_ofs = index_end - 24 - + (index->pseudo_merges.nr * sizeof(uint64_t)); + for (i = 0; i < index->pseudo_merges.nr; i++) { + index->pseudo_merges.v[i].at = get_be64(pseudo_merge_ofs); + pseudo_merge_ofs += sizeof(uint64_t); + } + } + + index_end -= 
table_size; + } } index->entry_count = ntohl(header->entry_count); From patchwork Mon Apr 29 20:44:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647716 Received: from mail-qt1-f174.google.com (mail-qt1-f174.google.com [209.85.160.174]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 551E01779B1 for ; Mon, 29 Apr 2024 20:44:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.174 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423445; cv=none; b=FvpFQn3MTSLYPWvhvDlztuKwHSQdqKFf4t/RQTKa+Ibl/U6c5YFG9Rvy2RSupaClf9cdyWKAfPNXay1kcNyiZaF1LIlCbjaV2GzrRupSh1rJ3sTcTiCZrdQwU4L8VmPyNZt+V184kQTNYnGwX3BpTl8aP3SdSdpqfRbtmN+kL7k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423445; c=relaxed/simple; bh=KnMZTO07YVsVcRfeq/NzClA1UyDe7VbUxkFcq77kT/Q=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=gkJfSHRhg+moq5f7oT1hrIfkbo5N/EgVfM/ZX/xAQtM0EvcEag4yrxk/LGCdSQeM8hRSmsfo8gACPyCo9mYRJow1dQ/e/mIJgXKs7znpW1fHLwZqbd3M8bgKV6ZOMAy9mb6l+vGs5nqYJBqaCv6ayqcvt3Gv5l3l5//4Q1yjnuw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=FjvSP/nJ; arc=none smtp.client-ip=209.85.160.174 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com 
header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="FjvSP/nJ" Received: by mail-qt1-f174.google.com with SMTP id d75a77b69052e-43ade9223c0so5977311cf.2 for ; Mon, 29 Apr 2024 13:44:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423442; x=1715028242; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=c6PYeWR57n/3yvSWOhrGXOKoj0eC1tJmUhnV9GLtUrk=; b=FjvSP/nJAVkniJtQBRcg5ncOwFqe5F4+c+CkpGMqb7jxzuYZiaW4yNigeNFP4gDtPh y2pV5XZ7e1aAkovwLpf/+j9/ATtuzqiDwXZm797aRXe4FQjxHZO79q3IJFb2Om2iem2A 4h4zd3usnr22NwCGi5rADCB1VzvbVXQmBztgFJl5O3Zpif4UPpHVVKRou5KMl+39+VIH Yyd72Ht8ph/cwVgSM1j/WUJBfumjtDjx1mvd9ACYb+UhkzK4Waor+iH135ag2ugu5koX 2NbXUkoASTRntanx9tNVTQJkbI7lCbYlWUazuknnL0TYj1bSd3fChmhCi11wGEoPDcj2 jz+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423442; x=1715028242; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=c6PYeWR57n/3yvSWOhrGXOKoj0eC1tJmUhnV9GLtUrk=; b=No0pI+h54X7JRBt5agF/3ZVMgtB58iK/A7VZDokK5+JX4cgtZai19tWnn0dO57ijEE t6p1sqyZLC6JT2UizGkJd/oi/4K8PuKoc8eTJFvkANqB/MSFDML4q6EQfQPmgYqZkLp3 tACxSTToI1miyHL70s7zB+uyGQIS5ZybBe+4zXLcUNtn3J2+N8xxWQTVJ7yoijxpOPVD mzV8ArKP4imJKvkTvlkd0VRAaR0+GprK5LFgVUPh7y/MurA/+ic3xH/AJutdwq6zh4px SHAXD8yQWZprEXezmFh1m+qJAkMDqbqvbKoqj5lWzuRwLMuWrAcAHlkuQKeR4ubmVoGZ myUA== X-Gm-Message-State: AOJu0YwpDTzynPjyEQ1Muv7YhjILO5IzwNRGWH4chSn7cShd7hAv+aNZ 0DYI1eLYRupbn1/mSc6tbnxo8tXbjjTdEXjjs/aacX+KlLUTiXx1aXgcAR0gkgkS/Dea3wB9IBb t3no= X-Google-Smtp-Source: AGHT+IEV0gw4K23lSMAV/gha+vQwNmz34MKP+gj8Z7lUmErWqpkz5GfNr45oBeJJq9uvgLLHatRBrg== X-Received: by 2002:a05:622a:411:b0:43a:ee30:1293 with SMTP id n17-20020a05622a041100b0043aee301293mr4659471qtx.61.1714423442121; Mon, 29 Apr 2024 13:44:02 -0700 
(PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id p9-20020ac84089000000b00437a996ca44sm10411027qtl.21.2024.04.29.13.44.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:44:01 -0700 (PDT) Date: Mon, 29 Apr 2024 16:44:00 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 15/23] pseudo-merge: implement support for reading pseudo-merge commits Message-ID: <7dbee8bcbdf2b4f7f35fb76c3d5b0ad6a465539e.1714422410.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Implement the basic API for reading pseudo-merge bitmaps, which consists of four basic functions: - pseudo_merge_bitmap() - use_pseudo_merge() - apply_pseudo_merges_for_commit() - cascade_pseudo_merges() These functions are all documented in pseudo-merge.h, but their rough descriptions are as follows: - pseudo_merge_bitmap() reads and inflates the objects EWAH bitmap for a given pseudo-merge - use_pseudo_merge() does the same as pseudo_merge_bitmap(), but on the commits EWAH bitmap, not the objects bitmap - apply_pseudo_merges_for_commit() applies all satisfied pseudo-merge commits for a given result set, and cascades any yet-unsatisfied pseudo-merges if any were applied in the previous step - cascade_pseudo_merges() applies all pseudo-merges which are satisfied but have not been previously applied, repeating this process until no more pseudo-merges can be applied The core of the API is the latter two functions, which are responsible for applying pseudo-merges during the object traversal implemented in the pack-bitmap machinery. The other two functions (pseudo_merge_bitmap(), and use_pseudo_merge()) are low-level ways to interact with the pseudo-merge machinery, which will be useful in future commits. 
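For illustration only (not part of the patch), the cascading step described above amounts to a fixed-point loop. The sketch below uses a single 64-bit word as each bitmap and hypothetical `toy_` names; it assumes, as the description above does, that a pseudo-merge is "satisfied" when all of its commits are already present in the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy pseudo-merge: one word per bitmap. */
struct toy_merge {
	uint64_t commits; /* commits that make up the pseudo-merge */
	uint64_t objects; /* everything reachable from those commits */
	unsigned satisfied : 1;
};

/* Applies one pseudo-merge if all of its commits are in *result. */
static int toy_apply(struct toy_merge *m, uint64_t *result)
{
	if (m->satisfied)
		return 0;
	if (m->commits & ~*result)
		return 0; /* at least one commit is missing */

	*result |= m->objects;
	m->satisfied = 1;
	return 1;
}

/*
 * Repeats passes over all pseudo-merges until a pass applies nothing.
 * Returns the total number of pseudo-merges applied.
 */
static int toy_cascade(struct toy_merge *v, size_t nr, uint64_t *result)
{
	int total = 0;
	int any;

	assert(v && result);

	do {
		size_t i;

		any = 0;
		for (i = 0; i < nr; i++) {
			if (toy_apply(&v[i], result)) {
				any = 1;
				total++;
			}
		}
	} while (any);

	return total;
}
```

The outer loop matters because applying one pseudo-merge can add the commits that a previously skipped one was waiting for. The loop terminates because each pass that continues marks at least one more entry as satisfied.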
Signed-off-by: Taylor Blau --- pseudo-merge.c | 231 +++++++++++++++++++++++++++++++++++++++++++++++++ pseudo-merge.h | 44 ++++++++++ 2 files changed, 275 insertions(+) diff --git a/pseudo-merge.c b/pseudo-merge.c index d18de0a266b..e111c9cd1a6 100644 --- a/pseudo-merge.c +++ b/pseudo-merge.c @@ -10,6 +10,7 @@ #include "commit.h" #include "alloc.h" #include "progress.h" +#include "hex.h" #define DEFAULT_PSEUDO_MERGE_DECAY 1.0f #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64 @@ -451,3 +452,233 @@ void free_pseudo_merge_map(struct pseudo_merge_map *pm) } free(pm->v); } + +struct pseudo_merge_commit_ext { + uint32_t nr; + const unsigned char *ptr; +}; + +static int pseudo_merge_ext_at(const struct pseudo_merge_map *pm, + struct pseudo_merge_commit_ext *ext, size_t at) +{ + if (at >= pm->map_size) + return error(_("extended pseudo-merge read out-of-bounds " + "(%"PRIuMAX" >= %"PRIuMAX")"), + (uintmax_t)at, (uintmax_t)pm->map_size); + + ext->nr = get_be32(pm->map + at); + ext->ptr = pm->map + at + sizeof(uint32_t); + + return 0; +} + +struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge) +{ + if (!merge->loaded_commits) + BUG("cannot use unloaded pseudo-merge bitmap"); + + if (!merge->loaded_bitmap) { + size_t at = merge->bitmap_at; + + merge->bitmap = read_bitmap(pm->map, pm->map_size, &at); + merge->loaded_bitmap = 1; + } + + return merge->bitmap; +} + +struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge) +{ + if (!merge->loaded_commits) { + size_t pos = merge->at; + + merge->commits = read_bitmap(pm->map, pm->map_size, &pos); + merge->bitmap_at = pos; + merge->loaded_commits = 1; + } + return merge; +} + +static struct pseudo_merge *pseudo_merge_at(const struct pseudo_merge_map *pm, + struct object_id *oid, + size_t want) +{ + size_t lo = 0; + size_t hi = pm->nr; + + while (lo < hi) { + size_t mi = lo + (hi - lo) / 2; + size_t got = pm->v[mi].at; + + if (got == want) + 
return use_pseudo_merge(pm, &pm->v[mi]); + else if (got < want) + lo = mi + 1; + else + hi = mi; + } + + warning(_("could not find pseudo-merge for commit %s at offset %"PRIuMAX), + oid_to_hex(oid), (uintmax_t)want); + + return NULL; +} + +struct pseudo_merge_commit { + uint32_t commit_pos; + uint64_t pseudo_merge_ofs; +}; + +#define PSEUDO_MERGE_COMMIT_RAWSZ (sizeof(uint32_t)+sizeof(uint64_t)) + +static void read_pseudo_merge_commit_at(struct pseudo_merge_commit *merge, + const unsigned char *at) +{ + merge->commit_pos = get_be32(at); + merge->pseudo_merge_ofs = get_be64(at + sizeof(uint32_t)); +} + +static int nth_pseudo_merge_ext(const struct pseudo_merge_map *pm, + struct pseudo_merge_commit_ext *ext, + struct pseudo_merge_commit *merge, + uint32_t n) +{ + size_t ofs; + + if (n >= ext->nr) + return error(_("extended pseudo-merge lookup out-of-bounds " + "(%"PRIu32" >= %"PRIu32")"), n, ext->nr); + + ofs = get_be64(ext->ptr + st_mult(n, sizeof(uint64_t))); + if (ofs >= pm->map_size) + return error(_("out-of-bounds read: (%"PRIuMAX" >= %"PRIuMAX")"), + (uintmax_t)ofs, (uintmax_t)pm->map_size); + + read_pseudo_merge_commit_at(merge, pm->map + ofs); + + return 0; +} + +static unsigned apply_pseudo_merge(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge, + struct bitmap *result, + struct bitmap *roots) +{ + if (merge->satisfied) + return 0; + + if (!ewah_bitmap_is_subset(merge->commits, roots ? 
roots : result)) + return 0; + + bitmap_or_ewah(result, pseudo_merge_bitmap(pm, merge)); + if (roots) + bitmap_or_ewah(roots, pseudo_merge_bitmap(pm, merge)); + merge->satisfied = 1; + + return 1; +} + +static int pseudo_merge_commit_cmp(const void *va, const void *vb) +{ + struct pseudo_merge_commit merge; + uint32_t key = *(const uint32_t *)va; + + read_pseudo_merge_commit_at(&merge, vb); + + if (key < merge.commit_pos) + return -1; + if (key > merge.commit_pos) + return 1; + return 0; +} + +static struct pseudo_merge_commit *find_pseudo_merge(const struct pseudo_merge_map *pm, + uint32_t pos) +{ + if (!pm->commits_nr) + return NULL; + + return bsearch(&pos, pm->commits, pm->commits_nr, + PSEUDO_MERGE_COMMIT_RAWSZ, pseudo_merge_commit_cmp); +} + +int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm, + struct bitmap *result, + struct commit *commit, uint32_t commit_pos) +{ + struct pseudo_merge *merge; + struct pseudo_merge_commit *merge_commit; + int ret = 0; + + merge_commit = find_pseudo_merge(pm, commit_pos); + if (!merge_commit) + return 0; + + if (merge_commit->pseudo_merge_ofs & ((uint64_t)1<<63)) { + struct pseudo_merge_commit_ext ext = { 0 }; + off_t ofs = merge_commit->pseudo_merge_ofs & ~((uint64_t)1<<63); + uint32_t i; + + if (pseudo_merge_ext_at(pm, &ext, ofs) < 0) { + warning(_("could not read extended pseudo-merge table " + "for commit %s"), + oid_to_hex(&commit->object.oid)); + return ret; + } + + for (i = 0; i < ext.nr; i++) { + if (nth_pseudo_merge_ext(pm, &ext, merge_commit, i) < 0) + return ret; + + merge = pseudo_merge_at(pm, &commit->object.oid, + merge_commit->pseudo_merge_ofs); + + if (!merge) + return ret; + + if (apply_pseudo_merge(pm, merge, result, NULL)) + ret++; + } + } else { + merge = pseudo_merge_at(pm, &commit->object.oid, + merge_commit->pseudo_merge_ofs); + + if (!merge) + return ret; + + if (apply_pseudo_merge(pm, merge, result, NULL)) + ret++; + } + + if (ret) + cascade_pseudo_merges(pm, result, NULL); + + return 
ret; +} + +int cascade_pseudo_merges(const struct pseudo_merge_map *pm, + struct bitmap *result, + struct bitmap *roots) +{ + unsigned any_satisfied; + int ret = 0; + + do { + struct pseudo_merge *merge; + uint32_t i; + + any_satisfied = 0; + + for (i = 0; i < pm->nr; i++) { + merge = use_pseudo_merge(pm, &pm->v[i]); + if (apply_pseudo_merge(pm, merge, result, roots)) { + any_satisfied |= 1; + ret++; + } + } + } while (any_satisfied); + + return ret; +} diff --git a/pseudo-merge.h b/pseudo-merge.h index 2f652fc6767..cc14e947e86 100644 --- a/pseudo-merge.h +++ b/pseudo-merge.h @@ -164,4 +164,48 @@ struct pseudo_merge { */ void free_pseudo_merge_map(struct pseudo_merge_map *pm); +/* + * Loads the bitmap corresponding to the given pseudo-merge from the + * map, if it has not already been loaded. + */ +struct ewah_bitmap *pseudo_merge_bitmap(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge); + +/* + * Loads the pseudo-merge and its commits bitmap from the given + * pseudo-merge map, if it has not already been loaded. + */ +struct pseudo_merge *use_pseudo_merge(const struct pseudo_merge_map *pm, + struct pseudo_merge *merge); + +/* + * Applies pseudo-merge(s) containing the given commit to the bitmap + * "result". + * + * If any pseudo-merge(s) were satisfied, returns the number + * satisfied, otherwise returns 0. If any were satisfied, the + * remaining unsatisfied pseudo-merges are cascaded (see below). + */ +int apply_pseudo_merges_for_commit(const struct pseudo_merge_map *pm, + struct bitmap *result, + struct commit *commit, uint32_t commit_pos); + +/* + * Applies pseudo-merge(s) which are satisfied according to the + * current bitmap in result (or roots, see below). If any + * pseudo-merges were satisfied, repeat the process over unsatisfied + * pseudo-merge commits until no more pseudo-merges are satisfied. + * + * Result is the bitmap to which the pseudo-merge(s) are applied. 
+ * Roots (if given) is a bitmap of the traversal tip(s) for either + * side of a reachability traversal. + * + * Roots may be given instead of a populated results bitmap at the + * beginning of a traversal on either side where the reachability + * closure over tips is not yet known. + */ +int cascade_pseudo_merges(const struct pseudo_merge_map *pm, + struct bitmap *result, + struct bitmap *roots); + #endif From patchwork Mon Apr 29 20:44:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647717 Received: from mail-qv1-f46.google.com (mail-qv1-f46.google.com [209.85.219.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7F95A1779B1 for ; Mon, 29 Apr 2024 20:44:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423448; cv=none; b=Bb0oNjEfm4MAyLB8OlmOL7ei8KTgADQIx2qSGWbNiZMLQyJxIybyvIWza8Z2XMOU6q3dF3plRcdYl3bz8UlBOi3AKuzI7Tx6/HCHtfLJL5NyygVYBNriL3/HSkX/h+WvTEyXh3sKOfSJy0Vr8BCp1nOtUkj6vU4ifGTUjwodJAI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423448; c=relaxed/simple; bh=UmeTX517CZV/Dmxixbw0T6EgqFGIkyh+cH4s7AeqCWM=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=iwkwoRSlwXP14zVVM2soykuT3lkIj7HRLOvJ9FyI2T3bpe+PGkwjfM6TPfJISNPDpd8jQJY+VJ+p5Bzj4OHmZDspbd//op0dHbLufrHboyOfOMeJHDA3QzKDow9bTF1oxI7VPlJ5W8iu8LrrdIlqjLKlqVJaBaWACWJ/193UPfk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=wKWAd7Ku; arc=none 
smtp.client-ip=209.85.219.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="wKWAd7Ku" Received: by mail-qv1-f46.google.com with SMTP id 6a1803df08f44-69b514d3cf4so53889956d6.0 for ; Mon, 29 Apr 2024 13:44:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1714423446; x=1715028246; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=UPapTvPHlFB8iWTIO6Q/BZ2fGUJm3bOyC6YN8/L0aeU=; b=wKWAd7KuUkZ2kcWLWqX5D4TOVgQa0IEhMtvGrJE954fYCe2iECJClAVbOVX/Uj+DsL ut6nl4vEzihYZcGqhz+mOpq57Y2Z8lDJHElKfIRcJ6MaHWNU5DmxEuSVcBTqhDghNgMK 6uLChPxb/icCQl8p81xFTubDHkNDJEYzmrwmG8si1yF9pupmQuXKtCMYFmxcVDiTxceE YhQ7XYgz6eD7GmB2SeVKpGtQ7GCZmxcg9NSrS9L/F+uoRZzw8wkeOehWzbOJFn28f32E uYXyOclP77LMZcWhljJB2gabjKqN78yubfxVZmt3Tz5zasycYS3LG24sgF7EklWuY+XQ CNHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1714423446; x=1715028246; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=UPapTvPHlFB8iWTIO6Q/BZ2fGUJm3bOyC6YN8/L0aeU=; b=O5CDAVyv7SzYi/a43DCt/Y9r0cIIJZjb8ijEdaITYBoSe5UrB1ZdRS4uQTzqCEw3GX 792lqXbJI0I5YHjyR/ggTKw7LKxzTeYP4Jv+OXvLuGxMZONeqnhEhMmqt8iuUla4j3OH JjJWRsdXGA8wW845TXJkktEUTFYqqy5u6xAU5yuwoNhZrWPifcCeggA+tFpAHtmSC6Cb 42N0NO6+txGvuMuHRnv5AG3S5J+xANiAnDfn9BK/agZKeh1UOIJBngfTaJxaMrDEup3+ RIzWQyzFFAznHRZnjGG2bp2IZRZvmhLIZkH/KhHDgQE7eV3ZL35r/7Fc7irzm5zTnTX9 fhYQ== X-Gm-Message-State: AOJu0YxFUBNrGUS7EL4yDlFzPYmXgD4pCsxO9iitWrTa0Q9cw2wABGPX 
YuG1OUVrSUUYi1vYusraNrX8VfBVonESg4VEzNBwj2yP0lROX7rCeoEX8LzsUkrURALLKd7Cg6X AxEQ= X-Google-Smtp-Source: AGHT+IFiygYslx6/MMQTJ9bQQ4iaPoAgt720ZJMZb1DKBtNnFMaVlql0Qmt7SXPraoSULdNeiAgZug== X-Received: by 2002:a05:6214:27c8:b0:6a0:b9e7:dd4d with SMTP id ge8-20020a05621427c800b006a0b9e7dd4dmr10195396qvb.42.1714423445861; Mon, 29 Apr 2024 13:44:05 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id y19-20020ad445b3000000b006a0cf22e54asm1233706qvu.41.2024.04.29.13.44.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Apr 2024 13:44:05 -0700 (PDT) Date: Mon, 29 Apr 2024 16:44:03 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Jeff King , Elijah Newren , Junio C Hamano Subject: [PATCH v2 16/23] ewah: implement `ewah_bitmap_popcount()` Message-ID: <09650aa53e9207191e7607f9f7ec77e2e7bc796b.1714422410.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Some of the pseudo-merge test helpers (which will be introduced in the following commit) will want to indicate the total number of commits in or objects reachable from a pseudo-merge. Implement a popcount() function that operates on EWAH bitmaps to quickly determine how many bits are set in each of the respective bitmaps. 
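For illustration only (not part of the patch), counting set bits over a sequence of decompressed 64-bit words looks roughly like the sketch below. The `toy_` names are hypothetical; the per-word count uses Kernighan's clear-lowest-bit loop as a portable stand-in, whereas the real `ewah_bit_popcount64()` may well be implemented differently (for example via a compiler builtin). A plain array stands in for the words an EWAH iterator would yield.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable per-word popcount: clears the lowest set bit until none remain. */
static unsigned toy_popcount64(uint64_t x)
{
	unsigned n = 0;

	while (x) {
		x &= x - 1;
		n++;
	}
	return n;
}

/* Sums the per-word counts, as an iterator-driven loop would. */
static size_t toy_words_popcount(const uint64_t *words, size_t nr)
{
	size_t count = 0;
	size_t i;

	assert(words || !nr);

	for (i = 0; i < nr; i++)
		count += toy_popcount64(words[i]);

	return count;
}
```

Summing over decompressed words is simple but does work proportional to the uncompressed length; a version that special-cases run-length words would be a possible refinement that this sketch does not attempt.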
Signed-off-by: Taylor Blau --- ewah/bitmap.c | 14 ++++++++++++++ ewah/ewok.h | 1 + 2 files changed, 15 insertions(+) diff --git a/ewah/bitmap.c b/ewah/bitmap.c index d352fec54ce..dc2ca190f12 100644 --- a/ewah/bitmap.c +++ b/ewah/bitmap.c @@ -212,6 +212,20 @@ size_t bitmap_popcount(struct bitmap *self) return count; } +size_t ewah_bitmap_popcount(struct ewah_bitmap *self) +{ + struct ewah_iterator it; + eword_t word; + size_t count = 0; + + ewah_iterator_init(&it, self); + + while (ewah_iterator_next(&word, &it)) + count += ewah_bit_popcount64(word); + + return count; +} + int bitmap_is_empty(struct bitmap *self) { size_t i; diff --git a/ewah/ewok.h b/ewah/ewok.h index 2b6c4ac499c..7074a6347b7 100644 --- a/ewah/ewok.h +++ b/ewah/ewok.h @@ -195,6 +195,7 @@ void bitmap_or_ewah(struct bitmap *self, struct ewah_bitmap *other); void bitmap_or(struct bitmap *self, const struct bitmap *other); size_t bitmap_popcount(struct bitmap *self); +size_t ewah_bitmap_popcount(struct ewah_bitmap *self); int bitmap_is_empty(struct bitmap *self); #endif From patchwork Mon Apr 29 20:44:07 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647718 Received: from mail-qt1-f176.google.com (mail-qt1-f176.google.com [209.85.160.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 727351791F4 for ; Mon, 29 Apr 2024 20:44:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423452; cv=none; b=jB1fqqji0GKbCB8uU6ol54gSsWw3DPfA+9/+q8+BQHhByVoN+GnLOWoidKgoha9+PGvHCJSm8U5XKFEOh1PUGnKZDb95s7jpr8E7fmpkNZZt/d6A2EHCdgqbmP+G6j95FccwVVzQkVvHxpmlzXvC3r9eXxnIOTf84YApjB6FO7A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423452; 
Date: Mon, 29 Apr 2024 16:44:07 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 17/23] pack-bitmap: implement test helpers for pseudo-merge
Message-ID: <7b5ea56d053bc023780c34385570be4617da6705.1714422410.git.me@ttaylorr.com>

Implement three new sub-commands for the "bitmap" test-helper:

- t/helper test-tool bitmap dump-pseudo-merges
- t/helper test-tool bitmap dump-pseudo-merge-commits <n>
- t/helper test-tool bitmap dump-pseudo-merge-objects <n>

These three helpers dump the list of pseudo-merges, the "parents" of the
nth pseudo-merge, and the set of objects reachable from those parents,
respectively. These helpers will be useful in subsequent patches when we
add test coverage for pseudo-merge bitmaps.
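The two object-dumping helpers come down to visiting every set bit of
each decompressed bitmap word and mapping its position to an object ID.
The following is a minimal standalone sketch of just that bit-walking
step, not the code in this patch: `collect_set_bits` and `WORD_BITS`
are hypothetical names, plain `uint64_t` words stand in for `eword_t`,
and `__builtin_ctzll` assumes a GCC- or Clang-compatible compiler. It
clears the lowest set bit on each round, rather than shifting the word
as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 64 /* hypothetical stand-in for BITS_IN_EWORD */

/*
 * Illustrative only: record the absolute position of every set bit in
 * `words`, treating the array as one long bitmap. At most `out_cap`
 * positions are written; the number written is returned.
 */
size_t collect_set_bits(const uint64_t *words, size_t nr_words,
			uint32_t *out, size_t out_cap)
{
	size_t n = 0;
	uint32_t base = 0;

	assert(words && out);

	for (size_t i = 0; i < nr_words; i++, base += WORD_BITS) {
		uint64_t word = words[i];

		while (word && n < out_cap) {
			/* index of the lowest set bit (compiler builtin) */
			unsigned bit = (unsigned)__builtin_ctzll(word);

			out[n++] = base + bit;
			word &= word - 1; /* clear the bit just reported */
		}
	}

	return n;
}
```

Called on the two words 0x5 and 0x8000000000000001, the sketch reports
positions 0, 2, 64 and 127. In the real helper each such position would
then be translated to an object ID and printed.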
Signed-off-by: Taylor Blau --- pack-bitmap.c | 126 +++++++++++++++++++++++++++++++++++++++++ pack-bitmap.h | 3 + t/helper/test-bitmap.c | 34 ++++++++--- 3 files changed, 156 insertions(+), 7 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index fc9c3e2fc43..c13074673af 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -2443,6 +2443,132 @@ int test_bitmap_hashes(struct repository *r) return 0; } +static void bit_pos_to_object_id(struct bitmap_index *bitmap_git, + uint32_t bit_pos, + struct object_id *oid) +{ + uint32_t index_pos; + + if (bitmap_is_midx(bitmap_git)) + index_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos); + else + index_pos = pack_pos_to_index(bitmap_git->pack, bit_pos); + + nth_bitmap_object_oid(bitmap_git, oid, index_pos); +} + +int test_bitmap_pseudo_merges(struct repository *r) +{ + struct bitmap_index *bitmap_git; + uint32_t i; + + bitmap_git = prepare_bitmap_git(r); + if (!bitmap_git || !bitmap_git->pseudo_merges.nr) + goto cleanup; + + for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) { + struct pseudo_merge *merge; + struct ewah_bitmap *commits_bitmap, *merge_bitmap; + + merge = use_pseudo_merge(&bitmap_git->pseudo_merges, + &bitmap_git->pseudo_merges.v[i]); + commits_bitmap = merge->commits; + merge_bitmap = pseudo_merge_bitmap(&bitmap_git->pseudo_merges, + merge); + + printf("at=%"PRIuMAX", commits=%"PRIuMAX", objects=%"PRIuMAX"\n", + (uintmax_t)merge->at, + (uintmax_t)ewah_bitmap_popcount(commits_bitmap), + (uintmax_t)ewah_bitmap_popcount(merge_bitmap)); + } + +cleanup: + free_bitmap_index(bitmap_git); + return 0; +} + +static void dump_ewah_object_ids(struct bitmap_index *bitmap_git, + struct ewah_bitmap *bitmap) + +{ + struct ewah_iterator it; + eword_t word; + uint32_t pos = 0; + + ewah_iterator_init(&it, bitmap); + + while (ewah_iterator_next(&word, &it)) { + struct object_id oid; + uint32_t offset; + + for (offset = 0; offset < BITS_IN_EWORD; offset++) { + if (!(word >> offset)) + break; + + offset += ewah_bit_ctz64(word >> 
offset); + + bit_pos_to_object_id(bitmap_git, pos + offset, &oid); + printf("%s\n", oid_to_hex(&oid)); + } + pos += BITS_IN_EWORD; + } +} + +int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n) +{ + struct bitmap_index *bitmap_git; + struct pseudo_merge *merge; + int ret = 0; + + bitmap_git = prepare_bitmap_git(r); + if (!bitmap_git || !bitmap_git->pseudo_merges.nr) + goto cleanup; + + if (n >= bitmap_git->pseudo_merges.nr) { + ret = error(_("pseudo-merge index out of range " + "(%"PRIu32" >= %"PRIuMAX")"), + n, (uintmax_t)bitmap_git->pseudo_merges.nr); + goto cleanup; + } + + merge = use_pseudo_merge(&bitmap_git->pseudo_merges, + &bitmap_git->pseudo_merges.v[n]); + dump_ewah_object_ids(bitmap_git, merge->commits); + +cleanup: + free_bitmap_index(bitmap_git); + return ret; +} + +int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n) +{ + struct bitmap_index *bitmap_git; + struct pseudo_merge *merge; + int ret = 0; + + bitmap_git = prepare_bitmap_git(r); + if (!bitmap_git || !bitmap_git->pseudo_merges.nr) + goto cleanup; + + if (n >= bitmap_git->pseudo_merges.nr) { + ret = error(_("pseudo-merge index out of range " + "(%"PRIu32" >= %"PRIuMAX")"), + n, (uintmax_t)bitmap_git->pseudo_merges.nr); + goto cleanup; + } + + merge = use_pseudo_merge(&bitmap_git->pseudo_merges, + &bitmap_git->pseudo_merges.v[n]); + + dump_ewah_object_ids(bitmap_git, + pseudo_merge_bitmap(&bitmap_git->pseudo_merges, + merge)); + +cleanup: + free_bitmap_index(bitmap_git); + return ret; +} + int rebuild_bitmap(const uint32_t *reposition, struct ewah_bitmap *source, struct bitmap *dest) diff --git a/pack-bitmap.h b/pack-bitmap.h index a5fe4f305ef..25d3b8e604a 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -73,6 +73,9 @@ void traverse_bitmap_commit_list(struct bitmap_index *, void test_bitmap_walk(struct rev_info *revs); int test_bitmap_commits(struct repository *r); int test_bitmap_hashes(struct repository *r); +int test_bitmap_pseudo_merges(struct repository 
*r); +int test_bitmap_pseudo_merge_commits(struct repository *r, uint32_t n); +int test_bitmap_pseudo_merge_objects(struct repository *r, uint32_t n); #define GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL \ "GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL" diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c index af43ee1cb5e..6af2b42678f 100644 --- a/t/helper/test-bitmap.c +++ b/t/helper/test-bitmap.c @@ -13,21 +13,41 @@ static int bitmap_dump_hashes(void) return test_bitmap_hashes(the_repository); } +static int bitmap_dump_pseudo_merges(void) +{ + return test_bitmap_pseudo_merges(the_repository); +} + +static int bitmap_dump_pseudo_merge_commits(uint32_t n) +{ + return test_bitmap_pseudo_merge_commits(the_repository, n); +} + +static int bitmap_dump_pseudo_merge_objects(uint32_t n) +{ + return test_bitmap_pseudo_merge_objects(the_repository, n); +} + int cmd__bitmap(int argc, const char **argv) { setup_git_directory(); - if (argc != 2) - goto usage; - - if (!strcmp(argv[1], "list-commits")) + if (argc == 2 && !strcmp(argv[1], "list-commits")) return bitmap_list_commits(); - if (!strcmp(argv[1], "dump-hashes")) + if (argc == 2 && !strcmp(argv[1], "dump-hashes")) return bitmap_dump_hashes(); + if (argc == 2 && !strcmp(argv[1], "dump-pseudo-merges")) + return bitmap_dump_pseudo_merges(); + if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-commits")) + return bitmap_dump_pseudo_merge_commits(atoi(argv[2])); + if (argc == 3 && !strcmp(argv[1], "dump-pseudo-merge-objects")) + return bitmap_dump_pseudo_merge_objects(atoi(argv[2])); -usage: usage("\ttest-tool bitmap list-commits\n" - "\ttest-tool bitmap dump-hashes"); + "\ttest-tool bitmap dump-hashes\n" + "\ttest-tool bitmap dump-pseudo-merges\n" + "\ttest-tool bitmap dump-pseudo-merge-commits \n" + "\ttest-tool bitmap dump-pseudo-merge-objects "); return -1; } From patchwork Mon Apr 29 20:44:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
Taylor Blau X-Patchwork-Id: 13647719
Date: Mon, 29 Apr 2024 16:44:11 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 18/23] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()`
Message-ID: <006abdd1698999539eda7f1b08c2b5a99efaed94.1714422410.git.me@ttaylorr.com>

One of the tests we'll want to add for pseudo-merge bitmaps needs to be
able to generate a large number of commits at a specific date. Support
the `--date` option (with identical semantics to the `--date` option for
`test_commit()`) within `test_commit_bulk` as a prerequisite for that.

Signed-off-by: Taylor Blau
---
 t/test-lib-functions.sh | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 862d80c9748..16fd585e34b 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -458,6 +458,7 @@ test_commit_bulk () { indir=.
ref=HEAD n=1 + notick= message='commit %s' filename='%s.t' contents='content %s' @@ -488,6 +489,12 @@ test_commit_bulk () { filename="${1#--*=}-%s.t" contents="${1#--*=} %s" ;; + --date) + notick=yes + GIT_COMMITTER_DATE="$2" + GIT_AUTHOR_DATE="$2" + shift + ;; -*) BUG "invalid test_commit_bulk option: $1" ;; @@ -507,7 +514,10 @@ test_commit_bulk () { while test "$total" -gt 0 do - test_tick && + if test -z "$notick" + then + test_tick + fi && echo "commit $ref" printf 'author %s <%s> %s\n' \ "$GIT_AUTHOR_NAME" \ From patchwork Mon Apr 29 20:44:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13647720 Received: from mail-qt1-f170.google.com (mail-qt1-f170.google.com [209.85.160.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 619DB178CFD for ; Mon, 29 Apr 2024 20:44:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423461; cv=none; b=usvfknEnjkEsQqAhy8zgfJhaV2G96Iway67ucVfOTH5sZ60YsPRsKSoMTWIJfNglnkzRdQnzSqnBiLOsUMkfEobsr0qYdWZKj+5Hv2RXrMUlKp7W+mOPLaXd79nuNwf2M3wPAjLmRy1Gx9UZt+ELP1DdMrl11Dxeg5GITzMvDkw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714423461; c=relaxed/simple; bh=l0p0EPjQiiRbNZJfe/aQ2DcZpVdiG1+MpK6D3JWVtHk=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=kUIgoL93aQLKgmDYruR1faNboXEQivC0UJiWRhmDTqPr+wziKPqUqMfR5dGUFXG84zbTBCeoTEmqpPmxXYQKmsdgPI+Zft6Nqi6JN9i2kU2PbTZQwNadmvUHdVqjQDKxxSqOd4ouzjlcqjOnGV2sKpdIIjLdaSAVjz92FJZx6to= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=none smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) 
Date: Mon, 29 Apr 2024 16:44:14 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 19/23] pack-bitmap.c: use pseudo-merges during traversal
Message-ID: <3f85e5b90f5fc65b4d6aa610cfbec40d2cd8fdf2.1714422410.git.me@ttaylorr.com>

Now that all of the groundwork has been laid to support reading and
using pseudo-merges, make use of that work in this commit by teaching
the pack-bitmap machinery to use pseudo-merge(s) when available during
traversal.

The basic operation is as follows:

- When enumerating objects on either side of a reachability query,
  first see if any subset of the roots satisfies some pseudo-merge
  bitmap. If it does, apply that pseudo-merge bitmap.

- If any pseudo-merge bitmap(s) were applied in the previous step, OR
  them into the result[^1]. Then repeat the process over all
  pseudo-merge bitmaps (we'll refer to this as "cascading"
  pseudo-merges). Once this is done, OR in the resulting bitmap.

- If there is no fill-in traversal to be done, return the bitmap for that side of the reachability query. If there is fill-in traversal, then for each commit we encounter via show_commit(), check to see if any unsatisfied pseudo-merges containing that commit as one of its parents has been made satisfied by the presence of that commit. If so, OR in the object set from that pseudo-merge bitmap, and then cascade. If not, continue traversal. A similar implementation is present in the boundary-based bitmap traversal routines. [^1]: Importantly, we cannot OR in the entire set of roots along with the objects reachable from whatever pseudo-merge bitmaps were satisfied. This may leave some dangling bits corresponding to any unsatisfied root(s) getting OR'd into the resulting bitmap, tricking other parts of the traversal into thinking we already have a reachability closure over those commit(s) when we do not. Signed-off-by: Taylor Blau --- pack-bitmap.c | 112 ++++++++++- t/t5333-pseudo-merge-bitmaps.sh | 325 ++++++++++++++++++++++++++++++++ 2 files changed, 436 insertions(+), 1 deletion(-) create mode 100755 t/t5333-pseudo-merge-bitmaps.sh diff --git a/pack-bitmap.c b/pack-bitmap.c index c13074673af..e61058dada6 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -114,6 +114,9 @@ struct bitmap_index { unsigned int version; }; +static int pseudo_merges_satisfied_nr; +static int pseudo_merges_cascades_nr; + static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st) { struct ewah_bitmap *parent; @@ -1006,6 +1009,22 @@ static void show_commit(struct commit *commit UNUSED, { } +static unsigned apply_pseudo_merges_for_commit_1(struct bitmap_index *bitmap_git, + struct bitmap *result, + struct commit *commit, + uint32_t commit_pos) +{ + int ret; + + ret = apply_pseudo_merges_for_commit(&bitmap_git->pseudo_merges, + result, commit, commit_pos); + + if (ret) + pseudo_merges_satisfied_nr += ret; + + return ret; +} + static int add_to_include_set(struct bitmap_index 
*bitmap_git, struct include_data *data, struct commit *commit, @@ -1026,6 +1045,10 @@ static int add_to_include_set(struct bitmap_index *bitmap_git, } bitmap_set(data->base, bitmap_pos); + if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit, + bitmap_pos)) + return 0; + return 1; } @@ -1151,6 +1174,20 @@ static void show_boundary_object(struct object *object UNUSED, BUG("should not be called"); } +static unsigned cascade_pseudo_merges_1(struct bitmap_index *bitmap_git, + struct bitmap *result, + struct bitmap *roots) +{ + int ret = cascade_pseudo_merges(&bitmap_git->pseudo_merges, + result, roots); + if (ret) { + pseudo_merges_cascades_nr++; + pseudo_merges_satisfied_nr += ret; + } + + return ret; +} + static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, struct rev_info *revs, struct object_list *roots) @@ -1160,6 +1197,7 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, unsigned int i; unsigned int tmp_blobs, tmp_trees, tmp_tags; int any_missing = 0; + int existing_bitmaps = 0; cb.bitmap_git = bitmap_git; cb.base = bitmap_new(); @@ -1167,6 +1205,25 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, revs->ignore_missing_links = 1; + if (bitmap_git->pseudo_merges.nr) { + struct bitmap *roots_bitmap = bitmap_new(); + struct object_list *objects = NULL; + + for (objects = roots; objects; objects = objects->next) { + struct object *object = objects->item; + int pos; + + pos = bitmap_position(bitmap_git, &object->oid); + if (pos < 0) + continue; + + bitmap_set(roots_bitmap, pos); + } + + if (!cascade_pseudo_merges_1(bitmap_git, cb.base, roots_bitmap)) + bitmap_free(roots_bitmap); + } + /* * OR in any existing reachability bitmaps among `roots` into * `cb.base`. 
@@ -1178,8 +1235,10 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, continue; if (add_commit_to_bitmap(bitmap_git, &cb.base, - (struct commit *)object)) + (struct commit *)object)) { + existing_bitmaps = 1; continue; + } any_missing = 1; } @@ -1187,6 +1246,9 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, if (!any_missing) goto cleanup; + if (existing_bitmaps) + cascade_pseudo_merges_1(bitmap_git, cb.base, NULL); + tmp_blobs = revs->blob_objects; tmp_trees = revs->tree_objects; tmp_tags = revs->blob_objects; @@ -1242,6 +1304,13 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git, return cb.base; } +static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git) +{ + uint32_t i; + for (i = 0; i < bitmap_git->pseudo_merges.nr; i++) + bitmap_git->pseudo_merges.v[i].satisfied = 0; +} + static struct bitmap *find_objects(struct bitmap_index *bitmap_git, struct rev_info *revs, struct object_list *roots, @@ -1249,9 +1318,32 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, { struct bitmap *base = NULL; int needs_walk = 0; + unsigned existing_bitmaps = 0; struct object_list *not_mapped = NULL; + unsatisfy_all_pseudo_merges(bitmap_git); + + if (bitmap_git->pseudo_merges.nr) { + struct bitmap *roots_bitmap = bitmap_new(); + struct object_list *objects = NULL; + + for (objects = roots; objects; objects = objects->next) { + struct object *object = objects->item; + int pos; + + pos = bitmap_position(bitmap_git, &object->oid); + if (pos < 0) + continue; + + bitmap_set(roots_bitmap, pos); + } + + base = bitmap_new(); + if (!cascade_pseudo_merges_1(bitmap_git, base, roots_bitmap)) + bitmap_free(roots_bitmap); + } + /* * Go through all the roots for the walk. 
The ones that have bitmaps * on the bitmap index will be `or`ed together to form an initial @@ -1262,11 +1354,21 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, */ while (roots) { struct object *object = roots->item; + roots = roots->next; + if (base) { + int pos = bitmap_position(bitmap_git, &object->oid); + if (pos > 0 && bitmap_get(base, pos)) { + object->flags |= SEEN; + continue; + } + } + if (object->type == OBJ_COMMIT && add_commit_to_bitmap(bitmap_git, &base, (struct commit *)object)) { object->flags |= SEEN; + existing_bitmaps = 1; continue; } @@ -1282,6 +1384,9 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git, roots = not_mapped; + if (existing_bitmaps) + cascade_pseudo_merges_1(bitmap_git, base, NULL); + /* * Let's iterate through all the roots that don't have bitmaps to * check if we can determine them to be reachable from the existing @@ -1866,6 +1971,11 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs, object_list_free(&wants); object_list_free(&haves); + trace2_data_intmax("bitmap", the_repository, "pseudo_merges_satisfied", + pseudo_merges_satisfied_nr); + trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades", + pseudo_merges_cascades_nr); + return bitmap_git; cleanup: diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh new file mode 100755 index 00000000000..909c17e301e --- /dev/null +++ b/t/t5333-pseudo-merge-bitmaps.sh @@ -0,0 +1,325 @@ +#!/bin/sh + +test_description='pseudo-merge bitmaps' + +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 + +. 
./test-lib.sh + +test_pseudo_merges () { + test-tool bitmap dump-pseudo-merges +} + +test_pseudo_merge_commits () { + test-tool bitmap dump-pseudo-merge-commits "$1" +} + +test_pseudo_merges_satisfied () { + test_trace2_data bitmap pseudo_merges_satisfied "$1" +} + +test_pseudo_merges_cascades () { + test_trace2_data bitmap pseudo_merges_cascades "$1" +} + +tag_everything () { + git rev-list --all --no-object-names >in && + perl -lne ' + print "create refs/tags/" . $. . " " . $1 if /([0-9a-f]+)/ + ' expect && + + : >trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + + test_pseudo_merges_satisfied 0 merges && + test_must_be_empty merges && + test_cmp expect actual +' + +test_expect_success 'pseudo-merges accurately represent their objects' ' + test_config bitmapPseudoMerge.test.pattern "refs/tags/" && + test_config bitmapPseudoMerge.test.maxMerges 8 && + test_config bitmapPseudoMerge.test.stableThreshold never && + + git repack -adb && + + test_pseudo_merges >merges && + test_line_count = 8 merges && + + for i in $(test_seq 0 $(($(wc -l commits && + + git rev-list --objects --no-object-names --stdin expect.raw && + test-tool bitmap dump-pseudo-merge-objects $i >actual.raw && + + sort -u expect && + sort -u actual && + + test_cmp expect actual || return 1 + done +' + +test_expect_success 'bitmap traversal with pseudo-merges' ' + : >trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 8 trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 8 merges && + for i in $(test_seq 0 $(($(wc -l commits && + + test-tool bitmap list-commits >bitmaps && + bitmaps_nr="$(wc -l expect && + + test $(cat expect) -eq $(wc 
-l merges && + test_line_count = 1 merges && + + test_pseudo_merge_commits 0 >oids && + git cat-file --batch commits && + + test $(wc -l in && + git update-ref --stdin merges && + merges_nr="$(wc -l oids && + git cat-file --batch commits && + + expect="$(grep -c "^committer.*$old +0000$" commits)" && + actual="$(wc -l oids && + git cat-file --batch commits && + test $(wc -l err && + + cat >expect <<-EOF && + fatal: pseudo-merge group ${SQ}test${SQ} has unstable threshold before stable one + EOF + + test_cmp expect err +' + +test_expect_success 'pseudo-merge pattern with capture groups' ' + git init pseudo-merge-captures && + ( + cd pseudo-merge-captures && + + test_commit_bulk 128 && + tag_everything && + + for r in $(test_seq 8) + do + test_commit_bulk 16 && + + git rev-list HEAD~16.. >in && + + perl -lne "print \"create refs/remotes/$r/tags/\$. \$_\"" refs && + + test_pseudo_merges >merges && + for m in $(test_seq 0 $(($(wc -l oids && + grep -f oids refs | + perl -lne "print \$1 if /refs\/remotes\/([0-9]+)/" | + sort -u || return 1 + done >remotes && + + test $(wc -l merges && + test_line_count = 2 merges && + + test_pseudo_merge_commits 0 >commits-0.raw && + test_pseudo_merge_commits 1 >commits-1.raw && + + sort commits-0.raw >commits-0 && + sort commits-1.raw >commits-1 && + + comm -12 commits-0 commits-1 >overlap && + + test_line_count -gt 0 overlap + ) +' + +test_expect_success 'pseudo-merge overlap traversal' ' + ( + cd pseudo-merge-overlap && + + : >trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 2 trace2.txt && + GIT_TRACE2_EVENT=$PWD/trace2.txt \ + git rev-list --count --all --objects --use-bitmap-index >actual && + git rev-list --count --all --objects >expect && + + test_pseudo_merges_satisfied 2 X-Patchwork-Id: 13647721 Received: from mail-qk1-f177.google.com (mail-qk1-f177.google.com 
[209.85.222.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8161A17B500 for ; Mon, 29 Apr 2024 20:44:22 +0000 (UTC)
Date: Mon, 29 Apr 2024 16:44:18 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 20/23] pack-bitmap: extra trace2 information
Message-ID: <5fac186df641f86814063c9df31f1184056a70ad.1714422410.git.me@ttaylorr.com>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
MIME-Version: 1.0
Content-Disposition: inline

Add some extra trace2 lines to capture the number of bitmap lookups that
are hits versus misses, as well as the number of reachability roots that
have bitmap coverage (versus those that do not).

Signed-off-by: Taylor Blau
---
 pack-bitmap.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e61058dada6..1966b3b95f1 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -116,6 +116,10 @@ struct bitmap_index {
 
 static int pseudo_merges_satisfied_nr;
 static int pseudo_merges_cascades_nr;
+static int existing_bitmaps_hits_nr;
+static int existing_bitmaps_misses_nr;
+static int roots_with_bitmaps_nr;
+static int roots_without_bitmaps_nr;
 
 static struct ewah_bitmap *lookup_stored_bitmap(struct stored_bitmap *st)
 {
@@ -1040,10 +1044,14 @@ static int add_to_include_set(struct bitmap_index *bitmap_git,
 
 	partial = bitmap_for_commit(bitmap_git, commit);
 	if (partial) {
+		existing_bitmaps_hits_nr++;
+
 		bitmap_or_ewah(data->base, partial);
 		return 0;
 	}
 
+	existing_bitmaps_misses_nr++;
+
 	bitmap_set(data->base, bitmap_pos);
 	if (apply_pseudo_merges_for_commit_1(bitmap_git, data->base, commit,
 					     bitmap_pos))
@@ -1099,8 +1107,12 @@ static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
 {
 	struct ewah_bitmap *or_with = bitmap_for_commit(bitmap_git, commit);
 
-	if (!or_with)
+	if (!or_with) {
+		existing_bitmaps_misses_nr++;
 		return 0;
+	}
+
+	existing_bitmaps_hits_nr++;
 
 	if (!*base)
 		*base = ewah_to_bitmap(or_with);
@@ -1407,8 +1419,12 @@ static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
 			object->flags &= ~UNINTERESTING;
 			add_pending_object(revs, object, "");
 			needs_walk = 1;
+
+			roots_without_bitmaps_nr++;
 		} else {
 			object->flags |= SEEN;
+
+			roots_with_bitmaps_nr++;
 		}
 	}
 
@@ -1975,6 +1991,14 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 			   pseudo_merges_satisfied_nr);
 	trace2_data_intmax("bitmap", the_repository, "pseudo_merges_cascades",
 			   pseudo_merges_cascades_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/hits",
+			   existing_bitmaps_hits_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/misses",
+			   existing_bitmaps_misses_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_with_bitmap",
+			   roots_with_bitmaps_nr);
+	trace2_data_intmax("bitmap", the_repository, "bitmap/roots_without_bitmap",
+			   roots_without_bitmaps_nr);
 
 	return bitmap_git;
 

From patchwork Mon Apr 29 20:44:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13647722
Date: Mon, 29 Apr 2024 16:44:22 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 21/23] ewah: `bitmap_equals_ewah()`
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
Content-Disposition: inline

Prepare to reuse existing pseudo-merge bitmaps by implementing a
`bitmap_equals_ewah()` helper.

This helper will be used to see if a raw bitmap (containing the set of
parents for some pseudo-merge) is equal to any existing pseudo-merge's
commits bitmap (which are stored as EWAH-compressed bitmaps on disk).
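
As a rough illustration of the comparison described above, the following
standalone sketch checks a raw word array against a stream of words that
would come out of a decompressor. The `word_stream` type and the
`raw_equals_stream()` name are hypothetical stand-ins for illustration
(they are not git's EWAH API); the stream here is only an array cursor,
and the sketch assumes that words missing from either side count as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a decompressing iterator; not git's ewah_iterator. */
struct word_stream {
	const uint64_t *words;
	size_t nr;
	size_t pos;
};

/* Stores the next word in *out and returns 1, or returns 0 once exhausted. */
static int word_stream_next(uint64_t *out, struct word_stream *s)
{
	assert(out && s);
	if (s->pos >= s->nr)
		return 0;
	*out = s->words[s->pos++];
	return 1;
}

/*
 * Returns 1 if the raw bitmap and the streamed bitmap have the same bits
 * set, 0 otherwise. Words past the end of either side are treated as zero,
 * so trailing empty words do not affect the result.
 */
static int raw_equals_stream(const uint64_t *raw, size_t raw_nr,
			     struct word_stream *s)
{
	uint64_t word;
	size_t i = 0;

	/* Compare word by word, bailing out at the first mismatch. */
	while (word_stream_next(&word, s))
		if (word != (i < raw_nr ? raw[i++] : 0))
			return 0;

	/* The stream ended first: any remaining raw words must be empty. */
	for (; i < raw_nr; i++)
		if (raw[i])
			return 0;

	return 1;
}
```

Comparing one word at a time means a mismatch is detected without
decompressing the rest of the stream, which is the property the later
lookup relies on.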

Signed-off-by: Taylor Blau
---
 ewah/bitmap.c | 19 +++++++++++++++++++
 ewah/ewok.h   |  1 +
 2 files changed, 20 insertions(+)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index dc2ca190f12..55928dada86 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -261,6 +261,25 @@ int bitmap_equals(struct bitmap *self, struct bitmap *other)
 	return 1;
 }
 
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other)
+{
+	struct ewah_iterator it;
+	eword_t word;
+	size_t i = 0;
+
+	ewah_iterator_init(&it, other);
+
+	while (ewah_iterator_next(&word, &it))
+		if (word != (i < self->word_alloc ? self->words[i++] : 0))
+			return 0;
+
+	for (; i < self->word_alloc; i++)
+		if (self->words[i])
+			return 0;
+
+	return 1;
+}
+
 int bitmap_is_subset(struct bitmap *self, struct bitmap *other)
 {
 	size_t common_size, i;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 7074a6347b7..5e357e24933 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -179,6 +179,7 @@ void bitmap_unset(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_free(struct bitmap *self);
 int bitmap_equals(struct bitmap *self, struct bitmap *other);
+int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other);
 
 /*
  * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set

From patchwork Mon Apr 29 20:44:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13647723
Date: Mon, 29 Apr 2024 16:44:26 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 22/23] pseudo-merge: implement support for finding existing merges
Message-ID: <61ddb5742850868d0fd192f37048527c3b06e853.1714422410.git.me@ttaylorr.com>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
Content-Disposition: inline

This patch implements support for reusing existing pseudo-merge commits
when writing bitmaps, provided that there is an existing pseudo-merge
bitmap with exactly the same set of parents as one that we are about to
write.

Note that unstable pseudo-merges are likely to change between
consecutive repacks, and so are generally poor candidates for reuse.
However, stable pseudo-merges (see the configuration option
'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely
to change between runs (as they represent long-running branches).

Because there is no index from a *set* of pseudo-merge parents to a
matching pseudo-merge bitmap, we have to construct the bitmap
corresponding to the set of parents for each pending pseudo-merge commit
and see if a matching bitmap exists.
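
The lookup described above can be sketched in a few lines. This is a
simplified model and not the code from this series: each bitmap is a
single 64-bit word, and `toy_pseudo_merge`, `toy_parents_bitset()` and
`toy_match_parents()` are hypothetical names chosen for illustration. It
assumes each parent is identified by a bit position below 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a stored pseudo-merge; hypothetical, not git's struct. */
struct toy_pseudo_merge {
	uint64_t commits;	/* bit i set => object at position i is a parent */
	int satisfied;		/* set once this entry has been matched */
};

/* Builds the bitset of parents from a list of object positions. */
static uint64_t toy_parents_bitset(const unsigned *positions, size_t nr)
{
	uint64_t bits = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(positions[i] < 64);
		bits |= (uint64_t)1 << positions[i];
	}
	return bits;
}

/*
 * Linear scan over the stored pseudo-merges. Entries that were already
 * matched are skipped, and the first exact match is marked and returned.
 * Returns NULL if no unmatched entry has exactly the same parents.
 */
static struct toy_pseudo_merge *toy_match_parents(struct toy_pseudo_merge *v,
						  size_t nr, uint64_t parents)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (v[i].satisfied || v[i].commits != parents)
			continue;
		v[i].satisfied = 1;
		return &v[i];
	}
	return NULL;
}
```

Running this scan once per pending pseudo-merge is what makes the overall
cost grow with the product of pending and stored pseudo-merges.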

This is technically quadratic in the number of pseudo-merges, but is OK
in practice for a few reasons:

  - non-matching pseudo-merge bitmaps are rejected quickly as soon as
    they differ in a single bit

  - already-matched pseudo-merge bitmaps are discarded from subsequent
    rounds of search

  - the number of pseudo-merges is generally small, even for large
    repositories

In order to do this, (a) implement a function that finds a matching
pseudo-merge given some uncompressed bitset describing its parents, (b)
implement a function that computes the bitset of parents for a given
pseudo-merge commit, and (c) call that function before computing the set
of reachable objects for some pending pseudo-merge.

Signed-off-by: Taylor Blau
---
 pack-bitmap-write.c             | 15 ++++++--
 pack-bitmap.c                   | 32 +++++++++++++++++
 pack-bitmap.h                   |  2 ++
 pseudo-merge.c                  | 55 ++++++++++++++++++++++++++++
 pseudo-merge.h                  |  7 ++++
 t/t5333-pseudo-merge-bitmaps.sh | 64 +++++++++++++++++++++++++++++++++
 6 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index d4894ace9ee..f7245d7d6fa 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -19,6 +19,10 @@
 #include "tree-walk.h"
 #include "pseudo-merge.h"
 #include "oid-array.h"
+#include "config.h"
+#include "alloc.h"
+#include "refs.h"
+#include "strmap.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -443,6 +447,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap,
 }
 
 static int reused_bitmaps_nr;
+static int reused_pseudo_merge_bitmaps_nr;
 
 static int fill_bitmap_commit(struct bb_commit *ent,
 			      struct commit *commit,
@@ -467,7 +472,7 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			struct bitmap *remapped = bitmap_new();
 
 			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
-				old = NULL;
+				old = pseudo_merge_bitmap_for_commit(old_bitmap, c);
 			else
 				old = bitmap_for_commit(old_bitmap, c);
 			/*
@@ -478,7 +483,10 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			if (old && !rebuild_bitmap(mapping, old, remapped)) {
 				bitmap_or(ent->bitmap, remapped);
 				bitmap_free(remapped);
-				reused_bitmaps_nr++;
+				if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+					reused_pseudo_merge_bitmaps_nr++;
+				else
+					reused_bitmaps_nr++;
 				continue;
 			}
 			bitmap_free(remapped);
@@ -604,6 +612,9 @@ int bitmap_writer_build(struct packing_data *to_pack)
 			    the_repository);
 	trace2_data_intmax("pack-bitmap-write", the_repository,
 			   "building_bitmaps_reused", reused_bitmaps_nr);
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "building_bitmaps_pseudo_merge_reused",
+			   reused_pseudo_merge_bitmaps_nr);
 
 	stop_progress(&writer.progress);
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 1966b3b95f1..70230e26479 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1316,6 +1316,37 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit)
+{
+	struct commit_list *p;
+	struct bitmap *parents;
+	struct pseudo_merge *match = NULL;
+
+	if (!bitmap_git->pseudo_merges.nr)
+		return NULL;
+
+	parents = bitmap_new();
+
+	for (p = commit->parents; p; p = p->next) {
+		int pos = bitmap_position(bitmap_git, &p->item->object.oid);
+		if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
+			goto done;
+
+		bitmap_set(parents, pos);
+	}
+
+	match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
+					 parents);
+
+done:
+	bitmap_free(parents);
+	if (match)
+		return pseudo_merge_bitmap(&bitmap_git->pseudo_merges, match);
+
+	return NULL;
+}
+
 static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
 {
 	uint32_t i;
@@ -2809,6 +2840,7 @@ void free_bitmap_index(struct bitmap_index *b)
 		 */
 		close_midx_revindex(b->midx);
 	}
+	free_pseudo_merge_map(&b->pseudo_merges);
 	free(b);
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 25d3b8e604a..0fefef39bec 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -119,6 +119,8 @@ int rebuild_bitmap(const uint32_t *reposition,
 		   struct bitmap *dest);
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 				  unsigned int indexed_commits_nr);
 int bitmap_writer_build(struct packing_data *to_pack);
diff --git a/pseudo-merge.c b/pseudo-merge.c
index e111c9cd1a6..9e21fbb5062 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -682,3 +682,58 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 
 	return ret;
 }
+
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents)
+{
+	struct pseudo_merge *match = NULL;
+	size_t i;
+
+	if (!pm->nr)
+		return NULL;
+
+	/*
+	 * NOTE: this loop is quadratic in the worst-case (where no
+	 * matching pseudo-merge bitmaps are found), but in practice
+	 * this is OK for a few reasons:
+	 *
+	 *   - Rejecting pseudo-merge bitmaps that do not match the
+	 *     given commit is done quickly (i.e. `bitmap_equals_ewah()`
+	 *     returns early when we know the two bitmaps aren't equal).
+	 *
+	 *   - Already matched pseudo-merge bitmaps (which we track with
+	 *     the `->satisfied` bit here) are skipped as potential
+	 *     candidates.
+	 *
+	 *   - The number of pseudo-merges should be small (in the
+	 *     hundreds for most repositories).
+	 *
+	 * If in the future this semi-quadratic behavior does become a
+	 * problem, another approach would be to keep track of which
+	 * pseudo-merges are still "viable" after enumerating the
+	 * pseudo-merge commit's parents:
+	 *
+	 *   - A pseudo-merge bitmap becomes non-viable when the bit(s)
+	 *     corresponding to one or more parent(s) of the given
+	 *     commit are not set in a candidate pseudo-merge's commits
+	 *     bitmap.
+	 *
+	 *   - After processing all bits, enumerate the remaining set of
+	 *     viable pseudo-merge bitmaps, and check that their
+	 *     popcount() matches the number of parents in the given
+	 *     commit.
+	 */
+	for (i = 0; i < pm->nr; i++) {
+		struct pseudo_merge *candidate = use_pseudo_merge(pm, &pm->v[i]);
+		if (!candidate || candidate->satisfied)
+			continue;
+		if (!bitmap_equals_ewah(parents, candidate->commits))
+			continue;
+
+		match = candidate;
+		match->satisfied = 1;
+		break;
+	}
+
+	return match;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cc14e947e86..33acd00a3e5 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -208,4 +208,11 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 			  struct bitmap *result,
 			  struct bitmap *roots);
 
+/*
+ * Returns a pseudo-merge which contains the exact set of commits
+ * listed in the "parents" bitmap, or NULL if none could be found.
+ */
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents);
+
 #endif
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index 909c17e301e..531f1924af4 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -22,6 +22,10 @@ test_pseudo_merges_cascades () {
 	test_trace2_data bitmap pseudo_merges_cascades "$1"
 }
 
+test_pseudo_merges_reused () {
+	test_trace2_data pack-bitmap-write building_bitmaps_pseudo_merge_reused "$1"
+}
+
 tag_everything () {
 	git rev-list --all --no-object-names >in &&
 	perl -lne '
@@ -322,4 +326,64 @@ test_expect_success 'pseudo-merge overlap stale traversal' '
 	)
 '
 
+test_expect_success 'pseudo-merge reuse' '
+	git init pseudo-merge-reuse &&
+	(
+		cd pseudo-merge-reuse &&
+
+		stable="1641013200" && # 2022-01-01
+		unstable="1672549200" && # 2023-01-01
+
+		for date in $stable $unstable
+		do
+			test_commit_bulk --date "$date +0000" 128 &&
+			test_tick || return 1
+		done &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.before &&
+		test_pseudo_merge_commits 1 >unstable-oids.before &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=2 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges_reused 1 <trace2.txt &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 3 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.after &&
+		for i in 1 2
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >unstable-oids.after &&
+
+		sort -u <stable-oids.before >expect &&
+		sort -u <stable-oids.after >actual &&
+		test_cmp expect actual &&
+
+		sort -u <unstable-oids.before >expect &&
+		sort -u <unstable-oids.after >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done

From patchwork Mon Apr 29 20:44:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13647724
Date: Mon, 29 Apr 2024 16:44:30 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Jeff King, Elijah Newren, Junio C Hamano
Subject: [PATCH v2 23/23] t/perf: implement performance tests for pseudo-merge bitmaps
Message-ID: <2bd830d35dd79a7b1201655df70fc0039cc44d7e.1714422410.git.me@ttaylorr.com>
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
Content-Disposition: inline

Implement a straightforward performance test demonstrating the benefit
of pseudo-merge bitmaps by measuring how long it takes to count
reachable objects in a few different scenarios:

  - without bitmaps, to demonstrate a reasonable baseline
  - with bitmaps, but without pseudo-merges
  - with bitmaps and pseudo-merges

Results from running this test on git.git are as follows:

    Test                                                                this tree
    -----------------------------------------------------------------------------------
    5333.2: git rev-list --count --all --objects (no bitmaps)           3.46(3.37+0.09)
    5333.3: git rev-list --count --all --objects (no pseudo-merges)     0.13(0.11+0.01)
    5333.4: git rev-list --count --all --objects (with pseudo-merges)   0.12(0.11+0.01)

Signed-off-by: Taylor Blau
---
 t/perf/p5333-pseudo-merge-bitmaps.sh | 32 ++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh

diff --git a/t/perf/p5333-pseudo-merge-bitmaps.sh b/t/perf/p5333-pseudo-merge-bitmaps.sh
new file mode 100755
index 00000000000..4bec409d10e
--- /dev/null
+++ b/t/perf/p5333-pseudo-merge-bitmaps.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='pseudo-merge bitmaps'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git \
+		-c bitmapPseudoMerge.all.pattern="refs/" \
+		-c bitmapPseudoMerge.all.threshold=now \
+		-c bitmapPseudoMerge.all.stableThreshold=never \
+		-c bitmapPseudoMerge.all.maxMerges=64 \
+		-c pack.writeBitmapLookupTable=true \
+		repack -adb
+'
+
+test_perf 'git rev-list --count --all --objects (no bitmaps)' '
+	git rev-list --objects --all
+'
+
+test_perf 'git rev-list --count --all --objects (no pseudo-merges)' '
+	GIT_TEST_USE_PSEDUO_MERGES=0 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_perf 'git rev-list --count --all --objects (with pseudo-merges)' '
+	GIT_TEST_USE_PSEDUO_MERGES=1 \
+		git rev-list --objects --all --use-bitmap-index
+'
+
+test_done