From patchwork Mon Dec 12 16:31:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13071223 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D836C4332F for ; Mon, 12 Dec 2022 16:31:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232579AbiLLQbb (ORCPT ); Mon, 12 Dec 2022 11:31:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232286AbiLLQbY (ORCPT ); Mon, 12 Dec 2022 11:31:24 -0500 Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com [IPv6:2a00:1450:4864:20::333]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 18A31FCEC for ; Mon, 12 Dec 2022 08:31:21 -0800 (PST) Received: by mail-wm1-x333.google.com with SMTP id n9-20020a05600c3b8900b003d0944dba41so5669893wms.4 for ; Mon, 12 Dec 2022 08:31:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=Ed5q4EI9Rja0zU3IUSIZ9CyNU+B6k57eyHEpfMq9F2s=; b=N9TLo4b0YJAnMqBn7W6Xr+yD4fMQjYFNCu4raw++ZuwTHvvqA8/fRckg1VsCsjkvSc vQ+Y52KRXPeEiYK1FWSW6PhHocXMZNG6vBLjTBK4rNHmskZ+rSN2C82gePUYo4Zi2BAk xByf+GFGHmSgnxmPlXi4+uDqhzOlEPFtl36OEEZqwbbcWH3lQkUyDwfoCM5dEKjLN1eK ZI2T0J5PVBw7TYM5pG+LcvWaCg9YdjZszYCRJ5zd2vSZLXQqrJw8/HrcZHMfq9rLY2jL JovkWzR3WZpkd/u4KPX6zjJLvsR453iYN85wgtWEmDMkfBQ7faIEu66l9T7cEM9K2WnZ YZdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from 
:references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Ed5q4EI9Rja0zU3IUSIZ9CyNU+B6k57eyHEpfMq9F2s=; b=D70omHTwMVTLINh16tOI02xENKmNRvWGjK9iQ8WndjoGbnE2tCurVdhMAjcmEty4EI Ryd9f7DK3UOcEwHSPLp5dxjI6GUfVGI/raTAtmtKT0aapatOdXI8fNNoI6zER6Jxg+aY /8NYBzeEXrVBw8uvZx5UNjxmg6XjCN/KcB/Yf0+8TDWJgtB8vntv17p5iIRd8hsSnAuy /DFMJsEiyVKQq0KE+NzIKUTuMK3l7U4eu4L831jgjG9wrYC/Afj/v5EYhyVX8c2EDCT/ 9OCKbb87XPPuK6k7F98f5P7AQ3H+8Uw/Vik7T3hf9bzoLbZNVWcUj9tUWZgDdLbWiJRL hrQw== X-Gm-Message-State: ANoB5plzmDfCH3dvT1GGj61JyquMka7fu4T5bSFacv2xl32MiA9fss8a NvwQTOVfzdTptYO6h5xfjoEh/qgW8PY= X-Google-Smtp-Source: AA0mqf797uGpLey11OSWDFflIbPaz21DA48ut5R8Jh7CP6rz6hlS8Gk+Xw2l6RRljzE0H5ifCYTQsw== X-Received: by 2002:a05:600c:2201:b0:3cf:6be3:a7f6 with SMTP id z1-20020a05600c220100b003cf6be3a7f6mr13007975wml.13.1670862679488; Mon, 12 Dec 2022 08:31:19 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n12-20020a05600c3b8c00b003cfd10a33afsm10700061wms.11.2022.12.12.08.31.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Dec 2022 08:31:19 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 12 Dec 2022 16:31:14 +0000 Subject: [PATCH v2 1/4] hashfile: allow skipping the hash function Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The hashfile API is useful for generating files that include a trailing hash of the file's contents up to that point. Using such a hash is helpful for verifying the file for corruption-at-rest, such as a faulty drive causing flipped bits. Git's index file includes this trailing hash, so it uses a 'struct hashfile' to handle the I/O to the file. This was very convenient to allow using the hashfile methods during these operations. 
However, hashing the file contents during write comes at a performance
penalty. It's slower to hash the bytes on their way to the disk than
without that step. This problem is made worse by the replacement of
hardware-accelerated SHA1 computations with the software-based sha1dc
computation.

This write cost is significant, and the checksum capability is likely
not worth that cost for such a short-lived file. The index is rewritten
frequently and the only time the checksum is checked is during 'git
fsck'. Thus, it would be helpful to allow a user to opt out of the hash
computation.

We first need to allow Git to opt out of the hash computation in the
hashfile API. The buffered writes of the API are still helpful, so it
makes sense to make the change here.

Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the
update_fn and final_fn members of the_hash_algo are skipped. When
finalizing the hashfile, the trailing hash is replaced with the null
hash.

This use of a trailing null hash would be desirable in either case,
since we do not want to special-case a file format to have a different
length depending on whether it was hashed or not. When the final bytes
of a file are all zero, we can infer that it was written without
hashing, and thus that verification is not available as a check for file
consistency. This also means that we could easily toggle hashing for any
file format we desire.

A version of this patch has existed in the microsoft/git fork since 2017
[1] (the linked commit was rebased in 2018, but the original dates back
to January 2017). Here, the change to make the index use this fast path
is delayed until a later change.
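As a rough illustration of the behavior described above, here is a minimal, self-contained C sketch of a hashfile-like writer whose trailer is all zeros when hashing is skipped. It is not Git's code: `toy_hashfile`, `toy_init`, `toy_write`, `toy_finalize`, `TOY_HASH_LEN` and `TOY_OUT_MAX` are hypothetical names, the output goes to an in-memory buffer instead of a file descriptor, and 64-bit FNV-1a is only a stand-in for the real hash algorithm.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_LEN 8  /* stand-in for the_hash_algo->rawsz */
#define TOY_OUT_MAX 256 /* capacity of the in-memory "file" */

/* Hypothetical miniature of 'struct hashfile'. */
struct toy_hashfile {
	unsigned char out[TOY_OUT_MAX];
	size_t len;
	uint64_t ctx;           /* FNV-1a state, NOT a real Git hash */
	unsigned int skip_hash; /* same meaning as in the patch */
};

static void toy_init(struct toy_hashfile *f, unsigned int skip_hash)
{
	memset(f, 0, sizeof(*f));
	f->ctx = UINT64_C(0xcbf29ce484222325); /* FNV-1a offset basis */
	f->skip_hash = skip_hash;
}

/* Returns 0 on success, -1 if no room would be left for the trailer. */
static int toy_write(struct toy_hashfile *f, const void *buf, size_t count)
{
	const unsigned char *p = buf;
	size_t i;

	if (f->len > TOY_OUT_MAX - TOY_HASH_LEN ||
	    count > TOY_OUT_MAX - TOY_HASH_LEN - f->len)
		return -1;

	/* Only the hash update is optional; the bytes are always written. */
	if (!f->skip_hash) {
		for (i = 0; i < count; i++) {
			f->ctx ^= p[i];
			f->ctx *= UINT64_C(0x100000001b3); /* FNV prime */
		}
	}
	memcpy(f->out + f->len, p, count);
	f->len += count;
	return 0;
}

/* Appends the trailer: all zeros when skipping, the checksum otherwise. */
static int toy_finalize(struct toy_hashfile *f)
{
	size_t i;

	if (f->len > TOY_OUT_MAX - TOY_HASH_LEN)
		return -1;
	for (i = 0; i < TOY_HASH_LEN; i++) {
		unsigned char byte = 0;

		if (!f->skip_hash)
			byte = (unsigned char)(f->ctx >> (8 * (TOY_HASH_LEN - 1 - i)));
		f->out[f->len + i] = byte;
	}
	f->len += TOY_HASH_LEN;
	return 0;
}
```

Both modes produce output of the same length, so a reader does not need to know in advance which mode the writer used; it only has to look at the trailer.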
[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

Co-authored-by: Kevin Willford
Signed-off-by: Kevin Willford
Signed-off-by: Derrick Stolee
---
 csum-file.c | 14 +++++++++++---
 csum-file.h |  7 +++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 59ef3398ca2..cce13c0f047 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -45,7 +45,8 @@ void hashflush(struct hashfile *f)
 	unsigned offset = f->offset;
 
 	if (offset) {
-		the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
+		if (!f->skip_hash)
+			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
 		flush(f, f->buffer, offset);
 		f->offset = 0;
 	}
@@ -64,7 +65,12 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result,
 	int fd;
 
 	hashflush(f);
-	the_hash_algo->final_fn(f->buffer, &f->ctx);
+
+	if (f->skip_hash)
+		hashclr(f->buffer);
+	else
+		the_hash_algo->final_fn(f->buffer, &f->ctx);
+
 	if (result)
 		hashcpy(result, f->buffer);
 	if (flags & CSUM_HASH_IN_STREAM)
@@ -108,7 +114,8 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 			 * the hashfile's buffer. In this block,
 			 * f->offset is necessarily zero.
 			 */
-			the_hash_algo->update_fn(&f->ctx, buf, nr);
+			if (!f->skip_hash)
+				the_hash_algo->update_fn(&f->ctx, buf, nr);
 			flush(f, buf, nr);
 		} else {
 			/*
@@ -153,6 +160,7 @@ static struct hashfile *hashfd_internal(int fd, const char *name,
 	f->tp = tp;
 	f->name = name;
 	f->do_crc = 0;
+	f->skip_hash = 0;
 	the_hash_algo->init_fn(&f->ctx);
 
 	f->buffer_len = buffer_len;
diff --git a/csum-file.h b/csum-file.h
index 0d29f528fbc..29468067f81 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -20,6 +20,13 @@ struct hashfile {
 	size_t buffer_len;
 	unsigned char *buffer;
 	unsigned char *check_buffer;
+
+	/**
+	 * If set to 1, skip_hash indicates that we should
+	 * not actually compute the hash for this hashfile and
+	 * instead only use it as a buffered write.
+ */ + unsigned int skip_hash; }; /* Checkpoint */ From patchwork Mon Dec 12 16:31:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13071226 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35CE3C4332F for ; Mon, 12 Dec 2022 16:31:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232590AbiLLQbj (ORCPT ); Mon, 12 Dec 2022 11:31:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232591AbiLLQb1 (ORCPT ); Mon, 12 Dec 2022 11:31:27 -0500 Received: from mail-wr1-x42b.google.com (mail-wr1-x42b.google.com [IPv6:2a00:1450:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 14E9EFCF0 for ; Mon, 12 Dec 2022 08:31:22 -0800 (PST) Received: by mail-wr1-x42b.google.com with SMTP id h7so12669397wrs.6 for ; Mon, 12 Dec 2022 08:31:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:fcc:content-transfer-encoding:mime-version:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=z+RQlk3U7znlrez+HRor0VkDoWW9HIopyv+94qA9Dt4=; b=dc0ZKHgzJFz58d7MnuDkHWSNIOOfeYuy4/WFeHyASLbv3PqXu/nBtLP8IETRuDf6HW 4Koz/JAIzAPooAHEAhiuL4iOPK0Q/KLozrraIauPNbTHvd11IduGF6uZxZO4WFdpXObt Y5GlClnw73qITXmeCH0pR4+cH+ri2HD7xnK6vS0KSZbPTBMVbz6j2+P528SujS1z4XvV DII9VXCVtqRYtZLEqW6CCmxRyC/wmlzAKpAWglWu+hr/6J0d6IgSIGWkwzERZg7MbCK3 j9SC65PpoIoNzHr6PMAXD5MH9vXZy5AXIxD6IDIkn+Y4W+H+yOYmvD77kSiEww3ByDSj m7dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:fcc:content-transfer-encoding:mime-version:subject:date:from 
:references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=z+RQlk3U7znlrez+HRor0VkDoWW9HIopyv+94qA9Dt4=; b=x92dDV7t1rbHeRkbNEtspYBT5w9RUdi3AVqEpBtrXb1eMR9LS46sC+jwrK8X0zr0UA GkdrP+irUO4bO86wxt46gzM1OrD5XywPcMZuQ5ehkSCORQJbI0CJawghJDJTn1VQQa2G 7D5+5fRhx6iEz9VzbkXuDOLwZHSeoJnGBns3+/lxhUcCjlEOIFkN2/YFVpeydLXXJACA uFQYgHQB41Apjg10jGy76kx38+TPkDpDmbiO0ku9PTfvefaH9VVWsb+IeHxTz6KiRiwl EH5qbSaJfYHoA8VH+5yl/w8LTBy2ZIqgs3o0JdrPq1ne8Bh5xcPE063+SaVc/C1/OW39 ispA== X-Gm-Message-State: ANoB5plMKH9kuysoOyoj7Io6xD9A092IzQN91v6e0GCDdBiQxR2+PVdw BogqD1X2UjnofzzbCUZ0sX0LEiYuMfI= X-Google-Smtp-Source: AA0mqf6jNNjMxymW+wm/dCiPD2e9u09AtEwEbZe9TJPypNaFo36b3edy8X/+y9P8fzeg32rFu5PUsQ== X-Received: by 2002:adf:e946:0:b0:242:483f:e9b9 with SMTP id m6-20020adfe946000000b00242483fe9b9mr9156654wrn.24.1670862680366; Mon, 12 Dec 2022 08:31:20 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id x1-20020a5d60c1000000b00241c712916fsm10958925wrt.0.2022.12.12.08.31.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Dec 2022 08:31:19 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 12 Dec 2022 16:31:15 +0000 Subject: [PATCH v2 2/4] read-cache: add index.skipHash config option MIME-Version: 1.0 Fcc: Sent To: git@vger.kernel.org Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The previous change allowed skipping the hashing portion of the hashwrite API, using it instead as a buffered write API. Disabling the hashwrite can be particularly helpful when the write operation is in a critical path. One such critical path is the writing of the index. This operation is so critical that the sparse index was created specifically to reduce the size of the index to make these writes (and reads) faster. 
Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

While older Git versions will not recognize the null hash as a special
case, the file format itself is still being met in terms of its
structure. Using this null hash will still allow Git operations to
function across older versions.

The one exception is 'git fsck' which checks the hash of the index file.
This used to be a check on every index read, but was split out to just
the index in a33fc72fe91 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.

Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.

As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.

Benchmark 1: with hash
  Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
  Range (min … max):    34.3 ms …  79.1 ms    82 runs

Benchmark 2: without hash
  Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
  Range (min … max):    16.3 ms …  42.0 ms    69 runs

Summary
  'without hash' ran
    1.78 ± 0.76 times faster than 'with hash'

These performance benefits are substantial enough to allow users the
ability to opt-in to this feature, even with the potential confusion
with older 'git fsck' versions.
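The reader-side rule described above (an all-zero trailer means "written without hashing, nothing to verify") can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the verify_hdr() code: `toy_hash`, `toy_verify`, `toy_verdict` and `TOY_HASH_LEN` are hypothetical names, and 64-bit FNV-1a stands in for the real hash algorithm. Unlike Git, which returns success in both non-error cases, the sketch reports "skipped" separately so the distinction is visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_LEN 8 /* stand-in for the_hash_algo->rawsz */

/* FNV-1a over 'len' bytes, stored big-endian; NOT a real Git hash. */
static void toy_hash(const unsigned char *data, size_t len,
		     unsigned char out[TOY_HASH_LEN])
{
	uint64_t h = UINT64_C(0xcbf29ce484222325);
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= UINT64_C(0x100000001b3);
	}
	for (i = 0; i < TOY_HASH_LEN; i++)
		out[i] = (unsigned char)(h >> (8 * (TOY_HASH_LEN - 1 - i)));
}

enum toy_verdict { TOY_OK, TOY_SKIPPED, TOY_BAD };

/* 'file' holds the payload followed by a TOY_HASH_LEN-byte trailer. */
static enum toy_verdict toy_verify(const unsigned char *file, size_t size)
{
	static const unsigned char null_hash[TOY_HASH_LEN];
	unsigned char expect[TOY_HASH_LEN];
	const unsigned char *trailer;

	if (size < TOY_HASH_LEN)
		return TOY_BAD;
	trailer = file + size - TOY_HASH_LEN;

	/* A null trailer means the writer skipped hashing. */
	if (!memcmp(trailer, null_hash, TOY_HASH_LEN))
		return TOY_SKIPPED;

	toy_hash(file, size - TOY_HASH_LEN, expect);
	return memcmp(trailer, expect, TOY_HASH_LEN) ? TOY_BAD : TOY_OK;
}
```

The null check has to come before the checksum comparison; otherwise a file written with hashing skipped would be reported as corrupt, which is what older 'git fsck' versions do.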
It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.

Signed-off-by: Derrick Stolee
---
 Documentation/config/index.txt |  9 +++++++++
 read-cache.c                   | 12 +++++++++++-
 t/t1600-index.sh               |  6 ++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 75f3a2d1054..5d62489c302 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -30,3 +30,12 @@ index.version::
 	Specify the version with which new index files should be
 	initialized.  This does not affect existing repositories.
 	If `feature.manyFiles` is enabled, then the default is 4.
+
+index.skipHash::
+	When enabled, do not compute the trailing hash for the index file.
+	Instead, write a trailing set of bytes with value zero, indicating
+	that the computation was skipped.
++
+If you enable `index.skipHash`, then Git clients older than 2.13.0 will
+refuse to parse the index and Git clients older than 2.40.0 will report an
+error during `git fsck`.
diff --git a/read-cache.c b/read-cache.c
index 46f5e497b14..3f7de8b2e20 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1817,6 +1817,8 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	git_hash_ctx c;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	int hdr_version;
+	unsigned char *start, *end;
+	struct object_id oid;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error(_("bad signature 0x%08x"), hdr->hdr_signature);
@@ -1827,10 +1829,16 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	if (!verify_index_checksum)
 		return 0;
 
+	end = (unsigned char *)hdr + size;
+	start = end - the_hash_algo->rawsz;
+	oidread(&oid, start);
+	if (oideq(&oid, null_oid()))
+		return 0;
+
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz);
 	the_hash_algo->final_fn(hash, &c);
-	if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz))
+	if (!hasheq(hash, end - the_hash_algo->rawsz))
 		return error(_("bad index file sha1 signature"));
 	return 0;
 }
@@ -2918,6 +2926,8 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
+	git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash);
+
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 010989f90e6..45feb0fc5d8 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -65,6 +65,12 @@ test_expect_success 'out of bounds index.version issues warning' '
 	)
 '
 
+test_expect_success 'index.skipHash config option' '
+	rm -f .git/index &&
+	git -c index.skipHash=true add a &&
+	git fsck
+'
+
 test_index_version () {
 	INDEX_VERSION_CONFIG=$1 &&
 	FEATURE_MANY_FILES=$2 &&

From patchwork Mon Dec 12 16:31:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13071227
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 454B5C4332F for ; Mon, 12 Dec 2022 16:31:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231971AbiLLQbm (ORCPT ); Mon, 12 Dec 2022 11:31:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232643AbiLLQb1 (ORCPT ); Mon, 12 Dec 2022 11:31:27 -0500 Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com [IPv6:2a00:1450:4864:20::331]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9387AFCF2 for ; Mon, 12 Dec 2022 08:31:22 -0800 (PST) Received: by mail-wm1-x331.google.com with SMTP id m19so5927190wms.5 for ; Mon, 12 Dec 2022 08:31:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=FotZg4yy9XlqAjPVJNxC5xwRpcqkYAtK4Jr/uCnQRcg=; b=Lhxg7majVMA+prwXg8viWatfVjmEmB9VJxGwBoG08o5OldHg4SBiLr35Dm54kCMrV7 mykArtbtpLxL20h76SPL52+MRAkqNvJA8/7oTyPOZNZ53ZUAb9gz8DV9BWLGwOjSbL5r F9jaru7saQi+5EUEgoxyNR5NK7lTjFG+UVrKVWgzpPSQE5HGSVXPBW06LqML/eojYWj6 KnXaLdkFIGRhv0AOtG43Hak1+VdxPX9a5rX6N6vRqdqVNsfyYsAKAtCmb8+0eo2RWcGg M40L56RJsx/BWXlPPk90c2PRV0lb3nWEUSp31c3f//kIS9l+0brc/wWRlm18XJYeEhTT X15g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FotZg4yy9XlqAjPVJNxC5xwRpcqkYAtK4Jr/uCnQRcg=; b=leoEo8hruVp011nu1/5rkdiZK7Sx8eqqFM5ErCqHc0ZC+pbCPEx/AytI2oPWitvGlc 
g2fxDq5KtGDwr8HLi/2yC2lFmVmQJwkXzmBcwk0+NQQejcW/gtfpuzg3clXW5RQGWCYq wW4A9YavvPPitivTTcFYn0yZcsbN12V60Cm8J84nzCVXR9zCaqmz2Cm7537m2u05lxTX lrLxnuO2ioULPe7XNTbnZPg5tGKTjL7xEUfUnHmUT8ArOO06N4NJFCFsdp7tP8sdmiyR eSAxqJD9ZYNZGwr22xQipZlGJAJFguUeF4WFi7ZgnEiI8y7YOqclDcCr3WGCDhuflUNm 8U7g== X-Gm-Message-State: ANoB5pkdvsBzDPRKlEM18ca9gkOkLvc0VUsKsdN1wjbpzVrcc+i1+P5t WI8fnRY+aDWcnDOqOuf7KWHd6vq8ETg= X-Google-Smtp-Source: AA0mqf5M/Hp99+CVKcibGdnsHb6qEQRkCwxYlvg3pPnpPHOpdc2Za2HuqW/sr340VjMv/NqiyV5kbg== X-Received: by 2002:a7b:c417:0:b0:3cc:cc18:b490 with SMTP id k23-20020a7bc417000000b003cccc18b490mr12824717wmi.28.1670862680957; Mon, 12 Dec 2022 08:31:20 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i10-20020a05600c354a00b003d069fc7372sm10804674wmq.1.2022.12.12.08.31.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Dec 2022 08:31:20 -0800 (PST) Message-Id: <813e81a058227bd373cec802e443fcd677042fb4.1670862677.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 12 Dec 2022 16:31:16 +0000 Subject: [PATCH v2 3/4] test-lib-functions: add helper for trailing hash Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee It can be helpful to check that a file format with a trailing hash has a specific hash in the final bytes of a written file. This is made more apparent by recent changes that allow skipping the hash algorithm and writing a null hash at the end of the file instead. Add a new test_trailing_hash helper and use it in t1600 to verify that index.skipHash=true really does skip the hash computation, since 'git fsck' does not actually verify the hash. Keep the 'git fsck' call to ensure that any potential future change to check the index hash does not cause an error in this case. 
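For readers more at home in C than in shell pipelines, the extraction that the helper performs (take the last rawsz bytes of a file and print them as hex) can be sketched like this. It is a hypothetical analogue for illustration only, not part of the patch: `trailing_hash_hex` is an invented name, it works on an in-memory buffer rather than a file, and unlike `tail -c` it reports an error when the input is shorter than the requested trailer.

```c
#include <stddef.h>

/*
 * Hypothetical C analogue of the test_trailing_hash shell helper.
 * Writes 2 * rawsz lowercase hex digits plus a NUL into 'hex', which
 * must have room for 2 * rawsz + 1 bytes.
 * Returns 0 on success, -1 if the input is shorter than rawsz.
 */
static int trailing_hash_hex(const unsigned char *data, size_t size,
			     size_t rawsz, char *hex)
{
	static const char digits[] = "0123456789abcdef";
	const unsigned char *trailer;
	size_t i;

	if (size < rawsz)
		return -1;
	trailer = data + size - rawsz;
	for (i = 0; i < rawsz; i++) {
		hex[2 * i] = digits[trailer[i] >> 4];
		hex[2 * i + 1] = digits[trailer[i] & 0x0f];
	}
	hex[2 * rawsz] = '\0';
	return 0;
}
```

With a null trailer the result is a string of zeros, which is what the test compares against the output of `test_oid zero`.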
Signed-off-by: Derrick Stolee
---
 t/t1600-index.sh        | 3 +++
 t/test-lib-functions.sh | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 45feb0fc5d8..55914bc3506 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -68,6 +68,9 @@ test_expect_success 'out of bounds index.version issues warning' '
 test_expect_success 'index.skipHash config option' '
 	rm -f .git/index &&
 	git -c index.skipHash=true add a &&
+	test_trailing_hash .git/index >hash &&
+	echo $(test_oid zero) >expect &&
+	test_cmp expect hash &&
 	git fsck
 '
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 796093a7b32..60308843f8f 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1875,3 +1875,11 @@ test_cmp_config_output () {
 	sort config-actual >sorted-actual &&
 	test_cmp sorted-expect sorted-actual
 }
+
+# Given a filename, extract its trailing hash as a hex string
+test_trailing_hash () {
+	local file="$1" &&
+	tail -c $(test_oid rawsz) "$file" |
+		test-tool hexdump |
+		sed "s/ //g"
+}

From patchwork Mon Dec 12 16:31:17 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13071224
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DCF73C00145
	for ; Mon, 12 Dec 2022 16:31:37 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232552AbiLLQbd (ORCPT ); Mon, 12 Dec 2022 11:31:33 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36784 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S231462AbiLLQbZ (ORCPT ); Mon, 12 Dec 2022 11:31:25 -0500
Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com
	[IPv6:2a00:1450:4864:20::333]) by lindbergh.monkeyblade.net
(Postfix) with ESMTPS id 56147FCF3 for ; Mon, 12 Dec 2022 08:31:23 -0800 (PST) Received: by mail-wm1-x333.google.com with SMTP id c65-20020a1c3544000000b003cfffd00fc0so5659459wma.1 for ; Mon, 12 Dec 2022 08:31:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=fESBRnWCynHwK44228IT91mKpA82MaCngsfDT7ZqGik=; b=N2YniAR2nO6LsksvkfACYIzA3p/VOybuOTIwzhWDmlxSnrKYYqYhXmNU5msVXVL6uf cEge4vNK5c7Zsl/mQfaEltC7XW+qlTbm5YOKQw+61fm5lVGPE2M4Na3kIRe6S82S2Jva NneEDgQ/lTpQ5xV1DTMzrx723A67iQT0sV7qb+tZxfelXwpDbMthts1LYgintMeukMqc tC+u1Kw5PluqI+3auFsdc8PIHZKQAVb2HaLapDgt1cQ4cFdPDHWHJC0iQPvc7ojGURdg 6X3B1Ne1jlcNWY3I/mkISs+rO92gHTZ2lIEY72jpLjE/zP7okcW2umf0/Dya9CrBwYVf ZeoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fESBRnWCynHwK44228IT91mKpA82MaCngsfDT7ZqGik=; b=UVkJseKFZj+ZW2FyG5y+Z228M9QTTCwIXVWj47UZ/XNTURjednXKdUIS+2g0hqk2Qf 0GtvY13y5U5Km7biX0qYpTv8SS4+AgK9Hqwadu7wTFFjQ3ZEVTzHCQ/2uEwgo8kbuReV PwAT0gZxkOsM2zq1KB7zbBU1hQB6jv1wRCmuIEbgYbybHQ3qZsF4NhfZRz+MsXmH2yT7 x8l9BceqKLPEiNC95julM/ElVN+FH3l/u9F6g0nBD3LBt/vK7MM1cBuk+g4y2Eorr/BX sfNw+iFJ4sGSgS+GU0igZfenVJ8lhiyc4XewVpFAJE0tHBtryTL46K2jjSIbwZkfZ/oE +z8w== X-Gm-Message-State: ANoB5pmusjLNabjKfpqscblI3dgeafcW3c5GTtyEGcHaytIxK+vNgLyH bSgYrXFMZujnle+mRRpKDpufxvK/obY= X-Google-Smtp-Source: AA0mqf5J4e/hoKyzAXzTIQ82GkRd860DjAknzDEQdOXDncwwdpm+f78TDUFFQzPzHT6uOXkunU3GVA== X-Received: by 2002:a05:600c:a56:b0:3d1:d396:1adc with SMTP id c22-20020a05600c0a5600b003d1d3961adcmr16235665wmq.14.1670862681736; Mon, 12 Dec 2022 08:31:21 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id 
	bu4-20020a056000078400b00236576c8eddsm9350031wrb.12.2022.12.12.08.31.21
	(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
	Mon, 12 Dec 2022 08:31:21 -0800 (PST)
Message-Id:
In-Reply-To:
References:
Date: Mon, 12 Dec 2022 16:31:17 +0000
Subject: [PATCH v2 4/4] features: feature.manyFiles implies fast index writes
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com,
	Derrick Stolee , Derrick Stolee
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt in to such behavior:

 1. The feature.manyFiles config option enables some options that are
    helpful for repositories with many files at HEAD.

 2. 'scalar register' and 'scalar reconfigure' set config options that
    optimize for large repositories.

In both of these cases, set index.skipHash=true to gain this speedup.
Add tests that demonstrate the proper way that index.skipHash=false can
override feature.manyFiles=true.

Signed-off-by: Derrick Stolee
---
 Documentation/config/feature.txt |  5 +++++
 read-cache.c                     |  5 ++++-
 repo-settings.c                  |  2 ++
 repository.h                     |  1 +
 scalar.c                         |  1 +
 t/t1600-index.sh                 | 13 ++++++++++++-
 6 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 95975e50912..e52bc6b8584 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -23,6 +23,11 @@ feature.manyFiles::
 	working directory.
With many files, commands such as `git status` and
 	`git checkout` may be slow and these new defaults improve performance:
 +
+* `index.skipHash=true` speeds up index writes by not computing a trailing
+  checksum. Note that this will cause Git versions earlier than 2.13.0 to
+  refuse to parse the index and Git versions earlier than 2.40.0 to report
+  a corrupted index during `git fsck`.
++
 * `index.version=4` enables path-prefix compression in the index.
 +
 * `core.untrackedCache=true` enables the untracked cache. This setting assumes
diff --git a/read-cache.c b/read-cache.c
index 3f7de8b2e20..1844953fba7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2926,7 +2926,10 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
-	git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash);
+	if (istate->repo) {
+		prepare_repo_settings(istate->repo);
+		f->skip_hash = istate->repo->settings.index_skip_hash;
+	}
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
diff --git a/repo-settings.c b/repo-settings.c
index 3021921c53d..3dbd3f0e2ec 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -47,6 +47,7 @@ void prepare_repo_settings(struct repository *r)
 	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
+		r->settings.index_skip_hash = 1;
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
@@ -61,6 +62,7 @@ void prepare_repo_settings(struct repository *r)
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
+	repo_cfg_bool(r, "index.skiphash", &r->settings.index_skip_hash, r->settings.index_skip_hash);
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
diff --git a/repository.h b/repository.h
index 6c461c5b9de..e8c67ffe165 100644
--- a/repository.h
+++ b/repository.h
@@ -42,6
+42,7 @@ struct repo_settings {
 	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
 
 	int index_version;
+	int index_skip_hash;
 	enum untracked_cache_setting core_untracked_cache;
 
 	int pack_use_sparse;
diff --git a/scalar.c b/scalar.c
index 6c52243cdf1..b49bb8c24ec 100644
--- a/scalar.c
+++ b/scalar.c
@@ -143,6 +143,7 @@ static int set_recommended_config(int reconfigure)
 		{ "credential.validate", "false", 1 }, /* GCM4W-only */
 		{ "gc.auto", "0", 1 },
 		{ "gui.GCWarning", "false", 1 },
+		{ "index.skipHash", "true", 1 },
 		{ "index.threads", "true", 1 },
 		{ "index.version", "4", 1 },
 		{ "merge.stat", "false", 1 },
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 55914bc3506..103743a1c7d 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -71,7 +71,18 @@ test_expect_success 'index.skipHash config option' '
 	test_trailing_hash .git/index >hash &&
 	echo $(test_oid zero) >expect &&
 	test_cmp expect hash &&
-	git fsck
+	git fsck &&
+
+	rm -f .git/index &&
+	git -c feature.manyFiles=true add a &&
+	test_trailing_hash .git/index >hash &&
+	test_cmp expect hash &&
+
+	rm -f .git/index &&
+	git -c feature.manyFiles=true \
+		-c index.skipHash=false add a &&
+	test_trailing_hash .git/index >hash &&
+	! cmp expect hash
 '
 
 test_index_version () {