From patchwork Wed Dec 7 17:25:55 2022
From: Derrick Stolee
Date: Wed, 07 Dec 2022 17:25:55 +0000
Subject: [PATCH 1/4] hashfile: allow skipping the hash function
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Derrick Stolee

The hashfile API is useful for generating files that include a trailing
hash of the file's contents up to that point. Such a hash helps verify
the file against corruption-at-rest, such as a faulty drive causing
flipped bits.

Git's index file includes this trailing hash, so it uses a 'struct
hashfile' to handle the I/O to the file. This made it very convenient to
reuse the hashfile methods for these operations.

However, hashing the file contents during write comes at a performance
penalty: hashing the bytes on their way to disk is slower than writing
them directly. This problem is made worse by the replacement of
hardware-accelerated SHA1 computations with the software-based sha1dc
computation.

This write cost is significant, and the checksum capability is likely
not worth that cost for such a short-lived file. The index is rewritten
frequently and the only time the checksum is checked is during 'git
fsck'. Thus, it would be helpful to allow a user to opt out of the hash
computation.

We first need to allow Git to opt out of the hash computation in the
hashfile API. The buffered writes of the API are still helpful, so it
makes sense to make the change here.

Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the
update_fn and final_fn members of the_hash_algo are skipped. When
finalizing the hashfile, the trailing hash is replaced with the null
hash.

Writing a trailing null hash, rather than omitting the trailer, is
desirable in either case, since we do not want to special-case a file
format to have a different length depending on whether it was hashed or
not. When the final bytes of a file are all zero, we can infer that it
was written without hashing, and thus that verification is not
available as a check for file consistency. This also means that we
could easily toggle hashing for any file format we desire.

A version of this patch has existed in the microsoft/git fork since 2017
[1] (the linked commit was rebased in 2018, but the original dates back
to January 2017). Here, the change to make the index use this fast path
is delayed until a later change.
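To make the mechanism concrete, here is a minimal, self-contained sketch of a buffered writer with a skip_hash switch. It is not Git's csum-file.c: a toy 8-byte FNV-1a hash stands in for the_hash_algo, an in-memory array stands in for the file descriptor, and all toy_* names are hypothetical. Only the shape follows this change: the hash update is skipped on every flush, and finalizing appends either the real hash or an all-zero trailer of the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-ins: a toy 8-byte FNV-1a hash replaces the_hash_algo,
 * and an in-memory array replaces the file descriptor.
 */
#define TOY_RAWSZ 8    /* trailer length; SHA-1 would use 20 */
#define TOY_BUFSZ 16   /* small write buffer, so writes flush several times */
#define TOY_DISKSZ 256

struct toy_hashfile {
	uint64_t ctx;                    /* running hash state */
	unsigned char buffer[TOY_BUFSZ]; /* bytes not yet flushed */
	size_t offset;                   /* bytes used in buffer */
	int skip_hash;                   /* 1: act as a plain buffered writer */
	unsigned char disk[TOY_DISKSZ];  /* simulated file contents */
	size_t disk_len;
};

static void toy_init(struct toy_hashfile *f, int skip_hash)
{
	memset(f, 0, sizeof(*f));
	f->ctx = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
	f->skip_hash = skip_hash;
}

static void toy_update(uint64_t *ctx, const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		*ctx ^= p[i];
		*ctx *= 1099511628211ULL; /* FNV-1a 64-bit prime */
	}
}

/* Like hashflush(): hash the buffered bytes unless skip_hash, then write. */
static void toy_flush(struct toy_hashfile *f)
{
	if (!f->offset)
		return;
	if (!f->skip_hash)
		toy_update(&f->ctx, f->buffer, f->offset);
	assert(f->disk_len + f->offset <= TOY_DISKSZ);
	memcpy(f->disk + f->disk_len, f->buffer, f->offset);
	f->disk_len += f->offset;
	f->offset = 0;
}

/* Like hashwrite(): copy into the buffer, flushing whenever it fills. */
static void toy_write(struct toy_hashfile *f, const void *buf, size_t count)
{
	const unsigned char *p = buf;

	while (count) {
		size_t left = TOY_BUFSZ - f->offset;
		size_t nr = count < left ? count : left;

		memcpy(f->buffer + f->offset, p, nr);
		f->offset += nr;
		p += nr;
		count -= nr;
		if (f->offset == TOY_BUFSZ)
			toy_flush(f);
	}
}

/*
 * Like finalize_hashfile(): append the hash, or a null (all-zero) trailer
 * of the same length when skip_hash is set, so the file size is unchanged.
 */
static void toy_finalize(struct toy_hashfile *f)
{
	unsigned char trailer[TOY_RAWSZ];

	toy_flush(f);
	if (f->skip_hash) {
		memset(trailer, 0, sizeof(trailer));
	} else {
		for (int i = 0; i < TOY_RAWSZ; i++)
			trailer[i] = (unsigned char)(f->ctx >> (8 * i));
	}
	assert(f->disk_len + TOY_RAWSZ <= TOY_DISKSZ);
	memcpy(f->disk + f->disk_len, trailer, TOY_RAWSZ);
	f->disk_len += TOY_RAWSZ;
}
```

Writing the same bytes once with skip_hash = 0 and once with skip_hash = 1 yields files of identical length whose contents match up to the trailer; only the trailer differs, which is what lets a reader treat an all-zero trailer as "not hashed".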

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

Co-authored-by: Kevin Willford
Signed-off-by: Kevin Willford
Signed-off-by: Derrick Stolee
---
 csum-file.c | 14 +++++++++++---
 csum-file.h |  7 +++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 59ef3398ca2..3243473c3d7 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -45,7 +45,8 @@ void hashflush(struct hashfile *f)
 	unsigned offset = f->offset;
 
 	if (offset) {
-		the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
+		if (!f->skip_hash)
+			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
 		flush(f, f->buffer, offset);
 		f->offset = 0;
 	}
@@ -64,7 +65,12 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result,
 	int fd;
 
 	hashflush(f);
-	the_hash_algo->final_fn(f->buffer, &f->ctx);
+
+	if (f->skip_hash)
+		memset(f->buffer, 0, the_hash_algo->rawsz);
+	else
+		the_hash_algo->final_fn(f->buffer, &f->ctx);
+
 	if (result)
 		hashcpy(result, f->buffer);
 	if (flags & CSUM_HASH_IN_STREAM)
@@ -108,7 +114,8 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 			 * the hashfile's buffer. In this block,
 			 * f->offset is necessarily zero.
 			 */
-			the_hash_algo->update_fn(&f->ctx, buf, nr);
+			if (!f->skip_hash)
+				the_hash_algo->update_fn(&f->ctx, buf, nr);
 			flush(f, buf, nr);
 		} else {
 			/*
@@ -153,6 +160,7 @@ static struct hashfile *hashfd_internal(int fd, const char *name,
 	f->tp = tp;
 	f->name = name;
 	f->do_crc = 0;
+	f->skip_hash = 0;
 	the_hash_algo->init_fn(&f->ctx);
 
 	f->buffer_len = buffer_len;
diff --git a/csum-file.h b/csum-file.h
index 0d29f528fbc..29468067f81 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -20,6 +20,13 @@ struct hashfile {
 	size_t buffer_len;
 	unsigned char *buffer;
 	unsigned char *check_buffer;
+
+	/**
+	 * If set to 1, skip_hash indicates that we should
+	 * not actually compute the hash for this hashfile and
+	 * instead only use it as a buffered write.
+	 */
+	unsigned int skip_hash;
 };
 
 /* Checkpoint */

From: Derrick Stolee
Date: Wed, 07 Dec 2022 17:25:56 +0000
Subject: [PATCH 2/4] read-cache: add index.skipHash config option
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Derrick Stolee

The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.

One such critical path is the writing of the index.
This operation is so critical that the sparse index was created
specifically to reduce the size of the index to make these writes (and
reads) faster.

Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

While older Git versions will not recognize the null hash as a special
case, the file still matches the structure of the index format. Using
this null hash will still allow Git operations to function across older
versions.

The one exception is 'git fsck', which checks the hash of the index
file. This used to be a check on every index read, but was restricted
to 'git fsck' in a33fc72fe91 (read-cache: force_verify_index_checksum,
2017-04-14). Here, we disable this check if the trailing hash is all
zeroes. We add a warning to the config option's documentation that
older Git versions may report the index as corrupt during 'git fsck'.

As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.

Benchmark 1: with hash
  Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
  Range (min … max):    34.3 ms …  79.1 ms    82 runs

Benchmark 2: without hash
  Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
  Range (min … max):    16.3 ms …  42.0 ms    69 runs

Summary
  'without hash' ran
    1.78 ± 0.76 times faster than 'with hash'

These performance benefits are substantial enough to justify letting
users opt in to this feature, despite the potential confusion with
older 'git fsck' versions.

It is critical that the new test in t1600-index.sh is placed before the
test_index_version tests, since those tests obliterate the .git/config
file and hence lose the setting from GIT_TEST_DEFAULT_HASH, if set.
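The read side of this rule can be sketched as follows. It is a hedged approximation of the verify_hdr() change, not the real code: toy_hash() is a hypothetical FNV-1a stand-in for the_hash_algo, and check_trailer() returns a hypothetical status enum instead of calling error(). The order of the checks is the point: the null trailer is tested first, so an index written with index.skipHash is accepted without hashing, while any other trailer must match the contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RAWSZ 8 /* trailer length; SHA-1 would use 20 */

enum trailer_status {
	TRAILER_TOO_SHORT, /* not even room for a trailer */
	TRAILER_SKIPPED,   /* null trailer: written without hashing */
	TRAILER_OK,        /* trailer matches the contents */
	TRAILER_BAD        /* non-null trailer that does not match */
};

/* Hypothetical stand-in for the_hash_algo: 64-bit FNV-1a, little-endian. */
static void toy_hash(const unsigned char *p, size_t n,
		     unsigned char out[TOY_RAWSZ])
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	for (int i = 0; i < TOY_RAWSZ; i++)
		out[i] = (unsigned char)(h >> (8 * i));
}

static int is_null_trailer(const unsigned char *trailer)
{
	static const unsigned char zeros[TOY_RAWSZ];

	return !memcmp(trailer, zeros, TOY_RAWSZ);
}

/*
 * Loosely mirrors the patched verify_hdr(): inspect the last TOY_RAWSZ
 * bytes first and skip verification entirely when they are all zero.
 */
static enum trailer_status check_trailer(const unsigned char *data, size_t size)
{
	unsigned char expect[TOY_RAWSZ];
	const unsigned char *start;

	assert(data != NULL);
	if (size < TOY_RAWSZ)
		return TRAILER_TOO_SHORT;
	start = data + size - TOY_RAWSZ;
	if (is_null_trailer(start))
		return TRAILER_SKIPPED;
	toy_hash(data, size - TOY_RAWSZ, expect);
	return memcmp(expect, start, TOY_RAWSZ) ? TRAILER_BAD : TRAILER_OK;
}
```

A caller would treat TRAILER_SKIPPED like success, which is the behavior that lets 'git fsck' accept an index written with index.skipHash=true.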

Signed-off-by: Derrick Stolee
---
 Documentation/config/index.txt |  8 ++++++++
 read-cache.c                   | 14 +++++++++++++-
 t/t1600-index.sh               |  8 ++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 75f3a2d1054..3ea0962631d 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -30,3 +30,11 @@ index.version::
 	Specify the version with which new index files should be
 	initialized. This does not affect existing repositories.
 	If `feature.manyFiles` is enabled, then the default is 4.
+
+index.skipHash::
+	When enabled, do not compute the trailing hash for the index file.
+	Instead, write a trailing set of bytes with value zero, indicating
+	that the computation was skipped.
++
+If you enable `index.skipHash`, then older Git clients may report that
+your index is corrupt during `git fsck`.
diff --git a/read-cache.c b/read-cache.c
index 46f5e497b14..fb4d6fb6387 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1817,6 +1817,8 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	git_hash_ctx c;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	int hdr_version;
+	unsigned char *start, *end;
+	struct object_id oid;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error(_("bad signature 0x%08x"), hdr->hdr_signature);
@@ -1827,10 +1829,16 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	if (!verify_index_checksum)
 		return 0;
 
+	end = (unsigned char *)hdr + size;
+	start = end - the_hash_algo->rawsz;
+	oidread(&oid, start);
+	if (oideq(&oid, null_oid()))
+		return 0;
+
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz);
 	the_hash_algo->final_fn(hash, &c);
-	if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz))
+	if (!hasheq(hash, end - the_hash_algo->rawsz))
 		return error(_("bad index file sha1 signature"));
 	return 0;
 }
@@ -2915,9 +2923,13 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
+	int skip_hash;
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
+	if (!git_config_get_maybe_bool("index.skiphash", &skip_hash))
+		f->skip_hash = skip_hash;
+
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 010989f90e6..df07c587e0e 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -65,6 +65,14 @@ test_expect_success 'out of bounds index.version issues warning' '
 	)
 '
 
+test_expect_success 'index.skipHash config option' '
+	(
+		rm -f .git/index &&
+		git -c index.skipHash=true add a &&
+		git fsck
+	)
+'
+
 test_index_version () {
 	INDEX_VERSION_CONFIG=$1 &&
 	FEATURE_MANY_FILES=$2 &&

From: Derrick Stolee
Date: Wed, 07 Dec 2022 17:25:57 +0000
Subject: [PATCH 3/4] test-lib-functions: add helper for trailing hash
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Derrick Stolee

It can be helpful to check that a file format with a trailing hash has a
specific hash in the final bytes of a written file. The need for this
is more apparent after the recent changes that allow skipping the hash
algorithm and writing a null hash at the end of the file instead.

Add a new test_trailing_hash helper and use it in t1600 to verify that
index.skipHash=true really does skip the hash computation, since 'git
fsck' does not verify a null trailing hash.

Keep the 'git fsck' call to ensure that any potential future change to
check the index hash does not cause an error in this case.

Signed-off-by: Derrick Stolee
---
 t/t1600-index.sh        | 3 +++
 t/test-lib-functions.sh | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index df07c587e0e..55816756607 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -69,6 +69,9 @@ test_expect_success 'index.skipHash config option' '
 	(
 		rm -f .git/index &&
 		git -c index.skipHash=true add a &&
+		test_trailing_hash .git/index >hash &&
+		echo $(test_oid zero) >expect &&
+		test_cmp expect hash &&
 		git fsck
 	)
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 796093a7b32..e88acfdb68a 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1875,3 +1875,11 @@ test_cmp_config_output () {
 	sort config-actual >sorted-actual &&
 	test_cmp sorted-expect sorted-actual
 }
+
+# Given a filename, extract its trailing hash as a hex string
+test_trailing_hash () {
+	local file="$1" &&
+	tail -c $(test_oid rawsz) "$file" | \
+		test-tool hexdump | \
+		sed "s/ //g"
+}
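For readers more at home in C than in shell, the helper's job (take the last rawsz bytes of a file and print them as hex) can be sketched as below. trailing_hash_hex() is a hypothetical name, not part of Git; rawsz would be 20 for SHA-1 and 32 for SHA-256, matching what 'test_oid rawsz' reports. The sketch assumes a POSIX-like platform where fseek() with SEEK_END works on binary streams.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical C analogue of test_trailing_hash: read the last `rawsz`
 * bytes of an open file and store them in `hex` as lowercase hexadecimal,
 * NUL-terminated. Returns 0 on success and -1 if `hex` cannot hold
 * 2 * rawsz + 1 characters, or if the seek (for example, on a file
 * shorter than rawsz on POSIX systems) or a read fails.
 */
static int trailing_hash_hex(FILE *fp, size_t rawsz, char *hex, size_t hexsz)
{
	static const char digits[] = "0123456789abcdef";

	assert(fp != NULL && hex != NULL);
	if (hexsz < 2 * rawsz + 1)
		return -1;
	/* Like `tail -c rawsz`: position rawsz bytes before the end. */
	if (fseek(fp, -(long)rawsz, SEEK_END) != 0)
		return -1;
	for (size_t i = 0; i < rawsz; i++) {
		int c = fgetc(fp);

		if (c == EOF)
			return -1;
		/* Like `test-tool hexdump | sed "s/ //g"`: two digits per byte. */
		hex[2 * i] = digits[(c >> 4) & 0xf];
		hex[2 * i + 1] = digits[c & 0xf];
	}
	hex[2 * rawsz] = '\0';
	return 0;
}
```

Seeking from the end instead of reading the whole file mirrors 'tail -c', so the cost does not grow with the size of the index.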

From: Derrick Stolee
Date: Wed, 07 Dec 2022 17:25:58 +0000
Subject: [PATCH 4/4] features: feature.manyFiles implies fast index writes
To: git@vger.kernel.org
Cc: gitster@pobox.com, vdye@github.com, avarab@gmail.com, newren@gmail.com, Derrick Stolee

The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt in to such behavior:

 1. The feature.manyFiles config option enables some options that are
    helpful for repositories with many files at HEAD.

 2. 'scalar register' and 'scalar reconfigure' set config options that
    optimize for large repositories.

In both of these cases, set index.skipHash=true to gain this speedup.
Add tests demonstrating that feature.manyFiles=true implies
index.skipHash=true, and that an explicit index.skipHash=false
overrides it.

Signed-off-by: Derrick Stolee
---
 Documentation/config/feature.txt |  3 +++
 read-cache.c                     |  7 ++++---
 repo-settings.c                  |  2 ++
 repository.h                     |  1 +
 scalar.c                         |  1 +
 t/t1600-index.sh                 | 13 ++++++++++++-
 6 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 95975e50912..f0e1d4cb2be 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -23,6 +23,9 @@ feature.manyFiles::
 	working directory. With many files, commands such as `git status` and
 	`git checkout` may be slow and these new defaults improve performance:
 +
+* `index.skipHash=true` speeds up index writes by not computing a trailing
+  checksum.
++
 * `index.version=4` enables path-prefix compression in the index.
 +
 * `core.untrackedCache=true` enables the untracked cache. This setting assumes
diff --git a/read-cache.c b/read-cache.c
index fb4d6fb6387..1844953fba7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2923,12 +2923,13 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
-	int skip_hash;
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
-	if (!git_config_get_maybe_bool("index.skiphash", &skip_hash))
-		f->skip_hash = skip_hash;
+	if (istate->repo) {
+		prepare_repo_settings(istate->repo);
+		f->skip_hash = istate->repo->settings.index_skip_hash;
+	}
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
diff --git a/repo-settings.c b/repo-settings.c
index 3021921c53d..3dbd3f0e2ec 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -47,6 +47,7 @@ void prepare_repo_settings(struct repository *r)
 	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
+		r->settings.index_skip_hash = 1;
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
@@ -61,6 +62,7 @@ void prepare_repo_settings(struct repository *r)
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
+	repo_cfg_bool(r, "index.skiphash", &r->settings.index_skip_hash, r->settings.index_skip_hash);
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
diff --git a/repository.h b/repository.h
index 6c461c5b9de..e8c67ffe165 100644
--- a/repository.h
+++ b/repository.h
@@ -42,6 +42,7 @@ struct repo_settings {
 	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
 
 	int index_version;
+	int index_skip_hash;
 	enum untracked_cache_setting core_untracked_cache;
 
 	int pack_use_sparse;
diff --git a/scalar.c b/scalar.c
index 6c52243cdf1..b49bb8c24ec 100644
--- a/scalar.c
+++ b/scalar.c
@@ -143,6 +143,7 @@ static int set_recommended_config(int reconfigure)
 		{ "credential.validate", "false", 1 }, /* GCM4W-only */
 		{ "gc.auto", "0", 1 },
 		{ "gui.GCWarning", "false", 1 },
+		{ "index.skipHash", "true", 1 },
 		{ "index.threads", "true", 1 },
 		{ "index.version", "4", 1 },
 		{ "merge.stat", "false", 1 },
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 55816756607..be0a0a8a008 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -72,7 +72,18 @@ test_expect_success 'index.skipHash config option' '
 		test_trailing_hash .git/index >hash &&
 		echo $(test_oid zero) >expect &&
 		test_cmp expect hash &&
-		git fsck
+		git fsck &&
+
+		rm -f .git/index &&
+		git -c feature.manyFiles=true add a &&
+		test_trailing_hash .git/index >hash &&
+		test_cmp expect hash &&
+
+		rm -f .git/index &&
+		git -c feature.manyFiles=true \
+			-c index.skipHash=false add a &&
+		test_trailing_hash .git/index >hash &&
+		! test_cmp expect hash
 	)
 '
 