From patchwork Mon Nov 7 18:35:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13035081
Message-Id: <71c76d4ccbe577f82e820fb08fe93e5177177804.1667846164.git.gitgitgadget@gmail.com>
Date: Mon, 07 Nov 2022 18:35:35 +0000
Subject: [PATCH 01/30] hashfile: allow skipping the hash function
To: git@vger.kernel.org
Cc: jrnieder@gmail.com, Derrick Stolee
From: Derrick Stolee

The hashfile API is useful for generating files that include a trailing
hash of the file's contents up to that point. Using such a hash is
helpful for verifying the file for corruption-at-rest, such as a faulty
drive causing flipped bits.

Since the commit-graph and multi-pack-index files both use this trailing
hash, the chunk-format API uses a 'struct hashfile' to handle the I/O to
the file. This was very convenient to allow using the hashfile methods
during these operations.

However, hashing the file contents during write comes at a performance penalty. It's slower to hash the bytes on their way to the disk than without that step. If we wish to use the chunk-format API to upgrade other file types, then this hashing is a performance penalty that might not be worth the benefit of a trailing hash. For example, if we create a chunk-format version of the packed-refs file, then the file format could shrink by using raw object IDs instead of hexadecimal representations in ASCII. That reduction in size is not enough to counteract the performance penalty of hashing the file contents. In cases such as deleting a reference that appears in the packed-refs file, that write-time performance is critical. This is in contrast to the commit-graph and multi-pack-index files which are mainly updated in non-critical paths such as background maintenance. One way to allow future chunked formats to not suffer this penalty would be to create an abstraction layer around the 'struct hashfile' using a vtable of function pointers. This would allow placing a different representation in place of the hashfile. This option would be cumbersome for a few reasons. First, the hashfile's buffered writes are already highly optimized and would need to be duplicated in another code path. The second is that the chunk-format API calls the chunk_write_fn pointers using a hashfile. If we change that to an abstraction layer, then those that _do_ use the hashfile API would need to change all of their instances of hashwrite(), hashwrite_be32(), and others to use the new abstraction layer. Instead, this change opts for a simpler change. Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the update_fn and final_fn members of the_hash_algo are skipped. When finalizing the hashfile, the trailing hash is replaced with the null hash. 
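
As a rough illustration of the 'skip_hash' behavior described above,
here is a minimal, self-contained sketch in C. It is not Git's
csum-file.c: struct toy_hashfile, toy_init(), toy_write(),
toy_finalize() and TOY_RAWSZ are hypothetical names, the "file" is an
in-memory array, and a 32-bit FNV-1a checksum stands in for the real
hash algorithm. It only shows that the written bytes and the file length
are the same either way, and that the trailer becomes all zeroes when
hashing is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RAWSZ 4 /* assumption: stand-in for the_hash_algo->rawsz */

/* Hypothetical toy writer; NOT Git's 'struct hashfile'. */
struct toy_hashfile {
	unsigned char out[256];  /* stands in for the file on disk */
	size_t out_len;
	uint32_t ctx;            /* running FNV-1a checksum, not SHA-1 */
	unsigned int skip_hash;  /* when set, ctx is never updated */
};

static void toy_init(struct toy_hashfile *f, unsigned int skip_hash)
{
	memset(f, 0, sizeof(*f));
	f->ctx = 2166136261u; /* FNV-1a offset basis */
	f->skip_hash = skip_hash;
}

static void toy_write(struct toy_hashfile *f, const void *buf, size_t count)
{
	const unsigned char *p = buf;
	size_t i;

	/* Leave room for the trailer written by toy_finalize(). */
	assert(f->out_len + count + TOY_RAWSZ <= sizeof(f->out));

	/* The only difference between the two modes: skip the update. */
	if (!f->skip_hash)
		for (i = 0; i < count; i++)
			f->ctx = (f->ctx ^ p[i]) * 16777619u;

	memcpy(f->out + f->out_len, p, count);
	f->out_len += count;
}

/* Append the trailer and return the total length written. */
static size_t toy_finalize(struct toy_hashfile *f)
{
	unsigned char trailer[TOY_RAWSZ];
	int i;

	if (f->skip_hash)
		memset(trailer, 0, sizeof(trailer)); /* the "null hash" */
	else
		for (i = 0; i < TOY_RAWSZ; i++)
			trailer[i] = (f->ctx >> (24 - 8 * i)) & 0xff;

	memcpy(f->out + f->out_len, trailer, sizeof(trailer));
	f->out_len += sizeof(trailer);
	return f->out_len;
}
```

Calling toy_init() with skip_hash set to 0 or 1, followed by the same
toy_write() calls and toy_finalize(), yields outputs of identical
length that differ only in the final TOY_RAWSZ bytes.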
This use of a trailing null hash would be desirable in either case,
since we do not want to special-case a file format to have a different
length depending on whether it was hashed or not. When the final bytes
of a file are all zero, we can infer that it was written without
hashing, and thus that verification is not available as a check for file
consistency. This also means that we could easily toggle hashing for any
file format we desire.

For the commit-graph and multi-pack-index files, it may be possible to
allow the null hash without incrementing the file format version, since
it technically fits the structure of the file format. The only issue is
that older versions would trigger a failure during 'git fsck'. For these
file formats, we may want to delay such a change until it is justified.

However, the index file is written in critical paths. It is also
frequently updated, so corruption at rest is less likely to be an issue
than in those other file formats. This makes it a good candidate for an
option that skips the hashing operation.

A version of this patch has existed in the microsoft/git fork since
2017 [1] (the linked commit was rebased in 2018, but the original dates
back to January 2017). Here, the change to make the index use this fast
path is delayed until a later change.

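
The inference about a null trailer mentioned in this message can be
sketched from the reader's side as follows. This is only an
illustration under stated assumptions: trailer_is_null() is a
hypothetical helper, not a function in Git, and 'rawsz' is assumed to be
the byte length of the trailing hash for the file in question.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 when the last 'rawsz' bytes of the
 * buffer are all zero, meaning the writer skipped hashing and there is
 * no checksum to verify. Return 0 otherwise, including when the buffer
 * is too short to hold a trailer at all.
 */
static int trailer_is_null(const unsigned char *buf, size_t size, size_t rawsz)
{
	size_t i;

	assert(buf != NULL);
	if (size < rawsz)
		return 0;
	for (i = size - rawsz; i < size; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

A reader would call this before recomputing the hash and skip
verification when it returns 1. A real hash that happens to be all
zeroes would be misclassified, but the sketch assumes that case is
negligible in practice.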
[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

Co-authored-by: Kevin Willford
Signed-off-by: Kevin Willford
Signed-off-by: Derrick Stolee
---
 csum-file.c | 14 +++++++++++---
 csum-file.h |  7 +++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 59ef3398ca2..3243473c3d7 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -45,7 +45,8 @@ void hashflush(struct hashfile *f)
 	unsigned offset = f->offset;
 
 	if (offset) {
-		the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
+		if (!f->skip_hash)
+			the_hash_algo->update_fn(&f->ctx, f->buffer, offset);
 		flush(f, f->buffer, offset);
 		f->offset = 0;
 	}
@@ -64,7 +65,12 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result,
 	int fd;
 
 	hashflush(f);
-	the_hash_algo->final_fn(f->buffer, &f->ctx);
+
+	if (f->skip_hash)
+		memset(f->buffer, 0, the_hash_algo->rawsz);
+	else
+		the_hash_algo->final_fn(f->buffer, &f->ctx);
+
 	if (result)
 		hashcpy(result, f->buffer);
 	if (flags & CSUM_HASH_IN_STREAM)
@@ -108,7 +114,8 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 			 * the hashfile's buffer. In this block,
 			 * f->offset is necessarily zero.
 			 */
-			the_hash_algo->update_fn(&f->ctx, buf, nr);
+			if (!f->skip_hash)
+				the_hash_algo->update_fn(&f->ctx, buf, nr);
 			flush(f, buf, nr);
 		} else {
 			/*
@@ -153,6 +160,7 @@ static struct hashfile *hashfd_internal(int fd, const char *name,
 	f->tp = tp;
 	f->name = name;
 	f->do_crc = 0;
+	f->skip_hash = 0;
 	the_hash_algo->init_fn(&f->ctx);
 
 	f->buffer_len = buffer_len;
diff --git a/csum-file.h b/csum-file.h
index 0d29f528fbc..29468067f81 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -20,6 +20,13 @@ struct hashfile {
 	size_t buffer_len;
 	unsigned char *buffer;
 	unsigned char *check_buffer;
+
+	/**
+	 * If set to 1, skip_hash indicates that we should
+	 * not actually compute the hash for this hashfile and
+	 * instead only use it as a buffered write.
+	 */
+	unsigned int skip_hash;
 };
 
 /* Checkpoint */

From patchwork Mon Nov 7 18:35:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13035083
Message-Id: <030d76f52af654470026b0c4b1dfba2b6c996885.1667846164.git.gitgitgadget@gmail.com>
Date: Mon, 07 Nov 2022 18:35:36 +0000
Subject: [PATCH 02/30] read-cache: add index.computeHash config option
To: git@vger.kernel.org
Cc: jrnieder@gmail.com, Derrick Stolee
From: Derrick Stolee

The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.

One such critical path is the writing of the index. This operation is so
critical that the sparse index was created specifically to reduce the
size of the index to make these writes (and reads) faster.

Following a similar approach to one used in the microsoft/git fork [1],
add a new config option that allows disabling this hashing during the
index write. The cost is that we can no longer validate the contents for
corruption-at-rest using the trailing hash.

[1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201

While older Git versions will not recognize the null hash as a special
case, the file still conforms to the structure of the index format.
Using this null hash will still allow Git operations to function across
older versions.

The one exception is 'git fsck', which checks the hash of the index
file. Here, we disable this check if the trailing hash is all zeroes. We
add a warning to the config option that this may cause undesirable
behavior with older Git versions.

As a quick comparison, I tested 'git update-index --force-write' with
and without index.computeHash=false on a copy of the Linux kernel
repository.

Benchmark 1: with hash
  Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
  Range (min … max):    34.3 ms …  79.1 ms    82 runs

Benchmark 2: without hash
  Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
  Range (min … max):    16.3 ms …  42.0 ms    69 runs

Summary
  'without hash' ran
    1.78 ± 0.76 times faster than 'with hash'

These performance benefits are substantial enough to allow users the
ability to opt in to this feature, even with the potential confusion
with older 'git fsck' versions.

Signed-off-by: Derrick Stolee
---
 Documentation/config/index.txt |  8 ++++++++
 read-cache.c                   | 22 +++++++++++++++++++++-
 t/t1600-index.sh               |  8 ++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 75f3a2d1054..709ba72f622 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -30,3 +30,11 @@ index.version::
 	Specify the version with which new index files should be
 	initialized. This does not affect existing repositories.
 	If `feature.manyFiles` is enabled, then the default is 4.
+
+index.computeHash::
+	When enabled, compute the hash of the index file as it is written
+	and store the hash at the end of the content. This is enabled by
+	default.
++
+If you disable `index.computeHash`, then older Git clients may report that
+your index is corrupt during `git fsck`.
diff --git a/read-cache.c b/read-cache.c
index 32024029274..f24d96de4d3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1817,6 +1817,8 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	git_hash_ctx c;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	int hdr_version;
+	int all_zeroes = 1;
+	unsigned char *start, *end;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error(_("bad signature 0x%08x"), hdr->hdr_signature);
@@ -1827,10 +1829,23 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	if (!verify_index_checksum)
 		return 0;
 
+	end = (unsigned char *)hdr + size;
+	start = end - the_hash_algo->rawsz;
+	while (start < end) {
+		if (*start != 0) {
+			all_zeroes = 0;
+			break;
+		}
+		start++;
+	}
+
+	if (all_zeroes)
+		return 0;
+
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz);
 	the_hash_algo->final_fn(hash, &c);
-	if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz))
+	if (!hasheq(hash, end - the_hash_algo->rawsz))
 		return error(_("bad index file sha1 signature"));
 	return 0;
 }
@@ -2917,9 +2932,14 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
+	int compute_hash;
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
+	if (!git_config_get_maybe_bool("index.computehash", &compute_hash) &&
+	    !compute_hash)
+		f->skip_hash = 1;
+
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 010989f90e6..24ab90ca047 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -103,4 +103,12 @@ test_expect_success 'index version config precedence' '
 	test_index_version 0 true 2 2
 '
 
+test_expect_success 'index.computeHash config option' '
+	(
+		rm -f .git/index &&
+		git -c index.computeHash=false add a &&
+		git fsck
+	)
+'
+
 test_done

From patchwork Mon Nov 7 18:35:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13035084
Message-Id: <4013f992d15aab69346bf6f8eafe38511b923595.1667846164.git.gitgitgadget@gmail.com>
Date: Mon, 07 Nov 2022 18:35:37 +0000
Subject: [PATCH 03/30] extensions: add refFormat extension
To: git@vger.kernel.org
Cc: jrnieder@gmail.com, Derrick Stolee
From: Derrick Stolee

Git's reference storage is critical to its function. Creating new
storage formats for references requires adding an extension.
This prevents third-party tools that do not understand that format from operating incorrectly on the repository. This makes updating ref formats more difficult than other optional indexes, such as the commit-graph or multi-pack-index. However, there are a number of potential ref storage enhancements that are underway or could be created. Git needs an established mechanism for coordinating between these different options. The first obvious format update is the reftable format as documented in Documentation/technical/reftable.txt. This format has much of its implementation already in Git, but its connection as a ref backend is not complete. This change is similar to some changes within one of the patches intended for the reftable effort [1]. [1] https://lore.kernel.org/git/pull.1215.git.git.1644351400761.gitgitgadget@gmail.com/ However, this change makes a distinct strategy change from the one recommended by reftable. Here, the extensions.refFormat extension is provided as a multi-valued list. In the reftable RFC, the extension has a single value, "files" or "reftable" and explicitly states that this should not change after 'git init' or 'git clone'. The single-valued approach has some major drawbacks, including the idea that the "files" backend cannot coexist with the "reftable" backend at the same time. In this way, it would not be possible to create a repository that can write loose references and combine them into a reftable in the background. With the multi-valued approach, we could integrate reftable as a drop-in replacement for the packed-refs file and allow that to be a faster way to do the integration since the test suite would only need updates when the test is explicitly testing packed-refs. When upgrading a repository from the "files" backend to the "reftable" backend, it can help to have a transition period where both are present, then finally removing the "files" backend after all loose refs are collected into the reftable. 
But the reftable is not the only approach available. One obvious improvement could be a new file format version for the packed-refs file. Its current plaintext-based format is inefficient due to storing object IDs as hexadecimal representations instead of in their raw format. This extra cost will get worse with SHA-256. In addition, binary searches need to guess a position and scan to find newlines for a refname entry. A structured binary format could allow for more compact representation and faster access. Adding such a format could be seen as "files-v2", but it is really "packed-v2". The reftable approach has a concept of a "stack" of reftable files. This idea would also work for a stack of packed-refs files (in v1 or v2 format). It would be helpful to describe that the refs could be stored in a stack of packed-ref files independently of whether that is in file format v1 or v2. Even in these two options, it might be helpful to indicate whether or not loose ref files are present. That is one reason to not make them appear as "files-v2" or "files-v3" options in a single-valued extension. Even as "packed-v2" or "packed-v3" options, this approach would require third-party tools to understand the "v2" version if they want to support the "v3" options. Instead, by splitting the format from the layout, we can allow third-party tools to integrate only with the most-desired format options. For these reasons, this change is defining the extensions.refFormat extension as well as how the two existing values interact. By default, Git will assume "files" and "packed" in the list. If any other value is provided, then the extension is marked as unrecognized. Add tests that check the behavior of extensions.refFormat, both in that it requires core.repositoryFormatVersion=1, and Git will refuse to work with an unknown value of the extension. There is a gap in the current implementation, though. What happens if exactly one of "files" or "packed" is provided? 
The presence of only one would imply that the other is not available. A later change can communicate the list contents to the repository struct and then the reference backend could ignore one of these two layers. Specifically, having only "files" would mean that Git should not read or write the packed-refs file and instead only read and write loose ref files. By contrast, having only "packed" would mean that Git should not read or write loose ref files and instead always update the packed-refs file on every ref update. Signed-off-by: Derrick Stolee --- Documentation/config/extensions.txt | 41 +++++++++++++++++++++++++++++ setup.c | 5 ++++ t/t3212-ref-formats.sh | 27 +++++++++++++++++++ 3 files changed, 73 insertions(+) create mode 100755 t/t3212-ref-formats.sh diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index bccaec7a963..ce8185adf53 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -7,6 +7,47 @@ Note that this setting should only be set by linkgit:git-init[1] or linkgit:git-clone[1]. Trying to change it after initialization will not work and will produce hard-to-diagnose issues. +extensions.refFormat:: + Specify the reference storage mechanisms used by the repoitory as a + multi-valued list. The acceptable values are `files` and `packed`. + If not specified, the list of `files` and `packed` is assumed. It + is an error to specify this key unless `core.repositoryFormatVersion` + is 1. ++ +As new ref formats are added, Git commands may modify this list before and +after upgrading the on-disk reference storage files. The specific values +indicate the existence of different layers: ++ +-- +`files`;; + When present, references may be stored as "loose" reference files + in the `$GIT_DIR/refs/` directory. The name of the reference + corresponds to the filename after `$GIT_DIR` and the file contains + an object ID as a hexadecimal string. 
If a loose reference file + exists, then its value takes precedence over all other formats. + +`packed`;; + When present, references may be stored as a group in a + `packed-refs` file in its version 1 format. When grouped with + `"files"` or provided on its own, this file is located at + `$GIT_DIR/packed-refs`. This file contains a list of distinct + reference names, paired with their object IDs. When combined with + `files`, the `packed` format will only be used to group multiple + loose object files upon request via the `git pack-refs` command or + via the `pack-refs` maintenance task. +-- ++ +The following combinations are supported by this version of Git: ++ +-- +`files` and `packed`;; + This set of values indicates that references are stored both as + loose reference files and in the `packed-refs` file in its v1 + format. Loose references are preferred, and the `packed-refs` file + is updated only when deleting a reference that is stored in the + `packed-refs` file or during a `git pack-refs` command. +-- + extensions.worktreeConfig:: If enabled, then worktrees will load config settings from the `$GIT_DIR/config.worktree` file in addition to the diff --git a/setup.c b/setup.c index cefd5f63c46..f5eb50c969a 100644 --- a/setup.c +++ b/setup.c @@ -577,6 +577,11 @@ static enum extension_result handle_extension(const char *var, "extensions.objectformat", value); data->hash_algo = format; return EXTENSION_OK; + } else if (!strcmp(ext, "refformat")) { + if (strcmp(value, "files") && strcmp(value, "packed")) + return error(_("invalid value for '%s': '%s'"), + "extensions.refFormat", value); + return EXTENSION_OK; } return EXTENSION_UNKNOWN; } diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh new file mode 100755 index 00000000000..bc554e7c701 --- /dev/null +++ b/t/t3212-ref-formats.sh @@ -0,0 +1,27 @@ +#!/bin/sh + +test_description='test across ref formats' + +. 
./test-lib.sh + +test_expect_success 'extensions.refFormat requires core.repositoryFormatVersion=1' ' + test_when_finished rm -rf broken && + + # Force sha1 to ensure GIT_TEST_DEFAULT_HASH does + # not imply a value of core.repositoryFormatVersion. + git init --object-format=sha1 broken && + git -C broken config extensions.refFormat files && + test_must_fail git -C broken status 2>err && + grep "repo version is 0, but v1-only extension found" err +' + +test_expect_success 'invalid extensions.refFormat' ' + test_when_finished rm -rf broken && + git init broken && + git -C broken config core.repositoryFormatVersion 1 && + git -C broken config extensions.refFormat bogus && + test_must_fail git -C broken status 2>err && + grep "invalid value for '\''extensions.refFormat'\'': '\''bogus'\''" err +' + +test_done From patchwork Mon Nov 7 18:35:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035087 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29780C4332F for ; Mon, 7 Nov 2022 18:36:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233150AbiKGSge (ORCPT ); Mon, 7 Nov 2022 13:36:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233106AbiKGSgN (ORCPT ); Mon, 7 Nov 2022 13:36:13 -0500 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 599352497A for ; Mon, 7 Nov 2022 10:36:12 -0800 (PST) Received: by mail-wm1-x32a.google.com with SMTP id p13-20020a05600c468d00b003cf8859ed1bso7718379wmo.1 for ; Mon, 07 Nov 2022 10:36:12 -0800 (PST) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=fAsIG3Q7Y722cAUQFUTkCz2gvIX8KTEf/tthursnj+s=; b=RT9/5AQ38nsw+Lak4Q9RmFoIXiPoT7X0NcYbiURzQzih+EBJh7yJtEo9kskN93Fx8B 0lUGVsSjg5JKzDVtzlsu8bK2gCaccz91QzQIV526vc2y/pU9//bNemYeJF8ls3lWHFQx uvXN3Vr2rGJzozFk4k4uLQ/H5c5z3mW7eRRj+FD4XL3JFgHYOKLKBStv4nRIKNCApjJ8 AmZ825BNUTcJGUVulNjbEKQgZNnWLLa3uj8r7NuE2j8VldlKmMAI5TtOgzJ7UHR4gmYQ U1jjxiB1DiKEc+FgS0lq3HHRYHVqQqShjHavLHxBfGG8wb3VZsFmoXyXOgincTY/UoD3 sALQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=fAsIG3Q7Y722cAUQFUTkCz2gvIX8KTEf/tthursnj+s=; b=q4k75MMWtVCpqk7N3JCluMNQUy05HQu9YfCwF2NTFn4kIr2+No5KmCasB2MKmoubnm lt00o4XnMNKNeUTMF6WPAEPqe71y8IdBKx5r8b/nLj1/PPz5y07dnGohAJhfkmrprdyV 204gvy76b3WOXKTYR47GudGjqpmLFSmOcl1ADlV1bvQv19oLOE0gyDmmOzBaZhkf9Rus 1KUvfDXOdiXhmLPbTBDzDG/YRWuCFP4jrDilggVq3jNAJrdf7MeA2TOb077VK/Cn0PM2 rYc6mi1m+OzGqFmF11Kd9noWoX3sotF7kLmpPyOY1/Pd65TmZ4Mpj/16tupnun5Dq52D IGUA== X-Gm-Message-State: ANoB5pk8BVQLFZ9V4HE1vdTLDM2/BGfSEBJqC5bT8NiR+ndevKIxXXcF MA1VXp9Yhdz0DIw4ZwvUCsc8tKK+8UQ= X-Google-Smtp-Source: AA0mqf5P2kX9eI92TdewIrFk9CfmmNq1QxwsJcgks0wklZL1R4cmZiKVmyTTgmBXsgjhnk/stNx+qg== X-Received: by 2002:a1c:4384:0:b0:3cf:b287:916b with SMTP id q126-20020a1c4384000000b003cfb287916bmr2521667wma.181.1667846170715; Mon, 07 Nov 2022 10:36:10 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id n4-20020a5d6604000000b002366fb99cdasm7846723wru.50.2022.11.07.10.36.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:10 -0800 (PST) Message-Id: 
<0cf654925f8d16a439871499a02125d75140ee36.1667846164.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:38 +0000 Subject: [PATCH 04/30] config: fix multi-level bulleted list Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The documentation for 'extensions.worktreeConfig' includes a bulleted list describing certain config values that need to be moved into the worktree config instead of the repository config file. However, since we are already in a bulleted list, the documentation tools do not know when that inner list is complete. Paragraphs intended to not be part of that inner list are rendered as part of the last bullet. Modify the format to match a similar doubly-nested list from the 'column.ui' config documentation. Reword the descriptions slightly to make the config keys appear as their own heading in the inner list. Signed-off-by: Derrick Stolee --- Documentation/config/extensions.txt | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index ce8185adf53..18ed1c58126 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -62,10 +62,15 @@ When enabling `extensions.worktreeConfig`, you must be careful to move certain values from the common config file to the main working tree's `config.worktree` file, if present: + -* `core.worktree` must be moved from `$GIT_COMMON_DIR/config` to - `$GIT_COMMON_DIR/config.worktree`. -* If `core.bare` is true, then it must be moved from `$GIT_COMMON_DIR/config` - to `$GIT_COMMON_DIR/config.worktree`. +-- +`core.worktree`;; + This config value must be moved from `$GIT_COMMON_DIR/config` to + `$GIT_COMMON_DIR/config.worktree`.
+ +`core.bare`;; + If true, then this value must be moved from + `$GIT_COMMON_DIR/config` to `$GIT_COMMON_DIR/config.worktree`. +-- + It may also be beneficial to adjust the locations of `core.sparseCheckout` and `core.sparseCheckoutCone` depending on your desire for customizable From patchwork Mon Nov 7 18:35:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035088 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D419C4332F for ; Mon, 7 Nov 2022 18:36:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233185AbiKGSgf (ORCPT ); Mon, 7 Nov 2022 13:36:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49938 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233081AbiKGSg0 (ORCPT ); Mon, 7 Nov 2022 13:36:26 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A70725E96 for ; Mon, 7 Nov 2022 10:36:13 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id i132-20020a1c3b8a000000b003cfa97c05cdso215025wma.4 for ; Mon, 07 Nov 2022 10:36:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=Q8V34SHaCNfUgksCPhOhYFXWc4a0KY2DM8YhQE9/XcY=; b=ZfOV0xPQayPCvreYqemF6ArP7Muw1ixTHd+sTkGUybEHePIZcKEm4409YgvH4KZWzI CYLcjz/+6ZIKosTAa6Zq8CegBV2UnH7MkZ6oSOdHiFiFr0DMjQn5X/ofgD/HMvoZCMln n2h7p9+XepeWuIryuzV5icfDoKFbLQJvLSLtGk2thV4Sm7OJLD65f/oqouPrybs0eHFh 
lQV0MBWq9XlihVWmoDfHh1uwbaLPG3JJlRbl1oQ8QTIN+WISBPEK3buuteTISknq3SGS 9MvnZyuRm+Qi1qmkpxLncINCUS47ThlBQMT7kwSJUJwntwdt8qZhQ/AUX0pLq5WO7O2m N+mQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Q8V34SHaCNfUgksCPhOhYFXWc4a0KY2DM8YhQE9/XcY=; b=a938WwEJXZFYgoowgc+4t/2KL9Fwa7H5y9ysvyHj7ae6miO6wnCvbhZgPWWnq5L2Jw DGsuEi80PYQUZdcz/fydQRJ+Slsg/zBfk7q8suqtlRTuATecozp82x5FKPR0bwDXREiv hAnwsBW/Ft+QwBeE85O69bbhlHsxnR5QapRgLPI6PA5M4snJW48PWQliKZnR6Djl0lYB oBfRKJTer+mRfOzEfhnXcjbjsVqKY+kKaDQzQus9pO3uJd3RNohz9Tnl5XtzAj2xSWpH Q+EVUQPVg8ix9qoX/mW8jOGaHD3/k+eD9QoJWiX0R6hO8ZC9utG/yioLALy5wVb5QTuZ kK3Q== X-Gm-Message-State: ANoB5pmXiM67DsRN9GxaUYM+hUYejp1sY/UqPfoR4U6CHOutMzME9J8R pB1IjmG3qWczvstkaTppIkzDyI3fTBA= X-Google-Smtp-Source: AA0mqf4ow7PU/y+Wk0zQS5V9GvUJ9zrmgAYWormMOfw0cNC6mQgON5E2oHVxAevttNPs7tgLOpZT6g== X-Received: by 2002:a05:600c:1695:b0:3cf:a9b7:81e7 with SMTP id k21-20020a05600c169500b003cfa9b781e7mr5381585wmn.116.1667846171609; Mon, 07 Nov 2022 10:36:11 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id bn23-20020a056000061700b002305cfb9f3dsm8068156wrb.89.2022.11.07.10.36.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:11 -0800 (PST) Message-Id: <3121334256d8ab9afb2922a389ec22f9faaa08cf.1667846164.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:39 +0000 Subject: [PATCH 05/30] repository: wire ref extensions to ref backends Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The previous change introduced the extensions.refFormat config option. 
It is a multi-valued config option that currently understands "files" and "packed", with both values assumed by default. If any value is provided explicitly, this default is ignored and the provided settings are used instead. The multi-valued nature of this extension presents a way to allow a user to specify that they never want a packed-refs file (only use "files") or that they never want loose reference files (only use "packed"). However, that functionality is not currently connected. Before actually modifying the files backend to understand these extension settings, do the basic wiring that connects the extensions.refFormat parsing to the creation of the ref backend. A future change will actually change the ref backend initialization based on these settings, but this communication of the extension is sufficiently complicated to be worth an isolated change. For now, also forbid the setting of only "packed". This is done by redirecting the choice of backend to the packed backend when that selection is made. A later change will make the "files"-only extension value ignore the packed backend. Signed-off-by: Derrick Stolee --- cache.h | 2 ++ refs.c | 22 ++++++++++++++++++++-- refs/files-backend.c | 2 +- refs/refs-internal.h | 3 +++ repository.c | 2 ++ repository.h | 6 ++++++ setup.c | 18 +++++++++++++++++- t/t3212-ref-formats.sh | 12 ++++++++++++ 8 files changed, 63 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index 26ed03bd6de..13e9c251ac3 100644 --- a/cache.h +++ b/cache.h @@ -1155,6 +1155,8 @@ struct repository_format { int hash_algo; int sparse_index; char *work_tree; + int ref_format_count; + enum ref_format_flags ref_format; struct string_list unknown_extensions; struct string_list v1_only_extensions; }; diff --git a/refs.c b/refs.c index 1491ae937eb..21441ddb162 100644 --- a/refs.c +++ b/refs.c @@ -1982,6 +1982,15 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map, return entry ? 
entry->refs : NULL; } +static int add_ref_format_flags(enum ref_format_flags flags, int caps) { + if (flags & REF_FORMAT_FILES) + caps |= REF_STORE_FORMAT_FILES; + if (flags & REF_FORMAT_PACKED) + caps |= REF_STORE_FORMAT_PACKED; + + return caps; +} + /* * Create, record, and return a ref_store instance for the specified * gitdir. @@ -1991,9 +2000,17 @@ static struct ref_store *ref_store_init(struct repository *repo, unsigned int flags) { const char *be_name = "files"; - struct ref_storage_be *be = find_ref_storage_backend(be_name); + struct ref_storage_be *be; struct ref_store *refs; + flags = add_ref_format_flags(repo->ref_format, flags); + + if (!(flags & REF_STORE_FORMAT_FILES) && + (flags & REF_STORE_FORMAT_PACKED)) + be_name = "packed"; + + be = find_ref_storage_backend(be_name); + if (!be) BUG("reference backend %s is unknown", be_name); @@ -2009,7 +2026,8 @@ struct ref_store *get_main_ref_store(struct repository *r) if (!r->gitdir) BUG("attempting to get main_ref_store outside of repository"); - r->refs_private = ref_store_init(r, r->gitdir, REF_STORE_ALL_CAPS); + r->refs_private = ref_store_init(r, r->gitdir, + REF_STORE_ALL_CAPS); r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private); return r->refs_private; } diff --git a/refs/files-backend.c b/refs/files-backend.c index b89954355de..db6c8e434c6 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3274,7 +3274,7 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err UNUSED) } struct ref_storage_be refs_be_files = { - .next = NULL, + .next = &refs_be_packed, .name = "files", .init = files_ref_store_create, .init_db = files_init_db, diff --git a/refs/refs-internal.h b/refs/refs-internal.h index 69f93b0e2ac..41520c945e4 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -521,6 +521,9 @@ struct ref_store; REF_STORE_ODB | \ REF_STORE_MAIN) +#define REF_STORE_FORMAT_FILES (1 << 8) /* can use loose ref files */ +#define REF_STORE_FORMAT_PACKED (1 
<< 9) /* can use packed-refs file */ + /* * Initialize the ref_store for the specified gitdir. These functions * should call base_ref_store_init() to initialize the shared part of diff --git a/repository.c b/repository.c index 5d166b692c8..96533fc76be 100644 --- a/repository.c +++ b/repository.c @@ -182,6 +182,8 @@ int repo_init(struct repository *repo, repo->repository_format_partial_clone = format.partial_clone; format.partial_clone = NULL; + repo->ref_format = format.ref_format; + if (worktree) repo_set_worktree(repo, worktree); diff --git a/repository.h b/repository.h index 24316ac944e..5cfde4282c5 100644 --- a/repository.h +++ b/repository.h @@ -61,6 +61,11 @@ struct repo_path_cache { char *shallow; }; +enum ref_format_flags { + REF_FORMAT_FILES = (1 << 0), + REF_FORMAT_PACKED = (1 << 1), +}; + struct repository { /* Environment */ /* @@ -95,6 +100,7 @@ struct repository { * the ref object. */ struct ref_store *refs_private; + enum ref_format_flags ref_format; /* * Contains path to often used file names. diff --git a/setup.c b/setup.c index f5eb50c969a..a5e63479558 100644 --- a/setup.c +++ b/setup.c @@ -578,9 +578,14 @@ static enum extension_result handle_extension(const char *var, data->hash_algo = format; return EXTENSION_OK; } else if (!strcmp(ext, "refformat")) { - if (strcmp(value, "files") && strcmp(value, "packed")) + if (!strcmp(value, "files")) + data->ref_format |= REF_FORMAT_FILES; + else if (!strcmp(value, "packed")) + data->ref_format |= REF_FORMAT_PACKED; + else return error(_("invalid value for '%s': '%s'"), "extensions.refFormat", value); + data->ref_format_count++; return EXTENSION_OK; } return EXTENSION_UNKNOWN; @@ -723,6 +728,11 @@ int read_repository_format(struct repository_format *format, const char *path) git_config_from_file(check_repo_format, path, format); if (format->version == -1) clear_repository_format(format); + + /* Set default ref_format if no extensions.refFormat exists. 
*/ + if (!format->ref_format_count) + format->ref_format = REF_FORMAT_FILES | REF_FORMAT_PACKED; + return format->version; } @@ -1425,6 +1435,9 @@ int discover_git_directory(struct strbuf *commondir, candidate.partial_clone; candidate.partial_clone = NULL; + /* take ownership of candidate.ref_format */ + the_repository->ref_format = candidate.ref_format; + clear_repository_format(&candidate); return 0; } @@ -1561,6 +1574,8 @@ const char *setup_git_directory_gently(int *nongit_ok) the_repository->repository_format_partial_clone = repo_fmt.partial_clone; repo_fmt.partial_clone = NULL; + + the_repository->ref_format = repo_fmt.ref_format; } } /* @@ -1650,6 +1665,7 @@ void check_repository_format(struct repository_format *fmt) repo_set_hash_algo(the_repository, fmt->hash_algo); the_repository->repository_format_partial_clone = xstrdup_or_null(fmt->partial_clone); + the_repository->ref_format = fmt->ref_format; clear_repository_format(&repo_fmt); } diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index bc554e7c701..8c4e70196a0 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -24,4 +24,16 @@ test_expect_success 'invalid extensions.refFormat' ' grep "invalid value for '\''extensions.refFormat'\'': '\''bogus'\''" err ' +test_expect_success 'extensions.refFormat=packed only' ' + git init only-packed && + ( + cd only-packed && + git config core.repositoryFormatVersion 1 && + git config extensions.refFormat packed && + test_commit A && + test_path_exists .git/packed-refs && + test_path_is_missing .git/refs/tags/A + ) +' + test_done From patchwork Mon Nov 7 18:35:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035085 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP 
id C97F9C433FE for ; Mon, 7 Nov 2022 18:36:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233141AbiKGSga (ORCPT ); Mon, 7 Nov 2022 13:36:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49952 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233119AbiKGSg1 (ORCPT ); Mon, 7 Nov 2022 13:36:27 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B2AF825EAE for ; Mon, 7 Nov 2022 10:36:14 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id j15so17559101wrq.3 for ; Mon, 07 Nov 2022 10:36:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=1auC5rAXkQITen61m670krVuOIVE+OUY+QjwiUQp/YU=; b=C+gIRI/LRC7xs2mG5txVSMcn9wMAEju4GKH+u+BqOGJHBD3dYNLklcrN5ZIGEX2EoG FjfBMTonxQMxxoyrWLVF6rSe74j3/R7m/D4voLLmtmD6VDI/uIKE55WqA6fZzIJ1Ia8J VyAaqNIPF+ngjlmx37jUp/550As6tPSnwO5NBpgQxr7umaQAbwg1D1Vz4eEGqeohpEFg Ct24cd7MwKl21yQquul5qn49w7O9/Q71nh5RMSEGGGEFH0QSaxA/ZDjOZbFaobDcckx0 1XAUDrZ50CzyfJghMYVqWAMZgauMCUYDPorh88JkHRSpikMceb5H7ttpXNxBgzfDACaY GuMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1auC5rAXkQITen61m670krVuOIVE+OUY+QjwiUQp/YU=; b=fkcTH1phB7e8u/JmIAAsWk23t7If/aoENzBkLsg8BtIrsi0Lh5obVmYTsbGZUSOJlh eoyNrcLzR3pXr+vOChiIune1tEfb4GH3wmBOgyA1jQb60sQd52BMrhL/7+gNwqMpfwJp +jI5RrBVKt1sjhT+INJYNlFNSbSHx790f0rKU8oXTsdYKrfsTXDY6Y6GbJO6q7fqMz7j plvqxLlETyfVUcIeoKLtHhFlSfYOqEUg9SuNg40jnGXEvhwwmKZ9avCJprJRwNfijLbE 
+5YK45MsdGbzs6A0ISzUnqH9RzLGLd+Sw/m/eAFDGdrgee9eBVCyAM6cfX0XJ/0A4TAl lPDg== X-Gm-Message-State: ACrzQf2ypjw1D71kn6i2423bJewcB+nDIE7oTGcxjBQq50zmzNZorcoD sAaDpA90dUxOqQINFmEoQonBaJx45DY= X-Google-Smtp-Source: AMsMyM62ukR7ug4S+lWxRXQarws4XcT7qXpMwQVBz+r1ubqCUJGIbVKAq8Oev/8qaPus8I70rlMt7A== X-Received: by 2002:adf:f7d2:0:b0:236:87bc:a8f7 with SMTP id a18-20020adff7d2000000b0023687bca8f7mr32437674wrq.579.1667846172615; Mon, 07 Nov 2022 10:36:12 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id bg20-20020a05600c3c9400b003c6bd12ac27sm9581105wmb.37.2022.11.07.10.36.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:11 -0800 (PST) Message-Id: <531bf1b6db0f5bbaf1508de5ea33f2e6d114f820.1667846164.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:40 +0000 Subject: [PATCH 06/30] refs: allow loose files without packed-refs Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The extensions.refFormat extension is a multi-valued config that specifies which ref formats are available to the current repository. By default, Git assumes the list of "files" and "packed", unless there is at least one of these extensions specified. With the current values, it is possible for a user to specify only "files" or only "packed". The only-"packed" option was already ruled as invalid since Git's current code has too many places that require a loose reference. This could change in the future. However, we can now allow the user to specify extensions.refFormat=files alone, making it impossible to create a packed-refs file (or to read one that might exist). 
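To make the resulting combinations concrete, here is a minimal standalone sketch of how the configured values could fold into flags. It is an illustration only, not code from this series: the helper name `resolve_ref_formats` is hypothetical, and the flag names are assumed to mirror the `enum ref_format_flags` introduced earlier in the series.

```c
#include <string.h>

enum ref_format_flags {
	REF_FORMAT_FILES = (1 << 0),
	REF_FORMAT_PACKED = (1 << 1),
};

/*
 * Hypothetical helper (not part of Git): fold the configured
 * extensions.refFormat values into a bitmask. An empty list means
 * that "files" and "packed" are both assumed. Returns -1 for an
 * unrecognized value.
 */
static int resolve_ref_formats(const char **values, int nr)
{
	int flags = 0;
	int i;

	for (i = 0; i < nr; i++) {
		if (!strcmp(values[i], "files"))
			flags |= REF_FORMAT_FILES;
		else if (!strcmp(values[i], "packed"))
			flags |= REF_FORMAT_PACKED;
		else
			return -1;
	}

	/* No explicit value: fall back to the default pair. */
	if (!nr)
		flags = REF_FORMAT_FILES | REF_FORMAT_PACKED;

	return flags;
}
```

With only "files" configured, the REF_FORMAT_PACKED bit stays clear, which is the condition under which the packed-refs file is neither read nor written.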
Signed-off-by: Derrick Stolee --- Documentation/config/extensions.txt | 5 +++++ refs/files-backend.c | 6 ++++++ refs/packed-backend.c | 3 +++ refs/refs-internal.h | 5 +++++ t/t3212-ref-formats.sh | 20 ++++++++++++++++++++ 5 files changed, 39 insertions(+) diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index 18ed1c58126..18071c336d0 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -46,6 +46,11 @@ The following combinations are supported by this version of Git: format. Loose references are preferred, and the `packed-refs` file is updated only when deleting a reference that is stored in the `packed-refs` file or during a `git pack-refs` command. + +`files`;; + When only this value is present, Git will ignore the `packed-refs` + file and refuse to write one during `git pack-refs`. All references + will be read from and written to loose reference files. -- extensions.worktreeConfig:: diff --git a/refs/files-backend.c b/refs/files-backend.c index db6c8e434c6..4a18aed6204 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -1198,6 +1198,12 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags) struct strbuf err = STRBUF_INIT; struct ref_transaction *transaction; + if (!packed_refs_enabled(refs->store_flags)) { + warning(_("refusing to create '%s' file because '%s' is not set"), + "packed-refs", "extensions.refFormat=packed"); + return -1; + } + transaction = ref_store_transaction_begin(refs->packed_ref_store, &err); if (!transaction) return -1; diff --git a/refs/packed-backend.c b/refs/packed-backend.c index c1c71d183ea..a4371b711b9 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -478,6 +478,9 @@ static int load_contents(struct snapshot *snapshot) size_t size; ssize_t bytes_read; + if (!packed_refs_enabled(snapshot->refs->store_flags)) + return 0; + fd = open(snapshot->refs->path, O_RDONLY); if (fd < 0) { if (errno == ENOENT) { diff 
--git a/refs/refs-internal.h b/refs/refs-internal.h index 41520c945e4..a1900848a87 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -524,6 +524,11 @@ struct ref_store; #define REF_STORE_FORMAT_FILES (1 << 8) /* can use loose ref files */ #define REF_STORE_FORMAT_PACKED (1 << 9) /* can use packed-refs file */ +static inline int packed_refs_enabled(int flags) +{ + return flags & REF_STORE_FORMAT_PACKED; +} + /* * Initialize the ref_store for the specified gitdir. These functions * should call base_ref_store_init() to initialize the shared part of diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 8c4e70196a0..67aa65c116f 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -36,4 +36,24 @@ test_expect_success 'extensions.refFormat=packed only' ' ) ' +test_expect_success 'extensions.refFormat=files only' ' + test_commit T && + git pack-refs --all && + git init only-loose && + ( + cd only-loose && + git config core.repositoryFormatVersion 1 && + git config extensions.refFormat files && + test_commit A && + test_commit B && + test_must_fail git pack-refs 2>err && + grep "refusing to create" err && + test_path_is_missing .git/packed-refs && + + # Refuse to parse a packed-refs file. 
+ cp ../.git/packed-refs .git/packed-refs && + test_must_fail git rev-parse refs/tags/T + ) +' + test_done From patchwork Mon Nov 7 18:35:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035086 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E59B2C433FE for ; Mon, 7 Nov 2022 18:36:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232804AbiKGSgc (ORCPT ); Mon, 7 Nov 2022 13:36:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233120AbiKGSg1 (ORCPT ); Mon, 7 Nov 2022 13:36:27 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C61C2611A for ; Mon, 7 Nov 2022 10:36:15 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id v1so17529021wrt.11 for ; Mon, 07 Nov 2022 10:36:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=GHeEwR8h3+faHBEfGtCs65hIdEp9VCiZJMy/rStdD14=; b=N9ZRe4QgKgOheVfPjKjTXjwCL2WOOT3gu39maUKPaxxH4SBx8v0lHzACWCq1ker+Pu lEvYv58QPBkLW0d/HTX+VONxzWjeQ3ZMkBfkxmg36G6jAcoMufNeMUWqVwxtynFbSvKc tTsoPJ3MX8fHu1jKWjK//5ZY7TyYEJbCNL4X/4WvOandW7ZJ8xclRTTMRO0/rPMP+b+U 4jUZgKjIDno0R1uLSWZmCbLi2u+nOIiUdTttdnea8Ab8aDpmss3X5pJzA9LsLDjs59JK s8U88zUByVhDghdys7UdzZwLWTTGu0KxIwSmXHvfmR6bwC3k2essvoWzkeZGax9GVJT7 HnDA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GHeEwR8h3+faHBEfGtCs65hIdEp9VCiZJMy/rStdD14=; b=3W4oRGKBYBm3FunzJuKcmIUEzmay6ewrG6JQCtRwELrdJK/TvQiXHf8T6/4qvrIZ+K p+9DAuI1T20OhdwdiLSEFiSmBzg2zZ3xEn7cAfFd3i9fxdxtNtrOuGRjkwKyz7ezkI7U HJrzajvZjVVhjuCpKZ7K9Yw2h1Qzdm6TbSFKMKVE3ZPleG0pTinbXyePoINhkaWwWnC4 X0WyoouyVdCramWcFgkkfoLMnJdiUFi8ZOqcaaXiGvuudexa3n5/nsKvlATioFLLmeix BlYnvorbiC2yqr/+5RBguTz6yV2l+nMhH3L/Ie3ntQ5EwNLwfPDxzX671pT2WsDe2y9d Echw== X-Gm-Message-State: ACrzQf1YHCFC+pG6yMH+EHKqxmpTPGMwiMxWEVqLid55/bIkFWKMTWzZ zjYY0Rw7SHpU4jsrZ0dNylhuF44sQ4c= X-Google-Smtp-Source: AMsMyM7w61+pC1ohf8C3ShDnkysMeLe4Zw9n8MMp5W4urFgG/c930bEP7K6Dq7eaYwRW3WeD8YqsCg== X-Received: by 2002:a5d:564c:0:b0:236:6089:cc50 with SMTP id j12-20020a5d564c000000b002366089cc50mr31490126wrw.520.1667846173691; Mon, 07 Nov 2022 10:36:13 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id m1-20020a7bca41000000b003c6c3fb3cf6sm8915492wml.18.2022.11.07.10.36.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:13 -0800 (PST) Message-Id: <4fcbfed2c7c78c804c7eeeed5b7080b9fd812bb7.1667846164.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:41 +0000 Subject: [PATCH 07/30] chunk-format: number of chunks is optional Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Even though the commit-graph and multi-pack-index file formats specify a number of chunks in their header information, this is optional. The table of contents terminates with a null chunk ID, which can be used instead. The extra value is helpful for some checks, but is ultimately not necessary for the format. This will be important in some future formats. 
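As a rough illustration of why the count is redundant, the sketch below walks a leading table of contents until it reaches the terminating null chunk ID. The helper name `count_chunks` is hypothetical (it is not a chunk-format.h function), and the code assumes the row layout described in gitformat-chunk.txt: 12 bytes per row, a 4-byte chunk ID followed by an 8-byte offset.

```c
#include <stddef.h>

#define TOC_ROW_SIZE 12 /* 4-byte chunk ID followed by an 8-byte offset */

/*
 * Hypothetical helper (not part of Git): count the chunks listed in
 * a table of contents by scanning for the row whose chunk ID is four
 * zero bytes. Returns the number of chunks before that row, or -1 if
 * no terminating row is found within toc_len bytes.
 */
static int count_chunks(const unsigned char *toc, size_t toc_len)
{
	size_t pos;
	int count = 0;

	for (pos = 0; pos + TOC_ROW_SIZE <= toc_len; pos += TOC_ROW_SIZE) {
		const unsigned char *id = toc + pos;

		if (!id[0] && !id[1] && !id[2] && !id[3])
			return count;
		count++;
	}

	return -1;
}
```

A reader that has a chunk count in its header can still compare it against this value as a consistency check.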
Signed-off-by: Derrick Stolee --- Documentation/gitformat-chunk.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/gitformat-chunk.txt b/Documentation/gitformat-chunk.txt index 57202ede273..c01f5567c4f 100644 --- a/Documentation/gitformat-chunk.txt +++ b/Documentation/gitformat-chunk.txt @@ -24,8 +24,9 @@ how they use the chunks to describe structured data. A chunk-based file format begins with some header information custom to that format. That header should include enough information to identify -the file type, format version, and number of chunks in the file. From this -information, that file can determine the start of the chunk-based region. +the file type, format version, and (optionally) the number of chunks in +the file. From this information, that file can determine the start of the +chunk-based region. The chunk-based region starts with a table of contents describing where each chunk starts and ends. This consists of (C+1) rows of 12 bytes each, From patchwork Mon Nov 7 18:35:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035089 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 034DAC433FE for ; Mon, 7 Nov 2022 18:36:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231408AbiKGSgi (ORCPT ); Mon, 7 Nov 2022 13:36:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49960 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233098AbiKGSg1 (ORCPT ); Mon, 7 Nov 2022 13:36:27 -0500 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 51FE926483 for ; 
Mon, 7 Nov 2022 10:36:16 -0800 (PST) Received: by mail-wr1-x431.google.com with SMTP id j15so17559212wrq.3 for ; Mon, 07 Nov 2022 10:36:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=SjsBoUSCCBGR2rdNs7h7mrHxY35extMQtstKoNdGBAM=; b=MUF3Ox7IUGsFlhV2voeZhY8jThGs7Wf3zN0jbgCU0sRrgcKm73DLV7gGY88mNoWjy4 dLmDHEiRVKXSXbU65/JuoIc9dsnJWk7VrdFj71aDa8WPUidwfOuJ8ITQI4I5t8KlSXcq 7V5u5NVH0yqsLsATSeyGR/HoLca7khqancX2kp5X9vdbh1PRdmUgBEYMX4p8UtaKwcm1 HHs1uF1dY6Fbi2/oFnC87SwIFF7en1zC46Vgg1tM19urA3YF5CipuhF81cD+GLuzqKyj GDzR90uGEXFFd4aJv59j9G/+u0LeGzPp0UeeAPrSX5RYkuEeTxkvMyqjgy4fcbogSFc5 CQfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SjsBoUSCCBGR2rdNs7h7mrHxY35extMQtstKoNdGBAM=; b=bGS1Z39E82KZ4dqaHTEUSwtK4N5ITJy8wlJax595jHpB3urZGSbN3aktG0xGmZ3O/H TQjsqw0AuwdZrMdWiWAKh8DeddG0Gn0tLi6crFNYAuijk17BUsYvdA6ThcH+UI0p49Pq bZHM9fWBPZ1HxYAVS9KehCNsaLUGab+8W69cTaYCPb/EeWF5iRR4OzJkvIDbQXcZZRYB Fprv7u3SJxoNW0nBftP65L3nfple3rDft7ZcLU9u+nt55lIXRkGuTFDr0LJ0fYVrdWEY xPMKoDxaICeHdIkOlzu6JqFvQs12kmn7P5a2Xg94+eYQY+gBmfgh61jXGbO83X6kXyS6 ODQA== X-Gm-Message-State: ACrzQf3NxN7iIdBMUBVImZDWzDbOQ+n8CX8lK3yQxHlY78QZkAEv31mm CPpHKFEwZqWZ2A/D+RiFH58wdG1Jq6U= X-Google-Smtp-Source: AMsMyM7uM6pxz5JOrvp15bFDj2lGrIrad+mQ7WK/uRVC6wxw5E4IXZge3N5HgBcQ4q3WE/pzLJj2EA== X-Received: by 2002:a05:6000:1ce:b0:236:ef02:bb56 with SMTP id t14-20020a05600001ce00b00236ef02bb56mr19910340wrx.238.1667846174601; Mon, 07 Nov 2022 10:36:14 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o15-20020a056000010f00b0023691d62cffsm8081439wrx.70.2022.11.07.10.36.13 (version=TLS1_3 
cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:14 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:42 +0000 Subject: [PATCH 08/30] chunk-format: document trailing table of contents Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee It will be helpful to allow a trailing table of contents when writing some file types with the chunk-format API. The main reason is that it allows dynamically computing the chunk sizes while writing the file. This can use fewer resources than precomputing all chunk sizes in advance. Signed-off-by: Derrick Stolee --- Documentation/gitformat-chunk.txt | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/Documentation/gitformat-chunk.txt b/Documentation/gitformat-chunk.txt index c01f5567c4f..ee3718c4306 100644 --- a/Documentation/gitformat-chunk.txt +++ b/Documentation/gitformat-chunk.txt @@ -52,8 +52,27 @@ The final entry in the table of contents must be four zero bytes. This confirms that the table of contents is ending and provides the offset for the end of the chunk-based data. +The default chunk format assumes the table of contents appears at the +beginning of the file (after the header information) and the chunks are +ordered by increasing offset. Alternatively, the chunk format allows a +table of contents that is placed at the end of the file (before the +trailing hash) and the offsets are in descending order. In this trailing +table of contents case, the data in order looks instead like the following +table: + + | Chunk ID (4 bytes) | Chunk Offset (8 bytes) | + |--------------------|------------------------| + | 0x0000 | OFFSET[C+1] | + | ID[C] | OFFSET[C] | + | ... | ... 
| + | ID[0] | OFFSET[0] | + +The concrete file format that uses the chunk format will mention that it +uses a trailing table of contents if it uses it. By default, the table of +contents is in ascending order before all chunk data. + Note: The chunk-based format expects that the file contains _at least_ a -trailing hash after `OFFSET[C+1]`. +trailing hash after either `OFFSET[C+1]` or the trailing table of contents. Functions for working with chunk-based file formats are declared in `chunk-format.h`. Using these methods provide extra checks that assist From patchwork Mon Nov 7 18:35:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035090 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DCFAFC4332F for ; Mon, 7 Nov 2022 18:36:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233188AbiKGSgl (ORCPT ); Mon, 7 Nov 2022 13:36:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233137AbiKGSg1 (ORCPT ); Mon, 7 Nov 2022 13:36:27 -0500 Received: from mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F6CB2648A for ; Mon, 7 Nov 2022 10:36:17 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id k8so17610547wrh.1 for ; Mon, 07 Nov 2022 10:36:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=tl+O9eIsm0qB/+MG2tKxqwEaGg64A+ZDbNtireyw8vg=; 
b=XiZB0rgt5a7QgptSpfOP0SpGSZFWnU/Arpv6Vl5ch31VCraY0+cJig1FaDq3bsmr1+ Qsy7UULaztfbqNZUYuo4pH1nILaG+EkMwmKBxNJGIiEl+7W4awPpLjGUMgbtAJ9kd5Vk WbWLIUmz0w/J1hcuXvT6Ri3ZGb9U1TRQucCpLhGanTDSHg05HopMOGWdnb70FWjvmHvr lKJL2/jr0+xVJYr5ZHirAMskxJWLdCroLvNM15SrnlKOs6lPPmr7aRtOWl3CrO5/RxYn bAHqS/xCk1E5jx2gDWq8n6useJU/Du6gFsDAaiNQRd/mt4Q5Tik5rkMuWpw75KRGyQVM Jmkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tl+O9eIsm0qB/+MG2tKxqwEaGg64A+ZDbNtireyw8vg=; b=bWN8Fu7S94vQPeQwmZggrNlS6G6hz52FCWbcXfD3MV5h+vMd/LOccdwnrfryPOLEmE 6AIwwodTa5fL0gkbQvJFm+gXXd02699091cuN0M4zwA4j6bPOKbEdx7etXsUNO1NjC0w mOC/Di6Rp7Ntn1fLkAUH223bIUNv5/uiKsUu+xAfVHrmG1Th9IfC/1xLNww53yHZ0gEE D7zXTidT0WY667Cs2bf/PDjZzf5TRt6AOR0mpdpJ+VPUE/VCNZZ0G8jTw/L6vU+3ik+0 bxV0IPOOY/J5KNml95v+nDGbqe3eneNL5JP93vJ0th5tyHCMvXm3Ac41PqIU8QZOmyj6 E2Og== X-Gm-Message-State: ACrzQf3ur5b+H7sxxRVf3/w8xA+Juo6IMhPNqXJkWxQAL+NKag/Cd7zc QvBWivo4o3FGFKRPrGXmqhYyqq/gSVc= X-Google-Smtp-Source: AMsMyM663zRxM0+DneQz0qjXGkLwx2mQUzMkSIIRmpkmy54uLVXjJJtsPiz6gQwZIuSD/OmKjDh5qQ== X-Received: by 2002:a05:6000:808:b0:236:9822:718d with SMTP id bt8-20020a056000080800b002369822718dmr32891715wrb.254.1667846175612; Mon, 07 Nov 2022 10:36:15 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id e5-20020adfef05000000b00225307f43fbsm8122945wro.44.2022.11.07.10.36.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:15 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:43 +0000 Subject: [PATCH 09/30] chunk-format: store chunk offset during write Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee As 
a preparatory step to allowing trailing table of contents, store the offsets of each chunk as we write them. This replaces an existing use of a local variable, but the stored value will be used in the next change. Signed-off-by: Derrick Stolee --- chunk-format.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/chunk-format.c b/chunk-format.c index 0275b74a895..f1b2c8a8b36 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -13,6 +13,7 @@ struct chunk_info { chunk_write_fn write_fn; const void *start; + off_t offset; }; struct chunkfile { @@ -78,16 +79,16 @@ int write_chunkfile(struct chunkfile *cf, void *data) hashwrite_be64(cf->f, cur_offset); for (i = 0; i < cf->chunks_nr; i++) { - off_t start_offset = hashfile_total(cf->f); + cf->chunks[i].offset = hashfile_total(cf->f); result = cf->chunks[i].write_fn(cf->f, data); if (result) goto cleanup; - if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size) + if (hashfile_total(cf->f) - cf->chunks[i].offset != cf->chunks[i].size) BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", cf->chunks[i].size, cf->chunks[i].id, - hashfile_total(cf->f) - start_offset); + hashfile_total(cf->f) - cf->chunks[i].offset); } cleanup: From patchwork Mon Nov 7 18:35:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035092 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B6039C433FE for ; Mon, 7 Nov 2022 18:36:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233159AbiKGSgo (ORCPT ); Mon, 7 Nov 2022 13:36:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org 
with ESMTP id S233156AbiKGSg2 (ORCPT ); Mon, 7 Nov 2022 13:36:28 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13C1126ACF for ; Mon, 7 Nov 2022 10:36:18 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id w14so17541899wru.8 for ; Mon, 07 Nov 2022 10:36:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=WNL/PKLtqA4tM5xYWUkVzcVTLyy+kCZ5yTETBBQwE5c=; b=chspGvjswddO407jSiwI+Y2/8DsAkHtgq9rQ/UWHbslDAE5wM/HlTXkniM1krOTHZP xSpXPRKCSJE5ZCkcARRZ+GAfzM/8RQE/cPBowbL5QOhGPDMfol8o58AAIaBmGxo3zSY6 otMBjGgGLD0PkzbkL4wL/FK/v9BVzSMocmodfZs47EsyxWxJ1TSoMUK3IH50LPMbmyFX O9LQRxxd9B9YDL/huTvK4T6HDf8gJgDN8eLFBKRwZTZXqdNsim/LBV6D68LUsPcw9IVv RnjIpy4lBDyxYDugZSJXmrUiG4iG7TvmLfBEO2AMI6eL44IzpqVksT9nDElgvaATkqEj 6fGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WNL/PKLtqA4tM5xYWUkVzcVTLyy+kCZ5yTETBBQwE5c=; b=BrxLMuuUirb3xAf+CVFQ4HVPgvt7fAQPJv18e3gAwcjCvgp8dPzSY+aVQ5gLx9zX3B Z9p8q5CZzpkbKaI9UJZwj6opW0Ln/AU7mR/8czHS9a9Q6m+0gO1UerFgCKgqJ+OiuPGq 6Av6mgAOn3BLz1VA6KvhuJQUlUylDVnvHpFTcPUoXLU01qcBMKHP4SBywYbEXlmKYlgl vrgYSqFa0IQsnMfnNnmEAs5x3P7/xhL9Fzb67y2ddGQGiDXj8XajRDM9L/j28cJBWDEd gsGqbY1dnV1X+YHrdR7NE9m3m1BaV3lJlyjuew2A3PMAOm6OYxCakqqxlu+8isiPNiAg /ptA== X-Gm-Message-State: ACrzQf07UL858evbBlxSquvZk6kLjYDlb7bBuEPVZHiKBubPQmc0co+2 B2SUzfguTw8pMDWQjlUWIEWlMrNqQE0= X-Google-Smtp-Source: AMsMyM5M3nBpY/ZweaPIz5kyylQ4FD1xXokE9xcywosElzUCEDEtC52fQ0r5g132TWL7ezfbSZUdnA== X-Received: by 2002:adf:cc92:0:b0:236:77f0:ef5f with SMTP id 
p18-20020adfcc92000000b0023677f0ef5fmr33181589wrj.198.1667846176435; Mon, 07 Nov 2022 10:36:16 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c2-20020a5d4f02000000b002366553eca7sm7889447wru.83.2022.11.07.10.36.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:16 -0800 (PST) Message-Id: <78e585cf4df2bb82a2569cee226a6b97d0ea7629.1667846164.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:44 +0000 Subject: [PATCH 10/30] chunk-format: allow trailing table of contents Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The existing chunk formats use the table of contents at the beginning of the file. This is intended as a way to speed up the initial loading of the file, but comes at a cost during writes. Each existing format needs to fully compute how big each chunk will be in advance, which usually requires storing the full file contents in memory. Future file formats may want to use the chunk format API in cases where the writing stage is critical to performance, so we may want to stream updates from an existing file and then only write the table of contents at the end. Add a new 'flags' parameter to write_chunkfile() that allows this behavior. When CHUNKFILE_TRAILING_TOC is specified, the defensive programming that checks that the chunks are written with the precomputed sizes is disabled. Then, the table of contents is written in reverse order at the end of the hashfile, so a parser can read the chunk list starting from the end of the file (minus the hash). Parsing of this trailing table of contents will come in a later change.
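For readers who want to see the layout in isolation, the following is a minimal standalone sketch of the trailing table of contents described above. It is not the code in this patch: it writes into a plain memory buffer instead of a hashfile, omits the trailing hash, and does no validation of offsets. The names sketch_chunk, put_be32(), put_be64(), write_trailing_toc() and trailing_chunk_size() are hypothetical and exist only for this illustration; the 12-byte entry size follows the 4-byte ID plus 8-byte offset given in the format documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOC_ENTRY_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

/* Hypothetical big-endian helpers; Git itself uses hashwrite_be32() etc. */
static size_t put_be32(unsigned char *buf, size_t pos, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		buf[pos + i] = (unsigned char)(v >> (24 - 8 * i));
	return pos + 4;
}

static size_t put_be64(unsigned char *buf, size_t pos, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[pos + i] = (unsigned char)(v >> (56 - 8 * i));
	return pos + 8;
}

static uint32_t read_be32(const unsigned char *p)
{
	uint32_t v = 0;
	for (int i = 0; i < 4; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint64_t read_be64(const unsigned char *p)
{
	uint64_t v = 0;
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical chunk description for this sketch only. */
struct sketch_chunk {
	uint32_t id;
	const char *payload; /* NUL-terminated to keep the sketch short */
	uint64_t offset;     /* filled in while writing */
};

/*
 * Write the chunk data first, recording each offset as it is written,
 * then the table of contents in reverse: the zero-ID terminator entry,
 * followed by ID[nr-1] down to ID[0]. Returns the bytes written.
 */
static size_t write_trailing_toc(unsigned char *buf, size_t cap,
				 struct sketch_chunk *chunks, int nr)
{
	size_t pos = 0;
	uint64_t end_of_chunks;

	for (int i = 0; i < nr; i++) {
		size_t len = strlen(chunks[i].payload);
		assert(pos + len <= cap);
		chunks[i].offset = pos;
		memcpy(buf + pos, chunks[i].payload, len);
		pos += len;
	}
	assert(pos + (size_t)(nr + 1) * TOC_ENTRY_SIZE <= cap);

	end_of_chunks = pos;
	pos = put_be32(buf, pos, 0);
	pos = put_be64(buf, pos, end_of_chunks);
	for (int i = nr - 1; i >= 0; i--) {
		pos = put_be32(buf, pos, chunks[i].id);
		pos = put_be64(buf, pos, chunks[i].offset);
	}
	return pos;
}

/*
 * Walk the table backwards from `size` (the end of the data, excluding
 * any hash). The entry read after the matching one carries the offset
 * where the matching chunk ends. Returns the chunk length, or -1.
 */
static int64_t trailing_chunk_size(const unsigned char *buf, size_t size,
				   uint32_t id)
{
	size_t pos = size;
	int found = 0;
	uint64_t start = 0;

	while (pos >= TOC_ENTRY_SIZE) {
		uint32_t cur_id;
		uint64_t cur_off;

		pos -= TOC_ENTRY_SIZE;
		cur_id = read_be32(buf + pos);
		cur_off = read_be64(buf + pos + 4);
		if (found)
			return (int64_t)(cur_off - start);
		if (!cur_id)
			break; /* terminator reached without a match */
		if (cur_id == id) {
			found = 1;
			start = cur_off;
		}
	}
	return -1;
}
```

With two chunks holding "hello" and "git", the sketch writes 8 bytes of chunk data followed by three 12-byte entries, 44 bytes in total, and the backwards walk recovers lengths 5 and 3 without any sizes being known before the write.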
Signed-off-by: Derrick Stolee --- chunk-format.c | 53 +++++++++++++++++++++++++++++++++++--------------- chunk-format.h | 9 ++++++++- commit-graph.c | 2 +- midx.c | 2 +- 4 files changed, 47 insertions(+), 19 deletions(-) diff --git a/chunk-format.c b/chunk-format.c index f1b2c8a8b36..3f5cc9b5ddf 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -57,26 +57,31 @@ void add_chunk(struct chunkfile *cf, cf->chunks_nr++; } -int write_chunkfile(struct chunkfile *cf, void *data) +int write_chunkfile(struct chunkfile *cf, + enum chunkfile_flags flags, + void *data) { int i, result = 0; - uint64_t cur_offset = hashfile_total(cf->f); trace2_region_enter("chunkfile", "write", the_repository); - /* Add the table of contents to the current offset */ - cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE; + if (!(flags & CHUNKFILE_TRAILING_TOC)) { + uint64_t cur_offset = hashfile_total(cf->f); - for (i = 0; i < cf->chunks_nr; i++) { - hashwrite_be32(cf->f, cf->chunks[i].id); - hashwrite_be64(cf->f, cur_offset); + /* Add the table of contents to the current offset */ + cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE; - cur_offset += cf->chunks[i].size; - } + for (i = 0; i < cf->chunks_nr; i++) { + hashwrite_be32(cf->f, cf->chunks[i].id); + hashwrite_be64(cf->f, cur_offset); - /* Trailing entry marks the end of the chunks */ - hashwrite_be32(cf->f, 0); - hashwrite_be64(cf->f, cur_offset); + cur_offset += cf->chunks[i].size; + } + + /* Trailing entry marks the end of the chunks */ + hashwrite_be32(cf->f, 0); + hashwrite_be64(cf->f, cur_offset); + } for (i = 0; i < cf->chunks_nr; i++) { cf->chunks[i].offset = hashfile_total(cf->f); @@ -85,10 +90,26 @@ int write_chunkfile(struct chunkfile *cf, void *data) if (result) goto cleanup; - if (hashfile_total(cf->f) - cf->chunks[i].offset != cf->chunks[i].size) - BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", - cf->chunks[i].size, cf->chunks[i].id, - hashfile_total(cf->f) - 
cf->chunks[i].offset); + if (!(flags & CHUNKFILE_TRAILING_TOC)) { + if (hashfile_total(cf->f) - cf->chunks[i].offset != cf->chunks[i].size) + BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead", + cf->chunks[i].size, cf->chunks[i].id, + hashfile_total(cf->f) - cf->chunks[i].offset); + } + + cf->chunks[i].size = hashfile_total(cf->f) - cf->chunks[i].offset; + } + + if (flags & CHUNKFILE_TRAILING_TOC) { + size_t last_chunk_tail = hashfile_total(cf->f); + /* First entry marks the end of the chunks */ + hashwrite_be32(cf->f, 0); + hashwrite_be64(cf->f, last_chunk_tail); + + for (i = cf->chunks_nr - 1; i >= 0; i--) { + hashwrite_be32(cf->f, cf->chunks[i].id); + hashwrite_be64(cf->f, cf->chunks[i].offset); + } } cleanup: diff --git a/chunk-format.h b/chunk-format.h index 7885aa08487..39e8967e950 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -31,7 +31,14 @@ void add_chunk(struct chunkfile *cf, uint32_t id, size_t size, chunk_write_fn fn); -int write_chunkfile(struct chunkfile *cf, void *data); + +enum chunkfile_flags { + CHUNKFILE_TRAILING_TOC = (1 << 0), +}; + +int write_chunkfile(struct chunkfile *cf, + enum chunkfile_flags flags, + void *data); int read_table_of_contents(struct chunkfile *cf, const unsigned char *mfile, diff --git a/commit-graph.c b/commit-graph.c index a7d87559328..c927b81250d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1932,7 +1932,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) get_num_chunks(cf) * ctx->commits.nr); } - write_chunkfile(cf, ctx); + write_chunkfile(cf, 0, ctx); stop_progress(&ctx->progress); strbuf_release(&progress_title); diff --git a/midx.c b/midx.c index 7cfad04a240..03d947a5d33 100644 --- a/midx.c +++ b/midx.c @@ -1510,7 +1510,7 @@ static int write_midx_internal(const char *object_dir, } write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs); - write_chunkfile(cf, &ctx); + write_chunkfile(cf, 0, &ctx); finalize_hashfile(f, midx_hash, 
FSYNC_COMPONENT_PACK_METADATA, CSUM_FSYNC | CSUM_HASH_IN_STREAM); From patchwork Mon Nov 7 18:35:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035091 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C55AEC43217 for ; Mon, 7 Nov 2022 18:36:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233190AbiKGSgm (ORCPT ); Mon, 7 Nov 2022 13:36:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233153AbiKGSg2 (ORCPT ); Mon, 7 Nov 2022 13:36:28 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D6D526AC7 for ; Mon, 7 Nov 2022 10:36:18 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id j15so17559396wrq.3 for ; Mon, 07 Nov 2022 10:36:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=44dn24qGOMEhexAJEfQkrMkQ/mBOQVNND14B5OE1N8E=; b=enPep+bMSVW4yMDWJZMcm23omZS61uDWj3SppLm5tEK8oj85qdXjIaavoPufDsa+nj ZQxafsI2GOHJKqk4lr3WImz2tQhaySF3LQ7HnpJ+gyf3QZYDnIbNiqatbkLz1B+PyCvp qY6KheApHaeqqWfPnuZJCqC5Ay8NH9iU5QHyKYipl8R7hKjGZ9YisifYADOqCLGBm0U7 rmH5OlMmxBlXqvcDDemxR55PFo4PMZRjLANQ6HOr81pusVlKobVvxtWDYLbmVWe9iXXC 0qoPd5mg++x/CzrG+01HvTp6G+bTXPam4NV0dvTUeAd6JQqxEYmhLCrEoCz69F6MW3PF m1Cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from 
:references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=44dn24qGOMEhexAJEfQkrMkQ/mBOQVNND14B5OE1N8E=; b=bnwcWzI8GRSNzusqRmhPbTM3O731UFc2JAx04FZ2InAicSqr8jScPgF7EebmoHItdG 56OeoYYUytLB63cvCNH7eQ7WwAdsCtG6ZZrKMu0Ilv5e+YTbakr2TZ09kCi+x2Tto8KY JMFZ7e4+2FZ5KPBSiYQdJqpw1516Sow4wlr8F87D97Mnm6NFezPA0Sa4ZTk375k24AzY 6zmuJi4nIMNoIkVXX9UnaFDCl8Y3Z4r1HgaJgHku8fOsL48RaBu99CCeygskQG1n1ZVg rtgEkII2jYjcFPgJkv50gX6YthzevvWNUT0ljWywG1I36TQFBdEwXbLW+xdBx0CR/a+K xteA== X-Gm-Message-State: ACrzQf3tkJ61kJf0K+kBmqW7ZMO/ZRk0HJcPUTo5V6f2+UpWuC0ij5Oe OPC7nwLtewAwva8GjqfSJuZYhWE2Bhs= X-Google-Smtp-Source: AMsMyM78mynlrTK1PCvZN90lHu8vV5nZf/+uFrC+4JffEG1mtylPROFhtohVrvHl8nJxiBTIqCeqVQ== X-Received: by 2002:a5d:548d:0:b0:236:debd:f285 with SMTP id h13-20020a5d548d000000b00236debdf285mr24058614wrv.640.1667846177427; Mon, 07 Nov 2022 10:36:17 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id o15-20020a056000010f00b0023691d62cffsm8081531wrx.70.2022.11.07.10.36.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:16 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:45 +0000 Subject: [PATCH 11/30] chunk-format: parse trailing table of contents Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The new read_trailing_table_of_contents() mimics read_table_of_contents() except that it reads the table of contents in reverse from the end of the given hashfile. The file is given as a memory-mapped region and its size. Automatically calculate the start of the trailing hash and read the table of contents in reverse from that position. The error conditions mirror those in read_table_of_contents().
The one exception is that the chunk_offset cannot be checked as going into the table of contents since we do not have that length automatically. That may have some surprising results for some narrow forms of corruption. However, we do still limit the size to the size of the file plus the part of the table of contents read so far. At minimum, the given sizes can be used to limit parsing within the file itself. Signed-off-by: Derrick Stolee --- chunk-format.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++ chunk-format.h | 9 +++++++++ 2 files changed, 62 insertions(+) diff --git a/chunk-format.c b/chunk-format.c index 3f5cc9b5ddf..e836a121c5c 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -173,6 +173,59 @@ int read_table_of_contents(struct chunkfile *cf, return 0; } +int read_trailing_table_of_contents(struct chunkfile *cf, + const unsigned char *mfile, + size_t mfile_size) +{ + int i; + uint32_t chunk_id; + const unsigned char *table_of_contents = mfile + mfile_size - the_hash_algo->rawsz; + + while (1) { + uint64_t chunk_offset; + + table_of_contents -= CHUNK_TOC_ENTRY_SIZE; + + chunk_id = get_be32(table_of_contents); + chunk_offset = get_be64(table_of_contents + 4); + + /* Calculate the previous chunk size, if it exists. */ + if (cf->chunks_nr) { + off_t previous_offset = cf->chunks[cf->chunks_nr - 1].offset; + + if (chunk_offset < previous_offset || + chunk_offset > table_of_contents - mfile) { + error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""), + previous_offset, chunk_offset); + return -1; + } + + cf->chunks[cf->chunks_nr - 1].size = chunk_offset - previous_offset; + } + + /* Stop at the null chunk. We only need it for the last size. 
*/ + if (!chunk_id) + break; + + for (i = 0; i < cf->chunks_nr; i++) { + if (cf->chunks[i].id == chunk_id) { + error(_("duplicate chunk ID %"PRIx32" found"), + chunk_id); + return -1; + } + } + + ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc); + + cf->chunks[cf->chunks_nr].id = chunk_id; + cf->chunks[cf->chunks_nr].start = mfile + chunk_offset; + cf->chunks[cf->chunks_nr].offset = chunk_offset; + cf->chunks_nr++; + } + + return 0; +} + static int pair_chunk_fn(const unsigned char *chunk_start, size_t chunk_size, void *data) diff --git a/chunk-format.h b/chunk-format.h index 39e8967e950..acb8dfbce80 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -46,6 +46,15 @@ int read_table_of_contents(struct chunkfile *cf, uint64_t toc_offset, int toc_length); +/** + * Read the given chunkfile, but read the table of contents from the + * end of the given mfile. The file is expected to be a hashfile with + * the_hash_algo->rawsz bytes at the end storing the hash. + */ +int read_trailing_table_of_contents(struct chunkfile *cf, + const unsigned char *mfile, + size_t mfile_size); + #define CHUNK_NOT_FOUND (-2) /* From patchwork Mon Nov 7 18:35:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035094 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E33BEC4332F for ; Mon, 7 Nov 2022 18:37:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233182AbiKGShJ (ORCPT ); Mon, 7 Nov 2022 13:37:09 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233173AbiKGSg3 (ORCPT ); Mon, 7 Nov 2022 13:36:29 -0500 Received: from mail-wr1-x42e.google.com
(mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E71226AEC for ; Mon, 7 Nov 2022 10:36:20 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id z14so17565468wrn.7 for ; Mon, 07 Nov 2022 10:36:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=DTGeBfiVeJkj0ZQM6LquExSKUyR4SxzhZe2yP2FrgMY=; b=K6TnEnhjHnQ6c0hukeGZz5ZxX37e1C7uSqsFlQUU9YbV1ggVctuyNfuI1omaG+7cWy nKA6o+UQxdv8Pg6UAyIjSWNTbkBMyopx+6rKou+zO1HGxd1065UN58wjCbne4xgsUpwF P29wcrkNGFaA1nKQyGOnW+3ZximEMe1YXWkiA/3XMSiBfPtgcp/0D3J96mmhzkSnFg0I KUG3WxDCl7CWrsKW0Fb9V0P8IA6clT0eiaOHkAyPS3nxmKr+lCURjwmuWlJMvYDWVBw9 HnCz2vMQRhy/hD/ZCE38TC5Y1kBORMZO9jytcef1uDhMjVcuKmZ7tkSCp/0LhAZT718a j/xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DTGeBfiVeJkj0ZQM6LquExSKUyR4SxzhZe2yP2FrgMY=; b=og2oSdcObl08c3nrmMJvNGg/PeLRNKlS1wvRLREioQQHMF31Ym9RiJGBPEhpTKm5+W GGAEW/aYoNxYxuoWNrcYFTyZ1eqILUGO2quuAgwlfhABaEkGeics0mic/DX/WG4waTXP o0LLCT077OXnnCKfdQFO6dC4ampjArqSbzz5geZ5TB4c4y1nuGTbph82PosrIED1XynT 5/2lJmIt6UhQr660ILwAB7MXmlZIsZVo7CGaj7nCKqt2rvENugFAU53f4kJXOKfkDc3A 1a+sLk4JXyvr9GIv+IrpNRAWt3Suh+BCzcth5BR5a15E9+1RFRfZPR0/Qvzx4bCstfFp SjyQ== X-Gm-Message-State: ACrzQf3vecWJjfu8g0ZroWkOfALHj1TuGgFCydmjQU/ZqArAnh2ASdLE CFQb7RzadQ6WNEGP9rc6Hc0TLEO3NGY= X-Google-Smtp-Source: AMsMyM6at7FEZPeO2fAZ0s2QpIQjtEdgnhSoHT6US7nxeQKDmG6fRfM0D0w5r8Sf2cIUBaWqPlKIeg== X-Received: by 2002:a5d:56ce:0:b0:237:9917:45f5 with SMTP id m14-20020a5d56ce000000b00237991745f5mr16794693wrw.363.1667846178300; Mon, 07 Nov 2022 10:36:18 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by 
smtp.gmail.com with ESMTPSA id n17-20020a5d6611000000b002383edcde09sm8071426wru.59.2022.11.07.10.36.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:17 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:46 +0000 Subject: [PATCH 12/30] refs: extract packfile format to new file Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee In preparation for adding a new packed-refs file format, extract all code from refs/packed-backend.c that involves knowledge of the plaintext file format. This includes any parsing logic that cares about the header, plaintext lines of the form "<oid> <refname>" or "^<oid>", and the error messages when there is an issue in the file. This also includes the writing logic that writes the header or the individual references. Future changes will perform more refactoring to abstract away more of the writing process to be more generic, but this is already a large enough code movement for one change.
Signed-off-by: Derrick Stolee --- Makefile | 1 + refs/packed-backend.c | 595 ++-------------------------------------- refs/packed-backend.h | 195 +++++++++++++ refs/packed-format-v1.c | 453 ++++++++++++++++++++++++++++++ 4 files changed, 667 insertions(+), 577 deletions(-) create mode 100644 refs/packed-format-v1.c diff --git a/Makefile b/Makefile index 4927379184c..3dc887941d4 100644 --- a/Makefile +++ b/Makefile @@ -1057,6 +1057,7 @@ LIB_OBJS += refs/debug.o LIB_OBJS += refs/files-backend.o LIB_OBJS += refs/iterator.o LIB_OBJS += refs/packed-backend.o +LIB_OBJS += refs/packed-format-v1.o LIB_OBJS += refs/ref-cache.o LIB_OBJS += refspec.o LIB_OBJS += remote.o diff --git a/refs/packed-backend.c b/refs/packed-backend.c index a4371b711b9..afaf6f53233 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -36,121 +36,6 @@ static enum mmap_strategy mmap_strategy = MMAP_TEMPORARY; static enum mmap_strategy mmap_strategy = MMAP_OK; #endif -struct packed_ref_store; - -/* - * A `snapshot` represents one snapshot of a `packed-refs` file. - * - * Normally, this will be a mmapped view of the contents of the - * `packed-refs` file at the time the snapshot was created. However, - * if the `packed-refs` file was not sorted, this might point at heap - * memory holding the contents of the `packed-refs` file with its - * records sorted by refname. - * - * `snapshot` instances are reference counted (via - * `acquire_snapshot()` and `release_snapshot()`). This is to prevent - * an instance from disappearing while an iterator is still iterating - * over it. Instances are garbage collected when their `referrers` - * count goes to zero. - * - * The most recent `snapshot`, if available, is referenced by the - * `packed_ref_store`. Its freshness is checked whenever - * `get_snapshot()` is called; if the existing snapshot is obsolete, a - * new snapshot is taken. 
- */ -struct snapshot { - /* - * A back-pointer to the packed_ref_store with which this - * snapshot is associated: - */ - struct packed_ref_store *refs; - - /* Is the `packed-refs` file currently mmapped? */ - int mmapped; - - /* - * The contents of the `packed-refs` file: - * - * - buf -- a pointer to the start of the memory - * - start -- a pointer to the first byte of actual references - * (i.e., after the header line, if one is present) - * - eof -- a pointer just past the end of the reference - * contents - * - * If the `packed-refs` file was already sorted, `buf` points - * at the mmapped contents of the file. If not, it points at - * heap-allocated memory containing the contents, sorted. If - * there were no contents (e.g., because the file didn't - * exist), `buf`, `start`, and `eof` are all NULL. - */ - char *buf, *start, *eof; - - /* - * What is the peeled state of the `packed-refs` file that - * this snapshot represents? (This is usually determined from - * the file's header.) - */ - enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; - - /* - * Count of references to this instance, including the pointer - * from `packed_ref_store::snapshot`, if any. The instance - * will not be freed as long as the reference count is - * nonzero. - */ - unsigned int referrers; - - /* - * The metadata of the `packed-refs` file from which this - * snapshot was created, used to tell if the file has been - * replaced since we read it. - */ - struct stat_validity validity; -}; - -/* - * A `ref_store` representing references stored in a `packed-refs` - * file. It implements the `ref_store` interface, though it has some - * limitations: - * - * - It cannot store symbolic references. - * - * - It cannot store reflogs. - * - * - It does not support reference renaming (though it could). - * - * On the other hand, it can be locked outside of a reference - * transaction. 
In that case, it remains locked even after the - * transaction is done and the new `packed-refs` file is activated. - */ -struct packed_ref_store { - struct ref_store base; - - unsigned int store_flags; - - /* The path of the "packed-refs" file: */ - char *path; - - /* - * A snapshot of the values read from the `packed-refs` file, - * if it might still be current; otherwise, NULL. - */ - struct snapshot *snapshot; - - /* - * Lock used for the "packed-refs" file. Note that this (and - * thus the enclosing `packed_ref_store`) must not be freed. - */ - struct lock_file lock; - - /* - * Temporary file used when rewriting new contents to the - * "packed-refs" file. Note that this (and thus the enclosing - * `packed_ref_store`) must not be freed. - */ - struct tempfile *tempfile; -}; - /* * Increment the reference count of `*snapshot`. */ @@ -164,7 +49,7 @@ static void acquire_snapshot(struct snapshot *snapshot) * memory and close the file, or free the memory. Then set the buffer * pointers to NULL. 
*/ -static void clear_snapshot_buffer(struct snapshot *snapshot) +void clear_snapshot_buffer(struct snapshot *snapshot) { if (snapshot->mmapped) { if (munmap(snapshot->buf, snapshot->eof - snapshot->buf)) @@ -245,224 +130,6 @@ static void clear_snapshot(struct packed_ref_store *refs) } } -static NORETURN void die_unterminated_line(const char *path, - const char *p, size_t len) -{ - if (len < 80) - die("unterminated line in %s: %.*s", path, (int)len, p); - else - die("unterminated line in %s: %.75s...", path, p); -} - -static NORETURN void die_invalid_line(const char *path, - const char *p, size_t len) -{ - const char *eol = memchr(p, '\n', len); - - if (!eol) - die_unterminated_line(path, p, len); - else if (eol - p < 80) - die("unexpected line in %s: %.*s", path, (int)(eol - p), p); - else - die("unexpected line in %s: %.75s...", path, p); - -} - -struct snapshot_record { - const char *start; - size_t len; -}; - -static int cmp_packed_ref_records(const void *v1, const void *v2) -{ - const struct snapshot_record *e1 = v1, *e2 = v2; - const char *r1 = e1->start + the_hash_algo->hexsz + 1; - const char *r2 = e2->start + the_hash_algo->hexsz + 1; - - while (1) { - if (*r1 == '\n') - return *r2 == '\n' ? 0 : -1; - if (*r1 != *r2) { - if (*r2 == '\n') - return 1; - else - return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; - } - r1++; - r2++; - } -} - -/* - * Compare a snapshot record at `rec` to the specified NUL-terminated - * refname. - */ -static int cmp_record_to_refname(const char *rec, const char *refname) -{ - const char *r1 = rec + the_hash_algo->hexsz + 1; - const char *r2 = refname; - - while (1) { - if (*r1 == '\n') - return *r2 ? -1 : 0; - if (!*r2) - return 1; - if (*r1 != *r2) - return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; - r1++; - r2++; - } -} - -/* - * `snapshot->buf` is not known to be sorted. Check whether it is, and - * if not, sort it into new memory and munmap/free the old storage. 
- */ -static void sort_snapshot(struct snapshot *snapshot) -{ - struct snapshot_record *records = NULL; - size_t alloc = 0, nr = 0; - int sorted = 1; - const char *pos, *eof, *eol; - size_t len, i; - char *new_buffer, *dst; - - pos = snapshot->start; - eof = snapshot->eof; - - if (pos == eof) - return; - - len = eof - pos; - - /* - * Initialize records based on a crude estimate of the number - * of references in the file (we'll grow it below if needed): - */ - ALLOC_GROW(records, len / 80 + 20, alloc); - - while (pos < eof) { - eol = memchr(pos, '\n', eof - pos); - if (!eol) - /* The safety check should prevent this. */ - BUG("unterminated line found in packed-refs"); - if (eol - pos < the_hash_algo->hexsz + 2) - die_invalid_line(snapshot->refs->path, - pos, eof - pos); - eol++; - if (eol < eof && *eol == '^') { - /* - * Keep any peeled line together with its - * reference: - */ - const char *peeled_start = eol; - - eol = memchr(peeled_start, '\n', eof - peeled_start); - if (!eol) - /* The safety check should prevent this. */ - BUG("unterminated peeled line found in packed-refs"); - eol++; - } - - ALLOC_GROW(records, nr + 1, alloc); - records[nr].start = pos; - records[nr].len = eol - pos; - nr++; - - if (sorted && - nr > 1 && - cmp_packed_ref_records(&records[nr - 2], - &records[nr - 1]) >= 0) - sorted = 0; - - pos = eol; - } - - if (sorted) - goto cleanup; - - /* We need to sort the memory. 
First we sort the records array: */ - QSORT(records, nr, cmp_packed_ref_records); - - /* - * Allocate a new chunk of memory, and copy the old memory to - * the new in the order indicated by `records` (not bothering - * with the header line): - */ - new_buffer = xmalloc(len); - for (dst = new_buffer, i = 0; i < nr; i++) { - memcpy(dst, records[i].start, records[i].len); - dst += records[i].len; - } - - /* - * Now munmap the old buffer and use the sorted buffer in its - * place: - */ - clear_snapshot_buffer(snapshot); - snapshot->buf = snapshot->start = new_buffer; - snapshot->eof = new_buffer + len; - -cleanup: - free(records); -} - -/* - * Return a pointer to the start of the record that contains the - * character `*p` (which must be within the buffer). If no other - * record start is found, return `buf`. - */ -static const char *find_start_of_record(const char *buf, const char *p) -{ - while (p > buf && (p[-1] != '\n' || p[0] == '^')) - p--; - return p; -} - -/* - * Return a pointer to the start of the record following the record - * that contains `*p`. If none is found before `end`, return `end`. - */ -static const char *find_end_of_record(const char *p, const char *end) -{ - while (++p < end && (p[-1] != '\n' || p[0] == '^')) - ; - return p; -} - -/* - * We want to be able to compare mmapped reference records quickly, - * without totally parsing them. We can do so because the records are - * LF-terminated, and the refname should start exactly (GIT_SHA1_HEXSZ - * + 1) bytes past the beginning of the record. - * - * But what if the `packed-refs` file contains garbage? We're willing - * to tolerate not detecting the problem, as long as we don't produce - * totally garbled output (we can't afford to check the integrity of - * the whole file during every Git invocation). But we do want to be - * sure that we never read past the end of the buffer in memory and - * perform an illegal memory access. 
- * - * Guarantee that minimum level of safety by verifying that the last - * record in the file is LF-terminated, and that it has at least - * (GIT_SHA1_HEXSZ + 1) characters before the LF. Die if either of - * these checks fails. - */ -static void verify_buffer_safe(struct snapshot *snapshot) -{ - const char *start = snapshot->start; - const char *eof = snapshot->eof; - const char *last_line; - - if (start == eof) - return; - - last_line = find_start_of_record(start, eof - 1); - if (*(eof - 1) != '\n' || eof - last_line < the_hash_algo->hexsz + 2) - die_invalid_line(snapshot->refs->path, - last_line, eof - last_line); -} - #define SMALL_FILE_SIZE (32*1024) /* @@ -524,67 +191,6 @@ static int load_contents(struct snapshot *snapshot) return 1; } -/* - * Find the place in `snapshot->buf` where the start of the record for - * `refname` starts. If `mustexist` is true and the reference doesn't - * exist, then return NULL. If `mustexist` is false and the reference - * doesn't exist, then return the point where that reference would be - * inserted, or `snapshot->eof` (which might be NULL) if it would be - * inserted at the end of the file. In the latter mode, `refname` - * doesn't have to be a proper reference name; for example, one could - * search for "refs/replace/" to find the start of any replace - * references. - * - * The record is sought using a binary search, so `snapshot->buf` must - * be sorted. - */ -static const char *find_reference_location(struct snapshot *snapshot, - const char *refname, int mustexist) -{ - /* - * This is not *quite* a garden-variety binary search, because - * the data we're searching is made up of records, and we - * always need to find the beginning of a record to do a - * comparison. A "record" here is one line for the reference - * itself and zero or one peel lines that start with '^'. Our - * loop invariant is described in the next two comments. 
- */ - - /* - * A pointer to the character at the start of a record whose - * preceding records all have reference names that come - * *before* `refname`. - */ - const char *lo = snapshot->start; - - /* - * A pointer to a the first character of a record whose - * reference name comes *after* `refname`. - */ - const char *hi = snapshot->eof; - - while (lo != hi) { - const char *mid, *rec; - int cmp; - - mid = lo + (hi - lo) / 2; - rec = find_start_of_record(lo, mid); - cmp = cmp_record_to_refname(rec, refname); - if (cmp < 0) { - lo = find_end_of_record(mid, hi); - } else if (cmp > 0) { - hi = rec; - } else { - return rec; - } - } - - if (mustexist) - return NULL; - else - return lo; -} - /* * Create a newly-allocated `snapshot` of the `packed-refs` file in * its current state and return it. The return value will already have @@ -630,54 +236,22 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) if (!load_contents(snapshot)) return snapshot; - /* If the file has a header line, process it: */ - if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') { - char *tmp, *p, *eol; - struct string_list traits = STRING_LIST_INIT_NODUP; - - eol = memchr(snapshot->buf, '\n', - snapshot->eof - snapshot->buf); - if (!eol) - die_unterminated_line(refs->path, - snapshot->buf, - snapshot->eof - snapshot->buf); - - tmp = xmemdupz(snapshot->buf, eol - snapshot->buf); - - if (!skip_prefix(tmp, "# pack-refs with:", (const char **)&p)) - die_invalid_line(refs->path, - snapshot->buf, - snapshot->eof - snapshot->buf); - - string_list_split_in_place(&traits, p, ' ', -1); - - if (unsorted_string_list_has_string(&traits, "fully-peeled")) - snapshot->peeled = PEELED_FULLY; - else if (unsorted_string_list_has_string(&traits, "peeled")) - snapshot->peeled = PEELED_TAGS; - - sorted = unsorted_string_list_has_string(&traits, "sorted"); - - /* perhaps other traits later as well */ - - /* The "+ 1" is for the LF character. 
*/ - snapshot->start = eol + 1; - - string_list_clear(&traits, 0); - free(tmp); + if (parse_packed_format_v1_header(refs, snapshot, &sorted)) { + clear_snapshot(refs); + return NULL; } - verify_buffer_safe(snapshot); + verify_buffer_safe_v1(snapshot); if (!sorted) { - sort_snapshot(snapshot); + sort_snapshot_v1(snapshot); /* * Reordering the records might have moved a short one * to the end of the buffer, so verify the buffer's * safety again: */ - verify_buffer_safe(snapshot); + verify_buffer_safe_v1(snapshot); } if (mmap_strategy != MMAP_OK && snapshot->mmapped) { @@ -735,55 +309,11 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, struct packed_ref_store *refs = packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref"); struct snapshot *snapshot = get_snapshot(refs); - const char *rec; - - *type = 0; - rec = find_reference_location(snapshot, refname, 1); - - if (!rec) { - /* refname is not a packed reference. */ - *failure_errno = ENOENT; - return -1; - } - - if (get_oid_hex(rec, oid)) - die_invalid_line(refs->path, rec, snapshot->eof - rec); - - *type = REF_ISPACKED; - return 0; + return packed_read_raw_ref_v1(refs, snapshot, refname, + oid, type, failure_errno); } -/* - * This value is set in `base.flags` if the peeled value of the - * current reference is known. In that case, `peeled` contains the - * correct peeled value for the reference, which might be `null_oid` - * if the reference is not a tag or if it is broken. - */ -#define REF_KNOWS_PEELED 0x40 - -/* - * An iterator over a snapshot of a `packed-refs` file. 
- */ -struct packed_ref_iterator { - struct ref_iterator base; - - struct snapshot *snapshot; - - /* The current position in the snapshot's buffer: */ - const char *pos; - - /* The end of the part of the buffer that will be iterated over: */ - const char *eof; - - /* Scratch space for current values: */ - struct object_id oid, peeled; - struct strbuf refname_buf; - - struct repository *repo; - unsigned int flags; -}; - /* * Move the iterator to the next record in the snapshot, without * respect for whether the record is actually required by the current @@ -793,68 +323,7 @@ struct packed_ref_iterator { */ static int next_record(struct packed_ref_iterator *iter) { - const char *p = iter->pos, *eol; - - strbuf_reset(&iter->refname_buf); - - if (iter->pos == iter->eof) - return ITER_DONE; - - iter->base.flags = REF_ISPACKED; - - if (iter->eof - p < the_hash_algo->hexsz + 2 || - parse_oid_hex(p, &iter->oid, &p) || - !isspace(*p++)) - die_invalid_line(iter->snapshot->refs->path, - iter->pos, iter->eof - iter->pos); - - eol = memchr(p, '\n', iter->eof - p); - if (!eol) - die_unterminated_line(iter->snapshot->refs->path, - iter->pos, iter->eof - iter->pos); - - strbuf_add(&iter->refname_buf, p, eol - p); - iter->base.refname = iter->refname_buf.buf; - - if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { - if (!refname_is_safe(iter->base.refname)) - die("packed refname is dangerous: %s", - iter->base.refname); - oidclr(&iter->oid); - iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; - } - if (iter->snapshot->peeled == PEELED_FULLY || - (iter->snapshot->peeled == PEELED_TAGS && - starts_with(iter->base.refname, "refs/tags/"))) - iter->base.flags |= REF_KNOWS_PEELED; - - iter->pos = eol + 1; - - if (iter->pos < iter->eof && *iter->pos == '^') { - p = iter->pos + 1; - if (iter->eof - p < the_hash_algo->hexsz + 1 || - parse_oid_hex(p, &iter->peeled, &p) || - *p++ != '\n') - die_invalid_line(iter->snapshot->refs->path, - iter->pos, iter->eof - iter->pos); - 
iter->pos = p; - - /* - * Regardless of what the file header said, we - * definitely know the value of *this* reference. But - * we suppress it if the reference is broken: - */ - if ((iter->base.flags & REF_ISBROKEN)) { - oidclr(&iter->peeled); - iter->base.flags &= ~REF_KNOWS_PEELED; - } else { - iter->base.flags |= REF_KNOWS_PEELED; - } - } else { - oidclr(&iter->peeled); - } - - return ITER_OK; + return next_record_v1(iter); } static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) @@ -942,7 +411,7 @@ static struct ref_iterator *packed_ref_iterator_begin( snapshot = get_snapshot(refs); if (prefix && *prefix) - start = find_reference_location(snapshot, prefix, 0); + start = find_reference_location_v1(snapshot, prefix, 0); else start = snapshot->start; @@ -972,23 +441,6 @@ static struct ref_iterator *packed_ref_iterator_begin( return ref_iterator; } -/* - * Write an entry to the packed-refs file for the specified refname. - * If peeled is non-NULL, write it as the entry's peeled value. On - * error, return a nonzero value and leave errno set at the value left - * by the failing call to `fprintf()`. - */ -static int write_packed_entry(FILE *fh, const char *refname, - const struct object_id *oid, - const struct object_id *peeled) -{ - if (fprintf(fh, "%s %s\n", oid_to_hex(oid), refname) < 0 || - (peeled && fprintf(fh, "^%s\n", oid_to_hex(peeled)) < 0)) - return -1; - - return 0; -} - int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err) { struct packed_ref_store *refs = @@ -1070,17 +522,6 @@ int packed_refs_is_locked(struct ref_store *ref_store) return is_lock_file_locked(&refs->lock); } -/* - * The packed-refs header line that we write out. Perhaps other traits - * will be added later. - * - * Note that earlier versions of Git used to parse these traits by - * looking for " trait " in the line. For this reason, the space after - * the colon and the trailing space are required. 
- */ -static const char PACKED_REFS_HEADER[] = - "# pack-refs with: peeled fully-peeled sorted \n"; - static int packed_init_db(struct ref_store *ref_store UNUSED, struct strbuf *err UNUSED) { @@ -1136,7 +577,7 @@ static int write_with_updates(struct packed_ref_store *refs, goto error; } - if (fprintf(out, "%s", PACKED_REFS_HEADER) < 0) + if (write_packed_file_header_v1(out) < 0) goto write_error; /* @@ -1230,9 +671,9 @@ static int write_with_updates(struct packed_ref_store *refs, struct object_id peeled; int peel_error = ref_iterator_peel(iter, &peeled); - if (write_packed_entry(out, iter->refname, - iter->oid, - peel_error ? NULL : &peeled)) + if (write_packed_entry_v1(out, iter->refname, + iter->oid, + peel_error ? NULL : &peeled)) goto write_error; if ((ok = ref_iterator_advance(iter)) != ITER_OK) @@ -1251,9 +692,9 @@ static int write_with_updates(struct packed_ref_store *refs, int peel_error = peel_object(&update->new_oid, &peeled); - if (write_packed_entry(out, update->refname, - &update->new_oid, - peel_error ? NULL : &peeled)) + if (write_packed_entry_v1(out, update->refname, + &update->new_oid, + peel_error ? NULL : &peeled)) goto write_error; i++; diff --git a/refs/packed-backend.h b/refs/packed-backend.h index 9dd8a344c34..143ed6d4f6c 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -1,6 +1,10 @@ #ifndef REFS_PACKED_BACKEND_H #define REFS_PACKED_BACKEND_H +#include "../cache.h" +#include "refs-internal.h" +#include "../lockfile.h" + struct repository; struct ref_transaction; @@ -36,4 +40,195 @@ int packed_refs_is_locked(struct ref_store *ref_store); int is_packed_transaction_needed(struct ref_store *ref_store, struct ref_transaction *transaction); +struct packed_ref_store; + +/* + * A `snapshot` represents one snapshot of a `packed-refs` file. + * + * Normally, this will be a mmapped view of the contents of the + * `packed-refs` file at the time the snapshot was created. 
However, + * if the `packed-refs` file was not sorted, this might point at heap + * memory holding the contents of the `packed-refs` file with its + * records sorted by refname. + * + * `snapshot` instances are reference counted (via + * `acquire_snapshot()` and `release_snapshot()`). This is to prevent + * an instance from disappearing while an iterator is still iterating + * over it. Instances are garbage collected when their `referrers` + * count goes to zero. + * + * The most recent `snapshot`, if available, is referenced by the + * `packed_ref_store`. Its freshness is checked whenever + * `get_snapshot()` is called; if the existing snapshot is obsolete, a + * new snapshot is taken. + */ +struct snapshot { + /* + * A back-pointer to the packed_ref_store with which this + * snapshot is associated: + */ + struct packed_ref_store *refs; + + /* Is the `packed-refs` file currently mmapped? */ + int mmapped; + + /* + * The contents of the `packed-refs` file: + * + * - buf -- a pointer to the start of the memory + * - start -- a pointer to the first byte of actual references + * (i.e., after the header line, if one is present) + * - eof -- a pointer just past the end of the reference + * contents + * + * If the `packed-refs` file was already sorted, `buf` points + * at the mmapped contents of the file. If not, it points at + * heap-allocated memory containing the contents, sorted. If + * there were no contents (e.g., because the file didn't + * exist), `buf`, `start`, and `eof` are all NULL. + */ + char *buf, *start, *eof; + + /* + * What is the peeled state of the `packed-refs` file that + * this snapshot represents? (This is usually determined from + * the file's header.) + */ + enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; + + /* + * Count of references to this instance, including the pointer + * from `packed_ref_store::snapshot`, if any. The instance + * will not be freed as long as the reference count is + * nonzero. 
+ */ + unsigned int referrers; + + /* + * The metadata of the `packed-refs` file from which this + * snapshot was created, used to tell if the file has been + * replaced since we read it. + */ + struct stat_validity validity; +}; + +/* + * If the buffer in `snapshot` is active, then either munmap the + * memory and close the file, or free the memory. Then set the buffer + * pointers to NULL. + */ +void clear_snapshot_buffer(struct snapshot *snapshot); + +/* + * A `ref_store` representing references stored in a `packed-refs` + * file. It implements the `ref_store` interface, though it has some + * limitations: + * + * - It cannot store symbolic references. + * + * - It cannot store reflogs. + * + * - It does not support reference renaming (though it could). + * + * On the other hand, it can be locked outside of a reference + * transaction. In that case, it remains locked even after the + * transaction is done and the new `packed-refs` file is activated. + */ +struct packed_ref_store { + struct ref_store base; + + unsigned int store_flags; + + /* The path of the "packed-refs" file: */ + char *path; + + /* + * A snapshot of the values read from the `packed-refs` file, + * if it might still be current; otherwise, NULL. + */ + struct snapshot *snapshot; + + /* + * Lock used for the "packed-refs" file. Note that this (and + * thus the enclosing `packed_ref_store`) must not be freed. + */ + struct lock_file lock; + + /* + * Temporary file used when rewriting new contents to the + * "packed-refs" file. Note that this (and thus the enclosing + * `packed_ref_store`) must not be freed. + */ + struct tempfile *tempfile; +}; + +/* + * This value is set in `base.flags` if the peeled value of the + * current reference is known. In that case, `peeled` contains the + * correct peeled value for the reference, which might be `null_oid` + * if the reference is not a tag or if it is broken. 
+ */ +#define REF_KNOWS_PEELED 0x40 + +/* + * An iterator over a snapshot of a `packed-refs` file. + */ +struct packed_ref_iterator { + struct ref_iterator base; + + struct snapshot *snapshot; + + /* The current position in the snapshot's buffer: */ + const char *pos; + + /* The end of the part of the buffer that will be iterated over: */ + const char *eof; + + /* Scratch space for current values: */ + struct object_id oid, peeled; + struct strbuf refname_buf; + + struct repository *repo; + unsigned int flags; +}; + +/** + * Parse the buffer at the given snapshot to verify that it is a + * packed-refs file in version 1 format. Update the snapshot->peeled + * value according to the header information. Update the given + * 'sorted' value with whether or not the packed-refs file is sorted. + */ +int parse_packed_format_v1_header(struct packed_ref_store *refs, + struct snapshot *snapshot, + int *sorted); + +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL. If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. 
+ */ +const char *find_reference_location_v1(struct snapshot *snapshot, + const char *refname, int mustexist); + +int packed_read_raw_ref_v1(struct packed_ref_store *refs, struct snapshot *snapshot, + const char *refname, struct object_id *oid, + unsigned int *type, int *failure_errno); + +void verify_buffer_safe_v1(struct snapshot *snapshot); +void sort_snapshot_v1(struct snapshot *snapshot); +int write_packed_file_header_v1(FILE *out); +int next_record_v1(struct packed_ref_iterator *iter); +int write_packed_entry_v1(FILE *fh, const char *refname, + const struct object_id *oid, + const struct object_id *peeled); + #endif /* REFS_PACKED_BACKEND_H */ diff --git a/refs/packed-format-v1.c b/refs/packed-format-v1.c new file mode 100644 index 00000000000..ef9e6618c89 --- /dev/null +++ b/refs/packed-format-v1.c @@ -0,0 +1,453 @@ +#include "../cache.h" +#include "../config.h" +#include "../refs.h" +#include "refs-internal.h" +#include "packed-backend.h" +#include "../iterator.h" +#include "../lockfile.h" +#include "../chdir-notify.h" + +static NORETURN void die_unterminated_line(const char *path, + const char *p, size_t len) +{ + if (len < 80) + die("unterminated line in %s: %.*s", path, (int)len, p); + else + die("unterminated line in %s: %.75s...", path, p); +} + +static NORETURN void die_invalid_line(const char *path, + const char *p, size_t len) +{ + const char *eol = memchr(p, '\n', len); + + if (!eol) + die_unterminated_line(path, p, len); + else if (eol - p < 80) + die("unexpected line in %s: %.*s", path, (int)(eol - p), p); + else + die("unexpected line in %s: %.75s...", path, p); +} + +struct snapshot_record { + const char *start; + size_t len; +}; + +static int cmp_packed_ref_records(const void *v1, const void *v2) +{ + const struct snapshot_record *e1 = v1, *e2 = v2; + const char *r1 = e1->start + the_hash_algo->hexsz + 1; + const char *r2 = e2->start + the_hash_algo->hexsz + 1; + + while (1) { + if (*r1 == '\n') + return *r2 == '\n' ? 
0 : -1; + if (*r1 != *r2) { + if (*r2 == '\n') + return 1; + else + return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; + } + r1++; + r2++; + } +} + +/* + * Compare a snapshot record at `rec` to the specified NUL-terminated + * refname. + */ +static int cmp_record_to_refname(const char *rec, const char *refname) +{ + const char *r1 = rec + the_hash_algo->hexsz + 1; + const char *r2 = refname; + + while (1) { + if (*r1 == '\n') + return *r2 ? -1 : 0; + if (!*r2) + return 1; + if (*r1 != *r2) + return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; + r1++; + r2++; + } +} + +/* + * `snapshot->buf` is not known to be sorted. Check whether it is, and + * if not, sort it into new memory and munmap/free the old storage. + */ +void sort_snapshot_v1(struct snapshot *snapshot) +{ + struct snapshot_record *records = NULL; + size_t alloc = 0, nr = 0; + int sorted = 1; + const char *pos, *eof, *eol; + size_t len, i; + char *new_buffer, *dst; + + pos = snapshot->start; + eof = snapshot->eof; + + if (pos == eof) + return; + + len = eof - pos; + + /* + * Initialize records based on a crude estimate of the number + * of references in the file (we'll grow it below if needed): + */ + ALLOC_GROW(records, len / 80 + 20, alloc); + + while (pos < eof) { + eol = memchr(pos, '\n', eof - pos); + if (!eol) + /* The safety check should prevent this. */ + BUG("unterminated line found in packed-refs"); + if (eol - pos < the_hash_algo->hexsz + 2) + die_invalid_line(snapshot->refs->path, + pos, eof - pos); + eol++; + if (eol < eof && *eol == '^') { + /* + * Keep any peeled line together with its + * reference: + */ + const char *peeled_start = eol; + + eol = memchr(peeled_start, '\n', eof - peeled_start); + if (!eol) + /* The safety check should prevent this. 
*/ + BUG("unterminated peeled line found in packed-refs"); + eol++; + } + + ALLOC_GROW(records, nr + 1, alloc); + records[nr].start = pos; + records[nr].len = eol - pos; + nr++; + + if (sorted && + nr > 1 && + cmp_packed_ref_records(&records[nr - 2], + &records[nr - 1]) >= 0) + sorted = 0; + + pos = eol; + } + + if (sorted) + goto cleanup; + + /* We need to sort the memory. First we sort the records array: */ + QSORT(records, nr, cmp_packed_ref_records); + + /* + * Allocate a new chunk of memory, and copy the old memory to + * the new in the order indicated by `records` (not bothering + * with the header line): + */ + new_buffer = xmalloc(len); + for (dst = new_buffer, i = 0; i < nr; i++) { + memcpy(dst, records[i].start, records[i].len); + dst += records[i].len; + } + + /* + * Now munmap the old buffer and use the sorted buffer in its + * place: + */ + clear_snapshot_buffer(snapshot); + snapshot->buf = snapshot->start = new_buffer; + snapshot->eof = new_buffer + len; + +cleanup: + free(records); +} + +/* + * Return a pointer to the start of the record that contains the + * character `*p` (which must be within the buffer). If no other + * record start is found, return `buf`. + */ +static const char *find_start_of_record(const char *buf, const char *p) +{ + while (p > buf && (p[-1] != '\n' || p[0] == '^')) + p--; + return p; +} + +/* + * Return a pointer to the start of the record following the record + * that contains `*p`. If none is found before `end`, return `end`. + */ +static const char *find_end_of_record(const char *p, const char *end) +{ + while (++p < end && (p[-1] != '\n' || p[0] == '^')) + ; + return p; +} + +/* + * We want to be able to compare mmapped reference records quickly, + * without totally parsing them. We can do so because the records are + * LF-terminated, and the refname should start exactly (GIT_SHA1_HEXSZ + * + 1) bytes past the beginning of the record. + * + * But what if the `packed-refs` file contains garbage? 
We're willing + * to tolerate not detecting the problem, as long as we don't produce + * totally garbled output (we can't afford to check the integrity of + * the whole file during every Git invocation). But we do want to be + * sure that we never read past the end of the buffer in memory and + * perform an illegal memory access. + * + * Guarantee that minimum level of safety by verifying that the last + * record in the file is LF-terminated, and that it has at least + * (GIT_SHA1_HEXSZ + 1) characters before the LF. Die if either of + * these checks fails. + */ +void verify_buffer_safe_v1(struct snapshot *snapshot) +{ + const char *start = snapshot->start; + const char *eof = snapshot->eof; + const char *last_line; + + if (start == eof) + return; + + last_line = find_start_of_record(start, eof - 1); + if (*(eof - 1) != '\n' || eof - last_line < the_hash_algo->hexsz + 2) + die_invalid_line(snapshot->refs->path, + last_line, eof - last_line); +} + +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL. If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. + */ +const char *find_reference_location_v1(struct snapshot *snapshot, + const char *refname, int mustexist) +{ + /* + * This is not *quite* a garden-variety binary search, because + * the data we're searching is made up of records, and we + * always need to find the beginning of a record to do a + * comparison. 
A "record" here is one line for the reference + * itself and zero or one peel lines that start with '^'. Our + * loop invariant is described in the next two comments. + */ + + /* + * A pointer to the character at the start of a record whose + * preceding records all have reference names that come + * *before* `refname`. + */ + const char *lo = snapshot->start; + + /* + * A pointer to a the first character of a record whose + * reference name comes *after* `refname`. + */ + const char *hi = snapshot->eof; + + while (lo != hi) { + const char *mid, *rec; + int cmp; + + mid = lo + (hi - lo) / 2; + rec = find_start_of_record(lo, mid); + cmp = cmp_record_to_refname(rec, refname); + if (cmp < 0) { + lo = find_end_of_record(mid, hi); + } else if (cmp > 0) { + hi = rec; + } else { + return rec; + } + } + + if (mustexist) + return NULL; + else + return lo; +} + +int parse_packed_format_v1_header(struct packed_ref_store *refs, + struct snapshot *snapshot, + int *sorted) +{ + *sorted = 0; + /* If the file has a header line, process it: */ + if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') { + char *tmp, *p, *eol; + struct string_list traits = STRING_LIST_INIT_NODUP; + + eol = memchr(snapshot->buf, '\n', + snapshot->eof - snapshot->buf); + if (!eol) + die_unterminated_line(refs->path, + snapshot->buf, + snapshot->eof - snapshot->buf); + + tmp = xmemdupz(snapshot->buf, eol - snapshot->buf); + + if (!skip_prefix(tmp, "# pack-refs with:", (const char **)&p)) + die_invalid_line(refs->path, + snapshot->buf, + snapshot->eof - snapshot->buf); + + string_list_split_in_place(&traits, p, ' ', -1); + + if (unsorted_string_list_has_string(&traits, "fully-peeled")) + snapshot->peeled = PEELED_FULLY; + else if (unsorted_string_list_has_string(&traits, "peeled")) + snapshot->peeled = PEELED_TAGS; + + *sorted = unsorted_string_list_has_string(&traits, "sorted"); + + /* perhaps other traits later as well */ + + /* The "+ 1" is for the LF character. 
*/ + snapshot->start = eol + 1; + + string_list_clear(&traits, 0); + free(tmp); + } + + return 0; +} + +int packed_read_raw_ref_v1(struct packed_ref_store *refs, struct snapshot *snapshot, + const char *refname, struct object_id *oid, + unsigned int *type, int *failure_errno) +{ + const char *rec; + + *type = 0; + + rec = find_reference_location_v1(snapshot, refname, 1); + + if (!rec) { + /* refname is not a packed reference. */ + *failure_errno = ENOENT; + return -1; + } + + if (get_oid_hex(rec, oid)) + die_invalid_line(refs->path, rec, snapshot->eof - rec); + + *type = REF_ISPACKED; + return 0; +} + +int next_record_v1(struct packed_ref_iterator *iter) +{ + const char *p = iter->pos, *eol; + + strbuf_reset(&iter->refname_buf); + + if (iter->pos == iter->eof) + return ITER_DONE; + + iter->base.flags = REF_ISPACKED; + + if (iter->eof - p < the_hash_algo->hexsz + 2 || + parse_oid_hex(p, &iter->oid, &p) || + !isspace(*p++)) + die_invalid_line(iter->snapshot->refs->path, + iter->pos, iter->eof - iter->pos); + + eol = memchr(p, '\n', iter->eof - p); + if (!eol) + die_unterminated_line(iter->snapshot->refs->path, + iter->pos, iter->eof - iter->pos); + + strbuf_add(&iter->refname_buf, p, eol - p); + iter->base.refname = iter->refname_buf.buf; + + if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(iter->base.refname)) + die("packed refname is dangerous: %s", + iter->base.refname); + oidclr(&iter->oid); + iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; + } + if (iter->snapshot->peeled == PEELED_FULLY || + (iter->snapshot->peeled == PEELED_TAGS && + starts_with(iter->base.refname, "refs/tags/"))) + iter->base.flags |= REF_KNOWS_PEELED; + + iter->pos = eol + 1; + + if (iter->pos < iter->eof && *iter->pos == '^') { + p = iter->pos + 1; + if (iter->eof - p < the_hash_algo->hexsz + 1 || + parse_oid_hex(p, &iter->peeled, &p) || + *p++ != '\n') + die_invalid_line(iter->snapshot->refs->path, + iter->pos, iter->eof - iter->pos); + 
iter->pos = p; + + /* + * Regardless of what the file header said, we + * definitely know the value of *this* reference. But + * we suppress it if the reference is broken: + */ + if ((iter->base.flags & REF_ISBROKEN)) { + oidclr(&iter->peeled); + iter->base.flags &= ~REF_KNOWS_PEELED; + } else { + iter->base.flags |= REF_KNOWS_PEELED; + } + } else { + oidclr(&iter->peeled); + } + + return ITER_OK; +} + +/* + * The packed-refs header line that we write out. Perhaps other traits + * will be added later. + * + * Note that earlier versions of Git used to parse these traits by + * looking for " trait " in the line. For this reason, the space after + * the colon and the trailing space are required. + */ +static const char PACKED_REFS_HEADER[] = + "# pack-refs with: peeled fully-peeled sorted \n"; + +int write_packed_file_header_v1(FILE *out) +{ + return fprintf(out, "%s", PACKED_REFS_HEADER); +} + +/* + * Write an entry to the packed-refs file for the specified refname. + * If peeled is non-NULL, write it as the entry's peeled value. On + * error, return a nonzero value and leave errno set at the value left + * by the failing call to `fprintf()`. 
 + */ +int write_packed_entry_v1(FILE *fh, const char *refname, + const struct object_id *oid, + const struct object_id *peeled) +{ + if (fprintf(fh, "%s %s\n", oid_to_hex(oid), refname) < 0 || + (peeled && fprintf(fh, "^%s\n", oid_to_hex(peeled)) < 0)) + return -1; + + return 0; +}

From patchwork Mon Nov 7 18:35:47 2022
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13035093
Date: Mon, 07 Nov 2022 18:35:47 +0000
Subject: [PATCH 13/30] packed-backend: extract add_write_error()
To: git@vger.kernel.org
Cc: jrnieder@gmail.com, Derrick Stolee
From: Derrick Stolee

The write_with_updates() method uses a write_error label to jump to code that adds an error message before exiting with an error.
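As an illustration of the pattern this message describes, here is a minimal standalone sketch, not Git's code. The names `struct errbuf`, `note_write_error()` and `write_lines()` are hypothetical stand-ins for `struct strbuf`, `add_write_error()` and `write_with_updates()`. Each failing write records its own message and then jumps to the one remaining cleanup label:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the strbuf that collects the error message. */
struct errbuf {
	char msg[256];
};

/* Plays the role of add_write_error(): record which path failed and why. */
void note_write_error(struct errbuf *err, const char *path)
{
	snprintf(err->msg, sizeof(err->msg), "error writing to %s: %s",
		 path, strerror(errno));
}

/*
 * Write a header line and `nr` further lines to `out`. Every failure
 * site adds its message in place and then jumps to the single `error`
 * label, so no separate "write_error" label is needed.
 */
int write_lines(FILE *out, const char *path,
		const char **lines, size_t nr, struct errbuf *err)
{
	size_t i;

	if (fprintf(out, "# header\n") < 0) {
		note_write_error(err, path);
		goto error;
	}

	for (i = 0; i < nr; i++) {
		if (fprintf(out, "%s\n", lines[i]) < 0) {
			note_write_error(err, path);
			goto error;
		}
	}

	return 0;

error:
	/* Shared cleanup (aborting iterators, dropping a tempfile) goes here. */
	return -1;
}
```

Because the message is produced at the call site, the loop body could later be moved into its own function without needing access to a label in the caller.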
This appears both when the packed-refs file header is written and when a ref line is written to the packed-refs file. A future change will abstract the loop that writes the refs out of write_with_updates(), making the goto an inconvenient pattern. For now, remove the distinction between "goto write_error" and "goto error" by adding the message in-line using the new static method add_write_error(). This is functionally equivalent, but will make the next step easier.
Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index afaf6f53233..ef8060f2e08 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -529,6 +529,12 @@ static int packed_init_db(struct ref_store *ref_store UNUSED, return 0; } +static void add_write_error(struct packed_ref_store *refs, struct strbuf *err) +{ + strbuf_addf(err, "error writing to %s: %s", + get_tempfile_path(refs->tempfile), strerror(errno)); +} + /* * Write the packed refs from the current snapshot to the packed-refs * tempfile, incorporating any changes from `updates`. `updates` must @@ -577,8 +583,10 @@ static int write_with_updates(struct packed_ref_store *refs, goto error; } - if (write_packed_file_header_v1(out) < 0) - goto write_error; + if (write_packed_file_header_v1(out) < 0) { + add_write_error(refs, err); + goto error; + } /* * We iterate in parallel through the current list of refs and @@ -673,8 +681,10 @@ static int write_with_updates(struct packed_ref_store *refs, if (write_packed_entry_v1(out, iter->refname, iter->oid, - peel_error ? NULL : &peeled)) - goto write_error; + peel_error ?
NULL : &peeled)) { + add_write_error(refs, err); + goto error; + } if ((ok = ref_iterator_advance(iter)) != ITER_OK) iter = NULL; @@ -694,8 +704,10 @@ static int write_with_updates(struct packed_ref_store *refs, if (write_packed_entry_v1(out, update->refname, &update->new_oid, - peel_error ? NULL : &peeled)) - goto write_error; + peel_error ? NULL : &peeled)) { + add_write_error(refs, err); + goto error; + } i++; } @@ -719,10 +731,6 @@ static int write_with_updates(struct packed_ref_store *refs, return 0; -write_error: - strbuf_addf(err, "error writing to %s: %s", - get_tempfile_path(refs->tempfile), strerror(errno)); - error: if (iter) ref_iterator_abort(iter);
From patchwork Mon Nov 7 18:35:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035095 Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:48 +0000 Subject: [PATCH 14/30] packed-backend: extract iterator/updates merge Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick
Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee TBD Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 117 +++++++++++++++++++++++------------------- 1 file changed, 64 insertions(+), 53 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index ef8060f2e08..0dff78f02c8 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -535,58 +535,13 @@ static void add_write_error(struct packed_ref_store *refs, struct strbuf *err) get_tempfile_path(refs->tempfile), strerror(errno)); } -/* - * Write the packed refs from the current snapshot to the packed-refs - * tempfile, incorporating any changes from `updates`. `updates` must - * be a sorted string list whose keys are the refnames and whose util - * values are `struct ref_update *`. On error, rollback the tempfile, - * write an error message to `err`, and return a nonzero value. - * - * The packfile must be locked before calling this function and will - * remain locked when it is done. - */ -static int write_with_updates(struct packed_ref_store *refs, - struct string_list *updates, - struct strbuf *err) +static int merge_iterator_and_updates(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err, + FILE *out) { struct ref_iterator *iter = NULL; - size_t i; - int ok; - FILE *out; - struct strbuf sb = STRBUF_INIT; - char *packed_refs_path; - - if (!is_lock_file_locked(&refs->lock)) - BUG("write_with_updates() called while unlocked"); - - /* - * If packed-refs is a symlink, we want to overwrite the - * symlinked-to file, not the symlink itself. 
Also, put the - * staging file next to it: - */ - packed_refs_path = get_locked_file_path(&refs->lock); - strbuf_addf(&sb, "%s.new", packed_refs_path); - free(packed_refs_path); - refs->tempfile = create_tempfile(sb.buf); - if (!refs->tempfile) { - strbuf_addf(err, "unable to create file %s: %s", - sb.buf, strerror(errno)); - strbuf_release(&sb); - return -1; - } - strbuf_release(&sb); - - out = fdopen_tempfile(refs->tempfile, "w"); - if (!out) { - strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", - strerror(errno)); - goto error; - } - - if (write_packed_file_header_v1(out) < 0) { - add_write_error(refs, err); - goto error; - } + int ok, i; /* * We iterate in parallel through the current list of refs and @@ -713,6 +668,65 @@ static int write_with_updates(struct packed_ref_store *refs, } } +error: + if (iter) + ref_iterator_abort(iter); + return ok; +} + +/* + * Write the packed refs from the current snapshot to the packed-refs + * tempfile, incorporating any changes from `updates`. `updates` must + * be a sorted string list whose keys are the refnames and whose util + * values are `struct ref_update *`. On error, rollback the tempfile, + * write an error message to `err`, and return a nonzero value. + * + * The packfile must be locked before calling this function and will + * remain locked when it is done. + */ +static int write_with_updates(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err) +{ + int ok; + FILE *out; + struct strbuf sb = STRBUF_INIT; + char *packed_refs_path; + + if (!is_lock_file_locked(&refs->lock)) + BUG("write_with_updates() called while unlocked"); + + /* + * If packed-refs is a symlink, we want to overwrite the + * symlinked-to file, not the symlink itself. 
Also, put the + * staging file next to it: + */ + packed_refs_path = get_locked_file_path(&refs->lock); + strbuf_addf(&sb, "%s.new", packed_refs_path); + free(packed_refs_path); + refs->tempfile = create_tempfile(sb.buf); + if (!refs->tempfile) { + strbuf_addf(err, "unable to create file %s: %s", + sb.buf, strerror(errno)); + strbuf_release(&sb); + return -1; + } + strbuf_release(&sb); + + out = fdopen_tempfile(refs->tempfile, "w"); + if (!out) { + strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", + strerror(errno)); + goto error; + } + + if (write_packed_file_header_v1(out) < 0) { + add_write_error(refs, err); + goto error; + } + + ok = merge_iterator_and_updates(refs, updates, err, out); + if (ok != ITER_DONE) { strbuf_addstr(err, "unable to write packed-refs file: " "error iterating over old contents"); @@ -732,9 +746,6 @@ static int write_with_updates(struct packed_ref_store *refs, return 0; error: - if (iter) - ref_iterator_abort(iter); - delete_tempfile(&refs->tempfile); return -1; } From patchwork Mon Nov 7 18:35:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035097 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3462CC43217 for ; Mon, 7 Nov 2022 18:37:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233169AbiKGShP (ORCPT ); Mon, 7 Nov 2022 13:37:15 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50076 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233142AbiKGSga (ORCPT ); Mon, 7 Nov 2022 13:36:30 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6890D24F3F 
for ; Mon, 7 Nov 2022 10:36:23 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id
5-20020a05600c26c500b003b50428cf66sm8556509wmv.33.2022.11.07.10.36.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:21 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:49 +0000 Subject: [PATCH 15/30] packed-backend: create abstraction for writing refs Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee
The packed-refs file uses a plaintext format that starts with a header line, then each ref is given as one or two lines (two if there is a peeled value). These lines are written as part of a sequence of updates which are merged with the existing ref iterator in merge_iterator_and_updates(). That method is currently tied directly to write_packed_entry_v1(). When creating a new version of the packed-refs file format, it would be valuable to reuse this merging logic unchanged. Create a new function pointer type, write_ref_fn, and use that type in merge_iterator_and_updates(). Notably, the function pointer type no longer depends on a FILE pointer, but instead takes an arbitrary "void *write_data" parameter. This flexibility will be critical in the future, since the planned v2 format will use the chunk-format API and need a more complicated structure than the output FILE.
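As a rough illustration of the callback shape described in this message, here is a standalone sketch. It is not code from this series: object IDs are plain hex strings instead of `struct object_id` pointers, and `write_entry_to_file()`, `struct count_ctx`, `count_entry()` and `write_all()` are hypothetical names chosen for the example. It only shows how one caller can drive two different writers through a single function pointer type and an opaque context.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Simplified stand-in for the write_ref_fn type: object IDs are plain
 * hex strings here (an assumption made to keep the sketch standalone).
 */
typedef int (*write_ref_fn)(const char *refname,
			    const char *oid_hex,
			    const char *peeled_hex,
			    void *write_data);

/* v1-style writer: write_data is the output FILE. */
static int write_entry_to_file(const char *refname, const char *oid_hex,
			       const char *peeled_hex, void *write_data)
{
	FILE *fh = write_data;

	assert(fh);
	if (fprintf(fh, "%s %s\n", oid_hex, refname) < 0 ||
	    (peeled_hex && fprintf(fh, "^%s\n", peeled_hex) < 0))
		return -1;
	return 0;
}

/* Hypothetical richer context, standing in for a non-FILE writer. */
struct count_ctx {
	size_t nr_refs;
	size_t nr_peeled;
};

static int count_entry(const char *refname, const char *oid_hex,
		       const char *peeled_hex, void *write_data)
{
	struct count_ctx *ctx = write_data;

	(void)refname;
	(void)oid_hex;
	ctx->nr_refs++;
	if (peeled_hex)
		ctx->nr_peeled++;
	return 0;
}

/* The caller sees only the function pointer and an opaque context. */
static int write_all(write_ref_fn write_fn, void *write_data)
{
	if (write_fn("refs/heads/main", "1111", NULL, write_data) ||
	    write_fn("refs/tags/v1.0", "2222", "3333", write_data))
		return -1;
	return 0;
}
```

Calling `write_all(write_entry_to_file, stdout)` would emit the v1-style lines, while `write_all(count_entry, &ctx)` only updates the counters; neither call requires a change to `write_all()` itself, which is the property the new type is meant to provide.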
Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 26 +++++++++++++++----------- refs/packed-backend.h | 16 ++++++++++++++-- refs/packed-format-v1.c | 7 +++++-- 3 files changed, 34 insertions(+), 15 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 0dff78f02c8..7ed9475812c 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -535,10 +535,11 @@ static void add_write_error(struct packed_ref_store *refs, struct strbuf *err) get_tempfile_path(refs->tempfile), strerror(errno)); } -static int merge_iterator_and_updates(struct packed_ref_store *refs, - struct string_list *updates, - struct strbuf *err, - FILE *out) +int merge_iterator_and_updates(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err, + write_ref_fn write_fn, + void *write_data) { struct ref_iterator *iter = NULL; int ok, i; @@ -634,9 +635,10 @@ static int merge_iterator_and_updates(struct packed_ref_store *refs, struct object_id peeled; int peel_error = ref_iterator_peel(iter, &peeled); - if (write_packed_entry_v1(out, iter->refname, - iter->oid, - peel_error ? NULL : &peeled)) { + if (write_fn(iter->refname, + iter->oid, + peel_error ? NULL : &peeled, + write_data)) { add_write_error(refs, err); goto error; } @@ -657,9 +659,10 @@ static int merge_iterator_and_updates(struct packed_ref_store *refs, int peel_error = peel_object(&update->new_oid, &peeled); - if (write_packed_entry_v1(out, update->refname, - &update->new_oid, - peel_error ? NULL : &peeled)) { + if (write_fn(update->refname, + &update->new_oid, + peel_error ? 
NULL : &peeled, + write_data)) { add_write_error(refs, err); goto error; } @@ -725,7 +728,8 @@ static int write_with_updates(struct packed_ref_store *refs, goto error; } - ok = merge_iterator_and_updates(refs, updates, err, out); + ok = merge_iterator_and_updates(refs, updates, err, + write_packed_entry_v1, out); if (ok != ITER_DONE) { strbuf_addstr(err, "unable to write packed-refs file: " diff --git a/refs/packed-backend.h b/refs/packed-backend.h index 143ed6d4f6c..b6908bb002c 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -192,6 +192,17 @@ struct packed_ref_iterator { unsigned int flags; }; +typedef int (*write_ref_fn)(const char *refname, + const struct object_id *oid, + const struct object_id *peeled, + void *write_data); + +int merge_iterator_and_updates(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err, + write_ref_fn write_fn, + void *write_data); + /** * Parse the buffer at the given snapshot to verify that it is a * packed-refs file in version 1 format. Update the snapshot->peeled @@ -227,8 +238,9 @@ void verify_buffer_safe_v1(struct snapshot *snapshot); void sort_snapshot_v1(struct snapshot *snapshot); int write_packed_file_header_v1(FILE *out); int next_record_v1(struct packed_ref_iterator *iter); -int write_packed_entry_v1(FILE *fh, const char *refname, +int write_packed_entry_v1(const char *refname, const struct object_id *oid, - const struct object_id *peeled); + const struct object_id *peeled, + void *write_data); #endif /* REFS_PACKED_BACKEND_H */ diff --git a/refs/packed-format-v1.c b/refs/packed-format-v1.c index ef9e6618c89..2d071567c02 100644 --- a/refs/packed-format-v1.c +++ b/refs/packed-format-v1.c @@ -441,10 +441,13 @@ int write_packed_file_header_v1(FILE *out) * error, return a nonzero value and leave errno set at the value left * by the failing call to `fprintf()`. 
*/ -int write_packed_entry_v1(FILE *fh, const char *refname, +int write_packed_entry_v1(const char *refname, const struct object_id *oid, - const struct object_id *peeled) + const struct object_id *peeled, + void *write_data) { + FILE *fh = write_data; + if (fprintf(fh, "%s %s\n", oid_to_hex(oid), refname) < 0 || (peeled && fprintf(fh, "^%s\n", oid_to_hex(peeled)) < 0)) return -1;
From patchwork Mon Nov 7 18:35:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035096 Message-Id: <7c1f6a1ad609ecd33ceda5655cd8fc02137f3e5e.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:50 +0000 Subject: [PATCH 16/30] config: add config values for packed-refs v2 Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee
When updating the file format version for something as
critical as ref storage, the file format version must come with an extension change. The extensions.refFormat config value is a multi-valued config value that defaults to the pair "files" and "packed". Add "packed-v2" as a possible value to extensions.refFormat. This value specifies that the packed-refs file may exist in the version 2 format. (If the "packed" value does not exist, then the packed-refs file must exist in version 2, not version 1.) In order to select version 2 for writing, the user will have two options. First, the user could remove "packed" and add "packed-v2" to the extensions.refFormat list. This would imply that version 2 is the only format available. However, this also means that version 1 files would be ignored at read time, so this does not allow users to upgrade repositories with existing packed-refs files. Add a new refs.packedRefsVersion config option which allows specifying which version to use during writes. Thus, when both "packed" and "packed-v2" are in the extensions.refFormat list, the user can upgrade from version 1 to version 2, or downgrade from 2 to 1. Currently, the implementation does not use refs.packedRefsVersion, as that is delayed until we have the code to write that file format version. However, we can add the necessary enum values and flag constants to communicate the presence of "packed-v2" in the extensions.refFormat list. 
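To make the interaction between the two settings concrete, here is a minimal standalone sketch of how a write version could be chosen from the enabled formats. The `REF_FORMAT_*` values mirror the enum extended by this patch. The helper `choose_write_version()`, its `configured` parameter (0 meaning refs.packedRefsVersion is unset) and its return codes are hypothetical and do not appear in this series; in particular, returning an error for a requested version whose extension is not enabled is an assumption drawn from the documentation text, not from the implementation.

```c
#include <assert.h>

enum ref_format_flags {
	REF_FORMAT_FILES = (1 << 0),
	REF_FORMAT_PACKED = (1 << 1),
	REF_FORMAT_PACKED_V2 = (1 << 2),
};

/*
 * Hypothetical helper, not part of the series.
 * Returns 1 or 2 for the version to write, 0 if no packed format is
 * enabled, or -1 if the configured version is not enabled (assumed
 * error handling).
 */
static int choose_write_version(int formats, int configured)
{
	int v1 = !!(formats & REF_FORMAT_PACKED);
	int v2 = !!(formats & REF_FORMAT_PACKED_V2);

	assert(configured >= 0);

	if (!v1 && !v2)
		return 0;
	if (!configured)
		return v1 ? 1 : 2; /* default is 1 unless only v2 is enabled */
	if (configured == 1)
		return v1 ? 1 : -1;
	if (configured == 2)
		return v2 ? 2 : -1;
	return -1; /* unknown version */
}
```

With both "packed" and "packed-v2" present, the sketch follows the configured value and falls back to version 1 when it is unset, which matches the upgrade and downgrade path described above.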
Signed-off-by: Derrick Stolee --- Documentation/config.txt | 2 ++ Documentation/config/extensions.txt | 27 ++++++++++++++++++++++----- Documentation/config/refs.txt | 13 +++++++++++++ refs.c | 4 +++- refs/packed-backend.c | 17 ++++++++++++++++- refs/refs-internal.h | 5 +++-- repository.h | 1 + setup.c | 2 ++ t/t3212-ref-formats.sh | 19 +++++++++++++++++++ 9 files changed, 81 insertions(+), 9 deletions(-) create mode 100644 Documentation/config/refs.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 0e93aef8626..e480f99c3e1 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -493,6 +493,8 @@ include::config/rebase.txt[] include::config/receive.txt[] +include::config/refs.txt[] + include::config/remote.txt[] include::config/remotes.txt[] diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index 18071c336d0..05abb821e07 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -35,17 +35,34 @@ indicate the existence of different layers: `files`, the `packed` format will only be used to group multiple loose object files upon request via the `git pack-refs` command or via the `pack-refs` maintenance task. + +`packed-v2`;; + When present, references may be stored as a group in a + `packed-refs` file in its version 2 format. This file is in the + same position and interacts with loose refs the same as when the + `packed` value exists. Both `packed` and `packed-v2` must exist to + upgrade an existing `packed-refs` file from version 1 to version 2 + or to downgrade from version 2 to version 1. When both are + present, the `refs.packedRefsVersion` config value indicates which + file format version is used during writes, but both versions are + understood when reading the file. 
-- + The following combinations are supported by this version of Git: + -- -`files` and `packed`;; +`files` and (`packed` and/or `packed-v2`);; This set of values indicates that references are stored both as - loose reference files and in the `packed-refs` file in its v1 - format. Loose references are preferred, and the `packed-refs` file - is updated only when deleting a reference that is stored in the - `packed-refs` file or during a `git pack-refs` command. + loose reference files and in the `packed-refs` file. Loose + references are preferred, and the `packed-refs` file is updated + only when deleting a reference that is stored in the `packed-refs` + file or during a `git pack-refs` command. ++ +The presence of `packed` and `packed-v2` specifies whether the `packed-refs` +file is allowed to be in its v1 or v2 formats, respectively. When only one +is present, Git will refuse to read a `packed-refs` file that does not +match the expected format. When both are present, the `refs.packedRefsVersion` +config option indicates which file format is used during writes. `files`;; When only this value is present, Git will ignore the `packed-refs` diff --git a/Documentation/config/refs.txt b/Documentation/config/refs.txt new file mode 100644 index 00000000000..b2fdb2923f7 --- /dev/null +++ b/Documentation/config/refs.txt @@ -0,0 +1,13 @@ +refs.packedRefsVersion:: + Specifies the file format version to use when writing a `packed-refs` + file. Defaults to `1`. ++ +The only other value currently allowed is `2`, which uses a structured file +format to result in a smaller `packed-refs` file. In order to write this +file format version, the repository must also have the `packed-v2` extension +enabled. The most typical setup will include the +`core.repositoryFormatVersion=1` config value and the `extensions.refFormat` +key will have three values: `files`, `packed`, and `packed-v2`.
++ +If `extensions.refFormat` has the value `packed-v2` and not `packed`, then +`refs.packedRefsVersion` defaults to `2`. diff --git a/refs.c b/refs.c index 21441ddb162..bf53d1445f2 100644 --- a/refs.c +++ b/refs.c @@ -1987,6 +1987,8 @@ static int add_ref_format_flags(enum ref_format_flags flags, int caps) { caps |= REF_STORE_FORMAT_FILES; if (flags & REF_FORMAT_PACKED) caps |= REF_STORE_FORMAT_PACKED; + if (flags & REF_FORMAT_PACKED_V2) + caps |= REF_STORE_FORMAT_PACKED_V2; return caps; } @@ -2006,7 +2008,7 @@ static struct ref_store *ref_store_init(struct repository *repo, flags = add_ref_format_flags(repo->ref_format, flags); if (!(flags & REF_STORE_FORMAT_FILES) && - (flags & REF_STORE_FORMAT_PACKED)) + packed_refs_enabled(flags)) be_name = "packed"; be = find_ref_storage_backend(be_name); diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 7ed9475812c..655aab939be 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -236,7 +236,13 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) if (!load_contents(snapshot)) return snapshot; - if (parse_packed_format_v1_header(refs, snapshot, &sorted)) { + /* + * If this is a v1 file format, but we don't have v1 enabled, + * then ignore it the same way we would as if we didn't + * understand it. + */ + if (parse_packed_format_v1_header(refs, snapshot, &sorted) || + !(refs->store_flags & REF_STORE_FORMAT_PACKED)) { clear_snapshot(refs); return NULL; } @@ -310,6 +316,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref"); struct snapshot *snapshot = get_snapshot(refs); + if (!snapshot) { + /* refname is not a packed reference. 
*/ + *failure_errno = ENOENT; + return -1; + } + return packed_read_raw_ref_v1(refs, snapshot, refname, oid, type, failure_errno); } @@ -410,6 +422,9 @@ static struct ref_iterator *packed_ref_iterator_begin( */ snapshot = get_snapshot(refs); + if (!snapshot) + return empty_ref_iterator_begin(); + if (prefix && *prefix) start = find_reference_location_v1(snapshot, prefix, 0); else diff --git a/refs/refs-internal.h b/refs/refs-internal.h index a1900848a87..39b93fce97c 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -522,11 +522,12 @@ struct ref_store; REF_STORE_MAIN) #define REF_STORE_FORMAT_FILES (1 << 8) /* can use loose ref files */ -#define REF_STORE_FORMAT_PACKED (1 << 9) /* can use packed-refs file */ +#define REF_STORE_FORMAT_PACKED (1 << 9) /* can use v1 packed-refs file */ +#define REF_STORE_FORMAT_PACKED_V2 (1 << 10) /* can use v2 packed-refs file */ static inline int packed_refs_enabled(int flags) { - return flags & REF_STORE_FORMAT_PACKED; + return flags & (REF_STORE_FORMAT_PACKED | REF_STORE_FORMAT_PACKED_V2); } /* diff --git a/repository.h b/repository.h index 5cfde4282c5..ee3a90efc72 100644 --- a/repository.h +++ b/repository.h @@ -64,6 +64,7 @@ struct repo_path_cache { enum ref_format_flags { REF_FORMAT_FILES = (1 << 0), REF_FORMAT_PACKED = (1 << 1), + REF_FORMAT_PACKED_V2 = (1 << 2), }; struct repository { diff --git a/setup.c b/setup.c index a5e63479558..72bfa289ade 100644 --- a/setup.c +++ b/setup.c @@ -582,6 +582,8 @@ static enum extension_result handle_extension(const char *var, data->ref_format |= REF_FORMAT_FILES; else if (!strcmp(value, "packed")) data->ref_format |= REF_FORMAT_PACKED; + else if (!strcmp(value, "packed-v2")) + data->ref_format |= REF_FORMAT_PACKED_V2; else return error(_("invalid value for '%s': '%s'"), "extensions.refFormat", value); diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 67aa65c116f..cd1b399bbb8 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -56,4 +56,23 @@ 
test_expect_success 'extensions.refFormat=files only' ' ) ' +test_expect_success 'extensions.refFormat=files,packed-v2' ' + test_commit Q && + git pack-refs --all && + git init no-packed-v1 && + ( + cd no-packed-v1 && + git config core.repositoryFormatVersion 1 && + git config extensions.refFormat files && + git config --add extensions.refFormat packed-v2 && + test_commit A && + test_commit B && + + # Refuse to parse a v1 packed-refs file. + cp ../.git/packed-refs .git/packed-refs && + test_must_fail git rev-parse refs/tags/Q && + rm -f .git/packed-refs + ) +' + test_done From patchwork Mon Nov 7 18:35:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035098 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36D24C4332F for ; Mon, 7 Nov 2022 18:37:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233170AbiKGShT (ORCPT ); Mon, 7 Nov 2022 13:37:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233200AbiKGSgo (ORCPT ); Mon, 7 Nov 2022 13:36:44 -0500 Received: from mail-wr1-x42f.google.com (mail-wr1-x42f.google.com [IPv6:2a00:1450:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CB872792F for ; Mon, 7 Nov 2022 10:36:25 -0800 (PST) Received: by mail-wr1-x42f.google.com with SMTP id bs21so17580516wrb.4 for ; Mon, 07 Nov 2022 10:36:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=zlu7Xg4pc6JmgPYy6uvF/YsQqtsrVJL4KfVD4FBTPDQ=; 
b=EBP4MuCCUL5SL4UTVKOr5xFd0SXhNna3l6guvBbQu2SK91htvXieERk+cVfoNaRBLC gHJ1MDhoRGuR/fxsfb918oT5wKor4jTISqIcy7O/M3Q0KCO4oLMhX/b+6rvxXagA9uAP 8itoGhCysb8kFMRWjQygRZZNBcxekABxn/t1VYxZDNWApR8RdL1pNKW3P8lP3BgCbAkV O9a3K7l1V26/vT9nAJc4lv73AoP3Vx4kiY/rwYTqUM4f7S6CU7mWRnk0w8VWy9WekhPn wea2eeKIJk07ZLIznLpaqif7nBjOhzz55PfZVcNnxAMKK7m+6qVbfaUSb0QacOhN3Lgl iytg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zlu7Xg4pc6JmgPYy6uvF/YsQqtsrVJL4KfVD4FBTPDQ=; b=wysVrnjTEQvvu8eFn9Pm4Ofqh0ky2j7xPGG2ZuxjmHnKn6wxVijPU3Pb+55BjCFXUJ IZLEZ70Z2Ib/33ti/VMkiYNVnu8hmv/m72RlXHgu/fXB9IHG1qBFLrwf1DrX8d0JjcuE C+IYHf5iWHd5b7KmhF5qZZe9IzgNpWI4XshM3MS8GD1f422XHFoDI3k/5dtxC5TpfiCM yyoSTR0BTymRUpM/qLA90OypTqJuWlESWn2m3A5iqlPTmCUsLS9V7PJQPvQb44COQ3Hr 7CQV8ADDcRq67HdylT7WHI0yyWL3xYW2M5HSbB+HOfNSFiZ5WiPEZcGUWMrvezQj+pAc yr1g== X-Gm-Message-State: ACrzQf2112sbUKpJgTMCPYv3zzKK62siskliFUslzD4fpwwF2AQcdBnP DbBB/h+VPmCt2qLcTIRY+PyRQehSolU= X-Google-Smtp-Source: AMsMyM7pojQaR3ohIF9GbQS2yND8Kjg4VvEXHlrErZ4iWGgvECCntPOkPtQqTYqp0AhKWUoH9UAvVA== X-Received: by 2002:a5d:650c:0:b0:236:49d9:8e83 with SMTP id x12-20020a5d650c000000b0023649d98e83mr32522853wru.714.1667846183687; Mon, 07 Nov 2022 10:36:23 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i2-20020a05600c354200b003c71358a42dsm16405634wmq.18.2022.11.07.10.36.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:23 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:51 +0000 Subject: [PATCH 17/30] packed-backend: create shell of v2 writes Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee 
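As a reading aid, the version selection that this patch adds to write_with_updates() can be sketched in isolation. This is a minimal sketch, not the code in the patch: toy_pick_packed_version() and its "configured" parameter are hypothetical stand-ins for the refs.packedrefsversion lookup, with 0 assumed to mean "not set". The flag values are the ones defined in refs/refs-internal.h earlier in this series. Where the patch calls BUG() or reports an unknown version, the sketch just returns -1.

```c
#include <assert.h>

#define REF_STORE_FORMAT_PACKED    (1 << 9)  /* can use v1 packed-refs file */
#define REF_STORE_FORMAT_PACKED_V2 (1 << 10) /* can use v2 packed-refs file */

/*
 * Hypothetical helper: return the packed-refs version to write (1 or 2),
 * or -1 when no usable version can be determined.
 *
 * `configured` stands in for the config value; 0 is assumed to mean that
 * the config key is unset.
 */
static int toy_pick_packed_version(unsigned int store_flags, int configured)
{
	int version = configured;

	if (!version) {
		/* Prefer v1 when it is enabled; fall back to v2 otherwise. */
		if (store_flags & REF_STORE_FORMAT_PACKED)
			version = 1;
		else if (store_flags & REF_STORE_FORMAT_PACKED_V2)
			version = 2;
		else
			return -1; /* the patch treats this case as a BUG() */
	}

	/* Only versions 1 and 2 are known to the writer. */
	return (version == 1 || version == 2) ? version : -1;
}
```

The point of the ordering is that a repository with both "packed" and "packed-v2" enabled keeps writing v1 unless the config asks for v2 explicitly.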
Signed-off-by: Derrick Stolee --- Makefile | 1 + refs/packed-backend.c | 75 +++++++++++++++++++++++++++++++++++------ refs/packed-backend.h | 7 ++++ refs/packed-format-v2.c | 38 +++++++++++++++++++++ 4 files changed, 110 insertions(+), 11 deletions(-) create mode 100644 refs/packed-format-v2.c diff --git a/Makefile b/Makefile index 3dc887941d4..16cd245e0ad 100644 --- a/Makefile +++ b/Makefile @@ -1058,6 +1058,7 @@ LIB_OBJS += refs/files-backend.o LIB_OBJS += refs/iterator.o LIB_OBJS += refs/packed-backend.o LIB_OBJS += refs/packed-format-v1.o +LIB_OBJS += refs/packed-format-v2.o LIB_OBJS += refs/ref-cache.o LIB_OBJS += refspec.o LIB_OBJS += remote.o diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 655aab939be..09f7b74584f 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -692,6 +692,45 @@ error: return ok; } +static int write_with_updates_v1(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err) +{ + FILE *out; + + out = fdopen_tempfile(refs->tempfile, "w"); + if (!out) { + strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", + strerror(errno)); + goto error; + } + + if (write_packed_file_header_v1(out) < 0) { + add_write_error(refs, err); + goto error; + } + + return merge_iterator_and_updates(refs, updates, err, + write_packed_entry_v1, out); + +error: + return -1; +} + +static int write_with_updates_v2(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err) +{ + struct write_packed_refs_v2_context *ctx = create_v2_context(refs, updates, err); + int ok = -1; + + if ((ok = write_packed_refs_v2(ctx)) < 0) + add_write_error(refs, err); + + free_v2_context(ctx); + return ok; +} + /* * Write the packed refs from the current snapshot to the packed-refs * tempfile, incorporating any changes from `updates`. 
`updates` must @@ -707,9 +746,9 @@ static int write_with_updates(struct packed_ref_store *refs, struct strbuf *err) { int ok; - FILE *out; struct strbuf sb = STRBUF_INIT; char *packed_refs_path; + int version; if (!is_lock_file_locked(&refs->lock)) BUG("write_with_updates() called while unlocked"); @@ -731,21 +770,35 @@ static int write_with_updates(struct packed_ref_store *refs, } strbuf_release(&sb); - out = fdopen_tempfile(refs->tempfile, "w"); - if (!out) { - strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", - strerror(errno)); - goto error; + if (git_config_get_int("refs.packedrefsversion", &version)) { + /* + * Set the default depending on the current extension + * list. Default to version 1 if available, but allow a + * default of 2 if only "packed-v2" exists. + */ + if (refs->store_flags & REF_STORE_FORMAT_PACKED) + version = 1; + else if (refs->store_flags & REF_STORE_FORMAT_PACKED_V2) + version = 2; + else + BUG("writing a packed-refs file without an extension"); } - if (write_packed_file_header_v1(out) < 0) { - add_write_error(refs, err); + switch (version) { + case 1: + ok = write_with_updates_v1(refs, updates, err); + break; + + case 2: + ok = write_with_updates_v2(refs, updates, err); + break; + + default: + strbuf_addf(err, "unknown packed-refs version: %d", + version); goto error; } - ok = merge_iterator_and_updates(refs, updates, err, - write_packed_entry_v1, out); - if (ok != ITER_DONE) { strbuf_addstr(err, "unable to write packed-refs file: " "error iterating over old contents"); diff --git a/refs/packed-backend.h b/refs/packed-backend.h index b6908bb002c..e76f26bfc46 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -243,4 +243,11 @@ int write_packed_entry_v1(const char *refname, const struct object_id *peeled, void *write_data); +struct write_packed_refs_v2_context; +struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err); +int 
write_packed_refs_v2(struct write_packed_refs_v2_context *ctx); +void free_v2_context(struct write_packed_refs_v2_context *ctx); + #endif /* REFS_PACKED_BACKEND_H */ diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c new file mode 100644 index 00000000000..ecf3cc93694 --- /dev/null +++ b/refs/packed-format-v2.c @@ -0,0 +1,38 @@ +#include "../cache.h" +#include "../config.h" +#include "../refs.h" +#include "refs-internal.h" +#include "packed-backend.h" +#include "../iterator.h" +#include "../lockfile.h" +#include "../chdir-notify.h" + +struct write_packed_refs_v2_context { + struct packed_ref_store *refs; + struct string_list *updates; + struct strbuf *err; +}; + +struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, + struct string_list *updates, + struct strbuf *err) +{ + struct write_packed_refs_v2_context *ctx; + CALLOC_ARRAY(ctx, 1); + + ctx->refs = refs; + ctx->updates = updates; + ctx->err = err; + + return ctx; +} + +int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx) +{ + return 0; +} + +void free_v2_context(struct write_packed_refs_v2_context *ctx) +{ + free(ctx); +} From patchwork Mon Nov 7 18:35:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035099 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 577F4C433FE for ; Mon, 7 Nov 2022 18:37:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233174AbiKGShV (ORCPT ); Mon, 7 Nov 2022 13:37:21 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233201AbiKGSgo (ORCPT ); Mon, 7 Nov 2022 13:36:44 -0500 Received: from 
mail-wr1-x434.google.com (mail-wr1-x434.google.com [IPv6:2a00:1450:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49BB92792D for ; Mon, 7 Nov 2022 10:36:25 -0800 (PST) Received: by mail-wr1-x434.google.com with SMTP id k8so17611109wrh.1 for ; Mon, 07 Nov 2022 10:36:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=IMAERS2dE0cjI5/b7H9PVha+UALE1ARUL34XOpxkPHU=; b=Akng5Ez5gfLPJDyybkYKG3N5cS5PN1aXywZKNxpD2YtrLw5i0VUZY1Gd/Vvkbuws2g hCSFJ42bskVFO+pl9LtwpHNmoGb/XcPZ07vn1813JCzhnk4YidwyNa6eSu7/Nv2W9qRC +XnV0eo5cDSvOYOKFAM/88prNinNWLR0Xw9t1huiKHWnHRznLGTnuWyHIfADyPG+raxt 1pmf9+q5Waovsh7TSi54lPMIKQoX5KpkGgz1W7lq57cEdOB5Byz9CdUiAG10F9XgOag6 ixIgVbd9FFPARLWMHWic5wrPtalqApz+8Fz37bWD/Zh3AH50Lq9B5MrWlAPUxiqnVVUY G6WQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IMAERS2dE0cjI5/b7H9PVha+UALE1ARUL34XOpxkPHU=; b=3J/SVI3sqxO1yJCyIBFGdp+zTg7RLElRD6AJEIyv+VgmRg1p8Bk+7pYqN0VPJfa7ME 373iQsLmLnGCOeG/wUnEWfJVECk1Lxl+MUVuzWgZbbi1yZzy2IT7BIaZoCvLxgWjKYt0 uZZwVlfYHRP9nWuWyPacvNzbswOhtU+su8T4xK3UqTYyfahuXby9fzU2cCLAFYt6UtK0 J+cvOpEOQevm8rT4bhM1mGiDyGqVxoNQ7XDVw2UcNDd5qCdWP+daBaWSt8w7ZPI3slOb H8TF7qpMuPK9CUeuh6QHNe1gt9XLwJriqcBbkBC8RZD7axasm3skcfavoJpotZHp3JkQ tP7w== X-Gm-Message-State: ACrzQf0DeCS7sojDhjXw2sPIs5DBa2jR/2Loa92E2n57dC5ZqKtG+A47 6s2tugp7KBeJU84MTf50jmaEbhWPmkY= X-Google-Smtp-Source: AMsMyM6TUTGdEnnJ5g7GmJV3lwSArqELz3aiTGYoNva8XFcSBYf5uyAYoFF813RkfcnufU2X9G1BXA== X-Received: by 2002:adf:f70b:0:b0:236:f367:920f with SMTP id r11-20020adff70b000000b00236f367920fmr18308870wrp.129.1667846184615; Mon, 07 Nov 2022 10:36:24 -0800 (PST) Received: from [127.0.0.1] 
([13.74.141.28]) by smtp.gmail.com with ESMTPSA id y15-20020a1c4b0f000000b003b31c560a0csm8796813wma.12.2022.11.07.10.36.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:24 -0800 (PST) Message-Id: <740c2f6e6d1e628a84dc4e1927fef70b5d8d624c.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:52 +0000 Subject: [PATCH 18/30] packed-refs: write file format version 2 Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee TODO: add writing tests. Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 3 +- refs/packed-format-v2.c | 108 ++++++++++++++++++++++++++++++++++++++++ t/t3212-ref-formats.sh | 6 ++- 3 files changed, 115 insertions(+), 2 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 09f7b74584f..3429e63620a 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -790,7 +790,8 @@ static int write_with_updates(struct packed_ref_store *refs, break; case 2: - ok = write_with_updates_v2(refs, updates, err); + /* Convert the normal error codes to ITER_DONE. */ + ok = write_with_updates_v2(refs, updates, err) ? 
-2 : ITER_DONE; break; default: diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index ecf3cc93694..044cc9f629a 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -6,11 +6,30 @@ #include "../iterator.h" #include "../lockfile.h" #include "../chdir-notify.h" +#include "../chunk-format.h" +#include "../csum-file.h" + +#define OFFSET_IS_PEELED (((uint64_t)1) << 63) + +#define PACKED_REFS_SIGNATURE 0x50524546 /* "PREF" */ +#define CHREFS_CHUNKID_OFFSETS 0x524F4646 /* "ROFF" */ +#define CHREFS_CHUNKID_REFS 0x52454653 /* "REFS" */ struct write_packed_refs_v2_context { struct packed_ref_store *refs; struct string_list *updates; struct strbuf *err; + + struct hashfile *f; + struct chunkfile *cf; + + /* + * As we stream the ref names to the refs chunk, store these + * values in-memory. These arrays are populated one for every ref. + */ + uint64_t *offsets; + size_t nr; + size_t offsets_alloc; }; struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, @@ -24,15 +43,104 @@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store * ctx->updates = updates; ctx->err = err; + if (!fdopen_tempfile(refs->tempfile, "w")) { + strbuf_addf(err, "unable to fdopen packed-refs tempfile: %s", + strerror(errno)); + return ctx; + } + + ctx->f = hashfd(refs->tempfile->fd, refs->tempfile->filename.buf); + ctx->cf = init_chunkfile(ctx->f); + return ctx; } +static int write_packed_entry_v2(const char *refname, + const struct object_id *oid, + const struct object_id *peeled, + void *write_data) +{ + struct write_packed_refs_v2_context *ctx = write_data; + size_t reflen = strlen(refname) + 1; + size_t i = ctx->nr; + + ALLOC_GROW(ctx->offsets, i + 1, ctx->offsets_alloc); + + /* Write entire ref, including null terminator. 
*/ + hashwrite(ctx->f, refname, reflen); + hashwrite(ctx->f, oid->hash, the_hash_algo->rawsz); + if (peeled) + hashwrite(ctx->f, peeled->hash, the_hash_algo->rawsz); + + if (i) + ctx->offsets[i] = (ctx->offsets[i - 1] & (~OFFSET_IS_PEELED)); + else + ctx->offsets[i] = 0; + ctx->offsets[i] += reflen + the_hash_algo->rawsz; + + if (peeled) { + ctx->offsets[i] += the_hash_algo->rawsz; + ctx->offsets[i] |= OFFSET_IS_PEELED; + } + + ctx->nr++; + return 0; +} + +static int write_refs_chunk_refs(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + int ok; + + trace2_region_enter("refs", "refs-chunk", the_repository); + ok = merge_iterator_and_updates(ctx->refs, ctx->updates, ctx->err, + write_packed_entry_v2, ctx); + trace2_region_leave("refs", "refs-chunk", the_repository); + + return ok != ITER_DONE; +} + +static int write_refs_chunk_offsets(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + size_t i; + + trace2_region_enter("refs", "offsets", the_repository); + for (i = 0; i < ctx->nr; i++) + hashwrite_be64(f, ctx->offsets[i]); + + trace2_region_leave("refs", "offsets", the_repository); + return 0; +} + int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx) { + unsigned char file_hash[GIT_MAX_RAWSZ]; + + add_chunk(ctx->cf, CHREFS_CHUNKID_REFS, 0, write_refs_chunk_refs); + add_chunk(ctx->cf, CHREFS_CHUNKID_OFFSETS, 0, write_refs_chunk_offsets); + + hashwrite_be32(ctx->f, PACKED_REFS_SIGNATURE); + hashwrite_be32(ctx->f, 2); + hashwrite_be32(ctx->f, the_hash_algo->format_id); + + if (write_chunkfile(ctx->cf, CHUNKFILE_TRAILING_TOC, ctx)) + goto failure; + + finalize_hashfile(ctx->f, file_hash, FSYNC_COMPONENT_REFERENCE, + CSUM_HASH_IN_STREAM | CSUM_FSYNC); + return 0; + +failure: + return -1; } void free_v2_context(struct write_packed_refs_v2_context *ctx) { + if (ctx->cf) + free_chunkfile(ctx->cf); free(ctx); } diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 
cd1b399bbb8..03c713ac4f6 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -71,7 +71,11 @@ test_expect_success 'extensions.refFormat=files,packed-v2' ' # Refuse to parse a v1 packed-refs file. cp ../.git/packed-refs .git/packed-refs && test_must_fail git rev-parse refs/tags/Q && - rm -f .git/packed-refs + rm -f .git/packed-refs && + + # Create a v2 packed-refs file + git pack-refs --all && + test_path_exists .git/packed-refs ) ' From patchwork Mon Nov 7 18:35:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035100 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 820B9C4332F for ; Mon, 7 Nov 2022 18:37:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233260AbiKGShk (ORCPT ); Mon, 7 Nov 2022 13:37:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50862 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233220AbiKGShE (ORCPT ); Mon, 7 Nov 2022 13:37:04 -0500 Received: from mail-wr1-x42d.google.com (mail-wr1-x42d.google.com [IPv6:2a00:1450:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3F8424953 for ; Mon, 7 Nov 2022 10:36:27 -0800 (PST) Received: by mail-wr1-x42d.google.com with SMTP id g12so17517830wrs.10 for ; Mon, 07 Nov 2022 10:36:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=Y95nN9gSVOv1U3X11AiaCeJzCGdcu0mcG5NIL8+Q140=; b=BL9a7sgu8N3GSYfAbE1chxjA1oKGxS9rdph4Dv72DMGRp8M3CLZyD7SEPoGBfS0ec+ 
24vtfekTRB4FuDaDhQPMXFHgGPbvszVlZHWNDpTdWDqdiqJS33Z3LFwvBOOBjh2s7AOE u7+Xai1GJG1j6dQYTpaRjM3E2M1AKFEwvyt9pMgV7gCURUUhp9hqwvhwlb1N303Mfcvv JdiSabzWDnoxR6ghqZtGx0cpWJeaQyZefbt+V7Q2Z9UznArjGCM6c/NHiugkBWulVBvF YnqNgYUw3nN5K3DmthkDGQpSVsdYbzj71smBTDZWqk8iiVQedDs7k3xxoojVuIWlb+zL cxRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y95nN9gSVOv1U3X11AiaCeJzCGdcu0mcG5NIL8+Q140=; b=Tz4nXdPnWm/gMsDxGvr4E+j6ExlqwKWgcth7Sfk2Dd/Rkv0K/FezZ1qYedoJciBvc0 73wEPtbk0dGBi2U1oOPGWDc2mqB5cRuudFIW89RFrvoW0jQMDipoN7vLZjH74URbdDL8 4hBsYCbPQswSLh1O+QYXAhKlxQUkG4fD9lE2LQ4sG5BaupUpVUviaLYCEadvzNTyl+/6 NdM0VJ0VbGnAsjwhc0xgAV0iOcbGrhje5Z1q8b5owK+yuktBmqY7kl8SBD0PMguGI56d Lr92E/0NZZLvNek7JMlm/GBAHx9A6sIvAmnbfSaGQgoUdEcXfdKDKZDGgV9VaAgXc8nB ry0A== X-Gm-Message-State: ACrzQf3RHqmbKJjYf2bndL2/BTfgNZXp3aJCTLHPCVlylATJV51y2IRQ oaVqC0zzKr7XCQzltABBu6ZoddNjD2g= X-Google-Smtp-Source: AMsMyM5otmtJ4cDc4gg60ZDEg85sElCELPVfmJpfUpifJOfhuag6KvDdhrmVCZc+L3BN9W+/W4v0zw== X-Received: by 2002:a5d:590d:0:b0:236:4ddd:1869 with SMTP id v13-20020a5d590d000000b002364ddd1869mr31385949wrd.709.1667846185508; Mon, 07 Nov 2022 10:36:25 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v2-20020a7bcb42000000b003cf4ec90938sm8885100wmj.21.2022.11.07.10.36.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:25 -0800 (PST) Message-Id: <701c5ad22e7787cfd27628ff613b7849e24fc675.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:53 +0000 Subject: [PATCH 19/30] packed-refs: read file format v2 Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee 
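As a reading aid for the lookup logic in this patch, here is a minimal sketch of how the offsets chunk is used to find a ref. Entry i of the offsets chunk holds the end offset of record i within the refs chunk, with the top bit marking a record that also stores a peeled OID. The names toy_snapshot, toy_nth_ref(), toy_find() and toy_is_peeled() are hypothetical. Unlike the real file, which stores the offsets big-endian and reads them with get_be64(), the sketch keeps them in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_IS_PEELED (((uint64_t)1) << 63)

/* Hypothetical in-memory stand-in for the two v2 chunks. */
struct toy_snapshot {
	const char *refs_chunk;  /* NUL-terminated name, raw OID, optional peeled OID */
	const uint64_t *offsets; /* end offset of each record (host order here) */
	size_t nr;               /* number of records */
};

/* Record n starts where record n - 1 ends; record 0 starts at offset 0. */
static const char *toy_nth_ref(const struct toy_snapshot *s, size_t n)
{
	uint64_t start;

	assert(n < s->nr);
	start = n ? (s->offsets[n - 1] & ~OFFSET_IS_PEELED) : 0;
	return s->refs_chunk + start;
}

/* Binary search by name; returns the row index, or s->nr if absent. */
static size_t toy_find(const struct toy_snapshot *s, const char *refname)
{
	size_t lo = 0, hi = s->nr;

	while (lo != hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(toy_nth_ref(s, mid), refname);

		if (cmp < 0)
			lo = mid + 1;
		else if (cmp > 0)
			hi = mid;
		else
			return mid;
	}
	return s->nr;
}

/* The peeled flag lives in the top bit of the record's own offset entry. */
static int toy_is_peeled(const struct toy_snapshot *s, size_t n)
{
	assert(n < s->nr);
	return (s->offsets[n] & OFFSET_IS_PEELED) != 0;
}
```

Because the offsets are fixed-width, the binary search can jump straight to row "mid" without scanning for record boundaries, which is what the v1 text format requires.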
Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 129 ++++++++++++++++--------- refs/packed-backend.h | 72 ++++++++++++-- refs/packed-format-v2.c | 209 ++++++++++++++++++++++++++++++++++++++++ t/t3212-ref-formats.sh | 17 +++- 4 files changed, 372 insertions(+), 55 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 3429e63620a..549cce1f84a 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -66,7 +66,7 @@ void clear_snapshot_buffer(struct snapshot *snapshot) * Decrease the reference count of `*snapshot`. If it goes to zero, * free `*snapshot` and return true; otherwise return false. */ -static int release_snapshot(struct snapshot *snapshot) +int release_snapshot(struct snapshot *snapshot) { if (!--snapshot->referrers) { stat_validity_clear(&snapshot->validity); @@ -142,7 +142,6 @@ static int load_contents(struct snapshot *snapshot) { int fd; struct stat st; - size_t size; ssize_t bytes_read; if (!packed_refs_enabled(snapshot->refs->store_flags)) @@ -168,25 +167,25 @@ static int load_contents(struct snapshot *snapshot) if (fstat(fd, &st) < 0) die_errno("couldn't stat %s", snapshot->refs->path); - size = xsize_t(st.st_size); + snapshot->buflen = xsize_t(st.st_size); - if (!size) { + if (!snapshot->buflen) { close(fd); return 0; - } else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) { - snapshot->buf = xmalloc(size); - bytes_read = read_in_full(fd, snapshot->buf, size); - if (bytes_read < 0 || bytes_read != size) + } else if (mmap_strategy == MMAP_NONE || snapshot->buflen <= SMALL_FILE_SIZE) { + snapshot->buf = xmalloc(snapshot->buflen); + bytes_read = read_in_full(fd, snapshot->buf, snapshot->buflen); + if (bytes_read < 0 || bytes_read != snapshot->buflen) die_errno("couldn't read %s", snapshot->refs->path); snapshot->mmapped = 0; } else { - snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + snapshot->buf = xmmap(NULL, snapshot->buflen, PROT_READ, MAP_PRIVATE, fd, 0); snapshot->mmapped = 1; 
} close(fd); snapshot->start = snapshot->buf; - snapshot->eof = snapshot->buf + size; + snapshot->eof = snapshot->buf + snapshot->buflen; return 1; } @@ -232,46 +231,52 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs) snapshot->refs = refs; acquire_snapshot(snapshot); snapshot->peeled = PEELED_NONE; + snapshot->version = 1; if (!load_contents(snapshot)) return snapshot; - /* - * If this is a v1 file format, but we don't have v1 enabled, - * then ignore it the same way we would as if we didn't - * understand it. - */ - if (parse_packed_format_v1_header(refs, snapshot, &sorted) || - !(refs->store_flags & REF_STORE_FORMAT_PACKED)) { - clear_snapshot(refs); - return NULL; - } + if ((refs->store_flags & REF_STORE_FORMAT_PACKED) && + !detect_packed_format_v2_header(refs, snapshot)) { + parse_packed_format_v1_header(refs, snapshot, &sorted); + snapshot->version = 1; + verify_buffer_safe_v1(snapshot); - verify_buffer_safe_v1(snapshot); + if (!sorted) { + sort_snapshot_v1(snapshot); - if (!sorted) { - sort_snapshot_v1(snapshot); + /* + * Reordering the records might have moved a short one + * to the end of the buffer, so verify the buffer's + * safety again: + */ + verify_buffer_safe_v1(snapshot); + } - /* - * Reordering the records might have moved a short one - * to the end of the buffer, so verify the buffer's - * safety again: - */ - verify_buffer_safe_v1(snapshot); + if (mmap_strategy != MMAP_OK && snapshot->mmapped) { + /* + * We don't want to leave the file mmapped, so we are + * forced to make a copy now: + */ + char *buf_copy = xmalloc(snapshot->buflen); + + memcpy(buf_copy, snapshot->start, snapshot->buflen); + clear_snapshot_buffer(snapshot); + snapshot->buf = snapshot->start = buf_copy; + snapshot->eof = buf_copy + snapshot->buflen; + } + + return snapshot; } - if (mmap_strategy != MMAP_OK && snapshot->mmapped) { + if (refs->store_flags & REF_STORE_FORMAT_PACKED_V2) { /* - * We don't want to leave the file mmapped, so we are - * forced 
to make a copy now: + * Assume we are in v2 format mode, now. + * + * fill_snapshot_v2() will die() if parsing fails. */ - size_t size = snapshot->eof - snapshot->start; - char *buf_copy = xmalloc(size); - - memcpy(buf_copy, snapshot->start, size); - clear_snapshot_buffer(snapshot); - snapshot->buf = snapshot->start = buf_copy; - snapshot->eof = buf_copy + size; + fill_snapshot_v2(snapshot); + snapshot->version = 2; } return snapshot; @@ -322,8 +327,18 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, return -1; } - return packed_read_raw_ref_v1(refs, snapshot, refname, - oid, type, failure_errno); + switch (snapshot->version) { + case 1: + return packed_read_raw_ref_v1(refs, snapshot, refname, + oid, type, failure_errno); + + case 2: + return packed_read_raw_ref_v2(refs, snapshot, refname, + oid, type, failure_errno); + + default: + return -1; + } } /* @@ -335,7 +350,16 @@ static int packed_read_raw_ref(struct ref_store *ref_store, const char *refname, */ static int next_record(struct packed_ref_iterator *iter) { - return next_record_v1(iter); + switch (iter->version) { + case 1: + return next_record_v1(iter); + + case 2: + return next_record_v2(iter); + + default: + return -1; + } } static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) @@ -410,6 +434,7 @@ static struct ref_iterator *packed_ref_iterator_begin( struct packed_ref_iterator *iter; struct ref_iterator *ref_iterator; unsigned int required_flags = REF_STORE_READ; + size_t v2_row = 0; if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) required_flags |= REF_STORE_ODB; @@ -422,13 +447,21 @@ static struct ref_iterator *packed_ref_iterator_begin( */ snapshot = get_snapshot(refs); - if (!snapshot) + if (!snapshot || snapshot->version < 0 || snapshot->version > 2) return empty_ref_iterator_begin(); - if (prefix && *prefix) - start = find_reference_location_v1(snapshot, prefix, 0); - else - start = snapshot->start; + if (prefix && *prefix) { + if (snapshot->version 
== 1) + start = find_reference_location_v1(snapshot, prefix, 0); + else + start = find_reference_location_v2(snapshot, prefix, 0, + &v2_row); + } else { + if (snapshot->version == 1) + start = snapshot->start; + else + start = snapshot->refs_chunk; + } if (start == snapshot->eof) return empty_ref_iterator_begin(); @@ -439,6 +472,8 @@ static struct ref_iterator *packed_ref_iterator_begin( iter->snapshot = snapshot; acquire_snapshot(snapshot); + iter->version = snapshot->version; + iter->row = v2_row; iter->pos = start; iter->eof = snapshot->eof; diff --git a/refs/packed-backend.h b/refs/packed-backend.h index e76f26bfc46..3a8649857f1 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -72,6 +72,9 @@ struct snapshot { /* Is the `packed-refs` file currently mmapped? */ int mmapped; + /* which file format version is this file? */ + int version; + /* * The contents of the `packed-refs` file: * @@ -96,6 +99,14 @@ struct snapshot { */ enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled; + /************************* + * packed-refs v2 values * + *************************/ + size_t nr; + size_t buflen; + const unsigned char *offset_chunk; + const char *refs_chunk; + /* * Count of references to this instance, including the pointer * from `packed_ref_store::snapshot`, if any. The instance @@ -112,6 +123,8 @@ struct snapshot { struct stat_validity validity; }; +int release_snapshot(struct snapshot *snapshot); + /* * If the buffer in `snapshot` is active, then either munmap the * memory and close the file, or free the memory. 
Then set the buffer @@ -175,21 +188,30 @@ struct packed_ref_store { */ struct packed_ref_iterator { struct ref_iterator base; - struct snapshot *snapshot; + struct repository *repo; + unsigned int flags; + int version; + + /* Scratch space for current values: */ + struct object_id oid, peeled; + struct strbuf refname_buf; /* The current position in the snapshot's buffer: */ const char *pos; + /*********************************** + * packed-refs v1 iterator values. * + ***********************************/ + /* The end of the part of the buffer that will be iterated over: */ const char *eof; - /* Scratch space for current values: */ - struct object_id oid, peeled; - struct strbuf refname_buf; - - struct repository *repo; - unsigned int flags; + /*********************************** + * packed-refs v2 iterator values. * + ***********************************/ + size_t nr; + size_t row; }; typedef int (*write_ref_fn)(const char *refname, @@ -243,6 +265,42 @@ int write_packed_entry_v1(const char *refname, const struct object_id *peeled, void *write_data); +/** + * Parse the buffer at the given snapshot to verify that it is a + * packed-refs file in version 1 format. Update the snapshot->peeled + * value according to the header information. Update the given + * 'sorted' value with whether or not the packed-refs file is sorted. + */ +int parse_packed_format_v1_header(struct packed_ref_store *refs, + struct snapshot *snapshot, + int *sorted); + +int detect_packed_format_v2_header(struct packed_ref_store *refs, + struct snapshot *snapshot); +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL. If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. 
In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. + */ +const char *find_reference_location_v2(struct snapshot *snapshot, + const char *refname, int mustexist, + size_t *pos); + +int packed_read_raw_ref_v2(struct packed_ref_store *refs, struct snapshot *snapshot, + const char *refname, struct object_id *oid, + unsigned int *type, int *failure_errno); +int next_record_v2(struct packed_ref_iterator *iter); +void fill_snapshot_v2(struct snapshot *snapshot); + struct write_packed_refs_v2_context; struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, struct string_list *updates, diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index 044cc9f629a..d75df9545ec 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -15,6 +15,215 @@ #define CHREFS_CHUNKID_OFFSETS 0x524F4646 /* "ROFF" */ #define CHREFS_CHUNKID_REFS 0x52454653 /* "REFS" */ +int detect_packed_format_v2_header(struct packed_ref_store *refs, + struct snapshot *snapshot) +{ + /* + * packed-refs v1 might not have a header, so report whether the + * v2 signature is present; callers treat its absence as v1. + */ + return get_be32(snapshot->buf) == PACKED_REFS_SIGNATURE; +} + +static const char *get_nth_ref(struct snapshot *snapshot, + size_t n) +{ + uint64_t offset; + + if (n >= snapshot->nr) + BUG("asking for position %"PRIu64" outside of bounds (%"PRIu64")", + (uint64_t)n, (uint64_t)snapshot->nr); + + if (n) + offset = get_be64(snapshot->offset_chunk + (n-1) * sizeof(uint64_t)) + & ~OFFSET_IS_PEELED; + else + offset = 0; + + return snapshot->refs_chunk + offset; +} + +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL.
If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. + */ +const char *find_reference_location_v2(struct snapshot *snapshot, + const char *refname, int mustexist, + size_t *pos) +{ + size_t lo = 0, hi = snapshot->nr; + + while (lo != hi) { + const char *rec; + int cmp; + size_t mid = lo + (hi - lo) / 2; + + rec = get_nth_ref(snapshot, mid); + cmp = strcmp(rec, refname); + if (cmp < 0) { + lo = mid + 1; + } else if (cmp > 0) { + hi = mid; + } else { + if (pos) + *pos = mid; + return rec; + } + } + + if (mustexist) { + return NULL; + } else { + const char *ret; + /* + * We are likely doing a prefix match, so use the current + * 'lo' position as the indicator. + */ + if (pos) + *pos = lo; + if (lo >= snapshot->nr) + return NULL; + + ret = get_nth_ref(snapshot, lo); + return ret; + } +} + +int packed_read_raw_ref_v2(struct packed_ref_store *refs, struct snapshot *snapshot, + const char *refname, struct object_id *oid, + unsigned int *type, int *failure_errno) +{ + const char *rec; + + *type = 0; + + rec = find_reference_location_v2(snapshot, refname, 1, NULL); + + if (!rec) { + /* refname is not a packed reference. 
*/ + *failure_errno = ENOENT; + return -1; + } + + hashcpy(oid->hash, (const unsigned char *)rec + strlen(rec) + 1); + oid->algo = hash_algo_by_ptr(the_hash_algo); + + *type = REF_ISPACKED; + return 0; +} + +static int packed_refs_read_offsets(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct snapshot *snapshot = data; + + snapshot->offset_chunk = chunk_start; + snapshot->nr = chunk_size / sizeof(uint64_t); + return 0; +} + +void fill_snapshot_v2(struct snapshot *snapshot) +{ + uint32_t file_signature, file_version, hash_version; + struct chunkfile *cf; + + file_signature = get_be32(snapshot->buf); + if (file_signature != PACKED_REFS_SIGNATURE) + die(_("%s file signature %X does not match signature %X"), + "packed-refs", file_signature, PACKED_REFS_SIGNATURE); + + file_version = get_be32(snapshot->buf + sizeof(uint32_t)); + if (file_version != 2) + die(_("format version %u does not match expected file version %u"), + file_version, 2); + + hash_version = get_be32(snapshot->buf + 2 * sizeof(uint32_t)); + if (hash_version != the_hash_algo->format_id) + die(_("hash version %X does not match expected hash version %X"), + hash_version, the_hash_algo->format_id); + + cf = init_chunkfile(NULL); + + if (read_trailing_table_of_contents(cf, (const unsigned char *)snapshot->buf, snapshot->buflen)) { + release_snapshot(snapshot); + snapshot = NULL; + goto cleanup; + } + + read_chunk(cf, CHREFS_CHUNKID_OFFSETS, packed_refs_read_offsets, snapshot); + pair_chunk(cf, CHREFS_CHUNKID_REFS, (const unsigned char**)&snapshot->refs_chunk); + + /* TODO: add error checks for invalid chunk combinations. */ + +cleanup: + free_chunkfile(cf); +} + +/* + * Move the iterator to the next record in the snapshot, without + * respect for whether the record is actually required by the current + * iteration. Adjust the fields in `iter` and return `ITER_OK` or + * `ITER_DONE`. This function does not free the iterator in the case + * of `ITER_DONE`.
+ */ +int next_record_v2(struct packed_ref_iterator *iter) +{ + uint64_t offset; + const char *pos = iter->pos; + strbuf_reset(&iter->refname_buf); + + if (iter->row == iter->snapshot->nr) + return ITER_DONE; + + iter->base.flags = REF_ISPACKED; + + strbuf_addstr(&iter->refname_buf, pos); + iter->base.refname = iter->refname_buf.buf; + pos += strlen(pos) + 1; + + hashcpy(iter->oid.hash, (const unsigned char *)pos); + iter->oid.algo = hash_algo_by_ptr(the_hash_algo); + pos += the_hash_algo->rawsz; + + if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(iter->base.refname)) + die("packed refname is dangerous: %s", + iter->base.refname); + oidclr(&iter->oid); + iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN; + } + + /* We always know the peeled value! */ + iter->base.flags |= REF_KNOWS_PEELED; + + offset = get_be64(iter->snapshot->offset_chunk + sizeof(uint64_t) * iter->row); + if (offset & OFFSET_IS_PEELED) { + hashcpy(iter->peeled.hash, (const unsigned char *)pos); + iter->peeled.algo = hash_algo_by_ptr(the_hash_algo); + } else { + oidclr(&iter->peeled); + } + + /* TODO: somehow all tags are getting OFFSET_IS_PEELED even though + * some are not annotated tags. 
+ */ + iter->pos = iter->snapshot->refs_chunk + (offset & (~OFFSET_IS_PEELED)); + + iter->row++; + + return ITER_OK; +} + struct write_packed_refs_v2_context { struct packed_ref_store *refs; struct string_list *updates; diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 03c713ac4f6..571ba518ef1 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -73,9 +73,24 @@ test_expect_success 'extensions.refFormat=files,packed-v2' ' test_must_fail git rev-parse refs/tags/Q && rm -f .git/packed-refs && + git for-each-ref --format="%(refname) %(objectname)" >expect-all && + git for-each-ref --format="%(refname) %(objectname)" \ + refs/tags/* >expect-tags && + # Create a v2 packed-refs file git pack-refs --all && - test_path_exists .git/packed-refs + test_path_exists .git/packed-refs && + for t in A B + do + test_path_is_missing .git/refs/tags/$t && + git rev-parse refs/tags/$t || return 1 + done && + + git for-each-ref --format="%(refname) %(objectname)" >actual-all && + test_cmp expect-all actual-all && + git for-each-ref --format="%(refname) %(objectname)" \ + refs/tags/* >actual-tags && + test_cmp expect-tags actual-tags ) ' From patchwork Mon Nov 7 18:35:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035101 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D991C43217 for ; Mon, 7 Nov 2022 18:37:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233241AbiKGShm (ORCPT ); Mon, 7 Nov 2022 13:37:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233221AbiKGShE (ORCPT ); Mon, 7 Nov 2022 13:37:04 -0500 Received: from 
mail-wr1-x430.google.com (mail-wr1-x430.google.com [IPv6:2a00:1450:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C28C2612F for ; Mon, 7 Nov 2022 10:36:28 -0800 (PST) Received: by mail-wr1-x430.google.com with SMTP id o4so17577409wrq.6 for ; Mon, 07 Nov 2022 10:36:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=EpMd01WBMeG+LCpD0GW+bvFGhemRwxfM+YE1DxabB84=; b=ZLyCtnL1wMIJHd0CHLyajBvTqpXv2CExtDvXLHyoioibGuin7nW5YBNl5wk5PlhpeY U7kZbHkwbVhKA3bMCB47jwaXDmtyA7QJ2sPuQiiIJAUs22c2hMhWzthIIEea1taoeO/h +LL9NSTOGiDdl2UQcKKi22eV7C+BxB74dVEd+ymESHKkOqd9RJas6aLqo6T0sutlr7W0 F6wdgrsmyZiVDKhb3vERPtIy9weHYcaNd9tEz0Zhsir4r641LNtMHaG0FsHJG+d0KK9w GiJWv31evAvJKLKSFIGiWYuUEzKp2ov1OYPcxuYO05jEMC1jw6qEZpfcVbS0je3PSZjd sTIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EpMd01WBMeG+LCpD0GW+bvFGhemRwxfM+YE1DxabB84=; b=fJ4U6NxVK6EvXMn+5evM+nyh2gT9I6HSBV7p4A/3UWiSRg3r4HJz2YFEf45VVG7b18 sH7y9ooSFz0GA6+Ij3roWHraKCC/mpCCuVEvEgVYbE6207xCJlDUSaVmPCcQGJnZaDPp x91BEIQx/UqIv3GuEB9GFHs/6BrD5af3Qc5nxEBASttw7qr6BSNJnYqqgr11jgf8sNX+ dBSp1QWyzhB+mwj7FcGJO8zuiwbNm9wUc9zHT488t0l4/Vc/7XipCdSqgdE/8bVXaADf g0m18A50Pu3zjvUcYVxVaIeAAe9jPCqVU6vjngThSYGcsS9FlaKaAjKsT739/QTbmuVR SHNQ== X-Gm-Message-State: ACrzQf0N7spNOYujz0Wt+A52eJ2vswwHxTsTj7kcYUzMn67X686L578Y Fiyj6W18rVYXYC+V+awnzdJSih6t0k8= X-Google-Smtp-Source: AMsMyM60ORh/6OlzmT3uPfgB5uvPn8kMOnqOVaDImbJAieP7UE0WYN8F0ut4duDLGpz5TkvAkJgS2w== X-Received: by 2002:a5d:42c4:0:b0:236:637c:6c71 with SMTP id t4-20020a5d42c4000000b00236637c6c71mr32861665wrr.499.1667846186432; Mon, 07 Nov 2022 10:36:26 -0800 (PST) Received: from [127.0.0.1] 
([13.74.141.28]) by smtp.gmail.com with ESMTPSA id j22-20020a05600c1c1600b003a6125562e1sm9478018wms.46.2022.11.07.10.36.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:25 -0800 (PST) Message-Id: <9b3bd93e51e5ed4358c76263e96c4b4e218987b7.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:54 +0000 Subject: [PATCH 20/30] packed-refs: read optional prefix chunks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 2 + refs/packed-backend.h | 9 +++ refs/packed-format-v2.c | 159 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 170 insertions(+) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 549cce1f84a..ae904de9014 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -475,6 +475,8 @@ static struct ref_iterator *packed_ref_iterator_begin( iter->version = snapshot->version; iter->row = v2_row; + init_iterator_prefix_info(prefix, iter); + iter->pos = start; iter->eof = snapshot->eof; strbuf_init(&iter->refname_buf, 0); diff --git a/refs/packed-backend.h b/refs/packed-backend.h index 3a8649857f1..1936bb5c76c 100644 --- a/refs/packed-backend.h +++ b/refs/packed-backend.h @@ -103,9 +103,12 @@ struct snapshot { * packed-refs v2 values * *************************/ size_t nr; + size_t prefixes_nr; size_t buflen; const unsigned char *offset_chunk; const char *refs_chunk; + const unsigned char *prefix_offsets_chunk; + const char *prefix_chunk; /* * Count of references to this instance, including the pointer @@ -212,6 +215,9 @@ struct packed_ref_iterator { ***********************************/ size_t nr; size_t row; + size_t prefix_row_end; + size_t prefix_i; + const char *cur_prefix; }; typedef int (*write_ref_fn)(const char *refname, @@ -308,4 +314,7 
@@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store * int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx); void free_v2_context(struct write_packed_refs_v2_context *ctx); +void init_iterator_prefix_info(const char *prefix, + struct packed_ref_iterator *iter); + #endif /* REFS_PACKED_BACKEND_H */ diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index d75df9545ec..0ab277f7ad4 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -14,6 +14,79 @@ #define PACKED_REFS_SIGNATURE 0x50524546 /* "PREF" */ #define CHREFS_CHUNKID_OFFSETS 0x524F4646 /* "ROFF" */ #define CHREFS_CHUNKID_REFS 0x52454653 /* "REFS" */ +#define CHREFS_CHUNKID_PREFIX_DATA 0x50465844 /* "PFXD" */ +#define CHREFS_CHUNKID_PREFIX_OFFSETS 0x5046584F /* "PFXO" */ + +static const char *get_nth_prefix(struct snapshot *snapshot, + size_t n, size_t *len) +{ + uint64_t offset, next_offset; + + if (n >= snapshot->prefixes_nr) + BUG("asking for prefix %"PRIu64" outside of bounds (%"PRIu64")", + (uint64_t)n, (uint64_t)snapshot->prefixes_nr); + + if (n) + offset = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * (n - 1)); + else + offset = 0; + + if (len) { + next_offset = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * n); + + /* Prefix includes null terminator. */ + *len = next_offset - offset - 1; + } + + return snapshot->prefix_chunk + offset; +} + +/* + * Find the place in `snapshot->buf` where the start of the record for + * `refname` starts. If `mustexist` is true and the reference doesn't + * exist, then return NULL. If `mustexist` is false and the reference + * doesn't exist, then return the point where that reference would be + * inserted, or `snapshot->eof` (which might be NULL) if it would be + * inserted at the end of the file. 
In the latter mode, `refname` + * doesn't have to be a proper reference name; for example, one could + * search for "refs/replace/" to find the start of any replace + * references. + * + * The record is sought using a binary search, so `snapshot->buf` must + * be sorted. + */ +static const char *find_prefix_location(struct snapshot *snapshot, + const char *refname, size_t *pos) +{ + size_t lo = 0, hi = snapshot->prefixes_nr; + + while (lo != hi) { + const char *rec; + int cmp; + size_t len; + size_t mid = lo + (hi - lo) / 2; + + rec = get_nth_prefix(snapshot, mid, &len); + cmp = strncmp(rec, refname, len); + if (cmp < 0) { + lo = mid + 1; + } else if (cmp > 0) { + hi = mid; + } else { + /* we have a prefix match! */ + *pos = mid; + return rec; + } + } + + *pos = lo; + if (lo < snapshot->prefixes_nr) + return get_nth_prefix(snapshot, lo, NULL); + else + return NULL; +} int detect_packed_format_v2_header(struct packed_ref_store *refs, struct snapshot *snapshot) @@ -63,6 +136,46 @@ const char *find_reference_location_v2(struct snapshot *snapshot, { size_t lo = 0, hi = snapshot->nr; + if (snapshot->prefix_chunk) { + size_t prefix_row; + const char *prefix; + int found = 1; + + prefix = find_prefix_location(snapshot, refname, &prefix_row); + + if (!prefix || !starts_with(refname, prefix)) { + if (mustexist) + return NULL; + found = 0; + } + + /* The second 4-byte column of the prefix offsets */ + if (prefix_row) { + /* if prefix_row == 0, then lo = 0, which is already true. */ + lo = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * (prefix_row - 1) + sizeof(uint32_t)); + } + + if (!found) { + const char *ret; + /* Terminate early with this lo position as the insertion point. 
*/ + if (pos) + *pos = lo; + + if (lo >= snapshot->nr) + return NULL; + + ret = get_nth_ref(snapshot, lo); + return ret; + } + + hi = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * prefix_row + sizeof(uint32_t)); + + if (prefix) + refname += strlen(prefix); + } + while (lo != hi) { const char *rec; int cmp; @@ -132,6 +245,16 @@ static int packed_refs_read_offsets(const unsigned char *chunk_start, return 0; } +static int packed_refs_read_prefix_offsets(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct snapshot *snapshot = data; + + snapshot->prefix_offsets_chunk = chunk_start; + snapshot->prefixes_nr = chunk_size / sizeof(uint64_t); + return 0; +} + void fill_snapshot_v2(struct snapshot *snapshot) { uint32_t file_signature, file_version, hash_version; @@ -163,6 +286,9 @@ void fill_snapshot_v2(struct snapshot *snapshot) read_chunk(cf, CHREFS_CHUNKID_OFFSETS, packed_refs_read_offsets, snapshot); pair_chunk(cf, CHREFS_CHUNKID_REFS, (const unsigned char**)&snapshot->refs_chunk); + read_chunk(cf, CHREFS_CHUNKID_PREFIX_OFFSETS, packed_refs_read_prefix_offsets, snapshot); + pair_chunk(cf, CHREFS_CHUNKID_PREFIX_DATA, (const unsigned char**)&snapshot->prefix_chunk); + /* TODO: add error checks for invalid chunk combinations. 
*/ cleanup: @@ -187,6 +313,8 @@ int next_record_v2(struct packed_ref_iterator *iter) iter->base.flags = REF_ISPACKED; + if (iter->cur_prefix) + strbuf_addstr(&iter->refname_buf, iter->cur_prefix); strbuf_addstr(&iter->refname_buf, pos); iter->base.refname = iter->refname_buf.buf; pos += strlen(pos) + 1; @@ -221,9 +349,40 @@ int next_record_v2(struct packed_ref_iterator *iter) iter->row++; + if (iter->row == iter->prefix_row_end && iter->snapshot->prefix_chunk) { + size_t prefix_pos = get_be32(iter->snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * iter->prefix_i); + iter->cur_prefix = iter->snapshot->prefix_chunk + prefix_pos; + iter->prefix_i++; + iter->prefix_row_end = get_be32(iter->snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * iter->prefix_i + sizeof(uint32_t)); + } + return ITER_OK; } +void init_iterator_prefix_info(const char *prefix, + struct packed_ref_iterator *iter) +{ + struct snapshot *snapshot = iter->snapshot; + + if (snapshot->version != 2 || !snapshot->prefix_chunk) { + iter->prefix_row_end = snapshot->nr; + return; + } + + if (prefix) + iter->cur_prefix = find_prefix_location(snapshot, prefix, &iter->prefix_i); + else { + iter->cur_prefix = snapshot->prefix_chunk; + iter->prefix_i = 0; + } + + iter->prefix_row_end = get_be32(snapshot->prefix_offsets_chunk + + 2 * sizeof(uint32_t) * iter->prefix_i + + sizeof(uint32_t)); +} + struct write_packed_refs_v2_context { struct packed_ref_store *refs; struct string_list *updates; From patchwork Mon Nov 7 18:35:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035102 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 57F1AC4321E for ; Mon, 7 Nov 2022 18:37:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S233180AbiKGShn (ORCPT ); Mon, 7 Nov 2022 13:37:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232565AbiKGShF (ORCPT ); Mon, 7 Nov 2022 13:37:05 -0500 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4878527FF2 for ; Mon, 7 Nov 2022 10:36:28 -0800 (PST) Received: by mail-wr1-x431.google.com with SMTP id j15so17560044wrq.3 for ; Mon, 07 Nov 2022 10:36:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=EaTWpV8iOTrU68jhsZKnAzcJGc1CKmBQdwvli0ieM6s=; b=J7OSXosLeQBeQmEaMmXwns/qKyCZnzH2eem3MlEekvd5QKNNcgst34xv0PJWdt4FOR cJAw04CUirx5AeWado0Bo+XJNJwxo5TkTv5q0YuOB+URW5/QKUBGPidEjkPmxDn+sUOv uQs9gDpcWFcw5U+2D/Y5ziGZtpcYcDMUWCOeIqEURZoD8/9S4QJ/pZtu8ljaDCvskXH8 SuJj8k4IyV9tSikmeedOlc6SeyeOj0P46cPik944RSCKVvAhrScMHoHyvTSNcX3VklDW t//IKagqW5Uzx3gmu64B3dbNmjSCRMaJZHzfUB1YU80FeHW5AgkbMuDEF/P57UeGW/1D YfLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EaTWpV8iOTrU68jhsZKnAzcJGc1CKmBQdwvli0ieM6s=; b=ZZJ300EQqTNSbubpMCDQPJBI16IQnctMqvuDywrMFWwVQ6JxHAEHKZI0Q6aB66MfYV x5WztPiqM/4TE6itQ25uuLDJScsgNDJojfdVqvasc4zJ25w1g+UjeNq5yHq97MRKTL7m yIFpUbKVVTwjGgthp0zhqhV7znlAOSlSAc5Sx0eSxiq9HaiB7McaJVws0CkdXkrjWgGK pdizd5Szl4VVgCo7u4Qkb2/8ipXF54zNQzMDnGy4xoc47ODpGP8c/GwvAs3WP8/XqPwT Hvra4x2NfmBHwZkEq8G58Xn0hrQCimNDskgd+UVFLPC8x51IvtCGYUGauzp06dJpX20S b4YQ== X-Gm-Message-State: ACrzQf27xU0C0JyQQ26Fe1MqOOE/Oyz1kvMfhKPNZVEnqtoCxSgHc+pb 
DxTirks7rbOye9ufykDvEGkTYKCyChk= X-Google-Smtp-Source: AMsMyM7NSibu9CiZrIKcKCC12X/hhOfzonGxR75Ee4YgYjTgKSwbhzRbLD5ZeBBSEaPGQuSh2jLMDg== X-Received: by 2002:a5d:5257:0:b0:236:8a38:4e08 with SMTP id k23-20020a5d5257000000b002368a384e08mr32236941wrc.118.1667846187614; Mon, 07 Nov 2022 10:36:27 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id v3-20020a1cac03000000b003c6f3e5ba42sm12274605wme.46.2022.11.07.10.36.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:26 -0800 (PST) Message-Id: <36f9aa02ebfb967799036c4a0a648ab332c2612b.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:55 +0000 Subject: [PATCH 21/30] packed-refs: write prefix chunks Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Tests already cover that we will start reading these prefixes. TODO: discuss time and space savings over typical approach. 
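As an illustration only (not part of this patch), the rule the writer uses to pick a prefix for each refname can be sketched as a standalone helper. The name `ref_prefix_len` is hypothetical and does not exist in the series. The sketch assumes the rule visible in the diff below: the prefix runs through the second '/' when there is one, otherwise through the first '/', otherwise it is the whole refname.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: return the number of
 * leading bytes of `refname` that would be stored as its prefix.
 */
static size_t ref_prefix_len(const char *refname)
{
	const char *slash, *slashslash = NULL;

	assert(refname != NULL);

	/* Look for the first slash, then for a second one after it. */
	slash = strchr(refname, '/');
	if (slash)
		slashslash = strchr(slash + 1, '/');

	/* Prefer the second slash when it exists. */
	if (slashslash)
		slash = slashslash;

	/* Include the slash itself; with no slash, use the whole name. */
	return slash ? (size_t)(slash - refname) + 1 : strlen(refname);
}
```

Under that assumption, "refs/heads/main" splits into the prefix "refs/heads/" and the suffix "main", while "refs/top" keeps only "refs/" as its prefix. Refs that share a prefix then store just their suffix in the refs chunk, which is where any space savings would come from.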
Signed-off-by: Derrick Stolee --- refs/packed-format-v2.c | 103 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 103 insertions(+) diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c index 0ab277f7ad4..2cd45a5987a 100644 --- a/refs/packed-format-v2.c +++ b/refs/packed-format-v2.c @@ -398,6 +398,18 @@ struct write_packed_refs_v2_context { uint64_t *offsets; size_t nr; size_t offsets_alloc; + + int write_prefixes; + const char *cur_prefix; + size_t cur_prefix_len; + + char **prefixes; + uint32_t *prefix_offsets; + uint32_t *prefix_rows; + size_t prefix_nr; + size_t prefixes_alloc; + size_t prefix_offsets_alloc; + size_t prefix_rows_alloc; }; struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *refs, @@ -434,6 +446,56 @@ static int write_packed_entry_v2(const char *refname, ALLOC_GROW(ctx->offsets, i + 1, ctx->offsets_alloc); + if (ctx->write_prefixes) { + if (ctx->cur_prefix && starts_with(refname, ctx->cur_prefix)) { + /* skip ahead! */ + refname += ctx->cur_prefix_len; + reflen -= ctx->cur_prefix_len; + } else { + size_t len; + const char *slash, *slashslash = NULL; + if (ctx->prefix_nr) { + /* close out the old prefix. */ + ctx->prefix_rows[ctx->prefix_nr - 1] = ctx->nr; + } + + /* Find the new prefix. */ + slash = strchr(refname, '/'); + if (slash) + slashslash = strchr(slash + 1, '/'); + /* If there are two slashes, use that. */ + slash = slashslash ? slashslash : slash; + /* + * If there is at least one slash, use that, + * and include the slash in the string. + * Otherwise, use the end of the ref. + */ + slash = slash ? 
slash + 1 : refname + strlen(refname); + + len = slash - refname; + ALLOC_GROW(ctx->prefixes, ctx->prefix_nr + 1, ctx->prefixes_alloc); + ALLOC_GROW(ctx->prefix_offsets, ctx->prefix_nr + 1, ctx->prefix_offsets_alloc); + ALLOC_GROW(ctx->prefix_rows, ctx->prefix_nr + 1, ctx->prefix_rows_alloc); + + if (ctx->prefix_nr) + ctx->prefix_offsets[ctx->prefix_nr] = ctx->prefix_offsets[ctx->prefix_nr - 1] + len + 1; + else + ctx->prefix_offsets[ctx->prefix_nr] = len + 1; + + ctx->prefixes[ctx->prefix_nr] = xstrndup(refname, len); + ctx->cur_prefix = ctx->prefixes[ctx->prefix_nr]; + ctx->prefix_nr++; + + refname += len; + reflen -= len; + ctx->cur_prefix_len = len; + } + + /* Update the last row continually. */ + ctx->prefix_rows[ctx->prefix_nr - 1] = i + 1; + } + + /* Write entire ref, including null terminator. */ hashwrite(ctx->f, refname, reflen); hashwrite(ctx->f, oid->hash, the_hash_algo->rawsz); @@ -483,13 +545,54 @@ static int write_refs_chunk_offsets(struct hashfile *f, return 0; } +static int write_refs_chunk_prefix_data(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + size_t i; + + trace2_region_enter("refs", "prefix-data", the_repository); + for (i = 0; i < ctx->prefix_nr; i++) { + size_t len = strlen(ctx->prefixes[i]) + 1; + hashwrite(f, ctx->prefixes[i], len); + + /* TODO: assert the prefix lengths match the stored offsets? 
*/ + } + + trace2_region_leave("refs", "prefix-data", the_repository); + return 0; +} + +static int write_refs_chunk_prefix_offsets(struct hashfile *f, + void *data) +{ + struct write_packed_refs_v2_context *ctx = data; + size_t i; + + trace2_region_enter("refs", "prefix-offsets", the_repository); + for (i = 0; i < ctx->prefix_nr; i++) { + hashwrite_be32(f, ctx->prefix_offsets[i]); + hashwrite_be32(f, ctx->prefix_rows[i]); + } + + trace2_region_leave("refs", "prefix-offsets", the_repository); + return 0; +} + int write_packed_refs_v2(struct write_packed_refs_v2_context *ctx) { unsigned char file_hash[GIT_MAX_RAWSZ]; + ctx->write_prefixes = git_env_bool("GIT_TEST_WRITE_PACKED_REFS_PREFIXES", 1); + add_chunk(ctx->cf, CHREFS_CHUNKID_REFS, 0, write_refs_chunk_refs); add_chunk(ctx->cf, CHREFS_CHUNKID_OFFSETS, 0, write_refs_chunk_offsets); + if (ctx->write_prefixes) { + add_chunk(ctx->cf, CHREFS_CHUNKID_PREFIX_DATA, 0, write_refs_chunk_prefix_data); + add_chunk(ctx->cf, CHREFS_CHUNKID_PREFIX_OFFSETS, 0, write_refs_chunk_prefix_offsets); + } + hashwrite_be32(ctx->f, PACKED_REFS_SIGNATURE); hashwrite_be32(ctx->f, 2); hashwrite_be32(ctx->f, the_hash_algo->format_id); From patchwork Mon Nov 7 18:35:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035103 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11921C4332F for ; Mon, 7 Nov 2022 18:37:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233139AbiKGShp (ORCPT ); Mon, 7 Nov 2022 13:37:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233225AbiKGShF (ORCPT ); Mon, 7 Nov 2022 13:37:05 -0500 
Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 284B227FF7 for ; Mon, 7 Nov 2022 10:36:29 -0800 (PST) Received: by mail-wm1-x330.google.com with SMTP id p16so7428387wmc.3 for ; Mon, 07 Nov 2022 10:36:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=2hmF6hnASy8Y3vGNvwDfIvoW1XYePf4jAtVdIcEfgOU=; b=p9f+Pmw7P0a/rl0cfkThUq0KYqU5fEqMzDi7T4ymnG98fuVMTlphq6fy/DluYqrqUz lqdVSRucvKRrAZRvZwWb0ztQv4FRcsOoBLPb2NwrKlvMtYWBXWMuIbw9Nph0TNY6Lbj2 Uut01J84OGsHjeE1nguUTEWAOA+qsvPCdA0fswr7vQVbFxKdG64QVkHCK3I3HSMuySL2 prR2eCl0Oh4cysyTnblc29eq5oxmBUBVoYqO3Xs/8tnjPCiOOV2zr026EaON7bfxU5L7 sTqSk4B5CuX1LGyq+NjZY476On6Uztjj2yAvt6ZdUW9FngQvwub4eW+cBGR1yaF6kGiS W/3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2hmF6hnASy8Y3vGNvwDfIvoW1XYePf4jAtVdIcEfgOU=; b=xrEJbL0L2dBEc95TVxJUWw0rq7gpnjI2MHHHR6xQZFYt82/4yqY/bX99yyfOfcJ3SW 7JaX/7Noe24AaKoyg/JIwzD2mhUdYqIxbRcx3y74v/wdYUIvqWg23Kptldg80eyETZjj iDhFOelioqjU36GyeiC7STVZHUxpGgGViW7AF23lNM10IShfB3BMlsVgtdNwSShjEzaw FCPJtXTgI3d+Z+dAgD7hmryqWSd0HXttDDUnyNKBwy+ijh114yBI6cxWBtABqEKWL8QS p5Zyfb1NPd3JzvoYUEG3IYnWgV/QlbGTn8rLQEg5j/eBAKnUB4kVIZWXcsL1IEi5nW0k J7Og== X-Gm-Message-State: ACrzQf00gkW6HUq7HfBSD71aHVXc64ohGuVdSD3xTjjAWn2nDJDso872 ZH1Xdt/6PWPZDYT7WBs3VXGQ/xktXKM= X-Google-Smtp-Source: AMsMyM6BzCpfR+t7sI3ROJlSSOY6di44hHeQ3UOhgCanx9T9GJc9KmZS5d0AUxFIsP03d+u2Rpppww== X-Received: by 2002:a1c:a107:0:b0:3cf:a25f:eef2 with SMTP id k7-20020a1ca107000000b003cfa25feef2mr8459445wme.195.1667846188534; Mon, 07 Nov 2022 10:36:28 -0800 (PST) Received: from 
[127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i2-20020a05600c354200b003c71358a42dsm16405875wmq.18.2022.11.07.10.36.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:28 -0800 (PST) Message-Id: <6fe60ef2f53e680b047581eefc629144048b2224.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:56 +0000 Subject: [PATCH 22/30] packed-backend: create GIT_TEST_PACKED_REFS_VERSION Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee When set, this will create a default value for the packed-refs file version on writes. When set to "2", it will automatically add the "packed-v2" value to extensions.refFormat. Not all tests pass with GIT_TEST_PACKED_REFS_VERSION=2 because they care specifically about the content of the packed-refs file. These tests will be updated in following changes. To start, though, disable the GIT_TEST_PACKED_REFS_VERSION environment variable in t3212-ref-formats.sh, since that script already tests both versions, including upgrade scenarios. Signed-off-by: Derrick Stolee --- refs/packed-backend.c | 3 ++- setup.c | 5 ++++- t/t3212-ref-formats.sh | 3 +++ 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/refs/packed-backend.c b/refs/packed-backend.c index ae904de9014..e84f669c42e 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -807,7 +807,8 @@ static int write_with_updates(struct packed_ref_store *refs, } strbuf_release(&sb); - if (git_config_get_int("refs.packedrefsversion", &version)) { + if (!(version = git_env_ulong("GIT_TEST_PACKED_REFS_VERSION", 0)) && + git_config_get_int("refs.packedrefsversion", &version)) { /* * Set the default depending on the current extension * list. 
Default to version 1 if available, but allow a diff --git a/setup.c b/setup.c index 72bfa289ade..a4525732fe9 100644 --- a/setup.c +++ b/setup.c @@ -732,8 +732,11 @@ int read_repository_format(struct repository_format *format, const char *path) clear_repository_format(format); /* Set default ref_format if no extensions.refFormat exists. */ - if (!format->ref_format_count) + if (!format->ref_format_count) { format->ref_format = REF_FORMAT_FILES | REF_FORMAT_PACKED; + if (git_env_ulong("GIT_TEST_PACKED_REFS_VERSION", 0) == 2) + format->ref_format |= REF_FORMAT_PACKED_V2; + } return format->version; } diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh index 571ba518ef1..5583f16db41 100755 --- a/t/t3212-ref-formats.sh +++ b/t/t3212-ref-formats.sh @@ -2,6 +2,9 @@ test_description='test across ref formats' +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh test_expect_success 'extensions.refFormat requires core.repositoryFormatVersion=1' ' From patchwork Mon Nov 7 18:35:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035105 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3882AC4332F for ; Mon, 7 Nov 2022 18:37:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233266AbiKGSht (ORCPT ); Mon, 7 Nov 2022 13:37:49 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233233AbiKGShG (ORCPT ); Mon, 7 Nov 2022 13:37:06 -0500 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E544286F7 for ; Mon, 7 Nov 2022 
10:36:31 -0800 (PST) Received: by mail-wr1-x42c.google.com with SMTP id g12so17518054wrs.10 for ; Mon, 07 Nov 2022 10:36:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=FyDhYllrkxRfsgW7soBJoMJVOrtdrnjGA3pcE7VQQbs=; b=Zx3ZmsUmnvhfTQDQwNrda7dhgp+9KoCEFPkOCNuXCsS92ScYBAkqPgwzMkpobBKSNH SGfN9B7iTyjSUdtg80K28lggbLwy8AgK3/rxW0FTKHOu6SHRQSnDCrPuhShWIkQBJSiU fgnRU6HcxjmKhmfwigBc6l855hjdcKXXRAk2eouQ5QwnBf/A+2xcbh+4CO3eEEM4lNI8 H4JOmKOilgh81eABqVFS3mRafk44AprHh4ndghbDWxDTaFpNZHqoaomMglOKF9hytcsq sdZX6VLa9QqFYlwBG6fOuzOiMtSBwhpiDXWQuxapUk70cD4xJPITNP51/LaZdTifXbE2 Am6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=FyDhYllrkxRfsgW7soBJoMJVOrtdrnjGA3pcE7VQQbs=; b=bnbA6IDPEwH+UzzQ2CJHPgQA2+lC1CaeD/OKGPLx+8QsJCce7VXEmk7vSQubzmrbbK /JuUvCabTwSLO2q+2ZMMPgESCLI/ptpMj4iwxlXsUDNTiJ1xdIVPpfazdpEJe1Xh/oAF TknVx72K4P7LFCF2tDaBdfMsKS3koJMYj8cQh0hztcp0Cn8Mvh/2Fy3Fy5IfvWVTvOs/ 3fUgpIx1JvCTq6y8Hy+xTfmWG1K7dOt1KJQ5EGOY4vMgcd2Gw7igSm5WQk/OQSadnCdV 8vCDr+SlJBe3K/EFhcOquJBXs/bUQ7OsCj4e34e4SRFy9UiqSE33CbQ0GVRl0XwBZd6S rDcA== X-Gm-Message-State: ACrzQf04kitL5z6rRIKrUqtlxaU8vVUo4298akEokoLk2NgVtl9qkdqi rwckQEt82f8NLCpKP9uiBH7z+YOWsvg= X-Google-Smtp-Source: AMsMyM6zU9urN5s5bFZfdYYB0OhpgmrS2o6IJkPGUGfPPt5ztxT01JdUwR5rZ8rnPit0r7NmRq9mVQ== X-Received: by 2002:a5d:4b51:0:b0:236:88a2:267f with SMTP id w17-20020a5d4b51000000b0023688a2267fmr31535988wrs.461.1667846189390; Mon, 07 Nov 2022 10:36:29 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id i19-20020a05600c355300b003b49ab8ff53sm9590365wmq.8.2022.11.07.10.36.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 
bits=256/256); Mon, 07 Nov 2022 10:36:29 -0800 (PST) Message-Id: <188a55ddcb876aab4e5476234da5412ace053b7b.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:57 +0000 Subject: [PATCH 23/30] t1409: test with packed-refs v2 Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee t1409-avoid-packing-refs.sh seeks to test that the packed-refs file is not modified unnecessarily. One way it does this is by creating a packed-refs file, then munging its contents and verifying that the munged data remains after other commands. For packed-refs v1, it suffices to add a line that is similar to a comment. For packed-refs v2, we cannot even add to the file without messing up the trailing table of contents of its chunked format. However, we can manipulate the last bytes that are within the trailing hash and use 'tail -c 4' to read them. This makes t1409 pass with GIT_TEST_PACKED_REFS_VERSION=2. Signed-off-by: Derrick Stolee --- t/t1409-avoid-packing-refs.sh | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh index be12fb63506..dc8d58432c8 100755 --- a/t/t1409-avoid-packing-refs.sh +++ b/t/t1409-avoid-packing-refs.sh @@ -8,13 +8,29 @@ test_description='avoid rewriting packed-refs unnecessarily' # shouldn't upset readers, and it should be omitted if the file is # ever rewritten. 
mark_packed_refs () { - sed -e "s/^\(#.*\)/\1 t1409 /" .git/packed-refs >.git/packed-refs.new && - mv .git/packed-refs.new .git/packed-refs + if test "$GIT_TEST_PACKED_REFS_VERSION" = "2" + then + size=$(wc -c < .git/packed-refs) && + pos=$(expr $size - 4) && + printf "FAKE" | dd of=".git/packed-refs" bs=1 seek="$pos" conv=notrunc + else + sed -e "s/^\(#.*\)/\1 t1409 /" .git/packed-refs >.git/packed-refs.new && + mv .git/packed-refs.new .git/packed-refs + fi } # Verify that the packed-refs file is still marked. check_packed_refs_marked () { - grep -q '^#.* t1409 ' .git/packed-refs + if test "$GIT_TEST_PACKED_REFS_VERSION" = "2" + then + size=$(wc -c < .git/packed-refs) && + pos=$(expr $size - 4) && + tail -c 4 .git/packed-refs >actual && + printf "FAKE" >expect && + test_cmp expect actual + else + grep -q '^#.* t1409 ' .git/packed-refs + fi } test_expect_success 'setup' ' From patchwork Mon Nov 7 18:35:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035104 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8139C433FE for ; Mon, 7 Nov 2022 18:37:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233244AbiKGShr (ORCPT ); Mon, 7 Nov 2022 13:37:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50952 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233232AbiKGShG (ORCPT ); Mon, 7 Nov 2022 13:37:06 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5597286F2 for ; Mon, 7 Nov 2022 10:36:30 -0800 (PST) Received: by mail-wr1-x42e.google.com with SMTP id z14so17566109wrn.7 for ; Mon, 07 Nov 
2022 10:36:30 -0800 (PST) X-Received: by 2002:adf:e391:0:b0:236:599c:b9a3 with SMTP id e17-20020adfe391000000b00236599cb9a3mr628115wrm.258.1667846190257; Mon, 07 Nov 2022 10:36:30 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id c2-20020a5d4f02000000b002366553eca7sm7889858wru.83.2022.11.07.10.36.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:29 -0800 (PST) Message-Id:
<191ad7fdef6880738c25c307bbc3c1d66b6378b5.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:58 +0000 Subject: [PATCH 24/30] t5312: allow packed-refs v2 format Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee One test in t3210-pack-refs.sh uses 'grep' to detect that a ref is written in the packed-refs file instead of as a loose ref. This does not work when the packed-refs file is in the v2 format, such as when GIT_TEST_PACKED_REFS_VERSION=2. Since the test already checks that the loose ref is missing, it suffices to check that 'git rev-parse' succeeds. Signed-off-by: Derrick Stolee --- t/t3210-pack-refs.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh index 577f32dc71f..fe6c97d9087 100755 --- a/t/t3210-pack-refs.sh +++ b/t/t3210-pack-refs.sh @@ -159,7 +159,7 @@ test_expect_success 'delete ref while another dangling packed ref' ' test_expect_success 'pack ref directly below refs/' ' git update-ref refs/top HEAD && git pack-refs --all --prune && - grep refs/top .git/packed-refs && + git rev-parse refs/top && test_path_is_missing .git/refs/top ' From patchwork Mon Nov 7 18:35:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035107 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83855C433FE for ; Mon, 7 Nov 2022 18:37:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233183AbiKGShx (ORCPT ); Mon, 7 Nov 2022 13:37:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49992 "EHLO
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233238AbiKGShG (ORCPT ); Mon, 7 Nov 2022 13:37:06 -0500 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E380B28E10 for ; Mon, 7 Nov 2022 10:36:32 -0800 (PST) Received: by mail-wm1-x330.google.com with SMTP id ja4-20020a05600c556400b003cf6e77f89cso8774382wmb.0 for ; Mon, 07 Nov 2022 10:36:32 -0800 (PST)
X-Received: by 2002:a1c:730e:0:b0:3b4:b0c0:d616 with SMTP id d14-20020a1c730e000000b003b4b0c0d616mr34473794wmb.72.1667846191317; Mon, 07 Nov 2022 10:36:31 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id b18-20020a056000055200b00236545edc91sm7856335wrf.76.2022.11.07.10.36.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:30 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:35:59 +0000 Subject: [PATCH 25/30] t5502: add PACKED_REFS_V1 prerequisite Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The last test in t5502-quickfetch.sh exploits the packed-refs v1 file format by appending 1000 lines to the packed-refs file. If the packed-refs file is in the v2 format, appending these lines corrupts the file and makes it unreadable. Instead of making the test slower, let's skip it when GIT_TEST_PACKED_REFS_VERSION=2. The test is really about 'git fetch', not the packed-refs format. Create a PACKED_REFS_V1 prerequisite in case we want to use this technique again in the future. An alternative would be to write those 1000 refs using a different mechanism, but let's opt for the simpler case for now.
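As an illustration only, the check behind such a prerequisite might be sketched in plain shell as follows. The helper name packed_refs_is_v1 is hypothetical and not part of Git's test library. The sketch assumes a string comparison, so that an unset or empty GIT_TEST_PACKED_REFS_VERSION is treated as "v1" instead of causing a numeric-comparison error.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git's test library): succeeds unless
# the test run explicitly requests the packed-refs v2 format.
packed_refs_is_v1 () {
	# String comparison: an unset or empty variable counts as "not 2".
	test "${GIT_TEST_PACKED_REFS_VERSION:-}" != "2"
}

# Example use: guard a step that depends on the v1 text format.
if packed_refs_is_v1
then
	echo "packed-refs v1 assumed: text-based checks are safe"
else
	echo "packed-refs v2 requested: skipping text-based checks"
fi
```

With the variable unset, or set to any value other than "2", the helper succeeds and the guarded step would run.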
Signed-off-by: Derrick Stolee --- t/t5502-quickfetch.sh | 2 +- t/test-lib.sh | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/t/t5502-quickfetch.sh b/t/t5502-quickfetch.sh index b160f8b7fb7..0c4aadebae6 100755 --- a/t/t5502-quickfetch.sh +++ b/t/t5502-quickfetch.sh @@ -122,7 +122,7 @@ test_expect_success 'quickfetch should not copy from alternate' ' ' -test_expect_success 'quickfetch should handle ~1000 refs (on Windows)' ' +test_expect_success PACKED_REFS_V1 'quickfetch should handle ~1000 refs (on Windows)' ' git gc && head=$(git rev-parse HEAD) && diff --git a/t/test-lib.sh b/t/test-lib.sh index 6db377f68b8..a244cd75c06 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1954,3 +1954,7 @@ test_lazy_prereq FSMONITOR_DAEMON ' git version --build-options >output && grep "feature: fsmonitor--daemon" output ' + +test_lazy_prereq PACKED_REFS_V1 ' + test "$GIT_TEST_PACKED_REFS_VERSION" != "2" +' \ No newline at end of file From patchwork Mon Nov 7 18:36:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035106 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 47610C433FE for ; Mon, 7 Nov 2022 18:37:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233268AbiKGShv (ORCPT ); Mon, 7 Nov 2022 13:37:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50070 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233239AbiKGShG (ORCPT ); Mon, 7 Nov 2022 13:37:06 -0500 Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B083A28723 for ; Mon, 7 Nov 2022 10:36:32 -0800 (PST) Received: by
mail-wr1-x42e.google.com with SMTP id w14so17542885wru.8 for ; Mon, 07 Nov 2022 10:36:32 -0800 (PST) X-Received: by 2002:adf:e90d:0:b0:236:7129:d7e6 with SMTP id f13-20020adfe90d000000b002367129d7e6mr33248312wrm.398.1667846192125; Mon, 07 Nov 2022 10:36:32 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id a17-20020adfed11000000b00236863c02f5sm7842128wro.96.2022.11.07.10.36.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022
10:36:31 -0800 (PST) Message-Id: In-Reply-To: References: Date: Mon, 07 Nov 2022 18:36:00 +0000 Subject: [PATCH 26/30] t3210: require packed-refs v1 for some tests Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee Three tests in t3210-pack-refs.sh corrupt a packed-refs file to test that Git properly discovers and handles those failures. These tests assume that the file is in the v1 format, so add the PACKED_REFS_V1 prereq to skip these tests when GIT_TEST_PACKED_REFS_VERSION=2. Signed-off-by: Derrick Stolee --- t/t3210-pack-refs.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh index fe6c97d9087..76251dfe05a 100755 --- a/t/t3210-pack-refs.sh +++ b/t/t3210-pack-refs.sh @@ -197,7 +197,7 @@ test_expect_success 'notice d/f conflict with existing ref' ' test_must_fail git branch foo/bar/baz/lots/of/extra/components ' -test_expect_success 'reject packed-refs with unterminated line' ' +test_expect_success PACKED_REFS_V1 'reject packed-refs with unterminated line' ' cp .git/packed-refs .git/packed-refs.bak && test_when_finished "mv .git/packed-refs.bak .git/packed-refs" && printf "%s" "$HEAD refs/zzzzz" >>.git/packed-refs && @@ -206,7 +206,7 @@ test_expect_success 'reject packed-refs with unterminated line' ' test_cmp expected_err err ' -test_expect_success 'reject packed-refs containing junk' ' +test_expect_success PACKED_REFS_V1 'reject packed-refs containing junk' ' cp .git/packed-refs .git/packed-refs.bak && test_when_finished "mv .git/packed-refs.bak .git/packed-refs" && printf "%s\n" "bogus content" >>.git/packed-refs && @@ -215,7 +215,7 @@ test_expect_success 'reject packed-refs containing junk' ' test_cmp expected_err err ' -test_expect_success 'reject packed-refs with a short SHA-1' ' +test_expect_success PACKED_REFS_V1 'reject 
packed-refs with a short SHA-1' ' cp .git/packed-refs .git/packed-refs.bak && test_when_finished "mv .git/packed-refs.bak .git/packed-refs" && printf "%.7s %s\n" $HEAD refs/zzzzz >>.git/packed-refs && From patchwork Mon Nov 7 18:36:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035108 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C1CCC4332F for ; Mon, 7 Nov 2022 18:38:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233249AbiKGSh4 (ORCPT ); Mon, 7 Nov 2022 13:37:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233245AbiKGShI (ORCPT ); Mon, 7 Nov 2022 13:37:08 -0500 Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com [IPv6:2a00:1450:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B89AD25EB6 for ; Mon, 7 Nov 2022 10:36:34 -0800 (PST) Received: by mail-wm1-x32d.google.com with SMTP id ay14-20020a05600c1e0e00b003cf6ab34b61so10220998wmb.2 for ; Mon, 07 Nov 2022 10:36:34 -0800 (PST)
X-Received: by 2002:a05:600c:5d3:b0:3cf:6bbf:9ee3 with SMTP id p19-20020a05600c05d300b003cf6bbf9ee3mr29721765wmd.15.1667846193077; Mon, 07 Nov 2022 10:36:33 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l27-20020a05600c1d1b00b003b95ed78275sm9506329wms.20.2022.11.07.10.36.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:32 -0800 (PST) Message-Id: <5aa0d4080291dda854fc1ea7655037822b53111a.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:36:01 +0000 Subject: [PATCH 27/30] t*: skip packed-refs v2 over http tests Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The GIT_TEST_PACKED_REFS_VERSION=2 environment variable helps us test the packed-refs file format in its v2 version. This variable makes the Git process act as if the extensions.refFormat config key has "packed-v2" in its list.
This means that if the environment variable is removed, the repository is in a bad state. This is sufficient for most test cases. However, tests that fetch over HTTP appear to lose this environment variable when executed through the HTTP server. Since the repositories are created via Git commands in the tests, the packed-refs files end up in the v2 format, but the server processes do not understand this and start serving empty payloads since they do not recognize any refs. The preferred long-term solution would be to ensure that the GIT_TEST_* environment variable persists into the HTTP server. However, these tests are not exercising any particularly tricky parts of the packed-refs file format. It may not be worth the effort to pass the environment variable and instead we can unset the environment variable (with a comment explaining why) in these tests. Signed-off-by: Derrick Stolee --- t/t5539-fetch-http-shallow.sh | 7 +++++++ t/t5541-http-push-smart.sh | 7 +++++++ t/t5542-push-http-shallow.sh | 7 +++++++ t/t5551-http-fetch-smart.sh | 7 +++++++ t/t5558-clone-bundle-uri.sh | 7 +++++++ 5 files changed, 35 insertions(+) diff --git a/t/t5539-fetch-http-shallow.sh b/t/t5539-fetch-http-shallow.sh index 3ea75d34ca0..5e3b4304367 100755 --- a/t/t5539-fetch-http-shallow.sh +++ b/t/t5539-fetch-http-shallow.sh @@ -5,6 +5,13 @@ test_description='fetch/clone from a shallow clone over http' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh . 
"$TEST_DIRECTORY"/lib-httpd.sh start_httpd diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh index fbad2d5ff5e..495437dd3c7 100755 --- a/t/t5541-http-push-smart.sh +++ b/t/t5541-http-push-smart.sh @@ -7,6 +7,13 @@ test_description='test smart pushing over http via http-backend' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh ROOT_PATH="$PWD" diff --git a/t/t5542-push-http-shallow.sh b/t/t5542-push-http-shallow.sh index c2cc83182f9..c47b18b9faa 100755 --- a/t/t5542-push-http-shallow.sh +++ b/t/t5542-push-http-shallow.sh @@ -5,6 +5,13 @@ test_description='push from/to a shallow clone over http' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 6a38294a476..61f2e90eabe 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -4,6 +4,13 @@ test_description='test smart fetching over http via http-backend' GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. 
This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh . "$TEST_DIRECTORY"/lib-httpd.sh start_httpd diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh index 9155f31fa2c..3e35322155e 100755 --- a/t/t5558-clone-bundle-uri.sh +++ b/t/t5558-clone-bundle-uri.sh @@ -2,6 +2,13 @@ test_description='test fetching bundles with --bundle-uri' +# If GIT_TEST_PACKED_REFS_VERSION=2, then the packed-refs file will +# be written in v2 format without extensions.refFormat=packed-v2. This +# causes issues for the HTTP server which does not carry over the +# environment variable to the server process. +GIT_TEST_PACKED_REFS_VERSION=0 +export GIT_TEST_PACKED_REFS_VERSION + . ./test-lib.sh test_expect_success 'fail to clone from non-existent file' ' From patchwork Mon Nov 7 18:36:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035109 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D98B8C433FE for ; Mon, 7 Nov 2022 18:38:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233245AbiKGSiC (ORCPT ); Mon, 7 Nov 2022 13:38:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233247AbiKGShI (ORCPT ); Mon, 7 Nov 2022 13:37:08 -0500 Received: from mail-wr1-x432.google.com (mail-wr1-x432.google.com [IPv6:2a00:1450:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99D0026577 for ; Mon, 7 Nov 2022 10:36:35 -0800 (PST) Received: by mail-wr1-x432.google.com with SMTP id 
cl5so17545225wrb.9 for ; Mon, 07 Nov 2022 10:36:35 -0800 (PST) X-Received: by 2002:adf:ebcf:0:b0:22c:9eb4:d6f6 with SMTP id v15-20020adfebcf000000b0022c9eb4d6f6mr32864743wrn.251.1667846193999; Mon, 07 Nov 2022 10:36:33 -0800 (PST) Received: from [127.0.0.1] ([13.74.141.28]) by smtp.gmail.com with ESMTPSA id l21-20020a05600c4f1500b003b4fdbb6319sm13248796wmq.21.2022.11.07.10.36.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 10:36:33 -0800 (PST) Message-Id:
<9d261a55403f8c9d207cfb363689ba9964a57c57.1667846165.git.gitgitgadget@gmail.com> In-Reply-To: References: Date: Mon, 07 Nov 2022 18:36:02 +0000 Subject: [PATCH 28/30] ci: run GIT_TEST_PACKED_REFS_VERSION=2 in some builds Fcc: Sent MIME-Version: 1.0 To: git@vger.kernel.org Cc: jrnieder@gmail.com, Derrick Stolee , Derrick Stolee Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Derrick Stolee From: Derrick Stolee The linux-TEST-vars CI build helps us check that certain opt-in features are still exercised in at least one environment. The new GIT_TEST_PACKED_REFS_VERSION environment variable now passes the test suite when set to "2", so add this to that list of variables. This provides nearly the same coverage of the v2 format as we had in the v1 format. Signed-off-by: Derrick Stolee --- ci/run-build-and-tests.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh index 8ebff425967..e93574ca262 100755 --- a/ci/run-build-and-tests.sh +++ b/ci/run-build-and-tests.sh @@ -30,6 +30,7 @@ linux-TEST-vars) export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master export GIT_TEST_WRITE_REV_INDEX=1 export GIT_TEST_CHECKOUT_WORKERS=2 + export GIT_TEST_PACKED_REFS_VERSION=2 ;; linux-clang) export GIT_TEST_DEFAULT_HASH=sha1 From patchwork Mon Nov 7 18:36:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Derrick Stolee X-Patchwork-Id: 13035110 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4A303C4332F for ; Mon, 7 Nov 2022 18:38:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232623AbiKGSiF (ORCPT ); Mon, 7 Nov 2022 13:38:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49960 "EHLO lindbergh.monkeyblade.net" 
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233135AbiKGShI (ORCPT ); Mon, 7 Nov 2022 13:37:08 -0500 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2FB7129354 for ; Mon, 7 Nov 2022 10:36:37 -0800 (PST) Received: by mail-wm1-x334.google.com with SMTP id t1so7424262wmi.4 for ; Mon, 07 Nov 2022 10:36:37 -0800 (PST) X-Received: by 2002:a05:600c:4a11:b0:3cf:b128:39ad with SMTP id
c17-20020a05600c4a1100b003cfb12839admr2701315wmp.127.1667846195248; Mon, 07 Nov 2022 10:36:35 -0800 (PST)
Date: Mon, 07 Nov 2022 18:36:03 +0000
Subject: [PATCH 29/30] p1401: create performance test for ref operations
Fcc: Sent
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: jrnieder@gmail.com, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

TBD

Signed-off-by: Derrick Stolee
---
 t/perf/p1401-ref-operations.sh | 47 ++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100755 t/perf/p1401-ref-operations.sh

diff --git a/t/perf/p1401-ref-operations.sh b/t/perf/p1401-ref-operations.sh
new file mode 100755
index 00000000000..1c372ba0ee8
--- /dev/null
+++ b/t/perf/p1401-ref-operations.sh
@@ -0,0 +1,47 @@
+#!/bin/sh
+
+test_description="Tests performance of ref operations"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_perf 'git pack-refs (v1)' '
+	git commit --allow-empty -m "change one ref" &&
+	git pack-refs --all
+'
+
+test_perf 'git for-each-ref (v1)' '
+	git for-each-ref --format="%(refname)" >/dev/null
+'
+
+test_perf 'git for-each-ref prefix (v1)' '
+	git for-each-ref --format="%(refname)" refs/tags/ >/dev/null
+'
+
+test_expect_success 'configure packed-refs v2' '
+	git config core.repositoryFormatVersion 1 &&
+	git config --add extensions.refFormat files &&
+	git config --add extensions.refFormat packed &&
+	git config --add extensions.refFormat packed-v2 &&
+	git config refs.packedRefsVersion 2 &&
+	git commit --allow-empty -m "change one ref" &&
+	git pack-refs --all &&
+	test_copy_bytes 16 .git/packed-refs | xxd >actual &&
+	grep PREF actual
+'
+
+test_perf 'git pack-refs (v2)' '
+	git commit --allow-empty -m "change one ref" &&
+	git pack-refs --all
+'
+
+test_perf 'git for-each-ref (v2)' '
+	git for-each-ref --format="%(refname)" >/dev/null
+'
+
+test_perf 'git for-each-ref prefix (v2)' '
+	git for-each-ref --format="%(refname)" refs/tags/ >/dev/null
+'
+
+test_done

From patchwork Mon Nov 7 18:36:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Derrick Stolee
X-Patchwork-Id: 13035111
Message-Id: <37fb4e73ca711f642351d10e1db51c330a1544f1.1667846165.git.gitgitgadget@gmail.com>
Date: Mon, 07 Nov 2022 18:36:04 +0000
Subject: [PATCH 30/30] refs: skip hashing when writing packed-refs v2
Fcc: Sent
To: git@vger.kernel.org
Cc: jrnieder@gmail.com, Derrick Stolee, Derrick Stolee
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: Derrick Stolee

From: Derrick Stolee

The 'skip_hash' option in 'struct hashfile' indicates that we want to use
the hashfile API as a buffered writer, and not use the hash function to
create a trailing hash. We still write a trailing null hash to indicate
that we do not have a checksum at the end. This feature is enabled for
index writes using the 'index.computeHash' config key.

Create a similar (currently hidden) option for the packed-refs v2 file
format: refs.hashPackedRefs. This defaults to false because performance
is compared to the packed-refs v1 file format, which does not have a
checksum anywhere.

This change results in improvements to p1401 when using a repository with
a 42 MB packed-refs file (600,000+ refs).

Test                         HEAD~1            HEAD
--------------------------------------------------------------------
1401.1: git pack-refs (v1)   0.38(0.31+0.52)   0.37(0.28+0.52) -2.6%
1401.5: git pack-refs (v2)   0.39(0.33+0.52)   0.30(0.28+0.46) -23.1%

Note that these tests update a ref and then repack the packed-refs file.
The following benchmarks are from a hyperfine experiment that only ran the
'git pack-refs --all' command for the two formats, but also compared the
effect when refs.hashPackedRefs=true.

Benchmark 1: v1
  Time (mean ± σ):     163.5 ms ±  18.1 ms    [User: 117.8 ms, System: 38.1 ms]
  Range (min … max):   131.3 ms … 190.4 ms    50 runs

Benchmark 2: v2-no-hash
  Time (mean ± σ):      95.8 ms ±  15.1 ms    [User: 72.5 ms, System: 23.0 ms]
  Range (min … max):    82.9 ms … 131.2 ms    50 runs

Benchmark 3: v2-hashing
  Time (mean ± σ):     100.8 ms ±  16.4 ms    [User: 77.2 ms, System: 23.1 ms]
  Range (min … max):    83.0 ms … 131.1 ms    50 runs

Summary
  'v2-no-hash' ran
    1.05 ± 0.24 times faster than 'v2-hashing'
    1.71 ± 0.33 times faster than 'v1'

This case of repeatedly rewriting the same refs seems to demonstrate a
smaller improvement than the p1401 test. However, the overall reduction
from v1 matches the expected reduction in file size. In my tests, the
42 MB packed-refs (v1) file was compacted to 28 MB in the v2 format.

Signed-off-by: Derrick Stolee
---
 refs/packed-format-v2.c        | 7 +++++++
 t/perf/p1401-ref-operations.sh | 5 +++++
 2 files changed, 12 insertions(+)

diff --git a/refs/packed-format-v2.c b/refs/packed-format-v2.c
index 2cd45a5987a..ada34bf9bf0 100644
--- a/refs/packed-format-v2.c
+++ b/refs/packed-format-v2.c
@@ -417,6 +417,7 @@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *
 						       struct strbuf *err)
 {
 	struct write_packed_refs_v2_context *ctx;
+	int do_hash;
 
 	CALLOC_ARRAY(ctx, 1);
 	ctx->refs = refs;
@@ -430,6 +431,12 @@ struct write_packed_refs_v2_context *create_v2_context(struct packed_ref_store *
 	}
 
 	ctx->f = hashfd(refs->tempfile->fd, refs->tempfile->filename.buf);
+
+	/* Hashing is off by default: skip it unless refs.hashPackedRefs is true. */
+	if (git_config_get_maybe_bool("refs.hashpackedrefs", &do_hash) ||
+	    !do_hash)
+		ctx->f->skip_hash = 1;
+
 	ctx->cf = init_chunkfile(ctx->f);
 
 	return ctx;
diff --git a/t/perf/p1401-ref-operations.sh b/t/perf/p1401-ref-operations.sh
index 1c372ba0ee8..0b88a2f531a 100755
--- a/t/perf/p1401-ref-operations.sh
+++ b/t/perf/p1401-ref-operations.sh
@@ -36,6 +36,11 @@ test_perf 'git pack-refs (v2)' '
 	git pack-refs --all
 '
 
+test_perf 'git pack-refs (v2;hashing)' '
+	git commit --allow-empty -m "change one ref" &&
+	git -c refs.hashPackedRefs=true pack-refs --all
+'
+
 test_perf 'git for-each-ref (v2)' '
 	git for-each-ref --format="%(refname)" >/dev/null
 '
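
The commit message for patch 30 says that a hashfile written with
'skip_hash' still ends in a null hash. The following is a minimal sketch
of how one might check that from the shell. It assumes a SHA-1 repository
(a 20-byte trailer) and that the placeholder occupies the very last bytes
of the file. 'has_null_trailer' is a hypothetical helper written for this
illustration; it is not part of git or of the test library.

```shell
#!/bin/sh
# Hypothetical helper, not part of git or t/test-lib.sh.
# Usage: has_null_trailer <file> [<hash-length-in-bytes>]
# Succeeds when the last <len> bytes of <file> are all zero, which is how
# the commit message describes the placeholder left when hashing is skipped.
# Assumption: <len> defaults to 20 (SHA-1); pass 32 for a SHA-256 repository.
has_null_trailer () {
	file=$1
	len=${2:-20}

	# A file shorter than the trailer cannot contain one.
	size=$(($(wc -c <"$file")))
	test "$size" -ge "$len" || return 1

	# Hex-dump the trailer, then delete whitespace and the digit 0.
	# Anything left over comes from a non-zero byte.
	rest=$(tail -c "$len" "$file" | od -An -v -tx1 | tr -d '[:space:]0')
	test -z "$rest"
}
```

Going by the commit message, 'has_null_trailer .git/packed-refs' would be
expected to succeed after 'git pack-refs --all' in a repository configured
for packed-refs v2 with the default settings, and to fail when
refs.hashPackedRefs=true; this has not been verified against the series.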