Message ID | 4013f992d15aab69346bf6f8eafe38511b923595.1667846164.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | extensions.refFormat and packed-refs v2 file format | expand |
On Mon, Nov 7, 2022 at 10:48 AM Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: [...] > One obvious improvement could be a new file format version for the > packed-refs file. Its current plaintext-based format is inefficient due > to storing object IDs as hexadecimal representations instead of in > their raw format. This extra cost will get worse with SHA-256. > In addition, binary searches need to guess a position and scan to find > newlines for a refname entry. A structured binary format could allow for > more compact representation and faster access. This doesn't parse very well at all. The scanning is due to refname entries being of variable length, and changing hexadecimal representation of object IDs to binary values isn't going to help that. I _think_, after re-scanning your RFC cover letter that you had other ideas to allow a binary search in order to read a single ref's value, and that the juxtaposing of these sentences together leads to an unfortunate assumption that one change is related to the both goals, but something extra here to clarify would help. > diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt > index bccaec7a963..ce8185adf53 100644 > --- a/Documentation/config/extensions.txt > +++ b/Documentation/config/extensions.txt > @@ -7,6 +7,47 @@ Note that this setting should only be set by linkgit:git-init[1] or > linkgit:git-clone[1]. Trying to change it after initialization will not > work and will produce hard-to-diagnose issues. > > +extensions.refFormat:: > + Specify the reference storage mechanisms used by the repoitory as a > + multi-valued list. The acceptable values are `files` and `packed`. > + If not specified, the list of `files` and `packed` is assumed. This sentence doesn't parse for me. > + It > + is an error to specify this key unless `core.repositoryFormatVersion` > + is 1. ...is at least 1? Or are we trying to be incompatible with potential future core.repositoryFormatVersion values?
On 11/11/22 6:39 PM, Elijah Newren wrote: > On Mon, Nov 7, 2022 at 10:48 AM Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: > [...] >> One obvious improvement could be a new file format version for the >> packed-refs file. Its current plaintext-based format is inefficient due >> to storing object IDs as hexadecimal representations instead of in >> their raw format. This extra cost will get worse with SHA-256. > >> In addition, binary searches need to guess a position and scan to find >> newlines for a refname entry. A structured binary format could allow for >> more compact representation and faster access. > > This doesn't parse very well at all. The scanning is due to refname > entries being of variable length, and changing hexadecimal > representation of object IDs to binary values isn't going to help > that. > > I _think_, after re-scanning your RFC cover letter that you had other > ideas to allow a binary search in order to read a single ref's value, > and that the juxtaposing of these sentences together leads to an > unfortunate assumption that one change is related to the both goals, > but something extra here to clarify would help. The v2 format has a structured list of offsets that can be used to navigate directly to the ith ref in the file. Thus, we can use a more precise form of binary search. Since we have these values, we do not need to scan for newlines or spaces for the end of the ref strings. This allows us to use the raw OIDs since we are not using special characters as string boundaries. I will work to clarify when I submit this for review. >> diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt >> index bccaec7a963..ce8185adf53 100644 >> --- a/Documentation/config/extensions.txt >> +++ b/Documentation/config/extensions.txt >> @@ -7,6 +7,47 @@ Note that this setting should only be set by linkgit:git-init[1] or >> linkgit:git-clone[1]. Trying to change it after initialization will not >> work and will produce hard-to-diagnose issues. >> >> +extensions.refFormat:: >> + Specify the reference storage mechanisms used by the repoitory as a >> + multi-valued list. The acceptable values are `files` and `packed`. > >> + If not specified, the list of `files` and `packed` is assumed. > > This sentence doesn't parse for me. > >> + It >> + is an error to specify this key unless `core.repositoryFormatVersion` >> + is 1. > > ...is at least 1? Or are we trying to be incompatible with potential > future core.repositoryFormatVersion values? Specifying exactly 1 is consistent across our extensions documentation. The intention of the extensions system is that we should never need a value 2 here. If we do, then we should consider all extensions to be redesigned from scratch. Perhaps we'd have different defaults, or older options not possible anymore. Thanks, -Stolee
diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt index bccaec7a963..ce8185adf53 100644 --- a/Documentation/config/extensions.txt +++ b/Documentation/config/extensions.txt @@ -7,6 +7,47 @@ Note that this setting should only be set by linkgit:git-init[1] or linkgit:git-clone[1]. Trying to change it after initialization will not work and will produce hard-to-diagnose issues. +extensions.refFormat:: + Specify the reference storage mechanisms used by the repoitory as a + multi-valued list. The acceptable values are `files` and `packed`. + If not specified, the list of `files` and `packed` is assumed. It + is an error to specify this key unless `core.repositoryFormatVersion` + is 1. ++ +As new ref formats are added, Git commands may modify this list before and +after upgrading the on-disk reference storage files. The specific values +indicate the existence of different layers: ++ +-- +`files`;; + When present, references may be stored as "loose" reference files + in the `$GIT_DIR/refs/` directory. The name of the reference + corresponds to the filename after `$GIT_DIR` and the file contains + an object ID as a hexadecimal string. If a loose reference file + exists, then its value takes precedence over all other formats. + +`packed`;; + When present, references may be stored as a group in a + `packed-refs` file in its version 1 format. When grouped with + `"files"` or provided on its own, this file is located at + `$GIT_DIR/packed-refs`. This file contains a list of distinct + reference names, paired with their object IDs. When combined with + `files`, the `packed` format will only be used to group multiple + loose object files upon request via the `git pack-refs` command or + via the `pack-refs` maintenance task. +-- ++ +The following combinations are supported by this version of Git: ++ +-- +`files` and `packed`;; + This set of values indicates that references are stored both as + loose reference files and in the `packed-refs` file in its v1 + format. Loose references are preferred, and the `packed-refs` file + is updated only when deleting a reference that is stored in the + `packed-refs` file or during a `git pack-refs` command. +-- + extensions.worktreeConfig:: If enabled, then worktrees will load config settings from the `$GIT_DIR/config.worktree` file in addition to the diff --git a/setup.c b/setup.c index cefd5f63c46..f5eb50c969a 100644 --- a/setup.c +++ b/setup.c @@ -577,6 +577,11 @@ static enum extension_result handle_extension(const char *var, "extensions.objectformat", value); data->hash_algo = format; return EXTENSION_OK; + } else if (!strcmp(ext, "refformat")) { + if (strcmp(value, "files") && strcmp(value, "packed")) + return error(_("invalid value for '%s': '%s'"), + "extensions.refFormat", value); + return EXTENSION_OK; } return EXTENSION_UNKNOWN; } diff --git a/t/t3212-ref-formats.sh b/t/t3212-ref-formats.sh new file mode 100755 index 00000000000..bc554e7c701 --- /dev/null +++ b/t/t3212-ref-formats.sh @@ -0,0 +1,27 @@ +#!/bin/sh + +test_description='test across ref formats' + +. ./test-lib.sh + +test_expect_success 'extensions.refFormat requires core.repositoryFormatVersion=1' ' + test_when_finished rm -rf broken && + + # Force sha1 to ensure GIT_TEST_DEFAULT_HASH does + # not imply a value of core.repositoryFormatVersion. + git init --object-format=sha1 broken && + git -C broken config extensions.refFormat files && + test_must_fail git -C broken status 2>err && + grep "repo version is 0, but v1-only extension found" err +' + +test_expect_success 'invalid extensions.refFormat' ' + test_when_finished rm -rf broken && + git init broken && + git -C broken config core.repositoryFormatVersion 1 && + git -C broken config extensions.refFormat bogus && + test_must_fail git -C broken status 2>err && + grep "invalid value for '\''extensions.refFormat'\'': '\''bogus'\''" err +' + +test_done