Message ID | patch-v13-6.7-5ed79c58b18-20220604T095113Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | unpack-objects: support streaming blobs to disk |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> So let's attempt to summarize 12 years of changes in behavior, which
> can be seen with:
>
>     git log --oneline -Gbig_file_thre 5eef828bc03.. -- '*.c'
>
> To do that turn this into a bullet-point list. The summary Han Xin
> produced in [1] helped a lot, but is a bit too detailed for
> documentation aimed at users. Let's instead summarize how
> user-observable behavior differs, and generally describe how we tend
> to stream these files in various commands.

Nicely studied. Very much appreciated.

>  core.bigFileThreshold::
> -	Files larger than this size are stored deflated, without
> -	attempting delta compression. Storing large files without
> -	delta compression avoids excessive memory usage, at the
> -	slight expense of increased disk usage. Additionally files
> -	larger than this size are always treated as binary.
> +	The size of files considered "big", which as discussed below
> +	changes the behavior of numerous git commands, as well as how
> +	such files are stored within the repository. The default is
> +	512 MiB. Common unit suffixes of 'k', 'm', or 'g' are
> +	supported.
>  +
> -Default is 512 MiB on all platforms. This should be reasonable
> -for most projects as source code and other text files can still
> -be delta compressed, but larger binary media files won't be.
> +Files above the configured limit will be:
>  +
> -Common unit suffixes of 'k', 'm', or 'g' are supported.
> +* Stored deflated, without attempting delta compression.

"even in packfiles" (with or without "even") is better be there in
the sentence---loose objects are always stored deflated anyway.

> +The default limit is primarily set with this use-case in mind. With it
> +most projects will have their source code and other text files delta
> +compressed, but not larger binary media files.
> ++
> +Storing large files without delta compression avoids excessive memory
> +usage, at the slight expense of increased disk usage.
> +* Will be treated as if though they were labeled "binary" (see
> +  linkgit:gitattributes[5]). This means that e.g. linkgit:git-log[1]
> +  and linkgit:git-diff[1] will not diffs for files above this limit.

Good. You can lose three words "This means that" and the sentence
means the same thing, so lose them (I always recommend people to
reread the sentence when they write "This means that" with an eye to
rewrite it better---it often is a sign that either the previous
sentence is insufficiently clear, in which case it can be discarded
and description after the three words can be enhanced to a better
result).

> +* Will be generally be streamed when written, which avoids excessive
> +memory usage, at the cost of some fixed overhead. Commands that make
> +use of this include linkgit:git-archive[1],
> +linkgit:git-fast-import[1], linkgit:git-index-pack[1] and
> +linkgit:git-fsck[1].

Nice. And this series adds unpack-objects to the mix.

>  core.excludesFile::
>  	Specifies the pathname to the file that contains patterns to

Excellent. Thanks.
diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 41e330f3069..ff6ae6bb647 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -444,17 +444,32 @@ You probably do not need to adjust this value.
 Common unit suffixes of 'k', 'm', or 'g' are supported.
 
 core.bigFileThreshold::
-	Files larger than this size are stored deflated, without
-	attempting delta compression. Storing large files without
-	delta compression avoids excessive memory usage, at the
-	slight expense of increased disk usage. Additionally files
-	larger than this size are always treated as binary.
+	The size of files considered "big", which as discussed below
+	changes the behavior of numerous git commands, as well as how
+	such files are stored within the repository. The default is
+	512 MiB. Common unit suffixes of 'k', 'm', or 'g' are
+	supported.
 +
-Default is 512 MiB on all platforms. This should be reasonable
-for most projects as source code and other text files can still
-be delta compressed, but larger binary media files won't be.
+Files above the configured limit will be:
 +
-Common unit suffixes of 'k', 'm', or 'g' are supported.
+* Stored deflated, without attempting delta compression.
++
+The default limit is primarily set with this use-case in mind. With it
+most projects will have their source code and other text files delta
+compressed, but not larger binary media files.
++
+Storing large files without delta compression avoids excessive memory
+usage, at the slight expense of increased disk usage.
++
+* Will be treated as if though they were labeled "binary" (see
+  linkgit:gitattributes[5]). This means that e.g. linkgit:git-log[1]
+  and linkgit:git-diff[1] will not diffs for files above this limit.
++
+* Will be generally be streamed when written, which avoids excessive
+memory usage, at the cost of some fixed overhead. Commands that make
+use of this include linkgit:git-archive[1],
+linkgit:git-fast-import[1], linkgit:git-index-pack[1] and
+linkgit:git-fsck[1].
 
 core.excludesFile::
 	Specifies the pathname to the file that contains patterns to
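As a rough illustration of the documented semantics (not Git's actual implementation, which is written in C), the unit suffixes and the "files above the configured limit" test could be sketched as follows. The names `parse_size`, `is_big`, `UNIT_FACTORS` and `DEFAULT_BIG_FILE_THRESHOLD` are hypothetical, and the strict "greater than" comparison is an assumption taken from the wording "above the configured limit"; the exact boundary behavior may differ between commands.

```python
# Illustrative sketch only; names are hypothetical and this is not Git's code.

# 'k', 'm' and 'g' are binary multiples (KiB, MiB, GiB), matched case-insensitively.
UNIT_FACTORS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

# The documented default for core.bigFileThreshold: 512 MiB.
DEFAULT_BIG_FILE_THRESHOLD = 512 * 1024**2


def parse_size(value: str) -> int:
    """Convert a value such as '512m' or '1G' to a byte count."""
    text = value.strip().lower()
    suffix = text[-1] if text and text[-1] in "kmg" else ""
    digits = text[: len(text) - len(suffix)]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a valid size: {value!r}")
    return int(digits) * UNIT_FACTORS[suffix]


def is_big(size_in_bytes: int, threshold: int = DEFAULT_BIG_FILE_THRESHOLD) -> bool:
    """Assumption: a file is "big" when it is strictly above the threshold."""
    return size_in_bytes > threshold
```

Under these assumptions `parse_size("512m")` yields 536870912 bytes, so a 1 GiB blob would count as "big" under the default while a file of exactly 512 MiB would not.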
The core.bigFileThreshold documentation has been largely unchanged
since 5eef828bc03 (fast-import: Stream very large blobs directly to
pack, 2010-02-01).

But since then this setting has been expanded to affect a lot more
than that description indicated. Most notably in how "git diff" treats
them, see 6bf3b813486 (diff --stat: mark any file larger than
core.bigfilethreshold binary, 2014-08-16).

In addition to that, numerous commands and APIs make use of a
streaming mode for files above this threshold.

So let's attempt to summarize 12 years of changes in behavior, which
can be seen with:

    git log --oneline -Gbig_file_thre 5eef828bc03.. -- '*.c'

To do that turn this into a bullet-point list. The summary Han Xin
produced in [1] helped a lot, but is a bit too detailed for
documentation aimed at users. Let's instead summarize how
user-observable behavior differs, and generally describe how we tend
to stream these files in various commands.

1. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/

Helped-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config/core.txt | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)