doc: mention bigFileThreshold for packing

Message ID pull.872.git.1612897624121.gitgitgadget@gmail.com (mailing list archive)
State Superseded

Commit Message

Christian Walther Feb. 9, 2021, 7:07 p.m. UTC
From: Christian Walther <cwalther@gmx.ch>

Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.

Signed-off-by: Christian Walther <cwalther@gmx.ch>
---
    doc: mention bigFileThreshold for packing
    
    I recently spent a lot of time trying to figure out why git repack would
    create huge packs on some clones of my repository and small ones on
    others, until I found out about the existence of the
    core.bigFileThreshold configuration variable, which happened to be set
    on some and not on others. It would have saved me a lot of time if that
    variable had been mentioned in the relevant manpages that I was reading,
    git-repack and git-pack-objects. So this patch adds that.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-872%2Fcwalther%2Fdeltadoc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-872/cwalther/deltadoc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/872

 Documentation/git-pack-objects.txt | 4 ++++
 Documentation/git-repack.txt       | 4 ++++
 2 files changed, 8 insertions(+)


base-commit: fb7fa4a1fd273f22efcafdd13c7f897814fd1eb9

Comments

Junio C Hamano Feb. 9, 2021, 9:50 p.m. UTC | #1
"Christian Walther via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Christian Walther <cwalther@gmx.ch>
>
> Knowing about the core.bigFileThreshold configuration variable is
> helpful when examining pack file size differences between repositories.
> Add a reference to it to the manpages a user is likely to read in this
> situation.

Thanks.

I doubt that the description of --window/--depth command line
options, for both repack and pack-objects, is the best place to add
this "Note".  Even if we were to add it as an appendix to these
places, please do not break the flow of explanation by inserting it
before the description of the default values of these options.

>     I recently spent a lot of time trying to figure out why git repack would
>     create huge packs on some clones of my repository and small ones on
>     others, until I found out about the existence of the
>     core.bigFileThreshold configuration variable, which happened to be set
>     on some and not on others. It would have saved me a lot of time if that
>     variable had been mentioned in the relevant manpages that I was reading,
>     git-repack and git-pack-objects. So this patch adds that.

Not related to the contents of the patch, but I am somewhat curious
to know what configuration resulted in the "huge" ones and "small"
ones.  Documentation/config/core.txt::core.bigFileThreshold may be
helped by addition of a success story, and the configuration for the
"small" ones may be a good place to start.

Thanks

Christian Walther Feb. 10, 2021, 9:43 p.m. UTC | #2
Junio C Hamano wrote:

> I doubt that the description of --window/--depth command line
> options, for both repack and pack-objects, is the best place to add
> this "Note".  Even if we were to add it as an appendix to these
> places, please do not break the flow of explanation by inserting it
> before the description of the default values of these options.

OK. That was where I would have looked for it, because it explains why --window wasn't effective in my attempts to get better compression, but I don't insist on it - any place would have worked, as I read both manpages back and forth several times.

In git-repack.txt, there is a "Configuration" section at the bottom, I guess it would fit there? There is none in git-pack-objects.txt, but I could add it. What do you think?


>>    I recently spent a lot of time trying to figure out why git repack would
>>    create huge packs on some clones of my repository and small ones on
>>    others
> 
> Not related to the contents of the patch, but I am somewhat curious
> to know what configuration resulted in the "huge" ones and "small"
> ones.  Documentation/config/core.txt::core.bigFileThreshold may be
> helped by addition of a success story, and the configuration for the
> "small" ones may be a good place to start.

The "huge" repository had bigFileThreshold = 1m. That was set by SubGit when converting from Subversion, for reasons unknown to me (see some discussion at https://support.tmatesoft.com/t/reduce-repository-size/2551 and https://issues.tmatesoft.com/issue/SGT-604). The result is a pack file of about 3 GB.

The "small" repository has it unset, so the default 512m applies, resulting in a pack file of about 50 MB.
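
As a rough illustration of the two settings (a sketch, not git's actual code), the Python snippet below scales a size value such as "1m" or "512m" and checks whether an object of a given size would still be eligible for delta compression. It assumes the k/m/g suffixes are powers of 1024 and that only objects strictly larger than the threshold are excluded; the function names and the 2.4 MB example size are hypothetical.

```python
# Sketch only: mimics how a size value like "1m" or "512m" is scaled.
# Assumption: k/m/g suffixes mean 1024, 1024**2 and 1024**3 bytes.
UNIT_FACTORS = {"k": 1024, "m": 1024**2, "g": 1024**3}

# Default mentioned in this thread for an unset core.bigFileThreshold.
DEFAULT_BIG_FILE_THRESHOLD = "512m"


def parse_size(value: str) -> int:
    """Convert a size string with an optional k/m/g suffix to bytes."""
    value = value.strip().lower()
    if value and value[-1] in UNIT_FACTORS:
        return int(value[:-1]) * UNIT_FACTORS[value[-1]]
    return int(value)


def is_delta_candidate(object_size: int,
                       threshold: str = DEFAULT_BIG_FILE_THRESHOLD) -> bool:
    """Hypothetical helper: objects larger than the threshold get no delta."""
    return object_size <= parse_size(threshold)


example_size = 2_400_000  # hypothetical object of about 2.4 MB
for threshold in ("1m", "512m"):
    print(threshold, parse_size(threshold),
          is_delta_candidate(example_size, threshold))
```

With the threshold at 1m the example object is above the limit and is stored without delta compression; with the default it stays eligible.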

What causes the huge difference is that the repository contains a "changelog" file that changes in almost every commit and has grown to 2.4 MB over 10000 commits. So it exists in about that many different versions, of which about 6000 are larger than 1 MB, but they only differ from each other by successive addition of small pieces.
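
The effect can be imitated with a toy experiment. The Python sketch below builds an append-only file in many versions and compares two ways of storing them: compressing every version on its own (roughly what happens when delta compression is skipped) versus compressing the first version plus each appended piece. The appended piece is only a crude stand-in for a delta; git's real delta algorithm and pack format are not modelled, and all names and sizes here are made up for illustration.

```python
import random
import zlib

# Small hypothetical vocabulary for generating changelog-like entries.
WORDS = [b"fix", b"add", b"remove", b"update",
         b"parser", b"docs", b"tests", b"build"]


def make_versions(count, words_per_entry, seed=0):
    """Return (versions, pieces); each version is the previous one plus one entry."""
    rng = random.Random(seed)
    versions, pieces = [], []
    current = b""
    for _ in range(count):
        piece = b" ".join(rng.choice(WORDS) for _ in range(words_per_entry)) + b"\n"
        current += piece
        pieces.append(piece)
        versions.append(current)
    return versions, pieces


def compare_storage(versions, pieces):
    """Return total bytes for (every version compressed alone, first + appended pieces)."""
    whole = sum(len(zlib.compress(v)) for v in versions)
    incremental = len(zlib.compress(versions[0])) + sum(
        len(zlib.compress(p)) for p in pieces[1:])
    return whole, incremental


versions, pieces = make_versions(count=200, words_per_entry=20)
whole, incremental = compare_storage(versions, pieces)
print(f"each version compressed on its own: {whole} bytes")
print(f"first version plus appended pieces: {incremental} bytes")
```

The first total grows roughly with the square of the number of versions, the second only linearly, which is the same kind of gap as between the two packs described above.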

I'm not sure if that makes for a good success story. 1m seems a rather extreme value to me. If you think so, I can try to come up with something.

Thanks

 Christian

Junio C Hamano Feb. 10, 2021, 10:19 p.m. UTC | #3
Christian Walther <cwalther@gmx.ch> writes:

> Junio C Hamano wrote:
>
>> I doubt that the description of --window/--depth command line
>> options, for both repack and pack-objects, is the best place to add
>> this "Note".  Even if we were to add it as an appendix to these
>> places, please do not break the flow of explanation by inserting it
>> before the description of the default values of these options.
>
> OK. That was where I would have looked for it, because it explains
> why --window wasn't effective in my attempts to get better
> compression, but I don't insist on it - any place would have
> worked, as I read both manpages back and forth several times.

The "pack-objects" command (and to some degree "repack", too) is
about packing throughout, and --depth/--window is not necessarily
the central piece of the puzzle, and that, together with disruption
of the flow of the original explanation, was the reason why I found
the initial location a bit odd.

> In git-repack.txt, there is a "Configuration" section at the
> bottom, I guess it would fit there? There is none in
> git-pack-objects.txt, but I could add it. What do you think?

You're right---if there is an existing CONFIGURATION section, that
may be a much better place.  There are configuration variables that
affect how the packing works other than the core.bigFileThreshold,
and attributes like "delta" would also affect the outcome.

Describing all in one CONFIGURATION section would be valuable.

What I queued is with the following ready to be squashed in,
primarily because I was lazy and didn't have time/inclination to
look for a better place myself ;-)

Thanks.

---- >8 ----
Subject: [PATCH] fixup! doc: mention bigFileThreshold for packing

---
 Documentation/git-pack-objects.txt | 7 +++----
 Documentation/git-repack.txt       | 7 +++----
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 59150ded4b..be0f953c35 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -97,12 +97,11 @@ base-name::
 	side, because delta data needs to be applied that many
 	times to get to the necessary object.
 +
-Note that delta compression is never used on objects larger than the
-`core.bigFileThreshold` configuration variable (see
-linkgit:git-config[1]).
-+
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.
++
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see linkgit:git-config[1]).
 
 --window-memory=<n>::
 	This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 0a7038ec4a..145fff6e01 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -96,12 +96,11 @@ to the new separate pack will be written.
 	affects the performance on the unpacker side, because delta data needs
 	to be applied that many times to get to the necessary object.
 +
-Note that delta compression is never used on objects larger than the
-`core.bigFileThreshold` configuration variable (see
-linkgit:git-config[1]).
-+
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.
++
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see linkgit:git-config[1]).
 
 --threads=<n>::
 	This option is passed through to `git pack-objects`.

Patch

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index 54d715ead137..59150ded4bef 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -97,6 +97,10 @@  base-name::
 	side, because delta data needs to be applied that many
 	times to get to the necessary object.
 +
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see
+linkgit:git-config[1]).
++
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.
 
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 92f146d27dc3..0a7038ec4ad8 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -96,6 +96,10 @@  to the new separate pack will be written.
 	affects the performance on the unpacker side, because delta data needs
 	to be applied that many times to get to the necessary object.
 +
+Note that delta compression is never used on objects larger than the
+`core.bigFileThreshold` configuration variable (see
+linkgit:git-config[1]).
++
 The default value for --window is 10 and --depth is 50. The maximum
 depth is 4095.