diff mbox series

config: set pack.useSparse=true by default

Message ID pull.585.git.1584583110914.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series config: set pack.useSparse=true by default | expand

Commit Message

Linus Arver via GitGitGadget March 19, 2020, 1:58 a.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The pack.useSparse config option was introduced by 3d036eb0
(pack-objects: create pack.useSparse setting, 2019-01-19) and was
first available in v2.21.0. When enabled, the pack-objects process
during 'git push' will use a sparse tree walk when deciding which
trees and blobs to send to the remote. The algorithm was introduced
by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
has been in production use by VFS for Git since around that time.
The features.experimental config option also enabled pack.useSparse,
so hopefully that has also increased exposure.

It is worth noting that pack.useSparse has a possibility of
sending more objects across a push, but requires a special
arrangement of exact _copies_ across directories. There is a test
in t5322-pack-objects-sparse.sh that demonstrates this possibility.

Since the downside is unlikely but the upside is significant, set
the default value of pack.useSparse to true. Remove it from the
set of options implied by features.experimental.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    config: set pack.useSparse=true by default
    
    Here is a small patch to convert pack.useSparse to true by default. It's
    been released for over a year, so the feature is quite stable. I'm
    submitting this now to allow it to cook for a while during the next
    release cycle.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-585%2Fderrickstolee%2Fpack-use-sparse-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-585/derrickstolee/pack-use-sparse-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/585

 Documentation/config/feature.txt | 3 ---
 Documentation/config/pack.txt    | 4 ++--
 repo-settings.c                  | 3 ++-
 3 files changed, 4 insertions(+), 6 deletions(-)


base-commit: 6c85aac65fb455af85745130ce35ddae4678db84

Comments

Jonathan Nieder March 19, 2020, 11:13 p.m. UTC | #1
Hi,

Derrick Stolee wrote:

> The pack.useSparse config option was introduced by 3d036eb0
> (pack-objects: create pack.useSparse setting, 2019-01-19) and was
> first available in v2.21.0. When enabled, the pack-objects process
> during 'git push' will use a sparse tree walk when deciding which
> trees and blobs to send to the remote. The algorithm was introduced
> by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
> has been in production use by VFS for Git since around that time.
> The features.experimental config option also enabled pack.useSparse,
> so hopefully that has also increased exposure.
>
> It is worth noting that pack.useSparse has a possibility of
> sending more objects across a push, but requires a special
> arrangement of exact _copies_ across directories. There is a test
> in t5322-pack-objects-sparse.sh that demonstrates this possibility.
>
> Since the downside is unlikely but the upside is significant, set
> the default value of pack.useSparse to true. Remove it from the
> set of options implied by features.experimental.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config/feature.txt | 3 ---
>  Documentation/config/pack.txt    | 4 ++--
>  repo-settings.c                  | 3 ++-
>  3 files changed, 4 insertions(+), 6 deletions(-)

Makes sense.  Thanks for writing it.

Should this have a test?

[...]
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -45,6 +45,8 @@ void prepare_repo_settings(struct repository *r)
>  
>  	if (!repo_config_get_bool(r, "pack.usesparse", &value))
>  		r->settings.pack_use_sparse = value;
> +	UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
> +
>  	if (!repo_config_get_bool(r, "feature.manyfiles", &value) && value) {
>  		UPDATE_DEFAULT_BOOL(r->settings.index_version, 4);
>  		UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_WRITE);
> @@ -52,7 +54,6 @@ void prepare_repo_settings(struct repository *r)
>  	if (!repo_config_get_bool(r, "fetch.writecommitgraph", &value))
>  		r->settings.fetch_write_commit_graph = value;
>  	if (!repo_config_get_bool(r, "feature.experimental", &value) && value) {
> -		UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
>  		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
>  		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
>  	}
> 

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Derrick Stolee March 20, 2020, 12:34 a.m. UTC | #2
On 3/19/2020 7:13 PM, Jonathan Nieder wrote:
> Hi,
> 
> Derrick Stolee wrote:
> 
>> The pack.useSparse config option was introduced by 3d036eb0
>> (pack-objects: create pack.useSparse setting, 2019-01-19) and was
>> first available in v2.21.0. When enabled, the pack-objects process
>> during 'git push' will use a sparse tree walk when deciding which
>> trees and blobs to send to the remote. The algorithm was introduced
>> by d5d2e93 (revision: implement sparse algorithm, 2019-01-16) and
>> has been in production use by VFS for Git since around that time.
>> The features.experimental config option also enabled pack.useSparse,
>> so hopefully that has also increased exposure.
>>
>> It is worth noting that pack.useSparse has a possibility of
>> sending more objects across a push, but requires a special
>> arrangement of exact _copies_ across directories. There is a test
>> in t5322-pack-objects-sparse.sh that demonstrates this possibility.
>>
>> Since the downside is unlikely but the upside is significant, set
>> the default value of pack.useSparse to true. Remove it from the
>> set of options implied by features.experimental.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  Documentation/config/feature.txt | 3 ---
>>  Documentation/config/pack.txt    | 4 ++--
>>  repo-settings.c                  | 3 ++-
>>  3 files changed, 4 insertions(+), 6 deletions(-)
> 
> Makes sense.  Thanks for writing it.
> 
> Should this have a test?

I suppose the test that demonstrates the difference in algorithm
in t5322-pack-objects-sparse.sh could be adjusted to drop the
explicit config setting, which would demonstrate that the config
option is being set correctly.

While looking at that test, I see that we use --[no-]sparse
explicitly everywhere to avoid conflicts with the GIT_TEST_*
variable that enables the algorithm. This leads to two things
I will do in v2 that I did not do here:

1. Update the docs for "git pack-objects" because it doesn't
   reference that --no-sparse is an option. Point out that the
   new default is --sparse.

2. Remove GIT_TEST_PACK_SPARSE which was used to test this sparse
   algorithm throughout the test suite.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 875f8c8a66f..4e3a5c0cebc 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -12,9 +12,6 @@  feature.experimental::
 	setting if you are interested in providing feedback on experimental
 	features. The new default values are:
 +
-* `pack.useSparse=true` uses a new algorithm when constructing a pack-file
-which can improve `git push` performance in repos with many files.
-+
 * `fetch.negotiationAlgorithm=skipping` may improve fetch negotiation times by
 skipping more commits at a time, reducing the number of round trips.
 +
diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 0dac5805816..837f1b16792 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -119,8 +119,8 @@  pack.useSparse::
 	objects. This can have significant performance benefits when
 	computing a pack to send a small change. However, it is possible
 	that extra objects are added to the pack-file if the included
-	commits contain certain types of direct renames. Default is `false`
-	unless `feature.experimental` is enabled.
+	commits contain certain types of direct renames. Default is
+	`true`.
 
 pack.writeBitmaps (deprecated)::
 	This is a deprecated synonym for `repack.writeBitmaps`.
diff --git a/repo-settings.c b/repo-settings.c
index a703e407a3f..dc6817daa95 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -45,6 +45,8 @@  void prepare_repo_settings(struct repository *r)
 
 	if (!repo_config_get_bool(r, "pack.usesparse", &value))
 		r->settings.pack_use_sparse = value;
+	UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
+
 	if (!repo_config_get_bool(r, "feature.manyfiles", &value) && value) {
 		UPDATE_DEFAULT_BOOL(r->settings.index_version, 4);
 		UPDATE_DEFAULT_BOOL(r->settings.core_untracked_cache, UNTRACKED_CACHE_WRITE);
@@ -52,7 +54,6 @@  void prepare_repo_settings(struct repository *r)
 	if (!repo_config_get_bool(r, "fetch.writecommitgraph", &value))
 		r->settings.fetch_write_commit_graph = value;
 	if (!repo_config_get_bool(r, "feature.experimental", &value) && value) {
-		UPDATE_DEFAULT_BOOL(r->settings.pack_use_sparse, 1);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_negotiation_algorithm, FETCH_NEGOTIATION_SKIPPING);
 		UPDATE_DEFAULT_BOOL(r->settings.fetch_write_commit_graph, 1);
 	}