mbox series

[00/11,RFC] Create 'core.size=large' setting to update config defaults

Message ID pull.254.git.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Create 'core.size=large' setting to update config defaults | expand

Message

Philippe Blain via GitGitGadget June 3, 2019, 8:18 p.m. UTC
This patch series includes a few new config options we created to speed up
certain critical commands in VFS for Git. On their own, they would
contribute little value as it is hard to discover new config variables.
Instead, I've created this RFC as a goal for probably three sequential patch
series:

 1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
    'large' as a value. This enables several config values that are
    beneficial for large repos. We use a certain set in VFS for Git (see
    [1]), and most of those are applicable to any repo. This 'core.size'
    setting is intended for users to automatically receive performance
    updates as soon as they are stable, but they must opt-in to the setting
    and can always explicitly set their own config values. The settings to
    include here are core.commitGraph=true, gc.writeCommitGraph=true,
    index.version=4, pack.useSparse=true.
    
    
 2. (Patches 4-8) Introduce 'status.aheadBehind' to dictate if we use
    '--[no-]ahead-behind' during 'git status' calls. Also do some cleanup on
    the feature around porcelain formats. I adapted Jeff Hostetler's commits
    from microsoft/git for this section.
    
    
 3. (Patches 9-12) Introduce 'fetch.showForcedUpdates' and the associated
    '--[no-]show-forced-updates' option for 'git fetch' and 'git pull'
    calls. When fetching from a remote with many branches that move quickly,
    the check for forced updates can be expensive. Further, the only effects
    are a "(forced update)" indicator to stdout and a single bit in the
    reflog. The reflog bit is unfortunate to lose, but it is never trusted
    for important actions. These changes are likely to be more controversial
    than the others.
    
    

Hopefully this direction is amenable to allow "early adopters" gain access
to new performance features even if they are not necessary reading every
line of the release notes.

Thanks, -Stolee

[1] 
https://github.com/microsoft/VFSForGit/blob/6a7fd2ff50056b73b347b882d2b8d52939bd6419/GVFS/GVFS/CommandLine/GVFSVerb.cs#L122-L152
This code includes the settings we enable by default in VFS for Git
enlistments.

Derrick Stolee (8):
  repo-settings: create repo.size=large setting
  repo-settings: use index.version=4 by default
  repo-settings: pack.useSparse=true
  repo-settings: status.aheadBehind=false
  fetch: add --[no-]show-forced-updates argument
  fetch: warn about forced updates after branch list
  pull: add --[no-]show-forced-updates passthrough to fetch
  repo-settings: fetch.showForcedUpdates=false

Jeff Hostetler (3):
  status: add status.aheadbehind setting
  status: add warning when a/b calculation takes too long for
    long/normal format
  status: ignore status.aheadbehind in porcelain formats

 Documentation/config/core.txt   | 31 ++++++++++++++-
 Documentation/config/fetch.txt  |  5 +++
 Documentation/config/gc.txt     |  4 +-
 Documentation/config/index.txt  |  1 +
 Documentation/config/pack.txt   |  3 +-
 Documentation/config/status.txt |  6 +++
 Documentation/fetch-options.txt | 13 +++++++
 Makefile                        |  1 +
 advice.c                        |  2 +
 advice.h                        |  1 +
 builtin/commit.c                | 19 ++++++++-
 builtin/fetch.c                 | 34 ++++++++++++++++-
 builtin/gc.c                    |  6 +--
 builtin/pack-objects.c          |  9 +++--
 builtin/pull.c                  |  7 ++++
 commit-graph.c                  |  7 ++--
 read-cache.c                    | 12 +++---
 repo-settings.c                 | 68 +++++++++++++++++++++++++++++++++
 repo-settings.h                 | 17 +++++++++
 repository.h                    |  3 ++
 t/t6040-tracking-info.sh        | 31 +++++++++++++++
 t/t7064-wtstatus-pv2.sh         |  8 ++++
 wt-status.c                     | 17 +++++++++
 23 files changed, 283 insertions(+), 22 deletions(-)
 create mode 100644 repo-settings.c
 create mode 100644 repo-settings.h


base-commit: aa25c82427ae70aebf3b8f970f2afd54e9a2a8c6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-254%2Fderrickstolee%2Fconfig-large%2Fupstream-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-254/derrickstolee/config-large/upstream-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/254

Comments

Derrick Stolee June 3, 2019, 8:55 p.m. UTC | #1
On 6/3/2019 4:18 PM, Derrick Stolee via GitGitGadget wrote:
>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>     'large' as a value. 

I do want to point out that this "core.size=large" option is probably a
terrible name and could easily be replaced with something better. Please
consider alternatives that better describe the goals at hand (helping users
get performance boosts on upgrade without needing to pay close attention).

Thanks,
-Stolee
Johannes Schindelin June 4, 2019, 2:43 p.m. UTC | #2
Hi Stolee,

On Mon, 3 Jun 2019, Derrick Stolee via GitGitGadget wrote:

>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>     'large' as a value. This enables several config values that are
>     beneficial for large repos.

I find `core.size` a bit non-descriptive. Maybe `repository.size` instead?

Ciao,
Dscho
Derrick Stolee June 4, 2019, 2:56 p.m. UTC | #3
On 6/4/2019 10:43 AM, Johannes Schindelin wrote:
> Hi Stolee,
> 
> On Mon, 3 Jun 2019, Derrick Stolee via GitGitGadget wrote:
> 
>>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>>     'large' as a value. This enables several config values that are
>>     beneficial for large repos.
> 
> I find `core.size` a bit non-descriptive. Maybe `repository.size` instead?

Thanks for the suggestion! If the "repository." doesn't make sense as a top-
level category, then maybe "core.repositorySize" would work?

A thought I had overnight that may broaden our options would be to think of
this as a tolerance for experimental features. Maybe "core.adoptionRing" with
options for "slow" and "fast", where "slow" takes things that have been cooking
a long while (index.version=4, core.commitGraph=gc.writeCommitGraph=1) and the
"fast" option gets all of those values plus the more experimental options
(status.aheadBehind=false, fetch.showForcedUpdates=false).

Alternate names with this slow/fast idea could be:

* core.experimentTolerance={none,low,high}

* core.autoConfig={none,some,all}

Hopefully these options can trigger some creativity to decide on a good
name that an experienced Git user could understand.

Thanks,
-Stolee
Junio C Hamano June 5, 2019, 8:39 p.m. UTC | #4
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch series includes a few new config options we created to speed up
> certain critical commands in VFS for Git. On their own, they would
> contribute little value as it is hard to discover new config variables.
> Instead, I've created this RFC as a goal for probably three sequential patch
> series:
>
>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>     'large' as a value. This enables several config values that are
>     beneficial for large repos. We use a certain set in VFS for Git (see
>     [1]), and most of those are applicable to any repo. This 'core.size'
>     setting is intended for users to automatically receive performance
>     updates as soon as they are stable, but they must opt-in to the setting
>     and can always explicitly set their own config values. The settings to
>     include here are core.commitGraph=true, gc.writeCommitGraph=true,
>     index.version=4, pack.useSparse=true.

... and not the configuration introduced by the other two points in
this list?

"If you set this, these other configuration variables are set to
these default values" is a very valuable usability feature.  It
looks a lot more "meta" or "macro", and certainly is not a good idea
to call it as if it sits next to variables in any existing hierarchy.

I also wonder if this is something we would want to support in
general; random things that come to mind are:

 - should such a "macro" configuration be limited to boolean
   (e.g. the above core.size that takes 'large' is a boolean between
   'large' and 'not large'), or can it be an enum (e.g. choose among
   'large', 'medium' and 'small', and core.bigFileThreshold will be
   set to 1G, 512M and 128M respectively---this silly example is for
   illustration purposes only), and if so, can we express what these
   default values are for each choice without writing a lot of code?

 - if we were to have more than just this 'core.size' macro, can two
   otherwise orthogonal macros both control the same underlying
   variable, and if so, how do we express their interactions?
   "using these two at the same time is forbidden" is a perfectly
   acceptable answer for the first round until we figure out the
   desired semantics, of course.

 - perhaps we may eventually want to allow end users (via their
   ~/.gitconfig) and system administrators (via /etc/gitconfig)
   define such a macro setting (e.g. setting macro.largeRepoSetting
   sets pack.usebitmaps=true, pack.useSpars=true, etc.) *after* we
   figure out what we want to do to the other points in this list.

 - even if we do not allow end users and system administrators futz
   with custom macros, can we specify the macros we ship without
   casting them in code?
Derrick Stolee June 6, 2019, 12:23 p.m. UTC | #5
On 6/5/2019 4:39 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This patch series includes a few new config options we created to speed up
>> certain critical commands in VFS for Git. On their own, they would
>> contribute little value as it is hard to discover new config variables.
>> Instead, I've created this RFC as a goal for probably three sequential patch
>> series:
>>
>>  1. (Patches 1-3) Introduce a new 'core.size' config setting that takes
>>     'large' as a value. This enables several config values that are
>>     beneficial for large repos. We use a certain set in VFS for Git (see
>>     [1]), and most of those are applicable to any repo. This 'core.size'
>>     setting is intended for users to automatically receive performance
>>     updates as soon as they are stable, but they must opt-in to the setting
>>     and can always explicitly set their own config values. The settings to
>>     include here are core.commitGraph=true, gc.writeCommitGraph=true,
>>     index.version=4, pack.useSparse=true.
> 
> ... and not the configuration introduced by the other two points in
> this list?

They are added to the config setting after they are introduced. See
patches 7 (status.aheadBehind) and 11 (fetch.showForcedUpdates).

> "If you set this, these other configuration variables are set to
> these default values" is a very valuable usability feature.  It
> looks a lot more "meta" or "macro", and certainly is not a good idea
> to call it as if it sits next to variables in any existing hierarchy.
> 
> I also wonder if this is something we would want to support in
> general; random things that come to mind are:
> 
>  - should such a "macro" configuration be limited to boolean
>    (e.g. the above core.size that takes 'large' is a boolean between
>    'large' and 'not large'), or can it be an enum (e.g. choose among
>    'large', 'medium' and 'small', and core.bigFileThreshold will be
>    set to 1G, 512M and 128M respectively---this silly example is for
>    illustration purposes only), and if so, can we express what these
>    default values are for each choice without writing a lot of code?

That's a good point that we could include recommended values for
other non-boolean variables if our "meta" config setting is also
non-boolean. This fits in with the "ring" ideas discussed earlier [1].
Taking in a few ideas from your message, perhaps we create a new "meta"
category for this setting and use an integer value for "how big do I
think my repo is?" and we can apply different settings based on thresholds:

 0: no config defaults changed
 3: safe defaults (core.commitGraph, index.version=4)
 6: behavior-modifying defaults (status.aheadBehind, fetch.noShowForcedUpdates)

Using 3 and 6 here to allow for finer gradients at a later date.

[1] https://public-inbox.org/git/xmqqftonsr6a.fsf@gitster-ct.c.googlers.com/T/#m8dbaedc016ce7301b9d80e5ceb6a82edfa7bafac

>  - if we were to have more than just this 'core.size' macro, can two
>    otherwise orthogonal macros both control the same underlying
>    variable, and if so, how do we express their interactions?
>    "using these two at the same time is forbidden" is a perfectly
>    acceptable answer for the first round until we figure out the
>    desired semantics, of course.

To borrow from linear algebra, I would recommend that two orthogonal
config settings have disjoint _bases_ (i.e. the set of config settings
they use are disjoint). Of course, this can be discussed in more
detail when someone suggests a second meta-config setting. Such a
second setting would need justification for why it doesn't work with
our first setting.
 
>  - perhaps we may eventually want to allow end users (via their
>    ~/.gitconfig) and system administrators (via /etc/gitconfig)
>    define such a macro setting (e.g. setting macro.largeRepoSetting
>    sets pack.usebitmaps=true, pack.useSpars=true, etc.) *after* we
>    figure out what we want to do to the other points in this list.
>
>  - even if we do not allow end users and system administrators futz
>    with custom macros, can we specify the macros we ship without
>    casting them in code?

Are you suggesting that we allow some config values to be pulled from
the repo contents? If we could identify some config options as "safe"
to include in the Git data, then a repo administrator could commit a
"/.gitconfig" file _and_ some existing config option says "look at the
config in the repo".

I see value in making some "safe" settings available in the repo, but
also see that it can be very tricky to get right. Further, I think it
is independent of the current direction. In fact, I would imagine the
meta-config setting be one of the "safe" settings that we could put in
this committed config file.

Thanks,
-Stolee
Junio C Hamano June 6, 2019, 4:07 p.m. UTC | #6
Derrick Stolee <stolee@gmail.com> writes:

>>  - perhaps we may eventually want to allow end users (via their
>>    ~/.gitconfig) and system administrators (via /etc/gitconfig)
>>    define such a macro setting (e.g. setting macro.largeRepoSetting
>>    sets pack.usebitmaps=true, pack.useSpars=true, etc.) *after* we
>>    figure out what we want to do to the other points in this list.
>>
>>  - even if we do not allow end users and system administrators futz
>>    with custom macros, can we specify the macros we ship without
>>    casting them in code?
>
> Are you suggesting that we allow some config values to be pulled from
> the repo contents?

Not at all.  As far as the configuration is concerned, what project
ships is tainted data that should not be used blindly.

What I had in mind is parallel to the idea of pushing "static struct
userdiff_driver builtin_drivers[]" out of the compiled-in code and
instead have a text file shipped in /usr/share/git/ somewhere.  So,
instead of having "core.size==large means these other four variables
are set to these values" in the code, we invent a general mechanism
to read such "macro" specification out of a text file, and that
would be the only code change---the specific "core.size==large
affects X, Y and Z" would not be in the code, but would be in the
text file we ship and read by the mechanism.

If the list of allowed "meta" configuration variables and the
configuration variables whose default each of them affects can be
expressed in our usual ".gitconfig" file format, then the system
administrators can add their own in /etc/gitconfig, too, to help
their users.

That is what I meant by the last item.  Note that I was "wondering
if it makes sense" and what I wrote above in this message is merely
clarifying what I meant---I am not making further/more arguments to
claim it is a good idea (at least not yet).

Thanks.