diff mbox series

[1/3] eoie: default to not writing EOIE section

Message ID 20181113003911.GB170017@google.com (mailing list archive)
State New, archived
Headers show
Series Avoid confusing messages from new index extensions (Re: [PATCH v8 0/7] speed up index load through parallelization) | expand

Commit Message

Jonathan Nieder Nov. 13, 2018, 12:39 a.m. UTC
Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension,
2018-10-10) Git defaults to writing the new EOIE section when writing
out an index file.  Usually that is a good thing because it improves
threaded performance, but when a Git repository is shared with older
versions of Git, it produces a confusing warning:

  $ git status
  ignoring EOIE extension
  HEAD detached at 371ed0defa
  nothing to commit, working tree clean

Let's introduce the new index extension more gently.  First we'll roll
out the new version of Git that understands it, and then once
sufficiently many users are using such a version, we can flip the
default to writing it by default.

Introduce a '[index] recordEndOfIndexEntries' configuration variable
to allow interested users to benefit from this index extension early.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/config.txt |  7 +++++++
 read-cache.c             | 11 ++++++++++-
 t/t1700-split-index.sh   | 11 +++++++----
 3 files changed, 24 insertions(+), 5 deletions(-)

Comments

Junio C Hamano Nov. 13, 2018, 1:05 a.m. UTC | #1
Jonathan Nieder <jrnieder@gmail.com> writes:

> Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension,
> 2018-10-10) Git defaults to writing the new EOIE section when writing
> out an index file.  Usually that is a good thing because it improves
> threaded performance, but when a Git repository is shared with older
> versions of Git, it produces a confusing warning:
>
>   $ git status
>   ignoring EOIE extension
>   HEAD detached at 371ed0defa
>   nothing to commit, working tree clean
>
> Let's introduce the new index extension more gently.  First we'll roll
> out the new version of Git that understands it, and then once
> sufficiently many users are using such a version, we can flip the
> default to writing it by default.
>
> Introduce a '[index] recordEndOfIndexEntries' configuration variable
> to allow interested users to benefit from this index extension early.

Thanks.  I am in principle OK with this approach.  In fact, I
suspect that the default may want to be dynamically determined, and
we give this knob to let the users further force their preference.
When no extension that benefits from multi-threading is written, the
default can stay "no" in future versions of Git, for example.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 41a9ff2b6a..d702379db4 100644

The timing is a bit unfortunate for any topic to touch this file,
and contrib/cocci would not help us in this case X-<.

> diff --git a/read-cache.c b/read-cache.c
> index f3a848d61c..4bfe93c4c2 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2698,6 +2698,15 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
>  		rollback_lock_file(lockfile);
>  }
>  
> +static int record_eoie(void)
> +{
> +	int val;
> +
> +	if (!git_config_get_bool("index.recordendofindexentries", &val))
> +		return val;
> +	return 0;
> +}

Unconditionally defaulting to no in this round is perfectly fine.
Let's make a mental note that this is the place to decide dynamic
default in the future when we want to.  It would probably have to
ask around various "extension writing" helpers if they want to have
a say in the outcome (e.g. if there are very many cache entries in
the istate, the entry offset table may want to be written and
otherwise not).

> @@ -2945,7 +2954,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  	 * read.  Write it out regardless of the strip_extensions parameter as we need it
>  	 * when loading the shared index.
>  	 */
> -	if (offset) {
> +	if (offset && record_eoie()) {
>  		struct strbuf sb = STRBUF_INIT;
>  
>  		write_eoie_extension(&sb, &eoie_c, offset);
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index 2ac47aa0e4..0cbac64e28 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -25,14 +25,17 @@ test_expect_success 'enable split index' '
>  	git update-index --split-index &&
>  	test-tool dump-split-index .git/index >actual &&
>  	indexversion=$(test-tool index-version <.git/index) &&
> +
> +	# NEEDSWORK: Stop hard-coding checksums.

Also let's stop hard-coding the assumption that the new knob is off
by default.  Ideally, you'd want to test both cases, right?

Perhaps you'd call "git update-index --split-index" we see in the
precontext twice, with "-c VAR=false" and "-c VAR=true", to prepare
"actual.without-eoie" and "actual.with-eoie", or something like
that?

Thanks.

>  	if test "$indexversion" = "4"
>  	then
> -		own=3527df833c6c100d3d1d921a9a782d62a8be4b58
> -		base=746f7ab2ed44fb839efdfbffcf399d0b113fb4cb
> +		own=432ef4b63f32193984f339431fd50ca796493569
> +		base=508851a7f0dfa8691e9f69c7f055865389012491
>  	else
> -		own=5e9b60117ece18da410ddecc8b8d43766a0e4204
> -		base=4370042739b31cd17a5c5cd6043a77c9a00df113
> +		own=8299b0bcd1ac364e5f1d7768efb62fa2da79a339
> +		base=39d890139ee5356c7ef572216cebcd27aa41f9df
>  	fi &&
> +
>  	cat >expect <<-EOF &&
>  	own $own
>  	base $base
Ben Peart Nov. 13, 2018, 3:14 p.m. UTC | #2
On 11/12/2018 8:05 PM, Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:
> 
>> Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension,
>> 2018-10-10) Git defaults to writing the new EOIE section when writing
>> out an index file.  Usually that is a good thing because it improves
>> threaded performance, but when a Git repository is shared with older
>> versions of Git, it produces a confusing warning:
>>
>>    $ git status
>>    ignoring EOIE extension
>>    HEAD detached at 371ed0defa
>>    nothing to commit, working tree clean
>>
>> Let's introduce the new index extension more gently.  First we'll roll
>> out the new version of Git that understands it, and then once
>> sufficiently many users are using such a version, we can flip the
>> default to writing it by default.
>>
>> Introduce a '[index] recordEndOfIndexEntries' configuration variable
>> to allow interested users to benefit from this index extension early.
> 
> Thanks.  I am in principle OK with this approach.  In fact, I
> suspect that the default may want to be dynamically determined, and
> we give this knob to let the users further force their preference.
> When no extension that benefits from multi-threading is written, the
> default can stay "no" in future versions of Git, for example.
> 

While I can understand the user confusion the warning about ignoring an 
extension could cause I guess I'm a little surprised that people would 
see it that often.  To see the warning means they are running a new 
version of git in the same repo as they are running an old version of 
git.  I just haven't ever experienced that (I only ever have one copy of 
git installed) so am surprised it comes up often enough to warrant this 
change.

That said, if it _is_ that much of an issue, this patch makes sense and 
provides a way to more gracefully transition into this feature.  Even if 
we had some logic to dynamically determine whether to write it or not, 
we'd still want to avoid confusing users when it did get written out.

>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 41a9ff2b6a..d702379db4 100644
> 
> The timing is a bit unfortunate for any topic to touch this file,
> and contrib/cocci would not help us in this case X-<.
> 
>> diff --git a/read-cache.c b/read-cache.c
>> index f3a848d61c..4bfe93c4c2 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -2698,6 +2698,15 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
>>   		rollback_lock_file(lockfile);
>>   }
>>   
>> +static int record_eoie(void)
>> +{
>> +	int val;
>> +
>> +	if (!git_config_get_bool("index.recordendofindexentries", &val))
>> +		return val;
>> +	return 0;
>> +}
> 
> Unconditionally defaulting to no in this round is perfectly fine.
> Let's make a mental note that this is the place to decide dynamic
> default in the future when we want to.  It would probably have to
> ask around various "extension writing" helpers if they want to have
> a say in the outcome (e.g. if there are very many cache entries in
> the istate, the entry offset table may want to be written and
> otherwise not).
> 
>> @@ -2945,7 +2954,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>>   	 * read.  Write it out regardless of the strip_extensions parameter as we need it
>>   	 * when loading the shared index.
>>   	 */
>> -	if (offset) {
>> +	if (offset && record_eoie()) {
>>   		struct strbuf sb = STRBUF_INIT;
>>   
>>   		write_eoie_extension(&sb, &eoie_c, offset);
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index 2ac47aa0e4..0cbac64e28 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -25,14 +25,17 @@ test_expect_success 'enable split index' '
>>   	git update-index --split-index &&
>>   	test-tool dump-split-index .git/index >actual &&
>>   	indexversion=$(test-tool index-version <.git/index) &&
>> +
>> +	# NEEDSWORK: Stop hard-coding checksums.
> 
> Also let's stop hard-coding the assumption that the new knob is off
> by default.  Ideally, you'd want to test both cases, right?
> 
> Perhaps you'd call "git update-index --split-index" we see in the
> precontext twice, with "-c VAR=false" and "-c VAR=true", to prepare
> "actual.without-eoie" and "actual.with-eoie", or something like
> that?
> 
> Thanks.
> 
>>   	if test "$indexversion" = "4"
>>   	then
>> -		own=3527df833c6c100d3d1d921a9a782d62a8be4b58
>> -		base=746f7ab2ed44fb839efdfbffcf399d0b113fb4cb
>> +		own=432ef4b63f32193984f339431fd50ca796493569
>> +		base=508851a7f0dfa8691e9f69c7f055865389012491
>>   	else
>> -		own=5e9b60117ece18da410ddecc8b8d43766a0e4204
>> -		base=4370042739b31cd17a5c5cd6043a77c9a00df113
>> +		own=8299b0bcd1ac364e5f1d7768efb62fa2da79a339
>> +		base=39d890139ee5356c7ef572216cebcd27aa41f9df
>>   	fi &&
>> +
>>   	cat >expect <<-EOF &&
>>   	own $own
>>   	base $base
Jonathan Nieder Nov. 13, 2018, 6:25 p.m. UTC | #3
Hi,

Ben Peart wrote:

> While I can understand the user confusion the warning about ignoring an
> extension could cause I guess I'm a little surprised that people would see
> it that often.  To see the warning means they are running a new version of
> git in the same repo as they are running an old version of git.  I just
> haven't ever experienced that (I only ever have one copy of git installed)
> so am surprised it comes up often enough to warrant this change.

Great question.  There are a few contexts where it comes up:

 1. Using multiple versions of Git on a single machine.  For example,
    some IDEs bundle a particular version of Git, which can be a
    different version from the system copy, or on a Mac, /usr/bin/git
    quickly goes out of sync with the Homebrew git in
    /usr/local/bin/git.

 2. Sharing a single Git repository between multiple machines.  This is
    not unusual, using NFS or sshfs, for example.

 3. Downgrading after trying a new version of Git.

To support these, Git is generally careful to avoid writing
repositories that older versions of Git do not understand.  The EOIE
extension was almost perfect in this respect: it works fine with older
versions of Git, except for the alarming error message.

> That said, if it _is_ that much of an issue, this patch makes sense and
> provides a way to more gracefully transition into this feature.  Even if we
> had some logic to dynamically determine whether to write it or not, we'd
> still want to avoid confusing users when it did get written out.

Yes.  An earlier version of this patch defaulted to writing EOIE if
and only if the .git/index file already has an EOIE extension.  There
were enough holes in that (commands like "git reset" that do not read
the existing index file) and enough complexity that it didn't seem
worth it.

Really in this series, patch 3/3 is the one I care most about.  I wish
we had had it years ago. :)  It would make patches 1 and 2
unnecessary.

Thanks,
Jonathan
Junio C Hamano Nov. 14, 2018, 1:36 a.m. UTC | #4
Jonathan Nieder <jrnieder@gmail.com> writes:

>  1. Using multiple versions of Git on a single machine.  For example,
>     some IDEs bundle a particular version of Git, which can be a
>     different version from the system copy, or on a Mac, /usr/bin/git
>     quickly goes out of sync with the Homebrew git in
>     /usr/local/bin/git.

Exactly this, especially the latter, is the answer to your 
question in an earlier message:

>> Am I understanding correclty?  Can you give an example of when a user
>> would *want* to see this message and what they would do in response?

The user may not be even aware of using another version of Git that
does not know how to take advantage of the version of Git you have
used in the repository, and it can be a mistake the user may want to
fix (e.g. by futzing with PATH).  The message would help the user
notice the situation and take corrective action.  Users of IDEs that
bundle stale version of Git cannot even bug the supplier of the IDE
to make them more up-to-date if they aren't aware of it.
Jonathan Nieder Nov. 15, 2018, 12:19 a.m. UTC | #5
Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>>  1. Using multiple versions of Git on a single machine.  For example,
>>     some IDEs bundle a particular version of Git, which can be a
>>     different version from the system copy, or on a Mac, /usr/bin/git
>>     quickly goes out of sync with the Homebrew git in
>>     /usr/local/bin/git.
>
> Exactly this, especially the latter, is the answer to your
> question in an earlier message:
>
>>> Am I understanding correctly?  Can you give an example of when a user
>>> would *want* to see this message and what they would do in response?
>
> The user may not be even aware of using another version of Git that
> does not know how to take advantage of the version of Git you have
> used in the repository, and it can be a mistake the user may want to
> fix (e.g. by futzing with PATH).

Ah, thanks much.  I'll add a hint along those lines (e.g.

 warning: ignoring optional IEOT index extension
 hint: This is likely due to the file having been written by a newer
 hint: version of Git than is reading it.  You can upgrade Git to
 hint: take advantage of performance improvements from the updated
 hint: file format.
 hint:
 hint: You can run "git config advice.unknownIndexExtension true" to
 hint: suppress this message.

I am still vaguely uncomfortable with this since it seems analogous to
warning that the server is advertising an unrecognized capability, but
I can live with it. :)

Patch coming in a few moments.

Jonathan
diff mbox series

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 41a9ff2b6a..d702379db4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2188,6 +2188,13 @@  imap::
 	The configuration variables in the 'imap' section are described
 	in linkgit:git-imap-send[1].
 
+index.recordEndOfIndexEntries::
+	Specifies whether the index file should include an "End Of Index
+	Entry" section. This reduces index load time on multiprocessor
+	machines but produces a message "ignoring EOIE extension" when
+	reading the index using Git versions before 2.20. Defaults to
+	'false'.
+
 index.threads::
 	Specifies the number of threads to spawn when loading the index.
 	This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index f3a848d61c..4bfe93c4c2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2698,6 +2698,15 @@  void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
+static int record_eoie(void)
+{
+	int val;
+
+	if (!git_config_get_bool("index.recordendofindexentries", &val))
+		return val;
+	return 0;
+}
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2945,7 +2954,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	 * read.  Write it out regardless of the strip_extensions parameter as we need it
 	 * when loading the shared index.
 	 */
-	if (offset) {
+	if (offset && record_eoie()) {
 		struct strbuf sb = STRBUF_INIT;
 
 		write_eoie_extension(&sb, &eoie_c, offset);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..0cbac64e28 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -25,14 +25,17 @@  test_expect_success 'enable split index' '
 	git update-index --split-index &&
 	test-tool dump-split-index .git/index >actual &&
 	indexversion=$(test-tool index-version <.git/index) &&
+
+	# NEEDSWORK: Stop hard-coding checksums.
 	if test "$indexversion" = "4"
 	then
-		own=3527df833c6c100d3d1d921a9a782d62a8be4b58
-		base=746f7ab2ed44fb839efdfbffcf399d0b113fb4cb
+		own=432ef4b63f32193984f339431fd50ca796493569
+		base=508851a7f0dfa8691e9f69c7f055865389012491
 	else
-		own=5e9b60117ece18da410ddecc8b8d43766a0e4204
-		base=4370042739b31cd17a5c5cd6043a77c9a00df113
+		own=8299b0bcd1ac364e5f1d7768efb62fa2da79a339
+		base=39d890139ee5356c7ef572216cebcd27aa41f9df
 	fi &&
+
 	cat >expect <<-EOF &&
 	own $own
 	base $base