diff mbox series

[2/3] ieot: default to not writing IEOT section

Message ID 20181113003938.GC170017@google.com (mailing list archive)
State New, archived
Headers show
Series Avoid confusing messages from new index extensions (Re: [PATCH v8 0/7] speed up index load through parallelization) | expand

Commit Message

Jonathan Nieder Nov. 13, 2018, 12:39 a.m. UTC
As with EOIE, popular versions of Git do not support the new IEOT
extension yet.  When accessing a Git repository written by a more
modern version of Git, they correctly ignore the unrecognized section,
but in the process they loudly warn

	ignoring IEOT extension

resulting in confusion for users.  Introduce the index extension more
gently by not writing it yet in this first version with support for
it.  Soon, once sufficiently many users are running a modern version
of Git, we can flip the default so users benefit from this index
extension by default.

Introduce a '[index] recordOffsetTable' configuration variable to
control whether the new index extension is written.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/config.txt |  7 +++++++
 read-cache.c             | 11 ++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

Comments

Jonathan Tan Nov. 13, 2018, 12:58 a.m. UTC | #1
> +index.recordOffsetTable::
> +	Specifies whether the index file should include an "Index Entry
> +	Offset Table" section. This reduces index load time on
> +	multiprocessor machines but produces a message "ignoring IEOT
> +	extension" when reading the index using Git versions before 2.20.
> +	Defaults to 'false'.

Probably worth adding a test that exercises this new config option -
somehow create an index with index.recordOffsetTable=1, check that the
index contains the appropriate string (a few ways to do this, but I'm
not sure which are portable), and then run a Git command that reads the
index to make sure it is valid; then do the same except
index.recordOffsetTable=0.

The code itself looks good to me.

Same comment for patch 1.
Junio C Hamano Nov. 13, 2018, 1:09 a.m. UTC | #2
Jonathan Nieder <jrnieder@gmail.com> writes:

> As with EOIE, popular versions of Git do not support the new IEOT
> extension yet.  When accessing a Git repository written by a more
> modern version of Git, they correctly ignore the unrecognized section,
> but in the process they loudly warn
>
> 	ignoring IEOT extension
>
> resulting in confusion for users.

Then removing the message is throwing it with bathwater.  First
think about which part of the message is confusiong and then make it
less confusing.

How about

	hint: ignoring an optional IEOT extension

to make it clear that it is totally harmless?

With that, we can add advise.unknownIndexExtension=false to turn all
of them off with a single switch.
Jonathan Nieder Nov. 13, 2018, 1:12 a.m. UTC | #3
Junio C Hamano wrote:

> How about
>
> 	hint: ignoring an optional IEOT extension
>
> to make it clear that it is totally harmless?
>
> With that, we can add advise.unknownIndexExtension=false to turn all
> of them off with a single switch.

I like it.  Expect a patch soon (tonight or tomorrow) that does that.

We'll have to find some appropriate place in the documentation to
explain what the message is about, still.

Thanks,
Jonathan
Ben Peart Nov. 13, 2018, 3:22 p.m. UTC | #4
On 11/12/2018 7:39 PM, Jonathan Nieder wrote:
> As with EOIE, popular versions of Git do not support the new IEOT
> extension yet.  When accessing a Git repository written by a more
> modern version of Git, they correctly ignore the unrecognized section,
> but in the process they loudly warn
> 
> 	ignoring IEOT extension
> 
> resulting in confusion for users.  Introduce the index extension more
> gently by not writing it yet in this first version with support for
> it.  Soon, once sufficiently many users are running a modern version
> of Git, we can flip the default so users benefit from this index
> extension by default.
> 
> Introduce a '[index] recordOffsetTable' configuration variable to
> control whether the new index extension is written.
> 

Why introduce a new setting to disable writing the IEOT extension 
instead of just using the existing index.threads setting?  If 
index.threads=1 then the IEOT extension isn't written which (I believe) 
will accomplish the same goal.

> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---
>   Documentation/config.txt |  7 +++++++
>   read-cache.c             | 11 ++++++++++-
>   2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index d702379db4..cc66fb7de3 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2195,6 +2195,13 @@ index.recordEndOfIndexEntries::
>   	reading the index using Git versions before 2.20. Defaults to
>   	'false'.
>   
> +index.recordOffsetTable::
> +	Specifies whether the index file should include an "Index Entry
> +	Offset Table" section. This reduces index load time on
> +	multiprocessor machines but produces a message "ignoring IEOT
> +	extension" when reading the index using Git versions before 2.20.
> +	Defaults to 'false'.
> +
>   index.threads::
>   	Specifies the number of threads to spawn when loading the index.
>   	This is meant to reduce index load time on multiprocessor machines.
> diff --git a/read-cache.c b/read-cache.c
> index 4bfe93c4c2..290bd54708 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2707,6 +2707,15 @@ static int record_eoie(void)
>   	return 0;
>   }
>   
> +static int record_ieot(void)
> +{
> +	int val;
> +
> +	if (!git_config_get_bool("index.recordoffsettable", &val))
> +		return val;
> +	return 0;
> +}
> +
>   /*
>    * On success, `tempfile` is closed. If it is the temporary file
>    * of a `struct lock_file`, we will therefore effectively perform
> @@ -2767,7 +2776,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>   
>   #ifndef NO_PTHREADS
>   	nr_threads = git_config_get_index_threads();
> -	if (nr_threads != 1) {
> +	if (nr_threads != 1 && record_ieot()) {
>   		int ieot_blocks, cpus;
>   
>   		/*
>
Duy Nguyen Nov. 13, 2018, 3:37 p.m. UTC | #5
On Tue, Nov 13, 2018 at 2:12 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Junio C Hamano wrote:
>
> > How about
> >
> >       hint: ignoring an optional IEOT extension
> >
> > to make it clear that it is totally harmless?
> >
> > With that, we can add advise.unknownIndexExtension=false to turn all
> > of them off with a single switch.
>
> I like it.  Expect a patch soon (tonight or tomorrow) that does that.
>
> We'll have to find some appropriate place in the documentation to
> explain what the message is about, still.

Also from the last discussion, if I remember correctly, this
"ignoring" is considered harmless and could be suppressed most of the
time. But commands like 'fsck' should always report it.
Jonathan Nieder Nov. 13, 2018, 6:09 p.m. UTC | #6
Hi again,

Junio C Hamano wrote:

> Then removing the message is throwing it with bathwater.  First
> think about which part of the message is confusiong and then make it
> less confusing.
>
> How about
>
> 	hint: ignoring an optional IEOT extension
>
> to make it clear that it is totally harmless?
>
> With that, we can add advise.unknownIndexExtension=false to turn all
> of them off with a single switch.

After having slept on it, this doesn't seem like a good fit for the
advice subsystem.  The advice subsystem provides hints about suggested
actions for new users to understand what to do about a condition.  In
this example, the message is not suggesting a particular user action
--- instead, it's describing state, which would seem to be a better
fit for tracing, as in the patch 3/3 I sent.

Am I understanding correclty?  Can you give an example of when a user
would *want* to see this message and what they would do in response?

Thanks,
Jonathan
Jonathan Nieder Nov. 13, 2018, 6:18 p.m. UTC | #7
Hi,

Ben Peart wrote:
> On 11/12/2018 7:39 PM, Jonathan Nieder wrote:

>> As with EOIE, popular versions of Git do not support the new IEOT
>> extension yet.  When accessing a Git repository written by a more
>> modern version of Git, they correctly ignore the unrecognized section,
>> but in the process they loudly warn
>>
>> 	ignoring IEOT extension
>>
>> resulting in confusion for users.  Introduce the index extension more
>> gently by not writing it yet in this first version with support for
>> it.
[...]
>> Introduce a '[index] recordOffsetTable' configuration variable to
>> control whether the new index extension is written.
>
> Why introduce a new setting to disable writing the IEOT extension instead of
> just using the existing index.threads setting?  If index.threads=1 then the
> IEOT extension isn't written which (I believe) will accomplish the same
> goal.

Do you mean defaulting to index.threads=1?  I don't think that would
be a good default, but if you have a different change in mind then I'd
be happy to hear it.

Or do you mean that if the user has explicitly specified index.threads=true,
then that should imply index.recordOffsetTable=true so users only have
to set one setting to turn it on?  I can imagine that working well.

Thanks,
Jonathan
Ben Peart Nov. 13, 2018, 7:15 p.m. UTC | #8
On 11/13/2018 1:18 PM, Jonathan Nieder wrote:
> Hi,
> 
> Ben Peart wrote:
>> On 11/12/2018 7:39 PM, Jonathan Nieder wrote:
> 
>>> As with EOIE, popular versions of Git do not support the new IEOT
>>> extension yet.  When accessing a Git repository written by a more
>>> modern version of Git, they correctly ignore the unrecognized section,
>>> but in the process they loudly warn
>>>
>>> 	ignoring IEOT extension
>>>
>>> resulting in confusion for users.  Introduce the index extension more
>>> gently by not writing it yet in this first version with support for
>>> it.
> [...]
>>> Introduce a '[index] recordOffsetTable' configuration variable to
>>> control whether the new index extension is written.
>>
>> Why introduce a new setting to disable writing the IEOT extension instead of
>> just using the existing index.threads setting?  If index.threads=1 then the
>> IEOT extension isn't written which (I believe) will accomplish the same
>> goal.
> 
> Do you mean defaulting to index.threads=1?  I don't think that would
> be a good default, but if you have a different change in mind then I'd
> be happy to hear it.
> 
> Or do you mean that if the user has explicitly specified index.threads=true,
> then that should imply index.recordOffsetTable=true so users only have
> to set one setting to turn it on?  I can imagine that working well.
> 

Reading the index with multiple threads requires the EOIE and IEOT 
extensions to exist in the index.  If either extension doesn't exist, 
then the code falls back to the single threaded path.  That means you 
can't have both 1) no warning for old versions of git and 2) 
multi-threaded reading for new versions of git.

If you set index.threads=1, that will prevent the IEOT extension from 
being written and there will be no "ignoring IEOT extension" warning in 
older versions of git.

With this patch 'as is' you would have to set both index.threads=true 
and index.recordOffsetTable=true to get multi-threaded index reads.  If 
either is set to false, it will silently drop back to single threaded reads.

> Thanks,
> Jonathan
>
Jonathan Nieder Nov. 13, 2018, 9:08 p.m. UTC | #9
Hi again,

Ben Peart wrote:
> On 11/13/2018 1:18 PM, Jonathan Nieder wrote:
>> Ben Peart wrote:

>>> Why introduce a new setting to disable writing the IEOT extension instead of
>>> just using the existing index.threads setting?  If index.threads=1 then the
>>> IEOT extension isn't written which (I believe) will accomplish the same
>>> goal.
>>
>> Do you mean defaulting to index.threads=1?  I don't think that would
>> be a good default, but if you have a different change in mind then I'd
>> be happy to hear it.
>>
>> Or do you mean that if the user has explicitly specified index.threads=true,
>> then that should imply index.recordOffsetTable=true so users only have
>> to set one setting to turn it on?  I can imagine that working well.
>
> Reading the index with multiple threads requires the EOIE and IEOT
> extensions to exist in the index.  If either extension doesn't exist, then
> the code falls back to the single threaded path.  That means you can't have
> both 1) no warning for old versions of git and 2) multi-threaded reading for
> new versions of git.
>
> If you set index.threads=1, that will prevent the IEOT extension from being
> written and there will be no "ignoring IEOT extension" warning in older
> versions of git.
>
> With this patch 'as is' you would have to set both index.threads=true and
> index.recordOffsetTable=true to get multi-threaded index reads.  If either
> is set to false, it will silently drop back to single threaded reads.

Sorry, I'm still not understanding what you're proposing.  What would be

- the default behavior
- the mechanism for changing that behavior

under your proposal?

I consider index.threads=1 to be a bad default.  I would understand if
you are saying that that should be the default, and I tried to propose
a different way to achieve what you're looking for in the quoted reply
above (but I'm not understanding whether you like that proposal or
not).

Jonathan
Junio C Hamano Nov. 14, 2018, 3:05 a.m. UTC | #10
Ben Peart <peartben@gmail.com> writes:

> Why introduce a new setting to disable writing the IEOT extension
> instead of just using the existing index.threads setting?

But index.threads is about what the reader does, not about the
writer who does not even know who will be reading the resulting
index, no?
Ben Peart Nov. 14, 2018, 6:09 p.m. UTC | #11
On 11/13/2018 4:08 PM, Jonathan Nieder wrote:
> Hi again,
> 
> Ben Peart wrote:
>> On 11/13/2018 1:18 PM, Jonathan Nieder wrote:
>>> Ben Peart wrote:
> 
>>>> Why introduce a new setting to disable writing the IEOT extension instead of
>>>> just using the existing index.threads setting?  If index.threads=1 then the
>>>> IEOT extension isn't written which (I believe) will accomplish the same
>>>> goal.
>>>
>>> Do you mean defaulting to index.threads=1?  I don't think that would
>>> be a good default, but if you have a different change in mind then I'd
>>> be happy to hear it.
>>>
>>> Or do you mean that if the user has explicitly specified index.threads=true,
>>> then that should imply index.recordOffsetTable=true so users only have
>>> to set one setting to turn it on?  I can imagine that working well.
>>
>> Reading the index with multiple threads requires the EOIE and IEOT
>> extensions to exist in the index.  If either extension doesn't exist, then
>> the code falls back to the single threaded path.  That means you can't have
>> both 1) no warning for old versions of git and 2) multi-threaded reading for
>> new versions of git.
>>
>> If you set index.threads=1, that will prevent the IEOT extension from being
>> written and there will be no "ignoring IEOT extension" warning in older
>> versions of git.
>>
>> With this patch 'as is' you would have to set both index.threads=true and
>> index.recordOffsetTable=true to get multi-threaded index reads.  If either
>> is set to false, it will silently drop back to single threaded reads.
> 
> Sorry, I'm still not understanding what you're proposing.  What would be
> 
> - the default behavior
> - the mechanism for changing that behavior
> 
> under your proposal?
> 
> I consider index.threads=1 to be a bad default.  I would understand if
> you are saying that that should be the default, and I tried to propose
> a different way to achieve what you're looking for in the quoted reply
> above (but I'm not understanding whether you like that proposal or
> not).
> 

Today, both the write logic (ie should we write out the IEOT extension) 
and the read logic (should I use the IEOT, if available, and do 
multi-threaded reading) are controlled by the single "index.threads" 
setting.  I would like to keep the settings as simple as possible to 
prevent user confusion.

If we have two different settings (index.threads and 
index.recordoffsettable) then the only combination that will result in 
the user actually getting multi-threaded reads is when they are both set 
to true.  Any other combination will silently fail.  I think it would be 
confusing if you set index.threads=true and got no error message but 
didn't get multi-threaded reads either (or vice versa).

If you want to prevent any of the scary "ignoring IEOT extension" from 
ever happening then your only option is to turn off the IEOT writing by 
default.  The downside is that people have to discover and turn it on if 
they want the improvements.  This can be achieved by changing the 
default for index.threads from "true" to "false."

diff --git a/config.c b/config.c
index 2ee29f6f86..86f5c14294 100644
--- a/config.c
+++ b/config.c
@@ -2291,7 +2291,7 @@ int git_config_get_fsmonitor(void)

  int git_config_get_index_threads(void)
  {
-       int is_bool, val = 0;
+       int is_bool, val = 1;

         val = git_env_ulong("GIT_TEST_INDEX_THREADS", 0);
         if (val)


If you want to provide a way for a concerned user to disable the message 
after the first time they have seen it, then they can be instructed to 
run 'git config --global index.threads false'

There is no way to get multi-threaded reads and NOT get the scary 
message with older versions of git.  Multi-threaded reads require the 
IEOT extension to be written into the index and the existence of the 
IEOT extension in the index will always generate the scary warning.

> Jonathan
>
Jonathan Nieder Nov. 15, 2018, 12:05 a.m. UTC | #12
Hi,

Ben Peart wrote:

> There is no way to get multi-threaded reads and NOT get the scary message
> with older versions of git.  Multi-threaded reads require the IEOT extension
> to be written into the index and the existence of the IEOT extension in the
> index will always generate the scary warning.

This is where I think we differ.  I want my local copy of Git to get
multi-threaded reads as long as IEOT happens to be there, even if I am
not ready to write IEOT myself yet.

I understand that this differs from what you would prefer, so I'd like
to find some compromise that makes us both happy.  I've tried to
suggest one:

   Make explicitly setting index.threads=true imply
   index.recordOffsetTable=true.  That way, the default behavior is the
   behavior I prefer, and a client can simply set index.threads=true to
   get the behavior I think you are describing preferring.

My preference is instead what I sent in patch 2/3 (for simplicity,
especially since the default of index.recordOffsetTable=false would be
only temporary), but this would work okay for me.

I'll send this as a patch.  If there is a reason it won't work for
you, I would be very happy to learn more about why.

Thanks,
Jonathan
diff mbox series

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d702379db4..cc66fb7de3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2195,6 +2195,13 @@  index.recordEndOfIndexEntries::
 	reading the index using Git versions before 2.20. Defaults to
 	'false'.
 
+index.recordOffsetTable::
+	Specifies whether the index file should include an "Index Entry
+	Offset Table" section. This reduces index load time on
+	multiprocessor machines but produces a message "ignoring IEOT
+	extension" when reading the index using Git versions before 2.20.
+	Defaults to 'false'.
+
 index.threads::
 	Specifies the number of threads to spawn when loading the index.
 	This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index 4bfe93c4c2..290bd54708 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2707,6 +2707,15 @@  static int record_eoie(void)
 	return 0;
 }
 
+static int record_ieot(void)
+{
+	int val;
+
+	if (!git_config_get_bool("index.recordoffsettable", &val))
+		return val;
+	return 0;
+}
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2767,7 +2776,7 @@  static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 
 #ifndef NO_PTHREADS
 	nr_threads = git_config_get_index_threads();
-	if (nr_threads != 1) {
+	if (nr_threads != 1 && record_ieot()) {
 		int ieot_blocks, cpus;
 
 		/*