[v6,4/7] config: add new index.threads config setting

Message ID	20180926195442.1380-5-benpeart@microsoft.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: Ben Peart <peartben@gmail.com> To: git@vger.kernel.org Cc: gitster@pobox.com, pclouds@gmail.com, Ben Peart <benpeart@microsoft.com>, Ben Peart <Ben.Peart@microsoft.com> Subject: [PATCH v6 4/7] config: add new index.threads config setting Date: Wed, 26 Sep 2018 15:54:39 -0400 Message-Id: <20180926195442.1380-5-benpeart@microsoft.com> In-Reply-To: <20180926195442.1380-1-benpeart@microsoft.com> References: <20180823154053.20212-1-benpeart@microsoft.com> <20180926195442.1380-1-benpeart@microsoft.com> Sender: git-owner@vger.kernel.org Precedence: bulk
Series	speed up index load through parallelization \| expand [v6,0/7] speed up index load through parallelization [v6,1/7] read-cache.c: optimize reading index format v4 [v6,2/7] read-cache: clean up casting and byte decoding [v6,3/7] eoie: add End of Index Entry (EOIE) extension [v6,4/7] config: add new index.threads config setting [v6,5/7] read-cache: load cache extensions on a worker thread [v6,6/7] ieot: add Index Entry Offset Table (IEOT) extension [v6,7/7] read-cache: load cache entries on worker threads

Ben Peart Sept. 26, 2018, 7:54 p.m. UTC

Add support for a new index.threads config setting which will be used to
control the threading code in do_read_index().  A value of 0 will tell the
index code to automatically determine the correct number of threads to use.
A value of 1 will make the code single threaded.  A value greater than 1
will set the maximum number of threads to use.

For testing purposes, this setting can be overwritten by setting the
GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0.

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
---
 Documentation/config.txt |  7 +++++++
 config.c                 | 18 ++++++++++++++++++
 config.h                 |  1 +
 t/README                 |  5 +++++
 t/t1700-split-index.sh   |  1 +
 5 files changed, 32 insertions(+)

SZEDER Gábor Sept. 28, 2018, 12:26 a.m. UTC | #1

On Wed, Sep 26, 2018 at 03:54:39PM -0400, Ben Peart wrote:
> Add support for a new index.threads config setting which will be used to
> control the threading code in do_read_index().  A value of 0 will tell the
> index code to automatically determine the correct number of threads to use.
> A value of 1 will make the code single threaded.  A value greater than 1
> will set the maximum number of threads to use.
> 
> For testing purposes, this setting can be overwritten by setting the
> GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0.
> 
> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
> ---

> diff --git a/t/README b/t/README
> index aa33ac4f26..0fcecf4500 100644
> --- a/t/README
> +++ b/t/README
> @@ -332,6 +332,11 @@ This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
>  as they currently hard code SHA values for the index which are no longer
>  valid due to the addition of the EOIE extension.
>  
> +GIT_TEST_INDEX_THREADS=<n> enables exercising the multi-threaded loading
> +of the index for the whole test suite by bypassing the default number of
> +cache entries and thread minimums. Settting this to 1 will make the

s/ttt/tt/

> +index loading single threaded.
> +
>  Naming Tests
>  ------------
>  
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index 1f168378c8..ab205954cf 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -8,6 +8,7 @@ test_description='split index mode tests'
>  sane_unset GIT_TEST_SPLIT_INDEX
>  sane_unset GIT_FSMONITOR_TEST
>  GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
> +GIT_TEST_INDEX_THREADS=1; export GIT_TEST_INDEX_THREADS

Why does multithreading have to be disabled in this test?

>  test_expect_success 'enable split index' '
>  	git config splitIndex.maxPercentChange 100 &&
> -- 
> 2.18.0.windows.1
>

Ben Peart Sept. 28, 2018, 1:39 p.m. UTC | #2

On 9/27/2018 8:26 PM, SZEDER Gábor wrote:
> On Wed, Sep 26, 2018 at 03:54:39PM -0400, Ben Peart wrote:
>> Add support for a new index.threads config setting which will be used to
>> control the threading code in do_read_index().  A value of 0 will tell the
>> index code to automatically determine the correct number of threads to use.
>> A value of 1 will make the code single threaded.  A value greater than 1
>> will set the maximum number of threads to use.
>>
>> For testing purposes, this setting can be overwritten by setting the
>> GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0.
>>
>> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
>> ---
> 
>> diff --git a/t/README b/t/README
>> index aa33ac4f26..0fcecf4500 100644
>> --- a/t/README
>> +++ b/t/README
>> @@ -332,6 +332,11 @@ This is used to allow tests 1, 4-9 in t1700-split-index.sh to succeed
>>   as they currently hard code SHA values for the index which are no longer
>>   valid due to the addition of the EOIE extension.
>>   
>> +GIT_TEST_INDEX_THREADS=<n> enables exercising the multi-threaded loading
>> +of the index for the whole test suite by bypassing the default number of
>> +cache entries and thread minimums. Settting this to 1 will make the
> 
> s/ttt/tt/
> 
>> +index loading single threaded.
>> +
>>   Naming Tests
>>   ------------
>>   
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index 1f168378c8..ab205954cf 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -8,6 +8,7 @@ test_description='split index mode tests'
>>   sane_unset GIT_TEST_SPLIT_INDEX
>>   sane_unset GIT_FSMONITOR_TEST
>>   GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
>> +GIT_TEST_INDEX_THREADS=1; export GIT_TEST_INDEX_THREADS
> 
> Why does multithreading have to be disabled in this test?
> 

If multi-threading is enabled, it will write out the IEOT extension 
which changes the SHA and causes the test to fail.  I will update the 
logic in this case to not write out the IEOT extension as it isn't needed.

>>   test_expect_success 'enable split index' '
>>   	git config splitIndex.maxPercentChange 100 &&
>> -- 
>> 2.18.0.windows.1
>>

Junio C Hamano Sept. 28, 2018, 5:07 p.m. UTC | #3

Ben Peart <peartben@gmail.com> writes:

>> Why does multithreading have to be disabled in this test?
>
> If multi-threading is enabled, it will write out the IEOT extension
> which changes the SHA and causes the test to fail.

I think it is a design mistake to let the writing processes's
capability decide what is written in the file to be read later by a
different process, which possibly may have different capability.  If
you are not writing with multiple threads, it should not matter if
that writer process is capable of and configured to spawn 8 threads
if the process were reading the file---as it is not reading the file
it is writing right now.

I can understand if the design is to write IEOT only if the
resulting index is expected to become large enough (above an
arbitrary threshold like 100k entries) to matter.  I also can
understand if IEOT is omitted when the repository configuration says
that no process is allowed to read the index with multi-threaded
codepath in that repository.

Ben Peart Sept. 28, 2018, 7:41 p.m. UTC | #4

On 9/28/2018 1:07 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
> 
>>> Why does multithreading have to be disabled in this test?
>>
>> If multi-threading is enabled, it will write out the IEOT extension
>> which changes the SHA and causes the test to fail.
> 
> I think it is a design mistake to let the writing processes's
> capability decide what is written in the file to be read later by a
> different process, which possibly may have different capability.  If
> you are not writing with multiple threads, it should not matter if
> that writer process is capable of and configured to spawn 8 threads
> if the process were reading the file---as it is not reading the file
> it is writing right now.
> 
> I can understand if the design is to write IEOT only if the
> resulting index is expected to become large enough (above an
> arbitrary threshold like 100k entries) to matter.  I also can
> understand if IEOT is omitted when the repository configuration says
> that no process is allowed to read the index with multi-threaded
> codepath in that repository.
> 

There are two different paths which determine how many blocks are 
written to the IEOT.  The first is the default path.  On this path, the 
number of blocks is determined by the number of cache entries divided by 
the THREAD_COST.  If there are sufficient entries to make it faster to 
use threading, then it will automatically use enough blocks to optimize 
the performance of reading the entries across multiple threads.

I currently cap the maximum number of blocks to be the number of cores 
that would be available to process them on that same machine purely as 
an optimization.  The majority of the time, the index will be read from 
the same machine that it was written on so this works well.  Before I 
added that logic, you would usually end up with more blocks than 
available threads which meant some threads had more to do than the other 
threads and resulted in worse performance.  For example, 4 blocks across 
3 threads results in the 1st thread having twice as much work to do as 
the other threads.

If the index is copied to a machine with a different number of cores, it 
will still all work - it just may not be optimal for that machine.  This 
is self correcting because as soon as the index is written out, it will 
be optimized for that machine.

If the "automatically try to make it perform optimally" logic doesn't 
work for some reason, we have path #2.

The second path is when the user specifies a specific number of blocks 
via the GIT_TEST_INDEX_THREADS=<n> environment variable or the 
index.threads=<n> config setting.  If they ask for n blocks, they will 
get n blocks.  This is the "I know what I'm doing and want to control 
the behavior" path.

I just added one additional test (see patch below) to avoid a divide by 
zero bug and simplify things a bit.  With this change, if there are 
fewer than two blocks, the IEOT extension is not written out as it isn't 
needed.  The load would be single threaded anyway so there is no reason 
to write out a IEOT extensions that won't be used.

diff --git a/read-cache.c b/read-cache.c
index f5d766088d..a1006fa824 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2751,18 +2751,23 @@ static int do_write_index(struct index_state 
*istate, struct tempfile *tempfil
e,
                  */
                 if (!nr) {
                         ieot_blocks = istate->cache_nr / THREAD_COST;
-                       if (ieot_blocks < 1)
-                               ieot_blocks = 1;
                         cpus = online_cpus();
                         if (ieot_blocks > cpus - 1)
                                 ieot_blocks = cpus - 1;
                 } else {
                         ieot_blocks = nr;
                 }
-               ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
-                       + (ieot_blocks * sizeof(struct 
index_entry_offset)));
-               ieot->nr = 0;
-               ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
+
+               /*
+                * no reason to write out the IEOT extension if we don't
+                * have enough blocks to utilize multi-threading
+                */
+               if (ieot_blocks > 1) {
+                       ieot = xcalloc(1, sizeof(struct 
index_entry_offset_table)
+                               + (ieot_blocks * sizeof(struct 
index_entry_offset)));
+                       ieot->nr = 0;
+                       ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
+               }
         }
  #endif

Ramsay Jones Sept. 28, 2018, 8:30 p.m. UTC | #5

On 28/09/18 20:41, Ben Peart wrote:
> 
> 
> On 9/28/2018 1:07 PM, Junio C Hamano wrote:
>> Ben Peart <peartben@gmail.com> writes:
>>
>>>> Why does multithreading have to be disabled in this test?
>>>
>>> If multi-threading is enabled, it will write out the IEOT extension
>>> which changes the SHA and causes the test to fail.
>>
>> I think it is a design mistake to let the writing processes's
>> capability decide what is written in the file to be read later by a
>> different process, which possibly may have different capability.  If
>> you are not writing with multiple threads, it should not matter if
>> that writer process is capable of and configured to spawn 8 threads
>> if the process were reading the file---as it is not reading the file
>> it is writing right now.
>>
>> I can understand if the design is to write IEOT only if the
>> resulting index is expected to become large enough (above an
>> arbitrary threshold like 100k entries) to matter.  I also can
>> understand if IEOT is omitted when the repository configuration says
>> that no process is allowed to read the index with multi-threaded
>> codepath in that repository.
>>
> 
> There are two different paths which determine how many blocks are written to the IEOT.  The first is the default path.  On this path, the number of blocks is determined by the number of cache entries divided by the THREAD_COST.  If there are sufficient entries to make it faster to use threading, then it will automatically use enough blocks to optimize the performance of reading the entries across multiple threads.
> 
> I currently cap the maximum number of blocks to be the number of cores that would be available to process them on that same machine purely as an optimization.  The majority of the time, the index will be read from the same machine that it was written on so this works well.  Before I added that logic, you would usually end up with more blocks than available threads which meant some threads had more to do than the other threads and resulted in worse performance.  For example, 4 blocks across 3 threads results in the 1st thread having twice as much work to do as the other threads.
> 
> If the index is copied to a machine with a different number of cores, it will still all work - it just may not be optimal for that machine.  This is self correcting because as soon as the index is written out, it will be optimized for that machine.
> 
> If the "automatically try to make it perform optimally" logic doesn't work for some reason, we have path #2.
> 
> The second path is when the user specifies a specific number of blocks via the GIT_TEST_INDEX_THREADS=<n> environment variable or the index.threads=<n> config setting.  If they ask for n blocks, they will get n blocks.  This is the "I know what I'm doing and want to control the behavior" path.
> 
> I just added one additional test (see patch below) to avoid a divide by zero bug and simplify things a bit.  With this change, if there are fewer than two blocks, the IEOT extension is not written out as it isn't needed.  The load would be single threaded anyway so there is no reason to write out a IEOT extensions that won't be used.
> 
> 
> 
> diff --git a/read-cache.c b/read-cache.c
> index f5d766088d..a1006fa824 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2751,18 +2751,23 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfil
> e,
>                  */
>                 if (!nr) {
>                         ieot_blocks = istate->cache_nr / THREAD_COST;
> -                       if (ieot_blocks < 1)
> -                               ieot_blocks = 1;
>                         cpus = online_cpus();
>                         if (ieot_blocks > cpus - 1)
>                                 ieot_blocks = cpus - 1;

So, am I reading this correctly - you need cpus > 2 before an
IEOT extension block is written out?

OK.

ATB,
Ramsay Jones

>                 } else {
>                         ieot_blocks = nr;
>                 }
> -               ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
> -                       + (ieot_blocks * sizeof(struct index_entry_offset)));
> -               ieot->nr = 0;
> -               ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
> +
> +               /*
> +                * no reason to write out the IEOT extension if we don't
> +                * have enough blocks to utilize multi-threading
> +                */
> +               if (ieot_blocks > 1) {
> +                       ieot = xcalloc(1, sizeof(struct index_entry_offset_table)
> +                               + (ieot_blocks * sizeof(struct index_entry_offset)));
> +                       ieot->nr = 0;
> +                       ieot_work = DIV_ROUND_UP(entries, ieot_blocks);
> +               }
>         }
>  #endif
> 
>

Junio C Hamano Sept. 28, 2018, 10:15 p.m. UTC | #6

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

>>                 if (!nr) {
>>                         ieot_blocks = istate->cache_nr / THREAD_COST;
>> -                       if (ieot_blocks < 1)
>> -                               ieot_blocks = 1;
>>                         cpus = online_cpus();
>>                         if (ieot_blocks > cpus - 1)
>>                                 ieot_blocks = cpus - 1;
>
> So, am I reading this correctly - you need cpus > 2 before an
> IEOT extension block is written out?
>
> OK.

Why should we be even calling online_cpus() in this codepath to
write the index in a single thread to begin with?

The number of cpus that readers would use to read this index file
has nothing to do with the number of cpus available to this
particular writer process.

Ben Peart Oct. 1, 2018, 1:17 p.m. UTC | #7

On 9/28/2018 6:15 PM, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
>>>                  if (!nr) {
>>>                          ieot_blocks = istate->cache_nr / THREAD_COST;
>>> -                       if (ieot_blocks < 1)
>>> -                               ieot_blocks = 1;
>>>                          cpus = online_cpus();
>>>                          if (ieot_blocks > cpus - 1)
>>>                                  ieot_blocks = cpus - 1;
>>
>> So, am I reading this correctly - you need cpus > 2 before an
>> IEOT extension block is written out?
>>
>> OK.
> 
> Why should we be even calling online_cpus() in this codepath to
> write the index in a single thread to begin with?
> 
> The number of cpus that readers would use to read this index file
> has nothing to do with the number of cpus available to this
> particular writer process.
> 

As I mentioned in my other reply, this is optimizing for the most common 
case where the index is read from the same machine that wrote it and the 
user is taking the default settings (ie index.threads=true).

Aligning the number of blocks to the number of threads that will be 
processing them avoids situations where one thread may have up to double 
the work to do as the other threads (for example, if there were 3 blocks 
to be processed by 2 threads).

SZEDER Gábor Oct. 1, 2018, 3:06 p.m. UTC | #8

On Mon, Oct 01, 2018 at 09:17:53AM -0400, Ben Peart wrote:
> 
> 
> On 9/28/2018 6:15 PM, Junio C Hamano wrote:
> >Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> >
> >>>                 if (!nr) {
> >>>                         ieot_blocks = istate->cache_nr / THREAD_COST;
> >>>-                       if (ieot_blocks < 1)
> >>>-                               ieot_blocks = 1;
> >>>                         cpus = online_cpus();
> >>>                         if (ieot_blocks > cpus - 1)
> >>>                                 ieot_blocks = cpus - 1;
> >>
> >>So, am I reading this correctly - you need cpus > 2 before an
> >>IEOT extension block is written out?
> >>
> >>OK.
> >
> >Why should we be even calling online_cpus() in this codepath to
> >write the index in a single thread to begin with?
> >
> >The number of cpus that readers would use to read this index file
> >has nothing to do with the number of cpus available to this
> >particular writer process.
> >
> 
> As I mentioned in my other reply, this is optimizing for the most common
> case where the index is read from the same machine that wrote it and the
> user is taking the default settings (ie index.threads=true).

I think this is a reasonable assumption to make, but it should be
mentioned in the relevant commit message.  Alas, as far as I can tell,
not a single commit message has been updated in v7.

> Aligning the number of blocks to the number of threads that will be
> processing them avoids situations where one thread may have up to double the
> work to do as the other threads (for example, if there were 3 blocks to be
> processed by 2 threads).

[v6,4/7] config: add new index.threads config setting

Commit Message

Comments

Patch