[v5,2/5] read-cache: load cache extensions on a worker thread
diff mbox series

Message ID 20180912161832.55324-3-benpeart@microsoft.com
State New
Headers show
Series
  • [v5,1/5] eoie: add End of Index Entry (EOIE) extension
Related show

Commit Message

Ben Peart Sept. 12, 2018, 4:18 p.m. UTC
This patch helps address the CPU cost of loading the index by loading
the cache extensions on a worker thread in parallel with loading the cache
entries.

In some cases, loading the extensions takes longer than loading the
cache entries so this patch utilizes the new EOIE to start the thread to
load the extensions before loading all the cache entries in parallel.

This is possible because the current extensions don't access the cache
entries in the index_state structure so are OK that they don't all exist
yet.

The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
extensions don't even get a pointer to the index so don't have access to the
cache entries.

CACHE_EXT_LINK only uses the index_state to initialize the split index.
CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
update and dirty flags.

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files                Baseline         Parallel Extensions
---------------------------------------------------------------------------
read_cache/discard_cache 1000 times 14.08(0.01+0.10) 9.72(0.03+0.06) -31.0%

Test w/1,000,000 files              Baseline         Parallel Extensions
------------------------------------------------------------------------------
read_cache/discard_cache 1000 times 202.95(0.01+0.07) 154.14(0.03+0.06) -24.1%

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
---
 Documentation/config.txt |  6 +++
 config.c                 | 18 ++++++++
 config.h                 |  1 +
 read-cache.c             | 94 ++++++++++++++++++++++++++++++++--------
 4 files changed, 102 insertions(+), 17 deletions(-)

Comments

Duy Nguyen Sept. 15, 2018, 10:22 a.m. UTC | #1
On Wed, Sep 12, 2018 at 6:18 PM Ben Peart <benpeart@microsoft.com> wrote:
>
> This patch helps address the CPU cost of loading the index by loading
> the cache extensions on a worker thread in parallel with loading the cache
> entries.
>
> In some cases, loading the extensions takes longer than loading the
> cache entries so this patch utilizes the new EOIE to start the thread to
> load the extensions before loading all the cache entries in parallel.
>
> This is possible because the current extensions don't access the cache
> entries in the index_state structure so are OK that they don't all exist
> yet.
>
> The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
> extensions don't even get a pointer to the index so don't have access to the
> cache entries.
>
> CACHE_EXT_LINK only uses the index_state to initialize the split index.
> CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
> update and dirty flags.
>
> I used p0002-read-cache.sh to generate some performance data:
>
> Test w/100,000 files                Baseline         Parallel Extensions
> ---------------------------------------------------------------------------
> read_cache/discard_cache 1000 times 14.08(0.01+0.10) 9.72(0.03+0.06) -31.0%
>
> Test w/1,000,000 files              Baseline         Parallel Extensions
> ------------------------------------------------------------------------------
> read_cache/discard_cache 1000 times 202.95(0.01+0.07) 154.14(0.03+0.06) -24.1%
>
> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
> ---
>  Documentation/config.txt |  6 +++
>  config.c                 | 18 ++++++++
>  config.h                 |  1 +
>  read-cache.c             | 94 ++++++++++++++++++++++++++++++++--------
>  4 files changed, 102 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1c42364988..79f8296d9c 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2391,6 +2391,12 @@ imap::
>         The configuration variables in the 'imap' section are described
>         in linkgit:git-imap-send[1].
>
> +index.threads::
> +       Specifies the number of threads to spawn when loading the index.
> +       This is meant to reduce index load time on multiprocessor machines.
> +       Specifying 0 or 'true' will cause Git to auto-detect the number of
> +       CPU's and set the number of threads accordingly. Defaults to 'true'.

I'd rather this variable defaults to 0. Spawning threads have
associated cost and most projects out there are small enough that this
multi threading could just add more cost than gain. It only makes
sense to enable this on huge repos.

Wait there's no way to disable this parallel reading? Does not sound
right. And  if ordinary numbers mean the number of threads then 0
should mean no threading. Auto detection could have a new keyword,
like 'auto'.

> +
>  index.version::
>         Specify the version with which new index files should be
>         initialized.  This does not affect existing repositories.
> diff --git a/config.c b/config.c
> index 9a0b10d4bc..9bd79fb165 100644
> --- a/config.c
> +++ b/config.c
> @@ -2289,6 +2289,24 @@ int git_config_get_fsmonitor(void)
>         return 0;
>  }
>
> +/*
> + * You can disable multi-threaded code by setting index.threads
> + * to 'false' (or 1)
> + */
> +int git_config_get_index_threads(void)
> +{
> +       int is_bool, val;
> +
> +       if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) {
> +               if (is_bool)
> +                       return val ? 0 : 1;
> +               else
> +                       return val;
> +       }
> +
> +       return 0; /* auto-detect */
> +}
> +
>  NORETURN
>  void git_die_config_linenr(const char *key, const char *filename, int linenr)
>  {
> diff --git a/config.h b/config.h
> index ab46e0165d..a06027e69b 100644
> --- a/config.h
> +++ b/config.h
> @@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
>  extern int git_config_get_split_index(void);
>  extern int git_config_get_max_percent_split_change(void);
>  extern int git_config_get_fsmonitor(void);
> +extern int git_config_get_index_threads(void);
>
>  /* This dies if the configured or default date is in the future */
>  extern int git_config_get_expiry(const char *key, const char **output);
> diff --git a/read-cache.c b/read-cache.c
> index 858935f123..b203eebb44 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -23,6 +23,10 @@
>  #include "split-index.h"
>  #include "utf8.h"
>  #include "fsmonitor.h"
> +#ifndef NO_PTHREADS
> +#include <pthread.h>
> +#include <thread-utils.h>
> +#endif

I don't think you're supposed to include system header files after
"cache.h". Including thread-utils.h should be enough (and it keeps the
exception of inclduing pthread.h in just one place). Please use
"pthread-utils.h" instead of <pthread-utils.h> which is usually for
system header files. And include ptherad-utils.h unconditionally.

>
>  /* Mask for the name length in ce_flags in the on-disk index */
>
> @@ -1898,6 +1902,46 @@ static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size);
>  #endif
>  static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset);
>
> +struct load_index_extensions
> +{
> +#ifndef NO_PTHREADS
> +       pthread_t pthread;
> +#endif
> +       struct index_state *istate;
> +       void *mmap;
> +       size_t mmap_size;
> +       unsigned long src_offset;
> +};
> +
> +static void *load_index_extensions(void *_data)
> +{
> +       struct load_index_extensions *p = _data;
> +       unsigned long src_offset = p->src_offset;
> +
> +       while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
> +               /* After an array of active_nr index entries,
> +                * there can be arbitrary number of extended
> +                * sections, each of which is prefixed with
> +                * extension name (4-byte) and section length
> +                * in 4-byte network byte order.
> +                */
> +               uint32_t extsize;
> +               memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
> +               extsize = ntohl(extsize);
> +               if (read_index_extension(p->istate,
> +                       (const char *)p->mmap + src_offset,
> +                       (char *)p->mmap + src_offset + 8,
> +                       extsize) < 0) {
> +                       munmap(p->mmap, p->mmap_size);
> +                       die("index file corrupt");

_()

> +               }
> +               src_offset += 8;
> +               src_offset += extsize;
> +       }
> +
> +       return NULL;
> +}
> +
>  /* remember to discard_cache() before reading a different cache! */
>  int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  {
> @@ -1908,6 +1952,11 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>         void *mmap;
>         size_t mmap_size;
>         struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> +       struct load_index_extensions p = { 0 };
> +       unsigned long extension_offset = 0;
> +#ifndef NO_PTHREADS
> +       int nr_threads;
> +#endif
>
>         if (istate->initialized)
>                 return istate->cache_nr;
> @@ -1944,6 +1993,26 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>         istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
>         istate->initialized = 1;
>
> +       p.istate = istate;
> +       p.mmap = mmap;
> +       p.mmap_size = mmap_size;
> +
> +#ifndef NO_PTHREADS
> +       nr_threads = git_config_get_index_threads();
> +       if (!nr_threads)
> +               nr_threads = online_cpus();
> +
> +       if (nr_threads >= 2) {
> +               extension_offset = read_eoie_extension(mmap, mmap_size);
> +               if (extension_offset) {
> +                       /* create a thread to load the index extensions */

Pointless comment. It's pretty clear from the pthread_create() below
thanks to good function naming. Please remove.

> +                       p.src_offset = extension_offset;
> +                       if (pthread_create(&p.pthread, NULL, load_index_extensions, &p))
> +                               die(_("unable to create load_index_extensions_thread"));
> +               }
> +       }
> +#endif
> +
>         if (istate->version == 4) {
>                 previous_name = &previous_name_buf;
>                 mem_pool_init(&istate->ce_mem_pool,
> @@ -1970,23 +2039,14 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>         istate->timestamp.sec = st.st_mtime;
>         istate->timestamp.nsec = ST_MTIME_NSEC(st);
>
> -       while (src_offset <= mmap_size - the_hash_algo->rawsz - 8) {
> -               /* After an array of active_nr index entries,
> -                * there can be arbitrary number of extended
> -                * sections, each of which is prefixed with
> -                * extension name (4-byte) and section length
> -                * in 4-byte network byte order.
> -                */
> -               uint32_t extsize;
> -               memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
> -               extsize = ntohl(extsize);
> -               if (read_index_extension(istate,
> -                                        (const char *) mmap + src_offset,
> -                                        (char *) mmap + src_offset + 8,
> -                                        extsize) < 0)
> -                       goto unmap;
> -               src_offset += 8;
> -               src_offset += extsize;
> +       /* if we created a thread, join it otherwise load the extensions on the primary thread */
> +#ifndef NO_PTHREADS
> +       if (extension_offset && pthread_join(p.pthread, NULL))
> +               die(_("unable to join load_index_extensions_thread"));

I guess the last _ is a typo and you wanted "unable to join
load_index_extensions thread". Please use die_errno() instead.

> +#endif
> +       if (!extension_offset) {
> +               p.src_offset = src_offset;
> +               load_index_extensions(&p);
>         }
>         munmap(mmap, mmap_size);
>         return istate->cache_nr;
> --
> 2.18.0.windows.1
>
Duy Nguyen Sept. 15, 2018, 10:24 a.m. UTC | #2
On Sat, Sep 15, 2018 at 12:22 PM Duy Nguyen <pclouds@gmail.com> wrote:
> > @@ -1944,6 +1993,26 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
> >         istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
> >         istate->initialized = 1;
> >
> > +       p.istate = istate;
> > +       p.mmap = mmap;
> > +       p.mmap_size = mmap_size;
> > +
> > +#ifndef NO_PTHREADS
> > +       nr_threads = git_config_get_index_threads();
> > +       if (!nr_threads)
> > +               nr_threads = online_cpus();
> > +
> > +       if (nr_threads >= 2) {
> > +               extension_offset = read_eoie_extension(mmap, mmap_size);
> > +               if (extension_offset) {

One more thing I forgot. If the extension area is small enough, then
we should not need to create a thread to parse extensions in parallel.
We should know roughly how much work we need because we know the total
size of all extensions.

> > +                       /* create a thread to load the index extensions */
>
> Pointless comment. It's pretty clear from the pthread_create() below
> thanks to good function naming. Please remove.
>
> > +                       p.src_offset = extension_offset;
> > +                       if (pthread_create(&p.pthread, NULL, load_index_extensions, &p))
> > +                               die(_("unable to create load_index_extensions_thread"));
> > +               }
> > +       }
> > +#endif
> > +
> >         if (istate->version == 4) {
> >                 previous_name = &previous_name_buf;
> >                 mem_pool_init(&istate->ce_mem_pool,
Duy Nguyen Sept. 15, 2018, 4:23 p.m. UTC | #3
On Sat, Sep 15, 2018 at 12:22 PM Duy Nguyen <pclouds@gmail.com> wrote:
> Wait there's no way to disable this parallel reading? Does not sound
> right. And  if ordinary numbers mean the number of threads then 0
> should mean no threading. Auto detection could have a new keyword,
> like 'auto'.

My bad. Disabling threading means _1_ thread. What was I thinking...
Ben Peart Sept. 17, 2018, 4:26 p.m. UTC | #4
On 9/15/2018 6:22 AM, Duy Nguyen wrote:
>> +index.threads::
>> +       Specifies the number of threads to spawn when loading the index.
>> +       This is meant to reduce index load time on multiprocessor machines.
>> +       Specifying 0 or 'true' will cause Git to auto-detect the number of
>> +       CPU's and set the number of threads accordingly. Defaults to 'true'.
> 
> I'd rather this variable defaults to 0. Spawning threads have
> associated cost and most projects out there are small enough that this
> multi threading could just add more cost than gain. It only makes
> sense to enable this on huge repos.
> 
> Wait there's no way to disable this parallel reading? Does not sound
> right. And  if ordinary numbers mean the number of threads then 0
> should mean no threading. Auto detection could have a new keyword,
> like 'auto'.
> 

The index.threads setting is patterned after the pack.threads setting 
for consistency.  Specifying 1 (or 'false') will disable multithreading 
but I will call that out explicitly in the documentation to make it more 
obvious.

The THREAD_COST logic is designed to ensure small repos don't incur more 
cost than gain.  If you have data on that logic that shows it isn't 
working properly, I'm happy to change the logic as necessary.

>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -23,6 +23,10 @@
>>   #include "split-index.h"
>>   #include "utf8.h"
>>   #include "fsmonitor.h"
>> +#ifndef NO_PTHREADS
>> +#include <pthread.h>
>> +#include <thread-utils.h>
>> +#endif
> 
> I don't think you're supposed to include system header files after
> "cache.h". Including thread-utils.h should be enough (and it keeps the
> exception of inclduing pthread.h in just one place). Please use
> "pthread-utils.h" instead of <pthread-utils.h> which is usually for
> system header files. And include ptherad-utils.h unconditionally.
> 

Thanks, I'll fix that.

>>
>>   /* Mask for the name length in ce_flags in the on-disk index */
>>
>> @@ -1898,6 +1902,46 @@ static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size);
>>   #endif
>>   static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset);
>>
>> +struct load_index_extensions
>> +{
>> +#ifndef NO_PTHREADS
>> +       pthread_t pthread;
>> +#endif
>> +       struct index_state *istate;
>> +       void *mmap;
>> +       size_t mmap_size;
>> +       unsigned long src_offset;
>> +};
>> +
>> +static void *load_index_extensions(void *_data)
>> +{
>> +       struct load_index_extensions *p = _data;
>> +       unsigned long src_offset = p->src_offset;
>> +
>> +       while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
>> +               /* After an array of active_nr index entries,
>> +                * there can be arbitrary number of extended
>> +                * sections, each of which is prefixed with
>> +                * extension name (4-byte) and section length
>> +                * in 4-byte network byte order.
>> +                */
>> +               uint32_t extsize;
>> +               memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
>> +               extsize = ntohl(extsize);
>> +               if (read_index_extension(p->istate,
>> +                       (const char *)p->mmap + src_offset,
>> +                       (char *)p->mmap + src_offset + 8,
>> +                       extsize) < 0) {
>> +                       munmap(p->mmap, p->mmap_size);
>> +                       die("index file corrupt");
> 
> _()
> 

You're feedback style can be a bit abrupt and terse.  I _think_ what you 
are trying to say here is that the "die" call should use the _() macro 
around the string.

This is an edit of the previous code that loaded index extensions and 
doesn't change the use of _(). I don't know the rules for when _() 
should be used and didn't have any luck finding where it was documented 
so left it unchanged.

FWIW, in this file alone there are 20 existing instances of die() or 
die_errorno() and only two that use the _() macro.  A quick grep through 
the source code shows thousands of die() calls the vast majority of 
which do not use the _() macro.  This appears to be an area that is 
unclear and inconsistent and could use some attention in a separate patch.


>> +       /* if we created a thread, join it otherwise load the extensions on the primary thread */
>> +#ifndef NO_PTHREADS
>> +       if (extension_offset && pthread_join(p.pthread, NULL))
>> +               die(_("unable to join load_index_extensions_thread"));
> 
> I guess the last _ is a typo and you wanted "unable to join
> load_index_extensions thread". Please use die_errno() instead.
> 

Why should this be die_errorno() here?  All other instances of 
pthread_join() failing in a fatal way use die(), not die_errorno().

>> +#endif
>> +       if (!extension_offset) {
>> +               p.src_offset = src_offset;
>> +               load_index_extensions(&p);
>>          }
>>          munmap(mmap, mmap_size);
>>          return istate->cache_nr;
>> --
>> 2.18.0.windows.1
>>
> 
>
Ben Peart Sept. 17, 2018, 4:38 p.m. UTC | #5
On 9/15/2018 6:24 AM, Duy Nguyen wrote:
> On Sat, Sep 15, 2018 at 12:22 PM Duy Nguyen <pclouds@gmail.com> wrote:
>>> @@ -1944,6 +1993,26 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>>>          istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
>>>          istate->initialized = 1;
>>>
>>> +       p.istate = istate;
>>> +       p.mmap = mmap;
>>> +       p.mmap_size = mmap_size;
>>> +
>>> +#ifndef NO_PTHREADS
>>> +       nr_threads = git_config_get_index_threads();
>>> +       if (!nr_threads)
>>> +               nr_threads = online_cpus();
>>> +
>>> +       if (nr_threads >= 2) {
>>> +               extension_offset = read_eoie_extension(mmap, mmap_size);
>>> +               if (extension_offset) {
> 
> One more thing I forgot. If the extension area is small enough, then
> we should not need to create a thread to parse extensions in parallel.
> We should know roughly how much work we need because we know the total
> size of all extensions.
> 

The only extensions I found to be significant enough to be helped by a 
separate thread was the cache tree.  Since the size of the cache tree is 
driven by the number of files in the repo, I think the existing 
THREAD_COST logic (that comes in the next patch of the series) is a 
sufficient proxy.  Basically, if you have enough cache entries to be 
benefited by threading, your extensions (driven by the cache tree) are 
probably also big enough to warrant a thread.

>>> +                       /* create a thread to load the index extensions */
>>
>> Pointless comment. It's pretty clear from the pthread_create() below
>> thanks to good function naming. Please remove.
>>
>>> +                       p.src_offset = extension_offset;
>>> +                       if (pthread_create(&p.pthread, NULL, load_index_extensions, &p))
>>> +                               die(_("unable to create load_index_extensions_thread"));
>>> +               }
>>> +       }
>>> +#endif
>>> +
>>>          if (istate->version == 4) {
>>>                  previous_name = &previous_name_buf;
>>>                  mem_pool_init(&istate->ce_mem_pool,
Duy Nguyen Sept. 17, 2018, 4:45 p.m. UTC | #6
On Mon, Sep 17, 2018 at 6:26 PM Ben Peart <peartben@gmail.com> wrote:
>
>
>
> On 9/15/2018 6:22 AM, Duy Nguyen wrote:
> >> +index.threads::
> >> +       Specifies the number of threads to spawn when loading the index.
> >> +       This is meant to reduce index load time on multiprocessor machines.
> >> +       Specifying 0 or 'true' will cause Git to auto-detect the number of
> >> +       CPU's and set the number of threads accordingly. Defaults to 'true'.
> >
> > I'd rather this variable defaults to 0. Spawning threads have
> > associated cost and most projects out there are small enough that this
> > multi threading could just add more cost than gain. It only makes
> > sense to enable this on huge repos.
> >
> > Wait there's no way to disable this parallel reading? Does not sound
> > right. And  if ordinary numbers mean the number of threads then 0
> > should mean no threading. Auto detection could have a new keyword,
> > like 'auto'.
> >
>
> The index.threads setting is patterned after the pack.threads setting
> for consistency.  Specifying 1 (or 'false') will disable multithreading
> but I will call that out explicitly in the documentation to make it more
> obvious.
>
> The THREAD_COST logic is designed to ensure small repos don't incur more
> cost than gain.  If you have data on that logic that shows it isn't
> working properly, I'm happy to change the logic as necessary.

THREAD_COST does not apply to this extension thread if I remember correctly.

> >> +static void *load_index_extensions(void *_data)
> >> +{
> >> +       struct load_index_extensions *p = _data;
> >> +       unsigned long src_offset = p->src_offset;
> >> +
> >> +       while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
> >> +               /* After an array of active_nr index entries,
> >> +                * there can be arbitrary number of extended
> >> +                * sections, each of which is prefixed with
> >> +                * extension name (4-byte) and section length
> >> +                * in 4-byte network byte order.
> >> +                */
> >> +               uint32_t extsize;
> >> +               memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
> >> +               extsize = ntohl(extsize);
> >> +               if (read_index_extension(p->istate,
> >> +                       (const char *)p->mmap + src_offset,
> >> +                       (char *)p->mmap + src_offset + 8,
> >> +                       extsize) < 0) {
> >> +                       munmap(p->mmap, p->mmap_size);
> >> +                       die("index file corrupt");
> >
> > _()
> >
>
> You're feedback style can be a bit abrupt and terse.  I _think_ what you
> are trying to say here is that the "die" call should use the _() macro
> around the string.

Yes. Sorry I should have explained a bit better.

> This is an edit of the previous code that loaded index extensions and
> doesn't change the use of _(). I don't know the rules for when _()
> should be used and didn't have any luck finding where it was documented
> so left it unchanged.
>
> FWIW, in this file alone there are 20 existing instances of die() or
> die_errorno() and only two that use the _() macro.  A quick grep through
> the source code shows thousands of die() calls the vast majority of
> which do not use the _() macro.  This appears to be an area that is
> unclear and inconsistent and could use some attention in a separate patch.

This is one of the gray areas where we have to determine if the
message should be translated or not. And it should be translated
unless it's part of the plumbing output, to be consumed by scripts.

I know there's lots of messages still untranslated. I'm trying to do
something about that. But I cannot just go fix up all strings when you
all keep adding more strings for me to go fix. When you add a new
string, please consider if it should be translated or not. In this
case since it already receives reviewer attention we should be able to
determine it now, instead of delaying it for later.

> >> +       /* if we created a thread, join it otherwise load the extensions on the primary thread */
> >> +#ifndef NO_PTHREADS
> >> +       if (extension_offset && pthread_join(p.pthread, NULL))
> >> +               die(_("unable to join load_index_extensions_thread"));
> >
> > I guess the last _ is a typo and you wanted "unable to join
> > load_index_extensions thread". Please use die_errno() instead.
> >
>
> Why should this be die_errorno() here?  All other instances of
> pthread_join() failing in a fatal way use die(), not die_errorno().

That argument does not fly well in my opinion. I read the man page and
it listed the error codes, which made me think that we need to use
die_errno() to show the error. My mistake though is the error is
returned as the return value, not in errno, so die_errno() would not
catch it. But we could still do something like

    int ret = pthread_join();
    die(_("blah blah: %s"), strerror(ret));

Other code can also be improved, but that's a separate issue.
Junio C Hamano Sept. 17, 2018, 5:19 p.m. UTC | #7
Duy Nguyen <pclouds@gmail.com> writes:

> On Sat, Sep 15, 2018 at 12:22 PM Duy Nguyen <pclouds@gmail.com> wrote:
>> Wait there's no way to disable this parallel reading? Does not sound
>> right. And  if ordinary numbers mean the number of threads then 0
>> should mean no threading. Auto detection could have a new keyword,
>> like 'auto'.
>
> My bad. Disabling threading means _1_ thread. What was I thinking...

I did the same during my earlier review.  It seems that it somehow
is unintuitive to us that we do not specify how many _extra_ threads
of control we dedicate to ;-).
Junio C Hamano Sept. 17, 2018, 9:32 p.m. UTC | #8
Duy Nguyen <pclouds@gmail.com> writes:

>> diff --git a/read-cache.c b/read-cache.c
>> index 858935f123..b203eebb44 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -23,6 +23,10 @@
>>  #include "split-index.h"
>>  #include "utf8.h"
>>  #include "fsmonitor.h"
>> +#ifndef NO_PTHREADS
>> +#include <pthread.h>
>> +#include <thread-utils.h>
>> +#endif
>
> I don't think you're supposed to include system header files after
> "cache.h". Including thread-utils.h should be enough (and it keeps the
> exception of inclduing pthread.h in just one place). Please use
> "pthread-utils.h" instead of <pthread-utils.h> which is usually for
> system header files. And include ptherad-utils.h unconditionally.

All correct except for s/p\(thread-utils\)/\1/g;
Sorry for missing this during my earlier review.

Thanks.

Patch
diff mbox series

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..79f8296d9c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2391,6 +2391,12 @@  imap::
 	The configuration variables in the 'imap' section are described
 	in linkgit:git-imap-send[1].
 
+index.threads::
+	Specifies the number of threads to spawn when loading the index.
+	This is meant to reduce index load time on multiprocessor machines.
+	Specifying 0 or 'true' will cause Git to auto-detect the number of
+	CPU's and set the number of threads accordingly. Defaults to 'true'.
+
 index.version::
 	Specify the version with which new index files should be
 	initialized.  This does not affect existing repositories.
diff --git a/config.c b/config.c
index 9a0b10d4bc..9bd79fb165 100644
--- a/config.c
+++ b/config.c
@@ -2289,6 +2289,24 @@  int git_config_get_fsmonitor(void)
 	return 0;
 }
 
+/*
+ * You can disable multi-threaded code by setting index.threads
+ * to 'false' (or 1)
+ */
+int git_config_get_index_threads(void)
+{
+	int is_bool, val;
+
+	if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) {
+		if (is_bool)
+			return val ? 0 : 1;
+		else
+			return val;
+	}
+
+	return 0; /* auto-detect */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index ab46e0165d..a06027e69b 100644
--- a/config.h
+++ b/config.h
@@ -250,6 +250,7 @@  extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
+extern int git_config_get_index_threads(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/read-cache.c b/read-cache.c
index 858935f123..b203eebb44 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -23,6 +23,10 @@ 
 #include "split-index.h"
 #include "utf8.h"
 #include "fsmonitor.h"
+#ifndef NO_PTHREADS
+#include <pthread.h>
+#include <thread-utils.h>
+#endif
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1898,6 +1902,46 @@  static unsigned long read_eoie_extension(void *mmap_, size_t mmap_size);
 #endif
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, unsigned long offset);
 
+struct load_index_extensions
+{
+#ifndef NO_PTHREADS
+	pthread_t pthread;
+#endif
+	struct index_state *istate;
+	void *mmap;
+	size_t mmap_size;
+	unsigned long src_offset;
+};
+
+static void *load_index_extensions(void *_data)
+{
+	struct load_index_extensions *p = _data;
+	unsigned long src_offset = p->src_offset;
+
+	while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
+		/* After an array of active_nr index entries,
+		 * there can be arbitrary number of extended
+		 * sections, each of which is prefixed with
+		 * extension name (4-byte) and section length
+		 * in 4-byte network byte order.
+		 */
+		uint32_t extsize;
+		memcpy(&extsize, (char *)p->mmap + src_offset + 4, 4);
+		extsize = ntohl(extsize);
+		if (read_index_extension(p->istate,
+			(const char *)p->mmap + src_offset,
+			(char *)p->mmap + src_offset + 8,
+			extsize) < 0) {
+			munmap(p->mmap, p->mmap_size);
+			die("index file corrupt");
+		}
+		src_offset += 8;
+		src_offset += extsize;
+	}
+
+	return NULL;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1908,6 +1952,11 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	void *mmap;
 	size_t mmap_size;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	struct load_index_extensions p = { 0 };
+	unsigned long extension_offset = 0;
+#ifndef NO_PTHREADS
+	int nr_threads;
+#endif
 
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -1944,6 +1993,26 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
 	istate->initialized = 1;
 
+	p.istate = istate;
+	p.mmap = mmap;
+	p.mmap_size = mmap_size;
+
+#ifndef NO_PTHREADS
+	nr_threads = git_config_get_index_threads();
+	if (!nr_threads)
+		nr_threads = online_cpus();
+
+	if (nr_threads >= 2) {
+		extension_offset = read_eoie_extension(mmap, mmap_size);
+		if (extension_offset) {
+			/* create a thread to load the index extensions */
+			p.src_offset = extension_offset;
+			if (pthread_create(&p.pthread, NULL, load_index_extensions, &p))
+				die(_("unable to create load_index_extensions_thread"));
+		}
+	}
+#endif
+
 	if (istate->version == 4) {
 		previous_name = &previous_name_buf;
 		mem_pool_init(&istate->ce_mem_pool,
@@ -1970,23 +2039,14 @@  int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	istate->timestamp.sec = st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
-	while (src_offset <= mmap_size - the_hash_algo->rawsz - 8) {
-		/* After an array of active_nr index entries,
-		 * there can be arbitrary number of extended
-		 * sections, each of which is prefixed with
-		 * extension name (4-byte) and section length
-		 * in 4-byte network byte order.
-		 */
-		uint32_t extsize;
-		memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
-		extsize = ntohl(extsize);
-		if (read_index_extension(istate,
-					 (const char *) mmap + src_offset,
-					 (char *) mmap + src_offset + 8,
-					 extsize) < 0)
-			goto unmap;
-		src_offset += 8;
-		src_offset += extsize;
+	/* if we created a thread, join it otherwise load the extensions on the primary thread */
+#ifndef NO_PTHREADS
+	if (extension_offset && pthread_join(p.pthread, NULL))
+		die(_("unable to join load_index_extensions_thread"));
+#endif
+	if (!extension_offset) {
+		p.src_offset = src_offset;
+		load_index_extensions(&p);
 	}
 	munmap(mmap, mmap_size);
 	return istate->cache_nr;