[v3,00/12] grep: improve threading and fix race conditions
mbox series

Message ID cover.1579141989.git.matheus.bernardino@usp.br
Headers show
Series
  • grep: improve threading and fix race conditions
Related show

Message

Matheus Tavares Bernardino Jan. 16, 2020, 2:39 a.m. UTC
This series focus on re-enabling threads at git-grep for the
object store case. They are currently disabled due to being slower than
single-threaded grep in this case. However, by allowing parallel zlib
inflation when reading objects, speedups of up to 3.3x were observed.

The patchset also contains some fixes for race conditions found in the
working tree git-grep and other thread optimizations. With them, the
working tree case reached speedups of up to 1.52x.

Changes since v2:
- Updated commit message of patch 6 and added comment in the code to
  document how we ensure that the unlocked inflation code is thread-safe
  (as suggested by Christian[1]).
- Added patch to adjust the default number of threads in git-grep
  according to the number of available cores.
- Fixed some typos in the commit messages
- Rebased with master (d0654dc)

Note: this patchset is part of my GSoC project, and also of my
bachelor's capstone project (which had the same theme/goal). In case
someone might want to take a look, the project's final essay can be
found here[2] :) Besides extra details, it contains more performance
tests and plots of the resulting speedups.


[1]: https://public-inbox.org/git/CAP8UFD3QHNoePGsc9t0fVxTJpPy+KBCbPfXKGoNOx_2bCf1Hxg@mail.gmail.com/
[2]: https://matheustavares.gitlab.io/assets/tavares-final-essay.pdf
travis build: https://travis-ci.org/matheustavares/git/builds/637708218

Matheus Tavares (12):
  grep: fix race conditions on userdiff calls
  grep: fix race conditions at grep_submodule()
  grep: fix racy calls in grep_objects()
  replace-object: make replace operations thread-safe
  object-store: allow threaded access to object reading
  grep: replace grep_read_mutex by internal obj read lock
  submodule-config: add skip_if_read option to repo_read_gitmodules()
  grep: allow submodule functions to run in parallel
  grep: protect packed_git [re-]initialization
  grep: re-enable threads in non-worktree case
  grep: move driver pre-load out of critical section
  grep: use no. of cores as the default no. of threads

 .tsan-suppressions         |  6 +++
 Documentation/git-grep.txt | 15 +++++-
 builtin/grep.c             | 93 +++++++++++++++++++-------------------
 grep.c                     | 32 +++++++------
 grep.h                     | 13 ------
 object-store.h             | 37 +++++++++++++++
 object.c                   |  2 +
 packfile.c                 | 34 ++++++++++++++
 replace-object.c           | 11 ++++-
 replace-object.h           |  7 ++-
 sha1-file.c                | 57 +++++++++++++++++++++--
 submodule-config.c         | 18 +++-----
 submodule-config.h         |  2 +-
 unpack-trees.c             |  4 +-
 14 files changed, 233 insertions(+), 98 deletions(-)

Range-diff against v2:
 1:  0f31cb0c12 !  1:  e2f3d377f5 grep: fix race conditions on userdiff calls
    @@ Commit message
         git-grep uses an internal grep_read_mutex to protect object reading
         operations. Similarly, there's a grep_attr_mutex to protect calls to the
         gitattributes machinery. However, two of the three functions protected
    -    by the last mutex may also perform object reading, as seen bellow:
    +    by the last mutex may also perform object reading, as seen below:
     
         - userdiff_get_textconv() > notes_cache_init() >
           notes_cache_match_validity() > lookup_commit_reference_gently() >
    @@ Commit message
         that, let's make sure to acquire the lock before both of these calls.
     
         Note: this patch might slow down the threaded grep in worktree, for the
    -    sake of thread-safeness. However, in the following patches we should
    +    sake of thread-safeness. However, in the following patches, we should
         regain performance by replacing grep_read_mutex for an internal object
         reading lock and allowing parallel inflation during object reading.
     
 2:  be32683f1d =  2:  6f0899701b grep: fix race conditions at grep_submodule()
 3:  34aeb218bf !  3:  5295c892ee grep: fix racy calls in grep_objects()
    @@ Commit message
         condition with object reading operations (such as the ones internally
         performed by fill_textconv(), called at fill_textconv_grep()). The same
         problem happens with the call to gitmodules_config_oid() which also has
    -    parse_object() in its call stack. Fix that protecting both call with the
    -    said grep_read_mutex.
    +    parse_object() in its call stack. Fix that protecting both calls with
    +    the said grep_read_mutex.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
 4:  5deee3cf11 !  4:  d7f739bc57 replace-object: make replace operations thread-safe
    @@ object-store.h: struct raw_object_store {
     
      ## object.c ##
     @@ object.c: struct raw_object_store *raw_object_store_new(void)
    - 
      	memset(o, 0, sizeof(*o));
      	INIT_LIST_HEAD(&o->packed_git_mru);
    + 	hashmap_init(&o->pack_map, pack_map_entry_cmp, NULL, 0);
     +	pthread_mutex_init(&o->replace_mutex, NULL);
      	return o;
      }
 5:  4c5652ab34 !  5:  b72e90f229 object-store: allow threaded access to object reading
    @@ Commit message
         object-store: allow threaded access to object reading
     
         Allow object reading to be performed by multiple threads protecting it
    -    with an internal lock. The lock usage can be toggled with
    -    enable_obj_read_lock() and disable_obj_read_lock(). Currently, the
    +    with an internal lock, the obj_read_mutex. The lock usage can be toggled
    +    with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the
         functions which can be safely called in parallel are:
         read_object_file_extended(), repo_read_object_file(),
         read_object_file(), read_object_with_reference(), read_object(),
    -    oid_object_info() and oid_object_info_extended(). It's also possible to
    -    use obj_read_lock() and obj_read_unlock() to protect other sections that
    -    cannot execute in parallel with object reading.
    +    oid_object_info() and oid_object_info_extended(). It's also possible
    +    to use obj_read_lock() and obj_read_unlock() to protect other sections
    +    that cannot execute in parallel with object reading.
     
         Probably there are many spots in the functions listed above that could
         be executed unlocked (and thus, in parallel). But, for now, we are most
         interested in allowing parallel access to zlib inflation. This is one of
    -    the sections where object reading spends most of the time and it's
    -    already thread-safe. So, to take advantage of that, the respective lock
    -    is released when calling git_inflate() and re-acquired right after, for
    -    every calling spot in oid_object_info_extended()'s call chain. We may
    -    refine the lock to also exploit other possible parallel spots in the
    -    future, but threaded zlib inflation should already give great speedups
    -    for now.
    +    the sections where object reading spends most of the time in (e.g. up to
    +    one-third of git-grep's execution time in the chromium repo corresponds
    +    to inflation) and it's already thread-safe. So, to take advantage of
    +    that, the obj_read_mutex is released when calling git_inflate() and
    +    re-acquired right after, for every calling spot in
    +    oid_object_info_extended()'s call chain. We may refine this lock to also
    +    exploit other possible parallel spots in the future, but for now,
    +    threaded zlib inflation should already give great speedups for threaded
    +    object reading callers.
     
         Note that add_delta_base_cache() was also modified to skip adding
         already present entries to the cache. This wasn't possible before, but
    -    now it is since phase I and phase III of unpack_entry() may execute
    -    concurrently.
    +    it would be now, with the parallel inflation. Take for example the
    +    following situation, where two threads - A and B - are executing the
    +    code at unpack_entry():
     
    -    Another important thing to notice is that the object reading lock only
    -    works in conjunction with the 'struct raw_object_store's replace_mutex.
    -    Otherwise, there would still be racy spots in object reading
    -    functions.
    +    1. Thread A is performing the decompression of a base O (which is not
    +       yet in the cache) at PHASE II. Thread B is simultaneously trying to
    +       unpack O, but just starting at PHASE I.
    +    2. Since O is not yet in the cache, B will go to PHASE II to also
    +       perform the decompression.
    +    3. When they finish decompressing, one of them will get the object
    +       reading mutex and go to PHASE III while the other waits for the
    +       mutex. Let’s say A got the mutex first.
    +    4. Thread A will add O to the cache, go throughout the rest of PHASE III
    +       and return.
    +    5. Thread B gets the mutex, also add O to the cache (if the check wasn't
    +       there) and returns.
    +
    +    Finally, it is also important to highlight that the object reading lock
    +    can only ensure thread-safety in the mentioned functions thanks to two
    +    complementary mechanisms: the use of 'struct raw_object_store's
    +    replace_mutex, which guards sections in the object reading machinery
    +    that would otherwise be thread-unsafe; and the 'struct pack_window's
    +    inuse_cnt, which protects window reading operations (such as the one
    +    performed during the inflation of a packed object), allowing them to
    +    execute without the acquisition of the obj_read_mutex.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
    @@ packfile.c: unsigned long get_size_from_delta(struct packed_git *p,
      	do {
      		in = use_pack(p, w_curs, curpos, &stream.avail_in);
      		stream.next_in = in;
    ++		/*
    ++		 * Note: the window section returned by use_pack() must be
    ++		 * available throughout git_inflate()'s unlocked execution. To
    ++		 * ensure no other thread will modify the window in the
    ++		 * meantime, we rely on the packed_window.inuse_cnt. This
    ++		 * counter is incremented before window reading and checked
    ++		 * before window disposal.
    ++		 *
    ++		 * Other worrying sections could be the call to close_pack_fd(),
    ++		 * which can close packs even with in-use windows, and to
    ++		 * reprepare_packed_git(). Regarding the former, mmap doc says:
    ++		 * "closing the file descriptor does not unmap the region". And
    ++		 * for the latter, it won't re-open already available packs.
    ++		 */
     +		obj_read_unlock();
      		st = git_inflate(&stream, Z_FINISH);
     +		obj_read_lock();
    @@ packfile.c: static void add_delta_base_cache(struct packed_git *p, off_t base_of
      	struct delta_base_cache_entry *ent = xmalloc(sizeof(*ent));
      	struct list_head *lru, *tmp;
      
    -+	if (get_delta_base_cache_entry(p, base_offset))
    ++	/*
    ++	 * Check required to avoid redundant entries when more than one thread
    ++	 * is unpacking the same object, in unpack_entry() (since its phases I
    ++	 * and III might run concurrently across multiple threads).
    ++	 */
    ++	if (in_delta_base_cache(p, base_offset))
     +		return;
     +
      	delta_base_cached += base_size;
    @@ packfile.c: static void *unpack_compressed_entry(struct packed_git *p,
      	do {
      		in = use_pack(p, w_curs, curpos, &stream.avail_in);
      		stream.next_in = in;
    ++		/*
    ++		 * Note: we must ensure the window section returned by
    ++		 * use_pack() will be available throughout git_inflate()'s
    ++		 * unlocked execution. Please refer to the comment at
    ++		 * get_size_from_delta() to see how this is done.
    ++		 */
     +		obj_read_unlock();
      		st = git_inflate(&stream, Z_FINISH);
     +		obj_read_lock();
 6:  b439bdeed3 =  6:  fc1200bb07 grep: replace grep_read_mutex by internal obj read lock
 7:  d759f1e8c7 !  7:  d39d2ce9c4 submodule-config: add skip_if_read option to repo_read_gitmodules()
    @@ Metadata
      ## Commit message ##
         submodule-config: add skip_if_read option to repo_read_gitmodules()
     
    -    Currently, submodule-config.c doesn't have an externally acessible
    +    Currently, submodule-config.c doesn't have an externally accessible
         function to read gitmodules only if it wasn't already read. But this
    -    exactly behavior is internally implemented by gitmodules_read_check(),
    -    to perform a lazy load. Let's merge this function with
    -    repo_read_gitmodules() adding an 'skip_if_read' which allow both
    +    exact behavior is internally implemented by gitmodules_read_check(), to
    +    perform a lazy load. Let's merge this function with
    +    repo_read_gitmodules() adding a 'skip_if_read' which allows both
         internal and external callers to access this functionality. This
    -    simplifies a little the code. The added option will also be used in the
    -    following patch.
    +    simplifies a little the code. The added option will also be used in
    +    the following patch.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
    @@ submodule-config.h: int option_fetch_parse_recurse_submodules(const struct optio
     -void repo_read_gitmodules(struct repository *repo);
     +void repo_read_gitmodules(struct repository *repo, int skip_if_read);
      void gitmodules_config_oid(const struct object_id *commit_oid);
    - const struct submodule *submodule_from_name(struct repository *r,
    - 					    const struct object_id *commit_or_tree,
    + 
    + /**
     
      ## unpack-trees.c ##
     @@ unpack-trees.c: static void load_gitmodules_file(struct index_state *index,
 8:  2e76847ec9 !  8:  af8ad95d41 grep: allow submodule functions to run in parallel
    @@ Commit message
           race condition with object reading (for example parse_object() and
           is_promisor_remote()). However, they only call repo_read_gitmodules()
           if it wasn't read before. So let's pre-read it before firing the
    -      threads and allow these two functions to safelly be called in
    +      threads and allow these two functions to safely be called in
           parallel.
     
         - repo_submodule_init() is already thread-safe, so remove it from the
    @@ Commit message
           protected for now. Let's add a "NEEDSWORK" to it, informing why it
           cannot be removed from the critical section yet.
     
    -    - Finally, add_to_alternates_memory() must be kept protected by the same
    -      reason of the above item.
    +    - Finally, add_to_alternates_memory() must be kept protected for the
    +      same reason as the item above.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
 9:  134114b001 !  9:  0ccf79ba86 grep: protect packed_git [re-]initialization
    @@ Commit message
         happen often, it could be hard to detect. So to prevent future
         headaches, let's force eager initialization of packed_git when setting
         git-grep up. There'll be a small overhead in the cases where we didn't
    -    really needed to prepare packed_git during execution but this shouldn't
    -    be very noticeable.
    +    really need to prepare packed_git during execution but this shouldn't be
    +    very noticeable.
     
         Also, packed_git may be re-initialized by
         packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep
10:  04a4d81ff5 ! 10:  6c09e9169d grep: re-enable threads in non-worktree case
    @@ Commit message
             8**  |   6.2886s  |   6.9843s
     
         These are all means of 30 executions after 2 warmup runs. All tests were
    -    executed on an i7-7700HQ (quad core w/ hyper-threading), 16GB of RAM and
    +    executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and
         SSD, running Manjaro Linux. But to make sure the optimization also
         performs well on HDD, the tests were repeated on another machine with an
    -    i5-4210U (dual core w/ hyper-threading), 8GB of RAM and HDD (SATA III,
    +    i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III,
         5400 rpm), also running Manjaro Linux:
     
          Threads |   Regex 1  |  Regex 2
    @@ Commit message
         case when --textconv is used and there're too many text conversions.
         Probably the reason for this is that the object read lock is used to
         protect fill_textconv() and therefore there is a mutual exclusion
    -    between textconv execution and object reading. Because both are time
    -    consuming operations, not being able to perform them in parallel can
    -    cause performance drops. To inform the users about this (and other
    -    threading detais), let's also add a "NOTES ON THREADS" section to
    +    between textconv execution and object reading. Because both are
    +    time-consuming operations, not being able to perform them in parallel
    +    can cause performance drops. To inform the users about this (and other
    +    threading details), let's also add a "NOTES ON THREADS" section to
         Documentation/git-grep.txt.
     
         [1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
11:  4f6f8b611c = 11:  2f72f30341 grep: move driver pre-load out of critical section
 -:  ---------- > 12:  a5891176d7 grep: use no. of cores as the default no. of threads