mbox series

[v3,0/8] Parallel Checkout (part 3)

Message ID cover.1620145501.git.matheus.bernardino@usp.br (mailing list archive)
Headers show
Series Parallel Checkout (part 3) | expand

Message

Matheus Tavares May 4, 2021, 4:27 p.m. UTC
This is the last part of the parallel checkout series. It adds tests and
parallel checkout support to `git checkout-index` and
`git checkout <pathspec>`.

I rebased this version onto `master`, as part-2 was merged down to
`master`.

Changes since v2:

* Patch 1: Aligned function parameters in make_transient_cache_entry(),
  and renamed the `struct mem_pool *` argument from `mp` to
  `ce_mem_pool`. This is more consistent with the other functions using
  in read-cache.c and with how we call this function in the next patch.

* Patch 2: Removed unnecessary 'Note: ' from commit message.

* Patch 3: Rewrote commit message to better explain why we unified the
  exit paths for `checkout_all()` and `checkout_file()` modes, and
  changed `git checkout-index --all`s error code from 128 to 1.

Matheus Tavares (8):
  make_transient_cache_entry(): optionally alloc from mem_pool
  builtin/checkout.c: complete parallel checkout support
  checkout-index: add parallel checkout support
  parallel-checkout: add tests for basic operations
  parallel-checkout: add tests related to path collisions
  t0028: extract encoding helpers to lib-encoding.sh
  parallel-checkout: add tests related to .gitattributes
  ci: run test round with parallel-checkout enabled

 builtin/checkout--worker.c              |   2 +-
 builtin/checkout-index.c                |  24 ++-
 builtin/checkout.c                      |  20 ++-
 builtin/difftool.c                      |   2 +-
 cache.h                                 |  14 +-
 ci/run-build-and-tests.sh               |   1 +
 parallel-checkout.c                     |  18 ++
 read-cache.c                            |  14 +-
 t/README                                |   4 +
 t/lib-encoding.sh                       |  25 +++
 t/lib-parallel-checkout.sh              |  45 +++++
 t/t0028-working-tree-encoding.sh        |  25 +--
 t/t2080-parallel-checkout-basics.sh     | 229 ++++++++++++++++++++++++
 t/t2081-parallel-checkout-collisions.sh | 162 +++++++++++++++++
 t/t2082-parallel-checkout-attributes.sh | 194 ++++++++++++++++++++
 unpack-trees.c                          |   2 +-
 16 files changed, 732 insertions(+), 49 deletions(-)
 create mode 100644 t/lib-encoding.sh
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh
 create mode 100755 t/t2081-parallel-checkout-collisions.sh
 create mode 100755 t/t2082-parallel-checkout-attributes.sh

Range-diff against v2:
1:  f870040bfb ! 1:  bf6c7114aa make_transient_cache_entry(): optionally alloc from mem_pool
    @@ cache.h: struct cache_entry *make_empty_cache_entry(struct index_state *istate,
     - * Create a cache_entry that is not intended to be added to an index.
     - * Caller is responsible for discarding the cache_entry
     - * with `discard_cache_entry`.
    -+ * Create a cache_entry that is not intended to be added to an index. If `mp`
    -+ * is not NULL, the entry is allocated within the given memory pool. Caller is
    -+ * responsible for discarding "loose" entries with `discard_cache_entry()` and
    -+ * the mem_pool with `mem_pool_discard(mp, should_validate_cache_entries())`.
    ++ * Create a cache_entry that is not intended to be added to an index. If
    ++ * `ce_mem_pool` is not NULL, the entry is allocated within the given memory
    ++ * pool. Caller is responsible for discarding "loose" entries with
    ++ * `discard_cache_entry()` and the memory pool with
    ++ * `mem_pool_discard(ce_mem_pool, should_validate_cache_entries())`.
       */
      struct cache_entry *make_transient_cache_entry(unsigned int mode,
      					       const struct object_id *oid,
      					       const char *path,
     -					       int stage);
    -+					       int stage, struct mem_pool *mp);
    ++					       int stage,
    ++					       struct mem_pool *ce_mem_pool);
      
     -struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
    -+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp);
    ++struct cache_entry *make_empty_transient_cache_entry(size_t len,
    ++						     struct mem_pool *ce_mem_pool);
      
      /*
       * Discard cache entry.
    @@ read-cache.c: struct cache_entry *make_empty_cache_entry(struct index_state *ist
      }
      
     -struct cache_entry *make_empty_transient_cache_entry(size_t len)
    -+struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp)
    ++struct cache_entry *make_empty_transient_cache_entry(size_t len,
    ++						     struct mem_pool *ce_mem_pool)
      {
    -+	if (mp)
    -+		return mem_pool__ce_calloc(mp, len);
    ++	if (ce_mem_pool)
    ++		return mem_pool__ce_calloc(ce_mem_pool, len);
      	return xcalloc(1, cache_entry_size(len));
      }
      
    @@ read-cache.c: struct cache_entry *make_cache_entry(struct index_state *istate,
     -					       const char *path, int stage)
     +struct cache_entry *make_transient_cache_entry(unsigned int mode,
     +					       const struct object_id *oid,
    -+					       const char *path, int stage,
    -+					       struct mem_pool *mp)
    ++					       const char *path,
    ++					       int stage,
    ++					       struct mem_pool *ce_mem_pool)
      {
      	struct cache_entry *ce;
      	int len;
    @@ read-cache.c: struct cache_entry *make_transient_cache_entry(unsigned int mode,
      
      	len = strlen(path);
     -	ce = make_empty_transient_cache_entry(len);
    -+	ce = make_empty_transient_cache_entry(len, mp);
    ++	ce = make_empty_transient_cache_entry(len, ce_mem_pool);
      
      	oidcpy(&ce->oid, oid);
      	memcpy(ce->name, path, len);
2:  e2d82c4337 ! 2:  e898889787 builtin/checkout.c: complete parallel checkout support
    @@ Commit message
         checkout_entry() directly, instead of unpack_trees(). Let's add parallel
         checkout support for this code path too.
     
    -    Note: the transient cache entries allocated in checkout_merged() are now
    +    The transient cache entries allocated in checkout_merged() are now
         allocated in a mem_pool which is only discarded after parallel checkout
         finishes. This is done because the entries need to be valid when
         run_parallel_checkout() is called.
    @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opt
      	enable_delayed_checkout(&state);
     +	if (pc_workers > 1)
     +		init_parallel_checkout();
    - 	for (pos = 0; pos < active_nr; pos++) {
    - 		struct cache_entry *ce = active_cache[pos];
    - 		if (ce->ce_flags & CE_MATCHED) {
    + 
    + 	/* TODO: audit for interaction with sparse-index. */
    + 	ensure_full_index(&the_index);
     @@ builtin/checkout.c: static int checkout_worktree(const struct checkout_opts *opts,
      						       &nr_checkouts, opts->overlay_mode);
      			else if (opts->merge)
3:  0fe1a5fabc ! 3:  916f391a46 checkout-index: add parallel checkout support
    @@ Metadata
      ## Commit message ##
         checkout-index: add parallel checkout support
     
    -    Note: previously, `checkout_all()` would not return on errors, but
    -    instead call `exit()` with a non-zero code. However, it only did that
    -    after calling `checkout_entry()` for all index entries, thus not
    -    stopping on the first error, but attempting to write the maximum number
    -    of entries possible. In order to maintain this behavior we now propagate
    -    `checkout_all()`s error status to `cmd_checkout_index()`, so that it can
    -    call `run_parallel_checkout()` and attempt to write the queued entries
    -    before exiting with the error code.
    +    Allow checkout-index to use the parallel checkout framework, honoring
    +    the checkout.workers configuration.
    +
    +    There are two code paths in checkout-index which call
    +    `checkout_entry()`, and thus, can make use of parallel checkout:
    +    `checkout_file()`, which is used to write paths explicitly given at the
    +    command line; and `checkout_all()`, which is used to write all paths in
    +    the index, when the `--all` option is given.
    +
    +    In both operation modes, checkout-index doesn't abort immediately on a
    +    `checkout_entry()` failure. Instead, it tries to check out all remaining
    +    paths before exiting with a non-zero exit code. To keep this behavior
    +    when parallel checkout is being used, we must allow
    +    `run_parallel_checkout()` to try writing the queued entries before we
    +    exit, even if we already got an error code from a previous
    +    `checkout_entry()` call.
    +
    +    However, `checkout_all()` doesn't return on errors, it calls `exit()`
    +    with code 128. We could make it call `run_parallel_checkout()` before
    +    exiting, but it makes the code easier to follow if we unify the exit
    +    path for both checkout-index modes at `cmd_checkout_index()`, and let
    +    this function take care of the interactions with the parallel checkout
    +    API. So let's do that.
    +
    +    With this change, we also have to consider whether we want to keep using
    +    128 as the error code for `git checkout-index --all`, while we use 1 for
    +    `git checkout-index <path>` (even when the actual error is the same).
    +    Since there is not much value in having code 128 only for `--all`, and
    +    there is no mention about it in the docs (so it's unlikely that changing
    +    it will break any existing script), let's make both modes exit with code
    +    1 on `checkout_entry()` errors.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
4:  f604c50dba = 4:  667777053a parallel-checkout: add tests for basic operations
5:  eae6d3a1c1 = 5:  dcb3acab1d parallel-checkout: add tests related to path collisions
6:  9161cd1503 = 6:  6141c46051 t0028: extract encoding helpers to lib-encoding.sh
7:  bc584897e6 = 7:  5350689a30 parallel-checkout: add tests related to .gitattributes
8:  1bc5c523c5 ! 8:  7b2966c488 ci: run test round with parallel-checkout enabled
    @@ parallel-checkout.c: static const int DEFAULT_NUM_WORKERS = 1;
      	else if (*num_workers < 1)
     
      ## t/README ##
    -@@ t/README: and "sha256".
    - GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
    - 'pack.writeReverseIndex' setting.
    +@@ t/README: GIT_TEST_WRITE_REV_INDEX=<boolean>, when true enables the
    + GIT_TEST_SPARSE_INDEX=<boolean>, when true enables index writes to use the
    + sparse-index format by default.
      
     +GIT_TEST_CHECKOUT_WORKERS=<n> overrides the 'checkout.workers' setting
     +to <n> and 'checkout.thresholdForParallelism' to 0, forcing the

Comments

Derrick Stolee May 5, 2021, 1:57 p.m. UTC | #1
On 5/4/2021 12:27 PM, Matheus Tavares wrote:
> This is the last part of the parallel checkout series. It adds tests and
> parallel checkout support to `git checkout-index` and
> `git checkout <pathspec>`.
> 
> I rebased this version onto `master`, as part-2 was merged down to
> `master`.

I read the range-diff and gave the patches another pass. This version
looks good to me.

Thanks,
-Stolee
Junio C Hamano May 6, 2021, 12:40 a.m. UTC | #2
Derrick Stolee <stolee@gmail.com> writes:

> On 5/4/2021 12:27 PM, Matheus Tavares wrote:
>> This is the last part of the parallel checkout series. It adds tests and
>> parallel checkout support to `git checkout-index` and
>> `git checkout <pathspec>`.
>> 
>> I rebased this version onto `master`, as part-2 was merged down to
>> `master`.
>
> I read the range-diff and gave the patches another pass. This version
> looks good to me.

Thanks.  Your assessment matches mine.