mbox series

[v5,0/9] Parallel Checkout (part I)

Message ID cover.1608128666.git.matheus.bernardino@usp.br (mailing list archive)
Headers show
Series Parallel Checkout (part I) | expand

Message

Matheus Tavares Dec. 16, 2020, 2:50 p.m. UTC
The previous rounds got many great suggestions about patches 1 to 10,
but not as many comments on the latter patches. Christian commented that
patches 10 and 11 are too long/complex, making the overall series harder
to review. So he suggested that I eject patches 10 to 19, and send them
later in a separated part. This will hopefully make the series easier to
review and move forward. (I also hope to include a desing doc in part 2
to make those two bigger patches more digestible.)

So this part is now composed only of the 9 preparatory patches, which
mainly focus on: (1) adding the 'struct conv_attrs' parameter to some
convert.c and entry.c functions (to avoid re-loading the attributes
during parallel checkout); and (2) making some functions public (for
parallel-checkout.c's later use).

Changes since v4:

General:
- Removed "[matheus.bernardino: ...]" lines from patches 1 to 4.
- Ejected patches 10 to 19, which will now be sent in part 2.

Patch 2:
- Fixed typo on commit message (s/on a index/on an index/).
- Added "_ca" to convert_to_working_tree_internal()'s name.

Patch 4:
- Reworded patch title for better clarity, as suggested by Christian.

Patch 5:
- Mentioned about moving a checkout_entry() comment from entry.c to
  entry.h in the patch's message.

Patch 7:
- Make patch's title more explicit and reworded message for clarity.

Patch 8:
- Fixed the last call to write_entry() in checkout_entry(): it should
  pass 'ca', not NULL.

Patch 9:
- Defined checkout_entry() -- which is now a wrapper to
  checkout_entry_ca() -- as a static inline function instead of a macro.
- Shortened patch's title.

Note: to see the big picture where these patches should fit in, please
check the previous round with the complete series:
https://lore.kernel.org/git/cover.1604521275.git.matheus.bernardino@usp.br/


Jeff Hostetler (4):
  convert: make convert_attrs() and convert structs public
  convert: add [async_]convert_to_working_tree_ca() variants
  convert: add get_stream_filter_ca() variant
  convert: add classification for conv_attrs struct

Matheus Tavares (5):
  entry: extract a header file for entry.c functions
  entry: make fstat_output() and read_blob_entry() public
  entry: extract update_ce_after_write() from write_entry()
  entry: move conv_attrs lookup up to checkout_entry()
  entry: add checkout_entry_ca() taking preloaded conv_attrs

 apply.c                  |   1 +
 builtin/checkout-index.c |   1 +
 builtin/checkout.c       |   1 +
 builtin/difftool.c       |   1 +
 cache.h                  |  24 -------
 convert.c                | 143 ++++++++++++++++++++-------------------
 convert.h                |  96 +++++++++++++++++++++++---
 entry.c                  |  85 +++++++++++++----------
 entry.h                  |  59 ++++++++++++++++
 unpack-trees.c           |   1 +
 10 files changed, 275 insertions(+), 137 deletions(-)
 create mode 100644 entry.h

Range-diff against v4:
 1:  2726f6dc05 !  1:  fa04185237 convert: make convert_attrs() and convert structs public
    @@ Commit message
         convert_crlf_action, which is more appropriate for the global namespace.
     
         Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
    -    [matheus.bernardino: squash and reword msg]
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## convert.c ##
 2:  fc03417592 !  2:  2d3c1dc0a1 convert: add [async_]convert_to_working_tree_ca() variants
    @@ Commit message
     
         Separate the attribute gathering from the actual conversion by adding
         _ca() variants of the conversion functions. These variants receive a
    -    precomputed 'struct conv_attrs', not relying, thus, on a index state.
    +    precomputed 'struct conv_attrs', not relying, thus, on an index state.
         They will be used in a future patch adding parallel checkout support,
         for two reasons:
     
    @@ Commit message
           it for the main process, anyway.
     
         Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
    -    [matheus.bernardino: squash, remove one function definition and reword]
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## convert.c ##
    @@ convert.c: void convert_to_git_filter_fd(const struct index_state *istate,
      }
      
     -static int convert_to_working_tree_internal(const struct index_state *istate,
    -+static int convert_to_working_tree_internal(const struct conv_attrs *ca,
    - 					    const char *path, const char *src,
    - 					    size_t len, struct strbuf *dst,
    - 					    int normalizing,
    -@@ convert.c: static int convert_to_working_tree_internal(const struct index_state *istate,
    - 					    struct delayed_checkout *dco)
    +-					    const char *path, const char *src,
    +-					    size_t len, struct strbuf *dst,
    +-					    int normalizing,
    +-					    const struct checkout_metadata *meta,
    +-					    struct delayed_checkout *dco)
    ++static int convert_to_working_tree_ca_internal(const struct conv_attrs *ca,
    ++					       const char *path, const char *src,
    ++					       size_t len, struct strbuf *dst,
    ++					       int normalizing,
    ++					       const struct checkout_metadata *meta,
    ++					       struct delayed_checkout *dco)
      {
      	int ret = 0, ret_filter = 0;
     -	struct conv_attrs ca;
    @@ convert.c: static int convert_to_working_tree_internal(const struct index_state
     +				     void *dco)
      {
     -	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
    -+	return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, dco);
    ++	return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
    ++						   meta, dco);
      }
      
     -int convert_to_working_tree(const struct index_state *istate,
    @@ convert.c: static int convert_to_working_tree_internal(const struct index_state
     +			       const struct checkout_metadata *meta)
      {
     -	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
    -+	return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, NULL);
    ++	return convert_to_working_tree_ca_internal(ca, path, src, len, dst, 0,
    ++						   meta, NULL);
      }
      
      int renormalize_buffer(const struct index_state *istate, const char *path,
    @@ convert.c: static int convert_to_working_tree_internal(const struct index_state
     +	int ret;
     +
     +	convert_attrs(istate, &ca, path);
    -+	ret = convert_to_working_tree_internal(&ca, path, src, len, dst, 1, NULL, NULL);
    ++	ret = convert_to_working_tree_ca_internal(&ca, path, src, len, dst, 1,
    ++						  NULL, NULL);
      	if (ret) {
      		src = dst->buf;
      		len = dst->len;
 3:  8ce20f1031 !  3:  8af17f6a9b convert: add get_stream_filter_ca() variant
    @@ Commit message
         attributes struct as a parameter.
     
         Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
    -    [matheus.bernardino: move header comment to ca() variant and reword msg]
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## convert.c ##
 4:  aa1eb461f4 !  4:  f829e2e08f convert: add conv_attrs classification
    @@ Metadata
     Author: Jeff Hostetler <jeffhost@microsoft.com>
     
      ## Commit message ##
    -    convert: add conv_attrs classification
    +    convert: add classification for conv_attrs struct
     
         Create `enum conv_attrs_classification` to express the different ways
         that attributes are handled for a blob during checkout.
    @@ Commit message
         classifying logic is the same).
     
         Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
    -    [matheus.bernardino: use classification in get_stream_filter_ca()]
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## convert.c ##
 5:  cb3dea224b !  5:  934ead9519 entry: extract a header file for entry.c functions
    @@ Commit message
         reside in cache.h. Although not many, they contribute to the size of
         cache.h and, when changed, cause the unnecessary recompilation of
         modules that don't really use these functions. So let's move them to a
    -    new entry.h header.
    +    new entry.h header. While at it let's also move a comment related to
    +    checkout_entry() from entry.c to entry.h as it's more useful to describe
    +    the function there.
     
         Original-patch-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
         Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
 6:  46ed6274d7 =  6:  da6d1d7624 entry: make fstat_output() and read_blob_entry() public
 7:  a0479d02ff !  7:  def654606d entry: extract cache_entry update from write_entry()
    @@ Metadata
     Author: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## Commit message ##
    -    entry: extract cache_entry update from write_entry()
    +    entry: extract update_ce_after_write() from write_entry()
     
    -    This code will be used by the parallel checkout functions, outside
    -    entry.c, so extract it to a public function.
    +    The code that updates the in-memory index information after an entry is
    +    written currently resides in write_entry(). Extract it to a public
    +    function so that it can be called by the parallel checkout functions,
    +    outside entry.c, in a later patch.
     
         Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
     
 8:  5c993cc27f !  8:  fbde901518 entry: move conv_attrs lookup up to checkout_entry()
    @@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
     +		ca = &ca_buf;
     +	}
     +
    -+	return write_entry(ce, path.buf, NULL, state, 0);
    ++	return write_entry(ce, path.buf, ca, state, 0);
      }
      
      void unlink_entry(const struct cache_entry *ce)
 9:  aa635bda21 !  9:  0556d32e8c entry: add checkout_entry_ca() which takes preloaded conv_attrs
    @@ Metadata
     Author: Matheus Tavares <matheus.bernardino@usp.br>
     
      ## Commit message ##
    -    entry: add checkout_entry_ca() which takes preloaded conv_attrs
    +    entry: add checkout_entry_ca() taking preloaded conv_attrs
     
         The parallel checkout machinery will call checkout_entry() for entries
         that could not be written in parallel due to path collisions. At this
    @@ entry.c: int checkout_entry(struct cache_entry *ce, const struct checkout *state
      		convert_attrs(state->istate, &ca_buf, ce->name);
      		ca = &ca_buf;
      	}
    - 
    --	return write_entry(ce, path.buf, NULL, state, 0);
    -+	return write_entry(ce, path.buf, ca, state, 0);
    - }
    - 
    - void unlink_entry(const struct cache_entry *ce)
     
      ## entry.h ##
     @@ entry.h: struct checkout {
    @@ entry.h: struct checkout {
       */
     -int checkout_entry(struct cache_entry *ce, const struct checkout *state,
     -		   char *topath, int *nr_checkouts);
    -+#define checkout_entry(ce, state, topath, nr_checkouts) \
    -+		checkout_entry_ca(ce, NULL, state, topath, nr_checkouts)
     +int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca,
     +		      const struct checkout *state, char *topath,
     +		      int *nr_checkouts);
    ++static inline int checkout_entry(struct cache_entry *ce,
    ++				 const struct checkout *state, char *topath,
    ++				 int *nr_checkouts)
    ++{
    ++	return checkout_entry_ca(ce, NULL, state, topath, nr_checkouts);
    ++}
      
      void enable_delayed_checkout(struct checkout *state);
      int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
10:  bc8447cd9c <  -:  ---------- unpack-trees: add basic support for parallel checkout
11:  815137685a <  -:  ---------- parallel-checkout: make it truly parallel
12:  2b42621582 <  -:  ---------- parallel-checkout: support progress displaying
13:  960116579a <  -:  ---------- make_transient_cache_entry(): optionally alloc from mem_pool
14:  fb9f2f580c <  -:  ---------- builtin/checkout.c: complete parallel checkout support
15:  a844451e58 <  -:  ---------- checkout-index: add parallel checkout support
16:  3733857ffa <  -:  ---------- parallel-checkout: add tests for basic operations
17:  c8a2974f81 <  -:  ---------- parallel-checkout: add tests related to clone collisions
18:  86fccd57d5 <  -:  ---------- parallel-checkout: add tests related to .gitattributes
19:  7f3e23cc38 <  -:  ---------- ci: run test round with parallel-checkout enabled

Comments

Christian Couder Dec. 16, 2020, 3:27 p.m. UTC | #1
On Wed, Dec 16, 2020 at 3:50 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> The previous rounds got many great suggestions about patches 1 to 10,
> but not as many comments on the latter patches. Christian commented that
> patches 10 and 11 are too long/complex, making the overall series harder
> to review. So he suggested that I eject patches 10 to 19, and send them
> later in a separated part. This will hopefully make the series easier to
> review and move forward. (I also hope to include a desing doc in part 2
> to make those two bigger patches more digestible.)

Thanks, and yeah, sorry I suggested that privately, but should have
done it on the mailing list.

I actually think that patches 10 and 11 (which each one contains 400+
new lines) in the previous series should be mostly alone in a part 2,
with perhaps a part 3 that would have most of the rest, so
improvements and tests.

It might also be possible to split at least a bit a few things in
patches 10 and 11. For example I think it's ok to add new
configuration in a separate patch even if it's not used yet. It can
just reserve the name. That could be in part  2 then.

> Changes since v4:

From a quick look at the range-diff, it looks good to me!

Thanks,
Christian.
Junio C Hamano Dec. 17, 2020, 1:11 a.m. UTC | #2
Matheus Tavares <matheus.bernardino@usp.br> writes:

> The previous rounds got many great suggestions about patches 1 to 10,
> but not as many comments on the latter patches. Christian commented that
> patches 10 and 11 are too long/complex, making the overall series harder
> to review. So he suggested that I eject patches 10 to 19, and send them
> later in a separated part. This will hopefully make the series easier to
> review and move forward. (I also hope to include a desing doc in part 2
> to make those two bigger patches more digestible.)
>
> So this part is now composed only of the 9 preparatory patches, which
> mainly focus on: (1) adding the 'struct conv_attrs' parameter to some
> convert.c and entry.c functions (to avoid re-loading the attributes
> during parallel checkout); and (2) making some functions public (for
> parallel-checkout.c's later use).

All of these patches look sensible to me.

Will replace.  Thanks.