diff mbox series

[v2,07/10] sparse-index: partially expand directories

Message ID 346c56bf2560c5a89850ef4f8a58fbe17cde10fc.1652982759.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse index: integrate with sparse-checkout | expand

Commit Message

Derrick Stolee May 19, 2022, 5:52 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The expand_to_pattern_list() method expands sparse directory entries
to their list of contained files when either the pattern list is NULL or
the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a
directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
to be expanded. If there exists a directory 'A/B/D/', then that
directory should not be expanded and instead we can create a sparse
directory.

To implement this, we plug into the add_path_to_index() callback for the
call to read_tree_at(). Since we now need access to both the index we
are writing and the pattern list we are comparing, create a 'struct
modify_index_context' to use as a data transfer object. It is important
that we use the given pattern list since we will use this pattern list
to change the sparse-checkout patterns and cannot use
istate->sparse_checkout_patterns.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 sparse-index.c | 46 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

Comments

Junio C Hamano May 20, 2022, 6:17 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>  			     struct strbuf *base, const char *path,
>  			     unsigned int mode, void *context)
>  {

OK, this function is a callback of read_tree_at(), whose caller is
walking the source (=current) index, and while it is copying
existing non-directory entries to the destination (=ctx->write)
index, it found a tree in the source in-core index that does not
match the cone pattern (or we are expanding fully).  Our job is to
expand the given tree as a "subdirectory" in the project into the
destination index.

We used to just let the calling read_tree_at() fully and recursively
walk the tree while adding any non-tree entries to the destination.
Now, we have the pattern list in the context, we go more selective.

The design makes sense.

> +	struct modify_index_context *ctx = (struct modify_index_context *)context;
>  	struct cache_entry *ce;
>  	size_t len = base->len;
>  
> +	if (S_ISDIR(mode)) {
> +		int dtype;
> +		size_t baselen = base->len;
> +		if (!ctx->pl)
> +			return READ_TREE_RECURSIVE;

Fully recursive case.  We can do without any match, just like the
caller (i.e. expand_to_pattern_list) did, when the pattern list is
NULL.

> +		/*
> +		 * Have we expanded to a point outside of the sparse-checkout?
> +		 */
> +		strbuf_addstr(base, path);
> +		strbuf_add(base, "/-", 2);
> +
> +		if (path_matches_pattern_list(base->buf, base->len,
> +					      NULL, &dtype,
> +					      ctx->pl, ctx->write)) {

If that sample path in the directory matches (MATCHED or
MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand.

Can the function return UNDECIDED here?  If so what should happen?
As written, the code will behave as if it matched, and it is not
quite clear if that is the behaviour intended by this patch or an
accident.

> +			strbuf_setlen(base, baselen);
> +			return READ_TREE_RECURSIVE;
> +		}

The caller found a tree at path <path> in the index.  We check if
our patterns match a fictitious path <path> + "/-", which may exist
if the <path> is a directory and if there is a funny named file or
directory "-" in it.

Why "-"?  Are we trying ot see if "ctx->pl" matches "anything" in
the directory that is at <path>?  Is the assumption here that pl
only names directories literally without blobs (I presume that is
the same thing as assuming the cone mode)?

I am trying to see if there is a way that expresses the intention of
this code more clearly than using an arbitrary path "-" (and trying
to figure out if I got the intention right in the first place ;-)..

> +		/*
> +		 * The path "{base}{path}/" is a sparse directory. Create the correct
> +		 * name for inserting the entry into the index.
> +		 */
> +		strbuf_setlen(base, base->len - 1);

This removes that phoney "-" while keeping the trailing "/".  Just
like "-" was unclear, understanding this "- 1" requires that the
reader understands why "/-" was used earlier.

The resulting "base" is used in the newly created cache entry to
represent the (unexpanded) directory below, and such a cache entry
is supposed to have a path with a trailing slash, so it makes sense.

> +	} else {
> +		strbuf_addstr(base, path);

For any non-tree thing, we use the given path for the cache entry to
represent it.

> +	}
> +
> +	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>  	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
> -	set_index_entry(istate, istate->cache_nr++, ce);
> +	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);

And the cache entry (newly created) is added to the destination.
Unlike what happens in the caller (i.e. expand_to_pattern_list) that
moves the cache entry taken from the source index to the destination
index, the caller will discard the cache entry taken from the source
index after read_tree_at() returns, and we create a new instance for
ourselves here, even if we _could_ have reused it (by passing it in
the context structure, for example).

>  	strbuf_setlen(base, len);

And we restore the length of the path in the base to the original
before we return.

>  	return 0;

Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've
dealt with the entry; don't recurse into subtree even if you called
us with a tree", which is the right thing to do here, as we did have
done all we need to here.

OK, except for that "/-" thing, all of the above makes sense to me.

Thanks.

> @@ -254,6 +282,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  	int i;
>  	struct index_state *full;
>  	struct strbuf base = STRBUF_INIT;
> +	struct modify_index_context ctx;
>  
>  	/*
>  	 * If the index is already full, then keep it full. We will convert
> @@ -294,6 +323,9 @@ void expand_to_pattern_list(struct index_state *istate,
>  	full->cache_nr = 0;
>  	ALLOC_ARRAY(full->cache, full->cache_alloc);
>  
> +	ctx.write = full;
> +	ctx.pl = pl;
> +
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  		struct tree *tree;
> @@ -319,7 +351,7 @@ void expand_to_pattern_list(struct index_state *istate,
>  		strbuf_add(&base, ce->name, strlen(ce->name));
>  
>  		read_tree_at(istate->repo, tree, &base, &ps,
> -			     add_path_to_index, full);
> +			     add_path_to_index, &ctx);
>  
>  		/* free directory entries. full entries are re-used */
>  		discard_cache_entry(ce);
Derrick Stolee May 20, 2022, 6:33 p.m. UTC | #2
On 5/20/2022 2:17 PM, Junio C Hamano wrote:> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> @@ -231,18 +236,41 @@ static int add_path_to_index(const struct object_id *oid,
>>  			     struct strbuf *base, const char *path,
>>  			     unsigned int mode, void *context)
>>  {
> 
> OK, this function is a callback of read_tree_at(), whose caller is
> walking the source (=current) index, and while it is copying
> existing non-directory entries to the destination (=ctx->write)
> index, it found a tree in the source in-core index that does not
> match the cone pattern (or we are expanding fully).  Our job is to
> expand the given tree as a "subdirectory" in the project into the
> destination index.
> 
> We used to just let the calling read_tree_at() fully and recursively
> walk the tree while adding any non-tree entries to the destination.
> Now, we have the pattern list in the context, we go more selective.
> 
> The design makes sense.
> 
>> +	struct modify_index_context *ctx = (struct modify_index_context *)context;
>>  	struct cache_entry *ce;
>>  	size_t len = base->len;
>>  
>> +	if (S_ISDIR(mode)) {
>> +		int dtype;
>> +		size_t baselen = base->len;
>> +		if (!ctx->pl)
>> +			return READ_TREE_RECURSIVE;
> 
> Fully recursive case.  We can do without any match, just like the
> caller (i.e. expand_to_pattern_list) did, when the pattern list is
> NULL.
> 
>> +		/*
>> +		 * Have we expanded to a point outside of the sparse-checkout?
>> +		 */
>> +		strbuf_addstr(base, path);
>> +		strbuf_add(base, "/-", 2);
>> +
>> +		if (path_matches_pattern_list(base->buf, base->len,
>> +					      NULL, &dtype,
>> +					      ctx->pl, ctx->write)) {
> 
> If that sample path in the directory matches (MATCHED or
> MATCHED_RECURSIVE) the patterns, we recurse into the tree to expand.
> 
> Can the function return UNDECIDED here?  If so what should happen?
> As written, the code will behave as if it matched, and it is not
> quite clear if that is the behaviour intended by this patch or an
> accident.

UNDECIDED can only be returned when not in cone mode. We are in cone mode
when this helper is used.

>> +			strbuf_setlen(base, baselen);
>> +			return READ_TREE_RECURSIVE;
>> +		}
> 
> The caller found a tree at path <path> in the index.  We check if
> our patterns match a fictitious path <path> + "/-", which may exist
> if the <path> is a directory and if there is a funny named file or
> directory "-" in it.
> 
> Why "-"?  Are we trying ot see if "ctx->pl" matches "anything" in
> the directory that is at <path>?  Is the assumption here that pl
> only names directories literally without blobs (I presume that is
> the same thing as assuming the cone mode)?
> 
> I am trying to see if there is a way that expresses the intention of
> this code more clearly than using an arbitrary path "-" (and trying
> to figure out if I got the intention right in the first place ;-)..

The '+ "/-"' is there to give us an example file path for "<path>" as a
directory. This is specifically because path_matches_pattern_list() checks
the parent directories for matches, expecting the input to be a file name.

This could use a comment. I'm thinking...

/*
 * Have we expanded to a point outside of the sparse-checkout?
 *
 * Artificially pad the path name with a slash "/" to
 * indicate it as a directory, and add an arbitrary file
 * name ("-") so we can consider base->buf as a file name
 * to match against the cone-mode patterns.
 *
 * If we compared just "path", then we would expand more
 * than we should. Since every file at root is always
 * included, we would expand every directory at root at
 * least one level deep instead of using sparse directory
 * entries.
 */

>> +		/*
>> +		 * The path "{base}{path}/" is a sparse directory. Create the correct
>> +		 * name for inserting the entry into the index.
>> +		 */
>> +		strbuf_setlen(base, base->len - 1);
> 
> This removes that phoney "-" while keeping the trailing "/".  Just
> like "-" was unclear, understanding this "- 1" requires that the
> reader understands why "/-" was used earlier.
> 
> The resulting "base" is used in the newly created cache entry to
> represent the (unexpanded) directory below, and such a cache entry
> is supposed to have a path with a trailing slash, so it makes sense.
> 
>> +	} else {
>> +		strbuf_addstr(base, path);
> 
> For any non-tree thing, we use the given path for the cache entry to
> represent it.
> 
>> +	}
>> +
>> +	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
>>  	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
>> -	set_index_entry(istate, istate->cache_nr++, ce);
>> +	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
> 
> And the cache entry (newly created) is added to the destination.
> Unlike what happens in the caller (i.e. expand_to_pattern_list) that
> moves the cache entry taken from the source index to the destination
> index, the caller will discard the cache entry taken from the source
> index after read_tree_at() returns, and we create a new instance for
> ourselves here, even if we _could_ have reused it (by passing it in
> the context structure, for example).
> 
>>  	strbuf_setlen(base, len);
> 
> And we restore the length of the path in the base to the original
> before we return.
> 
>>  	return 0;
> 
> Returning 0 as opposed to READ_TREE_RECURSIVE here means "we've
> dealt with the entry; don't recurse into subtree even if you called
> us with a tree", which is the right thing to do here, as we did have
> done all we need to here.
> 
> OK, except for that "/-" thing, all of the above makes sense to me.

Thanks for your careful read. Hopefully my explanation of the "/-" thing
helps.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/sparse-index.c b/sparse-index.c
index c2cd3bdb614..73b82e5017b 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -9,6 +9,11 @@ 
 #include "dir.h"
 #include "fsmonitor.h"
 
+struct modify_index_context {
+	struct index_state *write;
+	struct pattern_list *pl;
+};
+
 static struct cache_entry *construct_sparse_dir_entry(
 				struct index_state *istate,
 				const char *sparse_dir,
@@ -231,18 +236,41 @@  static int add_path_to_index(const struct object_id *oid,
 			     struct strbuf *base, const char *path,
 			     unsigned int mode, void *context)
 {
-	struct index_state *istate = (struct index_state *)context;
+	struct modify_index_context *ctx = (struct modify_index_context *)context;
 	struct cache_entry *ce;
 	size_t len = base->len;
 
-	if (S_ISDIR(mode))
-		return READ_TREE_RECURSIVE;
+	if (S_ISDIR(mode)) {
+		int dtype;
+		size_t baselen = base->len;
+		if (!ctx->pl)
+			return READ_TREE_RECURSIVE;
 
-	strbuf_addstr(base, path);
+		/*
+		 * Have we expanded to a point outside of the sparse-checkout?
+		 */
+		strbuf_addstr(base, path);
+		strbuf_add(base, "/-", 2);
+
+		if (path_matches_pattern_list(base->buf, base->len,
+					      NULL, &dtype,
+					      ctx->pl, ctx->write)) {
+			strbuf_setlen(base, baselen);
+			return READ_TREE_RECURSIVE;
+		}
 
-	ce = make_cache_entry(istate, mode, oid, base->buf, 0, 0);
+		/*
+		 * The path "{base}{path}/" is a sparse directory. Create the correct
+		 * name for inserting the entry into the index.
+		 */
+		strbuf_setlen(base, base->len - 1);
+	} else {
+		strbuf_addstr(base, path);
+	}
+
+	ce = make_cache_entry(ctx->write, mode, oid, base->buf, 0, 0);
 	ce->ce_flags |= CE_SKIP_WORKTREE | CE_EXTENDED;
-	set_index_entry(istate, istate->cache_nr++, ce);
+	set_index_entry(ctx->write, ctx->write->cache_nr++, ce);
 
 	strbuf_setlen(base, len);
 	return 0;
@@ -254,6 +282,7 @@  void expand_to_pattern_list(struct index_state *istate,
 	int i;
 	struct index_state *full;
 	struct strbuf base = STRBUF_INIT;
+	struct modify_index_context ctx;
 
 	/*
 	 * If the index is already full, then keep it full. We will convert
@@ -294,6 +323,9 @@  void expand_to_pattern_list(struct index_state *istate,
 	full->cache_nr = 0;
 	ALLOC_ARRAY(full->cache, full->cache_alloc);
 
+	ctx.write = full;
+	ctx.pl = pl;
+
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 		struct tree *tree;
@@ -319,7 +351,7 @@  void expand_to_pattern_list(struct index_state *istate,
 		strbuf_add(&base, ce->name, strlen(ce->name));
 
 		read_tree_at(istate->repo, tree, &base, &ps,
-			     add_path_to_index, full);
+			     add_path_to_index, &ctx);
 
 		/* free directory entries. full entries are re-used */
 		discard_cache_entry(ce);