diff mbox series

[v2,04/10] sparse-index: introduce partially-sparse indexes

Message ID 269c206c331bb43006678beaa20832a75754c3df.1652982758.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse index: integrate with sparse-checkout | expand

Commit Message

Derrick Stolee May 19, 2022, 5:52 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* COMPLETELY_FULL (0): No sparse directories exist.

* COMPLETELY_SPARSE (1): Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* PARTIALLY_SPARSE (2): Sparse directories may exist. Some file entries
  outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
COMPLETELY_SPARSE mode but to actually do the necessary work when in
PARTIALLY_SPARSE mode.

The PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/sparse-checkout.c |  2 +-
 cache.h                   | 32 ++++++++++++++++++++++++--------
 read-cache.c              |  6 +++---
 sparse-index.c            |  6 +++---
 4 files changed, 31 insertions(+), 15 deletions(-)

Comments

Junio C Hamano May 19, 2022, 8:05 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +enum sparse_index_mode {
> +	/*
> +	 * COMPLETELY_FULL: there are no sparse directories
> +	 * in the index at all.
> +	 */
> +	COMPLETELY_FULL = 0,

Two things that make me wonder are:

 * Do concrete values assigned to these symbols matter?  If we do

	if (some_function_that_returns_this_type())
		/* ok, we know it is full */
		do_full_thing();

   then the answer is yes.  If we write these values to on-disk
   file, and other versions and reimplementations of Git need to
   know the concrete values, then the answer is yes.  Otherwise, we
   may not want to say "= 0" and the like here.

 * Is it worth repeating the label in the comment?  IOW, I am
   wondering if

	/* there are no sparse directories in the index */
	COMPLETELY_FULL,

   is equally readable and slightly more maintainable.

Also these names being in cache.h and visible everywhere is somewhat
worrying.  Other CPP macros and enum constants in the file have
names that are prepared to be truly global, but COMPLETELY_FULL and
COLLAPSED do not hint that they are about sparse-index state as
strongly as PARTIALLY_SPARSE does.

> -		 /*
> -		  * sparse_index == 1 when sparse-directory
> -		  * entries exist. Requires sparse-checkout
> -		  * in cone mode.
> -		  */
> -		 sparse_index : 1;
> +		 fsmonitor_has_run_once : 1;
> +	enum sparse_index_mode sparse_index;

Good that we can lose the comment on a single bit.  Are some of the
enum sparse_index_mode values only possible in cone mode, just like
having true in this bit was only possible in cone mode?  Perhaps a
comment before "enum sparse_index_mode {" can say "in non-cone mode,
COMPLETELY_FULL is the only possible value" or somesuch?

Thanks.
Derrick Stolee May 20, 2022, 6:05 p.m. UTC | #2
On 5/19/2022 4:05 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +enum sparse_index_mode {
>> +	/*
>> +	 * COMPLETELY_FULL: there are no sparse directories
>> +	 * in the index at all.
>> +	 */
>> +	COMPLETELY_FULL = 0,
> 
> Two things that make me wonder are:
> 
>  * Do concrete values assigned to these symbols matter?  If we do
> 
> 	if (some_function_that_returns_this_type())
> 		/* ok, we know it is full */
> 		do_full_thing();
> 
>    then the answer is yes.  If we write these values to on-disk
>    file, and other versions and reimplementations of Git need to
>    know the concrete values, then the answer is yes.  Otherwise, we
>    may not want to say "= 0" and the like here.

The = 0 case matters because that is the default when the index
struct is initialized to zero. The other values are not important.
 
>  * Is it worth repeating the label in the comment?  IOW, I am
>    wondering if
> 
> 	/* there are no sparse directories in the index */
> 	COMPLETELY_FULL,
> 
>    is equally readable and slightly more maintainable.

Can do.

> Also these names being in cache.h and visible everywhere is somewhat
> worrying.  Other CPP macros and enum constants in the file have
> names that are prepared to be truly global, but COMPLETELY_FULL and
> COLLAPSED do not hint that they are about sparse-index state as
> strongly as PARTIALLY_SPARSE does.

Would prepending with "INDEX_" be sufficient?

>> -		 /*
>> -		  * sparse_index == 1 when sparse-directory
>> -		  * entries exist. Requires sparse-checkout
>> -		  * in cone mode.
>> -		  */
>> -		 sparse_index : 1;
>> +		 fsmonitor_has_run_once : 1;
>> +	enum sparse_index_mode sparse_index;
> 
> Good that we can lose the comment on a single bit.  Are some of the
> enum sparse_index_mode values only possible in cone mode, just like
> having true in this bit was only possible in cone mode?  Perhaps a
> comment before "enum sparse_index_mode {" can say "in non-cone mode,
> COMPLETELY_FULL is the only possible value" or somesuch?

I can add that comment for the COMPLETELY_FULL enum value.

Thanks,
-Stolee
Junio C Hamano May 20, 2022, 6:23 p.m. UTC | #3
Derrick Stolee <derrickstolee@github.com> writes:

>> Also these names being in cache.h and visible everywhere is somewhat
>> worrying.  Other CPP macros and enum constants in the file have
>> names that are prepared to be truly global, but COMPLETELY_FULL and
>> COLLAPSED do not hint that they are about sparse-index state as
>> strongly as PARTIALLY_SPARSE does.
>
> Would prepending with "INDEX_" be sufficient?

Yeah, index-collapsed, index-expanded, etc., sounds distinctive
enough.  Sorry, but I cannot think of a good phrase for partially
expanded case that is short, succinct and would go well with the
fully expanded and fully collapsed cases.

Thanks.
diff mbox series

Patch

diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 0217d44c5b1..88eea069ad4 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -128,7 +128,7 @@  static void clean_tracked_sparse_directories(struct repository *r)
 	 * sparse index will not delete directories that contain
 	 * conflicted entries or submodules.
 	 */
-	if (!r->index->sparse_index) {
+	if (r->index->sparse_index == COMPLETELY_FULL) {
 		/*
 		 * If something, such as a merge conflict or other concern,
 		 * prevents us from converting to a sparse index, then do
diff --git a/cache.h b/cache.h
index 6226f6a8a53..2d067aca2fd 100644
--- a/cache.h
+++ b/cache.h
@@ -310,6 +310,28 @@  struct untracked_cache;
 struct progress;
 struct pattern_list;
 
+enum sparse_index_mode {
+	/*
+	 * COMPLETELY_FULL: there are no sparse directories
+	 * in the index at all.
+	 */
+	COMPLETELY_FULL = 0,
+
+	/*
+	 * COLLAPSED: the index has already been collapsed to sparse
+	 * directories whereever possible.
+	 */
+	COLLAPSED = 1,
+
+	/*
+	 * PARTIALLY_SPARSE: the sparse directories that exist are
+	 * outside the sparse-checkout boundary, but it is possible
+	 * that some file entries could collapse to sparse directory
+	 * entries.
+	 */
+	PARTIALLY_SPARSE = 2,
+};
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -323,14 +345,8 @@  struct index_state {
 		 drop_cache_tree : 1,
 		 updated_workdir : 1,
 		 updated_skipworktree : 1,
-		 fsmonitor_has_run_once : 1,
-
-		 /*
-		  * sparse_index == 1 when sparse-directory
-		  * entries exist. Requires sparse-checkout
-		  * in cone mode.
-		  */
-		 sparse_index : 1;
+		 fsmonitor_has_run_once : 1;
+	enum sparse_index_mode sparse_index;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	struct object_id oid;
diff --git a/read-cache.c b/read-cache.c
index 4df97e185e9..cb9b33169fd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -112,7 +112,7 @@  static const char *alternate_index_output;
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	if (S_ISSPARSEDIR(ce->ce_mode))
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
@@ -1856,7 +1856,7 @@  static int read_index_extension(struct index_state *istate,
 		break;
 	case CACHE_EXT_SPARSE_DIRECTORIES:
 		/* no content, only an indicator */
-		istate->sparse_index = 1;
+		istate->sparse_index = COLLAPSED;
 		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
@@ -3149,7 +3149,7 @@  static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 				 unsigned flags)
 {
 	int ret;
-	int was_full = !istate->sparse_index;
+	int was_full = istate->sparse_index == COMPLETELY_FULL;
 
 	ret = convert_to_sparse(istate, 0);
 
diff --git a/sparse-index.c b/sparse-index.c
index 2a06ef58051..c2cd3bdb614 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -173,7 +173,7 @@  int convert_to_sparse(struct index_state *istate, int flags)
 	 * If the index is already sparse, empty, or otherwise
 	 * cannot be converted to sparse, do not convert.
 	 */
-	if (istate->sparse_index || !istate->cache_nr ||
+	if (istate->sparse_index == COLLAPSED || !istate->cache_nr ||
 	    !is_sparse_index_allowed(istate, flags))
 		return 0;
 
@@ -214,7 +214,7 @@  int convert_to_sparse(struct index_state *istate, int flags)
 	FREE_AND_NULL(istate->fsmonitor_dirty);
 	FREE_AND_NULL(istate->fsmonitor_last_update);
 
-	istate->sparse_index = 1;
+	istate->sparse_index = COLLAPSED;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
 }
@@ -259,7 +259,7 @@  void expand_to_pattern_list(struct index_state *istate,
 	 * If the index is already full, then keep it full. We will convert
 	 * it to a sparse index on write, if possible.
 	 */
-	if (!istate || !istate->sparse_index)
+	if (!istate || istate->sparse_index == COMPLETELY_FULL)
 		return;
 
 	/*