diff mbox series

[v4,10/10] sparse-checkout: add config to disable deleting dirs

Message ID 8d55a6ba2fdf64cee4eb51f3cb6f9808bd0b7505.1629841904.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse index: delete ignored files outside sparse cone | expand

Commit Message

Derrick Stolee Aug. 24, 2021, 9:51 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The clean_tracked_sparse_directories() method deletes the tracked
directories that go out of scope when the sparse-checkout cone changes,
at least in cone mode. This is new behavior, but is recommended based on
our understanding of how users are interacting with the feature in most
cases.

It is possible that some users will object to the new behavior, so
create a new configuration option 'index.deleteSparseDirectories' that
can be set to 'false' to make clean_tracked_sparse_directories() do
nothing. This will keep all untracked files in the working tree and
cause performance problems with the sparse index, but those trade-offs
are for the user to decide.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config/index.txt     | 6 ++++++
 builtin/sparse-checkout.c          | 9 ++++++++-
 t/t1091-sparse-checkout-builtin.sh | 4 ++++
 3 files changed, 18 insertions(+), 1 deletion(-)

Comments

Elijah Newren Aug. 27, 2021, 8:58 p.m. UTC | #1
On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The clean_tracked_sparse_directories() method deletes the tracked
> directories that go out of scope when the sparse-checkout cone changes,
> at least in cone mode. This is new behavior, but is recommended based on
> our understanding of how users are interacting with the feature in most
> cases.
>
> It is possible that some users will object to the new behavior, so
> create a new configuration option 'index.deleteSparseDirectories' that
> can be set to 'false' to make clean_tracked_sparse_directories() do
> nothing. This will keep all untracked files in the working tree and
> cause performance problems with the sparse index, but those trade-offs
> are for the user to decide.

I'm not sold that we need anything here, and it sounds like this is
all being added based on a theoretical concern.  I might not object
too much normally to trying to address theoretical conerns with new
features, but:

* I'm a little concerned that we're adding a configuration option
(which live forever and are harder to work with
backward-compatibility-wise) rather than a command line flag.

* It's not clear to me that the option name is what we want.  For
example, git checkout has a --[no-]overwrite-ignored for overwriting
ignored files (or not) when switching to a different branch.  git
merge has the same flags (though only the fast-forwarding backend
heeds it currently).  This behavior is quite similar to that flag, and
has the same default of treating ignored files as expendable.  Should
the naming be more similar?

* It's not clear to me that the option has the right level of
implementation granularity.  If someone wants ignored files to be
treated as important, should they need to set a
merge.no_overwrite_ignored AND a checkout.no_overwrite_ignored AND a
index.deleteSparseDirectories AND other names, or have one high-level
option that sets all of these behaviors?

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config/index.txt     | 6 ++++++
>  builtin/sparse-checkout.c          | 9 ++++++++-
>  t/t1091-sparse-checkout-builtin.sh | 4 ++++
>  3 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
> index 75f3a2d1054..c65da20a931 100644
> --- a/Documentation/config/index.txt
> +++ b/Documentation/config/index.txt
> @@ -1,3 +1,9 @@
> +index.deleteSparseDirectories::
> +       When enabled, the cone mode sparse-checkout feature will delete
> +       directories that are outside of the sparse-checkout cone, unless
> +       such a directory contains an untracked, non-ignored file. Defaults
> +       to true.
> +
>  index.recordEndOfIndexEntries::
>         Specifies whether the index file should include an "End Of Index
>         Entry" section. This reduces index load time on multiprocessor
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index d0f5c4702be..33ec729d9de 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -102,7 +102,7 @@ static int sparse_checkout_list(int argc, const char **argv)
>
>  static void clean_tracked_sparse_directories(struct repository *r)
>  {
> -       int i, was_full = 0;
> +       int i, value, was_full = 0;
>         struct strbuf path = STRBUF_INIT;
>         size_t pathlen;
>         struct string_list_item *item;
> @@ -118,6 +118,13 @@ static void clean_tracked_sparse_directories(struct repository *r)
>             !r->index->sparse_checkout_patterns->use_cone_patterns)
>                 return;
>
> +       /*
> +        * Users can disable this behavior.
> +        */
> +       if (!repo_config_get_bool(r, "index.deletesparsedirectories", &value) &&
> +           !value)
> +               return;
> +
>         /*
>          * Use the sparse index as a data structure to assist finding
>          * directories that are safe to delete. This conversion to a
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 71236981e64..e0f31186d89 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -666,6 +666,10 @@ test_expect_success 'cone mode clears ignored subdirectories' '
>         git -C repo status --porcelain=v2 >out &&
>         test_must_be_empty out &&
>
> +       git -C repo -c index.deleteSparseDirectories=false sparse-checkout reapply &&
> +       test_path_is_dir repo/folder1 &&
> +       test_path_is_dir repo/deep/deeper2 &&
> +
>         git -C repo sparse-checkout reapply &&
>         test_path_is_missing repo/folder1 &&
>         test_path_is_missing repo/deep/deeper2 &&
> --
> gitgitgadget
Derrick Stolee Aug. 30, 2021, 1:30 p.m. UTC | #2
On 8/27/2021 4:58 PM, Elijah Newren wrote:
> On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The clean_tracked_sparse_directories() method deletes the tracked
>> directories that go out of scope when the sparse-checkout cone changes,
>> at least in cone mode. This is new behavior, but is recommended based on
>> our understanding of how users are interacting with the feature in most
>> cases.
>>
>> It is possible that some users will object to the new behavior, so
>> create a new configuration option 'index.deleteSparseDirectories' that
>> can be set to 'false' to make clean_tracked_sparse_directories() do
>> nothing. This will keep all untracked files in the working tree and
>> cause performance problems with the sparse index, but those trade-offs
>> are for the user to decide.
> 
> I'm not sold that we need anything here, and it sounds like this is
> all being added based on a theoretical concern.  I might not object
> too much normally to trying to address theoretical conerns with new
> features, but:
> 
> * I'm a little concerned that we're adding a configuration option
> (which live forever and are harder to work with
> backward-compatibility-wise) rather than a command line flag.

The issue with a command-line flag is that it would need to be added
to all the commands that reapply the sparse-checkout definition.
Maybe that's just the 'git sparse-checkout (init|set|add|reapply)'
and 'git read-tree' commands, but are there other places where this
logic might be applied in the future?

Also, as more third-party tools integrate with sparse-checkout, I
don't expect users to be directly involved in changing their cone,
so any use of a command-line flag would need to be integrated into
those tools. A config option applies this logic universally, when
needed.

> * It's not clear to me that the option name is what we want.  For
> example, git checkout has a --[no-]overwrite-ignored for overwriting
> ignored files (or not) when switching to a different branch.  git
> merge has the same flags (though only the fast-forwarding backend
> heeds it currently).  This behavior is quite similar to that flag, and
> has the same default of treating ignored files as expendable.  Should
> the naming be more similar?

I'm open to changing the name to more closely match existing naming
conventions, once we've decided if this should be included at all.

> * It's not clear to me that the option has the right level of
> implementation granularity.  If someone wants ignored files to be
> treated as important, should they need to set a
> merge.no_overwrite_ignored AND a checkout.no_overwrite_ignored AND a
> index.deleteSparseDirectories AND other names, or have one high-level
> option that sets all of these behaviors?

These cases have different meanings for why an ignored file is
important. For merge and checkout, the argument is saying "don't
overwrite ignored files if they are to be replaced by a tracked
file". For sparse-checkout, the config is saying "don't delete
ignored files that are leaving the sparse-checkout scope". I think
it is meaningful to say that files that match the .gitignore
patterns but are still tracked are deleted, because sparse-checkout
removes all tracked files.

I'm less convinced that there needs to be a meta-setting that covers
all of these cases simultaneously.

---

As for moving forward, I'm fine skipping this patch if there is no
support for it. I just wanted to demonstrate that it could be done.
Perhaps we wait for the theoretical concern to be a real one, as
requested by a user with a legitimate reason to care?

Thanks,
-Stolee
Elijah Newren Aug. 30, 2021, 8:11 p.m. UTC | #3
On Mon, Aug 30, 2021 at 6:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/27/2021 4:58 PM, Elijah Newren wrote:
> > On Tue, Aug 24, 2021 at 2:51 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> The clean_tracked_sparse_directories() method deletes the tracked
> >> directories that go out of scope when the sparse-checkout cone changes,
> >> at least in cone mode. This is new behavior, but is recommended based on
> >> our understanding of how users are interacting with the feature in most
> >> cases.
> >>
> >> It is possible that some users will object to the new behavior, so
> >> create a new configuration option 'index.deleteSparseDirectories' that
> >> can be set to 'false' to make clean_tracked_sparse_directories() do
> >> nothing. This will keep all untracked files in the working tree and
> >> cause performance problems with the sparse index, but those trade-offs
> >> are for the user to decide.
> >
> > I'm not sold that we need anything here, and it sounds like this is
> > all being added based on a theoretical concern.  I might not object
> > too much normally to trying to address theoretical conerns with new
> > features, but:
> >
> > * I'm a little concerned that we're adding a configuration option
> > (which live forever and are harder to work with
> > backward-compatibility-wise) rather than a command line flag.
>
> The issue with a command-line flag is that it would need to be added
> to all the commands that reapply the sparse-checkout definition.
> Maybe that's just the 'git sparse-checkout (init|set|add|reapply)'
> and 'git read-tree' commands, but are there other places where this
> logic might be applied in the future?
>
> Also, as more third-party tools integrate with sparse-checkout, I
> don't expect users to be directly involved in changing their cone,
> so any use of a command-line flag would need to be integrated into
> those tools. A config option applies this logic universally, when
> needed.
>
> > * It's not clear to me that the option name is what we want.  For
> > example, git checkout has a --[no-]overwrite-ignored for overwriting
> > ignored files (or not) when switching to a different branch.  git
> > merge has the same flags (though only the fast-forwarding backend
> > heeds it currently).  This behavior is quite similar to that flag, and
> > has the same default of treating ignored files as expendable.  Should
> > the naming be more similar?
>
> I'm open to changing the name to more closely match existing naming
> conventions, once we've decided if this should be included at all.
>
> > * It's not clear to me that the option has the right level of
> > implementation granularity.  If someone wants ignored files to be
> > treated as important, should they need to set a
> > merge.no_overwrite_ignored AND a checkout.no_overwrite_ignored AND a
> > index.deleteSparseDirectories AND other names, or have one high-level
> > option that sets all of these behaviors?
>
> These cases have different meanings for why an ignored file is
> important. For merge and checkout, the argument is saying "don't
> overwrite ignored files if they are to be replaced by a tracked
> file". For sparse-checkout, the config is saying "don't delete
> ignored files that are leaving the sparse-checkout scope". I think
> it is meaningful to say that files that match the .gitignore
> patterns but are still tracked are deleted, because sparse-checkout
> removes all tracked files.
>
> I'm less convinced that there needs to be a meta-setting that covers
> all of these cases simultaneously.
>
> ---
>
> As for moving forward, I'm fine skipping this patch if there is no
> support for it. I just wanted to demonstrate that it could be done.

That's fair, and you certainly achieved that.

> Perhaps we wait for the theoretical concern to be a real one, as
> requested by a user with a legitimate reason to care?

Sounds good to me.  Then we'll have a better frame of reference for
judging how the feature should be tailored as well.
diff mbox series

Patch

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 75f3a2d1054..c65da20a931 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -1,3 +1,9 @@ 
+index.deleteSparseDirectories::
+	When enabled, the cone mode sparse-checkout feature will delete
+	directories that are outside of the sparse-checkout cone, unless
+	such a directory contains an untracked, non-ignored file. Defaults
+	to true.
+
 index.recordEndOfIndexEntries::
 	Specifies whether the index file should include an "End Of Index
 	Entry" section. This reduces index load time on multiprocessor
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index d0f5c4702be..33ec729d9de 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -102,7 +102,7 @@  static int sparse_checkout_list(int argc, const char **argv)
 
 static void clean_tracked_sparse_directories(struct repository *r)
 {
-	int i, was_full = 0;
+	int i, value, was_full = 0;
 	struct strbuf path = STRBUF_INIT;
 	size_t pathlen;
 	struct string_list_item *item;
@@ -118,6 +118,13 @@  static void clean_tracked_sparse_directories(struct repository *r)
 	    !r->index->sparse_checkout_patterns->use_cone_patterns)
 		return;
 
+	/*
+	 * Users can disable this behavior.
+	 */
+	if (!repo_config_get_bool(r, "index.deletesparsedirectories", &value) &&
+	    !value)
+		return;
+
 	/*
 	 * Use the sparse index as a data structure to assist finding
 	 * directories that are safe to delete. This conversion to a
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 71236981e64..e0f31186d89 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -666,6 +666,10 @@  test_expect_success 'cone mode clears ignored subdirectories' '
 	git -C repo status --porcelain=v2 >out &&
 	test_must_be_empty out &&
 
+	git -C repo -c index.deleteSparseDirectories=false sparse-checkout reapply &&
+	test_path_is_dir repo/folder1 &&
+	test_path_is_dir repo/deep/deeper2 &&
+
 	git -C repo sparse-checkout reapply &&
 	test_path_is_missing repo/folder1 &&
 	test_path_is_missing repo/deep/deeper2 &&