diff mbox series

unpack-trees: add core.sparseCheckoutRmFiles config

Message ID 20210601183106.3084008-1-tpr.ll@botech.co.uk (mailing list archive)
State New, archived
Headers show
Series unpack-trees: add core.sparseCheckoutRmFiles config | expand

Commit Message

Tim Renouf (open source) June 1, 2021, 6:31 p.m. UTC
When doing a checkout (or other index merge from a tree) causes the
removal of a path that is outside sparse-checkout, the file is removed
from the working tree, even if it is dirty.

That is probably what you want if the file got there by being
materialized in a merge conflict. But it is not what you want if you
deliberately put the file there.

This commit adds the above config item, defaulting to "true" to get the
old behavior. Set it to "false" to avoid removing such a file from the
worktree.

Signed-off-by: Tim Renouf <tpr.ll@botech.co.uk>
---
Here is a fix for my problem, hidden under a config option as it might
not be what everyone wants (there are a few tests that rely on the
existing behavior). I realize this is a bit of a piecemeal approach.
Hopefully it will be superseded by the sparse-index work when that
arrives.

 Documentation/git-sparse-checkout.txt | 11 ++++++++
 cache.h                               |  1 +
 config.c                              |  5 ++++
 environment.c                         |  1 +
 t/t1011-read-tree-sparse-checkout.sh  | 36 +++++++++++++++++++++++++++
 unpack-trees.c                        | 14 ++++++++---
 6 files changed, 64 insertions(+), 4 deletions(-)

Comments

Derrick Stolee June 2, 2021, 2 a.m. UTC | #1
On 6/1/2021 2:31 PM, Tim Renouf wrote:
> When doing a checkout (or other index merge from a tree) causes the
> removal of a path that is outside sparse-checkout, the file is removed
> from the working tree, even if it is dirty.
> 
> That is probably what you want if the file got there by being
> materialized in a merge conflict. But it is not what you want if you
> deliberately put the file there.
> 
> This commit adds the above config item, defaulting to "true" to get the
> old behavior. Set it to "false" to avoid removing such a file from the
> worktree.

I don't think config is necessary here. This behavior is niche
enough to create a behavior-breaking change. However, we do want
to ensure that the full flow of eventually clearing the file when
clean is handled.

> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index a0eeaeb02e..31adb38b1d 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -111,6 +111,17 @@ the sparse-checkout file.
>  To repopulate the working directory with all files, use the
>  `git sparse-checkout disable` command.
>  
> +The `core.sparseCheckoutRmFiles` config setting

If we _are_ going to go with a config option, then I'm not a big
fan of this name. I've also been thinking that the sparse-checkout
config has been squatting in the "core.*" space too long. Perhaps
it is time to create its own section?

What about something like sparseCheckout.keepDirtyFiles, which
defaults to false?

Alternatively, we could add things to the index.* space. We
already have "index.sparse" for the sparse index feature. For this
topic, a config option "index.keepDirtySparseFiles" would have a
similar meaning to my other suggestion.

At the least, you would need to update the appropriate file in
Documentation/config/*.txt.

>  extern int core_apply_sparse_checkout;
>  extern int core_sparse_checkout_cone;
> +extern int core_sparse_checkout_rm_files;

These previous variables being global is unfortunate and it
might be time to fix that. At minimum, I think this new
option might be able to inject somewhere else without
running a lot of git_config_get_value() calls in a loop.

Since your changes are within unpack-trees.c, maybe adding
a flag to 'struct unpack_trees_options' would suffice. It
could be initialized in unpack_trees() so only happen once
per index update.

> +test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' '

This test name is too long. Perhaps

	'sparse-checkout removes dirty non-matching file'

> +	git config core.sparseCheckout false &&
> +	git checkout -f top &&
> +	echo added3 >added3 &&
> +	git add added3 &&
> +	git commit -madded3 &&
> +	git checkout top &&
> +	test_path_is_missing added3 &&
> +	git config core.sparseCheckout true &&
> +	git config core.sparseCheckoutRmFiles true &&
> +	echo init.t >.git/info/sparse-checkout &&

Perhaps we could use a more modern approach, such as

	git sparse-checkout init &&
	git sparse-checkout set init.t &&

(and use 'git sparse-checkout disable' at the start of the
test.)

> +	git checkout HEAD@{1} &&

I'd typically expect 'git checkout HEAD~1' instead of
using the reflog, since it is more robust to changing
the test in the future. Better even to create a new
branch earlier and then switch between named branches.

> +	test_path_is_missing added3 &&
> +	echo dirty >added3 &&

I appreciate that you confirm the file is missing before
you create it.

> +	git checkout top &&
> +	test_path_is_missing added3

Thought: does this happen also with 'git sparse-checkout
reapply'?

> +'
> +
> +test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' '

and here:

	'sparse-checkout keeps dirty non-matching file'

These tests are very similar. Perhaps they could be
grouped and just have a check at the end that makes
'added3' dirty and checks the behavior of 'git checkout'
with the two config settings?

> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1528,7 +1528,9 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  
>  	/*
>  	 * 1. Pretend the narrowest worktree: only unmerged entries
> -	 * are checked out
> +	 * are checked out. If core.sparseCheckoutRmFiles is off, then
> +	 * treat a file being removed as merged, so it does not get
> +	 * removed from the worktree.
>  	 */
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
> @@ -1536,7 +1538,8 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  		if (select_flag && !(ce->ce_flags & select_flag))
>  			continue;
>  
> -		if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED))
> +		if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) ||
> +		    ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files))
>  			ce->ce_flags |= skip_wt_flag;
>  		else
>  			ce->ce_flags &= ~skip_wt_flag;
> @@ -1681,12 +1684,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  
>  	if (!o->skip_sparse_checkout) {
>  		/*
> -		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1
> +		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1.
> +		 * If !core.sparseCheckoutRmFiles, include files being removed so ones
> +		 * outside sparse-checkout patterns do not get removed from the worktree.
>  		 * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE
>  		 * so apply_sparse_checkout() won't attempt to remove it from worktree
>  		 */
> +		int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE;
>  		mark_new_skip_worktree(o->pl, &o->result,
> -				       CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
> +				       mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
>  				       o->verbose_update);

I'm a bit worried about this use of CE_REMOVE. I see its use listed in
cache-tree.c with this comment:

		/*
		 * CE_REMOVE entries are removed before the index is
		 * written to disk. Skip them to remain consistent
		 * with the future on-disk index.
		 */

This makes me think that the entries are actually not present in the
written index, which is incorrect. It will make it look like we have
staged a deletion of that file. Can you check that the output of
'git status' shows the file with no staged changes, but an unstaged
_modification_? Alternatively: the modification might be ignored since
it is a sparse entry, but we would probably want to fix that before
taking this change. If my understanding is correct*, then 'git status'
will show it as a staged deletion and an unstaged addition.

(*) This is a BIG IF.

Thank you for your interest here. Elijah is right that the area is
fraught with complexity, and I'm learning something new about it
every day as I work on my sparse-index feature. I'm still trying
to grasp the subtleties like this and their ramifications before
changing the existing behavior because I want to be _sure_ we are
moving in a more stable direction and not just another unstable
point that frustrates users.

This change seems like a focused attempt, but I think we need to
track down those other details before committing to such a change:

1. How does a user with a dirty, tracked, sparse file get back
   into a state where that file is deleted? What commands do
   they need to run? Can that be tested and added to the sparse-
   checkout documentation?

2. What does 'git status' look like when a user is in this state?
   Is that helpful?

Thanks,
-Stolee
Junio C Hamano June 2, 2021, 2:32 a.m. UTC | #2
Derrick Stolee <stolee@gmail.com> writes:

>> This commit adds the above config item, defaulting to "true" to get the
>> old behavior. Set it to "false" to avoid removing such a file from the
>> worktree.
>
> I don't think config is necessary here. This behavior is niche
> enough to create a behavior-breaking change. However, we do want
> to ensure that the full flow of eventually clearing the file when
> clean is handled.

I didn't have a chance to get around to commenting on the patch
(instead I was preparing for -rc3), but you covered pretty much
everything I wanted to say.  It is unusual for those who are using
sparse checkout to have a modified (=tracked and dirty) file that
shouldn't be there, and making sure the user notices these unusual
files, instead of silently losing the changes to them, is probably a
"bugfix".

An explicit request to destructively overwrite the path ("git
restore -- path") or remove the working tree file along with
modification ("git reset --hard") is a good thing to have, but
the branch switching "git switch" is supposed to preserve local
modification (or fail to switch), whether the dirty path is inside
or outside the sparse checkout area.

> If we _are_ going to go with a config option, then I'm not a big
> fan of this name. I've also been thinking that the sparse-checkout
> config has been squatting in the "core.*" space too long. Perhaps
> it is time to create its own section?
>
> What about something like sparseCheckout.keepDirtyFiles, which
> defaults to false?

What about not having a configuration?

> 1. How does a user with a dirty, tracked, sparse file get back
>    into a state where that file is deleted? What commands do
>    they need to run? Can that be tested and added to the sparse-
>    checkout documentation?
>
> 2. What does 'git status' look like when a user is in this state?
>    Is that helpful?

Good points.

Thanks.
Elijah Newren June 2, 2021, 5:53 a.m. UTC | #3
Derrick and Junio already made some really good points, I'll just try
to add a few comments.

On Tue, Jun 1, 2021 at 7:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Derrick Stolee <stolee@gmail.com> writes:
>
> >> This commit adds the above config item, defaulting to "true" to get the
> >> old behavior. Set it to "false" to avoid removing such a file from the
> >> worktree.
> >
> > I don't think config is necessary here. This behavior is niche
> > enough to create a behavior-breaking change. However, we do want
> > to ensure that the full flow of eventually clearing the file when
> > clean is handled.
>
> I didn't have a chance to get around to commenting on the patch
> (instead I was preparing for -rc3), but you covered pretty much
> everything I wanted to say.  It is unusual for those who are using
> sparse checkout to have a modified (=tracked and dirty) file that
> shouldn't be there, and making sure the user notices these unusual
> files, instead of silently losing the changes to them, is probably a
> "bugfix".
>
> An explicit request to destructively overwrite the path ("git
> restore -- path") or remove the working tree file along with
> modification ("git reset --hard") is a good thing to have, but
> the branch switching "git switch" is supposed to preserve local
> modification (or fail to switch), whether the dirty path is inside
> or outside the sparse checkout area.

Yes, I think this is the best phrasing of the bug.  Since
SKIP_WORKTREE was originally implemented as "assume file contents
match HEAD" (which is sometimes a reasonable first approximation to
the intended behavior, but which keeps leading us astray into bad
behavior), it's pretty easy to see what's likely happening here:
unpack-trees.c is almost certainly just assuming that the file
contents match HEAD and not even bothering to verify that assumption
before deleting the file.  We should make it verify when a file is
SKIP_WORKTREE.

> > If we _are_ going to go with a config option, then I'm not a big
> > fan of this name. I've also been thinking that the sparse-checkout
> > config has been squatting in the "core.*" space too long. Perhaps
> > it is time to create its own section?
> >
> > What about something like sparseCheckout.keepDirtyFiles, which
> > defaults to false?
>
> What about not having a configuration?

I agree; we all concur that the reported behavior is a bug; adding a
config option to turn a bug off for some people just doesn't make any
sense.  Let's fix the bug instead.

I'm also worried that making a config option may have masked subtle
bugs in the patch that the rest of the testsuite would have turned up.

> > 1. How does a user with a dirty, tracked, sparse file get back
> >    into a state where that file is deleted? What commands do
> >    they need to run? Can that be tested and added to the sparse-
> >    checkout documentation?

Yeah, I've wondered if 'git sparse-checkout reapply' should do
something special here...or whether additional subcommands might be
needed.  I'm still not quite sure.

> > 2. What does 'git status' look like when a user is in this state?
> >    Is that helpful?

I'm worried that trying to make 'git status' report these files would
be expensive and defeat some of the advantages of a sparse-checkout.
Even in cone mode, and even if we could just stat the leading
directories to see they aren't there, would we still end up iterating
over all the index entries just to verify that their leading directory
is missing so we don't have to stat it?

Whereas checkout, if it might end up deleting something, it makes
perfect sense to pay the cost of a stat at that point and throw an
error message and abort if things aren't clean.
Tim Renouf (open source) June 2, 2021, 6:13 a.m. UTC | #4
Hi all

Thanks for the reviews and comments.

My use case is that I never want it to remove or otherwise touch those files outside of sparse-checkout that happen to be the same path as index paths. I realize that currently gives me complications (e.g. I must never try and merge/cherry-pick/rebase a commit that might cause a merge conflict out there), and I realize that’s not what everyone else wants. For example, I don’t want git reset --hard or whatever to remove those files. Hence the config option.

Am I right in saying that the sparse-index work makes it easier to achieve my use case? In that those outside-sparse-checkout paths would not ever get merged into the index, even if I merge/cherry-pick/rebase a tree with paths there?

I can go into more details on how I arrived ay my use case if it helps.

So maybe there are two separate things here:

1. The bug that checkout removes my file when it is dirty, instead of refusing (unless -f) or just ignoring it.
2. My use case, which is to do its best to never remove or otherwise touch worktree files outside sparse-checkout.

> I'm also worried that making a config option may have masked subtle
> bugs in the patch that the rest of the testsuite would have turned up.

It is true that not hiding it in a config option makes a few tests fail, including ones that specifically test that a git reset after a materialization from a merge conflict causes the file to be removed.

Thanks.

-tpr
Elijah Newren June 2, 2021, 11:41 p.m. UTC | #5
Hi,

On Tue, Jun 1, 2021 at 11:14 PM Tim Renouf (open source)
<tpr.ll@botech.co.uk> wrote:
>
> Hi all
>
> Thanks for the reviews and comments.
>
> My use case is that I never want it to remove or otherwise touch those files outside of sparse-checkout that happen to be the same path as index paths. I realize that currently gives me complications (e.g. I must never try and merge/cherry-pick/rebase a commit that might cause a merge conflict out there), and I realize that’s not what everyone else wants. For example, I don’t want git reset --hard or whatever to remove those files. Hence the config option.

Every case I've ever seen of present-despite-skip-worktree files is an
accidental and erroneous condition.  If you're trying to use it for
something else, we'd really need to understand the higher level
use-cases so that we can make ranges of commands work together nicely.
The sparse checkout capability itself was started by a low-level
implementational detail ("treat paths as though they matched HEAD and
don't write them to the working tree") and led to a maze of surprises,
bugs, edge cases, etc.  I'd rather not compound that problem by adding
another configuration option defined via a similar low-level
implementational detail.

I'm also leaning towards having `git reset --hard` not clear
present-despite-skip-worktree files, and not having `git status`
report them; both seem like an unnecessary performance issue for this
rare accidental/erroneous condition.  I totally agree that `git
switch/checkout` should not delete or overwrite such files if they
differ from HEAD; but similar to how having a dirty file prevents a
branch switch to some branch that deletes the file, a
present-despite-skip-worktree file that differs from HEAD should
result in an error message and an aborted switch/checkout.  At least
for the standard sparse-checkout behavior; don't know what your
usecase is.

> Am I right in saying that the sparse-index work makes it easier to achieve my use case? In that those outside-sparse-checkout paths would not ever get merged into the index, even if I merge/cherry-pick/rebase a tree with paths there?

If you merge/cherry-pick/rebase a commit with changes to a path
outside the sparse-checkout, and it merges cleanly with your current
branch, then with sparse-index it wouldn't even show up in the index
(though one of its parent trees would be modified because of the
changes to that file).  But that's not very different than without the
sparse-index, at least with the ort merge backend (available and ready
for using starting with git-2.32).  The recursive merge strategy (the
default) is known to write files to the working tree despite the
sparse-checkout requests, even when the merge is clean, but that's
just a bug in the recursive backend.  (One that isn't easily fixable
within its design, which is one of many reasons it was being written
in the form of the ort backend.)

If you merge/cherry-pick/rebase a commit with changes to a path
outside the sparse-checkout, and it conflicts with your branch, then
with or without sparse-index, that file is going to show up in the
index with higher order stages and be written to the working tree.
That is, so long as you don't have a file in the way.  Since the ort
backend makes use of the checkout machinery to switch from the current
state to the new state, fixing checkout to throw an error and abort
when a present-despite-skip-worktree file is present would cause
merges/rebases/cherry-picks to do the same.

> I can go into more details on how I arrived ay my use case if it helps.
>
> So maybe there are two separate things here:
>
> 1. The bug that checkout removes my file when it is dirty, instead of refusing (unless -f) or just ignoring it.

Yep, we need to fix this.

> 2. My use case, which is to do its best to never remove or otherwise touch worktree files outside sparse-checkout.

Can you expand on this use case a bit?  Should git diff or git status
report on your changes for your usecase?  Should "git restore" ignore
the given paths and not overwrite it?  What if the user explicitly
listed the path in question?  Should stash save these changes or
ignore them?  Should add include these changes or ignore them?  Since
your indexed file is different than the worktree file, should "git mv"
move just the file recorded in the index, just the one in the
worktree, or both?  If someone tries to run "git archive", should your
modified files be included?  If you don't like anything touching these
paths, does that mean they are also uninteresting?  So for example,
should "git log -p" or "git grep $REVISION" only return results inside
the sparse-checkout list of paths?

From a higher level, what are you trying to achieve?  Is it similar to
`git update-index --assume-unchanged`?  The point of sparse-checkout
was to reduce the number of files in the working tree, but it appears
you aren't trying to do that; you are putting files back into the
working tree anyway.  So, what's the point of sparse-checkout then?

It's possible sparse-checkout doesn't meet your needs, and just isn't
designed to meet them, and we need another special concept for your
case.  Or perhaps there's a config option that's meaningful.  Or maybe
you're just struggling with the bugs of sparse-checkout, of which
there are *many*.  See e.g.
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

> > I'm also worried that making a config option may have masked subtle
> > bugs in the patch that the rest of the testsuite would have turned up.
>
> It is true that not hiding it in a config option makes a few tests fail, including ones that specifically test that a git reset after a materialization from a merge conflict causes the file to be removed.

Yeah, not so surprising that it has weird interactions with merging
(and perhaps different weird interactions with different merge
backends).  Anything that touches unpack-trees.c either needs to be
part of standard operation, or if it's behind a special config option,
then it needs to be accompanied with a huge number of tests from many
different commands since it affects so many things.  We're trying to
add a battery of tests for sparse-checkout and sparse-index, and we
still obviously need a lot more.

Hope that helps,
Elijah
Tim Renouf (open source) June 5, 2021, 3:33 p.m. UTC | #6
Hi all

> Can you expand on this use case a bit? 

Fundamentally, I wanted sparse-checkout to cause all git commands to completely ignore all files outside sparse-checkout patterns, even if they happen to correspond to paths in the index. I realize that that is difficult to achieve, but I was hoping that my patch would make it close enough to be workable. In particular, I would need to avoid doing a merge (cherry-pick, rebase) with conflicts outside the sparse-checkout patterns.

Here is the (maybe slightly specialized) use case:

Due to decisions out of my hands, I have a monorepo containing some shared code and some separate components in subdirectories. There is a “main” branch where the code is built and delivered from. Each component is developed separately and has its own dev branch, e.g. “dev-component1”, “dev-component2”. Many developers develop on just one component, so they check out its dev branch. The “dev-component1” branch still contains the “main” version of the shared code and the other components, so you can merge it up to the main branch.

But some people want to be able to develop on two components at the same time. So they want to check out component1 on its dev-component1 branch, and component2 on its dev-component2 branch. They would ideally want to do this at the same time as maintaining the overall monorepo directory layout. The monorepo does not give a good way to do that.

I was experimenting with an approach combining sparse-checkout, worktrees, and the core.worktree config item, as follows:

The monorepo has its main worktree at the root of the overall workspace, with sparse-checkout set to check out just the shared code not considered part of any separate component. Each component has its own worktree in its subdirectory, with sparse-checkout set to only check out that component. To get the monorepo-like directory structure, a component’s worktree has its core.worktree config item set to point back up to the root of the overall workspace.

So, with that somewhat hairy structure, anything that touches a file outside of the worktree’s sparse-checkout actually touches a file in a different worktree, giving some surprising results.

Even with my patch, there is still a risk of that happening. And having a component’s worktree rooted somewhere other than where you think it is rooted gave some other surprising results.

I think I am going to abandon that approach and go a different way.

Thanks for the attention.

-tpr
diff mbox series

Patch

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index a0eeaeb02e..31adb38b1d 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -111,6 +111,17 @@  the sparse-checkout file.
 To repopulate the working directory with all files, use the
 `git sparse-checkout disable` command.
 
+The `core.sparseCheckoutRmFiles` config setting determines what to do when a
+checkout (or other index merge from a tree) causes the removal of a path that
+is outside the sparse-checkout patterns but the file exists in the worktree
+anyway. The default is `true`, which causes such a file to be removed from the
+worktree, which is probably what you want to remove outside-sparse-checkout
+files that were materialized by a merge conflict. Setting it to `false` means
+that such a file is not removed, which is probably what you want if you
+deliberately have files in the outside-sparse-checkout part of a worktree.
+
+The behavior with regard to worktree files that are outside sparse-checkout
+patterns is likely to change in the future.
 
 FULL PATTERN SET
 ----------------
diff --git a/cache.h b/cache.h
index 6fda8091f1..19ee1cfc02 100644
--- a/cache.h
+++ b/cache.h
@@ -964,6 +964,7 @@  extern const char *core_fsmonitor;
 
 extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
+extern int core_sparse_checkout_rm_files;
 
 /*
  * Include broken refs in all ref iterations, which will
diff --git a/config.c b/config.c
index 6428393a41..dd24e753a8 100644
--- a/config.c
+++ b/config.c
@@ -1552,6 +1552,11 @@  static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.sparsecheckoutrmfiles")) {
+		core_sparse_checkout_rm_files = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.precomposeunicode")) {
 		precomposed_unicode = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index 2f27008424..cff6bbbe62 100644
--- a/environment.c
+++ b/environment.c
@@ -70,6 +70,7 @@  char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
 int core_sparse_checkout_cone;
+int core_sparse_checkout_rm_files = 1;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 24092c09a9..67690b31c3 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -280,4 +280,40 @@  test_expect_success 'checkout with --ignore-skip-worktree-bits' '
 	git diff --exit-code HEAD
 '
 
+test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' '
+	git config core.sparseCheckout false &&
+	git checkout -f top &&
+	echo added3 >added3 &&
+	git add added3 &&
+	git commit -madded3 &&
+	git checkout top &&
+	test_path_is_missing added3 &&
+	git config core.sparseCheckout true &&
+	git config core.sparseCheckoutRmFiles true &&
+	echo init.t >.git/info/sparse-checkout &&
+	git checkout HEAD@{1} &&
+	test_path_is_missing added3 &&
+	echo dirty >added3 &&
+	git checkout top &&
+	test_path_is_missing added3
+'
+
+test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' '
+	git config core.sparseCheckout false &&
+	git checkout -f top &&
+	echo added4 >added4 &&
+	git add added4 &&
+	git commit -madded4 &&
+	git checkout top &&
+	test_path_is_missing added4 &&
+	git config core.sparseCheckout true &&
+	git config core.sparseCheckoutRmFiles false &&
+	echo init.t >.git/info/sparse-checkout &&
+	git checkout HEAD@{1} &&
+	test_path_is_missing added4 &&
+	echo dirty >added4 &&
+	git checkout top &&
+	test_path_is_file added4
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 9298fe1d9b..cdc3915974 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1528,7 +1528,9 @@  static void mark_new_skip_worktree(struct pattern_list *pl,
 
 	/*
 	 * 1. Pretend the narrowest worktree: only unmerged entries
-	 * are checked out
+	 * are checked out. If core.sparseCheckoutRmFiles is off, then
+	 * treat a file being removed as merged, so it does not get
+	 * removed from the worktree.
 	 */
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
@@ -1536,7 +1538,8 @@  static void mark_new_skip_worktree(struct pattern_list *pl,
 		if (select_flag && !(ce->ce_flags & select_flag))
 			continue;
 
-		if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED))
+		if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) ||
+		    ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files))
 			ce->ce_flags |= skip_wt_flag;
 		else
 			ce->ce_flags &= ~skip_wt_flag;
@@ -1681,12 +1684,15 @@  int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 
 	if (!o->skip_sparse_checkout) {
 		/*
-		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1
+		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1.
+		 * If !core.sparseCheckoutRmFiles, include files being removed so ones
+		 * outside sparse-checkout patterns do not get removed from the worktree.
 		 * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE
 		 * so apply_sparse_checkout() won't attempt to remove it from worktree
 		 */
+		int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE;
 		mark_new_skip_worktree(o->pl, &o->result,
-				       CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
+				       mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
 				       o->verbose_update);
 
 		ret = 0;