[v3] repo_read_index: add config to expect files outside sparse patterns

Message ID 20220224052259.30498-1-newren@gmail.com (mailing list archive)
State Superseded
Series [v3] repo_read_index: add config to expect files outside sparse patterns

Commit Message

Elijah Newren Feb. 24, 2022, 5:22 a.m. UTC
Typically with sparse checkouts, we expect files outside the sparsity
patterns to be marked as SKIP_WORKTREE and be missing from the working
tree.  In edge cases, this can be violated and cause confusion, so in a
sparse checkout, since 11d46a399d ("repo_read_index: clear SKIP_WORKTREE
bit from files present in worktree", 2022-01-06), Git automatically
clears the SKIP_WORKTREE bit at read time for entries corresponding to
files that are present in the working tree.

However, there is a more atypical situation where this situation would
be expected.  A Git-aware virtual file system[1] takes advantage of its
position as a file system driver to expose all files in the working
tree, fetch them on demand using partial clone on access, and tell Git
to pay attention to them on demand by updating the sparse checkout
pattern on writes.  This means that commands like "git status" only have
to examine files that have potentially been modified, whereas commands
like "ls" are able to show the entire codebase without requiring manual
updates to the sparse checkout pattern.

Thus since 11d46a399d, Git with such Git-aware virtual file systems
unsets the SKIP_WORKTREE bit for all files and commands like "git
status" have to fetch and examine them all.

Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
allow limiting the tracked set of files to a small set once again.  A
Git-aware virtual file system or other application that wants to
maintain files outside of the sparse checkout can set this in a
repository to instruct Git not to check for the presence of
SKIP_WORKTREE files.  The setting defaults to false, so most users of
sparse checkout will still get the benefit of an automatically updating
index to recover from the variety of difficult issues detailed in
11d46a399d for paths with SKIP_WORKTREE set despite the path being
present.

[1] such as the vfsd described in
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
Changes since v2:
  * Made tweaks to the commit message and the text of the config option as
    highlighted in my response to Jonathan's v2.

I'm guessing that since there are no code (only documentation) changes since
Jonathan's v2 submission, that this patch satisfies vfsd/Google's needs.
I'm also guessing it matches what Stolee and Dscho stated in their comments
on v1.  But it'd be nice to have an ack from each side just to make sure.
    
 Documentation/config.txt         |  2 ++
 Documentation/config/sparse.txt  | 28 ++++++++++++++++++++++++++++
 cache.h                          |  1 +
 config.c                         | 14 ++++++++++++++
 environment.c                    |  1 +
 sparse-index.c                   |  3 ++-
 t/t1090-sparse-checkout-scope.sh | 19 +++++++++++++++++++
 7 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/config/sparse.txt

Comments

Junio C Hamano Feb. 24, 2022, 6:24 p.m. UTC | #1
Elijah Newren <newren@gmail.com> writes:

> Typically with sparse checkouts, we expect files outside the sparsity
> patterns to be marked as SKIP_WORKTREE and be missing from the working
> tree.  In edge cases, this can be violated and cause confusion, so in a
> sparse checkout, since 11d46a399d ("repo_read_index: clear SKIP_WORKTREE

I think this refers to af6a5187 (repo_read_index: clear
SKIP_WORKTREE bit from files present in worktree, 2022-01-14).

> bit from files present in worktree", 2022-01-06), Git automatically
> clears the SKIP_WORKTREE bit at read time for entries corresponding to
> files that are present in the working tree.

So, this is a workflow where the user deliberately "creates" these
files outside the sparsity cone or pattern (by various non-automated
means like editing, copying/renaming, or untarring).  If they did so
on purpose, they may be interested in comparing them with existing
commits, or even including them as a newer version in the next
commit they create.  To help that workflow, clearing the bit makes
sense.

Am I on the right path?  I am wondering if mentioning some of that
would help understanding by the reader when it is contrasted with
the (competing) goal of supporting VFS use case mentioned next.

> However, there is a more atypical situation where this situation would

I wonder if that is "more atypical" (read: makes me wonder if what
is typical depends on who the reader is), and more importantly,
if it helps understanding of the reader (read: whichever one is
more common, we'd want to support both camps anyway).

    There is another workflow, however, that it is expected that
    paths outside the sparsity patterns appear to exist in the
    working tree and that they do not lose the SKIP_WORKTREE bit, at
    least until they get modified.

or something?

> be expected.  A Git-aware virtual file system[1] takes advantage of its
> position as a file system driver to expose all files in the working
> tree, fetch them on demand using partial clone on access, and tell Git
> to pay attention to them on demand by updating the sparse checkout
> pattern on writes.  This means that commands like "git status" only have
> to examine files that have potentially been modified, whereas commands
> like "ls" are able to show the entire codebase without requiring manual
> updates to the sparse checkout pattern.

Well explained.

> Thus since 11d46a399d, Git with such Git-aware virtual file systems

The same stale reference.

> unsets the SKIP_WORKTREE bit for all files and commands like "git
> status" have to fetch and examine them all.
>
> Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
> allow limiting the tracked set of files to a small set once again.  A
> Git-aware virtual file system or other application that wants to
> maintain files outside of the sparse checkout can set this in a
> repository to instruct Git not to check for the presence of
> SKIP_WORKTREE files.  The setting defaults to false, so most users of
> sparse checkout will still get the benefit of an automatically updating
> index to recover from the variety of difficult issues detailed in
> 11d46a399d for paths with SKIP_WORKTREE set despite the path being

Ditto.

> I'm guessing that since there are no code (only documentation) changes since
> Jonathan's v2 submission, that this patch satisfies vfsd/Google's needs.
> I'm also guessing it matches what Stolee and Dscho stated in their comments
> on v1.  But it'd be nice to have an ack from each side just to make sure.

True.  Let me queue but leave it just outside 'next' until that
happens.

I think the name of the knob is what Jonathan suggested, so I
presume that their side would be fine with it, but I am curious (I
do not wonder, though) what the plan on Microsoft's side is going
forward.  When they update the version of Git bundled in their vfsd,
would this be reverted and an equivalent they have (and they may
have more such "workaround" in other areas as well?) will be kept,
so whatever we do here will add a minor inconvenience to them but
will not hurt them otherwise?

> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 0000000000..fba504173c
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,28 @@
> +sparse.expectFilesOutsideOfPatterns::
> +	Typically with sparse checkouts, files not matching any
> +	sparsity patterns are marked as such in the index file and

s/index file/index/ perhaps.

> +	missing from the working tree.  Accordingly, Git will
> +	ordinarily check whether files that the index indicates are
> +	outside of the sparse area are present in the working tree and
> +	mark them as present in the index if so.  This option can be

Just an observation.  According to this sentence, "sparse area" is
"paths that ought to be present in the working tree", so paths
"outside of the sparse area" that are present need to be corrected
to be "in" the sparse area by futzing bits.  I always get confused
when I hear "sparse area" if the author meant "paths that ought to
be missing" or "present", but maybe it is just me.

> +	used to tell Git that such present-but-unmatching files are
> +	expected and to stop checking for them.

OK.

> ++
> +The default is `false`.  Paths which are marked as SKIP_WORKTREE
> +despite being present (which can occur for a few different reasons)
> +typically present a range of problems which are difficult for users to

s/typically // perhaps.

> +discover and recover from.  The default setting avoids such issues.
> ++
> +A Git-based virtual file system (VFS) can turn the usual expectation
> +on its head: files are present in the working copy but do not take
> +up much disk space because their contents are not downloaded until
> +they are accessed.  With such a virtual file system layer, most files
> +do not match the sparsity patterns at first, and the VFS layer
> +updates the sparsity patterns to add more files whenever files are
> +written.  Setting this to `true` supports such a setup where files are
> +expected to be present outside the sparse area and a separate, robust
> +mechanism is responsible for keeping the sparsity patterns up to date.

s/separate, robust/separate/ I would think.

We make the outside mechanism that makes these files appear to be
present to also be responsible for maintaining the sparse bit and
patterns.

When the user (or IDE) sets this knob to 'true', do we even have to
expect that files appear to be present?  In the use case we intend
to support with this feature, i.e. some VFS, we might expect all
paths to appear to be present, but if that VFS also allows users to
configure to expose only a subset of paths, not all paths may appear
to be present.  And we are perfectly OK with that, because we do not
expect anything about the working tree paths outside the sparsity
pattern.  Am I mistaken?

So, "... supports such a setup where some external system relieves
us of the responsibility of maintaining the consistency between the
presence of working tree files and sparsity patterns, so we stop
expecting whether files are present or missing outside the sparse
area", might be closer to the truth?

> +Note that the checking and clearing of the SKIP_WORKTREE bit only
> +happens when core.sparseCheckout is true, so this config option has no
> +effect unless core.sparseCheckout is true.

Good note to have.  There is no mention of "cone" mode in the entire
description; it is unclear if this only applies to "pattern" mode or
to both "pattern" and "cone" modes, which may want to be clarified.

Thanks.

Jonathan Nieder Feb. 25, 2022, 4:33 p.m. UTC | #2
Hi,

Elijah Newren wrote:

> Signed-off-by: Elijah Newren <newren@gmail.com>

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thanks, and sorry for the slow review.  My one remaining area for nits
is the documentation, but that can be improved iteratively via patches
on top.

[...]
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,28 @@
> +sparse.expectFilesOutsideOfPatterns::
> +	Typically with sparse checkouts, files not matching any
> +	sparsity patterns are marked as such in the index file and
> +	missing from the working tree.  Accordingly, Git will
> +	ordinarily check whether files that the index indicates are
> +	outside of the sparse area are present in the working tree and

Junio mentioned the "sparse area" could suggest that the area is
itself sparse and devoid of files, so it might not have been the best
choice of words on my part.  Perhaps "whether files that the index
indicates are not checked out are present in the working tree" would
work here?

> +	mark them as present in the index if so.  This option can be
> +	used to tell Git that such present-but-unmatching files are
> +	expected and to stop checking for them.
> ++
> +The default is `false`.  Paths which are marked as SKIP_WORKTREE
> +despite being present (which can occur for a few different reasons)
> +typically present a range of problems which are difficult for users to
> +discover and recover from.  The default setting avoids such issues.

The git-sparse-checkout(1) page never describes what SKIP_WORKTREE
means, so it might not be obvious to them what this means.  Also, the
"can occur for a few different reasons" may leave the user wondering
whether they are subject to those reasons.  What the reader wants to
know is "I should keep using the default because it makes Git work
better", so how about something like

 The default is `false`, which allows Git to automatically recover
 from the list of files in the index and working tree falling out of
 sync.
 +

?

> ++
> +A Git-based virtual file system (VFS) can turn the usual expectation
> +on its head: files are present in the working copy but do not take
> +up much disk space because their contents are not downloaded until
> +they are accessed.  With such a virtual file system layer, most files
> +do not match the sparsity patterns at first, and the VFS layer
> +updates the sparsity patterns to add more files whenever files are
> +written.  Setting this to `true` supports such a setup where files are
> +expected to be present outside the sparse area and a separate, robust
> +mechanism is responsible for keeping the sparsity patterns up to date.

Here I spent most of the words explaining what a Git-based VFS layer
is, which is also not too relevant to most users (who are just
interested in "is `true` the right value for me?").  How about
reducing it to the following?

 Set this to `true` if you are in a setup where extra files are expected
 to be present and a separate, robust mechanism is responsible for
 keeping the sparsity patterns up to date, such as a Git-aware virtual
 file system.

?

> ++
> +Note that the checking and clearing of the SKIP_WORKTREE bit only
> +happens when core.sparseCheckout is true, so this config option has no
> +effect unless core.sparseCheckout is true.

Good note.  Same nit about the user not necessarily knowing what
SKIP_WORKTREE means applies.  Also, we can remove the extra words
"Note that" since the dutiful reader should be noting everything we
say. :)  I think that would make

 +
 Regardless of this setting, Git does not check for
 present-but-unmatching files unless sparse checkout is enabled, so
 this config option has no effect unless `core.sparseCheckout` is
 `true`.

Thanks,
Jonathan

Elijah Newren Feb. 26, 2022, 5:58 a.m. UTC | #3
On Thu, Feb 24, 2022 at 10:24 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Typically with sparse checkouts, we expect files outside the sparsity
> > patterns to be marked as SKIP_WORKTREE and be missing from the working
> > tree.  In edge cases, this can be violated and cause confusion, so in a
> > sparse checkout, since 11d46a399d ("repo_read_index: clear SKIP_WORKTREE
>
> I think this refers to af6a5187 (repo_read_index: clear
> SKIP_WORKTREE bit from files present in worktree, 2022-01-14).

Yes, I'm usually pretty good about grabbing the commits you created
and merged rather than the local copies I submitted, but I messed it
up here.  Thanks for catching.

>
> > bit from files present in worktree", 2022-01-06), Git automatically
> > clears the SKIP_WORKTREE bit at read time for entries corresponding to
> > files that are present in the working tree.
>
> So, this is a workflow where the user deliberately "creates" these
> files outside the sparsity cone or pattern (by various non-automated
> means like editing, copying/renaming, or untarring).  If they did so
> on purpose, they may be interested in comparing them with existing
> commits, or even including them as a newer version in the next
> commit they create.  To help that workflow, clearing the bit makes
> sense.
>
> Am I on the right path?  I am wondering if mentioning some of that
> would help understanding by the reader when it is contrasted with
> the (competing) goal of supporting VFS use case mentioned next.

Yes, this is one of three ways that things can get out of sync.  Since
this commit was being added to en/present-despite-skipped which
spelled this out in detail and thus would appear just a few commits
before, I thought it wasn't worth repeating these details, but I
guessed wrong.  I'll include them here again.

> > However, there is a more atypical situation where this situation would
>
> I wonder if that is "more atypical" (read: makes me wonder if what
> is typical depends on who the reader is), and more importantly,
> if it helps understanding of the reader (read: whichever one is
> more common, we'd want to support both camps anyway).
>
>     There is another workflow, however, that it is expected that
>     paths outside the sparsity patterns appear to exist in the
>     working tree and that they do not lose the SKIP_WORKTREE bit, at
>     least until they get modified.
>
> or something?

I like it.

> > be expected.  A Git-aware virtual file system[1] takes advantage of its
> > position as a file system driver to expose all files in the working
> > tree, fetch them on demand using partial clone on access, and tell Git
> > to pay attention to them on demand by updating the sparse checkout
> > pattern on writes.  This means that commands like "git status" only have
> > to examine files that have potentially been modified, whereas commands
> > like "ls" are able to show the entire codebase without requiring manual
> > updates to the sparse checkout pattern.
>
> Well explained.
>
> > Thus since 11d46a399d, Git with such Git-aware virtual file systems
>
> The same stale reference.
>
> > unsets the SKIP_WORKTREE bit for all files and commands like "git
> > status" have to fetch and examine them all.
> >
> > Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
> > allow limiting the tracked set of files to a small set once again.  A
> > Git-aware virtual file system or other application that wants to
> > maintain files outside of the sparse checkout can set this in a
> > repository to instruct Git not to check for the presence of
> > SKIP_WORKTREE files.  The setting defaults to false, so most users of
> > sparse checkout will still get the benefit of an automatically updating
> > index to recover from the variety of difficult issues detailed in
> > 11d46a399d for paths with SKIP_WORKTREE set despite the path being
>
> Ditto.

Will fix all three.

> > I'm guessing that since there are no code (only documentation) changes since
> > Jonathan's v2 submission, that this patch satisfies vfsd/Google's needs.
> > I'm also guessing it matches what Stolee and Dscho stated in their comments
> > on v1.  But it'd be nice to have an ack from each side just to make sure.
>
> True.  Let me queue but leave it just outside 'next' until that
> happens.
>
> I think the name of the knob is what Jonathan suggested, so I
> presume that their side would be fine with it, but I am curious (I
> do not wonder, though) what the plan on Microsoft's side is going
> forward.  When they update the version of Git bundled in their vfsd,
> would this be reverted and an equivalent they have (and they may
> have more such "workaround" in other areas as well?) will be kept,
> so whatever we do here will add a minor inconvenience to them but
> will not hurt them otherwise?
>
> > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> > new file mode 100644
> > index 0000000000..fba504173c
> > --- /dev/null
> > +++ b/Documentation/config/sparse.txt
> > @@ -0,0 +1,28 @@
> > +sparse.expectFilesOutsideOfPatterns::
> > +     Typically with sparse checkouts, files not matching any
> > +     sparsity patterns are marked as such in the index file and
>
> s/index file/index/ perhaps.

Will fix.

> > +     missing from the working tree.  Accordingly, Git will
> > +     ordinarily check whether files that the index indicates are
> > +     outside of the sparse area are present in the working tree and
> > +     mark them as present in the index if so.  This option can be
>
> Just an observation.  According to this sentence, "sparse area" is
> "paths that ought to be present in the working tree", so paths
> "outside of the sparse area" that are present need to be corrected
> to be "in" the sparse area by futzing bits.  I always get confused
> when I hear "sparse area" if the author meant "paths that ought to
> be missing" or "present", but maybe it is just me.

I reworded this based on a combination of the feedback from you and
Jonathan.  I think it's clearer now; I'll resubmit soon.

>
> > +     used to tell Git that such present-but-unmatching files are
> > +     expected and to stop checking for them.
>
> OK.
>
> > ++
> > +The default is `false`.  Paths which are marked as SKIP_WORKTREE
> > +despite being present (which can occur for a few different reasons)
> > +typically present a range of problems which are difficult for users to
>
> s/typically // perhaps.

Sure.

> > +discover and recover from.  The default setting avoids such issues.
> > ++
> > +A Git-based virtual file system (VFS) can turn the usual expectation
> > +on its head: files are present in the working copy but do not take
> > +up much disk space because their contents are not downloaded until
> > +they are accessed.  With such a virtual file system layer, most files
> > +do not match the sparsity patterns at first, and the VFS layer
> > +updates the sparsity patterns to add more files whenever files are
> > +written.  Setting this to `true` supports such a setup where files are
> > +expected to be present outside the sparse area and a separate, robust
> > +mechanism is responsible for keeping the sparsity patterns up to date.
>
> s/separate, robust/separate/ I would think.
>
> We make the outside mechanism that makes these files appear to be
> present to also be responsible for maintaining the sparse bit and
> patterns.
>
> When the user (or IDE) sets this knob to 'true', do we even have to
> expect that files appear to be present?  In the use case we intend
> to support with this feature, i.e. some VFS, we might expect all
> paths to appear to be present, but if that VFS also allows users to
> configure to expose only a subset of paths, not all paths may appear
> to be present.  And we are perfectly OK with that, because we do not
> expect anything about the working tree paths outside the sparsity
> pattern.  Am I mistaken?
>
> So, "... supports such a setup where some external system relieves
> us of the responsibility of maintaining the consistency between the
> presence of working tree files and sparsity patterns, so we stop
> expecting whether files are present or missing outside the sparse
> area", might be closer to the truth?

Good point, and thanks for the suggested wording.

> > +Note that the checking and clearing of the SKIP_WORKTREE bit only
> > +happens when core.sparseCheckout is true, so this config option has no
> > +effect unless core.sparseCheckout is true.
>
> Good note to have.  There is no mention of "cone" mode in the entire
> description; it is unclear if this only applies to "pattern" mode or
> to both "pattern" and "cone" modes, which may want to be clarified.

Yeah, it applies to both pattern and cone modes.  I went with
Jonathan's wording, which I think sounded more precise and suggested
that only core.sparseCheckout=true matters:

"""
Regardless of this setting, Git does not check for
 present-but-unmatching files unless sparse checkout is enabled, so
 this config option has no effect unless `core.sparseCheckout` is
 `true`.
"""

Elijah Newren Feb. 26, 2022, 6:01 a.m. UTC | #4
On Fri, Feb 25, 2022 at 8:33 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Hi,
>
> Elijah Newren wrote:
>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
>
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
>
> Thanks, and sorry for the slow review.  My one remaining area for nits
> is the documentation, but that can be improved iteratively via patches
> on top.
>
> [...]
> > --- /dev/null
> > +++ b/Documentation/config/sparse.txt
> > @@ -0,0 +1,28 @@
> > +sparse.expectFilesOutsideOfPatterns::
> > +     Typically with sparse checkouts, files not matching any
> > +     sparsity patterns are marked as such in the index file and
> > +     missing from the working tree.  Accordingly, Git will
> > +     ordinarily check whether files that the index indicates are
> > +     outside of the sparse area are present in the working tree and
>
> Junio mentioned the "sparse area" could suggest that the area is
> itself sparse and devoid of files, so it might not have been the best
> choice of words on my part.  Perhaps "whether files that the index
> indicates are not checked out are present in the working tree" would
> work here?

I rewrote the paragraph.  I think it's more clear now; I'll resubmit
it here soon.

> > +     mark them as present in the index if so.  This option can be
> > +     used to tell Git that such present-but-unmatching files are
> > +     expected and to stop checking for them.
> > ++
> > +The default is `false`.  Paths which are marked as SKIP_WORKTREE
> > +despite being present (which can occur for a few different reasons)
> > +typically present a range of problems which are difficult for users to
> > +discover and recover from.  The default setting avoids such issues.
>
> The git-sparse-checkout(1) page never describes what SKIP_WORKTREE
> means, so it might not be obvious to them what this means.  Also, the
> "can occur for a few different reasons" may leave the user wondering
> whether they are subject to those reasons.  What the reader wants to
> know is "I should keep using the default because it makes Git work
> better", so how about something like
>
>  The default is `false`, which allows Git to automatically recover
>  from the list of files in the index and working tree falling out of
>  sync.
>  +
>
> ?

I like this.

> > ++
> > +A Git-based virtual file system (VFS) can turn the usual expectation
> > +on its head: files are present in the working copy but do not take
> > +up much disk space because their contents are not downloaded until
> > +they are accessed.  With such a virtual file system layer, most files
> > +do not match the sparsity patterns at first, and the VFS layer
> > +updates the sparsity patterns to add more files whenever files are
> > +written.  Setting this to `true` supports such a setup where files are
> > +expected to be present outside the sparse area and a separate, robust
> > +mechanism is responsible for keeping the sparsity patterns up to date.
>
> Here I spent most of the words explaining what a Git-based VFS layer
> is, which is also not too relevant to most users (who are just
> interested in "is `true` the right value for me?").  How about
> reducing it to the following?
>
>  Set this to `true` if you are in a setup where extra files are expected
>  to be present and a separate, robust mechanism is responsible for
>  keeping the sparsity patterns up to date, such as a Git-aware virtual
>  file system.
>
> ?

I like this, but I also added in some of the wording suggestions from
Junio here, so it's a bit longer but has both some of his suggested
wording and yours for slightly different aspects that I think work
well together.

>
> > ++
> > +Note that the checking and clearing of the SKIP_WORKTREE bit only
> > +happens when core.sparseCheckout is true, so this config option has no
> > +effect unless core.sparseCheckout is true.
>
> Good note.  Same nit about the user not necessarily knowing what
> SKIP_WORKTREE means applies.  Also, we can remove the extra words
> "Note that" since the dutiful reader should be noting everything we
> say. :)  I think that would make
>
>  +
>  Regardless of this setting, Git does not check for
>  present-but-unmatching files unless sparse checkout is enabled, so
>  this config option has no effect unless `core.sparseCheckout` is
>  `true`.

I like this too.  Thanks for the suggestions, the proposed changes,
and the review.
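
The interaction between the two settings agreed on above can be sketched
as a small shell helper.  It mirrors the early-return condition this patch
adds to clear_skip_worktree_from_present_files(); the function name
present_file_check_status and its messages are hypothetical and not part
of Git.

```shell
#!/bin/sh
# Sketch: report whether Git would check for present-but-unmatching files
# in a repository.  present_file_check_status is a hypothetical helper.
present_file_check_status() {
	dir=${1:-.}
	# "git config --get" exits non-zero when the key is unset; treat
	# that as the documented default of false.
	sparse=$(git -C "$dir" config --type=bool --get core.sparseCheckout || echo false)
	expect=$(git -C "$dir" config --type=bool --get sparse.expectFilesOutsideOfPatterns || echo false)

	if [ "$sparse" != true ]; then
		echo "skipped: sparse checkout is disabled"
	elif [ "$expect" = true ]; then
		echo "skipped: files outside patterns are expected"
	else
		echo "active: SKIP_WORKTREE is cleared for present files"
	fi
}
```

Calling it as `present_file_check_status /path/to/repo` prints one of the
three lines; cone mode is not consulted, in line with the setting applying
to both pattern and cone modes.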

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b168f02dc3..8628ae2634 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -468,6 +468,8 @@  include::config/sequencer.txt[]
 
 include::config/showbranch.txt[]
 
+include::config/sparse.txt[]
+
 include::config/splitindex.txt[]
 
 include::config/ssh.txt[]
diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
new file mode 100644
index 0000000000..fba504173c
--- /dev/null
+++ b/Documentation/config/sparse.txt
@@ -0,0 +1,28 @@ 
+sparse.expectFilesOutsideOfPatterns::
+	Typically with sparse checkouts, files not matching any
+	sparsity patterns are marked as such in the index file and
+	missing from the working tree.  Accordingly, Git will
+	ordinarily check whether files that the index indicates are
+	outside of the sparse area are present in the working tree and
+	mark them as present in the index if so.  This option can be
+	used to tell Git that such present-but-unmatching files are
+	expected and to stop checking for them.
++
+The default is `false`.  Paths which are marked as SKIP_WORKTREE
+despite being present (which can occur for a few different reasons)
+typically present a range of problems which are difficult for users to
+discover and recover from.  The default setting avoids such issues.
++
+A Git-based virtual file system (VFS) can turn the usual expectation
+on its head: files are present in the working copy but do not take
+up much disk space because their contents are not downloaded until
+they are accessed.  With such a virtual file system layer, most files
+do not match the sparsity patterns at first, and the VFS layer
+updates the sparsity patterns to add more files whenever files are
+written.  Setting this to `true` supports such a setup where files are
+expected to be present outside the sparse area and a separate, robust
+mechanism is responsible for keeping the sparsity patterns up to date.
++
+Note that the checking and clearing of the SKIP_WORKTREE bit only
+happens when core.sparseCheckout is true, so this config option has no
+effect unless core.sparseCheckout is true.
diff --git a/cache.h b/cache.h
index 281f00ab1b..b6b8e83ae3 100644
--- a/cache.h
+++ b/cache.h
@@ -1003,6 +1003,7 @@  extern const char *core_fsmonitor;
 
 extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
+extern int sparse_expect_files_outside_of_patterns;
 
 /*
  * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
diff --git a/config.c b/config.c
index 2bffa8d4a0..9b9ad1500a 100644
--- a/config.c
+++ b/config.c
@@ -1544,6 +1544,17 @@  static int git_default_core_config(const char *var, const char *value, void *cb)
 	return platform_core_config(var, value, cb);
 }
 
+static int git_default_sparse_config(const char *var, const char *value)
+{
+	if (!strcmp(var, "sparse.expectfilesoutsideofpatterns")) {
+		sparse_expect_files_outside_of_patterns = git_config_bool(var, value);
+		return 0;
+	}
+
+	/* Add other config variables here and to Documentation/config/sparse.txt. */
+	return 0;
+}
+
 static int git_default_i18n_config(const char *var, const char *value)
 {
 	if (!strcmp(var, "i18n.commitencoding"))
@@ -1675,6 +1686,9 @@  int git_default_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (starts_with(var, "sparse."))
+		return git_default_sparse_config(var, value);
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index fd0501e77a..fb55bf6129 100644
--- a/environment.c
+++ b/environment.c
@@ -70,6 +70,7 @@  char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
 int core_sparse_checkout_cone;
+int sparse_expect_files_outside_of_patterns;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
diff --git a/sparse-index.c b/sparse-index.c
index eed170cd8f..daeb5112a1 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -396,7 +396,8 @@  void clear_skip_worktree_from_present_files(struct index_state *istate)
 
 	int i;
 
-	if (!core_apply_sparse_checkout)
+	if (!core_apply_sparse_checkout ||
+	    sparse_expect_files_outside_of_patterns)
 		return;
 
 restart:
diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
index 3deb490187..d1833c0f31 100755
--- a/t/t1090-sparse-checkout-scope.sh
+++ b/t/t1090-sparse-checkout-scope.sh
@@ -52,6 +52,25 @@  test_expect_success 'return to full checkout of main' '
 	test "$(cat b)" = "modified"
 '
 
+test_expect_success 'skip-worktree on files outside sparse patterns' '
+	git sparse-checkout disable &&
+	git sparse-checkout set --no-cone "a*" &&
+	git checkout-index --all --ignore-skip-worktree-bits &&
+
+	git ls-files -t >output &&
+	! grep ^S output >actual &&
+	test_must_be_empty actual &&
+
+	test_config sparse.expectFilesOutsideOfPatterns true &&
+	cat <<-\EOF >expect &&
+	S b
+	S c
+	EOF
+	git ls-files -t >output &&
+	grep ^S output >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'in partial clone, sparse checkout only fetches needed blobs' '
 	test_create_repo server &&
 	git clone "file://$(pwd)/server" client &&