diff mbox series

[04/10] sparse-checkout: allow in-tree definitions

Message ID 2188577cd848d7cee77f06f1ad2b181864e5e36d.1588857462.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series In-tree sparse-checkout definitions | expand

Commit Message

Linus Arver via GitGitGadget May 7, 2020, 1:17 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

One of the difficulties of using the sparse-checkout feature is not
knowing which directories are absolutely needed for working in a portion
of the repository. Some of this can be documented in README files or
included in a bootstrapping tool along with the repository. This is done
in an ad-hoc way by every project that wants to use it.

Let's make this process easier for users by creating a way to define a
useful sparse-checkout definition inside the Git tree data. This has
several benefits. In particular, the data is available to anyone who has
a copy of the repository without needing a different data source.
Second, the needs of the repository can change over time and Git can
present a way to automatically update the working directory as these
sparse-checkout definitions change over time.

When sitting down to design this feature, there were several possible
options.

The first option is to literally include files in the repository that
could replace the sparse-checkout file. This presents full generality
for the repository, but the sparse-checkout patterns may look strange to
a non-expert. While the general case could be useful to some, we are
actively working to make "cone mode" the generally accepted way to work
with sparse-checkout. In cone mode, the user selects a set of
directories to include and all matches are directory-based prefix
matches. This is much faster but also much easier to understand.

Another option would be to interpret a file as a list of necessary
directories. This would match the input to 'git sparse-checkout set'
and output of 'git sparse-checkout list' when in cone mode. However,
there are subtleties around custom data parsers that may not make this
the safest approach.

Both of the above options also suffer from a drawback that the file
format is too simple to allow for extensions in the future. For example,
in a repository with many dependent parts, changing a necessary
directory in a core component would require updating every in-tree
sparse-checkout definition. Since those files are likely very long, this
presents a lot of noise for users to handle as they update these
dependencies.

Instead, let's create a format that can easily be extended to satisfy
this need and any other needs we may want to add in the future. For this
reason, I selected the config file format for these in-tree
sparse-checkout definitions.

To use this feature, add the "--in-tree" option when setting or adding
directories to the sparse-checkout definition. For example:

  $ git sparse-checkout set --in-tree .sparse/base
  $ git sparse-checkout add --in-tree .sparse/extra

These commands add values to the multi-valued config setting
"sparse.inTree". When updating the sparse-checkout definition, these
values describe paths in the repository to find the sparse-checkout
data. After the commands listed earlier, we expect to see the following
in .git/config.worktree:

	[sparse]
		intree = .sparse/base
		intree = .sparse/extra

When applying the sparse-checkout definitions from this config, the
blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. In those
files, the multi-valued config values "sparse.dir" are considered as
the directories to construct a cone mode sparse-checkout file. The end
result is as if these paths were provided to "git sparse-checkout set"
in cone mode.

For example, suppose .sparse/base had the following content:

	[sparse]
		dir = A
		dir = B/C
		dir = D/E/F

and .sparse/extra had the following content:

	[sparse]
		dir = D
		dir = X

Then, the output of "git sparse-checkout list" would be

	A
	B/C
	D
	X

Note that since "D" contains "D/E/F", that directory replaces the
position of "D/E/F" in the list.

Since these are parsed using the config library, the parser is robust
enough to understand comments and complicated string values.

The key benefit to this approach is that it can be extended by defining
new config values. In a later change, we will introduce "sparse.inherit"
to point to another file in the tree. This will solve the problem of
editing many files when core dependencies change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt              |   2 +
 Documentation/config/sparse.txt       |  15 +++
 Documentation/git-sparse-checkout.txt |  43 ++++++++-
 builtin/sparse-checkout.c             |  43 +++++++++
 sparse-checkout.c                     | 128 +++++++++++++++++++++++++-
 sparse-checkout.h                     |  11 +++
 t/t1091-sparse-checkout-builtin.sh    | 101 ++++++++++++++++++++
 7 files changed, 337 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/sparse.txt

Comments

Junio C Hamano May 7, 2020, 10:58 p.m. UTC | #1
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> One of the difficulties of using the sparse-checkout feature is not
> knowing which directories are absolutely needed for working in a portion
> of the repository. Some of this can be documented in README files or
> included in a bootstrapping tool along with the repository. This is done
> in an ad-hoc way by every project that wants to use it.
>
> Let's make this process easier for users by creating a way to define a
> useful sparse-checkout definition inside the Git tree data. This has
> several benefits. In particular, the data is available to anyone who has
> a copy of the repository without needing a different data source.
> Second, the needs of the repository can change over time and Git can
> present a way to automatically update the working directory as these
> sparse-checkout definitions change over time.

And two lines of development can merge them together?

Any time a new "feature" pops up that would eventually affect how
"git clone" and "git checkout" work based on untrusted user data, we
need to make sure there is no negative security implications.  

If it only boils down to "we have files that can record list of
leading directory names and without offering extra 'flexibility'", I
guess there aren't all that much that a malicious sparse definition
can do and we would be safe, though.

> To use this feature, add the "--in-tree" option when setting or adding
> directories to the sparse-checkout definition. For example:
>
>   $ git sparse-checkout set --in-tree .sparse/base
>   $ git sparse-checkout add --in-tree .sparse/extra
>
> These commands add values to the multi-valued config setting
> "sparse.inTree". When updating the sparse-checkout definition, these
> values describe paths in the repository to find the sparse-checkout
> data. After the commands listed earlier, we expect to see the following
> in .git/config.worktree:
>
> 	[sparse]
> 		intree = .sparse/base
> 		intree = .sparse/extra

What does this say in human words?  "These two tracked files specify
which paths should be in the working tree"?  Spelling it out here
would help readers of this commit.

> When applying the sparse-checkout definitions from this config, the
> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. 

OK, so end-user edit to the working tree copy or what is added to
the index does not count and only the committed version gets used.

That makes it simple---I was wondering how we would operate when
merging a branch with different contents in the .sparse/* files
until the conflicts are resolved.

> In those
> files, the multi-valued config values "sparse.dir" are considered as
> the directories to construct a cone mode sparse-checkout file. The end
> result is as if these paths were provided to "git sparse-checkout set"
> in cone mode.

OK.

> For example, suppose .sparse/base had the following content:
>
> 	[sparse]
> 		dir = A
> 		dir = B/C
> 		dir = D/E/F
>
> and .sparse/extra had the following content:
>
> 	[sparse]
> 		dir = D
> 		dir = X
>
> Then, the output of "git sparse-checkout list" would be
>
> 	A
> 	B/C
> 	D
> 	X
>
> Note that since "D" contains "D/E/F", that directory replaces the
> position of "D/E/F" in the list.
>
> Since these are parsed using the config library, the parser is robust
> enough to understand comments and complicated string values.
>
> The key benefit to this approach is that it can be extended by defining
> new config values. In a later change, we will introduce "sparse.inherit"
> to point to another file in the tree. This will solve the problem of
> editing many files when core dependencies change.

With only a multi-valued sparse.dir elements allowed in these
in-tree .sparse/* files, I guess there isn't much mischeaf a
malicious .sparse/* file can do.  Can it try to [includeIf] some
paths external to the repository to cause a remote attacker to learn
about the paths on the local system, perhaps? 

> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 00000000000..c1fce87cd33
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,15 @@
> +sparse.inTree::
> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
> +	feature, but if there are any values for the multi-valued
> +	`sparse.inTree` config option, then the sparse-checkout patterns are
> +	defined by parsing the files listed in these values. See
> +	linkgit:git-sparse-checkout[1] for more information.

Does "... but if ... then" mean "by parsing the files and ignoring
all other things that may otherwise define patterns"?

> +sparse.dir::
> +	This config setting is ignored if present in the repository config.
> +	Instead, this multi-valued option is present in the files listed by
> +	`sparse.inTree` and specifies the directories needed in the
> +	working directory. The union of all `sparse.dir` values across all
> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
> +	in cone mode.  See linkgit:git-sparse-checkout[1] for more
> +	information.

If this is *not* a config in the usual sense, we probably should not
include it in this document and in the "git config --help" output.
That will allow us to drop the first sentence.

Those .sparse/* in-tree files are like .gitmodules in the sense that
they happen to use the same syntax so that the parser can be shared,
but they are not allowed to affect end-user configuration
(e.g. writing "[diff] external=rm -fr ." in the file has no effect)
at all, right?

And we should describe what we can write in these in-tree files
separately and make it clear that they are _different_ from the
configuration variables.
Derrick Stolee May 8, 2020, 3:40 p.m. UTC | #2
On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> One of the difficulties of using the sparse-checkout feature is not
>> knowing which directories are absolutely needed for working in a portion
>> of the repository. Some of this can be documented in README files or
>> included in a bootstrapping tool along with the repository. This is done
>> in an ad-hoc way by every project that wants to use it.
>>
>> Let's make this process easier for users by creating a way to define a
>> useful sparse-checkout definition inside the Git tree data. This has
>> several benefits. In particular, the data is available to anyone who has
>> a copy of the repository without needing a different data source.
>> Second, the needs of the repository can change over time and Git can
>> present a way to automatically update the working directory as these
>> sparse-checkout definitions change over time.
> 
> And two lines of development can merge them together?
> 
> Any time a new "feature" pops up that would eventually affect how
> "git clone" and "git checkout" work based on untrusted user data, we
> need to make sure there is no negative security implications.  
> 
> If it only boils down to "we have files that can record list of
> leading directory names and without offering extra 'flexibility'", I
> guess there aren't all that much that a malicious sparse definition
> can do and we would be safe, though.

Yes. I hope that we can be extremely careful with this feature.
The RFC status of this series implicitly includes the question
"Should we do this at all?" I think the benefits outweigh the
risks, but we can minimize those risks with very careful design
and implementation.

>> To use this feature, add the "--in-tree" option when setting or adding
>> directories to the sparse-checkout definition. For example:
>>
>>   $ git sparse-checkout set --in-tree .sparse/base
>>   $ git sparse-checkout add --in-tree .sparse/extra
>>
>> These commands add values to the multi-valued config setting
>> "sparse.inTree". When updating the sparse-checkout definition, these
>> values describe paths in the repository to find the sparse-checkout
>> data. After the commands listed earlier, we expect to see the following
>> in .git/config.worktree:
>>
>> 	[sparse]
>> 		intree = .sparse/base
>> 		intree = .sparse/extra
> 
> What does this say in human words?  "These two tracked files specify
> which paths should be in the working tree"?  Spelling it out here
> would help readers of this commit.

You got it. Sounds good.

>> When applying the sparse-checkout definitions from this config, the
>> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. 
> 
> OK, so end-user edit to the working tree copy or what is added to
> the index does not count and only the committed version gets used.
> 
> That makes it simple---I was wondering how we would operate when
> merging a branch with different contents in the .sparse/* files
> until the conflicts are resolved.

It's worth testing this case so we can be sure what happens.

>> In those
>> files, the multi-valued config values "sparse.dir" are considered as
>> the directories to construct a cone mode sparse-checkout file. The end
>> result is as if these paths were provided to "git sparse-checkout set"
>> in cone mode.
> 
> OK.
> 
>> For example, suppose .sparse/base had the following content:
>>
>> 	[sparse]
>> 		dir = A
>> 		dir = B/C
>> 		dir = D/E/F
>>
>> and .sparse/extra had the following content:
>>
>> 	[sparse]
>> 		dir = D
>> 		dir = X
>>
>> Then, the output of "git sparse-checkout list" would be
>>
>> 	A
>> 	B/C
>> 	D
>> 	X
>>
>> Note that since "D" contains "D/E/F", that directory replaces the
>> position of "D/E/F" in the list.
>>
>> Since these are parsed using the config library, the parser is robust
>> enough to understand comments and complicated string values.
>>
>> The key benefit to this approach is that it can be extended by defining
>> new config values. In a later change, we will introduce "sparse.inherit"
>> to point to another file in the tree. This will solve the problem of
>> editing many files when core dependencies change.
> 
> With only a multi-valued sparse.dir elements allowed in these
> in-tree .sparse/* files, I guess there isn't much mischeaf a
> malicious .sparse/* file can do.  Can it try to [includeIf] some
> paths external to the repository to cause a remote attacker to learn
> about the paths on the local system, perhaps? 

I was unaware of includes in the config format [1]. While the behavior
should be safe because we are only pulling very specific data from the
config, it would be best to see if we can disable includes when reading
config from a blob.

[1] https://git-scm.com/docs/git-config#_includes

> 
>> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
>> new file mode 100644
>> index 00000000000..c1fce87cd33
>> --- /dev/null
>> +++ b/Documentation/config/sparse.txt
>> @@ -0,0 +1,15 @@
>> +sparse.inTree::
>> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
>> +	feature, but if there are any values for the multi-valued
>> +	`sparse.inTree` config option, then the sparse-checkout patterns are
>> +	defined by parsing the files listed in these values. See
>> +	linkgit:git-sparse-checkout[1] for more information.
> 
> Does "... but if ... then" mean "by parsing the files and ignoring
> all other things that may otherwise define patterns"?
> 
>> +sparse.dir::
>> +	This config setting is ignored if present in the repository config.
>> +	Instead, this multi-valued option is present in the files listed by
>> +	`sparse.inTree` and specifies the directories needed in the
>> +	working directory. The union of all `sparse.dir` values across all
>> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
>> +	in cone mode.  See linkgit:git-sparse-checkout[1] for more
>> +	information.
> 
> If this is *not* a config in the usual sense, we probably should not
> include it in this document and in the "git config --help" output.
> That will allow us to drop the first sentence.
> 
> Those .sparse/* in-tree files are like .gitmodules in the sense that
> they happen to use the same syntax so that the parser can be shared,
> but they are not allowed to affect end-user configuration
> (e.g. writing "[diff] external=rm -fr ." in the file has no effect)
> at all, right?
> 
> And we should describe what we can write in these in-tree files
> separately and make it clear that they are _different_ from the
> configuration variables.

I can move the "keys" from the config documentation and into the
git-sparse-checkout.txt file.

Thanks,
-Stolee
Elijah Newren May 20, 2020, 5:52 p.m. UTC | #3
On Fri, May 8, 2020 at 8:42 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> One of the difficulties of using the sparse-checkout feature is not
> >> knowing which directories are absolutely needed for working in a portion
> >> of the repository. Some of this can be documented in README files or
> >> included in a bootstrapping tool along with the repository. This is done
> >> in an ad-hoc way by every project that wants to use it.
> >>
> >> Let's make this process easier for users by creating a way to define a
> >> useful sparse-checkout definition inside the Git tree data. This has
> >> several benefits. In particular, the data is available to anyone who has
> >> a copy of the repository without needing a different data source.
> >> Second, the needs of the repository can change over time and Git can
> >> present a way to automatically update the working directory as these
> >> sparse-checkout definitions change over time.
> >
> > And two lines of development can merge them together?
> >
> > Any time a new "feature" pops up that would eventually affect how
> > "git clone" and "git checkout" work based on untrusted user data, we
> > need to make sure there is no negative security implications.
> >
> > If it only boils down to "we have files that can record list of
> > leading directory names and without offering extra 'flexibility'", I
> > guess there aren't all that much that a malicious sparse definition
> > can do and we would be safe, though.
>
> Yes. I hope that we can be extremely careful with this feature.
> The RFC status of this series implicitly includes the question
> "Should we do this at all?" I think the benefits outweigh the
> risks, but we can minimize those risks with very careful design
> and implementation.
>
> >> To use this feature, add the "--in-tree" option when setting or adding
> >> directories to the sparse-checkout definition. For example:
> >>
> >>   $ git sparse-checkout set --in-tree .sparse/base
> >>   $ git sparse-checkout add --in-tree .sparse/extra
> >>
> >> These commands add values to the multi-valued config setting
> >> "sparse.inTree". When updating the sparse-checkout definition, these
> >> values describe paths in the repository to find the sparse-checkout
> >> data. After the commands listed earlier, we expect to see the following
> >> in .git/config.worktree:
> >>
> >>      [sparse]
> >>              intree = .sparse/base
> >>              intree = .sparse/extra
> >
> > What does this say in human words?  "These two tracked files specify
> > which paths should be in the working tree"?  Spelling it out here
> > would help readers of this commit.
>
> You got it. Sounds good.
>
> >> When applying the sparse-checkout definitions from this config, the
> >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> >
> > OK, so end-user edit to the working tree copy or what is added to
> > the index does not count and only the committed version gets used.
> >
> > That makes it simple---I was wondering how we would operate when
> > merging a branch with different contents in the .sparse/* files
> > until the conflicts are resolved.
>
> It's worth testing this case so we can be sure what happens.

During a merge or rebase or checkout -m, what happens if .sparse/extra
has the following working tree content:

[sparse]
    dir = D
    dir = X
<<<<<< HEAD
    dir = Y
|||||| MERGE_BASE
======
    inherit = .sparse/tools
>>>>>>  MERGE_HEAD
    inherit = .sparse/base

and, of course, three different entries in the index?

Also, do we use the version of the --in-tree file from the latest
commit, from the index, or from the working tree?  (This is a question
not only for merge and rebase, but also checkout with dirty changes
and even checkout -m.)  Which one "wins"?

And what if the user updates and commits an ill-formed version of the
file -- is it equivalent to getting an empty cone with just the
toplevel directory, equivalent to getting a complete checkout of
everything, or something else?
Elijah Newren June 17, 2020, 11:07 p.m. UTC | #4
On Wed, May 20, 2020 at 10:52 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, May 8, 2020 at 8:42 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > >
> > >> One of the difficulties of using the sparse-checkout feature is not
> > >> knowing which directories are absolutely needed for working in a portion
> > >> of the repository. Some of this can be documented in README files or
> > >> included in a bootstrapping tool along with the repository. This is done
> > >> in an ad-hoc way by every project that wants to use it.
> > >>
> > >> Let's make this process easier for users by creating a way to define a
> > >> useful sparse-checkout definition inside the Git tree data. This has
> > >> several benefits. In particular, the data is available to anyone who has
> > >> a copy of the repository without needing a different data source.
> > >> Second, the needs of the repository can change over time and Git can
> > >> present a way to automatically update the working directory as these
> > >> sparse-checkout definitions change over time.
> > >
> > > And two lines of development can merge them together?
> > >
> > > Any time a new "feature" pops up that would eventually affect how
> > > "git clone" and "git checkout" work based on untrusted user data, we
> > > need to make sure there is no negative security implications.
> > >
> > > If it only boils down to "we have files that can record list of
> > > leading directory names and without offering extra 'flexibility'", I
> > > guess there aren't all that much that a malicious sparse definition
> > > can do and we would be safe, though.
> >
> > Yes. I hope that we can be extremely careful with this feature.
> > The RFC status of this series implicitly includes the question
> > "Should we do this at all?" I think the benefits outweigh the
> > risks, but we can minimize those risks with very careful design
> > and implementation.
> >
> > >> To use this feature, add the "--in-tree" option when setting or adding
> > >> directories to the sparse-checkout definition. For example:
> > >>
> > >>   $ git sparse-checkout set --in-tree .sparse/base
> > >>   $ git sparse-checkout add --in-tree .sparse/extra
> > >>
> > >> These commands add values to the multi-valued config setting
> > >> "sparse.inTree". When updating the sparse-checkout definition, these
> > >> values describe paths in the repository to find the sparse-checkout
> > >> data. After the commands listed earlier, we expect to see the following
> > >> in .git/config.worktree:
> > >>
> > >>      [sparse]
> > >>              intree = .sparse/base
> > >>              intree = .sparse/extra
> > >
> > > What does this say in human words?  "These two tracked files specify
> > > which paths should be in the working tree"?  Spelling it out here
> > > would help readers of this commit.
> >
> > You got it. Sounds good.
> >
> > >> When applying the sparse-checkout definitions from this config, the
> > >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> > >
> > > OK, so end-user edit to the working tree copy or what is added to
> > > the index does not count and only the committed version gets used.
> > >
> > > That makes it simple---I was wondering how we would operate when
> > > merging a branch with different contents in the .sparse/* files
> > > until the conflicts are resolved.
> >
> > It's worth testing this case so we can be sure what happens.
>
> During a merge or rebase or checkout -m, what happens if .sparse/extra
> has the following working tree content:
>
> [sparse]
>     dir = D
>     dir = X
> <<<<<< HEAD
>     dir = Y
> |||||| MERGE_BASE
> ======
>     inherit = .sparse/tools
> >>>>>>  MERGE_HEAD
>     inherit = .sparse/base
>
> and, of course, three different entries in the index?
>
> Also, do we use the version of the --in-tree file from the latest
> commit, from the index, or from the working tree?  (This is a question
> not only for merge and rebase, but also checkout with dirty changes
> and even checkout -m.)  Which one "wins"?
>
> And what if the user updates and commits an ill-formed version of the
> file -- is it equivalent to getting an empty cone with just the
> toplevel directory, equivalent to getting a complete checkout of
> everything, or something else?

Son pointed out that mercurial has a 'sparse' extension that has some
possible ideas of things we could do here; see
https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
for some further discussion.
Son Luong Ngoc June 18, 2020, 8:18 a.m. UTC | #5
Hi,

On Wed, Jun 17, 2020 at 04:07:01PM -0700, Elijah Newren wrote:
> 
> Son pointed out that mercurial has a 'sparse' extension that has some
> possible ideas of things we could do here; see
> https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
> for some further discussion.

I just want to note that you can find the latest version of FB's 'sparse'
extension here[1] and the tests for 'profile' feature could be found here[2].

Another relevant source of reading could be Google's Narrow extension for
Mercurial[3].

Cheers,
Son Luong.

[1]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/edenscm/hgext/sparse.py
[2]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/tests/test-sparse-profiles.t
[3]: https://bitbucket.org/Google/narrowhg/src/cb51d673e9c5820fc3da86a67f7e74b789820b4f/tests/test-merge.t#lines-63
diff mbox series

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 08b13ba72be..40f44948229 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -435,6 +435,8 @@  include::config/sequencer.txt[]
 
 include::config/showbranch.txt[]
 
+include::config/sparse.txt[]
+
 include::config/splitindex.txt[]
 
 include::config/ssh.txt[]
diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
new file mode 100644
index 00000000000..c1fce87cd33
--- /dev/null
+++ b/Documentation/config/sparse.txt
@@ -0,0 +1,15 @@ 
+sparse.inTree::
+	The `core.sparseCheckout` config option enables the `sparse-checkout`
+	feature, but if there are any values for the multi-valued
+	`sparse.inTree` config option, then the sparse-checkout patterns are
+	defined by parsing the files listed in these values. See
+	linkgit:git-sparse-checkout[1] for more information.
+
+sparse.dir::
+	This config setting is ignored if present in the repository config.
+	Instead, this multi-valued option is present in the files listed by
+	`sparse.inTree` and specifies the directories needed in the
+	working directory. The union of all `sparse.dir` values across all
+	`sparse.inTree` files forms the input for `git sparse-checkout set`
+	in cone mode.  See linkgit:git-sparse-checkout[1] for more
+	information.
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 1a3ace60820..da9322c5e41 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -62,13 +62,20 @@  directories (recursively) as well as files that are siblings of ancestor
 directories. The input format matches the output of `git ls-tree --name-only`.
 This includes interpreting pathnames that begin with a double quote (") as
 C-style quoted strings.
++
+When the `--in-tree` option is provided, the paths provided are interpreted
+as files within the working directory that are used to construct the
+`sparse-checkout` patterns. See 'IN-TREE PATTERN SET' below.
 
 'add'::
 	Update the sparse-checkout file to include additional patterns.
 	By default, these patterns are read from the command-line arguments,
 	but they can be read from stdin using the `--stdin` option. When
 	`core.sparseCheckoutCone` is enabled, the given patterns are interpreted
-	as directory names as in the 'set' subcommand.
+	as directory names as in the 'set' subcommand. When the `--in-tree`
+	option is provided, the input is interpreted as locations of files
+	describing a sparse-checkout definition as in the 'set' subcommand
+	and the 'IN-TREE PATTERN SET' section below.
 
 'reapply::
 	Reapply the sparsity pattern rules to paths in the working tree.
@@ -197,6 +204,40 @@  case-insensitive check. This corrects for case mismatched filenames in the
 directory.
 
 
+IN-TREE PATTERN SET
+-------------------
+
+As your project changes, your sparse-checkout pattern sets may also change.
+It is important to be able to construct a valid sparse-checkout pattern set
+when switching between points in history. The in-tree pattern sets allow
+versioning cone-mode sparse-checkout patterns next to your other artifacts.
+
+To enable the feature, create a sparse-checkout definition using the Git
+config format. The file should specify the multi-valued config variable
+`sparse.dir` to a list of directories to include in the sparse-checkout
+definition. If multiple files are specified, the resulting sparse-checkout
+definition is the union of all directories from all such files. For
+example, the following file contains a list of three directories, `A`,
+`B/C`, and `D/E/F`:
+
+----------------------------------
+[sparse]
+	dir = A
+	dir = B/C
+# Comments are allowed to describe
+# why a directory is necessary
+	dir = D/E/F
+----------------------------------
+
+Use `git sparse-checkout set --in-tree <path>` to initialize the patterns
+to those included in the file at `<path>`. This will override any existing
+patterns you have in your sparse-checkout file.
+
+After switching between commits with different versions of this file, run
+`git sparse-checkout reapply` to adjust the sparse-checkout patterns to
+the new definition.
+
+
 SUBMODULES
 ----------
 
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index fd247e428e4..621f1801c03 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -175,6 +175,7 @@  static char const * const builtin_sparse_checkout_set_usage[] = {
 
 static struct sparse_checkout_set_opts {
 	int use_stdin;
+	int in_tree;
 } set_opts;
 
 static void add_patterns_from_input(struct pattern_list *pl,
@@ -312,12 +313,52 @@  static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
 	return result;
 }
 
+static int modify_in_tree_list(int argc, const char **argv, enum modify_type m)
+{
+	int result = 0;
+	int i;
+	struct string_list sl = STRING_LIST_INIT_DUP;
+	struct pattern_list pl;
+
+	memset(&pl, 0, sizeof(pl));
+
+	switch(m) {
+	case ADD:
+		if (load_in_tree_list_from_config(the_repository, &sl))
+			return 1;
+		if (!sl.nr)
+			warning(_("the existing in-tree config has no entries; this overwrites the existing sparse-checkout definition."));
+		populate_sparse_checkout_patterns(&pl);
+		break;
+
+	case REPLACE:
+		hashmap_init(&pl.recursive_hashmap, pl_hashmap_cmp, NULL, 0);
+		hashmap_init(&pl.parent_hashmap, pl_hashmap_cmp, NULL, 0);
+		break;
+	}
+
+	for (i = 0; i < argc; i++)
+		string_list_insert(&sl, argv[i]);
+
+	if (load_in_tree_pattern_list(the_repository, &sl, &pl) ||
+	    set_sparse_in_tree_config(the_repository, &sl) ||
+	    write_patterns_and_update(&pl))
+		result = 1;
+
+	string_list_clear(&sl, 0);
+	clear_pattern_list(&pl);
+
+	return result;
+}
+
 static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
 			       enum modify_type m)
 {
 	static struct option builtin_sparse_checkout_set_options[] = {
 		OPT_BOOL(0, "stdin", &set_opts.use_stdin,
 			 N_("read patterns from standard in")),
+		OPT_BOOL(0, "in-tree", &set_opts.in_tree,
+			 N_("define the sparse-checkout from files in the tree")),
 		OPT_END(),
 	};
 
@@ -328,6 +369,8 @@  static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
 			     builtin_sparse_checkout_set_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
+	if (set_opts.in_tree)
+		return modify_in_tree_list(argc, argv, m);
 	return modify_pattern_list(argc, argv, m);
 }
 
diff --git a/sparse-checkout.c b/sparse-checkout.c
index 875b620568d..d6c27ca19c4 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -8,6 +8,7 @@ 
 #include "strbuf.h"
 #include "string-list.h"
 #include "unpack-trees.h"
+#include "object-store.h"
 
 char *get_sparse_checkout_filename(void)
 {
@@ -33,14 +34,113 @@  void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
 	}
 }
 
+int load_in_tree_list_from_config(struct repository *r,
+				  struct string_list *sl)
+{
+	struct string_list_item *item;
+	const struct string_list *cl;
+
+	cl = repo_config_get_value_multi(r, SPARSE_CHECKOUT_IN_TREE);
+
+	if (!cl)
+		return 1;
+
+	for_each_string_list_item(item, cl)
+		string_list_insert(sl, item->string);
+
+	return 0;
+}
+
+static int sparse_dir_cb(const char *var, const char *value, void *data)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct pattern_list *pl = (struct pattern_list *)data;
+
+	if (!strcmp(var, SPARSE_CHECKOUT_DIR)) {
+		strbuf_addstr(&path, value);
+		strbuf_to_cone_pattern(&path, pl);
+		strbuf_release(&path);
+	}
+
+	return 0;
+}
+
+static int load_in_tree_from_blob(struct pattern_list *pl,
+				  struct object_id *oid)
+{
+	return git_config_from_blob_oid(sparse_dir_cb,
+					SPARSE_CHECKOUT_DIR,
+					oid, pl);
+}
+
+int load_in_tree_pattern_list(struct repository *r,
+			      struct string_list *sl,
+			      struct pattern_list *pl)
+{
+	struct index_state *istate = r->index;
+	struct string_list_item *item;
+	struct strbuf path = STRBUF_INIT;
+
+	pl->use_cone_patterns = 1;
+
+	for_each_string_list_item(item, sl) {
+		struct object_id *oid;
+		enum object_type type;
+		int pos = index_name_pos(istate, item->string, strlen(item->string));
+
+		/*
+		 * Exit silently, as this is likely the case where Git
+		 * changed branches to a location where the inherit file
+		 * does not exist. Do not update the sparse-checkout.
+		 */
+		if (pos < 0)
+			return 1;
+
+		oid = &istate->cache[pos]->oid;
+		type = oid_object_info(r, oid, NULL);
+
+		if (type != OBJ_BLOB) {
+			warning(_("expected a file at '%s'; not updating sparse-checkout"),
+				oid_to_hex(oid));
+			return 1;
+		}
+
+		load_in_tree_from_blob(pl, oid);
+	}
+
+	strbuf_release(&path);
+
+	return 0;
+}
+
 int populate_sparse_checkout_patterns(struct pattern_list *pl)
 {
 	int result;
-	char *sparse = get_sparse_checkout_filename();
-
-	pl->use_cone_patterns = core_sparse_checkout_cone;
-	result = add_patterns_from_file_to_list(sparse, "", 0, pl, NULL);
-	free(sparse);
+	const char *in_tree;
+
+	if (!git_config_get_value(SPARSE_CHECKOUT_IN_TREE, &in_tree) &&
+	    in_tree) {
+		struct string_list paths = STRING_LIST_INIT_DUP;
+		/* If we do not have this config, skip this step! */
+		if (load_in_tree_list_from_config(the_repository, &paths) ||
+		    !paths.nr)
+			return 1;
+
+		/* Check diff for paths over from/to. If any changed, reload. */
+		/* or for now, reload always! */
+		hashmap_init(&pl->recursive_hashmap, pl_hashmap_cmp, NULL, 0);
+		hashmap_init(&pl->parent_hashmap, pl_hashmap_cmp, NULL, 0);
+		pl->use_cone_patterns = 1;
+
+		result = load_in_tree_pattern_list(the_repository, &paths, pl);
+		string_list_clear(&paths, 0);
+	} else {
+		char *sparse = get_sparse_checkout_filename();
+
+		pl->use_cone_patterns = core_sparse_checkout_cone;
+		result = add_patterns_from_file_to_list(sparse, "", 0, pl, NULL);
+		free(sparse);
+	}
 
 	return result;
 }
@@ -243,3 +343,21 @@  void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 
 	insert_recursive_pattern(pl, line);
 }
+
+int set_sparse_in_tree_config(struct repository *r, struct string_list *sl)
+{
+	struct string_list_item *item;
+	const char *config_path = git_path("config.worktree");
+
+	/* clear existing values */
+	git_config_set_multivar_in_file_gently(config_path,
+					       SPARSE_CHECKOUT_IN_TREE,
+					       NULL, NULL, 1);
+
+	for_each_string_list_item(item, sl)
+		git_config_set_multivar_in_file_gently(
+			config_path, SPARSE_CHECKOUT_IN_TREE,
+			item->string, CONFIG_REGEX_NONE, 0);
+
+	return 0;
+}
diff --git a/sparse-checkout.h b/sparse-checkout.h
index e0c840f07f9..993a5701a60 100644
--- a/sparse-checkout.h
+++ b/sparse-checkout.h
@@ -4,14 +4,25 @@ 
 #include "cache.h"
 #include "repository.h"
 
+#define SPARSE_CHECKOUT_DIR "sparse.dir"
+#define SPARSE_CHECKOUT_IN_TREE "sparse.intree"
+
 struct pattern_list;
 
 char *get_sparse_checkout_filename(void);
 int populate_sparse_checkout_patterns(struct pattern_list *pl);
 void write_patterns_to_file(FILE *fp, struct pattern_list *pl);
 int update_working_directory(struct pattern_list *pl);
+int write_patterns(struct pattern_list *pl, int and_update);
 int write_patterns_and_update(struct pattern_list *pl);
 void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path);
 void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl);
 
+int load_in_tree_list_from_config(struct repository *r,
+				  struct string_list *sl);
+int load_in_tree_pattern_list(struct repository *r,
+			      struct string_list *sl,
+			      struct pattern_list *pl);
+int set_sparse_in_tree_config(struct repository *r, struct string_list *sl);
+
 #endif
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 88cdde255cd..1040bf9c261 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -604,4 +604,105 @@  test_expect_success MINGW 'cone mode replaces backslashes with slashes' '
 	check_files repo/deep a deeper1
 '
 
+test_expect_success 'basis of --in-tree' '
+	git -C repo config auto.crlf false &&
+	cat >folder1 <<-\EOF &&
+	[sparse]
+		dir = folder1
+	EOF
+	cat >folder2 <<-\EOF &&
+	[sparse]
+		dir = folder2
+	EOF
+	cat >deep <<-\EOF &&
+	[sparse]
+		dir = deep
+	EOF
+	cat >deeper1 <<-\EOF &&
+	[sparse]
+		dir = deep/deeper1
+	EOF
+	cat >sparse <<-\EOF &&
+	[sparse]
+		dir = .sparse
+	EOF
+	mkdir repo/.sparse &&
+	for file in folder1 folder2 deep deeper1 sparse
+	do
+		cp $file repo/.sparse/ || return 1
+	done &&
+	git -C repo add .sparse &&
+	git -C repo commit -m "Add sparse specifications" &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder1 &&
+	check_files repo a folder1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder1 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder2 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder2 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder2 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/deeper1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/deeper1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a deep &&
+	check_files repo/deep a deeper1 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/deeper1 .sparse/deep .sparse/folder1 &&
+	check_files repo a deep folder1 &&
+	check_files repo/deep a deeper1 deeper2 &&
+	cat >expect-list <<-EOF &&
+	deep
+	folder1
+	EOF
+	git -C repo sparse-checkout list >actual-list &&
+	test_cmp expect-list actual-list &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder1 .sparse/deeper1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	cat >expect-config <<-\EOF &&
+	.sparse/deeper1
+	.sparse/folder1
+	EOF
+	test_cmp expect-config actual-config &&
+	check_files repo a deep folder1
+'
+
+test_expect_success '"add" with --in-tree' '
+	git -C repo sparse-checkout set --in-tree .sparse/folder1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder1 &&
+	git -C repo sparse-checkout add --in-tree .sparse/deeper1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	cat >expect-config <<-\EOF &&
+	.sparse/deeper1
+	.sparse/folder1
+	EOF
+	test_cmp expect-config actual-config &&
+	check_files repo a deep folder1
+'
+
+test_expect_success 'reapply after updating in-tree file' '
+	git -C repo sparse-checkout set --in-tree .sparse/sparse &&
+	check_files repo a &&
+	test_path_is_dir repo/.sparse &&
+	echo "\tdir = folder1" >>repo/.sparse/sparse &&
+	git -C repo commit -a -m "Update sparse file" &&
+	git -C repo sparse-checkout reapply &&
+	check_files repo a folder1 &&
+	test_path_is_dir repo/.sparse &&
+	git -C repo checkout HEAD~1 &&
+	git -C repo sparse-checkout reapply &&
+	check_files repo a &&
+	test_path_is_dir repo/.sparse
+'
+
 test_done