Message ID | 2188577cd848d7cee77f06f1ad2b181864e5e36d.1588857462.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | In-tree sparse-checkout definitions |

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> One of the difficulties of using the sparse-checkout feature is not
> knowing which directories are absolutely needed for working in a portion
> of the repository. Some of this can be documented in README files or
> included in a bootstrapping tool along with the repository. This is done
> in an ad-hoc way by every project that wants to use it.
>
> Let's make this process easier for users by creating a way to define a
> useful sparse-checkout definition inside the Git tree data. This has
> several benefits. In particular, the data is available to anyone who has
> a copy of the repository without needing a different data source.
> Second, the needs of the repository can change over time and Git can
> present a way to automatically update the working directory as these
> sparse-checkout definitions change over time.

And two lines of development can merge them together?

Any time a new "feature" pops up that would eventually affect how
"git clone" and "git checkout" work based on untrusted user data, we
need to make sure there are no negative security implications.

If it only boils down to "we have files that can record list of
leading directory names and without offering extra 'flexibility'", I
guess there isn't all that much that a malicious sparse definition
can do and we would be safe, though.

> To use this feature, add the "--in-tree" option when setting or adding
> directories to the sparse-checkout definition. For example:
>
>     $ git sparse-checkout set --in-tree .sparse/base
>     $ git sparse-checkout add --in-tree .sparse/extra
>
> These commands add values to the multi-valued config setting
> "sparse.inTree". When updating the sparse-checkout definition, these
> values describe paths in the repository to find the sparse-checkout
> data. After the commands listed earlier, we expect to see the following
> in .git/config.worktree:
>
>     [sparse]
>         intree = .sparse/base
>         intree = .sparse/extra

What does this say in human words? "These two tracked files specify
which paths should be in the working tree"? Spelling it out here
would help readers of this commit.

> When applying the sparse-checkout definitions from this config, the
> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.

OK, so end-user edit to the working tree copy or what is added to
the index does not count and only the committed version gets used.

That makes it simple---I was wondering how we would operate when
merging a branch with different contents in the .sparse/* files
until the conflicts are resolved.

> In those
> files, the multi-valued config values "sparse.dir" are considered as
> the directories to construct a cone mode sparse-checkout file. The end
> result is as if these paths were provided to "git sparse-checkout set"
> in cone mode.

OK.

> For example, suppose .sparse/base had the following content:
>
>     [sparse]
>         dir = A
>         dir = B/C
>         dir = D/E/F
>
> and .sparse/extra had the following content:
>
>     [sparse]
>         dir = D
>         dir = X
>
> Then, the output of "git sparse-checkout list" would be
>
>     A
>     B/C
>     D
>     X
>
> Note that since "D" contains "D/E/F", that directory replaces the
> position of "D/E/F" in the list.
>
> Since these are parsed using the config library, the parser is robust
> enough to understand comments and complicated string values.
>
> The key benefit to this approach is that it can be extended by defining
> new config values. In a later change, we will introduce "sparse.inherit"
> to point to another file in the tree. This will solve the problem of
> editing many files when core dependencies change.

With only multi-valued sparse.dir elements allowed in these
in-tree .sparse/* files, I guess there isn't much mischief a
malicious .sparse/* file can do. Can it try to [includeIf] some
paths external to the repository to cause a remote attacker to learn
about the paths on the local system, perhaps?

> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 00000000000..c1fce87cd33
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,15 @@
> +sparse.inTree::
> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
> +	feature, but if there are any values for the multi-valued
> +	`sparse.inTree` config option, then the sparse-checkout patterns are
> +	defined by parsing the files listed in these values. See
> +	linkgit:git-sparse-checkout[1] for more information.

Does "... but if ... then" mean "by parsing the files and ignoring
all other things that may otherwise define patterns"?

> +sparse.dir::
> +	This config setting is ignored if present in the repository config.
> +	Instead, this multi-valued option is present in the files listed by
> +	`sparse.inTree` and specifies the directories needed in the
> +	working directory. The union of all `sparse.dir` values across all
> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
> +	in cone mode. See linkgit:git-sparse-checkout[1] for more
> +	information.

If this is *not* a config in the usual sense, we probably should not
include it in this document and in the "git config --help" output.
That will allow us to drop the first sentence.

Those .sparse/* in-tree files are like .gitmodules in the sense that
they happen to use the same syntax so that the parser can be shared,
but they are not allowed to affect end-user configuration
(e.g. writing "[diff] external=rm -fr ." in the file has no effect)
at all, right?

And we should describe what we can write in these in-tree files
separately and make it clear that they are _different_ from the
configuration variables.

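The union described in the commit message, where "D" absorbs "D/E/F", can be sketched in a few lines of standalone C. This is only an illustration: `collapse_dirs` and its helpers are hypothetical names, the input is assumed to hold normalized paths without trailing slashes, and the patch itself builds the cone through its pattern-list hashmaps rather than a sorted array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `dir` equals `parent` or lies underneath it. */
static int is_within(const char *dir, const char *parent)
{
	size_t n = strlen(parent);

	return !strncmp(dir, parent, n) && (dir[n] == '\0' || dir[n] == '/');
}

static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sort `dirs` and drop every entry already covered by another entry.
 * The surviving entries are moved to the front of the array and
 * their count is returned.
 */
size_t collapse_dirs(const char **dirs, size_t nr)
{
	size_t i, j, kept = 0;

	assert(dirs || !nr);
	if (nr)
		qsort(dirs, nr, sizeof(*dirs), cmp_str);

	for (i = 0; i < nr; i++) {
		int covered = 0;

		/* A parent always sorts before its descendants. */
		for (j = 0; j < kept && !covered; j++)
			covered = is_within(dirs[i], dirs[j]);
		if (!covered)
			dirs[kept++] = dirs[i];
	}
	return kept;
}
```

Feeding it the combined `sparse.dir` values of the two example files (A, B/C, D/E/F, D, X) leaves A, B/C, D and X at the front of the array, matching the "git sparse-checkout list" output quoted above.
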
On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> One of the difficulties of using the sparse-checkout feature is not
>> knowing which directories are absolutely needed for working in a portion
>> of the repository. Some of this can be documented in README files or
>> included in a bootstrapping tool along with the repository. This is done
>> in an ad-hoc way by every project that wants to use it.
>>
>> Let's make this process easier for users by creating a way to define a
>> useful sparse-checkout definition inside the Git tree data. This has
>> several benefits. In particular, the data is available to anyone who has
>> a copy of the repository without needing a different data source.
>> Second, the needs of the repository can change over time and Git can
>> present a way to automatically update the working directory as these
>> sparse-checkout definitions change over time.
>
> And two lines of development can merge them together?
>
> Any time a new "feature" pops up that would eventually affect how
> "git clone" and "git checkout" work based on untrusted user data, we
> need to make sure there are no negative security implications.
>
> If it only boils down to "we have files that can record list of
> leading directory names and without offering extra 'flexibility'", I
> guess there isn't all that much that a malicious sparse definition
> can do and we would be safe, though.

Yes. I hope that we can be extremely careful with this feature.
The RFC status of this series implicitly includes the question
"Should we do this at all?" I think the benefits outweigh the
risks, but we can minimize those risks with very careful design
and implementation.

>> To use this feature, add the "--in-tree" option when setting or adding
>> directories to the sparse-checkout definition. For example:
>>
>>     $ git sparse-checkout set --in-tree .sparse/base
>>     $ git sparse-checkout add --in-tree .sparse/extra
>>
>> These commands add values to the multi-valued config setting
>> "sparse.inTree". When updating the sparse-checkout definition, these
>> values describe paths in the repository to find the sparse-checkout
>> data. After the commands listed earlier, we expect to see the following
>> in .git/config.worktree:
>>
>>     [sparse]
>>         intree = .sparse/base
>>         intree = .sparse/extra
>
> What does this say in human words? "These two tracked files specify
> which paths should be in the working tree"? Spelling it out here
> would help readers of this commit.

You got it. Sounds good.

>> When applying the sparse-checkout definitions from this config, the
>> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
>
> OK, so end-user edit to the working tree copy or what is added to
> the index does not count and only the committed version gets used.
>
> That makes it simple---I was wondering how we would operate when
> merging a branch with different contents in the .sparse/* files
> until the conflicts are resolved.

It's worth testing this case so we can be sure what happens.

>> In those
>> files, the multi-valued config values "sparse.dir" are considered as
>> the directories to construct a cone mode sparse-checkout file. The end
>> result is as if these paths were provided to "git sparse-checkout set"
>> in cone mode.
>
> OK.
>
>> For example, suppose .sparse/base had the following content:
>>
>>     [sparse]
>>         dir = A
>>         dir = B/C
>>         dir = D/E/F
>>
>> and .sparse/extra had the following content:
>>
>>     [sparse]
>>         dir = D
>>         dir = X
>>
>> Then, the output of "git sparse-checkout list" would be
>>
>>     A
>>     B/C
>>     D
>>     X
>>
>> Note that since "D" contains "D/E/F", that directory replaces the
>> position of "D/E/F" in the list.
>>
>> Since these are parsed using the config library, the parser is robust
>> enough to understand comments and complicated string values.
>>
>> The key benefit to this approach is that it can be extended by defining
>> new config values. In a later change, we will introduce "sparse.inherit"
>> to point to another file in the tree. This will solve the problem of
>> editing many files when core dependencies change.
>
> With only multi-valued sparse.dir elements allowed in these
> in-tree .sparse/* files, I guess there isn't much mischief a
> malicious .sparse/* file can do. Can it try to [includeIf] some
> paths external to the repository to cause a remote attacker to learn
> about the paths on the local system, perhaps?

I was unaware of includes in the config format [1]. While the
behavior should be safe because we are only pulling very specific
data from the config, it would be best to see if we can disable
includes when reading config from a blob.

[1] https://git-scm.com/docs/git-config#_includes

>
>> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
>> new file mode 100644
>> index 00000000000..c1fce87cd33
>> --- /dev/null
>> +++ b/Documentation/config/sparse.txt
>> @@ -0,0 +1,15 @@
>> +sparse.inTree::
>> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
>> +	feature, but if there are any values for the multi-valued
>> +	`sparse.inTree` config option, then the sparse-checkout patterns are
>> +	defined by parsing the files listed in these values. See
>> +	linkgit:git-sparse-checkout[1] for more information.
>
> Does "... but if ... then" mean "by parsing the files and ignoring
> all other things that may otherwise define patterns"?
>
>> +sparse.dir::
>> +	This config setting is ignored if present in the repository config.
>> +	Instead, this multi-valued option is present in the files listed by
>> +	`sparse.inTree` and specifies the directories needed in the
>> +	working directory. The union of all `sparse.dir` values across all
>> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
>> +	in cone mode. See linkgit:git-sparse-checkout[1] for more
>> +	information.
>
> If this is *not* a config in the usual sense, we probably should not
> include it in this document and in the "git config --help" output.
> That will allow us to drop the first sentence.
>
> Those .sparse/* in-tree files are like .gitmodules in the sense that
> they happen to use the same syntax so that the parser can be shared,
> but they are not allowed to affect end-user configuration
> (e.g. writing "[diff] external=rm -fr ." in the file has no effect)
> at all, right?
>
> And we should describe what we can write in these in-tree files
> separately and make it clear that they are _different_ from the
> configuration variables.

I can move the "keys" from the config documentation and into the
git-sparse-checkout.txt file.

Thanks,
-Stolee

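The ".gitmodules-like" property discussed above, where the in-tree file shares the config syntax but only one key is ever consumed, can be sketched with a standalone callback. The callback takes the same three arguments as the `sparse_dir_cb` function in the patch; `struct dir_list`, `MAX_DIRS` and `collect_sparse_dir` are hypothetical names for this illustration, and rejecting a key that has no value is an assumption rather than something the patch does.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRS 16

struct dir_list {
	const char *dirs[MAX_DIRS];
	size_t nr;
};

/*
 * Config-style callback: receives a "section.key" name, its value,
 * and caller data. Only "sparse.dir" is consumed.
 */
int collect_sparse_dir(const char *var, const char *value, void *data)
{
	struct dir_list *list = data;

	assert(var && list);
	if (strcmp(var, "sparse.dir"))
		return 0;	/* any other key is silently ignored */
	if (!value || !*value)
		return -1;	/* "dir" without a value: treat as malformed */
	if (list->nr == MAX_DIRS)
		return -1;	/* fixed-size list is full */
	list->dirs[list->nr++] = value;
	return 0;
}
```

With a callback shaped like this, a line such as "[diff] external=rm -fr ." in an in-tree file reaches the callback as `diff.external` and is dropped, so it cannot influence the user's real configuration.
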
On Fri, May 8, 2020 at 8:42 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> One of the difficulties of using the sparse-checkout feature is not
> >> knowing which directories are absolutely needed for working in a portion
> >> of the repository. Some of this can be documented in README files or
> >> included in a bootstrapping tool along with the repository. This is done
> >> in an ad-hoc way by every project that wants to use it.
> >>
> >> Let's make this process easier for users by creating a way to define a
> >> useful sparse-checkout definition inside the Git tree data. This has
> >> several benefits. In particular, the data is available to anyone who has
> >> a copy of the repository without needing a different data source.
> >> Second, the needs of the repository can change over time and Git can
> >> present a way to automatically update the working directory as these
> >> sparse-checkout definitions change over time.
> >
> > And two lines of development can merge them together?
> >
> > Any time a new "feature" pops up that would eventually affect how
> > "git clone" and "git checkout" work based on untrusted user data, we
> > need to make sure there are no negative security implications.
> >
> > If it only boils down to "we have files that can record list of
> > leading directory names and without offering extra 'flexibility'", I
> > guess there isn't all that much that a malicious sparse definition
> > can do and we would be safe, though.
>
> Yes. I hope that we can be extremely careful with this feature.
> The RFC status of this series implicitly includes the question
> "Should we do this at all?" I think the benefits outweigh the
> risks, but we can minimize those risks with very careful design
> and implementation.
>
> >> To use this feature, add the "--in-tree" option when setting or adding
> >> directories to the sparse-checkout definition. For example:
> >>
> >>     $ git sparse-checkout set --in-tree .sparse/base
> >>     $ git sparse-checkout add --in-tree .sparse/extra
> >>
> >> These commands add values to the multi-valued config setting
> >> "sparse.inTree". When updating the sparse-checkout definition, these
> >> values describe paths in the repository to find the sparse-checkout
> >> data. After the commands listed earlier, we expect to see the following
> >> in .git/config.worktree:
> >>
> >>     [sparse]
> >>         intree = .sparse/base
> >>         intree = .sparse/extra
> >
> > What does this say in human words? "These two tracked files specify
> > which paths should be in the working tree"? Spelling it out here
> > would help readers of this commit.
>
> You got it. Sounds good.
>
> >> When applying the sparse-checkout definitions from this config, the
> >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> >
> > OK, so end-user edit to the working tree copy or what is added to
> > the index does not count and only the committed version gets used.
> >
> > That makes it simple---I was wondering how we would operate when
> > merging a branch with different contents in the .sparse/* files
> > until the conflicts are resolved.
>
> It's worth testing this case so we can be sure what happens.

During a merge or rebase or checkout -m, what happens if .sparse/extra
has the following working tree content:

[sparse]
    dir = D
    dir = X
<<<<<< HEAD
    dir = Y
|||||| MERGE_BASE
======
    inherit = .sparse/tools
>>>>>> MERGE_HEAD
    inherit = .sparse/base

and, of course, three different entries in the index?

Also, do we use the version of the --in-tree file from the latest
commit, from the index, or from the working tree? (This is a question
not only for merge and rebase, but also checkout with dirty changes
and even checkout -m.) Which one "wins"?

And what if the user updates and commits an ill-formed version of the
file -- is it equivalent to getting an empty cone with just the
toplevel directory, equivalent to getting a complete checkout of
everything, or something else?

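The thread leaves open what should happen when a definition file contains conflict markers. One conservative possibility, offered purely as an assumption and not as anything the patch implements, is to detect the markers and leave the sparse-checkout untouched. The sketch below uses hypothetical names (`is_conflict_marker`, `has_conflict_markers`) and takes the marker length as a parameter; Git writes seven-character markers by default, while the hand-typed example above uses six.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the line at `line` (of length `len`, newline excluded)
 * looks like a conflict marker of `marker_size` characters.
 */
static int is_conflict_marker(const char *line, size_t len, size_t marker_size)
{
	size_t i;

	assert(marker_size > 0);
	if (len < marker_size || !line[0] || !strchr("<|=>", line[0]))
		return 0;
	for (i = 1; i < marker_size; i++)
		if (line[i] != line[0])
			return 0;
	/* The marker ends the line or is followed by a label. */
	return len == marker_size || line[marker_size] == ' ';
}

/* Scan a NUL-terminated buffer line by line for conflict markers. */
int has_conflict_markers(const char *buf, size_t marker_size)
{
	while (*buf) {
		const char *eol = strchr(buf, '\n');
		size_t len = eol ? (size_t)(eol - buf) : strlen(buf);

		if (is_conflict_marker(buf, len, marker_size))
			return 1;
		if (!eol)
			break;
		buf = eol + 1;
	}
	return 0;
}
```

A caller following this policy would check the working-tree copy before parsing it and skip the update when the function returns 1. Whether the working tree, the index, or HEAD should be consulted in the first place is exactly the question raised above, and this sketch does not answer it.
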
On Wed, May 20, 2020 at 10:52 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, May 8, 2020 at 8:42 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > >
> > >> One of the difficulties of using the sparse-checkout feature is not
> > >> knowing which directories are absolutely needed for working in a portion
> > >> of the repository. Some of this can be documented in README files or
> > >> included in a bootstrapping tool along with the repository. This is done
> > >> in an ad-hoc way by every project that wants to use it.
> > >>
> > >> Let's make this process easier for users by creating a way to define a
> > >> useful sparse-checkout definition inside the Git tree data. This has
> > >> several benefits. In particular, the data is available to anyone who has
> > >> a copy of the repository without needing a different data source.
> > >> Second, the needs of the repository can change over time and Git can
> > >> present a way to automatically update the working directory as these
> > >> sparse-checkout definitions change over time.
> > >
> > > And two lines of development can merge them together?
> > >
> > > Any time a new "feature" pops up that would eventually affect how
> > > "git clone" and "git checkout" work based on untrusted user data, we
> > > need to make sure there are no negative security implications.
> > >
> > > If it only boils down to "we have files that can record list of
> > > leading directory names and without offering extra 'flexibility'", I
> > > guess there isn't all that much that a malicious sparse definition
> > > can do and we would be safe, though.
> >
> > Yes. I hope that we can be extremely careful with this feature.
> > The RFC status of this series implicitly includes the question
> > "Should we do this at all?" I think the benefits outweigh the
> > risks, but we can minimize those risks with very careful design
> > and implementation.
> >
> > >> To use this feature, add the "--in-tree" option when setting or adding
> > >> directories to the sparse-checkout definition. For example:
> > >>
> > >>     $ git sparse-checkout set --in-tree .sparse/base
> > >>     $ git sparse-checkout add --in-tree .sparse/extra
> > >>
> > >> These commands add values to the multi-valued config setting
> > >> "sparse.inTree". When updating the sparse-checkout definition, these
> > >> values describe paths in the repository to find the sparse-checkout
> > >> data. After the commands listed earlier, we expect to see the following
> > >> in .git/config.worktree:
> > >>
> > >>     [sparse]
> > >>         intree = .sparse/base
> > >>         intree = .sparse/extra
> > >
> > > What does this say in human words? "These two tracked files specify
> > > which paths should be in the working tree"? Spelling it out here
> > > would help readers of this commit.
> >
> > You got it. Sounds good.
> >
> > >> When applying the sparse-checkout definitions from this config, the
> > >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> > >
> > > OK, so end-user edit to the working tree copy or what is added to
> > > the index does not count and only the committed version gets used.
> > >
> > > That makes it simple---I was wondering how we would operate when
> > > merging a branch with different contents in the .sparse/* files
> > > until the conflicts are resolved.
> >
> > It's worth testing this case so we can be sure what happens.
>
> During a merge or rebase or checkout -m, what happens if .sparse/extra
> has the following working tree content:
>
> [sparse]
>     dir = D
>     dir = X
> <<<<<< HEAD
>     dir = Y
> |||||| MERGE_BASE
> ======
>     inherit = .sparse/tools
> >>>>>> MERGE_HEAD
>     inherit = .sparse/base
>
> and, of course, three different entries in the index?
>
> Also, do we use the version of the --in-tree file from the latest
> commit, from the index, or from the working tree? (This is a question
> not only for merge and rebase, but also checkout with dirty changes
> and even checkout -m.) Which one "wins"?
>
> And what if the user updates and commits an ill-formed version of the
> file -- is it equivalent to getting an empty cone with just the
> toplevel directory, equivalent to getting a complete checkout of
> everything, or something else?

Son pointed out that Mercurial has a 'sparse' extension that has some
possible ideas of things we could do here; see
https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
for some further discussion.

Hi,

On Wed, Jun 17, 2020 at 04:07:01PM -0700, Elijah Newren wrote:
>
> Son pointed out that mercurial has a 'sparse' extension that has some
> possible ideas of things we could do here; see
> https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
> for some further discussion.

I just want to note that you can find the latest version of FB's
'sparse' extension here[1], and the tests for the 'profile' feature
can be found here[2].

Another relevant source of reading could be Google's Narrow
extension for Mercurial[3].

Cheers,
Son Luong.

[1]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/edenscm/hgext/sparse.py
[2]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/tests/test-sparse-profiles.t
[3]: https://bitbucket.org/Google/narrowhg/src/cb51d673e9c5820fc3da86a67f7e74b789820b4f/tests/test-merge.t#lines-63

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 08b13ba72be..40f44948229 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -435,6 +435,8 @@ include::config/sequencer.txt[]
 
 include::config/showbranch.txt[]
 
+include::config/sparse.txt[]
+
 include::config/splitindex.txt[]
 
 include::config/ssh.txt[]
diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
new file mode 100644
index 00000000000..c1fce87cd33
--- /dev/null
+++ b/Documentation/config/sparse.txt
@@ -0,0 +1,15 @@
+sparse.inTree::
+	The `core.sparseCheckout` config option enables the `sparse-checkout`
+	feature, but if there are any values for the multi-valued
+	`sparse.inTree` config option, then the sparse-checkout patterns are
+	defined by parsing the files listed in these values. See
+	linkgit:git-sparse-checkout[1] for more information.
+
+sparse.dir::
+	This config setting is ignored if present in the repository config.
+	Instead, this multi-valued option is present in the files listed by
+	`sparse.inTree` and specifies the directories needed in the
+	working directory. The union of all `sparse.dir` values across all
+	`sparse.inTree` files forms the input for `git sparse-checkout set`
+	in cone mode. See linkgit:git-sparse-checkout[1] for more
+	information.
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index 1a3ace60820..da9322c5e41 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -62,13 +62,20 @@ directories (recursively) as well as files that are siblings of
 ancestor directories. The input format matches the output of
 `git ls-tree --name-only`. This includes interpreting pathnames that
 begin with a double quote (") as C-style quoted strings.
++
+When the `--in-tree` option is provided, the paths provided are interpreted
+as files within the working directory that are used to construct the
+`sparse-checkout` patterns. See 'IN-TREE PATTERN SET' below.
 
 'add'::
 	Update the sparse-checkout file to include additional patterns.
 	By default, these patterns are read from the command-line arguments,
 	but they can be read from stdin using the `--stdin` option. When
 	`core.sparseCheckoutCone` is enabled, the given patterns are interpreted
-	as directory names as in the 'set' subcommand.
+	as directory names as in the 'set' subcommand. When the `--in-tree`
+	option is provided, the input is interpreted as locations of files
+	describing a sparse-checkout definition as in the 'set' subcommand
+	and the 'IN-TREE PATTERN SET' section below.
 
 'reapply::
 	Reapply the sparsity pattern rules to paths in the working tree.
@@ -197,6 +204,40 @@ case-insensitive check. This corrects for case mismatched filenames in the
 directory.
 
+IN-TREE PATTERN SET
+-------------------
+
+As your project changes, your sparse-checkout pattern sets may also change.
+It is important to be able to construct a valid sparse-checkout pattern set
+when switching between points in history. The in-tree pattern sets allow
+versioning cone-mode sparse-checkout patterns next to your other artifacts.
+
+To enable the feature, create a sparse-checkout definition using the Git
+config format. The file should specify the multi-valued config variable
+`sparse.dir` to a list of directories to include in the sparse-checkout
+definition. If multiple files are specified, the resulting sparse-checkout
+definition is the union of all directories from all such files. For
+example, the following file contains a list of three directories, `A`,
+`B/C`, and `D/E/F`:
+
+----------------------------------
+[sparse]
+	dir = A
+	dir = B/C
+# Comments are allowed to describe
+# why a directory is necessary
+	dir = D/E/F
+----------------------------------
+
+Use `git sparse-checkout set --in-tree <path>` to initialize the patterns
+to those included in the file at `<path>`. This will override any existing
+patterns you have in your sparse-checkout file.
+
+After switching between commits with different versions of this file, run
+`git sparse-checkout reapply` to adjust the sparse-checkout patterns to
+the new definition.
+
+
 
 SUBMODULES
 ----------
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index fd247e428e4..621f1801c03 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -175,6 +175,7 @@ static char const * const builtin_sparse_checkout_set_usage[] = {
 
 static struct sparse_checkout_set_opts {
 	int use_stdin;
+	int in_tree;
 } set_opts;
 
 static void add_patterns_from_input(struct pattern_list *pl,
@@ -312,12 +313,52 @@ static int modify_pattern_list(int argc, const char **argv, enum modify_type m)
 	return result;
 }
 
+static int modify_in_tree_list(int argc, const char **argv, enum modify_type m)
+{
+	int result = 0;
+	int i;
+	struct string_list sl = STRING_LIST_INIT_DUP;
+	struct pattern_list pl;
+
+	memset(&pl, 0, sizeof(pl));
+
+	switch(m) {
+	case ADD:
+		if (load_in_tree_list_from_config(the_repository, &sl))
+			return 1;
+		if (!sl.nr)
+			warning(_("the existing in-tree config has no entries; this overwrites the existing sparse-checkout definition."));
+		populate_sparse_checkout_patterns(&pl);
+		break;
+
+	case REPLACE:
+		hashmap_init(&pl.recursive_hashmap, pl_hashmap_cmp, NULL, 0);
+		hashmap_init(&pl.parent_hashmap, pl_hashmap_cmp, NULL, 0);
+		break;
+	}
+
+	for (i = 0; i < argc; i++)
+		string_list_insert(&sl, argv[i]);
+
+	if (load_in_tree_pattern_list(the_repository, &sl, &pl) ||
+	    set_sparse_in_tree_config(the_repository, &sl) ||
+	    write_patterns_and_update(&pl))
+		result = 1;
+
+	string_list_clear(&sl, 0);
+	clear_pattern_list(&pl);
+
+	return result;
+}
+
 static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
 			       enum modify_type m)
 {
 	static struct option builtin_sparse_checkout_set_options[] = {
 		OPT_BOOL(0, "stdin", &set_opts.use_stdin,
 			 N_("read patterns from standard in")),
+		OPT_BOOL(0, "in-tree", &set_opts.in_tree,
+			 N_("define the sparse-checkout from files in the tree")),
 		OPT_END(),
 	};
 
@@ -328,6 +369,8 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
 			     builtin_sparse_checkout_set_usage,
 			     PARSE_OPT_KEEP_UNKNOWN);
 
+	if (set_opts.in_tree)
+		return modify_in_tree_list(argc, argv, m);
 	return modify_pattern_list(argc, argv, m);
 }
 
diff --git a/sparse-checkout.c b/sparse-checkout.c
index 875b620568d..d6c27ca19c4 100644
--- a/sparse-checkout.c
+++ b/sparse-checkout.c
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "unpack-trees.h"
+#include "object-store.h"
 
 char *get_sparse_checkout_filename(void)
 {
@@ -33,14 +34,113 @@ void write_patterns_to_file(FILE *fp, struct pattern_list *pl)
 	}
 }
 
+int load_in_tree_list_from_config(struct repository *r,
+				  struct string_list *sl)
+{
+	struct string_list_item *item;
+	const struct string_list *cl;
+
+	cl = repo_config_get_value_multi(r, SPARSE_CHECKOUT_IN_TREE);
+
+	if (!cl)
+		return 1;
+
+	for_each_string_list_item(item, cl)
+		string_list_insert(sl, item->string);
+
+	return 0;
+}
+
+static int sparse_dir_cb(const char *var, const char *value, void *data)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct pattern_list *pl = (struct pattern_list *)data;
+
+	if (!strcmp(var, SPARSE_CHECKOUT_DIR)) {
+		strbuf_addstr(&path, value);
+		strbuf_to_cone_pattern(&path, pl);
+		strbuf_release(&path);
+	}
+
+	return 0;
+}
+
+static int load_in_tree_from_blob(struct pattern_list *pl,
+				  struct object_id *oid)
+{
+	return git_config_from_blob_oid(sparse_dir_cb,
+					SPARSE_CHECKOUT_DIR,
+					oid, pl);
+}
+
+int load_in_tree_pattern_list(struct repository *r,
+			      struct string_list *sl,
+			      struct pattern_list *pl)
+{
+	struct index_state *istate = r->index;
+	struct string_list_item *item;
+	struct strbuf path = STRBUF_INIT;
+
+	pl->use_cone_patterns = 1;
+
+	for_each_string_list_item(item, sl) {
+		struct object_id *oid;
+		enum object_type type;
+		int pos = index_name_pos(istate, item->string, strlen(item->string));
+
+		/*
+		 * Exit silently, as this is likely the case where Git
+		 * changed branches to a location where the inherit file
+		 * does not exist. Do not update the sparse-checkout.
+		 */
+		if (pos < 0)
+			return 1;
+
+		oid = &istate->cache[pos]->oid;
+		type = oid_object_info(r, oid, NULL);
+
+		if (type != OBJ_BLOB) {
+			warning(_("expected a file at '%s'; not updating sparse-checkout"),
+				oid_to_hex(oid));
+			return 1;
+		}
+
+		load_in_tree_from_blob(pl, oid);
+	}
+
+	strbuf_release(&path);
+
+	return 0;
+}
+
 int populate_sparse_checkout_patterns(struct pattern_list *pl)
 {
 	int result;
-	char *sparse = get_sparse_checkout_filename();
-
-	pl->use_cone_patterns = core_sparse_checkout_cone;
-	result = add_patterns_from_file_to_list(sparse, "", 0, pl, NULL);
-	free(sparse);
+	const char *in_tree;
+
+	if (!git_config_get_value(SPARSE_CHECKOUT_IN_TREE, &in_tree) &&
+	    in_tree) {
+		struct string_list paths = STRING_LIST_INIT_DUP;
+		/* If we do not have this config, skip this step! */
+		if (load_in_tree_list_from_config(the_repository, &paths) ||
+		    !paths.nr)
+			return 1;
+
+		/* Check diff for paths over from/to. If any changed, reload. */
+		/* or for now, reload always! */
+		hashmap_init(&pl->recursive_hashmap, pl_hashmap_cmp, NULL, 0);
+		hashmap_init(&pl->parent_hashmap, pl_hashmap_cmp, NULL, 0);
+		pl->use_cone_patterns = 1;
+
+		result = load_in_tree_pattern_list(the_repository, &paths, pl);
+		string_list_clear(&paths, 0);
+	} else {
+		char *sparse = get_sparse_checkout_filename();
+
+		pl->use_cone_patterns = core_sparse_checkout_cone;
+		result = add_patterns_from_file_to_list(sparse, "", 0, pl, NULL);
+		free(sparse);
+	}
 
 	return result;
 }
@@ -243,3 +343,21 @@ void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
 
 	insert_recursive_pattern(pl, line);
 }
+
+int set_sparse_in_tree_config(struct repository *r, struct string_list *sl)
+{
+	struct string_list_item *item;
+	const char *config_path = git_path("config.worktree");
+
+	/* clear existing values */
+	git_config_set_multivar_in_file_gently(config_path,
+					       SPARSE_CHECKOUT_IN_TREE,
+					       NULL, NULL, 1);
+
+	for_each_string_list_item(item, sl)
+		git_config_set_multivar_in_file_gently(
+			config_path, SPARSE_CHECKOUT_IN_TREE,
+			item->string, CONFIG_REGEX_NONE, 0);
+
+	return 0;
+}
diff --git a/sparse-checkout.h b/sparse-checkout.h
index e0c840f07f9..993a5701a60 100644
--- a/sparse-checkout.h
+++ b/sparse-checkout.h
@@ -4,14 +4,25 @@
 #include "cache.h"
 #include "repository.h"
 
+#define SPARSE_CHECKOUT_DIR "sparse.dir"
+#define SPARSE_CHECKOUT_IN_TREE "sparse.intree"
+
 struct pattern_list;
 
 char *get_sparse_checkout_filename(void);
 int populate_sparse_checkout_patterns(struct pattern_list *pl);
 void write_patterns_to_file(FILE *fp, struct pattern_list *pl);
 int update_working_directory(struct pattern_list *pl);
+int write_patterns(struct pattern_list *pl, int and_update);
 int write_patterns_and_update(struct pattern_list *pl);
 void insert_recursive_pattern(struct pattern_list *pl, struct strbuf *path);
 void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl);
 
+int load_in_tree_list_from_config(struct repository *r,
+				  struct string_list *sl);
+int load_in_tree_pattern_list(struct repository *r,
+			      struct string_list *sl,
+			      struct pattern_list *pl);
+int set_sparse_in_tree_config(struct repository *r, struct string_list *sl);
+
 #endif
diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
index 88cdde255cd..1040bf9c261 100755
--- a/t/t1091-sparse-checkout-builtin.sh
+++ b/t/t1091-sparse-checkout-builtin.sh
@@ -604,4 +604,105 @@ test_expect_success MINGW 'cone mode replaces backslashes with slashes' '
 	check_files repo/deep a deeper1
 '
 
+test_expect_success 'basis of --in-tree' '
+	git -C repo config auto.crlf false &&
+	cat >folder1 <<-\EOF &&
+	[sparse]
+		dir = folder1
+	EOF
+	cat >folder2 <<-\EOF &&
+	[sparse]
+		dir = folder2
+	EOF
+	cat >deep <<-\EOF &&
+	[sparse]
+		dir = deep
+	EOF
+	cat >deeper1 <<-\EOF &&
+	[sparse]
+		dir = deep/deeper1
+	EOF
+	cat >sparse <<-\EOF &&
+	[sparse]
+		dir = .sparse
+	EOF
+	mkdir repo/.sparse &&
+	for file in folder1 folder2 deep deeper1 sparse
+	do
+		cp $file repo/.sparse/ || return 1
+	done &&
+	git -C repo add .sparse &&
+	git -C repo commit -m "Add sparse specifications" &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder1 &&
+	check_files repo a folder1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder1 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/folder2 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/folder2 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a folder2 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/deeper1 &&
+	git -C repo config --get-all sparse.inTree >actual-config &&
+	echo .sparse/deeper1 >expect-config &&
+	test_cmp expect-config actual-config &&
+	check_files repo a deep &&
+	check_files repo/deep a deeper1 &&
+
+	git -C repo sparse-checkout set --in-tree .sparse/deeper1 .sparse/deep .sparse/folder1 &&
+ check_files repo a deep folder1 && + check_files repo/deep a deeper1 deeper2 && + cat >expect-list <<-EOF && + deep + folder1 + EOF + git -C repo sparse-checkout list >actual-list && + test_cmp expect-list actual-list && + + git -C repo sparse-checkout set --in-tree .sparse/folder1 .sparse/deeper1 && + git -C repo config --get-all sparse.inTree >actual-config && + cat >expect-config <<-\EOF && + .sparse/deeper1 + .sparse/folder1 + EOF + test_cmp expect-config actual-config && + check_files repo a deep folder1 +' + +test_expect_success '"add" with --in-tree' ' + git -C repo sparse-checkout set --in-tree .sparse/folder1 && + git -C repo config --get-all sparse.inTree >actual-config && + echo .sparse/folder1 >expect-config && + test_cmp expect-config actual-config && + check_files repo a folder1 && + git -C repo sparse-checkout add --in-tree .sparse/deeper1 && + git -C repo config --get-all sparse.inTree >actual-config && + cat >expect-config <<-\EOF && + .sparse/deeper1 + .sparse/folder1 + EOF + test_cmp expect-config actual-config && + check_files repo a deep folder1 +' + +test_expect_success 'reapply after updating in-tree file' ' + git -C repo sparse-checkout set --in-tree .sparse/sparse && + check_files repo a && + test_path_is_dir repo/.sparse && + echo "\tdir = folder1" >>repo/.sparse/sparse && + git -C repo commit -a -m "Update sparse file" && + git -C repo sparse-checkout reapply && + check_files repo a folder1 && + test_path_is_dir repo/.sparse && + git -C repo checkout HEAD~1 && + git -C repo sparse-checkout reapply && + check_files repo a && + test_path_is_dir repo/.sparse +' + test_done
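
For readers following the tests above: the in-tree files they create use Git's ordinary config file syntax, with one "dir" value per directory under a [sparse] section (the key the patch names SPARSE_CHECKOUT_DIR, "sparse.dir"). As a rough sketch only, such a file can be inspected with the existing "git config --file" plumbing. The file name "base" and the temporary directory are hypothetical, and this is not how the patch itself loads the data (it reads the blob from the index); it only illustrates the file format.

```shell
# Sketch: write a spec file in the same format the tests use, then list
# its directories with stock git-config. "base" is a made-up file name.
tmp=$(mktemp -d)
spec="$tmp/base"

cat >"$spec" <<'EOF'
[sparse]
	dir = folder1
	dir = deep/deeper1
EOF

# --get-all returns every value of the multi-valued key, one per line,
# in the order they appear in the file.
dirs=$(git config --file "$spec" --get-all sparse.dir)
printf '%s\n' "$dirs"
```

Because the format is plain config syntax, a malformed file makes "git config" fail with a parse error rather than yield partial values, which may be relevant to the concern about untrusted in-tree data raised earlier in the thread.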