[7/7] sparse-checkout: provide a new update subcommand

Message ID 650db6863426ae2b324ba717f898247f44279cb8.1584169893.git.gitgitgadget@gmail.com
State New, archived
Series Sparse checkout improvements -- improved sparsity updating

Commit Message

Elijah Newren via GitGitGadget March 14, 2020, 7:11 a.m. UTC
From: Elijah Newren <newren@gmail.com>

If commands like merge or rebase materialize files as part of their work,
or a previous sparse-checkout command failed to update individual files
due to dirty changes, users may want a command to simply 'reapply' the
sparsity rules.  Provide one.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-sparse-checkout.txt | 10 ++++++++++
 builtin/sparse-checkout.c             | 10 +++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)
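
The behavior the commit message describes can be pictured with a toy model.
The sketch below is not git's implementation: `struct toy_entry`,
`toy_path_in_cone()` and `toy_reapply()` are hypothetical names, and the cone
is simplified to a plain list of directories (files at the repository root are
ignored here). It only illustrates the rule that existing patterns are applied
again, and that dirty paths outside the cone are left alone and reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one tracked path; not git's cache_entry. */
struct toy_entry {
	const char *path;   /* path relative to the repository root */
	int in_worktree;    /* 1 if the file currently exists on disk */
	int dirty;          /* 1 if it has unstaged changes or conflicts */
};

/* Return 1 if path lies inside directory dir (e.g. "src" covers "src/a.c"). */
int toy_path_in_dir(const char *path, const char *dir)
{
	size_t len = strlen(dir);
	return !strncmp(path, dir, len) && path[len] == '/';
}

/* Return 1 if path is covered by any of the n cone directories. */
int toy_path_in_cone(const char *path, const char **dirs, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (toy_path_in_dir(path, dirs[i]))
			return 1;
	return 0;
}

/*
 * Re-apply the existing sparsity rules without changing them.
 * Returns the number of paths that could not be sparsified because
 * they are dirty.
 */
int toy_reapply(struct toy_entry *entries, size_t nr,
		const char **dirs, size_t ndirs)
{
	int skipped = 0;
	size_t i;

	assert(entries && dirs);
	for (i = 0; i < nr; i++) {
		struct toy_entry *e = &entries[i];

		if (toy_path_in_cone(e->path, dirs, ndirs))
			e->in_worktree = 1;   /* inside the cone: keep or restore */
		else if (!e->in_worktree)
			continue;             /* already sparse */
		else if (e->dirty)
			skipped++;            /* leave the user's changes alone */
		else
			e->in_worktree = 0;   /* clean and outside: drop from disk */
	}
	return skipped;
}
```

In this model, a second call after the dirty path has been cleaned up removes
it from the working tree, which is the "run it again later" workflow the
commit message has in mind.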

Comments

Derrick Stolee March 15, 2020, 4:24 p.m. UTC | #1
On 3/14/2020 3:11 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> If commands like merge or rebase materialize files as part of their work,
> or a previous sparse-checkout command failed to update individual files
> due to dirty changes, users may want a command to simply 'reapply' the
> sparsity rules.  Provide one.

I was actually thinking "refresh" would be a better name, but also you
use "reapply" which is good, too. I'm concerned that "update" may imply
that the sparse-checkout patterns can change, but you really mean to
re-do the work from a previous "git sparse-checkout (set|add)".

I also thought of "reset" but that would be a confusing overload.

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/git-sparse-checkout.txt | 10 ++++++++++
>  builtin/sparse-checkout.c             | 10 +++++++++-
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index c0342e53938..27f4392489f 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -70,6 +70,16 @@ C-style quoted strings.
>  	`core.sparseCheckoutCone` is enabled, the given patterns are interpreted
>  	as directory names as in the 'set' subcommand.
>  
> +'update'::
> +	Update the sparseness of paths in the working tree based on the
> +	existing patterns.  Commands like merge or rebase can materialize
> +	paths to do their work (e.g. in order to show you a conflict), and
> +	other sparse-checkout commands might fail to sparsify an individual
> +	file (e.g. because it has unstaged changes or conflicts).  In such
> +	cases, it can make sense to run `git sparse-checkout update` later
> +	after cleaning up affected paths (e.g. resolving conflicts, undoing
> +	or committing changes, etc.).
> +
>  'disable'::
>  	Disable the `core.sparseCheckout` config setting, and restore the
>  	working directory to include all files. Leaves the sparse-checkout
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 5d3ec2e6be9..2ae21011dfd 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -18,7 +18,7 @@
>  static const char *empty_base = "";
>  
>  static char const * const builtin_sparse_checkout_usage[] = {
> -	N_("git sparse-checkout (init|list|set|add|disable) <options>"),
> +	N_("git sparse-checkout (init|list|set|add|update|disable) <options>"),
>  	NULL
>  };
>  
> @@ -552,6 +552,12 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
>  	return modify_pattern_list(argc, argv, m);
>  }
>  
> +static int sparse_checkout_update(int argc, const char **argv)
> +{
> +	repo_read_index(the_repository);
> +	return update_working_directory(NULL);
> +}
> +

Short and sweet! I suppose my earlier comment about whether
repo_read_index() was necessary is answered here. Perhaps it
should be part of update_working_directory()? (And pass a
repository pointer to it?)

>  static int sparse_checkout_disable(int argc, const char **argv)
>  {
>  	struct pattern_list pl;
> @@ -601,6 +607,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  			return sparse_checkout_set(argc, argv, prefix, REPLACE);
>  		if (!strcmp(argv[0], "add"))
>  			return sparse_checkout_set(argc, argv, prefix, ADD);
> +		if (!strcmp(argv[0], "update"))
> +			return sparse_checkout_update(argc, argv);
>  		if (!strcmp(argv[0], "disable"))
>  			return sparse_checkout_disable(argc, argv);
>  	}

Thanks,
-Stolee
Elijah Newren March 16, 2020, 5:05 p.m. UTC | #2
On Sun, Mar 15, 2020 at 9:24 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/14/2020 3:11 AM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > If commands like merge or rebase materialize files as part of their work,
> > or a previous sparse-checkout command failed to update individual files
> > due to dirty changes, users may want a command to simply 'reapply' the
> > sparsity rules.  Provide one.
>
> I was actually thinking "refresh" would be a better name, but also you
> use "reapply" which is good, too. I'm concerned that "update" may imply
> that the sparse-checkout patterns can change, but you really mean to
> re-do the work from a previous "git sparse-checkout (set|add)".
>
> I also thought of "reset" but that would be a confusing overload.

Makes sense; I'll switch it over to "reapply".

> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  Documentation/git-sparse-checkout.txt | 10 ++++++++++
> >  builtin/sparse-checkout.c             | 10 +++++++++-
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> > index c0342e53938..27f4392489f 100644
> > --- a/Documentation/git-sparse-checkout.txt
> > +++ b/Documentation/git-sparse-checkout.txt
> > @@ -70,6 +70,16 @@ C-style quoted strings.
> >       `core.sparseCheckoutCone` is enabled, the given patterns are interpreted
> >       as directory names as in the 'set' subcommand.
> >
> > +'update'::
> > +     Update the sparseness of paths in the working tree based on the
> > +     existing patterns.  Commands like merge or rebase can materialize
> > +     paths to do their work (e.g. in order to show you a conflict), and
> > +     other sparse-checkout commands might fail to sparsify an individual
> > +     file (e.g. because it has unstaged changes or conflicts).  In such
> > +     cases, it can make sense to run `git sparse-checkout update` later
> > +     after cleaning up affected paths (e.g. resolving conflicts, undoing
> > +     or committing changes, etc.).
> > +
> >  'disable'::
> >       Disable the `core.sparseCheckout` config setting, and restore the
> >       working directory to include all files. Leaves the sparse-checkout
> > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> > index 5d3ec2e6be9..2ae21011dfd 100644
> > --- a/builtin/sparse-checkout.c
> > +++ b/builtin/sparse-checkout.c
> > @@ -18,7 +18,7 @@
> >  static const char *empty_base = "";
> >
> >  static char const * const builtin_sparse_checkout_usage[] = {
> > -     N_("git sparse-checkout (init|list|set|add|disable) <options>"),
> > +     N_("git sparse-checkout (init|list|set|add|update|disable) <options>"),
> >       NULL
> >  };
> >
> > @@ -552,6 +552,12 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
> >       return modify_pattern_list(argc, argv, m);
> >  }
> >
> > +static int sparse_checkout_update(int argc, const char **argv)
> > +{
> > +     repo_read_index(the_repository);
> > +     return update_working_directory(NULL);
> > +}
> > +
>
> Short and sweet! I suppose my earlier comment about whether
> repo_read_index() was necessary is answered here. Perhaps it
> should be part of update_working_directory()? (And pass a
> repository pointer to it?)

Good question.  Is there a chance we want to make
update_working_directory() available to other areas of git outside of
sparse-checkout.c?  If so, potentially re-reading the index might not
be friendly, but if sparse-checkout.c is going to remain the only
caller then it probably makes sense to move it inside.
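
One way to picture a middle ground, purely as a sketch: have the helper take a
repository pointer and read the index only when none is in memory yet, so that
an outside caller that already holds an index is not forced into a re-read.
Everything below is hypothetical (`struct toy_repo`, `toy_read_index()`,
`toy_update_working_directory()`); it models the idea and makes no claim about
how the real functions behave.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a repository; not git's struct repository. */
struct toy_repo {
	int index_loaded;   /* 1 once the index has been read into memory */
	int index_reads;    /* how many times the index was actually read */
};

/* Read the index only if it is not already in memory. */
int toy_read_index(struct toy_repo *r)
{
	assert(r != NULL);
	if (r->index_loaded)
		return 0;           /* caller already has an index; do not re-read */
	r->index_loaded = 1;
	r->index_reads++;
	return 0;
}

/* Variant that takes the repository and loads the index itself. */
int toy_update_working_directory(struct toy_repo *r)
{
	if (toy_read_index(r) < 0)
		return -1;
	/* ... apply the sparsity patterns to the working tree here ... */
	return 0;
}
```

With this shape, calling `toy_update_working_directory()` twice on the same
repository performs a single read.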

> >  static int sparse_checkout_disable(int argc, const char **argv)
> >  {
> >       struct pattern_list pl;
> > @@ -601,6 +607,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
> >                       return sparse_checkout_set(argc, argv, prefix, REPLACE);
> >               if (!strcmp(argv[0], "add"))
> >                       return sparse_checkout_set(argc, argv, prefix, ADD);
> > +             if (!strcmp(argv[0], "update"))
> > +                     return sparse_checkout_update(argc, argv);
> >               if (!strcmp(argv[0], "disable"))
> >                       return sparse_checkout_disable(argc, argv);
> >       }
Derrick Stolee March 16, 2020, 5:18 p.m. UTC | #3
On 3/16/2020 1:05 PM, Elijah Newren wrote:
> On Sun, Mar 15, 2020 at 9:24 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 3/14/2020 3:11 AM, Elijah Newren via GitGitGadget wrote:
>>> From: Elijah Newren <newren@gmail.com>
>>> +static int sparse_checkout_update(int argc, const char **argv)
>>> +{
>>> +     repo_read_index(the_repository);
>>> +     return update_working_directory(NULL);
>>> +}
>>> +
>>
>> Short and sweet! I suppose my earlier comment about whether
>> repo_read_index() was necessary is answered here. Perhaps it
>> should be part of update_working_directory()? (And pass a
>> repository pointer to it?)
> 
> Good question.  Is there a chance we want to make
> update_working_directory() available to other areas of git outside of
> sparse-checkout.c?  If so, potentially re-reading the index might not
> be friendly, but if sparse-checkout.c is going to remain the only
> caller then it probably makes sense to move it inside.

Minh had an interesting idea during side-conversations at the summit:
have a way for an in-tree description of some sparse-checkout cones.
The idea was to be able to automatically update the sparse-checkout
while moving between commits that may have different dependency
configurations. In the world of Office it would mean that there is
some file ".sparse/word" that describes the directories required to
build Word, and ".sparse/ppt" for building PowerPoint. Then, based
on local Git config, we would see that we want our sparse-checkout
cone to match the union of the directories in .sparse/word and
.sparse/ppt. As we move HEAD, we would want to automatically update
the sparse cone when those files change.
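
As a rough illustration of the "union of the directories" step only: the
sketch below merges two directory lists while dropping duplicates. The
function names and the directory names in the usage note are invented for the
example; the format of the in-tree files is not specified in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if dir is already present in list[0..n). */
int toy_contains(const char **list, size_t n, const char *dir)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (!strcmp(list[i], dir))
			return 1;
	return 0;
}

/*
 * Append the directories from src[0..nsrc) to out[0..n), skipping
 * duplicates. out must have room for cap entries.
 * Returns the new length of out.
 */
size_t toy_cone_union(const char **out, size_t n, size_t cap,
		      const char **src, size_t nsrc)
{
	size_t i;

	assert(out && src && n <= cap);
	for (i = 0; i < nsrc; i++) {
		if (toy_contains(out, n, src[i]))
			continue;       /* already part of the cone */
		assert(n < cap);
		out[n++] = src[i];
	}
	return n;
}
```

Calling it once with a list such as { "shared/core", "apps/word" } and again
with { "shared/core", "apps/ppt" } yields a three-entry cone in which the
shared directory appears only once.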

I'm working on a design document for how this idea would work,
realistically, that I plan to share here and with the Office team
to see if it is actually a helpful plan. I think it would reduce
the performance cost of the hook we plan to use for this, and
would reduce the investment needed for a project to adopt
sparse-checkout.

All that is to say, yes we may want to add other callers to
update_working_directory() outside of the sparse-checkout
builtin. With that in mind, perhaps its name should reflect
the fact that we are only updating it according to the sparse
cone?

Thanks,
-Stolee
Elijah Newren March 16, 2020, 7:23 p.m. UTC | #4
On Mon, Mar 16, 2020 at 10:18 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/16/2020 1:05 PM, Elijah Newren wrote:
> > On Sun, Mar 15, 2020 at 9:24 AM Derrick Stolee <stolee@gmail.com> wrote:
> >>
> >> On 3/14/2020 3:11 AM, Elijah Newren via GitGitGadget wrote:
> >>> From: Elijah Newren <newren@gmail.com>
> >>> +static int sparse_checkout_update(int argc, const char **argv)
> >>> +{
> >>> +     repo_read_index(the_repository);
> >>> +     return update_working_directory(NULL);
> >>> +}
> >>> +
> >>
> >> Short and sweet! I suppose my earlier comment about whether
> >> repo_read_index() was necessary is answered here. Perhaps it
> >> should be part of update_working_directory()? (And pass a
> >> repository pointer to it?)
> >
> > Good question.  Is there a chance we want to make
> > update_working_directory() available to other areas of git outside of
> > sparse-checkout.c?  If so, potentially re-reading the index might not
> > be friendly, but if sparse-checkout.c is going to remain the only
> > caller then it probably makes sense to move it inside.
>
> Minh had an interesting idea during side-conversations at the summit:
> have a way for an in-tree description of some sparse-checkout cones.
> The idea was to be able to automatically update the sparse-checkout
> while moving between commits that may have different dependency
> configurations. In the world of Office it would mean that there is
> some file ".sparse/word" that describes the directories required to
> build Word, and ".sparse/ppt" for building PowerPoint. Then, based
> on local Git config, we would see that we want our sparse-checkout
> cone to match the union of the directories in .sparse/word and
> .sparse/ppt. As we move HEAD, we would want to automatically update
> the sparse cone when those files change.
>
> I'm working on a design document for how this idea would work,
> realistically, that I plan to share here and with the Office team
> to see if it is actually a helpful plan. I think it would reduce
> the performance cost of the hook we plan to use for this, and
> would reduce the investment needed for a project to adopt
> sparse-checkout.
>
> All that is to say, yes we may want to add other callers to
> update_working_directory() outside of the sparse-checkout
> builtin. With that in mind, perhaps its name should reflect
> the fact that we are only updating it according to the sparse
> cone?
>
> Thanks,
> -Stolee

Interesting.  Some context on another use case (which may not modify
your plans but I'll throw it out there for consideration):

For us, we have a bunch of modules/* directories.  Each has a file
which lists the other modules it directly depends upon.  Thus to get
all dependencies both direct and indirect, something has to walk that
DAG.  Being required to list the dependencies both in some place that
the build system understands and in one that git understands doesn't
sound like fun.  Also, requiring users to list all transitive
dependencies, or to remember to run some script to do so, sounds
problematic.
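
To make the DAG walk concrete, here is a minimal sketch of computing the
direct and indirect dependencies of one module with a depth-first traversal.
The adjacency-matrix representation, the size limit and all names are
assumptions made for the example; the actual dependency files are not
described in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MODULES 16

/* deps[i][j] != 0 means module i directly depends on module j. */
struct toy_graph {
	size_t nr;
	unsigned char deps[TOY_MAX_MODULES][TOY_MAX_MODULES];
};

/* Depth-first walk: mark module m and everything reachable from it. */
void toy_mark_needed(const struct toy_graph *g, size_t m,
		     unsigned char *needed)
{
	size_t j;

	assert(g && needed && m < g->nr && g->nr <= TOY_MAX_MODULES);
	if (needed[m])
		return;             /* already visited; also guards against cycles */
	needed[m] = 1;
	for (j = 0; j < g->nr; j++)
		if (g->deps[m][j])
			toy_mark_needed(g, j, needed);
}

/* Count the modules marked as needed. */
size_t toy_count_needed(const unsigned char *needed, size_t nr)
{
	size_t i, count = 0;
	for (i = 0; i < nr; i++)
		count += needed[i] ? 1 : 0;
	return count;
}
```

Starting the walk from each module the user asked for, with a shared `needed`
array, gives the full set of directories that would have to be in the cone.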

We do have a special file that defines teams, e.g. team-1 means these
three modules (plus implicitly any of their direct and indirect
dependencies), team-2 means this one module, etc.

Also, we do record the user's specification of the modules/teams they
want already, but not within the repo as you're doing in e.g.
.sparse/team-1, .sparse/team-2.  If the user runs './sparsify
--modules A B', we record the modules in
.git/info/sparse-module-specification.  This differs from
.git/info/sparse-checkout because the latter has full paths ("module/A"
and "module/B" instead of just "A" and "B") and because it has
transitive dependencies (thus may have hundreds of directories even if
the user just specified two).

git would thus be unable to use our
.git/info/sparse-module-specification to do updates, and as above we
don't want to have to store the dependencies in another place, and the
fully resolved ones at that.  However, we do get partial auto-updating
because the build system has a pre-build hook that essentially runs
`git sparse-checkout reapply` whenever any relevant
dependency-declaration file is newer than .git/info/sparse-checkout.
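
The "is newer than" test in such a hook boils down to a timestamp comparison.
A minimal sketch, assuming the modification times have already been collected
(for example from the `st_mtime` field filled in by stat(2)); the function
name is hypothetical and the real hook is not shown in this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Return 1 if any dependency-declaration file was modified after the
 * sparse-checkout file, i.e. the recorded cone may be stale.
 */
int toy_cone_is_stale(const time_t *dep_mtimes, size_t n, time_t sparse_mtime)
{
	size_t i;

	assert(dep_mtimes || n == 0);
	for (i = 0; i < n; i++)
		if (difftime(dep_mtimes[i], sparse_mtime) > 0)
			return 1;       /* at least one file is newer */
	return 0;
}
```

A hook built on this would refresh the checkout only when the function
returns 1, so an unchanged tree costs a handful of stat calls per build.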

Of course, waiting until a build may be good enough for us, but others
might want updates when they switch branches or do other operations
(merge, rebase, cherry-pick, revert, am, reset, etc.).  In such a
case, maybe we could use some kind of hook?  Is this what
post-index-change is for?  (If not, I certainly don't want to try to
navigate post-checkout and post-merge and add post-* for all the other
operations).

Anyway, some food for thought while you're working in this area...

Patch

diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index c0342e53938..27f4392489f 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -70,6 +70,16 @@  C-style quoted strings.
 	`core.sparseCheckoutCone` is enabled, the given patterns are interpreted
 	as directory names as in the 'set' subcommand.
 
+'update'::
+	Update the sparseness of paths in the working tree based on the
+	existing patterns.  Commands like merge or rebase can materialize
+	paths to do their work (e.g. in order to show you a conflict), and
+	other sparse-checkout commands might fail to sparsify an individual
+	file (e.g. because it has unstaged changes or conflicts).  In such
+	cases, it can make sense to run `git sparse-checkout update` later
+	after cleaning up affected paths (e.g. resolving conflicts, undoing
+	or committing changes, etc.).
+
 'disable'::
 	Disable the `core.sparseCheckout` config setting, and restore the
 	working directory to include all files. Leaves the sparse-checkout
diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
index 5d3ec2e6be9..2ae21011dfd 100644
--- a/builtin/sparse-checkout.c
+++ b/builtin/sparse-checkout.c
@@ -18,7 +18,7 @@ 
 static const char *empty_base = "";
 
 static char const * const builtin_sparse_checkout_usage[] = {
-	N_("git sparse-checkout (init|list|set|add|disable) <options>"),
+	N_("git sparse-checkout (init|list|set|add|update|disable) <options>"),
 	NULL
 };
 
@@ -552,6 +552,12 @@  static int sparse_checkout_set(int argc, const char **argv, const char *prefix,
 	return modify_pattern_list(argc, argv, m);
 }
 
+static int sparse_checkout_update(int argc, const char **argv)
+{
+	repo_read_index(the_repository);
+	return update_working_directory(NULL);
+}
+
 static int sparse_checkout_disable(int argc, const char **argv)
 {
 	struct pattern_list pl;
@@ -601,6 +607,8 @@  int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
 			return sparse_checkout_set(argc, argv, prefix, REPLACE);
 		if (!strcmp(argv[0], "add"))
 			return sparse_checkout_set(argc, argv, prefix, ADD);
+		if (!strcmp(argv[0], "update"))
+			return sparse_checkout_update(argc, argv);
 		if (!strcmp(argv[0], "disable"))
 			return sparse_checkout_disable(argc, argv);
 	}