diff mbox series

[v3,5/5] config: add setting to ignore sparsity patterns in some cmds

Message ID 748b1e955ccdcf7cd897a180f4b4fe82d7bfcf00.1590627264.git.matheus.bernardino@usp.br (mailing list archive)
State New, archived
Headers show
Series grep: honor sparse checkout and add option to ignore it | expand

Commit Message

Matheus Tavares May 28, 2020, 1:13 a.m. UTC
When sparse checkout is enabled, some users expect the output of certain
commands (such as grep, diff, and log) to be also restricted within the
sparsity patterns. This would allow them to effectively work only on the
subset of files in which they are interested; and allow some commands to
possibly perform better, by not considering uninteresting paths. For
this reason, we taught grep to honor the sparsity patterns, in the
previous patch. But, on the other hand, allowing grep and the other
commands mentioned to optionally ignore the patterns also make for some
interesting use cases. E.g. using grep to search for a function
documentation that resides outside the sparse checkout.

In any case, there is no current way for users to configure the behavior
they want for these commands. Aiming to provide this flexibility, let's
introduce the sparse.restrictCmds setting (and the analogous
--[no]-restrict-to-sparse-paths global option). The default value is
true. For now, grep is the only one affected by this setting, but the
goal is to have support for more commands, in the future.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/config.txt               |   2 +
 Documentation/config/sparse.txt        |  24 +++++
 Documentation/git-grep.txt             |   3 +
 Documentation/git.txt                  |   4 +
 Makefile                               |   1 +
 builtin/grep.c                         |  13 ++-
 contrib/completion/git-completion.bash |   2 +
 git.c                                  |   6 ++
 sparse-checkout.c                      |  16 +++
 sparse-checkout.h                      |  11 +++
 t/t7817-grep-sparse-checkout.sh        | 132 ++++++++++++++++++++++++-
 t/t9902-completion.sh                  |   4 +-
 12 files changed, 212 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/sparse.txt
 create mode 100644 sparse-checkout.c
 create mode 100644 sparse-checkout.h

Comments

Elijah Newren May 30, 2020, 4:18 p.m. UTC | #1
On Wed, May 27, 2020 at 6:14 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> When sparse checkout is enabled, some users expect the output of certain
> commands (such as grep, diff, and log) to be also restricted within the
> sparsity patterns. This would allow them to effectively work only on the
> subset of files in which they are interested; and allow some commands to
> possibly perform better, by not considering uninteresting paths. For
> this reason, we taught grep to honor the sparsity patterns, in the
> previous patch. But, on the other hand, allowing grep and the other
> commands mentioned to optionally ignore the patterns also make for some
> interesting use cases. E.g. using grep to search for a function
> documentation that resides outside the sparse checkout.
>
> In any case, there is no current way for users to configure the behavior
> they want for these commands. Aiming to provide this flexibility, let's
> introduce the sparse.restrictCmds setting (and the analogous
> --[no]-restrict-to-sparse-paths global option). The default value is
> true. For now, grep is the only one affected by this setting, but the
> goal is to have support for more commands, in the future.
>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> ---
>  Documentation/config.txt               |   2 +
>  Documentation/config/sparse.txt        |  24 +++++
>  Documentation/git-grep.txt             |   3 +
>  Documentation/git.txt                  |   4 +
>  Makefile                               |   1 +
>  builtin/grep.c                         |  13 ++-
>  contrib/completion/git-completion.bash |   2 +
>  git.c                                  |   6 ++
>  sparse-checkout.c                      |  16 +++
>  sparse-checkout.h                      |  11 +++
>  t/t7817-grep-sparse-checkout.sh        | 132 ++++++++++++++++++++++++-
>  t/t9902-completion.sh                  |   4 +-
>  12 files changed, 212 insertions(+), 6 deletions(-)
>  create mode 100644 Documentation/config/sparse.txt
>  create mode 100644 sparse-checkout.c
>  create mode 100644 sparse-checkout.h
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index ef0768b91a..fd74b80302 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -436,6 +436,8 @@ include::config/sequencer.txt[]
>
>  include::config/showbranch.txt[]
>
> +include::config/sparse.txt[]
> +
>  include::config/splitindex.txt[]
>
>  include::config/ssh.txt[]
> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> new file mode 100644
> index 0000000000..2a25b4b8ef
> --- /dev/null
> +++ b/Documentation/config/sparse.txt
> @@ -0,0 +1,24 @@
> +sparse.restrictCmds::
> +       Only meaningful in conjunction with core.sparseCheckout. This option
> +       extends sparse checkouts (which limit which paths are written to the
> +       working tree), so that output and operations are also limited to the
> +       sparsity paths where possible and implemented. The purpose of this
> +       option is to (1) focus output for the user on the portion of the
> +       repository that is of interest to them, and (2) enable potentially
> +       dramatic performance improvements, especially in conjunction with
> +       partial clones.
> ++
> +When this option is true (default), some git commands may limit their behavior
> +to the paths specified by the sparsity patterns, or to the intersection of
> +those paths and any (like `*.c`) that the user might also specify on the
> +command line. When false, the affected commands will work on full trees,
> +ignoring the sparsity patterns. For now, only git-grep honors this setting. In
> +this command, the restriction takes effect in three cases: with --cached; when
> +a commit-ish is given; when searching a working tree where some paths excluded
> +by the sparsity patterns are present (e.g. manually created paths or not
> +removed submodules).

I think "In this command, the restriction takes effect..." to the end
of the paragraph should be removed.  I don't want every subcommand's
behavior to be specified here; it'll grow unreadably long and be more
likely to eventually go stale.

> ++
> +Note: commands which export, integrity check, or create history will always
> +operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.),
> +unaffected by any sparsity patterns. Also, writting commands such as
> +sparse-checkout and read-tree will not be affected by this configuration.

s/writting/writing/

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index 9bdf807584..abbf100109 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -41,6 +41,9 @@ characters.  An empty string as search expression matches all lines.
>  CONFIGURATION
>  -------------
>
> +git-grep honors the sparse.restrictCmds setting. See its definition in
> +linkgit:git-config[1].
> +
>  :git-grep: 1
>  include::config/grep.txt[]
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 9d6769e95a..5e107c6246 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -180,6 +180,10 @@ If you just want to run git as if it was started in `<path>` then use
>         Do not perform optional operations that require locks. This is
>         equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
>
> +--[no-]restrict-to-sparse-paths::
> +       Overrides the sparse.restrictCmds configuration (see
> +       linkgit:git-config[1]) for this execution.
> +
>  --list-cmds=group[,group...]::
>         List commands by group. This is an internal/experimental
>         option and may change or be removed in the future. Supported
> diff --git a/Makefile b/Makefile
> index 90aa329eb7..0c0013b32c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -983,6 +983,7 @@ LIB_OBJS += sha1-name.o
>  LIB_OBJS += shallow.o
>  LIB_OBJS += sideband.o
>  LIB_OBJS += sigchain.o
> +LIB_OBJS += sparse-checkout.o
>  LIB_OBJS += split-index.o
>  LIB_OBJS += stable-qsort.o
>  LIB_OBJS += strbuf.o
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 11e33b8aee..cc696dab4a 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -25,6 +25,7 @@
>  #include "submodule-config.h"
>  #include "object-store.h"
>  #include "packfile.h"
> +#include "sparse-checkout.h"
>
>  static char const * const grep_usage[] = {
>         N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
> @@ -498,6 +499,7 @@ static int grep_cache(struct grep_opt *opt,
>         int nr;
>         struct strbuf name = STRBUF_INIT;
>         int name_base_len = 0;
> +       int sparse_paths_only = restrict_to_sparse_paths(repo);
>         if (repo->submodule_prefix) {
>                 name_base_len = strlen(repo->submodule_prefix);
>                 strbuf_addstr(&name, repo->submodule_prefix);
> @@ -509,7 +511,7 @@ static int grep_cache(struct grep_opt *opt,
>         for (nr = 0; nr < repo->index->cache_nr; nr++) {
>                 const struct cache_entry *ce = repo->index->cache[nr];
>
> -               if (ce_skip_worktree(ce))
> +               if (sparse_paths_only && ce_skip_worktree(ce))
>                         continue;
>
>                 strbuf_setlen(&name, name_base_len);
> @@ -715,9 +717,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>                      int is_root_tree)
>  {
>         struct pattern_list *patterns = NULL;
> +       int sparse_paths_only = restrict_to_sparse_paths(opt->repo);
>         int ret;
>
> -       if (is_root_tree)
> +       if (is_root_tree && sparse_paths_only)
>                 patterns = get_sparsity_patterns(opt->repo);
>
>         ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree,

It's kinda nice how clean and easy it is to insert this new option
after the previous patch.

> @@ -1257,6 +1260,12 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>
>         if (!use_index || untracked) {
>                 int use_exclude = (opt_exclude < 0) ? use_index : !!opt_exclude;
> +
> +               if (opt_restrict_to_sparse_paths >= 0) {
> +                       die(_("--[no-]restrict-to-sparse-paths is incompatible"
> +                                 " with --no-index and --untracked"));
> +               }
> +
>                 hit = grep_directory(&opt, &pathspec, use_exclude, use_index);
>         } else if (0 <= opt_exclude) {
>                 die(_("--[no-]exclude-standard cannot be used for tracked contents"));
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index 70ad04e1b2..71956f7313 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -3208,6 +3208,8 @@ __git_main ()
>                         --namespace=
>                         --no-replace-objects
>                         --help
> +                       --restrict-to-sparse-paths
> +                       --no-restrict-to-sparse-paths
>                         "
>                         ;;
>                 *)
> diff --git a/git.c b/git.c
> index a2d337eed7..6db1382ae4 100644
> --- a/git.c
> +++ b/git.c
> @@ -38,6 +38,7 @@ const char git_more_info_string[] =
>            "See 'git help git' for an overview of the system.");
>
>  static int use_pager = -1;
> +int opt_restrict_to_sparse_paths = -1;
>
>  static void list_builtins(struct string_list *list, unsigned int exclude_option);
>
> @@ -311,6 +312,10 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>                         } else {
>                                 exit(list_cmds(cmd));
>                         }
> +               } else if (!strcmp(cmd, "--restrict-to-sparse-paths")) {
> +                       opt_restrict_to_sparse_paths = 1;
> +               } else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) {
> +                       opt_restrict_to_sparse_paths = 0;
>                 } else {
>                         fprintf(stderr, _("unknown option: %s\n"), cmd);
>                         usage(git_usage_string);
> @@ -319,6 +324,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>                 (*argv)++;
>                 (*argc)--;
>         }
> +
>         return (*argv) - orig_argv;
>  }
>

Why the stray whitespace change?

> diff --git a/sparse-checkout.c b/sparse-checkout.c
> new file mode 100644
> index 0000000000..9a9e50fd29
> --- /dev/null
> +++ b/sparse-checkout.c
> @@ -0,0 +1,16 @@
> +#include "cache.h"
> +#include "config.h"
> +#include "sparse-checkout.h"
> +
> +int restrict_to_sparse_paths(struct repository *repo)
> +{
> +       int ret;
> +
> +       if (opt_restrict_to_sparse_paths >= 0)
> +               return opt_restrict_to_sparse_paths;
> +
> +       if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret))
> +               ret = 1;
> +
> +       return ret;
> +}

Do we want to considering renaming this file to sparse.c, since it's
for sparse grep and sparse diff and etc., not just for the checkout
piece?  It would also go along well with our toplevel related config
being in the "sparse" namespace.

> diff --git a/sparse-checkout.h b/sparse-checkout.h
> new file mode 100644
> index 0000000000..1de3b588d8
> --- /dev/null
> +++ b/sparse-checkout.h
> @@ -0,0 +1,11 @@
> +#ifndef SPARSE_CHECKOUT_H
> +#define SPARSE_CHECKOUT_H
> +
> +struct repository;
> +
> +extern int opt_restrict_to_sparse_paths; /* from git.c */
> +
> +/* Whether or not cmds should restrict behavior on sparse paths, in this repo */
> +int restrict_to_sparse_paths(struct repository *repo);
> +
> +#endif /* SPARSE_CHECKOUT_H */
> diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
> index ce080cf572..1aef084186 100755
> --- a/t/t7817-grep-sparse-checkout.sh
> +++ b/t/t7817-grep-sparse-checkout.sh
> @@ -80,10 +80,10 @@ test_expect_success 'setup' '
>         test_path_is_file sub2/a
>  '
>
> -# The test bellow checks a special case: the sparsity patterns exclude '/b'
> +# The two tests bellow check a special case: the sparsity patterns exclude '/b'
>  # and sparse checkout is enable, but the path exists on the working tree (e.g.
>  # manually created after `git sparse-checkout init`). In this case, grep should
> -# skip it.
> +# skip the file by default, but not with --no-restrict-to-sparse-paths.
>  test_expect_success 'grep in working tree should honor sparse checkout' '
>         cat >expect <<-EOF &&
>         a:text
> @@ -93,6 +93,16 @@ test_expect_success 'grep in working tree should honor sparse checkout' '
>         git grep "text" >actual &&
>         test_cmp expect actual
>  '
> +test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' '
> +       cat >expect <<-EOF &&
> +       a:text
> +       b:new-text
> +       EOF
> +       echo "new-text" >b &&
> +       test_when_finished "rm b" &&
> +       git --no-restrict-to-sparse-paths grep "text" >actual &&
> +       test_cmp expect actual
> +'
>
>  test_expect_success 'grep --cached should honor sparse checkout' '
>         cat >expect <<-EOF &&
> @@ -136,7 +146,7 @@ test_expect_success 'grep <tree-ish> should ignore sparsity patterns' '
>  '
>
>  # Note that sub2/ is present in the worktree but it is excluded by the sparsity
> -# patterns, so grep should not recurse into it.
> +# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths.
>  test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' '
>         cat >expect <<-EOF &&
>         a:text
> @@ -145,6 +155,15 @@ test_expect_success 'grep --recurse-submodules should honor sparse checkout in s
>         git grep --recurse-submodules "text" >actual &&
>         test_cmp expect actual
>  '
> +test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' '
> +       cat >expect <<-EOF &&
> +       a:text
> +       sub/B/b:text
> +       sub2/a:text
> +       EOF
> +       git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual &&
> +       test_cmp expect actual
> +'
>
>  test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' '
>         cat >expect <<-EOF &&
> @@ -171,4 +190,111 @@ test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse
>         test_cmp expect_tag-to-commit actual_tag-to-commit
>  '
>
> +for cmd in 'git --no-restrict-to-sparse-paths grep' \
> +          'git -c sparse.restrictCmds=false grep' \
> +          'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep'
> +do
> +
> +       test_expect_success "$cmd --cached should ignore sparsity patterns" '
> +               cat >expect <<-EOF &&
> +               a:text
> +               b:text
> +               dir/c:text
> +               EOF
> +               $cmd --cached "text" >actual &&
> +               test_cmp expect actual
> +       '
> +
> +       test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" '
> +               commit=$(git rev-parse HEAD) &&
> +               cat >expect_commit <<-EOF &&
> +               $commit:a:text
> +               $commit:b:text
> +               $commit:dir/c:text
> +               EOF
> +               cat >expect_tag-to-commit <<-EOF &&
> +               tag-to-commit:a:text
> +               tag-to-commit:b:text
> +               tag-to-commit:dir/c:text
> +               EOF
> +               $cmd "text" $commit >actual_commit &&
> +               test_cmp expect_commit actual_commit &&
> +               $cmd "text" tag-to-commit >actual_tag-to-commit &&
> +               test_cmp expect_tag-to-commit actual_tag-to-commit
> +       '
> +done
> +
> +test_expect_success 'grep --recurse-submodules --cached \w --no-restrict-to-sparse-paths' '

s%\w%w/%, or s%\w%with%?  Same issue below too.

> +       cat >expect <<-EOF &&
> +       a:text
> +       b:text
> +       dir/c:text
> +       sub/A/a:text
> +       sub/B/b:text
> +       sub2/a:text
> +       EOF
> +       git --no-restrict-to-sparse-paths grep --recurse-submodules --cached \
> +               "text" >actual &&
> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'grep --recurse-submodules <commit-ish> \w --no-restrict-to-sparse-paths' '
> +       commit=$(git rev-parse HEAD) &&
> +       cat >expect_commit <<-EOF &&
> +       $commit:a:text
> +       $commit:b:text
> +       $commit:dir/c:text
> +       $commit:sub/A/a:text
> +       $commit:sub/B/b:text
> +       $commit:sub2/a:text
> +       EOF
> +       cat >expect_tag-to-commit <<-EOF &&
> +       tag-to-commit:a:text
> +       tag-to-commit:b:text
> +       tag-to-commit:dir/c:text
> +       tag-to-commit:sub/A/a:text
> +       tag-to-commit:sub/B/b:text
> +       tag-to-commit:sub2/a:text
> +       EOF
> +       git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \
> +               $commit >actual_commit &&
> +       test_cmp expect_commit actual_commit &&
> +       git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \
> +               tag-to-commit >actual_tag-to-commit &&
> +       test_cmp expect_tag-to-commit actual_tag-to-commit
> +'
> +
> +test_expect_success 'should respect the sparse.restrictCmds values from submodules' '
> +       cat >expect <<-EOF &&
> +       a:text
> +       sub/A/a:text
> +       sub/B/b:text
> +       EOF
> +       test_config -C sub sparse.restrictCmds false &&
> +       git grep --cached --recurse-submodules "text" >actual &&
> +       test_cmp expect actual
> +'
> +
> +test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' '
> +       cat >expect <<-EOF &&
> +       a:text
> +       b:text
> +       dir/c:text
> +       sub/A/a:text
> +       sub/B/b:text
> +       sub2/a:text
> +       EOF
> +       test_config -C sub sparse.restrictCmds true &&
> +       git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual &&
> +       test_cmp expect actual
> +'
> +
> +for opt in '--untracked' '--no-index'
> +do
> +       test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" "
> +               test_must_fail git --restrict-to-sparse-paths grep $opt . 2>actual &&
> +               test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual
> +       "
> +done
> +
>  test_done
> diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
> index 3c44af6940..a4a7767e06 100755
> --- a/t/t9902-completion.sh
> +++ b/t/t9902-completion.sh
> @@ -1473,6 +1473,8 @@ test_expect_success 'double dash "git" itself' '
>         --namespace=
>         --no-replace-objects Z
>         --help Z
> +       --restrict-to-sparse-paths Z
> +       --no-restrict-to-sparse-paths Z
>         EOF
>  '
>
> @@ -1515,7 +1517,7 @@ test_expect_success 'general options' '
>         test_completion "git --nam" "--namespace=" &&
>         test_completion "git --bar" "--bare " &&
>         test_completion "git --inf" "--info-path " &&
> -       test_completion "git --no-r" "--no-replace-objects "
> +       test_completion "git --no-rep" "--no-replace-objects "
>  '

All these testcases look great (modulo the small typo I pointed out
earlier); I kept thinking "but what about case <x>?" and then I kept
reading and saw you covered it.  You even added some I wasn't thinking
about and might have overlooked but seem important.
Matheus Tavares June 1, 2020, 4:45 a.m. UTC | #2
On Sat, May 30, 2020 at 1:18 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, May 27, 2020 at 6:14 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> > new file mode 100644
> > index 0000000000..2a25b4b8ef
> > --- /dev/null
> > +++ b/Documentation/config/sparse.txt
> > @@ -0,0 +1,24 @@
> > +sparse.restrictCmds::
> > +       Only meaningful in conjunction with core.sparseCheckout. This option
> > +       extends sparse checkouts (which limit which paths are written to the
> > +       working tree), so that output and operations are also limited to the
> > +       sparsity paths where possible and implemented. The purpose of this
> > +       option is to (1) focus output for the user on the portion of the
> > +       repository that is of interest to them, and (2) enable potentially
> > +       dramatic performance improvements, especially in conjunction with
> > +       partial clones.
> > ++
> > +When this option is true (default), some git commands may limit their behavior
> > +to the paths specified by the sparsity patterns, or to the intersection of
> > +those paths and any (like `*.c`) that the user might also specify on the
> > +command line. When false, the affected commands will work on full trees,
> > +ignoring the sparsity patterns. For now, only git-grep honors this setting. In
> > +this command, the restriction takes effect in three cases: with --cached; when
> > +a commit-ish is given; when searching a working tree where some paths excluded
> > +by the sparsity patterns are present (e.g. manually created paths or not
> > +removed submodules).
>
> I think "In this command, the restriction takes effect..." to the end
> of the paragraph should be removed.  I don't want every subcommand's
> behavior to be specified here; it'll grow unreadably long and be more
> likely to eventually go stale.

Yeah, I was also concerned about that. But wouldn't it be important to
inform the users how the setting takes place in grep (specially with
the corner cases)? And maybe others, in the future?

What if we move the information that is only relevant to a single
command into its own man page? I.e. git-grep.txt would have something
like:

sparse.restrictCmds::
See complete definition in linkgit:git-config[1]. In grep, the
restriction takes effect in three cases: with --cached; when a
commit-ish is given; when searching a working tree where some paths
excluded by the sparsity patterns are present (e.g. manually created
paths or not removed submodules).

The only problem then is that the information would be a little
scattered... But I think it shouldn't be a big deal, as a person
interested in knowing how foo behaves with sparse.restrictCmds would
only need to look into foo's man page, anyway.

> > diff --git a/git.c b/git.c
> > index a2d337eed7..6db1382ae4 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -319,6 +324,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
> >                 (*argv)++;
> >                 (*argc)--;
> >         }
> > +
> >         return (*argv) - orig_argv;
> >  }
> >
>
> Why the stray whitespace change?

Oops, that shouldn't be there. Thanks!

>
> > diff --git a/sparse-checkout.c b/sparse-checkout.c
> > new file mode 100644
> > index 0000000000..9a9e50fd29
> > --- /dev/null
> > +++ b/sparse-checkout.c
> > @@ -0,0 +1,16 @@
> > +#include "cache.h"
> > +#include "config.h"
> > +#include "sparse-checkout.h"
> > +
> > +int restrict_to_sparse_paths(struct repository *repo)
> > +{
> > +       int ret;
> > +
> > +       if (opt_restrict_to_sparse_paths >= 0)
> > +               return opt_restrict_to_sparse_paths;
> > +
> > +       if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret))
> > +               ret = 1;
> > +
> > +       return ret;
> > +}
>
> Do we want to considering renaming this file to sparse.c, since it's
> for sparse grep and sparse diff and etc., not just for the checkout
> piece?  It would also go along well with our toplevel related config
> being in the "sparse" namespace.

Makes sense. But since Stolee is already working on
"sparse-checkout.c" [1], if we use "sparse.c" in this series we will
end up with two extra files. And as "sparse.c" is quite small, I think
we could unify into the "sparse-checkout.c".

[1]: https://lore.kernel.org/git/0181a134bfb6986dc0e54ae624c478446a1324a9.1588857462.git.gitgitgadget@gmail.com/

> > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
> > index ce080cf572..1aef084186 100755
> > --- a/t/t7817-grep-sparse-checkout.sh
> > +++ b/t/t7817-grep-sparse-checkout.sh
>
> All these testcases look great (modulo the small typo I pointed out
> earlier); I kept thinking "but what about case <x>?" and then I kept
> reading and saw you covered it.  You even added some I wasn't thinking
> about and might have overlooked but seem important.

Thanks :)
Elijah Newren June 3, 2020, 2:39 a.m. UTC | #3
On Sun, May 31, 2020 at 9:46 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Sat, May 30, 2020 at 1:18 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Wed, May 27, 2020 at 6:14 PM Matheus Tavares
> > <matheus.bernardino@usp.br> wrote:
> > > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
> > > new file mode 100644
> > > index 0000000000..2a25b4b8ef
> > > --- /dev/null
> > > +++ b/Documentation/config/sparse.txt
> > > @@ -0,0 +1,24 @@
> > > +sparse.restrictCmds::
> > > +       Only meaningful in conjunction with core.sparseCheckout. This option
> > > +       extends sparse checkouts (which limit which paths are written to the
> > > +       working tree), so that output and operations are also limited to the
> > > +       sparsity paths where possible and implemented. The purpose of this
> > > +       option is to (1) focus output for the user on the portion of the
> > > +       repository that is of interest to them, and (2) enable potentially
> > > +       dramatic performance improvements, especially in conjunction with
> > > +       partial clones.
> > > ++
> > > +When this option is true (default), some git commands may limit their behavior
> > > +to the paths specified by the sparsity patterns, or to the intersection of
> > > +those paths and any (like `*.c`) that the user might also specify on the
> > > +command line. When false, the affected commands will work on full trees,
> > > +ignoring the sparsity patterns. For now, only git-grep honors this setting. In
> > > +this command, the restriction takes effect in three cases: with --cached; when
> > > +a commit-ish is given; when searching a working tree where some paths excluded
> > > +by the sparsity patterns are present (e.g. manually created paths or not
> > > +removed submodules).
> >
> > I think "In this command, the restriction takes effect..." to the end
> > of the paragraph should be removed.  I don't want every subcommand's
> > behavior to be specified here; it'll grow unreadably long and be more
> > likely to eventually go stale.
>
> Yeah, I was also concerned about that. But wouldn't it be important to
> inform the users how the setting takes place in grep (specially with
> the corner cases)? And maybe others, in the future?
>
> What if we move the information that is only relevant to a single
> command into its own man page? I.e. git-grep.txt would have something
> like:

Moving it to grep's manpage seems ideal to me.  grep's behavior should
be defined in grep's manual.

> sparse.restrictCmds::
> See complete definition in linkgit:git-config[1]. In grep, the
> restriction takes effect in three cases: with --cached; when a
> commit-ish is given; when searching a working tree where some paths
> excluded by the sparsity patterns are present (e.g. manually created
> paths or not removed submodules).

That looks more than a little confusing.  Could this definition be
something more like "See base definition in linkgit:git-config[1].
grep honors sparse.restrictCmds by limiting searches to the sparsity
paths in three cases: when searching the working tree, when searching
the index with --cached, or when searching a specified commit"

> The only problem then is that the information would be a little
> scattered... But I think it shouldn't be a big deal, as a person
> interested in knowing how foo behaves with sparse.restrictCmds would
> only need to look into foo's man page, anyway.
>
> > > diff --git a/git.c b/git.c
> > > index a2d337eed7..6db1382ae4 100644
> > > --- a/git.c
> > > +++ b/git.c
> > > @@ -319,6 +324,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
> > >                 (*argv)++;
> > >                 (*argc)--;
> > >         }
> > > +
> > >         return (*argv) - orig_argv;
> > >  }
> > >
> >
> > Why the stray whitespace change?
>
> Oops, that shouldn't be there. Thanks!
>
> >
> > > diff --git a/sparse-checkout.c b/sparse-checkout.c
> > > new file mode 100644
> > > index 0000000000..9a9e50fd29
> > > --- /dev/null
> > > +++ b/sparse-checkout.c
> > > @@ -0,0 +1,16 @@
> > > +#include "cache.h"
> > > +#include "config.h"
> > > +#include "sparse-checkout.h"
> > > +
> > > +int restrict_to_sparse_paths(struct repository *repo)
> > > +{
> > > +       int ret;
> > > +
> > > +       if (opt_restrict_to_sparse_paths >= 0)
> > > +               return opt_restrict_to_sparse_paths;
> > > +
> > > +       if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret))
> > > +               ret = 1;
> > > +
> > > +       return ret;
> > > +}
> >
> > Do we want to considering renaming this file to sparse.c, since it's
> > for sparse grep and sparse diff and etc., not just for the checkout
> > piece?  It would also go along well with our toplevel related config
> > being in the "sparse" namespace.
>
> Makes sense. But since Stolee is already working on
> "sparse-checkout.c" [1], if we use "sparse.c" in this series we will
> end up with two extra files. And as "sparse.c" is quite small, I think
> we could unify into the "sparse-checkout.c".
>
> [1]: https://lore.kernel.org/git/0181a134bfb6986dc0e54ae624c478446a1324a9.1588857462.git.gitgitgadget@gmail.com/

Or we could just suggest he use sparse.c too.  :-)

Stolee?


> > > diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
> > > index ce080cf572..1aef084186 100755
> > > --- a/t/t7817-grep-sparse-checkout.sh
> > > +++ b/t/t7817-grep-sparse-checkout.sh
> >
> > All these testcases look great (modulo the small typo I pointed out
> > earlier); I kept thinking "but what about case <x>?" and then I kept
> > reading and saw you covered it.  You even added some I wasn't thinking
> > about and might have overlooked but seem important.
>
> Thanks :)
Matheus Tavares June 10, 2020, 9:15 p.m. UTC | #4
On Tue, Jun 2, 2020 at 11:40 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Sun, May 31, 2020 at 9:46 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
>
> Moving it to grep's manpage seems ideal to me.  grep's behavior should
> be defined in grep's manual.
>
> > sparse.restrictCmds::
> > See complete definition in linkgit:git-config[1]. In grep, the
> > restriction takes effect in three cases: with --cached; when a
> > commit-ish is given; when searching a working tree where some paths
> > excluded by the sparsity patterns are present (e.g. manually created
> > paths or not removed submodules).
>
> That looks more than a little confusing.  Could this definition be
> something more like "See base definition in linkgit:git-config[1].
> grep honors sparse.restrictCmds by limiting searches to the sparsity
> paths in three cases: when searching the working tree, when searching
> the index with --cached, or when searching a specified commit"

Yes, this looks better, thanks. I would only add a brief explanation
on what we mean by limiting the search in the working tree case. Since
the working tree should already contain only the sparse paths (in most
cases), I think this sentence may sound a little confusing without
some explanation. Even further, some users might expect that `git -c
sparse.restrictCmds=false grep $pattern` would restore the previous
behavior of falling back to the cache for non-present entries, which
is not true.

In particular, I would like to emphasize that the use for
`sparse.restrictCmds=false` in the working tree case, is for
situations like the one you described in [1]:

* uses sparse-checkout to remove a bunch of files/directories they
don't care about
* creates a new file that happens to have the same name as an
(unfortunately) generically worded filename that exists in the index
(but is marked SKIP_WORKTREE and had previously been removed)

In this situation, grep would ignore the said file by default, but
search it with `sparse.restrictCmds=false`.

So what do you think of the following:

sparse.restrictCmds::
See base definition in linkgit:git-config[1]. grep honors
sparse.restrictCmds by limiting searches to the sparsity paths in
three cases: when searching the working tree, when searching the index
with --cached, and when searching a specified commit. Note: when this
option is set to true (default), the working tree search will ignore
paths that are present despite not matching the sparsity patterns.
This can happen, for example, if you create a new file in a path that
was previously removed by git-sparse-checkout. Or if you don't
deinitialize a submodule that is excluded by the sparsity patterns
(thus remaining in the working copy, anyway).

[1]: https://lore.kernel.org/git/CABPp-BE+BL3Nq=Co=-kNB_wr=6gqX8zcGwa0ega_pGBpk6xYsg@mail.gmail.com/
Elijah Newren June 11, 2020, 12:35 a.m. UTC | #5
Hi Matheus,

On Wed, Jun 10, 2020 at 2:15 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Tue, Jun 2, 2020 at 11:40 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Sun, May 31, 2020 at 9:46 PM Matheus Tavares Bernardino
> > <matheus.bernardino@usp.br> wrote:
> > >
> >
> > Moving it to grep's manpage seems ideal to me.  grep's behavior should
> > be defined in grep's manual.
> >
> > > sparse.restrictCmds::
> > > See complete definition in linkgit:git-config[1]. In grep, the
> > > restriction takes effect in three cases: with --cached; when a
> > > commit-ish is given; when searching a working tree where some paths
> > > excluded by the sparsity patterns are present (e.g. manually created
> > > paths or not removed submodules).
> >
> > That looks more than a little confusing.  Could this definition be
> > something more like "See base definition in linkgit:git-config[1].
> > grep honors sparse.restrictCmds by limiting searches to the sparsity
> > paths in three cases: when searching the working tree, when searching
> > the index with --cached, or when searching a specified commit"
>
> Yes, this looks better, thanks. I would only add a brief explanation
> on what we mean by limiting the search in the working tree case.

Possibly, but I think it would be easy to go overboard here.

> Since
> the working tree should already contain only the sparse paths (in most
> cases), I think this sentence may sound a little confusing without
> some explanation.

That's an interesting flag.  I'm curious, though, would they be
confused by it, or would it just seem immediately obvious and almost
not worth mentioning?  In other words, would they think "Well, if you
use sparse-checkout to get just a subset of files checked out, it
totally makes sense that grep would be limited in that case.  Why do
they even need to mention it -- just for completeness, I guess?"

And even if not all users think that way, would a large percentage of
users around them think that way and point out the obviousness of the
docs?

If not, maybe we just add a "(obviously)" comment right after "working tree"?

> Even further, some users might expect that `git -c
> sparse.restrictCmds=false grep $pattern` would restore the previous
> behavior of falling back to the cache for non-present entries, which
> is not true.

10 years from now, I don't want our docs to consist of a long
explanation of all the bugs that existed in various ancient versions
of git and how modern behavior differs from each previous iteration.
There are times when it's worth calling out bugs in prior versions to
bring it to the attention of our users, but I don't see how this is
one of them.  The previous behavior was just outright buggy and
inconsistent, and from my viewpoint, was also a regression.  I think
it should have been reverted regardless of your series, though
skip_worktree stuff was dormant and went unused for a really long
time.

Also, this is a special area of git where focusing too much on
backward compatibility might actually be detrimental.  Backward
compatibility is a really good goal to keep in mind in general, but
the SKIP_WORKTREE usability was traditionally really, really bad -- so
much so that outright replacing was contemplated by its author[A], and
we placed a HUGE ALL CAPS DISCLAIMER in the documentation of
sparse-checkout about how users should expect the behavior of commands
to change[B].  So, unlike other areas of git, we should focus on
getting sparse-checkout behavior right more than on bug compatibility
with previous code and long migration stories.  Given the context of
such disclaimers and changes, the idea of trying to document those
changes makes me think that in the not too distant future we would
have the equivalent of the following humorous driving directions from
the era before smartphones: "To get to Joe's place, you turn right on
the first road after where Billy's Barn burned down 5 years ago..."
(when the burned Barn was cleared out 4 years ago and there's no
indication of where it once was)

[A] https://lore.kernel.org/git/CABPp-BGE-m_UFfUt_moXG-YR=ZW8hMzMwraD7fkFV-+sEHw36w@mail.gmail.com/
[B] https://git-scm.com/docs/git-sparse-checkout#_description

> In particular, I would like to emphasize that the use for
> `sparse.restrictCmds=false` in the working tree case, is for
> situations like the one you described in [1]:
>
> * uses sparse-checkout to remove a bunch of files/directories they
> don't care about
> * creates a new file that happens to have the same name as an
> (unfortunately) generically worded filename that exists in the index
> (but is marked SKIP_WORKTREE and had previously been removed)
>
> In this situation, grep would ignore the said file by default, but
> search it with `sparse.restrictCmds=false`.

I think this is such a weird and unusual case that I'm not sure it
merits mentioning in the docs.

But if others disagree and think this case is worth mentioning in the
docs, then it shouldn't just be mentioned in "git grep".  All affected
manpages should be updated to discuss how they handle this obscure
corner case.  For example, `git diff` and `git status` just ignore
these files and do not print out any information about them.  So it's
kind of like these files are ignored...but even `git status --ignored`
won't show anything about such files.

Anyway, I think this is a pretty obscure case whose discussion would
dilute the value of the manual in teaching people the basics of
commands.

> So what do you think of the following:
>
> sparse.restrictCmds::
> See base definition in linkgit:git-config[1]. grep honors
> sparse.restrictCmds by limiting searches to the sparsity paths in
> three cases: when searching the working tree, when searching the index
> with --cached, and when searching a specified commit.

Good up to here.  I think I'd like to use just this text as-is (or
maybe with the "(obviously)" addition) and then see if we get feedback
that we need clarifications, because I'm worried our attempts at
clarifying might backfire.  For example...

> Note: when this
> option is set to true (default), the working tree search will ignore
> paths that are present despite not matching the sparsity patterns.

You've run into the same problem Stolee and I did by trying to provide
details about one case, but overlooking others.  ;-)  This "Note:"
statement is not correct; there's a couple cases it gets wrong:

merge/rebase/cherry-pick can unset the SKIP_WORKTREE bit even for
paths that do not match the sparsity patterns in order to be able to
materialize a file and show conflicts.  In fact, they are allowed to
unset the bit for other files and materialize them too (see
https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/).
Such paths, despite not matching the sparsity patterns, will not have
the SKIP_WORKTREE bit set.  And it is the SKIP_WORKTREE bit, rather
than the sparsity patterns, that git-grep uses for deciding which
files in the working tree to search.

Also, if someone runs sparse-checkout init/set, and sparse-checkout
would normally remove some file but notices that the file has local
modifications, then sparse-checkout will avoid removing the file AND
will avoid setting the SKIP_WORKTREE bit on that file.  See commit
681c637b4a ("unpack-trees: failure to set SKIP_WORKTREE bits always
just a warning", 2020-03-27)

> This can happen, for example, if you create a new file in a path that
> was previously removed by git-sparse-checkout.

This is that obscure corner case discussed above.

> Or if you don't
> deinitialize a submodule that is excluded by the sparsity patterns
> (thus remaining in the working copy, anyway).

This case requires more thought.  If a submodule doesn't match the
sparsity patterns, we already said elsewhere that sparse-checkout
should not remove the submodule (since doing so would risk data loss).
But do we set the SKIP_WORKTREE bit for it?  Generally,
sparse-checkout avoids removing files with modifications, and if it
doesn't remove them it also doesn't set the SKIP_WORKTREE bit.  For
consistency, should sparse-checkout not set SKIP_WORKTREE for
initialized submodules?

If we don't set the SKIP_WORKTREE bit for initialized submodules, then
we don't actually have a second different case to mention here.

Granted, that's more an issue for `sparse-checkout` than `grep`.


Hope that all helps.  Let me know if it doesn't, if you disagree with
any parts, or some parts aren't clear.
Elijah
diff mbox series

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ef0768b91a..fd74b80302 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -436,6 +436,8 @@  include::config/sequencer.txt[]
 
 include::config/showbranch.txt[]
 
+include::config/sparse.txt[]
+
 include::config/splitindex.txt[]
 
 include::config/ssh.txt[]
diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
new file mode 100644
index 0000000000..2a25b4b8ef
--- /dev/null
+++ b/Documentation/config/sparse.txt
@@ -0,0 +1,24 @@ 
+sparse.restrictCmds::
+	Only meaningful in conjunction with core.sparseCheckout. This option
+	extends sparse checkouts (which limit which paths are written to the
+	working tree), so that output and operations are also limited to the
+	sparsity paths where possible and implemented. The purpose of this
+	option is to (1) focus output for the user on the portion of the
+	repository that is of interest to them, and (2) enable potentially
+	dramatic performance improvements, especially in conjunction with
+	partial clones.
++
+When this option is true (default), some git commands may limit their behavior
+to the paths specified by the sparsity patterns, or to the intersection of
+those paths and any (like `*.c`) that the user might also specify on the
+command line. When false, the affected commands will work on full trees,
+ignoring the sparsity patterns. For now, only git-grep honors this setting. In
+this command, the restriction takes effect in three cases: with --cached; when
+a commit-ish is given; when searching a working tree where some paths excluded
+by the sparsity patterns are present (e.g. manually created paths or not
+removed submodules).
++
+Note: commands which export, integrity check, or create history will always
+operate on full trees (e.g. fast-export, format-patch, fsck, commit, etc.),
+unaffected by any sparsity patterns. Also, writting commands such as
+sparse-checkout and read-tree will not be affected by this configuration.
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 9bdf807584..abbf100109 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -41,6 +41,9 @@  characters.  An empty string as search expression matches all lines.
 CONFIGURATION
 -------------
 
+git-grep honors the sparse.restrictCmds setting. See its definition in
+linkgit:git-config[1].
+
 :git-grep: 1
 include::config/grep.txt[]
 
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 9d6769e95a..5e107c6246 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -180,6 +180,10 @@  If you just want to run git as if it was started in `<path>` then use
 	Do not perform optional operations that require locks. This is
 	equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
 
+--[no-]restrict-to-sparse-paths::
+	Overrides the sparse.restrictCmds configuration (see
+	linkgit:git-config[1]) for this execution.
+
 --list-cmds=group[,group...]::
 	List commands by group. This is an internal/experimental
 	option and may change or be removed in the future. Supported
diff --git a/Makefile b/Makefile
index 90aa329eb7..0c0013b32c 100644
--- a/Makefile
+++ b/Makefile
@@ -983,6 +983,7 @@  LIB_OBJS += sha1-name.o
 LIB_OBJS += shallow.o
 LIB_OBJS += sideband.o
 LIB_OBJS += sigchain.o
+LIB_OBJS += sparse-checkout.o
 LIB_OBJS += split-index.o
 LIB_OBJS += stable-qsort.o
 LIB_OBJS += strbuf.o
diff --git a/builtin/grep.c b/builtin/grep.c
index 11e33b8aee..cc696dab4a 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -25,6 +25,7 @@ 
 #include "submodule-config.h"
 #include "object-store.h"
 #include "packfile.h"
+#include "sparse-checkout.h"
 
 static char const * const grep_usage[] = {
 	N_("git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]"),
@@ -498,6 +499,7 @@  static int grep_cache(struct grep_opt *opt,
 	int nr;
 	struct strbuf name = STRBUF_INIT;
 	int name_base_len = 0;
+	int sparse_paths_only =	restrict_to_sparse_paths(repo);
 	if (repo->submodule_prefix) {
 		name_base_len = strlen(repo->submodule_prefix);
 		strbuf_addstr(&name, repo->submodule_prefix);
@@ -509,7 +511,7 @@  static int grep_cache(struct grep_opt *opt,
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
-		if (ce_skip_worktree(ce))
+		if (sparse_paths_only && ce_skip_worktree(ce))
 			continue;
 
 		strbuf_setlen(&name, name_base_len);
@@ -715,9 +717,10 @@  static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		     int is_root_tree)
 {
 	struct pattern_list *patterns = NULL;
+	int sparse_paths_only = restrict_to_sparse_paths(opt->repo);
 	int ret;
 
-	if (is_root_tree)
+	if (is_root_tree && sparse_paths_only)
 		patterns = get_sparsity_patterns(opt->repo);
 
 	ret = do_grep_tree(opt, pathspec, tree, base, tn_len, is_root_tree,
@@ -1257,6 +1260,12 @@  int cmd_grep(int argc, const char **argv, const char *prefix)
 
 	if (!use_index || untracked) {
 		int use_exclude = (opt_exclude < 0) ? use_index : !!opt_exclude;
+
+		if (opt_restrict_to_sparse_paths >= 0) {
+			die(_("--[no-]restrict-to-sparse-paths is incompatible"
+				  " with --no-index and --untracked"));
+		}
+
 		hit = grep_directory(&opt, &pathspec, use_exclude, use_index);
 	} else if (0 <= opt_exclude) {
 		die(_("--[no-]exclude-standard cannot be used for tracked contents"));
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 70ad04e1b2..71956f7313 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -3208,6 +3208,8 @@  __git_main ()
 			--namespace=
 			--no-replace-objects
 			--help
+			--restrict-to-sparse-paths
+			--no-restrict-to-sparse-paths
 			"
 			;;
 		*)
diff --git a/git.c b/git.c
index a2d337eed7..6db1382ae4 100644
--- a/git.c
+++ b/git.c
@@ -38,6 +38,7 @@  const char git_more_info_string[] =
 	   "See 'git help git' for an overview of the system.");
 
 static int use_pager = -1;
+int opt_restrict_to_sparse_paths = -1;
 
 static void list_builtins(struct string_list *list, unsigned int exclude_option);
 
@@ -311,6 +312,10 @@  static int handle_options(const char ***argv, int *argc, int *envchanged)
 			} else {
 				exit(list_cmds(cmd));
 			}
+		} else if (!strcmp(cmd, "--restrict-to-sparse-paths")) {
+			opt_restrict_to_sparse_paths = 1;
+		} else if (!strcmp(cmd, "--no-restrict-to-sparse-paths")) {
+			opt_restrict_to_sparse_paths = 0;
 		} else {
 			fprintf(stderr, _("unknown option: %s\n"), cmd);
 			usage(git_usage_string);
@@ -319,6 +324,7 @@  static int handle_options(const char ***argv, int *argc, int *envchanged)
 		(*argv)++;
 		(*argc)--;
 	}
+
 	return (*argv) - orig_argv;
 }
 
diff --git a/sparse-checkout.c b/sparse-checkout.c
new file mode 100644
index 0000000000..9a9e50fd29
--- /dev/null
+++ b/sparse-checkout.c
@@ -0,0 +1,16 @@ 
+#include "cache.h"
+#include "config.h"
+#include "sparse-checkout.h"
+
+int restrict_to_sparse_paths(struct repository *repo)
+{
+	int ret;
+
+	if (opt_restrict_to_sparse_paths >= 0)
+		return opt_restrict_to_sparse_paths;
+
+	if (repo_config_get_bool(repo, "sparse.restrictcmds", &ret))
+		ret = 1;
+
+	return ret;
+}
diff --git a/sparse-checkout.h b/sparse-checkout.h
new file mode 100644
index 0000000000..1de3b588d8
--- /dev/null
+++ b/sparse-checkout.h
@@ -0,0 +1,11 @@ 
+#ifndef SPARSE_CHECKOUT_H
+#define SPARSE_CHECKOUT_H
+
+struct repository;
+
+extern int opt_restrict_to_sparse_paths; /* from git.c */
+
+/* Whether or not cmds should restrict behavior on sparse paths, in this repo */
+int restrict_to_sparse_paths(struct repository *repo);
+
+#endif /* SPARSE_CHECKOUT_H */
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
index ce080cf572..1aef084186 100755
--- a/t/t7817-grep-sparse-checkout.sh
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -80,10 +80,10 @@  test_expect_success 'setup' '
 	test_path_is_file sub2/a
 '
 
-# The test bellow checks a special case: the sparsity patterns exclude '/b'
+# The two tests bellow check a special case: the sparsity patterns exclude '/b'
 # and sparse checkout is enable, but the path exists on the working tree (e.g.
 # manually created after `git sparse-checkout init`). In this case, grep should
-# skip it.
+# skip the file by default, but not with --no-restrict-to-sparse-paths.
 test_expect_success 'grep in working tree should honor sparse checkout' '
 	cat >expect <<-EOF &&
 	a:text
@@ -93,6 +93,16 @@  test_expect_success 'grep in working tree should honor sparse checkout' '
 	git grep "text" >actual &&
 	test_cmp expect actual
 '
+test_expect_success 'grep w/ --no-restrict-to-sparse-paths for sparsely excluded but present paths' '
+	cat >expect <<-EOF &&
+	a:text
+	b:new-text
+	EOF
+	echo "new-text" >b &&
+	test_when_finished "rm b" &&
+	git --no-restrict-to-sparse-paths grep "text" >actual &&
+	test_cmp expect actual
+'
 
 test_expect_success 'grep --cached should honor sparse checkout' '
 	cat >expect <<-EOF &&
@@ -136,7 +146,7 @@  test_expect_success 'grep <tree-ish> should ignore sparsity patterns' '
 '
 
 # Note that sub2/ is present in the worktree but it is excluded by the sparsity
-# patterns, so grep should not recurse into it.
+# patterns, so grep should only recurse into it with --no-restrict-to-sparse-paths.
 test_expect_success 'grep --recurse-submodules should honor sparse checkout in submodule' '
 	cat >expect <<-EOF &&
 	a:text
@@ -145,6 +155,15 @@  test_expect_success 'grep --recurse-submodules should honor sparse checkout in s
 	git grep --recurse-submodules "text" >actual &&
 	test_cmp expect actual
 '
+test_expect_success 'grep --recurse-submodules should search in excluded submodules w/ --no-restrict-to-sparse-paths' '
+	cat >expect <<-EOF &&
+	a:text
+	sub/B/b:text
+	sub2/a:text
+	EOF
+	git --no-restrict-to-sparse-paths grep --recurse-submodules "text" >actual &&
+	test_cmp expect actual
+'
 
 test_expect_success 'grep --recurse-submodules --cached should honor sparse checkout in submodule' '
 	cat >expect <<-EOF &&
@@ -171,4 +190,111 @@  test_expect_success 'grep --recurse-submodules <commit-ish> should honor sparse
 	test_cmp expect_tag-to-commit actual_tag-to-commit
 '
 
+for cmd in 'git --no-restrict-to-sparse-paths grep' \
+	   'git -c sparse.restrictCmds=false grep' \
+	   'git -c sparse.restrictCmds=true --no-restrict-to-sparse-paths grep'
+do
+
+	test_expect_success "$cmd --cached should ignore sparsity patterns" '
+		cat >expect <<-EOF &&
+		a:text
+		b:text
+		dir/c:text
+		EOF
+		$cmd --cached "text" >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "$cmd <commit-ish> should ignore sparsity patterns" '
+		commit=$(git rev-parse HEAD) &&
+		cat >expect_commit <<-EOF &&
+		$commit:a:text
+		$commit:b:text
+		$commit:dir/c:text
+		EOF
+		cat >expect_tag-to-commit <<-EOF &&
+		tag-to-commit:a:text
+		tag-to-commit:b:text
+		tag-to-commit:dir/c:text
+		EOF
+		$cmd "text" $commit >actual_commit &&
+		test_cmp expect_commit actual_commit &&
+		$cmd "text" tag-to-commit >actual_tag-to-commit &&
+		test_cmp expect_tag-to-commit actual_tag-to-commit
+	'
+done
+
+test_expect_success 'grep --recurse-submodules --cached \w --no-restrict-to-sparse-paths' '
+	cat >expect <<-EOF &&
+	a:text
+	b:text
+	dir/c:text
+	sub/A/a:text
+	sub/B/b:text
+	sub2/a:text
+	EOF
+	git --no-restrict-to-sparse-paths grep --recurse-submodules --cached \
+		"text" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --recurse-submodules <commit-ish> \w --no-restrict-to-sparse-paths' '
+	commit=$(git rev-parse HEAD) &&
+	cat >expect_commit <<-EOF &&
+	$commit:a:text
+	$commit:b:text
+	$commit:dir/c:text
+	$commit:sub/A/a:text
+	$commit:sub/B/b:text
+	$commit:sub2/a:text
+	EOF
+	cat >expect_tag-to-commit <<-EOF &&
+	tag-to-commit:a:text
+	tag-to-commit:b:text
+	tag-to-commit:dir/c:text
+	tag-to-commit:sub/A/a:text
+	tag-to-commit:sub/B/b:text
+	tag-to-commit:sub2/a:text
+	EOF
+	git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \
+		$commit >actual_commit &&
+	test_cmp expect_commit actual_commit &&
+	git --no-restrict-to-sparse-paths grep --recurse-submodules "text" \
+		tag-to-commit >actual_tag-to-commit &&
+	test_cmp expect_tag-to-commit actual_tag-to-commit
+'
+
+test_expect_success 'should respect the sparse.restrictCmds values from submodules' '
+	cat >expect <<-EOF &&
+	a:text
+	sub/A/a:text
+	sub/B/b:text
+	EOF
+	test_config -C sub sparse.restrictCmds false &&
+	git grep --cached --recurse-submodules "text" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'should propagate --[no]-restrict-to-sparse-paths to submodules' '
+	cat >expect <<-EOF &&
+	a:text
+	b:text
+	dir/c:text
+	sub/A/a:text
+	sub/B/b:text
+	sub2/a:text
+	EOF
+	test_config -C sub sparse.restrictCmds true &&
+	git --no-restrict-to-sparse-paths grep --cached --recurse-submodules "text" >actual &&
+	test_cmp expect actual
+'
+
+for opt in '--untracked' '--no-index'
+do
+	test_expect_success "--[no]-restrict-to-sparse-paths and $opt are incompatible" "
+		test_must_fail git --restrict-to-sparse-paths grep $opt . 2>actual &&
+		test_i18ngrep 'restrict-to-sparse-paths is incompatible with' actual
+	"
+done
+
 test_done
diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
index 3c44af6940..a4a7767e06 100755
--- a/t/t9902-completion.sh
+++ b/t/t9902-completion.sh
@@ -1473,6 +1473,8 @@  test_expect_success 'double dash "git" itself' '
 	--namespace=
 	--no-replace-objects Z
 	--help Z
+	--restrict-to-sparse-paths Z
+	--no-restrict-to-sparse-paths Z
 	EOF
 '
 
@@ -1515,7 +1517,7 @@  test_expect_success 'general options' '
 	test_completion "git --nam" "--namespace=" &&
 	test_completion "git --bar" "--bare " &&
 	test_completion "git --inf" "--info-path " &&
-	test_completion "git --no-r" "--no-replace-objects "
+	test_completion "git --no-rep" "--no-replace-objects "
 '
 
 test_expect_success 'general options plus command' '