From patchwork Tue Mar 24 06:11:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11454541 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AD3FA13A4 for ; Tue, 24 Mar 2020 06:11:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8DEE42076A for ; Tue, 24 Mar 2020 06:11:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="i8KbFbgY" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727438AbgCXGLa (ORCPT ); Tue, 24 Mar 2020 02:11:30 -0400 Received: from mail-qk1-f193.google.com ([209.85.222.193]:38079 "EHLO mail-qk1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727385AbgCXGLa (ORCPT ); Tue, 24 Mar 2020 02:11:30 -0400 Received: by mail-qk1-f193.google.com with SMTP id h14so18194132qke.5 for ; Mon, 23 Mar 2020 23:11:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=SaVuKuO7gfN6WfAzt9SMOLO+UJO1qr67FxdnoxNW53Y=; b=i8KbFbgYB6oy40AMJUfl2xfe/JN2MaRJaUAjwNFJuixxMZAJwzYPwcXLTVFjFyv7QD bfj6cWlRbZh0y9n+WZJJ7aq1u/r3z4zaJ5FaFkzdzfWvxWAbK7ZkJIKgulSrhSL2Lw8D lft52RbI4VvNdFUlKSZCCsACSZFT2aMr1ZnJsF8yMn1hEL2+wPycJVnuJz3n2fPl4JI+ TrT3IUSXNGkgyaLt51gTcRLzhrcxSlNO8aJI02AMaCemiVDB+LLitgdD0nBvitzEm3WA llxqb3r6h++xOrfg9jYzyWNgfhCab5ih+kk82xi6DqVP5ZiDOd7cA451CuNozz43QbQW KFVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=SaVuKuO7gfN6WfAzt9SMOLO+UJO1qr67FxdnoxNW53Y=; b=F89p99W8YPqggrCixtQx40ShLIM4rRgc/mU5Zx/l2fVzklIL3oSsJKF/KqiHnoLrBP Fjuowc4P10hY72yEP9Og1Ku+GhfLmC1pjXk89izh90qfMCadIMjnXY0D6cktab0DZU6Z 3Chx3Q1CCC8/7thmPDiXZmjU0DHxR4GZmTZJC7eilxsfetXVP61HmW3DjW5jsya4WvMJ xIupi+akbj7zSRExnu6+fnx0AkDt9ClDQzzPoEk8mcpYx6NVhBcEEv/XrLfB6CllqCyd srp6niezhtG2FNAD+N4fKgoheZ1ktGyDTtmmG1qC7NEmh4SH/s/H4uHQ0apGqkRJSqCC jP8Q== X-Gm-Message-State: ANhLgQ20CeczvdJ63gXdczrEUw+Lg7s4akv+LIuHsrijc4c5iiQjkBKy /FxSrGJ0UgQGZ4WvEWkztgemOOQ3Isk= X-Google-Smtp-Source: ADFU+vv4QZrdpUVpdVyuU1Oehf/XdnTWZtxLBYQESyB0RbJS0Z+WAC3mDZ2cy5c23wqKaltm5uzJjw== X-Received: by 2002:a37:4c0a:: with SMTP id z10mr25085758qka.408.1585030288584; Mon, 23 Mar 2020 23:11:28 -0700 (PDT) Received: from mango.spo.virtua.com.br ([2804:14c:81:942d::1]) by smtp.gmail.com with ESMTPSA id m10sm13669164qte.71.2020.03.23.23.11.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Mar 2020 23:11:27 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: dstolee@microsoft.com, newren@gmail.com, sandals@crustytoothpaste.net Subject: [RFC PATCH 1/3] doc: grep: unify info on configuration variables Date: Tue, 24 Mar 2020 03:11:20 -0300 Message-Id: <7ba5caf10de75a2e0909318b04c62f5827a3fa56.1585027716.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Explanations about the configuration variables for git-grep are duplicated in "Documentation/git-grep.txt" and "Documentation/config/grep.txt". Let's unify the information in the second file and include it in the first. 
Signed-off-by: Matheus Tavares --- Documentation/config/grep.txt | 7 +++++-- Documentation/git-grep.txt | 35 +++++------------------------------ 2 files changed, 10 insertions(+), 32 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 44abe45a7c..76689771aa 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -16,8 +16,11 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. - See `grep.threads` in linkgit:git-grep[1] for more information. + Number of grep worker threads to use. See `--threads` in + linkgit:git-grep[1] for more information. + +grep.fullName:: + If set to true, enable `--full-name` option by default. grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index ddb6acc025..97e25d7b1b 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -41,34 +41,7 @@ characters. An empty string as search expression matches all lines. CONFIGURATION ------------- -grep.lineNumber:: - If set to true, enable `-n` option by default. - -grep.column:: - If set to true, enable the `--column` option by default. - -grep.patternType:: - Set the default matching behavior. Using a value of 'basic', 'extended', - 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, - `--fixed-strings`, or `--perl-regexp` option accordingly, while the - value 'default' will return to the default matching behavior. - -grep.extendedRegexp:: - If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. - -grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), Git will - use as many threads as the number of logical cores available. - -grep.fullName:: - If set to true, enable `--full-name` option by default. 
- -grep.fallbackToNoIndex:: - If set to true, fall back to git grep --no-index if git grep - is executed outside of a git repository. Defaults to false. - +include::config/grep.txt[] OPTIONS ------- @@ -267,8 +240,10 @@ providing this option will cause it to die. found. --threads :: - Number of grep worker threads to use. - See `grep.threads` in 'CONFIGURATION' for more information. + Number of grep worker threads to use. If not provided (or set to + 0), Git will use as many worker threads as the number of logical + cores available. The default value can also be set with the + `grep.threads` configuration (see linkgit:git-config[1]). -f :: Read patterns from , one per line. From patchwork Tue Mar 24 06:12:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11454549 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02B7C13A4 for ; Tue, 24 Mar 2020 06:13:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BA1682073E for ; Tue, 24 Mar 2020 06:13:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="epw03nl9" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727064AbgCXGM7 (ORCPT ); Tue, 24 Mar 2020 02:12:59 -0400 Received: from mail-qk1-f196.google.com ([209.85.222.196]:35489 "EHLO mail-qk1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725959AbgCXGM7 (ORCPT ); Tue, 24 Mar 2020 02:12:59 -0400 Received: by mail-qk1-f196.google.com with SMTP id k13so6451671qki.2 for ; Mon, 23 Mar 2020 23:12:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; 
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PXCt8BpbiW5JV9p7EczOAeemxkBcbNe4+H4R3AYAdU4=; b=epw03nl9id5En/hG3HK2r7YJdsC5kVT3FpDPbzkx3AWREsR0e+fsBCqaRKctKVw4Q/ o7jScFR3VkeXwaybTQ3qkd3677urob3/g2KMPfkJqbMKODn8k0dodlmTCfeVS958N7Mf 8zNXbODuOt65xhhaaGcoQ0v/Jp56v+g9Wia0paoYoCrvInmo/fHx1CB3/yR4sDt+q/vW Q+P4lQlvFUFptBulWqpkiJs+NXE/yYg+S3ED4f3s8EzAu+bfRoVMDUOgnjnAyRRFnyDh 4It8hv7jQ09CcDfhwtd4sI1vlF1dZL6I8Xs2EV0XyLweGOMkipkqYlRaSDVsd5DJyUzZ 4h1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=PXCt8BpbiW5JV9p7EczOAeemxkBcbNe4+H4R3AYAdU4=; b=mfgHwE2c4GB+LPBUUhDXVsucXMr4qUC8ij5G3MFybJ+lY6FCXIXI/bxEaFAQj5aapF tCgZMPwdsaF5diw+fZxhAn7KiPaCtz+N25ajSfrdogwXYhgdyakITgZWsd58nmYruHG0 0iNDCsnAAZauVuYqaqzK5u9Ei+Z6pvVU+3UIe2m3gOUhj6pElP8oYpGp5Xb6LykQQMhu MUEZdREcAYBLU0GlLw60J7ZROydPnH2vJgoVNf8ykT/xB/9gtpbnU0SOG1xtgI6Ule7q /NWjuGX97ZqOyYTop+s9NGm7OzP59QaTawx9BRcwkd3fVA0YdSpG9B3xP3I/B8hN7Rgx SefA== X-Gm-Message-State: ANhLgQ2O1Z31jV8wiqzXa5baO/lWZ0rnQJ759QZqvukXJe570bWd1ldi Ga8uy7LNpWkjarvcY+Plg/hCvEYyVn4= X-Google-Smtp-Source: ADFU+vtSza3usmUTxyf+Q6cI+Vlpqz6TNsp80yQShqB9KUH1XEC6td9+Mz/6+SzyiQd9m8uCNDlbSQ== X-Received: by 2002:ae9:f007:: with SMTP id l7mr24942191qkg.11.1585030377685; Mon, 23 Mar 2020 23:12:57 -0700 (PDT) Received: from mango.spo.virtua.com.br ([2804:14c:81:942d::1]) by smtp.gmail.com with ESMTPSA id g49sm13913719qtk.1.2020.03.23.23.12.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Mar 2020 23:12:57 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: dstolee@microsoft.com, newren@gmail.com, sandals@crustytoothpaste.net, stefanbeller@gmail.com Subject: [RFC PATCH 2/3] grep: honor sparse checkout patterns Date: Tue, 24 Mar 2020 03:12:46 -0300 Message-Id: 
<0b9b4c4b414a571877163667694afa3053bf8890.1585027716.git.matheus.bernardino@usp.br>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

One of the main uses for a sparse checkout is to allow users to focus on
the subset of files in a repository in which they are interested. But
git-grep currently ignores the sparsity patterns and also reports
matches found outside this subset, which goes in the opposite direction.
Let's fix that, making it honor the sparsity boundaries for every
grepping case:

- git grep in worktree
- git grep --cached
- git grep $REVISION
- git grep --untracked and git grep --no-index (which already respect
  sparse checkout boundaries)

This is also what some users reported[1] they would want as the default
behavior.

Note: for `git grep $REVISION`, we will choose to honor the sparsity
patterns only when $REVISION is a commit-ish object. The reason is that,
for a tree, we don't know whether it represents the root of a repository
or a subtree. So we wouldn't be able to correctly match it against the
sparsity patterns. E.g. suppose we have a repository with these two
sparsity rules: "/*" and "!/a"; and the following structure:

/
| - a (file)
| - d (dir)
    | - a (file)

If `git grep $REVISION` were to honor the sparsity patterns for every
object type, when grepping the /d tree, we would wrongly ignore the /d/a
file. This happens because we wouldn't know it resides in /d and
therefore it would wrongly match the pattern "!/a". Furthermore, for a
search in a blob object, we wouldn't even have a path to check the
patterns against. So, let's ignore the sparsity patterns when grepping
non-commit-ish objects (tags to commits should be fine).

Finally, the old behavior is still desirable for some use cases. So the
next patch will add an option to allow restoring it when needed.
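
The root-versus-subtree ambiguity described in the note above can be illustrated with a toy matcher. This is only a sketch and not git's pattern code: `toy_pattern`, `toy_match()` and `toy_in_sparse_checkout()` are hypothetical names, and the matcher understands nothing beyond the two rules used in the example (the catch-all rule and `/<name>`, optionally negated with `!`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type: one sparsity rule, e.g. "!/a". */
struct toy_pattern {
	const char *text;
};

/* Does the root-anchored rule body (after any '!') match `path`? */
static bool toy_match(const char *body, const char *path)
{
	size_t len;

	/* The catch-all rule covers every top-level entry and all below. */
	if (!strcmp(body, "/*"))
		return true;
	if (body[0] != '/')
		return false; /* the toy only understands anchored rules */
	body++;
	len = strlen(body);
	/* Match "name" itself or anything under "name/". */
	return !strncmp(path, body, len) &&
	       (path[len] == '\0' || path[len] == '/');
}

/* The last matching rule decides; '!' flips inclusion to exclusion. */
static bool toy_in_sparse_checkout(const struct toy_pattern *pl, size_t nr,
				   const char *path)
{
	bool included = false;
	size_t i;

	for (i = 0; i < nr; i++) {
		bool negated = pl[i].text[0] == '!';
		const char *body = pl[i].text + (negated ? 1 : 0);

		if (toy_match(body, path))
			included = !negated;
	}
	return included;
}
```

With the two rules from the example, the root-relative path "d/a" is inside the sparse checkout while "a" is not. When grepping the /d tree directly, the only name available for that file is the tree-relative "a", so a matcher of this kind would exclude it by mistake.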
[1]: https://lore.kernel.org/git/CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com/

Signed-off-by: Matheus Tavares
---

Something I'm not entirely sure about in this patch is how we implement
the mechanism to honor sparsity for the `git grep ` case (which is
treated in the grep_tree() function). Currently, the patch looks for an
index entry that matches the path, and then checks its skip_worktree
bit. But this operation is performed in O(log(N)); N being the number of
index entries. If there are many entries (and not so many sparsity
patterns), maybe a better approach would be to try matching the path
directly against the sparsity patterns. This would be O(M) in the number
of patterns, and it could be done, in builtin/grep.c, with a function
like the following:

static struct pattern_list sparsity_patterns;
static int sparsity_patterns_initialized = 0;
static enum pattern_match_result path_matches_sparsity_patterns(
				const char *path, int pathlen,
				const char *basename,
				struct repository *repo)
{
	int dtype = DT_UNKNOWN;

	if (!sparsity_patterns_initialized) {
		char *sparse_file = git_pathdup("info/sparse-checkout");
		int ret;

		memset(&sparsity_patterns, 0, sizeof(sparsity_patterns));
		sparsity_patterns.use_cone_patterns = core_sparse_checkout_cone;
		ret = add_patterns_from_file_to_list(sparse_file, "", 0,
						     &sparsity_patterns, NULL);
		free(sparse_file);

		if (ret < 0)
			die(_("failed to load sparse-checkout patterns"));
		sparsity_patterns_initialized = 1;
	}

	return path_matches_pattern_list(path, pathlen, basename, &dtype,
					 &sparsity_patterns, repo->index);
}

Also, if I understand correctly, the index doesn't hold paths to dirs,
right? So even if a complete dir is excluded from sparse checkout, we
still have to check all its subentries, only to discover that they
should all be skipped from the search. However, if we were to check
against the sparsity patterns directly (e.g. with the function above),
we could skip such directories together with all their entries.
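
To make the index-lookup side of this trade-off concrete, here is a minimal sketch of a binary search over a sorted list of entries. It is not git's index code: `toy_entry`, `toy_index_pos()` and `toy_is_skipped()` are hypothetical names, the entries are assumed to be sorted in `strcmp()` order, and encoding the insertion point in the negative return value is an assumption of the sketch.

```c
#include <string.h>

/* Hypothetical toy type standing in for an index entry. */
struct toy_entry {
	const char *name;  /* full path from the repository root */
	int skip_worktree; /* nonzero if outside the sparse checkout */
};

/*
 * Binary search over entries sorted by name, O(log(N)) per lookup.
 * Returns the position when `name` is present; otherwise returns a
 * negative value encoding the insertion point as -pos - 1.
 */
static int toy_index_pos(const struct toy_entry *idx, int nr, const char *name)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, idx[mid].name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* A path is skipped only if it is found and flagged skip_worktree. */
static int toy_is_skipped(const struct toy_entry *idx, int nr, const char *path)
{
	int pos = toy_index_pos(idx, nr, path);

	return pos >= 0 && idx[pos].skip_worktree;
}
```

With the entries "a", "b" (skipped) and "dir/c" (skipped), looking up "dir" finds nothing because only files are listed, so a tree walk cannot prune the directory and has to visit "dir/c" before learning that it is skipped. That is the extra work the paragraph above describes.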

Oh, and there is also the case of a commit whose tree paths are not in
the index (maybe manually created objects?). For such commits, with the
index lookup approach, we would have to fall back on ignoring the
sparsity rules. I'm not sure if that would be OK, though.

Any thoughts on these two approaches (looking up the skip_worktree bit
in the index or directly matching against sparsity patterns) will be
highly appreciated. (Note that it only concerns the `git grep ` case.
The other cases already iterate through the index, so there is no
O(log(N)) extra complexity).

 builtin/grep.c                   | 29 ++++++++---
 t/t7011-skip-worktree-reading.sh |  9 ----
 t/t7817-grep-sparse-checkout.sh  | 88 ++++++++++++++++++++++++++++++++
 3 files changed, 111 insertions(+), 15 deletions(-)
 create mode 100755 t/t7817-grep-sparse-checkout.sh

diff --git a/builtin/grep.c b/builtin/grep.c
index 99e2685090..52ec72a036 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -388,7 +388,7 @@ static int grep_cache(struct grep_opt *opt,
 		      const struct pathspec *pathspec, int cached);
 static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		     struct tree_desc *tree, struct strbuf *base, int tn_len,
-		     int check_attr);
+		     int from_commit);
 
 static int grep_submodule(struct grep_opt *opt,
 			  const struct pathspec *pathspec,
@@ -486,6 +486,10 @@ static int grep_cache(struct grep_opt *opt,
 
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
+
+		if (ce_skip_worktree(ce))
+			continue;
+
 		strbuf_setlen(&name, name_base_len);
 		strbuf_addstr(&name, ce->name);
 
@@ -498,8 +502,7 @@ static int grep_cache(struct grep_opt *opt,
 		 * cache entry are identical, even if worktree file has
 		 * been modified, so use cache version instead
 		 */
-		if (cached || (ce->ce_flags & CE_VALID) ||
-		    ce_skip_worktree(ce)) {
+		if (cached || (ce->ce_flags & CE_VALID)) {
 			if (ce_stage(ce) || ce_intent_to_add(ce))
 				continue;
 			hit |= grep_oid(opt, &ce->oid, name.buf,
@@ -532,7 +535,7 @@ static int
grep_cache(struct grep_opt *opt, static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, struct tree_desc *tree, struct strbuf *base, int tn_len, - int check_attr) + int from_commit) { struct repository *repo = opt->repo; int hit = 0; @@ -546,6 +549,9 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, name_base_len = name.len; } + if (from_commit && repo_read_index(repo) < 0) + die(_("index file corrupt")); + while (tree_entry(tree, &entry)) { int te_len = tree_entry_len(&entry); @@ -564,9 +570,20 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); + if (from_commit) { + int pos = index_name_pos(repo->index, + base->buf + tn_len, + base->len - tn_len); + if (pos >= 0 && + ce_skip_worktree(repo->index->cache[pos])) { + strbuf_setlen(base, old_baselen); + continue; + } + } + if (S_ISREG(entry.mode)) { hit |= grep_oid(opt, &entry.oid, base->buf, tn_len, - check_attr ? base->buf + tn_len : NULL); + from_commit ? 
base->buf + tn_len : NULL); } else if (S_ISDIR(entry.mode)) { enum object_type type; struct tree_desc sub; @@ -581,7 +598,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_addch(base, '/'); init_tree_desc(&sub, data, size); hit |= grep_tree(opt, pathspec, &sub, base, tn_len, - check_attr); + from_commit); free(data); } else if (recurse_submodules && S_ISGITLINK(entry.mode)) { hit |= grep_submodule(opt, pathspec, &entry.oid, diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh index 37525cae3a..26852586ac 100755 --- a/t/t7011-skip-worktree-reading.sh +++ b/t/t7011-skip-worktree-reading.sh @@ -109,15 +109,6 @@ test_expect_success 'ls-files --modified' ' test -z "$(git ls-files -m)" ' -test_expect_success 'grep with skip-worktree file' ' - git update-index --no-skip-worktree 1 && - echo test > 1 && - git update-index 1 && - git update-index --skip-worktree 1 && - rm 1 && - test "$(git grep --no-ext-grep test)" = "1:test" -' - echo ":000000 100644 $ZERO_OID $EMPTY_BLOB A 1" > expected test_expect_success 'diff-index does not examine skip-worktree absent entries' ' setup_absent && diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh new file mode 100755 index 0000000000..fccf44e829 --- /dev/null +++ b/t/t7817-grep-sparse-checkout.sh @@ -0,0 +1,88 @@ +#!/bin/sh + +test_description='grep in sparse checkout + +This test creates the following dir structure: +. +| - a +| - b +| - dir + | - c + +Only "a" should be present due to the sparse checkout patterns: +"/*", "!/b" and "!/dir". +' + +. 
./test-lib.sh + +test_expect_success 'setup' ' + echo "text" >a && + echo "text" >b && + mkdir dir && + echo "text" >dir/c && + git add a b dir && + git commit -m "initial commit" && + git tag -am t-commit t-commit HEAD && + tree=$(git rev-parse HEAD^{tree}) && + git tag -am t-tree t-tree $tree && + cat >.git/info/sparse-checkout <<-EOF && + /* + !/b + !/dir + EOF + git sparse-checkout init && + test_path_is_missing b && + test_path_is_missing dir && + test_path_is_file a +' + +test_expect_success 'grep in working tree should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep --cached should honor sparse checkout' ' + cat >expect <<-EOF && + a:text + EOF + git grep --cached "text" >actual && + test_cmp expect actual +' + +test_expect_success 'grep should honor sparse checkout' ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + EOF + cat >expect_t-commit <<-EOF && + t-commit:a:text + EOF + git grep "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + git grep "text" t-commit >actual_t-commit && + test_cmp expect_t-commit actual_t-commit +' + +test_expect_success 'grep should search outside sparse checkout' ' + commit=$(git rev-parse HEAD) && + tree=$(git rev-parse HEAD^{tree}) && + cat >expect_tree <<-EOF && + $tree:a:text + $tree:b:text + $tree:dir/c:text + EOF + cat >expect_t-tree <<-EOF && + t-tree:a:text + t-tree:b:text + t-tree:dir/c:text + EOF + git grep "text" $tree >actual_tree && + test_cmp expect_tree actual_tree && + git grep "text" t-tree >actual_t-tree && + test_cmp expect_t-tree actual_t-tree +' + +test_done From patchwork Tue Mar 24 06:13:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11454551 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) 
by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6C10D139A for ; Tue, 24 Mar 2020 06:13:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 42E8C2074D for ; Tue, 24 Mar 2020 06:13:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="DwFZyo9D" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727250AbgCXGNy (ORCPT ); Tue, 24 Mar 2020 02:13:54 -0400 Received: from mail-qv1-f66.google.com ([209.85.219.66]:33505 "EHLO mail-qv1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725951AbgCXGNy (ORCPT ); Tue, 24 Mar 2020 02:13:54 -0400 Received: by mail-qv1-f66.google.com with SMTP id p19so6082268qve.0 for ; Mon, 23 Mar 2020 23:13:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=1muIyoRBk26lIUFtrujVeNYGHskovTTxo+TfZCz5xeg=; b=DwFZyo9DTMW/7j24fu7TOr/vCRZL/m1rAOOTkFTzQDo6KrkABr/JDDkEIYcasLw72k 1raEAXJI3JQsRmFBcVzhC4OZpLyvmhO2qqbcA9zG9DXn/UgLhXhotRQLF4UYX7jKX9Jg vqqUvqOpPwaGFo8/D3uuaGB7/vChN1VygdhnHYvTnmpEW+Kgzs6pMBEPFA5yJpElFXah gcVIWeQWAvB1ASlr3wkbNAxjJIFQhPmAbI+Sda6QSqg9Ul556FlyjzWt6m85DorwK25b fmPBOv9t8rwIb8JsByT5aR9IortJgg/+uWrMO+SdlWr3IdFhEpZe54iPXDWrZNWwdeEB SnSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1muIyoRBk26lIUFtrujVeNYGHskovTTxo+TfZCz5xeg=; b=fSpUF10yHHMpIzcHBR4/UnBkGQG2cluefoF1ja/VP39iWWJ9H6tN1kgcWWHnSEshYF ojBJz+e0Unxwo7NU0OXtMyC0uZ7Na1QkHqc/PaefioKh6ZfeCxcUSyn8HG+tfJdI6/l9 8N6Xa8zocb3SP9W4ouJCeFcHNwHGiZBsSpRAzqsDa9ZEzekf4TU/Jcn5SfSd1NtP1Wix 
3IwwwPEneFRxr5j1Xs+tCqj44HZZsUh7dDCKu/6M1EOOSgeR/PPVdTkLHZHr4xt/At3V TIA1O/2oDqOqs/h4pnc+vlILmW/eEZlQ8H02tzAo+bx7oSSTN36RD26Ait7QIdqSRuwN gkpw==
X-Gm-Message-State: ANhLgQ2ZKYGikLd5nKn4nBrRr6jf3e0hpV7jhJjOAGTM4Z3rkqNN+oqj mS22r4yN7rni5BkCZWaJoSfACfp3MdE=
X-Google-Smtp-Source: ADFU+vv9OhT1Dt2SzY6d/wycklyEcruEgV/TzauweNPww60Px4NdnpJD6uJMmS4+PyC6PQQnLlDqdw==
X-Received: by 2002:a0c:a910:: with SMTP id y16mr24426387qva.139.1585030432005; Mon, 23 Mar 2020 23:13:52 -0700 (PDT)
Received: from mango.spo.virtua.com.br ([2804:14c:81:942d::1]) by smtp.gmail.com with ESMTPSA id o67sm12130871qka.114.2020.03.23.23.13.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Mar 2020 23:13:51 -0700 (PDT)
From: Matheus Tavares
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, newren@gmail.com, pclouds@gmail.com
Subject: [RFC PATCH 3/3] grep: add option to ignore sparsity patterns
Date: Tue, 24 Mar 2020 03:13:44 -0300
Message-Id:
X-Mailer: git-send-email 2.25.1
In-Reply-To:
References:
MIME-Version: 1.0
Sender: git-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: git@vger.kernel.org

In the last commit, git-grep learned to honor sparsity patterns. For
some use cases, however, it may be desirable to search outside the
sparse checkout. So add the '--ignore-sparsity' option, which restores
the old behavior. Also add the grep.ignoreSparsity configuration, to
allow setting this behavior by default.

Signed-off-by: Matheus Tavares
---

Note: I still have to make --ignore-sparsity be able to work together
with --untracked. Unfortunately, this won't be as simple because the
code flow taken by --untracked goes to grep_directory(), which just
iterates the working tree, without looking at the index entries.
So I will have to either: make --untracked use grep_cache(), and grep the untracked files later; or try matching the working tree paths against the sparsity patterns, without looking for the skip_worktree bit in the index (as I mentioned in the previous patch's comments). Any preferences regarding these two approaches? (or other suggestions?) Documentation/config/grep.txt | 3 +++ Documentation/git-grep.txt | 5 ++++ builtin/grep.c | 19 +++++++++++---- t/t7817-grep-sparse-checkout.sh | 42 +++++++++++++++++++++++++++++++++ 4 files changed, 65 insertions(+), 4 deletions(-) diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index 76689771aa..c1d49484c8 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -25,3 +25,6 @@ grep.fullName:: grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep is executed outside of a git repository. Defaults to false. + +grep.ignoreSparsity:: + If set to true, enable `--ignore-sparsity` by default. diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 97e25d7b1b..5c5c66c056 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -65,6 +65,11 @@ OPTIONS mechanism. Only useful when searching files in the current directory with `--no-index`. +--ignore-sparsity:: + In a sparse checked out repository (see linkgit:git-sparse-checkout[1]), + also search in files that are outside the sparse checkout. This option + cannot be used with --no-index or --untracked. + --recurse-submodules:: Recursively search in each submodule that has been initialized and checked out in the repository. 
When used in combination with the diff --git a/builtin/grep.c b/builtin/grep.c index 52ec72a036..17eae3edd6 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -33,6 +33,8 @@ static char const * const grep_usage[] = { static int recurse_submodules; +static int ignore_sparsity = 0; + static int num_threads; static pthread_t *threads; @@ -292,6 +294,9 @@ static int grep_cmd_config(const char *var, const char *value, void *cb) if (!strcmp(var, "submodule.recurse")) recurse_submodules = git_config_bool(var, value); + if (!strcmp(var, "grep.ignoresparsity")) + ignore_sparsity = git_config_bool(var, value); + return st; } @@ -487,7 +492,7 @@ static int grep_cache(struct grep_opt *opt, for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; - if (ce_skip_worktree(ce)) + if (!ignore_sparsity && ce_skip_worktree(ce)) continue; strbuf_setlen(&name, name_base_len); @@ -502,7 +507,8 @@ static int grep_cache(struct grep_opt *opt, * cache entry are identical, even if worktree file has * been modified, so use cache version instead */ - if (cached || (ce->ce_flags & CE_VALID)) { + if (cached || (ce->ce_flags & CE_VALID) || + ce_skip_worktree(ce)) { if (ce_stage(ce) || ce_intent_to_add(ce)) continue; hit |= grep_oid(opt, &ce->oid, name.buf, @@ -549,7 +555,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, name_base_len = name.len; } - if (from_commit && repo_read_index(repo) < 0) + if (!ignore_sparsity && from_commit && repo_read_index(repo) < 0) die(_("index file corrupt")); while (tree_entry(tree, &entry)) { @@ -570,7 +576,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, strbuf_add(base, entry.path, te_len); - if (from_commit) { + if (!ignore_sparsity && from_commit) { int pos = index_name_pos(repo->index, base->buf + tn_len, base->len - tn_len); @@ -932,6 +938,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) OPT_BOOL_F(0, "ext-grep", 
&external_grep_allowed__ignored, N_("allow calling of grep(1) (ignored by this build)"), PARSE_OPT_NOCOMPLETE), + OPT_BOOL(0, "ignore-sparsity", &ignore_sparsity, + N_("also search in files outside the sparse checkout")), OPT_END() }; @@ -1073,6 +1081,9 @@ int cmd_grep(int argc, const char **argv, const char *prefix) if (recurse_submodules && untracked) die(_("--untracked not supported with --recurse-submodules")); + if (ignore_sparsity && (!use_index || untracked)) + die(_("--no-index or --untracked cannot be used with --ignore-sparsity")); + if (show_in_pager) { if (num_threads > 1) warning(_("invalid option combination, ignoring --threads")); diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh index fccf44e829..1891ddea57 100755 --- a/t/t7817-grep-sparse-checkout.sh +++ b/t/t7817-grep-sparse-checkout.sh @@ -85,4 +85,46 @@ test_expect_success 'grep should search outside sparse checkout' ' test_cmp expect_t-tree actual_t-tree ' +for cmd in 'git grep --ignore-sparsity' 'git -c grep.ignoreSparsity grep' \ + 'git -c grep.ignoreSparsity=false grep --ignore-sparsity' +do + test_expect_success "$cmd should search outside sparse checkout" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd --cached should search outside sparse checkout" ' + cat >expect <<-EOF && + a:text + b:text + dir/c:text + EOF + $cmd --cached "text" >actual && + test_cmp expect actual + ' + + test_expect_success "$cmd should search outside sparse checkout" ' + commit=$(git rev-parse HEAD) && + cat >expect_commit <<-EOF && + $commit:a:text + $commit:b:text + $commit:dir/c:text + EOF + cat >expect_t-commit <<-EOF && + t-commit:a:text + t-commit:b:text + t-commit:dir/c:text + EOF + $cmd "text" $commit >actual_commit && + test_cmp expect_commit actual_commit && + $cmd "text" t-commit >actual_t-commit && + test_cmp expect_t-commit actual_t-commit + ' +done + test_done