mbox series

[00/10] Sparse-index: integrate with status and add

Message ID pull.932.git.1618322497.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Sparse-index: integrate with status and add | expand

Message

Usman Akinyemi via GitGitGadget April 13, 2021, 2:01 p.m. UTC
This is the first "payoff" series in the sparse-index work. It makes 'git
status' and 'git add' very fast when a sparse-index is enabled on a
repository with cone-mode sparse-checkout (and a small populated set).

This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
The latter branch is needed because it changes the behavior of 'git add'
around sparse entries, which changes the expectations of a test added in
patch 1.

The approach here is to audit the places where ensure_full_index() pops up
while doing normal commands with pathspecs within the sparse-checkout
definition. Each of these are checked and tested. In the end, the
sparse-index is integrated with these features:

 * git status
 * git add -A
 * git add . (and other pathspecs)
 * FS Monitor index extension.

The performance tests in p2000-sparse-operations.sh improve by 95% or more,
even when compared with the full-index cases, not just the sparse-index
cases that previously had extra overhead.

Hopefully this is the first example of how ds/sparse-index-protections has
done the basic work to do these conversions safely, making them look easier
than they seemed when starting this adventure.

Thanks, -Stolee

Derrick Stolee (10):
  t1092: add tests for status/add and sparse files
  unpack-trees: make sparse aware
  dir.c: accept a directory as part of cone-mode patterns
  status: skip sparse-checkout percentage with sparse-index
  status: use sparse-index throughout
  dir: use expand_to_path() for sparse directories
  add: allow operating on a sparse-only index
  pathspec: stop calling ensure_full_index
  t7519: add sparse directories to FS monitor tests
  fsmonitor: test with sparse index

 builtin/add.c                            |  3 +
 builtin/commit.c                         |  3 +
 dir.c                                    |  5 ++
 dir.h                                    |  2 +-
 pathspec.c                               |  2 -
 preload-index.c                          |  2 +
 read-cache.c                             |  5 +-
 t/t1092-sparse-checkout-compatibility.sh | 73 +++++++++++++++++++++++-
 t/t7519-status-fsmonitor.sh              | 65 +++++++++++++++++++++
 unpack-trees.c                           | 24 +++++++-
 wt-status.c                              | 14 ++++-
 wt-status.h                              |  1 +
 12 files changed, 186 insertions(+), 13 deletions(-)


base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/932

Comments

Matheus Tavares April 13, 2021, 8:45 p.m. UTC | #1
Hi, Stolee

On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This is the first "payoff" series in the sparse-index work. It makes 'git
> status' and 'git add' very fast when a sparse-index is enabled on a
> repository with cone-mode sparse-checkout (and a small populated set).
>
> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.

I just noticed that our ds/sparse-index-protections and
mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
appear before, but it does now with this new series.

ds/sparse-index-protections added `ensure_full_index()` guards before
the loops that traverse over all cache entries. At the same time,
mt/add-rm-sparse-checkout added yet another one of these loops, at
`pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
new place didn't get the `ensure_full_index()` guard, all of its
callers (in `add` and `rm`) did call `ensure_full_index()` before
calling it, so it was fine.

However, patches 7 and 8 remove some of these protections in `add`s
code. And, as a result, if "dir" is a sparse directory entry, `git add
[--refresh] dir/file` no longer emits the warning added at
mt/add-rm-sparse-checkout.

Adding `ensure_full_index()` at
`find_pathspecs_matching_skip_worktree()` fixes the problem. We have
to consider the performance implications, but they _might_ be
acceptable as we only call this function when a pathspec given to
`add` or `rm` does not match any non-ignored file inside the sparse
checkout.

Additionally, the tests I added at t3705 won't catch this problem,
even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
they don't set core.sparseCheckout and core.sparseCheckoutCone, they
only set individual index entries with the SKIP_WORKTREE bit. And
therefore, the index is always written fully. Perhaps, should I reroll
my series using cone mode for these tests?

(And a semi-related question: do you plan on adding
GIT_TEST_SPARSE_INDEX=true to one of the CI jobs? )
Derrick Stolee April 14, 2021, 4:31 p.m. UTC | #2
On 4/13/2021 4:45 PM, Matheus Tavares Bernardino wrote:
> Hi, Stolee
> 
> On Tue, Apr 13, 2021 at 11:02 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> This is the first "payoff" series in the sparse-index work. It makes 'git
>> status' and 'git add' very fast when a sparse-index is enabled on a
>> repository with cone-mode sparse-checkout (and a small populated set).
>>
>> This is based on ds/sparse-index-protections AND mt/add-rm-sparse-checkout.
> 
> I just noticed that our ds/sparse-index-protections and
> mt/add-rm-sparse-checkout had a small semantic conflict. It didn't
> appear before, but it does now with this new series.

Thank you for taking a close look.
 
> ds/sparse-index-protections added `ensure_full_index()` guards before
> the loops that traverse over all cache entries. At the same time,
> mt/add-rm-sparse-checkout added yet another one of these loops, at
> `pathspec.c::find_pathspecs_matching_skip_worktree()`. Although the
> new place didn't get the `ensure_full_index()` guard, all of its
> callers (in `add` and `rm`) did call `ensure_full_index()` before
> calling it, so it was fine.
>
> However, patches 7 and 8 remove some of these protections in `add`s
> code. And, as a result, if "dir" is a sparse directory entry, `git add
> [--refresh] dir/file` no longer emits the warning added at
> mt/add-rm-sparse-checkout.

You are right, it does not emit the warning. I will add a test that
ensures that behavior is the same across the two sparse repos in
t1092 as part of my v2 in this series.
 
> Adding `ensure_full_index()` at
> `find_pathspecs_matching_skip_worktree()` fixes the problem. We have
> to consider the performance implications, but they _might_ be
> acceptable as we only call this function when a pathspec given to
> `add` or `rm` does not match any non-ignored file inside the sparse
> checkout.

I'll want to do the right thing here to make the warning work, so
I'll take a look soon.

> Additionally, the tests I added at t3705 won't catch this problem,
> even when running with GIT_TEST_SPARSE_INDEX=true :( That's because
> they don't set core.sparseCheckout and core.sparseCheckoutCone, they
> only set individual index entries with the SKIP_WORKTREE bit. And
> therefore, the index is always written fully. Perhaps, should I reroll
> my series using cone mode for these tests?

Your series should not be re-rolled for this. Instead, this is valuable
feedback for this series: there is behavior in 'git add' that I am not
checking stays the same when the sparse-index is enabled. That's my
responsibility and I'll get it fixed.
 
> (And a semi-related question: do you plan on adding
> GIT_TEST_SPARSE_INDEX=true to one of the CI jobs? )

I do plan to add that, after things calm down. It won't do much right
now because it requires core.sparseCheckout[Cone] to be enabled. Not
many tests provide that, so they don't add much coverage. I thought at
one point to adjust the initial repo creation to include a
sparse-checkout in cone mode, but that would change too many tests.
I still haven't found the right way to expand the test coverage to
take advantage of our deep test suite for this feature.

Thanks,
-Stolee