diff mbox series

[v4,07/20] test-read-cache: print cache entries with --table

Message ID 7ebd9570b1ad81720569a770526651c62c152b9f.1616507069.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Sparse Index: Design, Format, Tests | expand

Commit Message

Derrick Stolee March 23, 2021, 1:44 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

This table is helpful for discovering data in the index to ensure it is
being written correctly, especially as we build and test the
sparse-index. This table includes an output format similar to 'git
ls-tree', but should not be compared to that directly. The biggest
reasons are that 'git ls-tree' includes a tree entry for every
subdirectory, even those that would not appear as a sparse directory in
a sparse-index. Further, 'git ls-tree' does not use a trailing directory
separator for its tree rows.

This does not print the stat() information for the blobs. That will be
added in a future change with another option. The tests that are added
in the next few changes care only about the object types and IDs.
However, this future need for full index information justifies the need
for this test helper over extending a user-facing feature, such as 'git
ls-files'.

To make the option parsing slightly more robust, wrap the string
comparisons in a loop adapted from test-dir-iterator.c.

Care must be taken with the final check for the 'cnt' variable. We
continue the expectation that the numerical value is the final argument.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 10 deletions(-)

Comments

Ævar Arnfjörð Bjarmason March 24, 2021, 1:24 a.m. UTC | #1
On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> This table is helpful for discovering data in the index to ensure it is
> being written correctly, especially as we build and test the
> sparse-index. This table includes an output format similar to 'git
> ls-tree', but should not be compared to that directly. The biggest
> reasons are that 'git ls-tree' includes a tree entry for every
> subdirectory, even those that would not appear as a sparse directory in
> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
> separator for its tree rows.
>
> This does not print the stat() information for the blobs. That will be
> added in a future change with another option. The tests that are added
> in the next few changes care only about the object types and IDs.
> However, this future need for full index information justifies the need
> for this test helper over extending a user-facing feature, such as 'git
> ls-files'.

Is that stat() information that's going to be essential to grab in the
same process that runs the "for (i = 0; i < istate->cache_nr; i++)"
for-loop, or stat() information that could be grabbed as:

    git ls-files -z --stage | some-program-that-stats-all-listed-blobs

It's not so much that I still disagree as I feel like I'm missing
something. I haven't gone through this topic with a fine toothed comb,
so ...

If and when these patches land and I'm using this nascent sparse
checkout support why wouldn't I want ls-files or another not-a-test-tool
to support extracting this new information that's in the index?

That's why I sent the RFC patches at
https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to
roll this functionality into ls-files.

Still, I think if there's a good reason for why we want this in the
index but never want our plumbing to be able to dump it in some
user-facing way I think just as a matter of reviewing this code it would
be much simpler if it was in ls-files behind some
git_env_bool("GIT_TEST_...") flag or something.

Or maybe I'm the only one who spends a lot of time with both ls-files.c
and test-read-cache.c open while trying to review this trying to keep
track of if and how this helper is and isn't subtly different from
ls-files (as my RFC series shows, not really that different at all...).
Especially with the really-just-ls-files-plus-one-thing tool mimicking
ls-tree output, for reasons I still don't get...

> To make the option parsing slightly more robust, wrap the string
> comparisons in a loop adapted from test-dir-iterator.c.
>
> Care must be taken with the final check for the 'cnt' variable. We
> continue the expectation that the numerical value is the final argument.

I think even if you're set on not having this exposed in some
builtin/*.c command this code would be much clearer based on some
version of my
https://lore.kernel.org/git/20210317132814.30175-6-avarab@gmail.com/
i.e. the part that isn't entirely deleting t/helper/test-read-cache.c,
which would survive as t/helper/test-read-cache-sparse.c or something.

As that patch shows this code is needlessly convoluted because it's
serving 3x wildly different in-tree use-cases. I don't see how the very
small amount of de-duplication we're getting is worth the complexity.

At that point we don't need any care with the cnt variable, because
we're not combining the fsmonitor and perf use-cases of reading the
index in some loop with the ls-files-alike.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  t/helper/test-read-cache.c | 55 +++++++++++++++++++++++++++++++-------
>  1 file changed, 45 insertions(+), 10 deletions(-)
>
> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
> index 244977a29bdf..6cfd8f2de71c 100644
> --- a/t/helper/test-read-cache.c
> +++ b/t/helper/test-read-cache.c
> @@ -1,36 +1,71 @@
>  #include "test-tool.h"
>  #include "cache.h"
>  #include "config.h"
> +#include "blob.h"
> +#include "commit.h"
> +#include "tree.h"
> +
> +static void print_cache_entry(struct cache_entry *ce)
> +{
> +	const char *type;
> +	printf("%06o ", ce->ce_mode & 0177777);
> +
> +	if (S_ISSPARSEDIR(ce->ce_mode))
> +		type = tree_type;
> +	else if (S_ISGITLINK(ce->ce_mode))
> +		type = commit_type;
> +	else
> +		type = blob_type;
> +
> +	printf("%s %s\t%s\n",
> +	       type,
> +	       oid_to_hex(&ce->oid),
> +	       ce->name);
> +}
> +
> +static void print_cache(struct index_state *istate)
> +{
> +	int i;
> +	for (i = 0; i < istate->cache_nr; i++)
> +		print_cache_entry(istate->cache[i]);
> +}
>  
>  int cmd__read_cache(int argc, const char **argv)
>  {
> +	struct repository *r = the_repository;
>  	int i, cnt = 1;
>  	const char *name = NULL;
> +	int table = 0;
>  
> -	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
> -		argc--;
> -		argv++;
> +	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
> +		if (skip_prefix(*argv, "--print-and-refresh=", &name))
> +			continue;
> +		if (!strcmp(*argv, "--table"))
> +			table = 1;
>  	}
>  
> -	if (argc == 2)
> -		cnt = strtol(argv[1], NULL, 0);
> +	if (argc == 1)
> +		cnt = strtol(argv[0], NULL, 0);
>  	setup_git_directory();
>  	git_config(git_default_config, NULL);
> +
>  	for (i = 0; i < cnt; i++) {
> -		read_cache();
> +		repo_read_index(r);
>  		if (name) {
>  			int pos;
>  
> -			refresh_index(&the_index, REFRESH_QUIET,
> +			refresh_index(r->index, REFRESH_QUIET,
>  				      NULL, NULL, NULL);
> -			pos = index_name_pos(&the_index, name, strlen(name));
> +			pos = index_name_pos(r->index, name, strlen(name));
>  			if (pos < 0)
>  				die("%s not in index", name);
>  			printf("%s is%s up to date\n", name,
> -			       ce_uptodate(the_index.cache[pos]) ? "" : " not");
> +			       ce_uptodate(r->index->cache[pos]) ? "" : " not");
>  			write_file(name, "%d\n", i);
>  		}
> -		discard_cache();
> +		if (table)
> +			print_cache(r->index);
> +		discard_index(r->index);
>  	}
>  	return 0;
>  }
Derrick Stolee March 24, 2021, 12:33 p.m. UTC | #2
On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> This table is helpful for discovering data in the index to ensure it is
>> being written correctly, especially as we build and test the
>> sparse-index. This table includes an output format similar to 'git
>> ls-tree', but should not be compared to that directly. The biggest
>> reasons are that 'git ls-tree' includes a tree entry for every
>> subdirectory, even those that would not appear as a sparse directory in
>> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
>> separator for its tree rows.
>>
>> This does not print the stat() information for the blobs. That will be
>> added in a future change with another option. The tests that are added
>> in the next few changes care only about the object types and IDs.
>> However, this future need for full index information justifies the need
>> for this test helper over extending a user-facing feature, such as 'git
>> ls-files'.
> 
> Is that stat() information that's going to be essential to grab in the
> same process that runs the "for (i = 0; i < istate->cache_nr; i++)"
> for-loop, or stat() information that could be grabbed as:
> 
>     git ls-files -z --stage | some-program-that-stats-all-listed-blobs

The point is not to find the stat() data from disk, but to ensure that
the stat() data is correctly stored in the index (say, after converting
an existing index from another format). This pipe strategy does not
allow for that scenario.

> It's not so much that I still disagree as I feel like I'm missing
> something. I haven't gone through this topic with a fine toothed comb,
> so ...
> 
> If and when these patches land and I'm using this nascent sparse
> checkout support why wouldn't I want ls-files or another not-a-test-tool
> to support extracting this new information that's in the index?
> 
> That's why I sent the RFC patches at
> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to
> roll this functionality into ls-files.

And I recommend that you continue to pursue them as an independent
series, but I'm not going to incorporate them into this one. I'm
not going to distract from this internal data structure with changes
to user-facing commands until I think it's ready to use. As the design
document describes the plan, I don't expect this to be something I
will recommend to users until most of "Phase 3" is complete, making
the most common Git commands aware of a sparse index. (I expect to
fast-track a prototype to willing users that covers that functionality
while review continues on the mailing list.)

Making a change to a builtin is _forever_, and since the only
purpose right now is to expose the data in a test environment, I
don't want to adjust the builtin until either there is a real user
need or the feature has otherwise stabilized. If you want to take on
that responsibility, then please do.

Otherwise, I will need to eventually handle "git ls-files" being
sparse-aware when eventually removing 'command_requires_full_index',
(Phase 4) so that would be a good opportunity to adjust the
expectations.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason March 25, 2021, 3:41 a.m. UTC | #3
On Wed, Mar 24 2021, Derrick Stolee wrote:

> On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> This table is helpful for discovering data in the index to ensure it is
>>> being written correctly, especially as we build and test the
>>> sparse-index. This table includes an output format similar to 'git
>>> ls-tree', but should not be compared to that directly. The biggest
>>> reasons are that 'git ls-tree' includes a tree entry for every
>>> subdirectory, even those that would not appear as a sparse directory in
>>> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
>>> separator for its tree rows.
>>>
>>> This does not print the stat() information for the blobs. That will be
>>> added in a future change with another option. The tests that are added
>>> in the next few changes care only about the object types and IDs.
>>> However, this future need for full index information justifies the need
>>> for this test helper over extending a user-facing feature, such as 'git
>>> ls-files'.
>> 
>> Is that stat() information that's going to be essential to grab in the
>> same process that runs the "for (i = 0; i < istate->cache_nr; i++)"
>> for-loop, or stat() information that could be grabbed as:
>> 
>>     git ls-files -z --stage | some-program-that-stats-all-listed-blobs
>
> The point is not to find the stat() data from disk, but to ensure that
> the stat() data is correctly stored in the index (say, after converting
> an existing index from another format). This pipe strategy does not
> allow for that scenario.

So a dump of ce->ce_stat_data, i.e. the same thing ls-files --debug
prints out now, or...?

>> It's not so much that I still disagree as I feel like I'm missing
>> something. I haven't gone through this topic with a fine toothed comb,
>> so ...
>> 
>> If and when these patches land and I'm using this nascent sparse
>> checkout support why wouldn't I want ls-files or another not-a-test-tool
>> to support extracting this new information that's in the index?
>> 
>> That's why I sent the RFC patches at
>> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to
>> roll this functionality into ls-files.
>
> And I recommend that you continue to pursue them as an independent
> series, but I'm not going to incorporate them into this one. I'm
> not going to distract from this internal data structure with changes
> to user-facing commands until I think it's ready to use. As the design
> document describes the plan, I don't expect this to be something I
> will recommend to users until most of "Phase 3" is complete, making
> the most common Git commands aware of a sparse index. (I expect to
> fast-track a prototype to willing users that covers that functionality
> while review continues on the mailing list.)

This series is 20 patches. Your current derrickstolee/sparse-index/wip
is another 36, and from skimming those patches & your design doc those
56 seem to be partway into Phase I of IV.

So at the rate things tend to get reviewed / re-rolled & land in git.git
it seems exceedingly likely that we'll have some part-way implementation
of this for at least a major release or two. No?

Which is why I'm suggesting/asking if we shouldn't have something like
this debugging helper as part of installed tooling, because people are
going to try it, it's probably going to have bugs and do other weird
things, and I'd rather not have to manually build some test-tool to
debug some local sparse checkout somewhere.

> Making a change to a builtin is _forever_, and since the only
> purpose right now is to expose the data in a test environment, I
> don't want to adjust the builtin until either there is a real user
> need or the feature has otherwise stabilized. If you want to take on
> that responsibility, then please do.

That's just not the case, we have plenty of unstable debug-esque options
in various built-in commands, in fact ls-files already has a --debug
option whose docs say:

    This is intended to show as much information as possible for manual
    inspection; the exact format may change at any time.

It was added in 84974217151 (ls-files: learn a debugging dump format,
2010-07-31) and "just tacks all available data from the cache onto each
file's line" so in a way not adjusting it and using it would be a
regression, after all this is new data in the cache, so it should print
it :)

There's also PARSE_OPT_HIDDEN for other such in-tree use. Whatever the
sanity/merits of me suggesting that this specific thing be in ls-files
instead of a test-helper, it seems far fetched that something like that
hidden behind a GIT_TEST_* env var (or hidden option, --debug etc.) is
something we'd need to worry about backwards compatibility for.

So, whatever you think about the merits of including this functionality
in ls-files I think your stance of this being a no-go for adding to the
builtin is based on a false premise. It's fine to have
unstable/transitory/debug output in the builtins. We just name &
document them as such.

I also had some feedback in that series and on the earlier iteration
that I think is appropriate to be incorporated into a re-roll of this
one, which doesn't have anything to do with the question of whether we
use ls-files or the helper in the tests. Such as us showing more stuff
into the read-cache.c test-tool v.s. splitting it up making that code
needlessly convoluted.

I don't see how recommending that I pursue that as an independent series
is productive for anyone. So as you re-roll this I should submit another
series on top to refactor your in-flight code & tests?

Either my suggestions are just bad, and we shouldn't do them at all, or
it makes sense to incorporate relevant feedback in re-rolls. I'll let
other reviewers draw their own conclusions on that.

That's not a snarky "I'm right" b.t.w., I may honestly be full of it on
this particular topic.

But if those suggested changes are worth doing at all, then doing them
in that way seems like a massive waste of time for everyone involved, or
maybe I'm not getting what you're suggesting by pursuing them as an
independent series.

> Otherwise, I will need to eventually handle "git ls-files" being
> sparse-aware when eventually removing 'command_requires_full_index',
> (Phase 4) so that would be a good opportunity to adjust the
> expectations.

At which point you'd be adjusting your tests that expect ls-tree format
output to using ls-files output, instead of using ls-files-like output
from the beginning?

At the end of this E-Mail is a patch on top that adds an undocumented
--debug-sparse in addition to the existing --debug. Running that in the
middle of one of your tests:
    
    $ ~/g/git/git ls-files --debug -- a folder1
    a
      ctime: 1616641434:474004002
      mtime: 1616641434:474004002
      dev: 2306     ino: 28576528
      uid: 1001     gid: 1001
      size: 8       flags: 0
    folder1/a
      ctime: 0:0
      mtime: 0:0
      dev: 0        ino: 0
      uid: 0        gid: 0
      size: 0       flags: 40000000
    $ ~/g/git/git ls-files --debug --debug-sparse -- a folder1
    a 
      ctime: 1616641434:474004002
      mtime: 1616641434:474004002
      dev: 2306     ino: 28576528
      uid: 1001     gid: 1001
      size: 8       flags: 0
    folder1/
      ctime: 0:0
      mtime: 0:0
      dev: 0        ino: 0
      uid: 0        gid: 0
      size: 0       flags: 40004000
    $ ~/g/git/git ls-files --stage -- a folder1
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       a
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       folder1/a
    $ ~/g/git/git ls-files --stage --debug-sparse -- a folder1
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       a
    040000 f203181537ff55dcf7896bf8c5b5c35af1514421 0       folder1/

I.e. it gives you everything your helper does and more with a trivial
addition of a --debug-sparse (which we can later just remove, it's a
debug option...).

See e.g. my recent 15c9649730d (grep/log: remove hidden --debug and
--grep-debug options, 2021-01-26) which is already in a release, and
AFAICT nobody has noticed or cared.

I don't know if that's the stat() information you wanted (your WIP
branch doesn't have such a change), but presumably it either is the info
you want, or ls-files's --debug would want to emit any such such info
that's now missing too.

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 13bcc2d8473..e691512d4f8 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -34,6 +34,7 @@ static int show_valid_bit;
 static int show_fsmonitor_bit;
 static int line_terminator = '\n';
 static int debug_mode;
+static int debug_sparse_mode;
 static int show_eol;
 static int recurse_submodules;
 static int skipping_duplicates;
@@ -242,9 +243,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
 		if (!show_stage) {
 			fputs(tag, stdout);
 		} else {
+			unsigned int mode = ce->ce_mode;
+			if (debug_sparse_mode && S_ISSPARSEDIR(mode))
+				/*
+				 * We could just do & 0177777 all the
+				 * time, just make it clear this is
+				 * for --debug-sparse.
+				 */
+				mode &= 0177777;
 			printf("%s%06o %s %d\t",
 			       tag,
-			       ce->ce_mode,
+			       mode,
 			       find_unique_abbrev(&ce->oid, abbrev),
 			       ce_stage(ce));
 		}
@@ -667,6 +676,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "debug-sparse", &debug_sparse_mode, N_("show sparse debugging data")),
 		OPT_BOOL(0, "deduplicate", &skipping_duplicates,
 			 N_("suppress duplicate entries")),
 		OPT_END()
@@ -681,9 +691,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		prefix_len = strlen(prefix);
 	git_config(git_default_config, NULL);
 
-	if (repo_read_index(the_repository) < 0)
-		die("index file corrupt");
-
 	argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
 			ls_files_usage, 0);
 	pl = add_pattern_list(&dir, EXC_CMDL, "--exclude option");
@@ -700,6 +707,10 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		tag_skip_worktree = "S ";
 		tag_resolve_undo = "U ";
 	}
+	if (debug_sparse_mode) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
 	if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
 		require_work_tree = 1;
 	if (show_unmerged)
@@ -743,6 +754,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		max_prefix = common_prefix(&pathspec);
 	max_prefix_len = get_common_prefix_len(max_prefix);
 
+	/*
+	 * Read the index after parse options etc. have had a chance
+	 * to die early.
+	 */
+	if (repo_read_index(the_repository) < 0)
+		die("index file corrupt");
 	prune_index(the_repository->index, max_prefix, max_prefix_len);
 
 	/* Treat unmatching pathspec elements as errors */
Elijah Newren March 26, 2021, 12:12 a.m. UTC | #4
Hi,

On Wed, Mar 24, 2021 at 8:41 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Wed, Mar 24 2021, Derrick Stolee wrote:
>
> > On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:
> >>
> >>> From: Derrick Stolee <dstolee@microsoft.com>
> >>>
...
> >> It's not so much that I still disagree as I feel like I'm missing
> >> something. I haven't gone through this topic with a fine toothed comb,
> >> so ...
> >>
> >> If and when these patches land and I'm using this nascent sparse
> >> checkout support why wouldn't I want ls-files or another not-a-test-tool
> >> to support extracting this new information that's in the index?
> >>
> >> That's why I sent the RFC patches at
> >> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to
> >> roll this functionality into ls-files.
> >
> > And I recommend that you continue to pursue them as an independent
> > series, but I'm not going to incorporate them into this one. I'm
> > not going to distract from this internal data structure with changes
> > to user-facing commands until I think it's ready to use. As the design
> > document describes the plan, I don't expect this to be something I
> > will recommend to users until most of "Phase 3" is complete, making
> > the most common Git commands aware of a sparse index. (I expect to
> > fast-track a prototype to willing users that covers that functionality
> > while review continues on the mailing list.)
>
> This series is 20 patches. Your current derrickstolee/sparse-index/wip
> is another 36, and from skimming those patches & your design doc those
> 56 seem to be partway into Phase I of IV.
>
> So at the rate things tend to get reviewed / re-rolled & land in git.git
> it seems exceedingly likely that we'll have some part-way implementation
> of this for at least a major release or two. No?
>
> Which is why I'm suggesting/asking if we shouldn't have something like
> this debugging helper as part of installed tooling, because people are
> going to try it, it's probably going to have bugs and do other weird
> things, and I'd rather not have to manually build some test-tool to
> debug some local sparse checkout somewhere.

I'm curious why you feel it's critical that this particular piece of
debugging machinery needs to be prioritized early and exposed; in
particular, I'm not sure I follow the "people are going to try it"
assertion.  Are you the one who is going to try it or are you going to
give it to your users?  If so, what do you need out of the debugging
tool?

You are correct that this will span multiple releases; Stolee already
said he was planning to be working on this for most of 2021.  But just
because pieces of the code exist and are shipped doesn't mean it'll be
announced or supported.  For example, the git-2.30 and git-2.31
release notes were completely silent about merge-ort.  It existed in
both releases; in fact, the version that ships in git-2.31, could
theoretically be used successfully by the vast majority of users for
their daily workflow.  (But it does have known shortcomings and test
failures so I definitely did *not* want it to be announced at that
time.)

> > Making a change to a builtin is _forever_, and since the only
> > purpose right now is to expose the data in a test environment, I
> > don't want to adjust the builtin until either there is a real user
> > need or the feature has otherwise stabilized. If you want to take on
> > that responsibility, then please do.
>
> That's just not the case, we have plenty of unstable debug-esque options
> in various built-in commands, in fact ls-files already has a --debug
> option whose docs say:
>
>     This is intended to show as much information as possible for manual
>     inspection; the exact format may change at any time.
>
> It was added in 84974217151 (ls-files: learn a debugging dump format,
> 2010-07-31) and "just tacks all available data from the cache onto each
> file's line" so in a way not adjusting it and using it would be a
> regression, after all this is new data in the cache, so it should print
> it :)
>
> There's also PARSE_OPT_HIDDEN for other such in-tree use. Whatever the
> sanity/merits of me suggesting that this specific thing be in ls-files
> instead of a test-helper, it seems far fetched that something like that
> hidden behind a GIT_TEST_* env var (or hidden option, --debug etc.) is
> something we'd need to worry about backwards compatibility for.
>
> So, whatever you think about the merits of including this functionality
> in ls-files I think your stance of this being a no-go for adding to the
> builtin is based on a false premise. It's fine to have
> unstable/transitory/debug output in the builtins. We just name &
> document them as such.
>
> I also had some feedback in that series and on the earlier iteration
> that I think is appropriate to be incorporated into a re-roll of this
> one, which doesn't have anything to do with the question of whether we
> use ls-files or the helper in the tests. Such as us showing more stuff
> into the read-cache.c test-tool v.s. splitting it up making that code
> needlessly convoluted.

Well:
  * you seem to be strongly opposed to test-read-cache.c containing
this code (though I don't quite follow why)
  * Stolee seems to be strongly opposed to modifying
builtin/ls-files.c until he has time to think through how builtins
should work.

So putting it in another test file that looks slightly duplicative of
test-read-cache.c might indeed be a good way out of this conundrum.
:-)

(I'm not opposed to any of the three solutions, I'm mostly chiming in
here because I'm worried about possible bubbling frustration; see
below.)

> I don't see how recommending that I pursue that as an independent series
> is productive for anyone. So as you re-roll this I should submit another
> series on top to refactor your in-flight code & tests?

Your tone suggests some frustration; I have a suspicion there's some
lack of understanding or misreading that has occurred (perhaps on my
part too), and before that misunderstanding morphs into motive
questioning, let me see if I might be able to help...

So far, you have advocated for:
  A) Moving the checks to ls-files with a permanent new flag (--sparse)
  B) Duplicating test-read-cache.c (which is admittedly pretty small)
and then modifying the duplicate to have the new behavior, or
alternatively:
  C) Just stating files to get the information
  D) Creating new debug option(s) to ls-files so that end users can
use this in the next few releases before the feature is ready for
prime time
You also mentioned you had read just part of the series.

Option D comes with the problem that it's not at all clear who these
end-users are, why they want the option, or how we should design it.
Personally, I'm totally onboard that ls-files should generally have
the ability to show information in the index (e.g. if there are tree
entries in addition to blob entries, it should be able to show both),
but I'm not following the reasoning for why it needs to be there as
part of the early stages of development of the sparse-index feature
and who it's supposed to be helping in these next few releases.

The progression also suggests that Option B might have just been a
step along the way and that you were advocating for Option D now.  I
think it'd be easy to miss that you still had option B open and
considered it equivalently good to option D (or am I misreading?),
much like you missed how option C wasn't even relevant to the problem
at hand or option A would have introduced perpetual confusion as a
mere duplicate of --stage (in the best case scenario, anyway).
They're all easy misunderstandings.

> Either my suggestions are just bad, and we shouldn't do them at all, or
> it makes sense to incorporate relevant feedback in re-rolls. I'll let
> other reviewers draw their own conclusions on that.

I think that's a bit unfair; Stolee has been incorporating feedback.
He even called out fixing up things at your suggestion in v4 of his
re-roll.

> That's not a snarky "I'm right" b.t.w., I may honestly be full of it on
> this particular topic.
>
> But if those suggested changes are worth doing at all, then doing them
> in that way seems like a massive waste of time for everyone involved, or
> maybe I'm not getting what you're suggesting by pursuing them as an
> independent series.

I think you should instead read it as he has no idea why this needs to
be exposed in ls-files, who these users are you are asserting will be
using it, or how to cater for their needs.  Shouldn't the person who
implements this understand those pieces to avoid a massive waste of
time?

> > Otherwise, I will need to eventually handle "git ls-files" being
> > sparse-aware when eventually removing 'command_requires_full_index',
> > (Phase 4) so that would be a good opportunity to adjust the
> > expectations.
>
> At which point you'd be adjusting your tests that expect ls-tree format
> output to using ls-files output, instead of using ls-files-like output
> from the beginning?

I don't understand what you're getting at here.  I was the one who
requested Stolee make the output look like ls-trees in his original
RFC series, so if there's a problem with this style of output, I'm to
blame.  But, what is exactly the problem?  Old-style ls-files output
just isn't relevant anymore.  ls-tree prints four things: mode, type,
hash, and filename.  ls-files prints all of those except "type".  The
reason ls-files never included type before was because it was always
"blob".  This series changes that, and adds "tree" to the mix.  Once
you have different types included in the index, then ls-files has to
print all the same fields that ls-tree does...so why not make it look
similar?

> At the end of this E-Mail is a patch on top that adds an undocumented
> --debug-sparse in addition to the existing --debug. Running that in the
> middle of one of your tests:
>
>     $ ~/g/git/git ls-files --debug -- a folder1
>     a
>       ctime: 1616641434:474004002
>       mtime: 1616641434:474004002
>       dev: 2306     ino: 28576528
>       uid: 1001     gid: 1001
>       size: 8       flags: 0
>     folder1/a
>       ctime: 0:0
>       mtime: 0:0
>       dev: 0        ino: 0
>       uid: 0        gid: 0
>       size: 0       flags: 40000000
>     $ ~/g/git/git ls-files --debug --debug-sparse -- a folder1
>     a
>       ctime: 1616641434:474004002
>       mtime: 1616641434:474004002
>       dev: 2306     ino: 28576528
>       uid: 1001     gid: 1001
>       size: 8       flags: 0
>     folder1/
>       ctime: 0:0
>       mtime: 0:0
>       dev: 0        ino: 0
>       uid: 0        gid: 0
>       size: 0       flags: 40004000
>     $ ~/g/git/git ls-files --stage -- a folder1
>     100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       a
>     100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       folder1/a
>     $ ~/g/git/git ls-files --stage --debug-sparse -- a folder1
>     100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       a
>     040000 f203181537ff55dcf7896bf8c5b5c35af1514421 0       folder1/
>
> I.e. it gives you everything your helper does and more with a trivial
> addition of a --debug-sparse (which we can later just remove, it's a
> debug option...).
>
> See e.g. my recent 15c9649730d (grep/log: remove hidden --debug and
> --grep-debug options, 2021-01-26) which is already in a release, and
> AFAICT nobody has noticed or cared.
>
> I don't know if that's the stat() information you wanted (your WIP
> branch doesn't have such a change), but presumably it either is the info
> you want, or ls-files's --debug would want to emit any such such info
> that's now missing too.
>
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index 13bcc2d8473..e691512d4f8 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -34,6 +34,7 @@ static int show_valid_bit;
>  static int show_fsmonitor_bit;
>  static int line_terminator = '\n';
>  static int debug_mode;
> +static int debug_sparse_mode;
>  static int show_eol;
>  static int recurse_submodules;
>  static int skipping_duplicates;
> @@ -242,9 +243,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
>                 if (!show_stage) {
>                         fputs(tag, stdout);
>                 } else {
> +                       unsigned int mode = ce->ce_mode;
> +                       if (debug_sparse_mode && S_ISSPARSEDIR(mode))
> +                               /*
> +                                * We could just do & 0177777 all the
> +                                * time, just make it clear this is
> +                                * for --debug-sparse.
> +                                */
> +                               mode &= 0177777;
>                         printf("%s%06o %s %d\t",
>                                tag,
> -                              ce->ce_mode,
> +                              mode,
>                                find_unique_abbrev(&ce->oid, abbrev),
>                                ce_stage(ce));
>                 }
> @@ -667,6 +676,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                         N_("pretend that paths removed since <tree-ish> are still present")),
>                 OPT__ABBREV(&abbrev),
>                 OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
> +               OPT_BOOL(0, "debug-sparse", &debug_sparse_mode, N_("show sparse debugging data")),
>                 OPT_BOOL(0, "deduplicate", &skipping_duplicates,
>                          N_("suppress duplicate entries")),
>                 OPT_END()
> @@ -681,9 +691,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                 prefix_len = strlen(prefix);
>         git_config(git_default_config, NULL);
>
> -       if (repo_read_index(the_repository) < 0)
> -               die("index file corrupt");
> -
>         argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
>                         ls_files_usage, 0);
>         pl = add_pattern_list(&dir, EXC_CMDL, "--exclude option");
> @@ -700,6 +707,10 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                 tag_skip_worktree = "S ";
>                 tag_resolve_undo = "U ";
>         }
> +       if (debug_sparse_mode) {
> +               prepare_repo_settings(the_repository);
> +               the_repository->settings.command_requires_full_index = 0;
> +       }
>         if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
>                 require_work_tree = 1;
>         if (show_unmerged)
> @@ -743,6 +754,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
>                 max_prefix = common_prefix(&pathspec);
>         max_prefix_len = get_common_prefix_len(max_prefix);
>
> +       /*
> +        * Read the index after parse options etc. have had a chance
> +        * to die early.
> +        */
> +       if (repo_read_index(the_repository) < 0)
> +               die("index file corrupt");
>         prune_index(the_repository->index, max_prefix, max_prefix_len);
>
>         /* Treat unmatching pathspec elements as errors */
Ævar Arnfjörð Bjarmason March 28, 2021, 3:31 p.m. UTC | #5
On Fri, Mar 26 2021, Elijah Newren wrote:

> Hi,
>
> On Wed, Mar 24, 2021 at 8:41 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Wed, Mar 24 2021, Derrick Stolee wrote:
>>
>> > On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote:
>> >>
>> >> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:
>> >>
>> >>> From: Derrick Stolee <dstolee@microsoft.com>
>> >>>
> ...
>> >> It's not so much that I still disagree as I feel like I'm missing
>> >> something. I haven't gone through this topic with a fine toothed comb,
>> >> so ...
>> >>
>> >> If and when these patches land and I'm using this nascent sparse
>> >> checkout support why wouldn't I want ls-files or another not-a-test-tool
>> >> to support extracting this new information that's in the index?
>> >>
>> >> That's why I sent the RFC patches at
>> >> https://lore.kernel.org/git/20210317132814.30175-2-avarab@gmail.com/ to
>> >> roll this functionality into ls-files.
>> >
>> > And I recommend that you continue to pursue them as an independent
>> > series, but I'm not going to incorporate them into this one. I'm
>> > not going to distract from this internal data structure with changes
>> > to user-facing commands until I think it's ready to use. As the design
>> > document describes the plan, I don't expect this to be something I
>> > will recommend to users until most of "Phase 3" is complete, making
>> > the most common Git commands aware of a sparse index. (I expect to
>> > fast-track a prototype to willing users that covers that functionality
>> > while review continues on the mailing list.)
>>
>> This series is 20 patches. Your current derrickstolee/sparse-index/wip
>> is another 36, and from skimming those patches & your design doc those
>> 56 seem to be partway into Phase I of IV.
>>
>> So at the rate things tend to get reviewed / re-rolled & land in git.git
>> it seems exceedingly likely that we'll have some part-way implementation
>> of this for at least a major release or two. No?
>>
>> Which is why I'm suggesting/asking if we shouldn't have something like
>> this debugging helper as part of installed tooling, because people are
>> going to try it, it's probably going to have bugs and do other weird
>> things, and I'd rather not have to manually build some test-tool to
>> debug some local sparse checkout somewhere.
>
> I'm curious why you feel it's critical that this particular piece of
> debugging machinery needs to be prioritized early and exposed; in
> particular, I'm not sure I follow the "people are going to try it"
> assertion.

The debugging machinery's already there, the question is why we have a
need for duplicating code in-tree.

I just did some cursory review of this topic, and wondered why its tests
couldn't use a builtin instead of (mostly) reinventing the wheel.

It seems to me that the reason for that state is based on a
misunderstanding about what we would and wouldn't add to builtin/*.c,
i.e. that we wouldn't have something like a --debug option, but as
ls-files shows that's not a problem.

So my interest is twofold:

 * Just a comment on "can we avoid this code duplication"

 * The related one of not wanting to re-learn some custom test helper as
   (presumably) we get N number of large patch serieses on this topic,
   if it turns out that we can use an existing well-known tool with
   minimal changes.

> Are you the one who is going to try it or are you going to
> give it to your users?  If so, what do you need out of the debugging
> tool?

I haven't understood the sparse index enough feature enough to know if
anyone would ever want to run this --debug-sparse outside of the test
suite.

Isn't extract info about its internal state going to be useful sooner
than later in the scenarios where you'd care enough to run "ls-files
--stage" now?

Maybe I've misunderstood this feature and it's going to be so
transparent that nobody will ever have any reason to dump how it's
working out of the index...

> You are correct that this will span multiple releases; Stolee already
> said he was planning to be working on this for most of 2021.  But just
> because pieces of the code exist and are shipped doesn't mean it'll be
> announced or supported.  For example, the git-2.30 and git-2.31
> release notes were completely silent about merge-ort.  It existed in
> both releases; in fact, the version that ships in git-2.31, could
> theoretically be used successfully by the vast majority of users for
> their daily workflow.  (But it does have known shortcomings and test
> failures so I definitely did *not* want it to be announced at that
> time.)

Yes, and that's fine. But if you'd been bending over backwards to add
merge-ort to t/helper/ "because it's not ready yet" or something I'd
have probably commented to the effect of "can't we just add it as part
of builtins but not advertise it?" which is what you did :)

>> > Making a change to a builtin is _forever_, and since the only
>> > purpose right now is to expose the data in a test environment, I
>> > don't want to adjust the builtin until either there is a real user
>> > need or the feature has otherwise stabilized. If you want to take on
>> > that responsibility, then please do.
>>
>> That's just not the case, we have plenty of unstable debug-esque options
>> in various built-in commands, in fact ls-files already has a --debug
>> option whose docs say:
>>
>>     This is intended to show as much information as possible for manual
>>     inspection; the exact format may change at any time.
>>
>> It was added in 84974217151 (ls-files: learn a debugging dump format,
>> 2010-07-31) and "just tacks all available data from the cache onto each
>> file's line" so in a way not adjusting it and using it would be a
>> regression, after all this is new data in the cache, so it should print
>> it :)
>>
>> There's also PARSE_OPT_HIDDEN for other such in-tree use. Whatever the
>> sanity/merits of me suggesting that this specific thing be in ls-files
>> instead of a test-helper, it seems far fetched that something like that
>> hidden behind a GIT_TEST_* env var (or hidden option, --debug etc.) is
>> something we'd need to worry about backwards compatibility for.
>>
>> So, whatever you think about the merits of including this functionality
>> in ls-files I think your stance of this being a no-go for adding to the
>> builtin is based on a false premise. It's fine to have
>> unstable/transitory/debug output in the builtins. We just name &
>> document them as such.
>>
>> I also had some feedback in that series and on the earlier iteration
>> that I think is appropriate to be incorporated into a re-roll of this
>> one, which doesn't have anything to do with the question of whether we
>> use ls-files or the helper in the tests. Such as us showing more stuff
>> into the read-cache.c test-tool v.s. splitting it up making that code
>> needlessly convoluted.
>
> Well:
>   * you seem to be strongly opposed to test-read-cache.c containing
> this code (though I don't quite follow why)

See above.

>   * Stolee seems to be strongly opposed to modifying
> builtin/ls-files.c until he has time to think through how builtins
> should work.

As noted above my reading of upthread is that those reasons basically
boil down to not knowing "git ls-files --debug" exists, and that we can
extend it.

> So putting it in another test file that looks slightly duplicative of
> test-read-cache.c might indeed be a good way out of this conundrum.
> :-)

FWIW I think that read-cache.c split is worth doing even if this series
doesn't modify t/helper/read-cache.c.

The "this is for fsmonitor" and "this is for the perf test" use-cases
are (as I think my RFC patch shows) clearer once they're split up.

> (I'm not opposed to any of the three solutions, I'm mostly chiming in
> here because I'm worried about possible bubbling frustration; see
> below.)
>
>> I don't see how recommending that I pursue that as an independent series
>> is productive for anyone. So as you re-roll this I should submit another
>> series on top to refactor your in-flight code & tests?
>
> Your tone suggests some frustration; I have a suspicion there's some
> lack of understanding or misreading that has occurred (perhaps on my
> part too), and before that misunderstanding morphs into motive
> questioning, let me see if I might be able to help...

Honestly more flabbergasted than anything, so I'm trying to clarify what
the author thinks of this direction.

I mean it's fine if it's just a "I don't think this is important and
don't want to spend time on it, but it seems like a good idea", in which
case others have the option of re-rolling some of these patches if they
care (at this point I wouldn't).

Or "this is just a bad idea for XYZ reason", which is also fine, and
even more valuable to document for future work in the area.

But to have another series built on this with refactorings back and
forth before code's landed on master just seems like needless churn.

> So far, you have advocated for:
>   A) Moving the checks to ls-files with a permanent new flag (--sparse)
>   B) Duplicating test-read-cache.c (which is admittedly pretty small)
> and then modifying the duplicate to have the new behavior, or
> alternatively:
>   C) Just stating files to get the information
>   D) Creating new debug option(s) to ls-files so that end users can
> use this in the next few releases before the feature is ready for
> prime time
> You also mentioned you had read just part of the series.
>
> Option D comes with the problem that it's not at all clear who these
> end-users are, why they want the option, or how we should design it. [...]

I think s/advocated/read the series and sent an flow-of-thought
not-ready-for-anything RFC patches on top/ would be more accurate :)

I.e. the A) --sparse thing was just reading the patch and seeing if
ls-files couldn't be made to do this, but yes, having the documented
--sparse interface might not make sense.

we discussed B) above.

C) Was a question to clarify what was meant with stat data, since it's
an offhand comment in the commit message. Does it mean "stat after the
fact" or "this will have a mode like ls-files --debug has now"?

Right now I'm just suggesting with D) that this might be rolled into the
dev-only-not-for-end-users --debug mode. 

> I'm totally onboard that ls-files should generally have
> the ability to show information in the index (e.g. if there are tree
> entries in addition to blob entries, it should be able to show both),
> but I'm not following the reasoning for why it needs to be there as
> part of the early stages of development of the sparse-index feature
> and who it's supposed to be helping in these next few releases.

We already are extracting the info at this early stage, just with a
custom helper. All I'm suggesting right now is that the motivation for
the custom helper is "this isn't for end users" then surely having a
patch around 1/2 the size to add it to already reviewed/tested ls-files
code under a --debug option makes more sense.

Especially since the upthread commit mentions wanting to incorporate
stat() data. I'm not sure how exactly (there's no outstanding patches,
even on a WIP branch for it, AFAICT), but most likely it's further
duplication of data "ls-files --debug" already spews out.

So the patch would be 1/2 the size, and instead of saying "let's do stat
stuff in the future" it would get it for free.

Or not, part of that's speculation on information that's just in
Stolee's head. Hence this side-discussion.

> [...]

[Cut parts hopefully all clarified with the above comments]

>> [..]
>> At which point you'd be adjusting your tests that expect ls-tree format
>> output to using ls-files output, instead of using ls-files-like output
>> from the beginning?
>
> I don't understand what you're getting at here.  I was the one who
> requested Stolee make the output look like ls-trees in his original
> RFC series, so if there's a problem with this style of output, I'm to
> blame.

I didn't read the RFC series, so I missed that there was past discussion
on this point.

Perhaps something to roll into an updated commit message? My reading of
the current version is that it suggests that the ls-tree-like output is
important to get at the data we need, which my patch-for-discussion
shows isn't the case.

> [...] Once you have different types included in the index, then
> ls-files has to print all the same fields that ls-tree does...so why
> not make it look similar?

I don't have a problem with how the output looks, I happen to like the
ls-tree output better, I've just been suggesting that differing output
== code duplication.

In any case. I'm sorry about any comments I've made that came across as
snarky or whatever. Since we're talking in a text-based medium I'm going
to take the reading of a third-party native speaker (you) over mine.

I didn't mean any comments I've made that way, I'm very interested in
seeing this feature land, and just want to try to help it along. Given
the size of this thread over a relatively trivial matter I think that
"help" is probably counterproductive at this point.

I don't think this is criticial or needs to be done or whatever. I've
only kept up this thread for the reasons stated above, i.e. it seeming
to me to be based on the premise that we can't add certain code to
builtin/*.c, and if we can get around that we can make this simpler.
Derrick Stolee March 29, 2021, 7:46 p.m. UTC | #6
On 3/28/2021 11:31 AM, Ævar Arnfjörð Bjarmason wrote:> It seems to me that the reason for that state is based on a
> misunderstanding about what we would and wouldn't add to builtin/*.c,
> i.e. that we wouldn't have something like a --debug option, but as
> ls-files shows that's not a problem.

I feel _strongly_ that a change to the user-facing CLI should come
with a good reason and care about how it locks-in behavior for the
future.

Any adjustment to 'git ls-files' deserves its own series and
attention, not in an already-too-large series like this one.

I'm not happy that this series and the next are so long, but that's
the best I can do to make them reviewable and still capture a
complete scenario. Hopefully the remaining series after these first
two are smaller. Things like "what should 'git ls-files' do with a
sparse index?" can fit cleanly on top once the core functionality
of the internals are stable.

I have an _opinion_ that the ls-files output is not well-suited to
testing because the --debug output splits details across multiple
lines. This is a minor point that could probably be corrected by
a complicated script method, but that's why I list this as an
opinion.

> I mean it's fine if it's just a "I don't think this is important and
> don't want to spend time on it, but it seems like a good idea", in which
> case others have the option of re-rolling some of these patches if they
> care (at this point I wouldn't).
> 
> Or "this is just a bad idea for XYZ reason", which is also fine, and
> even more valuable to document for future work in the area.
> 
> But to have another series built on this with refactorings back and
> forth before code's landed on master just seems like needless churn.

I think changing 'ls-files' before the sparse index has stabilized is
premature. I said that a series like the RFC you sent would be
appropriate after this concept is more stable. I do _not_ recommend
trying to juggle it on top of the work while the patches are in flight.

Thanks,
-Stolee
Junio C Hamano March 29, 2021, 9:44 p.m. UTC | #7
Derrick Stolee <stolee@gmail.com> writes:

> I think changing 'ls-files' before the sparse index has stabilized is
> premature. I said that a series like the RFC you sent would be
> appropriate after this concept is more stable. I do _not_ recommend
> trying to juggle it on top of the work while the patches are in flight.

I do not have a problem with either of approaches to help debugging
(i.e. extending "ls-files --debug" or a new test helper), but I am
curious to be reminded what the plan for "git ls-files [-s]" output
is, when run in a repository in which sparse cone checkout is used.

Do we see trees and paths outside the cone omitted, or does the act
of running "ls-files" dehydrate the trees into their constituent
blobs?

Thanks.
Elijah Newren March 29, 2021, 10:02 p.m. UTC | #8
Hi,

On Sun, Mar 28, 2021 at 8:31 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Mar 26 2021, Elijah Newren wrote:
>
[...]
> > You are correct that this will span multiple releases; Stolee already
> > said he was planning to be working on this for most of 2021.  But just
> > because pieces of the code exist and are shipped doesn't mean it'll be
> > announced or supported.  For example, the git-2.30 and git-2.31
> > release notes were completely silent about merge-ort.  It existed in
> > both releases; in fact, the version that ships in git-2.31, could
> > theoretically be used successfully by the vast majority of users for
> > their daily workflow.  (But it does have known shortcomings and test
> > failures so I definitely did *not* want it to be announced at that
> > time.)
>
> Yes, and that's fine. But if you'd been bending over backwards to add
> merge-ort to t/helper/ "because it's not ready yet" or something I'd
> have probably commented to the effect of "can't we just add it as part
> of builtins but not advertise it?" which is what you did :)

Actually, I did add a t/helper/test-fast-rebase.c (which is a few
hundred lines long) as part of the work on merge-ort, because
merge-ort wasn't ready and because rewiring sequencer.c was a huge
amount of work that I didn't want to get distracted by at the time.  I
originally suggested making fast-rebase a non-advertised builtin, but
multiple reviewers suggested the test helper route instead.

¯\_(ツ)_/¯
Ævar Arnfjörð Bjarmason March 29, 2021, 11:06 p.m. UTC | #9
On Mon, Mar 29 2021, Derrick Stolee wrote:

> On 3/28/2021 11:31 AM, Ævar Arnfjörð Bjarmason wrote:> It seems to me that the reason for that state is based on a
>> misunderstanding about what we would and wouldn't add to builtin/*.c,
>> i.e. that we wouldn't have something like a --debug option, but as
>> ls-files shows that's not a problem.

At the risk of going in circles here...

> I feel _strongly_ that a change to the user-facing CLI should come
> with a good reason and care about how it locks-in behavior for the
> future.

And I agree with you. Where we disagree is whether lives in builtin/*.c
== user-facing. I think --debug options are != that. It seems Junio
downthread agrees with that.

> Any adjustment to 'git ls-files' deserves its own series and
> attention[...]

A user-facing change to it yes, but I don't see how use of an (existing
even) --debug option would warrant any more attention than a new test
helper, less actually, it's less new code.

> [...] not in an already-too-large series like this one.

The alternative way of doing it at the end of
https://lore.kernel.org/git/874kgzq4qi.fsf@evledraar.gmail.com would
make this series smaller.

Anyway. As I noted in the E-Mail you're replying to
(https://lore.kernel.org/git/87eeg0ng78.fsf@evledraar.gmail.com/) I
really don't care that much.

I'm just still perplexed at how you keep bringing up use of an
internal-only --debug option as "user-facing", and here "already too
large" when we're talking about a proposed alternate direction that
would reduce the size.

> I'm not happy that this series and the next are so long, but that's
> the best I can do to make them reviewable and still capture a
> complete scenario. Hopefully the remaining series after these first
> two are smaller. Things like "what should 'git ls-files' do with a
> sparse index?" can fit cleanly on top once the core functionality
> of the internals are stable.

Sure. I'm fully on board with just moving forward with this in some
manner.

I'm not on board with the part of this that seems like it could just be
rephrased/understood as "...and we're not touching ls-files even with a
--debug option now because that would be user-facing[...]".

> I have an _opinion_ that the ls-files output is not well-suited to
> testing because the --debug output splits details across multiple
> lines. This is a minor point that could probably be corrected by
> a complicated script method, but that's why I list this as an
> opinion.

If the --debug it's spewing now isn't handy we can just change the
output format. The docs say:

    This is intended to show as much information as possible for manual
    inspection; the exact format may change at any time.

And we don't have existing in-tree users, something like this would make
it rather trivial:
    
    diff --git a/builtin/ls-files.c b/builtin/ls-files.c
    index f6f9e483b27..7596edc9f9d 100644
    --- a/builtin/ls-files.c
    +++ b/builtin/ls-files.c
    @@ -113,11 +113,11 @@ static void print_debug(const struct cache_entry *ce)
     	if (debug_mode) {
     		const struct stat_data *sd = &ce->ce_stat_data;
     
    -		printf("  ctime: %u:%u\n", sd->sd_ctime.sec, sd->sd_ctime.nsec);
    -		printf("  mtime: %u:%u\n", sd->sd_mtime.sec, sd->sd_mtime.nsec);
    -		printf("  dev: %u\tino: %u\n", sd->sd_dev, sd->sd_ino);
    -		printf("  uid: %u\tgid: %u\n", sd->sd_uid, sd->sd_gid);
    -		printf("  size: %u\tflags: %x\n", sd->sd_size, ce->ce_flags);
    +		printf("  ctime: %u:%u%c", sd->sd_ctime.sec, sd->sd_ctime.nsec, line_terminator);
    +		printf("  mtime: %u:%u%c", sd->sd_mtime.sec, sd->sd_mtime.nsec, line_terminator);
    +		printf("  dev: %u\tino: %u%c", sd->sd_dev, sd->sd_ino, line_terminator);
    +		printf("  uid: %u\tgid: %u%c", sd->sd_uid, sd->sd_gid, line_terminator);
    +		printf("  size: %u\tflags: %x%c", sd->sd_size, ce->ce_flags, line_terminator);
     	}
     }

But even without that it wouldn't be some complicated post-processing,
just a pipe to a small perl or awk process.
     
>> I mean it's fine if it's just a "I don't think this is important and
>> don't want to spend time on it, but it seems like a good idea", in which
>> case others have the option of re-rolling some of these patches if they
>> care (at this point I wouldn't).
>> 
>> Or "this is just a bad idea for XYZ reason", which is also fine, and
>> even more valuable to document for future work in the area.
>> 
>> But to have another series built on this with refactorings back and
>> forth before code's landed on master just seems like needless churn.
>
> I think changing 'ls-files' before the sparse index has stabilized is
> premature. I said that a series like the RFC you sent would be
> appropriate after this concept is more stable. I do _not_ recommend
> trying to juggle it on top of the work while the patches are in flight.

Just to clarify, upthread in [1] you said:

    And I recommend that you continue to pursue [these RFC patches] as
    an independent series, but I'm not going to incorporate them into
    this one[...]

So do I understand it right that you're referring to phase IV in your
opinion being the first point where we'd consider piggy-backing on
anything in builtin (that "user-facing" dilemma again...).

But at that point wouldn't you have your own ideas about some
user-facing ls-files or other porcelain for this, so I'm not sure where
to place the encouragement that I continue to pursue that RFC series,
other than setting a reminder in my calendar for 6-12 months in the
future :)

1. https://lore.kernel.org/git/ca8a96a4-5897-2484-b195-57e5b3820576@gmail.com/
Derrick Stolee March 30, 2021, 11:28 a.m. UTC | #10
On 3/29/2021 5:44 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> I think changing 'ls-files' before the sparse index has stabilized is
>> premature. I said that a series like the RFC you sent would be
>> appropriate after this concept is more stable. I do _not_ recommend
>> trying to juggle it on top of the work while the patches are in flight.
> 
> I do not have a problem with either of approaches to help debugging
> (i.e. extending "ls-files --debug" or a new test helper), but I am
> curious to be reminded what the plan for "git ls-files [-s]" output
> is, when run in a repository in which sparse cone checkout is used.
> 
> Do we see trees and paths outside the cone omitted, or does the act
> of running "ls-files" dehydrate the trees into their constituent
> blobs?

At the moment, end-to-end behavior is identical as before: sparse
directory entries are expanded to all of the contained blobs instead
of writing the tree entries.

The sparse-index work will not be complete until every command is
audited for potential behavior change when disabling the
command_requires_full_index setting. That includes deciding what
is the best decision for ls-files, and will likely include an option
for both possible outputs (tree entries, or expanding to blobs). The
interesting discussion that is worth its own topic is whether or not
the tree entries should be displayed by default.

So the plan is: this _will_ be addressed, but in the future after
the core functionality and value of the sparse-index is set.

Thanks,
-Stolee
Derrick Stolee March 30, 2021, 11:41 a.m. UTC | #11
On 3/29/2021 7:06 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Mar 29 2021, Derrick Stolee wrote:
> 
>> On 3/28/2021 11:31 AM, Ævar Arnfjörð Bjarmason wrote:> It seems to me that the reason for that state is based on a
>>> misunderstanding about what we would and wouldn't add to builtin/*.c,
>>> i.e. that we wouldn't have something like a --debug option, but as
>>> ls-files shows that's not a problem.
> 
> At the risk of going in circles here...
> 
>> I feel _strongly_ that a change to the user-facing CLI should come
>> with a good reason and care about how it locks-in behavior for the
>> future.
> 
> And I agree with you. Where we disagree is whether lives in builtin/*.c
> == user-facing. I think --debug options are != that. It seems Junio
> downthread agrees with that.
> 
>> Any adjustment to 'git ls-files' deserves its own series and
>> attention[...]
> 
> A user-facing change to it yes, but I don't see how use of an (existing
> even) --debug option would warrant any more attention than a new test
> helper, less actually, it's less new code.

I disagree that we can change the expected output of --debug so
quickly, despite warnings in the documentation. Changing that format
or creating a new output format requires cognitive load, and we have
enough of that going on in this area as it is.

>> [...] not in an already-too-large series like this one.
...
> I'm just still perplexed at how you keep bringing up use of an
> internal-only --debug option as "user-facing", and here "already too
> large" when we're talking about a proposed alternate direction that
> would reduce the size.

I'm not saying "patch size" or "code size" but instead thinking of it
in terms of how many decisions need to be made. Changing a builtin
when it's not necessary adds to the complexity of the series and
interrupts its core goals.

Finally, I have mentioned that I will need extra data for testing a
new index format. I don't want to modify the builtin now in a way
that is insufficient for the needs in that future series.

> Just to clarify, upthread in [1] you said:
> 
>     And I recommend that you continue to pursue [these RFC patches] as
>     an independent series, but I'm not going to incorporate them into
>     this one[...]
> 
> So do I understand it right that you're referring to phase IV in your
> opinion being the first point where we'd consider piggy-backing on
> anything in builtin (that "user-facing" dilemma again...).

I'm saying that if you feel strongly about it, then please pursue the
changes to ls-files any time after this series (but probably after
the next) solidifies. Having the changes be in a separate series allows
time to inspect the behavior change to the builtin in a focused way.
 
> But at that point wouldn't you have your own ideas about some
> user-facing ls-files or other porcelain for this, so I'm not sure where
> to place the encouragement that I continue to pursue that RFC series,
> other than setting a reminder in my calendar for 6-12 months in the
> future :)

Otherwise, I will modify ls-files myself in this 6-12 month timeframe,
based on the established plan to remove the command_requires_full_index
setting.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index 244977a29bdf..6cfd8f2de71c 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -1,36 +1,71 @@ 
 #include "test-tool.h"
 #include "cache.h"
 #include "config.h"
+#include "blob.h"
+#include "commit.h"
+#include "tree.h"
+
+static void print_cache_entry(struct cache_entry *ce)
+{
+	const char *type;
+	printf("%06o ", ce->ce_mode & 0177777);
+
+	if (S_ISSPARSEDIR(ce->ce_mode))
+		type = tree_type;
+	else if (S_ISGITLINK(ce->ce_mode))
+		type = commit_type;
+	else
+		type = blob_type;
+
+	printf("%s %s\t%s\n",
+	       type,
+	       oid_to_hex(&ce->oid),
+	       ce->name);
+}
+
+static void print_cache(struct index_state *istate)
+{
+	int i;
+	for (i = 0; i < istate->cache_nr; i++)
+		print_cache_entry(istate->cache[i]);
+}
 
 int cmd__read_cache(int argc, const char **argv)
 {
+	struct repository *r = the_repository;
 	int i, cnt = 1;
 	const char *name = NULL;
+	int table = 0;
 
-	if (argc > 1 && skip_prefix(argv[1], "--print-and-refresh=", &name)) {
-		argc--;
-		argv++;
+	for (++argv, --argc; *argv && starts_with(*argv, "--"); ++argv, --argc) {
+		if (skip_prefix(*argv, "--print-and-refresh=", &name))
+			continue;
+		if (!strcmp(*argv, "--table"))
+			table = 1;
 	}
 
-	if (argc == 2)
-		cnt = strtol(argv[1], NULL, 0);
+	if (argc == 1)
+		cnt = strtol(argv[0], NULL, 0);
 	setup_git_directory();
 	git_config(git_default_config, NULL);
+
 	for (i = 0; i < cnt; i++) {
-		read_cache();
+		repo_read_index(r);
 		if (name) {
 			int pos;
 
-			refresh_index(&the_index, REFRESH_QUIET,
+			refresh_index(r->index, REFRESH_QUIET,
 				      NULL, NULL, NULL);
-			pos = index_name_pos(&the_index, name, strlen(name));
+			pos = index_name_pos(r->index, name, strlen(name));
 			if (pos < 0)
 				die("%s not in index", name);
 			printf("%s is%s up to date\n", name,
-			       ce_uptodate(the_index.cache[pos]) ? "" : " not");
+			       ce_uptodate(r->index->cache[pos]) ? "" : " not");
 			write_file(name, "%d\n", i);
 		}
-		discard_cache();
+		if (table)
+			print_cache(r->index);
+		discard_index(r->index);
 	}
 	return 0;
 }