diff mbox series

[v7] describe: refresh the index when 'broken' flag is used

Message ID 20240626190801.68472-1-abhijeet.nkt@gmail.com (mailing list archive)
State New
Headers show
Series [v7] describe: refresh the index when 'broken' flag is used | expand

Commit Message

Abhijeet Sonar June 26, 2024, 7:08 p.m. UTC
When describe is run with 'dirty' flag, we refresh the index
to make sure it is in sync with the filesystem before
determining if the working tree is dirty.  However, this is
not done for the codepath where the 'broken' flag is used.

This causes `git describe --broken --dirty` to false
positively report the worktree being dirty if a file has
different stat info than what is recorded in the index.
Running `git update-index -q --refresh` to refresh the index
before running diff-index fixes the problem.

Also add tests to deliberately update stat info of a
file before running describe to verify it behaves correctly.

Reported-by: Paul Millar <paul.millar@desy.de>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
---
 builtin/describe.c  | 12 ++++++++++++
 t/t6120-describe.sh | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

Comments

Abhijeet Sonar June 26, 2024, 7:25 p.m. UTC | #1
I have a question:

Why does --dirty code path also not call git-update-index and instead does

	setup_work_tree();
	prepare_repo_settings(the_repository);
	the_repository->settings.command_requires_full_index = 0;
	repo_read_index(the_repository);
	refresh_index(...);
	fd = repo_hold_locked_index(...);
	if (0 <= fd)
		repo_update_index_if_able(the_repository, &index_lock);

I assume they are equivalent?
The commit which introduced this --
bb571486ae93d02746c4bcc8032bde306f6d399a (describe: Refresh the index
when run with --dirty) seems to be for the same objective as mu patch.

Is this something you would like to see addressed?

Thanks
Abhijeet Sonar June 27, 2024, 6:01 a.m. UTC | #2
On 27/06/24 00:55, Abhijeet Sonar wrote:
> I have a question:
> 
> Why does --dirty code path also not call git-update-index and instead does
> 
> 	setup_work_tree();
> 	prepare_repo_settings(the_repository);
> 	the_repository->settings.command_requires_full_index = 0;
> 	repo_read_index(the_repository);
> 	refresh_index(...);
> 	fd = repo_hold_locked_index(...);
> 	if (0 <= fd)
> 		repo_update_index_if_able(the_repository, &index_lock);
> 
> I assume they are equivalent?
> The commit which introduced this --
> bb571486ae93d02746c4bcc8032bde306f6d399a (describe: Refresh the index
> when run with --dirty) seems to be for the same objective as mu patch.
> 
> Is this something you would like to see addressed?
> 
> Thanks

s/mu/my
Junio C Hamano June 27, 2024, 3:47 p.m. UTC | #3
Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:

> I have a question:
>
> Why does --dirty code path also not call git-update-index and instead does
>
> 	setup_work_tree();
> 	prepare_repo_settings(the_repository);
> 	the_repository->settings.command_requires_full_index = 0;
> 	repo_read_index(the_repository);
> 	refresh_index(...);
> 	fd = repo_hold_locked_index(...);
> 	if (0 <= fd)
> 		repo_update_index_if_able(the_repository, &index_lock);
>
> I assume they are equivalent?

Now we are going back full circles ;-)?

Your earliest attempt indeed copied the above to the code paths used
to handle "--broken", but then Phillip corrected the course

  https://lore.kernel.org/git/054c6ac1-4714-4600-afa5-7e9b6e9b0e72@gmail.com/

to avoid triggering an in-process error and instead run an
equivalent "update-index --refresh" via the run_command() interface,
so that we can catch potential errors.  The code in the more recent
rounds of your patch uses that, no?

> The commit which introduced this --
> bb571486ae93d02746c4bcc8032bde306f6d399a (describe: Refresh the index
> when run with --dirty) seems to be for the same objective as mu patch.

Yes, but the cited message above explains the reason why the two
code paths in your patch use different implementations, I would
think.
Abhijeet Sonar June 27, 2024, 5:33 p.m. UTC | #4
On 27/06/24 21:17, Junio C Hamano wrote:

> to avoid triggering an in-process error and instead run an
> equivalent "update-index --refresh" via the run_command() interface,
> so that we can catch potential errors.

Thanks, I get it now.
Karthik Nayak June 30, 2024, 4:12 p.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> Abhijeet Sonar <abhijeet.nkt@gmail.com> writes:
>
>> I have a question:
>>
>> Why does --dirty code path also not call git-update-index and instead does
>>
>> 	setup_work_tree();
>> 	prepare_repo_settings(the_repository);
>> 	the_repository->settings.command_requires_full_index = 0;
>> 	repo_read_index(the_repository);
>> 	refresh_index(...);
>> 	fd = repo_hold_locked_index(...);
>> 	if (0 <= fd)
>> 		repo_update_index_if_able(the_repository, &index_lock);
>>
>> I assume they are equivalent?
>
> Now we are going back full circles ;-)?
>
> Your earliest attempt indeed copied the above to the code paths used
> to handle "--broken", but then Phillip corrected the course
>
>   https://lore.kernel.org/git/054c6ac1-4714-4600-afa5-7e9b6e9b0e72@gmail.com/
>
> to avoid triggering an in-process error and instead run an
> equivalent "update-index --refresh" via the run_command() interface,
> so that we can catch potential errors.  The code in the more recent
> rounds of your patch uses that, no?
>

This explains for why 'broken' must use a subprocess, but there is
nothing stopping 'dirty' from also using a subprocess, right? It
currently uses an in-process index refresh but it _could_ be a
subprocess too.

Does it need to be a subprocess? I can't think of any reason to make it.
Junio C Hamano July 1, 2024, 7:06 p.m. UTC | #6
Karthik Nayak <karthik.188@gmail.com> writes:

> This explains for why 'broken' must use a subprocess, but there is
> nothing stopping 'dirty' from also using a subprocess, right? It
> currently uses an in-process index refresh but it _could_ be a
> subprocess too.

Correct, except that it does not make sense to do any and all things
that you _could_ do.  So...
Karthik Nayak July 2, 2024, 10:13 a.m. UTC | #7
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> This explains for why 'broken' must use a subprocess, but there is
>> nothing stopping 'dirty' from also using a subprocess, right? It
>> currently uses an in-process index refresh but it _could_ be a
>> subprocess too.
>
> Correct, except that it does not make sense to do any and all things
> that you _could_ do.  So...

Well, In this context, I think there is some merit though. There are two
blocks of code `--broken` and `--dirty` one after the other which both
need to refresh the index. With this patch, 'broken' will use a child
process to do so while 'dirty' will use `refresh_index(...)`. To someone
reading the code it would seem a bit confusing. I agree there is no
merit in using a child process in 'dirty' by itself. But I also think we
should leave a comment there for readers to understand the distinction.
Junio C Hamano July 3, 2024, 6:17 p.m. UTC | #8
Karthik Nayak <karthik.188@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Karthik Nayak <karthik.188@gmail.com> writes:
>>
>>> This explains for why 'broken' must use a subprocess, but there is
>>> nothing stopping 'dirty' from also using a subprocess, right? It
>>> currently uses an in-process index refresh but it _could_ be a
>>> subprocess too.
>>
>> Correct, except that it does not make sense to do any and all things
>> that you _could_ do.  So...
>
> Well, In this context, I think there is some merit though. There are two
> blocks of code `--broken` and `--dirty` one after the other which both
> need to refresh the index. With this patch, 'broken' will use a child
> process to do so while 'dirty' will use `refresh_index(...)`. To someone
> reading the code it would seem a bit confusing.

Yes, that much I very much agree.

> I agree there is no
> merit in using a child process in 'dirty' by itself.

Yes, that made me puzzled why you brought it up, as it was way too
oblique suggestion to ...

> But I also think we
> should leave a comment there for readers to understand the distinction.

... improve the "documentation" to help future developers who wonder
why the code are in the shape as it is.

In this particular case, I think it is borderline if the issue
warrants in-code comment or if it is a bit too much.  Describing the
same thing in the log message would probably be a valid alternative,
as "git blame" can lead those readers to the commit that introduced
the distinction (in other words, this one).

Thanks.

diff --git i/builtin/describe.c w/builtin/describe.c
index e936d2c19f..bc2ad60b35 100644
--- i/builtin/describe.c
+++ w/builtin/describe.c
@@ -648,6 +648,14 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 
 	if (argc == 0) {
 		if (broken) {
+			/* 
+			 * Avoid doing these "update-index --refresh"
+			 * and "diff-index" operations in-process
+			 * (like how the code path for "--dirty"
+			 * without "--broken" does so below), as we
+			 * are told to prepare for a broken repository
+			 * where running these may lead to die().
+			 */
 			struct child_process cp = CHILD_PROCESS_INIT;
 
 			strvec_pushv(&cp.args, update_index_args);
Karthik Nayak July 3, 2024, 8:41 p.m. UTC | #9
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Karthik Nayak <karthik.188@gmail.com> writes:
>>>
>>>> This explains for why 'broken' must use a subprocess, but there is
>>>> nothing stopping 'dirty' from also using a subprocess, right? It
>>>> currently uses an in-process index refresh but it _could_ be a
>>>> subprocess too.
>>>
>>> Correct, except that it does not make sense to do any and all things
>>> that you _could_ do.  So...
>>
>> Well, In this context, I think there is some merit though. There are two
>> blocks of code `--broken` and `--dirty` one after the other which both
>> need to refresh the index. With this patch, 'broken' will use a child
>> process to do so while 'dirty' will use `refresh_index(...)`. To someone
>> reading the code it would seem a bit confusing.
>
> Yes, that much I very much agree.
>
>> I agree there is no
>> merit in using a child process in 'dirty' by itself.
>
> Yes, that made me puzzled why you brought it up, as it was way too
> oblique suggestion to ...
>

Yeah, I should've been more verbose there.

>> But I also think we
>> should leave a comment there for readers to understand the distinction.
>
> ... improve the "documentation" to help future developers who wonder
> why the code are in the shape as it is.
>
> In this particular case, I think it is borderline if the issue
> warrants in-code comment or if it is a bit too much.  Describing the
> same thing in the log message would probably be a valid alternative,
> as "git blame" can lead those readers to the commit that introduced
> the distinction (in other words, this one).
>

I think it is always best to err on the side of more documentation than
less in such situations.

> Thanks.
>
> diff --git i/builtin/describe.c w/builtin/describe.c
> index e936d2c19f..bc2ad60b35 100644
> --- i/builtin/describe.c
> +++ w/builtin/describe.c
> @@ -648,6 +648,14 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>
>  	if (argc == 0) {
>  		if (broken) {
> +			/*
> +			 * Avoid doing these "update-index --refresh"
> +			 * and "diff-index" operations in-process
> +			 * (like how the code path for "--dirty"
> +			 * without "--broken" does so below), as we
> +			 * are told to prepare for a broken repository
> +			 * where running these may lead to die().
> +			 */
>  			struct child_process cp = CHILD_PROCESS_INIT;
>
>  			strvec_pushv(&cp.args, update_index_args);

This looks perfect, thanks!
diff mbox series

Patch

diff --git a/builtin/describe.c b/builtin/describe.c
index e5287eddf2..cf8edc4222 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -53,6 +53,10 @@  static const char *diff_index_args[] = {
 	"diff-index", "--quiet", "HEAD", "--", NULL
 };
 
+static const char *update_index_args[] = {
+	"update-index", "--unmerged", "-q", "--refresh", NULL
+};
+
 struct commit_name {
 	struct hashmap_entry entry;
 	struct object_id peeled;
@@ -645,6 +649,14 @@  int cmd_describe(int argc, const char **argv, const char *prefix)
 	if (argc == 0) {
 		if (broken) {
 			struct child_process cp = CHILD_PROCESS_INIT;
+
+			strvec_pushv(&cp.args, update_index_args);
+			cp.git_cmd = 1;
+			cp.no_stdin = 1;
+			cp.no_stdout = 1;
+			run_command(&cp);
+
+			child_process_init(&cp);
 			strvec_pushv(&cp.args, diff_index_args);
 			cp.git_cmd = 1;
 			cp.no_stdin = 1;
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index e78315d23d..79e0f19deb 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -671,4 +671,40 @@  test_expect_success 'setup misleading taggerdates' '
 
 check_describe newer-tag-older-commit~1 --contains unique-file~2
 
+test_expect_success 'describe --dirty with a file with changed stat' '
+	test_when_finished "rm -fr stat-dirty" &&
+	git init stat-dirty &&
+	(
+		cd stat-dirty &&
+
+		echo A >file &&
+		git add file &&
+		git commit -m A &&
+		git tag A -a -m A &&
+		echo "A" >expect &&
+
+		test-tool chmtime -10 file &&
+		git describe --dirty >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'describe --broken --dirty with a file with changed stat' '
+	test_when_finished "rm -fr stat-dirty" &&
+	git init stat-dirty &&
+	(
+		cd stat-dirty &&
+
+		echo A >file &&
+		git add file &&
+		git commit -m A &&
+		git tag A -a -m A &&
+		echo "A" >expect &&
+
+		test-tool chmtime -10 file &&
+		git describe --dirty --broken >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done