diff mbox series

[v3] diff-lib: Fix check_removed when fsmonitor is on

Message ID 20230911170901.49050-2-sokcevic@google.com (mailing list archive)
State Accepted
Commit 6a044a20480a8ef56f7ddb8142f660ca01a3391e
Headers show
Series [v3] diff-lib: Fix check_removed when fsmonitor is on | expand

Commit Message

Josip Sokcevic Sept. 11, 2023, 5:09 p.m. UTC
git-diff-index may return incorrect deleted entries when fsmonitor is used in a
repository with git submodules. This can be observed on Mac machines, but it
can affect all other supported platforms too.

If fsmonitor is used, `stat *st` is not initialized if cache_entry has
CE_FSMONITOR_VALID set. But, there are three call sites that rely on stat
afterwards, which can result in incorrect results.

This change partially reverts commit 4f3d6d0.

Signed-off-by: Josip Sokcevic <sokcevic@google.com>
---
 diff-lib.c                   | 12 ++++++------
 t/t7527-builtin-fsmonitor.sh |  5 +++++
 2 files changed, 11 insertions(+), 6 deletions(-)

Comments

Junio C Hamano Sept. 11, 2023, 10:53 p.m. UTC | #1
Josip Sokcevic <sokcevic@google.com> writes:

> git-diff-index may return incorrect deleted entries when fsmonitor is used in a
> repository with git submodules. This can be observed on Mac machines, but it
> can affect all other supported platforms too.
>
> If fsmonitor is used, `stat *st` is not initialized if cache_entry has
> CE_FSMONITOR_VALID set. But, there are three call sites that rely on stat
> afterwards, which can result in incorrect results.
>
> This change partially reverts commit 4f3d6d0.
>
> Signed-off-by: Josip Sokcevic <sokcevic@google.com>
> ---
>  diff-lib.c                   | 12 ++++++------
>  t/t7527-builtin-fsmonitor.sh |  5 +++++
>  2 files changed, 11 insertions(+), 6 deletions(-)

This certainly is more "complete" if simpler than the previous one
;-)

In the longer term, we would probably want to enable optimization
using what fsmonitor knows, but as we have seen in the review on
the previous round, this code needs a bit more work than the
original we are reverting here to get it right, and in the shorter
term, hopefully this would do.

Thanks.

> diff --git a/diff-lib.c b/diff-lib.c
> index d8aa777a73..5848e4f9ca 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -36,14 +36,14 @@
>   * exists for ce that is a submodule -- it is a submodule that is not
>   * checked out).  Return negative for an error.
>   */
> -static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st)
> +static int check_removed(const struct cache_entry *ce, struct stat *st)
>  {
> -	assert(is_fsmonitor_refreshed(istate));
> -	if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) {
> +	if (lstat(ce->name, st) < 0) {
>  		if (!is_missing_file_error(errno))
>  			return -1;
>  		return 1;
>  	}
> +
>  	if (has_symlink_leading_path(ce->name, ce_namelen(ce)))
>  		return 1;
>  	if (S_ISDIR(st->st_mode)) {
> @@ -149,7 +149,7 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
>  			memset(&(dpath->parent[0]), 0,
>  			       sizeof(struct combine_diff_parent)*5);
>  
> -			changed = check_removed(istate, ce, &st);
> +			changed = check_removed(ce, &st);
>  			if (!changed)
>  				wt_mode = ce_mode_from_stat(ce, st.st_mode);
>  			else {
> @@ -229,7 +229,7 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
>  		} else {
>  			struct stat st;
>  
> -			changed = check_removed(istate, ce, &st);
> +			changed = check_removed(ce, &st);
>  			if (changed) {
>  				if (changed < 0) {
>  					perror(ce->name);
> @@ -303,7 +303,7 @@ static int get_stat_data(const struct index_state *istate,
>  	if (!cached && !ce_uptodate(ce)) {
>  		int changed;
>  		struct stat st;
> -		changed = check_removed(istate, ce, &st);
> +		changed = check_removed(ce, &st);
>  		if (changed < 0)
>  			return -1;
>  		else if (changed) {
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index 0c241d6c14..78503158fd 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -809,6 +809,11 @@ my_match_and_clean () {
>  		status --porcelain=v2 >actual.without &&
>  	test_cmp actual.with actual.without &&
>  
> +	git -C super --no-optional-locks diff-index --name-status HEAD >actual.with &&
> +	git -C super --no-optional-locks -c core.fsmonitor=false \
> +		diff-index --name-status HEAD >actual.without &&
> +	test_cmp actual.with actual.without &&
> +
>  	git -C super/dir_1/dir_2/sub reset --hard &&
>  	git -C super/dir_1/dir_2/sub clean -d -f
>  }
Josip Sokcevic Sept. 12, 2023, 3:03 a.m. UTC | #2
On Mon, Sep 11, 2023 at 3:53 PM Junio C Hamano <gitster@pobox.com> wrote:
> This certainly is more "complete" if simpler than the previous one
> ;-)
>
> In the longer term, we would probably want to enable optimization
> using what fsmonitor knows, but as we have seen in the review on
> the previous round, this code needs a bit more work than the
> original we are reverting here to get it right, and in the shorter
> term, hopefully this would do.

Yes, I agree we should optimize this in a follow up. One thing I'm not
sure about is if we should try to construct `struct stat` using
`cache_index`, or we should check for `CE_FSMONITOR_VALID` in a way
that `stat` would no longer be needed for those code paths.
Junio C Hamano Sept. 12, 2023, 5:07 p.m. UTC | #3
Josip Sokcevic <sokcevic@google.com> writes:

> Yes, I agree we should optimize this in a follow up. One thing I'm not
> sure about is if we should try to construct `struct stat` using
> `cache_index`, or we should check for `CE_FSMONITOR_VALID` in a way
> that `stat` would no longer be needed for those code paths.

Good point.  

It seems to be entirely doable, even though the stench from the
abstraction layer violation may be horrible.

ie_match_stat(), which is called by match_stat_with_submodule(),
does pay attention to CE_FSMONITOR_VALID bit, so none of the members
of struct stat matters when the bit is set.  But the bit is not set,
all members that fill_stat_data() and ce_match_stat_basic() care
about do matter.  Other code that follows callers of check_removed()
do care about at least .st_mode member, and I suspect that in the
current code .st_mode is the only member that gets looked at.

So after all, I think your original "fix" was correct, but it took
us some time to figure out why it was, which means we would want to
explain it in the log message for developers who would want to touch
the same area in the future.

Thanks.
Josip Sokcevic Sept. 14, 2023, 10:39 p.m. UTC | #4
On Tue, Sep 12, 2023 at 10:07 AM Junio C Hamano <gitster@pobox.com> wrote:
> It seems to be entirely doable, even though the stench from the
> abstraction layer violation may be horrible.
>
> ie_match_stat(), which is called by match_stat_with_submodule(),
> does pay attention to CE_FSMONITOR_VALID bit, so none of the members
> of struct stat matters when the bit is set.  But the bit is not set,
> all members that fill_stat_data() and ce_match_stat_basic() care
> about do matter.  Other code that follows callers of check_removed()
> do care about at least .st_mode member, and I suspect that in the
> current code .st_mode is the only member that gets looked at.
>
> So after all, I think your original "fix" was correct, but it took
> us some time to figure out why it was, which means we would want to
> explain it in the log message for developers who would want to touch
> the same area in the future.

I finished testing this - after my original fix and after running all
tests, I can confirm that `st_mode` of `struct stat` is indeed not
consumed if CE_FSMONITOR_VALID is set. But, it's fragile and likely to
cause problems in the future. Your approach of constructing `struct
stat` based on `struct cache_entry` is the way to go.

I see you created a new set of patches in a separate thread, so I'll
start those tests and report back there.

Thanks!
Junio C Hamano Sept. 18, 2023, 4:35 p.m. UTC | #5
Josip Sokcevic <sokcevic@google.com> writes:

> I see you created a new set of patches in a separate thread, so I'll
> start those tests and report back there.

Thanks.
diff mbox series

Patch

diff --git a/diff-lib.c b/diff-lib.c
index d8aa777a73..5848e4f9ca 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -36,14 +36,14 @@ 
  * exists for ce that is a submodule -- it is a submodule that is not
  * checked out).  Return negative for an error.
  */
-static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st)
+static int check_removed(const struct cache_entry *ce, struct stat *st)
 {
-	assert(is_fsmonitor_refreshed(istate));
-	if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) {
+	if (lstat(ce->name, st) < 0) {
 		if (!is_missing_file_error(errno))
 			return -1;
 		return 1;
 	}
+
 	if (has_symlink_leading_path(ce->name, ce_namelen(ce)))
 		return 1;
 	if (S_ISDIR(st->st_mode)) {
@@ -149,7 +149,7 @@  void run_diff_files(struct rev_info *revs, unsigned int option)
 			memset(&(dpath->parent[0]), 0,
 			       sizeof(struct combine_diff_parent)*5);
 
-			changed = check_removed(istate, ce, &st);
+			changed = check_removed(ce, &st);
 			if (!changed)
 				wt_mode = ce_mode_from_stat(ce, st.st_mode);
 			else {
@@ -229,7 +229,7 @@  void run_diff_files(struct rev_info *revs, unsigned int option)
 		} else {
 			struct stat st;
 
-			changed = check_removed(istate, ce, &st);
+			changed = check_removed(ce, &st);
 			if (changed) {
 				if (changed < 0) {
 					perror(ce->name);
@@ -303,7 +303,7 @@  static int get_stat_data(const struct index_state *istate,
 	if (!cached && !ce_uptodate(ce)) {
 		int changed;
 		struct stat st;
-		changed = check_removed(istate, ce, &st);
+		changed = check_removed(ce, &st);
 		if (changed < 0)
 			return -1;
 		else if (changed) {
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 0c241d6c14..78503158fd 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -809,6 +809,11 @@  my_match_and_clean () {
 		status --porcelain=v2 >actual.without &&
 	test_cmp actual.with actual.without &&
 
+	git -C super --no-optional-locks diff-index --name-status HEAD >actual.with &&
+	git -C super --no-optional-locks -c core.fsmonitor=false \
+		diff-index --name-status HEAD >actual.without &&
+	test_cmp actual.with actual.without &&
+
 	git -C super/dir_1/dir_2/sub reset --hard &&
 	git -C super/dir_1/dir_2/sub clean -d -f
 }