diff mbox series

[v5] clone: report duplicate entries on case-insensitive filesystems

Message ID 20181119082015.77553-1-carenas@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v5] clone: report duplicate entries on case-insensitive filesystems | expand

Commit Message

Carlo Marcelo Arenas Belón Nov. 19, 2018, 8:20 a.m. UTC
While I don't have an HFS+ volume to test, I suspect this patch should be
needed for both, even if I have to say thay even the broken output was
better than the current state.

Travis seems to be using a case sensitive filesystem so wouldn't catch this.

Was windows/cygwin tested?

Carlo
-- >8 --
Subject: [PATCH] entry: fix t5061 on macOS

b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
2018-08-17) was tested on Linux with an excemption for Windows that needs
to be expanded for macOS (using APFS), which then would show :

$ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'man2/_Exit.2'
  'man2/_exit.2'
  'man3/NAN.3'
  'man3/nan.3'

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 entry.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Torsten Bögershausen Nov. 19, 2018, 12:28 p.m. UTC | #1
On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote:
> While I don't have an HFS+ volume to test, I suspect this patch should be
> needed for both, even if I have to say thay even the broken output was
> better than the current state.
> 
> Travis seems to be using a case sensitive filesystem so wouldn't catch this.
> 
> Was windows/cygwin tested?
> 
> Carlo
> -- >8 --
> Subject: [PATCH] entry: fix t5061 on macOS
> 
> b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
> 2018-08-17) was tested on Linux with an excemption for Windows that needs
> to be expanded for macOS (using APFS), which then would show :
> 
> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
> 
>   'man2/_Exit.2'
>   'man2/_exit.2'
>   'man3/NAN.3'
>   'man3/nan.3'
> 
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  entry.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..3845f570f7 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state,
>  {
>  	int i, trust_ino = check_stat;
>  
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
>  	trust_ino = 0;
>  #endif
>  
> 

Sorry,
but I can't reproduce your problem here.

Did you test it on Mac ?
If I run t5601 on a case sensitive files system
(Mac, mounted NFS, exported from Linux)
I get:
ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of
!MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

And if I run it on a case-insensitive HFS+,
I get
ok 99 - colliding file detection

So what exactly are you trying to fix ?
Carlo Marcelo Arenas Belón Nov. 19, 2018, 5:14 p.m. UTC | #2
On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> Did you test it on Mac ?

macOS 10.14.1 but only using APFS, did you test my patch with HFS+?

> So what exactly are you trying to fix ?

I get

not ok 99 - colliding file detection
#
# grep X icasefs/warning &&
# grep x icasefs/warning &&
# test_i18ngrep "the following paths have collided" icasefs/warning
#

and the output of "warning" only shows one of the conflicting files,
instead of both:

Cloning into 'bogus'...
done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'x'

Carlo
Ramsay Jones Nov. 19, 2018, 5:21 p.m. UTC | #3
On 19/11/2018 12:28, Torsten Bögershausen wrote:
> On 2018-11-19 09:20, Carlo Marcelo Arenas Belón wrote:
>> While I don't have an HFS+ volume to test, I suspect this patch should be
>> needed for both, even if I have to say thay even the broken output was
>> better than the current state.
>>
>> Travis seems to be using a case sensitive filesystem so wouldn't catch this.
>>
>> Was windows/cygwin tested?
>>
>> Carlo
>> -- >8 --
>> Subject: [PATCH] entry: fix t5061 on macOS
>>
>> b878579ae7 ("clone: report duplicate entries on case-insensitive filesystems",
>> 2018-08-17) was tested on Linux with an excemption for Windows that needs
>> to be expanded for macOS (using APFS), which then would show :
>>
>> $ git clone git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
>> warning: the following paths have collided (e.g. case-sensitive paths
>> on a case-insensitive filesystem) and only one from the same
>> colliding group is in the working tree:
>>
>>   'man2/_Exit.2'
>>   'man2/_exit.2'
>>   'man3/NAN.3'
>>   'man3/nan.3'
>>
>> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>> ---
>>  entry.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/entry.c b/entry.c
>> index 5d136c5d55..3845f570f7 100644
>> --- a/entry.c
>> +++ b/entry.c
>> @@ -404,7 +404,7 @@ static void mark_colliding_entries(const struct checkout *state,
>>  {
>>  	int i, trust_ino = check_stat;
>>  
>> -#if defined(GIT_WINDOWS_NATIVE)
>> +#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
>>  	trust_ino = 0;
>>  #endif
>>  
>>
> 
> Sorry,
> but I can't reproduce your problem here.
> 
> Did you test it on Mac ?
> If I run t5601 on a case sensitive files system
> (Mac, mounted NFS, exported from Linux)
> I get:
> ok 99 # skip colliding file detection (missing CASE_INSENSITIVE_FS of
> !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

I tested v2.20.0-rc0 on cygwin last night and it passed just fine.
I just ran t5601-clone.sh on its own and got:

    $ ./t5601-clone.sh
    ...
    ok 98 - clone on case-insensitive fs
    ok 99 # skip colliding file detection (missing !CYGWIN of !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)
    ok 100 - partial clone
    ok 101 - partial clone: warn if server does not support object filtering
    ok 102 - batch missing blob request during checkout
    ok 103 - batch missing blob request does not inadvertently try to fetch gitlinks
    # passed all 103 test(s)
    # SKIP no web server found at '/usr/sbin/apache2'
    1..103
    $ 

ATB,
Ramsay Jones
Duy Nguyen Nov. 19, 2018, 6:24 p.m. UTC | #4
On Mon, Nov 19, 2018 at 6:14 PM Carlo Arenas <carenas@gmail.com> wrote:
>
> On Mon, Nov 19, 2018 at 4:28 AM Torsten Bögershausen <tboegi@web.de> wrote:
> >
> > Did you test it on Mac ?
>
> macOS 10.14.1 but only using APFS, did you test my patch with HFS+?
>
> > So what exactly are you trying to fix ?
>
> I get
>
> not ok 99 - colliding file detection
> #
> # grep X icasefs/warning &&
> # grep x icasefs/warning &&
> # test_i18ngrep "the following paths have collided" icasefs/warning
> #
>
> and the output of "warning" only shows one of the conflicting files,
> instead of both:
>
> Cloning into 'bogus'...
> done.
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
>
>   'x'
>
> Carlo

Could you send me the "index" file in  t/trash\
directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
"stat /path/to/icase/bogus/x"

My only explanation is somehow the inode value we save is not the same
one on disk, which is weird and could even cause other problems. I'd
like to know why this happens before trying to fix anything.
Carlo Marcelo Arenas Belón Nov. 19, 2018, 7:39 p.m. UTC | #5
On Mon, Nov 19, 2018 at 9:23 AM Ramsay Jones
<ramsay@ramsayjones.plus.com> wrote:
>     ok 99 # skip colliding file detection (missing !CYGWIN of !MINGW,!CYGWIN,CASE_INSENSITIVE_FS)

you need to enable this specific test first (removing !CYGWIN) so it
doesn't get skipped

Carlo
Duy Nguyen Nov. 19, 2018, 9:03 p.m. UTC | #6
First of all, Ramsay, it would be great if you could test the below
patch and see if it works on Cygwin. I assume since Cygwin shares the
underlying filesystem, it will share the same "no trusting inode"
issue with native builds (or it calculates inodes anyway using some
other source?).

Back to the APFS problem...

On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote:
> Could you send me the "index" file in  t/trash\
> directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
> "stat /path/to/icase/bogus/x"
> 
> My only explanation is somehow the inode value we save is not the same
> one on disk, which is weird and could even cause other problems. I'd
> like to know why this happens before trying to fix anything.

Thanks Carlo for the file and "stat" output. The problem is APFS has
64-bit inode (according to the Internet) while we store inodes as
32-bit, so it's truncated. Which means this comparison

    sd_ino == st_ino

is never true because sd_ino is truncated (0x2121063) while st_ino is
not (0x202121063).

Carlo, it would be great if you could test this patch also with
APFS. It should fix problem. We will have to deal with the same
truncated inode elsewhere to make sure we index refresh performance
does not degrade on APFS. But that's a separate problem. Thank you for
bringing this up.

-- 8< --
diff --git a/entry.c b/entry.c
index 5d136c5d55..809d3e2ba7 100644
--- a/entry.c
+++ b/entry.c
@@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct checkout *state,
 {
 	int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
 	trust_ino = 0;
 #endif
 
 	ce->ce_flags |= CE_MATCHED;
 
-	for (i = 0; i < state->istate->cache_nr; i++) {
+	for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {
 		struct cache_entry *dup = state->istate->cache[i];
 
 		if (dup == ce)
@@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state,
 		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
 			continue;
 
-		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
-		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
+		if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {
 			dup->ce_flags |= CE_MATCHED;
+			return;
+		}
+	}
+
+	for (i = 0; i < state->istate->cache_nr; i++) {
+		struct cache_entry *dup = state->istate->cache[i];
+
+		if (dup == ce)
 			break;
+
+		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
+			continue;
+
+		if (!fspathcmp(ce->name, dup->name)) {
+			dup->ce_flags |= CE_MATCHED;
+			return;
 		}
 	}
 }
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index f1a49e94f5..c28d51bd59 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
 	)
 '
 
-test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
+test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
 	grep X icasefs/warning &&
 	grep x icasefs/warning &&
 	test_i18ngrep "the following paths have collided" icasefs/warning
-- 8< --
Duy Nguyen Nov. 19, 2018, 9:04 p.m. UTC | #7
... and I "dear Ramsay" without CCing him.. sigh.. sorry for the noise.

On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> First of all, Ramsay, it would be great if you could test the below
> patch and see if it works on Cygwin. I assume since Cygwin shares the
> underlying filesystem, it will share the same "no trusting inode"
> issue with native builds (or it calculates inodes anyway using some
> other source?).
>
> Back to the APFS problem...
>
> On Mon, Nov 19, 2018 at 07:24:26PM +0100, Duy Nguyen wrote:
> > Could you send me the "index" file in  t/trash\
> > directory.t5601-clone/icasefs/bogus/.git/index ? Also the output of
> > "stat /path/to/icase/bogus/x"
> >
> > My only explanation is somehow the inode value we save is not the same
> > one on disk, which is weird and could even cause other problems. I'd
> > like to know why this happens before trying to fix anything.
>
> Thanks Carlo for the file and "stat" output. The problem is APFS has
> 64-bit inode (according to the Internet) while we store inodes as
> 32-bit, so it's truncated. Which means this comparison
>
>     sd_ino == st_ino
>
> is never true because sd_ino is truncated (0x2121063) while st_ino is
> not (0x202121063).
>
> Carlo, it would be great if you could test this patch also with
> APFS. It should fix problem. We will have to deal with the same
> truncated inode elsewhere to make sure we index refresh performance
> does not degrade on APFS. But that's a separate problem. Thank you for
> bringing this up.
>
> -- 8< --
> diff --git a/entry.c b/entry.c
> index 5d136c5d55..809d3e2ba7 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -404,13 +404,13 @@ static void mark_colliding_entries(const struct checkout *state,
>  {
>         int i, trust_ino = check_stat;
>
> -#if defined(GIT_WINDOWS_NATIVE)
> +#if defined(GIT_WINDOWS_NATIVE) || defined(__CYGWIN__)
>         trust_ino = 0;
>  #endif
>
>         ce->ce_flags |= CE_MATCHED;
>
> -       for (i = 0; i < state->istate->cache_nr; i++) {
> +       for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {
>                 struct cache_entry *dup = state->istate->cache[i];
>
>                 if (dup == ce)
> @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state,
>                 if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>                         continue;
>
> -               if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> -                   (!trust_ino && !fspathcmp(ce->name, dup->name))) {
> +               if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {
>                         dup->ce_flags |= CE_MATCHED;
> +                       return;
> +               }
> +       }
> +
> +       for (i = 0; i < state->istate->cache_nr; i++) {
> +               struct cache_entry *dup = state->istate->cache[i];
> +
> +               if (dup == ce)
>                         break;
> +
> +               if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
> +                       continue;
> +
> +               if (!fspathcmp(ce->name, dup->name)) {
> +                       dup->ce_flags |= CE_MATCHED;
> +                       return;
>                 }
>         }
>  }
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index f1a49e94f5..c28d51bd59 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -628,7 +628,7 @@ test_expect_success 'clone on case-insensitive fs' '
>         )
>  '
>
> -test_expect_success !MINGW,!CYGWIN,CASE_INSENSITIVE_FS 'colliding file detection' '
> +test_expect_success !MINGW,CASE_INSENSITIVE_FS 'colliding file detection' '
>         grep X icasefs/warning &&
>         grep x icasefs/warning &&
>         test_i18ngrep "the following paths have collided" icasefs/warning
> -- 8< --
Duy Nguyen Nov. 19, 2018, 9:17 p.m. UTC | #8
On Mon, Nov 19, 2018 at 10:03 PM Duy Nguyen <pclouds@gmail.com> wrote:
> Thanks Carlo for the file and "stat" output. The problem is APFS has
> 64-bit inode (according to the Internet) while we store inodes as
> 32-bit, so it's truncated.
> ...
> We will have to deal with the same
> truncated inode elsewhere to make sure we index refresh performance
> does not degrade on APFS.

... and we don't have a problem there. Either Linus predicted dealing
with 64-bit inodes, or he had a habit of casting st_ino to unsigned
int, I cannot tell. This code

    ce->st_ino != (unsigned int)st->st_ino

is from e83c516331 (Initial revision of "git", the information manager
from hell - 2005-04-07) and it's still used today for comparing sd_ino
with st->st_ino in read-cache.c. I guess I should have copied and
pasted more often.
Ramsay Jones Nov. 19, 2018, 11:29 p.m. UTC | #9
On 19/11/2018 21:03, Duy Nguyen wrote:
> First of all, Ramsay, it would be great if you could test the below
> patch and see if it works on Cygwin. I assume since Cygwin shares the
> underlying filesystem, it will share the same "no trusting inode"
> issue with native builds (or it calculates inodes anyway using some
> other source?).

Hmm, I have no idea why you would like me to try this patch - care
to explain? [I just saw, "Has this been tested on cygwin?" and, since
it has been happily passing for some time, responded yes!]

Just for the giggles, I removed the !CYGWIN prerequisite from the
test and when, as expected, the test failed, had a look around:

$ pwd
/home/ramsay/git/t/trash directory.t5601-clone
$ cat icasefs/warning 
Cloning into 'bogus'...
done.
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'x'
$ cd icasefs/bogus
$ ls -l
total 0
-rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x
$ git ls-files --debug
ignoring EOIE extension
X
  ctime: 1542667201:664036600
  mtime: 1542667201:663055400
  dev: 2378432	ino: 324352
  uid: 1001	gid: 513
  size: 0	flags: 0
x
  ctime: 1542667201:665026800
  mtime: 1542667201:665026800
  dev: 2378432	ino: 324352
  uid: 1001	gid: 513
  size: 0	flags: 0
$ 

So, both X and x are in the index with the same inode number.

Does that help?

ATB,
Ramsay Jones
Ramsay Jones Nov. 19, 2018, 11:54 p.m. UTC | #10
On 19/11/2018 23:29, Ramsay Jones wrote:
> 
> 
> On 19/11/2018 21:03, Duy Nguyen wrote:
>> First of all, Ramsay, it would be great if you could test the below
>> patch and see if it works on Cygwin. I assume since Cygwin shares the
>> underlying filesystem, it will share the same "no trusting inode"
>> issue with native builds (or it calculates inodes anyway using some
>> other source?).
> 
> Hmm, I have no idea why you would like me to try this patch - care
> to explain? [I just saw, "Has this been tested on cygwin?" and, since
> it has been happily passing for some time, responded yes!]
> 
> Just for the giggles, I removed the !CYGWIN prerequisite from the
> test and when, as expected, the test failed, had a look around:
> 
> $ pwd
> /home/ramsay/git/t/trash directory.t5601-clone
> $ cat icasefs/warning 
> Cloning into 'bogus'...
> done.
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
> 
>   'x'
> $ cd icasefs/bogus
> $ ls -l
> total 0
> -rw-r--r-- 1 ramsay None 0 Nov 19 22:40 x
> $ git ls-files --debug
> ignoring EOIE extension
> X
>   ctime: 1542667201:664036600
>   mtime: 1542667201:663055400
>   dev: 2378432	ino: 324352
>   uid: 1001	gid: 513
>   size: 0	flags: 0
> x
>   ctime: 1542667201:665026800
>   mtime: 1542667201:665026800
>   dev: 2378432	ino: 324352
>   uid: 1001	gid: 513
>   size: 0	flags: 0
> $ 
> 
> So, both X and x are in the index with the same inode number.
> 
> Does that help?

Well, I haven't even looked at the patch, but when I apply it to
the current 'pu' branch (just what I happened to have checked out)
and run that one test:

$ ./t5601-clone.sh
...
ok 96 - shallow clone locally
ok 97 - GIT_TRACE_PACKFILE produces a usable pack
ok 98 - clone on case-insensitive fs
ok 99 - colliding file detection
ok 100 - partial clone
ok 101 - partial clone: warn if server does not support object filtering
ok 102 - batch missing blob request during checkout
ok 103 - batch missing blob request does not inadvertently try to fetch gitlinks
# passed all 103 test(s)
# SKIP no web server found at '/usr/sbin/apache2'
1..103
$ 

... the colliding file detection test passes!

ATB,
Ramsay Jones
Carlo Marcelo Arenas Belón Nov. 20, 2018, 1:05 a.m. UTC | #11
ok 99 - colliding file detection

as well in macOS with APFS

Carlo
Junio C Hamano Nov. 20, 2018, 2:22 a.m. UTC | #12
Duy Nguyen <pclouds@gmail.com> writes:

>  
> -	for (i = 0; i < state->istate->cache_nr; i++) {
> +	for (i = 0; i < trust_ino && state->istate->cache_nr; i++) {

There is some typo here, but modulo that this looks like the right
thing to do.

> @@ -419,10 +419,24 @@ static void mark_colliding_entries(const struct checkout *state,
>  		if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE))
>  			continue;
>  
> -		if ((trust_ino && dup->ce_stat_data.sd_ino == st->st_ino) ||
> -		    (!trust_ino && !fspathcmp(ce->name, dup->name))) {
> +		if (dup->ce_stat_data.sd_ino == (unsigned int)st->st_ino) {

This is slightly unfortunate but is the best we can do for now.  

The reason why the design of the "cached stat info" mechanism allows
the sd_* fields to be narrower than the underlying fields is because
they are used only as an early-culling measure (if the value saved
with truncation is different from the current value with truncation,
then they cannot possibly be the same, so we know that the file
changed without looking at the contents).

This use however is different.  Equality of truncated values
immediately declare CE_MATCHED here, producing false negative, which
is not what we want, no?

>  			dup->ce_flags |= CE_MATCHED;
> +			return;
> +		}
> +	}
diff mbox series

Patch

diff --git a/entry.c b/entry.c
index 5d136c5d55..3845f570f7 100644
--- a/entry.c
+++ b/entry.c
@@ -404,7 +404,7 @@  static void mark_colliding_entries(const struct checkout *state,
 {
 	int i, trust_ino = check_stat;
 
-#if defined(GIT_WINDOWS_NATIVE)
+#if defined(GIT_WINDOWS_NATIVE) || defined(__APPLE__)
 	trust_ino = 0;
 #endif