[GSoC,v5,2/7] clone: better handle symlinked files at .git/objects/
diff mbox series

Message ID 20190330224907.3277-3-matheus.bernardino@usp.br
State New
Headers show
Series
  • Untitled series #99107
Related show

Commit Message

Matheus Tavares Bernardino March 30, 2019, 10:49 p.m. UTC
There is currently an odd behaviour when locally cloning a repository
with symlinks at .git/objects: using --no-hardlinks all symlinks are
dereferenced but without it, Git will try to hardlink the files with the
link() function, which has an OS-specific behaviour on symlinks. On OSX
and NetBSD, it creates a hardlink to the file pointed by the symlink
whilst on GNU/Linux, it creates a hardlink to the symlink itself.

On Manjaro GNU/Linux:
    $ touch a
    $ ln -s a b
    $ link b c
    $ ls -li a b c
    155 [...] a
    156 [...] b -> a
    156 [...] c -> a

But on NetBSD:
    $ ls -li a b c
    2609160 [...] a
    2609164 [...] b -> a
    2609160 [...] c

It's not good to have the result of a local clone to be OS-dependent and
besides that, the current behaviour on GNU/Linux may result in broken
symlinks. So let's standardize this by making the hardlinks always point
to dereferenced paths, instead of the symlinks themselves. Also, add
tests for symlinked files at .git/objects/.

Note: Git won't create symlinks at .git/objects itself, but it's better
to handle this case and be friendly with users who manually create them.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/clone.c            |  5 ++++-
 t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
 2 files changed, 24 insertions(+), 8 deletions(-)

Comments

Thomas Gummerer March 31, 2019, 5:40 p.m. UTC | #1
On 03/30, Matheus Tavares wrote:
> There is currently an odd behaviour when locally cloning a repository
> with symlinks at .git/objects: using --no-hardlinks all symlinks are
> dereferenced but without it, Git will try to hardlink the files with the
> link() function, which has an OS-specific behaviour on symlinks. On OSX
> and NetBSD, it creates a hardlink to the file pointed by the symlink
> whilst on GNU/Linux, it creates a hardlink to the symlink itself.
> 
> On Manjaro GNU/Linux:
>     $ touch a
>     $ ln -s a b
>     $ link b c
>     $ ls -li a b c
>     155 [...] a
>     156 [...] b -> a
>     156 [...] c -> a
> 
> But on NetBSD:
>     $ ls -li a b c
>     2609160 [...] a
>     2609164 [...] b -> a
>     2609160 [...] c
> 
> It's not good to have the result of a local clone to be OS-dependent and
> besides that, the current behaviour on GNU/Linux may result in broken
> symlinks. So let's standardize this by making the hardlinks always point
> to dereferenced paths, instead of the symlinks themselves. Also, add
> tests for symlinked files at .git/objects/.
> 
> Note: Git won't create symlinks at .git/objects itself, but it's better
> to handle this case and be friendly with users who manually create them.
> 
> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/clone.c            |  5 ++++-
>  t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
>  2 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 50bde99618..f975b509f1 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -443,7 +443,10 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
>  		if (unlink(dest->buf) && errno != ENOENT)
>  			die_errno(_("failed to unlink '%s'"), dest->buf);
>  		if (!option_no_hardlinks) {
> -			if (!link(src->buf, dest->buf))
> +			char *resolved_path = real_pathdup(src->buf, 1);
> +			int status = link(resolved_path, dest->buf);
> +			free(resolved_path);
> +			if (!status)

Is there any reason why we can't use 'real_path()' here?  As I
mentioned in [*1*], 'real_path()' doesn't require the callers to free
any memory, so the above could become much simpler, and could just be

+			if (!link(real_path(src->buf), dest->buf))

*1*: <20190330192738.GQ32487@hank.intra.tgummerer.com>

>  				continue;
>  			if (option_local > 0)
>  				die_errno(_("failed to create link '%s'"), dest->buf);
Matheus Tavares Bernardino April 1, 2019, 3:59 a.m. UTC | #2
On Sun, Mar 31, 2019 at 2:40 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> On 03/30, Matheus Tavares wrote:
> > There is currently an odd behaviour when locally cloning a repository
> > with symlinks at .git/objects: using --no-hardlinks all symlinks are
> > dereferenced but without it, Git will try to hardlink the files with the
> > link() function, which has an OS-specific behaviour on symlinks. On OSX
> > and NetBSD, it creates a hardlink to the file pointed by the symlink
> > whilst on GNU/Linux, it creates a hardlink to the symlink itself.
> >
> > On Manjaro GNU/Linux:
> >     $ touch a
> >     $ ln -s a b
> >     $ link b c
> >     $ ls -li a b c
> >     155 [...] a
> >     156 [...] b -> a
> >     156 [...] c -> a
> >
> > But on NetBSD:
> >     $ ls -li a b c
> >     2609160 [...] a
> >     2609164 [...] b -> a
> >     2609160 [...] c
> >
> > It's not good to have the result of a local clone to be OS-dependent and
> > besides that, the current behaviour on GNU/Linux may result in broken
> > symlinks. So let's standardize this by making the hardlinks always point
> > to dereferenced paths, instead of the symlinks themselves. Also, add
> > tests for symlinked files at .git/objects/.
> >
> > Note: Git won't create symlinks at .git/objects itself, but it's better
> > to handle this case and be friendly with users who manually create them.
> >
> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> > ---
> >  builtin/clone.c            |  5 ++++-
> >  t/t5604-clone-reference.sh | 27 ++++++++++++++++++++-------
> >  2 files changed, 24 insertions(+), 8 deletions(-)
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index 50bde99618..f975b509f1 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -443,7 +443,10 @@ static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
> >               if (unlink(dest->buf) && errno != ENOENT)
> >                       die_errno(_("failed to unlink '%s'"), dest->buf);
> >               if (!option_no_hardlinks) {
> > -                     if (!link(src->buf, dest->buf))
> > +                     char *resolved_path = real_pathdup(src->buf, 1);
> > +                     int status = link(resolved_path, dest->buf);
> > +                     free(resolved_path);
> > +                     if (!status)
>
> Is there any reason why we can't use 'real_path()' here?  As I
> mentioned in [*1*], 'real_path()' doesn't require the callers to free
> any memory, so the above could become much simpler, and could just be
>
> +                       if (!link(real_path(src->buf), dest->buf))
>

Yes, you are right. I will change this! I sent this v5 before
carefully reading your previous email and studding strbuf functions
and real_path(), now that I did that, I see that real_path() is the
best option here. Thanks!

> *1*: <20190330192738.GQ32487@hank.intra.tgummerer.com>
>
> >                               continue;
> >                       if (option_local > 0)
> >                               die_errno(_("failed to create link '%s'"), dest->buf);

Patch
diff mbox series

diff --git a/builtin/clone.c b/builtin/clone.c
index 50bde99618..f975b509f1 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -443,7 +443,10 @@  static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest,
 		if (unlink(dest->buf) && errno != ENOENT)
 			die_errno(_("failed to unlink '%s'"), dest->buf);
 		if (!option_no_hardlinks) {
-			if (!link(src->buf, dest->buf))
+			char *resolved_path = real_pathdup(src->buf, 1);
+			int status = link(resolved_path, dest->buf);
+			free(resolved_path);
+			if (!status)
 				continue;
 			if (option_local > 0)
 				die_errno(_("failed to create link '%s'"), dest->buf);
diff --git a/t/t5604-clone-reference.sh b/t/t5604-clone-reference.sh
index 207650cb95..0800c3853f 100755
--- a/t/t5604-clone-reference.sh
+++ b/t/t5604-clone-reference.sh
@@ -266,7 +266,7 @@  test_expect_success 'clone a repo with garbage in objects/*' '
 	test_cmp expected actual
 '
 
-test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
 	git init T &&
 	(
 		cd T &&
@@ -280,10 +280,19 @@  test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 		ln -s packs pack &&
 		find ?? -type d >loose-dirs &&
 		last_loose=$(tail -n 1 loose-dirs) &&
-		rm -f loose-dirs &&
 		mv $last_loose a-loose-dir &&
 		ln -s a-loose-dir $last_loose &&
+		first_loose=$(head -n 1 loose-dirs) &&
+		rm -f loose-dirs &&
+
+		cd $first_loose &&
+		obj=$(ls *) &&
+		mv $obj ../an-object &&
+		ln -s ../an-object $obj &&
+
+		cd ../ &&
 		find . -type f | sort >../../../T.objects-files.raw &&
+		find . -type l | sort >../../../T.objects-symlinks.raw &&
 		echo unknown_content> unknown_file
 	) &&
 	git -C T fsck &&
@@ -291,7 +300,7 @@  test_expect_success SYMLINKS 'setup repo with manually symlinked dirs and unknow
 '
 
 
-test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files at objects/' '
+test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
 	for option in --local --no-hardlinks --shared --dissociate
 	do
 		git clone $option T T$option || return 1 &&
@@ -300,7 +309,8 @@  test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 		test_cmp T.objects T$option.objects &&
 		(
 			cd T$option/.git/objects &&
-			find . -type f | sort >../../../T$option.objects-files.raw
+			find . -type f | sort >../../../T$option.objects-files.raw &&
+			find . -type l | sort >../../../T$option.objects-symlinks.raw
 		)
 	done &&
 
@@ -314,6 +324,7 @@  test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./Y/Z
 	./Y/Z
 	./a-loose-dir/Z
+	./an-object
 	./Y/Z
 	./info/packs
 	./pack/pack-Z.idx
@@ -323,13 +334,15 @@  test_expect_success SYMLINKS 'clone repo with symlinked dirs and unknown files a
 	./unknown_file
 	EOF
 
-	for option in --local --dissociate --no-hardlinks
+	for option in --local --no-hardlinks --dissociate
 	do
-		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1
+		test_cmp expected-files T$option.objects-files.raw.de-sha || return 1 &&
+		test_must_be_empty T$option.objects-symlinks.raw.de-sha || return 1
 	done &&
 
 	echo ./info/alternates >expected-files &&
-	test_cmp expected-files T--shared.objects-files.raw
+	test_cmp expected-files T--shared.objects-files.raw &&
+	test_must_be_empty T--shared.objects-symlinks.raw
 '
 
 test_done