diff mbox series

[v2,7/8] verify_path(): disallow symlinks in .gitattributes and .gitignore

Message ID 20201005121645.GG2907394@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit dd4c2fe66b1171ca5ec0a3adf041eb225591b5a1
Headers show
Series forbidding symlinked .gitattributes and .gitignore | expand

Commit Message

Jeff King Oct. 5, 2020, 12:16 p.m. UTC
In commit 10ecfa7649 (verify_path: disallow symlinks in .gitmodules,
2018-05-04) we made it impossible to load a .gitmodules file that's a
symlink into the index. The security reasons for doing so are described
there. We also discussed forbidding symlinks of other .git files as part
of that fix, but the tradeoff was less compelling:

  1. Unlike .gitmodules, the other files don't have content-level fsck
     checks. So an attacker using symlinks to evade those checks isn't a
     problem.

  2. Unlike .gitmodules, Git will never write .gitignore or
     .gitattributes itself, making it much less likely to use them to
     write outside the repo. They could be used for out-of-repo reads,
     however.

  3. The .gitmodules change was part of a critical bug-fix that was
     not publicly disclosed until it was released. Changing the other
     files was not needed for the minimal fix.

However, it's still a reasonable idea to forbid symlinks for these
files:

  - As noted, they can still be used to read out-of-repo files (which is
    fairly restricted, but in some circumstances you can probe file
    content by speculatively creating files and seeing if they get
    ignored)

  - They don't currently behave well in all cases. We sometimes read
    these files from the index, where we _don't_ follow symlinks (we'd
    just treat the symlink target as the .gitignore or .gitattributes
    content, which is actively wrong).

This patch forbids symlinked versions of these files from entering the
index. We already have helpers for obscured forms of the names from
e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and
0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
were done as part of the series touching .gitmodules.

No tests yet, as we'll add them in a subsequent patch once we have fsck
support, too.

Signed-off-by: Jeff King <peff@peff.net>
---
 read-cache.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

Comments

Jonathan Nieder Oct. 27, 2020, 3:35 a.m. UTC | #1
Hi,

Jeff King wrote:

> However, it's still a reasonable idea to forbid symlinks for these
> files:
>
>   - As noted, they can still be used to read out-of-repo files (which is
>     fairly restricted, but in some circumstances you can probe file
>     content by speculatively creating files and seeing if they get
>     ignored)
>
>   - They don't currently behave well in all cases. We sometimes read
>     these files from the index, where we _don't_ follow symlinks (we'd
>     just treat the symlink target as the .gitignore or .gitattributes
>     content, which is actively wrong).
>
> This patch forbids symlinked versions of these files from entering the
> index. We already have helpers for obscured forms of the names from
> e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11) and
> 0fc333ba20 (is_hfs_dotgit: match other .git files, 2018-05-02), which
> were done as part of the series touching .gitmodules.

Thanks again for this.  Since this patch has been in "next", we've
gotten a little experience of it at Google.

We've been running with the fsck check for a while (more than a year),
but not the verify_dotfile check.  The verify_dotfile check didn't
trigger for anyone with .gitattributes, but it hit several people for
.gitignore.  Some examples that users have mentioned:

Before https://android-review.googlesource.com/c/platform/tools/test/connectivity/+/1462771/
Android used a .gitignore symlink for two directories that had similar
gitignore requirements.  Diagnosing the error was confusing for them,
especially because the "repo" wrapper tool produced messages like

	error.GitError: Cannot initialize work tree for platform/tools/test/connectivity

Eventually someone manually ran

	$ git add --a
	error: invalid path 'acts_tests/.gitignore'
	error: unable to add 'acts_tests/.gitignore' to index
	fatal: adding files failed

which helped them realize it was a git issue and helped me point them
to the need to replace the symlink with a plain file.

As another example, a user working with the
https://github.com/bakerstu/openmrn.git repository noticed "git
checkout" commands failing.  In this user's case, the checkout failed
part-way through, producing confusing behavior ("git status" showing
entries missing from the index).  When I tried to reproduce this, I
wasn't able to clone the repository at all because it failed fsck;
after disabling transfer.fsckObjects, I still wasn't able to check out
HEAD.

Observations:

- since some widely used repositories have .gitignore symlinks, I
  think we can't forbid it in fsck, alas

- it would be useful to be able to check whether these symlinks would
  not escape the worktree, for a more targeted check.  It might be
  nice to even respect these settings when they would not escape the
  worktree, but not necessarily

- we could use a clearer error message than "invalid path".  There's
  some room for improvement in "git checkout"'s error handling, too
  --- I think my ideal would be if the operation would fail entirely,
  with an advice message describing a checkout command that would
  succeed (But how do I checkout another commit while excluding some
  files? Should it suggest a sparse checkout?).

That's all I have time for today --- will revisit again tomorrow.

Thoughts?

Thanks,
Jonathan
Jeff King Oct. 27, 2020, 7:58 a.m. UTC | #2
On Mon, Oct 26, 2020 at 08:35:18PM -0700, Jonathan Nieder wrote:

> As another example, a user working with the
> https://github.com/bakerstu/openmrn.git repository noticed "git
> checkout" commands failing.  In this user's case, the checkout failed
> part-way through, producing confusing behavior ("git status" showing
> entries missing from the index).  When I tried to reproduce this, I
> wasn't able to clone the repository at all because it failed fsck;
> after disabling transfer.fsckObjects, I still wasn't able to check out
> HEAD.

I wouldn't be surprised if there are code paths which die() early
instead of checking out as much as they can. But I think fixing that is
somewhat orthogonal (here we're just adding new reasons that checking
out may fail).

> Observations:
> 
> - since some widely used repositories have .gitignore symlinks, I
>   think we can't forbid it in fsck, alas

I am a little puzzled here. You said you had the fsck checks for the
last year, so why did they just come up now? I guess nobody sets
transfer.fsckObjects, and because you were testing only with clients,
your server implementation didn't reject pushes?

I agree it's annoying for them if they fail fsck, but it's not entirely
a show-stopper. There are config options for fine-tuning what you're
willing to enforce or ignore. But they of course are also annoying to
use, because every receiver of a transfer needs to set them (on GitHub,
for example, you have to email Support).

So I won't be too devastated to remove the symlink checks, or possibly
downgrade them to purely warnings (or "info"; the naming in fsck.c is
confusing, because the transfer operations take even warnings as fatal.
I suspect we could do with some cleanup there).

> - it would be useful to be able to check whether these symlinks would
>   not escape the worktree, for a more targeted check.  It might be
>   nice to even respect these settings when they would not escape the
>   worktree, but not necessarily

I actually wrote a patch several years ago for checking symlinks (not
just these ones, but _any_ symlinks in the repo, but of course it would
be easy to limit it more). It's included at the end of this mail. It's
been part of my daily build for many years, so I'm confident it doesn't
crash or have other bad behavior. But it's possible the logic for what
it catches is faulty.

> - we could use a clearer error message than "invalid path".  

That part is tricky. The "invalid path" error comes from the caller of
verify_path(), and we have no way to pass back an intelligent error
there. We can call error() ourselves, of course. That adds an extra line
of output, but it's rare enough for verify_path() to fail that it's
likely OK. However, I would worry that some callers might be surprised
by it producing output at all.

An alternative is letting the caller pass in a strbuf that we fill out
with an extra error string.

> There's some room for improvement in "git checkout"'s error handling,
> too --- I think my ideal would be if the operation would fail
> entirely, with an advice message describing a checkout command that
> would succeed (But how do I checkout another commit while excluding
> some files? Should it suggest a sparse checkout?).

I suspect it's too late for "fail entirely". We may have already written
to the filesystem, and rolling back is difficult and error-prone. In
general I'd expect to checkout what we can, produce errors for the rest,
and let the user work from there with "git status".

But I may be wrong. The problem is loading the value into the index, not
writing it to the filesystem. So perhaps the relevant code paths load
the index fully before writing out anything to the filesystem, and it's
easy to rollback. But I imagine they are using unpack-trees' flag to
update the filesystem, and I assume that checks out as it loads entries
(but I didn't confirm).

-Peff

Here's the patch. The tests need the checkout-index fix from:

  https://lore.kernel.org/git/20201027073000.GA3651896@coredump.intra.peff.net/

but it should otherwise be stand-alone. I don't recall why I never sent
it. One obvious downside is that it's difficult to have fsck checks for
it, since the full path of an entry is not a property of a single tree
object.

-- >8 --
Date: Wed, 9 Nov 2016 23:24:09 -0500
Subject: [PATCH] optionally block out-of-repo symlinks

---
 apply.c                              |   2 +-
 cache.h                              |   3 +
 config.c                             |   5 ++
 entry.c                              |   2 +-
 environment.c                        |   1 +
 merge-recursive.c                    |   2 +-
 path.c                               |  73 ++++++++++++++++++
 t/t2031-checkout-symlink-external.sh | 107 +++++++++++++++++++++++++++
 8 files changed, 192 insertions(+), 3 deletions(-)
 create mode 100755 t/t2031-checkout-symlink-external.sh

diff --git a/apply.c b/apply.c
index 76dba93c97..2ac940dc05 100644
--- a/apply.c
+++ b/apply.c
@@ -4360,7 +4360,7 @@ static int try_create_file(struct apply_state *state, const char *path,
 		/* Although buf:size is counted string, it also is NUL
 		 * terminated.
 		 */
-		return !!symlink(buf, path);
+		return !!safe_symlink(buf, path);
 
 	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 0100) ? 0777 : 0666);
 	if (fd < 0)
diff --git a/cache.h b/cache.h
index c0072d43b1..9b56e2327a 100644
--- a/cache.h
+++ b/cache.h
@@ -959,6 +959,7 @@ extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern const char *core_fsmonitor;
+extern int allow_external_symlinks;
 
 extern int core_apply_sparse_checkout;
 extern int core_sparse_checkout_cone;
@@ -1979,4 +1980,6 @@ int print_sha1_ellipsis(void);
 /* Return 1 if the file is empty or does not exists, 0 otherwise. */
 int is_empty_or_missing_file(const char *filename);
 
+int safe_symlink(const char *target, const char *linkpath);
+
 #endif /* CACHE_H */
diff --git a/config.c b/config.c
index 2bdff4457b..6699881c6e 100644
--- a/config.c
+++ b/config.c
@@ -1404,6 +1404,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.allowexternalsymlinks")) {
+		allow_external_symlinks = git_config_bool(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return platform_core_config(var, value, cb);
 }
diff --git a/entry.c b/entry.c
index a0532f1f00..f2d3456df3 100644
--- a/entry.c
+++ b/entry.c
@@ -291,7 +291,7 @@ static int write_entry(struct cache_entry *ce,
 		if (!has_symlinks || to_tempfile)
 			goto write_file_entry;
 
-		ret = symlink(new_blob, path);
+		ret = safe_symlink(new_blob, path);
 		free(new_blob);
 		if (ret)
 			return error_errno("unable to create symlink %s", path);
diff --git a/environment.c b/environment.c
index bb518c61cd..7c233e0e0e 100644
--- a/environment.c
+++ b/environment.c
@@ -73,6 +73,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
+int allow_external_symlinks = 1;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
diff --git a/merge-recursive.c b/merge-recursive.c
index d0214335a7..61208a8b43 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -986,7 +986,7 @@ static int update_file_flags(struct merge_options *opt,
 			char *lnk = xmemdupz(buf, size);
 			safe_create_leading_directories_const(path);
 			unlink(path);
-			if (symlink(lnk, path))
+			if (safe_symlink(lnk, path))
 				ret = err(opt, _("failed to symlink '%s': %s"),
 					  path, strerror(errno));
 			free(lnk);
diff --git a/path.c b/path.c
index 7b385e5eb2..251dd70ff5 100644
--- a/path.c
+++ b/path.c
@@ -1498,6 +1498,79 @@ int looks_like_command_line_option(const char *str)
 	return str && str[0] == '-';
 }
 
+static int symlink_leaves_repo(const char *target, const char *linkpath)
+{
+	/*
+	 * Absolute paths are always considered to leave the repository (even
+	 * if they happen to point to the working tree path).
+	 */
+	if (is_absolute_path(target))
+		return 1;
+
+	/*
+	 * Allow relative paths that start with a sequence of "../",
+	 * as long as they do not break out of the symlink's root.
+	 * This loop will detect break-out cases and return; otherwise, at the
+	 * end of the loop "target" will point to the first non-".." component.
+	 *
+	 * We count the depth of linkpath by eating up directory components left
+	 * to right. Technically the symlink would resolve right-to-left, but
+	 * we don't care about the actual values, only the number.
+	 */
+	while (target[0] == '.') {
+		if (!target[1]) {
+			/* trailing "." -- ignore */
+			target++;
+		} else if (is_dir_sep(target[1])) {
+			/* "./" -- ignore */
+			target += 2;
+		} else if (target[1] == '.' &&
+			   (!target[2] || is_dir_sep(target[2]))) {
+			/* ".." or "../" -- drop one from linkpath depth */
+			while (!is_dir_sep(*linkpath)) {
+				/* end-of-string; target exceeded our depth */
+				if (!*linkpath)
+					return 1;
+				linkpath++;
+			}
+			/* skip final "/" */
+			linkpath++;
+
+			/* skip past ".." */
+			target += 2;
+			/* and "/" if present */
+			if (is_dir_sep(*target))
+				target++;
+		}
+	}
+
+	/*
+	 * Now we have a path in "target" that only go down into the tree.
+	 * Disallow any interior "../", like "foo/../bar". These might be
+	 * OK, but we cannot know unless we know whether "foo" is itself a
+	 * symlink. So err on the side of caution.
+	 */
+	while (*target) {
+		const char *v;
+		if (skip_prefix(target, "..", &v) && (!*v || is_dir_sep(*v)))
+			return 1;
+		target++;
+	}
+
+	return 0;
+}
+
+int safe_symlink(const char *target, const char *linkpath)
+{
+	if (!allow_external_symlinks &&
+	    symlink_leaves_repo(target, linkpath)) {
+		errno = EPERM;
+		return -1;
+	}
+
+	return symlink(target, linkpath);
+}
+
 char *xdg_config_home(const char *filename)
 {
 	const char *home, *config_home;
diff --git a/t/t2031-checkout-symlink-external.sh b/t/t2031-checkout-symlink-external.sh
new file mode 100755
index 0000000000..c799f8158e
--- /dev/null
+++ b/t/t2031-checkout-symlink-external.sh
@@ -0,0 +1,107 @@
+#!/bin/sh
+
+test_description='detection and prevention of out-of-tree symlinks'
+. ./test-lib.sh
+
+if ! test_have_prereq SYMLINKS
+then
+	skip_all='skipping external symlink tests (missing SYMLINKS)'
+	test_done
+fi
+
+create_symlink() {
+	symlink=$1
+	target=$2
+	test_expect_success "create symlink ($symlink)" '
+		sha1=$(printf "%s" "$target" | git hash-object -w --stdin) &&
+		git update-index --add --cacheinfo "120000,$sha1,$symlink"
+	'
+}
+
+check_symlink () {
+	symlink=$1
+	config=$2
+	outcome=$3
+	expect=$4
+
+	if test "$outcome" = "allow"
+	then
+		fail=
+		: ${expect:=test_cmp ../target}
+	else
+		fail=test_must_fail
+		: ${expect:=! cat}
+	fi
+
+	test_expect_success " check symlink ($symlink, $config -> $outcome)" "
+		rm -f $symlink &&
+		$fail git -c core.allowExternalSymlinks=$config \\
+			checkout-index -- $symlink &&
+		$expect $symlink
+	"
+}
+
+# we want to try breaking out of the repository,
+# so let's work inside a sub-repository, and break
+# out to the top-level trash directory
+test_expect_success 'set up repository' '
+	echo content >target &&
+	git init subrepo &&
+	cd subrepo &&
+	test_commit base &&
+	echo content >in-repo-target
+'
+
+create_symlink in-repo in-repo-target
+check_symlink in-repo false allow
+
+create_symlink subdir/in-repo ../in-repo-target
+check_symlink subdir/in-repo false allow
+
+create_symlink absolute "$TRASH_DIRECTORY/target"
+check_symlink absolute true allow
+check_symlink absolute false forbid
+
+create_symlink relative "../target"
+check_symlink relative true allow
+check_symlink relative false forbid
+
+create_symlink curdir .
+check_symlink curdir false allow test_path_is_dir
+create_symlink sneaky curdir/../target
+check_symlink sneaky true allow
+check_symlink sneaky false forbid
+
+test_expect_success 'applying a patch checks symlink config' '
+	git diff-index -p --cached HEAD -- relative >patch &&
+	rm -f relative &&
+	git -c core.allowExternalSymlinks=true apply <patch &&
+	test_cmp ../target relative &&
+	rm -f relative &&
+	test_must_fail git -c core.allowExternalSymlinks=false apply <patch
+'
+
+test_expect_success 'merge-recursive checks symlinks config' '
+	git reset --hard &&
+
+	# create rename situation which requires processing
+	# outside of unpack_trees()
+	ln -s ../foo one &&
+	git add one &&
+	git commit -m base &&
+
+	ln -sf ../target one &&
+	git commit -am modify &&
+
+	git checkout -b side HEAD^ &&
+	git mv one two &&
+	git commit -am rename &&
+
+	git -c core.allowExternalSymlinks=true merge master &&
+	test_cmp ../target two &&
+
+	git reset --hard HEAD^ &&
+	test_must_fail git -c core.allowExternalSymlinks=false merge master
+'
+
+test_done
Junio C Hamano Oct. 27, 2020, 10 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> diff --git a/environment.c b/environment.c
> index bb518c61cd..7c233e0e0e 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -73,6 +73,7 @@ int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;
>  enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
> +int allow_external_symlinks = 1;

OK, so by default it is not blocked...

> +static int symlink_leaves_repo(const char *target, const char *linkpath)
> +{
> +	/*
> +	 * Absolute paths are always considered to leave the repository (even
> +	 * if they happen to point to the working tree path).
> +	 */
> +	if (is_absolute_path(target))
> +		return 1;

Very sensible.

> +	/*
> +	 * Allow relative paths that start with a sequence of "../",
> +	 * as long as they do not break out of the symlink's root.
> +	 * This loop will detect break-out cases and return; otherwise, at the
> +	 * end of the loop "target" will point to the first non-".." component.
> +	 *
> +	 * We count the depth of linkpath by eating up directory components left
> +	 * to right. Technically the symlink would resolve right-to-left, but
> +	 * we don't care about the actual values, only the number.
> +	 */
> +	while (target[0] == '.') {
> +		if (!target[1]) {
> +			/* trailing "." -- ignore */
> +			target++;
> +		} else if (is_dir_sep(target[1])) {
> +			/* "./" -- ignore */
> +			target += 2;
> +		} else if (target[1] == '.' &&
> +			   (!target[2] || is_dir_sep(target[2]))) {
> +			/* ".." or "../" -- drop one from linkpath depth */
> +			while (!is_dir_sep(*linkpath)) {
> +				/* end-of-string; target exceeded our depth */
> +				if (!*linkpath)
> +					return 1;
> +				linkpath++;
> +			}
> +			/* skip final "/" */
> +			linkpath++;
> +
> +			/* skip past ".." */
> +			target += 2;
> +			/* and "/" if present */
> +			if (is_dir_sep(*target))
> +				target++;
> +		}
> +	}
> +
> +	/*
> +	 * Now we have a path in "target" that only go down into the tree.
> +	 * Disallow any interior "../", like "foo/../bar". These might be
> +	 * OK, but we cannot know unless we know whether "foo" is itself a
> +	 * symlink. So err on the side of caution.
> +	 */
> +	while (*target) {
> +		const char *v;
> +		if (skip_prefix(target, "..", &v) && (!*v || is_dir_sep(*v)))
> +			return 1;
> +		target++;
> +	}
> +
> +	return 0;
> +}
> +
> +int safe_symlink(const char *target, const char *linkpath)
> +{
> +	if (!allow_external_symlinks &&
> +	    symlink_leaves_repo(target, linkpath)) {
> +		errno = EPERM;
> +		return -1;
> +	}
> +
> +	return symlink(target, linkpath);
> +}

OK.  This is only about blocking creation of new symbolic links that
goes outside the working tree.  It obviously is a good thing to do.

We have some "symlink safety" in various parts of the system [*1*],
and I wonder if we can somehow consolidate the support to a more
central place.

Thanks.


[Footnote]

*1* For example, apply tries to be careful not to take the "path"
    recorded in the incoming patch blindly, and instead checks if
    any path component in it is a symbolic link before touching.
    Similarly, callers of has_symlink_leading_path() all try to be
    careful when the "path" they want to use to access a filesystem
    entity has a symbolic link in the middle on the filesystem.
Jonathan Nieder Oct. 27, 2020, 11:43 p.m. UTC | #4
Jeff King wrote:
> On Mon, Oct 26, 2020 at 08:35:18PM -0700, Jonathan Nieder wrote:

>> Observations:
>>
>> - since some widely used repositories have .gitignore symlinks, I
>>   think we can't forbid it in fsck, alas
>
> I am a little puzzled here. You said you had the fsck checks for the
> last year, so why did they just come up now? I guess nobody sets
> transfer.fsckObjects, and because you were testing only with clients,
> your server implementation didn't reject pushes?
>
> I agree it's annoying for them if they fail fsck, but it's not entirely
> a show-stopper. There are config options for fine-tuning what you're
> willing to enforce or ignore. But they of course are also annoying to
> use, because every receiver of a transfer needs to set them (on GitHub,
> for example, you have to email Support).

Yes.  And to reiterate the point a little: the reason nobody sets
transfer.fsckObjects is that we haven't made it easy to distinguish
between "hard error, should never be overridden" checks (like
BAD_PARENT_SHA1), "new tools shouldn't write these but they exist in
important repos like perl.git and anything consuming Git repositories
needs to cope with them" (like MISSING_SPACE_BEFORE_DATE from some
commits' concatenated authors), and so on.

> So I won't be too devastated to remove the symlink checks, or possibly
> downgrade them to purely warnings (or "info"; the naming in fsck.c is
> confusing, because the transfer operations take even warnings as fatal.
> I suspect we could do with some cleanup there).

Downgrading the .gitignore check to warning sounds okay.  .gitmodules
would still want to be an error, of course.

>> - it would be useful to be able to check whether these symlinks would
>>   not escape the worktree, for a more targeted check.  It might be
>>   nice to even respect these settings when they would not escape the
>>   worktree, but not necessarily
>
> I actually wrote a patch several years ago for checking symlinks (not
> just these ones, but _any_ symlinks in the repo, but of course it would
> be easy to limit it more). It's included at the end of this mail. It's
> been part of my daily build for many years, so I'm confident it doesn't
> crash or have other bad behavior. But it's possible the logic for what
> it catches is faulty.
[...]
> Here's the patch.

Nice!  I'll try to find time to experiment with this.

[...]
>> - we could use a clearer error message than "invalid path".
>
> That part is tricky. The "invalid path" error comes from the caller of
> verify_path(), and we have no way to pass back an intelligent error
> there. We can call error() ourselves, of course. That adds an extra line
> of output, but it's rare enough for verify_path() to fail that it's
> likely OK. However, I would worry that some callers might be surprised
> by it producing output at all.
>
> An alternative is letting the caller pass in a strbuf that we fill out
> with an extra error string.

I think passing in a struct to govern error handling sounds nice.
Alternatively, this might suggest that verify_path is not the right
place for the user-facing check: it's useful because it's exhaustive,
but it may be that we can also catch the same issue earlier and
produce a nicer diagnostic.

>> There's some room for improvement in "git checkout"'s error handling,
>> too --- I think my ideal would be if the operation would fail
>> entirely, with an advice message describing a checkout command that
>> would succeed (But how do I checkout another commit while excluding
>> some files? Should it suggest a sparse checkout?).
>
> I suspect it's too late for "fail entirely". We may have already written
> to the filesystem, and rolling back is difficult and error-prone. In
> general I'd expect to checkout what we can, produce errors for the rest,
> and let the user work from there with "git status".
>
> But I may be wrong. The problem is loading the value into the index, not
> writing it to the filesystem. So perhaps the relevant code paths load
> the index fully before writing out anything to the filesystem, and it's
> easy to rollback. But I imagine they are using unpack-trees' flag to
> update the filesystem, and I assume that checks out as it loads entries
> (but I didn't confirm).

Right, this would require taking two passes.  I think we do already
perform two passes (first deletions, then additions); I'll have to
look a little more closely to see whether this is straightforward to
do.

I do still like the idea of not respecting .gitignore and
.gitattributes symlinks, by the way.  What the experiences upthread
tell me is just that users need better facilities for repairing
existing repositories.

Thanks,
Jonathan
Jeff King Oct. 28, 2020, 9:41 a.m. UTC | #5
On Tue, Oct 27, 2020 at 03:00:36PM -0700, Junio C Hamano wrote:

> > +	/*
> > +	 * Now we have a path in "target" that only go down into the tree.
> > +	 * Disallow any interior "../", like "foo/../bar". These might be
> > +	 * OK, but we cannot know unless we know whether "foo" is itself a
> > +	 * symlink. So err on the side of caution.
> > +	 */
> > +	while (*target) {
> > +		const char *v;
> > +		if (skip_prefix(target, "..", &v) && (!*v || is_dir_sep(*v)))
> > +			return 1;
> > +		target++;
> > +	}

Just reading over the patch again, I suspect this is overly eager to
find ".." that is not bordered by a directory separator on the left-hand
side (e.g., I think it will match "foo../").

> > +int safe_symlink(const char *target, const char *linkpath)
> > +{
> > +	if (!allow_external_symlinks &&
> > +	    symlink_leaves_repo(target, linkpath)) {
> > +		errno = EPERM;
> > +		return -1;
> > +	}
> > +
> > +	return symlink(target, linkpath);
> > +}
> 
> OK.  This is only about blocking creation of new symbolic links that
> goes outside the working tree.  It obviously is a good thing to do.
> 
> We have some "symlink safety" in various parts of the system [*1*],
> and I wonder if we can somehow consolidate the support to a more
> central place.

Note that it is overly conservative, as described in the comment quoted
at the top of this email. It's possible that other code might be able to
be more accurate because it can see "/foo/" in the middle of a symlink
target and actually _look_ at "foo".  Does it exist, is it a symlink
itself, etc.

Whereas here we're taking a purely textual look at the target. We have
to, I think, because we don't know if or when that "foo" will get
updated. And maybe that same restriction applies to other parts of the
system, or maybe not.

At any rate, I don't really have plans to push this particular topic
forward, at least not in the near term. I was mostly sharing it in case
anybody found it useful. If somebody wants to run with it, please be my
guest.

-Peff
Junio C Hamano Oct. 28, 2020, 7:18 p.m. UTC | #6
Jonathan Nieder <jrnieder@gmail.com> writes:

> Yes.  And to reiterate the point a little: the reason nobody sets
> transfer.fsckObjects is that we haven't made it easy to distinguish
> between "hard error, should never be overridden" checks (like
> BAD_PARENT_SHA1), "new tools shouldn't write these but they exist in
> important repos like perl.git and anything consuming Git repositories
> needs to cope with them" (like MISSING_SPACE_BEFORE_DATE from some
> commits' concatenated authors), and so on.

Hmph, don't we "distinguish" them by setting appropriate default
levels, though?  Perhaps some classes of errors are set too strict?

>> So I won't be too devastated to remove the symlink checks, or possibly
>> downgrade them to purely warnings (or "info"; the naming in fsck.c is
>> confusing, because the transfer operations take even warnings as fatal.
>> I suspect we could do with some cleanup there).
>
> Downgrading the .gitignore check to warning sounds okay.  .gitmodules
> would still want to be an error, of course.

.gitattributes (and any other .git<thing> we may have in the
future), too.

Thanks.
diff mbox series

Patch

diff --git a/read-cache.c b/read-cache.c
index ecf6f68994..63aec6c35d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -947,7 +947,9 @@  static int verify_dotfile(const char *rest, unsigned mode)
 			return 0;
 		if (S_ISLNK(mode)) {
 			rest += 3;
-			if (skip_iprefix(rest, "modules", &rest) &&
+			if ((skip_iprefix(rest, "modules", &rest) ||
+			     skip_iprefix(rest, "ignore", &rest) ||
+			     skip_iprefix(rest, "attributes", &rest)) &&
 			    (*rest == '\0' || is_dir_sep(*rest)))
 				return 0;
 		}
@@ -980,7 +982,9 @@  int verify_path(const char *path, unsigned mode)
 				if (is_hfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_hfs_dotgitmodules(path))
+					if (is_hfs_dotgitmodules(path) ||
+					    is_hfs_dotgitignore(path) ||
+					    is_hfs_dotgitattributes(path))
 						return 0;
 				}
 			}
@@ -992,7 +996,9 @@  int verify_path(const char *path, unsigned mode)
 				if (is_ntfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_ntfs_dotgitmodules(path))
+					if (is_ntfs_dotgitmodules(path) ||
+					    is_ntfs_dotgitignore(path) ||
+					    is_ntfs_dotgitattributes(path))
 						return 0;
 				}
 			}