diff mbox series

[1/2] fsck: make symlinked .gitignore and .gitattributes a warning

Message ID YCsBRUQkrAm8l2gz@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series [1/2] fsck: make symlinked .gitignore and .gitattributes a warning | expand

Commit Message

Jeff King Feb. 15, 2021, 11:18 p.m. UTC
We recently added fsck checks to complain about symlinked .gitignore and
.gitattributes files, which are no longer allowed to be checked out.
This is partially to inform fsck users of the problem, but also to
protect older clients from receiving them (by blocking push and fetch
via transfer.fsckObjects).

While there are some minor security implications to having these files
be symlinks, this is out-weighed by the inconvenience of blocking
historical commits in some projects that might include them.

Let's loosen the fsck check to a warning. It will continue to be
reported by both git-fsck and transfer.fsckObjects, but will not impact
the exit code or the acceptance of objects. Note that internally in
fsck.c this is called "INFO", but the word "warning" will appear in
user-visible output.

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c                       |  4 ++--
 t/t7450-bad-dotgitx-files.sh | 26 ++++++++++++++++++--------
 2 files changed, 20 insertions(+), 10 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Feb. 16, 2021, 12:38 a.m. UTC | #1
On Tue, Feb 16 2021, Jeff King wrote:

> While there are some minor security implications to having these files
> be symlinks, this is out-weighed by the inconvenience of blocking
> historical commits in some projects that might include them.

Digging up the relevant thread that's the projects noted at
https://lore.kernel.org/git/20201027033518.GH2645313@google.com/ ?

I cloned the openmrn.git repository noted there, and checkout dies with:

    error: invalid path 'applications/clinic_app/targets/linux.x86/.gitignore'
    fatal: Could not reset index file to revision 'HEAD'.

I'm running a recent-ish snapshot of next at d98b1dd5eaa7, so with your
verify_path() change in current "seen".

So this series changes nothing about the checkout, just the fsck check?

I see there's your
https://lore.kernel.org/git/20201027075853.GH3005508@coredump.intra.peff.net/#t
to improve the !!symlink() codepath in apply.c

Still, it seems like a rather jarring gap in implementation to just warn
about this in fsck for the benefit of e.g. server operations, but then
hard die on the current client.

There seems to be no way around that hard die, and both repos in that
report are ones that are just symlinking .gitignore to a
../somedir/.gitignore deep in their own tree.

So aren't we both making the fsck check too loose and the client too
strict? Would anyone care if this was an error on fsck if we did the "is
outside repo?" check?
Jeff King Feb. 16, 2021, 1:16 a.m. UTC | #2
On Tue, Feb 16, 2021 at 01:38:30AM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Feb 16 2021, Jeff King wrote:
> 
> > While there are some minor security implications to having these files
> > be symlinks, this is out-weighed by the inconvenience of blocking
> > historical commits in some projects that might include them.
> 
> Digging up the relevant thread that's the projects noted at
> https://lore.kernel.org/git/20201027033518.GH2645313@google.com/ ?
> 
> I cloned the openmrn.git repository noted there, and checkout dies with:
> 
>     error: invalid path 'applications/clinic_app/targets/linux.x86/.gitignore'
>     fatal: Could not reset index file to revision 'HEAD'.
> 
> I'm running a recent-ish snapshot of next at d98b1dd5eaa7, so with your
> verify_path() change in current "seen".
> 
> So this series changes nothing about the checkout, just the fsck check?

Right.

> I see there's your
> https://lore.kernel.org/git/20201027075853.GH3005508@coredump.intra.peff.net/#t
> to improve the !!symlink() codepath in apply.c

That's a somewhat orthogonal approach, that tries to look at out-of-tree
symlinks (rather than banning all symlinks for these files).

I think it's worth banning them all, though; they don't actually work
as you'd expect (i.e., whenever we read them in-repo the symlinks do
nothing).

> Still, it seems like a rather jarring gap in implementation to just warn
> about this in fsck for the benefit of e.g. server operations, but then
> hard die on the current client.

It's not for the benefits of servers. It's because the solution for them
is to stop symlinking, which fixes clone/checkout of new commits. But
they'll still have those old trees hanging around in their history. If
everybody rejects them, then it becomes difficult to push/fetch at all.

That said, they'd probably want to checkout those old commits, too. So
we probably do need a config override, even if it's a broad one ("trust
me, this repo is OK, just allow symlinks for these special files").

> There seems to be no way around that hard die, and both repos in that
> report are ones that are just symlinking .gitignore to a
> ../somedir/.gitignore deep in their own tree.
> 
> So aren't we both making the fsck check too loose and the client too
> strict? Would anyone care if this was an error on fsck if we did the "is
> outside repo?" check?

An outside-the-repo check would probably be less invasive, but:

  - it still allows broken setups

  - it's significantly more complex. I know that the implementation I
    showed errs on the side of complaining in at least some cases
    (because it doesn't know if intermediate paths are themselves
    symlinks). But I'd worry there are also cases where it covers too
    little, nullifying the protection.

-Peff
Junio C Hamano Feb. 16, 2021, 1:56 a.m. UTC | #3
Jeff King <peff@peff.net> writes:

> That said, they'd probably want to checkout those old commits, too. So
> we probably do need a config override, even if it's a broad one ("trust
> me, this repo is OK, just allow symlinks for these special files").

Is this about the check that is overly strict for some existing
projects that kept the jk/symlinked-dotgitx-files topic in the
'seen' so far?

On the fsck end, we know we can demote the error level per
repository, but I wonder if we should make checkout/clone honor the
same setting?

I think GITMODULES_SYMLINK has been there for quite some time at
"error" level and we do want to discourage it to be a symbolic link,
so I am not quite sure what the demoting of these two achieves.  Why
aren't we having a similar issue on .gitmodules that is a symbolic
link?

Thanks.
Jeff King Feb. 16, 2021, 12:48 p.m. UTC | #4
On Mon, Feb 15, 2021 at 08:16:00PM -0500, Jeff King wrote:

> That said, they'd probably want to checkout those old commits, too. So
> we probably do need a config override, even if it's a broad one ("trust
> me, this repo is OK, just allow symlinks for these special files").

Another option here is setting core.symlinks to false. That works more
broadly than just the one symlink, though.

It might be possible to apply that same setting (perhaps automatically,
even) to just these .gitattributes, etc, metafiles. That may get tricky,
though, as we'd need to do it not just on checkout, but any time we're
considering the file (because we wouldn't want "git add" to re-add it as
a non-symlink, nor git-diff to report it, etc).

> > So aren't we both making the fsck check too loose and the client too
> > strict? Would anyone care if this was an error on fsck if we did the "is
> > outside repo?" check?
> 
> An outside-the-repo check would probably be less invasive, but:
> 
>   - it still allows broken setups
> 
>   - it's significantly more complex. I know that the implementation I
>     showed errs on the side of complaining in at least some cases
>     (because it doesn't know if intermediate paths are themselves
>     symlinks). But I'd worry there are also cases where it covers too
>     little, nullifying the protection.

Adding to the "complexity" point: it's also impossible to implement via
fsck, where we do not have the full path of the tree entry. We could
live without the fsck support if need be, though.

I am beginning to wonder if just opening them all with O_NOFOLLOW (and a
hacky 2-syscall fallback for portability) might be less ugly than all of
this.

-Peff
Jeff King Feb. 16, 2021, 12:54 p.m. UTC | #5
On Mon, Feb 15, 2021 at 05:56:50PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > That said, they'd probably want to checkout those old commits, too. So
> > we probably do need a config override, even if it's a broad one ("trust
> > me, this repo is OK, just allow symlinks for these special files").
> 
> Is this about the check that is overly strict for some existing
> projects that kept the jk/symlinked-dotgitx-files topic in the
> 'seen' so far?

Yes.

> On the fsck end, we know we can demote the error level per
> repository, but I wonder if we should make checkout/clone honor the
> same setting?

What would the default be? If it's permissive, then it feels like we are
not really solving much, as anybody who wanted to be careful can already
inspect the tree contents. This is about avoiding surprises in the
default config.

If it's to forbid by default, then yes, I think the "trust me this repo
is OK" I gave above would be a viable path forward.

> I think GITMODULES_SYMLINK has been there for quite some time at
> "error" level and we do want to discourage it to be a symbolic link,
> so I am not quite sure what the demoting of these two achieves.  Why
> aren't we having a similar issue on .gitmodules that is a symbolic
> link?

I think it's just less common to have symlinked .gitmodules. To be
clear, I think symlinked .gitignore is also pretty uncommon. Back when
we discussed this originally in 2018 I scanned most of GitHub and came
up with only a handful of repositories that did so.

-Peff
diff mbox series

Patch

diff --git a/fsck.c b/fsck.c
index d0a201348d..c75c7d7dc7 100644
--- a/fsck.c
+++ b/fsck.c
@@ -67,8 +67,6 @@  static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(GITMODULES_URL, ERROR) \
 	FUNC(GITMODULES_PATH, ERROR) \
 	FUNC(GITMODULES_UPDATE, ERROR) \
-	FUNC(GITIGNORE_SYMLINK, ERROR) \
-	FUNC(GITATTRIBUTES_SYMLINK, ERROR) \
 	/* warnings */ \
 	FUNC(BAD_FILEMODE, WARN) \
 	FUNC(EMPTY_NAME, WARN) \
@@ -81,6 +79,8 @@  static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(NUL_IN_COMMIT, WARN) \
 	/* infos (reported as warnings, but ignored by default) */ \
 	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(GITIGNORE_SYMLINK, INFO) \
+	FUNC(GITATTRIBUTES_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	/* ignored (elevated when requested) */ \
diff --git a/t/t7450-bad-dotgitx-files.sh b/t/t7450-bad-dotgitx-files.sh
index 326b34e167..4b1edb150e 100755
--- a/t/t7450-bad-dotgitx-files.sh
+++ b/t/t7450-bad-dotgitx-files.sh
@@ -140,6 +140,16 @@  test_expect_success 'index-pack --strict works for non-repo pack' '
 '
 
 check_forbidden_symlink () {
+	fsck_must_fail=test_must_fail
+	fsck_prefix=error
+	case "$1" in
+	--fsck-warning)
+		fsck_must_fail=
+		fsck_prefix=warning
+		shift
+		;;
+	esac
+
 	name=$1
 	type=$2
 	path=$3
@@ -172,8 +182,8 @@  check_forbidden_symlink () {
 
 			# Check not only that we fail, but that it is due to the
 			# symlink detector
-			test_must_fail git fsck 2>output &&
-			test_i18ngrep "tree $tree: ${name}Symlink" output
+			$fsck_must_fail git fsck 2>output &&
+			test_i18ngrep "$fsck_prefix.*tree $tree: ${name}Symlink" output
 		)
 	'
 
@@ -193,13 +203,13 @@  check_forbidden_symlink gitmodules vanilla .gitmodules
 check_forbidden_symlink gitmodules ntfs ".gitmodules ."
 check_forbidden_symlink gitmodules hfs ".${u200c}gitmodules"
 
-check_forbidden_symlink gitattributes vanilla .gitattributes
-check_forbidden_symlink gitattributes ntfs ".gitattributes ."
-check_forbidden_symlink gitattributes hfs ".${u200c}gitattributes"
+check_forbidden_symlink --fsck-warning gitattributes vanilla .gitattributes
+check_forbidden_symlink --fsck-warning gitattributes ntfs ".gitattributes ."
+check_forbidden_symlink --fsck-warning gitattributes hfs ".${u200c}gitattributes"
 
-check_forbidden_symlink gitignore vanilla .gitignore
-check_forbidden_symlink gitignore ntfs ".gitignore ."
-check_forbidden_symlink gitignore hfs ".${u200c}gitignore"
+check_forbidden_symlink --fsck-warning gitignore vanilla .gitignore
+check_forbidden_symlink --fsck-warning gitignore ntfs ".gitignore ."
+check_forbidden_symlink --fsck-warning gitignore hfs ".${u200c}gitignore"
 
 test_expect_success 'fsck detects non-blob .gitmodules' '
 	git init non-blob &&