diff mbox series

[PATCH/RFC] fsck: complain when .gitignore and .gitattributes are symlinks

Message ID 20190114230902.GG162110@google.com (mailing list archive)
State New, archived
Headers show
Series [PATCH/RFC] fsck: complain when .gitignore and .gitattributes are symlinks | expand

Commit Message

Jonathan Nieder Jan. 14, 2019, 11:09 p.m. UTC
From: Jeff King <peff@peff.net>
Date: Sun, 13 May 2018 14:14:34 -0400

This case is already forbidden by verify_path(), so let's
check it in fsck. It's easier to handle than .gitmodules,
because we don't care about checking the blob content. This
is really just about whether the name and mode for the tree
entry are valid.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi,

This patch is from the 2.20.0 era, from the same series as

 fsck: detect submodule urls starting with dash

It was omitted from that series because it does not address any known
exploit, but to me it seems worthwhile anyway:

- if a client enables transfer.fsckObjects, this helps them protect
  themselves against weird input that does *not* have a known exploit
  attached, to

- it generally feels more simple and robust.  Git-related tools can
  benefit from this kind of check as an indication of input they can
  bail out on instead of trying to support.

Peff checked it against repos in the wild and found this to be very
rare but existent (e.g. https://github.com/acquia/blt has a
.gitattributes symlink).  Linus suggested that we may want it to be
INFO instead of ERROR, so that people can at least notice that their
.gitattributes symlink is likely to have no effect.  This patch still
uses ERROR because I suspect that this is rare enough in the wild that
people will be able to cope.

Thoughts?

Thanks,
Jonathan

 fsck.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Jeff King Jan. 17, 2019, 5 p.m. UTC | #1
On Mon, Jan 14, 2019 at 03:09:02PM -0800, Jonathan Nieder wrote:

> From: Jeff King <peff@peff.net>
> Date: Sun, 13 May 2018 14:14:34 -0400
> 
> This case is already forbidden by verify_path(), so let's
> check it in fsck. It's easier to handle than .gitmodules,
> because we don't care about checking the blob content. This
> is really just about whether the name and mode for the tree
> entry are valid.

Hmm. I think this commit message isn't quite right, because we also
skipped the patches to touch gitignore/gitattributes in verify_path().

Are you thinking we should resurrect that behavior[1], too, or just
protect at the fsck level?

> It was omitted from that series because it does not address any known
> exploit, but to me it seems worthwhile anyway:
> 
> - if a client enables transfer.fsckObjects, this helps them protect
>   themselves against weird input that does *not* have a known exploit
>   attached, to
> 
> - it generally feels more simple and robust.  Git-related tools can
>   benefit from this kind of check as an indication of input they can
>   bail out on instead of trying to support.

I think I may just be restating your two points above, but what I'd
argue is:

  - even though there's no known-interesting exploit, this can cause Git
    to unexpectedly read arbitrary files outside of the repository
    directory. That in itself isn't necessarily evil, but it's weird.

  - there are potentially non-malicious bugs here, where we try to read
    .gitattributes out of the index, but obviously don't follow symlinks
    there

-Peff

[1] This wasn't a separate patch, but just an early iteration of the
    "ban symlinks in .gitmodules" patch. I think the incremental is
    just:

diff --git a/read-cache.c b/read-cache.c
index bfff271a3d..121c0bec69 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -937,7 +937,9 @@ static int verify_dotfile(const char *rest, unsigned mode)
 			return 0;
 		if (S_ISLNK(mode)) {
 			rest += 3;
-			if (skip_iprefix(rest, "modules", &rest) &&
+			if ((skip_iprefix(rest, "modules", &rest) ||
+			     skip_iprefix(rest, "ignore", &rest) ||
+			     skip_iprefix(rest, "attributes", &rest)) &&
 			    (*rest == '\0' || is_dir_sep(*rest)))
 				return 0;
 		}
@@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode)
 				if (is_hfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_hfs_dotgitmodules(path))
+					if (is_hfs_dotgitmodules(path) ||
+					    is_hfs_dotgitignore(path) ||
+					    is_hfs_dotgitattributes(path))
 						return 0;
 				}
 			}
@@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode)
 				if (is_ntfs_dotgit(path))
 					return 0;
 				if (S_ISLNK(mode)) {
-					if (is_ntfs_dotgitmodules(path))
+					if (is_ntfs_dotgitmodules(path) ||
+					    is_ntfs_dotgitignore(path) ||
+					    is_ntfs_dotgitattributes(path))
 						return 0;
 				}
 			}
Junio C Hamano Jan. 17, 2019, 8:13 p.m. UTC | #2
Jeff King <peff@peff.net> writes:

> Hmm. I think this commit message isn't quite right, because we also
> skipped the patches to touch gitignore/gitattributes in verify_path().
>
> Are you thinking we should resurrect that behavior[1], too, or just
> protect at the fsck level?
>
>> It was omitted from that series because it does not address any known
>> exploit, but to me it seems worthwhile anyway:
>> 
>> - if a client enables transfer.fsckObjects, this helps them protect
>>   themselves against weird input that does *not* have a known exploit
>>   attached, to
>> 
>> - it generally feels more simple and robust.  Git-related tools can
>>   benefit from this kind of check as an indication of input they can
>>   bail out on instead of trying to support.
>
> I think I may just be restating your two points above, but what I'd
> argue is:
>
>   - even though there's no known-interesting exploit, this can cause Git
>     to unexpectedly read arbitrary files outside of the repository
>     directory. That in itself isn't necessarily evil, but it's weird.
>
>   - there are potentially non-malicious bugs here, where we try to read
>     .gitattributes out of the index, but obviously don't follow symlinks
>     there

FWIW, you two can count me as the third person who agrees with the
above points.

> [1] This wasn't a separate patch, but just an early iteration of the
>     "ban symlinks in .gitmodules" patch. I think the incremental is
>     just:
>
> diff --git a/read-cache.c b/read-cache.c
> index bfff271a3d..121c0bec69 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -937,7 +937,9 @@ static int verify_dotfile(const char *rest, unsigned mode)
>  			return 0;
>  		if (S_ISLNK(mode)) {
>  			rest += 3;
> -			if (skip_iprefix(rest, "modules", &rest) &&
> +			if ((skip_iprefix(rest, "modules", &rest) ||
> +			     skip_iprefix(rest, "ignore", &rest) ||
> +			     skip_iprefix(rest, "attributes", &rest)) &&
>  			    (*rest == '\0' || is_dir_sep(*rest)))
>  				return 0;
>  		}

OK.

> @@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode)
>  				if (is_hfs_dotgit(path))
>  					return 0;
>  				if (S_ISLNK(mode)) {
> -					if (is_hfs_dotgitmodules(path))
> +					if (is_hfs_dotgitmodules(path) ||
> +					    is_hfs_dotgitignore(path) ||
> +					    is_hfs_dotgitattributes(path))
>  						return 0;
>  				}
>  			}
> @@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode)
>  				if (is_ntfs_dotgit(path))
>  					return 0;
>  				if (S_ISLNK(mode)) {
> -					if (is_ntfs_dotgitmodules(path))
> +					if (is_ntfs_dotgitmodules(path) ||
> +					    is_ntfs_dotgitignore(path) ||
> +					    is_ntfs_dotgitattributes(path))
>  						return 0;

Curious that we already have these helpers, nobody seems to call
them in the current codebase, and we haven't seen the "these are
unused" linter message on the list for a while ;-).
Jeff King Jan. 17, 2019, 9:24 p.m. UTC | #3
On Thu, Jan 17, 2019 at 12:13:12PM -0800, Junio C Hamano wrote:

> > @@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode)
> >  				if (is_hfs_dotgit(path))
> >  					return 0;
> >  				if (S_ISLNK(mode)) {
> > -					if (is_hfs_dotgitmodules(path))
> > +					if (is_hfs_dotgitmodules(path) ||
> > +					    is_hfs_dotgitignore(path) ||
> > +					    is_hfs_dotgitattributes(path))
> >  						return 0;
> >  				}
> >  			}
> > @@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode)
> >  				if (is_ntfs_dotgit(path))
> >  					return 0;
> >  				if (S_ISLNK(mode)) {
> > -					if (is_ntfs_dotgitmodules(path))
> > +					if (is_ntfs_dotgitmodules(path) ||
> > +					    is_ntfs_dotgitignore(path) ||
> > +					    is_ntfs_dotgitattributes(path))
> >  						return 0;
> 
> Curious that we already have these helpers, nobody seems to call
> them in the current codebase, and we haven't seen the "these are
> unused" linter message on the list for a while ;-).

Heh. Yeah, I was surprised by that, too. They were added by e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11). The original
version of my series had the hunks quoted above, and then we backed off
on handling them as part of the emergency fix, but I never re-rolled the
preparatory patch to get rid of them.

I think they got overlooked because they're not file-local statics, and
it's much harder to say "this is never called by any function in another
translation unit". You probably have to do analysis on the complete
binaries using "nm" or similar. I think maybe Ramsay does that from time
to time, but I don't offhand know the correct incantation.

Anyway, it sounds like you like the overall direction. Does that include
these verify_path() bits, as well as the fsck part?

-Peff
Ramsay Jones Jan. 18, 2019, 1:41 a.m. UTC | #4
On 17/01/2019 21:24, Jeff King wrote:
> On Thu, Jan 17, 2019 at 12:13:12PM -0800, Junio C Hamano wrote:
> 
>>> @@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode)
>>>  				if (is_hfs_dotgit(path))
>>>  					return 0;
>>>  				if (S_ISLNK(mode)) {
>>> -					if (is_hfs_dotgitmodules(path))
>>> +					if (is_hfs_dotgitmodules(path) ||
>>> +					    is_hfs_dotgitignore(path) ||
>>> +					    is_hfs_dotgitattributes(path))
>>>  						return 0;
>>>  				}
>>>  			}
>>> @@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode)
>>>  				if (is_ntfs_dotgit(path))
>>>  					return 0;
>>>  				if (S_ISLNK(mode)) {
>>> -					if (is_ntfs_dotgitmodules(path))
>>> +					if (is_ntfs_dotgitmodules(path) ||
>>> +					    is_ntfs_dotgitignore(path) ||
>>> +					    is_ntfs_dotgitattributes(path))
>>>  						return 0;
>>
>> Curious that we already have these helpers, nobody seems to call
>> them in the current codebase, and we haven't seen the "these are
>> unused" linter message on the list for a while ;-).
> 
> Heh. Yeah, I was surprised by that, too. They were added by e7cb0b4455
> (is_ntfs_dotgit: match other .git files, 2018-05-11). The original
> version of my series had the hunks quoted above, and then we backed off
> on handling them as part of the emergency fix, but I never re-rolled the
> preparatory patch to get rid of them.
> 
> I think they got overlooked because they're not file-local statics, and
> it's much harder to say "this is never called by any function in another
> translation unit". You probably have to do analysis on the complete
> binaries using "nm" or similar. I think maybe Ramsay does that from time
> to time, but I don't offhand know the correct incantation.

I don't do this "from time to time", but *every* build on all
platforms! :-D

As I have mentioned before, I run the script on 'master', 'next'
and 'pu', but I don't look at the results for 'master', I simply
look at the diffs master->next and next->pu.

I put the output of 'static-check.pl' in the sc, nsc and psc files
(guess which files are for which branches!). For example, tonight
I find:

    $ wc -l sc nsc psc
      90 sc
      90 nsc
     100 psc
     280 total
    $ diff sc nsc
    $ diff nsc psc
    29a30,32
    > config.o	- repo_config_set
    > config.o	- repo_config_set_gently
    > config.o	- repo_config_set_worktree_gently
    32a36
    > fuzz-commit-graph.o	- LLVMFuzzerTestOneInput
    37a42,43
    > hex.o	- hash_to_hex
    > hex.o	- hash_to_hex_algop_r
    74a81,83
    > sha1-file.o	- hash_algo_by_id
    > sha1-file.o	- hash_algo_by_name
    > sha1-file.o	- repo_has_sha1_file_with_flags
    80a90
    > strbuf.o	- strbuf_vinsertf
    $ 

BTW, if my memory serves (and it may not), the symbols you
refer to came directly into 'master' (via 'maint') as a
result of security updates - so I would never have seen
them in 'pu' or 'next'. They are, indeed, currently noted
in the 'master' branch:

    $ grep is_ntfs_ sc
    path.o	- is_ntfs_dotgitattributes
    path.o	- is_ntfs_dotgitignore
    $ grep is_hfs_ sc
    utf8.o	- is_hfs_dotgitattributes
    utf8.o	- is_hfs_dotgitignore
    $ 
    
ATB,
Ramsay Jones
Jeff King Jan. 22, 2019, 7:23 a.m. UTC | #5
On Fri, Jan 18, 2019 at 01:41:08AM +0000, Ramsay Jones wrote:

> I don't do this "from time to time", but *every* build on all
> platforms! :-D
> 
> As I have mentioned before, I run the script on 'master', 'next'
> and 'pu', but I don't look at the results for 'master', I simply
> look at the diffs master->next and next->pu.

Ah, ok, that explains it, then. As you noted, these made it straight to
master because of the security embargo.

Thanks for satisfying my curiosity (and for running your script!).

I do wonder if you might be better off comparing master@{1} to master to
see if anything new appears (since I assume the whole point is ignoring
historical false positives, and just looking at patches under active
development).

-Peff
Ramsay Jones Jan. 22, 2019, 6:19 p.m. UTC | #6
On 22/01/2019 07:23, Jeff King wrote:
> On Fri, Jan 18, 2019 at 01:41:08AM +0000, Ramsay Jones wrote:
> 
>> I don't do this "from time to time", but *every* build on all
>> platforms! :-D
>>
>> As I have mentioned before, I run the script on 'master', 'next'
>> and 'pu', but I don't look at the results for 'master', I simply
>> look at the diffs master->next and next->pu.
> 
> Ah, ok, that explains it, then. As you noted, these made it straight to
> master because of the security embargo.
> 
> Thanks for satisfying my curiosity (and for running your script!).
> 
> I do wonder if you might be better off comparing master@{1} to master to
> see if anything new appears (since I assume the whole point is ignoring
> historical false positives, and just looking at patches under active
> development).

Hmm, well it's not so much 'historical false positives' as 'oh dear,
they managed to get through' (along with a promise to myself to get
around to tidying up the symbols in master - yet again!). ;-)

I try to make people aware of the issues, when they appear in 'pu',
so that we have a chance not to make things worse. However, it is
never as simple as 'this symbol is not used/local to this file,
please fix' (despite what it looks like, I don't like to annoy
contributors with those emails :-D ). Many recent large changes
have been split into several series with earlier series introducing
symbols which 'will be used later'. Sometimes later never comes. ;-)

Recently, Brian's 'bc/sha-256' branch merged into 'next', so now:

  $ diff sc nsc
  37a38,39
  > hex.o	- hash_to_hex
  > hex.o	- hash_to_hex_algop_r
  74a77,78
  > sha1-file.o	- hash_algo_by_id
  > sha1-file.o	- hash_algo_by_name
  $ 

Brian has already indicated [1] that future patches will add uses
for these symbols.

[1] https://public-inbox.org/git/20181114021118.GN890086@genre.crustytoothpaste.net/

[Just to be clear, my script only notes symbols that are not
referenced outside of the object file which contains its
definition - so that includes file-local and unused symbols].

There are currently 90 symbols in the 'sc' file, some of which
should be added to the outdated 'skip list'. Just FYI, the file
which has the most hits is:

  $ cut -f1 sc | sort | uniq -c | sort -rn
       26 config.o
        6 sha1dc/sha1.o
        6 refs.o
        6 json-writer.o
        3 utf8.o
        3 sha1-file.o
        3 revision.o
        3 refs/ref-cache.o
        2 vcs-svn/fast_export.o
        2 refs/packed-backend.o
        2 path.o
        2 parse-options.o
        2 graph.o
        2 attr.o
        1 worktree.o
        1 trace.o
        1 tmp-objdir.o
        1 tempfile.o
        1 strbuf.o
        1 serve.o
        1 sequencer.o
        1 refspec.o
        1 refs/iterator.o
        1 read-cache.o
        1 pkt-line.o
        1 oidmap.o
        1 line-log.o
        1 ident.o
        1 hex.o
        1 gettext.o
        1 fuzz-pack-idx.o
        1 fuzz-pack-headers.o
        1 editor.o
        1 credential.o
        1 convert.o
        1 builtin/pack-objects.o
  $ 

... and the symbols in that file:

  $ grep config.o sc
  config.o	- git_config_copy_section_in_file
  config.o	- git_config_from_file_with_options
  config.o	- git_config_from_parameters
  config.o	- git_config_get_bool_or_int
  config.o	- git_config_get_maybe_bool
  config.o	- git_config_get_pathname
  config.o	- git_config_include
  config.o	- git_config_key_is_valid
  config.o	- git_configset_get_bool
  config.o	- git_configset_get_bool_or_int
  config.o	- git_configset_get_int
  config.o	- git_configset_get_maybe_bool
  config.o	- git_configset_get_pathname
  config.o	- git_configset_get_string
  config.o	- git_configset_get_string_const
  config.o	- git_configset_get_ulong
  config.o	- git_config_set_multivar_in_file
  config.o	- git_config_system
  config.o	- git_die_config_linenr
  config.o	- repo_config
  config.o	- repo_config_get_bool_or_int
  config.o	- repo_config_get_int
  config.o	- repo_config_get_maybe_bool
  config.o	- repo_config_get_pathname
  config.o	- repo_config_get_ulong
  config.o	- repo_config_get_value
  $ 
  

ATB,
Ramsay Jones
diff mbox series

Patch

diff --git a/fsck.c b/fsck.c
index 68502ce85b..850363fc8e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -68,6 +68,8 @@  static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(GITMODULES_SYMLINK, ERROR) \
 	FUNC(GITMODULES_URL, ERROR) \
 	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITIGNORE_SYMLINK, ERROR) \
+	FUNC(GITATTRIBUTES_SYMLINK, ERROR) \
 	/* warnings */ \
 	FUNC(BAD_FILEMODE, WARN) \
 	FUNC(EMPTY_NAME, WARN) \
@@ -627,6 +629,19 @@  static int fsck_tree(struct tree *item, struct fsck_options *options)
 						 ".gitmodules is a symbolic link");
 		}
 
+		if (S_ISLNK(mode)) {
+			if (is_hfs_dotgitignore(name) ||
+			    is_ntfs_dotgitignore(name))
+				retval += report(options, &item->object,
+						 FSCK_MSG_GITIGNORE_SYMLINK,
+						 ".gitignore is a symlink");
+			if (is_hfs_dotgitattributes(name) ||
+			    is_ntfs_dotgitattributes(name))
+				retval += report(options, &item->object,
+						 FSCK_MSG_GITATTRIBUTES_SYMLINK,
+						 ".gitattributes is a symlink");
+		}
+
 		if (update_tree_entry_gently(&desc)) {
 			retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;