Message ID | 20190114230902.GG162110@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [PATCH/RFC] fsck: complain when .gitignore and .gitattributes are symlinks | expand |
On Mon, Jan 14, 2019 at 03:09:02PM -0800, Jonathan Nieder wrote: > From: Jeff King <peff@peff.net> > Date: Sun, 13 May 2018 14:14:34 -0400 > > This case is already forbidden by verify_path(), so let's > check it in fsck. It's easier to handle than .gitmodules, > because we don't care about checking the blob content. This > is really just about whether the name and mode for the tree > entry are valid. Hmm. I think this commit message isn't quite right, because we also skipped the patches to touch gitignore/gitattributes in verify_path(). Are you thinking we should resurrect that behavior[1], too, or just protect at the fsck level? > It was omitted from that series because it does not address any known > exploit, but to me it seems worthwhile anyway: > > - if a client enables transfer.fsckObjects, this helps them protect > themselves against weird input that does *not* have a known exploit > attached, to > > - it generally feels more simple and robust. Git-related tools can > benefit from this kind of check as an indication of input they can > bail out on instead of trying to support. I think I may just be restating your two points above, but what I'd argue is: - even though there's no known-interesting exploit, this can cause Git to unexpectedly read arbitrary files outside of the repository directory. That in itself isn't necessarily evil, but it's weird. - there are potentially non-malicious bugs here, where we try to read .gitattributes out of the index, but obviously don't follow symlinks there -Peff [1] This wasn't a separate patch, but just an early iteration of the "ban symlinks in .gitmodules" patch. I think the incremental is just: diff --git a/read-cache.c b/read-cache.c index bfff271a3d..121c0bec69 100644 --- a/read-cache.c +++ b/read-cache.c @@ -937,7 +937,9 @@ static int verify_dotfile(const char *rest, unsigned mode) return 0; if (S_ISLNK(mode)) { rest += 3; - if (skip_iprefix(rest, "modules", &rest) && + if ((skip_iprefix(rest, "modules", &rest) || + skip_iprefix(rest, "ignore", &rest) || + skip_iprefix(rest, "attributes", &rest)) && (*rest == '\0' || is_dir_sep(*rest))) return 0; } @@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode) if (is_hfs_dotgit(path)) return 0; if (S_ISLNK(mode)) { - if (is_hfs_dotgitmodules(path)) + if (is_hfs_dotgitmodules(path) || + is_hfs_dotgitignore(path) || + is_hfs_dotgitattributes(path)) return 0; } } @@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode) if (is_ntfs_dotgit(path)) return 0; if (S_ISLNK(mode)) { - if (is_ntfs_dotgitmodules(path)) + if (is_ntfs_dotgitmodules(path) || + is_ntfs_dotgitignore(path) || + is_ntfs_dotgitattributes(path)) return 0; } }
Jeff King <peff@peff.net> writes: > Hmm. I think this commit message isn't quite right, because we also > skipped the patches to touch gitignore/gitattributes in verify_path(). > > Are you thinking we should resurrect that behavior[1], too, or just > protect at the fsck level? > >> It was omitted from that series because it does not address any known >> exploit, but to me it seems worthwhile anyway: >> >> - if a client enables transfer.fsckObjects, this helps them protect >> themselves against weird input that does *not* have a known exploit >> attached, to >> >> - it generally feels more simple and robust. Git-related tools can >> benefit from this kind of check as an indication of input they can >> bail out on instead of trying to support. > > I think I may just be restating your two points above, but what I'd > argue is: > > - even though there's no known-interesting exploit, this can cause Git > to unexpectedly read arbitrary files outside of the repository > directory. That in itself isn't necessarily evil, but it's weird. > > - there are potentially non-malicious bugs here, where we try to read > .gitattributes out of the index, but obviously don't follow symlinks > there FWIW, you two can count me as the third person who agrees with the above points. > [1] This wasn't a separate patch, but just an early iteration of the > "ban symlinks in .gitmodules" patch. I think the incremental is > just: > > diff --git a/read-cache.c b/read-cache.c > index bfff271a3d..121c0bec69 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -937,7 +937,9 @@ static int verify_dotfile(const char *rest, unsigned mode) > return 0; > if (S_ISLNK(mode)) { > rest += 3; > - if (skip_iprefix(rest, "modules", &rest) && > + if ((skip_iprefix(rest, "modules", &rest) || > + skip_iprefix(rest, "ignore", &rest) || > + skip_iprefix(rest, "attributes", &rest)) && > (*rest == '\0' || is_dir_sep(*rest))) > return 0; > } OK. > @@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode) > if (is_hfs_dotgit(path)) > return 0; > if (S_ISLNK(mode)) { > - if (is_hfs_dotgitmodules(path)) > + if (is_hfs_dotgitmodules(path) || > + is_hfs_dotgitignore(path) || > + is_hfs_dotgitattributes(path)) > return 0; > } > } > @@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode) > if (is_ntfs_dotgit(path)) > return 0; > if (S_ISLNK(mode)) { > - if (is_ntfs_dotgitmodules(path)) > + if (is_ntfs_dotgitmodules(path) || > + is_ntfs_dotgitignore(path) || > + is_ntfs_dotgitattributes(path)) > return 0; Curious that we already have these helpers, nobody seems to call them in the current codebase, and we haven't seen the "these are unused" linter message on the list for a while ;-).
On Thu, Jan 17, 2019 at 12:13:12PM -0800, Junio C Hamano wrote: > > @@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode) > > if (is_hfs_dotgit(path)) > > return 0; > > if (S_ISLNK(mode)) { > > - if (is_hfs_dotgitmodules(path)) > > + if (is_hfs_dotgitmodules(path) || > > + is_hfs_dotgitignore(path) || > > + is_hfs_dotgitattributes(path)) > > return 0; > > } > > } > > @@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode) > > if (is_ntfs_dotgit(path)) > > return 0; > > if (S_ISLNK(mode)) { > > - if (is_ntfs_dotgitmodules(path)) > > + if (is_ntfs_dotgitmodules(path) || > > + is_ntfs_dotgitignore(path) || > > + is_ntfs_dotgitattributes(path)) > > return 0; > > Curious that we already have these helpers, nobody seems to call > them in the current codebase, and we haven't seen the "these are > unused" linter message on the list for a while ;-). Heh. Yeah, I was surprised by that, too. They were added by e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11). The original version of my series had the hunks quoted above, and then we backed off on handling them as part of the emergency fix, but I never re-rolled the preparatory patch to get rid of them. I think they got overlooked because they're not file-local statics, and it's much harder to say "this is never called by any function in another translation unit". You probably have to do analysis on the complete binaries using "nm" or similar. I think maybe Ramsay does that from time to time, but I don't offhand know the correct incantation. Anyway, it sounds like you like the overall direction. Does that include these verify_path() bits, as well as the fsck part? -Peff
On 17/01/2019 21:24, Jeff King wrote: > On Thu, Jan 17, 2019 at 12:13:12PM -0800, Junio C Hamano wrote: > >>> @@ -966,7 +968,9 @@ int verify_path(const char *path, unsigned mode) >>> if (is_hfs_dotgit(path)) >>> return 0; >>> if (S_ISLNK(mode)) { >>> - if (is_hfs_dotgitmodules(path)) >>> + if (is_hfs_dotgitmodules(path) || >>> + is_hfs_dotgitignore(path) || >>> + is_hfs_dotgitattributes(path)) >>> return 0; >>> } >>> } >>> @@ -974,7 +978,9 @@ int verify_path(const char *path, unsigned mode) >>> if (is_ntfs_dotgit(path)) >>> return 0; >>> if (S_ISLNK(mode)) { >>> - if (is_ntfs_dotgitmodules(path)) >>> + if (is_ntfs_dotgitmodules(path) || >>> + is_ntfs_dotgitignore(path) || >>> + is_ntfs_dotgitattributes(path)) >>> return 0; >> >> Curious that we already have these helpers, nobody seems to call >> them in the current codebase, and we haven't seen the "these are >> unused" linter message on the list for a while ;-). > > Heh. Yeah, I was surprised by that, too. They were added by e7cb0b4455 > (is_ntfs_dotgit: match other .git files, 2018-05-11). The original > version of my series had the hunks quoted above, and then we backed off > on handling them as part of the emergency fix, but I never re-rolled the > preparatory patch to get rid of them. > > I think they got overlooked because they're not file-local statics, and > it's much harder to say "this is never called by any function in another > translation unit". You probably have to do analysis on the complete > binaries using "nm" or similar. I think maybe Ramsay does that from time > to time, but I don't offhand know the correct incantation. I don't do this "from time to time", but *every* build on all platforms! :-D As I have mentioned before, I run the script on 'master', 'next' and 'pu', but I don't look at the results for 'master', I simply look at the diffs master->next and next->pu. I put the output of 'static-check.pl' in the sc, nsc and psc files (guess which files are for which branches!). For example, tonight I find: $ wc -l sc nsc psc 90 sc 90 nsc 100 psc 280 total $ diff sc nsc $ diff nsc psc 29a30,32 > config.o - repo_config_set > config.o - repo_config_set_gently > config.o - repo_config_set_worktree_gently 32a36 > fuzz-commit-graph.o - LLVMFuzzerTestOneInput 37a42,43 > hex.o - hash_to_hex > hex.o - hash_to_hex_algop_r 74a81,83 > sha1-file.o - hash_algo_by_id > sha1-file.o - hash_algo_by_name > sha1-file.o - repo_has_sha1_file_with_flags 80a90 > strbuf.o - strbuf_vinsertf $ BTW, if my memory serves (and it may not), the symbols you refer to came directly into 'master' (via 'maint') as a result of security updates - so I would never have seen them in 'pu' or 'next'. They are, indeed, currently noted in the 'master' branch: $ grep is_ntfs_ sc path.o - is_ntfs_dotgitattributes path.o - is_ntfs_dotgitignore $ grep is_hfs_ sc utf8.o - is_hfs_dotgitattributes utf8.o - is_hfs_dotgitignore $ ATB, Ramsay Jones
On Fri, Jan 18, 2019 at 01:41:08AM +0000, Ramsay Jones wrote: > I don't do this "from time to time", but *every* build on all > platforms! :-D > > As I have mentioned before, I run the script on 'master', 'next' > and 'pu', but I don't look at the results for 'master', I simply > look at the diffs master->next and next->pu. Ah, ok, that explains it, then. As you noted, these made it straight to master because of the security embargo. Thanks for satisfying my curiosity (and for running your script!). I do wonder if you might be better off comparing master@{1} to master to see if anything new appears (since I assume the whole point is ignoring historical false positives, and just looking at patches under active development). -Peff
On 22/01/2019 07:23, Jeff King wrote: > On Fri, Jan 18, 2019 at 01:41:08AM +0000, Ramsay Jones wrote: > >> I don't do this "from time to time", but *every* build on all >> platforms! :-D >> >> As I have mentioned before, I run the script on 'master', 'next' >> and 'pu', but I don't look at the results for 'master', I simply >> look at the diffs master->next and next->pu. > > Ah, ok, that explains it, then. As you noted, these made it straight to > master because of the security embargo. > > Thanks for satisfying my curiosity (and for running your script!). > > I do wonder if you might be better off comparing master@{1} to master to > see if anything new appears (since I assume the whole point is ignoring > historical false positives, and just looking at patches under active > development). Hmm, well it's not so much 'historical false positives' as 'oh dear, they managed to get through' (along with a promise to myself to get around to tidying up the symbols in master - yet again!). ;-) I try to make people aware of the issues, when they appear in 'pu', so that we have a chance not to make things worse. However, it is never as simple as 'this symbol is not used/local to this file, please fix' (despite what it looks like, I don't like to annoy contributors with those emails :-D ). Many recent large changes have been split into several series with earlier series introducing symbols which 'will be used later'. Sometimes later never comes. ;-) Recently, Brian's 'bc/sha-256' branch merged into 'next', so now: $ diff sc nsc 37a38,39 > hex.o - hash_to_hex > hex.o - hash_to_hex_algop_r 74a77,78 > sha1-file.o - hash_algo_by_id > sha1-file.o - hash_algo_by_name $ Brian has already indicated [1] that future patches will add uses for these symbols. [1] https://public-inbox.org/git/20181114021118.GN890086@genre.crustytoothpaste.net/ [Just to be clear, my script only notes symbols that are not referenced outside of the object file which contains its definition - so that includes file-local and unused symbols]. There are currently 90 symbols in the 'sc' file, some of which should be added to the outdated 'skip list'. Just FYI, the file which has the most hits is: $ cut -f1 sc | sort | uniq -c | sort -rn 26 config.o 6 sha1dc/sha1.o 6 refs.o 6 json-writer.o 3 utf8.o 3 sha1-file.o 3 revision.o 3 refs/ref-cache.o 2 vcs-svn/fast_export.o 2 refs/packed-backend.o 2 path.o 2 parse-options.o 2 graph.o 2 attr.o 1 worktree.o 1 trace.o 1 tmp-objdir.o 1 tempfile.o 1 strbuf.o 1 serve.o 1 sequencer.o 1 refspec.o 1 refs/iterator.o 1 read-cache.o 1 pkt-line.o 1 oidmap.o 1 line-log.o 1 ident.o 1 hex.o 1 gettext.o 1 fuzz-pack-idx.o 1 fuzz-pack-headers.o 1 editor.o 1 credential.o 1 convert.o 1 builtin/pack-objects.o $ ... and the symbols in that file: $ grep config.o sc config.o - git_config_copy_section_in_file config.o - git_config_from_file_with_options config.o - git_config_from_parameters config.o - git_config_get_bool_or_int config.o - git_config_get_maybe_bool config.o - git_config_get_pathname config.o - git_config_include config.o - git_config_key_is_valid config.o - git_configset_get_bool config.o - git_configset_get_bool_or_int config.o - git_configset_get_int config.o - git_configset_get_maybe_bool config.o - git_configset_get_pathname config.o - git_configset_get_string config.o - git_configset_get_string_const config.o - git_configset_get_ulong config.o - git_config_set_multivar_in_file config.o - git_config_system config.o - git_die_config_linenr config.o - repo_config config.o - repo_config_get_bool_or_int config.o - repo_config_get_int config.o - repo_config_get_maybe_bool config.o - repo_config_get_pathname config.o - repo_config_get_ulong config.o - repo_config_get_value $ ATB, Ramsay Jones
diff --git a/fsck.c b/fsck.c index 68502ce85b..850363fc8e 100644 --- a/fsck.c +++ b/fsck.c @@ -68,6 +68,8 @@ static struct oidset gitmodules_done = OIDSET_INIT; FUNC(GITMODULES_SYMLINK, ERROR) \ FUNC(GITMODULES_URL, ERROR) \ FUNC(GITMODULES_PATH, ERROR) \ + FUNC(GITIGNORE_SYMLINK, ERROR) \ + FUNC(GITATTRIBUTES_SYMLINK, ERROR) \ /* warnings */ \ FUNC(BAD_FILEMODE, WARN) \ FUNC(EMPTY_NAME, WARN) \ @@ -627,6 +629,19 @@ static int fsck_tree(struct tree *item, struct fsck_options *options) ".gitmodules is a symbolic link"); } + if (S_ISLNK(mode)) { + if (is_hfs_dotgitignore(name) || + is_ntfs_dotgitignore(name)) + retval += report(options, &item->object, + FSCK_MSG_GITIGNORE_SYMLINK, + ".gitignore is a symlink"); + if (is_hfs_dotgitattributes(name) || + is_ntfs_dotgitattributes(name)) + retval += report(options, &item->object, + FSCK_MSG_GITATTRIBUTES_SYMLINK, + ".gitattributes is a symlink"); + } + if (update_tree_entry_gently(&desc)) { retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree"); break;