diff mbox series

[6/6] config: reject parsing of files over INT_MAX

Message ID 20200410195007.GF1363756@coredump.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series better handling of gigantic config files | expand

Commit Message

Jeff King April 10, 2020, 7:50 p.m. UTC
While the last few commits have made it possible for the config parser
to handle config files up to the limits of size_t, the rest of the code
isn't really ready for this. In particular, we often feed the keys as
strings into printf "%s" format specifiers. And because the printf
family of functions must return an int to specify the result, they
complain. Here are two concrete examples (using glibc; we're in
uncharted territory here so results may vary):

Generate a gigantic .gitmodules file like this:

   git submodule add /some/other/repo foo
   {
           printf '[submodule "'
           perl -e 'print "a" x 2**31'
	   echo '"]path = foo'
   } >.gitmodules
   git commit -m 'huge gitmodule'

then try this:

   $ git show
   BUG: strbuf.c:397: your vsnprintf is broken (returned -1)

The problem is that we end up calling:

   strbuf_addf(&sb, "submodule.%s.ignore", submodule_name);

which relies on vsnprintf(), and that function has no way to report back
a size larger than INT_MAX.

Taking that same file, try this:

  git config --file=.gitmodules --list --name-only

On my system it produces an output with exactly 4GB of spaces. I
confirmed in a debugger that we reach the config callback with the key
intact: it's 2147483663 bytes and full of a's. But when we print it with
this call:

  printf("%s%c", key_, term);

we just get the spaces.

So given the fact that these are insane cases which we have no need to
support, the weird behavior from feeding the results to printf even if
the code is careful, and the possibility of uncareful code introducing
its own integer truncation issues, let's just declare INT_MAX as a limit
for parsing config files.

We'll enforce the limit in get_next_char(), which generalizes over all
sources (blobs, files, etc) and covers any element we're parsing
(whether section, key, value, etc). For simplicity, the limit is over
the length of the _whole_ file, so you couldn't have two 1GB values in
the same file. This should be perfectly fine, as the expected size for
config files is generally kilobytes at most.

With this patch both cases above will yield:

  fatal: bad config line 1 in file .gitmodules

That's not an amazing error message, but the parser isn't set up to
provide specific messages (it just breaks out of the parsing loop and
gives that generic error even if see a syntactic issue). And we really
wouldn't expect to see this case outside of somebody maliciously probing
the limits of the config system.

Signed-off-by: Jeff King <peff@peff.net>
---
 config.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Junio C Hamano April 10, 2020, 10:04 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> So given the fact that these are insane cases which we have no need to
> support, the weird behavior from feeding the results to printf even if
> the code is careful, and the possibility of uncareful code introducing
> its own integer truncation issues, let's just declare INT_MAX as a limit
> for parsing config files.

Makes sense.

> +	if (c != EOF && ++cf->total_len > INT_MAX) {

Would this work correctly if size_t is uint?  Sure, as int-max would
fit within it.  And of course if size_t is wider than uint, there is
no problem in this comparison.

Thanks.
Jeff King April 10, 2020, 10:15 p.m. UTC | #2
On Fri, Apr 10, 2020 at 03:04:31PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So given the fact that these are insane cases which we have no need to
> > support, the weird behavior from feeding the results to printf even if
> > the code is careful, and the possibility of uncareful code introducing
> > its own integer truncation issues, let's just declare INT_MAX as a limit
> > for parsing config files.
> 
> Makes sense.
> 
> > +	if (c != EOF && ++cf->total_len > INT_MAX) {
> 
> Would this work correctly if size_t is uint?  Sure, as int-max would
> fit within it.  And of course if size_t is wider than uint, there is
> no problem in this comparison.

Good question, but yeah, I think it's right.

Another method would be to do:

  if (cf->total_len >= INT_MAX)

_before_ reading any character. We'd have to remember to increment
total_len then (I suppose we could do it preemptively; as long as people
don't try to read EOF from us over and over again it would never move
again).

I also considered making the limit much lower than INT_MAX because
really, who needs even a 1GB config file? :)

-Peff
Taylor Blau April 13, 2020, 12:47 a.m. UTC | #3
On Fri, Apr 10, 2020 at 06:15:49PM -0400, Jeff King wrote:
> On Fri, Apr 10, 2020 at 03:04:31PM -0700, Junio C Hamano wrote:
>
> > Jeff King <peff@peff.net> writes:
> >
> > > So given the fact that these are insane cases which we have no need to
> > > support, the weird behavior from feeding the results to printf even if
> > > the code is careful, and the possibility of uncareful code introducing
> > > its own integer truncation issues, let's just declare INT_MAX as a limit
> > > for parsing config files.
> >
> > Makes sense.
> >
> > > +	if (c != EOF && ++cf->total_len > INT_MAX) {
> >
> > Would this work correctly if size_t is uint?  Sure, as int-max would
> > fit within it.  And of course if size_t is wider than uint, there is
> > no problem in this comparison.
>
> Good question, but yeah, I think it's right.
>
> Another method would be to do:
>
>   if (cf->total_len >= INT_MAX)
>
> _before_ reading any character. We'd have to remember to increment
> total_len then (I suppose we could do it preemptively; as long as people
> don't try to read EOF from us over and over again it would never move
> again).
>
> I also considered making the limit much lower than INT_MAX because
> really, who needs even a 1GB config file? :)

;). Making it lower than INT_MAX moves us into the territory of deciding
what is an "appropriately" sized config file, which I'd rather not do.
At least we can blame INT_MAX if someone has a too-large config file.

> -Peff

Thanks,
Taylor
Junio C Hamano April 13, 2020, 10:14 p.m. UTC | #4
Taylor Blau <me@ttaylorr.com> writes:

> ;). Making it lower than INT_MAX moves us into the territory of deciding
> what is an "appropriately" sized config file, which I'd rather not do.
> At least we can blame INT_MAX if someone has a too-large config file.

;-)  Sensible.
diff mbox series

Patch

diff --git a/config.c b/config.c
index 1c25c94863..8db9c77098 100644
--- a/config.c
+++ b/config.c
@@ -37,6 +37,7 @@  struct config_source {
 	enum config_error_action default_error_action;
 	int linenr;
 	int eof;
+	size_t total_len;
 	struct strbuf value;
 	struct strbuf var;
 	unsigned subsection_case_sensitive : 1;
@@ -524,6 +525,19 @@  static int get_next_char(void)
 			c = '\r';
 		}
 	}
+
+	if (c != EOF && ++cf->total_len > INT_MAX) {
+		/*
+		 * This is an absurdly long config file; refuse to parse
+		 * further in order to protect downstream code from integer
+		 * overflows. Note that we can't return an error specifically,
+		 * but we can mark EOF and put trash in the return value,
+		 * which will trigger a parse error.
+		 */
+		cf->eof = 1;
+		return 0;
+	}
+
 	if (c == '\n')
 		cf->linenr++;
 	if (c == EOF) {
@@ -1540,6 +1554,7 @@  static int do_config_from(struct config_source *top, config_fn_t fn, void *data,
 	top->prev = cf;
 	top->linenr = 1;
 	top->eof = 0;
+	top->total_len = 0;
 	strbuf_init(&top->value, 1024);
 	strbuf_init(&top->var, 1024);
 	cf = top;