diff mbox series

tree-walk: disallow overflowing modes

Message ID d673fde7-7eb2-6306-86b6-1c1a4c988ee8@web.de (mailing list archive)
State New, archived
Headers show
Series tree-walk: disallow overflowing modes | expand

Commit Message

René Scharfe Jan. 21, 2023, 9:36 a.m. UTC
When parsing tree entries, reject mode values that don't fit into an
unsigned int.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 tree-walk.c | 2 ++
 1 file changed, 2 insertions(+)

--
2.39.1

Comments

Jeff King Jan. 22, 2023, 7:50 a.m. UTC | #1
On Sat, Jan 21, 2023 at 10:36:09AM +0100, René Scharfe wrote:

> When parsing tree entries, reject mode values that don't fit into an
> unsigned int.

Seems reasonable. I don't think you can cause any interesting mischief
here, but it's cheap to check, and finding data problems earlier rather
than later is always good.

Should it be s/unsigned int/uint16_t/, though?

-Peff
René Scharfe Jan. 22, 2023, 10:03 a.m. UTC | #2
Am 22.01.23 um 08:50 schrieb Jeff King:
> On Sat, Jan 21, 2023 at 10:36:09AM +0100, René Scharfe wrote:
>
>> When parsing tree entries, reject mode values that don't fit into an
>> unsigned int.
>
> Seems reasonable. I don't think you can cause any interesting mischief
> here, but it's cheap to check, and finding data problems earlier rather
> than later is always good.
>
> Should it be s/unsigned int/uint16_t/, though?

"mode" is declared as unsigned int, and I was more concerned with
overflowing that.

We could be more strict and reject everything that oversteps
S_IFMT|ALLPERMS, but the latter is not defined everywhere.  But
permission bits are well-known, so the magic number 07777 should be
recognizable enough.  Like this?

--- >8 ---
Subject: [PATCH v2] tree-walk: disallow overflowing modes

When parsing tree entries, reject mode values with bits set outside file
type mask and permission bits.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 tree-walk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tree-walk.c b/tree-walk.c
index 74f4d710e8..62da0e5c73 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -18,6 +18,8 @@ static const char *get_mode(const char *str, unsigned int *modep)
 		if (c < '0' || c > '7')
 			return NULL;
 		mode = (mode << 3) + (c - '0');
+		if (mode & ~(S_IFMT | 07777))
+			return NULL;
 	}
 	*modep = mode;
 	return str;
--
2.39.1
Junio C Hamano Jan. 22, 2023, 4:36 p.m. UTC | #3
René Scharfe <l.s.r@web.de> writes:

> We could be more strict and reject everything that oversteps
> S_IFMT|ALLPERMS, but the latter is not defined everywhere.  But
> permission bits are well-known, so the magic number 07777 should be
> recognizable enough.  Like this?

I do not quite see the reason why we want to be more strict than we
already are at this point in the code path.  Stricter mode check in
reports FSCK_MSG_ZERO_PADDED_FILEMODE and FSCK_MSG_BAD_FILEMODE from
"fsck", which I think is probably sufficient.

Avoiding integer wraparound is a good idea, even if it were
impossible to induce misparsing of the tree data to lead to any
security issues, for the same reason why we check for zero padded
filemode, i.e. such a tree mode will allow the same tree object to
be given different object names.

Thanks.
Jeff King Jan. 22, 2023, 10:02 p.m. UTC | #4
On Sun, Jan 22, 2023 at 11:03:38AM +0100, René Scharfe wrote:

> Am 22.01.23 um 08:50 schrieb Jeff King:
> > On Sat, Jan 21, 2023 at 10:36:09AM +0100, René Scharfe wrote:
> >
> >> When parsing tree entries, reject mode values that don't fit into an
> >> unsigned int.
> >
> > Seems reasonable. I don't think you can cause any interesting mischief
> > here, but it's cheap to check, and finding data problems earlier rather
> > than later is always good.
> >
> > Should it be s/unsigned int/uint16_t/, though?
> 
> "mode" is declared as unsigned int, and I was more concerned with
> overflowing that.

Doh, I just did "vim -t get_mode" and ended up in the near-identical
version in fast-import.c, which uses uint16_t. Maybe it would want the
same treatment. ;)

> We could be more strict and reject everything that oversteps
> S_IFMT|ALLPERMS, but the latter is not defined everywhere.  But
> permission bits are well-known, so the magic number 07777 should be
> recognizable enough.  Like this?

It feels like this level of check should be happening in the fsck
caller. In particular, it's nice for this parsing layer to err on the
forgiving side, because the way you get out of these situations often
involves something like "git ls-tree | git mktree".

So in that sense, even pushing the overflow detection into the fsck
would be nice, but of course it's hard to do, since we have no way to
represent the overflowed value. I guess we could have a "be more
careful" flag that the fsck code would pass, but I'm not sure it's worth
the added complexity.

-Peff
Ævar Arnfjörð Bjarmason Jan. 23, 2023, 8:33 a.m. UTC | #5
On Sat, Jan 21 2023, René Scharfe wrote:

> When parsing tree entries, reject mode values that don't fit into an
> unsigned int.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  tree-walk.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tree-walk.c b/tree-walk.c
> index 74f4d710e8..5e7bc38600 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -17,6 +17,8 @@ static const char *get_mode(const char *str, unsigned int *modep)
>  	while ((c = *str++) != ' ') {
>  		if (c < '0' || c > '7')
>  			return NULL;
> +		if ((mode << 3) >> 3 != mode)
> +			return NULL;
>  		mode = (mode << 3) + (c - '0');
>  	}
>  	*modep = mode;

There was a discussion about this on git-security last August, in a
report that turned out to be uninteresting for the security aspects.

I'll just quote my own reply here out of context
(<220811.86mtcbqd5x.gmgdl@evledraar.gmail.com> for those with access).

For context the other issue in the "two issues" below is the one I'm
working towards fixing in
https://lore.kernel.org/git/cover-0.4-00000000000-20221118T113442Z-avarab@gmail.com/,
the other is this file mode overflow.

As noted at the end below I'm conflicted between whether we should "fix"
this in some way, or just intentionally leave it alone, because while
it's entirely accidental, this is the one part of git's object format
I'm aware of that we could sneak in some extension in the future,
without entirely breaking backwards compatibility.

BEGIN QUOTE

There's really two issues being raised here, how we validate the "mode"
in tree entries, and how we sometimes misreport object type X as type Y.

I replied on-list just now noting that the "mode" thing is something I
was working towards fixing a while ago, but am happy to see a more
isolated fix for:
https://lore.kernel.org/git/220811.86r11nqfi2.gmgdl@evledraar.gmail.com/

I have local patches for the misreporting of object types, it's not
*that* hard to get right. Basically we just need to be more careful
about how we populate the "struct object". The most common case is when
we parse tags that say on their envelope that we refer to a type X, but
it's really a type Y.

In that case we haven't .parsed=1 the object, but we note the wrong type
down. My local patches are basically just deferring that, so we don't
trust such type claims, and take our *actual* parsing of the object over
a previous reference in the object store.

I don't think that leaves any difficult edge cases related to tree
parsing, as Xavier notes we'd have modes claiming that an X is Y, but we
can resolve it using the same principle.

But regarding the "mode parsing" I think
https://lore.kernel.org/git/YvQdR3sDqDMCIjIE@coredump.intra.peff.net/ is
taking us in the wrong direction, but wasn't comfortable saying so on
list, because this is "exploitable" in the following way:

	$ perl -wE 'say unpack "B*", "hello world"'
	0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100

Which, combined with Jeff's series on-list you can try e.g.:
	
	diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
	index ac4099ca893..d435dfd64c5 100755
	--- a/t/t5504-fetch-receive-strict.sh
	+++ b/t/t5504-fetch-receive-strict.sh
	@@ -357,16 +357,16 @@ test_expect_success 'badFilemode is not a strict error' '
	 	tree=$(
	 		cd badmode.git &&
	 		blob=$(echo blob | git hash-object -w --stdin | hex2oct) &&
	-		printf "123456 foo\0${blob}" |
	+		printf "106640110100001100101011011000110110001101111001000000111011101101111011100100110110001100100664 foo\0${blob}" |
	 		git hash-object -t tree --stdin -w --literally
	 	) &&
	 
	 	rm -rf dst.git &&
	 	git init --bare dst.git &&
	 	git -C dst.git config transfer.fsckObjects true &&
	-
	-	git -C badmode.git push ../dst.git $tree:refs/tags/tree 2>err &&
	-	grep "$tree: badFilemode" err
	+	git -C badmode.git push ../dst.git $tree:refs/tags/tree &&
	+	git -C badmode.git fsck &&
	+	git -C dst.git fsck
	 '
	 
	 test_done
	diff --git a/tree-walk.c b/tree-walk.c
	index 74f4d710e8f..2fb9f2e6cbe 100644
	--- a/tree-walk.c
	+++ b/tree-walk.c
	@@ -15,6 +15,7 @@ static const char *get_mode(const char *str, unsigned int *modep)
	 		return NULL;
	 
	 	while ((c = *str++) != ' ') {
	+		fprintf(stderr, "%c\n", c);
	 		if (c < '0' || c > '7')
	 			return NULL;
	 		mode = (mode << 3) + (c - '0');

Which gets you all tests passing, i.e. the particulars of the mode check
are such that we'll accept a very long mode (but if you play with it
you'll find it's not infinite, we'll run into other buffers elsewhere
pretty soon).

This is all assuming you're on a system whose "unsigned int" is 64 bit.

This part of it is also something I discovered in the beginning of 2021
(and there's some old off-list thread between myself & Elijah on the
topic I could dig up), but didn't report here at the time.

So, we could just "fix" it, and re the "wrong direction" above
downgrading the "bad mode" check to a mere warning is going a bit too
far given the above. We have some odd modes in the wild, but we don't
have such "overflowing modes".

On the other hand this edge case is also a golden opportunity we're not
likely to ever have again. We can't really change the git object format
at this point without *major* headaches, but in this case we have the
ability to encode arbitrary data into tree entries (e.g file metadata)
as long as the writer makes sure they overflow back to the valid
filemode :)

As hacky as that is I think it would be regrettable to forever close the
door on that to fix a hypothetical security bug, hypothetical because no
version of git.git has an issue with it. But it is a *potential* edge
case in overflowing any other tree parsing code out there in the wild
(which might be otherwise guarded by a stricter fsck check of ours).

END QUOTE
René Scharfe Jan. 24, 2023, 6:53 p.m. UTC | #6
Am 23.01.23 um 09:33 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Jan 21 2023, René Scharfe wrote:
>
>> When parsing tree entries, reject mode values that don't fit into an
>> unsigned int.
>>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>>  tree-walk.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/tree-walk.c b/tree-walk.c
>> index 74f4d710e8..5e7bc38600 100644
>> --- a/tree-walk.c
>> +++ b/tree-walk.c
>> @@ -17,6 +17,8 @@ static const char *get_mode(const char *str, unsigned int *modep)
>>  	while ((c = *str++) != ' ') {
>>  		if (c < '0' || c > '7')
>>  			return NULL;
>> +		if ((mode << 3) >> 3 != mode)
>> +			return NULL;
>>  		mode = (mode << 3) + (c - '0');
>>  	}
>>  	*modep = mode;
>
> There was a discussion about this on git-security last August, in a
> report that turned out to be uninteresting for the security aspects.
>
> I'll just quote my own reply here out of context
> (<220811.86mtcbqd5x.gmgdl@evledraar.gmail.com> for those with access).

> On the other hand this edge case is also a golden opportunity we're not
> likely to ever have again. We can't really change the git object format
> at this point without *major* headaches, but in this case we have the
> ability to encode arbitrary data into tree entries (e.g file metadata)
> as long as the writer makes sure they overflow back to the valid
> filemode :)

Patch v1 cited above still keeps bits beyond S_IFMT (0xF000), so that's
at 16 bits on a platform with 32-bit unsigned int for future object
types or other metadata.

One bit would suffice to switch the path field into an URL and encode
additional metadata there.  We could do that even with the stricter
patch v2 by using one of the bits between S_IFMT and normal permissions
(0777) for that, e.g. the sticky bit.

That all said, the longer I think about mode overflow the less I
understand why I sent these patches.  It's basically harmless.  Perhaps
we just need a comment stating that, to contain the urge to "fix" this.
Anyway, I'd like to retract my overflowing modes patches (unless someone
else really, really wants one of them applied).

René
Junio C Hamano Jan. 24, 2023, 8:44 p.m. UTC | #7
René Scharfe <l.s.r@web.de> writes:

> ...  It's basically harmless.  Perhaps
> we just need a comment stating that, to contain the urge to "fix" this.

Yeah, especially since fsck finds and warns about bad mode with
FSCK_MSG_BAD_FILEMODE and also FSCK_MSG_ZERO_PADDED_FILEMODE in a
separate codepath, adding another piece of code that checks a
slightly different condition does not sound like a good idea.

Thanks.
Jeff King Jan. 26, 2023, 11:36 a.m. UTC | #8
On Tue, Jan 24, 2023 at 12:44:16PM -0800, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
> 
> > ...  It's basically harmless.  Perhaps
> > we just need a comment stating that, to contain the urge to "fix" this.
> 
> Yeah, especially since fsck finds and warns about bad mode with
> FSCK_MSG_BAD_FILEMODE and also FSCK_MSG_ZERO_PADDED_FILEMODE in a
> separate codepath, adding another piece of code that checks a
> slightly different condition does not sound like a good idea.

Yeah, I'm happy to drop this whole thing. I do think it would be
reasonable for fsck to check for overflow alongside BAD_FILEMODE, etc,
but it's annoying to do since we are relying on the existing parser.

I actually have a suspicion that it might be reasonable for fsck to just
parse the trees itself, rather than relying on decode_tree_entry(), etc.
But that's a much bigger topic (and a possible maintenance burden) for
questionable gain, so unless we find some compelling reason (some case
that we really want to detect but which we want the regular decoder to
ignore), it's probably not worth exploring.

-Peff
diff mbox series

Patch

diff --git a/tree-walk.c b/tree-walk.c
index 74f4d710e8..5e7bc38600 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -17,6 +17,8 @@  static const char *get_mode(const char *str, unsigned int *modep)
 	while ((c = *str++) != ' ') {
 		if (c < '0' || c > '7')
 			return NULL;
+		if ((mode << 3) >> 3 != mode)
+			return NULL;
 		mode = (mode << 3) + (c - '0');
 	}
 	*modep = mode;