Message ID | d673fde7-7eb2-6306-86b6-1c1a4c988ee8@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tree-walk: disallow overflowing modes | expand |
On Sat, Jan 21, 2023 at 10:36:09AM +0100, René Scharfe wrote: > When parsing tree entries, reject mode values that don't fit into an > unsigned int. Seems reasonable. I don't think you can cause any interesting mischief here, but it's cheap to check, and finding data problems earlier rather than later is always good. Should it be s/unsigned int/uint16_t/, though? -Peff
Am 22.01.23 um 08:50 schrieb Jeff King: > On Sat, Jan 21, 2023 at 10:36:09AM +0100, René Scharfe wrote: > >> When parsing tree entries, reject mode values that don't fit into an >> unsigned int. > > Seems reasonable. I don't think you can cause any interesting mischief > here, but it's cheap to check, and finding data problems earlier rather > than later is always good. > > Should it be s/unsigned int/uint16_t/, though? "mode" is declared as unsigned int, and I was more concerned with overflowing that. We could be more strict and reject everything that oversteps S_IFMT|ALLPERMS, but the latter is not defined everywhere. But permission bits are well-known, so the magic number 07777 should be recognizable enough. Like this? --- >8 --- Subject: [PATCH v2] tree-walk: disallow overflowing modes When parsing tree entries, reject mode values with bits set outside file type mask and permission bits. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: René Scharfe <l.s.r@web.de> --- tree-walk.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tree-walk.c b/tree-walk.c index 74f4d710e8..62da0e5c73 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -18,6 +18,8 @@ static const char *get_mode(const char *str, unsigned int *modep) if (c < '0' || c > '7') return NULL; mode = (mode << 3) + (c - '0'); + if (mode & ~(S_IFMT | 07777)) + return NULL; } *modep = mode; return str; -- 2.39.1
René Scharfe <l.s.r@web.de> writes: > We could be more strict and reject everything that oversteps > S_IFMT|ALLPERMS, but the latter is not defined everywhere. But > permission bits are well-known, so the magic number 07777 should be > recognizable enough. Like this? I do not quite see the reason why we want to be more strict than we already are at this point in the code path. Stricter mode check in reports FSCK_MSG_ZERO_PADDED_FILEMODE and FSCK_MSG_BAD_FILEMODE from "fsck", which I think is probably sufficient. Avoiding integer wraparound is a good idea, even if it were impossible to induce misparsing of the tree data to lead to any security issues, for the same reason why we check for zero padded filemode, i.e. such a tree mode will allow the same tree object to be given different object names. Thanks.
On Sun, Jan 22, 2023 at 11:03:38AM +0100, René Scharfe wrote: > Am 22.01.23 um 08:50 schrieb Jeff King: > > On Sat, Jan 21, 2023 at 10:36:09AM +0100, René Scharfe wrote: > > > >> When parsing tree entries, reject mode values that don't fit into an > >> unsigned int. > > > > Seems reasonable. I don't think you can cause any interesting mischief > > here, but it's cheap to check, and finding data problems earlier rather > > than later is always good. > > > > Should it be s/unsigned int/uint16_t/, though? > > "mode" is declared as unsigned int, and I was more concerned with > overflowing that. Doh, I just did "vim -t get_mode" and ended up in the near-identical version in fast-import.c, which uses uint16_t. Maybe it would want the same treatment. ;) > We could be more strict and reject everything that oversteps > S_IFMT|ALLPERMS, but the latter is not defined everywhere. But > permission bits are well-known, so the magic number 07777 should be > recognizable enough. Like this? It feels like this level of check should be happening in the fsck caller. In particular, it's nice for this parsing layer to err on the forgiving side, because the way you get out of these situations often involves something like "git ls-tree | git mktree". So in that sense, even pushing the overflow detection into the fsck would be nice, but of course it's hard to do, since we have no way to represent the overflowed value. I guess we could have a "be more careful" flag that the fsck code would pass, but I'm not sure it's worth the added complexity. -Peff
On Sat, Jan 21 2023, René Scharfe wrote: > When parsing tree entries, reject mode values that don't fit into an > unsigned int. > > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > tree-walk.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/tree-walk.c b/tree-walk.c > index 74f4d710e8..5e7bc38600 100644 > --- a/tree-walk.c > +++ b/tree-walk.c > @@ -17,6 +17,8 @@ static const char *get_mode(const char *str, unsigned int *modep) > while ((c = *str++) != ' ') { > if (c < '0' || c > '7') > return NULL; > + if ((mode << 3) >> 3 != mode) > + return NULL; > mode = (mode << 3) + (c - '0'); > } > *modep = mode; There was a discussion about this on git-security last August, in a report that turned out to be uninteresting for the security aspects. I'll just quote my own reply here out of context (<220811.86mtcbqd5x.gmgdl@evledraar.gmail.com> for those with access). For context the other issue in the "two issues" below is the one I'm working towards fixing in https://lore.kernel.org/git/cover-0.4-00000000000-20221118T113442Z-avarab@gmail.com/, the other is this file mode overflow. As noted at the end below I'm conflicted between whether we should "fix" this in some way, or just intentionally leave it alone, because while it's entirely accidental, this is the one part of git's object format I'm aware of that we could sneak in some extension in the future, without entirely breaking backwards compatibility. BEGIN QUOTE There's really two issues being raised here, how we validate the "mode" in tree entries, and how we sometimes misreport object type X as type Y. I replied on-list just now noting that the "mode" thing is something I was working towards fixing a while ago, but am happy to see a more isolated fix for: https://lore.kernel.org/git/220811.86r11nqfi2.gmgdl@evledraar.gmail.com/ I have local patches for the misreporting of object types, it's not *that* hard to get right. Basically we just need to be more careful about how we populate the "struct object". The most common case is when we parse tags that say on their envelope that we refer to a type X, but it's really a type Y. In that case we haven't .parsed=1 the object, but we note the wrong type down. My local patches are basically just deferring that, so we don't trust such type claims, and take our *actual* parsing of the object over a previous reference in the object store. I don't think that leaves any difficult edge cases related to tree parsing, as Xavier notes we'd have modes claiming that an X is Y, but we can resolve it using the same principle. But regarding the "mode parsing" I think https://lore.kernel.org/git/YvQdR3sDqDMCIjIE@coredump.intra.peff.net/ is taking us in the wrong direction, but wasn't comfortable saying so on list, because this is "exploitable" in the following way: $ perl -wE 'say unpack "B*", "hello world"' 0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100 Which, combined with Jeff's series on-list you can try e.g.: diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh index ac4099ca893..d435dfd64c5 100755 --- a/t/t5504-fetch-receive-strict.sh +++ b/t/t5504-fetch-receive-strict.sh @@ -357,16 +357,16 @@ test_expect_success 'badFilemode is not a strict error' ' tree=$( cd badmode.git && blob=$(echo blob | git hash-object -w --stdin | hex2oct) && - printf "123456 foo\0${blob}" | + printf "106640110100001100101011011000110110001101111001000000111011101101111011100100110110001100100664 foo\0${blob}" | git hash-object -t tree --stdin -w --literally ) && rm -rf dst.git && git init --bare dst.git && git -C dst.git config transfer.fsckObjects true && - - git -C badmode.git push ../dst.git $tree:refs/tags/tree 2>err && - grep "$tree: badFilemode" err + git -C badmode.git push ../dst.git $tree:refs/tags/tree && + git -C badmode.git fsck && + git -C dst.git fsck ' test_done diff --git a/tree-walk.c b/tree-walk.c index 74f4d710e8f..2fb9f2e6cbe 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -15,6 +15,7 @@ static const char *get_mode(const char *str, unsigned int *modep) return NULL; while ((c = *str++) != ' ') { + fprintf(stderr, "%c\n", c); if (c < '0' || c > '7') return NULL; mode = (mode << 3) + (c - '0'); Which gets you all tests passing, i.e. the particulars of the mode check are such that we'll accept a very long mode (but if you play with it you'll find it's not infinite, we'll run into other buffers elsewhere pretty soon). This is all assuming you're on a system whose "unsigned int" is 64 bit. This part of it is also something I discovered in the beginning of 2021 (and there's some old off-list thread between myself & Elijah on the topic I could dig up), but didn't report here at the time. So, we could just "fix" it, and re the "wrong direction" above downgrading the "bad mode" check to a mere warning is going a bit too far given the above. We have some odd modes in the wild, but we don't have such "overflowing modes". On the other hand this edge case is also a golden opportunity we're not likely to ever have again. We can't really change the git object format at this point without *major* headaches, but in this case we have the ability to encode arbitrary data into tree entries (e.g file metadata) as long as the writer makes sure they overflow back to the valid filemode :) As hacky as that is I think it would be regrettable to forever close the door on that to fix a hypothetical security bug, hypothetical because no version of git.git has an issue with it. But it is a *potential* edge case in overflowing any other tree parsing code out there in the wild (which might be otherwise guarded by a stricter fsck check of ours). END QUOTE
Am 23.01.23 um 09:33 schrieb Ævar Arnfjörð Bjarmason: > > On Sat, Jan 21 2023, René Scharfe wrote: > >> When parsing tree entries, reject mode values that don't fit into an >> unsigned int. >> >> Signed-off-by: René Scharfe <l.s.r@web.de> >> --- >> tree-walk.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/tree-walk.c b/tree-walk.c >> index 74f4d710e8..5e7bc38600 100644 >> --- a/tree-walk.c >> +++ b/tree-walk.c >> @@ -17,6 +17,8 @@ static const char *get_mode(const char *str, unsigned int *modep) >> while ((c = *str++) != ' ') { >> if (c < '0' || c > '7') >> return NULL; >> + if ((mode << 3) >> 3 != mode) >> + return NULL; >> mode = (mode << 3) + (c - '0'); >> } >> *modep = mode; > > There was a discussion about this on git-security last August, in a > report that turned out to be uninteresting for the security aspects. > > I'll just quote my own reply here out of context > (<220811.86mtcbqd5x.gmgdl@evledraar.gmail.com> for those with access). > On the other hand this edge case is also a golden opportunity we're not > likely to ever have again. We can't really change the git object format > at this point without *major* headaches, but in this case we have the > ability to encode arbitrary data into tree entries (e.g file metadata) > as long as the writer makes sure they overflow back to the valid > filemode :) Patch v1 cited above still keeps bits beyond S_IFMT (0xF000), so that's at 16 bits on a platform with 32-bit unsigned int for future object types or other metadata. One bit would suffice to switch the path field into an URL and encode additional metadata there. We could do that even with the stricter patch v2 by using one of the bits between S_IFMT and normal permissions (0777) for that, e.g. the sticky bit. That all said, the longer I think about mode overflow the less I understand why I sent these patches. It's basically harmless. Perhaps we just need a comment stating that, to contain the urge to "fix" this. Anyway, I'd like to retract my overflowing modes patches (unless someone else really, really wants one of them applied). René
René Scharfe <l.s.r@web.de> writes: > ... It's basically harmless. Perhaps > we just need a comment stating that, to contain the urge to "fix" this. Yeah, especially since fsck finds and warns about bad mode with FSCK_MSG_BAD_FILEMODE and also FSCK_MSG_ZERO_PADDED_FILEMODE in a separate codepath, adding another piece of code that checks a slightly different condition does not sound like a good idea. Thanks.
On Tue, Jan 24, 2023 at 12:44:16PM -0800, Junio C Hamano wrote: > René Scharfe <l.s.r@web.de> writes: > > > ... It's basically harmless. Perhaps > > we just need a comment stating that, to contain the urge to "fix" this. > > Yeah, especially since fsck finds and warns about bad mode with > FSCK_MSG_BAD_FILEMODE and also FSCK_MSG_ZERO_PADDED_FILEMODE in a > separate codepath, adding another piece of code that checks a > slightly different condition does not sound like a good idea. Yeah, I'm happy to drop this whole thing. I do think it would be reasonable for fsck to check for overflow alongside BAD_FILEMODE, etc, but it's annoying to do since we are relying on the existing parser. I actually have a suspicion that it might be reasonable for fsck to just parse the trees itself, rather than relying on decode_tree_entry(), etc. But that's a much bigger topic (and a possible maintenance burden) for questionable gain, so unless we find some compelling reason (some case that we really want to detect but which we want the regular decoder to ignore), it's probably not worth exploring. -Peff
diff --git a/tree-walk.c b/tree-walk.c index 74f4d710e8..5e7bc38600 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -17,6 +17,8 @@ static const char *get_mode(const char *str, unsigned int *modep) while ((c = *str++) != ' ') { if (c < '0' || c > '7') return NULL; + if ((mode << 3) >> 3 != mode) + return NULL; mode = (mode << 3) + (c - '0'); } *modep = mode;
When parsing tree entries, reject mode values that don't fit into an unsigned int. Signed-off-by: René Scharfe <l.s.r@web.de> --- tree-walk.c | 2 ++ 1 file changed, 2 insertions(+) -- 2.39.1