From patchwork Thu Aug 31 06:17:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F37AC83F29 for ; Thu, 31 Aug 2023 06:17:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235817AbjHaGR7 (ORCPT ); Thu, 31 Aug 2023 02:17:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232588AbjHaGR7 (ORCPT ); Thu, 31 Aug 2023 02:17:59 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D139F1B1 for ; Wed, 30 Aug 2023 23:17:55 -0700 (PDT) Received: (qmail 21139 invoked by uid 109); 31 Aug 2023 06:17:55 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:17:55 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 2951 invoked by uid 111); 31 Aug 2023 06:17:57 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:17:57 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:17:54 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 01/10] tree-walk: reduce stack size for recursive functions Message-ID: <20230831061754.GA3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The traverse_trees() and traverse_trees_recursive() functions call each other recursively. In a deep tree, this can result in running out of stack space and crashing. There's obviously going to be some limit here based on available stack, but the problem is exacerbated by a few large structs, many of which we over-allocate. For example, in traverse_trees() we store a name_entry and tree_desc_x per tree, both of which contain an object_id (which is now 32 bytes). And we allocate 8 of them (from MAX_TRAVERSE_TREES), even though many traversals will only look at 1 or 2. Interestingly, we used to allocate these on the heap, prior to 8dd40c0472 (traverse_trees(): use stack array for name entries, 2020-01-30). That commit was trying to simplify away allocation size computations, and naively assumed that the sizes were small enough not to matter. And they don't in normal cases, but on my stock Debian system I see a crash running "git archive" on a tree with ~3600 entries. That's deep enough we wouldn't see it in practice, but probably shallow enough that we'd prefer not to make it a hard limit. Especially because other systems may have even smaller stacks. We can replace these stack variables with a few malloc invocations. This reduces the stack sizes for the two functions from 1128 and 752 bytes, respectively, down to 40 and 92 bytes. That allows a depth of ~13000 on my machine (the improvement isn't in linear proportion because my numbers don't count the size of parameters and other function overhead). The possible downsides are: 1. We now have to remember to free(). But both functions have an easy single exit (and already had to clean up other bits anyway). 2. The extra malloc()/free() overhead might be measurable. I tested this by setting up a 3000-depth tree with a single blob and running "git archive" on it. After switching to the heap, it consistently runs 2-3% faster! Presumably this is because the 1K+ of wasted stack space penalized memory caches. On a more real-world case like linux.git, the speed difference isn't measurable at all, simply because most trees aren't that deep and there's so much other work going on (like accessing the objects themselves). So the improvement I saw should be taken as evidence that we're not making anything worse, but isn't really that interesting on its own. The main motivation here is that we're now less likely to run out of stack space and crash. Signed-off-by: Jeff King --- tree-walk.c | 10 ++++++---- unpack-trees.c | 9 +++++++-- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index 29ead71be1..ad49d55290 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -442,9 +442,9 @@ int traverse_trees(struct index_state *istate, struct traverse_info *info) { int error = 0; - struct name_entry entry[MAX_TRAVERSE_TREES]; + struct name_entry *entry; int i; - struct tree_desc_x tx[ARRAY_SIZE(entry)]; + struct tree_desc_x *tx; struct strbuf base = STRBUF_INIT; int interesting = 1; char *traverse_path; @@ -455,8 +455,8 @@ int traverse_trees(struct index_state *istate, if (traverse_trees_cur_depth > traverse_trees_max_depth) traverse_trees_max_depth = traverse_trees_cur_depth; - if (n >= ARRAY_SIZE(entry)) - BUG("traverse_trees() called with too many trees (%d)", n); + ALLOC_ARRAY(entry, n); + ALLOC_ARRAY(tx, n); for (i = 0; i < n; i++) { tx[i].d = t[i]; @@ -551,6 +551,8 @@ int traverse_trees(struct index_state *istate, } for (i = 0; i < n; i++) free_extended_entry(tx + i); + free(tx); + free(entry); free(traverse_path); info->traverse_path = NULL; strbuf_release(&base); diff --git a/unpack-trees.c b/unpack-trees.c index 87517364dc..a203f9a3d7 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -864,8 +864,8 @@ static int traverse_trees_recursive(int n, unsigned long dirmask, struct unpack_trees_options *o = info->data; int i, ret, bottom; int nr_buf = 0; - struct tree_desc t[MAX_UNPACK_TREES]; - void *buf[MAX_UNPACK_TREES]; + struct tree_desc *t; + void **buf; struct traverse_info newinfo; struct name_entry *p; int nr_entries; @@ -902,6 +902,9 @@ static int traverse_trees_recursive(int n, unsigned long dirmask, newinfo.pathlen = st_add3(newinfo.pathlen, tree_entry_len(p), 1); newinfo.df_conflicts |= df_conflicts; + ALLOC_ARRAY(t, n); + ALLOC_ARRAY(buf, n); + /* * Fetch the tree from the ODB for each peer directory in the * n commits. @@ -937,6 +940,8 @@ static int traverse_trees_recursive(int n, unsigned long dirmask, for (i = 0; i < nr_buf; i++) free(buf[i]); + free(buf); + free(t); return ret; } From patchwork Thu Aug 31 06:19:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370963 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE1BEC83F29 for ; Thu, 31 Aug 2023 06:19:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238701AbjHaGTX (ORCPT ); Thu, 31 Aug 2023 02:19:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230299AbjHaGTU (ORCPT ); Thu, 31 Aug 2023 02:19:20 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E21BE0 for ; Wed, 30 Aug 2023 23:19:17 -0700 (PDT) Received: (qmail 21144 invoked by uid 109); 31 Aug 2023 06:19:17 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:19:17 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 2961 invoked by uid 111); 31 Aug 2023 06:19:18 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:19:18 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:19:16 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 02/10] tree-walk: drop MAX_TRAVERSE_TREES macro Message-ID: <20230831061916.GB3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Since the previous commit dropped the hard-coded limit in traverse_trees(), we don't need this macro there anymore (the code can handle any number of trees in parallel). We do define MAX_UNPACK_TREES using MAX_TRAVERSE_TREES, due to 5290d45134 (tree-walk.c: break circular dependency with unpack-trees, 2020-02-01). So we can just directly define that as "8" now; we know traverse_trees() can handle whatever we throw at it. Signed-off-by: Jeff King --- I think from previous discussions that this "8" may also be excessive. But digging into that should definitely be left for another series. tree-walk.h | 2 -- unpack-trees.h | 2 +- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/tree-walk.h b/tree-walk.h index 74cdceb3fe..a6bfa3da3a 100644 --- a/tree-walk.h +++ b/tree-walk.h @@ -6,8 +6,6 @@ struct index_state; struct repository; -#define MAX_TRAVERSE_TREES 8 - /** * The tree walking API is used to traverse and inspect trees. */ diff --git a/unpack-trees.h b/unpack-trees.h index 9b827c307f..5867e26e17 100644 --- a/unpack-trees.h +++ b/unpack-trees.h @@ -7,7 +7,7 @@ #include "string-list.h" #include "tree-walk.h" -#define MAX_UNPACK_TREES MAX_TRAVERSE_TREES +#define MAX_UNPACK_TREES 8 struct cache_entry; struct unpack_trees_options; From patchwork Thu Aug 31 06:19:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370964 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E595C83F10 for ; Thu, 31 Aug 2023 06:19:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235568AbjHaGT1 (ORCPT ); Thu, 31 Aug 2023 02:19:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38236 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239014AbjHaGT0 (ORCPT ); Thu, 31 Aug 2023 02:19:26 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D172C2 for ; Wed, 30 Aug 2023 23:19:24 -0700 (PDT) Received: (qmail 21147 invoked by uid 109); 31 Aug 2023 06:19:24 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:19:24 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 2964 invoked by uid 111); 31 Aug 2023 06:19:25 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:19:25 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:19:22 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 03/10] tree-walk: rename "error" variable Message-ID: <20230831061922.GC3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The "error" variable in traverse_trees() shadows the global error() function (meaning we can't call error() from here). Let's call the local variable "ret" instead, which matches the idiom in other functions. Signed-off-by: Jeff King --- tree-walk.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index ad49d55290..4efd0fc391 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -441,7 +441,7 @@ int traverse_trees(struct index_state *istate, int n, struct tree_desc *t, struct traverse_info *info) { - int error = 0; + int ret = 0; struct name_entry *entry; int i; struct tree_desc_x *tx; @@ -539,7 +539,7 @@ int traverse_trees(struct index_state *istate, if (interesting) { trees_used = info->fn(n, mask, dirmask, entry, info); if (trees_used < 0) { - error = trees_used; + ret = trees_used; if (!info->show_all_errors) break; } @@ -558,7 +558,7 @@ int traverse_trees(struct index_state *istate, strbuf_release(&base); traverse_trees_cur_depth--; - return error; + return ret; } struct dir_state { From patchwork Thu Aug 31 06:20:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370965 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8CFFAC83F10 for ; Thu, 31 Aug 2023 06:20:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238278AbjHaGUH (ORCPT ); Thu, 31 Aug 2023 02:20:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56050 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230354AbjHaGUF (ORCPT ); Thu, 31 Aug 2023 02:20:05 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4B96FC2 for ; Wed, 30 Aug 2023 23:20:02 -0700 (PDT) Received: (qmail 21152 invoked by uid 109); 31 Aug 2023 06:20:02 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:20:02 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3001 invoked by uid 111); 31 Aug 2023 06:20:03 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:20:03 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:20:01 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 04/10] fsck: detect very large tree pathnames Message-ID: <20230831062001.GD3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In general, Git tries not to arbitrarily limit what it will store, and there are currently no limits at all on the size of the path we find in a tree. In theory you could have one that is gigabytes long. But in practice this freedom is not really helping anybody, and is potentially harmful: 1. Most operating systems have much lower limits for the size of a single pathname component (e.g., on Linux you'll generally get ENAMETOOLONG for anything over 255 bytes). And while you _can_ use Git in a way that never touches the filesystem (manipulating the index and trees directly), it's still probably not a good idea to have gigantic tree names. Many operations load and traverse them, so any clever Git-as-a-database scheme is likely to perform poorly in that case. 2. We still have a lot of code which assumes strings are reasonably sized, and I won't be at all surprised if you can trigger some interesting integer overflows with gigantic pathnames. Stopping malicious trees from entering the repository provides an extra line of defense, protecting downstream code. This patch implements an fsck check so that such trees can be rejected by transfer.fsckObjects. I've picked a reasonably high maximum depth here (4096) that hopefully should not bother anybody in practice. I've also made it configurable, as an escape hatch. Signed-off-by: Jeff King --- Documentation/fsck-msgids.txt | 7 +++++++ fsck.c | 24 +++++++++++++++++++++++- fsck.h | 1 + t/t1450-fsck.sh | 10 ++++++++++ 4 files changed, 41 insertions(+), 1 deletion(-) diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt index 12eae8a222..09b0aecbf8 100644 --- a/Documentation/fsck-msgids.txt +++ b/Documentation/fsck-msgids.txt @@ -103,6 +103,13 @@ `hasDotgit`:: (WARN) A tree contains an entry named `.git`. +`largePathname`:: + (WARN) A tree contains an entry with a very long path name. If + the value of `fsck.largePathname` contains a colon, that value + is used as the maximum allowable length (e.g., "warn:10" would + complain about any path component of 11 or more bytes). The + default value is 4096. + `mailmapSymlink`:: (INFO) `.mailmap` is a symlink. diff --git a/fsck.c b/fsck.c index 2b1e348005..6a0bbc5087 100644 --- a/fsck.c +++ b/fsck.c @@ -24,6 +24,8 @@ #include "credential.h" #include "help.h" +static ssize_t max_tree_entry_len = 4096; + #define STR(x) #x #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type }, static struct { @@ -154,15 +156,29 @@ void fsck_set_msg_type(struct fsck_options *options, const char *msg_id_str, const char *msg_type_str) { int msg_id = parse_msg_id(msg_id_str); - enum fsck_msg_type msg_type = parse_msg_type(msg_type_str); + char *to_free = NULL; + enum fsck_msg_type msg_type; if (msg_id < 0) die("Unhandled message id: %s", msg_id_str); + if (msg_id == FSCK_MSG_LARGE_PATHNAME) { + const char *colon = strchr(msg_type_str, ':'); + if (colon) { + msg_type_str = to_free = + xmemdupz(msg_type_str, colon - msg_type_str); + colon++; + if (!git_parse_ssize_t(colon, &max_tree_entry_len)) + die("unable to parse max tree entry len: %s", colon); + } + } + msg_type = parse_msg_type(msg_type_str); + if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL) die("Cannot demote %s to %s", msg_id_str, msg_type_str); fsck_set_msg_type_from_ids(options, msg_id, msg_type); + free(to_free); } void fsck_set_msg_types(struct fsck_options *options, const char *values) @@ -578,6 +594,7 @@ static int fsck_tree(const struct object_id *tree_oid, int has_bad_modes = 0; int has_dup_entries = 0; int not_properly_sorted = 0; + int has_large_name = 0; struct tree_desc desc; unsigned o_mode; const char *o_name; @@ -607,6 +624,7 @@ static int fsck_tree(const struct object_id *tree_oid, has_dotdot |= !strcmp(name, ".."); has_dotgit |= is_hfs_dotgit(name) || is_ntfs_dotgit(name); has_zero_pad |= *(char *)desc.buffer == '0'; + has_large_name |= tree_entry_len(&desc.entry) > max_tree_entry_len; if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) { if (!S_ISLNK(mode)) @@ -749,6 +767,10 @@ static int fsck_tree(const struct object_id *tree_oid, retval += report(options, tree_oid, OBJ_TREE, FSCK_MSG_TREE_NOT_SORTED, "not properly sorted"); + if (has_large_name) + retval += report(options, tree_oid, OBJ_TREE, + FSCK_MSG_LARGE_PATHNAME, + "contains excessively large pathname"); return retval; } diff --git a/fsck.h b/fsck.h index 6359ba359b..e3adf9d911 100644 --- a/fsck.h +++ b/fsck.h @@ -73,6 +73,7 @@ enum fsck_msg_type { FUNC(NULL_SHA1, WARN) \ FUNC(ZERO_PADDED_FILEMODE, WARN) \ FUNC(NUL_IN_COMMIT, WARN) \ + FUNC(LARGE_PATHNAME, WARN) \ /* infos (reported as warnings, but ignored by default) */ \ FUNC(BAD_FILEMODE, INFO) \ FUNC(GITMODULES_PARSE, INFO) \ diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 5805d47eb9..10a539158c 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -589,6 +589,16 @@ test_expect_success 'fsck notices submodule entry pointing to null sha1' ' ) ' +test_expect_success 'fsck notices excessively large tree entry name' ' + git init large-name && + ( + cd large-name && + test_commit a-long-name && + git -c fsck.largePathname=warn:10 fsck 2>out && + grep "warning.*large pathname" out + ) +' + while read name path pretty; do while read mode type; do : ${pretty:=$path} From patchwork Thu Aug 31 06:21:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370966 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDF98C83F29 for ; Thu, 31 Aug 2023 06:21:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243745AbjHaGVK (ORCPT ); Thu, 31 Aug 2023 02:21:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241569AbjHaGVJ (ORCPT ); Thu, 31 Aug 2023 02:21:09 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0BA0CD6 for ; Wed, 30 Aug 2023 23:21:01 -0700 (PDT) Received: (qmail 21157 invoked by uid 109); 31 Aug 2023 06:21:01 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:21:01 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3004 invoked by uid 111); 31 Aug 2023 06:21:03 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:21:03 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:21:00 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 05/10] add core.maxTreeDepth config Message-ID: <20230831062100.GE3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Most of our tree traversal algorithms use recursion to visit sub-trees. For pathologically large trees, this can cause us to run out of stack space and abort in an uncontrolled way. Let's put our own limit here so that we can fail gracefully rather than segfaulting. In similar cases where we recursed along the commit graph, we rewrote the algorithms to avoid recursion and keep any stack data on the heap. But the commit graph is meant to grow without bound, whereas it's not an imposition to put a limit on the maximum size of tree we'll handle. And this has a bonus side effect: coupled with a limit on individual tree entry names, this limits the total size of a path we may encounter. This gives us an extra protection against code handling long path names which may suffer from integer overflows in the size (which could then be exploited by malicious trees). The default of 4096 is set to be much longer than anybody would care about in the real world. Even with single-letter interior tree names (like "a/b/c"), such a path is at least 8191 bytes. While most operating systems will let you create such a path incrementally, trying to reference the whole thing in a system call (as Git would do when actually trying to access it) will result in ENAMETOOLONG. Coupled with the recent fsck.largePathname warning, the maximum total pathname Git will handle is (by default) 16MB. This config option doesn't do anything yet; future patches will convert various algorithms to respect the limit. Signed-off-by: Jeff King --- Documentation/config/core.txt | 6 ++++++ config.c | 5 +++++ environment.c | 1 + environment.h | 1 + 4 files changed, 13 insertions(+) diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt index dfbdaf00b8..0e8c2832bf 100644 --- a/Documentation/config/core.txt +++ b/Documentation/config/core.txt @@ -736,3 +736,9 @@ core.abbrev:: If set to "no", no abbreviation is made and the object names are shown in their full length. The minimum length is 4. + +core.maxTreeDepth:: + The maximum depth Git is willing to recurse while traversing a + tree (e.g., "a/b/cde/f" has a depth of 4). This is a fail-safe + to allow Git to abort cleanly, and should not generally need to + be adjusted. The default is 4096. diff --git a/config.c b/config.c index 3846a37be9..2ea39295cd 100644 --- a/config.c +++ b/config.c @@ -1801,6 +1801,11 @@ static int git_default_core_config(const char *var, const char *value, return 0; } + if (!strcmp(var, "core.maxtreedepth")) { + max_allowed_tree_depth = git_config_int(var, value, ctx->kvi); + return 0; + } + /* Add other config variables here and to Documentation/config.txt. */ return platform_core_config(var, value, ctx, cb); } diff --git a/environment.c b/environment.c index f98d76f080..8e25b5ef02 100644 --- a/environment.c +++ b/environment.c @@ -81,6 +81,7 @@ int merge_log_config = -1; int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ unsigned long pack_size_limit_cfg; enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET; +int max_allowed_tree_depth = 4096; #ifndef PROTECT_HFS_DEFAULT #define PROTECT_HFS_DEFAULT 0 diff --git a/environment.h b/environment.h index c5377473c6..e5351c9dd9 100644 --- a/environment.h +++ b/environment.h @@ -132,6 +132,7 @@ extern size_t packed_git_limit; extern size_t delta_base_cache_limit; extern unsigned long big_file_threshold; extern unsigned long pack_size_limit_cfg; +extern int max_allowed_tree_depth; /* * Accessors for the core.sharedrepository config which lazy-load the value From patchwork Thu Aug 31 06:21:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370967 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93872C83F29 for ; Thu, 31 Aug 2023 06:22:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344363AbjHaGWa (ORCPT ); Thu, 31 Aug 2023 02:22:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44164 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343945AbjHaGWH (ORCPT ); Thu, 31 Aug 2023 02:22:07 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6FD3410C4 for ; Wed, 30 Aug 2023 23:21:41 -0700 (PDT) Received: (qmail 21160 invoked by uid 109); 31 Aug 2023 06:21:41 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:21:41 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3012 invoked by uid 111); 31 Aug 2023 06:21:42 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:21:42 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:21:40 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 06/10] traverse_trees(): respect max_allowed_tree_depth Message-ID: <20230831062140.GF3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The tree-walk.c code walks trees recursively, and may run out of stack space. The easiest way to see this is with git-archive; on my 64-bit Linux system it runs out of stack trying to generate a tarfile with a tree depth of 13,772. I've picked 4100 as the depth for our "big" test. I ran it with a much higher value to confirm that we do get a segfault without this patch. But really anything over 4096 is sufficient for its stated purpose, which is to find out if our default limit of 4096 is low enough to prevent segfaults on all platforms. Keeping it small saves us time on the test setup. The tree-walk code that's touched here underlies unpack_trees(), so this protects any programs which use it, not just git-archive (but archive is easy to test, and was what alerted me to this issue in a real-world case). Signed-off-by: Jeff King --- t/t6700-tree-depth.sh | 66 +++++++++++++++++++++++++++++++++++++++++++ tree-walk.c | 4 +++ 2 files changed, 70 insertions(+) create mode 100755 t/t6700-tree-depth.sh diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh new file mode 100755 index 0000000000..d4d17db2ae --- /dev/null +++ b/t/t6700-tree-depth.sh @@ -0,0 +1,66 @@ +#!/bin/sh + +test_description='handling of deep trees in various commands' +. ./test-lib.sh + +# We'll test against two depths here: a small one that will let us check the +# behavior of the config setting easily, and a large one that should be +# forbidden by default. Testing the default depth will let us know whether our +# default is enough to prevent segfaults on systems that run the tests. +small_depth=50 +big_depth=4100 + +small_ok="-c core.maxtreedepth=$small_depth" +small_no="-c core.maxtreedepth=$((small_depth-1))" + +# usage: mkdeep +# Create a tag containing a file whose path has depth . +# +# We'll use fast-import here for two reasons: +# +# 1. It's faster than creating $big_depth tree objects. +# +# 2. As we tighten tree limits, it's more likely to allow large sizes +# than trying to stuff a deep path into the index. +mkdeep () { + { + echo "commit refs/tags/$1" && + echo "committer foo 1234 -0000" && + echo "data </dev/null && + test_must_fail git $small_no archive small >/dev/null +' + +test_expect_success 'default limit for git-archive fails gracefully' ' + test_must_fail git archive big >/dev/null +' + +test_done diff --git a/tree-walk.c b/tree-walk.c index 4efd0fc391..b517792ba2 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -9,6 +9,7 @@ #include "tree.h" #include "pathspec.h" #include "json-writer.h" +#include "environment.h" static const char *get_mode(const char *str, unsigned int *modep) { @@ -449,6 +450,9 @@ int traverse_trees(struct index_state *istate, int interesting = 1; char *traverse_path; + if (traverse_trees_cur_depth > max_allowed_tree_depth) + return error("exceeded maximum allowed tree depth"); + traverse_trees_count++; traverse_trees_cur_depth++; From patchwork Thu Aug 31 06:21:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370968 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43D00C83F29 for ; Thu, 31 Aug 2023 06:22:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243081AbjHaGW4 (ORCPT ); Thu, 31 Aug 2023 02:22:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241088AbjHaGWp (ORCPT ); Thu, 31 Aug 2023 02:22:45 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 522A4E4A for ; Wed, 30 Aug 2023 23:22:06 -0700 (PDT) Received: (qmail 21163 invoked by uid 109); 31 Aug 2023 06:21:56 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:21:56 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3015 invoked by uid 111); 31 Aug 2023 06:21:57 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:21:57 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:21:55 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 07/10] read_tree(): respect max_allowed_tree_depth Message-ID: <20230831062155.GG3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The read_tree() function reads trees recursively (via its read_tree_at() helper). This can cause it to run out of stack space on very deep trees. Let's teach it about the new core.maxTreeDepth option. The easiest way to demonstrate this is via "ls-tree -r", which the test covers. Note that I needed a tree depth of ~30k to trigger a segfault on my Linux system, not the 4100 used by our "big" test in t6700. However, that test still tells us what we want: that the default 4096 limit is enough to prevent segfaults on all platforms. We could bump it, but that increases the cost of the test setup for little gain. As an interesting side-note: when I originally wrote this patch about 4 years ago, I needed a depth of ~50k to segfault. But porting it forward, the number is much lower. Seemingly little things like cf0983213c (hash: add an algo member to struct object_id, 2021-04-26) take it from 32,722 to 29,080. Signed-off-by: Jeff King --- sparse-index.c | 2 +- t/t6700-tree-depth.sh | 9 +++++++++ tree.c | 9 +++++++-- tree.h | 1 + wt-status.c | 2 +- 5 files changed, 19 insertions(+), 4 deletions(-) diff --git a/sparse-index.c b/sparse-index.c index 1fdb07a9e6..3578feb283 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -391,7 +391,7 @@ void expand_index(struct index_state *istate, struct pattern_list *pl) strbuf_setlen(&base, 0); strbuf_add(&base, ce->name, strlen(ce->name)); - read_tree_at(istate->repo, tree, &base, &ps, + read_tree_at(istate->repo, tree, &base, 0, &ps, add_path_to_index, &ctx); /* free directory entries. full entries are re-used */ diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh index d4d17db2ae..93ec41df03 100755 --- a/t/t6700-tree-depth.sh +++ b/t/t6700-tree-depth.sh @@ -63,4 +63,13 @@ test_expect_success 'default limit for git-archive fails gracefully' ' test_must_fail git archive big >/dev/null ' +test_expect_success 'limit recursion of ls-tree -r' ' + git $small_ok ls-tree -r small && + test_must_fail git $small_no ls-tree -r small +' + +test_expect_success 'default limit for ls-tree fails gracefully' ' + test_must_fail git ls-tree -r big >/dev/null +' + test_done diff --git a/tree.c b/tree.c index c745462f96..990f9c9854 100644 --- a/tree.c +++ b/tree.c @@ -10,11 +10,13 @@ #include "alloc.h" #include "tree-walk.h" #include "repository.h" +#include "environment.h" const char *tree_type = "tree"; int read_tree_at(struct repository *r, struct tree *tree, struct strbuf *base, + int depth, const struct pathspec *pathspec, read_tree_fn_t fn, void *context) { @@ -24,6 +26,9 @@ int read_tree_at(struct repository *r, int len, oldlen = base->len; enum interesting retval = entry_not_interesting; + if (depth > max_allowed_tree_depth) + return error("exceeded maximum allowed tree depth"); + if (parse_tree(tree)) return -1; @@ -74,7 +79,7 @@ int read_tree_at(struct repository *r, strbuf_add(base, entry.path, len); strbuf_addch(base, '/'); retval = read_tree_at(r, lookup_tree(r, &oid), - base, pathspec, + base, depth + 1, pathspec, fn, context); strbuf_setlen(base, oldlen); if (retval) @@ -89,7 +94,7 @@ int read_tree(struct repository *r, read_tree_fn_t fn, void *context) { struct strbuf sb = STRBUF_INIT; - int ret = read_tree_at(r, tree, &sb, pathspec, fn, context); + int ret = read_tree_at(r, tree, &sb, 0, pathspec, fn, context); strbuf_release(&sb); return ret; } diff --git a/tree.h b/tree.h index 1b5ecbda6b..cc6ddf51b3 100644 --- a/tree.h +++ b/tree.h @@ -44,6 +44,7 @@ typedef int (*read_tree_fn_t)(const struct object_id *, struct strbuf *, const c int read_tree_at(struct repository *r, struct tree *tree, struct strbuf *base, + int depth, const struct pathspec *pathspec, read_tree_fn_t fn, void *context); diff --git a/wt-status.c b/wt-status.c index 5b1378965c..996f8635c3 100644 --- a/wt-status.c +++ b/wt-status.c @@ -739,7 +739,7 @@ static void wt_status_collect_changes_initial(struct wt_status *s) ps.max_depth = -1; strbuf_add(&base, ce->name, ce->ce_namelen); - read_tree_at(istate->repo, tree, &base, &ps, + read_tree_at(istate->repo, tree, &base, 0, &ps, add_file_to_list, s); continue; } From patchwork Thu Aug 31 06:22:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370969 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDA11C83F10 for ; Thu, 31 Aug 2023 06:23:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343945AbjHaGXE (ORCPT ); Thu, 31 Aug 2023 02:23:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44182 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345071AbjHaGW6 (ORCPT ); Thu, 31 Aug 2023 02:22:58 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC99DE67 for ; Wed, 30 Aug 2023 23:22:16 -0700 (PDT) Received: (qmail 21166 invoked by uid 109); 31 Aug 2023 06:22:04 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:22:04 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3018 invoked by uid 111); 31 Aug 2023 06:22:05 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:22:05 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:22:03 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 08/10] list-objects: respect max_allowed_tree_depth Message-ID: <20230831062203.GH3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The tree traversal in list-objects.c, which is used by "rev-list --objects", etc, uses recursion and may run out of stack space. Let's teach it about the new core.maxTreeDepth config option. We unfortunately can't return an error here, as this code doesn't produce an error return at all. We'll die() instead, which matches the behavior when we see an otherwise broken tree. Note that this will also generally reject such deep trees from entering the repository from a fetch or push, due to the use of rev-list in the connectivity check. But it's not foolproof! We stop traversing when we see an UNINTERESTING object, and the connectivity check marks existing ref tips as UNINTERESTING. So imagine commit X has a tree with maximum depth N. If you then create a new commit Y with a tree entry "Y:subdir" that points to "X^{tree}", then the depth of Y will be N+1. But a connectivity check running "git rev-list --objects Y --not X" won't realize that; it will stop traversing at X^{tree}, since that was already reachable. So this will stop naive pushes of too-deep trees, but not carefully crafted malicious ones. Doing it robustly and efficiently would require caching the maximum depth of each tree (i.e., the longest path to any leaf entry). That's much more complex and not strictly needed. If each recursive algorithm limits itself already, then that's sufficient. Blocking the objects from entering the repo would be a nice belt-and-suspenders addition, but it's not worth the extra cost. Signed-off-by: Jeff King --- list-objects.c | 8 ++++++++ t/t6700-tree-depth.sh | 9 +++++++++ 2 files changed, 17 insertions(+) diff --git a/list-objects.c b/list-objects.c index e60a6cd5b4..c25c72b32c 100644 --- a/list-objects.c +++ b/list-objects.c @@ -14,13 +14,15 @@ #include "packfile.h" #include "object-store-ll.h" #include "trace.h" +#include "environment.h" struct traversal_context { struct rev_info *revs; show_object_fn show_object; show_commit_fn show_commit; void *show_data; struct filter *filter; + int depth; }; static void show_commit(struct traversal_context *ctx, @@ -118,7 +120,9 @@ static void process_tree_contents(struct traversal_context *ctx, entry.path, oid_to_hex(&tree->object.oid)); } t->object.flags |= NOT_USER_GIVEN; + ctx->depth++; process_tree(ctx, t, base, entry.path); + ctx->depth--; } else if (S_ISGITLINK(entry.mode)) ; /* ignore gitlink */ @@ -156,6 +160,9 @@ static void process_tree(struct traversal_context *ctx, !revs->include_check_obj(&tree->object, revs->include_check_data)) return; + if (ctx->depth > max_allowed_tree_depth) + die("exceeded maximum allowed tree depth"); + failed_parse = parse_tree_gently(tree, 1); if (failed_parse) { if (revs->ignore_missing_links) @@ -349,6 +356,7 @@ static void traverse_non_commits(struct traversal_context *ctx, if (!path) path = ""; if (obj->type == OBJ_TREE) { + ctx->depth = 0; process_tree(ctx, (struct tree *)obj, base, path); continue; } diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh index 93ec41df03..f5d284b16e 100755 --- a/t/t6700-tree-depth.sh +++ b/t/t6700-tree-depth.sh @@ -72,4 +72,13 @@ test_expect_success 'default limit for ls-tree fails gracefully' ' test_must_fail git ls-tree -r big >/dev/null ' +test_expect_success 'limit recursion of rev-list --objects' ' + git $small_ok rev-list --objects small >/dev/null && + test_must_fail git $small_no rev-list --objects small >/dev/null +' + +test_expect_success 'default limit for rev-list fails gracefully' ' + test_must_fail git rev-list --objects big >/dev/null +' + test_done From patchwork Thu Aug 31 06:22:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95D7CC83F10 for ; Thu, 31 Aug 2023 06:23:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344865AbjHaGXN (ORCPT ); Thu, 31 Aug 2023 02:23:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34304 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238278AbjHaGXI (ORCPT ); Thu, 31 Aug 2023 02:23:08 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D19BCFE for ; Wed, 30 Aug 2023 23:22:27 -0700 (PDT) Received: (qmail 21169 invoked by uid 109); 31 Aug 2023 06:22:10 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:22:10 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3021 invoked by uid 111); 31 Aug 2023 06:22:11 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:22:11 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:22:08 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 09/10] tree-diff: respect max_allowed_tree_depth Message-ID: <20230831062208.GI3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When diffing trees, we recurse to handle subtrees. That means we may run out of stack space and segfault. Let's teach this code path about core.maxTreeDepth in order to fail more gracefully. As with the previous patch, we have no way to return an error (and other tree-loading problems would just cause us to die()). So we'll likewise call die() if we exceed the maximum depth. Signed-off-by: Jeff King --- t/t6700-tree-depth.sh | 9 +++++++++ tree-diff.c | 23 +++++++++++++++-------- 2 files changed, 24 insertions(+), 8 deletions(-) diff --git a/t/t6700-tree-depth.sh b/t/t6700-tree-depth.sh index f5d284b16e..e410c41234 100755 --- a/t/t6700-tree-depth.sh +++ b/t/t6700-tree-depth.sh @@ -81,4 +81,13 @@ test_expect_success 'default limit for rev-list fails gracefully' ' test_must_fail git rev-list --objects big >/dev/null ' +test_expect_success 'limit recursion of diff-tree -r' ' + git $small_ok diff-tree -r $EMPTY_TREE small && + test_must_fail git $small_no diff-tree -r $EMPTY_TREE small +' + +test_expect_success 'default limit for diff-tree fails gracefully' ' + test_must_fail git diff-tree -r $EMPTY_TREE big +' + test_done diff --git a/tree-diff.c b/tree-diff.c index 8fc159b86e..46107772d1 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -7,6 +7,7 @@ #include "hash.h" #include "tree.h" #include "tree-walk.h" +#include "environment.h" /* * Some mode bits are also used internally for computations. @@ -45,7 +46,8 @@ static struct combine_diff_path *ll_diff_tree_paths( struct combine_diff_path *p, const struct object_id *oid, const struct object_id **parents_oid, int nparent, - struct strbuf *base, struct diff_options *opt); + struct strbuf *base, struct diff_options *opt, + int depth); static void ll_diff_tree_oid(const struct object_id *old_oid, const struct object_id *new_oid, struct strbuf *base, struct diff_options *opt); @@ -196,7 +198,7 @@ static struct combine_diff_path *path_appendnew(struct combine_diff_path *last, static struct combine_diff_path *emit_path(struct combine_diff_path *p, struct strbuf *base, struct diff_options *opt, int nparent, struct tree_desc *t, struct tree_desc *tp, - int imin) + int imin, int depth) { unsigned short mode; const char *path; @@ -302,7 +304,8 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p, strbuf_add(base, path, pathlen); strbuf_addch(base, '/'); - p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); + p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt, + depth + 1); FAST_ARRAY_FREE(parents_oid, nparent); } @@ -423,12 +426,16 @@ static inline void update_tp_entries(struct tree_desc *tp, int nparent) static struct combine_diff_path *ll_diff_tree_paths( struct combine_diff_path *p, const struct object_id *oid, const struct object_id **parents_oid, int nparent, - struct strbuf *base, struct diff_options *opt) + struct strbuf *base, struct diff_options *opt, + int depth) { struct tree_desc t, *tp; void *ttree, **tptree; int i; + if (depth > max_allowed_tree_depth) + die("exceeded maximum allowed tree depth"); + FAST_ARRAY_ALLOC(tp, nparent); FAST_ARRAY_ALLOC(tptree, nparent); @@ -522,7 +529,7 @@ static struct combine_diff_path *ll_diff_tree_paths( /* D += {δ(t,pi) if pi=p[imin]; "+a" if pi > p[imin]} */ p = emit_path(p, base, opt, nparent, - &t, tp, imin); + &t, tp, imin, depth); skip_emit_t_tp: /* t↓, ∀ pi=p[imin] pi↓ */ @@ -534,7 +541,7 @@ static struct combine_diff_path *ll_diff_tree_paths( else if (cmp < 0) { /* D += "+t" */ p = emit_path(p, base, opt, nparent, - &t, /*tp=*/NULL, -1); + &t, /*tp=*/NULL, -1, depth); /* t↓ */ update_tree_entry(&t); @@ -550,7 +557,7 @@ static struct combine_diff_path *ll_diff_tree_paths( } p = emit_path(p, base, opt, nparent, - /*t=*/NULL, tp, imin); + /*t=*/NULL, tp, imin, depth); skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ @@ -572,7 +579,7 @@ struct combine_diff_path *diff_tree_paths( const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt) { - p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); + p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt, 0); /* * free pre-allocated last element, if any From patchwork Thu Aug 31 06:23:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 13370999 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2621CC83F10 for ; Thu, 31 Aug 2023 06:25:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241483AbjHaGZf (ORCPT ); Thu, 31 Aug 2023 02:25:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234884AbjHaGZe (ORCPT ); Thu, 31 Aug 2023 02:25:34 -0400 Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F0EAAE49 for ; Wed, 30 Aug 2023 23:25:02 -0700 (PDT) Received: (qmail 21173 invoked by uid 109); 31 Aug 2023 06:23:22 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Thu, 31 Aug 2023 06:23:22 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3027 invoked by uid 111); 31 Aug 2023 06:23:23 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Thu, 31 Aug 2023 02:23:23 -0400 Authentication-Results: peff.net; auth=none Date: Thu, 31 Aug 2023 02:23:20 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 10/10] lower core.maxTreeDepth default to 2048 Message-ID: <20230831062320.GJ3185325@coredump.intra.peff.net> References: <20230831061735.GA2702156@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20230831061735.GA2702156@coredump.intra.peff.net> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On my Linux system, all of our recursive tree walking algorithms can run up to the 4096 default limit without segfaulting. But not all platforms will have stack sizes as generous (nor might even Linux if we kick off a recursive walk within a thread). In particular, several of the tests added in the previous few commits fail in our Windows CI environment. Through some guess-and-check pushing, I found that 3072 is still too much, but 2048 is OK. These are obviously vague heuristics, and there is nothing to promise that another system might not have trouble at even lower values. But it seems unlikely anybody will be too angry about a 2048-depth limit (this is close to the default max-pathname limit on Linux even for a pathological path like "a/a/a/..."). So let's just lower it. Some alternatives are: - configure separate defaults for Windows versus other platforms. - just skip the tests on Windows. This leaves Windows users with the annoying case that they can be crashed by running out of stack space, but there shouldn't be any security implications (they can't go deep enough to hit integer overflow problems). Since the original default was arbitrary, it seems less confusing to just lower it, keeping behavior consistent across platforms. Signed-off-by: Jeff King --- Arguably this could be squashed into patch 5. But I thought that following the sequence of logic (from "4096 is probably OK" to "whoops, it's not") had some value to share. AFAIK GitHub is still running with the 4096 limit; I discovered the Windows issue only while preparing this for upstream. But I still find it highly unlikely that somebody would run into it in practice. environment.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment.c b/environment.c index 8e25b5ef02..bb3c2a96a3 100644 --- a/environment.c +++ b/environment.c @@ -81,7 +81,7 @@ int merge_log_config = -1; int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ unsigned long pack_size_limit_cfg; enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET; -int max_allowed_tree_depth = 4096; +int max_allowed_tree_depth = 2048; #ifndef PROTECT_HFS_DEFAULT #define PROTECT_HFS_DEFAULT 0