From patchwork Sun Aug 25 08:08:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 11113263 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BC17213B1 for ; Sun, 25 Aug 2019 08:08:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A1EE7217D7 for ; Sun, 25 Aug 2019 08:08:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726744AbfHYIIX (ORCPT ); Sun, 25 Aug 2019 04:08:23 -0400 Received: from cloud.peff.net ([104.130.231.41]:54814 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1725792AbfHYIIX (ORCPT ); Sun, 25 Aug 2019 04:08:23 -0400 Received: (qmail 11785 invoked by uid 109); 25 Aug 2019 08:08:23 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Sun, 25 Aug 2019 08:08:22 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 10122 invoked by uid 111); 25 Aug 2019 08:09:42 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Sun, 25 Aug 2019 04:09:42 -0400 Authentication-Results: peff.net; auth=none Date: Sun, 25 Aug 2019 04:08:21 -0400 From: Jeff King To: Mike Hommey Cc: Elijah Newren , git@vger.kernel.org, gitster@pobox.com Subject: [PATCH 1/2] fast-import: duplicate parsed encoding string Message-ID: <20190825080821.GA31824@sigill.intra.peff.net> References: <20190825080640.GA31453@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20190825080640.GA31453@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org We read each line of the fast-import stream into the command_buf strbuf. When reading a commit, we parse a line like "encoding foo" by storing a pointer to "foo", but not making a copy. We may then read an unbounded number of other lines (e.g., one for each modified file in the commit), each of which writes into command_buf. This works out in practice for small cases, because we hand off ownership of the heap buffer from command_buf to the cmd_hist array, and read new commands into a fresh heap buffer. And thus the pointer to "foo" remains valid as long as there aren't so many intermediate lines that we end up dropping the original "encoding" line from the history. But as the test modification shows, if we go over our default of 100 lines, we end up with our encoding string pointing into freed heap memory. This seems to fail reliably by writing garbage into the output, but running under ASan definitely detects this as a user-after-free. We can fix it by duplicating the encoding value, just as we do for other parsed lines (e.g., an author line ends up in parse_ident, which copies it to a new string). Signed-off-by: Jeff King --- fast-import.c | 7 +++++-- t/t9300-fast-import.sh | 5 +++++ 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fast-import.c b/fast-import.c index b44d6a467e..ee7258037a 100644 --- a/fast-import.c +++ b/fast-import.c @@ -2588,7 +2588,7 @@ static void parse_new_commit(const char *arg) struct branch *b; char *author = NULL; char *committer = NULL; - const char *encoding = NULL; + char *encoding = NULL; struct hash_list *merge_list = NULL; unsigned int merge_count; unsigned char prev_fanout, new_fanout; @@ -2611,8 +2611,10 @@ static void parse_new_commit(const char *arg) } if (!committer) die("Expected committer but didn't get one"); - if (skip_prefix(command_buf.buf, "encoding ", &encoding)) + if (skip_prefix(command_buf.buf, "encoding ", &v)) { + encoding = xstrdup(v); read_next_command(); + } parse_data(&msg, 0, NULL); read_next_command(); parse_from(b); @@ -2686,6 +2688,7 @@ static void parse_new_commit(const char *arg) strbuf_addbuf(&new_data, &msg); free(author); free(committer); + free(encoding); if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark)) b->pack_id = pack_id; diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index 141b7fa35e..cf66b40ebc 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -3314,6 +3314,11 @@ test_expect_success 'X: handling encoding' ' printf "Pi: \360\nCOMMIT\n" >>input && + for i in $(test_seq 100) + do + echo "M 644 $EMPTY_BLOB file-$i" + done >>input && + git fast-import X-Patchwork-Id: 11113265 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 27B1813B1 for ; Sun, 25 Aug 2019 08:10:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0E720217D7 for ; Sun, 25 Aug 2019 08:10:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726676AbfHYIK5 (ORCPT ); Sun, 25 Aug 2019 04:10:57 -0400 Received: from cloud.peff.net ([104.130.231.41]:54830 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1725995AbfHYIK4 (ORCPT ); Sun, 25 Aug 2019 04:10:56 -0400 Received: (qmail 11900 invoked by uid 109); 25 Aug 2019 08:10:56 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Sun, 25 Aug 2019 08:10:56 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 10157 invoked by uid 111); 25 Aug 2019 08:12:15 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Sun, 25 Aug 2019 04:12:15 -0400 Authentication-Results: peff.net; auth=none Date: Sun, 25 Aug 2019 04:10:55 -0400 From: Jeff King To: Mike Hommey Cc: Elijah Newren , git@vger.kernel.org, gitster@pobox.com Subject: [PATCH 2/2] fast-import: duplicate into history rather than passing ownership Message-ID: <20190825081055.GB31824@sigill.intra.peff.net> References: <20190825080640.GA31453@sigill.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20190825080640.GA31453@sigill.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Fast-import's read_next_command() has somewhat odd memory ownership semantics for the command_buf strbuf. After reading a command, we copy the strbuf's pointer (without duplicating the string) into our cmd_hist array of recent commands. And then when we're about to read a new command, we clear the strbuf by calling strbuf_detach(), dropping ownership from the strbuf (leaving the cmd_hist reference as the remaining owner). This has a few surprising implications: - if the strbuf hasn't been copied into cmd_hist (e.g., because we haven't ready any commands yet), then the strbuf_detach() will leak the resulting string - any modification to command_buf risks invalidating the pointer held by cmd_hist. There doesn't seem to be any way to trigger this currently (since we tend to modify it only by detaching and reading in a new value), but it's subtly dangerous. - any pointers into an input string will remain valid as long as cmd_hist points to them. So in general, you can point into command_buf.buf and call read_next_command() up to 100 times before your string is cycled out and freed, leaving you with a dangling pointer. This makes it easy to miss bugs during testing, as they might trigger only for a sufficiently large commit (e.g., the bug fixed in the previous commit). Instead, let's make a new string to copy the command into the history array, rather than having dual ownership with the old. Then we can drop the strbuf_detach() calls entirely, and just reuse the same buffer within command_buf over and over. We'd normally have to strbuf_reset() it before using it again, but in both cases here we're using strbuf_getline(), which does it automatically for us. This fixes the leak, and it means that even a single call to read_next_command() will invalidate any held pointers, making it easier to find bugs. In fact, we can drop the extra input lines added to the test case by the previous commit, as the unfixed bug would now trigger just from reading the commit message, even without any modified files in the commit. Reported-by: Mike Hommey Signed-off-by: Jeff King --- fast-import.c | 4 +--- t/t9300-fast-import.sh | 5 ----- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/fast-import.c b/fast-import.c index ee7258037a..1f9160b645 100644 --- a/fast-import.c +++ b/fast-import.c @@ -1763,7 +1763,6 @@ static int read_next_command(void) } else { struct recent_command *rc; - strbuf_detach(&command_buf, NULL); stdin_eof = strbuf_getline_lf(&command_buf, stdin); if (stdin_eof) return EOF; @@ -1784,7 +1783,7 @@ static int read_next_command(void) free(rc->buf); } - rc->buf = command_buf.buf; + rc->buf = xstrdup(command_buf.buf); rc->prev = cmd_tail; rc->next = cmd_hist.prev; rc->prev->next = rc; @@ -1833,7 +1832,6 @@ static int parse_data(struct strbuf *sb, uintmax_t limit, uintmax_t *len_res) char *term = xstrdup(data); size_t term_len = command_buf.len - (data - command_buf.buf); - strbuf_detach(&command_buf, NULL); for (;;) { if (strbuf_getline_lf(&command_buf, stdin) == EOF) die("EOF in data (terminator '%s' not found)", term); diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index cf66b40ebc..141b7fa35e 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -3314,11 +3314,6 @@ test_expect_success 'X: handling encoding' ' printf "Pi: \360\nCOMMIT\n" >>input && - for i in $(test_seq 100) - do - echo "M 644 $EMPTY_BLOB file-$i" - done >>input && - git fast-import