diff mbox series

[1/2] fast-import: duplicate parsed encoding string

Message ID 20190825080821.GA31824@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series fast-import input string handling bugs | expand

Commit Message

Jeff King Aug. 25, 2019, 8:08 a.m. UTC
We read each line of the fast-import stream into the command_buf strbuf.
When reading a commit, we parse a line like "encoding foo" by storing a
pointer to "foo", but not making a copy. We may then read an unbounded
number of other lines (e.g., one for each modified file in the commit),
each of which writes into command_buf.

This works out in practice for small cases, because we hand off
ownership of the heap buffer from command_buf to the cmd_hist array, and
read new commands into a fresh heap buffer. And thus the pointer to
"foo" remains valid as long as there aren't so many intermediate lines
that we end up dropping the original "encoding" line from the history.

But as the test modification shows, if we go over our default of 100
lines, we end up with our encoding string pointing into freed heap
memory. This seems to fail reliably by writing garbage into the output,
but running under ASan definitely detects this as a user-after-free.

We can fix it by duplicating the encoding value, just as we do for other
parsed lines (e.g., an author line ends up in parse_ident, which copies
it to a new string).

Signed-off-by: Jeff King <peff@peff.net>
---
 fast-import.c          | 7 +++++--
 t/t9300-fast-import.sh | 5 +++++
 2 files changed, 10 insertions(+), 2 deletions(-)

Comments

Elijah Newren Aug. 26, 2019, 6:28 p.m. UTC | #1
On Sun, Aug 25, 2019 at 1:08 AM Jeff King <peff@peff.net> wrote:
>
> We read each line of the fast-import stream into the command_buf strbuf.
> When reading a commit, we parse a line like "encoding foo" by storing a
> pointer to "foo", but not making a copy. We may then read an unbounded
> number of other lines (e.g., one for each modified file in the commit),
> each of which writes into command_buf.
>
> This works out in practice for small cases, because we hand off
> ownership of the heap buffer from command_buf to the cmd_hist array, and
> read new commands into a fresh heap buffer. And thus the pointer to
> "foo" remains valid as long as there aren't so many intermediate lines
> that we end up dropping the original "encoding" line from the history.
>
> But as the test modification shows, if we go over our default of 100
> lines, we end up with our encoding string pointing into freed heap
> memory. This seems to fail reliably by writing garbage into the output,
> but running under ASan definitely detects this as a user-after-free.

s/user-after-free/use-after-free/

> We can fix it by duplicating the encoding value, just as we do for other
> parsed lines (e.g., an author line ends up in parse_ident, which copies
> it to a new string).

Eek!  Thanks for fixing this up for me; patch looks good.
Jeff King Aug. 26, 2019, 6:44 p.m. UTC | #2
On Mon, Aug 26, 2019 at 11:28:58AM -0700, Elijah Newren wrote:

> On Sun, Aug 25, 2019 at 1:08 AM Jeff King <peff@peff.net> wrote:
> >
> > We read each line of the fast-import stream into the command_buf strbuf.
> > When reading a commit, we parse a line like "encoding foo" by storing a
> > pointer to "foo", but not making a copy. We may then read an unbounded
> > number of other lines (e.g., one for each modified file in the commit),
> > each of which writes into command_buf.
> >
> > This works out in practice for small cases, because we hand off
> > ownership of the heap buffer from command_buf to the cmd_hist array, and
> > read new commands into a fresh heap buffer. And thus the pointer to
> > "foo" remains valid as long as there aren't so many intermediate lines
> > that we end up dropping the original "encoding" line from the history.
> >
> > But as the test modification shows, if we go over our default of 100
> > lines, we end up with our encoding string pointing into freed heap
> > memory. This seems to fail reliably by writing garbage into the output,
> > but running under ASan definitely detects this as a user-after-free.
> 
> s/user-after-free/use-after-free/

Wow. I self-corrected "user-after-free" at least three other times while
writing this thread. I clearly have a problem. :)

> > We can fix it by duplicating the encoding value, just as we do for other
> > parsed lines (e.g., an author line ends up in parse_ident, which copies
> > it to a new string).
> 
> Eek!  Thanks for fixing this up for me; patch looks good.

No problem. On the plus side, finding your bug made me think much harder
about the implications of patch 2 (because it's quite subtle and an easy
mistake to make).

-Peff
diff mbox series

Patch

diff --git a/fast-import.c b/fast-import.c
index b44d6a467e..ee7258037a 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2588,7 +2588,7 @@  static void parse_new_commit(const char *arg)
 	struct branch *b;
 	char *author = NULL;
 	char *committer = NULL;
-	const char *encoding = NULL;
+	char *encoding = NULL;
 	struct hash_list *merge_list = NULL;
 	unsigned int merge_count;
 	unsigned char prev_fanout, new_fanout;
@@ -2611,8 +2611,10 @@  static void parse_new_commit(const char *arg)
 	}
 	if (!committer)
 		die("Expected committer but didn't get one");
-	if (skip_prefix(command_buf.buf, "encoding ", &encoding))
+	if (skip_prefix(command_buf.buf, "encoding ", &v)) {
+		encoding = xstrdup(v);
 		read_next_command();
+	}
 	parse_data(&msg, 0, NULL);
 	read_next_command();
 	parse_from(b);
@@ -2686,6 +2688,7 @@  static void parse_new_commit(const char *arg)
 	strbuf_addbuf(&new_data, &msg);
 	free(author);
 	free(committer);
+	free(encoding);
 
 	if (!store_object(OBJ_COMMIT, &new_data, NULL, &b->oid, next_mark))
 		b->pack_id = pack_id;
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 141b7fa35e..cf66b40ebc 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3314,6 +3314,11 @@  test_expect_success 'X: handling encoding' '
 
 	printf "Pi: \360\nCOMMIT\n" >>input &&
 
+	for i in $(test_seq 100)
+	do
+		echo "M 644 $EMPTY_BLOB file-$i"
+	done >>input &&
+
 	git fast-import <input &&
 	git cat-file -p encoding | grep $(printf "\360") &&
 	git log -1 --format=%B encoding | grep $(printf "\317\200")