[02/10] fast-export: use xmemdupz() for anonymizing oids

Message ID	20200623152449.GB1435482@coredump.intra.peff.net (mailing list archive)
State	New, archived
Headers	Return-Path: <SRS0=IiYM=AE=vger.kernel.org=git-owner@kernel.org>
	Date: Tue, 23 Jun 2020 11:24:49 -0400
	From: Jeff King <peff@peff.net>
	To: git@vger.kernel.org
	Cc: Eric Sunshine <sunshine@sunshineco.com>,
	    Junio C Hamano <gitster@pobox.com>,
	    Johannes Schindelin <Johannes.Schindelin@gmx.de>
	Subject: [PATCH 02/10] fast-export: use xmemdupz() for anonymizing oids
	Message-ID: <20200623152449.GB1435482@coredump.intra.peff.net>
	References: <20200623152436.GA50925@coredump.intra.peff.net>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=utf-8
	Content-Disposition: inline
	In-Reply-To: <20200623152436.GA50925@coredump.intra.peff.net>
	Sender: git-owner@vger.kernel.org
	Precedence: bulk
Series	fast-export: allow seeding the anonymized mapping
	[alternative,0/10] fast-export: allow seeding the anonymized mapping
	[01/10] t9351: derive anonymized tree checks from original repo
	[02/10] fast-export: use xmemdupz() for anonymizing oids
	[03/10] fast-export: store anonymized oids as hex strings
	[04/10] fast-export: tighten anonymize_mem() interface to handle only strings
	[05/10] fast-export: stop storing lengths in anonymized hashmaps
	[06/10] fast-export: use a flex array to store anonymized entries
	[07/10] fast-export: move global "idents" anonymize hashmap into function
	[08/10] fast-export: add a "data" callback parameter to anonymize_str()
	[09/10] fast-export: allow seeding the anonymized mapping
	[10/10] fast-export: anonymize "master" refname

Commit Message

Jeff King June 23, 2020, 3:24 p.m. UTC

Our anonymize_mem() function is careful to take a ptr/len pair to allow
storing binary tokens like object ids, as well as partial strings (e.g.,
just "foo" of "foo/bar"). But it duplicates the hash key using
xstrdup()! That means that:

  - for a partial string, we'd store all bytes up to the NUL, even
    though we'd never look at anything past "len". This didn't produce
    wrong behavior, but was wasteful.

  - for a binary oid that doesn't contain a zero byte, we'd copy garbage
    bytes off the end of the array (though as long as nothing complained
    about reading uninitialized bytes, further reads would be limited by
    "len", and we'd produce the correct results).

  - for a binary oid that does contain a zero byte, we'd copy _fewer_
    bytes than intended into the hashmap struct. When we later try to
    look up a value, we'd access uninitialized memory and potentially
    falsely claim that a particular oid is not present.
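
To make the cases above concrete, here is a minimal standalone sketch
(not code from fast-export.c) that counts how many bytes a
strdup()-style copy takes compared with the requested "len". The helper
names are hypothetical and exist only for illustration. The oid buffer
carries one extra trailing zero byte so that strlen() stays well-defined;
the case of an oid with no zero byte is deliberately not exercised,
since it would read past the end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Number of payload bytes a strdup()-style copy duplicates: everything
 * up to (not including) the first NUL, regardless of the caller's "len".
 */
static size_t strdup_copy_len(const char *key)
{
	assert(key);
	return strlen(key);
}

/* Partial string: the caller wants only "foo" (len 3) out of "foo/bar". */
static size_t wasted_bytes_for_partial_string(void)
{
	const char *path = "foo/bar";
	size_t len = 3;

	return strdup_copy_len(path) - len;
}

/*
 * Binary oid containing a zero byte: 0xa0 followed by zeros, like the
 * fake gitlink oid in the test. Only the first 20 bytes are the oid;
 * the 21st byte is padding to keep strlen() in bounds.
 */
static size_t missing_bytes_for_oid_with_nul(void)
{
	unsigned char oid[21] = { 0xa0 };
	size_t len = 20;

	return len - strdup_copy_len((const char *)oid);
}
```

Under these assumptions, the partial string copies 4 bytes more than
needed, while the oid copy stops after its first byte and leaves the
remaining 19 bytes of the key out of the stored copy.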

The most common reason to store an oid is an anonymized gitlink, but our
test case doesn't have any gitlinks at all. So let's add one whose oid
contains a NUL and is present at two different paths. ASan catches the
memory error, but even without it we can detect the bug because the oid
is not anonymized the same way for both paths.

And of course the fix is to copy the correct number of bytes. We don't
technically need the appended NUL from xmemdupz(), but it doesn't hurt
as an extra protection against anybody treating it like a string (plus a
future patch will push us more in that direction).
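
As a rough illustration of the fix, the sketch below shows a
length-aware copy that also appends a NUL, plus a ptr/len comparison of
the kind a lookup would rely on. memdupz_sketch(), same_key() and
oid_round_trips() are hypothetical names for this example, not git's
helpers; in particular the sketch returns NULL on allocation failure,
unlike git's x-prefixed allocators, which die instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a memdupz-style helper: copy exactly "len"
 * bytes and append a terminating NUL. Returns NULL if malloc() fails.
 */
static void *memdupz_sketch(const void *data, size_t len)
{
	char *copy = malloc(len + 1);

	if (!copy)
		return NULL;
	memcpy(copy, data, len);
	copy[len] = '\0';
	return copy;
}

/* Length-aware key comparison; embedded zero bytes are compared too. */
static int same_key(const void *a, size_t a_len,
		    const void *b, size_t b_len)
{
	return a_len == b_len && !memcmp(a, b, a_len);
}

/*
 * Returns 1 if a stored copy of a 20-byte id with embedded zero bytes
 * still matches the original and is followed by the extra NUL.
 */
static int oid_round_trips(void)
{
	unsigned char oid[20] = { 0xa0 };
	unsigned char *stored = memdupz_sketch(oid, sizeof(oid));
	int ok;

	if (!stored)
		return 0;
	ok = same_key(stored, sizeof(oid), oid, sizeof(oid)) &&
	     stored[sizeof(oid)] == '\0';
	free(stored);
	return ok;
}
```

Because the copy is driven by the length rather than by the first zero
byte, all 20 bytes of the key survive, and the same helper also trims
"foo/bar" down to "foo" when asked for 3 bytes.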

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fast-export.c            |  2 +-
 t/t9351-fast-export-anonymize.sh | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85868162ee..289395a131 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -162,7 +162,7 @@  static const void *anonymize_mem(struct hashmap *map,
 	if (!ret) {
 		ret = xmalloc(sizeof(*ret));
 		hashmap_entry_init(&ret->hash, key.hash.hash);
-		ret->orig = xstrdup(orig);
+		ret->orig = xmemdupz(orig, *len);
 		ret->orig_len = *len;
 		ret->anon = generate(orig, len);
 		ret->anon_len = *len;
diff --git a/t/t9351-fast-export-anonymize.sh b/t/t9351-fast-export-anonymize.sh
index e772cf9930..dc5d75cd19 100755
--- a/t/t9351-fast-export-anonymize.sh
+++ b/t/t9351-fast-export-anonymize.sh
@@ -10,6 +10,10 @@  test_expect_success 'setup simple repo' '
 	mkdir subdir &&
 	test_commit subdir/bar &&
 	test_commit subdir/xyzzy &&
+	fake_commit=$(echo $ZERO_OID | sed s/0/a/) &&
+	git update-index --add --cacheinfo 160000,$fake_commit,link1 &&
+	git update-index --add --cacheinfo 160000,$fake_commit,link2 &&
+	git commit -m "add gitlink" &&
 	git tag -m "annotated tag" mytag
 '
 
@@ -26,6 +30,12 @@  test_expect_success 'stream omits path names' '
 	! grep xyzzy stream
 '
 
+test_expect_success 'stream omits gitlink oids' '
+	# avoid relying on the whole oid to remain hash-agnostic; this is
+	# plenty to be unique within our test case
+	! grep a000000000000000000 stream
+'
+
 test_expect_success 'stream allows master as refname' '
 	grep master stream
 '
@@ -89,6 +99,11 @@  test_expect_success 'paths in subdir ended up in one tree' '
 	test_cmp expect actual
 '
 
+test_expect_success 'identical gitlinks got identical oid' '
+	awk "/commit/ { print \$3 }" <root | sort -u >commits &&
+	test_line_count = 1 commits
+'
+
 test_expect_success 'tag points to branch tip' '
 	git rev-parse $other_branch >expect &&
 	git for-each-ref --format="%(*objectname)" | grep . >actual &&
