From patchwork Tue Feb 4 21:26:18 2020
X-Patchwork-Submitter: René Scharfe
X-Patchwork-Id: 11365275
Subject: [PATCH 10/10] name-rev: release unused name strings
From: René Scharfe
To: Git Mailing List
Cc: SZEDER Gábor, Martin Ågren, Junio C Hamano
References: <084909f8-fefa-1fe0-b2ce-74eff47c4972@web.de>
Message-ID: <4eddc458-6294-9b9c-857b-50ba484a7168@web.de>
Date: Tue, 4 Feb 2020 22:26:18 +0100
In-Reply-To: <084909f8-fefa-1fe0-b2ce-74eff47c4972@web.de>
Sender: git-owner@vger.kernel.org
X-Mailing-List: git@vger.kernel.org

name_rev() assigns a name to a commit and its parents and grandparents and so on.
Commits share their name string with their first parent, which in turn
does the same, recursively to the root.  That saves a lot of
allocations.  When a better name is found, the old name is replaced,
but its memory is not released.  That leakage can become significant.

Can we release these old strings exactly once even though they are
referenced multiple times?  Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it has set a new
name for it and tries to update their names as well.

Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0.  These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.

That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.

If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.

We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.

For the Chromium repo, releasing superseded names reduces the memory
footprint of name-rev --all significantly.  Here's the output of GNU
time before:

   0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
   0inputs+0outputs (0major+571470minor)pagefaults 0swaps

... and with this patch:

   1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
   0inputs+0outputs (0major+314370minor)pagefaults 0swaps

It also gets faster; hyperfine before:

   Benchmark #1: ./git -C ../chromium/src name-rev --all
     Time (mean ± σ):      1.534 s ±  0.006 s    [User: 1.039 s, System: 0.494 s]
     Range (min … max):    1.522 s …  1.542 s    10 runs

... and with this patch:

   Benchmark #1: ./git -C ../chromium/src name-rev --all
     Time (mean ± σ):      1.338 s ±  0.006 s    [User: 1.047 s, System: 0.291 s]
     Range (min … max):    1.327 s …  1.346 s    10 runs

For the Linux repo it doesn't pay off; memory usage only gets down from:

   0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
   0inputs+0outputs (0major+44579minor)pagefaults 0swaps

... to:

   0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
   0inputs+0outputs (0major+44892minor)pagefaults 0swaps

The runtime actually increases slightly from:

   Benchmark #1: ./git -C ../linux/ name-rev --all
     Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
     Range (min … max):   824.1 ms … 838.9 ms    10 runs

... to:

   Benchmark #1: ./git -C ../linux/ name-rev --all
     Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
     Range (min … max):   843.4 ms … 854.3 ms    10 runs

Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB.  That is roughly 20KB versus
roughly 20 bytes per call, so the average discarded name is ca. 1000x
longer in the former.

Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.

Signed-off-by: René Scharfe
---
 builtin/name-rev.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

--
2.25.0

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 98f55bcea9..23a639ff30 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -17,7 +17,7 @@
 #define CUTOFF_DATE_SLOP 86400
 
 struct rev_name {
-	const char *tip_name;
+	char *tip_name;
 	timestamp_t taggerdate;
 	int generation;
 	int distance;
@@ -34,7 +34,7 @@ static struct commit_rev_name rev_names;
 
 static int is_valid_rev_name(const struct rev_name *name)
 {
-	return name && name->tip_name;
+	return name && (name->generation || name->tip_name);
 }
 
 static struct rev_name *get_commit_rev_name(const struct commit *commit)
@@ -87,9 +87,20 @@ static struct rev_name *create_or_update_name(struct commit *commit,
 {
 	struct rev_name *name = commit_rev_name_at(&rev_names, commit);
 
-	if (is_valid_rev_name(name) &&
-	    !is_better_name(name, taggerdate, distance, from_tag))
-		return NULL;
+	if (is_valid_rev_name(name)) {
+		if (!is_better_name(name, taggerdate, distance, from_tag))
+			return NULL;
+
+		/*
+		 * This string might still be shared with ancestors
+		 * (generation > 0). We can release it here regardless,
+		 * because the new name that has just won will be better
+		 * for them as well, so name_rev() will replace these
+		 * stale pointers when it processes the parents.
+		 */
+		if (!name->generation)
+			free(name->tip_name);
+	}
 
 	name->taggerdate = taggerdate;
 	name->generation = generation;
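
For illustration only, the release-once argument from the commit message
can be modelled outside of git.  The sketch below is not code from
builtin/name-rev.c: struct slot, is_valid(), assign_name() and
demo_release_once() are hypothetical stand-ins, the first-parent line is
a plain array, and distance is the only "better name" criterion.  It
assumes, as the message argues, that a name which wins at generation 0
also wins for every ancestor, so the walk overwrites each stale pointer
before anything dereferences it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHAIN_LEN 4

/* Hypothetical stand-in for struct rev_name; not git's definition. */
struct slot {
	char *tip_name;	/* shared along the first-parent chain */
	int generation;	/* 0 only for the commit the name starts at */
	int distance;	/* lower is better */
};

static int is_valid(const struct slot *s)
{
	/* Test generation first: tip_name may be stale after free(). */
	return s->generation || s->tip_name;
}

/*
 * Name chain[start] and its ancestors (chain[i + 1] is the first
 * parent of chain[i]).  Returns how many old strings were freed.
 */
static int assign_name(struct slot *chain, int start, char *name, int distance)
{
	int i, freed = 0;

	for (i = start; i < CHAIN_LEN; i++, distance++) {
		struct slot *s = &chain[i];

		if (is_valid(s)) {
			if (distance >= s->distance)
				break;	/* old name is at least as good */
			if (!s->generation) {
				free(s->tip_name);	/* the only release */
				freed++;
			}
		}
		s->tip_name = name;	/* overwrites any stale pointer */
		s->generation = i - start;
		s->distance = distance;
	}
	return freed;
}

/* Returns the number of free() calls, or -1 if a stale name survived. */
int demo_release_once(void)
{
	struct slot chain[CHAIN_LEN] = { { NULL, 0, 0 } };
	char *worse = malloc(8), *better = malloc(8);
	int i, freed;

	assert(worse && better);
	strcpy(worse, "worse");
	strcpy(better, "better");

	/* A poor name for chain[1..3] first, then a better one from chain[0]. */
	freed = assign_name(chain, 1, worse, 10);
	freed += assign_name(chain, 0, better, 0);

	for (i = 0; i < CHAIN_LEN; i++)
		if (chain[i].tip_name != better)
			freed = -1;
	free(better);
	return freed;
}
```

Calling demo_release_once() from a small main() exercises the case the
patch cares about: three slots share the "worse" string, it is freed
once when the slot at generation 0 is renamed, and the two older slots
are recognized as initialized through their non-zero generation alone.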