From patchwork Fri Aug 21 17:53:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 11730131 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1579D109B for ; Fri, 21 Aug 2020 17:53:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0590F20735 for ; Fri, 21 Aug 2020 17:53:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726967AbgHURxo (ORCPT ); Fri, 21 Aug 2020 13:53:44 -0400 Received: from cloud.peff.net ([104.130.231.41]:37290 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726884AbgHURxl (ORCPT ); Fri, 21 Aug 2020 13:53:41 -0400 Received: (qmail 18987 invoked by uid 109); 21 Aug 2020 17:53:40 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 21 Aug 2020 17:53:40 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 30107 invoked by uid 111); 21 Aug 2020 17:53:39 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 21 Aug 2020 13:53:39 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 21 Aug 2020 13:53:39 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 1/3] p5302: disable thread-count parameter tests by default Message-ID: <20200821175339.GA3263141@coredump.intra.peff.net> References: <20200821175153.GA3263018@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20200821175153.GA3263018@coredump.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The primary function of the perf suite is to detect regressions (or improvements) between versions of Git. The only numbers we show a direct comparison for are timings between the same test run on two different versions. However, it can sometimes be used to collect other information. For instance, p5302 runs the same index-pack operation with different thread counts. The output doesn't directly compare these, but anybody interested in working on index-pack can manually compare the results. For a normal regression run of the full perf-suite, though, this incurs a significant cost to generate numbers nobody will actually look at; about 25% of the total time of the test suite is spent in p5302. And the low-thread-count runs are the most expensive part of it, since they're (unsurprisingly) not using as many threads. Let's skip these tests by default, but make it possible for people working on index-pack to still run them by setting an environment variable. Rather than make this specific to p5302, let's introduce a generic mechanism. This makes it possible to run the full suite with every possible test if somebody really wants to burn some CPU. Signed-off-by: Jeff King --- t/perf/README | 9 +++++++++ t/perf/p5302-pack-index.sh | 10 +++++----- t/perf/perf-lib.sh | 2 ++ 3 files changed, 16 insertions(+), 5 deletions(-) diff --git a/t/perf/README b/t/perf/README index c7b70e2d28..bd649afa97 100644 --- a/t/perf/README +++ b/t/perf/README @@ -84,6 +84,15 @@ You can set the following variables (also in your config.mak): probably be about linux.git size for optimal results. Both default to the git.git you are running from. + GIT_PERF_EXTRA + Boolean to enable additional tests. Most test scripts are + written to detect regressions between two versions of Git, and + the output will compare timings for individual tests between + those versions. Some scripts have additional tests which are not + run by default, that show patterns within a single version of + Git (e.g., performance of index-pack as the number of threads + changes). These can be enabled with GIT_PERF_EXTRA. + You can also pass the options taken by ordinary git tests; the most useful one is: diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh index a9b3e112d9..23011ab739 100755 --- a/t/perf/p5302-pack-index.sh +++ b/t/perf/p5302-pack-index.sh @@ -13,31 +13,31 @@ test_expect_success 'repack' ' export PACK ' -test_perf 'index-pack 0 threads' ' +test_perf PERF_EXTRA 'index-pack 0 threads' ' rm -rf repo.git && git init --bare repo.git && GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK ' -test_perf 'index-pack 1 thread ' ' +test_perf PERF_EXTRA 'index-pack 1 thread ' ' rm -rf repo.git && git init --bare repo.git && GIT_DIR=repo.git GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK ' -test_perf 'index-pack 2 threads' ' +test_perf PERF_EXTRA 'index-pack 2 threads' ' rm -rf repo.git && git init --bare repo.git && GIT_DIR=repo.git git index-pack --threads=2 --stdin < $PACK ' -test_perf 'index-pack 4 threads' ' +test_perf PERF_EXTRA 'index-pack 4 threads' ' rm -rf repo.git && git init --bare repo.git && GIT_DIR=repo.git git index-pack --threads=4 --stdin < $PACK ' -test_perf 'index-pack 8 threads' ' +test_perf PERF_EXTRA 'index-pack 8 threads' ' rm -rf repo.git && git init --bare repo.git && GIT_DIR=repo.git git index-pack --threads=8 --stdin < $PACK diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh index 13e389367a..821581a885 100644 --- a/t/perf/perf-lib.sh +++ b/t/perf/perf-lib.sh @@ -245,3 +245,5 @@ test_at_end_hook_ () { test_export () { export "$@" } + +test_lazy_prereq PERF_EXTRA 'test_bool_env GIT_PERF_EXTRA false' From patchwork Fri Aug 21 17:54:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 11730135 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8977E1575 for ; Fri, 21 Aug 2020 17:54:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7913D20702 for ; Fri, 21 Aug 2020 17:54:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727066AbgHURyx (ORCPT ); Fri, 21 Aug 2020 13:54:53 -0400 Received: from cloud.peff.net ([104.130.231.41]:37296 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725873AbgHURyw (ORCPT ); Fri, 21 Aug 2020 13:54:52 -0400 Received: (qmail 19001 invoked by uid 109); 21 Aug 2020 17:54:52 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 21 Aug 2020 17:54:52 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 30147 invoked by uid 111); 21 Aug 2020 17:54:51 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 21 Aug 2020 13:54:51 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 21 Aug 2020 13:54:51 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 2/3] p5302: count up to online-cpus for thread tests Message-ID: <20200821175451.GB3263141@coredump.intra.peff.net> References: <20200821175153.GA3263018@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20200821175153.GA3263018@coredump.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org When PERF_EXTRA is enabled, p5302 checks the performance of index-pack with various numbers of threads. This can be useful for deciding what the default should be (which is currently capped at 3 threads based on the results of this script). However, we only go up to 8 threads, and modern machines may have more. Let's get the number of CPUs from test-tool, and test various numbers of threads between one and that maximum. Note that the current tests aren't all identical, as we have to set GIT_FORCE_THREADS for the --threads=1 test (which measures the overhead of starting a single worker thread versus the "0" case of using the main thread). To keep the loop simple, we'll keep the "0" case out of it, and set GIT_FORCE_THREADS=1 for all of the other cases (it's a noop for all but the "1" case, since numbers higher than 1 would always need threads). Note also that we could skip running "test-tool" if PERF_EXTRA isn't set. However, there's some small value in knowing the number of threads, so that we can mark each test as skipped in the output. Signed-off-by: Jeff King --- t/perf/p5302-pack-index.sh | 47 +++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh index 23011ab739..228593d9ad 100755 --- a/t/perf/p5302-pack-index.sh +++ b/t/perf/p5302-pack-index.sh @@ -13,35 +13,36 @@ test_expect_success 'repack' ' export PACK ' -test_perf PERF_EXTRA 'index-pack 0 threads' ' - rm -rf repo.git && - git init --bare repo.git && - GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK +# Rather than counting up and doubling each time, count down from the endpoint, +# halving each time. That ensures that our final test uses as many threads as +# CPUs, even if it isn't a power of 2. +test_expect_success 'set up thread-counting tests' ' + t=$(test-tool online-cpus) && + threads= && + while test $t -gt 0 + do + threads="$t $threads" + t=$((t / 2)) + done ' -test_perf PERF_EXTRA 'index-pack 1 thread ' ' - rm -rf repo.git && - git init --bare repo.git && - GIT_DIR=repo.git GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK -' - -test_perf PERF_EXTRA 'index-pack 2 threads' ' - rm -rf repo.git && - git init --bare repo.git && - GIT_DIR=repo.git git index-pack --threads=2 --stdin < $PACK -' - -test_perf PERF_EXTRA 'index-pack 4 threads' ' +test_perf PERF_EXTRA 'index-pack 0 threads' ' rm -rf repo.git && git init --bare repo.git && - GIT_DIR=repo.git git index-pack --threads=4 --stdin < $PACK + GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK ' -test_perf PERF_EXTRA 'index-pack 8 threads' ' - rm -rf repo.git && - git init --bare repo.git && - GIT_DIR=repo.git git index-pack --threads=8 --stdin < $PACK -' +for t in $threads +do + THREADS=$t + export THREADS + test_perf PERF_EXTRA "index-pack $t threads" ' + rm -rf repo.git && + git init --bare repo.git && + GIT_DIR=repo.git GIT_FORCE_THREADS=1 \ + git index-pack --threads=$THREADS --stdin <$PACK + ' +done test_perf 'index-pack default number of threads' ' rm -rf repo.git && From patchwork Fri Aug 21 17:58:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff King X-Patchwork-Id: 11730137 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 837001575 for ; Fri, 21 Aug 2020 17:58:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 74CE520702 for ; Fri, 21 Aug 2020 17:58:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726863AbgHUR6D (ORCPT ); Fri, 21 Aug 2020 13:58:03 -0400 Received: from cloud.peff.net ([104.130.231.41]:37302 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726854AbgHUR6C (ORCPT ); Fri, 21 Aug 2020 13:58:02 -0400 Received: (qmail 19024 invoked by uid 109); 21 Aug 2020 17:58:01 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 21 Aug 2020 17:58:01 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 30240 invoked by uid 111); 21 Aug 2020 17:58:01 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 21 Aug 2020 13:58:01 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 21 Aug 2020 13:58:00 -0400 From: Jeff King To: git@vger.kernel.org Subject: [PATCH 3/3] index-pack: adjust default threading cap Message-ID: <20200821175800.GC3263141@coredump.intra.peff.net> References: <20200821175153.GA3263018@coredump.intra.peff.net> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20200821175153.GA3263018@coredump.intra.peff.net> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Commit b8a2486f15 (index-pack: support multithreaded delta resolving, 2012-05-06) describes an experiment that shows that setting the number of threads for index-pack higher than 3 does not help. I repeated that experiment using a more modern version of Git and a more modern CPU and got different results. Here are timings for p5302 against linux.git run on my laptop, a Core i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16): 5302.3: index-pack 0 threads 256.28(253.41+2.79) 5302.4: index-pack 1 threads 257.03(254.03+2.91) 5302.5: index-pack 2 threads 149.39(268.34+3.06) 5302.6: index-pack 4 threads 94.96(294.10+3.23) 5302.7: index-pack 8 threads 68.12(339.26+3.89) 5302.8: index-pack 16 threads 70.90(655.03+7.21) 5302.9: index-pack default number of threads 116.91(290.05+3.21) You can see that wall-clock times continue to improve dramatically up to the number of cores, but bumping beyond that (into hyperthreading territory) does not help (and in fact hurts a little). Here's the same experiment on a machine with dual Xeon 6230's, totaling 40 cores (80 with hyperthreading): 5302.3: index-pack 0 threads 310.04(302.73+6.90) 5302.4: index-pack 1 threads 310.55(302.68+7.40) 5302.5: index-pack 2 threads 178.17(304.89+8.20) 5302.6: index-pack 5 threads 99.53(315.54+9.56) 5302.7: index-pack 10 threads 72.80(327.37+12.79) 5302.8: index-pack 20 threads 60.68(357.74+21.66) 5302.9: index-pack 40 threads 58.07(454.44+67.96) 5302.10: index-pack 80 threads 59.81(720.45+334.52) 5302.11: index-pack default number of threads 134.18(309.32+7.98) The results are similar; things stop improving at 40 threads. Curiously, going from 20 to 40 really doesn't help much, either (and increases CPU time considerably). So that may represent an actual barrier to parallelism, where we lose out due to context-switching and loss of cache locality, but don't reap the wall-clock benefits due to contention of our coarse-grained locks. So what's a good default value? It's clear that the current cap of 3 is too low; our default values are 42% and 57% slower than the best times on each machine. The results on the 40-core machine imply that 20 threads is an actual barrier regardless of the number of cores, so we'll take that as a maximum. We get the best results on these machines at half of the online-cpus value. That's presumably a result of the hyperthreading. That's common on multi-core Intel processors, but not necessarily elsewhere. But if we take it as an assumption, we can perform optimally on hyperthreaded machines and still do much better than the status quo on other machines, as long as we never half below the current value of 3. So that's what this patch does. Signed-off-by: Jeff King --- Somebody with an AMD machine with a bunch of cores may want to revisit this. I don't think it makes anything worse, but they may want to not do the halving. I don't know if there's an easy way for us to determine the difference via online_cpus(), though. builtin/index-pack.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index f865666db9..9721bf1ffe 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1798,9 +1798,22 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) if (HAVE_THREADS && !nr_threads) { nr_threads = online_cpus(); - /* An experiment showed that more threads does not mean faster */ - if (nr_threads > 3) - nr_threads = 3; + /* + * Experiments show that going above 20 threads doesn't help, + * no matter how many cores you have. Below that, we tend to + * max at half the number of online_cpus(), presumably because + * half of those are hyperthreads rather than full cores. We'll + * never reduce the level below "3", though, to match a + * historical value that nobody complained about. + */ + if (nr_threads < 4) + ; /* too few cores to consider capping */ + else if (nr_threads < 6) + nr_threads = 3; /* historic cap */ + else if (nr_threads < 40) + nr_threads /= 2; + else + nr_threads = 20; /* hard cap */ } curr_pack = open_pack_file(pack_name);