From patchwork Fri Mar 4 21:37:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: John Cai
X-Patchwork-Id: 12770078
Date: Fri, 04 Mar 2022 21:37:25 +0000
Subject: [PATCH] cat-file: skip expanding default format
Fcc: Sent
To: git@vger.kernel.org
Cc: John Cai, John Cai
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
From: John Cai

From: John Cai

When a format is passed to --batch, --batch-check, or --batch-command,
the format gets expanded. When nothing is passed in, the default format
is set and expand_format() still gets called.

We can save these cycles by hardcoding how to print the information
when nothing is passed as the format, or when the default format is
passed. There is no need for the fully expanded format with the
default. Since batch_object_write() runs for every object provided in
batch mode, we get a nice performance improvement.

    git rev-list --all > /tmp/all-obj.txt

    git cat-file --batch-check

Signed-off-by: John Cai
See https://lore.kernel.org/git/87eecf8ork.fsf@evledraar.gmail.com/
---
    optimize cat file batch info writing

    When cat-file --batch or --batch-check is used, we can skip having to
    expand the format if no format is specified or if the default format is
    specified. In this case we know exactly how to print the objects without
    the fully expanded format.

    This was first discussed in [1].

    We get a little performance boost from this optimization because this
    happens for each object provided to --batch, --batch-check, or
    --batch-command. Because batch_object_write() is called on every oid
    provided in batch mode, this optimization adds up when info for a large
    number of oids is printed.

    git rev-list --all >/tmp/all-objs.txt

    git cat-file --batch-check

oid),
+		 data->info.type_name->buf,
+		 (uintmax_t)*data->info.sizep);
+
+}
+
 /*
  * If "pack" is non-NULL, then "offset" is the byte offset within the pack from
  * which the object may be accessed (though note that we may also rely on
@@ -363,6 +372,12 @@ static void batch_object_write(const char *obj_name,
 			       struct packed_git *pack,
 			       off_t offset)
 {
+	const char *fmt;
+
+	struct strbuf type_name = STRBUF_INIT;
+	if (!opt->format)
+		data->info.type_name = &type_name;
+
 	if (!data->skip_object_info) {
 		int ret;
 
@@ -377,12 +392,21 @@ static void batch_object_write(const char *obj_name,
 			printf("%s missing\n",
 			       obj_name ? obj_name : oid_to_hex(&data->oid));
 			fflush(stdout);
-			return;
+			goto cleanup;
 		}
 	}
 
+	if (!opt->format && !opt->print_contents) {
+		char buf[1024];
+
+		print_default_format(buf, 1024, data);
+		batch_write(opt, buf, strlen(buf));
+		goto cleanup;
+	}
+
+	fmt = opt->format ? opt->format : default_format;
 	strbuf_reset(scratch);
-	strbuf_expand(scratch, opt->format, expand_format, data);
+	strbuf_expand(scratch, fmt, expand_format, data);
 	strbuf_addch(scratch, '\n');
 	batch_write(opt, scratch->buf, scratch->len);
 
@@ -390,8 +414,12 @@ static void batch_object_write(const char *obj_name,
 		print_object_or_die(opt, data);
 		batch_write(opt, "\n", 1);
 	}
+
+cleanup:
+	strbuf_release(&type_name);
 }
 
+
 static void batch_one_object(const char *obj_name,
 			     struct strbuf *scratch,
 			     struct batch_options *opt,
@@ -515,9 +543,7 @@ static int batch_objects(struct batch_options *opt)
 	struct expand_data data;
 	int save_warning;
 	int retval = 0;
-
-	if (!opt->format)
-		opt->format = "%(objectname) %(objecttype) %(objectsize)";
+	const char *fmt;
 
 	/*
 	 * Expand once with our special mark_query flag, which will prime the
@@ -526,7 +552,8 @@ static int batch_objects(struct batch_options *opt)
 	 */
 	memset(&data, 0, sizeof(data));
 	data.mark_query = 1;
-	strbuf_expand(&output, opt->format, expand_format, &data);
+	fmt = opt->format ? opt->format : default_format;
+	strbuf_expand(&output, fmt, expand_format, &data);
 	data.mark_query = 0;
 	strbuf_release(&output);
 	if (opt->cmdmode)
diff --git a/t/perf/p1006-cat-file.sh b/t/perf/p1006-cat-file.sh
new file mode 100755
index 00000000000..e463623f5a3
--- /dev/null
+++ b/t/perf/p1006-cat-file.sh
@@ -0,0 +1,16 @@
+#!/bin/sh
+
+test_description='Basic sort performance tests'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup' '
+	git rev-list --all >rla
+'
+
+test_perf 'cat-file --batch-check' '
+	git cat-file --batch-check