From patchwork Thu Feb 1 10:25:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13540887 Received: from wout5-smtp.messagingengine.com (wout5-smtp.messagingengine.com [64.147.123.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 386AE15B990 for ; Thu, 1 Feb 2024 10:25:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=64.147.123.21 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706783108; cv=none; b=EwStRGkDy4c0Q7EDMNmROsLlLPUceaeQUjsTp8xSH+1QWkQjwGgcwNf08n4kqDTKRurA6atAsLVIF0zMhEeVJkRqLcMOEAn89/7x5bKIX+VL4TeJbnGTue17Fefsd5fRElYgInf3v9VslssKt2UULCCC3hpd+KygqHX9NdJ2OKs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706783108; c=relaxed/simple; bh=0mp2WUd2epIIoEhc6vE0SXtPS7rLXG2Ojz9rCgy2qrI=; h=Date:From:To:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=KKaXt2bSa2vvA093NdeD1vgixHd6b8nspvsB1vpEHQsPY7FJq6s72A5G53LjXPLe0Z4fQi2isF0NuDehxpfmMDhuKI4f2Lzoa3TgQZTdkN75KeoJZV8GUrn5CEhzDI6QSSTAIhuY0SHEMtDojFWa1Minw7WiLfbVSYI8IpM/ZDg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im; spf=pass smtp.mailfrom=pks.im; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b=k47CaNFv; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=ovAP9mJm; arc=none smtp.client-ip=64.147.123.21 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="k47CaNFv"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="ovAP9mJm" Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.west.internal (Postfix) with ESMTP id 23BB63200B1C for ; Thu, 1 Feb 2024 05:25:05 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Thu, 01 Feb 2024 05:25:05 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm3; t=1706783104; x=1706869504; bh=oohmreglTx TteCMuyKsk6GVYX+wxqA2vk70Tz8xOXxs=; b=k47CaNFv8Frf7UIDGEyM3dsED1 vJpajsGfMVcrgaoCErvM3Xpw3ASip5CE9YOPMhM2xwkqQNqmMeGDaQpxHKgiwIPs DNUT5p12LWjG8RXFYx7lQmSPjvmu4n+LmrQds12saoUihqGA+hP9hAYGLcrrNEkh sPiQiXqsNHA7Zkx+9t87IVd721T6kcpHqIwZVoaJh0XRzNu8yeyOA2m5TbBPIpRE mTLG1bOfFBLdA6Od7SD9ipdD47YRjdEG0rFROj/IX+zxSb8T/VUD6YXXVS2FSyrc BDSBVaoddyZwSTY85wz4qJbqCT829H4vU2FjeYQD8nDhHbTbFH/TDWSqbQcg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; t=1706783104; x=1706869504; bh=oohmreglTxTteCMuyKsk6GVYX+wx qA2vk70Tz8xOXxs=; b=ovAP9mJmqeYmXBOUjsFymGQITg+5SPFIx9qIhoDQECxF Eh1WxDY26B1+S+8hT8tV0alRM3On6y8LvpNFdzY+sa41LAm9QxO06RjVVQ7oEf4y UwmwDt4jNPVylDHcirD1Kz7Sh5KAOzu4LcDYvdj3a/dGCThx1Zr130KDG2nGV5x3 nQL2TLefvdtxF9Wl7GA3WYvfZDYfuzFBAJ2dyS2iaKyfOX0/y1dC0cSCbOsmPCGs xvgAzax1dG9uPi4kN2e5swrWIFzPZF5xAPDD8Nj82lG5DgrCe48COKxxs6jZrnQo +wO9IvBy6BcjtCOwGNlym2y7yr9O52tIIi2gwPiHzg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrfeduuddgudefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvuffkfhggtggujgesghdtre ertddtjeenucfhrhhomheprfgrthhrihgtkhcuufhtvghinhhhrghrughtuceophhssehp khhsrdhimheqnecuggftrfgrthhtvghrnhephefgjeeuveejteduhefgffefffdvjeefje eivdekfffgkeeugfehveetueefleeknecuvehluhhsthgvrhfuihiivgeptdenucfrrghr rghmpehmrghilhhfrhhomhepphhssehpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA for ; Thu, 1 Feb 2024 05:25:03 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 8240fba4 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Thu, 1 Feb 2024 10:21:41 +0000 (UTC) Date: Thu, 1 Feb 2024 11:25:02 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Subject: [PATCH 2/7] reftable/merged: allocation-less dropping of shadowed records Message-ID: <576a96f2e5cf462d2f5793a0c8244f4966789654.1706782841.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The purpose of the merged reftable iterator is to iterate through all entries of a set of tables in the correct order. This is implemented by using a sub-iterator for each table, where the next entry of each of these iterators gets put into a priority queue. For each iteration, we do roughly the following steps: 1. Retrieve the top record of the priority queue. This is the entry we want to return to the caller. 2. Retrieve the next record of the sub-iterator that this record came from. If any, add it to the priority queue at the correct position. The position is determined by comparing the record keys, which e.g. corresponds to the refname for ref records. 3. Keep removing the top record of the priority queue until we hit the first entry whose key is larger than the returned record's key. This is required to drop "shadowed" records. The last step will lead to at least one comparison to the next entry, but may lead to many comparisons in case the reftable stack consists of many tables with shadowed records. It is thus part of the hot code path when iterating through records. The code to compare the entries with each other is quite inefficient though. Instead of comparing record keys with each other directly, we first format them into `struct strbuf`s and only then compare them with each other. While we already optimized this code path to reuse buffers in 829231dc20 (reftable/merged: reuse buffer to compute record keys, 2023-12-11), the cost to format the keys into the buffers still adds up quite significantly. Refactor the code to use `reftable_record_cmp()` instead, which has been introduced in the preceding commit. This function compares records with each other directly without requiring any memory allocations or copying and is thus way more efficient. The following benchmark uses git-show-ref(1) to print a single ref matching a pattern out of 1 million refs. This is the most direct way to exercise ref iteration speed as we remove all overhead of having to show the refs, too. Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 180.7 ms ± 4.7 ms [User: 177.1 ms, System: 3.4 ms] Range (min … max): 174.9 ms … 211.7 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 162.1 ms ± 4.4 ms [User: 158.5 ms, System: 3.4 ms] Range (min … max): 155.4 ms … 189.3 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt --- reftable/merged.c | 11 ++--------- reftable/merged.h | 2 -- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/reftable/merged.c b/reftable/merged.c index c258ce953e..fb9978d798 100644 --- a/reftable/merged.c +++ b/reftable/merged.c @@ -51,8 +51,6 @@ static void merged_iter_close(void *p) reftable_iterator_destroy(&mi->stack[i]); } reftable_free(mi->stack); - strbuf_release(&mi->key); - strbuf_release(&mi->entry_key); } static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi, @@ -105,14 +103,11 @@ static int merged_iter_next_entry(struct merged_iter *mi, such a deployment, the loop below must be changed to collect all entries for the same key, and return new the newest one. */ - reftable_record_key(&entry.rec, &mi->entry_key); while (!merged_iter_pqueue_is_empty(mi->pq)) { struct pq_entry top = merged_iter_pqueue_top(mi->pq); - int cmp = 0; + int cmp; - reftable_record_key(&top.rec, &mi->key); - - cmp = strbuf_cmp(&mi->key, &mi->entry_key); + cmp = reftable_record_cmp(&top.rec, &entry.rec); if (cmp > 0) break; @@ -246,8 +241,6 @@ static int merged_table_seek_record(struct reftable_merged_table *mt, .typ = reftable_record_type(rec), .hash_id = mt->hash_id, .suppress_deletions = mt->suppress_deletions, - .key = STRBUF_INIT, - .entry_key = STRBUF_INIT, }; int n = 0; int err = 0; diff --git a/reftable/merged.h b/reftable/merged.h index d5b39dfe7f..7d9f95d27e 100644 --- a/reftable/merged.h +++ b/reftable/merged.h @@ -31,8 +31,6 @@ struct merged_iter { uint8_t typ; int suppress_deletions; struct merged_iter_pqueue pq; - struct strbuf key; - struct strbuf entry_key; }; void merged_table_release(struct reftable_merged_table *mt);