From patchwork Mon May 13 08:46:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13663208 Received: from wfout2-smtp.messagingengine.com (wfout2-smtp.messagingengine.com [64.147.123.145]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0F73B1474C8 for ; Mon, 13 May 2024 08:47:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=64.147.123.145 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715590023; cv=none; b=HYg64BvfVDCTKPDMv7gWaoIpwu+yfC47s63uVmu/2Y6RVgSy6UXSDg4Km5u43C4Bw81KgGcBpjPOIoPiu8nu1aKLikTquuD+OFQTmpdZbh+gghyGO1PnUQ/goeIO0ZGBchBaTu1/6Yi4DTEHUI0odKM6UOBmb6ZjPxhfMqT8E6M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715590023; c=relaxed/simple; bh=v/VweuLejAp75EaGLeKSrYic2nVCcnjh2FF784FKcas=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=GJ2IU1SVsDPPmFf29sLMLaQypywOt4ii5Hel2TiStMvT2uekgo8SAVGXl5JMwoMc0qql1fSKfAEGyLdUTxmkz97zsHXEY2KBYao+Dv6Q57MpoRT+YGE3HRnF8IvE7RO6SaJEeM40dDBKMcwroDXuSYQKmASm3ydIR5IwcMg+Dkk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=pks.im; spf=pass smtp.mailfrom=pks.im; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b=R0LweNCw; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b=k6JtUePB; arc=none smtp.client-ip=64.147.123.145 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="R0LweNCw"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="k6JtUePB" Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailfout.west.internal (Postfix) with ESMTP id 31C041C000D6; Mon, 13 May 2024 04:47:01 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Mon, 13 May 2024 04:47:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm3; t=1715590020; x=1715676420; bh=03Uli0bxTz 37bdPDpf5ipzSMwDsi03jcn9g0HTe5UhU=; b=R0LweNCwSf+yp1VVlVtANysaQ/ vICVfOmLLfMeV61zO+yHjzxDtwmOqhkXIWldkG2eiW+I91mjX7rkj3lPZInCyhQN LonfJQ6gQa0kPO6R1N6w7Q6RdE0jxJ1Bs2ji4GEMZgV7k+b4zLwrrQFRMmLvVUMH CM/5SQOyhvVsxcIzdVDC4etFEhFb2ZUp/yx9Rg4F2aIdmteFPnCqfLcyHz2etYY2 Lo3QRmI9bV0jI7h34czansjnPzbcx9RzX2MOAqxYs09er9d8wSHjiEaD5mw39Jb5 0P5L2wx0DJPCR8s3ZfGqmZvzM5zOMZi/G/nN4svI2+5EMkY2ZP16ndvvaxGg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; t=1715590020; x=1715676420; bh=03Uli0bxTz37bdPDpf5ipzSMwDsi 03jcn9g0HTe5UhU=; b=k6JtUePBXqLyl10yWsLFO+qv3O6rdXITUqjl2Uc7Urb8 2mi2+83F91aQRGqn86KwaNo7Js7CvVw/kpDgJwsiTQ7yZ+gzZcqxFlg9Yf0oJ+Z1 HAPAtIXOXBMG2npT1FSqmRxwJ3ZKe8dGU/n5Y98tpEB3ulXNlrjcEfiV5b9JiQby yvaYj3zvpXLBY9peI+jJwoVnYYpA+8E5T4DfGvtoUH5uj3YqLN4wJ/Tz4c2cVzlv 62YbLBRtp4kvI1VzQJNxzVP0nLRF0QxnZ6fvobk9A2L66FsPaail9LWPtt+HoOa/ JltCxGhtw8gW4CluyO0B1reiAB32XOo/uvWYrGrrIQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrvdeggedgtdejucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvfevuffkfhggtggujgesghdtreertddtvdenucfhrhhomheprfgrthhr ihgtkhcuufhtvghinhhhrghrughtuceophhssehpkhhsrdhimheqnecuggftrfgrthhtvg hrnhepueektdevtdffveeljeetgfehheeigeekleduvdeffeeghefgledttdehjeelffet necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhsse hpkhhsrdhimh X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 13 May 2024 04:46:59 -0400 (EDT) Received: by localhost (OpenSMTPD) with ESMTPSA id bf3d1e30 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Mon, 13 May 2024 08:46:40 +0000 (UTC) Date: Mon, 13 May 2024 10:46:56 +0200 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Junio C Hamano , Justin Tobler Subject: [PATCH v2 00/13] reftable: prepare for re-seekable iterators Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Hi, this is the second version of my patch series that prepares the reftable iterators to become re-seekable. These refactorings will eventually allow us to reuse data structures by iterators and optimize for certain cases. Changes compared to v1: - Various fixes to commit messages. - Fixed a copy & pasted comment to refer to logs instead of refs. The series continues to build on top of ps/reftable-write-optim. There is a merge conflict with the in-flight ps/reftable-write-options, but given that this series has not yet been merged to next and because Junio has already resolved the conflict in "seen" I decided to not pull it in as an additional dependency. Thanks! Patrick Patrick Steinhardt (13): reftable/block: use `size_t` to track restart point index reftable/reader: avoid copying index iterator reftable/reader: unify indexed and linear seeking reftable/reader: separate concerns of table iter and reftable reader reftable/reader: inline `reader_seek_internal()` reftable/reader: set up the reader when initializing table iterator reftable/merged: split up initialization and seeking of records reftable/merged: simplify indices for subiterators reftable/generic: move seeking of records into the iterator reftable/generic: adapt interface to allow reuse of iterators reftable/reader: adapt interface to allow reuse of iterators reftable/stack: provide convenience functions to create iterators reftable/merged: adapt interface to allow reuse of iterators refs/reftable-backend.c | 48 ++++---- reftable/block.c | 4 +- reftable/generic.c | 94 +++++++++++---- reftable/generic.h | 9 +- reftable/iter.c | 23 +++- reftable/merged.c | 148 ++++++++---------------- reftable/merged.h | 6 + reftable/merged_test.c | 19 ++- reftable/reader.c | 218 +++++++++++++++-------------------- reftable/readwrite_test.c | 35 ++++-- reftable/reftable-generic.h | 8 +- reftable/reftable-iterator.h | 21 ++++ reftable/reftable-merged.h | 15 --- reftable/reftable-reader.h | 45 ++------ reftable/reftable-stack.h | 18 +++ reftable/stack.c | 29 ++++- 16 files changed, 378 insertions(+), 362 deletions(-) Range-diff against v1: 1: ca86a8b58d = 1: ca86a8b58d reftable/block: use `size_t` to track restart point index 2: bdbebdbd36 = 2: bdbebdbd36 reftable/reader: avoid copying index iterator 3: 716863a580 = 3: 716863a580 reftable/reader: unify indexed and linear seeking 4: 91db2f18c1 = 4: 91db2f18c1 reftable/reader: separate concerns of table iter and reftable reader 5: 4d498ef342 = 5: 4d498ef342 reftable/reader: inline `reader_seek_internal()` 6: 5a10a11584 = 6: 5a10a11584 reftable/reader: set up the reader when initializing table iterator 7: 21b3e3ab5f ! 7: 12c10fd366 reftable/merged: split up initialization and seeking of records @@ Commit message To initialize a `struct merged_iter`, we need to seek all subiterators to the wanted record and then add their results to the priority queue used to sort the records. This logic is split up across two functions, - `merged_table_seek_record()` and `merged_table_iter()`. The scope of - these functions is somewhat weird though, where `merged_table_iter()` is + `merged_table_seek_record()` and `merged_iter_init()`. The scope of + these functions is somewhat weird though, where `merged_iter_init()` is only responsible for adding the records of the subiterators to the priority queue. - Clarify the scope of those functions such that `merged_table_iter()` is + Clarify the scope of those functions such that `merged_iter_init()` is only responsible for initializing the iterator's structure. Performing the subiterator seeks are now part of `merged_table_seek_record()`. 8: f0f42cd56b ! 8: 31bdfc1e34 reftable/merged: simplify indices for subiterators @@ Commit message reftable/merged: simplify indices for subiterators When seeking on a merged table, we perform the seek for each of the - subiterators. If the subiterator hasa the desired record we add it to - the priority queue, otherwise we skip it and don't add it to the stack - of subiterators hosted by the merged table. + subiterators. If the subiterator has the desired record we add it to the + priority queue, otherwise we skip it and don't add it to the stack of + subiterators hosted by the merged table. The consequence of this is that the index of the subiterator in the merged table does not necessarily correspond to the index of it in the 9: 859b399e71 ! 9: 3e91c3830e reftable/generic: move seeking of records into the iterator @@ Commit message The second and more significant downside is that it is impossible to reuse iterators for multiple seeks. Whenever you want to look up a record, you need to re-create the whole infrastructure again, which is - quite a waste of time. Furthermore, it is impossible to for example - optimize seeks, for example when seeking the same record multiple times. + quite a waste of time. Furthermore, it is impossible to optimize seeks, + such as when seeking the same record multiple times. To address this, we essentially split up the concerns properly such that the parent data structure is responsible for setting up the iterator via 10: 727b8fa432 = 10: b0641dd800 reftable/generic: adapt interface to allow reuse of iterators 11: f688f8f59a ! 11: 0745951650 reftable/reader: adapt interface to allow reuse of iterators @@ reftable/reftable-reader.h: struct reftable_table; +void reftable_reader_init_ref_iterator(struct reftable_reader *r, + struct reftable_iterator *it); + -+/* Initialize a reftable iterator for reading refs. */ ++/* Initialize a reftable iterator for reading logs. */ +void reftable_reader_init_log_iterator(struct reftable_reader *r, + struct reftable_iterator *it); 12: 68cc35919b = 12: 51480c4328 reftable/stack: provide convenience functions to create iterators 13: be4da295c6 = 13: 2586e93c44 reftable/merged: adapt interface to allow reuse of iterators