From patchwork Thu Jan 11 10:06:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13517055 Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com [64.147.123.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A5A2A13FFE for ; Thu, 11 Jan 2024 10:06:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="m667/4kZ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="m0KcLnvO" Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 8A3B63200A3F; Thu, 11 Jan 2024 05:06:43 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Thu, 11 Jan 2024 05:06:43 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1704967602; x=1705054002; bh=+lxEtsdJk5 pNaPIe86Kril89cbW4pygxI+wpyAfyPWg=; b=m667/4kZJtT7cRknSBY+2qCFYw tTzPJ/K9o2hFCpxwnzcKp7CSh09YNS5o0+6+NWmkJyXOGWO7tF0S4RrF2D0EBCOR BI05rkcLpSlMqkcrmXE/vHQcmRwvCp7yoK8dlVxNvX0ojsfUU9NVG7cVP0nn80w2 SYfVwO5eHw+BIq25IqJOr91aWsNjvxKUpkxYIYeO+zqBkIV/UDkrYQB2lumnscHz 0jKNLnXkVxJATtn1N5ie7njZxtfpchNTeAuZz9HtbpftbOH2mjFRiQllQfgi6+fu kRbChmHNNdny1SoPUXCIMd/SuYZGw5jjKQIlfm9G7zs8VYoJyL5WmqGwqyvA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1704967602; x=1705054002; bh=+lxEtsdJk5pNaPIe86Kril89cbW4 pygxI+wpyAfyPWg=; b=m0KcLnvOoHYhu97qml3O2a7dfdFiLdNZrAHPt6sT9g0U HHT9/N8ldCBVQKQqdG1kYx/DSJQpyjwuVQIz4AZDxyIrybHhBZTEDjesIUofQ2fj jDJ7XDxXKHMO0Aww8fkqYVhDYBdOnusY8OuiupKuHrvNZn51J2j9ZwbqEI2rVrt5 sqo4QPQCSYlWvXE9Vy9iUEfScGmcFecDNV53BvYTCelLWJPv6UQADC4S6MaCMrby 8ekKnaoUpzeHUAEVMR7ZS+87xI72j77uoS5jg4QGNKBpKF3qYojDjfpegMiTgCI4 Cm5Zy5T97YT2gA1SmnuA2dQ3IFi89/0jNpr2PdYChg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdeifedguddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvvefukfhfgggtuggjsehgtd erredttddvnecuhfhrohhmpefrrghtrhhitghkucfuthgvihhnhhgrrhguthcuoehpshes phhkshdrihhmqeenucggtffrrghtthgvrhhnpeeukedtvedtffevleejtefgheehieegke eluddvfeefgeehgfeltddtheejleffteenucevlhhushhtvghrufhiiigvpedtnecurfgr rhgrmhepmhgrihhlfhhrohhmpehpshesphhkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 11 Jan 2024 05:06:42 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 54f07a16 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 11 Jan 2024 10:03:59 +0000 (UTC) Date: Thu, 11 Jan 2024 11:06:39 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Junio C Hamano Subject: [PATCH v2 1/5] reftable/stack: refactor stack reloading to have common exit path Message-ID: <4b7f52c415591616d0af69e43c1f8597fcf24634.1704966670.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The `reftable_stack_reload_maybe_reuse()` function is responsible for reloading the reftable list from disk. The function is quite hard to follow though because it has a bunch of different exit paths, many of which have to free the same set of resources. Refactor the function to have a common exit path. While at it, touch up the style of this function a bit to match our usual coding style better. Signed-off-by: Patrick Steinhardt --- reftable/stack.c | 86 +++++++++++++++++++++++------------------------- 1 file changed, 42 insertions(+), 44 deletions(-) diff --git a/reftable/stack.c b/reftable/stack.c index 16bab82063..bf869a6772 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -304,69 +304,67 @@ static int tv_cmp(struct timeval *a, struct timeval *b) static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, int reuse_open) { - struct timeval deadline = { 0 }; - int err = gettimeofday(&deadline, NULL); + char **names = NULL, **names_after = NULL; + struct timeval deadline; int64_t delay = 0; - int tries = 0; - if (err < 0) - return err; + int tries = 0, err; + err = gettimeofday(&deadline, NULL); + if (err < 0) + goto out; deadline.tv_sec += 3; + while (1) { - char **names = NULL; - char **names_after = NULL; - struct timeval now = { 0 }; - int err = gettimeofday(&now, NULL); - int err2 = 0; - if (err < 0) { - return err; - } + struct timeval now; - /* Only look at deadlines after the first few times. This - simplifies debugging in GDB */ + err = gettimeofday(&now, NULL); + if (err < 0) + goto out; + + /* + * Only look at deadlines after the first few times. This + * simplifies debugging in GDB. + */ tries++; - if (tries > 3 && tv_cmp(&now, &deadline) >= 0) { - break; - } + if (tries > 3 && tv_cmp(&now, &deadline) >= 0) + goto out; err = read_lines(st->list_file, &names); - if (err < 0) { - free_names(names); - return err; - } + if (err < 0) + goto out; + err = reftable_stack_reload_once(st, names, reuse_open); - if (err == 0) { - free_names(names); + if (!err) break; - } - if (err != REFTABLE_NOT_EXIST_ERROR) { - free_names(names); - return err; - } - - /* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent - writer. Check if there was one by checking if the name list - changed. - */ - err2 = read_lines(st->list_file, &names_after); - if (err2 < 0) { - free_names(names); - return err2; - } - + if (err != REFTABLE_NOT_EXIST_ERROR) + goto out; + + /* + * REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent + * writer. Check if there was one by checking if the name list + * changed. + */ + err = read_lines(st->list_file, &names_after); + if (err < 0) + goto out; if (names_equal(names_after, names)) { - free_names(names); - free_names(names_after); - return err; + err = REFTABLE_NOT_EXIST_ERROR; + goto out; } + free_names(names); + names = NULL; free_names(names_after); + names_after = NULL; delay = delay + (delay * rand()) / RAND_MAX + 1; sleep_millisec(delay); } - return 0; +out: + free_names(names); + free_names(names_after); + return err; } /* -1 = error From patchwork Thu Jan 11 10:06:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13517056 Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com [64.147.123.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CAAF314A9A for ; Thu, 11 Jan 2024 10:06:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="tay473rF"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Hb/1viRN" Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.west.internal (Postfix) with ESMTP id B425C320034E; Thu, 11 Jan 2024 05:06:47 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute3.internal (MEProxy); Thu, 11 Jan 2024 05:06:48 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1704967607; x=1705054007; bh=kWfZRTI6C4 6V2tI4qQ1kEo31dLwpGUQx/EySGEjZLMM=; b=tay473rFd5NmPldfSNEp2KDzws bLEpIQbPUCrTkDk1+uOan4kvQ1a0RCo8MwF/aZ66aMfb0OXkK0xHeJMhkoygpOKn dFfNgoezrS4JW0E0nZIfuId3Ee5o0H+23dHkhlaS7E2+8QxfeP5jaitvJce/zcRk 5Kci9j/C5pZldRbQx5RwLAarLElNlgm+a0gWEt9NmM4J/+EbvBo5Ocl+o2G7Dc51 sUcrCeo1OAkLr4W39RIwFMB5J3Iz1VLuXWInBqp09pWz3HI6fSym2bLejUetXIk3 cVY8ymcoIU88+UFp7MRzIe+bpyWOZ2R9MuVw3O9JSquZgN1J8LIeF/l47JEw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1704967607; x=1705054007; bh=kWfZRTI6C46V2tI4qQ1kEo31dLwp GUQx/EySGEjZLMM=; b=Hb/1viRNJigH71J5KzaZdgdDQ7vwq9Uc2t7q4m7A+nuL oOuMH1rJBjFS444Tzomn34VRsNzf+iYam1HKHxi/fg9STPOkRzUYgm4G5mP5RYRs HIrHZKMyzluWnNvtTKQfZqeDkgfF/B+jFgvMK3pxzB9ciB0ipIZhn0RDZ4TZApbN KwM8sn1/W1zzrpoySviBmpMCga3vO4pJqfDcZ5ptm3MmPO+GNf/d0g8JmDgKmdsR vhbxxLei/zfkUjYlLYhljL2cKCTc4OHtljf36HCf82DBWTThqNmW3RsWTgxRKuIO ILCBcW7WffpRS/9vC4qOxYfvaf9UCgL3FWWcymoD0g== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdeifedguddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvvefukfhfgggtuggjsehgtd erredttddvnecuhfhrohhmpefrrghtrhhitghkucfuthgvihhnhhgrrhguthcuoehpshes phhkshdrihhmqeenucggtffrrghtthgvrhhnpeeukedtvedtffevleejtefgheehieegke eluddvfeefgeehgfeltddtheejleffteenucevlhhushhtvghrufhiiigvpedtnecurfgr rhgrmhepmhgrihhlfhhrohhmpehpshesphhkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 11 Jan 2024 05:06:46 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id f3ae9fbe (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 11 Jan 2024 10:04:03 +0000 (UTC) Date: Thu, 11 Jan 2024 11:06:43 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Junio C Hamano Subject: [PATCH v2 2/5] reftable/stack: refactor reloading to use file descriptor Message-ID: <36b9f6b6240686cc5b0a761b889614fc31f01d34.1704966670.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: We're about to introduce a stat(3P)-based caching mechanism to reload the list of stacks only when it has changed. In order to avoid race conditions this requires us to have a file descriptor available that we can use to call fstat(3P) on. Prepare for this by converting the code to use `fd_read_lines()` so that we have the file descriptor readily available. Signed-off-by: Patrick Steinhardt --- reftable/stack.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/reftable/stack.c b/reftable/stack.c index bf869a6772..b1ee247601 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -308,6 +308,7 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, struct timeval deadline; int64_t delay = 0; int tries = 0, err; + int fd = -1; err = gettimeofday(&deadline, NULL); if (err < 0) @@ -329,9 +330,19 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, if (tries > 3 && tv_cmp(&now, &deadline) >= 0) goto out; - err = read_lines(st->list_file, &names); - if (err < 0) - goto out; + fd = open(st->list_file, O_RDONLY); + if (fd < 0) { + if (errno != ENOENT) { + err = REFTABLE_IO_ERROR; + goto out; + } + + names = reftable_calloc(sizeof(char *)); + } else { + err = fd_read_lines(fd, &names); + if (err < 0) + goto out; + } err = reftable_stack_reload_once(st, names, reuse_open); if (!err) @@ -356,12 +367,16 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, names = NULL; free_names(names_after); names_after = NULL; + close(fd); + fd = -1; delay = delay + (delay * rand()) / RAND_MAX + 1; sleep_millisec(delay); } out: + if (fd >= 0) + close(fd); free_names(names); free_names(names_after); return err; From patchwork Thu Jan 11 10:06:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13517057 Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com [64.147.123.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA53E12E53 for ; Thu, 11 Jan 2024 10:06:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="f0SD5vXG"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="p/LtjTka" Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id D9BB93200A3F; Thu, 11 Jan 2024 05:06:50 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Thu, 11 Jan 2024 05:06:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1704967610; x=1705054010; bh=v4woTvYcNA S3r4VRYiQj4ffrk8X6Z6CjqtlTufs0Z8M=; b=f0SD5vXGBTSAGoBpkqFvX+ZDwf BifaFHvt85r2VXeSB7VadYM0Z4LvtHSldI98PUr6NWIF/gJctOFhZUb8AyWGYRU4 /RcNa3LBKwXqktTgFE8tDzDgEQrgI0sFPryFZ5b1MjUzGZCugMcP+QAh8tPADpLQ 9ALDhHx0K7ImkSAJZ94pDHkX2qT0xYhMcwcoyuzuHzjJN0L978qNrgEGdZYrgWB5 XNuOBNYNv/ybHK71qz52NF7+bw1k2gmWLEOgHxnh1fzNIQACCrRdgXKKId6RN5Mq 1+oI1CkPxUkwIaHv0dd3sf4B1rqaprX3+w7/IgABaRsBVXymwGIE+K1x6haw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1704967610; x=1705054010; bh=v4woTvYcNAS3r4VRYiQj4ffrk8X6 Z6CjqtlTufs0Z8M=; b=p/LtjTkanOZdH3SIFtkCI0OgLpvj7uR2vZUl3fCET6xw B63ksXCzkTgtt1h6ijx+b9RjQJgE9FcuomMUJUBg+bTpJXcO1jm1VubDpYalQt7C KNvDlOtqTDfXfimjayT6ldo4AkIp5Oy2xepRtx5PhC0AT9yGjSCYOdLSJQzSNbgy 4bYRSz0zboOqMBIjCZgp+7G8WrWZthzkWDzO3iItLFo7HaDByxrtt9KeGpunT6qj 70TKd80O/8uZRZ1bj4ieg14515Dxr5gJedRE1vAAsH306A8hVLIVP1TlMHfkufVE KN/KdNMeUNUTRjmW0gZSpzeYraIzqWGnP6wPvZyqxA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdeifedguddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvvefukfhfgggtuggjsehgtd erredttdejnecuhfhrohhmpefrrghtrhhitghkucfuthgvihhnhhgrrhguthcuoehpshes phhkshdrihhmqeenucggtffrrghtthgvrhhnpeetueevhffhudefvdegieeuieelgedthf egfedtueevjeejtdfgjeehudejuedtudenucevlhhushhtvghrufhiiigvpedunecurfgr rhgrmhepmhgrihhlfhhrohhmpehpshesphhkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 11 Jan 2024 05:06:49 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 79472dab (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 11 Jan 2024 10:04:07 +0000 (UTC) Date: Thu, 11 Jan 2024 11:06:48 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Junio C Hamano Subject: [PATCH v2 3/5] reftable/stack: use stat info to avoid re-reading stack list Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Whenever we call into the refs interfaces we potentially have to reload refs in case they have been concurrently modified, either in-process or externally. While this happens somewhat automatically for loose refs because we simply try to re-read the files, the "packed" backend will reload its snapshot of the packed-refs file in case its stat info has changed since last reading it. In the reftable backend we have a similar mechanism that is provided by `reftable_stack_reload()`. This function will read the list of stacks from "tables.list" and, if they have changed from the currently stored list, reload the stacks. This is heavily inefficient though, as we have to check whether the stack is up-to-date on basically every read and thus keep on re-reading the file all the time even if it didn't change at all. We can do better and use the same stat(3P)-based mechanism that the "packed" backend uses. Instead of reading the file, we will only open the file descriptor, fstat(3P) it, and then compare the info against the cached value from the last time we have updated the stack. This should always work alright because "tables.list" is updated atomically via a rename, so even if the ctime or mtime wasn't granular enough to identify a change, at least the inode number or file size should have changed. This change significantly speeds up operations where many refs are read, like when using git-update-ref(1). The following benchmark creates N refs in an otherwise-empty repository via `git update-ref --stdin`: Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~) Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.4 ms, System: 2.6 ms] Range (min … max): 4.8 ms … 7.2 ms 109 runs Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~) Time (mean ± σ): 19.1 ms ± 0.9 ms [User: 8.9 ms, System: 9.9 ms] Range (min … max): 18.4 ms … 26.7 ms 72 runs Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~) Time (mean ± σ): 1.336 s ± 0.018 s [User: 0.590 s, System: 0.724 s] Range (min … max): 1.314 s … 1.373 s 10 runs Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD) Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.4 ms, System: 2.6 ms] Range (min … max): 4.8 ms … 7.2 ms 109 runs Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD) Time (mean ± σ): 14.8 ms ± 0.2 ms [User: 7.1 ms, System: 7.5 ms] Range (min … max): 14.2 ms … 15.2 ms 82 runs Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD) Time (mean ± σ): 927.6 ms ± 5.3 ms [User: 437.8 ms, System: 489.5 ms] Range (min … max): 919.4 ms … 936.4 ms 10 runs Summary update-ref: create many refs (refcount = 1, revision = HEAD) ran 1.00 ± 0.07 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~) 2.89 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD) 3.74 ± 0.25 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~) 181.26 ± 8.30 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD) 261.01 ± 12.35 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~) Signed-off-by: Patrick Steinhardt --- reftable/stack.c | 12 +++++++++++- reftable/stack.h | 1 + reftable/system.h | 1 + 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/reftable/stack.c b/reftable/stack.c index b1ee247601..c28d82299d 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -175,6 +175,7 @@ void reftable_stack_destroy(struct reftable_stack *st) st->readers_len = 0; FREE_AND_NULL(st->readers); } + stat_validity_clear(&st->list_validity); FREE_AND_NULL(st->list_file); FREE_AND_NULL(st->reftable_dir); reftable_free(st); @@ -374,7 +375,11 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, sleep_millisec(delay); } + stat_validity_update(&st->list_validity, fd); + out: + if (err) + stat_validity_clear(&st->list_validity); if (fd >= 0) close(fd); free_names(names); @@ -388,8 +393,13 @@ static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st, static int stack_uptodate(struct reftable_stack *st) { char **names = NULL; - int err = read_lines(st->list_file, &names); + int err; int i = 0; + + if (stat_validity_check(&st->list_validity, st->list_file)) + return 0; + + err = read_lines(st->list_file, &names); if (err < 0) return err; diff --git a/reftable/stack.h b/reftable/stack.h index f57005846e..3f80cc598a 100644 --- a/reftable/stack.h +++ b/reftable/stack.h @@ -14,6 +14,7 @@ license that can be found in the LICENSE file or at #include "reftable-stack.h" struct reftable_stack { + struct stat_validity list_validity; char *list_file; char *reftable_dir; int disable_auto_compact; diff --git a/reftable/system.h b/reftable/system.h index 6b74a81514..2cc7adf271 100644 --- a/reftable/system.h +++ b/reftable/system.h @@ -12,6 +12,7 @@ license that can be found in the LICENSE file or at /* This header glues the reftable library to the rest of Git */ #include "git-compat-util.h" +#include "statinfo.h" #include "strbuf.h" #include "hash-ll.h" /* hash ID, sizes.*/ #include "dir.h" /* remove_dir_recursively, for tests.*/ From patchwork Thu Jan 11 10:06:52 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13517058 Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com [64.147.123.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1DD5814F70 for ; Thu, 11 Jan 2024 10:06:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="H7O5l9SH"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="Z1L52lgA" Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 0FB3B320034E; Thu, 11 Jan 2024 05:06:54 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute2.internal (MEProxy); Thu, 11 Jan 2024 05:06:55 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1704967614; x=1705054014; bh=/vD9005ykh MTzCKwvK6H6cu8M7XWK29+pcFlZZ0hbwk=; b=H7O5l9SHVE6/CUOW/cpTwsmoni TlPmH2kRVUKbLrJ+7ZVG6a8icZQdyNZjkGg5sWGLwr4+uZXlgSJvaIoCSZkC26id Jtit8qYU/VxxoxBFHFkKoAjvzzZ0lhnGtT8XjYDFqc+1Wjd2TmIZTagsPLyFtEtX nmK+WUAUCdEYgaHbdN9RMTuy/H7/Z6ctOUm6A8c4ZlbibAPxMKWCcTJ6lOya0z6E lR7BnEAr+r1LRv4oc8UEmcVZwZOPeld6ZOSyhP37T3YkW6kqe/jloD+11Ik8vTj7 hyD65ZjAmXo26TfMFQaI583HW5ukSsdeZwc2PHKckdkkuDZYiRPgB/MZUqXA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1704967614; x=1705054014; bh=/vD9005ykhMTzCKwvK6H6cu8M7XW K29+pcFlZZ0hbwk=; b=Z1L52lgAPkbbMfEiUBTNBh6Isr1Bv4OFxORUQmXLysF5 v1PchzzW0wVJ7s3hdGGa0cArYC3NygA2IZvv8Psn9NModhOaoufCVoVfljv9kkv/ TMhrmELKzA99JnMJMRb08WkFlNSt7+Tb6umIT/fboej4TROx3WdUidugfYfdEMBF e8bG+nvqkG4Aldo++r4CXEOPi6HKTmEyx6XRrY+Q5OUhCUe7n5568tYlrWkXkI3G YGQ0LzTmQklGOnhOIV7De7AyrdAtXRmaXDE0TPf2uyqHa9ots5YRcn+Wr+oU3kfo d35UONcOYkYp4EKSWsGdxaTddK+EMzAGFj10SAflBA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdeifedguddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvvefukfhfgggtuggjsehgtd erredttddvnecuhfhrohhmpefrrghtrhhitghkucfuthgvihhnhhgrrhguthcuoehpshes phhkshdrihhmqeenucggtffrrghtthgvrhhnpeeukedtvedtffevleejtefgheehieegke eluddvfeefgeehgfeltddtheejleffteenucevlhhushhtvghrufhiiigvpedunecurfgr rhgrmhepmhgrihhlfhhrohhmpehpshesphhkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 11 Jan 2024 05:06:53 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id 62dc9229 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 11 Jan 2024 10:04:12 +0000 (UTC) Date: Thu, 11 Jan 2024 11:06:52 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Junio C Hamano Subject: [PATCH v2 4/5] reftable/blocksource: refactor code to match our coding style Message-ID: <96e79dc3ba011915924a8a4766eba426ef83abb5.1704966670.git.ps@pks.im> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Refactor `reftable_block_source_from_file()` to match our coding style better. Signed-off-by: Patrick Steinhardt --- reftable/blocksource.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/reftable/blocksource.c b/reftable/blocksource.c index a1ea304429..1e2fb5e9fd 100644 --- a/reftable/blocksource.c +++ b/reftable/blocksource.c @@ -125,24 +125,23 @@ static struct reftable_block_source_vtable file_vtable = { int reftable_block_source_from_file(struct reftable_block_source *bs, const char *name) { - struct stat st = { 0 }; - int err = 0; - int fd = open(name, O_RDONLY); - struct file_block_source *p = NULL; + struct file_block_source *p; + struct stat st; + int fd; + + fd = open(name, O_RDONLY); if (fd < 0) { - if (errno == ENOENT) { + if (errno == ENOENT) return REFTABLE_NOT_EXIST_ERROR; - } return -1; } - err = fstat(fd, &st); - if (err < 0) { + if (fstat(fd, &st) < 0) { close(fd); return REFTABLE_IO_ERROR; } - p = reftable_calloc(sizeof(struct file_block_source)); + p = reftable_calloc(sizeof(*p)); p->size = st.st_size; p->fd = fd; From patchwork Thu Jan 11 10:06:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Patrick Steinhardt X-Patchwork-Id: 13517059 Received: from wout1-smtp.messagingengine.com (wout1-smtp.messagingengine.com [64.147.123.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4454E14F9C for ; Thu, 11 Jan 2024 10:07:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=pks.im Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=pks.im Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=pks.im header.i=@pks.im header.b="EUQNMBJD"; dkim=pass (2048-bit key) header.d=messagingengine.com header.i=@messagingengine.com header.b="EtrCOuXr" Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.west.internal (Postfix) with ESMTP id 3357E3200A66; Thu, 11 Jan 2024 05:06:59 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute1.internal (MEProxy); Thu, 11 Jan 2024 05:06:59 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pks.im; h=cc:cc :content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1704967618; x=1705054018; bh=49AMbmFfk9 mno+MnrNQksQab7rHFduMeb4BWWR/XHC0=; b=EUQNMBJDjJlEPyXwkzt9JRvyfF rJC3p0UDUEJEef1VIjsdsbFVJSN6uH3sl9NGY1EDmYGofzfhQXA+TiwPZW7/dkY3 H7GE80McUx00jbFFzjeggbDPnUwgE6Zg7CByeQKmspoh2jZQ2rngw1xOzBafu6od BwAT1btLQnTx+AszjjvDupDxMZbombrpl6G1/1sCq2hc6wlsqLROSNGM9QR60ZsV s1m63w3EUCFNRkucRryoMHAdAgT75txR3CUDYwjPTTscc5O5cV6mkgXlRgIeGlCb rWtmtiHtuRgF9vN4t+LlqlBm5pIZEn8RogpjL5pKTNOaRojhaeIH/7M93Dsw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm2; t=1704967618; x=1705054018; bh=49AMbmFfk9mno+MnrNQksQab7rHF duMeb4BWWR/XHC0=; b=EtrCOuXrwJSXjteig8OqLuOb+nRXYXoEhVILnWzY02c1 EFnULY+CxCRjDmvNiP1+aFEGTd2e8Bg2AryHUaVXSmBQD2U7gDZD510s5fDxWKT3 vxdaIj540VNceiPxr9Z9yRU30ZDQlB2C2GFxxePlFcTkz66VeVWAJZJBThaRYZ6c maIM+N0RmAcXjdfQc4KrK6eh4td4NVTxIgYFeRFj1xfPGCoUQJimI6ubjhW1hPki WO/VEruUauUvw8AvPyKzGr3ujNwqbCQba61idmDMjEjBNUVhRLkeJ0X/zeEl8eUQ ooHgZ4KXo2slG8sfv6QiE9WLl2NWbvns3ZRPuIWlPg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvkedrvdeifedguddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpeffhffvvefukfhfgggtuggjsehgtd erredttdejnecuhfhrohhmpefrrghtrhhitghkucfuthgvihhnhhgrrhguthcuoehpshes phhkshdrihhmqeenucggtffrrghtthgvrhhnpeetueevhffhudefvdegieeuieelgedthf egfedtueevjeejtdfgjeehudejuedtudenucevlhhushhtvghrufhiiigvpedtnecurfgr rhgrmhepmhgrihhlfhhrohhmpehpshesphhkshdrihhm X-ME-Proxy: Feedback-ID: i197146af:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 11 Jan 2024 05:06:57 -0500 (EST) Received: by vm-mail (OpenSMTPD) with ESMTPSA id cd3347fd (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 11 Jan 2024 10:04:16 +0000 (UTC) Date: Thu, 11 Jan 2024 11:06:56 +0100 From: Patrick Steinhardt To: git@vger.kernel.org Cc: Han-Wen Nienhuys , Junio C Hamano Subject: [PATCH v2 5/5] reftable/blocksource: use mmap to read tables Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The blocksource interface provides an interface to read blocks from a reftable table. This interface is implemented using read(3P) calls on the underlying file descriptor. While this works alright, this pattern is very inefficient when repeatedly querying the reftable stack for one or more refs. This inefficiency can mostly be attributed to the fact that we often need to re-read the same blocks over and over again, and every single time we need to call read(3P) again. A natural fit in this context is to use mmap(3P) instead of read(3P), which has a bunch of benefits: - We do not need to come up with a caching strategy for some of the blocks as this will be handled by the kernel already. - We can avoid the overhead of having to call into the read(3P) syscall repeatedly. - We do not need to allocate returned blocks repeatedly, but can instead hand out pointers into the mmapped region directly. Using mmap comes with a significant drawback on Windows though, because mmapped files cannot be deleted and neither is it possible to rename files onto an mmapped file. But for one, the reftable library gracefully handles the case where auto-compaction cannot delete a still-open stack already and ignores any such errors. Also, `reftable_stack_clean()` will prune stale tables which are not referenced by "tables.list" anymore so that those files can eventually be pruned. And second, we never rewrite already-written stacks, so it does not matter that we cannot rename a file over an mmaped file, either. Another unfortunate property of mmap is that it is not supported by all systems. But given that the size of reftables should typically be rather limited (megabytes at most in the vast majority of repositories), we can use the fallback implementation provided by `git_mmap()` which reads the whole file into memory instead. This is the same strategy that the "packed" backend uses. While this change doesn't significantly improve performance in the case where we're seeking through stacks once (like e.g. git-for-each-ref(1) would). But it does speed up usecases where there is lots of random access to refs, e.g. when writing. The following benchmark demonstrates these savings with git-update-ref(1) creating N refs in an otherwise empty repository: Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~) Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.5 ms, System: 2.5 ms] Range (min … max): 4.8 ms … 7.1 ms 111 runs Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~) Time (mean ± σ): 14.8 ms ± 0.5 ms [User: 7.1 ms, System: 7.5 ms] Range (min … max): 14.1 ms … 18.7 ms 84 runs Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~) Time (mean ± σ): 926.4 ms ± 5.6 ms [User: 448.5 ms, System: 477.7 ms] Range (min … max): 920.0 ms … 936.1 ms 10 runs Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD) Time (mean ± σ): 5.0 ms ± 0.2 ms [User: 2.4 ms, System: 2.5 ms] Range (min … max): 4.7 ms … 5.4 ms 111 runs Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD) Time (mean ± σ): 10.5 ms ± 0.2 ms [User: 5.0 ms, System: 5.3 ms] Range (min … max): 10.0 ms … 10.9 ms 93 runs Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD) Time (mean ± σ): 529.6 ms ± 9.1 ms [User: 268.0 ms, System: 261.4 ms] Range (min … max): 522.4 ms … 547.1 ms 10 runs Summary update-ref: create many refs (refcount = 1, revision = HEAD) ran 1.01 ± 0.06 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~) 2.08 ± 0.07 times faster than update-ref: create many refs (refcount = 100, revision = HEAD) 2.95 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~) 105.33 ± 3.76 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD) 184.24 ± 5.89 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~) Theoretically, we could also replicate the strategy of the "packed" backend where small tables are read into memory instead of using mmap. Benchmarks did not confirm that this has a performance benefit though. Signed-off-by: Patrick Steinhardt --- reftable/blocksource.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/reftable/blocksource.c b/reftable/blocksource.c index 1e2fb5e9fd..8c41e3c70f 100644 --- a/reftable/blocksource.c +++ b/reftable/blocksource.c @@ -76,8 +76,8 @@ struct reftable_block_source malloc_block_source(void) } struct file_block_source { - int fd; uint64_t size; + unsigned char *data; }; static uint64_t file_size(void *b) @@ -87,19 +87,12 @@ static uint64_t file_size(void *b) static void file_return_block(void *b, struct reftable_block *dest) { - if (dest->len) - memset(dest->data, 0xff, dest->len); - reftable_free(dest->data); } -static void file_close(void *b) +static void file_close(void *v) { - int fd = ((struct file_block_source *)b)->fd; - if (fd > 0) { - close(fd); - ((struct file_block_source *)b)->fd = 0; - } - + struct file_block_source *b = v; + munmap(b->data, b->size); reftable_free(b); } @@ -108,9 +101,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off, { struct file_block_source *b = v; assert(off + size <= b->size); - dest->data = reftable_malloc(size); - if (pread_in_full(b->fd, dest->data, size, off) != size) - return -1; + dest->data = b->data + off; dest->len = size; return size; } @@ -143,7 +134,8 @@ int reftable_block_source_from_file(struct reftable_block_source *bs, p = reftable_calloc(sizeof(*p)); p->size = st.st_size; - p->fd = fd; + p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0); + close(fd); assert(!bs->ops); bs->ops = &file_vtable;