Message ID | 40d9f75cf20d4b76adb1683709e054e264d4e06f.1722435214.git.ps@pks.im (mailing list archive)
---|---
State | Superseded
Series | reftable: improvements and fixes for compaction
On 24/07/31 04:15PM, Patrick Steinhardt wrote:
> When compacting tables, we store the locks of all tables we are about to
> compact in the `table_locks` array. As we currently only ever compact
> all tables in the user-provided range or none, we simply track those
> locks via the indices of the respective tables in the merged stack.
>
> This is about to change though, as we will introduce a mode where auto
> compaction gracefully handles the case of already-locked files. In this
> case, it may happen that we only compact a subset of the user-supplied
> range of tables. In this case, the indices will not necessarily match
> the lock indices anymore.
>
> Refactor the code such that we track the number of locks via a separate
> variable. The resulting code is expected to perform the same, but will
> make it easier to perform the described change.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/stack.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index e5959d2c76..07e7ffc6b9 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1016,7 +1016,7 @@ static int stack_compact_range(struct reftable_stack *st,
>  	struct lock_file *table_locks = NULL;
>  	struct tempfile *new_table = NULL;
>  	int is_empty_table = 0, err = 0;
> -	size_t i;
> +	size_t i, nlocks = 0;
>
>  	if (first > last || (!expiry && first == last)) {
>  		err = 0;
> @@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
>  	for (i = first; i <= last; i++) {
>  		stack_filename(&table_name, st, reader_name(st->readers[i]));
>
> -		err = hold_lock_file_for_update(&table_locks[i - first],
> +		err = hold_lock_file_for_update(&table_locks[nlocks],

Tables in the list are locked in reverse order. Previously, the locks were
also added to `table_locks` in reverse order. This could leave some elements
empty at the beginning if only a subset of tables are locked. Now each table
lock is added starting from index 0. This means the contents of `table_locks`
are now in a reversed order. Ultimately, this makes no difference though
because all the usages also have updated `table_locks` accesses meaning the
same order is maintained in practice.

So far makes sense :)

>  						table_name.buf, LOCK_NO_DEREF);
>  		if (err < 0) {
>  			if (errno == EEXIST)
> @@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
>  		 * run into file descriptor exhaustion when we compress a lot
>  		 * of tables.
>  		 */
> -		err = close_lock_file_gently(&table_locks[i - first]);
> +		err = close_lock_file_gently(&table_locks[nlocks++]);
>  		if (err < 0) {
>  			err = REFTABLE_IO_ERROR;
>  			goto done;
> @@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
>  	 * Delete the old tables. They may still be in use by concurrent
>  	 * readers, so it is expected that unlinking tables may fail.
>  	 */
> -	for (i = first; i <= last; i++) {
> -		struct lock_file *table_lock = &table_locks[i - first];
> +	for (i = 0; i < nlocks; i++) {
> +		struct lock_file *table_lock = &table_locks[i];
>  		char *table_path = get_locked_file_path(table_lock);
>  		unlink(table_path);
>  		free(table_path);
> @@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
>
>  done:
>  	rollback_lock_file(&tables_list_lock);
> -	for (i = first; table_locks && i <= last; i++)
> -		rollback_lock_file(&table_locks[i - first]);
> +	for (i = 0; table_locks && i < nlocks; i++)
> +		rollback_lock_file(&table_locks[i]);
>  	reftable_free(table_locks);
>
>  	delete_tempfile(&new_table);
> --
> 2.46.0.dirty
>
Patrick Steinhardt <ps@pks.im> writes:

> -	size_t i;
> +	size_t i, nlocks = 0;
>
>  	if (first > last || (!expiry && first == last)) {
>  		err = 0;
> @@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
>  	for (i = first; i <= last; i++) {
>  		stack_filename(&table_name, st, reader_name(st->readers[i]));
>
> -		err = hold_lock_file_for_update(&table_locks[i - first],
> +		err = hold_lock_file_for_update(&table_locks[nlocks],
>  						table_name.buf, LOCK_NO_DEREF);
>  		if (err < 0) {
>  			if (errno == EEXIST)
> @@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
>  		 * run into file descriptor exhaustion when we compress a lot
>  		 * of tables.
>  		 */
> -		err = close_lock_file_gently(&table_locks[i - first]);
> +		err = close_lock_file_gently(&table_locks[nlocks++]);
>  		if (err < 0) {
>  			err = REFTABLE_IO_ERROR;
>  			goto done;

The only unusual control flow in this loop that runs i from first to last is
to leave it upon an error, so "i - first" and "nlocks" are always the same,
at this step in the patch series.

> @@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
>  	 * Delete the old tables. They may still be in use by concurrent
>  	 * readers, so it is expected that unlinking tables may fail.
>  	 */
> -	for (i = first; i <= last; i++) {
> -		struct lock_file *table_lock = &table_locks[i - first];
> +	for (i = 0; i < nlocks; i++) {
> +		struct lock_file *table_lock = &table_locks[i];
>  		char *table_path = get_locked_file_path(table_lock);
>  		unlink(table_path);
>  		free(table_path);

And this one at this step in the patch series is skipped if the earlier loop
saw even a single error, so again, this is a benign noop change.

> @@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
>
>  done:
>  	rollback_lock_file(&tables_list_lock);
> -	for (i = first; table_locks && i <= last; i++)
> -		rollback_lock_file(&table_locks[i - first]);
> +	for (i = 0; table_locks && i < nlocks; i++)
> +		rollback_lock_file(&table_locks[i]);

This is a true bugfix, isn't it? If we failed to create a lock file
somewhere in the middle, we used to still go ahead and attempt
rollback_lock_file() on all of them. Now we roll back only the ones on
which we successfully called hold_lock_file_for_update().

I wonder why nobody segfaulted there after a failed lock. The answer
probably is that lk->tempfile being NULL will safely bypass most of the
things because is_tempfile_active() would say "false" on such a lockfile.
But it probably still was wrong to call rollback_lock_file() on a "struct
lockfile" full of NUL-bytes, and it is good that we no longer do that.

>  	reftable_free(table_locks);
>
>  	delete_tempfile(&new_table);
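The cleanup concern raised here can be illustrated with a small standalone C
sketch. The names below (`toy_lock`, `acquire`, `release`) are invented for
illustration and are not git's lockfile API; the sketch only mirrors why
bounding the rollback loop by the number of successfully acquired locks
matters once acquisition can fail partway through the range.

```c
/*
 * Standalone sketch of the cleanup pattern discussed above. The types and
 * helpers are invented for illustration; this is not git's lockfile API.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_lock {
	char *path; /* NULL until the lock has actually been taken */
};

/* Pretend to take a lock; fail for table 2 to simulate a locked table. */
static int acquire(struct toy_lock *lk, size_t table_index)
{
	char buf[32];

	if (table_index == 2)
		return -1;
	snprintf(buf, sizeof(buf), "table-%zu.lock", table_index);
	lk->path = strdup(buf);
	return 0;
}

static void release(struct toy_lock *lk)
{
	printf("rolling back %s\n", lk->path ? lk->path : "(never acquired!)");
	free(lk->path);
	lk->path = NULL;
}

int main(void)
{
	size_t first = 0, last = 4, i, nlocks = 0;
	struct toy_lock *locks = calloc(last - first + 1, sizeof(*locks));

	for (i = first; i <= last; i++) {
		if (acquire(&locks[nlocks], i) < 0)
			break; /* acquisition failed partway through the range */
		nlocks++;
	}

	/*
	 * Bounding cleanup by `nlocks` touches only locks we actually hold.
	 * Looping from `first` to `last` instead would also "roll back" the
	 * zero-initialized tail entries that were never acquired.
	 */
	for (i = 0; i < nlocks; i++)
		release(&locks[i]);

	free(locks);
	return 0;
}
```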
On Fri, Aug 02, 2024 at 04:00:52PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > @@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,
> >
> >  done:
> >  	rollback_lock_file(&tables_list_lock);
> > -	for (i = first; table_locks && i <= last; i++)
> > -		rollback_lock_file(&table_locks[i - first]);
> > +	for (i = 0; table_locks && i < nlocks; i++)
> > +		rollback_lock_file(&table_locks[i]);
>
> This is a true bugfix, isn't it? If we failed to create a lock file
> somewhere in the middle, we used to still go ahead and attempt
> rollback_lock_file() on all of them. Now we roll back only the ones on
> which we successfully called hold_lock_file_for_update().
>
> I wonder why nobody segfaulted there after a failed lock. The answer
> probably is that lk->tempfile being NULL will safely bypass most of the
> things because is_tempfile_active() would say "false" on such a lockfile.
> But it probably still was wrong to call rollback_lock_file() on a "struct
> lockfile" full of NUL-bytes, and it is good that we no longer do that.

I don't think it is. `table_locks` is an array of `struct lockfile` and is
allocated with calloc(3P), so we know that each uninitialized lock will be
all zeroes.

For each uninitialized lock, `rollback_lock_file()` calls
`delete_tempfile(&lk->tempfile)`, which derefs the pointer and thus
essentially calls `!is_tempfile_active(lk->tempfile)`, and that ultimately
ends up checking whether `tempfile` is a `NULL` pointer or not. And as the
structs were zero-initialized, it is a `NULL` pointer and thus we bail out.

We also have tests that exercise this logic, so it also seems to be fine in
practice.

Patrick
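For illustration, here is a minimal self-contained sketch of the pattern
Patrick describes. The `toy_*` names are invented stand-ins, not the real
`lockfile.h`/`tempfile.h` code; the point is only that a zero-initialized
lock carries a NULL tempfile pointer, so the rollback path returns early
without dereferencing anything.

```c
/*
 * Toy model of "rolling back" a lock that was never acquired: calloc()
 * leaves the tempfile pointer NULL, and the delete path bails out on
 * exactly that check, so the operation is a harmless no-op.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_tempfile {
	char *filename;
};

struct toy_lock_file {
	struct toy_tempfile *tempfile; /* NULL when zero-initialized */
};

static int toy_is_tempfile_active(const struct toy_tempfile *t)
{
	return t != NULL;
}

static void toy_delete_tempfile(struct toy_tempfile **tp)
{
	if (!toy_is_tempfile_active(*tp))
		return; /* the bail-out that prevents a segfault */
	printf("deleting %s\n", (*tp)->filename);
	free((*tp)->filename);
	free(*tp);
	*tp = NULL;
}

static void toy_rollback_lock_file(struct toy_lock_file *lk)
{
	toy_delete_tempfile(&lk->tempfile);
}

int main(void)
{
	/* All-zero locks, just like calloc(3P) leaves unacquired table locks. */
	struct toy_lock_file *locks = calloc(3, sizeof(*locks));
	size_t i;

	for (i = 0; i < 3; i++)
		toy_rollback_lock_file(&locks[i]); /* no-op, no crash */

	free(locks);
	return 0;
}
```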
diff --git a/reftable/stack.c b/reftable/stack.c
index e5959d2c76..07e7ffc6b9 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1016,7 +1016,7 @@ static int stack_compact_range(struct reftable_stack *st,
 	struct lock_file *table_locks = NULL;
 	struct tempfile *new_table = NULL;
 	int is_empty_table = 0, err = 0;
-	size_t i;
+	size_t i, nlocks = 0;

 	if (first > last || (!expiry && first == last)) {
 		err = 0;
@@ -1051,7 +1051,7 @@ static int stack_compact_range(struct reftable_stack *st,
 	for (i = first; i <= last; i++) {
 		stack_filename(&table_name, st, reader_name(st->readers[i]));

-		err = hold_lock_file_for_update(&table_locks[i - first],
+		err = hold_lock_file_for_update(&table_locks[nlocks],
						table_name.buf, LOCK_NO_DEREF);
 		if (err < 0) {
 			if (errno == EEXIST)
@@ -1066,7 +1066,7 @@ static int stack_compact_range(struct reftable_stack *st,
 		 * run into file descriptor exhaustion when we compress a lot
 		 * of tables.
 		 */
-		err = close_lock_file_gently(&table_locks[i - first]);
+		err = close_lock_file_gently(&table_locks[nlocks++]);
 		if (err < 0) {
 			err = REFTABLE_IO_ERROR;
 			goto done;
@@ -1183,8 +1183,8 @@ static int stack_compact_range(struct reftable_stack *st,
 	 * Delete the old tables. They may still be in use by concurrent
 	 * readers, so it is expected that unlinking tables may fail.
 	 */
-	for (i = first; i <= last; i++) {
-		struct lock_file *table_lock = &table_locks[i - first];
+	for (i = 0; i < nlocks; i++) {
+		struct lock_file *table_lock = &table_locks[i];
 		char *table_path = get_locked_file_path(table_lock);
 		unlink(table_path);
 		free(table_path);
@@ -1192,8 +1192,8 @@ static int stack_compact_range(struct reftable_stack *st,

 done:
 	rollback_lock_file(&tables_list_lock);
-	for (i = first; table_locks && i <= last; i++)
-		rollback_lock_file(&table_locks[i - first]);
+	for (i = 0; table_locks && i < nlocks; i++)
+		rollback_lock_file(&table_locks[i]);
 	reftable_free(table_locks);

 	delete_tempfile(&new_table);
When compacting tables, we store the locks of all tables we are about to
compact in the `table_locks` array. As we currently only ever compact all
tables in the user-provided range or none, we simply track those locks via
the indices of the respective tables in the merged stack.

This is about to change though, as we will introduce a mode where auto
compaction gracefully handles the case of already-locked files. In this
case, it may happen that we only compact a subset of the user-supplied
range of tables. In this case, the indices will not necessarily match the
lock indices anymore.

Refactor the code such that we track the number of locks via a separate
variable. The resulting code is expected to perform the same, but will make
it easier to perform the described change.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
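As a rough illustration of why the separate counter helps with the change
the commit message anticipates, here is a hypothetical sketch (toy types and
names, not the actual follow-up patch): once already-locked tables can be
skipped, the acquired locks pack densely from index 0 while the table
indices they refer to become sparse, so `i - first` stops being a valid lock
index.

```c
/*
 * Hypothetical sketch of skipping already-locked tables. Locks end up
 * dense at the front of the array even though the table indices are not,
 * which is why a separate `nlocks` counter is needed.
 */
#include <stdio.h>
#include <stdlib.h>

struct toy_lock {
	size_t table_index; /* the table this lock belongs to */
};

/* Pretend table 1 is already locked by a concurrent writer. */
static int try_lock_table(size_t table_index)
{
	return table_index == 1 ? -1 : 0;
}

int main(void)
{
	size_t first = 0, last = 3, i, nlocks = 0;
	struct toy_lock *locks = calloc(last - first + 1, sizeof(*locks));

	for (i = first; i <= last; i++) {
		if (try_lock_table(i) < 0)
			continue; /* gracefully skip the locked table */
		locks[nlocks].table_index = i;
		nlocks++;
	}

	/* locks[0..nlocks) are dense; their table indices need not be. */
	for (i = 0; i < nlocks; i++)
		printf("lock %zu -> table %zu\n", i, locks[i].table_index);

	free(locks);
	return 0;
}
```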