diff mbox series

[8/8] reftable/stack: handle locked tables during auto-compaction

Message ID dc2217830700acaac50d96361352ff433aa57a4d.1722435214.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: improvements and fixes for compaction | expand

Commit Message

Patrick Steinhardt July 31, 2024, 2:15 p.m. UTC
When compacting tables, it may happen that we want to compact a set of
tables which are already locked by a concurrent process that compacts
them. In the case where we wanted to perform a full compaction of all
tables it is sensible to bail out in this case, as we cannot fulfill the
requested action.

But when performing auto-compaction it isn't necessarily in our best
interest of us to abort the whole operation. For example, due to the
geometric compacting schema that we use, it may be that process A takes
a lot of time to compact the bulk of all tables whereas process B
appends a bunch of new tables to the stack. B would in this case also
notice that it has to compact the tables that process A is compacting
already and thus also try to compact the same range, probably including
the new tables it has appended. But because those tables are locked
already, it will fail and thus abort the complete auto-compaction. The
consequence is that the stack will grow longer and longer while A isn't
yet done with compaction, which will lead to a growing performance
impact.

Instead of aborting auto-compaction altogether, let's gracefully handle
this situation by instead compacting tables which aren't locked. To do
so, instead of locking from the beginning of the slice-to-be-compacted,
we start locking tables from the end of the slice. Once we hit the first
table that is locked already, we abort. If we succeded to lock two or
more tables, then we simply reduce the slice of tables that we're about
to compact to those which we managed to lock.

This ensures that we can at least make some progress for compaction in
said scenario. It also helps in other scenarios, like for example when a
process died and left a stale lockfile behind. In such a case we can at
least ensure some compaction on a best-effort basis.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/stack.c           | 61 +++++++++++++++++++++++++++++++-------
 reftable/stack_test.c      | 12 ++++----
 t/t0610-reftable-basics.sh | 21 ++++++++-----
 3 files changed, 71 insertions(+), 23 deletions(-)

Comments

Junio C Hamano Aug. 2, 2024, 11:24 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> -	for (i = first; i <= last; i++) {
> -		stack_filename(&table_name, st, reader_name(st->readers[i]));
> +	for (i = last + 1; i > first; i--) {
> +		stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
>  
>  		err = hold_lock_file_for_update(&table_locks[nlocks],
>  						table_name.buf, LOCK_NO_DEREF);
>  		if (err < 0) {
> -			if (errno == EEXIST)
> +			/*
> +			 * When the table is locked already we may do a
> +			 * best-effort compaction and compact only the tables
> +			 * that we have managed to lock so far. This of course
> +			 * requires that we have been able to lock at least two
> +			 * tables, otherwise there would be nothing to compact.
> +			 * In that case, we return a lock error to our caller.
> +			 */
> +			if (errno == EEXIST && last - (i - 1) >= 2 &&
> +			    flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
> +				err = 0;
> +				/*
> +				 * The subtraction is to offset the index, the
> +				 * addition is to only compact up to the table
> +				 * of the preceding iteration. They obviously
> +				 * cancel each other out, but that may be
> +				 * non-obvious when it was omitted.
> +				 */
> +				first = (i - 1) + 1;
> +				break;

OK, so this case no longer jumps to "done" label.  Starting from the
high end of the st->readers[] array down to what we manged to take
locks on will be processed with the existing logic that compacted
first..last, which makes sense.

> +			} else if (errno == EEXIST) {
>  				err = REFTABLE_LOCK_ERROR;
> -			else
> +				goto done;
> +			} else {
>  				err = REFTABLE_IO_ERROR;
> -			goto done;
> +				goto done;
> +			}
>  		}
>  
>  		/*
> @@ -1270,7 +1308,7 @@ static int stack_compact_range(struct reftable_stack *st,
>  	 * delete the files after we closed them on Windows, so this needs to
>  	 * happen first.
>  	 */
> -	err = reftable_stack_reload_maybe_reuse(st, first < last);
> +	err = reftable_stack_reload_maybe_reuse(st, first_to_replace < last_to_replace);

What is this change about?

No code changed that computes first_to_replace and last_to_replace?
Perhaps before this step, using first/last and using
first_to_replace/last_to_replace did not make any difference,
because we never dealt with a case where we failed to lock any of
the tables?

I am wondering if this would be helped by a no-op clarification
before the actual behaviour change, similar to how step 4/8 added
the nlocks variable.

Thanks.
Patrick Steinhardt Aug. 5, 2024, 12:11 p.m. UTC | #2
On Fri, Aug 02, 2024 at 04:24:42PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > +			} else if (errno == EEXIST) {
> >  				err = REFTABLE_LOCK_ERROR;
> > -			else
> > +				goto done;
> > +			} else {
> >  				err = REFTABLE_IO_ERROR;
> > -			goto done;
> > +				goto done;
> > +			}
> >  		}
> >  
> >  		/*
> > @@ -1270,7 +1308,7 @@ static int stack_compact_range(struct reftable_stack *st,
> >  	 * delete the files after we closed them on Windows, so this needs to
> >  	 * happen first.
> >  	 */
> > -	err = reftable_stack_reload_maybe_reuse(st, first < last);
> > +	err = reftable_stack_reload_maybe_reuse(st, first_to_replace < last_to_replace);
> 
> What is this change about?
> 
> No code changed that computes first_to_replace and last_to_replace?
> Perhaps before this step, using first/last and using
> first_to_replace/last_to_replace did not make any difference,
> because we never dealt with a case where we failed to lock any of
> the tables?
> 
> I am wondering if this would be helped by a no-op clarification
> before the actual behaviour change, similar to how step 4/8 added
> the nlocks variable.

As we know that we always want to replace the same set of tables, just
with different offsets, it follows that `first - last` is always equal
to `first_to_replace - last_to_replace`.

So this is a no-op change, let me just drop it.

Patrick
diff mbox series

Patch

diff --git a/reftable/stack.c b/reftable/stack.c
index 2b1ac58120..9657ca4418 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1000,6 +1000,15 @@  static int stack_write_compact(struct reftable_stack *st,
 	return err;
 }
 
+enum stack_compact_range_flags {
+	/*
+	 * Perform a best-effort compaction. That is, even if we cannot lock
+	 * all tables in the specified range, we will try to compact the
+	 * remaining slice.
+	 */
+	STACK_COMPACT_RANGE_BEST_EFFORT = (1 << 0),
+};
+
 /*
  * Compact all tables in the range `[first, last)` into a single new table.
  *
@@ -1011,7 +1020,8 @@  static int stack_write_compact(struct reftable_stack *st,
  */
 static int stack_compact_range(struct reftable_stack *st,
 			       size_t first, size_t last,
-			       struct reftable_log_expiry_config *expiry)
+			       struct reftable_log_expiry_config *expiry,
+			       unsigned int flags)
 {
 	struct strbuf tables_list_buf = STRBUF_INIT;
 	struct strbuf new_table_name = STRBUF_INIT;
@@ -1053,19 +1063,47 @@  static int stack_compact_range(struct reftable_stack *st,
 	/*
 	 * Lock all tables in the user-provided range. This is the slice of our
 	 * stack which we'll compact.
+	 *
+	 * Note that we lock tables in reverse order from last to first. The
+	 * intent behind this is to allow a newer process to perform best
+	 * effort compaction of tables that it has added in the case where an
+	 * older process is still busy compacting tables which are preexisting
+	 * from the point of view of the newer process.
 	 */
 	REFTABLE_CALLOC_ARRAY(table_locks, last - first + 1);
-	for (i = first; i <= last; i++) {
-		stack_filename(&table_name, st, reader_name(st->readers[i]));
+	for (i = last + 1; i > first; i--) {
+		stack_filename(&table_name, st, reader_name(st->readers[i - 1]));
 
 		err = hold_lock_file_for_update(&table_locks[nlocks],
 						table_name.buf, LOCK_NO_DEREF);
 		if (err < 0) {
-			if (errno == EEXIST)
+			/*
+			 * When the table is locked already we may do a
+			 * best-effort compaction and compact only the tables
+			 * that we have managed to lock so far. This of course
+			 * requires that we have been able to lock at least two
+			 * tables, otherwise there would be nothing to compact.
+			 * In that case, we return a lock error to our caller.
+			 */
+			if (errno == EEXIST && last - (i - 1) >= 2 &&
+			    flags & STACK_COMPACT_RANGE_BEST_EFFORT) {
+				err = 0;
+				/*
+				 * The subtraction is to offset the index, the
+				 * addition is to only compact up to the table
+				 * of the preceding iteration. They obviously
+				 * cancel each other out, but that may be
+				 * non-obvious when it was omitted.
+				 */
+				first = (i - 1) + 1;
+				break;
+			} else if (errno == EEXIST) {
 				err = REFTABLE_LOCK_ERROR;
-			else
+				goto done;
+			} else {
 				err = REFTABLE_IO_ERROR;
-			goto done;
+				goto done;
+			}
 		}
 
 		/*
@@ -1270,7 +1308,7 @@  static int stack_compact_range(struct reftable_stack *st,
 	 * delete the files after we closed them on Windows, so this needs to
 	 * happen first.
 	 */
-	err = reftable_stack_reload_maybe_reuse(st, first < last);
+	err = reftable_stack_reload_maybe_reuse(st, first_to_replace < last_to_replace);
 	if (err < 0)
 		goto done;
 
@@ -1303,9 +1341,10 @@  static int stack_compact_range(struct reftable_stack *st,
 
 static int stack_compact_range_stats(struct reftable_stack *st,
 				     size_t first, size_t last,
-				     struct reftable_log_expiry_config *config)
+				     struct reftable_log_expiry_config *config,
+				     unsigned int flags)
 {
-	int err = stack_compact_range(st, first, last, config);
+	int err = stack_compact_range(st, first, last, config, flags);
 	if (err == REFTABLE_LOCK_ERROR)
 		st->stats.failures++;
 	return err;
@@ -1315,7 +1354,7 @@  int reftable_stack_compact_all(struct reftable_stack *st,
 			       struct reftable_log_expiry_config *config)
 {
 	size_t last = st->merged->stack_len ? st->merged->stack_len - 1 : 0;
-	return stack_compact_range_stats(st, 0, last, config);
+	return stack_compact_range_stats(st, 0, last, config, 0);
 }
 
 static int segment_size(struct segment *s)
@@ -1422,7 +1461,7 @@  int reftable_stack_auto_compact(struct reftable_stack *st)
 	reftable_free(sizes);
 	if (segment_size(&seg) > 0)
 		return stack_compact_range_stats(st, seg.start, seg.end - 1,
-						 NULL);
+						 NULL, STACK_COMPACT_RANGE_BEST_EFFORT);
 
 	return 0;
 }
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 8739ff3d63..af5757c0f6 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -902,13 +902,15 @@  static void test_reftable_stack_auto_compaction_with_locked_tables(void)
 	write_file_buf(buf.buf, "", 0);
 
 	/*
-	 * Ideally, we'd handle the situation where any of the tables is locked
-	 * gracefully. We don't (yet) do this though and thus fail.
+	 * When parts of the stack are locked, then auto-compaction does a best
+	 * effort compaction of those tables which aren't locked. So while this
+	 * would in theory compact all tables, due to the preexisting lock we
+	 * only compact the newest two tables.
 	 */
 	err = reftable_stack_auto_compact(st);
-	EXPECT(err == REFTABLE_LOCK_ERROR);
-	EXPECT(st->stats.failures == 1);
-	EXPECT(st->merged->stack_len == 5);
+	EXPECT_ERR(err);
+	EXPECT(st->stats.failures == 0);
+	EXPECT(st->merged->stack_len == 4);
 
 	reftable_stack_destroy(st);
 	strbuf_release(&buf);
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index b06c46999d..37510c2b2a 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -478,19 +478,26 @@  test_expect_success "$command: auto compaction" '
 
 		test_oid blob17_2 | git hash-object -w --stdin &&
 
-		# Lock all tables write some refs. Auto-compaction will be
-		# unable to compact tables and thus fails gracefully, leaving
-		# the stack in a sub-optimal state.
-		ls .git/reftable/*.ref |
+		# Lock all tables, write some refs. Auto-compaction will be
+		# unable to compact tables and thus fails gracefully,
+		# compacting only those tables which are not locked.
+		ls .git/reftable/*.ref | sort |
 		while read table
 		do
-			touch "$table.lock" || exit 1
+			touch "$table.lock" &&
+			basename "$table" >>tables.expect || exit 1
 		done &&
+		test_line_count = 2 .git/reftable/tables.list &&
 		git branch B &&
 		git branch C &&
-		rm .git/reftable/*.lock &&
-		test_line_count = 4 .git/reftable/tables.list &&
 
+		# The new tables are auto-compacted, but the locked tables are
+		# left intact.
+		test_line_count = 3 .git/reftable/tables.list &&
+		head -n 2 .git/reftable/tables.list >tables.head &&
+		test_cmp tables.expect tables.head &&
+
+		rm .git/reftable/*.lock &&
 		git $command --auto &&
 		test_line_count = 1 .git/reftable/tables.list
 	)