diff mbox series

[v2,4/7] reftable/block: refactor binary search over restart points

Message ID 5e20d93ae000359f2231bf950a930cfc4898ced2.1711361340.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: improvements for the `binsearch()` mechanism | expand

Commit Message

Patrick Steinhardt March 25, 2024, 10:10 a.m. UTC
When seeking a record in our block reader we perform a binary search
over the block's restart points so that we don't have to do a linear
scan over the whole block. The logic to do so is quite intricate though,
which makes it hard to understand.

Improve documentation and rename some of the functions and variables so
that the code becomes easier to understand overall. This refactoring
should not result in any change in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 97 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 70 insertions(+), 27 deletions(-)

Comments

Justin Tobler April 2, 2024, 4:42 p.m. UTC | #1
On 24/03/25 11:10AM, Patrick Steinhardt wrote:
> When seeking a record in our block reader we perform a binary search
> over the block's restart points so that we don't have to do a linear
> scan over the whole block. The logic to do so is quite intricate though,
> which makes it hard to understand.
> 
> Improve documentation and rename some of the functions and variables so
> that the code becomes easier to understand overall. This refactoring
> should not result in any change in behaviour.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
...  
> -	i = binsearch(br->restart_count, &restart_key_less, &args);
> +	/*
> +	 * Perform a binary search over the block's restart points, which
> +	 * avoids doing a linear scan over the whole block. Like this, we
> +	 * identify the section of the block that should contain our key.
> +	 *
> +	 * Note that we explicitly search for the first restart point _greater_
> +	 * than the sought-after record, not _greater or equal_ to it. In case
> +	 * the sought-after record is located directly at the restart point we
> +	 * would otherwise start doing the linear search at the preceding
> +	 * restart point. While that works alright, we would end up scanning
> +	 * too many record.
> +	 */
> +	i = binsearch(br->restart_count, &restart_needle_less, &args);
> +
> +	/*
> +	 * Now there are multiple cases:
> +	 *
> +	 *   - `i == 0`: The wanted record must be contained before the first
> +	 *     restart point. We will thus start searching for the record in
> +	 *     that section after accounting for the header offset.

To clarify my understanding, does a restart_offset not exist at the
beginning of a block? I was originally under the impression that the
first reset point would be at the beginning of the block (or just after
the header). If this was the case, the first restart point being greater
would indicate that the wanted record is not contained within the block.

-Justin

> +	 *
> +	 *   - `i == restart_count`: The wanted record was not found at any of
> +	 *     the restart points. As there is no restart point at the end of
> +	 *     the section the record may thus be contained in the last block.
> +	 *
> +	 *   - `i > 0`: The wanted record must be contained in the section
> +	 *     before the found restart point. We thus do a linear search
> +	 *     starting from the preceding restart point.
> +	 */
>  	if (i > 0)
>  		it->next_off = block_reader_restart_offset(br, i - 1);
>  	else
> @@ -399,21 +429,34 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
>  
>  	reftable_record_init(&rec, block_reader_type(br));
>  
> -	/* We're looking for the last entry less/equal than the wanted key, so
> -	   we have to go one entry too far and then back up.
> -	*/
> +	/*
> +	 * We're looking for the last entry less than the wanted key so that
> +	 * the next call to `block_reader_next()` would yield the wanted
> +	 * record. We thus don't want to position our reader at the sought
> +	 * after record, but one before. To do so, we have to go one entry too
> +	 * far and then back up.
> +	 */
>  	while (1) {
>  		block_iter_copy_from(&next, it);
>  		err = block_iter_next(&next, &rec);
>  		if (err < 0)
>  			goto done;
> -
> -		reftable_record_key(&rec, &it->last_key);
> -		if (err > 0 || strbuf_cmp(&it->last_key, want) >= 0) {
> +		if (err > 0) {
>  			err = 0;
>  			goto done;
>  		}
>  
> +		/*
> +		 * Check whether the current key is greater or equal to the
> +		 * sought-after key. In case it is greater we know that the
> +		 * record does not exist in the block and can thus abort early.
> +		 * In case it is equal to the sought-after key we have found
> +		 * the desired record.
> +		 */
> +		reftable_record_key(&rec, &it->last_key);
> +		if (strbuf_cmp(&it->last_key, want) >= 0)
> +			goto done;
> +
>  		block_iter_copy_from(it, &next);
>  	}
>  
> -- 
> 2.44.GIT
>
Patrick Steinhardt April 2, 2024, 5:15 p.m. UTC | #2
On Tue, Apr 02, 2024 at 11:42:29AM -0500, Justin Tobler wrote:
> On 24/03/25 11:10AM, Patrick Steinhardt wrote:
> > When seeking a record in our block reader we perform a binary search
> > over the block's restart points so that we don't have to do a linear
> > scan over the whole block. The logic to do so is quite intricate though,
> > which makes it hard to understand.
> > 
> > Improve documentation and rename some of the functions and variables so
> > that the code becomes easier to understand overall. This refactoring
> > should not result in any change in behaviour.
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> ...  
> > -	i = binsearch(br->restart_count, &restart_key_less, &args);
> > +	/*
> > +	 * Perform a binary search over the block's restart points, which
> > +	 * avoids doing a linear scan over the whole block. Like this, we
> > +	 * identify the section of the block that should contain our key.
> > +	 *
> > +	 * Note that we explicitly search for the first restart point _greater_
> > +	 * than the sought-after record, not _greater or equal_ to it. In case
> > +	 * the sought-after record is located directly at the restart point we
> > +	 * would otherwise start doing the linear search at the preceding
> > +	 * restart point. While that works alright, we would end up scanning
> > +	 * too many record.
> > +	 */
> > +	i = binsearch(br->restart_count, &restart_needle_less, &args);
> > +
> > +	/*
> > +	 * Now there are multiple cases:
> > +	 *
> > +	 *   - `i == 0`: The wanted record must be contained before the first
> > +	 *     restart point. We will thus start searching for the record in
> > +	 *     that section after accounting for the header offset.
> 
> To clarify my understanding, does a restart_offset not exist at the
> beginning of a block? I was originally under the impression that the
> first reset point would be at the beginning of the block (or just after
> the header). If this was the case, the first restart point being greater
> would indicate that the wanted record is not contained within the block.

It wouldn't make much sense to include it as a restart point. A restart
point is a point where the prefix compression will be reset such that
the record at that point can be read without reading preceding records.
This is always implicitly true for the first record in a block as it is
never prefix-compressed. Consequently, writing a restart point for the
first record would be a waste of disk space.

Patrick

> -Justin
> 
> > +	 *
> > +	 *   - `i == restart_count`: The wanted record was not found at any of
> > +	 *     the restart points. As there is no restart point at the end of
> > +	 *     the section the record may thus be contained in the last block.
> > +	 *
> > +	 *   - `i > 0`: The wanted record must be contained in the section
> > +	 *     before the found restart point. We thus do a linear search
> > +	 *     starting from the preceding restart point.
> > +	 */
> >  	if (i > 0)
> >  		it->next_off = block_reader_restart_offset(br, i - 1);
> >  	else
> > @@ -399,21 +429,34 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
> >  
> >  	reftable_record_init(&rec, block_reader_type(br));
> >  
> > -	/* We're looking for the last entry less/equal than the wanted key, so
> > -	   we have to go one entry too far and then back up.
> > -	*/
> > +	/*
> > +	 * We're looking for the last entry less than the wanted key so that
> > +	 * the next call to `block_reader_next()` would yield the wanted
> > +	 * record. We thus don't want to position our reader at the sought
> > +	 * after record, but one before. To do so, we have to go one entry too
> > +	 * far and then back up.
> > +	 */
> >  	while (1) {
> >  		block_iter_copy_from(&next, it);
> >  		err = block_iter_next(&next, &rec);
> >  		if (err < 0)
> >  			goto done;
> > -
> > -		reftable_record_key(&rec, &it->last_key);
> > -		if (err > 0 || strbuf_cmp(&it->last_key, want) >= 0) {
> > +		if (err > 0) {
> >  			err = 0;
> >  			goto done;
> >  		}
> >  
> > +		/*
> > +		 * Check whether the current key is greater or equal to the
> > +		 * sought-after key. In case it is greater we know that the
> > +		 * record does not exist in the block and can thus abort early.
> > +		 * In case it is equal to the sought-after key we have found
> > +		 * the desired record.
> > +		 */
> > +		reftable_record_key(&rec, &it->last_key);
> > +		if (strbuf_cmp(&it->last_key, want) >= 0)
> > +			goto done;
> > +
> >  		block_iter_copy_from(it, &next);
> >  	}
> >  
> > -- 
> > 2.44.GIT
> > 
> 
>
Justin Tobler April 2, 2024, 5:46 p.m. UTC | #3
On 24/04/02 07:15PM, Patrick Steinhardt wrote:
> On Tue, Apr 02, 2024 at 11:42:29AM -0500, Justin Tobler wrote:
> > On 24/03/25 11:10AM, Patrick Steinhardt wrote:
> > > When seeking a record in our block reader we perform a binary search
> > > over the block's restart points so that we don't have to do a linear
> > > scan over the whole block. The logic to do so is quite intricate though,
> > > which makes it hard to understand.
> > > 
> > > Improve documentation and rename some of the functions and variables so
> > > that the code becomes easier to understand overall. This refactoring
> > > should not result in any change in behaviour.
> > > 
> > > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > > ---
> > ...  
> > > -	i = binsearch(br->restart_count, &restart_key_less, &args);
> > > +	/*
> > > +	 * Perform a binary search over the block's restart points, which
> > > +	 * avoids doing a linear scan over the whole block. Like this, we
> > > +	 * identify the section of the block that should contain our key.
> > > +	 *
> > > +	 * Note that we explicitly search for the first restart point _greater_
> > > +	 * than the sought-after record, not _greater or equal_ to it. In case
> > > +	 * the sought-after record is located directly at the restart point we
> > > +	 * would otherwise start doing the linear search at the preceding
> > > +	 * restart point. While that works alright, we would end up scanning
> > > +	 * too many record.
> > > +	 */
> > > +	i = binsearch(br->restart_count, &restart_needle_less, &args);
> > > +
> > > +	/*
> > > +	 * Now there are multiple cases:
> > > +	 *
> > > +	 *   - `i == 0`: The wanted record must be contained before the first
> > > +	 *     restart point. We will thus start searching for the record in
> > > +	 *     that section after accounting for the header offset.
> > 
> > To clarify my understanding, does a restart_offset not exist at the
> > beginning of a block? I was originally under the impression that the
> > first reset point would be at the beginning of the block (or just after
> > the header). If this was the case, the first restart point being greater
> > would indicate that the wanted record is not contained within the block.
> 
> It wouldn't make much sense to include it as a restart point. A restart
> point is a point where the prefix compression will be reset such that
> the record at that point can be read without reading preceding records.
> This is always implicitly true for the first record in a block as it is
> never prefix-compressed. Consequently, writing a restart point for the
> first record would be a waste of disk space.

Thanks Patrick! Good to know :)

From Documentation/technical/reftable.txt:

> As the first ref block shares the first file block with the file
> header, all restart_offset in the first block are relative to the
> start of the file (position 0), and include the file header. This 
> forces the first restart_offset to be 28.

I'm assumming this is out-of-date.

-Justin
Patrick Steinhardt April 3, 2024, 6:01 a.m. UTC | #4
On Tue, Apr 02, 2024 at 12:46:30PM -0500, Justin Tobler wrote:
> On 24/04/02 07:15PM, Patrick Steinhardt wrote:
> > On Tue, Apr 02, 2024 at 11:42:29AM -0500, Justin Tobler wrote:
> > > On 24/03/25 11:10AM, Patrick Steinhardt wrote:
> > > > When seeking a record in our block reader we perform a binary search
> > > > over the block's restart points so that we don't have to do a linear
> > > > scan over the whole block. The logic to do so is quite intricate though,
> > > > which makes it hard to understand.
> > > > 
> > > > Improve documentation and rename some of the functions and variables so
> > > > that the code becomes easier to understand overall. This refactoring
> > > > should not result in any change in behaviour.
> > > > 
> > > > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > > > ---
> > > ...  
> > > > -	i = binsearch(br->restart_count, &restart_key_less, &args);
> > > > +	/*
> > > > +	 * Perform a binary search over the block's restart points, which
> > > > +	 * avoids doing a linear scan over the whole block. Like this, we
> > > > +	 * identify the section of the block that should contain our key.
> > > > +	 *
> > > > +	 * Note that we explicitly search for the first restart point _greater_
> > > > +	 * than the sought-after record, not _greater or equal_ to it. In case
> > > > +	 * the sought-after record is located directly at the restart point we
> > > > +	 * would otherwise start doing the linear search at the preceding
> > > > +	 * restart point. While that works alright, we would end up scanning
> > > > +	 * too many record.
> > > > +	 */
> > > > +	i = binsearch(br->restart_count, &restart_needle_less, &args);
> > > > +
> > > > +	/*
> > > > +	 * Now there are multiple cases:
> > > > +	 *
> > > > +	 *   - `i == 0`: The wanted record must be contained before the first
> > > > +	 *     restart point. We will thus start searching for the record in
> > > > +	 *     that section after accounting for the header offset.
> > > 
> > > To clarify my understanding, does a restart_offset not exist at the
> > > beginning of a block? I was originally under the impression that the
> > > first reset point would be at the beginning of the block (or just after
> > > the header). If this was the case, the first restart point being greater
> > > would indicate that the wanted record is not contained within the block.
> > 
> > It wouldn't make much sense to include it as a restart point. A restart
> > point is a point where the prefix compression will be reset such that
> > the record at that point can be read without reading preceding records.
> > This is always implicitly true for the first record in a block as it is
> > never prefix-compressed. Consequently, writing a restart point for the
> > first record would be a waste of disk space.
> 
> Thanks Patrick! Good to know :)
> 
> From Documentation/technical/reftable.txt:
> 
> > As the first ref block shares the first file block with the file
> > header, all restart_offset in the first block are relative to the
> > start of the file (position 0), and include the file header. This 
> > forces the first restart_offset to be 28.
> 
> I'm assumming this is out-of-date.

That paragraph actually made me re-check my own assumptions. Turns out
my claim is wrong. The function responsible for registering restarts is
`block_writer_register_restart()`, which gets a parameter `is_restart`
that is determined by `reftable_encode_key()`. And that function in turn
will set `restart = 1` whenever `prefix_len == 0`. And given that the
first record always has `prefix_len == 0`, it will thus have a restart
point, as well.

I'll update the comment like this:

diff --git a/reftable/block.c b/reftable/block.c
index 8bb4e43cec..298e8c56b9 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -417,9 +417,12 @@ int block_reader_seek(struct block_reader *br, struct block_iter *it,
 	/*
 	 * Now there are multiple cases:
 	 *
-	 *   - `i == 0`: The wanted record must be contained before the first
-	 *     restart point. We will thus start searching for the record in
-	 *     that section after accounting for the header offset.
+	 *   - `i == 0`: The wanted record is smaller than the record found at
+	 *     the first restart point. As the first restart point is the first
+	 *     record in the block, our wanted record cannot be located in this
+	 *     block at all. We still need to position the iterator so that the
+	 *     next call to `block_iter_next()` will yield an end-of-iterator
+	 *     signal.
 	 *
 	 *   - `i == restart_count`: The wanted record was not found at any of
 	 *     the restart points. As there is no restart point at the end of

Thanks for making me challenge my own assumptions!

Patrick
diff mbox series

Patch

diff --git a/reftable/block.c b/reftable/block.c
index 422885bddb..6cd4c053df 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -273,34 +273,36 @@  void block_reader_start(struct block_reader *br, struct block_iter *it)
 	it->next_off = br->header_off + 4;
 }
 
-struct restart_find_args {
+struct restart_needle_less_args {
 	int error;
-	struct strbuf key;
-	struct block_reader *r;
+	struct strbuf needle;
+	struct block_reader *reader;
 };
 
-static int restart_key_less(size_t idx, void *args)
+static int restart_needle_less(size_t idx, void *_args)
 {
-	struct restart_find_args *a = args;
-	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct restart_needle_less_args *args = _args;
+	uint32_t off = block_reader_restart_offset(args->reader, idx);
 	struct string_view in = {
-		.buf = a->r->block.data + off,
-		.len = a->r->block_len - off,
+		.buf = args->reader->block.data + off,
+		.len = args->reader->block_len - off,
 	};
-
-	/* the restart key is verbatim in the block, so this could avoid the
-	   alloc for decoding the key */
-	struct strbuf rkey = STRBUF_INIT;
+	struct strbuf kth_restart_key = STRBUF_INIT;
 	uint8_t unused_extra;
-	int n = reftable_decode_key(&rkey, &unused_extra, in);
-	int result;
+	int result, n;
+
+	/*
+	 * TODO: The restart key is verbatim in the block, so we can in theory
+	 * avoid decoding the key and thus save some allocations.
+	 */
+	n = reftable_decode_key(&kth_restart_key, &unused_extra, in);
 	if (n < 0) {
-		a->error = 1;
+		args->error = 1;
 		return -1;
 	}
 
-	result = strbuf_cmp(&a->key, &rkey);
-	strbuf_release(&rkey);
+	result = strbuf_cmp(&args->needle, &kth_restart_key);
+	strbuf_release(&kth_restart_key);
 	return result < 0;
 }
 
@@ -376,9 +378,9 @@  void block_iter_close(struct block_iter *it)
 int block_reader_seek(struct block_reader *br, struct block_iter *it,
 		      struct strbuf *want)
 {
-	struct restart_find_args args = {
-		.key = *want,
-		.r = br,
+	struct restart_needle_less_args args = {
+		.needle = *want,
+		.reader = br,
 	};
 	struct block_iter next = BLOCK_ITER_INIT;
 	struct reftable_record rec;
@@ -390,7 +392,35 @@  int block_reader_seek(struct block_reader *br, struct block_iter *it,
 		goto done;
 	}
 
-	i = binsearch(br->restart_count, &restart_key_less, &args);
+	/*
+	 * Perform a binary search over the block's restart points, which
+	 * avoids doing a linear scan over the whole block. Like this, we
+	 * identify the section of the block that should contain our key.
+	 *
+	 * Note that we explicitly search for the first restart point _greater_
+	 * than the sought-after record, not _greater or equal_ to it. In case
+	 * the sought-after record is located directly at the restart point we
+	 * would otherwise start doing the linear search at the preceding
+	 * restart point. While that works alright, we would end up scanning
+	 * too many record.
+	 */
+	i = binsearch(br->restart_count, &restart_needle_less, &args);
+
+	/*
+	 * Now there are multiple cases:
+	 *
+	 *   - `i == 0`: The wanted record must be contained before the first
+	 *     restart point. We will thus start searching for the record in
+	 *     that section after accounting for the header offset.
+	 *
+	 *   - `i == restart_count`: The wanted record was not found at any of
+	 *     the restart points. As there is no restart point at the end of
+	 *     the section the record may thus be contained in the last block.
+	 *
+	 *   - `i > 0`: The wanted record must be contained in the section
+	 *     before the found restart point. We thus do a linear search
+	 *     starting from the preceding restart point.
+	 */
 	if (i > 0)
 		it->next_off = block_reader_restart_offset(br, i - 1);
 	else
@@ -399,21 +429,34 @@  int block_reader_seek(struct block_reader *br, struct block_iter *it,
 
 	reftable_record_init(&rec, block_reader_type(br));
 
-	/* We're looking for the last entry less/equal than the wanted key, so
-	   we have to go one entry too far and then back up.
-	*/
+	/*
+	 * We're looking for the last entry less than the wanted key so that
+	 * the next call to `block_reader_next()` would yield the wanted
+	 * record. We thus don't want to position our reader at the sought
+	 * after record, but one before. To do so, we have to go one entry too
+	 * far and then back up.
+	 */
 	while (1) {
 		block_iter_copy_from(&next, it);
 		err = block_iter_next(&next, &rec);
 		if (err < 0)
 			goto done;
-
-		reftable_record_key(&rec, &it->last_key);
-		if (err > 0 || strbuf_cmp(&it->last_key, want) >= 0) {
+		if (err > 0) {
 			err = 0;
 			goto done;
 		}
 
+		/*
+		 * Check whether the current key is greater or equal to the
+		 * sought-after key. In case it is greater we know that the
+		 * record does not exist in the block and can thus abort early.
+		 * In case it is equal to the sought-after key we have found
+		 * the desired record.
+		 */
+		reftable_record_key(&rec, &it->last_key);
+		if (strbuf_cmp(&it->last_key, want) >= 0)
+			goto done;
+
 		block_iter_copy_from(it, &next);
 	}