[4/4] reftable/blocksource: use mmap to read tables

Message ID a23f38a80609a5c5a6931400ffd28a92b33838bb.1704714575.git.ps@pks.im (mailing list archive)
State Superseded
Series reftable: optimize I/O patterns

Commit Message

Patrick Steinhardt Jan. 8, 2024, 12:18 p.m. UTC
The blocksource interface provides an abstraction for reading blocks
from a reftable table. This interface is implemented using read(3P)
calls on the underlying file descriptor. While this works alright, the
pattern is very inefficient when repeatedly querying the reftable stack
for one or more refs, mostly because we need to re-read the same blocks
over and over again, paying for another read(3P) syscall every single
time.

A natural fit in this context is to use mmap(3P) instead of read(3P),
which has a bunch of benefits:

  - We do not need to come up with a caching strategy for some of the
    blocks as this will be handled by the kernel already.

  - We can avoid the overhead of having to call into the read(3P)
    syscall repeatedly.

  - We do not need to allocate returned blocks repeatedly, but can
    instead hand out pointers into the mmapped region directly.

Using mmap comes with a significant drawback on Windows though, because
mmapped files cannot be deleted and neither is it possible to rename
files onto an mmapped file. But for one, the reftable library already
gracefully handles the case where auto-compaction cannot delete a
still-open stack and ignores any such errors. Also,
`reftable_stack_clean()` will prune stale tables which are not
referenced by "tables.list" anymore so that those files can eventually
be removed. And second, we never rewrite already-rewritten stacks, so it
does not matter that we cannot rename a file over an mmapped file,
either.

Another unfortunate property of mmap is that it is not supported by all
systems. But given that the size of reftables should typically be rather
limited (megabytes at most in the vast majority of repositories), we can
provide an easy fallback by just reading the complete table into memory
on such platforms. This is the same strategy that the "packed" backend
uses in case the platform does not provide mmap.
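
Condensed, the approach boils down to something like the following
sketch (simplified from the actual patch below; error handling and the
vtable wiring are omitted, and the function names are illustrative
only):

  struct file_block_source {
      uint64_t size;
      unsigned char *data; /* mmapped region, or malloc'd fallback */
  };

  static int block_source_init(struct file_block_source *b, int fd,
                               uint64_t size)
  {
      b->size = size;
      if (use_mmap) {
          /* The kernel caches pages for us; blocks become pointers. */
          b->data = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
      } else {
          /* Fallback: slurp the whole table into memory. */
          b->data = xmalloc(size);
          if (read_in_full(fd, b->data, size) != size)
              return -1;
      }
      return 0;
  }

  static int block_source_read(struct file_block_source *b,
                               struct reftable_block *dest,
                               uint64_t off, uint32_t size)
  {
      /* No per-block allocation or read(3P): point into the region. */
      dest->data = b->data + off;
      dest->len = size;
      return size;
  }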

This change doesn't significantly improve performance in the case where
we seek through stacks only once, like e.g. git-for-each-ref(1) would.
But it does speed up use cases where there is lots of random access to
refs, e.g. when writing. The following benchmark demonstrates these
savings with git-update-ref(1) creating N refs in an otherwise empty
repository:

  Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    Time (mean ± σ):       5.6 ms ±   0.2 ms    [User: 2.7 ms, System: 2.8 ms]
    Range (min … max):     5.1 ms …   6.0 ms    90 runs

  Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    Time (mean ± σ):      15.1 ms ±   0.4 ms    [User: 7.1 ms, System: 8.0 ms]
    Range (min … max):    14.2 ms …  16.5 ms    71 runs

  Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    Time (mean ± σ):     919.4 ms ±  11.2 ms    [User: 427.5 ms, System: 491.7 ms]
    Range (min … max):   908.1 ms … 936.6 ms    10 runs

  Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    Time (mean ± σ):       5.5 ms ±   0.3 ms    [User: 2.4 ms, System: 3.1 ms]
    Range (min … max):     5.0 ms …   7.3 ms    89 runs

  Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    Time (mean ± σ):      10.4 ms ±   0.3 ms    [User: 5.1 ms, System: 5.3 ms]
    Range (min … max):     9.7 ms …  11.0 ms    78 runs

  Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    Time (mean ± σ):     483.5 ms ±   6.3 ms    [User: 227.8 ms, System: 255.6 ms]
    Range (min … max):   477.5 ms … 498.8 ms    10 runs

  Summary
    update-ref: create many refs (refcount = 1, revision = HEAD) ran
      1.01 ± 0.06 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
      1.89 ± 0.11 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
      2.73 ± 0.16 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
     87.55 ± 4.65 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    166.48 ± 8.80 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)

Theoretically, we could also replicate the strategy of the "packed"
backend where small tables are read into memory instead of using mmap.
Benchmarks did not confirm that this has a performance benefit, though.
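
For reference, such a size cutoff would roughly look like this (a
sketch only; the threshold name and value are made up and error
handling is elided):

  /* Sketch: read small tables outright, mmap larger ones. */
  if (st.st_size <= SMALL_TABLE_THRESHOLD) {
      p->data = xmalloc(st.st_size);
      if (read_in_full(fd, p->data, st.st_size) != st.st_size)
          return -1;
  } else {
      p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  }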
---
 reftable/blocksource.c | 48 ++++++++++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 13 deletions(-)

Comments

Junio C Hamano Jan. 10, 2024, 9:24 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> Using mmap comes with a significant drawback on Windows though, because
> mmapped files cannot be deleted and neither is it possible to rename
> files onto an mmapped file. But for one, the reftable library already
> gracefully handles the case where auto-compaction cannot delete a
> still-open stack and ignores any such errors. Also,
> `reftable_stack_clean()` will prune stale tables which are not
> referenced by "tables.list" anymore so that those files can eventually
> be removed. And second, we never rewrite already-rewritten stacks, so it
> does not matter that we cannot rename a file over an mmapped file,
> either.

I somehow thought that we use "read into allocated memory and pretend
as if we mapped" emulation on Windows, so all of that may be moot.

> diff --git a/reftable/blocksource.c b/reftable/blocksource.c
> index a1ea304429..5d3f3d264c 100644
> --- a/reftable/blocksource.c
> +++ b/reftable/blocksource.c
> @@ -13,6 +13,12 @@ license that can be found in the LICENSE file or at
>  #include "reftable-blocksource.h"
>  #include "reftable-error.h"
>  
> +#if defined(NO_MMAP)
> +static int use_mmap = 0;
> +#else
> +static int use_mmap = 1;
> +#endif

Is there (do you need) some externally controllable knob that the
user can use to turn this off on a system that is compiled without
NO_MMAP?  Otherwise it is misleading to have this as a variable.

> -static void file_close(void *b)
> +static void file_close(void *v)
>  {
> -	int fd = ((struct file_block_source *)b)->fd;
> -	if (fd > 0) {
> -		close(fd);
> -		((struct file_block_source *)b)->fd = 0;
> +	struct file_block_source *b = v;
> +
> +	if (b->fd >= 0) {
> +		close(b->fd);
> +		b->fd = -1;
>  	}
>  
> +	if (use_mmap)
> +		munmap(b->data, b->size);
> +	else
> +		reftable_free(b->data);
> +	b->data = NULL;
> +
>  	reftable_free(b);
>  }

It would have been nicer to do this kind of "a void pointer is taken
and we immediately cast it to a concrete and useful type upon entry"
clean-up as a separate step that is purely clean-up, if there were
many instances.  It is the first such one in the series as far as I
remember, so it is not a huge deal.

> @@ -108,9 +119,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
>  {
>  	struct file_block_source *b = v;
>  	assert(off + size <= b->size);
> -	dest->data = reftable_malloc(size);
> -	if (pread_in_full(b->fd, dest->data, size, off) != size)
> -		return -1;
> +	dest->data = b->data + off;
>  	dest->len = size;
>  	return size;
>  }

So, we used to delay the allocation and reading of a block until
this function gets called.  Now, by the time the control reaches
the function, we are expected to have the data handy at b->data.
That is ensured by reftable_block_source_from_file(), I presume?

> @@ -127,8 +136,10 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
>  {
>  	struct stat st = { 0 };
>  	int err = 0;
> -	int fd = open(name, O_RDONLY);
> +	int fd;
>  	struct file_block_source *p = NULL;
> +
> +	fd = open(name, O_RDONLY);
>  	if (fd < 0) {
>  		if (errno == ENOENT) {
>  			return REFTABLE_NOT_EXIST_ERROR;

This is a no-op clean-up that would have been better in a separate
clean-up step.  Again, not a huge deal.

> @@ -144,7 +155,18 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
>  
>  	p = reftable_calloc(sizeof(struct file_block_source));
>  	p->size = st.st_size;
> -	p->fd = fd;
> +	if (use_mmap) {
> +		p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +		p->fd = fd;

This is a bit unusual for our use of mmap() where the norm is to
close the file descriptor once we mapped (we only need the memory
region itself and not the originating file descriptor to unmap it).

Is there a reason why we need to keep p->fd?  Now that the other side
of this if/else preallocates the whole thing and slurps everything in
core to allow the remainder of the code to mimic what happens on the
mmapped block memory (e.g., we saw how file_read_block() assumes that
b->data[0..b->size] are valid), and does not obviously need p->fd, can
we just remove the .fd member from the file_block_source structure?

That way, file_close() can also lose the conditional close() call.

For that matter, do we need any codepath in this file that is
enabled by !use_mmap?  Can't we just use xmmap() and rely on its
"instead, we allocate, read into it and return" emulation?

Thanks.

> +	} else {
> +		p->data = xmalloc(st.st_size);
> +		if (read_in_full(fd, p->data, st.st_size) != st.st_size) {
> +			close(fd);
> +			return -1;
> +		}
> +		close(fd);
> +		p->fd = -1;
> +	}
>  
>  	assert(!bs->ops);
>  	bs->ops = &file_vtable;
Patrick Steinhardt Jan. 11, 2024, 9:21 a.m. UTC | #2
On Wed, Jan 10, 2024 at 01:24:24PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Using mmap comes with a significant drawback on Windows though, because
> > mmapped files cannot be deleted and neither is it possible to rename
> > files onto an mmapped file. But for one, the reftable library already
> > gracefully handles the case where auto-compaction cannot delete a
> > still-open stack and ignores any such errors. Also,
> > `reftable_stack_clean()` will prune stale tables which are not
> > referenced by "tables.list" anymore so that those files can eventually
> > be removed. And second, we never rewrite already-rewritten stacks, so it
> > does not matter that we cannot rename a file over an mmapped file,
> > either.
> 
> I somehow thought that we use "read into allocated memory and pretend
> as if we mapped" emulation on Windows, so all of that may be moot.

Ah, you're right in fact. I missed the definition of `git_mmap()` and
`git_munmap()`.
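
Conceptually, that emulation amounts to something like the sketch below
(simplified and with a made-up name, not the actual compat code):

  /* "Pretend we mmapped": allocate a buffer and fill it from the file
   * so that callers can treat it like a read-only mapping. */
  void *fake_mmap(size_t length, int fd, off_t offset)
  {
      char *buf = xmalloc(length);
      size_t done = 0;

      while (done < length) {
          ssize_t n = pread(fd, buf + done, length - done, offset + done);
          if (n <= 0) {
              free(buf);
              return MAP_FAILED;
          }
          done += n;
      }
      return buf;
  }

The matching munmap emulation then simply frees that buffer again.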

> > diff --git a/reftable/blocksource.c b/reftable/blocksource.c
> > index a1ea304429..5d3f3d264c 100644
> > --- a/reftable/blocksource.c
> > +++ b/reftable/blocksource.c
> > @@ -13,6 +13,12 @@ license that can be found in the LICENSE file or at
> >  #include "reftable-blocksource.h"
> >  #include "reftable-error.h"
> >  
> > +#if defined(NO_MMAP)
> > +static int use_mmap = 0;
> > +#else
> > +static int use_mmap = 1;
> > +#endif
> 
> Is there (do you need) some externally controllable knob that the
> user can use to turn this off on a system that is compiled without
> NO_MMAP?  Otherwise it is misleading to have this as a variable.

No. The benefit of using a variable instead of using ifdefs is that we
know that both code paths continue to compile just fine. We do the same
thing in "refs/packed-backend.c".

But this code is not needed at all anyway, as you pointed out, because
we can simply use the emulated mmap(3P) code.

> > +static void file_close(void *v)
> >  {
> > -	int fd = ((struct file_block_source *)b)->fd;
> > -	if (fd > 0) {
> > -		close(fd);
> > -		((struct file_block_source *)b)->fd = 0;
> > +	struct file_block_source *b = v;
> > +
> > +	if (b->fd >= 0) {
> > +		close(b->fd);
> > +		b->fd = -1;
> >  	}
> >  
> > +	if (use_mmap)
> > +		munmap(b->data, b->size);
> > +	else
> > +		reftable_free(b->data);
> > +	b->data = NULL;
> > +
> >  	reftable_free(b);
> >  }
> 
> It would have been nicer to do this kind of "a void pointer is taken
> and we immediately cast it to a concrete and useful type upon entry"
> clean-up as a separate step that is purely clean-up, if there were
> many instances.  It is the first such one in the series as far as I
> remember, so it is not a huge deal.
> 
> > @@ -108,9 +119,7 @@ static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
> >  {
> >  	struct file_block_source *b = v;
> >  	assert(off + size <= b->size);
> > -	dest->data = reftable_malloc(size);
> > -	if (pread_in_full(b->fd, dest->data, size, off) != size)
> > -		return -1;
> > +	dest->data = b->data + off;
> >  	dest->len = size;
> >  	return size;
> >  }
> 
> So, we used to delay the allocation and reading of a block until
> this function gets called.  Now, by the time the control reaches
> the function, we are expected to have the data handy at b->data.
> That is ensured by reftable_block_source_from_file(), I presume?
> 
> > @@ -127,8 +136,10 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
> >  {
> >  	struct stat st = { 0 };
> >  	int err = 0;
> > -	int fd = open(name, O_RDONLY);
> > +	int fd;
> >  	struct file_block_source *p = NULL;
> > +
> > +	fd = open(name, O_RDONLY);
> >  	if (fd < 0) {
> >  		if (errno == ENOENT) {
> >  			return REFTABLE_NOT_EXIST_ERROR;
> 
> This is a no-op clean-up that would have been better in a separate
> clean-up step.  Again, not a huge deal.

Yeah, fair enough. One problem I have with the reftable library is that
its coding style doesn't quite blend in with the rest of the Git code
base, so I want to refactor code that I'm touching to match our coding
style better. This means that there will be a lot of such smallish
refactorings, and I'm often not sure whether to split them out into
their own commit or whether to do the refactoring in the same step.

For small changes like this one here I think it's reasonable to keep
them in the same commit. But larger refactorings, like e.g. the
introduction of the common exit path in the first patch of this series,
of course go into their own commit.

> > @@ -144,7 +155,18 @@ int reftable_block_source_from_file(struct reftable_block_source *bs,
> >  
> >  	p = reftable_calloc(sizeof(struct file_block_source));
> >  	p->size = st.st_size;
> > -	p->fd = fd;
> > +	if (use_mmap) {
> > +		p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> > +		p->fd = fd;
> 
> This is a bit unusual for our use of mmap() where the norm is to
> close the file descriptor once we mapped (we only need the memory
> region itself and not the originating file descriptor to unmap it).
> 
> Is there a reason why we need to keep p->fd?  Now that the other side
> of this if/else preallocates the whole thing and slurps everything in
> core to allow the remainder of the code to mimic what happens on the
> mmapped block memory (e.g., we saw how file_read_block() assumes that
> b->data[0..b->size] are valid), and does not obviously need p->fd, can
> we just remove the .fd member from the file_block_source structure?
> 
> That way, file_close() can also lose the conditional close() call.
> 
> For that matter, do we need any codepath in this file that is
> enabled by !use_mmap?  Can't we just use xmmap() and rely on its
> "instead, we allocate, read into it and return" emulation?

Not really, I'll update the code accordingly.
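
Roughly, the result would then look something like this sketch, relying
entirely on xmmap() and its read-based fallback (not the final code):

  /* In reftable_block_source_from_file(): */
  p = reftable_calloc(sizeof(struct file_block_source));
  p->size = st.st_size;
  p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);

  /* file_close() loses both the conditional close() and the flag: */
  static void file_close(void *v)
  {
      struct file_block_source *b = v;

      munmap(b->data, b->size);
      b->data = NULL;
      reftable_free(b);
  }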

Thanks!

Patrick

Patch

diff --git a/reftable/blocksource.c b/reftable/blocksource.c
index a1ea304429..5d3f3d264c 100644
--- a/reftable/blocksource.c
+++ b/reftable/blocksource.c
@@ -13,6 +13,12 @@  license that can be found in the LICENSE file or at
 #include "reftable-blocksource.h"
 #include "reftable-error.h"
 
+#if defined(NO_MMAP)
+static int use_mmap = 0;
+#else
+static int use_mmap = 1;
+#endif
+
 static void strbuf_return_block(void *b, struct reftable_block *dest)
 {
 	if (dest->len)
@@ -78,6 +84,7 @@  struct reftable_block_source malloc_block_source(void)
 struct file_block_source {
 	int fd;
 	uint64_t size;
+	unsigned char *data;
 };
 
 static uint64_t file_size(void *b)
@@ -87,19 +94,23 @@  static uint64_t file_size(void *b)
 
 static void file_return_block(void *b, struct reftable_block *dest)
 {
-	if (dest->len)
-		memset(dest->data, 0xff, dest->len);
-	reftable_free(dest->data);
 }
 
-static void file_close(void *b)
+static void file_close(void *v)
 {
-	int fd = ((struct file_block_source *)b)->fd;
-	if (fd > 0) {
-		close(fd);
-		((struct file_block_source *)b)->fd = 0;
+	struct file_block_source *b = v;
+
+	if (b->fd >= 0) {
+		close(b->fd);
+		b->fd = -1;
 	}
 
+	if (use_mmap)
+		munmap(b->data, b->size);
+	else
+		reftable_free(b->data);
+	b->data = NULL;
+
 	reftable_free(b);
 }
 
@@ -108,9 +119,7 @@  static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
 {
 	struct file_block_source *b = v;
 	assert(off + size <= b->size);
-	dest->data = reftable_malloc(size);
-	if (pread_in_full(b->fd, dest->data, size, off) != size)
-		return -1;
+	dest->data = b->data + off;
 	dest->len = size;
 	return size;
 }
@@ -127,8 +136,10 @@  int reftable_block_source_from_file(struct reftable_block_source *bs,
 {
 	struct stat st = { 0 };
 	int err = 0;
-	int fd = open(name, O_RDONLY);
+	int fd;
 	struct file_block_source *p = NULL;
+
+	fd = open(name, O_RDONLY);
 	if (fd < 0) {
 		if (errno == ENOENT) {
 			return REFTABLE_NOT_EXIST_ERROR;
@@ -144,7 +155,18 @@  int reftable_block_source_from_file(struct reftable_block_source *bs,
 
 	p = reftable_calloc(sizeof(struct file_block_source));
 	p->size = st.st_size;
-	p->fd = fd;
+	if (use_mmap) {
+		p->data = xmmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+		p->fd = fd;
+	} else {
+		p->data = xmalloc(st.st_size);
+		if (read_in_full(fd, p->data, st.st_size) != st.st_size) {
+			close(fd);
+			return -1;
+		}
+		close(fd);
+		p->fd = -1;
+	}
 
 	assert(!bs->ops);
 	bs->ops = &file_vtable;