diff mbox series

[v2,1/8] packfile: prepare for the existence of '*.rev' files

Message ID 6742c15c84bafbcc1c06e2633de51dcda63e3314.1610576805.git.me@ttaylorr.com (mailing list archive)
State Superseded
Headers show
Series pack-revindex: introduce on-disk '.rev' format | expand

Commit Message

Taylor Blau Jan. 13, 2021, 10:28 p.m. UTC
Specify the format of the on-disk reverse index 'pack-*.rev' file, as
well as prepare the code for the existence of such files.

The reverse index maps from pack relative positions (i.e., an index into
the array of object which is sorted by their offsets within the
packfile) to their position within the 'pack-*.idx' file. Today, this is
done by building up a list of (off_t, uint32_t) tuples for each object
(the off_t corresponding to that object's offset, and the uint32_t
corresponding to its position in the index). To convert between pack and
index position quickly, this array of tuples is radix sorted based on
its offset.

This has two major drawbacks:

First, the in-memory cost scales linearly with the number of objects in
a pack.  Each 'struct revindex_entry' is sizeof(off_t) +
sizeof(uint32_t) + padding bytes for a total of 16.

To observe this, force Git to load the reverse index by, for e.g.,
running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking
for a single object in a fresh clone of the kernel, Git needs to
allocate 120+ MB of memory in order to hold the reverse index in memory.

Second, the cost to sort also scales with the size of the pack.
Luckily, this is a linear function since 'load_pack_revindex()' uses a
radix sort, but this cost still must be paid once per pack per process.

As an example, it takes ~60x longer to print the _size_ of an object as
it does to print that entire object's _contents_:

  Benchmark #1: git.compile cat-file --batch <obj
    Time (mean ± σ):       3.4 ms ±   0.1 ms    [User: 3.3 ms, System: 2.1 ms]
    Range (min … max):     3.2 ms …   3.7 ms    726 runs

  Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj
    Time (mean ± σ):     210.3 ms ±   8.9 ms    [User: 188.2 ms, System: 23.2 ms]
    Range (min … max):   193.7 ms … 224.4 ms    13 runs

Instead, avoid computing and sorting the revindex once per process by
writing it to a file when the pack itself is generated.

The format is relatively straightforward. It contains an array of
uint32_t's, the length of which is equal to the number of objects in the
pack.  The ith entry in this table contains the index position of the
ith object in the pack, where "ith object in the pack" is determined by
pack offset.

One thing that the on-disk format does _not_ contain is the full (up to)
eight-byte offset corresponding to each object. This is something that
the in-memory revindex contains (it stores an off_t in 'struct
revindex_entry' along with the same uint32_t that the on-disk format
has). Omit it in the on-disk format, since knowing the index position
for some object is sufficient to get a constant-time lookup in the
pack-*.idx file to ask for an object's offset within the pack.

This trades off between the on-disk size of the 'pack-*.rev' file for
runtime to chase down the offset for some object. Even though the lookup
is constant time, the constant is heavier, since it can potentially
involve two pointer walks in v2 indexes (one to access the 4-byte offset
table, and potentially a second to access the double wide offset table).

Consider trying to map an object's pack offset to a relative position
within that pack. In a cold-cache scenario, more page faults occur while
switching between binary searching through the reverse index and
searching through the *.idx file for an object's offset. Sure enough,
with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after
'sync'ing), printing out the entire object's contents is still
marginally faster than printing its size:

  Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null
    Time (mean ± σ):      22.6 ms ±   0.5 ms    [User: 2.4 ms, System: 7.9 ms]
    Range (min … max):    21.4 ms …  23.5 ms    41 runs

  Benchmark #2: git.compile cat-file --batch <obj >/dev/null
    Time (mean ± σ):      17.2 ms ±   0.7 ms    [User: 2.8 ms, System: 5.5 ms]
    Range (min … max):    15.6 ms …  18.2 ms    45 runs

(Numbers taken in the kernel after cheating and using the next patch to
generate a reverse index). There are a couple of approaches to improve
cold cache performance not pursued here:

  - We could include the object offsets in the reverse index format.
    Predictably, this does result in fewer page faults, but it triples
    the size of the file, while simultaneously duplicating a ton of data
    already available in the .idx file. (This was the original way I
    implemented the format, and it did show
    `--batch-check='%(objectsize:disk)'` winning out against `--batch`.)

    On the other hand, this increase in size also results in a large
    block-cache footprint, which could potentially hurt other workloads.

  - We could store the mapping from pack to index position in more
    cache-friendly way, like constructing a binary search tree from the
    table and writing the values in breadth-first order. This would
    result in much better locality, but the price you pay is trading
    O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you
    can no longer directly index the table).

So, neither of these approaches are taken here. (Thankfully, the format
is versioned, so we are free to pursue these in the future.) But, cold
cache performance likely isn't interesting outside of one-off cases like
asking for the size of an object directly. In real-world usage, Git is
often performing many operations in the revindex,

The trade-off is worth it, since we will avoid the vast majority of the
cost of generating the revindex that the extra pointer chase will look
like noise in the following patch's benchmarks.

This patch describes the format and prepares callers (like in
pack-revindex.c) to be able to read *.rev files once they exist. An
implementation of the writer will appear in the next patch, and callers
will gradually begin to start using the writer in the patches that
follow after that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/pack-format.txt |  17 ++++
 builtin/repack.c                        |   1 +
 object-store.h                          |   3 +
 pack-revindex.c                         | 112 +++++++++++++++++++++---
 pack-revindex.h                         |   7 +-
 packfile.c                              |  13 ++-
 packfile.h                              |   1 +
 tmp-objdir.c                            |   4 +-
 8 files changed, 145 insertions(+), 13 deletions(-)

Comments

Junio C Hamano Jan. 14, 2021, 7:22 a.m. UTC | #1
Taylor Blau <me@ttaylorr.com> writes:

> +== pack-*.rev files have the format:
> +
> +  - A 4-byte magic number '0x52494458' ('RIDX').
> +
> +  - A 4-byte version identifier (= 1)
> +
> +  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)

These two are presumably 4-byte-wide network byte order integers.
We should spell it out.

> +  - A table of index positions, sorted by their corresponding offsets in the
> +    packfile.

Likewise, how wide is each entry and in what byte order, and how
many entries are there in the table?

	... oh, what about the "one beyond the last"?  We cannot
	go back to the forward index to learn the offset of such
	an non-existent object, can we?

Again, I expect it to be 4-byte-wide network byte order integer.

> -int load_pack_revindex(struct packed_git *p)
> +static int load_pack_revindex_from_memory(struct packed_git *p)

I said it when I saw the beginning of v1 API patches, but it is a
bit unreasonable to call the act of "computing from the forward
index" "to load from memory".  Loading from disk perfectly works as
a phrase, though.

> +#define RIDX_MIN_SIZE (12 + (2 * the_hash_algo->rawsz))
> +
> +static int load_revindex_from_disk(char *revindex_name,
> +				   uint32_t num_objects,
> +				   const void **data, size_t *len)
> +{
> +	int fd, ret = 0;
> +	struct stat st;
> +	size_t revindex_size;
> +
> +	fd = git_open(revindex_name);
> +
> +	if (fd < 0) {
> +		ret = -1;
> +		goto cleanup;
> +	}
> +	if (fstat(fd, &st)) {
> +		ret = error_errno(_("failed to read %s"), revindex_name);
> +		goto cleanup;
> +	}
> +
> +	revindex_size = xsize_t(st.st_size);
> +
> +	if (revindex_size < RIDX_MIN_SIZE) {
> +		ret = error(_("reverse-index file %s is too small"), revindex_name);
> +		goto cleanup;
> +	}
> +
> +	if (revindex_size - RIDX_MIN_SIZE != st_mult(sizeof(uint32_t), num_objects)) {
> +		ret = error(_("reverse-index file %s is corrupt"), revindex_name);
> +		goto cleanup;
> +	}
> +
> +	*len = revindex_size;
> +	*data = xmmap(NULL, revindex_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +
> +cleanup:
> +	close(fd);
> +	return ret;
> +}
> +
> +static int load_pack_revindex_from_disk(struct packed_git *p)
> +{
> +	char *revindex_name;
> +	int ret;
> +	if (open_pack_index(p))
> +		return -1;
> +
> +	revindex_name = pack_revindex_filename(p);
> +
> +	ret = load_revindex_from_disk(revindex_name,
> +				      p->num_objects,
> +				      &p->revindex_map,
> +				      &p->revindex_size);
> +	if (ret)
> +		goto cleanup;
> +
> +	p->revindex_data = (char *)p->revindex_map + 12;

We've seen hardcoded constant "12" twice so far in this patch.

We need a C proprocessor macro "#define RIDX_FILE_HEADER_SIZE 12" or
something, perhaps?

> +cleanup:
> +	free(revindex_name);
> +	return ret;
> +}
> +
> +int load_pack_revindex(struct packed_git *p)
> +{
> +	if (p->revindex || p->revindex_data)
> +		return 0;
> +
> +	if (!load_pack_revindex_from_disk(p))
> +		return 0;
> +	else if (!load_pack_revindex_from_memory(p))
> +		return 0;
> +	return -1;
> +}
> +
>  int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
>  {
>  	unsigned lo, hi;
> @@ -203,18 +285,28 @@ int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
>  
>  uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos)
>  {
> -	if (!p->revindex)
> +	if (!(p->revindex || p->revindex_data))
>  		BUG("pack_pos_to_index: reverse index not yet loaded");
>  	if (p->num_objects <= pos)
>  		BUG("pack_pos_to_index: out-of-bounds object at %"PRIu32, pos);
> -	return p->revindex[pos].nr;
> +
> +	if (p->revindex)
> +		return p->revindex[pos].nr;
> +	else
> +		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));

Good.  We are using 32-bit uint in network byte order.  We should
document it as such.

Let's not strip const away while casting, though.  get_be32()
ensures that it only reads and never writes thru the pointer, and
p->revindex_data is a "const void *".

>  }
>  
>  off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
>  {
> -	if (!p->revindex)
> +	if (!(p->revindex || p->revindex_data))
>  		BUG("pack_pos_to_index: reverse index not yet loaded");
>  	if (p->num_objects < pos)
>  		BUG("pack_pos_to_offset: out-of-bounds object at %"PRIu32, pos);
> -	return p->revindex[pos].offset;
> +
> +	if (p->revindex)
> +		return p->revindex[pos].offset;
> +	else if (pos == p->num_objects)
> +		return p->pack_size - the_hash_algo->rawsz;

OK, here is the answer to my previous question.  We should document
that the table has num_objects entries in the on-disk file (we do
not need to say that there is no sentinel entry in the table at the
end).

> +	else
> +		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
>  }
> diff --git a/pack-revindex.h b/pack-revindex.h
> index 6e0320b08b..01622cf21a 100644
> --- a/pack-revindex.h
> +++ b/pack-revindex.h
> @@ -21,6 +21,9 @@ struct packed_git;
>  /*
>   * load_pack_revindex populates the revindex's internal data-structures for the
>   * given pack, returning zero on success and a negative value otherwise.
> + *
> + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
> + * pointers are assigned into it (instead of using the in-memory variant).

Hmph, I missed where it got checked for consistency, though.  If the
file is corrupt and has say duplicated entries, we'd happily grab
the data via get_be32(), for example.

> @@ -55,7 +58,9 @@ uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
>   * If the reverse index has not yet been loaded, or the position is out of
>   * bounds, this function aborts.
>   *
> - * This function runs in constant time.
> + * This function runs in constant time under both in-memory and on-disk reverse
> + * indexes, but an additional step is taken to consult the corresponding .idx
> + * file when using the on-disk format.

Again, I know this is a kind of detail that is interesting to those
who implemented the function, but I wonder how it would help those
who wonder if they should call it or use some other method to
achieve what they want.
Junio C Hamano Jan. 14, 2021, 7:26 a.m. UTC | #2
Taylor Blau <me@ttaylorr.com> writes:

> Specify the format of the on-disk reverse index 'pack-*.rev' file, as
> well as prepare the code for the existence of such files.

We've changed the pack .idx file format once as the file format is
versioned.  I wonder if you considered placing the reverse index
information in the same file, with version bump, in the .idx?
Derrick Stolee Jan. 14, 2021, 12:07 p.m. UTC | #3
On 1/14/2021 2:22 AM, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
>> +== pack-*.rev files have the format:
>> +
>> +  - A 4-byte magic number '0x52494458' ('RIDX').
>> +
>> +  - A 4-byte version identifier (= 1)
>> +
>> +  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)
> 
> These two are presumably 4-byte-wide network byte order integers.
> We should spell it out.

In the past, we've included the sentence "All multi-byte numbers are
in network byte order." to clarify this. These two entries need to
specify that the "identifier" is actually an integer. Perhaps these
three values could be provided as:

+== pack-*.rev files have the format:
+
+  - A 4-byte identifier '0x52494458' ('RIDX').
+
+  - A 4-byte integer version (= 1)
+
+  - A 4-byte integer hash function version (= 1 for SHA-1, 2 for SHA-256)

>>  /*
>>   * load_pack_revindex populates the revindex's internal data-structures for the
>>   * given pack, returning zero on success and a negative value otherwise.
>> + *
>> + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
>> + * pointers are assigned into it (instead of using the in-memory variant).
> 
> Hmph, I missed where it got checked for consistency, though.  If the
> file is corrupt and has say duplicated entries, we'd happily grab
> the data via get_be32(), for example.

Even if the consistency check is just verifying the trailing hash, that
seems like something that requires O(N) before performing a lookup. Perhaps
this was copied from somewhere else, or means something different?

Thanks,
-Stolee
Taylor Blau Jan. 14, 2021, 6:13 p.m. UTC | #4
On Wed, Jan 13, 2021 at 11:26:59PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Specify the format of the on-disk reverse index 'pack-*.rev' file, as
> > well as prepare the code for the existence of such files.
>
> We've changed the pack .idx file format once as the file format is
> versioned.  I wonder if you considered placing the reverse index
> information in the same file, with version bump, in the .idx?

Funny enough, I couldn't remember whether I considered this or not.
Peff reminded me off-list that he and I *had* considered this, but
decided against it.

The main benefit to introducing a new '.rev' format is that we can have
packs that can be upgraded to write a reverse index without having to
rewrite their forward index. It would also allow us to avoid teaching
other implementations about a new version of the index file (since they
can ignore it and continue to build their equivalent of the reverse
index file in memory by reading the forward index).

(Peff reminds me that dumb-http does look at remote .idx files, so this
new format would leak across to clients, whether or not that's something
to be concerned about...).

Of course, having the contents of the .rev file be included in the .idx
file nets us one fewer file to manage, but I'm not sure that's a reason
to do things one way or another.

Your response did pique my interest, since I was wondering if we could
improve the cold cache performance if the .rev file's contents were
included in the .idx, but after giving it some thought I don't think we
can. Reasons are:

  - If the reverse index's contents appears at the end of the .idx file,
    then in any .idx file large enough to matter, we'll almost certainly
    still be evicting cache lines back and forth when swapping between
    reading the forward- and reverse-indexes. So, no gains to be had
    there.

  - If, on the other hand, we included the reverse index's contents by
    interleaving it with the forward index's offsets, then we'd be
    worsening the cache performance of the forward index.

So, I'm more in favor of a new .rev file rather than a v3 .idx version.
Apologies for not including more of a rationale "why" in the cover
letter (had I not forgotten that I'd even considered it, I would have).

Thanks,
Taylor
Taylor Blau Jan. 14, 2021, 6:28 p.m. UTC | #5
On Wed, Jan 13, 2021 at 11:22:53PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +== pack-*.rev files have the format:
> > +
> > +  - A 4-byte magic number '0x52494458' ('RIDX').
> > +
> > +  - A 4-byte version identifier (= 1)
> > +
> > +  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)
>
> These two are presumably 4-byte-wide network byte order integers.
> We should spell it out.

Yep, all entries are network order. I added a sentence at the bottom of
this sub-section to say as much.

> We've seen hardcoded constant "12" twice so far in this patch.
>
> We need a C proprocessor macro "#define RIDX_FILE_HEADER_SIZE 12" or
> something, perhaps?

Good idea, thanks.

> > -	return p->revindex[pos].nr;
> > +
> > +	if (p->revindex)
> > +		return p->revindex[pos].nr;
> > +	else
> > +		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
>
> Good.  We are using 32-bit uint in network byte order.  We should
> document it as such.
>
> Let's not strip const away while casting, though.  get_be32()
> ensures that it only reads and never writes thru the pointer, and
> p->revindex_data is a "const void *".

Agreed, and thanks for the suggestion. I take it that what you mean is:

-		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
+		return get_be32((const char *)p->revindex_data + (pos * sizeof(uint32_t)));

...yes?

> > diff --git a/pack-revindex.h b/pack-revindex.h
> > index 6e0320b08b..01622cf21a 100644
> > --- a/pack-revindex.h
> > +++ b/pack-revindex.h
> > @@ -21,6 +21,9 @@ struct packed_git;
> >  /*
> >   * load_pack_revindex populates the revindex's internal data-structures for the
> >   * given pack, returning zero on success and a negative value otherwise.
> > + *
> > + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
> > + * pointers are assigned into it (instead of using the in-memory variant).
>
> Hmph, I missed where it got checked for consistency, though.  If the
> file is corrupt and has say duplicated entries, we'd happily grab
> the data via get_be32(), for example.

It doesn't, I'm mistaken. I removed that incorrect detail from this
comment. Thanks for catching it.

Thanks,
Taylor
Jeff King Jan. 14, 2021, 7:57 p.m. UTC | #6
On Thu, Jan 14, 2021 at 07:07:08AM -0500, Derrick Stolee wrote:

> >> + * If a '.rev' file is present, it is checked for consistency, mmap'd, and
> >> + * pointers are assigned into it (instead of using the in-memory variant).
> > 
> > Hmph, I missed where it got checked for consistency, though.  If the
> > file is corrupt and has say duplicated entries, we'd happily grab
> > the data via get_be32(), for example.
> 
> Even if the consistency check is just verifying the trailing hash, that
> seems like something that requires O(N) before performing a lookup. Perhaps
> this was copied from somewhere else, or means something different?

For the .idx file, we check that the size is what we expect. This is
important because it lets us access the mapped bytes in normal use
without having to do a bounds check.

It looks like we do the same for the .rev file here, which is good.  If
calling that "checked for consistency" is too strong, I don't think it's
a big deal to drop the wording (we do not make any such claim for
open_pack_index()).

-Peff
Junio C Hamano Jan. 14, 2021, 8:57 p.m. UTC | #7
Taylor Blau <me@ttaylorr.com> writes:

> Your response did pique my interest, since I was wondering if we could
> improve the cold cache performance if the .rev file's contents were
> included in the .idx,...

Yes, that was the primary thing I was wondering.  As long as it was
considered and rejected for good reason, that is good.

Thanks.
Jeff King Jan. 22, 2021, 10:54 p.m. UTC | #8
On Wed, Jan 13, 2021 at 05:28:06PM -0500, Taylor Blau wrote:

> (Numbers taken in the kernel after cheating and using the next patch to
> generate a reverse index). There are a couple of approaches to improve
> cold cache performance not pursued here:
> 
>   - We could include the object offsets in the reverse index format.
>     Predictably, this does result in fewer page faults, but it triples
>     the size of the file, while simultaneously duplicating a ton of data
>     already available in the .idx file. (This was the original way I
>     implemented the format, and it did show
>     `--batch-check='%(objectsize:disk)'` winning out against `--batch`.)
> 
>     On the other hand, this increase in size also results in a large
>     block-cache footprint, which could potentially hurt other workloads.
> 
>   - We could store the mapping from pack to index position in more
>     cache-friendly way, like constructing a binary search tree from the
>     table and writing the values in breadth-first order. This would
>     result in much better locality, but the price you pay is trading
>     O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you
>     can no longer directly index the table).
> 
> So, neither of these approaches are taken here. (Thankfully, the format
> is versioned, so we are free to pursue these in the future.) But, cold
> cache performance likely isn't interesting outside of one-off cases like
> asking for the size of an object directly. In real-world usage, Git is
> often performing many operations in the revindex,

I think you've nicely covered the arguments for and against the extra
offset here. This final paragraph ends in a comma, which makes me wonder
if you wanted to say something more. I'd guess it is along the lines
that most commands will be looking up more than one object, so that
cold-cache effort is amortized.

Or another way of thinking about it: 17ms versus 25ms in the cold-cache
for a _single_ object is not that big a deal, because the extra 8ms does
not scale as we ask about more objects. Here's an actual argument in
numbers (test repo is linux.git after building a .rev file using your
series):

For a single object, the extra cold-cache costs give --batch a slight
edge:

  $ git rev-parse HEAD >obj
  $ hyperfine -p 'echo 3 | sudo tee /proc/sys/vm/drop_caches' \
                 'git cat-file --buffer --batch-check="%(objectsize:disk)" <obj' \
                 'git cat-file --buffer --batch <obj'

  Benchmark #1: git cat-file --buffer --batch-check="%(objectsize:disk)" <obj
    Time (mean ± σ):      37.2 ms ±   8.3 ms    [User: 2.6 ms, System: 4.6 ms]
    Range (min … max):    28.5 ms …  55.6 ms    10 runs
   
  Benchmark #2: git cat-file --buffer --batch <obj
    Time (mean ± σ):      27.4 ms ±   3.4 ms    [User: 2.9 ms, System: 2.5 ms]
    Range (min … max):    23.2 ms …  37.1 ms    51 runs
   
  Summary
    'git cat-file --buffer --batch <obj' ran
      1.36 ± 0.35 times faster than 'git cat-file --buffer --batch-check="%(objectsize:disk)" <obj'

But with even a moderate number of objects, that's reversed:

  $ git cat-file --batch-all-objects --batch-check='%(objectname)' |
    shuffle | head -1000 >obj-1000
  $ hyperfine -p 'echo 3 | sudo tee /proc/sys/vm/drop_caches' \
                 'git cat-file --buffer --batch-check="%(objectsize:disk)" <obj-1000' \
		 'git cat-file --buffer --batch <obj-1000'
  
  Benchmark #1: git cat-file --buffer --batch-check="%(objectsize:disk)" <obj-1000
    Time (mean ± σ):      1.599 s ±  0.285 s    [User: 22.4 ms, System: 334.5 ms]
    Range (min … max):    0.816 s …  1.762 s    10 runs
   
  Benchmark #2: git cat-file --buffer --batch <obj-1000
    Time (mean ± σ):      1.972 s ±  0.225 s    [User: 343.5 ms, System: 404.2 ms]
    Range (min … max):    1.691 s …  2.283 s    10 runs
   
  Summary
    'git cat-file --buffer --batch-check="%(objectsize:disk)" <obj-1000' ran
      1.23 ± 0.26 times faster than 'git cat-file --buffer --batch <obj-1000'


Of course this isn't exactly an apples-to-apples comparison in the first
place, since the --batch one is doing a lot more. So "winning" with
objectsize:disk is not much of an accomplishment. A more interesting
comparison would be the same operation on a repo with your series,
versus one with the offset embedded in the .rev file, as the number of
objects grows.

But since we don't have that readily available, another interesting
comparison is stock git (with no .rev file) against your new .rev file,
with a cold cache.

At 1000 objects, the old code has a slight win, because it has less to
fault in from the disk (instead it's recreating the same data in RAM).
"git.compile" is your branch below; "git" is a stock build of "next":

  Benchmark #1: git cat-file --buffer --batch-check="%(objectsize:disk)" <obj-1000
    Time (mean ± σ):      1.483 s ±  0.260 s    [User: 148.9 ms, System: 301.2 ms]
    Range (min … max):    0.792 s …  1.725 s    10 runs
   
  Benchmark #2: git.compile cat-file --buffer --batch-check="%(objectsize:disk)" <obj-1000
    Time (mean ± σ):      1.820 s ±  0.138 s    [User: 27.7 ms, System: 399.3 ms]
    Range (min … max):    1.610 s …  2.012 s    10 runs
   
  Summary
    'git cat-file --buffer --batch-check="%(objectsize:disk)" <obj-1000' ran
      1.23 ± 0.23 times faster than 'git.compile cat-file --buffer --batch-check="%(objectsize:disk)" <obj-1000'

But that edge drops to 1.08x at 10,000 objects, and then at 100,000
objects your code is a win (by 1.16x). And of course it's a giant win
when the cache is already warm.

And in a cold cache, we'd expect a .rev file with offsets in it to be
much worse, since there's many more bytes to pull from the disk.

All of which is a really verbose way of saying: you might want to add a
few words after the comma:

  In real-world usage, Git is often performing many operations in the
  revindex (i.e., rather than asking about a single object, we'd
  generally ask about a range of history).

:) But hopefully it shows that including the offsets is not really
making things better for the cold cache anyway.

>  Documentation/technical/pack-format.txt |  17 ++++
>  builtin/repack.c                        |   1 +
>  object-store.h                          |   3 +
>  pack-revindex.c                         | 112 +++++++++++++++++++++---
>  pack-revindex.h                         |   7 +-
>  packfile.c                              |  13 ++-
>  packfile.h                              |   1 +
>  tmp-objdir.c                            |   4 +-
>  8 files changed, 145 insertions(+), 13 deletions(-)

Oh, there's a patch here, too. :)

It mostly looks good to me. I agree with Junio that "compute" is a
better verb than "load" for generating the in-memory revindex.

> +static int load_pack_revindex_from_disk(struct packed_git *p)
> +{
> +	char *revindex_name;
> +	int ret;
> +	if (open_pack_index(p))
> +		return -1;
> +
> +	revindex_name = pack_revindex_filename(p);
> +
> +	ret = load_revindex_from_disk(revindex_name,
> +				      p->num_objects,
> +				      &p->revindex_map,
> +				      &p->revindex_size);
> +	if (ret)
> +		goto cleanup;
> +
> +	p->revindex_data = (char *)p->revindex_map + 12;

Junio mentioned once spot where we lose constness through a cast. This
is another. I wonder if revindex_map should just be a "char *" to make
pointer arithmetic easier without having to cast.

But also...

> +	if (p->revindex)
> +		return p->revindex[pos].nr;
> +	else
> +		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));

If p->revindex_data were "const uint32_t *", then this line would just
be:

  return get_be32(p->revindex_data + pos);

Not a huge deal either way since the whole point is to abstract this
behind a function where it only has to be written once. I don't think
there is any downside from the compiler's view (and we already use this
trick for the bitmap name-hash cache).

> diff --git a/packfile.c b/packfile.c
> index 7bb1750934..b04eac9286 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -324,11 +324,21 @@ void close_pack_index(struct packed_git *p)
>  	}
>  }
>  
> +void close_pack_revindex(struct packed_git *p) {
> +	if (!p->revindex_map)
> +		return;
> +
> +	munmap((void *)p->revindex_map, p->revindex_size);
> +	p->revindex_map = NULL;
> +	p->revindex_data = NULL;
> +}
> +
>  void close_pack(struct packed_git *p)
>  {
>  	close_pack_windows(p);
>  	close_pack_fd(p);
>  	close_pack_index(p);
> +	close_pack_revindex(p);
>  }

Thinking out loud a bit: a .rev file means we're spending an extra map
per pack (but not a descriptor, since we close after mmap). And like the
.idx files (but unlike .pack file maps), we don't keep track of these
and try to close them when under memory pressure. I think that's
probably OK in terms of bytes. It may mean running up against operating
system number-of-mmap limits more quickly when you have a very large
number of packs, as mentioned in:

  https://lore.kernel.org/git/20200601044511.GA2529317@coredump.intra.peff.net/

But this is probably bumping the number of problematic packs from 30k to
20k. Both are sufficiently ridiculous that I don't think it matters in
practice.

> diff --git a/tmp-objdir.c b/tmp-objdir.c
> index 42ed4db5d3..da414df14f 100644
> --- a/tmp-objdir.c
> +++ b/tmp-objdir.c
> @@ -187,7 +187,9 @@ static int pack_copy_priority(const char *name)
>  		return 2;
>  	if (ends_with(name, ".idx"))
>  		return 3;
> -	return 4;
> +	if (ends_with(name, ".rev"))
> +		return 4;
> +	return 5;
>  }

Probably not super important, but: should the .idx file still come last
here? Simultaneous readers won't start using the pack until the .idx
file is present. We'd probably prefer they see the whole thing
atomically, than see a .idx missing its .rev (they won't ever produce a
wrong answer, but they'll generate the in-core revindex on the fly when
they don't need to).

I guess one could argue that .bitmap files should get similar treatment,
but we'd not generally see those in the quarantine objdir anyway, so
nobody ever gave it much thought.

-Peff
Taylor Blau Jan. 25, 2021, 5:44 p.m. UTC | #9
On Fri, Jan 22, 2021 at 05:54:18PM -0500, Jeff King wrote:
> All of which is a really verbose way of saying: you might want to add a
> few words after the comma:
>
>   In real-world usage, Git is often performing many operations in the
>   revindex (i.e., rather than asking about a single object, we'd
>   generally ask about a range of history).
>
> :) But hopefully it shows that including the offsets is not really
> making things better for the cold cache anyway.

Thanks for including a compelling argument in favor of the approach that
I took in this patch.

I added something along the lines of what you suggested to the final
paragraph, so now it concludes nicely instead of ending in a comma. I
briefly considered whether I should add something about how these
operations scale and how the warming efforts are really amortized across
all of the objects, but I decided against it.

I think that this argument is already documented here, and that there's
no way to concisely state it in an already long patch. Interested
readers will easily be able to find our discussion here, which is good.

> >  Documentation/technical/pack-format.txt |  17 ++++
> >  builtin/repack.c                        |   1 +
> >  object-store.h                          |   3 +
> >  pack-revindex.c                         | 112 +++++++++++++++++++++---
> >  pack-revindex.h                         |   7 +-
> >  packfile.c                              |  13 ++-
> >  packfile.h                              |   1 +
> >  tmp-objdir.c                            |   4 +-
> >  8 files changed, 145 insertions(+), 13 deletions(-)
>
> Oh, there's a patch here, too. :)

:-).

> It mostly looks good to me. I agree with Junio that "compute" is a
> better verb than "load" for generating the in-memory revindex.

Yeah, I settled on load_pack_revindex() either calling
"create_pack_revindex_in_memory()" or "load_pack_revindex_from_disk()".

> > +static int load_pack_revindex_from_disk(struct packed_git *p)
> > +{
> > +	char *revindex_name;
> > +	int ret;
> > +	if (open_pack_index(p))
> > +		return -1;
> > +
> > +	revindex_name = pack_revindex_filename(p);
> > +
> > +	ret = load_revindex_from_disk(revindex_name,
> > +				      p->num_objects,
> > +				      &p->revindex_map,
> > +				      &p->revindex_size);
> > +	if (ret)
> > +		goto cleanup;
> > +
> > +	p->revindex_data = (char *)p->revindex_map + 12;
>
> Junio mentioned once spot where we lose constness through a cast. This
> is another. I wonder if revindex_map should just be a "char *" to make
> pointer arithmetic easier without having to cast.
>
> But also...
>
> > +	if (p->revindex)
> > +		return p->revindex[pos].nr;
> > +	else
> > +		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
>
> If p->revindex_data were "const uint32_t *", then this line would just
> be:
>
>   return get_be32(p->revindex_data + pos);
>
> Not a huge deal either way since the whole point is to abstract this
> behind a function where it only has to be written once. I don't think
> there is any downside from the compiler's view (and we already use this
> trick for the bitmap name-hash cache).

Honestly, I'm not a huge fan of implicitly scaling pos by
sizeof(*p->revindex_data), but I can understand why it reads more
clearly here. I don't really feel strongly either way, so I'm happy to
change it in favor of your suggestion.

Of course, since RIDX_HEADER_SIZE is in bytes, not uint32_t's (and it
has to be, since it's also used in the RIDX_MIN_SIZE macro, which is
compared against the st_size of stating the .rev file), you have to do
gross stuff like:

  p->revindex_data = (const uint32_t *)((const char *)p->revindex_map + RIDX_HEADER_SIZE);

But I guess the tradeoff is worth it, since the readers are easier to
parse.

> Thinking out loud a bit: a .rev file means we're spending an extra map
> per pack (but not a descriptor, since we close after mmap). And like the
> .idx files (but unlike .pack file maps), we don't keep track of these
> and try to close them when under memory pressure. I think that's
> probably OK in terms of bytes. It may mean running up against operating
> system number-of-mmap limits more quickly when you have a very large
> number of packs, as mentioned in:
>
>   https://lore.kernel.org/git/20200601044511.GA2529317@coredump.intra.peff.net/
>
> But this is probably bumping the number of problematic packs from 30k to
> 20k. Both are sufficiently ridiculous that I don't think it matters in
> practice.

Agreed.

> > diff --git a/tmp-objdir.c b/tmp-objdir.c
> > index 42ed4db5d3..da414df14f 100644
> > --- a/tmp-objdir.c
> > +++ b/tmp-objdir.c
> > @@ -187,7 +187,9 @@ static int pack_copy_priority(const char *name)
> >  		return 2;
> >  	if (ends_with(name, ".idx"))
> >  		return 3;
> > -	return 4;
> > +	if (ends_with(name, ".rev"))
> > +		return 4;
> > +	return 5;
> >  }
>
> Probably not super important, but: should the .idx file still come last
> here? Simultaneous readers won't start using the pack until the .idx
> file is present. We'd probably prefer they see the whole thing
> atomically, than see a .idx missing its .rev (they won't ever produce a
> wrong answer, but they'll generate the in-core revindex on the fly when
> they don't need to).
>
> I guess one could argue that .bitmap files should get similar treatment,
> but we'd not generally see those in the quarantine objdir anyway, so
> nobody ever gave it much thought.

Yeah, you're right (.idx files should come last, and probably an
argument to include .bitmap files here, too, exists. I'll leave the
latter as #leftoverbits).

> -Peff

Thanks,
Taylor
Jeff King Jan. 25, 2021, 6:27 p.m. UTC | #10
On Mon, Jan 25, 2021 at 12:44:53PM -0500, Taylor Blau wrote:

> Thanks for including a compelling argument in favor of the approach that
> I took in this patch.
> 
> I added something along the lines of what you suggested to the final
> paragraph, so now it concludes nicely instead of ending in a comma. I
> briefly considered whether I should add something about how these
> operations scale and how the warming efforts are really amortized across
> all of the objects, but I decided against it.
> 
> I think that this argument is already documented here, and that there's
> no way to concisely state it in an already long patch. Interested
> readers will easily be able to find our discussion here, which is good.

That sounds good. It is sort of arguing against a strawman anyway.

> > It mostly looks good to me. I agree with Junio that "compute" is a
> > better verb than "load" for generating the in-memory revindex.
> 
> Yeah, I settled on load_pack_revindex() either calling
> "create_pack_revindex_in_memory()" or "load_pack_revindex_from_disk()".

Perfect.

> > If p->revindex_data were "const uint32_t *", then this line would just
> > be:
> >
> >   return get_be32(p->revindex_data + pos);
> >
> > Not a huge deal either way since the whole point is to abstract this
> > behind a function where it only has to be written once. I don't think
> > there is any downside from the compiler's view (and we already use this
> > trick for the bitmap name-hash cache).
> 
> Honestly, I'm not a huge fan of implicitly scaling pos by
> sizeof(*p->revindex_data), but I can understand why it reads more
> clearly here. I don't really feel strongly either way, so I'm happy to
> change it in favor of your suggestion.
> 
> Of course, since RIDX_HEADER_SIZE is in bytes, not uint32_t's (and it
> has to be, since it's also used in the RIDX_MIN_SIZE macro, which is
> compared against the st_size of stating the .rev file), you have to do
> gross stuff like:
> 
>   p->revindex_data = (const uint32_t *)((const char *)p->revindex_map + RIDX_HEADER_SIZE);
> 
> But I guess the tradeoff is worth it, since the readers are easier to
> parse.

Yeah, that is definitely a downside. Perhaps keeping everything in bytes
makes things a bit more obvious. In which case I might suggest that
revindex_data just be a "const char *". You'd have to scale any pointer
computations at the point of use then, but you'd avoid needing to do any
extra casting.

-Peff
Junio C Hamano Jan. 25, 2021, 7:04 p.m. UTC | #11
Taylor Blau <me@ttaylorr.com> writes:

>> Thinking out loud a bit: a .rev file means we're spending an extra map
>> per pack (but not a descriptor, since we close after mmap). And like the
>> .idx files (but unlike .pack file maps), we don't keep track of these
>> and try to close them when under memory pressure. I think that's
>> probably OK in terms of bytes. It may mean running up against operating
>> system number-of-mmap limits more quickly ...
>> ...
>> >  	if (ends_with(name, ".idx"))
>> >  		return 3;
>> > -	return 4;
>> > +	if (ends_with(name, ".rev"))
>> > +		return 4;
>> > +	return 5;
>> >  }
>>
>> Probably not super important, but: should the .idx file still come last
>> here? Simultaneous readers won't start using the pack until the .idx
>> file is present. We'd probably prefer they see the whole thing
>> atomically, than see a .idx missing its .rev (they won't ever produce a
>> wrong answer, but they'll generate the in-core revindex on the fly when
>> they don't need to).

At some point, we may want to 

 - introduce .idx version 3 that is more extensible, so that the
   reverse info is included in one of its chunks;

 - make the .rev data for all packs stored as a chunk in .midx, so
   we can first check with .midx and not open any .rev files.

either of which would reduce the numberfrom 30k down to 10k ;-)
Taylor Blau Jan. 25, 2021, 7:23 p.m. UTC | #12
On Mon, Jan 25, 2021 at 11:04:33AM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >> Thinking out loud a bit: a .rev file means we're spending an extra map
> >> per pack (but not a descriptor, since we close after mmap). And like the
> >> .idx files (but unlike .pack file maps), we don't keep track of these
> >> and try to close them when under memory pressure. I think that's
> >> probably OK in terms of bytes. It may mean running up against operating
> >> system number-of-mmap limits more quickly ...
> >> ...
> >> >  	if (ends_with(name, ".idx"))
> >> >  		return 3;
> >> > -	return 4;
> >> > +	if (ends_with(name, ".rev"))
> >> > +		return 4;
> >> > +	return 5;
> >> >  }
> >>
> >> Probably not super important, but: should the .idx file still come last
> >> here? Simultaneous readers won't start using the pack until the .idx
> >> file is present. We'd probably prefer they see the whole thing
> >> atomically, than see a .idx missing its .rev (they won't ever produce a
> >> wrong answer, but they'll generate the in-core revindex on the fly when
> >> they don't need to).
>
> At some point, we may want to
>
>  - introduce .idx version 3 that is more extensible, so that the
>    reverse info is included in one of its chunks;

I'm not opposed to doing so outside of this series. I'd be fine with
resurrecting the series that Stolee posted a while ago to extract a
"chunk writer" API (for using in the commit-graph and MIDX code) to also
be used here.

That got stalled out because of the conflicts that it would have
produced with Abhishek's work, but now that that's getting picked up it
could move forward.

But brian is also working on an index v3 for some hash transition stuff,
so I'd rather let that settle first.

Of course, you could combine those efforts, but I don't think that what
we have here is such a bad interim state.

>  - make the .rev data for all packs stored as a chunk in .midx, so
>    we can first check with .midx and not open any .rev files.

This is trickier than you might think because of how the MIDX selects
a pack when more than one pack contains a given object. There is also
the concept of a "multi-pack reverse index" which you can think of as
the reverse index for a pack that is more-or-less concatenating all of
the objects in the MIDX together (in pack order). That is the same order
we'll use to lay out the bits of a multi-pack bitmap, eventually.

> either of which would reduce the numberfrom 30k down to 10k ;-)

Your ";-)" is a good reminder that there are probably many other
problems with having 30k (or even 10k!) packs that would make solving
this (index v3 + MIDX chunk) not worthwhile.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index f96b2e605f..9593f8bc68 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -259,6 +259,23 @@  Pack file entry: <+
 
     Index checksum of all of the above.
 
+== pack-*.rev files have the format:
+
+  - A 4-byte magic number '0x52494458' ('RIDX').
+
+  - A 4-byte version identifier (= 1)
+
+  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256)
+
+  - A table of index positions, sorted by their corresponding offsets in the
+    packfile.
+
+  - A trailer, containing a:
+
+    checksum of the corresponding packfile, and
+
+    a checksum of all of the above.
+
 == multi-pack-index (MIDX) files have the following format:
 
 The multi-pack-index files refer to multiple pack-files and loose objects.
diff --git a/builtin/repack.c b/builtin/repack.c
index 279be11a16..8d643ddcb9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -208,6 +208,7 @@  static struct {
 } exts[] = {
 	{".pack"},
 	{".idx"},
+	{".rev", 1},
 	{".bitmap", 1},
 	{".promisor", 1},
 };
diff --git a/object-store.h b/object-store.h
index c4fc9dd74e..3fbf11280f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -85,6 +85,9 @@  struct packed_git {
 		 multi_pack_index:1;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	struct revindex_entry *revindex;
+	const void *revindex_data;
+	const void *revindex_map;
+	size_t revindex_size;
 	/* something like ".git/objects/pack/xxxxx.pack" */
 	char pack_name[FLEX_ARRAY]; /* more */
 };
diff --git a/pack-revindex.c b/pack-revindex.c
index 5e69bc7372..369812dd21 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -164,16 +164,98 @@  static void create_pack_revindex(struct packed_git *p)
 	sort_revindex(p->revindex, num_ent, p->pack_size);
 }
 
-int load_pack_revindex(struct packed_git *p)
+static int load_pack_revindex_from_memory(struct packed_git *p)
 {
-	if (!p->revindex) {
-		if (open_pack_index(p))
-			return -1;
-		create_pack_revindex(p);
-	}
+	if (open_pack_index(p))
+		return -1;
+	create_pack_revindex(p);
 	return 0;
 }
 
+static char *pack_revindex_filename(struct packed_git *p)
+{
+	size_t len;
+	if (!strip_suffix(p->pack_name, ".pack", &len))
+		BUG("pack_name does not end in .pack");
+	return xstrfmt("%.*s.rev", (int)len, p->pack_name);
+}
+
+#define RIDX_MIN_SIZE (12 + (2 * the_hash_algo->rawsz))
+
+static int load_revindex_from_disk(char *revindex_name,
+				   uint32_t num_objects,
+				   const void **data, size_t *len)
+{
+	int fd, ret = 0;
+	struct stat st;
+	size_t revindex_size;
+
+	fd = git_open(revindex_name);
+
+	if (fd < 0) {
+		ret = -1;
+		goto cleanup;
+	}
+	if (fstat(fd, &st)) {
+		ret = error_errno(_("failed to read %s"), revindex_name);
+		goto cleanup;
+	}
+
+	revindex_size = xsize_t(st.st_size);
+
+	if (revindex_size < RIDX_MIN_SIZE) {
+		ret = error(_("reverse-index file %s is too small"), revindex_name);
+		goto cleanup;
+	}
+
+	if (revindex_size - RIDX_MIN_SIZE != st_mult(sizeof(uint32_t), num_objects)) {
+		ret = error(_("reverse-index file %s is corrupt"), revindex_name);
+		goto cleanup;
+	}
+
+	*len = revindex_size;
+	*data = xmmap(NULL, revindex_size, PROT_READ, MAP_PRIVATE, fd, 0);
+
+cleanup:
+	close(fd);
+	return ret;
+}
+
+static int load_pack_revindex_from_disk(struct packed_git *p)
+{
+	char *revindex_name;
+	int ret;
+	if (open_pack_index(p))
+		return -1;
+
+	revindex_name = pack_revindex_filename(p);
+
+	ret = load_revindex_from_disk(revindex_name,
+				      p->num_objects,
+				      &p->revindex_map,
+				      &p->revindex_size);
+	if (ret)
+		goto cleanup;
+
+	p->revindex_data = (char *)p->revindex_map + 12;
+
+cleanup:
+	free(revindex_name);
+	return ret;
+}
+
+int load_pack_revindex(struct packed_git *p)
+{
+	if (p->revindex || p->revindex_data)
+		return 0;
+
+	if (!load_pack_revindex_from_disk(p))
+		return 0;
+	else if (!load_pack_revindex_from_memory(p))
+		return 0;
+	return -1;
+}
+
 int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 {
 	unsigned lo, hi;
@@ -203,18 +285,28 @@  int offset_to_pack_pos(struct packed_git *p, off_t ofs, uint32_t *pos)
 
 uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos)
 {
-	if (!p->revindex)
+	if (!(p->revindex || p->revindex_data))
 		BUG("pack_pos_to_index: reverse index not yet loaded");
 	if (p->num_objects <= pos)
 		BUG("pack_pos_to_index: out-of-bounds object at %"PRIu32, pos);
-	return p->revindex[pos].nr;
+
+	if (p->revindex)
+		return p->revindex[pos].nr;
+	else
+		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
 }
 
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos)
 {
-	if (!p->revindex)
+	if (!(p->revindex || p->revindex_data))
 		BUG("pack_pos_to_index: reverse index not yet loaded");
 	if (p->num_objects < pos)
 		BUG("pack_pos_to_offset: out-of-bounds object at %"PRIu32, pos);
-	return p->revindex[pos].offset;
+
+	if (p->revindex)
+		return p->revindex[pos].offset;
+	else if (pos == p->num_objects)
+		return p->pack_size - the_hash_algo->rawsz;
+	else
+		return nth_packed_object_offset(p, pack_pos_to_index(p, pos));
 }
diff --git a/pack-revindex.h b/pack-revindex.h
index 6e0320b08b..01622cf21a 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -21,6 +21,9 @@  struct packed_git;
 /*
  * load_pack_revindex populates the revindex's internal data-structures for the
  * given pack, returning zero on success and a negative value otherwise.
+ *
+ * If a '.rev' file is present, it is checked for consistency, mmap'd, and
+ * pointers are assigned into it (instead of using the in-memory variant).
  */
 int load_pack_revindex(struct packed_git *p);
 
@@ -55,7 +58,9 @@  uint32_t pack_pos_to_index(struct packed_git *p, uint32_t pos);
  * If the reverse index has not yet been loaded, or the position is out of
  * bounds, this function aborts.
  *
- * This function runs in constant time.
+ * This function runs in constant time under both in-memory and on-disk reverse
+ * indexes, but an additional step is taken to consult the corresponding .idx
+ * file when using the on-disk format.
  */
 off_t pack_pos_to_offset(struct packed_git *p, uint32_t pos);
 
diff --git a/packfile.c b/packfile.c
index 7bb1750934..b04eac9286 100644
--- a/packfile.c
+++ b/packfile.c
@@ -324,11 +324,21 @@  void close_pack_index(struct packed_git *p)
 	}
 }
 
+void close_pack_revindex(struct packed_git *p) {
+	if (!p->revindex_map)
+		return;
+
+	munmap((void *)p->revindex_map, p->revindex_size);
+	p->revindex_map = NULL;
+	p->revindex_data = NULL;
+}
+
 void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
 	close_pack_index(p);
+	close_pack_revindex(p);
 }
 
 void close_object_store(struct raw_object_store *o)
@@ -351,7 +361,7 @@  void close_object_store(struct raw_object_store *o)
 
 void unlink_pack_path(const char *pack_name, int force_delete)
 {
-	static const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
+	static const char *exts[] = {".pack", ".idx", ".rev", ".keep", ".bitmap", ".promisor"};
 	int i;
 	struct strbuf buf = STRBUF_INIT;
 	size_t plen;
@@ -853,6 +863,7 @@  static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (ends_with(file_name, ".idx") ||
+	    ends_with(file_name, ".rev") ||
 	    ends_with(file_name, ".pack") ||
 	    ends_with(file_name, ".bitmap") ||
 	    ends_with(file_name, ".keep") ||
diff --git a/packfile.h b/packfile.h
index a58fc738e0..4cfec9e8d3 100644
--- a/packfile.h
+++ b/packfile.h
@@ -90,6 +90,7 @@  uint32_t get_pack_fanout(struct packed_git *p, uint32_t value);
 
 unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 void close_pack_windows(struct packed_git *);
+void close_pack_revindex(struct packed_git *);
 void close_pack(struct packed_git *);
 void close_object_store(struct raw_object_store *o);
 void unuse_pack(struct pack_window **);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index 42ed4db5d3..da414df14f 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -187,7 +187,9 @@  static int pack_copy_priority(const char *name)
 		return 2;
 	if (ends_with(name, ".idx"))
 		return 3;
-	return 4;
+	if (ends_with(name, ".rev"))
+		return 4;
+	return 5;
 }
 
 static int pack_copy_cmp(const char *a, const char *b)