[8/9] sha1-file: use loose object cache for quick existence check

Message ID: 20181112145442.GH7400@sigill.intra.peff.net
State: New, archived
Series: caching loose objects

Commit Message

Jeff King Nov. 12, 2018, 2:54 p.m. UTC
In cases where we expect to ask has_sha1_file() about a lot of objects
that we are not likely to have (e.g., during fetch negotiation), we
already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
a simultaneous write or repack) for speed (we avoid re-scanning the pack
directory).

However, even checking for loose objects can be expensive, as we will
stat() each one. On many systems this cost isn't too noticeable, but
stat() can be particularly slow on some operating systems, or due to
network filesystems.

Since the QUICK flag already tells us that we're OK with a slightly
stale answer, we can use that as a cue to look in our in-memory cache of
each object directory. That basically trades an in-memory binary search
for a stat() call.

Note that it is possible for this to actually be _slower_. We'll do a
full readdir() to fill the cache, so if you have a very large number of
loose objects and a very small number of lookups, that readdir() may end
up more expensive.

This shouldn't be a big deal in practice. If you have a large number of
reachable loose objects, you'll already run into performance problems
(which you should remedy by repacking). You may have unreachable objects
which wouldn't otherwise impact performance. Usually these would go away
with the prune step of "git gc", but they may be held for up to 2 weeks
in the default configuration.

So it comes down to how many such objects you might reasonably expect to
have, and how much slower readdir() on N entries is than M stat() calls
(and here we really care about the syscall backing readdir(), like
getdents() on Linux, but I'll just call this readdir() below).

If N is much smaller than M (a typical packed repo), we know this is a
big win (few readdirs() followed by many uses of the resulting cache).
When N and M are similar in size, it's also a win. We care about the
latency of making a syscall, and readdir() should be giving us many
values in a single call. How many?

On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
entries per call (which is 64 bytes per entry; the name itself is 38
bytes, plus there are some other fields). So we can imagine that this is
always a win as long as the number of loose objects in the repository is
no more than about 500 times the number of lookups you make. It's hard to
auto-tune this because we don't generally know up front how many lookups
we're going to do. But it's unlikely for this to perform significantly
worse.

Signed-off-by: Jeff King <peff@peff.net>
---
There's some obvious hand-waving in the paragraphs above. I would love
it if somebody with an NFS system could do some before/after timings
with various numbers of loose objects, to get a sense of where the
breakeven point is.

My gut is that we do not need the complexity of a cache-size limit, nor
of a config option to disable this. But it would be nice to have a real
number where "reasonable" ends and "pathological" begins. :)

 object-store.h |  1 +
 sha1-file.c    | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

Comments

Derrick Stolee Nov. 12, 2018, 4 p.m. UTC | #1
On 11/12/2018 9:54 AM, Jeff King wrote:
> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, and how much slower readdir() on N entries is than M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> no more than about 500 times the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)

I'm interested in such numbers, but do not have the appropriate setup to 
test.

I think the tradeoffs you mention above are reasonable. There's also 
some chance that this isn't "extra" work but is just "earlier" work, as 
the abbreviation code would load these loose object directories.

>
>   object-store.h |  1 +
>   sha1-file.c    | 20 ++++++++++++++++++++
>   2 files changed, 21 insertions(+)
>
> diff --git a/object-store.h b/object-store.h
> index bf1e0cb761..60758efad8 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -13,6 +13,7 @@ struct object_directory {
>   	/*
>   	 * Used to store the results of readdir(3) calls when we are OK
>   	 * sacrificing accuracy due to races for speed. That includes
> +	 * object existence with OBJECT_INFO_QUICK, as well as
>   	 * our search for unique abbreviated hashes. Don't use it for tasks
>   	 * requiring greater accuracy!
>   	 *
> diff --git a/sha1-file.c b/sha1-file.c
> index 4aae716a37..e53da0b701 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
>   	return -1;
>   }
>   
> +static int quick_has_loose(struct repository *r,
> +			   const unsigned char *sha1)
> +{
> +	int subdir_nr = sha1[0];
> +	struct object_id oid;
> +	struct object_directory *odb;
> +
> +	hashcpy(oid.hash, sha1);
> +
> +	prepare_alt_odb(r);
> +	for (odb = r->objects->odb; odb; odb = odb->next) {
> +		odb_load_loose_cache(odb, subdir_nr);
> +		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>   /*
>    * Map the loose object at "path" if it is not NULL, or the path found by
>    * searching for a loose object named "sha1".
> @@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
>   	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
>   		const char *path;
>   		struct stat st;
> +		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
> +			return quick_has_loose(r, sha1) ? 0 : -1;
>   		if (stat_sha1_file(r, sha1, &st, &path) < 0)
>   			return -1;
>   		if (oi->disk_sizep)

LGTM.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason Nov. 12, 2018, 4:01 p.m. UTC | #2
On Mon, Nov 12 2018, Jeff King wrote:

> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, and how much slower readdir() on N entries is than M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> no more than about 500 times the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)

I'm happy to test this on some of the NFS we have locally, and started
out with a plan to write some for-loop using the low-level API (so it
would look up all 256), fake populate .git/objects/?? with N number of
objects etc, but ran out of time.

Do you have something ready that you think would be representative and I
could just run? If not I'll try to pick this up again...
Jeff King Nov. 12, 2018, 4:21 p.m. UTC | #3
On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > There's some obvious hand-waving in the paragraphs above. I would love
> > it if somebody with an NFS system could do some before/after timings
> > with various numbers of loose objects, to get a sense of where the
> > breakeven point is.
> >
> > My gut is that we do not need the complexity of a cache-size limit, nor
> > of a config option to disable this. But it would be nice to have a real
> > number where "reasonable" ends and "pathological" begins. :)
> 
> I'm happy to test this on some of the NFS we have locally, and started
> out with a plan to write some for-loop using the low-level API (so it
> would look up all 256), fake populate .git/objects/?? with N number of
> objects etc, but ran out of time.
> 
> Do you have something ready that you think would be representative and I
> could just run? If not I'll try to pick this up again...

No, but they don't even really need to be actual objects. So I suspect
something like:

  git init
  for i in $(seq 256); do
    i=$(printf %02x $i)
    mkdir -p .git/objects/$i
    for j in $(seq --format=%038g 1000); do
      echo foo >.git/objects/$i/$j
    done
  done
  git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack

might work (for various values of 1000). The shell loop would probably
be faster as perl, too. :)

Make sure you clear the object directory between runs, though (otherwise
the subsequent index-pack's really do find collisions and spend time
accessing the objects).
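
E.g. (untested, and assuming the default object layout) it should be
enough to drop just the pack that the previous run wrote, since the
fake loose "objects" won't match any real hashes:

  rm -f .git/objects/pack/pack-*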

If you want real objects, you could probably just dump a bunch of
sequential blobs to fast-import, and then pipe the result to
unpack-objects.
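
Something like this might do it (an untested sketch; size and paths are
arbitrary). The trick is that unpack-objects skips objects the repo
already has, so the pack must be moved out of the repository before
exploding it:

  git init --bare /tmp/loose-test.git
  (
    cd /tmp/loose-test.git &&
    # feed sequential one-line blobs to fast-import; it writes a pack
    for i in $(seq 1000); do
      printf 'blob\ndata %d\n%s\n' "${#i}" "$i"
    done | git fast-import &&
    # move the pack away, then explode it into loose objects
    mv objects/pack/pack-*.pack /tmp/loose-test.pack &&
    rm -f objects/pack/pack-*.idx &&
    git unpack-objects </tmp/loose-test.pack
  )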

-Peff
Ævar Arnfjörð Bjarmason Nov. 12, 2018, 10:18 p.m. UTC | #4
On Mon, Nov 12 2018, Jeff King wrote:

> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > There's some obvious hand-waving in the paragraphs above. I would love
>> > it if somebody with an NFS system could do some before/after timings
>> > with various numbers of loose objects, to get a sense of where the
>> > breakeven point is.
>> >
>> > My gut is that we do not need the complexity of a cache-size limit, nor
>> > of a config option to disable this. But it would be nice to have a real
>> > number where "reasonable" ends and "pathological" begins. :)
>>
>> I'm happy to test this on some of the NFS we have locally, and started
>> out with a plan to write some for-loop using the low-level API (so it
>> would look up all 256), fake populate .git/objects/?? with N number of
>> objects etc, but ran out of time.
>>
>> Do you have something ready that you think would be representative and I
>> could just run? If not I'll try to pick this up again...
>
> No, but they don't even really need to be actual objects. So I suspect
> something like:
>
>   git init
>   for i in $(seq 256); do
>     i=$(printf %02x $i)
>     mkdir -p .git/objects/$i
>     for j in $(seq --format=%038g 1000); do
>       echo foo >.git/objects/$i/$j
>     done
>   done
>   git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>
> might work (for various values of 1000). The shell loop would probably
> be faster as perl, too. :)
>
> Make sure you clear the object directory between runs, though (otherwise
> the subsequent index-pack's really do find collisions and spend time
> accessing the objects).
>
> If you want real objects, you could probably just dump a bunch of
> sequential blobs to fast-import, and then pipe the result to
> unpack-objects.
>
> -Peff

I did a very ad-hoc test against a NetApp filer using the test script
quoted at the end of this E-Mail. The test compared origin/master, this
branch of yours, and my core.checkCollisions=false branch.

When run with DBD-mysql.git (just some random ~1k commit repo I had):

    $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh

I get:

    Test                                             origin/master     peff/jk/loose-cache      avar/check-collisions-config
    ------------------------------------------------------------------------------------------------------------------------
    0008.2: index-pack with 256*1 loose objects      4.31(0.55+0.18)   0.41(0.40+0.02) -90.5%   0.23(0.36+0.01) -94.7%
    0008.3: index-pack with 256*10 loose objects     4.37(0.45+0.21)   0.45(0.40+0.02) -89.7%   0.25(0.38+0.01) -94.3%
    0008.4: index-pack with 256*100 loose objects    4.47(0.53+0.23)   0.67(0.63+0.02) -85.0%   0.24(0.38+0.01) -94.6%
    0008.5: index-pack with 256*250 loose objects    5.01(0.67+0.30)   1.04(0.98+0.06) -79.2%   0.24(0.37+0.01) -95.2%
    0008.6: index-pack with 256*500 loose objects    5.11(0.57+0.21)   1.81(1.70+0.09) -64.6%   0.25(0.38+0.01) -95.1%
    0008.7: index-pack with 256*750 loose objects    5.12(0.60+0.22)   2.54(2.38+0.14) -50.4%   0.24(0.38+0.01) -95.3%
    0008.8: index-pack with 256*1000 loose objects   4.52(0.52+0.21)   3.36(3.17+0.17) -25.7%   0.23(0.36+0.01) -94.9%

I then hacked it to test against git.git, but skipped origin/master for
that one because it takes *ages*. So just mine v.s. yours:

    $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
    [...]
    Test                                             peff/jk/loose-cache   avar/check-collisions-config
    ---------------------------------------------------------------------------------------------------
    0008.2: index-pack with 256*1 loose objects      12.57(28.72+0.61)     12.68(29.36+0.62) +0.9%
    0008.3: index-pack with 256*10 loose objects     12.77(28.75+0.61)     12.50(28.88+0.56) -2.1%
    0008.4: index-pack with 256*100 loose objects    13.20(29.49+0.66)     12.38(28.58+0.60) -6.2%
    0008.5: index-pack with 256*250 loose objects    14.10(30.59+0.64)     12.54(28.22+0.57) -11.1%
    0008.6: index-pack with 256*500 loose objects    14.48(31.06+0.74)     12.43(28.59+0.60) -14.2%
    0008.7: index-pack with 256*750 loose objects    15.31(31.91+0.74)     12.67(29.23+0.64) -17.2%
    0008.8: index-pack with 256*1000 loose objects   16.34(32.84+0.76)     13.11(30.19+0.68) -19.8%

So not much of a practical difference perhaps. But then again this isn't
a very realistic test case of anything. Rarely are you going to push a
history of something the size of git.git into a repo with this many
loose objects.

Using sha1collisiondetection.git is I think the most realistic scenario,
i.e. you'll often end up fetching/pushing something roughly the size of
its entire history on a big repo, and with it:

    Test                                             peff/jk/loose-cache   avar/check-collisions-config
    ---------------------------------------------------------------------------------------------------
    0008.2: index-pack with 256*1 loose objects      0.16(0.04+0.01)       0.05(0.03+0.00) -68.8%
    0008.3: index-pack with 256*10 loose objects     0.19(0.04+0.02)       0.05(0.02+0.00) -73.7%
    0008.4: index-pack with 256*100 loose objects    0.32(0.17+0.02)       0.04(0.02+0.00) -87.5%
    0008.5: index-pack with 256*250 loose objects    0.57(0.41+0.03)       0.04(0.02+0.00) -93.0%
    0008.6: index-pack with 256*500 loose objects    1.02(0.83+0.06)       0.04(0.03+0.00) -96.1%
    0008.7: index-pack with 256*750 loose objects    1.47(1.24+0.10)       0.04(0.02+0.00) -97.3%
    0008.8: index-pack with 256*1000 loose objects   1.94(1.70+0.10)       0.04(0.02+0.00) -97.9%

As noted in previous threads I have an in-house monorepo where (due to
expiry policies) loose objects hover around the 256*250 mark.

The script, which is hacky as hell, takes shortcuts to avoid re-creating
the huge fake loose object collection every time (that takes ages). Perhaps
you're interested in incorporating some version of this into a v2. To be
useful it should take some target path as an env variable (see the sketch
after the script below).

$ cat t/perf/p0008-index-pack.sh
#!/bin/sh

test_description="Tests performance of index-pack with loose objects"

. ./perf-lib.sh

test_perf_fresh_repo

test_expect_success 'setup tests' '
	for count in 1 10 100 250 500 750 1000
	do
		if test -d /mnt/ontap_githackers/repo-$count.git
		then
			rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
		else
			git init --bare /mnt/ontap_githackers/repo-$count.git &&
			(
				cd /mnt/ontap_githackers/repo-$count.git &&
				for i in $(seq 0 255)
				do
					i=$(printf %02x $i) &&
					mkdir objects/$i &&
					for j in $(seq --format=%038g $count)
					do
						>objects/$i/$j
					done
				done
			)
		fi
	done
'

for count in 1 10 100 250 500 750 1000
do
	echo 3 | sudo tee /proc/sys/vm/drop_caches
	test_perf "index-pack with 256*$count loose objects" "
		(
			cd /mnt/ontap_githackers/repo-$count.git &&
			rm -fv objects/pack/*;
			git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
		)
	"
done

test_done
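
One way to do that (a sketch; GIT_PERF_LOOSE_BASE is a made-up variable
name) would be a single line near the top of the script:

	base=${GIT_PERF_LOOSE_BASE:-/mnt/ontap_githackers}

with "$base/repo-$count.git" then replacing the hard-coded paths above.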
Ævar Arnfjörð Bjarmason Nov. 12, 2018, 10:30 p.m. UTC | #5
On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Nov 12 2018, Jeff King wrote:
>
>> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>
>>> > There's some obvious hand-waving in the paragraphs above. I would love
>>> > it if somebody with an NFS system could do some before/after timings
>>> > with various numbers of loose objects, to get a sense of where the
>>> > breakeven point is.
>>> >
>>> > My gut is that we do not need the complexity of a cache-size limit, nor
>>> > of a config option to disable this. But it would be nice to have a real
>>> > number where "reasonable" ends and "pathological" begins. :)
>>>
>>> I'm happy to test this on some of the NFS we have locally, and started
>>> out with a plan to write some for-loop using the low-level API (so it
>>> would look up all 256), fake populate .git/objects/?? with N number of
>>> objects etc, but ran out of time.
>>>
>>> Do you have something ready that you think would be representative and I
>>> could just run? If not I'll try to pick this up again...
>>
>> No, but they don't even really need to be actual objects. So I suspect
>> something like:
>>
>>   git init
>>   for i in $(seq 256); do
>>     i=$(printf %02x $i)
>>     mkdir -p .git/objects/$i
>>     for j in $(seq --format=%038g 1000); do
>>       echo foo >.git/objects/$i/$j
>>     done
>>   done
>>   git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>>
>> might work (for various values of 1000). The shell loop would probably
>> be faster as perl, too. :)
>>
>> Make sure you clear the object directory between runs, though (otherwise
>> the subsequent index-pack's really do find collisions and spend time
>> accessing the objects).
>>
>> If you want real objects, you could probably just dump a bunch of
>> sequential blobs to fast-import, and then pipe the result to
>> unpack-objects.
>>
>> -Peff
>
> I did a very ad-hoc test against a NetApp filer using the test script
> quoted at the end of this E-Mail. The test compared origin/master, this
> branch of yours, and my core.checkCollisions=false branch.
>
> When run with DBD-mysql.git (just some random ~1k commit repo I had):
>
>     $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>
> I get:
>
>     Test                                             origin/master     peff/jk/loose-cache      avar/check-collisions-config
>     ------------------------------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      4.31(0.55+0.18)   0.41(0.40+0.02) -90.5%   0.23(0.36+0.01) -94.7%
>     0008.3: index-pack with 256*10 loose objects     4.37(0.45+0.21)   0.45(0.40+0.02) -89.7%   0.25(0.38+0.01) -94.3%
>     0008.4: index-pack with 256*100 loose objects    4.47(0.53+0.23)   0.67(0.63+0.02) -85.0%   0.24(0.38+0.01) -94.6%
>     0008.5: index-pack with 256*250 loose objects    5.01(0.67+0.30)   1.04(0.98+0.06) -79.2%   0.24(0.37+0.01) -95.2%
>     0008.6: index-pack with 256*500 loose objects    5.11(0.57+0.21)   1.81(1.70+0.09) -64.6%   0.25(0.38+0.01) -95.1%
>     0008.7: index-pack with 256*750 loose objects    5.12(0.60+0.22)   2.54(2.38+0.14) -50.4%   0.24(0.38+0.01) -95.3%
>     0008.8: index-pack with 256*1000 loose objects   4.52(0.52+0.21)   3.36(3.17+0.17) -25.7%   0.23(0.36+0.01) -94.9%
>
> I then hacked it to test against git.git, but skipped origin/master for
> that one because it takes *ages*. So just mine v.s. yours:
>
>     $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>     [...]
>     Test                                             peff/jk/loose-cache   avar/check-collisions-config
>     ---------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      12.57(28.72+0.61)     12.68(29.36+0.62) +0.9%
>     0008.3: index-pack with 256*10 loose objects     12.77(28.75+0.61)     12.50(28.88+0.56) -2.1%
>     0008.4: index-pack with 256*100 loose objects    13.20(29.49+0.66)     12.38(28.58+0.60) -6.2%
>     0008.5: index-pack with 256*250 loose objects    14.10(30.59+0.64)     12.54(28.22+0.57) -11.1%
>     0008.6: index-pack with 256*500 loose objects    14.48(31.06+0.74)     12.43(28.59+0.60) -14.2%
>     0008.7: index-pack with 256*750 loose objects    15.31(31.91+0.74)     12.67(29.23+0.64) -17.2%
>     0008.8: index-pack with 256*1000 loose objects   16.34(32.84+0.76)     13.11(30.19+0.68) -19.8%
>
> So not much of a practical difference perhaps. But then again this isn't
> a very realistic test case of anything. Rarely are you going to push a
> history of something the size of git.git into a repo with this many
> loose objects.
>
> Using sha1collisiondetection.git is I think the most realistic scenario,
> i.e. you'll often end up fetching/pushing something roughly the size of
> its entire history on a big repo, and with it:
>
>     Test                                             peff/jk/loose-cache   avar/check-collisions-config
>     ---------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      0.16(0.04+0.01)       0.05(0.03+0.00) -68.8%
>     0008.3: index-pack with 256*10 loose objects     0.19(0.04+0.02)       0.05(0.02+0.00) -73.7%
>     0008.4: index-pack with 256*100 loose objects    0.32(0.17+0.02)       0.04(0.02+0.00) -87.5%
>     0008.5: index-pack with 256*250 loose objects    0.57(0.41+0.03)       0.04(0.02+0.00) -93.0%
>     0008.6: index-pack with 256*500 loose objects    1.02(0.83+0.06)       0.04(0.03+0.00) -96.1%
>     0008.7: index-pack with 256*750 loose objects    1.47(1.24+0.10)       0.04(0.02+0.00) -97.3%
>     0008.8: index-pack with 256*1000 loose objects   1.94(1.70+0.10)       0.04(0.02+0.00) -97.9%
>
> As noted in previous threads I have an in-house monorepo where (due to
> expiry policies) loose objects hover around the 256*250 mark.
>
> The script, which is hacky as hell, takes shortcuts to avoid re-creating
> the huge fake loose object collection every time (that takes ages). Perhaps
> you're interested in incorporating some version of this into a v2. To be
> useful it should take some target path as an env variable.

I forgot perhaps the most useful metric: testing against origin/master
too on the sha1collisiondetection.git repo, which as noted above I think
is a good stand-in for a medium-sized push to a big repo. This shows
when the loose cache becomes counterproductive:

    Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
    -------------------------------------------------------------------------------------------------------------------------
    0008.2: index-pack with 256*1 loose objects      0.42(0.04+0.03)   0.17(0.04+0.00) -59.5%    0.04(0.03+0.00) -90.5%
    0008.3: index-pack with 256*10 loose objects     0.49(0.04+0.03)   0.19(0.04+0.01) -61.2%    0.04(0.02+0.00) -91.8%
    0008.4: index-pack with 256*100 loose objects    0.49(0.04+0.04)   0.33(0.18+0.01) -32.7%    0.05(0.02+0.00) -89.8%
    0008.5: index-pack with 256*250 loose objects    0.54(0.03+0.04)   0.59(0.43+0.02) +9.3%     0.04(0.02+0.01) -92.6%
    0008.6: index-pack with 256*500 loose objects    0.49(0.04+0.03)   1.04(0.83+0.07) +112.2%   0.04(0.02+0.00) -91.8%
    0008.7: index-pack with 256*750 loose objects    0.56(0.04+0.05)   1.50(1.28+0.08) +167.9%   0.04(0.02+0.00) -92.9%
    0008.8: index-pack with 256*1000 loose objects   0.54(0.05+0.03)   1.95(1.68+0.13) +261.1%   0.04(0.02+0.00) -92.6%

I still think it's best to take this patch series since it's unlikely
we're making anything worse in practice; the >50k objects case is a
really high number, which I don't think is worth worrying about.

But I am somewhat paranoid about the potential performance
regression. I.e. this is me testing against a really expensive and
relatively well-performing NetApp NFS device, where the ping stats are:

    rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms

So I suspect this might get a lot worse for setups which don't enjoy the
same performance or network locality.


> $ cat t/perf/p0008-index-pack.sh
> #!/bin/sh
>
> test_description="Tests performance of index-pack with loose objects"
>
> . ./perf-lib.sh
>
> test_perf_fresh_repo
>
> test_expect_success 'setup tests' '
> 	for count in 1 10 100 250 500 750 1000
> 	do
> 		if test -d /mnt/ontap_githackers/repo-$count.git
> 		then
> 			rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
> 		else
> 			git init --bare /mnt/ontap_githackers/repo-$count.git &&
> 			(
> 				cd /mnt/ontap_githackers/repo-$count.git &&
> 				for i in $(seq 0 255)
> 				do
> 					i=$(printf %02x $i) &&
> 					mkdir objects/$i &&
> 					for j in $(seq --format=%038g $count)
> 					do
> 						>objects/$i/$j
> 					done
> 				done
> 			)
> 		fi
> 	done
> '
>
> for count in 1 10 100 250 500 750 1000
> do
> 	echo 3 | sudo tee /proc/sys/vm/drop_caches
> 	test_perf "index-pack with 256*$count loose objects" "
> 		(
> 			cd /mnt/ontap_githackers/repo-$count.git &&
> 			rm -fv objects/pack/*;
> 			git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
> 		)
> 	"
> done
>
> test_done
Geert Jansen Nov. 12, 2018, 10:44 p.m. UTC | #6
On Mon, Nov 12, 2018 at 11:21:51AM -0500, Jeff King wrote:

> No, but they don't even really need to be actual objects. So I suspect
> something like:
> 
>   git init
>   for i in $(seq 256); do
>     i=$(printf %02x $i)
>     mkdir -p .git/objects/$i
>     for j in $(seq --format=%038g 1000); do
>       echo foo >.git/objects/$i/$j
>     done
>   done
>   git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
> 
> might work (for various values of 1000). The shell loop would probably
> be faster as perl, too. :)
> 
> Make sure you clear the object directory between runs, though (otherwise
> the subsequent index-pack's really do find collisions and spend time
> accessing the objects).

Below are my results. They are not as comprehensive as Ævar's tests. Similarly, I
kept the loose objects between tests and removed the packs instead. And I also
used the "echo 3 | sudo tee /proc/sys/vm/drop_caches" trick :)

This is with git.git:

                   origin/master    jk/loose-object-cache

256*100 objects    520s             13.5s (-97%)
256*1000 objects   826s             59s (-93%)

I've started a 256*10K setup but that's still creating the 2.5M loose objects.
I'll post the results when it's done. I would expect that jk/loose-object-cache
is still marginally faster than origin/master based on a simple linear
extrapolation.
Ævar Arnfjörð Bjarmason Nov. 13, 2018, 10:02 a.m. UTC | #7
On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Nov 12 2018, Jeff King wrote:
>>
>>> On Mon, Nov 12, 2018 at 05:01:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>> > There's some obvious hand-waving in the paragraphs above. I would love
>>>> > it if somebody with an NFS system could do some before/after timings
>>>> > with various numbers of loose objects, to get a sense of where the
>>>> > breakeven point is.
>>>> >
>>>> > My gut is that we do not need the complexity of a cache-size limit, nor
>>>> > of a config option to disable this. But it would be nice to have a real
>>>> > number where "reasonable" ends and "pathological" begins. :)
>>>>
>>>> I'm happy to test this on some of the NFS we have locally, and started
>>>> out with a plan to write some for-loop using the low-level API (so it
>>>> would look up all 256), fake populate .git/objects/?? with N number of
>>>> objects etc, but ran out of time.
>>>>
>>>> Do you have something ready that you think would be representative and I
>>>> could just run? If not I'll try to pick this up again...
>>>
>>> No, but they don't even really need to be actual objects. So I suspect
>>> something like:
>>>
>>>   git init
>>>   for i in $(seq 256); do
>>>     i=$(printf %02x $i)
>>>     mkdir -p .git/objects/$i
>>>     for j in $(seq --format=%038g 1000); do
>>>       echo foo >.git/objects/$i/$j
>>>     done
>>>   done
>>>   git index-pack -v --stdin </path/to/git.git/objects/pack/XYZ.pack
>>>
>>> might work (for various values of 1000). The shell loop would probably
>>> be faster as perl, too. :)
>>>
>>> Make sure you clear the object directory between runs, though (otherwise
>>> the subsequent index-pack's really do find collisions and spend time
>>> accessing the objects).
>>>
>>> If you want real objects, you could probably just dump a bunch of
>>> sequential blobs to fast-import, and then pipe the result to
>>> unpack-objects.
>>>
>>> -Peff
>>
>> I did a very ad-hoc test against a NetApp filer using the test script
>> quoted at the end of this E-Mail. The test compared origin/master, this
>> branch of yours, and my core.checkCollisions=false branch.
>>
>> When run with DBD-mysql.git (just some random ~1k commit repo I had):
>>
>>     $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run origin/master peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>>
>> I get:
>>
>>     Test                                             origin/master     peff/jk/loose-cache      avar/check-collisions-config
>>     ------------------------------------------------------------------------------------------------------------------------
>>     0008.2: index-pack with 256*1 loose objects      4.31(0.55+0.18)   0.41(0.40+0.02) -90.5%   0.23(0.36+0.01) -94.7%
>>     0008.3: index-pack with 256*10 loose objects     4.37(0.45+0.21)   0.45(0.40+0.02) -89.7%   0.25(0.38+0.01) -94.3%
>>     0008.4: index-pack with 256*100 loose objects    4.47(0.53+0.23)   0.67(0.63+0.02) -85.0%   0.24(0.38+0.01) -94.6%
>>     0008.5: index-pack with 256*250 loose objects    5.01(0.67+0.30)   1.04(0.98+0.06) -79.2%   0.24(0.37+0.01) -95.2%
>>     0008.6: index-pack with 256*500 loose objects    5.11(0.57+0.21)   1.81(1.70+0.09) -64.6%   0.25(0.38+0.01) -95.1%
>>     0008.7: index-pack with 256*750 loose objects    5.12(0.60+0.22)   2.54(2.38+0.14) -50.4%   0.24(0.38+0.01) -95.3%
>>     0008.8: index-pack with 256*1000 loose objects   4.52(0.52+0.21)   3.36(3.17+0.17) -25.7%   0.23(0.36+0.01) -94.9%
>>
>> I then hacked it to test against git.git, but skipped origin/master for
>> that one because it takes *ages*. So just mine v.s. yours:
>>
>>     $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>>     [...]
>>     Test                                             peff/jk/loose-cache   avar/check-collisions-config
>>     ---------------------------------------------------------------------------------------------------
>>     0008.2: index-pack with 256*1 loose objects      12.57(28.72+0.61)     12.68(29.36+0.62) +0.9%
>>     0008.3: index-pack with 256*10 loose objects     12.77(28.75+0.61)     12.50(28.88+0.56) -2.1%
>>     0008.4: index-pack with 256*100 loose objects    13.20(29.49+0.66)     12.38(28.58+0.60) -6.2%
>>     0008.5: index-pack with 256*250 loose objects    14.10(30.59+0.64)     12.54(28.22+0.57) -11.1%
>>     0008.6: index-pack with 256*500 loose objects    14.48(31.06+0.74)     12.43(28.59+0.60) -14.2%
>>     0008.7: index-pack with 256*750 loose objects    15.31(31.91+0.74)     12.67(29.23+0.64) -17.2%
>>     0008.8: index-pack with 256*1000 loose objects   16.34(32.84+0.76)     13.11(30.19+0.68) -19.8%
>>
>> So not much of a practical difference perhaps. But then again this isn't
>> a very realistic test case of anything. Rarely are you going to push a
>> history of something the size of git.git into a repo with this many
>> loose objects.
>>
>> Using sha1collisiondetection.git is I think the most realistic scenario,
>> i.e. you'll often end up fetching/pushing something roughly the size of
>> its entire history on a big repo, and with it:
>>
>>     Test                                             peff/jk/loose-cache   avar/check-collisions-config
>>     ---------------------------------------------------------------------------------------------------
>>     0008.2: index-pack with 256*1 loose objects      0.16(0.04+0.01)       0.05(0.03+0.00) -68.8%
>>     0008.3: index-pack with 256*10 loose objects     0.19(0.04+0.02)       0.05(0.02+0.00) -73.7%
>>     0008.4: index-pack with 256*100 loose objects    0.32(0.17+0.02)       0.04(0.02+0.00) -87.5%
>>     0008.5: index-pack with 256*250 loose objects    0.57(0.41+0.03)       0.04(0.02+0.00) -93.0%
>>     0008.6: index-pack with 256*500 loose objects    1.02(0.83+0.06)       0.04(0.03+0.00) -96.1%
>>     0008.7: index-pack with 256*750 loose objects    1.47(1.24+0.10)       0.04(0.02+0.00) -97.3%
>>     0008.8: index-pack with 256*1000 loose objects   1.94(1.70+0.10)       0.04(0.02+0.00) -97.9%
>>
>> As noted in previous threads I have an in-house monorepo where (due to
>> expiry policies) loose objects hover around the 256*250 mark.
>>
>> The script, which is hacky as hell, takes shortcuts to avoid re-creating
>> the huge fake loose object collection every time (that takes ages). Perhaps
>> you're interested in incorporating some version of this into a v2. To be
>> useful it should take some target path as an env variable.
>
> I forgot perhaps the most useful metric: testing against origin/master
> too on the sha1collisiondetection.git repo, which as noted above I think
> is a good stand-in for a medium-sized push to a big repo. This shows
> when the loose cache becomes counterproductive:
>
>     Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
>     -------------------------------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      0.42(0.04+0.03)   0.17(0.04+0.00) -59.5%    0.04(0.03+0.00) -90.5%
>     0008.3: index-pack with 256*10 loose objects     0.49(0.04+0.03)   0.19(0.04+0.01) -61.2%    0.04(0.02+0.00) -91.8%
>     0008.4: index-pack with 256*100 loose objects    0.49(0.04+0.04)   0.33(0.18+0.01) -32.7%    0.05(0.02+0.00) -89.8%
>     0008.5: index-pack with 256*250 loose objects    0.54(0.03+0.04)   0.59(0.43+0.02) +9.3%     0.04(0.02+0.01) -92.6%
>     0008.6: index-pack with 256*500 loose objects    0.49(0.04+0.03)   1.04(0.83+0.07) +112.2%   0.04(0.02+0.00) -91.8%
>     0008.7: index-pack with 256*750 loose objects    0.56(0.04+0.05)   1.50(1.28+0.08) +167.9%   0.04(0.02+0.00) -92.9%
>     0008.8: index-pack with 256*1000 loose objects   0.54(0.05+0.03)   1.95(1.68+0.13) +261.1%   0.04(0.02+0.00) -92.6%
>
> I still think it's best to take this patch series since it's unlikely
> we're making anything worse in practice; the >50k objects case is a
> really high number, which I don't think is worth worrying about.
>
> But I am somewhat paranoid about the potential performance
> regression. I.e. this is me testing against a really expensive and
> relatively well-performing NetApp NFS device, where the ping stats are:
>
>     rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms
>
> So I suspect this might get a lot worse for setups which don't enjoy the
> same performance or network locality.

I tried this with the same filer mounted from another DC with ~10x the
RTT:

    rtt min/avg/max/mdev = 11.553/11.618/11.739/0.121 ms

But otherwise the same setup (same machine type/specs mounting it). The
results were the opposite of what I was expecting:

    Test                                             origin/master     peff/jk/loose-cache      avar/check-collisions-config
    ------------------------------------------------------------------------------------------------------------------------
    0008.2: index-pack with 256*1 loose objects      7.78(0.04+0.03)   2.75(0.03+0.01) -64.7%   0.40(0.02+0.00) -94.9%
    0008.3: index-pack with 256*10 loose objects     7.75(0.04+0.04)   2.77(0.05+0.01) -64.3%   0.40(0.02+0.00) -94.8%
    0008.4: index-pack with 256*100 loose objects    7.75(0.05+0.02)   2.91(0.18+0.01) -62.5%   0.40(0.02+0.00) -94.8%
    0008.5: index-pack with 256*250 loose objects    7.73(0.04+0.04)   3.19(0.43+0.02) -58.7%   0.40(0.02+0.00) -94.8%
    0008.6: index-pack with 256*500 loose objects    7.73(0.04+0.04)   3.64(0.83+0.05) -52.9%   0.40(0.02+0.00) -94.8%
    0008.7: index-pack with 256*750 loose objects    7.73(0.04+0.02)   4.14(1.29+0.07) -46.4%   0.40(0.02+0.00) -94.8%
    0008.8: index-pack with 256*1000 loose objects   7.73(0.04+0.03)   4.55(1.72+0.09) -41.1%   0.40(0.02+0.01) -94.8%

I.e. there the cliff where the cache becomes counterproductive comes
much later, not earlier. The sha1collisiondetection.git repo has 418
objects.

So is it cheaper to fill a huge cache than to look up those 418? I
don't know; I haven't dug. But so far what this suggests is that we're
helping slow FSs to the detriment of faster ones.

So here's the same test not against NFS, but the local ext4 fs (CO7;
Linux 3.10) for sha1collisiondetection.git:

    Test                                             origin/master     peff/jk/loose-cache        avar/check-collisions-config
    --------------------------------------------------------------------------------------------------------------------------
    0008.2: index-pack with 256*1 loose objects      0.02(0.02+0.00)   0.02(0.02+0.01) +0.0%      0.02(0.02+0.00) +0.0%
    0008.3: index-pack with 256*10 loose objects     0.02(0.02+0.00)   0.03(0.03+0.00) +50.0%     0.02(0.02+0.00) +0.0%
    0008.4: index-pack with 256*100 loose objects    0.02(0.02+0.00)   0.17(0.16+0.01) +750.0%    0.02(0.02+0.00) +0.0%
    0008.5: index-pack with 256*250 loose objects    0.02(0.02+0.00)   0.43(0.40+0.03) +2050.0%   0.02(0.02+0.00) +0.0%
    0008.6: index-pack with 256*500 loose objects    0.02(0.02+0.00)   0.88(0.80+0.09) +4300.0%   0.02(0.02+0.00) +0.0%
    0008.7: index-pack with 256*750 loose objects    0.02(0.02+0.00)   1.35(1.27+0.09) +6650.0%   0.02(0.02+0.00) +0.0%
    0008.8: index-pack with 256*1000 loose objects   0.02(0.02+0.00)   1.83(1.70+0.14) +9050.0%   0.02(0.02+0.00) +0.0%

And for mu.git, a ~20k object repo:

    Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
    -------------------------------------------------------------------------------------------------------------------------
    0008.2: index-pack with 256*1 loose objects      0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%     0.57(0.89+0.04) -3.4%
    0008.3: index-pack with 256*10 loose objects     0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%     0.57(0.89+0.03) -3.4%
    0008.4: index-pack with 256*100 loose objects    0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%    0.58(0.91+0.04) -1.7%
    0008.5: index-pack with 256*250 loose objects    0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%   0.58(0.91+0.04) -1.7%
    0008.6: index-pack with 256*500 loose objects    0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%   0.58(0.91+0.04) -1.7%
    0008.7: index-pack with 256*750 loose objects    0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%   0.58(0.90+0.04) -1.7%
    0008.8: index-pack with 256*1000 loose objects   0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%   0.57(0.90+0.04) -3.4%

All of which is to say that I think it definitely makes sense to re-roll
this with a perf test, and a switch to toggle it + docs explaining the
caveats & pointing to the perf test. It's a clear win in some scenarios,
but a big loss in others.
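
For illustration, the perf script could then exercise both sides of such
a knob in one run (a sketch; "core.looseObjectCache" is a made-up name
that nothing in this series defines, and big.pack stands in for whatever
pack you feed it):

	for v in true false
	do
		git -c core.looseObjectCache=$v index-pack -v --stdin <big.pack
	done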

>> $ cat t/perf/p0008-index-pack.sh
>> #!/bin/sh
>>
>> test_description="Tests performance of index-pack with loose objects"
>>
>> . ./perf-lib.sh
>>
>> test_perf_fresh_repo
>>
>> test_expect_success 'setup tests' '
>> 	for count in 1 10 100 250 500 750 1000
>> 	do
>> 		if test -d /mnt/ontap_githackers/repo-$count.git
>> 		then
>> 			rm -rf /mnt/ontap_githackers/repo-$count.git/objects/pack
>> 		else
>> 			git init --bare /mnt/ontap_githackers/repo-$count.git &&
>> 			(
>> 				cd /mnt/ontap_githackers/repo-$count.git &&
>> 				for i in $(seq 0 255)
>> 				do
>> 					i=$(printf %02x $i) &&
>> 					mkdir objects/$i &&
>> 					for j in $(seq --format=%038g $count)
>> 					do
>> 						>objects/$i/$j
>> 					done
>> 				done
>> 			)
>> 		fi
>> 	done
>> '
>>
>> for count in 1 10 100 250 500 750 1000
>> do
>> 	echo 3 | sudo tee /proc/sys/vm/drop_caches
>> 	test_perf "index-pack with 256*$count loose objects" "
>> 		(
>> 			cd /mnt/ontap_githackers/repo-$count.git &&
>> 			rm -fv objects/pack/*;
>> 			git -c core.checkCollisions=false index-pack -v --stdin </home/aearnfjord/g/DBD-mysql/.git/objects/pack/pack-*.pack
>> 		)
>> 	"
>> done
>>
>> test_done
René Scharfe Nov. 14, 2018, 6:21 p.m. UTC | #8
On 13.11.2018 at 11:02, Ævar Arnfjörð Bjarmason wrote:
> So here's the same test not against NFS, but the local ext4 fs (CO7;
> Linux 3.10) for sha1collisiondetection.git:
> 
>     Test                                             origin/master     peff/jk/loose-cache        avar/check-collisions-config
>     --------------------------------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      0.02(0.02+0.00)   0.02(0.02+0.01) +0.0%      0.02(0.02+0.00) +0.0%
>     0008.3: index-pack with 256*10 loose objects     0.02(0.02+0.00)   0.03(0.03+0.00) +50.0%     0.02(0.02+0.00) +0.0%
>     0008.4: index-pack with 256*100 loose objects    0.02(0.02+0.00)   0.17(0.16+0.01) +750.0%    0.02(0.02+0.00) +0.0%
>     0008.5: index-pack with 256*250 loose objects    0.02(0.02+0.00)   0.43(0.40+0.03) +2050.0%   0.02(0.02+0.00) +0.0%
>     0008.6: index-pack with 256*500 loose objects    0.02(0.02+0.00)   0.88(0.80+0.09) +4300.0%   0.02(0.02+0.00) +0.0%
>     0008.7: index-pack with 256*750 loose objects    0.02(0.02+0.00)   1.35(1.27+0.09) +6650.0%   0.02(0.02+0.00) +0.0%
>     0008.8: index-pack with 256*1000 loose objects   0.02(0.02+0.00)   1.83(1.70+0.14) +9050.0%   0.02(0.02+0.00) +0.0%

Ouch.

> And for mu.git, a ~20k object repo:
> 
>     Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
>     -------------------------------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%     0.57(0.89+0.04) -3.4%
>     0008.3: index-pack with 256*10 loose objects     0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%     0.57(0.89+0.03) -3.4%
>     0008.4: index-pack with 256*100 loose objects    0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%    0.58(0.91+0.04) -1.7%
>     0008.5: index-pack with 256*250 loose objects    0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%   0.58(0.91+0.04) -1.7%
>     0008.6: index-pack with 256*500 loose objects    0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%   0.58(0.91+0.04) -1.7%
>     0008.7: index-pack with 256*750 loose objects    0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%   0.58(0.90+0.04) -1.7%
>     0008.8: index-pack with 256*1000 loose objects   0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%   0.57(0.90+0.04) -3.4%
> 
> All of which is to say that I think it definitely makes sense to re-roll
> this with a perf test, and a switch to toggle it + docs explaining the
> caveats & pointing to the perf test. It's a clear win in some scenarios,
> but a big loss in others.

Right, but can we perhaps find a way to toggle it automatically, like
the special fetch-pack cache tried to do?

So the code needs to decide between using lstat() on individual loose
object files or opendir()+readdir() to load the names in a whole
fan-out directory.  Intuitively I'd try to solve it using red tape, by
measuring the duration of both kinds of calls, and then try to find a
heuristic based on those numbers.  Is the overhead worth it?
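
(A crude external stand-in for that measurement, assuming GNU coreutils
and a cold cache as in the earlier tests, is to compare the two access
patterns on a single fan-out directory:

  dir=.git/objects/00
  time ls -f "$dir" >/dev/null    # readdir() only, no per-entry stat
  time stat "$dir"/* >/dev/null   # one stat() call per entry

An in-code heuristic would of course have to measure the real calls.)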

But first, may I interest you in some further complication?  We can
also use access(2) to check for the existence of files.  It doesn't
need to fill in struct stat, so may have a slight advantage if we
don't need any of that information.  The following patch is a
replacement for patch 8 and improves performance by ca. 3% with
git.git on an SSD for me; I'm curious to see how it does on NFS:

---
 sha1-file.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index b77dacedc7..5315c37cbc 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -888,8 +888,13 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
 		*path = odb_loose_path(odb, &buf, sha1);
-		if (!lstat(*path, st))
-			return 0;
+		if (st) {
+			if (!lstat(*path, st))
+				return 0;
+		} else {
+			if (!access(*path, F_OK))
+				return 0;
+		}
 	}
 
 	return -1;
@@ -1171,7 +1176,8 @@ static int sha1_loose_object_info(struct repository *r,
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
 		const char *path;
 		struct stat st;
-		if (stat_sha1_file(r, sha1, &st, &path) < 0)
+		struct stat *stp = oi->disk_sizep ? &st : NULL;
+		if (stat_sha1_file(r, sha1, stp, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
 			*oi->disk_sizep = st.st_size;
@@ -1382,7 +1388,6 @@ void *read_object_file_extended(const struct object_id *oid,
 	void *data;
 	const struct packed_git *p;
 	const char *path;
-	struct stat st;
 	const struct object_id *repl = lookup_replace ?
 		lookup_replace_object(the_repository, oid) : oid;
 
@@ -1399,7 +1404,7 @@ void *read_object_file_extended(const struct object_id *oid,
 		die(_("replacement %s not found for %s"),
 		    oid_to_hex(repl), oid_to_hex(oid));
 
-	if (!stat_sha1_file(the_repository, repl->hash, &st, &path))
+	if (!stat_sha1_file(the_repository, repl->hash, NULL, &path))
 		die(_("loose object %s (stored in %s) is corrupt"),
 		    oid_to_hex(repl), path);
René Scharfe Nov. 27, 2018, 8:48 p.m. UTC | #9
On 12.11.2018 at 15:54, Jeff King wrote:
> diff --git a/sha1-file.c b/sha1-file.c
> index 4aae716a37..e53da0b701 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
>  	return -1;
>  }
>  
> +static int quick_has_loose(struct repository *r,
> +			   const unsigned char *sha1)
> +{
> +	int subdir_nr = sha1[0];
> +	struct object_id oid;
> +	struct object_directory *odb;
> +
> +	hashcpy(oid.hash, sha1);
> +
> +	prepare_alt_odb(r);
> +	for (odb = r->objects->odb; odb; odb = odb->next) {
> +		odb_load_loose_cache(odb, subdir_nr);

Is this thread-safe?  What happens if e.g. one index-pack thread resizes
the array while another one sorts it?

Loading the cache explicitly up-front would avoid that, and it improves
performance a bit in my (very limited) tests on an SSD.  Demo patch for
next at the bottom.  How does it do against your test cases?

> +		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> +			return 1;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * Map the loose object at "path" if it is not NULL, or the path found by
>   * searching for a loose object named "sha1".
> @@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
>  	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
>  		const char *path;
>  		struct stat st;
> +		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
> +			return quick_has_loose(r, sha1) ? 0 : -1;
>  		if (stat_sha1_file(r, sha1, &st, &path) < 0)
>  			return -1;
>  		if (oi->disk_sizep)
> 

 builtin/fetch.c      |  2 ++
 builtin/index-pack.c |  2 ++
 fetch-pack.c         |  2 ++
 object-store.h       |  1 +
 sha1-file.c          | 30 +++++++++++++++++++++++++++---
 5 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e0140327aa..4b031f5da5 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -301,6 +301,8 @@ static void find_non_local_tags(const struct ref *refs,
 	refname_hash_init(&existing_refs);
 	refname_hash_init(&remote_refs);
 
+	repo_load_loose_cache(the_repository);
+
 	for_each_ref(add_one_refname, &existing_refs);
 	for (ref = refs; ref; ref = ref->next) {
 		if (!starts_with(ref->name, "refs/tags/"))
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ac1f4ea9a7..7fc6321c77 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1772,6 +1772,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	if (show_stat)
 		obj_stat = xcalloc(st_add(nr_objects, 1), sizeof(struct object_stat));
 	ofs_deltas = xcalloc(nr_objects, sizeof(struct ofs_delta_entry));
+	if (startup_info->have_repository)
+		repo_load_loose_cache(the_repository);
 	parse_pack_objects(pack_hash);
 	if (report_end_of_input)
 		write_in_full(2, "\0", 1);
diff --git a/fetch-pack.c b/fetch-pack.c
index dd6700bda9..96c4624d9e 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -656,6 +656,8 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 
 	save_commit_buffer = 0;
 
+	repo_load_loose_cache(the_repository);
+
 	for (ref = *refs; ref; ref = ref->next) {
 		struct object *o;
 
diff --git a/object-store.h b/object-store.h
index 8dceed0f31..f98dd3c857 100644
--- a/object-store.h
+++ b/object-store.h
@@ -53,6 +53,7 @@ void add_to_alternates_memory(const char *dir);
  * from 0 to 255 inclusive).
  */
 void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+void repo_load_loose_cache(struct repository *r);
 
 struct packed_git {
 	struct packed_git *next;
diff --git a/sha1-file.c b/sha1-file.c
index 05f63dfd4e..ae12f0a198 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -921,10 +921,19 @@ static int open_sha1_file(struct repository *r,
 	return -1;
 }
 
+static int quick_has_loose_odb(struct object_directory *odb,
+			       const struct object_id *oid)
+{
+	int subdir_nr = oid->hash[0];
+
+	if (odb->loose_objects_subdir_seen[subdir_nr])
+		return oid_array_lookup(&odb->loose_objects_cache, oid) >= 0;
+	return check_and_freshen_odb(odb, oid, 0);
+}
+
 static int quick_has_loose(struct repository *r,
 			   const unsigned char *sha1)
 {
-	int subdir_nr = sha1[0];
 	struct object_id oid;
 	struct object_directory *odb;
 
@@ -932,8 +941,7 @@ static int quick_has_loose(struct repository *r,
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		odb_load_loose_cache(odb, subdir_nr);
-		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+		if (quick_has_loose_odb(odb, &oid))
 			return 1;
 	}
 	return 0;
@@ -2178,6 +2186,22 @@ void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
 	strbuf_release(&buf);
 }
 
+void repo_load_loose_cache(struct repository *r)
+{
+	struct object_directory *odb;
+
+	prepare_alt_odb(r);
+	for (odb = r->objects->odb; odb; odb = odb->next) {
+		int i;
+
+		for (i = 0; i < ARRAY_SIZE(odb->loose_objects_subdir_seen); i++)
+			odb_load_loose_cache(odb, i);
+
+		/* Sort as a side-effect, only read the cache from here on. */
+		oid_array_lookup(&odb->loose_objects_cache, &null_oid);
+	}
+}
+
 static int check_stream_sha1(git_zstream *stream,
 			     const char *hdr,
 			     unsigned long size,
Jeff King Dec. 1, 2018, 7:49 p.m. UTC | #10
On Tue, Nov 27, 2018 at 09:48:57PM +0100, René Scharfe wrote:

> > +static int quick_has_loose(struct repository *r,
> > +			   const unsigned char *sha1)
> > +{
> > +	int subdir_nr = sha1[0];
> > +	struct object_id oid;
> > +	struct object_directory *odb;
> > +
> > +	hashcpy(oid.hash, sha1);
> > +
> > +	prepare_alt_odb(r);
> > +	for (odb = r->objects->odb; odb; odb = odb->next) {
> > +		odb_load_loose_cache(odb, subdir_nr);
> 
> Is this thread-safe?  What happens if e.g. one index-pack thread resizes
> the array while another one sorts it?

No, but neither is any of the object_info / has_object_file path, which
may use static function-local buffers, or (before my series) alt scratch
bufs, or even call reprepare_packed_git().

In the long run, I think the solution is probably going to be pushing
some mutexes into the right places, and putting one around the cache
fill is an obvious place.
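
Something like this, say (a totally untested sketch; the wrapper name
is made up, and note that lookups would need the same lock, since
oid_array_lookup() sorts lazily and thus also mutates the array):

  static pthread_mutex_t loose_cache_lock = PTHREAD_MUTEX_INITIALIZER;

  static void odb_load_loose_cache_locked(struct object_directory *odb,
                                          int subdir_nr)
  {
	pthread_mutex_lock(&loose_cache_lock);
	/* the seen[] check inside the function now happens under the lock */
	odb_load_loose_cache(odb, subdir_nr);
	pthread_mutex_unlock(&loose_cache_lock);
  }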

> Loading the cache explicitly up-front would avoid that, and improves
> performance a bit in my (very limited) tests on an SSD.  Demo patch for
> next at the bottom.  How does it do against your test cases?

It's going to do badly on corner cases where we don't need to load every
object subdirectory, and one or more of them are big. I.e., if I look up
"1234abcd", the current code only needs to fault in $GIT_DIR/objects/12.
Pre-loading means we'd hit them all. Even without a lot of objects, on
NFS that's 256 round-trip latencies instead of 1 (at a few milliseconds
per round trip, that's on the order of a second before the first
lookup).

-Peff
René Scharfe Dec. 2, 2018, 10:52 a.m. UTC | #11
On 13.11.2018 at 11:02, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
> 
>> On Mon, Nov 12 2018, Ævar Arnfjörð Bjarmason wrote:
>>
>>> I get:
>>>
>>>     Test                                             origin/master     peff/jk/loose-cache      avar/check-collisions-config
>>>     ------------------------------------------------------------------------------------------------------------------------
>>>     0008.2: index-pack with 256*1 loose objects      4.31(0.55+0.18)   0.41(0.40+0.02) -90.5%   0.23(0.36+0.01) -94.7%
>>>     0008.3: index-pack with 256*10 loose objects     4.37(0.45+0.21)   0.45(0.40+0.02) -89.7%   0.25(0.38+0.01) -94.3%
>>>     0008.4: index-pack with 256*100 loose objects    4.47(0.53+0.23)   0.67(0.63+0.02) -85.0%   0.24(0.38+0.01) -94.6%
>>>     0008.5: index-pack with 256*250 loose objects    5.01(0.67+0.30)   1.04(0.98+0.06) -79.2%   0.24(0.37+0.01) -95.2%
>>>     0008.6: index-pack with 256*500 loose objects    5.11(0.57+0.21)   1.81(1.70+0.09) -64.6%   0.25(0.38+0.01) -95.1%
>>>     0008.7: index-pack with 256*750 loose objects    5.12(0.60+0.22)   2.54(2.38+0.14) -50.4%   0.24(0.38+0.01) -95.3%
>>>     0008.8: index-pack with 256*1000 loose objects   4.52(0.52+0.21)   3.36(3.17+0.17) -25.7%   0.23(0.36+0.01) -94.9%
>>>
>>> I then hacked it to test against git.git, but skipped origin/master for
>>> that one because it takes *ages*. So just mine v.s. yours:
>>>
>>>     $ GIT_PERF_REPEAT_COUNT=3 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run peff/jk/loose-cache avar/check-collisions-config p0008-index-pack.sh
>>>     [...]
>>>     Test                                             peff/jk/loose-cache   avar/check-collisions-config
>>>     ---------------------------------------------------------------------------------------------------
>>>     0008.2: index-pack with 256*1 loose objects      12.57(28.72+0.61)     12.68(29.36+0.62) +0.9%
>>>     0008.3: index-pack with 256*10 loose objects     12.77(28.75+0.61)     12.50(28.88+0.56) -2.1%
>>>     0008.4: index-pack with 256*100 loose objects    13.20(29.49+0.66)     12.38(28.58+0.60) -6.2%
>>>     0008.5: index-pack with 256*250 loose objects    14.10(30.59+0.64)     12.54(28.22+0.57) -11.1%
>>>     0008.6: index-pack with 256*500 loose objects    14.48(31.06+0.74)     12.43(28.59+0.60) -14.2%
>>>     0008.7: index-pack with 256*750 loose objects    15.31(31.91+0.74)     12.67(29.23+0.64) -17.2%
>>>     0008.8: index-pack with 256*1000 loose objects   16.34(32.84+0.76)     13.11(30.19+0.68) -19.8%
>>>
>>> So not much of a practical difference perhaps. But then again this isn't
>>> a very realistic test case of anything. Rarely are you going to push a
>>> history of something the size of git.git into a repo with this many
>>> loose objects.
>>>
>>> Using sha1collisiondetection.git is I think the most realistic scenario,
>>> i.e. you'll often end up fetching/pushing something roughly the size of
>>> its entire history on a big repo, and with it:
>>>
>>>     Test                                             peff/jk/loose-cache   avar/check-collisions-config
>>>     ---------------------------------------------------------------------------------------------------
>>>     0008.2: index-pack with 256*1 loose objects      0.16(0.04+0.01)       0.05(0.03+0.00) -68.8%
>>>     0008.3: index-pack with 256*10 loose objects     0.19(0.04+0.02)       0.05(0.02+0.00) -73.7%
>>>     0008.4: index-pack with 256*100 loose objects    0.32(0.17+0.02)       0.04(0.02+0.00) -87.5%
>>>     0008.5: index-pack with 256*250 loose objects    0.57(0.41+0.03)       0.04(0.02+0.00) -93.0%
>>>     0008.6: index-pack with 256*500 loose objects    1.02(0.83+0.06)       0.04(0.03+0.00) -96.1%
>>>     0008.7: index-pack with 256*750 loose objects    1.47(1.24+0.10)       0.04(0.02+0.00) -97.3%
>>>     0008.8: index-pack with 256*1000 loose objects   1.94(1.70+0.10)       0.04(0.02+0.00) -97.9%
>>>
>>> As noted in previous threads I have an in-house monorepo where (due to
>>> expiry policies) loose objects hover around the 256*250 mark.
>>>
>>> The script is hacky as hell and takes shortcuts to avoid re-creating
>>> the huge fake loose object collection every time (that takes ages).
>>> Perhaps you're interested in incorporating some version of it into a
>>> v2. To be useful it should take some target path as an env variable.
>>
>> I forgot perhaps the most useful metric: testing against origin/master
>> too on the sha1collisiondetection.git repo, which as noted above I think
>> is a good stand-in for a medium-sized push to a big repo. This shows
>> when the loose cache becomes counterproductive:
>>
>>     Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
>>     -------------------------------------------------------------------------------------------------------------------------
>>     0008.2: index-pack with 256*1 loose objects      0.42(0.04+0.03)   0.17(0.04+0.00) -59.5%    0.04(0.03+0.00) -90.5%
>>     0008.3: index-pack with 256*10 loose objects     0.49(0.04+0.03)   0.19(0.04+0.01) -61.2%    0.04(0.02+0.00) -91.8%
>>     0008.4: index-pack with 256*100 loose objects    0.49(0.04+0.04)   0.33(0.18+0.01) -32.7%    0.05(0.02+0.00) -89.8%
>>     0008.5: index-pack with 256*250 loose objects    0.54(0.03+0.04)   0.59(0.43+0.02) +9.3%     0.04(0.02+0.01) -92.6%
>>     0008.6: index-pack with 256*500 loose objects    0.49(0.04+0.03)   1.04(0.83+0.07) +112.2%   0.04(0.02+0.00) -91.8%
>>     0008.7: index-pack with 256*750 loose objects    0.56(0.04+0.05)   1.50(1.28+0.08) +167.9%   0.04(0.02+0.00) -92.9%
>>     0008.8: index-pack with 256*1000 loose objects   0.54(0.05+0.03)   1.95(1.68+0.13) +261.1%   0.04(0.02+0.00) -92.6%
>>
>> I still think it's best to take this patch series since it's unlikely
>> we're making anything worse in practice; the >50k objects case is a
>> really high number, which I don't think is worth worrying about.
>>
>> But I am somewhat paranoid about the potential performance
>> regression. I.e. this is me testing against a really expensive and
>> relatively well-performing NetApp NFS device where the ping stats are:
>>
>>     rtt min/avg/max/mdev = 0.155/0.396/1.387/0.349 ms
>>
>> So I suspect this might get a lot worse for setups which don't enjoy the
>> same performance or network locality.
> 
> I tried this with the same filer mounted from another DC with ~10x the
> RTT:
> 
>     rtt min/avg/max/mdev = 11.553/11.618/11.739/0.121 ms
> 
> But otherwise the same setup (same machine type/specs mounting it). It
> had the opposite results of what I was expecting:
> 
>     Test                                             origin/master     peff/jk/loose-cache      avar/check-collisions-config
>     ------------------------------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      7.78(0.04+0.03)   2.75(0.03+0.01) -64.7%   0.40(0.02+0.00) -94.9%
>     0008.3: index-pack with 256*10 loose objects     7.75(0.04+0.04)   2.77(0.05+0.01) -64.3%   0.40(0.02+0.00) -94.8%
>     0008.4: index-pack with 256*100 loose objects    7.75(0.05+0.02)   2.91(0.18+0.01) -62.5%   0.40(0.02+0.00) -94.8%
>     0008.5: index-pack with 256*250 loose objects    7.73(0.04+0.04)   3.19(0.43+0.02) -58.7%   0.40(0.02+0.00) -94.8%
>     0008.6: index-pack with 256*500 loose objects    7.73(0.04+0.04)   3.64(0.83+0.05) -52.9%   0.40(0.02+0.00) -94.8%
>     0008.7: index-pack with 256*750 loose objects    7.73(0.04+0.02)   4.14(1.29+0.07) -46.4%   0.40(0.02+0.00) -94.8%
>     0008.8: index-pack with 256*1000 loose objects   7.73(0.04+0.03)   4.55(1.72+0.09) -41.1%   0.40(0.02+0.01) -94.8%
> 
> I.e. there the cliff where the cache becomes counterproductive comes
> much later, not earlier. The sha1collisiondetection.git repo has 418
> objects.
> 
> So is it cheaper to fill a huge cache than to look up those 418? I don't
> know, haven't dug. But so far what this suggests is that we're helping
> slow FSs to the detriment of faster ones.
> 
> So here's the same test not against NFS, but the local ext4 fs (CO7;
> Linux 3.10) for sha1collisiondetection.git:
> 
>     Test                                             origin/master     peff/jk/loose-cache        avar/check-collisions-config
>     --------------------------------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      0.02(0.02+0.00)   0.02(0.02+0.01) +0.0%      0.02(0.02+0.00) +0.0%
>     0008.3: index-pack with 256*10 loose objects     0.02(0.02+0.00)   0.03(0.03+0.00) +50.0%     0.02(0.02+0.00) +0.0%
>     0008.4: index-pack with 256*100 loose objects    0.02(0.02+0.00)   0.17(0.16+0.01) +750.0%    0.02(0.02+0.00) +0.0%
>     0008.5: index-pack with 256*250 loose objects    0.02(0.02+0.00)   0.43(0.40+0.03) +2050.0%   0.02(0.02+0.00) +0.0%
>     0008.6: index-pack with 256*500 loose objects    0.02(0.02+0.00)   0.88(0.80+0.09) +4300.0%   0.02(0.02+0.00) +0.0%
>     0008.7: index-pack with 256*750 loose objects    0.02(0.02+0.00)   1.35(1.27+0.09) +6650.0%   0.02(0.02+0.00) +0.0%
>     0008.8: index-pack with 256*1000 loose objects   0.02(0.02+0.00)   1.83(1.70+0.14) +9050.0%   0.02(0.02+0.00) +0.0%
> 
> And for mu.git, a ~20k object repo:
> 
>     Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
>     -------------------------------------------------------------------------------------------------------------------------
>     0008.2: index-pack with 256*1 loose objects      0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%     0.57(0.89+0.04) -3.4%
>     0008.3: index-pack with 256*10 loose objects     0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%     0.57(0.89+0.03) -3.4%
>     0008.4: index-pack with 256*100 loose objects    0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%    0.58(0.91+0.04) -1.7%
>     0008.5: index-pack with 256*250 loose objects    0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%   0.58(0.91+0.04) -1.7%
>     0008.6: index-pack with 256*500 loose objects    0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%   0.58(0.91+0.04) -1.7%
>     0008.7: index-pack with 256*750 loose objects    0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%   0.58(0.90+0.04) -1.7%
>     0008.8: index-pack with 256*1000 loose objects   0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%   0.57(0.90+0.04) -3.4%

OK, here's another theory: The cache scales badly with increasing
numbers of loose objects because it sorts the array 256 times as it is
filled.  Loading it fully and sorting once would help, as would using
one array per subdirectory.

We can simulate the oid_array operations with test-sha1-array.  It has
no explicit sort command, but we can use for_each_unique for that; we
just have to add 127.5 extra calls (which don't sort) in the two latter
cases to get the same amount of output, so that the comparison isolates
the sort time:

  for command in '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
        # intermediate sort
        print "for_each_unique\n";
      }
    ' '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
      }
      # sort once at the end
      print "for_each_unique\n";
      # ... and generate roughly the same amount of output
      foreach (0..127) {
        print "for_each_unique\n";
      }
    ' '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
        # sort each subdirectory separately
        print "for_each_unique\n";
        # ... and generate roughly the same amount of output
        foreach (0..127) {
          print "for_each_unique\n";
        }
        print "clear\n";
      }
    '
  do
    time perl -e "$command" 1000 | t/helper/test-tool sha1-array | wc -l
  done

My results:

  32896000

  real    0m6.521s
  user    0m5.269s
  sys     0m2.234s
  33024000

  real    0m3.464s
  user    0m2.178s
  sys     0m2.251s
  33024000

  real    0m3.353s
  user    0m2.179s
  sys     0m1.939s

So this adds up to a significant amount of time spent on sorting.  Here's
a patch, on top of next:

-- >8 --
Subject: [PATCH] object-store: use one oid_array per subdirectory for loose cache

The loose objects cache is filled one subdirectory at a time as needed.
It is stored in an oid_array, which has to be resorted after each add
operation.  So when querying a wide range of objects the array needs to
be resorted up to 256 times -- once for each subdirectory.  This is not
efficient.

Use one oid_array for each subdirectory.  This ensures that entries
only have to be sorted once.

It speeds up cache operations in a repository with ca. 100 loose
objects per subdirectory (the cache is used there for collision checks
for the placeholders %h, %t and %p):

$ git count-objects
25805 objects, 302452 kilobytes

$ (cd t/perf && ./run HEAD^ HEAD ./p4205-log-pretty-formats.sh)
[...]
Test                        HEAD^             HEAD
--------------------------------------------------------------------
4205.1: log with %H         0.56(0.52+0.04)   0.57(0.54+0.02) +1.8%
4205.2: log with %h         0.92(0.86+0.06)   0.66(0.62+0.04) -28.3%
4205.3: log with %T         0.56(0.52+0.04)   0.57(0.55+0.01) +1.8%
4205.4: log with %t         0.92(0.88+0.04)   0.67(0.62+0.05) -27.2%
4205.5: log with %P         0.57(0.54+0.02)   0.57(0.54+0.03) +0.0%
4205.6: log with %p         0.92(0.86+0.05)   0.64(0.60+0.04) -30.4%
4205.7: log with %h-%h-%h   1.02(0.98+0.04)   0.72(0.69+0.03) -29.4%

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
This patch goes on top of next.  p4205 is better suited to show the
cost of sorting in the loose objects cache than the slooow index-pack.
This change does fix the index-pack regression on ext4 for me as well,
though.  Not sure it warrants adding a loose objects test to p5302.

 object-store.h | 2 +-
 object.c       | 5 ++++-
 packfile.c     | 4 +++-
 sha1-file.c    | 5 +++--
 sha1-name.c    | 8 +++++---
 5 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/object-store.h b/object-store.h
index 8dceed0f31..ee67a50980 100644
--- a/object-store.h
+++ b/object-store.h
@@ -20,7 +20,7 @@ struct object_directory {
 	 * Be sure to call odb_load_loose_cache() before using.
 	 */
 	char loose_objects_subdir_seen[256];
-	struct oid_array loose_objects_cache;
+	struct oid_array loose_objects_cache[256];
 
 	/*
 	 * Path to the alternative object store. If this is a relative path,
diff --git a/object.c b/object.c
index c29a97a7e9..965493ba76 100644
--- a/object.c
+++ b/object.c
@@ -484,8 +484,11 @@ struct raw_object_store *raw_object_store_new(void)
 
 static void free_object_directory(struct object_directory *odb)
 {
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(odb->loose_objects_cache); i++)
+		oid_array_clear(&odb->loose_objects_cache[i]);
 	free(odb->path);
-	oid_array_clear(&odb->loose_objects_cache);
 	free(odb);
 }
 
diff --git a/packfile.c b/packfile.c
index 56496bb425..3ef6a241b7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -995,7 +995,9 @@ void reprepare_packed_git(struct repository *r)
 	struct object_directory *odb;
 
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		oid_array_clear(&odb->loose_objects_cache);
+		int i;
+		for (i = 0; i < ARRAY_SIZE(odb->loose_objects_cache); i++)
+			oid_array_clear(&odb->loose_objects_cache[i]);
 		memset(&odb->loose_objects_subdir_seen, 0,
 		       sizeof(odb->loose_objects_subdir_seen));
 	}
diff --git a/sha1-file.c b/sha1-file.c
index 05f63dfd4e..d2f5e65865 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -933,7 +933,8 @@ static int quick_has_loose(struct repository *r,
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
 		odb_load_loose_cache(odb, subdir_nr);
-		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
+				     &oid) >= 0)
 			return 1;
 	}
 	return 0;
@@ -2173,7 +2174,7 @@ void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
 	for_each_file_in_obj_subdir(subdir_nr, &buf,
 				    append_loose_object,
 				    NULL, NULL,
-				    &odb->loose_objects_cache);
+				    &odb->loose_objects_cache[subdir_nr]);
 	odb->loose_objects_subdir_seen[subdir_nr] = 1;
 	strbuf_release(&buf);
 }
diff --git a/sha1-name.c b/sha1-name.c
index b24502811b..fdb22147b2 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -94,14 +94,16 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 	     odb && !ds->ambiguous;
 	     odb = odb->next) {
 		int pos;
+		struct oid_array *loose_subdir_objects;
 
 		odb_load_loose_cache(odb, subdir_nr);
-		pos = oid_array_lookup(&odb->loose_objects_cache, &ds->bin_pfx);
+		loose_subdir_objects = &odb->loose_objects_cache[subdir_nr];
+		pos = oid_array_lookup(loose_subdir_objects, &ds->bin_pfx);
 		if (pos < 0)
 			pos = -1 - pos;
-		while (!ds->ambiguous && pos < odb->loose_objects_cache.nr) {
+		while (!ds->ambiguous && pos < loose_subdir_objects->nr) {
 			const struct object_id *oid;
-			oid = odb->loose_objects_cache.oid + pos;
+			oid = loose_subdir_objects->oid + pos;
 			if (!match_sha(ds->len, ds->bin_pfx.hash, oid->hash))
 				break;
 			update_candidates(ds, oid);
Jeff King Dec. 3, 2018, 10:04 p.m. UTC | #12
On Sun, Dec 02, 2018 at 11:52:50AM +0100, René Scharfe wrote:

> > And for mu.git, a ~20k object repo:
> > 
> >     Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
> >     -------------------------------------------------------------------------------------------------------------------------
> >     0008.2: index-pack with 256*1 loose objects      0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%     0.57(0.89+0.04) -3.4%
> >     0008.3: index-pack with 256*10 loose objects     0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%     0.57(0.89+0.03) -3.4%
> >     0008.4: index-pack with 256*100 loose objects    0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%    0.58(0.91+0.04) -1.7%
> >     0008.5: index-pack with 256*250 loose objects    0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%   0.58(0.91+0.04) -1.7%
> >     0008.6: index-pack with 256*500 loose objects    0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%   0.58(0.91+0.04) -1.7%
> >     0008.7: index-pack with 256*750 loose objects    0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%   0.58(0.90+0.04) -1.7%
> >     0008.8: index-pack with 256*1000 loose objects   0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%   0.57(0.90+0.04) -3.4%
> 
> OK, here's another theory: The cache scales badly with increasing
> numbers of loose objects because it sorts the array 256 times as it is
> filled.  Loading it fully and sorting once would help, as would using
> one array per subdirectory.

Yeah, that makes sense. This was actually how I had planned to do it
originally, but then I ended up just reusing the existing single-array
approach from the abbrev code.

I hadn't actually thought about the repeated sortings (though it
definitely makes sense that they would hurt in these pathological
cases), but more just that we get a 256x reduction in N for our binary
search (in fact we already do this first-byte lookup-table trick for
pack index lookups).
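
For illustration, the first-byte trick looks something like this (just
a sketch of the idea, not the actual pack-index code; "fanout" is the
256-entry cumulative-count table at the start of the .idx):

  static int fanout_lookup(const struct object_id *sorted,
			   const uint32_t *fanout,
			   const struct object_id *oid)
  {
	/*
	 * fanout[b] counts the objects whose first byte is <= b, so the
	 * entries starting with byte b live at [fanout[b-1], fanout[b]).
	 */
	uint32_t lo = oid->hash[0] ? fanout[oid->hash[0] - 1] : 0;
	uint32_t hi = fanout[oid->hash[0]];

	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		int cmp = oidcmp(&sorted[mi], oid);

		if (!cmp)
			return mi;
		if (cmp < 0)
			lo = mi + 1;
		else
			hi = mi;
	}
	return -1; /* not found */
  }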

Your patch looks good to me. We may want to do one thing on top:

> diff --git a/object-store.h b/object-store.h
> index 8dceed0f31..ee67a50980 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -20,7 +20,7 @@ struct object_directory {
>  	 * Be sure to call odb_load_loose_cache() before using.
>  	 */
>  	char loose_objects_subdir_seen[256];
> -	struct oid_array loose_objects_cache;
> +	struct oid_array loose_objects_cache[256];

The comment in the context there is warning callers to remember to load
the cache first. Now that we have individual caches, might it make sense
to change the interface a bit, and make these members private. I.e.,
something like:

  struct oid_array *odb_loose_cache(struct object_directory *odb,
                                    int subdir_nr) 
  {
	if (!odb->loose_objects_subdir_seen[subdir_nr])
		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */

	return &odb->loose_objects_cache[subdir_nr];
  }

That's harder to get wrong, and this:

> diff --git a/sha1-file.c b/sha1-file.c
> index 05f63dfd4e..d2f5e65865 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -933,7 +933,8 @@ static int quick_has_loose(struct repository *r,
>  	prepare_alt_odb(r);
>  	for (odb = r->objects->odb; odb; odb = odb->next) {
>  		odb_load_loose_cache(odb, subdir_nr);
> -		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
> +		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
> +				     &oid) >= 0)
>  			return 1;
>  	}

becomes:

  struct oid_array *cache = odb_loose_cache(odb, subdir_nr);
  if (oid_array_lookup(cache, &oid) >= 0)
	return 1;

(An even simpler interface would be a single function that computes
subdir_nr and does the lookup itself, but that would not be enough for
find_short_object_filename()).
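
I.e., something like this (hypothetical name, building on the
odb_loose_cache() sketch above):

  static int odb_loose_cache_contains(struct object_directory *odb,
				      const struct object_id *oid)
  {
	struct oid_array *cache = odb_loose_cache(odb, oid->hash[0]);

	return oid_array_lookup(cache, oid) >= 0;
  }

That would cover quick_has_loose(), but find_short_object_filename()
needs the array itself so it can walk the neighboring entries.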

-Peff
René Scharfe Dec. 4, 2018, 9:45 p.m. UTC | #13
On 03.12.2018 at 23:04, Jeff King wrote:
> On Sun, Dec 02, 2018 at 11:52:50AM +0100, René Scharfe wrote:
> 
>>> And for mu.git, a ~20k object repo:
>>>
>>>     Test                                             origin/master     peff/jk/loose-cache       avar/check-collisions-config
>>>     -------------------------------------------------------------------------------------------------------------------------
>>>     0008.2: index-pack with 256*1 loose objects      0.59(0.91+0.06)   0.58(0.93+0.03) -1.7%     0.57(0.89+0.04) -3.4%
>>>     0008.3: index-pack with 256*10 loose objects     0.59(0.91+0.07)   0.59(0.92+0.03) +0.0%     0.57(0.89+0.03) -3.4%
>>>     0008.4: index-pack with 256*100 loose objects    0.59(0.91+0.05)   0.81(1.13+0.04) +37.3%    0.58(0.91+0.04) -1.7%
>>>     0008.5: index-pack with 256*250 loose objects    0.59(0.91+0.05)   1.23(1.51+0.08) +108.5%   0.58(0.91+0.04) -1.7%
>>>     0008.6: index-pack with 256*500 loose objects    0.59(0.90+0.06)   1.96(2.20+0.12) +232.2%   0.58(0.91+0.04) -1.7%
>>>     0008.7: index-pack with 256*750 loose objects    0.59(0.92+0.05)   2.72(2.92+0.17) +361.0%   0.58(0.90+0.04) -1.7%
>>>     0008.8: index-pack with 256*1000 loose objects   0.59(0.90+0.06)   3.50(3.67+0.21) +493.2%   0.57(0.90+0.04) -3.4%
>>
>> OK, here's another theory: The cache scales badly with increasing
>> numbers of loose objects because it sorts the array 256 times as it is
>> filled.  Loading it fully and sorting once would help, as would using
>> one array per subdirectory.
> 
> Yeah, that makes sense. This was actually how I had planned to do it
> originally, but then I ended up just reusing the existing single-array
> approach from the abbrev code.
> 
> I hadn't actually thought about the repeated sortings (though it
> definitely makes sense that they would hurt in these pathological
> cases), but more just that we get a 256x reduction in N for our binary
> search (in fact we already do this first-byte lookup-table trick for
> pack index lookups).

Skipping eight steps in a binary search (256 = 2^8) is something, but
it's faster even without that.

Just realized that the demo code can use "lookup" instead of the much
more expensive "for_each_unique" to sort.  D'oh!  With that change:

  for command in '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
        # intermediate sort
        print "lookup " . "0" x 40 . "\n";
      }
    ' '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
      }
      # sort once at the end
      print "lookup " . "0" x 40 . "\n";
    ' '
      foreach (0..255) {
        $subdir = sprintf("%02x", $_);
        foreach (1..$ARGV[0]) {
          printf("append %s%038d\n", $subdir, $_);
        }
        # sort each subdirectory separately
        print "lookup " . "0" x 40 . "\n";
        print "clear\n";
      }
    '
  do
    time perl -e "$command" 1000 | t/helper/test-tool sha1-array | wc -l
  done

And the results make the scale of the improvement more obvious:

  256

  real    0m3.476s
  user    0m3.466s
  sys     0m0.099s
  1

  real    0m0.157s
  user    0m0.148s
  sys     0m0.046s
  256

  real    0m0.117s
  user    0m0.116s
  sys     0m0.051s

> Your patch looks good to me. We may want to do one thing on top:
> 
>> diff --git a/object-store.h b/object-store.h
>> index 8dceed0f31..ee67a50980 100644
>> --- a/object-store.h
>> +++ b/object-store.h
>> @@ -20,7 +20,7 @@ struct object_directory {
>>  	 * Be sure to call odb_load_loose_cache() before using.
>>  	 */
>>  	char loose_objects_subdir_seen[256];
>> -	struct oid_array loose_objects_cache;
>> +	struct oid_array loose_objects_cache[256];
> 
> The comment in the context there is warning callers to remember to load
> the cache first. Now that we have individual caches, might it make sense
> to change the interface a bit, and make these members private. I.e.,
> something like:
> 
>   struct oid_array *odb_loose_cache(struct object_directory *odb,
>                                     int subdir_nr) 
>   {
> 	if (!odb->loose_objects_subdir_seen[subdir_nr])
> 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
> 
> 	return &odb->loose_objects_cache[subdir_nr];
>   }

Sure.  And it should take an object_id pointer instead of a subdir_nr --
less duplication, nicer interface (patch below).

It would be nice if it could return a const pointer to discourage
messing up the cache, but that's not compatible with oid_array_lookup().
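
(The culprit is the lazy sort: schematically, and with an illustrative
name for the search helper, the lookup does

	int oid_array_lookup(struct oid_array *array, const struct object_id *oid)
	{
		if (!array->sorted)
			oid_array_sort(array);	/* sorts in place, so no const */
		return lookup_in_sorted(array, oid);	/* >= 0 iff found */
	}

and there is no way to hand out a const array while still allowing
that in-place sort.)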

And quick_has_loose() should be converted to object_id as well -- adding
a function that takes a SHA-1 is a regression. :D

René

---
 object-store.h |  8 ++++----
 sha1-file.c    | 19 ++++++++-----------
 sha1-name.c    |  4 +---
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/object-store.h b/object-store.h
index ee67a50980..dd9efdd276 100644
--- a/object-store.h
+++ b/object-store.h
@@ -48,11 +48,11 @@ void add_to_alternates_file(const char *dir);
 void add_to_alternates_memory(const char *dir);
 
 /*
- * Populate an odb's loose object cache for one particular subdirectory (i.e.,
- * the one that corresponds to the first byte of objects you're interested in,
- * from 0 to 255 inclusive).
+ * Populate and return the loose object cache array corresponding to the
+ * given object ID.
  */
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr);
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+				  const struct object_id *oid);
 
 struct packed_git {
 	struct packed_git *next;
diff --git a/sha1-file.c b/sha1-file.c
index d2f5e65865..38af6d5d0b 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -924,7 +924,6 @@ static int open_sha1_file(struct repository *r,
 static int quick_has_loose(struct repository *r,
 			   const unsigned char *sha1)
 {
-	int subdir_nr = sha1[0];
 	struct object_id oid;
 	struct object_directory *odb;
 
@@ -932,9 +931,7 @@ static int quick_has_loose(struct repository *r,
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		odb_load_loose_cache(odb, subdir_nr);
-		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
-				     &oid) >= 0)
+		if (oid_array_lookup(odb_loose_cache(odb, &oid), &oid) >= 0)
 			return 1;
 	}
 	return 0;
@@ -2159,24 +2156,24 @@ static int append_loose_object(const struct object_id *oid, const char *path,
 	return 0;
 }
 
-void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
+struct oid_array *odb_loose_cache(struct object_directory *odb,
+				  const struct object_id *oid)
 {
+	int subdir_nr = oid->hash[0];
+	struct oid_array *subdir_array = &odb->loose_objects_cache[subdir_nr];
 	struct strbuf buf = STRBUF_INIT;
 
-	if (subdir_nr < 0 ||
-	    subdir_nr >= ARRAY_SIZE(odb->loose_objects_subdir_seen))
-		BUG("subdir_nr out of range");
-
 	if (odb->loose_objects_subdir_seen[subdir_nr])
-		return;
+		return subdir_array;
 
 	strbuf_addstr(&buf, odb->path);
 	for_each_file_in_obj_subdir(subdir_nr, &buf,
 				    append_loose_object,
 				    NULL, NULL,
-				    &odb->loose_objects_cache[subdir_nr]);
+				    subdir_array);
 	odb->loose_objects_subdir_seen[subdir_nr] = 1;
 	strbuf_release(&buf);
+	return subdir_array;
 }
 
 static int check_stream_sha1(git_zstream *stream,
diff --git a/sha1-name.c b/sha1-name.c
index fdb22147b2..4fc6368ce5 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -87,7 +87,6 @@ static int match_sha(unsigned, const unsigned char *, const unsigned char *);
 
 static void find_short_object_filename(struct disambiguate_state *ds)
 {
-	int subdir_nr = ds->bin_pfx.hash[0];
 	struct object_directory *odb;
 
 	for (odb = the_repository->objects->odb;
@@ -96,8 +95,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 		int pos;
 		struct oid_array *loose_subdir_objects;
 
-		odb_load_loose_cache(odb, subdir_nr);
-		loose_subdir_objects = &odb->loose_objects_cache[subdir_nr];
+		loose_subdir_objects = odb_loose_cache(odb, &ds->bin_pfx);
 		pos = oid_array_lookup(loose_subdir_objects, &ds->bin_pfx);
 		if (pos < 0)
 			pos = -1 - pos;
Jeff King Dec. 5, 2018, 4:46 a.m. UTC | #14
On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:

> > The comment in the context there is warning callers to remember to load
> > the cache first. Now that we have individual caches, might it make sense
> > to change the interface a bit, and make these members private. I.e.,
> > something like:
> > 
> >   struct oid_array *odb_loose_cache(struct object_directory *odb,
> >                                     int subdir_nr) 
> >   {
> > 	if (!odb->loose_objects_subdir_seen[subdir_nr])
> > 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
> > 
> > 	return &odb->loose_objects_cache[subdir_nr];
> >   }
> 
> Sure.  And it should take an object_id pointer instead of a subdir_nr --
> less duplication, nicer interface (patch below).

I had considered that initially, but it's a little less flexible if a
caller doesn't actually have an oid. Though both of the proposed callers
do, so it's probably OK to worry about that if and when it ever comes up
(the most plausible thing in my mind is doing some abbrev-like analysis
without looking to abbreviate a _particular_ oid).

> It would be nice if it could return a const pointer to discourage
> messing up the cache, but that's not compatible with oid_array_lookup().

Yeah.

> And quick_has_loose() should be converted to object_id as well -- adding
> a function that takes a SHA-1 is a regression. :D

I actually wrote it that way initially, but doing the hashcpy() in the
caller is a bit more awkward. My thought was to punt on that until the
rest of the surrounding code starts handling oids.

> ---
>  object-store.h |  8 ++++----
>  sha1-file.c    | 19 ++++++++-----------
>  sha1-name.c    |  4 +---
>  3 files changed, 13 insertions(+), 18 deletions(-)

This patch looks sane. How do you want to handle it? I'd assumed your
earlier one would be for applying on top, but this one doesn't have a
commit message. Did you want me to squash down the individual hunks?

-Peff
René Scharfe Dec. 5, 2018, 6:02 a.m. UTC | #15
On 05.12.2018 at 05:46, Jeff King wrote:
> On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:
> 
>>> The comment in the context there is warning callers to remember to load
>>> the cache first. Now that we have individual caches, might it make sense
>>> to change the interface a bit, and make these members private. I.e.,
>>> something like:
>>>
>>>   struct oid_array *odb_loose_cache(struct object_directory *odb,
>>>                                     int subdir_nr) 
>>>   {
>>> 	if (!odb->loose_objects_subdir_seen[subdir_nr])
>>> 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
>>>
>>> 	return &odb->loose_objects_cache[subdir_nr];
>>>   }
>>
>> Sure.  And it should take an object_id pointer instead of a subdir_nr --
>> less duplication, nicer interface (patch below).
> 
> I had considered that initially, but it's a little less flexible if a
> caller doesn't actually have an oid. Though both of the proposed callers
> do, so it's probably OK to worry about that if and when it ever comes up
> (the most plausible thing in my mind is doing some abbrev-like analysis
> without looking to abbreviate a _particular_ oid).

Right, let's focus on concrete requirements of current callers.  YAGNI..
:)

>> And quick_has_loose() should be converted to object_id as well -- adding
>> a function that takes a SHA-1 is a regression. :D
> 
> I actually wrote it that way initially, but doing the hashcpy() in the
> caller is a bit more awkward. My thought was to punt on that until the
> rest of the surrounding code starts handling oids.

The level of awkwardness is the same to me, but sha1_loose_object_info()
is much longer already, so it might feel worse to add it there.  This
function is easily converted to struct object_id, though, as its single
caller can pass one on -- this makes the copy unnecessary.

> This patch looks sane. How do you want to handle it? I'd assumed your
> earlier one would be for applying on top, but this one doesn't have a
> commit message. Did you want me to squash down the individual hunks?

I'm waiting for the first one (object-store: use one oid_array per
subdirectory for loose cache) to be accepted, as it aims to solve a
user-visible performance regression, i.e. that's where the meat is.
I'm particularly interested in performance numbers from Ævar for it.

I can send the last one properly later, and add patches for converting
quick_has_loose() to struct object_id.  Those are just cosmetic..

René
Jeff King Dec. 5, 2018, 6:51 a.m. UTC | #16
On Wed, Dec 05, 2018 at 07:02:17AM +0100, René Scharfe wrote:

> > I actually wrote it that way initially, but doing the hashcpy() in the
> > caller is a bit more awkward. My thought was to punt on that until the
> > rest of the surrounding code starts handling oids.
> 
> The level of awkwardness is the same to me, but sha1_loose_object_info()
> is much longer already, so it might feel worse to add it there. 

Right, what I meant was that in sha1_loose_object_info():

  if (special_case)
	handle_special_case();

is easier to follow than a block setting up the special case and then
calling the function.

> This
> function is easily converted to struct object_id, though, as its single
> caller can pass one on -- this makes the copy unnecessary.

If you mean modifying sha1_loose_object_info() to take an oid, then
sure, I agree that is a good step forward (and that is exactly the "punt
until" moment I meant).

> > This patch looks sane. How do you want to handle it? I'd assumed your
> > earlier one would be for applying on top, but this one doesn't have a
> > commit message. Did you want me to squash down the individual hunks?
> 
> I'm waiting for the first one (object-store: use one oid_array per
> subdirectory for loose cache) to be accepted, as it aims to solve a
> user-visible performance regression, i.e. that's where the meat is.
> I'm particularly interested in performance numbers from Ævar for it.
> 
> I can send the last one properly later, and add patches for converting
> quick_has_loose() to struct object_id.  Those are just cosmetic..

Great, thanks. I just wanted to make sure these improvements weren't
going to slip through the cracks.

-Peff
Jeff King Dec. 5, 2018, 8:15 a.m. UTC | #17
On Wed, Dec 05, 2018 at 01:51:36AM -0500, Jeff King wrote:

> > This
> > function is easily converted to struct object_id, though, as its single
> > caller can pass one on -- this makes the copy unnecessary.
> 
> If you mean modifying sha1_loose_object_info() to take an oid, then
> sure, I agree that is a good step forward (and that is exactly the "punt
> until" moment I meant).

So the simple thing is to do that, and then have it pass oid->hash to
the other functions it uses. If we start to convert those, there's a
little bit of a rabbit hole, but it's actually not too bad.

Most of the spill-over is into the dumb-http code. Note that it actually
uses sha1 itself! That probably needs to be the_hash_algo (though I'm
not even sure how we'd negotiate the algorithm across a dumb fetch). At
any rate, I don't think this patch makes anything _worse_ in that
respect.
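
For reference, I'd expect the eventual conversion to go through
the_hash_algo's function pointers; just a sketch, not part of this
patch, with "buf"/"len" standing in for the inflated data:

	git_hash_ctx c;

	the_hash_algo->init_fn(&c);
	the_hash_algo->update_fn(&c, buf, len);
	the_hash_algo->final_fn(freq->real_oid.hash, &c);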

diff --git a/http-push.c b/http-push.c
index cd48590912..0141b0ad53 100644
--- a/http-push.c
+++ b/http-push.c
@@ -255,7 +255,7 @@ static void start_fetch_loose(struct transfer_request *request)
 	struct active_request_slot *slot;
 	struct http_object_request *obj_req;
 
-	obj_req = new_http_object_request(repo->url, request->obj->oid.hash);
+	obj_req = new_http_object_request(repo->url, &request->obj->oid);
 	if (obj_req == NULL) {
 		request->state = ABORTED;
 		return;
diff --git a/http-walker.c b/http-walker.c
index 0a392c85b6..29b59e2fe0 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -58,7 +58,7 @@ static void start_object_request(struct walker *walker,
 	struct active_request_slot *slot;
 	struct http_object_request *req;
 
-	req = new_http_object_request(obj_req->repo->base, obj_req->oid.hash);
+	req = new_http_object_request(obj_req->repo->base, &obj_req->oid);
 	if (req == NULL) {
 		obj_req->state = ABORTED;
 		return;
@@ -543,11 +543,11 @@ static int fetch_object(struct walker *walker, unsigned char *sha1)
 	} else if (req->zret != Z_STREAM_END) {
 		walker->corrupt_object_found++;
 		ret = error("File %s (%s) corrupt", hex, req->url);
-	} else if (!hasheq(obj_req->oid.hash, req->real_sha1)) {
+	} else if (!oideq(&obj_req->oid, &req->real_oid)) {
 		ret = error("File %s has bad hash", hex);
 	} else if (req->rename < 0) {
 		struct strbuf buf = STRBUF_INIT;
-		loose_object_path(the_repository, &buf, req->sha1);
+		loose_object_path(the_repository, &buf, &req->oid);
 		ret = error("unable to write sha1 filename %s", buf.buf);
 		strbuf_release(&buf);
 	}
diff --git a/http.c b/http.c
index 7cfa7a16e0..e95b5b9be0 100644
--- a/http.c
+++ b/http.c
@@ -2298,9 +2298,9 @@ static size_t fwrite_sha1_file(char *ptr, size_t eltsize, size_t nmemb,
 }
 
 struct http_object_request *new_http_object_request(const char *base_url,
-	unsigned char *sha1)
+						    const struct object_id *oid)
 {
-	char *hex = sha1_to_hex(sha1);
+	char *hex = oid_to_hex(oid);
 	struct strbuf filename = STRBUF_INIT;
 	struct strbuf prevfile = STRBUF_INIT;
 	int prevlocal;
@@ -2311,10 +2311,10 @@ struct http_object_request *new_http_object_request(const char *base_url,
 
 	freq = xcalloc(1, sizeof(*freq));
 	strbuf_init(&freq->tmpfile, 0);
-	hashcpy(freq->sha1, sha1);
+	oidcpy(&freq->oid, oid);
 	freq->localfile = -1;
 
-	loose_object_path(the_repository, &filename, sha1);
+	loose_object_path(the_repository, &filename, oid);
 	strbuf_addf(&freq->tmpfile, "%s.temp", filename.buf);
 
 	strbuf_addf(&prevfile, "%s.prev", filename.buf);
@@ -2456,16 +2456,16 @@ int finish_http_object_request(struct http_object_request *freq)
 	}
 
 	git_inflate_end(&freq->stream);
-	git_SHA1_Final(freq->real_sha1, &freq->c);
+	git_SHA1_Final(freq->real_oid.hash, &freq->c);
 	if (freq->zret != Z_STREAM_END) {
 		unlink_or_warn(freq->tmpfile.buf);
 		return -1;
 	}
-	if (!hasheq(freq->sha1, freq->real_sha1)) {
+	if (!oideq(&freq->oid, &freq->real_oid)) {
 		unlink_or_warn(freq->tmpfile.buf);
 		return -1;
 	}
-	loose_object_path(the_repository, &filename, freq->sha1);
+	loose_object_path(the_repository, &filename, &freq->oid);
 	freq->rename = finalize_object_file(freq->tmpfile.buf, filename.buf);
 	strbuf_release(&filename);
 
diff --git a/http.h b/http.h
index d305ca1dc7..66c52b2e1e 100644
--- a/http.h
+++ b/http.h
@@ -224,8 +224,8 @@ struct http_object_request {
 	CURLcode curl_result;
 	char errorstr[CURL_ERROR_SIZE];
 	long http_code;
-	unsigned char sha1[20];
-	unsigned char real_sha1[20];
+	struct object_id oid;
+	struct object_id real_oid;
 	git_SHA_CTX c;
 	git_zstream stream;
 	int zret;
@@ -234,7 +234,7 @@ struct http_object_request {
 };
 
 extern struct http_object_request *new_http_object_request(
-	const char *base_url, unsigned char *sha1);
+	const char *base_url, const struct object_id *oid);
 extern void process_http_object_request(struct http_object_request *freq);
 extern int finish_http_object_request(struct http_object_request *freq);
 extern void abort_http_object_request(struct http_object_request *freq);
diff --git a/object-store.h b/object-store.h
index fecbb7e094..265d0d8e1f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -151,11 +151,13 @@ void raw_object_store_clear(struct raw_object_store *o);
 
 /*
  * Put in `buf` the name of the file in the local object database that
- * would be used to store a loose object with the specified sha1.
+ * would be used to store a loose object with the specified oid.
  */
-const char *loose_object_path(struct repository *r, struct strbuf *buf, const unsigned char *sha1);
+const char *loose_object_path(struct repository *r, struct strbuf *buf,
+			      const struct object_id *oid);
 
-void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size);
+void *map_loose_object(struct repository *r, const struct object_id *oid,
+		       unsigned long *size);
 
 extern void *read_object_file_extended(const struct object_id *oid,
 				       enum object_type *type,
diff --git a/sha1-file.c b/sha1-file.c
index 3ddf4c9426..0705709036 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -333,12 +333,12 @@ int raceproof_create_file(const char *path, create_file_fn fn, void *cb)
 	return ret;
 }
 
-static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
+static void fill_loose_path(struct strbuf *buf, const struct object_id *oid)
 {
 	int i;
 	for (i = 0; i < the_hash_algo->rawsz; i++) {
 		static char hex[] = "0123456789abcdef";
-		unsigned int val = sha1[i];
+		unsigned int val = oid->hash[i];
 		strbuf_addch(buf, hex[val >> 4]);
 		strbuf_addch(buf, hex[val & 0xf]);
 		if (!i)
@@ -348,19 +348,19 @@ static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 
 static const char *odb_loose_path(struct object_directory *odb,
 				  struct strbuf *buf,
-				  const unsigned char *sha1)
+				  const struct object_id *oid)
 {
 	strbuf_reset(buf);
 	strbuf_addstr(buf, odb->path);
 	strbuf_addch(buf, '/');
-	fill_sha1_path(buf, sha1);
+	fill_loose_path(buf, oid);
 	return buf->buf;
 }
 
 const char *loose_object_path(struct repository *r, struct strbuf *buf,
-			      const unsigned char *sha1)
+			      const struct object_id *oid)
 {
-	return odb_loose_path(r->objects->odb, buf, sha1);
+	return odb_loose_path(r->objects->odb, buf, oid);
 }
 
 /*
@@ -721,7 +721,7 @@ static int check_and_freshen_odb(struct object_directory *odb,
 				 int freshen)
 {
 	static struct strbuf path = STRBUF_INIT;
-	odb_loose_path(odb, &path, oid->hash);
+	odb_loose_path(odb, &path, oid);
 	return check_and_freshen_file(path.buf, freshen);
 }
 
@@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
  * Note that it may point to static storage and is only valid until another
  * call to stat_sha1_file().
  */
-static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
-			  struct stat *st, const char **path)
+static int stat_loose_object(struct repository *r, const struct object_id *oid,
+			     struct stat *st, const char **path)
 {
 	struct object_directory *odb;
 	static struct strbuf buf = STRBUF_INIT;
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		*path = odb_loose_path(odb, &buf, sha1);
+		*path = odb_loose_path(odb, &buf, oid);
 		if (!lstat(*path, st))
 			return 0;
 	}
@@ -900,7 +900,7 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
  * descriptor. See the caveats on the "path" parameter above.
  */
 static int open_sha1_file(struct repository *r,
-			  const unsigned char *sha1, const char **path)
+			  const struct object_id *oid, const char **path)
 {
 	int fd;
 	struct object_directory *odb;
@@ -909,7 +909,7 @@ static int open_sha1_file(struct repository *r,
 
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
-		*path = odb_loose_path(odb, &buf, sha1);
+		*path = odb_loose_path(odb, &buf, oid);
 		fd = git_open(*path);
 		if (fd >= 0)
 			return fd;
@@ -922,19 +922,16 @@ static int open_sha1_file(struct repository *r,
 }
 
 static int quick_has_loose(struct repository *r,
-			   const unsigned char *sha1)
+			   const struct object_id *oid)
 {
-	int subdir_nr = sha1[0];
-	struct object_id oid;
+	int subdir_nr = oid->hash[0];
 	struct object_directory *odb;
 
-	hashcpy(oid.hash, sha1);
-
 	prepare_alt_odb(r);
 	for (odb = r->objects->odb; odb; odb = odb->next) {
 		odb_load_loose_cache(odb, subdir_nr);
 		if (oid_array_lookup(&odb->loose_objects_cache[subdir_nr],
-				     &oid) >= 0)
+				     oid) >= 0)
 			return 1;
 	}
 	return 0;
@@ -944,8 +941,8 @@ static int quick_has_loose(struct repository *r,
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
  */
-static void *map_sha1_file_1(struct repository *r, const char *path,
-			     const unsigned char *sha1, unsigned long *size)
+static void *map_loose_object_1(struct repository *r, const char *path,
+			     const struct object_id *oid, unsigned long *size)
 {
 	void *map;
 	int fd;
@@ -953,7 +950,7 @@ static void *map_sha1_file_1(struct repository *r, const char *path,
 	if (path)
 		fd = git_open(path);
 	else
-		fd = open_sha1_file(r, sha1, &path);
+		fd = open_sha1_file(r, oid, &path);
 	map = NULL;
 	if (fd >= 0) {
 		struct stat st;
@@ -972,10 +969,11 @@ static void *map_sha1_file_1(struct repository *r, const char *path,
 	return map;
 }
 
-void *map_sha1_file(struct repository *r,
-		    const unsigned char *sha1, unsigned long *size)
+void *map_loose_object(struct repository *r,
+		       const struct object_id *oid,
+		       unsigned long *size)
 {
-	return map_sha1_file_1(r, NULL, sha1, size);
+	return map_loose_object_1(r, NULL, oid, size);
 }
 
 static int unpack_sha1_short_header(git_zstream *stream,
@@ -1045,7 +1043,9 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
 	return -1;
 }
 
-static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
+static void *unpack_loose_rest(git_zstream *stream,
+			       void *buffer, unsigned long size,
+			       const struct object_id *oid)
 {
 	int bytes = strlen(buffer) + 1;
 	unsigned char *buf = xmallocz(size);
@@ -1082,10 +1082,10 @@ static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long s
 	}
 
 	if (status < 0)
-		error(_("corrupt loose object '%s'"), sha1_to_hex(sha1));
+		error(_("corrupt loose object '%s'"), oid_to_hex(oid));
 	else if (stream->avail_in)
 		error(_("garbage at end of loose object '%s'"),
-		      sha1_to_hex(sha1));
+		      oid_to_hex(oid));
 	free(buf);
 	return NULL;
 }
@@ -1164,9 +1164,9 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep)
 	return parse_sha1_header_extended(hdr, &oi, 0);
 }
 
-static int sha1_loose_object_info(struct repository *r,
-				  const unsigned char *sha1,
-				  struct object_info *oi, int flags)
+static int loose_object_info(struct repository *r,
+			     const struct object_id *oid,
+			     struct object_info *oi, int flags)
 {
 	int status = 0;
 	unsigned long mapsize;
@@ -1191,15 +1191,15 @@ static int sha1_loose_object_info(struct repository *r,
 		const char *path;
 		struct stat st;
 		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
-			return quick_has_loose(r, sha1) ? 0 : -1;
-		if (stat_sha1_file(r, sha1, &st, &path) < 0)
+			return quick_has_loose(r, oid) ? 0 : -1;
+		if (stat_loose_object(r, oid, &st, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
 			*oi->disk_sizep = st.st_size;
 		return 0;
 	}
 
-	map = map_sha1_file(r, sha1, &mapsize);
+	map = map_loose_object(r, oid, &mapsize);
 	if (!map)
 		return -1;
 
@@ -1211,22 +1211,22 @@ static int sha1_loose_object_info(struct repository *r,
 	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
 		if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       sha1_to_hex(sha1));
+				       oid_to_hex(oid));
 	} else if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error(_("unable to unpack %s header"),
-			       sha1_to_hex(sha1));
+			       oid_to_hex(oid));
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
 		if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       sha1_to_hex(sha1));
+				       oid_to_hex(oid));
 	} else if ((status = parse_sha1_header_extended(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), sha1_to_hex(sha1));
+		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
-		*oi->contentp = unpack_sha1_rest(&stream, hdr,
-						 *oi->sizep, sha1);
+		*oi->contentp = unpack_loose_rest(&stream, hdr,
+						  *oi->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
@@ -1292,7 +1292,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
 			return -1;
 
 		/* Most likely it's a loose object. */
-		if (!sha1_loose_object_info(r, real->hash, oi, flags))
+		if (!loose_object_info(r, real, oi, flags))
 			return 0;
 
 		/* Not a loose object; someone else may have just packed it. */
@@ -1420,7 +1420,7 @@ void *read_object_file_extended(const struct object_id *oid,
 		die(_("replacement %s not found for %s"),
 		    oid_to_hex(repl), oid_to_hex(oid));
 
-	if (!stat_sha1_file(the_repository, repl->hash, &st, &path))
+	if (!stat_loose_object(the_repository, repl, &st, &path))
 		die(_("loose object %s (stored in %s) is corrupt"),
 		    oid_to_hex(repl), path);
 
@@ -1620,7 +1620,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 
-	loose_object_path(the_repository, &filename, oid->hash);
+	loose_object_path(the_repository, &filename, oid);
 
 	fd = create_tmpfile(&tmp_file, filename.buf);
 	if (fd < 0) {
@@ -2196,7 +2196,7 @@ static int check_stream_sha1(git_zstream *stream,
 
 	/*
 	 * This size comparison must be "<=" to read the final zlib packets;
-	 * see the comment in unpack_sha1_rest for details.
+	 * see the comment in unpack_loose_rest for details.
 	 */
 	while (total_read <= size &&
 	       (status == Z_OK ||
@@ -2245,7 +2245,7 @@ int read_loose_object(const char *path,
 
 	*contents = NULL;
 
-	map = map_sha1_file_1(the_repository, path, NULL, &mapsize);
+	map = map_loose_object_1(the_repository, path, NULL, &mapsize);
 	if (!map) {
 		error_errno(_("unable to mmap %s"), path);
 		goto out;
@@ -2267,7 +2267,7 @@ int read_loose_object(const char *path,
 		if (check_stream_sha1(&stream, hdr, *size, path, expected_oid->hash) < 0)
 			goto out;
 	} else {
-		*contents = unpack_sha1_rest(&stream, hdr, *size, expected_oid->hash);
+		*contents = unpack_loose_rest(&stream, hdr, *size, expected_oid);
 		if (!*contents) {
 			error(_("unable to unpack contents of %s"), path);
 			git_inflate_end(&stream);
diff --git a/streaming.c b/streaming.c
index ac7c7a22f9..9049146bc1 100644
--- a/streaming.c
+++ b/streaming.c
@@ -338,8 +338,8 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
-	st->u.loose.mapped = map_sha1_file(the_repository,
-					   oid->hash, &st->u.loose.mapsize);
+	st->u.loose.mapped = map_loose_object(the_repository,
+					      oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
 	if ((unpack_sha1_header(&st->z,
René Scharfe Dec. 5, 2018, 6:41 p.m. UTC | #18
On 05.12.2018 at 09:15, Jeff King wrote:
> On Wed, Dec 05, 2018 at 01:51:36AM -0500, Jeff King wrote:
> 
>>> This
>>> function is easily converted to struct object_id, though, as its single
>>> caller can pass one on -- this makes the copy unnecessary.
>>
>> If you mean modifying sha1_loose_object_info() to take an oid, then
>> sure, I agree that is a good step forward (and that is exactly the "punt
>> until" moment I meant).
> 
> So the simple thing is to do that, and then have it pass oid->hash to
> the other functions it uses.

Yes.

> If we start to convert those, there's a
> little bit of a rabbit hole, but it's actually not too bad.

You don't need to crawl in just for quick_has_loose(), but eventually
everything has to be converted.  It seems a bit much for one patch, but
perhaps that's just my ever-decreasing attention span speaking.

Converting one function prototype or struct member at a time seems
about the right amount of change per patch to me.  That's not always
possible due to entanglement, of course.

> Most of the spill-over is into the dumb-http code. Note that it actually
> uses sha1 itself! That probably needs to be the_hash_algo (though I'm
> not even sure how we'd negotiate the algorithm across a dumb fetch). At
> any rate, I don't think this patch makes anything _worse_ in that
> respect.

Right.

> diff --git a/sha1-file.c b/sha1-file.c
> index 3ddf4c9426..0705709036 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -333,12 +333,12 @@ int raceproof_create_file(const char *path, create_file_fn fn, void *cb)
>  	return ret;
>  }
>  
> -static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
> +static void fill_loose_path(struct strbuf *buf, const struct object_id *oid)

The new name fits.

> @@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
>   * Note that it may point to static storage and is only valid until another
>   * call to stat_sha1_file().
>   */
> -static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> -			  struct stat *st, const char **path)
> +static int stat_loose_object(struct repository *r, const struct object_id *oid,
> +			     struct stat *st, const char **path)

Hmm, read_sha1_file() was renamed to read_object_file() earlier this
year, and I'd have expected this to become stat_object_file().  Names..

Anyway, the comment above and one a few lines below should be updated
as well.

>  {
>  	struct object_directory *odb;
>  	static struct strbuf buf = STRBUF_INIT;
>  
>  	prepare_alt_odb(r);
>  	for (odb = r->objects->odb; odb; odb = odb->next) {
> -		*path = odb_loose_path(odb, &buf, sha1);
> +		*path = odb_loose_path(odb, &buf, oid);
>  		if (!lstat(*path, st))
>  			return 0;
>  	}
> @@ -900,7 +900,7 @@ static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
>   * descriptor. See the caveats on the "path" parameter above.
>   */
>  static int open_sha1_file(struct repository *r,
> -			  const unsigned char *sha1, const char **path)
> +			  const struct object_id *oid, const char **path)

That function should lose the "sha1" in its name as well.

> -static void *map_sha1_file_1(struct repository *r, const char *path,
> -			     const unsigned char *sha1, unsigned long *size)
> +static void *map_loose_object_1(struct repository *r, const char *path,
> +			     const struct object_id *oid, unsigned long *size)

Similarly, map_object_file_1()?

> -void *map_sha1_file(struct repository *r,
> -		    const unsigned char *sha1, unsigned long *size)
> +void *map_loose_object(struct repository *r,
> +		       const struct object_id *oid,
> +		       unsigned long *size)

Similar.

> @@ -1045,7 +1043,9 @@ static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map,
>  	return -1;
>  }
>  
> -static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
> +static void *unpack_loose_rest(git_zstream *stream,
> +			       void *buffer, unsigned long size,
> +			       const struct object_id *oid)

Hmm, both the old and the new names here look weird to me at this point.

> -static int sha1_loose_object_info(struct repository *r,
> -				  const unsigned char *sha1,
> -				  struct object_info *oi, int flags)
> +static int loose_object_info(struct repository *r,
> +			     const struct object_id *oid,
> +			     struct object_info *oi, int flags)

And nothing of value was lost. :)

René
Jeff King Dec. 5, 2018, 8:17 p.m. UTC | #19
On Wed, Dec 05, 2018 at 07:41:44PM +0100, René Scharfe wrote:

> > If we start to convert those, there's a
> > little bit of a rabbit hole, but it's actually not too bad.
> 
> You don't need to crawl in just for quick_has_loose(), but eventually
> everything has to be converted.  It seems a bit much for one patch, but
> perhaps that's just my ever-decreasing attention span speaking.

Yeah, my normal process here is to dig to the bottom of the rabbit hole,
and then break it into actual patches. I just shared the middle state
here. ;)

I suspect the http bits could be split off into their own thing. The
bits in sha1-file.c I'd plan to do mostly all together, as they form a
family of related functions.

Mostly I wasn't sure how to wrap this up with the other changes. It's
obviously pretty invasive, and I don't want it to get in the way of the
actual functional changes we've been discussing.

> > @@ -879,15 +879,15 @@ int git_open_cloexec(const char *name, int flags)
> >   * Note that it may point to static storage and is only valid until another
> >   * call to stat_sha1_file().
> >   */
> > -static int stat_sha1_file(struct repository *r, const unsigned char *sha1,
> > -			  struct stat *st, const char **path)
> > +static int stat_loose_object(struct repository *r, const struct object_id *oid,
> > +			     struct stat *st, const char **path)
> 
> Hmm, read_sha1_file() was renamed to read_object_file() earlier this
> year, and I'd have expected this to become stat_object_file().  Names..

read_object_file() is about reading an object from _any_ source. These
are specifically about loose objects, and I think that distinction is
important (both here and for map_loose_object, etc).

I'd actually argue that read_object_file() should just be read_object(),
but that already exists. Sigh. (I think it's fixable, but obviously
orthogonal to this topic).
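
To illustrate the split (a sketch only -- stat_loose_object() is static
to sha1-file.c, so this isn't a real external call site):

	void *buf;
	enum object_type type;
	unsigned long size;
	struct stat st;
	const char *path;

	/* any source: packs, loose objects, alternates */
	buf = read_object_file(&oid, &type, &size);

	/* loose objects only, hence the "loose" in the name */
	if (stat_loose_object(r, &oid, &st, &path) < 0)
		return -1;	/* no loose file in any object directory */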

> Anyway, the comment above and one a few lines below should be updated
> as well.

Thanks, fixed.

> >  static int open_sha1_file(struct repository *r,
> > -			  const unsigned char *sha1, const char **path)
> > +			  const struct object_id *oid, const char **path)
> 
> That function should lose the "sha1" in its name as well.

Yep, fixed.

> > -static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1)
> > +static void *unpack_loose_rest(git_zstream *stream,
> > +			       void *buffer, unsigned long size,
> > +			       const struct object_id *oid)
> 
> Hmm, both old and new name here look weird to me at this point.

It makes more sense in the pairing of unpack_sha1_header() and
unpack_sha1_rest(). Maybe "body" would be better than "rest".
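
The pairing covers the whole loose-object read (sketch with made-up
locals; "map"/"mapsize" would come from map_loose_object(), and "size"
is parsed out of the header before the second call):

	git_zstream stream;
	char hdr[MAX_HEADER_LEN];
	void *content;

	/* inflate just the "<type> <size>\0" header... */
	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
		return -1;
	/* ...then inflate everything after it */
	content = unpack_loose_rest(&stream, hdr, size, oid);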

At any rate, it probably makes sense to rename them together (but I
didn't touch the "header" one here). Maybe the name changes should come
as a separate patch. I was mostly changing them here because I was
changing the signatures anyway, and had to touch all of the callers.

-Peff

Patch

diff --git a/object-store.h b/object-store.h
index bf1e0cb761..60758efad8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -13,6 +13,7 @@ struct object_directory {
 	/*
 	 * Used to store the results of readdir(3) calls when we are OK
 	 * sacrificing accuracy due to races for speed. That includes
+	 * object existence with OBJECT_INFO_QUICK, as well as
 	 * our search for unique abbreviated hashes. Don't use it for tasks
 	 * requiring greater accuracy!
 	 *
diff --git a/sha1-file.c b/sha1-file.c
index 4aae716a37..e53da0b701 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -921,6 +921,24 @@ static int open_sha1_file(struct repository *r,
 	return -1;
 }
 
+static int quick_has_loose(struct repository *r,
+			   const unsigned char *sha1)
+{
+	int subdir_nr = sha1[0];
+	struct object_id oid;
+	struct object_directory *odb;
+
+	hashcpy(oid.hash, sha1);
+
+	prepare_alt_odb(r);
+	for (odb = r->objects->odb; odb; odb = odb->next) {
+		odb_load_loose_cache(odb, subdir_nr);
+		if (oid_array_lookup(&odb->loose_objects_cache, &oid) >= 0)
+			return 1;
+	}
+	return 0;
+}
+
 /*
  * Map the loose object at "path" if it is not NULL, or the path found by
  * searching for a loose object named "sha1".
@@ -1171,6 +1189,8 @@ static int sha1_loose_object_info(struct repository *r,
 	if (!oi->typep && !oi->type_name && !oi->sizep && !oi->contentp) {
 		const char *path;
 		struct stat st;
+		if (!oi->disk_sizep && (flags & OBJECT_INFO_QUICK))
+			return quick_has_loose(r, sha1) ? 0 : -1;
 		if (stat_sha1_file(r, sha1, &st, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
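
The caller-visible effect: an existence probe under OBJECT_INFO_QUICK
that used to stat() each object directory can now be answered by a
binary search of the cached readdir() results.  A sketch of the kind of
caller that benefits (illustrative, not part of this patch; fetch
negotiation does the equivalent):

	/*
	 * May be answered from the in-memory loose cache; a slightly
	 * stale answer is acceptable under OBJECT_INFO_QUICK.
	 */
	if (!has_object_file_with_flags(&oid, OBJECT_INFO_QUICK))
		; /* probably missing; OK to ask the other side for it */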