diff mbox series

[2/1] test-lib-functions: add object size functions

Message ID 65557f2d-9de0-49ae-a858-80476aa52b68@web.de (mailing list archive)
State New, archived
Headers show
Series Test breakage with zlib-ng | expand

Commit Message

René Scharfe Dec. 13, 2023, 12:28 p.m. UTC
Add test_object_size and its helpers test_loose_object_size and
test_packed_object_size, which allow determining the size of a Git
object using only the low-level Git commands rev-parse and show-index.

Use it in t6300 to replace the bare-bones function test_object_file_size
as a motivating example.  There it provides the expected output of the
high-level Git command for-each-ref.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
So how about this?  I'm a bit nervous about all the rules about output
descriptors and error propagation and whatnot in the test library, but
this implementation seems simple enough and might be useful in more than
one test.  No idea how to add support for alternate object directories,
but I doubt we'll ever need it.
---
 t/t6300-for-each-ref.sh | 16 ++++++--------
 t/test-lib-functions.sh | 47 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 9 deletions(-)

--
2.43.0

Comments

Jeff King Dec. 14, 2023, 8:59 p.m. UTC | #1
On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote:

> Add test_object_size and its helpers test_loose_object_size and
> test_packed_object_size, which allow determining the size of a Git
> object using only the low-level Git commands rev-parse and show-index.
> 
> Use it in t6300 to replace the bare-bones function test_object_file_size
> as a motivating example.  There it provides the expected output of the
> high-level Git command for-each-ref.

This adds a packed-object function, but I doubt anybody actually calls
it. If we're going to do that, it's probably worth adding some tests for
"cat-file --batch-check" or similar.

At which point I wonder if rather than having a function for a single
object, we are better off just testing the result of:

  git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)'

against a single post-processed "show-index" invocation.

> So how about this?  I'm a bit nervous about all the rules about output
> descriptors and error propagation and whatnot in the test library, but
> this implementation seems simple enough and might be useful in more than
> one test.  No idea how to add support for alternate object directories,
> but I doubt we'll ever need it.

I'm not sure that we need to do anything special with output
redirection. Shouldn't these functions just send errors to stderr as
usual? If they are run inside a test_expect block, that goes to
descriptor 4 (which is either /dev/null or the original stderr,
depending on whether "-v" was used).

> +test_loose_object_size () {
> +	test "$#" -ne 1 && BUG "1 param"
> +	local path=$(test_oid_to_path "$1")
> +	test_file_size "$(git rev-parse --git-path "objects/$path")" 2>&4
> +}

OK. We lose the exit code from "rev-parse" but that is probably OK for
our purposes.

> +test_packed_object_size () {
> +	test "$#" -ne 2 && BUG "2 params"
> +	local oid=$1 idx=$2 packsize rawsz end
> +
> +	packsize=$(test_file_size "${idx%.idx}.pack")
> +	rawsz=$(test_oid rawsz)
> +	end=$(($packsize - $rawsz))

OK, this $end is the magic required for the final entry. Makes sense.

> +	git show-index <"$idx" |
> +	awk -v oid="$oid" -v end="$end" '
> +		$2 == oid {start = $1}
> +		{offsets[$1] = 1}
> +		END {
> +			if (!start || start >= end)
> +				exit 1
> +			for (o in offsets)
> +				if (start < o && o < end)
> +					end = o
> +			print end - start
> +		}
> +	' && return 0

I was confused at first, because I didn't see any sorting happening. But
if I understand correctly, you're just looking for the smallest "end"
that comes after the start of the object we're looking for. Which I
think works.

-Peff
René Scharfe Dec. 19, 2023, 4:42 p.m. UTC | #2
Am 14.12.23 um 21:59 schrieb Jeff King:
> On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote:
>
>> Add test_object_size and its helpers test_loose_object_size and
>> test_packed_object_size, which allow determining the size of a Git
>> object using only the low-level Git commands rev-parse and show-index.
>>
>> Use it in t6300 to replace the bare-bones function test_object_file_size
>> as a motivating example.  There it provides the expected output of the
>> high-level Git command for-each-ref.
>
> This adds a packed-object function, but I doubt anybody actually calls
> it. If we're going to do that, it's probably worth adding some tests for
> "cat-file --batch-check" or similar.

Yes, and I was assuming that someone else would be eager to add such
tests. *ahem*

> At which point I wonder if rather than having a function for a single
> object, we are better off just testing the result of:
>
>   git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)'
>
> against a single post-processed "show-index" invocation.

Sure, we might want to optimize for bulk-processing and possibly end up
only checking the size of single objects in t6300, making new library
functions unnecessary.

When dumping size information of multiple objects, it's probably a good
idea to include "%(objectname)" as well in the format.

You'd need one show-index call for each .idx file.  A simple test would
only have a single one; a library function might need to handle multiple
packs.

>> So how about this?  I'm a bit nervous about all the rules about output
>> descriptors and error propagation and whatnot in the test library, but
>> this implementation seems simple enough and might be useful in more than
>> one test.  No idea how to add support for alternate object directories,
>> but I doubt we'll ever need it.
>
> I'm not sure that we need to do anything special with output
> redirection. Shouldn't these functions just send errors to stderr as
> usual? If they are run inside a test_expect block, that goes to
> descriptor 4 (which is either /dev/null or the original stderr,
> depending on whether "-v" was used).

Good point.  My bad excuse is that I copied the redirection to fd 4 from
test_grep.

>> +	git show-index <"$idx" |
>> +	awk -v oid="$oid" -v end="$end" '
>> +		$2 == oid {start = $1}
>> +		{offsets[$1] = 1}
>> +		END {
>> +			if (!start || start >= end)
>> +				exit 1
>> +			for (o in offsets)
>> +				if (start < o && o < end)
>> +					end = o
>> +			print end - start
>> +		}
>> +	' && return 0
>
> I was confused at first, because I didn't see any sorting happening. But
> if I understand correctly, you're just looking for the smallest "end"
> that comes after the start of the object we're looking for. Which I
> think works.

Yes, calculating the minimum offset suffices when handling a single
object -- no sorting required.  For bulk mode we'd better sort, of
course:

	git show-index <"$idx" |
	sort -n |
	awk -v end="$end" '
		NR > 1 {print oid, $1 - start}
		{start = $1; oid = $2}
		END {print oid, end - start}
	'

No idea how to make such a thing robust against malformed or truncated
output from show-index, but perhaps that's not necessary, depending on
how the result is used.

René
diff mbox series

Patch

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 843a7fe143..4687660f38 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -20,12 +20,6 @@  setdate_and_increment () {
     export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 }

-test_object_file_size () {
-	oid=$(git rev-parse "$1")
-	path=".git/objects/$(test_oid_to_path $oid)"
-	test_file_size "$path"
-}
-
 test_expect_success setup '
 	# setup .mailmap
 	cat >.mailmap <<-EOF &&
@@ -40,7 +34,11 @@  test_expect_success setup '
 	git branch -M main &&
 	setdate_and_increment &&
 	git tag -a -m "Tagging at $datestamp" testtag &&
+	testtag_oid=$(git rev-parse refs/tags/testtag) &&
+	testtag_disksize=$(test_object_size $testtag_oid) &&
 	git update-ref refs/remotes/origin/main main &&
+	commit_oid=$(git rev-parse refs/heads/main) &&
+	commit_disksize=$(test_object_size $commit_oid) &&
 	git remote add origin nowhere &&
 	git config branch.main.remote origin &&
 	git config branch.main.merge refs/heads/main &&
@@ -129,7 +127,7 @@  test_atom head push:strip=1 remotes/myfork/main
 test_atom head push:strip=-1 main
 test_atom head objecttype commit
 test_atom head objectsize $((131 + hexlen))
-test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
+test_atom head objectsize:disk $commit_disksize
 test_atom head deltabase $ZERO_OID
 test_atom head objectname $(git rev-parse refs/heads/main)
 test_atom head objectname:short $(git rev-parse --short refs/heads/main)
@@ -203,8 +201,8 @@  test_atom tag upstream ''
 test_atom tag push ''
 test_atom tag objecttype tag
 test_atom tag objectsize $((114 + hexlen))
-test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
-test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
+test_atom tag objectsize:disk $testtag_disksize
+test_atom tag '*objectsize:disk' $commit_disksize
 test_atom tag deltabase $ZERO_OID
 test_atom tag '*deltabase' $ZERO_OID
 test_atom tag objectname $(git rev-parse refs/tags/testtag)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 9c3cf12b26..9b49645f77 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1733,6 +1733,53 @@  test_oid_to_path () {
 	echo "${1%$basename}/$basename"
 }

+test_loose_object_size () {
+	test "$#" -ne 1 && BUG "1 param"
+	local path=$(test_oid_to_path "$1")
+	test_file_size "$(git rev-parse --git-path "objects/$path")" 2>&4
+}
+
+test_packed_object_size () {
+	test "$#" -ne 2 && BUG "2 params"
+	local oid=$1 idx=$2 packsize rawsz end
+
+	packsize=$(test_file_size "${idx%.idx}.pack")
+	rawsz=$(test_oid rawsz)
+	end=$(($packsize - $rawsz))
+
+	git show-index <"$idx" |
+	awk -v oid="$oid" -v end="$end" '
+		$2 == oid {start = $1}
+		{offsets[$1] = 1}
+		END {
+			if (!start || start >= end)
+				exit 1
+			for (o in offsets)
+				if (start < o && o < end)
+					end = o
+			print end - start
+		}
+	' && return 0
+
+	echo >&4 "error: '$oid' not found in '$idx'"
+	return 1
+}
+
+test_object_size () {
+	test "$#" -ne 1 && BUG "1 param"
+	local oid=$1
+
+	test_loose_object_size "$oid" 4>/dev/null && return 0
+
+	for idx in "$(git rev-parse --git-path objects/pack)"/pack-*.idx
+	do
+		test_packed_object_size "$oid" "$idx" 4>/dev/null && return 0
+	done
+
+	echo >&4 "error: '$oid' not found"
+	return 1
+}
+
 # Parse oids from git ls-files --staged output
 test_parse_ls_files_stage_oids () {
 	awk '{print $2}' -