diff mbox series

[v4,4/8] t1051: introduce a smudge filter test for extremely large files

Message ID f099b4eaead1799908eee000c99af3a381394e6d.1635867971.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit b79541af7a1a7b7f2438f43196b1774d7b71e852
Headers show
Series Allow clean/smudge filters to handle huge files in the LLP64 data model | expand

Commit Message

Matt Cooper Nov. 2, 2021, 3:46 p.m. UTC
From: Matt Cooper <vtbassmatt@gmail.com>

The filter system allows for alterations to file contents when they're
added to the database or working tree. ("Smudge" when moving to the
working tree; "clean" when moving to the database.) This is used
natively to handle CRLF to LF conversions. It's also employed by Git-LFS
to replace large files from the working tree with small tracking files
in the repo and vice versa.

Git reads the entire smudged file into memory to convert it into a
"clean" form to be used in-core. While this is inefficient, there's a
more insidious problem on some platforms due to inconsistency between
using unsigned long and size_t for the same type of data (size of a file
in bytes). On most 64-bit platforms, unsigned long is 64 bits, and
size_t is typedef'd to unsigned long. On Windows, however, unsigned long
is only 32 bits (and therefore on 64-bit Windows, size_t is typedef'd to
unsigned long long in order to be 64 bits).

Practically speaking, this means 64-bit Windows users of Git-LFS can't
handle files larger than 2^32 bytes. Other 64-bit platforms don't suffer
this limitation.

This commit introduces a test exposing the issue; future commits make it
pass. The test simulates the way Git-LFS works by having a tiny file
checked into the repository and expanding it to a huge file on checkout.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1051-large-conversion.sh | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff mbox series

Patch

diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 8b7640b3ba8..e7f9f0bdc56 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -83,4 +83,19 @@  test_expect_success 'ident converts on output' '
 	test_cmp small.clean large.clean
 '
 
+# This smudge filter prepends 5GB of zeros to the file it checks out. This
+# ensures that smudging doesn't mangle large files on 64-bit Windows.
+test_expect_failure EXPENSIVE,SIZE_T_IS_64BIT,!LONG_IS_64BIT \
+		'files over 4GB convert on output' '
+	test_commit test small "a small file" &&
+	small_size=$(test_file_size small) &&
+	test_config filter.makelarge.smudge \
+		"test-tool genzeros $((5*1024*1024*1024)) && cat" &&
+	echo "small filter=makelarge" >.gitattributes &&
+	rm small &&
+	git checkout -- small &&
+	size=$(test_file_size small) &&
+	test "$size" -eq $((5 * 1024 * 1024 * 1024 + $small_size))
+'
+
 test_done