[1/5] t1051: introduce a smudge filter test for extremely large files

Message ID 449eb5c205e139e21b619c4eb975afc3d47427f3.1635320952.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Series Allow clean/smudge filters to handle huge files in the LLP64 data model

Commit Message

Matt Cooper Oct. 27, 2021, 7:49 a.m. UTC
From: Matt Cooper <vtbassmatt@gmail.com>

The filter system allows for alterations to file contents when they're
added to the database or workdir. ("Smudge" when moving to the workdir;
"clean" when moving to the database.) This is used natively to handle CRLF
to LF conversions. It's also employed by Git-LFS to replace large files
from the workdir with small tracking files in the repo and vice versa.

Git pulls the entire smudged file into memory. While this is inefficient,
there's a more insidious problem on some platforms due to inconsistency
between using unsigned long and size_t for the same type of data (size of
a file in bytes). On most 64-bit platforms, unsigned long is 64 bits, and
size_t is typedef'd to unsigned long. On Windows, however, unsigned long is
only 32 bits (and therefore on 64-bit Windows, size_t is typedef'd to
unsigned long long in order to be 64 bits).

Practically speaking, this means 64-bit Windows users of Git-LFS can't
handle files of 2^32 bytes (4 GiB) or larger, because such a size no longer
fits in a 32-bit unsigned long. Other 64-bit platforms don't suffer this
limitation.
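
To make the arithmetic concrete, here is a minimal sketch of what happens
to a 5 GiB byte count when it is squeezed into 32 bits. The helper name
truncate_to_32bit is hypothetical and not part of Git; the sketch also
assumes the shell evaluates $(( )) with at least 64-bit integers.

```shell
#!/bin/sh
# Illustration only: mimic storing a byte count in a 32-bit unsigned long.
# Assumes the shell's arithmetic is at least 64 bits wide.
truncate_to_32bit () {
	# Keep only the low 32 bits, as an implicit narrowing conversion would.
	echo $(( $1 & 0xFFFFFFFF ))
}

size=$((5 * 1024 * 1024 * 1024))	# 5 GiB = 5368709120 bytes
echo "actual size:     $size"
echo "as 32-bit value: $(truncate_to_32bit "$size")"
```

The second line reports 1073741824 (1 GiB): the upper bits are silently
dropped, so code that trusts the narrowed value sees only a fraction of the
file.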

This commit introduces a test exposing the issue; future commits make it
pass. The test simulates the way Git-LFS works by having a tiny file
checked into the repository and expanding it to a huge file on checkout.
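
As a rough illustration of the filter's shape, a smudge filter is just a
program that reads the stored content on stdin and writes the workdir
content to stdout. The sketch below is a scaled-down stand-in (KiB instead
of GiB); the function name makelarge_smudge is hypothetical, and like the
test itself it assumes /dev/zero is available.

```shell
#!/bin/sh
# Hypothetical, scaled-down stand-in for the test's smudge filter.
# $1 = number of KiB of NUL bytes to prepend (the real test prepends 5 GiB).
makelarge_smudge () {
	# dd does not read stdin here, so cat still sees the full input.
	dd if=/dev/zero bs=1024 count="$1" 2>/dev/null &&
	cat
}

# Git would pipe the blob content in and write the output to the workdir.
printf 'a small file\n' | makelarge_smudge 1 | wc -c
```

With a 13-byte input and one KiB of padding, the output is 1037 bytes.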

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1051-large-conversion.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Carlo Marcelo Arenas Belón Oct. 28, 2021, 7:15 a.m. UTC | #1
On Wed, Oct 27, 2021 at 12:03 PM Matt Cooper via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
> index 8b7640b3ba8..684ba5bd0a5 100755
> --- a/t/t1051-large-conversion.sh
> +++ b/t/t1051-large-conversion.sh
> @@ -83,4 +83,16 @@ test_expect_success 'ident converts on output' '
>         test_cmp small.clean large.clean
>  '
>
> +# This smudge filter prepends 5GB of zeros to the file it checks out. This
> +# ensures that smudging doesn't mangle large files on 64-bit Windows.
> +test_expect_failure EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
> +       test_commit test small "a small file" &&
> +       test_config filter.makelarge.smudge "dd if=/dev/zero bs=$((1024*1024)) count=$((5*1024)) && cat" &&

/dev/zero doesn't exist on HP NonStop. A portable solution would be to
use `test-tool genzeros`, which has been available since d5cfd142ec (tests:
teach the test-tool to generate NUL bytes and use it, 2019-02-14).

Carlo

Patch

diff --git a/t/t1051-large-conversion.sh b/t/t1051-large-conversion.sh
index 8b7640b3ba8..684ba5bd0a5 100755
--- a/t/t1051-large-conversion.sh
+++ b/t/t1051-large-conversion.sh
@@ -83,4 +83,16 @@  test_expect_success 'ident converts on output' '
 	test_cmp small.clean large.clean
 '
 
+# This smudge filter prepends 5GB of zeros to the file it checks out. This
+# ensures that smudging doesn't mangle large files on 64-bit Windows.
+test_expect_failure EXPENSIVE,!LONG_IS_64BIT 'files over 4GB convert on output' '
+	test_commit test small "a small file" &&
+	test_config filter.makelarge.smudge "dd if=/dev/zero bs=$((1024*1024)) count=$((5*1024)) && cat" &&
+	echo "small filter=makelarge" >.gitattributes &&
+	rm small &&
+	git checkout -- small &&
+	size=$(test_file_size small) &&
+	test "$size" -ge $((5 * 1024 * 1024 * 1024))
+'
+
 test_done