diff mbox series

tests: disable fsync everywhere

Message ID 20211029001552.GA29647@dcvr (mailing list archive)
State New, archived
Headers show
Series tests: disable fsync everywhere | expand

Commit Message

Eric Wong Oct. 29, 2021, 12:15 a.m. UTC
Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> >> > This will also be useful for 3rd-party tools which create
> >> > throwaway git repositories of temporary data.
> >> 
> >> Do you mostly just care about the tests, or is the third-party tool
> >> support important to you? I ask because most of us switched to running
> >> the tests with --root=/some/tmpfs long ago to achieve the same speedup.
> >
> > Third-party tools and OSes which don't have a tmpfs mounted by
> > default (I don't think most *BSDs have tmpfs enabled by
> > default).

> > I'm also strongly considering making GIT_FSYNC=0 the default for
> > our own test suite since it's less setup for newbies.
> 
> If the primary focus is for testing, perhaps GIT_TEST_FSYNC would be
> better?  I do not think we want to even advertise an option for not
> syncing at all for end users working with any real repositories.
> Not even when they promise that the end user accepts all the
> responsibility and will keep both halves when things got broken.

I think having an undocumented GIT_TEST_FSYNC is a good
compromise and saves us the support burden.

Fwiw, I do currently have a 3rd-party tool which creates
throwaway repos, but the throwaways are currently small enough
that objects stay loose.  They might get bigger and pack in
in the future.

Relying on an LD_PRELOAD-based solution such as eatmydata
doesn't work for staticly-linked systems, and can conflict
with other LD_PRELOAD-based tools one might use(*)

v2 changes:
* s/GIT_FSYNC/GIT_TEST_FSYNC/
* disable fsync by default for tests, reduces setup for newcomers
* fix style nit noted by Eric Sunshine

--------------8<------------
Subject: [PATCH] tests: disable fsync everywhere

The "GIT_TEST_FSYNC" environment variable now exists for
disabling fsync() even on packfiles and other "critical" data.

Running "make test -j8 NO_SVN_TESTS=1" on a noisy 8-core system
on an HDD, test runtime drops from ~4 minutes down to ~3 minutes.
Using "GIT_TEST_FSYNC=1" re-enables fsync() for comparison
purposes.

SVN interopability tests are minimally affected since SVN will
still use fsync in various places.

This will also be useful for 3rd-party tools which create
throwaway git repositories of temporary data, but remains
undocumented for end users.

Signed-off-by: Eric Wong <e@80x24.org>
---
 cache.h            |  1 +
 environment.c      |  1 +
 git-cvsserver.perl | 19 +++++++++++++++++++
 perl/Git/SVN.pm    | 17 +++++++++++++++--
 t/test-lib.sh      |  7 +++++++
 write-or-die.c     |  5 +++++
 6 files changed, 48 insertions(+), 2 deletions(-)

Comments

Junio C Hamano Oct. 29, 2021, 5:18 a.m. UTC | #1
Eric Wong <e@80x24.org> writes:

> @@ -42,6 +42,7 @@ const char *git_hooks_path;
>  int zlib_compression_level = Z_BEST_SPEED;
>  int pack_compression_level = Z_DEFAULT_COMPRESSION;
>  int fsync_object_files;
> +int use_fsync = -1;

OK, (-1) is "undetermined yet", as usual.

> diff --git a/git-cvsserver.perl b/git-cvsserver.perl
> index 64319bed43..4c8118010a 100755
> --- a/git-cvsserver.perl
> +++ b/git-cvsserver.perl
> @@ -3607,6 +3607,22 @@ package GITCVS::updater;
>  use strict;
>  use warnings;
>  use DBI;
> +our $_use_fsync;
> +
> +# n.b. consider using Git.pm
> +sub use_fsync {
> +    if (!defined($_use_fsync)) {
> +        my $x = $ENV{GIT_TEST_FSYNC};
> +        if (defined $x) {

I would have expected to see "exists $ENV{GIT_TEST_FSYNC}", but I
guess there is no way to place in %ENV anyway, so it would be OK.

> +            local $ENV{GIT_CONFIG};
> +            delete $ENV{GIT_CONFIG};

OK, "git -c test.fsync=no cvsserver" would added something to
GIT_CONFIG that would affect test.fsync, but wouldn't the usual
last-one-wins rule be sufficient to check the value of $x using the
next construction, no matter what is in GIT_CONFIG?  I do not think
it would hurt to delete $ENV{GIT_CONFIG}, but I am not sure how it
is necessary.

> +            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
> +                                        qw(config --type=bool test.fsync));

THis is an interesting idiom.

> +            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
> +        }
> +    }
> +    $_use_fsync;
> +}


> +# TODO: move this to Git.pm?
> +sub use_fsync {

Possibly, but in a slightly more general form, taking the name of
the environment variable that holds a boolean value as an argument,
or something?

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index a291a5d4a2..21f5fab999 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -489,6 +489,13 @@ then
>  	export GIT_PERL_FATAL_WARNINGS
>  fi
>  
> +case $GIT_TEST_FSYNC in
> +'')
> +	GIT_TEST_FSYNC=0
> +	export GIT_TEST_FSYNC
> +	;;
> +esac

> diff --git a/write-or-die.c b/write-or-die.c
> index 0b1ec8190b..a3d5784cec 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -1,4 +1,5 @@
>  #include "cache.h"
> +#include "config.h"
>  #include "run-command.h"
>  
>  /*
> @@ -57,6 +58,10 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>  
>  void fsync_or_die(int fd, const char *msg)
>  {
> +	if (use_fsync < 0)
> +		use_fsync = git_env_bool("GIT_TEST_FSYNC", 1);
> +	if (!use_fsync)
> +		return;

OK.  That's quite straight-forward.

>  	while (fsync(fd) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);

Will queue.
Junio C Hamano Oct. 29, 2021, 7:33 a.m. UTC | #2
Eric Wong <e@80x24.org> writes:

> v2 changes:
> * s/GIT_FSYNC/GIT_TEST_FSYNC/
> * disable fsync by default for tests, reduces setup for newcomers
> * fix style nit noted by Eric Sunshine

https://github.com/git/git/runs/4043532265?check_suite_focus=true#step:5:70

Seems to be dying in "git svn" tests somehow.
Eric Wong Oct. 29, 2021, 7:48 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > v2 changes:
> > * s/GIT_FSYNC/GIT_TEST_FSYNC/
> > * disable fsync by default for tests, reduces setup for newcomers
> > * fix style nit noted by Eric Sunshine
> 
> https://github.com/git/git/runs/4043532265?check_suite_focus=true#step:5:70

Fwiw, I couldn't view that (not sure if it's from lack of JS
or lack of GH account).  Either way it's accessibility problem.

> Seems to be dying in "git svn" tests somehow.

Easy repro+fix, though.  I only tested my final patch with NO_SVN_TESTS :x
Can you squash this in or do you want a reroll?

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index df5a87a151..6ce2e283c8 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -2274,7 +2274,7 @@ sub use_fsync {
 	if (!defined($_use_fsync)) {
 		my $x = $ENV{GIT_TEST_FSYNC};
 		if (defined $x) {
-			my $v = command_oneline('git', '-c', "test.fsync=$x",
+			my $v = command_oneline('-c', "test.fsync=$x",
 					qw(config --type=bool test.fsync));
 			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
 		}
Eric Wong Oct. 29, 2021, 7:56 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> > +# n.b. consider using Git.pm
> > +sub use_fsync {
> > +    if (!defined($_use_fsync)) {
> > +        my $x = $ENV{GIT_TEST_FSYNC};
> > +        if (defined $x) {
> 
> I would have expected to see "exists $ENV{GIT_TEST_FSYNC}", but I
> guess there is no way to place in %ENV anyway, so it would be OK.

Was that meant to say: "no way to place `undef' in %ENV anyway"?

If so, `undef' can actually be in Perl's %ENV, though it appears
to get coerced into "" (empty string) when spawning processes.

> > +            local $ENV{GIT_CONFIG};
> > +            delete $ENV{GIT_CONFIG};
> 
> OK, "git -c test.fsync=no cvsserver" would added something to
> GIT_CONFIG that would affect test.fsync, but wouldn't the usual
> last-one-wins rule be sufficient to check the value of $x using the
> next construction, no matter what is in GIT_CONFIG?  I do not think
> it would hurt to delete $ENV{GIT_CONFIG}, but I am not sure how it
> is necessary.

Leaving GIT_CONFIG set was actually causing "git config" to
exit(1) since git-cvsserver sets GIT_CONFIG and the GIT_CONFIG
file doesn't have a test.fsync setting.  This is the current
behavior, I think it's a weird quirk, but intended behavior of
git-config.

# this assumes you don't have foo.bar set in your ~/.gitconfig :>
$ GIT_CONFIG=$HOME/.gitconfig git -c foo.bar=0 config --type=bool foo.bar
$ echo $?
1

> > +            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
> > +                                        qw(config --type=bool test.fsync));
> 
> THis is an interesting idiom.

Heh, I just thought of it before sending my original.  I was
going to use a regexp originally (in git-svn, too), but didn't
want to get into corner cases such as hex and +/- prefixes).

> > +            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
> > +        }
> > +    }
> > +    $_use_fsync;
> > +}
> 
> 
> > +# TODO: move this to Git.pm?
> > +sub use_fsync {
> 
> Possibly, but in a slightly more general form, taking the name of
> the environment variable that holds a boolean value as an argument,
> or something?

Yeah.  It would've been more useful if git-cvsserver used Git.pm;
but I didn't want to introduce Git.pm into git-cvsserver in case
somebody relies on git-cvsserver being standalone.
Junio C Hamano Oct. 29, 2021, 5:22 p.m. UTC | #5
Eric Wong <e@80x24.org> writes:

> Easy repro+fix, though.  I only tested my final patch with NO_SVN_TESTS :x
> Can you squash this in or do you want a reroll?
>
> diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
> index df5a87a151..6ce2e283c8 100644
> --- a/perl/Git/SVN.pm
> +++ b/perl/Git/SVN.pm
> @@ -2274,7 +2274,7 @@ sub use_fsync {
>  	if (!defined($_use_fsync)) {
>  		my $x = $ENV{GIT_TEST_FSYNC};
>  		if (defined $x) {
> -			my $v = command_oneline('git', '-c', "test.fsync=$x",
> +			my $v = command_oneline('-c', "test.fsync=$x",
>  					qw(config --type=bool test.fsync));
>  			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
>  		}

Yuck, I should have known that command_oneline() does not want the
leading 'git'.

A sad part of the story is that this is not exactly the fault of the
interface, as the full name of the thing is Git::command_oneline(),
which makes it sufficiently clear that it is a "git" specific thing,
and Git::oneline_result_from_git_command_do_not_give_git_at_front()
is not a good sub name X-<.

Thanks for quickly diagnosing.
Junio C Hamano Oct. 29, 2021, 6:12 p.m. UTC | #6
Eric Wong <e@80x24.org> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> > +# n.b. consider using Git.pm
>> > +sub use_fsync {
>> > +    if (!defined($_use_fsync)) {
>> > +        my $x = $ENV{GIT_TEST_FSYNC};
>> > +        if (defined $x) {
>> 
>> I would have expected to see "exists $ENV{GIT_TEST_FSYNC}", but I
>> guess there is no way to place in %ENV anyway, so it would be OK.
>
> Was that meant to say: "no way to place `undef' in %ENV anyway"?

Yes.  Nothing the external callers of a Perl script does by futzing
the environment with setenv(3) and unsetenv(3) can make undef appear
as a value in $ENV{SomeKey}, so defined $ENV{V} and exists $ENV{V}
are equivalent.

I still prefer "exists $ENV{V}", which I think conveys the intent of
the check better, i.e. "we do this iff the environment variable X is
there".

> If so, `undef' can actually be in Perl's %ENV, though it appears
> to get coerced into "" (empty string) when spawning processes.

Yes, but you are talking about the opposite direction, what Perl can
do to %ENV to affect processes it spawns, which is not what I meant.

> Leaving GIT_CONFIG set was actually causing "git config" to
> exit(1) since git-cvsserver sets GIT_CONFIG and the GIT_CONFIG
> file doesn't have a test.fsync setting.  This is the current
> behavior, I think it's a weird quirk, but intended behavior of
> git-config.

Ah, sorry, I misread the variable.  GIT_CONFIG was the thing that
says "read from this file and nowhere else"; we do want to disable
it locally for "-c var=val" to take effect.

> # this assumes you don't have foo.bar set in your ~/.gitconfig :>
> $ GIT_CONFIG=$HOME/.gitconfig git -c foo.bar=0 config --type=bool foo.bar
> $ echo $?
> 1
>
>> > +            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
>> > +                                        qw(config --type=bool test.fsync));
>> 
>> THis is an interesting idiom.
>
> Heh, I just thought of it before sending my original.  I was
> going to use a regexp originally (in git-svn, too), but didn't
> want to get into corner cases such as hex and +/- prefixes).

And this I think is the best way to do so ;-)
Jeff King Oct. 29, 2021, 8:34 p.m. UTC | #7
On Fri, Oct 29, 2021 at 12:15:52AM +0000, Eric Wong wrote:

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index a291a5d4a2..21f5fab999 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -489,6 +489,13 @@ then
>  	export GIT_PERL_FATAL_WARNINGS
>  fi
>  
> +case $GIT_TEST_FSYNC in
> +'')
> +	GIT_TEST_FSYNC=0
> +	export GIT_TEST_FSYNC
> +	;;
> +esac

I don't really have a problem with doing this by default, as it might
help and shouldn't hurt. But I didn't find it actually changed much for
me. Here are timings running the test suite on my machine:

  Benchmark 1: GIT_TEST_FSYNC=1 make
    Time (mean ± σ):     60.973 s ±  0.158 s    [User: 575.914 s, System: 333.243 s]
    Range (min … max):   60.731 s … 61.228 s    10 runs

  Benchmark 2: GIT_TEST_FSYNC=0 make
    Time (mean ± σ):     59.800 s ±  0.094 s    [User: 575.151 s, System: 337.111 s]
    Range (min … max):   59.643 s … 59.996 s    10 runs

  Benchmark 3: GIT_TEST_FSYNC=1 make GIT_TEST_OPTS=--root=/var/ram
    Time (mean ± σ):     56.966 s ±  0.066 s    [User: 572.353 s, System: 300.808 s]
    Range (min … max):   56.874 s … 57.063 s    10 runs

  Summary
    'GIT_TEST_FSYNC=1 make GIT_TEST_OPTS=--root=/var/ram' ran
      1.05 ± 0.00 times faster than 'GIT_TEST_FSYNC=0 make'
      1.07 ± 0.00 times faster than 'GIT_TEST_FSYNC=1 make'

So using an actual ram disk provided a much bigger edge for me than just
disabling fsync. This may be because the system is using a decent SSD
for its actual disk. But it may also be because I'm running 32 tests
simultaneously. So fsync introducing latency in tests isn't a big deal;
the bottleneck is CPU and there's always another script ready to run.
Though it is also interesting that the system CPU time is so much lower
in the tmpfs case.

Again, not really an objection, but I don't think this replaces the
"you're better off running the test suite on a RAM disk" wisdom.

-Peff
Junio C Hamano Oct. 29, 2021, 8:42 p.m. UTC | #8
Jeff King <peff@peff.net> writes:

> Again, not really an objection, but I don't think this replaces the
> "you're better off running the test suite on a RAM disk" wisdom.

Nice summary.  This was sold as "not everybody has ready access to
tmpfs", so those of us who can use either should stick to our tmpfs
as we have done so far.
diff mbox series

Patch

diff --git a/cache.h b/cache.h
index eba12487b9..de6c45cf44 100644
--- a/cache.h
+++ b/cache.h
@@ -986,6 +986,7 @@  extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
+extern int use_fsync;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/environment.c b/environment.c
index 9da7f3c1a1..0d06a31024 100644
--- a/environment.c
+++ b/environment.c
@@ -42,6 +42,7 @@  const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
 int fsync_object_files;
+int use_fsync = -1;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-cvsserver.perl b/git-cvsserver.perl
index 64319bed43..4c8118010a 100755
--- a/git-cvsserver.perl
+++ b/git-cvsserver.perl
@@ -3607,6 +3607,22 @@  package GITCVS::updater;
 use strict;
 use warnings;
 use DBI;
+our $_use_fsync;
+
+# n.b. consider using Git.pm
+sub use_fsync {
+    if (!defined($_use_fsync)) {
+        my $x = $ENV{GIT_TEST_FSYNC};
+        if (defined $x) {
+            local $ENV{GIT_CONFIG};
+            delete $ENV{GIT_CONFIG};
+            my $v = ::safe_pipe_capture('git', '-c', "test.fsync=$x",
+                                        qw(config --type=bool test.fsync));
+            $_use_fsync = defined($v) ? ($v eq "true\n") : 1;
+        }
+    }
+    $_use_fsync;
+}
 
 =head1 METHODS
 
@@ -3676,6 +3692,9 @@  sub new
                                 $self->{dbuser},
                                 $self->{dbpass});
     die "Error connecting to database\n" unless defined $self->{dbh};
+    if ($self->{dbdriver} eq 'SQLite' && !use_fsync()) {
+        $self->{dbh}->do('PRAGMA synchronous = OFF');
+    }
 
     $self->{tables} = {};
     foreach my $table ( keys %{$self->{dbh}->table_info(undef,undef,undef,'TABLE')->fetchall_hashref('TABLE_NAME')} )
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 35ff5a6896..df5a87a151 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -6,7 +6,7 @@  package Git::SVN;
 use vars qw/$_no_metadata
             $_repack $_repack_flags $_use_svm_props $_head
             $_use_svnsync_props $no_reuse_existing
-	    $_use_log_author $_add_author_from $_localtime/;
+	    $_use_log_author $_add_author_from $_localtime $_use_fsync/;
 use Carp qw/croak/;
 use File::Path qw/mkpath/;
 use IPC::Open3;
@@ -2269,6 +2269,19 @@  sub mkfile {
 	}
 }
 
+# TODO: move this to Git.pm?
+sub use_fsync {
+	if (!defined($_use_fsync)) {
+		my $x = $ENV{GIT_TEST_FSYNC};
+		if (defined $x) {
+			my $v = command_oneline('git', '-c', "test.fsync=$x",
+					qw(config --type=bool test.fsync));
+			$_use_fsync = defined($v) ? ($v eq "true\n") : 1;
+		}
+	}
+	$_use_fsync;
+}
+
 sub rev_map_set {
 	my ($self, $rev, $commit, $update_ref, $uuid) = @_;
 	defined $commit or die "missing arg3\n";
@@ -2290,7 +2303,7 @@  sub rev_map_set {
 	my $sync;
 	# both of these options make our .rev_db file very, very important
 	# and we can't afford to lose it because rebuild() won't work
-	if ($self->use_svm_props || $self->no_metadata) {
+	if (($self->use_svm_props || $self->no_metadata) && use_fsync()) {
 		require File::Copy;
 		$sync = 1;
 		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",
diff --git a/t/test-lib.sh b/t/test-lib.sh
index a291a5d4a2..21f5fab999 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -489,6 +489,13 @@  then
 	export GIT_PERL_FATAL_WARNINGS
 fi
 
+case $GIT_TEST_FSYNC in
+'')
+	GIT_TEST_FSYNC=0
+	export GIT_TEST_FSYNC
+	;;
+esac
+
 # Add libc MALLOC and MALLOC_PERTURB test
 # only if we are not executing the test with valgrind
 if test -n "$valgrind" ||
diff --git a/write-or-die.c b/write-or-die.c
index 0b1ec8190b..a3d5784cec 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -1,4 +1,5 @@ 
 #include "cache.h"
+#include "config.h"
 #include "run-command.h"
 
 /*
@@ -57,6 +58,10 @@  void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
+	if (use_fsync < 0)
+		use_fsync = git_env_bool("GIT_TEST_FSYNC", 1);
+	if (!use_fsync)
+		return;
 	while (fsync(fd) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);