diff mbox series

[v3,6/6] core.fsyncobjectfiles: enable batch mode for testing

Message ID 55a40fc8fd59df6180c8a87d93fcc9a232ff8d0a.1631590725.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Implement a batched fsync option for core.fsyncObjectFiles | expand

Commit Message

Neeraj Singh (WINDOWS-SFS) Sept. 14, 2021, 3:38 a.m. UTC
From: Neeraj Singh <neerajsi@microsoft.com>

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 environment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Junio C Hamano Sept. 15, 2021, 4:21 p.m. UTC | #1
"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  environment.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/environment.c b/environment.c
> index 3e23eafff80..27d5e11267e 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -43,7 +43,7 @@ const char *git_hooks_path;
>  int zlib_compression_level = Z_BEST_SPEED;
>  int core_compression_level;
>  int pack_compression_level = Z_DEFAULT_COMPRESSION;
> -enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
> +enum FSYNC_OBJECT_FILES_MODE fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>  size_t delta_base_cache_limit = 96 * 1024 * 1024;

Despite what the title of the change claims, this is not "enable for
testing", but "enable for everybody even in production", isn't it?

I'd prefer we do not do this, certainly not for "testing".

If setting the variable to "batch" were meant to eventually improve
performance for all different flavours of workload, I do not think
we would mind if we set it to "batch" for those who opt into the
"experimental" set of features by setting the feature.experimental
configuration variable to true.  And after a few development cycles
when the feature proves to be useful for everybody, we may want to
apply this patch under a justification that is different from "for
testing".

On the other hand, if this is meant to help 85% of people while
degrading the remainder of workflow, I do not think we would want to
see this change without a warning that says something along the
lines of "under rare circumstances (e.g. if you employ such and such
workflow), the new default value used for the core.fsyncObjectFiles
configuration variable will hurt performance."

Since this is about answering the question "between performance and
crash resilience, where do you as an end user strike the balance for
your needs?", I do not think it falls into either of the above two
categories.  

The only plausible justification I can think of to apply a "we
default to 'batch' for everybody" patch with is something like:

    Now with the 'batch' setting for core.fsyncObjectFiles, unlike
    'true' that paid very high overhead, the overhead to ensure our
    writes hit the disk platters has so greatly been reduced that it
    hurts the performance only negligibly.  Let's switch the default
    from the unsafe value of 'false' to safer and performant value
    of 'batch'.

I however doubt with the current round of patches, we are there yet.
Neeraj Singh Sept. 15, 2021, 10:43 p.m. UTC | #2
On Wed, Sep 15, 2021 at 9:21 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> > ---
> >  environment.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/environment.c b/environment.c
> > index 3e23eafff80..27d5e11267e 100644
> > --- a/environment.c
> > +++ b/environment.c
> > @@ -43,7 +43,7 @@ const char *git_hooks_path;
> >  int zlib_compression_level = Z_BEST_SPEED;
> >  int core_compression_level;
> >  int pack_compression_level = Z_DEFAULT_COMPRESSION;
> > -enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
> > +enum FSYNC_OBJECT_FILES_MODE fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> >  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
> >  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
> >  size_t delta_base_cache_limit = 96 * 1024 * 1024;
>
> Despite what the title of the change claims, this is not "enable for
> testing", but "enable for everybody even in production", isn't it?
>
> I'd prefer we do not do this, certainly not for "testing".
>
> If setting the variable to "batch" were meant to eventually improve
> performance for all different flavours of workload, I do not think
> we would mind if we set it to "batch" for those who opt into the
> "experimental" set of features by setting the feature.experimental
> configuration variable to true.  And after a few development cycles
> when the feature proves to be useful for everybody, we may want to
> apply this patch under a justification that is different from "for
> testing".
>
> On the other hand, if this is meant to help 85% of people while
> degrading the remainder of workflow, I do not think we would want to
> see this change without a warning that says something along the
> lines of "under rare circumstances (e.g. if you employ such and such
> workflow), the new default value used for the core.fsyncObjectFiles
> configuration variable will hurt performance."
>
> Since this is about answering the question "between performance and
> crash resilience, where do you as an end user strike the balance for
> your needs?", I do not think it falls into either of the above two
> categories.
>
> The only plausible justification I can think of to apply a "we
> default to 'batch' for everybody" patch with is something like:
>
>     Now with the 'batch' setting for core.fsyncObjectFiles, unlike
>     'true' that paid very high overhead, the overhead to ensure our
>     writes hit the disk platters has so greatly been reduced that it
>     hurts the performance only negligibly.  Let's switch the default
>     from the unsafe value of 'false' to safer and performant value
>     of 'batch'.
>
> I however doubt with the current round of patches, we are there yet.

Sorry for being unclear here (and perhaps including an improper patch).
This commit is mainly to ensure that we get coverage of batch mode on all
platforms in the CI infrastructure.  I don't believe it should be included in
mainline git without significantly more discussion and experimentation.

However, I'd hope that Git for Windows would be able to adopt batch mode
by default when they pull this series in. They are currently enabling fsync
by default.

Batch mode does have more cost, particularly on rotational media.
I think git should eventually enable batch mode by default with the proviso that
maintainers and people running ephemeral CI infrastructure should turn
fsync off
if they care more about speed than durability.

Do you think that feature.experimental is a good place to put this right away,
or should we just leave this as an option that Git for Windows can pick up and
leave the other platforms alone?

Thanks,
Neeraj
Junio C Hamano Sept. 15, 2021, 11:12 p.m. UTC | #3
Neeraj Singh <nksingh85@gmail.com> writes:

> This commit is mainly to ensure that we get coverage of batch mode on all
> platforms in the CI infrastructure.  I don't believe it should be included in
> mainline git without significantly more discussion and experimentation.

Am I incorrect to say that only just a handful of code paths can
take advantage of the bulk checkin "plugging-unplugging" feature to
begin with, so running _all_ the existing tests that cover
everything with this core.fsyncobjectfiles=batch setting is rather
pointless?

If so, perhaps instead of 6/6, you should identify key code paths
that would be affected by this feature (perhaps "git add" is one of
them), and either write a new test script dedicated for this feature
or piggy-back on existing test scripts that already tests the code
paths and adding new test pieces there that exercise this new feature.

If it is a good idea to run all the tests with core.fsyncobjectfiles
set to batch, however, it probalby is easiest to invent a new
environment variable GIT_TEST_FORCE_CORE_FSYNCOBJECTFILES and have
it honored as the default when it is set, and add a NEW CI job that
exports the environment with the value "batch".  Other people
(including the ones from Microsoft, I think) are much more familiar
than I am on how to make this kind of thing work in GitHub Actions.

> Do you think that feature.experimental is a good place to put this right away,

I think feature.experimental should be used for something that we
hope would benefit "everybody", not "most of the users".  This is a
promise to our testers, who opt into "early preview" of upcoming
features should not be subjected to "this may or may not give better
experiences depending on your workflow".  They may already be
enjoying and even relying on other experimental features by opting
in, and we should strive not to add a reason for them to turn the
feature.experimental bit off by saying "this new experimental feature
that recently joined does not work for my use case."
Junio C Hamano Sept. 16, 2021, 6:19 a.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> Neeraj Singh <nksingh85@gmail.com> writes:
>
>> This commit is mainly to ensure that we get coverage of batch mode on all
>> platforms in the CI infrastructure.  I don't believe it should be included in
>> mainline git without significantly more discussion and experimentation.
>
> Am I incorrect to say that only just a handful of code paths can
> take advantage of the bulk checkin "plugging-unplugging" feature to
> begin with, so running _all_ the existing tests that cover
> everything with this core.fsyncobjectfiles=batch setting is rather
> pointless?
>
> If so, perhaps instead of 6/6, you should identify key code paths
> that would be affected by this feature (perhaps "git add" is one of
> them), and either write a new test script dedicated for this feature
> or piggy-back on existing test scripts that already tests the code
> paths and adding new test pieces there that exercise this new feature.
>
> If it is a good idea to run all the tests with core.fsyncobjectfiles
> set to batch, however, it probalby is easiest to invent a new
> environment variable GIT_TEST_FORCE_CORE_FSYNCOBJECTFILES and have
> it honored as the default when it is set, and add a NEW CI job that
> exports the environment with the value "batch".  

I have to take a part of this back.  A new environment variable that
is honored in the absense of core.fsyncobjectfiles would be needed
if you need to run all tests, but you do not necessarily have to add
a new CI job---instead you should be able to piggyback on an
existing job, by mimicking the way how ci/run-build-and-tests.sh
enables various test options on one of the jobs.

> Other people
> (including the ones from Microsoft, I think) are much more familiar
> than I am on how to make this kind of thing work in GitHub Actions.

This part still stands ;-)  There might be a better way than adding
yet another environment variable.
diff mbox series

Patch

diff --git a/environment.c b/environment.c
index 3e23eafff80..27d5e11267e 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@  const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;