diff mbox series

[v3] Fix to avoid high memory footprint

Message ID pull.1744.v3.git.git.1721975234873.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 63ad8dbf169ec8e2b3cef40ff51499ee751a84a5
Headers show
Series [v3] Fix to avoid high memory footprint | expand

Commit Message

Haritha D July 26, 2024, 6:27 a.m. UTC
From: D Harithamma <harithamma.d@ibm.com>

When Git adds a file requiring encoding conversion and tracing of encoding
conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
environment variable, the `trace_encoding()` function still allocates &
prepares "human readable" copies of the file contents before and after
conversion to show in the trace. This results in a high memory footprint
and increased runtime without providing any user-visible benefit.

This fix introduces an early exit from the `trace_encoding()` function
when tracing is not requested, preventing unnecessary memory allocation
and processing.

Signed-off-by: Harithamma D <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v3
Pull-Request: https://github.com/git/git/pull/1744

Range-diff vs v2:

 1:  500b7eacf2a ! 1:  d864de64380 Fix to avoid high memory footprint
     @@ Metadata
       ## Commit message ##
          Fix to avoid high memory footprint
      
     -    This fix avoids high memory footprint when adding files that require
     -    conversion.  Git has a trace_encoding routine that prints trace output
     -    when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
     -    variable is used to debug the encoding contents.  When a 40MB file is
     -    added, it requests close to 1.8GB of storage from xrealloc which can
     -    lead to out of memory errors.  However, the check for
     -    GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
     -    This resolves high memory footprints even when
     -    GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
     -    exit to avoid the unnecessary memory allocation.
     +    When Git adds a file requiring encoding conversion and tracing of encoding
     +    conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
     +    environment variable, the `trace_encoding()` function still allocates &
     +    prepares "human readable" copies of the file contents before and after
     +    conversion to show in the trace. This results in a high memory footprint
     +    and increased runtime without providing any user-visible benefit.
     +
     +    This fix introduces an early exit from the `trace_encoding()` function
     +    when tracing is not requested, preventing unnecessary memory allocation
     +    and processing.
      
          Signed-off-by: Harithamma D <harithamma.d@ibm.com>
      


 convert.c | 3 +++
 1 file changed, 3 insertions(+)


base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9

Comments

Torsten Bögershausen July 26, 2024, 9:55 a.m. UTC | #1
On Fri, Jul 26, 2024 at 06:27:14AM +0000, Haritha  via GitGitGadget wrote:
> From: D Harithamma <harithamma.d@ibm.com>
>
> When Git adds a file requiring encoding conversion and tracing of encoding
> conversion is not requested via the GIT_TRACE_WORKING_TREE_ENCODING
> environment variable, the `trace_encoding()` function still allocates &
> prepares "human readable" copies of the file contents before and after
> conversion to show in the trace. This results in a high memory footprint
> and increased runtime without providing any user-visible benefit.
>
> This fix introduces an early exit from the `trace_encoding()` function
> when tracing is not requested, preventing unnecessary memory allocation
> and processing.
>
> Signed-off-by: Harithamma D <harithamma.d@ibm.com>
> ---
>     Fix to avoid high memory footprint
>

This head line
> Fix to avoid high memory footprint
does not tell to much when and how it happens.
The word "fix" is not realy needed (in this project).

Something like
 "convert: avoid high memory footprint"

will tell the reader, that only the convert functionality is affected
by this patch.

Thinking about it, another suggestion may be:

convert: Reduce memory allocation when trace_encoding() is not used

If someone browses through the whole history of Git, this is easier to
follow.

The exact wording may be improved, important would be to have "convert:"

as the first keyword, and then "memory allocation" and "trace_encoding()"
give hints, what this is all about in one line.

And the rest looks good.
Junio C Hamano July 26, 2024, 3:06 p.m. UTC | #2
"Haritha  via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: D Harithamma <harithamma.d@ibm.com>
> ...
> Signed-off-by: Harithamma D <harithamma.d@ibm.com>

I am assuming that you and the person who did d254e650 (build:
support z/OS (OS/390)., 2024-03-06) are the same person?  That
commit was signed off like so:

    commit d254e65092daba8667d6b4d5b4f59c099c1edd1f
    Author: Haritha D <harithamma.d@ibm.com>
    Date:   Wed Mar 6 05:44:17 2024 +0000

        build: support z/OS (OS/390).

        Introduced z/OS (OS/390) as a platform in config.mak.uname

        Signed-off-by: Haritha D <harithamma.d@ibm.com>
        Signed-off-by: Junio C Hamano <gitster@pobox.com>

It is OK if you really want to use a longer name this time, but then
please be consistent within a single commit.  The author name of the
proposed commit is "D Harithamma" and a different name "Harithamma D"
is used to sign off the commit, which is not what we want to see.

Thanks.
Junio C Hamano July 26, 2024, 3:12 p.m. UTC | #3
"Haritha  via GitGitGadget" <gitgitgadget@gmail.com> writes:

Another thing.

> Subject: Re: [PATCH v3] Fix to avoid high memory footprint

This does not tell us much about what area had problem under what
condition.  "git shortlog --no-merges -200 master" may give us good
examples of how we typically write the title of our commits.

In the case of this change, ideally we should be able to tell that
this is about tracing the conversion codepath.

    Subject: [PATCH vN] encoding: return early when not tracing conversion

or something, perhaps?
Haritha D July 30, 2024, 3:41 a.m. UTC | #4
Hello Team,

I have retained convert as per Torsten's comment. Rest i have changed as per Junio's suggestion. 

Thank you, everyone.

On 26/07/24, 8:42 PM, "Junio C Hamano" <gitster@pobox.com <mailto:gitster@pobox.com>> wrote:


"Haritha via GitGitGadget" <gitgitgadget@gmail.com <mailto:gitgitgadget@gmail.com>> writes:


Another thing.


> Subject: Re: [PATCH v3] Fix to avoid high memory footprint


This does not tell us much about what area had problem under what
condition. "git shortlog --no-merges -200 master" may give us good
examples of how we typically write the title of our commits.


In the case of this change, ideally we should be able to tell that
this is about tracing the conversion codepath.


Subject: [PATCH vN] encoding: return early when not tracing conversion


or something, perhaps?
diff mbox series

Patch

diff --git a/convert.c b/convert.c
index d8737fe0f2d..c4ddc4de81b 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,9 @@  static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	if (!trace_want(&coe))
+		return;
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(