diff mbox series

Fix to avoid high memory footprint

Message ID pull.1744.git.git.1721117039874.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Fix to avoid high memory footprint | expand

Commit Message

Haritha D July 16, 2024, 8:03 a.m. UTC
From: D Harithamma <harithamma.d@ibm.com>

This fix avoids high memory footprint when
adding files that require conversion.
Git has a trace_encoding routine that prints trace
output when GIT_TRACE_WORKING_TREE_ENCODING=1 is
set. This environment variable is used to debug
the encoding contents.
When a 40MB file is added, it requests close to
1.8GB of storage from xrealloc which can lead
to out of memory errors.
However, the check for
GIT_TRACE_WORKING_TREE_ENCODING is done after
the string is allocated. This resolves high
memory footprints even when
GIT_TRACE_WORKING_TREE_ENCODING is not active.
This fix adds an early exit to avoid the
unnecessary memory allocation.

Signed-off-by: Haritha D <harithamma.d@ibm.com>
---
    Fix to avoid high memory footprint
    
    This fix avoids high memory footprint when adding files that require
    conversion
    
    Git has a trace_encoding routine that prints trace output when
    GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment variable is
    used to debug the encoding contents. When a 40MB file is added, it
    requests close to 1.8GB of storage from xrealloc which can lead to out
    of memory errors. However, the check for GIT_TRACE_WORKING_TREE_ENCODING
    is done after the string is allocated. This resolves high memory
    footprints even when GIT_TRACE_WORKING_TREE_ENCODING is not active. This
    fix adds an early exit to avoid the unnecessary memory allocation.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1744%2FHarithaIBM%2FmemFootprintFix-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1744/HarithaIBM/memFootprintFix-v1
Pull-Request: https://github.com/git/git/pull/1744

 convert.c | 5 +++++
 1 file changed, 5 insertions(+)


base-commit: 557ae147e6cdc9db121269b058c757ac5092f9c9

Comments

Jeff King July 17, 2024, 6:16 a.m. UTC | #1
On Tue, Jul 16, 2024 at 08:03:59AM +0000, Haritha  via GitGitGadget wrote:

> From: D Harithamma <harithamma.d@ibm.com>
> 
> This fix avoids high memory footprint when
> adding files that require conversion.
> Git has a trace_encoding routine that prints trace
> output when GIT_TRACE_WORKING_TREE_ENCODING=1 is
> set. This environment variable is used to debug
> the encoding contents.
> When a 40MB file is added, it requests close to
> 1.8GB of storage from xrealloc which can lead
> to out of memory errors.
> However, the check for
> GIT_TRACE_WORKING_TREE_ENCODING is done after
> the string is allocated. This resolves high
> memory footprints even when
> GIT_TRACE_WORKING_TREE_ENCODING is not active.
> This fix adds an early exit to avoid the
> unnecessary memory allocation.
> 
> Signed-off-by: Haritha D <harithamma.d@ibm.com>

Good find. Any trace function should verify that tracing is enabled
before doing any substantial work.

Let's take a look at your patch. First, your line wrapping is unusual,
making the commit message a bit hard to read. We'd usually shoot for ~72
characters per line. So more like:

> This fix avoids high memory footprint when adding files that require
> conversion.  Git has a trace_encoding routine that prints trace output
> when GIT_TRACE_WORKING_TREE_ENCODING=1 is set. This environment
> variable is used to debug the encoding contents.  When a 40MB file is
> added, it requests close to 1.8GB of storage from xrealloc which can
> lead to out of memory errors.  However, the check for
> GIT_TRACE_WORKING_TREE_ENCODING is done after the string is allocated.
> This resolves high memory footprints even when
> GIT_TRACE_WORKING_TREE_ENCODING is not active.  This fix adds an early
> exit to avoid the unnecessary memory allocation.

Second, we'd like a full real name in the Signed-off-by line, as you're
agreeing to the DCO. See:

  https://git-scm.com/docs/SubmittingPatches#sign-off

Likewise, the author name should match the signoff name (you can use
"git commit --amend --author=..." to fix it).

For the patch itself:

> --- a/convert.c
> +++ b/convert.c
> @@ -324,6 +324,11 @@ static void trace_encoding(const char *context, const char *path,
>  	struct strbuf trace = STRBUF_INIT;
>  	int i;
>  
> +	// If tracing is not on, exit early to avoid high memory footprint
> +	if (!trace_pass_fl(&coe)) {
> +		return;
> +	}

I don't think trace_pass_fl() is what you want. It will return true if
the trace fd is non-zero (so tracing was requested), but also if the key
has not yet been initialized (i.e., nobody has used this key to try
printing anything yet).

I think you'd just use trace_want(&coe) instead.

Also, two style nits:

 - our usual style (see Documentation/CodingGuidelines) is to avoid
   braces for one-liners.

 - we only use the /* */ comment form, not //. Though IMHO you could
   skip the comment completely here, as an early-return check in a
   tracing function is pretty obvious.

It would be nice if we could test this, but besides the wasted work, I
don't think there's any user-visible behavior (the problem is that we
are computing things when we're _not_ tracing, so there's nothing for
the user to see). And there's no provision in our test suite for
measuring memory usage of a program. So I think we can live without it,
and just manually verifying that it works (but it would be good to show
the measurements you did manually in the commit message).

-Peff
diff mbox series

Patch

diff --git a/convert.c b/convert.c
index d8737fe0f2d..e765bcd53d6 100644
--- a/convert.c
+++ b/convert.c
@@ -324,6 +324,11 @@  static void trace_encoding(const char *context, const char *path,
 	struct strbuf trace = STRBUF_INIT;
 	int i;
 
+	// If tracing is not on, exit early to avoid high memory footprint
+	if (!trace_pass_fl(&coe)) {
+		return;
+	}
+
 	strbuf_addf(&trace, "%s (%s, considered %s):\n", context, path, encoding);
 	for (i = 0; i < len && buf; ++i) {
 		strbuf_addf(