avoid unncessary malloc of whole file size

Message ID	20190124203639.GA17595@kitenet.net (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> Date: Thu, 24 Jan 2019 16:36:39 -0400 From: Joey Hess <id@joeyh.name> To: Jeff King <peff@peff.net> Cc: git <git@vger.kernel.org> Subject: [PATCH] avoid unncessary malloc of whole file size Message-ID: <20190124203639.GA17595@kitenet.net> References: <20190122220714.GA6176@kitenet.net> <20190124121037.GA4949@sigill.intra.peff.net> <20190124183810.GC29200@kitenet.net> <20190124191836.GA31073@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190124191836.GA31073@sigill.intra.peff.net> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: git-owner@vger.kernel.org Precedence: bulk
Series	avoid unncessary malloc of whole file size \| expand avoid unncessary malloc of whole file size

Message ID

20190124203639.GA17595@kitenet.net (mailing list archive)

State

New, archived

Headers

Date: Thu, 24 Jan 2019 16:36:39 -0400
From: Joey Hess <id@joeyh.name>
To: Jeff King <peff@peff.net>
Cc: git <git@vger.kernel.org>
Subject: [PATCH] avoid unncessary malloc of whole file size
Message-ID: <20190124203639.GA17595@kitenet.net>
References: <20190122220714.GA6176@kitenet.net>
 <20190124121037.GA4949@sigill.intra.peff.net>
 <20190124183810.GC29200@kitenet.net>
 <20190124191836.GA31073@sigill.intra.peff.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20190124191836.GA31073@sigill.intra.peff.net>
User-Agent: Mutt/1.10.1 (2018-07-13)
Sender: git-owner@vger.kernel.org
Precedence: bulk

Series

avoid unncessary malloc of whole file size | expand

Commit Message

Joey Hess Jan. 24, 2019, 8:36 p.m. UTC

When a worktree file is larger than the available memory, and a clean
filter is in use, this avoids mallocing a buffer the whole size of the
file when reading from the clean filter, which caused commands like git
status and git commit to OOM.

Often in this situation the clean filter will produce a short identifier
for the file, so such a large buffer is not needed.

When the clean filter does output something around the same size as the
worktree file, the buffer will need to be reallocated until it fits,
starting at 8192 and doubling in size. Benchmarking indicates that
reallocation is not a significant overhead for outputs up to a
few MB in size.

Signed-off-by: Joey Hess <id@joeyh.name>
---
 convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Junio C Hamano Jan. 24, 2019, 9:12 p.m. UTC | #1

Joey Hess <id@joeyh.name> writes:

> When a worktree file is larger than the available memory, and a clean
> filter is in use, this avoids mallocing a buffer the whole size of the
> file when reading from the clean filter, which caused commands like git
> status and git commit to OOM.
>
> Often in this situation the clean filter will produce a short identifier
> for the file, so such a large buffer is not needed.
>
> When the clean filter does output something around the same size as the
> worktree file, the buffer will need to be reallocated until it fits,
> starting at 8192 and doubling in size. Benchmarking indicates that
> reallocation is not a significant overhead for outputs up to a
> few MB in size.

Problem description first, then solultion.  "... this avoids ..." is
already talking about solution while forcing the readers to know
what the problem is.

    When a worktree file is ... filter is in use, we allocate a
    buffer for the whole size of the file when reading from the
    clean filter.  This can force us to overallocate if the clean
    filter is used to radically shrink a huge file and replace it
    with a small token (e.g. git-annex or git-lfs) and lead to OOM
    at the worst case.  Reading from the filter and growing the
    buffer as we go would avoid such an unnecessary OOM.

    When the clean filter does output ...
    ... few MB in size.

perhaps.

> Signed-off-by: Joey Hess <id@joeyh.name>
> ---
>  convert.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/convert.c b/convert.c
> index 0d89ae7c23..85aebe2ed3 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -732,7 +732,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
>  	if (start_async(&async))
>  		return 0;	/* error was already reported */
>  
> -	if (strbuf_read(&nbuf, async.out, len) < 0) {
> +	if (strbuf_read(&nbuf, async.out, 0) < 0) {
>  		err = error(_("read from external filter '%s' failed"), cmd);
>  	}
>  	if (close(async.out)) {

Jeff King Jan. 24, 2019, 9:18 p.m. UTC | #2

On Thu, Jan 24, 2019 at 01:12:15PM -0800, Junio C Hamano wrote:

> Joey Hess <id@joeyh.name> writes:
> 
> > When a worktree file is larger than the available memory, and a clean
> > filter is in use, this avoids mallocing a buffer the whole size of the
> > file when reading from the clean filter, which caused commands like git
> > status and git commit to OOM.
> >
> > Often in this situation the clean filter will produce a short identifier
> > for the file, so such a large buffer is not needed.
> >
> > When the clean filter does output something around the same size as the
> > worktree file, the buffer will need to be reallocated until it fits,
> > starting at 8192 and doubling in size. Benchmarking indicates that
> > reallocation is not a significant overhead for outputs up to a
> > few MB in size.
> 
> Problem description first, then solultion.  "... this avoids ..." is
> already talking about solution while forcing the readers to know
> what the problem is.
> 
>     When a worktree file is ... filter is in use, we allocate a
>     buffer for the whole size of the file when reading from the
>     clean filter.  This can force us to overallocate if the clean
>     filter is used to radically shrink a huge file and replace it
>     with a small token (e.g. git-annex or git-lfs) and lead to OOM
>     at the worst case.  Reading from the filter and growing the
>     buffer as we go would avoid such an unnecessary OOM.
> 
>     When the clean filter does output ...
>     ... few MB in size.
> 
> perhaps.

Yeah, I agree that organization is nicer. Other than that, the patch
looks good to me.

-Peff

diff --git a/convert.c b/convert.c
index 0d89ae7c23..85aebe2ed3 100644
--- a/convert.c
+++ b/convert.c
@@ -732,7 +732,7 @@  static int apply_single_file_filter(const char *path, const char *src, size_t le
 	if (start_async(&async))
 		return 0;	/* error was already reported */
 
-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		err = error(_("read from external filter '%s' failed"), cmd);
 	}
 	if (close(async.out)) {

avoid unncessary malloc of whole file size

Commit Message

Comments

Patch