diff mbox series

avoid unncessary malloc of whole file size

Message ID 20190124203639.GA17595@kitenet.net (mailing list archive)
State New, archived
Headers show
Series avoid unncessary malloc of whole file size | expand

Commit Message

Joey Hess Jan. 24, 2019, 8:36 p.m. UTC
When a worktree file is larger than the available memory, and a clean
filter is in use, this avoids mallocing a buffer the whole size of the
file when reading from the clean filter, which caused commands like git
status and git commit to OOM.

Often in this situation the clean filter will produce a short identifier
for the file, so such a large buffer is not needed.

When the clean filter does output something around the same size as the
worktree file, the buffer will need to be reallocated until it fits,
starting at 8192 and doubling in size. Benchmarking indicates that
reallocation is not a significant overhead for outputs up to a
few MB in size.

Signed-off-by: Joey Hess <id@joeyh.name>
---
 convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Junio C Hamano Jan. 24, 2019, 9:12 p.m. UTC | #1
Joey Hess <id@joeyh.name> writes:

> When a worktree file is larger than the available memory, and a clean
> filter is in use, this avoids mallocing a buffer the whole size of the
> file when reading from the clean filter, which caused commands like git
> status and git commit to OOM.
>
> Often in this situation the clean filter will produce a short identifier
> for the file, so such a large buffer is not needed.
>
> When the clean filter does output something around the same size as the
> worktree file, the buffer will need to be reallocated until it fits,
> starting at 8192 and doubling in size. Benchmarking indicates that
> reallocation is not a significant overhead for outputs up to a
> few MB in size.

Problem description first, then solultion.  "... this avoids ..." is
already talking about solution while forcing the readers to know
what the problem is.

    When a worktree file is ... filter is in use, we allocate a
    buffer for the whole size of the file when reading from the
    clean filter.  This can force us to overallocate if the clean
    filter is used to radically shrink a huge file and replace it
    with a small token (e.g. git-annex or git-lfs) and lead to OOM
    at the worst case.  Reading from the filter and growing the
    buffer as we go would avoid such an unnecessary OOM.

    When the clean filter does output ...
    ... few MB in size.

perhaps.

> Signed-off-by: Joey Hess <id@joeyh.name>
> ---
>  convert.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/convert.c b/convert.c
> index 0d89ae7c23..85aebe2ed3 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -732,7 +732,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
>  	if (start_async(&async))
>  		return 0;	/* error was already reported */
>  
> -	if (strbuf_read(&nbuf, async.out, len) < 0) {
> +	if (strbuf_read(&nbuf, async.out, 0) < 0) {
>  		err = error(_("read from external filter '%s' failed"), cmd);
>  	}
>  	if (close(async.out)) {
Jeff King Jan. 24, 2019, 9:18 p.m. UTC | #2
On Thu, Jan 24, 2019 at 01:12:15PM -0800, Junio C Hamano wrote:

> Joey Hess <id@joeyh.name> writes:
> 
> > When a worktree file is larger than the available memory, and a clean
> > filter is in use, this avoids mallocing a buffer the whole size of the
> > file when reading from the clean filter, which caused commands like git
> > status and git commit to OOM.
> >
> > Often in this situation the clean filter will produce a short identifier
> > for the file, so such a large buffer is not needed.
> >
> > When the clean filter does output something around the same size as the
> > worktree file, the buffer will need to be reallocated until it fits,
> > starting at 8192 and doubling in size. Benchmarking indicates that
> > reallocation is not a significant overhead for outputs up to a
> > few MB in size.
> 
> Problem description first, then solultion.  "... this avoids ..." is
> already talking about solution while forcing the readers to know
> what the problem is.
> 
>     When a worktree file is ... filter is in use, we allocate a
>     buffer for the whole size of the file when reading from the
>     clean filter.  This can force us to overallocate if the clean
>     filter is used to radically shrink a huge file and replace it
>     with a small token (e.g. git-annex or git-lfs) and lead to OOM
>     at the worst case.  Reading from the filter and growing the
>     buffer as we go would avoid such an unnecessary OOM.
> 
>     When the clean filter does output ...
>     ... few MB in size.
> 
> perhaps.

Yeah, I agree that organization is nicer. Other than that, the patch
looks good to me.

-Peff
diff mbox series

Patch

diff --git a/convert.c b/convert.c
index 0d89ae7c23..85aebe2ed3 100644
--- a/convert.c
+++ b/convert.c
@@ -732,7 +732,7 @@  static int apply_single_file_filter(const char *path, const char *src, size_t le
 	if (start_async(&async))
 		return 0;	/* error was already reported */
 
-	if (strbuf_read(&nbuf, async.out, len) < 0) {
+	if (strbuf_read(&nbuf, async.out, 0) < 0) {
 		err = error(_("read from external filter '%s' failed"), cmd);
 	}
 	if (close(async.out)) {