Message ID | 20190124203639.GA17595@kitenet.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | avoid unncessary malloc of whole file size | expand |
Joey Hess <id@joeyh.name> writes: > When a worktree file is larger than the available memory, and a clean > filter is in use, this avoids mallocing a buffer the whole size of the > file when reading from the clean filter, which caused commands like git > status and git commit to OOM. > > Often in this situation the clean filter will produce a short identifier > for the file, so such a large buffer is not needed. > > When the clean filter does output something around the same size as the > worktree file, the buffer will need to be reallocated until it fits, > starting at 8192 and doubling in size. Benchmarking indicates that > reallocation is not a significant overhead for outputs up to a > few MB in size. Problem description first, then solultion. "... this avoids ..." is already talking about solution while forcing the readers to know what the problem is. When a worktree file is ... filter is in use, we allocate a buffer for the whole size of the file when reading from the clean filter. This can force us to overallocate if the clean filter is used to radically shrink a huge file and replace it with a small token (e.g. git-annex or git-lfs) and lead to OOM at the worst case. Reading from the filter and growing the buffer as we go would avoid such an unnecessary OOM. When the clean filter does output ... ... few MB in size. perhaps. > Signed-off-by: Joey Hess <id@joeyh.name> > --- > convert.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/convert.c b/convert.c > index 0d89ae7c23..85aebe2ed3 100644 > --- a/convert.c > +++ b/convert.c > @@ -732,7 +732,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le > if (start_async(&async)) > return 0; /* error was already reported */ > > - if (strbuf_read(&nbuf, async.out, len) < 0) { > + if (strbuf_read(&nbuf, async.out, 0) < 0) { > err = error(_("read from external filter '%s' failed"), cmd); > } > if (close(async.out)) {
On Thu, Jan 24, 2019 at 01:12:15PM -0800, Junio C Hamano wrote: > Joey Hess <id@joeyh.name> writes: > > > When a worktree file is larger than the available memory, and a clean > > filter is in use, this avoids mallocing a buffer the whole size of the > > file when reading from the clean filter, which caused commands like git > > status and git commit to OOM. > > > > Often in this situation the clean filter will produce a short identifier > > for the file, so such a large buffer is not needed. > > > > When the clean filter does output something around the same size as the > > worktree file, the buffer will need to be reallocated until it fits, > > starting at 8192 and doubling in size. Benchmarking indicates that > > reallocation is not a significant overhead for outputs up to a > > few MB in size. > > Problem description first, then solultion. "... this avoids ..." is > already talking about solution while forcing the readers to know > what the problem is. > > When a worktree file is ... filter is in use, we allocate a > buffer for the whole size of the file when reading from the > clean filter. This can force us to overallocate if the clean > filter is used to radically shrink a huge file and replace it > with a small token (e.g. git-annex or git-lfs) and lead to OOM > at the worst case. Reading from the filter and growing the > buffer as we go would avoid such an unnecessary OOM. > > When the clean filter does output ... > ... few MB in size. > > perhaps. Yeah, I agree that organization is nicer. Other than that, the patch looks good to me. -Peff
diff --git a/convert.c b/convert.c index 0d89ae7c23..85aebe2ed3 100644 --- a/convert.c +++ b/convert.c @@ -732,7 +732,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le if (start_async(&async)) return 0; /* error was already reported */ - if (strbuf_read(&nbuf, async.out, len) < 0) { + if (strbuf_read(&nbuf, async.out, 0) < 0) { err = error(_("read from external filter '%s' failed"), cmd); } if (close(async.out)) {
When a worktree file is larger than the available memory, and a clean filter is in use, this avoids mallocing a buffer the whole size of the file when reading from the clean filter, which caused commands like git status and git commit to OOM. Often in this situation the clean filter will produce a short identifier for the file, so such a large buffer is not needed. When the clean filter does output something around the same size as the worktree file, the buffer will need to be reallocated until it fits, starting at 8192 and doubling in size. Benchmarking indicates that reallocation is not a significant overhead for outputs up to a few MB in size. Signed-off-by: Joey Hess <id@joeyh.name> --- convert.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)