Message ID | 20190301175024.17337-1-alban.gruin@gmail.com (mailing list archive) |
---|---|
Series | name-rev: improve memory usage |
On Fri, Mar 01, 2019 at 06:50:20PM +0100, Alban Gruin wrote:

> rafasc reported on IRC that on a repository with a lot of branches,
> tags, remotes, and commits, name-rev --stdin could use a massive amount
> of memory (more than 2GB of RAM) to complete.
>
> This patch series tries to improve name-rev’s memory usage.

Have you tried this?

    diff --git a/builtin/name-rev.c b/builtin/name-rev.c
    index f1cb45c227..7aaa86f1c0 100644
    --- a/builtin/name-rev.c
    +++ b/builtin/name-rev.c
    @@ -431,6 +431,8 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
     		OPT_END(),
     	};
     
    +	save_commit_buffer = 0;
    +
     	init_commit_rev_name(&rev_names);
     	git_config(git_default_config, NULL);
     	argc = parse_options(argc, argv, prefix, opts, name_rev_usage, 0);

It seems to lower heap usage of:

    git name-rev 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2

in linux.git (that commit is the final one reported by "git log", so
it's traversing all of history) from ~1GB to ~300MB.

-Peff
Hi Jeff,

On 01/03/2019 at 19:42, Jeff King wrote:
> On Fri, Mar 01, 2019 at 06:50:20PM +0100, Alban Gruin wrote:
>
>> rafasc reported on IRC that on a repository with a lot of branches,
>> tags, remotes, and commits, name-rev --stdin could use a massive amount
>> of memory (more than 2GB of RAM) to complete.
>>
>> This patch series tries to improve name-rev’s memory usage.
>
> Have you tried this?
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index f1cb45c227..7aaa86f1c0 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -431,6 +431,8 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  		OPT_END(),
>  	};
>
> +	save_commit_buffer = 0;
> +
>  	init_commit_rev_name(&rev_names);
>  	git_config(git_default_config, NULL);
>  	argc = parse_options(argc, argv, prefix, opts, name_rev_usage, 0);
>
> It seems to lower heap usage of:
>
>   git name-rev 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
>
> in linux.git (that commit is the final one reported by "git log", so
> it's traversing all of history) from ~1GB to ~300MB.
>
> -Peff
>

Unfortunately this does not work in all cases, apparently. On my git
copy, I have 3 origins. If I run this:

    git log --graph --oneline --abbrev=-1 -5 | git name-rev --stdin

With or without your change, it uses 3GB of RAM. With this series, it
uses 25MB of RAM.

With the first git commit:

    git name-rev e83c5163316f89bfbde7d9ab23ca2e25604af290

It also uses 3GB of RAM. With this series, it uses around 350MB of RAM.

Which makes me think that I should adapt 4/4 to arguments.

Sorry, I should have specified this in the cover letter.

Cheers,
Alban
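The messages above quote memory figures without saying how they were taken. For readers who want a comparable number, one rough approach on a Unix system is to run the command as a child process and read the peak resident set size the kernel recorded for it. The sketch below is only an illustration: `peak_child_maxrss` is a hypothetical helper name, and the value is a coarse approximation rather than a heap profile.

```python
import resource
import subprocess
import sys


def peak_child_maxrss(argv, stdin_data=None):
    """Run argv to completion and return ru_maxrss for waited-for children.

    Assumptions (not from the thread): a Unix system with the `resource`
    module. ru_maxrss is reported in kilobytes on Linux and in bytes on
    macOS, and it is the maximum over *all* children this process has
    waited for so far, so use a fresh interpreter per measurement.
    """
    subprocess.run(
        argv,
        input=stdin_data,           # bytes fed to the child's stdin, if any
        stdout=subprocess.DEVNULL,  # the output itself is not of interest
        check=True,
    )
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Example: python measure.py git name-rev <commit>
    print(peak_child_maxrss(sys.argv[1:]))
```

To approximate the `--stdin` case, the output of `git log` could be captured first and passed as `stdin_data`, so that only `git name-rev --stdin` is measured.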
On Fri, Mar 01, 2019 at 08:14:26PM +0100, Alban Gruin wrote:

> > diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> > index f1cb45c227..7aaa86f1c0 100644
> > --- a/builtin/name-rev.c
> > +++ b/builtin/name-rev.c
> > @@ -431,6 +431,8 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
> >  		OPT_END(),
> >  	};
> >
> > +	save_commit_buffer = 0;
> > +
> [...]
>
> Unfortunately this does not work in all cases, apparently. On my git
> copy, I have 3 origins. If I run this:
>
>   git log --graph --oneline --abbrev=-1 -5 | git name-rev --stdin
>
> With or without your change, it uses 3GB of RAM. With this series, it
> uses 25MB of RAM.

Sorry if I was unclear. I meant to try that _in addition_ to your
changes. It helps by avoiding keeping the useless commit-object buffers
in RAM as we traverse. But the most it can save is the total
uncompressed bytes of all commit objects. I.e., in git.git:

    $ git cat-file --batch-check='%(objectsize) %(objecttype)' --batch-all-objects |
      grep commit |
      perl -alne '$total += $F[0]; END { print $total }'
    74678114

or around 70MB. In linux.git, it's more like 700MB. But in your
examples, the problem is the inefficiencies in name-rev's algorithm, and
you're not actually traversing that many commits.

So I think you'd want to turn off save_commit_buffer as an extra patch
in your series. It may or may not be a big win for any given case, but
it's quite easy to do.

-Peff
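The upper bound described in this message (the total uncompressed size of all commit objects) can also be computed without perl. The following is a minimal sketch of the same sum, not code from the series; the function names and the `repo_path` parameter are hypothetical, and it assumes `git` is on `PATH`.

```python
import subprocess


def total_commit_bytes(batch_check_lines):
    """Sum the sizes of commit objects from '<size> <type>' lines."""
    total = 0
    for line in batch_check_lines:
        fields = line.split()
        # Keep only well-formed lines whose object type is "commit".
        if len(fields) == 2 and fields[1] == "commit":
            total += int(fields[0])
    return total


def commit_bytes_in_repo(repo_path="."):
    """Run the cat-file command from the message in repo_path and sum it."""
    proc = subprocess.run(
        [
            "git", "-C", repo_path, "cat-file",
            "--batch-check=%(objectsize) %(objecttype)",
            "--batch-all-objects",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return total_commit_bytes(proc.stdout.splitlines())


if __name__ == "__main__":
    total = commit_bytes_in_repo()
    print(f"{total} bytes ({total / 2**20:.1f} MiB)")
```

For the figure quoted above, 74678114 bytes is about 71 MiB, which matches the "around 70MB" estimate for git.git.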