mbox series

[RFC,v5,0/3] grep: almost no more leaks, hopefully no crashes

Message ID 20190809030210.18353-1-carenas@gmail.com (mailing list archive)
Headers show
Series grep: almost no more leaks, hopefully no crashes | expand

Message

Carlo Marcelo Arenas Belón Aug. 9, 2019, 3:02 a.m. UTC
This series is a candidate reroll for cb/pcre2-chartables-leakfix, that
hopefully addresses the root cause of the problem reported by Dscho in
Windows, where the PCRE2 library wasn't aware of the custom allocator and
was returning a pointer created with the system malloc but passing it to
NED's free, resulting in a segfault.

The most likely reason why it was triggered by the original leak fix is
the layering violation reported by René and that is likely exclusive to PCRE2
(hence why it hasn't been reported with PCRE1). Additional work might be
available in a future release of PCRE2 to address that as detailed in an
upstream bug[1] report.

Changes since v4 (only in patch 2):

* git log change reverted, still not sure where it will fit better and worst
  case will leak a few bytes when -P is used.
  since the users of this API are doing it indirectly it might be problematic
  long term though, but luckily since it is most of the tine a NOOP and can
  be called multiple times might be ok to do it unconditionally
* slightly better looking code

Changes since v3 (mostly in patch 2):

* git log also calls the "destructor" for grep API
* no more "bug" being triggered by make test, sorry René
* hopefully no more crashes in windows (I was expecting at most a BUG)

Future work (other than the needed refactoring explained in the
second patch) and adjacent bugs, includes:

* tracking more possible users of the grep API that might need to call
  grep_destroy()
* completely moving PCRE2 to use NED (as is done with PCRE1 and was
  proposed on the original patch[2] this is based on
* build on top of the new API so that other work could be shared
  (for example the chartables that started this whole mess)

or (hopefully not)

* ignore the original leak (maybe with an UNLEAK) as René suggested [3]
* discard this work and just use Dscho's fix (with some improvements,
  like using xmalloc)

[1] https://bugs.exim.org/show_bug.cgi?id=2429
[2] https://public-inbox.org/git/3397e6797f872aedd18c6d795f4976e1c579514b.1565005867.git.gitgitgadget@gmail.com/
[3] https://public-inbox.org/git/7ec60d57-9940-35f2-f7b5-c87d4dc7cdd5@web.de/

Carlo Marcelo Arenas Belón (3):
  grep: make PCRE1 aware of custom allocator
  grep: make PCRE2 aware of custom allocator
  grep: avoid leak of chartables in PCRE2

 Makefile       |  2 +-
 builtin/grep.c |  1 +
 grep.c         | 71 +++++++++++++++++++++++++++++++++++++++++++++++---
 grep.h         |  2 ++
 4 files changed, 72 insertions(+), 4 deletions(-)

base-commit: 51cf315870bbb7254ddf06c84fe03b41bc48eebd

Comments

Carlo Marcelo Arenas Belón Aug. 9, 2019, 11:24 a.m. UTC | #1
disregard this version, it still broken (and wouldn't even build on
some cases), the reasons why are still unclear though but at least it
might
seem from the last known run in windows that segfaults were prevented
at last and something was still off enough to trigger a BUG (shouldn't
be a concern with V6 or later that do PCRE2 migration to NED fully, as
agreed)

Thanks to Dscho's github integration to Azure pipelines got finally a
RC[1] for V6 that at least passes all tests and will need to get in
shape for submission.

as René pointed out in three my performance concerns might be a red
herring as well, but would be nice if someone with windows would
definitely get some numbers so we can't be sure there were truly no
regressions

Carlo

[1] https://github.com/carenas/git/tree/pcre2-chartables-leakfix
René Scharfe Aug. 9, 2019, 5:01 p.m. UTC | #2
Am 09.08.19 um 13:24 schrieb Carlo Arenas:
> disregard this version, it still broken (and wouldn't even build on
> some cases), the reasons why are still unclear though but at least it
> might
> seem from the last known run in windows that segfaults were prevented
> at last and something was still off enough to trigger a BUG (shouldn't
> be a concern with V6 or later that do PCRE2 migration to NED fully, as
> agreed)

So how about starting stupidly simple?  You can test it everywhere, it
just needs libpcre2.  It works without that library as well (tested with
"make USE_LIBPCRE= USE_LIBPCRE2= test"), but doesn't allocate anything
in that case, of course.  The character tables leak fix should be safe
on top.  If you detect performance issues then we can address them in
additional patches.

-- >8 --
Subject: [PATCH] grep: use xmalloc() for all PCRE2 allocations

Build a PCRE2 global custom context when compiling a pattern and use it
to tell the library to use xmalloc() for allocations.  This provides
consistent out-of-memory handling and makes sure it uses a custom
allocator, e.g. with USE_NED_ALLOCATOR.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
The function names are ridiculously long, I tried to stay within 80
characters per line but gave up in the end and just kept going without
line breaks.  Fits the "stupidly simple" approach..

 grep.c | 23 +++++++++++++++++------
 grep.h |  2 ++
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/grep.c b/grep.c
index cd952ef5d3..44f4e38657 100644
--- a/grep.c
+++ b/grep.c
@@ -482,6 +482,16 @@ static void free_pcre1_regexp(struct grep_pat *p)
 #endif /* !USE_LIBPCRE1 */

 #ifdef USE_LIBPCRE2
+static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
+{
+	return xmalloc(size);
+}
+
+static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data)
+{
+	free(pointer);
+}
+
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
 {
 	int error;
@@ -495,12 +505,12 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt

 	assert(opt->pcre2);

-	p->pcre2_compile_context = NULL;
+	p->pcre2_general_context = pcre2_general_context_create(pcre2_malloc, pcre2_free, NULL);
+	p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context);

 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern)) {
-			character_tables = pcre2_maketables(NULL);
-			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
+			character_tables = pcre2_maketables(p->pcre2_general_context);
 			pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
 		}
 		options |= PCRE2_CASELESS;
@@ -513,7 +523,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 					 p->pcre2_compile_context);

 	if (p->pcre2_pattern) {
-		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
+		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, p->pcre2_general_context);
 		if (!p->pcre2_match_data)
 			die("Couldn't allocate PCRE2 match data");
 	} else {
@@ -550,10 +560,10 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 			return;
 		}

-		p->pcre2_jit_stack = pcre2_jit_stack_create(1, 1024 * 1024, NULL);
+		p->pcre2_jit_stack = pcre2_jit_stack_create(1, 1024 * 1024, p->pcre2_general_context);
 		if (!p->pcre2_jit_stack)
 			die("Couldn't allocate PCRE2 JIT stack");
-		p->pcre2_match_context = pcre2_match_context_create(NULL);
+		p->pcre2_match_context = pcre2_match_context_create(p->pcre2_general_context);
 		if (!p->pcre2_match_context)
 			die("Couldn't allocate PCRE2 match context");
 		pcre2_jit_stack_assign(p->pcre2_match_context, NULL, p->pcre2_jit_stack);
@@ -605,6 +615,7 @@ static void free_pcre2_pattern(struct grep_pat *p)
 	pcre2_match_data_free(p->pcre2_match_data);
 	pcre2_jit_stack_free(p->pcre2_jit_stack);
 	pcre2_match_context_free(p->pcre2_match_context);
+	pcre2_general_context_free(p->pcre2_general_context);
 }
 #else /* !USE_LIBPCRE2 */
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 1875880f37..73b8b87a3a 100644
--- a/grep.h
+++ b/grep.h
@@ -28,6 +28,7 @@ typedef int pcre_jit_stack;
 #else
 typedef int pcre2_code;
 typedef int pcre2_match_data;
+typedef int pcre2_general_context;
 typedef int pcre2_compile_context;
 typedef int pcre2_match_context;
 typedef int pcre2_jit_stack;
@@ -93,6 +94,7 @@ struct grep_pat {
 	int pcre1_jit_on;
 	pcre2_code *pcre2_pattern;
 	pcre2_match_data *pcre2_match_data;
+	pcre2_general_context *pcre2_general_context;
 	pcre2_compile_context *pcre2_compile_context;
 	pcre2_match_context *pcre2_match_context;
 	pcre2_jit_stack *pcre2_jit_stack;
--
2.22.0
Junio C Hamano Aug. 9, 2019, 5:46 p.m. UTC | #3
René Scharfe <l.s.r@web.de> writes:

> So how about starting stupidly simple?  You can test it everywhere, it
> just needs libpcre2.  It works without that library as well (tested with
> "make USE_LIBPCRE= USE_LIBPCRE2= test"), but doesn't allocate anything
> in that case, of course.  The character tables leak fix should be safe
> on top.  If you detect performance issues then we can address them in
> additional patches.
>
> -- >8 --
> Subject: [PATCH] grep: use xmalloc() for all PCRE2 allocations
>
> Build a PCRE2 global custom context when compiling a pattern and use it
> to tell the library to use xmalloc() for allocations.  This provides
> consistent out-of-memory handling and makes sure it uses a custom
> allocator, e.g. with USE_NED_ALLOCATOR.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> The function names are ridiculously long, I tried to stay within 80
> characters per line but gave up in the end and just kept going without
> line breaks.  Fits the "stupidly simple" approach..

;-)  Thanks for keeping the conversation going.
Johannes Schindelin Aug. 9, 2019, 9:26 p.m. UTC | #4
Hi,


On Fri, 9 Aug 2019, René Scharfe wrote:

> Am 09.08.19 um 13:24 schrieb Carlo Arenas:
> > disregard this version, it still broken (and wouldn't even build on
> > some cases), the reasons why are still unclear though but at least it
> > might
> > seem from the last known run in windows that segfaults were prevented
> > at last and something was still off enough to trigger a BUG (shouldn't
> > be a concern with V6 or later that do PCRE2 migration to NED fully, as
> > agreed)
>
> So how about starting stupidly simple?

FWIW I am very much in favor of this approach.

Thanks,
Dscho

> You can test it everywhere, it just needs libpcre2.  It works without
> that library as well (tested with "make USE_LIBPCRE= USE_LIBPCRE2=
> test"), but doesn't allocate anything in that case, of course.  The
> character tables leak fix should be safe on top.  If you detect
> performance issues then we can address them in additional patches.
>
> -- >8 --
> Subject: [PATCH] grep: use xmalloc() for all PCRE2 allocations
>
> Build a PCRE2 global custom context when compiling a pattern and use it
> to tell the library to use xmalloc() for allocations.  This provides
> consistent out-of-memory handling and makes sure it uses a custom
> allocator, e.g. with USE_NED_ALLOCATOR.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
> The function names are ridiculously long, I tried to stay within 80
> characters per line but gave up in the end and just kept going without
> line breaks.  Fits the "stupidly simple" approach..
>
>  grep.c | 23 +++++++++++++++++------
>  grep.h |  2 ++
>  2 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index cd952ef5d3..44f4e38657 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -482,6 +482,16 @@ static void free_pcre1_regexp(struct grep_pat *p)
>  #endif /* !USE_LIBPCRE1 */
>
>  #ifdef USE_LIBPCRE2
> +static void *pcre2_malloc(PCRE2_SIZE size, MAYBE_UNUSED void *memory_data)
> +{
> +	return xmalloc(size);
> +}
> +
> +static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data)
> +{
> +	free(pointer);
> +}
> +
>  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
>  {
>  	int error;
> @@ -495,12 +505,12 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>
>  	assert(opt->pcre2);
>
> -	p->pcre2_compile_context = NULL;
> +	p->pcre2_general_context = pcre2_general_context_create(pcre2_malloc, pcre2_free, NULL);
> +	p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context);
>
>  	if (opt->ignore_case) {
>  		if (has_non_ascii(p->pattern)) {
> -			character_tables = pcre2_maketables(NULL);
> -			p->pcre2_compile_context = pcre2_compile_context_create(NULL);
> +			character_tables = pcre2_maketables(p->pcre2_general_context);
>  			pcre2_set_character_tables(p->pcre2_compile_context, character_tables);
>  		}
>  		options |= PCRE2_CASELESS;
> @@ -513,7 +523,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  					 p->pcre2_compile_context);
>
>  	if (p->pcre2_pattern) {
> -		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, NULL);
> +		p->pcre2_match_data = pcre2_match_data_create_from_pattern(p->pcre2_pattern, p->pcre2_general_context);
>  		if (!p->pcre2_match_data)
>  			die("Couldn't allocate PCRE2 match data");
>  	} else {
> @@ -550,10 +560,10 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  			return;
>  		}
>
> -		p->pcre2_jit_stack = pcre2_jit_stack_create(1, 1024 * 1024, NULL);
> +		p->pcre2_jit_stack = pcre2_jit_stack_create(1, 1024 * 1024, p->pcre2_general_context);
>  		if (!p->pcre2_jit_stack)
>  			die("Couldn't allocate PCRE2 JIT stack");
> -		p->pcre2_match_context = pcre2_match_context_create(NULL);
> +		p->pcre2_match_context = pcre2_match_context_create(p->pcre2_general_context);
>  		if (!p->pcre2_match_context)
>  			die("Couldn't allocate PCRE2 match context");
>  		pcre2_jit_stack_assign(p->pcre2_match_context, NULL, p->pcre2_jit_stack);
> @@ -605,6 +615,7 @@ static void free_pcre2_pattern(struct grep_pat *p)
>  	pcre2_match_data_free(p->pcre2_match_data);
>  	pcre2_jit_stack_free(p->pcre2_jit_stack);
>  	pcre2_match_context_free(p->pcre2_match_context);
> +	pcre2_general_context_free(p->pcre2_general_context);
>  }
>  #else /* !USE_LIBPCRE2 */
>  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
> diff --git a/grep.h b/grep.h
> index 1875880f37..73b8b87a3a 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -28,6 +28,7 @@ typedef int pcre_jit_stack;
>  #else
>  typedef int pcre2_code;
>  typedef int pcre2_match_data;
> +typedef int pcre2_general_context;
>  typedef int pcre2_compile_context;
>  typedef int pcre2_match_context;
>  typedef int pcre2_jit_stack;
> @@ -93,6 +94,7 @@ struct grep_pat {
>  	int pcre1_jit_on;
>  	pcre2_code *pcre2_pattern;
>  	pcre2_match_data *pcre2_match_data;
> +	pcre2_general_context *pcre2_general_context;
>  	pcre2_compile_context *pcre2_compile_context;
>  	pcre2_match_context *pcre2_match_context;
>  	pcre2_jit_stack *pcre2_jit_stack;
> --
> 2.22.0
>
Carlo Marcelo Arenas Belón Aug. 10, 2019, 3:05 a.m. UTC | #5
On Fri, Aug 9, 2019 at 2:26 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> FWIW I am very much in favor of this approach.

FWIW this is (almost) exactly what I had in mind with my first reply,
except I wanted to make setting of the global context be conditioned
to having NED enabled to avoid possible performance issues in
environments were NED is not even usable.

in macOS (obviously testing without NED) the following is the output
of (a hacked version) of p7801 for maint (against chromium's
repository), with René's patch on top

Test                                       HEAD^               HEAD
--------------------------------------------------------------------------------------
7820.1: perl grep 'how.to'                 0.51(0.35+1.11)
0.48(0.33+1.16) -5.9%
7820.2: perl grep '^how to'                0.47(0.33+1.08)
0.45(0.34+1.11) -4.3%
7820.3: perl grep '[how] to'               0.49(0.40+1.11)
0.53(0.41+1.13) +8.2%
7820.4: perl grep '(e.t[^ ]*|v.ry) rare'   68.72(68.77+1.14)
72.10(72.15+1.20) +4.9%
7820.5: perl grep 'm(ú|u)lt.b(æ|y)te'      0.48(0.35+1.12)
0.50(0.40+1.23) +4.2%

and this is with my squashed[2] changed on top of that :

Test                                       HEAD^               HEAD
--------------------------------------------------------------------------------------
7820.1: perl grep 'how.to'                 0.48(0.36+1.16)
0.46(0.33+1.09) -4.2%
7820.2: perl grep '^how to'                0.45(0.34+1.12)
0.42(0.29+0.99) -6.7%
7820.3: perl grep '[how] to'               0.48(0.40+1.13)
0.52(0.43+1.16) +8.3%
7820.4: perl grep '(e.t[^ ]*|v.ry) rare'   69.12(69.10+1.07)
69.19(69.19+1.18) +0.1%
7820.5: perl grep 'm(ú|u)lt.b(æ|y)te'      0.49(0.38+1.17)
0.46(0.35+1.13) -6.1%

the degenerate case is not something we can't fix anyway, since it is
likely a locking issue inside PCRE2 (I see at most 1 CPU doing work),
and the numbers are noisy because of the other problems I mentioned
before (hardcoded to 8 threads, running in a laptop with low number of
cores), which is why testing for performance regressions in windows is
strongly encouraged, regardless

Carlo

[1] https://public-inbox.org/git/CAPUEspgH1v1zo7smzQWCV4rX9pKVKLV84gDSfCPdT7LffQxUWw@mail.gmail.com/
[2] https://public-inbox.org/git/20190810030315.7519-1-carenas@gmail.com/
René Scharfe Aug. 10, 2019, 7:56 a.m. UTC | #6
Am 10.08.19 um 05:05 schrieb Carlo Arenas:
> in macOS (obviously testing without NED) the following is the output
> of (a hacked version) of p7801 for maint (against chromium's
> repository), with René's patch on top

Do you mean p7820?  And what did you change?  Looking at the results you
removed basic and extended from the list of regex engines, right?

Ugh, cloning https://chromium.googlesource.com/chromium/src.git sends
more than 16GB across the wire.  Is that even the right repository?  Not
sure if my machine can keep the relevant parts cached while grepping --
I/O times could drown out any difference due to context allocation and
memory allocator choice.  Let's see...

>
> Test                                       HEAD^               HEAD
> --------------------------------------------------------------------------------------
> 7820.1: perl grep 'how.to'                 0.51(0.35+1.11)
> 0.48(0.33+1.16) -5.9%
> 7820.2: perl grep '^how to'                0.47(0.33+1.08)
> 0.45(0.34+1.11) -4.3%
> 7820.3: perl grep '[how] to'               0.49(0.40+1.11)
> 0.53(0.41+1.13) +8.2%
> 7820.4: perl grep '(e.t[^ ]*|v.ry) rare'   68.72(68.77+1.14)
> 72.10(72.15+1.20) +4.9%
> 7820.5: perl grep 'm(ú|u)lt.b(æ|y)te'      0.48(0.35+1.12)
> 0.50(0.40+1.23) +4.2%
>
> and this is with my squashed[2] changed on top of that :
>
> Test                                       HEAD^               HEAD
> --------------------------------------------------------------------------------------
> 7820.1: perl grep 'how.to'                 0.48(0.36+1.16)
> 0.46(0.33+1.09) -4.2%
> 7820.2: perl grep '^how to'                0.45(0.34+1.12)
> 0.42(0.29+0.99) -6.7%
> 7820.3: perl grep '[how] to'               0.48(0.40+1.13)
> 0.52(0.43+1.16) +8.3%
> 7820.4: perl grep '(e.t[^ ]*|v.ry) rare'   69.12(69.10+1.07)
> 69.19(69.19+1.18) +0.1%
> 7820.5: perl grep 'm(ú|u)lt.b(æ|y)te'      0.49(0.38+1.17)
> 0.46(0.35+1.13) -6.1%
>
> the degenerate case is not something we can't fix anyway, since it is
> likely a locking issue inside PCRE2 (I see at most 1 CPU doing work),
> and the numbers are noisy because of the other problems I mentioned
> before (hardcoded to 8 threads, running in a laptop with low number of
> cores), which is why testing for performance regressions in windows is
> strongly encouraged, regardless
>
> Carlo
>
> [1] https://public-inbox.org/git/CAPUEspgH1v1zo7smzQWCV4rX9pKVKLV84gDSfCPdT7LffQxUWw@mail.gmail.com/
> [2] https://public-inbox.org/git/20190810030315.7519-1-carenas@gmail.com/
>

So I pointed GIT_PERF_LARGE_REPO to the monster mentioned above, ran the
test once for warmup and here are the results of the second run:

Test                                            origin/master       pcre2-xmalloc             pcre2-xmalloc+nedmalloc
---------------------------------------------------------------------------------------------------------------------
7820.1: basic grep 'how.to'                     1.59(2.93+1.75)     1.60(3.04+1.74) +0.6%     1.64(2.87+1.90) +3.1%
7820.2: extended grep 'how.to'                  1.59(2.98+1.66)     1.55(2.83+1.76) -2.5%     1.67(3.15+1.70) +5.0%
7820.3: perl grep 'how.to'                      1.25(1.21+2.13)     1.25(1.24+2.08) +0.0%     1.29(1.32+2.08) +3.2%
7820.5: basic grep '^how to'                    1.52(2.82+1.66)     1.51(2.68+1.77) -0.7%     1.64(3.07+1.69) +7.9%
7820.6: extended grep '^how to'                 1.57(2.84+1.75)     1.51(2.76+1.73) -3.8%     1.61(2.95+1.75) +2.5%
7820.7: perl grep '^how to'                     1.21(1.15+2.10)     1.22(1.26+1.98) +0.8%     1.27(1.22+2.09) +5.0%
7820.9: basic grep '[how] to'                   1.95(4.51+1.68)     1.96(4.48+1.69) +0.5%     2.00(4.66+1.65) +2.6%
7820.10: extended grep '[how] to'               1.96(4.54+1.65)     1.94(4.46+1.70) -1.0%     2.04(4.78+1.65) +4.1%
7820.11: perl grep '[how] to'                   1.29(1.58+1.88)     1.28(1.50+1.94) -0.8%     1.34(1.51+2.06) +3.9%
7820.13: basic grep '\(e.t[^ ]*\|v.ry\) rare'   8.17(13.18+1.50)    8.29(13.36+1.37) +1.5%    8.31(13.33+1.60) +1.7%
7820.14: extended grep '(e.t[^ ]*|v.ry) rare'   8.13(13.03+1.59)    8.14(13.12+1.47) +0.1%    8.30(13.35+1.56) +2.1%
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       34.96(35.80+1.68)   34.99(35.60+1.91) +0.1%   35.18(35.83+1.90) +0.6%
7820.17: basic grep 'm\(ú\|u\)lt.b\(æ\|y\)te'   1.57(3.03+1.64)     1.53(2.76+1.75) -2.5%     1.60(2.89+1.77) +1.9%
7820.18: extended grep 'm(ú|u)lt.b(æ|y)te'      1.52(2.83+1.69)     1.52(2.89+1.63) +0.0%     1.58(2.80+1.84) +3.9%
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          1.20(1.25+2.02)     1.21(1.30+1.96) +0.8%     1.25(1.22+2.11) +4.2%

pcre2-xmalloc is my patch on top of master, +nedmalloc has the warning
fixes I sent earlier and sets USE_NED_MALLOC.

I don't understand why my performance is lower by factor 2.5 than yours
for all perl regex tests except 7820.15 (your 7820.4), where my system
is two times faster.  Debian Testing, GCC 9.1.0, i5-9600K, 16GB RAM.

Anyway, nedmalloc is slower across the board, but the impact of my
patch is in the noise.  Right?

René
Carlo Marcelo Arenas Belón Aug. 10, 2019, 12:40 p.m. UTC | #7
On Sat, Aug 10, 2019 at 12:57 AM René Scharfe <l.s.r@web.de> wrote:
>
> Am 10.08.19 um 05:05 schrieb Carlo Arenas:
> > in macOS (obviously testing without NED) the following is the output
> > of (a hacked version) of p7801 for maint (against chromium's
> > repository), with René's patch on top
>
> Do you mean p7820?  And what did you change?  Looking at the results you
> removed basic and extended from the list of regex engines, right?

correct, that was a weird typo there; apologize for the confusion.
you were correct about my changes; ironically, I didn't spell those
changes out to avoid confusion.

> Ugh, cloning https://chromium.googlesource.com/chromium/src.git sends
> more than 16GB across the wire.  Is that even the right repository?

noticed it was mentioned before by people[1] doing performance testing
on grep specifically and
thought it was better than try to come with another one, the linux
kernel wouldn't work in macOS
because it breaks "run" as it fails to checkout in a case insensitive
filesystem.

> Not
> sure if my machine can keep the relevant parts cached while grepping --
> I/O times could drown out any difference due to context allocation and
> memory allocator choice.  Let's see...

;), try setting /proc/sys/vm/swappiness to 0 or get more RAM

> I don't understand why my performance is lower by factor 2.5 than yours
> for all perl regex tests except 7820.15 (your 7820.4), where my system
> is two times faster.  Debian Testing, GCC 9.1.0, i5-9600K, 16GB RAM.

interesting; did you also see at most 100% of one CPU being used?
yours seem to be faster than mine so this might be representative of
single threaded performance.

> Anyway, nedmalloc is slower across the board, but the impact of my
> patch is in the noise.  Right?

yes, and there is a lot of noise.

Carlo

[1] https://public-inbox.org/git/e72330c6747218545cce1b6b1edfd1e448141a8f.1563570204.git.matheus.bernardino@usp.br/
René Scharfe Aug. 10, 2019, 9:16 p.m. UTC | #8
Am 10.08.19 um 14:40 schrieb Carlo Arenas:
> On Sat, Aug 10, 2019 at 12:57 AM René Scharfe <l.s.r@web.de> wrote:
>> I don't understand why my performance is lower by factor 2.5 than yours
>> for all perl regex tests except 7820.15 (your 7820.4), where my system
>> is two times faster.  Debian Testing, GCC 9.1.0, i5-9600K, 16GB RAM.
>
> interesting; did you also see at most 100% of one CPU being used?

For 7820.15 yes -- 100% of a single core, the rest idling.  The other
tests had only brief sequential phases, if at all, and used all cores.

René