diff mbox series

libselinux: Workaround for heap overhead of pcre

Message ID 20221219085336.391225-1-inseob@google.com (mailing list archive)
State Superseded, archived
Delegated to: Jason Zaman
Headers show
Series libselinux: Workaround for heap overhead of pcre | expand

Commit Message

Inseob Kim Dec. 19, 2022, 8:53 a.m. UTC
pcre's behavior is changed so that pcre2_match always allocates heap for
match_data, rather than stack, regardless of size. The heap isn't freed
until explicitly calling pcre2_match_data_free. This new behavior may
result in heap overhead, which may increase the peak memory usage about
a few megabytes. It's because regex_match is first called for regex_data
objects, and then regex_data objects are freed at once.

To workaround it, free and reallocate match_data whenever we call
regex_match. It's fine because libselinux currently doesn't use
match_data, but use only the return value.

Signed-off-by: Inseob Kim <inseob@google.com>
---
 libselinux/src/regex.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Christian Göttsche Dec. 20, 2022, 3:02 p.m. UTC | #1
On Mon, 19 Dec 2022 at 09:53, Inseob Kim <inseob@google.com> wrote:
>
> pcre's behavior is changed so that pcre2_match always allocates heap for
> match_data, rather than stack, regardless of size. The heap isn't freed
> until explicitly calling pcre2_match_data_free. This new behavior may
> result in heap overhead, which may increase the peak memory usage about
> a few megabytes. It's because regex_match is first called for regex_data
> objects, and then regex_data objects are freed at once.

This approach trades peak memory usage for temporary allocations,
which effects runtime performance.  On modern systems memory is most
of the time not a scarce resource.

Some examples:

# selabel_lookup -b file -k /etc/shadow -t file [heaptrack]

## current

total runtime: 0.07s.
calls to allocation functions: 28420 (406000/s)
temporary memory allocations: 16 (228/s)
peak heap memory consumption: 10.09M
peak RSS (including heaptrack overhead): 21.27M
total memory leaked: 1.02K

## proposed

total runtime: 0.06s.
calls to allocation functions: 23430 (366093/s)
temporary memory allocations: 675 (10546/s)
peak heap memory consumption: 9.48M
peak RSS (including heaptrack overhead): 18.59M
total memory leaked: 1.02K

# restorecon -vRn /etc [heaptrack]

## current

total runtime: 0.14s.
calls to allocation functions: 33873 (236874/s)
temporary memory allocations: 1877 (13125/s)
peak heap memory consumption: 10.09M
peak RSS (including heaptrack overhead): 21.58M
total memory leaked: 1.90K

## proposed

total runtime: 0.27s.
calls to allocation functions: 378762 (1423917/s)
temporary memory allocations: 351487 (1321379/s)
peak heap memory consumption: 9.48M
peak RSS (including heaptrack overhead): 20.99M
total memory leaked: 1.90K


# restorecon -vRn /usr [hyperfine]

## current

restorecon -vRn /usr
Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
  Time (mean ± σ):     24.419 s ±  0.661 s    [User: 23.480 s, System: 0.922 s]
  Range (min … max):   23.399 s … 25.495 s    10 runs

## proposed

restorecon -vRn /usr
Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
  Time (mean ± σ):     28.628 s ±  0.968 s    [User: 27.688 s, System: 0.927 s]
  Range (min … max):   27.674 s … 30.798 s    10 runs


So I would argue the performance impact for applications (like
setfiles, restorecon) or daemon (like systemd, udev) is more critical
than the 500K per application.

> To workaround it, free and reallocate match_data whenever we call
> regex_match. It's fine because libselinux currently doesn't use
> match_data, but use only the return value.
>
> Signed-off-by: Inseob Kim <inseob@google.com>
> ---
>  libselinux/src/regex.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
> index 149a7973..2df282f1 100644
> --- a/libselinux/src/regex.c
> +++ b/libselinux/src/regex.c
> @@ -213,10 +213,20 @@ void regex_data_free(struct regex_data *regex)
>  int regex_match(struct regex_data *regex, char const *subject, int partial)
>  {
>         int rc;
> +       pcre2_match_data *new_match_data;
>         __pthread_mutex_lock(&regex->match_mutex);
> +       new_match_data = pcre2_match_data_create_from_pattern(
> +           regex->regex, NULL);

Should be checked for failure (cause pcre2_match() expects a non-NULL
match_data, which would be passed the second time).

Also with this change the member match_data of the struct regex_data
becomes obsolete and should be removed.

>         rc = pcre2_match(
>             regex->regex, (PCRE2_SPTR)subject, PCRE2_ZERO_TERMINATED, 0,
>             partial ? PCRE2_PARTIAL_SOFT : 0, regex->match_data, NULL);
> +       // pcre2_match allocates heap and it won't be freed until
> +       // pcre2_match_data_free, resulting in heap overhead.
> +       // Reallocate match_data to prevent such overhead, whenever possible.
> +       if (new_match_data) {
> +               pcre2_match_data_free(regex->match_data);
> +               regex->match_data = new_match_data;
> +       }
>         __pthread_mutex_unlock(&regex->match_mutex);
>         if (rc > 0)
>                 return REGEX_MATCH;
> --
> 2.39.0.314.g84b9a713c41-goog
>
Inseob Kim Dec. 21, 2022, 2:53 a.m. UTC | #2
On Wed, Dec 21, 2022 at 12:02 AM Christian Göttsche
<cgzones@googlemail.com> wrote:
>
> On Mon, 19 Dec 2022 at 09:53, Inseob Kim <inseob@google.com> wrote:
> >
> > pcre's behavior is changed so that pcre2_match always allocates heap for
> > match_data, rather than stack, regardless of size. The heap isn't freed
> > until explicitly calling pcre2_match_data_free. This new behavior may
> > result in heap overhead, which may increase the peak memory usage about
> > a few megabytes. It's because regex_match is first called for regex_data
> > objects, and then regex_data objects are freed at once.
>
> This approach trades peak memory usage for temporary allocations,
> which effects runtime performance.  On modern systems memory is most
> of the time not a scarce resource.
>
> Some examples:
>
> # selabel_lookup -b file -k /etc/shadow -t file [heaptrack]
>
> ## current
>
> total runtime: 0.07s.
> calls to allocation functions: 28420 (406000/s)
> temporary memory allocations: 16 (228/s)
> peak heap memory consumption: 10.09M
> peak RSS (including heaptrack overhead): 21.27M
> total memory leaked: 1.02K
>
> ## proposed
>
> total runtime: 0.06s.
> calls to allocation functions: 23430 (366093/s)
> temporary memory allocations: 675 (10546/s)
> peak heap memory consumption: 9.48M
> peak RSS (including heaptrack overhead): 18.59M
> total memory leaked: 1.02K
>
> # restorecon -vRn /etc [heaptrack]
>
> ## current
>
> total runtime: 0.14s.
> calls to allocation functions: 33873 (236874/s)
> temporary memory allocations: 1877 (13125/s)
> peak heap memory consumption: 10.09M
> peak RSS (including heaptrack overhead): 21.58M
> total memory leaked: 1.90K
>
> ## proposed
>
> total runtime: 0.27s.
> calls to allocation functions: 378762 (1423917/s)
> temporary memory allocations: 351487 (1321379/s)
> peak heap memory consumption: 9.48M
> peak RSS (including heaptrack overhead): 20.99M
> total memory leaked: 1.90K
>
>
> # restorecon -vRn /usr [hyperfine]
>
> ## current
>
> restorecon -vRn /usr
> Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
>   Time (mean ± σ):     24.419 s ±  0.661 s    [User: 23.480 s, System: 0.922 s]
>   Range (min … max):   23.399 s … 25.495 s    10 runs
>
> ## proposed
>
> restorecon -vRn /usr
> Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
>   Time (mean ± σ):     28.628 s ±  0.968 s    [User: 27.688 s, System: 0.927 s]
>   Range (min … max):   27.674 s … 30.798 s    10 runs
>
>
> So I would argue the performance impact for applications (like
> setfiles, restorecon) or daemon (like systemd, udev) is more critical
> than the 500K per application.

I observed about 3~4MB increase on Android device. Which pcre2 version
are you using? Does it include
https://github.com/PCRE2Project/pcre2/commit/d90fb238#diff-15ec3f4ed916f52c810daf305702985dda6d8d45e7ce22e2f309c95bd6ef32b7R74
?

And if this is difficult to apply, how about adding a new flag e.g.
AGGRESSIVE_FREE_AFTER_REGEX_MATCH ?

>
> > To workaround it, free and reallocate match_data whenever we call
> > regex_match. It's fine because libselinux currently doesn't use
> > match_data, but use only the return value.
> >
> > Signed-off-by: Inseob Kim <inseob@google.com>
> > ---
> >  libselinux/src/regex.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
> > index 149a7973..2df282f1 100644
> > --- a/libselinux/src/regex.c
> > +++ b/libselinux/src/regex.c
> > @@ -213,10 +213,20 @@ void regex_data_free(struct regex_data *regex)
> >  int regex_match(struct regex_data *regex, char const *subject, int partial)
> >  {
> >         int rc;
> > +       pcre2_match_data *new_match_data;
> >         __pthread_mutex_lock(&regex->match_mutex);
> > +       new_match_data = pcre2_match_data_create_from_pattern(
> > +           regex->regex, NULL);
>
> Should be checked for failure (cause pcre2_match() expects a non-NULL
> match_data, which would be passed the second time).
>
> Also with this change the member match_data of the struct regex_data
> becomes obsolete and should be removed.

Thanks, this makes sense.

>
> >         rc = pcre2_match(
> >             regex->regex, (PCRE2_SPTR)subject, PCRE2_ZERO_TERMINATED, 0,
> >             partial ? PCRE2_PARTIAL_SOFT : 0, regex->match_data, NULL);
> > +       // pcre2_match allocates heap and it won't be freed until
> > +       // pcre2_match_data_free, resulting in heap overhead.
> > +       // Reallocate match_data to prevent such overhead, whenever possible.
> > +       if (new_match_data) {
> > +               pcre2_match_data_free(regex->match_data);
> > +               regex->match_data = new_match_data;
> > +       }
> >         __pthread_mutex_unlock(&regex->match_mutex);
> >         if (rc > 0)
> >                 return REGEX_MATCH;
> > --
> > 2.39.0.314.g84b9a713c41-goog
> >
Christian Göttsche Dec. 21, 2022, 2:19 p.m. UTC | #3
On Wed, 21 Dec 2022 at 03:53, Inseob Kim <inseob@google.com> wrote:
>
> On Wed, Dec 21, 2022 at 12:02 AM Christian Göttsche
> <cgzones@googlemail.com> wrote:
> >
> > On Mon, 19 Dec 2022 at 09:53, Inseob Kim <inseob@google.com> wrote:
> > >
> > > pcre's behavior is changed so that pcre2_match always allocates heap for
> > > match_data, rather than stack, regardless of size. The heap isn't freed
> > > until explicitly calling pcre2_match_data_free. This new behavior may
> > > result in heap overhead, which may increase the peak memory usage about
> > > a few megabytes. It's because regex_match is first called for regex_data
> > > objects, and then regex_data objects are freed at once.
> >
> > This approach trades peak memory usage for temporary allocations,
> > which effects runtime performance.  On modern systems memory is most
> > of the time not a scarce resource.
> >
> > Some examples:
> >
> > # selabel_lookup -b file -k /etc/shadow -t file [heaptrack]
> >
> > ## current
> >
> > total runtime: 0.07s.
> > calls to allocation functions: 28420 (406000/s)
> > temporary memory allocations: 16 (228/s)
> > peak heap memory consumption: 10.09M
> > peak RSS (including heaptrack overhead): 21.27M
> > total memory leaked: 1.02K
> >
> > ## proposed
> >
> > total runtime: 0.06s.
> > calls to allocation functions: 23430 (366093/s)
> > temporary memory allocations: 675 (10546/s)
> > peak heap memory consumption: 9.48M
> > peak RSS (including heaptrack overhead): 18.59M
> > total memory leaked: 1.02K
> >
> > # restorecon -vRn /etc [heaptrack]
> >
> > ## current
> >
> > total runtime: 0.14s.
> > calls to allocation functions: 33873 (236874/s)
> > temporary memory allocations: 1877 (13125/s)
> > peak heap memory consumption: 10.09M
> > peak RSS (including heaptrack overhead): 21.58M
> > total memory leaked: 1.90K
> >
> > ## proposed
> >
> > total runtime: 0.27s.
> > calls to allocation functions: 378762 (1423917/s)
> > temporary memory allocations: 351487 (1321379/s)
> > peak heap memory consumption: 9.48M
> > peak RSS (including heaptrack overhead): 20.99M
> > total memory leaked: 1.90K
> >
> >
> > # restorecon -vRn /usr [hyperfine]
> >
> > ## current
> >
> > restorecon -vRn /usr
> > Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
> >   Time (mean ± σ):     24.419 s ±  0.661 s    [User: 23.480 s, System: 0.922 s]
> >   Range (min … max):   23.399 s … 25.495 s    10 runs
> >
> > ## proposed
> >
> > restorecon -vRn /usr
> > Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
> >   Time (mean ± σ):     28.628 s ±  0.968 s    [User: 27.688 s, System: 0.927 s]
> >   Range (min … max):   27.674 s … 30.798 s    10 runs
> >
> >
> > So I would argue the performance impact for applications (like
> > setfiles, restorecon) or daemon (like systemd, udev) is more critical
> > than the 500K per application.
>
> I observed about 3~4MB increase on Android device. Which pcre2 version
> are you using? Does it include
> https://github.com/PCRE2Project/pcre2/commit/d90fb238#diff-15ec3f4ed916f52c810daf305702985dda6d8d45e7ce22e2f309c95bd6ef32b7R74
> ?

Those measurements were done against pcre2 version 10.40.
Indeed with version 10.42 the peak memory usage of the current
implementation has grown and is reduced by this patch:


# selabel_lookup -b file -k /etc/shadow -t file [heaptrack]

## current

total runtime: 0.07s.
calls to allocation functions: 29080 (421449/s)
temporary memory allocations: 15 (217/s)
peak heap memory consumption: 22.72M
peak RSS (including heaptrack overhead): 18.68M
total memory leaked: 1.02K

## proposed

total runtime: 0.06s.
calls to allocation functions: 24090 (376406/s)
temporary memory allocations: 675 (10546/s)
peak heap memory consumption: 9.48M
peak RSS (including heaptrack overhead): 22.15M
total memory leaked: 1.02K


# restorecon -vRn /etc [heaptrack]

## current

total runtime: 0.13s.
calls to allocation functions: 34720 (259104/s)
temporary memory allocations: 1872 (13970/s)
peak heap memory consumption: 26.66M
peak RSS (including heaptrack overhead): 25.59M
total memory leaked: 1.90K

## proposed

total runtime: 0.43s.
calls to allocation functions: 729301 (1692113/s)
temporary memory allocations: 351486 (815512/s)
peak heap memory consumption: 9.48M
peak RSS (including heaptrack overhead): 20.98M
total memory leaked: 1.90K


# restorecon -vRn /usr [hyperfine]

## current

Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
  Time (mean ± σ):     31.053 s ±  1.491 s    [User: 29.926 s, System: 1.105 s]
  Range (min … max):   28.817 s … 33.679 s    10 runs

## proposed

Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
  Time (mean ± σ):     37.829 s ±  0.416 s    [User: 36.935 s, System: 0.875 s]
  Range (min … max):   37.205 s … 38.764 s    10 runs


>
> And if this is difficult to apply, how about adding a new flag e.g.
> AGGRESSIVE_FREE_AFTER_REGEX_MATCH ?
>
> >
> > > To workaround it, free and reallocate match_data whenever we call
> > > regex_match. It's fine because libselinux currently doesn't use
> > > match_data, but use only the return value.
> > >
> > > Signed-off-by: Inseob Kim <inseob@google.com>
> > > ---
> > >  libselinux/src/regex.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
> > > index 149a7973..2df282f1 100644
> > > --- a/libselinux/src/regex.c
> > > +++ b/libselinux/src/regex.c
> > > @@ -213,10 +213,20 @@ void regex_data_free(struct regex_data *regex)
> > >  int regex_match(struct regex_data *regex, char const *subject, int partial)
> > >  {
> > >         int rc;
> > > +       pcre2_match_data *new_match_data;
> > >         __pthread_mutex_lock(&regex->match_mutex);
> > > +       new_match_data = pcre2_match_data_create_from_pattern(
> > > +           regex->regex, NULL);
> >
> > Should be checked for failure (cause pcre2_match() expects a non-NULL
> > match_data, which would be passed the second time).
> >
> > Also with this change the member match_data of the struct regex_data
> > becomes obsolete and should be removed.
>
> Thanks, this makes sense.
>
> >
> > >         rc = pcre2_match(
> > >             regex->regex, (PCRE2_SPTR)subject, PCRE2_ZERO_TERMINATED, 0,
> > >             partial ? PCRE2_PARTIAL_SOFT : 0, regex->match_data, NULL);
> > > +       // pcre2_match allocates heap and it won't be freed until
> > > +       // pcre2_match_data_free, resulting in heap overhead.
> > > +       // Reallocate match_data to prevent such overhead, whenever possible.
> > > +       if (new_match_data) {
> > > +               pcre2_match_data_free(regex->match_data);
> > > +               regex->match_data = new_match_data;
> > > +       }
> > >         __pthread_mutex_unlock(&regex->match_mutex);
> > >         if (rc > 0)
> > >                 return REGEX_MATCH;
> > > --
> > > 2.39.0.314.g84b9a713c41-goog
> > >
>
>
>
> --
> Inseob Kim | Software Engineer | inseob@google.com
diff mbox series

Patch

diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
index 149a7973..2df282f1 100644
--- a/libselinux/src/regex.c
+++ b/libselinux/src/regex.c
@@ -213,10 +213,20 @@  void regex_data_free(struct regex_data *regex)
 int regex_match(struct regex_data *regex, char const *subject, int partial)
 {
 	int rc;
+	pcre2_match_data *new_match_data;
 	__pthread_mutex_lock(&regex->match_mutex);
+	new_match_data = pcre2_match_data_create_from_pattern(
+	    regex->regex, NULL);
 	rc = pcre2_match(
 	    regex->regex, (PCRE2_SPTR)subject, PCRE2_ZERO_TERMINATED, 0,
 	    partial ? PCRE2_PARTIAL_SOFT : 0, regex->match_data, NULL);
+	// pcre2_match allocates heap and it won't be freed until
+	// pcre2_match_data_free, resulting in heap overhead.
+	// Reallocate match_data to prevent such overhead, whenever possible.
+	if (new_match_data) {
+		pcre2_match_data_free(regex->match_data);
+		regex->match_data = new_match_data;
+	}
 	__pthread_mutex_unlock(&regex->match_mutex);
 	if (rc > 0)
 		return REGEX_MATCH;