diff mbox series

[2/3] libselinux: improve performance with pcre matches

Message ID 20230123014047.84911-3-carenas@gmail.com (mailing list archive)
State Changes Requested
Headers show
Series improve performance of pcre matches | expand

Commit Message

Carlo Marcelo Arenas Belón Jan. 23, 2023, 1:40 a.m. UTC
Since 30b3e9d2 (libselinux: Workaround for heap overhead of pcre,
2023-01-12), performance of PCRE2 matches has been affected due to
excesive recreation of the match_data in an attempt to reduce memory
utilization; instead of a workaround, it would be better to address
the problem and maybe even improve performance in the process.

The issue is that currently the structure that holds PCRE state has
both a pcre2_code (which is per pattern) and a pcre2_match_data (which
is per match), forcing us to add a mutex to prevent multiple matches to
step on each other.

Lets remove the match_data and the mutex and instead allocate one once
in a thread independent way that could be used and reused, by extending
our pthread interface to not only store TLS variables but also retrieve
them, and then use one of those.

Since we are not interested on the capture groups (if any) lets only
allocate 1 pair which is all that will be needed and change the logic
so that a return of 0 (which means the pattern matched but there were
not enough capture spots) is also considered a match.

This will ensure that the memory use would be bound to the number of
concurrent matches instead of the number of patterns and therefore
reduce the impact that recent changes on the way that the frames used
for matching are allocated might had brough since 10.41 was released.

For cases where threads are not available, just keep it working in slow
mode as done before the workaround was reverted.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 libselinux/src/regex.c            | 108 +++++++++++++++++-------------
 libselinux/src/selinux_internal.h |   4 ++
 2 files changed, 64 insertions(+), 48 deletions(-)

Comments

Eric Biggers March 22, 2023, 4:25 a.m. UTC | #1
Hi Carlo,

On Mon, Jan 23, 2023 at 01:40:46AM +0000, Carlo Marcelo Arenas Belón wrote:
> Since 30b3e9d2 (libselinux: Workaround for heap overhead of pcre,
> 2023-01-12), performance of PCRE2 matches has been affected due to
> excesive recreation of the match_data in an attempt to reduce memory
> utilization; instead of a workaround, it would be better to address
> the problem and maybe even improve performance in the process.
> 
> The issue is that currently the structure that holds PCRE state has
> both a pcre2_code (which is per pattern) and a pcre2_match_data (which
> is per match), forcing us to add a mutex to prevent multiple matches to
> step on each other.
> 
> Lets remove the match_data and the mutex and instead allocate one once
> in a thread independent way that could be used and reused, by extending
> our pthread interface to not only store TLS variables but also retrieve
> them, and then use one of those.
> 
> Since we are not interested on the capture groups (if any) lets only
> allocate 1 pair which is all that will be needed and change the logic
> so that a return of 0 (which means the pattern matched but there were
> not enough capture spots) is also considered a match.
> 
> This will ensure that the memory use would be bound to the number of
> concurrent matches instead of the number of patterns and therefore
> reduce the impact that recent changes on the way that the frames used
> for matching are allocated might had brough since 10.41 was released.
> 
> For cases where threads are not available, just keep it working in slow
> mode as done before the workaround was reverted.
> 
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  libselinux/src/regex.c            | 108 +++++++++++++++++-------------
>  libselinux/src/selinux_internal.h |   4 ++
>  2 files changed, 64 insertions(+), 48 deletions(-)
> 

Thanks for writing this patch!  I notice it hasn't been applied upstream.  Is
this still being worked on?

A couple comments below:

> +static void __attribute__((destructor)) match_data_thread_free(void *key)
> +{
> +	void *value;
> +	pcre2_match_data *match_data;
> +
> +	if (match_data_key_initialized <= 0 || !match_data_initialized)
> +		return;
> +
> +	value = __selinux_getspecific(match_data_key);
> +	match_data = value ? value : key;
> +
> +	pcre2_match_data_free(match_data);
> +
> +	__pthread_mutex_lock(&key_mutex);
> +	if (--match_data_key_initialized == 1) {
> +		__selinux_key_delete(match_data_key);
> +		match_data_key_initialized = -1;
> +	}
> +	__pthread_mutex_unlock(&key_mutex);
> +}

This function is used as a pthread_key destructor.  But, it's also marked with
__attribute__((destructor)), making it a library-level destructor too.  That's
confusing.  Is there a reason for that?  I would have expected these to be two
different functions.  The pthread_key destructor should call
pcre2_match_data_free on the value passed as an argument, while the
library-level destructor should delete match_data_key.

>  int regex_match(struct regex_data *regex, char const *subject, int partial)
>  {
>  	int rc;
> -	pcre2_match_data *match_data;
> -	__pthread_mutex_lock(&regex->match_mutex);
> +	bool slow;
> +	pcre2_match_data *match_data = NULL;
> +
> +	if (match_data_key_initialized > 0) {
> +		if (match_data_initialized == 0) {
> +			match_data = pcre2_match_data_create(1, NULL);
> +			if (match_data) {
> +				match_data_initialized = 1;
> +				__selinux_setspecific(match_data_key,
> +							match_data);
> +				__pthread_mutex_lock(&key_mutex);
> +				match_data_key_initialized++;
> +				__pthread_mutex_unlock(&key_mutex);
> +			}
> +		} else
> +			match_data = __selinux_getspecific(match_data_key);
> +	}

Since match_data_key_initialized is checked without holding key_mutex, can't the
above code race with the below code in match_data_thread_free?

        __pthread_mutex_lock(&key_mutex);
        if (--match_data_key_initialized == 1) {
                __selinux_key_delete(match_data_key);
                match_data_key_initialized = -1;
        }
        __pthread_mutex_unlock(&key_mutex);

Perhaps deleting the pthread_key is just not something that should be done at
all?

- Eric
Stephen Smalley July 21, 2023, 5:13 p.m. UTC | #2
On Wed, Mar 22, 2023 at 12:39 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Hi Carlo,
>
> On Mon, Jan 23, 2023 at 01:40:46AM +0000, Carlo Marcelo Arenas Belón wrote:
> > Since 30b3e9d2 (libselinux: Workaround for heap overhead of pcre,
> > 2023-01-12), performance of PCRE2 matches has been affected due to
> > excesive recreation of the match_data in an attempt to reduce memory
> > utilization; instead of a workaround, it would be better to address
> > the problem and maybe even improve performance in the process.
> >
> > The issue is that currently the structure that holds PCRE state has
> > both a pcre2_code (which is per pattern) and a pcre2_match_data (which
> > is per match), forcing us to add a mutex to prevent multiple matches to
> > step on each other.
> >
> > Lets remove the match_data and the mutex and instead allocate one once
> > in a thread independent way that could be used and reused, by extending
> > our pthread interface to not only store TLS variables but also retrieve
> > them, and then use one of those.
> >
> > Since we are not interested on the capture groups (if any) lets only
> > allocate 1 pair which is all that will be needed and change the logic
> > so that a return of 0 (which means the pattern matched but there were
> > not enough capture spots) is also considered a match.
> >
> > This will ensure that the memory use would be bound to the number of
> > concurrent matches instead of the number of patterns and therefore
> > reduce the impact that recent changes on the way that the frames used
> > for matching are allocated might had brough since 10.41 was released.
> >
> > For cases where threads are not available, just keep it working in slow
> > mode as done before the workaround was reverted.
> >
> > Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> > ---
> >  libselinux/src/regex.c            | 108 +++++++++++++++++-------------
> >  libselinux/src/selinux_internal.h |   4 ++
> >  2 files changed, 64 insertions(+), 48 deletions(-)
> >
>
> Thanks for writing this patch!  I notice it hasn't been applied upstream.  Is
> this still being worked on?
>
> A couple comments below:
>
> > +static void __attribute__((destructor)) match_data_thread_free(void *key)
> > +{
> > +     void *value;
> > +     pcre2_match_data *match_data;
> > +
> > +     if (match_data_key_initialized <= 0 || !match_data_initialized)
> > +             return;
> > +
> > +     value = __selinux_getspecific(match_data_key);
> > +     match_data = value ? value : key;
> > +
> > +     pcre2_match_data_free(match_data);
> > +
> > +     __pthread_mutex_lock(&key_mutex);
> > +     if (--match_data_key_initialized == 1) {
> > +             __selinux_key_delete(match_data_key);
> > +             match_data_key_initialized = -1;
> > +     }
> > +     __pthread_mutex_unlock(&key_mutex);
> > +}
>
> This function is used as a pthread_key destructor.  But, it's also marked with
> __attribute__((destructor)), making it a library-level destructor too.  That's
> confusing.  Is there a reason for that?  I would have expected these to be two
> different functions.  The pthread_key destructor should call
> pcre2_match_data_free on the value passed as an argument, while the
> library-level destructor should delete match_data_key.
>
> >  int regex_match(struct regex_data *regex, char const *subject, int partial)
> >  {
> >       int rc;
> > -     pcre2_match_data *match_data;
> > -     __pthread_mutex_lock(&regex->match_mutex);
> > +     bool slow;
> > +     pcre2_match_data *match_data = NULL;
> > +
> > +     if (match_data_key_initialized > 0) {
> > +             if (match_data_initialized == 0) {
> > +                     match_data = pcre2_match_data_create(1, NULL);
> > +                     if (match_data) {
> > +                             match_data_initialized = 1;
> > +                             __selinux_setspecific(match_data_key,
> > +                                                     match_data);
> > +                             __pthread_mutex_lock(&key_mutex);
> > +                             match_data_key_initialized++;
> > +                             __pthread_mutex_unlock(&key_mutex);
> > +                     }
> > +             } else
> > +                     match_data = __selinux_getspecific(match_data_key);
> > +     }
>
> Since match_data_key_initialized is checked without holding key_mutex, can't the
> above code race with the below code in match_data_thread_free?
>
>         __pthread_mutex_lock(&key_mutex);
>         if (--match_data_key_initialized == 1) {
>                 __selinux_key_delete(match_data_key);
>                 match_data_key_initialized = -1;
>         }
>         __pthread_mutex_unlock(&key_mutex);
>
> Perhaps deleting the pthread_key is just not something that should be done at
> all?

This patch appears to have been applied to AOSP here:
https://android-review.googlesource.com/c/platform/external/selinux/+/2411194
but I didn't see any reply to Eric's comments above. Is this still
being worked on?
diff mbox series

Patch

diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
index 16df6790..54f24026 100644
--- a/libselinux/src/regex.c
+++ b/libselinux/src/regex.c
@@ -30,6 +30,11 @@ 
 #endif
 
 #ifdef USE_PCRE2
+static pthread_key_t match_data_key;
+static int match_data_key_initialized = -1;
+static pthread_mutex_t key_mutex = PTHREAD_MUTEX_INITIALIZER;
+static __thread char match_data_initialized;
+
 char const *regex_arch_string(void)
 {
 	static char arch_string_buffer[32];
@@ -60,14 +65,6 @@  char const *regex_arch_string(void)
 
 struct regex_data {
 	pcre2_code *regex; /* compiled regular expression */
-#ifndef AGGRESSIVE_FREE_AFTER_REGEX_MATCH
-	/*
-	 * match data block required for the compiled
-	 * pattern in pcre2
-	 */
-	pcre2_match_data *match_data;
-#endif
-	pthread_mutex_t match_mutex;
 };
 
 int regex_prepare_data(struct regex_data **regex, char const *pattern_string,
@@ -86,13 +83,6 @@  int regex_prepare_data(struct regex_data **regex, char const *pattern_string,
 		goto err;
 	}
 
-#ifndef AGGRESSIVE_FREE_AFTER_REGEX_MATCH
-	(*regex)->match_data =
-	    pcre2_match_data_create_from_pattern((*regex)->regex, NULL);
-	if (!(*regex)->match_data) {
-		goto err;
-	}
-#endif
 	return 0;
 
 err:
@@ -142,13 +132,6 @@  int regex_load_mmap(struct mmap_area *mmap_area, struct regex_data **regex,
 		if (rc != 1)
 			goto err;
 
-#ifndef AGGRESSIVE_FREE_AFTER_REGEX_MATCH
-		(*regex)->match_data =
-		    pcre2_match_data_create_from_pattern((*regex)->regex, NULL);
-		if (!(*regex)->match_data)
-			goto err;
-#endif
-
 		*regex_compiled = true;
 	}
 
@@ -204,18 +187,32 @@  out:
 	return rc;
 }
 
+static void __attribute__((destructor)) match_data_thread_free(void *key)
+{
+	void *value;
+	pcre2_match_data *match_data;
+
+	if (match_data_key_initialized <= 0 || !match_data_initialized)
+		return;
+
+	value = __selinux_getspecific(match_data_key);
+	match_data = value ? value : key;
+
+	pcre2_match_data_free(match_data);
+
+	__pthread_mutex_lock(&key_mutex);
+	if (--match_data_key_initialized == 1) {
+		__selinux_key_delete(match_data_key);
+		match_data_key_initialized = -1;
+	}
+	__pthread_mutex_unlock(&key_mutex);
+}
+
 void regex_data_free(struct regex_data *regex)
 {
 	if (regex) {
 		if (regex->regex)
 			pcre2_code_free(regex->regex);
-
-#ifndef AGGRESSIVE_FREE_AFTER_REGEX_MATCH
-		if (regex->match_data)
-			pcre2_match_data_free(regex->match_data);
-#endif
-
-		__pthread_mutex_destroy(&regex->match_mutex);
 		free(regex);
 	}
 }
@@ -223,32 +220,40 @@  void regex_data_free(struct regex_data *regex)
 int regex_match(struct regex_data *regex, char const *subject, int partial)
 {
 	int rc;
-	pcre2_match_data *match_data;
-	__pthread_mutex_lock(&regex->match_mutex);
+	bool slow;
+	pcre2_match_data *match_data = NULL;
+
+	if (match_data_key_initialized > 0) {
+		if (match_data_initialized == 0) {
+			match_data = pcre2_match_data_create(1, NULL);
+			if (match_data) {
+				match_data_initialized = 1;
+				__selinux_setspecific(match_data_key,
+							match_data);
+				__pthread_mutex_lock(&key_mutex);
+				match_data_key_initialized++;
+				__pthread_mutex_unlock(&key_mutex);
+			}
+		} else
+			match_data = __selinux_getspecific(match_data_key);
+	}
 
-#ifdef AGGRESSIVE_FREE_AFTER_REGEX_MATCH
-	match_data = pcre2_match_data_create_from_pattern(
-	    regex->regex, NULL);
-	if (match_data == NULL) {
-		__pthread_mutex_unlock(&regex->match_mutex);
-		return REGEX_ERROR;
+	slow = (match_data_key_initialized <= 0 || match_data == NULL);
+	if (slow) {
+		match_data = pcre2_match_data_create_from_pattern(regex->regex,
+									NULL);
+		if (!match_data)
+			return REGEX_ERROR;
 	}
-#else
-	match_data = regex->match_data;
-#endif
 
 	rc = pcre2_match(
 	    regex->regex, (PCRE2_SPTR)subject, PCRE2_ZERO_TERMINATED, 0,
 	    partial ? PCRE2_PARTIAL_SOFT : 0, match_data, NULL);
 
-#ifdef AGGRESSIVE_FREE_AFTER_REGEX_MATCH
-	// pcre2_match allocates heap and it won't be freed until
-	// pcre2_match_data_free, resulting in heap overhead.
-	pcre2_match_data_free(match_data);
-#endif
+	if (slow)
+		pcre2_match_data_free(match_data);
 
-	__pthread_mutex_unlock(&regex->match_mutex);
-	if (rc > 0)
+	if (rc >= 0)
 		return REGEX_MATCH;
 	switch (rc) {
 	case PCRE2_ERROR_PARTIAL:
@@ -290,7 +295,14 @@  struct regex_data *regex_data_create(void)
 	if (!regex_data)
 		return NULL;
 
-	__pthread_mutex_init(&regex_data->match_mutex, NULL);
+	__pthread_mutex_lock(&key_mutex);
+	if (match_data_key_initialized < 0) {
+		match_data_key_initialized = !__selinux_key_create(
+							&match_data_key,
+							match_data_thread_free);
+	}
+	__pthread_mutex_unlock(&key_mutex);
+
 	return regex_data;
 }
 
diff --git a/libselinux/src/selinux_internal.h b/libselinux/src/selinux_internal.h
index 06f2c038..d1e6c50f 100644
--- a/libselinux/src/selinux_internal.h
+++ b/libselinux/src/selinux_internal.h
@@ -13,6 +13,7 @@  extern int selinux_page_size ;
 #pragma weak pthread_key_create
 #pragma weak pthread_key_delete
 #pragma weak pthread_setspecific
+#pragma weak pthread_getspecific
 
 /* Call handler iff the first call.  */
 #define __selinux_once(ONCE_CONTROL, INIT_FUNCTION)	\
@@ -41,6 +42,9 @@  extern int selinux_page_size ;
 			pthread_setspecific(KEY, VALUE);	\
 	} while (0)
 
+#define __selinux_getspecific(KEY)				\
+	(pthread_getspecific != NULL ? pthread_getspecific(KEY) : NULL)
+
 /* selabel_lookup() is only thread safe if we're compiled with pthreads */
 
 #pragma weak pthread_mutex_init