diff mbox series

[RFC] libselinux: disable capturing in fcontext matching

Message ID 20240108120029.38816-1-cgzones@googlemail.com (mailing list archive)
State New
Delegated to: Petr Lautrbach
Headers show
Series [RFC] libselinux: disable capturing in fcontext matching | expand

Commit Message

Christian Göttsche Jan. 8, 2024, noon UTC
The path of a file context definition is compared as a regular
expression against actual pathnames.  Those definitions make frequently
use of groups, like `(/.*)?`, which are capturing by default, causing
the regex engine to extract and save the matched input.  Matching
context definitions against pathnames only cares about whether it's a
match or not, potential captures are never accessed.

Compile regular expressions (in the default PCRE2 variant) with the flag
PCRE2_NO_AUTO_CAPTURE to turn captured groups automatically into non
captured ones, like `(/.*)?` into `(?:/.*)?`.  This saves some cycles
during lookup operations (~1.5%).

Only potential regression would be the advanced usage of backreferences
or recursion/subroutine calls to numbered captures, which would need an
update to use named captures instead.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
---
 libselinux/src/regex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

James Carter Feb. 5, 2024, 7:07 p.m. UTC | #1
On Mon, Jan 8, 2024 at 7:00 AM Christian Göttsche
<cgzones@googlemail.com> wrote:
>
> The path of a file context definition is compared as a regular
> expression against actual pathnames.  Those definitions make frequently
> use of groups, like `(/.*)?`, which are capturing by default, causing
> the regex engine to extract and save the matched input.  Matching
> context definitions against pathnames only cares about whether it's a
> match or not, potential captures are never accessed.
>
> Compile regular expressions (in the default PCRE2 variant) with the flag
> PCRE2_NO_AUTO_CAPTURE to turn captured groups automatically into non
> captured ones, like `(/.*)?` into `(?:/.*)?`.  This saves some cycles
> during lookup operations (~1.5%).
>
> Only potential regression would be the advanced usage of backreferences
> or recursion/subroutine calls to numbered captures, which would need an
> update to use named captures instead.
>

I don't think a ~1.5% improvement is going to be worth the potential
for regressions. I don't know of any policy that makes use of capture
groups, but there probably is someone somewhere who does.

That being said, if others are strongly for it, I can be persuaded.

Jim


> Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> ---
>  libselinux/src/regex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
> index 88d82fed..87423c48 100644
> --- a/libselinux/src/regex.c
> +++ b/libselinux/src/regex.c
> @@ -80,7 +80,7 @@ int regex_prepare_data(struct regex_data **regex, char const *pattern_string,
>                 return -1;
>
>         (*regex)->regex = pcre2_compile(
> -           (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL,
> +           (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL | PCRE2_NO_AUTO_CAPTURE,
>             &errordata->error_code, &errordata->error_offset, NULL);
>         if (!(*regex)->regex) {
>                 goto err;
> --
> 2.43.0
>
>
diff mbox series

Patch

diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
index 88d82fed..87423c48 100644
--- a/libselinux/src/regex.c
+++ b/libselinux/src/regex.c
@@ -80,7 +80,7 @@  int regex_prepare_data(struct regex_data **regex, char const *pattern_string,
 		return -1;
 
 	(*regex)->regex = pcre2_compile(
-	    (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL,
+	    (PCRE2_SPTR)pattern_string, PCRE2_ZERO_TERMINATED, PCRE2_DOTALL | PCRE2_NO_AUTO_CAPTURE,
 	    &errordata->error_code, &errordata->error_offset, NULL);
 	if (!(*regex)->regex) {
 		goto err;