[RFC,v2,00/27] libselinux: rework selabel_file(5) database

Message ID 20230814132025.45364-1-cgzones@googlemail.com

Christian Göttsche Aug. 14, 2023, 1:19 p.m. UTC
Currently the database for the file backend of selabel stores the file
context specifications in a single long array.  This array is sorted by
special precedence rules, e.g. regular expressions without meta
characters first, ordered by length, and the remaining regular
expressions ordered by stem (the prefix part of the regular expressions
without meta characters) length.

This results in suboptimal lookup performance for two reasons:
  - File context specifications without any meta characters (e.g.
    '/etc/passwd') are still matched via an expensive regular expression
    match operation.
  - All such trivial regular expressions are matched against before any
    non-trivial regular expression, resulting in thousands of regex match
    operations for lookups of paths not matching any of the trivial ones.

Rework the internal representation of the database in two ways:
  - Convert regular expressions without any meta characters and containing
    only supported escaped characters (e.g. '/etc/rc\.d/init\.d') into
    literal strings, which get compared via strcmp(3) later on.
  - Store the specifications in a tree structure (since the filesystem is a
    tree) to reduce the number of specifications that need to be checked.
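
The literal conversion can be pictured with the following minimal sketch.  It
is not the code from the patch: the helper name regex_to_literal(), the set of
meta characters and the rule that only escaped meta characters are "supported
escapes" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not libselinux API): try to turn a file context
 * regex into a literal path.  Returns true and stores the unescaped string
 * in `out` if the regex contains no unescaped meta characters and escapes
 * only meta characters.  Returns false otherwise, so the caller keeps
 * treating the specification as a regular expression.
 */
bool regex_to_literal(const char *regex, char *out, size_t outlen)
{
    static const char meta[] = ".^$?*+|[](){}";  /* assumed meta set */
    size_t o = 0;

    assert(regex != NULL && out != NULL);
    if (outlen == 0)
        return false;

    for (const char *p = regex; *p != '\0'; p++) {
        char c = *p;

        if (c == '\\') {
            c = *++p;
            /* Reject a trailing backslash and unsupported escapes. */
            if (c == '\0' || strchr(meta, c) == NULL)
                return false;
        } else if (strchr(meta, c) != NULL) {
            return false;  /* real regex syntax */
        }

        if (o + 1 >= outlen)
            return false;  /* output buffer too small */
        out[o++] = c;
    }

    out[o] = '\0';
    return true;
}
```

With this sketch '/etc/rc\.d/init\.d' becomes the literal '/etc/rc.d/init.d',
which can then be compared via strcmp(3), while '/usr/(.*/)?bin(/.*)?' is
rejected and stays a regular expression.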

Since the internal representation is completely rewritten, introduce a
new compiled file context file format mirroring the tree structure.
The new format also stores all multi-byte data in network byte-order, so
that such compiled files can be cross-compiled, e.g. for embedded
devices with read-only filesystems (except for the regular expressions,
which are still architecture-dependent).
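
As an illustration of what storing multi-byte data in network byte-order
means, here is a minimal sketch.  The helper names put_be32() and get_be32()
are hypothetical and not taken from the patch; only htonl(3) and ntohl(3) are
standard.

```c
#include <arpa/inet.h>  /* htonl(3), ntohl(3) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not the actual libselinux serialization code. */

/* Write a 32-bit value into buf in network (big-endian) byte order. */
void put_be32(unsigned char *buf, uint32_t value)
{
    uint32_t be = htonl(value);

    assert(buf != NULL);
    memcpy(buf, &be, sizeof(be));
}

/* Read a 32-bit value from buf and convert it to host byte order. */
uint32_t get_be32(const unsigned char *buf)
{
    uint32_t be;

    assert(buf != NULL);
    memcpy(&be, buf, sizeof(be));
    return ntohl(be);
}
```

A file written this way on a little-endian build host reads back to the same
values on a big-endian target, which is what makes cross-compiling the
database possible.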

The improved lookup performance will also benefit SELinux-aware daemons
which create files with their default context, e.g. systemd.

#  Performance data

## Compiled file context sizes

Fedora 38 (regular expressions are omitted on Fedora):
    file_contexts.bin:           596783  ->   575284  (bytes)
    file_contexts.homedirs.bin:   21219  ->    18185  (bytes)

Debian Sid (regular expressions are included):
    file_contexts.bin:          2580704  ->  1428354  (bytes)
    file_contexts.homedirs.bin:  130946  ->    96884  (bytes)

## Single lookup

(selabel_lookup -b file -k /bin/bash)

Fedora 38 in VM:
    text:      time:       3.6 ms  ->   4.7 ms
               peak heap:   2.32M  ->    1.44M
               peak rss:    5.61M  ->    6.03M
    compiled:  time:       1.5 ms  ->   1.5 ms
               peak heap:   2.14M  ->  917.93K
               peak rss:    5.33M  ->    5.47M

Debian Sid on Raspberry Pi 3:
    text:      time:      33.9 ms  ->  19.9 ms
               peak heap:  10.46M  ->  468.72K
               peak rss:    9.44M  ->    4.98M
    compiled:  time:      39.3 ms  ->  22.8 ms
               peak heap:  13.09M  ->    1.86M
               peak rss:   12.57M  ->    7.86M

## Full filesystem relabel

(restorecon -vRn /)

Fedora 38 in VM:
      27.445 s  ->   3.293 s
Debian Sid on Raspberry Pi 3:
      86.734 s  ->  10.810 s

(restorecon -vRn -T0 /)

Fedora 38 in VM (8 cores):
      29.205 s  ->   2.521 s
Debian Sid on Raspberry Pi 3 (4 cores):
      46.974 s  ->  10.728 s

(note: I am unsure why the parallel runs on Fedora are slower)

# TODO

There might be subtle differences in lookup results which evaded my
testing, because some precedence rules are obscure.  For example
`/usr/(.*/)?lib(/.*)?` has to have a higher precedence than
`/usr/(.*/)?bin(/.*)?` to match the current Fedora behavior.  Please
report any behavior changes.
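
To see why the relative order of these two specifications matters, the
following sketch checks a path against both patterns.  It uses POSIX extended
regular expressions from regex.h as a stand-in for the regex engine used by
libselinux, and the helper full_match() is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true if `path` matches `pattern` as a fully anchored POSIX ERE. */
bool full_match(const char *pattern, const char *path)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    assert(pattern != NULL && path != NULL);

    /* Anchor the pattern so that it has to match the whole path. */
    n = snprintf(anchored, sizeof(anchored), "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof(anchored))
        return false;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    rc = regexec(&re, path, 0, NULL, 0);
    regfree(&re);

    return rc == 0;
}
```

A path such as /usr/lib/foo/bin/bar is matched by both patterns, so the label
it receives depends entirely on which specification takes precedence.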

If any code section is unclear I am happy to add some inline comments.

The maximum node depth in the database is set to 3, which seems to give
the best ratio of performance to memory usage.  It might need tweaking
for systems with different filesystem hierarchies (Android?).
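
One way to picture the depth limit is the sketch below.  It rests on two
assumptions that may not match the patch: that tree nodes correspond to
leading directory components of a path, and that the last component is the
entry itself rather than a node.  The helper node_prefix_len() is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the length of the prefix of `path` that
 * covers at most `max_depth` leading directory components.  The last
 * component is treated as the entry itself, not as a directory node.
 * Paths are assumed to be absolute and free of duplicate slashes.
 */
size_t node_prefix_len(const char *path, unsigned max_depth)
{
    size_t len = 0;       /* length of the prefix accepted so far */
    unsigned depth = 0;
    const char *p = path;

    assert(path != NULL);

    while (depth < max_depth && *p == '/') {
        const char *next = strchr(p + 1, '/');

        if (next == NULL)
            break;        /* reached the last component */
        len = (size_t)(next - path);
        p = next;
        depth++;
    }

    return len;
}
```

Under these assumptions and a depth of 3, a lookup for
/usr/lib/systemd/system/foo.service would descend to a node for
/usr/lib/systemd and only check the specifications stored along that branch,
while /etc/passwd would stop at the node for /etc.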

I am not that familiar with the selabel_partial_match(3),
selabel_get_digests_all_partial_matches(3) and
selabel_hash_all_partial_matches(3) related interfaces, so I only did
some rudimentary tests for them.


# Patches

Patches 1-4 have been proposed already (this time all commits are signed-off):
https://patchwork.kernel.org/project/selinux/list/?series=772728

Patch 5 has been proposed already:
https://patchwork.kernel.org/project/selinux/patch/20230803162301.302579-1-cgzones@googlemail.com/

Patches 6-24 are cleanups and misc fixes which can be applied on their own.

Patch 25 is the rework.
Because it is a complete rewrite it is too large for the mailing list, so I
added some developers in CC for this one and the patch is available on GitHub
(see below).

Patch 26 removes code left unused after the rework in patch 25.

Patch 27 introduces new fuzzers for selabel_file(5)

This patchset is also available at https://github.com/SELinuxProject/selinux/pull/406


v2:
  - add two fuzzers performing label lookup, one for textual and one for
    compiled fcontext definitions
  - misc fixes uncovered via fuzzing


Christian Göttsche (27):
  libselinux/utils: update selabel_partial_match
  libselinux: misc label cleanup
  libselinux: drop obsolete optimization flag
  libselinux: drop unnecessary warning overrides
  setfiles: do not issue AUDIT_FS_RELABEL on dry run
  libselinux: cast to unsigned char for character handling function
  libselinux: constify selabel_cmp(3) parameters
  libselinux: introduce reallocarray(3)
  libselinux: simplify zeroing allocation
  libselinux: introduce selabel_nuke
  libselinux/utils: use type safe union assignment
  libselinux: avoid regex serialization truncations
  libselinux/utils: introduce selabel_compare
  libselinux: parameter simplifications
  libselinux/utils: use correct type for backend argument
  libselinux: update string_to_mode()
  libselinux: remove SELABEL_OPT_SUBSET support from selabel_file(5)
  libselinux: fix logic for building android backend
  libselinux: avoid unused function
  libselinux: check for stream rewind failures
  libselinux: simplify internal selabel_validate prototype
  libselinux/utils: drop include of internal header file
  libselinux: free elements on read_spec_entries() failure
  libselinux: set errno on label lookup failure
  libselinux: rework selabel_file(5) database
  libselinux: remove unused hashtab code
  libselinux: add selabel_file(5) fuzzer

 libselinux/fuzz/input                         |    0
 .../fuzz/selabel_file_compiled-fuzzer.c       |  279 +++
 libselinux/fuzz/selabel_file_text-fuzzer.c    |  223 ++
 libselinux/include/selinux/label.h            |    6 +-
 libselinux/include/selinux/selinux.h          |    6 +-
 libselinux/src/Makefile                       |   20 +-
 libselinux/src/booleans.c                     |    8 +-
 libselinux/src/compute_create.c               |    2 +-
 libselinux/src/get_context_list.c             |   14 +-
 libselinux/src/get_default_type.c             |    2 +-
 libselinux/src/hashtab.c                      |  234 --
 libselinux/src/hashtab.h                      |  117 -
 libselinux/src/is_customizable_type.c         |    7 +-
 libselinux/src/label.c                        |   40 +-
 libselinux/src/label_backends_android.c       |    9 +-
 libselinux/src/label_file.c                   | 2140 ++++++++++++-----
 libselinux/src/label_file.h                   |  913 ++++---
 libselinux/src/label_internal.h               |   17 +-
 libselinux/src/label_media.c                  |    7 +-
 libselinux/src/label_support.c                |   43 +-
 libselinux/src/label_x.c                      |    7 +-
 libselinux/src/load_policy.c                  |    2 +-
 libselinux/src/matchmediacon.c                |    6 +-
 libselinux/src/matchpathcon.c                 |   17 +-
 libselinux/src/regex.c                        |   57 +-
 .../src/selinux_check_securetty_context.c     |    4 +-
 libselinux/src/selinux_config.c               |   12 +-
 libselinux/src/selinux_internal.c             |   16 +
 libselinux/src/selinux_internal.h             |    4 +
 libselinux/src/selinux_restorecon.c           |    3 +-
 libselinux/src/seusers.c                      |    6 +-
 libselinux/utils/.gitignore                   |    2 +
 libselinux/utils/matchpathcon.c               |   11 +-
 libselinux/utils/sefcontext_compile.c         |  536 +++--
 libselinux/utils/selabel_compare.c            |  119 +
 libselinux/utils/selabel_digest.c             |    3 +-
 .../selabel_get_digests_all_partial_matches.c |    2 -
 libselinux/utils/selabel_lookup.c             |    3 +-
 libselinux/utils/selabel_nuke.c               |  134 ++
 libselinux/utils/selabel_partial_match.c      |    7 +-
 libselinux/utils/selinux_check_access.c       |    2 +-
 policycoreutils/setfiles/setfiles.c           |   16 +-
 scripts/oss-fuzz.sh                           |   25 +
 43 files changed, 3434 insertions(+), 1647 deletions(-)
 create mode 100644 libselinux/fuzz/input
 create mode 100644 libselinux/fuzz/selabel_file_compiled-fuzzer.c
 create mode 100644 libselinux/fuzz/selabel_file_text-fuzzer.c
 delete mode 100644 libselinux/src/hashtab.c
 delete mode 100644 libselinux/src/hashtab.h
 create mode 100644 libselinux/utils/selabel_compare.c
 create mode 100644 libselinux/utils/selabel_nuke.c