diff mbox series

[v5,2/3] selinux: fix conditional avtab slot hint

Message ID 20231103172953.24667-3-jsatterfield.linux@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Paul Moore
Headers show
Series selinux: avtab arrays and refactors | expand

Commit Message

Jacob Satterfield Nov. 3, 2023, 5:29 p.m. UTC
Due to how conditional rules are written in the binary policy, the
code responsible for loading does not know how many conditional rules
there are before creating the avtab structure. Instead, it uses the
number of elements in the non-conditional avtab as a hint and allocates
the hash table based on it. In the refpolicy and default Fedora policy,
the actual sizes of these tables are not similar (~85k vs ~10k) thereby
creating more slots than needed and resulting in wasted memory.

This patch introduces a two-pass algorithm to calculate the conditional
rule count before allocating the avtab nodes array. Albeit with a slight
performance penalty in reading a portion of the binary policy twice,
this causes the number of hash slots for the conditional array to become
4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
results in a net savings of 224 KB of heap memory.

Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
---
 security/selinux/ss/avtab.c       | 15 ++++++++++--
 security/selinux/ss/avtab.h       |  2 +-
 security/selinux/ss/conditional.c | 38 ++++++++++++++++++++-----------
 security/selinux/ss/conditional.h |  2 +-
 4 files changed, 40 insertions(+), 17 deletions(-)

Comments

Stephen Smalley Nov. 3, 2023, 6:22 p.m. UTC | #1
On Fri, Nov 3, 2023 at 1:30 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> Due to how conditional rules are written in the binary policy, the
> code responsible for loading does not know how many conditional rules
> there are before creating the avtab structure. Instead, it uses the
> number of elements in the non-conditional avtab as a hint and allocates
> the hash table based on it. In the refpolicy and default Fedora policy,
> the actual sizes of these tables are not similar (~85k vs ~10k) thereby
> creating more slots than needed and resulting in wasted memory.
>
> This patch introduces a two-pass algorithm to calculate the conditional
> rule count before allocating the avtab nodes array. Albeit with a slight
> performance penalty in reading a portion of the binary policy twice,
> this causes the number of hash slots for the conditional array to become
> 4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
> results in a net savings of 224 KB of heap memory.
>
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>

Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Paul Moore Nov. 21, 2023, 1:29 a.m. UTC | #2
On Nov  3, 2023 Jacob Satterfield <jsatterfield.linux@gmail.com> wrote:
> 
> Due to how conditional rules are written in the binary policy, the
> code responsible for loading does not know how many conditional rules
> there are before creating the avtab structure. Instead, it uses the
> number of elements in the non-conditional avtab as a hint and allocates
> the hash table based on it. In the refpolicy and default Fedora policy,
> the actual sizes of these tables are not similar (~85k vs ~10k) thereby
> creating more slots than needed and resulting in wasted memory.
> 
> This patch introduces a two-pass algorithm to calculate the conditional
> rule count before allocating the avtab nodes array. Albeit with a slight
> performance penalty in reading a portion of the binary policy twice,
> this causes the number of hash slots for the conditional array to become
> 4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
> results in a net savings of 224 KB of heap memory.
> 
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
>  security/selinux/ss/avtab.c       | 15 ++++++++++--
>  security/selinux/ss/avtab.h       |  2 +-
>  security/selinux/ss/conditional.c | 38 ++++++++++++++++++++-----------
>  security/selinux/ss/conditional.h |  2 +-
>  4 files changed, 40 insertions(+), 17 deletions(-)

...

> diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
> index 81ff676f209a..810319bf0e60 100644
> --- a/security/selinux/ss/conditional.c
> +++ b/security/selinux/ss/conditional.c
> @@ -407,16 +408,17 @@ static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
>  			return -EINVAL;
>  	}
>  
> -	rc = cond_read_av_list(p, fp, &node->true_list, NULL);
> +	rc = cond_read_av_list(p, fp, &node->true_list, NULL, nrules);
>  	if (rc)
>  		return rc;
> -	return cond_read_av_list(p, fp, &node->false_list, &node->true_list);
> +	return cond_read_av_list(p, fp, &node->false_list, &node->true_list, nrules);
>  }
>  
> -int cond_read_list(struct policydb *p, void *fp)
> +int cond_read_list(struct policydb *p, struct policy_file *fp)
>  {
>  	__le32 buf[1];
> -	u32 i, len;
> +	struct policy_file tmp_fp;
> +	u32 i, len, nrules;
>  	int rc;
>  
>  	rc = next_entry(buf, fp, sizeof(buf));
> @@ -428,15 +430,25 @@ int cond_read_list(struct policydb *p, void *fp)
>  	p->cond_list = kcalloc(len, sizeof(*p->cond_list), GFP_KERNEL);
>  	if (!p->cond_list)
>  		return -ENOMEM;
> +	p->cond_list_len = len;
> +
> +	/* first pass to only calculate the avrule count */
> +	tmp_fp = *fp;
> +	nrules = 0;
> +	for (i = 0; i < p->cond_list_len; i++) {
> +		rc = cond_read_node(p, &p->cond_list[i], &tmp_fp, &nrules);
> +		if (rc)
> +			goto err;
> +		cond_node_destroy(&p->cond_list[i]);
> +	}

I'm a concerned about all the work we have to do just to count the
conditional rules.  Other than not working with existing binary
policies, have you looked at bumping the policy version and introducing
a binary format change that would include the number of conditional
rules?

--
paul-moore.com
Jacob Satterfield Nov. 22, 2023, 5:35 p.m. UTC | #3
On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Nov  3, 2023 Jacob Satterfield <jsatterfield.linux@gmail.com> wrote:
> >
> > Due to how conditional rules are written in the binary policy, the
> > code responsible for loading does not know how many conditional rules
> > there are before creating the avtab structure. Instead, it uses the
> > number of elements in the non-conditional avtab as a hint and allocates
> > the hash table based on it. In the refpolicy and default Fedora policy,
> > the actual sizes of these tables are not similar (~85k vs ~10k) thereby
> > creating more slots than needed and resulting in wasted memory.
> >
> > This patch introduces a two-pass algorithm to calculate the conditional
> > rule count before allocating the avtab nodes array. Albeit with a slight
> > performance penalty in reading a portion of the binary policy twice,
> > this causes the number of hash slots for the conditional array to become
> > 4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
> > results in a net savings of 224 KB of heap memory.
> >
> > Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > ---
> >  security/selinux/ss/avtab.c       | 15 ++++++++++--
> >  security/selinux/ss/avtab.h       |  2 +-
> >  security/selinux/ss/conditional.c | 38 ++++++++++++++++++++-----------
> >  security/selinux/ss/conditional.h |  2 +-
> >  4 files changed, 40 insertions(+), 17 deletions(-)
>
> ...
>
> > diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
> > index 81ff676f209a..810319bf0e60 100644
> > --- a/security/selinux/ss/conditional.c
> > +++ b/security/selinux/ss/conditional.c
> > @@ -407,16 +408,17 @@ static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
> >                       return -EINVAL;
> >       }
> >
> > -     rc = cond_read_av_list(p, fp, &node->true_list, NULL);
> > +     rc = cond_read_av_list(p, fp, &node->true_list, NULL, nrules);
> >       if (rc)
> >               return rc;
> > -     return cond_read_av_list(p, fp, &node->false_list, &node->true_list);
> > +     return cond_read_av_list(p, fp, &node->false_list, &node->true_list, nrules);
> >  }
> >
> > -int cond_read_list(struct policydb *p, void *fp)
> > +int cond_read_list(struct policydb *p, struct policy_file *fp)
> >  {
> >       __le32 buf[1];
> > -     u32 i, len;
> > +     struct policy_file tmp_fp;
> > +     u32 i, len, nrules;
> >       int rc;
> >
> >       rc = next_entry(buf, fp, sizeof(buf));
> > @@ -428,15 +430,25 @@ int cond_read_list(struct policydb *p, void *fp)
> >       p->cond_list = kcalloc(len, sizeof(*p->cond_list), GFP_KERNEL);
> >       if (!p->cond_list)
> >               return -ENOMEM;
> > +     p->cond_list_len = len;
> > +
> > +     /* first pass to only calculate the avrule count */
> > +     tmp_fp = *fp;
> > +     nrules = 0;
> > +     for (i = 0; i < p->cond_list_len; i++) {
> > +             rc = cond_read_node(p, &p->cond_list[i], &tmp_fp, &nrules);
> > +             if (rc)
> > +                     goto err;
> > +             cond_node_destroy(&p->cond_list[i]);
> > +     }
>
> I'm a concerned about all the work we have to do just to count the
> conditional rules.  Other than not working with existing binary
> policies, have you looked at bumping the policy version and introducing
> a binary format change that would include the number of conditional
> rules?
>
> --
> paul-moore.com

Thanks for raising the issue. I had considered adding the total size
of the conditional table to the binary policy, but I wasn't sure if it
would be substantive enough to warrant bumping the policy version. As
you point out, the counting work will be needed for existing binary
policies making this patch necessary for the default case, but if you
are concerned about the performance penalty this patch brings (which
is less than the gain provided by the avtab array patch), then there
are two threads to possibly be worked on.

One is to rework this patch to include more invasive changes to count
rules without actually reading and destroying nodes thus saving cycles
but requiring more lines of code. Because policy parsing is not
handled separately from the construction of the policydb structure
(they are deeply intertwined), I was reluctant to add more complexity
just to have a parse-only code path. Would you prefer speed or simpler
logic for older policies?

The other is to obviously sum and add the total size of the
conditional rules table to the binary policy in cond_write_list() of
libsepol/src/write.c. If you think just adding the total conditional
rules table size is worth bumping the kernel policy version, then I
can prioritize sending a patch to libsepol for inclusion as soon as I
can.
Paul Moore Nov. 22, 2023, 6:14 p.m. UTC | #4
On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
> On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > I'm a concerned about all the work we have to do just to count the
> > conditional rules.  Other than not working with existing binary
> > policies, have you looked at bumping the policy version and introducing
> > a binary format change that would include the number of conditional
> > rules?
>
> Thanks for raising the issue. I had considered adding the total size
> of the conditional table to the binary policy, but I wasn't sure if it
> would be substantive enough to warrant bumping the policy version.

I appreciate the thoughtfulness, but the version field in the policy
is 32-bits and we are currently up to policy format v33 so we've still
got a few version numbers to play with :)

> As
> you point out, the counting work will be needed for existing binary
> policies making this patch necessary for the default case, but if you
> are concerned about the performance penalty this patch brings (which
> is less than the gain provided by the avtab array patch), then there
> are two threads to possibly be worked on.

To be clear, I'm not thinking about supporting this for existing
policies, just the new policy format; the existing policy versions
would behave as they do now ... although if we do the array conversion
we will likely need to do some type of realloc()/retry or something, I
dunno, I'll leave that to you to brainstorm ;)

> One is to rework this patch to include more invasive changes to count
> rules without actually reading and destroying nodes thus saving cycles
> but requiring more lines of code. Because policy parsing is not
> handled separately from the construction of the policydb structure
> (they are deeply intertwined), I was reluctant to add more complexity
> just to have a parse-only code path. Would you prefer speed or simpler
> logic for older policies?

That's the problem we have right now.  We have to do a lot of work
(allocations, etc.) that we throw away in the case where we are
counting, not to mention that bolting on the count-only functionality
is kinda hacky/ugly (not your fault, that is just the way the code is
right now).  As you mention, the alternative is to significantly
rework how we parse/load the policy, and that isn't a very exciting
prospect as far as I'm concerned.

I'm not sure if moving over to flex array is a win, I suspect that
whatever we gain in memory savings we lose in not having direct
access.  I dunno, maybe it wouldn't be too bad.

I'm open to ideas here ... I'm looking for something that would
support the improvements for a new policy with an explicit count,
while still falling back to something that works "reasonably well" in
the current case where we have to guess.  In this case "reasonably
well" means no worse than current in terms of performance and memory
use while not over complicating the code.
Jacob Satterfield Nov. 29, 2023, 5:39 p.m. UTC | #5
On Wed, Nov 22, 2023 at 6:15 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield
> <jsatterfield.linux@gmail.com> wrote:
> > On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > I'm a concerned about all the work we have to do just to count the
> > > conditional rules.  Other than not working with existing binary
> > > policies, have you looked at bumping the policy version and introducing
> > > a binary format change that would include the number of conditional
> > > rules?
> >
> > Thanks for raising the issue. I had considered adding the total size
> > of the conditional table to the binary policy, but I wasn't sure if it
> > would be substantive enough to warrant bumping the policy version.
>
> I appreciate the thoughtfulness, but the version field in the policy
> is 32-bits and we are currently up to policy format v33 so we've still
> got a few version numbers to play with :)
>

Sounds good, I will implement the policy version bump.

> > As
> > you point out, the counting work will be needed for existing binary
> > policies making this patch necessary for the default case, but if you
> > are concerned about the performance penalty this patch brings (which
> > is less than the gain provided by the avtab array patch), then there
> > are two threads to possibly be worked on.
>
> To be clear, I'm not thinking about supporting this for existing
> policies, just the new policy format; the existing policy versions
> would behave as they do now ... although if we do the array conversion
> we will likely need to do some type of realloc()/retry or something, I
> dunno, I'll leave that to you to brainstorm ;)
>

I still believe the performance gained by converting to arrays
outweighs the cost of supporting older versions. See the discussion
below.

> > One is to rework this patch to include more invasive changes to count
> > rules without actually reading and destroying nodes thus saving cycles
> > but requiring more lines of code. Because policy parsing is not
> > handled separately from the construction of the policydb structure
> > (they are deeply intertwined), I was reluctant to add more complexity
> > just to have a parse-only code path. Would you prefer speed or simpler
> > logic for older policies?
>
> That's the problem we have right now.  We have to do a lot of work
> (allocations, etc.) that we throw away in the case where we are
> counting, not to mention that bolting on the count-only functionality
> is kinda hacky/ugly (not your fault, that is just the way the code is
> right now).  As you mention, the alternative is to significantly
> rework how we parse/load the policy, and that isn't a very exciting
> prospect as far as I'm concerned.
>
> I'm not sure if moving over to flex array is a win, I suspect that
> whatever we gain in memory savings we lose in not having direct
> access.  I dunno, maybe it wouldn't be too bad.
>

Unless I misunderstand, I believe I addressed your concerns from v1
over using indices vs direct access to the arrays. From v2 --> v5, the
array conversion patch only changes how avtab_node elements are
allocated, access is the same as it is now.

> I'm open to ideas here ... I'm looking for something that would
> support the improvements for a new policy with an explicit count,
> while still falling back to something that works "reasonably well" in
> the current case where we have to guess.  In this case "reasonably
> well" means no worse than current in terms of performance and memory
> use while not over complicating the code.
>

For existing policies, I have given this a bit of thought and did some
performance testing. There are memory vs. runtime tradeoffs to
consider between counting the rules (like v5 does) and doing a realloc
(like v1 does). I'm of the opinion that the counting approach is
better overall, but I'll (r)enumerate the tradeoffs here for the
benefit of discussion.

For the realloc approach, a decent chunk of memory (currently 256 KB)
is wasted over-allocating the hash table slots because of the estimate
derived from the regular rules table. Extra memory for avtab_nodes are
allocated (either from overestimation or the dynamic growth factor)
during parsing, sometimes up to 1 MB, requiring a final realloc to
remove them. The growth factor chosen could have additional runtime
implications with regard to multiple reallocs. Regardless, this
approach has runtime costs and memory lost to overestimating the
hashtable slots needed (which in this case, can't be recovered except
through a rehashing of the whole table). 86 lines of code are added,
but it is fairly clean code.

The counting approach incurs a runtime penalty processing the
conditional rules twice. Memory is wasted constructing the struct
cond_node lists and corresponding struct cond_av_list elements twice,
but nowhere close to 1 MB. 23 lines of code are added, but it is as
you said hacky/ugly. The runtime penalty could be further optimized at
the cost of parsing code complexity and additional lines of code as
discussed earlier.

Some performance numbers from Fedora 38 running "perf stat -r 1000 -d
load_policy" (realloc is the v5 array conversion + v1 realloc
patches):

dev:       0.261174 seconds time elapsed ( +- 0.31% ) baseline
v5:         0.233972 seconds time elapsed ( +- 0.31% ) -10%
realloc:  0.226636 seconds time elapsed ( +- 0.34% ) -13.3%

Therefore, to your criteria stated above, both approaches are
empirically no worse than current in terms of performance or memory
used after load. But I believe the current counting approach is the
most reasonable given the lines of code added and the least amount of
memory used during parsing. For what it's worth, I have been
interested in designing a set of patches that would ultimately
refactor policy parsing from policy loading and would, as a side
effect, enable a simpler and more optimized counting approach for
older policies. If this is something you would entertain (even if it's
not exciting), then I will consider it for a future patch.

If you agree, I will introduce the policy version bump in the next
spin and also submit a patch to sepol. Otherwise, I can go back to the
drawing board.

Thanks for reviewing as always!






> --
> paul-moore.com
Paul Moore Dec. 7, 2023, 12:11 a.m. UTC | #6
On Wed, Nov 29, 2023 at 12:39 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> On Wed, Nov 22, 2023 at 6:15 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield
> > <jsatterfield.linux@gmail.com> wrote:
> > > On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > I'm a concerned about all the work we have to do just to count the
> > > > conditional rules.  Other than not working with existing binary
> > > > policies, have you looked at bumping the policy version and introducing
> > > > a binary format change that would include the number of conditional
> > > > rules?
> > >
> > > Thanks for raising the issue. I had considered adding the total size
> > > of the conditional table to the binary policy, but I wasn't sure if it
> > > would be substantive enough to warrant bumping the policy version.
> >
> > I appreciate the thoughtfulness, but the version field in the policy
> > is 32-bits and we are currently up to policy format v33 so we've still
> > got a few version numbers to play with :)
>
> Sounds good, I will implement the policy version bump.
>
> > > As
> > > you point out, the counting work will be needed for existing binary
> > > policies making this patch necessary for the default case, but if you
> > > are concerned about the performance penalty this patch brings (which
> > > is less than the gain provided by the avtab array patch), then there
> > > are two threads to possibly be worked on.
> >
> > To be clear, I'm not thinking about supporting this for existing
> > policies, just the new policy format; the existing policy versions
> > would behave as they do now ... although if we do the array conversion
> > we will likely need to do some type of realloc()/retry or something, I
> > dunno, I'll leave that to you to brainstorm ;)
>
> I still believe the performance gained by converting to arrays
> outweighs the cost of supporting older versions. See the discussion
> below.

Annoying as it may be at times, we have an obligation to continue to
support older policy versions in the kernel.  We can't break
existing/old userspace applications or SELinux policies, period.

This doesn't mean that we have to ensure that every performance
improvement applies to the existing/old userspace; if we have an
improvement that only applies on newer policy versions, that's okay,
we just need to make sure we don't significantly penalize the existing
policies in use today.

> > > One is to rework this patch to include more invasive changes to count
> > > rules without actually reading and destroying nodes thus saving cycles
> > > but requiring more lines of code. Because policy parsing is not
> > > handled separately from the construction of the policydb structure
> > > (they are deeply intertwined), I was reluctant to add more complexity
> > > just to have a parse-only code path. Would you prefer speed or simpler
> > > logic for older policies?
> >
> > That's the problem we have right now.  We have to do a lot of work
> > (allocations, etc.) that we throw away in the case where we are
> > counting, not to mention that bolting on the count-only functionality
> > is kinda hacky/ugly (not your fault, that is just the way the code is
> > right now).  As you mention, the alternative is to significantly
> > rework how we parse/load the policy, and that isn't a very exciting
> > prospect as far as I'm concerned.
> >
> > I'm not sure if moving over to flex array is a win, I suspect that
> > whatever we gain in memory savings we lose in not having direct
> > access.  I dunno, maybe it wouldn't be too bad.
>
> Unless I misunderstand, I believe I addressed your concerns from v1
> over using indices vs direct access to the arrays. From v2 --> v5, the
> array conversion patch only changes how avtab_node elements are
> allocated, access is the same as it is now.

I probably wasn't as clear as I should have been, I'm sorry about
that.  My comment about flex arrays was in reference to a dynamic
array mechanism in the kernel known as "flex arrays", my thinking was
that we could use flex arrays to manage the size of the conditional
rules arrays with just a single pass.  While flex arrays were dynamic
and resizable, there was an obvious limitation in that direct access,
similar to conventional fixed arrays, wasn't always possible.
However, while looking up a pointer to the flex array documentation to
share in this email I now realize that flex arrays were removed from
the kernel some time ago and are no longer available ... oof :/

Looking around quickly, I see we now have xarrays, which are also
resizable, but I have zero experience with them and they may be a bit
too heavyweight for what we need.

There is also kvrealloc(), but that doesn't seem to be especially
clever, using the traditional alloc-n-copy based approach.  Who knows,
maybe that is okay since we would only be doing this on policy loads,
which should hopefully be relatively infrequent.

> > I'm open to ideas here ... I'm looking for something that would
> > support the improvements for a new policy with an explicit count,
> > while still falling back to something that works "reasonably well" in
> > the current case where we have to guess.  In this case "reasonably
> > well" means no worse than current in terms of performance and memory
> > use while not over complicating the code.
>
> For existing policies, I have given this a bit of thought and did some
> performance testing. There are memory vs. runtime tradeoffs to
> consider between counting the rules (like v5 does) and doing a realloc
> (like v1 does). I'm of the opinion that the counting approach is
> better overall, but I'll (r)enumerate the tradeoffs here for the
> benefit of discussion.

Thanks, I have to jump around enough that a refresher is always welcome :)

> For the realloc approach, a decent chunk of memory (currently 256 KB)
> is wasted over-allocating the hash table slots because of the estimate
> derived from the regular rules table. Extra memory for avtab_nodes are
> allocated (either from overestimation or the dynamic growth factor)
> during parsing, sometimes up to 1 MB, requiring a final realloc to
> remove them. The growth factor chosen could have additional runtime
> implications with regard to multiple reallocs. Regardless, this
> approach has runtime costs and memory lost to overestimating the
> hashtable slots needed (which in this case, can't be recovered except
> through a rehashing of the whole table). 86 lines of code are added,
> but it is fairly clean code.

I went back and looked at the v1 patches very quickly, I didn't want
to spend too much time there as there were a number of changes,
unrelated to the array/alloc code, that meant this early patchset
wasn't very applicable to our discussion.  However, one thing did jump
out as I was looking for the realloc code: you might want to consider
rewriting avtab_change_nodes_size() to use kvrealloc().  The change
should be trivial and it is SELinux specific code; the only drawback
is that the "shrink" operation is effectively a no-op.

I'm okay if we need to "waste" a reasonably small amount of memory due
to dynamic reallocs; we just need to be smart about balancing the
incremental size increases with the desire to moderate the number of
times we call kvrealloc().

> The counting approach incurs a runtime penalty processing the
> conditional rules twice. Memory is wasted constructing the struct
> cond_node lists and corresponding struct cond_av_list elements twice,
> but nowhere close to 1 MB. 23 lines of code are added, but it is as
> you said hacky/ugly. The runtime penalty could be further optimized at
> the cost of parsing code complexity and additional lines of code as
> discussed earlier.

I can't bring myself to merge something that takes the two-pass
approach to counting, I have this gut feeling that it is something we
are going to regret in the future.  I'm sorry.  It isn't the number of
lines of code that concern me, it's how the code is put together and
architected.  A kvrealloc() based approach is simply going to be
cleaner than a two-pass, alloc-free-alloc approach.  The kvrealloc()
solution may not be optimal in terms of memory allocated(), and policy
loads might be a bit slower due to mm issues, but it is *much* easier
to look at the code and convince yourself (and others, including
future you) that it is correct ... which is worth a *lot* in my mind.

> Some performance numbers from Fedora 38 running "perf stat -r 1000 -d
> load_policy" (realloc is the v5 array conversion + v1 realloc
> patches):
>
> dev:       0.261174 seconds time elapsed ( +- 0.31% ) baseline
> v5:         0.233972 seconds time elapsed ( +- 0.31% ) -10%
> realloc:  0.226636 seconds time elapsed ( +- 0.34% ) -13.3%
>
> Therefore, to your criteria stated above, both approaches are
> empirically no worse than current in terms of performance or memory
> used after load.

Sweet, so even the realloc approach is faster on policy load?  Awesome :)

> But I believe the current counting approach is the
> most reasonable given the lines of code added and the least amount of
> memory used during parsing.

I've tried, perhaps poorly, to explain why the two-pass approach isn't
appealing from my perspective.

> For what it's worth, I have been
> interested in designing a set of patches that would ultimately
> refactor policy parsing from policy loading and would, as a side
> effect, enable a simpler and more optimized counting approach for
> older policies. If this is something you would entertain (even if it's
> not exciting), then I will consider it for a future patch.

Possibly.  Let's get past this first and then you can explain a bit
more about what you had in mind ;)

> If you agree, I will introduce the policy version bump in the next
> spin and also submit a patch to sepol. Otherwise, I can go back to the
> drawing board.

I think I would like to see two things, either in one patchset or two,
that's up to you:

* A kvrealloc/xa based approach that works with existing policy
versions and offers improvements over what we have now in terms or
either performance or memory usage while not regressing in both.

* A new binary policy version that adds an explicit count that can be
used to properly allocate the necessary internal policy structs up
front and avoid reallocation.

Does that make sense and sound reasonable to you?

> Thanks for reviewing as always!

Thanks for your continued patience and enthusiasm :)
diff mbox series

Patch

diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 697eb4352439..a9227674899b 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -340,7 +340,7 @@  static const uint16_t spec_order[] = {
 int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 		    int (*insertf)(struct avtab *a, const struct avtab_key *k,
 				   const struct avtab_datum *d, void *p),
-		    void *p)
+		    void *p, u32 *nrules)
 {
 	__le16 buf16[4];
 	u16 enabled;
@@ -414,6 +414,11 @@  int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 			if (val & spec_order[i]) {
 				key.specified = spec_order[i] | enabled;
 				datum.u.data = le32_to_cpu(buf32[items++]);
+				/* first pass of conditional table read */
+				if (nrules) {
+					(*nrules)++;
+					continue;
+				}
 				rc = insertf(a, &key, &datum, p);
 				if (rc)
 					return rc;
@@ -492,6 +497,12 @@  int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 		pr_err("SELinux: avtab: invalid type\n");
 		return -EINVAL;
 	}
+
+	/* first pass of conditional table read */
+	if (nrules) {
+		(*nrules)++;
+		return 0;
+	}
 	return insertf(a, &key, &datum, p);
 }
 
@@ -525,7 +536,7 @@  int avtab_read(struct avtab *a, void *fp, struct policydb *pol)
 		goto bad;
 
 	for (i = 0; i < nel; i++) {
-		rc = avtab_read_item(a, fp, pol, avtab_insertf, NULL);
+		rc = avtab_read_item(a, fp, pol, avtab_insertf, NULL, NULL);
 		if (rc) {
 			if (rc == -ENOMEM)
 				pr_err("SELinux: avtab: out of memory\n");
diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
index 3c3904bf02b0..86fb6f793eec 100644
--- a/security/selinux/ss/avtab.h
+++ b/security/selinux/ss/avtab.h
@@ -104,7 +104,7 @@  struct policydb;
 int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 		    int (*insert)(struct avtab *a, const struct avtab_key *k,
 				  const struct avtab_datum *d, void *p),
-		    void *p);
+		    void *p, u32 *nrules);
 
 int avtab_read(struct avtab *a, void *fp, struct policydb *pol);
 int avtab_write_item(struct policydb *p, const struct avtab_node *cur, void *fp);
diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
index 81ff676f209a..810319bf0e60 100644
--- a/security/selinux/ss/conditional.c
+++ b/security/selinux/ss/conditional.c
@@ -321,9 +321,9 @@  static int cond_insertf(struct avtab *a, const struct avtab_key *k,
 	return 0;
 }
 
-static int cond_read_av_list(struct policydb *p, void *fp,
+static int cond_read_av_list(struct policydb *p, struct policy_file *fp,
 			     struct cond_av_list *list,
-			     struct cond_av_list *other)
+			     struct cond_av_list *other, u32 *nrules)
 {
 	int rc;
 	__le32 buf[1];
@@ -347,7 +347,7 @@  static int cond_read_av_list(struct policydb *p, void *fp,
 	for (i = 0; i < len; i++) {
 		data.dst = &list->nodes[i];
 		rc = avtab_read_item(&p->te_cond_avtab, fp, p, cond_insertf,
-				     &data);
+				     &data, nrules);
 		if (rc) {
 			kfree(list->nodes);
 			list->nodes = NULL;
@@ -373,7 +373,8 @@  static int expr_node_isvalid(struct policydb *p, struct cond_expr_node *expr)
 	return 1;
 }
 
-static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
+static int cond_read_node(struct policydb *p, struct cond_node *node,
+			  struct policy_file *fp, u32 *nrules)
 {
 	__le32 buf[2];
 	u32 i, len;
@@ -407,16 +408,17 @@  static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
 			return -EINVAL;
 	}
 
-	rc = cond_read_av_list(p, fp, &node->true_list, NULL);
+	rc = cond_read_av_list(p, fp, &node->true_list, NULL, nrules);
 	if (rc)
 		return rc;
-	return cond_read_av_list(p, fp, &node->false_list, &node->true_list);
+	return cond_read_av_list(p, fp, &node->false_list, &node->true_list, nrules);
 }
 
-int cond_read_list(struct policydb *p, void *fp)
+int cond_read_list(struct policydb *p, struct policy_file *fp)
 {
 	__le32 buf[1];
-	u32 i, len;
+	struct policy_file tmp_fp;
+	u32 i, len, nrules;
 	int rc;
 
 	rc = next_entry(buf, fp, sizeof(buf));
@@ -428,15 +430,25 @@  int cond_read_list(struct policydb *p, void *fp)
 	p->cond_list = kcalloc(len, sizeof(*p->cond_list), GFP_KERNEL);
 	if (!p->cond_list)
 		return -ENOMEM;
+	p->cond_list_len = len;
+
+	/* first pass to only calculate the avrule count */
+	tmp_fp = *fp;
+	nrules = 0;
+	for (i = 0; i < p->cond_list_len; i++) {
+		rc = cond_read_node(p, &p->cond_list[i], &tmp_fp, &nrules);
+		if (rc)
+			goto err;
+		cond_node_destroy(&p->cond_list[i]);
+	}
 
-	rc = avtab_alloc(&(p->te_cond_avtab), p->te_avtab.nel);
+	rc = avtab_alloc(&(p->te_cond_avtab), nrules);
 	if (rc)
 		goto err;
 
-	p->cond_list_len = len;
-
-	for (i = 0; i < len; i++) {
-		rc = cond_read_node(p, &p->cond_list[i], fp);
+	/* second pass to read in the conditional nodes */
+	for (i = 0; i < p->cond_list_len; i++) {
+		rc = cond_read_node(p, &p->cond_list[i], fp, NULL);
 		if (rc)
 			goto err;
 	}
diff --git a/security/selinux/ss/conditional.h b/security/selinux/ss/conditional.h
index 5a7b51278dc6..62a12d00cac9 100644
--- a/security/selinux/ss/conditional.h
+++ b/security/selinux/ss/conditional.h
@@ -70,7 +70,7 @@  int cond_destroy_bool(void *key, void *datum, void *p);
 int cond_index_bool(void *key, void *datum, void *datap);
 
 int cond_read_bool(struct policydb *p, struct symtab *s, void *fp);
-int cond_read_list(struct policydb *p, void *fp);
+int cond_read_list(struct policydb *p, struct policy_file *fp);
 int cond_write_bool(void *key, void *datum, void *ptr);
 int cond_write_list(struct policydb *p, void *fp);