diff mbox series

[v2] selinux: reduce the use of hard-coded hash sizes

Message ID 20200219093255.157758-1-omosnace@redhat.com (mailing list archive)
State Superseded
Headers show
Series [v2] selinux: reduce the use of hard-coded hash sizes | expand

Commit Message

Ondrej Mosnacek Feb. 19, 2020, 9:32 a.m. UTC
Instead allocate hash tables with just the right size based on the
actual number of elements (which is almost always known beforehand, we
just need to defer the hashtab allocation to the right time). The only
case when we don't know the size (with the current policy format) is the
new filename transitions hashtable. Here I just left the existing value.

After this patch, the time to load Fedora policy on x86_64 decreases
from 950 ms to 220 ms. If the unconfined module is removed, it decreases
from 870 ms to 170 ms. It is also likely that other operations are going
to be faster, mainly string_to_context_struct() or mls_compute_sid(),
but I didn't try to quantify that.

The memory usage increases a bit after this patch, but only by ~1-2 MB
(it is hard to measure precisely). I believe it is a small price to pay
for the increased performance.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---

Changed in v2:
 - guard against h->size == 0 in hashtab_search() and hashtab_insert()

 security/selinux/ss/hashtab.c  | 25 +++++++++++++---
 security/selinux/ss/hashtab.h  |  2 +-
 security/selinux/ss/policydb.c | 53 +++++++++++++---------------------
 security/selinux/ss/policydb.h |  2 --
 4 files changed, 42 insertions(+), 40 deletions(-)

Comments

Stephen Smalley Feb. 19, 2020, 1:34 p.m. UTC | #1
On 2/19/20 4:32 AM, Ondrej Mosnacek wrote:
> Instead allocate hash tables with just the right size based on the
> actual number of elements (which is almost always known beforehand, we
> just need to defer the hashtab allocation to the right time). The only
> case when we don't know the size (with the current policy format) is the
> new filename transitions hashtable. Here I just left the existing value.
> 
> After this patch, the time to load Fedora policy on x86_64 decreases
> from 950 ms to 220 ms. If the unconfined module is removed, it decreases
> from 870 ms to 170 ms. It is also likely that other operations are going
> to be faster, mainly string_to_context_struct() or mls_compute_sid(),
> but I didn't try to quantify that.
> 
> The memory usage increases a bit after this patch, but only by ~1-2 MB
> (it is hard to measure precisely). I believe it is a small price to pay
> for the increased performance.
> 
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Paul Moore Feb. 22, 2020, 7:38 p.m. UTC | #2
On Wed, Feb 19, 2020 at 4:33 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
>
> Instead allocate hash tables with just the right size based on the
> actual number of elements (which is almost always known beforehand, we
> just need to defer the hashtab allocation to the right time). The only
> case when we don't know the size (with the current policy format) is the
> new filename transitions hashtable. Here I just left the existing value.
>
> After this patch, the time to load Fedora policy on x86_64 decreases
> from 950 ms to 220 ms. If the unconfined module is removed, it decreases
> from 870 ms to 170 ms. It is also likely that other operations are going
> to be faster, mainly string_to_context_struct() or mls_compute_sid(),
> but I didn't try to quantify that.
>
> The memory usage increases a bit after this patch, but only by ~1-2 MB
> (it is hard to measure precisely). I believe it is a small price to pay
> for the increased performance.
>
> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>
> Changed in v2:
>  - guard against h->size == 0 in hashtab_search() and hashtab_insert()
>
>  security/selinux/ss/hashtab.c  | 25 +++++++++++++---
>  security/selinux/ss/hashtab.h  |  2 +-
>  security/selinux/ss/policydb.c | 53 +++++++++++++---------------------
>  security/selinux/ss/policydb.h |  2 --
>  4 files changed, 42 insertions(+), 40 deletions(-)
>
> diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
> index ebfdaa31ee32..87ad83148cbd 100644
> --- a/security/selinux/ss/hashtab.c
> +++ b/security/selinux/ss/hashtab.c
> @@ -12,12 +12,26 @@
>
>  static struct kmem_cache *hashtab_node_cachep;
>
> +static u32 hashtab_compute_size(u32 nel)
> +{
> +       u32 size;
> +
> +       if (nel <= 2)
> +               return nel;
> +
> +       /* use the nearest power of two to balance size and access time */
> +       size = roundup_pow_of_two(nel);
> +       if (size - nel > size / 4)
> +               size /= 2;

It would be nice if the commit description (and possibly the code
itself via a shorter version in the comments) gave some insight into
why you chose this particular adjustment to the hash table size.  Was
this based on observations with real world policies?  Just a gut
feeling?  Things like this are the sort of thing that one wonders
about five years later when modifying the code and by then no one can
remember if it is important or not.

Also, considering the adjustment above, why not use
rounddown_pow_of_two() instead (perhaps coupled with a minimum size
check)?

> +       return size;
> +}
Ondrej Mosnacek Feb. 23, 2020, 2:46 p.m. UTC | #3
On Sat, Feb 22, 2020 at 8:38 PM Paul Moore <paul@paul-moore.com> wrote:
> On Wed, Feb 19, 2020 at 4:33 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> >
> > Instead allocate hash tables with just the right size based on the
> > actual number of elements (which is almost always known beforehand, we
> > just need to defer the hashtab allocation to the right time). The only
> > case when we don't know the size (with the current policy format) is the
> > new filename transitions hashtable. Here I just left the existing value.
> >
> > After this patch, the time to load Fedora policy on x86_64 decreases
> > from 950 ms to 220 ms. If the unconfined module is removed, it decreases
> > from 870 ms to 170 ms. It is also likely that other operations are going
> > to be faster, mainly string_to_context_struct() or mls_compute_sid(),
> > but I didn't try to quantify that.
> >
> > The memory usage increases a bit after this patch, but only by ~1-2 MB
> > (it is hard to measure precisely). I believe it is a small price to pay
> > for the increased performance.
> >
> > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >
> > Changed in v2:
> >  - guard against h->size == 0 in hashtab_search() and hashtab_insert()
> >
> >  security/selinux/ss/hashtab.c  | 25 +++++++++++++---
> >  security/selinux/ss/hashtab.h  |  2 +-
> >  security/selinux/ss/policydb.c | 53 +++++++++++++---------------------
> >  security/selinux/ss/policydb.h |  2 --
> >  4 files changed, 42 insertions(+), 40 deletions(-)
> >
> > diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
> > index ebfdaa31ee32..87ad83148cbd 100644
> > --- a/security/selinux/ss/hashtab.c
> > +++ b/security/selinux/ss/hashtab.c
> > @@ -12,12 +12,26 @@
> >
> >  static struct kmem_cache *hashtab_node_cachep;
> >
> > +static u32 hashtab_compute_size(u32 nel)
> > +{
> > +       u32 size;
> > +
> > +       if (nel <= 2)
> > +               return nel;
> > +
> > +       /* use the nearest power of two to balance size and access time */
> > +       size = roundup_pow_of_two(nel);
> > +       if (size - nel > size / 4)
> > +               size /= 2;
>
> It would be nice if the commit description (and possibly the code
> itself via a shorter version in the comments) gave some insight into
> why you chose this particular adjustment to the hash table size.  Was
> this based on observations with real world policies?  Just a gut
> feeling?  Things like this are the sort of thing that one wonders
> about five years later when modifying the code and by then no one can
> remember if it is important or not.
>
> Also, considering the adjustment above, why not use
> rounddown_pow_of_two() instead (perhaps coupled with a minimum size
> check)?

Good point. I think I was tuning this formula back when I haven't
rebased this patch on top of the filename transition rework, so I
suspect that the filename_trans hash table was what was causing high
memory usage with plain roundup_pow_of_two()... With ~225k rules it
should be taking up ~2 MB in such case, which is quite a lot. (In
fact, even with the adjusted formula it would still allocate the same
table, so my crude method of measuring the memory usage apparently
wasn't reliable...) Now that this hash table is only at ~2500 elements
even with our imperfect policy, it should only take up ~32 KB when
rounding up, which is a pretty small overhead.

I'll re-test the patch with hashtab_compute_size() ~
roundup_pow_of_two(), this time logging the exact amount of bytes
taken up by the hash table arrays (which is the only thing that is
changing here). I expect it will show only a few 10s of KB difference,
in which case I'll respin the patch with this simpler (and likely a
bit faster) variant.

>
> > +       return size;
> > +}
Paul Moore Feb. 23, 2020, 3:01 p.m. UTC | #4
On Sun, Feb 23, 2020 at 9:46 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Sat, Feb 22, 2020 at 8:38 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Feb 19, 2020 at 4:33 AM Ondrej Mosnacek <omosnace@redhat.com> wrote:
> > >
> > > Instead allocate hash tables with just the right size based on the
> > > actual number of elements (which is almost always known beforehand, we
> > > just need to defer the hashtab allocation to the right time). The only
> > > case when we don't know the size (with the current policy format) is the
> > > new filename transitions hashtable. Here I just left the existing value.
> > >
> > > After this patch, the time to load Fedora policy on x86_64 decreases
> > > from 950 ms to 220 ms. If the unconfined module is removed, it decreases
> > > from 870 ms to 170 ms. It is also likely that other operations are going
> > > to be faster, mainly string_to_context_struct() or mls_compute_sid(),
> > > but I didn't try to quantify that.
> > >
> > > The memory usage increases a bit after this patch, but only by ~1-2 MB
> > > (it is hard to measure precisely). I believe it is a small price to pay
> > > for the increased performance.
> > >
> > > Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > ---
> > >
> > > Changed in v2:
> > >  - guard against h->size == 0 in hashtab_search() and hashtab_insert()
> > >
> > >  security/selinux/ss/hashtab.c  | 25 +++++++++++++---
> > >  security/selinux/ss/hashtab.h  |  2 +-
> > >  security/selinux/ss/policydb.c | 53 +++++++++++++---------------------
> > >  security/selinux/ss/policydb.h |  2 --
> > >  4 files changed, 42 insertions(+), 40 deletions(-)
> > >
> > > diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
> > > index ebfdaa31ee32..87ad83148cbd 100644
> > > --- a/security/selinux/ss/hashtab.c
> > > +++ b/security/selinux/ss/hashtab.c
> > > @@ -12,12 +12,26 @@
> > >
> > >  static struct kmem_cache *hashtab_node_cachep;
> > >
> > > +static u32 hashtab_compute_size(u32 nel)
> > > +{
> > > +       u32 size;
> > > +
> > > +       if (nel <= 2)
> > > +               return nel;
> > > +
> > > +       /* use the nearest power of two to balance size and access time */
> > > +       size = roundup_pow_of_two(nel);
> > > +       if (size - nel > size / 4)
> > > +               size /= 2;
> >
> > It would be nice if the commit description (and possibly the code
> > itself via a shorter version in the comments) gave some insight into
> > why you chose this particular adjustment to the hash table size.  Was
> > this based on observations with real world policies?  Just a gut
> > feeling?  Things like this are the sort of thing that one wonders
> > about five years later when modifying the code and by then no one can
> > remember if it is important or not.
> >
> > Also, considering the adjustment above, why not use
> > rounddown_pow_of_two() instead (perhaps coupled with a minimum size
> > check)?
>
> Good point. I think I was tuning this formula back when I haven't
> rebased this patch on top of the filename transition rework, so I
> suspect that the filename_trans hash table was what was causing high
> memory usage with plain roundup_pow_of_two()... With ~225k rules it
> should be taking up ~2 MB in such case, which is quite a lot. (In
> fact, even with the adjusted formula it would still allocate the same
> table, so my crude method of measuring the memory usage apparently
> wasn't reliable...) Now that this hash table is only at ~2500 elements
> even with our imperfect policy, it should only take up ~32 KB when
> rounding up, which is a pretty small overhead.
>
> I'll re-test the patch with hashtab_compute_size() ~
> roundup_pow_of_two(), this time logging the exact amount of bytes
> taken up by the hash table arrays (which is the only thing that is
> changing here). I expect it will show only a few 10s of KB difference,
> in which case I'll respin the patch with this simpler (and likely a
> bit faster) variant.

Thanks.  I also want to stress that capturing the logic behind the
decision in the patch description / comments is almost as important
(maybe more so) than what we end up deciding on for the actual size
algorithm.  I can almost guarantee the decision made now is going to
be brought up for discussion at some point in the future, so let's
help the poor future devs as much as we can ;)
diff mbox series

Patch

diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
index ebfdaa31ee32..87ad83148cbd 100644
--- a/security/selinux/ss/hashtab.c
+++ b/security/selinux/ss/hashtab.c
@@ -12,12 +12,26 @@ 
 
 static struct kmem_cache *hashtab_node_cachep;
 
+static u32 hashtab_compute_size(u32 nel)
+{
+	u32 size;
+
+	if (nel <= 2)
+		return nel;
+
+	/* use the nearest power of two to balance size and access time */
+	size = roundup_pow_of_two(nel);
+	if (size - nel > size / 4)
+		size /= 2;
+	return size;
+}
+
 struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void *key),
 			       int (*keycmp)(struct hashtab *h, const void *key1, const void *key2),
-			       u32 size)
+			       u32 nel_hint)
 {
 	struct hashtab *p;
-	u32 i;
+	u32 i, size = hashtab_compute_size(nel_hint);
 
 	p = kzalloc(sizeof(*p), GFP_KERNEL);
 	if (!p)
@@ -27,6 +41,9 @@  struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void *
 	p->nel = 0;
 	p->hash_value = hash_value;
 	p->keycmp = keycmp;
+	if (!size)
+		return p;
+
 	p->htable = kmalloc_array(size, sizeof(*p->htable), GFP_KERNEL);
 	if (!p->htable) {
 		kfree(p);
@@ -46,7 +63,7 @@  int hashtab_insert(struct hashtab *h, void *key, void *datum)
 
 	cond_resched();
 
-	if (!h || h->nel == HASHTAB_MAX_NODES)
+	if (!h || !h->size || h->nel == HASHTAB_MAX_NODES)
 		return -EINVAL;
 
 	hvalue = h->hash_value(h, key);
@@ -82,7 +99,7 @@  void *hashtab_search(struct hashtab *h, const void *key)
 	u32 hvalue;
 	struct hashtab_node *cur;
 
-	if (!h)
+	if (!h || !h->size)
 		return NULL;
 
 	hvalue = h->hash_value(h, key);
diff --git a/security/selinux/ss/hashtab.h b/security/selinux/ss/hashtab.h
index 3e3e42bfd150..dde54d9ff01c 100644
--- a/security/selinux/ss/hashtab.h
+++ b/security/selinux/ss/hashtab.h
@@ -42,7 +42,7 @@  struct hashtab_info {
  */
 struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void *key),
 			       int (*keycmp)(struct hashtab *h, const void *key1, const void *key2),
-			       u32 size);
+			       u32 nel_hint);
 
 /*
  * Inserts the specified (key, datum) pair into the specified hash table.
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index 32b3a8acf96f..7ca8c74efba3 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -56,17 +56,6 @@  static const char *symtab_name[SYM_NUM] = {
 };
 #endif
 
-static unsigned int symtab_sizes[SYM_NUM] = {
-	2,
-	32,
-	16,
-	512,
-	128,
-	16,
-	16,
-	16,
-};
-
 struct policydb_compat_info {
 	int version;
 	int sym_num;
@@ -478,20 +467,10 @@  static int policydb_init(struct policydb *p)
 
 	memset(p, 0, sizeof(*p));
 
-	for (i = 0; i < SYM_NUM; i++) {
-		rc = symtab_init(&p->symtab[i], symtab_sizes[i]);
-		if (rc)
-			goto out;
-	}
-
 	rc = avtab_init(&p->te_avtab);
 	if (rc)
 		goto out;
 
-	rc = roles_init(p);
-	if (rc)
-		goto out;
-
 	rc = cond_policydb_init(p);
 	if (rc)
 		goto out;
@@ -503,20 +482,12 @@  static int policydb_init(struct policydb *p)
 		goto out;
 	}
 
-	p->range_tr = hashtab_create(rangetr_hash, rangetr_cmp, 256);
-	if (!p->range_tr) {
-		rc = -ENOMEM;
-		goto out;
-	}
-
 	ebitmap_init(&p->filename_trans_ttypes);
 	ebitmap_init(&p->policycaps);
 	ebitmap_init(&p->permissive_map);
 
 	return 0;
 out:
-	hashtab_destroy(p->filename_trans);
-	hashtab_destroy(p->range_tr);
 	for (i = 0; i < SYM_NUM; i++) {
 		hashtab_map(p->symtab[i].table, destroy_f[i], NULL);
 		hashtab_destroy(p->symtab[i].table);
@@ -1142,12 +1113,12 @@  static int common_read(struct policydb *p, struct hashtab *h, void *fp)
 
 	len = le32_to_cpu(buf[0]);
 	comdatum->value = le32_to_cpu(buf[1]);
+	nel = le32_to_cpu(buf[3]);
 
-	rc = symtab_init(&comdatum->permissions, PERM_SYMTAB_SIZE);
+	rc = symtab_init(&comdatum->permissions, nel);
 	if (rc)
 		goto bad;
 	comdatum->permissions.nprim = le32_to_cpu(buf[2]);
-	nel = le32_to_cpu(buf[3]);
 
 	rc = str_read(&key, GFP_KERNEL, fp, len);
 	if (rc)
@@ -1308,12 +1279,12 @@  static int class_read(struct policydb *p, struct hashtab *h, void *fp)
 	len = le32_to_cpu(buf[0]);
 	len2 = le32_to_cpu(buf[1]);
 	cladatum->value = le32_to_cpu(buf[2]);
+	nel = le32_to_cpu(buf[4]);
 
-	rc = symtab_init(&cladatum->permissions, PERM_SYMTAB_SIZE);
+	rc = symtab_init(&cladatum->permissions, nel);
 	if (rc)
 		goto bad;
 	cladatum->permissions.nprim = le32_to_cpu(buf[3]);
-	nel = le32_to_cpu(buf[4]);
 
 	ncons = le32_to_cpu(buf[5]);
 
@@ -1826,6 +1797,11 @@  static int range_read(struct policydb *p, void *fp)
 		return rc;
 
 	nel = le32_to_cpu(buf[0]);
+
+	p->range_tr = hashtab_create(rangetr_hash, rangetr_cmp, nel);
+	if (!p->range_tr)
+		return -ENOMEM;
+
 	for (i = 0; i < nel; i++) {
 		rc = -ENOMEM;
 		rt = kzalloc(sizeof(*rt), GFP_KERNEL);
@@ -2418,6 +2394,17 @@  int policydb_read(struct policydb *p, void *fp)
 			goto bad;
 		nprim = le32_to_cpu(buf[0]);
 		nel = le32_to_cpu(buf[1]);
+
+		rc = symtab_init(&p->symtab[i], nel);
+		if (rc)
+			goto out;
+
+		if (i == SYM_ROLES) {
+			rc = roles_init(p);
+			if (rc)
+				goto out;
+		}
+
 		for (j = 0; j < nel; j++) {
 			rc = read_f[i](p, p->symtab[i].table, fp);
 			if (rc)
diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h
index 41ad78a1f17b..72e2932fb12d 100644
--- a/security/selinux/ss/policydb.h
+++ b/security/selinux/ss/policydb.h
@@ -321,8 +321,6 @@  extern int policydb_role_isvalid(struct policydb *p, unsigned int role);
 extern int policydb_read(struct policydb *p, void *fp);
 extern int policydb_write(struct policydb *p, void *fp);
 
-#define PERM_SYMTAB_SIZE 32
-
 #define POLICYDB_CONFIG_MLS    1
 
 /* the config flags related to unknown classes/perms are bits 2 and 3 */