diff mbox series

[1/2] SELinux: Add median to debug output of hash table stats

Message ID 20200429202941.18320-2-siarhei.liakh@concurrent-rt.com (mailing list archive)
State Changes Requested
Headers show
Series SELinux: Improve hashing | expand

Commit Message

Siarhei Liakh April 29, 2020, 8:29 p.m. UTC
From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>

This change introduces a median() function which is then used to report
25th, 50th, and 75th percentile metrics within distributions of hash table
bucket chain lengths. This allows to better assess and compare relative
effectiveness of different hash functions. Specifically, it allows to
ensure new functions not only reduce the maximum, but also improve (or, at
least, have no negative impact) on the median.

Sample output before change:

avc:
entries: 508
buckets used: 213/512
longest chain: 10

policydb:
SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5

Sample output after the change:

avc:
entries: 508
buckets used: 217/512
longest chain: 9
non-zero chain Q1/Med/Q3: 1/2/3

policydb:
SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5
non-zero Q1/Med/Q3 1/2/4

Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
---
Please CC me directly on all replies.

 security/selinux/Kconfig          | 10 +++++
 security/selinux/avc.c            | 42 ++++++++++++++++---
 security/selinux/include/median.h | 67 +++++++++++++++++++++++++++++++
 security/selinux/ss/avtab.c       | 37 ++++++++++++++---
 security/selinux/ss/hashtab.c     | 28 ++++++++++++-
 security/selinux/ss/hashtab.h     |  5 +++
 security/selinux/ss/policydb.c    | 14 ++++---
 7 files changed, 185 insertions(+), 18 deletions(-)
 create mode 100644 security/selinux/include/median.h

Comments

Paul Moore May 13, 2020, 9:55 p.m. UTC | #1
On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
>
> From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
>
> This change introduces a median() function which is then used to report
> 25th, 50th, and 75th percentile metrics within distributions of hash table
> bucket chain lengths. This allows to better assess and compare relative
> effectiveness of different hash functions. Specifically, it allows to
> ensure new functions not only reduce the maximum, but also improve (or, at
> least, have no negative impact) on the median.
>
> Sample output before change:
>
> avc:
> entries: 508
> buckets used: 213/512
> longest chain: 10
>
> policydb:
> SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5
>
> Sample output after the change:
>
> avc:
> entries: 508
> buckets used: 217/512
> longest chain: 9
> non-zero chain Q1/Med/Q3: 1/2/3
>
> policydb:
> SELinux: roles:  14 entries and 6/16 buckets used, longest chain length 5
> non-zero Q1/Med/Q3 1/2/4
>
> Signed-off-by: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> ---
> Please CC me directly on all replies.
>
>  security/selinux/Kconfig          | 10 +++++
>  security/selinux/avc.c            | 42 ++++++++++++++++---
>  security/selinux/include/median.h | 67 +++++++++++++++++++++++++++++++
>  security/selinux/ss/avtab.c       | 37 ++++++++++++++---
>  security/selinux/ss/hashtab.c     | 28 ++++++++++++-
>  security/selinux/ss/hashtab.h     |  5 +++
>  security/selinux/ss/policydb.c    | 14 ++++---
>  7 files changed, 185 insertions(+), 18 deletions(-)
>  create mode 100644 security/selinux/include/median.h
>
> diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
> index 9e921fc72538..57c427e019c9 100644
> --- a/security/selinux/Kconfig
> +++ b/security/selinux/Kconfig
> @@ -115,3 +115,13 @@ config SECURITY_SELINUX_SID2STR_CACHE_SIZE
>           conversion.  Setting this option to 0 disables the cache completely.
>
>           If unsure, keep the default value.
> +
> +config SECURITY_SELINUX_DEBUG_HASHES
> +       bool "Print additional information about hash tables"
> +       depends on SECURITY_SELINUX
> +       default n
> +       help
> +         This option allows to gather and display additional information about
> +         some of the key hash tables within SELinux.
> +
> +         If unsure, keep the default value.

I forgot to mention this earlier, but I think this is another case
where we don't need to add another Kconfig option.
Siarhei Liakh June 2, 2020, 8:42 p.m. UTC | #2
The 05/13/2020 17:55, Paul Moore wrote:
> On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
> >
> > From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> >
> > This change introduces a median() function which is then used to report
> > 25th, 50th, and 75th percentile metrics within distributions of hash table
> > bucket chain lengths. This allows to better assess and compare relative
> > effectiveness of different hash functions. Specifically, it allows to
> > ensure new functions not only reduce the maximum, but also improve (or, at
> > least, have no negative impact) on the median.
[ . . . ]
> > diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
> > index 9e921fc72538..57c427e019c9 100644
> > --- a/security/selinux/Kconfig
> > +++ b/security/selinux/Kconfig
> > @@ -115,3 +115,13 @@ config SECURITY_SELINUX_SID2STR_CACHE_SIZE
> >           conversion.  Setting this option to 0 disables the cache completely.
> >
> >           If unsure, keep the default value.
> > +
> > +config SECURITY_SELINUX_DEBUG_HASHES
> > +       bool "Print additional information about hash tables"
> > +       depends on SECURITY_SELINUX
> > +       default n
> > +       help
> > +         This option allows to gather and display additional information about
> > +         some of the key hash tables within SELinux.
> > +
> > +         If unsure, keep the default value.
> 
> I forgot to mention this earlier, but I think this is another case
> where we don't need to add another Kconfig option.

Right. What is your preferred way to control conditional inclusion of
code spread out across several files?

My issue is that there already are two different symbols which require
coordination to activate this functionality: DEBUG_HASHES defined and used
locally within policydb.c and simple DEBUG which is needed for pr_debug()
statements throughout the code.

Personally, I prefer something global and controlled from a single well-known
place, hence the Kconfig. However, I also see your point about reducing
Kconfig... But if not Kconfig, then what? Should I just create an additional
.h file with all SELinux-specific debug symbols and have it included
everywhere in SELinux?

How would you approach this?

Thank you.
Paul Moore June 6, 2020, 1:05 p.m. UTC | #3
On Tue, Jun 2, 2020 at 4:42 PM Siarhei Liakh
<siarhei.liakh@concurrent-rt.com> wrote:
> The 05/13/2020 17:55, Paul Moore wrote:
> > On Wed, Apr 29, 2020 at 4:29 PM <siarhei.liakh@concurrent-rt.com> wrote:
> > >
> > > From: Siarhei Liakh <siarhei.liakh@concurrent-rt.com>
> > >
> > > This change introduces a median() function which is then used to report
> > > 25th, 50th, and 75th percentile metrics within distributions of hash table
> > > bucket chain lengths. This allows to better assess and compare relative
> > > effectiveness of different hash functions. Specifically, it allows to
> > > ensure new functions not only reduce the maximum, but also improve (or, at
> > > least, have no negative impact) on the median.
> [ . . . ]
> > > diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
> > > index 9e921fc72538..57c427e019c9 100644
> > > --- a/security/selinux/Kconfig
> > > +++ b/security/selinux/Kconfig
> > > @@ -115,3 +115,13 @@ config SECURITY_SELINUX_SID2STR_CACHE_SIZE
> > >           conversion.  Setting this option to 0 disables the cache completely.
> > >
> > >           If unsure, keep the default value.
> > > +
> > > +config SECURITY_SELINUX_DEBUG_HASHES
> > > +       bool "Print additional information about hash tables"
> > > +       depends on SECURITY_SELINUX
> > > +       default n
> > > +       help
> > > +         This option allows to gather and display additional information about
> > > +         some of the key hash tables within SELinux.
> > > +
> > > +         If unsure, keep the default value.
> >
> > I forgot to mention this earlier, but I think this is another case
> > where we don't need to add another Kconfig option.
>
> Right. What is your preferred way to control conditional inclusion of
> code spread out across several files?

Sorry for the delay.

Instead of talking about the mechanics of how to make the code
conditional, I would first like to have a discussion about if the code
should even be conditional, and if it is unconditional, do we need it?
 More on this below.

> My issue is that there already are two different symbols which require
> coordination to activate this functionality: DEBUG_HASHES defined and used
> locally within policydb.c and simple DEBUG which is needed for pr_debug()
> statements throughout the code.

My general thinking is that if the information is useful to a user to
manage and tune their system then we should include the code.  If the
information is only useful to kernel developers to play with different
designs or implementation then we can, and should, leave that code
out.  Developers are likely going to need to add their own
instrumentation anyway for testing, no sense in us cluttering up the
kernel for something that may never be useful to anyone.

> Personally, I prefer something global and controlled from a single well-known
> place, hence the Kconfig. However, I also see your point about reducing
> Kconfig... But if not Kconfig, then what? Should I just create an additional
> .h file with all SELinux-specific debug symbols and have it included
> everywhere in SELinux?
>
> How would you approach this?

I *think* the answers above should help, but if not let me know :)

--
paul moore
www.paul-moore.com
diff mbox series

Patch

diff --git a/security/selinux/Kconfig b/security/selinux/Kconfig
index 9e921fc72538..57c427e019c9 100644
--- a/security/selinux/Kconfig
+++ b/security/selinux/Kconfig
@@ -115,3 +115,13 @@  config SECURITY_SELINUX_SID2STR_CACHE_SIZE
 	  conversion.  Setting this option to 0 disables the cache completely.
 
 	  If unsure, keep the default value.
+
+config SECURITY_SELINUX_DEBUG_HASHES
+	bool "Print additional information about hash tables"
+	depends on SECURITY_SELINUX
+	default n
+	help
+	  This option allows to gather and display additional information about
+	  some of the key hash tables within SELinux.
+
+	  If unsure, keep the default value.
diff --git a/security/selinux/avc.c b/security/selinux/avc.c
index d18cb32a242a..c3bbdb083371 100644
--- a/security/selinux/avc.c
+++ b/security/selinux/avc.c
@@ -31,6 +31,10 @@ 
 #include "avc_ss.h"
 #include "classmap.h"
 
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
+#endif
+
 #define AVC_CACHE_SLOTS			512
 #define AVC_DEF_CACHE_THRESHOLD		512
 #define AVC_CACHE_RECLAIM		16
@@ -149,9 +153,13 @@  void __init avc_init(void)
 
 int avc_get_hash_stats(struct selinux_avc *avc, char *page)
 {
-	int i, chain_len, max_chain_len, slots_used;
+	int i, chain_len, max_chain_len, slots_used, ret;
 	struct avc_node *node;
 	struct hlist_head *head;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 med_ct = 0;
+	u32 *counts = vmalloc(AVC_CACHE_SLOTS * sizeof(u32));
+#endif
 
 	rcu_read_lock();
 
@@ -164,6 +172,10 @@  int avc_get_hash_stats(struct selinux_avc *avc, char *page)
 			chain_len = 0;
 			hlist_for_each_entry_rcu(node, head, list)
 				chain_len++;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+			if (counts && chain_len)
+				counts[med_ct++] = chain_len;
+#endif
 			if (chain_len > max_chain_len)
 				max_chain_len = chain_len;
 		}
@@ -171,10 +183,30 @@  int avc_get_hash_stats(struct selinux_avc *avc, char *page)
 
 	rcu_read_unlock();
 
-	return scnprintf(page, PAGE_SIZE, "entries: %d\nbuckets used: %d/%d\n"
-			 "longest chain: %d\n",
-			 atomic_read(&avc->avc_cache.active_nodes),
-			 slots_used, AVC_CACHE_SLOTS, max_chain_len);
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	if (counts) {
+		u32 q1 = 0;
+		u32 q3 = 0;
+		u32 med = median(counts, med_ct, &q1, &q3);
+
+		vfree(counts);
+		ret = scnprintf(page, PAGE_SIZE,
+				"entries: %d\nbuckets used: %d/%d\n"
+				 "longest chain: %d\n"
+				 "non-zero chain Q1/Med/Q3: %u/%u/%u\n",
+				atomic_read(&avc->avc_cache.active_nodes),
+				slots_used, AVC_CACHE_SLOTS, max_chain_len,
+				q1, med, q3);
+	} else /* Falling through! */
+#endif
+	{
+		ret = scnprintf(page, PAGE_SIZE,
+				"entries: %d\nbuckets used: %d/%d\n"
+				 "longest chain: %d\n",
+				atomic_read(&avc->avc_cache.active_nodes),
+				slots_used, AVC_CACHE_SLOTS, max_chain_len);
+	}
+	return ret;
 }
 
 /*
diff --git a/security/selinux/include/median.h b/security/selinux/include/median.h
new file mode 100644
index 000000000000..e90565b9d7f6
--- /dev/null
+++ b/security/selinux/include/median.h
@@ -0,0 +1,67 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) Siarhei Liakh <Siarhei.Liakh@concurrent-rt.com>, 2020, GPL 2.0+ */
+
+#ifndef _LINUX_MEDIAN_H
+#define _LINUX_MEDIAN_H
+#include <linux/types.h>
+#include <linux/sort.h>
+
+/*
+ * Helper function to compare two u32s
+ */
+static int __cmp_u32(const void *a, const void *b)
+{
+	u32 x = *(const u32 *)(a);
+	u32 y = *(const u32 *)(b);
+
+	if (x < y)
+		return -1;
+
+	if (x > y)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * median(): Find a median of an array containing "count" of u32s,
+ * optionally return 25th (q1) and 75th (q3) percentile.
+ */
+static inline u32 median(u32 val[], size_t count, u32 *q1, u32 *q3)
+{
+	u32 _q1 = 0;
+	u32 _q2 = 0;
+	u32 _q3 = 0;
+
+	if (count > 0) {
+		/*
+		 * Using existing sort() functionality as the easiest way
+		 * to get median with least amount of new code.
+		 *
+		 * This is not the most efficient way to find a median,
+		 * so feel free to implement a better one if performance is
+		 * of a primary concern. "Selection algorithm" Wikipedia
+		 * article is a good start:
+		 * https://en.m.wikipedia.org/wiki/Selection_algorithm
+		 */
+		sort(val, count, sizeof(u32), &__cmp_u32, NULL);
+
+		/*
+		 * Should really do some math if count is even,
+		 * but this is close enough for our purposes.
+		 */
+
+		_q1 = val[count / 4];
+		_q2 = val[count / 2];
+		_q3 = val[(count * 3) / 4];
+	}
+
+	if (q1)
+		*q1 = _q1;
+
+	if (q3)
+		*q3 = _q3;
+
+	return _q2;
+}
+#endif /* #ifndef _LINUX_MEAN_H */
diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 01b300a4a882..8baaddb01a95 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -23,6 +23,10 @@ 
 #include "avtab.h"
 #include "policydb.h"
 
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
+#endif
+
 static struct kmem_cache *avtab_node_cachep;
 static struct kmem_cache *avtab_xperms_cachep;
 
@@ -345,6 +349,10 @@  void avtab_hash_eval(struct avtab *h, char *tag)
 	int i, chain_len, slots_used, max_chain_len;
 	unsigned long long chain2_len_sum;
 	struct avtab_node *cur;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 med_ct = 0;
+	u32 *counts = vmalloc(h->nslot * sizeof(u32));
+#endif
 
 	slots_used = 0;
 	max_chain_len = 0;
@@ -358,17 +366,36 @@  void avtab_hash_eval(struct avtab *h, char *tag)
 				chain_len++;
 				cur = cur->next;
 			}
-
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+			if (counts && chain_len)
+				counts[med_ct++] = chain_len;
+#endif
 			if (chain_len > max_chain_len)
 				max_chain_len = chain_len;
 			chain2_len_sum += chain_len * chain_len;
 		}
 	}
 
-	pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
-	       "longest chain length %d sum of chain length^2 %llu\n",
-	       tag, h->nel, slots_used, h->nslot, max_chain_len,
-	       chain2_len_sum);
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	if (counts) {
+		u32 q1 = 0;
+		u32 q3 = 0;
+		u32 med = median(counts, med_ct, &q1, &q3);
+
+		vfree(counts);
+		pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
+			 "longest chain length %d non-zero Q1/Med/Q3 %u/%u/%u "
+			 "sum of chain length^2 %llu\n",
+			 tag, h->nel, slots_used, h->nslot, max_chain_len,
+			 q1, med, q3, chain2_len_sum);
+	} else /* Falling through! */
+#endif
+	{
+		pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
+			 "longest chain length %d sum of chain length^2 %llu\n",
+			 tag, h->nel, slots_used, h->nslot, max_chain_len,
+			 chain2_len_sum);
+	}
 }
 
 static uint16_t spec_order[] = {
diff --git a/security/selinux/ss/hashtab.c b/security/selinux/ss/hashtab.c
index 883f19d32c28..e42d814067ba 100644
--- a/security/selinux/ss/hashtab.c
+++ b/security/selinux/ss/hashtab.c
@@ -8,8 +8,13 @@ 
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/sched.h>
+#include <linux/vmalloc.h>
 #include "hashtab.h"
 
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
+#endif
+
 static struct kmem_cache *hashtab_node_cachep;
 
 /*
@@ -168,7 +173,10 @@  void hashtab_stat(struct hashtab *h, struct hashtab_info *info)
 {
 	u32 i, chain_len, slots_used, max_chain_len;
 	struct hashtab_node *cur;
-
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 med_ct = 0;
+	u32 *counts = vmalloc(h->size * sizeof(u32));
+#endif
 	slots_used = 0;
 	max_chain_len = 0;
 	for (i = 0; i < h->size; i++) {
@@ -180,7 +188,10 @@  void hashtab_stat(struct hashtab *h, struct hashtab_info *info)
 				chain_len++;
 				cur = cur->next;
 			}
-
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+			if (counts && chain_len)
+				counts[med_ct++] = chain_len;
+#endif
 			if (chain_len > max_chain_len)
 				max_chain_len = chain_len;
 		}
@@ -188,6 +199,19 @@  void hashtab_stat(struct hashtab *h, struct hashtab_info *info)
 
 	info->slots_used = slots_used;
 	info->max_chain_len = max_chain_len;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	if (counts) {
+		info->med_chain_len = median(counts, med_ct,
+					     &(info->q1_chain_len),
+					     &(info->q3_chain_len));
+
+		vfree(counts);
+	} else {
+		info->q1_chain_len = 0;
+		info->med_chain_len = 0;
+		info->q3_chain_len = 0;
+	}
+#endif
 }
 
 void __init hashtab_cache_init(void)
diff --git a/security/selinux/ss/hashtab.h b/security/selinux/ss/hashtab.h
index dde54d9ff01c..b660c0078dae 100644
--- a/security/selinux/ss/hashtab.h
+++ b/security/selinux/ss/hashtab.h
@@ -32,6 +32,11 @@  struct hashtab {
 struct hashtab_info {
 	u32 slots_used;
 	u32 max_chain_len;
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+	u32 q1_chain_len;
+	u32 med_chain_len;
+	u32 q3_chain_len;
+#endif
 };
 
 /*
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index c21b922e5ebe..420e347ac77c 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -41,9 +41,9 @@ 
 #include "mls.h"
 #include "services.h"
 
-#define _DEBUG_HASHES
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
+#include "median.h"
 
-#ifdef DEBUG_HASHES
 static const char *symtab_name[SYM_NUM] = {
 	"common prefixes",
 	"classes",
@@ -623,15 +623,17 @@  static int (*index_f[SYM_NUM]) (void *key, void *datum, void *datap) =
 	cat_index,
 };
 
-#ifdef DEBUG_HASHES
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
 static void hash_eval(struct hashtab *h, const char *hash_name)
 {
 	struct hashtab_info info;
 
 	hashtab_stat(h, &info);
-	pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, longest chain length %d\n",
+	pr_debug("SELinux: %s:  %d entries and %d/%d buckets used, "
+		 "longest chain length %d non-zero Q1/Med/Q3 %u/%u/%u\n",
 		 hash_name, h->nel, info.slots_used, h->size,
-		 info.max_chain_len);
+		 info.max_chain_len, info.q1_chain_len,
+		 info.med_chain_len, info.q3_chain_len);
 }
 
 static void symtab_hash_eval(struct symtab *s)
@@ -670,7 +672,7 @@  static int policydb_index(struct policydb *p)
 	pr_debug("SELinux:  %d classes, %d rules\n",
 		 p->p_classes.nprim, p->te_avtab.nel);
 
-#ifdef DEBUG_HASHES
+#ifdef CONFIG_SECURITY_SELINUX_DEBUG_HASHES
 	avtab_hash_eval(&p->te_avtab, "rules");
 	symtab_hash_eval(p->symtab);
 #endif