diff mbox series

[v2,2/2] selinux: improve symtab string hashing

Message ID 20240315181414.649045-1-cgzones@googlemail.com (mailing list archive)
State Accepted
Delegated to: Paul Moore
Headers show
Series [v2,1/2] selinux: dump statistics for more hash tables | expand

Commit Message

Christian Göttsche March 15, 2024, 6:14 p.m. UTC
The number of buckets is calculated by performing a binary AND against
the mask of the hash table, which is one less than its size (which is a
power of two).  This leads to all top bits being discarded, requiring
for short or similar inputs a hash function with a good avalanche
effect.

Use djb2a:

    # current
    common prefixes:  7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11
    classes:  134 entries and 100/256 buckets used, longest chain length 5, sum of chain length^2 234
    roles:  15 entries and 6/16 buckets used, longest chain length 5, sum of chain length^2 57
    types:  4448 entries and 3016/8192 buckets used, longest chain length 41, sum of chain length^2 14922
    users:  7 entries and 3/8 buckets used, longest chain length 3, sum of chain length^2 17
    bools:  306 entries and 221/512 buckets used, longest chain length 4, sum of chain length^2 524
    levels:  1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1
    categories:  1024 entries and 400/1024 buckets used, longest chain length 4, sum of chain length^2 2740

    # patch
    common prefixes:  7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11
    classes:  134 entries and 101/256 buckets used, longest chain length 3, sum of chain length^2 210
    roles:  15 entries and 9/16 buckets used, longest chain length 3, sum of chain length^2 31
    types:  4448 entries and 3459/8192 buckets used, longest chain length 5, sum of chain length^2 6778
    users:  7 entries and 5/8 buckets used, longest chain length 3, sum of chain length^2 13
    bools:  306 entries and 236/512 buckets used, longest chain length 5, sum of chain length^2 470
    levels:  1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1
    categories:  1024 entries and 518/1024 buckets used, longest chain length 7, sum of chain length^2 2992

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
---
v2:
   add licensing note
---
 security/selinux/ss/symtab.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

Comments

Paul Moore March 27, 2024, 11:26 p.m. UTC | #1
On Mar 15, 2024 =?UTF-8?q?Christian=20G=C3=B6ttsche?= <cgzones@googlemail.com> wrote:
> 
> The number of buckets is calculated by performing a binary AND against
> the mask of the hash table, which is one less than its size (which is a
> power of two).  This leads to all top bits being discarded, requiring
> for short or similar inputs a hash function with a good avalanche
> effect.
> 
> Use djb2a:
> 
>     # current
>     common prefixes:  7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11
>     classes:  134 entries and 100/256 buckets used, longest chain length 5, sum of chain length^2 234
>     roles:  15 entries and 6/16 buckets used, longest chain length 5, sum of chain length^2 57
>     types:  4448 entries and 3016/8192 buckets used, longest chain length 41, sum of chain length^2 14922
>     users:  7 entries and 3/8 buckets used, longest chain length 3, sum of chain length^2 17
>     bools:  306 entries and 221/512 buckets used, longest chain length 4, sum of chain length^2 524
>     levels:  1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1
>     categories:  1024 entries and 400/1024 buckets used, longest chain length 4, sum of chain length^2 2740
> 
>     # patch
>     common prefixes:  7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11
>     classes:  134 entries and 101/256 buckets used, longest chain length 3, sum of chain length^2 210
>     roles:  15 entries and 9/16 buckets used, longest chain length 3, sum of chain length^2 31
>     types:  4448 entries and 3459/8192 buckets used, longest chain length 5, sum of chain length^2 6778
>     users:  7 entries and 5/8 buckets used, longest chain length 3, sum of chain length^2 13
>     bools:  306 entries and 236/512 buckets used, longest chain length 5, sum of chain length^2 470
>     levels:  1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1
>     categories:  1024 entries and 518/1024 buckets used, longest chain length 7, sum of chain length^2 2992
> 
> Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
> ---
> v2:
>    add licensing note
> ---
>  security/selinux/ss/symtab.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)

Merged into selinux/dev, thanks.

--
paul-moore.com
diff mbox series

Patch

diff --git a/security/selinux/ss/symtab.c b/security/selinux/ss/symtab.c
index c04f8d447873..832660fd84a9 100644
--- a/security/selinux/ss/symtab.c
+++ b/security/selinux/ss/symtab.c
@@ -12,17 +12,17 @@ 
 
 static unsigned int symhash(const void *key)
 {
-	const char *p, *keyp;
-	unsigned int size;
-	unsigned int val;
-
-	val = 0;
-	keyp = key;
-	size = strlen(keyp);
-	for (p = keyp; (p - keyp) < size; p++)
-		val = (val << 4 | (val >> (8 * sizeof(unsigned int) - 4))) ^
-		      (*p);
-	return val;
+	/*
+	 * djb2a
+	 * Public domain from cdb v0.75
+	 */
+	unsigned int hash = 5381;
+	unsigned char c;
+
+	while ((c = *(const unsigned char *)key++))
+		hash = ((hash << 5) + hash) ^ c;
+
+	return hash;
 }
 
 static int symcmp(const void *key1, const void *key2)