Message ID | 20230818151220.166215-5-cgzones@googlemail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Paul Moore |
Headers | show |
Series | [1/6] selinux: print sum of chain lengths^2 for hash tables | expand |
On Fri, Aug 18, 2023 at 11:12 AM Christian Göttsche <cgzones@googlemail.com> wrote: > > The number of buckets is calculated by performing a binary AND against > the mask of the hash table, which is one less than its size (which is a > power of two). This leads to all top bits being discarded, requiring > for short or similar inputs a hash function with a good avalanche > effect. > > Use djb2a: > > # current > common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 > classes: 134 entries and 100/256 buckets used, longest chain length 5, sum of chain length^2 234 > roles: 15 entries and 6/16 buckets used, longest chain length 5, sum of chain length^2 57 > types: 4448 entries and 3016/8192 buckets used, longest chain length 41, sum of chain length^2 14922 > users: 7 entries and 3/8 buckets used, longest chain length 3, sum of chain length^2 17 > bools: 306 entries and 221/512 buckets used, longest chain length 4, sum of chain length^2 524 > levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 > categories: 1024 entries and 400/1024 buckets used, longest chain length 4, sum of chain length^2 2740 > > # patch > common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 > classes: 134 entries and 101/256 buckets used, longest chain length 3, sum of chain length^2 210 > roles: 15 entries and 9/16 buckets used, longest chain length 3, sum of chain length^2 31 > types: 4448 entries and 3459/8192 buckets used, longest chain length 5, sum of chain length^2 6778 > users: 7 entries and 5/8 buckets used, longest chain length 3, sum of chain length^2 13 > bools: 306 entries and 236/512 buckets used, longest chain length 5, sum of chain length^2 470 > levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 > categories: 1024 entries and 518/1024 buckets used, longest chain length 7, sum of chain length^2 2992 > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com> > --- > security/selinux/ss/symtab.c | 18 ++++++++---------- > 1 file changed, 8 insertions(+), 10 deletions(-) > > diff --git a/security/selinux/ss/symtab.c b/security/selinux/ss/symtab.c > index 43d7f0319ccd..b6761b96eee4 100644 > --- a/security/selinux/ss/symtab.c > +++ b/security/selinux/ss/symtab.c > @@ -11,16 +11,14 @@ > > static unsigned int symhash(const void *key) > { > - const char *p, *keyp; > - unsigned int size; > - unsigned int val; > - > - val = 0; > - keyp = key; > - size = strlen(keyp); > - for (p = keyp; (p - keyp) < size; p++) > - val = (val << 4 | (val >> (8*sizeof(unsigned int)-4))) ^ (*p); > - return val; > + /* djb2a */ Do we need/want something that specifies the author/license (I assume public domain) of this code? > + unsigned int hash = 5381; > + unsigned char c; > + > + while ((c = *(const unsigned char *)key++)) > + hash = ((hash << 5) + hash) ^ c; > + > + return hash; > } > > static int symcmp(const void *key1, const void *key2) > -- > 2.40.1 >
On Fri, Aug 18, 2023 at 11:12 AM Christian Göttsche <cgzones@googlemail.com> wrote: > > The number of buckets is calculated by performing a binary AND against > the mask of the hash table, which is one less than its size (which is a > power of two). This leads to all top bits being discarded, requiring > for short or similar inputs a hash function with a good avalanche > effect. > > Use djb2a: > > # current > common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 > classes: 134 entries and 100/256 buckets used, longest chain length 5, sum of chain length^2 234 > roles: 15 entries and 6/16 buckets used, longest chain length 5, sum of chain length^2 57 > types: 4448 entries and 3016/8192 buckets used, longest chain length 41, sum of chain length^2 14922 > users: 7 entries and 3/8 buckets used, longest chain length 3, sum of chain length^2 17 > bools: 306 entries and 221/512 buckets used, longest chain length 4, sum of chain length^2 524 > levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 > categories: 1024 entries and 400/1024 buckets used, longest chain length 4, sum of chain length^2 2740 > > # patch > common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 > classes: 134 entries and 101/256 buckets used, longest chain length 3, sum of chain length^2 210 > roles: 15 entries and 9/16 buckets used, longest chain length 3, sum of chain length^2 31 > types: 4448 entries and 3459/8192 buckets used, longest chain length 5, sum of chain length^2 6778 > users: 7 entries and 5/8 buckets used, longest chain length 3, sum of chain length^2 13 > bools: 306 entries and 236/512 buckets used, longest chain length 5, sum of chain length^2 470 > levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 > categories: 1024 entries and 518/1024 buckets used, longest chain length 7, sum of chain length^2 2992 > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Assuming this is in fact public domain, Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
On Thu, Sep 7, 2023 at 1:44 PM Stephen Smalley <stephen.smalley.work@gmail.com> wrote: > On Fri, Aug 18, 2023 at 11:12 AM Christian Göttsche > <cgzones@googlemail.com> wrote: > > > > The number of buckets is calculated by performing a binary AND against > > the mask of the hash table, which is one less than its size (which is a > > power of two). This leads to all top bits being discarded, requiring > > for short or similar inputs a hash function with a good avalanche > > effect. > > > > Use djb2a: > > > > # current > > common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 > > classes: 134 entries and 100/256 buckets used, longest chain length 5, sum of chain length^2 234 > > roles: 15 entries and 6/16 buckets used, longest chain length 5, sum of chain length^2 57 > > types: 4448 entries and 3016/8192 buckets used, longest chain length 41, sum of chain length^2 14922 > > users: 7 entries and 3/8 buckets used, longest chain length 3, sum of chain length^2 17 > > bools: 306 entries and 221/512 buckets used, longest chain length 4, sum of chain length^2 524 > > levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 > > categories: 1024 entries and 400/1024 buckets used, longest chain length 4, sum of chain length^2 2740 > > > > # patch > > common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 > > classes: 134 entries and 101/256 buckets used, longest chain length 3, sum of chain length^2 210 > > roles: 15 entries and 9/16 buckets used, longest chain length 3, sum of chain length^2 31 > > types: 4448 entries and 3459/8192 buckets used, longest chain length 5, sum of chain length^2 6778 > > users: 7 entries and 5/8 buckets used, longest chain length 3, sum of chain length^2 13 > > bools: 306 entries and 236/512 buckets used, longest chain length 5, sum of chain length^2 470 > > levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 > > categories: 1024 entries and 518/1024 buckets used, longest chain length 7, sum of chain length^2 2992 > > > > Signed-off-by: Christian Göttsche <cgzones@googlemail.com> > > --- > > security/selinux/ss/symtab.c | 18 ++++++++---------- > > 1 file changed, 8 insertions(+), 10 deletions(-) > > > > diff --git a/security/selinux/ss/symtab.c b/security/selinux/ss/symtab.c > > index 43d7f0319ccd..b6761b96eee4 100644 > > --- a/security/selinux/ss/symtab.c > > +++ b/security/selinux/ss/symtab.c > > @@ -11,16 +11,14 @@ > > > > static unsigned int symhash(const void *key) > > { > > - const char *p, *keyp; > > - unsigned int size; > > - unsigned int val; > > - > > - val = 0; > > - keyp = key; > > - size = strlen(keyp); > > - for (p = keyp; (p - keyp) < size; p++) > > - val = (val << 4 | (val >> (8*sizeof(unsigned int)-4))) ^ (*p); > > - return val; > > + /* djb2a */ > > Do we need/want something that specifies the author/license (I assume > public domain) of this code? It would be a good idea, yes. I spent some time looking around, and while there are many references to djb2 (and the djb2a xor version), I had a hard time finding an explicit license statement from DJB. However, in the cdb v0.75 release there is a source file, cdb_hash.c, which has effectively the same hash (same algorithm, different implementation) with a "/* Public domain. */" statement at the top. IANAL, but I think that is good enough. Christian, can you update this patch comments with this info? http://cr.yp.to/cdb.html > > + unsigned int hash = 5381; > > + unsigned char c; > > + > > + while ((c = *(const unsigned char *)key++)) > > + hash = ((hash << 5) + hash) ^ c; > > + > > + return hash; > > } > > > > static int symcmp(const void *key1, const void *key2) > > -- > > 2.40.1
diff --git a/security/selinux/ss/symtab.c b/security/selinux/ss/symtab.c index 43d7f0319ccd..b6761b96eee4 100644 --- a/security/selinux/ss/symtab.c +++ b/security/selinux/ss/symtab.c @@ -11,16 +11,14 @@ static unsigned int symhash(const void *key) { - const char *p, *keyp; - unsigned int size; - unsigned int val; - - val = 0; - keyp = key; - size = strlen(keyp); - for (p = keyp; (p - keyp) < size; p++) - val = (val << 4 | (val >> (8*sizeof(unsigned int)-4))) ^ (*p); - return val; + /* djb2a */ + unsigned int hash = 5381; + unsigned char c; + + while ((c = *(const unsigned char *)key++)) + hash = ((hash << 5) + hash) ^ c; + + return hash; } static int symcmp(const void *key1, const void *key2)
The number of buckets is calculated by performing a binary AND against the mask of the hash table, which is one less than its size (which is a power of two). This leads to all top bits being discarded, requiring for short or similar inputs a hash function with a good avalanche effect. Use djb2a: # current common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 classes: 134 entries and 100/256 buckets used, longest chain length 5, sum of chain length^2 234 roles: 15 entries and 6/16 buckets used, longest chain length 5, sum of chain length^2 57 types: 4448 entries and 3016/8192 buckets used, longest chain length 41, sum of chain length^2 14922 users: 7 entries and 3/8 buckets used, longest chain length 3, sum of chain length^2 17 bools: 306 entries and 221/512 buckets used, longest chain length 4, sum of chain length^2 524 levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 categories: 1024 entries and 400/1024 buckets used, longest chain length 4, sum of chain length^2 2740 # patch common prefixes: 7 entries and 5/8 buckets used, longest chain length 2, sum of chain length^2 11 classes: 134 entries and 101/256 buckets used, longest chain length 3, sum of chain length^2 210 roles: 15 entries and 9/16 buckets used, longest chain length 3, sum of chain length^2 31 types: 4448 entries and 3459/8192 buckets used, longest chain length 5, sum of chain length^2 6778 users: 7 entries and 5/8 buckets used, longest chain length 3, sum of chain length^2 13 bools: 306 entries and 236/512 buckets used, longest chain length 5, sum of chain length^2 470 levels: 1 entries and 1/1 buckets used, longest chain length 1, sum of chain length^2 1 categories: 1024 entries and 518/1024 buckets used, longest chain length 7, sum of chain length^2 2992 Signed-off-by: Christian Göttsche <cgzones@googlemail.com> --- security/selinux/ss/symtab.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-)