selinux: update filenametr_hash() to use full_name_hash()

Message ID	20231111160954.45911-2-paul@paul-moore.com (mailing list archive)
State	Accepted
Delegated to:	Paul Moore
Headers	show Return-Path: <selinux-owner@vger.kernel.org> From: Paul Moore <paul@paul-moore.com> To: selinux@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Subject: [PATCH] selinux: update filenametr_hash() to use full_name_hash() Date: Sat, 11 Nov 2023 11:09:55 -0500 Message-ID: <20231111160954.45911-2-paul@paul-moore.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	selinux: update filenametr_hash() to use full_name_hash() \| expand selinux: update filenametr_hash() to use full_name_hash()

Message ID

20231111160954.45911-2-paul@paul-moore.com (mailing list archive)

State

Accepted

Delegated to:

Paul Moore

Headers

From: Paul Moore <paul@paul-moore.com>
To: selinux@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH] selinux: update filenametr_hash() to use full_name_hash()
Date: Sat, 11 Nov 2023 11:09:55 -0500
Message-ID: <20231111160954.45911-2-paul@paul-moore.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

selinux: update filenametr_hash() to use full_name_hash() | expand

Commit Message

Paul Moore Nov. 11, 2023, 4:09 p.m. UTC

Using full_name_hash() instead of partial_name_hash() should result
in cleaner and better performing code.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 security/selinux/ss/policydb.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

Comments

Linus Torvalds Nov. 11, 2023, 9:52 p.m. UTC | #1

On Sat, 11 Nov 2023 at 08:10, Paul Moore <paul@paul-moore.com> wrote:
>
> Using full_name_hash() instead of partial_name_hash() should result
> in cleaner and better performing code.

This looks obviously good to me, but I don't actually know what that
hash is used for and whether it all matters.

Also, looking at the actual use of this all, I suspect the *real*
issue that started this all is that in hashtab_search(), we have

        hvalue = key_params.hash(key) & (h->size - 1);
        cur = h->htable[hvalue];

and it's that "& (h->size - 1)" thing in that hash search function
that caused problems in the first place, and that made Christian then
do that commit 37b7ea3ca306 ("selinux: improve role transition
hashing") that in turn made me look at it all.

That "just use the low bits of the hash" really ends up being the
simplest model, but it's also the worst possible thing if the hash
itself isn't spreading out all bits optimally.

We have a much better function for this, so instead of doing "x &
(size-1)" to get a hash value, you do "hash_32(x, bits)".

But that requires 'bits' instead of 'size'.

So here's a *TOTALLY UNTESTED* patch that tries to remove the use of
"h->size' for the hash table size, and replace it with an
almost-equivalent 'h->hbits' for the size of the table in bits.

I say _almost_ equivalent, because I kept the "zero means no table"
behavior, so it turns out that you cannot have a hash table of just
one entry (h->hbits would be zero). So the minimal hash table size is
2 (hbits = 1).

Honestly, I doubt that matters.

Now, that actually makes this patch somewhat intrusive, because there
were quite a bit of places that used h->size, for obvious reasons.

Also, while at it, I removed the pattern that I absolutely *detest*
that the list handling in the hashing code used, which was that
disgusting "what is the previous pointer to be filled in" pattern.

The trivial - and *horrible* - way to do it is

        prev = NULL;
        cur = h->htable[hvalue];
        ... walk the list ...
                prev = cur;
        ...
        prev ? &prev->next : &h->htable[hvalue],

which shows that people don't understand the power of pointers.

The *proper* way to do this with pointers is:

        pprev = hashtab_entry(h, key, key_params);
        .. walk the list with (cur = *pprev) != NULL) ...
                pprev = &cur->next;
        ...
        pprev

which doesn't have that unnecessary conditional for "is this the first
entry". It not only generates better code, it shows that you
understand pointers.

Yes, this is the smallest hill I'll die on. It's literally a pet peeve of mine.

I also moved variable declarations into the blocks where they are
used, rather than at the top level.

Now, I want to state this very clearly once again: this attached patch
is ENTIRELY UNTESTED.  It might have some completely stupid thinko in
it, and may be entirely broken.

I test-compiled it, and I've looked through it a couple of times for
sanity, and I do think it's better to keep "hbits" around instead of
"size", but I'm not going to force this on you.

Anyway, I guess I should test this, but here is that untested patch if
you want to consider it.

               Linus

Linus Torvalds Nov. 11, 2023, 10:37 p.m. UTC | #2

On Sat, 11 Nov 2023 at 13:52, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Now, I want to state this very clearly once again: this attached patch
> is ENTIRELY UNTESTED.  It might have some completely stupid thinko in
> it, and may be entirely broken.

Well, it boots for me, with selinux enabled. Not that I tested any
actual selinux functionality outside of my normal desktop being
active.

So it might still be entirely broken, but it's hopefully not going to
do actively bad things to your system. I assume you have some test
suite that you could test this against..

             Linus

Paul Moore Nov. 14, 2023, 6:08 a.m. UTC | #3

On Sat, Nov 11, 2023 at 4:53 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, 11 Nov 2023 at 08:10, Paul Moore <paul@paul-moore.com> wrote:
> >
> > Using full_name_hash() instead of partial_name_hash() should result
> > in cleaner and better performing code.
>
> This looks obviously good to me, but I don't actually know what that
> hash is used for and whether it all matters.

Based on a (very) quick look, it seems like full_name_hash() shouldn't
really be any worse than partial_name_hash() with the advantage of
being a bit faster in some cases and cleaner from a code perspective
so it's a win from my perspective even if the performance remains
completely the same.

I haven't seen any objections so I'm going to merge it now.

> Also, looking at the actual use of this all ...
>
> ...
>
> So here's a *TOTALLY UNTESTED* patch that tries to remove the use of
> "h->size' for the hash table size, and replace it with an
> almost-equivalent 'h->hbits' for the size of the table in bits.

Unfortunately I'm pretty swamped at the moment digging out of
maintainer backlog stuff so it might take me a while to get to
properly reviewing this, but I'll add this patch to the todo pile.

Christian, would you happen to have any time or interest in reviewing
Linus' patch and giving it a good test?

[Side note for Linus, yes, we do have a somewhat simple test suite for
quick regression testing, if you're interested have a look at
https://github.com/SELinuxProject/selinux-testsuite]

> Also, while at it, I removed the pattern that I absolutely *detest*
> that the list handling in the hashing code used, which was that
> disgusting "what is the previous pointer to be filled in" pattern.
>
> The trivial - and *horrible* - way to do it is
>
>         prev = NULL;
>         cur = h->htable[hvalue];
>         ... walk the list ...
>                 prev = cur;
>         ...
>         prev ? &prev->next : &h->htable[hvalue],
>
> which shows that people don't understand the power of pointers.
>
> The *proper* way to do this with pointers is:
>
>         pprev = hashtab_entry(h, key, key_params);
>         .. walk the list with (cur = *pprev) != NULL) ...
>                 pprev = &cur->next;
>         ...
>         pprev
>
> which doesn't have that unnecessary conditional for "is this the first
> entry". It not only generates better code, it shows that you
> understand pointers.
>
> Yes, this is the smallest hill I'll die on. It's literally a pet peeve of mine.

That's fine by me.  I can't say it bothers me to the same level as
you, but I will admit to having my own annoyances, I think we all do.

> I also moved variable declarations into the blocks where they are
> used, rather than at the top level.
>
> Now, I want to state this very clearly once again: this attached patch
> is ENTIRELY UNTESTED.  It might have some completely stupid thinko in
> it, and may be entirely broken.
>
> I test-compiled it, and I've looked through it a couple of times for
> sanity, and I do think it's better to keep "hbits" around instead of
> "size", but I'm not going to force this on you.
>
> Anyway, I guess I should test this, but here is that untested patch if
> you want to consider it.

Thanks again for the patch.  I'm going to post it quickly to the
SELinux list as an inline patch so patchwork will pick it up and it
doesn't get lost in my inbox, feel free to ignore the repost.

diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
index 2d528f699a22..0ab3b5429fcf 100644
--- a/security/selinux/ss/policydb.c
+++ b/security/selinux/ss/policydb.c
@@ -409,16 +409,9 @@  static int roles_init(struct policydb *p)
 static u32 filenametr_hash(const void *k)
 {
 	const struct filename_trans_key *ft = k;
-	unsigned long hash;
-	unsigned int byte_num;
-	unsigned char focus;
+	unsigned long salt = ft->ttype ^ ft->tclass;
 
-	hash = ft->ttype ^ ft->tclass;
-
-	byte_num = 0;
-	while ((focus = ft->name[byte_num++]))
-		hash = partial_name_hash(focus, hash);
-	return hash;
+	return full_name_hash((void *)salt, ft->name, strlen(ft->name));
 }
 
 static int filenametr_cmp(const void *k1, const void *k2)

selinux: update filenametr_hash() to use full_name_hash()

Commit Message

Comments

Patch