Message ID | 20231111160954.45911-2-paul@paul-moore.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Paul Moore |
Headers | show |
Series | selinux: update filenametr_hash() to use full_name_hash() | expand |
On Sat, 11 Nov 2023 at 08:10, Paul Moore <paul@paul-moore.com> wrote: > > Using full_name_hash() instead of partial_name_hash() should result > in cleaner and better performing code. This looks obviously good to me, but I don't actually know what that hash is used for and whether it all matters. Also, looking at the actual use of this all, I suspect the *real* issue that started this all is that in hashtab_search(), we have hvalue = key_params.hash(key) & (h->size - 1); cur = h->htable[hvalue]; and it's that "& (h->size - 1)" thing in that hash search function that caused problems in the first place, and that made Christian then do that commit 37b7ea3ca306 ("selinux: improve role transition hashing") that in turn made me look at it all. That "just use the low bits of the hash" really ends up being the simplest model, but it's also the worst possible thing if the hash itself isn't spreading out all bits optimally. We have a much better function for this, so instead of doing "x & (size-1)" to get a hash value, you do "hash_32(x, bits)". But that requires 'bits' instead of 'size'. So here's a *TOTALLY UNTESTED* patch that tries to remove the use of "h->size' for the hash table size, and replace it with an almost-equivalent 'h->hbits' for the size of the table in bits. I say _almost_ equivalent, because I kept the "zero means no table" behavior, so it turns out that you cannot have a hash table of just one entry (h->hbits would be zero). So the minimal hash table size is 2 (hbits = 1). Honestly, I doubt that matters. Now, that actually makes this patch somewhat intrusive, because there were quite a bit of places that used h->size, for obvious reasons. Also, while at it, I removed the pattern that I absolutely *detest* that the list handling in the hashing code used, which was that disgusting "what is the previous pointer to be filled in" pattern. The trivial - and *horrible* - way to do it is prev = NULL; cur = h->htable[hvalue]; ... walk the list ... prev = cur; ... prev ? &prev->next : &h->htable[hvalue], which shows that people don't understand the power of pointers. The *proper* way to do this with pointers is: pprev = hashtab_entry(h, key, key_params); .. walk the list with (cur = *pprev) != NULL) ... pprev = &cur->next; ... pprev which doesn't have that unnecessary conditional for "is this the first entry". It not only generates better code, it shows that you understand pointers. Yes, this is the smallest hill I'll die on. It's literally a pet peeve of mine. I also moved variable declarations into the blocks where they are used, rather than at the top level. Now, I want to state this very clearly once again: this attached patch is ENTIRELY UNTESTED. It might have some completely stupid thinko in it, and may be entirely broken. I test-compiled it, and I've looked through it a couple of times for sanity, and I do think it's better to keep "hbits" around instead of "size", but I'm not going to force this on you. Anyway, I guess I should test this, but here is that untested patch if you want to consider it. Linus
On Sat, 11 Nov 2023 at 13:52, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Now, I want to state this very clearly once again: this attached patch > is ENTIRELY UNTESTED. It might have some completely stupid thinko in > it, and may be entirely broken. Well, it boots for me, with selinux enabled. Not that I tested any actual selinux functionality outside of my normal desktop being active. So it might still be entirely broken, but it's hopefully not going to do actively bad things to your system. I assume you have some test suite that you could test this against.. Linus
On Sat, Nov 11, 2023 at 4:53 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Sat, 11 Nov 2023 at 08:10, Paul Moore <paul@paul-moore.com> wrote: > > > > Using full_name_hash() instead of partial_name_hash() should result > > in cleaner and better performing code. > > This looks obviously good to me, but I don't actually know what that > hash is used for and whether it all matters. Based on a (very) quick look, it seems like full_name_hash() shouldn't really be any worse than partial_name_hash() with the advantage of being a bit faster in some cases and cleaner from a code perspective so it's a win from my perspective even if the performance remains completely the same. I haven't seen any objections so I'm going to merge it now. > Also, looking at the actual use of this all ... > > ... > > So here's a *TOTALLY UNTESTED* patch that tries to remove the use of > "h->size' for the hash table size, and replace it with an > almost-equivalent 'h->hbits' for the size of the table in bits. Unfortunately I'm pretty swamped at the moment digging out of maintainer backlog stuff so it might take me a while to get to properly reviewing this, but I'll add this patch to the todo pile. Christian, would you happen to have any time or interest in reviewing Linus' patch and giving it a good test? [Side note for Linus, yes, we do have a somewhat simple test suite for quick regression testing, if you're interested have a look at https://github.com/SELinuxProject/selinux-testsuite] > Also, while at it, I removed the pattern that I absolutely *detest* > that the list handling in the hashing code used, which was that > disgusting "what is the previous pointer to be filled in" pattern. > > The trivial - and *horrible* - way to do it is > > prev = NULL; > cur = h->htable[hvalue]; > ... walk the list ... > prev = cur; > ... > prev ? &prev->next : &h->htable[hvalue], > > which shows that people don't understand the power of pointers. > > The *proper* way to do this with pointers is: > > pprev = hashtab_entry(h, key, key_params); > .. walk the list with (cur = *pprev) != NULL) ... > pprev = &cur->next; > ... > pprev > > which doesn't have that unnecessary conditional for "is this the first > entry". It not only generates better code, it shows that you > understand pointers. > > Yes, this is the smallest hill I'll die on. It's literally a pet peeve of mine. That's fine by me. I can't say it bothers me to the same level as you, but I will admit to having my own annoyances, I think we all do. > I also moved variable declarations into the blocks where they are > used, rather than at the top level. > > Now, I want to state this very clearly once again: this attached patch > is ENTIRELY UNTESTED. It might have some completely stupid thinko in > it, and may be entirely broken. > > I test-compiled it, and I've looked through it a couple of times for > sanity, and I do think it's better to keep "hbits" around instead of > "size", but I'm not going to force this on you. > > Anyway, I guess I should test this, but here is that untested patch if > you want to consider it. Thanks again for the patch. I'm going to post it quickly to the SELinux list as an inline patch so patchwork will pick it up and it doesn't get lost in my inbox, feel free to ignore the repost.
diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c index 2d528f699a22..0ab3b5429fcf 100644 --- a/security/selinux/ss/policydb.c +++ b/security/selinux/ss/policydb.c @@ -409,16 +409,9 @@ static int roles_init(struct policydb *p) static u32 filenametr_hash(const void *k) { const struct filename_trans_key *ft = k; - unsigned long hash; - unsigned int byte_num; - unsigned char focus; + unsigned long salt = ft->ttype ^ ft->tclass; - hash = ft->ttype ^ ft->tclass; - - byte_num = 0; - while ((focus = ft->name[byte_num++])) - hash = partial_name_hash(focus, hash); - return hash; + return full_name_hash((void *)salt, ft->name, strlen(ft->name)); } static int filenametr_cmp(const void *k1, const void *k2)
Using full_name_hash() instead of partial_name_hash() should result in cleaner and better performing code. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Moore <paul@paul-moore.com> --- security/selinux/ss/policydb.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)