diff mbox

[2/2] radix-tree: Fix optimisation problem

Message ID CA+55aFw9=wqyA4xO1KKJoH7xsj6poWFrWTddcNBR5tkDOn8SYg@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Linus Torvalds Sept. 25, 2016, 7:56 p.m. UTC
On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>        It gets rid of
> the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
> be inside the is_sibling_entry() logic instead. Which got renamed and
> made to actually return the main sibling.

Sadly, it looks like gcc generates bad code for this approach. Looks
like it ends up testing the resulting sibling pointer twice (because
we explicitly disable -fno-delete-null-pointer-checks in the kernel,
and we have no way to say "look, I know this pointer I'm returning is
non-null").

So a smaller patch that keeps the old boolean "is_sibling_entry()" but
then actually *uses* that inside radix_tree_descend() and then tries
to make the nasty cast to "void **" more legible by making it use a
temporary variable seems to be a reasonable balance.

At least I feel like I can still read the code, but admittedly by now
that may be because I've stared at those few lines so much that I feel
like I know what's going on. So maybe the code isn't actually any more
legible after all.

.. and unlike my previous patch, it actually generates better code
than the original (while still passing the fixed test-suite, of
course). The reason seems to be exactly that temporary variable,
allowing us to just do

        entry = rcu_dereference_raw(*sibentry);

rather than doing

        entry = rcu_dereference_raw(parent->slots[offset]);

with the re-computed offset.

So I think I'll commit this unless somebody screams.

                     Linus

Comments

Matthew Wilcox Sept. 26, 2016, 9:28 p.m. UTC | #1
From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds

> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds

> <torvalds@linux-foundation.org> wrote:

> >        It gets rid of

> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that

> > be inside the is_sibling_entry() logic instead. Which got renamed and

> > made to actually return the main sibling.

> 

> Sadly, it looks like gcc generates bad code for this approach. Looks

> like it ends up testing the resulting sibling pointer twice (because

> we explicitly disable -fno-delete-null-pointer-checks in the kernel,

> and we have no way to say "look, I know this pointer I'm returning is

> non-null").

> 

> So a smaller patch that keeps the old boolean "is_sibling_entry()" but

> then actually *uses* that inside radix_tree_descend() and then tries

> to make the nasty cast to "void **" more legible by making it use a

> temporary variable seems to be a reasonable balance.

> 

> At least I feel like I can still read the code, but admittedly by now

> that may be because I've stared at those few lines so much that I feel

> like I know what's going on. So maybe the code isn't actually any more

> legible after all.

> 

> .. and unlike my previous patch, it actually generates better code

> than the original (while still passing the fixed test-suite, of

> course). The reason seems to be exactly that temporary variable,

> allowing us to just do

> 

>         entry = rcu_dereference_raw(*sibentry);

> 

> rather than doing

> 

>         entry = rcu_dereference_raw(parent->slots[offset]);

> 

> with the re-computed offset.

> 

> So I think I'll commit this unless somebody screams.


Acked-by: Matthew Wilcox <mawilcox@microsoft.com>


I don't love it.  But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot.

(Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).
Cedric Blancher Sept. 26, 2016, 9:48 p.m. UTC | #2
You might also try to use valid, plain ISO C99 instead of perverted
gcc extensions which only cause a lot of trouble in the long run.

Ced

On 26 September 2016 at 23:28, Matthew Wilcox <mawilcox@microsoft.com> wrote:
> From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus Torvalds
>> On Sun, Sep 25, 2016 at 12:04 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >        It gets rid of
>> > the ad-hoc arithmetic in radix_tree_descend(), and just makes all that
>> > be inside the is_sibling_entry() logic instead. Which got renamed and
>> > made to actually return the main sibling.
>>
>> Sadly, it looks like gcc generates bad code for this approach. Looks
>> like it ends up testing the resulting sibling pointer twice (because
>> we explicitly disable -fno-delete-null-pointer-checks in the kernel,
>> and we have no way to say "look, I know this pointer I'm returning is
>> non-null").
>>
>> So a smaller patch that keeps the old boolean "is_sibling_entry()" but
>> then actually *uses* that inside radix_tree_descend() and then tries
>> to make the nasty cast to "void **" more legible by making it use a
>> temporary variable seems to be a reasonable balance.
>>
>> At least I feel like I can still read the code, but admittedly by now
>> that may be because I've stared at those few lines so much that I feel
>> like I know what's going on. So maybe the code isn't actually any more
>> legible after all.
>>
>> .. and unlike my previous patch, it actually generates better code
>> than the original (while still passing the fixed test-suite, of
>> course). The reason seems to be exactly that temporary variable,
>> allowing us to just do
>>
>>         entry = rcu_dereference_raw(*sibentry);
>>
>> rather than doing
>>
>>         entry = rcu_dereference_raw(parent->slots[offset]);
>>
>> with the re-computed offset.
>>
>> So I think I'll commit this unless somebody screams.
>
> Acked-by: Matthew Wilcox <mawilcox@microsoft.com>
>
> I don't love it.  But I think it's a reasonable fix for this point in the release cycle, and I have an idea for changing the representation of sibling slots that will make this moot.
>
> (Basically adopting Konstantin's idea for using the *last* entry instead of the *first*, and then using entries of the form (offset << 2 | RADIX_TREE_INTERNAL_NODE), so we can identify sibling entries without knowing the parent pointer, and we can go straight from sibling entry to slot offset as a shift rather than as a pointer subtraction).
diff mbox

Patch

 lib/radix-tree.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf7314141..91f0727e3cad 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -105,10 +105,10 @@  static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
 	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
-		if (siboff < RADIX_TREE_MAP_SIZE) {
-			offset = siboff;
-			entry = rcu_dereference_raw(parent->slots[offset]);
+		if (is_sibling_entry(parent, entry)) {
+			void **sibentry = (void **) entry_to_node(entry);
+			offset = get_slot_offset(parent, sibentry);
+			entry = rcu_dereference_raw(*sibentry);
 		}
 	}
 #endif