
[v12,09/11] pvqspinlock, x86: Add para-virtualization support

Message ID 20141024085437.GV21513@worktop.programming.kicks-ass.net (mailing list archive)
State New, archived

Commit Message

Peter Zijlstra Oct. 24, 2014, 8:54 a.m. UTC
On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote:

> Since enabling paravirt spinlocks disables unlock function inlining,
> a jump label can be added to the unlock function without adding patch
> sites all over the kernel.

But you don't have to. My patches allowed for the inline to remain,
again reducing the overhead of enabling PV spinlocks while running on a
real machine.

Look at: 

  http://lkml.kernel.org/r/20140615130154.213923590@chello.nl

In particular, look at the hunk adding DEF_NATIVE(pv_lock_ops,
queue_unlock, ...) and its PATCH_SITE() entry (the diff is reproduced in
full in the Patch section at the bottom of this page).

That makes sure to overwrite the callee-saved call to
pv_lock_ops::queue_unlock with the immediate asm "movb $0, (%rdi)".

Therefore you can retain the inlined unlock with hardly any overhead at
all (there might be some NOP padding). On PV it reverts to a
callee-saved function call.
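
For completeness, that "movb $0, (%rdi)" is the entire native unlock:
queued spinlocks keep the lock state in a single byte, so unlocking is
just clearing it. A rough C sketch of what the patched-in instruction
implements (names assumed from the qspinlock series, illustrative only):

	static __always_inline void native_queue_unlock(struct qspinlock *lock)
	{
		/*
		 * A plain byte store is enough on x86; barrier() merely
		 * stops the compiler from reordering the critical
		 * section past the unlock.
		 */
		barrier();
		ACCESS_ONCE(*(u8 *)lock) = 0;	/* the "movb $0, (%rdi)" */
	}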
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Waiman Long Oct. 27, 2014, 5:38 p.m. UTC | #1
On 10/24/2014 04:54 AM, Peter Zijlstra wrote:
> [... full quote of the message above elided ...]

My concern is that spin_unlock() can be called in many places, including
loadable kernel modules. Is the paravirt_patch_ident_32() function able
to patch all of them in a reasonable time? What about a kernel module
loaded later at run time?

So I think we may still need to disable unlock function inlining even if
we use your way of patching kernel call sites.

Regards,
Longman
Konrad Rzeszutek Wilk Oct. 27, 2014, 6:02 p.m. UTC | #2
On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote:
> On 10/24/2014 04:54 AM, Peter Zijlstra wrote:
> >[...]
> 
> My concern is that spin_unlock() can be called in many places, including
> loadable kernel modules. Is the paravirt_patch_ident_32() function able
> to patch all of them in a reasonable time? What about a kernel module
> loaded later at run time?

It has to. When the modules are loaded, the .paravirt symbols are
exposed and the module loader patches them.

And during boot (before modules are loaded) it also patches everything -
while it is still running on only one CPU.
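
Roughly, the patch loop looks like this (abridged from that era's
arch/x86/kernel/alternative.c, sanity checks dropped):

	void apply_paravirt(struct paravirt_patch_site *start,
			    struct paravirt_patch_site *end)
	{
		struct paravirt_patch_site *p;
		char insnbuf[MAX_PATCH_LEN];

		for (p = start; p < end; p++) {
			unsigned int used;

			/* start from the original instructions at the site */
			memcpy(insnbuf, p->instr, p->len);
			used = pv_init_ops.patch(p->instrtype, p->clobbers,
						 insnbuf,
						 (unsigned long)p->instr,
						 p->len);

			/* pad the leftover bytes with NOPs, write it back */
			add_nops(insnbuf + used, p->len - used);
			text_poke_early(p->instr, insnbuf, p->len);
		}
	}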
> So I think we may still need to disable unlock function inlining even
> if we use your way of patching kernel call sites.

No need. Inlining should be (and is) working just fine.
> 
> Regards,
> Longman
Peter Zijlstra Oct. 27, 2014, 6:04 p.m. UTC | #3
On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote:
> On 10/24/2014 04:54 AM, Peter Zijlstra wrote:
> >[...]
> 
> My concern is that spin_unlock() can be called in many places, including
> loadable kernel modules. Is the paravirt_patch_ident_32() function able
> to patch all of them in a reasonable time? What about a kernel module
> loaded later at run time?

modules should be fine, see arch/x86/kernel/module.c:module_finalize()
-> apply_paravirt().

Also note that the 'default' text is an indirect call into the paravirt
ops table which routes to the 'right' function, so even if the text
patching were 'late', calls would still 'work' as expected, just slower.
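
That is, something like this runs on every module load (abridged from
arch/x86/kernel/module.c; the real function also handles alternatives,
SMP lock prefixes, etc.):

	int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
			    struct module *me)
	{
		const Elf_Shdr *s, *para = NULL;
		char *secstrings = (char *)hdr +
			sechdrs[hdr->e_shstrndx].sh_offset;

		/* find the module's .parainstructions section, if any */
		for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
			if (!strcmp(".parainstructions",
				    secstrings + s->sh_name))
				para = s;
		}

		/* patch the module's pv call sites like the core kernel's */
		if (para) {
			void *pseg = (void *)para->sh_addr;
			apply_paravirt(pseg, pseg + para->sh_size);
		}

		return 0;
	}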
Waiman Long Oct. 27, 2014, 8:55 p.m. UTC | #4
On 10/27/2014 02:02 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote:
>> [...]
> It has to. When the modules are loaded, the .paravirt symbols are
> exposed and the module loader patches them.
>
> And during boot (before modules are loaded) it also patches everything
> - while it is still running on only one CPU.
>> So I think we may still need to disable unlock function inlining even
>> if we use your way of patching kernel call sites.
> No need. Inlining should be (and is) working just fine.

Thanks for letting me know about the paravirt patching capability
available in the kernel. In this case, I would say we should use Peter's
way of doing the unlock without disabling unlock function inlining. That
will further reduce the performance difference between kernels with and
without PV.

Cheers,
Longman
Waiman Long Oct. 27, 2014, 9:22 p.m. UTC | #5
On 10/27/2014 02:04 PM, Peter Zijlstra wrote:
> On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote:
>> [...]
> modules should be fine, see arch/x86/kernel/module.c:module_finalize()
> -> apply_paravirt().
>
> Also note that the 'default' text is an indirect call into the paravirt
> ops table which routes to the 'right' function, so even if the text
> patching were 'late', calls would still 'work' as expected, just slower.

Thanks for letting me know about that. I had this concern because your
patch didn't change the current configuration of disabling unlock
inlining when paravirt spinlocks are enabled. With that capability, I
think it is worthwhile to reduce the performance delta between the PV
and non-PV kernels on bare metal.

-Longman
Waiman Long Oct. 29, 2014, 7:05 p.m. UTC | #6
On 10/27/2014 05:22 PM, Waiman Long wrote:
> On 10/27/2014 02:04 PM, Peter Zijlstra wrote:
>> [...]
>
> Thanks for letting me know about that. I had this concern because your
> patch didn't change the current configuration of disabling unlock
> inlining when paravirt spinlocks are enabled. With that capability, I
> think it is worthwhile to reduce the performance delta between the PV
> and non-PV kernels on bare metal.

I am sorry to report that the unlock call site patching code doesn't
work in a virtual guest. Your pvqspinlock patch did the patching
unconditionally, even in a virtual guest. I added a check for
paravirt_spinlocks_enabled, but it turned out that some spin_unlock()
call sites seemed to be patched before paravirt_spinlocks_enabled was
set. As a result, some call sites were still patched with the native
code, resulting in missed wake-ups and a system hang.

At this point, I am going to leave that change out of my patch set until
we can figure out a better way of doing it.
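
One idea (just a sketch, not tested; names are illustrative) would be to
key the patching decision off the op itself rather than off a separate
flag, so that a guest which installs its pv unlock handler before the
patcher runs keeps the callee-saved call:

	case PARAVIRT_PATCH(pv_lock_ops.queue_unlock):
		/*
		 * Only inline the native "movb $0, (%rdi)" when the op
		 * still points at the native unlock; the
		 * paravirt_spinlocks_enabled flag may not be set yet
		 * when early call sites get patched.
		 */
		if (pv_lock_ops.queue_unlock.func ==
		    __raw_callee_save_native_queue_unlock) {
			start = start_pv_lock_ops_queue_unlock;
			end   = end_pv_lock_ops_queue_unlock;
			goto patch_site;
		}
		break;

Of course, that only helps if the guest's pv_lock_ops are installed
before native_patch() runs over the affected call sites.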

-Longman
Waiman Long Oct. 29, 2014, 8:25 p.m. UTC | #7
On 10/29/2014 03:05 PM, Waiman Long wrote:
> On 10/27/2014 05:22 PM, Waiman Long wrote:
>> [...]
>
> I am sorry to report that the unlock call site patching code doesn't
> work in a virtual guest. [...] As a result, some call sites were still
> patched with the native code, resulting in missed wake-ups and a system
> hang.
>
> At this point, I am going to leave that change out of my patch set
> until we can figure out a better way of doing it.

Below is a partial kernel log with the unlock call site patching code in
a KVM guest:

[    0.438006] native_patch: patch out pv_queue_unlock!
[    0.438565] native_patch: patch out pv_queue_unlock!
[    0.439006] native_patch: patch out pv_queue_unlock!
[    0.439638] native_patch: patch out pv_queue_unlock!
[    0.440052] native_patch: patch out pv_queue_unlock!
[    0.441006] native_patch: patch out pv_queue_unlock!
[    0.441566] native_patch: patch out pv_queue_unlock!
[    0.442035] ftrace: allocating 24168 entries in 95 pages
[    0.451208] Switched APIC routing to physical flat.
[    0.453202] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.454002] smpboot: CPU0: Intel QEMU Virtual CPU version 1.5.3 (fam: 06, model: 06, stepping: 03)
[    0.456000] Performance Events: Broken PMU hardware detected, using software events only.
[    0.456003] Failed to access perfctr msr (MSR c1 is 0)
[    0.457151] KVM setup paravirtual spinlock
[    0.460039] NMI watchdog: disabled (cpu0): hardware events not enabled

As can be seen, some unlock call sites were patched before the KVM setup
code set the paravirt_spinlocks_enabled flag.

-Longman

Waiman Long Nov. 26, 2014, 12:33 a.m. UTC | #8
On 10/27/2014 02:02 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Oct 27, 2014 at 01:38:20PM -0400, Waiman Long wrote:
>>
>> My concern is that spin_unlock() can be called in many places, including
>> loadable kernel modules. Is the paravirt_patch_ident_32() function able
>> to patch all of them in a reasonable time? What about a kernel module
>> loaded later at run time?
> It has to. When the modules are loaded, the .paravirt symbols are
> exposed and the module loader patches them.
>
> And during boot (before modules are loaded) it also patches everything
> - while it is still running on only one CPU.
>

I have been changing the patching code to patch the unlock call sites,
and it seems to be working now. However, when I manually inserted a
kernel module using insmod and ran the code in the newly inserted
module, I got a memory access violation as follows:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 18d62f3067 PUD 18d476f067 PMD 0
Oops: 0010 [#1] SMP
Modules linked in: locktest(OE) ebtable_nat ebtables xt_CHECKSUM 
iptable_mangle bridge autofs4 8021q garp stp llc ipt_REJECT 
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT 
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter 
ip6_tables ipv6 vhost_net macvtap macvlan vhost tun uinput ppdev 
parport_pc parport sg microcode pcspkr virtio_balloon 
snd_hda_codec_generic virtio_console snd_hda_intel snd_hda_controller 
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd 
soundcore virtio_net i2c_piix4 i2c_core ext4(E) jbd2(E) mbcache(E) 
floppy(E) virtio_blk(E) sr_mod(E) cdrom(E) virtio_pci(E) virtio_ring(E) 
virtio(E) pata_acpi(E) ata_generic(E) ata_piix(E) dm_mirror(E) 
dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: speedstep_lib]
CPU: 1 PID: 3907 Comm: run-locktest Tainted: G        W  OE  
3.17.0-pvqlock #3
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
task: ffff8818cc5baf90 ti: ffff8818b7094000 task.ti: ffff8818b7094000
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff8818b7097db0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000004c4b40 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8818d3f052c0
RBP: ffff8818b7097dd8 R08: 0000000080522014 R09: 0000000000000000
R10: 0000000000001000 R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000001 R15: ffff8818b7097ea0
FS:  00007fb828ece700(0000) GS:ffff88193ec20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000018cc7e9000 CR4: 00000000000006e0
Stack:
  ffffffffa06ff395 ffff8818d465e000 ffffffff8164bec0 0000000000000001
  0000000000000050 ffff8818b7097e18 ffffffffa06ff785 ffff8818b7097e38
  0000000000000246 0000000054755e3a 0000000039f8ba72 ffff8818c174f000
Call Trace:
  [<ffffffffa06ff395>] ? test_spinlock+0x65/0x90 [locktest]
  [<ffffffffa06ff785>] etime_show+0xd5/0x120 [locktest]
  [<ffffffff812a2dc6>] kobj_attr_show+0x16/0x20
  [<ffffffff8121a7fa>] sysfs_kf_seq_show+0xca/0x1b0
  [<ffffffff81218a13>] kernfs_seq_show+0x23/0x30
  [<ffffffff811c82db>] seq_read+0xbb/0x400
  [<ffffffff812197e5>] kernfs_fop_read+0x35/0x40
  [<ffffffff811a4223>] vfs_read+0xa3/0x110
  [<ffffffff811a47e6>] SyS_read+0x56/0xd0
  [<ffffffff810f3e16>] ? __audit_syscall_exit+0x216/0x2c0
  [<ffffffff815b3ca9>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
  RSP <ffff8818b7097db0>
CR2: 0000000000000000
---[ end trace 69d0e259c9ec632f ]---

It seems like the call site patching isn't properly done, or the kernel
module that I built was missing some critical information necessary for
proper linking. Anyway, I will include the unlock call site patching
code as a separate patch, as it seems there may be problems under
certain circumstances.

BTW, the kernel panic problem that your team reported has been fixed.
The fix will be in the next version of the patch.

-Longman
Konrad Rzeszutek Wilk Dec. 1, 2014, 4:51 p.m. UTC | #9
On Tue, Nov 25, 2014 at 07:33:58PM -0500, Waiman Long wrote:
> [...]
> 
> I have been changing the patching code to patch the unlock call sites,
> and it seems to be working now. However, when I manually inserted a
> kernel module using insmod and ran the code in the newly inserted
> module, I got a memory access violation as follows:
> 
> [... oops log elided; see the previous message ...]
> 
> It seems like the call site patching isn't properly done, or the kernel
> module that I built was missing some critical information necessary for

Did readelf give you the paravirt note section?
> proper linking. Anyway, I will include the unlock call site patching
> code as a separate patch, as it seems there may be problems under
> certain circumstances.

One way to troubleshoot this is to make the paravirt patching code
actually print where it is patching the code. That way, when you load
the module, you can confirm it has done its job.

Then you can verify that the address where the code is called:

ffffffffa06ff395

is indeed patched. You might as well also do a hexdump in the module
loading code to confirm that the patching has been done correctly.
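
Something as simple as this in the apply_paravirt() loop would do (a
sketch against the 3.x-era loop; the pr_info() line is the addition):

	for (p = start; p < end; p++) {
		unsigned int used;

		/* log each site so a module load can be verified */
		pr_info("paravirt: patching type %u at %p, len %u\n",
			p->instrtype, p->instr, p->len);

		memcpy(insnbuf, p->instr, p->len);
		used = pv_init_ops.patch(p->instrtype, p->clobbers, insnbuf,
					 (unsigned long)p->instr, p->len);
		add_nops(insnbuf + used, p->len - used);
		text_poke_early(p->instr, insnbuf, p->len);
	}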
> 
> BTW, the kernel panic problem that your team reported has been fixed.
> The fix will be in the next version of the patch.
> 
> -Longman

Patch

Index: linux-2.6/arch/x86/kernel/paravirt_patch_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/paravirt_patch_64.c
+++ linux-2.6/arch/x86/kernel/paravirt_patch_64.c
@@ -22,6 +22,10 @@  DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs")
 DEF_NATIVE(, mov32, "mov %edi, %eax");
 DEF_NATIVE(, mov64, "mov %rdi, %rax");

+#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUE_SPINLOCK)
+DEF_NATIVE(pv_lock_ops, queue_unlock, "movb $0, (%rdi)");
+#endif
+ 
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
 {
        return paravirt_patch_insns(insnbuf, len,
@@ -61,6 +65,9 @@  unsigned native_patch(u8 type, u16 clobb
                PATCH_SITE(pv_cpu_ops, clts);
                PATCH_SITE(pv_mmu_ops, flush_tlb_single);
                PATCH_SITE(pv_cpu_ops, wbinvd);
+#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUE_SPINLOCK)
+               PATCH_SITE(pv_lock_ops, queue_unlock);
+#endif

        patch_site:
                ret = paravirt_patch_insns(ibuf, len, start, end);
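
For readers unfamiliar with the mechanism: DEF_NATIVE() emits the native
instruction bytes into the kernel image bracketed by start_*/end_*
symbols, and PATCH_SITE() makes native_patch() hand exactly those bytes
to paravirt_patch_insns(), which copies them over the call site. An
abridged sketch of the two macros (DEF_NATIVE lives in
arch/x86/include/asm/paravirt_types.h, PATCH_SITE is local to
native_patch(); check your tree for the exact definitions):

	/* emit "code" once, bracketed by start_/end_ symbols */
	#define DEF_NATIVE(ops, name, code)				\
		extern const char start_##ops##_##name[],		\
				  end_##ops##_##name[];			\
		asm(NATIVE_LABEL("start_", ops, name) code		\
		    NATIVE_LABEL("end_", ops, name))

	/* route a pv op to that byte range inside native_patch() */
	#define PATCH_SITE(ops, x)					\
		case PARAVIRT_PATCH(ops.x):				\
			start = start_##ops##_##x;			\
			end = end_##ops##_##x;				\
			goto patch_site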