Patchworkβ [02/11] Add "handle page fault" PV helper.

login
register
about
Submitter Gleb Natapov
Date 2009-11-01 11:56:21
Message ID <1257076590-29559-3-git-send-email-gleb@redhat.com>
Download mbox | patch
Permalink /patch/56860/
State New
Headers show

Comments

Gleb Natapov - 2009-11-01 11:56:21
Allow paravirtualized guest to do special handling for some page faults.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 arch/x86/include/asm/paravirt.h       |    7 +++++++
 arch/x86/include/asm/paravirt_types.h |    4 ++++
 arch/x86/kernel/paravirt.c            |    8 ++++++++
 arch/x86/kernel/paravirt_patch_32.c   |    8 ++++++++
 arch/x86/kernel/paravirt_patch_64.c   |    7 +++++++
 arch/x86/mm/fault.c                   |    3 +++
 6 files changed, 37 insertions(+), 0 deletions(-)
Ingo Molnar - 2009-11-02 09:22:14
* Gleb Natapov <gleb@redhat.com> wrote:

> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f4cee90..14707dc 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  	int write;
>  	int fault;
>  
> +	if (arch_handle_page_fault(regs, error_code))
> +		return;
> +

This patch is not acceptable unless it's done cleaner. Currently we 
already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
notifier), and this adds a fourth one. Please consolidate them into a 
single callback site, this is a hotpath on x86.

And please always Cc: the x86 maintainers to patches that touch x86 
code.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov - 2009-11-02 16:04:10
On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote:
> 
> * Gleb Natapov <gleb@redhat.com> wrote:
> 
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index f4cee90..14707dc 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  	int write;
> >  	int fault;
> >  
> > +	if (arch_handle_page_fault(regs, error_code))
> > +		return;
> > +
> 
> This patch is not acceptable unless it's done cleaner. Currently we 
> already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
> notifier), and this adds a fourth one. Please consolidate them into a 
> single callback site, this is a hotpath on x86.
> 
This call is patched out by paravirt patching mechanism so overhead should be
zero for non paravirt cases. What do you want to achieve by consolidate
them into single callback? I mean the code will still exist and will have to be
executed on every #PF. Is the goal to move them out of line? 

> And please always Cc: the x86 maintainers to patches that touch x86 
> code.
> 
Will do.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - 2009-11-02 16:12:48
* Gleb Natapov <gleb@redhat.com> wrote:

> On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote:
> > 
> > * Gleb Natapov <gleb@redhat.com> wrote:
> > 
> > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > index f4cee90..14707dc 100644
> > > --- a/arch/x86/mm/fault.c
> > > +++ b/arch/x86/mm/fault.c
> > > @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > >  	int write;
> > >  	int fault;
> > >  
> > > +	if (arch_handle_page_fault(regs, error_code))
> > > +		return;
> > > +
> > 
> > This patch is not acceptable unless it's done cleaner. Currently we 
> > already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
> > notifier), and this adds a fourth one. Please consolidate them into a 
> > single callback site, this is a hotpath on x86.
> > 
> This call is patched out by paravirt patching mechanism so overhead 
> should be zero for non paravirt cases. [...]

arch_handle_page_fault() isnt upstream yet - precisely what is the 
instruction sequence injected into do_page_fault() in the patched-out 
case?

> [...] What do you want to achieve by consolidate them into single 
> callback? [...]

Less bloat in a hotpath and a shared callback infrastructure.

> [...] I mean the code will still exist and will have to be executed on 
> every #PF. Is the goal to move them out of line?

The goal is to have a single callback site for all the users - which 
call-site is patched out ideally - on non-paravirt too if needed. Most 
of these callbacks/notifier-chains have are inactive most of the time.

I.e. a very low overhead 'conditional callback' facility, and a single 
one - not just lots of them sprinkled around the code.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov - 2009-11-02 16:22:34
On Mon, Nov 02, 2009 at 05:12:48PM +0100, Ingo Molnar wrote:
> 
> * Gleb Natapov <gleb@redhat.com> wrote:
> 
> > On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote:
> > > 
> > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > 
> > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > > index f4cee90..14707dc 100644
> > > > --- a/arch/x86/mm/fault.c
> > > > +++ b/arch/x86/mm/fault.c
> > > > @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > > >  	int write;
> > > >  	int fault;
> > > >  
> > > > +	if (arch_handle_page_fault(regs, error_code))
> > > > +		return;
> > > > +
> > > 
> > > This patch is not acceptable unless it's done cleaner. Currently we 
> > > already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
> > > notifier), and this adds a fourth one. Please consolidate them into a 
> > > single callback site, this is a hotpath on x86.
> > > 
> > This call is patched out by paravirt patching mechanism so overhead 
> > should be zero for non paravirt cases. [...]
> 
> arch_handle_page_fault() isnt upstream yet - precisely what is the 
> instruction sequence injected into do_page_fault() in the patched-out 
> case?
> 
It is introduced by the same patch. The instruction inserted is: 
 xor %rax, %rax

> > [...] What do you want to achieve by consolidate them into single 
> > callback? [...]
> 
> Less bloat in a hotpath and a shared callback infrastructure.
> 
> > [...] I mean the code will still exist and will have to be executed on 
> > every #PF. Is the goal to move them out of line?
> 
> The goal is to have a single callback site for all the users - which 
> call-site is patched out ideally - on non-paravirt too if needed. Most 
> of these callbacks/notifier-chains have are inactive most of the time.
> 
> I.e. a very low overhead 'conditional callback' facility, and a single 
> one - not just lots of them sprinkled around the code.
> 
> 	Ingo

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - 2009-11-02 16:29:41
* Gleb Natapov <gleb@redhat.com> wrote:

> On Mon, Nov 02, 2009 at 05:12:48PM +0100, Ingo Molnar wrote:
> > 
> > * Gleb Natapov <gleb@redhat.com> wrote:
> > 
> > > On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote:
> > > > 
> > > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > > 
> > > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > > > index f4cee90..14707dc 100644
> > > > > --- a/arch/x86/mm/fault.c
> > > > > +++ b/arch/x86/mm/fault.c
> > > > > @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > > > >  	int write;
> > > > >  	int fault;
> > > > >  
> > > > > +	if (arch_handle_page_fault(regs, error_code))
> > > > > +		return;
> > > > > +
> > > > 
> > > > This patch is not acceptable unless it's done cleaner. Currently we 
> > > > already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
> > > > notifier), and this adds a fourth one. Please consolidate them into a 
> > > > single callback site, this is a hotpath on x86.
> > > > 
> > > This call is patched out by paravirt patching mechanism so overhead 
> > > should be zero for non paravirt cases. [...]
> > 
> > arch_handle_page_fault() isnt upstream yet - precisely what is the 
> > instruction sequence injected into do_page_fault() in the patched-out 
> > case?
> 
> It is introduced by the same patch. The instruction inserted is:
>  xor %rax, %rax

ok.

My observations still stand:

> > > [...] What do you want to achieve by consolidate them into single 
> > > callback? [...]
> > 
> > Less bloat in a hotpath and a shared callback infrastructure.
> > 
> > > [...] I mean the code will still exist and will have to be executed on 
> > > every #PF. Is the goal to move them out of line?
> > 
> > The goal is to have a single callback site for all the users - which 
> > call-site is patched out ideally - on non-paravirt too if needed. Most 
> > of these callbacks/notifier-chains have are inactive most of the time.
> > 
> > I.e. a very low overhead 'conditional callback' facility, and a single 
> > one - not just lots of them sprinkled around the code.

looks like a golden opportunity to get this right.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov - 2009-11-02 16:31:48
On Mon, Nov 02, 2009 at 05:29:41PM +0100, Ingo Molnar wrote:
> 
> * Gleb Natapov <gleb@redhat.com> wrote:
> 
> > On Mon, Nov 02, 2009 at 05:12:48PM +0100, Ingo Molnar wrote:
> > > 
> > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > 
> > > > On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote:
> > > > > 
> > > > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > > > 
> > > > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > > > > index f4cee90..14707dc 100644
> > > > > > --- a/arch/x86/mm/fault.c
> > > > > > +++ b/arch/x86/mm/fault.c
> > > > > > @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > > > > >  	int write;
> > > > > >  	int fault;
> > > > > >  
> > > > > > +	if (arch_handle_page_fault(regs, error_code))
> > > > > > +		return;
> > > > > > +
> > > > > 
> > > > > This patch is not acceptable unless it's done cleaner. Currently we 
> > > > > already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
> > > > > notifier), and this adds a fourth one. Please consolidate them into a 
> > > > > single callback site, this is a hotpath on x86.
> > > > > 
> > > > This call is patched out by paravirt patching mechanism so overhead 
> > > > should be zero for non paravirt cases. [...]
> > > 
> > > arch_handle_page_fault() isnt upstream yet - precisely what is the 
> > > instruction sequence injected into do_page_fault() in the patched-out 
> > > case?
> > 
> > It is introduced by the same patch. The instruction inserted is:
> >  xor %rax, %rax
> 
> ok.
> 
> My observations still stand:
> 
> > > > [...] What do you want to achieve by consolidate them into single 
> > > > callback? [...]
> > > 
> > > Less bloat in a hotpath and a shared callback infrastructure.
> > > 
> > > > [...] I mean the code will still exist and will have to be executed on 
> > > > every #PF. Is the goal to move them out of line?
> > > 
> > > The goal is to have a single callback site for all the users - which 
> > > call-site is patched out ideally - on non-paravirt too if needed. Most 
> > > of these callbacks/notifier-chains have are inactive most of the time.
> > > 
> > > I.e. a very low overhead 'conditional callback' facility, and a single 
> > > one - not just lots of them sprinkled around the code.
> 
> looks like a golden opportunity to get this right.
> 
I'll look into it. Expect questions from me ;)

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gleb Natapov - 2009-11-02 17:42:08
On Mon, Nov 02, 2009 at 05:29:41PM +0100, Ingo Molnar wrote:
> 
> * Gleb Natapov <gleb@redhat.com> wrote:
> 
> > On Mon, Nov 02, 2009 at 05:12:48PM +0100, Ingo Molnar wrote:
> > > 
> > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > 
> > > > On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote:
> > > > > 
> > > > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > > > 
> > > > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > > > > index f4cee90..14707dc 100644
> > > > > > --- a/arch/x86/mm/fault.c
> > > > > > +++ b/arch/x86/mm/fault.c
> > > > > > @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > > > > >  	int write;
> > > > > >  	int fault;
> > > > > >  
> > > > > > +	if (arch_handle_page_fault(regs, error_code))
> > > > > > +		return;
> > > > > > +
> > > > > 
> > > > > This patch is not acceptable unless it's done cleaner. Currently we 
> > > > > already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
> > > > > notifier), and this adds a fourth one. Please consolidate them into a 
> > > > > single callback site, this is a hotpath on x86.
> > > > > 
> > > > This call is patched out by paravirt patching mechanism so overhead 
> > > > should be zero for non paravirt cases. [...]
> > > 
> > > arch_handle_page_fault() isnt upstream yet - precisely what is the 
> > > instruction sequence injected into do_page_fault() in the patched-out 
> > > case?
> > 
> > It is introduced by the same patch. The instruction inserted is:
> >  xor %rax, %rax
> 
> ok.
> 
> My observations still stand:
> 
> > > > [...] What do you want to achieve by consolidate them into single 
> > > > callback? [...]
> > > 
> > > Less bloat in a hotpath and a shared callback infrastructure.
> > > 
> > > > [...] I mean the code will still exist and will have to be executed on 
> > > > every #PF. Is the goal to move them out of line?
> > > 
> > > The goal is to have a single callback site for all the users - which 
> > > call-site is patched out ideally - on non-paravirt too if needed. Most 
> > > of these callbacks/notifier-chains have are inactive most of the time.
> > > 
> > > I.e. a very low overhead 'conditional callback' facility, and a single 
> > > one - not just lots of them sprinkled around the code.
> 
> looks like a golden opportunity to get this right.
>
Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two
of them kmemcheck, mmiotrace are enabled only for debugging, should
not be performance concern. And notifier call sites (two of them)
are deliberately, as explained by comment, not at the function entry,
so can't be unified with others. (And kmemcheck also has two different
call site BTW)

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rik van Riel - 2009-11-02 19:03:38
On 11/02/2009 04:22 AM, Ingo Molnar wrote:
>
> * Gleb Natapov<gleb@redhat.com>  wrote:
>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index f4cee90..14707dc 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
>>   	int write;
>>   	int fault;
>>
>> +	if (arch_handle_page_fault(regs, error_code))
>> +		return;
>> +
>
> This patch is not acceptable unless it's done cleaner. Currently we
> already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace,
> notifier), and this adds a fourth one.

There's another alternative - add our own exception vector
for async page faults.  Not sure if that is warranted though,
especially if we already have other callbacks in do_page_fault()
and we can consolidate them.
Avi Kivity - 2009-11-02 19:33:45
On 11/02/2009 09:03 PM, Rik van Riel wrote:
>> This patch is not acceptable unless it's done cleaner. Currently we
>> already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace,
>> notifier), and this adds a fourth one.
>
>
> There's another alternative - add our own exception vector
> for async page faults.  Not sure if that is warranted though,
> especially if we already have other callbacks in do_page_fault()
> and we can consolidate them.
>

We can't add an exception vector since all the existing ones are either 
taken or reserved.  We can try to use an interrupt vector as an 
exception, but that becomes messy, and I'm not sure hardware will allow 
us to inject an interrupt when interrupts are disabled.
Rik van Riel - 2009-11-02 23:35:31
On 11/02/2009 02:33 PM, Avi Kivity wrote:
> On 11/02/2009 09:03 PM, Rik van Riel wrote:
>>> This patch is not acceptable unless it's done cleaner. Currently we
>>> already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace,
>>> notifier), and this adds a fourth one.
>>
>>
>> There's another alternative - add our own exception vector
>> for async page faults. Not sure if that is warranted though,
>> especially if we already have other callbacks in do_page_fault()
>> and we can consolidate them.
>>
>
> We can't add an exception vector since all the existing ones are either
> taken or reserved.

I believe some are reserved for operating system use.

That means the guest can pick one and tell the host to use
that one to notify it.
Avi Kivity - 2009-11-03 04:57:07
On 11/03/2009 01:35 AM, Rik van Riel wrote:
>> We can't add an exception vector since all the existing ones are either
>> taken or reserved.
>
>
> I believe some are reserved for operating system use.

Table 6-1 says:

   9 |  | Coprocessor Segment Overrun (reserved)  |  Fault |  No  | 
Floating-point instruction.2
   15 |  — |  (Intel reserved. Do not use.) |   | No |
   20-31 |  — | Intel reserved. Do not use.  |
   32-255 |  —  | User Defined (Non-reserved) Interrupts |  Interrupt  
|   | External interrupt or INT n instruction.

So we can only use 32-255, but these are not fault-like exceptions that 
can be delivered with interrupts disabled.
Tian, Kevin - 2009-11-05 06:44:51
>From: Avi Kivity

>Sent: 2009年11月3日 12:57

>

>On 11/03/2009 01:35 AM, Rik van Riel wrote:

>>> We can't add an exception vector since all the existing 

>ones are either

>>> taken or reserved.

>>

>>

>> I believe some are reserved for operating system use.

>

>Table 6-1 says:

>

>   9 |  | Coprocessor Segment Overrun (reserved)  |  Fault |  No  | 

>Floating-point instruction.2

>   15 |  ― |  (Intel reserved. Do not use.) |   | No |

>   20-31 |  ― | Intel reserved. Do not use.  |

>   32-255 |  ―  | User Defined (Non-reserved) Interrupts |  Interrupt  

>|   | External interrupt or INT n instruction.

>

>So we can only use 32-255, but these are not fault-like 

>exceptions that 

>can be delivered with interrupts disabled.

>


would you really want to inject a fault-like exception here? Fault
is architurally synchronous event while here apf is more like an 
asynchronous interrupt as it's not caused by guest itself. If 
guest is with interrupt disabled, preemption won't happen and 
apf path just ends up "wait for page" hypercall to waste cycles.

Thanks,
Kevin
Avi Kivity - 2009-11-05 08:22:33
On 11/05/2009 08:44 AM, Tian, Kevin wrote:
>> From: Avi Kivity
>> Sent: 2009年11月3日 12:57
>>
>> On 11/03/2009 01:35 AM, Rik van Riel wrote:
>>     
>>>> We can't add an exception vector since all the existing 
>>>>         
>> ones are either
>>     
>>>> taken or reserved.
>>>>         
>>>
>>> I believe some are reserved for operating system use.
>>>       
>> Table 6-1 says:
>>
>>   9 |  | Coprocessor Segment Overrun (reserved)  |  Fault |  No  | 
>> Floating-point instruction.2
>>   15 |  ― |  (Intel reserved. Do not use.) |   | No |
>>   20-31 |  ― | Intel reserved. Do not use.  |
>>   32-255 |  ―  | User Defined (Non-reserved) Interrupts |  Interrupt  
>> |   | External interrupt or INT n instruction.
>>
>> So we can only use 32-255, but these are not fault-like 
>> exceptions that 
>> can be delivered with interrupts disabled.
>>
>>     
> would you really want to inject a fault-like exception here? Fault
> is architurally synchronous event while here apf is more like an 
> asynchronous interrupt as it's not caused by guest itself. If 
> guest is with interrupt disabled, preemption won't happen and 
> apf path just ends up "wait for page" hypercall to waste cycles.
>   

An async page fault is, despite its name, synchronous, since it is
associated with an instruction. It must either be delivered immediately
or not at all.

It's true that in kernel mode you can't do much with an apf if
interrupts are disabled, but you still want to receive apfs for user
mode with interrupts disabled (for example due to interrupt shadow).
Ingo Molnar - 2009-11-08 11:36:54
* Gleb Natapov <gleb@redhat.com> wrote:

> On Mon, Nov 02, 2009 at 05:29:41PM +0100, Ingo Molnar wrote:
> > 
> > * Gleb Natapov <gleb@redhat.com> wrote:
> > 
> > > On Mon, Nov 02, 2009 at 05:12:48PM +0100, Ingo Molnar wrote:
> > > > 
> > > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > > 
> > > > > On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote:
> > > > > > 
> > > > > > * Gleb Natapov <gleb@redhat.com> wrote:
> > > > > > 
> > > > > > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > > > > > index f4cee90..14707dc 100644
> > > > > > > --- a/arch/x86/mm/fault.c
> > > > > > > +++ b/arch/x86/mm/fault.c
> > > > > > > @@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
> > > > > > >  	int write;
> > > > > > >  	int fault;
> > > > > > >  
> > > > > > > +	if (arch_handle_page_fault(regs, error_code))
> > > > > > > +		return;
> > > > > > > +
> > > > > > 
> > > > > > This patch is not acceptable unless it's done cleaner. Currently we 
> > > > > > already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, 
> > > > > > notifier), and this adds a fourth one. Please consolidate them into a 
> > > > > > single callback site, this is a hotpath on x86.
> > > > > > 
> > > > > This call is patched out by paravirt patching mechanism so overhead 
> > > > > should be zero for non paravirt cases. [...]
> > > > 
> > > > arch_handle_page_fault() isnt upstream yet - precisely what is the 
> > > > instruction sequence injected into do_page_fault() in the patched-out 
> > > > case?
> > > 
> > > It is introduced by the same patch. The instruction inserted is:
> > >  xor %rax, %rax
> > 
> > ok.
> > 
> > My observations still stand:
> > 
> > > > > [...] What do you want to achieve by consolidate them into single 
> > > > > callback? [...]
> > > > 
> > > > Less bloat in a hotpath and a shared callback infrastructure.
> > > > 
> > > > > [...] I mean the code will still exist and will have to be executed on 
> > > > > every #PF. Is the goal to move them out of line?
> > > > 
> > > > The goal is to have a single callback site for all the users - which 
> > > > call-site is patched out ideally - on non-paravirt too if needed. Most 
> > > > of these callbacks/notifier-chains have are inactive most of the time.
> > > > 
> > > > I.e. a very low overhead 'conditional callback' facility, and a single 
> > > > one - not just lots of them sprinkled around the code.
> > 
> > looks like a golden opportunity to get this right.
> >
> Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two
> of them kmemcheck, mmiotrace are enabled only for debugging, should
> not be performance concern. And notifier call sites (two of them)
> are deliberately, as explained by comment, not at the function entry,
> so can't be unified with others. (And kmemcheck also has two different
> call site BTW)

We want mmiotrace to be generic distro capable so the overhead when the 
hook is not used is of concern.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - 2009-11-08 12:43:17
On 11/08/2009 01:36 PM, Ingo Molnar wrote:
>> Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two
>> of them kmemcheck, mmiotrace are enabled only for debugging, should
>> not be performance concern. And notifier call sites (two of them)
>> are deliberately, as explained by comment, not at the function entry,
>> so can't be unified with others. (And kmemcheck also has two different
>> call site BTW)
>>      
> We want mmiotrace to be generic distro capable so the overhead when the
> hook is not used is of concern.
>
>    

Maybe we should generalize paravirt-ops patching in case if (x) f() is 
deemed too expensive.
Ingo Molnar - 2009-11-08 12:51:35
* Avi Kivity <avi@redhat.com> wrote:

> On 11/08/2009 01:36 PM, Ingo Molnar wrote:
> >>Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two
> >>of them kmemcheck, mmiotrace are enabled only for debugging, should
> >>not be performance concern. And notifier call sites (two of them)
> >>are deliberately, as explained by comment, not at the function entry,
> >>so can't be unified with others. (And kmemcheck also has two different
> >>call site BTW)
> >
> > We want mmiotrace to be generic distro capable so the overhead when 
> > the hook is not used is of concern.
> 
> Maybe we should generalize paravirt-ops patching in case if (x) f() is 
> deemed too expensive.

Yes, that's a nice idea. We have quite a number of 'conditional 
callbacks' in various critical paths that could be made lighter via such 
a technique.

It would also free new callbacks from the 'it increases overhead even if 
unused' criticism and made it easier to add them.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - 2009-11-08 13:01:06
On 11/08/2009 02:51 PM, Ingo Molnar wrote:
>> Maybe we should generalize paravirt-ops patching in case if (x) f() is
>> deemed too expensive.
>>      
> Yes, that's a nice idea. We have quite a number of 'conditional
> callbacks' in various critical paths that could be made lighter via such
> a technique.
>
> It would also free new callbacks from the 'it increases overhead even if
> unused' criticism and made it easier to add them.
>    

We can take the "immediate values" infrastructure as a first step.  Has 
that been merged?
Ingo Molnar - 2009-11-08 13:05:21
* Avi Kivity <avi@redhat.com> wrote:

> On 11/08/2009 02:51 PM, Ingo Molnar wrote:
> >>Maybe we should generalize paravirt-ops patching in case if (x) f() is
> >>deemed too expensive.
> >Yes, that's a nice idea. We have quite a number of 'conditional
> >callbacks' in various critical paths that could be made lighter via such
> >a technique.
> >
> > It would also free new callbacks from the 'it increases overhead 
> > even if unused' criticism and made it easier to add them.
> 
> We can take the "immediate values" infrastructure as a first step. Has 
> that been merged?

No, there were doubts about whether patching in live instructions like 
that is safe on all CPU types.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - 2009-11-08 13:08:55
On 11/08/2009 03:05 PM, Ingo Molnar wrote:
>> We can take the "immediate values" infrastructure as a first step. Has
>> that been merged?
>>      
> No, there were doubts about whether patching in live instructions like
> that is safe on all CPU types.
>    

Ah, I remember.  Doesn't the trick of going through a breakpoint 
instruction work?
H. Peter Anvin - 2009-11-08 16:44:44
On 11/08/2009 04:51 AM, Ingo Molnar wrote:
> 
> * Avi Kivity <avi@redhat.com> wrote:
> 
>> On 11/08/2009 01:36 PM, Ingo Molnar wrote:
>>>> Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two
>>>> of them kmemcheck, mmiotrace are enabled only for debugging, should
>>>> not be performance concern. And notifier call sites (two of them)
>>>> are deliberately, as explained by comment, not at the function entry,
>>>> so can't be unified with others. (And kmemcheck also has two different
>>>> call site BTW)
>>>
>>> We want mmiotrace to be generic distro capable so the overhead when 
>>> the hook is not used is of concern.
>>
>> Maybe we should generalize paravirt-ops patching in case if (x) f() is 
>> deemed too expensive.
> 
> Yes, that's a nice idea. We have quite a number of 'conditional 
> callbacks' in various critical paths that could be made lighter via such 
> a technique.
> 
> It would also free new callbacks from the 'it increases overhead even if 
> unused' criticism and made it easier to add them.
> 

There are a number of other things were we permanently bind to a single
instance of something, too.  Optimizing those away would be nice.
Consider memcpy(), where we may want to have different implementations
for different processors.

	-hpa
Ingo Molnar - 2009-11-08 16:47:06
* H. Peter Anvin <hpa@zytor.com> wrote:

> On 11/08/2009 04:51 AM, Ingo Molnar wrote:
> > 
> > * Avi Kivity <avi@redhat.com> wrote:
> > 
> >> On 11/08/2009 01:36 PM, Ingo Molnar wrote:
> >>>> Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two
> >>>> of them kmemcheck, mmiotrace are enabled only for debugging, should
> >>>> not be performance concern. And notifier call sites (two of them)
> >>>> are deliberately, as explained by comment, not at the function entry,
> >>>> so can't be unified with others. (And kmemcheck also has two different
> >>>> call site BTW)
> >>>
> >>> We want mmiotrace to be generic distro capable so the overhead when 
> >>> the hook is not used is of concern.
> >>
> >> Maybe we should generalize paravirt-ops patching in case if (x) f() is 
> >> deemed too expensive.
> > 
> > Yes, that's a nice idea. We have quite a number of 'conditional 
> > callbacks' in various critical paths that could be made lighter via such 
> > a technique.
> > 
> > It would also free new callbacks from the 'it increases overhead 
> > even if unused' criticism and made it easier to add them.
> 
> There are a number of other things were we permanently bind to a 
> single instance of something, too.  Optimizing those away would be 
> nice. Consider memcpy(), where we may want to have different 
> implementations for different processors.

yeah.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index efb3899..5203da1 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -6,6 +6,7 @@ 
 #ifdef CONFIG_PARAVIRT
 #include <asm/pgtable_types.h>
 #include <asm/asm.h>
+#include <asm/ptrace.h>
 
 #include <asm/paravirt_types.h>
 
@@ -710,6 +711,12 @@  static inline void arch_end_context_switch(struct task_struct *next)
 	PVOP_VCALL1(pv_cpu_ops.end_context_switch, next);
 }
 
+static inline int arch_handle_page_fault(struct pt_regs *regs,
+					 unsigned long error_code)
+{
+	return PVOP_CALL2(int, pv_cpu_ops.handle_pf, regs, error_code);
+}
+
 #define  __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 static inline void arch_enter_lazy_mmu_mode(void)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 9357473..bcc39b3 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -186,6 +186,7 @@  struct pv_cpu_ops {
 
 	void (*start_context_switch)(struct task_struct *prev);
 	void (*end_context_switch)(struct task_struct *next);
+	int (*handle_pf)(struct pt_regs *regs, unsigned long error_code);
 };
 
 struct pv_irq_ops {
@@ -385,6 +386,7 @@  extern struct pv_lock_ops pv_lock_ops;
 unsigned paravirt_patch_nop(void);
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
 unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
+unsigned paravirt_patch_ret_0(void *insnbuf, unsigned len);
 unsigned paravirt_patch_ignore(unsigned len);
 unsigned paravirt_patch_call(void *insnbuf,
 			     const void *target, u16 tgt_clobbers,
@@ -676,8 +678,10 @@  void paravirt_leave_lazy_mmu(void);
 void _paravirt_nop(void);
 u32 _paravirt_ident_32(u32);
 u64 _paravirt_ident_64(u64);
+unsigned long _paravirt_ret_0(void);
 
 #define paravirt_nop	((void *)_paravirt_nop)
+#define paravirt_ret_0  ((void *)_paravirt_ret_0)
 
 /* These all sit in the .parainstructions section to tell us what to patch. */
 struct paravirt_patch_site {
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 1b1739d..7d8f37b 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -54,6 +54,11 @@  u64 _paravirt_ident_64(u64 x)
 	return x;
 }
 
+unsigned long _paravirt_ret_0(void)
+{
+	return 0;
+}
+
 void __init default_banner(void)
 {
 	printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
@@ -154,6 +159,8 @@  unsigned paravirt_patch_default(u8 type, u16 clobbers, void *insnbuf,
 		ret = paravirt_patch_ident_32(insnbuf, len);
 	else if (opfunc == _paravirt_ident_64)
 		ret = paravirt_patch_ident_64(insnbuf, len);
+	else if (opfunc == _paravirt_ret_0)
+		ret = paravirt_patch_ret_0(insnbuf, len);
 
 	else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
 		 type == PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit) ||
@@ -380,6 +387,7 @@  struct pv_cpu_ops pv_cpu_ops = {
 
 	.start_context_switch = paravirt_nop,
 	.end_context_switch = paravirt_nop,
+	.handle_pf = paravirt_ret_0,
 };
 
 struct pv_apic_ops pv_apic_ops = {
diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c
index d9f32e6..de006b1 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -12,6 +12,8 @@  DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
 DEF_NATIVE(pv_cpu_ops, clts, "clts");
 DEF_NATIVE(pv_cpu_ops, read_tsc, "rdtsc");
 
+DEF_NATIVE(, mov0, "xor %eax, %eax");
+
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
 {
 	/* arg in %eax, return in %eax */
@@ -24,6 +26,12 @@  unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
 	return 0;
 }
 
+unsigned paravirt_patch_ret_0(void *insnbuf, unsigned len)
+{
+	return paravirt_patch_insns(insnbuf, len,
+				    start__mov0, end__mov0);
+}
+
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 		      unsigned long addr, unsigned len)
 {
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 3f08f34..d685e7d 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -21,6 +21,7 @@  DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
 
 DEF_NATIVE(, mov32, "mov %edi, %eax");
 DEF_NATIVE(, mov64, "mov %rdi, %rax");
+DEF_NATIVE(, mov0, "xor %rax, %rax");
 
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
 {
@@ -34,6 +35,12 @@  unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
 				    start__mov64, end__mov64);
 }
 
+unsigned paravirt_patch_ret_0(void *insnbuf, unsigned len)
+{
+	return paravirt_patch_insns(insnbuf, len,
+				    start__mov0, end__mov0);
+}
+
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
 		      unsigned long addr, unsigned len)
 {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f4cee90..14707dc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -952,6 +952,9 @@  do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	int write;
 	int fault;
 
+	if (arch_handle_page_fault(regs, error_code))
+		return;
+
 	tsk = current;
 	mm = tsk->mm;