[v4,2/3] xen/oprofile: use NMI continuation for sending virq to guest

Message ID: 20201109095021.9897-3-jgross@suse.com
State: Superseded
Series: xen/x86: implement NMI continuation

Commit Message

Jürgen Groß Nov. 9, 2020, 9:50 a.m. UTC
Instead of calling send_guest_vcpu_virq() from NMI context, use the
NMI continuation framework for that purpose. This avoids taking locks
in NMI mode.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V4:
- rework to a less generic approach
---
 xen/arch/x86/oprofile/nmi_int.c | 20 ++++++++++++++++++--
 xen/arch/x86/traps.c            |  4 ++++
 xen/include/asm-x86/xenoprof.h  |  7 +++++++
 3 files changed, 29 insertions(+), 2 deletions(-)
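
For orientation: together with the NMI continuation framework introduced
in patch 1/3 of this series, the intended flow is roughly the following
(a sketch, not verbatim series code):

/*
 * Overflow NMI arrives while a guest vCPU is running:
 *
 *   nmi_callback()                      - NMI context, no locks allowed
 *     -> this_cpu(nmi_cont_vcpu) = current;
 *     -> trigger_nmi_continuation();    - raises a self-IPI (patch 1/3)
 *
 *   NMI handler returns; the latched self-IPI becomes deliverable.
 *
 *   nmi_check_continuation()            - normal interrupt context
 *     -> nmi_oprofile_send_virq()
 *          -> send_guest_vcpu_virq(v, VIRQ_XENOPROF);  - may take locks
 */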

Comments

Jan Beulich Nov. 11, 2020, 3:45 p.m. UTC | #1
On 09.11.2020 10:50, Juergen Gross wrote:
> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>  		model->free_msr(v);
>  }
>  
> +bool nmi_oprofile_send_virq(void)
> +{
> +	struct vcpu *v = this_cpu(nmi_cont_vcpu);
> +
> +	if ( v )
> +		send_guest_vcpu_virq(v, VIRQ_XENOPROF);
> +
> +	this_cpu(nmi_cont_vcpu) = NULL;

What if, by the time we make it here, a 2nd NMI has arrived? I
agree the next overflow interrupt shouldn't arrive this
quickly, but I also think you want to zap the per-CPU variable
first here, and ...

> +
> +	return v;
> +}
> +
>  static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>  {
>  	int xen_mode, ovf;
>  
>  	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>  	xen_mode = ring_0(regs);
> -	if ( ovf && is_active(current->domain) && !xen_mode )
> -		send_guest_vcpu_virq(current, VIRQ_XENOPROF);
> +	if ( ovf && is_active(current->domain) && !xen_mode ) {
> +		this_cpu(nmi_cont_vcpu) = current;

... avoid overwriting any non-NULL value here. That's then of
course still not closing the window, but has (imo) overall
better behavior.
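
Concretely, something like the following (an illustrative sketch of the
suggested ordering, not code from any posted version):

bool nmi_oprofile_send_virq(void)
{
    struct vcpu *v = this_cpu(nmi_cont_vcpu);

    /*
     * Zap the slot before acting on the local copy, so an NMI arriving
     * from here on can stash its own vcpu without being lost.
     */
    this_cpu(nmi_cont_vcpu) = NULL;

    if ( v )
        send_guest_vcpu_virq(v, VIRQ_XENOPROF);

    return v;
}

/* ... and in nmi_callback(), don't clobber a not-yet-consumed entry: */
if ( ovf && is_active(current->domain) && !xen_mode &&
     !this_cpu(nmi_cont_vcpu) )
{
    this_cpu(nmi_cont_vcpu) = current;
    trigger_nmi_continuation();
}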

Also, style-wise, going through the file it looks to be mainly
Linux style, so may I suggest your additions / changes to be
done that way, rather than extending use of this funny mixed
style?

Jan
Jürgen Groß Nov. 11, 2020, 3:48 p.m. UTC | #2
On 11.11.20 16:45, Jan Beulich wrote:
> On 09.11.2020 10:50, Juergen Gross wrote:
>> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>>   		model->free_msr(v);
>>   }
>>   
>> +bool nmi_oprofile_send_virq(void)
>> +{
>> +	struct vcpu *v = this_cpu(nmi_cont_vcpu);
>> +
>> +	if ( v )
>> +		send_guest_vcpu_virq(v, VIRQ_XENOPROF);
>> +
>> +	this_cpu(nmi_cont_vcpu) = NULL;
> 
> What if, by the time we make it here, a 2nd NMI has arrived? I
> agree the next overflow interrupt shouldn't arrive this
> quickly, but I also think you want to zap the per-CPU variable
> first here, and ...

How could that happen? This function is activated only from NMI
context in case the NMI happened in guest mode. And it will be
executed with higher priority than any guest, so there is a zero
chance another NMI in guest mode can happen in between.

I can add a comment in this regard if you want.

> 
>> +
>> +	return v;
>> +}
>> +
>>   static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>   {
>>   	int xen_mode, ovf;
>>   
>>   	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>   	xen_mode = ring_0(regs);
>> -	if ( ovf && is_active(current->domain) && !xen_mode )
>> -		send_guest_vcpu_virq(current, VIRQ_XENOPROF);
>> +	if ( ovf && is_active(current->domain) && !xen_mode ) {
>> +		this_cpu(nmi_cont_vcpu) = current;
> 
> ... avoid overwriting any non-NULL value here. That's then of
> course still not closing the window, but has (imo) overall
> better behavior.
> 
> Also, style-wise, going through the file it looks to be mainly
> Linux style, so may I suggest your additions / changes to be
> done that way, rather than extending use of this funny mixed
> style?

Works for me.


Juergen
Jan Beulich Nov. 12, 2020, 10:23 a.m. UTC | #3
On 11.11.2020 16:48, Jürgen Groß wrote:
> On 11.11.20 16:45, Jan Beulich wrote:
>> On 09.11.2020 10:50, Juergen Gross wrote:
>>> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>>>   		model->free_msr(v);
>>>   }
>>>   
>>> +bool nmi_oprofile_send_virq(void)
>>> +{
>>> +	struct vcpu *v = this_cpu(nmi_cont_vcpu);
>>> +
>>> +	if ( v )
>>> +		send_guest_vcpu_virq(v, VIRQ_XENOPROF);
>>> +
>>> +	this_cpu(nmi_cont_vcpu) = NULL;
>>
>> What if, by the time we make it here, a 2nd NMI has arrived? I
>> agree the next overflow interrupt shouldn't arrive this
>> quickly, but I also think you want to zap the per-CPU variable
>> first here, and ...
> 
> How could that happen? This function is activated only from NMI
> context in case the NMI happened in guest mode. And it will be
> executed with higher priority than any guest, so there is a zero
> chance another NMI in guest mode can happen in between.

While I'll admit I didn't pay attention to the bogus (as far as
HVM is concerned) xen_mode check, my understanding is that the
self-IPI will be delivered once we're back in guest mode, as
that's the first time IRQs would be on again (even event checking
gets deferred by sending a self-IPI). If another NMI was latched
by that time, it would take precedence over the IRQ and would
also be delivered on the guest mode insn that the IRET returned
to.

I agree though that this is benign, as the vCPU wouldn't have
been context switched out yet, i.e. current is still the same
and there'll then merely be two NMI instances folded into one.

However, I still think the ordering would better be changed, to
set a good precedent.

>>>   static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>   {
>>>   	int xen_mode, ovf;
>>>   
>>>   	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>   	xen_mode = ring_0(regs);

Unrelated to the patch here (i.e. just as an observation), this
use of ring_0() looks bogus when the NMI occurred in HVM guest
mode.

Jan
Jürgen Groß Nov. 12, 2020, 10:48 a.m. UTC | #4
On 12.11.20 11:23, Jan Beulich wrote:
> On 11.11.2020 16:48, Jürgen Groß wrote:
>> On 11.11.20 16:45, Jan Beulich wrote:
>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>> @@ -83,14 +85,28 @@ void passive_domain_destroy(struct vcpu *v)
>>>>    		model->free_msr(v);
>>>>    }
>>>>    
>>>> +bool nmi_oprofile_send_virq(void)
>>>> +{
>>>> +	struct vcpu *v = this_cpu(nmi_cont_vcpu);
>>>> +
>>>> +	if ( v )
>>>> +		send_guest_vcpu_virq(v, VIRQ_XENOPROF);
>>>> +
>>>> +	this_cpu(nmi_cont_vcpu) = NULL;
>>>
>>> What if, by the time we make it here, a 2nd NMI has arrived? I
>>> agree the next overflow interrupt shouldn't arrive this
>>> quickly, but I also think you want to zap the per-CPU variable
>>> first here, and ...
>>
>> How could that happen? This function is activated only from NMI
>> context in case the NMI happened in guest mode. And it will be
>> executed with higher priority than any guest, so there is a zero
>> chance another NMI in guest mode can happen in between.
> 
> While I'll admit I didn't pay attention to the bogus (as far as
> HVM is concerned) xen_mode check, my understanding is that the
> self-IPI will be delivered once we're back in guest mode, as
> that's the first time IRQs would be on again (even event checking
> gets deferred by sending a self-IPI). If another NMI was latched
> by that time, it would take precedence over the IRQ and would
> also be delivered on the guest mode insn that the IRET returned
> to.
> 
> I agree though that this is benign, as the vCPU wouldn't have
> been context switched out yet, i.e. current is still the same
> and there'll then merely be two NMI instances folded into one.

Correct.

> 
> However, I still think the ordering would better be changed, to
> set a good precedent.

Okay, if you want that.

> 
>>>>    static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>    {
>>>>    	int xen_mode, ovf;
>>>>    
>>>>    	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>    	xen_mode = ring_0(regs);
> 
> Unrelated to the patch here (i.e. just as an observation), this
> use of ring_0() looks bogus when the NMI occurred in HVM guest
> mode.

An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
reason, or just be handled completely inside the guest, right?

I don't see how this test should ever result in xen_mode being
false for an HVM guest.


Juergen
Jan Beulich Nov. 12, 2020, 11:05 a.m. UTC | #5
On 12.11.2020 11:48, Jürgen Groß wrote:
> On 12.11.20 11:23, Jan Beulich wrote:
>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>    static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>    {
>>>>>    	int xen_mode, ovf;
>>>>>    
>>>>>    	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>    	xen_mode = ring_0(regs);
>>
>> Unrelated to the patch here (i.e. just as an observation), this
>> use of ring_0() looks bogus when the NMI occurred in HVM guest
>> mode.
> 
> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
> reason, or just be handled completely inside the guest, right?

Yes, and in the former case for VMX it would be handed on to do_nmi(),
with the guest register state. For SVM it would get handled on the
next STGI, i.e. would indeed never surface from HVM guest mode.

> I don't see how this test should ever result in xen_mode being
> false for an HVM guest.

I think, because of hvm_invalidate_regs_fields(), on VMX it would be
consistently true in release builds and consistently false in debug
ones.
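
(For reference, the pieces involved look roughly like this, paraphrased
rather than verbatim Xen code:)

/* ring_0() only inspects the RPL bits of the saved %cs: */
#define ring_0(r) (((r)->cs & 3) == 0)

/*
 * VMX exits don't copy the selector fields into the cpu_user_regs frame;
 * debug builds poison them instead. 0xbeef & 3 == 3, hence ring_0()
 * coming out consistently false there, while release builds see whatever
 * stale value the frame happens to hold.
 */
static inline void hvm_invalidate_regs_fields(struct cpu_user_regs *regs)
{
#ifndef NDEBUG
    regs->error_code = 0xbeef;
    regs->cs = 0xbeef;
    regs->ss = 0xbeef;
#endif
}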

Jan
Jürgen Groß Nov. 12, 2020, 11:27 a.m. UTC | #6
On 12.11.20 12:05, Jan Beulich wrote:
> On 12.11.2020 11:48, Jürgen Groß wrote:
>> On 12.11.20 11:23, Jan Beulich wrote:
>>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>>     static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>>     {
>>>>>>     	int xen_mode, ovf;
>>>>>>     
>>>>>>     	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>>     	xen_mode = ring_0(regs);
>>>
>>> Unrelated to the patch here (i.e. just as an observation), this
>>> use of ring_0() looks bogus when the NMI occurred in HVM guest
>>> mode.
>>
>> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
>> reason, or just be handled completely inside the guest, right?
> 
> Yes, and in the former case for VMX it would be handed on to do_nmi(),
> with the guest register state. For SVM it would get handled on the
> next STGI, i.e. would indeed never surface from HVM guest mode.
> 
>> I don't see how this test should ever result in xen_mode being
>> false for an HVM guest.
> 
> I think, because of hvm_invalidate_regs_fields(), on VMX it would be
> consistently true in release builds and consistently false in debug
> ones.

Ah, okay. I searched for do_nmi(), but the vmx code uses the exception
table instead.

So I guess this should be:

xen_mode = !guest_mode(regs);


Juergen
Jan Beulich Nov. 12, 2020, 11:36 a.m. UTC | #7
On 12.11.2020 12:27, Jürgen Groß wrote:
> On 12.11.20 12:05, Jan Beulich wrote:
>> On 12.11.2020 11:48, Jürgen Groß wrote:
>>> On 12.11.20 11:23, Jan Beulich wrote:
>>>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>>>     static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>>>     {
>>>>>>>     	int xen_mode, ovf;
>>>>>>>     
>>>>>>>     	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>>>     	xen_mode = ring_0(regs);
>>>>
>>>> Unrelated to the patch here (i.e. just as an observation), this
>>>> use of ring_0() looks bogus when the NMI occurred in HVM guest
>>>> mode.
>>>
>>> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
>>> reason, or just be handled completely inside the guest, right?
>>
>> Yes, and in the former case for VMX it would be handed on to do_nmi(),
>> with the guest register state. For SVM it would get handled on the
>> next STGI, i.e. would indeed never surface from HVM guest mode.
>>
>>> I don't see how this test should ever result in xen_mode being
>>> false for an HVM guest.
>>
>> I think, because of hvm_invalidate_regs_fields(), on VMX it would be
>> consistently true in release builds and consistently false in debug
>> ones.
> 
> Ah, okay. I searched for do_nmi(), but the vmx code uses the exception
> table instead.
> 
> So I guess this should be:
> 
> xen_mode = !guest_mode(regs);

Yes, I think so. Just that guest_mode() also has its issues (my patch
"x86: refine guest_mode()", which improves it at least somewhat, is
still pending Andrew's go / no-go / improvement suggestions), so
whether it's suitable to use here may need some careful evaluation.
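
For reference, a paraphrase of guest_mode(): it treats a frame as a
guest frame iff it is the guest register block at the top of the current
CPU's stack, and it is the assertions around that check which the
pending patch refines:

#define guest_mode(r)                                                    \
({                                                                       \
    unsigned long diff_ = (char *)guest_cpu_user_regs() - (char *)(r);   \
    /* The frame must live on the current CPU's stack. */                \
    ASSERT(diff_ < STACK_SIZE);                                          \
    /* Guest frame iff it is the top-of-stack register block. */         \
    (diff_ == 0);                                                        \
})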

Jan
Jürgen Groß Nov. 12, 2020, 1:03 p.m. UTC | #8
On 12.11.20 12:36, Jan Beulich wrote:
> On 12.11.2020 12:27, Jürgen Groß wrote:
>> On 12.11.20 12:05, Jan Beulich wrote:
>>> On 12.11.2020 11:48, Jürgen Groß wrote:
>>>> On 12.11.20 11:23, Jan Beulich wrote:
>>>>> On 11.11.2020 16:48, Jürgen Groß wrote:
>>>>>> On 11.11.20 16:45, Jan Beulich wrote:
>>>>>>> On 09.11.2020 10:50, Juergen Gross wrote:
>>>>>>>>      static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>>>>>>>      {
>>>>>>>>      	int xen_mode, ovf;
>>>>>>>>      
>>>>>>>>      	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
>>>>>>>>      	xen_mode = ring_0(regs);
>>>>>
>>>>> Unrelated to the patch here (i.e. just as an observation), this
>>>>> use of ring_0() looks bogus when the NMI occurred in HVM guest
>>>>> mode.
>>>>
>>>> An NMI in an HVM guest due to oprofile would be a VMEXIT with NMI
>>>> reason, or just be handled completely inside the guest, right?
>>>
>>> Yes, and in the former case for VMX it would be handed on to do_nmi(),
>>> with the guest register state. For SVM it would get handled on the
>>> next STGI, i.e. would indeed never surface from HVM guest mode.
>>>
>>>> I don't see how this test should ever result in xen_mode being
>>>> false for an HVM guest.
>>>
>>> I think, because of hvm_invalidate_regs_fields(), on VMX it would be
>>> consistently true in release builds and consistently false in debug
>>> ones.
>>
>> Ah, okay. I searched for do_nmi(), but the vmx code uses the exception
>> table instead.
>>
>> So I guess this should be:
>>
>> xen_mode = !guest_mode(regs);
> 
> Yes, I think so. Just that guest_mode() also has its issues (my patch
> "x86: refine guest_mode()" improving it at least some is still pending
> Andrew's go / no-go / improvement suggestions), so whether it's
> suitable to use here may need some careful evaluation.

I'll leave the test as is for now.

We can revisit it when your patch has been committed.


Juergen

Patch

diff --git a/xen/arch/x86/oprofile/nmi_int.c b/xen/arch/x86/oprofile/nmi_int.c
index 0f103d80a6..5f17cba0b2 100644
--- a/xen/arch/x86/oprofile/nmi_int.c
+++ b/xen/arch/x86/oprofile/nmi_int.c
@@ -38,6 +38,8 @@  static unsigned long saved_lvtpc[NR_CPUS];
 
 static char *cpu_type;
 
+static DEFINE_PER_CPU(struct vcpu *, nmi_cont_vcpu);
+
 static int passive_domain_msr_op_checks(unsigned int msr, int *typep, int *indexp)
 {
 	struct vpmu_struct *vpmu = vcpu_vpmu(current);
@@ -83,14 +85,28 @@  void passive_domain_destroy(struct vcpu *v)
 		model->free_msr(v);
 }
 
+bool nmi_oprofile_send_virq(void)
+{
+	struct vcpu *v = this_cpu(nmi_cont_vcpu);
+
+	if ( v )
+		send_guest_vcpu_virq(v, VIRQ_XENOPROF);
+
+	this_cpu(nmi_cont_vcpu) = NULL;
+
+	return v;
+}
+
 static int nmi_callback(const struct cpu_user_regs *regs, int cpu)
 {
 	int xen_mode, ovf;
 
 	ovf = model->check_ctrs(cpu, &cpu_msrs[cpu], regs);
 	xen_mode = ring_0(regs);
-	if ( ovf && is_active(current->domain) && !xen_mode )
-		send_guest_vcpu_virq(current, VIRQ_XENOPROF);
+	if ( ovf && is_active(current->domain) && !xen_mode ) {
+		this_cpu(nmi_cont_vcpu) = current;
+		trigger_nmi_continuation();
+	}
 
 	if ( ovf == 2 )
 		current->arch.nmi_pending = true;
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 5005ac6e6e..7cb7d7e09c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -65,6 +65,7 @@ 
 #include <asm/debugger.h>
 #include <asm/msr.h>
 #include <asm/nmi.h>
+#include <asm/xenoprof.h>
 #include <asm/shared.h>
 #include <asm/x86_emulate.h>
 #include <asm/traps.h>
@@ -1804,6 +1805,9 @@  bool nmi_check_continuation(void)
 {
     bool ret = false;
 
+    if ( nmi_oprofile_send_virq() )
+        ret = true;
+
     return ret;
 }
 
diff --git a/xen/include/asm-x86/xenoprof.h b/xen/include/asm-x86/xenoprof.h
index 1026ba2e1f..cf6af8c5df 100644
--- a/xen/include/asm-x86/xenoprof.h
+++ b/xen/include/asm-x86/xenoprof.h
@@ -69,6 +69,8 @@  int passive_domain_do_rdmsr(unsigned int msr, uint64_t *msr_content);
 int passive_domain_do_wrmsr(unsigned int msr, uint64_t msr_content);
 void passive_domain_destroy(struct vcpu *v);
 
+bool nmi_oprofile_send_virq(void);
+
 #else
 
 static inline int passive_domain_do_rdmsr(unsigned int msr,
@@ -85,6 +87,11 @@  static inline int passive_domain_do_wrmsr(unsigned int msr,
 
 static inline void passive_domain_destroy(struct vcpu *v) {}
 
+static inline bool nmi_oprofile_send_virq(void)
+{
+    return false;
+}
+
 #endif /* CONFIG_XENOPROF */
 
 #endif /* __ASM_X86_XENOPROF_H__ */
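
For completeness: trigger_nmi_continuation() comes from patch 1/3 of the
series. Per the discussion above it boils down to a self-IPI, roughly as
follows (a paraphrase; the vector choice is as I read that patch, see it
for the real code):

void trigger_nmi_continuation(void)
{
    /*
     * A self-IPI stays latched while the NMI handler is still running,
     * so nmi_check_continuation() gets invoked from the IPI handler
     * only after NMI context has been left.
     */
    send_IPI_self(CALL_FUNCTION_VECTOR);
}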