diff mbox series

[2/3] i386: hvf: In kick_vcpu use hv_vcpu_interrupt to force exit

Message ID 20230922140914.13906-3-phil@philjordan.eu (mailing list archive)
State New, archived
Headers show
Series hvf x86 correctness and efficiency improvements | expand

Commit Message

Phil Dennis-Jordan Sept. 22, 2023, 2:09 p.m. UTC
When interrupting a vCPU thread, this patch actually tells the hypervisor to
stop running guest code on that vCPU.

Calling hv_vcpu_interrupt actually forces a vCPU exit, analogously to
hv_vcpus_exit on aarch64.

Previously, hvf_kick_vcpu_thread relied upon hv_vcpu_run returning very
frequently, including many spurious exits, which made it less of a problem that
nothing was actively done to stop the vCPU thread running guest code.
The newer, more efficient hv_vcpu_run_until exits much more rarely, so a true
"kick" is needed.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
 target/i386/hvf/hvf.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Roman Bolshakov Oct. 8, 2023, 6:23 p.m. UTC | #1
On Fri, Sep 22, 2023 at 04:09:13PM +0200, Phil Dennis-Jordan wrote:
> When interrupting a vCPU thread, this patch actually tells the hypervisor to
> stop running guest code on that vCPU.
> 
> Calling hv_vcpu_interrupt actually forces a vCPU exit, analogously to
> hv_vcpus_exit on aarch64.
> 
> Previously, hvf_kick_vcpu_thread relied upon hv_vcpu_run returning very
> frequently, including many spurious exits, which made it less of a problem that
> nothing was actively done to stop the vCPU thread running guest code.
> The newer, more efficient hv_vcpu_run_until exits much more rarely, so a true
> "kick" is needed.
> 

Hi Phil,

I see a severe performance regression with the patch on a Windows XP
guest. The display is not refreshed properly, like a broken LVDS panel:
only some horizontal lines appear on it. My test laptop for x86 hvf is an
MBA 2015 with the latest Big Sur. What are you running QEMU/HVF on?

FWIW. I recall a few years ago I submitted a similar patch that does
something similar but addresses a few more issues:
https://patchwork.kernel.org/project/qemu-devel/patch/20200729124832.79375-1-r.bolshakov@yadro.com/

I don't remember why it never got merged.

Regards,
Roman
Phil Dennis-Jordan Oct. 8, 2023, 6:39 p.m. UTC | #2
On Sun, 8 Oct 2023 at 20:23, Roman Bolshakov <roman@roolebo.dev> wrote:

> On Fri, Sep 22, 2023 at 04:09:13PM +0200, Phil Dennis-Jordan wrote:
> > When interrupting a vCPU thread, this patch actually tells the
> > hypervisor to stop running guest code on that vCPU.
> >
> > Calling hv_vcpu_interrupt actually forces a vCPU exit, analogously to
> > hv_vcpus_exit on aarch64.
> >
> > Previously, hvf_kick_vcpu_thread relied upon hv_vcpu_run returning very
> > frequently, including many spurious exits, which made it less of a
> > problem that nothing was actively done to stop the vCPU thread running
> > guest code. The newer, more efficient hv_vcpu_run_until exits much more
> > rarely, so a true "kick" is needed.
> >
>

Hi Roman,

Thanks for the review and test of this patch and the preceding one!


> I see a severe performance regression with the patch on a Windows XP
> guest. The display is not refreshed properly, like a broken LVDS panel:
> only some horizontal lines appear on it.


OK, that's interesting - I've been running into that sort of issue while
trying to integrate HVF's APIC implementation into Qemu. I assume that's
with patch 3/3 applied as well? The fact you've repro'd it with just these
patches would explain why I've not been able to fix it on the APIC side…

> My test laptop for x86 hvf is an MBA 2015 with the latest Big Sur. What
> are you running QEMU/HVF on?
>

Most of the testing has been with 2018 Mac Mini (Intel Coffee Lake) hosts
running Big Sur (11), Monterey (12), and Ventura (13), and using macOS
guests. I've also sanity-checked with a 2015 MBP (Broadwell) Monterey host
with various macOS guests as well as Linux and FreeBSD guests. Guess I
should have tried Windows guests too, sorry about the regression!


> FWIW. I recall a few years ago I submitted a similar patch that does
> something similar but addresses a few more issues:
>
> https://patchwork.kernel.org/project/qemu-devel/patch/20200729124832.79375-1-r.bolshakov@yadro.com/
>
> I don't remember why it never got merged.
>

Looks like the VM kick might be a more complex undertaking than I was
anticipating. I'll try to repro the problem you ran into, and then look
over your original patch and make sense of it. Hopefully an updated version
of your 'kick' implementation will work well in combination with the
newer hv_vcpu_run_until() API from patch 3/3. I'll keep you posted.

Thanks again,
Phil
Roman Bolshakov Oct. 8, 2023, 7:19 p.m. UTC | #3
On Sun, Oct 08, 2023 at 08:39:08PM +0200, Phil Dennis-Jordan wrote:
> On Sun, 8 Oct 2023 at 20:23, Roman Bolshakov <roman@roolebo.dev> wrote:
> 
> > On Fri, Sep 22, 2023 at 04:09:13PM +0200, Phil Dennis-Jordan wrote:
> > > When interrupting a vCPU thread, this patch actually tells the
> > > hypervisor to stop running guest code on that vCPU.
> > >
> > > Calling hv_vcpu_interrupt actually forces a vCPU exit, analogously to
> > > hv_vcpus_exit on aarch64.
> > >
> > > Previously, hvf_kick_vcpu_thread relied upon hv_vcpu_run returning very
> > > frequently, including many spurious exits, which made it less of a
> > > problem that nothing was actively done to stop the vCPU thread running
> > > guest code. The newer, more efficient hv_vcpu_run_until exits much more
> > > rarely, so a true "kick" is needed.
> > >
> >
> 
> Hi Roman,
> 
> Thanks for the review and test of this patch and the preceding one!
> 

My pleasure. Thanks for submitting the patch and confirming the need for
the feature on x86 HVF.

> > I see a severe performance regression with the patch on a Windows XP
> > guest. The display is not refreshed properly, like a broken LVDS panel:
> > only some horizontal lines appear on it.
> 
> 
> I assume that's with patch 3/3 applied as well? The fact you've
> repro'd it with just these patches would explain why I've not been able
> to fix it on the APIC side…
> 

Yes, I applied it with patch 3/3 and then retested with only the first
two patches.

> > FWIW. I recall a few years ago I submitted a similar patch that does
> > something similar but addresses a few more issues:
> >
> > https://patchwork.kernel.org/project/qemu-devel/patch/20200729124832.79375-1-r.bolshakov@yadro.com/
> >
> > I don't remember why it never got merged.
> >
> 
> Looks like the VM kick might be a more complex undertaking than I was
> anticipating. I'll try to repro the problem you ran into, and then look
> over your original patch and make sense of it. Hopefully an updated version
> of your 'kick' implementation will work well in combination with the
> newer hv_vcpu_run_until() API from patch 3/3. I'll keep you posted.
> 

Apparently I left a note that some interrupts weren't delivered even
with my patch and I was not able to figure out the reason back then. I had
another attempt to debug this two weeks later after I submitted v4 and I
can find a WIP branch on github where I added a Debug Registers support
patch and some tracepoints:

https://github.com/qemu/qemu/compare/master...roolebo:qemu:hvf-debug-kick

Perhaps that's where we should start from, besides the obvious need for
a rebase.

With regards to hv_vcpu_run_until() I can find the following thread:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09468.html

> hv_vcpu_run_until() was also evaluated on macOS 10.15.5 but it degrades
> VM performance significantly compared to explicit setting of
> VMX-preemption timer value and hv_vcpu_run(). The performance issue was
> observed on Broadwell-based MacBook Air and Ivy Bridge-based MacBook
> Pro.
> 
> macOS 11.0 Beta deprecated hv_vcpu_run() and introduced a special
> declaration for hv_vcpu_run_until() that's not available on 10.15 -
> HV_DEADLINE_FOREVER (UINT64_MAX, which is bigger than the maximum value of
> the VMX-preemption counter). Perhaps the performance issue is addressed
> there.

All the discussion with Paolo might be helpful, particularly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09893.html

> So, I've tried Big Sur Beta and it has exactly the same performance
> issue with hv_vcpu_run_until() while hv_vcpu_run() works as good as it
> worked on 10.15.5. I've submitted FB7827341 to Apple wrt the issue.

In November 2020, Apple responded to FB7827341 that there's an issue on
the QEMU side.

Regards,
Roman
Phil Dennis-Jordan Oct. 8, 2023, 7:29 p.m. UTC | #4
On Sun, 8 Oct 2023 at 21:19, Roman Bolshakov <roman@roolebo.dev> wrote:

> > I assume that's with patch 3/3 applied as well? The fact you've
> > repro'd it with just these patches would explain why I've not been able
> > to fix it on the APIC side…
> >
>
> Yes, I applied it with patch 3/3 and then retested with only the first
> two patches.
>

OK, interesting that it would happen without patch 3 as well.


> > > FWIW. I recall a few years ago I submitted a similar patch that does
> > > something similar but addresses a few more issues:
> > >
> > > https://patchwork.kernel.org/project/qemu-devel/patch/20200729124832.79375-1-r.bolshakov@yadro.com/
> > >
> > > I don't remember why it never got merged.
> > >
> >
> > Looks like the VM kick might be a more complex undertaking than I was
> > anticipating. I'll try to repro the problem you ran into, and then look
> > over your original patch and make sense of it. Hopefully an updated version
> > of your 'kick' implementation will work well in combination with the
> > newer hv_vcpu_run_until() API from patch 3/3. I'll keep you posted.
> >
>
> Apparently I left a note that some interrupts weren't delivered even
> with my patch and I was not able to figure out the reason back then. I had
> another attempt to debug this two weeks later after I submitted v4 and I
> can find a WIP branch on github where I added a Debug Registers support
> patch and some tracepoints:
>
> https://github.com/qemu/qemu/compare/master...roolebo:qemu:hvf-debug-kick
>
> Perhaps that's where we should start from, besides the obvious need for
> a rebase.
>

Sounds good, I'll take a look at those changes and try to work out what to
do next.


> With regards to hv_vcpu_run_until() I can find the following thread:
> https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09468.html
>
> > hv_vcpu_run_until() was also evaluated on macOS 10.15.5 but it degrades
> > VM performance significantly compared to explicit setting of
> > VMX-preemption timer value and hv_vcpu_run(). The performance issue was
> > observed on Broadwell-based MacBook Air and Ivy Bridge-based MacBook
> > Pro.
> >
> > macOS 11.0 Beta deprecated hv_vcpu_run() and introduced a special
> > declaration for hv_vcpu_run_until() that's not available on 10.15 -
> > HV_DEADLINE_FOREVER (UINT64_MAX, which is bigger than the maximum value of
> > the VMX-preemption counter). Perhaps the performance issue is addressed
> > there.
>
> All the discussion with Paolo might be helpful, particularly:
> https://lists.nongnu.org/archive/html/qemu-devel/2020-06/msg09893.html
>
> > So, I've tried Big Sur Beta and it has exactly the same performance
> > issue with hv_vcpu_run_until() while hv_vcpu_run() works as good as it
> > worked on 10.15.5. I've submitted FB7827341 to Apple wrt the issue.
>
> In November 2020, Apple responded to FB7827341 that there's an issue on
> the QEMU side.
>

Hmm, that's interesting. I'll need to work my way through that thread, but
I'll point out that in my testing with SMP guests, I measured a performance
*improvement* with the hv_vcpu_run_until() API (and the forever deadline)
versus hv_vcpu_run(), as it significantly reduced BQL contention - with so
many VMEXITs, vCPU threads were spending a lot of time waiting for the lock.
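
To make the difference concrete, the two run-loop styles look roughly like
this (an illustrative sketch, not the actual QEMU code; handle_exit() is a
placeholder for QEMU's exit handling, which takes the BQL):

/*
 * Illustrative sketch only - not the actual QEMU vCPU loop.
 */
#include <Hypervisor/hv.h>
#include <assert.h>

static void handle_exit(hv_vcpuid_t vcpu); /* placeholder */

/* Old style: hv_vcpu_run() returns for many events the VMM doesn't need to
 * handle, so the loop goes through frequent, often spurious, VMEXITs - and
 * correspondingly frequent BQL round-trips. */
static void run_vcpu_legacy(hv_vcpuid_t vcpu)
{
    for (;;) {
        hv_return_t r = hv_vcpu_run(vcpu);
        assert(r == HV_SUCCESS);
        handle_exit(vcpu);
    }
}

/* New style (macOS 11+): hv_vcpu_run_until() with HV_DEADLINE_FOREVER only
 * returns for exits that actually need handling, which is exactly why an
 * explicit kick via hv_vcpu_interrupt() becomes necessary. */
static void run_vcpu_modern(hv_vcpuid_t vcpu)
{
    for (;;) {
        hv_return_t r = hv_vcpu_run_until(vcpu, HV_DEADLINE_FOREVER);
        assert(r == HV_SUCCESS);
        handle_exit(vcpu);
    }
}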
Phil Dennis-Jordan Oct. 16, 2023, 2:19 p.m. UTC | #5
Hi Roman,

A quick update on my progress on this so far:

- I've managed to repro the graphical issue; it turns up with the
switch to hv_vcpu_run_until() (patch 3/3). I can't repro it with just the
changes to the kick (patch 2/3; it would have surprised me if that
had caused it).

- Thanks for linking to the old mailing list threads for background on
the issues.

- I've rebased/ported your more sophisticated vCPU kick implementation
to the latest upstream master branch. This doesn't seem to cause any
obvious regressions, but it also doesn't appear to fix the issue with
hv_vcpu_run_until(). My branch is here:
https://github.com/pmj/qemu/tree/hvf-kick-modern and the updated
version of your patch is here:
https://github.com/qemu/qemu/commit/fadc716c5bb15345a49e08eecf7ab1077782975c

- I've done some tracing, and so far I can't spot any undelivered or
slow kicks. Each kick is paired with a return from hvf_vcpu_exec with
reason EXCP_INTERRUPT, and this happens within less than a
millisecond, typically around 200µs. I'm seeing kicks being delivered
at all the various stages where we expect them, apart from VMX
preemption timer exits, at least so far. But those only target a tiny
window and so should be statistically quite rare.

- I'm starting to entertain the idea that the graphical issues are caused
by some other behavioural difference between hv_vcpu_run_until() and
hv_vcpu_run(), rather than by a problem with interrupt delivery.
Unfortunately, I'm not familiar with how drawing works on legacy
(S)VGA graphics, and what effect might cause it to end up with these
dropped scanline updates. I might try to find some kind of OSDev
example code that draws to (S)VGA and hopefully lets me repro and
perhaps debug the problem in isolation.

Any ideas/thoughts?

Thanks,
Phil
Phil Dennis-Jordan Oct. 20, 2023, 3:12 p.m. UTC | #6
Hi Roman, hi Paolo,

Just an update on my investigation of the hv_vcpu_run ->
hv_vcpu_run_until issue. The graphical issues with the Windows XP VM
appear to be caused by the dirty memory page system not working as
expected. The emulated (Cirrus) VGA adapter uses dirty page tracking
to perform partial screen updates, so when pages aren't marked as
dirty, they don't get updated on the host console.

This got me digging into how dirty memory tracking is actually
implemented in the Qemu hvf backend, and basically, it should never
have worked in the first place. When we get a write fault, the code
marks the *whole* 'logged' memory range as writable rather than just
the page that's just been dirtied. It just so happens that hv_vcpu_run
was causing EPT fault exits on those pages even after marking them
writable (?), and hv_vcpu_run_until() no longer does that. So
basically, this has been a Qemu bug masked by undesirable
hv_vcpu_run() behaviour. I'll start putting together a fix for this.
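
In rough terms, a fix could do something like this on a logged write fault
(a sketch only, assuming the existing hvf_slot bookkeeping and
hv_vm_protect() stay as they are today; the helper name and exact
protection flags are illustrative, not the final patch):

/*
 * Sketch only - assumes QEMU's hvf backend context (hvf_slot,
 * memory_region_set_dirty(), TARGET_PAGE_SIZE).
 */
static void mark_dirty_on_logged_write_fault(hvf_slot *slot, uint64_t gpa)
{
    uint64_t page = gpa & ~((uint64_t)TARGET_PAGE_SIZE - 1);

    /* Mark only the faulting page dirty... */
    memory_region_set_dirty(slot->region, page - slot->start,
                            TARGET_PAGE_SIZE);

    /* ...and re-enable writes for just that page, so that writes to other
     * pages in the logged range still fault and get recorded. */
    hv_return_t r = hv_vm_protect(page, TARGET_PAGE_SIZE,
                                  HV_MEMORY_READ | HV_MEMORY_WRITE |
                                  HV_MEMORY_EXEC);
    assert(r == HV_SUCCESS);
}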

I'm also hoping to settle the hv_vcpu_interrupt() race condition
question empirically - if we can avoid the complicated signal/vmexit
race avoidance logic with atomic flags, that will make the code rather
simpler.
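
For reference, the more complex variant would be shaped roughly like this
(a schematic sketch of the signal-plus-atomic-flag pattern, not Roman's
actual patch; the struct and SIGUSR1 stand in for QEMU's CPUState and
SIG_IPI, and a no-op SIGUSR1 handler is assumed to be installed):

#include <Hypervisor/hv.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    hv_vcpuid_t fd;
    pthread_t thread;
    atomic_bool exit_requested;   /* set by the kicker, consumed by the vCPU */
} KickState;                      /* hypothetical; QEMU would use CPUState */

static void kick_vcpu(KickState *k)
{
    atomic_store(&k->exit_requested, true);
    pthread_kill(k->thread, SIGUSR1);  /* interrupt blocking host-side calls */
    hv_vcpu_interrupt(&k->fd, 1);      /* force a VMEXIT if in guest mode */
}

static int vcpu_loop_step(KickState *k)
{
    /* Consume the flag right before entering the guest; a kick landing in
     * the window between this check and guest entry must be covered by
     * hv_vcpu_interrupt() - whether it reliably is, is the open question. */
    if (atomic_exchange(&k->exit_requested, false)) {
        return -1;                     /* e.g. EXCP_INTERRUPT in QEMU terms */
    }
    hv_vcpu_run_until(k->fd, HV_DEADLINE_FOREVER);
    return 0;                          /* caller handles the exit */
}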

Phil
Roman Bolshakov Nov. 5, 2023, 3:21 p.m. UTC | #7
On Fri, Oct 20, 2023 at 05:12:13PM +0200, Phil Dennis-Jordan wrote:
> Hi Roman, hi Paolo,
> 

Hi Phil,

Pardon for not responding earlier. I was travelling the last three weeks.

I appreciate the time you spent on the rebase. I have compiled it and
observed the same issue with SVGA as with your third patch.

> Just an update on my investigation of the hv_vcpu_run ->
> hv_vcpu_run_until issue. The graphical issues with the Windows XP VM
> appear to be caused by the dirty memory page system not working as
> expected. The emulated (Cirrus) VGA adapter uses dirty page tracking
> to perform partial screen updates, so when pages aren't marked as
> dirty, they don't get updated on the host console.
> 

That sounds awesome; I think you have tracked it down correctly. I have
also looked at the SVGA code, and the only idea I had was that dirty
tracking was somehow not working properly.

I observed a similar issue when I tried to add a GDB stub for x86 hvf.
The changes from the GDB stub produced no apparent effect on the guest -
breakpoints were there in memory but did not stop the guest, and so
on. I never figured out why it didn't work back then.

> This got me digging into how dirty memory tracking is actually
> implemented in the Qemu hvf backend, and basically, it should never
> have worked in the first place. When we get a write fault, the code
> marks the *whole* 'logged' memory range as writable rather than just
> the page that's just been dirtied. It just so happens that hv_vcpu_run
> was causing EPT fault exits on those pages even after marking them
> writable (?), and hv_vcpu_run_until() no longer does that. So
> basically, this has been a Qemu bug masked by undesirable
> hv_vcpu_run() behaviour. I'll start putting together a fix for this.
> 

Sounds good, have you got anything to test or review? Meanwhile, I'll
review the pending patches you sent.

Best regards,
Roman
Phil Dennis-Jordan Nov. 6, 2023, 2:15 p.m. UTC | #8
Hi Roman,

On Sun, 5 Nov 2023 at 16:21, Roman Bolshakov <roman@roolebo.dev> wrote:

> > This got me digging into how dirty memory tracking is actually
> > implemented in the Qemu hvf backend, and basically, it should never
> > have worked in the first place. When we get a write fault, the code
> > marks the *whole* 'logged' memory range as writable rather than just
> > the page that's just been dirtied. It just so happens that hv_vcpu_run
> > was causing EPT fault exits on those pages even after marking them
> > writable (?), and hv_vcpu_run_until() no longer does that. So
> > basically, this has been a Qemu bug masked by undesirable
> > hv_vcpu_run() behaviour. I'll start putting together a fix for this.
> >
>
> Sounds good, have you got anything to test or review? Meanwhile, I'll
> review the pending patches you sent.
>

Sorry, I've likewise been busy with other things the last 2-3 weeks.

As far as I'm aware, we don't actually know 100% for certain whether there's a
race condition when using hv_vcpu_interrupt(), right? (As I mentioned, the
patches with hv_vcpu_run_until and a hv_vcpu_interrupt-only kick have been
running without issue for months on dozens to hundreds of VMs.) So before
we add the complexity of the hybrid hv_vcpu_interrupt & signal-based kick
to the codebase, I'd like to test out hv_vcpu_interrupt's behaviour in
isolation and actually force the edge cases we're worried about. But yeah,
I haven't got around to doing that yet. :-) I'm hoping to take a crack at
it later this week or next week, probably using
https://github.com/mist64/hvdos as a starting point.
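
Roughly, such a harness could look like this (a sketch only;
setup_minimal_guest_vcpu() is a hypothetical helper standing in for
hvdos-style VM/vCPU/guest setup, and the guest is assumed to spin in a
tight loop that causes no exits of its own, so every observed exit
corresponds to a kick):

#include <Hypervisor/hv.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static hv_vcpuid_t setup_minimal_guest_vcpu(void); /* hypothetical */

static hv_vcpuid_t vcpu;
static atomic_bool vcpu_ready;
static atomic_uint_fast64_t exits;

static void *vcpu_thread(void *arg)
{
    (void)arg;
    /* hv_vcpu_create() and hv_vcpu_run_until() must be called on the same
     * thread, so the guest/vCPU setup happens here. */
    vcpu = setup_minimal_guest_vcpu();
    atomic_store(&vcpu_ready, true);
    while (hv_vcpu_run_until(vcpu, HV_DEADLINE_FOREVER) == HV_SUCCESS) {
        atomic_fetch_add(&exits, 1);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, vcpu_thread, NULL);
    while (!atomic_load(&vcpu_ready)) {
        /* wait for the vCPU thread to finish setup */
    }

    /* Kick repeatedly; if a kick can be lost (e.g. when it races with guest
     * entry), the exit counter stalls and the inner loop never terminates -
     * a timeout plus logging here would flag exactly that case. */
    for (int i = 0; i < 100000; i++) {
        uint64_t before = atomic_load(&exits);
        hv_vcpu_interrupt(&vcpu, 1);
        while (atomic_load(&exits) == before) {
            /* spin */
        }
    }
    printf("all kicks produced exits\n");
    return 0;
}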

Thanks for reviewing and testing the first set of patches!

Phil
diff mbox series

Patch

diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index cb2cd0b02f..55bd7d2af8 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -209,7 +209,10 @@  static inline bool apic_bus_freq_is_known(CPUX86State *env)
 
 void hvf_kick_vcpu_thread(CPUState *cpu)
 {
+    hv_vcpuid_t hvf_vcpuid;
     cpus_kick_thread(cpu);
+    hvf_vcpuid = cpu->accel->fd;
+    hv_vcpu_interrupt(&hvf_vcpuid, 1);
 }
 
 int hvf_arch_init(void)