kvm-userspace: Make PC speaker emulation aware of in-kernel PIT

Message ID 49F0CE65.4050005@web.de (mailing list archive)
State New, archived

Commit Message

Jan Kiszka April 23, 2009, 8:24 p.m. UTC
When using the in-kernel PIT the speaker emulation has to synchronize
the PIT state with KVM. Enhance the existing speaker sound device and
allow it to take over port 0x61 by using KVM_CREATE_PIT_NOSPKR when
available. This unbreaks -soundhw pcspk in KVM mode.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

 libkvm/libkvm-x86.c |   13 +++++++++++++
 qemu/hw/pcspk.c     |   43 +++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 54 insertions(+), 2 deletions(-)

Comments

Marcelo Tosatti April 25, 2009, 12:13 a.m. UTC | #1
Jan,

While the patch itself looks fine, IMO it would be better to move all of 
the timer handling to userspace, except the performance critical parts,
since most of it is generic. Either periodic or one-shot timer, with:

    - PIO or MMIO region returns remaining time for expiration.
    - PIO or MMIO region programs the next event and timer mode.

Oversimplified of course (kvm_timer_ops was the first step in that
direction). I believe there will be a proposed HPET in-kernel driver.
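
To make the two bullet points above concrete, here is a minimal sketch of
such a generic timer, assuming the caller supplies a nanosecond clock. The
names generic_timer, timer_program and timer_remaining are invented for
illustration; they are not part of KVM or of kvm_timer_ops.

```c
#include <stdint.h>

enum timer_mode { TIMER_ONESHOT, TIMER_PERIODIC };

/* Hypothetical generic timer state; not an existing KVM structure. */
struct generic_timer {
    enum timer_mode mode;
    uint64_t period_ns;   /* programmed interval */
    uint64_t expires_ns;  /* absolute time of the first expiration */
};

/* "Write" side: program the next event and the timer mode. */
static void timer_program(struct generic_timer *t, enum timer_mode mode,
                          uint64_t now_ns, uint64_t interval_ns)
{
    t->mode = mode;
    t->period_ns = interval_ns;
    t->expires_ns = now_ns + interval_ns;
}

/* "Read" side: return the time remaining until the next expiration. */
static uint64_t timer_remaining(const struct generic_timer *t, uint64_t now_ns)
{
    if (now_ns < t->expires_ns)
        return t->expires_ns - now_ns;
    if (t->mode == TIMER_ONESHOT || t->period_ns == 0)
        return 0;   /* a one-shot timer has already fired */
    /* Periodic: distance to the next multiple of the period. */
    return t->period_ns - (now_ns - t->expires_ns) % t->period_ns;
}
```

A PIO or MMIO handler would then map guest writes to timer_program() and
guest reads to timer_remaining(); only the interrupt injection path would
need to stay in the kernel under this proposal.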

I don't see what the problem is with the partial components that Avi
talks about.

On Thu, Apr 23, 2009 at 10:24:05PM +0200, Jan Kiszka wrote:
> When using the in-kernel PIT the speaker emulation has to synchronize
> the PIT state with KVM. Enhance the existing speaker sound device and
> allow it to take over port 0x61 by using KVM_CREATE_PIT_NOSPKR when
> available. This unbreaks -soundhw pcspk in KVM mode.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anthony Liguori April 25, 2009, 1:08 p.m. UTC | #2
Marcelo Tosatti wrote:
> Jan,
>
> While the patch itself looks fine, IMO it would be better to move all of 
> the timer handling to userspace, except the performance critical parts,
> since most of it is generic. Either periodic or one-shot timer, with:
>   

The reason for having the PIT in-kernel is not performance.  The PIT is 
not performance sensitive.

It's because it was easier to do interrupt catch-up by pushing the PIT 
into the kernel which IMHO was the wrong path to go down.

Regards,

Anthony Liguori
Jan Kiszka April 25, 2009, 4:28 p.m. UTC | #3
Anthony Liguori wrote:
> Marcelo Tosatti wrote:
>> Jan,
>>
>> While the patch itself looks fine, IMO it would be better to move all
>> of the timer handling to userspace, except the performance critical
>> parts,
>> since most of it is generic. Either periodic or one-shot timer, with:
>>   
> 
> The reason for having the PIT in-kernel is not performance.  The PIT is
> not performance sensitive.

I think that depends. Some OSes (in some configurations) use the PIT
counter as clock source and/or program it regularly in one-shot mode. An
aging use case, but still a valid one.

> 
> It's because it was easier to do interrupt catch-up by pushing the PIT
> into the kernel which IMHO was the wrong path to go down.

Pushing the emulation of port 0x61 into the kernel was a mistake we now
have to deal with. I'm not that sure about the PIT itself.

Jan
Anthony Liguori April 25, 2009, 7:59 p.m. UTC | #4
Jan Kiszka wrote:
> Anthony Liguori wrote:
>   
>> Marcelo Tosatti wrote:
>>     
>>> Jan,
>>>
>>> While the patch itself looks fine, IMO it would be better to move all
>>> of the timer handling to userspace, except the performance critical
>>> parts,
>>> since most of it is generic. Either periodic or one-shot timer, with:
>>>   
>>>       
>> The reason for having the PIT in-kernel is not performance.  The PIT is
>> not performance sensitive.
>>     
>
> I think that depends. Some OSes (in some configurations) use the PIT
> counter as clock source and/or program it regularly in one-shot mode. An
> aging use case, but still a valid one.
>   

I can't find the thread, but this has been discussed at length before.
The justification has always been time drift correction.  If you
crunch the numbers, even at 1024 Hz, there just aren't enough exits to
really make a difference from a performance perspective.

Just to state it more clearly, if you assume an additional 5us to drop 
to userspace (which is absurdly high, but let's stick with it), 1024 
exits per second comes out to about 5ms which is only 0.5% in terms of 
CPU consumption.
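
The arithmetic above can be checked with a small sketch. The helper names
are made up for illustration; the 5 us cost and the 1024 Hz rate are the
figures assumed in this mail, not measurements.

```c
#include <stdio.h>

/* Fraction of one CPU-second spent on exits to userspace.
 * exits_per_sec: number of exits per second (e.g. a 1024 Hz timer)
 * cost_us:       assumed extra cost per exit, in microseconds */
static double exit_overhead_fraction(double exits_per_sec, double cost_us)
{
    return exits_per_sec * cost_us / 1e6;
}

/* Print the overhead for a given exit rate and per-exit cost. */
static void print_exit_overhead(double exits_per_sec, double cost_us)
{
    double frac = exit_overhead_fraction(exits_per_sec, cost_us);

    printf("%.0f exits/s at %.1f us each: %.2f ms per second (%.3f%% of one CPU)\n",
           exits_per_sec, cost_us, frac * 1e3, frac * 100.0);
}
```

With 1024 exits/s and 5 us per exit this gives 5.12 ms per second, about
0.5% of one CPU. At 200,000 exits/s the same per-exit cost would consume
a full CPU.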

The APIC is quite a bit more understandable because, especially with SMP,
you can generate a very high number of interrupts per second, and taking
a drop to userspace for every EOI can start to matter with exit rates
in the hundreds of thousands.

>> It's because it was easier to do interrupt catch-up by pushing the PIT
>> into the kernel which IMHO was the wrong path to go down.
>>     
>
> Pushing the emulation of port 0x61 into the kernel was a mistake we now
> have to deal with. I'm not that sure about the PIT itself.
>   

I agree re: port 0x61.  I'm just saying that there is no point in moving 
just the non "performance critical" components to userspace as Marcelo 
suggests because the whole thing is non "performance critical".

Regards,

Anthony Liguori

Sheng Yang April 27, 2009, 1 p.m. UTC | #5
On Sunday 26 April 2009 03:59:11 Anthony Liguori wrote:
> Jan Kiszka wrote:
> > Anthony Liguori wrote:
> >> Marcelo Tosatti wrote:
> >>> Jan,
> >>>
> >>> While the patch itself looks fine, IMO it would be better to move all
> >>> of the timer handling to userspace, except the performance critical
> >>> parts,
> >>> since most of it is generic. Either periodic or one-shot timer, with:
> >>
> >> The reason for having the PIT in-kernel is not performance.  The PIT is
> >> not performance sensitive.
> >
> > I think that depends. Some OSes (in some configurations) use the PIT
> > counter as clock source and/or program it regularly in one-shot mode. An
> > aging use case, but still a valid one.
>
> I can't find the thread, but this has been discussed at length before.
> The justification has always been for time drift correction.  If you
> crunch the numbers, even at 1024 Hz, there just aren't enough exits to
> really make a difference from a performance perspective.

I agree too.

When I moved the PIT into the kernel, the immediate reason was that the
timer handling in KVM was crappy at the time, mainly because of the
interrupt handling. The most obvious problem I remember was that the
userspace PIT injected one interrupt after another, regardless of whether
the previous interrupt had already been delivered to the guest, so some
interrupts were lost and the guest's clock fell further and further
behind. We decided to rely on an in-kernel PIT to provide a stable time
source, so we moved the whole PIT into the kernel (rather than trying to
provide an interface to fix it, as Xen did at the time, which seemed much
more complex).

The KVM timer code is now much more mature and stable than it was then,
so I think it's OK to try to separate the timer interrupt logic from the
I/O logic now (though I also think it will still take some time to get
an elegant interface...).
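
The lost-interrupt problem described above, and the usual catch-up remedy,
can be sketched as follows. This is an illustration only: the structure
and function names are hypothetical and do not correspond to KVM's actual
PIT code.

```c
#include <stdbool.h>

/* Hypothetical timer model, not KVM's actual data structure. */
struct tick_source {
    unsigned pending;     /* ticks that expired but were not yet injected */
    bool     in_service;  /* an injected tick has not been acked yet */
    unsigned delivered;   /* ticks the guest has actually received */
};

/* Host timer fired: remember the tick instead of injecting blindly. */
static void tick_expired(struct tick_source *t)
{
    t->pending++;
}

/* Try to inject one tick; returns true if an interrupt was raised. */
static bool tick_try_inject(struct tick_source *t)
{
    if (t->in_service || t->pending == 0)
        return false;   /* guest still busy or nothing queued */
    t->pending--;
    t->in_service = true;
    return true;
}

/* Guest acknowledged the interrupt; the next queued tick may go out. */
static void tick_acked(struct tick_source *t)
{
    t->in_service = false;
    t->delivered++;
}
```

Because expired ticks are counted rather than dropped, a guest that is
slow to acknowledge an interrupt still receives every tick eventually,
which is what keeps its clock from drifting.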
Marcelo Tosatti April 27, 2009, 10:32 p.m. UTC | #6
On Thu, Apr 23, 2009 at 10:24:05PM +0200, Jan Kiszka wrote:
> When using the in-kernel PIT the speaker emulation has to synchronize
> the PIT state with KVM. Enhance the existing speaker sound device and
> allow it to take over port 0x61 by using KVM_CREATE_PIT_NOSPKR when
> available. This unbreaks -soundhw pcspk in KVM mode.

ACK both patches.

David S. Ahern April 28, 2009, 4:44 a.m. UTC | #7
Anthony Liguori wrote:
> Jan Kiszka wrote:
>> Anthony Liguori wrote:
>>  
>>> Marcelo Tosatti wrote:
>>>    
>>>> Jan,
>>>>
>>>> While the patch itself looks fine, IMO it would be better to move all
>>>> of the timer handling to userspace, except the performance critical
>>>> parts,
>>>> since most of it is generic. Either periodic or one-shot timer, with:
>>>>         
>>> The reason for having the PIT in-kernel is not performance.  The PIT is
>>> not performance sensitive.
>>>     
>>
>> I think that depends. Some OSes (in some configurations) use the PIT
>> counter as clock source and/or program it regularly in one-shot mode. An
>> aging use case, but still a valid one.
>>   
> 
> I can't find the thread, but this has been discussed at length before. 
> The justification has always been for time drift correction.  If you
> crunch the numbers, even at 1024 Hz, there just aren't enough exits to
> really make a difference from a performance perspective.
> 
> Just to state it more clearly, if you assume an additional 5us to drop
> to userspace (which is absurdly high, but let's stick with it), 1024
> exits per second comes out to about 5ms which is only 0.5% in terms of
> CPU consumption.


You are considering timekeeping activities only.

RHEL4 for example reads the PIT for each gettimeofday call. For
applications that add timestamps to logging the PIT is a *HUGE* overhead
(and the PMTMR for that matter). I have one example where something like
15% of each second is wasted handling the ioport reads and writes for
get_offset_pit.

david


> 
> The APIC is quite a bit more understandable because especially with SMP,
> you can generate a very high number of interrupts per second and taking
> a drop to userspace for every EOI can start to matter with exit rates
> in the hundreds of thousands.
> 
>>> It's because it was easier to do interrupt catch-up by pushing the PIT
>>> into the kernel which IMHO was the wrong path to go down.
>>>     
>>
>> Pushing the emulation of port 0x61 into the kernel was a mistake we now
>> have to deal with. I'm not that sure about the PIT itself.
>>   
> 
> I agree re: port 0x61.  I'm just saying that there is no point in moving
> just the non "performance critical" components to userspace as Marcelo
> suggests because the whole thing is non "performance critical".
> 
> Regards,
> 
> Anthony Liguori
> 
Dor Laor April 28, 2009, 7:02 a.m. UTC | #8
David S. Ahern wrote:
> Anthony Liguori wrote:
>   
>> Jan Kiszka wrote:
>>     
>>> Anthony Liguori wrote:
>>>  
>>>       
>>>> Marcelo Tosatti wrote:
>>>>    
>>>>         
>>>>> Jan,
>>>>>
>>>>> While the patch itself looks fine, IMO it would be better to move all
>>>>> of the timer handling to userspace, except the performance critical
>>>>> parts,
>>>>> since most of it is generic. Either periodic or one-shot timer, with:
>>>>>         
>>>>>           
>>>> The reason for having the PIT in-kernel is not performance.  The PIT is
>>>> not performance sensitive.
>>>>     
>>>>         
>>> I think that depends. Some OSes (in some configurations) use the PIT
>>> counter as clock source and/or program it regularly in one-shot mode. An
>>> aging use case, but still a valid one.
>>>   
>>>       
>> I can't find the thread, but this has been discussed at length before. 
>> The justification has always been for time drift correction.  If you
>> crunch the numbers, even at 1024 Hz, there just aren't enough exits to
>> really make a difference from a performance perspective.
>>
>> Just to state it more clearly, if you assume an additional 5us to drop
>> to userspace (which is absurdly high, but let's stick with it), 1024
>> exits per second comes out to about 5ms which is only 0.5% in terms of
>> CPU consumption.
>>     
>
>
> You are considering timekeeping activities only.
>
> RHEL4 for example reads the PIT for each gettimeofday call. For
> applications that add timestamps to logging the PIT is a *HUGE* overhead
> (and the PMTMR for that matter). I have one example where something like
> 15% of each second is wasted handling the ioport reads and writes for
> get_offset_pit.
>
> david
>
>   
I found the link to the previous discussion about moving the PIT to
userspace:
http://www.mail-archive.com/kvm@vger.kernel.org/msg02357.html
In that discussion Marcelo pointed out that we need the PIT in the
kernel in order to have the timer and the vcpu thread running on the
same CPU. Otherwise IPIs will be sent from the io-thread to the vcpu
thread in order to inject the IRQ. I guess we could also do it with a
dedicated timer thread in userspace, but it is getting more complex.

btw: I found a typo in the patch in the line below:
"fprintf(stderr, "Create kernel PIC irqchip failed\n");"
s/PIC/PIT/
>   
>> The APIC is quite a bit more understandable because especially with SMP,
>> you can generate a very high number of interrupts per second and taking
>> a drop to userspace for every EOI can start to matter with exit rates
>> in the hundreds of thousands.
>>
>>     
>>>> It's because it was easier to do interrupt catch-up by pushing the PIT
>>>> into the kernel which IMHO was the wrong path to go down.
>>>>     
>>>>         
>>> Pushing the emulation of port 0x61 into the kernel was a mistake we now
>>> have to deal with. I'm not that sure about the PIT itself.
>>>   
>>>       
>> I agree re: port 0x61.  I'm just saying that there is no point in moving
>> just the non "performance critical" components to userspace as Marcelo
>> suggests because the whole thing is non "performance critical".
>>
>> Regards,
>>
>> Anthony Liguori
>>
Avi Kivity May 4, 2009, 9:34 a.m. UTC | #9
Jan Kiszka wrote:
> When using the in-kernel PIT the speaker emulation has to synchronize
> the PIT state with KVM. Enhance the existing speaker sound device and
> allow it to take over port 0x61 by using KVM_CREATE_PIT_NOSPKR when
> available. This unbreaks -soundhw pcspk in KVM mode.
>
> diff --git a/qemu/hw/pcspk.c b/qemu/hw/pcspk.c
> index ec1d0c6..4752518 100644
> --- a/qemu/hw/pcspk.c
> +++ b/qemu/hw/pcspk.c
> @@ -27,6 +27,8 @@
>  #include "isa.h"
>  #include "audio/audio.h"
>  #include "qemu-timer.h"
> +#include "i8254.h"
> +#include "qemu-kvm.h"
>  
>  #define PCSPK_BUF_LEN 1792
>  #define PCSPK_SAMPLE_RATE 32000
> @@ -71,7 +73,15 @@ static void pcspk_callback(void *opaque, int free)
>  {
>      PCSpkState *s = opaque;
>      unsigned int n;
> +#ifdef USE_KVM_PIT
> +    struct kvm_pit_state pit_state;
>  
> +    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
> +        kvm_get_pit(kvm_context, &pit_state);
> +        s->pit->channels[2].mode = pit_state.channels[2].mode;
> +        s->pit->channels[2].count = pit_state.channels[2].count;
> +    }
> +#endif
>      if (pit_get_mode(s->pit, 2) != 3)
>          return;
>  
> @@ -120,7 +130,17 @@ static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr)
>  {
>      PCSpkState *s = opaque;
>      int out;
> -
> +#ifdef USE_KVM_PIT
> +    struct kvm_pit_state pit_state;
> +
> +    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
> +        kvm_get_pit(kvm_context, &pit_state);
> +        s->pit->channels[2].mode = pit_state.channels[2].mode;
> +        s->pit->channels[2].count = pit_state.channels[2].count;
> +        s->pit->channels[2].count_load_time = pit_state.channels[2].count_load_time;
> +        s->pit->channels[2].gate = pit_state.channels[2].gate;
> +    }
> +#endif
>      s->dummy_refresh_clock ^= (1 << 4);
>      out = pit_get_out(s->pit, 2, qemu_get_clock(vm_clock)) << 5;
>  
> @@ -131,7 +151,17 @@ static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  {
>      PCSpkState *s = opaque;
>      const int gate = val & 1;
> -
> +#ifdef USE_KVM_PIT
> +    struct kvm_pit_state pit_state;
> +
> +    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
> +        kvm_get_pit(kvm_context, &pit_state);
> +        s->pit->channels[2].mode = pit_state.channels[2].mode;
> +        s->pit->channels[2].count = pit_state.channels[2].count;
> +        s->pit->channels[2].count_load_time = pit_state.channels[2].count_load_time;
> +        s->pit->channels[2].gate = pit_state.channels[2].gate;
> +    }
> +#endif
>      s->data_on = (val >> 1) & 1;
>      pit_set_gate(s->pit, 2, gate);
>      if (s->voice) {
> @@ -139,6 +169,15 @@ static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>              s->play_pos = 0;
>          AUD_set_active_out(s->voice, gate & s->data_on);
>      }
> +#ifdef USE_KVM_PIT
> +    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
> +        pit_state.channels[2].mode = s->pit->channels[2].mode;
> +        pit_state.channels[2].count = s->pit->channels[2].count;
> +        pit_state.channels[2].count_load_time = s->pit->channels[2].count_load_time;
> +        pit_state.channels[2].gate = s->pit->channels[2].gate;
> +        kvm_set_pit(kvm_context, &pit_state);
> +    }
> +#endif
>  }
>  
>  void pcspk_init(PITState *pit)
>
>   

Please extract those bits into functions.
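
One possible shape for such helpers is sketched below. It is only an
illustration: the *_sketch types are stand-ins so the code compiles on its
own, and the helper names are hypothetical. The field names follow the
patch; the struct layouts are assumptions.

```c
#include <stdint.h>

/* Stand-in for one PIT channel; layout is an assumption. */
struct chan_sketch {
    uint32_t count;
    int64_t  count_load_time;
    uint8_t  mode;
    uint8_t  gate;
};

struct kernel_pit_sketch { struct chan_sketch channels[3]; }; /* models struct kvm_pit_state */
struct user_pit_sketch   { struct chan_sketch channels[3]; }; /* models PITState */

/* Copy the speaker channel (channel 2) from the kernel snapshot into
 * the userspace PIT model. */
static void pcspk_load_kernel_pit(struct user_pit_sketch *pit,
                                  const struct kernel_pit_sketch *ks)
{
    pit->channels[2].mode = ks->channels[2].mode;
    pit->channels[2].count = ks->channels[2].count;
    pit->channels[2].count_load_time = ks->channels[2].count_load_time;
    pit->channels[2].gate = ks->channels[2].gate;
}

/* Copy the speaker channel back into the kernel snapshot before it is
 * written to the kernel. */
static void pcspk_store_kernel_pit(const struct user_pit_sketch *pit,
                                   struct kernel_pit_sketch *ks)
{
    ks->channels[2].mode = pit->channels[2].mode;
    ks->channels[2].count = pit->channels[2].count;
    ks->channels[2].count_load_time = pit->channels[2].count_load_time;
    ks->channels[2].gate = pit->channels[2].gate;
}
```

In pcspk.c each helper would additionally check kvm_enabled() &&
qemu_kvm_pit_in_kernel() and call kvm_get_pit() or kvm_set_pit(), so the
repeated #ifdef USE_KVM_PIT blocks shrink to one call per site. Note that
pcspk_callback() in the patch copies only mode and count; copying all
four fields there as well is assumed to be harmless.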

Patch

diff --git a/libkvm/libkvm-x86.c b/libkvm/libkvm-x86.c
index 2fc4fce..03b1939 100644
--- a/libkvm/libkvm-x86.c
+++ b/libkvm/libkvm-x86.c
@@ -59,6 +59,19 @@  int kvm_create_pit(kvm_context_t kvm)
 
 	kvm->pit_in_kernel = 0;
 	if (!kvm->no_pit_creation) {
+#ifdef KVM_CAP_PIT_NOSPKR
+		r = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_PIT_NOSPKR);
+		if (r > 0) {
+			r = ioctl(kvm->vm_fd, KVM_CREATE_PIT_NOSPKR);
+			if (r >= 0) {
+				kvm->pit_in_kernel = 1;
+				return 0;
+			} else {
+				fprintf(stderr, "Create kernel PIC irqchip failed\n");
+				return r;
+			}
+		}
+#endif
 		r = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_PIT);
 		if (r > 0) {
 			r = ioctl(kvm->vm_fd, KVM_CREATE_PIT);
diff --git a/qemu/hw/pcspk.c b/qemu/hw/pcspk.c
index ec1d0c6..4752518 100644
--- a/qemu/hw/pcspk.c
+++ b/qemu/hw/pcspk.c
@@ -27,6 +27,8 @@ 
 #include "isa.h"
 #include "audio/audio.h"
 #include "qemu-timer.h"
+#include "i8254.h"
+#include "qemu-kvm.h"
 
 #define PCSPK_BUF_LEN 1792
 #define PCSPK_SAMPLE_RATE 32000
@@ -71,7 +73,15 @@  static void pcspk_callback(void *opaque, int free)
 {
     PCSpkState *s = opaque;
     unsigned int n;
+#ifdef USE_KVM_PIT
+    struct kvm_pit_state pit_state;
 
+    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
+        kvm_get_pit(kvm_context, &pit_state);
+        s->pit->channels[2].mode = pit_state.channels[2].mode;
+        s->pit->channels[2].count = pit_state.channels[2].count;
+    }
+#endif
     if (pit_get_mode(s->pit, 2) != 3)
         return;
 
@@ -120,7 +130,17 @@  static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr)
 {
     PCSpkState *s = opaque;
     int out;
-
+#ifdef USE_KVM_PIT
+    struct kvm_pit_state pit_state;
+
+    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
+        kvm_get_pit(kvm_context, &pit_state);
+        s->pit->channels[2].mode = pit_state.channels[2].mode;
+        s->pit->channels[2].count = pit_state.channels[2].count;
+        s->pit->channels[2].count_load_time = pit_state.channels[2].count_load_time;
+        s->pit->channels[2].gate = pit_state.channels[2].gate;
+    }
+#endif
     s->dummy_refresh_clock ^= (1 << 4);
     out = pit_get_out(s->pit, 2, qemu_get_clock(vm_clock)) << 5;
 
@@ -131,7 +151,17 @@  static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     PCSpkState *s = opaque;
     const int gate = val & 1;
-
+#ifdef USE_KVM_PIT
+    struct kvm_pit_state pit_state;
+
+    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
+        kvm_get_pit(kvm_context, &pit_state);
+        s->pit->channels[2].mode = pit_state.channels[2].mode;
+        s->pit->channels[2].count = pit_state.channels[2].count;
+        s->pit->channels[2].count_load_time = pit_state.channels[2].count_load_time;
+        s->pit->channels[2].gate = pit_state.channels[2].gate;
+    }
+#endif
     s->data_on = (val >> 1) & 1;
     pit_set_gate(s->pit, 2, gate);
     if (s->voice) {
@@ -139,6 +169,15 @@  static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val)
             s->play_pos = 0;
         AUD_set_active_out(s->voice, gate & s->data_on);
     }
+#ifdef USE_KVM_PIT
+    if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
+        pit_state.channels[2].mode = s->pit->channels[2].mode;
+        pit_state.channels[2].count = s->pit->channels[2].count;
+        pit_state.channels[2].count_load_time = s->pit->channels[2].count_load_time;
+        pit_state.channels[2].gate = s->pit->channels[2].gate;
+        kvm_set_pit(kvm_context, &pit_state);
+    }
+#endif
 }
 
 void pcspk_init(PITState *pit)