kvm tools, ui: Optimize SDL updates

Message ID 1307136023-16693-1-git-send-email-penberg@kernel.org (mailing list archive)
State New, archived

Commit Message

Pekka Enberg June 3, 2011, 9:20 p.m. UTC
This patch optimizes SDL updates by keeping track of which parts of the guest
screen have been written since last update and calling SDL_BlitSurface() and
SDL_UpdateRect() for only changed parts of the screen.

Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Floren <john@jfloren.net>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/ui/sdl.c |   60 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 57 insertions(+), 3 deletions(-)
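
To make the tracking described in the commit message concrete, here is a minimal standalone sketch of the bounding-box bookkeeping. It is an illustration under assumptions, not the patch itself: `struct dirty_rect` and the `dirty_rect_*()` helpers are hypothetical names, 32 bpp is assumed (as the `pos / 4` in the patch implies), and unlike the patch it treats the bounds as inclusive and takes the length of each write into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; the patch uses four static u64s. */
struct dirty_rect {
	uint64_t min_x, min_y;	/* inclusive top-left corner */
	uint64_t max_x, max_y;	/* inclusive bottom-right corner */
};

/* Mark the rectangle empty: min > max means "nothing written yet". */
static void dirty_rect_reset(struct dirty_rect *r)
{
	r->min_x = r->min_y = UINT64_MAX;
	r->max_x = r->max_y = 0;
}

/* Inclusive comparison, so an update touching a single pixel still counts. */
static bool dirty_rect_pending(const struct dirty_rect *r)
{
	return r->min_x <= r->max_x && r->min_y <= r->max_y;
}

/* Grow the rectangle so that it covers (x0, y0)..(x1, y1). */
static void dirty_rect_grow(struct dirty_rect *r, uint64_t x0, uint64_t y0,
			    uint64_t x1, uint64_t y1)
{
	if (x0 < r->min_x)
		r->min_x = x0;
	if (y0 < r->min_y)
		r->min_y = y0;
	if (x1 > r->max_x)
		r->max_x = x1;
	if (y1 > r->max_y)
		r->max_y = y1;
}

/*
 * Record a guest write of len bytes at byte offset pos into a 32 bpp
 * framebuffer that is width pixels wide.
 */
static void dirty_rect_add(struct dirty_rect *r, uint64_t pos, uint32_t len,
			   uint32_t width)
{
	uint64_t first, last, y0, y1;

	assert(len > 0 && width > 0);

	first = pos / 4;		/* first pixel touched */
	last  = (pos + len - 1) / 4;	/* last pixel touched */
	y0 = first / width;
	y1 = last / width;

	if (y0 == y1)
		dirty_rect_grow(r, first % width, y0, last % width, y1);
	else	/* the write wraps to another row: cover the full rows */
		dirty_rect_grow(r, 0, y0, width - 1, y1);
}

/* Width and height in pixels; only meaningful while an update is pending. */
static uint64_t dirty_rect_width(const struct dirty_rect *r)
{
	return r->max_x - r->min_x + 1;
}

static uint64_t dirty_rect_height(const struct dirty_rect *r)
{
	return r->max_y - r->min_y + 1;
}
```

With inclusive bounds, a single 4-byte write at pixel (5, 2) yields a 1x1 rectangle, which is what would then be handed to SDL_BlitSurface() and SDL_UpdateRect().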

Comments

Ingo Molnar June 4, 2011, 9:54 a.m. UTC | #1
* Pekka Enberg <penberg@kernel.org> wrote:

> This patch optimizes SDL updates by keeping track of which parts of the guest
> screen have been written since last update and calling SDL_BlitSurface() and
> SDL_UpdateRect() for only changed parts of the screen.
> 
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: John Floren <john@jfloren.net>
> Cc: Sasha Levin <levinsasha928@gmail.com>
> Signed-off-by: Pekka Enberg <penberg@kernel.org>

I tried this one and updates got a bit faster.

We really need accelerated scrolling support though, i.e. framebuffer 
combined with virtio-fb for operations such as scrolling. Could we 
merge the repo up to v3.0-rc1 and write virtio-fb based on the 
virtio-gl patch?

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar June 4, 2011, 10:54 a.m. UTC | #2
* Alexander Graf <agraf@suse.de> wrote:

> 
> On 04.06.2011, at 12:42, Ingo Molnar wrote:
> 
> > 
> > * Alexander Graf <agraf@suse.de> wrote:
> > 
> >> I wrote up 2 virtio-fb implementations a while back and I still 
> >> believe it's a bad idea. Better implement QXL in kvm-tool, so work 
> >> doesn't get needlessly duplicated. If you really have to use virtio 
> >> for whatever reason (no PCI available), just write a small QXL over 
> >> virtio transport that allows you to reuse the protocol.
> >> 
> >> I really don't want to see people waste time on reinventing the 
> >> wheel over and over again.
> > 
> > Oh, we are just ignorantly blundering around trying to find a good 
> > solution! :-)
> > 
> > We are not trying to reimplement the wheel (at all), if you check we 
> > started with VNC GUI support which is as far from NIH as it gets! :-)
> > 
> > I didn't know about QXL but it looks interesting at first sight: a 
> > virtual GPU seen by the guest OS with Xorg support for it in the 
> > guest Xorg. But i do not see guest kernel framebuffer support for it 
> > - how does that aspect work, if one boots without Xorg, etc.?
> 
> IIUC it implements VESA and just goes through the respective fb. 
> [...]

So, if you look at the context of this discussion, our motivation is 
slow scrolling and our desire to support smooth scrolling on 64-bit 
too. It was unclear whether VESA would support proper panning/scrolling 
on 64-bit as well - Pekka thinks that it is not supported there.

> [...] I would love to see a QXL fb implementation though. I'd also 
> love to see QXL working on !PCI, so we can potentially use it on 
> s390x and ppc-hv which can't easily do MMIO.
> 
> So instead of putting effort into writing virtio-fb host and guest 
> sides, why not implement QXL-fb against a working target (qemu) and 
> then implement the QXL host side against a working target (working 
> guest support)? That probably makes the development process a lot 
> easier.

Have a look at the 'kvm' tool:

  git pull git://github.com/penberg/linux-kvm master

We'd like to implement better graphics support there, in a gradual 
fashion if possible. Right now we have VNC and SDL support - two 
rather primitive 2D framebuffer concepts with no acceleration at all.

If you think qxl-fb can be done gradually in that context then that's 
probably the right solution. Is there *any* QXL code in the kernel 
that would allow us to get started? I really know nothing about QXL.

The natural steps for us would be:

 - add primitive 2D acceleration support: scrolling

 - check what it would require to get good guest Xorg support. QXL has
   a full driver on the guest side Xorg server - but how does this
   get channeled over to the virtualizer - is it a magic PCI device
   that the host has to provide to the guest and which the guest Xorg
   server uses as a PCI device? Or does the guest side DRM code know
   about this GPU and uses it in KMS?

Thanks,

	Ingo
Sasha Levin June 4, 2011, 4:49 p.m. UTC | #3
On Sat, 2011-06-04 at 17:40 +0200, Alexander Graf wrote:
> On 04.06.2011, at 17:34, Sasha Levin wrote:
> 
> > On Sat, 2011-06-04 at 17:21 +0200, Alexander Graf wrote:
> >> On 04.06.2011, at 16:19, Sasha Levin wrote:
> >> 
> >>> On Sat, 2011-06-04 at 15:48 +0200, Alexander Graf wrote:
> >>>> On 04.06.2011, at 14:04, Sasha Levin wrote:
> >>>> 
> >>>>> On Sat, 2011-06-04 at 13:53 +0200, Ingo Molnar wrote:
> >>>>>> * Alexander Graf <agraf@suse.de> wrote:
> >>>>>> 
> >>>>>>> Why would you need panning/scrolling for a fast FB? It's really an 
> >>>>>>> optimization that helps a lot with VNC, but on local machines or 
> >>>>>>> SDL you shouldn't see a major difference.
> >>>>>> 
> >>>>>> Qemu's fb console scrolling graphics is pretty slow to me even 
> >>>>>> locally so i assume that the dirty bitmap trick is not enough.
> >>>>>> 
> >>>>>> VirtualBox graphics is very fast, but it probably has its own console 
> >>>>>> abstraction and scrolling/2D/3D acceleration.
> >>>>>> 
> >>>>>> Also, since tools/kvm/ is really also about learning interesting 
> >>>>>> stuff, smooth scrolling was the historic first 'acceleration' usecase 
> >>>>>> that video graphics cards added - before they evolved more complex 2D 
> >>>>>> acceleration and then started doing 3D.
> >>>>>> 
> >>>>>> Walking that path would allow us to do a gradual approach, while 
> >>>>>> still having relevant functionality and enhancements at every step.
> >>>>>> 
> >>>>>>> Unless you use the FB as MMIO. Qemu just maps the FB as RAM and 
> >>>>>>> checks for dirty bitmap updates periodically. That way you don't 
> >>>>>>> constantly exit due to MMIO and are good on speed. The slowness you 
> >>>>>>> describe sounds a lot as if you don't do that trick.
> >>>>>> 
> >>>>>> Correct, and i assumed we already do the dirty-bitmap trick:
> >>>>>> 
> >>>>>> 	KVM_MEM_LOG_DIRTY_PAGES
> >>>>>> 	KVM_GET_DIRTY_LOG
> >>>>>> 
> >>>>>> But you are right, we do not actually do that!
> >>>>>> 
> >>>>>> Pekka, i think this should be the next step. We'll need scrolling 
> >>>>>> after that ...
> >>>>>> 
> >>>>>> In theory it would also be nice to tunnel the VGA text frame buffer 
> >>>>>> over to the KVM tool - as serial console is not supported by most 
> >>>>>> installers and default distro images. We could actually do a rather 
> >>>>>> good job of emulating it via Slang/Curses.
> >>>>> 
> >>>>> I doubt we could use dirty pages because unless guest VESA driver
> >>>>> supports panning, it will redraw the entire FB - which means that all
> >>>>> pages will be dirty.
> >>>> 
> >>>> Please recheck the math and compare 60 dirty bitmap checks+flushes per second to a few million MMIO exits for every single pixel :).
> >>> 
> >>> I might be missing something here, but if every single pixel changes due
> >>> to scrolling, doesn't it mean that all the pages will be marked as
> >>> 'dirty' anyway?
> >> 
> >> Sure, but you don't need to exit to user space for every single pixel, but instead process the whole thing asynchronously. Just run kvm_stat while running your current implementation and you'll pretty soon realize what I'm talking about :).
> > 
> > > If we use coalesced MMIO we only exit when the shared page is full.
> 
> Yes, which will be very often for full redrawing guests. Remember, we're talking about megabytes of graphics data. Plus you still need to call your internal MMIO handler for every single access then. And I hope I don't even have to mention read performance (which is abysmal on real graphics cards too though).
> 
> > If we mark a memory region as log dirty we won't get MMIO exits on it?
> 
> [..] If you mark a memory region as coalesced you also don't get MMIO exits on it. [..]
> 

We get MMIO exits on it when the ring is full, which is pretty often
with the graphics card.

I'll try the dirty log method later tonight. I don't see anything about
no exits in the documentation though - if it actually prevents MMIO
exits to the region it should probably be documented.

>   $ while true; do echo -n x; done
> 
> But I won't keep you from doing it. Implement it and see how it performs. This whole project is about trying to find out what is fast for yourselves, no? :) Just make sure to also implement the dirty log way so you can actually compare the numbers.
> 
> 
> Alex
>
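
The "recheck the math" argument quoted above can be put into numbers. The following is a rough back-of-the-envelope sketch, assuming a hypothetical 800x600 guest mode at 32 bpp, one MMIO exit per pixel write (no coalescing), 4 KiB pages, and the 25 redraws per second that FRAME_RATE in the patch suggests. None of these figures are measurements from this thread, and both function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst case without coalescing: every 32-bit pixel write to an MMIO
 * framebuffer causes one exit to user space.
 */
static uint64_t mmio_exits_per_second(uint32_t width, uint32_t height,
				      uint32_t redraws_per_second)
{
	return (uint64_t)width * height * redraws_per_second;
}

/*
 * Number of pages (and therefore dirty-bitmap bits) that a framebuffer
 * mapped as RAM spans, rounded up to whole pages.
 */
static uint64_t fb_pages(uint32_t width, uint32_t height,
			 uint32_t bytes_per_pixel, uint32_t page_size)
{
	uint64_t bytes = (uint64_t)width * height * bytes_per_pixel;

	assert(page_size > 0);

	return (bytes + page_size - 1) / page_size;
}
```

Under these assumptions, mmio_exits_per_second(800, 600, 25) comes to 12,000,000 exits per second for a guest that redraws the full screen, while the dirty-log approach needs about 60 polls per second, each of which only has to scan the fb_pages(800, 600, 4, 4096) = 469 bits of the bitmap.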
Alexander Graf June 5, 2011, 8:41 a.m. UTC | #4
On 04.06.2011, at 18:49, Sasha Levin wrote:

> On Sat, 2011-06-04 at 17:40 +0200, Alexander Graf wrote:
>> On 04.06.2011, at 17:34, Sasha Levin wrote:
>> 
>>> On Sat, 2011-06-04 at 17:21 +0200, Alexander Graf wrote:
>>>> On 04.06.2011, at 16:19, Sasha Levin wrote:
>>>> 
>>>>> On Sat, 2011-06-04 at 15:48 +0200, Alexander Graf wrote:
>>>>>> On 04.06.2011, at 14:04, Sasha Levin wrote:
>>>>>> 
>>>>>>> On Sat, 2011-06-04 at 13:53 +0200, Ingo Molnar wrote:
>>>>>>>> * Alexander Graf <agraf@suse.de> wrote:
>>>>>>>> 
>>>>>>>>> Why would you need panning/scrolling for a fast FB? It's really an 
>>>>>>>>> optimization that helps a lot with VNC, but on local machines or 
>>>>>>>>> SDL you shouldn't see a major difference.
>>>>>>>> 
>>>>>>>> Qemu's fb console scrolling graphics is pretty slow to me even 
>>>>>>>> locally so i assume that the dirty bitmap trick is not enough.
>>>>>>>> 
>>>>>>>> VirtualBox graphics is very fast, but it probably has its own console 
>>>>>>>> abstraction and scrolling/2D/3D acceleration.
>>>>>>>> 
>>>>>>>> Also, since tools/kvm/ is really also about learning interesting 
>>>>>>>> stuff, smooth scrolling was the historic first 'acceleration' usecase 
>>>>>>>> that video graphics cards added - before they evolved more complex 2D 
>>>>>>>> acceleration and then started doing 3D.
>>>>>>>> 
>>>>>>>> Walking that path would allow us to do a gradual approach, while 
>>>>>>>> still having relevant functionality and enhancements at every step.
>>>>>>>> 
>>>>>>>>> Unless you use the FB as MMIO. Qemu just maps the FB as RAM and 
>>>>>>>>> checks for dirty bitmap updates periodically. That way you don't 
>>>>>>>>> constantly exit due to MMIO and are good on speed. The slowness you 
>>>>>>>>> describe sounds a lot as if you don't do that trick.
>>>>>>>> 
>>>>>>>> Correct, and i assumed we already do the dirty-bitmap trick:
>>>>>>>> 
>>>>>>>> 	KVM_MEM_LOG_DIRTY_PAGES
>>>>>>>> 	KVM_GET_DIRTY_LOG
>>>>>>>> 
>>>>>>>> But you are right, we do not actually do that!
>>>>>>>> 
>>>>>>>> Pekka, i think this should be the next step. We'll need scrolling 
>>>>>>>> after that ...
>>>>>>>> 
>>>>>>>> In theory it would also be nice to tunnel the VGA text frame buffer 
>>>>>>>> over to the KVM tool - as serial console is not supported by most 
>>>>>>>> installers and default distro images. We could actually do a rather 
>>>>>>>> good job of emulating it via Slang/Curses.
>>>>>>> 
>>>>>>> I doubt we could use dirty pages because unless guest VESA driver
>>>>>>> supports panning, it will redraw the entire FB - which means that all
>>>>>>> pages will be dirty.
>>>>>> 
>>>>>> Please recheck the math and compare 60 dirty bitmap checks+flushes per second to a few million MMIO exits for every single pixel :).
>>>>> 
>>>>> I might be missing something here, but if every single pixel changes due
>>>>> to scrolling, doesn't it mean that all the pages will be marked as
>>>>> 'dirty' anyway?
>>>> 
>>>> Sure, but you don't need to exit to user space for every single pixel, but instead process the whole thing asynchronously. Just run kvm_stat while running your current implementation and you'll pretty soon realize what I'm talking about :).
>>> 
>>> If we use coalesced MMIO we only exit when the shared page is full.
>> 
>> Yes, which will be very often for full redrawing guests. Remember, we're talking about megabytes of graphics data. Plus you still need to call your internal MMIO handler for every single access then. And I hope I don't even have to mention read performance (which is abysmal on real graphics cards too though).
>> 
>>> If we mark a memory region as log dirty we won't get MMIO exits on it?
>> 
>> [..] If you mark a memory region as coalesced you also don't get MMIO exits on it. [..]
>> 
> 
> We get MMIO exits on it when the ring is full, which is pretty often
> with the graphics card.
> 
> I'll try the dirty log method later tonight. I don't see anything about
> no exits in the documentation though - if it actually prevents MMIO
> exits to the region it should probably be documented.

It's documented. Just look up the documentation for KVM_GET_DIRTY_LOG and KVM_SET_USER_MEMORY_REGION.


Alex
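
As a rough sketch of what the user space side of the dirty log method could look like: once the framebuffer slot is registered through KVM_SET_USER_MEMORY_REGION with the KVM_MEM_LOG_DIRTY_PAGES flag, KVM_GET_DIRTY_LOG hands back a bitmap with one bit per page of the slot. The helper below only shows the step after that ioctl, turning the bitmap into a range of scanlines to redraw. `dirty_rows()` and its parameters are hypothetical names, the ioctl calls themselves are left out so the example runs without /dev/kvm, and it assumes bit N of the bitmap, counted from the least significant bit of the first word, stands for page N of the slot.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Scan a dirty bitmap (one bit per page, as filled in by KVM_GET_DIRTY_LOG)
 * and report the first and last framebuffer row that overlap a dirty page.
 *
 * pitch is the number of bytes per row, rows the number of visible rows.
 * Returns false if no dirty page overlaps a visible row.
 */
static bool dirty_rows(const unsigned long *bitmap, size_t npages,
		       size_t page_size, size_t pitch, size_t rows,
		       size_t *first_row, size_t *last_row)
{
	const size_t bits = sizeof(unsigned long) * CHAR_BIT;
	bool found = false;
	size_t page;

	assert(page_size > 0 && pitch > 0 && rows > 0);

	for (page = 0; page < npages; page++) {
		size_t start, end;

		if (!(bitmap[page / bits] & (1UL << (page % bits))))
			continue;

		/* Rows covered by the first and the last byte of this page. */
		start = page * page_size / pitch;
		end   = ((page + 1) * page_size - 1) / pitch;

		if (start >= rows)
			break;		/* page lies past the visible area */
		if (end >= rows)
			end = rows - 1;	/* clamp to the visible area */

		/* Pages are visited in ascending order. */
		if (!found) {
			*first_row = start;
			found = true;
		}
		*last_row = end;
	}

	return found;
}
```

For a 640-pixel-wide 32 bpp mode (pitch 2560) and 4 KiB pages, a bitmap with only page 1 set maps to rows 1 through 3, so only those rows would need to be blitted.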

Alon Levy June 5, 2011, 9:32 a.m. UTC | #5
On Sat, Jun 04, 2011 at 01:07:09PM +0200, Alexander Graf wrote:
> 
> On 04.06.2011, at 12:54, Ingo Molnar wrote:
> 
> > 
> > * Alexander Graf <agraf@suse.de> wrote:
> > 
> >> 
> >> On 04.06.2011, at 12:42, Ingo Molnar wrote:
> >> 
> >>> 
> >>> * Alexander Graf <agraf@suse.de> wrote:
> >>> 
> >>>> I wrote up 2 virtio-fb implementations a while back and I still 
> >>>> believe it's a bad idea. Better implement QXL in kvm-tool, so work 
> >>>> doesn't get needlessly duplicated. If you really have to use virtio 
> >>>> for whatever reason (no PCI available), just write a small QXL over 
> >>>> virtio transport that allows you to reuse the protocol.
> >>>> 
> >>>> I really don't want to see people waste time on reinventing the 
> >>>> wheel over and over again.
> >>> 
> >>> Oh, we are just ignorantly blundering around trying to find a good 
> >>> solution! :-)
> >>> 
> >>> We are not trying to reimplement the wheel (at all), if you check we 
> >>> started with VNC GUI support which is as far from NIH as it gets! :-)
> >>> 
> >>> I didn't know about QXL but it looks interesting at first sight: a 
> >>> virtual GPU seen by the guest OS with Xorg support for it in the 
> >>> guest Xorg. But i do not see guest kernel framebuffer support for it 
> >>> - how does that aspect work, if one boots without Xorg, etc.?
> >> 
> >> IIUC it implements VESA and just goes through the respective fb. 
> >> [...]
> > 
> > So, if you look at the context of this discussion, our motivation is 
> > slow scrolling and our desire to support smooth scrolling on 64-bit 
> > too. It was unclear whether VESA would support proper panning/scrolling 
> > on 64-bit as well - Pekka thinks that it is not supported there.
> 
> Why would you need panning/scrolling for a fast FB? It's really an optimization that helps a lot with VNC, but on local machines or SDL you shouldn't see a major difference.
> 
> Unless you use the FB as MMIO. Qemu just maps the FB as RAM and checks for dirty bitmap updates periodically. That way you don't constantly exit due to MMIO and are good on speed. The slowness you describe sounds a lot as if you don't do that trick.
> 
> Also, I'm fairly sure vesafb implements scrolling almost unconditionally. Check for "ypan" in drivers/video/vesafb.c.
> 
> > 
> >> [...] I would love to see a QXL fb implementation though. I'd also 
> >> love to see QXL working on !PCI, so we can potentially use it on 
> >> s390x and ppc-hv which can't easily do MMIO.
> >> 
> >> So instead of putting effort into writing virtio-fb host and guest 
> >> sides, why not implement QXL-fb against a working target (qemu) and 
> >> then implement the QXL host side against a working target (working 
> >> guest support)? That probably makes the development process a lot 
> >> easier.
> > 
> > Have a look at the 'kvm' tool:
> > 
> >  git pull git://github.com/penberg/linux-kvm master
> > 
> > We'd like to implement better graphics support there, in a gradual 
> > fashion if possible. Right now we have VNC and SDL support - two 
> > rather primitive 2D framebuffer concepts with no acceleration at all.
> 
> Well, VNC and SDL are pure front-end concepts. What you plug in on the backend is a different story, no? :)
> 
> > If you think qxl-fb can be done gradually in that context then that's 
> > probably the right solution. Is there *any* QXL code in the kernel 
> > that would allow us to get started? I really know nothing about QXL.
> 
> I don't think there is. CC'ing Gerd for that discussion. He's taking care of all the upstreaming work and community efforts around QXL.
> 
> > The natural steps for us would be:
> > 
> > - add primitive 2D acceleration support: scrolling
> 
> This one could be done with VESA. Xorg won't use it though IIRC.
> 
> > - check what it would require to get good guest Xorg support. QXL has
> >   a full driver on the guest side Xorg server - but how does this
> >   get channeled over to the virtualizer - is it a magic PCI device
> >   that the host has to provide to the guest and which the guest Xorg
> >   server uses as a PCI device? Or does the guest side DRM code know
> >   about this GPU and uses it in KMS?
> 
> It is a PCI device. That's about where my knowledge ends. Gerd had a nice talk about some internals of the actual QXL protocol and PCI device on KVM Forum 2010:
> 
>   http://www.linux-kvm.org/wiki/images/a/aa/2010-forum-spice.pdf
>   http://vimeo.com/15225069
> 
> However, it does not mention how the Xorg driver interacts with it, so I simply don't know :). And you're probably not worse on reading code than me to figure it out ;).

There is no kernel driver (an omission we plan to fix if required - in particular we can't use an interrupt handler from Xorg).
The Xorg driver (qxl_drv.so) uses the qxl PCI device. There are four BARs. The first is the fb (VGA) BAR, but it is larger than the fb; the rest is used for the commands (the protocol is GDI based). There are three rings: commands, cursor commands, and release (the driver gets back its sent commands via the release ring when the server is done with them).
The second BAR is for off-screen surfaces only.
The third is an I/O BAR.
The fourth is a ROM BAR (it mainly contains the modes list, I think).

> 
> 
> Alex
> 
Gerd Hoffmann June 6, 2011, 7:02 a.m. UTC | #6
Hi,

> Also, I'm fairly sure vesafb implements scrolling almost
> unconditionally. Check for "ypan" in drivers/video/vesafb.c.

ypan uses the protected mode interface provided by the VESA BIOS.
It is not used by default and works on 32-bit only.

>> If you think qxl-fb can be done gradually in that context then
>> that's probably the right solution. Is there *any* QXL code in the
>> kernel that would allow us to get started? I really know nothing
>> about QXL.
>
> I don't think there is. CC'ing Gerd for that discussion. He's taking
> care of all the upstreaming work and community efforts around QXL.

No kernel code right now.

cheers,
   Gerd

Patch

diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index bc69ed9..f175a60 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -6,11 +6,51 @@ 
 #include <SDL/SDL.h>
 #include <pthread.h>
 
+#include <linux/kernel.h>
+
 #define FRAME_RATE		25
 
+static u64 min_x;
+static u64 min_y;
+static u64 max_x;
+static u64 max_y;
+
+static void sdl__finish(void)
+{
+	max_x = 0;
+	max_y = 0;
+	min_x = ULLONG_MAX;
+	min_y = ULLONG_MAX;
+}
+
+static inline bool sdl__need_update(void)
+{
+	return min_x < max_x && min_y < max_y;
+}
+
 static void sdl__write(struct framebuffer *fb, u64 addr, u8 *data, u32 len)
 {
-	memcpy(&fb->mem[addr - fb->mem_addr], data, len);
+	u64 x, y;
+	u64 pos;
+
+	pos = addr - fb->mem_addr;
+
+	x = (pos / 4) % fb->width;
+	y = ((pos / 4) - x) / fb->width;
+
+	if (x < min_x)
+		min_x = x;
+
+	if (y < min_y)
+		min_y = y;
+
+	if (x > max_x)
+		max_x = x;
+
+	if (y > max_y)
+		max_y = y;
+
+	memcpy(&fb->mem[pos], data, len);
 }
 
 static void *sdl__thread(void *p)
@@ -40,9 +80,23 @@  static void *sdl__thread(void *p)
 	if (!screen)
 		die("Unable to set SDL video mode");
 
+	sdl__finish();
+
 	for (;;) {
-		SDL_BlitSurface(guest_screen, NULL, screen, NULL);
-		SDL_UpdateRect(screen, 0, 0, 0, 0);
+		if (sdl__need_update()) {
+			SDL_Rect rect = {
+				.x		= min_x,
+				.y		= min_y,
+				.w		= max_x - min_x,
+				.h		= max_y - min_y,
+			};
+
+			SDL_BlitSurface(guest_screen, &rect, screen, &rect);
+			SDL_UpdateRect(screen, rect.x, rect.y, rect.w, rect.h);
+		}
+
+		sdl__finish();
+
 		while (SDL_PollEvent(&ev)) {
 			switch (ev.type) {
 			case SDL_QUIT: