drm/bochs: use ioremap_wc() to map framebuffer during driver probing

Message ID 20240909051529.26776-1-yan.y.zhao@intel.com (mailing list archive)
State New

Commit Message

Yan Zhao Sept. 9, 2024, 5:15 a.m. UTC
Use ioremap_wc() instead of ioremap() to map the framebuffer during the
driver probing phase.

Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
memory type as UC- for the physical range. This reservation can cause
subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
Consequently, the operation drm_gem_vram_bo_driver_move() ->
ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
platforms where UC memory access is slow.
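
The effect described above can be illustrated with a small userspace model. This is only a sketch under a simplifying assumption: when a new request overlaps an earlier reservation, the earlier memory type wins. All names here (toy_memtype, toy_memtype_table, toy_memtype_reserve) are hypothetical and are not the kernel's memtype_reserve() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy cache modes; illustrative names, not the kernel's enum page_cache_mode. */
enum toy_memtype { TOY_UC_MINUS, TOY_WC };

struct toy_reservation {
	uint64_t start, end;	/* physical range [start, end) */
	enum toy_memtype type;
};

#define TOY_MAX_RESERVATIONS 8

struct toy_memtype_table {
	struct toy_reservation entries[TOY_MAX_RESERVATIONS];
	size_t count;
};

/*
 * Reserve [start, end) with the requested type and return the effective type.
 * Assumption of this model: if the range overlaps an earlier reservation,
 * the earlier reservation's type is used instead of the requested one.
 */
enum toy_memtype toy_memtype_reserve(struct toy_memtype_table *t,
				     uint64_t start, uint64_t end,
				     enum toy_memtype req)
{
	enum toy_memtype effective = req;
	size_t i;

	assert(start < end);
	assert(t->count < TOY_MAX_RESERVATIONS);

	for (i = 0; i < t->count; i++) {
		const struct toy_reservation *r = &t->entries[i];

		/* Two half-open ranges overlap if each starts before the other ends. */
		if (start < r->end && r->start < end) {
			effective = r->type;
			break;
		}
	}

	t->entries[t->count++] = (struct toy_reservation){ start, end, effective };
	return effective;
}
```

In this model, reserving a range as TOY_UC_MINUS first (the ioremap() case) and then requesting TOY_WC for an overlapping range returns TOY_UC_MINUS, whereas requesting TOY_WC first (the ioremap_wc() case) returns TOY_WC.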

Here's the performance data measured in a guest on the physical machine
"Sapphire Rapids XCC".
When host KVM honors guest PAT memory types, the effective memory type
for this framebuffer range is
- WC when ioremap_wc() is used in the driver probing phase
- UC- when ioremap() is used.

The data presented is an average from 10 execution runs.
The memcpy range for the data is
mem->bus.offset=0xfd000000, mem->size=0x3e8000.

--------------------------------------------------------------
                              |      in bochs_hw_init()       |
                              |    ioremap()   | ioremap_wc() |
------------------------------|----------------|--------------|
    cycles of                 |    2227.4M     |   17.8M      |
drm_gem_vram_bo_driver_move() |                |              |
------------------------------|----------------|--------------|
    time of                   |    1.24s       |   0.01s      |
drm_gem_vram_bo_driver_move() |                |              |
--------------------------------------------------------------
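
To put the table in perspective, the figures can be reduced to a ratio and a per-byte cost. The sketch below only does arithmetic on the numbers quoted above; the macro and function names are made up for illustration. With mem->size=0x3e8000 (4,096,000 bytes), the cycle counts work out to roughly a 125x difference, or about 544 cycles per byte with UC- versus about 4.3 with WC.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the commit message; the names are illustrative. */
#define COPY_BYTES       0x3e8000ULL	/* mem->size */
#define CYCLES_UC_MINUS  2227.4e6	/* ioremap() in bochs_hw_init() */
#define CYCLES_WC        17.8e6		/* ioremap_wc() in bochs_hw_init() */

/* Ratio of the slow measurement to the fast one. */
double speedup(double slow, double fast)
{
	assert(fast > 0.0);
	return slow / fast;
}

/* Average number of cycles spent per copied byte. */
double cycles_per_byte(double cycles, uint64_t bytes)
{
	assert(bytes > 0);
	return cycles / (double)bytes;
}
```

Calling speedup(CYCLES_UC_MINUS, CYCLES_WC) gives the ratio between the two columns, and cycles_per_byte() with COPY_BYTES gives the per-byte cost for either column.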

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/tiny/bochs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Thomas Zimmermann Sept. 9, 2024, 6:40 a.m. UTC | #1
Hi

On 09.09.24 at 07:15, Yan Zhao wrote:
> Use ioremap_wc() instead of ioremap() to map framebuffer during driver
> probing phase.
>
> Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
> on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
> memory type as UC- for the physical range. This reservation can cause
> subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
> same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
> Consequently, the operation drm_gem_vram_bo_driver_move() ->
> ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
> platforms where UC memory access is slow.

I've noticed this too and pushed a major update that replaces the entire 
memory management. [1]

The patch is still welcome, I think, but you may want to rebase onto the 
latest drm-misc-next branch. [2]

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/138086/
[2] https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next

>
> Here's the performance data measured in a guest on the physical machine
> "Sapphire Rapids XCC".
> When host KVM honors guest PAT memory types, the effective memory type
> for this framebuffer range is
> - WC when ioremap_wc() is used in the driver probing phase
> - UC- when ioremap() is used.
>
> The data presented is an average from 10 execution runs.
> The memcpy range for the data is
> mem->bus.offset=0xfd000000, mem->size=0x3e8000.
>
> --------------------------------------------------------------
>                                |      in bochs_hw_init()       |
>                                |    ioremap()   | ioremap_wc() |
> ------------------------------|----------------|--------------|
>      cycles of                 |    2227.4M     |   17.8M      |
> drm_gem_vram_bo_driver_move() |                |              |
> ------------------------------|----------------|--------------|
>      time of                   |    1.24s       |   0.01s      |
> drm_gem_vram_bo_driver_move() |                |              |
> --------------------------------------------------------------
>
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   drivers/gpu/drm/tiny/bochs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
> index 31fc5d839e10..6414f0a72f6a 100644
> --- a/drivers/gpu/drm/tiny/bochs.c
> +++ b/drivers/gpu/drm/tiny/bochs.c
> @@ -261,7 +261,7 @@ static int bochs_hw_init(struct drm_device *dev)
>   	if (pci_request_region(pdev, 0, "bochs-drm") != 0)
>   		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
>   
> -	bochs->fb_map = ioremap(addr, size);
> +	bochs->fb_map = ioremap_wc(addr, size);
>   	if (bochs->fb_map == NULL) {
>   		DRM_ERROR("Cannot map framebuffer\n");
>   		return -ENOMEM;

Yan Zhao Sept. 9, 2024, 1:19 p.m. UTC | #2
On Mon, Sep 09, 2024 at 08:40:30AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> On 09.09.24 at 07:15, Yan Zhao wrote:
> > Use ioremap_wc() instead of ioremap() to map framebuffer during driver
> > probing phase.
> > 
> > Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
> > on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
> > memory type as UC- for the physical range. This reservation can cause
> > subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
> > same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
> > Consequently, the operation drm_gem_vram_bo_driver_move() ->
> > ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
> > platforms where UC memory access is slow.
> 
> I've noticed this too and pushed a major update that replaces the entire
> memory management. [1]
> 
> The patch is still welcome, I think, but you may want to rebase onto the
> latest drm-misc-next branch. [2]
> 
> Best regards
> Thomas
> 
> [1] https://patchwork.freedesktop.org/series/138086/
> [2] https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next
Thanks!

The updated version is at
https://lore.kernel.org/all/20240909131643.28915-1-yan.y.zhao@intel.com

Patch

diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
index 31fc5d839e10..6414f0a72f6a 100644
--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -261,7 +261,7 @@ static int bochs_hw_init(struct drm_device *dev)
 	if (pci_request_region(pdev, 0, "bochs-drm") != 0)
 		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
 
-	bochs->fb_map = ioremap(addr, size);
+	bochs->fb_map = ioremap_wc(addr, size);
 	if (bochs->fb_map == NULL) {
 		DRM_ERROR("Cannot map framebuffer\n");
 		return -ENOMEM;