Message ID | 20240909051529.26776-1-yan.y.zhao@intel.com |
---|---|
State | New, archived |
Series | drm/bochs: use ioremap_wc() to map framebuffer during driver probing |
Hi

On 09.09.24 at 07:15, Yan Zhao wrote:
> Use ioremap_wc() instead of ioremap() to map the framebuffer during the
> driver probing phase.
>
> Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
> on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
> memory type as UC- for the physical range. This reservation can cause
> subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
> same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
> Consequently, the operation drm_gem_vram_bo_driver_move() ->
> ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
> platforms where UC memory access is slow.

I've noticed this too and pushed a major update that replaces the entire
memory management. [1]

The patch is still welcome, I think, but you may want to rebase onto the
latest drm-misc-next branch. [2]

Best regards
Thomas

[1] https://patchwork.freedesktop.org/series/138086/
[2] https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next

>
> Here's the performance data measured in a guest on the physical machine
> "Sapphire Rapids XCC".
> With the host KVM honoring guest PAT memory types, the effective memory
> type for this framebuffer range is
> - WC when ioremap_wc() is used in the driver probing phase
> - UC- when ioremap() is used.
>
> The data presented is an average of 10 execution runs.
> The memcpy range for the data is
> mem->bus.offset=0xfd000000, mem->size=0x3e8000.
On Mon, Sep 09, 2024 at 08:40:30AM +0200, Thomas Zimmermann wrote:
> Hi
>
> On 09.09.24 at 07:15, Yan Zhao wrote:
> > Use ioremap_wc() instead of ioremap() to map the framebuffer during the
> > driver probing phase.
> >
> > Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
> > on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
> > memory type as UC- for the physical range. This reservation can cause
> > subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
> > same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
> > Consequently, the operation drm_gem_vram_bo_driver_move() ->
> > ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
> > platforms where UC memory access is slow.
>
> I've noticed this too and pushed a major update that replaces the entire
> memory management. [1]
>
> The patch is still welcome, I think, but you may want to rebase onto the
> latest drm-misc-next branch. [2]
>
> Best regards
> Thomas
>
> [1] https://patchwork.freedesktop.org/series/138086/
> [2] https://gitlab.freedesktop.org/drm/misc/kernel/-/tree/drm-misc-next

Thanks! The updated version is at
https://lore.kernel.org/all/20240909131643.28915-1-yan.y.zhao@intel.com
diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
index 31fc5d839e10..6414f0a72f6a 100644
--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -261,7 +261,7 @@ static int bochs_hw_init(struct drm_device *dev)
 	if (pci_request_region(pdev, 0, "bochs-drm") != 0)
 		DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
 
-	bochs->fb_map = ioremap(addr, size);
+	bochs->fb_map = ioremap_wc(addr, size);
 	if (bochs->fb_map == NULL) {
 		DRM_ERROR("Cannot map framebuffer\n");
 		return -ENOMEM;
Use ioremap_wc() instead of ioremap() to map the framebuffer during the
driver probing phase.

Using ioremap() results in a VA being mapped with PAT=UC-. Additionally,
on x86 architectures, ioremap() invokes memtype_reserve() to reserve the
memory type as UC- for the physical range. This reservation can cause
subsequent calls to ioremap_wc() to fail to map the VA with PAT=WC to the
same physical range for the framebuffer in ttm_kmap_iter_linear_io_init().
Consequently, the operation drm_gem_vram_bo_driver_move() ->
ttm_bo_move_memcpy() -> ttm_move_memcpy() becomes significantly slow on
platforms where UC memory access is slow.

Here's the performance data measured in a guest on the physical machine
"Sapphire Rapids XCC".
With the host KVM honoring guest PAT memory types, the effective memory
type for this framebuffer range is
- WC when ioremap_wc() is used in the driver probing phase
- UC- when ioremap() is used.

The data presented is an average of 10 execution runs.
The memcpy range for the data is
mem->bus.offset=0xfd000000, mem->size=0x3e8000.

---------------------------------------------------------------
                              |    call in bochs_hw_init()    |
                              |  ioremap()    |  ioremap_wc() |
------------------------------|---------------|---------------|
drm_gem_vram_bo_driver_move() |               |               |
  CPU cycles                  |  2227.4M      |  17.8M        |
  time                        |  1.24 s       |  0.01 s       |
---------------------------------------------------------------

In these runs, mapping with ioremap_wc() makes drm_gem_vram_bo_driver_move()
roughly 125 times faster.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Closes: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com/#t
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/tiny/bochs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)