agp/intel, drm/i915: Use a write-combining map for updating PTEs

Message ID 1344769479-3237-1-git-send-email-chris@chris-wilson.co.uk
State New, archived

Commit Message

Chris Wilson Aug. 12, 2012, 11:04 a.m. UTC
In order to be able to ioremap_wc the GTT space, we need to remove the
conflicting pci_iomap from drm/i915, so we limit the register map in
drm/i915 to the suitable range for each generation. The benefit of doing
this is an order of magnitude reduction in time spent rewriting the GTT
entries when inserting and removing objects. For example, this halves the
CPU time spent in X when pushing pixels for chromium through a userptr
(chromium has a bug where it likes to recreate its ShmPixmap on every
draw).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/char/agp/intel-gtt.c    |   13 ++++++++++---
 drivers/gpu/drm/i915/i915_dma.c |   14 ++++++++++++--
 2 files changed, 22 insertions(+), 5 deletions(-)
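
The sizing rule the commit message describes ("limit the register map in drm/i915 to the suitable range for each generation") can be sketched as a standalone function. This is only an illustration under stated assumptions: mmio_window_size() and gtt_map_bytes() are hypothetical names that do not appear in the patch, and the thresholds are copied from the i915_dma.c and intel-gtt.c hunks of this patch rather than from hardware documentation.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in the patch): size in bytes of the register
 * window to map for a given GPU generation. Mirrors the if/else chain the
 * patch adds to i915_driver_load(), so that the mapping stops short of
 * the GTT and leaves it free to be mapped write-combining.
 */
static size_t mmio_window_size(int gen)
{
	if (gen < 3)
		return 64 * 1024;	/* gen2 */
	if (gen < 5)
		return 512 * 1024;	/* gen3 and gen4 */
	return 2 * 1024 * 1024;		/* gen5 and later */
}

/*
 * Hypothetical helper (not in the patch): bytes needed to map the GTT,
 * using the 4 bytes per entry that intel_gtt_init() uses when it computes
 * gtt_map_size = gtt_total_entries * 4.
 */
static size_t gtt_map_bytes(size_t total_entries)
{
	return total_entries * 4;
}
```

Under these assumptions, mmio_window_size(2) yields 64 KiB and mmio_window_size(6) yields 2 MiB; the real driver passes the equivalent value as the third argument of pci_iomap() instead of 0 (which means "map the whole BAR").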

Comments

Daniel Vetter Aug. 12, 2012, 3:47 p.m. UTC | #1
On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> In order to be able to ioremap_wc the GTT space, we need to remove the
> conflicting pci_iomap from drm/i915, so we limit the register map in
> drm/i915 to the suitable range for each generation. The benefit of doing
> this is an order of magnitude reduction in time spent rewriting the GTT
> entries when inserting and removing objects. For example, this halves the
> CPU time spent in X when pushing pixels for chromium through a userptr
> (chromium has a bug where it likes to recreate its ShmPixmap on every
> draw).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

How well does this work with ums?

I guess if it blows up, we could ioremap uncached, but when kms
initializes drop that uc mapping and try to remap wc. But I fear that ums
will map the entire bar and hence we can't just unconditionally map the
gatt wc.
-Daniel
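
The "try write-combining, fall back to uncached" policy discussed here (and implemented in the intel-gtt.c hunk with ioremap_wc() followed by ioremap()) can be modelled outside the kernel roughly as follows. This is a minimal sketch, not the driver code: the map_fn callback type, struct gtt_mapping, map_gtt_prefer_wc() and the two stub mappers are all hypothetical stand-ins so that the control flow can be exercised in user space.

```c
#include <stddef.h>

/* Hypothetical mapping callback: returns NULL on failure, like ioremap(). */
typedef void *(*map_fn)(unsigned long bus_addr, size_t size);

enum map_kind { MAP_NONE, MAP_WC, MAP_UC };

struct gtt_mapping {
	void *ptr;		/* NULL if both attempts failed */
	enum map_kind kind;	/* which attempt succeeded */
};

/* Prefer a write-combining mapping; fall back to uncached if it fails. */
static struct gtt_mapping map_gtt_prefer_wc(map_fn map_wc, map_fn map_uc,
					    unsigned long bus_addr, size_t size)
{
	struct gtt_mapping m = { NULL, MAP_NONE };

	m.ptr = map_wc(bus_addr, size);
	if (m.ptr) {
		m.kind = MAP_WC;
		return m;
	}

	/* WC failed (e.g. a conflicting mapping exists): try uncached. */
	m.ptr = map_uc(bus_addr, size);
	if (m.ptr)
		m.kind = MAP_UC;
	return m;
}

/* Stub mappers for illustration only; they touch no hardware. */
static void *stub_map_fail(unsigned long bus_addr, size_t size)
{
	(void)bus_addr;
	(void)size;
	return NULL;
}

static void *stub_map_ok(unsigned long bus_addr, size_t size)
{
	static char backing[1];

	(void)bus_addr;
	(void)size;
	return backing;
}
```

Calling map_gtt_prefer_wc(stub_map_fail, stub_map_ok, addr, size) reports MAP_UC, which corresponds to the path where the patch logs "failed to map GATT as wc" and retries with ioremap(); if both callbacks fail the caller sees a NULL pointer and must clean up, as intel_gtt_init() does.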
Chris Wilson Aug. 12, 2012, 4:01 p.m. UTC | #2
On Sun, 12 Aug 2012 17:47:46 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> > In order to be able to ioremap_wc the GTT space, we need to remove the
> > conflicting pci_iomap from drm/i915, so we limit the register map in
> > drm/i915 to the suitable range for each generation. The benefit of doing
> > this is an order of magnitude reduction in time spent rewriting the GTT
> > entries when inserting and removing objects. For example, this halves the
> > CPU time spent in X when pushing pixels for chromium through a userptr
> > (chromium has a bug where it likes to recreate its ShmPixmap on every
> > draw).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> How well does this work with ums?
> 
> I guess if it blows up, we could ioremap uncached, but when kms
> initializes drop that uc mapping and try to remap wc. But I fear that ums
> will map the entire bar and hence we can't just unconditionally map the
> gatt wc.

It will work exquisitely with ums. It will fail to do as it wishes and
fall back to VESA and everybody will be much happier...
-Chris
Chris Wilson Aug. 12, 2012, 7:12 p.m. UTC | #3
On Sun, 12 Aug 2012 17:01:08 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Sun, 12 Aug 2012 17:47:46 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> > > In order to be able to ioremap_wc the GTT space, we need to remove the
> > > conflicting pci_iomap from drm/i915, so we limit the register map in
> > > drm/i915 to the suitable range for each generation. The benefit of doing
> > > this is an order of magnitude reduction in time spent rewriting the GTT
> > > entries when inserting and removing objects. For example, this halves the
> > > CPU time spent in X when pushing pixels for chromium through a userptr
> > > (chromium has a bug where it likes to recreate its ShmPixmap on every
> > > draw).
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > How well does this work with ums?
> > 
> > I guess if it blows up, we could ioremap uncached, but when kms
> > initializes drop that uc mapping and try to remap wc. But I fear that ums
> > will map the entire bar and hence we can't just unconditionally map the
> > gatt wc.
> 
> It will work exquisitely with ums. It will fail to do as it wishes and
> fall back to VESA and everybody will be much happier...

So having rediscovered the hard truth that i915.modeset=1 and
xf86-video-2.6.0 results in nasty hangs, setting the GTT table to WC has
no effect upon the ancient UMS module - it shows the retro background
and appears to function. We struck lucky. \o/
-Chris
Daniel Vetter Aug. 13, 2012, 9:16 a.m. UTC | #4
On Sun, Aug 12, 2012 at 08:12:02PM +0100, Chris Wilson wrote:
> On Sun, 12 Aug 2012 17:01:08 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Sun, 12 Aug 2012 17:47:46 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Sun, Aug 12, 2012 at 12:04:39PM +0100, Chris Wilson wrote:
> > > > In order to be able to ioremap_wc the GTT space, we need to remove the
> > > > conflicting pci_iomap from drm/i915, so we limit the register map in
> > > > drm/i915 to the suitable range for each generation. The benefit of doing
> > > > this is an order of magnitude reduction in time spent rewriting the GTT
> > > > entries when inserting and removing objects. For example, this halves the
> > > > CPU time spent in X when pushing pixels for chromium through a userptr
> > > > (chromium has a bug where it likes to recreate its ShmPixmap on every
> > > > draw).
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > 
> > > How well does this work with ums?
> > > 
> > > I guess if it blows up, we could ioremap uncached, but when kms
> > > initializes drop that uc mapping and try to remap wc. But I fear that ums
> > > will map the entire bar and hence we can't just unconditionally map the
> > > gatt wc.
> > 
> > It will work exquisitely with ums. It will fail to do as it wishes and
> > fall back to VESA and everybody will be much happier...
> 
> So having rediscovered the hard truth that i915.modeset=1 and
> xf86-video-2.6.0 results in nasty hangs, setting the GTT table to WC has
> no effect upon the ancient UMS module - it shows the retro background
> and appears to function. We struck lucky. \o/

Ok, let's tempt the wrath of the ABI gods. Merged to -queued, thanks for
the patch.
-Daniel

Patch

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 76103aa..73bdb74 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -666,8 +666,14 @@  static int intel_gtt_init(void)
 
 	gtt_map_size = intel_private.base.gtt_total_entries * 4;
 
-	intel_private.gtt = ioremap(intel_private.gtt_bus_addr,
-				    gtt_map_size);
+	intel_private.gtt = ioremap_wc(intel_private.gtt_bus_addr,
+				       gtt_map_size);
+	if (!intel_private.gtt) {
+		dev_err(&intel_private.bridge_dev->dev,
+			"failed to map GATT as wc, falling back to uc-\n");
+		intel_private.gtt = ioremap(intel_private.gtt_bus_addr,
+					    gtt_map_size);
+	}
 	if (!intel_private.gtt) {
 		intel_private.driver->cleanup();
 		iounmap(intel_private.registers);
@@ -1233,12 +1239,13 @@  static inline int needs_idle_maps(void)
 static int i9xx_setup(void)
 {
 	u32 reg_addr;
-	int size = KB(512);
+	int size;
 
 	pci_read_config_dword(intel_private.pcidev, I915_MMADDR, &reg_addr);
 
 	reg_addr &= 0xfff80000;
 
+	size = KB(512);
 	if (INTEL_GTT_GEN >= 7)
 		size = MB(2);
 
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a21e0b0..c453304 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1458,7 +1458,7 @@  int i915_driver_load(struct drm_device *dev, unsigned long flags)
 {
 	struct drm_i915_private *dev_priv;
 	struct intel_device_info *info;
-	int ret = 0, mmio_bar;
+	int ret = 0, mmio_bar, mmio_size;
 	uint32_t aperture_size;
 
 	info = (struct intel_device_info *) flags;
@@ -1522,8 +1522,18 @@  int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	if (IS_BROADWATER(dev) || IS_CRESTLINE(dev))
 		dma_set_coherent_mask(&dev->pdev->dev, DMA_BIT_MASK(32));
 
+	/* Restrict iomap to avoid clobbering the GTT which we want WC mapped.
+	 * Do not attempt to map the whole BAR!
+	 */
 	mmio_bar = IS_GEN2(dev) ? 1 : 0;
-	dev_priv->regs = pci_iomap(dev->pdev, mmio_bar, 0);
+	if (info->gen < 3)
+		mmio_size = 64*1024;
+	else if (info->gen < 5)
+		mmio_size = 512*1024;
+	else
+		mmio_size = 2*1024*1024;
+
+	dev_priv->regs = pci_iomap(dev->pdev, mmio_bar, mmio_size);
 	if (!dev_priv->regs) {
 		DRM_ERROR("failed to map registers\n");
 		ret = -EIO;