
drm/etnaviv: move GPU linear window to end of DMA window

Message ID 1456762816-496-1-git-send-email-l.stach@pengutronix.de (mailing list archive)
State New, archived

Commit Message

Lucas Stach Feb. 29, 2016, 4:20 p.m. UTC
If the end of the system DMA window is farther away from the start of
physical RAM than the size of the GPU linear window, move the linear
window so that it ends at the same address as the system DMA window.

This allows mapping the command buffers from CMA, which is likely to
reside at the end of the system DMA window, while also overlapping as
much RAM as possible, in order to optimize regular buffer mappings
through the linear window.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Comments

Russell King - ARM Linux March 9, 2016, 12:25 p.m. UTC | #1
On Mon, Feb 29, 2016 at 05:20:16PM +0100, Lucas Stach wrote:
> If the end of the system DMA window is farther away from the start of
> physical RAM than the size of the GPU linear window, move the linear
> window so that it ends at the same address as the system DMA window.
> 
> This allows mapping the command buffers from CMA, which is likely to
> reside at the end of the system DMA window, while also overlapping as
> much RAM as possible, in order to optimize regular buffer mappings
> through the linear window.

I've been pondering this for a while now, and I think we should not
do this unconditionally - it should be predicated on the MC20 feature -
both for the original code and the new code.  If we don't have the MC20
feature, we end up with more memory spaces than we can cope with.

If we don't have MC20, but we need to offset, that's an error which
can lead to memory corruption.
Lucas Stach March 14, 2016, 3:02 p.m. UTC | #2
On Wednesday, 09.03.2016, at 12:25 +0000, Russell King - ARM Linux
wrote:
> On Mon, Feb 29, 2016 at 05:20:16PM +0100, Lucas Stach wrote:
> > If the end of the system DMA window is farther away from the start of
> > physical RAM than the size of the GPU linear window, move the linear
> > window so that it ends at the same address as the system DMA window.
> > 
> > This allows mapping the command buffers from CMA, which is likely to
> > reside at the end of the system DMA window, while also overlapping as
> > much RAM as possible, in order to optimize regular buffer mappings
> > through the linear window.
> 
> I've been pondering this for a while now, and I think we should not
> do this unconditionally - it should be predicated on the MC20 feature -
> both for the original code and the new code.  If we don't have the MC20
> feature, we end up with more memory spaces than we can cope with.
> 
> If we don't have MC20, but we need to offset, that's an error which
> can lead to memory corruption.
> 
This makes sense.

I guess not using the offset on MC10 will also allow you to enable TS on
those parts? In that case we might advertise this with a patchlevel
change of the API.

Regards,
Lucas
Russell King - ARM Linux March 14, 2016, 3:09 p.m. UTC | #3
On Mon, Mar 14, 2016 at 04:02:35PM +0100, Lucas Stach wrote:
> I guess not using the offset on MC10 will also allow you to enable TS on
> those parts? In that case we might advertise this with a patchlevel
> change of the API.

I don't think we need that - it isn't an API change as such.  What
we could do is to clear the fast clear capability for GPUs where the
base is non-zero but has MC10, which basically means we don't use
tile status.
Lucas Stach March 14, 2016, 3:18 p.m. UTC | #4
On Monday, 14.03.2016, at 15:09 +0000, Russell King - ARM Linux wrote:
> On Mon, Mar 14, 2016 at 04:02:35PM +0100, Lucas Stach wrote:
> > I guess not using the offset on MC10 will also allow you to enable TS on
> > those parts? In that case we might advertise this with a patchlevel
> > change of the API.
> 
> I don't think we need that - it isn't an API change as such.  What
> we could do is to clear the fast clear capability for GPUs where the
> base is non-zero but has MC10, which basically means we don't use
> tile status.
> 
With kernel 4.5 being released now, we already have a kernel version
that may change the offset, while not clearing the fast clear capability
bit. So I think we need another way for userspace to know if the kernel
is doing the right thing for MC10.
Daniel Vetter March 15, 2016, 7:54 a.m. UTC | #5
On Mon, Mar 14, 2016 at 04:18:43PM +0100, Lucas Stach wrote:
> On Monday, 14.03.2016, at 15:09 +0000, Russell King - ARM Linux wrote:
> > On Mon, Mar 14, 2016 at 04:02:35PM +0100, Lucas Stach wrote:
> > > I guess not using the offset on MC10 will also allow you to enable TS on
> > > those parts? In that case we might advertise this with a patchlevel
> > > change of the API.
> > 
> > I don't think we need that - it isn't an API change as such.  What
> > we could do is to clear the fast clear capability for GPUs where the
> > base is non-zero but has MC10, which basically means we don't use
> > tile status.
> > 
> With kernel 4.5 being released now, we already have a kernel version
> that may change the offset, while not clearing the fast clear capability
> bit. So I think we need another way for userspace to know if the kernel
> is doing the right thing for MC10.

btw in drm land we're sometimes a bit sloppy with ABI - if it's just
rendering corruption or maybe an oddball gpu hang, and the fix is small
enough to go in through stable, we don't bother to rev the ABI.
Instead we just ask everyone to upgrade their kernel once the patch
goes through the stable queues.

Otherwise even minor fumbles mean ABI complexity forever, and with gpus
that tends to kill you ;-)
-Daniel

Patch

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 40f2a37f56e3..e9e66b99ab7c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1563,6 +1563,7 @@  static int etnaviv_gpu_platform_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct etnaviv_gpu *gpu;
+	u32 dma_mask;
 	int err = 0;
 
 	gpu = devm_kzalloc(dev, sizeof(*gpu), GFP_KERNEL);
@@ -1573,12 +1574,16 @@  static int etnaviv_gpu_platform_probe(struct platform_device *pdev)
 	mutex_init(&gpu->lock);
 
 	/*
-	 * Set the GPU base address to the start of physical memory.  This
-	 * ensures that if we have up to 2GB, the v1 MMU can address the
-	 * highest memory.  This is important as command buffers may be
-	 * allocated outside of this limit.
+	 * Set the GPU linear window to be at the end of the DMA window, where
+	 * the CMA area is likely to reside. This ensures that we are able to
+	 * map the command buffers while having the linear window overlap as
+	 * much RAM as possible, so we can optimize mappings for other buffers.
 	 */
-	gpu->memory_base = PHYS_OFFSET;
+	dma_mask = (u32)dma_get_required_mask(dev);
+	if (dma_mask < PHYS_OFFSET + SZ_2G)
+		gpu->memory_base = PHYS_OFFSET;
+	else
+		gpu->memory_base = dma_mask - SZ_2G + 1;
 
 	/* Map registers: */
 	gpu->mmio = etnaviv_ioremap(pdev, NULL, dev_name(gpu->dev));