| Message ID | 1456762816-496-1-git-send-email-l.stach@pengutronix.de (mailing list archive) |
|---|---|
| State | New, archived |
On Mon, Feb 29, 2016 at 05:20:16PM +0100, Lucas Stach wrote:
> If the end of the system DMA window is farther away from the start of
> physical RAM than the size of the GPU linear window, move the linear
> window so that it ends at the same address as the system DMA window.
>
> This allows mapping command buffers from CMA, which are likely to reside
> at the end of the system DMA window, while also overlapping as much
> RAM as possible, in order to optimize regular buffer mappings through
> the linear window.

I've been pondering this for a while now, and I think we should not do
this unconditionally - it should be predicated on the MC20 feature -
both for the original code and the new code. If we don't have the MC20
feature, we end up with more memory spaces than we can cope with.

If we don't have MC20 but we need to offset, that's an error which can
lead to memory corruption.
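A minimal standalone sketch of the policy suggested here, predicating the window offset on the MC20 feature. The function name, return codes, and parameters are illustrative only, not the actual driver API:

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2G 0x80000000ull

/* Hypothetical sketch: only apply a non-zero linear window offset when
 * the GPU has the MC20 memory controller. On MC10, needing an offset
 * is an error, since applying one anyway can corrupt memory. */
static int pick_memory_base(uint64_t dma_mask, uint32_t phys_offset,
			    bool has_mc20, uint32_t *base)
{
	uint64_t wanted = dma_mask < (uint64_t)phys_offset + SZ_2G ?
			  phys_offset : dma_mask - SZ_2G + 1;

	if (wanted != phys_offset && !has_mc20)
		return -1;	/* MC10 cannot cope with an offset window */

	*base = (uint32_t)wanted;
	return 0;
}
```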
On Wednesday, 09.03.2016 at 12:25 +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 29, 2016 at 05:20:16PM +0100, Lucas Stach wrote:
> > If the end of the system DMA window is farther away from the start of
> > physical RAM than the size of the GPU linear window, move the linear
> > window so that it ends at the same address as the system DMA window.
> >
> > This allows mapping command buffers from CMA, which are likely to reside
> > at the end of the system DMA window, while also overlapping as much
> > RAM as possible, in order to optimize regular buffer mappings through
> > the linear window.
>
> I've been pondering this for a while now, and I think we should not
> do this unconditionally - it should be predicated on the MC20 feature -
> both for the original code and the new code. If we don't have the MC20
> feature, we end up with more memory spaces than we can cope with.
>
> If we don't have MC20, but we need to offset, that's an error which
> can lead to memory corruption.

This makes sense. I guess not using the offset on MC10 will also allow
you to enable TS on those parts? In that case we might advertise this
with a patchlevel change of the API.

Regards,
Lucas
On Mon, Mar 14, 2016 at 04:02:35PM +0100, Lucas Stach wrote:
> I guess not using the offset on MC10 will also allow you to enable TS on
> those parts? In that case we might advertise this with a patchlevel
> change of the API.

I don't think we need that - it isn't an API change as such. What we
could do is clear the fast clear capability for GPUs where the base is
non-zero but which have MC10, which basically means we don't use tile
status.
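A sketch of what this suggestion could look like. The feature bit names here are made up for illustration; the real driver reads its features from the Vivante chip feature registers:

```c
#include <stdint.h>

/* Illustrative feature bits, not the actual register layout. */
#define FEAT_FAST_CLEAR	(1u << 0)
#define FEAT_MC20	(1u << 1)

/* If the linear window ended up with a non-zero base on a GPU that
 * only has MC10, hide the fast clear capability so userspace never
 * tries to enable tile status there. */
static uint32_t sanitize_features(uint32_t features, uint32_t memory_base)
{
	if (memory_base != 0 && !(features & FEAT_MC20))
		features &= ~FEAT_FAST_CLEAR;
	return features;
}
```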
On Monday, 14.03.2016 at 15:09 +0000, Russell King - ARM Linux wrote:
> On Mon, Mar 14, 2016 at 04:02:35PM +0100, Lucas Stach wrote:
> > I guess not using the offset on MC10 will also allow you to enable TS on
> > those parts? In that case we might advertise this with a patchlevel
> > change of the API.
>
> I don't think we need that - it isn't an API change as such. What
> we could do is to clear the fast clear capability for GPUs where the
> base is non-zero but has MC10, which basically means we don't use
> tile status.

With kernel 4.5 being released now, we already have a kernel version
that may change the offset while not clearing the fast clear capability
bit. So I think we need another way for userspace to know whether the
kernel is doing the right thing for MC10.
On Mon, Mar 14, 2016 at 04:18:43PM +0100, Lucas Stach wrote:
> On Monday, 14.03.2016 at 15:09 +0000, Russell King - ARM Linux wrote:
> > On Mon, Mar 14, 2016 at 04:02:35PM +0100, Lucas Stach wrote:
> > > I guess not using the offset on MC10 will also allow you to enable TS on
> > > those parts? In that case we might advertise this with a patchlevel
> > > change of the API.
> >
> > I don't think we need that - it isn't an API change as such. What
> > we could do is to clear the fast clear capability for GPUs where the
> > base is non-zero but has MC10, which basically means we don't use
> > tile status.
>
> With kernel 4.5 being released now, we already have a kernel version
> that may change the offset, while not clearing the fast clear capability
> bit. So I think we need another way for userspace to know if the kernel
> is doing the right thing for MC10.

Btw, in drm land we're sometimes a bit sloppy with ABI - if it's just
rendering corruption, or maybe an oddball GPU hang, and small enough to
go in through stable, we don't bother to rev the ABI. Instead we just
ask everyone to upgrade their kernel once the patch goes through the
stable queues. Otherwise even minor fumbles mean ABI complexity forever,
and with GPUs that tends to kill you ;-)

-Daniel
```diff
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index 40f2a37f56e3..e9e66b99ab7c 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -1563,6 +1563,7 @@ static int etnaviv_gpu_platform_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct etnaviv_gpu *gpu;
+	u32 dma_mask;
 	int err = 0;
 
 	gpu = devm_kzalloc(dev, sizeof(*gpu), GFP_KERNEL);
@@ -1573,12 +1574,16 @@ static int etnaviv_gpu_platform_probe(struct platform_device *pdev)
 	mutex_init(&gpu->lock);
 
 	/*
-	 * Set the GPU base address to the start of physical memory. This
-	 * ensures that if we have up to 2GB, the v1 MMU can address the
-	 * highest memory. This is important as command buffers may be
-	 * allocated outside of this limit.
+	 * Set the GPU linear window to be at the end of the DMA window, where
+	 * the CMA area is likely to reside. This ensures that we are able to
+	 * map the command buffers while having the linear window overlap as
+	 * much RAM as possible, so we can optimize mappings for other buffers.
 	 */
-	gpu->memory_base = PHYS_OFFSET;
+	dma_mask = (u32)dma_get_required_mask(dev);
+	if (dma_mask < PHYS_OFFSET + SZ_2G)
+		gpu->memory_base = PHYS_OFFSET;
+	else
+		gpu->memory_base = dma_mask - SZ_2G + 1;
 
 	/* Map registers: */
 	gpu->mmio = etnaviv_ioremap(pdev, NULL, dev_name(gpu->dev));
```
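The placement logic in the patch above can be modeled standalone. This is a sketch, not driver code; the function name is made up, and `dma_mask` stands in for the value returned by `dma_get_required_mask()`:

```c
#include <stdint.h>

#define SZ_2G 0x80000000ull

/* Keep the 2 GB linear window at the start of RAM if it already covers
 * the whole DMA window; otherwise slide it up so that it ends at the
 * end of the DMA window, where CMA (and thus the command buffers) is
 * likely to reside. */
static uint32_t linear_window_base(uint64_t dma_mask, uint32_t phys_offset)
{
	if (dma_mask < (uint64_t)phys_offset + SZ_2G)
		return phys_offset;
	return (uint32_t)(dma_mask - SZ_2G + 1);
}
```

For example, with a 32-bit DMA window (`dma_mask = 0xffffffff`) and RAM starting at `0x10000000`, the window is moved up so it ends at `0xffffffff` instead of at `0x8fffffff`.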
If the end of the system DMA window is farther away from the start of
physical RAM than the size of the GPU linear window, move the linear
window so that it ends at the same address as the system DMA window.

This allows mapping command buffers from CMA, which are likely to reside
at the end of the system DMA window, while also overlapping as much RAM
as possible, in order to optimize regular buffer mappings through the
linear window.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)