[10/10] drm/i915: Use MI_BATCH_BUFFER_START on 830/845

Message ID	1450110229-30450-11-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: ville.syrjala@linux.intel.com To: intel-gfx@lists.freedesktop.org Date: Mon, 14 Dec 2015 18:23:49 +0200 Message-Id: <1450110229-30450-11-git-send-email-ville.syrjala@linux.intel.com> In-Reply-To: <1450110229-30450-1-git-send-email-ville.syrjala@linux.intel.com> References: <1450110229-30450-1-git-send-email-ville.syrjala@linux.intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 10/10] drm/i915: Use MI_BATCH_BUFFER_START on 830/845 Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Ville Syrjala Dec. 14, 2015, 4:23 p.m. UTC

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

MI_BATCH_BUFFER is nasty since it requires that userspace pass in the
correct batch length.

Let's switch to using MI_BATCH_BUFFER_START instead (like we do on
other platforms). Then we don't have to specify the batch length
at all, and the CS will instead execute until it sees the
MI_BATCH_BUFFER_END.

We still need the batch length since we do the CS TLB workaround
and copy the batch into the permanently pinned scratch object
and execute it from there. But for this we can simply use the
batch object length when the user hasn't specified the actual
batch length. So specifying the batch length becomes just a
way to optimize the batch copy a little bit.

We lost batch_len from a bunch of igts (including the quiesce batch)
so without this igt is utterly broken on 830/845. Also some igts such
as gem_cpu_reloc never specified the batch_len and so didn't work.
With MI_BATCH_BUFFER_START we don't have to fix up igt every time
someone forgets that 830/845 exist.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 6 ++----
 2 files changed, 5 insertions(+), 4 deletions(-)

Chris Wilson Dec. 14, 2015, 4:58 p.m. UTC | #1

On Mon, Dec 14, 2015 at 06:23:49PM +0200, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> MI_BATCH_BUFFER is nasty since it requires that userspace pass in the
> correct batch length.
> 
> Let's switch to using MI_BATCH_BUFFER_START instead (like we do on
> other platforms). Then we don't have to specify the batch length
> at all, and the CS will instead execute until it sees the
> MI_BATCH_BUFFER_END.

Oh. My. Gosh. There's a BB_START?!!!

> We still need the batch length since we do the CS TLB workaround
> and copy the batch into the permanently pinned scratch object
> and execute it from there. But for this we can simply use the
> batch object length when the user hasn't specified the actual
> batch length. So specifying the batch length becomes just a
> way to optimize the batch copy a little bit.
> 
> We lost batch_len from a bunch of igts (including the quiesce batch)
> so without this igt is utterly broken on 830/845. Also some igts such
> as gem_cpu_reloc never specified the batch_len and so didn't work.
> With MI_BATCH_BUFFER_START we don't have to fix up igt every time
> someone forgets that 830/845 exist.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Looks sane.
-Chris

Ville Syrjala Dec. 14, 2015, 5:25 p.m. UTC | #2

On Mon, Dec 14, 2015 at 04:58:54PM +0000, Chris Wilson wrote:
> On Mon, Dec 14, 2015 at 06:23:49PM +0200, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > MI_BATCH_BUFFER is nasty since it requires that userspace pass in the
> > correct batch length.
> > 
> > Let's switch to using MI_BATCH_BUFFER_START instead (like we do on
> > other platforms). Then we don't have to specify the batch length
> > at all, and the CS will instead execute until it sees the
> > MI_BATCH_BUFFER_END.
> 
> Oh. My. Gosh. There's a BB_START?!!!

Looks like ;) At least my 830 seems perfectly happy with it. Well,
as happy as it's ever been. Though I still couldn't get it complete
a piglit run without hanging last I tried.

> 
> > We still need the batch length since we do the CS TLB workaround
> > and copy the batch into the permanently pinned scratch object
> > and execute it from there. But for this we can simply use the
> > batch object length when the user hasn't specified the actual
> > batch length. So specifying the batch length becomes just a
> > way to optimize the batch copy a little bit.
> > 
> > We lost batch_len from a bunch of igts (including the quiesce batch)
> > so without this igt is utterly broken on 830/845. Also some igts such
> > as gem_cpu_reloc never specified the batch_len and so didn't work.
> > With MI_BATCH_BUFFER_START we don't have to fix up igt every time
> > someone forgets that 830/845 exist.
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Looks sane.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

Chris Wilson Dec. 15, 2015, 10:09 a.m. UTC | #3

On Mon, Dec 14, 2015 at 07:25:13PM +0200, Ville Syrjälä wrote:
> On Mon, Dec 14, 2015 at 04:58:54PM +0000, Chris Wilson wrote:
> > On Mon, Dec 14, 2015 at 06:23:49PM +0200, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > MI_BATCH_BUFFER is nasty since it requires that userspace pass in the
> > > correct batch length.
> > > 
> > > Let's switch to using MI_BATCH_BUFFER_START instead (like we do on
> > > other platforms). Then we don't have to specify the batch length
> > > at all, and the CS will instead execute until it sees the
> > > MI_BATCH_BUFFER_END.
> > 
> > Oh. My. Gosh. There's a BB_START?!!!
> 
> Looks like ;) At least my 830 seems perfectly happy with it. Well,
> as happy as it's ever been. Though I still couldn't get it complete
> a piglit run without hanging last I tried.
> 
> > 
> > > We still need the batch length since we do the CS TLB workaround
> > > and copy the batch into the permanently pinned scratch object
> > > and execute it from there. But for this we can simply use the
> > > batch object length when the user hasn't specified the actual
> > > batch length. So specifying the batch length becomes just a
> > > way to optimize the batch copy a little bit.
> > > 
> > > We lost batch_len from a bunch of igts (including the quiesce batch)
> > > so without this igt is utterly broken on 830/845. Also some igts such
> > > as gem_cpu_reloc never specified the batch_len and so didn't work.
> > > With MI_BATCH_BUFFER_START we don't have to fix up igt every time
> > > someone forgets that 830/845 exist.
> > > 
> > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Looks sane.

Checked against bspec, and it does indeed list BB_START for 830/845 and
we already comply with the errata.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

Chris Wilson Dec. 15, 2015, 10:24 a.m. UTC | #4

On Tue, Dec 15, 2015 at 10:09:30AM +0000, Chris Wilson wrote:
> On Mon, Dec 14, 2015 at 07:25:13PM +0200, Ville Syrjälä wrote:
> > On Mon, Dec 14, 2015 at 04:58:54PM +0000, Chris Wilson wrote:
> > > On Mon, Dec 14, 2015 at 06:23:49PM +0200, ville.syrjala@linux.intel.com wrote:
> > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > 
> > > > MI_BATCH_BUFFER is nasty since it requires that userspace pass in the
> > > > correct batch length.
> > > > 
> > > > Let's switch to using MI_BATCH_BUFFER_START instead (like we do on
> > > > other platforms). Then we don't have to specify the batch length
> > > > at all, and the CS will instead execute until it sees the
> > > > MI_BATCH_BUFFER_END.
> > > 
> > > Oh. My. Gosh. There's a BB_START?!!!
> > 
> > Looks like ;) At least my 830 seems perfectly happy with it. Well,
> > as happy as it's ever been. Though I still couldn't get it complete
> > a piglit run without hanging last I tried.
> > 
> > > 
> > > > We still need the batch length since we do the CS TLB workaround
> > > > and copy the batch into the permanently pinned scratch object
> > > > and execute it from there. But for this we can simply use the
> > > > batch object length when the user hasn't specified the actual
> > > > batch length. So specifying the batch length becomes just a
> > > > way to optimize the batch copy a little bit.
> > > > 
> > > > We lost batch_len from a bunch of igts (including the quiesce batch)
> > > > so without this igt is utterly broken on 830/845. Also some igts such
> > > > as gem_cpu_reloc never specified the batch_len and so didn't work.
> > > > With MI_BATCH_BUFFER_START we don't have to fix up igt every time
> > > > someone forgets that 830/845 exist.
> > > > 
> > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > Looks sane.
> 
> Checked against bspec, and it does indeed list BB_START for 830/845 and
> we already comply with the errata.

The other question, does this obsolete our work around? Can I be that
optimisitic?
-Chris

Ville Syrjala Dec. 15, 2015, 11:05 a.m. UTC | #5

On Tue, Dec 15, 2015 at 10:24:13AM +0000, Chris Wilson wrote:
> On Tue, Dec 15, 2015 at 10:09:30AM +0000, Chris Wilson wrote:
> > On Mon, Dec 14, 2015 at 07:25:13PM +0200, Ville Syrjälä wrote:
> > > On Mon, Dec 14, 2015 at 04:58:54PM +0000, Chris Wilson wrote:
> > > > On Mon, Dec 14, 2015 at 06:23:49PM +0200, ville.syrjala@linux.intel.com wrote:
> > > > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > 
> > > > > MI_BATCH_BUFFER is nasty since it requires that userspace pass in the
> > > > > correct batch length.
> > > > > 
> > > > > Let's switch to using MI_BATCH_BUFFER_START instead (like we do on
> > > > > other platforms). Then we don't have to specify the batch length
> > > > > at all, and the CS will instead execute until it sees the
> > > > > MI_BATCH_BUFFER_END.
> > > > 
> > > > Oh. My. Gosh. There's a BB_START?!!!
> > > 
> > > Looks like ;) At least my 830 seems perfectly happy with it. Well,
> > > as happy as it's ever been. Though I still couldn't get it complete
> > > a piglit run without hanging last I tried.
> > > 
> > > > 
> > > > > We still need the batch length since we do the CS TLB workaround
> > > > > and copy the batch into the permanently pinned scratch object
> > > > > and execute it from there. But for this we can simply use the
> > > > > batch object length when the user hasn't specified the actual
> > > > > batch length. So specifying the batch length becomes just a
> > > > > way to optimize the batch copy a little bit.
> > > > > 
> > > > > We lost batch_len from a bunch of igts (including the quiesce batch)
> > > > > so without this igt is utterly broken on 830/845. Also some igts such
> > > > > as gem_cpu_reloc never specified the batch_len and so didn't work.
> > > > > With MI_BATCH_BUFFER_START we don't have to fix up igt every time
> > > > > someone forgets that 830/845 exist.
> > > > > 
> > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > 
> > > > Looks sane.
> > 
> > Checked against bspec, and it does indeed list BB_START for 830/845 and
> > we already comply with the errata.

Yeah, the coloring stuff should be enough.

> 
> The other question, does this obsolete our work around? Can I be that
> optimisitic?

The CS TLB one? I think I tried it at some point, and things till
failed. But stuff fails even with the w/a (like I said piglit will
hang the GPU eventually), so I can't be sure that I actually tested
the CS TLB fail. I think I need to retest at some point.

As far as the docs go, I only remember it mentioning some TLB fail
affecting the blitter. I guess the CS TLB fail isn't actually
documented anywhere?

Chris Wilson Dec. 15, 2015, 11:22 a.m. UTC | #6

On Tue, Dec 15, 2015 at 01:05:56PM +0200, Ville Syrjälä wrote:
> On Tue, Dec 15, 2015 at 10:24:13AM +0000, Chris Wilson wrote:
> > The other question, does this obsolete our work around? Can I be that
> > optimisitic?
> 
> The CS TLB one? I think I tried it at some point, and things till
> failed. But stuff fails even with the w/a (like I said piglit will
> hang the GPU eventually), so I can't be sure that I actually tested
> the CS TLB fail. I think I need to retest at some point.
> 
> As far as the docs go, I only remember it mentioning some TLB fail
> affecting the blitter. I guess the CS TLB fail isn't actually
> documented anywhere?

It's hard to be sure since the issue is only mentioned obliquely in
bspec. I strongly suspect there is only one set of TLB on the device, so
I think it is the same. But I never did figure out what flush they
meant, as all the chipset or MI level flushes never seemed to do anything
to improve the situation.
-Chris

Ville Syrjala Dec. 15, 2015, 11:43 a.m. UTC | #7

On Tue, Dec 15, 2015 at 11:22:29AM +0000, Chris Wilson wrote:
> On Tue, Dec 15, 2015 at 01:05:56PM +0200, Ville Syrjälä wrote:
> > On Tue, Dec 15, 2015 at 10:24:13AM +0000, Chris Wilson wrote:
> > > The other question, does this obsolete our work around? Can I be that
> > > optimisitic?
> > 
> > The CS TLB one? I think I tried it at some point, and things till
> > failed. But stuff fails even with the w/a (like I said piglit will
> > hang the GPU eventually), so I can't be sure that I actually tested
> > the CS TLB fail. I think I need to retest at some point.
> > 
> > As far as the docs go, I only remember it mentioning some TLB fail
> > affecting the blitter. I guess the CS TLB fail isn't actually
> > documented anywhere?
> 
> It's hard to be sure since the issue is only mentioned obliquely in
> bspec. I strongly suspect there is only one set of TLB on the device, so
> I think it is the same. But I never did figure out what flush they
> meant, as all the chipset or MI level flushes never seemed to do anything
> to improve the situation.

Programming Environment 1.4.9.4 claims that there are several TLBs. But
not sure if that really holds for all devices.

It also has the following table:

GTT TLBs
TLB            | Normal Invalidation Mechanism  | Invalidated by Page Table Enable bit
               |                                | of PGTBL_CTL register?
...
Command Stream | Through a Page Table PTE write | Yes

Which might hint that PGTBL_CTL might be the way to force invalidate
them. But IIRC you may have once said that you already tried it. In any
case, even if it would work we'd need to make sure no GTT access is
happening when toggling the bit (assuming we'd have to toggle it).

[10/10] drm/i915: Use MI_BATCH_BUFFER_START on 830/845

Commit Message

Comments

Patch