diff mbox

[12/13] drm/i915: Consolidate legacy semaphore initialization

Message ID 1467212972-861-12-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Tvrtko Ursulin June 29, 2016, 3:09 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Replace per-engine initialization with a common half-programatic,
half-data driven code for ease of maintenance and compactness.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 110 ++++++++++++++------------------
 1 file changed, 48 insertions(+), 62 deletions(-)

Comments

Chris Wilson June 29, 2016, 3:34 p.m. UTC | #1
On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Replace per-engine initialization with a common half-programatic,
> half-data driven code for ease of maintenance and compactness.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This is the biggest pill to swallow (since our 5x5 table is only
sparsely populated), but it looks correct, and more importantly easier to
read.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Tvrtko Ursulin June 29, 2016, 3:41 p.m. UTC | #2
On 29/06/16 16:34, Chris Wilson wrote:
> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Replace per-engine initialization with a common half-programatic,
>> half-data driven code for ease of maintenance and compactness.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> This is the biggest pill to swallow (since our 5x5 table is only
> sparsely populated), but it looks correct, and more importantly easier to
> read.

Yeah I was out of ideas on how to improve it. Fresh mind needed to try 
and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map to bits 
and registers respectively, and write it as a function.

> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks!

Regards,

Tvrtko
Chris Wilson June 29, 2016, 4 p.m. UTC | #3
On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
> 
> On 29/06/16 16:34, Chris Wilson wrote:
> >On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>Replace per-engine initialization with a common half-programatic,
> >>half-data driven code for ease of maintenance and compactness.
> >>
> >>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> >This is the biggest pill to swallow (since our 5x5 table is only
> >sparsely populated), but it looks correct, and more importantly easier to
> >read.
> 
> Yeah I was out of ideas on how to improve it. Fresh mind needed to
> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
> to bits and registers respectively, and write it as a function.

It's actually a very simple cyclic function based on register
offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.

(The only real challenge is picking the direction.)

commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Sep 14 20:32:47 2011 -0700

    drm/i915: Dumb down the semaphore logic
    
    While I think the previous code is correct, it was hard to follow and
    hard to debug. Since we already have a ring abstraction, might as well
    use it to handle the semaphore updates and compares.

-Chris
Tvrtko Ursulin June 29, 2016, 4:14 p.m. UTC | #4
On 29/06/16 17:00, Chris Wilson wrote:
> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>
>> On 29/06/16 16:34, Chris Wilson wrote:
>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Replace per-engine initialization with a common half-programatic,
>>>> half-data driven code for ease of maintenance and compactness.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> This is the biggest pill to swallow (since our 5x5 table is only
>>> sparsely populated), but it looks correct, and more importantly easier to
>>> read.
>>
>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>> to bits and registers respectively, and write it as a function.
>
> It's actually a very simple cyclic function based on register
> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>
> (The only real challenge is picking the direction.)
>
> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
> Author: Ben Widawsky <ben@bwidawsk.net>
> Date:   Wed Sep 14 20:32:47 2011 -0700
>
>      drm/i915: Dumb down the semaphore logic
>
>      While I think the previous code is correct, it was hard to follow and
>      hard to debug. Since we already have a ring abstraction, might as well
>      use it to handle the semaphore updates and compares.

Should I try to go back to that then? Since I am not too happy with the 
sparse table...

This has passed CI so we could merge some of it if that would help your 
series, or wait until I rework this patch.

Regards,

Tvrtko
Chris Wilson June 29, 2016, 4:24 p.m. UTC | #5
On Wed, Jun 29, 2016 at 05:14:11PM +0100, Tvrtko Ursulin wrote:
> 
> On 29/06/16 17:00, Chris Wilson wrote:
> >On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 29/06/16 16:34, Chris Wilson wrote:
> >>>On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
> >>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>>Replace per-engine initialization with a common half-programatic,
> >>>>half-data driven code for ease of maintenance and compactness.
> >>>>
> >>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>
> >>>This is the biggest pill to swallow (since our 5x5 table is only
> >>>sparsely populated), but it looks correct, and more importantly easier to
> >>>read.
> >>
> >>Yeah I was out of ideas on how to improve it. Fresh mind needed to
> >>try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
> >>to bits and registers respectively, and write it as a function.
> >
> >It's actually a very simple cyclic function based on register
> >offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
> >
> >(The only real challenge is picking the direction.)
> >
> >commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
> >Author: Ben Widawsky <ben@bwidawsk.net>
> >Date:   Wed Sep 14 20:32:47 2011 -0700
> >
> >     drm/i915: Dumb down the semaphore logic
> >
> >     While I think the previous code is correct, it was hard to follow and
> >     hard to debug. Since we already have a ring abstraction, might as well
> >     use it to handle the semaphore updates and compares.
> 
> Should I try to go back to that then? Since I am not too happy with
> the sparse table...
> 
> This has passed CI so we could merge some of it if that would help
> your series, or wait until I rework this patch.

The rule of thumb is incremental improvements tell a better story and
should be easier to find a misstep. (My personal experience says the
longer I play with a patch the larger it gets...)

In short, you've already consolidated a lot of duplication in the vfuncs
that will make my life easier (after some rebasing joy). Anything more
is just icing on the cake. :)
-Chris
Tvrtko Ursulin June 29, 2016, 4:34 p.m. UTC | #6
On 29/06/16 17:24, Chris Wilson wrote:
> On Wed, Jun 29, 2016 at 05:14:11PM +0100, Tvrtko Ursulin wrote:
>>
>> On 29/06/16 17:00, Chris Wilson wrote:
>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 29/06/16 16:34, Chris Wilson wrote:
>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> Replace per-engine initialization with a common half-programatic,
>>>>>> half-data driven code for ease of maintenance and compactness.
>>>>>>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> This is the biggest pill to swallow (since our 5x5 table is only
>>>>> sparsely populated), but it looks correct, and more importantly easier to
>>>>> read.
>>>>
>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>>>> to bits and registers respectively, and write it as a function.
>>>
>>> It's actually a very simple cyclic function based on register
>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>>>
>>> (The only real challenge is picking the direction.)
>>>
>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
>>> Author: Ben Widawsky <ben@bwidawsk.net>
>>> Date:   Wed Sep 14 20:32:47 2011 -0700
>>>
>>>      drm/i915: Dumb down the semaphore logic
>>>
>>>      While I think the previous code is correct, it was hard to follow and
>>>      hard to debug. Since we already have a ring abstraction, might as well
>>>      use it to handle the semaphore updates and compares.
>>
>> Should I try to go back to that then? Since I am not too happy with
>> the sparse table...
>>
>> This has passed CI so we could merge some of it if that would help
>> your series, or wait until I rework this patch.
>
> The rule of thumb is incremental improvements tell a better story and
> should be easier to find a misstep. (My personal experience says the
> longer I play with a patch the larger it gets...)
>
> In short, you've already consolidated a lot of duplication in the vfuncs
> that will make my life easier (after some rebasing joy). Anything more
> is just icing on the cake. :)

It also looks like I have broke something, wonder how CI did not catch 
it or I am imagining things. "drm/i915: Consolidate dispatch_execbuffer 
vfunc" looks wrong wrt add_request and dispatch_execbuffer for gen8+. 
Leaving it for tomorrow.

Regards,

Tvrtko
Chris Wilson June 29, 2016, 4:43 p.m. UTC | #7
On Wed, Jun 29, 2016 at 05:34:27PM +0100, Tvrtko Ursulin wrote:
> It also looks like I have broke something, wonder how CI did not
> catch it or I am imagining things. "drm/i915: Consolidate
> dispatch_execbuffer vfunc" looks wrong wrt add_request and
> dispatch_execbuffer for gen8+. Leaving it for tomorrow.

Ah, should also include i915.enable_execlists=0 when sending to trybot.
-Chris
Tvrtko Ursulin July 15, 2016, 1:13 p.m. UTC | #8
On 29/06/16 17:00, Chris Wilson wrote:
> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>
>> On 29/06/16 16:34, Chris Wilson wrote:
>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Replace per-engine initialization with a common half-programatic,
>>>> half-data driven code for ease of maintenance and compactness.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> This is the biggest pill to swallow (since our 5x5 table is only
>>> sparsely populated), but it looks correct, and more importantly easier to
>>> read.
>>
>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>> to bits and registers respectively, and write it as a function.
>
> It's actually a very simple cyclic function based on register
> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>
> (The only real challenge is picking the direction.)
>
> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
> Author: Ben Widawsky <ben@bwidawsk.net>
> Date:   Wed Sep 14 20:32:47 2011 -0700
>
>      drm/i915: Dumb down the semaphore logic
>
>      While I think the previous code is correct, it was hard to follow and
>      hard to debug. Since we already have a ring abstraction, might as well
>      use it to handle the semaphore updates and compares.

Doesn't seem to fit, or I just can't figure it out. Needs two functions 
to get rid of the table:

f1(0, 1) = 2
f1(0, 2) = 0
f1(0, 3) = 2
f1(1, 0) = 0
f1(1, 2) = 2
f1(1, 3) = 1
f1(2, 0) = 2
f1(2, 1) = 0
f1(2, 3) = 0
f1(3, 0) = 1
f1(3, 1) = 1
f1(3, 2) = 1

and:

f2(0, 1) = 1
f2(0, 2) = 0
f2(0, 3) = 1
f2(1, 0) = 0
f2(1, 2) = 1
f2(1, 3) = 2
f2(2, 0) = 1
f2(2, 1) = 0
f2(2, 3) = 0
f2(3, 0) = 2
f2(3, 1) = 2
f2(3, 2) = 2

A weekend math puzzle for someone? :)

Regards,

Tvrtko
Dave Gordon July 19, 2016, 6:38 p.m. UTC | #9
On 15/07/16 14:13, Tvrtko Ursulin wrote:
>
> On 29/06/16 17:00, Chris Wilson wrote:
>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>>
>>> On 29/06/16 16:34, Chris Wilson wrote:
>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> Replace per-engine initialization with a common half-programatic,
>>>>> half-data driven code for ease of maintenance and compactness.
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> This is the biggest pill to swallow (since our 5x5 table is only
>>>> sparsely populated), but it looks correct, and more importantly
>>>> easier to
>>>> read.
>>>
>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>>> to bits and registers respectively, and write it as a function.
>>
>> It's actually a very simple cyclic function based on register
>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>>
>> (The only real challenge is picking the direction.)
>>
>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
>> Author: Ben Widawsky <ben@bwidawsk.net>
>> Date:   Wed Sep 14 20:32:47 2011 -0700
>>
>>      drm/i915: Dumb down the semaphore logic
>>
>>      While I think the previous code is correct, it was hard to follow
>> and
>>      hard to debug. Since we already have a ring abstraction, might as
>> well
>>      use it to handle the semaphore updates and compares.
>
> Doesn't seem to fit, or I just can't figure it out. Needs two functions
> to get rid of the table:
>
> f1(0, 1) = 2
> f1(0, 2) = 0
> f1(0, 3) = 2
> f1(1, 0) = 0
> f1(1, 2) = 2
> f1(1, 3) = 1
> f1(2, 0) = 2
> f1(2, 1) = 0
> f1(2, 3) = 0
> f1(3, 0) = 1
> f1(3, 1) = 1
> f1(3, 2) = 1
>
> and:
>
> f2(0, 1) = 1
> f2(0, 2) = 0
> f2(0, 3) = 1
> f2(1, 0) = 0
> f2(1, 2) = 1
> f2(1, 3) = 2
> f2(2, 0) = 1
> f2(2, 1) = 0
> f2(2, 3) = 0
> f2(3, 0) = 2
> f2(3, 1) = 2
> f2(3, 2) = 2
>
> A weekend math puzzle for someone? :)
>
> Regards,
> Tvrtko

Here's the APL expression for (the transpose of) f2, with -1's filled in 
along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in origin 0)

       {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4

┌────────┬────────┬────────┬────────┐
│¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│
└────────┴────────┴────────┴────────┘

or transposed back so that the first argument is the row index and the 
second is the column index:

       ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4

¯1  1  0  1
  0 ¯1  1  2
  1  0 ¯1  0
  2  2  2 ¯1

http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run

f1 is trivially derived from this by the observation that f1 is just f2 
with the 1's and 2's interchanged.

.Dave.
Tvrtko Ursulin July 20, 2016, 9:54 a.m. UTC | #10
On 19/07/16 19:38, Dave Gordon wrote:
> On 15/07/16 14:13, Tvrtko Ursulin wrote:
>>
>> On 29/06/16 17:00, Chris Wilson wrote:
>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 29/06/16 16:34, Chris Wilson wrote:
>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> Replace per-engine initialization with a common half-programatic,
>>>>>> half-data driven code for ease of maintenance and compactness.
>>>>>>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> This is the biggest pill to swallow (since our 5x5 table is only
>>>>> sparsely populated), but it looks correct, and more importantly
>>>>> easier to
>>>>> read.
>>>>
>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>>>> to bits and registers respectively, and write it as a function.
>>>
>>> It's actually a very simple cyclic function based on register
>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>>>
>>> (The only real challenge is picking the direction.)
>>>
>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
>>> Author: Ben Widawsky <ben@bwidawsk.net>
>>> Date:   Wed Sep 14 20:32:47 2011 -0700
>>>
>>>      drm/i915: Dumb down the semaphore logic
>>>
>>>      While I think the previous code is correct, it was hard to follow
>>> and
>>>      hard to debug. Since we already have a ring abstraction, might as
>>> well
>>>      use it to handle the semaphore updates and compares.
>>
>> Doesn't seem to fit, or I just can't figure it out. Needs two functions
>> to get rid of the table:
>>
>> f1(0, 1) = 2
>> f1(0, 2) = 0
>> f1(0, 3) = 2
>> f1(1, 0) = 0
>> f1(1, 2) = 2
>> f1(1, 3) = 1
>> f1(2, 0) = 2
>> f1(2, 1) = 0
>> f1(2, 3) = 0
>> f1(3, 0) = 1
>> f1(3, 1) = 1
>> f1(3, 2) = 1
>>
>> and:
>>
>> f2(0, 1) = 1
>> f2(0, 2) = 0
>> f2(0, 3) = 1
>> f2(1, 0) = 0
>> f2(1, 2) = 1
>> f2(1, 3) = 2
>> f2(2, 0) = 1
>> f2(2, 1) = 0
>> f2(2, 3) = 0
>> f2(3, 0) = 2
>> f2(3, 1) = 2
>> f2(3, 2) = 2
>>
>> A weekend math puzzle for someone? :)
>>
>> Regards,
>> Tvrtko
>
> Here's the APL expression for (the transpose of) f2, with -1's filled in
> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in
> origin 0)
>
>        {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>
> ┌────────┬────────┬────────┬────────┐
> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│
> └────────┴────────┴────────┴────────┘
>
> or transposed back so that the first argument is the row index and the
> second is the column index:
>
>        ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>
> ¯1  1  0  1
>   0 ¯1  1  2
>   1  0 ¯1  0
>   2  2  2 ¯1
>
> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run

  :-C ! How to convert that to C ? :)

> f1 is trivially derived from this by the observation that f1 is just f2
> with the 1's and 2's interchanged.

Ah yes, nicely spotted.

Regards,

Tvrtko
Dave Gordon July 20, 2016, 12:50 p.m. UTC | #11
On 20/07/16 10:54, Tvrtko Ursulin wrote:
>
> On 19/07/16 19:38, Dave Gordon wrote:
>> On 15/07/16 14:13, Tvrtko Ursulin wrote:
>>>
>>> On 29/06/16 17:00, Chris Wilson wrote:
>>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 29/06/16 16:34, Chris Wilson wrote:
>>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>
>>>>>>> Replace per-engine initialization with a common half-programatic,
>>>>>>> half-data driven code for ease of maintenance and compactness.
>>>>>>>
>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> This is the biggest pill to swallow (since our 5x5 table is only
>>>>>> sparsely populated), but it looks correct, and more importantly
>>>>>> easier to
>>>>>> read.
>>>>>
>>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>>>>> to bits and registers respectively, and write it as a function.
>>>>
>>>> It's actually a very simple cyclic function based on register
>>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>>>>
>>>> (The only real challenge is picking the direction.)
>>>>
>>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
>>>> Author: Ben Widawsky <ben@bwidawsk.net>
>>>> Date:   Wed Sep 14 20:32:47 2011 -0700
>>>>
>>>>      drm/i915: Dumb down the semaphore logic
>>>>
>>>>      While I think the previous code is correct, it was hard to follow
>>>> and
>>>>      hard to debug. Since we already have a ring abstraction, might as
>>>> well
>>>>      use it to handle the semaphore updates and compares.
>>>
>>> Doesn't seem to fit, or I just can't figure it out. Needs two functions
>>> to get rid of the table:
>>>
>>> f1(0, 1) = 2
>>> f1(0, 2) = 0
>>> f1(0, 3) = 2
>>> f1(1, 0) = 0
>>> f1(1, 2) = 2
>>> f1(1, 3) = 1
>>> f1(2, 0) = 2
>>> f1(2, 1) = 0
>>> f1(2, 3) = 0
>>> f1(3, 0) = 1
>>> f1(3, 1) = 1
>>> f1(3, 2) = 1
>>>
>>> and:
>>>
>>> f2(0, 1) = 1
>>> f2(0, 2) = 0
>>> f2(0, 3) = 1
>>> f2(1, 0) = 0
>>> f2(1, 2) = 1
>>> f2(1, 3) = 2
>>> f2(2, 0) = 1
>>> f2(2, 1) = 0
>>> f2(2, 3) = 0
>>> f2(3, 0) = 2
>>> f2(3, 1) = 2
>>> f2(3, 2) = 2
>>>
>>> A weekend math puzzle for someone? :)
>>>
>>> Regards,
>>> Tvrtko
>>
>> Here's the APL expression for (the transpose of) f2, with -1's filled in
>> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in
>> origin 0)
>>
>>        {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>
>> ┌────────┬────────┬────────┬────────┐
>> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│
>> └────────┴────────┴────────┴────────┘
>>
>> or transposed back so that the first argument is the row index and the
>> second is the column index:
>>
>>        ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>
>>  ¯1  1  0  1
>>   0 ¯1  1  2
>>   1  0 ¯1  0
>>   2  2  2 ¯1
>>
>> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run
>>
>
>   :-C ! How to convert that to C ? :)
>
>> f1 is trivially derived from this by the observation that f1 is just f2
>> with the 1's and 2's interchanged.
>
> Ah yes, nicely spotted.
>
> Regards,
> Tvrtko

Assuming you don't care about the leading diagonal (x == y), then

  (⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))

translates into:

int f2(unsigned int x, unsigned int y)
{
     x -= x >= y;
     if (y == 1)
         x = 3 - x;
     x += y & 1;
     return x % 3;
}

y:x 0 1 2 3
0:  0 0 1 2
1:  1 1 0 2
2:  0 1 1 2
3:  1 2 0 0

Each line of C corresponds quite closely to one operation in the APL :)
Although, in APL we tend to leave the data unchanged while shuffling it 
around into new shapes, whereas the C below does the equivalent things 
by changing the data (noting that it's all modulo-3 arithmetic).

  (⍵≠⍳4)⍀  inserts the leading diagonal, corresponding to the subtraction
           of x >= y (which removes the leading diagonal).

  ⌽⍣(1=⍵)  reverses the sequence if y==1; in C, that's the 3-x

  (2|⍵)⌽   rotates the sequence by 1 if y is odd; that's the +=

and the final % ensures that the result is 0-2.

.Dave.
Tvrtko Ursulin July 20, 2016, 4:07 p.m. UTC | #12
On 20/07/16 13:50, Dave Gordon wrote:
> On 20/07/16 10:54, Tvrtko Ursulin wrote:
>>
>> On 19/07/16 19:38, Dave Gordon wrote:
>>> On 15/07/16 14:13, Tvrtko Ursulin wrote:
>>>>
>>>> On 29/06/16 17:00, Chris Wilson wrote:
>>>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 29/06/16 16:34, Chris Wilson wrote:
>>>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>
>>>>>>>> Replace per-engine initialization with a common half-programatic,
>>>>>>>> half-data driven code for ease of maintenance and compactness.
>>>>>>>>
>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>
>>>>>>> This is the biggest pill to swallow (since our 5x5 table is only
>>>>>>> sparsely populated), but it looks correct, and more importantly
>>>>>>> easier to
>>>>>>> read.
>>>>>>
>>>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>>>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>>>>>> to bits and registers respectively, and write it as a function.
>>>>>
>>>>> It's actually a very simple cyclic function based on register
>>>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>>>>>
>>>>> (The only real challenge is picking the direction.)
>>>>>
>>>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
>>>>> Author: Ben Widawsky <ben@bwidawsk.net>
>>>>> Date:   Wed Sep 14 20:32:47 2011 -0700
>>>>>
>>>>>      drm/i915: Dumb down the semaphore logic
>>>>>
>>>>>      While I think the previous code is correct, it was hard to follow
>>>>> and
>>>>>      hard to debug. Since we already have a ring abstraction, might as
>>>>> well
>>>>>      use it to handle the semaphore updates and compares.
>>>>
>>>> Doesn't seem to fit, or I just can't figure it out. Needs two functions
>>>> to get rid of the table:
>>>>
>>>> f1(0, 1) = 2
>>>> f1(0, 2) = 0
>>>> f1(0, 3) = 2
>>>> f1(1, 0) = 0
>>>> f1(1, 2) = 2
>>>> f1(1, 3) = 1
>>>> f1(2, 0) = 2
>>>> f1(2, 1) = 0
>>>> f1(2, 3) = 0
>>>> f1(3, 0) = 1
>>>> f1(3, 1) = 1
>>>> f1(3, 2) = 1
>>>>
>>>> and:
>>>>
>>>> f2(0, 1) = 1
>>>> f2(0, 2) = 0
>>>> f2(0, 3) = 1
>>>> f2(1, 0) = 0
>>>> f2(1, 2) = 1
>>>> f2(1, 3) = 2
>>>> f2(2, 0) = 1
>>>> f2(2, 1) = 0
>>>> f2(2, 3) = 0
>>>> f2(3, 0) = 2
>>>> f2(3, 1) = 2
>>>> f2(3, 2) = 2
>>>>
>>>> A weekend math puzzle for someone? :)
>>>>
>>>> Regards,
>>>> Tvrtko
>>>
>>> Here's the APL expression for (the transpose of) f2, with -1's filled in
>>> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in
>>> origin 0)
>>>
>>>        {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>>
>>> ┌────────┬────────┬────────┬────────┐
>>> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│
>>> └────────┴────────┴────────┴────────┘
>>>
>>> or transposed back so that the first argument is the row index and the
>>> second is the column index:
>>>
>>>        ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>>
>>>  ¯1  1  0  1
>>>   0 ¯1  1  2
>>>   1  0 ¯1  0
>>>   2  2  2 ¯1
>>>
>>> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run
>>>
>>>
>>
>>   :-C ! How to convert that to C ? :)
>>
>>> f1 is trivially derived from this by the observation that f1 is just f2
>>> with the 1's and 2's interchanged.
>>
>> Ah yes, nicely spotted.
>>
>> Regards,
>> Tvrtko
>
> Assuming you don't care about the leading diagonal (x == y), then
>
>   (⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))
>
> translates into:
>
> int f2(unsigned int x, unsigned int y)
> {
>      x -= x >= y;
>      if (y == 1)
>          x = 3 - x;
>      x += y & 1;
>      return x % 3;
> }
>
> y:x 0 1 2 3
> 0:  0 0 1 2
> 1:  1 1 0 2
> 2:  0 1 1 2
> 3:  1 2 0 0
>
> Each line of C corresponds quite closely to one operation in the APL :)
> Although, in APL we tend to leave the data unchanged while shuffling it
> around into new shapes, whereas the C below does the equivalent things
> by changing the data (noting that it's all modulo-3 arithmetic).
>
>   (⍵≠⍳4)⍀  inserts the leading diagonal, corresponding to the subtraction
>            of x >= y (which removes the leading diagonal).
>
>   ⌽⍣(1=⍵)  reverses the sequence if y==1; in C, that's the 3-x
>
>   (2|⍵)⌽   rotates the sequence by 1 if y is odd; that's the +=
>
> and the final % ensures that the result is 0-2.

I was hoping for a solution which does not include conditionals, someone 
led me to believe it is possible! :)

But thanks, your transformation really works. I've sent a patch 
implementing it to trybot for now.

Regards,

Tvrtko
Dave Gordon July 20, 2016, 5:08 p.m. UTC | #13
On 20/07/16 17:07, Tvrtko Ursulin wrote:
>
> On 20/07/16 13:50, Dave Gordon wrote:
>> On 20/07/16 10:54, Tvrtko Ursulin wrote:
>>>
>>> On 19/07/16 19:38, Dave Gordon wrote:
>>>> On 15/07/16 14:13, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 29/06/16 17:00, Chris Wilson wrote:
>>>>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 29/06/16 16:34, Chris Wilson wrote:
>>>>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>>
>>>>>>>>> Replace per-engine initialization with a common half-programatic,
>>>>>>>>> half-data driven code for ease of maintenance and compactness.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>>
>>>>>>>> This is the biggest pill to swallow (since our 5x5 table is only
>>>>>>>> sparsely populated), but it looks correct, and more importantly
>>>>>>>> easier to
>>>>>>>> read.
>>>>>>>
>>>>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>>>>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>>>>>>> to bits and registers respectively, and write it as a function.
>>>>>>
>>>>>> It's actually a very simple cyclic function based on register
>>>>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>>>>>>
>>>>>> (The only real challenge is picking the direction.)
>>>>>>
>>>>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
>>>>>> Author: Ben Widawsky <ben@bwidawsk.net>
>>>>>> Date:   Wed Sep 14 20:32:47 2011 -0700
>>>>>>
>>>>>>      drm/i915: Dumb down the semaphore logic
>>>>>>
>>>>>>      While I think the previous code is correct, it was hard to
>>>>>> follow
>>>>>> and
>>>>>>      hard to debug. Since we already have a ring abstraction,
>>>>>> might as
>>>>>> well
>>>>>>      use it to handle the semaphore updates and compares.
>>>>>
>>>>> Doesn't seem to fit, or I just can't figure it out. Needs two
>>>>> functions
>>>>> to get rid of the table:
>>>>>
>>>>> f1(0, 1) = 2
>>>>> f1(0, 2) = 0
>>>>> f1(0, 3) = 2
>>>>> f1(1, 0) = 0
>>>>> f1(1, 2) = 2
>>>>> f1(1, 3) = 1
>>>>> f1(2, 0) = 2
>>>>> f1(2, 1) = 0
>>>>> f1(2, 3) = 0
>>>>> f1(3, 0) = 1
>>>>> f1(3, 1) = 1
>>>>> f1(3, 2) = 1
>>>>>
>>>>> and:
>>>>>
>>>>> f2(0, 1) = 1
>>>>> f2(0, 2) = 0
>>>>> f2(0, 3) = 1
>>>>> f2(1, 0) = 0
>>>>> f2(1, 2) = 1
>>>>> f2(1, 3) = 2
>>>>> f2(2, 0) = 1
>>>>> f2(2, 1) = 0
>>>>> f2(2, 3) = 0
>>>>> f2(3, 0) = 2
>>>>> f2(3, 1) = 2
>>>>> f2(3, 2) = 2
>>>>>
>>>>> A weekend math puzzle for someone? :)
>>>>>
>>>>> Regards,
>>>>> Tvrtko
>>>>
>>>> Here's the APL expression for (the transpose of) f2, with -1's
>>>> filled in
>>>> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in
>>>> origin 0)
>>>>
>>>>        {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>>>
>>>> ┌────────┬────────┬────────┬────────┐
>>>> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│
>>>> └────────┴────────┴────────┴────────┘
>>>>
>>>> or transposed back so that the first argument is the row index and the
>>>> second is the column index:
>>>>
>>>>        ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>>>
>>>>  ¯1  1  0  1
>>>>   0 ¯1  1  2
>>>>   1  0 ¯1  0
>>>>   2  2  2 ¯1
>>>>
>>>> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run
>>>>
>>>>
>>>>
>>>
>>>   :-C ! How to convert that to C ? :)
>>>
>>>> f1 is trivially derived from this by the observation that f1 is just f2
>>>> with the 1's and 2's interchanged.
>>>
>>> Ah yes, nicely spotted.
>>>
>>> Regards,
>>> Tvrtko
>>
>> Assuming you don't care about the leading diagonal (x == y), then
>>
>>   (⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))
>>
>> translates into:
>>
>> int f2(unsigned int x, unsigned int y)
>> {
>>      x -= x >= y;
>>      if (y == 1)
>>          x = 3 - x;
>>      x += y & 1;
>>      return x % 3;
>> }
>>
>> y:x 0 1 2 3
>> 0:  0 0 1 2
>> 1:  1 1 0 2
>> 2:  0 1 1 2
>> 3:  1 2 0 0
>>
>> Each line of C corresponds quite closely to one operation in the APL :)
>> Although, in APL we tend to leave the data unchanged while shuffling it
>> around into new shapes, whereas the C below does the equivalent things
>> by changing the data (noting that it's all modulo-3 arithmetic).
>>
>>   (⍵≠⍳4)⍀  inserts the leading diagonal, corresponding to the subtraction
>>            of x >= y (which removes the leading diagonal).
>>
>>   ⌽⍣(1=⍵)  reverses the sequence if y==1; in C, that's the 3-x
>>
>>   (2|⍵)⌽   rotates the sequence by 1 if y is odd; that's the +=
>>
>> and the final % ensures that the result is 0-2.
>
> I was hoping for a solution which does not include conditionals, someone
> led me to believe it is possible! :)
>
> But thanks, your transformation really works. I've sent a patch
> implementing it to trybot for now.
>
> Regards,
> Tvrtko

You can write it like this if you don't want any visible conditionals :)

unsigned int f2(unsigned int x, unsigned int y)
{
	x -= x >= y;
	x += y & 1;
	x ^= y & x >> y;	/* WTF?	*/
	return x % 3;
}

But I think that's even more obscure.

.Dave.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 648ddee60c24..7bbc59eef267 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2921,6 +2921,54 @@  static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,
 	} else if (INTEL_GEN(dev_priv) >= 6) {
 		engine->semaphore.sync_to = gen6_ring_sync;
 		engine->semaphore.signal = gen6_signal;
+
+		/*
+		 * The current semaphore is only applied on pre-gen8
+		 * platform.  And there is no VCS2 ring on the pre-gen8
+		 * platform. So the semaphore between RCS and VCS2 is
+		 * initialized as INVALID.  Gen8 will initialize the
+		 * sema between VCS2 and RCS later.
+		 */
+		for (i = 0; i < I915_NUM_ENGINES; i++) {
+			static const struct {
+				u32 wait_mbox;
+				i915_reg_t mbox_reg;
+			} sem_data[I915_NUM_ENGINES][I915_NUM_ENGINES] = {
+				[RCS] = {
+					[VCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_RV,  .mbox_reg = GEN6_VRSYNC },
+					[BCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_RB,  .mbox_reg = GEN6_BRSYNC },
+					[VECS] = { .wait_mbox = MI_SEMAPHORE_SYNC_RVE, .mbox_reg = GEN6_VERSYNC },
+				},
+				[VCS] = {
+					[RCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_VR,  .mbox_reg = GEN6_RVSYNC },
+					[BCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_VB,  .mbox_reg = GEN6_BVSYNC },
+					[VECS] = { .wait_mbox = MI_SEMAPHORE_SYNC_VVE, .mbox_reg = GEN6_VEVSYNC },
+				},
+				[BCS] = {
+					[RCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_BR,  .mbox_reg = GEN6_RBSYNC },
+					[VCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_BV,  .mbox_reg = GEN6_VBSYNC },
+					[VECS] = { .wait_mbox = MI_SEMAPHORE_SYNC_BVE, .mbox_reg = GEN6_VEBSYNC },
+				},
+				[VECS] = {
+					[RCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_VER, .mbox_reg = GEN6_RVESYNC },
+					[VCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_VEV, .mbox_reg = GEN6_VVESYNC },
+					[BCS] =  { .wait_mbox = MI_SEMAPHORE_SYNC_VEB, .mbox_reg = GEN6_BVESYNC },
+				},
+			};
+			u32 wait_mbox;
+			i915_reg_t mbox_reg;
+
+			if (i == engine->id || i == VCS2) {
+				wait_mbox = MI_SEMAPHORE_SYNC_INVALID;
+				mbox_reg = GEN6_NOSYNC;
+			} else {
+				wait_mbox = sem_data[engine->id][i].wait_mbox;
+				mbox_reg = sem_data[engine->id][i].mbox_reg;
+			}
+
+			engine->semaphore.mbox.wait[i] = wait_mbox;
+			engine->semaphore.mbox.signal[i] = mbox_reg;
+		}
 	}
 }
 
@@ -2991,25 +3039,6 @@  int intel_init_render_ring_buffer(struct drm_device *dev)
 		if (IS_GEN6(dev_priv))
 			engine->flush = gen6_render_ring_flush;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
-		if (i915_semaphore_is_enabled(dev_priv)) {
-			/*
-			 * The current semaphore is only applied on pre-gen8
-			 * platform.  And there is no VCS2 ring on the pre-gen8
-			 * platform. So the semaphore between RCS and VCS2 is
-			 * initialized as INVALID.  Gen8 will initialize the
-			 * sema between VCS2 and RCS later.
-			 */
-			engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID;
-			engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_RV;
-			engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_RB;
-			engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_RVE;
-			engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
-			engine->semaphore.mbox.signal[RCS] = GEN6_NOSYNC;
-			engine->semaphore.mbox.signal[VCS] = GEN6_VRSYNC;
-			engine->semaphore.mbox.signal[BCS] = GEN6_BRSYNC;
-			engine->semaphore.mbox.signal[VECS] = GEN6_VERSYNC;
-			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
-		}
 	} else if (IS_GEN5(dev_priv)) {
 		engine->add_request = pc_render_add_request;
 		engine->flush = gen4_render_ring_flush;
@@ -3089,18 +3118,6 @@  int intel_init_bsd_ring_buffer(struct drm_device *dev)
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
 		} else {
 			engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
-			if (i915_semaphore_is_enabled(dev_priv)) {
-				engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VR;
-				engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID;
-				engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VB;
-				engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_VVE;
-				engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
-				engine->semaphore.mbox.signal[RCS] = GEN6_RVSYNC;
-				engine->semaphore.mbox.signal[VCS] = GEN6_NOSYNC;
-				engine->semaphore.mbox.signal[BCS] = GEN6_BVSYNC;
-				engine->semaphore.mbox.signal[VECS] = GEN6_VEVSYNC;
-				engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
-			}
 		}
 	} else {
 		engine->mmio_base = BSD_RING_BASE;
@@ -3157,25 +3174,6 @@  int intel_init_blt_ring_buffer(struct drm_device *dev)
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
 	} else {
 		engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
-		if (i915_semaphore_is_enabled(dev_priv)) {
-			/*
-			 * The current semaphore is only applied on pre-gen8
-			 * platform.  And there is no VCS2 ring on the pre-gen8
-			 * platform. So the semaphore between BCS and VCS2 is
-			 * initialized as INVALID.  Gen8 will initialize the
-			 * sema between BCS and VCS2 later.
-			 */
-			engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_BR;
-			engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_BV;
-			engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID;
-			engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_BVE;
-			engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
-			engine->semaphore.mbox.signal[RCS] = GEN6_RBSYNC;
-			engine->semaphore.mbox.signal[VCS] = GEN6_VBSYNC;
-			engine->semaphore.mbox.signal[BCS] = GEN6_NOSYNC;
-			engine->semaphore.mbox.signal[VECS] = GEN6_VEBSYNC;
-			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
-		}
 	}
 
 	return intel_init_ring_buffer(dev, engine);
@@ -3203,18 +3201,6 @@  int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
 		engine->irq_get = hsw_vebox_get_irq;
 		engine->irq_put = hsw_vebox_put_irq;
-		if (i915_semaphore_is_enabled(dev_priv)) {
-			engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VER;
-			engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_VEV;
-			engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VEB;
-			engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID;
-			engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID;
-			engine->semaphore.mbox.signal[RCS] = GEN6_RVESYNC;
-			engine->semaphore.mbox.signal[VCS] = GEN6_VVESYNC;
-			engine->semaphore.mbox.signal[BCS] = GEN6_BVESYNC;
-			engine->semaphore.mbox.signal[VECS] = GEN6_NOSYNC;
-			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
-		}
 	}
 
 	return intel_init_ring_buffer(dev, engine);