Message ID | 1467212972-861-12-git-send-email-tvrtko.ursulin@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > Replace per-engine initialization with a common half-programatic, > half-data driven code for ease of maintenance and compactness. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> This is the biggest pill to swallow (since our 5x5 table is only sparsely populated), but it looks correct, and more importantly easier to read. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
On 29/06/16 16:34, Chris Wilson wrote: > On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> >> Replace per-engine initialization with a common half-programatic, >> half-data driven code for ease of maintenance and compactness. >> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > This is the biggest pill to swallow (since our 5x5 table is only > sparsely populated), but it looks correct, and more importantly easier to > read. Yeah I was out of ideas on how to improve it. Fresh mind needed to try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map to bits and registers respectively, and write it as a function. > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Thanks! Regards, Tvrtko
On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: > > On 29/06/16 16:34, Chris Wilson wrote: > >On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> > >>Replace per-engine initialization with a common half-programatic, > >>half-data driven code for ease of maintenance and compactness. > >> > >>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > > >This is the biggest pill to swallow (since our 5x5 table is only > >sparsely populated), but it looks correct, and more importantly easier to > >read. > > Yeah I was out of ideas on how to improve it. Fresh mind needed to > try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map > to bits and registers respectively, and write it as a function. It's actually a very simple cyclic function based on register offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. (The only real challenge is picking the direction.) commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 Author: Ben Widawsky <ben@bwidawsk.net> Date: Wed Sep 14 20:32:47 2011 -0700 drm/i915: Dumb down the semaphore logic While I think the previous code is correct, it was hard to follow and hard to debug. Since we already have a ring abstraction, might as well use it to handle the semaphore updates and compares. -Chris
On 29/06/16 17:00, Chris Wilson wrote: > On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >> >> On 29/06/16 16:34, Chris Wilson wrote: >>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> Replace per-engine initialization with a common half-programatic, >>>> half-data driven code for ease of maintenance and compactness. >>>> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> This is the biggest pill to swallow (since our 5x5 table is only >>> sparsely populated), but it looks correct, and more importantly easier to >>> read. >> >> Yeah I was out of ideas on how to improve it. Fresh mind needed to >> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >> to bits and registers respectively, and write it as a function. > > It's actually a very simple cyclic function based on register > offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. > > (The only real challenge is picking the direction.) > > commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 > Author: Ben Widawsky <ben@bwidawsk.net> > Date: Wed Sep 14 20:32:47 2011 -0700 > > drm/i915: Dumb down the semaphore logic > > While I think the previous code is correct, it was hard to follow and > hard to debug. Since we already have a ring abstraction, might as well > use it to handle the semaphore updates and compares. Should I try to go back to that then? Since I am not too happy with the sparse table... This has passed CI so we could merge some of it if that would help your series, or wait until I rework this patch. Regards, Tvrtko
On Wed, Jun 29, 2016 at 05:14:11PM +0100, Tvrtko Ursulin wrote: > > On 29/06/16 17:00, Chris Wilson wrote: > >On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: > >> > >>On 29/06/16 16:34, Chris Wilson wrote: > >>>On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: > >>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >>>> > >>>>Replace per-engine initialization with a common half-programatic, > >>>>half-data driven code for ease of maintenance and compactness. > >>>> > >>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >>> > >>>This is the biggest pill to swallow (since our 5x5 table is only > >>>sparsely populated), but it looks correct, and more importantly easier to > >>>read. > >> > >>Yeah I was out of ideas on how to improve it. Fresh mind needed to > >>try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map > >>to bits and registers respectively, and write it as a function. > > > >It's actually a very simple cyclic function based on register > >offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. > > > >(The only real challenge is picking the direction.) > > > >commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 > >Author: Ben Widawsky <ben@bwidawsk.net> > >Date: Wed Sep 14 20:32:47 2011 -0700 > > > > drm/i915: Dumb down the semaphore logic > > > > While I think the previous code is correct, it was hard to follow and > > hard to debug. Since we already have a ring abstraction, might as well > > use it to handle the semaphore updates and compares. > > Should I try to go back to that then? Since I am not too happy with > the sparse table... > > This has passed CI so we could merge some of it if that would help > your series, or wait until I rework this patch. The rule of thumb is incremental improvements tell a better story and should be easier to find a misstep. (My personal experience says the longer I play with a patch the larger it gets...) In short, you've already consolidated a lot of duplication in the vfuncs that will make my life easier (after some rebasing joy). Anything more is just icing on the cake. :) -Chris
On 29/06/16 17:24, Chris Wilson wrote: > On Wed, Jun 29, 2016 at 05:14:11PM +0100, Tvrtko Ursulin wrote: >> >> On 29/06/16 17:00, Chris Wilson wrote: >>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >>>> >>>> On 29/06/16 16:34, Chris Wilson wrote: >>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>> >>>>>> Replace per-engine initialization with a common half-programatic, >>>>>> half-data driven code for ease of maintenance and compactness. >>>>>> >>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> >>>>> This is the biggest pill to swallow (since our 5x5 table is only >>>>> sparsely populated), but it looks correct, and more importantly easier to >>>>> read. >>>> >>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to >>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >>>> to bits and registers respectively, and write it as a function. >>> >>> It's actually a very simple cyclic function based on register >>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. >>> >>> (The only real challenge is picking the direction.) >>> >>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 >>> Author: Ben Widawsky <ben@bwidawsk.net> >>> Date: Wed Sep 14 20:32:47 2011 -0700 >>> >>> drm/i915: Dumb down the semaphore logic >>> >>> While I think the previous code is correct, it was hard to follow and >>> hard to debug. Since we already have a ring abstraction, might as well >>> use it to handle the semaphore updates and compares. >> >> Should I try to go back to that then? Since I am not too happy with >> the sparse table... >> >> This has passed CI so we could merge some of it if that would help >> your series, or wait until I rework this patch. > > The rule of thumb is incremental improvements tell a better story and > should be easier to find a misstep. (My personal experience says the > longer I play with a patch the larger it gets...) > > In short, you've already consolidated a lot of duplication in the vfuncs > that will make my life easier (after some rebasing joy). Anything more > is just icing on the cake. :) It also looks like I have broke something, wonder how CI did not catch it or I am imagining things. "drm/i915: Consolidate dispatch_execbuffer vfunc" looks wrong wrt add_request and dispatch_execbuffer for gen8+. Leaving it for tomorrow. Regards, Tvrtko
On Wed, Jun 29, 2016 at 05:34:27PM +0100, Tvrtko Ursulin wrote: > It also looks like I have broke something, wonder how CI did not > catch it or I am imagining things. "drm/i915: Consolidate > dispatch_execbuffer vfunc" looks wrong wrt add_request and > dispatch_execbuffer for gen8+. Leaving it for tomorrow. Ah, should also include i915.enable_execlists=0 when sending to trybot. -Chris
On 29/06/16 17:00, Chris Wilson wrote: > On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >> >> On 29/06/16 16:34, Chris Wilson wrote: >>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> Replace per-engine initialization with a common half-programatic, >>>> half-data driven code for ease of maintenance and compactness. >>>> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> >>> This is the biggest pill to swallow (since our 5x5 table is only >>> sparsely populated), but it looks correct, and more importantly easier to >>> read. >> >> Yeah I was out of ideas on how to improve it. Fresh mind needed to >> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >> to bits and registers respectively, and write it as a function. > > It's actually a very simple cyclic function based on register > offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. > > (The only real challenge is picking the direction.) > > commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 > Author: Ben Widawsky <ben@bwidawsk.net> > Date: Wed Sep 14 20:32:47 2011 -0700 > > drm/i915: Dumb down the semaphore logic > > While I think the previous code is correct, it was hard to follow and > hard to debug. Since we already have a ring abstraction, might as well > use it to handle the semaphore updates and compares. Doesn't seem to fit, or I just can't figure it out. Needs two functions to get rid of the table: f1(0, 1) = 2 f1(0, 2) = 0 f1(0, 3) = 2 f1(1, 0) = 0 f1(1, 2) = 2 f1(1, 3) = 1 f1(2, 0) = 2 f1(2, 1) = 0 f1(2, 3) = 0 f1(3, 0) = 1 f1(3, 1) = 1 f1(3, 2) = 1 and: f2(0, 1) = 1 f2(0, 2) = 0 f2(0, 3) = 1 f2(1, 0) = 0 f2(1, 2) = 1 f2(1, 3) = 2 f2(2, 0) = 1 f2(2, 1) = 0 f2(2, 3) = 0 f2(3, 0) = 2 f2(3, 1) = 2 f2(3, 2) = 2 A weekend math puzzle for someone? :) Regards, Tvrtko
On 15/07/16 14:13, Tvrtko Ursulin wrote: > > On 29/06/16 17:00, Chris Wilson wrote: >> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >>> >>> On 29/06/16 16:34, Chris Wilson wrote: >>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> >>>>> Replace per-engine initialization with a common half-programatic, >>>>> half-data driven code for ease of maintenance and compactness. >>>>> >>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>> >>>> This is the biggest pill to swallow (since our 5x5 table is only >>>> sparsely populated), but it looks correct, and more importantly >>>> easier to >>>> read. >>> >>> Yeah I was out of ideas on how to improve it. Fresh mind needed to >>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >>> to bits and registers respectively, and write it as a function. >> >> It's actually a very simple cyclic function based on register >> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. >> >> (The only real challenge is picking the direction.) >> >> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 >> Author: Ben Widawsky <ben@bwidawsk.net> >> Date: Wed Sep 14 20:32:47 2011 -0700 >> >> drm/i915: Dumb down the semaphore logic >> >> While I think the previous code is correct, it was hard to follow >> and >> hard to debug. Since we already have a ring abstraction, might as >> well >> use it to handle the semaphore updates and compares. > > Doesn't seem to fit, or I just can't figure it out. Needs two functions > to get rid of the table: > > f1(0, 1) = 2 > f1(0, 2) = 0 > f1(0, 3) = 2 > f1(1, 0) = 0 > f1(1, 2) = 2 > f1(1, 3) = 1 > f1(2, 0) = 2 > f1(2, 1) = 0 > f1(2, 3) = 0 > f1(3, 0) = 1 > f1(3, 1) = 1 > f1(3, 2) = 1 > > and: > > f2(0, 1) = 1 > f2(0, 2) = 0 > f2(0, 3) = 1 > f2(1, 0) = 0 > f2(1, 2) = 1 > f2(1, 3) = 2 > f2(2, 0) = 1 > f2(2, 1) = 0 > f2(2, 3) = 0 > f2(3, 0) = 2 > f2(3, 1) = 2 > f2(3, 2) = 2 > > A weekend math puzzle for someone? :) > > Regards, > Tvrtko Here's the APL expression for (the transpose of) f2, with -1's filled in along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in origin 0) {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 ┌────────┬────────┬────────┬────────┐ │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│ └────────┴────────┴────────┴────────┘ or transposed back so that the first argument is the row index and the second is the column index: ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 ¯1 1 0 1 0 ¯1 1 2 1 0 ¯1 0 2 2 2 ¯1 http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run f1 is trivially derived from this by the observation that f1 is just f2 with the 1's and 2's interchanged. .Dave.
On 19/07/16 19:38, Dave Gordon wrote: > On 15/07/16 14:13, Tvrtko Ursulin wrote: >> >> On 29/06/16 17:00, Chris Wilson wrote: >>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >>>> >>>> On 29/06/16 16:34, Chris Wilson wrote: >>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>> >>>>>> Replace per-engine initialization with a common half-programatic, >>>>>> half-data driven code for ease of maintenance and compactness. >>>>>> >>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> >>>>> This is the biggest pill to swallow (since our 5x5 table is only >>>>> sparsely populated), but it looks correct, and more importantly >>>>> easier to >>>>> read. >>>> >>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to >>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >>>> to bits and registers respectively, and write it as a function. >>> >>> It's actually a very simple cyclic function based on register >>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. >>> >>> (The only real challenge is picking the direction.) >>> >>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 >>> Author: Ben Widawsky <ben@bwidawsk.net> >>> Date: Wed Sep 14 20:32:47 2011 -0700 >>> >>> drm/i915: Dumb down the semaphore logic >>> >>> While I think the previous code is correct, it was hard to follow >>> and >>> hard to debug. Since we already have a ring abstraction, might as >>> well >>> use it to handle the semaphore updates and compares. >> >> Doesn't seem to fit, or I just can't figure it out. Needs two functions >> to get rid of the table: >> >> f1(0, 1) = 2 >> f1(0, 2) = 0 >> f1(0, 3) = 2 >> f1(1, 0) = 0 >> f1(1, 2) = 2 >> f1(1, 3) = 1 >> f1(2, 0) = 2 >> f1(2, 1) = 0 >> f1(2, 3) = 0 >> f1(3, 0) = 1 >> f1(3, 1) = 1 >> f1(3, 2) = 1 >> >> and: >> >> f2(0, 1) = 1 >> f2(0, 2) = 0 >> f2(0, 3) = 1 >> f2(1, 0) = 0 >> f2(1, 2) = 1 >> f2(1, 3) = 2 >> f2(2, 0) = 1 >> f2(2, 1) = 0 >> f2(2, 3) = 0 >> f2(3, 0) = 2 >> f2(3, 1) = 2 >> f2(3, 2) = 2 >> >> A weekend math puzzle for someone? :) >> >> Regards, >> Tvrtko > > Here's the APL expression for (the transpose of) f2, with -1's filled in > along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in > origin 0) > > {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 > > ┌────────┬────────┬────────┬────────┐ > │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│ > └────────┴────────┴────────┴────────┘ > > or transposed back so that the first argument is the row index and the > second is the column index: > > ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 > > ¯1 1 0 1 > 0 ¯1 1 2 > 1 0 ¯1 0 > 2 2 2 ¯1 > > http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run :-C ! How to convert that to C ? :) > f1 is trivially derived from this by the observation that f1 is just f2 > with the 1's and 2's interchanged. Ah yes, nicely spotted. Regards, Tvrtko
On 20/07/16 10:54, Tvrtko Ursulin wrote: > > On 19/07/16 19:38, Dave Gordon wrote: >> On 15/07/16 14:13, Tvrtko Ursulin wrote: >>> >>> On 29/06/16 17:00, Chris Wilson wrote: >>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >>>>> >>>>> On 29/06/16 16:34, Chris Wilson wrote: >>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>> >>>>>>> Replace per-engine initialization with a common half-programatic, >>>>>>> half-data driven code for ease of maintenance and compactness. >>>>>>> >>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>> >>>>>> This is the biggest pill to swallow (since our 5x5 table is only >>>>>> sparsely populated), but it looks correct, and more importantly >>>>>> easier to >>>>>> read. >>>>> >>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to >>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >>>>> to bits and registers respectively, and write it as a function. >>>> >>>> It's actually a very simple cyclic function based on register >>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. >>>> >>>> (The only real challenge is picking the direction.) >>>> >>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 >>>> Author: Ben Widawsky <ben@bwidawsk.net> >>>> Date: Wed Sep 14 20:32:47 2011 -0700 >>>> >>>> drm/i915: Dumb down the semaphore logic >>>> >>>> While I think the previous code is correct, it was hard to follow >>>> and >>>> hard to debug. Since we already have a ring abstraction, might as >>>> well >>>> use it to handle the semaphore updates and compares. >>> >>> Doesn't seem to fit, or I just can't figure it out. Needs two functions >>> to get rid of the table: >>> >>> f1(0, 1) = 2 >>> f1(0, 2) = 0 >>> f1(0, 3) = 2 >>> f1(1, 0) = 0 >>> f1(1, 2) = 2 >>> f1(1, 3) = 1 >>> f1(2, 0) = 2 >>> f1(2, 1) = 0 >>> f1(2, 3) = 0 >>> f1(3, 0) = 1 >>> f1(3, 1) = 1 >>> f1(3, 2) = 1 >>> >>> and: >>> >>> f2(0, 1) = 1 >>> f2(0, 2) = 0 >>> f2(0, 3) = 1 >>> f2(1, 0) = 0 >>> f2(1, 2) = 1 >>> f2(1, 3) = 2 >>> f2(2, 0) = 1 >>> f2(2, 1) = 0 >>> f2(2, 3) = 0 >>> f2(3, 0) = 2 >>> f2(3, 1) = 2 >>> f2(3, 2) = 2 >>> >>> A weekend math puzzle for someone? :) >>> >>> Regards, >>> Tvrtko >> >> Here's the APL expression for (the transpose of) f2, with -1's filled in >> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in >> origin 0) >> >> {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 >> >> ┌────────┬────────┬────────┬────────┐ >> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│ >> └────────┴────────┴────────┴────────┘ >> >> or transposed back so that the first argument is the row index and the >> second is the column index: >> >> ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 >> >> ¯1 1 0 1 >> 0 ¯1 1 2 >> 1 0 ¯1 0 >> 2 2 2 ¯1 >> >> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run >> > > :-C ! How to convert that to C ? :) > >> f1 is trivially derived from this by the observation that f1 is just f2 >> with the 1's and 2's interchanged. > > Ah yes, nicely spotted. > > Regards, > Tvrtko Assuming you don't care about the leading diagonal (x == y), then (⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵)) translates into: int f2(unsigned int x, unsigned int y) { x -= x >= y; if (y == 1) x = 3 - x; x += y & 1; return x % 3; } y:x 0 1 2 3 0: 0 0 1 2 1: 1 1 0 2 2: 0 1 1 2 3: 1 2 0 0 Each line of C corresponds quite closely to one operation in the APL :) Although, in APL we tend to leave the data unchanged while shuffling it around into new shapes, whereas the C below does the equivalent things by changing the data (noting that it's all modulo-3 arithmetic). (⍵≠⍳4)⍀ inserts the leading diagonal, corresponding to the subtraction of x >= y (which removes the leading diagonal). ⌽⍣(1=⍵) reverses the sequence if y==1; in C, that's the 3-x (2|⍵)⌽ rotates the sequence by 1 if y is odd; that's the += and the final % ensures that the result is 0-2. .Dave.
On 20/07/16 13:50, Dave Gordon wrote: > On 20/07/16 10:54, Tvrtko Ursulin wrote: >> >> On 19/07/16 19:38, Dave Gordon wrote: >>> On 15/07/16 14:13, Tvrtko Ursulin wrote: >>>> >>>> On 29/06/16 17:00, Chris Wilson wrote: >>>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >>>>>> >>>>>> On 29/06/16 16:34, Chris Wilson wrote: >>>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>>> >>>>>>>> Replace per-engine initialization with a common half-programatic, >>>>>>>> half-data driven code for ease of maintenance and compactness. >>>>>>>> >>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>> >>>>>>> This is the biggest pill to swallow (since our 5x5 table is only >>>>>>> sparsely populated), but it looks correct, and more importantly >>>>>>> easier to >>>>>>> read. >>>>>> >>>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to >>>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >>>>>> to bits and registers respectively, and write it as a function. >>>>> >>>>> It's actually a very simple cyclic function based on register >>>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. >>>>> >>>>> (The only real challenge is picking the direction.) >>>>> >>>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 >>>>> Author: Ben Widawsky <ben@bwidawsk.net> >>>>> Date: Wed Sep 14 20:32:47 2011 -0700 >>>>> >>>>> drm/i915: Dumb down the semaphore logic >>>>> >>>>> While I think the previous code is correct, it was hard to follow >>>>> and >>>>> hard to debug. Since we already have a ring abstraction, might as >>>>> well >>>>> use it to handle the semaphore updates and compares. >>>> >>>> Doesn't seem to fit, or I just can't figure it out. Needs two functions >>>> to get rid of the table: >>>> >>>> f1(0, 1) = 2 >>>> f1(0, 2) = 0 >>>> f1(0, 3) = 2 >>>> f1(1, 0) = 0 >>>> f1(1, 2) = 2 >>>> f1(1, 3) = 1 >>>> f1(2, 0) = 2 >>>> f1(2, 1) = 0 >>>> f1(2, 3) = 0 >>>> f1(3, 0) = 1 >>>> f1(3, 1) = 1 >>>> f1(3, 2) = 1 >>>> >>>> and: >>>> >>>> f2(0, 1) = 1 >>>> f2(0, 2) = 0 >>>> f2(0, 3) = 1 >>>> f2(1, 0) = 0 >>>> f2(1, 2) = 1 >>>> f2(1, 3) = 2 >>>> f2(2, 0) = 1 >>>> f2(2, 1) = 0 >>>> f2(2, 3) = 0 >>>> f2(3, 0) = 2 >>>> f2(3, 1) = 2 >>>> f2(3, 2) = 2 >>>> >>>> A weekend math puzzle for someone? :) >>>> >>>> Regards, >>>> Tvrtko >>> >>> Here's the APL expression for (the transpose of) f2, with -1's filled in >>> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in >>> origin 0) >>> >>> {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 >>> >>> ┌────────┬────────┬────────┬────────┐ >>> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│ >>> └────────┴────────┴────────┴────────┘ >>> >>> or transposed back so that the first argument is the row index and the >>> second is the column index: >>> >>> ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 >>> >>> ¯1 1 0 1 >>> 0 ¯1 1 2 >>> 1 0 ¯1 0 >>> 2 2 2 ¯1 >>> >>> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run >>> >>> >> >> :-C ! How to convert that to C ? :) >> >>> f1 is trivially derived from this by the observation that f1 is just f2 >>> with the 1's and 2's interchanged. >> >> Ah yes, nicely spotted. >> >> Regards, >> Tvrtko > > Assuming you don't care about the leading diagonal (x == y), then > > (⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵)) > > translates into: > > int f2(unsigned int x, unsigned int y) > { > x -= x >= y; > if (y == 1) > x = 3 - x; > x += y & 1; > return x % 3; > } > > y:x 0 1 2 3 > 0: 0 0 1 2 > 1: 1 1 0 2 > 2: 0 1 1 2 > 3: 1 2 0 0 > > Each line of C corresponds quite closely to one operation in the APL :) > Although, in APL we tend to leave the data unchanged while shuffling it > around into new shapes, whereas the C below does the equivalent things > by changing the data (noting that it's all modulo-3 arithmetic). > > (⍵≠⍳4)⍀ inserts the leading diagonal, corresponding to the subtraction > of x >= y (which removes the leading diagonal). > > ⌽⍣(1=⍵) reverses the sequence if y==1; in C, that's the 3-x > > (2|⍵)⌽ rotates the sequence by 1 if y is odd; that's the += > > and the final % ensures that the result is 0-2. I was hoping for a solution which does not include conditionals, someone led me to believe it is possible! :) But thanks, your transformation really works. I've sent a patch implementing it to trybot for now. Regards, Tvrtko
On 20/07/16 17:07, Tvrtko Ursulin wrote: > > On 20/07/16 13:50, Dave Gordon wrote: >> On 20/07/16 10:54, Tvrtko Ursulin wrote: >>> >>> On 19/07/16 19:38, Dave Gordon wrote: >>>> On 15/07/16 14:13, Tvrtko Ursulin wrote: >>>>> >>>>> On 29/06/16 17:00, Chris Wilson wrote: >>>>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote: >>>>>>> >>>>>>> On 29/06/16 16:34, Chris Wilson wrote: >>>>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote: >>>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>>>> >>>>>>>>> Replace per-engine initialization with a common half-programatic, >>>>>>>>> half-data driven code for ease of maintenance and compactness. >>>>>>>>> >>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>>>>> >>>>>>>> This is the biggest pill to swallow (since our 5x5 table is only >>>>>>>> sparsely populated), but it looks correct, and more importantly >>>>>>>> easier to >>>>>>>> read. >>>>>>> >>>>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to >>>>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map >>>>>>> to bits and registers respectively, and write it as a function. >>>>>> >>>>>> It's actually a very simple cyclic function based on register >>>>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings. >>>>>> >>>>>> (The only real challenge is picking the direction.) >>>>>> >>>>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 >>>>>> Author: Ben Widawsky <ben@bwidawsk.net> >>>>>> Date: Wed Sep 14 20:32:47 2011 -0700 >>>>>> >>>>>> drm/i915: Dumb down the semaphore logic >>>>>> >>>>>> While I think the previous code is correct, it was hard to >>>>>> follow >>>>>> and >>>>>> hard to debug. Since we already have a ring abstraction, >>>>>> might as >>>>>> well >>>>>> use it to handle the semaphore updates and compares. >>>>> >>>>> Doesn't seem to fit, or I just can't figure it out. Needs two >>>>> functions >>>>> to get rid of the table: >>>>> >>>>> f1(0, 1) = 2 >>>>> f1(0, 2) = 0 >>>>> f1(0, 3) = 2 >>>>> f1(1, 0) = 0 >>>>> f1(1, 2) = 2 >>>>> f1(1, 3) = 1 >>>>> f1(2, 0) = 2 >>>>> f1(2, 1) = 0 >>>>> f1(2, 3) = 0 >>>>> f1(3, 0) = 1 >>>>> f1(3, 1) = 1 >>>>> f1(3, 2) = 1 >>>>> >>>>> and: >>>>> >>>>> f2(0, 1) = 1 >>>>> f2(0, 2) = 0 >>>>> f2(0, 3) = 1 >>>>> f2(1, 0) = 0 >>>>> f2(1, 2) = 1 >>>>> f2(1, 3) = 2 >>>>> f2(2, 0) = 1 >>>>> f2(2, 1) = 0 >>>>> f2(2, 3) = 0 >>>>> f2(3, 0) = 2 >>>>> f2(3, 1) = 2 >>>>> f2(3, 2) = 2 >>>>> >>>>> A weekend math puzzle for someone? :) >>>>> >>>>> Regards, >>>>> Tvrtko >>>> >>>> Here's the APL expression for (the transpose of) f2, with -1's >>>> filled in >>>> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in >>>> origin 0) >>>> >>>> {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 >>>> >>>> ┌────────┬────────┬────────┬────────┐ >>>> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│ >>>> └────────┴────────┴────────┴────────┘ >>>> >>>> or transposed back so that the first argument is the row index and the >>>> second is the column index: >>>> >>>> ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4 >>>> >>>> ¯1 1 0 1 >>>> 0 ¯1 1 2 >>>> 1 0 ¯1 0 >>>> 2 2 2 ¯1 >>>> >>>> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run >>>> >>>> >>>> >>> >>> :-C ! How to convert that to C ? :) >>> >>>> f1 is trivially derived from this by the observation that f1 is just f2 >>>> with the 1's and 2's interchanged. >>> >>> Ah yes, nicely spotted. >>> >>> Regards, >>> Tvrtko >> >> Assuming you don't care about the leading diagonal (x == y), then >> >> (⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵)) >> >> translates into: >> >> int f2(unsigned int x, unsigned int y) >> { >> x -= x >= y; >> if (y == 1) >> x = 3 - x; >> x += y & 1; >> return x % 3; >> } >> >> y:x 0 1 2 3 >> 0: 0 0 1 2 >> 1: 1 1 0 2 >> 2: 0 1 1 2 >> 3: 1 2 0 0 >> >> Each line of C corresponds quite closely to one operation in the APL :) >> Although, in APL we tend to leave the data unchanged while shuffling it >> around into new shapes, whereas the C below does the equivalent things >> by changing the data (noting that it's all modulo-3 arithmetic). >> >> (⍵≠⍳4)⍀ inserts the leading diagonal, corresponding to the subtraction >> of x >= y (which removes the leading diagonal). >> >> ⌽⍣(1=⍵) reverses the sequence if y==1; in C, that's the 3-x >> >> (2|⍵)⌽ rotates the sequence by 1 if y is odd; that's the += >> >> and the final % ensures that the result is 0-2. > > I was hoping for a solution which does not include conditionals, someone > led me to believe it is possible! :) > > But thanks, your transformation really works. I've sent a patch > implementing it to trybot for now. > > Regards, > Tvrtko You can write it like this if you don't want any visible conditionals :) unsigned int f2(unsigned int x, unsigned int y) { x -= x >= y; x += y & 1; x ^= y & x >> y; /* WTF? */ return x % 3; } But I think that's even more obscure. .Dave.
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 648ddee60c24..7bbc59eef267 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2921,6 +2921,54 @@ static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv, } else if (INTEL_GEN(dev_priv) >= 6) { engine->semaphore.sync_to = gen6_ring_sync; engine->semaphore.signal = gen6_signal; + + /* + * The current semaphore is only applied on pre-gen8 + * platform. And there is no VCS2 ring on the pre-gen8 + * platform. So the semaphore between RCS and VCS2 is + * initialized as INVALID. Gen8 will initialize the + * sema between VCS2 and RCS later. + */ + for (i = 0; i < I915_NUM_ENGINES; i++) { + static const struct { + u32 wait_mbox; + i915_reg_t mbox_reg; + } sem_data[I915_NUM_ENGINES][I915_NUM_ENGINES] = { + [RCS] = { + [VCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_RV, .mbox_reg = GEN6_VRSYNC }, + [BCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_RB, .mbox_reg = GEN6_BRSYNC }, + [VECS] = { .wait_mbox = MI_SEMAPHORE_SYNC_RVE, .mbox_reg = GEN6_VERSYNC }, + }, + [VCS] = { + [RCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_VR, .mbox_reg = GEN6_RVSYNC }, + [BCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_VB, .mbox_reg = GEN6_BVSYNC }, + [VECS] = { .wait_mbox = MI_SEMAPHORE_SYNC_VVE, .mbox_reg = GEN6_VEVSYNC }, + }, + [BCS] = { + [RCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_BR, .mbox_reg = GEN6_RBSYNC }, + [VCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_BV, .mbox_reg = GEN6_VBSYNC }, + [VECS] = { .wait_mbox = MI_SEMAPHORE_SYNC_BVE, .mbox_reg = GEN6_VEBSYNC }, + }, + [VECS] = { + [RCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_VER, .mbox_reg = GEN6_RVESYNC }, + [VCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_VEV, .mbox_reg = GEN6_VVESYNC }, + [BCS] = { .wait_mbox = MI_SEMAPHORE_SYNC_VEB, .mbox_reg = GEN6_BVESYNC }, + }, + }; + u32 wait_mbox; + i915_reg_t mbox_reg; + + if (i == engine->id || i == VCS2) { + wait_mbox = MI_SEMAPHORE_SYNC_INVALID; + mbox_reg = GEN6_NOSYNC; + } else { + wait_mbox = sem_data[engine->id][i].wait_mbox; + mbox_reg = sem_data[engine->id][i].mbox_reg; + } + + engine->semaphore.mbox.wait[i] = wait_mbox; + engine->semaphore.mbox.signal[i] = mbox_reg; + } } } @@ -2991,25 +3039,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev) if (IS_GEN6(dev_priv)) engine->flush = gen6_render_ring_flush; engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT; - if (i915_semaphore_is_enabled(dev_priv)) { - /* - * The current semaphore is only applied on pre-gen8 - * platform. And there is no VCS2 ring on the pre-gen8 - * platform. So the semaphore between RCS and VCS2 is - * initialized as INVALID. Gen8 will initialize the - * sema between VCS2 and RCS later. - */ - engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_RV; - engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_RB; - engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_RVE; - engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.signal[RCS] = GEN6_NOSYNC; - engine->semaphore.mbox.signal[VCS] = GEN6_VRSYNC; - engine->semaphore.mbox.signal[BCS] = GEN6_BRSYNC; - engine->semaphore.mbox.signal[VECS] = GEN6_VERSYNC; - engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC; - } } else if (IS_GEN5(dev_priv)) { engine->add_request = pc_render_add_request; engine->flush = gen4_render_ring_flush; @@ -3089,18 +3118,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev) GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT; } else { engine->irq_enable_mask = GT_BSD_USER_INTERRUPT; - if (i915_semaphore_is_enabled(dev_priv)) { - engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VR; - engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VB; - engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_VVE; - engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.signal[RCS] = GEN6_RVSYNC; - engine->semaphore.mbox.signal[VCS] = GEN6_NOSYNC; - engine->semaphore.mbox.signal[BCS] = GEN6_BVSYNC; - engine->semaphore.mbox.signal[VECS] = GEN6_VEVSYNC; - engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC; - } } } else { engine->mmio_base = BSD_RING_BASE; @@ -3157,25 +3174,6 @@ int intel_init_blt_ring_buffer(struct drm_device *dev) GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT; } else { engine->irq_enable_mask = GT_BLT_USER_INTERRUPT; - if (i915_semaphore_is_enabled(dev_priv)) { - /* - * The current semaphore is only applied on pre-gen8 - * platform. And there is no VCS2 ring on the pre-gen8 - * platform. So the semaphore between BCS and VCS2 is - * initialized as INVALID. Gen8 will initialize the - * sema between BCS and VCS2 later. - */ - engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_BR; - engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_BV; - engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_BVE; - engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.signal[RCS] = GEN6_RBSYNC; - engine->semaphore.mbox.signal[VCS] = GEN6_VBSYNC; - engine->semaphore.mbox.signal[BCS] = GEN6_NOSYNC; - engine->semaphore.mbox.signal[VECS] = GEN6_VEBSYNC; - engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC; - } } return intel_init_ring_buffer(dev, engine); @@ -3203,18 +3201,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev) engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT; engine->irq_get = hsw_vebox_get_irq; engine->irq_put = hsw_vebox_put_irq; - if (i915_semaphore_is_enabled(dev_priv)) { - engine->semaphore.mbox.wait[RCS] = MI_SEMAPHORE_SYNC_VER; - engine->semaphore.mbox.wait[VCS] = MI_SEMAPHORE_SYNC_VEV; - engine->semaphore.mbox.wait[BCS] = MI_SEMAPHORE_SYNC_VEB; - engine->semaphore.mbox.wait[VECS] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.wait[VCS2] = MI_SEMAPHORE_SYNC_INVALID; - engine->semaphore.mbox.signal[RCS] = GEN6_RVESYNC; - engine->semaphore.mbox.signal[VCS] = GEN6_VVESYNC; - engine->semaphore.mbox.signal[BCS] = GEN6_BVESYNC; - engine->semaphore.mbox.signal[VECS] = GEN6_NOSYNC; - engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC; - } } return intel_init_ring_buffer(dev, engine);