[25/55] drm/i915: Update i915_gem_object_sync() to take a request structure
diff mbox

Message ID 1434629696-10327-1-git-send-email-John.C.Harrison@Intel.com
State New
Headers show

Commit Message

John Harrison June 18, 2015, 12:14 p.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the i915_gem_object_sync()
code path.

v2: Much more complex patch to share a single request between the sync and the
page flip. The _sync() function now supports lazy allocation of the request
structure. That is, if one is passed in then that will be used. If one is not,
then a request will be allocated and passed back out. Note that the _sync() code
does not necessarily require a request. Thus one will only be created until
certain situations. The reason the lazy allocation must be done within the
_sync() code itself is because the decision to need one or not is not really
something that code above can second guess (except in the case where one is
definitely not required because no ring is passed in).

The call chains above _sync() now support passing a request through which most
callers passing in NULL and assuming that no request will be required (because
they also pass in NULL for the ring and therefore can't be generating any ring
code).

The exeception is intel_crtc_page_flip() which now supports having a request
returned from _sync(). If one is, then that request is shared by the page flip
(if the page flip is of a type to need a request). If _sync() does not generate
a request but the page flip does need one, then the page flip path will create
its own request.

v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
Elf review request). Rebased onto newer tree that significantly changed the
synchronisation code.

v4: Updated comments from review feedback (Tomas Elf)

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
 drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
 drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
 drivers/gpu/drm/i915/intel_drv.h           |    3 +-
 drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
 drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
 8 files changed, 57 insertions(+), 23 deletions(-)

Comments

Chris Wilson June 18, 2015, 12:21 p.m. UTC | #1
On Thu, Jun 18, 2015 at 01:14:56PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The plan is to pass requests around as the basic submission tracking structure
> rather than rings and contexts. This patch updates the i915_gem_object_sync()
> code path.
> 
> v2: Much more complex patch to share a single request between the sync and the
> page flip. The _sync() function now supports lazy allocation of the request
> structure. That is, if one is passed in then that will be used. If one is not,
> then a request will be allocated and passed back out. Note that the _sync() code
> does not necessarily require a request. Thus one will only be created until
> certain situations. The reason the lazy allocation must be done within the
> _sync() code itself is because the decision to need one or not is not really
> something that code above can second guess (except in the case where one is
> definitely not required because no ring is passed in).
> 
> The call chains above _sync() now support passing a request through which most
> callers passing in NULL and assuming that no request will be required (because
> they also pass in NULL for the ring and therefore can't be generating any ring
> code).
> 
> The exeception is intel_crtc_page_flip() which now supports having a request
> returned from _sync(). If one is, then that request is shared by the page flip
> (if the page flip is of a type to need a request). If _sync() does not generate
> a request but the page flip does need one, then the page flip path will create
> its own request.
> 
> v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
> Elf review request). Rebased onto newer tree that significantly changed the
> synchronisation code.
> 
> v4: Updated comments from review feedback (Tomas Elf)
> 
> For: VIZ-5115
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Reviewed-by: Tomas Elf <tomas.elf@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
>  drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
>  drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
>  drivers/gpu/drm/i915/intel_drv.h           |    3 +-
>  drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
>  drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
>  drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
>  8 files changed, 57 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 64a10fa..f69e9cb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
>  
>  int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
>  int i915_gem_object_sync(struct drm_i915_gem_object *obj,
> -			 struct intel_engine_cs *to);
> +			 struct intel_engine_cs *to,
> +			 struct drm_i915_gem_request **to_req);

Nope. Did you forget to reorder the code to ensure that the request is
allocated along with the context switch at the start of execbuf?
-Chris
John Harrison June 18, 2015, 12:59 p.m. UTC | #2
On 18/06/2015 13:21, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 01:14:56PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The plan is to pass requests around as the basic submission tracking structure
>> rather than rings and contexts. This patch updates the i915_gem_object_sync()
>> code path.
>>
>> v2: Much more complex patch to share a single request between the sync and the
>> page flip. The _sync() function now supports lazy allocation of the request
>> structure. That is, if one is passed in then that will be used. If one is not,
>> then a request will be allocated and passed back out. Note that the _sync() code
>> does not necessarily require a request. Thus one will only be created until
>> certain situations. The reason the lazy allocation must be done within the
>> _sync() code itself is because the decision to need one or not is not really
>> something that code above can second guess (except in the case where one is
>> definitely not required because no ring is passed in).
>>
>> The call chains above _sync() now support passing a request through which most
>> callers passing in NULL and assuming that no request will be required (because
>> they also pass in NULL for the ring and therefore can't be generating any ring
>> code).
>>
>> The exeception is intel_crtc_page_flip() which now supports having a request
>> returned from _sync(). If one is, then that request is shared by the page flip
>> (if the page flip is of a type to need a request). If _sync() does not generate
>> a request but the page flip does need one, then the page flip path will create
>> its own request.
>>
>> v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
>> Elf review request). Rebased onto newer tree that significantly changed the
>> synchronisation code.
>>
>> v4: Updated comments from review feedback (Tomas Elf)
>>
>> For: VIZ-5115
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Reviewed-by: Tomas Elf <tomas.elf@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
>>   drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
>>   drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
>>   drivers/gpu/drm/i915/intel_drv.h           |    3 +-
>>   drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
>>   drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
>>   drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
>>   8 files changed, 57 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 64a10fa..f69e9cb 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
>>   
>>   int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
>>   int i915_gem_object_sync(struct drm_i915_gem_object *obj,
>> -			 struct intel_engine_cs *to);
>> +			 struct intel_engine_cs *to,
>> +			 struct drm_i915_gem_request **to_req);
> Nope. Did you forget to reorder the code to ensure that the request is
> allocated along with the context switch at the start of execbuf?
> -Chris
>
Not sure what you are objecting to? If you mean the lazily allocated 
request then that is for page flip code not execbuff code. If we get 
here from an execbuff call then the request will definitely have been 
allocated and will be passed in. Whereas the page flip code may or may 
not require a request (depending on whether MMIO or ring flips are in 
use. Likewise the sync code may or may not require a request (depending 
on whether there is anything to sync to or not). There is no point 
allocating and submitting an empty request in the MMIO/idle case. Hence 
the sync code needs to be able to use an existing request or create one 
if none already exists.
Daniel Vetter June 18, 2015, 2:24 p.m. UTC | #3
On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote:
> On 18/06/2015 13:21, Chris Wilson wrote:
> >On Thu, Jun 18, 2015 at 01:14:56PM +0100, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>The plan is to pass requests around as the basic submission tracking structure
> >>rather than rings and contexts. This patch updates the i915_gem_object_sync()
> >>code path.
> >>
> >>v2: Much more complex patch to share a single request between the sync and the
> >>page flip. The _sync() function now supports lazy allocation of the request
> >>structure. That is, if one is passed in then that will be used. If one is not,
> >>then a request will be allocated and passed back out. Note that the _sync() code
> >>does not necessarily require a request. Thus one will only be created until
> >>certain situations. The reason the lazy allocation must be done within the
> >>_sync() code itself is because the decision to need one or not is not really
> >>something that code above can second guess (except in the case where one is
> >>definitely not required because no ring is passed in).
> >>
> >>The call chains above _sync() now support passing a request through which most
> >>callers passing in NULL and assuming that no request will be required (because
> >>they also pass in NULL for the ring and therefore can't be generating any ring
> >>code).
> >>
> >>The exeception is intel_crtc_page_flip() which now supports having a request
> >>returned from _sync(). If one is, then that request is shared by the page flip
> >>(if the page flip is of a type to need a request). If _sync() does not generate
> >>a request but the page flip does need one, then the page flip path will create
> >>its own request.
> >>
> >>v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
> >>Elf review request). Rebased onto newer tree that significantly changed the
> >>synchronisation code.
> >>
> >>v4: Updated comments from review feedback (Tomas Elf)
> >>
> >>For: VIZ-5115
> >>Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>Reviewed-by: Tomas Elf <tomas.elf@intel.com>
> >>---
> >>  drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
> >>  drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
> >>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
> >>  drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
> >>  drivers/gpu/drm/i915/intel_drv.h           |    3 +-
> >>  drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
> >>  drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
> >>  drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
> >>  8 files changed, 57 insertions(+), 23 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>index 64a10fa..f69e9cb 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
> >>  int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
> >>  int i915_gem_object_sync(struct drm_i915_gem_object *obj,
> >>-			 struct intel_engine_cs *to);
> >>+			 struct intel_engine_cs *to,
> >>+			 struct drm_i915_gem_request **to_req);
> >Nope. Did you forget to reorder the code to ensure that the request is
> >allocated along with the context switch at the start of execbuf?
> >-Chris
> >
> Not sure what you are objecting to? If you mean the lazily allocated request
> then that is for page flip code not execbuff code. If we get here from an
> execbuff call then the request will definitely have been allocated and will
> be passed in. Whereas the page flip code may or may not require a request
> (depending on whether MMIO or ring flips are in use. Likewise the sync code
> may or may not require a request (depending on whether there is anything to
> sync to or not). There is no point allocating and submitting an empty
> request in the MMIO/idle case. Hence the sync code needs to be able to use
> an existing request or create one if none already exists.

I guess Chris' comment was that if you have a non-NULL to, then you better
have a non-NULL to_req. And since we link up reqeusts to the engine
they'll run on the former shouldn't be required any more. So either that's
true and we can remove the to or we don't understand something yet (and
perhaps that should be done as a follow-up).
-Daniel
Chris Wilson June 18, 2015, 3:39 p.m. UTC | #4
On Thu, Jun 18, 2015 at 04:24:53PM +0200, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote:
> > On 18/06/2015 13:21, Chris Wilson wrote:
> > >On Thu, Jun 18, 2015 at 01:14:56PM +0100, John.C.Harrison@Intel.com wrote:
> > >>From: John Harrison <John.C.Harrison@Intel.com>
> > >>
> > >>The plan is to pass requests around as the basic submission tracking structure
> > >>rather than rings and contexts. This patch updates the i915_gem_object_sync()
> > >>code path.
> > >>
> > >>v2: Much more complex patch to share a single request between the sync and the
> > >>page flip. The _sync() function now supports lazy allocation of the request
> > >>structure. That is, if one is passed in then that will be used. If one is not,
> > >>then a request will be allocated and passed back out. Note that the _sync() code
> > >>does not necessarily require a request. Thus one will only be created until
> > >>certain situations. The reason the lazy allocation must be done within the
> > >>_sync() code itself is because the decision to need one or not is not really
> > >>something that code above can second guess (except in the case where one is
> > >>definitely not required because no ring is passed in).
> > >>
> > >>The call chains above _sync() now support passing a request through which most
> > >>callers passing in NULL and assuming that no request will be required (because
> > >>they also pass in NULL for the ring and therefore can't be generating any ring
> > >>code).
> > >>
> > >>The exeception is intel_crtc_page_flip() which now supports having a request
> > >>returned from _sync(). If one is, then that request is shared by the page flip
> > >>(if the page flip is of a type to need a request). If _sync() does not generate
> > >>a request but the page flip does need one, then the page flip path will create
> > >>its own request.
> > >>
> > >>v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
> > >>Elf review request). Rebased onto newer tree that significantly changed the
> > >>synchronisation code.
> > >>
> > >>v4: Updated comments from review feedback (Tomas Elf)
> > >>
> > >>For: VIZ-5115
> > >>Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > >>Reviewed-by: Tomas Elf <tomas.elf@intel.com>
> > >>---
> > >>  drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
> > >>  drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
> > >>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
> > >>  drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
> > >>  drivers/gpu/drm/i915/intel_drv.h           |    3 +-
> > >>  drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
> > >>  drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
> > >>  drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
> > >>  8 files changed, 57 insertions(+), 23 deletions(-)
> > >>
> > >>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > >>index 64a10fa..f69e9cb 100644
> > >>--- a/drivers/gpu/drm/i915/i915_drv.h
> > >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> > >>@@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
> > >>  int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
> > >>  int i915_gem_object_sync(struct drm_i915_gem_object *obj,
> > >>-			 struct intel_engine_cs *to);
> > >>+			 struct intel_engine_cs *to,
> > >>+			 struct drm_i915_gem_request **to_req);
> > >Nope. Did you forget to reorder the code to ensure that the request is
> > >allocated along with the context switch at the start of execbuf?
> > >-Chris
> > >
> > Not sure what you are objecting to? If you mean the lazily allocated request
> > then that is for page flip code not execbuff code. If we get here from an
> > execbuff call then the request will definitely have been allocated and will
> > be passed in. Whereas the page flip code may or may not require a request
> > (depending on whether MMIO or ring flips are in use. Likewise the sync code
> > may or may not require a request (depending on whether there is anything to
> > sync to or not). There is no point allocating and submitting an empty
> > request in the MMIO/idle case. Hence the sync code needs to be able to use
> > an existing request or create one if none already exists.
> 
> I guess Chris' comment was that if you have a non-NULL to, then you better
> have a non-NULL to_req. And since we link up reqeusts to the engine
> they'll run on the former shouldn't be required any more. So either that's
> true and we can remove the to or we don't understand something yet (and
> perhaps that should be done as a follow-up).

I am sure I sent a patch that outlined in great detail how that we need
only the request parameter in i915_gem_object_sync(), for handling both
execbuffer, pipelined pin_and_fence and synchronous pin_and_fence.
-Chris
John Harrison June 18, 2015, 4:16 p.m. UTC | #5
On 18/06/2015 16:39, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 04:24:53PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote:
>>> On 18/06/2015 13:21, Chris Wilson wrote:
>>>> On Thu, Jun 18, 2015 at 01:14:56PM +0100, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> The plan is to pass requests around as the basic submission tracking structure
>>>>> rather than rings and contexts. This patch updates the i915_gem_object_sync()
>>>>> code path.
>>>>>
>>>>> v2: Much more complex patch to share a single request between the sync and the
>>>>> page flip. The _sync() function now supports lazy allocation of the request
>>>>> structure. That is, if one is passed in then that will be used. If one is not,
>>>>> then a request will be allocated and passed back out. Note that the _sync() code
>>>>> does not necessarily require a request. Thus one will only be created until
>>>>> certain situations. The reason the lazy allocation must be done within the
>>>>> _sync() code itself is because the decision to need one or not is not really
>>>>> something that code above can second guess (except in the case where one is
>>>>> definitely not required because no ring is passed in).
>>>>>
>>>>> The call chains above _sync() now support passing a request through which most
>>>>> callers passing in NULL and assuming that no request will be required (because
>>>>> they also pass in NULL for the ring and therefore can't be generating any ring
>>>>> code).
>>>>>
>>>>> The exeception is intel_crtc_page_flip() which now supports having a request
>>>>> returned from _sync(). If one is, then that request is shared by the page flip
>>>>> (if the page flip is of a type to need a request). If _sync() does not generate
>>>>> a request but the page flip does need one, then the page flip path will create
>>>>> its own request.
>>>>>
>>>>> v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
>>>>> Elf review request). Rebased onto newer tree that significantly changed the
>>>>> synchronisation code.
>>>>>
>>>>> v4: Updated comments from review feedback (Tomas Elf)
>>>>>
>>>>> For: VIZ-5115
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> Reviewed-by: Tomas Elf <tomas.elf@intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
>>>>>   drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
>>>>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
>>>>>   drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
>>>>>   drivers/gpu/drm/i915/intel_drv.h           |    3 +-
>>>>>   drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
>>>>>   drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
>>>>>   drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
>>>>>   8 files changed, 57 insertions(+), 23 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index 64a10fa..f69e9cb 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
>>>>>   int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
>>>>>   int i915_gem_object_sync(struct drm_i915_gem_object *obj,
>>>>> -			 struct intel_engine_cs *to);
>>>>> +			 struct intel_engine_cs *to,
>>>>> +			 struct drm_i915_gem_request **to_req);
>>>> Nope. Did you forget to reorder the code to ensure that the request is
>>>> allocated along with the context switch at the start of execbuf?
>>>> -Chris
>>>>
>>> Not sure what you are objecting to? If you mean the lazily allocated request
>>> then that is for page flip code not execbuff code. If we get here from an
>>> execbuff call then the request will definitely have been allocated and will
>>> be passed in. Whereas the page flip code may or may not require a request
>>> (depending on whether MMIO or ring flips are in use. Likewise the sync code
>>> may or may not require a request (depending on whether there is anything to
>>> sync to or not). There is no point allocating and submitting an empty
>>> request in the MMIO/idle case. Hence the sync code needs to be able to use
>>> an existing request or create one if none already exists.
>> I guess Chris' comment was that if you have a non-NULL to, then you better
>> have a non-NULL to_req. And since we link up reqeusts to the engine
>> they'll run on the former shouldn't be required any more. So either that's
>> true and we can remove the to or we don't understand something yet (and
>> perhaps that should be done as a follow-up).
> I am sure I sent a patch that outlined in great detail how that we need
> only the request parameter in i915_gem_object_sync(), for handling both
> execbuffer, pipelined pin_and_fence and synchronous pin_and_fence.
> -Chris
>

As the driver stands, the page flip code wants to synchronise with the 
framebuffer object but potentially without touching the ring and 
therefore without creating a request. If the synchronisation is a no-op 
(because there are no outstanding operations on the given object) then 
there is no need for a request anywhere in the call chain. Thus there is 
a need to pass in the ring together with an optional request and to be 
able to pass out a request that has been created internally.

 >  if you have a non-NULL to, then you better have a non-NULL to_req

I assume you mean 'a non-NULL *to_req'?

No, that is the whole point. If you have a non-null '*to_req' then 'to' 
must be non-null (and specifically must be the ring that '*to_req' is 
referencing). However, it is valid to have a non-null 'to' and a null 
'*to_req'.  In the case of MMIO flips, the page flip itself does not 
require a request as it does not go through the ring. However, it still 
passes in 'i915_gem_request_get_ring(obj->last_write_req)' as the ring 
to synchronise to. Thus it is potentially passing in a valid to pointer 
but without wanting to pre-allocate a request object. If the 
synchronisation requires writing a semaphore to the ring then a request 
will be created internally and passed back out for the page flip code to 
submit (and to re-use in the case of non-MMIO flips). But if the 
synchronisation is a no-op then no request ever gets created or 
submitting and nothing touches the ring at all.

John.
St├ęphane ANCELOT June 18, 2015, 4:36 p.m. UTC | #6
Hi,

Which option is mandatory in linux kernel to be able to act on 
brightness of display ?

Regards,
Steph
Daniel Vetter June 22, 2015, 8:03 p.m. UTC | #7
On Thu, Jun 18, 2015 at 05:16:15PM +0100, John Harrison wrote:
> On 18/06/2015 16:39, Chris Wilson wrote:
> >On Thu, Jun 18, 2015 at 04:24:53PM +0200, Daniel Vetter wrote:
> >>On Thu, Jun 18, 2015 at 01:59:13PM +0100, John Harrison wrote:
> >>>On 18/06/2015 13:21, Chris Wilson wrote:
> >>>>On Thu, Jun 18, 2015 at 01:14:56PM +0100, John.C.Harrison@Intel.com wrote:
> >>>>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>>>
> >>>>>The plan is to pass requests around as the basic submission tracking structure
> >>>>>rather than rings and contexts. This patch updates the i915_gem_object_sync()
> >>>>>code path.
> >>>>>
> >>>>>v2: Much more complex patch to share a single request between the sync and the
> >>>>>page flip. The _sync() function now supports lazy allocation of the request
> >>>>>structure. That is, if one is passed in then that will be used. If one is not,
> >>>>>then a request will be allocated and passed back out. Note that the _sync() code
> >>>>>does not necessarily require a request. Thus one will only be created until
> >>>>>certain situations. The reason the lazy allocation must be done within the
> >>>>>_sync() code itself is because the decision to need one or not is not really
> >>>>>something that code above can second guess (except in the case where one is
> >>>>>definitely not required because no ring is passed in).
> >>>>>
> >>>>>The call chains above _sync() now support passing a request through which most
> >>>>>callers passing in NULL and assuming that no request will be required (because
> >>>>>they also pass in NULL for the ring and therefore can't be generating any ring
> >>>>>code).
> >>>>>
> >>>>>The exeception is intel_crtc_page_flip() which now supports having a request
> >>>>>returned from _sync(). If one is, then that request is shared by the page flip
> >>>>>(if the page flip is of a type to need a request). If _sync() does not generate
> >>>>>a request but the page flip does need one, then the page flip path will create
> >>>>>its own request.
> >>>>>
> >>>>>v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
> >>>>>Elf review request). Rebased onto newer tree that significantly changed the
> >>>>>synchronisation code.
> >>>>>
> >>>>>v4: Updated comments from review feedback (Tomas Elf)
> >>>>>
> >>>>>For: VIZ-5115
> >>>>>Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>>>Reviewed-by: Tomas Elf <tomas.elf@intel.com>
> >>>>>---
> >>>>>  drivers/gpu/drm/i915/i915_drv.h            |    4 ++-
> >>>>>  drivers/gpu/drm/i915/i915_gem.c            |   48 +++++++++++++++++++++-------
> >>>>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    2 +-
> >>>>>  drivers/gpu/drm/i915/intel_display.c       |   17 +++++++---
> >>>>>  drivers/gpu/drm/i915/intel_drv.h           |    3 +-
> >>>>>  drivers/gpu/drm/i915/intel_fbdev.c         |    2 +-
> >>>>>  drivers/gpu/drm/i915/intel_lrc.c           |    2 +-
> >>>>>  drivers/gpu/drm/i915/intel_overlay.c       |    2 +-
> >>>>>  8 files changed, 57 insertions(+), 23 deletions(-)
> >>>>>
> >>>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>index 64a10fa..f69e9cb 100644
> >>>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>@@ -2778,7 +2778,8 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
> >>>>>  int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
> >>>>>  int i915_gem_object_sync(struct drm_i915_gem_object *obj,
> >>>>>-			 struct intel_engine_cs *to);
> >>>>>+			 struct intel_engine_cs *to,
> >>>>>+			 struct drm_i915_gem_request **to_req);
> >>>>Nope. Did you forget to reorder the code to ensure that the request is
> >>>>allocated along with the context switch at the start of execbuf?
> >>>>-Chris
> >>>>
> >>>Not sure what you are objecting to? If you mean the lazily allocated request
> >>>then that is for page flip code not execbuff code. If we get here from an
> >>>execbuff call then the request will definitely have been allocated and will
> >>>be passed in. Whereas the page flip code may or may not require a request
> >>>(depending on whether MMIO or ring flips are in use. Likewise the sync code
> >>>may or may not require a request (depending on whether there is anything to
> >>>sync to or not). There is no point allocating and submitting an empty
> >>>request in the MMIO/idle case. Hence the sync code needs to be able to use
> >>>an existing request or create one if none already exists.
> >>I guess Chris' comment was that if you have a non-NULL to, then you better
> >>have a non-NULL to_req. And since we link up reqeusts to the engine
> >>they'll run on the former shouldn't be required any more. So either that's
> >>true and we can remove the to or we don't understand something yet (and
> >>perhaps that should be done as a follow-up).
> >I am sure I sent a patch that outlined in great detail how that we need
> >only the request parameter in i915_gem_object_sync(), for handling both
> >execbuffer, pipelined pin_and_fence and synchronous pin_and_fence.
> >-Chris
> >
> 
> As the driver stands, the page flip code wants to synchronise with the
> framebuffer object but potentially without touching the ring and therefore
> without creating a request. If the synchronisation is a no-op (because there
> are no outstanding operations on the given object) then there is no need for
> a request anywhere in the call chain. Thus there is a need to pass in the
> ring together with an optional request and to be able to pass out a request
> that has been created internally.
> 
> >  if you have a non-NULL to, then you better have a non-NULL to_req
> 
> I assume you mean 'a non-NULL *to_req'?
> 
> No, that is the whole point. If you have a non-null '*to_req' then 'to' must
> be non-null (and specifically must be the ring that '*to_req' is
> referencing). However, it is valid to have a non-null 'to' and a null
> '*to_req'.  In the case of MMIO flips, the page flip itself does not require
> a request as it does not go through the ring. However, it still passes in
> 'i915_gem_request_get_ring(obj->last_write_req)' as the ring to synchronise
> to. Thus it is potentially passing in a valid to pointer but without wanting
> to pre-allocate a request object. If the synchronisation requires writing a
> semaphore to the ring then a request will be created internally and passed
> back out for the page flip code to submit (and to re-use in the case of
> non-MMIO flips). But if the synchronisation is a no-op then no request ever
> gets created or submitting and nothing touches the ring at all.

We use mmio flips by default if there's any ring switching going on, which
means except when a user sets a silly debug module option this will never
happend. Which means it's not too pretty to carry this complication around
for no real use at all. Otoh the flip code is in a massive churn because
of atomic, so not much point in cleaning that out if it'll all disapear
anyway. I'll smash a patch on top to note this TODO.
-Daniel
Chris Wilson June 22, 2015, 8:14 p.m. UTC | #8
On Mon, Jun 22, 2015 at 10:03:20PM +0200, Daniel Vetter wrote:
> On Thu, Jun 18, 2015 at 05:16:15PM +0100, John Harrison wrote:
> > No, that is the whole point. If you have a non-null '*to_req' then 'to' must
> > be non-null (and specifically must be the ring that '*to_req' is
> > referencing). However, it is valid to have a non-null 'to' and a null
> > '*to_req'.  In the case of MMIO flips, the page flip itself does not require
> > a request as it does not go through the ring. However, it still passes in
> > 'i915_gem_request_get_ring(obj->last_write_req)' as the ring to synchronise
> > to. Thus it is potentially passing in a valid to pointer but without wanting
> > to pre-allocate a request object. If the synchronisation requires writing a
> > semaphore to the ring then a request will be created internally and passed
> > back out for the page flip code to submit (and to re-use in the case of
> > non-MMIO flips). But if the synchronisation is a no-op then no request ever
> > gets created or submitting and nothing touches the ring at all.

Your API is very badly designed.
 
> We use mmio flips by default if there's any ring switching going on, which
> means except when a user sets a silly debug module option this will never
> happend. Which means it's not too pretty to carry this complication around
> for no real use at all. Otoh the flip code is in a massive churn because
> of atomic, so not much point in cleaning that out if it'll all disapear
> anyway. I'll smash a patch on top to note this TODO.

It's only a complication of bad design, again.
-Chris

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 64a10fa..f69e9cb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2778,7 +2778,8 @@  static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
 
 int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
 int i915_gem_object_sync(struct drm_i915_gem_object *obj,
-			 struct intel_engine_cs *to);
+			 struct intel_engine_cs *to,
+			 struct drm_i915_gem_request **to_req);
 void i915_vma_move_to_active(struct i915_vma *vma,
 			     struct intel_engine_cs *ring);
 int i915_gem_dumb_create(struct drm_file *file_priv,
@@ -2889,6 +2890,7 @@  int __must_check
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     struct intel_engine_cs *pipelined,
+				     struct drm_i915_gem_request **pipelined_request,
 				     const struct i915_ggtt_view *view);
 void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
 					      const struct i915_ggtt_view *view);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e59369a..d7c7127 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3095,25 +3095,26 @@  out:
 static int
 __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		       struct intel_engine_cs *to,
-		       struct drm_i915_gem_request *req)
+		       struct drm_i915_gem_request *from_req,
+		       struct drm_i915_gem_request **to_req)
 {
 	struct intel_engine_cs *from;
 	int ret;
 
-	from = i915_gem_request_get_ring(req);
+	from = i915_gem_request_get_ring(from_req);
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(from_req, true))
 		return 0;
 
-	ret = i915_gem_check_olr(req);
+	ret = i915_gem_check_olr(from_req);
 	if (ret)
 		return ret;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
-		ret = __i915_wait_request(req,
+		ret = __i915_wait_request(from_req,
 					  atomic_read(&i915->gpu_error.reset_counter),
 					  i915->mm.interruptible,
 					  NULL,
@@ -3121,15 +3122,23 @@  __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 		if (ret)
 			return ret;
 
-		i915_gem_object_retire_request(obj, req);
+		i915_gem_object_retire_request(obj, from_req);
 	} else {
 		int idx = intel_ring_sync_index(from, to);
-		u32 seqno = i915_gem_request_get_seqno(req);
+		u32 seqno = i915_gem_request_get_seqno(from_req);
+
+		WARN_ON(!to_req);
 
 		if (seqno <= from->semaphore.sync_seqno[idx])
 			return 0;
 
-		trace_i915_gem_ring_sync_to(from, to, req);
+		if (*to_req == NULL) {
+			ret = i915_gem_request_alloc(to, to->default_context, to_req);
+			if (ret)
+				return ret;
+		}
+
+		trace_i915_gem_ring_sync_to(from, to, from_req);
 		ret = to->semaphore.sync_to(to, from, seqno);
 		if (ret)
 			return ret;
@@ -3150,11 +3159,14 @@  __i915_gem_object_sync(struct drm_i915_gem_object *obj,
  *
  * @obj: object which may be in use on another ring.
  * @to: ring we wish to use the object on. May be NULL.
+ * @to_req: request we wish to use the object for. See below.
+ *          This will be allocated and returned if a request is
+ *          required but not passed in.
  *
  * This code is meant to abstract object synchronization with the GPU.
  * Calling with NULL implies synchronizing the object with the CPU
  * rather than a particular GPU ring. Conceptually we serialise writes
- * between engines inside the GPU. We only allow on engine to write
+ * between engines inside the GPU. We only allow one engine to write
  * into a buffer at any time, but multiple readers. To ensure each has
  * a coherent view of memory, we must:
  *
@@ -3165,11 +3177,22 @@  __i915_gem_object_sync(struct drm_i915_gem_object *obj,
  * - If we are a write request (pending_write_domain is set), the new
  *   request must wait for outstanding read requests to complete.
  *
+ * For CPU synchronisation (NULL to) no request is required. For syncing with
+ * rings to_req must be non-NULL. However, a request does not have to be
+ * pre-allocated. If *to_req is NULL and sync commands will be emitted then a
+ * request will be allocated automatically and returned through *to_req. Note
+ * that it is not guaranteed that commands will be emitted (because the system
+ * might already be idle). Hence there is no need to create a request that
+ * might never have any work submitted. Note further that if a request is
+ * returned in *to_req, it is the responsibility of the caller to submit
+ * that request (after potentially adding more work to it).
+ *
  * Returns 0 if successful, else propagates up the lower layer error.
  */
 int
 i915_gem_object_sync(struct drm_i915_gem_object *obj,
-		     struct intel_engine_cs *to)
+		     struct intel_engine_cs *to,
+		     struct drm_i915_gem_request **to_req)
 {
 	const bool readonly = obj->base.pending_write_domain == 0;
 	struct drm_i915_gem_request *req[I915_NUM_RINGS];
@@ -3191,7 +3214,7 @@  i915_gem_object_sync(struct drm_i915_gem_object *obj,
 				req[n++] = obj->last_read_req[i];
 	}
 	for (i = 0; i < n; i++) {
-		ret = __i915_gem_object_sync(obj, to, req[i]);
+		ret = __i915_gem_object_sync(obj, to, req[i], to_req);
 		if (ret)
 			return ret;
 	}
@@ -4141,12 +4164,13 @@  int
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     struct intel_engine_cs *pipelined,
+				     struct drm_i915_gem_request **pipelined_request,
 				     const struct i915_ggtt_view *view)
 {
 	u32 old_read_domains, old_write_domain;
 	int ret;
 
-	ret = i915_gem_object_sync(obj, pipelined);
+	ret = i915_gem_object_sync(obj, pipelined, pipelined_request);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 50b1ced..bea92ad 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -899,7 +899,7 @@  i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->ring);
+			ret = i915_gem_object_sync(obj, req->ring, &req);
 			if (ret)
 				return ret;
 		}
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 657a333..6528ada 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2338,7 +2338,8 @@  int
 intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 			   struct drm_framebuffer *fb,
 			   const struct drm_plane_state *plane_state,
-			   struct intel_engine_cs *pipelined)
+			   struct intel_engine_cs *pipelined,
+			   struct drm_i915_gem_request **pipelined_request)
 {
 	struct drm_device *dev = fb->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -2403,7 +2404,7 @@  intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 
 	dev_priv->mm.interruptible = false;
 	ret = i915_gem_object_pin_to_display_plane(obj, alignment, pipelined,
-						   &view);
+						   pipelined_request, &view);
 	if (ret)
 		goto err_interruptible;
 
@@ -11119,6 +11120,7 @@  static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	struct intel_unpin_work *work;
 	struct intel_engine_cs *ring;
 	bool mmio_flip;
+	struct drm_i915_gem_request *request = NULL;
 	int ret;
 
 	/*
@@ -11225,7 +11227,7 @@  static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	 */
 	ret = intel_pin_and_fence_fb_obj(crtc->primary, fb,
 					 crtc->primary->state,
-					 mmio_flip ? i915_gem_request_get_ring(obj->last_write_req) : ring);
+					 mmio_flip ? i915_gem_request_get_ring(obj->last_write_req) : ring, &request);
 	if (ret)
 		goto cleanup_pending;
 
@@ -11256,6 +11258,9 @@  static int intel_crtc_page_flip(struct drm_crtc *crtc,
 					intel_ring_get_request(ring));
 	}
 
+	if (request)
+		i915_add_request_no_flush(request->ring);
+
 	work->flip_queued_vblank = drm_crtc_vblank_count(crtc);
 	work->enable_stall_check = true;
 
@@ -11273,6 +11278,8 @@  static int intel_crtc_page_flip(struct drm_crtc *crtc,
 cleanup_unpin:
 	intel_unpin_fb_obj(fb, crtc->primary->state);
 cleanup_pending:
+	if (request)
+		i915_gem_request_cancel(request);
 	atomic_dec(&intel_crtc->unpin_work_count);
 	mutex_unlock(&dev->struct_mutex);
 cleanup:
@@ -13171,7 +13178,7 @@  intel_prepare_plane_fb(struct drm_plane *plane,
 		if (ret)
 			DRM_DEBUG_KMS("failed to attach phys object\n");
 	} else {
-		ret = intel_pin_and_fence_fb_obj(plane, fb, new_state, NULL);
+		ret = intel_pin_and_fence_fb_obj(plane, fb, new_state, NULL, NULL);
 	}
 
 	if (ret == 0)
@@ -15218,7 +15225,7 @@  void intel_modeset_gem_init(struct drm_device *dev)
 		ret = intel_pin_and_fence_fb_obj(c->primary,
 						 c->primary->fb,
 						 c->primary->state,
-						 NULL);
+						 NULL, NULL);
 		mutex_unlock(&dev->struct_mutex);
 		if (ret) {
 			DRM_ERROR("failed to pin boot fb on pipe %d\n",
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 02d8317..73650ae 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1034,7 +1034,8 @@  void intel_release_load_detect_pipe(struct drm_connector *connector,
 int intel_pin_and_fence_fb_obj(struct drm_plane *plane,
 			       struct drm_framebuffer *fb,
 			       const struct drm_plane_state *plane_state,
-			       struct intel_engine_cs *pipelined);
+			       struct intel_engine_cs *pipelined,
+			       struct drm_i915_gem_request **pipelined_request);
 struct drm_framebuffer *
 __intel_framebuffer_create(struct drm_device *dev,
 			   struct drm_mode_fb_cmd2 *mode_cmd,
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 4e7e7da..dd9f3b2 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -151,7 +151,7 @@  static int intelfb_alloc(struct drm_fb_helper *helper,
 	}
 
 	/* Flush everything out, we'll be doing GTT only from now on */
-	ret = intel_pin_and_fence_fb_obj(NULL, fb, NULL, NULL);
+	ret = intel_pin_and_fence_fb_obj(NULL, fb, NULL, NULL, NULL);
 	if (ret) {
 		DRM_ERROR("failed to pin obj: %d\n", ret);
 		goto out_fb;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1d9d248..c29d6c5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -638,7 +638,7 @@  static int execlists_move_to_gpu(struct drm_i915_gem_request *req,
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (obj->active & other_rings) {
-			ret = i915_gem_object_sync(obj, req->ring);
+			ret = i915_gem_object_sync(obj, req->ring, &req);
 			if (ret)
 				return ret;
 		}
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index e7534b9..0f8187a 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -724,7 +724,7 @@  static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (ret != 0)
 		return ret;
 
-	ret = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL,
+	ret = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL, NULL,
 						   &i915_ggtt_view_normal);
 	if (ret != 0)
 		return ret;