diff mbox

[v2,2/2] drm/i915: write fence reg only once on VGPU

Message ID 1530586577-20605-3-git-send-email-yakui.zhao@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhao, Yakui July 3, 2018, 2:56 a.m. UTC
On VGPU scenario the read/write operation of fence_reg will be trapped
by the GVT-g. And then gvt-g follows the HW spec to write the fence_reg.
So it is unnecessary to read/write fence reg several times. This will help
to reduce the unnecessary trap of fence_reg mmio operation.

V1->V2: Fix one typo error of parameter when calling intel_vgpu_active

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_fence_reg.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

Comments

Daniel Vetter July 3, 2018, 8:51 a.m. UTC | #1
On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> On VGPU scenario the read/write operation of fence_reg will be trapped
> by the GVT-g. And then gvt-g follows the HW spec to write the fence_reg.
> So it is unnecessary to read/write fence reg several times. This will help
> to reduce the unnecessary trap of fence_reg mmio operation.
> 
> V1->V2: Fix one typo error of parameter when calling intel_vgpu_active
> 
> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>

Ok this makes more sense. Except you need to put the 64bit entirely into
the vpgu block, with a comment explaining why this is safe (since the vpgu
will take care of updating fences correctly).
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_fence_reg.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> index d92fe03..9c97976 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> @@ -95,11 +95,15 @@ static void i965_write_fence_reg(struct drm_i915_fence_reg *fence,
>  
>  	if (INTEL_GEN(fence->i915) >= 6) {
>  		/* Use the 64-bit RW to read/write fence reg on SNB+ */
> -		I915_WRITE64_FW(fence_reg_lo, 0);
> -		I915_READ64(fence_reg_lo);
> -
> -		I915_WRITE64_FW(fence_reg_lo, val);
> -		I915_READ64(fence_reg_lo);
> +		if (intel_vgpu_active(dev_priv))
> +			I915_WRITE64_FW(fence_reg_lo, val);
> +		else {
> +			I915_WRITE64_FW(fence_reg_lo, 0);
> +			I915_READ64(fence_reg_lo);
> +
> +			I915_WRITE64_FW(fence_reg_lo, val);
> +			I915_READ64(fence_reg_lo);
> +		}
>  	} else {
>  		/* To w/a incoherency with non-atomic 64-bit register updates,
>  		 * we split the 64-bit update into two 32-bit writes. In order
> -- 
> 2.7.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson July 3, 2018, 9:05 a.m. UTC | #2
Quoting Daniel Vetter (2018-07-03 09:51:03)
> On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> > On VGPU scenario the read/write operation of fence_reg will be trapped
> > by the GVT-g. And then gvt-g follows the HW spec to write the fence_reg.
> > So it is unnecessary to read/write fence reg several times. This will help
> > to reduce the unnecessary trap of fence_reg mmio operation.
> > 
> > V1->V2: Fix one typo error of parameter when calling intel_vgpu_active
> > 
> > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
> 
> Ok this makes more sense. Except you need to put the 64bit entirely into
> the vpgu block, with a comment explaining why this is safe (since the vpgu
> will take care of updating fences correctly).

Except, who cares? Are fence registers being rewritten that frequently
that special casing vgpu is worth the hassle. Part of that is that you
need to leave a hint behind in the code that (a) explains why it is safe
after having the "here be dragons" and (b) why we care.

On a more pragmatic level if fencing doesn't plateau out to steady
state, that is a worrying amount of contention -- the actual fence write
itself would be the least of my worries.
-Chris
Daniel Vetter July 3, 2018, 9:52 a.m. UTC | #3
On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2018-07-03 09:51:03)
> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> > > On VGPU scenario the read/write operation of fence_reg will be trapped
> > > by the GVT-g. And then gvt-g follows the HW spec to write the fence_reg.
> > > So it is unnecessary to read/write fence reg several times. This will help
> > > to reduce the unnecessary trap of fence_reg mmio operation.
> > > 
> > > V1->V2: Fix one typo error of parameter when calling intel_vgpu_active
> > > 
> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
> > 
> > Ok this makes more sense. Except you need to put the 64bit entirely into
> > the vpgu block, with a comment explaining why this is safe (since the vpgu
> > will take care of updating fences correctly).
> 
> Except, who cares? Are fence registers being rewritten that frequently
> that special casing vgpu is worth the hassle. Part of that is that you
> need to leave a hint behind in the code that (a) explains why it is safe
> after having the "here be dragons" and (b) why we care.
> 
> On a more pragmatic level if fencing doesn't plateau out to steady
> state, that is a worrying amount of contention -- the actual fence write
> itself would be the least of my worries.

I can easily imagine that with the few per-client fences vgpu hands out
rewrites are much more common. But yeah some real data would be good. And
more reasons to get mesa off of the gtt mmaps.
-Daniel
Zhao, Yakui July 3, 2018, 10:14 a.m. UTC | #4
>-----Original Message-----
>From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
>Sent: Tuesday, July 3, 2018 4:51 PM
>To: Zhao, Yakui <yakui.zhao@intel.com>
>Cc: intel-gfx@lists.freedesktop.org
>Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on
>VGPU
>
>On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
>> On VGPU scenario the read/write operation of fence_reg will be trapped
>> by the GVT-g. And then gvt-g follows the HW spec to write the fence_reg.
>> So it is unnecessary to read/write fence reg several times. This will
>> help to reduce the unnecessary trap of fence_reg mmio operation.
>>
>> V1->V2: Fix one typo error of parameter when calling intel_vgpu_active
>>
>> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
>
>Ok this makes more sense. Except you need to put the 64bit entirely into the
>vpgu block, with a comment explaining why this is safe (since the vpgu will
>take care of updating fences correctly).

Thanks for your comment and reply.
The gvt-g will take care how to update the fence reg.

I will refine the comments and commit log.

>-Daniel
>
>> ---
>>  drivers/gpu/drm/i915/i915_gem_fence_reg.c | 14 +++++++++-----
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>> b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>> index d92fe03..9c97976 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
>> @@ -95,11 +95,15 @@ static void i965_write_fence_reg(struct
>> drm_i915_fence_reg *fence,
>>
>>  	if (INTEL_GEN(fence->i915) >= 6) {
>>  		/* Use the 64-bit RW to read/write fence reg on SNB+ */
>> -		I915_WRITE64_FW(fence_reg_lo, 0);
>> -		I915_READ64(fence_reg_lo);
>> -
>> -		I915_WRITE64_FW(fence_reg_lo, val);
>> -		I915_READ64(fence_reg_lo);
>> +		if (intel_vgpu_active(dev_priv))
>> +			I915_WRITE64_FW(fence_reg_lo, val);
>> +		else {
>> +			I915_WRITE64_FW(fence_reg_lo, 0);
>> +			I915_READ64(fence_reg_lo);
>> +
>> +			I915_WRITE64_FW(fence_reg_lo, val);
>> +			I915_READ64(fence_reg_lo);
>> +		}
>>  	} else {
>>  		/* To w/a incoherency with non-atomic 64-bit register updates,
>>  		 * we split the 64-bit update into two 32-bit writes. In order
>> --
>> 2.7.4
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>--
>Daniel Vetter
>Software Engineer, Intel Corporation
>http://blog.ffwll.ch
Zhao, Yakui July 3, 2018, 12:47 p.m. UTC | #5
>-----Original Message-----
>From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
>Sent: Tuesday, July 3, 2018 5:52 PM
>To: Chris Wilson <chris@chris-wilson.co.uk>
>Cc: Daniel Vetter <daniel@ffwll.ch>; Zhao, Yakui <yakui.zhao@intel.com>;
>intel-gfx@lists.freedesktop.org
>Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on
>VGPU
>
>On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
>> Quoting Daniel Vetter (2018-07-03 09:51:03)
>> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
>> > > On VGPU scenario the read/write operation of fence_reg will be
>> > > trapped by the GVT-g. And then gvt-g follows the HW spec to write the
>fence_reg.
>> > > So it is unnecessary to read/write fence reg several times. This
>> > > will help to reduce the unnecessary trap of fence_reg mmio operation.
>> > >
>> > > V1->V2: Fix one typo error of parameter when calling
>> > > V1->intel_vgpu_active
>> > >
>> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
>> >
>> > Ok this makes more sense. Except you need to put the 64bit entirely
>> > into the vpgu block, with a comment explaining why this is safe
>> > (since the vpgu will take care of updating fences correctly).
>>
>> Except, who cares? Are fence registers being rewritten that frequently
>> that special casing vgpu is worth the hassle. Part of that is that you
>> need to leave a hint behind in the code that (a) explains why it is
>> safe after having the "here be dragons" and (b) why we care.
>>
>> On a more pragmatic level if fencing doesn't plateau out to steady
>> state, that is a worrying amount of contention -- the actual fence
>> write itself would be the least of my worries.
>
>I can easily imagine that with the few per-client fences vgpu hands out
>rewrites are much more common. But yeah some real data would be good.
>And more reasons to get mesa off of the gtt mmaps.

Hi, Daniel/Chris

      Thanks for your comments.
      The fence reg is used to assure the access of Tiled surface through aperature window. When fence is needed, the driver
helps to find one available fence reg and then configure it. After it is not used, the fence will be turned off and then be allocated
for next usage. It doesn't rely on the state of fence reg.  In such case we don't need to worry about the unsteady state.

      For the VGPU operation: The op of fence reg is trapped.  Then the gvt-g will follow the trapped value to program the fence_reg.
(It will turn off and then write the expected value for any trapped write op of fence reg). The trapped op in GVT-g is safe.

      Based on the current logic,  it needs the five traps when one fence reg is configured under VGPU mode.(Three writes, two reads). 
If it is programmed in one 64-bit op under VGPU mode, only one trap is needed. And the GVT-g still can configure the expected fence_value.
As the trap is quite heavy for VGPU, the trap time can be saved.

      I will put some description in the code and commit log in next version.
       
>-Daniel
>--
>Daniel Vetter
>Software Engineer, Intel Corporation
>http://blog.ffwll.ch
Chris Wilson July 3, 2018, 1:24 p.m. UTC | #6
Quoting Zhao, Yakui (2018-07-03 13:47:46)
> 
> >-----Original Message-----
> >From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
> >Sent: Tuesday, July 3, 2018 5:52 PM
> >To: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel@ffwll.ch>; Zhao, Yakui <yakui.zhao@intel.com>;
> >intel-gfx@lists.freedesktop.org
> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on
> >VGPU
> >
> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> >> Quoting Daniel Vetter (2018-07-03 09:51:03)
> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> >> > > On VGPU scenario the read/write operation of fence_reg will be
> >> > > trapped by the GVT-g. And then gvt-g follows the HW spec to write the
> >fence_reg.
> >> > > So it is unnecessary to read/write fence reg several times. This
> >> > > will help to reduce the unnecessary trap of fence_reg mmio operation.
> >> > >
> >> > > V1->V2: Fix one typo error of parameter when calling
> >> > > V1->intel_vgpu_active
> >> > >
> >> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
> >> >
> >> > Ok this makes more sense. Except you need to put the 64bit entirely
> >> > into the vpgu block, with a comment explaining why this is safe
> >> > (since the vpgu will take care of updating fences correctly).
> >>
> >> Except, who cares? Are fence registers being rewritten that frequently
> >> that special casing vgpu is worth the hassle. Part of that is that you
> >> need to leave a hint behind in the code that (a) explains why it is
> >> safe after having the "here be dragons" and (b) why we care.
> >>
> >> On a more pragmatic level if fencing doesn't plateau out to steady
> >> state, that is a worrying amount of contention -- the actual fence
> >> write itself would be the least of my worries.
> >
> >I can easily imagine that with the few per-client fences vgpu hands out
> >rewrites are much more common. But yeah some real data would be good.
> >And more reasons to get mesa off of the gtt mmaps.
> 
> Hi, Daniel/Chris
> 
>       Thanks for your comments.
>       The fence reg is used to assure the access of Tiled surface through aperature window. When fence is needed, the driver
> helps to find one available fence reg and then configure it. After it is not used, the fence will be turned off and then be allocated
> for next usage. It doesn't rely on the state of fence reg.  In such case we don't need to worry about the unsteady state.
> 
>       For the VGPU operation: The op of fence reg is trapped.  Then the gvt-g will follow the trapped value to program the fence_reg.
> (It will turn off and then write the expected value for any trapped write op of fence reg). The trapped op in GVT-g is safe.
> 
>       Based on the current logic,  it needs the five traps when one fence reg is configured under VGPU mode.(Three writes, two reads). 
> If it is programmed in one 64-bit op under VGPU mode, only one trap is needed. And the GVT-g still can configure the expected fence_value.
> As the trap is quite heavy for VGPU, the trap time can be saved.

But the argument is can we avoid it entirely by never changing the
fence. You say this is used for mapping through the aperture (GTT), we
say userspace shouldn't be doing that for performance reasons :)
A slow trap on top of a slow operation that is already causing
contention seems more sensible to fix at source. (Albeit so long as the
maintenance burden is considered and found to be reasonable, adding
special cases with their rationale is acceptable.) So you have to sell
why this mmio is worthy of special attention and curtail any future
questions.
-Chris
Zhao, Yakui July 3, 2018, 1:58 p.m. UTC | #7
>-----Original Message-----

>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]

>Sent: Tuesday, July 3, 2018 9:25 PM

>To: Zhao, Yakui <yakui.zhao@intel.com>; Daniel Vetter <daniel@ffwll.ch>

>Cc: intel-gfx@lists.freedesktop.org

>Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on

>VGPU

>

>Quoting Zhao, Yakui (2018-07-03 13:47:46)

>>

>> >-----Original Message-----

>> >From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of

>> >Daniel Vetter

>> >Sent: Tuesday, July 3, 2018 5:52 PM

>> >To: Chris Wilson <chris@chris-wilson.co.uk>

>> >Cc: Daniel Vetter <daniel@ffwll.ch>; Zhao, Yakui

>> ><yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org

>> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg

>> >only once on VGPU

>> >

>> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:

>> >> Quoting Daniel Vetter (2018-07-03 09:51:03)

>> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:

>> >> > > On VGPU scenario the read/write operation of fence_reg will be

>> >> > > trapped by the GVT-g. And then gvt-g follows the HW spec to

>> >> > > write the

>> >fence_reg.

>> >> > > So it is unnecessary to read/write fence reg several times.

>> >> > > This will help to reduce the unnecessary trap of fence_reg mmio

>operation.

>> >> > >

>> >> > > V1->V2: Fix one typo error of parameter when calling

>> >> > > V1->intel_vgpu_active

>> >> > >

>> >> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>

>> >> >

>> >> > Ok this makes more sense. Except you need to put the 64bit

>> >> > entirely into the vpgu block, with a comment explaining why this

>> >> > is safe (since the vpgu will take care of updating fences correctly).

>> >>

>> >> Except, who cares? Are fence registers being rewritten that

>> >> frequently that special casing vgpu is worth the hassle. Part of

>> >> that is that you need to leave a hint behind in the code that (a)

>> >> explains why it is safe after having the "here be dragons" and (b) why we

>care.

>> >>

>> >> On a more pragmatic level if fencing doesn't plateau out to steady

>> >> state, that is a worrying amount of contention -- the actual fence

>> >> write itself would be the least of my worries.

>> >

>> >I can easily imagine that with the few per-client fences vgpu hands

>> >out rewrites are much more common. But yeah some real data would be

>good.

>> >And more reasons to get mesa off of the gtt mmaps.

>>

>> Hi, Daniel/Chris

>>

>>       Thanks for your comments.

>>       The fence reg is used to assure the access of Tiled surface

>> through aperature window. When fence is needed, the driver helps to

>> find one available fence reg and then configure it. After it is not used, the

>fence will be turned off and then be allocated for next usage. It doesn't rely on

>the state of fence reg.  In such case we don't need to worry about the

>unsteady state.

>>

>>       For the VGPU operation: The op of fence reg is trapped.  Then the gvt-g

>will follow the trapped value to program the fence_reg.

>> (It will turn off and then write the expected value for any trapped write op

>of fence reg). The trapped op in GVT-g is safe.

>>

>>       Based on the current logic,  it needs the five traps when one fence reg is

>configured under VGPU mode.(Three writes, two reads).

>> If it is programmed in one 64-bit op under VGPU mode, only one trap is

>needed. And the GVT-g still can configure the expected fence_value.

>> As the trap is quite heavy for VGPU, the trap time can be saved.

>

>But the argument is can we avoid it entirely by never changing the fence. You

>say this is used for mapping through the aperture (GTT), we say userspace

>shouldn't be doing that for performance reasons :) A slow trap on top of a

>slow operation that is already causing contention seems more sensible to fix

>at source. (Albeit so long as the maintenance burden is considered and found

>to be reasonable, adding special cases with their rationale is acceptable.) So

>you have to sell why this mmio is worthy of special attention and curtail any

>future questions.


If the userspace driver/app can take care of the buffer allocation especially for the tiled
surface, maybe it can reduce the ratio of changing the fence. But this can't be avoided if the tiled
buffer is needed and allocated. This also depends on the userspace driver. And it is beyond the 
responsibility of the kernel driver. 

If it is configured in non-VGPU mode, the several writes of fence_reg doesn't matter. But under
the VGPU mode, the trap of fence_reg will cause that it exits the mode of Virtual machine. After the trap
emulation is finished, it can return to the guest OS and then resume the following sequence(For
example: On KVMGT it will exit to the Linux host from the guest OS.). The exit of guest OS is quite
costing. (For example: It needs to check the exit reason and check who/how to do the trap emulation).
As it is mentioned in the previous email, the current sequence on VGPU needs five traps when one fence reg
Is programmed.(Three writes, two reads). If only one trap is needed while it can configure the fence 
reg correctly, it can save the time of unnecessary traps for fence_reg programming. Of course it will
help to improve the efficiency on the guest OS.

>-Chris
Chris Wilson July 3, 2018, 2:07 p.m. UTC | #8
Quoting Zhao, Yakui (2018-07-03 14:58:31)
> >-----Original Message-----
> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> >Sent: Tuesday, July 3, 2018 9:25 PM
> >To: Zhao, Yakui <yakui.zhao@intel.com>; Daniel Vetter <daniel@ffwll.ch>
> >Cc: intel-gfx@lists.freedesktop.org
> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on
> >VGPU
> >
> >Quoting Zhao, Yakui (2018-07-03 13:47:46)
> >>
> >> >-----Original Message-----
> >> >From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> >> >Daniel Vetter
> >> >Sent: Tuesday, July 3, 2018 5:52 PM
> >> >To: Chris Wilson <chris@chris-wilson.co.uk>
> >> >Cc: Daniel Vetter <daniel@ffwll.ch>; Zhao, Yakui
> >> ><yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org
> >> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >only once on VGPU
> >> >
> >> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> >> >> Quoting Daniel Vetter (2018-07-03 09:51:03)
> >> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> >> >> > > On VGPU scenario the read/write operation of fence_reg will be
> >> >> > > trapped by the GVT-g. And then gvt-g follows the HW spec to
> >> >> > > write the
> >> >fence_reg.
> >> >> > > So it is unnecessary to read/write fence reg several times.
> >> >> > > This will help to reduce the unnecessary trap of fence_reg mmio
> >operation.
> >> >> > >
> >> >> > > V1->V2: Fix one typo error of parameter when calling
> >> >> > > V1->intel_vgpu_active
> >> >> > >
> >> >> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
> >> >> >
> >> >> > Ok this makes more sense. Except you need to put the 64bit
> >> >> > entirely into the vpgu block, with a comment explaining why this
> >> >> > is safe (since the vpgu will take care of updating fences correctly).
> >> >>
> >> >> Except, who cares? Are fence registers being rewritten that
> >> >> frequently that special casing vgpu is worth the hassle. Part of
> >> >> that is that you need to leave a hint behind in the code that (a)
> >> >> explains why it is safe after having the "here be dragons" and (b) why we
> >care.
> >> >>
> >> >> On a more pragmatic level if fencing doesn't plateau out to steady
> >> >> state, that is a worrying amount of contention -- the actual fence
> >> >> write itself would be the least of my worries.
> >> >
> >> >I can easily imagine that with the few per-client fences vgpu hands
> >> >out rewrites are much more common. But yeah some real data would be
> >good.
> >> >And more reasons to get mesa off of the gtt mmaps.
> >>
> >> Hi, Daniel/Chris
> >>
> >>       Thanks for your comments.
> >>       The fence reg is used to assure the access of Tiled surface
> >> through aperature window. When fence is needed, the driver helps to
> >> find one available fence reg and then configure it. After it is not used, the
> >fence will be turned off and then be allocated for next usage. It doesn't rely on
> >the state of fence reg.  In such case we don't need to worry about the
> >unsteady state.
> >>
> >>       For the VGPU operation: The op of fence reg is trapped.  Then the gvt-g
> >will follow the trapped value to program the fence_reg.
> >> (It will turn off and then write the expected value for any trapped write op
> >of fence reg). The trapped op in GVT-g is safe.
> >>
> >>       Based on the current logic,  it needs the five traps when one fence reg is
> >configured under VGPU mode.(Three writes, two reads).
> >> If it is programmed in one 64-bit op under VGPU mode, only one trap is
> >needed. And the GVT-g still can configure the expected fence_value.
> >> As the trap is quite heavy for VGPU, the trap time can be saved.
> >
> >But the argument is can we avoid it entirely by never changing the fence. You
> >say this is used for mapping through the aperture (GTT), we say userspace
> >shouldn't be doing that for performance reasons :) A slow trap on top of a
> >slow operation that is already causing contention seems more sensible to fix
> >at source. (Albeit so long as the maintenance burden is considered and found
> >to be reasonable, adding special cases with their rationale is acceptable.) So
> >you have to sell why this mmio is worthy of special attention and curtail any
> >future questions.
> 
> If the userspace driver/app can take care of the buffer allocation especially for the tiled
> surface, maybe it can reduce the ratio of changing the fence. But this can't be avoided if the tiled
> buffer is needed and allocated. This also depends on the userspace driver. And it is beyond the 
> responsibility of the kernel driver. 

We own the stack. If userspace is not behaving as well as it might, we
need to let them know. Improving algorithms is several orders more
beneficial than micro-optimising the implementation. (Speaking as one
who sadly does do much of the latter.) But you are asking us to maintain
this path with neither any CI coverage nor any indication of the
userspace impact.
 
> If it is configured in non-VGPU mode, the several writes of fence_reg doesn't matter. But under
> the VGPU mode, the trap of fence_reg will cause that it exits the mode of Virtual machine. After the trap
> emulation is finished, it can return to the guest OS and then resume the following sequence(For
> example: On KVMGT it will exit to the Linux host from the guest OS.). The exit of guest OS is quite
> costing. (For example: It needs to check the exit reason and check who/how to do the trap emulation).
> As it is mentioned in the previous email, the current sequence on VGPU needs five traps when one fence reg
> Is programmed.(Three writes, two reads). If only one trap is needed while it can configure the fence 
> reg correctly, it can save the time of unnecessary traps for fence_reg programming. Of course it will
> help to improve the efficiency on the guest OS.

Numbers help, we understand why it is slow to do several traps versus
one, what we don't know is what impact that actually has to userspace,
and whether it is worth optimising this versus every other register.

What is the before after of something like gem_fence_upload?
-Chris
Zhao, Yakui July 4, 2018, 2:09 a.m. UTC | #9
>-----Original Message-----

>From: Chris Wilson [mailto:chris@chris-wilson.co.uk]

>Sent: Tuesday, July 3, 2018 10:08 PM

>To: Zhao, Yakui <yakui.zhao@intel.com>; Daniel Vetter <daniel@ffwll.ch>

>Cc: intel-gfx@lists.freedesktop.org

>Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on

>VGPU

>

>Quoting Zhao, Yakui (2018-07-03 14:58:31)

>> >-----Original Message-----

>> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]

>> >Sent: Tuesday, July 3, 2018 9:25 PM

>> >To: Zhao, Yakui <yakui.zhao@intel.com>; Daniel Vetter

>> ><daniel@ffwll.ch>

>> >Cc: intel-gfx@lists.freedesktop.org

>> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg

>> >only once on VGPU

>> >

>> >Quoting Zhao, Yakui (2018-07-03 13:47:46)

>> >>

>> >> >-----Original Message-----

>> >> >From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of

>> >> >Daniel Vetter

>> >> >Sent: Tuesday, July 3, 2018 5:52 PM

>> >> >To: Chris Wilson <chris@chris-wilson.co.uk>

>> >> >Cc: Daniel Vetter <daniel@ffwll.ch>; Zhao, Yakui

>> >> ><yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org

>> >> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg

>> >> >only once on VGPU

>> >> >

>> >> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:

>> >> >> Quoting Daniel Vetter (2018-07-03 09:51:03)

>> >> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:

>> >> >> > > On VGPU scenario the read/write operation of fence_reg will

>> >> >> > > be trapped by the GVT-g. And then gvt-g follows the HW spec

>> >> >> > > to write the

>> >> >fence_reg.

>> >> >> > > So it is unnecessary to read/write fence reg several times.

>> >> >> > > This will help to reduce the unnecessary trap of fence_reg

>> >> >> > > mmio

>> >operation.

>> >> >> > >

>> >> >> > > V1->V2: Fix one typo error of parameter when calling

>> >> >> > > V1->intel_vgpu_active

>> >> >> > >

>> >> >> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>

>> >> >> >

>> >> >> > Ok this makes more sense. Except you need to put the 64bit

>> >> >> > entirely into the vpgu block, with a comment explaining why

>> >> >> > this is safe (since the vpgu will take care of updating fences correctly).

>> >> >>

>> >> >> Except, who cares? Are fence registers being rewritten that

>> >> >> frequently that special casing vgpu is worth the hassle. Part of

>> >> >> that is that you need to leave a hint behind in the code that

>> >> >> (a) explains why it is safe after having the "here be dragons"

>> >> >> and (b) why we

>> >care.

>> >> >>

>> >> >> On a more pragmatic level if fencing doesn't plateau out to

>> >> >> steady state, that is a worrying amount of contention -- the

>> >> >> actual fence write itself would be the least of my worries.

>> >> >

>> >> >I can easily imagine that with the few per-client fences vgpu

>> >> >hands out rewrites are much more common. But yeah some real data

>> >> >would be

>> >good.

>> >> >And more reasons to get mesa off of the gtt mmaps.

>> >>

>> >> Hi, Daniel/Chris

>> >>

>> >>       Thanks for your comments.

>> >>       The fence reg is used to assure the access of Tiled surface

>> >> through aperature window. When fence is needed, the driver helps to

>> >> find one available fence reg and then configure it. After it is not

>> >> used, the

>> >fence will be turned off and then be allocated for next usage. It

>> >doesn't rely on the state of fence reg.  In such case we don't need

>> >to worry about the unsteady state.

>> >>

>> >>       For the VGPU operation: The op of fence reg is trapped.  Then

>> >> the gvt-g

>> >will follow the trapped value to program the fence_reg.

>> >> (It will turn off and then write the expected value for any trapped

>> >> write op

>> >of fence reg). The trapped op in GVT-g is safe.

>> >>

>> >>       Based on the current logic,  it needs the five traps when one

>> >> fence reg is

>> >configured under VGPU mode.(Three writes, two reads).

>> >> If it is programmed in one 64-bit op under VGPU mode, only one trap

>> >> is

>> >needed. And the GVT-g still can configure the expected fence_value.

>> >> As the trap is quite heavy for VGPU, the trap time can be saved.

>> >

>> >But the argument is can we avoid it entirely by never changing the

>> >fence. You say this is used for mapping through the aperture (GTT),

>> >we say userspace shouldn't be doing that for performance reasons :) A

>> >slow trap on top of a slow operation that is already causing

>> >contention seems more sensible to fix at source. (Albeit so long as

>> >the maintenance burden is considered and found to be reasonable,

>> >adding special cases with their rationale is acceptable.) So you have

>> >to sell why this mmio is worthy of special attention and curtail any future

>questions.

>>

>> If the userspace driver/app can take care of the buffer allocation

>> especially for the tiled surface, maybe it can reduce the ratio of

>> changing the fence. But this can't be avoided if the tiled buffer is

>> needed and allocated. This also depends on the userspace driver. And it is

>beyond the responsibility of the kernel driver.

>

>We own the stack. If userspace is not behaving as well as it might, we need to

>let them know. Improving algorithms is several orders more beneficial than

>micro-optimising the implementation. (Speaking as one who sadly does do

>much of the latter.) But you are asking us to maintain this path with neither

>any CI coverage nor any indication of the userspace impact.

>

>> If it is configured in non-VGPU mode, the several writes of fence_reg

>> doesn't matter. But under the VGPU mode, the trap of fence_reg will

>> cause that it exits the mode of Virtual machine. After the trap

>> emulation is finished, it can return to the guest OS and then resume

>> the following sequence(For

>> example: On KVMGT it will exit to the Linux host from the guest OS.).

>> The exit of guest OS is quite costing. (For example: It needs to check the exit

>reason and check who/how to do the trap emulation).

>> As it is mentioned in the previous email, the current sequence on VGPU

>> needs five traps when one fence reg Is programmed.(Three writes, two

>> reads). If only one trap is needed while it can configure the fence

>> reg correctly, it can save the time of unnecessary traps for fence_reg

>programming. Of course it will help to improve the efficiency on the guest OS.

>

>Numbers help, we understand why it is slow to do several traps versus one,

>what we don't know is what impact that actually has to userspace, and

>whether it is worth optimising this versus every other register.

>

Hi, Chris

       The userspace driver/library on guest OS has no sense that it is running on the virtual machine.
In such case the trap of fence_reg on VGPU is transparent to the userspace driver/library. 
Only when trying to run the performance test, the performance will be affected by so many traps.

If the trap of other register can also be optimized, it will help the performance of guest OS.  Of course the
Prerequisite is that  we collect the corresponding trap info when running the corresponding workload.
This is case by case.

>What is the before after of something like gem_fence_upload?


I check the case of gem_fence_upload. It only checks the uploading performance after fence is already programmed.
So this patch has no impact on gem_fence_upload.

>-Chris
Chris Wilson July 4, 2018, 9:40 a.m. UTC | #10
Quoting Zhao, Yakui (2018-07-04 03:09:03)
> >-----Original Message-----
> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> >Sent: Tuesday, July 3, 2018 10:08 PM
> >To: Zhao, Yakui <yakui.zhao@intel.com>; Daniel Vetter <daniel@ffwll.ch>
> >Cc: intel-gfx@lists.freedesktop.org
> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on
> >VGPU
> >
> >Quoting Zhao, Yakui (2018-07-03 14:58:31)
> >> >-----Original Message-----
> >> >From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> >> >Sent: Tuesday, July 3, 2018 9:25 PM
> >> >To: Zhao, Yakui <yakui.zhao@intel.com>; Daniel Vetter
> >> ><daniel@ffwll.ch>
> >> >Cc: intel-gfx@lists.freedesktop.org
> >> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >only once on VGPU
> >> >
> >> >Quoting Zhao, Yakui (2018-07-03 13:47:46)
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> >> >> >Daniel Vetter
> >> >> >Sent: Tuesday, July 3, 2018 5:52 PM
> >> >> >To: Chris Wilson <chris@chris-wilson.co.uk>
> >> >> >Cc: Daniel Vetter <daniel@ffwll.ch>; Zhao, Yakui
> >> >> ><yakui.zhao@intel.com>; intel-gfx@lists.freedesktop.org
> >> >> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >> >only once on VGPU
> >> >> >
> >> >> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> >> >> >> Quoting Daniel Vetter (2018-07-03 09:51:03)
> >> >> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> >> >> >> > > On VGPU scenario the read/write operation of fence_reg will
> >> >> >> > > be trapped by the GVT-g. And then gvt-g follows the HW spec
> >> >> >> > > to write the
> >> >> >fence_reg.
> >> >> >> > > So it is unnecessary to read/write fence reg several times.
> >> >> >> > > This will help to reduce the unnecessary trap of fence_reg
> >> >> >> > > mmio
> >> >operation.
> >> >> >> > >
> >> >> >> > > V1->V2: Fix one typo error of parameter when calling
> >> >> >> > > V1->intel_vgpu_active
> >> >> >> > >
> >> >> >> > > Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
> >> >> >> >
> >> >> >> > Ok this makes more sense. Except you need to put the 64bit
> >> >> >> > entirely into the vpgu block, with a comment explaining why
> >> >> >> > this is safe (since the vpgu will take care of updating fences correctly).
> >> >> >>
> >> >> >> Except, who cares? Are fence registers being rewritten that
> >> >> >> frequently that special casing vgpu is worth the hassle. Part of
> >> >> >> that is that you need to leave a hint behind in the code that
> >> >> >> (a) explains why it is safe after having the "here be dragons"
> >> >> >> and (b) why we
> >> >care.
> >> >> >>
> >> >> >> On a more pragmatic level if fencing doesn't plateau out to
> >> >> >> steady state, that is a worrying amount of contention -- the
> >> >> >> actual fence write itself would be the least of my worries.
> >> >> >
> >> >> >I can easily imagine that with the few per-client fences vgpu
> >> >> >hands out rewrites are much more common. But yeah some real data
> >> >> >would be
> >> >good.
> >> >> >And more reasons to get mesa off of the gtt mmaps.
> >> >>
> >> >> Hi, Daniel/Chris
> >> >>
> >> >>       Thanks for your comments.
> >> >>       The fence reg is used to assure the access of Tiled surface
> >> >> through aperature window. When fence is needed, the driver helps to
> >> >> find one available fence reg and then configure it. After it is not
> >> >> used, the
> >> >fence will be turned off and then be allocated for next usage. It
> >> >doesn't rely on the state of fence reg.  In such case we don't need
> >> >to worry about the unsteady state.
> >> >>
> >> >>       For the VGPU operation: The op of fence reg is trapped.  Then
> >> >> the gvt-g
> >> >will follow the trapped value to program the fence_reg.
> >> >> (It will turn off and then write the expected value for any trapped
> >> >> write op
> >> >of fence reg). The trapped op in GVT-g is safe.
> >> >>
> >> >>       Based on the current logic,  it needs the five traps when one
> >> >> fence reg is
> >> >configured under VGPU mode.(Three writes, two reads).
> >> >> If it is programmed in one 64-bit op under VGPU mode, only one trap
> >> >> is
> >> >needed. And the GVT-g still can configure the expected fence_value.
> >> >> As the trap is quite heavy for VGPU, the trap time can be saved.
> >> >
> >> >But the argument is can we avoid it entirely by never changing the
> >> >fence. You say this is used for mapping through the aperture (GTT),
> >> >we say userspace shouldn't be doing that for performance reasons :) A
> >> >slow trap on top of a slow operation that is already causing
> >> >contention seems more sensible to fix at source. (Albeit so long as
> >> >the maintenance burden is considered and found to be reasonable,
> >> >adding special cases with their rationale is acceptable.) So you have
> >> >to sell why this mmio is worthy of special attention and curtail any future
> >questions.
> >>
> >> If the userspace driver/app can take care of the buffer allocation
> >> especially for the tiled surface, maybe it can reduce the ratio of
> >> changing the fence. But this can't be avoided if the tiled buffer is
> >> needed and allocated. This also depends on the userspace driver. And it is
> >beyond the responsibility of the kernel driver.
> >
> >We own the stack. If userspace is not behaving as well as it might, we need to
> >let them know. Improving algorithms is several orders more beneficial than
> >micro-optimising the implementation. (Speaking as one who sadly does do
> >much of the latter.) But you are asking us to maintain this path with neither
> >any CI coverage nor any indication of the userspace impact.
> >
> >> If it is configured in non-VGPU mode, the several writes of fence_reg
> >> doesn't matter. But under the VGPU mode, the trap of fence_reg will
> >> cause that it exits the mode of Virtual machine. After the trap
> >> emulation is finished, it can return to the guest OS and then resume
> >> the following sequence(For
> >> example: On KVMGT it will exit to the Linux host from the guest OS.).
> >> The exit of guest OS is quite costing. (For example: It needs to check the exit
> >reason and check who/how to do the trap emulation).
> >> As it is mentioned in the previous email, the current sequence on VGPU
> >> needs five traps when one fence reg Is programmed.(Three writes, two
> >> reads). If only one trap is needed while it can configure the fence
> >> reg correctly, it can save the time of unnecessary traps for fence_reg
> >programming. Of course it will help to improve the efficiency on the guest OS.
> >
> >Numbers help, we understand why it is slow to do several traps versus one,
> >what we don't know is what impact that actually has to userspace, and
> >whether it is worth optimising this versus every other register.
> >
> Hi, Chris
> 
>        The userspace driver/library on guest OS has no sense that it is running on the virtual machine.

And? Fence thrashing is bad irrespective of virtualisation or not.
Improving algorithms to avoid fencing should be effective whether or
not userspace is running on a mediated vgpu, and avoiding everything in
the path leading to rewriting fences (struct_mutex!) so much effective
than improving the traps.

It is not that one precludes the other, it's just the
micro-optimisations still need to be justified, or they'll be forgotten
and removed.

> In such case the trap of fence_reg on VGPU is transparent to the userspace driver/library. 
> Only when trying to run the performance test, the performance will be affected by so many traps.

But how much so?
 
> If the trap of other register can also be optimized, it will help the performance of guest OS.  Of course the
> Prerequisite is that  we collect the corresponding trap info when running the corresponding workload.
> This is case by case.
>
> >What is the before after of something like gem_fence_upload?
> 
> I check the case of gem_fence_upload. It only checks the uploading performance after fence is already programmed.
> So this patch has no impact on gem_fence_upload.

Incorrect. By the same token, you haven't measured yourself the impact
of this change?
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index d92fe03..9c97976 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -95,11 +95,15 @@  static void i965_write_fence_reg(struct drm_i915_fence_reg *fence,
 
 	if (INTEL_GEN(fence->i915) >= 6) {
 		/* Use the 64-bit RW to read/write fence reg on SNB+ */
-		I915_WRITE64_FW(fence_reg_lo, 0);
-		I915_READ64(fence_reg_lo);
-
-		I915_WRITE64_FW(fence_reg_lo, val);
-		I915_READ64(fence_reg_lo);
+		if (intel_vgpu_active(dev_priv))
+			I915_WRITE64_FW(fence_reg_lo, val);
+		else {
+			I915_WRITE64_FW(fence_reg_lo, 0);
+			I915_READ64(fence_reg_lo);
+
+			I915_WRITE64_FW(fence_reg_lo, val);
+			I915_READ64(fence_reg_lo);
+		}
 	} else {
 		/* To w/a incoherency with non-atomic 64-bit register updates,
 		 * we split the 64-bit update into two 32-bit writes. In order