[4/7] drm/i915: always pin lrc context for vgpu with Intel GVT-g
diff mbox

Message ID 1440056724-26976-5-git-send-email-zhiyuan.lv@intel.com
State New
Headers show

Commit Message

Zhiyuan Lv Aug. 20, 2015, 7:45 a.m. UTC
Intel GVT-g will perform EXECLIST context shadowing and ring buffer
shadowing. The shadow copy is created when the guest creates a context.
If a context changes its LRCA, it is hard for the hypervisor to tell
whether it is a new context or not. We therefore always pin context
objects into the global GTT to make life easier.

Signed-off-by: Zhiyuan Lv <zhiyuan.lv@intel.com>
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Chris Wilson Aug. 20, 2015, 8:36 a.m. UTC | #1
On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> shadowing. The shadow copy is created when guest creates a context.
> If a context changes its LRCA address, the hypervisor is hard to know
> whether it is a new context or not. We always pin context objects to
> global GTT to make life easier.

Nak. Please explain why we need to work around a bug in the host. We
cannot pin the context as that breaks userspace (e.g. synmark), which
can and will try to use more contexts than we have room for.
-Chris
Zhiyuan Lv Aug. 20, 2015, 9:16 a.m. UTC | #2
Hi Chris,

On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > shadowing. The shadow copy is created when guest creates a context.
> > If a context changes its LRCA address, the hypervisor is hard to know
> > whether it is a new context or not. We always pin context objects to
> > global GTT to make life easier.
> 
> Nak. Please explain why we need to workaround a bug in the host. We
> cannot pin the context as that breaks userspace (e.g. synmark) who can
> and will try to use more contexts than we have room.

This change is only for i915 running inside a VM. Host i915 driver's
behavior will not be impacted.

The reason we want to pin the context is that our hypervisor identifies
guest contexts by their LRCA, and needs the LRCA to stay unchanged over
a context's life cycle so that the shadow copies can easily find their
counterparts. If we cannot pin them, we have to consider a more
complicated shadow implementation ...
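To illustrate the problem, here is a toy userspace model (not XenGT code; all names are invented): the host keeps a table mapping each guest context's LRCA, i.e. its GGTT offset, to the shadow copy. If the guest unpins a context and it is later repinned at a different GGTT address, a lookup by the new LRCA misses, and the hypervisor cannot tell a moved context from a brand-new one:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_SHADOW 16

struct shadow_ctx {
    uint32_t guest_lrca;   /* guest context's GGTT offset, used as the ID */
    int      in_use;
};

static struct shadow_ctx shadow_table[MAX_SHADOW];

/* Look up the shadow copy for a guest context by its LRCA. */
struct shadow_ctx *shadow_find(uint32_t lrca)
{
    for (int i = 0; i < MAX_SHADOW; i++)
        if (shadow_table[i].in_use && shadow_table[i].guest_lrca == lrca)
            return &shadow_table[i];
    return NULL;   /* unknown LRCA: a new context, or a repinned old one? */
}

/* Create a shadow copy keyed by the guest context's current LRCA. */
struct shadow_ctx *shadow_create(uint32_t lrca)
{
    for (int i = 0; i < MAX_SHADOW; i++) {
        if (!shadow_table[i].in_use) {
            shadow_table[i].in_use = 1;
            shadow_table[i].guest_lrca = lrca;
            return &shadow_table[i];
        }
    }
    return NULL;
}
```

If the same context comes back at LRCA 0x2000 after being shadowed at 0x1000, `shadow_find` returns NULL and the host would wrongly create a second shadow, which is exactly why the patch wants the LRCA pinned.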

BTW Chris, are there many apps like synmark that may use up the GGTT
with contexts if they are all pinned? Thanks!

Regards,
-Zhiyuan

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
Zhiyuan Lv Aug. 21, 2015, 6:13 a.m. UTC | #3
Hi Chris,

If we cannot always pin the LR context into the GGTT, the LRCA cannot
be used as a context identifier for us. Then we have to consider a
proper interface for i915 in the VM to notify the GVT-g device model.

Context creation might be OK. We can pin the context first, then
notify the context creation, and after that unpin the context. In
GVT-g, we can get the context's guest physical addresses from the GTT
table, and use those to identify a guest context.
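The idea above, identifying a context by the guest physical pages behind it rather than by its LRCA, might look like this toy model (invented names; the real GTT walk is far more involved):

```c
#include <stdint.h>

/* Toy guest GGTT: maps a GGTT page index to a guest physical address.
 * All names here are illustrative, not actual GVT-g code. */
#define GGTT_ENTRIES 1024
#define PAGE_SHIFT   12

static uint64_t ggtt[GGTT_ENTRIES];

/* Resolve the context's first backing page through the GTT table.
 * The backing page survives unpin/repin (only the GGTT offset, the
 * LRCA, changes), so its address can serve as a stable context ID --
 * as long as the page is never swapped out. */
uint64_t ctx_id_from_gpa(uint32_t lrca)
{
    uint32_t idx = lrca >> PAGE_SHIFT;
    return idx < GGTT_ENTRIES ? ggtt[idx] : 0;
}
```

With this scheme the same backing page yields the same ID whether the context sits at LRCA 0x1000 or 0x2000, which is what makes the create-time notification workable; destruction remains the open problem described below.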

But destruction can be a problem. It does not make sense to pin the
context and then unpin it again at that point, right?

Do you have any suggestions/comments? Thanks in advance!

Regards,
-Zhiyuan

Zhiyuan Lv Aug. 24, 2015, 10:04 a.m. UTC | #4
Hi Chris,

On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > shadowing. The shadow copy is created when guest creates a context.
> > If a context changes its LRCA address, the hypervisor is hard to know
> > whether it is a new context or not. We always pin context objects to
> > global GTT to make life easier.
> 
> Nak. Please explain why we need to workaround a bug in the host. We
> cannot pin the context as that breaks userspace (e.g. synmark) who can
> and will try to use more contexts than we have room.

Could you have a look at the reasons below and kindly give us your input?

1, Due to GGTT partitioning, the global graphics memory available
inside a virtual machine is much smaller than in the native case. We
cannot support some graphics-memory-intensive workloads anyway. So it
looks affordable to just pin contexts, which do not take much GGTT.
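For scale, a back-of-envelope calculation (the numbers are illustrative assumptions, not figures from the thread):

```c
#include <stdint.h>

/* Rough GGTT footprint of pinned contexts.  Assumed sizes: ~20 pages
 * of 4 KiB per execlist context image, and a 512 MiB guest GGTT
 * partition. */
#define PAGE_SZ          4096ULL
#define CTX_PAGES        20ULL              /* assumed context image size */
#define GUEST_GGTT_SIZE  (512ULL << 20)     /* assumed guest partition */

uint64_t ctx_footprint(void)
{
    return CTX_PAGES * PAGE_SZ;             /* 80 KiB per context */
}

uint64_t contexts_that_fit(void)
{
    /* Thousands of contexts fit, so each pinned context is cheap --
     * but nothing bounds how many contexts userspace may create. */
    return GUEST_GGTT_SIZE / ctx_footprint();
}
```

Under these assumptions each context costs about 80 KiB and several thousand would fit, which is the "does not take much GGTT" argument; the unbounded context count is the counter-argument raised in the replies.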

2, Our hypervisor needs to modify the i915 guest context in the shadow
context implementation. That part will be tricky if the context is not
always pinned. One scenario is that when a context finishes running,
we need to copy the shadow context, which has been updated by
hardware, back to the guest context. The hypervisor learns of context
completion from the context switch interrupt, but by that time the
shrinker may have unpinned the context and its backing storage may
have been swapped out. Such a copy may fail.

Look forward to your comments. Thanks!

Regards,
-Zhiyuan


> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
Chris Wilson Aug. 24, 2015, 10:23 a.m. UTC | #5
On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> Hi Chris,
> 
> On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> > On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > > shadowing. The shadow copy is created when guest creates a context.
> > > If a context changes its LRCA address, the hypervisor is hard to know
> > > whether it is a new context or not. We always pin context objects to
> > > global GTT to make life easier.
> > 
> > Nak. Please explain why we need to workaround a bug in the host. We
> > cannot pin the context as that breaks userspace (e.g. synmark) who can
> > and will try to use more contexts than we have room.
> 
> Could you have a look at below reasons and kindly give us your inputs?
> 
> 1, Due to the GGTT partitioning, the global graphics memory available
> inside virtual machines is much smaller than native case. We cannot
> support some graphics memory intensive workloads anyway. So it looks
> affordable to just pin contexts which do not take much GGTT.

Wrong. It exposes the guest to a trivial denial-of-service attack. A
smaller GGTT does not actually limit clients (there is greater aperture
pressure and some paths are less likely but an individual client will
function just fine).
 
> 2, Our hypervisor needs to change i915 guest context in the shadow
> context implementation. That part will be tricky if the context is not
> always pinned. One scenario is that when a context finishes running,
> we need to copy shadow context, which has been updated by hardware, to
> guest context. The hypervisor knows context finishing by context
> interrupt, but that time shrinker may have unpin the context and its
> backing storage may have been swap-out. Such copy may fail. 

That is just a bug in your code. Firstly, allowing swapout on an object
you are still using; secondly, not being able to swap it back in.
-Chris
Wang, Zhi A Aug. 24, 2015, 5:18 p.m. UTC | #6
Attaching the big picture to help the discussion:

Windows Guest     Linux Guest      Linux Guest (i915 guest mode)  * We are here
+--------------+ +--------------+ +-------------------------------------------+ 
|              | |              | |    Guest Context Lifecycle Management     |
|Windows Driver| |     i915     | | +---------------------------------------+ |
|              | |          +---->| |        CREATE/DESTROY/PIN/UNPIN       | |
| (GUEST MODE) | | (GUEST MODE) | | | +----------------+ +----------------+ | | 
|              | |              | | | |Execlist Context| |Execlist Context| | | 
|              | |              | | | +-----^----------+ +------^---------+ | |
|              | |              | | +-------|-------------------|-----------+ | 
+--------------+ +--------------+ +---------|-------------------|-------------+
                                            |    NOTIFICATION   | 
                  +-------------------------|-------------------|-------------+
                  |               XenGT Shadow Context Lifecycle|Management   |
                  |                         |                   |             | 
                  |               +---------|-------------------|----------+  |  
                  | +-----------+ | +-------v-------+ +---------v------+   |  | 
                  | | i915      | | | Shadow Context| | Shadow Context |   |  | 
                  | | Host Mode | | +---------------+ +----------------+   |  |
                  | +-----------+ +----------------------------------------+  | 
                  |             DOM0 Linux (i915 host mode w/XenGT)           | 
                  +-----------------------------------------------------------+
                  +-----------------------------------------------------------+ 
                  |                     Hypervisor                            | 
                  +-----------------------------------------------------------+

SHADOW CONTEXT SUBMISSION

As you can see in this picture, each guest execlist context has a related shadow context in the host, and XenGT is responsible for:
a. updating the SHADOW context from the GUEST context when the guest wants to submit a workload;
b. submitting the shadow context to hardware;
c. updating the guest context from the shadow context when the shadow context is finished;
d. injecting a virtual context switch interrupt into the guest.

Then the guest will see: "hey, my job was retired from hardware, so now I can do something with my context."
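The a-d cycle above can be sketched as plain data flow (a self-contained toy model, not the XenGT implementation; all names are invented):

```c
/* Toy model of the shadow submission cycle (steps a-d). */
struct ctx_image  { unsigned int tail; unsigned int seqno; };
struct guest_ctx  { struct ctx_image img; int irq_pending; };
struct shadow_ctx { struct ctx_image img; };

/* a. update the SHADOW context from the GUEST context on submission */
void shadow_update_from_guest(struct shadow_ctx *s, const struct guest_ctx *g)
{
    s->img = g->img;
}

/* b. submit to hardware -- here the "hardware" just advances the image */
void hw_run(struct shadow_ctx *s)
{
    s->img.seqno++;
}

/* c + d. copy the shadow image back to the guest context, and only
 * then raise the virtual context switch interrupt, so the guest never
 * sees a completion signal before its context has been updated. */
void complete_to_guest(struct guest_ctx *g, const struct shadow_ctx *s)
{
    g->img = s->img;
    g->irq_pending = 1;
}
```

The ordering in `complete_to_guest` matters: the copy-back must finish before the interrupt is injected, which is also the invariant Daniel points out later in the thread.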

NOTIFICATION BETWEEN HOST AND GUEST

In our design we have built a notification channel for the guest to notify XenGT, for performance reasons.
With this channel, the guest can say things like: "Hey XenGT, I created a new context, please shadow it! Oh, I have destroyed the context, please stop tracking it."

But when this trick meets guest pin/unpin, it has problems.

PROBLEMS

First, guest context pin/unpin causes the LRCA to change, which breaks the relationship between the guest context and the shadow context.
As you can see, in our design XenGT needs a way to find the related shadow context from something in the guest context.
For now we find the related shadow context via the guest context ID (the LRCA, in fact).
But whatever it is, I think there should be something unique.

Second, XenGT has no knowledge of guest context pin/unpin. The guest may swap out a context once it sees its seqno has passed.
When XenGT wants to access the guest context, it may no longer be there (already swapped out).
It is hard and complicated for XenGT to track guest contexts without a unique context ID and a stable execlist context backing store.
For now, these tricks only apply in a virtualization environment and will not affect native i915.

Welcome to discussions! :)

Thanks,
Zhi.

-----Original Message-----
From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
Sent: Monday, August 24, 2015 6:23 PM
To: intel-gfx@lists.freedesktop.org; igvt-g@lists.01.org; Wang, Zhi A; Tian, Kevin; joonas.lahtinen@linux.intel.com
Subject: Re: About the iGVT-g's requirement to pin guest contexts in VM

On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> Hi Chris,
> 
> On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> > On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > > Intel GVT-g will perform EXECLIST context shadowing and ring
> > > buffer shadowing. The shadow copy is created when guest creates a context.
> > > If a context changes its LRCA address, the hypervisor is hard to
> > > know whether it is a new context or not. We always pin context
> > > objects to global GTT to make life easier.
> > 
> > Nak. Please explain why we need to workaround a bug in the host. We
> > cannot pin the context as that breaks userspace (e.g. synmark) who
> > can and will try to use more contexts than we have room.
> 
> Could you have a look at below reasons and kindly give us your inputs?
> 
> 1, Due to the GGTT partitioning, the global graphics memory available
> inside virtual machines is much smaller than native case. We cannot
> support some graphics memory intensive workloads anyway. So it looks
> affordable to just pin contexts which do not take much GGTT.

Wrong. It exposes the guest to a trivial denial-of-service attack. A smaller GGTT does not actually limit clients (there is greater aperture pressure and some paths are less likely but an individual client will function just fine).

> 2, Our hypervisor needs to change i915 guest context in the shadow
> context implementation. That part will be tricky if the context is not
> always pinned. One scenario is that when a context finishes running,
> we need to copy shadow context, which has been updated by hardware, to
> guest context. The hypervisor knows context finishing by context
> interrupt, but that time shrinker may have unpin the context and its
> backing storage may have been swap-out. Such copy may fail.

That is just a bug in your code. Firstly allowing swapout on an object you still are using, secondly not being able to swapin.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Zhiyuan Lv Aug. 25, 2015, 12:17 a.m. UTC | #7
Hi Chris,

On Mon, Aug 24, 2015 at 11:23:13AM +0100, Chris Wilson wrote:
> On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> > Hi Chris,
> > 
> > On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> > > On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > > > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > > > shadowing. The shadow copy is created when guest creates a context.
> > > > If a context changes its LRCA address, the hypervisor is hard to know
> > > > whether it is a new context or not. We always pin context objects to
> > > > global GTT to make life easier.
> > > 
> > > Nak. Please explain why we need to workaround a bug in the host. We
> > > cannot pin the context as that breaks userspace (e.g. synmark) who can
> > > and will try to use more contexts than we have room.
> > 
> > Could you have a look at below reasons and kindly give us your inputs?
> > 
> > 1, Due to the GGTT partitioning, the global graphics memory available
> > inside virtual machines is much smaller than native case. We cannot
> > support some graphics memory intensive workloads anyway. So it looks
> > affordable to just pin contexts which do not take much GGTT.
> 
> Wrong. It exposes the guest to a trivial denial-of-service attack. A

Inside a VM, indeed.

> smaller GGTT does not actually limit clients (there is greater aperture
> pressure and some paths are less likely but an individual client will
> function just fine).
>  
> > 2, Our hypervisor needs to change i915 guest context in the shadow
> > context implementation. That part will be tricky if the context is not
> > always pinned. One scenario is that when a context finishes running,
> > we need to copy shadow context, which has been updated by hardware, to
> > guest context. The hypervisor knows context finishing by context
> > interrupt, but that time shrinker may have unpin the context and its
> > backing storage may have been swap-out. Such copy may fail. 
> 
> That is just a bug in your code. Firstly allowing swapout on an object
> you still are using, secondly not being able to swapin.

As Zhi replied in another email, we do not have knowledge of the guest
driver's swap operations. If we cannot pin the context, we may have to
ask the guest driver not to swap out context pages. Do you think that
would be the right way to go? Thanks!

Regards,
-Zhiyuan

> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Aug. 26, 2015, 8:56 a.m. UTC | #8
On Tue, Aug 25, 2015 at 08:17:05AM +0800, Zhiyuan Lv wrote:
> Hi Chris,
> 
> On Mon, Aug 24, 2015 at 11:23:13AM +0100, Chris Wilson wrote:
> > On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> > > Hi Chris,
> > > 
> > > On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> > > > On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > > > > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > > > > shadowing. The shadow copy is created when guest creates a context.
> > > > > If a context changes its LRCA address, the hypervisor is hard to know
> > > > > whether it is a new context or not. We always pin context objects to
> > > > > global GTT to make life easier.
> > > > 
> > > > Nak. Please explain why we need to workaround a bug in the host. We
> > > > cannot pin the context as that breaks userspace (e.g. synmark) who can
> > > > and will try to use more contexts than we have room.
> > > 
> > > Could you have a look at below reasons and kindly give us your inputs?
> > > 
> > > 1, Due to the GGTT partitioning, the global graphics memory available
> > > inside virtual machines is much smaller than native case. We cannot
> > > support some graphics memory intensive workloads anyway. So it looks
> > > affordable to just pin contexts which do not take much GGTT.
> > 
> > Wrong. It exposes the guest to a trivial denial-of-service attack. A
> 
> Inside a VM, indeed.
> 
> > smaller GGTT does not actually limit clients (there is greater aperture
> > pressure and some paths are less likely but an individual client will
> > function just fine).
> >  
> > > 2, Our hypervisor needs to change i915 guest context in the shadow
> > > context implementation. That part will be tricky if the context is not
> > > always pinned. One scenario is that when a context finishes running,
> > > we need to copy shadow context, which has been updated by hardware, to
> > > guest context. The hypervisor knows context finishing by context
> > > interrupt, but that time shrinker may have unpin the context and its
> > > backing storage may have been swap-out. Such copy may fail. 
> > 
> > That is just a bug in your code. Firstly allowing swapout on an object
> > you still are using, secondly not being able to swapin.
> 
> As Zhi replied in another email, we do not have the knowledge of guest
> driver's swap operations. If we cannot pin context, we may have to ask
> guest driver not to swap out context pages. Do you think that would be
> the right way to go? Thanks!

It doesn't matter at all - if the guest unpins the ctx and puts something
else in there before the host tells it that the ctx is completed, that's a
bug in the guest. Same with real hw, we guarantee that the context stays
around for long enough.

Also you obviously have to complete the copying from shadow->guest ctx
before you send the irq to the guest to signal ctx completion. Which means
there's really no overall problem here from a design pov, the only thing
you have to do is fix up bugs in the host code (probably you should just
write through the ggtt).
-Daniel
Wang, Zhi A Aug. 26, 2015, 4:42 p.m. UTC | #9
Attaching some introduction to help the discussion:

[ Current Implementation - Performance Oriented ! ]


             Guest create                   Guest submits                   
             a new context                  its context                     
                  +                                    +     ^              
                  |                                    |     |              
  +-------+       |Notify XenGT                        |     |              
  | Guest |       |                                    |     |              
  +-------+       |                                    |     |      TIMELINE
==================v====================================v=====+=============>
  +-------+          <------------------------------->                      
  | XenGT |       +                                    +     ^              
  +-------+       |   +-----------------------------+  |     |              
                  |   |  *XENGT TRACK THE WHOLE     |  |     |              
   Start to track |   | LIFE CYCLE OF GUEST CONTEXT!|  |     |              
   guest context  |   +-----------------------------+  |     |              
                  |                                    |     |              
                  v                                    v     +              
            XenGT creates                 XenGT submits     XenGT updates   
            Shadow context                shadow context    guest context   


[Track the whole life cycle of guest context via notification channel between guest and host]

ADVANTAGE

* Avoids a CPU usage peak at guest context submission time due to shadow context translation
    - Builds the shadow context as early as possible
    - No need to postpone shadow context translation to guest context submission time

* Avoids the cost of frequently re-creating the shadow context
    - XenGT just creates/destroys the shadow context on messages from the notification channel
    - No need to re-create/destroy the shadow context on every guest context submission (PERFORMANCE!)

DISADVANTAGE

* Needs a unique identifier to locate the shadow context from information in the guest context. Currently it is the LRCA.
    - Broken by dynamic context GGTT pin/unpin in i915

* Guest needs extra modifications to leverage the notification channel, assisting XenGT in managing the lifecycle of shadow contexts.
    - Introduces extra functions into the guest driver: i915_{execlist, ppgtt}_notify_vgt()

* The backing store of the GEM object has to be pinned, as the hypervisor tracks guest memory via pfn.
    - Broken by: shmem (tmpfs) swap-in/out will change the context's pfns when no pages are available in the page cache and swap cache.

SOLUTION

* Pin the guest context to get a unique LRCA and a stable backing store.
    - The Windows driver already behaves like this, without extra modification.

[ Simple Implementation - Architecture-Clean Oriented ! ]


                                              Guest sees updated            
              Guest creates   Guest submits   context via virtual           
              context         context         context switch interrupt      
                 +                +              ^                          
                 |                |              |                          
 +-----------+   |                |              |                          
 |   Guest   |   |                |              |                          
 +-----------+   |                |              |                  TIMELINE
=================v================v==============+=========================>
+------------+                                                   |          
| DOM0 XenGT |                    +         ^                    |          
+------------+                    |         |  XenGT update      |          
                    +-------------+         |  guest context     |          
                    |                       |  via shadow        |          
                    |                       |  context           |          
                    |                       |                    |          
                    |                       |                    |          
                    v                       +                    |          
            XenGT creates           XenGT submits       XenGT destroy shadow
            shadow context +----->  shadow context      context             
            from guest              into hardware                           
            context                                                         
                        +-------------------------------+                   
                        | * IN-SUBMISSION-TIME          |                   
                        | SHADOW CONTEXT CREATE/DESTROY |                   
                        +-------------------------------+                   


ADVANTAGE

* No need to use the notification channel
    - Avoids introducing extra modifications into i915

* No need to pin the guest GEM object
    - The guest backing store is pinned by i915 at the time of guest context submission.
    - The shadow context only survives for the duration of the submission, so a unique mapping between guest context and shadow context is not needed.

DISADVANTAGE

* Frequent shadow context re-creation/destruction at guest context submission time (PERFORMANCE! MEMORY FOOTPRINT!)
    - Even a trivial screen refresh will trigger a guest context submission, like moving the mouse over an event-sensitive button in X.
    - May cause a heavy performance drop and a heavy memory footprint, due to frequent shadow context re-creation/destruction.

* High CPU usage peaks due to frequent shadow context re-creation/destruction at guest context submission time.

Hope this helps the discussion! :)

Thanks,
Zhi.

Zhiyuan Lv Aug. 27, 2015, 1:50 a.m. UTC | #10
Hi Daniel,

On Wed, Aug 26, 2015 at 10:56:00AM +0200, Daniel Vetter wrote:
> On Tue, Aug 25, 2015 at 08:17:05AM +0800, Zhiyuan Lv wrote:
> > Hi Chris,
> > 
> > On Mon, Aug 24, 2015 at 11:23:13AM +0100, Chris Wilson wrote:
> > > On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> > > > Hi Chris,
> > > > 
> > > > On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> > > > > On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > > > > > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > > > > > shadowing. The shadow copy is created when guest creates a context.
> > > > > > If a context changes its LRCA address, the hypervisor is hard to know
> > > > > > whether it is a new context or not. We always pin context objects to
> > > > > > global GTT to make life easier.
> > > > > 
> > > > > Nak. Please explain why we need to workaround a bug in the host. We
> > > > > cannot pin the context as that breaks userspace (e.g. synmark) who can
> > > > > and will try to use more contexts than we have room.
> > > > 
> > > > Could you have a look at below reasons and kindly give us your inputs?
> > > > 
> > > > 1, Due to the GGTT partitioning, the global graphics memory available
> > > > inside virtual machines is much smaller than native case. We cannot
> > > > support some graphics memory intensive workloads anyway. So it looks
> > > > affordable to just pin contexts which do not take much GGTT.
> > > 
> > > Wrong. It exposes the guest to a trivial denial-of-service attack. A
> > 
> > Inside a VM, indeed.
> > 
> > > smaller GGTT does not actually limit clients (there is greater aperture
> > > pressure and some paths are less likely but an individual client will
> > > function just fine).
> > >  
> > > > 2, Our hypervisor needs to change i915 guest context in the shadow
> > > > context implementation. That part will be tricky if the context is not
> > > > always pinned. One scenario is that when a context finishes running,
> > > > we need to copy shadow context, which has been updated by hardware, to
> > > > guest context. The hypervisor knows context finishing by context
> > > > interrupt, but that time shrinker may have unpin the context and its
> > > > backing storage may have been swap-out. Such copy may fail. 
> > > 
> > > That is just a bug in your code. Firstly allowing swapout on an object
> > > you still are using, secondly not being able to swapin.
> > 
> > As Zhi replied in another email, we do not have the knowledge of guest
> > driver's swap operations. If we cannot pin context, we may have to ask
> > guest driver not to swap out context pages. Do you think that would be
> > the right way to go? Thanks!
> 
> It doesn't matter at all - if the guest unpins the ctx and puts something
> else in there before the host tells it that the ctx is completed, that's a
> bug in the guest. Same with real hw, we guarantee that the context stays
> around for long enough.

You are right. Previously I did not realize that the shrinker checks
not only the seqno but also the "ACTIVE_TO_IDLE" context interrupt
before unpinning a context, hence the concern above. Thanks for the
explanation!

> 
> Also you obviously have to complete the copying from shadow->guest ctx
> before you send the irq to the guest to signal ctx completion. Which means
> there's really no overall problem here from a design pov, the only thing

Right. We cannot control when the guest driver sees the seqno change,
but we can control when the guest sees context interrupts. The guest
CSB update and interrupt injection will happen after we finish writing
the guest contexts.

So right now we have two options for context shadowing: one is to track
the whole life-cycle of the guest context; the other is to do the shadow
work at context schedule-in/schedule-out time. Zhi drew a nice
picture of them.

Currently we do not have a concrete performance comparison of the two
approaches. We will try them and see. As for this patch set, I
will remove the "context notification" part and send out an updated
version. Thanks!

> you have to do is fix up bugs in the host code (probably you should just
> write through the ggtt).

Sorry could you elaborate a little more about this? Guest context may
not always be in aperture right?

> -Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Sept. 2, 2015, 8:19 a.m. UTC | #11
On Thu, Aug 27, 2015 at 09:50:03AM +0800, Zhiyuan Lv wrote:
> Hi Daniel,
> 
> On Wed, Aug 26, 2015 at 10:56:00AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 25, 2015 at 08:17:05AM +0800, Zhiyuan Lv wrote:
> > > Hi Chris,
> > > 
> > > On Mon, Aug 24, 2015 at 11:23:13AM +0100, Chris Wilson wrote:
> > > > On Mon, Aug 24, 2015 at 06:04:28PM +0800, Zhiyuan Lv wrote:
> > > > > Hi Chris,
> > > > > 
> > > > > On Thu, Aug 20, 2015 at 09:36:00AM +0100, Chris Wilson wrote:
> > > > > > On Thu, Aug 20, 2015 at 03:45:21PM +0800, Zhiyuan Lv wrote:
> > > > > > > Intel GVT-g will perform EXECLIST context shadowing and ring buffer
> > > > > > > shadowing. The shadow copy is created when guest creates a context.
> > > > > > > If a context changes its LRCA address, the hypervisor is hard to know
> > > > > > > whether it is a new context or not. We always pin context objects to
> > > > > > > global GTT to make life easier.
> > > > > > 
> > > > > > Nak. Please explain why we need to workaround a bug in the host. We
> > > > > > cannot pin the context as that breaks userspace (e.g. synmark) who can
> > > > > > and will try to use more contexts than we have room.
> > > > > 
> > > > > Could you have a look at below reasons and kindly give us your inputs?
> > > > > 
> > > > > 1, Due to the GGTT partitioning, the global graphics memory available
> > > > > inside virtual machines is much smaller than native case. We cannot
> > > > > support some graphics memory intensive workloads anyway. So it looks
> > > > > affordable to just pin contexts which do not take much GGTT.
> > > > 
> > > > Wrong. It exposes the guest to a trivial denial-of-service attack. A
> > > 
> > > Inside a VM, indeed.
> > > 
> > > > smaller GGTT does not actually limit clients (there is greater aperture
> > > > pressure and some paths are less likely but an individual client will
> > > > function just fine).
> > > >  
> > > > > 2, Our hypervisor needs to change i915 guest context in the shadow
> > > > > context implementation. That part will be tricky if the context is not
> > > > > always pinned. One scenario is that when a context finishes running,
> > > > > we need to copy shadow context, which has been updated by hardware, to
> > > > > guest context. The hypervisor knows context finishing by context
> > > > > interrupt, but that time shrinker may have unpin the context and its
> > > > > backing storage may have been swap-out. Such copy may fail. 
> > > > 
> > > > That is just a bug in your code. Firstly allowing swapout on an object
> > > > you still are using, secondly not being able to swapin.
> > > 
> > > As Zhi replied in another email, we do not have the knowledge of guest
> > > driver's swap operations. If we cannot pin context, we may have to ask
> > > guest driver not to swap out context pages. Do you think that would be
> > > the right way to go? Thanks!
> > 
> > It doesn't matter at all - if the guest unpins the ctx and puts something
> > else in there before the host tells it that the ctx is completed, that's a
> > bug in the guest. Same with real hw, we guarantee that the context stays
> > around for long enough.
> 
> You are right. Previously I did not realize that shrinker will check
> not only the seqno, but also "ACTIVE_TO_IDLE" context interrupt for
> unpinning a context, then had above concern. Thanks for the
> explanation!
> 
> > 
> > Also you obviously have to complete the copying from shadow->guest ctx
> > before you send the irq to the guest to signal ctx completion. Which means
> > there's really no overall problem here from a design pov, the only thing
> 
> Right. We cannot control when guest driver sees seqno change, but we
> can control when guest sees context interrupts. The guest CSB update
> and interrupt injection will be after we finish writing guest
> contexts.
> 
> So right now we have two options of context shadowing: one is to track
> the whole life-cycle of guest context, and another is to do the shadow
> work in context schedule-in/schedule-out time. Zhi draws a nice
> picture of them.
> 
> Currently we do not have concrete performance comparison of the two
> approaches. We will have a try and see. And about this patchset, I
> will remove the "context notification" part and send out an updated
> version. Thanks!
> 
> > you have to do is fix up bugs in the host code (probably you should just
> > write through the ggtt).
> 
> Sorry could you elaborate a little more about this? Guest context may
> not always be in aperture right?

Yeah the high-level problem is that the global GTT is contended (we already
have trouble with that on XenGT, and there's the ongoing but unfinished
partial mmap support for that). And permanently pinning execlist contexts
will cause lots of trouble.

Windows can do this because it segments the global gtt into different
parts (at least last time I looked at their memory manager), which means
execlist will never sit in the middle of the range used for mmaps. But
linux has a unified memory manager, which means execlist can sit anywhere,
and therefore badly fragment the global gtt. If we pin them then that will
cause trouble after long system uptime. And afaiui xengt is mostly aimed
at servers, where the uptime assumption should be "runs forever".

Compounding factor is that despite that I raised this in the original
review execlist is still not yet using the active list in upstream and
instead does short-time pinning. It's better than pinning forever but
still breaks the eviction logic.

What Chris Wilson and I talked about forever is adding an object-specific
global_gtt_unbind hook. The idea is that this would be called when
unbinding/evicting a special object (e.g. hw context), and you could use
that to do the host signalling. That would be the perfect combination of
both approaches:

- Fast: host signalling (and therefore shadow context recreation) would
  only be done when the execlist context has actually moved around. That
  almost never happens, and hence per-execbuf overhead would be as low as
  with your pinning solution.

- Flexible: The i915 memory manager is still in full control since we
  don't pin any objects unecessarily.

Cheers, Daniel
Zhiyuan Lv Sept. 2, 2015, 9:20 a.m. UTC | #12
Hi Daniel,

Thanks for the comments! And my reply in line:

On Wed, Sep 02, 2015 at 10:19:03AM +0200, Daniel Vetter wrote:
> > > 
> > > Also you obviously have to complete the copying from shadow->guest ctx
> > > before you send the irq to the guest to signal ctx completion. Which means
> > > there's really no overall problem here from a design pov, the only thing
> > 
> > Right. We cannot control when guest driver sees seqno change, but we
> > can control when guest sees context interrupts. The guest CSB update
> > and interrupt injection will be after we finish writing guest
> > contexts.
> > 
> > So right now we have two options of context shadowing: one is to track
> > the whole life-cycle of guest context, and another is to do the shadow
> > work in context schedule-in/schedule-out time. Zhi draws a nice
> > picture of them.
> > 
> > Currently we do not have concrete performance comparison of the two
> > approaches. We will have a try and see. And about this patchset, I
> > will remove the "context notification" part and send out an updated
> > version. Thanks!
> > 
> > > you have to do is fix up bugs in the host code (probably you should just
> > > write through the ggtt).
> > 
> > Sorry could you elaborate a little more about this? Guest context may
> > not always be in aperture right?
> 
> Yeah the high-level problem is that global gtt is contended (we already
> have trouble with that on xengt and there's the ongoing but unfished
> partial mmap support for that). And permanently pinning execlist contexts
> will cause lots of troubles.
> 
> Windows can do this because it segments the global gtt into different
> parts (at least last time I looked at their memory manager), which means
> execlist will never sit in the middle of the range used for mmaps. But
> linux has a unified memory manager, which means execlist can sit anywhere,
> and therefore badly fragment the global gtt. If we pin them then that will
> cause trouble after long system uptime. And afaiui xengt is mostly aimed
> at servers, where the uptime assumption should be "runs forever".

On the server side, we do not expect the host to run many graphics
workloads (they should all be in virtual machines, which have shorter
uptimes). But yeah, GM fragmentation is still an issue. Thanks for the
explanation!

> 
> Compounding factor is that despite that I raised this in the original
> review execlist is still not yet using the active list in upstream and
> instead does short-time pinning. It's better than pinning forever but
> still breaks the eviction logic.
> 
> What Chris Wilson and I talked about forever is adding an object-specific
> global_gtt_unbind hook. The idea is that this would be called when
> unbinding/evicting a special object (e.g. hw context), and you could use
> that to do the host signalling. That would be the perfect combination of
> both approaches:

Thanks for the suggestion! That sounds great! Right now, on the XenGT
side, we still plan to try doing the shadow context work at context
schedule-in/out time. That way we need neither to pin the context nor
the host notification. We will collect some performance data to see
how it works.

I sent out a v2 patch set which removes the pin/unpin of execlist
contexts. The patch set addresses review comments from you, Chris and
Joonas (sorry I forgot to CC the related people). Is that patch set
OK to be merged? Thanks!

Regards,
-Zhiyuan

> 
> - Fast: host signalling (and therefore shadow context recreation) would
>   only be done when the execlist context has actually moved around. That
>   almost never happens, and hence per-execbuf overhead would be as low as
>   with your pinning solution.
> 
> - Flexible: The i915 memory manager is still in full control since we
>   don't pin any objects unecessarily.
> 
> Cheers, Daniel
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Sept. 2, 2015, 9:40 a.m. UTC | #13
On Wed, Sep 02, 2015 at 05:20:34PM +0800, Zhiyuan Lv wrote:
> Hi Daniel,
> 
> Thanks for the comments! And my reply in line:
> 
> On Wed, Sep 02, 2015 at 10:19:03AM +0200, Daniel Vetter wrote:
> > > > 
> > > > Also you obviously have to complete the copying from shadow->guest ctx
> > > > before you send the irq to the guest to signal ctx completion. Which means
> > > > there's really no overall problem here from a design pov, the only thing
> > > 
> > > Right. We cannot control when guest driver sees seqno change, but we
> > > can control when guest sees context interrupts. The guest CSB update
> > > and interrupt injection will be after we finish writing guest
> > > contexts.
> > > 
> > > So right now we have two options of context shadowing: one is to track
> > > the whole life-cycle of guest context, and another is to do the shadow
> > > work in context schedule-in/schedule-out time. Zhi draws a nice
> > > picture of them.
> > > 
> > > Currently we do not have concrete performance comparison of the two
> > > approaches. We will have a try and see. And about this patchset, I
> > > will remove the "context notification" part and send out an updated
> > > version. Thanks!
> > > 
> > > > you have to do is fix up bugs in the host code (probably you should just
> > > > write through the ggtt).
> > > 
> > > Sorry could you elaborate a little more about this? Guest context may
> > > not always be in aperture right?
> > 
> > Yeah the high-level problem is that global gtt is contended (we already
> > have trouble with that on xengt and there's the ongoing but unfished
> > partial mmap support for that). And permanently pinning execlist contexts
> > will cause lots of troubles.
> > 
> > Windows can do this because it segments the global gtt into different
> > parts (at least last time I looked at their memory manager), which means
> > execlist will never sit in the middle of the range used for mmaps. But
> > linux has a unified memory manager, which means execlist can sit anywhere,
> > and therefore badly fragment the global gtt. If we pin them then that will
> > cause trouble after long system uptime. And afaiui xengt is mostly aimed
> > at servers, where the uptime assumption should be "runs forever".
> 
> In server side, we do not expect host to run much graphics workloads
> (all should be in virtual machines with shorter uptime). But yeah, gm
> fragmentation is still an issue. Thanks for the explanation!

Fragmentation is a lot worse on the guest side (due to the reduced global
GTT space from ballooning). The host isn't really any different.

> > Compounding factor is that despite that I raised this in the original
> > review execlist is still not yet using the active list in upstream and
> > instead does short-time pinning. It's better than pinning forever but
> > still breaks the eviction logic.
> > 
> > What Chris Wilson and I talked about forever is adding an object-specific
> > global_gtt_unbind hook. The idea is that this would be called when
> > unbinding/evicting a special object (e.g. hw context), and you could use
> > that to do the host signalling. That would be the perfect combination of
> > both approaches:
> 
> Thanks for the suggestion! That sounds great! Right now in XenGT part
> we still plan to try the shadow context work in context
> schedule-in/out time. With this way, we do not need to pin context as
> well as the host notification. We will collect some performance data
> to see how it works.
> 
> I sent out v2 patch set which has removed the pin/unpin of execlist
> contexts. The patchset addressed review comments from you, Chris and
> Joonas(Sorry I forgot to add CC to related people). Is that patch set
> OK to be merged? Thanks!

I'll defer to detailed reviewers, but if the static pinning is gone then
I'm ok. We can add the optimization I described here later on.
-Daniel

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 39df304..4b2ac37 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2282,7 +2282,8 @@  void intel_lr_context_free(struct intel_context *ctx)
 					ctx->engine[i].ringbuf;
 			struct intel_engine_cs *ring = ringbuf->ring;
 
-			if (ctx == ring->default_context) {
+			if ((ctx == ring->default_context) ||
+			    (intel_vgpu_active(ring->dev))) {
 				intel_unpin_ringbuffer_obj(ringbuf);
 				i915_gem_object_ggtt_unpin(ctx_obj);
 			}
@@ -2353,6 +2354,8 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
 	const bool is_global_default_ctx = (ctx == ring->default_context);
+	const bool need_to_pin_ctx = (is_global_default_ctx ||
+				      (intel_vgpu_active(ring->dev)));
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj;
@@ -2374,7 +2377,7 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 		return -ENOMEM;
 	}
 
-	if (is_global_default_ctx) {
+	if (need_to_pin_ctx) {
 		ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
 				PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 		if (ret) {
@@ -2415,7 +2418,7 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 			goto error_free_rbuf;
 		}
 
-		if (is_global_default_ctx) {
+		if (need_to_pin_ctx) {
 			ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
 			if (ret) {
 				DRM_ERROR(
@@ -2464,14 +2467,14 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 	return 0;
 
 error:
-	if (is_global_default_ctx)
+	if (need_to_pin_ctx)
 		intel_unpin_ringbuffer_obj(ringbuf);
 error_destroy_rbuf:
 	intel_destroy_ringbuffer_obj(ringbuf);
 error_free_rbuf:
 	kfree(ringbuf);
 error_unpin_ctx:
-	if (is_global_default_ctx)
+	if (need_to_pin_ctx)
 		i915_gem_object_ggtt_unpin(ctx_obj);
 	drm_gem_object_unreference(&ctx_obj->base);
 	return ret;