diff mbox

[V6] drm/i915: Disable stolen memory when i915 runs in guest vm

Message ID 1493116501-29327-1-git-send-email-xiong.y.zhang@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Zhang, Xiong Y April 25, 2017, 10:34 a.m. UTC
Stolen memory isn't a standard pci resource and exists in RMRR which has
identity mapping in iommu table when host boot up, so IGD could access
stolen memory in host OS. While according to 'commit c875d2c1b808
("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains")',RMRR
isn't supported by kvm, then both EPT and guest iommu domain table lack
of maaping for stolen memory in kvm IGD passthrough environment. If IGD
access stolen memory in such environment, many iommu exceptions exist in
host dmesg and gpu hang exists also.
DMAR: [DMA Read] Request device [00:02.0] fault addr da012000
[fault reason 05] PTE Write access is not set
DMAR: [DMA Read] Request device [00:02.0] fault addr da2df000
[fault reason 06] PTE Read access is not set

So stolen memory should be disabled in KVM IGD passthrough environment,
this patch detects such environment through the existence of qemu emulated
isa bridge.

Stolen memory exists in kernel for a long time, and this patch depends
on INTEL_PCH_QEMU_DEVICE_ID_TYPE which was introduced in v4.5 kernel,
so this patch should be backported into v4.5 kernel and later.

v2:GVT-g may run in non qemu (Zhenyu)
v3:Make commit message clear (Daniel)
v4:Fix typo
v5:Exclude P2X as it is used for VMware (Joonas)
v6:Handle real ISA bridge assigned to guest also (Joonas)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99025
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99028

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Cc: stable@vger.kernel.org # 4.5+
---
 drivers/gpu/drm/i915/i915_drv.c        | 39 +++++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_drv.h        |  1 +
 drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
 3 files changed, 34 insertions(+), 10 deletions(-)

Comments

Joonas Lahtinen April 25, 2017, 12:06 p.m. UTC | #1
+ David and Jon

On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:

The blocking issue I see is that bisecting is still not pointing at
relevant commits. Both bisected commits from Bugzilla are not related
to changes in stolen memory usage behavior. I'd assume a successful
bisect to land at the patches where we start creating kernel internal
objects from stolen memory. Otherwise we could be ignoring a bug
elsewhere. If it consistently lands on those patches, then there might
be something wrong with them, in addition to stolen memory problems.

Disabling power saving makes many bugs go away, but we still don't
disable power saving as a resolution to such bugs, but instead root
cause and fix the individual bugs.

> Stolen memory isn't a standard pci resource and exists in RMRR which has
> identity mapping in iommu table when host boot up, so IGD could access
> stolen memory in host OS. While according to 'commit c875d2c1b808
> ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains")',RMRR
> isn't supported by kvm, then both EPT and guest iommu domain table lack
> of maaping for stolen memory in kvm IGD passthrough environment.

Commit message text still fails to address that an exclusion was added
by commit:

commit 18436afdc11a00ac881990b454cfb2eae81d6003
Author: David Woodhouse <David.Woodhouse@intel.com>
Date:   Wed Mar 25 15:05:47 2015 +0000

    iommu/vt-d: Allow RMRR on graphics devices too

    Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API
    domains") prevents certain options for devices with RMRRs. This even
    prevents those devices from getting a 1:1 mapping with 'iommu=pt',
    because we don't have the code to handle *preserving* the RMRR regions
    when moving the device between domains.

<SNIP>

The quoted part of David's commit message leads me to believe it's
simply lack of some code in kernel for juggling the RMRRs when moving a
device between domains that is missing. Why is not that considered
instead? With that implemented, we would have more transparent pass-
through, which should be good.

Also, was fixing the IGD driver loading with zero stolen memory
considered instead? All this information should exist in the commit
message.

After the bisecting is properly done, there is an agreement that
suggested RMRR preservation is absolutely a no-go, other options are
not viable, the commit message should be updated to reflect all that.
Then we should look in more detail on how to detect the scenarios when
we're running in a virtual machine that doesn't set up the 1:1 mapping
for RMRRs.

Regards, Joonas
Zhang, Xiong Y April 27, 2017, 5:54 a.m. UTC | #2
> + David and Jon

> 

> On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:

> 

> The blocking issue I see is that bisecting is still not pointing at

> relevant commits. Both bisected commits from Bugzilla are not related

> to changes in stolen memory usage behavior. I'd assume a successful

> bisect to land at the patches where we start creating kernel internal

> objects from stolen memory. Otherwise we could be ignoring a bug

> elsewhere. If it consistently lands on those patches, then there might

> be something wrong with them, in addition to stolen memory problems.

[Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla descripted,
guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu hang
in guest dmesg. From this point, we could do git bisect.
But tons of IOMMU DMA R/W exception to stolen memory exist in host dmesg 
when guest kernel is 4.8 and 4.9. This means guest domain iommu table doesn't 
have mapping for stolen memory and IGD fail in accessing stolen memory 
from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression and
shouldn't go git bisect. You could check this host error message from the bugzilla 
attachment. And this should be fixed first.
Anyway, I will try my best to get the ideal commit through git bisect, but I'm afraid
the result is the same as past because we don't have a stable good point to start git
bisect.

> Disabling power saving makes many bugs go away, but we still don't

> disable power saving as a resolution to such bugs, but instead root

> cause and fix the individual bugs.

[Zhang, Xiong Y] I add i915.enable_rc6=0, i915.enable_dc=0, i915.enable_fbc=0,
I915.enable_psr=0, i915.disable_power_well=0,i915.enable_ips=0 to grub.
But gpu hang exist in guest and DMA R/W error exist in host.
> 

> > Stolen memory isn't a standard pci resource and exists in RMRR which has

> > identity mapping in iommu table when host boot up, so IGD could access

> > stolen memory in host OS. While according to 'commit c875d2c1b808

> > ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API

> domains")',RMRR

> > isn't supported by kvm, then both EPT and guest iommu domain table lack

> > of maaping for stolen memory in kvm IGD passthrough environment.

> 

> Commit message text still fails to address that an exclusion was added

> by commit:

> 

> commit 18436afdc11a00ac881990b454cfb2eae81d6003

> Author: David Woodhouse <David.Woodhouse@intel.com>

> Date:   Wed Mar 25 15:05:47 2015 +0000

> 

>     iommu/vt-d: Allow RMRR on graphics devices too

> 

>     Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from

> IOMMU API

>     domains") prevents certain options for devices with RMRRs. This even

>     prevents those devices from getting a 1:1 mapping with 'iommu=pt',

>     because we don't have the code to handle *preserving* the RMRR

> regions

>     when moving the device between domains.

> 

> <SNIP>

> 

> The quoted part of David's commit message leads me to believe it's

> simply lack of some code in kernel for juggling the RMRRs when moving a

> device between domains that is missing. Why is not that considered

> instead? With that implemented, we would have more transparent pass-

> through, which should be good.

[Zhang, Xiong Y] c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from
IOMMU API domains). This patch prevent devices associated with RMRRs from
assigning to a guest, the one of reason is it knows RMRR isn't supported in guest 
domain IOMMU table, If these device's driver still access RMRR from guest, 
serious error will happen.
18436afdc ("iommu/vt-d: Allow RMRR on graphics devices too "), add an exception
to above commit. So IGD could be assigned to a guest. But this doesn't mean IGD
1:1 mapping for RMRR will be support in guest domain iommu table
'iommu=pt' is to set 1:1 mapping for all pci device in host domain iommu table.

When one device is assigned to a guest and this guest boot up, this guest domain
Iommu table will take place of host domain iommu table on hardware. Our issue
is guest domain iommu table doesn't have 1:1 mapping for RMRR.
In order to set up 1:1 mapping for RMRR in guest domain iommu table, we have
to modify kvm and qemu and kvm community have declined this.
> 

> Also, was fixing the IGD driver loading with zero stolen memory

> considered instead? All this information should exist in the commit

> message.

[Zhang, Xiong Y] IGD and i915 driver read pci config register 0x50 to get 
the size of stolen memory. When guest read this register, qemu could trap
it and return one value to guest.
So in order to  " fixing the IGD driver loading with zero stolen memory ",
We have to modify both Qemu and IGD driver:
1) QEMU: trap read from pci cfg 0x50 register, then return zero to guest
2) IGD driver: when IGD driver see zero size of stolen memory, don't exit loading
and continue.
This doesn't give any benefit to i915, i915 will still disable stolen memory as i915
see zero size stolen memory . So I prefer to disable stolen memory in i915 directly 
and keep Qemu and IGD driver unchanged. 
> 

> After the bisecting is properly done, there is an agreement that

> suggested RMRR preservation is absolutely a no-go, other options are

> not viable, the commit message should be updated to reflect all that.

> Then we should look in more detail on how to detect the scenarios when

> we're running in a virtual machine that doesn't set up the 1:1 mapping

> for RMRRs.

[Zhang, Xiong Y] Sure, I will do this once we have an agreement.
I really need the help from others who could correct me if I am wrong.
> 

> Regards, Joonas

> --

> Joonas Lahtinen

> Open Source Technology Center

> Intel Corporation
Zhang, Xiong Y May 3, 2017, 9:22 a.m. UTC | #3
> > + David and Jon

> >

> > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:

> >

> > The blocking issue I see is that bisecting is still not pointing at

> > relevant commits. Both bisected commits from Bugzilla are not related

> > to changes in stolen memory usage behavior. I'd assume a successful

> > bisect to land at the patches where we start creating kernel internal

> > objects from stolen memory. Otherwise we could be ignoring a bug

> > elsewhere. If it consistently lands on those patches, then there might

> > be something wrong with them, in addition to stolen memory problems.

> [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla descripted,

> guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu hang

> in guest dmesg. From this point, we could do git bisect.

> But tons of IOMMU DMA R/W exception to stolen memory exist in host dmesg

> when guest kernel is 4.8 and 4.9. This means guest domain iommu table

> doesn't

> have mapping for stolen memory and IGD fail in accessing stolen memory

> from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression and

> shouldn't go git bisect. You could check this host error message from the

> bugzilla

> attachment. And this should be fixed first.

> Anyway, I will try my best to get the ideal commit through git bisect, but I'm

> afraid

> the result is the same as past because we don't have a stable good point to

> start git

> bisect.

[Zhang, Xiong Y] hi, Joonas:
As you said, the gpu hang exist because i915 create ring buffer from stolen memory.
I did git bisect again, and the following commit is the first bad commit:
commit c58b735fc762e891481e92af7124b85cb0a51fce
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 18 17:16:57 2016 +0100

    drm/i915: Allocate rings from stolen

    If we have stolen available, make use of it for ringbuffer allocation.
    Previously this was restricted to !llc platforms, as writing to stolen
    requires a GGTT mapping - but now that we have partial mappable support,
    the mappable aperture isn't quite so precious so we can use it more
    freely and ringbuffers are a good user for the otherwise wasted stolen.

After reverting this patch from drm-intel-nightly, I didn't see gpu hang during guest boot process.
So what's our next step ?

thanks
> 

> > Disabling power saving makes many bugs go away, but we still don't

> > disable power saving as a resolution to such bugs, but instead root

> > cause and fix the individual bugs.

> [Zhang, Xiong Y] I add i915.enable_rc6=0, i915.enable_dc=0,

> i915.enable_fbc=0,

> I915.enable_psr=0, i915.disable_power_well=0,i915.enable_ips=0 to grub.

> But gpu hang exist in guest and DMA R/W error exist in host.

> >

> > > Stolen memory isn't a standard pci resource and exists in RMRR which has

> > > identity mapping in iommu table when host boot up, so IGD could access

> > > stolen memory in host OS. While according to 'commit c875d2c1b808

> > > ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API

> > domains")',RMRR

> > > isn't supported by kvm, then both EPT and guest iommu domain table lack

> > > of maaping for stolen memory in kvm IGD passthrough environment.

> >

> > Commit message text still fails to address that an exclusion was added

> > by commit:

> >

> > commit 18436afdc11a00ac881990b454cfb2eae81d6003

> > Author: David Woodhouse <David.Woodhouse@intel.com>

> > Date:   Wed Mar 25 15:05:47 2015 +0000

> >

> >     iommu/vt-d: Allow RMRR on graphics devices too

> >

> >     Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from

> > IOMMU API

> >     domains") prevents certain options for devices with RMRRs. This even

> >     prevents those devices from getting a 1:1 mapping with 'iommu=pt',

> >     because we don't have the code to handle *preserving* the RMRR

> > regions

> >     when moving the device between domains.

> >

> > <SNIP>

> >

> > The quoted part of David's commit message leads me to believe it's

> > simply lack of some code in kernel for juggling the RMRRs when moving a

> > device between domains that is missing. Why is not that considered

> > instead? With that implemented, we would have more transparent pass-

> > through, which should be good.

> [Zhang, Xiong Y] c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from

> IOMMU API domains). This patch prevent devices associated with RMRRs from

> assigning to a guest, the one of reason is it knows RMRR isn't supported in

> guest

> domain IOMMU table, If these device's driver still access RMRR from guest,

> serious error will happen.

> 18436afdc ("iommu/vt-d: Allow RMRR on graphics devices too "), add an

> exception

> to above commit. So IGD could be assigned to a guest. But this doesn't mean

> IGD

> 1:1 mapping for RMRR will be support in guest domain iommu table

> 'iommu=pt' is to set 1:1 mapping for all pci device in host domain iommu

> table.

> 

> When one device is assigned to a guest and this guest boot up, this guest

> domain

> Iommu table will take place of host domain iommu table on hardware. Our

> issue

> is guest domain iommu table doesn't have 1:1 mapping for RMRR.

> In order to set up 1:1 mapping for RMRR in guest domain iommu table, we

> have

> to modify kvm and qemu and kvm community have declined this.

> >

> > Also, was fixing the IGD driver loading with zero stolen memory

> > considered instead? All this information should exist in the commit

> > message.

> [Zhang, Xiong Y] IGD and i915 driver read pci config register 0x50 to get

> the size of stolen memory. When guest read this register, qemu could trap

> it and return one value to guest.

> So in order to  " fixing the IGD driver loading with zero stolen memory ",

> We have to modify both Qemu and IGD driver:

> 1) QEMU: trap read from pci cfg 0x50 register, then return zero to guest

> 2) IGD driver: when IGD driver see zero size of stolen memory, don't exit

> loading

> and continue.

> This doesn't give any benefit to i915, i915 will still disable stolen memory as

> i915

> see zero size stolen memory . So I prefer to disable stolen memory in i915

> directly

> and keep Qemu and IGD driver unchanged.

> >

> > After the bisecting is properly done, there is an agreement that

> > suggested RMRR preservation is absolutely a no-go, other options are

> > not viable, the commit message should be updated to reflect all that.

> > Then we should look in more detail on how to detect the scenarios when

> > we're running in a virtual machine that doesn't set up the 1:1 mapping

> > for RMRRs.

> [Zhang, Xiong Y] Sure, I will do this once we have an agreement.

> I really need the help from others who could correct me if I am wrong.

> >

> > Regards, Joonas

> > --

> > Joonas Lahtinen

> > Open Source Technology Center

> > Intel Corporation
Joonas Lahtinen May 5, 2017, 9:14 a.m. UTC | #4
On to, 2017-04-27 at 05:54 +0000, Zhang, Xiong Y wrote:
> > 
> > Also, was fixing the IGD driver loading with zero stolen memory
> > considered instead? All this information should exist in the commit
> > message.
> [Zhang, Xiong Y] IGD and i915 driver read pci config register 0x50 to get 
> the size of stolen memory. When guest read this register, qemu could trap
> it and return one value to guest.
> So in order to  " fixing the IGD driver loading with zero stolen memory ",
> We have to modify both Qemu and IGD driver:
> 1) QEMU: trap read from pci cfg 0x50 register, then return zero to guest
> 2) IGD driver: when IGD driver see zero size of stolen memory, don't exit loading
> and continue.
> This doesn't give any benefit to i915, i915 will still disable stolen memory as i915
> see zero size stolen memory . So I prefer to disable stolen memory in i915 directly 
> and keep Qemu and IGD driver unchanged. 

If due to lack of code in QEMU, RMRR is not available. It'd sound to me
they should carry the change to report stolen as zero, because they're
in best position to track if that changes in future.

Also, if the IGD driver requires a special treatment where stolen is
reported to exist but is not functional, that logic should be fixed
where the flaw is. i915 driver is not the go-to for fixing any quirks
and lack of code related to virtualization.

The driver performs perfectly logically, if stolen is reported to
exist, it is expected to work, if not, it's not touched. Adding special
cases to satisfy conditions outside of i915 will make driver
maintenance a burden.

And, if all other options fail and we end up having to fix the issue in
i915, the fix should be robust, which this currently is not (see my
previous messages).

Regards, Joonas
Joonas Lahtinen May 5, 2017, 9:21 a.m. UTC | #5
On ke, 2017-05-03 at 09:22 +0000, Zhang, Xiong Y wrote:
> > 
> > > 
> > > + David and Jon
> > > 
> > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
> > > 
> > > The blocking issue I see is that bisecting is still not pointing at
> > > relevant commits. Both bisected commits from Bugzilla are not related
> > > to changes in stolen memory usage behavior. I'd assume a successful
> > > bisect to land at the patches where we start creating kernel internal
> > > objects from stolen memory. Otherwise we could be ignoring a bug
> > > elsewhere. If it consistently lands on those patches, then there might
> > > be something wrong with them, in addition to stolen memory problems.
> > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla descripted,
> > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu hang
> > in guest dmesg. From this point, we could do git bisect.
> > But tons of IOMMU DMA R/W exception to stolen memory exist in host dmesg
> > when guest kernel is 4.8 and 4.9. This means guest domain iommu table
> > doesn't
> > have mapping for stolen memory and IGD fail in accessing stolen memory
> > from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression and
> > shouldn't go git bisect. You could check this host error message from the
> > bugzilla
> > attachment. And this should be fixed first.
> > Anyway, I will try my best to get the ideal commit through git bisect, but I'm
> > afraid
> > the result is the same as past because we don't have a stable good point to
> > start git
> > bisect.
> [Zhang, Xiong Y] hi, Joonas:
> As you said, the gpu hang exist because i915 create ring buffer from stolen memory.
> I did git bisect again, and the following commit is the first bad commit:
> commit c58b735fc762e891481e92af7124b85cb0a51fce
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Aug 18 17:16:57 2016 +0100
> 
>     drm/i915: Allocate rings from stolen
> 
>     If we have stolen available, make use of it for ringbuffer allocation.
>     Previously this was restricted to !llc platforms, as writing to stolen
>     requires a GGTT mapping - but now that we have partial mappable support,
>     the mappable aperture isn't quite so precious so we can use it more
>     freely and ringbuffers are a good user for the otherwise wasted stolen.
> 
> After reverting this patch from drm-intel-nightly, I didn't see gpu hang during guest boot process.
> So what's our next step ?

An appropriate next step would be to evaluate how much work it is to
support the RMRR passthrough David mentioned about in his commit.

I'd also go talk with the IGD team, why they refuse to load the driver
when stolen memory is correctly reported as zero, and insist on being
lied to.

While doing that, please update the freedesktop.org bugs.

Regards, Joonas
Zhang, Xiong Y May 6, 2017, 2:58 a.m. UTC | #6
> On ke, 2017-05-03 at 09:22 +0000, Zhang, Xiong Y wrote:

> > >

> > > >

> > > > + David and Jon

> > > >

> > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:

> > > >

> > > > The blocking issue I see is that bisecting is still not pointing at

> > > > relevant commits. Both bisected commits from Bugzilla are not related

> > > > to changes in stolen memory usage behavior. I'd assume a successful

> > > > bisect to land at the patches where we start creating kernel internal

> > > > objects from stolen memory. Otherwise we could be ignoring a bug

> > > > elsewhere. If it consistently lands on those patches, then there might

> > > > be something wrong with them, in addition to stolen memory problems.

> > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla

> descripted,

> > > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu

> hang

> > > in guest dmesg. From this point, we could do git bisect.

> > > But tons of IOMMU DMA R/W exception to stolen memory exist in host

> dmesg

> > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table

> > > doesn't

> > > have mapping for stolen memory and IGD fail in accessing stolen memory

> > > from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression

> and

> > > shouldn't go git bisect. You could check this host error message from the

> > > bugzilla

> > > attachment. And this should be fixed first.

> > > Anyway, I will try my best to get the ideal commit through git bisect, but

> I'm

> > > afraid

> > > the result is the same as past because we don't have a stable good point to

> > > start git

> > > bisect.

> > [Zhang, Xiong Y] hi, Joonas:

> > As you said, the gpu hang exist because i915 create ring buffer from stolen

> memory.

> > I did git bisect again, and the following commit is the first bad commit:

> > commit c58b735fc762e891481e92af7124b85cb0a51fce

> > Author: Chris Wilson <chris@chris-wilson.co.uk>

> > Date:   Thu Aug 18 17:16:57 2016 +0100

> >

> >     drm/i915: Allocate rings from stolen

> >

> >     If we have stolen available, make use of it for ringbuffer allocation.

> >     Previously this was restricted to !llc platforms, as writing to stolen

> >     requires a GGTT mapping - but now that we have partial mappable

> support,

> >     the mappable aperture isn't quite so precious so we can use it more

> >     freely and ringbuffers are a good user for the otherwise wasted stolen.

> >

> > After reverting this patch from drm-intel-nightly, I didn't see gpu hang during

> guest boot process.

> > So what's our next step ?

> 

> An appropriate next step would be to evaluate how much work it is to

> support the RMRR passthrough David mentioned about in his commit.

[Zhang, Xiong Y] As Kevin explained, KVM community found the disadvantage
Of RMRR and have decided to not support RMRR passthrough, so it is really hard
for us to push such solution and isn't related to the workload.
Except usb and graphic card, all other devices with RMRR couldn't passthrough
to guest. But the driver of usb and graphic card couldn't access RMRR in such
environment.
https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf

> I'd also go talk with the IGD team, why they refuse to load the driver

> when stolen memory is correctly reported as zero, and insist on being

> lied to.

[Zhang, Xiong Y] thanks a lot for doing so.
> 

> While doing that, please update the freedesktop.org bugs.

[Zhang, Xiong Y] sure, I will update bugzilla once we have further
finding and make a decision.
> 

> Regards, Joonas

> --

> Joonas Lahtinen

> Open Source Technology Center

> Intel Corporation
Joonas Lahtinen May 8, 2017, 10:07 a.m. UTC | #7
On la, 2017-05-06 at 02:58 +0000, Zhang, Xiong Y wrote:
> > 
> > On ke, 2017-05-03 at 09:22 +0000, Zhang, Xiong Y wrote:
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > + David and Jon
> > > > > 
> > > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
> > > > > 
> > > > > The blocking issue I see is that bisecting is still not pointing at
> > > > > relevant commits. Both bisected commits from Bugzilla are not related
> > > > > to changes in stolen memory usage behavior. I'd assume a successful
> > > > > bisect to land at the patches where we start creating kernel internal
> > > > > objects from stolen memory. Otherwise we could be ignoring a bug
> > > > > elsewhere. If it consistently lands on those patches, then there might
> > > > > be something wrong with them, in addition to stolen memory problems.
> > > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla
> > descripted,
> > > 
> > > > 
> > > > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu
> > hang
> > > 
> > > > 
> > > > in guest dmesg. From this point, we could do git bisect.
> > > > But tons of IOMMU DMA R/W exception to stolen memory exist in host
> > dmesg
> > > 
> > > > 
> > > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table
> > > > doesn't
> > > > have mapping for stolen memory and IGD fail in accessing stolen memory
> > > > from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression
> > and
> > > 
> > > > 
> > > > shouldn't go git bisect. You could check this host error message from the
> > > > bugzilla
> > > > attachment. And this should be fixed first.
> > > > Anyway, I will try my best to get the ideal commit through git bisect, but
> > I'm
> > > 
> > > > 
> > > > afraid
> > > > the result is the same as past because we don't have a stable good point to
> > > > start git
> > > > bisect.
> > > [Zhang, Xiong Y] hi, Joonas:
> > > As you said, the gpu hang exist because i915 create ring buffer from stolen
> > memory.
> > > 
> > > I did git bisect again, and the following commit is the first bad commit:
> > > commit c58b735fc762e891481e92af7124b85cb0a51fce
> > > > > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Thu Aug 18 17:16:57 2016 +0100
> > > 
> > >     drm/i915: Allocate rings from stolen
> > > 
> > >     If we have stolen available, make use of it for ringbuffer allocation.
> > >     Previously this was restricted to !llc platforms, as writing to stolen
> > >     requires a GGTT mapping - but now that we have partial mappable
> > support,
> > > 
> > >     the mappable aperture isn't quite so precious so we can use it more
> > >     freely and ringbuffers are a good user for the otherwise wasted stolen.
> > > 
> > > After reverting this patch from drm-intel-nightly, I didn't see gpu hang during
> > guest boot process.
> > > 
> > > So what's our next step ?
> > 
> > An appropriate next step would be to evaluate how much work it is to
> > support the RMRR passthrough David mentioned about in his commit.
> [Zhang, Xiong Y] As Kevin explained, KVM community found the disadvantage
> Of RMRR and have decided to not support RMRR passthrough, so it is really hard
> for us to push such solution and isn't related to the workload.
> Except usb and graphic card, all other devices with RMRR couldn't passthrough
> to guest. But the driver of usb and graphic card couldn't access RMRR in such
> environment.
> https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf

Does this patch have the right Cc's from KVM team? I'd like to hear
directly from them that even the usage of RMRRs that follow the
intention of VT-d spec are not going to be supported. That document
predates the patches to add the exclusion for graphics.

> > I'd also go talk with the IGD team, why they refuse to load the driver
> > when stolen memory is correctly reported as zero, and insist on being
> > lied to.
> [Zhang, Xiong Y] thanks a lot for doing so.

I don't have the contacts, so I assume you to pursue that.

Regards, Joonas
Alex Williamson May 8, 2017, 3:01 p.m. UTC | #8
On Mon, 08 May 2017 13:07:10 +0300
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote:

> On la, 2017-05-06 at 02:58 +0000, Zhang, Xiong Y wrote:
> > > 
> > > On ke, 2017-05-03 at 09:22 +0000, Zhang, Xiong Y wrote:  
> > > >   
> > > > > 
> > > > >   
> > > > > > 
> > > > > > 
> > > > > > + David and Jon
> > > > > > 
> > > > > > On ti, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
> > > > > > 
> > > > > > The blocking issue I see is that bisecting is still not pointing at
> > > > > > relevant commits. Both bisected commits from Bugzilla are not related
> > > > > > to changes in stolen memory usage behavior. I'd assume a successful
> > > > > > bisect to land at the patches where we start creating kernel internal
> > > > > > objects from stolen memory. Otherwise we could be ignoring a bug
> > > > > > elsewhere. If it consistently lands on those patches, then there might
> > > > > > be something wrong with them, in addition to stolen memory problems.  
> > > > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla  
> > > descripted,  
> > > >   
> > > > > 
> > > > > guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9 kernel has gpu  
> > > hang  
> > > >   
> > > > > 
> > > > > in guest dmesg. From this point, we could do git bisect.
> > > > > But tons of IOMMU DMA R/W exception to stolen memory exist in host  
> > > dmesg  
> > > >   
> > > > > 
> > > > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table
> > > > > doesn't
> > > > > have mapping for stolen memory and IGD fail in accessing stolen memory
> > > > > from guest kernel 4.8 and 4.9. From this point, this issue isn't a regression  
> > > and  
> > > >   
> > > > > 
> > > > > shouldn't go git bisect. You could check this host error message from the
> > > > > bugzilla
> > > > > attachment. And this should be fixed first.
> > > > > Anyway, I will try my best to get the ideal commit through git bisect, but  
> > > I'm  
> > > >   
> > > > > 
> > > > > afraid
> > > > > the result is the same as past because we don't have a stable good point to
> > > > > start git
> > > > > bisect.  
> > > > [Zhang, Xiong Y] hi, Joonas:
> > > > As you said, the gpu hang exist because i915 create ring buffer from stolen  
> > > memory.  
> > > > 
> > > > I did git bisect again, and the following commit is the first bad commit:
> > > > commit c58b735fc762e891481e92af7124b85cb0a51fce  
> > > > > > > Author: Chris Wilson <chris@chris-wilson.co.uk>  
> > > > Date:   Thu Aug 18 17:16:57 2016 +0100
> > > > 
> > > >     drm/i915: Allocate rings from stolen
> > > > 
> > > >     If we have stolen available, make use of it for ringbuffer allocation.
> > > >     Previously this was restricted to !llc platforms, as writing to stolen
> > > >     requires a GGTT mapping - but now that we have partial mappable  
> > > support,  
> > > > 
> > > >     the mappable aperture isn't quite so precious so we can use it more
> > > >     freely and ringbuffers are a good user for the otherwise wasted stolen.
> > > > 
> > > > After reverting this patch from drm-intel-nightly, I didn't see gpu hang during  
> > > guest boot process.  
> > > > 
> > > > So what's our next step ?  
> > > 
> > > An appropriate next step would be to evaluate how much work it is to
> > > support the RMRR passthrough David mentioned about in his commit.  
> > [Zhang, Xiong Y] As Kevin explained, KVM community found the disadvantage
> > Of RMRR and have decided to not support RMRR passthrough, so it is really hard
> > for us to push such solution and isn't related to the workload.
> > Except usb and graphic card, all other devices with RMRR couldn't passthrough
> > to guest. But the driver of usb and graphic card couldn't access RMRR in such
> > environment.
> > https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf  
> 
> Does this patch have the right Cc's from KVM team? I'd like to hear
> directly from them that even the usage of RMRRs that follow the
> intention of VT-d spec are not going to be supported. That document
> predates the patches to add the exclusion for graphics.

I'm the QEMU and kernel vfio maintainer and co-author of the above
whitepaper.  Even the VT-d spec suggests that usage of RMRRs should be
limited (rev 2.3, 8.4):

  Platform designers should avoid or limit use of reserved memory
  regions since these require system software to create holes in the
  DMA virtual address range available to system software and its
  drivers.

Also, if you read the entire thread which added the graphics exception
for RMRR, you'll see that it went in under some degree of protest and
ultimately under the conclusion that we should just ignore the RMRR
anyway:

https://lists.linuxfoundation.org/pipermail/iommu/2015-April/012790.html

At least for the case of IGD RMRRs, we don't expect that they're used
for health monitoring of the devices via back channels as the interface
has been abused to do elsewhere, so ignoring the RMRR and not
supporting it in the VM means that the device is only impairing itself,
which is fine.  I would balk at trying to add RMRR support into vfio
and the virt stack for the reasons outlined in the above whitepaper.
Thanks,

Alex
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index cc7393e..c0e8968 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -164,31 +164,42 @@  static void intel_detect_pch(struct drm_i915_private *dev_priv)
 	 *
 	 * In some virtualized environments (e.g. XEN), there is irrelevant
 	 * ISA bridge in the system. To work reliably, we should scan trhough
-	 * all the ISA bridge devices and check for the first match, instead
+	 * all the ISA bridge devices and check for all the match, instead
 	 * of only checking the first one.
+	 *
+	 * If both the real ISA bridge and IGD are assigned to one guest, this
+	 * guest will have two matched ISA bridges: real and emulatated. We
+	 * couldn't guarantee which one is detected first, so we scan all the
+	 * match, instead of only checking the first match. Real one take
+	 * precedence over emulated to set pch_type and pch_id.Emulated one is
+	 * used to disable stolen memory.Finally pci_get_class() will return
+	 * NULL to exit loop and deference the last matched pch.
 	 */
 	while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
 		if (pch->vendor == PCI_VENDOR_ID_INTEL) {
 			unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK;
-			dev_priv->pch_id = id;
 
 			if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {
 				dev_priv->pch_type = PCH_IBX;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found Ibex Peak PCH\n");
 				WARN_ON(!IS_GEN5(dev_priv));
 			} else if (id == INTEL_PCH_CPT_DEVICE_ID_TYPE) {
 				dev_priv->pch_type = PCH_CPT;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found CougarPoint PCH\n");
 				WARN_ON(!(IS_GEN6(dev_priv) ||
 					IS_IVYBRIDGE(dev_priv)));
 			} else if (id == INTEL_PCH_PPT_DEVICE_ID_TYPE) {
 				/* PantherPoint is CPT compatible */
 				dev_priv->pch_type = PCH_CPT;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found PantherPoint PCH\n");
 				WARN_ON(!(IS_GEN6(dev_priv) ||
 					IS_IVYBRIDGE(dev_priv)));
 			} else if (id == INTEL_PCH_LPT_DEVICE_ID_TYPE) {
 				dev_priv->pch_type = PCH_LPT;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found LynxPoint PCH\n");
 				WARN_ON(!IS_HASWELL(dev_priv) &&
 					!IS_BROADWELL(dev_priv));
@@ -196,6 +207,7 @@  static void intel_detect_pch(struct drm_i915_private *dev_priv)
 					IS_BDW_ULT(dev_priv));
 			} else if (id == INTEL_PCH_LPT_LP_DEVICE_ID_TYPE) {
 				dev_priv->pch_type = PCH_LPT;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found LynxPoint LP PCH\n");
 				WARN_ON(!IS_HASWELL(dev_priv) &&
 					!IS_BROADWELL(dev_priv));
@@ -203,16 +215,19 @@  static void intel_detect_pch(struct drm_i915_private *dev_priv)
 					!IS_BDW_ULT(dev_priv));
 			} else if (id == INTEL_PCH_SPT_DEVICE_ID_TYPE) {
 				dev_priv->pch_type = PCH_SPT;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found SunrisePoint PCH\n");
 				WARN_ON(!IS_SKYLAKE(dev_priv) &&
 					!IS_KABYLAKE(dev_priv));
 			} else if (id == INTEL_PCH_SPT_LP_DEVICE_ID_TYPE) {
 				dev_priv->pch_type = PCH_SPT;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found SunrisePoint LP PCH\n");
 				WARN_ON(!IS_SKYLAKE(dev_priv) &&
 					!IS_KABYLAKE(dev_priv));
 			} else if (id == INTEL_PCH_KBP_DEVICE_ID_TYPE) {
 				dev_priv->pch_type = PCH_KBP;
+				dev_priv->pch_id = id;
 				DRM_DEBUG_KMS("Found KabyPoint PCH\n");
 				WARN_ON(!IS_SKYLAKE(dev_priv) &&
 					!IS_KABYLAKE(dev_priv));
@@ -223,18 +238,26 @@  static void intel_detect_pch(struct drm_i915_private *dev_priv)
 					    PCI_SUBVENDOR_ID_REDHAT_QUMRANET &&
 				    pch->subsystem_device ==
 					    PCI_SUBDEVICE_ID_QEMU)) {
-				dev_priv->pch_type =
-					intel_virt_detect_pch(dev_priv);
+				/*
+				 * P2X is used for VMware, exclude it
+				 */
+				if (id != INTEL_PCH_P2X_DEVICE_ID_TYPE)
+					dev_priv->disable_stolen = true;
+				/*
+				 * Real PCH still hasn't been detected
+				 */
+				if (!HAS_PCH_SPLIT(dev_priv)) {
+					dev_priv->pch_type =
+						intel_virt_detect_pch(dev_priv);
+					dev_priv->pch_id = id;
+				}
 			} else
 				continue;
 
-			break;
 		}
 	}
-	if (!pch)
+	if (!HAS_PCH_SPLIT(dev_priv))
 		DRM_DEBUG_KMS("No PCH found.\n");
-
-	pci_dev_put(pch);
 }
 
 static int i915_getparam(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 357b6c6..3b8e067 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2132,6 +2132,7 @@  struct drm_i915_private {
 	struct intel_uncore uncore;
 
 	struct i915_virtual_gpu vgpu;
+	bool disable_stolen;
 
 	struct intel_gvt *gvt;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index f3abdc2..bffeae7 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -409,8 +409,8 @@  int i915_gem_init_stolen(struct drm_i915_private *dev_priv)
 
 	mutex_init(&dev_priv->mm.stolen_lock);
 
-	if (intel_vgpu_active(dev_priv)) {
-		DRM_INFO("iGVT-g active, disabling use of stolen memory\n");
+	if (dev_priv->disable_stolen || intel_vgpu_active(dev_priv)) {
+		DRM_INFO("Running in guest, disabling use of stolen memory\n");
 		return 0;
 	}