diff mbox series

[v2] x86 / iommu: set up a scratch page in the quarantine domain

Message ID 20191127171143.27399-1-pdurrant@amazon.com (mailing list archive)
State New, archived
Headers show
Series [v2] x86 / iommu: set up a scratch page in the quarantine domain | expand

Commit Message

Paul Durrant Nov. 27, 2019, 5:11 p.m. UTC
This patch introduces a new iommu_op to facilitate a per-implementation
quarantine set up, and then further code for x86 implementations
(amd and vtd) to set up a read-only scratch page to serve as the source
for DMA reads whilst a device is assigned to dom_io. DMA writes will
continue to fault as before.

The reason for doing this is that some hardware may continue to re-try
DMA (despite FLR) in the event of an error, or even BME being cleared, and
will fail to deal with DMA read faults gracefully. Having a scratch page
mapped will allow pending DMA reads to complete and thus such buggy
hardware will eventually be quiesced.

NOTE: These modifications are restricted to x86 implementations only as
      the buggy h/w I am aware of is only used with Xen in an x86
      environment. ARM may require similar code but, since I am not
      aware of the need, this patch does not modify any ARM implementation.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

v2:
 - Addressed comments from Jan

There is still the open question of whether use of a scratch page ought
to be gated on something, either are run-time or compile-time.
---
 xen/drivers/passthrough/amd/iommu_map.c       | 62 ++++++++++++++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c   | 14 ++--
 xen/drivers/passthrough/iommu.c               | 20 +++++-
 xen/drivers/passthrough/vtd/iommu.c           | 72 +++++++++++++++----
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  3 +
 xen/include/xen/iommu.h                       |  1 +
 6 files changed, 148 insertions(+), 24 deletions(-)

Comments

Jan Beulich Nov. 28, 2019, 11:17 a.m. UTC | #1
On 27.11.2019 18:11, Paul Durrant wrote:
> This patch introduces a new iommu_op to facilitate a per-implementation
> quarantine set up, and then further code for x86 implementations
> (amd and vtd) to set up a read-only scratch page to serve as the source
> for DMA reads whilst a device is assigned to dom_io. DMA writes will
> continue to fault as before.
> 
> The reason for doing this is that some hardware may continue to re-try
> DMA (despite FLR) in the event of an error, or even BME being cleared, and
> will fail to deal with DMA read faults gracefully. Having a scratch page
> mapped will allow pending DMA reads to complete and thus such buggy
> hardware will eventually be quiesced.
> 
> NOTE: These modifications are restricted to x86 implementations only as
>       the buggy h/w I am aware of is only used with Xen in an x86
>       environment. ARM may require similar code but, since I am not
>       aware of the need, this patch does not modify any ARM implementation.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> There is still the open question of whether use of a scratch page ought
> to be gated on something, either are run-time or compile-time.

I have no clear opinion either way here. The workaround seems low
overhead enough that there may not be a need to have an admin (or
build time) control for this.

As to 4.13: The quarantining as a whole is pretty fresh. While it
has been backported to security maintained trees, I'd still consider
it a new feature in 4.13, and hence this workaround at least eligible
for consideration.

Jan
Jürgen Groß Nov. 28, 2019, 11:32 a.m. UTC | #2
On 28.11.19 12:17, Jan Beulich wrote:
> On 27.11.2019 18:11, Paul Durrant wrote:
>> This patch introduces a new iommu_op to facilitate a per-implementation
>> quarantine set up, and then further code for x86 implementations
>> (amd and vtd) to set up a read-only scratch page to serve as the source
>> for DMA reads whilst a device is assigned to dom_io. DMA writes will
>> continue to fault as before.
>>
>> The reason for doing this is that some hardware may continue to re-try
>> DMA (despite FLR) in the event of an error, or even BME being cleared, and
>> will fail to deal with DMA read faults gracefully. Having a scratch page
>> mapped will allow pending DMA reads to complete and thus such buggy
>> hardware will eventually be quiesced.
>>
>> NOTE: These modifications are restricted to x86 implementations only as
>>        the buggy h/w I am aware of is only used with Xen in an x86
>>        environment. ARM may require similar code but, since I am not
>>        aware of the need, this patch does not modify any ARM implementation.
>>
>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
>> There is still the open question of whether use of a scratch page ought
>> to be gated on something, either are run-time or compile-time.
> 
> I have no clear opinion either way here. The workaround seems low
> overhead enough that there may not be a need to have an admin (or
> build time) control for this.
> 
> As to 4.13: The quarantining as a whole is pretty fresh. While it
> has been backported to security maintained trees, I'd still consider
> it a new feature in 4.13, and hence this workaround at least eligible
> for consideration.

I agree.

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen
Jan Beulich Dec. 3, 2019, 9:36 a.m. UTC | #3
On 28.11.2019 12:32, Jürgen Groß wrote:
> On 28.11.19 12:17, Jan Beulich wrote:
>> On 27.11.2019 18:11, Paul Durrant wrote:
>>> This patch introduces a new iommu_op to facilitate a per-implementation
>>> quarantine set up, and then further code for x86 implementations
>>> (amd and vtd) to set up a read-only scratch page to serve as the source
>>> for DMA reads whilst a device is assigned to dom_io. DMA writes will
>>> continue to fault as before.
>>>
>>> The reason for doing this is that some hardware may continue to re-try
>>> DMA (despite FLR) in the event of an error, or even BME being cleared, and
>>> will fail to deal with DMA read faults gracefully. Having a scratch page
>>> mapped will allow pending DMA reads to complete and thus such buggy
>>> hardware will eventually be quiesced.
>>>
>>> NOTE: These modifications are restricted to x86 implementations only as
>>>        the buggy h/w I am aware of is only used with Xen in an x86
>>>        environment. ARM may require similar code but, since I am not
>>>        aware of the need, this patch does not modify any ARM implementation.
>>>
>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>
>>> There is still the open question of whether use of a scratch page ought
>>> to be gated on something, either are run-time or compile-time.
>>
>> I have no clear opinion either way here. The workaround seems low
>> overhead enough that there may not be a need to have an admin (or
>> build time) control for this.
>>
>> As to 4.13: The quarantining as a whole is pretty fresh. While it
>> has been backported to security maintained trees, I'd still consider
>> it a new feature in 4.13, and hence this workaround at least eligible
>> for consideration.
> 
> I agree.
> 
> Release-acked-by: Juergen Gross <jgross@suse.com>

I notice this has been committed meanwhile. I had specifically not
done so due to the still missing VT-d ack, seeing that this wasn't
an entirely "trivial" change.

Jan
Tian, Kevin Dec. 10, 2019, 7:16 a.m. UTC | #4
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, December 3, 2019 5:36 PM
> 
> On 28.11.2019 12:32, Jürgen Groß wrote:
> > On 28.11.19 12:17, Jan Beulich wrote:
> >> On 27.11.2019 18:11, Paul Durrant wrote:
> >>> This patch introduces a new iommu_op to facilitate a per-
> implementation
> >>> quarantine set up, and then further code for x86 implementations
> >>> (amd and vtd) to set up a read-only scratch page to serve as the source
> >>> for DMA reads whilst a device is assigned to dom_io. DMA writes will
> >>> continue to fault as before.
> >>>
> >>> The reason for doing this is that some hardware may continue to re-try
> >>> DMA (despite FLR) in the event of an error, or even BME being cleared,
> and
> >>> will fail to deal with DMA read faults gracefully. Having a scratch page
> >>> mapped will allow pending DMA reads to complete and thus such buggy
> >>> hardware will eventually be quiesced.
> >>>
> >>> NOTE: These modifications are restricted to x86 implementations only as
> >>>        the buggy h/w I am aware of is only used with Xen in an x86
> >>>        environment. ARM may require similar code but, since I am not
> >>>        aware of the need, this patch does not modify any ARM
> implementation.
> >>>
> >>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> >>
> >> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >>
> >>> There is still the open question of whether use of a scratch page ought
> >>> to be gated on something, either are run-time or compile-time.
> >>
> >> I have no clear opinion either way here. The workaround seems low
> >> overhead enough that there may not be a need to have an admin (or
> >> build time) control for this.
> >>
> >> As to 4.13: The quarantining as a whole is pretty fresh. While it
> >> has been backported to security maintained trees, I'd still consider
> >> it a new feature in 4.13, and hence this workaround at least eligible
> >> for consideration.
> >
> > I agree.
> >
> > Release-acked-by: Juergen Gross <jgross@suse.com>
> 
> I notice this has been committed meanwhile. I had specifically not
> done so due to the still missing VT-d ack, seeing that this wasn't
> an entirely "trivial" change.
> 

While the quarantine idea sounds good overall, I'm still not convinced
to have it the only way in place just for handling some known-buggy
device. It kills the possibility of identifying a new buggy device and then 
deciding not to use it in the first space... I thought about whether it
will get better when future IOMMU implements A/D bit - by checking
access bit being set then we'll know some buggy device exists, but, 
the scratch page is shared by all devices then we cannot rely on this 
feature to find out the actual buggy one.

Thanks
Kevin
Jan Beulich Dec. 10, 2019, 8:05 a.m. UTC | #5
On 10.12.2019 08:16, Tian, Kevin wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, December 3, 2019 5:36 PM
>>
>> On 28.11.2019 12:32, Jürgen Groß wrote:
>>> On 28.11.19 12:17, Jan Beulich wrote:
>>>> On 27.11.2019 18:11, Paul Durrant wrote:
>>>>> This patch introduces a new iommu_op to facilitate a per-
>> implementation
>>>>> quarantine set up, and then further code for x86 implementations
>>>>> (amd and vtd) to set up a read-only scratch page to serve as the source
>>>>> for DMA reads whilst a device is assigned to dom_io. DMA writes will
>>>>> continue to fault as before.
>>>>>
>>>>> The reason for doing this is that some hardware may continue to re-try
>>>>> DMA (despite FLR) in the event of an error, or even BME being cleared,
>> and
>>>>> will fail to deal with DMA read faults gracefully. Having a scratch page
>>>>> mapped will allow pending DMA reads to complete and thus such buggy
>>>>> hardware will eventually be quiesced.
>>>>>
>>>>> NOTE: These modifications are restricted to x86 implementations only as
>>>>>        the buggy h/w I am aware of is only used with Xen in an x86
>>>>>        environment. ARM may require similar code but, since I am not
>>>>>        aware of the need, this patch does not modify any ARM
>> implementation.
>>>>>
>>>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>>>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>>> There is still the open question of whether use of a scratch page ought
>>>>> to be gated on something, either are run-time or compile-time.
>>>>
>>>> I have no clear opinion either way here. The workaround seems low
>>>> overhead enough that there may not be a need to have an admin (or
>>>> build time) control for this.
>>>>
>>>> As to 4.13: The quarantining as a whole is pretty fresh. While it
>>>> has been backported to security maintained trees, I'd still consider
>>>> it a new feature in 4.13, and hence this workaround at least eligible
>>>> for consideration.
>>>
>>> I agree.
>>>
>>> Release-acked-by: Juergen Gross <jgross@suse.com>
>>
>> I notice this has been committed meanwhile. I had specifically not
>> done so due to the still missing VT-d ack, seeing that this wasn't
>> an entirely "trivial" change.
>>
> 
> While the quarantine idea sounds good overall, I'm still not convinced
> to have it the only way in place just for handling some known-buggy
> device. It kills the possibility of identifying a new buggy device and then 
> deciding not to use it in the first space... I thought about whether it
> will get better when future IOMMU implements A/D bit - by checking
> access bit being set then we'll know some buggy device exists, but, 
> the scratch page is shared by all devices then we cannot rely on this 
> feature to find out the actual buggy one.

Thinking about it - yes, I think I agree. This (as with so many
workarounds) would better be an off-by-default one. The main issue
I understand this would have is that buggy systems then might hang
without even having managed to get a log message out - Paul?

Jürgen - would you be amenable to an almost last minute refinement
here (would then also need to still be backported to 4.12.2, or
the original backport reverted, to avoid giving the impression of
a regression)?

Jan
Jürgen Groß Dec. 10, 2019, 8:12 a.m. UTC | #6
On 10.12.19 09:05, Jan Beulich wrote:
> On 10.12.2019 08:16, Tian, Kevin wrote:
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: Tuesday, December 3, 2019 5:36 PM
>>>
>>> On 28.11.2019 12:32, Jürgen Groß wrote:
>>>> On 28.11.19 12:17, Jan Beulich wrote:
>>>>> On 27.11.2019 18:11, Paul Durrant wrote:
>>>>>> This patch introduces a new iommu_op to facilitate a per-
>>> implementation
>>>>>> quarantine set up, and then further code for x86 implementations
>>>>>> (amd and vtd) to set up a read-only scratch page to serve as the source
>>>>>> for DMA reads whilst a device is assigned to dom_io. DMA writes will
>>>>>> continue to fault as before.
>>>>>>
>>>>>> The reason for doing this is that some hardware may continue to re-try
>>>>>> DMA (despite FLR) in the event of an error, or even BME being cleared,
>>> and
>>>>>> will fail to deal with DMA read faults gracefully. Having a scratch page
>>>>>> mapped will allow pending DMA reads to complete and thus such buggy
>>>>>> hardware will eventually be quiesced.
>>>>>>
>>>>>> NOTE: These modifications are restricted to x86 implementations only as
>>>>>>         the buggy h/w I am aware of is only used with Xen in an x86
>>>>>>         environment. ARM may require similar code but, since I am not
>>>>>>         aware of the need, this patch does not modify any ARM
>>> implementation.
>>>>>>
>>>>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
>>>>>
>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>>> There is still the open question of whether use of a scratch page ought
>>>>>> to be gated on something, either are run-time or compile-time.
>>>>>
>>>>> I have no clear opinion either way here. The workaround seems low
>>>>> overhead enough that there may not be a need to have an admin (or
>>>>> build time) control for this.
>>>>>
>>>>> As to 4.13: The quarantining as a whole is pretty fresh. While it
>>>>> has been backported to security maintained trees, I'd still consider
>>>>> it a new feature in 4.13, and hence this workaround at least eligible
>>>>> for consideration.
>>>>
>>>> I agree.
>>>>
>>>> Release-acked-by: Juergen Gross <jgross@suse.com>
>>>
>>> I notice this has been committed meanwhile. I had specifically not
>>> done so due to the still missing VT-d ack, seeing that this wasn't
>>> an entirely "trivial" change.
>>>
>>
>> While the quarantine idea sounds good overall, I'm still not convinced
>> to have it the only way in place just for handling some known-buggy
>> device. It kills the possibility of identifying a new buggy device and then
>> deciding not to use it in the first space... I thought about whether it
>> will get better when future IOMMU implements A/D bit - by checking
>> access bit being set then we'll know some buggy device exists, but,
>> the scratch page is shared by all devices then we cannot rely on this
>> feature to find out the actual buggy one.
> 
> Thinking about it - yes, I think I agree. This (as with so many
> workarounds) would better be an off-by-default one. The main issue
> I understand this would have is that buggy systems then might hang
> without even having managed to get a log message out - Paul?
> 
> Jürgen - would you be amenable to an almost last minute refinement
> here (would then also need to still be backported to 4.12.2, or
> the original backport reverted, to avoid giving the impression of
> a regression)?

So what is your suggestion here? To have a boot option (defaulting to
off) for enabling the scratch page?


Juergen
Paul Durrant Dec. 10, 2019, 8:19 a.m. UTC | #7
> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 10 December 2019 08:12
> To: Jan Beulich <jbeulich@suse.com>; Tian, Kevin <kevin.tian@intel.com>;
> Durrant, Paul <pdurrant@amazon.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-
> devel@lists.xenproject.org; Roger Pau Monné <roger.pau@citrix.com>; Wei
> Liu <wl@xen.org>
> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 10.12.19 09:05, Jan Beulich wrote:
> > On 10.12.2019 08:16, Tian, Kevin wrote:
> >>> From: Jan Beulich <jbeulich@suse.com>
> >>> Sent: Tuesday, December 3, 2019 5:36 PM
> >>>
> >>> On 28.11.2019 12:32, Jürgen Groß wrote:
> >>>> On 28.11.19 12:17, Jan Beulich wrote:
> >>>>> On 27.11.2019 18:11, Paul Durrant wrote:
> >>>>>> This patch introduces a new iommu_op to facilitate a per-
> >>> implementation
> >>>>>> quarantine set up, and then further code for x86 implementations
> >>>>>> (amd and vtd) to set up a read-only scratch page to serve as the
> source
> >>>>>> for DMA reads whilst a device is assigned to dom_io. DMA writes
> will
> >>>>>> continue to fault as before.
> >>>>>>
> >>>>>> The reason for doing this is that some hardware may continue to re-
> try
> >>>>>> DMA (despite FLR) in the event of an error, or even BME being
> cleared,
> >>> and
> >>>>>> will fail to deal with DMA read faults gracefully. Having a scratch
> page
> >>>>>> mapped will allow pending DMA reads to complete and thus such buggy
> >>>>>> hardware will eventually be quiesced.
> >>>>>>
> >>>>>> NOTE: These modifications are restricted to x86 implementations
> only as
> >>>>>>         the buggy h/w I am aware of is only used with Xen in an x86
> >>>>>>         environment. ARM may require similar code but, since I am
> not
> >>>>>>         aware of the need, this patch does not modify any ARM
> >>> implementation.
> >>>>>>
> >>>>>> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> >>>>>
> >>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >>>>>
> >>>>>> There is still the open question of whether use of a scratch page
> ought
> >>>>>> to be gated on something, either are run-time or compile-time.
> >>>>>
> >>>>> I have no clear opinion either way here. The workaround seems low
> >>>>> overhead enough that there may not be a need to have an admin (or
> >>>>> build time) control for this.
> >>>>>
> >>>>> As to 4.13: The quarantining as a whole is pretty fresh. While it
> >>>>> has been backported to security maintained trees, I'd still consider
> >>>>> it a new feature in 4.13, and hence this workaround at least
> eligible
> >>>>> for consideration.
> >>>>
> >>>> I agree.
> >>>>
> >>>> Release-acked-by: Juergen Gross <jgross@suse.com>
> >>>
> >>> I notice this has been committed meanwhile. I had specifically not
> >>> done so due to the still missing VT-d ack, seeing that this wasn't
> >>> an entirely "trivial" change.
> >>>
> >>
> >> While the quarantine idea sounds good overall, I'm still not convinced
> >> to have it the only way in place just for handling some known-buggy
> >> device. It kills the possibility of identifying a new buggy device and
> then
> >> deciding not to use it in the first space... I thought about whether it
> >> will get better when future IOMMU implements A/D bit - by checking
> >> access bit being set then we'll know some buggy device exists, but,
> >> the scratch page is shared by all devices then we cannot rely on this
> >> feature to find out the actual buggy one.
> >
> > Thinking about it - yes, I think I agree. This (as with so many
> > workarounds) would better be an off-by-default one. The main issue
> > I understand this would have is that buggy systems then might hang
> > without even having managed to get a log message out - Paul?
> >

Yes, that's the concern. Some systems will wedge hard on the first fault with no logging. I have no objection to a command line parameter but IMO it ought to default on, at least in non-debug mode.

  Paul


> > Jürgen - would you be amenable to an almost last minute refinement
> > here (would then also need to still be backported to 4.12.2, or
> > the original backport reverted, to avoid giving the impression of
> > a regression)?
> 
> So what is your suggestion here? To have a boot option (defaulting to
> off) for enabling the scratch page?
> 
> 
> Juergen
Jan Beulich Dec. 10, 2019, 8:57 a.m. UTC | #8
On 10.12.2019 09:12, Jürgen Groß wrote:
> On 10.12.19 09:05, Jan Beulich wrote:
>> On 10.12.2019 08:16, Tian, Kevin wrote:
>>> While the quarantine idea sounds good overall, I'm still not convinced
>>> to have it the only way in place just for handling some known-buggy
>>> device. It kills the possibility of identifying a new buggy device and then
>>> deciding not to use it in the first space... I thought about whether it
>>> will get better when future IOMMU implements A/D bit - by checking
>>> access bit being set then we'll know some buggy device exists, but,
>>> the scratch page is shared by all devices then we cannot rely on this
>>> feature to find out the actual buggy one.
>>
>> Thinking about it - yes, I think I agree. This (as with so many
>> workarounds) would better be an off-by-default one. The main issue
>> I understand this would have is that buggy systems then might hang
>> without even having managed to get a log message out - Paul?
>>
>> Jürgen - would you be amenable to an almost last minute refinement
>> here (would then also need to still be backported to 4.12.2, or
>> the original backport reverted, to avoid giving the impression of
>> a regression)?
> 
> So what is your suggestion here? To have a boot option (defaulting to
> off) for enabling the scratch page?

Yes (and despite having seen Paul's reply).

Jan
Jürgen Groß Dec. 10, 2019, 9:07 a.m. UTC | #9
On 10.12.19 09:57, Jan Beulich wrote:
> On 10.12.2019 09:12, Jürgen Groß wrote:
>> On 10.12.19 09:05, Jan Beulich wrote:
>>> On 10.12.2019 08:16, Tian, Kevin wrote:
>>>> While the quarantine idea sounds good overall, I'm still not convinced
>>>> to have it the only way in place just for handling some known-buggy
>>>> device. It kills the possibility of identifying a new buggy device and then
>>>> deciding not to use it in the first space... I thought about whether it
>>>> will get better when future IOMMU implements A/D bit - by checking
>>>> access bit being set then we'll know some buggy device exists, but,
>>>> the scratch page is shared by all devices then we cannot rely on this
>>>> feature to find out the actual buggy one.
>>>
>>> Thinking about it - yes, I think I agree. This (as with so many
>>> workarounds) would better be an off-by-default one. The main issue
>>> I understand this would have is that buggy systems then might hang
>>> without even having managed to get a log message out - Paul?
>>>
>>> Jürgen - would you be amenable to an almost last minute refinement
>>> here (would then also need to still be backported to 4.12.2, or
>>> the original backport reverted, to avoid giving the impression of
>>> a regression)?
>>
>> So what is your suggestion here? To have a boot option (defaulting to
>> off) for enabling the scratch page?
> 
> Yes (and despite having seen Paul's reply).

I'd release ack such a patch in case you come to an agreement regarding
the default soon.


Juergen
Paul Durrant Dec. 10, 2019, 9:16 a.m. UTC | #10
> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 10 December 2019 09:07
> To: Jan Beulich <jbeulich@suse.com>
> Cc: Tian, Kevin <kevin.tian@intel.com>; Durrant, Paul
> <pdurrant@amazon.com>; Andrew Cooper <andrew.cooper3@citrix.com>; xen-
> devel@lists.xenproject.org; Roger Pau Monné <roger.pau@citrix.com>; Wei
> Liu <wl@xen.org>
> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 10.12.19 09:57, Jan Beulich wrote:
> > On 10.12.2019 09:12, Jürgen Groß wrote:
> >> On 10.12.19 09:05, Jan Beulich wrote:
> >>> On 10.12.2019 08:16, Tian, Kevin wrote:
> >>>> While the quarantine idea sounds good overall, I'm still not
> convinced
> >>>> to have it the only way in place just for handling some known-buggy
> >>>> device. It kills the possibility of identifying a new buggy device
> and then
> >>>> deciding not to use it in the first space... I thought about whether
> it
> >>>> will get better when future IOMMU implements A/D bit - by checking
> >>>> access bit being set then we'll know some buggy device exists, but,
> >>>> the scratch page is shared by all devices then we cannot rely on this
> >>>> feature to find out the actual buggy one.
> >>>
> >>> Thinking about it - yes, I think I agree. This (as with so many
> >>> workarounds) would better be an off-by-default one. The main issue
> >>> I understand this would have is that buggy systems then might hang
> >>> without even having managed to get a log message out - Paul?
> >>>
> >>> Jürgen - would you be amenable to an almost last minute refinement
> >>> here (would then also need to still be backported to 4.12.2, or
> >>> the original backport reverted, to avoid giving the impression of
> >>> a regression)?
> >>
> >> So what is your suggestion here? To have a boot option (defaulting to
> >> off) for enabling the scratch page?
> >
> > Yes (and despite having seen Paul's reply).
> 
> I'd release ack such a patch in case you come to an agreement regarding
> the default soon.
> 

Ok. The default is not that crucial. Perhaps it's just me who thinks defaults should be chosen on the basis of being most likely to result in a working system.

  Paul

> 
> Juergen
Jan Beulich Dec. 10, 2019, 9:44 a.m. UTC | #11
On 10.12.2019 10:16, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Jürgen Groß <jgross@suse.com>
>> Sent: 10 December 2019 09:07
>> To: Jan Beulich <jbeulich@suse.com>
>> Cc: Tian, Kevin <kevin.tian@intel.com>; Durrant, Paul
>> <pdurrant@amazon.com>; Andrew Cooper <andrew.cooper3@citrix.com>; xen-
>> devel@lists.xenproject.org; Roger Pau Monné <roger.pau@citrix.com>; Wei
>> Liu <wl@xen.org>
>> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
>> quarantine domain
>>
>> On 10.12.19 09:57, Jan Beulich wrote:
>>> On 10.12.2019 09:12, Jürgen Groß wrote:
>>>> On 10.12.19 09:05, Jan Beulich wrote:
>>>>> On 10.12.2019 08:16, Tian, Kevin wrote:
>>>>>> While the quarantine idea sounds good overall, I'm still not
>> convinced
>>>>>> to have it the only way in place just for handling some known-buggy
>>>>>> device. It kills the possibility of identifying a new buggy device
>> and then
>>>>>> deciding not to use it in the first space... I thought about whether
>> it
>>>>>> will get better when future IOMMU implements A/D bit - by checking
>>>>>> access bit being set then we'll know some buggy device exists, but,
>>>>>> the scratch page is shared by all devices then we cannot rely on this
>>>>>> feature to find out the actual buggy one.
>>>>>
>>>>> Thinking about it - yes, I think I agree. This (as with so many
>>>>> workarounds) would better be an off-by-default one. The main issue
>>>>> I understand this would have is that buggy systems then might hang
>>>>> without even having managed to get a log message out - Paul?
>>>>>
>>>>> Jürgen - would you be amenable to an almost last minute refinement
>>>>> here (would then also need to still be backported to 4.12.2, or
>>>>> the original backport reverted, to avoid giving the impression of
>>>>> a regression)?
>>>>
>>>> So what is your suggestion here? To have a boot option (defaulting to
>>>> off) for enabling the scratch page?
>>>
>>> Yes (and despite having seen Paul's reply).
>>
>> I'd release ack such a patch in case you come to an agreement regarding
>> the default soon.
>>
> 
> Ok. The default is not that crucial. Perhaps it's just me who thinks
> defaults should be chosen on the basis of being most likely to result
> in a working system.

If it wasn't for quirky hardware (or firmware to cover the general case,
in particular to avoid getting quoted on this wrt my position on EFI
workarounds), I'd agree. But personally I think Kevin's point takes
priority here: Admins should at least be aware of running quirky
hardware, and hence I'd prefer the default to be logging of faults
rather than their silencing. Documentation of the new (sub-)option may
give suitable hints, and we may even go as far as providing a Kconfig
option for the default to be chosen at build time.

Main question now is - who's going to make a patch? Will you? Should I?

Jan
Paul Durrant Dec. 10, 2019, 10:19 a.m. UTC | #12
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 10 December 2019 09:45
> To: Durrant, Paul <pdurrant@amazon.com>
> Cc: Jürgen Groß <jgross@suse.com>; Tian, Kevin <kevin.tian@intel.com>;
> Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel@lists.xenproject.org;
> Roger Pau Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>
> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 10.12.2019 10:16, Durrant, Paul wrote:
> >> -----Original Message-----
> >> From: Jürgen Groß <jgross@suse.com>
> >> Sent: 10 December 2019 09:07
> >> To: Jan Beulich <jbeulich@suse.com>
> >> Cc: Tian, Kevin <kevin.tian@intel.com>; Durrant, Paul
> >> <pdurrant@amazon.com>; Andrew Cooper <andrew.cooper3@citrix.com>; xen-
> >> devel@lists.xenproject.org; Roger Pau Monné <roger.pau@citrix.com>; Wei
> >> Liu <wl@xen.org>
> >> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> >> quarantine domain
> >>
> >> On 10.12.19 09:57, Jan Beulich wrote:
> >>> On 10.12.2019 09:12, Jürgen Groß wrote:
> >>>> On 10.12.19 09:05, Jan Beulich wrote:
> >>>>> On 10.12.2019 08:16, Tian, Kevin wrote:
> >>>>>> While the quarantine idea sounds good overall, I'm still not
> >> convinced
> >>>>>> to have it the only way in place just for handling some known-buggy
> >>>>>> device. It kills the possibility of identifying a new buggy device
> >> and then
> >>>>>> deciding not to use it in the first space... I thought about
> whether
> >> it
> >>>>>> will get better when future IOMMU implements A/D bit - by checking
> >>>>>> access bit being set then we'll know some buggy device exists, but,
> >>>>>> the scratch page is shared by all devices then we cannot rely on
> this
> >>>>>> feature to find out the actual buggy one.
> >>>>>
> >>>>> Thinking about it - yes, I think I agree. This (as with so many
> >>>>> workarounds) would better be an off-by-default one. The main issue
> >>>>> I understand this would have is that buggy systems then might hang
> >>>>> without even having managed to get a log message out - Paul?
> >>>>>
> >>>>> Jürgen - would you be amenable to an almost last minute refinement
> >>>>> here (would then also need to still be backported to 4.12.2, or
> >>>>> the original backport reverted, to avoid giving the impression of
> >>>>> a regression)?
> >>>>
> >>>> So what is your suggestion here? To have a boot option (defaulting to
> >>>> off) for enabling the scratch page?
> >>>
> >>> Yes (and despite having seen Paul's reply).
> >>
> >> I'd release ack such a patch in case you come to an agreement regarding
> >> the default soon.
> >>
> >
> > Ok. The default is not that crucial. Perhaps it's just me who thinks
> > defaults should be chosen on the basis of being most likely to result
> > in a working system.
> 
> If it wasn't for quirky hardware (or firmware to cover the general case,
> in particular to avoid getting quoted on this wrt my position on EFI
> workarounds), I'd agree. But personally I think Kevin's point takes
> priority here: Admins should at least be aware of running quirky
> hardware, and hence I'd prefer the default to be logging of faults
> rather than their silencing. Documentation of the new (sub-)option may
> give suitable hints, and we may even go as far as providing a Kconfig
> option for the default to be chosen at build time.
> 
> Main question now is - who's going to make a patch? Will you? Should I?
> 

I'm happy to do it, but it would probably be more expedient if you did.

  Paul

> Jan
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index cd5c7de7c5..54e1d132d9 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -560,6 +560,68 @@  int amd_iommu_reserve_domain_unity_map(struct domain *domain,
     return rt;
 }
 
+int __init amd_iommu_quarantine_init(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    unsigned long max_gfn =
+        PFN_DOWN((1ul << DEFAULT_DOMAIN_ADDRESS_WIDTH) - 1);
+    unsigned int level = amd_iommu_get_paging_mode(max_gfn);
+    struct amd_iommu_pte *table;
+
+    if ( hd->arch.root_table )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&hd->arch.mapping_lock);
+
+    hd->arch.root_table = alloc_amd_iommu_pgtable();
+    if ( !hd->arch.root_table )
+        goto out;
+
+    table = __map_domain_page(hd->arch.root_table);
+    while ( level )
+    {
+        struct page_info *pg;
+        unsigned int i;
+
+        /*
+         * The pgtable allocator is fine for the leaf page, as well as
+         * page table pages, and the resulting allocations are always
+         * zeroed.
+         */
+        pg = alloc_amd_iommu_pgtable();
+        if ( !pg )
+            break;
+
+        for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
+        {
+            struct amd_iommu_pte *pde = &table[i];
+
+            /*
+             * PDEs are essentially a subset of PTEs, so this function
+             * is fine to use even at the leaf.
+             */
+            set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level - 1,
+                                  false, true);
+        }
+
+        unmap_domain_page(table);
+        table = __map_domain_page(pg);
+        level--;
+    }
+    unmap_domain_page(table);
+
+ out:
+    spin_unlock(&hd->arch.mapping_lock);
+
+    amd_iommu_flush_all_pages(d);
+
+    /* Pages leaked in failure case */
+    return level ? -ENOMEM : 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 75a0f1b4ab..4da6518773 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -95,10 +95,6 @@  static void amd_iommu_setup_domain_device(
     u8 bus = pdev->bus;
     const struct domain_iommu *hd = dom_iommu(domain);
 
-    /* dom_io is used as a sentinel for quarantined devices */
-    if ( domain == dom_io )
-        return;
-
     BUG_ON( !hd->arch.root_table || !hd->arch.paging_mode ||
             !iommu->dev_table.buffer );
 
@@ -235,7 +231,7 @@  static int __must_check allocate_domain_resources(struct domain_iommu *hd)
     return rc;
 }
 
-static int get_paging_mode(unsigned long entries)
+int amd_iommu_get_paging_mode(unsigned long entries)
 {
     int level = 1;
 
@@ -257,7 +253,8 @@  static int amd_iommu_domain_init(struct domain *d)
 
     /* For pv and dom0, stick with get_paging_mode(max_page)
      * For HVM dom0, use 2 level page table at first */
-    hd->arch.paging_mode = is_hvm_domain(d) ? 2 : get_paging_mode(max_page);
+    hd->arch.paging_mode = is_hvm_domain(d) ?
+        2 : amd_iommu_get_paging_mode(max_page);
     return 0;
 }
 
@@ -290,10 +287,6 @@  static void amd_iommu_disable_domain_device(const struct domain *domain,
     int req_id;
     u8 bus = pdev->bus;
 
-    /* dom_io is used as a sentinel for quarantined devices */
-    if ( domain == dom_io )
-        return;
-
     BUG_ON ( iommu->dev_table.buffer == NULL );
     req_id = get_dma_requestor_id(iommu->seg, PCI_BDF2(bus, devfn));
     table = iommu->dev_table.buffer;
@@ -632,6 +625,7 @@  static void amd_dump_p2m_table(struct domain *d)
 static const struct iommu_ops __initconstrel _iommu_ops = {
     .init = amd_iommu_domain_init,
     .hwdom_init = amd_iommu_hwdom_init,
+    .quarantine_init = amd_iommu_quarantine_init,
     .add_device = amd_iommu_add_device,
     .remove_device = amd_iommu_remove_device,
     .assign_device  = amd_iommu_assign_device,
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 8cbe908fff..79f842e340 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -440,6 +440,23 @@  int iommu_iotlb_flush_all(struct domain *d, unsigned int flush_flags)
     return rc;
 }
 
+static int __init iommu_quarantine_init(void)
+{
+    const struct domain_iommu *hd = dom_iommu(dom_io);
+    int rc;
+
+    dom_io->options |= XEN_DOMCTL_CDF_iommu;
+
+    rc = iommu_domain_init(dom_io, 0);
+    if ( rc )
+        return rc;
+
+    if ( !hd->platform_ops->quarantine_init )
+        return 0;
+
+    return hd->platform_ops->quarantine_init(dom_io);
+}
+
 int __init iommu_setup(void)
 {
     int rc = -ENODEV;
@@ -473,8 +490,7 @@  int __init iommu_setup(void)
     }
     else
     {
-        dom_io->options |= XEN_DOMCTL_CDF_iommu;
-        if ( iommu_domain_init(dom_io, 0) )
+        if ( iommu_quarantine_init() )
             panic("Could not set up quarantine\n");
 
         printk(" - Dom0 mode: %s\n",
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 25ad649c34..1e502131d7 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1291,10 +1291,6 @@  int domain_context_mapping_one(
     int agaw, rc, ret;
     bool_t flush_dev_iotlb;
 
-    /* dom_io is used as a sentinel for quarantined devices */
-    if ( domain == dom_io )
-        return 0;
-
     ASSERT(pcidevs_locked());
     spin_lock(&iommu->lock);
     maddr = bus_to_context_maddr(iommu, bus);
@@ -1541,10 +1537,6 @@  int domain_context_unmap_one(
     int iommu_domid, rc, ret;
     bool_t flush_dev_iotlb;
 
-    /* dom_io is used as a sentinel for quarantined devices */
-    if ( domain == dom_io )
-        return 0;
-
     ASSERT(pcidevs_locked());
     spin_lock(&iommu->lock);
 
@@ -1677,10 +1669,6 @@  static int domain_context_unmap(struct domain *domain, u8 devfn,
         goto out;
     }
 
-    /* dom_io is used as a sentinel for quarantined devices */
-    if ( domain == dom_io )
-        goto out;
-
     /*
      * if no other devices under the same iommu owned by this domain,
      * clear iommu in iommu_bitmap and clear domain_id in domid_bitmp
@@ -2683,9 +2671,69 @@  static void vtd_dump_p2m_table(struct domain *d)
     vtd_dump_p2m_table_level(hd->arch.pgd_maddr, agaw_to_level(hd->arch.agaw), 0, 0);
 }
 
+static int __init intel_iommu_quarantine_init(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    struct dma_pte *parent;
+    unsigned int agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
+    unsigned int level = agaw_to_level(agaw);
+    int rc;
+
+    if ( hd->arch.pgd_maddr )
+    {
+        ASSERT_UNREACHABLE();
+        return 0;
+    }
+
+    spin_lock(&hd->arch.mapping_lock);
+
+    hd->arch.pgd_maddr = alloc_pgtable_maddr(1, hd->node);
+    if ( !hd->arch.pgd_maddr )
+        goto out;
+
+    parent = map_vtd_domain_page(hd->arch.pgd_maddr);
+    while ( level )
+    {
+        uint64_t maddr;
+        unsigned int offset;
+
+        /*
+         * The pgtable allocator is fine for the leaf page, as well as
+         * page table pages, and the resulting allocations are always
+         * zeroed.
+         */
+        maddr = alloc_pgtable_maddr(1, hd->node);
+        if ( !maddr )
+            break;
+
+        for ( offset = 0; offset < PTE_NUM; offset++ )
+        {
+            struct dma_pte *pte = &parent[offset];
+
+            dma_set_pte_addr(*pte, maddr);
+            dma_set_pte_readable(*pte);
+        }
+        iommu_flush_cache_page(parent, 1);
+
+        unmap_vtd_domain_page(parent);
+        parent = map_vtd_domain_page(maddr);
+        level--;
+    }
+    unmap_vtd_domain_page(parent);
+
+ out:
+    spin_unlock(&hd->arch.mapping_lock);
+
+    rc = iommu_flush_iotlb_all(d);
+
+    /* Pages leaked in failure case */
+    return level ? -ENOMEM : rc;
+}
+
 const struct iommu_ops __initconstrel intel_iommu_ops = {
     .init = intel_iommu_domain_init,
     .hwdom_init = intel_iommu_hwdom_init,
+    .quarantine_init = intel_iommu_quarantine_init,
     .add_device = intel_iommu_add_device,
     .enable_device = intel_iommu_enable_device,
     .remove_device = intel_iommu_remove_device,
diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
index 8ed9482791..664dfc93b9 100644
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -54,6 +54,9 @@  int amd_iommu_init_late(void);
 int amd_iommu_update_ivrs_mapping_acpi(void);
 int iov_adjust_irq_affinities(void);
 
+int amd_iommu_get_paging_mode(unsigned long entries);
+int amd_iommu_quarantine_init(struct domain *d);
+
 /* mapping functions */
 int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
                                     mfn_t mfn, unsigned int flags,
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 974bd3ffe8..6977ddbb97 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -211,6 +211,7 @@  typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
 struct iommu_ops {
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
+    int (*quarantine_init)(struct domain *d);
     int (*add_device)(u8 devfn, device_t *dev);
     int (*enable_device)(device_t *dev);
     int (*remove_device)(u8 devfn, device_t *dev);