
[RFC,1/1] iommu: set the default iommu-dma mode as non-strict

Message ID 20190131135211.6732-1-thunder.leizhen@huawei.com (mailing list archive)
State RFC

Commit Message

Zhen Lei Jan. 31, 2019, 1:52 p.m. UTC
Many peripherals are now much faster than before. For example, older
network cards topped out at 10Gb/s, while current ones exceed 25Gb/s. But
when IOMMU page-table mapping is enabled, it is hard to reach the top speed
in strict mode because of the frequent map and unmap operations. To keep
up with this trend, I think it is better to make non-strict the
default.

Below is our iperf performance data for a 25Gb netcard:
strict mode: 18-20 Gb/s
non-strict mode: 23.5 Gb/s

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 4 ++--
 drivers/iommu/iommu.c                           | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

--
1.8.3

Comments

Jean-Philippe Brucker Jan. 31, 2019, 2:55 p.m. UTC | #1
Hi,

On 31/01/2019 13:52, Zhen Lei wrote:
> Currently, many peripherals are faster than before. For example, the top
> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
> when iommu page-table mapping enabled, it's hard to reach the top speed
> in strict mode, because of frequently map and unmap operations. In order
> to keep abreast of the times, I think it's better to set non-strict as
> default.

Most users won't be aware of this relaxation and will have their system
vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
Invalidation in
http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf

Why not keep the policy to secure by default, as we do for
iommu.passthrough? And maybe add something similar to
CONFIG_IOMMU_DEFAULT_PASSTHROUGH? It's easy enough for experts to pass a
command-line argument or change the default config.
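
A minimal userspace sketch of that arrangement, assuming a build-time switch picks the default and an "iommu.strict=" style argument overrides it. The names SKETCH_DEFAULT_LAZY, sketch_dma_strict and sketch_parse_strict are hypothetical and only illustrate the idea; the kernel's actual option name and parsing code are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical build-time switch standing in for a Kconfig option.
 * When it is left undefined, the secure (strict) default applies.
 */
#ifdef SKETCH_DEFAULT_LAZY
static bool sketch_dma_strict = false;
#else
static bool sketch_dma_strict = true;
#endif

/*
 * Apply an "iommu.strict=<0|1>" style value on top of the build-time
 * default. Returns 0 on success and -1 if the value is neither "0" nor
 * "1", in which case the current setting is left untouched.
 */
static int sketch_parse_strict(const char *val)
{
	assert(val != NULL);

	if (strcmp(val, "0") == 0)
		sketch_dma_strict = false;
	else if (strcmp(val, "1") == 0)
		sketch_dma_strict = true;
	else
		return -1;
	return 0;
}
```

With this split, a distribution chooses the default once at build time, and an administrator can still flip it per boot without rebuilding.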

Thanks,
Jean

> 
> Below it's our iperf performance data of 25Gb netcard:
> strict mode: 18-20 Gb/s
> non-strict mode: 23.5 Gb/s
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 4 ++--
>  drivers/iommu/iommu.c                           | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index b799bcf..667221f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1779,13 +1779,13 @@
> 
>  	iommu.strict=	[ARM64] Configure TLB invalidation behaviour
>  			Format: { "0" | "1" }
> -			0 - Lazy mode.
> +			0 - Lazy mode (default).
>  			  Request that DMA unmap operations use deferred
>  			  invalidation of hardware TLBs, for increased
>  			  throughput at the cost of reduced device isolation.
>  			  Will fall back to strict mode if not supported by
>  			  the relevant IOMMU driver.
> -			1 - Strict mode (default).
> +			1 - Strict mode.
>  			  DMA unmap operations invalidate IOMMU hardware TLBs
>  			  synchronously.
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3ed4db3..10e0b49 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -43,7 +43,7 @@
>  #else
>  static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
>  #endif
> -static bool iommu_dma_strict __read_mostly = true;
> +static bool iommu_dma_strict __read_mostly;
> 
>  struct iommu_callback_data {
>  	const struct iommu_ops *ops;
> --
> 1.8.3
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Hanjun Guo Feb. 26, 2019, 12:36 p.m. UTC | #2
Hi Jean,

On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 31/01/2019 13:52, Zhen Lei wrote:
>> Currently, many peripherals are faster than before. For example, the top
>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>> when iommu page-table mapping enabled, it's hard to reach the top speed
>> in strict mode, because of frequently map and unmap operations. In order
>> to keep abreast of the times, I think it's better to set non-strict as
>> default.
> 
> Most users won't be aware of this relaxation and will have their system
> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
> Invalidation in
> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
> 
> Why not keep the policy to secure by default, as we do for
> iommu.passthrough? And maybe add something similar to
> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
> command-line argument or change the default config.

Sorry for the late reply; it was Chinese New Year, and we had a long internal
discussion. We are fine with adding a Kconfig option, but we are not sure OS
vendors will set it to default y.

To be honest, OS vendors seem unwilling to pass a command-line argument, and
that is our motivation for making non-strict the default. We hope OS vendors
see this email thread and give some input here.

Thanks
Hanjun
Zhen Lei March 1, 2019, 4:44 a.m. UTC | #3
On 2019/2/26 20:36, Hanjun Guo wrote:
> Hi Jean,
> 
> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>> Hi,
>>
>> On 31/01/2019 13:52, Zhen Lei wrote:
>>> Currently, many peripherals are faster than before. For example, the top
>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>> in strict mode, because of frequently map and unmap operations. In order
>>> to keep abreast of the times, I think it's better to set non-strict as
>>> default.
>>
>> Most users won't be aware of this relaxation and will have their system
>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>> Invalidation in
>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
Hi Jean,

   In fact, we discussed the vulnerability of deferred invalidation before upstreaming
the non-strict patches. Attacks may be possible because of an untrusted device or
a mistake in the device driver. We also restricted VFIO to keep using strict mode.
   As mentioned in the PDF, restricting memory freed under deferred invalidation so
that only the device can reuse it would mitigate the vulnerability, but that is too
hard to implement now.
   A compromise may be the following:
   (1) Apply non-strict only to dma_free_coherent. Because the memory is controlled by
the common DMA module, we can make sure it is freed only after the global invalidation
in the timer handler.
   (2) Provide new APIs, related to iommu_unmap_page/sg, that defer invalidation.
Candidate device drivers switch to these APIs if they want better performance.
   (3) Make sure that only trusted devices and trusted drivers can use (1) and (2).
For example, the driver must be built into the kernel Image.
   That way, some high-end trusted devices use non-strict mode while the others keep
using strict mode. Drivers that want non-strict mode have to switch to the new APIs
themselves.
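
The trade-off behind deferred invalidation can be sketched in plain C, assuming a small fixed-size queue of unmapped ranges that is drained by one global invalidation. Everything here (sketch_flush_queue, sketch_unmap_lazy, sketch_flush, the queue length of 4) is hypothetical and only models the bookkeeping; it is not the kernel's flush-queue code, and in a real system sketch_flush would be driven by a timer.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QUEUE_LEN 4

struct sketch_range {
	unsigned long iova;
	size_t size;
};

struct sketch_flush_queue {
	struct sketch_range pending[SKETCH_QUEUE_LEN];
	size_t count;             /* ranges unmapped but not yet invalidated */
	unsigned int tlb_flushes; /* invalidations issued so far */
	size_t released;          /* ranges handed back for reuse */
};

/* One global invalidation makes every pending range safe to reuse. */
static void sketch_flush(struct sketch_flush_queue *q)
{
	assert(q != NULL);

	if (q->count == 0)
		return;
	q->tlb_flushes++;
	q->released += q->count;
	q->count = 0;
}

/* Lazy unmap: queue the range and defer the invalidation. */
static void sketch_unmap_lazy(struct sketch_flush_queue *q,
			      unsigned long iova, size_t size)
{
	assert(q != NULL);

	/* A full queue forces an early flush before queueing more. */
	if (q->count == SKETCH_QUEUE_LEN)
		sketch_flush(q);
	q->pending[q->count].iova = iova;
	q->pending[q->count].size = size;
	q->count++;
}

/* Strict unmap: invalidate synchronously and release at once. */
static void sketch_unmap_strict(struct sketch_flush_queue *q)
{
	assert(q != NULL);

	q->tlb_flushes++;
	q->released++;
}
```

In this model four lazy unmaps cost one invalidation where four strict unmaps cost four, but between sketch_unmap_lazy and sketch_flush a stale TLB entry may still let the device reach the range. That window is the exposure discussed in this thread.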


>>
>> Why not keep the policy to secure by default, as we do for
>> iommu.passthrough? And maybe add something similar to
>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>> command-line argument or change the default config.
> 
> Sorry for the late reply, it was Chinese new year, and we had a long discussion
> internally, we are fine to add a Kconfig but not sure OS vendors will set it
> to default y.
> 
> OS vendors seems not happy to pass a command-line argument, to be honest,
> this is our motivation to enable non-strict as default. Hope OS vendors
> can see this email thread, and give some input here.
> 
> Thanks
> Hanjun
> 
> 
> .
>
Jean-Philippe Brucker March 1, 2019, 11:07 a.m. UTC | #4
Hi Leizhen,

On 01/03/2019 04:44, Leizhen (ThunderTown) wrote:
> 
> 
> On 2019/2/26 20:36, Hanjun Guo wrote:
>> Hi Jean,
>>
>> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>>> Hi,
>>>
>>> On 31/01/2019 13:52, Zhen Lei wrote:
>>>> Currently, many peripherals are faster than before. For example, the top
>>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>>> in strict mode, because of frequently map and unmap operations. In order
>>>> to keep abreast of the times, I think it's better to set non-strict as
>>>> default.
>>>
>>> Most users won't be aware of this relaxation and will have their system
>>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>>> Invalidation in
>>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
> Hi Jean,
> 
>    In fact, we have discussed the vulnerable of deferred invalidation before upstream
> the non-strict patches. The attacks maybe possible because of an untrusted device or
> the mistake of the device driver. And we limited the VFIO to still use strict mode.
>    As mentioned in the pdf, limit the freed memory with deferred invalidation only to
> be reused by the device, can mitigate the vulnerability. But it's too hard to implement
> it now.
>    A compromise maybe we only apply non-strict to (1) dma_free_coherent, because the
> memory is controlled by DMA common module, so we can make the memory to be freed after
> the global invalidation in the timer handler. (2) And provide some new APIs related to
> iommu_unmap_page/sg, these new APIs deferred invalidation. And the candiate device
> drivers update the APIs if they want to improve performance. (3) Make sure that only
> the trusted devices and trusted drivers can apply (1) and (2). For example, the driver
> must be built into kernel Image.

Do we have a notion of untrusted kernel drivers? A userspace driver
(VFIO) is untrusted, ok. But a malicious driver loaded into the kernel
address space would have much easier ways to corrupt the system than to
exploit lazy mode...

For (3), I agree that we should at least disallow lazy mode if
pci_dev->untrusted is set. At the moment it means that we require the
strictest IOMMU configuration for external-facing PCI ports, but it can
be extended to blacklist other vulnerable devices or locations.

If you do (3) then maybe we don't need (1) and (2), which require a
tonne of work in the DMA and IOMMU layers (but would certainly be nice
to see, since it would also help handle ATS invalidation timeouts)

Thanks,
Jean

>    So that some high-end trusted devices use non-strict mode, and keep others still using
> strict mode. The drivers who want to use non-strict mode, should change to use new APIs
> by themselves.
> 
> 
>>>
>>> Why not keep the policy to secure by default, as we do for
>>> iommu.passthrough? And maybe add something similar to
>>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>>> command-line argument or change the default config.
>>
>> Sorry for the late reply, it was Chinese new year, and we had a long discussion
>> internally, we are fine to add a Kconfig but not sure OS vendors will set it
>> to default y.
>>
>> OS vendors seems not happy to pass a command-line argument, to be honest,
>> this is our motivation to enable non-strict as default. Hope OS vendors
>> can see this email thread, and give some input here.
>>
>> Thanks
>> Hanjun
>>
>>
>> .
>>
>
Zhen Lei March 2, 2019, 6:12 a.m. UTC | #5
On 2019/3/1 19:07, Jean-Philippe Brucker wrote:
> Hi Leizhen,
> 
> On 01/03/2019 04:44, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2019/2/26 20:36, Hanjun Guo wrote:
>>> Hi Jean,
>>>
>>> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>>>> Hi,
>>>>
>>>> On 31/01/2019 13:52, Zhen Lei wrote:
>>>>> Currently, many peripherals are faster than before. For example, the top
>>>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>>>> in strict mode, because of frequently map and unmap operations. In order
>>>>> to keep abreast of the times, I think it's better to set non-strict as
>>>>> default.
>>>>
>>>> Most users won't be aware of this relaxation and will have their system
>>>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>>>> Invalidation in
>>>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
>> Hi Jean,
>>
>>    In fact, we have discussed the vulnerable of deferred invalidation before upstream
>> the non-strict patches. The attacks maybe possible because of an untrusted device or
>> the mistake of the device driver. And we limited the VFIO to still use strict mode.
>>    As mentioned in the pdf, limit the freed memory with deferred invalidation only to
>> be reused by the device, can mitigate the vulnerability. But it's too hard to implement
>> it now.
>>    A compromise maybe we only apply non-strict to (1) dma_free_coherent, because the
>> memory is controlled by DMA common module, so we can make the memory to be freed after
>> the global invalidation in the timer handler. (2) And provide some new APIs related to
>> iommu_unmap_page/sg, these new APIs deferred invalidation. And the candiate device
>> drivers update the APIs if they want to improve performance. (3) Make sure that only
>> the trusted devices and trusted drivers can apply (1) and (2). For example, the driver
>> must be built into kernel Image.
> 
> Do we have a notion of untrusted kernel drivers? A userspace driver
It seems impossible to have such a driver. Root users must vouch for the
modules they insmod themselves.

> (VFIO) is untrusted, ok. But a malicious driver loaded into the kernel
> address space would have much easier ways to corrupt the system than to
> exploit lazy mode...
Yes, so we do not need to consider untrusted drivers.

> 
> For (3), I agree that we should at least disallow lazy mode if
> pci_dev->untrusted is set. At the moment it means that we require the
> strictest IOMMU configuration for external-facing PCI ports, but it can
> be extended to blacklist other vulnerable devices or locations.
I plan to add an attribute file for each device, especially for hotplug devices, and
let root users decide which mode to use, strict or non-strict, because
they should know whether the hot-plug device is trusted or not.

> 
> If you do (3) then maybe we don't need (1) and (2), which require a
> tonne of work in the DMA and IOMMU layers (but would certainly be nice
> to see, since it would also help handle ATS invalidation timeouts)
> 
> Thanks,
> Jean
> 
>>    So that some high-end trusted devices use non-strict mode, and keep others still using
>> strict mode. The drivers who want to use non-strict mode, should change to use new APIs
>> by themselves.
>>
>>
>>>>
>>>> Why not keep the policy to secure by default, as we do for
>>>> iommu.passthrough? And maybe add something similar to
>>>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>>>> command-line argument or change the default config.
>>>
>>> Sorry for the late reply, it was Chinese new year, and we had a long discussion
>>> internally, we are fine to add a Kconfig but not sure OS vendors will set it
>>> to default y.
>>>
>>> OS vendors seems not happy to pass a command-line argument, to be honest,
>>> this is our motivation to enable non-strict as default. Hope OS vendors
>>> can see this email thread, and give some input here.
>>>
>>> Thanks
>>> Hanjun
>>>
>>>
>>> .
>>>
>>
> 
> 
> .
>
Robin Murphy March 4, 2019, 3:52 p.m. UTC | #6
On 02/03/2019 06:12, Leizhen (ThunderTown) wrote:
> 
> 
> On 2019/3/1 19:07, Jean-Philippe Brucker wrote:
>> Hi Leizhen,
>>
>> On 01/03/2019 04:44, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2019/2/26 20:36, Hanjun Guo wrote:
>>>> Hi Jean,
>>>>
>>>> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>>>>> Hi,
>>>>>
>>>>> On 31/01/2019 13:52, Zhen Lei wrote:
>>>>>> Currently, many peripherals are faster than before. For example, the top
>>>>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>>>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>>>>> in strict mode, because of frequently map and unmap operations. In order
>>>>>> to keep abreast of the times, I think it's better to set non-strict as
>>>>>> default.
>>>>>
>>>>> Most users won't be aware of this relaxation and will have their system
>>>>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>>>>> Invalidation in
>>>>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
>>> Hi Jean,
>>>
>>>     In fact, we have discussed the vulnerable of deferred invalidation before upstream
>>> the non-strict patches. The attacks maybe possible because of an untrusted device or
>>> the mistake of the device driver. And we limited the VFIO to still use strict mode.
>>>     As mentioned in the pdf, limit the freed memory with deferred invalidation only to
>>> be reused by the device, can mitigate the vulnerability. But it's too hard to implement
>>> it now.
>>>     A compromise maybe we only apply non-strict to (1) dma_free_coherent, because the
>>> memory is controlled by DMA common module, so we can make the memory to be freed after
>>> the global invalidation in the timer handler. (2) And provide some new APIs related to
>>> iommu_unmap_page/sg, these new APIs deferred invalidation. And the candiate device
>>> drivers update the APIs if they want to improve performance. (3) Make sure that only
>>> the trusted devices and trusted drivers can apply (1) and (2). For example, the driver
>>> must be built into kernel Image.
>>
>> Do we have a notion of untrusted kernel drivers? A userspace driver
> It seems impossible to have such driver. The modules insmod by root users should be
> guaranteed by themselves.
> 
>> (VFIO) is untrusted, ok. But a malicious driver loaded into the kernel
>> address space would have much easier ways to corrupt the system than to
>> exploit lazy mode...
> Yes, so that we have no need to consider untrusted drivers.
> 
>>
>> For (3), I agree that we should at least disallow lazy mode if
>> pci_dev->untrusted is set. At the moment it means that we require the
>> strictest IOMMU configuration for external-facing PCI ports, but it can
>> be extended to blacklist other vulnerable devices or locations.
> I plan to add an attribute file for each device, espcially for hotplug devices. And
> let the root users to decide which mode should be used, strict or non-strict. Becasue
> they should known whether the hot-plug divice is trusted or not.

Aside from the problem that without massive implementation changes 
strict/non-strict is at best a per-domain property, not a per-device 
one, I can't see this being particularly practical - surely the whole 
point of a malicious endpoint is that it's going to pretend to be some 
common device for which a 'trusted' kernel driver already exists? If 
you've chosen to trust *any* external device, I think you may as well 
have just set non-strict globally anyway. The effort involved in trying 
to implement super-fine-grained control seems hard to justify.
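
To make the per-domain point concrete, here is a small sketch under the assumption that each device carries a hypothetical per-device preference (wants_lazy) and that devices sharing a domain must share one invalidation mode. The types and sketch_domain_is_strict are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device request, e.g. set by root through sysfs. */
struct sketch_member {
	bool wants_lazy;
};

/* Devices attached to one domain share a single invalidation mode. */
struct sketch_domain {
	const struct sketch_member *members;
	size_t nr_members;
};

/*
 * The domain may be lazy only if every member asked for lazy mode.
 * An empty domain is treated as strict, the conservative choice.
 */
static bool sketch_domain_is_strict(const struct sketch_domain *dom)
{
	assert(dom != NULL);

	if (dom->nr_members == 0)
		return true;
	for (size_t i = 0; i < dom->nr_members; i++)
		if (!dom->members[i].wants_lazy)
			return true;
	return false;
}
```

A single member that keeps strict mode pulls every other device in the domain back to strict, so the per-device knob is only as fine-grained as the domain layout allows.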

Robin.

>>
>> If you do (3) then maybe we don't need (1) and (2), which require a
>> tonne of work in the DMA and IOMMU layers (but would certainly be nice
>> to see, since it would also help handle ATS invalidation timeouts)
>>
>> Thanks,
>> Jean
>>
>>>     So that some high-end trusted devices use non-strict mode, and keep others still using
>>> strict mode. The drivers who want to use non-strict mode, should change to use new APIs
>>> by themselves.
>>>
>>>
>>>>>
>>>>> Why not keep the policy to secure by default, as we do for
>>>>> iommu.passthrough? And maybe add something similar to
>>>>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>>>>> command-line argument or change the default config.
>>>>
>>>> Sorry for the late reply, it was Chinese new year, and we had a long discussion
>>>> internally, we are fine to add a Kconfig but not sure OS vendors will set it
>>>> to default y.
>>>>
>>>> OS vendors seems not happy to pass a command-line argument, to be honest,
>>>> this is our motivation to enable non-strict as default. Hope OS vendors
>>>> can see this email thread, and give some input here.
>>>>
>>>> Thanks
>>>> Hanjun
>>>>
>>>>
>>>> .
>>>>
>>>
>>
>>
>> .
>>
>
Zhen Lei March 6, 2019, 11:06 a.m. UTC | #7
On 2019/3/4 23:52, Robin Murphy wrote:
> On 02/03/2019 06:12, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2019/3/1 19:07, Jean-Philippe Brucker wrote:
>>> Hi Leizhen,
>>>
>>> On 01/03/2019 04:44, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2019/2/26 20:36, Hanjun Guo wrote:
>>>>> Hi Jean,
>>>>>
>>>>> On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 31/01/2019 13:52, Zhen Lei wrote:
>>>>>>> Currently, many peripherals are faster than before. For example, the top
>>>>>>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>>>>>>> when iommu page-table mapping enabled, it's hard to reach the top speed
>>>>>>> in strict mode, because of frequently map and unmap operations. In order
>>>>>>> to keep abreast of the times, I think it's better to set non-strict as
>>>>>>> default.
>>>>>>
>>>>>> Most users won't be aware of this relaxation and will have their system
>>>>>> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
>>>>>> Invalidation in
>>>>>> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
>>>> Hi Jean,
>>>>
>>>>     In fact, we have discussed the vulnerable of deferred invalidation before upstream
>>>> the non-strict patches. The attacks maybe possible because of an untrusted device or
>>>> the mistake of the device driver. And we limited the VFIO to still use strict mode.
>>>>     As mentioned in the pdf, limit the freed memory with deferred invalidation only to
>>>> be reused by the device, can mitigate the vulnerability. But it's too hard to implement
>>>> it now.
>>>>     A compromise maybe we only apply non-strict to (1) dma_free_coherent, because the
>>>> memory is controlled by DMA common module, so we can make the memory to be freed after
>>>> the global invalidation in the timer handler. (2) And provide some new APIs related to
>>>> iommu_unmap_page/sg, these new APIs deferred invalidation. And the candiate device
>>>> drivers update the APIs if they want to improve performance. (3) Make sure that only
>>>> the trusted devices and trusted drivers can apply (1) and (2). For example, the driver
>>>> must be built into kernel Image.
>>>
>>> Do we have a notion of untrusted kernel drivers? A userspace driver
>> It seems impossible to have such driver. The modules insmod by root users should be
>> guaranteed by themselves.
>>
>>> (VFIO) is untrusted, ok. But a malicious driver loaded into the kernel
>>> address space would have much easier ways to corrupt the system than to
>>> exploit lazy mode...
>> Yes, so that we have no need to consider untrusted drivers.
>>
>>>
>>> For (3), I agree that we should at least disallow lazy mode if
>>> pci_dev->untrusted is set. At the moment it means that we require the
>>> strictest IOMMU configuration for external-facing PCI ports, but it can
>>> be extended to blacklist other vulnerable devices or locations.
>> I plan to add an attribute file for each device, espcially for hotplug devices. And
>> let the root users to decide which mode should be used, strict or non-strict. Becasue
>> they should known whether the hot-plug divice is trusted or not.
> 
> Aside from the problem that without massive implementation changes strict/non-strict is at
> best a per-domain property, not a per-device one, I can't see this being particularly practical
> - surely the whole point of a malicious endpoint is that it's going to pretend to be some common
> device for which a 'trusted' kernel driver already exists? 
Yes, it should be assumed that all kernel drivers and all hard-wired devices are trusted. There is
no reason to suspect that open-source drivers, or drivers and devices provided by legitimate
suppliers, are malicious.


> If you've chosen to trust *any* external device, I think you may as well have just set non-strict globally anyway.
> The effort involved in trying to implement super-fine-grained control seems hard to justify.
The default mode for external devices is strict, and it could of course be changed to non-strict mode. But as
you said, that may be hard to implement. In addition, bringing a malicious device into a computer room,
attaching it, and exporting data is not easy either. Maybe I should follow Jean's suggestion first and add a config item.

> 
> Robin.
> 
>>>
>>> If you do (3) then maybe we don't need (1) and (2), which require a
>>> tonne of work in the DMA and IOMMU layers (but would certainly be nice
>>> to see, since it would also help handle ATS invalidation timeouts)
>>>
>>> Thanks,
>>> Jean
>>>
>>>>     So that some high-end trusted devices use non-strict mode, and keep others still using
>>>> strict mode. The drivers who want to use non-strict mode, should change to use new APIs
>>>> by themselves.
>>>>
>>>>
>>>>>>
>>>>>> Why not keep the policy to secure by default, as we do for
>>>>>> iommu.passthrough? And maybe add something similar to
>>>>>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>>>>>> command-line argument or change the default config.
>>>>>
>>>>> Sorry for the late reply, it was Chinese new year, and we had a long discussion
>>>>> internally, we are fine to add a Kconfig but not sure OS vendors will set it
>>>>> to default y.
>>>>>
>>>>> OS vendors seems not happy to pass a command-line argument, to be honest,
>>>>> this is our motivation to enable non-strict as default. Hope OS vendors
>>>>> can see this email thread, and give some input here.
>>>>>
>>>>> Thanks
>>>>> Hanjun
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
> 
> .
>
John Garry March 6, 2019, 12:07 p.m. UTC | #8
>>>
>>>> (VFIO) is untrusted, ok. But a malicious driver loaded into the kernel
>>>> address space would have much easier ways to corrupt the system than to
>>>> exploit lazy mode...
>>> Yes, so that we have no need to consider untrusted drivers.
>>>
>>>>
>>>> For (3), I agree that we should at least disallow lazy mode if
>>>> pci_dev->untrusted is set. At the moment it means that we require the
>>>> strictest IOMMU configuration for external-facing PCI ports, but it can
>>>> be extended to blacklist other vulnerable devices or locations.
>>> I plan to add an attribute file for each device, espcially for hotplug devices. And
>>> let the root users to decide which mode should be used, strict or non-strict. Becasue
>>> they should known whether the hot-plug divice is trusted or not.
>>
>> Aside from the problem that without massive implementation changes strict/non-strict is at
>> best a per-domain property, not a per-device one, I can't see this being particularly practical
>> - surely the whole point of a malicious endpoint is that it's going to pretend to be some common
>> device for which a 'trusted' kernel driver already exists?
> Yes, It should be assumed that all kernel drivers and all hard-wired devices are trusted. There is
> no reason to doubt that the open source drivers or the drivers and devices provided by legitimate
> suppliers are malicious.
>
>
>> If you've chosen to trust *any* external device, I think you may as well have just set non-strict globally anyway.
>> The effort involved in trying to implement super-fine-grained control seems hard to justify.
> The default mode of external devices is strict, it can be obviously changed to non-strict mode. But as
> you said, it maybe hard to be implemented. In addition, bring a malicious device into computer room,
> attach and export data it's not easy also. Maybe I should follow Jean'suggestion first,
> add a config item.
>

+1

On another topic, we also saw a use case for selectively putting
devices in passthrough.

Typically, having the kernel use the identity mapping when driving a
device is fine; in fact, having the IOMMU translate puts a big
performance burden on the system. However, sometimes we need the
IOMMU involved for certain devices, for example when the kernel device
driver has large contiguous DMA requirements, which is the case for some
RDMA NICs.

John


>>
>> Robin.
>>
>>>>
>>>> If you do (3) then maybe we don't need (1) and (2), which require a
>>>> tonne of work in the DMA and IOMMU layers (but would certainly be nice
>>>> to see, since it would also help handle ATS invalidation timeouts)
>>>>
>>>> Thanks,
>>>> Jean
>>>>
>>>>>     So that some high-end trusted devices use non-strict mode, and keep others still using
>>>>> strict mode. The drivers who want to use non-strict mode, should change to use new APIs
>>>>> by themselves.
>>>>>
>>>>>
>>>>>>>
>>>>>>> Why not keep the policy to secure by default, as we do for
>>>>>>> iommu.passthrough? And maybe add something similar to
>>>>>>> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
>>>>>>> command-line argument or change the default config.
>>>>>>
>>>>>> Sorry for the late reply, it was Chinese new year, and we had a long discussion
>>>>>> internally, we are fine to add a Kconfig but not sure OS vendors will set it
>>>>>> to default y.
>>>>>>
>>>>>> OS vendors seems not happy to pass a command-line argument, to be honest,
>>>>>> this is our motivation to enable non-strict as default. Hope OS vendors
>>>>>> can see this email thread, and give some input here.
>>>>>>
>>>>>> Thanks
>>>>>> Hanjun
>>>>>>
>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>>
>> .
>>
>

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b799bcf..667221f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1779,13 +1779,13 @@ 

 	iommu.strict=	[ARM64] Configure TLB invalidation behaviour
 			Format: { "0" | "1" }
-			0 - Lazy mode.
+			0 - Lazy mode (default).
 			  Request that DMA unmap operations use deferred
 			  invalidation of hardware TLBs, for increased
 			  throughput at the cost of reduced device isolation.
 			  Will fall back to strict mode if not supported by
 			  the relevant IOMMU driver.
-			1 - Strict mode (default).
+			1 - Strict mode.
 			  DMA unmap operations invalidate IOMMU hardware TLBs
 			  synchronously.

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3ed4db3..10e0b49 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -43,7 +43,7 @@ 
 #else
 static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
-static bool iommu_dma_strict __read_mostly = true;
+static bool iommu_dma_strict __read_mostly;

 struct iommu_callback_data {
 	const struct iommu_ops *ops;