diff mbox series

[RFC] xen/arm: introduce XENFEAT_ARM_dom0_iommu

Message ID alpine.DEB.2.21.2102161333090.3234@sstabellini-ThinkPad-T480s (mailing list archive)
State New, archived
Headers show
Series [RFC] xen/arm: introduce XENFEAT_ARM_dom0_iommu | expand

Commit Message

Stefano Stabellini Feb. 17, 2021, 2 a.m. UTC
Hi all,

Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
translate addresses for DMA operations in Dom0. Specifically,
swiotlb-xen is used to translate the address of a foreign page (a page
belonging to a domU) mapped into Dom0 before using it for DMA.

This is important because although Dom0 is 1:1 mapped, DomUs are not. On
systems without an IOMMU swiotlb-xen enables PV drivers to work as long
as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
ends up using the MFN, rather than the GFN.


On systems with an IOMMU, this is not necessary: when a foreign page is
mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
address (instead of the MFN) for DMA operations and they would work. It
would be more efficient than using swiotlb-xen.

If you recall my presentation from Xen Summit 2020, Xilinx is working on
cache coloring. With cache coloring, no domain is 1:1 mapped, not even
Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
work as intended.


The suggested solution for both these issues is to add a new feature
flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
swiotlb-xen because IOMMU translations are available for Dom0. If
XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
initialization. I have tested this scheme with and without cache
coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
works as expected: DMA operations succeed.


What about systems where an IOMMU is present but not all devices are
protected?

There is no way for Xen to know which devices are protected and which
ones are not: devices that do not have the "iommus" property could or
could not be DMA masters.

Perhaps Xen could populate a whitelist of devices protected by the IOMMU
based on the "iommus" property. It would require some added complexity
in Xen and especially in the swiotlb-xen driver in Linux to use it,
which is not ideal. However, this approach would not work for cache
coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
used either way.


For these reasons, I would like to propose a single flag
XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
DMA translations. In situations where a DMA master is not SMMU
protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
platform where an IOMMU is present and protects most DMA masters but it
is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
not be set (because PV block is not going to work without swiotlb-xen.)
This also means that cache coloring won't be usable on such a system (at
least not usable with the MMC controller so the system integrator should
pay special care to setup the system).

It is worth noting that if we wanted to extend the interface to add a
list of protected devices in the future, it would still be possible. It
would be compatible with XENFEAT_ARM_dom0_iommu.


How to set XENFEAT_ARM_dom0_iommu?

We could set XENFEAT_ARM_dom0_iommu automatically when
is_iommu_enabled(d) for Dom0. We could also have a platform specific
(xen/arch/arm/platforms/) override so that a specific platform can
disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
users, it would also be useful to be able to override it via a Xen
command line parameter.

See appended patch as a reference.


Cheers,

Stefano

Comments

Julien Grall Feb. 17, 2021, 8:50 a.m. UTC | #1
On 17/02/2021 02:00, Stefano Stabellini wrote:
> Hi all,
> 
> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
> translate addresses for DMA operations in Dom0. Specifically,
> swiotlb-xen is used to translate the address of a foreign page (a page
> belonging to a domU) mapped into Dom0 before using it for DMA.
> 
> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
> ends up using the MFN, rather than the GFN.
> 
> 
> On systems with an IOMMU, this is not necessary: when a foreign page is
> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
> address (instead of the MFN) for DMA operations and they would work. It
> would be more efficient than using swiotlb-xen.
> 
> If you recall my presentation from Xen Summit 2020, Xilinx is working on
> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
> work as intended.
> 
> 
> The suggested solution for both these issues is to add a new feature
> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
> swiotlb-xen because IOMMU translations are available for Dom0. If
> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
> initialization. I have tested this scheme with and without cache
> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
> works as expected: DMA operations succeed.
> 
> 
> What about systems where an IOMMU is present but not all devices are
> protected?
> 
> There is no way for Xen to know which devices are protected and which
> ones are not: devices that do not have the "iommus" property could or
> could not be DMA masters.
> 
> Perhaps Xen could populate a whitelist of devices protected by the IOMMU
> based on the "iommus" property. It would require some added complexity
> in Xen and especially in the swiotlb-xen driver in Linux to use it,
> which is not ideal.

You are trading a bit more complexity in Xen and Linux with a user will 
may not be able to use the hypervisor on his/her platform without a 
quirk in Xen (see more below).

> However, this approach would not work for cache
> coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
> used either way

Not all the Dom0 Linux kernel will be able to work with cache colouring. 
So you will need a way for the kernel to say "Hey I am can avoid using 
swiotlb".

> 
> For these reasons, I would like to propose a single flag
> XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
> DMA translations. In situations where a DMA master is not SMMU
> protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
> platform where an IOMMU is present and protects most DMA masters but it
> is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
> not be set (because PV block is not going to work without swiotlb-xen.)
> This also means that cache coloring won't be usable on such a system (at
> least not usable with the MMC controller so the system integrator should
> pay special care to setup the system).
> 
> It is worth noting that if we wanted to extend the interface to add a
> list of protected devices in the future, it would still be possible. It
> would be compatible with XENFEAT_ARM_dom0_iommu.

I imagine by compatible, you mean XENFEAT_ARM_dom0_iommu would be 
cleared and instead the device-tree list would be used. Is that correct?

> 
> 
> How to set XENFEAT_ARM_dom0_iommu?
> 
> We could set XENFEAT_ARM_dom0_iommu automatically when
> is_iommu_enabled(d) for Dom0. We could also have a platform specific
> (xen/arch/arm/platforms/) override so that a specific platform can
> disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
> users, it would also be useful to be able to override it via a Xen
> command line parameter.
Platform quirks should be are limited to a small set of platforms.

In this case, this would not be only per-platform but also per-firmware 
table as a developer can decide to remove/add IOMMU nodes in the DT at 
any time.

In addition to that, it means we are introducing a regression for those 
users as Xen 4.14 would have worked on there platform but not anymore. 
They would need to go through all the nodes and find out which one is 
not protected.

This is a bit of a daunting task and we are going to end up having a lot 
of per-platform quirk in Xen.

So this approach selecting the flag is a no-go for me. FAOD, the 
inverted idea (i.e. only setting XENFEAT_ARM_dom0_iommu per-platform) is 
a no go as well.

I don't have a good idea how to set the flag automatically. My 
requirement is newer Xen should continue to work on all supported 
platforms without any additional per-platform effort.

Cheers,
Bertrand Marquis Feb. 17, 2021, 3:34 p.m. UTC | #2
Hi Stefano,

> On 17 Feb 2021, at 02:00, Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> Hi all,
> 
> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
> translate addresses for DMA operations in Dom0. Specifically,
> swiotlb-xen is used to translate the address of a foreign page (a page
> belonging to a domU) mapped into Dom0 before using it for DMA.
> 
> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
> ends up using the MFN, rather than the GFN.
> 
> 
> On systems with an IOMMU, this is not necessary: when a foreign page is
> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
> address (instead of the MFN) for DMA operations and they would work. It
> would be more efficient than using swiotlb-xen.
> 
> If you recall my presentation from Xen Summit 2020, Xilinx is working on
> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
> work as intended.
> 
> 
> The suggested solution for both these issues is to add a new feature
> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
> swiotlb-xen because IOMMU translations are available for Dom0. If
> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
> initialization. I have tested this scheme with and without cache
> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
> works as expected: DMA operations succeed.

Wouldn’t it be easier to name the flag XENFEAT_ARM_swiotlb_needed ?

> 
> 
> What about systems where an IOMMU is present but not all devices are
> protected?
> 
> There is no way for Xen to know which devices are protected and which
> ones are not: devices that do not have the "iommus" property could or
> could not be DMA masters.
> 
> Perhaps Xen could populate a whitelist of devices protected by the IOMMU
> based on the "iommus" property. It would require some added complexity
> in Xen and especially in the swiotlb-xen driver in Linux to use it,
> which is not ideal. However, this approach would not work for cache
> coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
> used either way.

Would it be realistic to say that cache coloring cannot be used without an IOMMU ?

Having a flag for the swiotlb is a good idea because in the current status the switch
to use or not swiotlb is more or less implicit. 
But somehow there are use cases where we should simply say “not supported” as
we do for example for passthrough right now. Maybe cache coloring is a case like
that.

> 
> 
> For these reasons, I would like to propose a single flag
> XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
> DMA translations. In situations where a DMA master is not SMMU
> protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
> platform where an IOMMU is present and protects most DMA masters but it
> is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
> not be set (because PV block is not going to work without swiotlb-xen.)
> This also means that cache coloring won't be usable on such a system (at
> least not usable with the MMC controller so the system integrator should
> pay special care to setup the system).

Any system where you have an IOMMU but not covering all devices is
almost impossible to magically handle smoothly and will require some
carefull configuration. Sadly as you stated, we do not have a way to
auto-detect such a case.

> 
> It is worth noting that if we wanted to extend the interface to add a
> list of protected devices in the future, it would still be possible. It
> would be compatible with XENFEAT_ARM_dom0_iommu.
> 
> 
> How to set XENFEAT_ARM_dom0_iommu?
> 
> We could set XENFEAT_ARM_dom0_iommu automatically when
> is_iommu_enabled(d) for Dom0. We could also have a platform specific
> (xen/arch/arm/platforms/) override so that a specific platform can
> disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
> users, it would also be useful to be able to override it via a Xen
> command line parameter.
> 
> See appended patch as a reference.

I really think the naming of the flag will need to be modified.

Cheers
Bertrand

> 
> 
> Cheers,
> 
> Stefano
> 
> 
> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
> index 7a345ae45e..4dbef48199 100644
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -16,6 +16,7 @@
> #include <xen/hypfs.h>
> #include <xsm/xsm.h>
> #include <asm/current.h>
> +#include <asm/platform.h>
> #include <public/version.h>
> 
> #ifndef COMPAT
> @@ -549,6 +550,9 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>                 fi.submap |= 1U << XENFEAT_dom0;
> #ifdef CONFIG_ARM
>             fi.submap |= (1U << XENFEAT_ARM_SMCCC_supported);
> +            if ( !platform_has_quirk(PLATFORM_QUIRK_DOM0_IOMMU) &&
> +                 is_hardware_domain(d) && is_iommu_enabled(d) )
> +                fi.submap |= (1U << XENFEAT_ARM_dom0_iommu);
> #endif
> #ifdef CONFIG_X86
>             if ( is_pv_domain(d) )
> diff --git a/xen/include/asm-arm/platform.h b/xen/include/asm-arm/platform.h
> index 997eb25216..094a76d677 100644
> --- a/xen/include/asm-arm/platform.h
> +++ b/xen/include/asm-arm/platform.h
> @@ -48,6 +48,11 @@ struct platform_desc {
>  * stride.
>  */
> #define PLATFORM_QUIRK_GIC_64K_STRIDE (1 << 0)
> +/*
> + * Quirk for platforms where the IOMMU is present but doesn't protect
> + * all DMA-capable devices.
> + */
> +#define PLATFORM_QUIRK_DOM0_IOMMU (1 << 1)
> 
> void platform_init(void);
> int platform_init_time(void);
> diff --git a/xen/include/asm-x86/platform.h b/xen/include/asm-x86/platform.h
> new file mode 100644
> index 0000000000..5427e8b851
> --- /dev/null
> +++ b/xen/include/asm-x86/platform.h
> @@ -0,0 +1,13 @@
> +#ifndef __ASM_X86_PLATFORM_H
> +#define __ASM_X86_PLATFORM_H
> +
> +#endif /* __ASM_X86_PLATFORM_H */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/public/features.h b/xen/include/public/features.h
> index 1613b2aab8..adaa2a995d 100644
> --- a/xen/include/public/features.h
> +++ b/xen/include/public/features.h
> @@ -114,6 +114,11 @@
>  */
> #define XENFEAT_linux_rsdp_unrestricted   15
> 
> +/*
> + * arm: dom0 is started with IOMMU protection.
> + */
> +#define XENFEAT_ARM_dom0_iommu            16
> +
> #define XENFEAT_NR_SUBMAPS 1
> 
> #endif /* __XEN_PUBLIC_FEATURES_H__ */
Bertrand Marquis Feb. 17, 2021, 3:37 p.m. UTC | #3
Hi Julien,

> On 17 Feb 2021, at 08:50, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 17/02/2021 02:00, Stefano Stabellini wrote:
>> Hi all,
>> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
>> translate addresses for DMA operations in Dom0. Specifically,
>> swiotlb-xen is used to translate the address of a foreign page (a page
>> belonging to a domU) mapped into Dom0 before using it for DMA.
>> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
>> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
>> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
>> ends up using the MFN, rather than the GFN.
>> On systems with an IOMMU, this is not necessary: when a foreign page is
>> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
>> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
>> address (instead of the MFN) for DMA operations and they would work. It
>> would be more efficient than using swiotlb-xen.
>> If you recall my presentation from Xen Summit 2020, Xilinx is working on
>> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
>> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
>> work as intended.
>> The suggested solution for both these issues is to add a new feature
>> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
>> swiotlb-xen because IOMMU translations are available for Dom0. If
>> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
>> initialization. I have tested this scheme with and without cache
>> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
>> works as expected: DMA operations succeed.
>> What about systems where an IOMMU is present but not all devices are
>> protected?
>> There is no way for Xen to know which devices are protected and which
>> ones are not: devices that do not have the "iommus" property could or
>> could not be DMA masters.
>> Perhaps Xen could populate a whitelist of devices protected by the IOMMU
>> based on the "iommus" property. It would require some added complexity
>> in Xen and especially in the swiotlb-xen driver in Linux to use it,
>> which is not ideal.
> 
> You are trading a bit more complexity in Xen and Linux with a user will may not be able to use the hypervisor on his/her platform without a quirk in Xen (see more below).
> 
>> However, this approach would not work for cache
>> coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
>> used either way
> 
> Not all the Dom0 Linux kernel will be able to work with cache colouring. So you will need a way for the kernel to say "Hey I am can avoid using swiotlb".

I fully agree and from my current understanding the condition is “having an iommu”.

> 
>> For these reasons, I would like to propose a single flag
>> XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
>> DMA translations. In situations where a DMA master is not SMMU
>> protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
>> platform where an IOMMU is present and protects most DMA masters but it
>> is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
>> not be set (because PV block is not going to work without swiotlb-xen.)
>> This also means that cache coloring won't be usable on such a system (at
>> least not usable with the MMC controller so the system integrator should
>> pay special care to setup the system).
>> It is worth noting that if we wanted to extend the interface to add a
>> list of protected devices in the future, it would still be possible. It
>> would be compatible with XENFEAT_ARM_dom0_iommu.
> 
> I imagine by compatible, you mean XENFEAT_ARM_dom0_iommu would be cleared and instead the device-tree list would be used. Is that correct?

What do you mean by device tree list here ?

> 
>> How to set XENFEAT_ARM_dom0_iommu?
>> We could set XENFEAT_ARM_dom0_iommu automatically when
>> is_iommu_enabled(d) for Dom0. We could also have a platform specific
>> (xen/arch/arm/platforms/) override so that a specific platform can
>> disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
>> users, it would also be useful to be able to override it via a Xen
>> command line parameter.
> Platform quirks should be are limited to a small set of platforms.
> 
> In this case, this would not be only per-platform but also per-firmware table as a developer can decide to remove/add IOMMU nodes in the DT at any time.
> 
> In addition to that, it means we are introducing a regression for those users as Xen 4.14 would have worked on there platform but not anymore. They would need to go through all the nodes and find out which one is not protected.

I am not sure i understand your point here as we cannot detect if a device is protected or not by an IOMMU as we do not know which device requires one.
Could you explain what use case working before would not work here ?

> 
> This is a bit of a daunting task and we are going to end up having a lot of per-platform quirk in Xen.

From my understanding the quirks here would be in Linux as it would have to decide for what to use swiotlb or not.
What quirk do you imagine we could implement in Xen ?

Cheers
Bertrand

> 
> So this approach selecting the flag is a no-go for me. FAOD, the inverted idea (i.e. only setting XENFEAT_ARM_dom0_iommu per-platform) is a no go as well.
> 
> I don't have a good idea how to set the flag automatically. My requirement is newer Xen should continue to work on all supported platforms without any additional per-platform effort.
> 
> Cheers,
> 
> -- 
> Julien Grall
Julien Grall Feb. 17, 2021, 4:41 p.m. UTC | #4
On 17/02/2021 15:37, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 17 Feb 2021, at 08:50, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 17/02/2021 02:00, Stefano Stabellini wrote:
>>> Hi all,
>>> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
>>> translate addresses for DMA operations in Dom0. Specifically,
>>> swiotlb-xen is used to translate the address of a foreign page (a page
>>> belonging to a domU) mapped into Dom0 before using it for DMA.
>>> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
>>> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
>>> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
>>> ends up using the MFN, rather than the GFN.
>>> On systems with an IOMMU, this is not necessary: when a foreign page is
>>> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
>>> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
>>> address (instead of the MFN) for DMA operations and they would work. It
>>> would be more efficient than using swiotlb-xen.
>>> If you recall my presentation from Xen Summit 2020, Xilinx is working on
>>> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
>>> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
>>> work as intended.
>>> The suggested solution for both these issues is to add a new feature
>>> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
>>> swiotlb-xen because IOMMU translations are available for Dom0. If
>>> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
>>> initialization. I have tested this scheme with and without cache
>>> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
>>> works as expected: DMA operations succeed.
>>> What about systems where an IOMMU is present but not all devices are
>>> protected?
>>> There is no way for Xen to know which devices are protected and which
>>> ones are not: devices that do not have the "iommus" property could or
>>> could not be DMA masters.
>>> Perhaps Xen could populate a whitelist of devices protected by the IOMMU
>>> based on the "iommus" property. It would require some added complexity
>>> in Xen and especially in the swiotlb-xen driver in Linux to use it,
>>> which is not ideal.
>>
>> You are trading a bit more complexity in Xen and Linux with a user will may not be able to use the hypervisor on his/her platform without a quirk in Xen (see more below).
>>
>>> However, this approach would not work for cache
>>> coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
>>> used either way
>>
>> Not all the Dom0 Linux kernel will be able to work with cache colouring. So you will need a way for the kernel to say "Hey I am can avoid using swiotlb".
> 
> I fully agree and from my current understanding the condition is “having an iommu”.
> 
>>
>>> For these reasons, I would like to propose a single flag
>>> XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
>>> DMA translations. In situations where a DMA master is not SMMU
>>> protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
>>> platform where an IOMMU is present and protects most DMA masters but it
>>> is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
>>> not be set (because PV block is not going to work without swiotlb-xen.)
>>> This also means that cache coloring won't be usable on such a system (at
>>> least not usable with the MMC controller so the system integrator should
>>> pay special care to setup the system).
>>> It is worth noting that if we wanted to extend the interface to add a
>>> list of protected devices in the future, it would still be possible. It
>>> would be compatible with XENFEAT_ARM_dom0_iommu.
>>
>> I imagine by compatible, you mean XENFEAT_ARM_dom0_iommu would be cleared and instead the device-tree list would be used. Is that correct?
> 
> What do you mean by device tree list here ?

Sorry I meant "device list". I was referring to Stefano's suggestion to 
describe the list of devices protected in the device-tree.

> 
>>
>>> How to set XENFEAT_ARM_dom0_iommu?
>>> We could set XENFEAT_ARM_dom0_iommu automatically when
>>> is_iommu_enabled(d) for Dom0. We could also have a platform specific
>>> (xen/arch/arm/platforms/) override so that a specific platform can
>>> disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
>>> users, it would also be useful to be able to override it via a Xen
>>> command line parameter.
>> Platform quirks should be are limited to a small set of platforms.
>>
>> In this case, this would not be only per-platform but also per-firmware table as a developer can decide to remove/add IOMMU nodes in the DT at any time.
>>
>> In addition to that, it means we are introducing a regression for those users as Xen 4.14 would have worked on there platform but not anymore. They would need to go through all the nodes and find out which one is not protected.
> 
> I am not sure i understand your point here as we cannot detect if a device is protected or not by an IOMMU as we do not know which device requires one.

That's correct...

> Could you explain what use case working before would not work here ?

 From Stefano's e-mail, Xen would expose XENFEAT_ARM_dom0_iommu if all 
the devices are protected by the IOMMU.

This implies that Xen is aware whether ever DMA-capable devices are 
protected. As you rightfully pointed out this cannot work.

>>
>> This is a bit of a daunting task and we are going to end up having a lot of per-platform quirk in Xen.
> 
>  From my understanding the quirks here would be in Linux as it would have to decide for what to use swiotlb or not.

This is not how I understood Stefano's e-mail. But even if it is 
happening in Linux, then we need a way to tell Linux which devices have 
been protected by Xen.

> What quirk do you imagine we could implement in Xen ?

Me? None. That Stefano's idea and I don't think it can work.

Cheers,
Bertrand Marquis Feb. 17, 2021, 4:47 p.m. UTC | #5
Hi Julien,

> On 17 Feb 2021, at 16:41, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 17/02/2021 15:37, Bertrand Marquis wrote:
>> Hi Julien,
> 
> Hi Bertrand,
> 
>>> On 17 Feb 2021, at 08:50, Julien Grall <julien@xen.org> wrote:
>>> 
>>> 
>>> 
>>> On 17/02/2021 02:00, Stefano Stabellini wrote:
>>>> Hi all,
>>>> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
>>>> translate addresses for DMA operations in Dom0. Specifically,
>>>> swiotlb-xen is used to translate the address of a foreign page (a page
>>>> belonging to a domU) mapped into Dom0 before using it for DMA.
>>>> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
>>>> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
>>>> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
>>>> ends up using the MFN, rather than the GFN.
>>>> On systems with an IOMMU, this is not necessary: when a foreign page is
>>>> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
>>>> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
>>>> address (instead of the MFN) for DMA operations and they would work. It
>>>> would be more efficient than using swiotlb-xen.
>>>> If you recall my presentation from Xen Summit 2020, Xilinx is working on
>>>> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
>>>> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
>>>> work as intended.
>>>> The suggested solution for both these issues is to add a new feature
>>>> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
>>>> swiotlb-xen because IOMMU translations are available for Dom0. If
>>>> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
>>>> initialization. I have tested this scheme with and without cache
>>>> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
>>>> works as expected: DMA operations succeed.
>>>> What about systems where an IOMMU is present but not all devices are
>>>> protected?
>>>> There is no way for Xen to know which devices are protected and which
>>>> ones are not: devices that do not have the "iommus" property could or
>>>> could not be DMA masters.
>>>> Perhaps Xen could populate a whitelist of devices protected by the IOMMU
>>>> based on the "iommus" property. It would require some added complexity
>>>> in Xen and especially in the swiotlb-xen driver in Linux to use it,
>>>> which is not ideal.
>>> 
>>> You are trading a bit more complexity in Xen and Linux with a user will may not be able to use the hypervisor on his/her platform without a quirk in Xen (see more below).
>>> 
>>>> However, this approach would not work for cache
>>>> coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
>>>> used either way
>>> 
>>> Not all the Dom0 Linux kernel will be able to work with cache colouring. So you will need a way for the kernel to say "Hey I am can avoid using swiotlb".
>> I fully agree and from my current understanding the condition is “having an iommu”.
>>> 
>>>> For these reasons, I would like to propose a single flag
>>>> XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
>>>> DMA translations. In situations where a DMA master is not SMMU
>>>> protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
>>>> platform where an IOMMU is present and protects most DMA masters but it
>>>> is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
>>>> not be set (because PV block is not going to work without swiotlb-xen.)
>>>> This also means that cache coloring won't be usable on such a system (at
>>>> least not usable with the MMC controller so the system integrator should
>>>> pay special care to setup the system).
>>>> It is worth noting that if we wanted to extend the interface to add a
>>>> list of protected devices in the future, it would still be possible. It
>>>> would be compatible with XENFEAT_ARM_dom0_iommu.
>>> 
>>> I imagine by compatible, you mean XENFEAT_ARM_dom0_iommu would be cleared and instead the device-tree list would be used. Is that correct?
>> What do you mean by device tree list here ?
> 
> Sorry I meant "device list". I was referring to Stefano's suggestion to describe the list of devices protected in the device-tree.

Ok you mean adding to the device tree some kind of device list for which swiotlb should be used (or maybe the opposite list in fact).

> 
>>> 
>>>> How to set XENFEAT_ARM_dom0_iommu?
>>>> We could set XENFEAT_ARM_dom0_iommu automatically when
>>>> is_iommu_enabled(d) for Dom0. We could also have a platform specific
>>>> (xen/arch/arm/platforms/) override so that a specific platform can
>>>> disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
>>>> users, it would also be useful to be able to override it via a Xen
>>>> command line parameter.
>>> Platform quirks should be are limited to a small set of platforms.
>>> 
>>> In this case, this would not be only per-platform but also per-firmware table as a developer can decide to remove/add IOMMU nodes in the DT at any time.
>>> 
>>> In addition to that, it means we are introducing a regression for those users as Xen 4.14 would have worked on there platform but not anymore. They would need to go through all the nodes and find out which one is not protected.
>> I am not sure i understand your point here as we cannot detect if a device is protected or not by an IOMMU as we do not know which device requires one.
> 
> That's correct...
> 
>> Could you explain what use case working before would not work here ?
> 
> From Stefano's e-mail, Xen would expose XENFEAT_ARM_dom0_iommu if all the devices are protected by the IOMMU.
> 
> This implies that Xen is aware whether ever DMA-capable devices are protected. As you rightfully pointed out this cannot work.

But this is also an issue right now.

> 
>>> 
>>> This is a bit of a daunting task and we are going to end up having a lot of per-platform quirk in Xen.
>> From my understanding the quirks here would be in Linux as it would have to decide for what to use swiotlb or not.
> 
> This is not how I understood Stefano's e-mail. But even if it is happening in Linux, then we need a way to tell Linux which devices have been protected by Xen.

So basically let some info in the devices to let linux that they are protected by an IOMMU, which would be replacing the iommu link node by something else.

> 
>> What quirk do you imagine we could implement in Xen ?
> 
> Me? None. That Stefano's idea and I don't think it can work.

Definitely there is a problem to solve here, maybe the how requires more discussion :-)

I see the same kind of problem incoming once we will have some guests using direct-map and some other not.
At the end there is a some kind of a matrix with swiotlb depending on direct-map and IOMMU present with some
very nasty combination if we try to add the use case of some devices only protected by an IOMMU.

Cheers
Bertrand 

> 
> Cheers,
> 
> -- 
> Julien Grall
Julien Grall Feb. 17, 2021, 4:57 p.m. UTC | #6
On 17/02/2021 16:47, Bertrand Marquis wrote:
> Hi Julien,
> 
>> On 17 Feb 2021, at 16:41, Julien Grall <julien@xen.org> wrote:
>>
>>
>>
>> On 17/02/2021 15:37, Bertrand Marquis wrote:
>>> Hi Julien,
>>
>> Hi Bertrand,
>>
>>>> On 17 Feb 2021, at 08:50, Julien Grall <julien@xen.org> wrote:
>>>>
>>>>
>>>>
>>>> On 17/02/2021 02:00, Stefano Stabellini wrote:
>>>>> Hi all,
>>>>> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
>>>>> translate addresses for DMA operations in Dom0. Specifically,
>>>>> swiotlb-xen is used to translate the address of a foreign page (a page
>>>>> belonging to a domU) mapped into Dom0 before using it for DMA.
>>>>> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
>>>>> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
>>>>> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
>>>>> ends up using the MFN, rather than the GFN.
>>>>> On systems with an IOMMU, this is not necessary: when a foreign page is
>>>>> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
>>>>> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
>>>>> address (instead of the MFN) for DMA operations and they would work. It
>>>>> would be more efficient than using swiotlb-xen.
>>>>> If you recall my presentation from Xen Summit 2020, Xilinx is working on
>>>>> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
>>>>> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
>>>>> work as intended.
>>>>> The suggested solution for both these issues is to add a new feature
>>>>> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
>>>>> swiotlb-xen because IOMMU translations are available for Dom0. If
>>>>> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
>>>>> initialization. I have tested this scheme with and without cache
>>>>> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
>>>>> works as expected: DMA operations succeed.
>>>>> What about systems where an IOMMU is present but not all devices are
>>>>> protected?
>>>>> There is no way for Xen to know which devices are protected and which
>>>>> ones are not: devices that do not have the "iommus" property could or
>>>>> could not be DMA masters.
>>>>> Perhaps Xen could populate a whitelist of devices protected by the IOMMU
>>>>> based on the "iommus" property. It would require some added complexity
>>>>> in Xen and especially in the swiotlb-xen driver in Linux to use it,
>>>>> which is not ideal.
>>>>
>>>> You are trading a bit more complexity in Xen and Linux with a user will may not be able to use the hypervisor on his/her platform without a quirk in Xen (see more below).
>>>>
>>>>> However, this approach would not work for cache
>>>>> coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
>>>>> used either way
>>>>
>>>> Not all the Dom0 Linux kernel will be able to work with cache colouring. So you will need a way for the kernel to say "Hey I am can avoid using swiotlb".
>>> I fully agree and from my current understanding the condition is “having an iommu”.
>>>>
>>>>> For these reasons, I would like to propose a single flag
>>>>> XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
>>>>> DMA translations. In situations where a DMA master is not SMMU
>>>>> protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
>>>>> platform where an IOMMU is present and protects most DMA masters but it
>>>>> is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
>>>>> not be set (because PV block is not going to work without swiotlb-xen.)
>>>>> This also means that cache coloring won't be usable on such a system (at
>>>>> least not usable with the MMC controller so the system integrator should
>>>>> pay special care to setup the system).
>>>>> It is worth noting that if we wanted to extend the interface to add a
>>>>> list of protected devices in the future, it would still be possible. It
>>>>> would be compatible with XENFEAT_ARM_dom0_iommu.
>>>>
>>>> I imagine by compatible, you mean XENFEAT_ARM_dom0_iommu would be cleared and instead the device-tree list would be used. Is that correct?
>>> What do you mean by device tree list here ?
>>
>> Sorry I meant "device list". I was referring to Stefano's suggestion to describe the list of devices protected in the device-tree.
> 
> Ok you mean adding to the device tree some kind of device list for which swiotlb should be used (or maybe the opposite list in fact).

I think the list of protected devices is better because we could create 
an Xen IOMMU node.

> 
>>
>>>>
>>>>> How to set XENFEAT_ARM_dom0_iommu?
>>>>> We could set XENFEAT_ARM_dom0_iommu automatically when
>>>>> is_iommu_enabled(d) for Dom0. We could also have a platform specific
>>>>> (xen/arch/arm/platforms/) override so that a specific platform can
>>>>> disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
>>>>> users, it would also be useful to be able to override it via a Xen
>>>>> command line parameter.
>>>> Platform quirks should be are limited to a small set of platforms.
>>>>
>>>> In this case, this would not be only per-platform but also per-firmware table as a developer can decide to remove/add IOMMU nodes in the DT at any time.
>>>>
>>>> In addition to that, it means we are introducing a regression for those users as Xen 4.14 would have worked on there platform but not anymore. They would need to go through all the nodes and find out which one is not protected.
>>> I am not sure i understand your point here as we cannot detect if a device is protected or not by an IOMMU as we do not know which device requires one.
>>
>> That's correct...
>>
>>> Could you explain what use case working before would not work here ?
>>
>>  From Stefano's e-mail, Xen would expose XENFEAT_ARM_dom0_iommu if all the devices are protected by the IOMMU.
>>
>> This implies that Xen is aware whether ever DMA-capable devices are protected. As you rightfully pointed out this cannot work.
> 
> But this is also an issue right now.

Aside the issue reported by Rahul, Linux will always use the host 
physical address which also contains a direct mapping in the P2M. This 
should work.

Why would mind to explain why it can't work today?

>>
>>>>
>>>> This is a bit of a daunting task and we are going to end up having a lot of per-platform quirk in Xen.
>>>  From my understanding the quirks here would be in Linux as it would have to decide for what to use swiotlb or not.
>>
>> This is not how I understood Stefano's e-mail. But even if it is happening in Linux, then we need a way to tell Linux which devices have been protected by Xen.
> 
> So basically let some info in the devices to let linux that they are protected by an IOMMU, which would be replacing the iommu link node by something else.

Correct.

> 
>>
>>> What quirk do you imagine we could implement in Xen ?
>>
>> Me? None. That Stefano's idea and I don't think it can work.
> 
> Definitely there is a problem to solve here, maybe the how requires more discussion :-)
> 
> I see the same kind of problem incoming once we will have some guests using direct-map and some other not.
> At the end there is a some kind of a matrix with swiotlb depending on direct-map and IOMMU present with some
> very nasty combination if we try to add the use case of some devices only protected by an IOMMU.

Once you tell Linux which device is protected then it is easy to handle 
it because it is possible to set per-device DMA ops.

Unprotected devices would have to use the swiotlb DMA ops.

What's more complicated is to fully disable the IOMMU mapping added by 
Xen. Linux would need a way (possibly via the VM_assist hypercall) to 
indicate the mapping is not necesary.

Cheers,
Jan Beulich Feb. 17, 2021, 5:03 p.m. UTC | #7
On 17.02.2021 16:34, Bertrand Marquis wrote:
>> On 17 Feb 2021, at 02:00, Stefano Stabellini <sstabellini@kernel.org> wrote:
>> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
>> translate addresses for DMA operations in Dom0. Specifically,
>> swiotlb-xen is used to translate the address of a foreign page (a page
>> belonging to a domU) mapped into Dom0 before using it for DMA.
>>
>> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
>> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
>> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
>> ends up using the MFN, rather than the GFN.
>>
>>
>> On systems with an IOMMU, this is not necessary: when a foreign page is
>> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
>> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
>> address (instead of the MFN) for DMA operations and they would work. It
>> would be more efficient than using swiotlb-xen.
>>
>> If you recall my presentation from Xen Summit 2020, Xilinx is working on
>> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
>> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
>> work as intended.
>>
>>
>> The suggested solution for both these issues is to add a new feature
>> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
>> swiotlb-xen because IOMMU translations are available for Dom0. If
>> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
>> initialization. I have tested this scheme with and without cache
>> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
>> works as expected: DMA operations succeed.
> 
> Wouldn’t it be easier to name the flag XENFEAT_ARM_swiotlb_needed ?

Except that "swiotlb" is Linux terminology, which I don't think a
Xen feature should be named after. Names should be generic, except
maybe when they're really targeting exactly one kind of guest
(which imo would better never be the case).

Jan
Stefano Stabellini Feb. 17, 2021, 11:54 p.m. UTC | #8
On Wed, 17 Feb 2021, Julien Grall wrote:
> On 17/02/2021 02:00, Stefano Stabellini wrote:
> > Hi all,
> > 
> > Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
> > translate addresses for DMA operations in Dom0. Specifically,
> > swiotlb-xen is used to translate the address of a foreign page (a page
> > belonging to a domU) mapped into Dom0 before using it for DMA.
> > 
> > This is important because although Dom0 is 1:1 mapped, DomUs are not. On
> > systems without an IOMMU swiotlb-xen enables PV drivers to work as long
> > as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
> > ends up using the MFN, rather than the GFN.
> > 
> > 
> > On systems with an IOMMU, this is not necessary: when a foreign page is
> > mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
> > is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
> > address (instead of the MFN) for DMA operations and they would work. It
> > would be more efficient than using swiotlb-xen.
> > 
> > If you recall my presentation from Xen Summit 2020, Xilinx is working on
> > cache coloring. With cache coloring, no domain is 1:1 mapped, not even
> > Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
> > work as intended.
> > 
> > 
> > The suggested solution for both these issues is to add a new feature
> > flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
> > swiotlb-xen because IOMMU translations are available for Dom0. If
> > XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
> > initialization. I have tested this scheme with and without cache
> > coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
> > works as expected: DMA operations succeed.
> > 
> > 
> > What about systems where an IOMMU is present but not all devices are
> > protected?
> > 
> > There is no way for Xen to know which devices are protected and which
> > ones are not: devices that do not have the "iommus" property could or
> > could not be DMA masters.
> > 
> > Perhaps Xen could populate a whitelist of devices protected by the IOMMU
> > based on the "iommus" property. It would require some added complexity
> > in Xen and especially in the swiotlb-xen driver in Linux to use it,
> > which is not ideal.
> 
> You are trading a bit more complexity in Xen and Linux with a user will may
> not be able to use the hypervisor on his/her platform without a quirk in Xen
> (see more below).
> 
> > However, this approach would not work for cache
> > coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
> > used either way
> 
> Not all the Dom0 Linux kernel will be able to work with cache colouring. So
> you will need a way for the kernel to say "Hey I am can avoid using swiotlb".

A dom0 Linux kernel unmodified can work with Xen cache coloring enabled.
The swiotlb-xen is unneeded and also all the pfn_valid() checks to detect
if a page is local or foreign don't work as intended. However, it still
works on systems where the IOMMU can be relied upon. That's because the
IOMMU does the GFN->MFN translations, and also because both swiotlb-xen
code paths (cache flush local and cache flush via hypercall) work.

Also considering that somebody that enables cache coloring has to know
the entire system well, I don't think we need a runtime flag from Linux
to say "Hey I can avoid using swiotlb".

That said, I think it is important to fix Linux because the code might
work today, but it is more by accident than by design.


> > For these reasons, I would like to propose a single flag
> > XENFEAT_ARM_dom0_iommu which says that the IOMMU can be relied upon for
> > DMA translations. In situations where a DMA master is not SMMU
> > protected, XENFEAT_ARM_dom0_iommu should not be set. For example, on a
> > platform where an IOMMU is present and protects most DMA masters but it
> > is leaving out the MMC controller, then XENFEAT_ARM_dom0_iommu should
> > not be set (because PV block is not going to work without swiotlb-xen.)
> > This also means that cache coloring won't be usable on such a system (at
> > least not usable with the MMC controller so the system integrator should
> > pay special care to setup the system).
> > 
> > It is worth noting that if we wanted to extend the interface to add a
> > list of protected devices in the future, it would still be possible. It
> > would be compatible with XENFEAT_ARM_dom0_iommu.
> 
> I imagine by compatible, you mean XENFEAT_ARM_dom0_iommu would be cleared and
> instead the device-tree list would be used. Is that correct?

If XENFEAT_ARM_dom0_iommu is set, the device list is redundant.
If XENFEAT_ARM_dom0_iommu is not set, the device list is useful.

The device list and XENFEAT_ARM_dom0_iommu serve different purposes. The
device list is meant to skip the swiotlb-xen for some devices.

XENFEAT_ARM_dom0_iommu is meant to skip the swiotlb-xen for all devices,
which is especially useful when dom0 is not 1:1 mapped, because in that
case swiotlb-xen is useless and wrong.

Thinking more about this, it is clear that swiotlb-xen should not be
considered when cache coloring is enabled. It can't help by design. The
device list and the flag are really for different use-cases. And the
cache coloring use-case is better served by a XENFEAT_direct_mapped and
XENFEAT_not_direct_mapped flags. More on this below.


> > How to set XENFEAT_ARM_dom0_iommu?
> > 
> > We could set XENFEAT_ARM_dom0_iommu automatically when
> > is_iommu_enabled(d) for Dom0. We could also have a platform specific
> > (xen/arch/arm/platforms/) override so that a specific platform can
> > disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
> > users, it would also be useful to be able to override it via a Xen
> > command line parameter.
> Platform quirks should be are limited to a small set of platforms.
> 
> In this case, this would not be only per-platform but also per-firmware table
> as a developer can decide to remove/add IOMMU nodes in the DT at any time.
> 
> In addition to that, it means we are introducing a regression for those users
> as Xen 4.14 would have worked on there platform but not anymore. They would
> need to go through all the nodes and find out which one is not protected.
> 
> This is a bit of a daunting task and we are going to end up having a lot of
> per-platform quirk in Xen.
> 
> So this approach selecting the flag is a no-go for me. FAOD, the inverted idea
> (i.e. only setting XENFEAT_ARM_dom0_iommu per-platform) is a no go as well.
>
> I don't have a good idea how to set the flag automatically. My
> requirement is newer Xen should continue to work on all supported
> platforms without any additional per-platform effort.

Absolutely agreed.


One option would be to rename the flag to XENFEAT_ARM_cache_coloring and
only set it when cache coloring is enabled.  Obviously you need to know
what you are doing if you are enabling cache coloring. If we go down
that route, we don't risk breaking compatibility with any platforms.
Given that cache coloring is not upstream yet (upstreaming should start
very soon), maybe the only thing to do now would be to reserve a XENFEAT
number.

But actually it was always wrong for Linux to enable swiotlb-xen without
checking whether it is 1:1 mapped or not. Today we enable swiotlb-xen in
dom0 and disable it in domU, while we should have enabled swiotlb-xen if
1:1 mapped no matter dom0/domU. (swiotlb-xen could be useful in a 1:1
mapped domU driver domain.)


There is an argument (Andy was making on IRC) that being 1:1 mapped or
not is an important information that Xen should provide to the domain
regardless of anything else.

So maybe we should add two flags:

- XENFEAT_direct_mapped
- XENFEAT_not_direct_mapped

To all domains. This is not even ARM specific. Today dom0 would get
XENFEAT_direct_mapped and domUs XENFEAT_not_direct_mapped. With cache
coloring all domains will get XENFEAT_not_direct_mapped. With Bertrand's
team work on 1:1 mapping domUs, some domUs might start to get
XENFEAT_direct_mapped also one day soon.

Now I think this is the best option because it is descriptive, doesn't
imply anything about what Linux should or should not do, and doesn't
depend on unreliable IOMMU information.



Instead, if we follow my original proposal of using
XENFEAT_ARM_dom0_iommu and set it automatically when Dom0 is protected
by IOMMU, we risk breaking PV drivers for platforms where that protection
is incomplete. I have no idea how many there are out there today. I have
the feeling that there aren't that many but I am not sure. So yes, it
could be that we start passing XENFEAT_ARM_dom0_iommu for a given
platform, Linux skips the swiotlb-xen initialization, actually it is
needed for a network/block device, and a PV driver breaks. I can see why
you say this is a no-go.


Third option. We still use XENFEAT_ARM_dom0_iommu but we never set
XENFEAT_ARM_dom0_iommu automatically. It needs a platform specific flag
to be set. We add the flag to xen/arch/arm/platforms/xilinx-zynqmp.c and
any other platforms that qualify. Basically it is "opt in" instead of
"opt out". We don't risk breaking anything because platforms would have
XENFEAT_ARM_dom0_iommu disabled by default. Still, I think the
XENFEAT_not/direct_mapped route is much cleaner and simpler.



I saw that the topic has generated quite a bit of discussion. I suggest
we keep gathering data and do brainstorming on the thread for a few days
and in the meantime schedule a call for late next week to figure out a
way forward?
Julien Grall Feb. 18, 2021, 9:57 a.m. UTC | #9
Hi Stefano,

On 17/02/2021 23:54, Stefano Stabellini wrote:
> On Wed, 17 Feb 2021, Julien Grall wrote:
>> On 17/02/2021 02:00, Stefano Stabellini wrote:
>>> Hi all,
>>>
>>> Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
>>> translate addresses for DMA operations in Dom0. Specifically,
>>> swiotlb-xen is used to translate the address of a foreign page (a page
>>> belonging to a domU) mapped into Dom0 before using it for DMA.
>>>
>>> This is important because although Dom0 is 1:1 mapped, DomUs are not. On
>>> systems without an IOMMU swiotlb-xen enables PV drivers to work as long
>>> as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
>>> ends up using the MFN, rather than the GFN.
>>>
>>>
>>> On systems with an IOMMU, this is not necessary: when a foreign page is
>>> mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
>>> is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
>>> address (instead of the MFN) for DMA operations and they would work. It
>>> would be more efficient than using swiotlb-xen.
>>>
>>> If you recall my presentation from Xen Summit 2020, Xilinx is working on
>>> cache coloring. With cache coloring, no domain is 1:1 mapped, not even
>>> Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
>>> work as intended.
>>>
>>>
>>> The suggested solution for both these issues is to add a new feature
>>> flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
>>> swiotlb-xen because IOMMU translations are available for Dom0. If
>>> XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
>>> initialization. I have tested this scheme with and without cache
>>> coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
>>> works as expected: DMA operations succeed.
>>>
>>>
>>> What about systems where an IOMMU is present but not all devices are
>>> protected?
>>>
>>> There is no way for Xen to know which devices are protected and which
>>> ones are not: devices that do not have the "iommus" property could or
>>> could not be DMA masters.
>>>
>>> Perhaps Xen could populate a whitelist of devices protected by the IOMMU
>>> based on the "iommus" property. It would require some added complexity
>>> in Xen and especially in the swiotlb-xen driver in Linux to use it,
>>> which is not ideal.
>>
>> You are trading a bit more complexity in Xen and Linux with a user will may
>> not be able to use the hypervisor on his/her platform without a quirk in Xen
>> (see more below).
>>
>>> However, this approach would not work for cache
>>> coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
>>> used either way
>>
>> Not all the Dom0 Linux kernel will be able to work with cache colouring. So
>> you will need a way for the kernel to say "Hey I am can avoid using swiotlb".
> 
> A dom0 Linux kernel unmodified can work with Xen cache coloring enabled.

Really? I was expecting Linux to configure the DMA transaction to use 
the MFN for foreign pages. So a mapping GFN == MFN would be necessary.

> The swiotlb-xen is unneeded and also all the pfn_valid() checks to detect
> if a page is local or foreign don't work as intended. 

I am not sure to understand this. Are you saying that Linux will 
"mistakenly" consider the foreign page as local? Therefore, it would 
always use the GFN rather than MFN?

> However, it still
> works on systems where the IOMMU can be relied upon. That's because the
> IOMMU does the GFN->MFN translations, and also because both swiotlb-xen
> code paths (cache flush local and cache flush via hypercall) work.
> 
> Also considering that somebody that enables cache coloring has to know
> the entire system well, I don't think we need a runtime flag from Linux
> to say "Hey I can avoid using swiotlb".

I think we should avoid to hardcode any decision again. Otherwise, 
sooner or later this will come back to bite us.

[...]

>>> How to set XENFEAT_ARM_dom0_iommu?
>>>
>>> We could set XENFEAT_ARM_dom0_iommu automatically when
>>> is_iommu_enabled(d) for Dom0. We could also have a platform specific
>>> (xen/arch/arm/platforms/) override so that a specific platform can
>>> disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
>>> users, it would also be useful to be able to override it via a Xen
>>> command line parameter.
>> Platform quirks should be are limited to a small set of platforms.
>>
>> In this case, this would not be only per-platform but also per-firmware table
>> as a developer can decide to remove/add IOMMU nodes in the DT at any time.
>>
>> In addition to that, it means we are introducing a regression for those users
>> as Xen 4.14 would have worked on there platform but not anymore. They would
>> need to go through all the nodes and find out which one is not protected.
>>
>> This is a bit of a daunting task and we are going to end up having a lot of
>> per-platform quirk in Xen.
>>
>> So this approach selecting the flag is a no-go for me. FAOD, the inverted idea
>> (i.e. only setting XENFEAT_ARM_dom0_iommu per-platform) is a no go as well.
>>
>> I don't have a good idea how to set the flag automatically. My
>> requirement is newer Xen should continue to work on all supported
>> platforms without any additional per-platform effort.
> 
> Absolutely agreed.
> 
> 
> One option would be to rename the flag to XENFEAT_ARM_cache_coloring and
> only set it when cache coloring is enabled.  Obviously you need to know
> what you are doing if you are enabling cache coloring. If we go down
> that route, we don't risk breaking compatibility with any platforms.
> Given that cache coloring is not upstream yet (upstreaming should start
> very soon), maybe the only thing to do now would be to reserve a XENFEAT
> number.

At least in this context, I can't see how the problem described is cache 
coloring specific. Although, we may need to expose such flag for other 
purpose in the future.

> 
> But actually it was always wrong for Linux to enable swiotlb-xen without
> checking whether it is 1:1 mapped or not. Today we enable swiotlb-xen in
> dom0 and disable it in domU, while we should have enabled swiotlb-xen if
> 1:1 mapped no matter dom0/domU. (swiotlb-xen could be useful in a 1:1
> mapped domU driver domain.)
> 
> 
> There is an argument (Andy was making on IRC) that being 1:1 mapped or
> not is an important information that Xen should provide to the domain
> regardless of anything else.
> 
> So maybe we should add two flags:
> 
> - XENFEAT_direct_mapped
> - XENFEAT_not_direct_mapped

I am guessing the two flags is to allow Linux to fallback to the default 
behavior (depending on dom0 vs domU) on older hypervisor On newer 
hypervisors, one of this flag would always be set. Is that correct?

> 
> To all domains. This is not even ARM specific. Today dom0 would get
> XENFEAT_direct_mapped and domUs XENFEAT_not_direct_mapped. With cache
> coloring all domains will get XENFEAT_not_direct_mapped. With Bertrand's
> team work on 1:1 mapping domUs, some domUs might start to get
> XENFEAT_direct_mapped also one day soon.
> 
> Now I think this is the best option because it is descriptive, doesn't
> imply anything about what Linux should or should not do, and doesn't
> depend on unreliable IOMMU information.

That's a good first step but this still doesn't solve the problem on 
whether the swiotlb can be disabled per-device or even disabling the 
expensive 1:1 mapping in the IOMMU page-tables.

It feels to me we need to have a more complete solution (not necessary 
implemented) so we don't put ourself in the corner again.

> Instead, if we follow my original proposal of using
> XENFEAT_ARM_dom0_iommu and set it automatically when Dom0 is protected
> by IOMMU, we risk breaking PV drivers for platforms where that protection
> is incomplete. I have no idea how many there are out there today. 

This can virtually affect any platform as it is easy to disable an IOMMU 
in the firmware table.

> I have
> the feeling that there aren't that many but I am not sure. So yes, it
> could be that we start passing XENFEAT_ARM_dom0_iommu for a given
> platform, Linux skips the swiotlb-xen initialization, actually it is
> needed for a network/block device, and a PV driver breaks. I can see why
> you say this is a no-go.
> 
> 
> Third option. We still use XENFEAT_ARM_dom0_iommu but we never set
> XENFEAT_ARM_dom0_iommu automatically. It needs a platform specific flag
> to be set. We add the flag to xen/arch/arm/platforms/xilinx-zynqmp.c and
> any other platforms that qualify. Basically it is "opt in" instead of
> "opt out". We don't risk breaking anything because platforms would have
> XENFEAT_ARM_dom0_iommu disabled by default.
Well, yes you will not break other platforms. However, you are still at 
risk to break your platform if the firmware table is updated and disable 
some but not all IOMMUs (for performance concern, brokeness...).

> Still, I think the
> XENFEAT_not/direct_mapped route is much cleaner and simpler.

The XENFEAT_{not,}_direct_mapped also doesn't rely on every platform to 
add code in Xen to take advantage of new features.

Cheers,
Julien Grall Feb. 18, 2021, 10:34 a.m. UTC | #10
Hi Stefano,

On 17/02/2021 23:54, Stefano Stabellini wrote:
> On Wed, 17 Feb 2021, Julien Grall wrote:
>> On 17/02/2021 02:00, Stefano Stabellini wrote:
> 
> I saw that the topic has generated quite a bit of discussion. I suggest
> we keep gathering data and do brainstorming on the thread for a few days
> and in the meantime schedule a call for late next week to figure out a
> way forward?

I forgot to answer to this bit, sorry. I am happy to have a call about this.

Cheers,
Stefano Stabellini Feb. 19, 2021, 1:42 a.m. UTC | #11
On Thu, 18 Feb 2021, Julien Grall wrote:
> On 17/02/2021 23:54, Stefano Stabellini wrote:
> > On Wed, 17 Feb 2021, Julien Grall wrote:
> > > On 17/02/2021 02:00, Stefano Stabellini wrote:
> > > > Hi all,
> > > > 
> > > > Today Linux uses the swiotlb-xen driver (drivers/xen/swiotlb-xen.c) to
> > > > translate addresses for DMA operations in Dom0. Specifically,
> > > > swiotlb-xen is used to translate the address of a foreign page (a page
> > > > belonging to a domU) mapped into Dom0 before using it for DMA.
> > > > 
> > > > This is important because although Dom0 is 1:1 mapped, DomUs are not. On
> > > > systems without an IOMMU swiotlb-xen enables PV drivers to work as long
> > > > as the backends are in Dom0. Thanks to swiotlb-xen, the DMA operation
> > > > ends up using the MFN, rather than the GFN.
> > > > 
> > > > 
> > > > On systems with an IOMMU, this is not necessary: when a foreign page is
> > > > mapped in Dom0, it is added to the Dom0 p2m. A new GFN->MFN translation
> > > > is enstablished for both MMU and SMMU. Dom0 could safely use the GFN
> > > > address (instead of the MFN) for DMA operations and they would work. It
> > > > would be more efficient than using swiotlb-xen.
> > > > 
> > > > If you recall my presentation from Xen Summit 2020, Xilinx is working on
> > > > cache coloring. With cache coloring, no domain is 1:1 mapped, not even
> > > > Dom0. In a scenario where Dom0 is not 1:1 mapped, swiotlb-xen does not
> > > > work as intended.
> > > > 
> > > > 
> > > > The suggested solution for both these issues is to add a new feature
> > > > flag "XENFEAT_ARM_dom0_iommu" that tells Dom0 that it is safe not to use
> > > > swiotlb-xen because IOMMU translations are available for Dom0. If
> > > > XENFEAT_ARM_dom0_iommu is set, Linux should skip the swiotlb-xen
> > > > initialization. I have tested this scheme with and without cache
> > > > coloring (hence with and without 1:1 mapping of Dom0) on ZCU102 and it
> > > > works as expected: DMA operations succeed.
> > > > 
> > > > 
> > > > What about systems where an IOMMU is present but not all devices are
> > > > protected?
> > > > 
> > > > There is no way for Xen to know which devices are protected and which
> > > > ones are not: devices that do not have the "iommus" property could or
> > > > could not be DMA masters.
> > > > 
> > > > Perhaps Xen could populate a whitelist of devices protected by the IOMMU
> > > > based on the "iommus" property. It would require some added complexity
> > > > in Xen and especially in the swiotlb-xen driver in Linux to use it,
> > > > which is not ideal.
> > > 
> > > You are trading a bit more complexity in Xen and Linux with a user will
> > > may
> > > not be able to use the hypervisor on his/her platform without a quirk in
> > > Xen
> > > (see more below).
> > > 
> > > > However, this approach would not work for cache
> > > > coloring where dom0 is not 1:1 mapped so the swiotlb-xen should not be
> > > > used either way
> > > 
> > > Not all the Dom0 Linux kernel will be able to work with cache colouring.
> > > So
> > > you will need a way for the kernel to say "Hey I am can avoid using
> > > swiotlb".
> > 
> > A dom0 Linux kernel unmodified can work with Xen cache coloring enabled.
> 
> Really? I was expecting Linux to configure the DMA transaction to use the MFN
> for foreign pages. So a mapping GFN == MFN would be necessary.
>
> > The swiotlb-xen is unneeded and also all the pfn_valid() checks to detect
> > if a page is local or foreign don't work as intended. 
> 
> I am not sure to understand this. Are you saying that Linux will
> "mistakenly" consider the foreign page as local?

The pfn_valid(addr) check is based on the idea that gfn == mfn for local
pages and gfn != mfn for foreign pages. If you take the mfn of a foreign
page pfn_valid(mfn) should fail.

However, it only works as long as Dom0 is 1:1 mapped. If it is not 1:1
mapped pfn_valid(mfn) could return true for a foreign page. It could
mistake a foreign page for a local one. It could also mistake a local
page for a foreign one if we called pfn_valid() passing the mfn of one
of our local pages.


> Therefore, it would always use the GFN rather than MFN?

For DMA operations:
- local pages use GFN as dev_addr
- foreign pages use MFN as dev_addr   <- requires 1:1 MFN mapping

For cache flush operations:
- local pages might flush locally, works as expected
- local pages might flush via hypercall, needlessly slow but works
  (this is what I was talking about)
- foreign pages might flush locally, requires 1:1 MFN mapping
- foreign pages might flush via hypercall, works as expected

The 1:1 MFN mapping of foreign pages is required and it is problematic
as it could conflict. Sorry for not mentioning it earlier. So yeah, it
looks like a change in Linux is required.


> > However, it still
> > works on systems where the IOMMU can be relied upon. That's because the
> > IOMMU does the GFN->MFN translations, and also because both swiotlb-xen
> > code paths (cache flush local and cache flush via hypercall) work.
> > 
> > Also considering that somebody that enables cache coloring has to know
> > the entire system well, I don't think we need a runtime flag from Linux
> > to say "Hey I can avoid using swiotlb".
> 
> I think we should avoid to hardcode any decision again. Otherwise, sooner or
> later this will come back to bite us.

That's fair.

Let's say that Linux needs some sort of change to work with cache
colored Xen. One important point is that a user cannot really expect the
system to enable cache coloring on its own if appropriate. The user has
to go in and manually set it up in the config files of all domUs and
even the Xen command line. So if the user turns it on and Linux breaks
at boot because it can't support it, I think it is entirely fine.

In other words, I don't think cache coloring needs to be dynamically
discoverable and transparently enableable. But certainly if you turn it
on, you don't want Linux to fail silently after a few hours. If it is
going to fail, we want it to fail straight away and clearly. For
example, a BUG_ON at boot in Linux or Xen would be fine.


 
> > > > How to set XENFEAT_ARM_dom0_iommu?
> > > > 
> > > > We could set XENFEAT_ARM_dom0_iommu automatically when
> > > > is_iommu_enabled(d) for Dom0. We could also have a platform specific
> > > > (xen/arch/arm/platforms/) override so that a specific platform can
> > > > disable XENFEAT_ARM_dom0_iommu. For debugging purposes and advanced
> > > > users, it would also be useful to be able to override it via a Xen
> > > > command line parameter.
> > > Platform quirks should be are limited to a small set of platforms.
> > > 
> > > In this case, this would not be only per-platform but also per-firmware
> > > table
> > > as a developer can decide to remove/add IOMMU nodes in the DT at any time.
> > > 
> > > In addition to that, it means we are introducing a regression for those
> > > users
> > > as Xen 4.14 would have worked on there platform but not anymore. They
> > > would
> > > need to go through all the nodes and find out which one is not protected.
> > > 
> > > This is a bit of a daunting task and we are going to end up having a lot
> > > of
> > > per-platform quirk in Xen.
> > > 
> > > So this approach selecting the flag is a no-go for me. FAOD, the inverted
> > > idea
> > > (i.e. only setting XENFEAT_ARM_dom0_iommu per-platform) is a no go as
> > > well.
> > > 
> > > I don't have a good idea how to set the flag automatically. My
> > > requirement is newer Xen should continue to work on all supported
> > > platforms without any additional per-platform effort.
> > 
> > Absolutely agreed.
> > 
> > 
> > One option would be to rename the flag to XENFEAT_ARM_cache_coloring and
> > only set it when cache coloring is enabled.  Obviously you need to know
> > what you are doing if you are enabling cache coloring. If we go down
> > that route, we don't risk breaking compatibility with any platforms.
> > Given that cache coloring is not upstream yet (upstreaming should start
> > very soon), maybe the only thing to do now would be to reserve a XENFEAT
> > number.
> 
> At least in this context, I can't see how the problem described is cache
> coloring specific. Although, we may need to expose such flag for other purpose
> in the future.
 
I agree


> > But actually it was always wrong for Linux to enable swiotlb-xen without
> > checking whether it is 1:1 mapped or not. Today we enable swiotlb-xen in
> > dom0 and disable it in domU, while we should have enabled swiotlb-xen if
> > 1:1 mapped no matter dom0/domU. (swiotlb-xen could be useful in a 1:1
> > mapped domU driver domain.)
> > 
> > 
> > There is an argument (Andy was making on IRC) that being 1:1 mapped or
> > not is an important information that Xen should provide to the domain
> > regardless of anything else.
> > 
> > So maybe we should add two flags:
> > 
> > - XENFEAT_direct_mapped
> > - XENFEAT_not_direct_mapped
> 
> I am guessing the two flags is to allow Linux to fallback to the default
> behavior (depending on dom0 vs domU) on older hypervisor On newer hypervisors,
> one of this flag would always be set. Is that correct?

Yes. On a newer hypervisor one of the two would be present and Linux can
make an informed decision. On an older hypervisor, neither flag would be
present, so Linux will have to keep doing what is currently doing.

 
> > To all domains. This is not even ARM specific. Today dom0 would get
> > XENFEAT_direct_mapped and domUs XENFEAT_not_direct_mapped. With cache
> > coloring all domains will get XENFEAT_not_direct_mapped. With Bertrand's
> > team work on 1:1 mapping domUs, some domUs might start to get
> > XENFEAT_direct_mapped also one day soon.
> > 
> > Now I think this is the best option because it is descriptive, doesn't
> > imply anything about what Linux should or should not do, and doesn't
> > depend on unreliable IOMMU information.
> 
> That's a good first step but this still doesn't solve the problem on whether
> the swiotlb can be disabled per-device or even disabling the expensive 1:1
> mapping in the IOMMU page-tables.
>
> It feels to me we need to have a more complete solution (not necessary
> implemented) so we don't put ourself in the corner again.

Yeah, XENFEAT_{not,}_direct_mapped help cleaning things up, but don't
solve the issues you described. Those are difficult to solve, it would
be nice to have some idea.

One issue is that we only have limited information passed via device
tree, limited to the "iommus" property. If that's all we have, there
isn't much we can do. The device tree list is maybe the only option,
although it feels a bit complex intuitively. We could maybe replace the
real iommu node with a fake iommu node only to use it to "tag" devices
protected by the real iommu.

I like the idea of rewarding well-designed boards; boards that have an
IOMMU and works for all DMA-mastering devices. It would be great to be
able to optimize those in a simple way, without breaking the others. But
unfortunately due to the limited info on device tree, I cannot think of
a way to do it automatically. And it is not great to rely on platform
files.



> > Instead, if we follow my original proposal of using
> > XENFEAT_ARM_dom0_iommu and set it automatically when Dom0 is protected
> > by IOMMU, we risk breaking PV drivers for platforms where that protection
> > is incomplete. I have no idea how many there are out there today. 
> 
> This can virtually affect any platform as it is easy to disable an IOMMU in
> the firmware table.
> 
> > I have
> > the feeling that there aren't that many but I am not sure. So yes, it
> > could be that we start passing XENFEAT_ARM_dom0_iommu for a given
> > platform, Linux skips the swiotlb-xen initialization, actually it is
> > needed for a network/block device, and a PV driver breaks. I can see why
> > you say this is a no-go.
> > 
> > 
> > Third option. We still use XENFEAT_ARM_dom0_iommu but we never set
> > XENFEAT_ARM_dom0_iommu automatically. It needs a platform specific flag
> > to be set. We add the flag to xen/arch/arm/platforms/xilinx-zynqmp.c and
> > any other platforms that qualify. Basically it is "opt in" instead of
> > "opt out". We don't risk breaking anything because platforms would have
> > XENFEAT_ARM_dom0_iommu disabled by default.
> Well, yes you will not break other platforms. However, you are still at risk
> to break your platform if the firmware table is updated and disable some but
> not all IOMMUs (for performance concern, brokeness...).

This is something we might be able to detect: we can detect if an IOMMU
is disabled.
Julien Grall Feb. 19, 2021, 10:32 a.m. UTC | #12
Hi Stefano,

On 19/02/2021 01:42, Stefano Stabellini wrote:
> On Thu, 18 Feb 2021, Julien Grall wrote:
>> On 17/02/2021 23:54, Stefano Stabellini wrote:
>>> On Wed, 17 Feb 2021, Julien Grall wrote:
>>>> On 17/02/2021 02:00, Stefano Stabellini wrote:
>>> But actually it was always wrong for Linux to enable swiotlb-xen without
>>> checking whether it is 1:1 mapped or not. Today we enable swiotlb-xen in
>>> dom0 and disable it in domU, while we should have enabled swiotlb-xen if
>>> 1:1 mapped no matter dom0/domU. (swiotlb-xen could be useful in a 1:1
>>> mapped domU driver domain.)
>>>
>>>
>>> There is an argument (Andy was making on IRC) that being 1:1 mapped or
>>> not is an important information that Xen should provide to the domain
>>> regardless of anything else.
>>>
>>> So maybe we should add two flags:
>>>
>>> - XENFEAT_direct_mapped
>>> - XENFEAT_not_direct_mapped
>>
>> I am guessing the two flags is to allow Linux to fallback to the default
>> behavior (depending on dom0 vs domU) on older hypervisor On newer hypervisors,
>> one of this flag would always be set. Is that correct?
> 
> Yes. On a newer hypervisor one of the two would be present and Linux can
> make an informed decision. On an older hypervisor, neither flag would be
> present, so Linux will have to keep doing what is currently doing.
> 
>   
>>> To all domains. This is not even ARM specific. Today dom0 would get
>>> XENFEAT_direct_mapped and domUs XENFEAT_not_direct_mapped. With cache
>>> coloring all domains will get XENFEAT_not_direct_mapped. With Bertrand's
>>> team work on 1:1 mapping domUs, some domUs might start to get
>>> XENFEAT_direct_mapped also one day soon.
>>>
>>> Now I think this is the best option because it is descriptive, doesn't
>>> imply anything about what Linux should or should not do, and doesn't
>>> depend on unreliable IOMMU information.
>>
>> That's a good first step but this still doesn't solve the problem on whether
>> the swiotlb can be disabled per-device or even disabling the expensive 1:1
>> mapping in the IOMMU page-tables.
>>
>> It feels to me we need to have a more complete solution (not necessary
>> implemented) so we don't put ourself in the corner again.
> 
> Yeah, XENFEAT_{not,}_direct_mapped help cleaning things up, but don't
> solve the issues you described. Those are difficult to solve, it would
> be nice to have some idea.
> 
> One issue is that we only have limited information passed via device
> tree, limited to the "iommus" property. If that's all we have, there
> isn't much we can do.

We can actually do a lot with that :). See more below.

> The device tree list is maybe the only option,
> although it feels a bit complex intuitively. We could maybe replace the
> real iommu node with a fake iommu node only to use it to "tag" devices
> protected by the real iommu.
> 
> I like the idea of rewarding well-designed boards; boards that have an
> IOMMU and works for all DMA-mastering devices. It would be great to be
> able to optimize those in a simple way, without breaking the others. But
> unfortunately due to the limited info on device tree, I cannot think of
> a way to do it automatically. And it is not great to rely on platform
> files.

We would not be able to automate in Xen alone, however we can ask the 
help of Linux.

Xen is able to tell whether it has protected the device with an IOMMU or 
not. When creating the domain device-tree, it could replace the IOMMU 
node with a Xen specific one.

With the Xen IOMMU nodes, Linux could find out whether the device needs 
to use the swiotlb ops or not.

Skipping extra mapping in the IOMMU is a bit trickier. I can see two 
solutions:
   - A per-domain toggle to skip the IOMMU mapping. This is assuming 
that Linux is able to know that all DMA capable devices are protected. 
The problem is a  driver may be loaded later. Such drivers are unlikely 
to use existing grant, so the toggle could be used to say "all the grant 
after this point will require a mapping (or not)"

   - A per-grant flag to indicate whether an IOMMU mapping is necessary. 
This is assuming we are able to know whether a grant will be used for DMA.

>>> Instead, if we follow my original proposal of using
>>> XENFEAT_ARM_dom0_iommu and set it automatically when Dom0 is protected
>>> by IOMMU, we risk breaking PV drivers for platforms where that protection
>>> is incomplete. I have no idea how many there are out there today.
>>
>> This can virtually affect any platform as it is easy to disable an IOMMU in
>> the firmware table.
>>
>>> I have
>>> the feeling that there aren't that many but I am not sure. So yes, it
>>> could be that we start passing XENFEAT_ARM_dom0_iommu for a given
>>> platform, Linux skips the swiotlb-xen initialization, actually it is
>>> needed for a network/block device, and a PV driver breaks. I can see why
>>> you say this is a no-go.
>>>
>>>
>>> Third option. We still use XENFEAT_ARM_dom0_iommu but we never set
>>> XENFEAT_ARM_dom0_iommu automatically. It needs a platform specific flag
>>> to be set. We add the flag to xen/arch/arm/platforms/xilinx-zynqmp.c and
>>> any other platforms that qualify. Basically it is "opt in" instead of
>>> "opt out". We don't risk breaking anything because platforms would have
>>> XENFEAT_ARM_dom0_iommu disabled by default.
>> Well, yes you will not break other platforms. However, you are still at risk
>> to break your platform if the firmware table is updated and disable some but
>> not all IOMMUs (for performance concern, brokeness...).
> 
> This is something we might be able to detect: we can detect if an IOMMU
> is disabled.

This is assuming that node has not been removed... :) Anyway, as I 
pointed out in my original answer, I don't think platform quirk (or 
enablement) is a viable solution here.

Cheers,
Stefano Stabellini Feb. 24, 2021, 12:20 a.m. UTC | #13
On Fri, 19 Feb 2021, Julien Grall wrote:
> On 19/02/2021 01:42, Stefano Stabellini wrote:
> > On Thu, 18 Feb 2021, Julien Grall wrote:
> > > On 17/02/2021 23:54, Stefano Stabellini wrote:
> > > > On Wed, 17 Feb 2021, Julien Grall wrote:
> > > > > On 17/02/2021 02:00, Stefano Stabellini wrote:
> > > > But actually it was always wrong for Linux to enable swiotlb-xen without
> > > > checking whether it is 1:1 mapped or not. Today we enable swiotlb-xen in
> > > > dom0 and disable it in domU, while we should have enabled swiotlb-xen if
> > > > 1:1 mapped no matter dom0/domU. (swiotlb-xen could be useful in a 1:1
> > > > mapped domU driver domain.)
> > > > 
> > > > 
> > > > There is an argument (Andy was making on IRC) that being 1:1 mapped or
> > > > not is an important information that Xen should provide to the domain
> > > > regardless of anything else.
> > > > 
> > > > So maybe we should add two flags:
> > > > 
> > > > - XENFEAT_direct_mapped
> > > > - XENFEAT_not_direct_mapped
> > > 
> > > I am guessing the two flags is to allow Linux to fallback to the default
> > > behavior (depending on dom0 vs domU) on older hypervisor On newer
> > > hypervisors,
> > > one of this flag would always be set. Is that correct?
> > 
> > Yes. On a newer hypervisor one of the two would be present and Linux can
> > make an informed decision. On an older hypervisor, neither flag would be
> > present, so Linux will have to keep doing what is currently doing.
> > 
> >   
> > > > To all domains. This is not even ARM specific. Today dom0 would get
> > > > XENFEAT_direct_mapped and domUs XENFEAT_not_direct_mapped. With cache
> > > > coloring all domains will get XENFEAT_not_direct_mapped. With Bertrand's
> > > > team work on 1:1 mapping domUs, some domUs might start to get
> > > > XENFEAT_direct_mapped also one day soon.
> > > > 
> > > > Now I think this is the best option because it is descriptive, doesn't
> > > > imply anything about what Linux should or should not do, and doesn't
> > > > depend on unreliable IOMMU information.
> > > 
> > > That's a good first step but this still doesn't solve the problem on
> > > whether
> > > the swiotlb can be disabled per-device or even disabling the expensive 1:1
> > > mapping in the IOMMU page-tables.
> > > 
> > > It feels to me we need to have a more complete solution (not necessary
> > > implemented) so we don't put ourself in the corner again.
> > 
> > Yeah, XENFEAT_{not,}_direct_mapped help cleaning things up, but don't
> > solve the issues you described. Those are difficult to solve, it would
> > be nice to have some idea.
> > 
> > One issue is that we only have limited information passed via device
> > tree, limited to the "iommus" property. If that's all we have, there
> > isn't much we can do.
> 
> We can actually do a lot with that :). See more below.
> 
> > The device tree list is maybe the only option,
> > although it feels a bit complex intuitively. We could maybe replace the
> > real iommu node with a fake iommu node only to use it to "tag" devices
> > protected by the real iommu.
> > 
> > I like the idea of rewarding well-designed boards; boards that have an
> > IOMMU and works for all DMA-mastering devices. It would be great to be
> > able to optimize those in a simple way, without breaking the others. But
> > unfortunately due to the limited info on device tree, I cannot think of
> > a way to do it automatically. And it is not great to rely on platform
> > files.
> 
> We would not be able to automate in Xen alone, however we can ask the help of
> Linux.
> 
> Xen is able to tell whether it has protected the device with an IOMMU or not.
> When creating the domain device-tree, it could replace the IOMMU node with a
> Xen specific one.
> 
> With the Xen IOMMU nodes, Linux could find out whether the device needs to use
> the swiotlb ops or not.

That might work.

Another similar idea is to use "dma-ranges" in device tree.  dma-ranges
can only be used for a bus and allow us to specify special DMA address
mappings between the nodes under the bus the parent address space.

If we created a new bus node called "amba-nodma" with a "dma-ranges"
that basically traslates child addresses into an invalid address or size
zero, and moved devices without "iommus" under it, the consequence is
that all those devices will be accessible and usable by Linux but there
would be no DMA transactions originating from them. Or the transactions
would fail explicitely.

Thus, IOMMU-protected devices would continue to work as normal.
Non-dma-mastering non-IOMMU-protected devices would also continue to
work as normal.
Dma-mastering non-IOMMU-protected devices would not work, but the
failure would be controlled and explicit.

At that point, swiotlb-xen could be enabled only for devices that have
this special dma-ranges address translation.

On one hand, this technique would require fewer changes in Linux -- it
might even work with almost no changes on the Linux side thanks to the
automatic fallback to the swiotlb when dma_mask checks fail.

On the other hand, I really dislike the magic invalid address
translation. If it is invalid, the device DMA should not work at all; it
should not trigger the swiotlb-xen. I can't think of a way to make this
clean from a device tree spec perspective. I thought I would mention it
in case it gives you any good ideas.


> Skipping extra mapping in the IOMMU is a bit trickier. I can see two
> solutions:
>   - A per-domain toggle to skip the IOMMU mapping. This is assuming that Linux
> is able to know that all DMA capable devices are protected. The problem is a
> driver may be loaded later. Such drivers are unlikely to use existing grant,
> so the toggle could be used to say "all the grant after this point will
> require a mapping (or not)"

I cannot think of a way for Linux to figure out that all DMA capable
devices are protected. Is the idea that Linux would base the decision on
the Linux drivers, not the capability of the hardware? Even using the
drivers, I don't know if it would be possible to implement.


>   - A per-grant flag to indicate whether an IOMMU mapping is necessary. This
> is assuming we are able to know whether a grant will be used for DMA.

That might be easier because the caller of gnttab_map_refs in Linux
should be able to make a decent guess.
diff mbox series

Patch

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 7a345ae45e..4dbef48199 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -16,6 +16,7 @@ 
 #include <xen/hypfs.h>
 #include <xsm/xsm.h>
 #include <asm/current.h>
+#include <asm/platform.h>
 #include <public/version.h>
 
 #ifndef COMPAT
@@ -549,6 +550,9 @@  DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
                 fi.submap |= 1U << XENFEAT_dom0;
 #ifdef CONFIG_ARM
             fi.submap |= (1U << XENFEAT_ARM_SMCCC_supported);
+            if ( !platform_has_quirk(PLATFORM_QUIRK_DOM0_IOMMU) &&
+                 is_hardware_domain(d) && is_iommu_enabled(d) )
+                fi.submap |= (1U << XENFEAT_ARM_dom0_iommu);
 #endif
 #ifdef CONFIG_X86
             if ( is_pv_domain(d) )
diff --git a/xen/include/asm-arm/platform.h b/xen/include/asm-arm/platform.h
index 997eb25216..094a76d677 100644
--- a/xen/include/asm-arm/platform.h
+++ b/xen/include/asm-arm/platform.h
@@ -48,6 +48,11 @@  struct platform_desc {
  * stride.
  */
 #define PLATFORM_QUIRK_GIC_64K_STRIDE (1 << 0)
+/*
+ * Quirk for platforms where the IOMMU is present but doesn't protect
+ * all DMA-capable devices.
+ */
+#define PLATFORM_QUIRK_DOM0_IOMMU (1 << 1)
 
 void platform_init(void);
 int platform_init_time(void);
diff --git a/xen/include/asm-x86/platform.h b/xen/include/asm-x86/platform.h
new file mode 100644
index 0000000000..5427e8b851
--- /dev/null
+++ b/xen/include/asm-x86/platform.h
@@ -0,0 +1,13 @@ 
+#ifndef __ASM_X86_PLATFORM_H
+#define __ASM_X86_PLATFORM_H
+
+#endif /* __ASM_X86_PLATFORM_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 1613b2aab8..adaa2a995d 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -114,6 +114,11 @@ 
  */
 #define XENFEAT_linux_rsdp_unrestricted   15
 
+/*
+ * arm: dom0 is started with IOMMU protection.
+ */
+#define XENFEAT_ARM_dom0_iommu            16
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */