diff mbox

[v1,1/1] xen/arm: Map mmio-sram nodes as cached memory

Message ID 1481801208-27758-2-git-send-email-edgar.iglesias@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Edgar E. Iglesias Dec. 15, 2016, 11:26 a.m. UTC
From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Relax the mapping of mmio-sram nodes that do not set the
no-memory-wc property to cached normal memory.

Rationale:
Allthough on chip memories are relatively fast compared to
off-chip memories, large OCMs are still significantly slower
than L1 caches. Depending on the memory, either cached or
uncached may make most sense.

Also, dom0 may like to use the memory in a cache-coherent way
to avoid SW managed coherency.

By mapping it cached at S2, we let dom0 select cacheability
via S1 mappings.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
---
 xen/arch/arm/domain_build.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Julien Grall Dec. 16, 2016, 4:12 p.m. UTC | #1
Hi Edgar,

On 15/12/16 11:26, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>
> Relax the mapping of mmio-sram nodes that do not set the
> no-memory-wc property to cached normal memory.
>
> Rationale:
> Allthough on chip memories are relatively fast compared to

s/Allthough/Although/

> off-chip memories, large OCMs are still significantly slower

May I ask what does OCMs mean?

> than L1 caches. Depending on the memory, either cached or
> uncached may make most sense.
>
> Also, dom0 may like to use the memory in a cache-coherent way
> to avoid SW managed coherency.
>
> By mapping it cached at S2, we let dom0 select cacheability
> via S1 mappings.
>
> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
> ---
>  xen/arch/arm/domain_build.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 07b868d..3a7c1ee 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -56,8 +56,20 @@ static const struct dt_device_match dev_map_attrs[] __initconst =
>          .data = (void *) (uintptr_t) p2m_mmio_direct_dev,
>      },
>      {
> +        /*
> +         * Allthough on chip memories are relatively fast compared to

s/Allthough/Although/

> +         * off-chip memories, large OCMs are still significantly slower
> +         * than L1 caches. Depending on the memory, either cached or
> +         * uncached may make most sense.
> +         *
> +         * Also, dom0 may like to use the memory in a cache-coherent

We prefer to use the term "hardware domain" rather than DOM0 when 
possible in the code.

> +         * way to avoid SW managed coherency.
> +         *
> +         * By mapping it cached at S2, we let dom0 select cacheability
> +         * via S1 mappings.
> +         */
>          __DT_MATCH_COMPATIBLE("mmio-sram"),
> -        .data = (void *) (uintptr_t) p2m_mmio_direct_nc,
> +        .data = (void *) (uintptr_t) p2m_mmio_direct_c,

I consider it as the most trusted domain and has other way to mess up 
the platform. So I am fine with this relaxation for the hardware domain 
(AKA DOM0).

However, I have the feeling that we need this kind of relaxation for 
many devices. For instance prefetchable memory BARs for PCI would have 
to be cacheable, same for integrated graphic cards.

I am wondering whether for DOM0 only (*not the other guests), we should 
relax the default stage-2 attribute mapping to p2m_mmio_direct_c.

So we would let DOM0 in charge of the final attribute. This may boost 
the performance of memory access in DOM0.

Any opinions?

>      },
>      { /* sentinel */ },
>  };
>

Cheers,
Edgar E. Iglesias Dec. 16, 2016, 5:04 p.m. UTC | #2
On Fri, Dec 16, 2016 at 04:12:00PM +0000, Julien Grall wrote:
> Hi Edgar,
> 
> On 15/12/16 11:26, Edgar E. Iglesias wrote:
> >From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> >
> >Relax the mapping of mmio-sram nodes that do not set the
> >no-memory-wc property to cached normal memory.
> >
> >Rationale:
> >Allthough on chip memories are relatively fast compared to
> 
> s/Allthough/Although/

Fixed for v2.

> 
> >off-chip memories, large OCMs are still significantly slower
> 
> May I ask what does OCMs mean?

It means On Chip Memories. I can spell it out in v2.

> 
> >than L1 caches. Depending on the memory, either cached or
> >uncached may make most sense.
> >
> >Also, dom0 may like to use the memory in a cache-coherent way
> >to avoid SW managed coherency.
> >
> >By mapping it cached at S2, we let dom0 select cacheability
> >via S1 mappings.
> >
> >Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
> >---
> > xen/arch/arm/domain_build.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> >diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> >index 07b868d..3a7c1ee 100644
> >--- a/xen/arch/arm/domain_build.c
> >+++ b/xen/arch/arm/domain_build.c
> >@@ -56,8 +56,20 @@ static const struct dt_device_match dev_map_attrs[] __initconst =
> >         .data = (void *) (uintptr_t) p2m_mmio_direct_dev,
> >     },
> >     {
> >+        /*
> >+         * Allthough on chip memories are relatively fast compared to
> 
> s/Allthough/Although/

Fixed for v2.

> 
> >+         * off-chip memories, large OCMs are still significantly slower
> >+         * than L1 caches. Depending on the memory, either cached or
> >+         * uncached may make most sense.
> >+         *
> >+         * Also, dom0 may like to use the memory in a cache-coherent
> 
> We prefer to use the term "hardware domain" rather than DOM0 when possible
> in the code.

Sounds good, I've changed it in code and commit msg for v2.

> 
> >+         * way to avoid SW managed coherency.
> >+         *
> >+         * By mapping it cached at S2, we let dom0 select cacheability
> >+         * via S1 mappings.
> >+         */
> >         __DT_MATCH_COMPATIBLE("mmio-sram"),
> >-        .data = (void *) (uintptr_t) p2m_mmio_direct_nc,
> >+        .data = (void *) (uintptr_t) p2m_mmio_direct_c,
> 
> I consider it as the most trusted domain and has other way to mess up the
> platform. So I am fine with this relaxation for the hardware domain (AKA
> DOM0).
> 
> However, I have the feeling that we need this kind of relaxation for many
> devices. For instance prefetchable memory BARs for PCI would have to be
> cacheable, same for integrated graphic cards.

Yes, I agree.

> 
> I am wondering whether for DOM0 only (*not the other guests), we should
> relax the default stage-2 attribute mapping to p2m_mmio_direct_c.
> 
> So we would let DOM0 in charge of the final attribute. This may boost the
> performance of memory access in DOM0.
> 
> Any opinions?

I think it makes sense. We had a discussiong about it a while back ago:
https://lists.xenproject.org/archives/html/xen-devel/2015-05/msg02349.html

The concerns that were raised were wether there could be devices that
could behave badly and possibly giving dom0 access to trig undefined or
unpredictable behavior that could be exploited.

If dom0 accesses devices through a cache, access patterns will change.
In theory it may trig unexpected behaviour in some device. It's hard
to say unless talking about specific chips and implementations.

For example, given dom0 access to a DMA capable device may also achieve
the same. I.e access patterns from DMA units differ from the ones
originating from a CPU using uncached device memory. It could trig
similar kind of errors.

It would be interesting to see a concrete example of a problematic
system.

Best regards,
Edgar
Julien Grall Dec. 19, 2016, 12:17 p.m. UTC | #3
Hi Edgar,

On 16/12/2016 18:04, Edgar E. Iglesias wrote:
> On Fri, Dec 16, 2016 at 04:12:00PM +0000, Julien Grall wrote:
>> On 15/12/16 11:26, Edgar E. Iglesias wrote:
>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>
>>> Relax the mapping of mmio-sram nodes that do not set the
>>> no-memory-wc property to cached normal memory.
>>>
>>> Rationale:
>>> Allthough on chip memories are relatively fast compared to
>>
>> s/Allthough/Although/
>
> Fixed for v2.
>
>>
>>> off-chip memories, large OCMs are still significantly slower
>>
>> May I ask what does OCMs mean?
>
> It means On Chip Memories. I can spell it out in v2.

Yes please. I was not able to find the acronym with a quick search Google.

[...]

>> I consider it as the most trusted domain and has other way to mess up the
>> platform. So I am fine with this relaxation for the hardware domain (AKA
>> DOM0).
>>
>> However, I have the feeling that we need this kind of relaxation for many
>> devices. For instance prefetchable memory BARs for PCI would have to be
>> cacheable, same for integrated graphic cards.
>
> Yes, I agree.
>
>>
>> I am wondering whether for DOM0 only (*not the other guests), we should
>> relax the default stage-2 attribute mapping to p2m_mmio_direct_c.
>>
>> So we would let DOM0 in charge of the final attribute. This may boost the
>> performance of memory access in DOM0.
>>
>> Any opinions?
>
> I think it makes sense. We had a discussiong about it a while back ago:
> https://lists.xenproject.org/archives/html/xen-devel/2015-05/msg02349.html
>
> The concerns that were raised were wether there could be devices that
> could behave badly and possibly giving dom0 access to trig undefined or
> unpredictable behavior that could be exploited.

I think it would be fine for DOM0. It is already a trusted domain and 
have other way to take down the platform.

>
> If dom0 accesses devices through a cache, access patterns will change.
> In theory it may trig unexpected behaviour in some device. It's hard
> to say unless talking about specific chips and implementations.
>
> For example, given dom0 access to a DMA capable device may also achieve
> the same. I.e access patterns from DMA units differ from the ones
> originating from a CPU using uncached device memory. It could trig
> similar kind of errors.

Another example is having the device mapped with different in 2 places 
with different cacheability attributes. The data accessed would not be 
coherent. But I believe that should not lead to a security issue.

>
> It would be interesting to see a concrete example of a problematic
> system.

I believe it would depend a lot on how the platform have been designed.

I think we have two choices to go forward:
	1) Relax the memory attribute on case by case. DOM0 would not be able 
to exploit potential undefined behavior. However, this means that code 
is been added for any new device (e.g compatible string).
	2) Relax the memory attribute on every case. DOM0 would be able to 
exploit potential undefined behavior. On the benefit side, every devices 
can be used at its full performance without any change required in Xen.

In the case of ACPI, we rely on DOM0 to provide the correct mapping 
attribute when asking to do the stage-2 mapping (see 
XENMAPSPACE_dev_mmio). Note that currently, the MMIO are always mapped 
with the attribute Device-nGnRE. There is plan to change that in the 
future. So ACPI is implementing 2).

Device Tree is currently implementing 1). But I would like to see the 
behavior of Xen the same no matter the firmware tables used. So I would 
lean towards 2).

Cheers,
Stefano Stabellini Dec. 20, 2016, 12:01 a.m. UTC | #4
On Mon, 19 Dec 2016, Julien Grall wrote:
> Hi Edgar,
> 
> On 16/12/2016 18:04, Edgar E. Iglesias wrote:
> > On Fri, Dec 16, 2016 at 04:12:00PM +0000, Julien Grall wrote:
> > > On 15/12/16 11:26, Edgar E. Iglesias wrote:
> > > > From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> > > > 
> > > > Relax the mapping of mmio-sram nodes that do not set the
> > > > no-memory-wc property to cached normal memory.
> > > > 
> > > > Rationale:
> > > > Allthough on chip memories are relatively fast compared to
> > > 
> > > s/Allthough/Although/
> > 
> > Fixed for v2.
> > 
> > > 
> > > > off-chip memories, large OCMs are still significantly slower
> > > 
> > > May I ask what does OCMs mean?
> > 
> > It means On Chip Memories. I can spell it out in v2.
> 
> Yes please. I was not able to find the acronym with a quick search Google.
> 
> [...]
> 
> > > I consider it as the most trusted domain and has other way to mess up the
> > > platform. So I am fine with this relaxation for the hardware domain (AKA
> > > DOM0).
> > > 
> > > However, I have the feeling that we need this kind of relaxation for many
> > > devices. For instance prefetchable memory BARs for PCI would have to be
> > > cacheable, same for integrated graphic cards.
> > 
> > Yes, I agree.
> > 
> > > 
> > > I am wondering whether for DOM0 only (*not the other guests), we should
> > > relax the default stage-2 attribute mapping to p2m_mmio_direct_c.
> > > 
> > > So we would let DOM0 in charge of the final attribute. This may boost the
> > > performance of memory access in DOM0.
> > > 
> > > Any opinions?
> > 
> > I think it makes sense. We had a discussiong about it a while back ago:
> > https://lists.xenproject.org/archives/html/xen-devel/2015-05/msg02349.html
> > 
> > The concerns that were raised were wether there could be devices that
> > could behave badly and possibly giving dom0 access to trig undefined or
> > unpredictable behavior that could be exploited.
> 
> I think it would be fine for DOM0. It is already a trusted domain and have
> other way to take down the platform.
> 
> > 
> > If dom0 accesses devices through a cache, access patterns will change.
> > In theory it may trig unexpected behaviour in some device. It's hard
> > to say unless talking about specific chips and implementations.
> > 
> > For example, given dom0 access to a DMA capable device may also achieve
> > the same. I.e access patterns from DMA units differ from the ones
> > originating from a CPU using uncached device memory. It could trig
> > similar kind of errors.
> 
> Another example is having the device mapped with different in 2 places with
> different cacheability attributes. The data accessed would not be coherent.
> But I believe that should not lead to a security issue.
> 
> > 
> > It would be interesting to see a concrete example of a problematic
> > system.
> 
> I believe it would depend a lot on how the platform have been designed.
> 
> I think we have two choices to go forward:
> 	1) Relax the memory attribute on case by case. DOM0 would not be able
> to exploit potential undefined behavior. However, this means that code is been
> added for any new device (e.g compatible string).
> 	2) Relax the memory attribute on every case. DOM0 would be able to
> exploit potential undefined behavior. On the benefit side, every devices can
> be used at its full performance without any change required in Xen.
> 
> In the case of ACPI, we rely on DOM0 to provide the correct mapping attribute
> when asking to do the stage-2 mapping (see XENMAPSPACE_dev_mmio). Note that
> currently, the MMIO are always mapped with the attribute Device-nGnRE. There
> is plan to change that in the future. So ACPI is implementing 2).
> 
> Device Tree is currently implementing 1). But I would like to see the behavior
> of Xen the same no matter the firmware tables used. So I would lean towards
> 2).

That's fine by me.
Edgar E. Iglesias Dec. 20, 2016, 12:53 p.m. UTC | #5
On Mon, Dec 19, 2016 at 04:01:00PM -0800, Stefano Stabellini wrote:
> On Mon, 19 Dec 2016, Julien Grall wrote:
> > Hi Edgar,
> > 
> > On 16/12/2016 18:04, Edgar E. Iglesias wrote:
> > > On Fri, Dec 16, 2016 at 04:12:00PM +0000, Julien Grall wrote:
> > > > On 15/12/16 11:26, Edgar E. Iglesias wrote:
> > > > > From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> > > > > 
> > > > > Relax the mapping of mmio-sram nodes that do not set the
> > > > > no-memory-wc property to cached normal memory.
> > > > > 
> > > > > Rationale:
> > > > > Allthough on chip memories are relatively fast compared to
> > > > 
> > > > s/Allthough/Although/
> > > 
> > > Fixed for v2.
> > > 
> > > > 
> > > > > off-chip memories, large OCMs are still significantly slower
> > > > 
> > > > May I ask what does OCMs mean?
> > > 
> > > It means On Chip Memories. I can spell it out in v2.
> > 
> > Yes please. I was not able to find the acronym with a quick search Google.
> > 
> > [...]
> > 
> > > > I consider it as the most trusted domain and has other way to mess up the
> > > > platform. So I am fine with this relaxation for the hardware domain (AKA
> > > > DOM0).
> > > > 
> > > > However, I have the feeling that we need this kind of relaxation for many
> > > > devices. For instance prefetchable memory BARs for PCI would have to be
> > > > cacheable, same for integrated graphic cards.
> > > 
> > > Yes, I agree.
> > > 
> > > > 
> > > > I am wondering whether for DOM0 only (*not the other guests), we should
> > > > relax the default stage-2 attribute mapping to p2m_mmio_direct_c.
> > > > 
> > > > So we would let DOM0 in charge of the final attribute. This may boost the
> > > > performance of memory access in DOM0.
> > > > 
> > > > Any opinions?
> > > 
> > > I think it makes sense. We had a discussiong about it a while back ago:
> > > https://lists.xenproject.org/archives/html/xen-devel/2015-05/msg02349.html
> > > 
> > > The concerns that were raised were wether there could be devices that
> > > could behave badly and possibly giving dom0 access to trig undefined or
> > > unpredictable behavior that could be exploited.
> > 
> > I think it would be fine for DOM0. It is already a trusted domain and have
> > other way to take down the platform.
> > 
> > > 
> > > If dom0 accesses devices through a cache, access patterns will change.
> > > In theory it may trig unexpected behaviour in some device. It's hard
> > > to say unless talking about specific chips and implementations.
> > > 
> > > For example, given dom0 access to a DMA capable device may also achieve
> > > the same. I.e access patterns from DMA units differ from the ones
> > > originating from a CPU using uncached device memory. It could trig
> > > similar kind of errors.
> > 
> > Another example is having the device mapped with different in 2 places with
> > different cacheability attributes. The data accessed would not be coherent.
> > But I believe that should not lead to a security issue.
> > 
> > > 
> > > It would be interesting to see a concrete example of a problematic
> > > system.
> > 
> > I believe it would depend a lot on how the platform have been designed.
> > 
> > I think we have two choices to go forward:
> > 	1) Relax the memory attribute on case by case. DOM0 would not be able
> > to exploit potential undefined behavior. However, this means that code is been
> > added for any new device (e.g compatible string).
> > 	2) Relax the memory attribute on every case. DOM0 would be able to
> > exploit potential undefined behavior. On the benefit side, every devices can
> > be used at its full performance without any change required in Xen.
> > 
> > In the case of ACPI, we rely on DOM0 to provide the correct mapping attribute
> > when asking to do the stage-2 mapping (see XENMAPSPACE_dev_mmio). Note that
> > currently, the MMIO are always mapped with the attribute Device-nGnRE. There
> > is plan to change that in the future. So ACPI is implementing 2).
> > 
> > Device Tree is currently implementing 1). But I would like to see the behavior
> > of Xen the same no matter the firmware tables used. So I would lean towards
> > 2).
> 
> That's fine by me.


OK, I've sent a v2 of the mmio-sram patch to avoid delays with that part.

But now that we seem to be in agreement with going for relaxing the default
S2 mappings, perhaps we should ditch the whitelist/blacklist mechanism?
Or does someone see a need for a blacklist?

Cheers,
Edgar
Julien Grall Dec. 21, 2016, 11:29 a.m. UTC | #6
Hi Edgar,

On 20/12/2016 13:53, Edgar E. Iglesias wrote:
> On Mon, Dec 19, 2016 at 04:01:00PM -0800, Stefano Stabellini wrote:
>> On Mon, 19 Dec 2016, Julien Grall wrote:
>>> Hi Edgar,
>>>
>>> On 16/12/2016 18:04, Edgar E. Iglesias wrote:
>>>> On Fri, Dec 16, 2016 at 04:12:00PM +0000, Julien Grall wrote:
>>>>> On 15/12/16 11:26, Edgar E. Iglesias wrote:
>>>>>> From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
>>>>>>
>>>>>> Relax the mapping of mmio-sram nodes that do not set the
>>>>>> no-memory-wc property to cached normal memory.
>>>>>>
>>>>>> Rationale:
>>>>>> Allthough on chip memories are relatively fast compared to
>>>>>
>>>>> s/Allthough/Although/
>>>>
>>>> Fixed for v2.
>>>>
>>>>>
>>>>>> off-chip memories, large OCMs are still significantly slower
>>>>>
>>>>> May I ask what does OCMs mean?
>>>>
>>>> It means On Chip Memories. I can spell it out in v2.
>>>
>>> Yes please. I was not able to find the acronym with a quick search Google.
>>>
>>> [...]
>>>
>>>>> I consider it as the most trusted domain and has other way to mess up the
>>>>> platform. So I am fine with this relaxation for the hardware domain (AKA
>>>>> DOM0).
>>>>>
>>>>> However, I have the feeling that we need this kind of relaxation for many
>>>>> devices. For instance prefetchable memory BARs for PCI would have to be
>>>>> cacheable, same for integrated graphic cards.
>>>>
>>>> Yes, I agree.
>>>>
>>>>>
>>>>> I am wondering whether for DOM0 only (*not the other guests), we should
>>>>> relax the default stage-2 attribute mapping to p2m_mmio_direct_c.
>>>>>
>>>>> So we would let DOM0 in charge of the final attribute. This may boost the
>>>>> performance of memory access in DOM0.
>>>>>
>>>>> Any opinions?
>>>>
>>>> I think it makes sense. We had a discussiong about it a while back ago:
>>>> https://lists.xenproject.org/archives/html/xen-devel/2015-05/msg02349.html
>>>>
>>>> The concerns that were raised were wether there could be devices that
>>>> could behave badly and possibly giving dom0 access to trig undefined or
>>>> unpredictable behavior that could be exploited.
>>>
>>> I think it would be fine for DOM0. It is already a trusted domain and have
>>> other way to take down the platform.
>>>
>>>>
>>>> If dom0 accesses devices through a cache, access patterns will change.
>>>> In theory it may trig unexpected behaviour in some device. It's hard
>>>> to say unless talking about specific chips and implementations.
>>>>
>>>> For example, given dom0 access to a DMA capable device may also achieve
>>>> the same. I.e access patterns from DMA units differ from the ones
>>>> originating from a CPU using uncached device memory. It could trig
>>>> similar kind of errors.
>>>
>>> Another example is having the device mapped with different in 2 places with
>>> different cacheability attributes. The data accessed would not be coherent.
>>> But I believe that should not lead to a security issue.
>>>
>>>>
>>>> It would be interesting to see a concrete example of a problematic
>>>> system.
>>>
>>> I believe it would depend a lot on how the platform have been designed.
>>>
>>> I think we have two choices to go forward:
>>> 	1) Relax the memory attribute on case by case. DOM0 would not be able
>>> to exploit potential undefined behavior. However, this means that code is been
>>> added for any new device (e.g compatible string).
>>> 	2) Relax the memory attribute on every case. DOM0 would be able to
>>> exploit potential undefined behavior. On the benefit side, every devices can
>>> be used at its full performance without any change required in Xen.
>>>
>>> In the case of ACPI, we rely on DOM0 to provide the correct mapping attribute
>>> when asking to do the stage-2 mapping (see XENMAPSPACE_dev_mmio). Note that
>>> currently, the MMIO are always mapped with the attribute Device-nGnRE. There
>>> is plan to change that in the future. So ACPI is implementing 2).
>>>
>>> Device Tree is currently implementing 1). But I would like to see the behavior
>>> of Xen the same no matter the firmware tables used. So I would lean towards
>>> 2).
>>
>> That's fine by me.
>
>
> OK, I've sent a v2 of the mmio-sram patch to avoid delays with that part.
>
> But now that we seem to be in agreement with going for relaxing the default
> S2 mappings, perhaps we should ditch the whitelist/blacklist mechanism?
> Or does someone see a need for a blacklist?

I don't have in mind of any device requiring more tighter memory 
attribute from the hypervisor. AFAICT, it would only be necessary to do 
it if DOM0 was using wrong attribute, but that kernel would also have 
issue whilst running on baremetal.

Cheers,
Stefano Stabellini Dec. 21, 2016, 6:50 p.m. UTC | #7
On Wed, 21 Dec 2016, Julien Grall wrote:
> Hi Edgar,
> 
> On 20/12/2016 13:53, Edgar E. Iglesias wrote:
> > On Mon, Dec 19, 2016 at 04:01:00PM -0800, Stefano Stabellini wrote:
> > > On Mon, 19 Dec 2016, Julien Grall wrote:
> > > > Hi Edgar,
> > > > 
> > > > On 16/12/2016 18:04, Edgar E. Iglesias wrote:
> > > > > On Fri, Dec 16, 2016 at 04:12:00PM +0000, Julien Grall wrote:
> > > > > > On 15/12/16 11:26, Edgar E. Iglesias wrote:
> > > > > > > From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
> > > > > > > 
> > > > > > > Relax the mapping of mmio-sram nodes that do not set the
> > > > > > > no-memory-wc property to cached normal memory.
> > > > > > > 
> > > > > > > Rationale:
> > > > > > > Allthough on chip memories are relatively fast compared to
> > > > > > 
> > > > > > s/Allthough/Although/
> > > > > 
> > > > > Fixed for v2.
> > > > > 
> > > > > > 
> > > > > > > off-chip memories, large OCMs are still significantly slower
> > > > > > 
> > > > > > May I ask what does OCMs mean?
> > > > > 
> > > > > It means On Chip Memories. I can spell it out in v2.
> > > > 
> > > > Yes please. I was not able to find the acronym with a quick search
> > > > Google.
> > > > 
> > > > [...]
> > > > 
> > > > > > I consider it as the most trusted domain and has other way to mess
> > > > > > up the
> > > > > > platform. So I am fine with this relaxation for the hardware domain
> > > > > > (AKA
> > > > > > DOM0).
> > > > > > 
> > > > > > However, I have the feeling that we need this kind of relaxation for
> > > > > > many
> > > > > > devices. For instance prefetchable memory BARs for PCI would have to
> > > > > > be
> > > > > > cacheable, same for integrated graphic cards.
> > > > > 
> > > > > Yes, I agree.
> > > > > 
> > > > > > 
> > > > > > I am wondering whether for DOM0 only (*not the other guests), we
> > > > > > should
> > > > > > relax the default stage-2 attribute mapping to p2m_mmio_direct_c.
> > > > > > 
> > > > > > So we would let DOM0 in charge of the final attribute. This may
> > > > > > boost the
> > > > > > performance of memory access in DOM0.
> > > > > > 
> > > > > > Any opinions?
> > > > > 
> > > > > I think it makes sense. We had a discussiong about it a while back
> > > > > ago:
> > > > > https://lists.xenproject.org/archives/html/xen-devel/2015-05/msg02349.html
> > > > > 
> > > > > The concerns that were raised were wether there could be devices that
> > > > > could behave badly and possibly giving dom0 access to trig undefined
> > > > > or
> > > > > unpredictable behavior that could be exploited.
> > > > 
> > > > I think it would be fine for DOM0. It is already a trusted domain and
> > > > have
> > > > other way to take down the platform.
> > > > 
> > > > > 
> > > > > If dom0 accesses devices through a cache, access patterns will change.
> > > > > In theory it may trig unexpected behaviour in some device. It's hard
> > > > > to say unless talking about specific chips and implementations.
> > > > > 
> > > > > For example, given dom0 access to a DMA capable device may also
> > > > > achieve
> > > > > the same. I.e access patterns from DMA units differ from the ones
> > > > > originating from a CPU using uncached device memory. It could trig
> > > > > similar kind of errors.
> > > > 
> > > > Another example is having the device mapped with different in 2 places
> > > > with
> > > > different cacheability attributes. The data accessed would not be
> > > > coherent.
> > > > But I believe that should not lead to a security issue.
> > > > 
> > > > > 
> > > > > It would be interesting to see a concrete example of a problematic
> > > > > system.
> > > > 
> > > > I believe it would depend a lot on how the platform have been designed.
> > > > 
> > > > I think we have two choices to go forward:
> > > > 	1) Relax the memory attribute on case by case. DOM0 would not be able
> > > > to exploit potential undefined behavior. However, this means that code
> > > > is been
> > > > added for any new device (e.g compatible string).
> > > > 	2) Relax the memory attribute on every case. DOM0 would be able to
> > > > exploit potential undefined behavior. On the benefit side, every devices
> > > > can
> > > > be used at its full performance without any change required in Xen.
> > > > 
> > > > In the case of ACPI, we rely on DOM0 to provide the correct mapping
> > > > attribute
> > > > when asking to do the stage-2 mapping (see XENMAPSPACE_dev_mmio). Note
> > > > that
> > > > currently, the MMIO are always mapped with the attribute Device-nGnRE.
> > > > There
> > > > is plan to change that in the future. So ACPI is implementing 2).
> > > > 
> > > > Device Tree is currently implementing 1). But I would like to see the
> > > > behavior
> > > > of Xen the same no matter the firmware tables used. So I would lean
> > > > towards
> > > > 2).
> > > 
> > > That's fine by me.
> > 
> > 
> > OK, I've sent a v2 of the mmio-sram patch to avoid delays with that part.
> > 
> > But now that we seem to be in agreement with going for relaxing the default
> > S2 mappings, perhaps we should ditch the whitelist/blacklist mechanism?
> > Or does someone see a need for a blacklist?
> 
> I don't have in mind of any device requiring more tighter memory attribute
> from the hypervisor. AFAICT, it would only be necessary to do it if DOM0 was
> using wrong attribute, but that kernel would also have issue whilst running on
> baremetal.

Edgar,
I'll just wait for the patch that relaxes stage-2 mappings. I won't
apply this one.
diff mbox

Patch

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 07b868d..3a7c1ee 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -56,8 +56,20 @@  static const struct dt_device_match dev_map_attrs[] __initconst =
         .data = (void *) (uintptr_t) p2m_mmio_direct_dev,
     },
     {
+        /*
+         * Allthough on chip memories are relatively fast compared to
+         * off-chip memories, large OCMs are still significantly slower
+         * than L1 caches. Depending on the memory, either cached or
+         * uncached may make most sense.
+         *
+         * Also, dom0 may like to use the memory in a cache-coherent
+         * way to avoid SW managed coherency.
+         *
+         * By mapping it cached at S2, we let dom0 select cacheability
+         * via S1 mappings.
+         */
         __DT_MATCH_COMPATIBLE("mmio-sram"),
-        .data = (void *) (uintptr_t) p2m_mmio_direct_nc,
+        .data = (void *) (uintptr_t) p2m_mmio_direct_c,
     },
     { /* sentinel */ },
 };