diff mbox series

[for-next] xen/arm: mm: flush_page_to_ram() only need to clean to PoC

Message ID 20210220175413.14640-1-julien@xen.org (mailing list archive)
State New, archived
Headers show
Series [for-next] xen/arm: mm: flush_page_to_ram() only need to clean to PoC | expand

Commit Message

Julien Grall Feb. 20, 2021, 5:54 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

At the moment, flush_page_to_ram() is both cleaning and invalidate to
PoC the page. However, the cache line can be speculated and pull in the
cache right after as it is part of the direct map.

So it is pointless to try to invalidate the line in the data cache.

Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/arch/arm/mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bertrand Marquis Feb. 22, 2021, 11:58 a.m. UTC | #1
Hi Julien,

> On 20 Feb 2021, at 17:54, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, flush_page_to_ram() is both cleaning and invalidate to
> PoC the page. However, the cache line can be speculated and pull in the
> cache right after as it is part of the direct map.

If we go further through this logic maybe all calls to
clean_and_invalidate_dcache_va_range could be transformed in a
clean_dcache_va_range.

> 
> So it is pointless to try to invalidate the line in the data cache.
> 

But what about processors which would not speculate ?

Do you expect any performance optimization here ?

If so it might be good to explain it as I am not quite sure I get it.

Cheers
Bertrand

> Signed-off-by: Julien Grall <jgrall@amazon.com>
> ---
> xen/arch/arm/mm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 59f8a3f15fd1..2f11d214e184 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -529,7 +529,7 @@ void flush_page_to_ram(unsigned long mfn, bool sync_icache)
> {
>     void *v = map_domain_page(_mfn(mfn));
> 
> -    clean_and_invalidate_dcache_va_range(v, PAGE_SIZE);
> +    clean_dcache_va_range(v, PAGE_SIZE);
>     unmap_domain_page(v);
> 
>     /*
> -- 
> 2.17.1
> 
>
Julien Grall Feb. 22, 2021, 1:37 p.m. UTC | #2
On 22/02/2021 11:58, Bertrand Marquis wrote:
> Hi Julien,
> 
>> On 20 Feb 2021, at 17:54, Julien Grall <julien@xen.org> wrote:
>>
>> From: Julien Grall <jgrall@amazon.com>
>>
>> At the moment, flush_page_to_ram() is both cleaning and invalidate to
>> PoC the page. However, the cache line can be speculated and pull in the
>> cache right after as it is part of the direct map.
> 
> If we go further through this logic maybe all calls to
> clean_and_invalidate_dcache_va_range could be transformed in a
> clean_dcache_va_range.

Likely yes. But I need to go through them one by one to confirm this is 
fine to do it (it also depends on the caching attributes used). I have 
sent this one in advance because this was discussed as part of XSA-364.

> 
>>
>> So it is pointless to try to invalidate the line in the data cache.
>>
> 
> But what about processors which would not speculate ?
> 
> Do you expect any performance optimization here ?

When invalidating a line, you effectively remove it from the cache. If 
the page is going to be access a bit after, then you will have to load 
from the memory (or another cache).

With this change, you would only need to re-fetch the line if it wasn't 
evicted by the time it is accessed.

The line would be clean, so I would expect the eviction to have less an 
impact over re-fetching the memory.

> 
> If so it might be good to explain it as I am not quite sure I get it.

This change is less about performance and more about unnecessary work.

The processor is likely going to be more clever than the developper and 
the exact numbers will vary depending on how the processor decide to 
manage the cache.

In general, we should avoid interferring too much with the cache without 
a good reason to do it.

How about the following commit message:

"
At the moment, flush_page_to_ram() is both cleaning and invalidate to
PoC the page.

The goal of flush_page_to_ram() is to prevent corruption when the guest 
has disabled the cache (the cache line may be dirty) and read the guest 
to read previous content.

Per this defintion, the invalidating the line is not necessary. So 
invalidating the cache is unnecessary. In fact, it may be 
counter-productive as the line may be (speculatively) accessed a bit 
after. So this will incurr an expensive access to the memory.

More generally, we should avoid interferring too much with cache. 
Therefore, flush_page_to_ram() is updated to only clean to PoC the page.

The performance impact of this change will depend on your 
workload/processor.
"
Bertrand Marquis Feb. 22, 2021, 1:48 p.m. UTC | #3
Hi Julien,

> On 22 Feb 2021, at 13:37, Julien Grall <julien@xen.org> wrote:
> 
> 
> 
> On 22/02/2021 11:58, Bertrand Marquis wrote:
>> Hi Julien,
>>> On 20 Feb 2021, at 17:54, Julien Grall <julien@xen.org> wrote:
>>> 
>>> From: Julien Grall <jgrall@amazon.com>
>>> 
>>> At the moment, flush_page_to_ram() is both cleaning and invalidate to
>>> PoC the page. However, the cache line can be speculated and pull in the
>>> cache right after as it is part of the direct map.
>> If we go further through this logic maybe all calls to
>> clean_and_invalidate_dcache_va_range could be transformed in a
>> clean_dcache_va_range.
> 
> Likely yes. But I need to go through them one by one to confirm this is fine to do it (it also depends on the caching attributes used). I have sent this one in advance because this was discussed as part of XSA-364.

Ok.

> 
>>> 
>>> So it is pointless to try to invalidate the line in the data cache.
>>> 
>> But what about processors which would not speculate ?
>> Do you expect any performance optimization here ?
> 
> When invalidating a line, you effectively remove it from the cache. If the page is going to be access a bit after, then you will have to load from the memory (or another cache).
> 
> With this change, you would only need to re-fetch the line if it wasn't evicted by the time it is accessed.
> 
> The line would be clean, so I would expect the eviction to have less an impact over re-fetching the memory.

Make sense.

> 
>> If so it might be good to explain it as I am not quite sure I get it.
> 
> This change is less about performance and more about unnecessary work.
> 
> The processor is likely going to be more clever than the developper and the exact numbers will vary depending on how the processor decide to manage the cache.
> 
> In general, we should avoid interferring too much with the cache without a good reason to do it.
> 
> How about the following commit message:
> 
> "
> At the moment, flush_page_to_ram() is both cleaning and invalidate to
> PoC the page.
> 
> The goal of flush_page_to_ram() is to prevent corruption when the guest has disabled the cache (the cache line may be dirty) and read the guest to read previous content.
> 
> Per this defintion, the invalidating the line is not necessary. So invalidating the cache is unnecessary. In fact, it may be counter-productive as the line may be (speculatively) accessed a bit after. So this will incurr an expensive access to the memory.
> 
> More generally, we should avoid interferring too much with cache. Therefore, flush_page_to_ram() is updated to only clean to PoC the page.
> 
> The performance impact of this change will depend on your workload/processor.
> "
> 

With this as your commit message you can add my:

Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>

Thanks for details

Cheers
Bertrand

> -- 
> Julien Grall
Stefano Stabellini Feb. 22, 2021, 8:35 p.m. UTC | #4
On Mon, 22 Feb 2021, Julien Grall wrote:
> On 22/02/2021 11:58, Bertrand Marquis wrote:
> > Hi Julien,
> > 
> > > On 20 Feb 2021, at 17:54, Julien Grall <julien@xen.org> wrote:
> > > 
> > > From: Julien Grall <jgrall@amazon.com>
> > > 
> > > At the moment, flush_page_to_ram() is both cleaning and invalidate to
> > > PoC the page. However, the cache line can be speculated and pull in the
> > > cache right after as it is part of the direct map.
> > 
> > If we go further through this logic maybe all calls to
> > clean_and_invalidate_dcache_va_range could be transformed in a
> > clean_dcache_va_range.
> 
> Likely yes. But I need to go through them one by one to confirm this is fine
> to do it (it also depends on the caching attributes used). I have sent this
> one in advance because this was discussed as part of XSA-364.
> 
> > 
> > > 
> > > So it is pointless to try to invalidate the line in the data cache.
> > > 
> > 
> > But what about processors which would not speculate ?
> > 
> > Do you expect any performance optimization here ?
> 
> When invalidating a line, you effectively remove it from the cache. If the
> page is going to be access a bit after, then you will have to load from the
> memory (or another cache).
> 
> With this change, you would only need to re-fetch the line if it wasn't
> evicted by the time it is accessed.
> 
> The line would be clean, so I would expect the eviction to have less an impact
> over re-fetching the memory.
> 
> > 
> > If so it might be good to explain it as I am not quite sure I get it.
> 
> This change is less about performance and more about unnecessary work.
> 
> The processor is likely going to be more clever than the developper and the
> exact numbers will vary depending on how the processor decide to manage the
> cache.
> 
> In general, we should avoid interferring too much with the cache without a
> good reason to do it.
> 
> How about the following commit message:
> 
> "
> At the moment, flush_page_to_ram() is both cleaning and invalidate to
> PoC the page.
> 
> The goal of flush_page_to_ram() is to prevent corruption when the guest has
> disabled the cache (the cache line may be dirty) and read the guest to read
> previous content.
> 
> Per this defintion, the invalidating the line is not necessary. So
> invalidating the cache is unnecessary. In fact, it may be counter-productive
> as the line may be (speculatively) accessed a bit after. So this will incurr
> an expensive access to the memory.
> 
> More generally, we should avoid interferring too much with cache. Therefore,
> flush_page_to_ram() is updated to only clean to PoC the page.
> 
> The performance impact of this change will depend on your workload/processor.
> "
 
From a correctness and functionality perspective, we don't need the
invalidate. If the line is dirty we are writing it back to memory (point
of coherence) thanks to the clean operations anyway. If somebody writes
to that location, the processor should evict the old line anyway.

The only reason I can think of for doing a "clean and invalidate" rather
than just a "clean" would be that we are trying to give a hint to the
processor that the cacheline is soon to be evicted. Assuming that the
hint even leads to some sort of performance optimization.

In practice, aside from being CPU specific, we don't know if it is even
a optimization or a pessimization.

In any case, on the grounds that it is unnecessary, I am OK with this.
I agree with Julien's proposal of applying this patch "for-next".

Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall Feb. 22, 2021, 8:50 p.m. UTC | #5
On 22/02/2021 20:35, Stefano Stabellini wrote:
> On Mon, 22 Feb 2021, Julien Grall wrote:
>> On 22/02/2021 11:58, Bertrand Marquis wrote:
>>> Hi Julien,
>>>
>>>> On 20 Feb 2021, at 17:54, Julien Grall <julien@xen.org> wrote:
>>>>
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> At the moment, flush_page_to_ram() is both cleaning and invalidate to
>>>> PoC the page. However, the cache line can be speculated and pull in the
>>>> cache right after as it is part of the direct map.
>>>
>>> If we go further through this logic maybe all calls to
>>> clean_and_invalidate_dcache_va_range could be transformed in a
>>> clean_dcache_va_range.
>>
>> Likely yes. But I need to go through them one by one to confirm this is fine
>> to do it (it also depends on the caching attributes used). I have sent this
>> one in advance because this was discussed as part of XSA-364.
>>
>>>
>>>>
>>>> So it is pointless to try to invalidate the line in the data cache.
>>>>
>>>
>>> But what about processors which would not speculate ?
>>>
>>> Do you expect any performance optimization here ?
>>
>> When invalidating a line, you effectively remove it from the cache. If the
>> page is going to be access a bit after, then you will have to load from the
>> memory (or another cache).
>>
>> With this change, you would only need to re-fetch the line if it wasn't
>> evicted by the time it is accessed.
>>
>> The line would be clean, so I would expect the eviction to have less an impact
>> over re-fetching the memory.
>>
>>>
>>> If so it might be good to explain it as I am not quite sure I get it.
>>
>> This change is less about performance and more about unnecessary work.
>>
>> The processor is likely going to be more clever than the developper and the
>> exact numbers will vary depending on how the processor decide to manage the
>> cache.
>>
>> In general, we should avoid interferring too much with the cache without a
>> good reason to do it.
>>
>> How about the following commit message:
>>
>> "
>> At the moment, flush_page_to_ram() is both cleaning and invalidate to
>> PoC the page.
>>
>> The goal of flush_page_to_ram() is to prevent corruption when the guest has
>> disabled the cache (the cache line may be dirty) and read the guest to read
>> previous content.
>>
>> Per this defintion, the invalidating the line is not necessary. So
>> invalidating the cache is unnecessary. In fact, it may be counter-productive
>> as the line may be (speculatively) accessed a bit after. So this will incurr
>> an expensive access to the memory.
>>
>> More generally, we should avoid interferring too much with cache. Therefore,
>> flush_page_to_ram() is updated to only clean to PoC the page.
>>
>> The performance impact of this change will depend on your workload/processor.
>> "
>   
>  From a correctness and functionality perspective, we don't need the
> invalidate. If the line is dirty we are writing it back to memory (point
> of coherence) thanks to the clean operations anyway. If somebody writes
> to that location, the processor should evict the old line anyway.

Location as in same physical address or the same set?

For the former, the line is usually bigger than any write. So it is 
unlikely to get evicted.

For the later, it will depend on the content of the other ways in the set.

> The only reason I can think of for doing a "clean and invalidate" rather
> than just a "clean" would be that we are trying to give a hint to the
> processor that the cacheline is soon to be evicted. Assuming that the
> hint even leads to some sort of performance optimization.

This may change which lines get evict as there will be an unused way. 
But we are now down to the territory of micro-optimization.

If that's a problem for someone, then that user should better switch to 
cache coloring because the impact of flush_page_to_ram() will pretty 
small compare to the damage that another domain can do if it shares the 
same set.

> In any case, on the grounds that it is unnecessary, I am OK with this.
> I agree with Julien's proposal of applying this patch "for-next".
> 
> Acked-by: Stefano Stabellini <sstabellini@kernel.org>

Thanks! I am thinking to create a branch next again for queuing 4.15+ 
patches. Would that be fine with you?

Cheers,
Stefano Stabellini Feb. 23, 2021, 1:22 a.m. UTC | #6
On Mon, 22 Feb 2021, Julien Grall wrote:
> On 22/02/2021 20:35, Stefano Stabellini wrote:
> > On Mon, 22 Feb 2021, Julien Grall wrote:
> > > On 22/02/2021 11:58, Bertrand Marquis wrote:
> > > > Hi Julien,
> > > > 
> > > > > On 20 Feb 2021, at 17:54, Julien Grall <julien@xen.org> wrote:
> > > > > 
> > > > > From: Julien Grall <jgrall@amazon.com>
> > > > > 
> > > > > At the moment, flush_page_to_ram() is both cleaning and invalidate to
> > > > > PoC the page. However, the cache line can be speculated and pull in
> > > > > the
> > > > > cache right after as it is part of the direct map.
> > > > 
> > > > If we go further through this logic maybe all calls to
> > > > clean_and_invalidate_dcache_va_range could be transformed in a
> > > > clean_dcache_va_range.
> > > 
> > > Likely yes. But I need to go through them one by one to confirm this is
> > > fine
> > > to do it (it also depends on the caching attributes used). I have sent
> > > this
> > > one in advance because this was discussed as part of XSA-364.
> > > 
> > > > 
> > > > > 
> > > > > So it is pointless to try to invalidate the line in the data cache.
> > > > > 
> > > > 
> > > > But what about processors which would not speculate ?
> > > > 
> > > > Do you expect any performance optimization here ?
> > > 
> > > When invalidating a line, you effectively remove it from the cache. If the
> > > page is going to be access a bit after, then you will have to load from
> > > the
> > > memory (or another cache).
> > > 
> > > With this change, you would only need to re-fetch the line if it wasn't
> > > evicted by the time it is accessed.
> > > 
> > > The line would be clean, so I would expect the eviction to have less an
> > > impact
> > > over re-fetching the memory.
> > > 
> > > > 
> > > > If so it might be good to explain it as I am not quite sure I get it.
> > > 
> > > This change is less about performance and more about unnecessary work.
> > > 
> > > The processor is likely going to be more clever than the developper and
> > > the
> > > exact numbers will vary depending on how the processor decide to manage
> > > the
> > > cache.
> > > 
> > > In general, we should avoid interferring too much with the cache without a
> > > good reason to do it.
> > > 
> > > How about the following commit message:
> > > 
> > > "
> > > At the moment, flush_page_to_ram() is both cleaning and invalidate to
> > > PoC the page.
> > > 
> > > The goal of flush_page_to_ram() is to prevent corruption when the guest
> > > has
> > > disabled the cache (the cache line may be dirty) and read the guest to
> > > read
> > > previous content.
> > > 
> > > Per this defintion, the invalidating the line is not necessary. So
> > > invalidating the cache is unnecessary. In fact, it may be
> > > counter-productive
> > > as the line may be (speculatively) accessed a bit after. So this will
> > > incurr
> > > an expensive access to the memory.
> > > 
> > > More generally, we should avoid interferring too much with cache.
> > > Therefore,
> > > flush_page_to_ram() is updated to only clean to PoC the page.
> > > 
> > > The performance impact of this change will depend on your
> > > workload/processor.
> > > "
> >    From a correctness and functionality perspective, we don't need the
> > invalidate. If the line is dirty we are writing it back to memory (point
> > of coherence) thanks to the clean operations anyway. If somebody writes
> > to that location, the processor should evict the old line anyway.
> 
> Location as in same physical address or the same set?
> 
> For the former, the line is usually bigger than any write. So it is unlikely
> to get evicted.
> 
> For the later, it will depend on the content of the other ways in the set.
> 
> > The only reason I can think of for doing a "clean and invalidate" rather
> > than just a "clean" would be that we are trying to give a hint to the
> > processor that the cacheline is soon to be evicted. Assuming that the
> > hint even leads to some sort of performance optimization.
> 
> This may change which lines get evict as there will be an unused way. But we
> are now down to the territory of micro-optimization.
> 
> If that's a problem for someone, then that user should better switch to cache
> coloring because the impact of flush_page_to_ram() will pretty small compare
> to the damage that another domain can do if it shares the same set.
> 
> > In any case, on the grounds that it is unnecessary, I am OK with this.
> > I agree with Julien's proposal of applying this patch "for-next".
> > 
> > Acked-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> Thanks! I am thinking to create a branch next again for queuing 4.15+ patches.
> Would that be fine with you?

yes good idea
Julien Grall March 3, 2021, 6:33 p.m. UTC | #7
Hi Stefano,

On 23/02/2021 01:22, Stefano Stabellini wrote:
>> Thanks! I am thinking to create a branch next again for queuing 4.15+ patches.
>> Would that be fine with you?
> 
> yes good idea

I have created the branch for-next/4.16 on my public tree and push the 
patch:

https://xenbits.xen.org/gitweb/?p=people/julieng/xen-unstable.git;a=shortlog;h=refs/heads/for-next/4.16

This will be merged once the tree is re-opened.

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 59f8a3f15fd1..2f11d214e184 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -529,7 +529,7 @@  void flush_page_to_ram(unsigned long mfn, bool sync_icache)
 {
     void *v = map_domain_page(_mfn(mfn));
 
-    clean_and_invalidate_dcache_va_range(v, PAGE_SIZE);
+    clean_dcache_va_range(v, PAGE_SIZE);
     unmap_domain_page(v);
 
     /*