arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()

Message ID 20210514095001.13236-1-catalin.marinas@arm.com (mailing list archive)
State New, archived
Series arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()

Commit Message

Catalin Marinas May 14, 2021, 9:50 a.m. UTC
To ensure that instructions are observable in a new mapping, the arm64
set_pte_at() implementation cleans the D-cache and invalidates the
I-cache to the PoU. As an optimisation, this is only done on executable
mappings and the PG_dcache_clean page flag is set to avoid future cache
maintenance on the same page.

When two different processes map the same page (e.g. private executable
file or shared mapping) there's a potential race on checking and setting
PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
fault paths the page is locked (PG_locked), mprotect() does not take the
page lock. The result is that one process may see the PG_dcache_clean
flag set but the I/D cache maintenance not yet performed.

Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
and set_bit(). In the rare event of a race, the cache maintenance is
done twice.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
---

Found while debating a similar race on PG_mte_tagged with Steven. For
the latter we'll have to take a lock, but hopefully in practice it will
only happen when restoring from swap. Separate thread anyway.

At least arch/arm has a similar race. Powerpc seems to do it properly
with separate test/set. Other architectures have a bigger problem as
they do a similar check in update_mmu_cache(), called after the pte has
already been exposed to userspace.

I looked at fixing this in the mprotect() code but taking the page lock
will slow it down, so not sure how popular this would be for such a rare
race.
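
To make the window easier to see, here is a minimal userspace model of
the old test_and_set_bit() ordering. It is purely illustrative: the
function names, the delay loop and the maintenance_done flag are
stand-ins for the real page flag and cache maintenance, not kernel
code.

/* Build with: cc -O2 -pthread race_model.c (file name is arbitrary). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong page_flags;		/* models page->flags */
static atomic_int maintenance_done;	/* models the I/D maintenance */

#define PG_DCACHE_CLEAN_MASK	1UL

static void sync_icache_aliases_model(void)
{
	/* the real maintenance walks the page by cache line; takes a while */
	for (volatile int i = 0; i < 100000; i++)
		;
	atomic_store(&maintenance_done, 1);
}

/* Old scheme: the flag is published before the maintenance has run. */
static void *sync_icache_dcache_old(void *unused)
{
	(void)unused;
	if (!(atomic_fetch_or(&page_flags, PG_DCACHE_CLEAN_MASK) &
	      PG_DCACHE_CLEAN_MASK))
		sync_icache_aliases_model();
	else if (!atomic_load(&maintenance_done))
		puts("window hit: flag seen set, maintenance still pending");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, sync_icache_dcache_old, NULL);
	pthread_create(&b, NULL, sync_icache_dcache_old, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

With the ordering in this patch (test_bit(), then the maintenance, then
set_bit()), a caller that observes the flag set knows the maintenance
has already completed; the worst case under a race is that both callers
perform it, i.e. it is done twice.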

 arch/arm64/mm/flush.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Steven Price May 14, 2021, 9:54 a.m. UTC | #1
On 14/05/2021 10:50, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
> 
> When two different processes map the same page (e.g. private executable
> file or shared mapping) there's a potential race on checking and setting
> PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
> fault paths the page is locked (PG_locked), mprotect() does not take the
> page lock. The result is that one process may see the PG_dcache_clean
> flag set but the I/D cache maintenance not yet performed.
> 
> Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
> and set_bit(). In the rare event of a race, the cache maintenance is
> done twice.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: <stable@vger.kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Steven Price <steven.price@arm.com>

Thanks for writing up a proper patch.

Reviewed-by: Steven Price <steven.price@arm.com>

Steve

> ---
> 
> Found while debating a similar race on PG_mte_tagged with Steven. For
> the latter we'll have to take a lock, but hopefully in practice it will
> only happen when restoring from swap. Separate thread anyway.
> 
> At least arch/arm has a similar race. Powerpc seems to do it properly
> with separate test/set. Other architectures have a bigger problem as
> they do a similar check in update_mmu_cache(), called after the pte has
> already been exposed to userspace.
> 
> I looked at fixing this in the mprotect() code but taking the page lock
> will slow it down, so not sure how popular this would be for such a rare
> race.
> 
>  arch/arm64/mm/flush.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index ac485163a4a7..6d44c028d1c9 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
>  {
>  	struct page *page = pte_page(pte);
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (!test_bit(PG_dcache_clean, &page->flags)) {
>  		sync_icache_aliases(page_address(page), page_size(page));
> +		set_bit(PG_dcache_clean, &page->flags);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(__sync_icache_dcache);
>  
>
Will Deacon May 14, 2021, 10:38 a.m. UTC | #2
On Fri, May 14, 2021 at 10:50:01AM +0100, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
> 
> When two different processes map the same page (e.g. private executable
> file or shared mapping) there's a potential race on checking and setting
> PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
> fault paths the page is locked (PG_locked), mprotect() does not take the
> page lock. The result is that one process may see the PG_dcache_clean
> flag set but the I/D cache maintenance not yet performed.
> 
> Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
> and set_bit(). In the rare event of a race, the cache maintenance is
> done twice.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: <stable@vger.kernel.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Steven Price <steven.price@arm.com>
> ---
> 
> Found while debating a similar race on PG_mte_tagged with Steven. For
> the latter we'll have to take a lock, but hopefully in practice it will
> only happen when restoring from swap. Separate thread anyway.
> 
> At least arch/arm has a similar race. Powerpc seems to do it properly
> with separate test/set. Other architectures have a bigger problem as
> they do a similar check in update_mmu_cache(), called after the pte has
> already been exposed to userspace.
> 
> I looked at fixing this in the mprotect() code but taking the page lock
> will slow it down, so not sure how popular this would be for such a rare
> race.
> 
>  arch/arm64/mm/flush.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
> index ac485163a4a7..6d44c028d1c9 100644
> --- a/arch/arm64/mm/flush.c
> +++ b/arch/arm64/mm/flush.c
> @@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
>  {
>  	struct page *page = pte_page(pte);
>  
> -	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> +	if (!test_bit(PG_dcache_clean, &page->flags)) {
>  		sync_icache_aliases(page_address(page), page_size(page));
> +		set_bit(PG_dcache_clean, &page->flags);
> +	}

Acked-by: Will Deacon <will@kernel.org>

I wondered about the ISB for a bit (we don't broadcast it), but it
should be fine as the racing CPU needs to return to userspace.

Will
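
For reference, a sketch of the clean+invalidate sequence the
maintenance boils down to. This is illustrative only: the helper name
is made up and the kernel loops over the whole range rather than
handling a single line like this, but the barriers are the usual ones.

/* Hypothetical per-line clean+invalidate to the PoU (not the kernel's code). */
static inline void clean_dline_inval_iline_pou(unsigned long addr)
{
	asm volatile(
	"	dc	cvau, %0\n"	/* clean D-cache line to PoU */
	"	dsb	ish\n"		/* wait for the clean to complete */
	"	ic	ivau, %0\n"	/* invalidate I-cache line to PoU */
	"	dsb	ish\n"		/* wait for the invalidate to complete */
	"	isb\n"			/* synchronises the local CPU only */
	: : "r" (addr) : "memory");
}

The final ISB here only affects the CPU doing the maintenance; the
racing CPU picks up its context synchronisation from the exception
return when it goes back to userspace, which is why a broadcast ISB is
not needed.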
Catalin Marinas May 14, 2021, 4:21 p.m. UTC | #3
On Fri, 14 May 2021 10:50:01 +0100, Catalin Marinas wrote:
> To ensure that instructions are observable in a new mapping, the arm64
> set_pte_at() implementation cleans the D-cache and invalidates the
> I-cache to the PoU. As an optimisation, this is only done on executable
> mappings and the PG_dcache_clean page flag is set to avoid future cache
> maintenance on the same page.
> 
> When two different processes map the same page (e.g. private executable
> file or shared mapping) there's a potential race on checking and setting
> PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
> fault paths the page is locked (PG_locked), mprotect() does not take the
> page lock. The result is that one process may see the PG_dcache_clean
> flag set but the I/D cache maintenance not yet performed.
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
      https://git.kernel.org/arm64/c/588a513d3425
Patch

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index ac485163a4a7..6d44c028d1c9 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -55,8 +55,10 @@ void __sync_icache_dcache(pte_t pte)
 {
 	struct page *page = pte_page(pte);
 
-	if (!test_and_set_bit(PG_dcache_clean, &page->flags))
+	if (!test_bit(PG_dcache_clean, &page->flags)) {
 		sync_icache_aliases(page_address(page), page_size(page));
+		set_bit(PG_dcache_clean, &page->flags);
+	}
 }
 EXPORT_SYMBOL_GPL(__sync_icache_dcache);