diff mbox

arm64: mm: Fix pte_mkclean, pte_mkdirty semantics

Message ID 20171201172214.6834-1-steve.capper@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Steve Capper Dec. 1, 2017, 5:22 p.m. UTC
On systems with hardware dirty bit management, the ltp madvise09 unit
test fails due to dirty bit information being lost and pages being
incorrectly freed.

This was bisected to:
	arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()

Reverting this commit leads to a separate problem, that the unit test
retains pages that should have been dropped due to the function
madvise_free_pte_range(.) not cleaning pte's properly.

Currently pte_mkclean only clears the software dirty bit, thus the
following code sequence can appear:

	pte = pte_mkclean(pte);
	if (pte_dirty(pte))
		// this condition can return true with HW DBM!

This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively
clearing both the SW and HW dirty information.

In order for this to function on systems without HW DBM, we need to
also adjust pte_mkdirty to remove the read only bit from writable pte's
to avoid infinite fault loops.

Reported-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Fixes: 64c26841b349 ("arm64: Ignore hardware dirty bit updates in
ptep_set_wrprotect()")
Signed-off-by: Steve Capper <steve.capper@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

Comments

Catalin Marinas Dec. 1, 2017, 5:44 p.m. UTC | #1
On Fri, Dec 01, 2017 at 05:22:14PM +0000, Steve Capper wrote:
> On systems with hardware dirty bit management, the ltp madvise09 unit
> test fails due to dirty bit information being lost and pages being
> incorrectly freed.
> 
> This was bisected to:
> 	arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()
> 
> Reverting this commit leads to a separate problem, that the unit test
> retains pages that should have been dropped due to the function
> madvise_free_pte_range(.) not cleaning pte's properly.
> 
> Currently pte_mkclean only clears the software dirty bit, thus the
> following code sequence can appear:
> 
> 	pte = pte_mkclean(pte);
> 	if (pte_dirty(pte))
> 		// this condition can return true with HW DBM!
> 
> This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively
> clearing both the SW and HW dirty information.
> 
> In order for this to function on systems without HW DBM, we need to
> also adjust pte_mkdirty to remove the read only bit from writable pte's
> to avoid infinite fault loops.
> 
> Reported-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
> Fixes: 64c26841b349 ("arm64: Ignore hardware dirty bit updates in
> ptep_set_wrprotect()")
> Signed-off-by: Steve Capper <steve.capper@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Bhupinder Thakur Dec. 8, 2017, 7:27 a.m. UTC | #2
On 1 December 2017 at 23:14, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Dec 01, 2017 at 05:22:14PM +0000, Steve Capper wrote:
>> On systems with hardware dirty bit management, the ltp madvise09 unit
>> test fails due to dirty bit information being lost and pages being
>> incorrectly freed.
>>
>> This was bisected to:
>>       arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()
>>
>> Reverting this commit leads to a separate problem, that the unit test
>> retains pages that should have been dropped due to the function
>> madvise_free_pte_range(.) not cleaning pte's properly.
>>
>> Currently pte_mkclean only clears the software dirty bit, thus the
>> following code sequence can appear:
>>
>>       pte = pte_mkclean(pte);
>>       if (pte_dirty(pte))
>>               // this condition can return true with HW DBM!
>>
>> This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively
>> clearing both the SW and HW dirty information.
>>
>> In order for this to function on systems without HW DBM, we need to
>> also adjust pte_mkdirty to remove the read only bit from writable pte's
>> to avoid infinite fault loops.
>>
>> Reported-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
>> Fixes: 64c26841b349 ("arm64: Ignore hardware dirty bit updates in
>> ptep_set_wrprotect()")
>> Signed-off-by: Steve Capper <steve.capper@arm.com>
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

I verified that with this patch the madvise09 test passes.
diff mbox

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b46e54c2399b..0fa7305765ab 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -135,12 +135,20 @@  static inline pte_t pte_mkwrite(pte_t pte)
 
 static inline pte_t pte_mkclean(pte_t pte)
 {
-	return clear_pte_bit(pte, __pgprot(PTE_DIRTY));
+	pte = clear_pte_bit(pte, __pgprot(PTE_DIRTY));
+	pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
+
+	return pte;
 }
 
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	return set_pte_bit(pte, __pgprot(PTE_DIRTY));
+	pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
+
+	if (pte_write(pte))
+		pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
+
+	return pte;
 }
 
 static inline pte_t pte_mkold(pte_t pte)
@@ -628,28 +636,23 @@  static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 /*
- * ptep_set_wrprotect - mark read-only while preserving the hardware update of
- * the Access Flag.
+ * ptep_set_wrprotect - mark read-only while trasferring potential hardware
+ * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
  */
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
 {
 	pte_t old_pte, pte;
 
-	/*
-	 * ptep_set_wrprotect() is only called on CoW mappings which are
-	 * private (!VM_SHARED) with the pte either read-only (!PTE_WRITE &&
-	 * PTE_RDONLY) or writable and software-dirty (PTE_WRITE &&
-	 * !PTE_RDONLY && PTE_DIRTY); see is_cow_mapping() and
-	 * protection_map[]. There is no race with the hardware update of the
-	 * dirty state: clearing of PTE_RDONLY when PTE_WRITE (a.k.a. PTE_DBM)
-	 * is set.
-	 */
-	VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(*ptep),
-		     "%s: potential race with hardware DBM", __func__);
 	pte = READ_ONCE(*ptep);
 	do {
 		old_pte = pte;
+		/*
+		 * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY
+		 * clear), set the PTE_DIRTY bit.
+		 */
+		if (pte_hw_dirty(pte))
+			pte = pte_mkdirty(pte);
 		pte = pte_wrprotect(pte);
 		pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
 					       pte_val(old_pte), pte_val(pte));