Message ID | 1606453584-15399-2-git-send-email-anshuman.khandual@arm.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/debug_vm_pgtable: Some minor updates |
On 27/11/2020 at 06:06, Anshuman Khandual wrote:
> This adds validation tests for dirtiness after write protect conversion for
> each page table level. This is important for platforms such as arm64 that
> remove the hardware dirty bit while making an entry write protected. This
> also fixes pxx_wrprotect() related typos in the documentation file.
>
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index c05d9dcf7891..a5be11210597 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
>  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
>  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
> +	WARN_ON(pte_dirty(pte_wrprotect(pte)));

Wondering what you are testing here exactly.

Do you expect that if the PTE has the dirty bit, it gets cleared by pte_wrprotect()?

Powerpc doesn't do that. It only clears the RW bit; the dirty bit remains
set, if it was set, until you call pte_mkclean() explicitly.

>  }
>
>  static void __init pte_advanced_tests(struct mm_struct *mm,
> @@ -144,6 +145,7 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
>  	WARN_ON(pmd_young(pmd_mkold(pmd_mkyoung(pmd))));
>  	WARN_ON(pmd_dirty(pmd_mkclean(pmd_mkdirty(pmd))));
>  	WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd))));
> +	WARN_ON(pmd_dirty(pmd_wrprotect(pmd)));
>  	/*
>  	 * A huge page does not point to next level page table
>  	 * entry. Hence this must qualify as pmd_bad().
> @@ -262,6 +264,7 @@ static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
>  	WARN_ON(!pud_write(pud_mkwrite(pud_wrprotect(pud))));
>  	WARN_ON(pud_write(pud_wrprotect(pud_mkwrite(pud))));
>  	WARN_ON(pud_young(pud_mkold(pud_mkyoung(pud))));
> +	WARN_ON(pud_dirty(pud_wrprotect(pud)));
>
>  	if (mm_pmd_folded(mm))
>  		return;
>

Christophe
On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
> On 27/11/2020 at 06:06, Anshuman Khandual wrote:
> > This adds validation tests for dirtiness after write protect conversion for
> > each page table level. This is important for platforms such as arm64 that
> > remove the hardware dirty bit while making an entry write protected. This
> > also fixes pxx_wrprotect() related typos in the documentation file.
> >
> > diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> > index c05d9dcf7891..a5be11210597 100644
> > --- a/mm/debug_vm_pgtable.c
> > +++ b/mm/debug_vm_pgtable.c
> > @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
> >  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
> >  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
> >  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
> > +	WARN_ON(pte_dirty(pte_wrprotect(pte)));
>
> Wondering what you are testing here exactly.
>
> Do you expect that if the PTE has the dirty bit, it gets cleared by pte_wrprotect()?
>
> Powerpc doesn't do that. It only clears the RW bit; the dirty bit remains
> set, if it was set, until you call pte_mkclean() explicitly.

Arm64 has an unusual way of setting a hardware dirty "bit": it actually
clears the PTE_RDONLY bit. pte_wrprotect() sets the PTE_RDONLY bit
back and we can lose the dirty information. Will found this and posted
patches to fix the arm64 pte_wrprotect() to set a software PTE_DIRTY if
!PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
was that we may inadvertently make a fresh/clean pte dirty with such a
change, hence the suggestion for the test.

That said, I think we also need a test in the other direction:
pte_wrprotect() should preserve any dirty information:

	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));

If pte_mkwrite() makes a pte truly writable and potentially dirty, we
could also add a test as below. However, I think that's valid for arm64
only; other architectures with a separate hardware dirty bit would fail this:

	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkwrite(pte))));
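The arm64 behaviour described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the bit positions are arbitrary, the names pte_wrprotect_lossy(), pte_wrprotect_fixed() and wrprotect_model_checks() are hypothetical, and the "fixed" variant follows the description in this thread rather than the actual arm64 patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: bit positions are arbitrary, not the real arm64 layout. */
typedef uint64_t pte_t;

#define PTE_RDONLY (1ULL << 0) /* hardware read-only permission */
#define PTE_WRITE  (1ULL << 1) /* entry is allowed to become writable */
#define PTE_DIRTY  (1ULL << 2) /* software dirty bit */

static inline bool pte_write(pte_t pte)
{
	return (pte & PTE_WRITE) != 0;
}

/* "Hardware dirty" is encoded as: writable and PTE_RDONLY cleared. */
static inline bool pte_hw_dirty(pte_t pte)
{
	return pte_write(pte) && !(pte & PTE_RDONLY);
}

static inline bool pte_dirty(pte_t pte)
{
	return (pte & PTE_DIRTY) || pte_hw_dirty(pte);
}

static inline pte_t pte_mkwrite(pte_t pte)
{
	return (pte | PTE_WRITE) & ~PTE_RDONLY;
}

/* Hypothetical "before" variant: setting PTE_RDONLY drops the dirty state. */
static inline pte_t pte_wrprotect_lossy(pte_t pte)
{
	return (pte & ~PTE_WRITE) | PTE_RDONLY;
}

/* Hypothetical "after" variant: save hardware dirty into PTE_DIRTY first. */
static inline pte_t pte_wrprotect_fixed(pte_t pte)
{
	if (pte_hw_dirty(pte))
		pte |= PTE_DIRTY;
	return (pte & ~PTE_WRITE) | PTE_RDONLY;
}

/* Walks through the cases discussed in the thread. */
static inline void wrprotect_model_checks(void)
{
	pte_t clean = PTE_RDONLY;            /* read-only, clean entry */
	pte_t writable = pte_mkwrite(clean); /* now counts as hardware dirty */

	assert(pte_dirty(writable));
	assert(!pte_dirty(pte_wrprotect_lossy(writable))); /* dirty state lost */
	assert(pte_dirty(pte_wrprotect_fixed(writable)));  /* dirty state kept */
	assert(!pte_dirty(pte_wrprotect_fixed(clean)));    /* clean stays clean */
}
```

In this model the last assertion corresponds to the concern behind the proposed test: the fixed helper must not turn a clean entry into a dirty one.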
On 11/27/20 3:14 PM, Catalin Marinas wrote:
> On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
>> On 27/11/2020 at 06:06, Anshuman Khandual wrote:
>>> This adds validation tests for dirtiness after write protect conversion for
>>> each page table level. This is important for platforms such as arm64 that
>>> remove the hardware dirty bit while making an entry write protected. This
>>> also fixes pxx_wrprotect() related typos in the documentation file.
>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index c05d9dcf7891..a5be11210597 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>>>  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
>>>  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
>>>  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
>>> +	WARN_ON(pte_dirty(pte_wrprotect(pte)));
>>
>> Wondering what you are testing here exactly.
>>
>> Do you expect that if the PTE has the dirty bit, it gets cleared by pte_wrprotect()?
>>
>> Powerpc doesn't do that. It only clears the RW bit; the dirty bit remains
>> set, if it was set, until you call pte_mkclean() explicitly.
>
> Arm64 has an unusual way of setting a hardware dirty "bit": it actually
> clears the PTE_RDONLY bit. pte_wrprotect() sets the PTE_RDONLY bit
> back and we can lose the dirty information. Will found this and posted
> patches to fix the arm64 pte_wrprotect() to set a software PTE_DIRTY if
> !PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
> was that we may inadvertently make a fresh/clean pte dirty with such a
> change, hence the suggestion for the test.
>
> That said, I think we also need a test in the other direction:
> pte_wrprotect() should preserve any dirty information:
>
> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));

This seems like a generic enough principle which all platforms should
adhere to. But the proposed test WARN_ON(pte_dirty(pte_wrprotect(pte)))
might fail on some platforms if the page table entry came in as a dirty
one and pte_wrprotect() is not expected to alter the dirty state.

Instead, should we just add the following two tests, which would ensure
that pte_wrprotect() never alters the dirty state of a page table entry?

	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
	WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));

>
> If pte_mkwrite() makes a pte truly writable and potentially dirty, we
> could also add a test as below. However, I think that's valid for arm64
> only; other architectures with a separate hardware dirty bit would fail this:
>
> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkwrite(pte))));

Right.
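To make the difference between the bare check and the two bracketed checks concrete, here is a hedged userspace sketch of an architecture with a separate dirty bit, where pte_wrprotect() only clears the write permission (the powerpc-like behaviour described earlier in the thread). The bit layout and the warn_*() and separate_dirty_bit_checks() helpers are assumptions made for illustration; each warn_*() helper returns true when the corresponding WARN_ON() would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a separate dirty bit; the layout is made up. */
typedef uint32_t pte_t;

#define PTE_RW    (1u << 0) /* write permission */
#define PTE_DIRTY (1u << 1) /* independent dirty bit */

static inline bool pte_dirty(pte_t pte)    { return (pte & PTE_DIRTY) != 0; }
static inline pte_t pte_mkdirty(pte_t pte) { return pte | PTE_DIRTY; }
static inline pte_t pte_mkclean(pte_t pte) { return pte & ~PTE_DIRTY; }

/* Only the write permission is touched; the dirty bit is left alone. */
static inline pte_t pte_wrprotect(pte_t pte) { return pte & ~PTE_RW; }

/* WARN_ON(pte_dirty(pte_wrprotect(pte))) */
static inline bool warn_bare(pte_t pte)
{
	return pte_dirty(pte_wrprotect(pte));
}

/* WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte)))) */
static inline bool warn_lost_dirty(pte_t pte)
{
	return !pte_dirty(pte_wrprotect(pte_mkdirty(pte)));
}

/* WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte)))) */
static inline bool warn_gained_dirty(pte_t pte)
{
	return pte_dirty(pte_wrprotect(pte_mkclean(pte)));
}

static inline void separate_dirty_bit_checks(void)
{
	pte_t dirty_in = PTE_RW | PTE_DIRTY; /* entry arrives already dirty */
	pte_t clean_in = PTE_RW;             /* entry arrives clean */

	assert(warn_bare(dirty_in));  /* fires although pte_wrprotect() is fine */
	assert(!warn_bare(clean_in));

	/* The bracketed checks do not depend on the incoming dirty state. */
	assert(!warn_lost_dirty(dirty_in) && !warn_lost_dirty(clean_in));
	assert(!warn_gained_dirty(dirty_in) && !warn_gained_dirty(clean_in));
}
```

The bracketed forms force a known starting state with pte_mkdirty() or pte_mkclean(), which is why they stay quiet here regardless of what the earlier tests did to the entry.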
On Mon, Nov 30, 2020 at 09:55:00AM +0530, Anshuman Khandual wrote:
> On 11/27/20 3:14 PM, Catalin Marinas wrote:
> > On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
> >> On 27/11/2020 at 06:06, Anshuman Khandual wrote:
> >>> This adds validation tests for dirtiness after write protect conversion for
> >>> each page table level. This is important for platforms such as arm64 that
> >>> remove the hardware dirty bit while making an entry write protected. This
> >>> also fixes pxx_wrprotect() related typos in the documentation file.
> >>
> >>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> >>> index c05d9dcf7891..a5be11210597 100644
> >>> --- a/mm/debug_vm_pgtable.c
> >>> +++ b/mm/debug_vm_pgtable.c
> >>> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
> >>>  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
> >>>  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
> >>>  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
> >>> +	WARN_ON(pte_dirty(pte_wrprotect(pte)));
> >>
> >> Wondering what you are testing here exactly.
> >>
> >> Do you expect that if the PTE has the dirty bit, it gets cleared by
> >> pte_wrprotect()?
> >>
> >> Powerpc doesn't do that. It only clears the RW bit; the dirty bit remains
> >> set, if it was set, until you call pte_mkclean() explicitly.
> >
> > Arm64 has an unusual way of setting a hardware dirty "bit": it actually
> > clears the PTE_RDONLY bit. pte_wrprotect() sets the PTE_RDONLY bit
> > back and we can lose the dirty information. Will found this and posted
> > patches to fix the arm64 pte_wrprotect() to set a software PTE_DIRTY if
> > !PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
> > was that we may inadvertently make a fresh/clean pte dirty with such a
> > change, hence the suggestion for the test.
> >
> > That said, I think we also need a test in the other direction:
> > pte_wrprotect() should preserve any dirty information:
> >
> > 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
>
> This seems like a generic enough principle which all platforms should
> adhere to. But the proposed test WARN_ON(pte_dirty(pte_wrprotect(pte)))
> might fail on some platforms if the page table entry came in as a dirty
> one and pte_wrprotect() is not expected to alter the dirty state.

Ah, so do we have architectures where entries in protection_map[] are
already dirty? If those are valid, maybe the check should be:

	WARN_ON(!pte_dirty(pte) && pte_dirty(pte_wrprotect(pte)));

> Instead, should we just add the following two tests, which would ensure
> that pte_wrprotect() never alters the dirty state of a page table entry?
>
> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
> 	WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));

These should be added as additional tests. However, my initial thought
was to check whether pte_wrprotect() on a new pte created from a
protection_map[] entry directly would inadvertently dirty it. On arm64,
that means a protection_map[] entry missing PTE_RDONLY. A pte_mkclean()
would set PTE_RDONLY, so we'd miss such a check.
On 11/30/20 3:08 PM, Catalin Marinas wrote:
> On Mon, Nov 30, 2020 at 09:55:00AM +0530, Anshuman Khandual wrote:
>> On 11/27/20 3:14 PM, Catalin Marinas wrote:
>>> On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
>>>> On 27/11/2020 at 06:06, Anshuman Khandual wrote:
>>>>> This adds validation tests for dirtiness after write protect conversion for
>>>>> each page table level. This is important for platforms such as arm64 that
>>>>> remove the hardware dirty bit while making an entry write protected. This
>>>>> also fixes pxx_wrprotect() related typos in the documentation file.
>>>>
>>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>>>> index c05d9dcf7891..a5be11210597 100644
>>>>> --- a/mm/debug_vm_pgtable.c
>>>>> +++ b/mm/debug_vm_pgtable.c
>>>>> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
>>>>>  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
>>>>>  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
>>>>>  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
>>>>> +	WARN_ON(pte_dirty(pte_wrprotect(pte)));
>>>>
>>>> Wondering what you are testing here exactly.
>>>>
>>>> Do you expect that if the PTE has the dirty bit, it gets cleared by
>>>> pte_wrprotect()?
>>>>
>>>> Powerpc doesn't do that. It only clears the RW bit; the dirty bit remains
>>>> set, if it was set, until you call pte_mkclean() explicitly.
>>>
>>> Arm64 has an unusual way of setting a hardware dirty "bit": it actually
>>> clears the PTE_RDONLY bit. pte_wrprotect() sets the PTE_RDONLY bit
>>> back and we can lose the dirty information. Will found this and posted
>>> patches to fix the arm64 pte_wrprotect() to set a software PTE_DIRTY if
>>> !PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
>>> was that we may inadvertently make a fresh/clean pte dirty with such a
>>> change, hence the suggestion for the test.
>>>
>>> That said, I think we also need a test in the other direction:
>>> pte_wrprotect() should preserve any dirty information:
>>>
>>> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
>>
>> This seems like a generic enough principle which all platforms should
>> adhere to. But the proposed test WARN_ON(pte_dirty(pte_wrprotect(pte)))
>> might fail on some platforms if the page table entry came in as a dirty
>> one and pte_wrprotect() is not expected to alter the dirty state.
>
> Ah, so do we have architectures where entries in protection_map[] are
> already dirty? If those are valid, maybe the check should be:

Okay, I did not imply that actually. The current position for these new
tests in respective pxx_basic_tests() functions is right at the end and
hence the pxx might have already gone through some changes from the time
it was originally created with pfn_pxx(). The entry here is not starting
from the beginning. It is not expected to either, per design. So the dirty
bit might or might not be there depending on all the previous test
sequences leading up to these new ones.

IIUC, Christophe mentioned the fact that on platforms like powerpc, the
dirty bit just remains unchanged during pte_wrprotect(). So the current test
WARN_ON(pte_dirty(pte_wrprotect(pte))) will not work on powerpc if the
previous tests leading up to that point have got the dirty bit set. This is
irrespective of how it was created with pfn_pte() from protection_map[]
originally at the beginning.

>
> 	WARN_ON(!pte_dirty(pte) && pte_dirty(pte_wrprotect(pte)));
>
>> Instead, should we just add the following two tests, which would ensure
>> that pte_wrprotect() never alters the dirty state of a page table entry?
>>
>> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
>> 	WARN_ON(pte_dirty(pte_wrprotect(pte_mkclean(pte))));
>
> These should be added as additional tests. However, my initial thought

Okay, will add them.

> was to check whether pte_wrprotect() on a new pte created from a
> protection_map[] entry directly would inadvertently dirty it. On arm64,
> that means a protection_map[] entry missing PTE_RDONLY. A pte_mkclean()
> would set PTE_RDONLY, so we'd miss such a check.
>

To achieve this, we could move the test right at the beginning just after
the pxx gets created from protection_map[], with a comment explaining the
rationale.
On Mon, Nov 30, 2020 at 04:28:20PM +0530, Anshuman Khandual wrote:
> On 11/30/20 3:08 PM, Catalin Marinas wrote:
> > On Mon, Nov 30, 2020 at 09:55:00AM +0530, Anshuman Khandual wrote:
> >> On 11/27/20 3:14 PM, Catalin Marinas wrote:
> >>> On Fri, Nov 27, 2020 at 09:22:24AM +0100, Christophe Leroy wrote:
> >>>> On 27/11/2020 at 06:06, Anshuman Khandual wrote:
> >>>>> This adds validation tests for dirtiness after write protect conversion for
> >>>>> each page table level. This is important for platforms such as arm64 that
> >>>>> remove the hardware dirty bit while making an entry write protected. This
> >>>>> also fixes pxx_wrprotect() related typos in the documentation file.
> >>>>
> >>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> >>>>> index c05d9dcf7891..a5be11210597 100644
> >>>>> --- a/mm/debug_vm_pgtable.c
> >>>>> +++ b/mm/debug_vm_pgtable.c
> >>>>> @@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
> >>>>>  	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
> >>>>>  	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
> >>>>>  	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
> >>>>> +	WARN_ON(pte_dirty(pte_wrprotect(pte)));
> >>>>
> >>>> Wondering what you are testing here exactly.
> >>>>
> >>>> Do you expect that if the PTE has the dirty bit, it gets cleared by
> >>>> pte_wrprotect()?
> >>>>
> >>>> Powerpc doesn't do that. It only clears the RW bit; the dirty bit remains
> >>>> set, if it was set, until you call pte_mkclean() explicitly.
> >>>
> >>> Arm64 has an unusual way of setting a hardware dirty "bit": it actually
> >>> clears the PTE_RDONLY bit. pte_wrprotect() sets the PTE_RDONLY bit
> >>> back and we can lose the dirty information. Will found this and posted
> >>> patches to fix the arm64 pte_wrprotect() to set a software PTE_DIRTY if
> >>> !PTE_RDONLY (we do this for ptep_set_wrprotect() already). My concern
> >>> was that we may inadvertently make a fresh/clean pte dirty with such a
> >>> change, hence the suggestion for the test.
> >>>
> >>> That said, I think we also need a test in the other direction:
> >>> pte_wrprotect() should preserve any dirty information:
> >>>
> >>> 	WARN_ON(!pte_dirty(pte_wrprotect(pte_mkdirty(pte))));
> >>
> >> This seems like a generic enough principle which all platforms should
> >> adhere to. But the proposed test WARN_ON(pte_dirty(pte_wrprotect(pte)))
> >> might fail on some platforms if the page table entry came in as a dirty
> >> one and pte_wrprotect() is not expected to alter the dirty state.
> >
> > Ah, so do we have architectures where entries in protection_map[] are
> > already dirty? If those are valid, maybe the check should be:
>
> Okay, I did not imply that actually. The current position for these new
> tests in respective pxx_basic_tests() functions is right at the end and
> hence the pxx might have already gone through some changes from the time
> it was originally created with pfn_pxx(). The entry here is not starting
> from the beginning. It is not expected to either, per design. So the dirty
> bit might or might not be there depending on all the previous test
> sequences leading up to these new ones.
>
> IIUC, Christophe mentioned the fact that on platforms like powerpc, the
> dirty bit just remains unchanged during pte_wrprotect(). So the current test
> WARN_ON(pte_dirty(pte_wrprotect(pte))) will not work on powerpc if the
> previous tests leading up to that point have got the dirty bit set. This is
> irrespective of how it was created with pfn_pte() from protection_map[]
> originally at the beginning.

[...]

> To achieve this, we could move the test right at the beginning just after
> the pxx gets created from protection_map[], with a comment explaining the
> rationale.

OK, this makes sense. Thanks for the clarification.
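The reason for running the check on the freshly created entry, before any pte_mkclean(), can be sketched with the same kind of toy arm64-style model. Everything here is an assumption for illustration: the bit positions, the hypothetical "fresh" entry that is writable but lacks PTE_RDONLY, and the warn_direct(), warn_after_mkclean() and fresh_entry_checks() helpers. Whether any real protection_map[] entry looks like this is exactly what the relocated test would find out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: bit positions are arbitrary, not the real arm64 layout. */
typedef uint64_t pte_t;

#define PTE_RDONLY (1ULL << 0)
#define PTE_WRITE  (1ULL << 1)
#define PTE_DIRTY  (1ULL << 2) /* software dirty bit */

static inline bool pte_hw_dirty(pte_t pte)
{
	return (pte & PTE_WRITE) && !(pte & PTE_RDONLY);
}

static inline bool pte_dirty(pte_t pte)
{
	return (pte & PTE_DIRTY) || pte_hw_dirty(pte);
}

/* Cleaning also sets PTE_RDONLY, as described in the thread. */
static inline pte_t pte_mkclean(pte_t pte)
{
	return (pte & ~PTE_DIRTY) | PTE_RDONLY;
}

/* Write protect that preserves hardware dirty state in PTE_DIRTY. */
static inline pte_t pte_wrprotect(pte_t pte)
{
	if (pte_hw_dirty(pte))
		pte |= PTE_DIRTY;
	return (pte & ~PTE_WRITE) | PTE_RDONLY;
}

/* Check applied directly to the freshly created entry. */
static inline bool warn_direct(pte_t pte)
{
	return pte_dirty(pte_wrprotect(pte));
}

/* Same check, but behind pte_mkclean(). */
static inline bool warn_after_mkclean(pte_t pte)
{
	return pte_dirty(pte_wrprotect(pte_mkclean(pte)));
}

static inline void fresh_entry_checks(void)
{
	/* Hypothetical fresh entry: writable, PTE_RDONLY missing. */
	pte_t fresh = PTE_WRITE;

	assert(warn_direct(fresh));         /* the problem is reported */
	assert(!warn_after_mkclean(fresh)); /* pte_mkclean() hides it */
}
```

In this model the pte_mkclean() variant can never report the problematic entry, which is why it is a useful additional test but not a substitute for the check on the untouched entry.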
diff --git a/Documentation/vm/arch_pgtable_helpers.rst b/Documentation/vm/arch_pgtable_helpers.rst
index f3591ee3aaa8..552567d863b8 100644
--- a/Documentation/vm/arch_pgtable_helpers.rst
+++ b/Documentation/vm/arch_pgtable_helpers.rst
@@ -50,7 +50,7 @@ PTE Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pte_mkwrite               | Creates a writable PTE                           |
 +---------------------------+--------------------------------------------------+
-| pte_mkwrprotect           | Creates a write protected PTE                    |
+| pte_wrprotect             | Creates a write protected PTE                    |
 +---------------------------+--------------------------------------------------+
 | pte_mkspecial             | Creates a special PTE                            |
 +---------------------------+--------------------------------------------------+
@@ -120,7 +120,7 @@ PMD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pmd_mkwrite               | Creates a writable PMD                           |
 +---------------------------+--------------------------------------------------+
-| pmd_mkwrprotect           | Creates a write protected PMD                    |
+| pmd_wrprotect             | Creates a write protected PMD                    |
 +---------------------------+--------------------------------------------------+
 | pmd_mkspecial             | Creates a special PMD                            |
 +---------------------------+--------------------------------------------------+
@@ -186,7 +186,7 @@ PUD Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pud_mkwrite               | Creates a writable PUD                           |
 +---------------------------+--------------------------------------------------+
-| pud_mkwrprotect           | Creates a write protected PUD                    |
+| pud_wrprotect             | Creates a write protected PUD                    |
 +---------------------------+--------------------------------------------------+
 | pud_mkdevmap              | Creates a ZONE_DEVICE mapped PUD                 |
 +---------------------------+--------------------------------------------------+
@@ -224,7 +224,7 @@ HugeTLB Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | huge_pte_mkwrite          | Creates a writable HugeTLB                       |
 +---------------------------+--------------------------------------------------+
-| huge_pte_mkwrprotect      | Creates a write protected HugeTLB                |
+| huge_pte_wrprotect        | Creates a write protected HugeTLB                |
 +---------------------------+--------------------------------------------------+
 | huge_ptep_get_and_clear   | Clears a HugeTLB                                 |
 +---------------------------+--------------------------------------------------+
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index c05d9dcf7891..a5be11210597 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -70,6 +70,7 @@ static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
 	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
 	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
 	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
+	WARN_ON(pte_dirty(pte_wrprotect(pte)));
 }
 
 static void __init pte_advanced_tests(struct mm_struct *mm,
@@ -144,6 +145,7 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
 	WARN_ON(pmd_young(pmd_mkold(pmd_mkyoung(pmd))));
 	WARN_ON(pmd_dirty(pmd_mkclean(pmd_mkdirty(pmd))));
 	WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd))));
+	WARN_ON(pmd_dirty(pmd_wrprotect(pmd)));
 	/*
 	 * A huge page does not point to next level page table
 	 * entry. Hence this must qualify as pmd_bad().
@@ -262,6 +264,7 @@ static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
 	WARN_ON(!pud_write(pud_mkwrite(pud_wrprotect(pud))));
 	WARN_ON(pud_write(pud_wrprotect(pud_mkwrite(pud))));
 	WARN_ON(pud_young(pud_mkold(pud_mkyoung(pud))));
+	WARN_ON(pud_dirty(pud_wrprotect(pud)));
 
 	if (mm_pmd_folded(mm))
 		return;
This adds validation tests for dirtiness after write protect conversion for
each page table level. This is important for platforms such as arm64 that
remove the hardware dirty bit while making an entry write protected. This
also fixes pxx_wrprotect() related typos in the documentation file.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 Documentation/vm/arch_pgtable_helpers.rst | 8 ++++----
 mm/debug_vm_pgtable.c                     | 3 +++
 2 files changed, 7 insertions(+), 4 deletions(-)