Message ID | 20220130211838.8382-18-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Shadow stacks for userspace | expand |
On 1/30/22 13:18, Rick Edgecombe wrote: > - do_anonymous_page() and migrate_vma_insert_page() check VM_WRITE directly > and call pte_mkwrite(), which is the same as maybe_mkwrite(). Change > them to maybe_mkwrite(). Those look OK. > - In do_numa_page(), if the numa entry was writable, then pte_mkwrite() > is called directly. Fix it by doing maybe_mkwrite(). Make the same > changes to do_huge_pmd_numa_page(). This is another "what", not "why" changelog. This change puzzles me. *Why* is this needed? It sounds like pte_mkwrite() doesn't work for shadow stack PTEs. Let's say that explicitly. I also this this is ab/misuse of maybe_mkwrite(). The shadow stack VMA *REQUIRES* PTEs with Dirty=1. There's no *maybe* about it. The rest of this is essentially a hack to get VM_SHADOW_STACK-required bits into the PTE. We have a place where we store those VMA-required bits: vma->vm_page_prot. Look at how we store the pkey bits in there for instance. Let's say we set _PAGE_DIRTY in vma->vm_page_prot. We'd come into do_anonymous_page() for instance and do this: > entry = mk_pte(page, vma->vm_page_prot); <--- PTE is Write=0,Dirty=1 Yay! > entry = pte_sw_mkyoung(entry); > if (vma->vm_flags & VM_WRITE) <--- False, skip the pte_mkwrite() > entry = pte_mkwrite(pte_mkdirty(entry)); In other words, it "just works" because shadow stack VMAs don't have VM_WRITE set. I think the other VM_WRITE checks would be fine too, although I'm unsure about the change_page_attr() one.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2adedcfca00b..3588e9fefbe0 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1489,7 +1489,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) pmd = pmd_modify(oldpmd, vma->vm_page_prot); pmd = pmd_mkyoung(pmd); if (was_writable) - pmd = pmd_mkwrite(pmd); + pmd = maybe_pmd_mkwrite(pmd, vma); set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); spin_unlock(vmf->ptl); diff --git a/mm/memory.c b/mm/memory.c index c125c4969913..c79444603d5d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3793,8 +3793,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf) entry = mk_pte(page, vma->vm_page_prot); entry = pte_sw_mkyoung(entry); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); @@ -4428,7 +4427,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) pte = pte_modify(old_pte, vma->vm_page_prot); pte = pte_mkyoung(pte); if (was_writable) - pte = pte_mkwrite(pte); + pte = maybe_mkwrite(pte, vma); ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); diff --git a/mm/migrate.c b/mm/migrate.c index c7da064b4781..438f1e21b9c7 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2697,8 +2697,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, } } else { entry = mk_pte(page, vma->vm_page_prot); - if (vma->vm_flags & VM_WRITE) - entry = pte_mkwrite(pte_mkdirty(entry)); + entry = maybe_mkwrite(pte_mkdirty(entry), vma); } ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl); diff --git a/mm/mprotect.c b/mm/mprotect.c index 0138dfcdb1d8..b0012c13a00e 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -135,7 +135,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, if (dirty_accountable && pte_dirty(ptent) && (pte_soft_dirty(ptent) || !(vma->vm_flags & VM_SOFTDIRTY))) { - ptent = pte_mkwrite(ptent); + ptent = maybe_mkwrite(ptent, vma); } ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); pages++;