Message ID | 20210828042306.42886-3-zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | New |
Series | Do some code cleanups related to mm |
On 28.08.21 06:23, Qi Zheng wrote:
> The smp_wmb() which is in the __pte_alloc() is used to
> ensure all ptes setup is visible before the pte is made
> visible to other CPUs by being put into page tables. We
> only need this when the pte is actually populated, so
> move it to pte_install(). __pte_alloc_kernel(),
> __p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar
> to this case.
>
> We can also defer smp_wmb() to the place where the pmd entry
> is really populated by preallocated pte. There are two kinds
> of user of preallocated pte, one is filemap & finish_fault(),
> another is THP. The former does not need another smp_wmb()
> because the smp_wmb() has been done by pte_install().
> Fortunately, the latter also does not need another smp_wmb()
> because there is already a smp_wmb() before populating the
> new pte when the THP uses a preallocated pte to split a huge
> pmd.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> ---
> mm/memory.c | 47 ++++++++++++++++++++---------------------------
> mm/sparse-vmemmap.c | 2 +-
> 2 files changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index ef7b1762e996..9c7534187454 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
>
> if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
> mm_inc_nr_ptes(mm);
> + /*
> + * Ensure all pte setup (eg. pte page lock and page clearing) are
> + * visible before the pte is made visible to other CPUs by being
> + * put into page tables.
> + *
> + * The other side of the story is the pointer chasing in the page
> + * table walking code (when walking the page table without locking;
> + * ie. most of the time). Fortunately, these data accesses consist
> + * of a chain of data-dependent loads, meaning most CPUs (alpha
> + * being the notable exception) will already guarantee loads are
> + * seen in-order. See the alpha page table accessors for the
> + * smp_rmb() barriers in page table walking code.
> + */
> + smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
> pmd_populate(mm, pmd, *pte);
> *pte = NULL;
> }
> @@ -451,21 +465,6 @@ int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
> if (!new)
> return -ENOMEM;
>
> - /*
> - * Ensure all pte setup (eg. pte page lock and page clearing) are
> - * visible before the pte is made visible to other CPUs by being
> - * put into page tables.
> - *
> - * The other side of the story is the pointer chasing in the page
> - * table walking code (when walking the page table without locking;
> - * ie. most of the time). Fortunately, these data accesses consist
> - * of a chain of data-dependent loads, meaning most CPUs (alpha
> - * being the notable exception) will already guarantee loads are
> - * seen in-order. See the alpha page table accessors for the
> - * smp_rmb() barriers in page table walking code.
> - */
> - smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
> -
> pmd_install(mm, pmd, &new);
> if (new)
> pte_free(mm, new);
> @@ -478,10 +477,9 @@ int __pte_alloc_kernel(pmd_t *pmd)
> if (!new)
> return -ENOMEM;
>
> - smp_wmb(); /* See comment in __pte_alloc */
> -
> spin_lock(&init_mm.page_table_lock);
> if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
> + smp_wmb(); /* See comment in pmd_install() */
> pmd_populate_kernel(&init_mm, pmd, new);
> new = NULL;
> }
> @@ -3857,7 +3855,6 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
> vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
> if (!vmf->prealloc_pte)
> return VM_FAULT_OOM;
> - smp_wmb(); /* See comment in __pte_alloc() */
> }
>
> ret = vma->vm_ops->fault(vmf);
> @@ -3919,7 +3916,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
> if (!vmf->prealloc_pte)
> return VM_FAULT_OOM;
> - smp_wmb(); /* See comment in __pte_alloc() */
> }
>
> vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> @@ -4144,7 +4140,6 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
> vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
> if (!vmf->prealloc_pte)
> return VM_FAULT_OOM;
> - smp_wmb(); /* See comment in __pte_alloc() */
> }
>
> return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
> @@ -4819,13 +4814,13 @@ int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
> if (!new)
> return -ENOMEM;
>
> - smp_wmb(); /* See comment in __pte_alloc */
> -
> spin_lock(&mm->page_table_lock);
> if (pgd_present(*pgd)) /* Another has populated it */
> p4d_free(mm, new);
> - else
> + else {
> + smp_wmb(); /* See comment in pmd_install() */
> pgd_populate(mm, pgd, new);
> + }

Nit:

if () {

} else {

}

see Documentation/process/coding-style.rst

"This does not apply if only one branch of a conditional statement is a
single statement; in the latter case use braces in both branches:"

Apart from that, I think this is fine,

Acked-by: David Hildenbrand <david@redhat.com>
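The coding-style passage David cites boils down to: once one branch of the conditional needs braces, brace both branches. Purely as an illustration (a sketch of how the __p4d_alloc() hunk might end up with that nit applied, not code from a posted v2), the branch would read:

	spin_lock(&mm->page_table_lock);
	if (pgd_present(*pgd)) {	/* Another has populated it */
		p4d_free(mm, new);
	} else {
		smp_wmb();	/* See comment in pmd_install() */
		pgd_populate(mm, pgd, new);
	}
	spin_unlock(&mm->page_table_lock);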
On 8/28/21 06:23, Qi Zheng wrote:
> The smp_wmb() which is in the __pte_alloc() is used to
> ensure all ptes setup is visible before the pte is made
> visible to other CPUs by being put into page tables. We
> only need this when the pte is actually populated, so
> move it to pte_install(). __pte_alloc_kernel(),

It's named pmd_install()?

> __p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar
> to this case.
>
> We can also defer smp_wmb() to the place where the pmd entry
> is really populated by preallocated pte. There are two kinds
> of user of preallocated pte, one is filemap & finish_fault(),
> another is THP. The former does not need another smp_wmb()
> because the smp_wmb() has been done by pte_install().

Same here.

> Fortunately, the latter also does not need another smp_wmb()
> because there is already a smp_wmb() before populating the
> new pte when the THP uses a preallocated pte to split a huge
> pmd.
>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> ---
> mm/memory.c | 47 ++++++++++++++++++++---------------------------
> mm/sparse-vmemmap.c | 2 +-
> 2 files changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index ef7b1762e996..9c7534187454 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
>
> if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
> mm_inc_nr_ptes(mm);
> + /*
> + * Ensure all pte setup (eg. pte page lock and page clearing) are
> + * visible before the pte is made visible to other CPUs by being
> + * put into page tables.
> + *
> + * The other side of the story is the pointer chasing in the page
> + * table walking code (when walking the page table without locking;
> + * ie. most of the time). Fortunately, these data accesses consist
> + * of a chain of data-dependent loads, meaning most CPUs (alpha
> + * being the notable exception) will already guarantee loads are
> + * seen in-order. See the alpha page table accessors for the
> + * smp_rmb() barriers in page table walking code.
> + */
> + smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */

So, could it? :)
On 2021/8/31 PM6:02, David Hildenbrand wrote:
> On 28.08.21 06:23, Qi Zheng wrote:
>> The smp_wmb() which is in the __pte_alloc() is used to
>> ensure all ptes setup is visible before the pte is made
>> visible to other CPUs by being put into page tables. We
>> only need this when the pte is actually populated, so
>> move it to pte_install(). __pte_alloc_kernel(),
>> __p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar
>> to this case.
>>
>> We can also defer smp_wmb() to the place where the pmd entry
>> is really populated by preallocated pte. There are two kinds
>> of user of preallocated pte, one is filemap & finish_fault(),
>> another is THP. The former does not need another smp_wmb()
>> because the smp_wmb() has been done by pte_install().
>> Fortunately, the latter also does not need another smp_wmb()
>> because there is already a smp_wmb() before populating the
>> new pte when the THP uses a preallocated pte to split a huge
>> pmd.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>> mm/memory.c | 47
>> ++++++++++++++++++++---------------------------
>> mm/sparse-vmemmap.c | 2 +-
>> 2 files changed, 21 insertions(+), 28 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ef7b1762e996..9c7534187454 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t
>> *pmd, pgtable_t *pte)
>> if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
>> mm_inc_nr_ptes(mm);
>> + /*
>> + * Ensure all pte setup (eg. pte page lock and page clearing)
>> are
>> + * visible before the pte is made visible to other CPUs by being
>> + * put into page tables.
>> + *
>> + * The other side of the story is the pointer chasing in the
>> page
>> + * table walking code (when walking the page table without
>> locking;
>> + * ie. most of the time). Fortunately, these data accesses
>> consist
>> + * of a chain of data-dependent loads, meaning most CPUs (alpha
>> + * being the notable exception) will already guarantee loads are
>> + * seen in-order. See the alpha page table accessors for the
>> + * smp_rmb() barriers in page table walking code.
>> + */
>> + smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>> pmd_populate(mm, pmd, *pte);
>> *pte = NULL;
>> }
>> @@ -451,21 +465,6 @@ int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
>> if (!new)
>> return -ENOMEM;
>> - /*
>> - * Ensure all pte setup (eg. pte page lock and page clearing) are
>> - * visible before the pte is made visible to other CPUs by being
>> - * put into page tables.
>> - *
>> - * The other side of the story is the pointer chasing in the page
>> - * table walking code (when walking the page table without locking;
>> - * ie. most of the time). Fortunately, these data accesses consist
>> - * of a chain of data-dependent loads, meaning most CPUs (alpha
>> - * being the notable exception) will already guarantee loads are
>> - * seen in-order. See the alpha page table accessors for the
>> - * smp_rmb() barriers in page table walking code.
>> - */
>> - smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>> -
>> pmd_install(mm, pmd, &new);
>> if (new)
>> pte_free(mm, new);
>> @@ -478,10 +477,9 @@ int __pte_alloc_kernel(pmd_t *pmd)
>> if (!new)
>> return -ENOMEM;
>> - smp_wmb(); /* See comment in __pte_alloc */
>> -
>> spin_lock(&init_mm.page_table_lock);
>> if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
>> + smp_wmb(); /* See comment in pmd_install() */
>> pmd_populate_kernel(&init_mm, pmd, new);
>> new = NULL;
>> }
>> @@ -3857,7 +3855,6 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>> vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
>> if (!vmf->prealloc_pte)
>> return VM_FAULT_OOM;
>> - smp_wmb(); /* See comment in __pte_alloc() */
>> }
>> ret = vma->vm_ops->fault(vmf);
>> @@ -3919,7 +3916,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf,
>> struct page *page)
>> vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
>> if (!vmf->prealloc_pte)
>> return VM_FAULT_OOM;
>> - smp_wmb(); /* See comment in __pte_alloc() */
>> }
>> vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
>> @@ -4144,7 +4140,6 @@ static vm_fault_t do_fault_around(struct
>> vm_fault *vmf)
>> vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
>> if (!vmf->prealloc_pte)
>> return VM_FAULT_OOM;
>> - smp_wmb(); /* See comment in __pte_alloc() */
>> }
>> return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
>> @@ -4819,13 +4814,13 @@ int __p4d_alloc(struct mm_struct *mm, pgd_t
>> *pgd, unsigned long address)
>> if (!new)
>> return -ENOMEM;
>> - smp_wmb(); /* See comment in __pte_alloc */
>> -
>> spin_lock(&mm->page_table_lock);
>> if (pgd_present(*pgd)) /* Another has populated it */
>> p4d_free(mm, new);
>> - else
>> + else {
>> + smp_wmb(); /* See comment in pmd_install() */
>> pgd_populate(mm, pgd, new);
>> + }
>
> Nit:
>
> if () {
>
> } else {
>
> }
>
> see Documentation/process/coding-style.rst
>
> "This does not apply if only one branch of a conditional statement is a
> single statement; in the latter case use braces in both branches:"

Got it.

>
>
> Apart from that, I think this is fine,
>
> Acked-by: David Hildenbrand <david@redhat.com>
>

Thanks,
Qi
On 2021/8/31 PM6:20, Vlastimil Babka wrote:
> On 8/28/21 06:23, Qi Zheng wrote:
>> The smp_wmb() which is in the __pte_alloc() is used to
>> ensure all ptes setup is visible before the pte is made
>> visible to other CPUs by being put into page tables. We
>> only need this when the pte is actually populated, so
>> move it to pte_install(). __pte_alloc_kernel(),
>
> It's named pmd_install()?

Yes, I will update it in the next version.

>
>> __p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar
>> to this case.
>>
>> We can also defer smp_wmb() to the place where the pmd entry
>> is really populated by preallocated pte. There are two kinds
>> of user of preallocated pte, one is filemap & finish_fault(),
>> another is THP. The former does not need another smp_wmb()
>> because the smp_wmb() has been done by pte_install().
>
> Same here.
>
>> Fortunately, the latter also does not need another smp_wmb()
>> because there is already a smp_wmb() before populating the
>> new pte when the THP uses a preallocated pte to split a huge
>> pmd.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>> mm/memory.c | 47 ++++++++++++++++++++---------------------------
>> mm/sparse-vmemmap.c | 2 +-
>> 2 files changed, 21 insertions(+), 28 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ef7b1762e996..9c7534187454 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
>>
>> if (likely(pmd_none(*pmd))) { /* Has another populated it ? */
>> mm_inc_nr_ptes(mm);
>> + /*
>> + * Ensure all pte setup (eg. pte page lock and page clearing) are
>> + * visible before the pte is made visible to other CPUs by being
>> + * put into page tables.
>> + *
>> + * The other side of the story is the pointer chasing in the page
>> + * table walking code (when walking the page table without locking;
>> + * ie. most of the time). Fortunately, these data accesses consist
>> + * of a chain of data-dependent loads, meaning most CPUs (alpha
>> + * being the notable exception) will already guarantee loads are
>> + * seen in-order. See the alpha page table accessors for the
>> + * smp_rmb() barriers in page table walking code.
>> + */
>> + smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>
> So, could it? :)
>

Yes, it could, but we don't have smp_wmb__after_spin_lock() now.
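The ordering being discussed above can be sketched outside the kernel. The snippet below is only an illustrative userspace analogue (the names fake_pte_page, fake_pmd, fake_pmd_install and fake_walk are invented for this example): a C11 release store stands in for the smp_wmb() that pmd_install() now issues right before pmd_populate(), and a consume load models the data-dependent pointer chase of a lockless page table walk, which is why only Alpha needs an explicit read-side barrier.

#include <stdatomic.h>
#include <stddef.h>

struct fake_pte_page {
	unsigned long entries[512];	/* stand-in for the newly allocated pte page */
};

static _Atomic(struct fake_pte_page *) fake_pmd;	/* stand-in for the pmd entry */

/* Writer: mirrors pmd_install() doing smp_wmb() and then pmd_populate(). */
void fake_pmd_install(struct fake_pte_page *new)
{
	for (size_t i = 0; i < 512; i++)
		new->entries[i] = 0;	/* "pte setup": fully initialise before publishing */

	/* Release ordering: all stores above become visible before the pointer does. */
	atomic_store_explicit(&fake_pmd, new, memory_order_release);
}

/* Reader: mirrors a lockless page table walk built on data-dependent loads. */
unsigned long fake_walk(size_t idx)
{
	struct fake_pte_page *p =
		atomic_load_explicit(&fake_pmd, memory_order_consume);

	return p ? p->entries[idx] : 0;	/* dereference depends on the loaded pointer */
}

The writer side here corresponds to the barrier the patch moves into pmd_install(); the reader side corresponds to the existing smp_rmb() in the Alpha page table accessors, so nothing new is needed on the read path.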
diff --git a/mm/memory.c b/mm/memory.c
index ef7b1762e996..9c7534187454 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
 
 	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
 		mm_inc_nr_ptes(mm);
+		/*
+		 * Ensure all pte setup (eg. pte page lock and page clearing) are
+		 * visible before the pte is made visible to other CPUs by being
+		 * put into page tables.
+		 *
+		 * The other side of the story is the pointer chasing in the page
+		 * table walking code (when walking the page table without locking;
+		 * ie. most of the time). Fortunately, these data accesses consist
+		 * of a chain of data-dependent loads, meaning most CPUs (alpha
+		 * being the notable exception) will already guarantee loads are
+		 * seen in-order. See the alpha page table accessors for the
+		 * smp_rmb() barriers in page table walking code.
+		 */
+		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
 		pmd_populate(mm, pmd, *pte);
 		*pte = NULL;
 	}
@@ -451,21 +465,6 @@ int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
 	if (!new)
 		return -ENOMEM;
 
-	/*
-	 * Ensure all pte setup (eg. pte page lock and page clearing) are
-	 * visible before the pte is made visible to other CPUs by being
-	 * put into page tables.
-	 *
-	 * The other side of the story is the pointer chasing in the page
-	 * table walking code (when walking the page table without locking;
-	 * ie. most of the time). Fortunately, these data accesses consist
-	 * of a chain of data-dependent loads, meaning most CPUs (alpha
-	 * being the notable exception) will already guarantee loads are
-	 * seen in-order. See the alpha page table accessors for the
-	 * smp_rmb() barriers in page table walking code.
-	 */
-	smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
-
 	pmd_install(mm, pmd, &new);
 	if (new)
 		pte_free(mm, new);
@@ -478,10 +477,9 @@ int __pte_alloc_kernel(pmd_t *pmd)
 	if (!new)
 		return -ENOMEM;
 
-	smp_wmb(); /* See comment in __pte_alloc */
-
 	spin_lock(&init_mm.page_table_lock);
 	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
+		smp_wmb(); /* See comment in pmd_install() */
 		pmd_populate_kernel(&init_mm, pmd, new);
 		new = NULL;
 	}
@@ -3857,7 +3855,6 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 		vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
 		if (!vmf->prealloc_pte)
 			return VM_FAULT_OOM;
-		smp_wmb(); /* See comment in __pte_alloc() */
 	}
 
 	ret = vma->vm_ops->fault(vmf);
@@ -3919,7 +3916,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 		vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
 		if (!vmf->prealloc_pte)
 			return VM_FAULT_OOM;
-		smp_wmb(); /* See comment in __pte_alloc() */
 	}
 
 	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
@@ -4144,7 +4140,6 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
 		if (!vmf->prealloc_pte)
 			return VM_FAULT_OOM;
-		smp_wmb(); /* See comment in __pte_alloc() */
 	}
 
 	return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
@@ -4819,13 +4814,13 @@ int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
 	if (!new)
 		return -ENOMEM;
 
-	smp_wmb(); /* See comment in __pte_alloc */
-
 	spin_lock(&mm->page_table_lock);
 	if (pgd_present(*pgd))		/* Another has populated it */
 		p4d_free(mm, new);
-	else
+	else {
+		smp_wmb(); /* See comment in pmd_install() */
 		pgd_populate(mm, pgd, new);
+	}
 	spin_unlock(&mm->page_table_lock);
 	return 0;
 }
@@ -4842,11 +4837,10 @@ int __pud_alloc(struct mm_struct *mm, p4d_t *p4d, unsigned long address)
 	if (!new)
 		return -ENOMEM;
 
-	smp_wmb(); /* See comment in __pte_alloc */
-
 	spin_lock(&mm->page_table_lock);
 	if (!p4d_present(*p4d)) {
 		mm_inc_nr_puds(mm);
+		smp_wmb(); /* See comment in pmd_install() */
 		p4d_populate(mm, p4d, new);
 	} else	/* Another has populated it */
 		pud_free(mm, new);
@@ -4867,11 +4861,10 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
 	if (!new)
 		return -ENOMEM;
 
-	smp_wmb(); /* See comment in __pte_alloc */
-
 	ptl = pud_lock(mm, pud);
 	if (!pud_present(*pud)) {
 		mm_inc_nr_pmds(mm);
+		smp_wmb(); /* See comment in pmd_install() */
 		pud_populate(mm, pud, new);
 	} else	/* Another has populated it */
 		pmd_free(mm, new);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index bdce883f9286..db6df27c852a 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -76,7 +76,7 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start,
 		set_pte_at(&init_mm, addr, pte, entry);
 	}
 
-	/* Make pte visible before pmd. See comment in __pte_alloc(). */
+	/* Make pte visible before pmd. See comment in pmd_install(). */
 	smp_wmb();
 	pmd_populate_kernel(&init_mm, pmd, pgtable);