
[4/6] mm: hugetlb_vmemmap: add missing smp_wmb() before set_pte_at()

Message ID 20220816130553.31406-5-linmiaohe@huawei.com (mailing list archive)
State New
Series A few fixup patches for hugetlb

Commit Message

Miaohe Lin Aug. 16, 2022, 1:05 p.m. UTC
The memory barrier smp_wmb() is needed to make sure that preceding stores
to the page contents become visible before the below set_pte_at() write.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/hugetlb_vmemmap.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Muchun Song Aug. 17, 2022, 2:53 a.m. UTC | #1
> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
> The memory barrier smp_wmb() is needed to make sure that preceding stores
> to the page contents become visible before the below set_pte_at() write.

I’m not sure you are right. I think it is set_pte_at()’s responsibility.
Taking arm64 (since it has a relaxed memory ordering model) as an example (the
following code snippet is set_pte()), I see a barrier guarantee. So I am
curious what issue you are facing, and what the basis for this change is.

 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
        *ptep = pte;

        /*
         * Only if the new pte is valid and kernel, otherwise TLB maintenance
         * or update_mmu_cache() have the necessary barriers.
         */
        if (pte_valid_not_user(pte)) {
               dsb(ishst);
               isb();
        }
 }

Thanks.

> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
> mm/hugetlb_vmemmap.c | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 20f414c0379f..76b2d03a0d8d 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -287,6 +287,11 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> 	copy_page(to, (void *)walk->reuse_addr);
> 	reset_struct_pages(to);
> 
> +	/*
> +	 * Makes sure that preceding stores to the page contents become visible
> +	 * before the set_pte_at() write.
> +	 */
> +	smp_wmb();
> 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> }
> 
> -- 
> 2.23.0
> 
>
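[Editor's note] The arm64 snippet above is the crux of the disagreement: as far as I understand, the dsb(ishst) in set_pte() sits after the PTE store, so it orders the PTE store against later instructions, but it does not order the earlier page-content stores before the PTE store as seen by another observer. A rough userspace analogue of the ordering in question, using C11 atomics (the names and the mapping to kernel primitives are illustrative only, not kernel code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace sketch (not kernel code): the "page" is plain data, the
 * "PTE" is an atomic pointer.  The release fence plays the role of
 * smp_wmb(): it orders the content stores before the pointer publish,
 * which the relaxed pointer store alone would not do on a weakly
 * ordered CPU. */
static int page[4];
static _Atomic(int *) pte;

static void *writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < 4; i++)
		page[i] = 42;		/* copy_page()/reset_struct_pages() */
	atomic_thread_fence(memory_order_release);	/* smp_wmb() */
	atomic_store_explicit(&pte, page, memory_order_relaxed); /* set_pte_at() */
	return NULL;
}

static void *reader(void *arg)
{
	int *p;

	(void)arg;
	/* Spin until the "PTE" is visible; the writer's fence guarantees
	 * the contents are then visible through the loaded pointer. */
	while (!(p = atomic_load_explicit(&pte, memory_order_acquire)))
		;
	for (int i = 0; i < 4; i++)
		assert(p[i] == 42);
	return NULL;
}

int run_demo(void)
{
	pthread_t w, r;

	pthread_create(&r, NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}
```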
Miaohe Lin Aug. 17, 2022, 8:41 a.m. UTC | #2
On 2022/8/17 10:53, Muchun Song wrote:
> 
> 
>> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> The memory barrier smp_wmb() is needed to make sure that preceding stores
>> to the page contents become visible before the below set_pte_at() write.
> 
> I’m not sure if you are right. I think it is set_pte_at()’s responsibility.

Maybe not. There are many call sites that do similar things:

hugetlb_mcopy_atomic_pte
__do_huge_pmd_anonymous_page
collapse_huge_page
do_anonymous_page
migrate_vma_insert_page
mcopy_atomic_pte

Take do_anonymous_page as an example:

	/*
	 * The memory barrier inside __SetPageUptodate makes sure that
	 * preceding stores to the page contents become visible before
	 * the set_pte_at() write.
	 */
	__SetPageUptodate(page);

So I think a memory barrier is needed before the set_pte_at() write. Or am I missing something?

Thanks,
Miaohe Lin

> Take arm64 (since it is a Relaxed Memory Order model) as an example (the
> following code snippet is set_pte()), I see a barrier guarantee. So I am
> curious what issues you are facing. So I want to know the basis for you to
> do this change.
> 
>  static inline void set_pte(pte_t *ptep, pte_t pte)
>  {
>         *ptep = pte;
> 
>         /*
>          * Only if the new pte is valid and kernel, otherwise TLB maintenance
>          * or update_mmu_cache() have the necessary barriers.
>          */
>         if (pte_valid_not_user(pte)) {
>                dsb(ishst);
>                isb();
>         }
>  }
> 
> Thanks.
>
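[Editor's note] The __SetPageUptodate()/PageUptodate() pairing cited above can be sketched in userspace as flag-based publication, with an explicit acquire fence on the read side standing in for the smp_rmb() inside PageUptodate() (a rough analogue with illustrative names, not the kernel implementation):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace sketch of the flag-based pattern: __SetPageUptodate() is
 * modeled as a release fence plus a flag store, and PageUptodate() as
 * a flag load plus an acquire fence (the kernel pairs smp_wmb() with
 * smp_rmb() here). */
static int contents;
static atomic_int uptodate;

static void *producer(void *arg)
{
	(void)arg;
	contents = 123;			/* fill the page */
	atomic_thread_fence(memory_order_release);	/* smp_wmb() in __SetPageUptodate() */
	atomic_store_explicit(&uptodate, 1, memory_order_relaxed);
	return NULL;
}

static void *consumer(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&uptodate, memory_order_relaxed))
		;
	atomic_thread_fence(memory_order_acquire);	/* smp_rmb() in PageUptodate() */
	assert(contents == 123);	/* contents guaranteed visible here */
	return NULL;
}

int flag_demo(void)
{
	pthread_t p, c;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return 0;
}
```

Note the contrast with the PTE case discussed later in the thread: here the reader needs its own barrier because there is no dependency from the flag load to the data load.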
Yin, Fengwei Aug. 17, 2022, 9:13 a.m. UTC | #3
On 8/17/2022 4:41 PM, Miaohe Lin wrote:
> So I think a memory barrier is needed before the set_pte_at() write. Or am I miss something?
Yes, I agree with you. The memory barrier should be placed between the page
content change and the pte update. The patch looks good to me. Thanks.


Regards
Yin, Fengwei

> 
> Thanks,
> Miaohe Lin
Muchun Song Aug. 17, 2022, 11:21 a.m. UTC | #4
> On Aug 17, 2022, at 16:41, Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
> On 2022/8/17 10:53, Muchun Song wrote:
>> 
>> 
>>> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>> 
>>> The memory barrier smp_wmb() is needed to make sure that preceding stores
>>> to the page contents become visible before the below set_pte_at() write.
>> 
>> I’m not sure if you are right. I think it is set_pte_at()’s responsibility.
> 
> Maybe not. There're many call sites do the similar things:
> 
> hugetlb_mcopy_atomic_pte
> __do_huge_pmd_anonymous_page
> collapse_huge_page
> do_anonymous_page
> migrate_vma_insert_page
> mcopy_atomic_pte
> 
> Take do_anonymous_page as an example:
> 
> 	/*
> 	 * The memory barrier inside __SetPageUptodate makes sure that
> 	 * preceding stores to the page contents become visible before
> 	 * the set_pte_at() write.
> 	 */
> 	__SetPageUptodate(page);

IIUC, in that case we should make sure other CPUs can see the new page’s
contents after they have seen PG_uptodate set. I think commit 0ed361dec369
can tell us more details.

I also looked at commit 52f37629fd3c to see why we need a barrier before
set_pte_at(), but I didn’t find any info explaining why. I guess we want
to ensure ordering between the page’s contents and subsequent memory
accesses through the corresponding virtual address; do you agree with this?

Thanks.

> 
> So I think a memory barrier is needed before the set_pte_at() write. Or am I miss something?
> 
> Thanks,
> Miaohe Lin
> 
>> Take arm64 (since it is a Relaxed Memory Order model) as an example (the
>> following code snippet is set_pte()), I see a barrier guarantee. So I am
>> curious what issues you are facing. So I want to know the basis for you to
>> do this change.
>> 
>> static inline void set_pte(pte_t *ptep, pte_t pte)
>> {
>>        *ptep = pte;
>> 
>>        /*
>>         * Only if the new pte is valid and kernel, otherwise TLB maintenance
>>         * or update_mmu_cache() have the necessary barriers.
>>         */
>>        if (pte_valid_not_user(pte)) {
>>               dsb(ishst);
>>               isb();
>>        }
>> }
>> 
>> Thanks.
>>
Yin, Fengwei Aug. 18, 2022, 1:14 a.m. UTC | #5
On 8/17/2022 7:21 PM, Muchun Song wrote:
> 
> 
>> On Aug 17, 2022, at 16:41, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/8/17 10:53, Muchun Song wrote:
>>>
>>>
>>>> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> The memory barrier smp_wmb() is needed to make sure that preceding stores
>>>> to the page contents become visible before the below set_pte_at() write.
>>>
>>> I’m not sure if you are right. I think it is set_pte_at()’s responsibility.
>>
>> Maybe not. There're many call sites do the similar things:
>>
>> hugetlb_mcopy_atomic_pte
>> __do_huge_pmd_anonymous_page
>> collapse_huge_page
>> do_anonymous_page
>> migrate_vma_insert_page
>> mcopy_atomic_pte
>>
>> Take do_anonymous_page as an example:
>>
>> 	/*
>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>> 	 * preceding stores to the page contents become visible before
>> 	 * the set_pte_at() write.
>> 	 */
>> 	__SetPageUptodate(page);
> 
> IIUC, the case here we should make sure others (CPUs) can see new page’s
> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
> can tell us more details.
> 
> I also looked at commit 52f37629fd3c to see why we need a barrier before
> set_pte_at(), but I didn’t find any info to explain why. I guess we want
> to make sure the order between the page’s contents and subsequent memory
> accesses using the corresponding virtual address, do you agree with this?
This is my understanding also. Thanks.

Regards
Yin, Fengwei

> 
> Thanks.
> 
>>
>> So I think a memory barrier is needed before the set_pte_at() write. Or am I miss something?
>>
>> Thanks,
>> Miaohe Lin
>>
>>> Take arm64 (since it is a Relaxed Memory Order model) as an example (the
>>> following code snippet is set_pte()), I see a barrier guarantee. So I am
>>> curious what issues you are facing. So I want to know the basis for you to
>>> do this change.
>>>
>>> static inline void set_pte(pte_t *ptep, pte_t pte)
>>> {
>>>        *ptep = pte;
>>>
>>>        /*
>>>         * Only if the new pte is valid and kernel, otherwise TLB maintenance
>>>         * or update_mmu_cache() have the necessary barriers.
>>>         */
>>>        if (pte_valid_not_user(pte)) {
>>>               dsb(ishst);
>>>               isb();
>>>        }
>>> }
>>>
>>> Thanks.
>>>
> 
>
Yin, Fengwei Aug. 18, 2022, 1:15 a.m. UTC | #6
On 8/16/2022 9:05 PM, Miaohe Lin wrote:
> The memory barrier smp_wmb() is needed to make sure that preceding stores
> to the page contents become visible before the below set_pte_at() write.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>

Regards
Yin, Fengwei

> ---
>  mm/hugetlb_vmemmap.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 20f414c0379f..76b2d03a0d8d 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -287,6 +287,11 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
>  	copy_page(to, (void *)walk->reuse_addr);
>  	reset_struct_pages(to);
>  
> +	/*
> +	 * Makes sure that preceding stores to the page contents become visible
> +	 * before the set_pte_at() write.
> +	 */
> +	smp_wmb();
>  	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
>  }
>
Miaohe Lin Aug. 18, 2022, 1:55 a.m. UTC | #7
On 2022/8/18 9:14, Yin, Fengwei wrote:
> 
> 
> On 8/17/2022 7:21 PM, Muchun Song wrote:
>>
>>
>>> On Aug 17, 2022, at 16:41, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>
>>> On 2022/8/17 10:53, Muchun Song wrote:
>>>>
>>>>
>>>>> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>
>>>>> The memory barrier smp_wmb() is needed to make sure that preceding stores
>>>>> to the page contents become visible before the below set_pte_at() write.
>>>>
>>>> I’m not sure if you are right. I think it is set_pte_at()’s responsibility.
>>>
>>> Maybe not. There're many call sites do the similar things:
>>>
>>> hugetlb_mcopy_atomic_pte
>>> __do_huge_pmd_anonymous_page
>>> collapse_huge_page
>>> do_anonymous_page
>>> migrate_vma_insert_page
>>> mcopy_atomic_pte
>>>
>>> Take do_anonymous_page as an example:
>>>
>>> 	/*
>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>> 	 * preceding stores to the page contents become visible before
>>> 	 * the set_pte_at() write.
>>> 	 */
>>> 	__SetPageUptodate(page);
>>
>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>> can tell us more details.
>>
>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>> to make sure the order between the page’s contents and subsequent memory
>> accesses using the corresponding virtual address, do you agree with this?
> This is my understanding also. Thanks.

That's also my understanding. Thanks both.

Thanks,
Miaohe Lin
Yin, Fengwei Aug. 18, 2022, 2 a.m. UTC | #8
On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>> 	/*
>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>> 	 * preceding stores to the page contents become visible before
>>>> 	 * the set_pte_at() write.
>>>> 	 */
>>>> 	__SetPageUptodate(page);
>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>> can tell us more details.
>>>
>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>> to make sure the order between the page’s contents and subsequent memory
>>> accesses using the corresponding virtual address, do you agree with this?
>> This is my understanding also. Thanks.
> That's also my understanding. Thanks both.
One thing is unclear to me (not directly related to this patch): who is
responsible for the read barrier on the read side in this case?

For SetPageUptodate, there is a pairing of write/read memory barriers.


Regards
Yin, Fengwei
Muchun Song Aug. 18, 2022, 2:47 a.m. UTC | #9
> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
> 
> 
> 
> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>> 	/*
>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>> 	 * preceding stores to the page contents become visible before
>>>>> 	 * the set_pte_at() write.
>>>>> 	 */
>>>>> 	__SetPageUptodate(page);
>>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>>> can tell us more details.
>>>> 
>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>> to make sure the order between the page’s contents and subsequent memory
>>>> accesses using the corresponding virtual address, do you agree with this?
>>> This is my understanding also. Thanks.
>> That's also my understanding. Thanks both.
> I have an unclear thing (not related with this patch directly): Who is response
> for the read barrier in the read side in this case?
> 
> For SetPageUptodate, there are paring write/read memory barrier.
> 

I have the same question. So I think the example proposed by Miaohe is a little
different from the case (hugetlb_vmemmap) here.

> 
> Regards
> Yin, Fengwei
> 
>
Miaohe Lin Aug. 18, 2022, 7:52 a.m. UTC | #10
On 2022/8/18 10:47, Muchun Song wrote:
> 
> 
>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>> 	/*
>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>> 	 * the set_pte_at() write.
>>>>>> 	 */
>>>>>> 	__SetPageUptodate(page);
>>>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>>>> can tell us more details.
>>>>>
>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>>> to make sure the order between the page’s contents and subsequent memory
>>>>> accesses using the corresponding virtual address, do you agree with this?
>>>> This is my understanding also. Thanks.
>>> That's also my understanding. Thanks both.
>> I have an unclear thing (not related with this patch directly): Who is response
>> for the read barrier in the read side in this case?
>>
>> For SetPageUptodate, there are paring write/read memory barrier.
>>
> 
> I have the same question. So I think the example proposed by Miaohe is a little
> difference from the case (hugetlb_vmemmap) here.

Per my understanding, the memory barrier in PageUptodate() is needed because a user might soon
access the page contents via page_address() (the corresponding page table entry already exists).
But in the case proposed above, if a user wants to access the page contents, the corresponding
page table entry must become visible first, or the contents cannot be accessed at all. So there
is a data dependency acting as a memory barrier between loading the page table entry and
accessing the page contents. Or am I missing something?

Thanks,
Miaohe Lin
Muchun Song Aug. 18, 2022, 7:59 a.m. UTC | #11
> On Aug 18, 2022, at 15:52, Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
> On 2022/8/18 10:47, Muchun Song wrote:
>> 
>> 
>>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>> 
>>> 
>>> 
>>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>>> 	/*
>>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>>> 	 * the set_pte_at() write.
>>>>>>> 	 */
>>>>>>> 	__SetPageUptodate(page);
>>>>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>>>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>>>>> can tell us more details.
>>>>>> 
>>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>>>> to make sure the order between the page’s contents and subsequent memory
>>>>>> accesses using the corresponding virtual address, do you agree with this?
>>>>> This is my understanding also. Thanks.
>>>> That's also my understanding. Thanks both.
>>> I have an unclear thing (not related with this patch directly): Who is response
>>> for the read barrier in the read side in this case?
>>> 
>>> For SetPageUptodate, there are paring write/read memory barrier.
>>> 
>> 
>> I have the same question. So I think the example proposed by Miaohe is a little
>> difference from the case (hugetlb_vmemmap) here.
> 
> Per my understanding, memory barrier in PageUptodate() is needed because user might access the
> page contents using page_address() (corresponding pagetable entry already exists) soon. But for
> the above proposed case, if user wants to access the page contents, the corresponding pagetable
> should be visible first or the page contents can't be accessed. So there should be a data dependency
> acting as memory barrier between pagetable entry is loaded and page contents is accessed.
> Or am I miss something?

Yep, it is a data dependency. The difference between hugetlb_vmemmap and PageUptodate() is that
the page table entry (a pointer to the mapped page frame) is loaded by the MMU, while PG_uptodate
is loaded by the CPU. It seems the data dependency would have to order the MMU access against the
CPU access. Maybe that is the hardware’s guarantee?

> 
> Thanks,
> Miaohe Lin
Yin, Fengwei Aug. 18, 2022, 8:32 a.m. UTC | #12
On 8/18/2022 3:59 PM, Muchun Song wrote:
> 
> 
>> On Aug 18, 2022, at 15:52, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/8/18 10:47, Muchun Song wrote:
>>>
>>>
>>>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>>>> 	/*
>>>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>>>> 	 * the set_pte_at() write.
>>>>>>>> 	 */
>>>>>>>> 	__SetPageUptodate(page);
>>>>>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>>>>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>>>>>> can tell us more details.
>>>>>>>
>>>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>>>>> to make sure the order between the page’s contents and subsequent memory
>>>>>>> accesses using the corresponding virtual address, do you agree with this?
>>>>>> This is my understanding also. Thanks.
>>>>> That's also my understanding. Thanks both.
>>>> I have an unclear thing (not related with this patch directly): Who is response
>>>> for the read barrier in the read side in this case?
>>>>
>>>> For SetPageUptodate, there are paring write/read memory barrier.
>>>>
>>>
>>> I have the same question. So I think the example proposed by Miaohe is a little
>>> difference from the case (hugetlb_vmemmap) here.
>>
>> Per my understanding, memory barrier in PageUptodate() is needed because user might access the
>> page contents using page_address() (corresponding pagetable entry already exists) soon. But for
>> the above proposed case, if user wants to access the page contents, the corresponding pagetable
>> should be visible first or the page contents can't be accessed. So there should be a data dependency
>> acting as memory barrier between pagetable entry is loaded and page contents is accessed.
>> Or am I miss something?
> 
> Yep, it is a data dependency. The difference between hugetlb_vmemmap and PageUptodate() is that
> the page table (a pointer to the mapped page frame) is loaded by MMU while PageUptodate() is
> loaded by CPU. Seems like the data dependency should be inserted between the MMU access and the CPU
> access. Maybe it is hardware’s guarantee?
I just found that the comment in pmd_install() explains why most architectures need no read-side
memory barrier; only Alpha needs one.


Regards
Yin, Fengwei

> 
>>
>> Thanks,
>> Miaohe Lin
>
Muchun Song Aug. 18, 2022, 8:40 a.m. UTC | #13
> On Aug 18, 2022, at 16:32, Yin, Fengwei <fengwei.yin@intel.com> wrote:
> 
> 
> 
> On 8/18/2022 3:59 PM, Muchun Song wrote:
>> 
>> 
>>> On Aug 18, 2022, at 15:52, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>> 
>>> On 2022/8/18 10:47, Muchun Song wrote:
>>>> 
>>>> 
>>>>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>>>>> 	/*
>>>>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>>>>> 	 * the set_pte_at() write.
>>>>>>>>> 	 */
>>>>>>>>> 	__SetPageUptodate(page);
>>>>>>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>>>>>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>>>>>>> can tell us more details.
>>>>>>>> 
>>>>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>>>>>> to make sure the order between the page’s contents and subsequent memory
>>>>>>>> accesses using the corresponding virtual address, do you agree with this?
>>>>>>> This is my understanding also. Thanks.
>>>>>> That's also my understanding. Thanks both.
>>>>> I have an unclear thing (not related with this patch directly): Who is response
>>>>> for the read barrier in the read side in this case?
>>>>> 
>>>>> For SetPageUptodate, there are paring write/read memory barrier.
>>>>> 
>>>> 
>>>> I have the same question. So I think the example proposed by Miaohe is a little
>>>> difference from the case (hugetlb_vmemmap) here.
>>> 
>>> Per my understanding, memory barrier in PageUptodate() is needed because user might access the
>>> page contents using page_address() (corresponding pagetable entry already exists) soon. But for
>>> the above proposed case, if user wants to access the page contents, the corresponding pagetable
>>> should be visible first or the page contents can't be accessed. So there should be a data dependency
>>> acting as memory barrier between pagetable entry is loaded and page contents is accessed.
>>> Or am I miss something?
>> 
>> Yep, it is a data dependency. The difference between hugetlb_vmemmap and PageUptodate() is that
>> the page table (a pointer to the mapped page frame) is loaded by MMU while PageUptodate() is
>> loaded by CPU. Seems like the data dependency should be inserted between the MMU access and the CPU
>> access. Maybe it is hardware’s guarantee?
> I just found the comment in pmd_install() explained why most arch has no read

I think pmd_install() is a little different as well. We should make sure that
page table walkers (like GUP) see the correct PTE entries after they see the pmd
entry.

> side memory barrier except alpha which has read side memory barrier.

Right. Only Alpha has a data-dependency barrier.

> 
> 
> Regards
> Yin, Fengwei
> 
>> 
>>> 
>>> Thanks,
>>> Miaohe Lin
Yin, Fengwei Aug. 18, 2022, 8:54 a.m. UTC | #14
On 8/18/2022 4:40 PM, Muchun Song wrote:
> 
> 
>> On Aug 18, 2022, at 16:32, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 8/18/2022 3:59 PM, Muchun Song wrote:
>>>
>>>
>>>> On Aug 18, 2022, at 15:52, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> On 2022/8/18 10:47, Muchun Song wrote:
>>>>>
>>>>>
>>>>>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>>>>>> 	/*
>>>>>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>>>>>> 	 * the set_pte_at() write.
>>>>>>>>>> 	 */
>>>>>>>>>> 	__SetPageUptodate(page);
>>>>>>>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>>>>>>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>>>>>>>> can tell us more details.
>>>>>>>>>
>>>>>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>>>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>>>>>>> to make sure the order between the page’s contents and subsequent memory
>>>>>>>>> accesses using the corresponding virtual address, do you agree with this?
>>>>>>>> This is my understanding also. Thanks.
>>>>>>> That's also my understanding. Thanks both.
>>>>>> I have an unclear thing (not related with this patch directly): Who is response
>>>>>> for the read barrier in the read side in this case?
>>>>>>
>>>>>> For SetPageUptodate, there are paring write/read memory barrier.
>>>>>>
>>>>>
>>>>> I have the same question. So I think the example proposed by Miaohe is a little
>>>>> difference from the case (hugetlb_vmemmap) here.
>>>>
>>>> Per my understanding, memory barrier in PageUptodate() is needed because user might access the
>>>> page contents using page_address() (corresponding pagetable entry already exists) soon. But for
>>>> the above proposed case, if user wants to access the page contents, the corresponding pagetable
>>>> should be visible first or the page contents can't be accessed. So there should be a data dependency
>>>> acting as memory barrier between pagetable entry is loaded and page contents is accessed.
>>>> Or am I miss something?
>>>
>>> Yep, it is a data dependency. The difference between hugetlb_vmemmap and PageUptodate() is that
>>> the page table (a pointer to the mapped page frame) is loaded by MMU while PageUptodate() is
>>> loaded by CPU. Seems like the data dependency should be inserted between the MMU access and the CPU
>>> access. Maybe it is hardware’s guarantee?
>> I just found the comment in pmd_install() explained why most arch has no read
> 
> I think pmd_install() is a little different as well. We should make sure the
> page table walker (like GUP) see the correct PTE entry after they see the pmd
> entry.

The difference I can see is that the pmd/pte case has both a hardware page walker and
software page walkers (like GUP) on the read side, while the case here only has the
hardware page walker. But I suppose the memory barrier requirement still applies here.

Maybe we could do a test: add a large delay between reset_struct_pages() and set_pte_at()?

Regards
Yin, Fengwei 

> 
>> side memory barrier except alpha which has read side memory barrier.
> 
> Right. Only alpha has data dependency barrier.
> 
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>>>
>>>> Thanks,
>>>> Miaohe Lin
>
Muchun Song Aug. 18, 2022, 9:18 a.m. UTC | #15
> On Aug 18, 2022, at 16:54, Yin, Fengwei <fengwei.yin@intel.com> wrote:
> 
> 
> 
> On 8/18/2022 4:40 PM, Muchun Song wrote:
>> 
>> 
>>> On Aug 18, 2022, at 16:32, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>> 
>>> 
>>> 
>>> On 8/18/2022 3:59 PM, Muchun Song wrote:
>>>> 
>>>> 
>>>>> On Aug 18, 2022, at 15:52, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>> 
>>>>> On 2022/8/18 10:47, Muchun Song wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>>>>>>> 	/*
>>>>>>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>>>>>>> 	 * the set_pte_at() write.
>>>>>>>>>>> 	 */
>>>>>>>>>>> 	__SetPageUptodate(page);
>>>>>>>>>> IIUC, the case here we should make sure others (CPUs) can see new page’s
>>>>>>>>>> contents after they have saw PG_uptodate is set. I think commit 0ed361dec369
>>>>>>>>>> can tell us more details.
>>>>>>>>>> 
>>>>>>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>>>>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>>>>>>>> to make sure the order between the page’s contents and subsequent memory
>>>>>>>>>> accesses using the corresponding virtual address, do you agree with this?
>>>>>>>>> This is my understanding also. Thanks.
>>>>>>>> That's also my understanding. Thanks both.
>>>>>>> I have an unclear thing (not related with this patch directly): Who is response
>>>>>>> for the read barrier in the read side in this case?
>>>>>>> 
>>>>>>> For SetPageUptodate, there are paring write/read memory barrier.
>>>>>>> 
>>>>>> 
>>>>>> I have the same question. So I think the example proposed by Miaohe is a little
>>>>>> difference from the case (hugetlb_vmemmap) here.
>>>>> 
>>>>> Per my understanding, memory barrier in PageUptodate() is needed because user might access the
>>>>> page contents using page_address() (corresponding pagetable entry already exists) soon. But for
>>>>> the above proposed case, if user wants to access the page contents, the corresponding pagetable
>>>>> should be visible first or the page contents can't be accessed. So there should be a data dependency
>>>>> acting as memory barrier between pagetable entry is loaded and page contents is accessed.
>>>>> Or am I miss something?
>>>> 
>>>> Yep, it is a data dependency. The difference between hugetlb_vmemmap and PageUptodate() is that
>>>> the page table (a pointer to the mapped page frame) is loaded by MMU while PageUptodate() is
>>>> loaded by CPU. Seems like the data dependency should be inserted between the MMU access and the CPU
>>>> access. Maybe it is hardware’s guarantee?
>>> I just found the comment in pmd_install() explained why most arch has no read
>> 
>> I think pmd_install() is a little different as well. We should make sure the
>> page table walker (like GUP) see the correct PTE entry after they see the pmd
>> entry.
> 
> The difference I can see is that pmd/pte thing has both hardware page walker and
> software page walker (like GUP) as read side. While the case here only has hardware
> page walker as read side. But I suppose the memory barrier requirement still apply
> here.

I am not against this change; I just want to get a better understanding of
the hardware behavior.

> 
> Maybe we could do a test: add large delay between reset_struct_page() and set_pte_at?

Hi Miaohe,

Would you mind doing this test? One thread does vmemmap_restore_pte(), and another thread
detects whether it can see a tail page with PG_head set after the first thread has executed
set_pte_at().

Thanks.

> 
> Regards
> Yin, Fengwei 
> 
>> 
>>> side memory barrier except alpha which has read side memory barrier.
>> 
>> Right. Only alpha has data dependency barrier.
>> 
>>> 
>>> 
>>> Regards
>>> Yin, Fengwei
>>> 
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Miaohe Lin
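[Editor's note] The experiment proposed above could be mocked in userspace along these lines (every name here is hypothetical; a real test would run inside the kernel and watch a tail page's PG_head after set_pte_at()). The writer plays vmemmap_restore_pte(), the reader plays the page walker:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace mock of the proposed test: the writer fills the
 * new "struct pages", issues the barrier under discussion, then publishes
 * the "PTE"; the reader follows the "PTE" and checks a "tail page" for a
 * stale flag.  With the release fence in place the reader must never
 * observe the stale value. */
#define STALE_PG_HEAD	1
#define NR_PAGES	8

static int struct_pages[NR_PAGES];
static _Atomic(int *) vmemmap_pte;

static void *restore_side(void *arg)
{
	(void)arg;
	/* reset_struct_pages(): clear the stale flags in the new page */
	memset(struct_pages, 0, sizeof(struct_pages));
	atomic_thread_fence(memory_order_release);	/* the smp_wmb() under discussion */
	atomic_store_explicit(&vmemmap_pte, struct_pages,
			      memory_order_relaxed);	/* set_pte_at() */
	return NULL;
}

static void *walker_side(void *arg)
{
	int *pages;

	(void)arg;
	while (!(pages = atomic_load_explicit(&vmemmap_pte,
					      memory_order_acquire)))
		;
	/* a "tail page" must not still carry the stale flag */
	for (int i = 1; i < NR_PAGES; i++)
		assert(pages[i] != STALE_PG_HEAD);
	return NULL;
}

int run_test(void)
{
	pthread_t w, r;

	for (int i = 0; i < NR_PAGES; i++)
		struct_pages[i] = STALE_PG_HEAD;	/* pre-existing contents */
	pthread_create(&r, NULL, walker_side, NULL);
	pthread_create(&w, NULL, restore_side, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}
```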
Miaohe Lin Aug. 18, 2022, 12:58 p.m. UTC | #16
On 2022/8/18 17:18, Muchun Song wrote:
> 
> 
>> On Aug 18, 2022, at 16:54, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 8/18/2022 4:40 PM, Muchun Song wrote:
>>>
>>>
>>>> On Aug 18, 2022, at 16:32, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 8/18/2022 3:59 PM, Muchun Song wrote:
>>>>>
>>>>>
>>>>>> On Aug 18, 2022, at 15:52, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>>>
>>>>>> On 2022/8/18 10:47, Muchun Song wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 18, 2022, at 10:00, Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 8/18/2022 9:55 AM, Miaohe Lin wrote:
>>>>>>>>>>>> 	/*
>>>>>>>>>>>> 	 * The memory barrier inside __SetPageUptodate makes sure that
>>>>>>>>>>>> 	 * preceding stores to the page contents become visible before
>>>>>>>>>>>> 	 * the set_pte_at() write.
>>>>>>>>>>>> 	 */
>>>>>>>>>>>> 	__SetPageUptodate(page);
>>>>>>>>>>> IIUC, in the case here we should make sure other CPUs can see the new page’s
>>>>>>>>>>> contents after they have seen that PG_uptodate is set. I think commit 0ed361dec369
>>>>>>>>>>> can tell us more details.
>>>>>>>>>>>
>>>>>>>>>>> I also looked at commit 52f37629fd3c to see why we need a barrier before
>>>>>>>>>>> set_pte_at(), but I didn’t find any info to explain why. I guess we want
>>>>>>>>>>> to make sure the order between the page’s contents and subsequent memory
>>>>>>>>>>> accesses using the corresponding virtual address, do you agree with this?
>>>>>>>>>> This is my understanding also. Thanks.
>>>>>>>>> That's also my understanding. Thanks both.
>>>>>>>> I have an unclear thing (not directly related to this patch): Who is responsible
>>>>>>>> for the read barrier on the read side in this case?
>>>>>>>>
>>>>>>>> For SetPageUptodate, there are pairing write/read memory barriers.
>>>>>>>>
>>>>>>>
>>>>>>> I have the same question. So I think the example proposed by Miaohe is a little
>>>>>>> different from the case (hugetlb_vmemmap) here.
>>>>>>
>>>>>> Per my understanding, memory barrier in PageUptodate() is needed because user might access the
>>>>>> page contents using page_address() (corresponding pagetable entry already exists) soon. But for
>>>>>> the above proposed case, if user wants to access the page contents, the corresponding pagetable
>>>>>> should be visible first, or the page contents can't be accessed. So there should be a data dependency
>>>>>> acting as a memory barrier between when the pagetable entry is loaded and when the page contents are accessed.
>>>>>> Or am I missing something?
>>>>>
>>>>> Yep, it is a data dependency. The difference between hugetlb_vmemmap and PageUptodate() is that
>>>>> the page table (a pointer to the mapped page frame) is loaded by MMU while PageUptodate() is
>>>>> loaded by CPU. Seems like the data dependency should be inserted between the MMU access and the CPU
>>>>> access. Maybe it is hardware’s guarantee?
>>>> I just found that the comment in pmd_install() explains why most arches have no read
>>>
>>> I think pmd_install() is a little different as well. We should make sure that
>>> page table walkers (like GUP) see the correct PTE entry after they see the pmd
>>> entry.
>>
>> The difference I can see is that pmd/pte thing has both hardware page walker and
>> software page walker (like GUP) as read side. While the case here only has hardware
>> page walker as the read side. But I suppose the memory barrier requirement still applies
>> here.
> 
> I am not against this change. I just want to get a better understanding of the
> hardware behavior.
> 
>>
>> Maybe we could do a test: add a large delay between reset_struct_pages() and set_pte_at()?
> 
> Hi Miaohe,
> 
> Would you mind doing this test? One thread does vmemmap_restore_pte(), another thread
> detects if it can see a tail page with PG_head after the previous thread has executed
> set_pte_at().

Would it be easier to construct the memory reordering manually, like below?

vmemmap_restore_pte()
	...
	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
	/* there might be a delay here. */
	copy_page(to, (void *)walk->reuse_addr);
	reset_struct_pages(to);

And another thread detects whether it can see a tail page with some invalid fields? If so,
it seems the problem will always trigger? If not, we depend on the observed memory reordering,
and set_pte_at() doesn't contain a memory barrier?

Thanks,
Miaohe Lin
Yin, Fengwei Aug. 18, 2022, 11:53 p.m. UTC | #17
On 8/18/2022 8:58 PM, Miaohe Lin wrote:
> On 2022/8/18 17:18, Muchun Song wrote:
>> [...]
>> Hi Miaohe,
>>
>> Would you mind doing this test? One thread do vmemmap_restore_pte(), another thread
>> detect if it can see a tail page with PG_head after the previous thread has executed
>> set_pte_at().
> 
> Would it be easier to construct the memory reordering manually, like below?
> 
> vmemmap_restore_pte()
> 	...
> 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> 	/* there might be a delay here. */
> 	copy_page(to, (void *)walk->reuse_addr);
> 	reset_struct_pages(to);
This should be the correct change for the test. :)

Regards
Yin, Fengwei

Muchun Song Aug. 19, 2022, 3:19 a.m. UTC | #18
> On Aug 18, 2022, at 20:58, Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
> [...]
> 
> Would it be easier to construct the memory reordering manually, like below?
> 
> vmemmap_restore_pte()
> 	...
> 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> 	/* there might be a delay here. */
> 	copy_page(to, (void *)walk->reuse_addr);
> 	reset_struct_pages(to);


Well, you have changed the code ordering. I thought we wouldn't change the code
ordering but would just let the hardware do the reordering. The ideal scenario
would be as follows.


CPU0:						CPU1:

vmemmap_restore_pte()
	copy_page(to, (void *)walk->reuse_addr);
        reset_struct_pages(to); // clear the tail page’s PG_head
	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
						// Detect if it can see a tail page with PG_head.

I should admit it is a little difficult to construct the scenario. After more
thought, I think a barrier should be inserted here. So:

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

Miaohe Lin Aug. 19, 2022, 7:26 a.m. UTC | #19
On 2022/8/19 11:19, Muchun Song wrote:
> 
> 
>> On Aug 18, 2022, at 20:58, Miaohe Lin <linmiaohe@huawei.com> wrote:
>> [...]
>> Would it be easier to construct the memory reordering manually, like below?
>>
>> vmemmap_restore_pte()
>> 	...
>> 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
>> 	/* there might be a delay here. */
>> 	copy_page(to, (void *)walk->reuse_addr);
>> 	reset_struct_pages(to);
> 
> 
> Well, you have changed the code ordering. I thought we wouldn't change the code
> ordering but would just let the hardware do the reordering. The ideal scenario
> would be as follows.
> 
> 
> CPU0:						CPU1:
> 
> vmemmap_restore_pte()
> 	copy_page(to, (void *)walk->reuse_addr);
>         reset_struct_pages(to); // clear the tail page’s PG_head
> 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> 						// Detect if it can see a tail page with PG_head.
> 
> I should admit it is a little difficult to construct the scenario. After more
> thought, I think a barrier should be inserted here. So:
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Many thanks both for review and discussion. :)

Thanks,
Miaohe Lin
Muchun Song Aug. 20, 2022, 8:12 a.m. UTC | #20
> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
> The memory barrier smp_wmb() is needed to make sure that preceding stores
> to the page contents become visible before the below set_pte_at() write.

I found another place where there is a similar case. See kasan_populate_vmalloc_pte() in
mm/kasan/shadow.c. 

Should we fix it as well?


> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
> mm/hugetlb_vmemmap.c | 5 +++++
> 1 file changed, 5 insertions(+)
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 20f414c0379f..76b2d03a0d8d 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -287,6 +287,11 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> 	copy_page(to, (void *)walk->reuse_addr);
> 	reset_struct_pages(to);
> 
> +	/*
> +	 * Makes sure that preceding stores to the page contents become visible
> +	 * before the set_pte_at() write.
> +	 */
> +	smp_wmb();
> 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> }
> 
> -- 
> 2.23.0
> 
>
Miaohe Lin Aug. 22, 2022, 8:45 a.m. UTC | #21
On 2022/8/20 16:12, Muchun Song wrote:
> 
> 
>> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> The memory barrier smp_wmb() is needed to make sure that preceding stores
>> to the page contents become visible before the below set_pte_at() write.
> 
> I found another place where there is a similar case. See kasan_populate_vmalloc_pte() in
> mm/kasan/shadow.c. 

Thanks for your report.

> 
> Should we fix it as well?

I'm not familiar with kasan yet, but I think a memory barrier is needed here, or memory corruption
can't be detected until the contents are visible. Would smp_mb__after_atomic() before set_pte_at()
be enough? What's your opinion?

Thanks,
Miaohe Lin
Muchun Song Aug. 22, 2022, 10:23 a.m. UTC | #22
> On Aug 22, 2022, at 16:45, Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
> On 2022/8/20 16:12, Muchun Song wrote:
>> 
>> 
>>> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>> 
>>> The memory barrier smp_wmb() is needed to make sure that preceding stores
>>> to the page contents become visible before the below set_pte_at() write.
>> 
>> I found another place where there is a similar case. See kasan_populate_vmalloc_pte() in
>> mm/kasan/shadow.c. 
> 
> Thanks for your report.
> 
>> 
>> Should we fix it as well?
> 
> I'm not familiar with kasan yet, but I think a memory barrier is needed here, or memory corruption
> can't be detected until the contents are visible. Would smp_mb__after_atomic() before set_pte_at()
> be enough? What's your opinion?

I didn’t see any atomic operation between set_pte_at() and memset(), so I don’t think
smp_mb__after_atomic() is feasible if we really need to insert a barrier. I suggest
you send an RFC patch to the KASAN maintainers; they are more familiar with this than
we are.

Thanks.

Miaohe Lin Aug. 23, 2022, 1:42 a.m. UTC | #23
On 2022/8/22 18:23, Muchun Song wrote:
> 
> 
>> On Aug 22, 2022, at 16:45, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2022/8/20 16:12, Muchun Song wrote:
>>>
>>>
>>>> On Aug 16, 2022, at 21:05, Miaohe Lin <linmiaohe@huawei.com> wrote:
>>>>
>>>> The memory barrier smp_wmb() is needed to make sure that preceding stores
>>>> to the page contents become visible before the below set_pte_at() write.
>>>
>>> I found another place where is a similar case. See kasan_populate_vmalloc_pte() in
>>> mm/kasan/shadow.c. 
>>
>> Thanks for your report.
>>
>>>
>>> Should we fix it as well?
>>
>> I'm not familiar with kasan yet, but I think a memory barrier is needed here, or memory corruption
>> can't be detected until the contents are visible. Would smp_mb__after_atomic() before set_pte_at()
>> be enough? What's your opinion?
> 
> I didn’t see any atomic operation between set_pte_at() and memset(), I don’t think
> smp_mb__after_atomic() is feasible if we really need to insert a barrier. I suggest

Oh, it should be smp_mb__after_spinlock(), i.e. something like below:

diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 0e3648b603a6..38e503c89740 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -277,6 +277,7 @@ static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,

        spin_lock(&init_mm.page_table_lock);
        if (likely(pte_none(*ptep))) {
+               smp_mb__after_spinlock();
                set_pte_at(&init_mm, addr, ptep, pte);
                page = 0;
        }

Does this make sense to you?

> you send an RFC patch to the KASAN maintainers; they are more familiar with this than
> we are.

Sounds like a good idea. Will do it.

Thanks,
Miaohe Lin
diff mbox series

Patch

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 20f414c0379f..76b2d03a0d8d 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -287,6 +287,11 @@  static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
 	copy_page(to, (void *)walk->reuse_addr);
 	reset_struct_pages(to);
 
+	/*
+	 * Makes sure that preceding stores to the page contents become visible
+	 * before the set_pte_at() write.
+	 */
+	smp_wmb();
 	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
 }