
[v2,4/6] LoongArch: KVM: Add memory barrier before update pmd entry

Message ID 20240619080940.2690756-5-maobibo@loongson.cn (mailing list archive)
State New, archived
Series LoongArch: KVM: Fix some issues relative with mmu

Commit Message

Bibo Mao June 19, 2024, 8:09 a.m. UTC
When updating a pmd entry, such as when allocating a new pmd page or
splitting a huge page into normal pages, all pte entries must be updated
first, and only then the pmd entry.

LoongArch systems are weakly ordered, so there is a problem if other
vCPUs see the pmd update before the pte updates are visible. Add
smp_wmb() to ensure this ordering.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/kvm/mmu.c | 2 ++
 1 file changed, 2 insertions(+)
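
For orientation, the writer-side ordering that the patch enforces can be
sketched as follows. This is an editor's illustration, not code from the
patch; the function and variable names (publish_pte_page, pmdp, child,
first_val, nr_ptes) are hypothetical, while WRITE_ONCE(), smp_wmb() and
__pa() are the usual kernel primitives.

	/* Sketch: fill the new pte page first, then publish it via the pmd. */
	static void publish_pte_page(kvm_pte_t *pmdp, kvm_pte_t *child,
				     kvm_pte_t first_val, int nr_ptes)
	{
		int i;

		/* 1. Initialise every pte entry in the new page. */
		for (i = 0; i < nr_ptes; i++)
			WRITE_ONCE(child[i], first_val + i * PAGE_SIZE);

		/* 2. Make the pte stores visible before the pmd store. */
		smp_wmb();

		/* 3. Only now make the page reachable through the pmd. */
		WRITE_ONCE(*pmdp, __pa(child));
	}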

Comments

Huacai Chen June 23, 2024, 10:18 a.m. UTC | #1
Hi, Bibo,

On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
>
> When updating pmd entry such as allocating new pmd page or splitting
> huge page into normal page, it is necessary to firstly update all pte
> entries, and then update pmd entry.
>
> It is weak order with LoongArch system, there will be problem if other
> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
> is added to assure this.
Memory barriers should be in pairs in most cases. That means you may
lose smp_rmb() in another place.

Huacai

>
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  arch/loongarch/kvm/mmu.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> index 1690828bd44b..7f04edfbe428 100644
> --- a/arch/loongarch/kvm/mmu.c
> +++ b/arch/loongarch/kvm/mmu.c
> @@ -163,6 +163,7 @@ static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
>
>                         child = kvm_mmu_memory_cache_alloc(cache);
>                         _kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
> +                       smp_wmb(); /* make pte visible before pmd */
>                         kvm_set_pte(entry, __pa(child));
>                 } else if (kvm_pte_huge(*entry)) {
>                         return entry;
> @@ -746,6 +747,7 @@ static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
>                 val += PAGE_SIZE;
>         }
>
> +       smp_wmb();
>         /* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
>         kvm_set_pte(ptep, __pa(child));
>
> --
> 2.39.3
>
Bibo Mao June 24, 2024, 1:37 a.m. UTC | #2
On 2024/6/23 6:18 PM, Huacai Chen wrote:
> Hi, Bibo,
> 
> On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
>>
>> When updating pmd entry such as allocating new pmd page or splitting
>> huge page into normal page, it is necessary to firstly update all pte
>> entries, and then update pmd entry.
>>
>> It is weak order with LoongArch system, there will be problem if other
>> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
>> is added to assure this.
> Memory barriers should be in pairs in most cases. That means you may
> lose smp_rmb() in another place.
The idea of adding smp_wmb() comes from __split_huge_pmd_locked()
in mm/huge_memory.c, and the explanation there is reasonable.

                 ...
                 set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
         }
         ...
         smp_wmb(); /* make pte visible before pmd */
         pmd_populate(mm, pmd, pgtable);

It is strange that smp_rmb() should be paired with smp_wmb();
I have never heard of this rule -:(

Regards
Bibo Mao
> 
> Huacai
> 
>>
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>   arch/loongarch/kvm/mmu.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
>> index 1690828bd44b..7f04edfbe428 100644
>> --- a/arch/loongarch/kvm/mmu.c
>> +++ b/arch/loongarch/kvm/mmu.c
>> @@ -163,6 +163,7 @@ static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
>>
>>                          child = kvm_mmu_memory_cache_alloc(cache);
>>                          _kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
>> +                       smp_wmb(); /* make pte visible before pmd */
>>                          kvm_set_pte(entry, __pa(child));
>>                  } else if (kvm_pte_huge(*entry)) {
>>                          return entry;
>> @@ -746,6 +747,7 @@ static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
>>                  val += PAGE_SIZE;
>>          }
>>
>> +       smp_wmb();
>>          /* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
>>          kvm_set_pte(ptep, __pa(child));
>>
>> --
>> 2.39.3
>>
Huacai Chen June 24, 2024, 1:56 a.m. UTC | #3
On Mon, Jun 24, 2024 at 9:37 AM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> On 2024/6/23 6:18 PM, Huacai Chen wrote:
> > Hi, Bibo,
> >
> > On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
> >>
> >> When updating pmd entry such as allocating new pmd page or splitting
> >> huge page into normal page, it is necessary to firstly update all pte
> >> entries, and then update pmd entry.
> >>
> >> It is weak order with LoongArch system, there will be problem if other
> >> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
> >> is added to assure this.
> > Memory barriers should be in pairs in most cases. That means you may
> > lose smp_rmb() in another place.
> The idea adding smp_wmb() comes from function __split_huge_pmd_locked()
> in file mm/huge_memory.c, and the explanation is reasonable.
>
>                  ...
>                  set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
>          }
>          ...
>          smp_wmb(); /* make pte visible before pmd */
>          pmd_populate(mm, pmd, pgtable);
>
> It is strange that why smp_rmb() should be in pairs with smp_wmb(),
> I never hear this rule -:(
https://docs.kernel.org/core-api/wrappers/memory-barriers.html

SMP BARRIER PAIRING
-------------------

When dealing with CPU-CPU interactions, certain types of memory barrier should
always be paired.  A lack of appropriate pairing is almost certainly an error.


Huacai

>
> Regards
> Bibo Mao
> >
> > Huacai
> >
> >>
> >> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >> ---
> >>   arch/loongarch/kvm/mmu.c | 2 ++
> >>   1 file changed, 2 insertions(+)
> >>
> >> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> >> index 1690828bd44b..7f04edfbe428 100644
> >> --- a/arch/loongarch/kvm/mmu.c
> >> +++ b/arch/loongarch/kvm/mmu.c
> >> @@ -163,6 +163,7 @@ static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
> >>
> >>                          child = kvm_mmu_memory_cache_alloc(cache);
> >>                          _kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
> >> +                       smp_wmb(); /* make pte visible before pmd */
> >>                          kvm_set_pte(entry, __pa(child));
> >>                  } else if (kvm_pte_huge(*entry)) {
> >>                          return entry;
> >> @@ -746,6 +747,7 @@ static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
> >>                  val += PAGE_SIZE;
> >>          }
> >>
> >> +       smp_wmb();
> >>          /* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
> >>          kvm_set_pte(ptep, __pa(child));
> >>
> >> --
> >> 2.39.3
> >>
>
>
Bibo Mao June 24, 2024, 2:21 a.m. UTC | #4
On 2024/6/24 9:56 AM, Huacai Chen wrote:
> On Mon, Jun 24, 2024 at 9:37 AM maobibo <maobibo@loongson.cn> wrote:
>>
>>
>>
>> On 2024/6/23 6:18 PM, Huacai Chen wrote:
>>> Hi, Bibo,
>>>
>>> On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
>>>>
>>>> When updating pmd entry such as allocating new pmd page or splitting
>>>> huge page into normal page, it is necessary to firstly update all pte
>>>> entries, and then update pmd entry.
>>>>
>>>> It is weak order with LoongArch system, there will be problem if other
>>>> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
>>>> is added to assure this.
>>> Memory barriers should be in pairs in most cases. That means you may
>>> lose smp_rmb() in another place.
>> The idea adding smp_wmb() comes from function __split_huge_pmd_locked()
>> in file mm/huge_memory.c, and the explanation is reasonable.
>>
>>                   ...
>>                   set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
>>           }
>>           ...
>>           smp_wmb(); /* make pte visible before pmd */
>>           pmd_populate(mm, pmd, pgtable);
>>
>> It is strange that why smp_rmb() should be in pairs with smp_wmb(),
>> I never hear this rule -:(
> https://docs.kernel.org/core-api/wrappers/memory-barriers.html
> 
> SMP BARRIER PAIRING
> -------------------
> 
> When dealing with CPU-CPU interactions, certain types of memory barrier should
> always be paired.  A lack of appropriate pairing is almost certainly an error.
         CPU 1                 CPU 2
         ===============       ===============
         WRITE_ONCE(a, 1);
         <write barrier>
         WRITE_ONCE(b, 2);     x = READ_ONCE(b);
                               <read barrier>
                               y = READ_ONCE(a);

In the huge page split scenario, where the pte/pmd entries are updated, 
there is no strong relationship between the pte addresses and the pmd.
CPU1
      WRITE_ONCE(pte0, 1);
      WRITE_ONCE(pte511, 1);
      <write barrier>
      WRITE_ONCE(pmd, 2);

However, in the page table walk scenario, the ptep address depends on the 
contents of the pmd, so it is not necessary to add smp_rmb().
         ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
         if (!ptep)
                 return no_page_table(vma, flags, address);
         pte = ptep_get(ptep);
         if (!pte_present(pte))

This is just my opinion; otherwise, where do you think the smp_rmb() barrier 
should be added in the page table reader path?
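
To make the reader-side argument concrete, here is an editor's sketch (not
from the thread): the address of the pte is computed from the value just
loaded from the pmd, so the pte load carries an address dependency on the
pmd load and no explicit smp_rmb() is required. The helper name lookup_pte()
and its arguments are hypothetical; READ_ONCE(), __va(), PAGE_MASK,
PAGE_SHIFT and PTRS_PER_PTE are standard kernel symbols.

	/* Sketch of a reader that walks pmd -> pte without smp_rmb(). */
	static kvm_pte_t lookup_pte(kvm_pte_t *pmdp, unsigned long gpa)
	{
		kvm_pte_t pmd, *ptep;

		pmd = READ_ONCE(*pmdp);			/* load the pmd entry */
		if (!pmd)
			return 0;			/* nothing mapped yet */

		/* ptep is derived from the pmd value: an address dependency. */
		ptep = (kvm_pte_t *)__va(pmd & PAGE_MASK);

		/* This load cannot be satisfied before the pmd load above. */
		return READ_ONCE(ptep[(gpa >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)]);
	}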

Regards
Bibo Mao
> 
> 
> Huacai
> 
>>
>> Regards
>> Bibo Mao
>>>
>>> Huacai
>>>
>>>>
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>>>> ---
>>>>    arch/loongarch/kvm/mmu.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
>>>> index 1690828bd44b..7f04edfbe428 100644
>>>> --- a/arch/loongarch/kvm/mmu.c
>>>> +++ b/arch/loongarch/kvm/mmu.c
>>>> @@ -163,6 +163,7 @@ static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
>>>>
>>>>                           child = kvm_mmu_memory_cache_alloc(cache);
>>>>                           _kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
>>>> +                       smp_wmb(); /* make pte visible before pmd */
>>>>                           kvm_set_pte(entry, __pa(child));
>>>>                   } else if (kvm_pte_huge(*entry)) {
>>>>                           return entry;
>>>> @@ -746,6 +747,7 @@ static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
>>>>                   val += PAGE_SIZE;
>>>>           }
>>>>
>>>> +       smp_wmb();
>>>>           /* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
>>>>           kvm_set_pte(ptep, __pa(child));
>>>>
>>>> --
>>>> 2.39.3
>>>>
>>
>>
Huacai Chen June 24, 2024, 4:18 a.m. UTC | #5
On Mon, Jun 24, 2024 at 10:21 AM maobibo <maobibo@loongson.cn> wrote:
>
>
>
> On 2024/6/24 9:56 AM, Huacai Chen wrote:
> > On Mon, Jun 24, 2024 at 9:37 AM maobibo <maobibo@loongson.cn> wrote:
> >>
> >>
> >>
> >> On 2024/6/23 6:18 PM, Huacai Chen wrote:
> >>> Hi, Bibo,
> >>>
> >>> On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
> >>>>
> >>>> When updating pmd entry such as allocating new pmd page or splitting
> >>>> huge page into normal page, it is necessary to firstly update all pte
> >>>> entries, and then update pmd entry.
> >>>>
> >>>> It is weak order with LoongArch system, there will be problem if other
> >>>> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
> >>>> is added to assure this.
> >>> Memory barriers should be in pairs in most cases. That means you may
> >>> lose smp_rmb() in another place.
> >> The idea adding smp_wmb() comes from function __split_huge_pmd_locked()
> >> in file mm/huge_memory.c, and the explanation is reasonable.
> >>
> >>                   ...
> >>                   set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
> >>           }
> >>           ...
> >>           smp_wmb(); /* make pte visible before pmd */
> >>           pmd_populate(mm, pmd, pgtable);
> >>
> >> It is strange that why smp_rmb() should be in pairs with smp_wmb(),
> >> I never hear this rule -:(
> > https://docs.kernel.org/core-api/wrappers/memory-barriers.html
> >
> > SMP BARRIER PAIRING
> > -------------------
> >
> > When dealing with CPU-CPU interactions, certain types of memory barrier should
> > always be paired.  A lack of appropriate pairing is almost certainly an error.
>     CPU 1                 CPU 2
>          ===============       ===============
>          WRITE_ONCE(a, 1);
>          <write barrier>
>          WRITE_ONCE(b, 2);     x = READ_ONCE(b);
>                                <read barrier>
>                                y = READ_ONCE(a);
>
> With split_huge scenery to update pte/pmd entry, there is no strong
> relationship between address ptex and pmd.
> CPU1
>       WRITE_ONCE(pte0, 1);
>       WRITE_ONCE(pte511, 1);
>       <write barrier>
>       WRITE_ONCE(pmd, 2);
>
> However with page table walk scenery, address ptep depends on the
> contents of pmd, so it is not necessary to add smp_rmb().
>          ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>          if (!ptep)
>                  return no_page_table(vma, flags, address);
>          pte = ptep_get(ptep);
>          if (!pte_present(pte))
>
> It is just my option, or do you think where smp_rmb() barrier should be
> added in page table reader path?
There are some possibilities:
1. Read barrier is missing in some places;
2. Write barrier is also unnecessary here;
3. Read barrier is really unnecessary, but there is a better API to
replace the write barrier;
4. Read barrier is really unnecessary, and write barrier is really the
best API here.

Maybe Rui Wang knows better here.

Huacai

>
> Regards
> Bibo Mao
> >
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> >>>> ---
> >>>>    arch/loongarch/kvm/mmu.c | 2 ++
> >>>>    1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> >>>> index 1690828bd44b..7f04edfbe428 100644
> >>>> --- a/arch/loongarch/kvm/mmu.c
> >>>> +++ b/arch/loongarch/kvm/mmu.c
> >>>> @@ -163,6 +163,7 @@ static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
> >>>>
> >>>>                           child = kvm_mmu_memory_cache_alloc(cache);
> >>>>                           _kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
> >>>> +                       smp_wmb(); /* make pte visible before pmd */
> >>>>                           kvm_set_pte(entry, __pa(child));
> >>>>                   } else if (kvm_pte_huge(*entry)) {
> >>>>                           return entry;
> >>>> @@ -746,6 +747,7 @@ static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
> >>>>                   val += PAGE_SIZE;
> >>>>           }
> >>>>
> >>>> +       smp_wmb();
> >>>>           /* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
> >>>>           kvm_set_pte(ptep, __pa(child));
> >>>>
> >>>> --
> >>>> 2.39.3
> >>>>
> >>
> >>
>
>
WANG Rui June 24, 2024, 6 a.m. UTC | #6
Hi,

On Mon, Jun 24, 2024 at 12:18 PM Huacai Chen <chenhuacai@kernel.org> wrote:
>
> On Mon, Jun 24, 2024 at 10:21 AM maobibo <maobibo@loongson.cn> wrote:
> >
> >
> >
> > On 2024/6/24 9:56 AM, Huacai Chen wrote:
> > > On Mon, Jun 24, 2024 at 9:37 AM maobibo <maobibo@loongson.cn> wrote:
> > >>
> > >>
> > >>
> > >> On 2024/6/23 6:18 PM, Huacai Chen wrote:
> > >>> Hi, Bibo,
> > >>>
> > >>> On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
> > >>>>
> > >>>> When updating pmd entry such as allocating new pmd page or splitting
> > >>>> huge page into normal page, it is necessary to firstly update all pte
> > >>>> entries, and then update pmd entry.
> > >>>>
> > >>>> It is weak order with LoongArch system, there will be problem if other
> > >>>> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
> > >>>> is added to assure this.
> > >>> Memory barriers should be in pairs in most cases. That means you may
> > >>> lose smp_rmb() in another place.
> > >> The idea adding smp_wmb() comes from function __split_huge_pmd_locked()
> > >> in file mm/huge_memory.c, and the explanation is reasonable.
> > >>
> > >>                   ...
> > >>                   set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
> > >>           }
> > >>           ...
> > >>           smp_wmb(); /* make pte visible before pmd */
> > >>           pmd_populate(mm, pmd, pgtable);
> > >>
> > >> It is strange that why smp_rmb() should be in pairs with smp_wmb(),
> > >> I never hear this rule -:(
> > > https://docs.kernel.org/core-api/wrappers/memory-barriers.html
> > >
> > > SMP BARRIER PAIRING
> > > -------------------
> > >
> > > When dealing with CPU-CPU interactions, certain types of memory barrier should
> > > always be paired.  A lack of appropriate pairing is almost certainly an error.
> >     CPU 1                 CPU 2
> >          ===============       ===============
> >          WRITE_ONCE(a, 1);
> >          <write barrier>
> >          WRITE_ONCE(b, 2);     x = READ_ONCE(b);
> >                                <read barrier>
> >                                y = READ_ONCE(a);
> >
> > With split_huge scenery to update pte/pmd entry, there is no strong
> > relationship between address ptex and pmd.
> > CPU1
> >       WRITE_ONCE(pte0, 1);
> >       WRITE_ONCE(pte511, 1);
> >       <write barrier>
> >       WRITE_ONCE(pmd, 2);
> >
> > However with page table walk scenery, address ptep depends on the
> > contents of pmd, so it is not necessary to add smp_rmb().
> >          ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
> >          if (!ptep)
> >                  return no_page_table(vma, flags, address);
> >          pte = ptep_get(ptep);
> >          if (!pte_present(pte))
> >
> > It is just my option, or do you think where smp_rmb() barrier should be
> > added in page table reader path?
> There are some possibilities:
> 1. Read barrier is missing in some places;
> 2. Write barrier is also unnecessary here;
> 3. Read barrier is really unnecessary, but there is a better API to
> replace the write barrier;
> 4. Read barrier is really unnecessary, and write barrier is really the
> best API here.
>
> Maybe Rui Wang knows better here.

It appears that reading the pte address is data-dependent on the pmd,
rather than control-dependent. This creates an opportunity to omit the
read-side memory barrier.
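
To make this distinction concrete, an editor's illustration with hypothetical
variables (flag, data, val): an address dependency, as in the pmd -> pte walk
above, orders the two loads by itself, whereas a control dependency between
two loads does not; memory-barriers.txt notes that load-to-load control
dependencies need an explicit barrier such as smp_rmb().

	if (READ_ONCE(*flag)) {
		/*
		 * The control dependency orders the flag load against later
		 * stores, but not against later loads; without smp_rmb()
		 * here, the load of *data may be satisfied early.
		 */
		val = READ_ONCE(*data);
	}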

Cheers,
-Rui


>
> Huacai
>
> >
> > Regards
> > Bibo Mao
> > >
> > >
> > > Huacai
> > >
> > >>
> > >> Regards
> > >> Bibo Mao
> > >>>
> > >>> Huacai
> > >>>
> > >>>>
> > >>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> > >>>> ---
> > >>>>    arch/loongarch/kvm/mmu.c | 2 ++
> > >>>>    1 file changed, 2 insertions(+)
> > >>>>
> > >>>> diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
> > >>>> index 1690828bd44b..7f04edfbe428 100644
> > >>>> --- a/arch/loongarch/kvm/mmu.c
> > >>>> +++ b/arch/loongarch/kvm/mmu.c
> > >>>> @@ -163,6 +163,7 @@ static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
> > >>>>
> > >>>>                           child = kvm_mmu_memory_cache_alloc(cache);
> > >>>>                           _kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
> > >>>> +                       smp_wmb(); /* make pte visible before pmd */
> > >>>>                           kvm_set_pte(entry, __pa(child));
> > >>>>                   } else if (kvm_pte_huge(*entry)) {
> > >>>>                           return entry;
> > >>>> @@ -746,6 +747,7 @@ static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
> > >>>>                   val += PAGE_SIZE;
> > >>>>           }
> > >>>>
> > >>>> +       smp_wmb();
> > >>>>           /* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
> > >>>>           kvm_set_pte(ptep, __pa(child));
> > >>>>
> > >>>> --
> > >>>> 2.39.3
> > >>>>
> > >>
> > >>
> >
> >
>

Patch

diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 1690828bd44b..7f04edfbe428 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -163,6 +163,7 @@  static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
 
 			child = kvm_mmu_memory_cache_alloc(cache);
 			_kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
+			smp_wmb(); /* make pte visible before pmd */
 			kvm_set_pte(entry, __pa(child));
 		} else if (kvm_pte_huge(*entry)) {
 			return entry;
@@ -746,6 +747,7 @@  static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
 		val += PAGE_SIZE;
 	}
 
+	smp_wmb();
 	/* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
 	kvm_set_pte(ptep, __pa(child));