mbox series

[v2,0/3] RFC: Solve several problems in stage 2 translation

Message ID 20201216122844.25092-1-wangyanan55@huawei.com (mailing list archive)
Headers show
Series RFC: Solve several problems in stage 2 translation | expand

Message

Yanan Wang Dec. 16, 2020, 12:28 p.m. UTC
Hi, this is the second version, thanks for reading.

PATCH1/3:
Procedures of hyp stage 1 mapping and guest stage 2 mapping are different, but
they are tied closely by function kvm_set_valid_leaf_pte(). So separate them by
rewriting kvm_set_valid_leaf_pte().

PATCH2/3:
To avoid unnecessary update and small loops, add prejudgement in the translation
fault handler: Skip updating the PTE with break-before-make if we are trying to
recreate the exact same mapping or only change the access permissions. Actually,
change of permissions will be handled through the relax_perms path next time if
necessary.

(1) If there are some vCPUs accessing the same GPA at the same time and the leaf
PTE is not set yet, then they will all cause translation faults and the first vCPU
holding mmu_lock will set valid leaf PTE, and the others will later update the old
PTE with a new one if they are different.

(2) When changing a leaf entry or a table entry with break-before-make, if there
are some vCPUs accessing the same GPA just catch the moment when the target PTE
is set invalid in a BBM procedure coincidentally, they will all cause translation
faults and will later update the old PTE with a new one if they are different.

The worst case can be like this: vCPU A causes a translation fault with RW prot and
sets the leaf PTE with RW permissions, and then the next vCPU B with RO prot updates
the PTE back to RO permissions with break-before-make. And the BBM-invalid moment
may trigger more unnecessary translation faults, then some useless small loops might
occur which could lead to vCPU stuck.

PATCH3/3:
We now mark the page dirty and set the bitmap before calling fault handlers in
user_mem_abort(), and we might end up having spurious dirty pages if update of
permissions or mapping has failed. So, mark the page dirty only if the fault is
handled successfully.

Let the guest directly enter again but not return to userspace if we were trying
to recreate the same mapping or only change access permissions with BBM, which is
not permitted in the mapping path.

Changes from v1:
- Make part of the diff as an independent patch (PATCH1/3),
  and add Will's Signed-off-by.
- Use *return -EPERM* way when changing permissions only in the mapping path.
- Add a new patch (PATCH3/3).

Yanan Wang (3):
  KVM: arm64: Decouple partial code of hyp stage 1 mapping and guest
    stage 2 mapping
  KVM: arm64: Add prejudgement for relaxing permissions only case in
    stage2 translation fault handler
  KVM: arm64: Mark the page dirty only if the fault is handled
    successfully

 arch/arm64/kvm/hyp/pgtable.c | 78 ++++++++++++++++++++----------------
 arch/arm64/kvm/mmu.c         | 18 +++++++--
 2 files changed, 58 insertions(+), 38 deletions(-)

Comments

Yanan Wang Dec. 28, 2020, 2:46 a.m. UTC | #1
Hi Will, Marc,

Gently Ping.

Is there any comments about this V2 series?


Many thanks,

Yanan.

On 2020/12/16 20:28, Yanan Wang wrote:
> Hi, this is the second version, thanks for reading.
>
> PATCH1/3:
> Procedures of hyp stage 1 mapping and guest stage 2 mapping are different, but
> they are tied closely by function kvm_set_valid_leaf_pte(). So separate them by
> rewriting kvm_set_valid_leaf_pte().
>
> PATCH2/3:
> To avoid unnecessary update and small loops, add prejudgement in the translation
> fault handler: Skip updating the PTE with break-before-make if we are trying to
> recreate the exact same mapping or only change the access permissions. Actually,
> change of permissions will be handled through the relax_perms path next time if
> necessary.
>
> (1) If there are some vCPUs accessing the same GPA at the same time and the leaf
> PTE is not set yet, then they will all cause translation faults and the first vCPU
> holding mmu_lock will set valid leaf PTE, and the others will later update the old
> PTE with a new one if they are different.
>
> (2) When changing a leaf entry or a table entry with break-before-make, if there
> are some vCPUs accessing the same GPA just catch the moment when the target PTE
> is set invalid in a BBM procedure coincidentally, they will all cause translation
> faults and will later update the old PTE with a new one if they are different.
>
> The worst case can be like this: vCPU A causes a translation fault with RW prot and
> sets the leaf PTE with RW permissions, and then the next vCPU B with RO prot updates
> the PTE back to RO permissions with break-before-make. And the BBM-invalid moment
> may trigger more unnecessary translation faults, then some useless small loops might
> occur which could lead to vCPU stuck.
>
> PATCH3/3:
> We now mark the page dirty and set the bitmap before calling fault handlers in
> user_mem_abort(), and we might end up having spurious dirty pages if update of
> permissions or mapping has failed. So, mark the page dirty only if the fault is
> handled successfully.
>
> Let the guest directly enter again but not return to userspace if we were trying
> to recreate the same mapping or only change access permissions with BBM, which is
> not permitted in the mapping path.
>
> Changes from v1:
> - Make part of the diff as an independent patch (PATCH1/3),
>    and add Will's Signed-off-by.
> - Use *return -EPERM* way when changing permissions only in the mapping path.
> - Add a new patch (PATCH3/3).
>
> Yanan Wang (3):
>    KVM: arm64: Decouple partial code of hyp stage 1 mapping and guest
>      stage 2 mapping
>    KVM: arm64: Add prejudgement for relaxing permissions only case in
>      stage2 translation fault handler
>    KVM: arm64: Mark the page dirty only if the fault is handled
>      successfully
>
>   arch/arm64/kvm/hyp/pgtable.c | 78 ++++++++++++++++++++----------------
>   arch/arm64/kvm/mmu.c         | 18 +++++++--
>   2 files changed, 58 insertions(+), 38 deletions(-)
>