
[RFC,v1,0/5] Enable CPU TTRem feature for stage-2

Message ID 20210126134202.381996-1-wangyanan55@huawei.com (mailing list archive)

Message

Yanan Wang Jan. 26, 2021, 1:41 p.m. UTC
Hi all,
This series enables the CPU TTRem feature for stage-2 page tables. It is sent
as an RFC to gather comments, thanks.

The ARMv8.4 TTRem feature offers 3 levels of support when changing block
size without changing any other parameters that are listed as requiring use
of break-before-make (BBM). I think we can use this feature to improve stage-2
page table handling, and the following explains what TTRem contributes to that
improvement.
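
As a rough illustration (not code from this series), the supported level could
be decoded from an ID register value as sketched below. The sketch assumes the
level is reported in the BBM field of ID_AA64MMFR2_EL1, bits [55:52]; the helper
name ttrem_level_from_mmfr2() is hypothetical, and mapping reserved values to
level 0 is a conservative assumption rather than something the cover letter
specifies.

```c
#include <stdint.h>

/* Assumed field position: ID_AA64MMFR2_EL1.BBM, bits [55:52]. */
#define MMFR2_BBM_SHIFT 52
#define MMFR2_BBM_MASK  0xfULL

/*
 * Hypothetical helper: return the TTRem level (0, 1 or 2) encoded in a
 * raw ID_AA64MMFR2_EL1 value. Values above 2 are reserved here and are
 * treated as level 0, i.e. "always use break-before-make".
 */
static inline unsigned int ttrem_level_from_mmfr2(uint64_t mmfr2)
{
	unsigned int bbm = (unsigned int)((mmfr2 >> MMFR2_BBM_SHIFT) & MMFR2_BBM_MASK);

	return bbm <= 2 ? bbm : 0;
}
```

A caller would pass the sanitised system-wide register value, so that the
result reflects what every CPU in the system supports.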

If migration of a VM with hugepages is canceled midway, KVM will adjust the
stage-2 table mappings back to block mappings. We currently use BBM to replace
the table entry with a block entry. Taking the adjustment of a 1G block mapping
as an example, the BBM procedure has to invalidate the old table entry first,
flush the TLB and unmap the old table mappings, and only then install the new
block entry.

So there is a fairly long window during which the old table entry is invalid,
before the new block entry is installed. If other vCPUs access any guest page
within the 1G range during this window and find the table entry invalid, they
will all exit from the guest with a translation fault. These translation faults
are unnecessary, because the block mapping is about to be built anyway. Besides,
KVM will still try to build 1G block mappings in response to these spurious
translation faults, performing cache maintenance operations, page table walks,
etc.
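
The window can be made concrete with a small userspace model (a sketch only,
not kernel code). The struct s2_slot type and the bbm_*() helpers are
hypothetical names; the only architectural assumptions are that descriptor
bit 0 means "valid" and bit 1 distinguishes a table from a block entry.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID (1ULL << 0)	/* descriptor is valid */
#define PTE_TABLE (1ULL << 1)	/* set: table entry, clear: block entry */

typedef uint64_t pte_t;

/* Hypothetical model of one stage-2 entry plus a TLB flush counter. */
struct s2_slot {
	pte_t pte;
	unsigned int tlb_flushes;
};

/* A walker that reads an invalid entry takes a translation fault. */
static inline bool walk_faults(const struct s2_slot *s)
{
	return !(s->pte & PTE_VALID);
}

/* Step 1 (break): invalidate the old table entry. */
static inline void bbm_break(struct s2_slot *s)
{
	s->pte = 0;
}

/* Step 2: model TLB invalidation (and unmapping of the old tables). */
static inline void bbm_flush(struct s2_slot *s)
{
	s->tlb_flushes++;
}

/* Step 3 (make): install the new block entry. */
static inline void bbm_make(struct s2_slot *s, uint64_t block_pa)
{
	s->pte = block_pa | PTE_VALID;
}
```

Any walk_faults() call made between bbm_break() and bbm_make() reports a
fault, which corresponds to the spurious vCPU exits described above.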

In summary, the spurious faults are caused by the invalidation step of the BBM
procedure. The TTRem level 1 and level 2 approaches ensure that there is never a
moment when the old table entry is invalid before the new block entry is
installed. However, the level 2 method can lead to a TLB conflict abort, which
is troublesome to handle, so we use the nT bit in both the level 1 and level 2
cases to avoid handling TLB conflict aborts.
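
A minimal sketch of that choice follows. It is a userspace model, not the code
from the patches: coalesce_method_for(), struct pte_trace and coalesce() are
hypothetical names, the nT bit is assumed to be bit 16 of the block descriptor,
and the ordering "write block with nT set, invalidate the TLB, write block with
nT clear" is an assumption about how the nT-based replacement would proceed.

```c
#include <stdint.h>

#define PTE_VALID (1ULL << 0)	/* descriptor is valid */
#define PTE_NT    (1ULL << 16)	/* assumed nT bit position in a block entry */

typedef uint64_t pte_t;

enum coalesce_method { COALESCE_BBM, COALESCE_NT };

/* Level 0 needs break-before-make; levels 1 and 2 both use the nT bit. */
static inline enum coalesce_method coalesce_method_for(unsigned int ttrem_level)
{
	return ttrem_level >= 1 ? COALESCE_NT : COALESCE_BBM;
}

/* Hypothetical trace of the descriptor values written during replacement. */
struct pte_trace {
	pte_t writes[2];
	unsigned int n;
};

static inline void coalesce(struct pte_trace *t, uint64_t block_pa,
			    unsigned int ttrem_level)
{
	t->n = 0;
	if (coalesce_method_for(ttrem_level) == COALESCE_BBM) {
		t->writes[t->n++] = 0;	/* break: entry becomes invalid */
		/* TLB invalidation would happen here */
		t->writes[t->n++] = block_pa | PTE_VALID;	/* make */
	} else {
		/* entry stays valid throughout; nT marks it as in transition */
		t->writes[t->n++] = block_pa | PTE_VALID | PTE_NT;
		/* TLB invalidation would happen here */
		t->writes[t->n++] = block_pa | PTE_VALID;	/* clear nT */
	}
}
```

The point of the trace is that the nT path never writes an invalid descriptor,
whereas the BBM path always does.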

For an implementation that meets level 1 or level 2, the CPU may respond in one
of two ways when accessing a block entry with the nT bit set. First, the CPU may
generate a translation fault, in which case the effect is similar to BBM.
Second, the CPU may use the block entry for translation. So with the second kind
of implementation, the spurious translation faults described above can be
prevented.
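
The two responses can be modelled as follows. This is an illustrative sketch
under the same assumptions as before (valid bit at bit 0, nT bit assumed at
bit 16); enum nt_response and access_faults() are hypothetical names, not part
of this series.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID (1ULL << 0)	/* descriptor is valid */
#define PTE_NT    (1ULL << 16)	/* assumed nT bit position in a block entry */

typedef uint64_t pte_t;

/* The two permitted CPU responses to a block entry with nT set. */
enum nt_response {
	NT_FAULTS,	/* generate a translation fault, similar to BBM */
	NT_TRANSLATES,	/* use the block entry for translation */
};

/* Return true if an access through this entry takes a translation fault. */
static inline bool access_faults(pte_t pte, enum nt_response impl)
{
	if (!(pte & PTE_VALID))
		return true;
	if ((pte & PTE_NT) && impl == NT_FAULTS)
		return true;
	return false;
}
```

Only an NT_TRANSLATES implementation avoids the fault while nT is set; on an
NT_FAULTS implementation the guest-visible behaviour matches BBM.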

Yanan Wang (5):
  KVM: arm64: Detect the ARMv8.4 TTRem feature
  KVM: arm64: Add an API to get level of TTRem supported by hardware
  KVM: arm64: Support usage of TTRem in guest stage-2 translation
  KVM: arm64: Add handling of coalescing tables into a block mapping
  KVM: arm64: Adapt page-table code to new handling of coalescing tables

 arch/arm64/include/asm/cpucaps.h    |  3 +-
 arch/arm64/include/asm/cpufeature.h | 13 ++++++
 arch/arm64/kernel/cpufeature.c      | 10 +++++
 arch/arm64/kvm/hyp/pgtable.c        | 62 +++++++++++++++++++++++------
 4 files changed, 74 insertions(+), 14 deletions(-)

Comments

Marc Zyngier Jan. 26, 2021, 2:18 p.m. UTC | #1
Hi Yanan,

On 2021-01-26 13:41, Yanan Wang wrote:
> Hi all,
> This series enable CPU TTRem feature for stage-2 page table and a RFC is sent
> for some comments, thanks.
>
> The ARMv8.4 TTRem feature offers 3 levels of support when changing block
> size without changing any other parameters that are listed as requiring use
> of break-before-make. And I found that maybe we can use this feature to make
> some improvement for stage-2 page table and the following explains what
> TTRem exactly does for the improvement.
>
> If migration of a VM with hugepages is canceled midway, KVM will adjust the
> stage-2 table mappings back to block mappings. We currently use BBM to replace
> the table entry with a block entry. Take adjustment of 1G block mapping as an
> example, with BBM procedures, we have to invalidate the old table entry first,
> flush TLB and unmap the old table mappings, right before installing the new
> block entry.

In all honesty, I think the amount of work that is getting added to
support this "migration cancelled mid-way" use case is getting out
of control.

This is adding complexity and corner cases for a use case that
really shouldn't happen that often. And it is adding it at the worst
possible place, where we really should keep things as straightforward
as possible.

I would expect userspace to have a good enough knowledge of whether
the migration is likely to succeed, and not to attempt it if it is
likely to fail. And yes, it will fail sometimes. But it should be
so rare that adding these various stages of BBM support shouldn't be
that useful.

Or is there something else that I am missing?

Thanks,

         M.