[0/8] *** RFC: ARM KVM dirty tracking device ***

Message ID 20240918152807.25135-1-lilitj@amazon.com (mailing list archive)

Message

Lilit Janpoladyan Sept. 18, 2024, 3:27 p.m. UTC
This patch series adds an ARM KVM interface for platform-specific stage-2
page tracking devices and makes use of this interface for dirty tracking.

The page_tracking_device interface will be implemented by a device driver and
used by KVM. A device driver will register/deregister its implementation via
page_tracking_device_register()/page_tracking_device_unregister() functions;
KVM can use the device when page_tracking_device_registered() is true.
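
As an illustration of the registration flow described above, here is a minimal
userspace sketch. Only the three function names come from this cover letter;
the struct layout, the signatures, the error codes and the single-slot policy
are assumptions made for the example, and locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct declared in asm/page_tracking.h. */
struct page_tracking_device {
	const char *name;
};

/* Assumption: a single global slot, so at most one device at a time. */
static struct page_tracking_device *registered_dev;

/* Called by the device driver, e.g. from its probe path. */
int page_tracking_device_register(struct page_tracking_device *dev)
{
	if (!dev)
		return -EINVAL;
	if (registered_dev)
		return -EBUSY;
	registered_dev = dev;
	return 0;
}

/* Called by the device driver on removal; only the owner may unregister. */
int page_tracking_device_unregister(struct page_tracking_device *dev)
{
	if (!dev)
		return -EINVAL;
	if (registered_dev != dev)
		return -ENODEV;
	registered_dev = NULL;
	return 0;
}

/* Queried by KVM before it offers or uses the device. */
bool page_tracking_device_registered(void)
{
	return registered_dev != NULL;
}

/* Usage: the driver registers, KVM checks, the driver unregisters. */
void page_tracking_register_example(void)
{
	struct page_tracking_device pta = { .name = "example-pta" };
	int ret;

	ret = page_tracking_device_register(&pta);
	assert(ret == 0 && page_tracking_device_registered());
	ret = page_tracking_device_unregister(&pta);
	assert(ret == 0 && !page_tracking_device_registered());
	(void)ret;
}
```

A real implementation would need to serialise registration against KVM's use
of the device; the sketch only shows the intended call order.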

The page_tracking_device interface provides the following functionality:
- enabling/disabling dirty tracking for a VMID (+ optionally for a CPU id),
- reading GPAs dirtied by either any CPU (to populate dirty bitmaps) or
  by a specific CPU (to populate dirty rings),
- flushing not yet logged data.
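
The three operations in the list above can be sketched with a small software
mock. Everything in it (the pt_* names, the PT_ANY_CPU wildcard, the fixed-size
buffers) is hypothetical and is not the interface from this series; the
optional per-CPU filter on enable is omitted to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_ANY_CPU  (-1)  /* hypothetical wildcard: "dirtied by any CPU" */
#define PT_LOG_SIZE 8     /* arbitrary capacity for the mock */

struct pt_entry {
	int cpu;
	uint64_t gpa;
};

/* Software mock of a tracking device; every name here is hypothetical. */
struct pt_mock {
	bool tracking;
	uint16_t vmid;
	struct pt_entry pending[PT_LOG_SIZE]; /* recorded, not yet logged */
	size_t n_pending;
	struct pt_entry log[PT_LOG_SIZE];     /* visible to readers */
	size_t n_log;
};

/* Enable dirty tracking for one VMID; resets any earlier state. */
void pt_enable(struct pt_mock *d, uint16_t vmid)
{
	assert(d);
	memset(d, 0, sizeof(*d));
	d->tracking = true;
	d->vmid = vmid;
}

void pt_disable(struct pt_mock *d)
{
	d->tracking = false;
}

/* Stands in for the hardware observing a guest write. */
void pt_record_write(struct pt_mock *d, uint16_t vmid, int cpu, uint64_t gpa)
{
	if (!d->tracking || vmid != d->vmid || d->n_pending == PT_LOG_SIZE)
		return;
	d->pending[d->n_pending++] = (struct pt_entry){ .cpu = cpu, .gpa = gpa };
}

/* Flush: move not-yet-logged entries into the log; returns how many moved. */
size_t pt_flush(struct pt_mock *d)
{
	size_t moved = 0;

	while (moved < d->n_pending && d->n_log < PT_LOG_SIZE)
		d->log[d->n_log++] = d->pending[moved++];
	/* Keep whatever did not fit for the next flush. */
	d->n_pending -= moved;
	memmove(d->pending, d->pending + moved,
		d->n_pending * sizeof(d->pending[0]));
	return moved;
}

/* Read: pop up to max GPAs dirtied by cpu, or by any CPU for PT_ANY_CPU. */
size_t pt_read_dirty(struct pt_mock *d, int cpu, uint64_t *out, size_t max)
{
	size_t n = 0, kept = 0;

	assert(d && out);
	for (size_t i = 0; i < d->n_log; i++) {
		struct pt_entry e = d->log[i];

		if (n < max && (cpu == PT_ANY_CPU || e.cpu == cpu))
			out[n++] = e.gpa;
		else
			d->log[kept++] = e;
	}
	d->n_log = kept;
	return n;
}
```

In this model a read only sees entries that have been flushed, which is why a
caller that wants an up-to-date view flushes first and reads afterwards.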

KVM support for the page tracking device is added as a new extension and a
capability with the same name - KVM_CAP_ARM_PAGE_TRACKING_DEVICE. The
capability is available when the extension is supported (i.e. when
page_tracking_device_registered() is true). When a device is available, the new
capability toggles device use for dirty tracking. The capability is currently
not compatible with the dirty ring interface. At this moment only dirty bitmaps
are supported, as they allow userspace to sync dirty pages from the hardware
(e.g. PML) via the kvm_arch_sync_dirty_log() function. We have yet to add
support for the dirty ring interface, which can sync dirty pages into dirty
rings either from userspace via a new ioctl or from KVM on timer events.
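
To make the bitmap path concrete, here is a rough sketch of how GPAs read from
a tracking device could be folded into a memslot's dirty bitmap during a sync.
The struct and function names are made up for the example (they are not KVM's
struct kvm_memory_slot or its helpers), and 4K pages are assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT    12  /* assumption: 4K pages */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified memslot: one bit per guest page. */
struct sketch_memslot {
	uint64_t base_gfn;
	uint64_t npages;
	unsigned long *dirty_bitmap;
};

/*
 * Set the bitmap bit for every GPA that falls inside the slot.
 * Returns the number of pages that were newly marked dirty.
 */
size_t sketch_sync_gpas(struct sketch_memslot *slot,
			const uint64_t *gpas, size_t n)
{
	size_t marked = 0;

	assert(slot && slot->dirty_bitmap);
	for (size_t i = 0; i < n; i++) {
		uint64_t gfn = gpas[i] >> SKETCH_PAGE_SHIFT;
		uint64_t rel;
		unsigned long mask, *word;

		/* Skip addresses that belong to another slot. */
		if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
			continue;

		rel = gfn - slot->base_gfn;
		word = &slot->dirty_bitmap[rel / SKETCH_BITS_PER_LONG];
		mask = 1UL << (rel % SKETCH_BITS_PER_LONG);
		if (!(*word & mask)) {
			*word |= mask;
			marked++;
		}
	}
	return marked;
}
```

Several writes to the same page collapse into a single bit, so the bitmap
stays the same size no matter how many entries the device reports.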

For the page tracking device to be able to log guest write accesses, this patch
series enables hardware management of the dirty state for stage-2 translations
by 1) setting the VTCR_EL2.HD flag and 2) setting the DBM flag (bit 51) for the
tracked stage-2 descriptors. Currently KVM sets the DBM flag only when faulting
in pages. Thus the first write to a page is logged by KVM as usual, on the write
fault, and subsequent writes to the same page are logged by the page tracking
device. We will optimize this by setting the DBM flag when eagerly splitting
huge pages.
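
The interplay of VTCR_EL2.HD, the DBM bit and the write permission can be
modelled in a few lines. This is a simplified illustration and not code from
the series: the bit positions follow my reading of the Arm ARM (HA is bit 21
and HD is bit 22 of VTCR_EL2, S2AP write permission is descriptor bit 7, DBM is
descriptor bit 51), and the helper names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VTCR_EL2_HA   (UINT64_C(1) << 21) /* hardware Access flag updates */
#define VTCR_EL2_HD   (UINT64_C(1) << 22) /* hardware dirty state, needs HA */
#define S2_PTE_VALID  (UINT64_C(1) << 0)
#define S2_PTE_S2AP_W (UINT64_C(1) << 7)  /* stage-2 write permission */
#define S2_PTE_DBM    (UINT64_C(1) << 51) /* dirty bit modifier */

enum s2_write_result { S2_WRITE_OK, S2_WRITE_FAULT };

/* True when hardware, not a permission fault, handles the next write. */
bool s2_hw_manages_dirty(uint64_t vtcr, uint64_t pte)
{
	return (vtcr & VTCR_EL2_HA) && (vtcr & VTCR_EL2_HD) &&
	       (pte & S2_PTE_VALID) && (pte & S2_PTE_DBM);
}

/*
 * Model of a guest write hitting a stage-2 descriptor:
 *  - already writable: the write goes through;
 *  - write-protected but DBM set with HD enabled: hardware makes the
 *    descriptor writable (dirty) without taking a fault;
 *  - otherwise: permission fault, which KVM handles and logs itself.
 */
enum s2_write_result s2_model_guest_write(uint64_t vtcr, uint64_t *pte)
{
	assert(pte);
	if (!(*pte & S2_PTE_VALID))
		return S2_WRITE_FAULT;
	if (*pte & S2_PTE_S2AP_W)
		return S2_WRITE_OK;
	if (s2_hw_manages_dirty(vtcr, *pte)) {
		*pte |= S2_PTE_S2AP_W;
		return S2_WRITE_OK;
	}
	return S2_WRITE_FAULT;
}
```

In this model the fault-free transition is the one a page tracking device
would have to observe, since KVM itself no longer sees the write.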

An example of a device that tracks accesses to stage-2 translations and will
implement the page_tracking_device interface is the AWS Graviton Page Tracking
Agent (PTA). We'll be posting code for the Graviton PTA device driver in a
separate series of patches.

When the Arm architectural solution (FEAT_HDBSS) is available, we intend to
use it via the same interface, most likely with adaptations.


Lilit Janpoladyan (8):
  arm64: add an interface for stage-2 page tracking
  KVM: arm64: add page tracking device as a capability
  KVM: arm64: use page tracking interface to enable dirty logging
  KVM: return value from kvm_arch_sync_dirty_log
  KVM: arm64: get dirty pages from the page tracking device
  KVM: arm64: flush dirty logging data
  KVM: arm64: enable hardware dirty state management for stage-2
  KVM: arm64: make hardware manage dirty state after write faults

 Documentation/virt/kvm/api.rst         |  17 +++
 arch/arm64/include/asm/kvm_host.h      |   8 ++
 arch/arm64/include/asm/kvm_pgtable.h   |   1 +
 arch/arm64/include/asm/page_tracking.h |  79 +++++++++++++
 arch/arm64/kvm/Kconfig                 |  12 ++
 arch/arm64/kvm/Makefile                |   1 +
 arch/arm64/kvm/arm.c                   | 121 ++++++++++++++++++-
 arch/arm64/kvm/hyp/pgtable.c           |  29 ++++-
 arch/arm64/kvm/mmu.c                   |   8 ++
 arch/arm64/kvm/page_tracking.c         | 158 +++++++++++++++++++++++++
 arch/loongarch/kvm/mmu.c               |   3 +-
 arch/mips/kvm/mips.c                   |  12 +-
 arch/powerpc/kvm/book3s.c              |  12 +-
 arch/powerpc/kvm/booke.c               |  12 +-
 arch/riscv/kvm/mmu.c                   |   3 +-
 arch/s390/kvm/kvm-s390.c               |  13 +-
 arch/x86/kvm/x86.c                     |  21 +++-
 include/linux/kvm_host.h               |   4 +-
 include/uapi/linux/kvm.h               |   1 +
 virt/kvm/kvm_main.c                    |  34 ++++--
 20 files changed, 521 insertions(+), 28 deletions(-)
 create mode 100644 arch/arm64/include/asm/page_tracking.h
 create mode 100644 arch/arm64/kvm/page_tracking.c

Comments

Oliver Upton Sept. 19, 2024, 9:11 a.m. UTC | #1
Hi Lilit,

+cc kvmarm mailing list, get_maintainer is your friend :)

On Wed, Sep 18, 2024 at 03:27:59PM +0000, Lilit Janpoladyan wrote:
> An example of a device that tracks accesses to stage-2 translations and will
> implement page_tracking_device interface is AWS Graviton Page Tracking Agent
> (PTA). We'll be posting code for the Graviton PTA device driver in a separate
> series of patches.

In order to actually review these patches, we need to see an
implementation of such a page tracking device. Otherwise it's hard to
tell that the interface accomplishes the right abstractions.

Beyond that, I have some reservations about maintaining support for
features that cannot actually be tested outside of your own environment.

> When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> use it via the same interface most likely with adaptations.

Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
in hardware?

I think the best way forward here is to implement the architecture, and
hopefully after that your legacy driver can be made to fit the
interface. The FVP implements FEAT_HDBSS, so there's some (slow)
reference hardware to test against.

This is a very interesting feature, so hopefully we can move towards
something workable.

Janpoladyan, Lilit Sept. 20, 2024, 10:12 a.m. UTC | #2
Hi Oliver,

On 19.09.24, 11:12, "Oliver Upton" <oliver.upton@linux.dev> wrote:

> Hi Lilit,


> +cc kvmarm mailing list, get_maintainer is your friend :)


> On Wed, Sep 18, 2024 at 03:27:59PM +0000, Lilit Janpoladyan wrote:
> > An example of a device that tracks accesses to stage-2 translations and will
> > implement page_tracking_device interface is AWS Graviton Page Tracking Agent
> > (PTA). We'll be posting code for the Graviton PTA device driver in a separate
> > series of patches.


> In order to actually review these patches, we need to see an
> implementation of such a page tracking device. Otherwise it's hard to
> tell that the interface accomplishes the right abstractions.

We'll be posting driver patches in the coming weeks; they should explain the
device functionality.


> Beyond that, I have some reservations about maintaining support for
> features that cannot actually be tested outside of your own environment.

I understand, we'll see how we can emulate the functionality and make the
interface testable.

> > When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> > use it via the same interface most likely with adaptations.


> Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
> in hardware?

We'd need to keep the interface for as long as hardware without FEAT_HDBSS
but with PTA is in use, hence the attempt at generalisation.

> I think the best way forward here is to implement the architecture, and
> hopefully after that your legacy driver can be made to fit the
> interface. The FVP implements FEAT_HDBSS, so there's some (slow)
> reference hardware to test against.

Thanks for the idea, we'll test with FVP, but we'd need FEAT_HDBSS
documentation for that. I don't think it's available yet, is it?

> This is a very interesting feature, so hopefully we can move towards
> something workable.


> -- 
> Thanks,
> Oliver

Thanks for the feedback, we'll be working on the discussed points,
Lilit

David Woodhouse Sept. 26, 2024, 10 a.m. UTC | #3
On Thu, 2024-09-19 at 02:11 -0700, Oliver Upton wrote:
> Hi Lilit,
> 
> +cc kvmarm mailing list, get_maintainer is your friend :)
> 
> On Wed, Sep 18, 2024 at 03:27:59PM +0000, Lilit Janpoladyan wrote:
> > An example of a device that tracks accesses to stage-2 translations and will
> > implement page_tracking_device interface is AWS Graviton Page Tracking Agent
> > (PTA). We'll be posting code for the Graviton PTA device driver in a separate
> > series of patches.
> 
> In order to actually review these patches, we need to see an
> implementation of such a page tracking device. Otherwise it's hard to
> tell that the interface accomplishes the right abstractions.

Absolutely. That one is coming soon, but I was chasing the team to post
the API and KVM glue parts as early as possible to kick-start the
discussion, especially about the upcoming architectural solution.

> Beyond that, I have some reservations about maintaining support for
> features that cannot actually be tested outside of your own environment.

That's more about the hardware driver itself which will follow, than
the core API posted here.

I understand the reservation, but I think it's fine. In general, Linux
does support esoteric hardware that not everyone can test every
time. We do sweeping changes across all Ethernet drivers, for example,
and some of those barely even exist any more.

This particular device should be available on bare metal EC2 instances,
of course, but perhaps we should also implement it in QEMU. That would
actually be beneficial for our internal testing anyway, as it would
allow us to catch regressions much earlier in our own development
process.

> > When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> > use it via the same interface most likely with adaptations.
> 
> Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
> in hardware?

I don't think there is a definitive answer to that which is ready to
tape out, but it certainly seems possible that future generations will
eventually move to FEAT_HDBSS, maybe even reaching production by the
end of the decade, at the earliest? And then a decade or two later, the
existing hardware generations might even get retired, yes¹.

¹ #include <forward-looking statement.disclaimer>

> I think the best way forward here is to implement the architecture, and
> hopefully after that your legacy driver can be made to fit the
> interface. The FVP implements FEAT_HDBSS, so there's some (slow)
> reference hardware to test against.

Is there actually any documentation available about FEAT_HDBSS? We've
been asking, but haven't received it. I can find one or two mentions
e.g. https://arm.jonpalmisc.com/2023_09_sysreg/AArch64-hdbssbr_el2 but
nothing particularly useful.

The main reason for posting this series early is to make sure we do all
we can to accommodate FEAT_HDBSS. It's not the *end* of the world if
the kernel-internal API has to be tweaked slightly when FEAT_HDBSS
actually becomes reality in future, but obviously we'd prefer to
support it right from the start.

Oliver Upton Sept. 30, 2024, 5:33 p.m. UTC | #4
On Thu, Sep 26, 2024 at 11:00:39AM +0100, David Woodhouse wrote:
> > Beyond that, I have some reservations about maintaining support for
> > features that cannot actually be tested outside of your own environment.
> 
> That's more about the hardware driver itself which will follow, than
> the core API posted here.
> 
> I understand the reservation, but I think it's fine. In general, Linux
> does support esoteric hardware that not everyone can test every
> time. We do sweeping changes across all Ethernet drivers, for example,
> and some of those barely even exist any more.

Of course, but I think it is also reasonable to say that ethernet
support in the kernel is rather mature with a good variety of hardware.
By comparison, what we have here is a brand new driver interface with
architecture / KVM code, which is pretty rare, with a single
implementation.

I'm perfectly happy to tinker on page tracking interface(s) in the
future w/o testing everything, but I must insist that we have *some*
way of testing the initial infrastructure before even considering taking
it.

> This particular device should be available on bare metal EC2 instances,
> of course, but perhaps we should also implement it in QEMU. That would
> actually be beneficial for our internal testing anyway, as it would
> allow us to catch regressions much earlier in our own development
> process.

QEMU would be interesting, but hardware is always welcome too ;-)

> > > When ARM architectural solution (FEAT_HDBSS feature) is available, we intend to
> > > use it via the same interface most likely with adaptations.
> > 
> > Will the PTA stuff eventually get retired once you get support for FEAT_HDBSS
> > in hardware?
> 
> I don't think there is a definitive answer to that which is ready to
> tape out, but it certainly seems possible that future generations will
> eventually move to FEAT_HDBSS, maybe even reaching production by the
> end of the decade, at the earliest? And then a decade or two later, the
> existing hardware generations might even get retired, yes¹.
> 
> ¹ #include <forward-looking statement.disclaimer>

Well, hopefully that means you folks will look after it then :)

> > I think the best way forward here is to implement the architecture, and
> > hopefully after that your legacy driver can be made to fit the
> > interface. The FVP implements FEAT_HDBSS, so there's some (slow)
> > reference hardware to test against.
> 
> Is there actually any documentation available about FEAT_HDBSS? We've
> been asking, but haven't received it. I can find one or two mentions
> e.g. https://arm.jonpalmisc.com/2023_09_sysreg/AArch64-hdbssbr_el2 but
> nothing particularly useful.

Annoyingly no, the Arm ARM tends to lag the architecture by quite a bit.
The sysreg XML (from which I think this website is derived) gets updated
much more frequently.

> The main reason for posting this series early is to make sure we do all
> we can to accommodate FEAT_HDBSS. It's not the *end* of the world if
> the kernel-internal API has to be tweaked slightly when FEAT_HDBSS
> actually becomes reality in future, but obviously we'd prefer to
> support it right from the start.

Jury is still out on how FEAT_HDBSS is gonna fit with this PTA stuff.
I'm guessing your hardware has some way of disambiguating dirtied
addresses by VMID.

The architected solution, OTOH, is tied to a particular stage-2 MMU
configuration. KVM proper might need to manage the dirty tracking
hardware in that case as it'll need to be context switched on the
vcpu_load() / vcpu_put() boundary.
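
A very rough model of that context switch, for illustration only. It does not
reflect the actual FEAT_HDBSS register semantics, which are not documented in
this thread; the per-VM context, the per-CPU "hardware" state and all names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM tracking context that KVM would own. */
struct sketch_vm_track_ctx {
	uint64_t buf_base;  /* where this VM's dirty entries are written */
	uint32_t prod_idx;  /* next free entry, preserved across switches */
};

/* Hypothetical per-CPU tracking state standing in for system registers. */
struct sketch_cpu_track_hw {
	bool enabled;
	uint64_t buf_base;
	uint32_t prod_idx;
};

/* On vcpu_load(): point the CPU's tracking state at this VM's buffer. */
void sketch_track_load(struct sketch_cpu_track_hw *hw,
		       const struct sketch_vm_track_ctx *ctx)
{
	assert(hw && ctx && !hw->enabled);
	hw->buf_base = ctx->buf_base;
	hw->prod_idx = ctx->prod_idx;
	hw->enabled = true;
}

/* On vcpu_put(): stop tracking and save the progress back into the VM. */
void sketch_track_put(struct sketch_cpu_track_hw *hw,
		      struct sketch_vm_track_ctx *ctx)
{
	assert(hw && ctx && hw->enabled);
	hw->enabled = false;
	ctx->prod_idx = hw->prod_idx;
	hw->buf_base = 0;
	hw->prod_idx = 0;
}
```

The point of the model is only that the tracking state follows the stage-2
MMU context being run, rather than being looked up by VMID after the fact.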