[for-4.7] docs: Feature Levelling feature document

Message ID	1464714345-26571-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> From: Andrew Cooper <andrew.cooper3@citrix.com> To: Xen-devel <xen-devel@lists.xen.org> Date: Tue, 31 May 2016 18:05:45 +0100 Message-ID: <1464714345-26571-1-git-send-email-andrew.cooper3@citrix.com> MIME-Version: 1.0 Cc: Andrew Cooper <andrew.cooper3@citrix.com>, Ian Jackson <Ian.Jackson@eu.citrix.com>, Wei Liu <wei.liu2@citrix.com>, Jan Beulich <JBeulich@suse.com> Subject: [Xen-devel] [PATCH for-4.7] docs: Feature Levelling feature document Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

diff --git a/docs/features/feature-levelling.pandoc b/docs/features/feature-levelling.pandoc new file mode 100644 index 0000000..50bf099 --- /dev/null +++ b/docs/features/feature-levelling.pandoc @@ -0,0 +1,211 @@ +% Feature Levelling +% Draft 1 + +\clearpage + +# Basics + +---------------- ---------------------------------------------------- + Status: **Supported** + + Architecture: x86 + + Component: Hypervisor, toolstack, guest +---------------- ---------------------------------------------------- + + +# Overview + +On native hardware, a kernel will boot, detect features, typically optimise +certain codepaths based on the available features, and expect the features to +remain available until it shuts down. + +The same expectation exists for virtual machines, and it is up to the +hypervisor/toolstack to fulfil this expectation for the lifetime of the +virtual machine, including across migrate/suspend/resume. + + +# User details + +Many factors affect the featureset which a VM may use: + +* The CPU itself +* The BIOS/firmware/microcode version and settings +* The hypervisor version and command line settings +* Further restrictions the toolstack chooses to apply + +A firmware or software upgrade might reduce the available set of features +(e.g. Intel disabling TSX in a microcode update for certain Haswell/Broadwell +processors), as may editing the settings. + +It is unsafe to make any assumption about features remaining consistent across +a host reboot. Xen recalculates all information from scratch each boot, and +provides the information for the toolstack to consume. + +N.B. `xl`, being inherently a single-host toolstack, doesn't make use of these +levelling improvements. These features are of interest to higher level +toolstacks such as `libvirt` or `XAPI`. + + +# Technical details + +The `CPUID` instruction is used by software to query for features. In the +virtualisation usecase, guest software should query Xen rather than hardware +directly. However, `CPUID` is an unprivileged instruction which doesn't +fault, complicating the task of hiding hardware features from guests. + +Important files: + +* Hypervisor + * `xen/arch/x86/cpu/*.c` + * `xen/arch/x86/cpuid.c` + * `xen/include/asm-x86/cpuid-autogen.h` + * `xen/include/public/arch-x86/cpufeatureset.h` + * `xen/tools/gen-cpuid.py` +* `libxc` + * `tools/libxc/xc_cpuid_x86.c` + +## Ability to control CPUID + +### HVM + +HVM guests (using `Intel VT-x` or `AMD SVM`) will unconditionally exit to Xen +on all `CPUID` instructions, allowing Xen full control over all information. + +### PV + +The `CPUID` instruction is unprivileged, so executing it in a PV guest will +not trap, leaving Xen no direct ability to control the information returned. + +### Xen Forced Emulation Prefix + +Xen-aware PV software can make use of the 'Forced Emulation Prefix' + +> `ud2a; .ascii 'xen'; cpuid` + +which Xen recognises as a deliberate attempt to get the fully-controlled +`CPUID` information rather than the hardware-reported information. This only +works with cooperative software. + +### Masking and Override MSRs + +AMD CPUs from the `K8` onwards support _Feature Override_ MSRs, which allow +direct control of the values returned for certain `CPUID` leaves. These MSRs +allow any result to be returned, including the ability to advertise features +which are not actually supported. + +Intel CPUs between `Nehalem` and `SandyBridge` have differing numbers of +_Feature Mask_ MSRs, which are a simple AND-mask applied to all `CPUID` +instructions requesting specific feature bitmap sets. The exact MSRs, and +which feature bitmap sets they affect are hardware specific. These MSRs allow +features to be hidden by clearing the appropriate bit in the mask, but does +not allow unsupported features to be advertised. + +### CPUID Faulting + +Intel CPUs from `IvyBridge` onwards have _CPUID Faulting_, which allows Xen to +cause `CPUID` instruction executed in PV guests to fault. This allows Xen +full control over all information, exactly like HVM guests. + +## Compile time + +As some features depend on other features, it is important that, when +disabling a certain feature, we disable all features which depend on it. This +allows runtime logic to be simplified, by being able to rely on testing only +the single appropriate feature, rather than the entire feature dependency +chain. + +To speed up runtime calculation of feature dependencies, the dependency chain +is calculated and flattened by `xen/tools/gen-cpuid.py` to create +`xen/include/asm-x86/cpuid-autogen.h` from +`xen/include/public/arch-x86/cpufeatureset.h`, allowing the runtime code to +disable all dependent features of a specific disabled feature in constant +time. + +## Host boot + +As Xen boots, it will enumerate the features it can see. This is stored as +the _raw\_featureset_. + +Errata checks and command line arguments are then taken into account to reduce +the _raw\_featureset_ into the _host\_featureset_, which is the set of +features Xen uses. On hardware with masking/override MSRs, the default MSR +values are picked from the _host\_featureset_. + +The _host\_featureset_ is then used to calculate the _pv\_featureset_ and +_hvm\_featureset_, which are the maximum featuresets Xen is willing to offer +to PV and HVM guests respectively. + +In addition, Xen will calculate how much control it has over non-cooperative +PV `CPUID` instructions, storing this information as _levelling\_caps_. + +## Domain creation + +The toolstack can query each of the calculated featureset via +`XEN_SYSCTL_get_cpu_featureset`, and query for the levelling caps via +`XEN_SYSCTL_get_cpu_levelling_caps`. + +These data should be used by the toolstack when choosing the eventual +featureset to offer to the guest. + +Once a featureset has been chosen, it is set (implicitly or explicitly) via +`XEN_DOMCTL_set_cpuid`. Xen will clamp the toolstacks choice to the +appropriate PV or HVM featureset. On hardware with masking/override MSRs, the +guest cpuid policy is reflected in the MSRs, which are context switched with +other vcpu state. + +# Limitations + +A guest which ignores the provided feature information and manually probes for +features will be able to find some of them. e.g. There is no way of forcibly +preventing a guest from using 1GB superpages if the hardware supports it. + +Some information simply cannot be hidden from guests. There is no way to +control certain behaviour such as the hardware MXCSR\_MASK or x87 FPU exception +behaviour. + + +# Testing + +Feature levelling is a very wide area, and used all over the hypervisor. +Please ask on xen-devel for help identifying more specific tests which could +be of use. + + +# Known issues / Areas for improvement + +Xen currently has no concept of per-{socket,core,thread} CPUID information. +As a result, details such as APIC IDs, topology and cache information do not +match real hardware, and do not match the documented expectations in the Intel +and AMD system manuals. + +The CPU feature flags are the only information which the toolstack has a +sensible interface for querying and levelling. Other information in the CPUID +policy is important and should be levelled (e.g. maxphysaddr). + +The CPUID policy is currently regenerated from scratch by the receiving side, +once memory and vcpu content has been restored. This means that the receiving +Xen cannot verify the memory/vcpu content against the CPUID policy, and can +end up running a guest which will subsequently crash. The CPUID policy should +be at the head of the migration stream. + +MSRs are another source of features for guests. There is no general provision +for controlling the available MSRs. E.g. 64bit versions of Windows notice +changes in IA32\_MISC\_ENABLE, and suffer a BSOD 0x109 (Critical Structure +Corruption) + + +# References + +[Intel Flexmigration](http://www.intel.co.uk/content/dam/www/public/us/en/documents/application-notes/virtualization-technology-flexmigration-application-note.pdf) + +[AMD Extended Migration Technology](http://developer.amd.com/wordpress/media/2012/10/43781-3.00-PUB_Live-Virtual-Machine-Migration-on-AMD-processors.pdf) + + +# History + +------------------------------------------------------------------------ +Date Revision Version Notes +---------- -------- -------- ------------------------------------------- +2016-05-31 1 Xen 4.7 Document written +---------- -------- -------- -------------------------------------------

[for-4.7] docs: Feature Levelling feature document

Commit Message

Comments

Patch