[v1,01/13] docs: create Memory Bandwidth Allocation (MBA) feature document

Message ID 1502264512-4648-2-git-send-email-yi.y.sun@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Yi Sun Aug. 9, 2017, 7:41 a.m. UTC
This patch creates the MBA feature document in docs/features/. It describes
the key points of implementing MBA, which is described in detail in the Intel
SDM section "Introduction to Memory Bandwidth Allocation".

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
---
v1:
    - remove a special character to avoid the error when building pandoc.
---
 docs/features/intel_psr_mba.pandoc | 247 +++++++++++++++++++++++++++++++++++++
 1 file changed, 247 insertions(+)
 create mode 100644 docs/features/intel_psr_mba.pandoc

Comments

Chao Peng Aug. 14, 2017, 7:35 a.m. UTC | #1
> +     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
> +     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
> +     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
> +     delay applied) by HW automatically.

Not sure if everyone understands 'HW'; if not, I prefer 'Hardware'. If
you change this, it should be replaced in all places throughout the document.
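
As a rough illustration of the linear-mode rounding described in the quoted
text, here is a minimal sketch. The helper name is hypothetical and not part
of the patch, and the handling of an MBA_MAX of 100 or more is an assumption.
It only models what the hardware is described as doing automatically.

```c
/* Hypothetical helper, not part of the patch: models the linear-mode
 * rounding that the document says the hardware applies. */
static unsigned int mba_linear_round_down(unsigned int thrtl,
                                          unsigned int mba_max)
{
    unsigned int step;

    /* Assumption: with MBA_MAX >= 100 there is no usable precision step,
     * so the value is returned unchanged. */
    if (mba_max >= 100)
        return thrtl;

    /* Input precision as defined in the document: 100 - MBA_MAX. */
    step = 100 - mba_max;

    /* Round down to the nearest multiple of the precision. */
    return thrtl - (thrtl % step);
}
```

With MBA_MAX = 90 the step is 10, so a requested 12 is rounded down to 10,
matching the example in the document.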


> +  When context switch happens, the COS ID of VCPU is written to per-thread MSR

COS ID is per-domain rather than per-vCPU at this time. So 'COS ID of
domain' is more accurate.

> +  `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according
> +  to the throttling value stored in the COS register.

There is in fact no COS register. COS exists only as a concept.

> +For example:
> +    root@:~$ xl psr-hwinfo --mba
> +    Memory Bandwidth Allocation (MBA):
> +    Socket ID       : 0
> +    Linear Mode     : Enabled
> +    Maximum COS     : 7
> +    Maximum Throttling Value: 90
> +    Default Throttling Value: 0
> +
> +    root@:~$ xl psr-mba-set 1 0xa
> +
> +    root@:~$ xl psr-mba-show 1
> +    Socket ID       : 0
> +    Default THRTL   : 0
> +       ID                     NAME            THRTL
> +        1                 ubuntu14             0xa
> +
> +# Areas for improvement
> +
> +A hexadecimal number is used to show THRTL for a domain now. It may not be user-
> +friendly.
> +
> +To improve this, the libxl interfaces can be wrapped in libvirt to provide more
> +user-friendly interfaces to the user, e.g. showing a percentage in linear
> +mode.

I suggest we do this even for 'xl psr-mba-show', as we know whether we are
in linear mode or not. A hex number is just not easy to understand for
people. And for 'xl psr-mba-set' it is also much more straightforward to set a
percentage number in linear mode.

Chao
Yi Sun Aug. 14, 2017, 8:23 a.m. UTC | #2
On 17-08-14 15:35:38, Chao Peng wrote:
> 
> > +     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
> > +     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
> > +     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
> > +     delay applied) by HW automatically.
> 
> Not sure if everyone understands 'HW'; if not, I prefer 'Hardware'. If
> you change this, it should be replaced in all places throughout the document.
> 
We may explain 'HW' in 'Terminology'.

> 
> > +  When context switch happens, the COS ID of VCPU is written to per-thread MSR
> 
> COS ID is per-domain rather than per-vCPU at this time. So 'COS ID of
> domain' is more accurate.
> 
Yes, thanks.

> > +  `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according
> > +  to the throttling value stored in the COS register.
> 
> There is in fact no COS register. COS exists only as a concept.
> 
Ok, I will state the formal register name defined in SDM here.
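
For reference, the per-COS throttling MSRs listed in the patch's table
(`IA32_L2_QOS_Ext_BW_Thrtl_n` at 0xD50+n, n<64) could be addressed as in the
minimal sketch below. The macro and function names are hypothetical and are
not Xen's; returning 0 for an out-of-range COS is an assumption made only for
this illustration.

```c
#include <stdint.h>

/* Values taken from the table in the patch; names are hypothetical. */
#define MBA_THRTL_MSR_BASE   0xD50u
#define MBA_THRTL_MSR_COUNT  64u

/* Return the MSR address holding the throttling value for a COS,
 * or 0 when the COS is outside the documented range (n < 64). */
static uint32_t mba_thrtl_msr(unsigned int cos)
{
    if (cos >= MBA_THRTL_MSR_COUNT)
        return 0;

    return MBA_THRTL_MSR_BASE + cos;
}
```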

> > +For example:
> > +    root@:~$ xl psr-hwinfo --mba
> > +    Memory Bandwidth Allocation (MBA):
> > +    Socket ID       : 0
> > +    Linear Mode     : Enabled
> > +    Maximum COS     : 7
> > +    Maximum Throttling Value: 90
> > +    Default Throttling Value: 0
> > +
> > +    root@:~$ xl psr-mba-set 1 0xa
> > +
> > +    root@:~$ xl psr-mba-show 1
> > +    Socket ID       : 0
> > +    Default THRTL   : 0
> > +       ID                     NAME            THRTL
> > +        1                 ubuntu14             0xa
> > +
> > +# Areas for improvement
> > +
> > +A hexadecimal number is used to show THRTL for a domain now. It may not be user-
> > +friendly.
> > +
> > +To improve this, the libxl interfaces can be wrapped in libvirt to provide more
> > +user-friendly interfaces to the user, e.g. showing a percentage in linear
> > +mode.
> 
> I suggest we do this even for 'xl psr-mba-show', as we know whether we are
> in linear mode or not. A hex number is just not easy to understand for

So your suggestion is to show a decimal value in linear mode, right? What about
non-linear mode: still show a hexadecimal value?

> people. And for 'xl psr-mba-set' it is also much more straightforward to set a
> percentage number in linear mode.
> 
For set, there is no such limitation: the user can input either a decimal or a
hexadecimal value.
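
Accepting both notations can be sketched with the standard `strtoul` and
base 0, which auto-detects a "0x" prefix. This is only an illustration under
that assumption, not necessarily how xl parses the argument, and
`parse_thrtl` is a hypothetical name.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a throttling value given as decimal ("10")
 * or hexadecimal ("0xa"). Returns 0 on success, -1 on invalid input. */
static int parse_thrtl(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    /* strtoul silently negates "-1", so reject a leading minus sign. */
    if (s == NULL || out == NULL || *s == '-')
        return -1;

    errno = 0;
    /* Base 0: "0x" prefix means hex, leading "0" means octal, else decimal. */
    v = strtoul(s, &end, 0);
    if (errno != 0 || end == s || *end != '\0')
        return -1;

    *out = v;
    return 0;
}
```

Note that base 0 also treats a leading "0" as octal, so "010" parses as 8.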

> Chao
Chao Peng Aug. 14, 2017, 9:36 a.m. UTC | #3
> > > +
> > > +# Areas for improvement
> > > +
> > > +A hexadecimal number is used to show THRTL for a domain now. It may not be user-
> > > +friendly.
> > > +
> > > +To improve this, the libxl interfaces can be wrapped in libvirt to provide more
> > > +user-friendly interfaces to the user, e.g. showing a percentage in linear
> > > +mode.
> > 
> > I suggest we do this even for 'xl psr-mba-show', as we know whether we are
> > in linear mode or not. A hex number is just not easy to understand for
> 
> So your suggestion is to show a decimal value in linear mode, right? What about
> non-linear mode: still show a hexadecimal value?

Exactly.
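
The agreed display rule (decimal in linear mode, hexadecimal otherwise) could
look roughly like the sketch below. `format_thrtl` and its parameters are
hypothetical, and the actual output format of 'xl psr-mba-show' is up to the
implementation.

```c
#include <stdio.h>

/* Hypothetical helper: render a throttling value as decimal when the
 * hardware is in linear mode, and as hexadecimal otherwise.
 * Returns the value of snprintf(). */
static int format_thrtl(char *buf, size_t len, unsigned int thrtl, int linear)
{
    if (linear)
        return snprintf(buf, len, "%u", thrtl);

    return snprintf(buf, len, "0x%x", thrtl);
}
```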

Chao
Wei Liu Aug. 15, 2017, 10:08 a.m. UTC | #4
On Wed, Aug 09, 2017 at 03:41:40PM +0800, Yi Sun wrote:
> +# Overview
> +
> +The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
> +control over memory bandwidth available per-core. This feature provides OS/
> +hypervisor the ability to slow misbehaving apps/domains or create advanced
> +closed-loop control system via exposing control over a credit-based throttling
> +mechanism.
> +
> +# User details
> +
> +* Feature Enabling:
> +
> +  Add "psr=mba" to boot line parameter to enable MBA feature.
> +
> +* xl interfaces:
> +
> +  1. `psr-mba-show [domain-id]`:
> +
> +     Show memory bandwidth throttling for domain.
> +
> +  2. `psr-mba-set [OPTIONS] domain-id throttling`:
> +

When specifying arguments to a command, we normally use the form
<mandatory_argument> and [optional_argument].
Yi Sun Aug. 16, 2017, 2:51 a.m. UTC | #5
On 17-08-15 11:08:32, Wei Liu wrote:
> On Wed, Aug 09, 2017 at 03:41:40PM +0800, Yi Sun wrote:
> > +# Overview
> > +
> > +The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
> > +control over memory bandwidth available per-core. This feature provides OS/
> > +hypervisor the ability to slow misbehaving apps/domains or create advanced
> > +closed-loop control system via exposing control over a credit-based throttling
> > +mechanism.
> > +
> > +# User details
> > +
> > +* Feature Enabling:
> > +
> > +  Add "psr=mba" to boot line parameter to enable MBA feature.
> > +
> > +* xl interfaces:
> > +
> > +  1. `psr-mba-show [domain-id]`:
> > +
> > +     Show memory bandwidth throttling for domain.
> > +
> > +  2. `psr-mba-set [OPTIONS] domain-id throttling`:
> > +
> 
> When specifying arguments to a command, we normally use the form
> <mandatory_argument> and [optional_argument].

Got it, thanks! Will change these.

Patch

diff --git a/docs/features/intel_psr_mba.pandoc b/docs/features/intel_psr_mba.pandoc
new file mode 100644
index 0000000..7a42edf
--- /dev/null
+++ b/docs/features/intel_psr_mba.pandoc
@@ -0,0 +1,247 @@ 
+% Intel Memory Bandwidth Allocation (MBA) Feature
+% Revision 1.2
+
+\clearpage
+
+# Basics
+
+---------------- ----------------------------------------------------
+         Status: **Tech Preview**
+
+Architecture(s): Intel x86
+
+   Component(s): Hypervisor, toolstack
+
+       Hardware: MBA is supported on Skylake Server and beyond
+---------------- ----------------------------------------------------
+
+# Terminology
+
+* CAT         Cache Allocation Technology
+* CBM         Capacity BitMasks
+* CDP         Code and Data Prioritization
+* COS/CLOS    Class of Service
+* MBA         Memory Bandwidth Allocation
+* MSRs        Machine Specific Registers
+* PSR         Intel Platform Shared Resource
+* THRTL       Throttle value or delay value
+
+# Overview
+
+The Memory Bandwidth Allocation (MBA) feature provides indirect and approximate
+control over memory bandwidth available per-core. This feature provides OS/
+hypervisor the ability to slow misbehaving apps/domains or create advanced
+closed-loop control system via exposing control over a credit-based throttling
+mechanism.
+
+# User details
+
+* Feature Enabling:
+
+  Add "psr=mba" to boot line parameter to enable MBA feature.
+
+* xl interfaces:
+
+  1. `psr-mba-show [domain-id]`:
+
+     Show memory bandwidth throttling for domain.
+
+  2. `psr-mba-set [OPTIONS] domain-id throttling`:
+
+     Set memory bandwidth throttling for domain.
+
+     Options:
+     '-s': Specify the socket to process, otherwise all sockets are processed.
+
+     The throttling value set in the register indicates how much memory bandwidth
+     is blocked, i.e. a higher throttling value results in lower bandwidth. The
+     max throttling value can be obtained through CPUID.
+
+     The response of the throttling value could be linear mode or non-linear
+     mode.
+
+     Linear mode: the input precision is defined as 100-(MBA_MAX). For instance,
+     if the MBA_MAX value is 90, the input precision is 10%. Values not an even
+     multiple of the precision (e.g., 12%) will be rounded down (e.g., to 10%
+     delay applied) by HW automatically.
+
+     Non-linear mode: input delay values are powers-of-two from zero to the
+     MBA_MAX value from CPUID. In this case any values not a power of two will
+     be rounded down to the next nearest power of two by HW automatically.
+
+# Technical details
+
+MBA is one of the Intel PSR features and shares the base PSR infrastructure
+in Xen.
+
+## Hardware perspective
+
+  MBA defines a range of MSRs to support specifying a delay value (Thrtl) per
+  COS, with details below.
+
+  ```
+   +----------------------------+----------------+
+   | MSR (per socket)           |    Address     |
+   +----------------------------+----------------+
+   | IA32_L2_QOS_Ext_BW_Thrtl_0 |     0xD50      |
+   +----------------------------+----------------+
+   | ...                        |  ...           |
+   +----------------------------+----------------+
+   | IA32_L2_QOS_Ext_BW_Thrtl_n | 0xD50+n (n<64) |
+   +----------------------------+----------------+
+  ```
+
+  When context switch happens, the COS ID of VCPU is written to per-thread MSR
+  `IA32_PQR_ASSOC`, and then hardware enforces bandwidth allocation according
+  to the throttling value stored in the COS register.
+
+## The relationship between MBA and CAT/CDP
+
+  Generally speaking, MBA is completely independent of CAT/CDP, and any
+  combination may be applied at any time, e.g. enabling MBA with CAT
+  disabled.
+
+  Note, however, that MBA shares the COS infrastructure with CAT,
+  although MBA is enumerated by a different CPUID leaf from CAT (which
+  indicates that the max COS of MBA may be different from CAT). In some
+  cases, a domain is permitted to have a COS that is beyond the max of one
+  (or more) PSR features but within that of the others. For instance, assume
+  the max COS of MBA is 8 but the max COS of L3 CAT is 16: when a domain is
+  assigned 9 as COS, the L3 CAT CBM associated with COS 9 is enforced, but for
+  MBA the HW behaves as if the default value were set, since COS 9 is beyond
+  the max COS (8) of MBA.
+
+## Design Overview
+
+* Core COS/Thrtl association
+
+  When enforcing Memory Bandwidth Allocation, all cores of domains have
+  the same default COS (COS0) which stores the same Thrtl (0). The default
+  COS is used only in hypervisor and is transparent to tool stack and user.
+
+  The system administrator can change the PSR allocation policy at runtime
+  via the tool stack. Since MBA shares COS with CAT/CDP, a COS corresponds to
+  a 2-tuple, like [CBM, Thrtl], with only CAT enabled; when CDP is enabled,
+  the COS corresponds to a 3-tuple, like [Code_CBM, Data_CBM, Thrtl]. If
+  neither CAT nor CDP is enabled, things are simpler: one COS
+  corresponds to one Thrtl.
+
+* VCPU schedule
+
+  This part reuses CAT COS infrastructure.
+
+* Multi-sockets
+
+  Different sockets may have different MBA ability (like max COS)
+  although it is consistent on the same socket. So the capability
+  of per-socket MBA is specified.
+
+  This part reuses CAT COS infrastructure.
+
+## Implementation Description
+
+* Hypervisor interfaces:
+
+  1. Boot line param: "psr=mba" to enable the feature.
+
+  2. SYSCTL:
+          - XEN_SYSCTL_PSR_MBA_get_info: Get system MBA information.
+
+  3. DOMCTL:
+          - XEN_DOMCTL_PSR_MBA_OP_GET_THRTL: Get throttling for a domain.
+          - XEN_DOMCTL_PSR_MBA_OP_SET_THRTL: Set throttling for a domain.
+
+* xl interfaces:
+
+  1. psr-mba-show [domain-id]
+          Show system/domain runtime MBA throttling value.
+          => XEN_SYSCTL_PSR_MBA_get_info/XEN_DOMCTL_PSR_MBA_OP_GET_THRTL
+
+  2. psr-mba-set [OPTIONS] domain-id throttling
+          Set bandwidth throttling for a domain.
+          => XEN_DOMCTL_PSR_MBA_OP_SET_THRTL
+
+  3. psr-hwinfo
+          Show PSR HW information, including L3 CAT/CDP/L2 CAT/MBA.
+          => XEN_SYSCTL_PSR_MBA_get_info
+
+* Key data structure:
+
+  1. Feature HW info
+
+     ```
+     struct {
+         unsigned int thrtl_max;
+         unsigned int linear;
+     } mba_info;
+     ```
+
+     - Member `thrtl_max`
+
+       `thrtl_max` is the max throttling value that can be set.
+
+     - Member `linear`
+
+       `linear` indicates whether the response of the delay value is linear.
+
+     MBA is one of the Intel PSR features and shares the base PSR infrastructure
+     in Xen. For example, 'cos_max' is a common HW property for all features.
+     For other data structure details, refer to 'intel_psr_cat_cdp.pandoc'.
+
+# Limitations
+
+MBA only works on HW that supports it (check via CPUID).
+
+# Testing
+
+The following commands can be used to verify MBA on any HW that supports it.
+
+For example:
+    root@:~$ xl psr-hwinfo --mba
+    Memory Bandwidth Allocation (MBA):
+    Socket ID       : 0
+    Linear Mode     : Enabled
+    Maximum COS     : 7
+    Maximum Throttling Value: 90
+    Default Throttling Value: 0
+
+    root@:~$ xl psr-mba-set 1 0xa
+
+    root@:~$ xl psr-mba-show 1
+    Socket ID       : 0
+    Default THRTL   : 0
+       ID                     NAME            THRTL
+        1                 ubuntu14             0xa
+
+# Areas for improvement
+
+A hexadecimal number is used to show THRTL for a domain now. It may not be user-
+friendly.
+
+To improve this, the libxl interfaces can be wrapped in libvirt to provide more
+user-friendly interfaces to the user, e.g. showing a percentage in linear
+mode.
+
+# Known issues
+
+N/A
+
+# References
+
+"INTEL RESOURCE DIRECTOR TECHNOLOGY (INTEL RDT) ALLOCATION FEATURES" [Intel 64 and IA-32 Architectures Software Developer Manuals, vol3](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
+
+# History
+
+------------------------------------------------------------------------
+Date       Revision Version  Notes
+---------- -------- -------- -------------------------------------------
+2017-01-10 1.0      Xen 4.9  Design document written
+2017-07-10 1.1      Xen 4.10 Changes:
+                             1. Modify data structure according to latest
+                                codes;
+                             2. Add content for 'Areas for improvement';
+                             3. Other minor changes.
+2017-08-09 1.2      Xen 4.10 Changes:
+                             1. Remove a special character to avoid error when
+                                building pandoc.
+---------- -------- -------- -------------------------------------------