[4/4] docs/sphinx: Technical Debt
diff mbox series

Message ID 20191003205623.20839-4-andrew.cooper3@citrix.com
State New
Headers show
Series
  • docs/sphinx
Related show

Commit Message

Andrew Cooper Oct. 3, 2019, 8:56 p.m. UTC
This identifies various of areas technical debt, which either need to be, or
are being worked on, along with enough clarifying details for people to
follow.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Lars Kurth <lars.kurth@citrix.com>
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 docs/conf.py            |  11 +++-
 docs/index.rst          |   8 +++
 docs/misc/tech-debt.rst | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 docs/misc/tech-debt.rst

Patch
diff mbox series

diff --git a/docs/conf.py b/docs/conf.py
index 50e41501db..0d2227f52e 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -53,7 +53,7 @@ 
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = []
+extensions = ['sphinx.ext.extlinks']
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
@@ -192,3 +192,12 @@ 
 
 # A list of files that should not be packed into the epub file.
 epub_exclude_files = ['search.html']
+
+
+# -- Configuration for external links ----------------------------------------
+
+extlinks = {
+    'xen-cs':
+        ('https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=%s',
+         'Xen c/s '),
+}
diff --git a/docs/index.rst b/docs/index.rst
index b8ab13178c..0a2af2db9d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -59,3 +59,11 @@  Miscellanea
 .. toctree::
 
    glossary
+
+Unsorted
+--------
+
+.. toctree::
+   :maxdepth: 2
+
+   misc/tech-debt
diff --git a/docs/misc/tech-debt.rst b/docs/misc/tech-debt.rst
new file mode 100644
index 0000000000..172ba3bd51
--- /dev/null
+++ b/docs/misc/tech-debt.rst
@@ -0,0 +1,130 @@ 
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Technical Debt
+==============
+
+Hypervisor
+----------
+
+CONFIG_PDX
+~~~~~~~~~~
+
+Xen uses the term MFN for Machine Frame Number, which is synonymous with
+Linux's PFN, and maps linearly to system/host/machine physical addresses.
+
+For every page of RAM, a ``struct page_info`` is needed for tracking purposes.
+In the simple case, the frametable is an array of ``struct page_info[]``
+indexed by MFN.
+
+However, this is inefficient when a system has banks of RAM at spread out in
+address space, as a large amount of space is wasted on frametable entries for
+non-existent frames.  This wastes both virtual address space and RAM.
+
+As a consequence, Xen has a compression scheme known as PDX which removes
+unused bits out of the middle of MFNs, to make a more tightly packed Page
+inDeX, which in turn reduces the size of the frametable for system.
+
+At the moment, PDX compression is unconditionally used.
+
+However, PDX compression does come with a cost in terms of the complexity to
+convert between PFNs and pages, which is a common operation in Xen.
+
+Typically, ARM32 systems do have RAM banks in discrete locations, and want to
+use PDX compression, while typically ARM64 and x86 systems have RAM packed
+from 0 with no holes.
+
+The goal of this work is to have ``CONFIG_PDX`` selected by ARM32 only.  This
+requires slightly untangling the memory management code in ARM and x86 to give
+it a clean compile boundary where PDX conversions are used.
+
+
+Waitqueue infrastructure
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Livepatching safety in Xen depends on all CPUs rendezvousing on the return to
+guest path, with no stack frame.  The vCPU waitqueue infrastructure undermines
+this safety by copying a stack frame sideways, and ``longjmp()``\-ing away.
+
+Waitqueues are only used by the introspection/mem_event/paging infrastructure,
+where the design of the rings causes some problems.  There is a single 4k page
+used for the ring, which serves both synchronous requests, and lossless async
+requests.  In practice, introspecting an 11-vcpu guest is sufficient to cause
+the waitqueue infrastructure to start to be used.
+
+A better design of ring would be to have a slot per vcpu for synchronous
+requests (simplifies producing and consuming of requests), and a multipage
+ring buffer (of negotiable size) with lossy semantics for async requests.
+
+A design such as this would guarantee that Xen never has to block waiting for
+userspace to create enough space on the ring for a vcpu to write state out.
+
+.. note::
+
+   There are other aspects of the existing ring infrastructure which are
+   driving a redesign, but these don't relate directly to the waitqueue
+   infrastructure and livepatching safety.
+
+   The most serious problem is that the ring infrastructure is GFN based,
+   which leaves the guest either able to mess with the ring, or a shattered
+   host superpage where the ring used to be, and the guest balloon driver able
+   to prevent the introspection agent from connecting/reconnecting the ring.
+
+As there are multiple compelling reasons to redesign the ring infrastructure,
+the plan is to introduce the new ring ABI, deprecate and remove the old ABI,
+and simply delete the waitqueue infrastructure at that point, rather than try
+to redesign livepatching from scratch in an attempt to cope with unwinding old
+stack frames.
+
+
+Dom0
+----
+
+Remove xenstored's dependencies on unstable interfaces
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Various xenstored implementations use libxc for two purposes.  It would be a
+substantial advantage to move xenstored onto entirely stable interfaces, which
+disconnects it from the internal of the libxc.
+
+1. Foreign mapping of the store ring
+
+   This is obsolete since :xen-cs:`6a2de353a9` (2012) which allocated grant
+   entries instead, to allow xenstored to function as a stub-domain without dom0
+   permissions.  :xen-cs:`38eeb3864d` dropped foreign mapping for cxenstored.
+   However, there are no OCaml bindings for libxengnttab.
+
+   Work Items:
+
+   * Minimal ``tools/ocaml/libs/xg/`` binding for ``tools/libs/gnttab/``.
+   * Replicate :xen-cs:`38eeb3864d` for oxenstored as well.
+
+2. Figuring out which domain(s) have gone away
+
+   Currently, the handling of domains is asymmetric.
+
+   * When a domain is created, the toolstack explicitly sends an
+     ``XS_INTRODUCE(domid, store mfn, store evtchn)`` message to xenstored, to
+     cause xenstored to connect to the guest ring, and fire the
+     ``@introduceDomain`` watch.
+
+   * When a domain is destroyed, Xen fires ``VIRQ_DOM_EXC`` which is bound by
+     xenstored, rather than the toolstack.  xenstored updates its idea of the
+     status of domains, and fires the ``@releaseDomain`` watch.
+
+     Xenstored uses ``xc_domain_getinfo()``, to work out which domain(s) have gone
+     away, and only cares about the shutdown status.
+
+     Furthermore, ``@releaseDomain`` (like ``VIRQ_DOM_EXC``) is a single-bit
+     message, which requires all listeners to evaluate whether the message applies
+     to them or not.  This results in a flurry of ``xc_domain_getinfo()`` calls
+     from multiple entities in the system, which all serialise on the domctl lock
+     in Xen.
+
+     Work Items:
+
+     * Figure out how shutdown status can be expressed in a stable way from Xen.
+     * Figure out if ``VIRQ_DOM_EXC`` and ``@releaseDomain`` can be extended
+       or superseded to carry at least a domid, to make domain shutdown scale
+       better.
+     * Figure out if ``VIRQ_DOM_EXC`` would better be bound by the toolstack,
+       rather than xenstored.