@@ -53,7 +53,7 @@
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
-extensions = []
+extensions = ['sphinx.ext.extlinks']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@@ -192,3 +192,12 @@
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
+
+
+# -- Configuration for external links ----------------------------------------
+
+extlinks = {
+ 'xen-cs':
+ ('https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=%s',
+ 'Xen c/s '),
+}
@@ -59,3 +59,11 @@ Miscellanea
.. toctree::
glossary
+
+Unsorted
+--------
+
+.. toctree::
+ :maxdepth: 2
+
+ misc/tech-debt
new file mode 100644
@@ -0,0 +1,130 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Technical Debt
+==============
+
+Hypervisor
+----------
+
+CONFIG_PDX
+~~~~~~~~~~
+
+Xen uses the term MFN for Machine Frame Number, which is synonymous with
+Linux's PFN, and maps linearly to system/host/machine physical addresses.
+
+For every page of RAM, a ``struct page_info`` is needed for tracking purposes.
+In the simple case, the frametable is an array of ``struct page_info[]``
+indexed by MFN.
+
+However, this is inefficient when a system has banks of RAM spread out across
+the address space, as a large amount of space is wasted on frametable entries
+for non-existent frames. This wastes both virtual address space and RAM.
+
+As a consequence, Xen has a compression scheme known as PDX, which removes
+unused bits from the middle of MFNs to make a more tightly packed Page
+inDeX, which in turn reduces the size of the frametable on such systems.
+
+At the moment, PDX compression is unconditionally used.
+
+However, PDX compression does come with a cost, in the form of extra complexity
+when converting between MFNs and pages, which is a common operation in Xen.
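+
+The mapping and its conversion cost can be illustrated with a simplified model
+of single-hole compression. This is a hypothetical sketch, not Xen's actual
+implementation: the structure, the function names and the example memory
+layout are assumptions chosen for illustration. It assumes that one contiguous
+range of MFN bits is zero for all RAM; those bits are squeezed out to form the
+PDX, and re-inserted to recover the MFN.
+
+.. code-block:: c
+
+   #include <assert.h>
+   #include <stdint.h>
+   #include <stdio.h>
+
+   /*
+    * Hypothetical single-hole PDX model: bits [hole_start, hole_start +
+    * hole_width) are assumed to be zero in every valid MFN.
+    */
+   struct pdx_params {
+       uint64_t bottom_mask;    /* MFN bits below the hole */
+       uint64_t top_mask;       /* MFN bits above the hole */
+       unsigned int hole_shift; /* width of the hole, in bits */
+   };
+
+   static struct pdx_params pdx_init(unsigned int hole_start,
+                                     unsigned int hole_width)
+   {
+       struct pdx_params p;
+
+       p.bottom_mask = (UINT64_C(1) << hole_start) - 1;
+       p.top_mask = ~UINT64_C(0) << (hole_start + hole_width);
+       p.hole_shift = hole_width;
+       return p;
+   }
+
+   /* Compress: keep the low bits and shift the high bits down over the hole. */
+   static uint64_t mfn_to_pdx(const struct pdx_params *p, uint64_t mfn)
+   {
+       return (mfn & p->bottom_mask) | ((mfn & p->top_mask) >> p->hole_shift);
+   }
+
+   /* Decompress: shift the high bits back up, re-inserting the zero hole. */
+   static uint64_t pdx_to_mfn(const struct pdx_params *p, uint64_t pdx)
+   {
+       return (pdx & p->bottom_mask) | ((pdx << p->hole_shift) & p->top_mask);
+   }
+
+   int main(void)
+   {
+       /*
+        * Example layout (assumed): two 1GiB banks of 4k pages, at 0 and at
+        * 64GiB. Valid MFNs only use bits 0-17 and bit 24, so bits 18-23
+        * form the hole.
+        */
+       struct pdx_params p = pdx_init(18, 6);
+       uint64_t mfn = (UINT64_C(1) << 24) + 5;  /* 6th page of the upper bank */
+       uint64_t pdx = mfn_to_pdx(&p, mfn);
+
+       assert(pdx_to_mfn(&p, pdx) == mfn);      /* conversion round-trips */
+       printf("mfn %#llx -> pdx %#llx\n",
+              (unsigned long long)mfn, (unsigned long long)pdx);
+       return 0;
+   }
+
+For the assumed layout (two 1GiB banks, at 0 and at 64GiB), the program prints
+``mfn 0x1000005 -> pdx 0x40005``, and a PDX-indexed frametable needs 2^19
+entries rather than 2^24 + 2^18. In exchange, every conversion between MFNs
+and frametable entries pays for the extra masks and shift.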
+
+Typically, ARM32 systems have RAM banks in discrete locations and want to use
+PDX compression, whereas ARM64 and x86 systems typically have RAM packed from
+0 with no holes.
+
+The goal of this work is to have ``CONFIG_PDX`` selected by ARM32 only. This
+requires slightly untangling the memory management code in ARM and x86 to give
+it a clean compile boundary where PDX conversions are used.
+
+
+Waitqueue infrastructure
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Livepatching safety in Xen depends on all CPUs rendezvousing on the return to
+guest path, with no stack frame. The vCPU waitqueue infrastructure undermines
+this safety by copying a stack frame sideways, and ``longjmp()``\-ing away.
+
+Waitqueues are only used by the introspection/mem_event/paging infrastructure,
+where the design of the rings causes some problems. There is a single 4k page
+used for the ring, which serves both synchronous requests and lossless async
+requests. In practice, introspecting an 11-vcpu guest is sufficient to cause
+the waitqueue infrastructure to be used.
+
+A better ring design would be to have a slot per vcpu for synchronous
+requests (which simplifies producing and consuming them), and a multipage
+ring buffer (of negotiable size) with lossy semantics for async requests.
+
+A design such as this would guarantee that Xen never has to block waiting for
+userspace to create enough space on the ring for a vcpu to write state out.
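+
+The proposed split between per-vcpu synchronous slots and a lossy async ring
+can be sketched as a hypothetical, single-threaded model. The structure and
+function names, the event layout, the ring size and the drop-newest policy for
+the lossy ring are all assumptions for illustration; a real shared-memory ring
+would also need memory barriers and an event channel notification. The property
+being shown is that neither path ever waits for the consumer to free up space.
+
+.. code-block:: c
+
+   #include <assert.h>
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   #define NR_VCPUS    4   /* hypothetical guest size */
+   #define ASYNC_SLOTS 8   /* hypothetical; negotiable in the proposal */
+
+   struct event {          /* hypothetical request layout */
+       uint32_t vcpu;
+       uint32_t reason;
+   };
+
+   /* One synchronous slot per vCPU. */
+   struct sync_slot {
+       bool pending;       /* set by Xen, cleared by the agent on reply */
+       struct event ev;
+   };
+
+   /* Lossy ring for async events, with free-running indices. */
+   struct async_ring {
+       uint32_t prod, cons;
+       uint32_t dropped;   /* events discarded because the ring was full */
+       struct event ring[ASYNC_SLOTS];
+   };
+
+   struct vm_event_iface {
+       struct sync_slot sync[NR_VCPUS];
+       struct async_ring async;
+   };
+
+   /*
+    * The requesting vCPU stays paused until the agent replies, so its own
+    * slot is always free when it raises a new request.
+    */
+   static void send_sync(struct vm_event_iface *vi, uint32_t vcpu,
+                         uint32_t reason)
+   {
+       struct sync_slot *s = &vi->sync[vcpu];
+
+       assert(!s->pending);
+       s->ev = (struct event){ .vcpu = vcpu, .reason = reason };
+       s->pending = true;
+   }
+
+   /* Never blocks: if the ring is full, the event is counted and dropped. */
+   static bool send_async(struct async_ring *r, struct event ev)
+   {
+       if (r->prod - r->cons == ASYNC_SLOTS) {
+           r->dropped++;
+           return false;
+       }
+       r->ring[r->prod % ASYNC_SLOTS] = ev;
+       r->prod++;
+       return true;
+   }
+
+   int main(void)
+   {
+       static struct vm_event_iface vi;   /* zero-initialised */
+       unsigned int i, sent = 0;
+
+       for (i = 0; i < NR_VCPUS; i++)
+           send_sync(&vi, i, 1);          /* every vCPU has a request in flight */
+
+       for (i = 0; i < 10; i++)           /* nothing consumes: 2 events are lost */
+           sent += send_async(&vi.async, (struct event){ .vcpu = 0, .reason = 2 });
+
+       assert(sent == ASYNC_SLOTS && vi.async.dropped == 2);
+       return 0;
+   }
+
+Counting dropped events, rather than blocking, lets the introspection agent
+detect that it fell behind without Xen ever having to park a vCPU on a
+waitqueue.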
+
+.. note::
+
+ There are other aspects of the existing ring infrastructure which are
+ driving a redesign, but these don't relate directly to the waitqueue
+ infrastructure and livepatching safety.
+
+ The most serious problem is that the ring infrastructure is GFN-based. This
+ leaves either the guest able to interfere with the ring, or a shattered host
+ superpage where the ring used to be, with the guest balloon driver able to
+ prevent the introspection agent from connecting or reconnecting the ring.
+
+As there are multiple compelling reasons to redesign the ring infrastructure,
+the plan is to introduce the new ring ABI, deprecate and remove the old ABI,
+and simply delete the waitqueue infrastructure at that point, rather than try
+to redesign livepatching from scratch in an attempt to cope with unwinding old
+stack frames.
+
+
+Dom0
+----
+
+Remove xenstored's dependencies on unstable interfaces
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Various xenstored implementations use libxc for two purposes. It would be a
+substantial advantage to move xenstored onto entirely stable interfaces, which
+would disconnect it from the internals of libxc.
+
+1. Foreign mapping of the store ring
+
+   This has been obsolete since :xen-cs:`6a2de353a9` (2012), which allocated
+   grant entries instead, to allow xenstored to function as a stub-domain
+   without dom0 permissions. :xen-cs:`38eeb3864d` dropped foreign mapping for
+   cxenstored. However, there are no OCaml bindings for libxengnttab.
+
+   Work Items:
+
+   * Minimal ``tools/ocaml/libs/xg/`` binding for ``tools/libs/gnttab/``.
+   * Replicate :xen-cs:`38eeb3864d` for oxenstored as well.
+
+2. Figuring out which domain(s) have gone away
+
+   Currently, the handling of domains is asymmetric.
+
+   * When a domain is created, the toolstack explicitly sends an
+     ``XS_INTRODUCE(domid, store mfn, store evtchn)`` message to xenstored,
+     causing xenstored to connect to the guest ring and fire the
+     ``@introduceDomain`` watch.
+
+   * When a domain is destroyed, Xen fires ``VIRQ_DOM_EXC``, which is bound by
+     xenstored rather than the toolstack. xenstored updates its view of the
+     status of domains and fires the ``@releaseDomain`` watch.
+
+   Xenstored uses ``xc_domain_getinfo()`` to work out which domain(s) have
+   gone away, and only cares about the shutdown status.
+
+   Furthermore, ``@releaseDomain`` (like ``VIRQ_DOM_EXC``) is a single-bit
+   message, which requires all listeners to evaluate whether the message
+   applies to them or not. This results in a flurry of ``xc_domain_getinfo()``
+   calls from multiple entities in the system, which all serialise on the
+   domctl lock in Xen.
+
+   Work Items:
+
+   * Figure out how shutdown status can be expressed in a stable way from Xen.
+   * Figure out if ``VIRQ_DOM_EXC`` and ``@releaseDomain`` can be extended
+     or superseded to carry at least a domid, to make domain shutdown scale
+     better.
+   * Figure out if ``VIRQ_DOM_EXC`` would be better bound by the toolstack,
+     rather than xenstored.
This identifies various areas of technical debt, which either need to be, or
are being, worked on, along with enough clarifying details for people to
follow.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Lars Kurth <lars.kurth@citrix.com>
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
 docs/conf.py            |  11 +++-
 docs/index.rst          |   8 +++
 docs/misc/tech-debt.rst | 130 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 docs/misc/tech-debt.rst