[RFC,0/9] CXL 2.0 Support

Message ID 20201111054356.793390-1-ben.widawsky@intel.com

Message

Ben Widawsky Nov. 11, 2020, 5:43 a.m. UTC
Introduce support for “type-3” memory devices defined in the recently released
Compute Express Link (CXL) 2.0 specification. Specifically, these are the memory
devices defined by section 8.2.8.5 of the CXL 2.0 spec. A reference
implementation emulating these devices is being submitted to the QEMU mailing
list. A “type-3” device is a CXL device that acts as a memory expander for RAM
or PMEM. It might be interleaved with other CXL devices in a given physical
address range.

These changes allow for foundational enumeration of CXL 2.0 memory devices. The
functionality present is:
- Initial driver bring-up
- Device enumeration and an initial sysfs representation
- Submission of a basic firmware command via the ‘mailbox’ to an emulated memory
  device with non-volatile capacity.

Some of the functionality that is still missing includes:
- Memory interleaving at the host bridge, root port, or switch level
- CXL 1.1 Root Complex Integrated Endpoint Support
- CXL 2.0 Hot plug support

In addition to the core functionality of discovering the spec defined registers
and resources, introduce a CXL device model that will be the foundation for
translating CXL capabilities into existing Linux infrastructure for Persistent
Memory and other memory devices. For now, this only includes support for the
management command mailbox that type-3 devices surface. These control devices
fill the role of “DIMMs” / nmemX memory-devices in LIBNVDIMM terms.

While implementing the driver, some feedback for the specification was
generated to cover perceived gaps and address conflicts. The feedback is
presented as a reference implementation in the driver and QEMU emulation.
Specifically, the following concepts are original to the Linux implementation,
and feedback / collaboration is requested to develop these into specification
proposals:
1. Top level ACPI object (ACPI0017)
2. HW imposed address space and interleave constraints
3. _OSC UUID A4D1629D-FF52-4888-BE96-E5CADE548DB1

ACPI0017
--------
Introduce a new ACPI namespace device with an _HID of ACPI0017. The purpose of
this object is twofold: to support a legacy OS with a set of out-of-tree CXL
modules, and to establish an attach point for a driver that knows about
interleaving. Both of these boil down to the same point: centralizing Operating
System support for resources described by the CXL Early Discovery Table (CEDT).

The legacy OS problem stems from the spec's description of a host bridge:
ACPI0016 is denoted as the _HID for host bridges, with a _CID of PNP0A08. In a
CXL unaware version of Linux, the core ACPI subsystem will bind a driver to
PNP0A08 and preclude a CXL-aware driver from binding to ACPI0016. An ACPI0017
device allows a standalone CXL-aware driver to register for handling /
coordinating CEDT and CXL-specific _OSC control.

Similarly, when managing interleaving, there needs to be some management layer
above the ACPI0016 device that is capable of assembling leaf nodes into
interleave sets. Just as ACPI0012 does this central coordination for
NFIT-defined resources, ACPI0017 does the same for CEDT-described resources.

Memory Windows
--------------
For CXL.mem capable platforms, there is a need for a mechanism for platform
firmware to make the Operating System aware of any restrictions that hardware
might have in address space. For example, in a system with 4 host bridges all
participating in an interleave set, the firmware needs to provide some
description of this. That information is missing from the CXL 2.0 spec as of
today and it also is not implemented in the driver. A variety of ACPI based
mechanisms, for example _CRS fields on the ACPI0017 device, were considered.
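
To make the 4 host bridge example concrete, here is a minimal sketch of what an
interleave set implies for address decode. It assumes plain round-robin (modulo)
interleaving at a fixed granularity, which is an assumption for illustration
only; it is not how the spec or the driver in this series defines decode. The
helper names interleave_target() and interleave_offset() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch series: pick which of
 * `ways` interleave targets (e.g. host bridges) owns a host physical
 * address, assuming round-robin interleaving of `granularity`-byte chunks
 * starting at `base`.
 */
static unsigned int interleave_target(uint64_t hpa, uint64_t base,
				      unsigned int ways, uint64_t granularity)
{
	assert(ways > 0 && granularity > 0 && hpa >= base);
	return (unsigned int)(((hpa - base) / granularity) % ways);
}

/* Byte offset of `hpa` within the chosen target's share of the window. */
static uint64_t interleave_offset(uint64_t hpa, uint64_t base,
				  unsigned int ways, uint64_t granularity)
{
	assert(ways > 0 && granularity > 0 && hpa >= base);

	uint64_t rel = hpa - base;
	uint64_t chunk = rel / granularity;

	return (chunk / ways) * granularity + rel % granularity;
}
```

With 4 ways and a 256 byte granularity, consecutive 256 byte chunks land on
targets 0, 1, 2, 3 and then wrap back to 0. Whatever description firmware ends
up providing would need to carry at least the window base, the number of ways,
and the granularity for the OS to reason about this.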


CXL Exclusive _OSC
------------------
The CXL 2.0 definition provides new fields to _OSC for host bridges to allow for
new services provided by CXL - error handling, hot plug, capabilities, etc. This
is built on top of PCIe _OSC via a new UUID. A CXL unaware OS will use the old
UUID to configure the PCIe host bridge. The expectation is that a CXL aware OS
uses the new UUID to modify both CXL and PCIE capabilities in one shot. The
issue arises when trying to define a standalone CXL driver. The core OS will
configure the PCIe _OSC, but when the CXL driver attempts to set CXL _OSC, the
current definition makes that driver re-specify PCIE capabilities. An isolated
CXL-only _OSC allows the PCIE core to be unchanged and lets a CXL driver stack
manage CXL _OSC without the possibility of clobbering / colliding with PCIE core
_OSC management. The proposal moves the new _OSC dwords (SUPC and CTRC) to their
own _OSC UUID.
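
For reference, _OSC takes its UUID as a 16-byte buffer rather than as a string.
In the usual ACPI ToUUID encoding the first three fields are stored
little-endian and the remaining eight bytes are in string order. Below is a
small sketch of that conversion for the proposed UUID; hex_val() and
uuid_to_acpi_bytes() are hypothetical helpers written for this example, not
functions from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" into the 16-byte buffer
 * layout: the first three fields are byte-swapped (little-endian), the
 * last two are kept in string order.
 * Returns 0 on success, -1 on malformed input.
 */
static int uuid_to_acpi_bytes(const char *s, uint8_t out[16])
{
	/* For each output byte, the string index of its first hex digit. */
	static const int pos[16] = {
		6, 4, 2, 0, 11, 9, 16, 14, 19, 21, 24, 26, 28, 30, 32, 34
	};

	assert(s != NULL && out != NULL);

	if (strlen(s) != 36 || s[8] != '-' || s[13] != '-' ||
	    s[18] != '-' || s[23] != '-')
		return -1;

	for (int i = 0; i < 16; i++) {
		int hi = hex_val(s[pos[i]]);
		int lo = hex_val(s[pos[i] + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}
```

For A4D1629D-FF52-4888-BE96-E5CADE548DB1 the buffer begins 9D 62 D1 A4 52 FF
88 48 and continues BE 96 E5 CA DE 54 8D B1.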

The next steps after this basic foundation are expanded command support and
LIBNVDIMM integration. This is the initial “release early / release often”
version of the Linux CXL enabling.


Ben Widawsky (5):
  cxl/mem: Map memory device registers
  cxl/mem: Find device capabilities
  cxl/mem: Initialize the mailbox interface
  cxl/mem: Implement polled mode mailbox
  MAINTAINERS: Add maintainers of the CXL driver

Dan Williams (2):
  cxl/mem: Add a driver for the type-3 mailbox
  cxl/mem: Register CXL memX devices

Vishal Verma (2):
  cxl/acpi: Add an acpi_cxl module for the CXL interconnect
  cxl/acpi: add OSC support

 MAINTAINERS           |   9 +
 drivers/Kconfig       |   1 +
 drivers/Makefile      |   1 +
 drivers/cxl/Kconfig   |  50 ++++
 drivers/cxl/Makefile  |   9 +
 drivers/cxl/acpi.c    | 325 ++++++++++++++++++++++
 drivers/cxl/acpi.h    |  33 +++
 drivers/cxl/bus.c     |  35 +++
 drivers/cxl/bus.h     |   8 +
 drivers/cxl/cxl.h     | 166 +++++++++++
 drivers/cxl/mem.c     | 631 ++++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/pci.h     |  21 ++
 include/acpi/actbl1.h |  52 ++++
 13 files changed, 1341 insertions(+)
 create mode 100644 drivers/cxl/Kconfig
 create mode 100644 drivers/cxl/Makefile
 create mode 100644 drivers/cxl/acpi.c
 create mode 100644 drivers/cxl/acpi.h
 create mode 100644 drivers/cxl/bus.c
 create mode 100644 drivers/cxl/bus.h
 create mode 100644 drivers/cxl/cxl.h
 create mode 100644 drivers/cxl/mem.c
 create mode 100644 drivers/cxl/pci.h

Comments

Ben Widawsky Nov. 11, 2020, 10:06 p.m. UTC | #1
Adding a cross reference to the QEMU work since I sent those patches after this:

https://gitlab.com/bwidawsk/qemu/-/tree/cxl-2.0
https://lists.nongnu.org/archive/html/qemu-devel/2020-11/msg02886.html

[snip]
Bjorn Helgaas Nov. 11, 2020, 10:43 p.m. UTC | #2
On Tue, Nov 10, 2020 at 09:43:47PM -0800, Ben Widawsky wrote:
> ...
> Ben Widawsky (5):
>   cxl/mem: Map memory device registers
>   cxl/mem: Find device capabilities
>   cxl/mem: Initialize the mailbox interface
>   cxl/mem: Implement polled mode mailbox
>   MAINTAINERS: Add maintainers of the CXL driver
> 
> Dan Williams (2):
>   cxl/mem: Add a driver for the type-3 mailbox

To include important words first and use "Type 3" as in spec:

  cxl/mem: Add Type 3 mailbox driver

>   cxl/mem: Register CXL memX devices
> 
> Vishal Verma (2):
>   cxl/acpi: Add an acpi_cxl module for the CXL interconnect
>   cxl/acpi: add OSC support

For consistency:

  cxl/acpi: Add _OSC support

It's conventional in drivers/acpi and drivers/pci to capitalize the
"ACPI" and "PCI" initialisms except in actual C code.   Seems like
you're mostly doing the same with "CXL", except in the subject lines
above.  Since you're making a new directory, I guess you get to
choose.

I use "PCIe" (not "PCIE" or "PCI-E"; you have a mix) because that
seems to be the official abbreviation.