
[v5,2/5] allow mapping page-less memremaped areas into KVA

Message ID 20150813030109.36703.21738.stgit@otcpl-skl-sds-2.jf.intel.com (mailing list archive)
State Superseded

Commit Message

Dan Williams Aug. 13, 2015, 3:01 a.m. UTC
Introduce a type that encapsulates a page-frame-number that can also be
used to encode other information.  This other information is the
traditional "page_link" encoding in a scatterlist, but can also denote
"device memory", i.e. a set of pfns that are not part of the kernel's
linear mapping but are accessed via the same memory controller as ram.
The motivation for this conversion is large capacity persistent memory
that, by default, does not enjoy 'struct page' coverage (entries in
'memmap').

This type will be used to replace 'struct page *' usage in cases where
only a pfn is required, i.e. scatterlists for drivers, the dma mapping
api, and potentially biovecs for the block layer.  The operations in
those i/o paths that formerly required a 'struct page *' are converted
to use __pfn_t aware equivalent helpers.

It turns out that while 'struct page' references are used broadly in the
kernel I/O stacks the usage of 'struct page' based capabilities is very
shallow for block-i/o.  It is only used for populating bio_vecs and
scatterlists for the retrieval of dma addresses, and for temporary
kernel mappings (kmap).  Aside from kmap, these usages can be trivially
converted to operate on a pfn.

kmap_atomic(), however, is more problematic as it uses mm
infrastructure, via struct page, to set up and track temporary kernel
mappings.  It would be unfortunate if the kmap infrastructure escaped
its 32-bit/HIGHMEM bonds and leaked into 64-bit code.  Thankfully, it
seems all that is needed here is to convert kmap_atomic() callers that
want to opt in to supporting persistent memory to use a new
kmap_atomic_pfn_t(), which re-uses the existing ioremap() mapping
established by the driver for persistent memory.
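
As a sketch of the intended opt-in flow (a hypothetical pmem-style
driver; error handling is elided, and the memremap() call merely
stands in for however the driver established its linear mapping):

    /* at probe: advertise the pre-existing linear mapping */
    base = memremap(res->start, resource_size(res), MEMREMAP_WB);
    rc = devm_register_kmap_pfn_range(dev, res, base);

    /* in the i/o path, for a __pfn_t that may lack a struct page */
    void *addr = kmap_atomic_pfn_t(pfn);

    if (addr) {
        memcpy(addr + offset, buf, len);
        kunmap_atomic_pfn_t(addr);
    }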

Note that, as far as conceptually understanding __pfn_t is concerned,
'persistent memory' is really any address range in host memory not
covered by memmap.  Contrast this with pure iomem that is on an
mmio-mapped bus like PCI and cannot be converted to a dma_addr_t by
"pfn << PAGE_SHIFT".

To keep kmap out of 64-bit code, when the user has enabled
CONFIG_KMAP_PFN we direct the kmap_atomic_pfn_t() implementation to
scan a list of pre-mapped persistent memory address ranges inserted by
the pmem driver.

The __pfn_t to resource lookup is admittedly an inefficient walk of a
linked list, but there are two mitigating factors:

1/ The number of persistent memory ranges is bounded by the number of
   DIMMs, which is on the order of 10s of DIMMs, not hundreds.

2/ The lookup yields the entire range; if it becomes inefficient to do
   a kmap_atomic_pfn_t() one PAGE_SIZE at a time, the caller can take
   advantage of the fact that the lookup can be amortized for all kmap
   operations it needs to perform in a given range (see the sketch
   below).
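
As a sketch of that amortization (caller code, not part of this patch;
it assumes the pfns are contiguous within a single registered range,
whose mapping is linear, and relies on the single rcu_read_lock()
being held from map to unmap):

    void *addr = kmap_atomic_pfn_t(pfn);    /* one list walk */

    for (i = 0; addr && i < npages; i++)
        memcpy(dst + i * PAGE_SIZE, addr + i * PAGE_SIZE, PAGE_SIZE);
    kunmap_atomic_pfn_t(addr);              /* drops rcu_read_lock() */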

[hch: various changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 include/linux/kmap_pfn.h |   31 ++++++++++++
 include/linux/mm.h       |   57 ++++++++++++++++++++++
 mm/Kconfig               |    3 +
 mm/Makefile              |    1 
 mm/kmap_pfn.c            |  117 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 209 insertions(+)
 create mode 100644 include/linux/kmap_pfn.h
 create mode 100644 mm/kmap_pfn.c

Comments

Boaz Harrosh Aug. 13, 2015, 5:58 a.m. UTC | #1
On 08/13/2015 06:01 AM, Dan Williams wrote:
[..]
> +void *kmap_atomic_pfn_t(__pfn_t pfn)
> +{
> +	struct page *page = __pfn_t_to_page(pfn);
> +	resource_size_t addr;
> +	struct kmap *kmap;
> +
> +	rcu_read_lock();
> +	if (page)
> +		return kmap_atomic(page);

Right, so even with pages I pay rcu_read_lock() for every access?

> +	addr = __pfn_t_to_phys(pfn);
> +	list_for_each_entry_rcu(kmap, &ranges, list)
> +		if (addr >= kmap->res->start && addr <= kmap->res->end)
> +			return kmap->base + addr - kmap->res->start;
> +

Good god! This loop is a real *joke*. You have just dropped memory
access performance by 10 fold.

The whole point of pages and memory_model.h was to have a one-to-one
relationship between kernel-virtual vs physical vs page *.

There is already an object that holds a relationship of physical to
kernel-virtual. It is called a memory-section. Why not just widen its
definition?

If you are willing to accept this loop in a current 2015 Linux kernel,
then I have nothing further to say.

Boaz - go mourning for the death of the Linux Kernel alone in the corner ;-(

> +	/* only unlock in the error case */
> +	rcu_read_unlock();
> +	return NULL;
> +}
> +EXPORT_SYMBOL(kmap_atomic_pfn_t);
> +
> +void kunmap_atomic_pfn_t(void *addr)
> +{
> +	struct kmap *kmap;
> +	bool dev_pfn = false;
> +
> +	if (!addr)
> +		return;
> +
> +	/*
> +	 * If the original __pfn_t had an entry in the memmap (i.e.
> +	 * !PFN_DEV) then 'addr' will be outside of the registered
> +	 * ranges and we'll need to kunmap_atomic() it.
> +	 */
> +	list_for_each_entry_rcu(kmap, &ranges, list)
> +		if (addr < kmap->base + resource_size(kmap->res)
> +				&& addr >= kmap->base) {
> +			dev_pfn = true;
> +			break;
> +		}
> +
> +	if (!dev_pfn)
> +		kunmap_atomic(addr);
> +
> +	/* signal that we are done with the range */
> +	rcu_read_unlock();
> +}
> +EXPORT_SYMBOL(kunmap_atomic_pfn_t);
Dan Williams Aug. 13, 2015, 12:57 p.m. UTC | #2
On Wed, Aug 12, 2015 at 10:58 PM, Boaz Harrosh <boaz@plexistor.com> wrote:
> On 08/13/2015 06:01 AM, Dan Williams wrote:
[..]
>> +void *kmap_atomic_pfn_t(__pfn_t pfn)
>> +{
>> +     struct page *page = __pfn_t_to_page(pfn);
>> +     resource_size_t addr;
>> +     struct kmap *kmap;
>> +
>> +     rcu_read_lock();
>> +     if (page)
>> +             return kmap_atomic(page);
>
> Right, so even with pages I pay rcu_read_lock() for every access?
>
>> +     addr = __pfn_t_to_phys(pfn);
>> +     list_for_each_entry_rcu(kmap, &ranges, list)
>> +             if (addr >= kmap->res->start && addr <= kmap->res->end)
>> +                     return kmap->base + addr - kmap->res->start;
>> +
>
> Good god! This loop is a real *joke*. You have just dropped memory
> access performance by 10 fold.
>
> The whole point of pages and memory_model.h was to have a one-to-one
> relationship between kernel-virtual vs physical vs page *.
>
> There is already an object that holds a relationship of physical to
> kernel-virtual. It is called a memory-section. Why not just widen its
> definition?
>
> If you are willing to accept this loop in a current 2015 Linux kernel,
> then I have nothing further to say.
>
> Boaz - go mourning for the death of the Linux Kernel alone in the corner ;-(
>

This is explicitly addressed in the changelog, repeated here:

> The __pfn_t to resource lookup is admittedly an inefficient walk of a
> linked list, but there are two mitigating factors:
>
> 1/ The number of persistent memory ranges is bounded by the number of
>    DIMMs, which is on the order of 10s of DIMMs, not hundreds.
>
> 2/ The lookup yields the entire range; if it becomes inefficient to do
>    a kmap_atomic_pfn_t() one PAGE_SIZE at a time, the caller can take
>    advantage of the fact that the lookup can be amortized for all kmap
>    operations it needs to perform in a given range.

DAX as-is races against pmem unbind.  A synchronization cost must
be paid somewhere to make sure the memremap() mapping is still valid.
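
Concretely, that cost here is the rcu pairing in this patch (the same
code as above, restated as a protocol):

    /* reader (i/o path) */
    kmap_atomic_pfn_t(pfn);        /* rcu_read_lock() + range lookup */
    ...                            /* access while the mapping is pinned */
    kunmap_atomic_pfn_t(addr);     /* rcu_read_unlock() */

    /* writer (device unbind) */
    list_del_rcu(&kmap->list);     /* new lookups miss the range */
    synchronize_rcu();             /* wait out in-flight kmaps */
    kfree(kmap);                   /* now safe to tear down the mapping */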
Boaz Harrosh Aug. 13, 2015, 1:23 p.m. UTC | #3
On 08/13/2015 03:57 PM, Dan Williams wrote:
<>
> This is explicitly addressed in the changelog, repeated here:
> 
>> The __pfn_t to resource lookup is admittedly an inefficient walk of
>> a linked list, but there are two mitigating factors:
>>
>> 1/ The number of persistent memory ranges is bounded by the number of
>>    DIMMs, which is on the order of 10s of DIMMs, not hundreds.
>>

You do not get where I'm coming from. It used to be a
[ptr - ONE_BASE + OTHER_BASE] (in 64-bit); it is now a call and a loop
and a search. However you look at it, it is *not* the instantaneous
address translation we have today.

I have memory and I want memory speeds. You keep thinking HD speeds,
where whatever you do will not matter.

>> 2/ The lookup yields the entire range; if it becomes inefficient to do
>>    a kmap_atomic_pfn_t() one PAGE_SIZE at a time, the caller can take
>>    advantage of the fact that the lookup can be amortized for all kmap
>>    operations it needs to perform in a given range.
> 

What "given range" how can a bdev assume that the all sg-list belongs to the
same "range". In fact our code does multple-pmem devices for a long time.
What about say md-of-pmems for example, or btrfs

> DAX as-is races against pmem unbind.  A synchronization cost must
> be paid somewhere to make sure the memremap() mapping is still valid.

Sorry for being so slow, but that is what I asked: what exactly is
"pmem unbind"?

Currently in my 4.1 kernel the ioremap is done at modprobe time and
released at modprobe --remove time. The --remove cannot happen with a
mounted FS, dax or not. So what exactly is "pmem unbind"? And if there
is a new knob, then make it refuse with a raised refcount.

Cheers
Boaz
Christoph Hellwig Aug. 13, 2015, 2:37 p.m. UTC | #4
Hi Boaz,

can you please fix your quoting?  I read down about 10 pages but still
couldn't find a comment from you.  For now I gave up on this mail.
Christoph Hellwig Aug. 13, 2015, 2:41 p.m. UTC | #5
On Thu, Aug 13, 2015 at 04:23:38PM +0300, Boaz Harrosh wrote:
> > DAX as-is races against pmem unbind.  A synchronization cost must
> > be paid somewhere to make sure the memremap() mapping is still valid.
> 
> Sorry for being so slow, but that is what I asked: what exactly is
> "pmem unbind"?
> 
> Currently in my 4.1 kernel the ioremap is done at modprobe time and
> released at modprobe --remove time. The --remove cannot happen with a
> mounted FS, dax or not. So what exactly is "pmem unbind"? And if there
> is a new knob, then make it refuse with a raised refcount.

Surprise removal of a PCIe card which is mapped to provide non-volatile
memory for example.  Or memory hot swap.
Boaz Harrosh Aug. 13, 2015, 2:48 p.m. UTC | #6
On 08/13/2015 05:37 PM, Christoph Hellwig wrote:
> Hi Boaz,
> 
> can you please fix your quoting?  I read down about 10 pages but still
> couldn't find a comment from you.  For now I gave up on this mail.
> 

Sorry here:

> +void *kmap_atomic_pfn_t(__pfn_t pfn)
> +{
> +	struct page *page = __pfn_t_to_page(pfn);
> +	resource_size_t addr;
> +	struct kmap *kmap;
> +
> +	rcu_read_lock();
> +	if (page)
> +		return kmap_atomic(page);

Right, so even with pages I pay rcu_read_lock() for every access?

> +	addr = __pfn_t_to_phys(pfn);
> +	list_for_each_entry_rcu(kmap, &ranges, list)
> +		if (addr >= kmap->res->start && addr <= kmap->res->end)
> +			return kmap->base + addr - kmap->res->start;
> +

Good god! This loop is a real *joke*. You have just dropped memory
access performance by 10 fold.

The whole point of pages and memory_model.h was to have a one-to-one
relationship between kernel-virtual vs physical vs page *.

There is already an object that holds a relationship of physical to
kernel-virtual. It is called a memory-section. Why not just widen its
definition?

If you are willing to accept this loop in a current 2015 Linux kernel,
then I have nothing further to say.

Boaz - go mourning for the death of the Linux Kernel alone in the corner ;-(

> +	/* only unlock in the error case */
> +	rcu_read_unlock();
> +	return NULL;
> +}
> +EXPORT_SYMBOL(kmap_atomic_pfn_t);
> +
Boaz Harrosh Aug. 13, 2015, 3:01 p.m. UTC | #7
On 08/13/2015 05:41 PM, Christoph Hellwig wrote:
> On Thu, Aug 13, 2015 at 04:23:38PM +0300, Boaz Harrosh wrote:
>>> DAX as-is races against pmem unbind.  A synchronization cost must
>>> be paid somewhere to make sure the memremap() mapping is still valid.
>>
>> Sorry for being so slow, but that is what I asked: what exactly is
>> "pmem unbind"?
>>
>> Currently in my 4.1 kernel the ioremap is done at modprobe time and
>> released at modprobe --remove time. The --remove cannot happen with a
>> mounted FS, dax or not. So what exactly is "pmem unbind"? And if there
>> is a new knob, then make it refuse with a raised refcount.
> 
> Surprise removal of a PCIe card which is mapped to provide non-volatile
> memory for example.  Or memory hot swap.
> 

Then the mapping is just there and you get garbage. Just the same as
with "memory hot swap": the kernel will not let you HOT-REMOVE
referenced memory. It will just refuse. If you forcefully remove a
swappable memory chip without a HOT-REMOVE first, what will happen?
The same here.

SW-wise you refuse to HOT-REMOVE. HW-wise, BTW, the kernel will not
die; further reads will return all 1s and writes will go to the
ether.

The whole kmap thing was for highmem. That is not the case here.

Again see my other comment at dax mmap:

- you do a pfn_map and take a pfn
- kpfn_unmap
- put the pfn in a user mmap vma
- then what happens to user access after that? Nothing, not even a
  page_fault. It will have a vm-mapping to a now non-existent physical
  address, that's it.

Thanks
Boaz
Boaz Harrosh Aug. 13, 2015, 3:29 p.m. UTC | #8
On 08/13/2015 05:48 PM, Boaz Harrosh wrote:
<>
> There is already an object that holds a relationship of physical
> to Kernel-virtual. It is called a memory-section. Why not just
> widen its definition?
> 

BTW: Regarding the "widen its definition"

I was thinking of two possible new models here:

[1 - A page-less memory section]
- Keep the 64-bit physical-to-kernel-virtual hard-coded relationship
- Allocate a section object, but this section object does not have any
  pages, it is only the header. (You need it for the pmd/pmt thing)

  Lots of things just work now if you make sure you do not go through
  a page struct. This needs no extra work; I have done this in the
  past. All you need is to do your ioremap through
  map_kernel_range_noflush(__va(), ....)

[2 - A small page-struct]
- Like above, but each entry in the new section object is a small,
  one-ulong-sized entry holding just flags.

 Then if !(p->flags & PAGE_SPECIAL) page = container_of(p, struct page, flags)

 This model is good because you actually have your pfn_to_page and
 page_to_pfn and need not touch the sg-list or bio. And it is only 8
 bytes per frame instead of 64 bytes.
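
 Roughly, as a sketch (the section_entry() helper and PAGE_SPECIAL
 flag here are made-up illustrations, not existing kernel API):

    unsigned long *p = section_entry(pfn);  /* one ulong per frame */

    /* works because 'flags' is the first member of struct page */
    if (!(*p & PAGE_SPECIAL))
        page = container_of(p, struct page, flags);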


But I still think that the best long-term model is variable-size
pages, where a page* can be 2M or 1G. Again, an extra flag and a
widened section definition. It is about time we moved to bigger pages
throughout, but still kept the 4k page-cache-dirty granularity.

Thanks
Boaz
Matthew Wilcox Aug. 13, 2015, 5:35 p.m. UTC | #9
On Wed, Aug 12, 2015 at 11:01:09PM -0400, Dan Williams wrote:
> +static inline __pfn_t page_to_pfn_t(struct page *page)
> +{
> +	__pfn_t pfn = { .val = page_to_pfn(page) << PAGE_SHIFT, };
> +
> +	return pfn;
> +}

static inline __pfn_t page_to_pfn_t(struct page *page)
{
	__pfn_t __pfn;
	unsigned long pfn = page_to_pfn(page);
	BUG_ON(pfn > (-1UL >> PFN_SHIFT));
	__pfn.val = pfn << PFN_SHIFT;

	return __pfn;
}

I have a problem with PFN_SHIFT being equal to PAGE_SHIFT.  Consider a
32-bit kernel; you're asserting that no memory represented by a struct
page can have a physical address above 4GB.
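
The arithmetic, spelled out for a 32-bit unsigned long: with
PFN_SHIFT == PAGE_SHIFT == 12 the stored pfn gets 20 bits, so the
largest encodable physical address is 2^20 * 2^12 = 2^32 = 4GB; with
PFN_SHIFT == 6 the pfn gets 26 bits, for 2^26 * 2^12 = 2^38 = 256GB.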

You only need three bits for flags so far ... how about making PFN_SHIFT
be 6?  That supports physical addresses up to 2^38 (256GB).  That should
be enough, but hardware designers have done some strange things in the
past (I know that HP made PA-RISC hardware that can run 32-bit kernels
with memory between 64GB and 68GB, and they can't be the only strange
hardware people out there).
Dave Hansen Aug. 13, 2015, 5:37 p.m. UTC | #10
On 08/13/2015 07:48 AM, Boaz Harrosh wrote:
> There is already an object that holds a relationship of physical
> to Kernel-virtual. It is called a memory-section. Why not just
> widen its definition?

Memory sections are purely there to map physical address ranges back to
metadata about them.  *Originally* for 'struct page', but widened a bit
subsequently.  But, it's *never* been connected to kernel-virtual
addresses in any way that I can think of.

So, that's a curious statement.
Dan Williams Aug. 13, 2015, 6:15 p.m. UTC | #11
On Thu, Aug 13, 2015 at 10:35 AM, Matthew Wilcox <willy@linux.intel.com> wrote:
> On Wed, Aug 12, 2015 at 11:01:09PM -0400, Dan Williams wrote:
>> +static inline __pfn_t page_to_pfn_t(struct page *page)
>> +{
>> +     __pfn_t pfn = { .val = page_to_pfn(page) << PAGE_SHIFT, };
>> +
>> +     return pfn;
>> +}
>
> static inline __pfn_t page_to_pfn_t(struct page *page)
> {
>         __pfn_t __pfn;
>         unsigned long pfn = page_to_pfn(page);
>         BUG_ON(pfn > (-1UL >> PFN_SHIFT));
>         __pfn.val = pfn << PFN_SHIFT;
>
>         return __pfn;
> }
>
> I have a problem with PFN_SHIFT being equal to PAGE_SHIFT.  Consider a
> 32-bit kernel; you're asserting that no memory represented by a struct
> page can have a physical address above 4GB.
>
> You only need three bits for flags so far ... how about making PFN_SHIFT
> be 6?  That supports physical addresses up to 2^38 (256GB).  That should
> be enough, but hardware designers have done some strange things in the
> past (I know that HP made PA-RISC hardware that can run 32-bit kernels
> with memory between 64GB and 68GB, and they can't be the only strange
> hardware people out there).

Sounds good, especially given we only use 4 bits today.
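
As a sketch of what the decoupled encoding might look like in a
follow-up (illustrative only, not part of this v5 patch):

    enum {
        PFN_SHIFT = 6,                  /* flags confined to bits 0-5 */
        PFN_MASK = (1UL << PFN_SHIFT) - 1,
        PFN_SG_CHAIN = (1UL << 0),
        PFN_SG_LAST = (1UL << 1),
        PFN_DEV = (1UL << 2),
    };

    static inline unsigned long __pfn_t_to_pfn(__pfn_t pfn)
    {
        /* 26-bit pfn on 32-bit => up to 2^38 bytes (256GB) physical */
        return pfn.val >> PFN_SHIFT;
    }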

Patch

diff --git a/include/linux/kmap_pfn.h b/include/linux/kmap_pfn.h
new file mode 100644
index 000000000000..fa44971d8e95
--- /dev/null
+++ b/include/linux/kmap_pfn.h
@@ -0,0 +1,31 @@ 
+#ifndef _LINUX_KMAP_PFN_H
+#define _LINUX_KMAP_PFN_H 1
+
+#include <linux/highmem.h>
+
+struct device;
+struct resource;
+#ifdef CONFIG_KMAP_PFN
+extern void *kmap_atomic_pfn_t(__pfn_t pfn);
+extern void kunmap_atomic_pfn_t(void *addr);
+extern int devm_register_kmap_pfn_range(struct device *dev,
+		struct resource *res, void *base);
+#else
+static inline void *kmap_atomic_pfn_t(__pfn_t pfn)
+{
+	return kmap_atomic(__pfn_t_to_page(pfn));
+}
+
+static inline void kunmap_atomic_pfn_t(void *addr)
+{
+	__kunmap_atomic(addr);
+}
+
+static inline int devm_register_kmap_pfn_range(struct device *dev,
+		struct resource *res, void *base)
+{
+	return 0;
+}
+#endif /* CONFIG_KMAP_PFN */
+
+#endif /* _LINUX_KMAP_PFN_H */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 84b05ebedb2d..57ba5ca6be72 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -924,6 +924,63 @@  static inline void set_page_links(struct page *page, enum zone_type zone,
 }
 
 /*
+ * __pfn_t: encapsulates a page-frame number that is optionally backed
+ * by memmap (struct page).  This type will be used in place of a
+ * 'struct page *' instance in contexts where unmapped memory (usually
+ * persistent memory) is being referenced (scatterlists for drivers,
+ * biovecs for the block layer, etc).  Whether a __pfn_t has a struct
+ * page backing is indicated by flags in the low bits of the value;
+ */
+typedef struct {
+	unsigned long val;
+} __pfn_t;
+
+/*
+ * PFN_SG_CHAIN - pfn is a pointer to the next scatterlist entry
+ * PFN_SG_LAST - pfn references a page and is the last scatterlist entry
+ * PFN_DEV - pfn is not covered by system memmap
+ */
+enum {
+	PFN_MASK = (1UL << PAGE_SHIFT) - 1,
+	PFN_SG_CHAIN = (1UL << 0),
+	PFN_SG_LAST = (1UL << 1),
+#ifdef CONFIG_KMAP_PFN
+	PFN_DEV = (1UL << 2),
+#else
+	PFN_DEV = 0,
+#endif
+};
+
+static inline bool __pfn_t_has_page(__pfn_t pfn)
+{
+	return (pfn.val & PFN_DEV) == 0;
+}
+
+static inline unsigned long __pfn_t_to_pfn(__pfn_t pfn)
+{
+	return pfn.val >> PAGE_SHIFT;
+}
+
+static inline struct page *__pfn_t_to_page(__pfn_t pfn)
+{
+	if (!__pfn_t_has_page(pfn))
+		return NULL;
+	return pfn_to_page(__pfn_t_to_pfn(pfn));
+}
+
+static inline dma_addr_t __pfn_t_to_phys(__pfn_t pfn)
+{
+	return __pfn_to_phys(__pfn_t_to_pfn(pfn));
+}
+
+static inline __pfn_t page_to_pfn_t(struct page *page)
+{
+	__pfn_t pfn = { .val = page_to_pfn(page) << PAGE_SHIFT, };
+
+	return pfn;
+}
+
+/*
  * Some inline functions in vmstat.h depend on page_zone()
  */
 #include <linux/vmstat.h>
diff --git a/mm/Kconfig b/mm/Kconfig
index e79de2bd12cd..ed1be8ff982e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -654,3 +654,6 @@  config DEFERRED_STRUCT_PAGE_INIT
 	  when kswapd starts. This has a potential performance impact on
 	  processes running early in the lifetime of the systemm until kswapd
 	  finishes the initialisation.
+
+config KMAP_PFN
+	bool
diff --git a/mm/Makefile b/mm/Makefile
index 98c4eaeabdcb..f7b27958ea69 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -78,3 +78,4 @@  obj-$(CONFIG_CMA)	+= cma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
 obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
+obj-$(CONFIG_KMAP_PFN) += kmap_pfn.o
diff --git a/mm/kmap_pfn.c b/mm/kmap_pfn.c
new file mode 100644
index 000000000000..2d58e167dfbc
--- /dev/null
+++ b/mm/kmap_pfn.c
@@ -0,0 +1,117 @@ 
+/*
+ * Copyright(c) 2015 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#include <linux/rcupdate.h>
+#include <linux/rculist.h>
+#include <linux/highmem.h>
+#include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/mm.h>
+
+static LIST_HEAD(ranges);
+static DEFINE_MUTEX(register_lock);
+
+struct kmap {
+	struct list_head list;
+	struct resource *res;
+	struct device *dev;
+	void *base;
+};
+
+static void teardown_kmap(void *data)
+{
+	struct kmap *kmap = data;
+
+	dev_dbg(kmap->dev, "kmap unregister %pr\n", kmap->res);
+	mutex_lock(&register_lock);
+	list_del_rcu(&kmap->list);
+	mutex_unlock(&register_lock);
+	synchronize_rcu();
+	kfree(kmap);
+}
+
+int devm_register_kmap_pfn_range(struct device *dev, struct resource *res,
+		void *base)
+{
+	struct kmap *kmap = kzalloc(sizeof(*kmap), GFP_KERNEL);
+	int rc;
+
+	if (!kmap)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&kmap->list);
+	kmap->res = res;
+	kmap->base = base;
+	kmap->dev = dev;
+	rc = devm_add_action(dev, teardown_kmap, kmap);
+	if (rc) {
+		kfree(kmap);
+		return rc;
+	}
+	dev_dbg(kmap->dev, "kmap register %pr\n", kmap->res);
+
+	mutex_lock(&register_lock);
+	list_add_rcu(&kmap->list, &ranges);
+	mutex_unlock(&register_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devm_register_kmap_pfn_range);
+
+void *kmap_atomic_pfn_t(__pfn_t pfn)
+{
+	struct page *page = __pfn_t_to_page(pfn);
+	resource_size_t addr;
+	struct kmap *kmap;
+
+	rcu_read_lock();
+	if (page)
+		return kmap_atomic(page);
+	addr = __pfn_t_to_phys(pfn);
+	list_for_each_entry_rcu(kmap, &ranges, list)
+		if (addr >= kmap->res->start && addr <= kmap->res->end)
+			return kmap->base + addr - kmap->res->start;
+
+	/* only unlock in the error case */
+	rcu_read_unlock();
+	return NULL;
+}
+EXPORT_SYMBOL(kmap_atomic_pfn_t);
+
+void kunmap_atomic_pfn_t(void *addr)
+{
+	struct kmap *kmap;
+	bool dev_pfn = false;
+
+	if (!addr)
+		return;
+
+	/*
+	 * If the original __pfn_t had an entry in the memmap (i.e.
+	 * !PFN_DEV) then 'addr' will be outside of the registered
+	 * ranges and we'll need to kunmap_atomic() it.
+	 */
+	list_for_each_entry_rcu(kmap, &ranges, list)
+		if (addr < kmap->base + resource_size(kmap->res)
+				&& addr >= kmap->base) {
+			dev_pfn = true;
+			break;
+		}
+
+	if (!dev_pfn)
+		kunmap_atomic(addr);
+
+	/* signal that we are done with the range */
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(kunmap_atomic_pfn_t);