Message ID | 1556101715-31966-1-git-send-email-rppt@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | docs/vm: add documentation of memory models |
On 04/24/2019 03:58 PM, Mike Rapoport wrote:
> +To use vmemmap, an architecture has to reserve a range of virtual
> +addresses that will map the physical pages containing the memory
> +map. and make sure that `vmemmap` points to that range. In addition,
> +the architecture should implement :c:func:`vmemmap_populate` method
> +that will allocate the physical memory and create page tables for the
> +virtual memory map. If an architecture does not have any special
> +requirements for the vmemmap mappings, it can use default
> +:c:func:`vmemmap_populate_basepages` provided by the generic memory
> +management.

Just to complete it, could you also include struct vmem_altmap and how it
can contribute towards the physical backing for vmemmap virtual mapping.
Otherwise the write up looks complete.
On Wed, Apr 24, 2019 at 04:20:02PM +0530, Anshuman Khandual wrote:
>
>
> On 04/24/2019 03:58 PM, Mike Rapoport wrote:
> > +To use vmemmap, an architecture has to reserve a range of virtual
> > +addresses that will map the physical pages containing the memory
> > +map. and make sure that `vmemmap` points to that range. In addition,
> > +the architecture should implement :c:func:`vmemmap_populate` method
> > +that will allocate the physical memory and create page tables for the
> > +virtual memory map. If an architecture does not have any special
> > +requirements for the vmemmap mappings, it can use default
> > +:c:func:`vmemmap_populate_basepages` provided by the generic memory
> > +management.
>
> Just to complete it, could you also include struct vmem_altmap and how it
> can contribute towards the physical backing for vmemmap virtual mapping.
> Otherwise the write up looks complete.

Sure, but I'd prefer having it as a separate patch.
On Wed, 24 Apr 2019 13:28:35 +0300
Mike Rapoport <rppt@linux.ibm.com> wrote:

> Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> maintain pfn <-> struct page correspondence.

Quick question: should this document perhaps mention that DISCONTIGMEM
appears to be on its way out?

Thanks,

jon
On Wed, Apr 24, 2019 at 10:14:55AM -0600, Jonathan Corbet wrote:
> On Wed, 24 Apr 2019 13:28:35 +0300
> Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> > Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> > maintain pfn <-> struct page correspondence.
>
> Quick question: should this document perhaps mention that DISCONTIGMEM
> appears to be on its way out?

I suspect it'll take a while until then, but I'll add a sentence about it
being deprecated.

Which reminds me that mm/Kconfig also begs for the corresponding update.

> Thanks,
>
> jon
>
On 4/24/19 3:28 AM, Mike Rapoport wrote:
> Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> maintain pfn <-> struct page correspondence.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  Documentation/vm/index.rst        |   1 +
>  Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 172 insertions(+)
>  create mode 100644 Documentation/vm/memory-model.rst
>

Hi Mike,
I have a few minor edits below...

> diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
> new file mode 100644
> index 0000000..914c52a
> --- /dev/null
> +++ b/Documentation/vm/memory-model.rst
> @@ -0,0 +1,171 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _physical_memory_model:
> +
> +=====================
> +Physical Memory Model
> +=====================
> +
> +Physical memory in a system may be addressed in different ways. The
> +simplest case is when the physical memory starts at address 0 and
> +spans a contiguous range up to the maximal address. It could be,
> +however, that this range contains small holes that are not accessible
> +for the CPU. Then there could be several contiguous ranges at
> +completely distinct addresses. And, don't forget about NUMA, where
> +different memory banks are attached to different CPUs.
> +
> +Linux abstracts this diversity using one of the three memory models:
> +FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what
> +memory models it supports, what is the default memory model and
> +whether it possible to manually override that default.
> +
> +All the memory models track the status of physical page frames using
> +:c:type:`struct page` arranged in one or more arrays.
> +
> +Regardless of the selected memory model, there exists one-to-one
> +mapping between the physical page frame number (PFN) and the
> +corresponding `struct page`.
> +
> +Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
> +helpers that allow the conversion from PFN to `struct page` and vise        vice
> +versa.
> +
> +FLATMEM
> +=======
> +
> +The simplest memory model is FLATMEM. This model is suitable for
> +non-NUMA systems with contiguous, or mostly contiguous, physical
> +memory.
> +
> +In the FLATMEM memory model, there is a global `mem_map` array that
> +maps the entire physical memory. For most architectures, the holes
> +have entries in the `mem_map` array. The `struct page` objects
> +corresponding to the holes are never fully initialized.
> +
> +To allocate the `mem_map` array, architecture specific setup code
> +should call :c:func:`free_area_init_node` function or its convenience
> +wrapper :c:func:`free_area_init`. Yet, the mappings array is not
> +usable until the call to :c:func:`memblock_free_all` that hands all
> +the memory to the page allocator.
> +
> +If an architecture enables `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option,
> +it may free parts of the `mem_map` array that do not cover the
> +actual physical pages. In such case, the architecture specific
> +:c:func:`pfn_valid` implementation should take the holes in the
> +`mem_map` into account.
> +
> +With FLATMEM, the conversion between a PFN and the `struct page` is
> +straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
> +`mem_map` array.
> +
> +The `ARCH_PFN_OFFSET` defines the first page frame number for
> +systems that their physical memory does not start at 0.

s/that/when/ ?  Seems awkward as is.

> +
> +DISCONTIGMEM
> +============
> +
> +The DISCONTIGMEM model treats the physical memory as a collection of
> +`nodes` similarly to how Linux NUMA support does. For each node Linux
> +constructs an independent memory management subsystem represented by
> +`struct pglist_data` (or `pg_data_t` for short). Among other
> +things, `pg_data_t` holds the `node_mem_map` array that maps
> +physical pages belonging to that node. The `node_start_pfn` field of
> +`pg_data_t` is the number of the first page frame belonging to that
> +node.
> +
> +The architecture setup code should call :c:func:`free_area_init_node` for
> +each node in the system to initialize the `pg_data_t` object and its
> +`node_mem_map`.
> +
> +Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` -
> +every physical page frame in a node has a `struct page` entry in the
> +`node_mem_map` array. When DISCONTIGMEM is enabled, a portion of the
> +`flags` field of the `struct page` encodes the node number of the
> +node hosting that page.
> +
> +The conversion between a PFN and the `struct page` in the
> +DISCONTIGMEM model became slightly more complex as it has to determine
> +which node hosts the physical page and which `pg_data_t` object
> +holds the `struct page`.
> +
> +Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid`
> +to convert PFN to the node number. The opposite conversion helper
> +:c:func:`page_to_nid` is generic as it uses the node number encoded in
> +page->flags.
> +
> +Once the node number is known, the PFN can be used to index
> +appropriate `node_mem_map` array to access the `struct page` and
> +the offset of the `struct page` from the `node_mem_map` plus
> +`node_start_pfn` is the PFN of that page.
> +
> +SPARSEMEM
> +=========
> +
> +SPARSEMEM is the most versatile memory model available in Linux and it
> +is the only memory model that supports several advanced features such
> +as hot-plug and hot-remove of the physical memory, alternative memory
> +maps for non-volatile memory devices and deferred initialization of
> +the memory map for larger systems.
> +
> +The SPARSEMEM model presents the physical memory as a collection of
> +sections. A section is represented with :c:type:`struct mem_section`
> +that contains `section_mem_map` that is, logically, a pointer to an
> +array of struct pages. However, it is stored with some other magic
> +that aids the sections management. The section size and maximal number
> +of section is specified using `SECTION_SIZE_BITS` and
> +`MAX_PHYSMEM_BITS` constants defined by each architecture that
> +supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is an actual width of a
> +physical address that an architecture supports, the
> +`SECTION_SIZE_BITS` is an arbitrary value.
> +
> +The maximal number of sections is denoted `NR_MEM_SECTIONS` and
> +defined as
> +
> +.. math::
> +
> +   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}
> +
> +The `mem_section` objects are arranged in a two dimensional array        two-dimensional
> +called `mem_sections`. The size and placement of this array depend
> +on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
> +sections:
> +
> +* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
> +  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
> +  single `mem_section` object.
> +* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
> +  array is dynamically allocated. Each row contains PAGE_SIZE worth of
> +  `mem_section` objects and the number of rows is calculated to fit
> +  all the memory sections.
> +
> +The architecture setup code should call :c:func:`memory_present` for
> +each active memory range or use :c:func:`memblocks_present` or
> +:c:func:`sparse_memory_present_with_active_regions` wrappers to
> +initialize the memory sections. Next, the actual memory maps should be
> +set up using :c:func:`sparse_init`.
> +
> +With SPARSEMEM there are two possible ways to convert a PFN to the
> +corresponding `struct page` - a "classic sparse" and "sparse
> +vmemmap". The selection is made at build time and it is determined by
> +the value of `CONFIG_SPARSEMEM_VMEMMAP`.
> +
> +The classic sparse encodes the section number of a page in page->flags
> +and uses high bits of a PFN to access the section that maps that page
> +frame. Inside a section, the PFN is the index to the array of pages.
> +
> +The sparse vmemmap uses a virtually mapped memory map to optimize
> +pfn_to_page and page_to_pfn operations. There is a global `struct
> +page *vmemmap` pointer that points to a virtually contiguous array of
> +`struct page` objects. A PFN is an index to that array and the the
> +offset of the `struct page` from `vmemmap` is the PFN of that
> +page.
> +
> +To use vmemmap, an architecture has to reserve a range of virtual
> +addresses that will map the physical pages containing the memory
> +map. and make sure that `vmemmap` points to that range. In addition,        map and
> +the architecture should implement :c:func:`vmemmap_populate` method
> +that will allocate the physical memory and create page tables for the
> +virtual memory map. If an architecture does not have any special
> +requirements for the vmemmap mappings, it can use default
> +:c:func:`vmemmap_populate_basepages` provided by the generic memory
> +management.
>

thanks.
Hi Randy,

On Wed, Apr 24, 2019 at 06:08:46PM -0700, Randy Dunlap wrote:
> On 4/24/19 3:28 AM, Mike Rapoport wrote:
> > Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> > maintain pfn <-> struct page correspondence.
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  Documentation/vm/index.rst        |   1 +
> >  Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 172 insertions(+)
> >  create mode 100644 Documentation/vm/memory-model.rst
> >
>
> Hi Mike,
> I have a few minor edits below...

I kinda expected those ;-)

> > diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
> > new file mode 100644
> > index 0000000..914c52a
> > --- /dev/null
> > +++ b/Documentation/vm/memory-model.rst

...

> > +
> > +With FLATMEM, the conversion between a PFN and the `struct page` is
> > +straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
> > +`mem_map` array.
> > +
> > +The `ARCH_PFN_OFFSET` defines the first page frame number for
> > +systems that their physical memory does not start at 0.
>
> s/that/when/ ?  Seems awkward as is.

Yeah, it is awkward. How about

The `ARCH_PFN_OFFSET` defines the first page frame number for
systems with physical memory starting at address different from 0.

> > +
> > +DISCONTIGMEM
> > +============
> > +
>
> thanks.
> --
> ~Randy
>
On 4/25/19 1:22 AM, Mike Rapoport wrote:
> Hi Randy,
>
> On Wed, Apr 24, 2019 at 06:08:46PM -0700, Randy Dunlap wrote:
>> On 4/24/19 3:28 AM, Mike Rapoport wrote:
>>> Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
>>> maintain pfn <-> struct page correspondence.
>>>
>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>>> ---
>>>  Documentation/vm/index.rst        |   1 +
>>>  Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 172 insertions(+)
>>>  create mode 100644 Documentation/vm/memory-model.rst
>>>
>>
>> Hi Mike,
>> I have a few minor edits below...
>
> I kinda expected those ;-)
>
>>> diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
>>> new file mode 100644
>>> index 0000000..914c52a
>>> --- /dev/null
>>> +++ b/Documentation/vm/memory-model.rst
>
> ...
>
>>> +
>>> +With FLATMEM, the conversion between a PFN and the `struct page` is
>>> +straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
>>> +`mem_map` array.
>>> +
>>> +The `ARCH_PFN_OFFSET` defines the first page frame number for
>>> +systems that their physical memory does not start at 0.
>>
>> s/that/when/ ?  Seems awkward as is.
>
> Yeah, it is awkward. How about
>
> The `ARCH_PFN_OFFSET` defines the first page frame number for
> systems with physical memory starting at address different from 0.

OK. Thanks.

>>> +
>>> +DISCONTIGMEM
>>> +============
>>> +
>>
>> thanks.
>> --
>> ~Randy
>>
>
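The FLATMEM conversion quoted in this exchange (`PFN - ARCH_PFN_OFFSET` as an index into `mem_map`) can be sketched in userspace C. This is only an illustration under stated assumptions: `struct page` is reduced to a single field, and the `ARCH_PFN_OFFSET` value and `NR_FRAMES` are made-up numbers, not taken from any architecture. The helper names follow the patch text, but plain functions are used here for readability.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct page (assumption). */
struct page {
	unsigned long flags;
};

/* Hypothetical layout: RAM starts at PFN 0x80000 and spans 16 frames. */
#define ARCH_PFN_OFFSET	0x80000UL
#define NR_FRAMES	16UL

/* One global array covers the whole physical range, holes included. */
static struct page mem_map[NR_FRAMES];

/* PFN - ARCH_PFN_OFFSET is the index into mem_map. */
static struct page *pfn_to_page(unsigned long pfn)
{
	assert(pfn >= ARCH_PFN_OFFSET && pfn - ARCH_PFN_OFFSET < NR_FRAMES);
	return mem_map + (pfn - ARCH_PFN_OFFSET);
}

/* The reverse: offset of the entry within mem_map, plus ARCH_PFN_OFFSET. */
static unsigned long page_to_pfn(const struct page *page)
{
	return (unsigned long)(page - mem_map) + ARCH_PFN_OFFSET;
}
```

With these definitions, `pfn_to_page(0x80005)` points at `mem_map[5]`, and `page_to_pfn()` on that pointer gives back `0x80005`. On a system whose memory starts at address 0, `ARCH_PFN_OFFSET` would simply be 0 and the PFN itself would be the index.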
diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index b58cc3b..e8d943b 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -37,6 +37,7 @@ descriptions of data structures and algorithms.
    hwpoison
    hugetlbfs_reserv
    ksm
+   memory-model
    mmu_notifier
    numa
    overcommit-accounting
diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
new file mode 100644
index 0000000..914c52a
--- /dev/null
+++ b/Documentation/vm/memory-model.rst
@@ -0,0 +1,171 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _physical_memory_model:
+
+=====================
+Physical Memory Model
+=====================
+
+Physical memory in a system may be addressed in different ways. The
+simplest case is when the physical memory starts at address 0 and
+spans a contiguous range up to the maximal address. It could be,
+however, that this range contains small holes that are not accessible
+for the CPU. Then there could be several contiguous ranges at
+completely distinct addresses. And, don't forget about NUMA, where
+different memory banks are attached to different CPUs.
+
+Linux abstracts this diversity using one of the three memory models:
+FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what
+memory models it supports, what is the default memory model and
+whether it possible to manually override that default.
+
+All the memory models track the status of physical page frames using
+:c:type:`struct page` arranged in one or more arrays.
+
+Regardless of the selected memory model, there exists one-to-one
+mapping between the physical page frame number (PFN) and the
+corresponding `struct page`.
+
+Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
+helpers that allow the conversion from PFN to `struct page` and vise
+versa.
+
+FLATMEM
+=======
+
+The simplest memory model is FLATMEM. This model is suitable for
+non-NUMA systems with contiguous, or mostly contiguous, physical
+memory.
+
+In the FLATMEM memory model, there is a global `mem_map` array that
+maps the entire physical memory. For most architectures, the holes
+have entries in the `mem_map` array. The `struct page` objects
+corresponding to the holes are never fully initialized.
+
+To allocate the `mem_map` array, architecture specific setup code
+should call :c:func:`free_area_init_node` function or its convenience
+wrapper :c:func:`free_area_init`. Yet, the mappings array is not
+usable until the call to :c:func:`memblock_free_all` that hands all
+the memory to the page allocator.
+
+If an architecture enables `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option,
+it may free parts of the `mem_map` array that do not cover the
+actual physical pages. In such case, the architecture specific
+:c:func:`pfn_valid` implementation should take the holes in the
+`mem_map` into account.
+
+With FLATMEM, the conversion between a PFN and the `struct page` is
+straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
+`mem_map` array.
+
+The `ARCH_PFN_OFFSET` defines the first page frame number for
+systems that their physical memory does not start at 0.
+
+DISCONTIGMEM
+============
+
+The DISCONTIGMEM model treats the physical memory as a collection of
+`nodes` similarly to how Linux NUMA support does. For each node Linux
+constructs an independent memory management subsystem represented by
+`struct pglist_data` (or `pg_data_t` for short). Among other
+things, `pg_data_t` holds the `node_mem_map` array that maps
+physical pages belonging to that node. The `node_start_pfn` field of
+`pg_data_t` is the number of the first page frame belonging to that
+node.
+
+The architecture setup code should call :c:func:`free_area_init_node` for
+each node in the system to initialize the `pg_data_t` object and its
+`node_mem_map`.
+
+Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` -
+every physical page frame in a node has a `struct page` entry in the
+`node_mem_map` array. When DISCONTIGMEM is enabled, a portion of the
+`flags` field of the `struct page` encodes the node number of the
+node hosting that page.
+
+The conversion between a PFN and the `struct page` in the
+DISCONTIGMEM model became slightly more complex as it has to determine
+which node hosts the physical page and which `pg_data_t` object
+holds the `struct page`.
+
+Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid`
+to convert PFN to the node number. The opposite conversion helper
+:c:func:`page_to_nid` is generic as it uses the node number encoded in
+page->flags.
+
+Once the node number is known, the PFN can be used to index
+appropriate `node_mem_map` array to access the `struct page` and
+the offset of the `struct page` from the `node_mem_map` plus
+`node_start_pfn` is the PFN of that page.
+
+SPARSEMEM
+=========
+
+SPARSEMEM is the most versatile memory model available in Linux and it
+is the only memory model that supports several advanced features such
+as hot-plug and hot-remove of the physical memory, alternative memory
+maps for non-volatile memory devices and deferred initialization of
+the memory map for larger systems.
+
+The SPARSEMEM model presents the physical memory as a collection of
+sections. A section is represented with :c:type:`struct mem_section`
+that contains `section_mem_map` that is, logically, a pointer to an
+array of struct pages. However, it is stored with some other magic
+that aids the sections management. The section size and maximal number
+of section is specified using `SECTION_SIZE_BITS` and
+`MAX_PHYSMEM_BITS` constants defined by each architecture that
+supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is an actual width of a
+physical address that an architecture supports, the
+`SECTION_SIZE_BITS` is an arbitrary value.
+
+The maximal number of sections is denoted `NR_MEM_SECTIONS` and
+defined as
+
+.. math::
+
+   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}
+
+The `mem_section` objects are arranged in a two dimensional array
+called `mem_sections`. The size and placement of this array depend
+on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
+sections:
+
+* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
+  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
+  single `mem_section` object.
+* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
+  array is dynamically allocated. Each row contains PAGE_SIZE worth of
+  `mem_section` objects and the number of rows is calculated to fit
+  all the memory sections.
+
+The architecture setup code should call :c:func:`memory_present` for
+each active memory range or use :c:func:`memblocks_present` or
+:c:func:`sparse_memory_present_with_active_regions` wrappers to
+initialize the memory sections. Next, the actual memory maps should be
+set up using :c:func:`sparse_init`.
+
+With SPARSEMEM there are two possible ways to convert a PFN to the
+corresponding `struct page` - a "classic sparse" and "sparse
+vmemmap". The selection is made at build time and it is determined by
+the value of `CONFIG_SPARSEMEM_VMEMMAP`.
+
+The classic sparse encodes the section number of a page in page->flags
+and uses high bits of a PFN to access the section that maps that page
+frame. Inside a section, the PFN is the index to the array of pages.
+
+The sparse vmemmap uses a virtually mapped memory map to optimize
+pfn_to_page and page_to_pfn operations. There is a global `struct
+page *vmemmap` pointer that points to a virtually contiguous array of
+`struct page` objects. A PFN is an index to that array and the the
+offset of the `struct page` from `vmemmap` is the PFN of that
+page.
+
+To use vmemmap, an architecture has to reserve a range of virtual
+addresses that will map the physical pages containing the memory
+map. and make sure that `vmemmap` points to that range. In addition,
+the architecture should implement :c:func:`vmemmap_populate` method
+that will allocate the physical memory and create page tables for the
+virtual memory map. If an architecture does not have any special
+requirements for the vmemmap mappings, it can use default
+:c:func:`vmemmap_populate_basepages` provided by the generic memory
+management.
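As a rough illustration of the SPARSEMEM part of the patch above, the following userspace sketch computes `NR_MEM_SECTIONS` from the formula in the text and performs a "classic sparse" style lookup. Everything numeric here is an assumption chosen to keep the arrays tiny (`PAGE_SHIFT`, `SECTION_SIZE_BITS` and `MAX_PHYSMEM_BITS` are not any architecture's real values). `mem_sections` is a flat one-dimensional array, the whole `flags` word holds the section number, and `section_init()` and `sparse_demo_init()` are hypothetical helpers. The sketch also indexes each section's array with the low PFN bits, a simplification of the encoded `section_mem_map` pointer the patch alludes to.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; real ones are defined per architecture. */
#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	16	/* 64 KiB sections, tiny for the demo */
#define MAX_PHYSMEM_BITS	20	/* 1 MiB of physical address space */

/* NR_MEM_SECTIONS = 2^(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) */
#define NR_MEM_SECTIONS		(1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION	(1UL << PFN_SECTION_SHIFT)

/* Simplified: the whole flags word holds the section number. */
struct page {
	unsigned long flags;
};

/* Simplified: a plain pointer, NULL when the section is not present. */
struct mem_section {
	struct page *section_mem_map;
};

static struct mem_section mem_sections[NR_MEM_SECTIONS];

/* Only two sections are populated; the rest stay absent. */
static struct page sec3_pages[PAGES_PER_SECTION];
static struct page sec9_pages[PAGES_PER_SECTION];

/* The high bits of a PFN select the section. */
static unsigned long pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* Hypothetical helper: mark a section present and tag its pages. */
static void section_init(unsigned long nr, struct page *pages)
{
	unsigned long i;

	assert(nr < NR_MEM_SECTIONS);
	for (i = 0; i < PAGES_PER_SECTION; i++)
		pages[i].flags = nr;
	mem_sections[nr].section_mem_map = pages;
}

static struct page *pfn_to_page(unsigned long pfn)
{
	unsigned long nr = pfn_to_section_nr(pfn);

	if (nr >= NR_MEM_SECTIONS || !mem_sections[nr].section_mem_map)
		return NULL;
	/* The low bits of the PFN index the pages inside the section. */
	return mem_sections[nr].section_mem_map +
	       (pfn & (PAGES_PER_SECTION - 1));
}

static unsigned long page_to_pfn(const struct page *page)
{
	unsigned long nr = page->flags;	/* section number from flags */

	return (nr << PFN_SECTION_SHIFT) +
	       (unsigned long)(page - mem_sections[nr].section_mem_map);
}

/* Hypothetical setup: sections 3 and 9 are present. */
static void sparse_demo_init(void)
{
	section_init(3, sec3_pages);
	section_init(9, sec9_pages);
}
```

With these numbers there are 16 possible sections of 16 pages each. After `sparse_demo_init()`, PFN `0x35` resolves to `sec3_pages[5]`, while a PFN in an absent section such as `0x40` yields `NULL`, which is how the model can leave large holes without backing `struct page` arrays.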
Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
maintain pfn <-> struct page correspondence.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 Documentation/vm/index.rst        |   1 +
 Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 172 insertions(+)
 create mode 100644 Documentation/vm/memory-model.rst
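For the DISCONTIGMEM case, the pfn <-> struct page correspondence described by the patch goes through a per-node array. The sketch below is a userspace model under several assumptions: `pg_data_t` is cut down to three fields, the node layout is invented, the whole `flags` word stores the node number, `pfn_to_nid()` is a linear search standing in for whatever an architecture would provide, and `discontig_demo_init()` is a hypothetical setup helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified: the whole flags word holds the node number. */
struct page {
	unsigned long flags;
};

/* Cut-down model of pg_data_t with only the fields the lookup needs. */
typedef struct pglist_data {
	struct page *node_mem_map;		/* pages belonging to the node */
	unsigned long node_start_pfn;		/* first PFN of the node */
	unsigned long node_spanned_pages;	/* number of PFNs covered */
} pg_data_t;

#define MAX_NUMNODES	2
#define NODE_PAGES	8UL

static struct page node0_pages[NODE_PAGES];
static struct page node1_pages[NODE_PAGES];

/* Invented layout: node 0 at PFN 0x000, node 1 at PFN 0x100. */
static pg_data_t node_data[MAX_NUMNODES] = {
	{ node0_pages, 0x000UL, NODE_PAGES },
	{ node1_pages, 0x100UL, NODE_PAGES },
};

/* Hypothetical setup: record the owning node in every page. */
static void discontig_demo_init(void)
{
	unsigned long i;
	int nid;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		for (i = 0; i < node_data[nid].node_spanned_pages; i++)
			node_data[nid].node_mem_map[i].flags = (unsigned long)nid;
}

/* Stand-in for the architecture-provided helper: linear search. */
static int pfn_to_nid(unsigned long pfn)
{
	int nid;

	for (nid = 0; nid < MAX_NUMNODES; nid++) {
		const pg_data_t *pgdat = &node_data[nid];

		if (pfn >= pgdat->node_start_pfn &&
		    pfn - pgdat->node_start_pfn < pgdat->node_spanned_pages)
			return nid;
	}
	return -1;	/* PFN falls in a hole between nodes */
}

/* Generic direction: the node number comes from the page itself. */
static int page_to_nid(const struct page *page)
{
	return (int)page->flags;
}

static struct page *pfn_to_page(unsigned long pfn)
{
	int nid = pfn_to_nid(pfn);

	if (nid < 0)
		return NULL;
	return node_data[nid].node_mem_map +
	       (pfn - node_data[nid].node_start_pfn);
}

/* Offset within node_mem_map plus node_start_pfn is the PFN. */
static unsigned long page_to_pfn(const struct page *page)
{
	const pg_data_t *pgdat = &node_data[page_to_nid(page)];

	return (unsigned long)(page - pgdat->node_mem_map) +
	       pgdat->node_start_pfn;
}
```

After `discontig_demo_init()`, PFN `0x103` maps to `node1_pages[3]` and converts back to `0x103`, while a PFN between the two nodes, such as `0x50`, has no `struct page` in this model. Compared with a single FLATMEM array, each lookup pays for finding the node first, which matches the patch's remark that the conversion becomes slightly more complex.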