Message ID | 20190116181902.670EEBC3@viggo.jf.intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Allow persistent memory to be used like normal RAM | expand |
On Wed, Jan 16, 2019 at 10:19:02AM -0800, Dave Hansen wrote: > > From: Dave Hansen <dave.hansen@linux.intel.com> > > The mm/resource.c code is used to manage the physical address > space. We can view the current resource configuration in > /proc/iomem. An example of this is at the bottom of this > description. > > The nvdimm subsystem "owns" the physical address resources which > map to persistent memory and has resources inserted for them as > "Persistent Memory". We want to use this persistent memory, but > as volatile memory, just like RAM. The best way to do this is > to leave the existing resource in place, but add a "System RAM" > resource underneath it. This clearly communicates the ownership > relationship of this memory. > > The request_resource_conflict() API only deals with the > top-level resources. Replace it with __request_region() which > will search for !IORESOURCE_BUSY areas lower in the resource > tree than the top level. > > We also rework the old error message a bit since we do not get > the conflicting entry back: only an indication that we *had* a > conflict. We should keep the device private check (moving it in __request_region) as device private can try to register un-use physical address (un-use at time of device private registration) that latter can block valid physical address the error message you are removing report such event. > > We *could* also simply truncate the existing top-level > "Persistent Memory" resource and take over the released address > space. But, this means that if we ever decide to hot-unplug the > "RAM" and give it back, we need to recreate the original setup, > which may mean going back to the BIOS tables. > > This should have no real effect on the existing collision > detection because the areas that truly conflict should be marked > IORESOURCE_BUSY. Still i am worrying that this might allow device private to register itself as a child of some un-busy resource as this patch obviously change the behavior of register_memory_resource() What about instead explicitly providing parent resource to add_memory() and then to register_memory_resource() so if it is provided as an argument (!NULL) then you can __request_region(arg_res, ...) otherwise you keep existing code intact ? Cheers, Jérôme > > 00000000-00000fff : Reserved > 00001000-0009fbff : System RAM > 0009fc00-0009ffff : Reserved > 000a0000-000bffff : PCI Bus 0000:00 > 000c0000-000c97ff : Video ROM > 000c9800-000ca5ff : Adapter ROM > 000f0000-000fffff : Reserved > 000f0000-000fffff : System ROM > 00100000-9fffffff : System RAM > 01000000-01e071d0 : Kernel code > 01e071d1-027dfdff : Kernel data > 02dc6000-0305dfff : Kernel bss > a0000000-afffffff : Persistent Memory (legacy) > a0000000-a7ffffff : System RAM > b0000000-bffdffff : System RAM > bffe0000-bfffffff : Reserved > c0000000-febfffff : PCI Bus 0000:00 > > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Ross Zwisler <zwisler@kernel.org> > Cc: Vishal Verma <vishal.l.verma@intel.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: linux-nvdimm@lists.01.org > Cc: linux-kernel@vger.kernel.org > Cc: linux-mm@kvack.org > Cc: Huang Ying <ying.huang@intel.com> > Cc: Fengguang Wu <fengguang.wu@intel.com> > > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> > --- > > b/mm/memory_hotplug.c | 31 ++++++++++++++----------------- > 1 file changed, 14 insertions(+), 17 deletions(-) > > diff -puN mm/memory_hotplug.c~mm-memory-hotplug-allow-memory-resource-to-be-child mm/memory_hotplug.c > --- a/mm/memory_hotplug.c~mm-memory-hotplug-allow-memory-resource-to-be-child 2018-12-20 11:48:42.317771933 -0800 > +++ b/mm/memory_hotplug.c 2018-12-20 11:48:42.322771933 -0800 > @@ -98,24 +98,21 @@ void mem_hotplug_done(void) > /* add this memory to iomem resource */ > static struct resource *register_memory_resource(u64 start, u64 size) > { > - struct resource *res, *conflict; > - res = kzalloc(sizeof(struct resource), GFP_KERNEL); > - if (!res) > - return ERR_PTR(-ENOMEM); > + struct resource *res; > + unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > + char *resource_name = "System RAM"; > > - res->name = "System RAM"; > - res->start = start; > - res->end = start + size - 1; > - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > - conflict = request_resource_conflict(&iomem_resource, res); > - if (conflict) { > - if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) { > - pr_debug("Device unaddressable memory block " > - "memory hotplug at %#010llx !\n", > - (unsigned long long)start); > - } > - pr_debug("System RAM resource %pR cannot be added\n", res); > - kfree(res); > + /* > + * Request ownership of the new memory range. This might be > + * a child of an existing resource that was present but > + * not marked as busy. > + */ > + res = __request_region(&iomem_resource, start, size, > + resource_name, flags); > + > + if (!res) { > + pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n", > + start, start + size); > return ERR_PTR(-EEXIST); > } > return res; > _ >
On 1/16/19 11:16 AM, Jerome Glisse wrote: >> We also rework the old error message a bit since we do not get >> the conflicting entry back: only an indication that we *had* a >> conflict. > We should keep the device private check (moving it in __request_region) > as device private can try to register un-use physical address (un-use > at time of device private registration) that latter can block valid > physical address the error message you are removing report such event. If a resource can't support having a child, shouldn't it just be marked IORESOURCE_BUSY, rather than trying to somehow special-case IORES_DESC_DEVICE_PRIVATE_MEMORY behavior?
On Wed, Jan 16, 2019 at 03:01:39PM -0800, Dave Hansen wrote: > On 1/16/19 11:16 AM, Jerome Glisse wrote: > >> We also rework the old error message a bit since we do not get > >> the conflicting entry back: only an indication that we *had* a > >> conflict. > > We should keep the device private check (moving it in __request_region) > > as device private can try to register un-use physical address (un-use > > at time of device private registration) that latter can block valid > > physical address the error message you are removing report such event. > > If a resource can't support having a child, shouldn't it just be marked > IORESOURCE_BUSY, rather than trying to somehow special-case > IORES_DESC_DEVICE_PRIVATE_MEMORY behavior? So the thing about IORES_DESC_DEVICE_PRIVATE_MEMORY is that they are not necessarily link to any real resource ie they can just be random range of physical address that at the time of registration had no resource. Now you can latter hotplug some memory that would conflict with this IORES_DESC_DEVICE_PRIVATE_MEMORY and if that happens we want to tell that to the user ie: "Sorry we registered some fake memory at fake physical address and now you have hotplug something that conflict with that." Why no existing resource ? Well it depends on the platform. In some case memory for HMM is just not accessible by the CPU _at_ all so there is obviously no physical address from CPU point of view for this kind of memory. The other case is PCIE and BAR size. If we have PCIE bar resizing working everywhere we could potentialy use the resized PCIE bar (thought i think some device have bug on that front so i need to check device side too). So when HMM was design without the PCIE resize and with totaly un-accessible memory the only option was to pick some unuse physical address range as anyway memory we are hotpluging is not CPU accessible. It has been on my TODO to try to find a better way to reserve a physical range but this is highly platform specific. I need to investigate if i can report to ACPI on x86 that i want to make sure the system never assign some physical address range. Checking PCIE bar resize is also on my TODO (on device side as i think some device are just buggy there and won't accept BAR bigger than 256MB and freakout if you try). So right now i would rather that we keep properly reporting this hazard so that at least we know it failed because of that. This also include making sure that we can not register private memory as a child of an un-busy resource that does exist but might not have yet been claim by its rightful owner. Existing code make sure of that, with your change this is a case that i would not be able to stop. Well i would have to hot unplug and try a different physical address i guess. Cheers, Jérôme
On 1/16/19 11:16 AM, Jerome Glisse wrote: >> We *could* also simply truncate the existing top-level >> "Persistent Memory" resource and take over the released address >> space. But, this means that if we ever decide to hot-unplug the >> "RAM" and give it back, we need to recreate the original setup, >> which may mean going back to the BIOS tables. >> >> This should have no real effect on the existing collision >> detection because the areas that truly conflict should be marked >> IORESOURCE_BUSY. > > Still i am worrying that this might allow device private to register > itself as a child of some un-busy resource as this patch obviously > change the behavior of register_memory_resource() > > What about instead explicitly providing parent resource to add_memory() > and then to register_memory_resource() so if it is provided as an > argument (!NULL) then you can __request_region(arg_res, ...) otherwise > you keep existing code intact ? We don't have the locking to do this, do we? For instance, all the locking is done below register_memory_resource(), so any previous resource lookup is invalid by the time we get to register_memory_resource().
On Fri, Jan 18, 2019 at 11:58:54AM -0800, Dave Hansen wrote: > On 1/16/19 11:16 AM, Jerome Glisse wrote: > >> We *could* also simply truncate the existing top-level > >> "Persistent Memory" resource and take over the released address > >> space. But, this means that if we ever decide to hot-unplug the > >> "RAM" and give it back, we need to recreate the original setup, > >> which may mean going back to the BIOS tables. > >> > >> This should have no real effect on the existing collision > >> detection because the areas that truly conflict should be marked > >> IORESOURCE_BUSY. > > > > Still i am worrying that this might allow device private to register > > itself as a child of some un-busy resource as this patch obviously > > change the behavior of register_memory_resource() > > > > What about instead explicitly providing parent resource to add_memory() > > and then to register_memory_resource() so if it is provided as an > > argument (!NULL) then you can __request_region(arg_res, ...) otherwise > > you keep existing code intact ? > > We don't have the locking to do this, do we? For instance, all the > locking is done below register_memory_resource(), so any previous > resource lookup is invalid by the time we get to register_memory_resource(). Yeah you are right, maybe just a bool then ? bool as_child Cheers, Jérôme
[Sorry for a late reply] On Wed 16-01-19 10:19:02, Dave Hansen wrote: > > From: Dave Hansen <dave.hansen@linux.intel.com> > > The mm/resource.c code is used to manage the physical address > space. We can view the current resource configuration in > /proc/iomem. An example of this is at the bottom of this > description. > > The nvdimm subsystem "owns" the physical address resources which > map to persistent memory and has resources inserted for them as > "Persistent Memory". We want to use this persistent memory, but > as volatile memory, just like RAM. The best way to do this is > to leave the existing resource in place, but add a "System RAM" > resource underneath it. This clearly communicates the ownership > relationship of this memory. > > The request_resource_conflict() API only deals with the > top-level resources. Replace it with __request_region() which > will search for !IORESOURCE_BUSY areas lower in the resource > tree than the top level. > > We also rework the old error message a bit since we do not get > the conflicting entry back: only an indication that we *had* a > conflict. > > We *could* also simply truncate the existing top-level > "Persistent Memory" resource and take over the released address > space. But, this means that if we ever decide to hot-unplug the > "RAM" and give it back, we need to recreate the original setup, > which may mean going back to the BIOS tables. > > This should have no real effect on the existing collision > detection because the areas that truly conflict should be marked > IORESOURCE_BUSY. > > 00000000-00000fff : Reserved > 00001000-0009fbff : System RAM > 0009fc00-0009ffff : Reserved > 000a0000-000bffff : PCI Bus 0000:00 > 000c0000-000c97ff : Video ROM > 000c9800-000ca5ff : Adapter ROM > 000f0000-000fffff : Reserved > 000f0000-000fffff : System ROM > 00100000-9fffffff : System RAM > 01000000-01e071d0 : Kernel code > 01e071d1-027dfdff : Kernel data > 02dc6000-0305dfff : Kernel bss > a0000000-afffffff : Persistent Memory (legacy) > a0000000-a7ffffff : System RAM > b0000000-bffdffff : System RAM > bffe0000-bfffffff : Reserved > c0000000-febfffff : PCI Bus 0000:00 This is the only memory hotplug related change in this series AFAICS. Unfortunately I am not really familiar with guts for resources infrastructure so I cannot judge the correctness. The change looks sensible to me although I do not feel like acking it. Overall design of this feature makes a lot of sense to me. It doesn't really add any weird APIs yet it allows to use nvdimms as a memory transparently. All future policies are to be defined by the userspace and I like that. I was especially astonished by the sheer size of the driver and changes it required to achieve that. Really nice! > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Ross Zwisler <zwisler@kernel.org> > Cc: Vishal Verma <vishal.l.verma@intel.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: linux-nvdimm@lists.01.org > Cc: linux-kernel@vger.kernel.org > Cc: linux-mm@kvack.org > Cc: Huang Ying <ying.huang@intel.com> > Cc: Fengguang Wu <fengguang.wu@intel.com> > > Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> > --- > > b/mm/memory_hotplug.c | 31 ++++++++++++++----------------- > 1 file changed, 14 insertions(+), 17 deletions(-) > > diff -puN mm/memory_hotplug.c~mm-memory-hotplug-allow-memory-resource-to-be-child mm/memory_hotplug.c > --- a/mm/memory_hotplug.c~mm-memory-hotplug-allow-memory-resource-to-be-child 2018-12-20 11:48:42.317771933 -0800 > +++ b/mm/memory_hotplug.c 2018-12-20 11:48:42.322771933 -0800 > @@ -98,24 +98,21 @@ void mem_hotplug_done(void) > /* add this memory to iomem resource */ > static struct resource *register_memory_resource(u64 start, u64 size) > { > - struct resource *res, *conflict; > - res = kzalloc(sizeof(struct resource), GFP_KERNEL); > - if (!res) > - return ERR_PTR(-ENOMEM); > + struct resource *res; > + unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > + char *resource_name = "System RAM"; > > - res->name = "System RAM"; > - res->start = start; > - res->end = start + size - 1; > - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; > - conflict = request_resource_conflict(&iomem_resource, res); > - if (conflict) { > - if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) { > - pr_debug("Device unaddressable memory block " > - "memory hotplug at %#010llx !\n", > - (unsigned long long)start); > - } > - pr_debug("System RAM resource %pR cannot be added\n", res); > - kfree(res); > + /* > + * Request ownership of the new memory range. This might be > + * a child of an existing resource that was present but > + * not marked as busy. > + */ > + res = __request_region(&iomem_resource, start, size, > + resource_name, flags); > + > + if (!res) { > + pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n", > + start, start + size); > return ERR_PTR(-EEXIST); > } > return res; > _
On 1/16/19 3:38 PM, Jerome Glisse wrote: > So right now i would rather that we keep properly reporting this > hazard so that at least we know it failed because of that. This > also include making sure that we can not register private memory > as a child of an un-busy resource that does exist but might not > have yet been claim by its rightful owner. I can definitely keep the warning in. But, I don't think there's a chance of HMM registering a IORES_DESC_DEVICE_PRIVATE_MEMORY region as the child of another. The region_intersects() check *should* find that: > for (; addr > size && addr >= iomem_resource.start; addr -= size) { > ret = region_intersects(addr, size, 0, IORES_DESC_NONE); > if (ret != REGION_DISJOINT) > continue;
On Wed, Jan 23, 2019 at 12:03:54PM -0800, Dave Hansen wrote: > On 1/16/19 3:38 PM, Jerome Glisse wrote: > > So right now i would rather that we keep properly reporting this > > hazard so that at least we know it failed because of that. This > > also include making sure that we can not register private memory > > as a child of an un-busy resource that does exist but might not > > have yet been claim by its rightful owner. > > I can definitely keep the warning in. But, I don't think there's a > chance of HMM registering a IORES_DESC_DEVICE_PRIVATE_MEMORY region as > the child of another. The region_intersects() check *should* find that: Sounds fine to (just keep the warning). Cheers, Jérôme > > > for (; addr > size && addr >= iomem_resource.start; addr -= size) { > > ret = region_intersects(addr, size, 0, IORES_DESC_NONE); > > if (ret != REGION_DISJOINT) > > continue; >
diff -puN mm/memory_hotplug.c~mm-memory-hotplug-allow-memory-resource-to-be-child mm/memory_hotplug.c --- a/mm/memory_hotplug.c~mm-memory-hotplug-allow-memory-resource-to-be-child 2018-12-20 11:48:42.317771933 -0800 +++ b/mm/memory_hotplug.c 2018-12-20 11:48:42.322771933 -0800 @@ -98,24 +98,21 @@ void mem_hotplug_done(void) /* add this memory to iomem resource */ static struct resource *register_memory_resource(u64 start, u64 size) { - struct resource *res, *conflict; - res = kzalloc(sizeof(struct resource), GFP_KERNEL); - if (!res) - return ERR_PTR(-ENOMEM); + struct resource *res; + unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; + char *resource_name = "System RAM"; - res->name = "System RAM"; - res->start = start; - res->end = start + size - 1; - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; - conflict = request_resource_conflict(&iomem_resource, res); - if (conflict) { - if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) { - pr_debug("Device unaddressable memory block " - "memory hotplug at %#010llx !\n", - (unsigned long long)start); - } - pr_debug("System RAM resource %pR cannot be added\n", res); - kfree(res); + /* + * Request ownership of the new memory range. This might be + * a child of an existing resource that was present but + * not marked as busy. + */ + res = __request_region(&iomem_resource, start, size, + resource_name, flags); + + if (!res) { + pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n", + start, start + size); return ERR_PTR(-EEXIST); } return res;