diff mbox series

[v4,1/4] device-dax: Don't leak kernel memory to user space after unloading kmem

Message ID 20200508084217.9160-2-david@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm/memory_hotplug: Interface to add driver-managed system ram | expand

Commit Message

David Hildenbrand May 8, 2020, 8:42 a.m. UTC
Assume we have kmem configured and loaded:
  [root@localhost ~]# cat /proc/iomem
  ...
  140000000-33fffffff : Persistent Memory$
    140000000-1481fffff : namespace0.0
    150000000-33fffffff : dax0.0
      150000000-33fffffff : System RAM

Assume we try to unload kmem. This force-unloading will work, even if
memory cannot get removed from the system.
  [root@localhost ~]# rmmod kmem
  [   86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
  ...
  [   86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot

Now, we can reconfigure the namespace:
  [root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
  [  131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]dax
  [  131.410147] nd_pmem: probe of namespace0.0 failed with error -16namespace0.0 --mode=devdax
  ...

This fails as expected due to the busy memory resource, and the memory
cannot be used. However, the dax0.0 device is removed, and along its name.

The name of the memory resource now points at freed memory (name of the
device).
  [root@localhost ~]# cat /proc/iomem
  ...
  140000000-33fffffff : Persistent Memory
    140000000-1481fffff : namespace0.0
    150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
    150000000-33fffffff : System RAM

We have to make sure to duplicate the string. While at it, remove the
superfluous setting of the name and fixup a stale comment.

Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
Cc: stable@vger.kernel.org # v5.3
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/dax/kmem.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

Comments

Andrew Morton May 8, 2020, 11:53 p.m. UTC | #1
On Fri,  8 May 2020 10:42:14 +0200 David Hildenbrand <david@redhat.com> wrote:

> Assume we have kmem configured and loaded:
>   [root@localhost ~]# cat /proc/iomem
>   ...
>   140000000-33fffffff : Persistent Memory$
>     140000000-1481fffff : namespace0.0
>     150000000-33fffffff : dax0.0
>       150000000-33fffffff : System RAM
> 
> Assume we try to unload kmem. This force-unloading will work, even if
> memory cannot get removed from the system.
>   [root@localhost ~]# rmmod kmem
>   [   86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
>   ...
>   [   86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot
> 
> Now, we can reconfigure the namespace:
>   [root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
>   [  131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]dax
>   [  131.410147] nd_pmem: probe of namespace0.0 failed with error -16namespace0.0 --mode=devdax
>   ...
> 
> This fails as expected due to the busy memory resource, and the memory
> cannot be used. However, the dax0.0 device is removed, and along its name.
> 
> The name of the memory resource now points at freed memory (name of the
> device).
>   [root@localhost ~]# cat /proc/iomem
>   ...
>   140000000-33fffffff : Persistent Memory
>     140000000-1481fffff : namespace0.0
>     150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
>     150000000-33fffffff : System RAM
> 
> We have to make sure to duplicate the string. While at it, remove the
> superfluous setting of the name and fixup a stale comment.
> 
> Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
> Cc: stable@vger.kernel.org # v5.3

hm.

Is this really -stable material?  These are all privileged operations,
I expect?

Assuming "yes", I've queued this separately, staged for 5.7-rcX.  I'll
redo patches 2-4 as a three-patch series for 5.8-rc1.

> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>

Reviewers, please ;)
David Hildenbrand May 9, 2020, 5:40 a.m. UTC | #2
> Am 09.05.2020 um 01:53 schrieb Andrew Morton <akpm@linux-foundation.org>:
> 
> On Fri,  8 May 2020 10:42:14 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
>> Assume we have kmem configured and loaded:
>>  [root@localhost ~]# cat /proc/iomem
>>  ...
>>  140000000-33fffffff : Persistent Memory$
>>    140000000-1481fffff : namespace0.0
>>    150000000-33fffffff : dax0.0
>>      150000000-33fffffff : System RAM
>> 
>> Assume we try to unload kmem. This force-unloading will work, even if
>> memory cannot get removed from the system.
>>  [root@localhost ~]# rmmod kmem
>>  [   86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
>>  ...
>>  [   86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot
>> 
>> Now, we can reconfigure the namespace:
>>  [root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
>>  [  131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]dax
>>  [  131.410147] nd_pmem: probe of namespace0.0 failed with error -16namespace0.0 --mode=devdax
>>  ...
>> 
>> This fails as expected due to the busy memory resource, and the memory
>> cannot be used. However, the dax0.0 device is removed, and along its name.
>> 
>> The name of the memory resource now points at freed memory (name of the
>> device).
>>  [root@localhost ~]# cat /proc/iomem
>>  ...
>>  140000000-33fffffff : Persistent Memory
>>    140000000-1481fffff : namespace0.0
>>    150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
>>    150000000-33fffffff : System RAM
>> 
>> We have to make sure to duplicate the string. While at it, remove the
>> superfluous setting of the name and fixup a stale comment.
>> 
>> Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
>> Cc: stable@vger.kernel.org # v5.3
> 
> hm.
> 
> Is this really -stable material?  These are all privileged operations,
> I expect?

Yes, my thought was rather that an admin could bring the system into such a state (by mistake?). Let‘s see if somebody has a suggestion.

I guess if we were really unlucky, we could access invalid memory and trigger a BUG (e.g., page at the end of memory and does not contain a 0 byte).

> 
> Assuming "yes", I've queued this separately, staged for 5.7-rcX.  I'll
> redo patches 2-4 as a three-patch series for 5.8-rc1.

Make sense, let‘s wait for review feedback, thanks!
diff mbox series

Patch

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..1e678bdf5aed 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -22,6 +22,7 @@  int dev_dax_kmem_probe(struct device *dev)
 	resource_size_t kmem_size;
 	resource_size_t kmem_end;
 	struct resource *new_res;
+	const char *new_res_name;
 	int numa_node;
 	int rc;
 
@@ -48,11 +49,16 @@  int dev_dax_kmem_probe(struct device *dev)
 	kmem_size &= ~(memory_block_size_bytes() - 1);
 	kmem_end = kmem_start + kmem_size;
 
-	/* Region is permanently reserved.  Hot-remove not yet implemented. */
-	new_res = request_mem_region(kmem_start, kmem_size, dev_name(dev));
+	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
+	if (!new_res_name)
+		return -ENOMEM;
+
+	/* Region is permanently reserved if hotremove fails. */
+	new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
 	if (!new_res) {
 		dev_warn(dev, "could not reserve region [%pa-%pa]\n",
 			 &kmem_start, &kmem_end);
+		kfree(new_res_name);
 		return -EBUSY;
 	}
 
@@ -63,12 +69,12 @@  int dev_dax_kmem_probe(struct device *dev)
 	 * unknown to us that will break add_memory() below.
 	 */
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
-	new_res->name = dev_name(dev);
 
 	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
+		kfree(new_res_name);
 		return rc;
 	}
 	dev_dax->dax_kmem_res = new_res;
@@ -83,6 +89,7 @@  static int dev_dax_kmem_remove(struct device *dev)
 	struct resource *res = dev_dax->dax_kmem_res;
 	resource_size_t kmem_start = res->start;
 	resource_size_t kmem_size = resource_size(res);
+	const char *res_name = res->name;
 	int rc;
 
 	/*
@@ -102,6 +109,7 @@  static int dev_dax_kmem_remove(struct device *dev)
 	/* Release and free dax resources */
 	release_resource(res);
 	kfree(res);
+	kfree(res_name);
 	dev_dax->dax_kmem_res = NULL;
 
 	return 0;